16 Sep 2014

What's the big deal about metadata?

8:20 am on 16 September 2014

Metadata is all the rage these days. You’ve probably heard the word come from the mouths of politicians and reporters in the past couple of days. It’s well on the path to becoming an amusingly misused cliché, but what it is and why it’s important are still worth knowing.

At its most basic, data enables decisions. Whether you’re deciding what to buy, what to sell or where to shop, you’re analysing data. And one of the data sources with the fewest restrictions on its use is metadata. Analysis of metadata enables businesses to target products and governments to target criminals. Being able to carry out this analysis also lends itself to mass surveillance.


Photo: Rhiannon Josland

So, what is metadata?

Assuming the description you might have heard, “data about data”, hasn’t helped much, one way to describe metadata is that it’s the context.

Let’s imagine you’re writing an assignment. The data is what you’re being marked on. The metadata is information like the course code on the front page and the number of pages you’ve written. The metadata is the context that makes the thing you’re interested in valuable or meaningful. There’s no point trying to mark someone’s work if we don’t know which course they’re taking.

If you’re sending someone a snap, the Snapchat app wants to know the sender and recipient accounts and how long to display the photo for. That’s all metadata. The data is, well, the snap.


One other aspect of metadata is that it is incidental. If you send a tweet, you create metadata like when it was sent, which network you’re on and the device it was sent from. Companies are trying hard to create a profile about you, your interests and your buying potential. We can guess that an iPhone user may be more willing to spend money than someone running an old Android tablet. Twitter can know for sure. Thanks to the metadata it collects, it knows exactly who clicks what.

If a company is offering you a free service, either via a website or an app, they’ll typically sell information about their users to advertisers. This information is much more specialised than what might be available via radio or TV, where statements such as “19 per cent of viewers are male between the ages of 34-59” and “12 per cent of our users are from Auckland” are common. A social network will also be able to tell advertisers who is most interested in particular topics, who will influence those people and which times in the day they are most likely to click on ads. Those profiles are built on metadata.

Remind me why it’s important?

Governments may not need to know the contents of messages to be able to guess pretty well what people are talking about. A phone call to a sex hotline, a urologist or a holiday resort is fairly telling, even if no one listens in to the call itself. If you suspect that your partner is cheating on you, you don’t necessarily need to read their text messages to find it odd that they’re receiving them after 11pm.
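The late-night text example can be sketched in a few lines of Python. This is only an illustration: the records, the phone numbers and the 11pm-to-5am window are invented, and nothing here reflects how any real agency or app works. The point is that the function never sees what a message says, only when it arrived and who sent it.

```python
from datetime import datetime

# Hypothetical records: (time received, sender number). No message text at all.
RECEIVED = [
    ("2014-09-10T18:05:00", "021-555-0101"),
    ("2014-09-10T23:40:00", "021-555-0199"),
    ("2014-09-11T23:15:00", "021-555-0199"),
    ("2014-09-12T08:30:00", "021-555-0101"),
]

def late_night(records, late_from=23, late_until=5):
    """Return the records received between late_from (11pm) and late_until (5am)."""
    flagged = []
    for stamp, sender in records:
        hour = datetime.fromisoformat(stamp).hour
        if hour >= late_from or hour < late_until:
            flagged.append((stamp, sender))
    return flagged

LATE = late_night(RECEIVED)
for stamp, sender in LATE:
    print(stamp, sender)
```

Both flagged messages in this made-up log come from the same number, which is exactly the kind of pattern that stands out without anyone reading a word.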

What governments are hunting for are unfamiliar patterns. Perhaps a pre-paid cell phone that has never been used is called fairly often from phone boxes. Perhaps an IP address associated with fraudulent credit card transactions is being used for VoIP calls to phone numbers associated with money laundering. They’re hunting for criminals.

The problem is that the systems developed to hunt for criminals require pervasive surveillance of every communication. For the detective work to succeed, coverage needs to be complete.

For electronic communications, all of this metadata is very cheap for a government to collect. The internet’s physical infrastructure relies on a relatively small number of powerful links between parts of the world. If you can wiretap the fibre-optic cables or hack the satellite company, then you have access to the metadata of most of the communications in the world.


The hardware that sits on top of those cables (and the software that sits on top of that) is vulnerable too. When someone says they are building a service in the cloud, what they mean is they’re renting computers that are housed in a large warehouse. Contrast this with each company having its own small room of computers. Using large, centralised server warehouses is more cost-efficient for businesses. It’s also more cost-efficient for government surveillance, as there’s now a much smaller number of points that need to be monitored.

If you use a commercial messaging app such as WhatsApp, Facebook Messenger or Viber (does anyone still use MSN Messenger?), you are also making it easier for governments to track what you’re doing. Centralised services provide convenience, but they create single points of failure.

Collection and mining of metadata isn’t restricted to clandestine methods either. The link shortening site Bitly is open about the fact that its revenue comes from analysing what links are being shared and visited.

Messaging metadata creates patterns about who influences whom. That’s really important information if you’re trying to understand the dynamics of a crime syndicate, a child pornography ring, or a sports team. By monitoring a group’s interactions, you’re able to find its leaders. Once you find the group’s leaders, you know who to buy off, take down or support.
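As a rough illustration of how a leader can fall out of metadata alone, here is a minimal Python sketch. The message log and account names are invented, and counting each account’s distinct contacts is a deliberately crude stand-in for the far more sophisticated measures real analysis tools would use.

```python
from collections import defaultdict

# Hypothetical message metadata: (sender, recipient, hour sent).
# There is no message content anywhere in this log.
MESSAGES = [
    ("ana", "ben", 9),
    ("ana", "cat", 9),
    ("ana", "dev", 10),
    ("ben", "ana", 11),
    ("cat", "ana", 12),
    ("dev", "ana", 13),
    ("ben", "cat", 14),
]

def contacts_per_account(messages):
    """Map each account to the set of accounts it exchanged messages with."""
    contacts = defaultdict(set)
    for sender, recipient, _hour in messages:
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
    return contacts

def most_connected(messages):
    """Return the account with the most distinct contacts."""
    contacts = contacts_per_account(messages)
    return max(contacts, key=lambda account: len(contacts[account]))

LEADER = most_connected(MESSAGES)
print(LEADER)
```

In this made-up group, everyone talks to "ana" and "ana" talks to everyone, so that account comes out on top even though no message was ever read.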

Metadata collection isn’t all bad. Google Maps is able to tell you an estimated travel time, based in part on the position information sent by drivers ahead of you. When traffic halts, the road turns red on your screen.

Algorithms for conducting this sort of analysis are always improving. That creates an incentive for people who are monitoring to store everything indefinitely. When a new version of the analysis tool comes out, they’re able to rerun the analysis on the historical data. If you’re a government looking for terrorists, you get to look for the needle in the haystack with a stronger metal detector.

Even your cute cat photos are storing a tonne of metadata

What do I do? 

Almost everything that you interact with records how you use it. Just having your cell phone turned on generates metadata. Your phone sends signals to cell towers on a regular basis, even between calls. For developers making apps for your smartphone, gathering usage data is often essential to maintain and improve their service.

App developers want to provide you with features that you actually use. If they detect that a part of their app isn’t being used heavily, they may remove it from the next version. That simplifies the application and makes the rest of it easier to use.

Be deliberate about the choices you make with technology. Prefer HTTPS websites over HTTP. The S stands for secure. If you want to have a private conversation with someone, then have that conversation with them in person. If you’re an aspiring journalist, don’t use channels like email to chat to sources who may be putting themselves in a compromising position.

There is little you can do to influence the collection of metadata. But you do have control over the acceptable use of that data, such as its storage and analysis. You don’t need to accept that metadata collected to keep a service running is used for wider purposes that you don’t agree with.

For commercial services, the decision is easy. Don’t use the service. Use a decentralised alternative. If you don’t know what this means, the internet makes it easy to learn.

For government, your impact is less direct but more powerful. You live in a democracy and your voice is heard loudly. You are an important part of a democratic society. Society constructs laws according to your will. It is fear and silence that enable agencies to expand their surveillance net. For them, coverage is never complete.

This content is brought to you with funding assistance from New Zealand On Air.