Technology Quarterly | Privacy

Hiding from big data

IT security: With the increasing commercial use of personal data and multiple security breaches, will people pay for privacy?

AS THE chorus of Twisted Sister’s 1984 anti-authority hit “We’re Not Gonna Take It” faded, Aral Balkan got ready to launch his latest project from the main stage at Handheld, a small technology-design conference held in Cardiff, Wales, last November. The entrepreneur’s big idea? A phone that flies in the face of a consumer-technology industry transfixed by big data and how to make money from it.

Users of Mr Balkan’s phone will have extensive control over any data it collects, and those data will not be “monetised”. So far, what Mr Balkan calls his Indie Phone is just an idea, and one largely dependent on a crowdfunding campaign aiming to raise the several million pounds needed to put the device into production.

But with online security frequently breached and personal data plundered, Mr Balkan is not alone in thinking there will be a big market for privacy products. Other new ventures are meant to appeal to an audience believed to have grown indignant at the gathering of personal data for commercial purposes by industry giants like Google and Facebook.

Omlet is one example. It is a new social-messaging service launched by Monica Lam of Stanford University. Omlet lets users send messages to their contacts and share media, all of which is stored in the cloud with a third-party provider of their choice rather than being kept and used for purposes such as targeted advertising. The service is free, and no data on the network are owned centrally; Dr Lam hopes instead to make money by collaborating with cloud services like Dropbox or Box.com, which might then be enabled as the default storage providers.

Companies could find privacy a useful marketing tool, reckons Stephen Wicker, a specialist in digital communications at Cornell University. Dr Wicker is one of many experts urging firms to offer consumers options which do not leave them vulnerable to the hacking or misuse of databases. A service which promises to keep users’ data for a day instead of a year or more is, he believes, something that people would be willing to pay extra for.

Big Brother is worried

Pressure from governments could encourage many more privacy products. There is an increasing awareness of just how much can be learned or predicted about an individual from myriad data floating around the web. The White House, for one, is co-hosting a series of big-data workshops in response to a call by President Obama in January for a privacy review following Edward Snowden’s revelations about snooping by intelligence agencies.

At the first of these workshops, in March, there was much discussion about two very different concepts that might be adopted by industry to make computer systems more secure: homomorphic encryption and differential privacy. Both terms have become buzzwords, but many experts find they have big limitations.

Homomorphic encryption allows computations to be carried out on data while they are in an encrypted state. Currently data must be decrypted on computer servers for them to be analysed and information extracted, which leaves them vulnerable to attack or accidental leaks.

But homomorphic encryption slows processing speeds drastically. Vinod Vaikuntanathan of the Massachusetts Institute of Technology and his colleagues initially found that extracting information from encrypted data could take a quintillion (10¹⁸) times longer than from unencrypted data. In his latest experiments this has dropped to ten thousand or so times longer, which is still far too slow to make it a serious alternative to existing data-protection systems.

Yet homomorphic encryption could still have some uses in one-off situations where privacy is crucial, such as tallying votes during an election.
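Additively homomorphic schemes, in which multiplying two ciphertexts yields an encryption of the sum of their plaintexts, suit vote tallying well. The sketch below uses the Paillier cryptosystem, one such scheme (the article does not name it, and the tiny primes are purely illustrative), to count encrypted yes/no votes without decrypting a single ballot.

```python
# A minimal sketch of homomorphic vote tallying with the Paillier
# cryptosystem. Toy primes for illustration; real keys are 2,048+ bits.
import math
import random

def keygen(p, q):
    n = p * q
    lam = math.lcm(p - 1, q - 1)      # Carmichael's function for n = p*q
    mu = pow(lam, -1, n)              # inverse of lambda, used in decryption
    return n, (lam, mu)

def encrypt(n, m):
    while True:
        r = random.randrange(1, n)    # random blinding factor coprime to n
        if math.gcd(r, n) == 1:
            break
    # g = n + 1 is the standard generator; c = g^m * r^n mod n^2
    return (pow(n + 1, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n    # L(x) = (x - 1) / n, times mu mod n

n, priv = keygen(293, 433)

# Each voter encrypts 1 (yes) or 0 (no). Multiplying ciphertexts adds the
# underlying votes, so the tally is computed without opening any ballot.
votes = [1, 0, 1, 1, 0, 1]
tally = 1
for v in votes:
    tally = (tally * encrypt(n, v)) % (n * n)

print(decrypt(n, priv, tally))        # prints 4: the count, never the ballots
```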

Differential privacy tackles another means of obtaining private information. Even anonymised databases can be vulnerable to “linkage attacks”, which allow people’s identities to be inferred by comparing anonymised information with data from an open source. Finding out whether someone is listed as having leukaemia in an anonymised medical-research database, for example, might be possible by matching the data with details about the person gleaned from their open social-media profile, which might well include their day, month and place of birth, home town and occupation.
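The attack is mechanically trivial, as this minimal sketch shows (all records here are invented): joining an anonymised medical table to a public profile on a couple of quasi-identifiers is enough to re-identify a patient.

```python
# A toy linkage attack: the medical table omits names, but matching it
# against public profile data on birth date and home town re-identifies
# the patient. Differential privacy is designed to defeat exactly this.
anonymised_medical = [
    {"dob": "1984-03-12", "town": "Cardiff", "diagnosis": "leukaemia"},
    {"dob": "1990-07-01", "town": "Swansea", "diagnosis": "asthma"},
]
public_profiles = [
    {"name": "A. Example", "dob": "1984-03-12", "town": "Cardiff"},
]

for rec in anonymised_medical:
    for prof in public_profiles:
        if (rec["dob"], rec["town"]) == (prof["dob"], prof["town"]):
            print(prof["name"], "has", rec["diagnosis"])  # identity inferred
```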

Instead of directly outputting data when an anonymised resource is queried, differential privacy uses algorithms to add variable levels of “noise” to the data so that information about a specific individual is obscured. Frank McSherry, a differential-privacy researcher at Microsoft, says it is a bit like trying to listen to a million guitarists at once. If they are all playing with distortion (the added noise) it is hard to pick out an individual playing a different tune.
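The most widely used way of adding that noise is the Laplace mechanism. In the minimal sketch below (the records and the value of the privacy parameter epsilon are illustrative assumptions), a count query is answered with noise of scale 1/epsilon, enough to hide any one person’s presence in the data.

```python
# A minimal sketch of the Laplace mechanism for differential privacy.
import math
import random

def laplace_noise(scale):
    """Draw one Laplace-distributed sample via inverse-CDF sampling."""
    u = random.random() - 0.5          # uniform on [-0.5, 0.5)
    while u == -0.5:                   # avoid log(0) at the boundary
        u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_count(records, predicate, epsilon):
    # A count changes by at most 1 when one person is added or removed
    # (sensitivity 1), so noise of scale 1/epsilon gives
    # epsilon-differential privacy for this query.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Invented records: (age, diagnosis). Smaller epsilon means more noise and
# stronger privacy; repeated queries spend down the privacy budget.
records = [(34, "leukaemia"), (58, "asthma"), (41, "leukaemia"), (29, "diabetes")]
print(private_count(records, lambda r: r[1] == "leukaemia", epsilon=0.5))
```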

The problem with differential privacy is that there is no universally accepted way of doing it. Still, as Dr McSherry notes, at least researchers have begun thinking about how it might best be deployed. That is encouraging for researchers wanting to protect large, highly sensitive databases. But businesses accustomed to exploiting the richness of data, legally or otherwise, may be flatly uninterested.

Does privacy sell?

There is scant evidence that concern about privacy is causing a fundamental change in the way data are used and stored. Carsten Casper of Gartner, a technology consultancy, says no big privacy revolution is happening in IT infrastructure. Companies are asking more questions about privacy, but Mr Casper says nine out of ten of those questions are to do with the location of data centres. Companies are keen to know, for example, whether moving to a particular jurisdiction will bring legislative restrictions on the use of data. Some firms also ask if public concern over the security of the cloud is significant enough to make the consolidation of data centres a bad idea. In all cases, Mr Casper advises his clients to take a broad view, as privacy is far from the only factor they need to consider.

Yet interest in privacy-enhancing features is growing. These include such things as better control over which individuals within a company have access to certain data, or systems which can help to enforce stricter privacy policies. But neither big changes to business models nor hefty expenditure on special technologies are on the cards—unless they improve trust and customer loyalty. “Those things turn into money,” says Mr Casper.

Nevertheless, some researchers still think that privacy could be enhanced by new technology. Robert Watson, a computer scientist at the University of Cambridge, has spent years researching “compartmentalisation”. This is a way of segregating the components of programs within any service, such as an e-mail interface, so that only the communications that are absolutely necessary between those components are allowed. This could prevent malicious code in an e-mail attachment, say, from infecting other programs. From a security point of view, that is a good thing. But what has it got to do with privacy? “We can use compartmentalisation to restrict the effects of a successful compromise,” says Dr Watson. It thus enforces security policies that deny an attacker or malicious insider access to information.
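The principle can be sketched even in a high-level language. Below is a minimal, hypothetical illustration (not Dr Watson’s actual framework, which works at the operating-system and hardware level): an attachment parser runs in a separate process and talks to the mail client only through a single pipe, so compromising the parser does not expose the client’s address book or credentials. A real sandbox would also strip the worker’s operating-system privileges.

```python
# A minimal sketch of compartmentalisation: the untrusted parser lives in
# its own process, and the only channel between it and the mail client is
# one narrow pipe carrying bytes in and plain text out.
from multiprocessing import Process, Pipe

def parse_attachment(conn):
    """Untrusted compartment: receives raw bytes, returns plain text only."""
    data = conn.recv_bytes()
    # A real parser (and any bug in it) lives here, isolated from the client.
    conn.send_bytes(data.decode("ascii", errors="replace").encode("ascii"))
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    worker = Process(target=parse_attachment, args=(child_end,))
    worker.start()
    parent_end.send_bytes(b"Hello from an untrusted attachment")
    print(parent_end.recv_bytes().decode("ascii"))  # only text crosses over
    worker.join()
```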

One of Dr Watson’s colleagues, Ben Laurie, a software-security engineer at Google and a director of the Open Rights Group, which campaigns to preserve the openness of a digital society, agrees that compartmentalisation could help limit the impact of security flaws. He should know, because he is also a core member of the OpenSSL project, which provides a popular library of open-source encryption software. In April OpenSSL was found to have a vulnerability known as Heartbleed.

Heartbleed could be exploited by attackers to gain access to the memory on computer servers, which might reveal data such as users’ passwords and other sensitive information. Strong compartmentalisation, reckons Dr Watson, would have prevented that.

Compartmentalising programs does mean additional costs, but Dr Watson believes firms would find the investment worthwhile because it would make data more secure. Moreover, the costs of compartmentalisation would fall as more computers adopted it, as happened in another area of computer science. Virtualisation allows multiple (virtual) machines, each with its own operating system and applications, to run independently on the same processor. For it to be commercially viable, processors had to be redesigned. The changes were expensive, but rising energy costs made running independent physical machines exorbitant, which accelerated their adoption. If demand for improved security grew fast enough, redesigns for compartmentalisation could be speeded up too.

Some in the industry believe governments need to intervene to protect privacy. There is some movement in this direction. In Britain, for instance, the Information Commissioner’s Office is working to develop new privacy standards to publicly certify an organisation’s compliance with data-protection laws. But critics think such proposals fall short of the mark—especially since the revelations that America’s National Security Agency (NSA) ran a surveillance programme known as PRISM, which collected information directly from the servers of big technology companies, such as Microsoft, Google and Facebook.

For those who know where to look there are ways to hide online, including the Tor network, which allows anonymous web browsing by routing users’ online activity through a network of randomly selected computers provided by volunteers. Tor has proved to be remarkably robust. An NSA presentation entitled “Tor Stinks”, which was leaked to newspapers by Mr Snowden last October, said: “We will never be able to de-anonymise all Tor users all the time.” New software should make the use of Tor less geeky, but the network is slow compared with the regular web.
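The layering behind Tor can be sketched in a few lines. In the hypothetical snippet below (which uses the third-party cryptography package; the relay names are invented, and real Tor negotiates keys per circuit rather than sharing them up front), the client wraps a request in one layer of encryption per relay, and each relay can peel only its own layer, learning just the next hop.

```python
# A toy sketch of onion routing: one encryption layer per relay.
from cryptography.fernet import Fernet   # pip install cryptography

relays = ["entry-A", "middle-B", "exit-C"]          # invented relay names
keys = {r: Fernet.generate_key() for r in relays}   # per-relay keys

# Client: wrap the request for the exit first, then add a layer for each
# relay back towards the entry.
packet = b"GET http://example.com/"
for relay in reversed(relays):
    packet = Fernet(keys[relay]).encrypt(packet)

# Network: each relay strips exactly one layer. Only the exit sees the
# request; only the entry knows who sent it.
for relay in relays:
    packet = Fernet(keys[relay]).decrypt(packet)

print(packet.decode())                              # what the exit forwards
```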

Other ways to protect security and privacy are also being investigated. One, known as “anonymous credentials”, provides authentication without identification. Instead of providing proof of identity to log into, say, an online service, a user’s device could be asked to complete a secret cryptographic puzzle which only authorised parties know how to solve. Another idea is “private information retrieval”, which lets users fetch excerpts from a database without the database learning which excerpts were requested.
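As a toy illustration of authentication without identification (real anonymous-credential systems rely on blind signatures or zero-knowledge proofs rather than the shared key assumed here), imagine every authorised device holding the same group secret: a correct answer to the service’s random challenge proves membership without revealing which member is logging in.

```python
# A toy challenge-response sketch of "prove you belong, not who you are".
# The shared group secret is an illustrative simplification; real systems
# issue per-user, revocable credentials with stronger cryptography.
import hashlib
import hmac
import secrets

GROUP_SECRET = secrets.token_bytes(32)   # issued to every authorised device

def solve(challenge, secret):
    """Device side: answer the service's puzzle using the group secret."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()

def verify(challenge, response):
    """Service side: checks the answer without learning who answered."""
    expected = hmac.new(GROUP_SECRET, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = secrets.token_bytes(16)      # fresh random puzzle per login
print(verify(challenge, solve(challenge, GROUP_SECRET)))  # True, anonymously
```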

But all these processes come at some cost. Commercial use of big data, for things like market research and targeting advertising to individuals, helps to subsidise many products and services. Companies and consumers may be reluctant to pay more for greater privacy, choosing instead to take more care online. Just how big and successful the market for privacy becomes will depend on the demand for new products, like Mr Balkan’s Indie Phone—provided they can get to market.
