Babbage | Digital attribution

Stripped search

Photos stripped of identity can be found through image-matching services

By G.F. | SEATTLE

IMAGE metadata can carry sensitive information about an image's creator: GPS-derived geographic co-ordinates, say. Such details therefore deserve to be obscured. Twitter and Facebook, for instance, often strip much metadata as a matter of course for that reason. All too often, though, websites remove such information not to protect the creator, but to appropriate his creation. Images are cropped to excise visible watermarks or digitally scrubbed of metadata using readily available digital erasers. This lets unscrupulous users dispense with proper attribution to an image's originator and, crucially, with compensation.
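To make the scrubbing concrete, here is a minimal Python sketch of how such an eraser might work on a JPEG file. It assumes the common layout in which Exif and XMP data sit in APP1 segments, IPTC data in APP13 and free-text comments in COM segments. The function names, the choice of segments to drop and the made-up sample bytes are illustrative assumptions, not a description of what any particular website or tool does.

```python
import struct

# Segment types assumed to carry metadata: APP1 (Exif/XMP), APP13 (IPTC), COM.
STRIP_MARKERS = {0xE1, 0xED, 0xFE}

def strip_jpeg_metadata(data: bytes) -> bytes:
    """Copy a JPEG byte stream, leaving out the metadata segments listed above."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    out = bytearray(data[:2])          # keep the start-of-image marker
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a segment marker")
        marker = data[pos + 1]
        if marker == 0xDA:             # start of scan: the picture itself follows
            out += data[pos:]
            return bytes(out)
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in STRIP_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length              # the length field counts its own two bytes
    raise ValueError("no start-of-scan marker found")

def segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG segment (helper for the made-up sample below)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload

# A toy byte stream with the shape of a JPEG; it is not a decodable picture.
sample = (b"\xff\xd8"
          + segment(0xE0, b"JFIF\x00")
          + segment(0xE1, b"Exif\x00\x00 GPS 47.6N 122.3W")
          + segment(0xDA, b"\x00") + b"scan-data" + b"\xff\xd9")
cleaned = strip_jpeg_metadata(sample)
print(b"Exif" in sample, b"Exif" in cleaned)
```

The sketch ignores padding bytes and other unusual layouts that real files may contain. The point is only that the picture data pass through untouched while the record of who made the image, and where, is dropped.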

But even as the photographs displayed on webpages explode in size and quantity, it has become easier to extract useful information from the image data themselves. Rather than rely on the text embedded in an image file's metadata or on an exact pixel-for-pixel match (impractical for many reasons), algorithms developed for computer-vision research, in which computers learn to identify parts of a scene in a photograph or illustration, can deconstruct the image data in a variety of ways to call out objects and features. Thus a picture of the Mona Lisa could be matched successfully against both other pictures of the same painting taken at different times and transformations of the original (including parodies) that match closely enough in form, colour and composition.

This type of automated picture analysis has given rise to "reverse" searches, which match a supplied image against all indexed images, and several such services are now on offer. Google Images and TinEye (created by Idée, a firm based in Toronto) offer excellent free services for personal use. The general purpose of such reverse searches is to identify the possible origin of an image, find a higher-resolution version or determine on which sites the image has been used. (A loosely related kind of service, which relies on more traditional metadata, can help establish whether a stolen camera has been put back into use by looking for serial numbers.)

Before effective reverse-image search existed, photographers or rightsholders had to track down purloined images by hand, sometimes hiring services to do this. Alternatively they could use digital watermarking, a "steganographic" method which embeds readable codes as a nearly imperceptible layer of tiny modifications over the entire image. Modifying or cropping the image typically doesn't remove the code. Digimarc pioneered the field over 15 years ago. (It now also offers anti-counterfeiting technology for banknotes and driving licences.)
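The idea of a faint layer spread over the whole image can be sketched in a few lines of Python. This is a toy under stated assumptions, not Digimarc's method: a secret key seeds a pattern of +1 and -1 values, the pattern is added to every pixel at low strength, and the mark is later detected by checking how strongly the image correlates with the same pattern. The function names, the key and the sample image are all hypothetical.

```python
import random

def keyed_pattern(key, length):
    """Pseudo-random sequence of +1/-1 values that the key holder can regenerate."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]

def embed(pixels, key, strength=3):
    """Nudge every pixel up or down by `strength` grey levels, following the pattern."""
    pattern = keyed_pattern(key, len(pixels))
    return [min(255, max(0, p + strength * s)) for p, s in zip(pixels, pattern)]

def detect(pixels, key):
    """Correlation score: close to `strength` if the mark is present, close to 0 if not."""
    pattern = keyed_pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) * s for p, s in zip(pixels, pattern)) / len(pixels)

# Hypothetical 128x128 greyscale image, flattened row by row: a repeating ramp.
plain = [100 + (i % 64) for i in range(128 * 128)]
marked = embed(plain, "owner-key")
brighter = [p + 10 for p in marked]   # a simple edit: the whole image lightened

for name, image in [("plain", plain), ("marked", marked), ("brighter", brighter)]:
    print(name, round(detect(image, "owner-key"), 2))
```

No pixel moves by more than three grey levels out of 255, yet the score separates marked from unmarked images and is unaffected by a uniform change in brightness. Unlike the commercial systems described above, this sketch would not survive cropping or resizing, which shift the pixels out of step with the pattern.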

Google and Idée use a more general approach. TinEye, which predates Google's service, draws on image recognition, image processing and computer-vision research, says its co-founder and boss Leila Boujnane. This creates a sort of fingerprint for an image that can be matched against other images without requiring exact comparisons or, for that matter, even storing the original image in full. The company also manages to track transformations in which one image is used as a component in a collage or multiple images are overlaid into a single picture. A set of objects in some relation to each other in one image retains that relationship in cropped, skewed and modified derivatives.
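A much simpler relative of such fingerprints, often called an average hash, gives a feel for how matching without exact comparison can work. The Python sketch below is an illustration only, not TinEye's or Google's technique: it shrinks a greyscale image to an 8x8 grid, records whether each cell is brighter than the grid's mean, and compares two 64-bit fingerprints by counting the bits that differ. The function names and the synthetic test images are assumptions, and the image must be at least 8 pixels on each side.

```python
def shrink(pixels, size=8):
    """Downscale a greyscale image (a list of rows) to size x size by block averaging."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    return cells

def average_hash(pixels, size=8):
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits

def hamming_distance(a, b):
    """Number of fingerprint bits on which two images disagree."""
    return bin(a ^ b).count("1")

# Synthetic 32x32 images: a gradient, a lightened copy, a copy at twice the
# resolution, and a mirror image standing in for an unrelated picture.
original = [[x * 4 + y * 2 for x in range(32)] for y in range(32)]
brighter = [[v + 40 for v in row] for row in original]
enlarged = [[original[y // 2][x // 2] for x in range(64)] for y in range(64)]
mirrored = [row[::-1] for row in original]

base = average_hash(original)
for name, image in [("brighter", brighter), ("enlarged", enlarged), ("mirrored", mirrored)]:
    print(name, hamming_distance(base, average_hash(image)))
```

A small distance suggests the same picture, a large one a different picture. Only the 64-bit fingerprint need be kept in the index, not the image. A hash this crude copes with resizing and lightening but not with crops or collages, which is where the more elaborate methods described above earn their keep.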

Ms Boujnane says TinEye's index approaches 4 billion unique images, compared with 200m when it began offering the service five years ago. As with many data-oriented businesses, the precipitous drop in the price of computing power and storage now enables a small firm like Idée to maintain datasets so big that they would have been the province of a Google or Facebook just a few years ago.

The number of images available on the web is unknown and unknowable; Instagram and Facebook each host billions, though many are reachable only within the network. The total is probably at least in the hundreds of billions, if not trillions, including innumerable duplicates. A common question in job interviews at internet firms is how to calculate such a sum.

Besides its free service, Idée competes with Digimarc and others to help individual and corporate image-copyright holders keep tabs on how their images are used. Idée plans to introduce a service to track the use of images and alert a customer when an image is spotted during indexing at a new site, for instance.

Indexes like TinEye and Google Images make it harder for image appropriators to plead ignorance. "I didn't know, I couldn't find out where it came from" is no longer a credible defence, says Ms Boujnane. But she notes that there is more to it than copyright control, though she admits that is where the money lies. For many people, being associated with their work seems to rank higher than being paid for it.
