Babbage | Digitising books

Academic time machine

A publisher looks back in time nearly 150 years to complete its digital collection

By G.F. | SEATTLE

SPRINGER GROUP traces its roots to a Berlin bookshop, opened by Julius Springer in 1842. He began publishing journals in 1843, the same year this newspaper was founded. Despite a slew of mergers, acquisitions and spin-offs—including accreting another hoary firm, Kluwer Academic Publishers, founded in 19th-century Netherlands—the publisher of academic and business tomes has an acute sense of its own history. Like many of its peers, it now publishes print and electronic versions of books simultaneously. Its contemporary digital library contains some 50,000 titles, mostly published after 2005 when the digitisation drive began in earnest. Now, though, Springer has decided to make the 65,000 tucked away in its vast archive available electronically under a commercial licence.

Many are pedestrian volumes or outdated research, of interest to science historians, if that. But there are also some gems. Your correspondent was sent a few samples—works by Albert Einstein, Sir John Eccles and Rudolf Diesel. Crisp digital reproductions of other seminal writings, long since out of print, by Niels Bohr, Lise Meitner, Werner Siemens and a host of scientific and technological luminaries can now be had, with the full text embedded for excerpting or searching. This effort, still underway, parallels other work to open up historical scientific records. The Royal Society says its trove of roughly 24,000 papers more than 70 years old—dating back as far as 1665—may be freely accessed. (Another 36,000 remain under licence.)

Scanning Springer's backlist proved no mean feat. First, the company had to figure out for which works Springer holds copyright, surveying records at all the firms swept up in recent years, says Thijs Willems, who heads the book-archiving project. To create a definitive list his group scoured old catalogues and national libraries. They eventually assembled an archive of 100,000 print books in English, Dutch and German, many of which were different editions of the same work. Where Springer had lost its own copies to the vagaries of time, war and the like, the firm arranged access to library copies. It decided to scan only the last available edition of a given work; earlier editions might be added to the trove in the future.

Copyright remains a bother. The United States has a solid dividing line: all works published or registered in America before 1923 are firmly in the public domain. But other countries maintain fuzzier policies, some of which have yet to be properly tested in court. In principle, books published as early as 1870 might still be under copyright in Britain. If a work was published that year, say, by an author who was 20 at the time, but who lived to 100, it would remain under copyright for 70 years after the author's death, ie, until 2020. The odds of this happening are long, but British publishers refrain from releasing post-1870 titles into the public domain, just in case.
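The arithmetic behind that 2020 figure can be sketched in a few lines. This is only an illustration of the life-plus-70 rule as described above, not a statement of how any country's law is applied in practice; the function name and its parameters are made up for the example.

```python
def copyright_expiry_year(publication_year, age_at_publication,
                          age_at_death, term_after_death=70):
    """Rough expiry year under a 'life of the author plus N years' rule.

    Hypothetical helper for illustration; it ignores everything except
    the author's lifespan and the post-mortem term.
    """
    birth_year = publication_year - age_at_publication
    death_year = birth_year + age_at_death
    return death_year + term_after_death


# The example from the text: published in 1870 by a 20-year-old
# author who lived to 100.
print(copyright_expiry_year(1870, 20, 100))  # 2020
```

The author in this scenario was born in 1850 and died in 1950, so the 70-year term runs to 2020. A shorter-lived author pulls the date back accordingly, which is why the long-lived case sets the cautious cut-off.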

Springer only began securing electronic rights in 1995. To make scanned versions of older books available, it often had to track down and negotiate with authors and their estates. Mr Willems says that living writers typically cheer the project, happy to see their books immortalised. The publisher also hoped to avoid the controversy surrounding Google Books, where the internet-search giant and academic institutions involved in the project owned no copyrights. In effect, Google tried to bypass the publishers, authors or libraries that held these rights, says Ray Colon, a Springer executive.

Mr Colon says that the firm is perfectly happy to let anyone else distribute those of its publications that are in the public domain, although it does not want to suggest which works fall into that category. (It does not make its own high-quality digitisations available; others would need to obtain copies of a work and create their own scan.)

There were non-legal challenges, too. For example, until 1941 German books were typically set in a blackletter type style known as Fraktur, and sometimes mistakenly called Gothic. (Fraktur was banned in 1941 by Hitler's secretary, Martin Bormann, for being too Jewish.) The intricate blackletter face can be difficult for modern readers to decipher. It was a tough ask for modern optical-character-recognition software, too, which had to be trained to accommodate it. On top of that, the dictionaries used by the software needed to be supplemented with obsolete words that were commonly used in the 19th century.

A publisher active in Germany during the Third Reich might also be concerned about works dating from that era. Of Springer books that survive, Mr Willems says fewer than ten have been withheld. Curiously, these were written in the 1930s by Jewish authors, and contain references to racial differences and religion that, Springer reasoned, might look unpalatable out of context. Mr Willems says this handful of books, while not included in the collection by default, will nonetheless be made available on request to those who subscribe to the appropriate collection. The books are not shocking; releasing them would not be illegal under Germany's constitutional ban on the publication of undemocratic materials, he assures.

Springer has painstakingly produced the highest possible quality of scans, principally to avoid having to start from scratch when today's viewing technology is superseded by something dramatically better. Mr Willems and his team also embedded rich metadata—details like author, date of publication, number of pages, and so on—in standard formats which are likely to persist for a while. They took especial care in reproducing illustrations. These digital books are, after all, meant to last for ever.
