With so many folks beating up on Google this week for being evil and allegedly cutting a net neutrality deal with Verizon, I want to take a moment to give the devil his due where it’s, well, due. Far away from the FCC’s 12th Street offices in DC, Google has been engaged in a multi-year effort to scan and digitize the world’s books to make their texts searchable and, where possible, accessible.
The quest has triggered its share of legal controversies, but it’s the sort of massive undertaking that could prove hugely useful to just about everyone for a long time, yet one that few other private corporations would have the wherewithal or inclination to attempt. Outside the U.S., governments might take up part of the task, but in the U.S. there is less than zero chance of that happening, for any number of reasons. And few governments are likely to be as comprehensive in their approach as Google, whose efforts span linguistic, geographic and historical boundaries.
In a fascinating post yesterday on the official Google Books blog, software engineer Leonid Taycher describes some of the challenges involved in ingesting, sorting and collating all that information and then organizing it into some kind of coherent, usable database. He also discusses the interesting problem of figuring out just how many books there are out there in need of digitizing, and how Google came up with its count of 129,864,880. Worth a read.
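The core of that counting problem, as Taycher describes it, is deduplication: the same book shows up in metadata records from many different providers, each formatted a little differently, and the records have to be clustered before anything can be counted. A minimal sketch of that kind of clustering might look like the following; the field names and the matching rule here are illustrative assumptions, not Google’s actual algorithm.

```python
# Hypothetical sketch of metadata-based deduplication, loosely inspired by
# the clustering problem Taycher describes. Record fields and the matching
# rule are assumptions for illustration, not Google's actual method.
from collections import defaultdict

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so near-identical strings match."""
    return "".join(
        ch for ch in text.lower() if ch.isalnum() or ch.isspace()
    ).strip()

def cluster_key(record: dict) -> str:
    """Prefer a hard identifier (ISBN); fall back to title + author."""
    if record.get("isbn"):
        return "isbn:" + record["isbn"].replace("-", "")
    return ("ta:" + normalize(record.get("title", ""))
            + "|" + normalize(record.get("author", "")))

def count_distinct_books(records: list[dict]) -> int:
    """Group provider records into clusters and count the clusters."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[cluster_key(rec)].append(rec)
    return len(clusters)

# Example: three provider records that describe two distinct books.
records = [
    {"title": "Moby-Dick", "author": "Herman Melville", "isbn": "978-0142437247"},
    {"title": "Moby Dick!", "author": "herman melville", "isbn": "9780142437247"},
    {"title": "Walden", "author": "Henry David Thoreau"},
]
print(count_distinct_books(records))  # -> 2
```

At Google’s scale the real matching is far fuzzier than this, of course, which is part of why the final number is so hard to pin down.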
That said, it’s worth noting that there are other huge archives out there (photographs, artworks, musical scores) waiting to be digitized. Decisions about how that process proceeds, how metadata is defined and many other questions Google is currently resolving privately will have enormous implications both for the archival value of those databases and for their potential commercialization. We haven’t even begun to tackle those issues on a policy level.
Further reading:
Google Books Has Determined There Are 129,864,880 Books in the World (For Now)