08 December 2009

A whole Library of Congress, eh?

Books of The Times - In ‘Googled,’ Ken Auletta Explores Company’s Inner Workings - Review - NYTimes.com
Google has become such a household term that its name has morphed into a verb. “Its index contained one trillion Web pages in 2008,” Mr. Auletta writes, “and according to Brin, every four hours Google indexed the equivalent of the entire Library of Congress.”

The NYT provides a link to other articles about the Library of Congress, but not to a definition of the unit; neither, presumably, did Brin. The unit is frequently an abstraction based on the number of books in the library. Michael Hart's calculation is that one Library of Congress equals about 13 terabytes. But Matt Raymond, writing in the Library of Congress blog, notes that

we can as of this moment say that the approximate amount of our collections that are digitized and freely and publicly available on the Internet is about 74 terabytes. We can also say that we have about 15.3 million digital items online.
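
To see how much the choice of definition matters, here is a rough sketch in Python of the indexing rate implied by "one Library of Congress every four hours" under each figure. The function name is made up for illustration, and the comparison assumes the two figures measure the same thing, which they do not: Hart's is an estimate based on books, while Raymond's counts only digitized material that is online.

```python
# Figures quoted in this post; both are approximate.
HART_ESTIMATE_TB = 13   # Michael Hart's book-based estimate
LOC_ONLINE_TB = 74      # digitized, publicly available collections per Matt Raymond
HOURS_PER_LOC = 4       # "every four hours", per Brin as quoted by Auletta


def implied_rate_tb_per_day(loc_size_tb, hours_per_loc=HOURS_PER_LOC):
    """Terabytes per day if one 'Library of Congress' is indexed every hours_per_loc hours."""
    return loc_size_tb * 24 / hours_per_loc


for label, size_tb in [("Hart", HART_ESTIMATE_TB), ("LoC blog", LOC_ONLINE_TB)]:
    rate = implied_rate_tb_per_day(size_tb)
    print(f"{label}: {size_tb} TB per unit -> {rate:.0f} TB per day")

# How far apart the two definitions are.
print(f"ratio: {LOC_ONLINE_TB / HART_ESTIMATE_TB:.1f}x")
```

The same sentence describes a rate that differs by more than a factor of five depending on which definition the speaker had in mind.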

Perhaps Mr. Brin, or other people who throw around this unit of measurement, should be more specific.