Tuesday, February 27, 2007

Compare your Library with LibraryThing

Tim Spalding of LibraryThing has just released a new set of data: an export of all the ISBNs that LibraryThing knows about.

Tim writes:
Over on Next Generation Catalogs for Libraries, NCSU's Emily Lynema, asked me:
"Do you have any idea of the coverage of non-fiction, research materials in LT? Have you done any projects to look at overlap with a research institution (or with WorldCat)?"
No, we haven't. And I'm dying to find out, both for academic and non-academic libraries.
So I decided to see how hard it would be to write a script to compare the LibraryThing dataset against a simple export from our library system. It turns out it didn't take too long. I have posted the Perl source code on my personal website, so you no longer have that as an excuse for not helping Tim out.
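
The comparison itself boils down to something like the following. This is a minimal Python sketch rather than the Perl script itself, and the normalisation rule (strip hyphens, take the first 10- or 13-character ISBN on the line, uppercase a trailing X) is an assumption about what a library export needs, not a description of what the script does. The function names are made up for illustration.

```python
import re

# Assumption: after hyphens are removed, an ISBN is either 13 digits
# or 9 digits followed by a digit or X.
ISBN_RE = re.compile(r"\b(\d{13}|\d{9}[\dXx])\b")


def normalise_isbn(raw):
    """Return the first ISBN-like token in a line, or None if there is none."""
    match = ISBN_RE.search(raw.replace("-", ""))
    return match.group(1).upper() if match else None


def load_isbns(lines):
    """Build a set of normalised ISBNs from an iterable of text lines."""
    isbns = set()
    for line in lines:
        isbn = normalise_isbn(line)
        if isbn:
            isbns.add(isbn)
    return isbns


def compare(ours, theirs):
    """Count the ISBNs in each set, those in both, and those only in ours."""
    return {
        "ours": len(ours),
        "theirs": len(theirs),
        "common": len(ours & theirs),
        "only_ours": len(ours - theirs),
    }
```

Feeding it two files would look like `compare(load_isbns(open("library_export.txt")), load_isbns(open("librarything_isbns.txt")))`, where both file names are placeholders. Because the ISBNs are held in sets, duplicates within either list are collapsed before counting. The sketch does not convert between ISBN-10 and ISBN-13, so the same edition listed in both forms would count as two different ISBNs.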

Here are the stats for The University of Waikato Library:

Out of approximately 500,000 bib records in our database I found only 292,073 unique ISBNs. (I originally reported 178,460, which turned out to be the result of an error during normalisation.) LibraryThing has 1,774,322 ISBNs, so they have about six times as many as us!


UoW Library and LibraryThing have 73,377 ISBNs in common (originally reported as 45,259), which means that LibraryThing has only about 25% of the ISBNs we have, or in other words, 75% of our ISBNs are ones that LibraryThing doesn't have. This seems like a surprisingly large number given how much larger LibraryThing's database is. Tim may have the right idea though, as he said he suspects LibraryThing users tend to have the cheaper paperback copies of books rather than the more expensive hardcover versions that libraries tend to buy. It would be interesting to see if that is in fact the reason, or if we just have a very different set of resources from what is cataloged in LibraryThing.

Database                 Total ISBNs   Unique ISBNs   Percentage Unique
University of Waikato        292,073        218,696              74.88%
LibraryThing               1,774,322      1,700,943              95.86%

Total ISBNs in common: 73,377
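
The Waikato row can be checked from the two counts alone. This is a small Python sketch of the arithmetic, with a made-up function name; "unique" here means ISBNs we hold that are not in the other database.

```python
def overlap_stats(total, common):
    """Return (unique count, % unique, % in common) for one database."""
    unique = total - common
    pct_unique = round(100 * unique / total, 2)
    pct_common = round(100 * common / total, 2)
    return unique, pct_unique, pct_common


# Figures taken from the table above: Waikato total and ISBNs in common.
uow_unique, uow_pct_unique, uow_pct_common = overlap_stats(292_073, 73_377)
print(uow_unique, uow_pct_unique, uow_pct_common)  # 218696 74.88 25.12
```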


I figure since they asked the question, NCSU Libraries should be next...

External Links:


Update: I updated the figures after discovering I had dropped a whole bunch of ISBNs when normalising them.

1 comment:

Mickey said...

Thanks for sharing this. I looked at your Perl script. I don't think you need the line

use IO::File;

as you don't use any of this module's features. Are you assuming that the two lists that you read from have no duplicates?