OCLC research scientist Brian Lavoie opened a recent blog post with an arresting pair of questions and answers:

What is the most popular work by an Irish author? Where could one go to find out? The answer to the first question is Gulliver’s Travels. The answer to the second is WorldCat.

I eagerly read the rest of the post, but, curiously, it didn’t tell me exactly how WorldCat, the OCLC-created aggregation of library catalogues, answered the first question (or, for that matter, the second). Instead, it referred to a separate article in which the authors (Brian Lavoie and Lorcan Dempsey) clarify the position:

The answer to the general question—how can these questions be answered?—is through a computational analysis of the published record as it is represented in library bibliographic and holdings data.

The computational analysis is not described further, and in any case such an analysis is, unfortunately, not available to most users of WorldCat. In fact, a general reader who simply wanted to find out the most popular work by an Irish author would be well advised to avoid WorldCat.

In the article, Brian Lavoie helpfully provided a link to the free public interface of WorldCat. If a casual user searches that system for “Gulliver’s Travels” (which raises the question of how they chose this book in the first place), they will see several thousand hits, all linked by a title matching “Gulliver’s Travels”, but little can be inferred from them about relative popularity. Moreover, the results themselves show clearly that a collection is only as good as the metadata supplied to it. Any researcher would be hard pressed to infer anything useful from the main WorldCat catalogue interface.

The first hit is an eAudiobook credited to Jonathan Swift and Gordon Griffin; the second is credited to Max Fleischer (I believe it is an animated film based on Swift’s Gulliver’s Travels), and so on. It’s a glorious hotchpotch, and of no use whatever in revealing what the most popular Irish book is. These results can only be interpreted intelligently if you bring some prior knowledge to the subject – for example, the knowledge that Jonathan Swift (and not Gordon Griffin) wrote Gulliver’s Travels.

Mr Lavoie claims that WorldCat is an example of literary scholar Franco Moretti’s technique of “distant reading”, by which I think he means, in this case, the use of library catalogues to infer knowledge about the history of literature. Sadly, I think even Mr Moretti would be stumped if all he had to go on was this interface. In short, searching a database like WorldCat for a famous literary work is surprisingly unhelpful. You would have to do a lot of work to separate hits for the original work from adaptations, simplifications, translations and so on, unless you were prepared to rely on very imperfect metadata.

Later in the same study, the OCLC authors come tantalizingly close to revealing how they arrived at that result:

Employing a methodology developed by OCLC Research and utilized in several previous studies, the Irish presence was isolated within the published record represented by WorldCat.

Even this does not explain how they actually carried out the analysis, but I lost interest at that point and did not hunt any further through the “several previous studies”. I’m sure the authors’ criterion for declaring Gulliver’s Travels the most popular work is reasonable (it is based on counting how widely each work by an Irish author is held in the library collections represented in WorldCat), but reaching this result is not, I think, something that even skilled information professionals could replicate for themselves using only the WorldCat public interface.

In other words, what I believe this article is really saying is “Using our tools, we can identify many interesting things from the WorldCat collection”. But that’s not an option open to most of us, who don’t have a team of research scientists to analyze the data. If you want to find out for yourself, the public face of WorldCat is not where I would go. You could always try Wikipedia.