When you use an academic library catalogue, what are you actually searching?

The graphic shows the main search screen of Cambridge University Library, called iDiscover (actually powered by Primo). As you might expect, the user is presented with a simple search box and three options for where to search:

  • Cambridge Libraries Collections
  • Articles and online resources
  • Search everything

The default is the first. So I tried a search for “Hamlet grave”, which gives 7.5 million hits on Google. In the entire Cambridge Libraries Collections I found just 16 hits, which seems a bit unlikely!

As far as I can see, this default option searches only the title, author, and description fields. The description field for an individual catalogue entry contains just a few words, for example:

You can see the words “Hamlet” and “grave” in the description field. Clearly, this is not a full-text search.
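To make the difference concrete, here is a toy sketch of why a search restricted to catalogue metadata finds so much less than one that also looks at full text. The records, field names and the crude substring match are all hypothetical assumptions for illustration; this is not how iDiscover or Primo actually indexes its content.

```python
# Toy illustration only: records, field names and matching rule are hypothetical
# and do not reflect how iDiscover/Primo actually indexes content.
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    author: str
    description: str
    full_text: str = ""  # empty when no digitised text is available


def matches(record: Record, query: str, fields: list[str]) -> bool:
    """True if every query term occurs somewhere in the chosen fields.

    Crude substring match: no tokenisation, stemming or ranking.
    """
    haystack = " ".join(getattr(record, f) for f in fields).lower()
    return all(term in haystack for term in query.lower().split())


METADATA_FIELDS = ["title", "author", "description"]  # the "catalogue" view
ALL_FIELDS = METADATA_FIELDS + ["full_text"]          # the "everything" view

records = [
    Record("Hamlet", "Shakespeare, William", "Includes the grave-diggers' scene"),
    Record("Essays on tragedy", "Example, A.", "Lectures on four plays",
           full_text="... Hamlet leaps into Ophelia's grave ..."),
]

catalogue_hits = [r.title for r in records
                  if matches(r, "Hamlet grave", METADATA_FIELDS)]
everything_hits = [r.title for r in records
                   if matches(r, "Hamlet grave", ALL_FIELDS)]
```

Here `catalogue_hits` contains only the first record, because the second mentions Hamlet's grave only in its (hypothetical) full text; `everything_hits` contains both.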

When I tried the search again using “search everything”, I got over 45,000 hits, a more likely number. Now the system tells me that it has some full-text online resources. I assume that what I am seeing here is the library supplier’s full-text index of the resources that have been licensed to the library.

Finally, this “search everything” option is not without its quirks. I would expect a search for the terms “A B” to list results by relevance in the following order:

  1. Both terms in sequence in the title
  2. Both terms in the title but not in sequence
  3. One term in the title
  4. No terms in the title, but the terms appear in the text (and so on)

Instead, we have a rather strange order. At position 9 in the results, there is a book with none of the terms in its title. At position 10, the results return to the order listed above. I don’t know what the relevance criteria are here.
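The ordering I expected can be written down as a simple tiered sort. The following sketch models only my assumption, not Primo's actual relevance algorithm (which I don't know); the `tier` and `rank` helpers, the tier numbers and the whitespace-only tokenisation are all hypothetical.

```python
# Sketch of the *expected* ranking, not Primo's real relevance algorithm.
# Tiers (lower is better):
#   0: all terms appear consecutively, in order, in the title
#   1: all terms appear in the title, but not in sequence
#   2: at least one term appears in the title
#   3: no term in the title, but at least one appears in the text
#   4: no match at all


def tier(title: str, text: str, terms: list[str]) -> int:
    # Hypothetical tokenisation: lower-case and split on whitespace only.
    title_words = title.lower().split()
    text_words = set(text.lower().split())
    terms_lc = [t.lower() for t in terms]
    n = len(terms_lc)
    # Tier 0: the terms form a contiguous run of title words.
    if any(title_words[i:i + n] == terms_lc
           for i in range(len(title_words) - n + 1)):
        return 0
    in_title = [t in title_words for t in terms_lc]
    if all(in_title):
        return 1
    if any(in_title):
        return 2
    if any(t in text_words for t in terms_lc):
        return 3
    return 4


def rank(docs: list[dict[str, str]], terms: list[str]) -> list[dict[str, str]]:
    # sorted() is stable, so documents in the same tier keep their input order.
    return sorted(docs, key=lambda d: tier(d["title"], d["text"], terms))


docs = [
    {"title": "Notes on B", "text": "a b"},
    {"title": "C D", "text": "a"},
    {"title": "B and A", "text": ""},
    {"title": "A B", "text": ""},
]
ranked_titles = [d["title"] for d in rank(docs, ["A", "B"])]
```

Under this model, `ranked_titles` comes out as `["A B", "B and A", "Notes on B", "C D"]`, so a title with no matching terms could never sit above one that contains them, which is exactly what the result at position 9 breaks.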

In conclusion, I have two questions:

  1. Why is the default search that of the traditional library catalogue? When I see these results, they are essentially a digital version of the index cards that used to be the only guide to a library’s collection. Why do libraries not offer the biggest possible index by default? Would we be happy if Google omitted some of the content to which it has access?
  2. Why does the library not attempt to digitise the full text of its own collection? Why do libraries leave digitisation to third-party products such as ProQuest or EBSCO?

Perhaps one fundamental reason (as Helle Lauridsen suggested) is ownership. Traditionally, libraries bought or acquired a print text and owned it. Today, much, perhaps most, of their collection is licensed. Paradoxically, the content they own is typically not digitised in full text, yet the content they license is. How that digitisation is to be achieved is a subject for another post, but we can at least agree that the dream scenario for researchers, students and libraries alike is discovery based on the full text of all the library’s content, licensed or owned.

Thanks to Helle Lauridsen, who gave some very patient answers to my questions about libraries for this post.