A recent webinar about the O’Reilly Online Learning site, part of the excellent Haystack Conferences, was presented by Anthony Groves, O’Reilly Media’s Technical Lead of Search, and promised a fascinating insight into how discovery should be provided. Some might be more familiar with this site under its earlier title, Safari Books Online, and it was under this label that I reviewed the site back in 2013. As Groves pointed out with reference to discovery, this online learning site about IT is run by IT professionals for IT professionals. In other words, if O’Reilly don’t know how to implement search and discovery, who does?

O’Reilly Online Learning home page: this is now a learning site, not just a book collection

Anthony Groves’ presentation was remarkably candid. He began by stating the search interface had received very little attention for several years, and then he demonstrated very clearly what was wrong with the old site and why it needed changing. First, let’s look at the main display of results:

Search results

When a user searches for content, they see a list of hits, as you would expect, with star ratings for reviews. Along the top of the page are six headings, five of them representing facets: the display above shows how you can select only books (an impressive 46,000 of them) or videos, and so on.

Creating such a clean interface takes a lot of insight into user behaviour. The target user is very clearly displayed on the site home page (shown above): an IT professional who learns from their laptop or mobile device. Not surprisingly, then, Groves reports that among the most popular searches on the site are technical topics: “Python”, “Kubernetes”, “JavaScript”, and so on. Also popular are title and author queries, and what he called “mixed keyword queries”, comprising words from the title and the author together. What was surprising was that the relevance algorithm in use when he joined O’Reilly didn’t deliver the expected results. If a user searched for “Robert Martin”, for example, the first hits they were shown matched “Robert” and “Martin” separately, but, strangely, not the author “Robert Martin”. Similarly surprising was that a search for “javascript” didn’t put titles with “JavaScript” in them at the top of the results, but instead titles about specific types of JavaScript. The site uses Solr, and it is not difficult to configure Solr to rank hits in the title above hits in (say) the text, but equally it can be configured the other way round; it is hard to see why anyone would have wanted the site to display results the way it did. Satisfyingly, the new site now shows content matching both author names together before content matching a single author name.
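To make the Solr point concrete, here is a minimal sketch of the kind of request that ranks title and author matches above body-text matches and rewards the full phrase “Robert Martin” over the two words matched separately. It uses Solr’s standard eDisMax parameters (`defType`, `qf`, `pf`). The field names (`title`, `authors`, `text`) and the boost values are assumptions for illustration only; nothing here is taken from O’Reilly’s actual configuration.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query: str) -> dict:
    """Build Solr eDisMax request parameters for a user query.

    Field names and boost values are hypothetical, chosen only to
    illustrate the idea of field weighting and phrase boosting.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        # qf: which fields to search, and how much each one counts.
        # A match in the title or author field outweighs a match in the text.
        "qf": "title^10 authors^8 text^1",
        # pf: extra boost when all the query terms appear close together
        # as a phrase in a field, e.g. "Robert Martin" in the authors field.
        "pf": "authors^20 title^15",
        "rows": "10",
    }


params = build_edismax_params("Robert Martin")
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to a collection’s `/select` endpoint. Swapping the weights in `qf` is all it takes to rank text matches above title matches instead, which is why the old behaviour looks more like an oversight than a design decision.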

Moving beyond this, Groves revealed a very impressive site testing routine that measures the relevance of search hits by awarding each one between 0 and 4 points, a process called “judgement capturing”. There is even an internal version of the search display where relevance scores can be displayed. And, as you might expect, since this site is not about e-commerce (users will already have subscribed to enter the site), the success metric is not click-through to a specific point such as a shopping basket, but is based on content usage. This varies depending on the media: for books, he explained, the team checks how many pages the user reads; for a video, how long they spend on the page; for an online learning module, whether the user registers for the training; and so on. This is excellent practice.
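The presentation, as reported here, did not say how the 0 to 4 judgements are combined into a single score. One common choice for graded judgements is normalised discounted cumulative gain (nDCG), so the sketch below assumes that metric purely for illustration. The function names and the sample grades are hypothetical.

```python
import math


def dcg(grades):
    """Discounted cumulative gain for grades listed in ranked order.

    Higher grades earn exponentially more gain, and gain is discounted
    the further down the result list a hit appears.
    """
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))


def ndcg(grades):
    """DCG divided by the DCG of the best possible ordering of the same grades."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0


# Hypothetical judgements (0 = irrelevant, 4 = perfect) for the top hits
# of one query, before and after a relevance change.
old_ranking = [1, 0, 4, 2]
new_ranking = [4, 2, 1, 0]

print(round(ndcg(old_ranking), 3), round(ndcg(new_ranking), 3))
```

A score of 1.0 means the best-judged hits already appear in the best possible order, so comparing the score for the same query before and after a configuration change gives a quick indication of whether relevance improved.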

In another interesting comparison between the old and new sites, he revealed that the old site used to boost new content above popular content; today, the results are more balanced, with a mixture of new and popular content.
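As a rough illustration of what “more balanced” could mean in practice, the sketch below combines a recency signal and a popularity signal into one boost factor, so that neither automatically dominates the other. The formula, the weights, and the one-year half-life are all assumptions made for this example, not details from the presentation.

```python
def blended_boost(days_since_publication, popularity,
                  recency_weight=0.5, popularity_weight=0.5,
                  half_life_days=365.0):
    """Return a multiplier to apply to a hit's base relevance score.

    popularity is assumed to be normalised to the range 0..1.
    The recency signal halves every half_life_days, so it also stays in 0..1.
    """
    recency = 0.5 ** (days_since_publication / half_life_days)
    return 1.0 + recency_weight * recency + popularity_weight * popularity


# A brand-new title nobody has read yet, and a ten-year-old bestseller.
print(blended_boost(0, 0.0))
print(blended_boost(3650, 1.0))
```

With equal weights, the brand-new title and the old bestseller receive almost the same boost, so the base relevance score decides between them. Raising `recency_weight` well above `popularity_weight` would reproduce the old behaviour of pushing new content to the top.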

In conclusion, then, an impressive presentation, and a great insight into understanding user journeys for a specific site. The book Relevant Search was used not just as an example of a search, but was recommended more than once in the presentation – clearly, this is a book worth looking at.