
It was just like old times: for the first time in three years we were talking face to face, rather than feeling locked into the anaesthetized environment of a conference call. At the Search Solutions 2022 Conference, we were able to ask formal and informal questions, and to share our thoughts about the presentations – which is how I heard one participant state it was the best conference he had attended this year.

Although the attendance was lower than normal, the participants enjoyed a remarkable range and quality of presentations, from Google and Spotify to a public library, all in a single day.

Interface and UX design are certainly vital components of search, but they are also among its most commercially sensitive areas. So perhaps it was not surprising that the first words of the UX presentation (from LexisNexis) were “I know some of my competitors are here”, or that every slide was marked “confidential”, something rather paradoxical in a public presentation. Nonetheless, it is a unique feature of Search Solutions, and indeed of the group that runs it, that all aspects of search and discovery are examined with a kind of equal and disinterested focus: how does the search work? Do the trials represent real usage? How can this be tested? Hence the questions on how the A/B trial described in this presentation was actually set up, and whether the trial was valid.

Amy Walduck, of the State Library of Queensland, is a champion for public libraries – she pointed out there were more public libraries than branches of McDonald's in Queensland. She presented a fascinating not-for-profit project, visualizing query data from the library catalogue system. The images looked great, but it was difficult to know what to do with the results – the data alone already revealed the insights, without the visualisations. Nonetheless, her talk revealed, among other things, how many people keyed in full-sentence questions to a library catalogue system. Although Amy stated confidently “a librarian would never key the full terms in, they would just add the keywords”, present-day Google Search, with its natural-language queries, shows that the world of search is changing, and these catalogue queries are an example.

Brammert Ottens of Spotify pointed out that Spotify is a destination site rather than a portal, which makes it easier to understand user behaviour and preferences. It’s difficult using Google or an academic search engine to be clear about how satisfied the user is, because you don’t often know what they do after searching. But with Spotify you can track their usage. He also revealed how different it is to provide a search environment for podcasts, which are searched for and played very differently to music tracks. “Success” for a music query is very different to success for a podcast query.

Mohamed Yahya of Bloomberg described another very slick search operation. It was a revelation to me to see the Bloomberg terminal in operation, able to answer natural-language questions such as “Which German bonds will mature in the next ten years?”, in other words, making use of transformers. For me, at least, the Bloomberg environment, for all its sophisticated search and retrieval tools, produced rather questionable results, in light of the stated goal of providing results that were high precision and “explainable”. The Bloomberg service scans thousands of news stories every day, and Bloomberg employs some 3,000 human editors rewriting and shaping stories for publication on the site as Bloomberg news. This means that when you ask the system a question such as “what is the origin of the drones used by Russia in Ukraine?”, the first (and by implication the best) answer we were shown was simply credited as “Bloomberg News”. Although I have the greatest regard for the Bloomberg editors, my attitude to news is based not just on the content, but also on the source: authority is vital. If I read something in the Financial Times, I treat it differently to the Daily Mail. The Bloomberg system requires trust: the user has to trust the authority of the Bloomberg editors. In other words, the channel and the product have become linked. Recent complaints about Amazon mixing an e-commerce platform with a collection of products have raised similar issues there.

Filip Radlinski of Google won the prize for the best presentation at SS22. He made the very valid point that recommendation and search are becoming increasingly indistinguishable. His description of “soft” attributes, used by people when making value judgements, such as “this film was less dark than that one”, was fascinating, as well as revealing the words people actually use when describing their responses to movies (“posh”, “chilled”, “boring”). His examples of natural language, and how they make sense to humans, while potentially being challenging to machines, were wonderful:

Q: Do you want lunch?

A: I had a big breakfast.

Given the challenge of trying to interpret this kind of language by machine, I couldn’t help feeling that the Spotify model of recommendations is the most successful machine-based model in operation today for many purposes. There is no need for descriptions: just listen to some tracks, and the system will find more like the ones you like.

Farhad Shokraneh gave an impassioned talk that revealed some of the problems with systematic reviews. He pointed out that doctors make around one million clinical decisions every day in the NHS, yet there are only a few hundred systematic reviews available: in other words, the system does not scale, and could never scale, using current methods. If anyone wanted justification for developing new tools, his talk gave it. The mania for recall over precision means that money is wasted trying to make a systematic review “complete”, when it can never be complete, and may not need to be for the purpose of recommending best treatment: words that create panic in many health information professionals, for whom “complete” becomes a mantra. Under the slogan “goodbye search, hello answer” he outlined a future for open science, and a complete rethink of current (Boolean-search based) processes.
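
To make the trade-off between precision and recall concrete, here is a minimal sketch in Python. The numbers are invented for illustration and are not taken from the talk: assume 20 relevant studies exist, and compare a narrow search with a broad one that chases completeness.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids: 0-19 are the relevant studies.
relevant = set(range(20))

# Narrow search: 20 results, 15 of them relevant.
narrow = set(range(15)) | set(range(100, 105))

# Broad search: 1,000 results, including all 20 relevant studies.
broad = set(range(20)) | set(range(100, 1080))

print(precision_recall(narrow, relevant))  # (0.75, 0.75)
print(precision_recall(broad, relevant))   # (0.02, 1.0)
```

In this toy case the broad search finds every relevant study, but someone has to screen 1,000 records to get there, which is where the cost of chasing completeness shows up.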

Gavin Moore, from an NHS hospital trust, revealed the sad lack of funding in the UK health system that leaves patient and practice information not findable and therefore wasted. The attempt to save money by closing the NICE Evidence search system in March 2022 has removed a key aid for all health professionals. Gavin’s work to replicate that tool within one trust was wonderful to hear, but why was the nationwide repository closed down if it was so useful?

Two presentations relating to enterprise search gave clear exemplars of what can be achieved using BERT, a transformer-based language model applied here to search. This meant that, for example, the European Union Publications Office can have its vast collection of content searchable using natural-language questions: you can search for specific legislation and see if it is still in force – something that could not be done by string-based search tools in one step.
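
The difference between string matching and vector matching can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the three document titles are invented, and the three-dimensional vectors are assigned by hand, whereas a real system would obtain embeddings from a model such as BERT. It is not the Publications Office implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical documents with hand-made vectors (not real embeddings).
docs = {
    "Regulation repealed in 2019": [0.9, 0.1, 0.0],
    "Directive currently applicable": [0.1, 0.9, 0.1],
    "Annual budget report": [0.0, 0.1, 0.9],
}

query = "Is this law still in force?"
query_vec = [0.2, 0.8, 0.1]  # hand-made vector standing in for the query

# String-based search: look for the literal phrase in each title.
keyword_hits = [title for title in docs if "in force" in title.lower()]

# Vector search: rank titles by similarity to the query vector.
best = max(docs, key=lambda title: cosine(query_vec, docs[title]))

print(keyword_hits)  # []
print(best)          # Directive currently applicable
```

The literal phrase appears in none of the titles, so the string search returns nothing, while the vector comparison still picks the title whose meaning is closest, given the assumed vectors.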

The day ended with some quick ad hoc sessions (Cedric Ulmer of France Labs reminded us that “open” as stated by the FAIR principles includes the freedom to understand the code), and, finally, presentations of the 2022 search awards. These were potentially fascinating, but we had to take the presenters’ word for it, because there was no time to show examples of what the winners had achieved.

It is perhaps impossible to summarize such a wide-ranging event, but one participant pointed out that this conference demonstrated overall how the prediction made at an earlier conference, that vector search would be a game-changer, had finally come true: ours is a vector search age. What should we predict for next time? Son of BERT, perhaps?