
Anyone interested in search and discovery is spoilt for choice at present: first the excellent Haystack conference on open-source search, and now the wonderful BCS Information Retrieval Specialist Group conference.
Packed into one day was not only a fascinating range of academic and industry presentations, but the announcement of no fewer than five prizes and, to round things off, a panel session where experts had three minutes to identify future trends in search.
The day was too rich to cover fully, so I’ll just mention the highlights.
If you thought there was nothing left to say about information retrieval, Olivia Fould's presentation suggested enough ideas to keep researchers happy for several years. She looked at the concept of "clutter": all the items on a page that get in the way of information retrieval, such as advertising or simply the presence of other content. It was hardly surprising that reading a page takes longer, and key ideas are harder to capture, when the page is full of clutter. Although some of the experiments she described were unnaturally simplistic, such as identifying "message" words when distractors were presented nearby, she soon moved on to search tasks carried out with and without clutter. She then described the challenge general practitioners face compared with a hundred years ago: many more diseases to diagnose, and over 24,000 different prescriptions available. While this is an increase in complexity, I didn't feel it was an example of clutter; no matter, by the end of the presentation I felt we were all ready to examine clutter in many different contexts. Specifically, I'd like to set up an experiment to assess the intelligibility of different texts, not because of layout or advertising, but simply because some writers write more clearly than others. Quite rightly, Fould's presentation was awarded the prize for best of the day.
David Corney, data scientist at Full Fact, the fact-checking charity, described some of the tools it uses to try to identify misleading claims. A look at Full Fact's funding reveals it is paid considerable sums by Facebook. This does not invalidate the work (all funding is clearly stated on the website), but it does, I think, limit the tasks the charity carries out. I can't help feeling that stating false facts is only one tool in the armoury of information manipulation. Specifically, a true claim may simply serve to draw attention away from the main point. For example, a widely quoted statistic about HS2 is that it will not reduce journey times for travellers on conventional lines. This may be true, but HS2 will dramatically increase capacity on those lines, and so improve reliability.
Nonetheless, it was fascinating to see how AI and other tools can be used to try to match claims, as well as some indication of the limits of such software. One example given was that if, say, French president Macron and UK prime minister Johnson both make the same claim about, say, unemployment, AI tools such as Sentence-BERT place these statements together, even though they concern different countries. Hence Full Fact has introduced entity recognition, as well as crowd-sourcing to develop training sets of statements perceived as matching or non-matching. Perhaps unsurprisingly, they found that the combined approach is better than any individual approach. Their precision has increased, but it remains very low. Even so, expect much more of this kind of automated tool in the future, as governments attempt to impose more control on what is being stated.
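To give a flavour of the matching step, here is a minimal sketch using the open-source sentence-transformers library; the model name and example claims are my own illustrations, not Full Fact's actual pipeline.

```python
# A minimal sketch of claim matching with sentence embeddings.
# The model name and claims below are illustrative assumptions,
# not Full Fact's production setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

known_claims = [
    "Unemployment has fallen to its lowest level in a decade.",
    "The new railway will not cut journey times on existing lines.",
    "Crime rates have doubled over the past five years.",
]
new_claim = "Joblessness is now at a ten-year low."

# Embed the statements and rank known claims by cosine similarity.
claim_vectors = model.encode(known_claims, convert_to_tensor=True)
query_vector = model.encode(new_claim, convert_to_tensor=True)
scores = util.cos_sim(query_vector, claim_vectors)[0]

for claim, score in sorted(zip(known_claims, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.2f}  {claim}")
```

Note that embeddings alone would happily match a claim about UK unemployment to one about French unemployment, which is exactly why a second signal such as entity recognition is needed.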
Steve Sale of AstraZeneca revealed how enterprise search is increasingly customised behind the scenes to identify the domain of a query: is this a product query, or a legal question? Do you want the science of a drug, or its manufacturing details? If the domain or context can be identified, the search tools can be adjusted to suit. You feel that if only Amazon understood this, I would not be given recommendations related to my grandmother's Christmas present (which I was looking for last week) when what I now want is books about search systems.
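As an illustration of the routing idea (my own toy sketch, with invented domains and settings, not AstraZeneca's implementation), a query could be classified first and the search configuration adjusted to match:

```python
# A toy sketch of query-domain routing for enterprise search; the domains,
# keywords, and configuration values are hypothetical illustrations.
DOMAIN_KEYWORDS = {
    "legal": {"patent", "litigation", "compliance", "contract"},
    "science": {"mechanism", "pharmacology", "trial", "efficacy"},
    "manufacturing": {"batch", "yield", "plant", "packaging"},
}

def classify_query(query: str) -> str:
    """Pick the domain whose keyword set best overlaps the query terms."""
    terms = set(query.lower().split())
    scores = {d: len(kw & terms) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

def search_config(domain: str) -> dict:
    """Adjust field boosts and target collection to suit the detected domain."""
    configs = {
        "legal": {"boost_fields": ["title", "jurisdiction"], "collection": "legal_docs"},
        "science": {"boost_fields": ["abstract", "compound"], "collection": "publications"},
        "manufacturing": {"boost_fields": ["site", "process"], "collection": "ops_docs"},
        "general": {"boost_fields": ["title", "body"], "collection": "all"},
    }
    return configs[domain]

domain = classify_query("efficacy trial results for compound X")
print(domain, search_config(domain))  # -> science {...}
```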
All credit to Martin White for stepping in at one day's notice to substitute for another speaker. Although he claimed his talk was off the cuff, it drew, as always, on a lifetime of experience of providing enterprise search. He disposed of several myths around AI, but at the same time revealed the dreadful scores users give when asked about enterprise search: only a minority of users are happy with it.
Rene Spijker of Cochrane presented a good overview of systematic reviews and how they might be improved. Tantalisingly, right at the end of his talk, he mentioned an Australian case study that completed a systematic review in two weeks, down from the six months or more it usually takes. The success came from several initiatives, including the use of ML tools. Given such a dramatic improvement, it would make sense, I guess, to try to apply its lessons across the board. It certainly sounds a more encouraging direction than the current Cochrane attempts to use crowd-sourcing.
Tim Gollins gave one of the most fascinating presentations, describing the National Records of Scotland (NRS) collection. However, fascinating as the talk was, it didn't appear to reveal the use of the many and varied information retrieval tools described elsewhere in the conference. I looked at the NRS website, and the search interface seemed rudimentary. Despite Tim Gollins' disparagement of technology as a solution (with messages such as "digitisation is not the answer"), I believe some simple improvements to the search interface could improve things for any user of NRS.
A greater contrast with the National Records of Scotland than Theresa Regli's description of digital asset management systems (DAMs) could not be imagined. We all sat up when she revealed the typical budget for a DAM project. Her fascinating talk demonstrated, for me at least, why DAMs seem to be so ubiquitous: they are based around a dashboard that enables a product manager or marketing manager to manage a specific product or range effectively. It was, as she described it, "an engine for customer experience", and she described the DAM contents as "a sequence of saved searches", such as ingredients, key selling points, SKU numbers, and so on.
Perhaps the most exciting part of the day came right at the end, when a panel of experts was asked to identify the key trends for the coming year. There were repeated references to two ideas:
- Vector search, so that search engines match the semantic qualities of items rather than just their words. Vespa was mentioned more than once as a pioneer in this area (a minimal sketch of the idea follows this list).
- Feedback loops. These are already common in recommender systems, which suggest what you might be interested in based on your search history.
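To make the vector search idea concrete, here is a toy brute-force sketch; production engines such as Vespa use approximate nearest-neighbour indexes over embeddings produced by trained models, and the vectors below are invented purely for illustration.

```python
# A toy brute-force vector search. Real engines index millions of vectors
# with approximate nearest-neighbour structures; these hand-made 3-d vectors
# merely illustrate ranking by similarity of "qualities" rather than words.
import numpy as np

documents = [
    "budget hotel in Glasgow",
    "cheap Glasgow accommodation",
    "search engine internals",
]
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
])
query_vector = np.array([0.85, 0.15, 0.05])  # e.g. "affordable stay in Glasgow"

# Rank documents by cosine similarity to the query vector.
norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
scores = doc_vectors @ query_vector / norms
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.2f}  {documents[idx]}")
```

Note how the two Glasgow documents rank together despite sharing no keywords with the query; that is the appeal over plain word matching.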
Let’s see if they both become big in 2022. I’ll certainly be at next year’s conference to find out more.