
Todd Carpenter, in a Scholarly Kitchen article, praises Esther Dyson’s phrase “garbage deletion”, by which she means the process of removing items from a search and leaving only the good content. “Good” here means appropriate for our needs; you can call it selection or deletion, as you prefer. You select what you want and remove what isn’t relevant. Either way, it is a necessary process in a world where there is too much content to absorb.

But Carpenter also points out that generative AI is working in the opposite direction: instead of reducing the content we have to read, it is increasing it. There are various ways of dealing with the problem; Carpenter’s solution is to pay for selection and curation.

But the situation is worse even than Carpenter depicts. Generative AI is based entirely on reworking existing content, so it will not produce anything innovative (whatever some commentators and even AI researchers would have us believe). The difference is somewhat like that between primary and secondary sources: ideally, it is better to go to the primary source, but limitations of time mean that for the most part we depend on secondary sources. Here is the real challenge: what we face today is a barrage of commentators who, to a large extent, simply recycle ideas from elsewhere. Generative AI just gives you more of the same, at times with fancier phrasing.

Like many US commentators, Carpenter (and Esther Dyson before him) assumes that paid is somehow good: hence Dyson’s claim that good content is only the content you have paid for, or for which you have paid someone to select what is most relevant. A cursory read of the (paid) national press makes it clear that paying for content does not mean you will escape the same derivative opinions repeated endlessly. I am relieved that some of the best commentaries and guides to scholarly communication are (like much of Scholarly Kitchen itself) entirely free. That doesn’t remove the problem: we still need to assess quality, and this is something we do every day. It’s true that we can form an opinion of any article we read, given sufficient time – but we don’t have that luxury.

Now, if AI were to identify the genuinely innovative pieces, that would be transformative indeed. Carpenter claims in a subsequent post that AI is not the best tool to do this. His recommendation is that “the best way to assess quality or applicability is to read the content” – a very strange inference from a technologist. It might be the best way, but we lack the time to do it. When I presented AI tools for manuscript submission to journal editors, they often stated they could achieve better results, and of course they could – given sufficient time. Better to switch to tools that reduce the task for humans, such as identifying a shortlist of, say, ten articles rather than a hundred.
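To make the shortlisting idea concrete, here is a minimal sketch in Python. TF-IDF is used purely as an illustration – it stands in for whatever relevance model a real tool would use – and the query, article texts and the function name are placeholders, not a reference to any particular product.

```python
# A sketch of shortlisting: rank a pool of candidate articles against a
# reviewer's query and keep only the top few for a human to read.
# TF-IDF here is illustrative; any relevance model could take its place.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def shortlist(query: str, articles: list[str], k: int = 10) -> list[int]:
    """Return the indices of the k articles most relevant to the query."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(articles)       # one row per article
    query_vec = vectorizer.transform([query])             # same vocabulary
    scores = cosine_similarity(query_vec, doc_matrix)[0]  # one score per article
    ranked = scores.argsort()[::-1]                       # highest score first
    return ranked[:k].tolist()

# e.g. reduce a hundred candidate papers to ten for an editor to read:
# top_ten = shortlist("peer review of AI-assisted submissions", papers, k=10)
```

The human still reads and judges; the tool merely decides which ten of the hundred are worth that scarce reading time.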

The history of the last 20 years of resource discovery in academic publishing has been the steady and increasing use of tools – relevance ranking, then machine learning, and the like – to reduce the manual burden of finding relevant content. Academics, for the most part, don’t complain, because these tools clearly work. So why the assumption that the latest AI tools cannot have any validity? To me, one of the most obvious connections to be made is to join the latest AI techniques with what we would today describe as traditional searching, using Solr or Elasticsearch to refine and improve our discovery of content. The recent Search Solutions Conference in London gave plenty of examples of how it can be done – for example, the use of RAG (retrieval-augmented generation), which unites search with generative AI tools. Discovery is an ongoing and steadily growing problem; we have to find ways of helping the human brain work more effectively.
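As a rough illustration of how the two fit together, here is a minimal sketch of the RAG pattern layered over a conventional search index. The Elasticsearch URL, the index and field names, and the generate() stub are assumptions standing in for whatever search cluster and generative model you actually run.

```python
# A sketch of retrieval-augmented generation over an existing search index:
# a conventional full-text query selects a handful of passages, and only
# those passages are handed to a generative model as context.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local search cluster

def generate(prompt: str) -> str:
    # Placeholder for whichever generative model you use (a hosted API,
    # a local model, etc.); RAG only requires that the prompt carry the
    # retrieved passages.
    raise NotImplementedError("plug in your generative model here")

def answer_with_rag(question: str, k: int = 5) -> str:
    # 1. Traditional retrieval: a plain match query against the index.
    hits = es.search(
        index="articles",                     # placeholder index name
        query={"match": {"body": question}},  # placeholder field name
        size=k,
    )["hits"]["hits"]
    passages = [hit["_source"]["body"] for hit in hits]

    # 2. Ground the generative step in the retrieved passages only.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the passages below.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generation, constrained by what search has already selected.
    return generate(prompt)
```

The design point is that the generative model only ever sees what traditional retrieval has already selected – which is, in effect, the garbage deletion Dyson was asking for.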