Reading Time: 4 minutes
Photo by Hansjörg Keller on Unsplash

We can all agree that a lot of research, perhaps most research, starts with asking questions. Does smoking cause cancer? What are the causes of the French Revolution? What causes inflation? The next step is for the researcher to review the literature. Has anyone asked (and ideally answered) this question already? Which paper should I read first?

Aaron Tay, as usual, has examined the question in detail and comes up with some solutions I didn’t even know existed. Essentially, his aim is to see whether there are any machine-based tools we can use to make the process of finding key works easier.

How can we assess the process? We can ask researchers themselves, but frequently, the researchers are not aware of the tools available. It’s difficult for a researcher to provide a considered evaluation of a tool they only discovered a few minutes ago, and certainly in the humanities, awareness of AI-based tools for scholarly discovery is not high.

Tay identifies four main strategies (although these are in some cases subdivided):

  1. Searching using phrases such as “seminal works”. This is a method I noticed some time ago, and it appears often in reviews of books (what makes a book significant? Just look for the word “famous” or “significant”). Tay provides suggested Boolean searches for several semantically similar terms, for example “seminal”, “fundamental”, and “pivotal” (a minimal sketch of such a search appears after this list). However, I notice that book reviews, which are by their nature open, tend to be a bit more effusive in their praise than articles: it is not the done thing to call another article “pathbreaking” or “seminal”. Nonetheless, there are almost 58,000 uses of the phrase “seminal paper” on CORE.
  2. Asking a question of discovery services, e.g. “What are the seminal papers on the French Revolution?”. Asking questions is now possible in Google, but not, it seems, in Google Scholar. It is, of course, possible in ChatGPT, but all of these methods rely for their success on the available corpus: certainly full text rather than abstracts or (worse) keywords, which rules out Web of Science and Scopus.
  3. Using a tool that counts the citations that come from papers’ reference lists, for example Connected Papers (a simple version of this counting is sketched below).
  4. Using Reference Publication Year Spectroscopy (RPYS for short). I found the description of this method bewilderingly complicated. It seems to be based on an analysis of cited references, identifying the years in which key papers were published (a toy illustration of the idea follows this list). It doesn’t look to me as though this method is at all user-friendly, nor will it find seminal works quickly, which was Tay’s initial criterion. Even the most simplified version of this methodology, Biblioshiny, described as “the shiny app for no coders”, requires installing the statistical software package R. This doesn’t look like a methodology I could recommend to researchers.
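
To make strategy (1) concrete, here is a minimal sketch of how one might assemble a Boolean phrase search of the kind Tay suggests and send it to a full-text index such as CORE. The endpoint, parameters, and field names are my assumptions rather than anything taken from Tay’s post, so treat this as an illustration and check CORE’s API documentation before relying on it.

```python
import requests

# Terms Tay lists as markers of importance; combine them into one Boolean query.
MARKER_TERMS = ["seminal", "fundamental", "pivotal"]
TOPIC = '"French Revolution"'

# e.g. ("seminal paper" OR "fundamental paper" OR "pivotal paper") AND "French Revolution"
marker_clause = " OR ".join(f'"{term} paper"' for term in MARKER_TERMS)
query = f"({marker_clause}) AND {TOPIC}"

# Assumed CORE v3 search endpoint and bearer-token auth; adjust to the real API docs.
response = requests.get(
    "https://api.core.ac.uk/v3/search/works",
    params={"q": query, "limit": 10},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=30,
)
for hit in response.json().get("results", []):
    print(hit.get("title"))
```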
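Strategy (3) boils down to a simple idea: take a small seed set of papers on your topic and count which earlier works their reference lists have in common. This is not Connected Papers’ actual algorithm (which also uses similarity measures), just a minimal sketch of the counting step, with made-up reference lists.

```python
from collections import Counter

# Made-up seed papers on a topic, each with its (invented) list of cited works.
seed_references = {
    "Paper A": ["Work X 1978", "Work Y 1962", "Work Z 1856"],
    "Paper B": ["Work X 1978", "Work Z 1856", "Work W 1989"],
    "Paper C": ["Work Z 1856", "Work X 1978", "Work V 1939"],
}

# Count how many seed papers cite each work; the works cited by most of the
# seed set are the obvious candidates for "seminal" status.
counts = Counter(ref for refs in seed_references.values() for ref in refs)
for work, n in counts.most_common(5):
    print(f"{work}: cited by {n} of {len(seed_references)} seed papers")
```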
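Strategy (4), RPYS, is easier to grasp from a toy example than from the description. The core move is: collect the publication years of all the references cited by papers in a field, count them per year, and look for years that stand out against their neighbours; a pronounced spike usually points at one heavily cited (often seminal) work published in that year. The sketch below assumes you already have a flat list of cited-reference years; Biblioshiny and R do the same thing with much more machinery.

```python
from collections import Counter
from statistics import median

# Made-up list of publication years harvested from the reference lists of papers in a field.
cited_years = [1856, 1939, 1962, 1978, 1978, 1978, 1978, 1989, 1995, 1995, 2001, 2010]

counts = Counter(cited_years)

# RPYS: compare each year's count with the median of a five-year window around it;
# large positive deviations mark candidate "seminal" years worth inspecting by hand.
for year in range(min(cited_years), max(cited_years) + 1):
    window = [counts.get(y, 0) for y in range(year - 2, year + 3)]
    deviation = counts.get(year, 0) - median(window)
    if deviation > 0:
        print(f"{year}: {counts.get(year, 0)} citations (deviation {deviation:+.1f})")
```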

Other factors

In any case, this is not just a retrieval question. I can think of several other factors that influence the identification of seminal works.

First, your strategy will depend on the question you ask. Is your question broad or narrow? None of the methods above will work for broad questions such as “Find me the key works on the French Revolution”.

Second, some of the tools above, notably method (3), combine citations with content. But, as we all know, citations are not a perfect guide: there is no guarantee that a highly cited paper is the most significant.

The task of finding significant content is a very big one; there are several other ways of identifying relevant content (and no doubt talking with a few researchers would reveal several more). Google provides links to a number of books on a topic if you search for that topic:

Suggested books when you search for a topic on Google

But we don’t know anything about the rationale Google uses to identify these books. More interesting is Five Books, a site where experts recommend five books on a topic and talk about their choice. This explanation I find remarkably helpful, in many ways more valuable than a scholarly bibliography, as it enables me to identify something about the background to the books. For example, Francois Furet’s Interpreting the French Revolution was written as a polemic addressed to people who already knew something about the subject. Hence it is probably not the first book you would read on the subject.

What next?

The description of tools to find seminal works suggests to me that there is no clear recommendation that could currently be made for researchers. At this point, it would be a very interesting exercise to involve the researchers themselves, for two reasons. Firstly, the academics will be able to provide actual statements of research questions; and secondly, they will be able to provide the all-important human evaluation of the success or otherwise of these methods. So, in conclusion, a question worth asking, but one that needs the involvement of AI developers and human researchers together.