
Karin Wulf’s recently reposted Scholarly Kitchen article from 2015 is one of the most sensible pieces I have read about citations, written by someone who actually uses them (there are actually two articles – the other one is here). There is a large literature on citations, stretching back many years, but it seems to have had little effect on the use of citation counts to measure the importance of an article: if, in a scholarly article, you refer to another article, that reference is deemed to count as an endorsement of the cited source.

It’s not difficult to pick holes in this view, but, somewhat like government policy that is regularly condemned by economists, the criticism seems to make little difference; the practice continues regardless.

This post is simply to flag a few points about citations. Karin Wulf made some fundamental points in her two posts, and her readers added a few more. She identified several cases where citations are used without any kind of implied endorsement:

  • Citations may record primary sources, for which citation counts are not typically compiled.
  • Citations may be used as a reference to other work in progress, without any endorsement or critique implied.
  • Some citations become “the exemplar”: a discussion of the emergence of nations in the wake of democratic revolutions, for example, will reference Jürgen Habermas on the “public sphere” and Benedict Anderson on “imagined communities.” Elliot Green, in a 2016 post, complains that many references of this kind are to the concept implied by the title rather than to the full book – but I see no problem with this.
  • Not all citations are equal; they are used very differently in science and arts contexts. Older citations matter far more in HSS (the humanities and social sciences).

My gripe about citations is that citing has become such a fundamental part of pedagogical practice, drilled into school students (“Always cite your sources!”), that it can become unquestioning: as long as you supply a citation for a statement, however feeble the source, your argument is deemed valid. I wrote about some examples of this uncritical use of citations in Wikipedia here.

Within scholarly publications, by far the most common use of citations is simply contextual: educating the reader about the scholarly landscape rather than recording intellectual influence. Most citations indicate the current state of knowledge in an area.

Scite.ai has gone one better than simply counting citations. This software tool looks at citations in some detail, and is able to classify them into three groups (see the excellent article describing how scite works):

  1. Supporting – to scite’s credit, supporting citations must include some primary evidence of the support.
  2. Mentioning
  3. Contrasting

Not mentioned in the scite documentation, but very apparent in the actual results, is a fourth group, “unclassified”, where presumably the tool could not determine what kind of citation is involved. The tool uses machine learning, with a training set created by human experts, and human experts are brought in when the software delivers a result that conflicts with expectations. Unfortunately, the vast majority of citations turn out to be simple mentions. Here is a scite analysis of a scholarly book:

Example of a scite.ai entry for one article

All credit to scite for identifying where in each citing article the cite appears, but the classification itself is less impressive. The scite criteria for supporting or contrasting statements mean that, in this case, of the 1,800 or so citations, just 8 (roughly 0.4%) are classified as anything other than “mentioning”.
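To make the classification idea concrete, here is a minimal sketch of how citation statements could be labelled with a supervised text classifier. This is an illustration only, not scite’s actual pipeline (which is proprietary); the training sentences and labels below are invented.

```python
# Minimal sketch: classifying citation statements as supporting,
# mentioning, or contrasting. NOT scite's actual model; the tiny
# training set below is invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "Our results confirm the effect reported by Smith et al. [12].",
    "These findings replicate the earlier observations in [5].",
    "Pain management strategies have been widely studied [1-4].",
    "Opioids remain the mainstay of inpatient treatment [9].",
    "In contrast to previous reports [8], we observed no difference.",
    "Our data do not support the conclusions drawn in [13].",
]
train_labels = [
    "supporting", "supporting",
    "mentioning", "mentioning",
    "contrasting", "contrasting",
]

# Bag-of-words features plus a linear classifier: crude, but enough
# to show the shape of the problem.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_sentences, train_labels)

print(model.predict(["Unlike earlier studies [4], consumption was lower."]))
```

With only six training sentences the prediction is essentially arbitrary; the point is the structure of the task: labelled citation statements in, a supporting/mentioning/contrasting label out. A production system needs many thousands of expert-labelled examples, which is exactly where scite’s human annotators come in.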

What are these “mentions”? Here are some (typical, I think) examples from the introduction to a scholarly article about pain management:

Nowadays there is no place to debate the need of assertive treatment of acute postoperative pain [1–4]. We all agree on the importance of multimodal treatment of perioperative pain, since it has direct effects on the outcome of patient recovery [2–5]. Despite this knowledge, acute postoperative pain still is a significant problem that occurs between 60 to 80% of patients according the different series reported and it often remains undertreated [6–8]. Opioid analgesics are the mainstay of acute pain treatment in the inpatient setting [9]

These nine references appear to be fairly generic, as if the authors are giving examples and stating the current state of knowledge and practice: “this is what we know, and this is what we currently do”. The passage is a rhetorical act, saying in effect “trust us: we, the authors, have read the literature before making any new assertions”.

Following this introductory section, further references (at least in science papers) tend to be about methodology. It is in the discussion section that some disagreements can be found. From the same article:

In comparison to previous studies conducted with intravenous formulations of ibuprofen [8,9,13] the median morphine consumption was lower in both treatment arms …

Here, “in comparison to” signals a contrast with this paper – hence a contrasting citation. It is tempting to think that an analysis of the wording of articles might enable a more revealing examination of citations (a rough sketch of this idea appears after the table below). Would it not be fascinating, for example, to identify the fundamental papers in an area – those papers deemed by researchers to be essential knowledge for any subsequent study? A search on Google Scholar for “fundamental paper” reveals 15,200 hits, with examples such as:

Google Scholar results for “fundamental paper”

Each of the above references includes a scite.ai summary of the supporting, contrasting, and mentioning citations that other papers have made to the work. Even though scite.ai does not include any sentiment analysis, would it not be possible to collect phrases such as “fundamental”, “groundbreaking”, or “key” used with reference to papers or articles, and so identify the essential papers on any topic? That might be more revealing than the parlour game of looking at the most cited books and articles for a given period and trying to explain their order – there are examples from Nature News (2014) and Elliot Green (2016), as well as a whole series of posts by Eugene Garfield, who created the citation index.

Unfortunately for this simple idea, the most cited articles are often not research papers but papers about methodology, and in any case the number of citations, or even the number of citations per year (as recommended by Anne-Wil Harzing of Publish or Perish), doesn’t necessarily tally with the overall importance of the paper. Here are four examples of highly cited books and papers to illustrate the point:

Cites     Per year   Authors                  Title                                          Year
16,746    728        Page and Brin            The PageRank citation ranking                  1999
14,470    1,447      Doudna and Charpentier   A programmable dual-RNA-guided endonuclease    2012
1,731     14         Thomson, J. J.           Cathode Rays                                   1897
172,540   5,228      Sambrook et al.          Molecular Cloning (book)                       1989

Some highly cited articles and books
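As a quick aside on what “per year” means here: Harzing’s citations-per-year figure is simply the total count divided by the age of the publication. A tiny sketch, assuming a snapshot year of 2022 (my assumption – it reproduces the per-year figures in the table above):

```python
# Citations per year, as used in Harzing's Publish or Perish:
# total cites divided by the age of the publication.
SNAPSHOT_YEAR = 2022  # assumed year the counts were taken

papers = [
    ("Page and Brin, PageRank", 16_746, 1999),
    ("Doudna and Charpentier, CRISPR", 14_470, 2012),
    ("Thomson, Cathode Rays", 1_731, 1897),
    ("Sambrook et al., Molecular Cloning", 172_540, 1989),
]

for name, cites, year in papers:
    per_year = cites / (SNAPSHOT_YEAR - year)
    print(f"{name}: {per_year:,.0f} cites/year")
# Page and Brin, PageRank: 728 cites/year
# Doudna and Charpentier, CRISPR: 1,447 cites/year
# Thomson, Cathode Rays: 14 cites/year
# Sambrook et al., Molecular Cloning: 5,228 cites/year
```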

The Doudna and Charpentier paper is the famous CRISPR article that led to their Nobel Prize in 2020; Thomson’s paper on cathode rays reported the discovery of the electron. Is the Sambrook book on molecular cloning really a hundred times more important than Thomson’s article?
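Here, as promised, is a rough sketch of the phrase-collection idea: scan article text for evaluative cue words near bracketed citation markers. The cue list and the pattern are my assumptions, not a validated method, and a real system would need to handle many more citation styles.

```python
# A rough sketch of mining the language around citations: find
# evaluative cue words near bracketed citation markers. The cue list
# and pattern are illustrative assumptions, not a validated method.
import re

CUES = r"(fundamental|groundbreaking|seminal|key|landmark)"
# e.g. matches "the seminal work of Smith and Jones [7]"
PATTERN = re.compile(
    CUES + r"\b[^.\[\]]*?(\[\d+(?:\s*[,-]\s*\d+)*\])",
    re.IGNORECASE,
)

def find_praised_citations(text: str):
    """Return (cue word, citation marker) pairs found in the text."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]

sample = (
    "This builds on the fundamental paper of Page and Brin [1] "
    "and extends the groundbreaking CRISPR study [2]."
)
print(find_praised_citations(sample))
# [('fundamental', '[1]'), ('groundbreaking', '[2]')]
```

Aggregating such hits across a large full-text corpus, and counting which references attract the praise, would begin to approximate the “essential papers” list imagined above.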

What can we conclude about citations? My conclusion (and I don’t have a citation to validate this idea) is that I will keep my fingers crossed and hope I get cited by someone in the near future. More fundamentally, it would be nice if someone managed to extract more significance from the natural language around citations, to enable more meaningful assessments of the value of papers.