[Figure: Growth of arXiv subject areas by year (posts by subject area over time)]

Preprint servers, of which the best known is arXiv (most preprint servers attempt some fancy combination of lower- and upper-case letters in their names), are increasingly widely used and are attracting growing attention. They appear to fit the academic workflow very well, particularly in scientific publishing. It takes months for a journal article to go through the publishing cycle from submission to publication, so it is not surprising that researchers want their work to appear in public as fast as possible. By posting an article on a preprint server, a researcher can establish primacy: I got there first.

Another powerful reason is that early-career researchers can get their work into public view. Even if it is not yet formally published, a paper can be referred to and commented on. Finally, preprint servers make it possible for researchers to share articles and to get round the barriers that prevent their subscription content from being read. Frequently the preprint and the final article are identical. If you want to be convinced of the success of preprint servers, have a look at some of the statistics for arXiv, such as the growth in yearly submissions by subject area.

For all these reasons, preprint servers seem like a good thing. Hence it is surprising that this Scholarly Kitchen article (by Rob Johnson and Andrea Chiarelli) is so downbeat. Perhaps the report on which the article is based is more wide-ranging: the report has a subheading “The Transformative role of preprints”, but you wouldn’t think so from this article. Firstly, the article suggests that the only two successful preprint servers are arXiv and RePEc. I was intrigued by this claim: an independent website dedicated to preprints, and quoted by the authors, doesn’t even list RePEc as a preprint server, nor is RePEc mentioned in the Wikipedia article on preprints. Instead, RePEc is mainly a collection of links to published papers in economics, with only around 40,000 “working papers” (which I assume to be another term for preprints). In other words, RePEc is a community rather different from arXiv, which comprises nothing but preprints. It might be worth exploring how the two collections differ.

The unenthusiastic tone of this article suggests that preprint servers are only successful in maths and the biological sciences: “for authors in other fields, however, evidence of the benefits of preprint posting remains largely anecdotal”. The authors don’t mention that arXiv was founded as a physics collection, and that it also contains computing, statistics, and finance. In fact, these other subjects continue to grow, so that since 2017 physics has comprised less than 50% of the total content.

Yet rather than examining the growth of preprint servers, the article then focuses on how commercial publishers should respond to this situation: should they join them, beat them, or wait them out? Clearly, the preprint servers are “they”, while the readers of this article are “we”. Not a very co-operative tone. The authors also state that “most preprint servers have not in fact entered the publishing ‘market’ at all”. In other words, preprint servers don’t set out to make any money from the process. It’s as if your contribution does not count unless you make money from it.

And despite what the article states, preprint servers do indeed carry out some validation of articles. For example, articles claiming that a cure for cancer has been found are picked up and rejected before being posted to the new medRxiv server.

A more valuable, although more challenging, report might compare subjects in which there is a dominant preprint server, such as physics, with subjects where there is no large-scale preprint collection, such as chemistry. Is there a difference in the research process between the two disciplines? Another angle might be to examine the extent to which a preprint model such as ChemRxiv could succeed as an oligopoly of the major chemistry societies, or to look at attempts to create a commercial preprint service such as PubSURE.

Finally, it would be worth examining the process by which articles on preprint servers progress (or otherwise) to publication. Which articles are published, and how they are identified by peer-reviewed journals, might be an intriguing study. All in all, I feel this article (and report) is an opportunity missed.