How to find the right content

A recent Scholarly Kitchen topic asked the very pertinent question: how to build a sustainable research infrastructure. Actually the subject was less about sustainability than universality; while “sustainable” is a term full of positive associations (reproducibility, availability), I don’t think they are appropriate here. There is already a research infrastructure, but it is dreadfully fragmented. I wouldn’t call that unsustainable – some, at least, of the interested parties, notably the publishers, are quite happy with the way it works at present.

This Scholarly Kitchen discussion was more about providing better tools that meet the FAIR principles (findable, accessible, interoperable, reusable). The key question here is getting content out of silos and into collections that are bigger than those of a single publisher or a single institution. I’d call it a “universal research infrastructure” rather than a “sustainable” one, but no matter.

The initiatives described in the post are all excellent, but to some extent miss the point. Phill Jones and Liz Allen seemed closest to the problem, highlighting the current fragmented infrastructure with lots of non-communicating repositories. Gabi Mejias, who works with ORCID, naturally saw ORCID as the solution. Karin Wulf complained that science and arts solutions might need to work differently, which, although true, wasn’t very relevant here, except to remind us that whatever solution we come up with has to be valid across all subjects, and for books as well as articles.

Where would good solutions come from? Cross-industry initiatives such as Crossref and ORCID are fine as far as they go, although ORCID only solves one small part of the problem (disambiguating authors with the same name), and is still not sufficiently widely adopted to be universal, eight years after it was launched in 2012. But which other body can make universal solutions happen?

  • Cross-industry initiatives such as ORCID and hypothes.is only cover small aspects of the research landscape.  
  • Researchers are busy doing research and aiming to gain (or to keep) tenure, so I wouldn’t expect any major initiatives to come from researchers directly. The present system suits the incumbents fine.
  • Publishers exist to publish the content of their authors, not to make the entire research universe more accessible to researchers. They have no vested interest in a single universal repository.
  • Libraries are a good candidate to make something happen. They are interested in promoting the research environment, and in grouping widely disparate resources into a single index, but even libraries see the world from a single-institution perspective. Each library is busy indexing its own owned and licensed content, and although that is a task many institutions share, they have little incentive to share the benefits. Each institution has a very different and often non-overlapping perspective. Some of the largest and best-funded institutions spend much of their time just trying to get the one institution to work as a unit (Cambridge has more than 100 constituent libraries).
  • Aggregators are interested in creating a single universe of content: Web of Science and ScienceDirect, for example. But these are not full-text repositories, and those repositories that are full text are only partial collections.
  • Paradoxically, the country that has done most to create a unified research landscape is the United States. Yes, the home of free enterprise was responsible for creating PubMed, the biggest resource of biomedical content, excellently curated and maintained at Federal expense.

For the most part, the current academic research content repository resembles, in more ways than one, the Victorian railway system: there are multiple railway companies (the subscription publishers), multiple mutually incompatible systems (thousands of different citation styles required by various journals), and mutually incompatible formats for publications: XML (although that varies from publisher to publisher), LaTeX, PDF, and many others.

What would solve the problem? Open access is pushing academic content towards a single repository, although it is remarkable that even open-access publishers have no vested interest in creating a single repository of open-access content. Like any other publisher, they exist to represent (and to profit from) their authors creating papers, not to help users find content. A global repository of all the world’s full-text academic content has a lot to recommend it. I wonder if anyone has ever tried to build such a repository. It would have a lot of users.