The report “A Bibliographic Scan of Digital Scholarly Communication Infrastructure” sounds inviting. Scholarly communications is a wide field, and things move fast in this sector. The report is provided free of charge, having been funded by the Mellon Foundation, so it sounded authoritative, and I turned to it with great interest.
I noticed right at the start the words “This bibliography covers a lot of ground”. The section on publishing is equally vague: “Publishing covers a lot of ground”. That suggests a lack of focus. I certainly hit problems on reading the report, problems both with the navigation and with the content. I have concentrated here on the navigation, because that is what makes the report all but unusable. But the reader should beware that the content has problems as well.
Who is the report for? The title states “infrastructure”, and the introduction states this is an attempt to capture “recent literature” across that infrastructure. While the report states it makes no attempt to be comprehensive, that raises the question of what it is trying to cover. Most of the report (around 100 pages of it) is a description of “projects or programs” arranged by function. Are these just digital projects, or just recent projects? There is no contents list of projects, and it is tiresome to scroll back and forth to see which projects are included. The projects range from Elsevier (included rather curiously as an organisation that runs projects) to tiny initiatives such as Enhanced Network Monographs.
The report overall is 154 pages, beginning with a short introduction to the main areas of the academic workflow, such as “repositories”, “researcher tools”, and “publishing”. Here is a problem immediately: publishing covers much of the entire scholarly communications workflow, yet this unstated definition seems to include Silverchair, a platform provider, and ScholarOne, the leading manuscript tracking software tool (while Editorial Manager, the other major tracking software, is not included). Neither Silverchair nor ScholarOne publishes anything. So we have an incomplete list with questionable categorisation. Other points I noted include:
- It is incomplete (it includes arXiv, but not bioRxiv, medRxiv, or ChemRxiv; it includes the platform provider Silverchair, but doesn’t include HighWire or Ingenta). The criteria for inclusion are not clear.
- Many of the tables have incomplete information. Several of the tables list “project name” and “organization”, but frequently an organization appears as a project name – Silverchair is an organisation, not a project; Ubiquity Press is a publisher, not a project; and so on. Frequently, the organization is missing.
- This report states that it is based on the Mind the Gap report, which I noted was inconsistent and incomplete.
Each section includes a bibliography. There are (I would guess) around 800 individual items listed in the bibliographies overall. But it is not a full bibliographical essay, and the inclusion criteria are not clear. Some of the items in the bibliographies carry comments, but most do not. Why were these bibliographic items included? More annotation and explanation would make the bibliographies far more useful for non-experts (who are presumably the intended audience for this report). As it is, by the time I had established which of these many titles I should read first, I would be an expert on the subject and would no longer need the bibliography.
There is indeed some descriptive text for each section, but these descriptions are often so short as to be useless. Here is the full text of the “Discovery” section:
Discovery tools include a mix of well-established large-scale commercial products, as well as a number of open and non-profit projects. Some of the latter have remarkable reach, considering the small size of the organizations that created and maintain them (for example, Unpaywall and the Open Access Button). Also of interest is the growth of discovery tools that incorporate artificial intelligence and machine learning (for example, Meta and IRIS.AI).
Is that all there is to say about discovery? What about (to take just one example) TrendMD? Other descriptions refer to topics as if they are already understood and require no further explanation. For example, section 1.4 is headed “Collective Action and the Funding of Open Projects”. But the entire text is based on the reader already understanding what is meant by collective action. In other words, this is a bibliography for those who already know.
Some of the descriptions seem reasonably watertight; for example, section 2.2 is a list of repositories. But the list includes both full repository software, such as DSpace, and tools for repositories, such as Permissions Checker. This list is a hotchpotch.
How digital is this survey?
You might expect that the word “digital” in the title “Digital Scholarly Communication Infrastructure” would mean the report is restricted to digital initiatives, but it includes many aspects of scholarly communication that do not strike me as primarily digital. For example, how is peer review part of the digital scholarly communications infrastructure? You could say it is neither digital nor part of the infrastructure. Yet there are over ten references to it in the “Publishing” section.
Does the layout help navigation?
Infographics can greatly aid understanding of a subject. In this respect, Bosman and Kramer’s infographic is still valuable as a summary of the scholarly communications workflow:
Although the specific tools described may have changed, the graphic conveys very clearly that this is a wheel: in order to publish, you have to discover, following which you disseminate what you have discovered, so the end becomes a beginning. In contrast, the graphic in this report is not very easy to comprehend:
How am I supposed to use this diagram? The map of the research workflow is not very clear. The graphic doesn’t tally with the rest of the report. For example, “math notation” is included as one aspect of publishing, but I can find no reference to “math notation” elsewhere in the report. But then I see two projects listed as “publishing – math”, which presumably are the ones referred to here. And is there really only one project relating to digital rights management? Are there 15 repositories or only three “providers” of repositories? What are they? There are 39 listed in the “repositories” table, which doesn’t correspond to either of those numbers.
This section is the most tantalising. Each project entry comprises a brief description, usually taken from the project website (indicated by the abbreviation “ws”), with the text enclosed in quotation marks, as if to excuse the author from having to define what the project is. In some cases, the first few words of the Wikipedia entry for the project are quoted verbatim (and incidentally, “wiki” is not a standard abbreviation for “Wikipedia”). Both Wikipedia and a project website may of course be little more than sales copy. In many cases, it is impossible to understand what the project is from the text provided:
- “Meta includes coverage of the biomedical sciences with real-time updates from PubMed” (yes, but what exactly is Meta?)
- “Open Humans is dedicated to empowering individuals and communities around their personal data” (very nice, but what is it?)
- “As a versatile, interoperable software solution, Pure can be configured to the growing requirements of your institution”
- Open Knowledge Maps is defined as: “Our goal is to revolutionize discovery of scientific knowledge.”
In other words, copying and pasting from a website produces a problem, not a solution, for the reader. Yet two editors, as well as a copy-editor and an author, are credited to this report.
Although the author states he has identified which of the 206 projects he describes are commercial and which are not for profit, you cannot see this (essential) information at the project entry itself. You can see it in the tables, but it is tedious to have to search the whole report for any table that mentions a project, and then go back to the project entry itself. Nor is it easy to see which of the projects are open source.
Use of statistics
Overall there is a lack of any kind of quantification. Are we describing small-scale initiatives or large projects? There is no indication of the scale of any project described. Worse, in the rare examples where statistics are used, the results are unhelpful. The “repositories” section gives no indication of how large or small any of the projects is. The text states “The majority of the repository projects represented below – 30 out of 39, or 76.9% – are non-profit in their orientation.” But simply listing repositories, most of which are tiny, and stating that the majority are not for profit is not meaningful. I would guess that most repositories are built using just two or three software tools, but that is exactly what I would hope a report such as this would tell me. Equally useful would be to state what proportion of repositories use commercial or non-commercial software.
All in all, I found this report very challenging to use. It is all the more disappointing given that the author received funding to compile it. It’s a shame because clearly there is some good information buried within – you just have to find it.