Subject coverage of some of the collections evaluated. “GSC”, Google Scholar, has the largest collection, fairly evenly spread across subjects. Others, such as PubMed (PMD), are heavily focused on medical subjects.

Choosing where to search is a central part of the research process. Every academic needs to know where to search. You would think such a subject would be at the top of every academic’s reading list, yet here is an article buried in a research journal, unlikely to be noticed by a researcher seeking practical assistance. Michael Gusenbauer, of the University of Innsbruck, Austria, compares 56 academic content databases, using “absolute” and “relative” subject coverage. The research is excellent, but what is needed by researchers is a simple summary to turn it into a “best-buy” recommendation – and if a simple best-buy is not possible, then why not.

The article contrasts what Gusenbauer describes as “absolute” subject coverage with “relative” subject coverage – not a distinction that I think is particularly useful for this exercise, until it is explained that the terms are being used in a specialised sense solely for this article. In this context, “absolute” means overall, while “relative” simply means “domain-specific”. The conclusion is that some databases are better at some subjects than others, which is not really surprising given that the databases evaluated include CABI (for agriculture and plant science) and PubMed (for medicine).

Are there any surprising discoveries? Yes, for example that Europe PMC has more content than PubMed. But more than the specific discoveries, the big question I draw from this article is: how do researchers find all this out for themselves? It is well known that different researchers have different ways of discovering content. But is it sufficient therefore to conclude that it is up to the individual researcher to choose whatever method they prefer? My worry is that such an approach will result in less efficient searching. For many years in higher education it was assumed that students would simply pick up good study habits of their own accord. It wasn’t true, yet we still leave postgraduate students to work out their study methods as best they can. If they have a good supervisor, they are lucky; if they don’t, they have to learn for themselves.

What about recommendations? After all the detailed consideration of the various collections, and the description of a new metric for comparing the databases (what Gusenbauer terms the “bag of keywords” approach, shamelessly drawing on the “bag of words” technique used in machine learning), the conclusion to the article reveals a rather different approach.

Overall, the optimal choice of database system (what), is determined by why researchers are searching (the goals) and how they want to search (the heuristics)—the so-called ‘search triangle’ (Gusenbauer & Haddaway, 2021).

Perhaps it would have been better to say this at the outset: that the subject coverage of each database is not the sole criterion for a researcher to base their decision on. After all, the typical researcher knows perfectly well the difference between a subject collection such as PubMed and a general collection such as Google Scholar, and makes their choice of resource accordingly. The shame is that this conclusion points the reader to other papers that spell out this message more clearly. Perhaps, just as it is considered poor form to include citations in an abstract, we should forbid any citations in the conclusion to an article, if it is to be of practical use. A further requirement, for example for systematic searches, is to be able to download content in a usable format – some databases are better at this than others, and in any case, only full-text services can offer such a download.

In any case, the growth of recommender tools, which use AI techniques to identify related content to any selected article, means that for many purposes searching is discarded entirely. There is no need for keywords if in the background the tool is identifying relevant concepts.

Would any researcher (other than a bibliometrician) bother to read this entire article? Is the methodology even relevant for researchers to use? Unfortunately, as far as the average researcher is concerned, the nature of academic publishing is to research something, not to provide a consumer guide. A simply-worded restatement of this article, or of the collection of articles that Gusenbauer has authored in the past few years, would be valuable indeed. Better still might be to combine the article’s recommendations with the actual practice of some academics who have tried applying them, and who could also comment on why they made the choices they did. That way we might learn more about how researchers actually identify relevant content.