Boolean Search

You might say that Boolean is designed for humans, so the title of this post is a misnomer. In this post I hope to show that Boolean search is not a magic bullet. If humans miss terms, then Boolean will miss them as well. Adding peer reviewers to a Boolean search won’t improve things much, since the human brain is not very good at identifying missing terms. Think how hard crosswords are, and in this case you are even given the number of characters and a clue!  

I was at an excellent workshop at the EAHIL annual event, where we looked at some example searches for a systematic review. Very helpfully, the workshop convenor had provided copies of one search as run on several different systems, including the Cochrane Library, PubMed, Scopus, and this one, from the ACM Digital Library:

Searched for "mHealth" OR
"m-health" OR "mobile health" OR "mobile device"
OR "mobile app" OR "smartphone" OR "mobile phone"
AND ("depress*" OR "self- harm" OR suicid* OR anxi* OR "PTSD"
OR "social anxi*" OR "separation anxi*" OR phobia OR
"generalised anxiety disorder" OR "OCD" OR "conduct
disorder" OR "eating disorder" OR anorexi* OR bulimi* OR "binge
eating" OR "body image" OR "mental health*" OR
schizophren* OR "bipolar affective disorder" OR psychos* OR
insomnia* OR stress*) AND (child* OR teenage* OR adolescen* OR
"young per*" OR youth OR "young adult*")

Attendees at the workshop were asked to critique this search. The audience was enthusiastic and keen to carry out this exercise, and there was no shortage of comments. What emerged was:

  • All the attendees seemed to enjoy checking a Boolean search for correctness. Everyone in the room was a professional systematic review searcher, yet the fact that more than 30 attendees came up with several corrections to a search within five minutes suggests that Boolean is perhaps not as precise as one would wish.
  • One attendee pointed out that there were two types of error. One was duplication: for example, the string anxi* would also pick up the more specific string “generalised anxiety disorder”, so there is no need for the more specific term. Minor errors of this kind will not affect the results of the search.
  • The other, more serious, error is a missing term. As mentioned in an earlier post, this search, about the effect of mobile devices on mental health in young people, lacks the term “cellphone” (or “cell phone”), even though it includes “smartphone” and “mobile phone”. That makes the search potentially disastrous: it will miss relevant articles in the literature, specifically those using American English terms such as “cellphone”. That is not supposed to happen with Boolean search.
  • Even if all the terms were entered correctly (I noticed, for example, a rogue space in the string “self- harm”, which will not retrieve the correct “self-harm”), the search is still very broad, stressing recall over precision. Any article that mentions “insomnia” or “psychosis” together with mobile phones and young people will be retrieved.
  • Some of the Boolean searches provided included the qualifier “TITLE-ABS-KEY”. This means that a term was searched only in the title, abstract, or keywords. When I asked a convenor why the full text was not searched, she said the same search had to be used for all content, as otherwise the results would be biased in favour of articles available in full text. From my experience of human- or author-generated keywords for academic articles, I would distrust a search of keywords, and I see no reason to deliberately limit searches to anything less than the full text where it is available.
  • One attendee sitting next to me reviewed this search by putting square brackets around it.  More generally, Boolean could be laid out more clearly. This search is, in effect:

Various terms for mobile phones

AND

Various terms for anxiety

AND

Various terms for young people

Showing the search in this way, with visual cues, makes it immediately more intelligible.
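As a rough illustration of that layout, the sketch below holds each concept as a named list of terms, ORs the terms within a group, and joins the groups with AND. The term lists are abbreviated from the search above, the names `CONCEPTS`, `build_query` and `redundant_phrases` are invented for this example, and the output is generic Boolean rather than the syntax of any particular database. It also flags quoted phrases that a truncated stem in the same group already covers, which is the kind of duplication one attendee pointed out.

```python
# Concept groups, abbreviated from the workshop search (illustrative only).
CONCEPTS = {
    "mobile": ['"mHealth"', '"mobile health"', '"mobile phone"', "smartphone"],
    "mental health": ["depress*", "anxi*", '"generalised anxiety disorder"', "insomnia"],
    "young people": ["child*", "adolescen*", "youth"],
}


def build_query(concepts):
    """OR the terms inside each group, then AND the bracketed groups."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in concepts.values()]
    return "\nAND\n".join(groups)


def redundant_phrases(terms):
    """Return quoted phrases in which some word starts with a truncated stem
    from the same group, so the stem alone already retrieves them."""
    stems = [t[:-1].lower() for t in terms if t.endswith("*")]
    flagged = []
    for term in terms:
        if term.startswith('"'):
            words = term.strip('"').lower().split()
            if any(w.startswith(s) for w in words for s in stems):
                flagged.append(term)
    return flagged


print(build_query(CONCEPTS))
print(redundant_phrases(CONCEPTS["mental health"]))
```

Keeping the groups as data rather than as one long string means the brackets are generated, not typed, so a group cannot silently lose its parentheses.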

What conclusions can we draw from this exercise?

  • Firstly, this search had no fewer than 17 different variations, each of them based around a different search interface. The lack of interoperability is a waste of time for information professionals, especially because not everyone will understand the specific variations in individual search tools.
  • Secondly, given the widespread trend in academic research towards replicability, any article describing a systematic review should include the exact search used. A cursory glance at some published articles showed that a generic search had been given, which was not the actual search carried out. However cumbersome, it is vital that the exact search is reproduced, together with the name of the search interface used.
  • Thirdly, aide-memoire systems such as PRESS (Peer Review of Electronic Search Strategies) should be extended to include what one participant called “reiteration”. You run your search once, you review the results, then you refine the search to improve it. There is no mention of reiteration in the PRESS process.
  • Fourthly, even peer reviewing and “reiterating” a search cannot help if the search lacks terms: an article about cellphones and anxiety in young people would not be found by the Boolean search provided, even if all the above guidelines were followed. You can’t find, in other words, what you haven’t thought of. This seems to me to be a fundamental limitation of Boolean search. Is there no better way of identifying a set of relevant documents from a corpus?
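
The missing-term problem can be made concrete with a toy matcher. The sketch below is a simplification under stated assumptions: the two one-line abstracts are invented, the term lists are abbreviated from the search above, matching is case-insensitive on whole words with a trailing * treated as truncation, and none of the field restrictions or stemming of a real database is modelled. The names `GROUPS`, `term_found` and `matches` are made up for the example.

```python
import re

# Abbreviated concept groups from the search discussed above (illustrative).
GROUPS = [
    ["mhealth", "mobile health", "mobile phone", "smartphone"],  # devices
    ["depress*", "anxi*", "insomnia"],                            # conditions
    ["child*", "adolescen*", "youth"],                            # population
]


def term_found(term, text):
    """True if the term occurs in text; a trailing * matches any word ending."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches(text, groups=GROUPS):
    """A document matches only if every group contributes at least one term."""
    return all(any(term_found(t, text) for t in group) for group in groups)


# Two invented abstracts on the same topic, differing only in vocabulary.
uk_abstract = "Smartphone use and anxiety in adolescents: a cohort study."
us_abstract = "Cellphone use and anxiety in adolescents: a cohort study."

print(matches(uk_abstract))  # True
print(matches(us_abstract))  # False: no device term matches "cellphone"
```

Adding "cellphone" to the first list makes the second abstract match, but only once someone has thought to add it.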