They say that the English Lake poets, Wordsworth and Coleridge, only celebrated the natural beauty of the landscape because that landscape was under threat and rapidly disappearing. Perhaps the same is true of the book index. Dennis Duncan’s book (Index, A History of the, 2021) is a hymn in praise of, and at the same time a lament for, the traditional back-of-the-book index.

The humble back-of-book index is one of those inventions that are so successful, so integrated into our daily practices, that they can often become invisible.

It’s a familiar story, the disappearance of a traditional human skill, but, for better or worse, it’s not actually true. There is probably more indexing being done today than at any time in human history, if you include the abstract – but it’s not the traditional indexers who are doing it.

Duncan’s book contains enough topics for several books and, perhaps unsurprisingly, the individual components don’t always blend well, partly because of his taste for the frivolous and inconsequential, but also because of a more fundamental issue with the book.

The most impressive part, for me, is the result of some detailed searching in the archives. Duncan reveals some amazing discoveries – most notably, Robert Grosseteste’s Tabula distinctionum, or Table of Distinctions, which uses what must be some of the earliest infographics ever. In the illustration below, for example, several topics associated with God are each shown by a small graphic.

Less successful is Duncan’s exploration of indexes for satirical purposes (there are no fewer than 11 satirical indexes listed in the index), and his discovery of novels with indexes (Pale Fire, and, less well known, Clarissa), as well as indexes compiled to settle scores. If the purpose of these is to provide some gentle entertainment for a general reader, they are in my opinion dragged out a little too long.

That space could have been used for something more important. The biggest omission in the book is Duncan’s superficial coverage of indexes since 1900. There is no mention of the abstract, the bedrock of scholarly publishing, nor of A&I (abstracting and indexing) databases such as Web of Science or Scopus, both of them in daily use by the majority of researchers (see the recent survey here). Abstracts are summaries of academic content, designed to make that content intelligible without reading the whole thing. They continue to be created by hand; no software has yet created an acceptable abstract for an academic article (although there are many ongoing attempts). Nor is there any awareness of advances in machine learning that identify not just words, but phrases (“heart attack” and “attack” are two very different concepts). So Duncan’s book has an excellent introduction to indexing through history, but then peters out into quirky anecdotes about indexing, without following through the mainstream of information retrieval represented by indexing, in all its current manifestations. If there is a relevant role for the human indexer to play, and I believe there is, the book should define clearly what that role is. It is trite to have two indexes at the back of the book, one (compiled automatically) with all the inadequacies pointed out, and one (compiled by a human) that is presented as if perfect. Human indexing is frequently inadequate: Duncan himself gives an example of unhelpful indexing from a recent biography of Cardinal Newman:

Wiseman, Nicholas 69, 118–19, 129, 133–4, 135, 158, 182–3, 187, 192, 198, 213, 225, 232, 234, 317–18, 321, 325, 328, 330, 331–2, 339, 341, 342, 345, 352, 360, 372–4, 382, 400, 405, 418, 419, 420, 424–7, 435–6, 437, 446–7, 463, 464, 466–8, 469, 470, 471, 472, 474–5, 476–7, 486–9, 499, 506, 507, 512, 515–17, 521, 526, 535, 540, 565, 567, 568, 569–72, 574, 597, 598, 608, 662, 694, 709.

We are all familiar with detailed indexes like this. There are plenty of other examples of shocking indexing, for example, Diana Henry, A Bird in the Hand (2015), a book of chicken recipes. Unsurprisingly, there are a lot of index terms under “chicken”. But the index is arranged by recipe title, however worded. So you have index entries such as these (under S and W respectively):

  • Soothing North Indian chicken 183
  • Warm salad of griddled chicken, freekeh, preserved lemon and mint 121

My point is not just to shame the compilers of these indexes, but to point out that indexing is not as simple as “human=good, machine=bad”. Duncan himself refers to the embedded index, a simple tool, available in Microsoft Word, by which the author himself or herself can tag the words or phrases they wish to be indexed. The role of the human indexer is then simply to decide the arrangement of the terms in the index. An ideal embedded index would link to the term chosen (for an ebook) or to the page (for a print volume). The embedded index tool in Microsoft Word appears to work only for print, linking to page numbers, although it wouldn’t be difficult to use it to link direct to the point of embedding:

Example of Microsoft Word indexing. The index is shown below the “section break” label; it comprises just two terms.
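
To make the idea of an embedded index concrete, here is a minimal sketch in Python. It is not how Microsoft Word stores its index entries: the `[[idx:term]]` marker, the function name `build_embedded_index` and the `chars_per_page` figure are all assumptions made purely for illustration. The point is only that one set of tags in the text can yield both a precise position (for an ebook link) and a page number (for print).

```python
import re
from collections import defaultdict

# Hypothetical inline marker an author might type, e.g. [[idx:Aesop's Fables]].
# This syntax is an assumption for the sketch, not a Word or ebook standard.
MARKER = re.compile(r"\[\[idx:(.+?)\]\]")


def build_embedded_index(tagged_text, chars_per_page=1800):
    """Strip the markers and return (clean_text, index).

    index maps each term to a list of (offset, page) pairs, where offset is
    the character position in the clean text (usable as an ebook link target)
    and page is a crude page number assuming a fixed page length.
    """
    entries = defaultdict(list)
    clean_parts = []
    clean_len = 0
    cursor = 0
    for match in MARKER.finditer(tagged_text):
        chunk = tagged_text[cursor:match.start()]
        clean_parts.append(chunk)
        clean_len += len(chunk)
        term = match.group(1).strip()
        page = clean_len // chars_per_page + 1
        entries[term].append((clean_len, page))
        cursor = match.end()
    clean_parts.append(tagged_text[cursor:])
    clean_text = "".join(clean_parts)
    # The arrangement of terms is the part a human indexer would refine;
    # here it is a plain case-insensitive alphabetical sort.
    index = dict(sorted(entries.items(), key=lambda item: item[0].lower()))
    return clean_text, index
```

In this sketch the offset takes the reader to the tagged phrase itself, while the page number is all a print index can offer; a real implementation would presumably take page breaks from the typesetting system rather than from a fixed character count.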

The human-compiled index to the book, by Paula Clarke Bain, who describes herself as “a professional indexer and a human being”, is excellent. It has several joke entries, which are mildly entertaining:

recompense, feeble, for indexers 207

there see here

here, see there

snag a ram see anagrams

Society of Indexers

[Hi colleagues! – PCB]

But the index also has several topics that arguably should not be indexed – and are not included simply to raise a laugh. For example:

position of index in book

poppies, unfortunate effect of (poppies are not mentioned in the text)

sparring in the back pages (several references)

things, indexing

Most importantly, the manual index fails to link direct to the relevant word or phrase. So, for example, a reference in the index to “Aesop’s Fables” does not take you to the phrase “Aesop’s Fables” in the text. It takes you to the start of the page in which this phrase occurs. Like most ebooks with an index, this book fails to provide precise links. Is it surprising that the manual index is no longer used?

Equally importantly, the ebook provides links both to and from footnotes or endnotes. After I have read a footnote, I want to return to where I was in the text – and I can do so with an ebook. But I cannot do this with any term in the index. Each page number is a link to the relevant page – but you cannot return to where you were in the index. The hand-compiled index, in other words, does not solve one of the problems of any print book – you need to keep track of where you are in the book when you consult a footnote.

Abstracts are typically compiled by the author of the article, while back-of-the-book indexes are usually compiled by a professional indexer (something Duncan seems to be perfectly happy with; even as the author of one of the few books on indexing published in the last fifty years, it doesn’t seem to occur to him that he might want to index his own work). Even more surprising, the indexer for this book is only mentioned in the acknowledgements.

The traditional, manually compiled back-of-the-book index is now obsolete. But the index hasn’t disappeared – it is just used in different, not always visible ways. As Dennis Duncan points out, when we search Google, we are searching an index, not the Web itself. That index is compiled by machine, and for all its limitations, it largely fulfils the goal of the 19th-century movement to compile a universal index to all knowledge.

Far too often, Duncan, a perceptive scholar, presents a challenge, identifies the solution, and then strangely moves away from it. In his introduction, Duncan points out the uselessness of page numbers as a means of indexing content when teaching a group of students all using different editions of a text. He describes the way that the Kindle ebook reader software uses the “locator”, a number of words (actually 150 bytes), within which a hit can be found. While not perfect, this tool represents a reasonable method for finding references in an ebook that might not have the page numbers of its print equivalent. You could, as Duncan’s students found, simply find things via a string, the first few words of the paragraph containing the concept. If you are going to go to the trouble of compiling a subject index, you can use the technique of the embedded index, which Duncan himself describes – and then ignores.
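
The two approaches described here, searching for a string and reporting a locator, can be combined in a few lines of Python. This is only a sketch under stated assumptions: it takes the 150-byte figure quoted above at face value, assumes the text is encoded as UTF-8, and uses a hypothetical function name, `locator_for`. It does not claim to reproduce how the Kindle software actually calculates its locations.

```python
def locator_for(text, phrase, block_bytes=150):
    """Return a 1-based locator for the first occurrence of phrase.

    The locator is the number of the fixed-size block of bytes in which the
    phrase starts. Returns None if the phrase does not occur in the text.
    The block size of 150 bytes is taken from the description above and is
    an assumption, as is the UTF-8 encoding.
    """
    position = text.find(phrase)
    if position == -1:
        return None
    # Count bytes, not characters, before the hit: accented letters and
    # curly quotes take more than one byte in UTF-8.
    byte_offset = len(text[:position].encode("utf-8"))
    return byte_offset // block_bytes + 1
```

Unlike a page number, a locator of this kind does not depend on the edition, the screen size or the font, which is why it works for a class of students who are all reading different versions of the same text.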

All in all, a fascinating book, but one which could be better still if it looked at how humans and machines can effectively combine to provide a digital index.