I prefer the title above. The official title of this event (“I never metadata I didn’t like”) is perhaps more revealing than the organisers imagined: we only like metadata. Imagine putting together a one-day conference about metadata. You are unlikely to get much of a debate, because we all agree that metadata is a good thing. In this post, I will try to add at least some nuance to this received opinion. I don’t disagree that metadata is a good thing; I just have different recommendations for how best to manage it.

The conference, with about 65 attendees, had several excellent presentations. So how could I possibly complain about a day devoted to metadata? Well, for a couple of reasons:

  • Telling people to do good doesn’t necessarily make things any better (although it might make you feel better).
  • The people responsible for poor metadata are almost certainly not the people in the conference.

With that in mind, let’s look at some of the individual sessions. Many of the morning sessions were pitched at a very low level. Jo McEntyre’s talk was potentially fascinating. She described a number of initiatives by which Europe PMC provides additional metadata to life science researchers. Unfortunately, she didn’t explain any of these in detail, so I was left none the wiser about them. It might have been better to provide a use case with specific examples, an account of how success was measured, and so on.

Other presentations did have more specific, if trivial, examples. Fran Frenzel of the Senate House Library, University of London showed how a simple change to a book title by the publisher before publication can mean that library discovery systems show two entries for the same book, causing any amount of confusion. Is this the same book or just one with a fairly similar title? You are unlikely to be able to resolve the problem for yourself without some external information.
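To make the problem concrete, here is a minimal sketch of how a discovery system might triage two such records. It is not how any particular library system works: the record fields (`title`, `isbn`), the 0.8 threshold and the book titles are all hypothetical, chosen only for illustration. The point is that title similarity alone can only say "needs review"; it takes an external piece of information, here assumed to be a shared identifier, to say "same book".

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two titles, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def compare_records(r1: dict, r2: dict, threshold: float = 0.8) -> str:
    """Classify a pair of catalogue records.

    Assumption: records are dicts with a 'title' and an optional 'isbn'.
    The threshold is arbitrary and would need tuning on real data.
    """
    # A shared identifier is the "external information" that settles it.
    if r1.get("isbn") and r1.get("isbn") == r2.get("isbn"):
        return "same"
    # Similar titles without a shared identifier cannot be resolved
    # automatically: same book, or just a fairly similar title?
    if title_similarity(r1["title"], r2["title"]) >= threshold:
        return "needs review"
    return "different"


# Hypothetical records: the publisher changed the title before publication.
announced = {"title": "A History of Metadata"}
published = {"title": "A Short History of Metadata"}
print(compare_records(announced, published))
```

Without an identifier on both records, the pair above lands in the review pile, which is exactly the confusion described in the talk.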

Stephanie Dawson revealed some interesting new developments at ScienceOpen, mainly around subject collections, either curated by an individual or grouped around a single term (such as a collection of articles about Coleoptera). These collections act as a means of discovery, bringing users to the respective publisher sites.

But for me the talk of the day was by Sean Harrop of the BMJ. Perhaps it’s just my fondness for horror stories, but he gave a revealing talk with case studies of visible issues with metadata. Each of the three episodes had a similar background: these were not simple mistakes. Instead, a sensible decision had resulted in content not appearing, or appearing with the wrong metadata. The result was, as his title put it, “met-argh! Data”. As horror stories, they were much more memorable than the rather bland exhortations from other presenters to pay attention to metadata.

Franziska Buehring of De Gruyter revealed that authors of humanities articles and (especially) book chapters pay much less attention to metadata than authors of science content. A presentation from Hindawi scored a new low for impact: the entire presentation was recited from the bullet points on the screen. I fell asleep. Finally, two speakers from Emerald described the first few weeks of the newly released Emerald publishing platform. The presentation ended with five questions publishers should ask themselves, although I couldn’t help thinking that the impact of such a list was predictable: publishers who already pay attention to metadata would think about the principles, while the ones who should be asking the questions would not bother.

My take on metadata is a little different. Asking people to compile better metadata is not only unlikely to be successful, but possibly even counter-productive. Don’t ask people to do things they don’t know about or understand. Academic authors are great at researching, but why should they understand what a conflict of interest is? Train a machine to identify potential conflicts of interest, and leave the decision-making around the edge cases to humans. Then use the system to identify other similar examples. Or follow Sean Harrop’s principle: if I apply a machine-based change to a document, at least I know that, even if I introduce errors, they will be consistent.

When is metadata worse than useless? Here is an example from the Emerald project manager. He showed a lovely example of well-coded metadata illustrating how authors can provide a data accessibility statement. The example was something like:

<data-accessibility> The authors state that the data for this research are available on request to the authors. </data-accessibility>

The metadata is perfect, but the data isn’t available! The whole point about a data accessibility statement is to make the data open, not to elegantly code how the data is not there. This is complying with the letter of the law but not the principle.
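This is the kind of check I would rather hand to a machine. The sketch below is only an illustration under stated assumptions: the `data-accessibility` element name comes from my approximate recollection of the example above, and the rule that a statement counts as "linked" only if it contains a URL or a DOI is my own hypothetical heuristic, not anything Emerald described.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heuristic: an open statement names a location (a URL or a DOI).
LOCATOR = re.compile(r"https?://\S+|\b10\.\d{4,9}/\S+", re.IGNORECASE)
# Phrases such as "on request" or "upon reasonable request".
ON_REQUEST = re.compile(r"\b(on|upon)\s+(reasonable\s+)?request\b", re.IGNORECASE)


def classify_statement(xml_fragment: str) -> str:
    """Classify a data accessibility statement by what it actually offers."""
    element = ET.fromstring(xml_fragment)
    # Collapse whitespace so phrases split across lines still match.
    text = " ".join("".join(element.itertext()).split())
    if LOCATOR.search(text):
        return "linked"       # points somewhere a reader can go
    if ON_REQUEST.search(text):
        return "on-request"   # well-formed, but the data is not there
    return "unclear"          # leave this edge case to a human


example = (
    "<data-accessibility> The authors state that the data for this "
    "research are available on request to the authors. </data-accessibility>"
)
print(classify_statement(example))  # prints "on-request"
```

The markup parses without complaint, yet the statement is flagged, which is the distinction that matters. Anything the rules cannot place is returned as "unclear" and left for a person to decide.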

And the toothbrushes? They were displayed in glass cases at the event venue, the British Dental Association, in Wimpole Street, London. I’m sure there’s a metadata joke there somewhere, but I haven’t been able to think of one.