Anyone reading the latest issue of the invaluable Research Information (The Meaning of Semantics, Four industry figures discuss the latest developments around semantic enrichment with Tim Gillett) would be left little the wiser about semantic enrichment. Although the line-up of people interviewed is impressive, each respondent answered the questions in a very different way, which perhaps revealed how little agreement we really have when we talk to humans about what we believe to be one topic – especially when that topic includes the word “semantic”.
What is semantic enrichment?
This question seemed simple enough. Babis Marmanis of Copyright Clearance Center and Giuliano Maciocci of eLife answered in broadly similar ways: “the enhancement of content with information about its meaning”. But Donald Samulack pointed out that the enhancement could be something visual, such as an infographic, and Jordan White of ProQuest described “adding disparate pieces of content … sharing certain metadata” (which sounds suspiciously like using keywords to me). All of these could be described as enrichment, although perhaps not what interviewer Tim Gillett had in mind when he asked the question.
What are the key industry developments of semantic enrichment in the last ten years?
Here again there was a range of answers. Babis Marmanis outlined entity extraction, which today can be achieved in a far richer way than ten years ago. Giuliano Maciocci described, quite correctly, the uncharismatic but vital steady progress of XML coding of articles and chapters, most recently to the JATS XML standard, plus ORCIDs as persistent identifiers. JATS and ORCID may be very unsemantic, but without these tools the clever semantic stuff can hardly begin. Donald Samulack talked again about adding visual elements. Jordan White intriguingly described “taste clusters”, or “featured snippets”, which I thought were provided by the journal abstract, although after reading an abstract I am often left none the wiser about what the article is really about. Perhaps this reveals that leaving semantic enrichment to humans isn’t always wholly successful.
Next, we came to the big question: how do these developments benefit the academic community?
Marmanis talked enthusiastically about synthesising research articles, drawing inferences from them and acting on them – the holy grail that corresponds to the way a human would deal with an article. Sadly, it is still some way from being realised. Maciocci, practical and pragmatic as before, talked about providing more APIs to link content to other sites. Samulack described saving time for the researcher by delivering the “stopping power” of an article – a summary of what it means.
The final question was about the future: what comes next?
Of course such a wide-ranging question, with no single correct answer, will produce considerable variety in the responses, but here perhaps the results were least satisfactory. Marmanis, interestingly, pinned his hope on “well-curated semantic lexicons … with minimal human intervention”. I would have thought that true AI-based semantic enrichment might not result in a lexicon at all, since a lexicon is a means rather than an end; researchers don’t want a lexicon, they want answers. Maciocci talked about “automatic inference”, which seems a much more fruitful way of proceeding. Samulack described the need for images to have plain-text descriptions, although I would have thought advances in image processing mean that most images will have automatically generated text descriptions of their contents before long. Jordan White talked about the end of relevance-ranked search (and not a moment too soon), to be replaced by “trusted seminal works at the top of search results, vetted by evidence of usage”. It seems odd that the journey of semantic enrichment, which after all is about revealing the meaning of a text, should end up relying on usage data, just as Amazon does when you buy something on its platform: since you bought X, most users then bought Y. It works, but it’s hardly very semantic.