A new publication from OCLC outlines progress on the use of linked data for research archives and special collections. There is a list of 21 members of the relevant OCLC linked data review group, so this publication looks to represent a considered opinion rather than just an individual view.

Incidentally, this is a report that assumes a lot of pre-existing knowledge. For example, there are several references to “NACO”, never expanded. I searched on the Web and found the National Association of Caravan Owners and the National Association of Counties before guessing that the Library of Congress might be involved; a search for “NACO LOC” revealed the Library of Congress Name Authority Cooperative Program (which would abbreviate to NACP, but no matter). Clearly, all readers of this report are expected to know what NACO is.

The OCLC report summarises some of the developments on using linked data in libraries. The first thing to note is that this is not a new initiative. This report lists two projects from the last two years, but OCLC has been experimenting with linked data since 2009. In 11 years, you would hope that a technical initiative would have become sustainable, but the report states, under “sustainability”:

To date, the majority of linked data efforts have been grant-funded or special one-off projects. This has impacted the perception of value and utility, especially for library administrators, and has made sustainability of these projects problematic. Existing systems (such as Wikibase) are perceived as novel, and it is burdensome to set up and receive approval for them. There is an educational barrier that is difficult to overcome (“the trouble with triples”). We are ultimately looking at a lot of labor with little commitment to financing that labor.

page 9

If that is the case after 11 years, then I would have doubts over the viability of the approach.

For lay readers like myself, it is difficult to understand much of what the report is describing, because it contains not a single example of linked data in use for the purpose described in the report. However, there is one paragraph of great interest, listing some use cases. This paragraph made lots of sense to me. It gives some examples, which I will quote in full:

POTENTIAL FOR BETTER DISCOVERY

Many discovery-related tasks are challenging or impossible in the current library discovery environment. A linked data infrastructure would likely make these tasks possible. For example:

  • Inclusion of identity markers of people represented in the collections.
      ◦ e.g., gender: “I’m looking for women in printing in the 16th / 17th century.”
  • The need for information on correspondents’ networks, within and across collections, including geographically or organization-based circles.
  • The ability to study social mobility by connecting information on people with their locations to see how they moved around, spread knowledge, and met other people. […]
  • Even though linked data structures will bring significant advantages to discovery, end users will need to adapt to new ways of interacting with linked data structures.  

As I see it, there are two problems here. First, how is the additional metadata described above to be added to the content? For example, how will we identify “women in printing” except through an expert with specialist knowledge adding a tag? The report does not appear to address this. It lists many other areas where, for example, we can capture relationships between research objects, their creators, their users, and their circumstances, and simply states that “there are opportunities for linked data to express these relationships more efficiently”. Perhaps, but at what cost to compile the data? Paradoxically, although linked data is all about joining up metadata, the report states:

Much linked data work in archives has focused on boutique projects involving a subset of collections related to a specific subject or topic and on remediating and entifying existing description, which is time and labor intensive.  

page 9

If project A collects information about women in printing in the 16th century, and project B collects information about genetic variation in hamsters, then have we gained anything in merging metadata about the collections?

Equally important, even if we do find lots of experts to identify all the women in printing in the 16th and 17th centuries, the report states explicitly above that “end users will need to adapt to new ways of interacting with linked data structures”. That sounds ominous. Poor UX has been the downfall of many new initiatives, and in the many years that linked data has been around, effective ways for average, untrained users to interact with it have been rare. The report itself acknowledges the “educational barrier that is difficult to overcome” when using linked data – they describe it as “the trouble with triples”.
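
Since the report gives no example, here is a minimal sketch of what a “triple” is and how the “women in printing” question could be answered once the data exists. It is plain Python rather than a real linked data toolkit, and everything in it is an assumption for illustration: the identifiers (person:A and so on), the predicate names (gender, occupation, active_year) and the dates are all invented, not taken from the report or from any real collection.

```python
# Each triple is (subject, predicate, object).
# All identifiers, predicates and values below are hypothetical.
TRIPLES = [
    ("person:A", "gender", "female"),
    ("person:A", "occupation", "printer"),
    ("person:A", "active_year", 1598),
    ("person:B", "gender", "male"),
    ("person:B", "occupation", "printer"),
    ("person:B", "active_year", 1602),
    ("person:C", "gender", "female"),
    ("person:C", "occupation", "printer"),
    ("person:C", "active_year", 1750),
    ("person:D", "gender", "female"),
    ("person:D", "occupation", "bookbinder"),
    ("person:D", "active_year", 1640),
]


def subjects_with(triples, predicate, test):
    """Return the subjects that have a triple with this predicate
    whose object passes the given test."""
    return {s for s, p, o in triples if p == predicate and test(o)}


def women_in_printing(triples, start=1500, end=1699):
    """Find subjects recorded as female printers active between
    start and end (inclusive)."""
    women = subjects_with(triples, "gender", lambda o: o == "female")
    printers = subjects_with(triples, "occupation", lambda o: o == "printer")
    active = subjects_with(triples, "active_year", lambda o: start <= o <= end)
    return sorted(women & printers & active)


print(women_in_printing(TRIPLES))
```

The query itself is only a few lines. The expensive part is the table at the top: someone has to assert each gender, occupation and date for each person, which is exactly the labor the report says nobody has committed to financing.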