Matching entities across data sources using different identifiers and formats is a pervasive issue on the web. This group revolves around developing a web API that data providers can expose to ease the reconciliation of third-party data to their own identifiers. OpenRefine's reconciliation API is used as a starting point. Our goals are to document this existing API, share our experiences and lessons learnt from it, propose an improved protocol with a view to promoting it as a standard, and build tooling around it. A description of the existing protocol can be found here: https://reconciliation-api.github.io/specs/latest/
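As an illustration of what the existing API looks like on the wire, here is a minimal Python sketch of a batch reconciliation request. It assumes the JSON query format of versions 0.1 and 0.2 as we understand it: queries keyed by arbitrary identifiers are serialized into a single `queries` form parameter, and each answer lists candidates with `id`, `name`, `score` and `match`. The type id, property id, entity ids and sample response are hypothetical.

```python
import json
from urllib.parse import urlencode


def encode_queries(queries: dict) -> str:
    """Serialize a batch of reconciliation queries into a form-encoded body.

    The whole batch goes into one `queries` parameter as a JSON object.
    """
    return urlencode({"queries": json.dumps(queries)})


def best_match(response: dict, key: str):
    """Return the id of the first candidate flagged as a match, or None."""
    for candidate in response.get(key, {}).get("result", []):
        if candidate.get("match"):
            return candidate["id"]
    return None


# Hypothetical batch: a name query restricted to a type, plus a property
# value that helps the service disambiguate. "Person" and "birthYear"
# are made-up identifiers, not taken from any real service.
batch = {
    "q0": {
        "query": "Hannah Arendt",
        "type": "Person",
        "limit": 3,
        "properties": [{"pid": "birthYear", "v": "1906"}],
    }
}
body = encode_queries(batch)

# Hypothetical response, shaped as a service might answer the batch above.
response = {
    "q0": {
        "result": [
            {"id": "ex:1", "name": "Hannah Arendt", "score": 98.5, "match": True},
            {"id": "ex:2", "name": "Hannah Arendt Center", "score": 40.2, "match": False},
        ]
    }
}
print(best_match(response, "q0"))  # ex:1
```

The client only commits to a candidate when the service sets `match`; otherwise it would present the scored candidates for manual review.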
Here is a quick recap of the activity around the reconciliation API in 2023.
Our Community Group remained active this year, with discussions in monthly video calls (whose minutes are published on our mailing list) and on GitHub. We continued to improve the specifications on various fronts: internationalization, accessibility, alignment of the protocol with REST principles, more expressive reconciliation queries, and more. These improvements will be published in the upcoming version of the specifications. We also published final specifications for versions 0.1 and 0.2 of the protocol, which are broadly adopted. Various reconciliation services also improved this year. For instance, new public reconciliation services were published for the Global Names database and the Répertoire International des Sources Musicales (RISM). The conciliator framework for developing reconciliation services was updated, and so were ReconToolkit, Nomenklatura, Skohub-reconcile and TEI Publisher. Thanks to a grant from the NFDI4Culture consortium and as a follow-up to an Outreachy internship, OpenRefine improved the user experience of its reconciliation feature in many ways; these improvements will be released in the upcoming 3.8 version of the tool.
We have surely missed other developments: if so, let us know on the mailing list or during our monthly meetings. May 2024 bring more of those exciting developments!
We are happy to announce that we have released version 0.2 of the specifications. This version adds a range of mostly backwards-compatible features to the original API used by OpenRefine (which was released as version 0.1). Here are the most notable changes:
Services can require authentication using a range of methods, taken from the OpenAPI Specification;
Services can expose a type hierarchy;
Reconciliation candidates can expose individual reconciliation features, for cases where the global matching score is not precise enough;
Reconciliation queries can omit the entity name and rely on property values alone.
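Three of these changes can be illustrated on the client side with a minimal Python sketch. It assumes the field names of the 0.2 JSON format as we understand them (`properties` with `pid`/`v`, candidate `features` with `id`/`value`, and a `broader` array on type objects); all identifiers, values and the acceptance rule are hypothetical illustrations, not part of the specifications.

```python
def is_valid_query(query: dict) -> bool:
    """Our reading of 0.2: a query needs a name string or at least one property value."""
    return bool(query.get("query")) or bool(query.get("properties"))


def feature_values(candidate: dict) -> dict:
    """Map each matching feature id of a candidate to its value."""
    return {f["id"]: f["value"] for f in candidate.get("features", [])}


def ancestors(type_id: str, types: dict) -> list:
    """Collect all broader types of type_id by following `broader` links."""
    found = []
    stack = [b["id"] for b in types.get(type_id, {}).get("broader", [])]
    while stack:
        current = stack.pop()
        if current not in found:  # guards against cycles and repeated parents
            found.append(current)
            stack.extend(b["id"] for b in types.get(current, {}).get("broader", []))
    return found


# Hypothetical property-only query: no name, just an identifier value.
property_only = {
    "type": "Person",
    "properties": [{"pid": "orcid", "v": "0000-0000-0000-0000"}],
}

# Hypothetical candidate with a low global score but an exact identifier match.
candidate = {
    "id": "ex:42",
    "name": "Example Person",
    "score": 0.31,
    "match": False,
    "features": [
        {"id": "name_similarity", "value": 0.2},
        {"id": "identifier_match", "value": True},
    ],
}
features = feature_values(candidate)
# Hypothetical client-side rule: trust the identifier feature over the score.
accept = features.get("identifier_match") is True

# Hypothetical type hierarchy: Poet -> Writer -> Person.
types = {
    "Poet": {"id": "Poet", "name": "Poet", "broader": [{"id": "Writer", "name": "Writer"}]},
    "Writer": {"id": "Writer", "name": "Writer", "broader": [{"id": "Person", "name": "Person"}]},
    "Person": {"id": "Person", "name": "Person"},
}
print(ancestors("Poet", types))  # ['Writer', 'Person']
```

Exposing features lets a client apply its own acceptance rule, as in the `accept` line, where a single global score would hide why a candidate scored low.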
Brick is a uniform metadata schema for buildings. The project aims to represent the subsystems of a building independently of their vendors, providing a standard for building management systems. Brick also offers a reconciliation service for its vocabulary.
I’ve recently had the opportunity to briefly present our Community Group and what we do in a lightning talk at SWIB20, this year's edition of the annual (and, this year, digital) Semantic Web in Libraries conference (slides, video).
OpenRefine, and in particular its reconciliation feature, is widely used in the library world, where authority files are an established part of traditional cataloging workflows. Early reconciliation data sources for library use cases include FAST, VIAF, and VIVO.
Our Open Infrastructure team at hbz is offering a reconciliation service for the Integrated Authority File (GND). The GND is the main authority file in the German-speaking library field. It contains persons and corporations, subject headings, geographical entities, events, and works. With our reconciliation service, we’re building a bridge from a traditional library dataset to new applications within and outside the library domain, e.g. in the (German-speaking) digital humanities. This complements the broader effort of recent years, especially within the GND4C project, to open up the organizational structures, processes, data models, and tooling of the GND to other cultural heritage institutions such as archives and museums.
Besides services, the library world is also the source of new clients that interact with services using the reconciliation API. Two of the known clients are from the library domain: AlmaRefine and Cocoda. Managing, identifying, and connecting entities is at the very core of librarianship, making it an ideal field for the goals of our Community Group.
Therefore, I’m very happy to join Antonin as co-chair of our group. I’m looking forward to helping advance and promote our goal of a common protocol for data matching on the Web, both in the library field and beyond.
The reconciliation test bench developed by our Community Group gives an overview of the API features supported by reconciliation endpoints available online. It also lets developers try out their service interactively, helping them improve reconciliation quality and user experience.
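To give an idea of the kind of check involved, here is a minimal Python sketch that reads a service manifest (the JSON document a service returns at its root URL) and reports which optional parts of the API it advertises. This is not the test bench's code: the manifest is hypothetical, and the list of optional keys, including `authentication` for the authentication support added in 0.2, is our assumption.

```python
# Assumed optional manifest keys whose presence signals support for part of the API.
OPTIONAL_KEYS = ("view", "preview", "suggest", "extend", "authentication")


def supported_features(manifest: dict) -> dict:
    """Report, for each optional key, whether the manifest declares it."""
    return {key: key in manifest for key in OPTIONAL_KEYS}


def suggest_kinds(manifest: dict) -> list:
    """List which Suggest services (entity, property, type) are declared."""
    suggest = manifest.get("suggest", {})
    return [kind for kind in ("entity", "property", "type") if kind in suggest]


# Hypothetical manifest for an example service.
manifest = {
    "name": "Example reconciliation service",
    "identifierSpace": "http://example.org/entity/",
    "schemaSpace": "http://example.org/schema/",
    "view": {"url": "http://example.org/entity/{{id}}"},
    "suggest": {
        "entity": {"service_url": "http://example.org", "service_path": "/suggest/entity"},
    },
}

for key, present in supported_features(manifest).items():
    print(f"{key}: {'yes' if present else 'no'}")
print("suggest:", suggest_kinds(manifest))
```

A real checker would also send sample queries and inspect the responses; presence of a manifest key only shows that a feature is advertised, not that it works.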
Today, lobid announced that their GND reconciliation endpoint now implements the Suggest API, which helps users select entities, properties and types from OpenRefine’s user interface. They report that the test bench was used to plan and test this improvement. We hope this will encourage other services to implement more aspects of the API.
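For readers unfamiliar with the Suggest API, here is a minimal Python sketch of the client side. It assumes the manifest layout used by OpenRefine (a `service_url` plus `service_path` per suggest kind), a `prefix` query parameter, and a JSON response whose `result` array lists `id`/`name` pairs. The service URL and sample response are hypothetical and not taken from lobid.

```python
from urllib.parse import urlencode


def suggest_url(manifest: dict, kind: str, prefix: str) -> str:
    """Build the Suggest request URL for kind ('entity', 'property' or 'type')."""
    service = manifest["suggest"][kind]
    base = service["service_url"] + service.get("service_path", "")
    return base + "?" + urlencode({"prefix": prefix})


def suggestions(response: dict) -> list:
    """Extract (id, name) pairs to show in an autocomplete dropdown."""
    return [(item["id"], item["name"]) for item in response.get("result", [])]


# Hypothetical manifest excerpt and response.
manifest = {
    "suggest": {
        "entity": {"service_url": "https://example.org/reconcile", "service_path": "/suggest/entity"},
    }
}
print(suggest_url(manifest, "entity", "Goethe"))
# https://example.org/reconcile/suggest/entity?prefix=Goethe

response = {"result": [{"id": "ex:7", "name": "Goethe, Johann Wolfgang von"}]}
print(suggestions(response))
```

The client re-queries as the user types, so each keystroke yields a new `prefix` and a fresh list of candidates to pick from.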
We have started to map the existing environment around entity reconciliation on the Web. Our goal is to get a complete picture of all the data providers, clients, protocols, tools and other resources which are relevant to our community group.
This is a community initiative. This group was originally proposed on 2019-06-08 by Antonin Delpeuch. The following people supported its creation: Antonin Delpeuch, Ettore Rizza, Owen Stephens, Juliane Schneider, Ethan Gruber, Thad Guidry, Christina Harlow, Markus Mandalka. W3C’s hosting of this group does not imply endorsement of the activities.