Use Case Virtual International Authority File (VIAF)

From Library Linked Data
Revision as of 20:25, 17 October 2010 by Jschneid4 (Talk | contribs)



Name

The Wiki page URL should be of the form "Use_Case_Name", where Name is a short name by which we can refer to the use case in discussions. The Wiki page URL can act as a URI identifier for the use case.

Virtual International Authority File (VIAF)

Owner

The person responsible for maintaining the correctness/completeness of this use case. Most obviously, this would be the creator.

Jeff Young

Background and Current Practice

Where this use case takes place in a specific domain, and so requires some prior information to understand, this section is used to describe that domain. As far as possible, please put explanation of the domain in here, to keep the scenario as short as possible. If this scenario is best illustrated by showing how applying technology could replace current existing practice, then this section can be used to describe the current practice. Often, the key to why a use case is important also lies in what problem would occur if it was not achieved, or what problem means it is hard to achieve.

[Note: VIAF's Web interface is based on an SRU XML "record" database. The "General Perspective" sections of this document reflect this reasonably familiar conceptualization. Sections labeled "Linked Data Perspective" try to show how run-time Linked Data features and use cases have been layered on.]

General Perspective

[Copied from http://www.oclc.org/research/activities/viaf/]

A joint project with the Library of Congress, the Deutsche Nationalbibliothek, and the Bibliothèque nationale de France, VIAF explores virtually combining the name authority files of all three institutions into a single name authority service.

As of the fall of 2009 there are 18 personal name authority files from 15 organizations participating in VIAF.

Linked Data Perspective

Increasingly, VIAF participants publish their authority records on the Web in a hodgepodge of formats and languages using a variety of REST and non-REST URI forms. VIAF uses these source URIs, if available, but the URI behaviors, semantics, and document markups are often inconsistent and site-specific, and sometimes ever-so-slightly and yet perniciously deviant:

| Institution | Link | rdf:type | Note |
|---|---|---|---|
| National Library of Sweden | http://libris.kb.se/resource/auth/207435 | a foaf:Person | Linked Data 303 URIs forwarding to Different Documents |
| Deutsche Nationalbibliothek | http://d-nb.info/gnd/118640445 | (via inferencing:) a rdaFrbr:Person; perhaps other rdf:types that require a priori knowledge of the http://d-nb.info/gnd/ namespace | Not Linked Data (no 303), but content-negotiates to HTML and RDF/XML. |
| Library of Congress/NACO (OCLC Research proxy) | http://errol.oclc.org/laf/n++79034525.html | [Web document] | HTML only |
| Biblioteca Nacional de España | http://catalogo.bne.es/uhtbin/authoritybrowse.cgi?action=display&authority_id=XX971832 | [Web document] | HTML only |
| Getty Research Institute | http://www.getty.edu/vow/ULANFullDisplay?find=&role=&nation=&subjectid=500010879 | [Web document] | HTML only |

For records that aren't identified at the source, VIAF HTML links to a local "processed" representation instead:

Goal

Two short statements stating (1) what is achieved in the scenario without reference to linked data, and (2) how we use linked data technology to achieve this goal.

  1. Consolidate the classification, identity, and discoverability of things (people, organizations, etc.) coming from various authority agencies for unexpected reuse.
  2. Strengthen the coherent/resolvable identity of these things and integrate them with variant human and machine-friendly Web document representations using 303 (See Other) and content-negotiable generic documents.

Target Audience

The main audience of your case. For example scholars, the general public, service providers, archivists, computer programs...

General Perspective

[Copied from http://www.oclc.org/research/activities/viaf/]

The goal of this project is to facilitate research across languages anywhere in the world by making authorities truly international.

OCLC is conducting this research because we have proven software for matching and linking authority records for personal names. This software will be used to match the authority records from The Deutsche Nationalbibliothek (dnb) and the Bibliothèque nationale de France (BnF) to the corresponding authority records from the Library of Congress (LC).

Once the existing authority records are linked, shared OAI servers will be established to maintain the authority files and to provide user access to the files. Users then will be able to see names displayed in the most appropriate language. For example, German users will be able to see a name displayed in the form established by the dnb, while French users will see the same name as established by the BnF, and American users will view the name as established by LC. Users in their respective countries will be able to view name records as established by the other nations, thus making the authorities truly international and facilitating research across languages anywhere in the world.

Linked Data Perspective

The only audience limitation for identifying an instance of some class (currently "people") is the extent of included individuals that are recognized by participating authority agencies. Within that set, any audience can benefit from globally-unique, actionable, descriptively-essential, interrelated identifiers for the individuals.

Use Case Scenario

The use case scenario itself, described as a story in which actors interact with systems. This section should focus on the user needs in this scenario. Do not mention technical aspects and/or the use of linked data.

General Perspective

The VIAF server root (http://viaf.org/) includes a form for searching "cluster" records using an SRU API. For example:

An essential feature of this SRU implementation is REST URI support for accessing individual "records":

These identifiers are suitable for human and machine-oriented applications because of their support for content-negotiation (HTML, XML, RDF/XML).
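
As a minimal sketch of what content-negotiation on such a record URI could look like from a client, the following Python builds (but does not send) a request with an Accept header for each of the three formats named above. The helper name `build_record_request` and the exact media type strings chosen per format are assumptions made for illustration; the live service may behave differently.

```python
from urllib.request import Request

# One media type per format named in the text (HTML, XML, RDF/XML).
# The choice of a single type per format is an assumption for this sketch.
ACCEPT_BY_FORMAT = {
    "html": "text/html",
    "xml": "application/xml",
    "rdf": "application/rdf+xml",
}


def build_record_request(record_uri, fmt):
    """Prepare (but do not send) a GET request asking for one representation."""
    return Request(record_uri, headers={"Accept": ACCEPT_BY_FORMAT[fmt]})


req = build_record_request("http://viaf.org/viaf/24604287", "rdf")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would actually perform the fetch; the sketch stops short of that so it can run without network access.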

Linked Data Perspective

The numerous elements in XML "cluster records" (not to mention their attributes and hierarchy) are VIAF-specific, evolving, and detailed artifacts of the authority matching algorithms. The need is to represent these "cluster records" in a variety of result-oriented conceptual models (OWL) that have been designed for casual and detailed use cases.

Application of linked data for the given use case

This section describes how linked data technology could be used to support the use case above. Try to focus on linked data on an abstract level, without mentioning concrete applications and/or vocabularies. Hint: Nothing library domain specific.

Implementation Issues

The http://viaf.org/viaf/* URI pattern was baked into VIAF early based on the SRU "VIAF database" foundation:

http://viaf.org/viaf/search

Rewrite rules were added to give each record in this database a REST URI that delivers an HTML representation:

http://viaf.org/viaf/24604287

This URI was eventually upgraded to support Linked Data 303 URIs forwarding to One Generic Document to satisfy human (HTML), machine (XML), and semantic (RDF/XML) agents. The implementation's ability to map database "records" to a conceptualized 303 "real world object" is extremely handy because this pattern has potential for transitioning legacy physical models to broadly-conceptualized Linked Data models. Nevertheless, it can also be awkward because mapping database records to real world objects is currently the ONLY way for this implementation to support 303 real world objects. I hope this constraint will be relaxed once the value of Linked Data has proven itself.

In the meantime, the technical challenge has been to expose rationalized OWL individuals using pre-ordained URIs:

| Resource Type | URI | Behavior | Note |
|---|---|---|---|
| Real World Object | http://viaf.org/viaf/96994048 | 303 (See Other) | |
| Generic Document | http://viaf.org/viaf/96994048/ | 200 (OK) Content-negotiation | Note the useful Content-Location header (thanks Ralph!) |
| Web Document | http://viaf.org/viaf/96994048/viaf.html | 200 (OK) (application/xhtml+xml, text/html) | Content-negotiation default |
| Web Document | http://viaf.org/viaf/96994048/viaf.xml | 200 (OK) (application/xml, text/xml) | [editorial note: Ralph, please remove the distracting XSL Stylesheet reference from this representation] |
| Web Document | http://viaf.org/viaf/96994048/rdf.xml | 200 (OK) (application/rdf+xml) | A more conventional and standards-compliant name would have been something like "about.rdf". |
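
The URI pattern in the table above can be read as a small dispatch rule. The following Python is only a sketch of that rule, not the VIAF server's actual code: the function names `pick_file` and `resolve` are invented here, q-values in the Accept header are ignored, and the 303 target is assumed to be the generic document (the same path with a trailing slash).

```python
import re

# Representation file names and media types taken from the table above.
REPRESENTATIONS = [
    ("viaf.html", ("application/xhtml+xml", "text/html")),
    ("viaf.xml", ("application/xml", "text/xml")),
    ("rdf.xml", ("application/rdf+xml",)),
]
DEFAULT_FILE = "viaf.html"  # the table marks HTML as the negotiation default


def pick_file(accept):
    """Return the first representation whose media type appears in Accept.

    Simplification: q-values are ignored and Accept is scanned left to right.
    """
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        for filename, types in REPRESENTATIONS:
            if media_type in types:
                return filename
    return DEFAULT_FILE


def resolve(path, accept=""):
    """Map a request path to (status, headers) following the table's pattern."""
    if re.fullmatch(r"/viaf/\d+", path):
        # Real world object: redirect to the generic document (assumed target).
        return 303, {"Location": path + "/"}
    if re.fullmatch(r"/viaf/\d+/", path):
        # Generic document: negotiate and report which file was chosen.
        return 200, {"Content-Location": path + pick_file(accept)}
    if re.fullmatch(r"/viaf/\d+/(viaf\.html|viaf\.xml|rdf\.xml)", path):
        # Web document: served directly, no negotiation involved.
        return 200, {}
    return 404, {}


print(resolve("/viaf/96994048"))
print(resolve("/viaf/96994048/", "application/rdf+xml"))
```

The Content-Location header is what lets a client learn which concrete Web document answered a request to the generic document.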

If you're a fan of opaque URIs, then you might sympathize with this constraint. I mourn for human intuition and would prefer URIs with a class name path segment (e.g. "/person"), but because the Linked Data aspects of VIAF are experimental this hasn't been a priority. As a consequence, the classes that diverse users are likely to recognize have reluctantly been encoded in the URI hash "fragment" or suppressed altogether:

| Real World Object | rdf:type |
|---|---|
| http://viaf.org/viaf/96994048/#{URL-encoded form of the established heading} | viaf:EstablishedHeading |
| http://viaf.org/viaf/96994048/#XRefAlternate:{URL-encoded form of the variant heading} | viaf:EstablishedHeading |
| http://viaf.org/viaf/96994048/#skos:Concept | skos:Concept |
| http://viaf.org/viaf/96994048/#foaf:Person | foaf:Person |
| http://viaf.org/viaf/96994048/#rdaEnt:Person | rdaEnt:Person |
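
To make the hash URI patterns above concrete, here is a hedged Python sketch that assembles them from a generic document URI. The helper names and the sample heading "Doe, Jane" are placeholders invented for illustration (they are not the heading of this record), and percent-encoding via `urllib.parse.quote` is an assumption about what "URL-encoded" means here.

```python
from urllib.parse import quote

GENERIC_DOCUMENT = "http://viaf.org/viaf/96994048/"


def heading_uri(heading, variant=False):
    """Build the hash URI for an established or variant heading."""
    prefix = "XRefAlternate:" if variant else ""
    # safe="" percent-encodes every reserved character, including "," and "/".
    return GENERIC_DOCUMENT + "#" + prefix + quote(heading, safe="")


def class_uri(qname):
    """Build the hash URI that carries a class name such as foaf:Person."""
    return GENERIC_DOCUMENT + "#" + qname


print(heading_uri("Doe, Jane"))                # placeholder heading
print(heading_uri("Doe, J.", variant=True))    # placeholder variant
print(class_uri("foaf:Person"))
```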

Ontology Considerations

Unfortunately, the "/viaf" URI path segment didn't provide much insight into the class of "Real World Object" in this case. The initial "cluster records" focus has been on "personal names", but with plans to expand outwards eventually. The obvious existing classes foaf:Person and skos:Concept were both lacking because of preferred/alternate name/label limitations. Eventually we developed a generalized VIAF ontology (JPEG) (OWL) and assigned the 303 real world objects to rdf:type viaf:NameAuthorityCluster. Although this class lacks the intuition of foaf:Person and skos:Concept, it has the merit of reflecting a key class in VIAF's internal and feedback processes.

Internal Rationalization

Native XML "cluster records" are constructed to support internal processing and are presumably not intuitive or stable representations for feedback to participating institutions:

In contrast, the VIAF Ontology is a concise distillation of the conceptual results:

Bulk harvesting of this distillation could be supported using RSS, Atom, and/or OAI-PMH, but this feature has not been implemented yet. In the meantime, the OWL individuals are available record-by-record by content-negotiating application/rdf+xml from generic resource URIs:

Users can also bypass content-negotiation by accessing the RDF/XML representation directly:

External Rationalization

Keep in mind that VIAF never uses or stores RDF. The VIAF OWL was rationalized after the XML cluster records were developed, and the RDF representations for individuals are conjured from those cluster records at runtime using XSLT. This should be reassuring for legacy Web applications because it means they shouldn't need to be rewritten to be interoperable on the Semantic Web. Instead, existing REST URIs for HTML documents can be upgraded to support content-negotiation for retrospectively-rationalized RDF.
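
The idea of deriving RDF from a stored XML record only when it is requested can be illustrated with a small stand-in. This sketch uses Python's ElementTree rather than XSLT, and the `<cluster>` record layout, the function name `record_to_rdf`, and the choice of foaf:name are all hypothetical; real VIAF cluster records and the real stylesheet are far more detailed.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Hypothetical, heavily simplified record; real cluster records differ.
RECORD_XML = """<cluster>
  <id>96994048</id>
  <heading>Doe, Jane</heading>
</cluster>"""


def record_to_rdf(record_xml):
    """Derive an RDF/XML description from a stored XML record on request."""
    record = ET.fromstring(record_xml)
    base = "http://viaf.org/viaf/" + record.findtext("id") + "/"

    # Register prefixes so the output uses rdf: and foaf: instead of ns0:, ns1:.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("foaf", FOAF_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(
        root,
        f"{{{FOAF_NS}}}Person",
        {f"{{{RDF_NS}}}about": base + "#foaf:Person"},
    )
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = record.findtext("heading")
    return ET.tostring(root, encoding="unicode")


print(record_to_rdf(RECORD_XML))
```

Nothing RDF-shaped is stored anywhere in this sketch; the stored XML record remains the only source of truth, which is the property the paragraph above describes.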

RDF produced like this at runtime could be rationalized against a local or external OWL ontology. In VIAF, we have tried to crosswalk to as many other OWL ontologies as seem useful. Currently, this includes foaf:Person, skos:Concept, and rdaEnt:Person. Each new crosswalk results in some redundancy in the RDF representation, but in principle it should make these resources suitable for use from all these perspectives. These crosswalks are still experimental, though, and are likely to change significantly in the future.

Existing Work (optional)

This section is used to refer to existing technologies or approaches which achieve the use case (Hint: Specific approaches in the library domain). It may especially refer to running prototypes or applications.

Related Vocabularies (optional)

Here you can list and clarify the use of vocabularies (element sets and value vocabularies) which can be helpful and applied within this context.

Problems and Limitations

This section lists reasons why this scenario is or may be difficult to achieve, including pre-requisites which may not be met, technological obstacles etc. Please explicitly list here the technical challenges made apparent by this use case. This will aid in creating a roadmap to overcome those challenges.

SKOSXL

Some of the classes and properties in the VIAF ontology need to be reconsidered from a SKOSXL perspective. The viaf:Heading, viaf:EstablishedHeading, viaf:XRefAlternate, and viaf:XRefRelated classes can probably be moved to skosxl:Label.

Switching to SKOSXL properties isn't so easy, though. Ideally, viaf:hasEstablishedForm and viaf:hasXrefAlternate would be moved to skosxl:prefLabel and skosxl:altLabel, respectively. The difficulty is in the SKOS S14 integrity condition:

S14: A resource has no more than one value of skos:prefLabel per language tag.

In VIAF, "preferred" literals are coupled with contributing agencies rather than language tag. There isn't a trivial one-to-one mapping between agency and language tag, so some work is needed to satisfy the condition.
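
The conflict with S14 can be shown with a small check. In this hedged Python sketch the agencies, language tags, and literals are invented placeholders; the point is only that two agencies cataloguing in the same language each contribute a "preferred" form, which puts two preferred labels under one language tag.

```python
from collections import Counter

# Hypothetical preferred headings: (contributing agency, language tag, literal).
# Agency names, tags, and literals are illustrative assumptions.
preferred = [
    ("Agency A", "en", "Doe, Jane, 1900-1980"),
    ("Agency B", "en", "Doe, Jane"),
    ("Agency C", "de", "Doe, Jane"),
]


def s14_violations(labels):
    """Return the language tags that carry more than one preferred label."""
    counts = Counter(lang for _agency, lang, _literal in labels)
    return sorted(lang for lang, n in counts.items() if n > 1)


print(s14_violations(preferred))
```

With the placeholder data, the tag "en" is reported because Agency A and Agency B both supply a preferred form in English.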

The rdf:type on the 303 URI is unstable

The rdf:type assigned to VIAF 303 URIs (e.g. http://viaf.org/viaf/108389263) has been unstable and remains uncomfortable. Originally it was a foaf:Person then a skos:Concept, and is currently a viaf:NameAuthorityCluster with the foaf:Person and skos:Concept identified with hash URIs:

It's becoming quite clear, though, that people don't understand hash URIs and most automatically assume that the 303 URI is the only one that matters. Here are some possible resolutions that would be nice to have some feedback on:

  1. shuffle the URIs so the Person/Organization gets the 303 instead (and risk confusing existing users)
  2. merge the rdf:types into the individual identified with the 303 even if it creates a weird chimera of Person (or Organization, etc.), skos:Concept, viaf:NameAuthorityCluster (and risk confusing existing and future users)

Sharing Linked Data URIs in MARC

Some of the contributing agencies are starting to identify "the thing" the authority record is about. If these identifiers are included in the authority records that contributors send in, then VIAF can assert owl:sameAs to wire them together in the cluster. Corine Deliot has described a solution derived from MARBI discussions of ISNI. (Her example using the VIAF URI is problematic as described in the previous section.)
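
As a rough sketch of the wiring described above, the following Python emits owl:sameAs statements in N-Triples form. The function name and the example.org contributor URIs are placeholders, and the choice of the #foaf:Person hash URI as the subject (rather than the 303 URI) is an assumption that follows from the concern raised in the previous section.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(thing_uri, source_uris):
    """Emit one N-Triples line per contributor-supplied identifier."""
    return [f"<{thing_uri}> <{OWL_SAME_AS}> <{uri}> ." for uri in source_uris]


lines = same_as_triples(
    "http://viaf.org/viaf/96994048/#foaf:Person",
    [
        "http://example.org/agency-a/person/1",  # placeholder contributor URI
        "http://example.org/agency-b/person/2",  # placeholder contributor URI
    ],
)
print("\n".join(lines))
```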

Related Use Cases and Unanticipated Uses (optional)

The scenario above describes a particular case of using linked data. However, by allowing this scenario to take place, the likely solution allows for other use cases. This section captures unanticipated uses of the same system apparent in the use case scenario.

VIAF seems to be a specific example of Use Case Vocabulary Merging.

Library Linked Data Dimensions / Topics

The dimensions and topics are used to organize the use cases. At the same time, they might help you to identify additional aspects currently not covered. If appropriate topics and/or dimensions are missing, please specify them here and annotate them by a “*”.

*these items are not in the initial list, suggestion for adding them

References (optional)

This section is used to refer to cited literature and quoted websites.