Experience - Richard Wallis

From Bibframe2Schema.org Community Group

Sharing Experience

For several years I have been promoting the benefits of websites in general, and cultural heritage websites in particular, publishing Schema.org data to be consumed by search engines and others, to aid the discoverability of the resources they describe. Alongside these efforts I have worked on several practical approaches to capturing bibliographic description from MARC records and representing that description as Schema.org data that search engines understand.

Schema Bib Extend Community Group

As Chair of the Schema Bib Extend Community Group I helped introduce several enhancements into the Schema.org vocabulary, and its bibliographic extension (bib.schema.org), that improved its capabilities for representing bibliographic concepts. Examples include the properties exampleOfWork & workExample, which enable the representation of relationships between FRBR concepts such as Works, Expressions, Manifestations and Items, and the properties translationOfWork & workTranslation, which interlink translations.

In addition to proposing enhancements to the vocabulary, the Group also shared several practical examples of how to represent bibliographic metadata concepts and data. These have helped many people from a bibliographic/MARC metadata background get to grips with how the approaches taken by Schema.org, a generic cross-domain vocabulary, differ from those they are familiar with. For example, a bibliographic Work can be represented using the Schema CreativeWork type, whereas a Manifestation is represented using a combination of the Schema CreativeWork & Product types.

One of the initial challenges, both in the Group's discussions and in practical implementations, was the lack of a broadly understood entity-based view of bibliographic resources and their associated metadata standards. Initial Bibframe standards, and trials, did help somewhat in this area.

The Challenge of Converting MARC to an Entity-based View

Converting MARC directly to Schema.org introduces two significant challenges: first, identifying and extracting the entities (Work, Manifestation, Person, Organization, Place, etc.) from a source record; and second, representing those entities, and their relationships, in Schema.org.

Fortunately, Bibframe 2.0, with its supporting conversion specifications and example conversion programs, has introduced a standardised way of producing bibliographic entity descriptions from MARC records. Their acceptance, albeit not yet universal, is sufficient to build upon, which removes the first of those two challenges. In simple terms: "Let the Bibframe community establish how to extract and represent bibliographic entities from MARC data, and we can focus on how to represent those [Bibframe] entities and relationships in Schema.org."

It's Not Just a Simple Conversion

In working with, and advising, several national and other libraries and some of their system suppliers, it has become very clear that a simple one-to-one conversion script is not sufficient to produce useful Schema.org data from a Bibframe (2.0) source. Several challenges lead to this conclusion, for example:

  • Reification. Given the relationship between Item, Instance and Work entities in Bibframe, properties such as author, title, etc. are considered to be work-level properties. They are discovered from an Item by following its itemOf property to the Instance, and then from the Instance via its instanceOf property to the Work. In Schema.org, by contrast, all these values would be represented in each of the descriptions.
  • Type Selection. In Bibframe only the general Work, Instance and Item types are available; property values are used to identify the form and format (Book, Article, CD, etc.) of the entity being described. Schema.org uses specific entity types to represent these (Book, Article, Painting, etc.).
  • Agents. There is no Agent type in Schema.org. On the wider web it is generally assumed that a data publisher will know whether an entity represents an Organization or a Person. This conflicts with the general use of Agent in Bibframe, and requires some conversion processing to decide which type to use, or default to, when producing Schema.org data.
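Type Selection can be handled in SPARQL with a small lookup table. The following is a minimal sketch, not a tested conversion rule: it assumes the form of a Work is recorded with the Bibframe 2.0 bf:genreForm property, and the form: namespace and its terms are hypothetical placeholders for whatever form/genre vocabulary a given dataset actually uses.

```sparql
PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
PREFIX schema: <http://schema.org/>
# Hypothetical namespace standing in for a real form/genre vocabulary
PREFIX form:   <http://example.org/form/>

INSERT {
  # Add a specific Schema.org type alongside any generic ones
  ?s a ?schemaType .
}
WHERE {
  ?s a bf:Instance ;
     bf:instanceOf ?w .
  ?w bf:genreForm ?form .
  # Lookup table from form value to Schema.org type; Instances whose
  # form is not listed here match nothing and gain no specific type
  VALUES (?form ?schemaType) {
    (form:book     schema:Book)
    (form:article  schema:Article)
    (form:painting schema:Painting)
  }
}
```

Keeping the mapping in a VALUES table means new form-to-type pairs can be added without touching the rest of the rule.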

Not all conversion processes are this complex. For example, the label properties of Bibframe Title entities can all easily be converted to Schema name properties.

Conversion Processing

I have tried several approaches to conversion processing: XSLT transform scripts parsing XML-formatted Bibframe; bespoke Python programs; SPARQL scripts operating on Bibframe data loaded into a triplestore; and SPARQL scripts running inline as part of a data conversion pipeline.

The greatest flexibility and success has come from using SPARQL. It can easily be applied either in inline scripts or in a triplestore, and it can be used either to CONSTRUCT Schema.org triples for output or to INSERT Schema.org triples into the dataset being queried. Because an RDF graph is a set of triples, creating the same triple more than once has no effect, so overlapping conversion rules do not produce duplicate data. SPARQL is equally applicable to simple one-to-one triple creation and to complex, situation-dependent processing.
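As an illustration of the CONSTRUCT form applied to situation-dependent processing, here is a minimal sketch of resolving Bibframe Agents into Schema.org types. The default policy (organizations stay organizations, everything else becomes a Person) is an assumption chosen for illustration, and the rule assumes agents carry bf:Agent alongside any more specific Bibframe type such as bf:Organization.

```sparql
PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
PREFIX schema: <http://schema.org/>

# CONSTRUCT returns the new Schema.org triples to the caller and leaves
# the Bibframe dataset unchanged (unlike INSERT)
CONSTRUCT {
  ?agent a ?schemaType .
}
WHERE {
  ?agent a bf:Agent .
  # Assumed policy: agents also typed bf:Organization become
  # schema:Organization; all other agents default to schema:Person
  BIND(
    IF(EXISTS { ?agent a bf:Organization },
       schema:Organization,
       schema:Person)
    AS ?schemaType)
}
```

Swapping CONSTRUCT for INSERT (keeping the same WHERE clause) would write these triples back into the dataset instead.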

Simple example

Creating a Schema name triple for each entity that has a Bibframe Title entity

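PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

# Copy the label of each entity's Bibframe Title to schema:name
INSERT {
    ?s schema:name ?o .
}
WHERE {
    ?s bf:title/rdfs:label ?o .
}

Complex example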

Identifying each entity of the Bibframe Instance type and adding the Schema CreativeWork & Product types, an exampleOfWork triple linking it to its Work (with workExample as the inverse link), plus relevant triples copied from that Work for creator, inLanguage, etc.

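PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
PREFIX schema: <http://schema.org/>

INSERT {
  # An Instance (Manifestation) is both a CreativeWork and a Product
  ?s a schema:CreativeWork, schema:Product ;
     schema:exampleOfWork ?w ;
     # Work-level values copied down onto the Instance description
     schema:contributor ?contributor ;
     schema:creator ?creator ;
     schema:description ?description ;
     schema:genre ?genre ;
     schema:inLanguage ?language .

  # Inverse link from the Work back to its Instance
  ?w schema:workExample ?s .
}
WHERE {
  ?s a bf:Instance ;
     bf:instanceOf ?w .
  # Work properties are joined to the same ?w; any that are missing stay
  # unbound, and template triples using them are simply not inserted
  OPTIONAL { ?w bf:contributor ?contributor . }
  OPTIONAL { ?w bf:creator ?creator . }
  OPTIONAL { ?w bf:description ?description . }
  OPTIONAL { ?w bf:genre ?genre . }
  OPTIONAL { ?w bf:language ?language . }
}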
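The complex example stops at the Instance. Following the Reification point, the same copy-down pattern could be extended one level to Items. This is a minimal sketch that reuses the bf:creator and bf:language properties from the complex example and walks from an Item to its Work with a single SPARQL property path; which values to copy onto Items is an assumption for illustration.

```sparql
PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
PREFIX schema: <http://schema.org/>

INSERT {
  # Work-level values repeated on the Item's own description
  ?item schema:creator    ?creator ;
        schema:inLanguage ?language .
}
WHERE {
  # Property path: Item --bf:itemOf--> Instance --bf:instanceOf--> Work
  ?item a bf:Item ;
        bf:itemOf/bf:instanceOf ?w .
  OPTIONAL { ?w bf:creator  ?creator . }
  OPTIONAL { ?w bf:language ?language . }
}
```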