Converting XML-encoded texts to RDF and back
We created a generic research platform called Knora, intended primarily for use in the humanities. Internally, Knora uses an RDF triplestore, and it offers its users a RESTful API for all necessary operations (reading, creating, updating, and deleting data). The Knora base ontology provides basic value types designed for the representation of qualitative data, including versioning and permissions.
Marked-up texts (e.g., for digital editions) are an important part of data in the humanities. Our goal is to represent these texts adequately in RDF once they have been imported into Knora, and to export them again if the user wishes to do so. At the moment, we support the import of XML-encoded texts into Knora and their export as XML. The export delivers an XML document that is equivalent to the imported one (equivalent, but not necessarily identical at the character stream level).
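To illustrate what "equivalent but not identical" can mean, here is a minimal Python sketch. It assumes one common notion of equivalence, namely that two documents have the same canonical XML form. This is only an illustration and not necessarily the comparison Knora itself performs. The two sample documents are made up.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same document: they differ in
# attribute order, quote style, and the use of an empty-element tag.
imported = "<text><p id='a' lang=\"en\">Hello <pb n=\"1\"/>world</p></text>"
exported = '<text><p lang="en" id="a">Hello <pb n="1"></pb>world</p></text>'

def is_equivalent(xml_a: str, xml_b: str) -> bool:
    """Compare two XML strings by their canonical (C14N 2.0) form."""
    return ET.canonicalize(xml_a) == ET.canonicalize(xml_b)

same_characters = imported == exported
same_document = is_equivalent(imported, exported)
```

Here `same_characters` is false while `same_document` is true: the character streams differ, but both strings describe the same elements, attributes, and text.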
Before an XML document representing a text can be imported, a mapping has to be provided. A mapping expresses the relations between XML elements and attributes and their corresponding entities defined in ontologies (classes and properties). With a mapping provided, XML documents can be converted to RDF and stored in Knora’s triplestore. During the conversion, markup and content are separated, since we use a so-called standoff-based approach: the markup refers to positions or ranges in the text via the index positions of single characters. The text is stored as a string and the markup is represented as RDF triples, which makes it possible to query the markup with SPARQL.
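The separation of markup and content can be sketched in a few lines of Python. This is a simplified illustration of the standoff idea, not Knora's actual implementation: the `StandoffTag` class, the function names, and the `ex:` identifiers are hypothetical, and the mapping step is reduced to reusing the XML element and attribute names directly.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class StandoffTag:
    name: str          # XML element name
    start: int         # index of the first character covered
    end: int           # index after the last character covered
    attributes: dict = field(default_factory=dict)

def xml_to_standoff(xml_string):
    """Split an XML document into a plain string and a list of standoff tags."""
    chunks, tags = [], []
    pos = 0

    def walk(element):
        nonlocal pos
        tag = StandoffTag(element.tag, pos, pos, dict(element.attrib))
        tags.append(tag)
        if element.text:
            chunks.append(element.text)
            pos += len(element.text)
        for child in element:
            walk(child)
            if child.tail:  # text between the child's end tag and the next tag
                chunks.append(child.tail)
                pos += len(child.tail)
        tag.end = pos

    walk(ET.fromstring(xml_string))
    return "".join(chunks), tags

def to_triples(tags):
    """Render the tags as (subject, predicate, object) tuples with made-up names."""
    triples = []
    for i, tag in enumerate(tags):
        subject = f"ex:tag{i}"
        triples.append((subject, "rdf:type", f"ex:{tag.name}"))
        triples.append((subject, "ex:hasStart", tag.start))
        triples.append((subject, "ex:hasEnd", tag.end))
        for key, value in tag.attributes.items():
            triples.append((subject, f"ex:{key}", value))
    return triples

text, tags = xml_to_standoff("<text><p>Hello <italic>brave</italic> world</p></text>")
triples = to_triples(tags)
```

For the sample input, `text` is the string `Hello brave world`, and the `italic` tag covers the range 6 to 11, so `text[6:11]` yields `brave`. In a real mapping, the element names would be resolved to classes and properties defined in an ontology rather than copied as they are.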
Our goal is to develop an editor that allows for creating and editing texts directly in a native standoff format. For now, we are still using embedded markup (e.g., HTML in a browser-based GUI) that is converted to RDF and back, which limits the advantages of the standoff approach. One of the main advantages of standoff is the ability to add layers of annotations to a text without interfering with the existing markup. Embedded markup such as XML, by contrast, cannot directly express annotations whose ranges overlap. Our approach is inspired by Desmond Schmidt’s work: http://multiversiondocs.blogspot.com
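The following short Python sketch illustrates why overlap is unproblematic in standoff. The tag names, ranges, and the `overlaps` helper are invented for this example: an annotation layer is simply a second list of ranges over the same string, and the existing markup is never touched.

```python
def overlaps(a, b):
    """True if two (start, end) ranges cross without one containing the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start < b_start < a_end < b_end or b_start < a_start < b_end < a_end

text = "Hello brave world"

# Existing markup layer: "brave" is italic.
markup = [("italic", 6, 11)]

# A second, independent layer (hypothetical): a comment on "Hello br".
annotations = [("comment", 0, 8)]

# Both layers refer to the same string by character index.
covered = {name: text[start:end] for name, start, end in markup + annotations}

crossing = overlaps(markup[0][1:], annotations[0][1:])
```

Here `crossing` is true: the comment starts before the italic range and ends inside it. As embedded XML, this would require splitting one of the two elements, whereas the two standoff layers coexist unchanged.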
More information about the creation and handling of standoff markup in Knora is available here:
- mapping XML to RDF: https://docs.knora.org/paradox/03-apis/api-v1/xml-to-standoff-mapping.html
- standoff entities defined in the Knora base ontology: https://docs.knora.org/paradox/02-knora-ontologies/knora-base.html#text-with-standoff-markup
- tests that illustrate the use of the XML to standoff conversion: https://github.com/dhlab-basel/Knora/blob/develop/webapi/src/test/scala/org/knora/webapi/e2e/v1/StandoffV1R2RSpec.scala