W3C

RDFTM: Survey of Interoperability Proposals

W3C Editor's Draft 24 February 2005

This version:
http://www.w3.org/2001/sw/BestPractices/RDFTM/survey-2005-02-24
Latest version:
http://www.w3.org/2001/sw/BestPractices/RDFTM/survey
Previous version:
This is the first Working Draft
Editors:
Steve Pepper, Ontopia <pepper@ontopia.net>
Fabio Vitali, University of Bologna <fabio@cs.unibo.it>

Abstract

The Resource Description Framework (RDF) is a model developed by the W3C for representing information about resources in the World Wide Web. Topic Maps is a model for knowledge integration developed by the ISO. This document contains a survey of existing proposals for integrating RDF and Topic Maps data and is intended to be a starting point for establishing standard guidelines for RDF/Topic Maps interoperability.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is the first deliverable of the RDF/Topic Maps Interoperability Task Force (RDFTM) initiated by the W3C with the support of the ISO Topic Maps committee (ISO/IEC JTC1/SC34).

This document is a W3C Editor's Draft and is expected to change. The SWBPD WG does not expect this document to become a Recommendation. Rather, after further development, review and refinement, it will be published and maintained as a WG Note.

This document is not yet a Public Working Draft. We encourage public comments. Please send comments to public-swbp-wg@w3.org and include the text "comment" in the subject line.

Table of contents

1 Introduction
    1.1 Background
    1.2 Overview of Proposals
2 Criteria for evaluating the proposals
    2.1 Translation features
    2.2 Major issues
        2.2.1 Topic Maps issues
        2.2.2 RDF issues
    2.3 Test Cases
        2.3.1 TM2RDF Test Case
        2.3.2 RDF2TM Test Case
3 Existing translation proposals
    3.1 The Moore Proposal
        3.1.1 Description
        3.1.2 Analysis
        3.1.3 Test Cases
    3.2 The Stanford Proposal
        3.2.1 Description
        3.2.2 Analysis
        3.2.3 Test Cases
    3.3 The Ogievetsky Proposal
        3.3.1 Description
        3.3.2 Analysis
        3.3.3 Test Cases
    3.4 The Garshol Proposal
        3.4.1 Description
        3.4.2 Analysis
        3.4.3 Test Cases
    3.5 The Unibo Proposal
        3.5.1 Description
        3.5.2 Analysis
        3.5.3 Test Cases
    3.6 Other Proposals and Contributions
4 Analysis
    4.1 Object mappings and semantic mappings
    4.2 The importance of being faithful
    4.3 Semantic mapping issues
        4.3.1 Identity
        4.3.2 Topic names
        4.3.3 Binary associations
        4.3.4 N-ary associations
        4.3.5 Occurrences
        4.3.6 Types and subtypes
        4.3.7 Reification
        4.3.8 Scope
        4.3.9 Other issues
5 Conclusion
Acknowledgements
References

1 Introduction

1.1 Background

The Resource Description Framework (RDF) is a model developed by the W3C for representing information about resources in the World Wide Web. Topic Maps is a model for knowledge integration developed by the ISO. The two specifications were developed in parallel during the late 1990s within their separate organizations for what at first appeared to be very different purposes. The results, however, turned out to have a lot in common and this has led to calls for their unification.

While unification has to date not been possible (for a variety of technical and political reasons), a number of attempts have been made to uncover the synergies between RDF and Topic Maps and to find ways of achieving interoperability at the data level. There is now widespread recognition within the respective user communities that achieving such interoperability is a matter of some urgency. A Task Force has therefore been established by the W3C with the support of the ISO Topic Maps committee to address this issue.

The RDF/Topic Maps Interoperability Task Force has initially been chartered to provide "guidelines for users who want to combine usage of the W3C's RDF/OWL family of specifications and the ISO's family of Topic Maps standards." Two deliverables will be produced:

This document is the first of those deliverables. It consists of a summary of the major existing proposals for achieving data interoperability, an analysis, and suggestions pointing towards a general, unified approach that will be described in the second deliverable.

1.2 Overview of Proposals

Five existing proposals have been identified as being sufficiently complete and well-documented to be suitable for detailed examination. They will be referred to by the names of their authors or, in the case of multiple authors, by the name of the organization to which they are affiliated. They are, in chronological order:

Moore

RDF2TM and TM2RDF proposal described in [Moore 01]. Not implemented. Has no references to previous work.

Stanford

TM2RDF proposal described in [Lacher 01]. Implemented. References [Moore 01].

Ogievetsky

TM2RDF proposal described in [Ogievetsky 01b]. Implemented in the XTM2RDF Translator. References [Moore 01] and [Lacher 01].

Garshol

RDF2TM and TM2RDF proposal described in [Garshol 01] and [Garshol 03a]. Documented in [Garshol 03b], [Ontopia 03a], and [Ontopia 03b], and implemented in Omnigator and the Ontopia Knowledge Suite. References [Moore 01], [Lacher 01] and [Ogievetsky 01b].

Unibo

RDF2TM and TM2RDF proposal described in [Gentilucci 02] and [Ciancarini 03]. Implemented in Meta. References [Moore 01], [Lacher 01], [Ogievetsky 01b], [Garshol 01] and [Garshol 03a].

The following proposals will only be considered briefly since they are insufficiently complete to warrant detailed examination:

The following contributions are also recognized as being relevant:

This survey compares and evaluates the five proposals identified above in three related but independent ways: by examining key features qualifying the translation proposal, key issues in the two standards that need to be considered, and a number of test cases that show in practice the desired result.


2 Criteria for evaluating the proposals

2.1 Translation features

Each translation proposal is evaluated against the following general criteria:

Completeness

The criterion completeness is used to evaluate the extent to which each proposal is able to handle every semantic construct that can be expressed in the source model and provide a means to represent it without loss of information in the target model. A complete translation will by definition be reversible.

Fidelity

The criterion fidelity expresses the degree to which the results of a translation are faithful to the underlying conceptual model of the target paradigm. This quality can be thought of as naturalness, that is, as corresponding to the way in which someone familiar with the target paradigm would naturally express the information content in that paradigm. Naturalness normally also confers improved readability on the result.


2.2 Major issues

This section briefly characterizes some of the major issues in translating data from RDF to Topic Maps and vice versa. The five existing proposals will also be evaluated in terms of the degree to which they are able to provide satisfactory solutions to these issues.

2.2.1 Topic Maps issues

Identity

In Topic Maps, a topic can have multiple identifiers of different types (subject identifiers, subject locators, and source locators), whereas RDF only allows a single identifier.

Scope

Topic Maps allow statements about the contextual validity of an assertion to be made using scope. No equivalent concept exists in RDF.

N-ary associations

Associations in Topic Maps can have one, two or more role players. An RDF statement represents a binary relationship.

Association roles

Associations in Topic Maps are inherently "multidirectional" and use association roles to characterize the nature of the role-playing topics' involvement in the relationship. In RDF there is no need for association roles because relationships are always binary and directional. One result of this is that symmetric relationships are always represented as such in Topic Maps whereas this is not possible in RDF (except with the help of OWL).

Variants

Topic names (base names) may have multiple variants for use in different processing contexts. RDF has no equivalent concept.

Reification

Topic Maps has a relatively simple mechanism for reifying names, associations, occurrences, association roles and topic maps. RDF's reification mechanism appears to be controversial in some quarters. Whether or not this is true, reification is an issue that needs to be addressed.
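
The mismatch between n-ary associations with typed roles and binary RDF statements can be made concrete with a small sketch. The decomposition below, which hangs one binary statement per role off an intermediate node, is only one possible strategy and is not prescribed by this survey; the class and function names (Role, Association, to_binary_statements) and the node label "_:a1" are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Role:
    role_type: str  # the association role type, e.g. "composer"
    player: str     # the topic playing that role, e.g. "puccini"


@dataclass(frozen=True)
class Association:
    assoc_type: str
    roles: Tuple[Role, ...]  # one, two or more role players


def to_binary_statements(assoc: Association, node_id: str) -> List[Triple]:
    """Hang every role off one intermediate node as a binary statement."""
    triples = [(node_id, "rdf:type", assoc.assoc_type)]
    for role in assoc.roles:
        triples.append((node_id, role.role_type, role.player))
    return triples


composed_by = Association(
    "composed-by",
    (Role("work", "tosca"), Role("composer", "puccini")),
)
for triple in to_binary_statements(composed_by, "_:a1"):
    print(triple)
```

An association with n role players yields n + 1 statements under this scheme, so the cost grows with the number of roles rather than being fixed at one statement per relationship.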

2.2.2 RDF issues

Containers

RDF has the concept of containers, which may be bags, sequences or alternatives and are described using the predefined types rdf:Bag, rdf:Seq and rdf:Alt, and the predefined properties rdf:_1 .. rdf:_n. Topic Maps has no equivalent concept.

Collections

RDF has the concept of (closed) collections described using the built-in type rdf:List, the built-in properties rdf:first and rdf:rest, and the predefined resource rdf:nil. Topic Maps has no equivalent concept.

Language tags

Plain literals may have optional language tags (expressed in RDF/XML using the xml:lang attribute). Topic Maps has no formal equivalent, although scope is widely used for this purpose.

XML literals

RDF allows the value of a property to be an XML literal. Topic Maps does not currently permit arbitrary XML to be embedded inside an XTM document (however, the revised version of ISO 13250 will allow this).

Typed literals

RDF permits datatyping of literals; the current version of ISO 13250 does not allow this but the revised version will.
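
To show what a translation has to cope with, the following minimal sketch expands a list of members into the statements that make up an RDF container and an RDF collection. The vocabulary terms (rdf:Bag, rdf:Seq, rdf:Alt, rdf:_n, rdf:first, rdf:rest, rdf:nil) are those named above; the function names and the blank node labels ("_:cell0" and so on) are hypothetical, and members are treated as opaque strings.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def as_container(node: str, kind: str, members: List[str]) -> List[Triple]:
    """Expand members into rdf:_1 .. rdf:_n statements on a container node.

    kind is expected to be one of "rdf:Bag", "rdf:Seq" or "rdf:Alt".
    """
    triples = [(node, "rdf:type", kind)]
    for index, member in enumerate(members, start=1):
        triples.append((node, f"rdf:_{index}", member))
    return triples


def as_collection(members: List[str]) -> Tuple[str, List[Triple]]:
    """Expand members into a closed rdf:first / rdf:rest chain.

    Returns the head of the list and the statements that describe it.
    """
    if not members:
        return "rdf:nil", []
    cells = [f"_:cell{i}" for i in range(len(members))]
    triples: List[Triple] = []
    for i, member in enumerate(members):
        rest = cells[i + 1] if i + 1 < len(cells) else "rdf:nil"
        triples.append((cells[i], "rdf:first", member))
        triples.append((cells[i], "rdf:rest", rest))
    return cells[0], triples


print(as_container("_:seq", "rdf:Seq", ["a", "b"]))
print(as_collection(["a", "b"]))
```

The collection is closed because the final rdf:rest points at rdf:nil, whereas nothing in the container form says that rdf:_2 is the last member.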


2.3 Test Cases

This survey uses two simple test cases to enable an initial evaluation of the criterion "fidelity". These are distinct from the larger set of test cases being developed for use in evaluating the Guidelines that will be produced by the RDFTM Task Force as its second deliverable.

Test cases and the results of translations are given in LTM and N3 notation (for Topic Maps and RDF respectively) in order to aid readability. There is one test case for TM2RDF translations and a second for RDF2TM translations.

2.3.1 TM2RDF Test Case

This test case consists of the assertions that the opera Tosca was premiered on 14th January 1900, has a synopsis at a certain location, and was composed by the composer Giacomo Puccini. All topics have a single subject identifier.

The example is taken from Steve Pepper's Italian Opera Topic Map at http://www.ontopia.net/omnigator/docs/navigator/opera.hytm. [@@Note: The example was modified to use role types that are different from the corresponding topic types. All test case results need to be reviewed.]

[puccini : person   = "Giacomo Puccini" @"http://en.wikipedia.org/wiki/Puccini"]
[tosca   : opera    = "Tosca"  @"http://psi.ontopia.net/opera/#tosca"]

{tosca, premiere-date, [[1900-01-14]]}
{tosca, synopsis,      "http://www.azopera.com/learn/synopsis/tosca.shtml"}

composed-by( tosca : work, puccini : composer )

[person        = "Person"        @"http://psi.ontopia.net/music/#person"]
[composer      = "Composer"      @"http://psi.ontopia.net/music/#composer"]
[opera         = "Opera"         @"http://psi.ontopia.net/music/#opera"]
[work          = "Work"          @"http://psi.ontopia.net/music/#work"]
[premiere-date = "Première date" @"http://psi.ontopia.net/music/#premiere-date"]
[synopsis      = "Synopsis"      @"http://psi.ontopia.net/music/#synopsis"]
[composed-by   = "Composed by"   @"http://psi.ontopia.net/music/#composed-by"]

2.3.2 RDF2TM Test Case

This test case consists of the assertions that a concert took place on a certain date, at a certain venue, with a particular conductor and soloist. The location of the venue is given by coordinates. The concert, venue, conductor and soloist all have labels and are represented by blank nodes. All classes and properties have labels.

The example is taken from Masahide Kanzaki's Music Vocabulary example at http://www.kanzaki.com/ns/music.

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ev: <http://ebiquity.umbc.edu/v2.1/ontology/event.owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix music: <http://www.kanzaki.com/ns/music#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

[ rdf:type  music:Concert;
  dc:title "Tokyo Green Symphony Orchestra 12th Concert";
  dc:date "2003-11-02T14:00+09:00";
  ev:location  [
    rdf:type music:Venue;
    dc:title "Sumida Triphony";
    geo:lat "35.69";
    geo:long "139.81" ];
  music:conductor  [
    rdf:type music:Conductor;
    foaf:name "Yuri Nitta" ];
  music:performer  [
    rdf:type music:Violinist;
    foaf:name "Tomoko Kawada" ]
] .

music:Concert    rdfs:label "Concert" .
music:Conductor  rdfs:label "Conductor" .
music:Venue      rdfs:label "Venue" .
music:Violinist  rdfs:label "Violinist" .

music:conductor  rdfs:label "Conductor" .
music:performer  rdfs:label "Performer" .

dc:date     rdfs:label "Date" .
dc:title    rdfs:label "Title" .
ev:location rdfs:label "Event Location" .
foaf:name   rdfs:label "Name" .
geo:lat     rdfs:label "Latitude" .
geo:long    rdfs:label "Longitude" .

3 Existing translation proposals

3.1 The Moore Proposal

3.1.1 Description

[Moore 01] was the first paper to address the issue of interoperability between RDF and Topic Maps. The paper starts out by presenting data models, developed by the author, that "capture the isness [sic] of the two paradigms". Having presented the two models, Moore introduces the distinction between what he calls "mapping the model" and "modelling the model". The key difference is that the former is "semantic", whereas the latter "uses each standard as a tool for describing other models".

Moore provides examples of both strategies, but states clearly that "mapping" is preferable to "modelling". The reason for this is that a goal is to be able to run, say, a TMQL query against an RDF model and get "expected results" ("i.e. those that would be gained from running a query against the equivalent topic map"). Moore points out that this is only possible when a mapping approach is used.

"Modelling the model"

Moore's RDF2TM modelling approach is based on defining PSIs for every RDF construct (i.e., resource, statement, property, subject, object, identity, literal, and model) and expressing RDF statements as ternary associations of type rdf-statement using the role types rdf-subject, rdf-property and rdf-object. This raises issues with the handling of literals (since role players in associations cannot be strings) to which no clear solution is proposed.

The TM2RDF modelling approach is based on defining RDF properties for each TM construct as follows: topic, topicassoc, instanceof, topicassocmember, roleplayingtopic, roledefiningtopic, topicoccur, topicname, topicnamevalue, scopeset, subjindicatorref, resourceref. An example of a simple binary association is given that involves five topics (for the association type and role types, in addition to the role-playing topics). The RDF equivalent of this requires 22 statements, three for each of the five topics, and seven for the association itself.
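
The statement count can be checked with a short sketch that generates the same pattern of statements as the hand-constructed result in section 3.1.3. The property URIs are taken from that result; the function names and the blank node labels are hypothetical, and the sketch is an illustration of the pattern rather than Moore's own implementation (the proposal was not implemented).

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]
TM = "http://www.empolis.com/rdftmmapping#tm-"


def topic_statements(psi: str, name: str, n: int) -> List[Triple]:
    """Three statements per topic: name node, name value, subject indicator."""
    name_node = f"_:topic{n}"
    return [
        (f"<{psi}>", f"<{TM}topicname>", name_node),
        (name_node, f"<{TM}topicnamevalue>", f'"{name}"'),
        (f"<{psi}>", f"<{TM}subjindicatorref>", f'"{psi}"'),
    ]


def association_statements(
    assoc_type: str, members: Sequence[Tuple[str, str]], n: int = 1
) -> List[Triple]:
    """One type statement, then per member: one link and two role statements.

    members is a sequence of (role type URI, role player URI) pairs.
    """
    assoc = f"_:assoc-{n}"
    triples = [(assoc, f"<{TM}instanceof>", f"<{assoc_type}>")]
    nodes = [f"_:assocmember-{i}" for i in range(1, len(members) + 1)]
    for node in nodes:
        triples.append((assoc, f"<{TM}topicassocmember>", node))
    for node, (role_type, player) in zip(nodes, members):
        triples.append((node, f"<{TM}roledefiningtopic>", f"<{role_type}>"))
        triples.append((node, f"<{TM}roleplayingtopic>", f"<{player}>"))
    return triples


MUSIC = "http://psi.ontopia.net/music/#"
PUCCINI = "http://en.wikipedia.org/wiki/Puccini"
TOSCA = "http://psi.ontopia.net/opera/#tosca"

topics = [
    (PUCCINI, "Giacomo Puccini"),
    (TOSCA, "Tosca"),
    (MUSIC + "composer", "Composer"),
    (MUSIC + "opera", "Opera"),
    (MUSIC + "composed-by", "Composed by"),
]

statements: List[Triple] = []
for number, (psi, name) in enumerate(topics, start=1):
    statements += topic_statements(psi, name, number)
statements += association_statements(
    MUSIC + "composed-by",
    [(MUSIC + "composer", PUCCINI), (MUSIC + "work", TOSCA)],
)
print(len(statements))  # 5 * 3 + 7 = 22
```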

"Mapping the model"

Moore concludes that the "modelling" approach, while interesting, is of limited usefulness, and he goes on to describe a "mapping" approach based on the observation that "RDF is concerned with describing the arcs between entities with identity [whereas] Topic Maps is concerned with describing typed relationships between entities with identity." A number of semantic equivalences are defined, as follows:

RDF          Topic Maps
RDF model    Topic Map
Identity     SubjectIndicatorReference
Resource     Topic
Statement    Association (approximate)

The mapping from RDF statement to association is identified as being problematic because "RDF has three pieces of information and Topic Map associations have five", leading the author to suspect that a "complete" mapping of the models may not be possible. The remainder of the paper is devoted to examining how to represent RDF statements as associations and vice versa.

RDF statements are viewed as binary associations whose role-players correspond to the subject and object of the statement and have the role types 'subject' and 'object' respectively. The mechanism for representing the property of the statement is unclear, since the text and diagram appear to be at odds with one another. However, both text and diagram assign some significance to the name of the topic that represents the subject role.

According to Moore, this approach has a problem in that 'arc' is "not a first class entity model". Why this should be a problem is not entirely clear. To solve it, Moore advocates extending the Topic Maps model with the notion of arcs (and association templates), but the usefulness of this is also not clear.

Viewing associations as RDF statements employs a different approach. An incomplete example shows a binary association being represented as two RDF statements, with the role-playing topics being the subject and object in the one and the object and subject in the other. This approach is perhaps motivated by the recognition that RDF statements have direction whereas associations do not. However this is not stated explicitly; nor is it clear how the approach would work with associations that involve more than two role players.

3.1.2 Analysis

  1. Completeness: The "modelling" approaches are reasonably complete. The "mapping" approach is simply a sketch that focuses on RDF statements and associations. Other constructs like names, occurrences and scope are not covered. Neither approach is reversible. In the case of the "modelling" approach, the assumption is that one is working in one domain or the other, but not in both. In the case of the "mapping" approach, the fact that a statement maps to a single association whereas an association maps to two statements shows that translations cannot be reversed.
  2. Fidelity: The "mapping" approach is shown to be superior to the "modelling" approach. The latter results in low fidelity in both directions. Whatever the direction, a "natural" source document leads to an "unnatural" result document and achieving a "natural" result document is only possible if the starting point is an "unnatural" source document. In the example given in [Moore 01], a simple binary association translates to 22 RDF statements. Moore's mapping approach achieves somewhat better fidelity: Going from Topic Maps to RDF, a binary association requires two RDF statements; going the other way, an RDF statement maps to a single association.
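
The reversibility argument for the "mapping" approach can be illustrated with a toy round trip. This is a sketch under a loud assumption: [Moore 01] leaves the representation of the property unclear, so here the association type simply stands in for it. All names are hypothetical and nothing below should be read as Moore's actual algorithm.

```python
from typing import List, NamedTuple, Tuple

Triple = Tuple[str, str, str]


class BinaryAssociation(NamedTuple):
    assoc_type: str      # ASSUMPTION: stands in for the RDF property
    subject_player: str  # plays the role of type 'subject'
    object_player: str   # plays the role of type 'object'


def statement_to_association(statement: Triple) -> BinaryAssociation:
    """RDF2TM direction: one statement becomes one binary association."""
    subject, prop, obj = statement
    return BinaryAssociation(prop, subject, obj)


def association_to_statements(assoc: BinaryAssociation) -> List[Triple]:
    """TM2RDF direction: one association becomes two mirrored statements."""
    return [
        (assoc.subject_player, assoc.assoc_type, assoc.object_player),
        (assoc.object_player, assoc.assoc_type, assoc.subject_player),
    ]


original = ("tosca", "composed-by", "puccini")
round_trip = association_to_statements(statement_to_association(original))
print(round_trip)
```

The round trip returns two statements where there was one, and the added mirror statement asserts a direction that the source never stated, which is why the two translations cannot be inverses of each other.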

3.1.3 Test Cases

TM2RDF

The following (incomplete) result of Moore's "modelling" approach was constructed by hand based on the binary association example given in [Moore 01]. It does not cover the two occurrences in the test case since there are no examples of how this proposal handles occurrences. Lack of clarity in [Moore 01] prevents the construction of a corresponding result of the "mapping" approach. However the latter could be expected to contain significantly fewer RDF statements.

# topic 1: puccini
<http://en.wikipedia.org/wiki/Puccini>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic1 .

_:topic1
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Giacomo Puccini" .

<http://en.wikipedia.org/wiki/Puccini>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://en.wikipedia.org/wiki/Puccini" .

# topic 2: tosca
<http://psi.ontopia.net/opera/#tosca>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic2 .

_:topic2
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Tosca" .

<http://psi.ontopia.net/opera/#tosca>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/opera/#tosca" .

# topic 3: composer
<http://psi.ontopia.net/music/#composer>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic3 .

_:topic3
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Composer" .

<http://psi.ontopia.net/music/#composer>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/music/#composer" .

# topic 4: opera
<http://psi.ontopia.net/music/#opera>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic4 .

_:topic4
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Opera" .

<http://psi.ontopia.net/music/#opera>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/music/#opera" .

# topic 5: composed-by
<http://psi.ontopia.net/music/#composed-by>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic5 .

_:topic5
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Composed by" .

<http://psi.ontopia.net/music/#composed-by>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/music/#composed-by" .

# association
_:assoc-1
 <http://www.empolis.com/rdftmmapping#tm-instanceof>
  <http://psi.ontopia.net/music/#composed-by> .
_:assoc-1
 <http://www.empolis.com/rdftmmapping#tm-topicassocmember>
  _:assocmember-1 .
_:assoc-1
 <http://www.empolis.com/rdftmmapping#tm-topicassocmember>
  _:assocmember-2 .
_:assocmember-1
 <http://www.empolis.com/rdftmmapping#tm-roledefiningtopic>
  <http://psi.ontopia.net/music/#composer> .
_:assocmember-1
 <http://www.empolis.com/rdftmmapping#tm-roleplayingtopic>
  <http://en.wikipedia.org/wiki/Puccini> .
_:assocmember-2
 <http://www.empolis.com/rdftmmapping#tm-roledefiningtopic>
  <http://psi.ontopia.net/music/#work> .
_:assocmember-2
 <http://www.empolis.com/rdftmmapping#tm-roleplayingtopic>
  <http://psi.ontopia.net/opera/#tosca> .

RDF2TM

This test case cannot be represented as a topic map in its entirety following the "modelling" approach because there is no provision for RDF statements whose objects are literals (which is the case for 19 of the 26 statements in the test case, including all the names, labels, and titles). The seven statements whose objects are resources would each be represented as a ternary association of type statement, as follows:

statement( ag0 : subject, performer : property, ag3 : object )

(This ternary association captures the assertion that the TGSO 12th Concert (ag0) has the performer Tomoko Kawada (ag3), both of which are blank nodes in the RDF.)
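
A minimal sketch of this translation step is given below. It emits one LTM association per statement whose object is a resource and sets aside statements whose objects are literals, since the proposal offers no way to handle them. The Literal wrapper, the function name and the sample data are hypothetical; the identifiers ag0 and ag3 are the blank node labels used in the example above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: str


Node = Union[str, Literal]
Statement = Tuple[str, str, Node]


def to_ltm_statements(
    statements: Iterable[Statement],
) -> Tuple[List[str], List[Statement]]:
    """Return (LTM associations, statements skipped because of literals)."""
    associations: List[str] = []
    skipped: List[Statement] = []
    for subject, prop, obj in statements:
        if isinstance(obj, Literal):
            # No provision in the "modelling" approach for literal objects.
            skipped.append((subject, prop, obj))
        else:
            associations.append(
                f"statement( {subject} : subject, {prop} : property, {obj} : object )"
            )
    return associations, skipped


sample: List[Statement] = [
    ("ag0", "performer", "ag3"),
    ("ag3", "name", Literal("Tomoko Kawada")),
]
ltm, unhandled = to_ltm_statements(sample)
print("\n".join(ltm))
print(len(unhandled), "statement(s) with literal objects not translated")
```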

Neither can the RDF2TM test case be represented as a topic map in accordance with Moore's alternative "mapping" approach, because of insufficient information in [Moore 01]. Each RDF statement would in theory be represented by a single binary association, but once again there is no provision for handling statements whose objects are literals.


3.2 The Stanford Proposal

3.2.1 Description

Despite recognizing that "both directions [i.e., RDF2TM and TM2RDF] are equally important" for achieving interoperability between Topic Maps and RDF, Lacher and Decker [Lacher 01] choose to focus on making it possible to query Topic Maps using an "RDF-aware infrastructure" that was co-developed by one of the authors. This proposal is thus TM2RDF only.

Reference is made to the layered integration model of data interoperability which separates the data integration problem into three quasi-independent layers: the syntax layer, the object layer, and the semantic layer. Since the RDF data model is essentially a directed, labelled graph, the idea is to build an internal graph representation of the topic map on the object layer and then perform a "bijective graph transformation" such that the topic map can be viewed as RDF. Ignoring the syntax layer means that the approach will work with both the SGML and the XML serialization syntaxes of Topic Maps. Ignoring the semantic layer (i.e., adopting the approach termed "modelling the model" in [Moore 01]) has the advantage, according to the authors, that all information is preserved. (The authors point out that a semantic mapping "could possibly lead to a loss of information".)

Instead of defining their own model for Topic Maps, Lacher and Decker use PMTM4, the Processing Model for Topic Maps, proposed by Newcomb and Biezunski ([PMTM4]), which they characterize as "the only source of a valid mapping from the XTM syntax to a valid internal Topic Map representation".

[PMTM4] proposes a graph model consisting of three node types (for topics, associations, and scopes), and four arc types: associationMember (aM), associationScope (aS), associationTemplate (aT), and scopeComponent (sC). The aM arc is "peculiar" in that it is both typed and labelled (and thus effectively has three ends) in order to connect the association with both the role-player and its role (or role type). Names and occurrences are regarded as specializations of associations; URIs and strings are not part of the model.
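
The shape of this graph model can be sketched as a small data structure. The node and arc kinds are the ones listed above; everything else (class names, field names, the validation rule that only aM arcs may carry a role label, and the sample identifiers) is a hypothetical simplification and does not reproduce the full set of constraints in [PMTM4].

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class NodeKind(Enum):
    TOPIC = "t"
    ASSOCIATION = "a"
    SCOPE = "s"


class ArcKind(Enum):
    ASSOCIATION_MEMBER = "aM"
    ASSOCIATION_SCOPE = "aS"
    ASSOCIATION_TEMPLATE = "aT"
    SCOPE_COMPONENT = "sC"


@dataclass(frozen=True)
class Node:
    kind: NodeKind
    ident: str


@dataclass(frozen=True)
class Arc:
    kind: ArcKind
    ends: Tuple[Node, Node]            # the two ordinary ends of the arc
    role_label: Optional[Node] = None  # the "third end", aM arcs only

    def __post_init__(self) -> None:
        if self.role_label is not None and self.kind is not ArcKind.ASSOCIATION_MEMBER:
            raise ValueError("only aM arcs carry a role label")


assoc = Node(NodeKind.ASSOCIATION, "puccini-tosca-assoc")
puccini = Node(NodeKind.TOPIC, "puccini")
composer = Node(NodeKind.TOPIC, "composer")

member_arc = Arc(ArcKind.ASSOCIATION_MEMBER, (assoc, puccini), role_label=composer)
print(member_arc.kind.value, member_arc.role_label.ident)
```

The optional third end is what makes the aM arc awkward to express in a model whose arcs connect exactly two nodes.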

To illustrate their approach Lacher and Decker show a simple (untyped) association between the country Denmark (which has a name) and the natural resource petroleum. This is represented as a PMTM4 graph consisting of eight t-nodes, two a-nodes, four aM arcs, and one aT arc. The (binary) association between Denmark and petroleum requires two aM arcs (one for each role), and so does the name "Denmark" (since topic names are regarded in PMTM4 as a kind of binary association).

Lacher and Decker define RDF classes and properties for each of the PMTM4 node types and arc types. The transformation consists essentially of replacing a-, t-, and s-nodes with RDF nodes of corresponding types, and replacing arcs with corresponding properties. However, in order to handle the "three-legged" aM arcs, reification is necessary, thus introducing one new RDF node and four new properties (rdf:subject, rdf:predicate, rdf:object and tms:roleLabel) for each aM arc. The resulting "RDF Topic Map graph" is shown as a figure consisting of a total of 17 nodes and 20 arcs. (The actual totals should probably be higher since rdf:type is only specified for a few nodes.)
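
The reification step can be sketched as follows. The four properties are those named above, and the choice of tms:roleLabel as the value of rdf:predicate follows the hand-coded test case in section 3.2.3 rather than anything confirmed by [Lacher 01]; the rdf:type statement is likewise taken from that test case. The function name and node labels are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify_member_arc(
    node_id: str,
    association: str,
    player: str,
    role: str,
    predicate: str = "tms:roleLabel",  # ASSUMPTION: as in the hand-coded test case
) -> List[Triple]:
    """Represent one three-ended aM arc as a reified RDF statement."""
    return [
        (node_id, "rdf:type", "rdf:Statement"),
        (node_id, "rdf:subject", association),
        (node_id, "rdf:predicate", predicate),
        (node_id, "rdf:object", player),
        (node_id, "tms:roleLabel", role),
    ]


arc = reify_member_arc(
    "_:puccini-composer-role", "_:puccini-tosca-assoc", "_:puccini", "_:composer"
)
for subject, prop, obj in arc:
    print(subject, prop, obj, ".")
```

Each role in an association therefore costs one extra node and at least four statements, which is the main source of the high statement counts discussed in the analysis of this proposal.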

The authors opt to represent each undirected PMTM4 arc by a single directed RDF arc (rather than two directed RDF arcs) in order to avoid consistency problems, pointing out that while this is not a lossy transformation, it does require consideration when formulating queries.

No syntax example is given in [Lacher 01] to show the result of the transformation but from the text it is clear that node identity is either based on source locators (where XML IDs were specified in the source topic map) or else generated (where no IDs were specified). Subject identifiers and subject locators are not used – presumably because the PMTM4 model does not extend to identifiers.

Having constructed an RDF graph from the topic map, Lacher and Decker show how it can be queried, together with native RDF data, by a single query expressed in F-Logic syntax. The query uses the RDF-encoded topic map to find all countries that have petroleum as a natural resource and then extracts links to DMOZ Travel_and_Tourism pages for those countries from the RDF-encoded Open Directory:

FORALL pages <- Country, DMOZCountry Y,X, Z
    Y[tms:roleLabel->country;rdf:object->Country]
        @CIA_WORLD_FACTBOOK and
    X[tms:roleLabel->natural-resource;
      rdf:object->petroleum;
      rdf:subject->Z[tms:associationMember->Country]
        @CIA_WORLD_FACTBOOK]
        @CIA_WORLD_FACTBOOK and
    Country[mapsTo->DMOZCountry] and
    DMOZCountry[Travel_and_Tourism ->dmozpage[links->pages]]
        @DMOZ.

3.2.2 Analysis

  1. Completeness: The approach is complete with respect to PMTM4, but the latter is not a complete model for Topic Maps (since it does not handle URIs and strings). The Stanford proposal is therefore not complete.
  2. Fidelity: The proposal has low fidelity since it requires upwards of 20 statements to represent information that would naturally be modelled using two statements in RDF.

3.2.3 Test Cases

TM2RDF

A test case has been requested from the authors. The following is an attempt to hand-code parts of the test case. Only the association and the names of the two role-playing topics are shown. All occurrences, type-instance relationships, and names of typing topics are omitted. It is estimated that these would require an additional 115 statements (13*2=26; + 12*2=24; + 12*5+5=65).

@prefix  rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix  tms:   <http://www.standford.edu/rdftmmapping/tm-schema#> .
@prefix  psi1:  <file:psi1.xtm#> .
@prefix  core:  <file:core.xtm#> .

### composed-by association ---------------------------
_:puccini-tosca-assoc
  rdf:type                  tms:association ;
  tms:associationTemplate   _:composed-by .

# reified statement representing composer role
_:puccini-composer-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:puccini-tosca-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        _:puccini ;
  tms:roleLabel     _:composer .

# reified statement representing work role
_:puccini-work-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:puccini-tosca-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        _:tosca ;
  tms:roleLabel     _:work .

### topic-basename association for puccini ------------
_:puccini-name-assoc
  rdf:type                  tms:association ;
  tms:associationTemplate   psi1:at-topic-basename .

# reified statement representing topic role
_:puccini-topic-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:puccini-name-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        _:puccini ;
  tms:roleLabel     core:role-topic .

# reified statement representing basename role
_:puccini-name-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:puccini-name-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        "Giacomo Puccini" ;
  tms:roleLabel     core:role-basename .

### topic-basename association for tosca --------------
_:tosca-name-assoc
  rdf:type                  tms:association ;
  tms:associationTemplate   psi1:at-topic-basename .

# reified statement representing topic role
_:tosca-topic-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:tosca-name-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        _:tosca ;
  tms:roleLabel     core:role-topic .

# reified statement representing basename role
_:tosca-name-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:tosca-name-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        "Tosca" ;
  tms:roleLabel     core:role-basename .

### specification of node types -----------------------
_:puccini                 rdf:type          tms:topic .
_:tosca                   rdf:type          tms:topic .
_:composed-by             rdf:type          tms:topic .
_:composer                rdf:type          tms:topic .
_:opera                   rdf:type          tms:topic .

tms:associationTemplate   rdf:type          tms:topic .
tms:roleLabel             rdf:type          tms:topic .

core:role-topic           rdf:type          tms:topic .
core:role-basename        rdf:type          tms:topic .

3.3 The Ogievetsky Proposal

3.3.1 Description

From XTM to RDF

[Ogievetsky 01b] describes a method for transforming XTM instances into RDF that has been implemented using XSLT by the author in the XTM2RDF Translator. Transformations are described in terms of the processing of XTM elements and the approach is thus very syntax-oriented. The resulting RDF conforms to a vocabulary defined in an RDF Topic Maps Schema (RTM). The RTM vocabulary consists of 11 classes and 17 properties defined partly in terms of XTM itself and partly in terms of [PMTM4], the "processing model" proposed by Newcomb and Biezunski and described in the preceding section.

The classes and properties defined by the RTM vocabulary are:

rdfs:Class
t-node, topic, scope, member, association, basename, variantname, occurrence, class-subclass, class-instance, templaterpc
rdf:Property
association-role, validIn, indicatedBy, constitutedBy, name, templatedBy, role-topic, role-basename, role-variantname, role-occurrence, role-superclass, role-subclass, role-class, role-instance, role-template, role-role, role-rpc

Each <topic> element results in the creation of an RDF statement of type rtm:topic. The topic's subject locator (if any) becomes the URI of the subject of the statement, otherwise a blank node is created. Subject identifiers (if any) result in properties of type rtm:indicatedBy. The purpose of stating that topics are of type rtm:topic is somewhat unclear since this is apparently "redundant" in the case of topics that have a single subject locator and no subject identifiers. The rationale seems to be the desire to use rtm:topic as an element type name in order to aid readability when using the "third RDF basic abbreviated form".

Associations are represented as blank nodes whose type property corresponds to the association type. In addition there is one property for each role in the association. That property corresponds to the role type (e.g. ns1:composer and ns2:work in the example below); its value is a node of type rtm:member that references the role player. Referencing is done through an rtm:indicatedBy property when the role player has a subject identifier and an rtm:constitutedBy property when the role player has a subject locator. (It is unclear from the text what form the reference takes when the role player has neither.)

The following example (in N3 notation) shows how the association between Tosca and Puccini is represented in RDF:

[ rdf:type ns1:composed-by;
  ns1:composer  [
    rdf:type rtm:member;
    rtm:indicatedBy <http://en.wikipedia.org/wiki/Puccini> ];
  ns1:work  [
    rdf:type rtm:member;
    rtm:indicatedBy ns2:tosca ] ].

The blank node of type ns1:composed-by (representing the association) has two properties: ns1:composer and ns1:work, respectively. The values of these properties are blank nodes of type rtm:member that have additional rtm:indicatedBy properties. In all, seven RDF statements are used to represent the association. (Normally in RDF, of course, a relationship like this would be represented by a single statement.)
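
The statement count can be checked with a small sketch. The Python below is illustrative only and is not part of Ogievetsky's XSLT implementation; the function name association_triples and the blank node labels (_:b1, _:b2, ...) are assumptions made for this example.

```python
from itertools import count

RDF_TYPE = "rdf:type"


def association_triples(assoc_type, roles):
    """Return the triples for one association in the RTM-style shape.

    roles maps a role-type property (e.g. "ns1:composer") to the
    subject identifier of the role player.
    """
    ids = count(1)
    assoc = f"_:b{next(ids)}"  # blank node standing for the association
    triples = [(assoc, RDF_TYPE, assoc_type)]
    for role_type, player in roles.items():
        member = f"_:b{next(ids)}"  # blank node of type rtm:member
        triples.append((assoc, role_type, member))
        triples.append((member, RDF_TYPE, "rtm:member"))
        triples.append((member, "rtm:indicatedBy", player))
    return triples


triples = association_triples(
    "ns1:composed-by",
    {
        "ns1:composer": "<http://en.wikipedia.org/wiki/Puccini>",
        "ns1:work": "ns2:tosca",
    },
)
for s, p, o in triples:
    print(s, p, o, ".")
print(len(triples))  # one type statement plus three statements per role
```

Under these assumptions an association with n roles costs 1 + 3n statements, which gives seven for the binary case described above.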

Interestingly, what appears to be a very opaque RDF representation when described in prose or encoded in N3 syntax becomes immediately more legible (at least for someone familiar with XTM syntax) when encoded in "third RDF basic abbreviated form":

<ns1:composed-by>
  <ns1:composer>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="http://en.wikipedia.org/wiki/Puccini" />
    </rtm:member>
  </ns1:composer>
  <ns1:work>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="http://psi.ontopia.net/opera/#tosca" />
    </rtm:member>
  </ns1:work>
</ns1:composed-by>

This leads to the suspicion that the exigencies of XSLT-based processing and the desire to output readable RDF/XML syntax have influenced the form of RDF chosen for the target model.

In accordance with PMTM4, the approach to handling associations described above is extended to other Topic Maps constructs. Thus, type-instance relationships are regarded as associations of a specific type; occurrences, base names and variants are all regarded as subtypes of association with fixed pairs of role types (topic/occurrence, topic/name, and basename/variantname, respectively).

String values for names and internal occurrences are represented as the values of rtm:name properties of member nodes. The following example shows the base name of the composer Puccini as output by the xtm2rdf.xsl XSLT stylesheet. A blank node represents the topic-basename relationship. Syntactically, the rtm:baseName construct has exactly the same "shape" as the association shown above:

<rtm:baseName rdf:ID="XSLTbaseName122124120120">
  <rtm:role-topic>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="#puccini" />
    </rtm:member>
  </rtm:role-topic>
  <rtm:role-name>
    <rtm:member>
      <rtm:name>Giacomo Puccini</rtm:name>
    </rtm:member>
  </rtm:role-name>
</rtm:baseName>

As with binary associations, seven RDF statements are required to represent a single topic name characteristic that would normally be modelled using a single statement in RDF.

Roundtripping RDF to Topic Maps and back

Having presented the methodology for translating XTM to RDF, Ogievetsky considers roundtripping from RDF to XTM and back to RDF. [Ogievetsky 01b] is actually a continuation of earlier work for which only a set of slides, [Ogievetsky 01a], is available. In the earlier work RDF data was translated into XTM, again using XSLT stylesheets.

To demonstrate roundtripping, [Ogievetsky 01b] shows an example of a Dublin Core fragment in RDF being translated to XTM according to the methodology in [Ogievetsky 01a], and then translated back to RDF according to the methodology in [Ogievetsky 01b]. The source document contains a single RDF statement asserting that the resource ZARA.xml has the creator "Jane M. Folpe". This translates to a topic map consisting of six TAOs (five topics and one association), which in turn translates back to RDF as a set of no fewer than 26 RDF statements. "Obviously we accumulated a lot of semantic luggage during our roundtrip" is Ogievetsky's laconic comment.

The remainder of [Ogievetsky 01b] is devoted to showing how "RDF Topic Maps" can be queried (using the RDF query language SquishQL) and constrained (using DAML+OIL). In addition, a brief comparison is made with a tolog-like query language. (Tolog is a topic maps query language defined by Ontopia and described in [Ontopia 04].) The following sample query shows how to find all topics that have names in the scope "taxon":

SELECT ?topic, ?name
FROM  http://www.cogx.com/xtm2rdf/seacr.rtm#
WHERE
  (rdf::type ?a rtm::basename)
  (rtm::role-topic ?a ?m1) (rtm::indicatedBy ?m1 ?topic)
  (rtm::role-name ?a ?m2)(rtm::name ?m2 ?name)
  (rtm::validIn ?a ?s)(rtm::indicatedBy ?s this::taxon)
USING
  rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rtm FOR http://www.cogx.com/xtm2rdf/rtm.rdf#
  this FOR  http://www.cogx.com/xtm2rdf/seacr.rtm#

3.3.2 Analysis

  1. Completeness: The proposal appears to be fairly complete in that it covers more-or-less every aspect of XTM syntax (which requires extending the underlying PMTM4 model in order to cater for identifiers). The example of roundtripping shows clearly that this proposal in combination with the undocumented RDF2TM translation fails the test of reversibility since a single RDF statement ends up as 26 statements after the roundtrip.
  2. Fidelity: The fact that this proposal requires seven statements to represent information content that would naturally be modelled using one statement in RDF demonstrates that it has low fidelity. Translating the Topic Maps test case results in an RDF document containing 109 statements.

3.3.3 Test Cases

TM2RDF

@prefix rtm: <http://www.cogx.com/xtm2rdf/rtm.rdf#> .
@prefix ns1: <http://psi.ontopia.net/music/#> .
@prefix ns2: <http://psi.ontopia.net/opera/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix this: <#> .

this:XSLTbaseName121120120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Opera" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:opera ] .

this:XSLTbaseName121121120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Composer" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:composer ] .

this:XSLTbaseName121122120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Première date" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:premiere-date ] .

this:XSLTbaseName121123120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Composed by" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:composed-by ] .

this:XSLTbaseName121126120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Synopsis" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:synopsis ] .

this:XSLTbaseName122124120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Giacomo Puccini" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:puccini ] .

this:XSLTbaseName122125120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Tosca" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] .

this:XSLTinstanceOf120124120120
  rdf:type rtm:classInstance;
  rtm:role-class  [
    rdf:type rtm:member;
    rtm:indicatedBy ns1:person ];
  rtm:role-instance  [
    rdf:type rtm:member;
    rtm:indicatedBy this:puccini ] .

this:XSLTinstanceOf120125120120
  rdf:type rtm:classInstance;
  rtm:role-class  [
    rdf:type rtm:member;
    rtm:indicatedBy ns1:opera ];
  rtm:role-instance  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] .

this:XSLToccurrence123125120120
  rdf:type ns2:premiere-date;
  rtm:role-occurrence  [
    rdf:type rtm:member;
    rtm:name "1900 (14 Jan)" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] .

this:XSLToccurrence124125120120
  rdf:type ns2:synopsis;
  rtm:role-occurrence  [
    rdf:type rtm:member;
    rtm:constitutedBy <http://www.azopera.com/learn/synopsis/tosca.shtml> ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] .

[ rdf:type ns1:composed-by;
  ns1:composer  [
    rdf:type rtm:member;
    rtm:indicatedBy <http://en.wikipedia.org/wiki/Puccini> ];
  ns1:work  [
    rdf:type rtm:member;
    rtm:indicatedBy ns2:tosca ] ].

this:composed-by
  rdf:type rtm:topic;
  rtm:indicatedBy ns1:composed-by .

this:composer
  rdf:type rtm:topic,
           rdf:Property;
  rtm:indicatedBy ns1:composer;
  rdfs:domain ns1:composed-by;
  rdfs:range rtm:member .

this:work
  rdf:type rtm:topic,
           rdf:Property;
  rtm:indicatedBy ns1:work;
  rdfs:domain ns1:composed-by;
  rdfs:range rtm:member .

this:premiere-date
  rdf:type rtm:topic,
           rdfs:Class;
  rtm:indicatedBy ns2:premiere-date;
  rdfs:subClassOf rtm:occurrence .

this:puccini
  rdf:type rtm:topic;
  rtm:indicatedBy <http://en.wikipedia.org/wiki/Puccini> .

this:synopsis
  rdf:type rtm:topic,
           rdfs:Class;
  rtm:indicatedBy ns2:synopsis;
  rdfs:subClassOf rtm:occurrence .

this:tosca
  rdf:type rtm:topic;
  rtm:indicatedBy ns2:tosca .

3.4 The Garshol Proposal

3.4.1 Description

This proposal was originally presented in [Garshol 01] as part of a comparative analysis of the RDF and Topic Maps models. The analysis was further developed (and partially extended to address OWL) in [Garshol 03a]. The approach has been implemented by the author in the Ontopia Knowledge Suite and the Omnigator.

[Garshol 03a] starts by comparing RDF and Topic Maps through an examination of concepts that are fundamental to both paradigms: "symbols and things", "assertions", "identity", "reification", "qualification", and "types and subtypes". For each concept, Garshol shows how it is expressed in each paradigm and draws out the similarities and differences.

Comparing RDF and Topic Maps

According to Garshol, RDF and Topic Maps are both "identity-based technologies"; that is, the key concept in both is symbols representing identifiable things about which assertions can be made. In Topic Maps, "things" are called "subjects"; in RDF they are called "resources" and, despite different definitions, they are essentially the same concept. Subjects are represented by topics; resources are represented by RDF nodes (or "nodes" for short). According to Garshol, the correspondence between "topic" and "node" is close but not exact; however, he does not explain why.

Assertions express relationships between things and take the form of "topic characteristics" in Topic Maps and "statements" in RDF. A topic characteristic can be a name, an occurrence, or an association. An RDF statement can thus in theory be mapped to any one of these three kinds of construct. Special attention is paid to associations since these can be of any arity, whereas all RDF statements are binary. Binary associations map fairly well to RDF statements, but associations of other arities do not.

In addition, RDF statements have direction but associations do not. Topic Maps uses the notion of "roles" to express the nature of each subject's involvement in the relationship; in RDF this involvement is implicit in the subject-predicate-object structure of the statement.

For these reasons, the correspondence between topic characteristics and statements is considered to be close, but not exact.

The issue of identity is considered to be "quite a thorny problem for interoperability between topic maps and RDF" since, although both Topic Maps and RDF use URIs as identifiers, they do so in different ways. In RDF there is only one kind of identifier and a node can have at most one (blank nodes and literals have none). In Topic Maps, topics can have any number of identifiers and a distinction is made between "subject locators" (URIs which identify the subject directly, formerly called "subject addresses") and "subject identifiers" (URIs which identify the subject indirectly, via a subject indicator). Garshol refers the reader to a more in-depth discussion of the issue in [Pepper 03].

Garshol's discussion of reification brings out certain differences between Topic Maps and RDF but does not reach any conclusion regarding the degree of correspondence between the two, although the point is made that reification is not a very commonly used feature. Qualification is seen as being related to reification, since the built-in Topic Maps feature "scope" is essentially a mechanism for making certain kinds of assertions about other assertions, but no proposal is made regarding how to express scope in RDF.

The concept of types and subtypes, on the other hand, is regarded as being identical in Topic Maps and RDF (except for the fact that the subClassOf property is part of RDF Schema rather than RDF itself).

Garshol summarizes his analysis by pointing to three fundamental differences between RDF and Topic Maps that "make it technically very difficult to merge" the two paradigms: identity, assertions, and reification (including qualification). The rest of his paper therefore focuses on ways to "move data between the two with as little effort as possible" (rather than on how to unify the two models).

To map or to model?

The approach taken by [Moore 01], [Lacher 01], [Ogievetsky 01b], and [Garshol 02], described as "modelling topic maps in RDF" (and vice versa), is briefly considered and then rejected as being

both heavy-weight and rather awkward to work with. Any query or retrieval specified in end-user terms will have to explicitly take into account topic map model features, and information from topic maps will not interoperate cleanly with other RDF information.

Garshol's conclusion is that "although this [modelling] approach is easy to use, the results do not meet the criterion of clean integration with other RDF data."

As an alternative, Garshol proposes to use vocabulary-specific mappings underpinned by a generic mapping. Statements should in general be mapped to names, occurrences or associations since this provides the most "natural" results. However, it is not possible to know which of these is most appropriate for any given statement without an understanding of the semantics of the properties involved – hence the need for vocabulary-specific mappings. For example, the RDF statement

<http://example.com/X>
  <http://example.com/Y>
    "foo" .

could be mapped in Topic Maps to either a name or an internal occurrence (since the object is a literal). Similarly, the statement

<http://example.com/X>
  <http://example.com/W>
    <http://example.com/Z> .

could be mapped to either an association or an external occurrence (since the object is a resource).
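
The ambiguity can be made concrete with a short sketch. The Python below is illustrative and not part of Garshol's proposal; it assumes the convention used in the examples above, in which resources are written in angle brackets and literals in quotes, and the function name candidate_constructs is hypothetical.

```python
def candidate_constructs(obj):
    """List the Topic Maps constructs an RDF statement could map to,
    judged only from the kind of its object."""
    is_resource = obj.startswith("<") and obj.endswith(">")
    if is_resource:
        return ["association", "external occurrence"]
    return ["name", "internal occurrence"]


print(candidate_constructs('"foo"'))
print(candidate_constructs("<http://example.com/Z>"))
```

In both cases two candidates remain, so the syntax of the statement alone cannot settle the choice; that is the gap the vocabulary-specific mapping is meant to fill.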

RDF2TM mapping

The solution according to Garshol is to provide mapping information, which he does using an RDF vocabulary called RTM ([Ontopia 03a]) that is used to annotate RDF documents (or their schemas) and thus guide the translation process. The RTM vocabulary is used for translating from RDF to Topic Maps and consists of the following RDF properties: maps-to, type, in-scope, subject-role, object-role.

The maps-to property can have the following values:

Mappings that use rtm:occurrence or rtm:association will automatically use the property to type the resulting Topic Maps construct, unless rtm:type is used to override this behaviour. The rtm:in-scope property can be used to specify scoping topics for base names, occurrences, or associations. Finally, the rtm:subject-role and rtm:object-role properties are used to specify the types of role played by the subject and object of an RDF statement when the statement maps to an association.

This vocabulary (and the implementations in the Ontopia Knowledge Suite and the Omnigator) goes somewhat beyond what is covered in [Garshol 03a]. For example, it is recognized that properties may be mapped to various kinds of identifiers (source locators, subject identifiers, and subject locators) or to the privileged instance-of relationship, in addition to names, occurrences and associations.

In addition, greater provision is made for defaulting. Resource URIs are always mapped to subject identifiers and RDF statements can be imported as associations in the absence of role type information, in which case the predefined topics subject and object are used as role types.
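
A mapping-guided translation with defaulting might be sketched as follows. This is a minimal illustration in Python, not the Ontopia implementation: the mapping dictionary stands in for rtm:maps-to annotations, the defaults (literal objects become occurrences, resource objects become associations with subject and object role types) are assumptions based on the behaviour described in this section, and the example.org URIs and the function names are hypothetical.

```python
def is_resource(term):
    """Treat <...> terms as resources and everything else as literals."""
    return term.startswith("<") and term.endswith(">")


def translate_statement(subject, prop, obj, mapping=None):
    """Describe the Topic Maps construct that one RDF statement maps to.

    mapping: optional dict from property URI to "basename",
    "occurrence" or "association".
    """
    mapping = mapping or {}
    # Assumed defaults when no mapping is given for the property.
    default = "association" if is_resource(obj) else "occurrence"
    kind = mapping.get(prop, default)

    if kind == "basename":
        return {"construct": "basename", "topic": subject, "value": obj}
    if kind == "occurrence":
        return {"construct": "occurrence", "topic": subject,
                "type": prop, "value": obj}
    if kind == "association":
        # No role-type information: fall back to the predefined
        # "subject" and "object" role types.
        return {"construct": "association", "type": prop,
                "roles": {"subject": subject, "object": obj}}
    raise ValueError(f"unsupported mapping kind: {kind}")


FOAF_NAME = "<http://xmlns.com/foaf/0.1/name>"
CONDUCTOR = "<http://www.kanzaki.com/ns/music#conductor>"

named = translate_statement("<http://example.org/id_3>", FOAF_NAME,
                            '"Yuri Nitta"', {FOAF_NAME: "basename"})
linked = translate_statement("<http://example.org/id_1>", CONDUCTOR,
                             "<http://example.org/id_3>")
print(named)
print(linked)
```

The property itself serves as the occurrence or association type, mirroring the behaviour described above for mappings that do not override the type.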

TM2RDF mapping

Going from Topic Maps to RDF is shown to require additional information in order for optimal and/or predictable results to be achieved. The following problems are identified:

  1. Choosing properties when mapping names
  2. Choosing the subject when mapping associations

Garshol points out a number of issues that are not addressed in his analysis, including multiple identifiers, n-ary associations, reification and scoping, unary associations, variant names, and a number of (unspecified) "tricky edge cases"; for some of these he sketches possible solutions which have since been implemented in the Ontopia Knowledge Suite:

A second vocabulary (called TMR, [Ontopia 03b]), consisting of six published subjects, addresses many of these issues. Name mapping is handled by tmr:name-property, tmr:type, and tmr:property, and the problem of mapping associations is solved using tmr:preferred-role, tmr:association-type, and tmr:role-type.

As with the RDF2TM translation, the implementations provide some level of defaulting. Both subject identifiers and subject locators are automatically mapped to resource URIs. In addition, associations can be exported to RDF in the absence of mapping information about roles; in this case the choice of subject and object for the resulting statement is arbitrary.
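
The choice of subject can be sketched in a few lines. The Python below is an illustration under stated assumptions rather than the Ontopia implementation: the preferred_role argument stands in for a tmr:preferred-role annotation and is assumed to name the role whose player becomes the subject, and the function name and the shortened role names are hypothetical.

```python
def association_to_statement(assoc_type, roles, preferred_role=None):
    """Turn a binary association into one (subject, predicate, object) triple.

    roles: dict from role type to role player, with exactly two entries.
    preferred_role: role type whose player becomes the subject; when it
    is missing, the first role listed is used, which is an arbitrary choice.
    """
    if len(roles) != 2:
        raise ValueError("only binary associations map to a single statement")
    role_types = list(roles)
    if preferred_role is None:
        subject_role = role_types[0]
    elif preferred_role in roles:
        subject_role = preferred_role
    else:
        raise ValueError(f"unknown role type: {preferred_role}")
    object_role = next(r for r in role_types if r != subject_role)
    return (roles[subject_role], assoc_type, roles[object_role])


roles = {"composer": "puccini", "work": "tosca"}
print(association_to_statement("composed-by", roles, preferred_role="work"))
print(association_to_statement("composed-by", roles))
```

Without the hint the direction of the resulting statement depends only on the order in which the roles happen to be listed, which illustrates why the default result is described as arbitrary.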

The remainder of [Garshol 03a] is devoted to a comparison of the respective constraint and query languages of Topic Maps and RDF and is thus beyond the scope of this analysis.

3.4.2 Analysis

  1. Completeness: As currently specified this proposal provides an extensive but incomplete solution and the author himself identifies most of the respects in which it is incomplete. Those which are not mentioned include containers, collections, XML literals and typed literals. A high degree of reversibility and roundtripping is achievable, provided appropriate reverse mappings are generated during the translation. An issue exists with subject locators that end up as subject identifiers when roundtripping from Topic Maps to RDF and back to Topic Maps.
  2. Fidelity: The proposal scores well on fidelity. Even when using default mappings the results are quite natural. The TM2RDF test case results in an RDF document containing 13 statements. The RDF2TM test case results in a topic map containing 25 TAOs (19 topics, three associations, and three occurrences).

3.4.3 Test Cases

The test translations were performed using Ontopia's Omnigator Eight (OKS 2.1.0, build 2004-12-15 #1495) [Ontopia 05].

TM2RDF

The source document was opened in the Omnigator and exported to RDF with default mappings. The document was then converted to N3 format using the Mindswap RDF Converter [Mindswap 02], and finally tidied by hand in order to ease comparison with the source document. Note that since default settings were used, the choice of Tosca as the subject of the music:composed-by statement was entirely arbitrary.

@prefix : <http://psi.ontopia.net/music/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://en.wikipedia.org/wiki/Puccini>
   rdf:type   :person;
   rdfs:label "Giacomo Puccini" .

<http://psi.ontopia.net/opera/#tosca>
   rdf:type :opera;
   rdfs:label "Tosca";
   :premiere-date "1900-01-14";
   :synopsis <http://www.azopera.com/learn/synopsis/tosca.shtml>;
   :composed-by <http://en.wikipedia.org/wiki/Puccini> .

:composed-by   rdfs:label "Composed by" .
:composer      rdfs:label "Composer" .
:opera         rdfs:label "Opera" .
:person        rdfs:label "Person" .
:premiere-date rdfs:label "Premi\u00E8re date" .
:synopsis      rdfs:label "Synopsis" .
:work          rdfs:label "Work" .

RDF2TM

The source document was imported to the Omnigator using mappings specified using the RDF2TM plug-in. (Default mappings were used for all properties except dc:title and foaf:name, which were mapped to base names instead of internal occurrences.) The document was then exported to HyTM, converted by script to LTM, and then tidied by hand in order to ease comparison with the source document. Note that the mapping vocabulary allows for more precise specification of the subject and object role types, but this capability was not used here. [@@Use direct export to LTM when available.]

@"utf-8"

[id_1 : Concert = "Tokyo Green Symphony Orchestra 12th Concert"]
 {id_1, date, [[2003-11-02T14:00+09:00]]}
 location( id_1 : subject, id_2 : object )
 conductor( id_1 : subject, id_3 : object )
 performer( id_1 : subject, id_4 : object )

[id_2 : Venue = "Sumida Triphony"]
 {id_2, lat, [[35.69]]}
 {id_2, long, [[139.81]]}

[id_3 : Conductor = "Yuri Nitta"]

[id_4 : Violinist = "Tomoko Kawada"]

[Concert = "Concert" @"http://www.kanzaki.com/ns/music#Concert"]
[Conductor = "Conductor" @"http://www.kanzaki.com/ns/music#Conductor"]
[Venue = "Venue" @"http://www.kanzaki.com/ns/music#Venue"]
[Violinist = "Violinist" @"http://www.kanzaki.com/ns/music#Violinist"]

[conductor = "Conductor" @"http://www.kanzaki.com/ns/music#conductor"]
[performer = "Performer" @"http://www.kanzaki.com/ns/music#performer"]

[date = "Date" @"http://purl.org/dc/elements/1.1/date"]
[title = "Title" @"http://purl.org/dc/elements/1.1/title"]
[location = "Event Location" @"http://ebiquity.umbc.edu/v2.1/ontology/event.owl#location"]
[name = "Name" @"http://xmlns.com/foaf/0.1/name"]
[lat = "Latitude" @"http://www.w3.org/2003/01/geo/wgs84_pos#lat"]
[long = "Longitude" @"http://www.w3.org/2003/01/geo/wgs84_pos#long"]

[subject = ":subject" @"http://psi.ontopia.net/rdf2tm/#subject"]
[object = ":object" @"http://psi.ontopia.net/rdf2tm/#object"]

3.5 The Unibo Proposal

3.5.1 Description

The Unibo proposal is based on work done as part of a larger project whose goal was to develop a toolset for managing metadata in either RDF or Topic Maps. The approach to RDF/TM interoperability is described briefly in [Ciancarini 03] and more fully in [Gentilucci 02] (in Italian). This description draws on both sources.

Ciancarini et al start by giving a brief introduction to RDF and Topic Maps before focusing on how to translate between the two. They cite the work of Moore, Lacher and Decker, and Ogievetsky, all of which, they claim, suffers from a common drawback, namely the "rather awkward appearance of the documents coming out of the conversion." The authors clearly prefer Garshol's approach, which produces much more "readable" results and which is similar to their own. The main difference is that Garshol does not utilize the "standard RDF and RDFS predicates" and thus always requires a mapping to be specified.

Like earlier authors, Ciancarini et al recognize that there are two fundamental approaches to tackling the problem of translation, corresponding to Moore's "modelling the model" and "mapping the model". The first of these is seen to be problematic in that "the converted document is necessarily very different from the one that would have been written directly in the destination language, and hardly readable." The problem with the second one is that it is "not always possible" to identify semantic equivalences, and that doing so often requires a case-by-case approach and thus has no general usefulness.

The authors therefore consider a hybrid approach to be the optimal solution and their implementation in the META Converter combines a generic mapping, which tries to stay as close as possible to the original semantics, with the ability to define specific mappings using an XML vocabulary. Section 3.3 of [Gentilucci 02] provides a fairly detailed overview of the generic mapping while Chapter 4 describes the mechanism for specific mappings.

Identity

Like Garshol, Ciancarini et al assume a basic equivalence between topic and resource (although they are less clear on the distinction between resources and RDF nodes), but they differ in how identity is expressed. The default behaviour in the Unibo proposal is to equate subject locators with resource URIs and to represent subject identifiers using the RDFS property isDefinedBy. Examples given in [Gentilucci 02] (e.g., 3.8 and 4.2) show how this leads to resources that clearly represent non-addressable subjects, such as "Mario Rossi" and "Format", being translated to addressable subjects (using <resourceRef> for subjectIdentity).

Topics that have no subject locator are translated to blank nodes whose ID is generated from the topic's base name. When going the other way, the ID of a blank node becomes a topic name and thus results in lower fidelity (since the ID of a blank node and a topic name have different semantics).
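
The default identity rules can be sketched as follows. This Python is illustrative and is not taken from the META Converter; in particular, the rule used to derive a blank node ID from the base name (lower-casing and replacing non-word characters with underscores) is an assumption, as are the function name and the example URIs.

```python
import re


def topic_to_node(subject_locator=None, subject_identifiers=(), base_name=None):
    """Return (node, triples) for a topic under the default identity rules."""
    if subject_locator:
        # A subject locator is equated with the resource URI.
        node = f"<{subject_locator}>"
    else:
        # Assumed slug rule for generating a blank node ID from the base name.
        slug = re.sub(r"\W+", "_", base_name or "topic").strip("_").lower()
        node = f"_:{slug}"
    # Each subject identifier becomes an rdfs:isDefinedBy property.
    triples = [(node, "rdfs:isDefinedBy", f"<{si}>") for si in subject_identifiers]
    return node, triples


print(topic_to_node(base_name="Mario Rossi"))
print(topic_to_node(
    subject_locator="http://example.org/doc.html",
    subject_identifiers=["http://example.org/psi/doc"],
))
```

Because the blank node ID is derived from a name, translating back yields a topic name that was never asserted as such, which is the loss of fidelity noted above.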

Names

The Unibo proposal is alone in assuming a fundamental equivalence of semantics between base names and the rdfs:label property: names that have no variants are thus easy to handle. Variant names are seen to represent a greater challenge, which is solved through the use of four RDF predicates: baseName, variant, parameter, and variantName. A base name that has a variant is represented through a blank node with rdfs:label and tm2rdf:variant properties: the former is a literal that corresponds to the value of the topic name (i.e., the <baseNameString> in XTM syntax); the value of the latter property is another blank node that has variantName and parameter properties. Thus a topic with a base name and sort name

[mario_rossi = "Mario Rossi";"rossi mario"]

results in the following statements:

_:mario_rossi
  tm2rdf:baseName    _:bn1_mario_rossi .

_:bn1_mario_rossi
  rdfs:label         "Mario Rossi" ;
  tm2rdf:variant     _:v11_mario_rossi .

_:v11_mario_rossi
  tm2rdf:variantName "rossi mario" ;
  tm2rdf:parameter   _:param1 .

_:param1
  rdfs:isDefinedBy   <http://www.topicmaps.org/xtm/1.0/core.xtm#sort> .

Associations: TM2RDF

Predictably, representing associations in RDF is regarded as difficult because of RDF's "more primitive nature" compared to Topic Maps. A generic translation is possible "at the level of the model," but it is "complex and artificial" and comes at the price of "abusing the RDF way of expressing relationships." The basic approach is similar to Ogievetsky's in that the roles (or "members") are contained in an RDF bag of blank nodes. However, whereas in Ogievetsky the bag is the association, the Unibo proposal uses an additional resource to represent the association; this resource has a tm2rdf:association property, the object of which is the bag of members. All in all, nine RDF statements are required to represent a single binary association.

The tm2rdf:association property is characterized as a "supporting predicate" whose purpose is to "add a little legibility" to the resulting document. A variation on this is also suggested in which the bag of members and the association become a single node: this is effectively the same solution as Ogievetsky's.

[Gentilucci 02] also describes two alternative approaches in which n-ary associations are decomposed into a number of binary relations. Both of these require six RDF statements in order to represent a single ternary association. Given the following association:

X( A : rA , B : rB , C : rC )

(i.e. an association of type X between topics A, B, and C playing the roles rA, rB, and rC respectively), the first of these alternative approaches results in the following six statements:

A X B .   A X C .   B X A .   B X C .   C X A .   C X B .

Role types are lost. In addition, the fact that each pair of role players is related through the same predicate twice (both as subject and object and as object and subject) means that only symmetrical relationships would be represented correctly. Finally, the semantic of A, B, and C all being involved in the same relationship is also lost; this may or may not involve real loss of information depending on the nature of the relationship.
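
The first decomposition amounts to emitting one statement for every ordered pair of distinct role players. The following Python sketch reproduces the six statements for the ternary example; it is an illustration only, and the function name pairwise_decomposition is hypothetical.

```python
from itertools import permutations


def pairwise_decomposition(assoc_type, players):
    """Relate every ordered pair of distinct players through the association type."""
    return [(s, assoc_type, o) for s, o in permutations(players, 2)]


statements = pairwise_decomposition("X", ["A", "B", "C"])
print("   ".join(f"{s} {p} {o} ." for s, p, o in statements))
```

An association with n role players yields n(n-1) statements. The role types never appear in the output, which is how they come to be lost.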

The second alternative approach involves predicates that correspond to role types and results in the following statements:

B rA A .   C rA A .   A rB B .   C rB B .   A rC C .   B rC C .

While role types are now preserved, the association type is lost (although it could in theory be preserved through additional statements relating it to rA, rB, and rC). In addition, it seems doubtful that the original semantics are correctly preserved. For example: Can it be assumed to be the case that the relationship between role players B and A (rA) is the same as that between C and A? Finally, the point made above about losing the semantic of the involvement of A, B, and C in the same relationship also pertains here.
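
The second decomposition can be sketched in the same spirit: for each role, every other player is related to the player of that role through a predicate named after the role type. The Python below is illustrative, the function name role_decomposition is hypothetical, and the sketch assumes that each player plays only one role.

```python
def role_decomposition(roles):
    """roles: list of (player, role_type) pairs from one association.

    Returns (other_player, role_type, player) triples; the association
    type is not an input and so cannot appear in the result.
    """
    statements = []
    for player, role_type in roles:
        for other, _ in roles:
            if other != player:
                statements.append((other, role_type, player))
    return statements


statements = role_decomposition([("A", "rA"), ("B", "rB"), ("C", "rC")])
print("   ".join(f"{s} {p} {o} ." for s, p, o in statements))
```

The association type X is absent from the function's input as well as its output, which makes the loss described above explicit.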

Having considered these alternatives, the Unibo proposal comes down in favour of the approach that uses the tm2rdf:association property, at least in the absence of more specific mapping information.

Associations: RDF2TM

When translating in the opposite direction, from RDF to Topic Maps, the generic solution proposed by Unibo is to translate RDF statements to associations. The example given in [Gentilucci 02] results in a typed binary association with untyped roles and does not take into consideration the case in which the object of the RDF statement is a literal. However, it is recognized that "it might be preferable, in certain contexts, to apply other types of conversion" and this leads into a discussion of "attributes" and the role of schema information.

It is recognized that certain RDF statements are more appropriately mapped to either internal or external occurrences, with the occurrence type corresponding to the property of the statement. Knowing when to do this requires some kind of schema information. This is essentially the same as Garshol's approach, except for the fact that Unibo uses an XML vocabulary rather than an RDF vocabulary to specify the mapping information.

Scope

In this context a proposal is put forward for representing scoped occurrences in RDF: An rdfs:seeAlso property has a blank node as its object; the blank node has an rdfs:isDefinedBy property (whose object is the URI of an external occurrence) and one or more tm2rdf:scope properties. This results in a construct whose "shape" is very different from that of an unscoped occurrence. In addition, given that the range of the rdfs:isDefinedBy property is rdf:Resource, it is unclear how this approach would work with internal occurrences.
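
The shape of a scoped external occurrence, as described in the preceding paragraph, can be sketched as a list of triples. The Python below is an illustration and not output from the META Converter; the function name, the blank node label _:occ1, and the scoping topic nodes are hypothetical.

```python
def scoped_occurrence_triples(topic, occurrence_uri, scopes, bnode="_:occ1"):
    """Build the triples for one scoped external occurrence."""
    triples = [
        (topic, "rdfs:seeAlso", bnode),
        (bnode, "rdfs:isDefinedBy", f"<{occurrence_uri}>"),
    ]
    # One tm2rdf:scope property per scoping topic.
    triples += [(bnode, "tm2rdf:scope", scope) for scope in scopes]
    return triples


for s, p, o in scoped_occurrence_triples(
    "_:tosca",
    "http://www.azopera.com/learn/synopsis/tosca.shtml",
    ["_:english"],
):
    print(s, p, o, ".")
```

The occurrence URI is reachable only through the intermediate blank node, which gives the construct its distinctive shape.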

A "not very elegant" way to represent scoped names is suggested that involves defining a property whose rdf:type is tm2rdf:baseName that corresponds either directly or indirectly (it is not clear which) to each scoping topic. In addition to being inelegant, this would not work with scopes comprised of multiple scoping topics. The alternative is the same as that proposed by Garshol: i.e., to qualify reified statements using the tm2rdf:scope property.

For scoped associations, reification in the RDF sense is not necessary since associations are already represented as resources (at least in the default mapping). Thus, all that is necessary to represent a scoped association is to assign one or more scope properties to that resource. The downside to this is that scoping is now handled in three different ways (for generically mapped associations, for occurrences, and for names and specifically mapped associations respectively).

Reification, typing, and subtyping

Neither reification nor typing nor subtyping is regarded as posing problems, since both RDF and Topic Maps support all three concepts in essentially the same way: instanceOf equates to rdf:type; the supertype-subtype relationship (represented in Topic Maps using an association with a predefined type) equates to rdfs:subClassOf; and reification is essentially the same in Topic Maps and RDF.

Specific mappings

The description above has focused on the Unibo proposal's approach to generic translations. However, Ciancarini et al. recognize that a generic approach will not always produce optimal results and their tool therefore also provides a way to "guide" the translation process. This consists of a simple XML vocabulary that allows the user to specify how to translate a (binary) association to a single RDF statement (and vice versa). As in the Garshol proposal, this involves specifying correspondences between association role types and the statement's subject and object. In addition, a user can specify which RDF properties should be mapped to occurrences rather than to associations. The following extract shows how mappings for the TM2RDF test case would be specified:

<?xml version="1.0"?>
<xtm2rdf>
  <property_associations>
    <li id="composed-by">
      <domain_role id="work"/>
      <range_role id="composer"/>
    </li>
  </property_associations>
  <property_occurrences>
    <li id="premiere-date"/>
    <li id="synopsis"/>
  </property_occurrences>
</xtm2rdf>

These mappings would cause the composed-by association to be represented as a single statement in RDF, with Tosca ("work" = domain) as the subject and Puccini ("composer" = range) as object. In addition, the mapping contains information that would cause properties of type premiere-date and synopsis to be mapped to occurrences when going from RDF to Topic Maps. (Although not stated explicitly, this information is presumably not required when going the other way.)
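The effect of such a mapping file can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation used by the Meta tool: the element names are taken from the extract above, while the function names (load_mappings, association_to_statement) and the representation of an association as a dictionary from role type to player are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The mapping extract shown above, reproduced as a string.
MAPPING_XML = """<?xml version="1.0"?>
<xtm2rdf>
  <property_associations>
    <li id="composed-by">
      <domain_role id="work"/>
      <range_role id="composer"/>
    </li>
  </property_associations>
  <property_occurrences>
    <li id="premiere-date"/>
    <li id="synopsis"/>
  </property_occurrences>
</xtm2rdf>"""


def load_mappings(xml_text):
    """Return (association mappings, occurrence property ids)."""
    root = ET.fromstring(xml_text)
    assoc_map = {}
    for li in root.findall("./property_associations/li"):
        domain_role = li.find("domain_role").get("id")
        range_role = li.find("range_role").get("id")
        assoc_map[li.get("id")] = (domain_role, range_role)
    occurrence_props = {
        li.get("id") for li in root.findall("./property_occurrences/li")
    }
    return assoc_map, occurrence_props


def association_to_statement(assoc_type, players, assoc_map):
    """Translate a binary association to one (subject, property, object).

    `players` maps each role type to the topic playing that role
    (a hypothetical in-memory form, not part of the proposal).
    """
    domain_role, range_role = assoc_map[assoc_type]
    return (players[domain_role], assoc_type, players[range_role])


assoc_map, occurrence_props = load_mappings(MAPPING_XML)
statement = association_to_statement(
    "composed-by", {"composer": "puccini", "work": "tosca"}, assoc_map
)
print(statement)
```

The player of the domain role ("work") becomes the subject and the player of the range role ("composer") becomes the object, regardless of the order in which the roles appear in the source association.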

3.5.2 Analysis

  1. Completeness: The proposal is fairly complete but some features, e.g., language tags and data typing in RDF, and reification of a topic map, are not covered explicitly. The proposal permits some degree of reversibility, but the result of a roundtrip may not always be the same as the starting point. For example, using the generic mappings, most RDF statements would be translated to typed associations with untyped roles, each of which would result in several statements when translated back to RDF.
  2. Fidelity: The approach results in a high degree of readability in both directions, provided that mapping information is supplied. Generic translations are less satisfactory, with a single binary association resulting in nine RDF statements.

3.5.3 Test Cases

The test translations were performed using the tool Meta. The current version of the tool... [@@TBC]

TM2RDF

The source document was translated using the tool Meta ([Gentilucci 02]) with the default settings. The resulting document was then converted to N3 format using the Mindswap RDF Converter [Mindswap 02]. In order to make the document easier to read the IDs randomly generated by the tool were changed by hand.


     @prefix music: <http://psi.ontopia.net/music/#> .
     @prefix opera: <http://psi.ontopia.net/opera/#> .
     @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
     @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix s: <http://cs.unibo.it/meta/tmschema.rdf#> .

    :composed-by
         rdf:type rdfs:Class;
         rdfs:isDefinedBy music:composed-by;
         rdfs:label "Composed by" .

    :composed-by_Association
         rdf:type :composed-by;
         s:association :composed-by_members .

    :composed-by_member_1
         rdf:type :composer;
         rdfs:isDefinedBy :puccini .

    :composed-by_member_2
         rdf:type :work;
         rdfs:isDefinedBy :tosca .

    :composed-by_members
         rdf:type rdf:Bag;
         rdf:_1 :composed-by_member_1;
         rdf:_2 :composed-by_member_2 .

    :person
         rdf:type rdfs:Class;
         rdfs:isDefinedBy music:person;
         rdfs:label "Person" .

    :composer
         rdf:type rdfs:Class;
         rdfs:isDefinedBy music:composer;
         rdfs:label "Composer" .

    :opera
         rdf:type rdfs:Class;
         rdfs:isDefinedBy music:opera;
         rdfs:label "Opera" .

    :work
         rdf:type rdfs:Class;
         rdfs:isDefinedBy music:work;
         rdfs:label "Work" .

    :premiere-date
         rdf:type rdfs:Class;
         rdfs:isDefinedBy opera:premiere-date;
         rdfs:label "Premiere date" .

    :puccini
         rdf:type :composer;
         rdfs:isDefinedBy <http://en.wikipedia.org/wiki/Puccini>;
         rdfs:label "Giacomo Puccini" .

    :synopsis
         rdf:type rdfs:Class;
         rdfs:isDefinedBy opera:synopsis;
         rdfs:label "Synopsis" .

    :tosca
         rdf:type :opera;
         rdfs:isDefinedBy opera:tosca;
         rdfs:label "Tosca";
         rdfs:seeAlso :tosca_occurrence_1,
                      :tosca_occurrence_2 .

    :tosca_occurrence_1
         rdf:type :premiere-date;
         rdfs:label "1900 (14 Jan)" .

    :tosca_occurrence_2
         rdf:type :synopsis;
         rdfs:isDefinedBy <http://www.azopera.com/learn/synopsis/tosca.shtml> .

RDF2TM

[@@A problem was encountered when trying to translate this test case because the tool is currently being updated to support the latest RDF specification. The test case will be added in the next draft.]


3.6 Other Proposals and Contributions

The preceding sections have described the five most relevant proposals for RDF/TM interoperability. A number of other proposals and contributions have also been considered:

Both [Vlist 01] and [Prudhommeaux 02] are early "first cuts" (produced "before breakfast" and as "an evening's work" respectively). Both are very incomplete and have been superseded by later work; they are mentioned here for the sake of completeness.

[Garshol 02] is a complete RDF Schema for Topic Maps based on an early version of the Topic Maps Data Model ([TMDM]). It is similar to the Stanford and Ogievetsky proposals in that it models Topic Maps in RDF and falls therefore into the general category of "object mappings" which the author himself has since rejected. Its principal interest lies in the fact that it is based on a more complete and consistent model than that of PMTM4 and it will therefore be used to illustrate the difference between object mappings and semantic mappings in the next section.

[Kaminsky 02] presents a conceptual metamodel called Braque that is a "superset of the most popular proposed semantic web metamodels" (viz. XML, RDF, and Topic Maps) and defines transformations from each of these into Braque. Kaminsky's work is clearly of great interest to anyone seeking to unify RDF and Topic Maps into a single model. However, that is not the mandate of the RDFTM task force. Neither can Kaminsky's proposal be considered as a solution to the more immediate problem of interchanging RDF and Topic Maps data, since no transformations out of Braque are defined. [Kaminsky 02] is therefore considered to be out of scope for the current work.

[Pepper 03] provides an in-depth discussion of identifiers in RDF and Topic Maps and is thus relevant to the issue of Identity listed in section 2.2. The authors' main goal is to clarify the distinction between direct and indirect identification of subjects, and to pinpoint the lack of an ontological distinction between "resources" (in general) and "information resources" (which is a subset of the "resources"). Direct identification is possible only for information resources; indirect identification is possible for any kind of resource (including information resources). With the publication of [TAG], and the introduction of the formal concept of information resource, these distinctions have now been recognized and this could pave the way to solving the issue of Identity, at least as far as RDF and Topic Maps interoperability is concerned.

[Vatant 04] investigates how OWL (Web Ontology Language) may be used to constrain Topic Maps and has relevance for the expression of additional information that may be used to guide a translation. This will have greater relevance for the RDFTM task force's second deliverable.


4 Analysis

The first point to be noted is that all of the major proposals suffer from the fact that neither Topic Maps nor RDF had stable, formalized data models at the time they were written. PMTM4 never had any official standing and has since been superseded (according to its authors) by the Topic Maps Data Model ([TMDM]), Part 2 of the forthcoming revised Topic Maps standard, and the Topic Maps Reference Model ([TMRM]), a Working Draft of Part 5 of the same standard. In the case of RDF, the Concepts and Abstract Syntax specification ([RDF-Concepts]) first appeared in 2004. Now that these formal models exist, it should be possible to define complete and correct mappings at either the object or the semantic level.

4.1 Object mappings and semantic mappings

All the existing approaches fall into two distinct categories that Moore terms "modelling the model" and "mapping the model". Following the terminology of Lacher and Decker these might be more appropriately termed "object mappings" and "semantic mappings" respectively. The basic difference between the two approaches can be summed up as follows:

The advantage of an object mapping is that it is easy to make it generic (provided, of course, that the object model on which it is based is complete) and this ensures completeness without any additional effort. The disadvantage is the fidelity of the result. Semantic mappings have much higher fidelity but suffer from the disadvantage that genericity is impossible to ensure and thus, in order to guarantee completeness, it requires additional information not always present in the source document.

Of the existing proposals, Stanford and Ogievetsky both use object mappings based on [PMTM4]. Moore discusses both an object mapping (based on his own inaccurate models) and a semantic mapping. Garshol dismisses object mappings and concentrates solely on semantic mappings. Unibo attempts to combine both approaches in order to achieve the dual goals of providing a default, generic mapping that can be performed without additional information, while at the same time making it possible to provide specific mapping information in cases where higher fidelity is required.

4.2 The importance of being faithful

The notion of "fidelity" was defined in section 2.1 as follows:

The criterion fidelity expresses the degree to which the results of a translation are faithful to the underlying conceptual model of the target paradigm. This quality can be thought of as naturalness, that is, as corresponding to the way in which someone familiar with the target paradigm would naturally express the information content in that paradigm. Naturalness normally also confers improved readability on the result.

Fidelity is extremely important because the result of a "low-fi" translation is structurally different from data that was originally created in the target model. This has the following consequences, all of which lead to reduced interoperability:

Object mappings generally rate very low on fidelity and are therefore subject to all three of these failings. As an example, consider the following topic map:

{tosca, music:premiere-date, [[1900-01-14]]}

This defines an occurrence of type music:premiere-date whose value is "1900-01-14". A semantic mapping to RDF would result in the following translation:

_:a0  music:premiere-date  "1900-01-14" .

An object mapping would look as follows:

_:a1, rdf:type, tm:Topic .
_:a1, tm:occurrence, _:a2 .
_:a2, rdf:type, tm:Occurrence .
_:a2, tm:occurrence-type, _:a3 .
_:a3, tm:subject-identifier, music:premiere-date .
_:a2, tm:resource, "1900-01-14" .

This example uses the vocabulary defined in [Garshol 02] that is based on [TMDM], in order to conform to the most standard data model for Topic Maps. It serves to illustrate the fact that object mappings are inherently more verbose than semantic mappings. They also involve a significant amount of indirection and can thus be expected to lead to a lot of processing overhead. More important than either of these points, however, is that the semantics are actually different. The result of an object mapping consists of constructs that carry Topic Maps semantics (such as "topic", "occurrence", "occurrence type", etc.) and which processors are required to understand in order to be able to process the result correctly.

As an example, consider merging the results of semantic and object mappings respectively with native RDF data that includes the following statement:

_:b0  music:premiere-date  "1900-11-10" .

This statement asserts that some resource had its premiere date on 1900-11-10. A merged result that used the semantic mapping would look as follows:

_:a0  music:premiere-date  "1900-01-14" .
_:b0  music:premiere-date  "1900-11-10" .

This would be easily queryable (for example for all premières that took place in the year 1900) in terms of the music vocabulary alone. Contrast this with the following result of merging where one of the components is based on an object mapping:

_:a1, rdf:type, tm:Topic .
_:a1, tm:occurrence, _:a2 .
_:a2, rdf:type, tm:Occurrence .
_:a2, tm:occurrence-type, _:a3 .
_:a3, tm:subject-identifier, music:premiere-date .
_:a2, tm:resource, "1900-01-14" .
_:b0  music:premiere-date  "1900-11-10" .

This would clearly be much harder to query and would require knowledge of the tm vocabulary in addition to the music vocabulary. The very complexity of the queries given by Lacher and Decker, and Ogievetsky, respectively, speaks volumes in this regard.
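The difference in query effort can be illustrated with a rough sketch in which statements are held as plain Python tuples. The two query functions and their names are assumptions made for this example; they do not correspond to any query language discussed in the proposals.

```python
# Merged data where both sources use the music vocabulary directly.
semantic_merge = [
    ("_:a0", "music:premiere-date", "1900-01-14"),
    ("_:b0", "music:premiere-date", "1900-11-10"),
]

# Merged data where one source is the result of an object mapping.
object_merge = [
    ("_:a1", "rdf:type", "tm:Topic"),
    ("_:a1", "tm:occurrence", "_:a2"),
    ("_:a2", "rdf:type", "tm:Occurrence"),
    ("_:a2", "tm:occurrence-type", "_:a3"),
    ("_:a3", "tm:subject-identifier", "music:premiere-date"),
    ("_:a2", "tm:resource", "1900-01-14"),
    ("_:b0", "music:premiere-date", "1900-11-10"),
]


def premieres_native(triples, year):
    """One pattern, expressed in the music vocabulary alone."""
    return sorted(
        o for s, p, o in triples
        if p == "music:premiere-date" and o.startswith(year)
    )


def premieres_object_mapped(triples, year):
    """Three joins, all of which require knowledge of the tm vocabulary."""
    types = {
        s for s, p, o in triples
        if p == "tm:subject-identifier" and o == "music:premiere-date"
    }
    occurrences = {
        s for s, p, o in triples
        if p == "tm:occurrence-type" and o in types
    }
    return sorted(
        o for s, p, o in triples
        if p == "tm:resource" and s in occurrences and o.startswith(year)
    )


print(premieres_native(semantic_merge, "1900"))
print(premieres_native(object_merge, "1900"))
print(premieres_object_mapped(object_merge, "1900"))
```

Against the second data set the simple query finds only the native statement, so a complete answer requires running both queries and combining their results.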

Given the importance of high fidelity it would seem to make sense to prefer a semantic mapping, provided that a sufficient degree of completeness can be achieved. The following section therefore looks at the issues involved in defining semantic mappings with a particular emphasis on determining whether the existence of formal data models for Topic Maps and RDF now makes it possible to ensure completeness as well as fidelity.

4.3 Semantic mapping issues

4.3.1 Identity

Although both RDF and Topic Maps use URIrefs as identifiers, they differ crucially in that Topic Maps offers two modes of identification, direct and indirect, whereas RDF offers only one. This prompts the question, which Topic Maps construct(s) should be regarded as being semantically equivalent to the URIref of an RDF resource? Subject identifiers, subject locators, ... or both?

Since identifiers are not part of the PMTM4 model, this issue is simply ignored in the Stanford proposal. Moore's position is not stated explicitly, but the examples he gives indicate that subject identifiers, at least, are regarded as equivalents. Both Ogievetsky and Unibo favour subject locators and define a separate property for handling subject identifiers. Garshol translates URIrefs to subject identifiers when going from RDF to Topic Maps, but is more agnostic when going the other way, translating both subject identifiers and subject locators to URIrefs.

There are problems with all of these approaches. Clearly, identifiers have to be mapped somehow, otherwise there will be loss of information. Equating URIrefs in RDF with subject locators is problematic in several ways. Firstly it leads to incorrect semantics (as the description of the Unibo proposal shows). Secondly, it results in lower fidelity (since the identifier of a non-addressable subject like Puccini will not be treated as the URIref of the corresponding resource, as would be most natural in RDF). Finally, the identifiers of occurrence types and association types could not be used as the URIrefs of RDF properties.

Equating URIrefs with subject identifiers rather than subject locators also results in lower fidelity, since the identifier of an addressable subject (i.e., an information resource) will not become the URIref of the corresponding resource, as would be most natural in RDF. However, this alternative does not exhibit the other problems that result from favouring subject locators.

There is a dilemma here and Garshol's agnosticism is in some ways a recognition of it. As a result, his TM2RDF translations exhibit the highest fidelity as far as identity is concerned. Unfortunately he loses the information about whether the URIref originated in a subject identifier or a subject locator and is thus reduced to translating every URIref to a subject identifier when going the other way. This leads to problems with roundtripping, as noted above.
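The roundtripping problem can be made concrete with a minimal sketch. It assumes a simplified topic represented as a dictionary with two lists of identifiers; the function names are hypothetical and the code mirrors only the behaviour described above, not Garshol's actual implementation.

```python
def tm2rdf_identity(topic):
    """Map subject identifiers and subject locators alike to URIrefs.

    The distinction between the two kinds of identifier is dropped here.
    """
    return sorted(topic["subject_identifiers"] + topic["subject_locators"])


def rdf2tm_identity(urirefs):
    """Map every URIref to a subject identifier, as nothing else is known."""
    return {"subject_identifiers": sorted(urirefs), "subject_locators": []}


# A topic for an information resource, identified directly by its address.
synopsis = {
    "subject_identifiers": [],
    "subject_locators": ["http://www.azopera.com/learn/synopsis/tosca.shtml"],
}

roundtripped = rdf2tm_identity(tm2rdf_identity(synopsis))
print(roundtripped == synopsis)
```

After the roundtrip the address has become a subject identifier, so the topic no longer represents the information resource itself but whatever that resource is taken to indicate.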

The ideal solution would be to allow either subject identifiers or subject locators to be regarded as URIrefs (and vice versa), but at the same time to retain sufficient information when going from Topic Maps to RDF to be able to perform roundtripping. The recognition in [TAG] of the distinction between resources in general and information resources and the insights in [Pepper 03] may provide the foundation for such a solution.

The issue of multiple identifiers is treated explicitly by Garshol only. For those proposals that regard the subject locator as the semantic equivalent of a resource's URIref and define a custom property for subject identifiers (Ogievetsky and Unibo) this was a non-issue as long as topics could only have one subject locator. However, in the forthcoming version of ISO 13250 multiple subject locators will be allowed so the issue will have to be faced explicitly. Garshol's proposal to use equivalence properties defined in OWL (i.e., owl:sameAs, owl:equivalentClass, and owl:equivalentProperty) should clearly be investigated in more detail since such an approach is likely to lead to increased interoperability between RDF and Topic Maps.

4.3.2 Topic names

In RDF the name of a resource is usually represented by a single statement. RDF Schema defines a property for this purpose (rdfs:label) but many vocabularies define their own properties (e.g. dc:title, foaf:name, etc.). An accurate semantic mapping from Topic Maps can be achieved by translating base names to such properties.

Both Garshol and Unibo take this approach, differing only in that Unibo always maps a base name to rdfs:label (and vice versa), while Garshol allows base names (including scoped base names) to be mapped to other properties. It should be noted that both proposals were written before the introduction of typed names in the Topic Maps model so neither can be considered a complete solution today.

Of the semantic mapping proposals, only Unibo provides a solution for handling variant names, by representing names that have variants as complex objects. Given the limitations of RDF the approach seems sound enough, except for the introduction of what appears to be a superfluous blank node as the value of the tm2rdf:parameter property.

4.3.3 Binary associations

In a semantic mapping a binary association should ideally be represented by a single RDF statement (and vice versa). The challenges relate to the lack of role types in RDF: When going from Topic Maps to RDF, which role-playing topic should become the subject of the resulting statement (and how should role types be preserved)? When going from RDF to Topic Maps: What role types should be assigned to the subject and object of the statement (and how to retain knowledge of what the subject and object were)?

Both Garshol and Unibo solve this by allowing additional information to be provided that allows the RDF subject and object to be connected with their respective role types. Unibo uses a single XML vocabulary that is external to the document being translated. Garshol uses an RDF vocabulary for going from RDF to Topic Maps, and a set of Published Subjects for going from Topic Maps to RDF. Garshol's approach has the advantage of allowing source documents to be self-describing (the mappings can be included in the source documents or their schemas). The disadvantage of Garshol is the use of two different vocabularies, one for each direction. A cleaner solution would be to use a single vocabulary.

In the absence of additional information, Unibo falls back to a low-fidelity object mapping that requires nine RDF statements to represent a single binary association. Garshol, on the other hand, tries to perform a semantic mapping anyway: the predefined classes subject and object are used when going from RDF to Topic Maps, and a role-player is selected at random to be the subject of the resulting statement when going from Topic Maps to RDF. As currently implemented this leads to loss of information and the inability to perform roundtripping. However, it is perfectly feasible for the latter translation to retain the information necessary to perform roundtripping in the form of an annotation to the schema using Garshol's own RTM vocabulary.

4.3.4 N-ary associations

Most of the existing proposals for translating associations with more than two role-players are unsatisfactory, since they result in a large number of RDF statements. [Noy 04] proposes patterns for representing n-ary relations in RDF in which the relation is "re-represented" as a class rather than a property. Each such pattern requires n statements in order to express the relationship. Using the example given in section 3.5.1 the result would be one of the following (P stands for the re-represented relation):

P rdf:type X .   P rA A .   P rB B .   P rC C .   # Pattern 2
P rdf:type X .   A rA P .   P rB B .   P rC C .   # Pattern 1

The first of these (labelled "Pattern 2") is identical to Garshol's proposal for n-ary associations. If such patterns are adopted in the RDF community it would seem to be advisable, in the interest of fidelity, to follow them as closely as possible when translating n-ary associations from Topic Maps to RDF.
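As an illustration, the statements labelled "Pattern 2" above could be generated as in the following sketch. The function name and the representation of an association as a list of (role type, player) pairs are assumptions made for this example.

```python
def nary_to_statements(node, assoc_type, roles):
    """Re-represent an n-ary association as a resource ("Pattern 2").

    `node` identifies the resource standing for the association and
    `roles` is a list of (role type, role player) pairs. The result is
    one typing statement plus one statement per role player.
    """
    statements = [(node, "rdf:type", assoc_type)]
    statements += [(node, role_type, player) for role_type, player in roles]
    return statements


statements = nary_to_statements(
    "P", "X", [("rA", "A"), ("rB", "B"), ("rC", "C")]
)
for s, p, o in statements:
    print(s, p, o, ".")
```

Because every role type is kept as a property and the association type as a class, both can be recovered when translating back to Topic Maps.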

4.3.5 Occurrences

Both Garshol and Unibo recognize that occurrences are most naturally represented as single RDF statements where the property corresponds to the occurrence type. Internal and external occurrences correspond to statements whose objects are literals and resources respectively. Going from Topic Maps to RDF presents no problems at all; going the other way seems to require additional information in order to distinguish an internal occurrence from a name, and an external occurrence from an association or identifier.

It is unclear how Unibo behaves in the absence of additional mapping information. The default in Garshol (at least as implemented in the Omnigator) is to translate statements whose objects are literals to internal occurrences and statements whose objects are resources to associations.
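The default behaviour described for Garshol can be sketched as a simple dispatch on the kind of object. The tagged-tuple representation of the object and the shape of the returned dictionaries are assumptions made for this example; the role types subject and object are the predefined classes mentioned in section 4.3.3.

```python
def default_rdf2tm(statement):
    """Translate one RDF statement without any mapping information.

    `statement` is (subject, property, (kind, value)), where kind is
    "literal" or "resource" (a hypothetical encoding for this sketch).
    """
    subject, prop, (kind, value) = statement
    if kind == "literal":
        return {
            "construct": "internal occurrence",
            "topic": subject,
            "type": prop,
            "value": value,
        }
    if kind == "resource":
        return {
            "construct": "association",
            "type": prop,
            "roles": [("subject", subject), ("object", value)],
        }
    raise ValueError("unknown object kind: %r" % (kind,))


date = default_rdf2tm(("tosca", "premiere-date", ("literal", "1900-01-14")))
composer = default_rdf2tm(("tosca", "composed-by", ("resource", "puccini")))
print(date["construct"], "/", composer["construct"])
```

Under this default a name such as an rdfs:label value would also become an internal occurrence, and a link to an external resource would become an association, which is why additional mapping information is needed for a faithful result.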

4.3.6 Types and subtypes

Garshol and Unibo agree on the fundamental semantic equivalence between type-instance and rdf:type, on the one hand, and supertype-subtype and rdfs:subClassOf on the other. In addition, association types and occurrence types are seen to equate to RDF properties. Role types present particular problems, as discussed above, and name types did not exist at the time the proposals were written.

4.3.7 Reification

Only Garshol and Unibo mention reification and neither proposal regards it as being problematic. In actual fact, Unibo only talks explicitly about reification of associations, while Garshol mentions reified names, occurrences, and associations. Neither proposal covers the reification of topic maps and association roles.

4.3.8 Scope

All the existing proposals discuss scope in one form or another but only Garshol and Unibo do so in terms of its semantics, i.e., as a way to express the contextual validity of an assertion. Garshol makes the point that scope is most properly regarded as a particular kind of assertion made about another assertion. Since assertions about assertions are handled through reification in both paradigms, and reification translates rather easily, Garshol proposes to translate scope using reification and a property that captures the semantics of contextual validity.
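The following sketch shows what such a translation might produce. It uses the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object); the property name rdftm:scoped-by is the one used as an example later in this section, and the function name, the node identifier, and the sample data are assumptions. Whether the base statement should also be asserted alongside its reification is left open here.

```python
def scoped_statement(node, statement, scoping_topics):
    """Reify `statement` and attach one scope property per scoping topic."""
    subject, prop, obj = statement
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", prop),
        (node, "rdf:object", obj),
    ]
    triples += [(node, "rdftm:scoped-by", topic) for topic in scoping_topics]
    return triples


# Hypothetical data: a name for Tosca that is valid in two scopes.
triples = scoped_statement(
    "_:s1", ("tosca", "rdfs:label", "Tosca"), ["italian", "short-name"]
)
for s, p, o in triples:
    print(s, p, o, ".")
```

Because the scoping topics are attached individually, scopes consisting of several topics pose no difficulty, and the same shape applies to names, occurrences, and associations alike.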

Garshol treats scoped base names as a special case, however, and allows a base name in a particular scope to be translated to a specific property. For example, a base name in the scope 'nickname' might be translated using the foaf:nick property. While this undoubtedly results in a high degree of fidelity (much higher than translating to, say, a reified rdfs:label statement with an rdftm:scoped-by property), such special-casing introduces a degree of inconsistency. Why should only base names be treated in this way? Why not associations and occurrences as well?

The answer may be that associations and occurrences have types whereas names do not (or did not, until recently). It could be argued that the lack of typed names in Topic Maps has led to scoped names being used in ways that distort the semantics of scope. Or, to put it another way: Given that the forthcoming revised Topic Maps standard will permit typed names, would it be more appropriate to represent a nickname as a name of type 'nickname' (or foaf:nick) rather than a name in the scope 'nickname'? If so, it would be possible to avoid treating scoped names as a special case and still enable a high degree of fidelity.

Unibo handles scope in three different ways (one of which involves reification) depending on the kind of construct in question. This is clearly even more inconsistent, and it is probably also unnecessary since the reification approach seems to be usable for scoping any kind of topic characteristic.

4.3.9 Other issues

None of the existing proposals discuss how to represent RDF containers and collections, language tags, XML literals or typed literals in Topic Maps. Of these issues, the latter two are addressed by recent datatyping extensions to the Topic Maps model. Language tagging can be seen as a kind of contextual information akin to scope and treated accordingly. Containers and collections may or may not require special treatment.


5 Conclusion

The foregoing analysis has shown that semantic mappings appear to fit the requirements for data interoperability between RDF and Topic Maps better than object mappings. The Garshol and Unibo proposals appear to come closest to providing a usable solution. However, there are a number of outstanding issues, the most important of which relate to:


Acknowledgements

Valentina Presutti and Nicola Gessa (University of Bologna) and Lars Marius Garshol (Ontopia) have contributed actively to the writing of this document. The test case result in section 3.3 was contributed by Nikita Ogievetsky.


References

[Ciancarini 03]
Ciancarini, Paolo; Gentilucci, Riccardo; Pirruccio, Marco; Presutti, Valentina; Vitali, Fabio: Metadata on the Web: On the integration of RDF and Topic Maps (2003)
[Garshol 01]
Garshol, Lars Marius: Topic maps, RDF, DAML, OIL: A comparison (2001)
[Garshol 02]
Garshol, Lars Marius: An RDF Schema for topic maps (2002)
[Garshol 03a]
Garshol, Lars Marius: Living with Topic Maps and RDF (2003)
[Garshol 03b]
Garshol, Lars Marius: The RTM RDF to topic maps mapping: Definition and introduction (2003)
[Gentilucci 02]
Gentilucci, Riccardo; Pirruccio, Marco: Metainformazioni sul World Wide Web: Conversione di formato e navigazione, University of Bologna, Master's thesis (2002; in print; in Italian)
[Kaminsky 02]
Kaminsky, Piotr: Integrating Information on the Semantic Web Using Partially Ordered Multi Hypersets (2002)
[Lacher 01]
Lacher, Martin S.; Decker, Stefan: On the Integration of Topic Maps and RDF Data (2001)
[Mindswap 02]
MindSwap: RDF Converter (2002)
[Moore 01]
Moore, Graham: RDF and Topic Maps: An exercise in convergence (2001)
[Noy 04]
Noy, Natasha; Rector, Alan: Defining N-ary Relations on the Semantic Web: Use With Individuals (2004)
[Ogievetsky 01a]
Ogievetsky, Nikita: Harvesting XML Topic Maps from RDF (2001)
[Ogievetsky 01b]
Ogievetsky, Nikita: XML Topic Maps through RDF glasses (2001)
[Ogievetsky 02]
Ogievetsky, Nikita: DAML and Quantum Topic Maps (2002)
[Ontopia 03a]
Ontopia: RTM: An RDF-to-TM mapping (2003)
[Ontopia 03b]
Ontopia: TMR: A TM-to-RDF mapping (2003)
[Ontopia 04]
Ontopia: tolog: Language tutorial (2004)
[Ontopia 05]
Ontopia: Omnigator Eight (2005)
[Pepper 03]
Pepper, Steve; Schwab, Sylvia: Curing the Web's Identity Crisis: Subject Indicators for RDF (2003)
[PMTM4]
Biezunski, Michel; Newcomb, Steven R.: Topicmaps.net's Processing Model for XTM 1.0, version 1.0.2 (2001)
[Prudhommeaux 02]
Prud'hommeaux, Eric; Moore, Graham: RDF Topic Map Mapping (2002)
[RDF-Concepts]
Klyne, Graham; Carroll, Jeremy J.: Resource Description Framework (RDF): Concepts and Abstract Syntax (W3C Recommendation, 2004)
[TAG]
Jacobs, Ian; Walsh, Norman: Architecture of the World Wide Web, Volume One (W3C Recommendation, 2004)
[TMDM]
Garshol, Lars Marius; Moore, Graham: ISO/IEC 13250: Topic Maps — Data Model (Final Committee Draft, 2005)
[TMRM]
Durusau, Patrick; Newcomb, Steven R.: ISO/IEC 13250: Topic Maps — Reference Model (Working Draft, 2004)
[Vatant 04]
Vatant, Bernard: Ontology-driven topic maps (2004)
[Vlist 01]
Vlist, Eric van der: Representing XML Topic Maps as RDF (2001)
[XTM1.1]
Garshol, Lars Marius; Moore, Graham: ISO/IEC 13250: Topic Maps — XML Syntax (Final Committee Draft, 2005)
