W3C

A Survey of RDF/Topic Maps Interoperability Proposals

W3C Editor's Draft 17 March 2005

This version:
http://www.w3.org/2001/sw/BestPractices/RDFTM/survey-2005-03-17
Latest version:
http://www.w3.org/2001/sw/BestPractices/RDFTM/survey
Previous version:
http://www.w3.org/2001/sw/BestPractices/RDFTM/survey-2005-02-24
Editors:
Steve Pepper, Ontopia <pepper@ontopia.net>
Fabio Vitali, University of Bologna <fabio@cs.unibo.it>
Lars Marius Garshol, Ontopia <larsga@ontopia.net>
Nicola Gessa, University of Bologna <gessa@cs.unibo.it>
Valentina Presutti, University of Bologna <presutti@cs.unibo.it>

Abstract

The Resource Description Framework (RDF) is a model developed by the W3C for representing information about resources in the World Wide Web. Topic Maps is a standard for knowledge integration developed by the ISO. This document contains a survey of existing proposals for integrating RDF and Topic Maps data and is intended to be a starting point for establishing standard guidelines for RDF/Topic Maps interoperability.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is the first deliverable of the RDF/Topic Maps Interoperability Task Force (RDFTM) initiated by the Semantic Web Best Practices and Deployment Working Group of the W3C (SWBPD) with the support of the ISO Topic Maps committee (ISO/IEC JTC1/SC34).

This document is a W3C Editor's Draft and is expected to change. The SWBPD does not expect this document to become a Recommendation. Rather, after further development, review and refinement, it will be published and maintained as a WG Note.

This document is not yet a Public Working Draft. We encourage public comments. Please send comments to public-swbp-wg@w3.org and include the text "comment" in the subject line.

Table of contents

1 Introduction
    1.1 Background
    1.2 Purpose and target audience
    1.3 Overview of proposals
2 Criteria for evaluating the proposals
    2.1 Translation features
    2.2 Test cases
        2.2.1 TM2RDF test case
        2.2.2 RDF2TM test case
3 Existing translation proposals
    3.1 The Moore Proposal
        3.1.1 Description
        3.1.2 Analysis
        3.1.3 Test cases
    3.2 The Stanford Proposal
        3.2.1 Description
        3.2.2 Analysis
        3.2.3 Test cases
    3.3 The Ogievetsky Proposal
        3.3.1 Description
        3.3.2 Analysis
        3.3.3 Test cases
    3.4 The Garshol Proposal
        3.4.1 Description
        3.4.2 Analysis
        3.4.3 Test cases
    3.5 The Unibo Proposal
        3.5.1 Description
        3.5.2 Analysis
        3.5.3 Test cases
    3.6 Other Proposals and Contributions
4 Analysis
    4.1 Object mappings and semantic mappings
    4.2 The importance of naturalness
    4.3 Semantic mapping issues
        4.3.1 Identity
        4.3.2 Names
        4.3.3 Binary relationships
        4.3.4 Non-binary relationships
        4.3.5 Occurrences
        4.3.6 Types and subtypes
        4.3.7 Reification
        4.3.8 Scope
        4.3.9 Other issues
5 Conclusion
Acknowledgements
References

1 Introduction

1.1 Background

The Resource Description Framework (RDF) is a model developed by the W3C for representing information about resources in the World Wide Web. Topic Maps is a standard for knowledge integration developed by the ISO. The two specifications were developed in parallel during the late 1990s within their separate organizations for what at first appeared to be very different purposes. The results, however, turned out to have a lot in common and this has led to calls for their unification.

While unification has to date not been possible (for a variety of technical and political reasons), a number of attempts have been made to uncover the synergies between RDF and Topic Maps and to find ways of achieving interoperability at the data level. There is now widespread recognition within the respective user communities that achieving such interoperability is a matter of some urgency. Work has therefore been initiated by the Semantic Web Best Practices and Deployment Working Group of the W3C with the support of the ISO Topic Maps committee to address this issue. The goal of this work is to provide "guidelines for users who want to combine usage of the W3C's RDF/OWL family of specifications and the ISO's family of Topic Maps standards." Two deliverables are expected to be produced: this survey of existing proposals, and a set of guidelines for RDF/Topic Maps interoperability.

1.2 Purpose and target audience

This document is the first of those deliverables. It consists of a summary and analysis of the major existing proposals for achieving data interoperability between RDF and Topic Maps. Its purpose is to prepare the ground for a new and definitive proposal based on a synthesis of previous work.

The primary goal is to achieve interoperability between RDF and Topic Maps at the data level. This means that it should be possible to translate data from one form to the other without unacceptable loss of information or corruption of the semantics. It should also be possible to query the results of a translation in terms of the target model and it should be possible to share vocabularies across the two paradigms.

[RDF-Schema] and [OWL] are considered relevant to this work to the extent that the classes and properties they define are supportive of its goals. However, it is explicitly not a goal of the current work to enable the general use of RDF Schema and OWL with Topic Maps, although this issue may be addressed later.

This document is aimed at readers with a particularly deep interest in the problem of RDF/Topic Maps interoperability and a willingness to acquire the necessary understanding of both paradigms. The reader is consequently expected to have a level of familiarity with both RDF and Topic Maps that at least corresponds to the tutorial material in [Pepper 00] and [RDF-Primer]. To fully understand this survey, the reader must in addition be familiar with the models described in [TMDM] and [RDF-Concepts], and the syntaxes described in [LTM] and [N3].

1.3 Overview of proposals

Five existing proposals are covered in this survey. They have been chosen as being sufficiently complete and well-documented to be suitable for detailed examination. They are also representative of the breadth of approaches that have been taken to date and can all be considered to be seminal in one way or another. They will be referred to by the names of their authors or, in the case of multiple authors, by the name of the organization with which the authors are affiliated. Each proposal builds upon and references previous work and they are characterized here in terms of the translation directions that they cover: i.e., RDF to Topic Maps (RDF2TM), and Topic Maps to RDF (TM2RDF), respectively. They are, in chronological order:

Moore

RDF2TM and TM2RDF proposal described in [Moore 01]. Not implemented.

Stanford

TM2RDF proposal described in [Lacher 01]. Implemented.

Ogievetsky

TM2RDF proposal described in [Ogievetsky 01b]. Implemented in the XTM2RDF Translator.

Garshol

RDF2TM and TM2RDF proposal described in [Garshol 01] and [Garshol 03a]. Documented in [Garshol 03b], [Ontopia 03a], and [Ontopia 03b], and implemented in Ontopia Knowledge Suite.

Unibo

RDF2TM and TM2RDF proposal described in [Gentilucci 02] and [Ciancarini 03]. Implemented in Meta.

The following proposals will only be considered briefly since they are insufficiently complete to warrant detailed examination:

The following contributions are also recognized as being relevant:

This survey describes the five main proposals in chronological order. Each proposal is summarized and evaluated in terms of criteria that are described in the next section. In addition, test cases are applied against each proposal. It is important to note that all five proposals were published before the respective communities had formalized their data models (in [TMDM] and [RDF-Concepts], respectively). They were also published before the advent of the Web Ontology Language ([OWL]). This accounts in part for the immaturity of some of the proposals; any quoted statements about the limitations of either paradigm should also be viewed in this light.


2 Criteria for evaluating the proposals

2.1 Translation features

Each translation proposal is evaluated against the following general criteria:

Completeness

The criterion completeness is used to evaluate the extent to which each proposal is able to handle every semantic construct that can be expressed in the source model and provide a means to represent it without loss of information in the target model. A complete translation will by definition be reversible.

Naturalness

The criterion naturalness expresses the degree to which the results of a translation correspond to the way in which someone familiar with the target paradigm would naturally express the information content in that paradigm. Naturalness normally also confers improved readability on the result. The importance of naturalness is discussed in section 4.2.


2.2 Test cases

This survey uses two simple test cases to enable an initial evaluation of the criterion "naturalness". These test cases are not intended to be complete since their purpose is merely to give a feel for the kind of results produced by the various proposals. A complete suite of test cases is expected to be created along with the Guidelines for RDF/Topic Maps interoperability that the SWBPD expects to produce.

Test cases and the results of translations are given in LTM and N3 notation (for Topic Maps and RDF respectively) in order to aid readability. There is one test case for translations from Topic Maps to RDF (TM2RDF) and a second for translations from RDF to Topic Maps (RDF2TM).

In order to aid understanding, the two test cases are basically identical in terms of their information content, which is taken from [Pepper 05]. They consist of the assertions that the opera Tosca was premiered on 14th January 1900, has a synopsis at a location with a certain URL, and was composed by the composer Giacomo Puccini.

Both test cases are separated into instance data (above the dotted line comment) and ontology or schema data that might normally be expected to come from a shared document, such as a topic map ontology or the RDF namespace document respectively.

2.2.1 TM2RDF test case

[puccini : person   = "Giacomo Puccini"]
[tosca   : opera    = "Tosca"]

{tosca, premiere-date, [[1900-01-14]]}
{tosca, synopsis,      "http://www.azopera.com/learn/synopsis/tosca.shtml"}

composed-by( tosca : work, puccini : composer )

               /* ------------------------------------- */

[person        = "Person"        @"http://psi.ontopia.net/music/#person"]
[composer      = "Composer"      @"http://psi.ontopia.net/music/#composer"]
[opera         = "Opera"         @"http://psi.ontopia.net/music/#opera"]
[work          = "Work"          @"http://psi.ontopia.net/music/#work"]

[premiere-date = "Première date" @"http://psi.ontopia.net/music/#premiere-date"]
[synopsis      = "Synopsis"      @"http://psi.ontopia.net/music/#synopsis"]
[composed-by   = "Composed by"   @"http://psi.ontopia.net/music/#composed-by"]

2.2.2 RDF2TM test case

@prefix music: <http://psi.ontopia.net/music/#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .

[ rdf:type music:opera;
  rdfs:label "Tosca";
  music:premiere-date "1900-01-14";
  music:synopsis <http://www.azopera.com/learn/synopsis/tosca.shtml>;
  music:composed-by [
    rdf:type music:person;
    rdfs:label "Giacomo Puccini" ]
] .

               # ---------------------------------------

music:person        rdfs:label "Person" .
music:opera         rdfs:label "Opera" .

music:composed-by   rdfs:label "Composed by" .
music:premiere-date rdfs:label "Première date" .
music:synopsis      rdfs:label "Synopsis" .

3 Existing translation proposals

3.1 The Moore Proposal

3.1.1 Description

[Moore 01] was the first paper to address the issue of interoperability between RDF and Topic Maps. The paper starts out by presenting data models developed by the author that "capture the isness [sic] of the two paradigms". Having presented the two models, Moore introduces the distinction between what he calls "mapping the model" and "modelling the model". The key difference is that the former is "semantic", whereas the latter "uses each standard as a tool for describing other models". The two approaches will hereafter be termed "semantic mapping" and "object mapping", respectively.

Moore provides examples of both strategies but states clearly that semantic mapping is preferable to object mapping. The reason for this is that a goal is to be able to run, say, a TMQL query against an RDF model and get "expected results" ("i.e. those that would be gained from running a query against the equivalent topic map"). Moore points out that this is only possible when a semantic mapping approach is used.

An object mapping

Moore's RDF2TM object mapping approach is based on defining PSIs for every RDF construct in his model (i.e., resource, statement, property, subject, object, identity, literal, and model) and expressing RDF statements as ternary associations of type rdf-statement using the role types rdf-subject, rdf-property and rdf-object. This raises issues with the handling of literals (since role players in associations cannot be strings) to which no solution is proposed.

The TM2RDF object mapping approach is based on defining RDF properties for each TM construct as follows: topic, topicassoc, instanceof, topicassocmember, roleplayingtopic, roledefiningtopic, topicoccur, topicname, topicnamevalue, scopeset, subjindicatorref, resourceref. An example of a simple binary association is given that involves five topics (for the association type and role types, in addition to the role-playing topics). The RDF equivalent of this requires 22 statements, three for each of the five topics, and seven for the association itself.

A semantic mapping

Moore concludes that the object mapping approach, while interesting, is of limited usefulness, and he goes on to describe a semantic mapping approach (which he calls "mapping the model") based on the observation that "RDF is concerned with describing the arcs between entities with identity [whereas] Topic Maps is concerned with describing typed relationships between entities with identity." A number of semantic equivalences are defined, as follows:

RDF          Topic Maps
RDF model    Topic Map
Identity     SubjectIndicatorReference
Resource     Topic
Statement    Association (approximate)

The mapping from RDF statement to association is identified as being problematic because "RDF has three pieces of information and Topic Map associations have five", leading the author to suspect that a "complete" semantic mapping of the models may not be possible. The remainder of the paper is devoted to examining how to represent RDF statements as associations and vice versa.

RDF statements are viewed as binary associations whose role-players correspond to the subject and object of the statement and have the role types 'subject' and 'object' respectively. The mechanism for representing the property of the statement is not fully defined, since the text and the diagram contradict each other. However, both text and diagram assign some significance to the name of the topic that represents the subject role.

According to Moore, this approach has a problem in that 'arc' is "not a first class entity in the TopicMap model". Why this should be a problem is not made clear, but Moore advocates solving it by extending the Topic Maps model with the notion of arcs (and association templates).

A different approach is employed in order to view associations as RDF statements. An incomplete example shows a binary association represented as two RDF statements, with the role-playing topics being the subject and object in the one and the object and subject in the other. This approach is perhaps motivated by the recognition that RDF statements have direction whereas associations do not. However, this is not stated explicitly; nor is it clear how the approach would work with associations that involve more than two role players.

3.1.2 Summary

Moore's object mapping approach is reasonably complete, whereas his semantic mapping approach is just a sketch that focuses on RDF statements and associations. Other constructs like names, occurrences and scope are not covered. Neither approach is reversible. In the case of the object mapping approach, the assumption is that one is working in one domain or the other, but not in both. In the case of the semantic mapping approach, the fact that a statement maps to a single association whereas an association maps to two statements shows that translations cannot be reversed.

Semantic mappings are shown to be superior to object mappings in terms of naturalness. The latter yields unnatural results in both directions. Whatever the direction, a "natural" source document leads to an "unnatural" result and achieving a "natural" result is only possible if the starting point is "unnatural". In the object mapping example given in [Moore 01], a simple binary association translates to 22 RDF statements.

Moore's semantic mapping approach, on the other hand, achieves a more natural result: going from Topic Maps to RDF, a binary association requires two RDF statements; going the other way, an RDF statement maps to a single association.
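The asymmetry between the two directions, and the resulting lack of reversibility, can be illustrated with a minimal sketch. The data shapes, the function names, and in particular the decision to carry the RDF property as the association type are assumptions made for illustration only; [Moore 01] does not fully define how the property is represented.

```python
# Hypothetical sketch of the asymmetry in Moore's semantic mapping.
# Data shapes and the handling of the property are assumptions, not taken
# from [Moore 01].

def statement_to_association(subject, prop, obj):
    """RDF2TM: one statement becomes one binary association."""
    return {
        "type": prop,  # assumption: the property is carried as the association type
        "roles": [("subject", subject), ("object", obj)],
    }

def association_to_statements(assoc):
    """TM2RDF: one binary association becomes two statements, one per direction."""
    (_, first), (_, second) = assoc["roles"]
    return [
        (first, assoc["type"], second),
        (second, assoc["type"], first),
    ]

original = ("tosca", "composed-by", "puccini")
round_trip = association_to_statements(statement_to_association(*original))
print(round_trip)
print(round_trip == [original])
```

Under these assumptions the round trip returns two statements where there was one, so the original statement cannot be distinguished from its inverse.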

3.1.3 Test cases

TM2RDF

The following (incomplete) result of Moore's object mapping approach was constructed by hand, based on the binary association example given in [Moore 01]. It does not cover the two occurrences in the test case since there are no examples of how this proposal handles occurrences. Lack of clarity in [Moore 01] prevents the construction of a corresponding result of the semantic mapping approach; however, the latter could be expected to contain significantly fewer RDF statements.

# topic 1: puccini
_:puccini
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic1 .

_:topic1
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Giacomo Puccini" .

_:puccini
 <http://www.empolis.com/rdftmmapping#tm-instanceof>
  "http://psi.ontopia.net/music/#person" .

# topic 2: tosca
_:tosca
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic2 .

_:topic2
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Tosca" .

_:tosca
 <http://www.empolis.com/rdftmmapping#tm-instanceof>
  "http://psi.ontopia.net/music/#opera" .

# topic 3: composer
<http://psi.ontopia.net/music/#composer>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic3 .

_:topic3
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Composer" .

<http://psi.ontopia.net/music/#composer>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/music/#composer" .

# topic 4: opera
<http://psi.ontopia.net/music/#opera>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic4 .

_:topic4
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Opera" .

<http://psi.ontopia.net/music/#opera>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/music/#opera" .

# topic 5: composed-by
<http://psi.ontopia.net/music/#composed-by>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic5 .

_:topic5
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Composed by" .

<http://psi.ontopia.net/music/#composed-by>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/music/#composed-by" .

# topic 6: person
<http://psi.ontopia.net/music/#person>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic6 .

_:topic6
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Person" .

<http://psi.ontopia.net/music/#person>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/music/#person" .

# topic 7: work
<http://psi.ontopia.net/music/#work>
 <http://www.empolis.com/rdftmmapping#tm-topicname>
  _:topic7 .

_:topic7
 <http://www.empolis.com/rdftmmapping#tm-topicnamevalue>
  "Work" .

<http://psi.ontopia.net/music/#work>
 <http://www.empolis.com/rdftmmapping#tm-subjindicatorref>
  "http://psi.ontopia.net/music/#work" .

# association
_:assoc-1
 <http://www.empolis.com/rdftmmapping#tm-instanceof>
  <http://psi.ontopia.net/music/#composed-by> .
_:assoc-1
 <http://www.empolis.com/rdftmmapping#tm-topicassocmember>
  _:assocmember-1 .
_:assoc-1
 <http://www.empolis.com/rdftmmapping#tm-topicassocmember>
  _:assocmember-2 .
_:assocmember-1
 <http://www.empolis.com/rdftmmapping#tm-roledefiningtopic>
  <http://psi.ontopia.net/music/#composer> .
_:assocmember-1
 <http://www.empolis.com/rdftmmapping#tm-roleplayingtopic>
  _:puccini .
_:assocmember-2
 <http://www.empolis.com/rdftmmapping#tm-roledefiningtopic>
  <http://psi.ontopia.net/music/#work> .
_:assocmember-2
 <http://www.empolis.com/rdftmmapping#tm-roleplayingtopic>
  _:tosca .

The main thing to note about this test result is the number of statements required (28) to represent just a part of the information content that would naturally be expressed in RDF using just 12 statements. Since associations require seven statements it can be reasonably assumed that the two occurrences that are not represented here would require a further 2*7 statements, plus two statements (each) for the names of the occurrence types. This would bring the total number of statements to approximately 46.
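The arithmetic behind these figures can be restated as a short calculation. The variable names are illustrative; the per-construct costs are those given in the text (three statements per topic, seven per association, and an assumed seven per occurrence).

```python
# Statement counts for the hand-built object-mapping result above.
STATEMENTS_PER_TOPIC = 3        # name link, name value, and type or subject indicator
STATEMENTS_PER_ASSOCIATION = 7  # 1 type + 2 members + 2 role types + 2 role players

topics_shown = 7
associations_shown = 1
shown = (topics_shown * STATEMENTS_PER_TOPIC
         + associations_shown * STATEMENTS_PER_ASSOCIATION)

# Estimate for the two omitted occurrences, following the assumption in the
# text that each costs as much as an association, plus two name statements
# for each of the two occurrence types.
omitted_occurrences = 2 * STATEMENTS_PER_ASSOCIATION
occurrence_type_names = 2 * 2
estimated_total = shown + omitted_occurrences + occurrence_type_names

print(shown, estimated_total)  # 28 46
```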

The verbosity (or "statement bloat") seen here is typical of TM2RDF translation approaches that are based on object mappings, as will be confirmed by the accounts of the Stanford and Ogievetsky proposals.

RDF2TM

This test case cannot be represented as a topic map in its entirety following Moore's object mapping approach because there is no provision for RDF statements whose objects are literals (which is the case for eight of the 12 statements in the test case, including all the names). The four statements whose objects are resources would each be represented as a ternary association of type statement, as follows:

statement( ag0 : subject, composed-by : property, ag1 : object )

(This ternary association captures the assertion that Tosca (ag0) was composed by Puccini (ag1).)
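The split between translatable and untranslatable statements can be sketched as follows. This is an illustration only: the tuple and dictionary shapes, the Literal marker class, and the identifiers ag0 and ag1 for the two blank nodes are assumptions, not part of [Moore 01].

```python
# Hypothetical sketch of Moore's RDF2TM object mapping applied to the test case.
# The tuple/dict shapes and the Literal marker class are assumptions.

class Literal(str):
    """Marks a statement object that is a string rather than a resource."""

TRIPLES = [
    ("ag0", "rdf:type", "music:opera"),
    ("ag0", "rdfs:label", Literal("Tosca")),
    ("ag0", "music:premiere-date", Literal("1900-01-14")),
    ("ag0", "music:synopsis", "http://www.azopera.com/learn/synopsis/tosca.shtml"),
    ("ag0", "music:composed-by", "ag1"),
    ("ag1", "rdf:type", "music:person"),
    ("ag1", "rdfs:label", Literal("Giacomo Puccini")),
    ("music:person", "rdfs:label", Literal("Person")),
    ("music:opera", "rdfs:label", Literal("Opera")),
    ("music:composed-by", "rdfs:label", Literal("Composed by")),
    ("music:premiere-date", "rdfs:label", Literal("Première date")),
    ("music:synopsis", "rdfs:label", Literal("Synopsis")),
]

def to_association(subject, prop, obj):
    """Express one statement as a ternary association of type 'statement'."""
    return {
        "type": "statement",
        "roles": {"subject": subject, "property": prop, "object": obj},
    }

associations = []
untranslatable = []
for s, p, o in TRIPLES:
    if isinstance(o, Literal):
        untranslatable.append((s, p, o))  # role players cannot be strings
    else:
        associations.append(to_association(s, p, o))

print(len(associations), len(untranslatable))  # 4 8
```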

The RDF2TM test case also cannot be represented as a topic map in accordance with Moore's alternative semantic mapping approach due to insufficient information in [Moore 01]. Each RDF statement would in theory be represented by a single binary association, but once again there is no provision for handling statements whose objects are literals.


3.2 The Stanford Proposal

3.2.1 Description

Lacher and Decker [Lacher 01] focus on making it possible to query Topic Maps using an "RDF-aware infrastructure" that was co-developed by one of the authors. This proposal is thus TM2RDF only.

Reference is made to the layered integration model of data interoperability which separates the data integration problem into three quasi-independent layers: the syntax layer, the object layer, and the semantic layer. The idea is to build an RDF representation of the topic map on the object layer and then perform a "bijective graph transformation" such that the topic map can be viewed as RDF. Ignoring the syntax layer means that the approach will work with both the SGML and the XML serialization syntaxes of Topic Maps. Ignoring the semantic layer (i.e., adopting what we have termed an object mapping approach) has the advantage, according to the authors, that all information is preserved. (The authors point out that a semantic mapping "could possibly lead to a loss of information".)

Instead of defining their own model for Topic Maps, Lacher and Decker use PMTM4, the Processing Model for Topic Maps, proposed by Newcomb and Biezunski in [PMTM4], which has since been superseded by [TMDM].

PMTM4 is a graph model consisting of three node types (for topics, associations, and scopes), and four arc types: associationMember (aM), associationScope (aS), associationTemplate (aT), and scopeComponent (sC). The aM arc is "peculiar" in that it is both typed and labelled (and thus effectively has three ends) in order to connect the association with both the role-player and its role (or role type). Names and occurrences are regarded as specializations of associations; URIs and strings are not part of the model.

To illustrate their approach Lacher and Decker show a simple (untyped) association between the country Denmark (which has a name) and the natural resource petroleum. This is represented as a PMTM4 graph consisting of eight t-nodes, two a-nodes, four aM arcs, and one aT arc. The (binary) association between Denmark and petroleum requires two aM arcs (one for each role), and so does the name "Denmark" (since topic names are regarded in PMTM4 as a kind of binary association).

Lacher and Decker define RDF classes and properties for each of the PMTM4 node types and arc types. The transformation consists essentially of replacing a-, t-, and s-nodes with RDF nodes of corresponding types, and replacing arcs with corresponding properties. However, in order to handle the "three-legged" aM arcs, reification is necessary, thus introducing one new RDF node and four new properties (rdf:subject, rdf:predicate, rdf:object and tms:roleLabel) for each aM arc. The resulting "RDF Topic Map graph" is shown as a figure consisting of a total of 17 nodes and 20 arcs. (The actual totals should probably be higher since rdf:type is only specified for a few nodes.)

The authors opt to represent each undirected PMTM4 arc by a single, directed RDF arc (rather than two arcs) in order to avoid consistency problems, pointing out that while this is not a lossy transformation, it does require consideration when formulating queries.
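The cost of reifying each aM arc can be made concrete with a minimal sketch. No code is given in [Lacher 01]; the function name, the tuple layout, and the generated node labels are assumptions, and the triple pattern follows the hand-coded listing in section 3.2.3 (which also types each reifying node as rdf:Statement).

```python
# Hypothetical sketch: emit triples for one PMTM4 association in the Stanford
# style. Function name, tuple layout and blank-node labels are assumptions.

def association_to_triples(assoc_id, template, members):
    """members is a list of (role_label, role_player) pairs."""
    triples = [
        (assoc_id, "rdf:type", "tms:association"),
        (assoc_id, "tms:associationTemplate", template),
    ]
    for index, (role_label, player) in enumerate(members, start=1):
        arc = f"{assoc_id}-role-{index}"  # new node that reifies the aM arc
        triples += [
            (arc, "rdf:type", "rdf:Statement"),
            (arc, "rdf:subject", assoc_id),
            (arc, "rdf:predicate", "tms:roleLabel"),
            (arc, "rdf:object", player),
            (arc, "tms:roleLabel", role_label),
        ]
    return triples

triples = association_to_triples(
    "_:puccini-tosca-assoc",
    "_:composed-by",
    [("_:composer", "_:puccini"), ("_:work", "_:tosca")],
)
for triple in triples:
    print(*triple)
print(len(triples))  # 12
```

A binary association thus costs two statements for the association node plus five for each role, twelve in all, before any rdf:type statements for the participating topics are added.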

No syntax example is given in [Lacher 01] to show the result of the transformation but from the text it is clear that node identity is either based on source locators (where XML IDs were specified in the source topic map) or else generated (where no IDs were specified). Subject identifiers and subject locators are not used – presumably because the PMTM4 model does not extend to identifiers.

Having constructed an RDF graph from the topic map, Lacher and Decker show how it can be queried, together with native RDF data, by a single query expressed in F-Logic syntax. The query uses the RDF-encoded topic map to find all countries that have petroleum as a natural resource and then extracts links to DMOZ Travel_and_Tourism pages for those countries from the RDF-encoded Open Directory:

FORALL pages <- Country, DMOZCountry Y,X, Z
    Y[tms:roleLabel->country;rdf:object->Country]
        @CIA_WORLD_FACTBOOK and
    X[tms:roleLabel->natural-resource;
      rdf:object->petroleum;
      rdf:subject->Z[tms:associationMember->Country]
        @CIA_WORLD_FACTBOOK]
        @CIA_WORLD_FACTBOOK and
    Country[mapsTo->DMOZCountry] and
    DMOZCountry[Travel_and_Tourism ->dmozpage[links->pages]]
        @DMOZ.

3.2.2 Summary

The Stanford approach is complete with respect to PMTM4, but the latter is not a complete model for Topic Maps (since it does not handle URIs and strings). The Stanford proposal itself is therefore not complete.

The proposal does not score well in terms of naturalness since it requires upwards of 20 statements to represent information that would naturally be modelled using two statements in RDF. The TM2RDF test case results in approximately 160 statements.

3.2.3 Test cases

TM2RDF

A test case has been requested from the authors. The following is an attempt to hand-code parts of the test case. Only the association and the names of the two role-playing topics are shown. All occurrences, type-instance relationships, and names of typing topics are omitted. It is estimated that these would require a further 115 statements (13*2=26; + 12*2=24; + 12*5+5=65) in addition to the 45 statements shown below.

@prefix  rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix  tms:   <http://www.standford.edu/rdftmmapping/tm-schema#> .
@prefix  psi1:  <file:psi1.xtm#> .
@prefix  core:  <file:core.xtm#> .

### composed-by association ---------------------------
_:puccini-tosca-assoc
  rdf:type                  tms:association ;
  tms:associationTemplate   _:composed-by .

# reified statement representing composer role
_:puccini-composer-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:puccini-tosca-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        _:puccini ;
  tms:roleLabel     _:composer .

# reified statement representing work role
_:puccini-work-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:puccini-tosca-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        _:tosca ;
  tms:roleLabel     _:work .

### topic-basename association for puccini ------------
_:puccini-name-assoc
  rdf:type                  tms:association ;
  tms:associationTemplate   psi1:at-topic-basename .

# reified statement representing topic role
_:puccini-topic-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:puccini-name-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        _:puccini ;
  tms:roleLabel     core:role-topic .

# reified statement representing basename role
_:puccini-name-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:puccini-name-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        "Giacomo Puccini" ;
  tms:roleLabel     core:role-basename .

### topic-basename association for tosca --------------
_:tosca-name-assoc
  rdf:type                  tms:association ;
  tms:associationTemplate   psi1:at-topic-basename .

# reified statement representing topic role
_:tosca-topic-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:tosca-name-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        _:tosca ;
  tms:roleLabel     core:role-topic .

# reified statement representing basename role
_:tosca-name-role
  rdf:type          rdf:Statement ;
  rdf:subject       _:tosca-name-assoc ;
  rdf:predicate     tms:roleLabel ;
  rdf:object        "Tosca" ;
  tms:roleLabel     core:role-basename .

### specification of node types -----------------------
_:puccini                 rdf:type          tms:topic .
_:tosca                   rdf:type          tms:topic .
_:composed-by             rdf:type          tms:topic .
_:composer                rdf:type          tms:topic .
_:opera                   rdf:type          tms:topic .

tms:associationTemplate   rdf:type          tms:topic .
tms:roleLabel             rdf:type          tms:topic .

core:role-topic           rdf:type          tms:topic .
core:role-basename        rdf:type          tms:topic .

3.3 The Ogievetsky Proposal

3.3.1 Description

From XTM to RDF

[Ogievetsky 01b] describes both a method for transforming topic maps expressed in XTM syntax ([XTM1.0]) to RDF and the author's XSLT-based implementation of this approach in the XTM2RDF Translator. Transformations are described in terms of the processing of XTM elements and the approach is thus very syntax-oriented. The resulting RDF conforms to a vocabulary (called RTM) which consists of 11 classes and 17 properties defined partly in terms of XTM itself and partly in terms of PMTM4, the "processing model" proposed by Newcomb and Biezunski and described in the preceding section.

The classes and properties defined by the RTM vocabulary are:

rdfs:Class
t-node, topic, scope, member, association, basename, variantname, occurrence, class-subclass, class-instance, templaterpc
rdf:Property
association-role, validIn, indicatedBy, constitutedBy, name, templatedBy, role-topic, role-basename, role-variantname, role-occurrence, role-superclass, role-subclass, role-class, role-instance, role-template, role-role, role-rpc

Each <topic> element results in the creation of an RDF statement of type rtm:topic. The topic's subject locator (if any) becomes the URI of the subject of the statement; otherwise a blank node is created. Subject identifiers (if any) result in properties of type rtm:indicatedBy. The purpose of stating that topics are of type rtm:topic seems to be the desire to use rtm:topic as an element type name in order to aid readability when using the "third RDF basic abbreviated form".

Associations are represented as blank nodes whose type corresponds to the association type. In addition, for each role in the association there is one statement whose property corresponds to the role type (e.g. ns1:composer and ns1:work in the example below); its value is a node of type rtm:member that references the role player. Referencing is done through an rtm:indicatedBy property when the role player has a subject identifier and an rtm:constitutedBy property when the role player has a subject locator. (The text does not state what form the reference takes when the role player has neither.)

The following example shows how the association between Tosca and Puccini is represented in RDF/XML in "third RDF basic abbreviated form":

<ns1:composed-by>
  <ns1:composer>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="http://en.wikipedia.org/wiki/Puccini" />
    </rtm:member>
  </ns1:composer>
  <ns1:work>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="http://psi.ontopia.net/opera/#tosca" />
    </rtm:member>
  </ns1:work>
</ns1:composed-by>

In all, seven RDF statements are used to represent the association. (Normally in RDF, of course, a relationship like this would be represented by a single statement.)
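The count of seven can be checked with a small sketch. The Python below is illustrative only and is not Ogievetsky's XSLT; the function name association_triples, the representation of triples as tuples, and the blank node labels are assumptions made for this example.

```python
# Illustrative sketch only: this is not Ogievetsky's XSLT implementation.
RDF_TYPE = "rdf:type"


def association_triples(assoc_type, roles):
    """Return the triples for one association in the RTM style.

    roles is a list of (role_type, player_subject_identifier) pairs.
    Blank nodes are plain strings starting with "_:" (an assumption
    of this sketch, not something the proposal prescribes).
    """
    assoc = "_:assoc"
    # One statement types the association node itself.
    triples = [(assoc, RDF_TYPE, assoc_type)]
    for i, (role_type, player) in enumerate(roles):
        member = f"_:member{i}"
        # Three statements per role: the role property, the member's
        # type, and the reference to the role player.
        triples.append((assoc, role_type, member))
        triples.append((member, RDF_TYPE, "rtm:member"))
        triples.append((member, "rtm:indicatedBy", player))
    return triples


triples = association_triples(
    "ns1:composed-by",
    [
        ("ns1:composer", "http://en.wikipedia.org/wiki/Puccini"),
        ("ns1:work", "http://psi.ontopia.net/opera/#tosca"),
    ],
)
for triple in triples:
    print(triple)
print(len(triples))  # 1 + 3 per role, i.e. 7 for a binary association
```

Under this reading the cost grows by three statements for every additional role, so a ternary association would need ten.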

There is a very obvious similarity between the syntax shown above and XTM, which could indicate that the desire to output readable RDF/XML syntax (and perhaps the exigencies of XSLT-based processing) has influenced the form of RDF chosen for the target model.

In accordance with PMTM4, the approach to handling associations described above is extended to other Topic Maps constructs. Thus, the type-instance relationship is regarded as an association of a specific type; and occurrence, base name, and variant are all regarded as subtypes of association with fixed pairs of role types (topic/occurrence, topic/name, and basename/variantname, respectively).

String values for names and internal occurrences are represented as the values of rtm:name properties of member nodes. The following example shows the base name of the composer Puccini as output by the xtm2rdf.xsl XSLT stylesheet. A blank node represents the topic-basename relationship. Syntactically, the rtm:baseName construct has exactly the same "shape" as the association shown above:

<rtm:baseName rdf:ID="XSLTbaseName122124120120">
  <rtm:role-topic>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="#puccini" />
    </rtm:member>
  </rtm:role-topic>
  <rtm:role-name>
    <rtm:member>
      <rtm:name>Giacomo Puccini</rtm:name>
    </rtm:member>
  </rtm:role-name>
</rtm:baseName>

As with binary associations, seven RDF statements are required to represent a single topic name characteristic that would naturally be modelled using a single statement in RDF.

Round-tripping RDF to Topic Maps and back

Having presented the methodology for translating XTM to RDF, Ogievetsky considers round-tripping from RDF to XTM and back to RDF. [Ogievetsky 01b] is actually a continuation of earlier work for which only a set of slides ([Ogievetsky 01a]) is available. In the earlier work RDF data was translated into XTM, again using XSLT stylesheets.

To demonstrate round-tripping [Ogievetsky 01b] shows an example of a Dublin Core fragment in RDF being translated to XTM according to the methodology in [Ogievetsky 01a], and then translated back to RDF according to the methodology in [Ogievetsky 01b]. The source document contains a single RDF statement asserting that the resource ZARA.xml has the creator "Jane M. Folpe". This translates to a topic map consisting of six TAOs (five topics and one association), which in turn translates back to RDF as a set of no fewer than 26 RDF statements. "Obviously we accumulated a lot of semantic luggage during our roundtrip" is Ogievetsky's laconic comment.

The remainder of [Ogievetsky 01b] is devoted to showing how "RDF Topic Maps" can be queried (using the RDF query language SquishQL) and constrained (using DAML+OIL). The following sample query shows how to find all topics that have names in the scope "taxon":

SELECT ?topic, ?name
FROM  http://www.cogx.com/xtm2rdf/seacr.rtm#
WHERE
  (rdf::type ?a rtm::basename)
  (rtm::role-topic ?a ?m1) (rtm::indicatedBy ?m1 ?topic)
  (rtm::role-name ?a ?m2)(rtm::name ?m2 ?name)
  (rtm::validIn ?a ?s)(rtm::indicatedBy ?s this::taxon)
USING
  rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rtm FOR http://www.cogx.com/xtm2rdf/rtm.rdf#
  this FOR  http://www.cogx.com/xtm2rdf/seacr.rtm#

3.3.2 Summary

The proposal appears to be fairly complete in that it covers more-or-less every aspect of XTM syntax (which requires extending the underlying PMTM4 model in order to cater for identifiers). The example of round-tripping shows clearly that this proposal in combination with the undocumented RDF2TM translation fails the test of reversibility since a single RDF statement ends up as 26 statements after the roundtrip.

The proposal requires seven statements to represent information content that would naturally be modelled using one statement in RDF and thus rates very low in terms of naturalness. Translating the Topic Maps test case results in an RDF document containing 125 statements.

3.3.3 Test cases

TM2RDF
@prefix rtm: <http://www.cogx.com/xtm2rdf/rtm.rdf#> .
@prefix ns1: <http://psi.ontopia.net/music/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix this: <#> .

this:XSLTbaseName121120120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Opera" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:opera ] .

this:XSLTbaseName121121120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Composer" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:composer ] .

this:XSLTbaseName121122120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Première date" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:premiere-date ] .

this:XSLTbaseName121123120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Composed by" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:composed-by ] .

this:XSLTbaseName121126120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Synopsis" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:synopsis ] .

this:XSLTbaseName121127120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Work" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:work ] .

this:XSLTbaseName121126120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Person" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:person ] .

this:XSLTbaseName122124120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Giacomo Puccini" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:puccini ] .

this:XSLTbaseName122125120120
  rdf:type rtm:baseName;
  rtm:role-name  [
    rdf:type rtm:member;
    rtm:name "Tosca" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] .

this:XSLTinstanceOf120124120120
  rdf:type rtm:classInstance;
  rtm:role-class  [
    rdf:type rtm:member;
    rtm:indicatedBy ns1:person ];
  rtm:role-instance  [
    rdf:type rtm:member;
    rtm:indicatedBy this:puccini ] .

this:XSLTinstanceOf120125120120
  rdf:type rtm:classInstance;
  rtm:role-class  [
    rdf:type rtm:member;
    rtm:indicatedBy ns1:opera ];
  rtm:role-instance  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] .

this:XSLToccurrence123125120120
  rdf:type ns1:premiere-date;
  rtm:role-occurrence  [
    rdf:type rtm:member;
    rtm:name "1900 (14 Jan)" ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] .

this:XSLToccurrence124125120120
  rdf:type ns1:synopsis;
  rtm:role-occurrence  [
    rdf:type rtm:member;
    rtm:constitutedBy <http://www.azopera.com/learn/synopsis/tosca.shtml> ];
  rtm:role-topic  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] .

[ rdf:type ns1:composed-by;
  ns1:composer  [
    rdf:type rtm:member;
    rtm:indicatedBy this:puccini ];
  ns1:work  [
    rdf:type rtm:member;
    rtm:indicatedBy this:tosca ] ].

this:composed-by
  rdf:type rtm:topic;
  rtm:indicatedBy ns1:composed-by .

this:composer
  rdf:type rtm:topic,
           rdf:Property;
  rtm:indicatedBy ns1:composer;
  rdfs:domain ns1:composed-by;
  rdfs:range rtm:member .

this:work
  rdf:type rtm:topic,
           rdf:Property;
  rtm:indicatedBy ns1:work;
  rdfs:domain ns1:composed-by;
  rdfs:range rtm:member .

this:premiere-date
  rdf:type rtm:topic,
           rdfs:Class;
  rtm:indicatedBy ns1:premiere-date;
  rdf:subClassOf rtm:occurrence .

this:synopsis
  rdf:type rtm:topic,
           rdfs:Class;
  rtm:indicatedBy ns1:synopsis;
  rdf:subClassOf rtm:occurrence .

this:opera
  rdf:type rtm:topic,
           rdfs:Class;
  rtm:indicatedBy ns1:opera .

this:person
  rdf:type rtm:topic,
           rdfs:Class;
  rtm:indicatedBy ns1:person .

this:puccini
  rdf:type rtm:topic .

this:tosca
  rdf:type rtm:topic .

3.4 The Garshol Proposal

3.4.1 Description

This proposal was originally presented in [Garshol 01] as part of a comparative analysis of the RDF and Topic Maps models. The analysis was further developed (and extended to partially address OWL) in [Garshol 03a]. The approach has been implemented by the author in the Ontopia Knowledge Suite.

[Garshol 03a] starts by comparing RDF and Topic Maps through an examination of concepts that are fundamental to both paradigms: "symbols and things", "assertions", "identity", "reification", "qualification", and "types and subtypes". For each concept, Garshol shows how it is expressed in each paradigm and draws out the similarities and differences.

Comparing RDF and Topic Maps

According to Garshol, RDF and Topic Maps are both "identity-based technologies"; that is, the key concept in both is symbols representing identifiable things about which assertions can be made. In Topic Maps, "things" are called "subjects"; in RDF they are called "resources" and, despite different definitions, they are essentially the same concept. Subjects are represented by topics; resources are represented by RDF nodes (or "nodes" for short). According to Garshol, the correspondence between "topic" and "node" is close but not exact; he does not explain why, but the reason is presumably that RDF nodes can be literals, which would be represented as strings rather than topics in Topic Maps.

Assertions express relationships between things and take the form of "topic characteristics" in Topic Maps and "statements" in RDF. A topic characteristic can be a name, an occurrence, or an association. An RDF statement can thus in theory be mapped to any one of these three kinds of construct. Special attention is paid to associations since these can be of any arity, whereas all RDF statements are binary. A binary association maps fairly well to an RDF statement, but a non-binary association does not.

In addition, RDF statements have direction but associations do not. Topic Maps uses the notion of "roles" to express the nature of each subject's involvement in the relationship; in RDF this involvement is implicit in the subject-predicate-object structure of the statement.

For these reasons, the correspondence between topic characteristics and statements is considered to be close, but not exact.

The issue of identity is considered to be "quite a thorny problem for interoperability between topic maps and RDF" since, although Topic Maps and RDF both use URIs as identifiers, they do so in different ways. In RDF there is only one kind of identifier and a node can have at most one of them (blank nodes and literals have none). In Topic Maps, topics can have any number of identifiers and a distinction is made between "subject locators" (URIs which identify the subject directly, formerly called "subject addresses") and "subject identifiers" (URIs which identify the subject indirectly, via a subject indicator). Garshol refers the reader to a more in-depth treatment of the issue in [Pepper 03], which is discussed in section 3.6 below.

Garshol's discussion of reification brings out certain differences between Topic Maps and RDF but does not reach any conclusion regarding the degree of correspondence between the two, although the point is made that reification is not a very commonly used feature. Qualification is seen as being related to reification, since the built-in Topic Maps feature "scope" is essentially a mechanism for making certain kinds of assertions about other assertions, but no proposal is made regarding how to express scope in RDF.

The concept of types and subtypes, on the other hand, is regarded as being identical in Topic Maps and RDF (except for the fact that the subClassOf property is part of RDF Schema rather than RDF itself).

Garshol summarizes his analysis by pointing to three fundamental differences between RDF and Topic Maps that "make it technically very difficult to merge" the two paradigms: identity, assertions, and reification (including qualification). The rest of his paper therefore focuses on ways to "move data between the two with as little effort as possible" (rather than on how to unify the two models).

Object mappings or semantic mappings?

The object mapping approach taken by [Moore 01], [Lacher 01], [Ogievetsky 01b], and [Garshol 02] is briefly considered and then rejected as being

both heavy-weight and rather awkward to work with. Any query or retrieval specified in end-user terms will have to explicitly take into account topic map model features, and information from topic maps will not interoperate cleanly with other RDF information.

Garshol's conclusion is that "although this [object mapping] approach is easy to use, the results do not meet the criterion of clean integration with other RDF data."

As an alternative, Garshol proposes to use vocabulary-specific mappings underpinned by a generic mapping. Statements should in general be mapped to names, occurrences or associations since this provides the most "natural" results. However, it is not possible to know which of these is most appropriate for any given statement without an understanding of the semantics of the property in question – hence the need for vocabulary-specific mappings. For example, the RDF statement:

<http://example.com/X>
  <http://example.com/Y>
    "foo" .

could be mapped in Topic Maps to either a name or an internal occurrence (since the object is a literal). Similarly, the statement:

<http://example.com/X>
  <http://example.com/W>
    <http://example.com/Z> .

could be mapped to either an association or an external occurrence (since the object is a resource). An optimal semantic translation cannot be performed without knowledge of the semantics of the properties Y and W.
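The ambiguity can be stated compactly. The following Python sketch is illustrative only; the function name candidate_constructs and the string labels are assumptions of this example, not part of Garshol's proposal.

```python
# Illustrative sketch: which Topic Maps constructs are candidates for an
# RDF statement when nothing is known about the property's semantics.
def candidate_constructs(object_is_literal):
    """Return the possible target constructs for one RDF statement."""
    if object_is_literal:
        # e.g. <X> <Y> "foo" .
        return ["name", "internal occurrence"]
    # e.g. <X> <W> <Z> .
    return ["association", "external occurrence"]


print(candidate_constructs(True))
print(candidate_constructs(False))
```

In both cases two candidates remain, which is why the form of the statement alone cannot settle the choice.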

RDF2TM mapping

The solution according to Garshol is to provide additional mapping information. This is done using an RDF vocabulary called RTM ([Ontopia 03a]) which is used to annotate RDF documents (or their schemas) and thus guide the translation process. The RTM vocabulary is used for translating from RDF to Topic Maps and consists of the following RDF properties: maps-to, type, in-scope, subject-role, object-role.

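The maps-to property can have the following values: rtm:basename, rtm:occurrence, and rtm:association, corresponding to the three kinds of topic characteristic.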

Mappings that use rtm:occurrence or rtm:association will automatically use the statement's property to type the resulting Topic Maps construct, unless rtm:type is used to override this behaviour. The rtm:in-scope property can be used to specify scoping topics for base names, occurrences, and associations. Finally, the rtm:subject-role and rtm:object-role properties are used to specify the types of role played by the subject and object of an RDF statement when the statement maps to an association.

The vocabulary (and the implementations) go somewhat beyond what is covered in [Garshol 03a]. For example, it is recognized that properties may be mapped to various kinds of identifiers (source locators, subject identifiers, and subject locators) or to the privileged instance-of relationship, in addition to names, occurrences, and associations.

In addition, greater provision is made in the implementation for defaulting. Resource URIs are always mapped to subject identifiers and RDF statements can be imported as associations in the absence of role type information, in which case the predefined topics subject and object are used as role types.
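A rough sketch of how such mapping information might guide a translation is given below. It is written in Python for illustration and is not the Ontopia implementation: the function translate, the dictionary layout of the mappings, the result dictionaries, and the value name rtm:basename are assumptions of this example. The defaulting follows the behaviour reported in this survey, in which an unmapped statement with a literal object becomes an internal occurrence and an unmapped statement with a resource object becomes an association with the role types subject and object.

```python
# Hypothetical sketch of mapping-guided RDF2TM translation.
def translate(statement, mappings):
    """Translate one statement into a dictionary describing a construct.

    statement is (subject, property, object, object_is_literal).
    mappings maps a property to a dict of RTM-style annotations.
    """
    subject, prop, obj, is_literal = statement
    annotations = mappings.get(prop, {})
    # Default target when no mapping is given for the property.
    default = "rtm:occurrence" if is_literal else "rtm:association"
    target = annotations.get("rtm:maps-to", default)
    # The property types the construct unless rtm:type overrides it.
    construct_type = annotations.get("rtm:type", prop)

    if target == "rtm:basename":
        return {"kind": "basename", "topic": subject, "value": obj}
    if target == "rtm:occurrence":
        return {"kind": "occurrence", "topic": subject,
                "type": construct_type, "value": obj}
    if target == "rtm:association":
        roles = [
            (annotations.get("rtm:subject-role", "subject"), subject),
            (annotations.get("rtm:object-role", "object"), obj),
        ]
        return {"kind": "association", "type": construct_type, "roles": roles}
    raise ValueError(f"unsupported maps-to value: {target}")


mappings = {
    "rdfs:label": {"rtm:maps-to": "rtm:basename"},
    "music:synopsis": {"rtm:maps-to": "rtm:occurrence"},
}
label = translate(("tosca", "rdfs:label", "Tosca", True), mappings)
synopsis = translate(
    ("tosca", "music:synopsis",
     "http://www.azopera.com/learn/synopsis/tosca.shtml", False),
    mappings,
)
composed = translate(("tosca", "music:composed-by", "puccini", False), mappings)
print(label)
print(synopsis)
print(composed)
```

Scope (rtm:in-scope) and the identifier mappings are omitted here to keep the sketch short.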

TM2RDF mapping

Going from Topic Maps to RDF is shown to require additional information in order for optimal and/or predictable results to be achieved. The following problems are identified:

  1. Choosing properties when mapping names
  2. Choosing the subject when mapping associations

Garshol points out a number of issues that are not addressed in his analysis, including multiple identifiers, n-ary associations, reification and scoping, unary associations, variant names, and a number of (unspecified) "tricky edge cases"; for some of these he sketches possible solutions which have since been implemented:

  • Multiple URIs for the same topic can be handled using the RDF properties for equivalence found in OWL.

  • Associations with more than two roles can be turned into resources whose type is the association type, and each role can then be represented as a separate statement with the role type as the property and the association resource as the subject.

  • Reification and scoping can in general be represented by using RDF reification to represent the statement that would connect the topic characteristic with the topic. A special property will have to be defined for representing scope. Reification itself is handled by simply merging the resource representing the topic characteristic assignment with that representing the reifying topic.

  • Binary non-symmetric associations can be handled by having the mapping contain one association from the association type to the preferred subject role.

  • Selection of name properties can be done by having the mapping contain an association from the topic type to a topic representing the preferred RDF name property.

A second vocabulary (called TMR, [Ontopia 03b]), consisting of six published subjects, addresses many of these issues. Name mapping is handled by tmr:name-property, tmr:type, and tmr:property, and the problem of mapping associations is solved using tmr:preferred-role, tmr:association-type, and tmr:role-type.

As with the RDF2TM translation, the implementations provide some level of defaulting. Both subject identifiers and subject locators are automatically mapped to resource URIs. In addition, associations can be exported to RDF in the absence of mapping information about roles; in this case the choice of subject and object for the resulting statement is arbitrary.
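The treatment of associations sketched in the list above can be illustrated as follows. The Python is a minimal sketch under stated assumptions, not the Ontopia implementation: the function association_to_statements, its preferred_role argument, and the blank node label are hypothetical, and the preferred role is passed in directly rather than looked up from TMR mapping topics.

```python
# Illustrative sketch only; names and data layout are hypothetical.
def association_to_statements(assoc_type, roles, preferred_role=None):
    """roles is a list of (role_type, player) pairs in document order."""
    if len(roles) < 2:
        raise ValueError("unary associations are not covered by this sketch")
    if len(roles) == 2:
        (type1, player1), (type2, player2) = roles
        if preferred_role == type2:
            # The player of the preferred role becomes the subject.
            player1, player2 = player2, player1
        # Without a preferred role the choice of subject is arbitrary;
        # this sketch simply takes the first role listed.
        return [(player1, assoc_type, player2)]
    # More than two roles: the association becomes a resource typed by
    # the association type, with one statement per role.
    node = "_:assoc"
    statements = [(node, "rdf:type", assoc_type)]
    statements += [(node, role_type, player) for role_type, player in roles]
    return statements


roles = [("music:composer", "puccini"), ("music:work", "tosca")]
default_result = association_to_statements("music:composed-by", roles)
guided_result = association_to_statements(
    "music:composed-by", roles, preferred_role="music:work")
print(default_result)
print(guided_result)
```

With the preferred role set to the role played by the work, Tosca rather than Puccini becomes the subject of the resulting statement.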

The remainder of [Garshol 03a] is devoted to a comparison of the respective constraint and query languages of Topic Maps and RDF and is thus beyond the scope of this analysis.

3.4.2 Summary

As currently specified the Garshol proposal provides an almost complete solution and the author himself identifies most of the respects in which it is incomplete. Those which are not mentioned include containers, collections, XML literals and typed literals. A high degree of reversibility and round-tripping is achievable, provided appropriate reverse mappings are generated during the translation. An issue exists with subject locators that end up as subject identifiers when round-tripping from Topic Maps to RDF and back to Topic Maps.

The proposal scores well in terms of naturalness. Even when using default mappings the results are quite natural. The TM2RDF test case results in an RDF document containing 13 statements. The RDF2TM test case results in a topic map containing 25 TAOs (19 topics, three associations, and three occurrences).

3.4.3 Test cases

The test translations were performed using Ontopia's Omnigator Eight (OKS 2.1.0, build 2004-12-15 #1495) [Ontopia 05].

TM2RDF

The source document was opened in the Omnigator and exported to RDF with default mappings. The document was then converted to N3 format using the Mindswap RDF Converter [Mindswap 02], and finally tidied by hand in order to ease comparison with the source document.

Note that since default settings were used, the choice of Puccini as the subject of the music:composed-by statement was entirely arbitrary. Using Garshol's mapping mechanism it would be possible to ensure that Tosca, rather than Puccini, became the subject of this statement. The result of the translation would then be identical to the equivalent RDF2TM, except that the role types 'composer' and 'work' would also be present.

@prefix : <http://psi.ontopia.net/music/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

[ rdf:type :person;
  rdfs:label "Giacomo Puccini";
  :composed-by  [
    rdf:type  :opera;
    rdfs:label "Tosca";
    :premiere-date "1900-01-14";
    :synopsis <http://www.azopera.com/learn/synopsis/tosca.shtml> ]].

:composer      rdfs:label "Composer" .
:opera         rdfs:label "Opera" .
:person        rdfs:label "Person" .
:work          rdfs:label "Work" .

:composed-by   rdfs:label "Composed by" .
:premiere-date rdfs:label "Première date" .
:synopsis      rdfs:label "Synopsis" .

RDF2TM

The source document was imported to the Omnigator with mappings specified using the RDF2TM plug-in. Default mappings were used for all properties except rdfs:label, which was mapped to a base name instead of an internal occurrence, and music:synopsis, which was mapped to an occurrence instead of an association. The document was then exported to LTM and tidied by hand in order to ease comparison with the source document.

Note that Garshol's mapping vocabulary allows for more precise specification of the subject and object role types, but this capability was not used here. As a result, the role types 'work' and 'composer' become 'subject' and 'object', respectively, and a scoping topic ("Generated Name") is added to the names of these topics.

@"utf-8"
#VERSION "1.3"

[puccini : person = "Giacomo Puccini"]
[tosca   : opera  = "Tosca"]

{tosca, premiere-date, [[1900-01-14]]}
{tosca, synopsis,      "http://www.azopera.com/learn/synopsis/tosca.shtml"}

composed-by( tosca : subject, puccini : object )

[person = "Person"  @"http://psi.ontopia.net/music/#person"]
[opera  = "Opera"  @"http://psi.ontopia.net/music/#opera"]

[premiere-date = "Première date"  @"http://psi.ontopia.net/music/#premiere-date"]
[synopsis      = "Synopsis"  @"http://psi.ontopia.net/music/#synopsis"]
[composed-by   = "Composed by"  @"http://psi.ontopia.net/music/#composed-by"]

[subject  = ":subject" / gen-name  @"http://psi.ontopia.net/rdf2tm/#subject"]
[object   = ":object" / gen-name  @"http://psi.ontopia.net/rdf2tm/#object"]
[gen-name = "Generated Name"  @"http://psi.ontopia.net/rdf2tm/#gen-name"]

3.5 The Unibo Proposal

3.5.1 Description

The Unibo proposal is described briefly in [Ciancarini 03] and more fully in [Gentilucci 02] (in Italian). This description draws on both sources.

Ciancarini et al cite the work of Moore, Lacher and Decker, and Ogievetsky, all of which, they claim, suffer from a common drawback, namely the "rather awkward appearance of the documents coming out of the conversion." The authors clearly prefer Garshol's approach, which produces much more "readable" results and which is similar to their own. The main difference is that Garshol does not utilize the "standard RDF and RDFS predicates" and thus always requires a mapping to be specified.

Like earlier authors, Ciancarini et al recognize that there are two fundamental approaches to tackling the problem of translation, corresponding to what this survey calls object mapping and semantic mapping. The first of these is seen to be problematic in that "the converted document is necessarily very different from the one that would have been written directly in the destination language, and hardly readable." The problem with the second one is that it is "not always possible" to identify semantic equivalences, and that doing so often requires a case-by-case approach and thus has no general usefulness.

The authors therefore consider a hybrid approach to be the optimal solution and their implementation in the Meta Converter combines a generic mapping, which tries to stay as close as possible to the original semantics, with the ability to define specific mappings using an XML vocabulary. Section 3.3 of [Gentilucci 02] provides a fairly detailed overview of the generic mapping while Chapter 4 describes the mechanism for specific mappings.

Identity

Like Garshol, Ciancarini et al assume a basic equivalence between topic and resource (although they are less clear on the distinction between resources and RDF nodes), but they differ in how identity is expressed. The default behaviour in the Unibo proposal is to equate subject locators with resource URIs and to represent subject identifiers using the RDFS property isDefinedBy. Examples given in [Gentilucci 02] (e.g., 3.8 and 4.2) show how this leads to resources that clearly represent non-addressable subjects, such as "Mario Rossi" and "Format", being translated to addressable subjects (using <resourceRef> for subjectIdentity).

Topics that have no subject locator are translated to blank nodes whose ID is generated from the topic's base name. When going the other way, the ID of a blank node becomes a topic name, which is clearly unnatural (since the ID of a blank node and a topic name have different semantics).

Names

The Unibo proposal is alone in assuming a fundamental equivalence of semantics between base names and the rdfs:label property; names that have no variants are thus easy to handle. Variant names are seen to represent a greater challenge, which is solved through the use of four RDF predicates: baseName, variant, parameter, and variantName. A base name that has a variant is represented through a blank node with rdfs:label and tm2rdf:variant properties: the former is a literal that corresponds to the value of the topic name (i.e., the <baseNameString> in XTM syntax); the value of the latter property is another blank node that has variantName and parameter properties. Thus a topic with a base name and sort name:

[mario_rossi = "Mario Rossi";"rossi mario"]

results in the following statements:

_:mario_rossi
  tm2rdf:baseName    _:bn1_mario_rossi .

_:bn1_mario_rossi
  rdfs:label         "Mario Rossi" ;
  tm2rdf:variant     _:v11_mario_rossi .

_:v11_mario_rossi
  tm2rdf:variantName "rossi mario" ;
  tm2rdf:parameter   _:param1 .

_:param1
  rdfs:isDefinedBy   <http://www.topicmaps.org/xtm/1.0/core.xtm#sort> .

Associations: TM2RDF

Predictably, representing associations in RDF is regarded as difficult because of what the authors term RDF's "more primitive" (i.e., low level) nature compared to Topic Maps. A generic translation is possible "at the level of the model," but it is "complex and artificial" and comes at the price of "abusing the RDF way of expressing relationships." The basic approach is similar to Ogievetsky's in that the roles (or "members") are contained in an RDF bag of blank nodes. However, whereas in Ogievetsky the bag is the association, the Unibo proposal uses an additional resource to represent the association; this resource has a tm2rdf:association property, the object of which is the bag of members. All in all, nine RDF statements are required to represent a single binary association.

The tm2rdf:association property is characterized as a "supporting predicate" whose purpose is to "add a little legibility" to the resulting document. A variation on this is also suggested in which the bag of members and the association become a single node: This is effectively the same solution as Ogievetsky's.

[Gentilucci 02] also describes two alternative approaches in which n-ary associations are decomposed into a number of binary relations. Both of these require six RDF statements in order to represent a single ternary association. Given the following association:

X( A : rA , B : rB , C : rC )

(i.e. an association of type X between topics A, B, and C playing the roles rA, rB, and rC respectively), the first of these alternative approaches results in the following six statements:

A X B .   A X C .   B X A .   B X C .   C X A .   C X B .

Role types are lost. In addition, the fact that each pair of role players is related through the same predicate twice (both as subject and object and as object and subject) means that only symmetrical relationships would be represented correctly. Finally, the semantics of A, B, and C all being involved in the same relationship are also lost; this may or may not involve real loss of information depending on the nature of the relationship.

The second alternative approach involves predicates that correspond to role types and results in the following statements:

B rA A .   C rA A .   A rB B .   C rB B .  A rC C .   B rC C .

While role types are now preserved, the association type is lost (although it could in theory be preserved through additional statements relating it to rA, rB, and rC). In addition, it seems doubtful that the original semantics are correctly preserved. For example: Can it be assumed to be the case that the relationship between role players B and A (rA) is the same as that between C and A? Finally, the point made above about losing the semantics of the involvement of A, B, and C in the same relationship also pertains here.
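Both decompositions are mechanical, as the following sketch shows. The Python is illustrative only and is not taken from [Gentilucci 02]; the function names and the representation of roles as (player, role type) pairs are assumptions of this example.

```python
# Illustrative sketch of the two pairwise decompositions described above.
from itertools import permutations


def decompose_by_association_type(assoc_type, roles):
    """First alternative: relate every ordered pair of players by the
    association type. Role types are discarded."""
    players = [player for player, _ in roles]
    return [(s, assoc_type, o) for s, o in permutations(players, 2)]


def decompose_by_role_type(roles):
    """Second alternative: for each role, relate every other player to the
    role player using the role type. The association type is discarded."""
    return [
        (other, role_type, player)
        for player, role_type in roles
        for other, _ in roles
        if other != player
    ]


roles = [("A", "rA"), ("B", "rB"), ("C", "rC")]
first = decompose_by_association_type("X", roles)
second = decompose_by_role_type(roles)
print(first)
print(second)
```

Either way an association with n roles yields n(n-1) statements: six for a ternary association, and twelve for one with four roles.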

Having considered these alternatives, the Unibo proposal comes down in favour of the approach that uses the tm2rdf:association property, at least in the absence of more specific mapping information.

Associations: RDF2TM

When translating in the opposite direction, from RDF to Topic Maps, the generic solution proposed by Unibo is to translate RDF statements to associations. The example given in [Gentilucci 02] results in a typed binary association with untyped roles and does not take into consideration the case in which the object of the RDF statement is a literal. However, it is conceded that "it might be preferable, in certain contexts, to apply other types of conversion" and this leads into a discussion of "attributes" and the role of schema information.

The Unibo proposal recognizes that certain RDF statements are more appropriately mapped to either internal or external occurrences, with the occurrence type corresponding to the property of the statement, but knowing when to do this requires some kind of schema information. This is essentially the same as Garshol's approach, except for the fact that Unibo uses an XML vocabulary rather than an RDF vocabulary to specify the mapping information.

Scope

In this context a proposal is put forward for representing scoped occurrences in RDF: An rdfs:seeAlso property has a blank node as its object; the blank node has an rdfs:isDefinedBy property (whose object is the URI of an external occurrence) and one or more tm2rdf:scope properties. This results in a construct whose "shape" is very different from that of an unscoped occurrence. In addition, given that the range of the rdfs:isDefinedBy property is rdf:Resource, it is unclear how this approach would work with internal occurrences.
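The shape of the scoped occurrence construct can be sketched as follows. The Python is illustrative only: the function name, the blank node label, and the scoping topic identifiers are hypothetical, and the sketch assumes that the subject of the rdfs:seeAlso statement is the resource representing the topic.

```python
# Illustrative sketch of the scoped (external) occurrence construct.
def scoped_occurrence_triples(topic, occurrence_uri, scopes):
    """Return the statements for one scoped external occurrence."""
    if not scopes:
        raise ValueError("at least one scoping topic is required")
    node = "_:occ"  # blank node standing between topic and occurrence
    triples = [
        (topic, "rdfs:seeAlso", node),
        (node, "rdfs:isDefinedBy", occurrence_uri),
    ]
    # One tm2rdf:scope statement per scoping topic.
    triples += [(node, "tm2rdf:scope", scope) for scope in scopes]
    return triples


scoped = scoped_occurrence_triples(
    "_:tosca",
    "http://www.azopera.com/learn/synopsis/tosca.shtml",
    ["_:english"],
)
for triple in scoped:
    print(triple)
```

A single scoping topic thus turns the occurrence into three statements routed through an intermediate blank node.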

A "not very elegant" way to represent scoped names is suggested that involves defining a property, whose rdf:type is tm2rdf:baseName, that corresponds either directly or indirectly (it is not clear which) to each scoping topic. In addition to being inelegant, this would not work with scopes comprising multiple scoping topics. The alternative is the same as that proposed by Garshol: i.e., to qualify reified statements. To do this, Unibo defines the tm2rdf:scope property.

For scoped associations, reification in the RDF sense is not necessary since associations are already represented as resources (at least in the default mapping). Thus, all that is required is to assign one or more tm2rdf:scope properties to that resource. The downside to this is that scoping is now handled in three different ways (for generically mapped associations, for occurrences, and for names and specifically mapped associations, respectively).

Reification, typing, and subtyping

Reification, typing, and subtyping are not regarded by Unibo as posing problems, since both RDF and Topic Maps support all three concepts in essentially the same way: instanceOf equates to rdf:type; the supertype-subtype relationship (represented in Topic Maps using an association with a predefined type) equates to rdfs:subClassOf; and reification is essentially the same in Topic Maps and RDF.

Specific mappings

The description above has focused on the Unibo proposal's approach to generic translations. However, it is recognized that a generic approach will not always produce optimal results and so a method is provided for "guiding" the translation process. This consists of a simple XML vocabulary that allows the user to specify how to translate a (binary) association to a single RDF statement (and vice versa). As in the Garshol proposal, this involves specifying correspondences between association role types and the statement's subject and object. In addition, a user can specify which RDF properties should be mapped to occurrences rather than to associations. The following extract shows how mappings for the TM2RDF test case would be specified:

<?xml version="1.0"?>
<xtm2rdf>
  <property_associations>
    <li id="composed-by">
      <domain_role id="work"/>
      <range_role id="composer"/>
    </li>
  </property_associations>
  <property_occurrences>
    <li id="premiere-date"/>
    <li id="synopsis"/>
  </property_occurrences>
</xtm2rdf>

These mappings would cause the composed-by association to be represented as a single statement in RDF, with Tosca ("work" = domain) as the subject and Puccini ("composer" = range) as object. In addition, the mapping contains information that would cause properties of type premiere-date and synopsis to be mapped to occurrences when going from RDF to Topic Maps. (Although not stated explicitly, this information is presumably not required when going the other way.)
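The effect of such a mapping can be sketched in a few lines of Python. This is an illustration only and not the implementation used by Meta: the function names load_mapping and association_to_statement, and the representation of an association as a dictionary from role type to role player, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The mapping document shown above, embedded so the sketch is self-contained.
MAPPING_XML = """<?xml version="1.0"?>
<xtm2rdf>
  <property_associations>
    <li id="composed-by">
      <domain_role id="work"/>
      <range_role id="composer"/>
    </li>
  </property_associations>
  <property_occurrences>
    <li id="premiere-date"/>
    <li id="synopsis"/>
  </property_occurrences>
</xtm2rdf>"""


def load_mapping(xml_text):
    """Return (association_roles, occurrence_properties) from a mapping document."""
    root = ET.fromstring(xml_text)
    roles = {}
    for li in root.findall("./property_associations/li"):
        roles[li.get("id")] = (
            li.find("domain_role").get("id"),  # player of this role becomes the subject
            li.find("range_role").get("id"),   # player of this role becomes the object
        )
    occurrences = {li.get("id") for li in root.findall("./property_occurrences/li")}
    return roles, occurrences


def association_to_statement(assoc_type, players, roles):
    """Translate a binary association to one (subject, property, object) triple.

    `players` maps role type to role player. Returns None when no mapping is
    available, in which case the generic translation would have to be used.
    """
    if assoc_type not in roles:
        return None
    domain_role, range_role = roles[assoc_type]
    return (players[domain_role], assoc_type, players[range_role])


roles, occurrences = load_mapping(MAPPING_XML)
statement = association_to_statement(
    "composed-by", {"work": "tosca", "composer": "puccini"}, roles
)
```

With the mapping above, statement is the triple (tosca, composed-by, puccini), and occurrences holds the two property names that would be mapped to occurrences when going from RDF to Topic Maps.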

3.5.2 Summary

The Unibo proposal is fairly complete but some features, e.g., language tags and data typing in RDF, and reification of roles and topic maps, are not covered explicitly. The proposal permits some degree of reversibility, but the result of a roundtrip may not always be the same as the starting point. For example, using the generic mappings, most RDF statements would be translated to typed associations with untyped roles, each of which would result in several statements when translated back to RDF.

The approach produces somewhat natural results in both directions provided mapping information is supplied. Generic translations are far less satisfactory, with a single binary association resulting in nine RDF statements.

3.5.3 Test cases

TM2RDF

The source document was translated using the tool Meta ([Gentilucci 02]) with the default settings. The resulting document was then converted to N3 format using the Mindswap RDF Converter [Mindswap 02], and finally tidied by hand in order to ease comparison with the source document.

@prefix : <#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix s: <http://cs.unibo.it/meta/tmschema.rdf#> .

# tosca: type, name, occurrences
:tosca rdf:type :opera;
  rdfs:label "Tosca";
  rdfs:seeAlso :dataOccurrence_7a34558a,
               :dataOccurrence_d5a519a2 .

:dataOccurrence_7a34558a rdf:type :premiere-date;
  rdfs:label "1900-01-14" .

:dataOccurrence_d5a519a2 rdf:type :synopsis;
  rdfs:isDefinedBy <http://www.azopera.com/learn/synopsis/tosca.shtml> .

# puccini: type, name
:puccini rdf:type :person;
  rdfs:label "Giacomo Puccini" .

# association
:x1ko2ljned-b rdf:type :composed-by;
  s:association :members_x1ko2ljned-b .
:members_x1ko2ljned-b rdf:type rdf:Bag;
  rdf:_1 :x1ko2ljned-b_1;
  rdf:_2 :x1ko2ljned-b_2 .
:x1ko2ljned-b_1 rdf:type :composer;
  rdfs:isDefinedBy :puccini .
:x1ko2ljned-b_2 rdf:type :work;
  rdfs:isDefinedBy :tosca .

# typing topics
:opera rdf:type rdfs:Class;
  rdfs:isDefinedBy <http://psi.ontopia.net/music/#opera>;
  rdfs:label "Opera" .

:person rdf:type rdfs:Class;
  rdfs:isDefinedBy <http://psi.ontopia.net/music/#person>;
  rdfs:label "Person" .

:composed-by rdf:type rdfs:Class;
  rdfs:isDefinedBy <http://psi.ontopia.net/music/#composed-by>;
  rdfs:label "Composed by" .

:premiere-date rdf:type rdfs:Class;
  rdfs:isDefinedBy <http://psi.ontopia.net/music/#premiere-date>;
  rdfs:label "Premiere date" .

:synopsis rdf:type rdfs:Class;
  rdfs:isDefinedBy <http://psi.ontopia.net/music/#synopsis>;
  rdfs:label "Synopsis" .

:composer rdfs:isDefinedBy <http://psi.ontopia.net/music/#composer>;
  rdfs:label "Composer" .

:work rdfs:isDefinedBy <http://psi.ontopia.net/music/#work>;
  rdfs:label "Work" .

38 statements are required to represent what would naturally be expressed as 12 statements in RDF. Much of the additional baggage is due to the use of nine statements to represent the association between Tosca and Puccini; specifying that the five typing topics are instances of rdfs:Class; requiring an additional statement for each subject identifier; and expressing occurrences via a blank node.

The amount of additional baggage would be reduced somewhat (although not completely) if Meta's ability to express additional mapping information were used.
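The figure of 38 can be checked by tallying the statements in the listing above group by group. The grouping and the labels used in this Python sketch are one reading of the listing, chosen for the example; only the per-group counts are taken from the listing itself.

```python
# Number of RDF statements per group in the N3 listing above.
statement_counts = {
    "tosca: type, name, two occurrence links": 4,
    "two occurrence nodes, two statements each": 4,
    "puccini: type, name": 2,
    "association, bag and two members": 9,
    "five typing topics declared as rdfs:Class, three statements each": 15,
    "two role-type topics, two statements each": 4,
}

total = sum(statement_counts.values())
overhead = total - 12  # 12 is the size of the natural RDF representation cited above
```

The tally gives a total of 38, i.e. 26 statements more than the natural representation.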

RDF2TM

The result of this test case as delivered by the developers of Meta seems to be in error. Among other things, the topic Puccini is missing altogether. The editors have reported this to the developers and if their assumption is correct the test case will be updated in the next draft of this Survey.

[ass-ins_1 : opera = "Tosca"]
{ass-ins_1, premiere-date, "1900-01-14"}

[tosca @"http://www.azopera.com/learn/synopsis/tosca.shtml"]

synopsis( ass-ins_1, tosca )

[opera : class = "Opera" @"http://psi.ontopia.net/music/#opera"]
[composer : class = "Composer" @"http://psi.ontopia.net/music/#composer"]
[premiere-date : occurrence = "Premiere date" @"http://psi.ontopia.net/music/#premiere-date"]
[synopsis : association = "Synopsis" @"http://psi.ontopia.net/music/#synopsis"]
[composed-by : association = "Composed by" @"http://psi.ontopia.net/music/#composed-by"]

[class @"http://www.topicmaps.org/xtm/1.0/core.xtm#class"]
[association @"http://www.topicmaps.org/xtm/1.0/core.xtm#association"]
[occurrence @"http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence"]

3.6 Other Proposals and Contributions

The preceding sections have described the five most relevant proposals for RDF/TM interoperability. A number of other proposals and contributions have also been considered:

Both [Vlist 01] and [Prudhommeaux 02] are early "first cuts" (produced "before breakfast" and as "an evening's work" respectively). Both are very incomplete and have been superseded by later work; they are mentioned here for the sake of completeness.

[Garshol 02] is a complete RDF Schema for Topic Maps based on an early version of [TMDM]. It is similar to the Stanford and Ogievetsky proposals in that it models Topic Maps in RDF and falls therefore into the general category of object mappings which the author himself has since rejected. Its principal interest lies in the fact that it is based on a more complete and consistent model than that of PMTM4 and it will therefore be used to illustrate the difference between object mappings and semantic mappings in the next section.

[Kaminsky 02] presents a conceptual metamodel called Braque that is characterized by the author as being a "superset of the most popular proposed semantic web metamodels" (viz. XML, RDF, and Topic Maps), and defines transformations from each of these into Braque. Kaminsky's work is clearly of great interest to anyone seeking to unify RDF and Topic Maps into a single model. However, that is not the mandate of the RDFTM task force. Kaminsky's proposal cannot be considered as a solution to the more immediate problem of interchanging RDF and Topic Maps data, since no transformations out of Braque are defined. [Kaminsky 02] is therefore considered to be out of scope for the current work.

[Pepper 03] provides an in-depth discussion of identifiers in RDF and Topic Maps and is thus relevant to the issue of identity discussed below. The authors' main goal is to clarify the distinction between direct and indirect identification of subjects, and to pinpoint the lack of an ontological distinction between "resources" (in general) and "information resources" (which are a subset of "resources"). Direct identification is possible only for information resources; indirect identification is possible for any kind of resource (including information resources). With the publication of [TAG], and the introduction of the formal concept of information resource in the architecture of the World Wide Web, these distinctions have now been recognized and this could pave the way to solving the issue of identity, at least as far as RDF and Topic Maps interoperability is concerned.

[Vatant 04] investigates how OWL (Web Ontology Language) may be used to constrain topic maps and has relevance for the expression of additional information that may be used to guide a translation. This will be more pertinent to the Guidelines to be developed on the basis of the current survey.


4 Analysis

This chapter provides a general analysis based on the preceding discussion of existing proposals for translating data between RDF and Topic Maps.

The first point to be noted is that all of the major proposals suffer from the fact that neither Topic Maps nor RDF had stable, formalized data models at the time they were written. PMTM4 never had any official standing and has since been superseded by the [TMDM], part 2 of the forthcoming revised Topic Maps standard. In the case of RDF, [RDF-Concepts] and [OWL] first appeared in 2004. Now that these formal models exist, it should be possible to define complete and correct mappings at either the object or the semantic level.

4.1 Object mappings and semantic mappings

All the existing approaches fall into two distinct categories that Moore originally termed "modelling the model" and "mapping the model". Following the terminology of Lacher and Decker these might be more appropriately termed "object mappings" and "semantic mappings" respectively. The basic difference between the two approaches can be summed up as follows:

The advantage of an object mapping is that it is easy to make it generic (provided, of course, that the object model on which it is based is complete) and this ensures completeness without any additional effort. The disadvantage is the unnaturalness of the result. Semantic mappings yield much more natural results but suffer from the disadvantage that genericity is much harder to ensure and may in some cases require additional information not always present in the source document.

Of the existing proposals, Stanford and Ogievetsky both use object mappings based on [PMTM4]. Moore discusses both an object mapping (based on his own inaccurate models) and a semantic mapping. Garshol dismisses object mappings and concentrates solely on semantic mappings. Unibo attempts to combine both approaches in order to achieve the dual goals of a default, generic mapping that can be used without additional information, and a method for providing specific mapping information in cases where a more natural translation is required.

4.2 The importance of naturalness

The notion of "naturalness" was defined in section 2.1 as follows:

The criterion naturalness expresses the degree to which the results of a translation correspond to the way in which someone familiar with the target paradigm would naturally express the information content in that paradigm. Naturalness normally also confers improved readability on the result.

Naturalness is extremely important because the result of an "unnatural" translation is structurally different from data that was originally created in the target model. This has the following consequences, all of which lead to reduced interoperability:

Object mappings generally rate very low on naturalness and are therefore subject to all three of these failings. As an example, consider the following topic map:

{tosca, music:premiere-date, [[1900-01-14]]}

This defines an occurrence of type music:premiere-date whose value is "1900-01-14". A semantic mapping to RDF would result in the following translation:

_:a0  music:premiere-date  "1900-01-14" .

An object mapping would look as follows:

_:a1, rdf:type, tm:Topic .
_:a1, tm:occurrence, _:a2 .
_:a2, rdf:type, tm:Occurrence .
_:a2, tm:occurrence-type, _:a3 .
_:a3, tm:subject-identifier, music:premiere-date .
_:a2, tm:resource, "1900-01-14" .

This example uses the vocabulary defined in [Garshol 02] that is based on [TMDM], in order to conform to the most standard data model for Topic Maps. It serves to illustrate the fact that object mappings are inherently more verbose than semantic mappings. They also involve a significant amount of indirection and can thus be expected to lead to a lot of processing overhead. Even more important is that the semantics are actually different. The result of an object mapping consists of constructs that carry Topic Maps semantics (such as "topic", "occurrence", "occurrence type", etc.) which RDF processors are required to understand in order to be able to process the result correctly.

As an example, consider merging the results of semantic and object mappings respectively with native RDF data that includes the following statement:

_:b0  music:premiere-date  "1900-11-10" .

This statement asserts that some resource had its premiere date on 1900-11-10. A merged result that used the semantic mapping would look as follows:

_:a0  music:premiere-date  "1900-01-14" .
_:b0  music:premiere-date  "1900-11-10" .

This would be easily queryable (for example for all premières that took place in the year 1900) in terms of the music vocabulary alone. Contrast this with the following result of merging where one of the components is based on an object mapping:

_:a1, rdf:type, tm:Topic .
_:a1, tm:occurrence, _:a2 .
_:a2, rdf:type, tm:Occurrence .
_:a2, tm:occurrence-type, _:a3 .
_:a3, tm:subject-identifier, music:premiere-date .
_:a2, tm:resource, "1900-01-14" .
_:b0  music:premiere-date  "1900-11-10" .

This would clearly be much harder to query and would require knowledge of the tm vocabulary in addition to the music vocabulary. The very complexity of the queries given by Lacher and Decker, and Ogievetsky, respectively, speaks volumes in this regard.
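The difference in query effort can be made concrete with a small Python sketch over the two merged results. Triples are represented as plain tuples and the two query functions are hypothetical helpers written for this example; they are not taken from any of the surveyed proposals or from a query language.

```python
# Merged result when the topic map was translated with a semantic mapping.
semantic_merge = [
    ("_:a0", "music:premiere-date", "1900-01-14"),
    ("_:b0", "music:premiere-date", "1900-11-10"),
]

# Merged result when the topic map was translated with an object mapping.
object_merge = [
    ("_:a1", "rdf:type", "tm:Topic"),
    ("_:a1", "tm:occurrence", "_:a2"),
    ("_:a2", "rdf:type", "tm:Occurrence"),
    ("_:a2", "tm:occurrence-type", "_:a3"),
    ("_:a3", "tm:subject-identifier", "music:premiere-date"),
    ("_:a2", "tm:resource", "1900-01-14"),
    ("_:b0", "music:premiere-date", "1900-11-10"),
]


def premieres_native(triples, year):
    """Resources with a premiere date in `year`, using the music vocabulary only."""
    return sorted(
        s for s, p, o in triples
        if p == "music:premiere-date" and o.startswith(year)
    )


def premieres_object_mapped(triples, year):
    """The same question asked of object-mapped data; needs the tm vocabulary."""
    # 1. Find the nodes that stand for the occurrence type.
    type_nodes = {s for s, p, o in triples
                  if p == "tm:subject-identifier" and o == "music:premiere-date"}
    # 2. Find the occurrences of that type.
    occurrences = {s for s, p, o in triples
                   if p == "tm:occurrence-type" and o in type_nodes}
    # 3. Keep the occurrences whose value falls in the requested year.
    matching = {s for s, p, o in triples
                if p == "tm:resource" and s in occurrences and o.startswith(year)}
    # 4. Walk back to the topics that own those occurrences.
    return sorted(s for s, p, o in triples
                  if p == "tm:occurrence" and o in matching)
```

Against semantic_merge a single pattern finds both premieres. Against object_merge the same pattern finds only _:b0, and a second, four-step query is needed to recover _:a1.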

Given the importance of naturalness it would seem to make sense to prefer a semantic mapping, provided that a sufficient degree of completeness can be achieved. The following section therefore looks at the issues involved in defining semantic mappings with a particular emphasis on determining whether the existence of formal data models for Topic Maps and RDF now makes it possible to ensure completeness as well as naturalness.

4.3 Semantic mapping issues

4.3.1 Identity

Although both RDF and Topic Maps use URIs as identifiers, they differ crucially in that Topic Maps offers two modes of identification, direct (using subject locators) and indirect (using subject identifiers), whereas RDF offers only one. This prompts the question: Which Topic Maps construct(s) should be regarded as being semantically equivalent to the URI of an RDF resource? Subject identifiers, subject locators, ... or both?

Since identifiers are not part of the PMTM4 model, this issue is simply ignored in the Stanford proposal. Moore's position is not stated explicitly, but the examples he gives indicate that subject identifiers, at least, are regarded as equivalents. Both Ogievetsky and Unibo favour subject locators and define a separate property for handling subject identifiers. Garshol translates URIs to subject identifiers when going from RDF to Topic Maps, but is more agnostic when going the other way, translating both subject identifiers and subject locators to URIs.

There are problems with all of these approaches. Clearly, identifiers have to be mapped somehow, otherwise there will be loss of information. Equating URIs in RDF with subject locators is problematic in several ways. Firstly it leads to incorrect semantics (as the description of the Unibo proposal shows). Secondly, the result is less natural (since the identifier of a non-addressable subject like Puccini will not be treated as the URI of the corresponding resource, as would be most natural in RDF). Finally, the identifiers of occurrence types and association types (which are typically subject identifiers) could not be used as the URIs of RDF properties.

Equating URIs with subject identifiers rather than subject locators also yields unnatural results, since the identifier of an addressable subject (i.e., an information resource) will not become the URI of the corresponding resource, as would be most natural in RDF. However, this alternative does not exhibit the other problems that result from favouring subject locators.

There is a dilemma here and Garshol's agnosticism is in some ways a recognition of it. As a result, his TM2RDF translations exhibit the highest degree of naturalness as far as identity is concerned. Unfortunately he loses the information about whether the URI originated in a subject identifier or a subject locator and is thus reduced to translating every URI to a subject identifier when going the other way. This leads to problems with round-tripping, as noted above.
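The round-tripping problem can be illustrated with a minimal Python sketch. It models an identifier as a (kind, URI) pair and follows the behaviour described above; the function names are assumptions made for the example and do not correspond to Garshol's implementation.

```python
def tm2rdf_identifier(identifier):
    """Topic Maps to RDF: either kind of identifier becomes the resource URI."""
    kind, uri = identifier
    return uri  # the kind (identifier or locator) is not recorded anywhere


def rdf2tm_identifier(uri):
    """RDF to Topic Maps: every URI is translated to a subject identifier."""
    return ("subject-identifier", uri)


def roundtrip(identifier):
    """Translate an identifier to RDF and back again."""
    return rdf2tm_identifier(tm2rdf_identifier(identifier))


puccini = ("subject-identifier", "http://psi.ontopia.net/music/#puccini")  # hypothetical URI
synopsis_page = ("subject-locator", "http://www.azopera.com/learn/synopsis/tosca.shtml")
```

Here roundtrip(puccini) returns the original pair, whereas roundtrip(synopsis_page) comes back as a subject identifier, so the topic no longer represents the information resource itself.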

The ideal solution would be to allow either subject identifiers or subject locators to be regarded as URIs (and vice versa), but at the same time to retain sufficient information when going from Topic Maps to RDF to be able to perform round-tripping. The recognition in [TAG] of the distinction between resources in general and information resources, and the insights in [Pepper 03], may provide the foundation for such a solution.

The issue of multiple identifiers is treated explicitly by Garshol only. For those proposals that regard the subject locator as the semantic equivalent of a resource's URI and define a custom property for subject identifiers (Ogievetsky and Unibo), this was a non-issue as long as topics could only have one subject locator. However, in the forthcoming version of ISO 13250 multiple subject locators will be allowed and then the issue will have to be faced explicitly. Garshol's proposal to use equivalence properties defined in OWL (i.e., owl:sameAs, owl:equivalentClass, and owl:equivalentProperty) should clearly be investigated in more detail since such an approach is likely to lead to increased interoperability between RDF and Topic Maps.

4.3.2 Names

In RDF the name of a resource is usually represented by a single statement. ("Name" is here defined to mean a label used by a human to name a subject.) RDF Schema defines a property for this purpose (rdfs:label) but many vocabularies define their own properties (e.g., dc:title, foaf:name, etc.). An accurate semantic mapping from Topic Maps can be achieved by translating base names to such properties.

Both Garshol and Unibo take this approach, differing only in that Unibo always maps a base name to rdfs:label (and vice versa), while Garshol allows base names (including scoped base names) to be mapped to other properties. It should be noted that both proposals were written before the introduction of typed names in the Topic Maps model so neither can be considered a complete solution today.

In Topic Maps a base name can have variants. These are alternative forms of a name that are intended to be used in specific processing contexts, such as sorting and display. Of the semantic mapping proposals, only Unibo provides a solution for handling variant names; this is done by representing names that have variants as complex objects, an approach that seems sound enough, except for the introduction of what appears to be a superfluous blank node as the value of the tm2rdf:parameter property.

4.3.3 Binary relationships

Representations of binary relationships have somewhat different topographies in RDF and Topic Maps. RDF uses a single statement (or sometimes two statements that are the inverse of each other), in which the subject and object represent the two resources that participate in the relationship. The nature of those two resources' involvement in the relationship can be deduced from their positions as subject and object.

In Topic Maps there is no concept of subject and object in a binary association because the association has no direction. The nature of the two participating topics' involvement in the relationship is stated explicitly through their role types.

The challenge when translating from Topic Maps to RDF is to know which role-playing topic should become the subject of the resulting statement and how to preserve the role types. When going from RDF to Topic Maps, the challenge is to know which role types to assign to the subject and object of the statement respectively and how to preserve knowledge of what the subject and object were.

Both Garshol and Unibo solve this by allowing additional information to be provided that allows the RDF subject and object to be connected with their respective role types. Unibo uses a single XML vocabulary that is external to the document being translated. Garshol uses an RDF vocabulary for going from RDF to Topic Maps, and a set of Published Subjects for going from Topic Maps to RDF. Garshol's approach has the advantage of allowing source documents to be self-describing (the mappings can be included in the source documents or their schemas). Its disadvantage is the use of two different vocabularies, one for each direction. A cleaner solution would be to use a single vocabulary.

In the absence of additional information, Unibo falls back to an object mapping that requires nine RDF statements to represent a single binary association. Garshol, on the other hand, performs a semantic mapping anyway, using the predefined classes subject and object when going from RDF to Topic Maps, and selecting a role-player at random to be the subject of the resulting statement when going from Topic Maps to RDF. As currently implemented this leads to loss of information and the inability to perform round-tripping. However, it is perfectly feasible for the latter translation to retain the information necessary to perform round-tripping (in the form of an annotation to the schema using Garshol's own RTM vocabulary).

4.3.4 Non-binary relationships

One major difference between the models of RDF and Topic Maps is that the latter permits non-binary relationships to be expressed directly: An association may have one, two, or more role players. In RDF on the other hand the base model permits only binary relationships.

Most of the existing proposals for translating associations with more than two role-players are unsatisfactory, since they result in a large number of RDF statements. [Noy 04] proposes patterns for representing n-ary relations in RDF in which the relation is "re-represented" as a class rather than a property. Each such pattern requires n statements (plus one typing statement) in order to express the relationship. Using the example given in section 3.5.1 the result would be one of the following, depending on the pattern used (P stands for the re-represented relation):

P rdf:type X .   P rA A .   P rB B .   P rC C .   # Pattern 2
P rdf:type X .   A rA P .   P rB B .   P rC C .   # Pattern 1

The first of these (labelled "Pattern 2") is identical to Garshol's proposal for n-ary associations. If such patterns are adopted in the RDF community it would seem to be advisable, in the interest of compatibility, to follow them as closely as possible when translating n-ary associations from Topic Maps to RDF.
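A translation along the lines of Pattern 2 can be sketched in Python as follows. The function name nary_to_pattern2 and the representation of roles as (role type, role player) pairs are assumptions made for the example; the default node name P merely mirrors the notation used above.

```python
def nary_to_pattern2(assoc_type, roles, node="P"):
    """Translate an n-ary association to RDF triples following Pattern 2.

    `roles` is a sequence of (role_type, role_player) pairs. The relation is
    re-represented as the resource `node`, which is typed with the association
    type and linked to each role player by a property named after the role type.
    """
    triples = [(node, "rdf:type", assoc_type)]
    for role_type, player in roles:
        triples.append((node, role_type, player))
    return triples


example = nary_to_pattern2("X", [("rA", "A"), ("rB", "B"), ("rC", "C")])
```

For the ternary example this yields the four statements of the first line above: one typing statement and one statement per role player.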

Topic Maps also permits unary associations, i.e. "relationships" that only involve a single role player. Although seldom used, they do occur occasionally in order to express the equivalent of boolean properties (which might be regarded as binary relationships in which one of the role players is the subject "true"). The following example from [Pepper 05] asserts that the opera Turandot is unfinished:

unfinished( turandot : work )

None of the existing semantic mapping proposals caters explicitly for unary associations.

4.3.5 Occurrences

Both Garshol and Unibo recognize that occurrences are most naturally represented as single RDF statements where the property corresponds to the occurrence type. Internal and external occurrences correspond to statements whose objects are literals and resources respectively. Going from Topic Maps to RDF presents no problems at all; going the other way seems to require additional information in order to distinguish an internal occurrence from a name, and an external occurrence from an association or identifier.

It is unclear how Unibo behaves in the absence of additional mapping information. The default in Garshol (at least as implemented in the Omnigator) is to translate statements whose objects are literals to internal occurrences and statements whose objects are resources to associations.
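The default behaviour described for Garshol can be sketched in Python as a simple dispatch on the kind of object. This is an illustration of the rule as stated in this survey, not Omnigator code; the function name, the tuple-based result and the obj_is_literal flag are assumptions made for the example, while the role types subject and object follow the description in section 4.3.3.

```python
def default_rdf2tm(subject, prop, obj, obj_is_literal):
    """Translate one RDF statement without any additional mapping information."""
    if obj_is_literal:
        # Literal object: internal occurrence typed by the property.
        return ("occurrence", subject, prop, obj)
    # Resource object: binary association typed by the property, with the
    # predefined role types "subject" and "object".
    return ("association", prop, (("subject", subject), ("object", obj)))


date = default_rdf2tm("tosca", "music:premiere-date", "1900-01-14", True)
link = default_rdf2tm("tosca", "music:composed-by", "puccini", False)
```

Under this default, a statement whose object is a resource always becomes an association, even where an external occurrence or an identifier would have been intended, which is why additional mapping information is needed for those cases.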

4.3.6 Types and subtypes

Garshol and Unibo agree on the fundamental semantic equivalence between the concept of type-instance in [TMDM] and rdf:type, on the one hand; and between supertype-subtype and rdfs:subClassOf on the other. In addition, association types and occurrence types are regarded as equivalent to RDF properties. Role types present particular problems, as discussed above, and name types, as already noted, did not exist at the time the proposals were written.

4.3.7 Reification

Only Garshol and Unibo mention reification and neither proposal regards it as being problematic. In actual fact, Unibo only talks explicitly about the reification of associations, while Garshol mentions reified names, occurrences, and associations. Neither proposal covers the reification of topic maps and association roles.

4.3.8 Scope

The concept of scope is peculiar to Topic Maps and has been regarded as one of the major stumbling blocks for RDF/Topic Maps interoperability. All the existing proposals discuss the issue in one form or another but only Garshol and Unibo do so in terms of its semantics, i.e., as a way to express the contextual validity of an assertion. Garshol makes the point that scope is most properly regarded as a special kind of assertion made about another assertion. Since assertions about assertions are handled through reification in both paradigms, and reification translates rather easily, Garshol proposes to translate scope using reification together with a property that captures the semantics of contextual validity.

Garshol treats scoped base names as a special case, however, and allows a base name in a particular scope to be translated to a specific property. For example, a base name in the scope 'nickname' might be translated using the foaf:nick property. While this undoubtedly yields more natural results (much more natural than translating to, say, a reified rdfs:label statement with an rdftm:scoped-by property), such special-casing introduces a degree of inconsistency in the handling of scope. Why should only base names be treated in this way? Why not associations and occurrences as well?

The answer may be that associations and occurrences have types whereas names do not (or did not, until recently). It could be argued that the lack of typed names in Topic Maps has led to scoped names being used in ways that distort the semantics of scope. Or, to put it another way: Given that the forthcoming revised Topic Maps standard will permit typed names, would it be more appropriate to represent a nickname as a name of type 'nickname' (or foaf:nick) rather than a name in the scope 'nickname'? If so, it would be possible to avoid treating scoped names as a special case and still obtain natural results.

Unibo handles scope in three different ways (one of which involves reification) depending on the kind of construct in question. This is clearly even more inconsistent, and it is probably also unnecessary since the reification approach seems to be usable for scoping any kind of topic characteristic.
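The reification approach to scope can be sketched in Python as follows. The sketch builds the triples for a reified statement and attaches one scope property per scoping topic. The function name and the default node name are assumptions made for the example; the property name defaults to tm2rdf:scope as defined by Unibo and could equally be rdftm:scoped-by. Whether the plain, unreified statement should also be emitted is left open here.

```python
def scoped_statement(subject, prop, obj, scopes, node="_:s1",
                     scope_prop="tm2rdf:scope"):
    """Return the triples that reify one statement and qualify it with a scope."""
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", prop),
        (node, "rdf:object", obj),
    ]
    # One scope property per scoping topic, so multi-topic scopes are supported.
    for topic in scopes:
        triples.append((node, scope_prop, topic))
    return triples


nickname = scoped_statement(":puccini", "rdfs:label", "Il maestro", [":nickname", ":italian"])
```

The name and scoping topics in the usage line are invented for illustration. The same function applies unchanged to names, occurrences and specifically mapped associations, which is the consistency argument made above.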

4.3.9 Other issues

None of the existing proposals discuss how to represent RDF containers and collections, language tags, XML literals or typed literals in Topic Maps. Of these issues, the latter two are addressed by recent datatyping extensions to the Topic Maps model. Language tagging can be seen as a kind of contextual information akin to scope and treated accordingly. Containers and collections may or may not require special treatment: Since they are expressed using the fundamental building blocks of RDF (nodes and arcs), they may be represented using associations in Topic Maps. The semantics would not be lost and could be recovered when round-tripping. However, they would not be "visible" in terms of some equivalent Topic Maps construct.


5 Conclusion

The main result of this document is the identification and comparison of five different proposals addressing a number of issues related to data interoperability and translation between RDF and Topic Maps.

Among the several possible criteria for evaluating these proposals, two, completeness and naturalness, have been selected as the most relevant and appropriate for evaluating the qualities and limitations of each proposal. Completeness, defined as the extent to which any semantic structure in the source model is correctly (i.e., without losing or adding information) translated into the destination model, provides a clear indication of the semantic power of each translation approach. Naturalness, defined as the extent to which a translated model resembles in structure and content an equivalent model expressed directly in the target paradigm, provides an indication of the level of integration that each approach offers for the translated result to merge and interact with other models expressed in the same paradigm.

The analysis of the proposals identified two main approaches towards translation, which we dubbed "object mapping" (providing a translation of every structural component of the source paradigm) and "semantic mapping" (providing a structure corresponding to every conceptual structure of the source model). Although it is not the purpose of this document to provide suggestions and guidelines for translation paths between RDF and Topic Maps, the relative merits of semantic mapping over object mapping are clearly apparent and strongly imply that further guidelines pursue semantic mapping as the basis of any recommended approach to translation.

A number of outstanding issues need to be considered when providing a semantic mapping of the two paradigms, including identity, non-binary associations, roles, etc. Furthermore, semantic mapping has constraints in its applicability when the source model uses constructs that are not directly mappable into the target paradigm. In this case, two possible approaches can be foreseen, each championed by one of the two most recent proposals. Both ask the user for additional information to disambiguate these structures, but Garshol requires this to be provided (at least when going from RDF to Topic Maps), while Unibo falls back to object mapping when it is absent.

The analysis of the options and solutions proposed in the literature therefore clearly shows the advantages of semantic mapping, while also identifying the issues that must be addressed in any future translation approach. However, now that both RDF and Topic Maps have formal data models, and with the help of RDF Schema and OWL, it seems likely that most, if not all, of the issues listed here can be resolved without resorting to the restricted interoperability offered by object mapping.


Acknowledgements

The editors wish to thank Nikita Ogievetsky for providing the test case result in section 3.3, and Natasha Noy, Mike Uschold, David Wood, and Ralph Swick of the Semantic Web Best Practices and Deployment Working Group for reviewing earlier versions of this document.


References

[Ciancarini 03]
Ciancarini, Paolo; Gentilucci, Riccardo; Pirruccio, Marco; Presutti, Valentina; Vitali, Fabio: Metadata on the Web: On the integration of RDF and Topic Maps, http://www.idealliance.org/papers/extreme03/html/2003/Presutti01/EML2003Presutti01.html (2003)
[Garshol 01]
Garshol, Lars Marius: Topic maps, RDF, DAML, OIL: A comparison, http://www.ontopia.net/topicmaps/materials/tmrdfoildaml.html (2001)
[Garshol 02]
Garshol, Lars Marius: An RDF Schema for topic maps, http://psi.ontopia.net/rdf/ (2002)
[Garshol 03a]
Garshol, Lars Marius: Living with Topic Maps and RDF, http://www.ontopia.net/topicmaps/materials/tmrdf.html (2003)
[Garshol 03b]
Garshol, Lars Marius: The RTM RDF to topic maps mapping: Definition and introduction, http://www.ontopia.net/topicmaps/materials/rdf2tm.html (2003)
[Gentilucci 02]
Gentilucci, Riccardo; Pirruccio, Marco: Metainformazioni sul World Wide Web: Conversione di formato e navigazione, University of Bologna, Master's thesis (2002; in print; in Italian)
[Kaminsky 02]
Kaminsky, Piotr: Integrating Information on the Semantic Web Using Partially Ordered Multi Hypersets, http://www.ideanest.com/braque/Thesis-web.pdf (2002)
[Lacher 01]
Lacher, Martin S.; Decker, Stefan: On the Integration of Topic Maps and RDF Data, http://www.idealliance.org/papers/extreme03/html/2001/Lacher01/EML2001Lacher01-toc.html (2001)
[LTM]
Garshol, Lars Marius: The Linear Topic Map Notation: Definition and introduction, version 1.2, http://www.ontopia.net/download/ltm.html (2002)
[Mindswap 02]
MindSwap: RDF Converter, http://www.mindswap.org/2002/rdfconvert/ (2002)
[Moore 01]
Moore, Graham: RDF and Topic Maps: An exercise in convergence, http://xml.coverpages.org/moore-topicmapsrdf200105.pdf (2001)
[N3]
Berners-Lee, Tim: Notation 3, http://www.w3.org/DesignIssues/Notation3.html (2001)
[Noy 04]
Noy, Natasha; Rector, Alan: Defining N-ary Relations on the Semantic Web: Use With Individuals, http://www.w3.org/TR/swbp-n-aryRelations/ (2004)
[Ogievetsky 01a]
Ogievetsky, Nikita: Harvesting XML Topic Maps from RDF, http://www.cogx.com/kt2001 (2001)
[Ogievetsky 01b]
Ogievetsky, Nikita: XML Topic Maps through RDF glasses, http://www.cogx.com/rdfglasses.html (2001)
[Ogievetsky 02]
Ogievetsky, Nikita: DAML and Quantum Topic Maps, http://www.cogx.com/kt2002/ (2002)
[Ontopia 03a]
Ontopia: RTM: An RDF-to-TM mapping, http://psi.ontopia.net/rdf2tm/ (2003)
[Ontopia 03b]
Ontopia: TMR: A TM-to-RDF mapping, http://psi.ontopia.net/tm2rdf/ (2003)
[Ontopia 04]
Ontopia: tolog: Language tutorial, http://www.ontopia.net/omnigator/docs/query/tutorial.html (2004)
[Ontopia 05]
Ontopia: Omnigator Eight, http://www.ontopia.net/omnigator/ (2005)
[OWL]
Smith, Michael K.; Welty, Chris; McGuinness, Deborah L.: OWL Web Ontology Language Guide, http://www.w3.org/TR/owl-guide/ (W3C Recommendation, 2004)
[Pepper 00]
Pepper, Steve: The TAO of Topic Maps: Finding the Way in the Age of Infoglut, http://www.ontopia.net/topicmaps/materials/tao.html (2000)
[Pepper 03]
Pepper, Steve; Schwab, Sylvia: Curing the Web's Identity Crisis: Subject Indicators for RDF, http://www.ontopia.net/topicmaps/materials/identitycrisis.html (2003)
[Pepper 05]
Pepper, Steve: Italian Opera Topic Map, http://www.ontopia.net/omnigator/docs/navigator/opera.hytm (2005)
[PMTM4]
Biezunski, Michel; Newcomb, Steven R.: Topicmaps.net's Processing Model for XTM 1.0, version 1.0.2, http://topicmaps.net/pmtm4.htm (2001)
[Prudhommeaux 02]
Prud'hommeaux, Eric; Moore, Graham: RDF Topic Map Mapping, http://www.w3.org/2002/06/09-RDF-topic-maps/ (2002)
[RDF-Concepts]
Klyne, Graham; Carroll, Jeremy J.: Resource Description Framework (RDF): Concepts and Abstract Syntax, http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ (W3C Recommendation, 2004)
[RDF-Primer]
Manola, Frank; Miller, Eric: RDF Primer, http://www.w3.org/TR/2004/REC-rdf-primer-20040210/ (W3C Recommendation, 2004)
[RDF-Schema]
Brickley, Dan; Guha, R.V.: RDF Schema, http://www.w3.org/TR/2004/REC-rdf-schema-20040210/ (W3C Recommendation, 2004)
[TAG]
Jacobs, Ian; Walsh, Norman: Architecture of the World Wide Web, Volume One, http://www.w3.org/TR/webarch/ (W3C Recommendation, 2004)
[TMDM]
Garshol, Lars Marius; Moore, Graham: ISO/IEC 13250: Topic Maps — Data Model, http://www.isotopicmaps.org/sam/sam-model/ (Final Committee Draft, 2005)
[TMRM]
Durusau, Patrick; Newcomb, Steven R.: ISO/IEC 13250: Topic Maps — Reference Model, http://www.isotopicmaps.org/tmmm/TMMM-4.6/TMMM-4.6.html (Working Draft, 2004)
[Vatant 04]
Vatant, Bernard: Ontology-driven topic maps, http://www.idealliance.org/europe/04/call/xmlpapers/03-03-03.91/.03-03-03.html (2004)
[Vlist 01]
Vlist, Eric van der: Representing XML Topic Maps as RDF, http://lists.w3.org/Archives/Public/www-rdf-interest/2001Mar/0050.html (2001)
[XTM1.0]
Pepper, Steve; Moore, Graham: XML Topic Maps (XTM) 1.0, http://www.topicmaps.org/xtm/1.0/ (2001)
[XTM1.1]
Garshol, Lars Marius; Moore, Graham: ISO/IEC 13250: Topic Maps — XML Syntax, http://www.isotopicmaps.org/sam/sam-xtm/ (Final Committee Draft, 2005)
