W3C

Wordnet in RDFS and OWL

Editor's Draft

This version:
$Id: wordnet-sw-20040713.html,v 1.3 2004/08/05 15:45:45 bmcbride Exp $
Latest version:
...
Previous versions:
This is the first public version
Editor:
@@TBA
Contributors:
@@TBA

Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.


Abstract

This document describes an RDF Schema and OWL ontology for representing Wordnet.

Status of this Document

This document is an initial strawman draft being developed by the Wordnet Task Force of the Semantic Web Best Practices and Deployment Working Group.

The following section describes the intended status of this document if and when it is published.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

We encourage public comments. Please send comments to public-swbp-wg@w3.org

Publication as a draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Open issues, todo items:

Sections where further work is needed are marked within the document with @@ and a comment. Open issues and comments can be found in Appendix F.


Introduction

The Wordnet @@ref lexicon is proving to be a useful resource for semantic web developers. This document presents an RDF/OWL representation of the entire structure of Wordnet. By doing so, we allow Wordnet data to be accessed via RDF APIs and query languages, and to be mixed with non-Wordnet data, as well as with other lexically-oriented material, such as extensions to, and derivatives of, Wordnet and Wordnet-tagged corpora.

A related but distinct activity would be to describe the use of Wordnet as the basis for an RDF/OWL class and/or property hierarchy. Wordnet's noun term (hypernym) hierarchy captures "an X is a kind of Y" relationships between English category terms based on conventional usage. While there are several projects working in this area, it is not a task we address in this document.

The current document does not explore the issues raised by mapping Wordnet structures into RDF's modelling constructs (e.g. noun terms and/or synsets into classes). Future revisions of this document, or companion documents, may address some of these issues, such as the different assumptions underlying lexical databases when contrasted with formal ontologies. Here we concentrate on reflecting into RDF/XML the core structures and content of Wordnet, without consideration for mapping those notions into RDF's own notions of classes, properties and instances.

This approach echoes that of SKOS @@ref, which reflects into RDF the broader/narrower relationships used by thesauri, without requiring that each thesaurus be re-engineered as an RDF/OWL class hierarchy. Unlike SKOS, the structuring vocabulary used here draws directly from the conceptual framework underpinning Wordnet, allowing concepts such as 'antonym' to be used to relate concepts/synsets. It may be possible for future versions of this document and SKOS to share more common structure, since the structuring vocabularies address similar (yet distinct) problems.

The Structure of Wordnet

This section describes the structure of the RDFS and OWL representation of Wordnet presented in this document. For a full explanation of Wordnet terms and concepts, the reader should refer to the Wordnet documentation @@ref.

In Wordnet, a word form @@ref is closest to the commonsense meaning of the term "word". It is typically a sequence of characters such as "cat", "dog", or "chat". The same word form can have different meanings in different languages: in English, the word form "chat" means to converse informally, whilst in French it means a cat. A word represents a word form in a language. The French word "chat" is a different word to the English word "chat", though both have the same word form "chat". The same word can be used in different senses; for example, the English word "dog" can mean a kind of animal or to follow. A word sense represents a word used in a particular sense.

As can be seen from Figure 1, words are represented by resources of type wn:Word @@ref. The properties wn:hasWordForm and wn:hasLanguage (@@check these names) relate a word to its word form and language respectively. The property wn:hasSense relates words to their senses. (@@ is there some ordering information we are losing here?)

Figure 1: Diagram of Wordnet classes and relations. @@update to include Aldo's changes.
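
The distinction between a word form and a word can be sketched in Python. This is an illustration only and not part of the ontology: the Word dataclass and the language tags "en" and "fr" are assumptions made for the example, and the field names merely echo wn:hasWordForm and wn:hasLanguage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word form in a specific language (illustrative stand-in for wn:Word)."""
    word_form: str  # echoes wn:hasWordForm
    language: str   # echoes wn:hasLanguage; the tag values are assumed


english_chat = Word(word_form="chat", language="en")
french_chat = Word(word_form="chat", language="fr")

# Both words share one word form, yet they are different words.
same_form = english_chat.word_form == french_chat.word_form
same_word = english_chat == french_chat
```

Here same_form is true while same_word is false, which mirrors the "chat" example above.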

A central concept in Wordnet is the synset @@ref. A synset represents a set of synonyms, that is, word senses with similar meanings. Synsets may also be considered to represent concepts in a thesaurus or ontology, but such considerations are beyond the scope of this document at this time. For our present purposes, synsets are considered to be collections of word senses @@ref.

Synsets are represented by resources of type wn:SynSet. The properties wn:inSynSet and wn:hasWordSense relate synsets and word senses. A wn:WordSense resource can be thought of as representing a (word, synset) pair.
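
The view of a word sense as a (word, synset) pair can be sketched as plain subject/predicate/object tuples in Python. Several things here are assumptions made for illustration: the direction of wn:inSynSet (word sense to synset) and wn:hasWordSense (synset to word sense), the "word:", "sense:" and "synset:" identifier prefixes, and the synset names.

```python
# (word, synset) pairs; the synset names are made up for this example.
word_senses = [
    ("dog", "animal"),
    ("dog", "follow"),
    ("hound", "animal"),
]


def to_triples(pairs):
    """Expand each (word, synset) pair into three illustrative triples."""
    triples = []
    for word, synset in pairs:
        sense = f"sense:{word}/{synset}"
        triples.append((f"word:{word}", "wn:hasSense", sense))
        # Assumed directions: sense -> synset and synset -> sense.
        triples.append((sense, "wn:inSynSet", f"synset:{synset}"))
        triples.append((f"synset:{synset}", "wn:hasWordSense", sense))
    return triples


def objects(triples, subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    return {s for s, p, o in triples if p == predicate and o == obj}


def synonyms(triples, word):
    """Words that share at least one synset with the given word."""
    found = set()
    for sense in objects(triples, word, "wn:hasSense"):
        for synset in objects(triples, sense, "wn:inSynSet"):
            for other in objects(triples, synset, "wn:hasWordSense"):
                found |= subjects(triples, "wn:hasSense", other)
    return found - {word}


triples = to_triples(word_senses)
dog_synonyms = synonyms(triples, "word:dog")
```

With this toy data, dog_synonyms contains only "word:hound", because "hound" shares the "animal" synset with "dog".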

Wordnet defines semantic relations between resources in this basic structure that represent linguistic and conceptual relationships between the terms. For example, the wn:hasHypernym and wn:hasHyponym properties represent hypernym (@@explain) and hyponym (@@explain) relations between synsets.
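
In Wordnet, hyponymy is the inverse of hypernymy, so one relation can be derived from the other. The Python sketch below illustrates this with a simplified, made-up chain of synsets; the synset names are hypothetical, and whether the ontology itself declares the two properties as inverses is not settled by this draft.

```python
# Illustrative hypernym links: each synset maps to its direct hypernyms.
has_hypernym = {
    "synset-dog": {"synset-canine"},
    "synset-canine": {"synset-carnivore"},
    "synset-carnivore": {"synset-animal"},
}


def invert(relation):
    """Derive the inverse relation (here, hyponyms from hypernyms)."""
    inverse = {}
    for source, targets in relation.items():
        for target in targets:
            inverse.setdefault(target, set()).add(source)
    return inverse


def ancestors(relation, start):
    """All synsets reachable from start by following the relation."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in relation.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


has_hyponym = invert(has_hypernym)
dog_hypernyms = ancestors(has_hypernym, "synset-dog")
```

Following the links transitively from "synset-dog" reaches every more general synset in the chain, which is the "an X is a kind of Y" reading of the hypernym hierarchy.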

Wordnet Classes and Properties

This section describes each of the Wordnet classes and properties. @@incomplete

wn:Word

wn:Word is the class of Wordnet words. A wn:Word is a word form in a specific language. The English word "chat" is different to the French word "chat".

wn:WordSense

wn:WordSense is the class of words used in specific senses. Thus the word "plant" in the sense of a growing organism is a different wn:WordSense to the word "plant" in the sense of a factory.

Naming Wordnet Resources

All Wordnet resources are named with URIs @@ref having a common prefix. Throughout this document, we will refer to this prefix as $WNBASE. All terms and concepts used in the Wordnet ontology are named in the namespace $WNBASE/2-0/ontology.

The word $$ is named by the URI $WNBASE/word/$$#.

The synset whose identifier is $$ is named by the URI $WNBASE/synset/$$# @@I'm assuming the ids are ok for inclusion in the URI - defining escaping rules if not.

A word sense that is sense number N in the synset with identifier $$ is named by the URI $WNBASE/sense/$$/N# @@reconsider in the light of danbri noting the existence of sense ids in Wordnet 2.
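
The naming patterns above can be sketched as small Python helpers. The value chosen for WNBASE is a placeholder, not a real namespace, and the synset identifier used is made up. Percent-encoding with urllib.parse.quote is an assumption of this sketch, since escaping rules are not yet defined in this draft.

```python
from urllib.parse import quote

# Placeholder only; the actual value of $WNBASE is not given in this draft.
WNBASE = "http://example.org/wordnet"


def _escape(value):
    # Assumed escaping rule: percent-encode everything outside unreserved chars.
    return quote(str(value), safe="")


def ontology_namespace(base=WNBASE):
    return f"{base}/2-0/ontology"


def word_uri(word, base=WNBASE):
    return f"{base}/word/{_escape(word)}#"


def synset_uri(synset_id, base=WNBASE):
    return f"{base}/synset/{_escape(synset_id)}#"


def sense_uri(synset_id, sense_number, base=WNBASE):
    return f"{base}/sense/{_escape(synset_id)}/{int(sense_number)}#"


dog = word_uri("dog")
hot_dog = word_uri("hot dog")
second_sense = sense_uri("12345", 2)  # "12345" is a made-up identifier
```

Under the assumed escaping rule, a multi-word entry such as "hot dog" yields a URI whose space is encoded as %20.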

References

RDF version of the schema. @@ fix formatting

@@TBD

Appendix A - Wordnet Ontology

@@TBD A listing of the ontology.

Appendix B - Glossary

Appendix C - Example Use Case

Appendix D - Test Cases

This section lists some simple queries, expressed in RDQL, that illustrate features of the design of this Wordnet ontology.
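
As a rough illustration of the kind of query intended, the Python sketch below evaluates a conjunction of triple patterns over a small in-memory set of triples. It is not RDQL and not a real query engine: the match and query helpers, the "?" variable convention as used here, and the sample identifiers are all assumptions made for the example.

```python
def match(triples, pattern):
    """Yield variable bindings for one triple pattern ('?x' marks a variable)."""
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


def query(triples, patterns):
    """Join the patterns left to right, carrying bindings forward."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that are already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for bindings in match(triples, bound):
                next_solutions.append({**solution, **bindings})
        solutions = next_solutions
    return solutions


# Made-up sample data using property names from this document.
triples = [
    ("word:dog", "wn:hasSense", "sense:dog-1"),
    ("sense:dog-1", "wn:inSynSet", "synset:animal"),
    ("word:dog", "wn:hasSense", "sense:dog-2"),
    ("sense:dog-2", "wn:inSynSet", "synset:follow"),
]

# Roughly: "which synsets contain a sense of the word dog?"
results = query(triples, [
    ("word:dog", "wn:hasSense", "?sense"),
    ("?sense", "wn:inSynSet", "?synset"),
])
synsets = sorted(result["?synset"] for result in results)
```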

Appendix E - Requirements

This section sets out the technical requirements to be satisfied by this design for representing Wordnet using semantic web languages. @@Currently incomplete.

Completeness and Accuracy
The encoding shall represent Wordnet completely and accurately, with minimal information loss.
Multilingual
The encoding shall be capable of representing wordnets for languages other than English and other than Western European languages. It should be possible to merge the semantic web encodings of wordnets for various languages without confusion.
Semantic Web Language Support
The encoding will be represented in OWL Lite @@ref and shall be usefully processable by a reasoner that supports only RDF Schema @@ref.

Appendix F - Outstanding Issues/Comments

Rationale for representing words and languages
Propose to include rationale in the description of Word when adding properties.
Where to use XML literal to support internationalization
action jjc to investigate.
Do we have sense ids?
action guus to investigate
How do we handle versioning with respect to synset URIs?
action aldo to investigate

Appendix G - Decisions and Rationale