Data Catalog Vocabulary/DC-SKOS

From W3C eGovernment Wiki

Introduction to Dublin Core and SKOS

Linked Data and the RDF data model encourage vocabulary reuse across the Web. Instead of being constrained by a particular DTD or XML Schema, data publishers are free to cherry-pick existing vocabularies when making RDF data available, and when creating their own new vocabularies. Or, as the Semantic Web FAQ says:

... the “ethos” of the Semantic Web is to share and reuse as much as possible ...

In this spirit, the Data Catalog Vocabulary (dcat) reuses several existing vocabularies to tie it into the larger world of Linked Data. Two of them are the Dublin Core Metadata Terms and the Simple Knowledge Organization System (SKOS). This document is a brief introduction to Dublin Core and SKOS, prepared for discussion in the dcat W3C eGov telecon on 2010-04-22.

Dublin Core

The Dublin Core effort began in 1995 at a workshop in Dublin, Ohio, sponsored by the Online Computer Library Center (OCLC) and the National Center for Supercomputing Applications (NCSA). The primary goal of the workshop was to establish a core set of metadata elements for the description of networked resources (documents on the Internet). The report from the meeting details the 13 elements the group was able to agree on: subject, title, author, publisher, other-agent, date, object-type, form, identifier, relation, source, language and coverage. No specific syntax was established; however, a proof-of-concept SGML DTD was included as an appendix.

The Dublin Core Metadata Initiative was formed soon after, and has held annual, international workshops ever since. In 1996 DCMI and W3C members arrived at a mechanism for embedding metadata in HTML. In 1998 the initial set of 13 core elements was refined and expanded to 15: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title and type. The Dublin Core was then standardized as RFC 2413, NISO Z39.85 and ISO 15836.

Over the next 15 years the DCMI worked on a series of specifications that addressed usage and serialization formats, and developed an abstract model. To someone already familiar with the RDF data model, the Dublin Core Abstract Model (DCAM) will look oddly familiar. This is no surprise: Eric Miller, then a research scientist at OCLC, played an instrumental role in the initial Dublin Core work and went on to be the Semantic Web Activity Lead at the W3C, where he worked on the initial specifications for RDF. So in many ways the Dublin Core Abstract Model co-evolved with the RDF data model. Today the original 15-element set of metadata has been re-expressed as the DCMI Metadata Terms (DCTERMS). DCTERMS is now expressed as an RDF Schema, and includes 55 properties and 22 classes.

DCTERMS itself is used in documents all around the Web, by publishers and applications such as the BBC, the Australian Government and RSS feed syndication. Dublin Core featured quite prominently in Ian Hickson's (Hixie, Google) 2005 survey of a billion web documents. In addition, the Dublin Core schema is used heavily in the Linked Data web, as this query for dcterms:title in Sindice indicates.

Here's an example of some DCTERMS metadata extracted from the <head> element of an HTML document from the Australian National Government:

 <head>
 <title>Topics - australia.gov.au</title>
 <meta name="DCTERMS.description" content="Information about many topics can be discovered on australia.gov.au"/> 
 <meta name="DCTERMS.title" content="Topics"/> 
 <meta name="DCTERMS.audience" scheme="AGOSPTERMS.AgospInterests" content="All"/> 
 <meta name="DCTERMS.language" scheme="DCTERMS.RFC4646" content="English"/> 
 <meta name="DCTERMS.format" scheme="agosp-format" content="text/html"/> 
 <meta name="DCTERMS.created" scheme="DCTERMS.ISO8601" content="2010-03-17T13:31:50"/> 
 <meta name="DCTERMS.modified" scheme="DCTERMS.ISO8601" content="2010-03-17T13:31:53"/> 
 <meta name="DCTERMS.creator" content="corporateName=Department of Finance and Deregulation"/> 
 <meta name="DCTERMS.publisher" content="corporateName=Department of Finance and Deregulation"/> 
 </head>
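
Metadata embedded this way is straightforward to read back out. The following is a minimal sketch, not a full Dublin Core processor: it uses Python's standard `html.parser` module to collect every `<meta>` tag whose name starts with `DCTERMS.`. The class name `DCTermsMetaParser` and the shortened `SAMPLE` markup are illustrative assumptions, not part of any DCMI specification.

```python
from html.parser import HTMLParser


class DCTermsMetaParser(HTMLParser):
    """Collect <meta name="DCTERMS.*"> tags as (term, scheme, content) tuples."""

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> are also routed here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.upper().startswith("DCTERMS."):
            term = name.split(".", 1)[1]  # "DCTERMS.title" -> "title"
            self.terms.append((term, attr.get("scheme"), attr.get("content")))


# A shortened version of the <head> element shown above (illustrative only).
SAMPLE = """
<head>
<title>Topics - australia.gov.au</title>
<meta name="DCTERMS.title" content="Topics"/>
<meta name="DCTERMS.language" scheme="DCTERMS.RFC4646" content="English"/>
<meta name="DCTERMS.created" scheme="DCTERMS.ISO8601" content="2010-03-17T13:31:50"/>
</head>
"""

meta_parser = DCTermsMetaParser()
meta_parser.feed(SAMPLE)
for term, scheme, content in meta_parser.terms:
    print(term, scheme, content)
```

The `scheme` attribute is kept alongside each value because it tells a consumer how to interpret the content, for instance that a date is written in ISO 8601.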

Here's a chunk of XHTML extracted from a page at the data.gov.uk site, that demonstrates DCTERMS use with RDFa:

  <div id="dataPackage" xmlns:dct="http://purl.org/dc/terms/"
       about="/id/dataset/os-50k-gazetteer"> 
    <div class="data"> 
      <div class='package_title' property="dct:description"> 
      <p>1:50 000 Scale Gazetteer provides an excellent reference tool or location finder.  The Gazetteer contains entries for airports, farms, hills, woodlands, commons and other places, including over 42 000 towns and settlements with coordinates to 1 km resolution.</p> 
<p>Licence detail: UK Crown Copyright with data.gov.uk rights; see www.ordnancesurvey.co.uk/opendata/licence for further information</p> 
      </div> 
      <h2>Overview</h2> 
      <table class='package_table'> 
        <tbody> 
          <tr> 
            <td class='package_label'>Released</td> 
            <td class='package_details'><div property="dct:created">2010-04-01</div></td> 
          </tr> 
          <tr> 
            <td class='package_label'>Last updated</td> 
            <td class='package_details'><div property="dct:modified">2009-06</div></td> 
          </tr> 
          <tr> 
            <td class='package_label'>Update frequency</td> 
            <td class='package_details'><div>Annual</div></td> 
          </tr> 
          <tr> 
            <td class='package_label'>Tags</td> 
            <td class='package_details'>
              <div> 
                <a href="/data/tag/gazetteer" rel="dct:subject">gazetteer</a> 
                <a href="/data/tag/50000" rel="dct:subject">50000</a> 
                <a href="/data/tag/use" rel="dct:subject">use</a> 
                <a href="/data/tag/land" rel="dct:subject">land</a> 
                <a href="/data/tag/coordinates" rel="dct:subject">coordinates</a> 
                <a href="/data/tag/ordnance-survey" rel="dct:subject">ordnance-survey</a> 
                <a href="/data/tag/reference" rel="dct:subject">reference</a> 
                <a href="/data/tag/placename" rel="dct:subject">placename</a> 
                <a href="/data/tag/os" rel="dct:subject">os</a> 
              </div> 
            </td> 
          </tr> 
          <tr> 
            <td class='package_label'>Wiki</td> 
            <td class='package_details'>
              <div>
                <a href='/wiki/index.php/Package:os-50k-gazetteer' rel='dct:isReferencedBy'>1:50 000 Scale Gazetteer</a>
              </div>
            </td> 
          </tr> 
        </tbody> 
      </table> 
      ...
    </div>
  </div>
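
In the RDFa above, values such as `dct:subject` are compact URIs: the `dct` prefix declared with `xmlns:dct` is expanded to the full DCTERMS namespace to obtain the property URI. The sketch below illustrates only that expansion step and is not an RDFa processor. The helper name `expand_curie` and the base URL `http://data.gov.uk/` used to resolve the relative links are assumptions made for illustration.

```python
from urllib.parse import urljoin

# Prefix mapping taken from the xmlns:dct declaration in the example above.
PREFIXES = {"dct": "http://purl.org/dc/terms/"}

# Assumed base URL for resolving relative links (hypothetical).
BASE = "http://data.gov.uk/"


def expand_curie(curie, prefixes):
    """Expand a compact URI such as 'dct:subject' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


# One statement from the example: the dataset has the tag page as a subject.
triple = (
    urljoin(BASE, "/id/dataset/os-50k-gazetteer"),  # subject, from about="..."
    expand_curie("dct:subject", PREFIXES),          # predicate, from rel="..."
    urljoin(BASE, "/data/tag/gazetteer"),           # object, from href="..."
)
print(triple)
```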

Simple Knowledge Organization System (SKOS)

Work on SKOS began as early as 1999 and continued through 2004 as part of the EU-funded Language Independent Metadata Browsing of European Resources (LIMBER) and Semantic Web Advanced Development for Europe projects. In 2004 it continued to be developed at the W3C as part of the Semantic Web Deployment Working Group (http://www.w3.org/2006/07/SWD/). In 2009 the SKOS Reference became a W3C Recommendation.

The essential idea of SKOS is to provide a common data model for sharing and linking knowledge organization systems using the World Wide Web. Knowledge organization systems here means traditional library and information science tools such as thesauri, classification schemes, subject heading systems, taxonomies, and folksonomies. Unlike more formal knowledge representation languages like OWL, SKOS provides a simpler migration path for traditional knowledge organization systems, where re-engineering for representation in OWL is too expensive.

In SKOS each idea or meaning in a knowledge organization system is identified with a URI as a skos:Concept, and grouped into a skos:ConceptScheme. Each skos:Concept is then described with labels (skos:prefLabel, skos:altLabel), documentation properties (skos:note, skos:changeNote, skos:definition, skos:editorialNote, skos:example, skos:historyNote, skos:scopeNote) and linked together through various semantic relations (skos:narrower, skos:broader, skos:related). In addition you can link concepts together between concept schemes using mapping relations (skos:exactMatch, skos:closeMatch, skos:narrowMatch, skos:broaderMatch, skos:relatedMatch).
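
As a rough illustration of this data model, the sketch below represents a few concepts as plain Python objects and walks their broader links. It is an assumption-laden toy, not an implementation of SKOS: the `Concept` class, the `broader_transitive` helper and the `http://example.org/concept/` URIs are all hypothetical. Note that skos:broader itself is not declared transitive; SKOS defines skos:broaderTransitive as its transitive super-property, and that is what the helper approximates.

```python
from dataclasses import dataclass, field

EX = "http://example.org/concept/"  # hypothetical namespace for this sketch


@dataclass
class Concept:
    uri: str
    pref_label: str                                 # skos:prefLabel
    alt_labels: list = field(default_factory=list)  # skos:altLabel
    broader: list = field(default_factory=list)     # URIs linked via skos:broader


def broader_transitive(uri, scheme):
    """Return the URIs of all concepts reachable by following broader links."""
    seen = set()
    stack = list(scheme[uri].broader)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        if current in scheme:
            stack.extend(scheme[current].broader)
    return seen


# A tiny concept scheme, keyed by concept URI.
scheme = {
    c.uri: c
    for c in [
        Concept(EX + "hypertext", "Hypertext systems"),
        Concept(EX + "multimedia", "Multimedia systems"),
        Concept(EX + "www", "World Wide Web", ["WWW", "W3"],
                [EX + "hypertext", EX + "multimedia"]),
        Concept(EX + "semantic-web", "Semantic Web", [], [EX + "www"]),
    ]
}

for ancestor in sorted(broader_transitive(EX + "semantic-web", scheme)):
    print(scheme[ancestor].pref_label)
```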

SKOS is increasingly used in the Linked Data community. For example, DBpedia uses SKOS to model categories derived from Wikipedia's categories, e.g. Time, part of which is represented here in Turtle:

 @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
 @prefix category: <http://dbpedia.org/resource/Category:> .
 category:Time a skos:Concept ;
   skos:prefLabel "Time"@en ; 
   skos:broader category:Spacetime, category:Fundamental_physics_concepts ,
                category:Physics, category:Dimension ,
                category:Physical_quantities .

Another example is the Library of Congress, which publishes the Library of Congress Subject Headings using SKOS. Here is an example that uses RDFa to publish SKOS:

  <div id="conceptDetailTab" about="http://id.loc.gov/authorities/sh95000541#concept" typeof="skos:Concept"> 
    <h1 property="skos:prefLabel" xml:lang="en">World Wide Web</h1> 
    <span rel="skos:inScheme" resource="http://id.loc.gov/authorities#conceptScheme" /> 
    <span rel="owl:sameAs" resource="info:lc/authorities/sh95000541" /> 
    <div id="wrapConcept" about="http://id.loc.gov/authorities/sh95000541#concept"> 
    <br/> 
                    
    <b>URI:</b> 
    &lt;<a href="http://id.loc.gov/authorities/sh95000541#concept">http://id.loc.gov/authorities/sh95000541#concept</a>&gt;
    <br />
    <br /> 
    <b>Type:</b> 
    <span rel="skos:inScheme" resource="http://id.loc.gov/authorities#topicalTerms">Topical Term</span> 
    <br />
    <br /> 
                    
    <b>Alternate Labels:</b> 
    W3 (World Wide Web); Web (World Wide Web); World Wide Web (Information retrieval system); WWW (World Wide Web)
    <br />
    <br /> 
    <div id="rdfaBlock"> 		   
      <span property="skos:altLabel" xml:lang="en">W3 (World Wide Web)</span> 
      <span property="skos:altLabel" xml:lang="en">Web (World Wide Web)</span> 
      <span property="skos:altLabel" xml:lang="en">World Wide Web (Information retrieval system)</span>
      <span property="skos:altLabel" xml:lang="en">WWW (World Wide Web)</span> 
    </div> 
		    
    <b>Broader Terms:</b> 
    <ul> 
      <li> 
        <a href="http://id.loc.gov/authorities/sh88002671#concept" rel="skos:broader"> 
        <span property="skos:prefLabel" xml:lang="en">Hypertext systems</span> 
        </a> 
      </li>                
      <li> 
        <a href="http://id.loc.gov/authorities/sh92002381#concept" rel="skos:broader"> 
        <span property="skos:prefLabel" xml:lang="en">Multimedia systems</span> 
        </a> 
      </li>                       
    </ul> 
                    
    <b>Narrower Terms:</b>

    <ul>
      <li> 
        <a href="http://id.loc.gov/authorities/sh2008009697#concept" rel="skos:narrower"> 
        <span property="skos:prefLabel" xml:lang="en">Invisible Web</span> 
        </a> 
      </li>                     
      <li> 
        <a href="http://id.loc.gov/authorities/sh2007008317#concept" rel="skos:narrower"> 
        <span property="skos:prefLabel" xml:lang="en">Mashups (World Wide Web)</span> 
        </a> 
      </li>
      <li> 
        <a href="http://id.loc.gov/authorities/sh2002000569#concept" rel="skos:narrower"> 
        <span property="skos:prefLabel" xml:lang="en">Semantic Web</span> 
        </a> 
      </li>
      <li> 
        <a href="http://id.loc.gov/authorities/sh2007008319#concept" rel="skos:narrower"> 
        <span property="skos:prefLabel" xml:lang="en">Web 2.0</span> 
        </a> 
      </li> 
      <li> 
        <a href="http://id.loc.gov/authorities/sh2003001415#concept" rel="skos:narrower"> 
        <span property="skos:prefLabel" xml:lang="en">WebDAV (Standard)</span> 
        </a> 
      </li> 
      <li> 
        <a href="http://id.loc.gov/authorities/sh97003254#concept" rel="skos:narrower"> 
        <span property="skos:prefLabel" xml:lang="en">WebTV (Trademark)</span> 
        </a> 
      </li>                         
    </ul> 
                    
    <b>Related Terms:</b>
 
    <ul>                         
      <li> 
        <a href="http://id.loc.gov/authorities/sh92002816#concept" rel="skos:related"> 
        <span property="skos:prefLabel" xml:lang="en">Internet</span> 
        </a> 
      </li>                   
    </ul> 
                    
    <b>Editorial Notes:</b> 

    <ul> 
      <li property="skos:editorialNote" xml:lang="en">ASTI; Engr. index; Web. 3</li> 
    </ul> 
                    
    <b>Sources:</b> 
    <ul> 
    <li property="dcterms:source" xml:lang="en">Work cat.: 94067520: December, J. The World Wide Web Unleashed, c1994 (WWW, the Web, a distributed hypermedia system, a collection of interconnected hardware, software, and networked systems, it is a concept, not a program, system, or protocol, it is an interface)</li> 
    <li property="dcterms:source" xml:lang="en">94234135: Brown, S. The Internet via Mosaic and World Wide Web, c1994 (WWW, the Web) p. 35 (Although the WWW is primarily used on a global scale as a part of the Internet, it is feasible for a two-machine network to run the WWW client/server software)</li> 
    <li property="dcterms:source" xml:lang="en">Internet publishing handbook, c1995: p. 15 (World-Wide Web system is known by its various names: WWW, W3, and Web)</li> 
    <li property="dcterms:source" xml:lang="en">MAGS, Dec. 8, 1995: article by Robert M. Metcalfe (first generation of WWW based on Hypertext Transfer Protocol and Hypertext Transfer Markup Language)</li> 
    </ul> 
                    
                    
    <b>LC Classification:</b> TK5105.888
    <br /> 
    <br /> 
                    
    <b>Created:</b> 
    <span property="dcterms:created" content="2000-04-28T00:00:00-04:00" datatype="xs:dateTime">2000-04-28</span> 
    <br /> 
    <br /> 
                    
    <b>Last Modified:</b> 
    <span property="dcterms:modified" datatype="xs:dateTime" content="2001-10-01T09:56:06-04:00">2001-10-01 09:56:06</span> 
    <br /> 
    <br /> 
                    
                    
    <b>Similar concepts from other vocabularies:</b> 
    <ul> 
      <li> 
         <a rel="skos:closeMatch" href="http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb13319953j">&lt;http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb13319953j&gt;</a> <img src="/static/images/newsite.gif" alt="Offsite link" /> 
      </li> 
    </ul>
 
    </div>
  </div>

Sites like subj3ct are also using SKOS as a way to harvest and interlink concepts from different concept schemes.