RDF Content Labels: Use Cases and Draft Schemas

NOTE: this document is published here as a work-in-progress within the EU-funded Quatro project. It uses W3C technology and explores the application of W3C's RDF/XML format to content labelling in the PICS tradition. It is not currently a work item of any W3C Working Group. I am exploring the possibilities for bringing this work into a chartered W3C Group, e.g. as an Interest Group note through the Semantic Web Interest Group. --Dan Brickley

This is:
http://www.w3.org/2004/12/q/doc/rdf-contentlabels.html
Last updated:
2005/2/17
Editor(s):
Kal Ahmed, Techquila <kal@techquila.com>
Phil Archer, ICRA <phil.archer@icra.org>
Dan Brickley, W3C <danbri@w3.org>

Abstract

This document describes use cases for content labels and an RDF schema designed to:

  1. Define a content label as an RDF class. This will be a description of a resource or, more usually, a collection of resources.
  2. Define a rule set for identifying the correct content label for a given URI from within a set of content labels.

Status of this Document

This is a second draft describing use cases and an associated RDF schema designed to meet them. We think that the use cases described in this document are very close to encapsulating the final requirements. We intend to use feedback on this document as input for the next phase of our work.

This document has been produced through a collaboration between members of the ICRA Labelling Working Group with support from Vodafone Group Services, Dan Brickley of the W3C Semantic Web team, and two project groups: IA Japan's Mobile Filtering Project and QUATRO. It has not been reviewed more generally within W3C and should not be considered to be on the "standards track". It is published here to facilitate collaboration and discussion about possible future directions for this and related work.

Discussion of this document takes place on the public mailing list http://lists.w3.org/Archives/Public/public-quatro/. To contribute or comment, please subscribe by sending mail to public-quatro-request@w3.org with subscribe as the subject. The archive of this list can be read by the general public.

We are particularly interested in feedback regarding the costs and benefits of implementing this style of labelling using a generic RDF logical rule language, rather than as a custom RDF application vocabulary.

This document is work in progress and does not imply endorsement by, or the final consensus of, all its authors.

Table of Contents

1 Introduction
2 Use cases
  2.1 Use Case 1: Distributed content production, centralised label control
  2.2 Use Case 2: Distributed content production, distributed label control
  2.3 Use Case 3: Centralised generic labels with local override
  2.4 Use Case 4: Video on demand
3 The Schemas
  3.1 A schema for RDF Content Labels
4 Current Issues
  4.1 OWL
  4.2 Frequency Modifiers
  4.3 Adding trust, web services etc.
  4.4 Verbosity
5 References


1 Introduction

There are a variety of reasons for which a content provider may wish to label their material as collections of resources rather than providing an individual description for each URI. These include, but are not limited to, child protection, quality assurance, standards conformance, copyright notices, authorship etc.

Different interest groups have come together to find a solution to this based on Semantic Web Technologies. The experiences gained using the PICS system have been instructive in determining the design of content labels based on RDF.

2 Use cases

Note: In the following use cases, the example given is labelling using the ICRA system. The authors wish to stress that the concept of an RDF content label has been designed to support any labelling scheme.

2.1 Use case 1: Distributed content production, centralised label control

Exemplary Portal Inc. has 40 production centres around the world. Each is responsible for a subdomain of example.org and is largely autonomous. The Exemplary Portal operates two further domains at example.net and example-inc.net for internal functions but these domains are used to supply some content to the public-facing web properties. As well as content produced in-house, Exemplary Portal carries a great deal of third party content.

Although production is spread around the world, corporate liability is concentrated in one department at head office.

Each production centre should arrange for content to carry an identical tag that points to the labelling information. The tag should be regarded as stable over the medium to long term. Labelling information should be under the direct and easy control of the corporate liability department. It is posted online at www.example.org/labels.rdf.

The link tag should therefore be:

<link rel="meta" href="http://www.example.org/labels.rdf" type="application/rdf+xml" />

This can also be expressed as an HTTP Response Header:

Link: <http://www.example.org/labels.rdf>; /="/"; rel="meta"; type="application/rdf+xml"

This method allows Exemplary Portal to make inclusion of the same link a feature of its standard server configuration.
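As a rough illustration of the client side, the following sketch shows one way a filter might discover the label link in a page. It is not part of the draft: the class and function names are hypothetical, and it assumes that only link elements with rel="meta" and type="application/rdf+xml" are of interest.

```python
from html.parser import HTMLParser


class LabelLinkFinder(HTMLParser):
    """Collects href values of <link rel="meta" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if (
            "meta" in rels
            and attributes.get("type") == "application/rdf+xml"
            and attributes.get("href")
        ):
            self.hrefs.append(attributes["href"])


def find_label_links(html):
    """Return the label link targets found in an HTML document."""
    finder = LabelLinkFinder()
    finder.feed(html)
    return finder.hrefs


page = (
    "<html><head>"
    '<link rel="meta" href="http://www.example.org/labels.rdf" '
    'type="application/rdf+xml" />'
    "</head><body></body></html>"
)
label_links = find_label_links(page)
```

A client that also honours the HTTP response header would need to merge both sources; that step is left out here.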

A client following that link should receive back an RDF graph that is about, or can be interpreted as being about, the resource carrying the link. For example:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:i="http://www.icra.org/rdfs/vocabularyv03#">

  <label:Ruleset>
    <label:hostRestriction>example.org</label:hostRestriction>
    <label:hostRestriction>example.net</label:hostRestriction>
    <label:hostRestriction>example-inc.net</label:hostRestriction>
    <label:rules rdf:parseType="Collection">

      <label:oneOf>
        <rdf:li>
          <label:Matches>
            <label:value>.*ads.*</label:value>
          </label:Matches>
        </rdf:li>
        <rdf:li>
          <label:Matches>
            <label:value>.*banners.*</label:value>
          </label:Matches>
        </rdf:li>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#advert"/>
          </label:PropertySet>
        </label:confers>
      </label:oneOf>

      <label:Matches>
        <label:value>.*</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#defaultContentPage"/>
          </label:PropertySet>
        </label:confers>
      </label:Matches>

    </label:rules>
  </label:Ruleset>

  <label:contentLabel rdf:ID="defaultContentPage">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="advert">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>0</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <rdf:Description rdf:about="">
    <dc:creator rdf:resource="http://www.icra.org" />
    <label:authorityFor>http://www.icra.org/rdfs/vocabularyv03#</label:authorityFor>
  </rdf:Description>

</rdf:RDF>

Data for use case 1

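As well as data about the labelled resource, the RDF instance carries a limited amount of data about itself. That is, the rdf:Description block with rdf:about="" names the creator of the instance (dc:creator) and carries a label:authorityFor statement giving the ICRA vocabulary namespace.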

This small addition to the file would enable the labelling organisation to make assertions via other mechanisms about the veracity of the labels.

The rules determine which of the two available labels applies to resources on example.org according to their URIs. Test data is available.
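The following sketch shows one possible reading of how a client could evaluate such a rule set. Several points are assumptions rather than anything the draft specifies: rules are tried in document order and the first match wins, patterns must match the whole URI, a host restriction also covers subdomains of the named host, and the catch-all pattern is written as .* (as in use case 4). The rules are transcribed by hand rather than parsed from the RDF.

```python
import re
from urllib.parse import urlsplit

HOSTS = ["example.org", "example.net", "example-inc.net"]

# (patterns, label) pairs in document order; a rule applies if any pattern matches.
RULES = [
    ([r".*ads.*", r".*banners.*"], "#advert"),
    ([r".*"], "#defaultContentPage"),
]


def host_allowed(host, allowed=HOSTS):
    # Assumption: a restriction covers the named host and its subdomains.
    host = host.lower()
    return any(host == h or host.endswith("." + h) for h in allowed)


def label_for(uri):
    """Return the label fragment for a URI, or None if no rule applies."""
    host = urlsplit(uri).hostname
    if host is None or not host_allowed(host):
        return None
    for patterns, label in RULES:
        # Assumption: first matching rule wins, whole-URI match.
        if any(re.fullmatch(p, uri) for p in patterns):
            return label
    return None
```

Under these assumptions, http://news.example.org/ads/spring.gif receives the advert label and http://www.example.net/index.html the default one. Note that a pattern as broad as .*ads.* also matches a URI such as http://downloads.example.org/tool.zip, so real rule sets would probably need tighter expressions.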

2.2 Use case 2: Distributed content production, distributed label control

The Content Management Company is a major portal for a single country, Germany. In accordance with German practice, it categorises its content into 5 age brackets: 0-6, 6-12, 12-16, 16-18 and 18+. The Content Management Company defines 5 labels and puts them in an RDF instance.

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:i="http://www.icra.org/rdfs/vocabularyv03#">

  <rdf:Description rdf:about="">
    <dc:creator rdf:resource="http://www.icra.org" />
    <label:authorityFor>http://www.icra.org/rdfs/vocabularyv03#</label:authorityFor>
  </rdf:Description>

  <label:Ruleset>
    <label:hostRestriction rdf:resource="#hosts" />
  </label:Ruleset>

  <label:Ruleset rdf:ID="hosts">
    <label:hostRestriction>example.de</label:hostRestriction>
  </label:Ruleset>

  <label:contentLabel rdf:ID="defaultContentPage">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="a6">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nb>1</i:nb>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="a12">
    <i:cb>1</i:cb>
    <i:lc>1</i:lc>
    <i:na>1</i:na>
    <i:nb>1</i:nb>
    <i:oz>1</i:oz>
    <i:sa>1</i:sa>
    <i:vb>1</i:vb>
    <i:vc>1</i:vc>
    <i:vd>1</i:vd>
    <i:vg>1</i:vg>
  </label:contentLabel>

  <label:contentLabel rdf:ID="a16">
    <i:cb>1</i:cb>
    <i:lb>1</i:lb>
    <i:lc>1</i:lc>
    <i:na>1</i:na>
    <i:nb>1</i:nb>
    <i:nc>1</i:nc>
    <i:oa>1</i:oa>
    <i:ob>1</i:ob>
    <i:oc>1</i:oc>
    <i:sa>1</i:sa>
    <i:sb>1</i:sb>
    <i:vb>1</i:vb>
    <i:vc>1</i:vc>
    <i:vd>1</i:vd>
    <i:vg>1</i:vg>
    <label:hasModifier><i:xb /></label:hasModifier>
  </label:contentLabel>

  <label:contentLabel rdf:ID="a18">
    <i:ca>1</i:ca>
    <i:la>1</i:la>
    <i:lb>1</i:lb>
    <i:lc>1</i:lc>
    <i:na>1</i:na>
    <i:nb>1</i:nb>
    <i:nc>1</i:nc>
    <i:oa>1</i:oa>
    <i:ob>1</i:ob>
    <i:oc>1</i:oc>
    <i:sa>1</i:sa>
    <i:sb>1</i:sb>
    <i:sc>1</i:sc>
    <i:sd>1</i:sd>
    <i:sf>1</i:sf>
    <i:vb>1</i:vb>
    <i:vc>1</i:vc>
    <i:vd>1</i:vd>
    <i:ve>1</i:ve>
    <i:vf>1</i:vf>
    <i:vg>1</i:vg>
    <i:vh>1</i:vh>
    <i:vi>1</i:vi>
    <i:vj>1</i:vj>
  </label:contentLabel>

</rdf:RDF>

Data for use case 2

The CMS operated by the Content Management Company uses metadata associated with elements present on each page to assign a label for the generated HTML. This is achieved by inserting one of 5 possible link tags:

<link rel="meta" href="http://www.example.de/labels.rdf#defaultContentPage" type="application/rdf+xml" />

<link rel="meta" href="http://www.example.de/labels.rdf#a6" type="application/rdf+xml" />

<link rel="meta" href="http://www.example.de/labels.rdf#a12" type="application/rdf+xml" />

<link rel="meta" href="http://www.example.de/labels.rdf#a16" type="application/rdf+xml" />

<link rel="meta" href="http://www.example.de/labels.rdf#a18" type="application/rdf+xml" />

Although there are no rules for determining which label applies to which URI, there is a host restriction in place so that it is clear that the labels may only be applied to example.de. Furthermore, this example shows the host restriction list as a separate block within the RDF instance. By extension, the list of hosts may be held in a separate file. This allows the main RDF instance to be signed or otherwise "approved of" by the labelling organisation while allowing the content provider to add new hosts.
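To make the lookup concrete, here is a minimal sketch of how a client might resolve one of these link tags: check the page's host against the host restriction, then find the label whose rdf:ID equals the fragment in the link. It works on a reduced copy of the instance, reads the RDF/XML as plain XML rather than through an RDF parser, and assumes that a host restriction also covers subdomains. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urlsplit

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LABEL = "http://www.w3.org/2004/12/q/contentlabel#"

# Reduced copy of the instance: one host restriction and one label.
INSTANCE = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:i="http://www.icra.org/rdfs/vocabularyv03#">
  <label:Ruleset rdf:ID="hosts">
    <label:hostRestriction>example.de</label:hostRestriction>
  </label:Ruleset>
  <label:contentLabel rdf:ID="a6">
    <i:cz>1</i:cz>
    <i:nb>1</i:nb>
  </label:contentLabel>
</rdf:RDF>"""


def resolve_label(page_url, link_href, instance_xml):
    """Return the descriptors of the label a link points to, or None."""
    root = ET.fromstring(instance_xml)
    hosts = [
        e.text.strip()
        for e in root.iter("{%s}hostRestriction" % LABEL)
        if e.text  # skip restrictions given by rdf:resource only
    ]
    host = (urlsplit(page_url).hostname or "").lower()
    # Assumption: a restriction covers the named host and its subdomains.
    if not any(host == h or host.endswith("." + h) for h in hosts):
        return None
    _, fragment = urldefrag(link_href)
    for element in root.iter("{%s}contentLabel" % LABEL):
        if element.get("{%s}ID" % RDF) == fragment:
            return {child.tag.split("}", 1)[1]: child.text for child in element}
    return None
```

A page on www.example.de that links to labels.rdf#a6 resolves to the descriptors of label a6, whereas the same link found on another host resolves to nothing.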

Test data is available

2.3 Use case 3: Centralised generic labels with local override

This is a combination of use cases 1 and 2. The staff at one of the Exemplary Portal's 40 production centres feel that a particular page they've created should not be associated with the "defaultContentPage" label but should use the "advert" label instead. This is achieved easily by changing the link tag as follows:

<link rel="meta" href="http://www.example.org/labels.rdf#advert" type="application/rdf+xml" />

The same RDF instance is referenced as for other pages on the example.org site. However, the tag points to a specific fragment of the instance and the filter should respect this rather than processing the application rules.
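The precedence described here can be sketched in a few lines. This is only an illustration under the assumption that a non-empty fragment in the link always takes priority over the label derived from the rules; the function name is hypothetical.

```python
from urllib.parse import urldefrag


def choose_label(link_href, rule_label):
    """Prefer a label named by the link's fragment over the rule-derived one."""
    _, fragment = urldefrag(link_href)
    return "#" + fragment if fragment else rule_label


override = choose_label(
    "http://www.example.org/labels.rdf#advert", "#defaultContentPage"
)
fallback = choose_label("http://www.example.org/labels.rdf", "#defaultContentPage")
```

Here the first call yields the advert label named in the link, and the second falls back to the label obtained from the rules.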

NB. It is quite possible that a user's computer will already hold the RDF instance in cache and, in this use case, will probably allow the page based on the information it already has. Filters should, however, always follow links to RDF instances to look for data of the type ContentLabel. This means that it is possible a client will have two different descriptions of the same resource. However, they will each have a different provenance so that a client, such as a filter, will be able to give more weight to one description than another.

Search engine results

A very similar situation obtains in a search engine's results. The labelling of the results page cannot be pre-determined since, by its nature, it is compiled at runtime based on a user's input. Similarly, its URL is not a guide to what content is on the page.

Use case 3 covers this well since images and other page elements that appear irrespective of the search conducted will be labelled through the centralised system (their URIs are static). The HTML page itself will also be labelled through the centralised system but, if the search engine recognises that the label is inappropriate for the results returned, an extra link tag can be inserted in the HTML <head> that points to a more appropriate label. This will effectively override the ApplicationRules specified in the RDF instance.

Test data is available

2.4 Use case 4: Video on demand

The Dutch Example Video Company makes its movies available online. It wants to label its website in the usual way but also wants to provide separate labels for the individual movies.

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:i="http://www.icra.org/rdfs/vocabularyv03#">

  <rdf:Description rdf:about="">
    <dc:creator rdf:resource="http://www.icra.org" />
    <label:authorityFor>http://www.icra.org/rdfs/vocabularyv03#</label:authorityFor>
  </rdf:Description>

  <label:Ruleset>
    <label:hostRestriction>examplevideocompany.nl</label:hostRestriction>
    <label:hostRestriction>examplevideocompany.com</label:hostRestriction>
    <label:rules rdf:parseType="Collection">

      <label:Matches>
        <label:value>movie1</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_1"/>
            <label:frequentScenes rdf:resource="#label_2" />
          </label:PropertySet>
        </label:confers>
      </label:Matches>

      <label:Matches>
        <label:value>movie2</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_2" />
            <label:singleScene rdf:resource="#label_3" />
          </label:PropertySet>
        </label:confers>
      </label:Matches>

      <label:Matches>
        <label:value>.*</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_1"/>
          </label:PropertySet>
        </label:confers>
      </label:Matches>

    </label:rules>
  </label:Ruleset>

  <label:contentLabel rdf:ID="label_1">
    <rdfs:label>No nudity, violence etc.</rdfs:label>
    <i:nz>1</i:nz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
    <i:lz>1</i:lz>
    <i:oz>1</i:oz>
    <i:cz>1</i:cz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="label_2">
    <rdfs:label>Mild nudity</rdfs:label>
    <i:na>1</i:na>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
    <i:lz>1</i:lz>
    <i:oz>1</i:oz>
    <i:cz>1</i:cz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="label_3">
    <rdfs:label>Mild language</rdfs:label>
    <i:nz>1</i:nz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
    <i:lc>1</i:lc>
    <i:oz>1</i:oz>
    <i:cz>1</i:cz>
  </label:contentLabel>

</rdf:RDF>

Data for use case 4

The data includes a rule that allows a client to infer that everything on the Example Video Company's website should have label 1. However, specific information is also provided for two movies. The first movie has frequent scenes of what the content provider calls mild nudity (bare breasts in the ICRA vocabulary) but otherwise there is no sex, nudity, potentially offensive language, violence etc. in this movie.

Likewise, movie 2 has mild nudity throughout and a single scene in which mild expletives are used.

This use case also exemplifies the power of using rules rather than specifying the resource's URI in the usual way. The Example Video Company operates the same site on two top level domains (.nl and .com), both accessible with and without the www prefix. Furthermore, it offers its movies in a range of formats (Real, QuickTime etc.). Whichever domain is being used, whichever format the user chooses, assuming a consistent approach is used to name the files, the same label will be applied without duplicating the metadata.
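A small sketch can illustrate the point. The rules are transcribed by hand from the data for use case 4; the file names and paths are hypothetical, and the sketch assumes that a pattern may match anywhere in the URI, that rules are tried in document order, and that a host restriction also covers the www subdomain.

```python
import re
from urllib.parse import urlsplit

HOSTS = ["examplevideocompany.nl", "examplevideocompany.com"]

# Rules in document order: (pattern, properties conferred).
RULES = [
    (r"movie1", {"hasLabel": "#label_1", "frequentScenes": "#label_2"}),
    (r"movie2", {"hasLabel": "#label_2", "singleScene": "#label_3"}),
    (r".*", {"hasLabel": "#label_1"}),
]


def properties_for(uri):
    """Return the properties conferred on a URI, or None if the host is not covered."""
    host = (urlsplit(uri).hostname or "").lower()
    if not any(host == h or host.endswith("." + h) for h in HOSTS):
        return None
    for pattern, properties in RULES:
        # Assumption: the pattern may match anywhere in the URI.
        if re.search(pattern, uri):
            return properties
    return None


# Hypothetical file names: same movie, different domain and format.
variants = [
    "http://www.examplevideocompany.nl/films/movie1.rm",
    "http://examplevideocompany.com/films/movie1.mov",
]
results = [properties_for(u) for u in variants]
```

Both variants receive the same properties from a single rule. Under the match-anywhere assumption, the pattern movie1 would also match a file named movie10, so the naming scheme and the expressions need to be chosen together.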

Note: In this example, all descriptions are held in a single file; however, the nature of RDF is that the data can be divided between any number of files. For example, each movie could point to its own data file that then referenced "label_1," "label_2" etc. from a central file as required.

Test data is available

3 The schema

A schema to support this system was devised initially by Kal Ahmed and is described in the document linked below.

4 Current issues under discussion

4.1 OWL

Naturally, the group recognises the potential power of OWL to support content labelling. We have taken the approach that an application rule is a class with a series of properties that allow the correct label to be identified. An alternative approach is to treat a collection of resources as a class that has a series of properties that allow it to be matched with a label. The latter is more in line with OWL and offers advantages in some areas. However, we believe our approach offers distinct advantages, notably in not requiring clients to include both RDF and OWL processors.

A relatively trivial task to be undertaken by the group will be to provide a mapping from the Application Rule class-based approach to an OWL-based approach.

4.2 Frequency modifiers

RDF content labels are designed to support a method of assigning frequency modifiers. That is, a movie might be described by content label A for most of its duration but have "several scenes" described by content label B and "occasional scenes" described by content label C. The mechanism for this is currently under discussion.

4.3 Adding trust, web services etc.

The labelling structure suggested here provides a way for a content provider to label their own material using any vocabulary. Under the QUATRO project, mechanisms will be defined, or existing mechanisms highlighted, that will allow clients to gain further trust in those labels. This might be through third party label providers (analogous to a PICS Label Bureau), digital signatures, other database look up systems or content analysis methods.

4.4 Verbosity

The comment has been made that a content label written in RDF/XML is rather more verbose than one written in PICS. This is certainly true; however, the posting of labels in a discrete RDF instance that can cover any number of resources on any number of domains offers substantial optimisation benefits for content providers and client developers. The specific suggestion was made that an ICRA label could be written simply as, for example:

<i:cz />

As opposed to the suggested

<i:cz>1</i:cz>

This works for the ICRA system which is binary in nature. The descriptor "cz" (no user-generated content) is either true or not. However, other labelling/rating schemes will use numerical values other than 1 and 0. Therefore the generalised labelling schema must support this.
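The following sketch illustrates why element content is useful to a generic client. It reads each descriptor's value as an integer, which works equally for a binary scheme and for a graded one. The x: namespace and its violence descriptor are purely hypothetical, invented here to stand in for a non-binary scheme, and the snippet is read as plain XML rather than through an RDF parser.

```python
import xml.etree.ElementTree as ET

# The x: namespace is a made-up example of a graded (non-binary) scheme.
SNIPPET = """<label:contentLabel
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:i="http://www.icra.org/rdfs/vocabularyv03#"
  xmlns:x="http://example.org/hypothetical-scheme#">
  <i:cz>1</i:cz>
  <i:oz>0</i:oz>
  <x:violence>3</x:violence>
</label:contentLabel>"""


def descriptor_values(label_xml):
    """Map each descriptor's local name to its integer value."""
    values = {}
    for child in ET.fromstring(label_xml):
        local_name = child.tag.split("}", 1)[1]
        values[local_name] = int(child.text)
    return values


values = descriptor_values(SNIPPET)
```

With the empty-element form, a client could only record that a descriptor is present; it could not distinguish a value of 3 from a value of 1.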

5 References

[PICS]
Platform for Internet Content Selection (PICS) (See http://www.w3.org/PICS/)
[RDF]
Resource Description Framework (RDF) (See http://www.w3.org/RDF/)
[QUATRO]
Quatro project (See http://www.icra.org/projects/quatro/)
[ICRA]
ICRA (An independent non-profit organisation)