RDF Content Labels: Use Cases and Draft Schemas

Abstract

This document describes use cases for content labels and an RDF schema designed to:

Define a content label as an RDF class. This will be a description of a resource or, more usually, a collection of resources.
Define a rule set for identifying the correct content label for a given URI from within a set of content labels.

2 Use cases

Note: In the following use cases, the example given is labelling using the ICRA system. The authors wish to stress that the concept of an RDF content label has been designed to support any labelling scheme.

2.1 Use case 1: Distributed content production, centralised label control

Exemplary Portal Inc. has 40 production centres around the world. Each is responsible for a subdomain of example.org and is largely autonomous. Exemplary Portal operates two further domains, example.net and example-inc.net, for internal functions, but these domains also supply some content to the public-facing web properties. As well as content produced in-house, Exemplary Portal carries a great deal of third-party content.

Although production is spread around the world, corporate liability is concentrated in one department at head office.

Each production centre should arrange for content to carry an identical tag that points to the labelling information. The tag should be regarded as stable over the medium to long term. Labelling information should be under the direct and easy control of the corporate liability department. It is posted online at www.example.org/labels.rdf.

The link tag should therefore be:

<link rel="meta" href="http://www.example.org/labels.rdf" type="application/rdf+xml" />

This can also be expressed as an HTTP Response Header:

Link: <http://www.example.org/labels.rdf>; /="/"; rel="meta" type="application/rdf+xml";

This method allows Exemplary Portal to make inclusion of the same link a feature of its standard server configuration.

A client following that link should receive back an RDF graph that is about, or can be interpreted as being about, the resource carrying the link. For example:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:i="http://www.icra.org/rdfs/vocabularyv03#">

  <label:Ruleset>
    <label:hostRestriction>example.org</label:hostRestriction>
    <label:hostRestriction>example.net</label:hostRestriction>
    <label:hostRestriction>example-inc.net</label:hostRestriction>
    <label:rules rdf:parseType="Collection">

      <label:oneOf>
        <rdf:li>
          <label:Matches>
            <label:value>.*ads.*</label:value>
          </label:Matches>
        </rdf:li>
        <rdf:li>
          <label:Matches>
            <label:value>.*banners.*</label:value>
          </label:Matches>
        </rdf:li>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#advert"/>
          </label:PropertySet>
        </label:confers>
      </label:oneOf>

      <label:Matches>
        <label:value>.*</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#defaultContentPage"/>
          </label:PropertySet>
        </label:confers>
      </label:Matches>

    </label:rules>
  </label:Ruleset>

  <label:contentLabel rdf:ID="defaultContentPage">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="advert">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>0</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <rdf:Description rdf:about="">
    <dc:creator rdf:resource="http://www.icra.org" />
    <label:authorityFor>http://www.icra.org/rdfs/vocabularyv03#</label:authorityFor>
  </rdf:Description>

</rdf:RDF>

Data for use case 1

As well as data about the labelled resource, the RDF instance carries a limited amount of data about itself. That is:

The creator of the RDF instance
The namespace of the labelling scheme about which further information is available from the creator.

This small addition to the file would enable the labelling organisation to make assertions via other mechanisms about the veracity of the labels.
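As a minimal sketch of how a client might read this self-description, the following Python uses the standard library's ElementTree. The function name self_description and the trimmed SAMPLE document are illustrative assumptions; only the element and attribute names come from the example above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
LABEL = "http://www.w3.org/2004/12/q/contentlabel#"

# Trimmed copy of the self-describing block from the example instance.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:label="{LABEL}">
  <rdf:Description rdf:about="">
    <dc:creator rdf:resource="http://www.icra.org" />
    <label:authorityFor>http://www.icra.org/rdfs/vocabularyv03#</label:authorityFor>
  </rdf:Description>
</rdf:RDF>"""


def self_description(xml_text):
    """Return the creator and vocabulary namespace the instance states about itself."""
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF}}}Description"):
        # rdf:about="" means the description is about the RDF instance itself.
        if desc.get(f"{{{RDF}}}about") != "":
            continue
        creator = desc.find(f"{{{DC}}}creator")
        authority = desc.find(f"{{{LABEL}}}authorityFor")
        return {
            "creator": creator.get(f"{{{RDF}}}resource") if creator is not None else None,
            "authorityFor": authority.text.strip()
            if authority is not None and authority.text
            else None,
        }
    return None


INFO = self_description(SAMPLE)
```

A client could compare the creator value with a list of labelling organisations it already trusts before giving weight to the labels; how that trust is established is outside this sketch.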

The rules determine which of the two available labels applies to a resource on example.org according to its URI. Test data is available.
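The rule processing for use case 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the schema's defined semantics: it assumes a host restriction also covers subdomains, that rules are tried in document order with the first match winning, and that each pattern is a regular expression searched for anywhere in the URI. The names host_allowed and label_for are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Host restrictions and rules transcribed from the example instance.
HOST_RESTRICTIONS = ["example.org", "example.net", "example-inc.net"]

# Each rule is (alternative patterns, label fragment); order matters.
RULES = [
    ([r".*ads.*", r".*banners.*"], "advert"),
    ([r".*"], "defaultContentPage"),
]


def host_allowed(host, restrictions=HOST_RESTRICTIONS):
    """Assumption: a restriction matches the host itself and any subdomain."""
    host = host.lower()
    return any(host == r or host.endswith("." + r) for r in restrictions)


def label_for(uri):
    """Return the label fragment for a URI, or None if the host is out of scope."""
    host = urlsplit(uri).hostname or ""
    if not host_allowed(host):
        return None
    for patterns, label in RULES:
        # Assumption: the first rule with any matching pattern wins.
        if any(re.search(p, uri) for p in patterns):
            return label
    return None
```

Under these assumptions, label_for("http://news.example.org/ads/top.gif") yields "advert", an ordinary page on any example.org subdomain yields "defaultContentPage", and a URI on an unlisted host yields no label at all.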

2.2 Use case 2: Distributed content production, distributed label control

The Content Management Company is a major portal for a single country, Germany. In accordance with German practice, it categorises its content into five age brackets: 0-6, 6-12, 12-16, 16-18 and 18+. The Content Management Company defines five labels and puts them in an RDF instance.

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:i="http://www.icra.org/rdfs/vocabularyv03#">

  <rdf:Description rdf:about="">
    <dc:creator rdf:resource="http://www.icra.org" />
    <label:authorityFor>http://www.icra.org/rdfs/vocabularyv03#</label:authorityFor>
  </rdf:Description>

  <label:Ruleset>
    <label:hostRestriction rdf:resource="#hosts" />
  </label:Ruleset>

  <label:Ruleset rdf:ID="hosts">
    <label:hostRestriction>example.de</label:hostRestriction>
  </label:Ruleset>

  <label:contentLabel rdf:ID="defaultContentPage">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="a6">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nb>1</i:nb>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="a12">
    <i:cb>1</i:cb>
    <i:lc>1</i:lc>
    <i:na>1</i:na>
    <i:nb>1</i:nb>
    <i:oz>1</i:oz>
    <i:sa>1</i:sa>
    <i:vb>1</i:vb>
    <i:vc>1</i:vc>
    <i:vd>1</i:vd>
    <i:vg>1</i:vg>
  </label:contentLabel>

  <label:contentLabel rdf:ID="a16">
    <i:cb>1</i:cb>
    <i:lb>1</i:lb>
    <i:lc>1</i:lc>
    <i:na>1</i:na>
    <i:nb>1</i:nb>
    <i:nc>1</i:nc>
    <i:oa>1</i:oa>
    <i:ob>1</i:ob>
    <i:oc>1</i:oc>
    <i:sa>1</i:sa>
    <i:sb>1</i:sb>
    <i:vb>1</i:vb>
    <i:vc>1</i:vc>
    <i:vd>1</i:vd>
    <i:vg>1</i:vg>
    <label:hasModifier><i:xb /></label:hasModifier>
  </label:contentLabel>

  <label:contentLabel rdf:ID="a18">
    <i:ca>1</i:ca>
    <i:la>1</i:la>
    <i:lb>1</i:lb>
    <i:lc>1</i:lc>
    <i:na>1</i:na>
    <i:nb>1</i:nb>
    <i:nc>1</i:nc>
    <i:oa>1</i:oa>
    <i:ob>1</i:ob>
    <i:oc>1</i:oc>
    <i:sa>1</i:sa>
    <i:sb>1</i:sb>
    <i:sc>1</i:sc>
    <i:sd>1</i:sd>
    <i:sf>1</i:sf>
    <i:vb>1</i:vb>
    <i:vc>1</i:vc>
    <i:vd>1</i:vd>
    <i:ve>1</i:ve>
    <i:vf>1</i:vf>
    <i:vg>1</i:vg>
    <i:vh>1</i:vh>
    <i:vi>1</i:vi>
    <i:vj>1</i:vj>
  </label:contentLabel>

</rdf:RDF>

Data for use case 2

The CMS operated by the Content Management Company uses metadata associated with elements present on each page to assign a label for the generated HTML. This is achieved by inserting one of 5 possible link tags:

<link rel="meta" href="http://www.example.de/labels.rdf#a18" type="application/rdf+xml" />

Although there are no rules for determining which label applies to which URI, there is a host restriction in place so that it is clear that the labels may only be applied to example.de. Furthermore, this example shows the host restriction list as a separate block within the RDF instance. By extension, the list of hosts may be held in a separate file. This allows the main RDF instance to be signed or otherwise "approved of" by the labelling organisation while allowing the content provider to add new hosts.
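A client-side reading of this arrangement might look like the Python sketch below. It finds the link tag in a page, takes the fragment as the label identifier, and only accepts it when the page's host falls under the host restriction. The class MetaLinkFinder, the function label_for_page and the sample page are hypothetical, and treating a host restriction as covering subdomains is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlsplit


class MetaLinkFinder(HTMLParser):
    """Collect href values of <link rel="meta" type="application/rdf+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            tag == "link"
            and attributes.get("rel") == "meta"
            and attributes.get("type") == "application/rdf+xml"
            and attributes.get("href")
        ):
            self.hrefs.append(attributes["href"])


def label_for_page(page_uri, html, allowed_hosts):
    """Return the label fragment named by the page's link tag, or None."""
    host = (urlsplit(page_uri).hostname or "").lower()
    # Assumption: a host restriction covers the host and its subdomains.
    if not any(host == h or host.endswith("." + h) for h in allowed_hosts):
        return None
    finder = MetaLinkFinder()
    finder.feed(html)
    for href in finder.hrefs:
        _, fragment = urldefrag(href)
        if fragment:
            return fragment
    return None


PAGE = (
    "<html><head>"
    '<link rel="meta" href="http://www.example.de/labels.rdf#a18" '
    'type="application/rdf+xml" />'
    "</head><body></body></html>"
)
```

Because the host list is passed in as a parameter, the same function works whether the list comes from the main RDF instance or from a separate file maintained by the content provider.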

Test data is available

2.3 Use case 3: Centralised generic labels with local override

This is a combination of use cases 1 and 2. The staff at one of Exemplary Portal's 40 production centres feel that a particular page they have created should not be associated with the "defaultContentPage" label but should use the "advert" label instead. This is achieved easily by changing the link tag as follows:

<link rel="meta" href="http://www.example.org/labels.rdf#advert" type="application/rdf+xml" />

The same RDF instance is referenced as for other pages on the example.org site. However, the tag points to a specific fragment of the instance and the filter should respect this rather than processing the application rules.
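The precedence described here can be sketched in Python: a fragment in the link takes priority, and the rules are consulted only when no fragment is given. The function names are hypothetical, the rule list is a simplified transcription of the use case 1 rules, and first-match-wins ordering is an assumption.

```python
import re
from urllib.parse import urldefrag

# Simplified transcription of the use case 1 rules: (pattern, label fragment).
RULES = [
    (r".*ads.*|.*banners.*", "advert"),
    (r".*", "defaultContentPage"),
]


def label_by_rules(page_uri):
    """Apply the rules in order; the first matching pattern wins (assumption)."""
    for pattern, label in RULES:
        if re.search(pattern, page_uri):
            return label
    return None


def choose_label(page_uri, link_href):
    """Prefer the fragment named in the link; otherwise fall back to the rules."""
    _, fragment = urldefrag(link_href)
    if fragment:
        return fragment
    return label_by_rules(page_uri)
```

For a page such as http://fr.example.org/promo.html, a link to labels.rdf#advert yields "advert", while a link to labels.rdf with no fragment falls through to the rules and yields "defaultContentPage".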

NB. It is quite possible that a user's computer will already hold the RDF instance in cache and, in this use case, probably allow the page based on the information it already has. Filters should, however, always follow links to RDF instances to look for data of the type ContentLabel. This means that it is possible a client will have two different descriptions of the same resource. However, they will each have a different provenance so that a client, such as a filter, will be able to give more weight to one description than another.

Search engine results

A very similar situation obtains in a search engine's results. The labelling of the results page cannot be pre-determined since, by its nature, it is compiled at runtime based on a user's input. Similarly, its URL is not a guide to what content is on the page.

Use case 3 covers this well since images and other page elements that appear irrespective of the search conducted will be labelled through the centralised system (their URIs are static). The HTML page itself will also be labelled through the centralised system, but if the search engine recognises that the label is inappropriate for the results returned, an extra link tag can be inserted in the HTML <head> that points to a more appropriate label. This will effectively override the ApplicationRules specified in the RDF instance.

Test data is available

2.4 Use case 4: Video on demand

The Dutch Example Video Company makes its movies available online. It wants to label its website in the usual way but also wants to provide separate labels for the individual movies.

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:i="http://www.icra.org/rdfs/vocabularyv03#">

  <rdf:Description rdf:about="">
    <dc:creator rdf:resource="http://www.icra.org" />
    <label:authorityFor>http://www.icra.org/rdfs/vocabularyv03#</label:authorityFor>
  </rdf:Description>

  <label:Ruleset>
    <label:hostRestriction>examplevideocompany.nl</label:hostRestriction>
    <label:hostRestriction>examplevideocompany.com</label:hostRestriction>
    <label:rules rdf:parseType="Collection">

      <label:Matches>
        <label:value>movie1</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_1"/>
            <label:frequentScenes rdf:resource="#label_2" />
          </label:PropertySet>
        </label:confers>
      </label:Matches>

      <label:Matches>
        <label:value>movie2</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_2" />
            <label:singleScene rdf:resource="#label_3" />
          </label:PropertySet>
        </label:confers>
      </label:Matches>

      <label:Matches>
        <label:value>.*</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_1"/>
          </label:PropertySet>
        </label:confers>
      </label:Matches>

    </label:rules>
  </label:Ruleset>

  <label:contentLabel rdf:ID="label_1">
    <rdfs:label>No nudity, violence etc.</rdfs:label>
    <i:nz>1</i:nz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
    <i:lz>1</i:lz>
    <i:oz>1</i:oz>
    <i:cz>1</i:cz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="label_2">
    <rdfs:label>Mild nudity</rdfs:label>
    <i:na>1</i:na>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
    <i:lz>1</i:lz>
    <i:oz>1</i:oz>
    <i:cz>1</i:cz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="label_3">
    <rdfs:label>Mild language</rdfs:label>
    <i:nz>1</i:nz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
    <i:lc>1</i:lc>
    <i:oz>1</i:oz>
    <i:cz>1</i:cz>
  </label:contentLabel>

</rdf:RDF>

Data for use case 4

The data includes a rule that allows a client to infer that everything on the Example Video Company's website should have label 1. However, specific information is also provided for two movies. The first movie has frequent scenes of what the content provider calls mild nudity (bare breasts in the ICRA vocabulary) but otherwise there is no sex, nudity, potentially offensive language, violence etc. in this movie.

Likewise, movie 2 has mild nudity throughout and a single scene in which mild expletives are used.

This use case also exemplifies the power of using rules rather than specifying the resource's URI in the usual way. The Example Video Company operates the same site on two top level domains (.nl and .com), both accessible with and without the www prefix. Furthermore, it offers its movies in a range of formats (Real, QuickTime etc.). Whichever domain is being used, and whichever format the user chooses, the same label will be applied without duplicating the metadata, provided a consistent approach is used to name the files.
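The following Python sketch illustrates the point. The rules are transcribed from the instance above as ordered pairs of pattern and conferred property set; the file paths in URLS are invented for illustration, the host restriction check is omitted for brevity, and first-match-wins ordering with a regular expression search is an assumption.

```python
import re

# Ordered (pattern, conferred property set) pairs from the example instance.
RULES = [
    ("movie1", {"hasLabel": "label_1", "frequentScenes": "label_2"}),
    ("movie2", {"hasLabel": "label_2", "singleScene": "label_3"}),
    (".*", {"hasLabel": "label_1"}),
]


def properties_for(uri):
    """Return the property set conferred by the first rule whose pattern matches."""
    for pattern, props in RULES:
        if re.search(pattern, uri):
            return props
    return None


# Hypothetical URLs: same movie, different domain, prefix and format.
URLS = [
    "http://www.examplevideocompany.nl/films/movie2.rm",
    "http://examplevideocompany.com/films/movie2.mov",
]
RESULTS = [properties_for(u) for u in URLS]
```

Both URLs receive the same property set because the pattern only looks for the consistent file name, not for the domain or the file extension.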

Note: In this example, all descriptions are held in a single file; however, the nature of RDF is that the data can be divided between any number of files. For example, each movie could point to its own data file that then referenced "label_1", "label_2" etc. from a central file as required.

Test data is available

3 The schema

A schema to support this system was devised initially by Kal Ahmed and is described in the document linked below.

3.1 A schema for RDF Content Labels

This document describes the RDF schema proposed for PICS-like labelling schemes.

4 Current issues under discussion

4.1 OWL

Naturally, the group recognises the potential power of OWL to support content labelling. We have taken the approach that an application rule is a class with a series of properties that allow the correct label to be identified. An alternative approach is to treat a collection of resources as a class that has a series of properties that allow it to be matched with a label. The latter is more in line with OWL and offers advantages in some areas. However, we believe our approach offers distinct advantages, notably in not requiring clients to include both RDF and OWL processors.

A relatively trivial task to be undertaken by the group will be to provide a mapping from the Application Rule class-based approach to an OWL-based approach.

4.2 Frequency modifiers

RDF content labels are designed to support a method of assigning frequency modifiers. That is, a movie might be described by content label A for most of its duration but have "several scenes" described by content label B and "occasional scenes" described by content label C. The mechanism for this is currently under discussion.

4.3 Adding trust, web services etc.

The labelling structure suggested here provides a way for a content provider to label their own material using any vocabulary. Under the QUATRO project, mechanisms will be defined, or existing mechanisms highlighted, that will allow clients to gain further trust in those labels. This might be through third party label providers (analogous to a PICS Label Bureau), digital signatures, other database look up systems or content analysis methods.

4.4 Verbosity

The comment has been made that a content label written in RDF/XML is rather more verbose than one written in PICS. This is certainly true; however, the posting of labels in a discrete RDF instance that can cover any number of resources on any number of domains offers substantial optimisation benefits for content providers and client developers. The specific suggestion was made that an ICRA label could be written simply as, for example:

<i:cz />

As opposed to the suggested

<i:cz>1</i:cz>

This works for the ICRA system, which is binary in nature. The descriptor "cz" (no user-generated content) is either true or not. However, other labelling/rating schemes will use numerical values other than 1 and 0. Therefore the generalised labelling schema must support this.
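The difference between the two forms can be seen from a client's point of view in the Python sketch below, which reads descriptor values from a label with the standard library's ElementTree. Treating an empty element as the value 1 is an assumption made purely to illustrate the shorthand; the function name descriptors and the SAMPLE label are hypothetical.

```python
import xml.etree.ElementTree as ET

ICRA = "http://www.icra.org/rdfs/vocabularyv03#"
LABEL = "http://www.w3.org/2004/12/q/contentlabel#"

# Hypothetical label mixing the explicit form with the proposed shorthand.
SAMPLE = f"""<label:contentLabel xmlns:label="{LABEL}" xmlns:i="{ICRA}">
  <i:cz>1</i:cz>
  <i:oz>0</i:oz>
  <i:nz />
</label:contentLabel>"""


def descriptors(xml_text):
    """Map each ICRA descriptor name in the label to an integer value."""
    root = ET.fromstring(xml_text)
    values = {}
    for child in root:
        # ElementTree tags look like "{namespace}localname".
        namespace, _, name = child.tag[1:].partition("}")
        if namespace != ICRA:
            continue
        text = (child.text or "").strip()
        # Assumption: an empty element is shorthand for the value 1.
        values[name] = int(text) if text else 1
    return values


VALUES = descriptors(SAMPLE)
```

The explicit form needs no special case: a scheme that rates on a scale simply supplies a different integer, whereas the shorthand can only ever signal presence.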

5 References

[PICS]: Platform for Internet Content Selection (PICS) (See http://www.w3.org/PICS/)
[RDF]: Resource Description Framework (RDF) (See http://www.w3.org/RDF/)
[QUATRO]: Quatro project (See http://www.icra.org/projects/quatro/)
[ICRA]: ICRA (An independent non-profit organisation)
