Overview of the RDF Vocabularies for Labelling


Table of Contents

Introduction
  How To Read This Document
The Content Labelling Vocabulary
  Vocabulary Description
Creating a Labelling Scheme
  Identifying, Naming and Describing Scheme Components
  Defining A Labelling Scheme
Using the ICRA labelling vocabulary.
  Content Label Basics
  Specifying the application of a label to resources

Introduction

This document provides an overview of two RDF vocabularies that together enable the use of RDF for content labelling with the ICRA scheme: the generic Content Labelling Vocabulary, which provides a mechanism for describing a content labelling scheme, and the ICRA content labelling vocabulary, which describes the ICRA content labelling scheme.

How To Read This Document

The next two sections of this document, “The Content Labelling Vocabulary” and “Creating a Labelling Scheme”, describe how the generic vocabulary for defining labelling schemes is constructed and how to apply it to define a new labelling scheme. If you are only interested in applying the ICRA labelling scheme in RDF, you can skip these sections and go straight to the section called “Using the ICRA labelling vocabulary.”

The Content Labelling Vocabulary

The Content Labelling Vocabulary (namespace http://www.w3.org/2004/12/q/contentlabel#) provides a simple vocabulary for describing a labelling scheme. A labelling scheme consists of one or more categories, which group together related content descriptors, and zero or more modifiers, which provide further context for a label. Together, these are referred to in this document as the components of a labelling scheme.

In the ICRA vocabulary, for example, "Violence" is a category, "Deliberate injury to human beings" is a descriptor, and "Material appears in a sports context" is a modifier.

The Content Labelling Vocabulary defines a small set of classes and properties that are the basis for defining labelling schemes. A labelling scheme such as the ICRA scheme is created by defining instances of these classes and using the properties to define the relationships between those instances.
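
The examples in this document use the prefixes rdf, rdfs, label and (for the ICRA vocabulary) i without showing where they are declared. A minimal sketch of an enclosing RDF/XML document that declares them follows. The label and i namespace URIs are the ones given in this document, and the rdf and rdfs URIs are the standard RDF namespaces; placing all scheme components or labels under a single rdf:RDF root element is an assumption about file layout, not something this vocabulary requires.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
    xmlns:i="http://www.icra.org/rdfs/vocabularyv03#">

  <!-- Category, descriptor and Modifier definitions (in a scheme file)
       or ContentLabel instances (in a label file) go here -->

</rdf:RDF>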

Vocabulary Description

ContentLabel

Full URI. http://www.w3.org/2004/12/q/contentlabel#ContentLabel

Description.  An instance of this class is a single descriptive label for content which may be applied to one or more web resources.

Properties. The following properties may be specified for a ContentLabel instance:

  • hasModifier specifies the modifiers for the content label.

  • Any subproperty of the descriptor property.

Category

Full URI. http://www.w3.org/2004/12/q/contentlabel#Category

Description.  A category is a grouping of related content descriptors. In the ICRA system, these groupings are thematic, but this is not a constraint on category instances in general.

Properties. 

  • hasDescriptor specifies the descriptors which make up this category.

descriptor

Full URI. http://www.w3.org/2004/12/q/contentlabel#descriptor

Description.  A descriptor defines a single form of content which may or may not be present in a resource. When labelling web resources, a descriptor is used as a property of the content label that it applies to, so each use of a descriptor carries a value and the descriptor has a range of allowed values. The Content Labelling Vocabulary does not restrict that range; each labelling scheme may define its own.
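
As an illustration of how a scheme might constrain that range, it could declare an rdfs:range on a scheme-wide descriptor property. The sketch below is hypothetical: the property name exampleDescriptor is invented for this illustration, and the XML Schema boolean datatype mirrors the values the ICRA scheme allows (described under “Content Label Basics”) rather than any declaration known to be in the ICRA vocabulary file. Under RDFS semantics, a range declared on a property also applies to the values of its sub-properties.

<rdf:Property rdf:ID="exampleDescriptor">
  <rdfs:label xml:lang="en">Descriptor of a hypothetical scheme</rdfs:label>
  <!-- link the scheme's descriptors to the generic descriptor property -->
  <rdfs:subPropertyOf rdf:resource="http://www.w3.org/2004/12/q/contentlabel#descriptor"/>
  <!-- restrict descriptor values to xsd:boolean ('0', '1', 'false', 'true') -->
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#boolean"/>
</rdf:Property>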

hasDescriptor

Full URI. http://www.w3.org/2004/12/q/contentlabel#hasDescriptor

Description.  This property connects a category to the descriptors that make up that category. Applications can use it to list all the possible descriptors for a category.

Modifier

Full URI. http://www.w3.org/2004/12/q/contentlabel#Modifier

Description.  A modifier provides context for a content label as a whole. Each content labelling scheme may define its own set of modifiers.

hasModifier

Full URI. http://www.w3.org/2004/12/q/contentlabel#hasModifier

Description.  This property connects a ContentLabel to an instance of the Modifier class that modifies it: the ContentLabel is the subject of the property and the Modifier is its value.

hasContentLabel

Full URI. http://www.w3.org/2004/12/q/contentlabel#hasContentLabel

Description.  This property links a resource to the ContentLabel that labels that resource.
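
No example of hasContentLabel appears elsewhere in this document, so a minimal sketch is given here. It assumes a hypothetical page at http://www.example.com/index.html and a ContentLabel with rdf:ID "siteLabel" defined in the same RDF/XML file, like the one in Example 6, with the label prefix bound to the Content Labelling Vocabulary namespace.

<rdf:Description rdf:about="http://www.example.com/index.html">
  <!-- this page is labelled by the ContentLabel whose rdf:ID is siteLabel -->
  <label:hasContentLabel rdf:resource="#siteLabel"/>
</rdf:Description>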

Creating a Labelling Scheme

This section describes how to apply the Content Labelling Vocabulary to create a specific labelling scheme.

Identifying, Naming and Describing Scheme Components

The Content Labelling Vocabulary makes use of basic RDF functionality for identifying, naming and describing the components that make up a labelling scheme.

Each component of the scheme is assigned an ID. This ID, combined with the base URI of the RDF resource that describes the scheme, gives a unique URI for the component. For example, the ICRA labelling scheme is defined by the resource with the base URI http://www.icra.org/rdfs/vocabularyv03#, and the descriptor for "Unmoderated user-generated content" is currently defined in that resource with the ID "cb", so the full identifier for the descriptor is http://www.icra.org/rdfs/vocabularyv03#cb.
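
In RDF/XML, the base URI against which rdf:ID values are resolved is normally the location of the file, but it can be set explicitly with xml:base. The sketch below assumes that the vocabulary file sets xml:base on its root element (whether the published ICRA file does so is not stated here). Each rdf:ID value is appended to that base as a fragment identifier, so rdf:ID="cb" identifies http://www.icra.org/rdfs/vocabularyv03#cb; the remainder of the descriptor definition is elided.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://www.icra.org/rdfs/vocabularyv03">
  <!-- rdf:ID="cb" resolves to http://www.icra.org/rdfs/vocabularyv03#cb -->
  <rdf:Property rdf:ID="cb">
    <rdfs:label xml:lang="en">Unmoderated user-generated content</rdfs:label>
    ...
  </rdf:Property>
</rdf:RDF>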

Each component should always be assigned a short name. This should be a name suitable for display in a user interface and should be consumer-oriented in nature. A good short name would be "Violence" or "Injury to animals"; a bad short name would be "vb" or "vz". RDF provides the rdfs:label property for such short names. A component can have any number of rdfs:label values, although it is STRONGLY recommended that they be distinguished from each other using an xml:lang attribute and that there be only one label per language.

Example 1. An example of a short name

<label:Category rdf:ID="nx">
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  ...
</label:Category>
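
To follow the one-label-per-language recommendation, a component that needs names in several languages carries one rdfs:label per language. The French and German strings below are illustrative translations written for this sketch, not labels taken from the ICRA vocabulary.

<label:Category rdf:ID="nx">
  <!-- one rdfs:label per language, distinguished by xml:lang -->
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  <rdfs:label xml:lang="fr">Nudité</rdfs:label>
  <rdfs:label xml:lang="de">Nacktheit</rdfs:label>
  ...
</label:Category>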

A component may also be assigned a longer description that might be displayed to a user as pop-up help text. For this description, use the RDF-defined rdfs:comment property. As with labels, multiple rdfs:comment values may be provided, but they should be distinguished by language using the xml:lang attribute.

Example 2. An example of a short description

<label:Category rdf:ID="nx">
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  <rdfs:comment xml:lang="en">
    Exposed breasts, Bare buttocks, Visible genitals
  </rdfs:comment>
</label:Category>

Finally, a component may also contain a link to another web resource that provides a much more detailed description. For this link, use the RDF-defined rdfs:seeAlso property. The value of this property MUST be an RDF resource URI.

Example 3. An example of a reference to a longer description

<label:Category rdf:ID="nx">
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  <rdfs:comment xml:lang="en">
    Erections or female genitals in detail, Male genitals, Female genitals, 
    Female breasts, Bare buttocks
  </rdfs:comment>
  <rdfs:seeAlso rdf:resource="http://www.icra.org/vocabulary/#hn"/>
</label:Category>

Defining A Labelling Scheme

  1. Define Categories

    Each category in a labelling scheme has the identifier, name and descriptions described above, and a list of the descriptors that are part of that category. The descriptors are linked to the category using the label:hasDescriptor property. Because the list of descriptors should be closed (that is, no more descriptors can be added to it without modifying the vocabulary file), the hasDescriptor property value is specified as an RDF collection using rdf:parseType="Collection".

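    Each descriptor must be defined as a sub-property (rdfs:subPropertyOf) of the descriptor property from the Content Labelling Vocabulary, either directly or through an intermediate scheme-specific property such as icraDescriptor in Example 4.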

    Example 4. Example of a Category Definition

    <!-- Nudity category -->
    <label:Category rdf:ID="nx">
      <rdfs:label xml:lang="en">Nudity</rdfs:label>
      <rdfs:comment xml:lang="en">Exposed breasts, bare buttocks, visible genitals</rdfs:comment>
      <rdfs:seeAlso rdf:resource="http://www.icra.org/vocabulary/#hn"/>
      <label:hasDescriptor rdf:parseType="Collection">
        <rdf:Property rdf:ID="na">
          <rdfs:label xml:lang="en">Exposed breasts</rdfs:label>
          <rdfs:subPropertyOf rdf:resource="#icraDescriptor"/>
        </rdf:Property>
        <rdf:Property rdf:ID="nb">
          <rdfs:label xml:lang="en">Bare buttocks</rdfs:label>
          <rdfs:subPropertyOf rdf:resource="#icraDescriptor"/>
        </rdf:Property>
        <rdf:Property rdf:ID="nc">
          <rdfs:label xml:lang="en">Visible genitals</rdfs:label>
          <rdfs:subPropertyOf rdf:resource="#icraDescriptor"/>
        </rdf:Property>
        <rdf:Property rdf:ID="nz">
          <rdfs:label xml:lang="en">No nudity</rdfs:label>
          <rdfs:subPropertyOf rdf:resource="#icraDescriptor"/>
        </rdf:Property>
      </label:hasDescriptor>
    </label:Category>
    
  2. Define Modifiers

    Each modifier is simply defined as an instance of the label:Modifier class. Modifiers should be defined with names and descriptions as described above, but there is no need to define any other properties for a modifier.

    Example 5. Example of a Modifier definition

      <label:Modifier rdf:ID="xa">
        <rdfs:label xml:lang="en">This material appears in an artistic context</rdfs:label>
        <rdfs:seeAlso rdf:resource="http://www.icra.org/vocabulary/#hx"/>
      </label:Modifier>
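
Example 4 declares each descriptor as a sub-property of #icraDescriptor, but the definition of icraDescriptor itself is not shown in this document. A plausible sketch follows, assuming icraDescriptor is an intermediate property defined in the ICRA vocabulary file as a sub-property of label:descriptor; its rdfs:label text is invented for this illustration. Because rdfs:subPropertyOf is transitive, na, nb, nc and nz are then also sub-properties of label:descriptor, as step 1 requires, and an application can find every descriptor of the scheme by looking for sub-properties of icraDescriptor.

<rdf:Property rdf:ID="icraDescriptor">
  <rdfs:label xml:lang="en">ICRA descriptor</rdfs:label>
  <!-- specialises the generic descriptor property; descriptors such as na, nb,
       nc and nz that are sub-properties of this one inherit the relationship -->
  <rdfs:subPropertyOf rdf:resource="http://www.w3.org/2004/12/q/contentlabel#descriptor"/>
</rdf:Property>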
    

Using the ICRA labelling vocabulary.

This section covers the creation of content labels using the ICRA-defined labelling scheme.

Content Label Basics

A content label consists of two principal components:

  • a list of descriptor properties, and

  • a list of modifiers.

Under the ICRA labelling scheme, each descriptor property MUST have a value that is a valid boolean as defined by W3C XML Schema Part 2: Datatypes (this allows the values '0', '1', 'false' and 'true') and there SHOULD be at least one descriptor from each ICRA-defined category. Descriptors are listed as RDF properties of the ContentLabel resource.

Modifiers are simply present or not present in a content label and no value is associated with them. If a modifier is present in a label, then the modifier applies. Modifiers are added to a content label using the hasModifier property.

Example 6. A simple label with descriptors and modifiers

  <label:ContentLabel rdf:ID="siteLabel">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>0</i:vz>
    <label:hasModifier><i:xb/></label:hasModifier>
  </label:ContentLabel>

The label:ContentLabel element tells the processor that this is an RDF resource of type ContentLabel (from the namespace http://www.w3.org/2004/12/q/contentlabel#). The rdf:ID attribute on the label:ContentLabel element gives the label an identifier so that it can be selected from an XML file containing multiple labels; its value must be unique within the XML file.

The elements i:cz to i:vz specify the values of descriptors defined by the ICRA scheme. The prefix i should be bound to the namespace URI http://www.icra.org/rdfs/vocabularyv03#, so these elements represent the descriptors http://www.icra.org/rdfs/vocabularyv03#cz (chat), http://www.icra.org/rdfs/vocabularyv03#lz (language), and so on.

The label:hasModifier element contains a single modifier represented by the i:xb tag. This indicates that the modifier identified by the URI http://www.icra.org/rdfs/vocabularyv03#xb (educational context) applies to this label.
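
Because the ICRA scheme also accepts 'true' and 'false', and a label may contain several modifiers, the label in Example 6 could also be written as below. This sketch assumes the i prefix is bound as described above and that the artistic-context modifier from Example 5 is defined in the ICRA vocabulary with the ID xa; the explicit rdf:datatype annotations and the rdf:ID value siteLabelAlt are illustrative choices, and this document does not say whether ICRA processors require typed literals. Each modifier is referenced by its full URI with rdf:resource, and hasModifier is repeated once per modifier.

<label:ContentLabel rdf:ID="siteLabelAlt">
  <!-- descriptor values as typed xsd:boolean literals -->
  <i:cz rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</i:cz>
  <i:lz rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</i:lz>
  <i:nz rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</i:nz>
  <i:oz rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</i:oz>
  <i:sz rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</i:sz>
  <i:vz rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</i:vz>
  <!-- two modifiers: educational context (xb) and artistic context (xa) -->
  <label:hasModifier rdf:resource="http://www.icra.org/rdfs/vocabularyv03#xb"/>
  <label:hasModifier rdf:resource="http://www.icra.org/rdfs/vocabularyv03#xa"/>
</label:ContentLabel>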

Specifying the application of a label to resources

There are a number of ways in which a label may be applied to a resource. In many cases it is simplest to include a reference to the label either within the resource itself or within the HTTP response header that the server generates when it provides the resource. In some cases, however, it is necessary or more efficient to define a catalog of labels together with the rules that a client should use to apply those labels to resources. This can be achieved using the RDF Rules vocabulary.