W3C Content Label Incubator Group Charter

Sponsoring Members
Objectives and Scope
Deliverables
Duration
Contact
Operations and Procedures
Patent Policy
Meetings
Additional Information
- Use cases

In its earliest days, W3C recognized a need to be able to describe content according to a defined vocabulary. This could be done for a variety of reasons including, but not limited to, child protection. The result was the PICS system which, despite early promise, has achieved limited support.

The underlying need for such a system is, however, undiminished. To child protection systems like that operated by ICRA, we can now add trustmarks, such as Segala's accessibility certificate, and the proposed mobileOK system being devised by the Mobile Web Initiative [MWI], all of which have the potential to make a significant contribution to content discovery and search. As the use cases provided under Additional Information below show, the scope can be broader still, encompassing digital rights management and metadata that allows a user's choice of resource to lead reliably to related content.

In essence what's required is a way of making any number of assertions about a resource or group of resources. In order to be trustworthy, the label containing those assertions should be testable in some way through automated means.

These issues have been addressed in a variety of ways, notably in the W3C's Semantic Web Activity. The development of RDF picked up where PICS left off (indeed, PICS-NG was an early moniker for RDF) and the potential for content labels, especially where the assertions they carry can be cross-checked by database look-up and other validation processes, now exceeds the early concept embodied by PICS. However, the need to make assertions about a group of resources means that RDF is not an "out of the box" solution (see Additional Information).

Objectives

The first objective must be to review the use cases and establish a set of requirements. Some relevant questions for this stage might be:

Are the use cases clear enough?
Should further use cases be added?
Can a scalable method of creating metadata for the Web be trustworthy?
What role can digital signatures play?
What other techniques can be drawn upon?

Secondly, in the light of those requirements, the RDF Content Labels schema [RDF-CL] should be scrutinized to see whether it is a suitable system and, if so, what improvements can and should be made. If RDF-CL is found not to be suitable, the task must be to suggest a better alternative. Either way, the discussion should include examples of appropriate Web Services using techniques such as SOAP and SPARQL.

Thirdly, the XG should consider, and where necessary define, suitable methods for the application of the new content labeling system to a range of standards such as ATOM and SMIL as well as XHTML.

The scope of the activity is therefore within that of Web-Based Applications and will support the efforts of a broad range of user communities.

Deliverables

A report that describes the work done by the XG and provides the normative framework for making assertions about a resource or a group of resources.

A proposal to offer the defined method as an alternative to, or perhaps a replacement for, PICS.

Duration

Since the proposed activity seeks to define a method of using existing standards rather than creating new ones, and may be able to achieve its goal through scrutinizing, improving and formalizing work already done, it is anticipated that the XG may complete its work in as little as 3 months. However, more comprehensive work may be required, so a 6-month charter is sought.

This Incubator Group is chartered through 7 February 2007.

Contact

The XG will be chaired by Phil Archer of ICRA <parcher@icra.org>.

Operations and Procedures

This Incubator Group operates according to W3C Incubator Group Procedures. All technical work is on a public mailing list (archive) and Web pages. Please note that the proceedings of this Incubator Group (mailing list archives, minutes, etc.) are publicly visible.

This Incubator Group makes decisions by consensus, manages dissent and maintains standing of its participants according to the W3C Process Document.

Patent Policy

Patent Disclosures

The Content Label Incubator Group provides an opportunity to share perspectives on providing metadata for groups of resources. W3C reminds Incubator Group participants of their obligation to comply with patent disclosure obligations as set out in Section 6 of the W3C Patent Policy. While the Incubator Group does not produce Recommendation-track documents, when Incubator Group participants review Recommendation-track specifications from Working Groups, the patent disclosure obligations do apply.

Meetings

In an effort to minimize costs, face to face meetings will be co-located with other meetings that a significant number of participants are attending. The Workshop on Transparency and Usability of Web Authentication in March and the Mobile Web Best Practices meeting scheduled for June are noteworthy in this regard. Any standalone face to face meetings are likely to be held in London. Regular meetings will be held fortnightly using the W3C's Zakim telephone/IRC facility. The mailing lists will provide an important part of the communication both internally and externally.

Additional Information

The demand for a flexible and feature-rich content labeling scheme is exemplified in the following use cases.

Use case 1: Content provider to content aggregator

The Exemplary Multimedia Company offers a range of ring tones, video clips, full TV programmes, images and text. In order to maximize their assets they make metadata available that describes each resource in terms of:

content type, subject matter, authorship, genre etc.
compliance with Mobile Web Initiative Best Practice (mobileOK)
compliance with WAI guidelines
presence or absence of nudity, sexual content, violence etc.

Most of the available content can be described in the same way in all these areas. Across the portfolio, however, there are differences. Rather than spend considerable time and effort creating a complete set of metadata for each resource, the Exemplary Multimedia Company wishes to group resources together for descriptive purposes. For example:

As a matter of policy, all content created after 1 January 2005 meets WAI AA standard.
Content created after 1 January 2006 meets the Mobile Web Initiative's mobileOK standard.
There is no sex or violence in any content but resources whose URLs contain the word "-pg" may portray bare breasts, bare buttocks, alcohol or gambling.
The content is organized in such a way that the genre of a resource (pop, film, fashion etc.) can be inferred from its host, such as http://fashion.example.com
All material is copyright Exemplary Multimedia Company
Some metadata is unique to a given resource, such as title and author. This can be accessed using a URI associated with the resource. This might be a URL, an internal ID number or the resource's ISAN number.

Using such a system allows the creation and maintenance of the metadata to be semi-automated.
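The grouping rules listed above can be sketched in a few lines of Python. This is only an illustration of how such rules might be evaluated, not a proposed format: the function name, the dictionary keys and the idea that the genre is the first component of the host name are all assumptions made for the example.

```python
from datetime import date
from urllib.parse import urlparse


def describe(url: str, created: date) -> dict:
    """Derive a description for one resource from portfolio-wide rules.

    All key names are hypothetical; they are not taken from any vocabulary.
    """
    label = {
        "copyright": "Exemplary Multimedia Company",
        "sex": False,
        "violence": False,
    }
    # Policy rules keyed on the creation date of the resource.
    label["wai_aa"] = created > date(2005, 1, 1)
    label["mobile_ok"] = created > date(2006, 1, 1)
    # URL rule: "-pg" anywhere in the URL flags the milder themes.
    label["may_contain_pg_themes"] = "-pg" in url
    # Host rule (assumption): the genre is the first component of the host.
    host = urlparse(url).hostname or ""
    label["genre"] = host.split(".")[0]
    return label
```

Calling `describe("http://fashion.example.com/clips/catwalk-pg.3gp", date(2006, 3, 1))` would yield a description with genre "fashion", both compliance flags set and the "-pg" flag raised, without any per-resource metadata having been written by hand.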

At the other end of the system, the Exemplary Content Aggregator receives the metadata as a single resource that can be referenced when compiling pages of content for end users. Since there is a commercial relationship between the content provider and content aggregator (and ultimately with the end user) the data can be taken at face value. As an added security measure, however, the data is digitally signed by the Exemplary Multimedia Company. Customized content can be presented in an appropriate format for the end user whether they're receiving it on a mobile device, on a desktop, or through an IPTV channel.

Use case 2: Trustmark Scheme operator to content portal

The Example Trustmark Scheme reviews online traders, providing a trustmark for those that meet a set of published criteria. The scheme operator wishes to make its trustmark available as machine readable code as well as a graphic so that content aggregators, search engines and end-user tools can recognize and process them in some way.

The trustmark operator maintains a database of sites it has approved and makes this available in two ways:

First, the labelled site includes a link to the database. This can be achieved in a variety of ways such as an XHTML Link tag, an HTTP Response Header or even a digital watermark in an image. A user agent visiting the site detects and follows the link to the trustmark scheme's database from which it can extract the description of the particular site in real time.

Secondly, the scheme operator makes the full database available in a single file for download and processing offline.

Since the actual data comes directly from the trustmark scheme operator, it is not open to corruption by the online trader and can therefore be considered trustworthy to a large degree. To reduce the risk of spoofing, however, the data is digitally signed.
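As an illustration of the first access route in this use case, the sketch below shows how a user agent might pick a label link out of an HTTP response header. It assumes the link is carried in a `Link` header and uses a made-up relation name, `label`; neither choice is defined by this charter. The parser is deliberately simplified and does not handle commas inside quoted parameter values.

```python
import re

# One <target> followed by its ";"-separated parameters.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')
# The rel parameter, with or without surrounding quotes.
REL_RE = re.compile(r';\s*rel\s*=\s*"?([^";]+)"?')


def find_label_links(link_header: str, rel: str = "label") -> list:
    """Return the targets of all links whose rel value includes `rel`.

    The default relation name "label" is hypothetical.
    """
    targets = []
    for target, params in LINK_RE.findall(link_header):
        match = REL_RE.search(params)
        # A rel value may hold several space-separated relation names.
        if match and rel in match.group(1).split():
            targets.append(target)
    return targets
```

A user agent would then fetch each returned URL from the trustmark scheme's database and read the description of the site from the response.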

Use case 3: Website to end-user

Mrs Chaplin teaches 7 year olds at her local school. An IT enthusiast, she makes her teaching materials available through her personal website. She adds metadata to her material that describes the subject matter and curriculum area. In order to gain wider trust in her work she submits her site for review by her local education authority and a trustmark scheme. Both reviewers offer Mrs Chaplin a digitally signed, machine-readable version of their trustmark that she can add to her site. She merges these into a single pool of metadata to which she adds content descriptors from a recognized vocabulary that declare the site to contain no sexual or violent content. She adds her own digital signature to the metadata. The set of digital signatures allows user-agents to identify the origin of the various assertions made. As in use case 2, links from the content itself point to this metadata.

Since the metadata is on the website itself, user agents are unlikely to take the assertions made in the metadata at face value. The local authority does not operate a web service that can support the label but the trustmark does. A user-agent can interrogate the trustmark operator's database in real time to check whether Mrs Chaplin is authorized to make the assertions relevant to their namespace. Furthermore, the use of a recognized vocabulary for the content description means that a content analyser trained to work with that vocabulary can give a probabilistic assessment of the accuracy of the relevant data.

Taken together, these multiple sources of data can provide confidence both in the quality of the content and in the local authority trustmark, which is not directly testable. The multiple data sources may be further supported by recognising that Mrs Chaplin's work is cited in many online bookmarks, blog entries and postings to education-related message boards.

Use case 4: Rich metadata for RSS/ATOM

Dave Cook's website offers reviews of children's films and the site is summarized in both RSS and ATOM feeds. Most of the films reviewed have an MPAA rating of G and/or a British Board of Film Classification rating of U. This is declared in a rating for the channel as a whole. However, Dave includes reviews of some films rated PG-13 or 12 respectively; these ratings are declared at the item level and override the channel-level metadata.

The actual rating information comes from an online service operated by the relevant film classification board itself and is identified using either an ISAN number or the relevant Internet Movie Database entry ID number. As with use case 2, trust is implicit given the source of the data.
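The channel-level default with item-level override described in this use case might be modelled as follows. This is a minimal sketch using plain dictionaries; the key names (`rating`, `mpaa`, `bbfc`) and the sample film titles are invented for illustration and do not correspond to any RSS or ATOM extension.

```python
def effective_rating(channel: dict, item: dict) -> dict:
    """Combine ratings so that item-level values override channel-level ones."""
    # Later entries win, so the item's ratings replace the channel's
    # for any classification scheme that both of them mention.
    return {**channel.get("rating", {}), **item.get("rating", {})}


# Hypothetical feed content.
channel = {"title": "Children's film reviews",
           "rating": {"mpaa": "G", "bbfc": "U"}}
items = [
    {"title": "A family film"},  # inherits the channel rating
    {"title": "An older-audience film",
     "rating": {"mpaa": "PG-13", "bbfc": "12"}},  # overrides it
]

for item in items:
    print(item["title"], effective_rating(channel, item))
```

Only the exceptions need their own rating; every other item is covered by the single channel-level declaration.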

Use case summary

The use cases above are not exhaustive but they illustrate situations where metadata is made available and the quality of that metadata can be assessed. The end goal in each case is to make metadata available in such a way that it can be trusted and therefore exploited to the full. This might be to increase the accuracy of search results, to promote the sale of related items based on previous purchases, to adapt content for different end-user devices or to ensure that age-appropriate content is presented to younger users.

Does RDF do the job?

There are features of RDF that make it entirely suited to the task. It is not, however, a complete, ready-made solution.

For

RDF is now a mature technology with many potential applications for which off the shelf toolkits are available.

A key design concept is that a full description of a particular resource can be developed through aggregating triples from different sources. This fits squarely with the use cases and trust model, especially use case 3, where multiple sources of data are used to add trust to the assertions made about a given resource.

Against

There is no system for default descriptions that can be overridden by more specific data. According to the RDF philosophy, if you have information that the apple is red and other information that it is green, then the apple is "reddy-green." We need to be able to say that all apples are red unless they are identified as Granny Smiths in which case they are green and not red.
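The difference can be made concrete with a small sketch. The first function mimics the aggregation of statements from several sources, where every asserted value holds at once; the second shows the default-with-override behaviour that is wanted. Plain Python tuples stand in for RDF triples here, and all names are illustrative assumptions rather than part of any RDF toolkit.

```python
# Statements gathered from two sources, as (subject, property, value).
triples = {
    ("apple", "colour", "red"),
    ("apple", "colour", "green"),
}


def merged_values(statements, subject, prop):
    """Aggregation: return every value asserted for the subject."""
    return {v for s, p, v in statements if s == subject and p == prop}


def colour_with_default(variety, default="red", overrides=None):
    """Default with override: a specific variety replaces the default."""
    if overrides is None:
        overrides = {"Granny Smith": "green"}
    return overrides.get(variety, default)


print(merged_values(triples, "apple", "colour"))  # both values survive
print(colour_with_default("Cox"))                 # falls back to the default
print(colour_with_default("Granny Smith"))        # the override wins
```

In the first case nothing says which value should take precedence; in the second, the more specific statement replaces the general one rather than being added to it.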

Whether the balance is in RDF's favour is a key question for the proposed incubator activity.

Relevant work already done

During 2004-2005, a method was devised for associating an RDF description with a group of URIs. Known as RDF Content Labels (RDF-CL), it is defined by an RDF Schema that includes the class ContentLabel and properties such as hasDefaultLabel and hasLabel. As implied by these property names, RDF-CL is designed to make it easy to associate a default description with a group of URIs, such as all resources on a given domain, and then to override that description with another in certain circumstances.
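The behaviour described here, a default description for a group of URIs that can be replaced in certain circumstances, can be sketched as below. This is not the RDF-CL schema or any serialization of it: it is a plain Python stand-in that borrows the two property names mentioned above as dictionary keys. The matching rules (host name plus path prefix) and the descriptor names are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for a set of labels covering one domain.
LABEL_SET = {
    "host": "example.com",
    "hasDefaultLabel": {"nudity": False, "violence": False},
    # (path prefix, label) pairs; the first matching prefix wins.
    "hasLabel": [
        ("/gallery/", {"nudity": True, "violence": False}),
    ],
}


def label_for(url, label_set=LABEL_SET):
    """Return the label that applies to `url`, or None if out of scope."""
    parts = urlparse(url)
    host = parts.hostname or ""
    base = label_set["host"]
    if host != base and not host.endswith("." + base):
        return None  # the URL is outside the group being described
    for prefix, label in label_set["hasLabel"]:
        if parts.path.startswith(prefix):
            return label  # the override replaces the default entirely
    return label_set["hasDefaultLabel"]
```

With this arrangement a single small structure describes every resource on the domain, and only the exceptions need to be listed individually.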

Work began with a group set up by ICRA, benefited from the support and cooperation of organisations such as Vodafone Global Services, IA Japan, T-Online and Yahoo!, and was subsequently refined and published under the QUATRO project in which ERCIM is a participant. Although the RDF-CL schema and documentation are hosted on the W3C domain, they are not associated with any working group and have no formal status.

Whether the particular system laid out in RDF-CL or some other system is identified as the best approach, it seems likely that RDF, XML or both will be at the heart of the solution and that it will therefore be seen as an application of an existing standard. The XG will consider this when deciding whether or not to propose that the activity should progress to the Recommendation Track.

Dependencies

As set out above, the motivation and use cases come from areas such as child protection and trustmarks with a view to contributing to content personalization. This Incubator Activity proposal therefore stands on its own. However, it is recognized that the Mobile Web Initiative is chartered to develop a "mobileOK" trustmark and the XG will be informed by this work.

Furthermore, there is continuing interest in the subject matter in Japan which, it is understood, may lead to the establishment of a complementary Incubator Activity. The XG notes this interest and expresses its strong desire to exchange ideas with this and other interested parties.

Input and Reference Materials

PICS: http://www.w3.org/PICS/
MWI: http://www.w3.org/Mobile/
RDF: http://www.w3.org/RDF/
RDF-CL: http://www.w3.org/2004/12/q/doc/content-labels-schema20050704.htm
SOAP: http://www.w3.org/TR/soap12-part0/
SPARQL: http://www.w3.org/TR/rdf-sparql-query/
ATOM: http://atomenabled.org/
SMIL: http://www.w3.org/AudioVideo/
QUATRO: http://www.quatro-project.org/

Phil Archer <parcher@icra.org> Content Label Incubator Group Chair
Revision: 1.0, Date: 2006/2/2