W3C Workshop on Metadata for Content Adaptation

ICRA Position paper

Since 1999, ICRA has operated the internet's leading system of self-labelling for the purposes of child protection. Like its predecessor organisation, RSACi, and other online labelling systems, ICRA has used the PICS Recommendations to encode its labels.

ICRA acknowledges that take-up of labelling has been limited. Although several high-profile websites carry ICRA's labels, few are "fully labelled" in terms that a PICS-based filter would understand. There are several reasons for this, which were discussed in detail at a workshop at WWW2004.

Nor has ICRA labelling been used to its full potential, notably in content adaptation. This is unfortunate, since the potential is significant. If a delivery system were aware that content of a particular type should not be offered, perhaps in the children's section of a website, then alternatives could be automatically identified and provided. In this way the user sees customised content and the provider keeps its audience.

In June this year, under the chairmanship of David Young of Verizon, ICRA established a working group to explore improved methods of labelling that would overcome both the technical problems associated with PICS and the logistical problems faced by large organisations that wish to label their content. ICRA believes that by working on solutions to these problems, while maintaining its focus on child protection, it can make a real contribution to the wider issues of associating metadata with content.

Various methods of linking RDF to web resources, especially HTML documents, have been suggested. Simply embedding RDF in an (X)HTML document means the document no longer validates against its declared document type, so it is not a satisfactory solution. Using namespace-defined meta tags, as recommended by the Dublin Core initiative for example, has two key drawbacks:

It only applies to (X)HTML.
It is highly verbose - every document would carry its own long-hand description, even if 80% of that description were shared with another 500 documents.

The Recommendation is that descriptions should be held in discrete files and that resources should point to them. However, the assumption at the heart of RDF is that each resource has its own unique description. Content labelling for child protection and other purposes calls for the ability to associate a single label with an unlimited number of resources to which the same metadata applies. There are a lot of PG-13 films, but the Motion Picture Association of America only has one PG-13 rating.

ICRA contends that the ideal system for delivering labels of this type - ones that can be applied to multiple resources - will have a number of features that, in addition to helping parents to make choices about what their children do and don't see online, would be directly applicable to content adaptation and other metadata applications. A single description should be able to cover all resources within a defined domain, subdomain, path etc. Furthermore, it should be possible to override, at document level, a description applied at domain level.
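
The scoping and override behaviour described here can be illustrated with a minimal Python sketch. It assumes that scopes are expressed as URL prefixes and that the narrowest matching prefix wins unless a document-level override exists; the URLs, description names and the function effective_description are hypothetical and are not taken from the working group's design.

```python
# Hypothetical scopes and description names, for illustration only.
SCOPED = {
    "http://example.org/": "general",
    "http://example.org/games/": "games",
}

# Hypothetical document-level overrides, keyed by exact URL.
OVERRIDES = {
    "http://example.org/games/poker.html": "gambling",
}


def effective_description(url, scoped=SCOPED, overrides=OVERRIDES):
    """Pick the description for url: document override, else narrowest scope."""
    if url in overrides:
        return overrides[url]
    # Collect every scope whose prefix covers this URL.
    matches = [prefix for prefix in scoped if url.startswith(prefix)]
    if not matches:
        return None
    # The longest prefix is the most specific scope.
    return scoped[max(matches, key=len)]
```

Under these assumptions, a page under /games/ picks up the "games" description, poker.html is overridden to "gambling", and everything else on the domain falls back to "general".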

The working group has made significant progress towards realising these desirable features but recognises that further work needs to be done. We have devised and tested two candidate solutions that complement each other.

In both candidate solutions, the description(s) are held in a separate file - an RDF/XML instance. A content provider might define, say, 3 or 4 descriptions that describe their material and these are encoded within that single file. From an ICRA point of view, these would declare the type of content present, but they could just as easily convey any metadata that applied to multiple resources. Such a file would be retrieved once by a client and processed internally rather than being fetched repeatedly for each request to the network. The two candidate solutions differ in the way in which the labels are linked to the content they describe.
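
As a rough illustration of such a file, the Python sketch below parses a small RDF/XML instance holding two descriptions. Only the RDF namespace is real; the ex: vocabulary, the description identifiers and the function parse_descriptions are assumptions made for this example and do not represent ICRA's actual vocabulary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A hypothetical label file; the ex: vocabulary is invented for illustration.
LABELS_RDF = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/label#">
  <rdf:Description rdf:ID="general">
    <ex:audience>all ages</ex:audience>
  </rdf:Description>
  <rdf:Description rdf:ID="chat">
    <ex:audience>all ages</ex:audience>
    <ex:feature>unmoderated chat</ex:feature>
  </rdf:Description>
</rdf:RDF>
"""


def parse_descriptions(rdf_xml):
    """Return {description id: [(property name, value), ...]}."""
    root = ET.fromstring(rdf_xml)
    descriptions = {}
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        desc_id = node.get(f"{{{RDF_NS}}}ID")
        # Strip the "{namespace}" part that ElementTree puts before each tag.
        props = [(child.tag.split("}", 1)[1], child.text) for child in node]
        descriptions[desc_id] = props
    return descriptions


DESCRIPTIONS = parse_descriptions(LABELS_RDF)
```

A client would parse the file once in this way and then look descriptions up by identifier for every resource that refers to them.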

Candidate solution 1

In this scenario, content includes a pointer (in its header information) to one of the descriptions (or the single description) in the RDF file. Any number of resources can point to a given description and, for a given description, the pointer is identical. The content management system/webmaster retains the responsibility for including a pointer that links the content to the correct description.
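
A minimal Python sketch of the client side of this scenario follows. It assumes the pointer is a URL whose fragment names one description inside the RDF file, and that the client caches each file after the first retrieval. The class LabelCache, the stand-in fake_fetch function and all URLs are hypothetical; a real client would retrieve and parse the RDF/XML over the network.

```python
from urllib.parse import urldefrag


class LabelCache:
    """Resolve pointers such as 'http://example.org/labels.rdf#chat'."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: file URL -> {id: description}
        self._files = {}         # file URL -> parsed descriptions
        self.fetch_count = 0     # how many times a file was actually fetched

    def resolve(self, pointer):
        file_url, fragment = urldefrag(pointer)
        if file_url not in self._files:
            # First reference to this file: fetch and keep it.
            self._files[file_url] = self._fetch(file_url)
            self.fetch_count += 1
        return self._files[file_url][fragment]


def fake_fetch(url):
    """Stand-in for retrieving and parsing the label file (assumption)."""
    return {
        "general": {"audience": "all ages"},
        "chat": {"feature": "unmoderated chat"},
    }


cache = LabelCache(fake_fetch)

# Hypothetical pages, each carrying a pointer in its header information.
PAGE_POINTERS = {
    "/index.html": "http://example.org/labels.rdf#general",
    "/news.html": "http://example.org/labels.rdf#general",
    "/talk.html": "http://example.org/labels.rdf#chat",
}

resolved = {page: cache.resolve(ptr) for page, ptr in PAGE_POINTERS.items()}
```

The two pages that share the "general" pointer receive the same description, and the label file is fetched only once for all three pages.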

Candidate solution 2

In this scenario the descriptions include extra information about where they should be applied. Therefore a content provider can arrange for the same pointer to be included with all content, irrespective of the description it should have. Within the description there would be data that encodes rules such as 'everything on our domain should have description A except things with the word "chatroom" in them which should have description B.' The responsibility for labelling content correctly can then be passed to an individual or department who may themselves have no direct access to or control over the content, just the descriptions for it.
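
The example rule quoted above can be sketched in Python as an ordered rule list evaluated by the client. The rule format, the host example.org and the function description_for are assumptions for illustration; the paper does not specify how such rules are encoded within the description file.

```python
from urllib.parse import urlsplit

# Hypothetical rules, checked in order; the first match wins.
# "contains" is a substring test on the whole URL; None matches anything.
RULES = [
    {"host": "example.org", "contains": "chatroom", "description": "B"},
    {"host": "example.org", "contains": None, "description": "A"},
]


def description_for(url, rules=RULES):
    """Return the name of the description that applies to url, or None."""
    host = urlsplit(url).hostname
    for rule in rules:
        if host != rule["host"]:
            continue
        if rule["contains"] is None or rule["contains"] in url:
            return rule["description"]
    return None
```

Because the rules live with the descriptions, whoever maintains them can change which description applies to a page without touching the page itself.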

Basic demonstration-standard tools have been devised to generate these descriptions and to locate and parse them.

Personalisation - the privacy problem

If a user declares that his/her device has particular properties, perhaps using CC/PP, or that they prefer a text only version of a site, they disclose simply how they can or prefer to receive the material offered. If, however, the user discloses that, for example, sexual material, gambling services or other content types are not wanted, they disclose information about themselves. This constitutes a loss of privacy that might be regarded as acceptable by many users if it improved the ratio of wanted to unwanted material received.

However, if the profile of the user indicated that he/she was probably a young child, would that increase or decrease their vulnerability on the web?

The possibility that it would actually make children more vulnerable, particularly to paedophile activity, currently prevents ICRA from advocating that filter settings should be disclosed when making requests to the web. If there is a way around this problem, however, the potential of ICRA labelling for content adaptation becomes significantly greater.

Summary position

ICRA and others are working on a system that allows metadata to be associated with multiple resources. This metadata, expressed in RDF/XML, offers the potential for uses that range from content labelling for child-protection purposes through to discovery metadata, copyright, trust marks etc., all of which can have a direct bearing on content adaptation.

Phil Archer, CTO, ICRA
7th September 2004

Companies/individuals involved include:

Techquila (Kal Ahmed)
Vodafone Global (Dan Appelquist)
ICRA
W3C (Dan Brickley et al)
Web Host Automation (Mark Hall)
Blogwise.com / 1Do3 (Sven Latham)
Kingston Communications (Richard Sandy, Ian Bissett)
T-Online
Verizon
Yahoo!