Use Cases and Requirements

This document expands upon the original use cases as set out in the charter and then derives a set of requirements from them.

The group has finished work on this document and is now translating the data into a more detailed set of requirements that get closer to a technological solution. These will be included in the group's Final Report. Comments on this and all other aspects of the group's work are welcome through the public mailing list.

The use cases

Use Case 1: Profile matching

The original use case given in the charter has been simplified by reducing the number of essential actors to three:

CONTENT PROVIDER (metadata provider)
PORTAL PROVIDER (metadata consumer)
END USER

One can imagine a range of scenarios with very similar characteristics that amount to "sub-use cases."

Sub use case 1A: END USER discovers content appropriate to their device ["MobileOK"]

Diagrammatic representation of use case 1A

Fig 1. Diagrammatic version of sub-use case 1A.

END USER visits portal
END USER's device profile is extracted with reference to a separate metadata store
END USER searches for a topic of interest.
PORTAL PROVIDER matches END USER's device profile with contentprofiles provided by CONTENT PROVIDER.
PORTAL PROVIDER provides search results matching this topic.
PORTAL PROVIDER filters results based on the metadata encoded in the content with regard to the "mobile friendliness" of the content/presentation in question and the known properties of the device profile according to business rules.

Sub use-case B: END USER discovers content appropriate to their age-group ["Child Protection"]

Diagrammatic representation of use case 1B

Fig 2. Diagrammatic version of sub-use case 1B.

END USER visits portal
END USER's user profile is extracted from a repository, perhaps the portal's own.
END USER searches for a topic of interest.
PORTAL PROVIDER matches END USER's age with content profiles provided by CONTENT PROVIDER.
PORTAL PROVIDER provides search results matching this topic.
PORTAL PROVIDER filters results based on the metadata encoded in the content with regard to the "child friendliness" of the content/presentation in question and the known age of the user according to local business rules.

Use case 2: Trustmark Scheme operator to content portal

The Example Trustmark Scheme reviews online traders, providing a trustmark for those that meet a set of published criteria. The scheme operator wishes to make its trustmark available as machine readable code as well as a graphic so that content aggregators, search engines and end-user tools can recognize and process them in some way.

The trustmark operator maintains a database of sites it has approved and makes this available in two ways:

First, the labelled site includes a link to the database. This can be achieved in a variety of ways such as an XHTML Link tag, an HTTP Response Header or even a digital watermark in an image. A user agent visiting the site detects and follows the link to the trustmark scheme's database from which it can extract the description of the particular site in real time.

Secondly, the scheme operator makes the full database available in a single file for download and processing offline.

Since the actual data comes directly from the trustmark scheme operator, it is not open to corruption by the online trader and can therefore be considered trustworthy to a large degree. To reduce the risk of spoofing, however, the data is digitally signed.

Use case 3: Website to end-user

Mrs Chaplin teaches 7 year olds at her local school. An IT enthusiast, she makes her teaching materials available through her personal website. She adds metadata to her material that describes the subject matter and curriculum area. In order to gain wider trust in her work she submits her site for review by her local education authority and a trustmark scheme. Both reviewers offer Mrs Chaplin a digitally signed, machine-readable version of their trustmark that she can add to her site. She merges these into a single pool of metadata to which she adds content descriptors from a recognized vocabulary that declare the site to contain no sex or violent content. She adds her own digital signature to the metadata. The set of digital signatures allow user-agents to identify the origin of the various assertions made. As in use case 2, links from the content itself point to this metadata.

Since the metadata is on the website itself, user agents are unlikely to take the assertions made in the metadata at face value. Unlike the trustmark operator, the local authority does not operate a web service that can support the label, it does, however, digitally sign its labels and publishes its public key on its website. This can be used to verify that it is indeed the local education authority that issued the relevant data in the label.

Separately, a user-agent can interrogate the trustmark operator's database in real time to check whether Mrs Chaplin is authorized to make the assertions relevant to their namespace. Furthermore, the use of a recognized vocabulary for the content description means that a content analyser trained to work with that vocabulary can give a probabilistic assessment of the accuracy of the relevant data.

Taken together, these multiple sources of data can provide confidence in the quality of the content and the local authority trustmark which is not directly testable. The multiple data sources may be further supported by recognising that Mrs Chaplin's work is cited in many online bookmarks, blog entries and postings to education-related message boards.

Use Case 4: Rich Metadata for RSS/ATOM

Dave Cook's website offers reviews of children's films and the site is summarized in both RSS and ATOM feeds. Most of the films reviewed have an MPAA rating of G and/or British Board of Film Classification rating of U. This is declared in a rating for the channel as a whole. However, Dave includes reviews of some films rated PG-13 or 12 respectively which is declared at the item level and overrides the channel level metadata.

The actual rating information comes from an online service operated by the relevant film classification board itself and is identified using a URL and human-readable text. The movie itself is identified by either an ISAN number or the relevant Internet Movie Database entry ID number. As with use case 2, trust is implicit given the source of the data, which is indicated by a link to Dave's site's policy.

Separately, Fred combines Dave Cook's and other review feeds to provide alternative reviews of the movies by transforming the ATOM feeds into RDF and creating an aggregate view using SPARQL queries.

Use Case 5: MLK and the KKK

Fred operates an antiracism education site which aggregates and curates content from around the Web. Fred wants to label the resources that he aggregates such that educational and other institutions may harvest the resources and associated commentary and metadata automatically for reuse within their instructional support systems, etc.

One of the ways in which Fred wants to curate resources is to say about them that they are pedagogically useful but politically noxious. For example, some sites on the Web make claims about Martin Luther King, Jr that are motivated by a racist ideology and are historically indefensible. Fred's vocabulary allows him to claim that such resources are pedagogically useful for purposes of analysis, but that they are otherwise suspicious and should only be consumed by students in an age-appropriate manner or with appropriate supervision, etc. In other words, Fred needs to be able to make sharply divergent claims about resources: (1) that they are noteworthy, and (2) that they are, from his perspective, dangerous or noxious or troublesome.

Use Case 6: Scalar Classification

A company named Advance Medical Inc. reviews medical literature on the Web based on a range of quality criteria such as effectiveness and research evidence. The criteria may be changed according to current scientific and professional developments. The review process leads to literature being classified as belonging to one of 5 levels as follows.

Level A : clear evidence.
Level B : supportive evidence.
Level C : poor evidence.
Level D : expert opinion with explicit critical appraisal.
Level E : no evidence.

The company produces label data that declares the classification level value and provides a summary of each document. The label data is stored in a metadata repository which can be accessed via the Web.

M.D. Smith uses the label data in the repository to make decisions about heath care for specific clinical circumstances.

Requirements

The following requirements have been approved by the group.

It must be possible to group resources and to make assertions that apply to the group as a whole (This is fundamental to all use cases)
It must be possible to self-label (use cases 2 - 4)
To provide as complete a description as possible, labels must be able to contain unambiguous assertions using more than one vocabulary (all use cases, especially 3)
It must be possible for a content provider to make reference to third party labels (use case 2)
It must be possible to make assertions about the accuracy of claims made in a label (use case 2)
The system must be readily usable within a commercial workflow, allowing a content provider to apply metadata to a large number of resources in one step and to separate the activity of labelling from that of content creation, where desired (use case 1).
The system must support a concept of default and override metadata. The mechanism that is used to determine where overrides apply should be based on the full concept of a URI rather than, for example, just a web URL. (Use case 1, 2, 4)
It should be possible to ascertain unambiguously who created the label, using techniques such as digital signatures, S/MIME etc. (use cases 2, 3 and perhaps 5)
It must be possible for a labeling organization to make all its labels available as a single database (use case 2)
It should be possible to include assertions from an unlimited number of vocabularies in a single content label. Assertions from each vocabulary may be subject to its own verification mechanism (use case 3)
Labels should support a human-readable summary as well as the machine-readable code (all).
Labels should validate to formal published grammars (all)
It must be possible to encode labels in a compact/efficient form (all)
It must be possible to identify whether labels are self-applied or created by a third party. (use case 2)
It must be possible to discover a feedback mechanism for reporting false claims (all, especially use case 2)
It must be possible to associate labels with a 'time to live' and/or 'expiry date' (all, especially user case 2)
It must be possible to discover the date and time when a label was last verified and by whom. (all, especially use case 2)
It must be possible to describe the process by which data in labels is to be verified (use case 3)

Although not a testable requirement, the group has further resolved the principle that adding labels to resources should be easy and intuitive. It is recognized that this is likely to be made so through implementation but the design of the system should nonetheless be mindful of the principle (use case 3).

Phil Archer, Content Label Incubator Group Chair
$Revision: 1.25 $ of $Date: 2006/05/16 14:00:51 $