W3C Content Labels

1 Introduction (Informative)

As set out in the charter, the group looked for "... a way of making any number of assertions about a resource or group of resources." Furthermore "... those assertions should be testable in some way through automated means."

fwiw, I think we should revisit what "testable" means - i.e. an assertion that I wrote a document is not "testable" as such but is an assertion that we need to be able to make within the scope of what we are trying to do.

also fwiw, I think that we need to be clear what we mean by automatic testing. We definitely mean that the assertions need to be expressed in a form that an automated tester - or any automated process - can parse effectively. We do not mean, imo, that assertions that are not open to automatic verification are not valid assertions. As Paul often points out, the WAI stuff is partly testable.

Further, imo, I think we need to introduce the concept of verification, verifiable, authentic and so on.

summary: I think we need to include a section on refinement and clarification of charter following the study the group has put in

In addition to its original sponsors, the XG attracted participation from AT&T, AOL Inc., the Center for Democracy and Technology, Centre Virtuel de la Connaissance sur l'Europe, the Internet Association of Japan, Maryland Information and Network Dynamics Lab at the University of Maryland, Opera Software and RuleSpace. The diverse membership reflects a widely recognized need to be able to "label content" for various purposes. These range from child protection through to scientific accuracy, from the identification of mobile-friendly and/or accessible content to linking of thematically-related resources.

Initial work focused on revising and improving the use cases given in the charter, and on deriving high-level requirements from them [use]. Based on that work, the group was able to create a more detailed set of requirements and definitions that are more readily expressed in technological terms.

Editor's note: Add more when possible about the tech used here, where RDF is appropriate, where it is not etc.

Process by which the conclusions were reached.

Is there a need for discussion of the above plus SysArch etc. and the other stuff that actually gets done, as well as work that is yet to be done? Need for reference to use cases?

2 Detailed requirements and definitions (Informative)

Definitions

assertion - the expression of an opinion at a point in time using a defined vocabulary about a resource or set of resources and optionally their directly or indirectly linked resources.

resource - the result of dereferencing a URI

vocabulary - a set of named properties that are applicable to an aspect of [experiencing] a resource or to the resource itself, together with the data type of the values that those properties may assume and optionally a constrained list of such values. That is, it defines the possible values, the meaning and the relationship of x and y in the statement "The value of property x in respect of Resource R is y".
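As an illustration only, the definition of a vocabulary property (a name, a data type and an optional constrained list of values) might be sketched as below. The class name PropertyDef, the property name minimumAge and the list of ages are assumptions made for this example; they are not taken from any existing vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyDef:
    """One named property of a vocabulary (hypothetical structure)."""
    name: str
    datatype: type
    allowed: frozenset = frozenset()  # empty set means "no constrained list"

    def accepts(self, value):
        """Return True if the value has the right data type and, where a
        constrained list exists, appears in that list."""
        if not isinstance(value, self.datatype):
            return False
        return not self.allowed or value in self.allowed


# Invented example: an age-based property with a constrained list of values.
minimum_age = PropertyDef("minimumAge", int, frozenset({0, 7, 12, 16, 18}))
```

With this sketch, minimum_age.accepts(12) holds, whereas a value outside the list (13) or of the wrong type ("12") is rejected.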

Note to self mainly: if x has primarily boolean values true/false, should it also be possible to code for "unknown", "indeterminate", "partly true"?

Certificate -

Trustmark -

Validation

Claim - an assertion that is subject to validation

and so on ...

Fundamentals

It must be possible for both content provider [resource creator] and third parties to make assertions about resources.
The assertions must be able to be expressed in terms chosen from different vocabularies. Such vocabularies might include, but are not limited to, those that describe a resource's suitability for children, its conformance with accessibility guidelines and/or Mobile Web Best Practice, its scientific accuracy and the editorial policy applied to its creation.
It must be possible to group assertions from an arbitrary number of vocabularies. A group of assertions of this kind is called a content label, written cLabel. Editor's note: cLabel or CL?
It must be possible to group resources and have cLabels refer to that group of resources. That is, cLabels can refer to all the pages of a Web site, defined sections of a Web site, or all resources on multiple Web sites. Editor's note - is this broad enough?
It must be possible for both content authors and third parties to make cLabels for any given resource.
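A minimal sketch of how the fundamentals above might be modelled: one party groups assertions drawn from more than one vocabulary into a cLabel that refers to a group of resources. The class names, property names and all example.org URIs are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """'The value of property x in respect of the resource is y'."""
    vocabulary: str  # URI identifying the vocabulary (invented below)
    prop: str        # property name defined by that vocabulary
    value: object    # value the property takes


@dataclass
class CLabel:
    """A group of assertions made by one party about one or more resources."""
    creator: str          # content provider or third party
    resources: list[str]  # URIs the label refers to
    assertions: list[Assertion] = field(default_factory=list)

    def vocabularies(self) -> set[str]:
        """Return the distinct vocabularies the label draws on."""
        return {a.vocabulary for a in self.assertions}


label = CLabel(
    creator="http://labeller.example.org/",
    resources=["http://example.org/", "http://example.org/kids/"],
    assertions=[
        Assertion("http://vocab.example.org/child-safety", "suitableForChildren", True),
        Assertion("http://vocab.example.org/accessibility", "wcagLevel", "AA"),
    ],
)
```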

Label semantics

A cLabel is the expression of claims made only by the party that created it. It is the expression of an opinion of a person, an organization or an automaton, from a point of view (limited by the vocabulary or vocabularies chosen), potentially qualified by the limitations of present knowledge, expressed at a point in time about the current state of ...
Vocabularies must support the grouping of assertions so that a single assertion can take the place of a number of other assertions. (That is, the vocabularies must support the idea of making granular assertions as well as composite assertions. Examples include WAI AAA, mobileOK and age-based classifications.)
Labels must also support addition to and subtraction from composite assertions. The first of these would appear to be supported by the ability to group a composite assertion with atomic ones. The second would appear to require, in addition, negative assertions. Further, so that a client can understand the meaning of a composite assertion that has been subtracted from without having to parse the relevant vocabulary, those negative assertions probably need to be nested in the composite assertion. Example: AAA minus x and y. Perhaps a vocabulary should be able to constrain whether negative assertions can be made? This also suggests requirements as to the usability or utility of labels absent the ability to retrieve and/or parse the accompanying vocabularies.
More than one cLabel can refer to the same resource or group of resources. Since conflicting labels are therefore permissible, their acceptance lies with the end user.
It must be possible for a resource to refer to one or more cLabels. It follows that there must be a linking mechanism between content and labels.
cLabels must be able independently to point to any content.
It must be possible to make assertions about cLabels using appropriate vocabularies. For example, a cLabel can have metadata describing who created it, what its period of validity is, how to provide feedback about it, who last verified it and when.
It must be possible for a cLabel to be associated with its metadata and vice versa.
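The requirement earlier in this list, that labels support addition to and subtraction from composite assertions, could be sketched as follows. This is an illustration under stated assumptions: the table COMPOSITES and the atomic names (a1, b1, x, y and so on) are invented placeholders, not the content of any real vocabulary, and the function assumes the client has the vocabulary's definitions to hand.

```python
# Hypothetical vocabulary data: each composite assertion names the atomic
# assertions it stands for. The atomic names are placeholders.
COMPOSITES = {
    "A": {"a1", "a2"},
    "AA": {"a1", "a2", "b1"},
    "AAA": {"a1", "a2", "b1", "x", "y"},
}


def expand(composite, plus=(), minus=()):
    """Return the atomic assertions implied by a composite assertion after
    adding the atoms in `plus` and removing the negated atoms in `minus`."""
    atoms = set(COMPOSITES[composite])
    unknown = set(minus) - atoms
    if unknown:
        # Subtracting something the composite never asserted is treated as an error.
        raise ValueError(
            f"cannot subtract assertions not in {composite}: {sorted(unknown)}"
        )
    return (atoms - set(minus)) | set(plus)
```

In this sketch "AAA minus x and y" is expand("AAA", minus={"x", "y"}), which leaves the atoms a1, a2 and b1.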

Fitting in with commercial or other large scale workflows

It must be possible for cLabels and cLabel metadata to be authenticated. That is, it must be possible to verify that a cLabel is an authentic expression of its creator's claims (i.e. that the person who said they created it really did express the opinions represented).
It must be possible to express validity opinions about a label and/or its metadata; that is, it must be possible to either affirm or deny assertions that are made in the label or its metadata.
It must be possible (independently) to link to and from validity opinions and what they are expressing opinions about. [this is actually true of all types of opinion, I think]
It must be possible to create and edit cLabels without modifying the resources they describe. But this need not be the only or even primary means of adding labels to content.
cLabels must support defaults. [I think it is actually the vocabularies that must support defaults.]
cLabels must be able to override defaults.
It must be possible for a labeling organization to make all its label data available and to define the means through which it can be accessed. This may be through a Web Service, as an XML file or by another means. [Does this mean that it must be possible to group cLabels? Or does it just mean that cLabels, however they are stored, must be accessible? If so I wonder why exactly we are saying it. See also the non-requirement that labels from different labellers do not need to be grouped [below].]
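The two requirements on defaults above can be illustrated with a small sketch. It leaves open whether the defaults come from the vocabulary or from a default label, which is still an open question here; the function name and the property names are hypothetical.

```python
def effective_values(defaults, stated):
    """Combine default property values with the values a cLabel states
    explicitly. Where both give a value for a property, the stated one wins."""
    merged = dict(defaults)  # copy, so the defaults themselves are untouched
    merged.update(stated)
    return merged


# Invented example: a site-wide default overridden for one property.
site_defaults = {"suitableForChildren": True, "containsAdvertising": False}
page_label = {"containsAdvertising": True}
result = effective_values(site_defaults, page_label)
```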

Encoding labels for humans and machines

It must be possible to express cLabels and cLabel metadata in a machine readable way.
The machine readable form of a cLabel and cLabel metadata must be defined by a formal grammar
A cLabel must provide support for a human readable summary of the claims it contains.
It must be possible to express cLabels and cLabel metadata in a compact form.
vocabularies also need encoding. So do validation statements.

vocabularies need identifiers and resolution mechanisms.

Non requirements

It is not necessary for a cLabel to consist of assertions that are made by different entities. [i.e. a cLabel is the expression of opinion of only one party per f. above and there is no foreseen requirement to group labels from more than one party]

2.1 A cLabel and its metadata

The requirements above can be expressed in a more programmatic way as follows. A Content Label (cLabel) can carry a variety of statements such as:

cLabel {
  That resource R has the property P1 is true
  That resource R has Property P2 that has value V
  That resource R meets WCAG 1.0 AA is true
  That resource R was created in accordance with satisfactory procedures is true
}

I think that labels, metadata validity statements and possibly individual assertions need unique and unambiguous identifiers?

Where R may be either a single resource identified by its own URI or a group of resources.

I think that the resource R cannot be identified by its URI alone; it may also need identification by its creation time (or version). An assertion is made at a specific point in time about a specific version of a resource. It may be that in practice you can tell whether the assertion matches the current version of the resource by looking at the creation time of the label and the creation time of the resource: if the latter is later than the former, then the label is inapplicable. But this in turn introduces the question of whether, in some cases (like the Kai-Dietrich get-out clause), the label refers to no particular version of the resource but rather to the way in which resources are created.
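The timestamp comparison suggested above could look like the sketch below. It assumes both times are available as comparable datetime values; the function name and the dates are invented for illustration, and labels that describe how resources are created rather than a particular version are not covered.

```python
from datetime import datetime, timezone


def label_applies(label_created, resource_created):
    """A label is taken to apply only if the resource version is not newer
    than the label, i.e. the resource has not changed since it was labelled."""
    return resource_created <= label_created


labelled_at = datetime(2006, 3, 1, tzinfo=timezone.utc)
unchanged_resource = datetime(2006, 2, 1, tzinfo=timezone.utc)
updated_resource = datetime(2006, 4, 1, tzinfo=timezone.utc)
```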

A URI may stand in for a group of resources, as follows:

it may refer to all resources that match the pattern defined by the URI [which is actually an expression with wild-cards?] with the proviso that those resources must be reachable by iteratively following links originating in the original URI - e.g. http://www.w3c.org/ means all URIs that are linked from the home page that match http://www.w3c.org/*
a list of URIs where no such linking is implied
there may be exclusions from the list
the target URI may include transcluded content and the label may refer to the transcluded content (in which case the exclusions are also needed e.g. this page is fine for kids with the exception that the little photo at the bottom may be upsetting to some)
the target URI may include links and the label may refer to the targets of such links (this is a general case of 1 above)
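Part of the list above (wild-card patterns, explicit lists and exclusions) can be sketched as a simple membership test. The sketch deliberately ignores the proviso about reachability by following links, and transclusion, since both would need the content itself. The function name and its parameters are assumptions; note also that fnmatchcase treats "?" and "[" in a pattern as wild-cards as well as "*".

```python
from fnmatch import fnmatchcase


def in_group(uri, patterns=(), listed=(), exclusions=()):
    """Return True if the URI belongs to the group of resources.

    patterns   - wild-card expressions such as "http://www.w3c.org/*"
    listed     - explicit URIs where no linking or pattern is implied
    exclusions - wild-card expressions for resources removed from the group
    """
    if any(fnmatchcase(uri, pattern) for pattern in exclusions):
        return False
    return uri in listed or any(fnmatchcase(uri, pattern) for pattern in patterns)
```

For example, with patterns=["http://www.w3c.org/*"] and exclusions=["http://www.w3c.org/private/*"], a URI under /private/ is outside the group while the rest of the site is inside it.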

The assumption here is that if, for example, the definition of WCAG 1.0 AA is looked up, it will be seen that it includes WCAG 1.0 A plus several atomic statements. The cLabel makes the basic statement; the data that a client looks up reveals the semantics of the data and perhaps how it relates to other elements.

Further, it is necessary to be able to make statements like:

metadata {
cLabel was created by $organization
    {
      has the e-mail address mail@organization.org
      has a homepage at $url
      has a feedback page at $URL
       ...
    }
cLabel was created on $date
cLabel was last reviewed by $person
}

Finally, it is necessary to be able to send a real-time request to $organization seeking automatic confirmation that it was responsible for creating the cLabel, i.e. authenticating the label and the claims made.

Also need to be able to make statements like:

validity {
metadata and cLabel verified by $organization
  { has email sss ... }
verified on $date
}

System architecture (Informative or Normative)

We could a) define a specific architecture, or b) give one or more examples of how cLabels can be used and authenticated.

The Quatro model provides one example, actually one example with three variants. See http://www.quatro-project.org/howto/Default.htm#s4

Other options include Chris Bizer's work on TriQL.P Trust Architecture

The PRIME Project. [This is more about identity than trust as such.]

Jen Golbeck's Trust on the Web project, FilmTrust etc.

And we could posit the general statement that cLabels constitute a single data point that could be used in any framework.

Need to be able to do simple things like embed labels in their metadata and embed that metadata in the content that they refer to.

Actors

Do the requirements cited above cope with the following actors?

Roles may be shared by single actual entities. Multiple actual entities may share roles. Entities may be human or automated.

Content Creator - A creator of content.

Content Label[l]er - Entity that expresses an opinion about content. May be the content creator.

Portal Provider - Entity that serves content to the end user.

End User - Entity that ultimately consumes content.

User Agent - A means of retrieving and rendering Web content for End Users mediated by the use of labels.

User Agent Provider - Entity that provides a tool that renders, decorates or otherwise differentiates content on the basis of label information.

Vocabulary Provider - Entity responsible for creation and maintenance of vocabularies.

Certification Provider - Entity that verifies the claims of a content provider.

Search Provider - Entity that provides a tool or service that uses [in whole or in part] the content of labels to discriminate content.

Encodings (Normative)

Can we do this in two ways?

RDF Model

The RDF model - how to express cLabels, cLabel metadata and how to express defaults and overrides.

Key question remains: how to group URIs. There was a discussion about this between Sandro Hawke and Dan Bri on the public mailing list. Essentially we need:

a "uri" property, linking things to strings which are unambiguous names (URIs) for them.
a simple rule language.
The rule language has to have simple/common string functions.

We must be mindful of SPARQL's abilities here. http://www.w3.org/TR/2006/WD-rdf-sparql-query-20060220/#SparqlOps looks relevant.

Sandro also offered that an application specific approach like:

<http://www.w3.org/> rdf:type icra:ChildSafeSiteTree.

seems fine, and in the presence of suitable rules could generate

<...> rdf:type icra:ChildSafe.

as necessary. The rule might be:

      if
         icra:ChildSafeSiteTree(y)
         y.uri = prefix
         x.uri.startsWith(prefix)
      then
         icra:ChildSafe(x)
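For comparison, the same rule can be written in a few lines of ordinary code. This is only a sketch of the string test the rule language would need (a "starts with" function); the function name child_safe and the way site trees are held in a set are assumptions for the example.

```python
def child_safe(uri, site_trees):
    """Apply the rule above: x is ChildSafe if x.uri starts with the uri of
    some resource y that is typed as a ChildSafeSiteTree."""
    return any(uri.startswith(prefix) for prefix in site_trees)


# URIs of resources asserted to be of type icra:ChildSafeSiteTree.
site_trees = {"http://www.w3.org/"}
```

Note that the trailing slash in the prefix matters: without it, a host such as www.w3.org.example.com would also match.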

RDF-CL offers this kind of approach already:

<label:Ruleset>
  <label:hasHostRestrictions>
    <label:Hosts>
      <label:hostRestriction>example.org</label:hostRestriction>
    </label:Hosts>
  </label:hasHostRestrictions>
  <label:hasDefaultLabel rdf:resource="#label_1" />
</label:Ruleset>

Where "label_1" contains the actual descriptive claims.

A non-RDF method

How about something based on the P3P methodology? It is based on XML, not RDF, and has a separate Policy Reference File that tells an agent where to get the actual policy for a given URI; then you get the actual policy (label). The example Policy Reference File (confusingly for those of us with a PICS background called a PRF) looks like this:

<META xmlns="http://www.w3.org/2002/01/P3Pv1">
 <POLICY-REFERENCES>
  <EXPIRY max-age="172800"/>

    <POLICY-REF about="/P3P/Policies.xml#first">
      <INCLUDE>/*</INCLUDE>
      <EXCLUDE>/catalog/*</EXCLUDE>
      <EXCLUDE>/cgi-bin/*</EXCLUDE>
      <EXCLUDE>/servlet/*</EXCLUDE>
    </POLICY-REF>

    <POLICY-REF about="/P3P/Policies.xml#second">
      <INCLUDE>/catalog/*</INCLUDE>
    </POLICY-REF>

    <POLICY-REF about="/P3P/Policies.xml#third">
      <INCLUDE>/cgi-bin/*</INCLUDE>
      <INCLUDE>/servlet/*</INCLUDE>
      <EXCLUDE>/servlet/unknown</EXCLUDE>
    </POLICY-REF>

 </POLICY-REFERENCES>
</META>
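To show how an agent might use such a file, here is a minimal sketch that finds the policy (label) reference for a local path. It assumes that POLICY-REF elements are tried in document order and the first one that includes and does not exclude the path is used, and that "*" stands for any run of characters; it ignores EXPIRY and everything else in P3P. The function names and the shortened PRF are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

P3P_NS = "{http://www.w3.org/2002/01/P3Pv1}"

# Shortened version of the reference file shown above.
PRF = """<META xmlns="http://www.w3.org/2002/01/P3Pv1">
 <POLICY-REFERENCES>
  <POLICY-REF about="/P3P/Policies.xml#first">
    <INCLUDE>/*</INCLUDE>
    <EXCLUDE>/catalog/*</EXCLUDE>
  </POLICY-REF>
  <POLICY-REF about="/P3P/Policies.xml#second">
    <INCLUDE>/catalog/*</INCLUDE>
  </POLICY-REF>
 </POLICY-REFERENCES>
</META>"""


def matches(pattern, path):
    """Treat '*' as 'any run of characters'; everything else is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, path) is not None


def policy_for(path, prf_xml):
    """Return the 'about' URI of the first POLICY-REF that applies, or None."""
    root = ET.fromstring(prf_xml)
    for ref in root.iter(P3P_NS + "POLICY-REF"):
        includes = [(e.text or "").strip() for e in ref.findall(P3P_NS + "INCLUDE")]
        excludes = [(e.text or "").strip() for e in ref.findall(P3P_NS + "EXCLUDE")]
        included = any(matches(p, path) for p in includes)
        excluded = any(matches(p, path) for p in excludes)
        if included and not excluded:
            return ref.get("about")
    return None
```

The same include/exclude logic would carry over to a label reference file that points at cLabels rather than privacy policies.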

First named editor of the P3P spec is Lorrie Cranor who was then with AT&T, now at CMU.