1 Introduction
This document discusses the possibility of a new PICS label type, the default label, in addition to the existing specific and generic label types. This work is motivated by the observation that the currently defined generic and specific label types are not expressive enough for some practical purposes.
Section 2 of this document reviews the current definitions of generic and specific labels. Section 3 redefines generic labels and introduces the new default label type. Section 4 describes typical scenarios and the type of label that suits each of them. Section 5 suggests protocols by which a client requests PICS labels from a server. Section 6 summarizes what needs to be changed in the existing specification.
This work results from discussions at the PICS meeting held in London in January 1997. At that meeting a committee of four volunteers was formed to investigate this issue further. The members are Yang-hua Chu from MIT/W3C (yhchu@w3.org), Daniel Dardailler from W3C (danield@w3.org), Martin Presler-Marshall from IBM (martinm@raleigh.ibm.com), and Bob Schloss from IBM (schloss@watson.ibm.com). Your comments are appreciated; please direct all comments to the four of us.
2 Review of Current Definitions
As defined in the PICS Version 1.1 specification, there are two types of PICS labels. The core definitions are quoted below:
Specific label: applies to a single document. If the document is in HTML format, it may refer to other documents, either by external reference (for example, using the <A href=...> tag) or by requesting that they be displayed in-line (for example, using the <img ...> or <object ...> tag). A label applies to the given document only, not to the referenced documents.
Generic label: applies to any document whose URL begins with a specific string of characters (specified using the for option). A generic label does not have the expected semantics of a "default" label that can be overridden by more specific labels. While a specific label does override a generic label when a client has access to both, the two labels may be distributed separately, and thus a client may have access to only the generic label. A server can keep track of defaults and overrides and generate a specific label based on a default that is not overridden in its local database. However, a generic label for a site or directory should only be distributed if it applies to all the documents in that site or directory.
The definition of the specific label is clear, but the definition of the generic label is ambiguous. The ambiguity concerns where, and how far, a client should look for specific labels that may override a generic label.
3 Suggested New Definitions
Here we clarify the definition of the generic label and define a new label type, the default label, based on it.
Generic Label: applies to any document whose URL begins with a specific string of characters (specified using the for option). A generic label is overridden only by labels that arrive in the same PICS Labellist as the generic label. These overriding labels can be of specific, generic, or default type.
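As an illustration only, the Labellist rule can be sketched in Python. `Label`, `applies` and `resolve_in_labellist` are hypothetical names, not part of PICS, and the sketch assumes that within one Labellist the label with the longest matching for string wins (a specific label matches its whole URL, so it wins wherever it applies).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Label:
    kind: str     # "specific", "generic" or "default" (hypothetical encoding)
    for_url: str  # exact URL for a specific label, URL prefix (the `for` option) otherwise
    rating: int   # a single hypothetical rating value, for brevity


def applies(label: Label, url: str) -> bool:
    """A specific label covers exactly one URL; generic and default labels cover a prefix."""
    if label.kind == "specific":
        return url == label.for_url
    return url.startswith(label.for_url)


def resolve_in_labellist(labellist: List[Label], url: str) -> Optional[Label]:
    """Return the label from this one Labellist that applies to `url`, or None.

    Under the proposed definition a generic label is overridden only by labels
    in the same Labellist, so nothing outside `labellist` is consulted.
    """
    candidates = [lab for lab in labellist if applies(lab, url)]
    if not candidates:
        return None
    # Assumption: longer `for` string = more specific; on a tie a specific label wins.
    return max(candidates, key=lambda lab: (len(lab.for_url), lab.kind == "specific"))


# A site-wide generic label plus one exception, delivered together.
labellist = [
    Label("generic", "http://w3.org/", 0),
    Label("specific", "http://w3.org/pub/adult.html", 3),
]
print(resolve_in_labellist(labellist, "http://w3.org/pub/adult.html").rating)  # 3
print(resolve_in_labellist(labellist, "http://w3.org/index.html").rating)      # 0
```

Because only `labellist` is consulted, the client never has to ask the server whether some other label overrides the generic one; that is the practical difference from the default label.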
Default Label: like a generic label, applies to any document whose URL begins with a specific string of characters (specified using the for option). A default label is overridden by any specific, generic, or default label that has higher preference, i.e. a longer for string. Before a default label can be treated as valid, the client must first query the server to make sure that no more specific label applies.
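The precedence rule and the query-first requirement can be illustrated with the following Python sketch. It is not part of PICS: `Label`, `effective_label` and the boolean `server_checked` are hypothetical, and `labels` is assumed to hold every label the client currently knows about, whatever its source.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Label:
    kind: str     # "specific", "generic" or "default" (hypothetical encoding)
    for_url: str  # exact URL for a specific label, URL prefix otherwise
    rating: int   # a single hypothetical rating value


def applies(label: Label, url: str) -> bool:
    if label.kind == "specific":
        return url == label.for_url
    return url.startswith(label.for_url)


def effective_label(labels: Iterable[Label], url: str, server_checked: bool) -> Optional[Label]:
    """Pick the applicable label with the highest preference (longest `for` string).

    If the winner is a default label, it is accepted only once the client has
    queried the server (`server_checked`) and found nothing that overrides it.
    """
    applicable = [lab for lab in labels if applies(lab, url)]
    if not applicable:
        return None
    best = max(applicable, key=lambda lab: (len(lab.for_url), lab.kind == "specific"))
    if best.kind == "default" and not server_checked:
        raise LookupError("query the server before relying on a default label")
    return best


known = [Label("default", "http://w3.org/", 1), Label("generic", "http://w3.org/pub/", 2)]
print(effective_label(known, "http://w3.org/pub/WWW/", server_checked=False).rating)  # 2
print(effective_label(known, "http://w3.org/other.html", server_checked=True).kind)   # default
```

Without `server_checked=True`, the lookup for "http://w3.org/other.html" raises `LookupError`, mirroring the requirement that a default label is not valid until the server has been asked.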
4 Typical Cases and Suggested Usage
Case 1: If a web site or directory has the same default ratings across all its descendants except for a few, a generic label should be used. Specifically, the rating of the web site or directory is expressed in a Labellist containing a generic label and a list of labels for the exceptional descendants. The generic label carries the default rating for the whole web site or directory. The exception labels can be of any type (specific, generic, or default) and, where applicable, take precedence over the generic label when read by the client.
Case 2: If a web site or directory has the same default ratings across most of its descendants, but it is not known a priori which descendants are exceptions, a default label should be used. It is assumed that the web administrator provides ratings for every descendant whose rating differs from the default label. Default labels can also be used when the list of exception URLs is too long to be bundled efficiently into a Labellist.
Case 3: If a web site or directory is under multi-party control, neither generic nor default labels can be used to rate it. An example is a university site where one part is administered by the university (the registrar’s office) and another part by the student body (student homepages). Such a site needs a label that expresses a default rating together with a set of unknown URLs that the default may not rate appropriately. This kind of default with unknown exceptions cannot be expressed under either the current or the proposed PICS label specification.
5 Transmission Method
PICS defines three methods by which labels may be transmitted: from label bureaus, via HTTP headers, and embedded in documents. For each method, this section gives guidelines on which types of PICS label should be dispensed, how the client should query, how the server should respond, and how the client should interpret the server’s response. The goal is to make the exchange between server and client as efficient as possible, while leaving the client no ambiguity about where to fetch information and whether it has enough information to make a decision.
From label bureaus
A label bureau should never need to dispense a default label: with or without one, the client always has to make one query to the label bureau, so holding a default label on the client side does not make communication any more efficient. Default labels may be used internally to build a smaller database of PICS labels, but label bureaus should always return only generic and specific labels.
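One way a bureau could keep defaults internal is to expand them on the fly, in the spirit of the PICS 1.1 remark that a server can generate a specific label from a default that is not overridden in its local database. The Python sketch below is only an illustration under that assumption: the `LabelBureau` class and its `query` method are hypothetical, not an interface defined by PICS.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Label:
    kind: str     # "specific", "generic" or "default" (hypothetical encoding)
    for_url: str  # exact URL for a specific label, URL prefix otherwise
    rating: int   # a single hypothetical rating value


class LabelBureau:
    """Hypothetical bureau: stores default labels internally, never dispenses them."""

    def __init__(self, labels: Iterable[Label]):
        self._labels: List[Label] = list(labels)

    def query(self, url: str) -> Optional[Label]:
        applicable = [
            lab for lab in self._labels
            if (url == lab.for_url if lab.kind == "specific" else url.startswith(lab.for_url))
        ]
        if not applicable:
            return None
        # Highest preference = longest `for` string; a specific label wins a tie.
        best = max(applicable, key=lambda lab: (len(lab.for_url), lab.kind == "specific"))
        if best.kind == "default":
            # Turn the internal default into a specific label for exactly this URL.
            return replace(best, kind="specific", for_url=url)
        return best


bureau = LabelBureau([
    Label("default", "http://w3.org/", 1),
    Label("specific", "http://w3.org/pub/x.html", 3),
])
print(bureau.query("http://w3.org/pub/y.html"))
# Label(kind='specific', for_url='http://w3.org/pub/y.html', rating=1)
```

The client receives either a generic or a specific label and never has to decide whether a default is still valid; the one query it sends is the same query it would have sent anyway.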
Via HTTP header
[ Here I assume a ‘smart’ server: if the server returns ‘label not found’ for a particular URL, then the server contains NO label of any type that can apply to that URL. Is it a valid assumption that most web servers with this transmission capability are ‘smart’? ]
The web server should never dispense a default label, for the same reason as in the case of label bureaus.
Embedded in the documents
PICS labels are embedded in the documents themselves when the server (which need not be an HTTP server) does not support a standard header stream, or when there is no server at all. A typical example is a large collection of web pages on a CD-ROM that the client accesses directly.
Assuming the client has an empty cache, it first queries the server for the URL of interest, say "http://w3.org/pub/WWW/". If a PICS label of any type is embedded in the document, the protocol stops here, because the embedded label(s) apply directly to that URL. If no label is embedded, the client must traverse up the directory chain (first "http://w3.org/pub/", then "http://w3.org/") until it finds a generic or default label that applies to the URL.
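The empty-cache walk can be sketched in Python as follows. The sketch assumes a hypothetical callable `fetch_embedded(url)` that retrieves a document and returns the labels embedded in it (an empty list if there are none); `Label`, `ancestors` and `find_label` are likewise illustrative names, and directory URLs are assumed to end in "/".

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Label:
    kind: str     # "specific", "generic" or "default" (hypothetical encoding)
    for_url: str  # exact URL for a specific label, URL prefix otherwise
    rating: int   # a single hypothetical rating value


def ancestors(url: str) -> List[str]:
    """Directory URLs above `url`, nearest first: .../pub/WWW/ -> .../pub/, .../"""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return [
        urlunsplit((parts.scheme, parts.netloc, "/" + "".join(s + "/" for s in segments[:i]), "", ""))
        for i in range(len(segments) - 1, -1, -1)
    ]


def find_label(url: str, fetch_embedded: Callable[[str], List[Label]]) -> List[Label]:
    """Empty-cache lookup: the document itself first, then each enclosing directory."""
    own = fetch_embedded(url)
    if own:
        return own  # labels embedded in the document apply to it directly
    for parent in ancestors(url):
        covering = [
            lab for lab in fetch_embedded(parent)
            if lab.kind in ("generic", "default") and url.startswith(lab.for_url)
        ]
        if covering:
            return covering  # nearest enclosing generic/default label(s)
    return []


site = {"http://w3.org/": [Label("default", "http://w3.org/", 1)]}
print(find_label("http://w3.org/pub/WWW/", lambda u: site.get(u, [])))
# [Label(kind='default', for_url='http://w3.org/', rating=1)]
```

In this example the client sends three queries ("http://w3.org/pub/WWW/", "http://w3.org/pub/", "http://w3.org/") before it finds a label that covers the URL.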
If the client already has a default label for "http://w3.org/pub/" in its cache, it still needs to make one query to "http://w3.org/pub/WWW/". If a PICS label of any type is embedded in that document, the embedded label takes precedence. Otherwise, the cached default label wins.
If the client instead has a cached default label for "http://w3.org/", its first query is again to "http://w3.org/pub/WWW/". If a PICS label of any type is found there, the protocol stops and that label takes precedence. If not, the client makes a second (and last) query to "http://w3.org/pub/". If a generic or default label that applies to the URL is found there, it wins; otherwise the cached default label wins. No query to "http://w3.org/" is needed, because the cached default already covers that level.
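The two cached-default cases differ only in how far up the chain the client still has to look: it stops as soon as it reaches the directory that the cached default already covers. A minimal Python sketch of that rule follows, again assuming a hypothetical `fetch_embedded(url)` callable and illustrative `Label`/`ancestors` helpers, none of which are defined by PICS.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Label:
    kind: str     # "specific", "generic" or "default" (hypothetical encoding)
    for_url: str  # exact URL for a specific label, URL prefix otherwise
    rating: int   # a single hypothetical rating value


def ancestors(url: str) -> List[str]:
    """Directory URLs above `url`, nearest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return [
        urlunsplit((parts.scheme, parts.netloc, "/" + "".join(s + "/" for s in segments[:i]), "", ""))
        for i in range(len(segments) - 1, -1, -1)
    ]


def resolve_with_cached_default(
    url: str, cached: Label, fetch_embedded: Callable[[str], List[Label]]
) -> Tuple[Label, int]:
    """Return (winning label, number of queries sent), given a cached default covering `url`."""
    if not url.startswith(cached.for_url):
        raise ValueError("cached default does not cover this URL")
    queries = 1
    own = fetch_embedded(url)
    if own:
        return own[0], queries  # any label embedded in the document takes precedence
    for parent in ancestors(url):
        # `parent` and `cached.for_url` are both prefixes of `url`, so length orders them.
        if len(parent) <= len(cached.for_url):
            break  # reached the directory the cached default came from: stop querying
        queries += 1
        for lab in fetch_embedded(parent):
            if lab.kind in ("generic", "default") and url.startswith(lab.for_url):
                return lab, queries
    return cached, queries


def no_labels(url: str) -> List[Label]:
    return []  # a server on which no document embeds a label


print(resolve_with_cached_default("http://w3.org/pub/WWW/", Label("default", "http://w3.org/pub/", 1), no_labels)[1])  # 1
print(resolve_with_cached_default("http://w3.org/pub/WWW/", Label("default", "http://w3.org/", 1), no_labels)[1])      # 2
```

The query counts match the two cases above: one query with a cached default for "http://w3.org/pub/", two with a cached default for "http://w3.org/".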
It is worth noting that the client can also cache "no PICS label found at this URL" information to speed up later queries. In the previous example, if both queries fail to find a label, the client can record "http://w3.org/pub/" and "http://w3.org/pub/WWW/" as "no label in this directory", so that when it is later asked about "http://w3.org/pub/WWW/hello.html", it needs to make only one query to the server instead of three (one each for the document, "http://w3.org/pub/WWW/" and "http://w3.org/pub/").
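The negative cache can be folded into a small client object. In this hypothetical Python sketch, `LabelClient`, its `defaults` and `no_label` attributes and the `fetch_embedded(url)` callable are all illustrative names; as a simplification, "no label in this directory" is recorded per exact URL that returned no embedded label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Label:
    kind: str     # "specific", "generic" or "default" (hypothetical encoding)
    for_url: str  # exact URL for a specific label, URL prefix otherwise
    rating: int   # a single hypothetical rating value


def ancestors(url: str) -> List[str]:
    """Directory URLs above `url`, nearest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return [
        urlunsplit((parts.scheme, parts.netloc, "/" + "".join(s + "/" for s in segments[:i]), "", ""))
        for i in range(len(segments) - 1, -1, -1)
    ]


class LabelClient:
    """Hypothetical client caching default labels and "no label at this URL" results."""

    def __init__(self, fetch_embedded: Callable[[str], List[Label]]):
        self.fetch_embedded = fetch_embedded
        self.defaults: Dict[str, Label] = {}  # `for` prefix -> cached default label
        self.no_label: Set[str] = set()       # URLs known to carry no embedded label
        self.queries = 0                      # requests actually sent to the server

    def _fetch(self, url: str) -> List[Label]:
        if url in self.no_label:
            return []  # answered from the negative cache; nothing is sent
        self.queries += 1
        labels = self.fetch_embedded(url)
        if not labels:
            self.no_label.add(url)
        return labels

    def resolve(self, url: str) -> Optional[Label]:
        own = self._fetch(url)
        if own:
            return own[0]
        covering = [d for prefix, d in self.defaults.items() if url.startswith(prefix)]
        cached = max(covering, key=lambda d: len(d.for_url), default=None)
        for parent in ancestors(url):
            # Parents and cached.for_url are both prefixes of url, so length orders them.
            if cached is not None and len(parent) <= len(cached.for_url):
                break
            for lab in self._fetch(parent):
                if lab.kind in ("generic", "default") and url.startswith(lab.for_url):
                    if lab.kind == "default":
                        self.defaults[lab.for_url] = lab  # remember it for later lookups
                    return lab
        return cached


site = {"http://w3.org/": [Label("default", "http://w3.org/", 1)]}
client = LabelClient(lambda u: site.get(u, []))
client.defaults["http://w3.org/"] = site["http://w3.org/"][0]  # cached, as in the previous example
client.resolve("http://w3.org/pub/WWW/")  # two queries, both find nothing
before = client.queries
client.resolve("http://w3.org/pub/WWW/hello.html")
print(client.queries - before)  # 1
```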
6 Suggested Changes to the Current Specification
7 Additional Questions
Appendix A: Glossary
[to be filled out]