by Tim Krauskopf (timk@spyglass.com), Jim Miller (jmiller@w3.org), Paul Resnick (presnick@research.att.com), and Win Treese (treese@OpenMarket.com)
Revision 1, DRAFT 12 Last modified on Sun. Nov. 19, 1995
This document has been prepared for the technical subcommittee of PICS (Platform for Internet Content Selection). It defines a general format for labels that permits them to be embedded in RFC-822-style headers. It defines three methods by which PICS labels may be transmitted: in a document, along with a document, and separately from a document, via a PICS label bureau.
A label consists of a service identifier, label options, and a rating. The service identifier is the URL chosen by the rating service (see Rating Services and Rating Systems) as its unique identifier. Label options give additional properties of the document being rated and of the rating itself, such as the time the document was rated. The rating itself is a set of attribute-value pairs that describe a document along one or more dimensions. One or more labels may be distributed together as a list. The general form for a label list (formatted for presentation, and not showing error status codes) is:
(PICS-1.0 <service url> [option...]
   labels [option...] ratings (<category> <value> ...)
          [option...] ratings (<category> <value> ...)
          ...
   <service url> [option...]
   labels [option...] ratings (<category> <value> ...)
          [option...] ratings (<category> <value> ...)
          ...
   ...)
Label options are as follows (some options can be abbreviated, as shown):
For example, a label that uses the example rating system from the document PICS Rating Services and Rating Systems might be as follows:
(PICS-1.0 "http://www.gcf.org" labels
   on "1994.11.05T08:15-0500"
   until "1995.12.31T23:59-0000"
   for "http://www.gcf.org/index.html"
   by "John Patrick"
   ratings (suds 0.5 density 0 color/hue 1))
The same label may be transmitted more compactly by converting all of the line breaks and subsequent indentation characters into a single space, and by replacing the word "labels" with "l", "ratings" with "r", and long option names with their abbreviations. It may be compressed for transmission even further by moving all of the optional information to a separate document and referencing that document by a URL:
(PICS-1.0 "http://www.gcf.org" l full "http://www.gcf.org/labels/13242123" r (suds 0.5 density 0 color/hue 1))
Finally, the optional information may be omitted entirely, reducing the information content of the label but making the transmission even smaller. The resulting label would then be:
(PICS-1.0 "http://www.gcf.org" l r (suds 0.5 density 0 color/hue 1))
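To make the relationship between the long and compact forms concrete, here is a minimal Python sketch that assembles a label list holding a single label from a single service. The function `format_label` and the `ABBREVIATIONS` table are illustrative names, not part of PICS; option values are passed through verbatim, so quoted values must carry their own double quotes.

```python
# Long option names and the abbreviations given in the label grammar.
ABBREVIATIONS = {"complete-label": "full", "generic": "gen",
                 "MIC-md5": "md5", "until": "exp"}


def format_label(service_url, options, ratings, compact=False):
    """Build a one-line PICS-1.0 label list holding a single label.

    options: (name, value) pairs; each value is emitted verbatim, so quoted
             values must already include their double quotes.
    ratings: (transmit-name, value) pairs, e.g. ("suds", 0.5).
    compact: use "l", "r" and the abbreviated option names.
    """
    parts = ["(PICS-1.0", f'"{service_url}"', "l" if compact else "labels"]
    for name, value in options:
        if compact:
            name = ABBREVIATIONS.get(name, name)
        parts += [name, str(value)]
    parts.append("r" if compact else "ratings")
    body = " ".join(f"{name} {value}" for name, value in ratings)
    return " ".join(parts) + f" ({body}))"
```

With `compact=True` and a single `complete-label` option, the sketch produces the second form shown above; with `compact=True` and no options, it produces the last one.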
The following grammar, in modified BNF, describes the syntax of labels. The methods by which labels are embedded in specific protocols are detailed below.
Notes:
labellist :: '(' 'PICS-1.0' service-info+ ')'
service-info :: 'error' '(' 'no-ratings' explanation* ')'
              | serviceID service-error
              | serviceID option* labelword label*
serviceID :: quotedURL
labelword :: 'labels' | 'l'
label :: label-error | single-label | '(' single-label* ')'
single-label :: option* ratingword '(' rating+ ')'
ratingword :: 'ratings' | 'r'
quotedURL :: '"' URL '"'
    as described and extended in Rating Services and Rating Systems.
option :: 'at' quoted-ISO-date
        | 'by' quotedname
        | 'comment' quotedname
        | 'complete-label' quotedURL | 'full' quotedURL
        | 'extension' '(' mand/opt quotedURL data* ')'
        | 'generic' boolean | 'gen' boolean
        | 'for' quotedURL
        | 'MIC-md5' "base64-string" | 'md5' "base64-string"
        | 'on' quoted-ISO-date
        | 'signature-PKCS' "base64-string"
        | 'until' quoted-ISO-date | 'exp' quoted-ISO-date
mand/opt :: 'optional' | 'mandatory'
data :: quoted-ISO-date | quotedURL | number | quotedname | '(' data* ')'
quoted-ISO-date :: '"' YYYY '.' MM '.' DD 'T' hh ':' mm S tz '"'
    based on the ISO 8601:1988 date and time standard, restricted to the
    specific form described here:
YYYY :: four-digit year
MM :: two-digit month (01=January, etc.)
DD :: two-digit day of month (01 through 31)
hh :: two digits of hour (00 through 23) (am/pm NOT allowed)
mm :: two digits of minute (00 through 59)
S :: sign of time zone offset from UTC ('+' or '-')
tz :: four-digit amount of offset from UTC (e.g., 1512 means 15 hours and 12 minutes)
For example, "1994.11.05T08:15-0500" is a valid quoted-ISO-date denoting November 5, 1994, 8:15 am, US Eastern Standard Time.
Note: The ISO standard allows considerably greater flexibility than that described here. PICS requires precisely the syntax described here: neither the time nor the time zone may be omitted, none of the alternate formats are permitted, and the punctuation must be as specified here.
rating :: transmit-name number | transmit-name '(' multi-value* ')'
multi-value :: number | number ':' number
transmit-name :: [1*n]alphanumpm ['/' transmit-name]
number :: [sign]unsignedint['.' [unsignedint]]
sign :: '+' | '-'
unsignedint :: [1*n][0-9]
quotedname :: '"' [1*n]extendedalphanum '"'
alphanumpm :: 'A' | ... | 'Z' | 'a' | ... | 'z' | '0' | ... | '9' | '+' | '-'
extendedalphanum :: alphanumpm | '.' | ' ' | ',' | ';' | ':' | '&' | '=' | '?' | '!' | '*' | '~' | '@' | '#'
base64-string :: as defined in RFC-1521.
service-error :: 'error' '(' 'request-denied' explanation* ')'
               | 'error' 'service-unavailable'
label-error :: 'error' '(' 'request-denied' [quotedURL explanation*] ')'
             | 'error' '(' 'not-labeled' quotedURL* ')'
explanation :: quotedname
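The quoted-ISO-date rule is strict enough to check with a single regular expression. The following Python sketch (the name `parse_quoted_iso_date` is hypothetical) accepts only the exact form described above and converts it into a timezone-aware `datetime`, relying on `datetime` itself to reject out-of-range fields such as hour 24 or month 13.

```python
import re
from datetime import datetime, timedelta, timezone

# Exactly '"' YYYY '.' MM '.' DD 'T' hh ':' mm S tz '"'; nothing may be omitted.
_PICS_DATE = re.compile(
    r'"([0-9]{4})\.([0-9]{2})\.([0-9]{2})T([0-9]{2}):([0-9]{2})([+-])([0-9]{2})([0-9]{2})"'
)


def parse_quoted_iso_date(text):
    """Parse a PICS quoted-ISO-date, including its surrounding double quotes."""
    m = _PICS_DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a PICS quoted-ISO-date: {text!r}")
    year, month, day, hour, minute = (int(g) for g in m.group(1, 2, 3, 4, 5))
    # tz is hhmm: e.g. 0500 means 5 hours, 1512 means 15 hours 12 minutes.
    offset = timedelta(hours=int(m.group(7)), minutes=int(m.group(8)))
    if m.group(6) == "-":
        offset = -offset
    # datetime() raises ValueError for out-of-range fields.
    return datetime(year, month, day, hour, minute, tzinfo=timezone(offset))
```

For the example date above, the result corresponds to 13:15 UTC on November 5, 1994.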
A labellist is used to transmit a set of PICS labels. The format specified here is intended to be registered with IANA as the MIME type "application/pics-labels". It allows for transmission of both labels and reasons why labels are not available, and is the format used when labels must be conveyed in a document, along with a document, or from a PICS label bureau. The labellist is always surrounded by parentheses and begins with the PICS version number (1.0 in this specification).
A label list either specifies that there are no labels available at all ("error (no-ratings ...)") or is separated into sections of labels, one section for each rating service. The URL of each service must be specified (the serviceID). This is either followed by an error message indicating why no labels are available from that service (service-error) or an overall set of optional information (option*) followed by the keyword "labels" (or "l") and the labels from the service. The optional information provided here applies to every label from the service, unless overridden in the specific label itself.
A label encompasses three separate cases. The first is an error that applies to retrieving the label for a particular URL (label-error). The second, and most common, is a single-label consisting of options (which override those specified with the service), the marker word "ratings" (or "r") and the ratings themselves (a list of category names and values). Finally, in the special case where the ratings for an entire tree of documents have been requested, any number of single-labels can be transmitted, enclosed in parentheses. This case is described in more detail in the section on "Requesting Labels Separately."
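Because a labellist is a parenthesized expression, a client can read it in two steps: first into nested lists, then into per-service sections. The Python sketch below illustrates this under stated simplifications: `read_sexpr`, `split_services`, and `Quoted` are hypothetical names; the reader checks only that parentheses balance, not the full grammar; and a quoted string is taken to start a new service whenever it does not follow an option keyword that takes a value. A label list consisting only of an overall error yields no sections.

```python
import re


class Quoted(str):
    """A double-quoted string from the label text, with the quotes removed."""


# One token: "(", ")", a double-quoted string, or a bare word such as
# PICS-1.0, labels, color/hue or 0.5:2.5.
_TOKEN = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')

# Options whose next item is their value, per the option grammar.
_VALUED_OPTIONS = {"at", "by", "comment", "complete-label", "full", "for",
                   "MIC-md5", "md5", "on", "signature-PKCS", "until", "exp"}


def read_sexpr(text):
    """Read one parenthesized expression into nested Python lists."""
    stack, pos, text = [[]], 0, text.strip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        open_paren, close_paren, quoted, word = m.groups()
        if open_paren:
            stack.append([])
        elif close_paren:
            if len(stack) < 2:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif quoted is not None:
            stack[-1].append(Quoted(quoted))
        else:
            stack[-1].append(word)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one balanced expression")
    return stack[0][0]


def _is_valued_option(item):
    return (isinstance(item, str) and not isinstance(item, Quoted)
            and item in _VALUED_OPTIONS)


def split_services(labellist):
    """Split a read label list into (serviceID, items) sections, in order."""
    if not labellist or labellist[0] != "PICS-1.0":
        raise ValueError("not a PICS-1.0 label list")
    sections, prev = [], None
    for item in labellist[1:]:
        # A quoted string that is not an option's value begins a new service.
        if isinstance(item, Quoted) and not _is_valued_option(prev):
            sections.append((str(item), []))
        elif sections:
            sections[-1][1].append(item)
        prev = item
    return sections
```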
A label may apply to a specific URL, or it may be generic. A generic label implicitly rates every URL for which the specified one is a prefix. For example, a generic label for the URL "http://www.gcf.org" implicitly rates every document available at that site. A regular (non-generic) label for the same URL, "http://www.gcf.org", does not give any implicit ratings: it merely rates the organization's home page that is fetched by the command "GET /" sent by HTTP to the host www.gcf.org. A generic label must include the "for" option specifying the URL to which it applies.
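The coverage rule can be sketched as follows. `label_covers` is a hypothetical helper: it applies the literal prefix test described above for generic labels, and for regular labels it treats "http://www.gcf.org" and "http://www.gcf.org/" as the same home page.

```python
from urllib.parse import urlsplit, urlunsplit


def _with_root_path(url):
    # "http://www.gcf.org" and "http://www.gcf.org/" both name the page
    # fetched by "GET /", so give an empty path the value "/".
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path or "/"))


def label_covers(label_url, generic, document_url):
    """Return True if a label whose "for" URL is label_url rates document_url."""
    if generic:
        # A generic label rates every URL that has label_url as a prefix.
        return document_url.startswith(label_url)
    # A regular label rates only the document named by label_url itself.
    return _with_root_path(document_url) == _with_root_path(label_url)
```

A literal prefix test also matches, for example, "http://www.gcf.org.example.net/". The text above does not say whether prefixes should be compared on host and path boundaries, so an implementation may want to decide that explicitly.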
When a multi-value is provided, any combination of numbers and ranges of numbers may be specified, with the endpoints of a range separated by a ":". Thus, in the labellist
(PICS-1.0 "http://www.gcf.org" l r (suds 0.5 density 0 color/hue 1 subject (0.5:2.5 3)))
all subject values between 0.5 and 2.5 (including both endpoints) apply to the item, as does the subject value 3. Given the example service description in Rating Services and Rating Systems, all three document subjects apply: "soap," "water," and "soapdish."
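A multi-value can be evaluated by turning each entry into a closed interval, with a single number treated as a zero-width interval. This Python sketch (the function names are hypothetical) checks values against the subject multi-value in the example above.

```python
def parse_multi_values(items):
    """Turn multi-value strings such as ["0.5:2.5", "3"] into closed intervals."""
    intervals = []
    for item in items:
        low, sep, high = item.partition(":")
        # A lone number n becomes the zero-width interval [n, n].
        intervals.append((float(low), float(high if sep else low)))
    return intervals


def value_applies(value, intervals):
    """True if value equals a listed number or lies in a range, endpoints included."""
    return any(low <= value <= high for low, high in intervals)
```

For `parse_multi_values(["0.5:2.5", "3"])`, the values 0.5, 2.5, and 3 all apply, while 2.75 does not.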
Many protocols, such as Internet electronic mail, the HyperText Transfer Protocol, and USENET News, use ASCII headers as described in RFC-822. For use in such protocols, we define a new header, PICS-Label, used to contain the labels described in this document. The syntax is:
PICS-Label: <labellist>
where labellist is described according to the syntax above. Continuation lines beginning with whitespace may be used following the specification given in RFC-822.
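A recipient has to undo RFC-822 line folding before parsing the label list. The sketch below (`pics_label_values` is a hypothetical name) joins each continuation line to the preceding header with a single space, in the same way as the compaction described earlier, and returns the values of all PICS-Label headers, matching header names case-insensitively. It is a simplification; a production client would normally rely on its HTTP or mail library's header parser.

```python
def pics_label_values(raw_headers):
    """Return the unfolded values of every PICS-Label header in raw_headers."""
    headers, current = [], None
    for line in raw_headers.splitlines():
        if not line:
            break  # an empty line ends the header section
        if line[:1] in (" ", "\t") and current is not None:
            # Continuation line: join it to the previous header with one space.
            current[1] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        current = [name.strip(), value.strip()] if sep else None
        if current is not None:
            headers.append(current)
    # Header names are case-insensitive in RFC-822.
    return [value for name, value in headers if name.lower() == "pics-label"]
```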
Labels may be embedded in HTML files as meta-information, using the META element defined in the HTML specification. This embedding uses the HTTP header equivalency mechanism:
<META http-equiv="PICS-Label" content='labellist'>
Note that the content attribute uses single quotes, because the PICS label syntax uses double quotes. Any of the following characters appearing within the content must be escaped using SGML entities:
'    &#39;    /* single quote */
&    &amp;    /* ampersand */
>    &gt;     /* greater than */
See the HTML 2.0 Proposed Standard.
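The escaping is order-sensitive: ampersands must be replaced first, or the entities introduced for the other characters would be escaped a second time. A minimal Python sketch, assuming the numeric reference `&#39;` for the single quote; the helper name `pics_meta_element` is hypothetical.

```python
# Order matters: "&" is replaced first so that the "&" characters introduced
# by the other entities are not escaped a second time.
_META_ESCAPES = [("&", "&amp;"), ("'", "&#39;"), (">", "&gt;")]


def pics_meta_element(labellist):
    """Wrap a labellist in a META element, escaping the characters listed above."""
    content = labellist
    for char, entity in _META_ESCAPES:
        content = content.replace(char, entity)
    return "<META http-equiv=\"PICS-Label\" content='" + content + "'>"
```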
When an HTTP server sends a document to a client, it sends additional headers as well. We specify how the client can request that one or more labels be included in a header. HTTP servers should include PICS label headers only if requested to do so by the client, and should only include the labels from services requested by the client.
Client sends to HTTP server www.greatdocs.com:
GET /foo.html HTTP/1.0
Accept-Protocol: {PICS-1.0 {params full {services "http://www.gcf.org/ratings"}}}
Server responds to client:
HTTP/1.0 200 OK
Date: Friday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: {PICS-1.0 {headers PICS-Label}}
PICS-Label: (PICS-1.0 "http://www.gcf.org/ratings" labels
    on "1994.11.05T08:15-0500"
    exp "1995.12.31T23:59-0000"
    for "http://www.greatdocs.com/foo.html"
    by "George Sanderson, Jr."
    ratings (suds 0.5 density 0 color/hue 1))
Content-type: text/html

...contents of foo.html...
The client requests the document foo.html. In addition, the client requests the full label of the document from the rating service "http://www.gcf.org/ratings". The server responds by sending back the label, in the PICS-Label header, as well as the document. The format of the PICS-Label header field (a labellist) allows the server to respond either with a label or an explanation of why the label is not available, since it would be inappropriate for the server to generate an HTTP error status if the document is available but (some of) the labels are not.
Following the usual HTTP distinction between HEAD and GET, a client that wishes to examine a rating before retrieving the full document can substitute the word HEAD for GET in the request. The server responds with exactly the headers shown above, but does not send back the document foo.html.
The following grammar, in modified BNF, describes the syntax of the additional header line to be included in an HTTP request for a document and associated labels.
accept-header :: 'Accept-Protocol: {PICS-1.0 {params ' [completeness] extension* services '}}'
completeness :: 'minimal' | 'short' | 'full' | 'signed'
extension :: '{' token-or-quoted-string+ '}'
    where the first token-or-quoted-string is not 'services'.
token-or-quoted-string :: token | quotedname
token :: [1*n]alphanumpm
services :: '{' 'services' quotedURL+ '}'
A request for a minimal label asks that all options be omitted, unless a generic label is returned, in which case the generic and for options must also be included in the label. A short label includes everything that is included in a minimal label, plus additional options that the server deems appropriate. A request for a full label asks that as much information as possible be sent back in the label, either directly or through a complete-label (or full) option; a signature-PKCS option is not needed.
A request for signed labels asks that all the information in a full label should be sent, along with a digital signature on the label itself. In a signed label the information must be transmitted directly as part of the label (and included in the computation of the signature); the complete-label (or full) option may be sent, but it would be redundant. Details of signing labels are included in the section MICs and Digital Signature.
A server may ignore the requested completeness, delivering either more or fewer options than requested. If the completeness is omitted, it should be treated as though minimal had been supplied. For future extensibility, any alphanumeric string may be used as a value of the completeness option. Servers that receive a completeness value they do not recognize must treat it as though minimal had been specified.
The extensions are for future extensions to the protocol; any extensions which are not understood by the server must be ignored by it. It is recommended that experimental extensions use a URL, which dereferences to a description of the extension, as the initial token-or-quoted-string.
Each service specifies a rating service from which the client is requesting a label for the document. There may be as many repetitions of the service part of the query as desired.
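A server needs to extract the completeness and the requested services from the Accept-Protocol value, applying the defaulting rules above: a missing or unrecognized completeness is treated as minimal, and unknown extensions are ignored. The Python sketch below does this; `parse_accept_protocol` is a hypothetical name, and it treats any `{...}` group whose first word is not `services` as an extension.

```python
import re

# One token: "{", "}", a double-quoted string, or a bare token.
_TOKEN = re.compile(r'\s*(?:(\{)|(\})|"([^"]*)"|([^\s{}"]+))')
KNOWN_COMPLETENESS = {"minimal", "short", "full", "signed"}


def _read_braces(text):
    """Read one {...} group into nested Python lists of strings."""
    stack, pos, text = [[]], 0, text.strip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        open_brace, close_brace, quoted, token = m.groups()
        if open_brace:
            stack.append([])
        elif close_brace:
            if len(stack) < 2:
                raise ValueError("unbalanced '}'")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(quoted if quoted is not None else token)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one balanced {...} group")
    return stack[0][0]


def parse_accept_protocol(value):
    """Return (completeness, services) from an Accept-Protocol header value."""
    tree = _read_braces(value)
    if not isinstance(tree, list) or tree[:1] != ["PICS-1.0"]:
        raise ValueError("not a PICS-1.0 request")
    params = next((g for g in tree[1:]
                   if isinstance(g, list) and g[:1] == ["params"]), None)
    if params is None:
        raise ValueError("missing {params ...} group")
    completeness, services = "minimal", []  # omitted completeness -> minimal
    for item in params[1:]:
        if isinstance(item, str):
            # Unrecognized completeness values are treated as minimal.
            completeness = item if item in KNOWN_COMPLETENESS else "minimal"
        elif item[:1] == ["services"]:
            services.extend(item[1:])
        # Any other {...} group is an extension; unknown ones are ignored.
    return completeness, services
```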
Two additional headers are specified:
protocol-header :: 'Protocol: {PICS-1.0 {headers PICS-Label}}'
label-header :: 'PICS-Label: ' labellist
PICS labels can also be retrieved separately from the documents to which they refer. To request labels in this way, a client contacts a label bureau. A label bureau is an HTTP server that understands a particular query syntax, defined below. It can provide labels for documents that reside on other servers, and, indeed, for documents available through protocols other than HTTP. It is anticipated that there will be "well-known" label bureaus which dispense (possibly for a fee) labels created by many rating services.
Rating services are also encouraged to act as label bureaus, providing on-line access to their own labels. By default, the URL that identifies a rating service also identifies its label bureau. If a client requests the URL that identifies a rating service, a human-readable description of the service is returned, as specified in Rating Services and Rating Systems. If, on the other hand, a client requests the same URL and includes query parameters as defined below, it should be interpreted as a request for labels. A rating service, however, is not required to act as a label bureau, and it may choose a different URL (perhaps even on a different HTTP server) to act as its label bureau.
Imagine a rating service, identified by the URL http://www.labels.org/Ratings, which decides to run a label bureau to dispense (at least) its own labels for documents. The following sample request, made to the HTTP server www.labels.org, is illustrative (line breaks are inserted for presentation purposes only):
GET /Ratings?opt=generic&
    u="http%3A%2F%2Fwww.questionable.org%2Fimages"&
    s="http%3A%2F%2Fwww.gcf.org%2Fratings" HTTP/1.0
The query asks the label bureau http://www.labels.org/Ratings to send a single label that applies to everything in the images directory at site www.questionable.org. The desired label should have been created by the service http://www.gcf.org/ratings. Notice the use of %3A to represent ":" and %2F to represent "/". RFC-1738 requires these characters to be encoded when a URL is included within another URL.
The label bureau responds by sending back a document of type "application/pics-labels." The labels should be as complete as possible, either by including as many options as possible or by supplying the complete-label (or full) option.
The following grammar, in modified BNF, describes the syntax of the GET request to a label bureau:
get :: 'get' url-fragment '?' [opt] [format] extension* url+ service+
url-fragment :: the part of the original URL after the host name, as specified in HTTP 1.0.
opt :: 'opt=' option
option :: 'generic' | 'normal' | 'tree' | 'generic+tree'
format :: [and] 'format=' form
form :: 'minimal' | 'short' | 'full' | 'signed'
extension :: [and] token '=' token-or-quoted-string
    where the token is not one of opt, format, u, or s; and
    token-or-quoted-string follows the quoting conventions specified in RFC-1738.
token-or-quoted-string :: token | quotedname
token :: [1*n]alphanumpm
url :: [and] 'u=' encodedURL
service :: [and] 's=' encodedURL
boolean :: 't' | 'f' | 'true' | 'false'
and :: '&'
    This must be included unless it immediately follows the ? in the query.
encodedURL :: a URL, with quotation as required for inclusion within another URL.
    According to RFC-1738, quotation is done using "%xx" notation. Alphabetic
    characters, digits, and the special characters $_-.+!*'(), need not be
    quoted, but other characters must be. This implies that the colon (:)
    must be encoded as %3A and the slash (/) as %2F.
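The encodedURL rule can be implemented directly rather than with a general-purpose URL quoting function, whose set of unquoted characters may differ from the RFC-1738 list above. In this Python sketch, `encode_url` and `bureau_request_line` are hypothetical names. Following the sample request earlier in this section, each encoded URL is wrapped in double quotes, although the grammar's encodedURL rule does not show them.

```python
# Besides letters and digits, the encodedURL rule lets these characters
# appear unquoted.
_UNQUOTED = set("$-_.+!*'(),")


def encode_url(url):
    """Quote a URL for inclusion in another URL, using %xx notation."""
    out = []
    for byte in url.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and (ch.isalnum() or ch in _UNQUOTED):
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


def bureau_request_line(path, urls, services, opt=None, form=None):
    """Build the request line for a label bureau query (one line, no breaks)."""
    params = []
    if opt is not None:
        params.append(f"opt={opt}")
    if form is not None:
        params.append(f"format={form}")
    # As in the sample request, each encoded URL is wrapped in double quotes.
    params += [f'u="{encode_url(u)}"' for u in urls]
    params += [f's="{encode_url(s)}"' for s in services]
    return f"GET {path}?{'&'.join(params)} HTTP/1.0"
```

Called with the path "/Ratings", the images URL, the service "http://www.gcf.org/ratings", and `opt="generic"`, it reproduces the sample request as a single line.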
Notes:
The label bureau responds by sending back a document of type "application/pics-labels." Unless the document indicates an overall error, there should be one service-info for each rating service requested in the query. Each service-info should have an error message or a label (or list of labels, in the case of a "tree" query) for each requested URL.
The query's ordering must be preserved in the response. That is, the information from the rating services must be presented in the same order the rating services appear in the query, and the labels from each service must be presented in the same order the URLs appear in the query. If a rating service or label is not provided, the error message should appear in the same position that the service-info or label would appear. Because order is preserved, the "for" option, which indicates the URL being rated, may be omitted from the labels (unless the label is generic, in which case, as always for generic labels, the "for" option is required). The client should match each label positionally with the URL for which it requested a rating.
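Positional matching can be sketched as follows. The function `match_response` and its input shape are assumptions for illustration: `sections` holds one (serviceID, entries) pair per service in response order, where `entries` is either a list with one label or label-error per requested URL, or a single string standing for a service-error that covers every URL.

```python
def match_response(services, urls, sections):
    """Pair each (service, url) request with the entry at the same position.

    sections: (serviceID, entries) pairs in response order, where entries is
    a list with one label or label-error per requested URL, or a single
    string standing for a service-error (a hypothetical representation).
    """
    if [service_id for service_id, _ in sections] != list(services):
        raise ValueError("services are missing or out of order")
    matched = {}
    for service, (_, entries) in zip(services, sections):
        if isinstance(entries, str):
            # A service-error applies to every requested URL.
            entries = [entries] * len(urls)
        if len(entries) != len(urls):
            raise ValueError(f"{service}: expected {len(urls)} entries")
        for url, entry in zip(urls, entries):
            matched[(service, url)] = entry
    return matched
```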
In response to a request for a generic label, only a generic label may be returned. In response to a request for a regular label, a generic label for a URL that is a prefix of the requested URL may be returned. For example, in response to a label request for URL "http://www.gcf.org/index.html" a generic label for the URL "http://www.gcf.org" may be returned. In this case, it is required that the "for" and "generic" options be included in the label, to specify exactly what rating is being returned.
For a tree request, all the labels sent in response to a particular URL are enclosed in parentheses, so the client can match them positionally with the single request URL. The "for" option must be included in such labels to specify exactly which URLs the labels apply to.
This section remains to be specified. There are three particular difficulties that must be addressed:
Brenda Baker, AT&T
Tim Berners-Lee, W3C
Roxana Bradescu, AT&T
Daniel W. Connolly, W3C
Roy Fielding, W3C
Jay Friedland, SurfWatch
Michael Gordon, Prodigy
Wayne Gramlich, Sun
Woodson Hobbs, NewView
Rohit Khare, W3C
Charlie Kim, Apple
John C. Klensin, MCI
Ann McCurdy, Microsoft
Rich Petke, CompuServe
Dave Raggett, W3C
Bob Schloss, IBM
David Singer, IBM
Michael Smith, Prodigy
Marcy Swenson, Providence Systems
Jason Thomas, MIT
This section is not expected to be contained in future versions of this document.
Instead of extending HTTP, we considered proposals for special-purpose label transport protocols. Before making a final decision, we constructed the following lists of pros and cons.
This section is not expected to be contained in future versions of this document.
The PICS premise is that all compliant clients will have to implement some new protocol. The subset of HTTP which would be required for obtaining a PICS label can be minimal. HTTP will be no more difficult to implement in an FTP (or other) client than a brand-new protocol that provides similar features.
If you minimize the server's response to one line, plus the label information, you are already dealing with the minimum amount of data transfer possible to obtain a label. In addition, most performance issues for PICS will probably be addressed with caching, not by reducing lookup time for a single label. Caching optimization requires meta-data which can be easily encoded within HTTP headers.
The worst risk seems to be that HTTP could be upgraded to a new revision level, forcing some HTTP implementations to support multiple versions (1.0 and 2.0, for example) or forcing some PICS bureaus to update their protocol as well. Hopefully a major update to HTTP would bring enough benefits for PICS to make any update worthwhile.
PICS faces many of the same problems that face the security and electronic payment community. In PICS the issue revolves around the ability for the client to tell the server from which rating services it would like to have labels. This is a simple negotiation problem of the kind PEP was designed to solve. Rather than invent an orthogonal mechanism it seemed best to use one that is already being proposed and investigated.