PICS Label Distribution

Label Syntax and Communication Protocols

by Tim Krauskopf (timk@spyglass.com), Jim Miller (jmiller@w3.org), Paul Resnick (presnick@research.att.com), and Win Treese (treese@OpenMarket.com)

Revision 1, DRAFT 12. Last modified on Sun., Nov. 19, 1995

Overview

This document has been prepared for the technical subcommittee of PICS (Platform for Internet Content Selection). It defines a general format for labels that permits them to be embedded in RFC-822-style headers. It defines three methods by which PICS labels may be transmitted:

In a document: One or more labels may be embedded in a document. We specify the format and note in particular how to use a META tag to embed labels in HTML documents.
With a document: An HTTP client can request that labels be sent along with a document. An HTTP server can satisfy the request by sending the labels in RFC-822-style headers.
Separately: A client can request labels from a "label bureau" that runs the HTTP protocol. The labels may refer to items available through protocols other than HTTP, such as ftp, gopher, or netnews. The simplest implementation of a label bureau is an off-the-shelf HTTP server running a special CGI script.

General Format

A label consists of a service identifier, label options, and a rating. The service identifier is the URL chosen by the rating service (see Rating Services and Rating Systems) as its unique identifier. Label options give additional properties of the document being rated and of the rating itself, such as the time the document was rated. The rating itself is a set of attribute-value pairs that describe a document along one or more dimensions. One or more labels may be distributed together as a list. The general form for a label list (formatted for presentation, and not showing error status codes) is:

         (PICS-1.0
           <service url> [option...] 
           labels [option...] ratings (<category> <value> ...)
                  [option...] ratings (<category> <value> ...)
                  ...
           <service url> [option...] 
           labels [option...] ratings (<category> <value> ...)
                  [option...] ratings (<category> <value> ...)
                  ...
           ...)

Label options are as follows (some options can be abbreviated, as shown):

at quoted-ISO-date: The last modification date of the item to which this rating applies, at the time the rating was assigned. This can serve as a less expensive, but less reliable, alternative to the message integrity check (MIC) options.
by quotedname: An identifier for the person or entity within the rating service who is responsible for this particular label.
comment quotedname: Information for humans who may see the label; no associated semantics.
complete-label quotedURL
full quotedURL: Dereferencing this URL returns a complete label that can be used in place of the current one. The complete label has values for as many attributes as possible. This is used when a short label is transmitted for performance purposes but additional information is also available. When the URL is dereferenced it returns an item of type application/pics-labels that contains a labellist with exactly the one label.
extension (optional quotedURL data*)
extension (mandatory quotedURL data*): Future extension mechanism. To avoid duplication of extension names, each extension is identified by a quotedURL. The URL can be dereferenced to get a human-readable description of the extension. If the extension is optional then software which does not understand the extension can simply ignore it; if the extension is mandatory then software which does not understand the extension should act as though no label had been supplied. Each item of data must be one of a fixed set of simple-to-parse data types as specified in the detailed syntax below.
for quotedURL: The URL of the item to which this rating applies.
generic boolean
gen boolean: This label can be applied to any URL starting with the prefix given in the for option. This is used to supply ratings for entire sites or directories.
MIC-md5 "Base64-string"
md5 "Base64-string": A message integrity check (MIC) of the item being rated. The MD5 Message Digest Algorithm is used to compute the MIC. See RFC1321.
on quoted-ISO-date: The date on which this rating was issued.
signature-PKCS "Base64-string": An RSA digital signature encompassing the label as transmitted, signed by the rating service that issued the label. See MICs and Digital Signatures below.
until quoted-ISO-date
exp quoted-ISO-date: The date on which this rating expires.

Example

For example, a label that uses the example rating system from the document PICS Rating Services and Rating Systems might be as follows:

     (PICS-1.0 "http://www.gcf.org"
       labels on "1994.11.05T08:15-0500"
              until "1995.12.31T23:59-0000"
              for "http://www.gcf.org/index.html"
              by "John Patrick"
              ratings (suds 0.5 density 0 color/hue 1))

The same label may be transmitted more compactly by converting all of the line breaks and subsequent indentation characters into a single space, and by replacing the word "labels" with "l", "ratings" with "r" and long option names with their abbreviations. It may be compressed for transmission purposes even further by removing all of the optional information to a separate document and referencing that document by a URL:

     (PICS-1.0 "http://www.gcf.org" l
       full "http://www.gcf.org/labels/13242123"
       r (suds 0.5 density 0 color/hue 1))

Finally, the optional information may be omitted entirely, reducing the information content of the label but making the transmission even smaller. The resulting label would then be:

(PICS-1.0 "http://www.gcf.org" l r (suds 0.5 density 0 color/hue 1))
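The compaction steps described above can be sketched in Python. This is an illustrative sketch only, not part of the specification: the function name compact_label and the tokenizing regular expression are assumptions introduced here. It collapses whitespace outside quoted strings and substitutes the abbreviations listed in this document, leaving category names inside the ratings untouched.

```python
import re

# One token is a quoted string, a parenthesis, or a run of other characters.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')
ABBREVIATIONS = {"labels": "l", "ratings": "r", "complete-label": "full",
                 "generic": "gen", "mic-md5": "md5", "until": "exp"}


def compact_label(text):
    """Collapse whitespace and abbreviate keywords outside the ratings."""
    parts = []
    ratings_depth = 0          # > 0 while inside a ratings group
    after_ratingword = False   # True right after "ratings" or "r"
    for tok in TOKEN.findall(text):
        if ratings_depth:
            # Category names and values are copied unchanged.
            ratings_depth += (tok == "(") - (tok == ")")
        elif tok == "(" and after_ratingword:
            ratings_depth = 1
        elif not tok.startswith('"'):
            # Keywords and option names are case insensitive.
            tok = ABBREVIATIONS.get(tok.lower(), tok)
        after_ratingword = ratings_depth == 0 and tok.lower() == "r"
        # Single space between tokens, none just inside parentheses.
        if parts and tok != ")" and parts[-1] != "(":
            parts.append(" ")
        parts.append(tok)
    return "".join(parts)
```

Applied to the first example label above, the sketch yields a one-line label that begins (PICS-1.0 "http://www.gcf.org" l on ... and uses exp in place of until.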

Detailed Syntax

The following grammar, in modified BNF, describes the syntax of labels. The methods by which labels are embedded in specific protocols are detailed below.

Notes:

Whitespace is ignored except in quoted strings.
The string in a transmit-name is case insensitive. All other strings are case sensitive.
Option names ("on," "until," "at," etc.) are case insensitive.
This specification requires the use of US-ASCII. Note that the document PICS Rating Services and Rating Systems describes how a service can map the US-ASCII transmit-names to descriptive strings using other character sets.
An option that appears in the service-info applies to all labels in that service-info unless overridden by an option in a specific label. That is, a label is effectively lexically nested within the enclosing service-info for the purpose of understanding the applicable options. This is most likely to be useful in the case of the at, by, generic, until and experimental or future options.
Numbers in PICS labels may be integers or fractions with no greater range or precision than that provided by IEEE single-precision floating point numbers.
The multi-value syntax must be used when the value on a particular (multi-valued) scale has either zero or more than one value. It may be used for a single-valued or multi-valued field when there is exactly one value, but the more compact version may also be used in that case.
The only options that may occur more than once in a single label are comment and extension; if the extension option is supplied more than once, the quotedURLs defining the extensions must be distinct.

labellist :: '(' 'PICS-1.0' service-info+ ')'
service-info :: 'error' '(no-ratings' explanation* ')'
              | serviceID service-error | serviceID option* labelword label*
serviceID :: quotedURL
labelword :: 'labels' | 'l'
label :: label-error | single-label | '(' single-label* ')'
single-label :: option* ratingword '(' rating+  ')'
ratingword :: 'ratings' | 'r'
quotedURL :: '"' URL '"' as described and extended in
             Rating Services and Rating Systems.
option :: 'at' quoted-ISO-date        
        | 'by' quotedname
        | 'comment' quotedname        
        | 'complete-label' quotedURL | 'full' quotedURL
        | 'extension' '(' mand/opt quotedURL data* ')'
        | 'generic' boolean          | 'gen' boolean 
        | 'for' quotedURL
        | 'MIC-md5' "base64-string"  | 'md5' "base64-string"
        | 'on' quoted-ISO-date        
        | 'signature-PKCS' "base64-string"
        | 'until' quoted-ISO-date    | 'exp' quoted-ISO-date
mand/opt :: 'optional' | 'mandatory'
data :: quoted-ISO-date | quotedURL
        | number | quotedname | '(' data* ')'
quoted-ISO-date :: '"'YYYY'.'MM'.'DD'T'hh':'mmStz'"'
     based on the ISO 8601:1988 date and time standard, restricted
     to the specific form described here:
     YYYY :: four-digit year
     MM :: two-digit month (01=January, etc.)
     DD :: two-digit day of month (01 through 31)
     hh :: two digits of hour (00 through 23) (am/pm NOT allowed)
     mm :: two digits of minute (00 through 59)
     S  :: sign of time zone offset from UTC ('+' or '-')
     tz :: four digit amount of offset from UTC
           (e.g., 1512 means 15 hours and 12 minutes)
     For example, "1994.11.05T08:15-0500" is a valid quoted-ISO-date
     denoting November 5, 1994, 8:15 am, US Eastern Standard Time
     Note: The ISO standard allows considerably greater
     flexibility than that described here.  PICS requires precisely
     the syntax described here -- neither the time nor the time zone may
     be omitted, none of the alternate formats are permitted, and
     the punctuation must be as specified here.
rating :: transmit-name number | transmit-name '(' multi-value* ')'
multi-value :: number | number ':' number
transmit-name :: [1*n]alphanumpm ['/' transmit-name]
number :: [sign]unsignedint['.' [unsignedint]]
sign :: '+' | '-'
unsignedint :: [1*n][0-9]
quotedname :: '"' [1*n]extendedalphanum '"'
alphanumpm :: 'A' | ... | 'Z' | 'a' | ... | 'z' | '+' | '-'
extendedalphanum :: alphanumpm | '.' | ' ' | ',' | ';' | ':'
                   | '&' | '=' | '?' | '!' | '*' | '~' | '@' | '#'
base64-string :: as defined in RFC-1521.
service-error :: 'error' '(' 'request-denied' explanation* ')'
               | 'error' 'service-unavailable'
label-error :: 'error' '(' 'request-denied' [quotedURL explanation*] ')'
             | 'error' '(' 'not-labeled' quotedURL* ')'
explanation :: quotedname

Semantics of PICS Labels and Label Lists

A labellist is used to transmit a set of PICS labels. The format specified here is intended to be registered with IANA as the MIME type "application/pics-labels." It allows for transmission of both labels and reasons why labels are not available, and is the format used when labels must be conveyed in a document, along with a document, or from a PICS label bureau. The labellist will always be surrounded by parentheses and begin with the PICS version number (1.0 in this specification).

A label list either specifies that there are no labels available at all ("error (no-ratings ...)") or is separated into sections of labels, one section for each rating service. The URL of each service must be specified (the serviceID). This is either followed by an error message indicating why no labels are available from that service (service-error) or an overall set of optional information (option*) followed by the keyword "labels" (or "l") and the labels from the service. The optional information provided here applies to every label from the service, unless overridden in the specific label itself.

A label encompasses three separate cases. The first is an error that applies to retrieving the label for a particular URL (label-error). The second, and most common, is a single-label consisting of options (which override those specified with the service), the marker word "ratings" (or "r") and the ratings themselves (a list of category names and values). Finally, in the special case where the ratings for an entire tree of documents have been requested, any number of single-labels can be transmitted, enclosed in parentheses. This case is described in more detail in the section on "Requesting Labels Separately."
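The structure described above can be illustrated with a small recursive-descent reader, written here in Python. It is a sketch under stated assumptions, not a reference implementation: the name parse_labellist and the returned data layout are invented for this example, error entries (service-error, label-error, no-ratings) are not handled and cause a ValueError, and a repeated option such as comment simply overwrites the earlier one. It does show how options given with the service are inherited by each label unless overridden.

```python
import re

TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')
ALIASES = {"l": "labels", "r": "ratings", "full": "complete-label",
           "gen": "generic", "md5": "mic-md5", "exp": "until"}


def parse_labellist(text):
    """Return [(service_url, [label, ...]), ...]; each label is a dict
    holding merged 'options' and a 'ratings' dict."""
    toks = TOKEN.findall(text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        if pos >= len(toks):
            raise ValueError("unexpected end of label list")
        pos += 1
        return toks[pos - 1]

    def expect(tok):
        got = take()
        if got != tok:
            raise ValueError("expected %r, got %r" % (tok, got))

    def keyword():
        # Lower-cased, unabbreviated form of the next bare word, else None.
        tok = peek()
        if tok is None or tok in ("(", ")") or tok.startswith('"'):
            return None
        return ALIASES.get(tok.lower(), tok.lower())

    def group():
        # Tokens of one balanced parenthesised group, outer parens dropped.
        expect("(")
        items, depth = [], 1
        while True:
            tok = take()
            depth += (tok == "(") - (tok == ")")
            if depth == 0:
                return items
            items.append(tok)

    def options():
        opts = {}
        while keyword() not in (None, "labels", "ratings"):
            name = keyword()
            take()
            opts[name] = group() if peek() == "(" else take().strip('"')
        return opts

    def ratings():
        items, out, i = group(), {}, 0
        while i < len(items):
            if i + 1 >= len(items):
                raise ValueError("category %r has no value" % items[i])
            name = items[i].lower()        # transmit-names: case insensitive
            if items[i + 1] == "(":        # multi-value, kept as raw items
                end = items.index(")", i + 2)
                out[name] = items[i + 2:end]
                i = end + 1
            else:
                out[name] = float(items[i + 1])
                i += 2
        return out

    def single_label(inherited):
        opts = dict(inherited)             # service-level options ...
        opts.update(options())             # ... overridden by the label's own
        if keyword() != "ratings":
            raise ValueError("expected ratings, got %r" % peek())
        take()
        return {"options": opts, "ratings": ratings()}

    expect("(")
    expect("PICS-1.0")
    services = []
    while peek() != ")":
        url = take().strip('"')
        shared = options()
        if keyword() != "labels":
            raise ValueError("expected labels, got %r" % peek())
        take()
        labels = []
        while peek() not in (None, ")") and not peek().startswith('"'):
            if peek() == "(":              # tree case: '(' single-label* ')'
                take()
                while peek() != ")":
                    labels.append(single_label(shared))
                take()
            else:
                labels.append(single_label(shared))
        services.append((url, labels))
    expect(")")
    return services
```

In this sketch a label list with several services yields one tuple per service, in the order in which the services appear.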

A label may apply to a specific URL, or it may be generic. A generic label implicitly rates every URL for which the specified one is a prefix. For example, a generic label for the URL "http://www.gcf.org" implicitly rates every document available at that site. A regular (non-generic) label for the same URL, "http://www.gcf.org", does not give any implicit ratings: it merely rates the organization's home page that is fetched by the command "GET /" sent by HTTP to the host www.gcf.org. A generic label must include the "for" option specifying the URL to which it applies.
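The difference between generic and regular labels can be stated as a short Python sketch. The function name label_applies is hypothetical, and the use of plain, unnormalized string comparison on URLs is an assumption of this example; this document does not define URL normalization.

```python
def label_applies(label_url, generic, target_url):
    """Decide whether a label rates target_url.

    label_url is the value of the label's "for" option and generic is the
    value of its "generic" option.
    """
    if generic:
        # A generic label rates every URL that starts with label_url.
        return target_url.startswith(label_url)
    # A regular label rates only the exact URL it names.
    return target_url == label_url
```

With this sketch, a generic label for "http://www.gcf.org" applies to "http://www.gcf.org/index.html", whereas a regular label for the same URL does not.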

When a multi-value is provided, any combination of numbers and ranges of numbers may be specified, with the endpoints of a range separated by a ":". Thus, in the labellist

 
(PICS-1.0 "http://www.gcf.org" l 
  r (suds 0.5 density 0 color/hue 1 subject (0.5:2.5 3)))

all subject values between 0.5 and 2.5 (including both endpoints) apply to the item, as does the subject value 3. Given the example service description in Rating Services and Rating Systems, all three document subjects apply, "soap," "water," and "soapdish."
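A client might test whether a given value is covered by a multi-value roughly as follows. This Python sketch is illustrative only; the function names parse_multi_value and value_applies are assumptions made for this example.

```python
import re


def parse_multi_value(text):
    """Turn '(0.5:2.5 3)' into [(0.5, 2.5), (3.0, 3.0)]."""
    # Whitespace is ignored outside quoted strings, so tolerate "0.5 : 2.5".
    body = re.sub(r"\s*:\s*", ":", text.strip().strip("()"))
    ranges = []
    for item in body.split():
        low, _, high = item.partition(":")
        # A lone number is treated as a range with equal endpoints.
        ranges.append((float(low), float(high or low)))
    return ranges


def value_applies(value, ranges):
    """True if value lies within any range, endpoints included."""
    return any(low <= value <= high for low, high in ranges)
```

For the subject category in the example above, the values 0.5, 1, 2.5 and 3 all apply, while 2.75 does not.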

RFC-822 Headers

Many protocols, such as Internet electronic mail, the HyperText Transfer Protocol, and USENET News, use ASCII headers as described in RFC-822. For use in such protocols, we define a new header, PICS-Label, used to contain the labels described in this document. The syntax is:

PICS-Label: <labellist>

where labellist is described according to the syntax above. Continuation lines beginning with whitespace may be used following the specification given in RFC-822.
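A receiver has to undo RFC-822 line folding before it can read the labellist. The following Python sketch shows one way to do that; the function names unfold and pics_label_from_headers are assumptions introduced for this example.

```python
import re


def unfold(raw_headers):
    """Join RFC-822 continuation lines onto the line they continue."""
    # A line break followed by a space or tab marks a continuation; the
    # break is removed and the whitespace itself is kept.
    return re.sub(r"\r?\n(?=[ \t])", "", raw_headers)


def pics_label_from_headers(raw_headers):
    """Return the labellist of the first PICS-Label header, or None."""
    for line in unfold(raw_headers).splitlines():
        name, sep, value = line.partition(":")
        # RFC-822 field names are case insensitive.
        if sep and name.strip().lower() == "pics-label":
            return value.strip()
    return None
```

Only the first colon on a line separates the field name from its value, so the colons inside URLs and dates in the labellist are left intact.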

Embedding Labels in HyperText Markup Language (HTML)

Labels may be embedded in HTML files as meta-information, using the META element defined in the HTML specification. This embedding uses the HTTP header equivalency mechanism:

       <META http-equiv="PICS-Label" content='labellist'>

Note that the content attribute uses single quotes, because the PICS label syntax uses double quotes. Any of the following characters appearing within the content must be escaped using SGML entities:

        '       &#39;           /* single quote */
        &       &amp;           /*  ampersand   */
        >       &gt;            /* greater than */

See the HTML 2.0 Proposed Standard.
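The escaping rules above can be applied mechanically. The following Python sketch builds the META element for a given labellist; the function name pics_meta_tag is an assumption made for this example.

```python
def pics_meta_tag(labellist):
    """Wrap a labellist in a META element, escaping ', & and >."""
    escaped = (labellist
               .replace("&", "&amp;")    # first, so later entities survive
               .replace("'", "&#39;")
               .replace(">", "&gt;"))
    return "<META http-equiv=\"PICS-Label\" content='%s'>" % escaped
```

The ampersand is replaced first so that the ampersands introduced by the other two entities are not escaped a second time.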

Sending Labels With A Document

When an HTTP server sends a document to a client, it sends additional headers as well. We specify how the client can request that one or more labels be included in a header. HTTP servers should include PICS label headers only if requested to do so by the client, and should only include the labels from services requested by the client.

Example

Client sends to HTTP server www.greatdocs.com:

GET /foo.html HTTP/1.0
Accept-Protocol: {PICS-1.0 {params full {services "http://www.gcf.org/ratings"}}}

Server responds to client:

HTTP/1.0 200 OK
Date: Thursday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: {PICS-1.0 {headers PICS-Label}}
PICS-Label:
 (PICS-1.0 "http://www.gcf.org" labels
  on "1994.11.05T08:15-0500"
  exp "1995.12.31T23:59-0000"
  for "http://www.gcf.org/index.html"
  by "George Sanderson, Jr."
  ratings (suds 0.5 density 0 color/hue 1))
Content-type: text/html
...contents of foo.html...

Explanation of example

The client requests the document foo.html. In addition, the client requests the full label of the document from the rating service "http://www.gcf.org/ratings". The server responds by sending back the label, in the PICS-Label header, as well as the document. The format of the PICS-Label header field (a labellist) allows the server to respond either with a label or an explanation of why the label is not available, since it would be inappropriate for the server to generate an HTTP error status if the document is available but (some of) the labels are not.

Following the usual HTTP distinction between HEAD and GET, a client that wishes to examine a rating before retrieving the full document can substitute the word HEAD for GET in the request. The server responds with exactly the headers shown above, but does not send back the document foo.html.

Detailed Syntax of HTTP Requests for Labels With Document

The following grammar, in modified BNF, describes the syntax of the additional header line to be included in an HTTP request for a document and associated labels.

accept-header :: 
 'Accept-Protocol: {PICS-1.0 {params ' [completeness] extension* services '}}'
completeness :: 'minimal' | 'short' | 'full' | 'signed'
extension :: '{' token-or-quoted-string+ '}'
     where the first token-or-quoted-string is not 'services'.
token-or-quoted-string :: token | quotedname
token :: [1*n]alphanumpm
services :: '{' 'services' quotedURL+ '}'

A request for a minimal label asks that all options be omitted, unless a generic label is returned, in which case the generic and for options must also be included in the label. A short label includes everything that is included in a minimal label, plus additional options that the server deems appropriate. A request for a full label asks that as much information as possible should be sent back in the label, either directly or through the use of a complete-label (or full) option, but no signature-PKCS option is needed.

A request for signed labels asks that all the information in a full label should be sent, along with a digital signature on the label itself. In a signed label the information must be transmitted directly as part of the label (and included in the computation of the signature); the complete-label (or full) option may be sent, but it would be redundant. Details of signing labels are included in the section MICs and Digital Signatures.

It is acceptable for a server to ignore the completeness, by delivering either more or fewer options than requested. If the completeness is omitted, it should be treated as though minimal had been supplied. For future extensibility, any alphanumeric string may be used for a value of the completeness option. Servers which receive a value of completeness that they do not recognize must treat it as though minimal had been specified.

The extensions are for future extensions to the protocol; any extensions which are not understood by the server must be ignored by it. It is recommended that experimental extensions use a URL, which dereferences to a description of the extension, as the initial token-or-quoted-string.

Each service specifies a rating service from which the client is requesting a label for the document. There may be as many repetitions of the service part of the query as desired.
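The request header and the defaulting rule for completeness can be sketched in Python as follows. The function names accept_protocol_header and effective_completeness are assumptions made for this example, extensions are left out, and the completeness value is compared exactly as written because this document does not state whether it is case sensitive.

```python
KNOWN_COMPLETENESS = ("minimal", "short", "full", "signed")


def accept_protocol_header(services, completeness=None):
    """Client side: build the Accept-Protocol request header line."""
    if not services:
        raise ValueError("at least one rating service URL is required")
    params = []
    if completeness:
        params.append(completeness)
    params.append("{services %s}" % " ".join('"%s"' % s for s in services))
    return "Accept-Protocol: {PICS-1.0 {params %s}}" % " ".join(params)


def effective_completeness(requested):
    """Server side: a missing or unrecognized value means minimal."""
    return requested if requested in KNOWN_COMPLETENESS else "minimal"
```

Called with the single service "http://www.gcf.org/ratings" and completeness full, the first function reproduces the header line used in the example request above.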

Detailed Syntax For HTTP Response Headers For Labels With Document

Two additional headers are specified:

protocol-header :: 'Protocol: {PICS-1.0 {headers PICS-Label}}'
label-header :: 'PICS-Label: ' labellist

Requesting Labels Separately

PICS labels can also be retrieved separately from the documents to which they refer. To request labels in this way, a client contacts a label bureau. A label bureau is an HTTP server that understands a particular query syntax, defined below. It can provide labels for documents that reside on other servers, and, indeed, for documents available through protocols other than HTTP. It is anticipated that there will be "well-known" label bureaus which dispense (possibly for a fee) labels created by many rating services.

Rating services are also encouraged to act as label bureaus, providing on-line access to their own labels. By default, the URL that identifies a rating service also identifies its label bureau. If a client requests the URL that identifies a rating service, a human-readable description of the service is returned, as specified in Rating Services and Rating Systems. If, on the other hand, a client requests the same URL and includes query parameters as defined below, it should be interpreted as a request for labels. A rating service, however, is not required to act as a label bureau, and it may choose a different URL (perhaps even on a different HTTP server) to act as its label bureau.

Sample Query

Imagine a rating service, identified by the URL http://www.labels.org/Ratings, which decides to run a label bureau to dispense (at least) its own labels for documents. The following sample request, made to the HTTP server www.labels.org, is illustrative (line breaks are inserted for presentation purposes only):

GET /Ratings?opt=generic&
             u="http%3A%2F%2Fwww.questionable.org%2Fimages"&
             s="http%3A%2F%2Fwww.gcf.org%2Fratings"&
             HTTP/1.0

The query asks the label bureau http://www.labels.org/Ratings to send a single label that applies to everything in the images directory at site www.questionable.org. The desired label should have been created by the service http://www.gcf.org/ratings. Notice the use of %3A to represent ":" and %2F to represent "/". This encoding is required for such characters within a URL. See RFC-1738.

The label bureau responds by sending back a document of type "application/pics-labels." The labels should be as complete as possible, either by including as many options as possible or by supplying the complete-label (or full) option.

Detailed Syntax and Semantics of HTTP Query for Labels Separate From Documents

The following grammar, in modified BNF, describes the syntax of the GET request to a label bureau:

get :: 'get' url-fragment '?' [opt] [format] extension* url+ service+
url-fragment :: the part of the original URL after the host name, as specified in HTTP 1.0.
opt :: 'opt=' option
option :: 'generic' | 'normal' | 'tree' | 'generic+tree'
format :: [and] 'format=' form
form :: 'minimal' | 'short' | 'full' | 'signed'
extension :: token '=' token-or-quoted-string
     where the token is not one of opt, format,
     u, or s; and token-or-quoted-string follows
     the quoting conventions specified in RFC-1738
token-or-quoted-string :: token | quotedname
token :: [1*n]alphanumpm
url :: [and] 'u=' encodedURL 
service :: [and] 's=' encodedURL 
boolean :: 't' | 'f' | 'true' | 'false'
and :: '&' this must be included unless it immediately follows the ? in the query.
encodedURL :: a URL, with quotation as required for inclusion within another URL.
    According to RFC-1738, quotation is done using "%xx" notation. Alphabetic
    characters, digits, and the special characters $_-.+!*'(), need not be quoted,
    but other characters must be. This does imply that the colon (:) must be encoded
    as %3A and slash (/) as %2F.

Notes:

opt=generic requests generic labels. For each requested URL, the desired response is a generic label that implicitly applies to all URLs matching it. This is useful for requesting a rating of a site or directory.
opt=tree requests a tree of labels. For each requested URL, the desired response is all labels for URLs that match it. This is a way to request all the labels for items in a directory or a site. Everywhere a label would normally be expected in the response, a set of single-labels will be returned, surrounded by parentheses.
opt=generic+tree requests all generic labels that apply to matching URLs. This is a way to request generic labels for all of the directories at a site. Everywhere a label would normally be expected in the response, a set of single-labels will be returned, surrounded by parentheses.
opt=normal, or omitting the opt completely, requests specific labels for the URLs specified.
It is permitted to include more than one URL in the request.
The format= parameter specifies the optional information that should be transmitted with the labels. It is treated precisely as the similar keywords would be when sent to a document server as the completeness (see Detailed Syntax of HTTP Requests for Labels With Document), except that the default is full (rather than minimal). Servers which receive a value of format that they do not recognize must treat it as though the default, full, had been specified.
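A client could assemble the query part of a label bureau request along the following lines. This Python sketch follows the grammar above, in which encodedURL carries no surrounding quotation marks, so it differs in that respect from the sample query; the function names encode_url and bureau_query are assumptions made for this example, and extension parameters are not covered.

```python
from urllib.parse import quote

# Special characters that, per the grammar above, need not be quoted.
# Note: quote() additionally leaves "~" unescaped.
UNQUOTED_SPECIALS = "$_-.+!*'(),"


def encode_url(url):
    """Apply %xx quoting so that a URL can be carried inside another URL."""
    return quote(url, safe=UNQUOTED_SPECIALS)


def bureau_query(path, urls, services, opt=None, fmt=None):
    """Build the request target for a GET sent to a label bureau."""
    if not urls or not services:
        raise ValueError("at least one url and one service are required")
    params = []
    if opt:
        params.append("opt=" + opt)       # generic, normal, tree, generic+tree
    if fmt:
        params.append("format=" + fmt)    # minimal, short, full, signed
    params += ["u=" + encode_url(u) for u in urls]
    params += ["s=" + encode_url(s) for s in services]
    return path + "?" + "&".join(params)
```

For the sample query, bureau_query("/Ratings", ["http://www.questionable.org/images"], ["http://www.gcf.org/ratings"], opt="generic") produces the same parameters in the same order, without the quotation marks and the trailing ampersand.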

Detailed Syntax and Semantics of Response to Query for Labels Separate From Documents

The label bureau responds by sending back a document of type "application/pics-labels." Unless the document indicates an overall error, there should be one service-info for each rating service requested in the query. Each service-info should have an error message or a label (or list of labels, in the case of a "tree" query) for each requested URL.

The query's ordering must be preserved in the response. That is, the information from the rating services must be presented in the same order the rating services appear in the query, and the labels from each service must be presented in the same order the URLs appear in the query. If a rating service or label is not provided, the error message should appear in the same position that the service-info or label would appear. Because order is preserved, it is acceptable to omit from the labels the "for" option which indicates the URL being rated (unless the label is generic in which case, as always for generic labels, the for is required.) The client should match the label positionally with the URL for which it requested a rating.

In response to a request for a generic label, only a generic label may be returned. In response to a request for a regular label, a generic label for a URL that is a prefix of the requested URL may be returned. For example, in response to a label request for URL "http://www.gcf.org/index.html" a generic label for the URL "http://www.gcf.org" may be returned. In this case, it is required that the "for" and "generic" options be included in the label, to specify exactly what rating is being returned.

For a tree request, all the labels sent in response to a particular URL are enclosed in parentheses, so the client can match them positionally with the single request URL. The "for" option must be included in such labels to specify exactly which URLs the labels apply to.
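Because ordering is preserved, a client can pair each returned label with the URL it asked about purely by position. The Python sketch below assumes the response has already been parsed into a hypothetical structure: one entry per requested service, each entry being either a list with one item per requested URL or a single non-list value standing for a service-level error. The function name match_labels is likewise an assumption, and an overall no-ratings error is not covered.

```python
def match_labels(services, urls, response):
    """Map (service, url) to the label or error found at that position."""
    if len(response) != len(services):
        raise ValueError("expected one service-info per requested service")
    matched = {}
    for service, entry in zip(services, response):
        if not isinstance(entry, list):
            # A service-level error stands in for every requested URL.
            entry = [entry] * len(urls)
        if len(entry) != len(urls):
            raise ValueError("expected one label or error per requested URL")
        for url, label in zip(urls, entry):
            matched[(service, url)] = label
    return matched
```

For a tree request, each item in the per-URL list would itself be the parenthesized group of labels returned for that URL.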

MICs and Digital Signatures

This section remains to be specified. There are three particular difficulties that must be addressed:

On what data is the MIC included in the MIC-md5 (or md5) option computed? In particular, if the URL ftp://www.somewhere.com/Pictures/Interesting/Look.gif refers to a compressed GIF image, is the MIC computed on the compressed or uncompressed form? Does it depend on the content-transfer-encoding? The MIME type?
How is the label canonicalized before computing the digital signature? Because header lines can be folded by various transports, it is important that a canonical form be carefully defined. Clearly, it should not include the signature itself, but does it include all of the other optional fields? Does a signed label necessarily imply a full label (hence the distinction should be dropped)?
How are the public keys for rating services distributed? Can it be done using a variant on the same technique used for communicating with a label bureau or is a full certificate authority required? What authority should be used or can multiple be used? Is the service's URL a satisfactory distinguished name for use with a certificate authority?
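Pending answers to the questions above, the mechanics of the MIC-md5 option can still be illustrated. The Python sketch below assumes, purely for illustration, that the MIC is computed over the bytes of the item exactly as retrieved; this document does not yet define that. The function names mic_md5 and mic_matches are also assumptions.

```python
import base64
import hashlib


def mic_md5(data):
    """Base64-encoded MD5 digest of data (bytes), as used in MIC-md5."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def mic_matches(data, mic_from_label):
    """Recompute the MIC of a retrieved item and compare it with the label."""
    return mic_md5(data) == mic_from_label
```

A client would pass the retrieved item and the value of the label's MIC-md5 option to mic_matches, and treat the label as not applying to the item if the result is false.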

Glossary

application/pics-service: A new MIME data type used to describe a rating service, defined in Rating Services and Rating Systems.
application/pics-labels: A new MIME data type used to transmit one or more labels, defined in this document.
BNF: Backus-Naur Form (or Backus Normal Form). A notation for describing a formal syntax, used extensively in describing programming languages and computer-readable data formats.
category: The part of a rating system which describes a particular criterion used for rating. For example, a rating system might have three categories named "sexual material," "violence," and "vocabulary." Also called a dimension.
content label: A data structure containing information about a given document's contents. Also called a rating or content rating. The content label may accompany the document it is about or be available separately.
content rating: See content label.
dimension: See category.
HTML: HyperText Markup Language. A means of representing hypertext documents. Based on SGML. See the HTML 2.0 Proposed Standard.
HTTP: HyperText Transfer Protocol. Used for retrieving document contents and/or descriptive header information. See the draft HTTP specification.
hypertext: Text, graphics, and other media connected through links.
label: See content label.
MD5: An algorithm, see RFC1321, that can be used to compute a MIC. PICS specifies this particular algorithm for use in PICS labels.
MIC: Message Integrity Check. Also known as a "cryptographic checksum." For PICS, the importance of a MIC is that a rating service can compute the MIC of a piece of information when the label is created and that MIC can be put into the label itself. A client can retrieve the label and the information to which it is supposed to be attached, recompute the MIC and compare it to the one in the label. If they match, for all practical purposes, it is a proof that the label really belongs to the information that has been retrieved. The particular algorithm specified by PICS to compute the MIC is MD5.
MIME: Multipurpose Internet Mail Extensions. A technique for sending arbitrary data through electronic mail on the Internet. See RFC-1521.
PICS: Platform for Internet Content Selection, the name for both the suite of specification documents of which this is a part, and for the organization writing the documents. For more information, see http://www.w3.org/PICS
rating: See content label.
label bureau: A computer system which supplies, via a computer network, ratings of documents. It may or may not provide the documents themselves.
rating server: See label bureau.
rating service: An individual or organization that assigns labels according to some rating system, and then distributes them, perhaps via a label bureau or via CD-ROM.
rating system: A method for rating information. A rating system consists of one or more categories.
scale: The range of permissible values for a category.
SGML: Standard Generalized Markup Language. See ISO 8879.
transmission name: (of a category) The short name intended for use over a network to refer to the category. This is distinct from the category name in as much as the transmission name must be language-independent, encoded in ASCII, and as short as reasonably possible. Within a single rating system the transmission names of all categories must be distinct.
URL: Uniform Resource Locator. Described in RFC-1738. A URL describes the location and means of retrieval for a single document. It consists of three components: the "scheme" (protocol used to retrieve a document, like "http" or "ftp"), a host name, and a hierarchical document name within that host. For example "http://www.w3.org/PICS" is the URL of the PICS home page. The scheme for retrieving it is "http," the host is "www.w3.org" and the name within that host is "PICS".

References

PICS, "Rating Services and Rating Systems", Internet Draft, "draft-pics-services-00.txt", 11/21/95.
R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321, 04/16/1992.
N. Borenstein, N. Freed, "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, 09/23/1993.
T. Berners-Lee, D. Connolly, "Hypertext Markup Language - 2.0", RFC 1866, 11/03/1995.
T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource Locators (URLs)", RFC 1738, 12/20/94.

Acknowledgments

Comments and suggestions from the following people are gratefully acknowledged:

Brenda Baker, AT&T
Tim Berners-Lee, W3C
Roxana Bradescu, AT&T
Daniel W. Connolly, W3C
Roy Fielding, W3C
Jay Friedland, SurfWatch
Michael Gordon, Prodigy
Wayne Gramlich, Sun
Woodson Hobbs, NewView
Rohit Khare, W3C
Charlie Kim, Apple
John C. Klensin, MCI
Ann McCurdy, Microsoft
Rich Petke, CompuServe
Dave Raggett, W3C
Bob Schloss, IBM
David Singer, IBM
Michael Smith, Prodigy
Marcy Swenson, Providence Systems
Jason Thomas, MIT

Temporary Appendix A: Why HTTP For Label Bureaus

This section is not expected to be contained in future versions of this document.

Instead of extending HTTP, we considered proposals for special-purpose label transport protocols. Before making a final decision, we constructed the following lists of pros and cons.

Advantages of Using HTTP

An existing HTTP server can be used as a PICS label bureau. This is particularly useful in the short term. CGI scripts at the HTTP server can handle the special header fields of a request for labels.
A label returned from a label bureau and a label returned along with a document from an HTTP server can use identical label formats.
Client programs that already support HTTP will have much less new code to implement.
Client programs that do not support HTTP will have to support a new protocol in any case. It may be easier to support HTTP than a newly defined label transport protocol, because of available software libraries.
Several protocol elements that any PICS protocol would require are already fully specified by HTTP:
- Date and time formats.
- Content encoding types.
- Character set and internationalization issues.
- Error/result conditions. Both the result categories (which are extensible) and a sample set of messages are specified.
- Handling of expiration dates for each URL queried.
HTTP is quite stable, has not diverged, and is well accepted.
Security and payment systems either exist or are being developed for HTTP. A binary format may also be developed for speed. PICS need not reinvent such systems.
Firewalls tend to allow HTTP headers to be transmitted already. A new protocol would take much longer to be accepted.
A reliable, connection-oriented (initially TCP-based), ASCII-based protocol seems desirable at the outset.
HTTP's existing extensibility rules already define how extensions to PICS itself should be accomplished.

Advantages of Creating a New Protocol Instead of Using HTTP

A new protocol would avoid any HTTP protocol wars.
Label bureaus and clients would not need to be updated to accommodate HTTP changes.
RFC-822 and other precedents could still be used in the design of a new protocol.
A binary format could be considered initially for speed.
UDP or other datagram lookups could be considered.

Temporary Appendix B: FAQ - Frequently Asked Questions

This section is not expected to be contained in future versions of this document.

Why is there no ftp, gopher, or netnews protocol for requesting labels along with a document?

Labels can be sent as additional headers in any protocol that employs RFC-822 style headers. We have not yet determined, however, convenient extensions to protocols other than HTTP to permit requests that ask for labels created by specific services. We may specify such extensions in the future.

How do you get labels for items on FTP, Gopher, or netnews servers? Are we forcing all FTP implementations to implement all of HTTP as well?

FTP, Gopher, and netnews servers need not distribute PICS labels. Labels for items on such servers can be retrieved from an HTTP-based label bureau.

The PICS premise is that all compliant clients will have to implement some new protocol. The subset of HTTP which would be required for obtaining a PICS label can be minimal. HTTP will be no more difficult to implement in an FTP (or other) client than a brand-new protocol that provides similar features.

Can existing HTTP servers be used as PICS label bureaus?

Yes. Using CGI scripts, or with a small amount of added code in the HTTP server, an existing HTTP server can be configured to access a database of labels and return that information coded as additional HTTP headers. Most of the work is in the lookup and formatting of the labels themselves, not the modifications to HTTP.
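The lookup and formatting work can be sketched roughly as follows. This is an illustration under stated assumptions, not a conforming implementation: the in-memory table, the example.org and example.com URLs, the category names, and both function names are hypothetical, label options are omitted entirely, and quoting the service URL is an assumption. The output follows only the general label form given earlier in this document.

```python
# Hypothetical in-memory label database, keyed by
# (rating service URL, document URL). A real bureau would query a database.
LABELS = {
    ("http://ratings.example.org/v1", "http://www.example.com/page.html"):
        {"violence": 2, "language": 0},
}

def format_label_list(service_url, ratings):
    """Render one label in the general form
    (PICS-1.0 <service url> labels ratings (<category> <value> ...)).
    Options are omitted; quoting the service URL is an assumption."""
    pairs = " ".join(f"{category} {value}" for category, value in ratings.items())
    return f'(PICS-1.0 "{service_url}" labels ratings ({pairs}))'

def label_header(service_url, document_url):
    """Return a PICS-Label header line, or None when no label is on file."""
    ratings = LABELS.get((service_url, document_url))
    if ratings is None:
        return None
    return "PICS-Label: " + format_label_list(service_url, ratings)

print(label_header("http://ratings.example.org/v1",
                   "http://www.example.com/page.html"))
```

A CGI script built along these lines would write the returned line among its response headers. Handling of the case where no label is found is left out of this sketch.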

How do I design a really fast PICS label bureau? Won't the overhead be too much?

HTTP already explicitly defines the minimum set of required fields, along with the rules to follow when additional information is useful to the transaction. For example, HTTP does not require that clients provide "Accept:" headers to indicate preferred MIME types for the content, but if they are provided, servers can match up available formats with the client's request. An HTTP server may be designed to optimize throughput, to optimize the appearance of the result, or to adjust to the client software's preference.

If you minimize the server's response to one line, plus the label information, you are already dealing with the minimum amount of data transfer possible to obtain a label. In addition, most performance issues for PICS will probably be addressed with caching, not by reducing lookup time for a single label. Caching optimization requires meta-data which can be easily encoded within HTTP headers.

How can we keep the PICS extensions from getting tied up in HTTP standardization?

The management of header extensions for HTTP has been an issue of discussion and work by the HTTP group for some time. The HTTP specification lays down specific rules for the handling of extensions which guarantee that those extensions will not be made invalid by any revisions of HTTP itself. In addition, the W3C is working on a system (PEP) for managing and negotiating HTTP extensions even more intelligently.

The worst risk seems to be that HTTP could be upgraded to a new revision level forcing some HTTP implementations to support multiple versions (1.0 and 2.0, for example) or forcing some PICS bureaus to update their protocol as well. Hopefully a major update in HTTP would bring enough benefits for PICS to make any update worthwhile.

What is PEP and why is PICS using it?

The Protocol Extension Proposal from the World Wide Web Consortium uses a trio of header fields (Protocol, Accept-Protocol, and Content-Encoding) to allow an HTTP client and server to do sophisticated negotiation about the set of header fields and their meanings. It is being proposed for use in HTTP 1.2 and HTTP-ng, and is currently under careful scrutiny by the W3C Security Editorial Board to make sure that it contains the features necessary to provide security for general document transmission as well as electronic payments.

PICS faces many of the same problems that face the security and electronic payment community. In PICS the issue is how the client tells the server which rating services it would like labels from. This is a simple negotiation problem of the kind PEP was designed to solve. Rather than invent an orthogonal mechanism, it seemed best to use one that is already being proposed and investigated.

What if PEP does not catch on?

If the general extension mechanism specified by PEP does not become a generic feature of HTTP servers, PICS label bureaus will need to look for the specific header line beginning Accept-Protocol: PICS/1.0 and process it to determine the rating request. PICS clients will need to look for and process the specific header lines PICS-Label and PICS-Status. We will also have to hope that no other group tries to extend HTTP in a way that uses headers named PICS-Label or PICS-Status.
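The fallback described above amounts to scanning header lines by name. A minimal sketch, assuming the headers arrive as a list of already-unfolded text lines; the function names are hypothetical, and parsing of whatever follows PICS/1.0 on the request line is omitted because this section does not specify it.

```python
def find_header(lines, name):
    """Return the value of the first header called `name`, or None.
    Field names are compared case-insensitively, as in RFC-822-style headers."""
    prefix = name.lower() + ":"
    for line in lines:
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return None

def is_pics_request(request_lines):
    """Bureau side: is there a header line beginning Accept-Protocol: PICS/1.0?"""
    value = find_header(request_lines, "Accept-Protocol")
    return value is not None and value.startswith("PICS/1.0")

def pics_response_fields(response_lines):
    """Client side: pull out the PICS-Label and PICS-Status values, if present."""
    return (find_header(response_lines, "PICS-Label"),
            find_header(response_lines, "PICS-Status"))
```

For example, is_pics_request(["GET /doc.html HTTP/1.0", "Accept-Protocol: PICS/1.0"]) evaluates to True, while a request carrying only an "Accept:" header evaluates to False.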

Comments to Jim Miller.

Created 21 November 1995 by Jim Miller

Last updated 21 Nov 1995