W3C PICS

PICS Label Distribution

Label Syntax and Communication Protocols

by Tim Krauskopf (timk@spyglass.com), Jim Miller (jmiller@w3.org), Paul Resnick (presnick@research.att.com), and Win Treese (treese@OpenMarket.com)

Revision 2, DRAFT 2 Last modified on Sun. Mar. 3, 1996

Overview

This document has been prepared for the technical subcommittee of PICS (Platform for Internet Content Selection). It defines a general format for labels and three methods by which these labels may be transmitted:

In an HTML document.
We specify a mechanism, using the existing META tag, for embedding one or more labels in (the header of) an HTML document.
With a document transported via a protocol that uses RFC-822 headers.
Labels can be transmitted using any protocol that uses RFC-822-style headers. In addition, we define an extension specific to the HTTP protocol that allows an HTTP client (Web browser) to request which labels (if any) it would like to have sent along with a document. The PICS committee hopes that other network protocols will be extended in a similar way.
Separately from the document.
A client can request labels from a "label bureau" that runs the HTTP protocol. The labels may refer to any document that has a URL (see RFC-1738), including those available through protocols other than HTTP, such as ftp, gopher, or netnews. Notice that PICS defines a new URL scheme for referencing IRC chat rooms (see Rating Services and Rating Systems). The simplest implementation of a label bureau is an off-the-shelf HTTP server running a special CGI script.

General Format

A label consists of a service identifier, label options, and a rating. The service identifier is the URL chosen by the rating service (see Rating Services and Rating Systems) as its unique identifier. Label options give additional properties of the document being rated and of the rating itself, such as the time the document was rated. The rating itself is a set of attribute-value pairs that describe a document along one or more dimensions. One or more labels may be distributed together as a list. The general form for a label list (formatted for presentation, and not showing error status codes) is:

         (PICS-1.0
           <service url> [option...] 
           labels [option...] ratings (<category> <value> ...)
                  [option...] ratings (<category> <value> ...)
                  ...
           <service url> [option...] 
           labels [option...] ratings (<category> <value> ...)
                  [option...] ratings (<category> <value> ...)
                  ...
           ...)

A specific label applies to a single document. If the document is in HTML format, it may refer to other documents, either by external reference (using the <A href=...> tag) or by requesting that they be displayed in-line (using the <img ...> tag or the proposed <object ...> tag). A label applies to the given document only, not to the referenced documents.

A generic label (identified by the use of the generic option) applies to any document whose URL begins with a specific string of characters (specified using the for option). A generic label should only be created if it is certain that all documents with the stated prefix string would be acceptably labeled as specified. It is not safe to assume that attaching a more specific label will guarantee that the generic label will not be used (since the specific label might not be seen by the client software when the access decision is made).

From any rating system, a given document may have any number of generic labels (depending on the length of its URL) but only one specific label. When the specific label for a document can be found, it should be used in preference to any generic label. Lacking a specific label, any generic label may be substituted, but preference should be given to the generic label with the longest prefix string (the value of its for option). Some PICS client software may impose restrictions on the use of generic labels. For example, a client may choose to ignore a generic label that applies to a node in the URL tree more than two levels above the node where the document is located.
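The precedence rules above can be sketched in Python. This is a minimal illustration rather than part of the specification: the dictionary form of a label (keys "for", "generic" and "ratings") and the function name choose_label are assumptions made for the example, and specific labels are matched by exact URL string.

```python
from typing import Dict, List, Optional


def choose_label(url: str, labels: List[Dict]) -> Optional[Dict]:
    """Pick the label from one rating service that should be applied to `url`."""
    # A specific (non-generic) label for exactly this URL always wins.
    for label in labels:
        if not label.get("generic", False) and label.get("for") == url:
            return label
    # Otherwise consider generic labels whose "for" prefix matches the URL.
    candidates = [
        label for label in labels
        if label.get("generic", False)
        and "for" in label
        and url.startswith(label["for"])
    ]
    if not candidates:
        return None
    # Prefer the generic label with the longest prefix string.
    return max(candidates, key=lambda label: len(label["for"]))
```

A client that restricts generic labels (for example, to prefixes at most two levels above the document) could filter the candidates before taking the longest one.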

Label options can be divided into three groups. Options from the first group supply information about the document to which the label applies. Options from the second group supply information about the label itself. The last group provides miscellaneous information.

  1. Information about the document that is labeled.
    at quoted-ISO-date
    The last modification date of the item to which this rating applies, at the time the rating was assigned. This can serve as a less expensive, but less reliable, alternative to the message integrity check (MIC) options.
    MIC-md5 "Base64-string"
    -or- md5 "Base64-string"
    A message integrity check (MIC) of the item being rated. The MD5 Message Digest Algorithm (see RFC-1321) is used to compute the MIC. One way to create this message digest is to use the RSAREF (version 2.0) software available for this purpose at no charge from RSA Laboratories. See MICs and Digital Signatures below.
  2. Information about the label itself.
    by quotedname
    An identifier for the person or entity within the rating service who is responsible for this particular label.
    for quotedURL
    The URL (or prefix string of a URL) of the item to which this rating applies. This option is required for generic labels and in certain other cases (see "Requesting Labels Separately," below); it is optional in other cases. Since a single document may have many URLs it is not necessarily an error if the URL specified in the for option is not the same as the URL used to specify the document being labeled.
    generic boolean
    -or- gen boolean
    This label can be applied to any URL starting with the prefix given in the for option. This is used to supply ratings for entire sites or directories. All generic labels must also include the for option. As mentioned earlier, a generic label should not be created unless it can be legitimately applied to all documents whose URL begins with the prefix specified in the for option (even if a more specific label exists).
    on quoted-ISO-date
    The date on which this rating was issued.
    signature-RSA-MD5 "Base64-string"
    An RSA digital signature encompassing the label. The signature is computed using the MD5 algorithm by the rating service that issued the label. One way to create this signature is to use the RSAREF (version 2.0) software available for this purpose at no charge from RSA Laboratories. See MICs and Digital Signatures below.
    until quoted-ISO-date
    -or- exp quoted-ISO-date
    The date on which this rating expires.
  3. Other information.
    comment quotedname
    Information for humans who may see the label; no associated semantics.
    complete-label quotedURL
    -or- full quotedURL
    Dereferencing this URL returns a complete label that can be used in place of the current one. The complete label has values for as many attributes as possible. This is used when a short label is transmitted for performance purposes but additional information is also available. When the URL is dereferenced it returns an item of type application/pics-labels that contains a labellist with exactly one label.
    extension (optional quotedURL data*)
    -or- extension (mandatory quotedURL data*)
    Future extension mechanism. To avoid duplication of extension names, each extension is identified by a quotedURL. The URL can be dereferenced to get a human-readable description of the extension. If the extension is optional then software which does not understand the extension can simply ignore it; if the extension is mandatory then software which does not understand the extension should act as though no label had been supplied. Each item of data must be one of a fixed set of simple-to-parse data types as specified in the detailed syntax below.
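The optional/mandatory rule for extensions might be applied by a client as in the following Python sketch. The representation of a label's extensions as (kind, URL) pairs, the set of understood extension URLs, and the function name label_usable are all assumptions made for illustration.

```python
def label_usable(extensions, understood):
    """Return True if a label carrying these extensions may be used.

    `extensions` is a list of (kind, url) pairs, where kind is 'optional'
    or 'mandatory'; `understood` is the set of extension URLs that this
    client implements.
    """
    for kind, url in extensions:
        if kind.lower() == "mandatory" and url not in understood:
            # An unknown mandatory extension: act as though no label
            # had been supplied.
            return False
    # Unknown optional extensions are simply ignored.
    return True
```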

Example

For example, a label that uses the example rating system from the document PICS Rating Services and Rating Systems might be as follows (in all examples, the spacing and indentation is provided for readability; the specification treats multiple white space characters as if they were compressed into a single space):

     (PICS-1.0 "http://www.gcf.org/v1.0"
       labels on "1994.11.05T08:15-0500"
              until "1995.12.31T23:59-0000"
              for "http://w3.org/PICS/Overview.html"
              by "John Patrick"
              ratings (suds 0.5 density 0 color/hue 1))

The same label may be transmitted more compactly by converting all of the line breaks and subsequent indentation characters into a single space, and by replacing the word "labels" with "l", "ratings" with "r" and long option names with their abbreviations. It may be compressed for transmission purposes even further by removing all of the optional information to a separate document and referencing that document by a URL:

     (PICS-1.0 "http://www.gcf.org/v1.0" l
       full "http://www.gcf.org/labels/13242123"
       r (suds 0.5 density 0 color/hue 1))

Finally, the optional information may be omitted entirely, reducing the information content of the label but making the transmission even smaller. The resulting label would then be:

(PICS-1.0 "http://www.gcf.org/v1.0" 
  l r (suds 0.5 density 0 color/hue 1))
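The first compaction step (collapsing white space and abbreviating keywords) can be sketched in Python as follows. The function name compact_label and the tokenizing regular expression are assumptions of this sketch. It abbreviates only tokens that appear directly inside the outermost parentheses, so quoted strings, rating names, and extension data are left alone; labels wrapped in an extra pair of parentheses (the tree case) are not abbreviated.

```python
import re

# Long keyword -> abbreviation, as listed in the option descriptions.
ABBREVIATIONS = {
    "labels": "l",
    "ratings": "r",
    "generic": "gen",
    "until": "exp",
    "mic-md5": "md5",
    "complete-label": "full",
}

# A token is a quoted string, a parenthesis, or a run of other characters.
_TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def compact_label(labellist):
    """Return `labellist` with white space collapsed and keywords abbreviated."""
    out = ""
    depth = 0
    for tok in _TOKEN.findall(labellist):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 1 and not tok.startswith('"'):
            # Keywords and option names occur directly inside the outer
            # parentheses; rating names and extension data are nested deeper.
            tok = ABBREVIATIONS.get(tok.lower(), tok)
        # One space between tokens, none after "(" or before ")".
        if out and not out.endswith("(") and tok != ")":
            out += " "
        out += tok
    return out
```

Applied to the first example label above, this yields a single line beginning (PICS-1.0 "http://www.gcf.org/v1.0" l on ... and ending r (suds 0.5 density 0 color/hue 1)).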

Detailed Syntax

The following grammar, in modified BNF, describes the syntax of labels. The methods by which labels are embedded in specific protocols are detailed below.

Notes:

  1. The string "PICS-1.0" in version corresponds to the version number of the PICS specification in PICS Rating Services and Rating Systems. While it is inelegant that the service description uses the notation "(version 1.0)" while the label itself uses "PICS-1.0", it is intentional.
  2. Whitespace is ignored except in quoted strings. Multiple contiguous whitespace characters can be treated as though they were a single space character.
  3. Transmit-names and quoted strings are case sensitive. Option names and other tokens in the BNF grammar are case insensitive.
  4. This specification is strictly about information carried over the wire from the client to the server, and it requires the use of US-ASCII. The companion document PICS Rating Services and Rating Systems describes how a client can map these transmit-names to descriptive strings using other character sets. Clients are advised to cache the descriptions of rating systems they use so that the information specified here can be conveniently presented to the user.
  5. An option that appears in the service-info applies to all labels in that service-info unless overridden by an option in a specific label. That is, a label is effectively lexically nested within the enclosing service-info for the purpose of understanding the applicable options. This is most likely to be useful in the case of the at, by, generic, until and experimental or future options.
  6. Numbers in PICS labels may be integers or fractions with no greater range or precision than that provided by IEEE single-precision floating point numbers.
  7. The multi-value syntax must be used when the value on a particular (multi-valued) scale has either zero or more than one value. It may be used for a single-valued or multi-valued field when there is exactly one value, but the more compact version may also be used in that case.
  8. The only options that may occur more than once in a particular single-label or service-info are comment and extension; if the extension option is supplied more than once, the quotedURLs defining the extensions must be distinct.
labellist :: '(' version service-info+ ')'
version :: 'PICS-1.0'
service-info :: 'error' '(no-ratings' explanation* ')'
              | serviceID service-error | serviceID option* labelword label*
serviceID :: quotedURL
labelword :: 'labels' | 'l'
label :: label-error | single-label | '(' single-label* ')'
single-label :: option* ratingword '(' rating+  ')'
ratingword :: 'ratings' | 'r'
quotedURL :: '"' URL '"' as described and extended in
             Rating Services and Rating Systems.
option :: labeloption | documentoption | otheroption
labeloption ::
          'by' quotedname
        | 'generic' boolean          | 'gen' boolean 
        | 'for' quotedURL
        | 'on' quoted-ISO-date        
        | 'signature-RSA-MD5' "base64-string"
        | 'until' quoted-ISO-date    | 'exp' quoted-ISO-date
documentoption ::
          'at' quoted-ISO-date        
        | 'MIC-md5' "base64-string"  | 'md5' "base64-string"
otheroption ::
          'comment' quotedname        
        | 'complete-label' quotedURL | 'full' quotedURL
        | 'extension' '(' mand/opt quotedURL data* ')'
mand/opt :: 'optional' | 'mandatory'
data :: quoted-ISO-date | quotedURL
        | number | quotedname | '(' data* ')'
quoted-ISO-date :: '"'YYYY'.'MM'.'DD'T'hh':'mmStz'"'
     based on the ISO 8601:1988 date and time standard, restricted
     to the specific form described here:
     YYYY :: four-digit year
     MM :: two-digit month (01=January, etc.)
     DD :: two-digit day of month (01 through 31)
     hh :: two digits of hour (00 through 23) (am/pm NOT allowed)
     mm :: two digits of minute (00 through 59)
     S  :: sign of time zone offset from UTC ('+' or '-')
     tz :: four digit amount of offset from UTC
           (e.g., 1512 means 15 hours and 12 minutes)
     For example, "1994.11.05T08:15-0500" is a valid quoted-ISO-date
     denoting November 5, 1994, 8:15 am, US Eastern Standard Time
     Note: The ISO standard allows considerably greater
     flexibility than that described here.  PICS requires precisely
     the syntax described here -- neither the time nor the time zone may
     be omitted, none of the alternate formats are permitted, and
     the punctuation must be as specified here.
rating :: transmit-name number | transmit-name '(' multi-value* ')'
multi-value :: number | number ':' number
transmit-name :: urlchar+
number :: [sign]unsignedint['.' [unsignedint]]
sign :: '+' | '-'
unsignedint :: [0-9]+
quotedname :: '"' urlchar-or-space+ '"'
alphanumpm :: sign | 'A' | ... | 'Z' | 'a' | ... | 'z' | '0' | ... | '9'
urlchar :: alphanumpm | '.' | '$' | ',' | ';' | ':' 
                | '&' | '=' | '?' | '!' | '*' | '~' | '@'
                | '#' | '_' | '(' | ')' | '/' | '%' hex hex
    Note: Use the "%" escape technique to insert single or
          double quotation marks into a URL
hex :: '0' | ... | '9' | 'A' | ... | 'F' | 'a' | ... | 'f'
urlchar-or-space :: urlchar | ' '
base64-string :: as defined in RFC-1521.
service-error :: 'error' '(' 'request-denied' explanation* ')'
               | 'error' 'service-unavailable'
label-error :: 'error' '(' 'request-denied' [quotedURL explanation*] ')'
             | 'error' '(' 'not-labeled' quotedURL* ')'
explanation :: quotedname
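
As an illustration of how strict the quoted-ISO-date production is, the following Python sketch accepts only the exact form given in the grammar and converts it to a time-zone-aware value. The function name parse_quoted_iso_date is an assumption of this sketch; range checking of the fields is delegated to the standard datetime type.

```python
import re
from datetime import datetime, timedelta, timezone

# "YYYY.MM.DDThh:mmStz" including the surrounding double quotes.
_QUOTED_ISO_DATE = re.compile(
    r'"(\d{4})\.(\d{2})\.(\d{2})T(\d{2}):(\d{2})([+-])(\d{2})(\d{2})"'
)


def parse_quoted_iso_date(text):
    """Parse a PICS quoted-ISO-date; raise ValueError if it is malformed."""
    match = _QUOTED_ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError("not a PICS quoted-ISO-date: %r" % (text,))
    year, month, day, hour, minute = (int(match.group(i)) for i in range(1, 6))
    offset = timedelta(hours=int(match.group(7)), minutes=int(match.group(8)))
    if match.group(6) == "-":
        offset = -offset
    # datetime() itself rejects impossible values such as month 13.
    return datetime(year, month, day, hour, minute, tzinfo=timezone(offset))
```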

Semantics of PICS Labels and Label Lists

A labellist is used to transmit a set of PICS labels. The format specified here is intended to be registered with IANA as the MIME type "application/pics-labels." It allows for transmission of both labels and reasons why labels are not available, and is the format used when labels must be conveyed in a document, along with a document, or from a PICS label bureau. The labellist will always be surrounded by parentheses and begin with the PICS version number (1.0 in this specification).

A label list either specifies that there are no labels available at all ("error (no-ratings ...)") or is separated into sections of labels, one section for each rating service. The URL of each service must be specified (the serviceID). This is either followed by an error message indicating why no labels are available from that service (service-error) or an overall set of optional information (option*) followed by the keyword "labels" (or "l") and the labels from the service. The optional information provided here applies to every label from the service, unless overridden in the specific label itself.
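The override rule for options can be pictured with a small Python sketch. The dictionaries of already-parsed options and the function name effective_options are assumptions for illustration; because a dictionary holds one value per name, the repeatable comment and extension options would need separate handling.

```python
def effective_options(service_options, label_options):
    """Combine service-level options with those of one label.

    Both arguments map lower-case option names to values. An option given
    in the label replaces the service-level option of the same name.
    """
    merged = dict(service_options)
    merged.update(label_options)
    return merged
```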

A label encompasses three separate cases. The first is an error that applies to retrieving the label for a particular URL (label-error). The second, and most common, is a single-label consisting of options (which override those specified with the service), the marker word "ratings" (or "r") and the ratings themselves (a list of category names and values). Finally, in the special case where the ratings for an entire tree of documents have been requested, any number of single-labels can be transmitted, enclosed in parentheses. This case is described in more detail in the section on "Requesting Labels Separately."

A label may apply to a specific URL, or it may be generic. A generic label implicitly rates every URL for which the specified one is a prefix. For example, a generic label for the URL "http://w3.org" implicitly rates every document available at that site. A specific (non-generic) label for the same URL, "http://w3.org", does not give any implicit ratings: it merely rates the organization's home page that is fetched by the command "GET /" sent by HTTP to the host w3.org. A generic label must include the "for" option specifying the URL to which it applies. As mentioned above, a generic label should be supplied only if it can be legitimately applied to all documents with URLs that begin with the string specified in the label's for option.

When a multi-value is provided, any combination of numbers and ranges of numbers may be specified, with the endpoints of a range separated by a ":". Thus, in the labellist

 
(PICS-1.0 "http://www.gcf.org/v1.0" l 
  r (suds 0.5 density 0 color/hue 1 subject (0.5:2.5 3))) 
all subject values between 0.5 and 2.5 (including both endpoints) apply to the item, as does the subject value 3. Given the example service description in Rating Services and Rating Systems, all three document subjects apply, "soap," "water," and "soapdish."
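
The interpretation of a multi-value list can be sketched in Python. The function names are assumptions of this sketch, and the input is taken to be the text between the parentheses of a multi-valued rating.

```python
import re


def parse_multi_values(text):
    """Parse the inside of a multi-value list such as '0.5:2.5 3'.

    Returns a list of (low, high) pairs; a single number n becomes (n, n).
    """
    # White space is not significant, so allow it around the colon.
    text = re.sub(r"\s*:\s*", ":", text)
    ranges = []
    for item in text.split():
        low, sep, high = item.partition(":")
        ranges.append((float(low), float(high) if sep else float(low)))
    return ranges


def value_applies(value, ranges):
    """True if `value` equals a listed number or lies within a listed range."""
    # Both endpoints of a range are included.
    return any(low <= value <= high for low, high in ranges)
```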

RFC-822 Headers

Many protocols, such as Internet electronic mail, the HyperText Transfer Protocol, and USENET News, use US-ASCII headers as described in RFC-822. For use in such protocols, we define a new header, PICS-Label, used to contain the labels described in this document. The syntax is:

PICS-Label: <labellist>

where labellist is described according to the syntax above. Continuation lines beginning with whitespace may be used following the specification given in RFC-822.
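
A long labellist can be folded onto continuation lines as in this Python sketch. The function name fold_pics_label and the default width of 72 columns are assumptions; the sketch breaks lines only between tokens, never inside a quoted string.

```python
import re


def fold_pics_label(labellist, width=72):
    """Format a PICS-Label header, folding it onto continuation lines."""
    # Keep each quoted string (which may contain spaces) as one token.
    tokens = re.findall(r'"[^"]*"|[^\s"]+', labellist)
    lines = ["PICS-Label:"]
    for tok in tokens:
        if len(lines[-1]) + 1 + len(tok) > width:
            # Start a continuation line; the space added below makes it
            # begin with white space, as RFC-822 folding requires.
            lines.append("")
        lines[-1] += " " + tok
    return "\r\n".join(lines)
```

A single token longer than the chosen width is placed on a line of its own rather than split.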

Embedding Labels in HyperText Markup Language (HTML)

Labels may be embedded in HTML files as meta-information, using the META element defined in the HTML specification. This embedding uses the HTTP header equivalence mechanism:

       <META http-equiv="PICS-Label" content='labellist'>

Note that the content attribute uses single quotes, because the PICS label syntax uses double quotes. Any of the following characters appearing within the content must be escaped using SGML entities:

        '       &#39;           /* single quote */
        &       &amp;           /* ampersand   */
        >       &gt;            /* greater than */

See the HTML 2.0 Proposed Standard.
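
A Python sketch of producing the META element with the required escaping follows. The function name meta_tag is an assumption of this sketch; the ampersand is replaced first so that the entities introduced for the other characters are not escaped a second time.

```python
def meta_tag(labellist):
    """Return an HTML META element carrying `labellist`."""
    escaped = (
        labellist.replace("&", "&amp;")   # must come first
                 .replace("'", "&#39;")
                 .replace(">", "&gt;")
    )
    return "<META http-equiv=\"PICS-Label\" content='" + escaped + "'>"
```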

Using HTTP to Request Labels With A Document

We specify a simple extension to HTTP that allows a client to request that one or more labels be included in a header along with the document. We deal here only with the HTTP protocol; we hope that other protocols will be similarly extended. HTTP servers should include PICS label headers only if requested to do so by the client, and should only include the labels from services requested by the client.

Example

Client sends to HTTP server www.greatdocs.com, a PICS-enabled server:

GET /foo.html HTTP/1.0
Protocol-Request: {PICS-1.0 {params full 
                    {services "http://www.gcf.org/v1.0"}}}

Server responds to client:

HTTP/1.0 200 OK
Date: Friday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: {PICS-1.0 {headers PICS-Label}}
PICS-Label:
 (PICS-1.0 "http://www.gcf.org/v1.0" labels
  on "1994.11.05T08:15-0500"
  exp "1995.12.31T23:59-0000"
  for "http://www.greatdocs.com/foo.html"
  by "George Sanderson, Jr."
  ratings (suds 0.5 density 0 color/hue 1))
Content-type: text/html
...contents of foo.html...

Explanation of example

The client requests the document foo.html. In addition, the client requests the full label of the document from the rating service "http://www.gcf.org/v1.0". The server responds by sending back the label, in the PICS-Label header, as well as the document. The format of the PICS-Label header field (a labellist) allows the server to respond either with a label or an explanation of why the label is not available, since it would be inappropriate for the server to generate an HTTP error status if the document is available but (some of) the labels are not.

Following the usual HTTP distinction between HEAD and GET, a client that wishes to examine a rating before retrieving the full document can substitute the word HEAD for GET in the request. The server responds with exactly the headers shown above, but does not send back the document foo.html.

Detailed Syntax of HTTP Requests for Labels With Document

The following grammar, in modified BNF, describes the syntax of the additional header line to be included in an HTTP request for a document and associated labels.

request-header :: 
 'Protocol-Request: {PICS-1.0 {params ' [completeness] 
                                         extension* 
                                         services '}}'
completeness :: 'minimal' | 'short' | 'full' | 'signed'
extension :: '{' token-or-quoted-string+ '}'
     where the first token-or-quoted-string is not 'services'.
token-or-quoted-string :: token | quotedname
token :: alphanumpm+
services :: '{' 'services' quotedURL+ '}'

A request for a minimal label asks that all options be omitted, unless a generic label is returned, in which case the generic and for options must also be included in the label. A short label includes everything that is included in a minimal label, plus additional options that the server deems appropriate. A request for a full label asks that as much information as possible should be sent back in the label, either directly or through the use of a complete-label (or full) option, but no signature-RSA-MD5 option is needed.

A request for signed labels asks that all the information in a full label should be sent, along with a digital signature on the label itself. In a signed label the information must be transmitted directly as part of the label (and included in the computation of the signature); the complete-label (or full) option may be sent, but it would be redundant. Details of signing labels are included in the section MICs and Digital Signatures.

It is acceptable for a server to ignore the completeness, by delivering either more or fewer options than requested. If the completeness is omitted, it should be treated as though minimal had been supplied. For future extensibility, any alphanumeric string may be used for a value of the completeness option. Servers which receive a value of completeness that they do not recognize must treat it as though minimal had been specified.

The extensions are for future extensions to the protocol; any extensions which are not understood by the server must be ignored by it. It is recommended that experimental extensions use a URL, which dereferences to a description of the extension, as the initial token-or-quoted-string.

Each service specifies a rating service from which the client is requesting a label for the document. There may be as many repetitions of the service part of the query as desired.
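
The request header and the server-side defaulting rule for completeness can be sketched in Python. Both function names are assumptions of this sketch, extensions are omitted, and treating completeness values as case-insensitive is an assumption that the text above does not settle.

```python
KNOWN_COMPLETENESS = {"minimal", "short", "full", "signed"}


def normalize_completeness(value):
    """Server side: map a missing or unrecognized completeness to 'minimal'."""
    if value is None or value.lower() not in KNOWN_COMPLETENESS:
        return "minimal"
    return value.lower()


def protocol_request_header(services, completeness=None):
    """Client side: build the Protocol-Request header line (no extensions)."""
    if not services:
        raise ValueError("at least one rating service URL is required")
    params = "params"
    if completeness:
        params += " " + completeness
    quoted = " ".join('"%s"' % url for url in services)
    return "Protocol-Request: {PICS-1.0 {%s {services %s}}}" % (params, quoted)
```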

Detailed Syntax For HTTP Response Headers For Labels With Document

Two additional headers are specified:

protocol-header :: 'Protocol: {PICS-1.0 {headers PICS-Label}}'
label-header :: 'PICS-Label: ' labellist

Requesting Labels Separately

PICS labels can also be retrieved separately from the documents to which they refer. To request labels in this way, a client contacts a label bureau. A label bureau is an HTTP server that understands a particular query syntax, defined below. It can provide labels for documents that reside on other servers, and, indeed, for documents available through protocols other than HTTP. It is anticipated that there will be "well-known" label bureaus which dispense (possibly for a fee) labels created by many rating services.

Rating services are also encouraged to act as label bureaus, providing on-line access to their own labels. By default, the URL that identifies a rating service also identifies its label bureau. If a client requests the URL that identifies a rating service, a human-readable description of the service is returned, as specified in Rating Services and Rating Systems. If, on the other hand, a client requests the same URL and includes query parameters as defined below, it should be interpreted as a request for labels. A rating service, however, is not required to act as a label bureau, and it may choose a different URL (perhaps even on a different HTTP server) to act as its label bureau.

Sample Query

Imagine a rating service, identified by the URL http://www.labels.org/Ratings, which decides to run a label bureau to dispense (at least) its own labels for documents. The following sample request, made to the HTTP server www.labels.org, is illustrative (line breaks are inserted for presentation purposes only):

GET /Ratings?opt=generic&
             u="http%3A%2F%2Fwww.questionable.org%2Fimages"&
             s="http%3A%2F%2Fwww.gcf.org%2Fv1.0"
             HTTP/1.0

The query asks the label bureau http://www.labels.org/Ratings to send a single label that applies to everything in the images directory at site www.questionable.org. The desired label should have been created by the service http://www.gcf.org/v1.0. Notice the use of %3A to represent ":" and %2F to represent "/". This is required for encoding characters within a URL. See RFC-1738.

The label bureau responds by sending back a document of type "application/pics-labels." The labels should be as complete as possible, either by including as many options as possible or by supplying the complete-label (or full) option.
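
A query of this kind might be assembled as in the following Python sketch. The function name bureau_query and its parameters are assumptions. The sketch relies on urllib.parse.quote with an empty safe set, which encodes ":" and "/" as required; it also encodes a few characters that need not be quoted, which is harmless. The encoded URLs are emitted without surrounding double quotes.

```python
from urllib.parse import quote


def bureau_query(bureau_path, urls, services, opt=None, fmt=None):
    """Build the path-and-query part of a GET request to a label bureau."""
    if not urls or not services:
        raise ValueError("at least one URL and one service are required")
    parts = []
    if opt:
        parts.append("opt=" + opt)
    if fmt:
        parts.append("format=" + fmt)
    # safe="" makes quote() encode ":" and "/" as %3A and %2F.
    parts.extend("u=" + quote(url, safe="") for url in urls)
    parts.extend("s=" + quote(service, safe="") for service in services)
    return bureau_path + "?" + "&".join(parts)
```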

Detailed Syntax and Semantics of HTTP Query for Labels Separate From Documents

The following grammar, in modified BNF, describes the syntax of GET and POST requests to a label bureau. The use of the POST request is specified only for backward compatibility with HTTP servers that cannot handle a long GET query. Its use, while described in the HTML 2.0 specification (for use in submitting forms, see sections 8.2.1 and 8.2.3), is deprecated.

request :: get | post
get :: 'get' url-fragment '?' [opt] [format]
                              extension* url+ service+
post :: 'post' url-fragment crlf crlf formencodeddata
url-fragment :: the part of the original URL after the host
    name, as specified in HTTP 1.0.
crlf :: carriage return (hex D) followed by line feed (hex A)
opt :: 'opt=' option
option :: 'generic' | 'normal' | 'tree' | 'generic+tree'
format :: [and] 'format=' form
form :: 'minimal' | 'short' | 'full' | 'signed'
extension :: token '=' token-or-quoted-string
     where the token is not one of opt, format,
     u, or s; and token-or-quoted-string follows
     the quoting conventions specified in RFC-1738
token-or-quoted-string :: token | quotedname
token :: alphanumpm+
url :: [and] 'u=' encodedURL 
service :: [and] 's=' encodedURL 
boolean :: 't' | 'f' | 'true' | 'false'
and :: '&' this must be included unless it immediately
     follows the ? in the query.
encodedURL :: a URL, with quotation as required for inclusion
     within another URL. According to RFC-1738, quotation is done
     using "%xx" notation. Alphabetic characters, digits,
     and the special characters $_-.+!*'(), need not be quoted,
     but other characters must be. This does imply that the 
     colon (:) must be encoded as %3A and slash (/) as %2F.
formencodeddata :: The query as specified for get but encoded into
     MIME type application/x-www-form-urlencoded as described in
     sections 8.2.1 and 8.2.3 of HTML 2.0.

Notes:

Detailed Syntax and Semantics of Response to Query for Labels Separate From Documents

The label bureau responds by sending back a document of type "application/pics-labels." Unless the document indicates an overall error, there should be one service-info for each rating service requested in the query. Each service-info should have an error message or a label (or list of labels, in the case of a "tree" query) for each requested URL.

The query's ordering must be preserved in the response. That is, the information from the rating services must be presented in the same order the rating services appear in the query, and the labels from each service must be presented in the same order the URLs appear in the query. If a rating service or label is not provided, the error message should appear in the same position that the service-info or label would appear. Because order is preserved, it is acceptable to omit from the labels the "for" option which indicates the URL being rated (unless the label is generic, in which case, as always for generic labels, the "for" option is required). The client should match the label positionally with the URL for which it requested a rating.

In response to a request for a generic label, only a generic label may be returned. In response to a request for a regular label, a generic label for a URL that is a prefix of the requested URL may be returned. For example, in response to a label request for URL "http://w3.org/PICS/Overview.html" a generic label for the URL "http://w3.org/PICS" (or even "http://w3.org") may be returned. In this case, it is required that the "for" and "generic" options be included in the label, to specify exactly what rating is being returned.

For a tree request, all the labels sent in response to a particular URL are enclosed in parentheses, so the client can match them positionally with the single request URL. The "for" option must be included in such labels to specify exactly which URLs the labels apply to.
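
Positional matching on the client side can be sketched in Python. The data layout is an assumption of this sketch: the response is taken to be already parsed into one item per requested service, each item being either a list with one entry per requested URL or a single non-list value standing for a service-level error.

```python
def match_labels(services, urls, response):
    """Pair each (service, URL) from the query with its entry in the response."""
    if len(response) != len(services):
        raise ValueError("expected one service-info per requested service")
    matched = {}
    for service, info in zip(services, response):
        if not isinstance(info, list):
            # Service-level error: it applies to every requested URL.
            for url in urls:
                matched[(service, url)] = info
            continue
        if len(info) != len(urls):
            raise ValueError("expected one entry per requested URL")
        for url, entry in zip(urls, info):
            matched[(service, url)] = entry
    return matched
```

For a tree request, each per-URL entry would itself be the parenthesized group of labels, whose "for" options identify the individual documents.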

MICs and Digital Signatures

This specification includes two independent security features, each intended to prevent a different problem that can arise in a PICS system. They may be used independently or together. Both features rely on patented cryptographic technology whose use is subject to a variety of legal restrictions (including possible U.S. export controls). The PICS technical committee cannot provide any information about the exact legal status of the code or algorithms.

Within the United States, RSA Laboratories (100 Marine Parkway, Redwood City, CA, 94065-1031) distributes a source code kit called RSAREF which provides all of the code required to implement the cryptographic components of the PICS spec. The president of RSA Data Security, Inc., Mr. Jim Bidzos, has advised us that RSAREF will be made available at no cost for use in implementing the PICS specifications. Questions about the legal status, etc., should be directed to Mr. Bidzos.

The first problem arises when a document has been examined and a label generated, and then the document is modified without updating the label. While this can happen legitimately (as when Time-Warner updates the page containing the current issue of Time Magazine and believes that the label is still valid) it can also happen as a result of tampering with the document by an unauthorized party. PICS labels contain three option fields intended to help deter this kind of problem:

At
If the objective is to simply detect accidental changes, then the date of last modification of the document can be calculated when the label is created and stored in the at field. Assuming that the last modification time is accurately maintained, this will detect updates to the document made after the label was created.
Until or exp
If the document is expected to be updated infrequently or periodically, the label can contain an expiration date that should cause the label to be invalid before the document is next updated. This, too, does not guard against a concerted malicious attack.
MIC-md5 or md5
If the label is intended to apply only to the data that was actually rated, then a form of checksum (called a "message digest") can be applied to the data when the label is created. The message digest is converted into US-ASCII characters using MIME base-64 encoding and stored in the MIC-md5 (also called md5) field. When the document is later retrieved, the same algorithm can be used to recompute the message digest and the two digests can be compared. The MD5 algorithm is designed so that it is extremely unlikely that the two digests will be the same if the document has been tampered with in any way.

This technique is well-known in the cryptographic community and has been adopted by the electronic mail community, where it is part of the MOSS specification. For use with electronic mail, an elaborate technique is required to assure that the two message digests will match, since electronic mail gateways can modify the data before it is delivered (by wrapping lines, for example). We have chosen not to adopt MOSS directly for PICS, largely because of this complexity.

Instead, we recommend the direct use of the MD5 algorithm on the source document and conversion of the result to base64 encoding. The resulting string of US-ASCII characters is broken into lines of 60 characters each and included directly in the mic-md5 (md5) label option. The MD5 algorithm and the conversion of the result into US-ASCII characters are provided by the RSAREF (version 2.0) software.
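As an illustration only, the three options described above can be checked along the following lines. This is a minimal sketch that uses Python's standard library rather than RSAREF; the function names (mic_md5, label_still_applies) and their parameters are hypothetical, and the dates are assumed to have been parsed into datetime values already.

```python
import base64
import hashlib
from datetime import datetime, timezone
from typing import Optional


def mic_md5(document: bytes) -> str:
    """Return the base64-encoded MD5 digest of the raw document bytes."""
    digest = hashlib.md5(document).digest()  # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")


def label_still_applies(
    document: bytes,
    last_modified: datetime,
    at: Optional[datetime] = None,
    until: Optional[datetime] = None,
    mic: Optional[str] = None,
    now: Optional[datetime] = None,
) -> bool:
    """Apply whichever of the at, until, and mic-md5 checks the label allows.

    Hypothetical helper: options that are absent from the label are passed
    as None and are simply skipped.
    """
    now = now or datetime.now(timezone.utc)
    if until is not None and now > until:
        return False  # the label has expired
    if at is not None and last_modified > at:
        return False  # the document was modified after it was rated
    if mic is not None and mic_md5(document) != mic:
        return False  # the retrieved data differs from the rated data
    return True
```

Note that a 16-byte MD5 digest encodes to 24 base64 characters, so in this sketch the 60-character line limit is never reached.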

The second problem is that of tampering with or forging labels. The end user needs some way of being reassured that the label they receive was created by the rating service they expected and that it has not been altered since it was created. PICS addresses this problem by allowing labels to be "digitally signed". A digital signature, while not currently legally recognized, is a cryptographic technique to provide exactly this assurance. The RSA signature technique works as follows:

The problem of distributing these keys (and invalidating them in case the service's key is compromised) is an active area of commercial competition. Since there is no clearly established solution available today, PICS assumes that each service will distribute the public keys in some way it chooses. It also assumes that no keys will ever have to be invalidated. While this is clearly not a perfect solution, it seems to be the limit of what can be done today without committing to specific proprietary technology.

There is one additional problem with the digital signature solution outlined above. If a rating service allows other people to generate labels under its name (for example, a service that supports self-ratings by content producers) then the labels may need to be signed by both the service and the content producer. This can be done (each signs the label without the other's signature), but it becomes quite difficult to distribute the public keys needed to verify the signature. The PICS specification does not propose a solution to this problem (it, too, is part of active commercial competition).

Signature Details

  1. PICS specifically requires the use of the RSA signature algorithm with the MD5 message digest. Should this system become outdated, the PICS specification can be easily updated to add a new label option that supports a different pair of algorithms.
  2. PICS does not specify the key length to be used for the digital signatures. Individual services will need to investigate the legal and technical ramifications involved and make a choice. Should a single answer become common, this specification may be re-issued with this detail filled in.
  3. The special form of the label that is used for signatures is computed as follows:
    • The service must decide which options it will include in the signed label when it is transmitted. Any options not transmitted with the signature cannot be used in the computation of the signature. We recommend that all options with known values be included, with the exception of signature-rsa-md5. Any option may be omitted, but it will be common for the options mic-md5 (or md5) and full (or complete-label) to be omitted. The signature-rsa-md5 option is never included in the list of options.
    • The selected options are sorted alphabetically by their shortest name (i.e. use full instead of complete-label). If a selected option has a default value and it is the same as the value to be used in the label, the option is omitted from this list.
    • For each option in the list (in order), the short name is put into the label followed by a single space followed by the value of the option, followed by a space. The shortest form of a value is used, and strings are output in lower case if they are case insensitive.
    • After all of the options have been output, output the characters "r (".
    • Output the transmission names and their values, in alphabetical order by transmission name (using the US-ASCII character collating sequence for "alphabetical order"), separating the transmission name from the value by a single space. In outputting the value, no whitespace is permitted except for a single space used to separate items in a multi-value.
    • Output a ")"
  4. When the client computes the special label format described above, it will use all options available to it: both those in the single-label and in the service-info. This implies a constraint on the server when it decides what options to include in the transmitted set. The transmitted set must include any options that the server ships as part of the service-info, unless either the value specified in the service-info or the value of the option for this label is the default value of the option.
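The construction of the special label form in item 3 can be sketched as follows. This is an illustration under stated assumptions, not a normative implementation: the function name canonical_label and its parameters are hypothetical; the caller is assumed to supply short option names and values already in their shortest (and, where applicable, lower-case) form; and a single space is assumed between successive transmission-name/value pairs, which the text above does not state explicitly. The RSA signing step itself is not shown.

```python
from typing import Dict, Optional


def canonical_label(
    options: Dict[str, str],
    ratings: Dict[str, str],
    defaults: Optional[Dict[str, str]] = None,
) -> str:
    """Build the string over which the signature is computed (sketch)."""
    defaults = defaults or {}
    parts = []
    # Options are sorted by their short name; Python compares ASCII strings
    # by code point, which matches the US-ASCII collating sequence.
    for name in sorted(options):
        if name == "signature-rsa-md5":
            continue  # never part of the signed form
        if name in defaults and defaults[name] == options[name]:
            continue  # options equal to their default value are omitted
        parts.append(f"{name} {options[name]} ")
    parts.append("r (")
    # Assumption: one space separates successive name/value pairs.
    parts.append(" ".join(f"{t} {v}" for t, v in sorted(ratings.items())))
    parts.append(")")
    return "".join(parts)
```

For example, with the illustrative options by "john doe" and gen true and the ratings l 3 and v 2, the sketch yields the string: by "john doe" gen true r (l 3 v 2)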

Glossary

application/pics-labels
A new MIME data type used to transmit one or more labels, defined in this document.
application/pics-service
A new MIME data type used to describe a rating service, defined in Rating Services and Rating Systems.
BNF
Backus-Naur Form (or Backus Normal Form). A notation for describing a formal syntax, used extensively in describing programming languages and computer-readable data formats.
category
The part of a rating system which describes a particular criterion used for rating. For example, a rating system might have three categories named "sexual material," "violence," and "vocabulary." Also called a dimension.
content label
A data structure containing information about a given document's contents. Also called a rating or content rating. The content label may accompany the document it is about or be available separately.
content rating
See content label.
dimension
See category.
HTML
HyperText Markup Language. A means of representing hypertext documents. Based on SGML. See the HTML 2.0 Proposed Standard.
HTTP
HyperText Transfer Protocol. Used for retrieving document contents and/or descriptive header information. See the draft HTTP specification.
hypertext
Text, graphics, and other media connected through links.
label
See content label.
label bureau
A computer system which supplies, via a computer network, ratings of documents. It may or may not provide the documents themselves.
MD5
An algorithm (see RFC-1321) that can be used to compute a MIC. PICS specifies this particular algorithm for use in PICS labels.
MIC
Message Integrity Check. Also known as a "cryptographic checksum." For PICS, the importance of a MIC is that a rating service can compute the MIC of a piece of information when the label is created and that MIC can be put into the label itself. A client can retrieve the label and the information to which it is supposed to be attached, recompute the MIC and compare it to the one in the label. If they match, for all practical purposes, it is a proof that the label really belongs to the information that has been retrieved. The particular algorithm specified by PICS to compute the MIC is MD5.
MIME
Multipurpose Internet Mail Extensions. A technique for sending arbitrary data through electronic mail on the Internet. See RFC-1521.
PICS
Platform for Internet Content Selection, the name for both the suite of specification documents of which this is a part, and for the organization writing the documents. For more information, see http://w3.org/PICS
rating
See content label.
rating server
See label bureau.
rating service
An individual or organization that assigns labels according to some rating system, and then distributes them, perhaps via a label bureau or via CD-ROM.
rating system
A method for rating information. A rating system consists of one or more categories.
scale
The range of permissible values for a category.
SGML
Standard Generalized Markup Language. See ISO 8879.
transmission name
(of a category) The short name intended for use over a network to refer to the category. This is distinct from the category name inasmuch as the transmission name must be language-independent, encoded in US-ASCII, and as short as reasonably possible. Within a single rating system the transmission names of all categories must be distinct. URLs, while generally longer than desired, can be used as transmission names. Hence transmission names are case sensitive.
URL
Uniform Resource Locator. Described in RFC-1738. A URL describes the location and means of retrieval for a single document. It consists of three components: the "scheme" (protocol used to retrieve a document, like "http" or "ftp"), a host name, and a hierarchical document name within that host. For example "http://w3.org/PICS" is the URL of the PICS home page. The scheme for retrieving it is "http," the host is "w3.org" and the name within that host is "PICS". Notice that PICS defines an additional scheme beyond those listed in RFC-1738, described in Rating Services and Rating Systems, which allows Chat (IRC) rooms to be named.

References

  1. PICS, Rating Services and Rating Systems, Internet Draft, "draft-pics-services-00.txt", 11/21/1995.
  2. R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321, 04/16/1992.
  3. N. Borenstein, N. Freed, "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, 09/23/1993.
  4. T. Berners-Lee, D. Connolly, "Hypertext Markup Language - 2.0", RFC 1866, 11/03/1995.
  5. T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource Locators (URLs)", RFC 1738, 12/20/1994.

Acknowledgments

Comments and suggestions from the following people are gratefully acknowledged:
Brenda Baker, AT&T
Tim Berners-Lee, W3C
Roxana Bradescu, AT&T
Daniel W. Connolly, W3C
Roy Fielding, W3C
Jay Friedland, SurfWatch
Michael Gordon, Prodigy
Wayne Gramlich, Sun
Woodson Hobbs, NewView
Rohit Khare, W3C
Charlie Kim, Apple
John C. Klensin, MCI
Ann McCurdy, Microsoft
Rich Petke, CompuServe
Dave Raggett, W3C
Bob Schloss, IBM
David Singer, IBM
Michael Smith, Prodigy
Marcy Swenson, Providence Systems
Jason Thomas, MIT

Appendix A: An Algorithm for Locating a Label Bureau

As the use of PICS grows, we must consider its impact on overall network performance. In general, the PICS techniques for transmitting labels in or with documents add only a very small amount of traffic to the net, since the additional PICS headers will ordinarily contain only a few hundred bytes of data and the documents themselves are more likely to be several thousand bytes of data. Furthermore, since the labels come from the same source as the document itself there is no network hot spot created by PICS (although popular servers may themselves already be such hot spots).

Label bureaus, however, are a new component proposed by PICS. If a single label bureau becomes popular, there is a significant risk of its becoming a hot spot and hence a performance bottleneck for the PICS system. The Internet is in need of a good solution to this problem, and there is work (both underway and proposed) that may solve the problem in the long term.

In the short term, however, there is no truly good solution. The following suggestion comes from Prof. David Karger at MIT. It is a variant on several well-known algorithms for distributing load in a system.

First, we assume that popular label bureaus will be able to establish a number of mirror sites around the network. This is already common practice, and we have no suggestions for the details of determining the sites or keeping them updated as new labels are generated. Our algorithm simply assumes that they exist and are equivalent, and that the network's Domain Name System (DNS) has records which map the single well-defined name for the label bureau to multiple Internet addresses, in the usual manner.

When client software starts, it should attempt to resolve the name of the label bureau it wishes to use (we assume one label bureau, but the algorithm extends in an obvious manner to multiple bureaus) through DNS. If it receives more than one host address, it saves the entire list and chooses two at random, labeling one the "primary" and the other the "secondary" bureau. Alternatively, these may be configuration parameters of the client software that are then validated when the software starts. It also divides 60 minutes by the total number of addresses it can find for the label bureau, sets a timer to this value, and remembers this as the "threshold" value.

Every time the client wishes to contact the label bureau it does the following. If the timer is below the threshold, the primary bureau address is used. Otherwise, the query is sent to both the primary and the secondary label bureau addresses. When the first answer arrives, the connections to both label bureaus are closed down. The bureau which answered first becomes the primary bureau. In any case, a new secondary bureau address is chosen at random and the timer is reset to the threshold value.
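The selection procedure described above might be sketched as follows. This is one possible reading, offered under stated assumptions: "the timer is below the threshold" is taken to mean that less than the threshold interval has elapsed since the timer was last reset, and the timer is assumed to be reset only after a query has been sent to both addresses. The class BureauSelector, its method names, and the injectable clock and rng parameters are hypothetical; DNS resolution and the network queries themselves are left to the caller.

```python
import random
import time


class BureauSelector:
    """Tracks the primary and secondary label bureau addresses (sketch)."""

    def __init__(self, addresses, clock=time.monotonic, rng=random):
        if not addresses:
            raise ValueError("at least one bureau address is required")
        self.addresses = list(addresses)
        self.clock = clock
        self.rng = rng
        # 60 minutes divided by the number of known addresses, in seconds.
        self.threshold = 60 * 60 / len(self.addresses)
        if len(self.addresses) >= 2:
            self.primary, self.secondary = rng.sample(self.addresses, 2)
        else:
            self.primary, self.secondary = self.addresses[0], None
        self.timer_start = clock()

    def targets(self):
        """Return the addresses to which the next query should be sent."""
        elapsed = self.clock() - self.timer_start
        if self.secondary is None or elapsed < self.threshold:
            return [self.primary]
        return [self.primary, self.secondary]

    def record_first_responder(self, winner):
        """Call after a dual query with the address that answered first."""
        self.primary = winner
        others = [a for a in self.addresses if a != self.primary]
        self.secondary = self.rng.choice(others)  # new random secondary
        self.timer_start = self.clock()  # reset the timer
```

A client would call targets() before each query and, whenever two addresses were returned, call record_first_responder() with whichever one answered first.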

A simple variant on this algorithm will probably become feasible in the near future. When the HTTP protocol is updated to allow "keep alive" connections to a server, the PICS client should keep its connection to the primary label bureau alive as long as possible. Then, instead of simply accepting the first response and considering the responder as the primary, a more careful measurement must be made. The time required to send the query and receive the response must be measured, rather than the total transaction time: connection setup costs can be quite high, and would distort the measurement if one compared the round trip time to the primary bureau through an existing connection to the time to establish the connection to the secondary bureau plus the round trip time.
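The measurement described for this variant can be sketched as below. The helper measure_query and its connect and query callables are hypothetical stand-ins for whatever HTTP client is in use; the point illustrated is only that the clock starts after the connection exists, so that connection setup is excluded from the comparison.

```python
import time


def measure_query(connect, query, existing=None):
    """Time only the query/response exchange, not connection setup.

    connect() opens a new connection; query(conn) sends the request and
    returns the response. Pass an open keep-alive connection as `existing`
    to reuse it.
    """
    conn = existing if existing is not None else connect()
    start = time.perf_counter()  # started only once the connection exists
    response = query(conn)
    elapsed = time.perf_counter() - start
    return conn, response, elapsed
```

Comparing the elapsed values for the primary (over its existing connection) and the secondary (over a freshly opened one) then reflects round trip time alone.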


Comments to Jim Miller.
Created 21 November 1995 by Jim Miller
Last updated 3 Mar 1996