PICS Developers' Workshop Summary

Held June 20-21, MIT

Notes by Paul Resnick (presnick@research.att.com)

Agenda

9:10		Welcome and Introductions
10:00		PICS History (Weitzner and Miller)
10:15		PICS Spec Overview (Resnick)
11:00		Break
11:15		Standards Process (Miller)
11:30		Afternoon Preview (Resnick, Schloss, Miller, Kotok)
12:00		Lunch (provided)
1:30		Breakout sessions:
          1)  Operating with PICS-1.1
          2)  Protocol evolution  
3:30		Break
4:00		Reports back
5:00		Adjourn

Introductions and briefings on the current status of the specifications and the public policy process took most of the morning. More than 40 people attended the first PICS developers' workshop, representing xx companies and organizations. Danny Weitzner discussed ramifications of the Philadelphia court injunction against enforcing the CDA. It is likely that the government will appeal to the Supreme Court. It is also possible that the New York court, hearing a similar case, may come to a different conclusion than the Philadelphia court. Further legislation in the U.S. is also a possibility once the current CDA status becomes final. Catherine Soubeyrand summarized a recently passed French law that requires Internet Service Providers to do two things: 1) filter materials deemed illegal by an appointed Government labeling body; 2) give subscribers access to filtering technology so that they can choose what else they want to block.

I walked the group through the specs. Many questions came up, but since I was presenting, I don't have notes (anyone who took notes, please send them and I'll link them in here). Several questions were redirected to the afternoon breakout session on operating with the 1.1 specs. A few points of terminology kept tripping us up, so here's a quick glossary that we agreed to follow as best we could for the rest of the day:

As technical terms, we use label and rating interchangeably. Depending on the audience, one may communicate more effectively than the other.
A rating system or rating vocabulary is the dimensions and scales used for labeling.
A rating service is the entity that provides the labels. We say that RSACi and SafeSurf are rating services, even though webmasters make their own decisions about which RSACi or SafeSurf labels apply to their documents.
A label bureau is an http server that responds to requests for labels independent of documents.

Jim Miller discussed options for moving PICS from de facto to de jure standard, including IETF, ISO, and IEEE. Most participants felt that it was not worth the effort, though there were a few dissenters. The group voiced confidence in W3C to make appropriate decisions about when and whether to forward the work to an official standards body.

In the afternoon, we split into two breakout groups. One discussed the current state of implementations, the other future protocol evolution.

Working With the 1.1 Specs

The current implementations group surveyed what features are currently being implemented, decided to create several web pages to keep tabs on the developer community's progress, and identified several areas where additional work is needed.

Status

Keeping in mind that not all developers were present (regrettably, the invitations went out only 3 weeks prior to the workshop), the following appears to be the current status:

Several implementors are using the filename extension .rat for local storage of a service description (the MIME type application/pics-service). No one claimed to be using anything else. From here on, everyone is encouraged to use .rat.
No one is paying attention to icons associated with scales or values in service descriptions, and neither SafeSurf nor RSACi is providing icons. Developers agreed that the specification needs to be more specific about icons, specifying a size (e.g., 48x48) and an encoding (e.g., GIF), in order to be useful. Rating services are hereby warned that effort put into creating beautiful icons associated with numeric values may be wasted.
The PICS extensions do not rely on implementation of full PEP. The additional PICS headers (Protocol-Request, Protocol, PICS-Label) will work with http/1.0 and http/1.1. Servers and clients should be prepared for PICS headers in either of these versions of http.
SafeSurf is not providing expiration dates for its labels, and RSACi is providing 1-year expiration dates. No one knew of any instances of expiration times in the two-minute range.
Security features. RSACi and SafeSurf are not signing labels or providing MD5 document hashes. None of the clients have yet implemented the ability to decode signatures or MD5 hashes.
Generally, clients are providing restricted Boolean filtering rules (profiles). For example, Microsoft provides only an implicit AND (violence < 3 AND literary quality < 2). Some clients permitted rules based on labels from more than one rating service. There were also a few variants on requiring there to be at least one label versus requiring that every service have labeled the item. In the other room, they were deciding to define a standard format for profiles, so this discussion will no doubt continue in that forum.
Clients differed in how they look for labels when a document does not have one in a META element. Two implementors said that they give up and assume that no label is present. One said that his software checks for a "site" label in the site home page. That is, if the document URL is "http://www.greatdocs.com/foo/bar/bat.htm", and there is no label in the document (or in the http header stream accompanying the document), the software GETs "http://www.greatdocs.com/" and looks for a generic label in the META element of that page. Ray Soular from SafeSurf argued that it would be better to look for a generic label for the immediate directory, "http://www.greatdocs.com/foo/bar/". He has been telling sites that rate themselves through SafeSurf that they should make generic labels on a per-directory basis. He also pointed out that many people get only a single directory, not an entire domain name, and so looking for a generic site label would be too generic. A lively discussion ensued.
In the end, those who were not looking up the hierarchy for generic labels at all seemed to carry the day, since an extra connection and document download may be a significant performance penalty, especially since it happens only after processing the original document. It was agreed that tools are needed so that webmasters can enter wildcard labels for directories and have the server automatically send out the labels with requested documents, rather than counting on the client to go looking. Several kinds of tools may help:
1. HTTP servers that put labels in the http header stream could look in files up the hierarchy for generic labels, or, better yet, in a local labels database. Two http server developers were present who are planning to add support for labels in the http header stream.
2. Popular web authoring tools could build in support for adding labels and propagating generic labels through all the specific documents.
3. New stand-alone tools could look for generic labels and propagate them through all the documents in a directory. Several participants indicated that they could find or write such tools for specific platforms.
It was also suggested that the recent distributed indexing workshop held by W3C had dealt with similar problems of finding the home directory for a page, and may have come up with interesting solutions.
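
To make the generic-label discussion concrete, here is a minimal sketch of how a server, a stand-alone labeling tool, or a client might enumerate the places where a generic label could apply to a document, from the immediate directory up to the site root. The function name generic_label_candidates is hypothetical and nothing like it is defined in the PICS specifications; the sketch only computes candidate URLs and does not fetch or parse labels.

```python
from urllib.parse import urlsplit, urlunsplit


def generic_label_candidates(doc_url):
    """Return directory URLs that could carry a generic label for doc_url.

    The list is ordered from most specific (immediate directory) to least
    specific (site root). Hypothetical helper, for illustration only.
    """
    parts = urlsplit(doc_url)
    segments = [s for s in parts.path.split("/") if s]

    # If the path names a document rather than a directory, drop the
    # document name so that we start from its containing directory.
    if segments and not parts.path.endswith("/"):
        segments = segments[:-1]

    candidates = []
    while True:
        path = "/" + "/".join(segments) + ("/" if segments else "")
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        if not segments:
            break
        segments.pop()  # move one level up the hierarchy
    return candidates


if __name__ == "__main__":
    for url in generic_label_candidates("http://www.greatdocs.com/foo/bar/bat.htm"):
        print(url)
```

For the example URL used above, the first candidate is the per-directory location that Ray Soular recommended and the last is the site home page that one client already checks. A server-side tool could walk the same list against a local labels database, which avoids the extra client connections that concerned the group.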

New Resources

We agreed to keep lists of resources that will be useful to developers. To spread the burden, these lists will be maintained by various people, with the PICS page linking to them. Since the people maintaining these pages may need to include links to competitors' products, those who are maintaining the pages all agreed to be: "fair, equitable, and speedy." If you are on this list, please send me a URL as soon as possible, and I will make the links. On the page that you create, please indicate submission instructions so that people can send you additional links. Please indicate also that pics-ask@w3.org is a good place to send an "appeal" if anyone feels that the page is not being run fairly, equitably, and speedily.

There will be lists of:

Client software (Susan Getgood, Microsystems)
HTTP servers (Martin Presler-Marshall, IBM)
Proxies (Kevin Fink, N2H2)
Label bureaus (Paul Resnick, AT&T)
Labeling tools (Paul Resnick, AT&T)
Protocol extensions (Bob Schloss, IBM)
Hints to implementors (Bob Schloss, IBM)
PICS-related press releases from all companies (Susan Getgood, Microsystems)
Rating services and rating vocabularies/systems (Ray Soular, SafeSurf)

More Work Needed

We identified several areas where additional work is needed:

Convince the major HTTP server vendors to pass labels in the header stream. IBM's server will have this feature, and Robert Thau offered to put a limited version into the Apache server, but it would be nice to get it into all the major servers, so that webmasters can move away from embedding labels in documents.
Tools for rating are needed, as noted above.
Test suites are needed. The developers asked W3C to create a test suite as part of the reference code. Jim Miller agreed in principle, but suggested that any help member companies provide will speed the process.
A common language for profiles is needed, so that they can be easily downloaded and installed, saving parents from having to set the volume on each of the rating dimensions. This should be taken care of by the new profile format that will be developed.
An NNTP extension for requesting labels in netnews, similar to the HTTP extension already defined, may be useful. No working group has yet formed on this, although individuals are thinking about it.

Protocol Extensions

(Slightly edited version of summary provided by Alan Kotok)

Jim Miller discussed the question of whether there needed to be a way of asking a rating bureau for lists of documents matching certain ratings. The initial conclusion was, no, not yet. But this turned around later, when a subcommittee was solicited to develop this idea.

Jim Miller then discussed the question of whether the protocol needed to allow arbitrary text strings as values of ratings. He pointed out that all known justifications for this requirement could be met in other ways. It was agreed to postpone this until a more compelling argument is made. However, in the discussion, a problem with text strings being language-specific was identified. Proposed solutions to that problem were (1) including language identifiers to tag the strings (where one or more such strings were provided), and (2) having multiple "ratings" retrieved using the yet-to-be-widely-adopted language preference part of HTTP.

In the discussion about interfacing PICS to Search Services, we discussed both sending the filtering criteria to the Search Service, and the Search Service supplying enough information for the browser to do filtering.

We agreed that the latter was better for the strict purpose of filtering, but that the former was required for other reasons as well: PICS may well convey many kinds of "ratings", some of which have nothing to do with filtering, but which may well be useful as search criteria. Therefore it is desirable that there be a standard protocol to convey PICS-based criteria to search services, both for use in guiding searches and for avoiding the problem of "here's the first 10 responses, but 9 of them were censored by your browser."
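
The "9 of them were censored" problem can be illustrated with a small sketch comparing the two orders of operation. Everything here is an assumption made for illustration: the hit records, the "violence" field, and the function names are hypothetical, and no protocol for conveying criteria to a search service has been defined.

```python
def search_then_filter(ranked_hits, is_acceptable, page_size=10):
    """Search service returns a full page; the browser filters it afterwards."""
    page = ranked_hits[:page_size]
    return [hit for hit in page if is_acceptable(hit)]


def filter_then_search(ranked_hits, is_acceptable, page_size=10):
    """Search service applies the criteria itself before building the page."""
    acceptable_hits = [hit for hit in ranked_hits if is_acceptable(hit)]
    return acceptable_hits[:page_size]


if __name__ == "__main__":
    # Hypothetical ranked results: the nine best matches carry a high
    # violence rating, and the remaining matches carry a low one.
    hits = [
        {"url": f"http://example.org/doc{i}", "violence": 4 if i < 9 else 0}
        for i in range(30)
    ]

    def under_limit(hit):
        return hit["violence"] < 3

    print(len(search_then_filter(hits, under_limit)))  # filtered in the browser
    print(len(filter_then_search(hits, under_limit)))  # filtered by the service
```

With this sample data, browser-side filtering leaves a single result on the first page, while the service-side version returns a full page of ten.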

Some issues raised:

How is a document composed from many URLs conveyed? It was claimed that HTML now has a "section" identifier.
Browsers always tell servers which PICS labels they want. What happens if the search engine doesn't have such a label? Were all embedded labels extracted? Are they forwarded?
If there isn't a label, and the browser goes to the label bureau to get a label, it may have changed since the crawler found the document. Then what?
The distributed search and indexing workshop in May discussed META info, schemes, home and title page definitions, document-based robot instructions. A summary of that workshop can be found at http://www.w3.org/pub/WWW/Search/9605-Indexing-Workshop/

pics-profiles working group

A new working group was formed, with work to be conducted by mailing list. If you would like to be on the mailing list, send email to Kevin Fink saying why you'd like to participate. The pics-profiles working group will specify a format for describing a PICS preference profile. A preference profile indicates what labels are required in order for a URL to be "acceptable" for a particular user. The profile will have to indicate which services' labels should be consulted, and what constitutes an acceptable label. It may also include information such as the user name and password associated with the profile, or rules about which profiles apply to which users. Details are still to be determined. It is believed that the preference profile will also be sufficient for communicating queries to label bureaus, such as "find all labels above 3 on scale A and between 2 and 4 on scale B." Enough people had this hunch that we decided to make a single working group for both functions. We'll watch carefully to make sure the eventual preference profile format also is adequate as a query language. The work of this group will be carried out exclusively by email, at least for the time being. Jonathan Brezin from IBM has agreed to take primary responsibility for creating an initial proposal that we can all respond to. Anyone else is also welcome to suggest options or full proposals.
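
Since the profile format is still to be determined, the following is only a sketch of the kind of rule evaluation a preference profile would need to express, not a proposal for the format itself. It assumes a profile is an implicit AND of numeric range rules, each naming a rating service and a scale. The Rule class, the acceptable function, the require_every_service switch, and the example service URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    service: str  # URL identifying the rating service (hypothetical example below)
    scale: str    # name of a scale in that service's rating system
    low: float    # lowest acceptable value, inclusive
    high: float   # highest acceptable value, inclusive


def acceptable(labels, rules, require_every_service=True):
    """Decide whether a URL's labels satisfy every rule (implicit AND).

    labels maps a service URL to a dict of {scale: value} for one document.
    If require_every_service is False, services with no label are skipped,
    but at least one service named in the rules must have labeled the item.
    """
    rules_by_service = {}
    for rule in rules:
        rules_by_service.setdefault(rule.service, []).append(rule)

    labeled_services = 0
    for service, service_rules in rules_by_service.items():
        ratings = labels.get(service)
        if ratings is None:
            if require_every_service:
                return False
            continue
        labeled_services += 1
        for rule in service_rules:
            value = ratings.get(rule.scale)
            # Assumption: a label that omits a required scale is rejected.
            if value is None or not (rule.low <= value <= rule.high):
                return False
    return labeled_services > 0


if __name__ == "__main__":
    svc = "http://ratings.example.org/v1"  # hypothetical service URL
    profile = [Rule(svc, "violence", 0, 2), Rule(svc, "literary quality", 0, 1)]
    print(acceptable({svc: {"violence": 1, "literary quality": 0}}, profile))
    print(acceptable({svc: {"violence": 3, "literary quality": 0}}, profile))
```

The same range rules could in principle be read as a label bureau query of the "between 2 and 4 on scale B" kind, which is the hunch that led to a single working group; whether that holds up is exactly what the group will need to check.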