W3C PICS

PICS: Internet Access Controls Without Censorship

Paul Resnick
AT&T Research
600 Mountain Avenue
Murray Hill, NJ 07974
presnick@research.att.com

James Miller
World Wide Web Consortium
MIT Laboratory for Computer Science
Room NE43-355
545 Technology Square
Cambridge, MA 02139
jmiller@mit.edu

An updated version of this article, which has been accepted for publication in Communications of the ACM, is now available.

With its recent explosive growth, the Internet now faces a problem that confronts all media that serve diverse audiences: not all materials are appropriate for every audience. Societies have confronted this problem differently for different media [1, 2]. For example, in the USA, there are few restrictions on the content of printed materials that may be distributed, but more restrictions on what may be broadcast on television and radio. Any such restrictions on distribution, however, will be too restrictive from some perspectives, yet not restrictive enough from others. For the Internet, we can do better: the Internet can regulate itself through local choices about what to receive.

PICS, the Platform for Internet Content Selection, is designed to enable supervisors (whether parents, teachers, or administrators) to block access from their computers to certain Internet resources, without censoring what is distributed to other sites. It draws on two unique features of the Internet. First, publishing is instantaneous, world-wide, and very inexpensive, so it is easy to publish rating and advisory labels. Labels and ratings already help consumers choose many products, from movies to cars to computers. Such labels are provided by the producers or by independent third parties, such as consumer magazines. Similarly, labels for Internet resources could help users to select interesting, high-quality materials and could help supervisors to block access to inappropriate ones. Second, access to Internet resources is mediated by computers that can process far more labels than any person could. Thus, parents, teachers, and other supervisors need only configure software to selectively block access to resources based on the rating labels; they need not personally read them.

In August 1995, representatives from twenty-three companies and organizations gathered under the auspices of MIT's World Wide Web Consortium to discuss the need for content labeling, especially for use with selection software. Faced with the obvious desire of some parents and other users to control what they or their children have access to, the companies felt that such tools were important to the continued growth of Internet and on-line service use in homes, schools, and businesses. The attendees also hoped to offer a positive alternative to U.S. government regulation of on-line content, which would dampen free speech. The group decided to develop a technical infrastructure that would support label distribution and selection software. PICS, the fruit of the group's work, establishes conventions for label formats and distribution methods, without dictating a labeling vocabulary or who should pay attention to which labels. It is analogous to specifying where on a package a label should appear, and in what font it should be printed, without specifying what it should say.

The Need for Flexible Blocking

The Internet and on-line services offer a wide range of information; many people feel that some of it is inappropriate for some audiences, at least some of the time. Parents may not wish to expose their children to sexual or violent images. Businesses may want to prevent their employees from visiting recreational sites during hours of peak network usage. The "off" button is too crude: there should be some way to block only the inappropriate material.

Appropriateness, however, is neither an objective nor a universal measure. It depends on at least three factors: the supervisor, the recipient, and the context.

The problem of suppressing inappropriate materials is not unique to the on-line world, and various approaches have been used. The coarsest intervention is to classify some materials as inappropriate and bar them entirely from certain distribution channels. For example, in the United States it is illegal to distribute obscene materials through the mail or most other channels, including computer networks. This approach does not take into account the supervisor, recipient, or context, and it gives great power to those who choose the criteria or perform the classification, power that may be abused. Moreover, distribution restrictions are difficult to enforce on the Internet; publishers can operate from countries with less restrictive regulations yet still reach a global audience.

The motion picture industry in the United States employs a slightly finer-grained mechanism. A rating board, the MPAA, rates movies on a five-point scale: G, PG, PG-13, R, and NC-17. Theaters are supposed to refuse entry to R-rated movies to children under age seventeen who are not accompanied by an adult. While this system considers the recipient (it distinguishes children from adults and supervised from unsupervised children), it treats all fourteen- to sixteen-year-old children alike, regardless of their maturity or the mores of their parents.

More flexible, selective blocking systems quickly become too burdensome for people to enforce. For example, some parents think that R-rated movies, which tend to contain nudity, are acceptable for their children, but PG movies, which tend to contain violence, are inappropriate. Other parents might prefer a completely different rating system, such as one provided by a church. It would be impractical for the movie theaters to keep track of which rules apply to which children.

The task is feasible, however, for software. The basic idea, illustrated in Figure 1, is to interpose selection software between the recipient and the on-line resources. Several selection software products were either under development or already on the market when we began work on PICS, including SurfWatch, CyberPatrol, NewView, and Parental Guidance. Each product included labels indicating whether certain Internet sites were acceptable as well as software that blocked access to unacceptable sites. But none of them could process the labels provided by a competing product. It was clear to the on-line services and many of the filtering software vendors that some technical conventions would be needed to allow innovations in labeling systems and services to proceed independent of innovations in software that makes use of labels.

Figure 1: selection software automatically blocks access to some resources, but not others.

PICS separates the selection software from the rating labels: any PICS-compliant selection software can read any PICS-compliant labels. In fact, a single site or document may have many labels, provided by different organizations. Consumers choose their selection software and label sources, as illustrated in Figure 2.

Figure 2: selection software blocks some content, based on labels provided by publishers and third-party labeling services, and on selection criteria set by the parent.

The separation of selection software from rating services will enable both markets to flourish. Software companies and on-line services that prefer to remain value-neutral can offer selection software without providing any rating labels; values-oriented organizations can offer labels, even if they lack the expertise to write selection software.

Labels may come from many sources. Information publishers may self-label, just as manufacturers of children's toys currently label products with text such as, "Fun for ages 5 and up." Much as independent consumer magazines rate products, third-party ratings of information resources can also be useful. For example, the Wiesenthal Center, which is concerned about Nazi propaganda and other hate speech available on-line, could label materials that are historically inaccurate or promote hate. A teacher might label a set of NASA photographs and block access to everything else for the duration of an astronomy lesson. A service like Yahoo might rate everything, or at least the most popular resources. With multiple perspectives to choose from, parents and other supervisors can choose labeling sources that reflect their goals and values, and ignore all other labels.

PICS also allows labels to use non-binary rating scales. Rather than being limited to permitted/prohibited labels, services can invent more complex scales. For example, Yahoo labels might include a "coolness" value and a subject category. Non-binary labels enable more flexible blocking rules. For example, if a rating service used the MPAA's movie-rating scale, an eight-year-old might be permitted access only to G-rated sites while a fifteen-year-old might be permitted access to PG-rated sites as well.
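
The age-based rule just described amounts to comparing a site's rating against a ceiling. The following Python sketch is one way to express it. The names MOVIE_SCALE, MAX_RATING, and is_permitted are hypothetical, and the integer encoding of the scale (G as 0 through NC-17 as 4) is assumed here for illustration.

```python
# Assumed integer encoding of the movie-rating scale (illustrative).
MOVIE_SCALE = {"G": 0, "PG": 1, "PG-13": 2, "R": 3, "NC-17": 4}

# Hypothetical per-child ceilings chosen by a parent.
MAX_RATING = {
    "eight_year_old": MOVIE_SCALE["G"],
    "fifteen_year_old": MOVIE_SCALE["PG"],
}


def is_permitted(child: str, site_rating: int) -> bool:
    """Allow access only when the site's rating does not exceed the child's ceiling."""
    return site_rating <= MAX_RATING[child]


if __name__ == "__main__":
    for child in MAX_RATING:
        for name, value in MOVIE_SCALE.items():
            verdict = "allowed" if is_permitted(child, value) else "blocked"
            print(child, name, verdict)
```

A binary permitted/prohibited label could express only one of these two ceilings; the ordered scale lets the same label serve both children.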

A Tour of the PICS Specifications

PICS provides neither selection software nor a rating system. It simply establishes conventions for describing rating systems and for label formats, so that PICS-compatible software can read labels from any source. It also establishes technical specifications for label distribution, so that software from different vendors can exchange labels. We consider each in turn.

First, PICS specifies a standard format for describing a labeling service, the new MIME type application/pics-service. This is the key element that enables selection software to read any set of labels. Selection software reads service descriptions written in this format, to interpret content labels and to help end-users configure selection software.

Figure 3 shows the description of a sample rating service based on the movie rating scale. The initial section includes a pointer to a document that describes the labeling system and criteria for assigning ratings; a URL that serves as an identifier for the labeling service; an icon; a name; and a longer description. The second section describes each of the dimensions, or categories, and the scales used for each. In this case, there is just a single category, with five possible values: G through NC-17. In actual labels, these values would be represented by the integers 0-4; the service description allows a software program to determine that a value of 1 corresponds to the PG rating. Other services might include more than one dimension; for example, the Recreational Software Advisory Council (RSAC) system of advisories for computer games has three: sex, language, and violence.

((PICS-version 1.0)
  (rating-system "http://moviescale.org/Ratings/Description/")
  (rating-service "http://moviescale.org/v1.0")
  (icon "icons/moviescale.gif")
  (name "The Movies Rating Service")
  (description "A rating service based on the MPAA's movie rating scale")

  (category 
   (transmit-as "r")
   (name "Rating")
   (label (name "G") (value 0) (icon "icons/G.gif"))
   (label (name "PG") (value 1) (icon "icons/PG.gif"))
   (label (name "PG-13") (value 2) (icon "icons/PG-13.gif"))
   (label (name "R") (value 3) (icon "icons/R.gif"))
   (label (name "NC-17") (value 4) (icon "icons/NC-17.gif"))))

Figure 3: A PICS-compatible description of a service that is based on the MPAA movie rating scheme.
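
To make concrete how a program might determine that a value of 1 corresponds to PG, the following Python sketch reads an abridged copy of the Figure 3 description (icons omitted) and builds a table from integer values to names. It is a minimal illustration, not a conforming PICS parser: the functions parse and value_names are hypothetical, and the tokenizer handles only the parenthesized syntax that appears in the figure.

```python
import re

# Abridged copy of the Figure 3 service description (icon entries omitted).
SERVICE_DESCRIPTION = '''
((PICS-version 1.0)
  (rating-system "http://moviescale.org/Ratings/Description/")
  (rating-service "http://moviescale.org/v1.0")
  (name "The Movies Rating Service")
  (category
   (transmit-as "r")
   (name "Rating")
   (label (name "G") (value 0))
   (label (name "PG") (value 1))
   (label (name "PG-13") (value 2))
   (label (name "R") (value 3))
   (label (name "NC-17") (value 4))))
'''

# Tokens: an open paren, a close paren, a quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def parse(text):
    """Parse parenthesized text into nested Python lists of strings."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok.strip('"'))
    return stack[0][0]


def value_names(description):
    """Map each category's transmit-as name to {integer value: label name}."""
    result = {}
    for item in description:
        if item[0] != "category":
            continue
        fields = {sub[0]: sub[1] for sub in item[1:] if sub[0] != "label"}
        labels = [dict(sub[1:]) for sub in item[1:] if sub[0] == "label"]
        result[fields["transmit-as"]] = {int(l["value"]): l["name"] for l in labels}
    return result


if __name__ == "__main__":
    print(value_names(parse(SERVICE_DESCRIPTION)))
```

The same table is what configuration software would need in order to show a parent the name "PG" rather than the bare integer 1.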

To illustrate how this service description might be used, consider the prototype configuration software shown in Figure 4. Here the parent is configuring which sites Johnny can visit based on the RSAC system. The parent drags the slider to indicate the maximum permitted value on the violence scale. The software has lifted the icons and the text description ("Strong, vulgar language…") directly from the service description. The details of configuration software packages may differ; the prototype shown here, written by MIT student Jason Thomas, merely illustrates that the selection software can use any service description to automatically generate a reasonable user interface.

Figure 4: Prototype software draws on text and icons in the service description to automatically generate a user interface for configuring selection rules.

Second, PICS specifies a standard format for labels. Figure 5 shows a sample. The URL on the first line, which identifies the labeling service, makes it possible to redistribute labels yet still identify their original sources. The label can also include information about itself, such as the date on which it was created, the date it will expire, the resource with which it is associated (in this case, "http://www.gcf.org/stuff.html"), and the label's author. The last line shows the attributes that describe the resource: a "language" value of 3; "sex" 2; and "violence" 0.

(PICS-1.0 "http://www.rsac.org/v1.0/" 
 labels 
  on "1994.11.05T08:15-0500" 
  until "1995.12.31T23:59-0000" 
  for "http://www.gcf.org/stuff.html" 
  by "John Patrick" 
  ratings (l 3 s 2 v 0))

Figure 5: A sample label from the service described above.
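
As an illustration of how software might pull these fields out of a label, the Python sketch below extracts the service URL, the quoted options, and the ratings from the Figure 5 sample. It is a simplified reading that assumes the label looks exactly like the figure; the function read_label is hypothetical, and a real implementation would need to follow the full label grammar in the specification.

```python
import re

# The sample label from Figure 5.
LABEL = '''(PICS-1.0 "http://www.rsac.org/v1.0/"
 labels
  on "1994.11.05T08:15-0500"
  until "1995.12.31T23:59-0000"
  for "http://www.gcf.org/stuff.html"
  by "John Patrick"
  ratings (l 3 s 2 v 0))'''


def read_label(text):
    """Return (service URL, options dict, ratings dict) for one simple label."""
    service = re.search(r'\(PICS-1\.0\s+"([^"]+)"', text).group(1)
    # Options are keyword/quoted-string pairs such as: for "http://..."
    options = dict(re.findall(r'\b(on|until|for|by)\s+"([^"]*)"', text))
    # Ratings alternate between a category name and an integer value.
    pairs = re.search(r'ratings\s+\(([^)]*)\)', text).group(1).split()
    ratings = {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs), 2)}
    return service, options, ratings


if __name__ == "__main__":
    service, options, ratings = read_label(LABEL)
    print(service)
    print(options["for"], options["by"])
    print(ratings)
```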

Referring to resources by their URLs makes it possible to match a label with its associated resource, even if the label is distributed separately. In contrast, labels for physical products are difficult to find if they are not attached to the products or their packaging. The separation is important for third-party labelers: for example, the Wiesenthal Center can distribute labels without the cooperation of the hate groups whose materials they are labeling.

Anything that can be named by a URL can be labeled, including resources that are accessed via FTP, gopher, or Netnews, as well as http. The PICS group defined a URL naming system for IRC, so that chat rooms with stable topics can be labeled. It is possible to create a single label that applies to an entire site or a directory within a site, by referring to a partial URL.
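
One way selection software might apply a label that refers to a partial URL is to treat the partial URL as a prefix and prefer the most specific match. The Python sketch below assumes that longest-prefix rule, which is a design choice made for illustration rather than something stated here; the label store, its URLs, and the function ratings_for are all invented.

```python
# Hypothetical label store: URL prefix -> ratings. All entries are invented.
LABELS = {
    "http://www.example.org/": {"v": 0},
    "http://www.example.org/games/": {"v": 3},
}


def ratings_for(url, labels=LABELS):
    """Return the ratings of the longest labeled prefix of url, or None if unlabeled."""
    matches = [prefix for prefix in labels if url.startswith(prefix)]
    if not matches:
        return None
    return labels[max(matches, key=len)]


if __name__ == "__main__":
    print(ratings_for("http://www.example.org/index.html"))
    print(ratings_for("http://www.example.org/games/maze.html"))
    print(ratings_for("http://elsewhere.example.com/"))
```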

Labels can include two optional security features (not shown in the example). The first is a message integrity check on the content of the resource that is labeled, in the form of an MD5 message digest. This enables software to detect whether changes have been made to a resource after the label was created. The second is a digital signature on the contents of the label itself, which allows software to verify that a label really was created by the service mentioned in it.
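
The integrity check can be sketched with Python's standard hashlib module. The sketch computes an MD5 digest of a resource and compares it with a digest recorded earlier. How the digest is encoded inside a label is not described here, so the base64 encoding below is an assumption, and the function names are hypothetical. The digital signature is not shown.

```python
import base64
import hashlib


def digest(content: bytes) -> str:
    """Return the MD5 digest of content, base64-encoded (the encoding is an assumption)."""
    return base64.b64encode(hashlib.md5(content).digest()).decode("ascii")


def unchanged(content: bytes, digest_in_label: str) -> bool:
    """True when the resource still matches the digest recorded in its label."""
    return digest(content) == digest_in_label


if __name__ == "__main__":
    original = b"<html>An astronomy page</html>"
    recorded = digest(original)  # computed when the label was created
    print(unchanged(original, recorded))
    print(unchanged(original + b"<!-- edited -->", recorded))
```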

Since labels can be authenticated through digital signatures, secure distribution mechanisms are not needed. PICS specifies three ways to distribute labels. The first is to embed labels in HTML documents. This method will be helpful for those who wish to label content they have created.

The second method is for a client to ask an http server to send labels along with the documents it requests. The server would most likely offer the publishers' labels, but a server could also redistribute labels from third parties that it cooperates with. For example, a commercial service might choose to include certain labeling services' "seals of approval." Figure 6 shows a sample interaction: the http GET request includes an extra header line asking for labels and saying which service's labels should be sent back. The server includes two extra header lines in the response, one of which contains the labels.

Client sends to HTTP server www.greatdocs.com:

GET foo.html HTTP/1.1
Accept-Protocol: {PICS-1.0 {params full {services "http://www.gcf.org/1.0/"}}}

Server responds to client:

HTTP/1.1 200 OK
Date: Thursday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: {PICS-1.0 {headers PICS-Label}}
PICS-Label: …label here…
Content-type: text/html
…contents of foo.html…

Figure 6: A sample interaction: requesting a document and associated label from an http server.
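
The exchange in Figure 6 can be mimicked with plain string handling. The Python sketch below formats the request for a single labeling service and pulls the PICS-Label header out of a response. It copies the header layout from the figure and goes no further: the function names are hypothetical, no network connection is made, and the label text in the usage example is a placeholder.

```python
def label_request(path: str, service_url: str) -> str:
    """Format a GET request that asks for one service's labels, following Figure 6."""
    header = '{PICS-1.0 {params full {services "%s"}}}' % service_url
    return "GET %s HTTP/1.1\r\nAccept-Protocol: %s\r\n\r\n" % (path, header)


def label_from_response(raw_headers: str):
    """Return the value of the PICS-Label header, or None if the server sent none."""
    for line in raw_headers.splitlines():
        # Split on the first colon only, since label values contain URLs.
        name, _, value = line.partition(":")
        if name.strip().lower() == "pics-label":
            return value.strip()
    return None


if __name__ == "__main__":
    print(label_request("foo.html", "http://www.gcf.org/1.0/"))
    response = (
        "HTTP/1.1 200 OK\r\n"
        "Protocol: {PICS-1.0 {headers PICS-Label}}\r\n"
        "PICS-Label: (placeholder label text)\r\n"
        "Content-type: text/html\r\n"
    )
    print(label_from_response(response))
```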

The third way to distribute labels is through a label bureau that dispenses only labels. A bureau could distribute labels created by one or more labeling services. A client asks the bureau for certain services' labels of specific resources. This is most likely to be used for third-party labels.

Drafts of the PICS specifications were released in November 1995 [3, 4]. Reference code and services to aid developers were released in February of 1996. The first commercial PICS-compatible product, which provides a label bureau and a set of third-party ratings, has already been announced, and many other announcements are expected in early 1996.

What PICS doesn't specify

In general, PICS does not specify any technical, user interface, or market structure details that do not affect interoperability. It does not specify how rating services or selection software work, just how they work together.

For example, PICS-compatible software can implement selective blocking features in various ways. One possibility is to build it into a browser, such as Netscape. A second method, one used in most current products such as SurfWatch, is to perform this operation on each computer, as part of the network protocol stack. A third possibility is to perform the operation somewhere in the network, for example at a proxy server used in combination with a firewall. Each alternative affects efficiency, ease of use, and security. For example, a browser could include nice interface features such as graying out blocked links, but it would be fairly easy for a child to install a different browser and bypass the selective blocking. The network implementation may be the most secure, but could create a performance bottleneck if not implemented carefully.

PICS does not specify how parents or other supervisors will set configuration rules. One possibility is to provide a configuration tool like the prototype shown above in Figure 4. Parents would choose which labeling services to pay attention to, as well as the maximum allowable values on each dimension. The parents would also choose whether to block access to unlabeled resources. Even that amount of configuration may be too complex, however. Another possibility is for organizations and on-line services to provide preconfigured sets of selection rules. For example, AOL might team up with the Boy Scouts of America to offer "Internet in a Box for Cub Scouts" and "Internet in a Box for Boy Scouts" packages, containing not only preconfigured selection rules, but a default home page provided by the Boy Scouts as well.
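
The configuration choices mentioned above (which labeling services to heed, the maximum value on each dimension, and whether to block unlabeled resources) fit in a small data structure. The Python sketch below is one hypothetical shape for such rules. PICS does not define it; the class name, the field names, and the treatment of a missing dimension as a violation are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Trusted service URL -> {dimension: maximum allowed value}.
    ceilings: dict = field(default_factory=dict)
    block_unlabeled: bool = True

    def allows(self, labels: dict) -> bool:
        """labels maps service URL -> {dimension: value} for one resource."""
        # Labels from services the parent did not choose are ignored.
        relevant = {s: r for s, r in labels.items() if s in self.ceilings}
        if not relevant:
            return not self.block_unlabeled
        for service, ratings in relevant.items():
            for dim, limit in self.ceilings[service].items():
                # A missing dimension is treated conservatively as a violation.
                if dim not in ratings or ratings[dim] > limit:
                    return False
        return True


if __name__ == "__main__":
    rsac = "http://www.rsac.org/v1.0/"
    policy = Policy({rsac: {"l": 2, "s": 2, "v": 1}})
    print(policy.allows({rsac: {"l": 3, "s": 2, "v": 0}}))  # language exceeds 2
    print(policy.allows({rsac: {"l": 1, "s": 0, "v": 0}}))
    print(policy.allows({}))  # unlabeled resource
```

A preconfigured rule set of the kind an organization might ship would then be just a Policy value prepared in advance.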

Labels can be distributed in various ways. Some labeling services might distribute their labels on CD-ROMs or diskettes rather than using the PICS-specified protocols for distribution over a network. Some blocking clients might choose to request labels each time a user tries to access a resource. Others might cache frequently requested labels or keep a local database, to minimize delays while labels are retrieved.
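
A client that caches frequently requested labels could be as simple as a memoized lookup. The Python sketch below uses functools.lru_cache from the standard library. The bureau request is simulated by a stand-in function, and the service URL and label text it returns are invented for illustration.

```python
from functools import lru_cache

# Records each simulated round trip to the bureau, for illustration only.
bureau_requests = []


def fetch_label_from_bureau(url: str) -> str:
    """Stand-in for a network request to a label bureau (hypothetical)."""
    bureau_requests.append(url)
    return '(PICS-1.0 "http://ratings.example.org/v1.0/" labels for "%s" ratings (v 0))' % url


@lru_cache(maxsize=1024)
def cached_label(url: str) -> str:
    """Return the label for url, contacting the bureau only on a cache miss."""
    return fetch_label_from_bureau(url)


if __name__ == "__main__":
    cached_label("http://www.example.org/a.html")
    cached_label("http://www.example.org/a.html")  # served from the cache
    cached_label("http://www.example.org/b.html")
    print(len(bureau_requests))
```

A production cache would also need to honor each label's expiration date, which this sketch ignores.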

PICS specifies very little about how to run a labeling service, beyond the format of the service description and the labels. Services can provide simple permission/prohibition labels, or provide information about any dimensions that they choose, from sex to coolness to literary quality. Commercial content providers have been meeting to develop a common set of dimensions they would all use, which would make publishers' self-labels more useful to consumers. Third-party labelers are likely to use a wide range of other dimensions. Services can label entire sites, or individual documents and images. They can employ professionals, volunteers, or perhaps even computers to do the labeling. Some may strive for comprehensive coverage of the entire Internet, others for narrower areas such as educational sites or even just astronomy. An interesting intermediate offering may be to label the resources that subscribers ask about: while there are thousands of sites and millions of resources available on the Internet, any particular set of users is likely to ask for access to a much smaller set. This approach could be particularly effective for a cooperative service formed by a number of like-minded parents or teachers.

The market structure of labeling services is likely to evolve as services experiment with revenue-generation models. Some may charge subscribers. Others may charge intermediaries such as on-line services for the right to redistribute labels. Others may charge commercial sites for the privilege of being labeled; this may seem far-fetched, but in fact most software game vendors now either pay a fee for the right to self-label using the RSAC system, or pay a competing service, the Entertainment Software Rating Board, to rate their games. We might even see the rise of labeling intermediaries who pay a royalty to values-oriented organizations such as the Boy Scouts for the right to label resources with the Boy Scouts logo, according to criteria set by the Boy Scouts.

Other Uses for Labels

While the primary goal of PICS is to facilitate the use of labels by selection software, PICS-compatible labels can also be used in other ways. For example, a labeling service might rate based on quality or classify resources by subject, allowing users to search for items on a particular subject, or to sort the results of a search based on quality ratings. Browsers could incorporate the contents of labels into visual displays that aid browsing, perhaps highlighting in green links to particularly popular or high-quality items or striking a red line through links to resources that are not recommended. It has even been suggested that labels could convey copyright ownership, distribution rights, and requested payments. Software could check for such labels and demand payment before distributing the labeled items.

One particularly promising application is collaborative filtering, where everyone can contribute ratings, and those ratings are used to guide others toward interesting materials [5, 6]. Guidance can be personalized by matching end-users with others who have similar tastes, as reflected in their ratings of resources that both have examined [7-9]. A browser add-in feature would enable end-users to submit PICS rating labels to a labeling service.
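
Matching end-users by taste requires some measure of agreement over the resources both have rated. The Python sketch below uses the Pearson correlation, which is one common choice and is assumed here rather than prescribed by PICS; the function name, user names, and ratings are invented.

```python
from math import sqrt


def similarity(a: dict, b: dict) -> float:
    """Pearson correlation of two users' ratings over the resources both have rated."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs = [a[u] for u in shared]
    ys = [b[u] for u in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0


if __name__ == "__main__":
    # Invented ratings keyed by resource URL.
    alice = {"http://a.example/": 5, "http://b.example/": 1, "http://c.example/": 4}
    bob = {"http://a.example/": 4, "http://b.example/": 2, "http://c.example/": 5}
    carol = {"http://a.example/": 1, "http://b.example/": 5, "http://c.example/": 2}
    print(similarity(alice, bob) > 0)    # similar tastes
    print(similarity(alice, carol) < 0)  # opposite tastes
```

A service could then weight each contributor's labels by this score when recommending resources to a given user.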

PICS-compatible labels might also be useful for on-line journals, which could publish all submissions, but attach review labels that each reader could interpret as guides to the best articles [10]. One limitation of the PICS approach, however, is that labels cannot contain original text. While PICS-compatible labeling services can associate text phrases or icons with values on numeric scales, so that a frequently used annotation such as "seminal article" can be encoded, PICS labels currently cannot include arbitrary text. A PICS label can, however, include the URL of another document that contains textual annotations, which provides a means of integrating PICS with more general annotation platforms such as ComMentor [11].

Conclusion

PICS provides a labeling infrastructure for the Internet. It is value-neutral: it can accommodate any set of labeling dimensions, and any criteria for assigning labels. Any PICS-compatible software can interpret labels from any source, because each source provides a machine-readable description of its labeling dimensions. A new labeling service can distribute labels directly to clients over the network, or arrange with information providers or on-line services to redistribute the labels.

This system permits the implementation of context-specific rules rather than blanket rules. Around the world, governments are considering restrictions on on-line content. Since children differ, contexts of use differ, and parents' values differ, such blanket restrictions can never meet everyone's needs. PICS will enable labeling services and software to meet supervisors' diverse needs, and the labels will also help users surf the Internet more efficiently.

References

[1] I. de Sola Pool, Technologies of Freedom. Cambridge: MIT Press, 1983.

[2] J. Berman and D. Weitzner, "User Control: Renewing the Democratic Heart of the First Amendment in the Age of Interactive Media," Yale Law Journal, vol. 104, pp. 1619, 1995.

[3] T. Krauskopf, J. Miller, P. Resnick, and G. W. Treese, "Label Syntax and Communication Protocols," Internet Draft draft-pics-labels-00.txt, November 21 1995.

[4] J. Miller, P. Resnick, and D. Singer, "Rating Services and Rating Systems (and Their Machine Readable Descriptions)," Internet Draft draft-pics-services-00.txt, November 21 1995.

[5] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, "Using Collaborative Filtering to Weave an Information Tapestry," Communications of the ACM, vol. 35, pp. 61-70, 1992.

[6] D. Maltz and K. Ehrlich, "Pointing the Way: Active Collaborative Filtering," Proceedings of CHI 95, Denver: ACM. 202-209.

[7] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proceedings of CSCW 94: Conference on Computer Supported Cooperative Work, New York: ACM. 175-186.

[8] U. Shardanand and P. Maes, "Social Information Filtering: Algorithms for Automating 'Word of Mouth'," Proceedings of CHI 95 Conference on Human Factors in Computing Systems, Denver: ACM. 210-217.

[9] W. Hill, L. Stead, and M. Rosenstein, "Recommending and Evaluating Choices in a Virtual Community of Use," Proceedings of CHI 95 Conference on Human Factors in Computing Systems, New York: ACM. 194-201.

[10] A. M. Odlyzko, "Tragic Loss or Good Riddance? The Impending Demise of Traditional Scholarly Journals," International Journal of Human-Computer Studies, vol. 42, pp. 71-122, 1995.

[11] M. Roscheisen, C. Mogensen, and T. Winograd, "A Platform for Third-Party Value-Added Information Providers: Architecture, Protocols, and Usage Examples," Stanford University CSDTR/DLTR, November 1994, updated April 1995.