The Jigsaw Label Bureau

Comprehensive Design Overview

Introduction

The Jigsaw Label Bureau effort focuses on three related areas:

An implementation of a SQL-driven label bureau in Java with Java Database Connectivity (JDBC). This label bureau implementation has been designed with a front-end that allows it to easily integrate with Jigsaw version 1.
A comprehensive definition of label bureau communication protocols, beyond those defined in the PICS specification.
A language for querying a label bureau by label ratings as well as label options. This facilitates searching based on label ratings. We have chosen PICSRules as this language.

Label Bureau Architecture

We are currently developing a label bureau implementation that supports the above PUT and query extensions to the original PICS protocol. The front-end is a configurable Jigsaw resource that extends the Jigsaw HTTPResource.

Overview

The entry point of all label bureau code is the LabelBureauResource. LabelBureauResource is a Jigsaw resource that extends HTTPResource. It allows clients to make PICS label queries. When it receives an HTTP PUT, it dispatches on the HTTP Content-Type of the PUT, as follows:

	application/pics-labels: submit labels to the bureau.
	application/pics-service: submit a service RAT file to the bureau.
	application/pics-rules: submit a PICSRule to the bureau, expect labels in the response.

When the LabelBureauResource receives a GET, this is always interpreted as a label bureau query, and the LabelBureauResource expects the appropriate syntax.

Database Management

One of the major challenges of this project was deciding on how label data would be represented both in the relational database. The following is a description of how PICS labels are stored in the database, both in the presence and absence of a RAT file. Alternative methods of storing labels are also discussed. We begin with the representation of RAT files.

Labels With Known .RAT Files

The label bureau maintains a table whose default name is "Services." This table records all of the rating services that the bureau knows about, as well as references to their individual table names:

Services
serviceURL: VARCHAR
serviceTableName: VARCHAR

Note

We use relational database notation: the table name is listed in bold as the first column entry. All of the table column names are listed below it. The "primary key" -- column(s) that uniquely identify a record -- are in italicized print. Standard SQL data types are also given. The notation "[identifier]" is used to denote an identifier; {A, B, ...} means only one of A, B, ...

When the label bureau receives a RAT file; it adds the serviceURL to the "Services" table and computes the service ordinal x, where x is one less than the number of services currently installed. Then, the label bureau sets up the label tables for the new service:

service_[x]
forURL: VARCHAR
isGen: BIT
byName: VARCHAR
onDate: VARCHAR
[a]_: {REAL, INT}
...
[z]_: {REAL, INT}

where [a] through [z] are the full PICS names of the categories in this service which take on single values only (i.e., the multivalue attribute is set to false for that category in the RAT file). There is a one-to-one correspondence between service_[x] database rows and labels.

service_[x]_[category]
alpha: {REAL, INT}
beta: {REAL, INT}
forURL: VARCHAR
gen: VARCHAR

where category is the full PICS name of a non-single valued category (i.e., the multivalue attribute is set to true for that category in the RAT file). Each row of a service_[x]_category table holds part of a multivalue rating for the document at forURL. If beta is NULL, that rating is not a range; if beta is not NULL, [alpha, beta] defines a range. The alpha value must not be NULL.

Note

	New labels may be inserted into the SQL database using the above two classes of tables.
	The table service_[x]_categories holds database metadata and is used internally by the label bureau; do not modify this table.

Labels Without Known .RAT Files

The label bureau is able to store labels that were submitted before a RAT file is submitted. This is accomplished by maintaining a table whose name defaults to "UnsortedLabels," where all of the labels' PICS serviceID fields are unknown.

Labels are stored in binary format in the data field of UnsortedLabels. When a new RAT file is encountered, the UnsortedLabels table is scanned by serviceURL to see if any labels can be parsed against the new RAT file.

UnsortedLabels

serviceURL: VARCHAR

forURL: VARCHAR

gen: BIT

data: IMAGE

Concurrency Issues

The label bureau maintains an arbitrary number of database connections at all times. When the label bureau needs to perform database operations, it acquires a database connection through an instantiation of the ConnectionServer object. When it finishes accessing the database, it releases that connection, again through the ConnectionServer.

The ConnectionServer Object

All of the functions in the ConnectionServer are synchronized, to avoid conflicts among multiple threads acquiring connections simultaneously. The bottleneck effect is negligible, however, because the code in ConnectionServer is designed to perform most time-consuming work when the object is constructed, not when acquireConnection() and releaseConnection() are called.

When the ConnectionServer is constructed, MAX_CONNECTIONS number of connections are requested from the database, and the results of each request stored in internal data structures. When acquireConnection is called, the next available Connection is retrieved and returned. If no connections are available in the internal data structures of ConnectionServer, wait is called on that thread and the synchronization lock released. When releaseConnection is called, the supplied Connection is restored to the ConnectionServer, and one thread is woken by a call to notify.

At acquire-time, connections are labeled as read-only or non-read-only. ConnectionServer implements the following policy with respect to read-only and non-read-only connections:

	Multiple read-only connections can be made.
	Non-read-only connections block all other connections.

This policy ensures that data is written to the database without error (database writes are not common), and is read from the database quickly (database reads do not need to be synchronized).

Searching With PICSRules

Semantics of Mimetype application/pics-rules

When the label bureau receives a PICSRule, it is being asked to perform a search for labels stored in the database that match the criteria specified in the Policy clauses in that PICSRule. One way of accomplishing this could be to retrieve all of the labels in the SQL database, instantiate however many Label objects are needed, and filter the labels through the PICSRule. This would be inefficient, however, because the capacity of the database, and consequently, the number of labels stored in a label bureau, is large.

PICSRules to SQL

The solution to this problem is to generate SQL queries from the PICSRule that the label bureau receives. PICSRules are complex, however, so the generation of SQL code is rather complicated. However, one limited SQL query is much faster and efficient than a large SQL query and label filtering, so the conversion to SQL is justified in terms of speed and efficiency.

To convert a PICSRule (a Profile object), the Profile is applied to each of the Service tables in the database. All of the labels resulting from the application of a Profile to a Service are then combined to yield the labels.

PICSRules Query Diagram.gif (7155 bytes)

Applying Profiles to Services

To apply a Profile to a service, the Policy clauses are rewritten as a PoliciesApplication. The final result is a ProfileApplication, the SQL query representing the result of applying the Profile to the service. To rewrite the Policy clauses as a PoliciesApplication, each Policy clause is associated with a matching PolicyApplication.

ProfileApplication :: 'SELECT * FROM' ServiceTableName ' WHERE' PoliciesApplication

ServiceTableName :: [the name of the SQL table for the current service]

PoliciesApplication :: PolicyApplication [Join PoliciesApplication]

Join :: 'AND' | 'OR'

AcceptByURL :: 'forURL =' URL

RejectByURL :: '(NOT (forURL =' URL '))'

RejectIf :: '(NOT (' SExpr '))'

AcceptIf :: '(' SExpr ')'

RejectUnless :: '(' SExpr ')'

AcceptUnless :: '(NOT (' SExpr '))'

SExpr :: [as defined in the PICSRules specification]

URL :: [as defined in RFC-1738]

Note

	PolicyApplication must agree with the Policy clause it represents.
	Join must be 'AND' if it follows RejectIf, RejectUnless, or RejectByURL; Join must be 'OR' otherwise.

References

	Jim Miller, editor. "PICS Label Distribution Label Syntax and Communication Protocols," http://www.w3.org/PICS/labels.html.
	Jim Miller, editor. "Rating Services and Rating Systems (and Their Machine Readable Descriptions)," http://www.w3.org/PICS/services.html.
	Martin Presler-Marshall, editor. "PICSRules 1.1," http://www.w3.org/TR/REC-PICSRules.
	Tim Berners-Lee et al, "Uniform Resource Locators (RFC-1738)," http://ds.internic.net/rfc/rfc1738.txt

Kyle Jamieson jamieson@mit.edu (January 20, 1998)