REFEREE version 1.4d

Rule-controlled Environment for Evaluation of Rules,
and Everything Else

Jonathan Brezin (brezin@watson.ibm.com), Yang-Hua Chu (yhchu@w3.org),
Rohit Khare (khare@w3.org), Brian LaMacchia (bal@research.att.com),
Jim Miller (jmiller@w3.org), Paul Resnick (presnick@research.att.com),
Martin Strauss (mstrauss@research.att.com)

Draft of 11/8/96

1. Introduction

Digital signatures alone are not sufficient for code signing and other applications: signatures can solve the problems of message integrity and authentication, but not more general notions of security or trust. These applications require not only cryptographic tools for determining authenticity and message integrity but also a mechanism that permits users to clearly state their own security policies and evaluate digital signatures with respect to those policies. For example, in a code signing application a user must state under what conditions do they trust the code. Similarly, the entity signing the code must state precisely what properties they claim the code has (e.g. virus-free, works with particular hardware platforms).

This document describes both a specific language for describing trust policies and a general mechanism for evaluating them. The profiles language includes built-in constructs that make it easy to specify combinations of predicates on the contents of PICS labels. The language can call out to subroutines written in other languages. For example, a digital signature checking subroutine can verify the authenticity of PICS labels.

The profiles language is best understood (and implemented) as part of a more general mechanism, called REFEREE. Deciding whether a piece of software is safe or some action is authorized often involves further safety-related decisions about which credentials to fetch, how to authenticate them, and possibly whether to install new interpreters for credentials written in unknown languages. REFEREE places all such decisions under explicit policy control. Hence the acronym REFEREE, short for "Rule-controlled Environment For Evaluation of Rules and Everything Else." That is, REFEREE is a system for writing policies about policies, as well as policies about cryptographic keys, PICS label bureaus, certification authorities, or anything else. Within REFEREE every policy evaluation is under the control of some other local policy. It sounds complicated, but in practice it just means there is a root policy which decides not only the final authorization question but also any intermediate decisions about with other programs to run while making that decision. Those readers interested in PICS, but not necessarily in digital signature applications, may want to consult first the accompanying Gentle Introduction to REFEREE for PICS developers, which also includes sample policies about PICS labels and a generic template to express Internet Explorer 3.0 Content Advisor policies.

Section two of this document describes the underlying architecture of the REFEREE trust management engine. Section three describes the Profiles-0.92 language for specifying user policies. Section four presents a library of primitive functions that must be provided by every conforming implementation. Finally, Section five presents a detailed example of operation within the environment of the World-Wide Web where the policy refers to PICS labels.

2. Underlying Architecture

2.1 Overview

The core of the REFEREE trust management system is premised on the following four assumptions:

An application calls REFEREE with a question concerning a particular action. The question being asked is implicit in the action, and it is up to the calling application to properly interpret REFEREE's response. (For example, a Web browser may ask REFEREE about the action "view-URL" with respect to a particular URL and may allow or block viewing of the URL based on REFEREE's response.)
For every possible action the application can ask about, there is a corresponding policy that is responsible for determining whether the action should be taken.
The result of calling REFEREE is a pair consisting of a tri-value and a justification for the tri-value. A tri-value is a three-valued logic element, corresponding to the following three possible statements from the trust management system:

true
false
unknown

REFEREE returns the value "unknown" when it does not have enough information (under the active policy) to definitively answer true or false. The justification returned with the tri-value is a list of statements (assertions) generated during the system's evaluation of the proposed action. These statements may provide the application with information concerning why the action request returned true, false or was left unanswered.

Figure 1 below provides a block-diagram view of the interface between a calling application and the REFEREE system.

Figure 1: The external interface to REFEREE

Applications call REFEREE and pass as an argument the action about which the application wants a trust decision. For example, in a PICS filtering context the action might be "can the user view the document with URL 'http://www.gcf.org/'?" REFEREE then processes that request under the policy for "viewing documents" and, when it has finished, returns to the application the discovered answer and justification. Of course, the application may ignore the returned result if it chooses to do so; REFEREE provides only guidance, not enforcement of policy.

2.2 Policies

The REFEREE system itself is an extensible, self-modifiable, recursive execution environment, although it appears to the user to be a monolithic "true/false/unknown" decision box. The basic computing units within REFEREE are policies. A policy is an executable block of code that processes input statements and asserts additional statements. We may think of a policy as a "program element" and inside the "REFEREE" box in the previous block diagram is an entire network of interconnected program elements. Figure 2 shows one example network of interrelated policies; the "top-level policy" and all three "sub-policies" are all REFEREE policies.

Figure 2: Block diagram of REFEREE internal structure.

Within the context of the REFEREE system the terms "policy," "program," and "program element" are interchangeable. All three terms refer to executable blocks of code, and it may be easier to think of policies as small self-contained programs or scripts in a scripting language. (REFEREE policies are similar to PolicyMaker [BFL] assertions. In both systems policies are code fragments and policy evaluation is accomplished by running the policies in some environment. When policy processing has completed, the final state of the execution environment determines whether the given set of policies has been satisfied.) The policy that controls overall operation of the REFEREE system, therefore, is a policy controlling the operation of other policies. (This is the "top-level policy" in Figure 2.) The overarching policy states what "sub-policies" should be run when and how those subpolicies interact with each other.

Every REFEREE policy has the same semantics and calling sequence, no matter what language was used to construct the policy. All policies (programs) have a single mandatory input argument: a list of statements (conveying information). Policies may have any number of additional arguments. Every policy returns as output a pair containing a tri-value and a statement list.

Figure 3: Required interface for every REFEREE policy.

Figure 3 shows the calling sequence of every policy in REFEREE. Thus, every policy block in Figure 2 above is actually a "subpolicy" call with at least a list of statements passed as an argument and a (tri-value, statement-list) returned as the exit value.

The idea of "policy execution under policy control" is central to the REFEREE model. Everything that happens during the execution of REFEREE is under the control of some policy, and at the root of the control hierarchy is a local policy. Of course, there must be some initial policy information provided by an external source in order to bootstrap REFEREE and root the recursion tree. We detail the bootstrapping process later in this section.

2.3 Primitive Data Types

REFEREE has three primitive underlying data types: they follow directly from the three operating assumptions in the previous section. The underlying data types are:

policy databases
tri-values, and
statements (and lists of statements).

Policy databases provide bindings between symbolic names and executable code. When an application calls REFEREE on a particular action request, a global policy database tells it what policy is responsible for making decisions about the requested action. REFEREE then invokes that policy/program, passing a statement list plus any additional arguments. Applications calling REFEREE are required to provide a core set of trusted action-policy bindings to bootstrap the system; additional bindings may be installed dynamically pursuant to policy control.

Tri-values are three-valued logic operands. We designate the three possible values as "true", "false" and "unknown" to distinguish tri-values from traditional Boolean values. Tri-values may be combined in manner similar to Boolean values. They are one of the outputs of policies. Tri-valued logic is detailed in Section 3 below.

Statements express information acquired during the execution of policies. They are the common information interexchange containers among different programs. All statements are two-element s-expressions. The first element conveys the context of the statement and the second element provides the content. (For example, a context might be the program that was running when the information was acquired. The content might be an assertion that a particular document has been virus-checked.)

2.4 Policy-names, Policies and Interpreters

As mentioned above, a policy database provides a set of bindings between symbolic names and executable objects, making it possible to call policies by name (as is common in almost all programming languages). The policy database provides a level of indirection that is under policy control; that is, a policy can selectively "install" or "uninstall" elements of the table of policies to control the availability of those policies.

We make the distinction between "policies" and "interpreters" so that one interpreter can process any number of policies written in a particular language. Thus, the policy database is segregated into a "policy" table and an "interpreter" table. Each entry in a policy table is a triplet (policy-name, policy, language-name); each entry in the interpreter table maps language-names (strings) to interpreters.

Table 1: A sample policy database

Policy name	Policy	Language-name
"view URL"	<view-URL-policy>	Profiles-0.92
"load-label"	<load-label-policy>	Java

Table 2: A sample interpreter database

Language-Name	Interpreter
Profiles-0.92	<Profiles-0.92-interpreter>
Java	<Java interpreter>

There can only be one entry in the policy table for a particular policy-name, and likewise there may only be one entry in the interpreter table for a particular language-name.

All conforming implementations are required to provide a policy database and an interpreter database in the execution environment of every policy. These two databases have dynamic scope and may be modified during the execution of REFEREE policies. Modifications to the policy databases are not persistent across calls to REFEREE. Of course, as the system is bootstrapped by a calling application the application may provide some persistent storage on its own. Furthermore, all conforming implementations are required to provide an interpreter for the Profiles-0.92 language, described in Section three below, and the primitive policies described in Section four. A binding between the symbolic language name "Profiles-0.92" and the interpreter for Profiles-0.92 must exist in the bootstrap interpreter database provided by the calling application.

2.5 Bootstrapping REFEREE

There is one further component to the REFEREE system that must be specified to complete the architectural picture: the information required from the calling application to bootstrap REFEREE. Recall the REFEREE mantra, " everything under policy control." In order to process a policy there must be a policy-controlling policy already in place. Thus, to bootstrap the system there needs to be an initial policy provided ex machina (ex application) to start the system running. The bootstrap policy (the policy input in Figure 1 above) is in fact provided as an initial set of policy and interpreter databases, as providing the bindings in these databases for particular policies and interpreters is sufficient to make REFEREE aware of them. Thus, if the calling application is asking REFEREE about a "view-URL" action, the bootstrap databases must contain at a minimum a binding for a "view-URL" policy and an interpreter for whatever language that policy is written in.

In addition to the bootstrap policy databases it may also be necessary for the calling application to pass along other information to REFEREE. Any action-specific information may be passed either as part of the statement list or as extra arguments (assuming that the policy bound to the given action expects to receive additional arguments). For example, the policy for viewing URLs in a PICS-related context might expect the URL in question to be passed as an argument, and label bureaus to be contacted for PICS labels to be declared on the initial statement list.

2.6 Persistence of objects in REFEREE

It is our assumption that none of the structures described here, including the bootstrap policy and interpreter databases, inherently persist across multiple invocations of REFEREE. That is, when a policy installs another policy into the policy database, that installation is only in effect for the remainder of the current trust management computation. A new top-level input will start with a new policy database. We suspect that individual applications will want to create or maintain their own persistent storage (or designate an area of the filesystem for same) and pass in richer initial policy and interpreter tables to REFEREE.

2.7 Moving beyond the REFEREE core

With the "big-picture" architectural view of REFEREE in mind we may now move on to consider the details of the various policy boxes shown in Figure 2 above. In Section 3 we describe the policy language Profiles-0.92, a minimal yet relatively expressive language well-suited for writing top-level policies. In Section 4 we describe the input/output specifications of other policy blocks that we expect to be present in all REFEREE implementations. These "required library policies" provide necessary functions such as network access.

3. Profiles-0.92: A Simple Profile Language

3.1 Overview of Profiles-0.92

In this section we present Profiles-0.92, a simple language for describing top-level trust management policies. Profiles-0.92 allows the user to:

install new policies and new interpreters,
call other policies,
combine tri-values to form other tri-values,
search statement-lists for statements that match a particular pattern, and
return "true", "false" or "unknown" along with a justification for making that decision.

Profiles-0.92 is intended to be a scripting language for "top-level policies" in REFEREE (the kernel policies shown in Figure 2 above). As such, the language contains constructs for linking together other policies and controlling the overall operation of the trust management system. Profiles-0.92 is not intended to be a general-purpose language suitable for writing policies that perform arbitrary computations; other programs must be invoked to perform complex computations and create new statements. Profiles-0.92 provides constructs that combine return-values (or pieces of return-values) produced by other invoked programs, but cannot generate new statements without help.

3.2 Provided special forms and their semantics

Profiles-0.92 provides seven types of primitive operations. They are:

Invocations (ways to run other policies)

(invoke <policy-name> <statement-list> <additional-args>*)invoke calls the policy named <policy-name> with a copy of the <statement-list> and possibly some other additional arguments. When the called policy returns, its return value is a pair consisting of a tri-value and a statement-list. By convention, the statement-list contains statements that the called policy wants appended to the statement-list. For every statement in the returned statement-list, Profiles-0.92 (a) prepends the name of the called policy to the context of the statement, and (b) appends the statement to the original statement-list that was referenced in the call to invoke. The return value of the (invoke ...) construct is a pair of the tri-value returned by the called policy and the list of tagged statements resulting from prepending the name of the called policy in step (a).
Installations (ways to install bindings between names and policies or interpreters):

(install-policy <statement-list>) (install-interpreter <statement-list>)Recall that in Profiles-0.92 there are two independent, global databases in REFEREE: the policy database and the interpreter database. install-policy creates bindings in the policy database and install-interpreter creates bindings in the interpreter database. In both cases, the information required to make these bindings are passed within a statement-list containing a single statement. install-policy expects to see a statement of the form:

((<context>) (<policy-name> <policy-body-or-pointer> <language-name>))Similarly, install-interpreter expects to see a statement of the form:

((<context>) (<language-name> <interpreter-body-or-pointer>))
URL prefix matching: (ways to match particular URL prefixes)

(url-match URL ( <string+> ) <prefix-match?>)This functions provides a means of explicitly returning a tri-value based on a string or substring matche against a particular URL. The first argument is the subject URL. The second argument is a list of strings to be matched against the given URL. The third argument is a boolean value that determines whether the string matching should be exact or prefix matching. If this argument is true then URL must exactly match one of the strings for the resulting value to be true. If false, then one of the strings must be a prefix of URL in order for the resulting value to be true. The <prefix-match?> argument is optional; if it is not present it is assumed to be false and exact matching is performed.

For example, the operation (url-match URL ("http://www.whitehouse.gov/" "http://w3.org/") true)returns the tri-value true if the requested URL has a prefix of either "http://www.whitehouse.gov/" or "http://w3.org/," and otherwise returns false. (It is not possible for url-match to return a tri-value of "unknown.")

The statement-list returned by url-match consists of statements of the form "(url-match <URL-prefix>+)" for each relevant URL prefix. In the above example, if "http://www.whitehouse.gov/president" was requested then (url-match URL ("http://www.whitehouse.gov/" "http://w3.org/") true) returns ( (url-match "http://www.whitehouse.gov/") ) as a list containing one statement.
Combinations (ways to combine tri-values)

(and <rule>+) (or <rule>+) (not <rule>) (true-if-unknown <rule>) (false-if-unknown <rule>) (threshold-and <num> <rule>+)These functions all operate on pairs of tri-values and statement-lists and return pairs of tri-values and statement-lists. The semantics of these operators are detailed below in Section 3.4.
Local variable binding

(let (<binding>+) <rule>+)Let creates a new sub-environment of the current execution environment and creates in that sub-environment new variable-value bindings. The created local bindings are listed in the list of bindings <binding>+. Each <binding> is a list of the form: (<var> <expression>); the variable <var> is bound to the result of evaluating <expression>. Each <expression> may optionally be null (not present), in which case the variable is defined but its value is unassigned. The bindings remain in effect for the scope of the rules in the body of the let.
Pattern matching

(match <pattern> <statement-list>) match returns the subset of <statement-list> that matches the given <pattern>. The pattern-matching language is described in Section 3.5 below.
Immediate values

true false unknownThese three reserved keywords evaluate to the corresponding tri-values.

3.3 Provided local variables

Profiles-0.92 also provides the following local variables within :

STATEMENT-LIST: This global variable is bound to the statement-list that is passed to a Profiles-0.92 program as its mandatory argument.
URL: This global variable is bound to the first additional argument in the call to a Profiles-0.92 program. The first additional argument is assumed to be the URL of the object of the requested action (e.g. the URL of the software potentially to be downloaded).

3.4 Semantics of Combinations

Every implementation of Profiles-0.92 must provide the following six operations for combining tri-values:
(and <rule>+) (or <rule>+) (not <rule>) (true-if-unknown <rule>) (false-if-unknown <rule>) (threshold-and <num> <rule>+)

The operators and, or and threshold-and are multi-argument operators; not, true-if-unknown and false-if-unknown are unary operators.

We provide the truth tables for each operator below. All operators accept as inputs tri-values and return tri-values as outputs. The operators and, or and not act as their Boolean counterparts if none of their arguments are "unknown." Intuitively, a result of true or false indicates that the result would hold no matter what the actual truth value of any unknown inputs. Otherwise the result is unknown.

3.4.1 The `and` operator

The and operator is the three-valued version of Boolean and. It may take any number of arguments; this truth table describes the operation of and when it is given two arguments. The rows represent the truth values of the first argument, the columns the truth values for the second argument, and the cells the truth value of the and expression.

Table 3: Truth Table for the and operator

	true	unknown	false
true	true	unknown	false
unknown	unknown	unknown	false
false	false	false	false

For more than two arguments, and recursively reduces itself one argument at a time:

(and arg1 arg2 arg3 … argn) = (and (…(and (and arg1 arg2)
  arg3) …argn)

The and of a single argument is that argument itself. The and of no arguments is true by definition.

3.4.2 The `or` operator

The or operator is the three-valued version of Boolean or. It may take any number of arguments; this truth table describes the operation of or when it is given two arguments:

Table 4: Truth Table for the or operator

	true	unknown	false
true	true	true	true
unknown	true	unknown	unknown
false	true	unknown	false

For more than two arguments, or recursively reduces itself one argument at a time:

(or arg1 arg2 arg3 … argn) = (or (…(or (or arg1 arg2) arg3)
  …argn)

The or of a single argument is that argument itself. The or of no arguments is false by definition.

3.4.3 The `not` operator

The not operator is the three-valued version of Boolean not. It takes exactly one argument. Here is the truth table for the not operator:

Table 5: Truth Table for the not operator

	output
true	false
unknown	unknown
false	true

3.4.4 The `true-if-unknown` operator

The true-if-unknown operator is a projection function from three-valued logic to Boolean logic. It takes exactly one argument:

Table 6: Truth Table for the true-if-unknown operator

	output
true	true
unknown	true
false	false

3.4.5 The `false-if-unknown` operator

The false-if-unknown operator is a projection function from three-valued logic to Boolean logic. It takes exactly one argument:

Table 7: Truth Table for the false-if-unknown operator

	output
true	true
unknown	false
false	false

3.4.6 The `threshold-and` operator

The threshold-and operator implements "any m of n" semantics on a list of three-valued arguments. The threshold-and operator takes at least one argument; the threshold value, a non-negative integer. A call to threshold-and looks as follows:

(threshold-and threshold arg1 arg2 arg3 … argn)

Let n_T,n_F and n_U be, respectively, the number of arguments to threshold-and (arg1…argn) that evaluate to true, false, and unknown. We have 0 <= n_T,n_F,n_U, <= n, and further n_T + n_F + n_U = n. Then the value of threshold-and is computed as follows:

If n_T>= threshold, return true.
If n_T< threshold and n_T+ n_U >= threshold, return unknown.
Else, if n_T+ n_U < threshold, return false.

By definition, (threshold-and 0) evaluates to true.

3.4.7 Completeness

Appendix B below provides a proof that this set of three-valued operators is complete and provides other details on the logic system.

3.4.8 Returned statement-lists

Besides a tri-value, each combinator also returns a statement-list of data that was relevant in forming the decision. Each of the combinations currently unions the statement-lists returned by all of its arguments, although we believe that this issue is more complex and requires further study.

3.5 Pattern matching language

The pattern matching language for the match operator is basically an s-expression-aware structure matcher extended with restrictions. That is, the patterns and the matcher understand s-expression syntax and structure, so that patterns can be written that match particular subexpressions within a generic s-expression. (S-expressions are parenthesized collections of numbers, words, strings and other s-expressions.)

In addition to parentheses and literal elements, the pattern matcher has three generic match elements:

".": matches any single s-expression.
"*": matches a sequence of zero or more s-expressions.
"+": matches a sequence of one or more s-expressions.

Thus, "( * 3 * )" matches "(3)", "(2 3 4)", but not "(2 4 5)". Similarly, "(. (sha-1 +) * )" matches "((foo) (sha-1 3))" but not "((foo) bar (sha-1 3))". Quoted strings are matched on a case-sensitive bases; all other elements are matched insensitive to case.

Restrictions are comparison operators that perform simple evaluation on elements of the pattern. For example, in a PICS environment we may want to test whether the value associated with a transmit-name is less than some threshold value. All restrictions are s-expressions of the form (RESTRICT operator literal value) where RESTRICT is a literal symbol. A restriction syntactically matches an s-expression of the form (literal *). The pattern matcher then compares the matched value to the value compared within the give pattern and tests for the given restriction operator.

Restriction examples:

"(RESTRICT < n 3)" matches "(n 2)"
"( * (RESTRICT < n 3) * )" matches "(foo bar baz (n 2) quux)"
"( * (RESTRICT < n 3) * )" does not match "(foo bar baz (n 3) quux)"

The pattern matcher supports the following comparison operations:<,>, =, <=,>=, <> (<> is the "not equal" operator).

If no statements syntactically match the required pattern, the returned tri-value is "unknown." If some statements match and no restrictions are included, the return tri-value is "true." If statements match and there are restrictions, the return tri-value depends on whether the predicates in the restrictions are met. Each operator exists in both normal and "-!" form. The presence of an "!" does not modify the matching operation but does change the way the overall match construct computes the returned tri-value. For operators ending in "!", match returns "true" only if every statement that syntatically matches the restriction satisfies the predicate. For non-"!" operators match returns "true" if any syntactically-matching statement also satisfies the predicate. If more than one restriction is present, their tri-values are implicitly "and"ed together: if any is false the match returns false.

The backslash ("\") character has special meaning within patterns; it is used to quote pattern elements that would normally have special semantics. That is, to match the character "+" as opposed to one or more s-expressions, use "\+" in the pattern. (The reserved word "\RESTRICT" can similarly be quoted to match the actual word "restrict" and not invoke restriction handling.)

The comma (",") character also has special meaning within patterns. A comma denotes a symbolic name and indicates that the value bound to that name in the execution environment should be inserted at that point in the pattern. This mechanism allows variable values to be instantiated into patterns. A quoted comma, "\,", matches a literal comma.

Notice that this specification of the pattern matcher does not provide any way to extract substructures matched by portions of a pattern. Appendix A below suggests a possible revision to the semantics of let and match that would allow variable binding to matched subexpressions.

3.6 Formal Grammar

The grammar is written in modified form of BNF.

policy :: rule+
rule :: let-rule | combine-rule | invoke-rule | install-rule |
        projection-rule | match-rule | url-match-rule 

let-rule :: '(' 'let' '(' let-binding+')' rule+ ')'
let-binding :: '(' variable-name let-expression ')'
variable-name :: symbol
let-expression :: rule | ''

combine-rule :: unary-combine-rule | multi-combine-rule | threshold-rule
unary-combine-rule :: '(' unary-operator rule ')'
unary-operator :: 'not' | 'true-if-unknown' | 'false-if-unknown'
multi-combine-rule :: '(' multi-combine-operator rule* ')'
multi-combine-operator :: 'and' | 'or'
threshold-rule :: '(' 'threshold-and' threshold-value rule* ')'
threshold-value :: number

invoke-rule :: '(' 'invoke' policy-name statement-list optional-argument* ')'
policy-name :: quoted-name
statement-list :: '(' statement* ')'
statement :: '(' context content ')'
context :: s-expression
content :: s-expression
optional-argument :: s-expression

install-rule :: install-policy | install-interpreter
install-policy :: '(' 'install-policy statement-list ')'
install-interpreter :: '(' 'install-interpreter' statement-list ')'
 
projection-rule :: '(' 'tri-value' rule ')' | '(' 'statement-list' rule ')'

match-rule :: '(' 'match' pattern rule ')'
pattern :: '*' | '+' | '.' | string-literal | symbol-literal | restriction |
           '(' pattern* ')' | `\` string-literal | `,` string-literal
restriction :: '(' 'RESTRICT' restriction-operator transmit-name value ')'
restriction-operator :: '<' | '>' | '=' | '<=' | '>=' | '<>' |
                        '<!' | '>!' | '=!' | '<=!' | '>=!' | '<>!'

url-match-rule :: '(' 'url-match' variable-name string-literal+)

4. Library of Primitive Policies and Languages

It is suggested that conforming implementations provide these standard policies and languages for minimum expressiveness, although applications are only required to provide Profiles-0.92. As mentioned above, applications calling the trust management system are required to provide a language-name/interpreter binding for Profiles-0.92. Applications that provide other policies and languages on bootstrap must also provide those bindings.

The language Profiles-0.92
PICS label loader ("load-label"): a policy that tries to locate PICS labels for a given URL.
URL fetcher ("URL-loader"): a policy tries to load the contents of a particular URL.

In addition to these three modules, conforming implementation will likely want to provide the following additional modules for interfacing to the digital signature block now under development:

dsig-parser: parses digital signature blocks and converts embedded credentials into SPKI-like 5-tuples.

4.1 Syntax of the PICS label loader

The PICS label loader is a primitive policy that tries to locate PICS labels for a given URL. The PICS label loader is a policy, written in some language, that may be called via Profiles-0.92's invoke mechanism. We assume for purposes of this section that a binding exists in the policy database between the PICS label loader code and the symbolic name "load-label."

The arguments passed to load-label are as follows:

a statement list (required)
the subject URL for which labels are sought (required)
the rating service of interest
a list of places to look for labels (optional)

A typical call to load-label from Profiles-0.92 would look like this:

(invoke "load-label" statement-list "http://www.w3.org/Overview.html"
        "http://www.gcf.org/v2.5" (EMBEDDED "http://www.label-bureau.org/"))

The URL in question in this example is "http://www.w3.org/Overview.html" and the label loader is told to first check the document itself for embedded labels (the EMBEDDED keyword) and then, if necessary, contact the label bureau at www.label-bureau.org. Only labels that use the rating service http://www.gcf.org/v2.5 are desired.

When load-label discovers a label for the given URL it creates a new statement for the statement-list that consists of the assertions contained within the label. To continue the example, assume that when load-label looks for labels for http://www.w3.org/Overview.html it finds the following label:

(PICS-1.1 "http://www.gcf.org/v2.5"
          labels on "1994.11.05T08:15-0500"
            by "John Doe"
            until "1995.12.31T23:59-0000"
            for "http://w3.org/PICS/Overview.html"
            ratings (suds 0.5 density 0 color/hue  (1 2:3)))

For each label that is found, load-label parses the label and creates a new statement containing the contents of the label in a structured format. Basically, keyword-value pairs in the original PICS label are turned into two-element lists (with the key as the first element). For the above example, the content of the new statement generated by load-label looks like this (line numbers are for identification purposes only):

 1.(("load-label" "http://w3.org/Overview.html" EMBEDDED)
 2. ((version "PICS-1.1")
 3.  (service "http://www.gcf.org/v2.5")
 4.  (by "John Doe")
 5.  (on "1994.11.05T08:15-0500")
 6.  (original (PICS-1.1 "http://www.gcf.org/v2.5"
 7.                      labels on "1994.11.05T08:15-0500"
 8.                      by "John Doe"
 9.                      until "1995.12.31T23:59-0000"
10.                      for "http://w3.org/PICS/Overview.html"
11.                      ratings (suds 0.5 density 0 color/hue (1 2:3)))")
12.  (ratings (color/hue (1 2 : 3)) 
13.           (density 0)
14.           (suds 0.5))
15.  (until "1995.12.31T23:59-0000")))

Line 1 is the header containing identifying information for the content of the statement. The header is a list consisting of the name of the policy ("load-label"), the URL for which labels were sought, and the source of this particular label. The source may be (a) the URL of a label bureau, (b) the reserved keyword EMBEDDED (for embedded labels), or (c) the reserved keyword ALONG-WITH (for labels transmitted in the HTTP stream along with the content).

Lines 2-15 jointly contain a list of keyword-tagged lists. This list is the content of the discovered label. Each keyword-tagged field (another list) corresponds to some portion of the original label, Most of the keyword-tagged fields are simply keyword-value pairs (lines 2-11,15). Ratings information can be richer, however, and thus its structure is more complex.

The original label itself is included, unparsed, in the original field (lines 6-11). (Some policies, such as signature checks, may require access to the transmitted exact form of the label.)

Rating values themselves (lines 12-14) are either single floating-point numbers or multi-values consisting of two floating-point numbers separated by a colon (indicating a range of values). Note that, unlike in PICS labels, each dimension-value pair is enclosed in parentheses; this makes it easier to write match statements in the Profiles-0.92 language.

All poptions (parenthesized options) present are sorted and listed in PICS canonical order (alphabetized by shortest name in US-ASCII character collating sequence). The shortest name of an option is always used.

Recall from PICS label syntax that a label-list consists of (possibly) several labels. load-label returns one statement for each label. The order of the returned statements is not guaranteed by "load-label," they may appear in any order. Each option is guaranteed to appear at most once. load-label merges the service-specific options and the label-specific options according to the scope rules described in PICS label syntax.

Finally, here is a formal grammar for the statement content returned by load-label:

returned-content-list :: '(' *content ')'
content ::  '(' header version service *poption ratings ')' 
header :: '(' '"label-loader"' quotedURL label-source ')' 
label-source :: bureau | 'EMBEDDED' | 'ALONG-WITH'
bureau ::  quotedURL
version :: '(' 'version' '"PICS-1.1"' ')'
service :: '(' 'service' quotedURL ')' 
poption :: '(' option ')' 
option :: 'by' quotedname | 'gen' boolean | 
            'for' quotedURL | 'on' quoted-ISO-date |  
            'signature-rsa-md5' base64-string | 
            'exp' quoted-ISO-date | 'at' quoted-ISO-date | 
            'md5' base64-string | 'comment' quotedname | 
            'full' quotedURL |  'original' quotedname | 
            'extension' '(' mand/opt quotedURL data* ')' 
ratings :: '(' 'ratings' *rating ')'
rating :: '(' transmit-name number ')' |
           '(' transmit-name '(' *multi-value ')' ')'
transmit-name ::  1*urlchar 
alphanumpm :: 'A' | ... | 'Z' | 'a' | ... | 'z' | '0' | ... | '9' | '+' | '-' 
urlchar :: alphanumpm | '.' | '$' | ',' | ';' | ':' | '&' | '=' | '?' | '!' |
                        '*' | '~' | '@' | '#' | '_' | '%' hex hex

The symbols quotedURL, quotedname, quoted-ISO-date, base64-string, mand/opt, data, hex, multi-value and number are defined as in the PICS label syntax. Our definition of the symbol transmit-name differs from the definition in that document because in our definition parentheses are not allowed in transmit-names (except via the percent-hex-hex mechanism).

4.2 Syntax of the URL loader

The URL loader is a policy for retrieving the data addresses by a URL. The URL loader takes two required arguments:

a statement list
the URL to be loaded

The call to the URL loader is as follows (assuming that the policy is bound to the name load-URL in the execution environment):

(invoke "load-url" statement-list "http://www.w3.org/Overview.html")

The URL loader returns a tri-value of true so long as some response was obtained from the server (otherwise it returns false). The statement-list returned by the URL loader consists of a single statement of the form:

(("load-url") ("http://www.w3.org/Overview.html" <url-data>))

where <url-data> is the data returned as a result of fetching the given URL. Note: it is an open issue whether the <url-data> that is returned should be the unparsed string obtained from the remote server or a parsed, structured representation. For example, for HTTP data the headers exchanged as part of the HTTP protocol are probably desired as well as the content of the document pointed at by the URL.

4.3 Syntax of the Digital Signature block parser

The digital signature (DSig) block parser is a policy for processing digital signature information about Web documents. The format of digital signature blocks (information containers for digital signatures) is still under development, so the syntax provided here is subject to change. The DSig parser takes as input statements containing digital signature blocks and translates those blocks into SPKI-like 5-tuples of the form <issuer, subject, delegation, authorization, validity>. Each 5-tuple becomes a separate statement on the statement-list. The generic format of a DSig 5-tuple is as follows:

(<context> (dsig (issuer <the-issuer>)
                 (subject <the-subject>)
                 (delegation <the-delegation>)
                 (authorization <the-authorization>)
                 (validity <the-validity>)))

The DSig parser takes exactly one argument: a statement list containing raw DSig blocks. A typical call to the parser from Profiles-0.92 would look like this (we assume the policy is bound to the name dsig-parser):

(invoke "dsig-parser" statement-list)

The raw DSig blocks will presumably appear on the statement-list either as the result of calls to URL-loader or some other policy that acquires digital signature information. For each statement in the statement-list that contains a raw DSig block the parser returns a statement containing the digital signature information in 5-tuple format. The tri-value returned by the parser is true if the conversion succeeded, otherwise false.

We suspect that the DSig parser itself will call special-purpose policies to handle the conversion of particular signature formats into 5-tuples. Specifically, we expect that policies for X.509v3-, SDSI- and SPKI-format credentials will be needed.

5. Examples

In the following examples, the given policy is bound to the action view-URL in the bootstrap policy database. The bootstrap policy database also provides bindings for the action "load-label" and, in Example 5, "check-md5". The bootstrap interpreter database provides similar, appropriate bindings.

Table 8: Policy database for sample policies

Action	Program	Language
"view-URL"	<as below>	Profiles-0.92
"load-label"	<implementation-dependent>	load-label-language
"check-md5"	<implementation-dependent>	check-md5-language

Table 9: Interpreter database for sample policies

Language	Interpreter
Profiles-0.92	[Profiles-0.92 interpreter]
load-label-language	[implementation-dependent]
check-md5-language	[implementation-dependent]

Sample Policy 1: View a URL only if a PICS labels for the URL using the "witnessing" system that asserts "witnessed"

(invoke "load-label" STATEMENT-LIST 
        "http://www.witness.org/third-party-witness.html" URL)
(match
 (("load-label" *)
  (* ((version "PICS-1.1") * 
      (service "http://www.witness.org/third-party-witness.html") 
      * (ratings (witnessed 1)))))
 STATEMENT-LIST)

This policy loads labels for the given URL and searches for labels in the "third-party-witness.html" rating service that assert "witnessed". If any label matches the given pattern the policy returns a tri-value of "allow".

Sample Policy 2: View any URL with a PICS label in the Musac system with "s < 2"

(invoke "load-label" STATEMENT-LIST "http://www.musac.org/" URL)
(match
 (("load-label" *) 
  (* ((version "PICS-1.1") * (service "http://www.musac.org/") *
      (ratings (RESTRICT < s 2))))))
 STATEMENT-LIST)

This policy has two steps. First, we invoke to find and download labels for the given URL; any labels found will be put on the statement list. We then run the pattern matcher over the now-modified statement list looking for any label using the rating service from "http://www.musac.org/" and with an s rating less than 2.

Sample Policy 3: View URLs only if all PICS labels in the Musac system for the URL say "s < 2"

(invoke "load-label" STATEMENT-LIST "http://www.musac.org/" URL)
(match
 (("load-label" *) 
  (* ((version "PICS-1.1") * (service "http://www.musac.org/") *
      (ratings (RESTRICT <! s 2))))))
 STATEMENT-LIST)

This policy is almost identical to Sample Policy 2 above, except that the restriction operator within the pattern matcher is "<!" instead of "<". The "!"-ending restriction operators require every matching statement that is tested against the operator to satisfy the operator.

Sample Policy 4: Combinations of simple filters

(invoke "load-label" STATEMENT-LIST 
        "http://www.e-trust.org/privacy-descriptions" URL)
(invoke "load-label" STATEMENT-LIST 
        "http://www.dma.org/privacy" URL)
(and
 (match
  (("load-label" *) 
   (* ((version "PICS-1.1") 
       * (service "http://www.e-trust.org/privacy-descriptions") *
       (ratings (RESTRICT = data-reselling 0))))))
  STATEMENT-LIST)
 (match
  (("load-label" *) 
   (* ((version "PICS-1.1")
       * (service "http://www.dma.org/privacy") * 
       (ratings (RESTRICT < personal-data-collected 3))))))
  STATEMENT-LIST))

This policy uses labels from two services In the first step, we load labels for URL by invoking "load-label" twice, once for each rating service; "load-label" adds statements to the statement-list. We then search the returned statements for two types of labels:

Labels using a rating service described by E-Trust, and
Labels using a rating service described by DMA.

Each pattern-match uses criteria specific to one a rating service, and then the results of those two pattern-matches are combined using and. Thus, in order for the policy to be satisfied there must be labels in both rating services.

Sample Policy 5: A More Complex Profile using multiple invocations

Here is a more complicated example, showing how multiple invocations can be used to "chain" processing of statements.

(invoke "load-label" STATEMENT-LIST "http://www.musac.org/" 
        URL ("http://www.label-bureau.org/"))
(invoke "check-md5"
 (match (* (("load-label" * "http://www.label-bureau.org/")
            ((version "PICS-1.1") * (md5 *) *)))
        STATEMENT-LIST))
(and
 (match 
  (("check-md5" *) 
   ("md5-verified" (version "PICS-1.1") *
    (service "http://www.e-trust.org/privacy-descriptions") *
    (ratings (RESTRICT = data-reselling 0))))
  STATEMENT-LIST)
 (match
  (("check-md5" *)
   ("md5-verified" (version "PICS-1.1") * 
    (service "http://www.dma.org/privacy") *
    (ratings (RESTRICT < personal-data-collected 3))))
  STATEMENT-LIST))

Sample Policy 5 is like Sample Policy 4 except that this policy additionally requires that labels have valid MD5 hashes before their ratings are used by the system. We assume the existence of a "check-md5" policy that verifies the correctness of MD5 hashes contained within PICS labels, and writes "md5-verified" at the beginning of such statements.

Sample Policy 6: PICS filtering in Microsoft Internet Explorer 3.0

Microsoft's Internet Explorer version 3.0 (IE3.0) includes a "Content Advisor" that permits creation of simple PICS-related policies. Content Advisor allows users to

choose a rating service to use for filtering access to URLs,
choose a label bureau (if desired) to contact to request labels for URLs, and
specify maximum values along the rating service's various measurement axes which labels must satisfy in order to permit viewing of a particular URL.

IE3.0 provides a graphical user interface (GUI) for configuring Content Advisor, but every possible configuration is in fact a policy that IE3.0 follows when asked to load and view a particular URL. The set of possible Content Advisor policies is a subset of the universe of Profiles-0.92 policies. The companion document Gentle Introduction to REFEREE shows in detail how any Content Advisor policies may be cast as a Profiles-0.92 policy.

Sample Policy 7: Microsoft Authenticode

[this example is still under construction…]

"Tell me how code signing works [in Authenticode]"

When the user downloads the code, the downloading application calls the WinVerifyTrust API. The system extracts the signature, determines the CA who authenticated the certificate, and obtains the software publisher's public key distributed by that CA. The system then uses the public key to decrypt the digest. It runs the specified hash on the code again, creating a new digest. If the code has not been modified since it was signed, the new digest should match the old one. If the two digests don't match, either the code was modified, or the public and private keys aren't a matched pair. In either case, the code becomes suspect and the user is warned.

[source: The Microsoft Authenticode FAQ, available at http://www.microsoft.com/intdev/security/authcode/signfaq-f.htm]

Microsoft's Authenticode system provides a limited, fixed semantics attached to digitally signed code. An Authenticode-like system could be written as a Profiles-0.92 policy; here's an example policy. We assume the code comes with a DSig-format digital signature block,

Sample Policy 8: SPKI verification/compression

SPKI verification/compression is the process by which two SPKI certificates may be combined into a third certificate. Let <i1,s1,d1,s1,v1> and <i2,s2,d2,a2,v2> denote two SPKI certificates (i1 and i2 are the issuers of the certificates, s1 and s2 are the respective subjects, etc.) Then the SPKI compression rule is:

<i1,s1,d1,a1,v1> + <i2,s2,d2,a2,v2> = <i1,s2,d2,a,v>
if ((s1 == i2) && (d1>d2) && (a1>=a2) && ((v1&v2)>0))
if (a2 == "DELEGATE-ALL") a = a1 else a = a2
v = v1 & v2

If the Profiles-0.92 language is extended to handle subexpression binding on matched patterns (Appendix A) and to permit arbitrary additions to statement lists (Appendix C), then it is possible to implement the majority of the SPKI compression function directly within Profiles-0.92. Within Profiles-0.92 as specified in Section three above it is not possible to directly implement SPKI compression; this function would have to be implemented as a subpolicy which could be invoked from Profiles-0.92.

If we are willing to implement compression as a callable subpolicy, then implementation is straightforward. We assume that this subpolicy will accept as input any number of SPKI certificates and continue to combine and process those certificates until no further compression is possible. It may be desirable to implement an alternative version of SPKI compression that accepts exactly two certificates and tries to compress together only those two certificates, or a version that accepts a list of certificates and an authorization goal and tries to derive the goal from the given initial certificates.

The input statement list to the SPKI compression subpolicy will be a list of 5-tuple statements generated by the DSig parser routine. Recall that each statement on the statement list is of the following form:

(<context> (dsig (issuer <the-issuer>)
                 (subject <the-subject>)
                 (delegation <the-delegation>)
                 (authorization <the-authorization>)
                 (validity <the-validity>)))

The call to the SPKI compression routine, then, is simply:

(invoke "spki-compression" STATEMENT-LIST)

since the only arguments that need to be passed to the compression function are the certificates themselves, which are included in the STATEMENT-LIST. The SPKI compression policy then returns as additions to the STATEMENT-LIST new statements of same format containing the new derived certificates.

Note that the SPKI compression routine is assumed to be able to understand the AUTH fields present in the presented certificates and be able to locate an appropriate comparison function ("AUTH-<") for that particular authorization. It may be possible to use REFEREE policy databases to obtain this information if the databases are seeded properly.

Appendix A: Binding variable names to subexpressions of matched patterns

At the first design meeting for the W3C Digital Signature Initiative it was suggested that Profiles-0.92 be extended to allow variable binding to arbitrary subexpressions within that pattern matcher. That is, policy authors should be able to execute a pattern match and bind a variable to portions of the matching statements (if any). We have not been able at this time to enunciate a compelling reason that this functionality is needed within Profiles-0.92 itself, and (as we discuss below) the Profiles-0.92 language would have to become much more complex in order to do anything with the results of a subexpression match, thus we have not added these functions to the draft language specification. This appendix details how the spec would have to be changed to permit variable binding to subexpressions of matched patterns.

Modifying the syntax and semantics of the match command itself is straightforward. In addition to the reserved constructs listed in Section 3.5 above, we create two additional classes of reserved words: '\' variable '(' and '\' variable ')'. The BNF rule for pattern in Section 3.6 also extended so that a pattern may be of the form: pattern :: `\` variable '(' pattern '\' variable ')'. The semantics of this new construct are as follows: \<variable>( … \<variable>) binds the portion of the matched s-expression between the \<variable>( and \<variable>) to <variable> in the current execution environment. Thus, the pattern ( * \( ( * 3 * ) \x) * ) would match ( 1 ( 2 3 4 ) 5 ) and bind ( 2 3 4 ) to x.

Allowing match to return arbitrary portions of matched s-expressions as return values would violate the convention that rules always return pairs of tri-values and statement-lists, so we do not do so. As the only way to create bound variables in the local environment is via let, subexpression binding in match must necessarily happen within the scope of a let that defines the variables that will be used to hold the matched subexpresions. (This is why the semantics of let permit a variable to be bound in the current environment to an unassigned value (no value specified).) For example:

(let ((x) (y))
 (match ( \x( ( * ) \x) \y( ( * ) \y) ) STATEMENT-LIST))

binds the variables x and y, respectively to the context and content of a statement. But now we have another problem, which is that match by definition operates on lists of statements, not individual statements. Suppose in the above example that two statements on the STATEMENT-LIST satisfy the given pattern. The match function will return a (tri-value, statement-list) pair where the statement-list contains both matched items. What are the new values of the variables x and y? Are they bound to the corresponding subexpressions in the first statement that matched the pattern? Or the subexpressions in the last statement that matched the pattern? Or perhaps x and y should themselves be lists of all the corresponding matched subexpressions, one per matched statement? This last choice is most appealing semantically; if the matcher finds multiple matches then x and y should get all of those matches. But if x and y are lists then we have no way in the current Profiles-0.92 language to operate on those lists or take them apart to get at the core pieces. The lists will either have to be passed as arguments to another invoke that can process them or we will have to add iterators and loop constructs to the Profiles-0.92 language.

Appendix B: Proof of completeness of the 3-valued logic system

We have a proof that the operators and, or, not, true-if-unknown, false-if-unknown and threshold-and are sufficient to construct any tri-value x tri-value --> tri-value function desired. It's not written up yet but will go in this appendix when completed.

Appendix C: Writing directly to a statement-list in Profiles-0.92

Another suggestion from first DSig design meeting was to to allow the generation of new statements and statement lists from within Profiles-0.92 itself. Currently, Profiles-0.92 operates on statement lists generated by subpolicies but does not contain language constructs that can be used to directly add new statements to a statement list. For example were Profiles-0.92 extended to allow variable binding to subexpressions of matched patterns, those bound values might be added to the statement list in some format.

Again, however, there does not appear to be a compelling reason to allow Profiles-0.92 (as opposed to subpolicies called by Profiles-0.92) to write directly to the statement list. In order to support this extension Profiles-0.92 would have to include, at a minimum, primitives for arbitrary list construction and deconstruction (cons, car, cdr) and side-effecting a variable (changing the bound value of a statement list). (The deconstructors car and cdr could be implemented using subexpression extraction in the pattern matcher, if the pattern macthed was extended as described in Appendix A. cons, however, would still need to be added to the language.)