This document is in the course of review by the members of the World-Wide Web Consortium. This is a stable document derived from internal Working Drafts of the W3C PICSRules Working Group and the public working draft dated 26 October, 1997. Details of this review have been distributed to members' representatives. Comments by non-members should be sent to w3c-PICSRules-ask@w3.org.
The review period will end on 14 December 1997 24:00 GMT. Within 14 days from that time, the document's disposition will be announced: it may become a W3C Recommendation (possibly with minor changes), or it may revert to Working Draft status, or it may be dropped as a W3C work item. This document does not at this time imply any endorsement by the Consortium's staff or member organizations.
This document is part of the W3C (http://www.w3.org/) Metadata activity.
A list of current W3C Recommendations, Proposed Recommendations and Working Drafts can be found at: http://www.w3.org/TR
This document defines a language for writing profiles, which are filtering rules that allow or block access to URLs based on PICS labels that describe those URLs.
This specification uses the same words as RFC 1123 [RFC1123] for defining the significance of each particular requirement. These words are:
An implementation is not compliant if it fails to satisfy one or more of the MUST requirements for the protocols it implements. An implementation that satisfies all the MUST and all the SHOULD requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST requirements but not all the SHOULD requirements for its protocols is said to be "conditionally compliant." User-agents which process PICSRules are free to choose any interpretation they wish for constructs which fail to meet one of the MUST requirements.
This document assumes that the reader has a working knowledge of PICS-1.1. All labels referred to here are assumed to be PICS-1.1 compliant labels. See references [PicsServices] and [PicsLabels] for details.
1 (PicsRule-1.1
2  (
3   Policy (RejectByURL ("http://*@www.grody.com:*/*" "http://*@www.gross.net:*/*"))
4   Policy (AcceptIf "otherwise")
5  )
6 )
The numbers on the left are line numbers for ease of reference; they aren't part of the actual rule.
This example forbids access to a specific set of URLs, without using any PICS labels. Any URL that specifies the host www.grody.com or www.gross.net will be blocked, regardless of the username, port number, or particular file path that is specified in the URL; any other URLs are considered acceptable.
 1 (PicsRule-1.1
 2  (
 3   serviceinfo (
 4    "http://www.coolness.org/ratings/V1.html"
 5    shortname "Cool"
 6    bureauURL "http://labelbureau.coolness.org/Ratings"
 7    UseEmbedded "N"
 8   )
 9   Policy (RejectIf "((Cool.Coolness <= 3) or (Cool.Graphics >= 3))")
10   Policy (AcceptIf "otherwise")
11  )
12 )
This rule checks the rating given to documents according to the "Cool" rating service ("http://www.coolness.org/ratings/V1.html"). Labels will be fetched from the label bureau "http://labelbureau.coolness.org/Ratings". Labels embedded in the document are ignored because the document authors can't be trusted to assess their own coolness. Documents which are not sufficiently cool or have too many graphics will be blocked. Everything else, including unlabeled documents, will be allowed.
 1 (PicsRule-1.1
 2  (
 3   ServiceInfo (
 4    name "http://www.coolness.org/ratings/V1.html"
 5    shortname "Cool"
 6    bureauURL "http://labelbureau.coolness.org/Ratings"
 7   )
 8   Policy (RejectUnless "(Cool.Coolness)")
 9   Policy (AcceptIf "((Cool.Coolness > 3) and (Cool.Graphics < 3))")
10   Policy (RejectIf "otherwise")
11  )
12 )
This rule also checks the rating given to documents according to the "Cool" rating service. In this case, because UseEmbedded is not specified, it defaults to using embedded labels in addition to labels it fetches from the label bureau. Line 8 says that documents will be blocked unless we have a rating on the "Coolness" scale of the "Cool" rating system ("http://www.coolness.org"). Line 9 says that documents which are sufficiently cool, and don't have too many graphics, will be passed. Line 10 says to block all other documents.
 1 (PicsRule-1.1
 2  (
 3   name (rulename "Example 4"
 4         description "Example 4 from PICSRules spec; simply shows how PICSRules rules are formed. This rule is not actually intended for use by real users.")
 5   source (sourceURL "http://www1.raleigh.ibm.com/pics/PICSRulz/Example1.html")
 6   ServiceInfo (name "http://www.coolness.org/ratings/V1.html"
 7                shortname "Cool"
 8                bureauURL "http://labelbureau.coolness.org/Ratings")
 9   ServiceInfo ("http://www.kid-protectors.org/ratingsv01.html"
10                shortname "KP")
11   Policy (RejectByURL ("http://*@www.badnews.com:*/*" "http://*@www.worsenews.com:*/*" "*://*@18.0.0.0!8:*/*"))
12   Policy (AcceptByURL "http://*rated-g.org/movies*")
13   Policy (AcceptIf "(KP.educational = 1)" Explanation "Always allow educational content.")
14   Policy (RejectIf "(KP.violence >= 3)" Explanation "Blood's a %22scary%22 thing.")
15   Policy (RejectUnless "(Cool.Graphics < 4)" )
16   Policy (AcceptIf "otherwise")
17  )
18 )
The summary of this rule is the following:
It is intended that this syntax will be registered as a MIME type, application/pics-rules.
attrvalpair:: attribute whitespace value | value
attribute:: alphanumstr
value:: quotedstring | '(' attrvalpair+ ')'
quotedstring:: '"'notdoublequotechar*'"' | "'"notsinglequotechar*"'"
alphanumstr:: (alphanum | '.')+
whitespace:: ' ' | '\t' | '\r' | '\n'
alphanum:: '0' - '9' | 'A' - 'Z' | 'a' - 'z'
notdoublequotechar :: any Unicode character except "
notsinglequotechar :: any Unicode character except '
The grammar uses " to quote strings, but ' may be used instead, provided that the
same character starts and ends the string:
"string"
'string'
but not:
"string'
'string"
As a shorthand in the rest of the BNF, we will use "double quotes" for all quoted strings, with the understanding that single quotes are equally valid as a delimiter. Also as a shorthand, we use notquotechar to mean any Unicode character other than the quoting delimiter (either " or ') used for the current string.
The other quoting character may appear within a string. In order to accommodate the use of both single and double quotes inside strings, the following escaping conventions apply:
Note that, although ", ', and % are encoded using the % hex hex encoding rule used
for special characters in URLs, other % hex hex combinations are not valid and are not
considered encodings of other characters.
Character string as represented in a PICS Rule | Parsed and decoded character string |
"string" | string |
'string' | string |
'This is "quoted" text.' | This is "quoted" text. |
"It's nice to quote." | It's nice to quote. |
"It%27s nice to %22quote.%22" | It's nice to "quote." |
"50%25 of test scores are above the median" | 50% of test scores are above the median |
"50% are below the median" | <syntactically invalid string> |
RFC 2070 [RFC2070] on internationalization of HTML describes the more general SGML distinction between the internal character encoding and external character encoding. In those terms, Unicode is the internal character set for PICSRules rules. Unicode is a character set that includes characters from most languages; it is a 16-bit character set. We designate UTF-8 as the official external encoding for PICSRules. UTF-8 [UTF-8] has the useful properties that all USASCII characters are represented by themselves, and that they do not appear as part of the encoding of anything else. This means that most processing need not know about UTF-8 provided that it does not strip the top bit of 8-bit bytes.
Note that in order to properly interpret a PICSRules rule, the UTF-8 transformation is
applied first, to convert the rule into a sequence of Unicode characters. Each quoted
string must then be passed through a converter that unescapes quotes,
converting %22 to ", %27 to ', and %25 to %.
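The unescaping step described above can be sketched as follows. This is a minimal illustration in Python, not part of the specification; the function name decode_quoted_string is hypothetical. It assumes the input has already been converted from UTF-8 to Unicode characters and that it consists of exactly one quoted string, including its delimiters.

```python
import re

# Only these three escapes are defined; every other '%' is invalid.
_ESCAPES = {"%22": '"', "%27": "'", "%25": "%"}


def decode_quoted_string(raw: str) -> str:
    """Strip the matching delimiters and unescape %22, %27 and %25.

    Raises ValueError for mismatched delimiters, for an unescaped
    delimiter inside the string, and for any other '%' sequence.
    """
    if len(raw) < 2 or raw[0] not in "\"'" or raw[-1] != raw[0]:
        raise ValueError("string must start and end with the same quote")
    body = raw[1:-1]
    if raw[0] in body:
        raise ValueError("the delimiter must be escaped inside the string")
    # A '%' that does not begin %22, %27 or %25 makes the string invalid.
    if re.search(r"%(?!22|27|25)", body):
        raise ValueError("invalid % sequence")
    # A single pass, so the '%' produced by %25 is never decoded again.
    return re.sub(r"%(?:22|27|25)", lambda m: _ESCAPES[m.group(0)], body)
```

Decoding in a single pass matters: "%2522" must yield the three characters %22, not a double quote.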
Note that all attribute names are case insensitive, while the case of values MUST be preserved. However, individual clauses and/or attributes MAY define their values to be case-insensitive.
The PICSRules syntax, which will be presented below, has a facility for descriptive text which can be shown to a user, in addition to various statements which influence the behavior of user-agents. However, it is frequently useful to have "source-level" comments: comments which are intended for individuals writing and/or editing rules, but which are not intended for display to end users. This is analogous to placing comments in source code; in an effort to encourage rule authors to write clear rules, we provide a facility for placing comments into PICSRules rules.
The syntax of a comment is:
comment:: '{' comment-text* '}'
comment-text:: any characters except '}'
Note that a result of the above syntax is that comments may not be nested.
Comments may appear anywhere in PICSRules rules. A user-agent MAY remove the comments during lexical analysis of the rule; text within comments MUST NOT influence the interpretation of the rule in any manner. Note also that user-agents which generate or export PICSRules rules MAY choose to strip out comments before generating, exporting, or transmitting them.
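A user-agent that removes comments during lexical analysis might do so along the following lines. This Python sketch is illustrative only; the name strip_comments is hypothetical, and it makes one assumption that the text above does not state: a '{' that occurs inside a quoted string is treated as string content rather than as the start of a comment.

```python
def strip_comments(rule: str) -> str:
    """Remove {...} comments from rule text, leaving quoted strings intact.

    Assumption (not stated in the specification text): braces inside a
    quoted string are ordinary characters, not comment delimiters.
    """
    out = []
    quote = None        # current string delimiter, or None outside strings
    in_comment = False
    for ch in rule:
        if in_comment:
            if ch == "}":
                in_comment = False  # no nesting: the first '}' closes
            continue
        if quote is not None:
            out.append(ch)
            if ch == quote:
                quote = None        # closing delimiter reached
            continue
        if ch == "{":
            in_comment = True
            continue
        if ch in "\"'":
            quote = ch              # opening delimiter of a quoted string
        out.append(ch)
    if in_comment:
        raise ValueError("unterminated comment")
    return "".join(out)
```

Because comments do not nest, the input "{a {b} c}" ends its comment at the first '}', and the remaining " c}" is left in the rule text.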
The general format of a PICSRules rule, in modified BNF, is as follows. Some elements, such as "policy-expression" and "URLpattern" are used here but defined later in the document.
rule:: '(' 'PicsRule-'verMajor'.'verMinor rule-body ')'
verMajor :: integer
verMinor :: integer
rule-body :: '(' rule-clauses ')'
rule-clauses :: rule-clause+
rule-clause :: policy-clause | name-clause | source-clause | service-clause | opt-extension-clause | req-extension-clause | extension-aval
policy-clause :: 'Policy' '(' policy-attribute+ ')'
policy-attribute :: ['Explanation'] quotedstring | 'RejectByURL' URL-strings | 'AcceptByURL' URL-strings | 'RejectIf' policy-string | 'RejectUnless' policy-string | 'AcceptIf' policy-string | 'AcceptUnless' policy-string | extension-aval
URL-strings :: URL-string | '(' ['patterns'] URL-string+ ')'
URL-string :: '"'URLpattern'"'
policy-string :: '"'policy-expression'"'
name-clause :: 'name' '(' name-attribute+ ')'
name-attribute :: ['Rulename'] quotedstring | 'Description' quotedstring | extension-aval
source-clause :: 'source' '(' source-attribute+ ')'
source-attribute :: ['SourceURL'] quotedURL | 'CreationTool' quotedstring | 'author' quoted-address | 'LastModified' quoted-date | extension-aval
service-clause :: 'serviceinfo' '(' service-attribute+ ')'
service-attribute :: ['Name'] quotedURL | 'shortname' quotedstring | 'BureauURL' quotedURL | 'UseEmbedded' yes-no | 'Ratfile' quotedstring | 'BureauUnavailable' pass-fail | extension-aval
yes-no :: '"'Y-N'"'
Y-N :: 'Y' | 'N'
pass-fail :: '"'P-F'"'
P-F :: 'PASS' | 'FAIL'
opt-extension-clause :: 'optextension' '(' extension-name+ ')'
extension-name :: ['extension-name'] quotedURL | 'shortname' quotedstring | extension-aval
req-extension-clause :: 'reqextension' '(' extension-name+ ')'
extension-aval :: attrvalpair
quotedURL :: '"'URL'"'
URL :: as defined in RFC-1738 for URLs.
quoted-address :: '"'e-mail-address'"'
e-mail-address :: as defined in RFC-822 for addresses.
quoted-ISO-date :: '"'YYYY'-'MM'-'DD'T'hh':'mmStz'"'
based on the ISO 8601:1988 date and time standard, restricted to the specific form described here:
YYYY :: four-digit year
MM :: two-digit month (01=January, etc.)
DD :: two-digit day of month (01 through 31)
hh :: two digits of hour (00 through 23) (am/pm NOT allowed)
mm :: two digits of minute (00 through 59)
S :: sign of time zone offset from UTC ('+' or '-')
tz :: four digit amount of offset from UTC (e.g., 1512 means 15 hours and 12 minutes)
For example, "1994-11-05T08:15-0500" is a valid quoted-ISO-date denoting November 5, 1994, 8:15 am, US Eastern Standard Time.
Note: The ISO standard allows considerably greater flexibility than that described here. PICS requires precisely the syntax described here -- neither the time nor the time zone may be omitted, none of the alternate formats are permitted, and the punctuation must be as specified here.
Note: The PICS-1.1 label format spec inadvertently used a date format that was slightly incompatible with the ISO date format. In particular, that spec required '.' instead of '-' as the separator between year and month, and between month and day. This spec corrects that error, so that it is incompatible with the PICS-1.1 label spec's date format, but compatible with the ISO date format.
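The restricted date form can be checked mechanically. The following Python sketch is one possible validator, offered as an illustration rather than a normative algorithm; the name parse_quoted_iso_date is hypothetical. It accepts either quoting character, following the quoting shorthand used throughout this grammar, and relies on the standard datetime module to reject out-of-range months, days, hours, and minutes.

```python
import re
from datetime import datetime, timedelta, timezone

# Exactly the restricted form: "YYYY-MM-DDThh:mm" followed by sign and tz.
_ISO_DATE = re.compile(
    r"""(?P<q>["'])"""
    r"(?P<Y>[0-9]{4})-(?P<M>[0-9]{2})-(?P<D>[0-9]{2})"
    r"T(?P<h>[0-9]{2}):(?P<m>[0-9]{2})"
    r"(?P<S>[+-])(?P<tzh>[0-9]{2})(?P<tzm>[0-9]{2})"
    r"(?P=q)"
)


def parse_quoted_iso_date(text: str) -> datetime:
    """Return a timezone-aware datetime, or raise ValueError."""
    m = _ISO_DATE.fullmatch(text)
    if m is None:
        raise ValueError("not a quoted-ISO-date: %r" % text)
    sign = 1 if m.group("S") == "+" else -1
    offset = timedelta(hours=int(m.group("tzh")), minutes=int(m.group("tzm")))
    # datetime() itself raises ValueError for values such as month 13.
    return datetime(
        int(m.group("Y")), int(m.group("M")), int(m.group("D")),
        int(m.group("h")), int(m.group("m")),
        tzinfo=timezone(sign * offset),
    )
```

With this sketch, the PICS-1.1 label form "1994.11.05T08:15-0500" is rejected, since '.' is not accepted as a separator.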
An application program will invoke a rule evaluator, providing a rule and a URL, and perhaps labels that were embedded in the document associated with the URL or passed in the HTTP headers along with the document associated with the URL. A yes (accept) or no (reject) answer is returned. The rule evaluator SHOULD also return the explanation string associated with the policy clause that determines the final answer, if such an explanation string is provided.
The serviceinfo clause or clauses specify how to find labels associated with a given URL (from one or more label bureaus or embedded in the document). The Policy clause or clauses determine whether an accept or reject answer is returned. Extension clauses (either required or optional) may cause additional labels to be collected or discarded, or otherwise change the meaning of a rule. The semantics of a rule are defined based on a user agent making a best-effort attempt to retrieve labels from all the specified sources and using all the retrieved labels in evaluating policy clauses. A user agent may, however, perform optimizations, such as consulting a local source (a cache or a CD-ROM) that provides the same labels as those provided at a specified URL, or not collecting labels at all when those labels could not possibly change the rule's result.
Later in this document, we suggest that implementors adopt a particular evaluation order. Implementors should be very careful about any deviations from this suggested evaluation order. Note that it is possible to write rules that are non-monotonic in the receipt of labels: as more labels are received, the result could flip from accept to reject and back again. In some situations, however, it may be possible to infer that additional labels cannot alter the result of a rule: for example, the first policy clause may specify that a particular URL is to be accepted, based solely on its URL, regardless of any labels that are available. As an optimization, a user agent may use the policy clause(s) to determine an answer even before labels are available from all of the sources specified in the serviceinfo clause(s), but implementors should be careful to do this only in those situations where the additional labels, even if they were available, could not change the results of the evaluation.
Attribute in clause | Satisfied by | Action |
RejectByURL | URL matches any of the patterns specified | Block document |
AcceptByURL | URL matches any of the patterns specified | Pass document |
RejectIf | expression = true | Block document |
AcceptIf | expression = true | Pass document |
RejectUnless | expression = false | Block document |
AcceptUnless | expression = false | Pass document |
If none of the policy clauses is satisfied, then the document is passed. This is equivalent to making the final policy clause be AcceptIf "otherwise".
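The table and the default above can be read as a simple loop over the policy clauses. The Python sketch below is illustrative only; it assumes that clauses are considered in the order in which they appear in the rule and that the first satisfied clause determines the answer. The names evaluate_policies and PolicyClause are hypothetical, and each clause's URL match or policy expression is represented by an arbitrary predicate function.

```python
from typing import Callable, List, Optional, Tuple

# (attribute name, predicate for the URL match or expression, explanation)
PolicyClause = Tuple[str, Callable[[dict], bool], Optional[str]]

# attribute -> (truth value that satisfies the clause, True if it passes)
_ACTIONS = {
    "RejectByURL": (True, False),
    "AcceptByURL": (True, True),
    "RejectIf": (True, False),
    "AcceptIf": (True, True),
    "RejectUnless": (False, False),
    "AcceptUnless": (False, True),
}


def evaluate_policies(
    clauses: List[PolicyClause], context: dict
) -> Tuple[bool, Optional[str]]:
    """Return (accepted, explanation) for the first satisfied clause."""
    for attribute, predicate, explanation in clauses:
        satisfied_when, accept = _ACTIONS[attribute]
        if predicate(context) == satisfied_when:
            return accept, explanation
    # No clause satisfied: same as a final AcceptIf "otherwise".
    return True, None
```

For instance, a RejectUnless clause whose expression is false blocks the document, while an empty list of clauses passes every document.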
The following semantic restrictions are imposed on rules:
In policy clauses, the AcceptByURL and RejectByURL attributes employ the URLpattern element, whose BNF is given below.
URLpattern :: internet-pattern | other-pattern
internet-pattern :: internet-scheme '://' [user '@'] hostoraddr [':' port] ['/' pathmatch]
internet-scheme :: '*' | 'ftp' | 'http' | 'gopher' | 'nntp' | 'irc' | 'prospero' | 'telnet'
user :: ['*' | '%*'] notquotechar* ['*' | '%*']
hostoraddr :: ['*' | '%*'] host | ipwild ['!' bitlength]
ipwild :: ipcomponent '.' ipcomponent '.' ipcomponent '.' ipcomponent
ipcomponent :: integer between '0' and '255' inclusive
bitlength :: integer between '0' and '32' inclusive
host :: substring of a fully qualified domain name as described in Section 3.5 of [RFC1034]
port :: '*' | integerorwild [ '-' integerorwild ]
pathmatch :: ['*' | '%*'] notquotechar* ['*' | '%*']
integerorwild :: digit+ | '*'
digit :: '0' - '9'
other-pattern :: scheme : ['*' | '%*'] notquotechar* ['*' | '%*']
scheme :: '*' | schemechar+
schemechar :: 'a' - 'z' | 'A' - 'Z' | digit | '+' | '.' | '-' (as specified in [RFC1738])
A RejectByURL policy clause causes the overall rule to "reject" if the URL match evaluates to TRUE. Similarly, an AcceptByURL policy clause causes the overall rule to "accept" if the URL match evaluates to TRUE. In either case, the explanation associated with the policy clause is returned. If a list of URL patterns is provided, the URL match evaluates to TRUE if any one of the patterns matches. If the URL match evaluates to FALSE, the policy clause is ignored and evaluation continues with the next policy clause.
When comparing a URLpattern to a URL, the rule interpreter MUST NOT unencode the URL (e.g., do not convert %2F to /). If the pattern can be interpreted as an internet-pattern, then the pattern is divided into its component parts and the URL matches the pattern if a match occurs on every component that is included in the pattern.

scheme: '*' for the pattern matches every scheme. Otherwise, an exact string match is required, but the comparison is case-insensitive. The scheme component MUST NOT be omitted from the pattern.

user: '*' at the beginning or end of the pattern matches any number of characters in the URL string. '%*' at the beginning or end of the pattern matches the single character '*' in the URL string. Characters in the middle of the pattern must match exactly the characters in the URL string; this comparison is case-sensitive. A user component of "*" in the pattern also matches URLs that omit the user component. If the user component is omitted from the pattern, there is a match only if the user component is also omitted from the URL.

password: PICSRules patterns do not specify passwords. A pattern matches URLs that specify any password, as well as URLs that omit the password component.

ipwild: If the hostoraddr in the pattern is an IP address, then the corresponding host component of the URL is first resolved into a set of IP addresses. The pattern matches if it matches any of the IP addresses. If ! and a bitlength are specified, both the pattern and the URL are converted from decimal into binary notation and the first bitlength bits of the strings are compared. Thus, the '!' has the same semantics that '/' normally has when specifying subnets or CIDR blocks: we use ! because / could be misinterpreted as the delimiter after which the path appears. 18.23.7.22!16 is equivalent to 18.23.0.0!16, because comparisons will be done only on the first 16 bits.

host: '*' at the beginning of the pattern matches any number of characters in the URL string. '%*' at the beginning of the pattern matches the single character '*' in the URL string. Subsequent characters in the pattern must exactly match the remaining characters in the URL string; this comparison is case-insensitive. Note that if the pattern specifies a host name (or a host name with wildcards), it does not match a URL that specifies an IP address, even if the host name in the pattern would resolve to the IP address in the URL. This avoids the need to perform expensive reverse DNS lookups based on IP addresses in URLs. Either a host or an ipwild component MUST be specified in the URL pattern.

port: If the pattern specifies two numbers, it matches against any port number in that range. For example, if the pattern's port component is "80-82", it would match a URL whose port is 80, 81, or 82. The wildcard * as part of a range indicates an extreme value. That is, if the pattern's port is "*-82", it matches all ports less than or equal to 82; if the pattern's port is "80-*", it matches all ports greater than or equal to 80. If the pattern's port is simply "*", it matches URLs with any port number, including URLs that omit the port number component. If the pattern's port is omitted, it matches only URLs that also omit the port number.

path: '*' at the beginning or end of the pattern matches any number of characters in the URL string. '%*' at the beginning or end of the pattern matches the single character '*' in the URL string. Characters in the middle of the pattern must match exactly the characters in the URL string; this comparison is case-sensitive. A path component of "*" in the pattern also matches URLs that omit the path component. If the path component is omitted from the pattern, there is a match only if the path component is also omitted from the URL.
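The ipwild and port comparisons described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative algorithm: the function names are hypothetical, the address passed to ip_pattern_matches is assumed to be one of the addresses already resolved from the URL's host, and a URL that omits its port is assumed not to match a numeric port pattern (the text above defines only the "*" and omitted-pattern cases for such URLs).

```python
import ipaddress
from typing import Optional


def ip_pattern_matches(pattern: str, address: str) -> bool:
    """Compare an 'a.b.c.d' or 'a.b.c.d!bits' pattern with one IPv4 address."""
    base, _, bits = pattern.partition("!")
    bitlength = int(bits) if bits else 32  # no '!': compare all 32 bits
    if not 0 <= bitlength <= 32:
        raise ValueError("bitlength must be between 0 and 32")
    # strict=False discards bits beyond bitlength, so 18.23.7.22!16
    # behaves exactly like 18.23.0.0!16.
    network = ipaddress.IPv4Network("%s/%d" % (base, bitlength), strict=False)
    return ipaddress.IPv4Address(address) in network


def port_pattern_matches(pattern: Optional[str], port: Optional[int]) -> bool:
    """pattern is None when the pattern omits the port; likewise for port."""
    if pattern is None:
        return port is None      # omitted pattern matches only omitted port
    if pattern == "*":
        return True              # any port, including an omitted one
    if port is None:
        return False             # assumption: see the lead-in text
    low, dash, high = pattern.partition("-")
    if not dash:
        high = low               # a single number is a one-port range
    lowest = 0 if low == "*" else int(low)
    highest = 65535 if high == "*" else int(high)
    return lowest <= port <= highest
```

For a URL whose host resolves to several addresses, a caller would apply ip_pattern_matches to each address and report a match if any call returns true.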
WARNING: if a component is not specified in the pattern, the pattern matches only URLs that omit that component. It is necessary to specify '*' for pattern components if the intention is to ignore that component of URLs. For example, to block access to all URLs that contain the string "buy" in the pathname, the correct pattern is "*://*@*:*/*buy*". While it might seem natural to write the pattern "*://*/*buy*" or even "*buy*", the first would match only URLs that omit the username and port number, and the second is simply not a valid pattern.
If the pattern can not be interpreted as an Internet scheme, it is divided into a
scheme name and a scheme-specific part. '*' for the scheme name matches any URL's scheme;
otherwise exact string matching is required; this comparison is case-insensitive. '*' at
the beginning or end of the scheme-specific part of the pattern matches any number of
characters in the URL string. '%*' at the beginning or end of the pattern matches the
single character '*' in the URL string. Characters in the middle of the scheme-specific
part of the pattern must match exactly the characters in the URL string; this comparison
is case-sensitive.
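The wildcard convention shared by the user, path, and scheme-specific components ('*' or '%*' only at the two ends, exact matching in the middle) can be sketched in Python as follows. The function name component_matches and its case_sensitive parameter are hypothetical; the sketch operates on a single component and leaves the splitting of a URL into components to the caller.

```python
def component_matches(pattern: str, value: str, case_sensitive: bool = True) -> bool:
    """Match one URL component against a pattern component."""
    if not case_sensitive:
        pattern, value = pattern.lower(), value.lower()

    # Leading marker: '%*' is a literal '*', a bare '*' is a wildcard.
    open_start, prefix = False, ""
    if pattern.startswith("%*"):
        prefix, pattern = "*", pattern[2:]
    elif pattern.startswith("*"):
        open_start, pattern = True, pattern[1:]

    # Trailing marker, examined on what remains after the leading one.
    open_end, suffix = False, ""
    if pattern.endswith("%*"):
        suffix, pattern = "*", pattern[:-2]
    elif pattern.endswith("*"):
        open_end, pattern = True, pattern[:-1]

    # Everything left, including any '*' in the middle, is literal text.
    literal = prefix + pattern + suffix
    if open_start and open_end:
        return literal in value
    if open_start:
        return value.endswith(literal)
    if open_end:
        return value.startswith(literal)
    return value == literal
```

Note that a pattern of "*" reduces to an empty literal with an open start, so it matches every value, including the empty string.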
NOTE: It is not possible to write a URLpattern that matches exactly the URL string
characters '%*'. This is not a limitation of the pattern matching language, however,
because, in a valid URL, the '%' character must be followed by two hex digits. Thus, there
are no URL strings containing the character sequence '%*'.
Since %-encoded characters in URLs are not unencoded before comparison, a server may
choose to treat two URLs as synonyms that the PICS rule evaluator will not treat as
synonyms. That is, the URLs <http://www.student1.mit.edu/sex>,
<http://www.student1.mit.edu/%73%65%78> and
<http://www.student1.mit.edu/se%78> might all cause the server to send back the same
page, if the server follows a rule of unencoding the URL path (%73 becomes 's', %65
becomes 'e' and %78 becomes 'x').
Unfortunately, the alternative matching rule, of always unencoding URLs before comparing to the pattern, can cause ambiguities. For example, in HTTP, ? is reserved as the query string delimiter; any naturally occurring ? is encoded as %3F. After unencoding it would no longer be possible to distinguish a query string delimiter from a naturally occurring ?. We felt it was better to make the pattern matching precise, at the expense of missing some synonyms.
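The trade-off can be seen in a few lines of Python. The sketch below uses the standard urllib.parse.unquote function purely to illustrate what unencoding would do; a conforming rule evaluator compares the raw strings. The example.org URL is a hypothetical illustration and does not come from this document.

```python
from urllib.parse import unquote

SYNONYMS = [
    "http://www.student1.mit.edu/sex",
    "http://www.student1.mit.edu/%73%65%78",
    "http://www.student1.mit.edu/se%78",
]


def distinct_count(urls, decode=False):
    """Count the distinct URLs, optionally unencoding each one first."""
    return len({unquote(u) if decode else u for u in urls})


# Compared as raw strings, the three URLs are all different;
# after unencoding they collapse into a single URL.
raw_count = distinct_count(SYNONYMS)
decoded_count = distinct_count(SYNONYMS, decode=True)

# Unencoding also erases the difference between an encoded '?' in a
# path and a real query string delimiter (hypothetical URL).
ambiguous = unquote("http://example.org/a%3Fb")
```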
Another, similar limitation is that IP addresses in URLs are not converted into host names for comparison to rule patterns. This means that host name-based patterns will miss matching against certain synonymous IP-address based URLs. The pattern "http://*.mit.edu" will match against fewer URLs than the pattern "http://18.0.0.0!8". The latter pattern will match against web sites ending in mit.edu, because they all will resolve to IP addresses beginning with 18. The reason that URLs containing IP addresses will not match against patterns that specify domain names is that performing a reverse lookup of the IP address in the URL is too expensive an operation to perform routinely. Hence, whenever it is practical to do so, rules may want to specify IP address matching rather than host name matching; beware, however, that this may require updating of the rule whenever a host name switches to a different IP address.
The attributes AcceptIf, RejectIf, AcceptUnless, and RejectUnless of the Policy clause all take a policy-expression as an argument. A policy-expression is an expression operating on various labels; this section defines the syntax and semantics for those expressions.
policy-expression :: simple-expression | or-expression | and-expression | degenerate-expression
simple-expression :: '(' service ['.' category [op constant ] ] ')'
service :: any shortname defined in a serviceinfo clause within this rule
category :: transmit-name-char+ ['/' category]
Note: as in the [PicsLabels] spec, if the rating service defines hierarchically nested categories, the outermost category name goes at the left, followed by a slash, then the next category name, etc.
transmit-name-char :: alphanumpm | '.' | '$' | ',' | ';' | ':' | '&' | '=' | '?' | '!' | '*' | '~' | '@' | '#' | '_' | '%' hex hex
alphanumpm :: 'A' | ... | 'Z' | 'a' | ... | 'z' | '0' | ... | '9' | '+' | '-'
hex :: '0' | ... | '9' | 'A' | ... | 'F' | 'a' | ... | 'f'
op :: '>' | '<' | '=' | '>=' | '<='
constant :: [sign] alphanum* ['.' alphanum*]
or-expression :: '(' policy-expression [or policy-expression]+ ')'
or :: 'or'
and-expression :: '(' policy-expression [and policy-expression]+ ')'
and :: 'and'
sign :: '-'
degenerate-expression :: 'otherwise'
When evaluating a clause, the user-agent may use zero, one, or more labels from a given rating service (for more details, see the control flow section). A simple-expression evaluates to true if any available label from the specified service satisfies the condition of the expression. Intuitively, a rule evaluator will try to prove that an expression is satisfied, using any available labels as evidence.
We must deal with the situation where a simple-expression calls for a value from a label, and either no label is available, or the available labels do not have values for the specified category. In those situations, the simple-expression evaluates to false. This leads to an intuitive semantics: if a simple-expression has no associated label available, that expression cannot contribute evidence toward proving the claim made by the expression.
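The existential reading of a simple-expression can be sketched in Python as follows. This is an illustration only: labels are modelled as plain dictionaries mapping a category name to a list of numeric values, which is a hypothetical representation, and the function name eval_simple is likewise hypothetical. The treatment of the forms without an operator ("(Service)" is true when any label is available, "(Service.category)" when some label carries a value for the category) is an assumption made for this sketch.

```python
import operator
from typing import Iterable, Mapping, Optional, Sequence

_OPS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
}

# Hypothetical model of one label: {category name: [values...]}.
Label = Mapping[str, Sequence[float]]


def eval_simple(
    labels: Iterable[Label],
    category: Optional[str] = None,
    op: Optional[str] = None,
    constant: Optional[float] = None,
) -> bool:
    """True if any available label provides evidence for the expression."""
    for label in labels:
        if category is None:
            return True           # "(Service)": some label is available
        values = label.get(category)
        if not values:
            continue              # this label has no value for the category
        if op is None:
            return True           # "(Service.category)": a value exists
        if any(_OPS[op](v, constant) for v in values):
            return True           # one satisfying value is enough
    return False                  # no labels, or no evidence found
```

With no labels at all, every simple-expression evaluates to false, which is what lets a clause such as RejectUnless "(Cool.Coolness)" block unlabeled documents.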
Simple-expressions, as defined above, can use any types of operators on any types of data. More specifically, the semantics of expression evaluation are as follows:
Early drafts of PICSRules-1.0 included a != operator, which is intuitively useful. It was removed, because, in the presence of either zero or multiple values, the intuitive semantics for != are inconsistent with the semantics for other operators. For example, suppose that a label includes (s (2 3)), indicating values on the s dimension of both 2 and 3. This label would satisfy the policy-expression (Service.s < 3), because there exists a value less than 3. The intuitive semantics for !=, however, is to require that all the values be unequal to three. We found that smart people could easily get confused when mixing the existential quantification (there exists a value less than 3) with universal quantification (all values are unequal to 3). Moreover, "x != 3" is normally a synonym for "((x < 3) or (x > 3))". But in the presence of multiple values, this would not hold. We believed that it was worse to have an operator with non-intuitive semantics than to not have the operator at all, so it was removed from PICSRules-1.1.
The careful reader will also note the lack of the Boolean not operator, as
well as the lack of universally quantified operators such as max, min, and
forall. These omissions are deliberate, and for similar reasons to the omission
of !=. Given that the available labels may provide either no values or multiple
values for particular categories, rules become very difficult for people to understand
when such operators are allowed in an unrestricted way. We have restricted the use of
negation and universal quantification to appear only at the top-level, using the
attributes AcceptIf, AcceptUnless, RejectIf, and RejectUnless, as described
below. Our restricted language still has full expressiveness, however, by taking advantage
of the fact that "forall x, g(x) holds" is mathematically equivalent to
"there does not exist x such that g(x) does not hold". For example, suppose one
wants to accept any URL so long as all the labels agree on an s-value equal to
three. The policy clause would be:
Policy (AcceptUnless "((Service.s < 3) or (Service.s > 3))" ).
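The equivalence used in this example can be checked with a short Python sketch. It is illustrative only; the function names are hypothetical, and the s-values from all available labels are modelled as a flat list of numbers.

```python
def exists(values, predicate):
    """Existential test, as used by every simple-expression."""
    return any(predicate(v) for v in values)


def accept_unless_not_three(values):
    """Model of AcceptUnless "((Service.s < 3) or (Service.s > 3))".

    The clause is satisfied, and the URL accepted, when the
    or-expression is false: no value is below 3 and none is above 3.
    """
    expression = exists(values, lambda v: v < 3) or exists(values, lambda v: v > 3)
    return not expression


def all_equal_three(values):
    """The universally quantified condition the rule author has in mind."""
    return all(v == 3 for v in values)
```

For any list of numeric values the two functions agree. In particular, both return true for an empty list, so under this clause a URL for which no s-values are available is accepted.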
PICSRules is structured as a nested set of attribute-value pairs. Unrecognized attribute keywords are ignored by user-agents, and the associated values can be discarded by a PICSRules parser, as all values will be in a known syntax. The basic mechanism for extending PICSRules is to define new clauses and/or attribute-value pairs, their context, and their meaning. All new attribute-value pairs will be associated with a named extension. Names of extensions are URLs, and hence globally distinct. When used in a PICSRule, extension attribute names are preceded by a shortname for the extension that defines the attribute, so as to avoid potential attribute naming conflicts.
To define a new extension:
Here's a simple example of a PICSRules rule that uses an optional extension:
 1 (PicsRule-1.1
 2  (
 3   ServiceInfo (
 4    "http://www.coolness.org/ratings/V1.html"
 5    shortname "Cool"
 6    bureauURL "http://labelbureau.coolness.org/Ratings"
 7   )
 8   Policy (AcceptIf "((Cool.Coolness < 3) or (Cool.Graphics < 3))" )
 9   Policy (RejectIf "otherwise")
10   optextension ( "http://www.si.umich.edu/~presnick/pics/extensions/PRsample.htm"
11                  shortname "extension1")
12   extension1.SampleAttribute (
13    UseExpired "YES"
14    GroupFile "/etc/ics.grp"
15   )
16  )
17 )
This example makes use of an optional extension named "http://www.si.umich.edu/~presnick/pics/extensions/PRsample.htm". That extension defines the keyword SampleAttribute. User-agents which don't understand this extension can simply ignore the extension1.SampleAttribute clause and its attribute-value pairs (lines 12-14).
Note that there is only one "level" to declaring extensions, but attribute-value pairs defined by extensions may appear anywhere within a PICSRules rule. That is, all extensions should declare themselves with an optextension or reqextension clause within a rule-clause, but the attributes defined by an extension may appear nested several layers down within a rule.
We thank the following for their assistance in writing this document; without their help, none of this would have been possible. Special thanks go to David Shapiro, whose parsing code made it possible to test changes in the syntax and examples as we made them.
Scott Berkun, Microsoft
Jonathan Brezin, IBM
Yang-hua Chu, MIT
Lorrie Cranor, AT&T
Jon Doyle, MIT
Ghirardelli Chocolate Co.
Brian LaMacchia, AT&T
Breen Liblong, NetShepherd
Jim Miller, W3C
Mary Ellen Rosen, IBM
Rick Schenk, IBM
Bob Schloss, IBM
David Shapiro, MIT
Ray Soular, SafeSurf