W3C

Protocol for Web Description Resources (POWDER): Grouping of Resources

W3C Working Draft 31 October 2007

This version
http://www.w3.org/TR/2007/WD-powder-grouping-20071031/
Latest version
http://www.w3.org/TR/powder-grouping/
Previous version
http://www.w3.org/TR/2007/WD-powder-grouping-20070709/
Editors:
Andrea Perego, Università degli Studi dell'Insubria
Phil Archer, Family Online Safety Institute

Abstract

The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. This document describes how sets of resources may be defined, either for use in Description Resources or in other contexts. An OWL Class is to be interpreted as the Resource Set with its predicates and objects either defining the characteristics that elements of the set share, or directly listing its elements. Resources that are directly identified or that can be interpreted as being elements of the set can then be used as the subject of further RDF triples.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Public Working Draft, designed to aid discussion. The POWDER Use Cases and Requirements document [PUC] details the use cases and requirements that motivated the creation of this document. Changes since earlier versions of this document are recorded in the change log.

This document was developed by the POWDER Working Group. The Working Group expects to advance this Working Draft to Recommendation Status.

Please send comments about this document to public-powderwg@w3.org (with public archive); please include the text "comment" in the subject line.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction (Informative)
1.1 Design Goals and Constraints
1.2 Outline Methodology
2 Grouping by Address
2.1 Grouping by IRI/URI component
2.1.1 Resource Set Definitions Referring to Ports
2.1.2 Resource Set Definitions Referring to Paths and/or Query Strings
2.1.3 IRI/URI Canonicalization
2.1.4 Data encoding
2.2 Grouping using Wildcards: The includeUriPattern and excludeUriPattern Properties
2.3 Grouping by Regular Expression: The includeRegEx and excludeRegEx Properties
2.3.1 Safe Use of includeRegEx
2.4 Grouping by IP Address
2.4.1 Safe Usage of the includeIpRanges Property
2.5 Enumerating Elements of a Resource Set: the includeResources and excludeResources properties
2.6 Redirection: the includeRedirection property
3 Grouping by Resource Property
4 Conjunction and Disjunction
4.1 Combining Definitions Within a Resource Set
4.2 Combining Multiple Resource Sets
5 Logical Inconsistency
6 Extension Mechanism
6.1 Extension Example 1: ISAN
6.2 Extension Example 2: Custom URL Patterns
7 References
8 Acknowledgments
9 Change Log

1 Introduction (Informative)

The Protocol for Web Description Resources (POWDER) facilitates the publication of descriptions of multiple resources such as all those available from a Web site. These descriptions are attributable to a named individual, organization or entity that may or may not be the creator of the described resources. This contrasts with more usual metadata that typically applies to a single resource, such as a specific document's title, which is usually provided by its author.

Description Resources (DRs) are described separately [DR]. This document sets out how groups (i.e. sets) of resources may be defined, either for use in DRs or in other contexts. Set theory has been used throughout as it provides a well-defined framework that leads to unambiguous definitions. However, it is used solely to provide a formal version of what is written in the natural language text. Companion documents describe the RDF/OWL vocabulary [VOC] and XML data types [WDRD] that are derived from this and the Description Resources document, setting out each term's domain, range and constraints. As each term is introduced in this document, it is linked to its description in the vocabulary document. The POWDER vocabulary namespace is http://www.w3.org/2007/05/powder# for which we use the prefix wdr.

1.1 Design Goals and Constraints

In designing a system to define sets of resources we have drawn on earlier work [Rabin] carried out in the Web Content Label Incubator Activity [WCL-XG], and taken into account the following considerations.

  1. It must be possible to define a set of resources, either by describing the characteristics of the resources in the set, or by simply listing its elements.
  2. It must be possible to determine with certainty whether a given resource is or is not an element of the Resource Set.
  3. The ease of creation of accurate and useful Resource Sets is important.
  4. It should be possible to write concise Resource Set definitions.
  5. Resource Set definitions must be easy to write, be comprehensible by humans and, as far as is possible, should avoid including or excluding resources unintentionally.
  6. It must be possible to create software that implements Resource Set definitions primarily using standard and commonly available components and specifically must not require the creation of custom parsing components.
  7. So far as is possible, use of processing resources should be minimized, especially by early detection of a match or failure to match.

1.2 Outline Methodology

Defining a Resource Set by specifying the characteristics that the resources in the set share is clearly an indirect approach, albeit a very useful one in the real world. In a logical sense, the definition must be interpreted to arrive at the full set. The implicit constraint on the resources in the set is that they exist. Newly created resources that match the set definition will become members of the Resource Set, even though at the time the definition was created, they didn't exist. Despite this, as stated above, Resource Set definitions must be unambiguous so that an application can always determine with certainty whether a specific resource is or is not within the defined set of resources.

More formally, a Resource Set definition D denotes a set of resources RS = D^I, where D^I is the interpretation of D, i.e., the set of resources sharing the characteristics denoted by D.

We take this further and allow a set definition to be built up in stages.

A Resource Set RS is denoted by a set definition DRS in terms of one or more characteristics that the elements of the set have in common. Each characteristic itself gives rise to a set definition D1, D2, …, Dn, so that the complete set definition DRS comprises D1, D2, …, Dn.

The Resource Set RS is the intersection of the sets denoted by the definitions in DRS.

Formally, RS = DRS^I = D1^I ∩ D2^I ∩ … ∩ Dn^I = (D1 ∧ D2 ∧ … ∧ Dn)^I, where the superscript I denotes interpretation.

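For example, suppose that a resource set RS is denoted by the following definitions:

  D1: the resources are available from example.org
  D2: the resources have a URI with a path component beginning with foo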

As already noted, there is a further definition here that is implicit, namely that the resources exist. Therefore, the complete set definition, DRS, denotes those resources that exist AND that have the characteristics of being available from example.org AND that have a URI with a path component beginning with foo.

We define an instance of an OWL class to take the place of the Resource Set and the properties of that Class are the set definitions D1, D2, …, Dn. The example can therefore be written using the following pseudo triples:

Subject   Predicate                                         Object
RS        rdf:type                                          Resource Set
          is_available_from                                 example.org
          has_a_URI_with_a_path_component_beginning_with    foo

Whether a specific resource R, known as the candidate resource, is a member of Resource Set RS or not, is determined by comparing its characteristics with those denoted by the set definitions used in DRS. It must be an element of the intersection of the sets defined by the interpretation of D1, D2, …, Dn to be an element of RS.

If a set definition is empty, that is, if the Resource Set Class has no properties, then the set is undefined and RS MUST be considered as the Empty Set. Formally:

Let RS be a resource set, and let DRS be the set of resource set definitions denoting the resources in RS: if DRS = ∅, then RS = ∅.
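The membership rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Definition`, `is_member` and the two sample predicates are hypothetical and not part of the POWDER vocabulary, and each set definition is modelled as a function from a candidate URI to a boolean.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit

# Hypothetical model: one set definition Di is a predicate over a candidate URI.
Definition = Callable[[str], bool]


def is_member(candidate: str, definitions: Iterable[Definition]) -> bool:
    """Return True if the candidate satisfies every definition (intersection)."""
    definitions = list(definitions)
    if not definitions:
        # An empty set of definitions denotes the empty set: nothing is a member.
        return False
    return all(d(candidate) for d in definitions)


def available_from_example_org(uri: str) -> bool:
    # Assumption: "available from example.org" includes its sub-domains.
    host = urlsplit(uri).hostname or ""
    return host == "example.org" or host.endswith(".example.org")


def path_begins_with_foo(uri: str) -> bool:
    return urlsplit(uri).path.startswith("/foo")


d_rs = [available_from_example_org, path_begins_with_foo]
print(is_member("http://example.org/foo/page.html", d_rs))
```

A candidate that fails any single definition is outside RS, and an empty list of definitions admits no candidate at all.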

There are two ways in which a Resource Set may be defined.

A Resource Set may be defined using any combination of these methods. Furthermore, each may be negated so that, for example, it is possible to define a set as "all resources on example.com except those on video.example.com shot in widescreen format." This is shown in Example 4-6.

2 Grouping by Address

A Resource Set may be defined in terms of the IRIs, URIs or IP addresses of resources that are its members. Determining whether a candidate resource is or is not a member of the set can therefore be done by comparing its address with the data in the set definition. Importantly, if the set is defined solely in terms of IRIs or URIs, this can be done before deciding whether to fetch the candidate resource or perform a DNS lookup, thus maximizing processing efficiency in many environments.

We define a range of methods to support set definition by address, and provide support for methods defined in other Recommendations.

2.1 Grouping by IRI/URI component

The syntax of a URI, as defined in RFC3986 [URIS], provides a generic framework for identification schemes that goes beyond what is demanded by the POWDER use cases [PUC]. We therefore limit our work to IRIs and URIs with the syntax: scheme://host:port/path?query (as shown below). The user info and fragment components are not supported as it is felt that these are not useful in defining Resource Sets and may add a layer of unnecessary complexity. That said it is noteworthy that Resource Sets may be defined using additional vocabularies as set out in Section 6. That extension method, or the use of the includeRegEx and excludeRegEx properties, means that user info and fragments can be used in Resource Set definitions if required.

http://www.example.com:1234/example1/example2?query=help
\   /  \             / \  /\                / \        /
 ---    -------------   --  ----------------   --------  
  |           |          |          |              |  
scheme       host      port        path          query

The following Regular Expression, elaborated from that offered in RFC 3986 [Rabin], provides a means of splitting URIs of this type into their component parts.

(([^:/?#]+):)?(//([^:/?#@]*)(:([0-9]+))?)?([^?#]*)(\?([^#]*))?

This yields the components as shown in Table 1.

Table 1: Mapping between regular expression variables and URI components
Component   RE variable
scheme      $2
host        $4
port        $6
path        $7
query       $9
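As a quick check of the mapping in Table 1, the expression can be applied with Python's `re` module. The function name `split_uri` is hypothetical; the pattern is copied verbatim from the text and the group numbers are those listed in the table.

```python
import re

# The regular expression from the text, copied verbatim.
URI_RE = re.compile(
    r"(([^:/?#]+):)?(//([^:/?#@]*)(:([0-9]+))?)?([^?#]*)(\?([^#]*))?"
)


def split_uri(uri: str) -> dict:
    """Split a URI into the components of Table 1."""
    # Every part of the pattern is optional, so match() always succeeds.
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "host": m.group(4),
        "port": m.group(6),
        "path": m.group(7),
        "query": m.group(9),
    }


parts = split_uri("http://www.example.com:1234/example1/example2?query=help")
print(parts)
```

Components that are absent from the input come back as `None`, except the path, which is an empty string.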

For each URI component we define a corresponding RDF property, the value of which is a white space-separated list of strings, any one of which must match the relevant portion of the URI of the candidate resource.

Formally, we have a set definition D = URI component matches(?x, {string1 | string2 | … | stringn}), where ?x is a variable denoting the URI component under consideration, and {string1 | string2 | … | stringn} denotes a set consisting either of string string1, or string2, or … stringn.

Any number of set definitions D1, D2, …, Dn can be declared and, as stated in Section 1.2, the overall Resource Set is the intersection of the sets that can be interpreted from those definitions. However, with some exceptions, each particular RDF property can only appear 0 or 1 times, and some are mutually exclusive. Greater detail on this is provided as terms are introduced and in Section 4.

Strings are matched according to one of four rules:

Recognizing the great diversity of potential uses and set definition requirements, multiple properties are defined relating to the path and query components. Furthermore, for each property there is a 'negative' property, that is, a property whose value is a list of strings that must not be present in the relevant URI component.

Table 2: The RDF Properties used to define resource sets by URI components. The annotations refer to notes in the following text.
RDF Property            URI component   Matching Rule   Negative RDF property
includeSchemes          scheme          exact           excludeSchemes
includeHosts            host            endsWith        excludeHosts
includePorts            port            exact           excludePorts
includePortRanges       port            exact           excludePortRanges
includeExactPaths       path            exact           excludeExactPaths
includePathContains     path            contains        excludePathContains
includePathStartsWith   path            startsWith      excludePathStartsWith
includePathEndsWith     path            endsWith        excludePathEndsWith
includeQueryContains    query           contains        excludeQueryContains
includeExactQueries     query           exact           excludeExactQueries

As a quick example, the set of all resources on example.org, whether fetched using http or https, where the path component of their URIs starts with foo, and where the path does not end with .png is defined thus:

Example 2-1: A Resource Set definition using three RDF properties

<wdr:ResourceSet>
  <wdr:includeSchemes>http https</wdr:includeSchemes>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includePathStartsWith>/foo</wdr:includePathStartsWith>
  <wdr:excludePathEndsWith>.png</wdr:excludePathEndsWith>
</wdr:ResourceSet> 
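To make the matching rules of Example 2-1 concrete, the following Python sketch evaluates a candidate URI against the four properties. It is illustrative only: the dictionary layout and the function name are hypothetical, and it assumes that endsWith matching on hosts respects domain-label boundaries (so that notexample.org does not match example.org).

```python
from urllib.parse import urlsplit

# Hypothetical in-memory form of Example 2-1: property name -> list of strings.
EXAMPLE_2_1 = {
    "includeSchemes": ["http", "https"],
    "includeHosts": ["example.org"],
    "includePathStartsWith": ["/foo"],
    "excludePathEndsWith": [".png"],
}


def matches_example_2_1(uri: str, rs: dict = EXAMPLE_2_1) -> bool:
    parts = urlsplit(uri)
    host = parts.hostname or ""
    checks = [
        # exact match on the scheme
        parts.scheme in rs["includeSchemes"],
        # endsWith match on the host (assumed to respect label boundaries)
        any(host == h or host.endswith("." + h) for h in rs["includeHosts"]),
        # startsWith match on the path
        any(parts.path.startswith(p) for p in rs["includePathStartsWith"]),
        # negative property: the path must not end with any listed string
        not any(parts.path.endswith(s) for s in rs["excludePathEndsWith"]),
    ]
    # The Resource Set is the intersection of the individual definitions.
    return all(checks)


print(matches_example_2_1("https://www.example.org/foo/index.html"))
```

Within one property any listed string may match, while across properties every condition has to hold.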

The semantics and constraints of each of the terms in Table 2 are further defined in the POWDER Vocabulary document [VOC]. Precise details of how values for each term are combined are discussed in Section 4 below. However, it is worth noting the points made in the following sub-sections.

2.1.1 Resource Set Definitions Referring to Ports

Ranges of Ports are defined as x-y, where x < y, that is, the lower and upper values in the range are separated by a hyphen. Multiple ranges can, of course, be listed using white space as the separator. Specific ports can be included or excluded using the includePorts and excludePorts properties so that the set of all resources on example.org via ports 3125 to 5236 excluding ports 4345 and 5000 can be expressed as in Example 2-2.

Example 2-2: A Resource Set definition using port lists and ranges

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includePortRanges>3125-5236</wdr:includePortRanges>
  <wdr:excludePorts>4345 5000</wdr:excludePorts>
</wdr:ResourceSet>

includePorts and includePortRanges are mutually exclusive, that is, a Resource Set definition may include 0 or 1 of these RDF properties but not both. This is because, as has been noted, a candidate resource must share all of the characteristics defined in the Resource Set to be an element of it. Multiple definitions of port numbers would therefore require the URI of a candidate resource to have multiple ports (which is impossible).
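The port test of Example 2-2 might be sketched as follows. The function names are hypothetical, ranges are assumed to be inclusive at both ends, and a URI without an explicit port is simply treated as non-matching here, which is a simplification of this sketch rather than a rule taken from the text.

```python
from urllib.parse import urlsplit


def parse_port_ranges(value: str) -> list:
    """Parse a white space-separated list of x-y ranges, requiring x < y."""
    ranges = []
    for item in value.split():
        low, high = (int(n) for n in item.split("-"))
        if not low < high:
            raise ValueError(f"range {item!r} must satisfy x < y")
        ranges.append((low, high))
    return ranges


def port_in_set(uri: str, include_ranges: str, exclude_ports: str) -> bool:
    port = urlsplit(uri).port
    if port is None:
        # Simplification: no explicit port means no match in this sketch.
        return False
    included = any(
        low <= port <= high for low, high in parse_port_ranges(include_ranges)
    )
    excluded = port in {int(p) for p in exclude_ports.split()}
    return included and not excluded


print(port_in_set("http://example.org:4000/", "3125-5236", "4345 5000"))
```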

2.1.2 Resource Set Definitions Referring to Paths and/or Query Strings

includePathContains and includeQueryContains may appear any number of times within a Resource Set definition so that it is easy to create one in which multiple strings must be present in paths and/or queries. This is in contrast to all other terms in Table 2 which can only occur 0 or 1 times since the URI of a candidate resource can only have one scheme, one host etc.

Query strings typically contain a series of name-value pairs separated by ampersands thus:

?name1=value1&name2=value2

These are usually acted on by the server to generate content in real time and the order of the name-value pairs is unimportant. For practical purposes ?name1=value1&name2=value2 is equivalent to ?name2=value2&name1=value1. Therefore, if the candidate resource's URI includes a query string, and if the Resource Set definition refers to the query string then:

* If a server is known to use a different delimiter then a different RDF property must be defined, see Section 6.

N.B. If using the RDF properties relating to the query string of a URI then the real-time generation of content should be taken into account. It may be difficult, if not impossible, to predict with certainty what the content of the resource will be and therefore the Resource Set may not be fully defined. It follows that query string-based RDF properties should be used with caution.
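The order-independence of name-value pairs can be illustrated with Python's standard library. This is a sketch under the assumption that pairs are delimited by ampersands; the helper names `query_pairs` and `same_query` are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def query_pairs(uri: str) -> Counter:
    """Return the name-value pairs of the query string as a multiset."""
    query = urlsplit(uri).query
    return Counter(parse_qsl(query, keep_blank_values=True))


def same_query(uri_a: str, uri_b: str) -> bool:
    """Compare two query strings while ignoring the order of the pairs."""
    return query_pairs(uri_a) == query_pairs(uri_b)


print(same_query(
    "http://example.org/?name1=value1&name2=value2",
    "http://example.org/?name2=value2&name1=value1",
))
```

A multiset is used rather than a plain set so that a repeated pair is not silently collapsed.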

2.1.3 IRI/URI Canonicalization

Before any IRI or URI matching can take place the following canonicalization steps should be applied to the candidate resource's IRI or URI. These steps are consistent with RFC3986 [URIS], RFC3987 [IRIS] and URISpace [URISpace].

The following table gives some examples.

Table 3: Examples of canonicalized IRIs and URIs
Input IRI/URI                             Canonical form
www.example.com                           http://www.example.com/
http%3A%2F%2Fwww.example.com%2Ffoo        http://www.example.com/foo
HTTp%3a%2f%2fwww.Example.Com:80%2Ffoo     http://www.example.com/foo
http://www.example.com./foo               http://www.example.com/foo
HTTPS://WWW.EXAMPLE.COM/FOO               https://www.example.com/FOO
http://example.com/staff/Fran%c3%a7ois    http://www.example.com/staff/François
http://example.com/my%20doc.doc           http://www.example.com/my doc.doc
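A rough Python sketch of a canonicalizer follows. It is inferred from the rows of Table 3 only and is not a normative algorithm: it assumes percent-decoding, a default http scheme, lower-casing of scheme and host, removal of a trailing dot from the host, and removal of the default port. The function name and the `DEFAULT_PORTS` table are hypothetical. It reproduces the first five rows of the table.

```python
from urllib.parse import unquote, urlsplit

# Assumption: only these default ports are dropped during canonicalization.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(uri: str) -> str:
    uri = unquote(uri)              # decode percent-encoded octets (UTF-8)
    if "://" not in uri:
        uri = "http://" + uri       # assume http when no scheme is given
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    # hostname is already lower-cased by urlsplit; drop a trailing dot
    host = (parts.hostname or "").rstrip(".")
    port = parts.port
    if port in (None, DEFAULT_PORTS.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"        # an empty path becomes "/"
    query = "?" + parts.query if parts.query else ""
    return f"{scheme}://{netloc}{path}{query}"


print(canonicalize("HTTp%3a%2f%2fwww.Example.Com:80%2Ffoo"))
```

Decoding the whole string before splitting mirrors the table rows in which the scheme separator itself is percent-encoded; a production implementation would need to treat encoded delimiters inside paths and queries more carefully.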

2.1.4 Data encoding

To complement the URI/IRI canonicalization steps described in the previous section, related processing steps must also be carried out on the strings supplied as set defining data, that is, the values for the RDF properties listed in Table 2.

Bear in mind that if the data is serialized in XML, URI strings specified in the resource set definition will be escaped according to the XML syntax using entity references for specific characters (escaping < with &lt; and & with &amp; is mandatory, others may also be used). Moreover, since Resource Set definition properties take a white space-separated list of URI strings as their value, whenever a URI string contains an unescaped white space (i.e., a white space not encoded as %20), it will be substituted by %20.

The following steps should therefore be applied to each item in the list separately.

If the set definition includes values related to the port then matching of the data against the candidate resource's URI/IRI must be carried out as follows:

2.2 Grouping using Wildcards: The includeUriPattern and excludeUriPattern Properties

Enabling Read Access for Web Resources [WAF] defines a method for encoding the domains and sub-domains from which access to resources on a given Web site should be granted or denied. The includeUriPattern and excludeUriPattern properties support this syntax directly. Domains and sub-domains may be substituted by a wildcard character (*) according to the following EBNF:

access-item    ::= (scheme "://")? domain-pattern (":" port)? | "*" 
domain-pattern ::= domain | "*." domain

scheme and port are used as defined in RFC 3986. domain is an internationalized domain name as defined in RFC 3490.

It follows that:

<wdr:includeHosts>example.com</wdr:includeHosts>

and

<wdr:includeUriPattern>example.com</wdr:includeUriPattern>

are equivalent. However, *.example.com, meaning resources on sub-domains of example.com but not on example.com itself, is not a valid value for includeHosts.

Note that paths and query strings MUST NOT be included in the pattern. If these are required in a Resource Set definition, the relevant properties from Table 2 can be used.
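The shape of a valid pattern can be checked with a regular expression that follows the EBNF above. This sketch validates syntax only and does not perform any matching against candidate resources; the domain rule is deliberately simplified (any run of characters other than white space, ":", "/", "?", "#" and "*"), and the names `ACCESS_ITEM` and `is_valid_pattern` are hypothetical.

```python
import re

ACCESS_ITEM = re.compile(
    r"""^(?:
        \*                                   # the bare wildcard
      | (?:[A-Za-z][A-Za-z0-9+.\-]*://)?     # optional scheme
        (?:\*\.)?                            # optional sub-domain wildcard
        [^\s:/?\#*]+                         # domain (simplified assumption)
        (?::[0-9]+)?                         # optional port
    )$""",
    re.VERBOSE,
)


def is_valid_pattern(value: str) -> bool:
    """Return True if value has the shape of an access-item."""
    return ACCESS_ITEM.match(value) is not None


for candidate in ("example.com", "*.example.com", "example.com/foo"):
    print(candidate, is_valid_pattern(candidate))
```

Patterns that carry a path or a query string are rejected, in line with the note above.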

2.3 Grouping by Regular Expression: The includeRegEx and excludeRegEx Properties

The RDF properties discussed above all take white space-separated lists of strings as their values. It is believed that these properties will be easy to use and cover the overwhelming majority of cases. However, the use of strings with fixed matching rules clearly presents a restriction on flexibility. To support fully flexible set definition by URI, the includeRegEx and excludeRegEx properties take a Regular Expression (RE) and should be applied to the candidate resource's complete URI (after following the canonicalization steps above).

The RE syntax used is that defined by XML Schema, as modified by XQuery 1.0 and XPath 2.0 Functions and Operators [XQXP].

N.B. The value of the includeRegEx and excludeRegEx properties MUST be a single Regular Expression, not a white space-separated list.

As an example, the set of all the resources hosted either by example.org or example.net, where the path component of their URIs starts either with foo or bar, can be defined thus:

Example 2-3: Set definition by regular expression (not including character escaping)

<wdr:ResourceSet>
  <wdr:includeRegEx>^(([^:/?#]+):)?(//([^:/?#]+\.)*)?example.(org|net)/(foo|bar)</wdr:includeRegEx>
</wdr:ResourceSet> 

It is important to note that Example 2-3 does not take account of the need to escape certain characters.

The following characters are used as meta characters in Regular Expressions and MUST therefore be escaped if used in an RE pattern given as the value of the includeRegEx property:

. \ ? * + { } ( ) [ ]

In addition, the < (less than) character MUST always be escaped since, if the set definition is given in RDF/XML, it could be mistaken for the beginning of the closing </wdr:includeRegEx> tag.

As a safeguard against unintended consequences, other characters that always or typically have special meaning within URI strings and/or XML SHOULD also be escaped, namely:

! " # % & ' , - / : ; = > @ [ ] _ ` ~

As a result, Example 2-3 should properly be written as shown in Example 2-4 below.

Example 2-4: Set definition by regular expression, including character escaping

<wdr:ResourceSet>
  <wdr:includeRegEx>^(([^\:\/\?\#]+)\:)?(\/\/([^\:\/\?\#]+\.)*)?example\.(org|net)\/(foo|bar)</wdr:includeRegEx>
</wdr:ResourceSet> 
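The practical effect of escaping can be seen on the host fragment alone. The sketch below uses Python's `re` module as a stand-in, which is an assumption: it is not the XML Schema/XQuery dialect the properties are defined against, although the behaviour of the unescaped dot is the same in both. The two fragments are shortened, hypothetical excerpts rather than complete set definitions.

```python
import re

# Unescaped: "." matches any single character.
UNESCAPED = re.compile(r"example.(org|net)/(foo|bar)")
# Escaped: "\." matches a literal dot only.
ESCAPED = re.compile(r"example\.(org|net)\/(foo|bar)")

for uri in ("http://www.example.org/foo", "http://www.exampleXorg/foo"):
    print(uri, bool(UNESCAPED.search(uri)), bool(ESCAPED.search(uri)))
```

The unescaped fragment accepts both URIs, whereas the escaped one accepts only the first, which is why the dot in a host name needs escaping.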

2.3.1 Safe Use of includeRegEx

Example 2-4 uses a modified version of the RE given in Section 2.1, substituting individual portions with specific strings. This is the safest method but is not, perhaps, the most natural way to proceed. If a less rigorous approach is taken it is easy to make mistakes when specifying REs, and incorrect REs in set definitions will have one of two possible (and obvious) consequences:

  1. the corresponding set does not include the intended resources;
  2. the corresponding set includes resources not intended to be included.

Example 2-5 shows how this can happen.

Example 2-5: An example of a bad set definition by regular expression

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includeRegEx>https</wdr:includeRegEx>
</wdr:ResourceSet> 

The intention in the RE given in Example 2-5 is probably to say "all resources on example.org with a URI beginning with https." However, as the RE is not anchored at either end, what this actually means is "all resources on example.org where the URI includes https". Thus this Resource Set includes both:

Adding in anchors at the beginning and end of the RE can have equally undesirable consequences.

Example 2-6: A second example of a bad set definition by regular expression

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includeRegEx>^https$</wdr:includeRegEx>
</wdr:ResourceSet> 

In Example 2-6, the intention is, again probably, to define the set of "all resources on example.org fetched using https only". However, adding both the ^ and $ anchors at the beginning and end of the RE means that the whole URI must be https from start to finish, which can never be true, so this Resource Set is equivalent to the empty set.

Example 2-7 shows one possible way to encode the intended set definition.

Example 2-7: An example of a correct set definition by regular expression

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includeRegEx>^https</wdr:includeRegEx>
</wdr:ResourceSet> 
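The effect of anchoring in Examples 2-5 to 2-7 can be reproduced with a short Python sketch. Python's `re.search` is used as an approximation of the XQuery/XPath matching semantics, and the sample URIs and the helper name `matching` are hypothetical.

```python
import re

URIS = [
    "https://example.org/",
    "http://example.org/https-faq",
    "http://example.org/index.html",
]


def matching(pattern: str, uris: list) -> list:
    """Return the URIs in which the pattern is found."""
    return [u for u in uris if re.search(pattern, u)]


print(matching("https", URIS))     # unanchored, as in Example 2-5
print(matching("^https$", URIS))   # anchored at both ends, as in Example 2-6
print(matching("^https", URIS))    # anchored at the start, as in Example 2-7
```

The unanchored pattern also picks up the http URI whose path happens to contain https, the doubly anchored pattern selects nothing, and only the start-anchored pattern selects just the https URI.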

Whilst Example 2-7 'works', the potential dangers of using REs mean that it is generally better to use component strings where possible. Example 2-7 is therefore better written as shown in Example 2-8 below.

Example 2-8: A re-write of Example 2-7 without using a regular expression

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includeSchemes>https</wdr:includeSchemes>
</wdr:ResourceSet> 

2.4 Grouping by IP Address

A set of resources can be defined in terms of the IP address(es) from which the resources are served. To support this we define two RDF properties: includeIPs, which takes a white space-separated list of single IP addresses, and includeIpRanges, which takes a white space-separated list of CIDR blocks [CIDR]. Negative versions of these RDF properties are also defined: excludeIPs and excludeIpRanges respectively.

As with includePorts, and for similar reasons, includeIPs and includeIpRanges are mutually exclusive, that is, a Resource Set may include one or other, but not both of these RDF properties.

The includeIPs RDF property is simple enough: Example 2-9 defines the Resource Set as all resources available from IP address 123.123.123.123.

Example 2-9: A Resource Set definition using the includeIPs RDF property

<wdr:ResourceSet>
  <wdr:includeIPs>123.123.123.123</wdr:includeIPs>
</wdr:ResourceSet>

The includeIpRanges RDF property allows the definition of a resource set based on a range of IP addresses, specified in a CIDR block. A CIDR block has the form <IP address>/x, where the CIDR prefix x is a number ranging from 1 to 32, denoting the leftmost x bits which a set of IP addresses shares. For instance, the CIDR block 123.234.245.254/8 denotes the range of IP addresses sharing the leftmost 8 bits, i.e., starting with 123.

As an example, suppose that a Resource Set definition should denote all the resources hosted by the machines with IP addresses 123.234.245.254 and 123.234.245.255. This can be expressed by the following Resource Set definition:

Example 2-10: A Resource Set definition using the includeIpRanges RDF property

<wdr:ResourceSet>
  <wdr:includeIpRanges>123.234.245.254/31</wdr:includeIpRanges>
</wdr:ResourceSet> 

2.4.1 Safe Usage of the includeIpRanges Property

In order to use CIDR blocks correctly, it must be taken into account that a CIDR prefix refers to the binary representation of an IP address. For instance, the binary representation of IP address 123.234.245.254 corresponds to

01111011 11101010 11110101 11111110

A CIDR block 123.234.245.254/31 denotes a range of IP addresses

01111011 11101010 11110101 1111111b

i.e., the range of IP addresses sharing the leftmost 31 bits with b either 1 or 0 (formally b ∈ {0,1}). Consequently, the CIDR block 123.234.245.254/31 denotes the following IP addresses:

01111011 11101010 11110101 11111110 = 123.234.245.254

01111011 11101010 11110101 11111111 = 123.234.245.255

This also means that the CIDR block 123.234.245.255/31 is equivalent to 123.234.245.254/31.
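The same reasoning can be checked with Python's standard `ipaddress` module. The helper names `to_bits` and `in_block` are hypothetical; `strict=False` is needed because the addresses used in the text have host bits set relative to the prefix.

```python
import ipaddress


def to_bits(address: str) -> str:
    """Render an IPv4 address as four space-separated octets in binary."""
    bits = format(int(ipaddress.ip_address(address)), "032b")
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))


def in_block(address: str, cidr: str) -> bool:
    """Return True if the address falls inside the CIDR block."""
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(address) in network


block_a = ipaddress.ip_network("123.234.245.254/31", strict=False)
block_b = ipaddress.ip_network("123.234.245.255/31", strict=False)

print(to_bits("123.234.245.254"))
print(block_a == block_b, block_a.num_addresses)
```

Both spellings of the block normalize to the same network of two addresses, which is the equivalence stated above.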

It is important to note that the number N of IP addresses denoted by a CIDR block corresponds to 2^(32−x). Therefore, if x = 32, N = 2^0 = 1, if x = 31, N = 2^1 = 2, etc. Therefore, it is possible to denote a range of IP addresses using wdr:includeIpRanges only when the number N of IP addresses is a power of 2. Otherwise, it is necessary to provide a white space separated list of CIDR blocks or, alternatively, individual IP addresses. For instance, the resources hosted by the machines with IP addresses 123.234.245.253, 123.234.245.254, and 123.234.245.255 can be expressed as shown in Example 2-11.

Example 2-11: Resource Set definition across several IP addresses

<wdr:ResourceSet>
  <wdr:includeIpRanges>123.234.245.253/32 123.234.245.254/31</wdr:includeIpRanges>
</wdr:ResourceSet> 

OR

<wdr:ResourceSet>
  <wdr:includeIPs>123.234.245.253 123.234.245.254 123.234.245.255</wdr:includeIPs>
</wdr:ResourceSet> 
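The list of CIDR blocks needed for an arbitrary contiguous range can be derived mechanically. The sketch below uses `ipaddress.summarize_address_range` from the Python standard library; the wrapper name `cover` is hypothetical.

```python
import ipaddress


def cover(first: str, last: str) -> list:
    """Return the CIDR blocks that exactly cover the range first..last."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(network) for network in networks]


blocks = cover("123.234.245.253", "123.234.245.255")
print(" ".join(blocks))

# Each block of prefix length x holds 2 ** (32 - x) addresses.
sizes = [ipaddress.ip_network(b).num_addresses for b in blocks]
print(sizes)
```

For the three addresses in the text this yields one /32 block and one /31 block, matching the first form of Example 2-11.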

Incidentally, as already noted, includeIPs and includeIpRanges are mutually exclusive. It is perhaps tempting to create a Resource Set definition like that shown in Example 2-12. However, this would require a candidate resource to be available from both 123.234.245.253 AND either 123.234.245.254 OR 123.234.245.255, which is impossible, so Example 2-12 is tantamount to the empty set.

Example 2-12: Erroneous Resource Set definition across several IP addresses

<wdr:ResourceSet>
  <wdr:includeIpRanges>123.234.245.254/31</wdr:includeIpRanges>
  <wdr:includeIPs>123.234.245.253</wdr:includeIPs>
</wdr:ResourceSet> 

Defining Resource Sets by IP address puts a burden on the processor since it will often have to perform a DNS lookup to determine whether a candidate resource is, or is not, a member of the Resource Set. Furthermore, it is particularly easy to include resources in the set by accident using such a broad-sweep approach. If a Web site is hosted on a shared server, for example, it is very likely that the set will include resources by mistake.

Defining a Resource Set by IP address would, however, be appropriate where a content provider operates a large network of servers, or where particular types of content to be described are hosted on servers that can easily be identified by their IP address.

2.5 Enumerating Elements of a Resource Set: the includeResources and excludeResources properties

It is useful to be able to include or exclude resources from sets by simple listing. The includeResources and excludeResources RDF properties support this; both take white space-separated lists of IRIs and/or URIs. To give a simple example, the set of all resources on example.org except its stylesheet and JavaScript library can be encoded as shown in Example 2-13.

Example 2-13: Resource Set definition using excludeResources property

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:excludeResources>http://www.example.org/stylesheet.css http://www.example.org/jslib.js</wdr:excludeResources>
</wdr:ResourceSet>

As emphasized throughout this document, each RDF property and its value creates a set definition of its own and the full Resource Set is the intersection of those sets. Thus an alternative way of looking at Example 2-13 is to say that a candidate resource is a member of the Resource Set if it is on example.org AND does not have the URI http://www.example.org/stylesheet.css AND does not have the URI http://www.example.org/jslib.js.

It is tempting to use includeResources in a similar fashion as shown in Example 2-14.

Example 2-14: Erroneous Resource Set definition using includeResources property

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includeResources>http://www.w3.org/Icons/valid-xhtml10</wdr:includeResources>
</wdr:ResourceSet>

The intention in this example is to include the W3C's valid XHTML 1.0 icon in the set of resources on example.org. However, a resource would have to be both on the example.org host AND have a URI that matched http://www.w3.org/Icons/valid-xhtml10 to be an element of the set. Since this is impossible, such a definition is, again, tantamount to the empty set.

The solution is to use the OWL set operator owl:unionOf as shown in Example 2-15.

Example 2-15: Correct Resource Set definition using includeResources property

<wdr:ResourceSet>
  <owl:unionOf rdf:parseType="Collection">

    <wdr:ResourceSet>
      <wdr:includeHosts>example.org</wdr:includeHosts>
    </wdr:ResourceSet>

    <wdr:ResourceSet>
      <wdr:includeResources>http://www.w3.org/Icons/valid-xhtml10</wdr:includeResources>
    </wdr:ResourceSet>

  </owl:unionOf>
</wdr:ResourceSet>

Here we have two discrete Resource Sets, each of which is made up of, in this case, a single RDF property and its value; and the overall Resource Set comprises the union of those two sets. The use of the OWL set operators is discussed in detail in Section 4.
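The difference between Examples 2-14 and 2-15 can be sketched with predicates in Python. All names are hypothetical, and membership of example.org is assumed to include its sub-domains.

```python
from urllib.parse import urlsplit

LISTED = {"http://www.w3.org/Icons/valid-xhtml10"}


def on_example_org(uri: str) -> bool:
    host = urlsplit(uri).hostname or ""
    return host == "example.org" or host.endswith(".example.org")


def is_listed(uri: str) -> bool:
    return uri in LISTED


def intersection(*definitions):
    """Properties inside one Resource Set: all of them must hold."""
    return lambda uri: all(d(uri) for d in definitions)


def union(*resource_sets):
    """owl:unionOf over Resource Sets: any one of them may hold."""
    return lambda uri: any(rs(uri) for rs in resource_sets)


erroneous = intersection(on_example_org, is_listed)                   # Example 2-14
correct = union(intersection(on_example_org), intersection(is_listed))  # Example 2-15

icon = "http://www.w3.org/Icons/valid-xhtml10"
print(erroneous(icon), correct(icon))
```

The intersection can never be satisfied because no URI is both on example.org and equal to the W3C icon URI, whereas the union admits either kind of resource.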

2.6 Redirection: the includeRedirection property

If a Resource Set is defined in terms of the URIs of the resources that are elements of the set then resolving the URIs may lead to redirection through 3xx HTTP status codes [HTTPCODE]. By default, such redirection MUST lead to the 'new' resource itself being compared with the Resource Set definition. That is, if the resource identified by URI1 is an element of the Resource Set but, when resolving it, the user agent is redirected via a 3xx HTTP response code to URI2, then the resource identified by URI2 MUST itself be compared with the Resource Set definition to determine whether or not it is an element of the set.

Recognizing that there may be circumstances where this default behavior may cause unnecessary latency, redirected resources MAY be included by use of the includeRedirection property. The range of this RDF property allows for any of HttpAnyRedirect, HttpPermRedirect or HttpTempRedirect to be given as its value. These classes are all based on those defined in the HTTP in RDF vocabulary [HTTPRDF]. See the POWDER Vocabulary [VOC] for details. As their names suggest, the HTTP redirection classes let a Resource Set definition admit any redirection (HttpAnyRedirect), permanent redirection only (HttpPermRedirect, i.e. HTTP response code 301) or temporary redirection only (HttpTempRedirect, i.e. HTTP response codes 302, 303 and 307).

Example 2-16: Resource Set definition using includeRedirection property

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includeRedirection rdf:resource="http://www.w3.org/2007/05/powder#HttpPermRedirect" />
</wdr:ResourceSet>

Example 2-16 encodes that if, when resolving any URI on the example.org domain (or its sub-domains), the user agent is redirected through a 301 (permanent) HTTP response code then the target resources are elements of the Resource Set, even if those resources are on a different domain. Resources resolved following other redirects would not be included unless they were also on the example.org domain.
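The decision described for Example 2-16 might be sketched as follows. This is a minimal illustration under stated assumptions: the function and table names are hypothetical, HttpAnyRedirect is assumed to cover exactly the status codes listed above, and the source URI is assumed to be an element of the set already.

```python
from urllib.parse import urlsplit

# Status codes per redirection class, as listed in the text.
REDIRECT_CODES = {
    "HttpPermRedirect": {301},
    "HttpTempRedirect": {302, 303, 307},
}
# Assumption: "any" redirection is the union of the two classes above.
REDIRECT_CODES["HttpAnyRedirect"] = (
    REDIRECT_CODES["HttpPermRedirect"] | REDIRECT_CODES["HttpTempRedirect"]
)

def on_host(uri, host):
    name = urlsplit(uri).hostname or ""
    return name == host or name.endswith("." + host)

def redirect_target_is_member(target_uri, status, include_redirection=None):
    """Membership of a redirect target whose source URI is already in the set."""
    if include_redirection is not None and status in REDIRECT_CODES[include_redirection]:
        return True  # target admitted by the includeRedirection property
    # Default behavior: compare the target with the definition itself.
    return on_host(target_uri, "example.org")
```

With include_redirection set to "HttpPermRedirect", a 301 redirect to another domain is accepted, while a 302 redirect to that same domain falls back to the host check and is rejected.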

3 Grouping by Resource Property

The definition of a Resource Set by reference to the addresses of its elements is not always practical or relevant. For example, numeric URIs generated by a content management system may not reveal any information about a given resource and there are other situations where knowledge of one property itself allows the inference of further properties. For example, if the title of a document includes the word 'draft' it may be possible to infer that different terms of use apply than if the word is absent.

We therefore provide two RDF properties, includeConditional and excludeConditional, the object of which is the base RDFS Class 'Resource' that represents the resources that are, or are not, elements of the set respectively. Any characteristic of those resources can be defined in the usual way to confer membership of the Resource Set or exclude resources from it. For instance, Example 3-1 defines the set of resources on the example.org domain whose language is French (the prefix ex denoting any vocabulary). Although, in common with most other set definition terms, includeConditional and excludeConditional may each only occur once in a Resource Set definition, any number of predicates from any vocabularies may be defined as RDF properties of the RDFS Class to which they link.

Example 3-1: Basic Resource Set definition by resource property.

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includeConditional rdf:parseType="Resource">
    <ex:lang>fr</ex:lang>
  </wdr:includeConditional>
</wdr:ResourceSet>

Importantly, it is for the processor to define a suitable method for determining whether a candidate resource has the stated property and therefore whether or not it is an element of the Resource Set. It follows that using includeConditional and excludeConditional breaks the design goals since a generic POWDER processor will not be able to determine with certainty whether a given candidate resource is or is not an element of the Resource Set. Referring to Example 3-1, two different outputs are possible:

  1. The candidate resource is not an element of the Resource Set as it is not on the example.org domain.
  2. The candidate resource is an element of the Resource Set if it is written in French.

However, a Resource Set definition may offer a hint as to the best method to take in making such a determination.
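The two possible outputs for Example 3-1 can be pictured as the return values of a small function. This is only a sketch: the function name and the wording of the returned strings are hypothetical, and a generic processor is assumed to have no means of testing the ex:lang property.

```python
from urllib.parse import urlsplit

def classify_example_3_1(uri):
    """Return the answer a generic processor could give for Example 3-1."""
    host = urlsplit(uri).hostname or ""
    if not (host == "example.org" or host.endswith(".example.org")):
        # The address-based part of the definition already fails.
        return "not a member"
    # The conditional part cannot be decided without further knowledge.
    return "member if ex:lang is fr"
```

The second return value is deliberately not a Boolean: it records that membership depends on a property the processor has not been able to check.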

The RDF property lookUpService links to a description of a service through which resource properties MAY be discovered (if the processor has another method available, it is acceptable to use it). Such a description may be a natural language document, a WSDL file or be in any other format. Such a description would allow a POWDER processor to be extended to give a definitive answer as to whether a candidate resource was or was not an element of the Resource Set. The following example explores this further.

The trustmark.example organization wants to define a Resource Set as everything on the Web sites to which it has granted its seal of approval. It can then publish a Description Resource [DR] that provides a semantically-rich, machine-processable version of that seal, effectively automating its 'click to verify' system. Since the organization already publishes the list of approved Web sites, both in an HTML document and as an ATOM feed, determining whether a candidate resource is or is not an element of the Resource Set is straightforward, albeit outside the POWDER processing model.

In Example 3-2, the lookUpService property points to a natural language document (at http://trustmark.example/doc.htm) that gives the URI of the list of approved Web sites, an ATOM feed of those same sites, and additional details of how the data is presented. This allows a developer to extend a POWDER processor to identify with certainty whether a candidate resource is or is not in the set of resources that carry the trustmark by referring to the data in whichever format he/she finds easiest.

Example 3-2: Resource Set definition including resource property and look up service description.

<wdr:ResourceSet>
  <wdr:includeConditional>
    <rdfs:Resource>
      <ex:lang>fr</ex:lang>
    </rdfs:Resource>
  </wdr:includeConditional>
  <wdr:lookUpService rdf:resource="http://trustmark.example/doc.htm" />
</wdr:ResourceSet>

The model here is similar to that used for HTML Profile [HTMLPROF] — the look up service description will not be parsed every time the Resource Set is queried. Rather, the expectation is that the descriptive document should remain stable and its contents become well established. Citing the document should therefore be sufficient for established look up services to be identified and used.

4 Conjunction and disjunction

4.1 Combining Definitions Within a Resource Set

As set out briefly in Section 2.1 and referred to throughout this document, Resource Sets are defined using RDF properties whose values are white space separated lists of possible values. The exceptions to this are the includeRegEx and excludeRegEx properties which take a single Regular Expression. Taken from the point of view of determining whether a candidate resource is or is not an element of the Resource Set, the values of the include RDF properties are combined with logical OR. In Example 4-1, the candidate resource is an element of the Resource Set if it is on example.org OR example.com.

Example 4-1: Resource Set definition with a two element list

<wdr:ResourceSet>
  <wdr:includeHosts>example.org example.com</wdr:includeHosts>
</wdr:ResourceSet> 

This is the only way to encode the set of resources on these two hosts (excepting the possibility of doing so using a Regular Expression). A validation error SHOULD be raised if any set definition RDF property, other than includePathContains or includeQueryContains, appears more than once in a given Resource Set. Example 4-2 is therefore invalid.

Example 4-2: An invalid Resource Set definition in which a single RDF property appears more than once

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:includeHosts>example.com</wdr:includeHosts>
</wdr:ResourceSet> 

A candidate resource MUST satisfy ALL definitions in a given Resource Set. Therefore the set of all resources on example.org or example.com that have a path starting with foo or bar is defined as shown in Example 4-3.

Example 4-3: Resource Set definition with two RDF properties

<wdr:ResourceSet>
  <wdr:includeHosts>example.org example.com</wdr:includeHosts>
  <wdr:includePathStartsWith>/foo /bar</wdr:includePathStartsWith>
</wdr:ResourceSet> 

Expressed using set theory, each RDF property is a resource set definition intensionally denoting a set of resources.

Thus, given the following two resource set definitions:

D1 = includeHosts(?x, {example.com, example.org})

D2 = includePathStartsWith(?x, {foo, bar})

the Resource Set is the intersection of the extensions of these resource set definitions (the extension of a definition D being written D^I):

RS = D1^I ∩ D2^I
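As an informal illustration of Example 4-3, the two definitions can be written as predicates and combined with a logical AND. The function names are hypothetical, and host matching is assumed to include sub-domains.

```python
from urllib.parse import urlsplit

def include_hosts(uri, hosts):
    """D1: listed values are alternatives, so they are combined with OR."""
    name = urlsplit(uri).hostname or ""
    return any(name == h or name.endswith("." + h) for h in hosts)

def include_path_starts_with(uri, prefixes):
    """D2: again, any one of the listed prefixes is enough."""
    path = urlsplit(uri).path
    return any(path.startswith(p) for p in prefixes)

def in_example_4_3(uri):
    d1 = include_hosts(uri, ["example.org", "example.com"])
    d2 = include_path_starts_with(uri, ["/foo", "/bar"])
    # The Resource Set is the intersection: both definitions must hold.
    return d1 and d2
```

A URI such as http://example.com/bar/x satisfies both predicates, whereas http://example.org/baz fails the path test and http://example.net/foo fails the host test.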

In natural language, the same is true for the exclude properties. That is, Example 4-4 says that a resource is a member of the set if it is on example.org and does not have a path beginning with foo or bar.

Example 4-4: Resource Set definition with two RDF properties

<wdr:ResourceSet>
  <wdr:includeHosts>example.org</wdr:includeHosts>
  <wdr:excludePathStartsWith>/foo /bar</wdr:excludePathStartsWith>
</wdr:ResourceSet> 

However, when converting from natural language into Boolean logic, we actually need to combine the listed values for the exclude properties with AND. Example 4-4 can be written as

if (host = example.org) AND (path ≠ foo) AND (path ≠ bar)

This is an application of De Morgan's theorem, which states that if P and Q are Boolean statements then the expression NOT(P OR Q) is equivalent to NOT(P) AND NOT(Q). More formally:

¬(P ∨ Q) = ¬(P) ∧ ¬(Q)

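It is therefore consistent to state that POWDER processors MUST:

  1. combine the listed values of the include properties with logical OR;
  2. combine the listed values of the exclude properties with logical AND, so that a candidate resource must match none of them.

This is made explicit in the POWDER Vocabulary [VOC].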
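The equivalence given by De Morgan's theorem can be checked exhaustively, and the two readings of an exclude property compared side by side. The sketch below uses hypothetical function names and treats a path as a plain string.

```python
from itertools import product

# Truth-table check of NOT(P OR Q) == NOT(P) AND NOT(Q).
DE_MORGAN_HOLDS = all(
    (not (p or q)) == ((not p) and (not q))
    for p, q in product([False, True], repeat=2)
)

def passes_exclude_natural(path, prefixes):
    """Natural-language reading: NOT (starts with foo OR starts with bar)."""
    return not any(path.startswith(p) for p in prefixes)

def passes_exclude_boolean(path, prefixes):
    """Boolean reading: (does not start with foo) AND (does not start with bar)."""
    return all(not path.startswith(p) for p in prefixes)
```

Both functions give the same answer for every path, which is why the two descriptions of Example 4-4 agree.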

4.2 Combining Multiple Resource Sets

It is believed that the RDF properties described in sections 2 and 3 provide sufficient flexibility to cover the majority of uses for the grouping of resources. However, there is a clear limit on expressivity which needs to be addressed. For example, using the system described so far it is impossible to express the set of resources on example.org with a path beginning with foo together with the resources on example.com that have a path beginning with bar (that is, impossible without using the includeRegEx property and a regular expression). To define such a Resource Set requires the union of two discrete sets and this can be achieved using the OWL set operators [OWLSO], as shown in Example 4-5.

Example 4-5: A Resource Set formed from a union of two sub sets

1  <wdr:ResourceSet>
2    <owl:unionOf rdf:parseType="Collection">

3      <wdr:ResourceSet>
4        <wdr:includeHosts>example.org</wdr:includeHosts>
5        <wdr:includePathStartsWith>/foo</wdr:includePathStartsWith>
6      </wdr:ResourceSet>

7      <wdr:ResourceSet>
8        <wdr:includeHosts>example.com</wdr:includeHosts>
9         <wdr:includePathStartsWith>/bar</wdr:includePathStartsWith>
10     </wdr:ResourceSet>

11   </owl:unionOf>
12 </wdr:ResourceSet> 

Lines 3 - 6 and 7 - 10 of Example 4-5 are Resource Set definitions in their own right and the overall Resource Set is the union of these two. Formally we can write:

D1 = includeHosts(?x, {example.org})

D2 = includePathStartsWith(?x, {foo})

D3 = includeHosts(?x, {example.com})

D4 = includePathStartsWith(?x, {bar})

RS1 = D1^I ∩ D2^I

RS2 = D3^I ∩ D4^I

RS = RS1 ∪ RS2
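One way to picture the evaluation of Example 4-5 is a small recursive function over a nested dictionary. The dictionary layout, the key names and the function name are all assumptions made for this sketch; they are not a POWDER serialization, and only three terms are handled.

```python
from urllib.parse import urlsplit

def matches(uri, rs):
    """Evaluate a Resource Set given as a dict (hypothetical layout)."""
    parts = urlsplit(uri)
    host, path = parts.hostname or "", parts.path
    checks = []
    if "includeHosts" in rs:
        checks.append(any(host == h or host.endswith("." + h)
                          for h in rs["includeHosts"]))
    if "includePathStartsWith" in rs:
        checks.append(any(path.startswith(p)
                          for p in rs["includePathStartsWith"]))
    if "unionOf" in rs:
        # Union: at least one nested Resource Set must match.
        checks.append(any(matches(uri, sub) for sub in rs["unionOf"]))
    # Intersection of everything stated within this Resource Set.
    return all(checks)

EXAMPLE_4_5 = {
    "unionOf": [
        {"includeHosts": ["example.org"], "includePathStartsWith": ["/foo"]},
        {"includeHosts": ["example.com"], "includePathStartsWith": ["/bar"]},
    ]
}
```

Here http://example.org/foo/1 and http://example.com/bar are elements of the set, while http://example.org/bar and http://example.com/foo are not, which is exactly the combination that a single Resource Set without owl:unionOf cannot express.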

OWL's intersectionOf set operator can also be used although it is anticipated that this will be rare since a Resource Set is the intersection of the various sets defined within it. One scenario where it is appropriate to use owl:intersectionOf is where Resource Sets are defined by reference to multiple external data sources using the property look up method described in Section 3.

In theory, the OWL complementOf property can also be used. However, this can readily lead to significant logic problems since it is an 'open world' definition. To give an example, in order to determine the elements of the set of movies that have not received bad reviews, one would have to collect all movie reviews ever published and note the ones that were not bad. Since it is a critical design goal that a processor MUST be able to determine with certainty whether a candidate resource is or is not an element of a Resource Set, the OWL complementOf property SHOULD NOT be used.

A combination of the exclude RDF properties described in sections 2 and 3 and OWL's unionOf operator can be used to create precise, that is, closed world, Resource Set definitions that exclude particular resources. For example, at the end of Section 1.2 we claimed that it is possible to define the set of "all resources on example.com except those on video.example.com shot in widescreen format." Example 4-6 shows how this can be done in relatively few lines.

Example 4-6: A relatively complex Resource Set definition excluding certain resources (without using owl:complementOf)

<wdr:ResourceSet>
  <owl:unionOf rdf:parseType="Collection">

    <wdr:ResourceSet>
      <wdr:includeHosts>example.com</wdr:includeHosts>
      <wdr:excludeHosts>video.example.com</wdr:excludeHosts>
    </wdr:ResourceSet>

    <wdr:ResourceSet>
      <wdr:includeHosts>example.com</wdr:includeHosts>
      <wdr:excludeConditional rdf:parseType="Resource">
        <ex:format>widescreen</ex:format>
      </wdr:excludeConditional>
    </wdr:ResourceSet>

  </owl:unionOf>
</wdr:ResourceSet>

The owl:unionOf operator may be used to create highly complex nested Resource Set definitions such as that shown in Example 4-7.

Example 4-7: A complex Resource Set definition using nested sets

<wdr:ResourceSet>
  <owl:unionOf rdf:parseType="Collection">

    <wdr:ResourceSet>
      <wdr:includeHosts>example.org</wdr:includeHosts>
      <owl:unionOf rdf:parseType="Collection">

        <wdr:ResourceSet>
          <wdr:includePathStartsWith>/foo</wdr:includePathStartsWith>
        </wdr:ResourceSet>

        <wdr:ResourceSet>
          <owl:unionOf rdf:parseType="Collection">

            <wdr:ResourceSet>
              <wdr:includePathEndsWith>bar</wdr:includePathEndsWith>
            </wdr:ResourceSet>

            <wdr:ResourceSet>
              <wdr:includePathEndsWith>foo</wdr:includePathEndsWith>
            </wdr:ResourceSet>

          </owl:unionOf>
        </wdr:ResourceSet>

      </owl:unionOf>
    </wdr:ResourceSet>

    <wdr:ResourceSet>
      <wdr:includeHosts>example.com</wdr:includeHosts>
      <wdr:includePathStartsWith>/bar</wdr:includePathStartsWith>
    </wdr:ResourceSet>

  </owl:unionOf>
</wdr:ResourceSet> 

Whilst Resource Set definitions like Example 4-7 are possible, their use will place a substantial burden on the processor and SHOULD be avoided. The Resource Set it defines is the set of resources on example.org with a URI path starting with foo or ending with either foo or bar, plus the resources on example.com that have a URI path starting with bar.

It is important to note that, when a set definition denotes resources by their address, we can obtain the same result by using the includeRegEx property, which would usually provide a more efficient solution. Example 4-7 can be rewritten as shown in Example 4-8.

Example 4-8: A more efficient way of expressing the same Resource Set shown in Example 4-7

<wdr:ResourceSet>
  <wdr:includeRegEx>(example\.org\/((foo)|(.*(foo|bar)$)))|(example\.com\/bar)</wdr:includeRegEx>
</wdr:ResourceSet> 

5 Logical Inconsistency

It is recognized that a number of the design goals and constraints set out in Section 1.1 are in tension with each other, notably that Resource Set definitions must be easy to write, be comprehensible by humans and, as far as is possible, should avoid including or excluding resources unintentionally.

To answer the call to make it easy to write Resource Set definitions, a wide variety of RDF properties have been defined that are, it is hoped, easy to use and comprehend by humans. It is anticipated that Example 5-1 will be typical.

Example 5-1: A simple Resource Set definition anticipated as being typical

<wdr:ResourceSet>
  <wdr:includeHosts>example.mobi</wdr:includeHosts>
  <wdr:excludePathStartsWith>/cgi-bin /test /private</wdr:excludePathStartsWith>
</wdr:ResourceSet>

This is analogous to the sort of resource grouping in a robots.txt file [ROBOTS] that invites crawlers to probe all parts of a Web site except the cgi-bin, the testing and private areas.

Now suppose that the content provider responsible for example.mobi sets up a service called 'Test Your IQ'. Realizing that the Resource Set definition will exclude the testyouriq section of the Web site (as its path begins with test), he/she adds a new line to the Resource Set definition in an attempt specifically to include the new section thus:

Example 5-2: A repeat of Example 5-1 with an additional line of data

<wdr:ResourceSet>
  <wdr:includeHosts>example.mobi</wdr:includeHosts>
  <wdr:excludePathStartsWith>/cgi-bin /test /private</wdr:excludePathStartsWith>
  <wdr:includePathStartsWith>/testyouriq</wdr:includePathStartsWith>
</wdr:ResourceSet>

This would not have the desired effect! The critical part of this definition now says that a candidate resource is a member of the Resource Set if it has a path that begins with testyouriq AND does NOT have a path that begins with test. This can never be true and therefore Example 5-2 is equivalent to the empty set.

This example serves to highlight an important point: that it is perfectly possible to create a set definition that includes logical inconsistencies. A POWDER processor MUST, indeed can only, treat such Resource Set definitions as the Empty Set.

The correct solution to the problem is not to specify a further property in the original Resource Set, but to create an additional Resource Set definition and combine the two with an owl:unionOf operator thus:

Example 5-3: A corrected version of Example 5-2

<wdr:ResourceSet>
  <owl:unionOf rdf:parseType="Collection">

    <wdr:ResourceSet>
      <wdr:includeHosts>example.mobi</wdr:includeHosts>
      <wdr:excludePathStartsWith>/cgi-bin /test /private</wdr:excludePathStartsWith>
    </wdr:ResourceSet>

    <wdr:ResourceSet>
      <wdr:includeHosts>example.mobi</wdr:includeHosts>
      <wdr:includePathStartsWith>/testyouriq</wdr:includePathStartsWith>
    </wdr:ResourceSet>

  </owl:unionOf>
</wdr:ResourceSet>
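The contrast between Examples 5-2 and 5-3 can be illustrated with two small predicates over URI paths. The function names are hypothetical and the host is assumed to have been checked already.

```python
EXCLUDED = ("/cgi-bin", "/test", "/private")

def in_example_5_2(path):
    # One Resource Set: every property must hold, so a path would have to
    # start with /testyouriq while not starting with /test.
    return (not path.startswith(EXCLUDED)) and path.startswith("/testyouriq")

def in_example_5_3(path):
    # Two Resource Sets combined with owl:unionOf.
    first = not path.startswith(EXCLUDED)
    second = path.startswith("/testyouriq")
    return first or second
```

No path satisfies in_example_5_2, since anything starting with /testyouriq also starts with /test. With in_example_5_3 the path /testyouriq/q1 is accepted while /test/x is still excluded.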

6 Extension Mechanism

In this document we have laid out just two methods to define a set of resources: one referring to resource addresses and the other to resource properties. The address-based methods are clearly designed to be used with information resources available on the Web that can be identified by matching things like host names, paths and IP addresses. There is no limit on the distinguishing characteristics that can be used to define a set of resources, however, and so there should not be unnecessary constraints on how the protocol works.

The POWDER Vocabulary [VOC] uses pre-defined data types from XML Schema as well as other atomic data types, and then derives list data types from them. As the following examples show, an analogous approach can be taken with any system used for identifying resources so that little augmentation would be needed for a POWDER processor to be able to handle the data.

Importantly, if a Resource Set is defined using any term that the processor does not recognize then it MUST treat it as the empty set.
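A processor might apply this rule with a guard before any evaluation takes place. The sketch below is an assumption-laden illustration: KNOWN_TERMS is a hypothetical, deliberately incomplete list, and evaluate stands for whatever matching routine the processor already has.

```python
# Hypothetical subset of the terms a given processor understands.
KNOWN_TERMS = {"includeHosts", "includePathStartsWith", "excludePathStartsWith"}

def uses_only_known_terms(definition):
    """True if every term in the definition (a dict) is recognized."""
    return all(term in KNOWN_TERMS for term in definition)

def is_member(uri, definition, evaluate):
    if not uses_only_known_terms(definition):
        # Unrecognized term: the Resource Set is treated as the empty set.
        return False
    return evaluate(uri, definition)
```

A processor extended to support an additional vocabulary would simply add that vocabulary's terms to its own list of known terms.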

6.1 Extension Example 1: ISAN

The International Standard Audiovisual Number [ISAN1] is a voluntary numbering system for the identification of audiovisual works. Following ISO 15706, the numbers are written as 24 hexadecimal digits in the following format [ISAN2]:

-----root-----episode-version-
ISAN1881-66C7-3420-0000-7-9F3A-0245-U

The root of an ISAN number is assigned to a core work with the other numbers being used for things like episodes, different language versions, promotional trailers and so on.

A vocabulary can readily be defined to allow Resource Sets to be defined based on ISAN numbers. The terms might be along the lines of:

includeRoots: the value of which would be a white space separated list of hexadecimal digits and hyphens that would be matched against the first three blocks in the ISAN number.

includeEpisodes: a white space separated list of hexadecimal digits and hyphens that would be matched against the 4th block of 4 digits in the ISAN number.

includeVersions: a white space separated list of hexadecimal digits and hyphens that would be matched against the 5th and 6th blocks of 4 digits in the ISAN number.

includeIsanPattern: a regular expression that should be matched against the entire ISAN number.

The set of all audio visual resources that relate to two particular works might then be defined as shown in Example 6-1.

Example 6-1: A Resource Set Definition using an ISAN number pattern

<wdr:ResourceSet>
  <ex_isan:includeRoots>1881-66C7-3420 1881-66C7-3421</ex_isan:includeRoots>
</wdr:ResourceSet>
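A processor extended to understand the hypothetical includeRoots term might match it along the following lines. This is a sketch only: the function names are invented here, and the root is assumed to be the first three hyphen-separated blocks of the number as written above.

```python
def isan_root(isan):
    """Return the first three hyphen-separated blocks of an ISAN string."""
    body = isan.strip()
    if body.upper().startswith("ISAN"):
        body = body[4:].strip()  # drop the leading 'ISAN' label
    return "-".join(body.split("-")[:3]).upper()

def include_roots(isan, roots_value):
    """Match an ISAN against a white space separated list of roots."""
    roots = {root.upper() for root in roots_value.split()}
    return isan_root(isan) in roots
```

For the value used in Example 6-1, the number ISAN1881-66C7-3420-0000-7-9F3A-0245-U is matched because its root is 1881-66C7-3420.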

6.2 Extension Example 2: Custom URL Patterns

Developers may create their own URL patterns for use in specific services. For example, Google Custom Search Engine [Google] uses wildcards so that www.example.org/* means "all the resources on www.example.org." Such a system is easily used within a Resource Set, only requiring the definition of a single RDF property myPattern as shown below.

Example 6-2: A Resource Set definition using a custom URL pattern

<wdr:ResourceSet>
  <ex:myPattern>www.example.org/*</ex:myPattern>
</wdr:ResourceSet>
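Matching against such a pattern could be sketched with shell-style wildcards from Python's standard library. This is an assumption rather than a description of how any particular search service interprets its patterns: the function name is hypothetical, the scheme and query are ignored, and an empty path is treated as "/".

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def my_pattern_matches(uri, pattern):
    """Match host plus path against a wildcard pattern such as 'www.example.org/*'."""
    parts = urlsplit(uri)
    candidate = (parts.hostname or "") + (parts.path or "/")
    # fnmatchcase treats '*' as 'any run of characters', including '/'.
    return fnmatchcase(candidate, pattern)
```

Note that fnmatch also gives special meaning to '?' and '[', so a real extension would need to define its own pattern syntax precisely.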

7 References

Normative References

[URIS]
RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding and L. Masinter, IETF, January 2005. This document is at http://tools.ietf.org/html/rfc3986
[IRIS]
RFC 3987, Internationalized Resource Identifiers (IRIs), M. Duerst and M. Suignard, IETF, January 2005. This document is at http://www.ietf.org/rfc/rfc3987.txt
[UTF-8]
RFC 3629, UTF-8, a transformation format of ISO 10646, F. Yergeau, November 2003. This document is at http://www.ietf.org/rfc/rfc3629.txt
[RFC3490]
RFC 3490, Internationalizing Domain Names in Applications (IDNA), P. Faltstrom, P. Hoffman, A. Costello. This document is at http://www.ietf.org/rfc/rfc3490.txt
[WAF]
Enabling Read Access for Web Resources, A. van Kesteren. This document is at http://www.w3.org/TR/access-control/
[XQXP]
XQuery 1.0 and XPath 2.0 Functions and Operators, A. Malhotra, J. Melton, N. Walsh. W3C Recommendation 23 January 2007. This document is at http://www.w3.org/TR/xpath-functions/
[CIDR]
RFC 1518, An Architecture for IP Address Allocation with CIDR, Y. Rekhter and T. Li, editors, IETF, September 1993. This document is at http://tools.ietf.org/html/rfc1518
[HTTPCODE]
Part of Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616, Fielding, et al. This document is at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
[HTTPRDF]
HTTP Vocabulary in RDF, J. Koch, C. Velasco, S. Abou-Zahra. This document is at http://www.w3.org/TR/HTTP-in-RDF/
[ATOM]
Atom Format, Nottingham & Sayre. This document is at http://www.ietf.org/rfc/rfc4287.txt
[HTMLPROF]
HTML 4.01, D. Raggett, A. Le Hors, I. Jacobs. This document is at http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#profiles
[OWLSO]
OWL Web Ontology Language Guide: Set Operators M. Smith, C. Welty, D. McGuinness. This document is at http://www.w3.org/TR/2004/REC-owl-guide-20040210/
[WSDL]
Web Services Description Language (WSDL) 1.1, E Christensen, F Curbera, G Meredith, S Weerawarana. This document is at http://www.w3.org/TR/wsdl

Sources

[DR]
Protocol for Web Description Resources (POWDER): Description Resources, K Smith, P Archer, A Perego. This document is at http://www.w3.org/TR/powder-dr/
[VOC]
Protocol for Web Description Resources (POWDER): Web Description Resources (WDR) Vocabulary, A Perego, P Archer. This document is at http://www.w3.org/TR/powder-voc/
[WDRD]
Protocol for Web Description Resources (POWDER): Web Description Resources Datatypes (WDRD), A Perego, P Archer, K Smith. This document is at http://www.w3.org/TR/powder-xsd/
[Rabin]
URI Pattern Matching for Groups of Resources J Rabin, Draft 0.1 17 June 2006. This document is at http://www.w3.org/2005/Incubator/wcl/matching.html
[WCL-XG]
W3C Content Label Incubator Group February 2006 - February 2007
[PUC]
POWDER: Use Cases and Requirements, P. Archer, July 2007. This document is at http://www.w3.org/TR/powder-use-cases/
[URISpace]
URISpace 1.0, M. Nottingham, W3C Note 15 February 2001
[SPARQL]
SPARQL Query Language for RDF E Prud'hommeaux, A Seaborne. This document is at http://www.w3.org/TR/rdf-sparql-query/
[SOAP]
See, for example, SOAP Version 1.2 Part 0: Primer (Second Edition) N Mitra, Y Lafon. This document is at http://www.w3.org/TR/soap12-part0/.
[ROBOTS]
robotstxt.org This document is at http://www.robotstxt.org/.
[ISAN1]
International Standard Audiovisual Number
[ISAN2]
ISAN FAQs: What is the ISAN? This document is at http://www.isan.org/portal/page?_pageid=166,41960&_dad=portal&_schema=PORTAL.
[Google]
Google Custom Search Engine URL Patterns

8 Acknowledgments

The editors duly acknowledge the earlier work in this area carried out by Jo Rabin and the contributions made by all members of the POWDER Working Group.

9 Change Log

Changes since First Public Working Draft

  1. Updated introduction to refer to vocabulary and XML data types documents. Corrected erroneous use of 'QNames'.
  2. Small addition to the introduction to Grouping by address paragraph.
  3. Updated status section
  4. Renumbering of sections previous 2.2 - 2.5
  5. Insertion of Grouping using Wildcards following discussion with Web Application Formats Working Group
  6. Resolution of open question on choice of Regular Expression syntax. Now use XML Schema REs as modified by XPath/XQuery for consistency with other W3C work - the syntax more than meets POWDER's requirements. Data type to be defined in POWDER's own XML Schema
  7. Added hyperlinks to the first mention of each Class and property, pointing to its entry in the vocabulary document
  8. Removed includeUserInfo and includeFragments properties since these are not strictly part of HTTP, the former can cause security issues, especially when written as username:password, and grouping by fragments is very vague since there is no sure way to define the end of a fragment.
  9. Section 3 completely rewritten. Feature at Risk marker removed.