POWDER: Resource Grouping

DR scope definition:

by URI
by property:
- properties derived from analyzing resources themselves
- properties derived from additional data (e.g., DRs) associated with resources
- properties included in HTTP headers – e.g., language, encoding, MIME type
- others?

Table 1
Property	URI component
`hasScheme`	`scheme`
`hasUserInfo`	`userInfo`
`hasHost`	`host`
`hasPort`	`port`
`hasPath`	`path`
`hasQuery`	`query`
`hasFragment`	`fragment`
`hasURI`	`whole URI`
`hasIP`	`host IP address`

The current set of proposed properties for specifying constraints on resources' URIs is given in Table 1.
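As a rough illustration of how the properties in Table 1 line up with the parts of a URI, the following Python sketch splits a URI with the standard library. The function name `uri_components` and the sample URI are assumptions made for this illustration; `hasIP` is left out because it would require a DNS lookup.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the components named in Table 1 (hasIP is omitted)."""
    parts = urlsplit(uri)
    # urlsplit exposes username and password separately; rebuild userinfo.
    user_info = parts.username
    if user_info is not None and parts.password is not None:
        user_info = f"{user_info}:{parts.password}"
    return {
        "hasScheme": parts.scheme,
        "hasUserInfo": user_info,
        "hasHost": parts.hostname,
        "hasPort": parts.port,
        "hasPath": parts.path,
        "hasQuery": parts.query,
        "hasFragment": parts.fragment,
        "hasURI": uri,
    }


# Hypothetical sample URI, used only to exercise the function.
components = uri_components("http://user@www.example.org:8080/foo/bar.png?x=1#top")
```

Note that `urlsplit` reports the path with its leading slash (`/foo/bar.png`), which matters for any "starts with" rule applied to `hasPath`.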

To be decided whether regular expressions should be used:

pros: flexibility
cons: not very usable for end users with limited technical knowledge (is this really an issue to be addressed? Are DRs and scopes usually specified by humans, or rather by tools?)

Alternatives (not exclusive):

Table 2
Property	Matching rule	Match
`hasScheme`	`exact`	True
`hasNotScheme`	`exact`	False
`hasUserInfo`	`exact`	True
`hasNotUserInfo`	`exact`	False
`hasHost`	`endsWith`	True
`hasNotHost`	`endsWith`	False
`hasPort`	`exact`	True
`hasNotPort`	`exact`	False
`hasPath`	`startsWith`	True
`hasNotPath`	`startsWith`	False
`hasQuery`	`startsWith`	True
`hasNotQuery`	`startsWith`	False
`hasFragment`	`startsWith`	True
`hasNotFragment`	`startsWith`	False
`hasURI`	`contains`	True
`hasNotURI`	`contains`	False
`hasIP`	`exact` (`startsWith`?)	True
`hasNotIP`	`exact` (`startsWith`?)	False

1. defining default matching rules for each property: in such a case the negative match (for all or some properties) should also be defined explicitly, whereas, if we use regular expressions, this is not needed (see Table 2 for examples)
2. defining further properties (modifiers) corresponding to specific matching rules, allowing the specification of regular expressions in a way which is more transparent to end users (e.g., exact, endsWith, startsWith, contains); the negation can be expressed either by a specific modifier (not) or by defining additional properties (e.g., hasNotURI). NB: modifiers can also be defined as XML attributes
3. defining default matching rules for each property and allowing the specification of both "normal" strings and regular expressions: in the former case, the default matching rule of a property is used
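The default matching rules of Table 2 could be evaluated along the following lines. This is only a sketch: the names `DEFAULT_RULES`, `MATCHERS` and `matches` are assumptions for illustration, and `hasIP` is set to `exact` although Table 2 leaves that open.

```python
# Default matching rule per property, transcribed from Table 2.
DEFAULT_RULES = {
    "hasScheme": "exact",
    "hasUserInfo": "exact",
    "hasHost": "endsWith",
    "hasPort": "exact",
    "hasPath": "startsWith",
    "hasQuery": "startsWith",
    "hasFragment": "startsWith",
    "hasURI": "contains",
    "hasIP": "exact",  # Table 2 leaves open whether this should be startsWith
}

# One plain-string test per matching rule.
MATCHERS = {
    "exact": lambda value, pattern: value == pattern,
    "endsWith": lambda value, pattern: value.endswith(pattern),
    "startsWith": lambda value, pattern: value.startswith(pattern),
    "contains": lambda value, pattern: pattern in value,
}


def matches(prop: str, pattern: str, value: str) -> bool:
    """Apply the default rule for prop; a hasNotX property inverts hasX."""
    negated = prop.startswith("hasNot")
    base = "has" + prop[len("hasNot"):] if negated else prop
    result = MATCHERS[DEFAULT_RULES[base]](value, pattern)
    return not result if negated else result
```

With these defaults, `matches("hasPath", ".jpg", "/foo/a.jpg")` is false, because the default rule for `hasPath` is `startsWith`; this is the limitation that the modifier and regular-expression alternatives try to address.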

Possible solutions?

Using the current approach – regular expressions for some properties and default matching rule for others
Combining alternatives 1 and 2:
- default matching rules may be used in the most general cases
- if more flexibility is needed, specific property modifiers can be used: in such a case, modifiers override the default matching rule of a property
- defining a further property (regEx) to enforce the full flexibility of regular expressions
Using alternative 3

The following examples summarize the possible alternative solutions.

Example 1: using only regular expressions

<wdr:Scope>
  <wdr:hasScheme>^http$</wdr:hasScheme>
  <wdr:hasHost>example.org$</wdr:hasHost>
  <wdr:hasIP>^213.249.189.194$</wdr:hasIP>
  <wdr:hasPath>^foo</wdr:hasPath>
  <wdr:hasPath>^bar</wdr:hasPath>
  <wdr:hasPath>\.jpg$</wdr:hasPath>
  <wdr:hasURI>^(?!http://www\.example\.org/foo/bar\.png$)</wdr:hasURI> <!-- negation can be expressed by the regular expression itself (here a negative lookahead, assuming a Perl-style regex flavour) -->
</wdr:Scope>

Example 2: using default matching rules (Table 2)

<wdr:Scope>
  <wdr:hasScheme>http</wdr:hasScheme>
  <wdr:hasHost>example.org</wdr:hasHost>
  <wdr:hasIP>213.249.189.194</wdr:hasIP>
  <wdr:hasPath>foo</wdr:hasPath>
  <wdr:hasPath>bar</wdr:hasPath>
  <wdr:hasPath>.jpg</wdr:hasPath> <!-- this does not mean "ends with"! -->
  <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
</wdr:Scope>

Example 3: using only property modifiers

<wdr:Scope>
  <wdr:hasScheme>
    <wdr:exact>http</wdr:exact>
  </wdr:hasScheme>
  <wdr:hasHost>
    <wdr:endsWith>example.org</wdr:endsWith>
  </wdr:hasHost>
  <wdr:hasIP>
    <wdr:exact>213.249.189.194</wdr:exact>
  </wdr:hasIP>
  <wdr:hasPath>
    <wdr:startsWith>foo</wdr:startsWith>
  </wdr:hasPath>
  <wdr:hasPath>
    <wdr:startsWith>bar</wdr:startsWith>
  </wdr:hasPath>
  <wdr:hasPath>
    <wdr:endsWith>.jpg</wdr:endsWith>
  </wdr:hasPath>
  <wdr:hasURI>
    <wdr:not>
      <wdr:exact>http://www.example.org/foo/bar.png</wdr:exact>
    </wdr:not>
  </wdr:hasURI>
</wdr:Scope>
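To show how the modifiers of Example 3 relate to the regular expressions of Example 1, here is a small Python sketch that translates a modifier plus a literal string into an equivalent pattern. The function names are assumptions for illustration, and Python's `re` syntax stands in for whichever regex flavour POWDER would adopt.

```python
import re


def modifier_to_regex(modifier: str, text: str) -> str:
    """Translate a modifier and a literal string into an equivalent regex."""
    literal = re.escape(text)  # treat the user's string as plain text
    templates = {
        "exact": f"^{literal}$",
        "startsWith": f"^{literal}",
        "endsWith": f"{literal}$",
        "contains": literal,
    }
    return templates[modifier]


def holds(modifier: str, text: str, value: str, negate: bool = False) -> bool:
    """Evaluate a modifier constraint; negate=True plays the role of wdr:not."""
    found = re.search(modifier_to_regex(modifier, text), value) is not None
    return not found if negate else found
```

Because the literal is escaped, an end user can write `.jpg` without knowing that `.` is a regex metacharacter, which is the transparency argument made for modifiers above.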

Example 4: using default matching rules (Table 2) + regular expressions when needed

<wdr:Scope>
  <wdr:hasScheme>http</wdr:hasScheme>
  <wdr:hasHost>example.org</wdr:hasHost>
  <wdr:hasIP>213.249.189.194</wdr:hasIP>
  <wdr:hasPath>foo</wdr:hasPath>
  <wdr:hasPath>bar</wdr:hasPath>
  <wdr:hasPath>\.jpg$</wdr:hasPath>
  <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
</wdr:Scope>

Example 5: using default matching rules (Table 2) + modifiers when needed

<wdr:Scope>
  <wdr:hasScheme>http</wdr:hasScheme>
  <wdr:hasHost>example.org</wdr:hasHost>
  <wdr:hasIP>213.249.189.194</wdr:hasIP>
  <wdr:hasPath>foo</wdr:hasPath>
  <wdr:hasPath>bar</wdr:hasPath>
  <wdr:hasPath>
    <wdr:endsWith>.jpg</wdr:endsWith>
  </wdr:hasPath>
  <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
</wdr:Scope>
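The override behaviour used in Example 5 (a modifier, when present, replaces the property's default rule) might be sketched as follows. The representation of a constraint as either a plain string or a `(modifier, string)` pair is an assumption made for this illustration, and only a subset of Table 2 is reproduced.

```python
# Subset of the Table 2 defaults, enough for this illustration.
DEFAULT_RULES = {"hasScheme": "exact", "hasHost": "endsWith", "hasPath": "startsWith"}


def check(rule: str, pattern: str, value: str) -> bool:
    """Evaluate one matching rule on plain strings."""
    if rule == "exact":
        return value == pattern
    if rule == "startsWith":
        return value.startswith(pattern)
    if rule == "endsWith":
        return value.endswith(pattern)
    if rule == "contains":
        return pattern in value
    raise ValueError(f"unknown matching rule: {rule}")


def constraint_holds(prop: str, constraint, value: str) -> bool:
    """A (modifier, string) pair overrides the default rule of the property."""
    if isinstance(constraint, tuple):
        rule, pattern = constraint
    else:
        rule, pattern = DEFAULT_RULES[prop], constraint
    return check(rule, pattern, value)
```

For instance, `constraint_holds("hasPath", ("endsWith", ".jpg"), "foo/a.jpg")` corresponds to the `wdr:endsWith` element inside `wdr:hasPath` in Example 5.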

The concept here is that other data may already be available, quite possibly in a format other than DRs, from which we can find out whether a given resource is in or out of scope. This might be an RDF dump, a web page (from which we can extract the data using GRDDL), or some other source.

<wdr:Scope>
  <wdr:hasProperty>
    <wdr:Property>
      <ex:colour>red</ex:colour>
    </wdr:Property>
  </wdr:hasProperty>
  <wdr:propLookUp rdf:resource="http://sparql.example.com" />
</wdr:Scope>

Here we specify a property (colour = red) and the URI of a service which can tell us whether the resource satisfies the constraint or not. We can send a SPARQL request to propLookUp and find out whether the resource we're interested in is red or not.
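One way a processor might phrase such a request is as a SPARQL ASK query, which returns a simple yes/no answer. The sketch below only builds the query string; the function name, the namespace chosen for `ex:colour` and the resource URI are assumptions, and how the query is sent to the service is left open.

```python
def build_ask_query(resource_uri: str, property_uri: str, value: str) -> str:
    """Build a SPARQL ASK query testing whether the resource has property = value."""
    # Escape backslashes and double quotes so the value is a valid string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'ASK {{ <{resource_uri}> <{property_uri}> "{escaped}" }}'


query = build_ask_query(
    "http://www.example.org/foo/bar.png",   # hypothetical resource being tested
    "http://example.org/vocab#colour",      # assumed expansion of ex:colour
    "red",
)
```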

In order to facilitate the creation of a standard "POWDER processor" we need to be able to specify a SPARQL request that will be sent. But, apart from including the word sparql in the propLookUp URI, we haven't specified that SPARQL is to be used. Do we need a further property for this? The answer is yes only if we also wish to support other methods. For example, a SOAP service might return property values. And if we support SOAP and SPARQL, what else needs support? The list is open-ended, so we'd probably need to specify an extensibility mechanism.

<wdr:Scope>
  <wdr:hasProperty>
    <wdr:Property>
      <ex:colour>red</ex:colour>
    </wdr:Property>
  </wdr:hasProperty>
  <wdr:propLookUp rdf:resource="POWDER-ns#self" />
</wdr:Scope>

Here we are trying to indicate that the only way to find out whether the resource is red or not is to fetch it and examine it. Maybe this is the default? In which case there would be no need to include the propLookUp property?

<wdr:Scope>
  <wdr:hasProperty>
    <wdr:Property>
      <ex:language>it</ex:language>
    </wdr:Property>
  </wdr:hasProperty>
  <wdr:propLookUp rdf:resource="POWDER-ns#HTTP-RESPONSE" />
</wdr:Scope>

The intention here is to indicate that the HTTP Response Headers will carry the information necessary to determine whether or not a given resource is in scope - in this case whether it is available in Italian. If the necessary data is available from the HTTP Response headers then a HEAD request should be sufficient to determine whether the resource is or is not in scope.
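A minimal sketch of that check in Python follows. It assumes that the `ex:language` property maps to the `Content-Language` response header, which the notes do not state; the function names are likewise hypothetical. The header test is kept separate from the network call so that it can be exercised without a server.

```python
from urllib.request import Request, urlopen


def language_in_scope(headers, wanted: str) -> bool:
    """True if the Content-Language header lists the wanted language tag."""
    wanted = wanted.lower()
    raw = headers.get("Content-Language") or ""
    tags = [tag.strip().lower() for tag in raw.split(",") if tag.strip()]
    # Accept the bare tag ("it") or a regional variant ("it-CH").
    return any(tag == wanted or tag.startswith(wanted + "-") for tag in tags)


def head_headers(uri: str):
    """Issue a HEAD request and return the response headers."""
    with urlopen(Request(uri, method="HEAD")) as response:
        return response.headers
```

A processor would then call something like `language_in_scope(head_headers(uri), "it")`. If the server negotiates language, the request would probably also need an `Accept-Language` header, which this sketch omits.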

The semantics of the constraints in a scope are currently implicit, according to the principle that instances of the same property are combined with OR, whereas instances of different properties are combined with AND.

For instance, the scope

<wdr:Scope>
  <wdr:hasHost>example.org$</wdr:hasHost>
  <wdr:hasHost>example.net$</wdr:hasHost>
  <wdr:hasPath>^foo</wdr:hasPath>
  <wdr:hasPath>^bar</wdr:hasPath>
</wdr:Scope>

denotes all the resources hosted either by example.org or example.net, where the path component of their URIs starts either with foo or bar.
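The implicit rule can be made explicit in a few lines of Python. This is a sketch under the assumption that every constraint is a regular expression and that the URI has already been split into components; the function name `in_scope` and the data layout are illustrative only.

```python
import re
from collections import defaultdict


def in_scope(constraints, components) -> bool:
    """constraints: list of (property, regex); components: property -> value."""
    by_property = defaultdict(list)
    for prop, pattern in constraints:
        by_property[prop].append(pattern)
    # OR within one property, AND across different properties.
    return all(
        any(re.search(p, components.get(prop, "")) for p in patterns)
        for prop, patterns in by_property.items()
    )


# The scope shown above, with patterns copied as written.
scope = [
    ("hasHost", r"example.org$"),
    ("hasHost", r"example.net$"),
    ("hasPath", r"^foo"),
    ("hasPath", r"^bar"),
]
```

Under this reading, a resource on `www.example.net` with path `foo/x` is in scope, while one on `example.org` with path `baz` is not.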

It may be the case that we wish to denote all the resources hosted either by example.org, where the path component of their URIs starts with foo, or example.net, where the path component of their URIs starts with bar.

This may be expressed by associating two distinct scopes with the same DR, treated as alternatives:

<wdr:Scope>
  <wdr:hasHost>example.org$</wdr:hasHost>
  <wdr:hasPath>^foo</wdr:hasPath>
</wdr:Scope> 

<wdr:Scope>
  <wdr:hasHost>example.net$</wdr:hasHost>
  <wdr:hasPath>^bar</wdr:hasPath>
</wdr:Scope>
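Treating the two scopes as alternatives amounts to an OR over scopes, with an AND inside each one. A minimal Python sketch, assuming each property occurs at most once per scope and using hypothetical function names:

```python
import re


def scope_matches(scope: dict, components: dict) -> bool:
    # Each property occurs once per scope here, so all constraints are ANDed.
    return all(re.search(pattern, components[prop]) for prop, pattern in scope.items())


def dr_applies(scopes: list, components: dict) -> bool:
    # Alternative scopes attached to the same DR are ORed.
    return any(scope_matches(scope, components) for scope in scopes)


# The two scopes shown above, with patterns copied as written.
scopes = [
    {"hasHost": r"example.org$", "hasPath": r"^foo"},
    {"hasHost": r"example.net$", "hasPath": r"^bar"},
]
```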

Alternatively, if we use a RegEx-based approach, we may specify that there can only be ONE instance of each property type and use the RegEx structure to handle 'OR':

<wdr:Scope>
  <wdr:hasHost>example.org$|example.net$</wdr:hasHost>
  <wdr:hasPath>^foo|^bar</wdr:hasPath>
</wdr:Scope>

To define the resources on example.org where the path component starts with foo and the resources on example.net where the path begins with bar we would then use the hasURI property thus:

<wdr:Scope>
  <wdr:hasURI>example.org/foo|example.net/bar</wdr:hasURI>
</wdr:Scope>
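The behaviour of this single `hasURI` pattern can be checked with a short Python sketch. The first pattern is copied from the example; the stricter variant is an assumption added for comparison, not something proposed in these notes.

```python
import re

# Pattern copied from the hasURI example: "." is unescaped and nothing is anchored.
PATTERN = r"example.org/foo|example.net/bar"

# A stricter variant (an assumption, not part of the notes): dots are escaped
# and the host must start right after "//" or after a subdomain dot.
STRICT = r"//([^/]*\.)?(example\.org/foo|example\.net/bar)"


def uri_in_scope(uri: str, pattern: str = PATTERN) -> bool:
    """True if the pattern is found anywhere in the URI ("contains" semantics)."""
    return re.search(pattern, uri) is not None
```

The copied pattern accepts the intended URIs, but because it is unanchored it also accepts `http://badexample.org/foo`; the stricter variant rejects that URI, at the cost of being harder to read, which illustrates the complexity trade-off of a purely RegEx-based approach.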

The disadvantage of this is relative complexity: you need to understand regular expressions to read the scope statement, which makes the standard more dependent on tools.

The advantage is that we can use OWL constraints to specify that there must be 0 or 1 instances of each property, and especially of hasScope. Allowing multiple scope statements requires a little more work on the security side: if I publish a DR with a scope of example.org and you publish some triples that say the scope is also example.mobi, we need to examine the provenance of each triple to establish trust. That is perfectly possible, but it is the price to pay.

Scopes can be associated with:

a specific DR, by using the hasScope property
a set of DRs (a package), by using the hasPackageScope property

A question: do we really need two distinct properties?

A scope can be

included in the specification of a DR or of a package (see Example 6)
“standalone” (see Example 7)

Example 6:

<wdr:Package rdf:ID="package">
  <wdr:hasPackageScope>
    <wdr:Scope>
      <wdr:hasHost>example.org</wdr:hasHost>
    </wdr:Scope>
  </wdr:hasPackageScope>
  <wdr:hasDRs rdf:parseType="Collection">
    <rdf:Description rdf:about="#dr_1" />
    <rdf:Description rdf:about="#dr_2" />
    <rdf:Description rdf:about="#dr_3" />
  </wdr:hasDRs>
</wdr:Package>

<wdr:WDR rdf:ID="dr_3">
  <foaf:maker rdf:resource="http://labellingauthority.example.org/foaf.rdf#me" />
  <dcterms:issued>2006-09-01</dcterms:issued>
  <wdr:validUntil>2007-09-01</wdr:validUntil> 
  <wdr:hasScope>
    <wdr:Scope>
      <wdr:hasHost>example.org</wdr:hasHost>
    </wdr:Scope>
  </wdr:hasScope>
  <wdr:hasDescription rdf:resource="#description_3" />
</wdr:WDR>

Example 7:

<wdr:Package rdf:ID="package">
  <wdr:hasPackageScope rdf:resource="#primaryScope" />
  <wdr:hasDRs rdf:parseType="Collection">
    <rdf:Description rdf:about="#dr_1" />
    <rdf:Description rdf:about="#dr_2" />
    <rdf:Description rdf:about="#dr_3" />
  </wdr:hasDRs>
</wdr:Package>

<wdr:WDR rdf:ID="dr_3">
  <foaf:maker rdf:resource="http://labellingauthority.example.org/foaf.rdf#me" />
  <dcterms:issued>2006-09-01</dcterms:issued>
  <wdr:validUntil>2007-09-01</wdr:validUntil> 
  <wdr:hasScope rdf:resource="#primaryScope" />
  <wdr:hasDescription rdf:resource="#description_3" />
</wdr:WDR>

<wdr:Scope rdf:ID="primaryScope">
  <wdr:hasHost>example.org</wdr:hasHost>
</wdr:Scope>

The main issue here is whether we are going to specify inheritance, i.e. whether a DR within a package automatically inherits the scope of the package. This avoids repeating data, but what about situations, like T-Online, where rules are not used and link/rel tags point to specific DRs?

Notes to aid discussion

Scope of a DR

Scope Defined by URI

Scope Defined by Property

Properties knowable from external data source

Properties Obtained by Analyzing Resources

Properties Included in HTTP Headers

Scope Semantics

Associating Scopes with DRs