POWDER: Resource Grouping

Notes to aid discussion


Scope of a DR

DR scope definition:

Scope Defined by URI

Table 1
Property      URI component
hasScheme     scheme
hasUserInfo   userInfo
hasHost       host
hasPort       port
hasPath       path
hasQuery      query
hasFragment   fragment
hasURI        whole URI
hasIP         host IP address

The current set of proposed properties for specifying constraints on resources' URIs is listed in Table 1.
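As a rough illustration of how a processor might obtain the components named in Table 1, the following sketch splits a URI with Python's standard library. The function name uri_components and the dictionary layout are assumptions made for this example. hasIP is left out because it needs a DNS lookup, and the path keeps its leading "/" here, which may or may not be what the final property definitions require.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the components named in Table 1 (hasIP omitted)."""
    parts = urlsplit(uri)
    # userInfo is whatever precedes the last '@' in the authority, if any.
    user_info = None
    if "@" in parts.netloc:
        user_info = parts.netloc.rsplit("@", 1)[0]
    return {
        "hasScheme": parts.scheme,
        "hasUserInfo": user_info,
        "hasHost": parts.hostname,   # lower-cased by urlsplit
        "hasPort": parts.port,       # int, or None when absent
        "hasPath": parts.path,       # keeps the leading '/'
        "hasQuery": parts.query,
        "hasFragment": parts.fragment,
        "hasURI": uri,
    }
```

A call such as uri_components("http://www.example.org/foo/bar.png") yields one value per property, which the matching rules discussed in these notes can then be applied to.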

To be decided whether regular expressions should be used:

Alternatives (not exclusive):

Table 2
Property          Matching rule         Match
hasScheme         exact                 True
hasNotScheme      exact                 False
hasUserInfo       exact                 True
hasNotUserInfo    exact                 False
hasHost           endsWith              True
hasNotHost        endsWith              False
hasPort           exact                 True
hasNotPort        exact                 False
hasPath           startsWith            True
hasNotPath        startsWith            False
hasQuery          startsWith            True
hasNotQuery       startsWith            False
hasFragment       startsWith            True
hasNotFragment    startsWith            False
hasURI            contains              True
hasNotURI         contains              False
hasIP             exact (startsWith?)   True
hasNotIP          exact (startsWith?)   False
  1. defining a default matching rule for each property. In this case the negative match must also be defined explicitly (for all or some properties); with regular expressions this is not needed (see Table 2 for examples)
  2. defining further properties (modifiers) that correspond to specific matching rules (e.g., exact, endsWith, startsWith, contains), so that the equivalent of a regular expression can be specified in a way that is more transparent to end users. Negation can be expressed either by a specific modifier (not) or by additional properties (e.g., hasNotURI). NB: modifiers could also be defined as XML attributes
  3. defining a default matching rule for each property and allowing both "normal" strings and regular expressions as values: for a normal string, the property's default matching rule is used
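To make alternative 1 concrete, here is a minimal sketch of default matching rules with explicit negative properties, following the rules proposed in Table 2. The names DEFAULT_RULE, RULES and matches are invented for this example, and hasIP is treated as exact although the table leaves that open.

```python
# Default matching rule per property, as proposed in Table 2.
DEFAULT_RULE = {
    "Scheme": "exact",
    "UserInfo": "exact",
    "Host": "endsWith",
    "Port": "exact",
    "Path": "startsWith",
    "Query": "startsWith",
    "Fragment": "startsWith",
    "URI": "contains",
    "IP": "exact",  # Table 2 leaves "startsWith?" open; exact is assumed here
}

RULES = {
    "exact": lambda value, pattern: value == pattern,
    "startsWith": lambda value, pattern: value.startswith(pattern),
    "endsWith": lambda value, pattern: value.endswith(pattern),
    "contains": lambda value, pattern: pattern in value,
}


def matches(prop: str, pattern: str, value: str) -> bool:
    """Evaluate a 'hasX' or 'hasNotX' constraint using X's default rule."""
    negated = prop.startswith("hasNot")
    name = prop[len("hasNot"):] if negated else prop[len("has"):]
    result = RULES[DEFAULT_RULE[name]](value, pattern)
    return not result if negated else result
```

With these defaults, matches("hasPath", ".jpg", "foo/pic.jpg") is False, because the default rule for paths is startsWith. This is why a default-only approach cannot express "ends with .jpg" without a modifier or a regular expression.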

Possible solutions?

  1. Using the current approach – regular expressions for some properties and default matching rule for others
  2. Combining alternatives 1 and 2:
    • default matching rules may be used in the most general cases
    • if more flexibility is needed, specific property modifiers can be used: in such a case, modifiers override the default matching rule of a property
    • defining a further property (regEx) to enforce the full flexibility of regular expressions
  3. Using alternative 3

The following examples summarize the possible alternative solutions.

Example 1: using only regular expressions

<wdr:Scope>
  <wdr:hasScheme>^http$</wdr:hasScheme>
  <wdr:hasHost>example\.org$</wdr:hasHost>
  <wdr:hasIP>^213\.249\.189\.194$</wdr:hasIP>
  <wdr:hasPath>^foo</wdr:hasPath>
  <wdr:hasPath>^bar</wdr:hasPath>
  <wdr:hasPath>\.jpg$</wdr:hasPath>
  <wdr:hasURI>^(?!http://www\.example\.org/foo/bar\.png$)</wdr:hasURI> <!-- negation can be expressed by the regular expression itself -->
</wdr:Scope> 
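Two details of regular-expression syntax matter for scopes written this way: an unescaped dot matches any character, and a negative match needs a lookahead group, which on its own (a bare "?!" prefix) is not a valid expression. The sketch below checks both points with Python's re module. The choice of Python's dialect and of unanchored search semantics is an assumption, since these notes do not fix a regular-expression flavour.

```python
import re


def is_valid(pattern: str) -> bool:
    """Return True if the pattern compiles as a Python regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches(pattern: str, value: str) -> bool:
    """Unanchored search, so '^' and '$' in the pattern do the anchoring."""
    return re.search(pattern, value) is not None


LOOSE_HOST = r"example.org$"    # '.' matches any character
STRICT_HOST = r"example\.org$"  # '\.' matches a literal dot only
# Negative lookahead anchored at the start: rejects exactly one URI.
NOT_PNG = r"^(?!http://www\.example\.org/foo/bar\.png$)"
```

Under these assumptions LOOSE_HOST accepts the host "examplexorg" while STRICT_HOST does not, and NOT_PNG accepts every URI except http://www.example.org/foo/bar.png.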

Example 2: using default matching rules (Table 2)

<wdr:Scope>
  <wdr:hasScheme>http</wdr:hasScheme>
  <wdr:hasHost>example.org</wdr:hasHost>
  <wdr:hasIP>213.249.189.194</wdr:hasIP>
  <wdr:hasPath>foo</wdr:hasPath>
  <wdr:hasPath>bar</wdr:hasPath>
  <wdr:hasPath>.jpg</wdr:hasPath> <!-- this does not mean "ends with"! -->
  <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
</wdr:Scope> 

Example 3: using only property modifiers

<wdr:Scope>
  <wdr:hasScheme>
    <wdr:exact>http</wdr:exact>
  </wdr:hasScheme>
  <wdr:hasHost>
    <wdr:endsWith>example.org</wdr:endsWith>
  </wdr:hasHost>
  <wdr:hasIP>
    <wdr:exact>213.249.189.194</wdr:exact>
  </wdr:hasIP>
  <wdr:hasPath>
    <wdr:startsWith>foo</wdr:startsWith>
  </wdr:hasPath>
  <wdr:hasPath>
    <wdr:startsWith>bar</wdr:startsWith>
  </wdr:hasPath>
  <wdr:hasPath>
    <wdr:endsWith>.jpg</wdr:endsWith>
  </wdr:hasPath>
  <wdr:hasURI>
    <wdr:not>
      <wdr:exact>http://www.example.org/foo/bar.png</wdr:exact>
    </wdr:not>
  </wdr:hasURI>
</wdr:Scope> 
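One way to read the modifiers is as a user-friendly front end to regular expressions: each modifier wraps a literal string in the appropriate anchors. The sketch below shows that translation. The function names modifier_to_regex and holds are hypothetical, and the negate flag stands in for the wdr:not wrapper.

```python
import re

# Anchoring template for each modifier; '{}' receives the escaped literal.
TEMPLATES = {
    "exact": "^{}$",
    "startsWith": "^{}",
    "endsWith": "{}$",
    "contains": "{}",
}


def modifier_to_regex(modifier: str, text: str) -> str:
    """Translate a modifier and its literal text into a regular expression."""
    # re.escape makes characters such as '.' match themselves only.
    return TEMPLATES[modifier].format(re.escape(text))


def holds(modifier: str, text: str, value: str, negate: bool = False) -> bool:
    """Test one constraint; negate=True plays the role of the 'not' modifier."""
    matched = re.search(modifier_to_regex(modifier, text), value) is not None
    return not matched if negate else matched
```

Because the literal is escaped, authors never have to think about metacharacters: holds("endsWith", ".jpg", "foo/pic.jpg") is True, while a path ending in "xjpg" is rejected.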

Example 4: using default matching rules (Table 2) + regular expressions when needed

<wdr:Scope>
  <wdr:hasScheme>http</wdr:hasScheme>
  <wdr:hasHost>example.org</wdr:hasHost>
  <wdr:hasIP>213.249.189.194</wdr:hasIP>
  <wdr:hasPath>foo</wdr:hasPath>
  <wdr:hasPath>bar</wdr:hasPath>
  <wdr:hasPath>\.jpg$</wdr:hasPath>
  <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
</wdr:Scope> 

Example 5: using default matching rules (Table 2) + modifiers when needed

<wdr:Scope>
  <wdr:hasScheme>http</wdr:hasScheme>
  <wdr:hasHost>example.org</wdr:hasHost>
  <wdr:hasIP>213.249.189.194</wdr:hasIP>
  <wdr:hasPath>foo</wdr:hasPath>
  <wdr:hasPath>bar</wdr:hasPath>
  <wdr:hasPath>
    <wdr:endsWith>.jpg</wdr:endsWith>
  </wdr:hasPath>
  <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
</wdr:Scope> 

Scope Defined by Property

Three approaches suggest themselves for this and there may be more.

Properties knowable from external data source

The concept here is that other data may already be available, quite possibly in a format other than DRs, from which we can find out whether a given resource is in or out of scope. This might be an RDF dump, a web page (from which we can extract the data using GRDDL), or some other source.

<wdr:Scope>
  <wdr:hasProperty>
    <wdr:Property>
      <ex:colour>red</ex:colour>
    </wdr:Property>
  </wdr:hasProperty>
  <wdr:propLookUp rdf:resource="http://sparql.example.com" />
</wdr:Scope>

Here we specify a property (colour = red) and the URI of a service which can tell us whether the resource satisfies the constraint or not. We can send a SPARQL request to propLookUp and find out whether the resource we're interested in is red or not.

In order to facilitate the creation of a standard "POWDER processor" we need to be able to specify the SPARQL request that will be sent. But, apart from including the word sparql in the propLookUp URI, we haven't specified that SPARQL is to be used. Do we need a further property for this? The answer is yes only if we also wish to support other methods. For example, a SOAP service might return property values. And if we support SOAP and SPARQL, what else needs support? The list is open-ended, so we'd probably need to specify an extensibility mechanism.
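For discussion, this is one possible shape for the request a processor could send to the propLookUp service: a SPARQL ASK query passed in the query parameter of a GET request. The function name build_ask_request and the full property URI http://example.org/vocab#colour are assumptions (the ex: namespace is not defined in these notes), and no escaping of the inserted values is attempted.

```python
from urllib.parse import urlencode


def build_ask_request(endpoint: str, resource: str, prop: str, value: str) -> str:
    """Build a GET URL carrying a SPARQL ASK query for one property constraint."""
    # ASK returns a boolean: does the resource have this property value?
    query = 'ASK {{ <{0}> <{1}> "{2}" }}'.format(resource, prop, value)
    return endpoint + "?" + urlencode({"query": query})


url = build_ask_request(
    "http://sparql.example.com",
    "http://www.example.org/foo/bar.png",   # resource being tested
    "http://example.org/vocab#colour",      # hypothetical expansion of ex:colour
    "red",
)
```

The sketch only builds the URL and does not send anything. Whatever form is chosen, the point is that the request has to be derivable from the scope alone if a standard processor is to issue it.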

Properties Obtained by Analyzing Resources

<wdr:Scope>
  <wdr:hasProperty>
    <wdr:Property>
      <ex:colour>red</ex:colour>
    </wdr:Property>
  </wdr:hasProperty>
  <wdr:propLookUp rdf:resource="POWDER-ns#self" />
</wdr:Scope>

Here we are trying to indicate that the only way to find out whether the resource is red or not is to fetch it and examine it. Maybe this is the default, in which case there would be no need to include the propLookUp property?

Properties Included in HTTP Headers

<wdr:Scope>
  <wdr:hasProperty>
    <wdr:Property>
      <ex:language>it</ex:language>
    </wdr:Property>
  </wdr:hasProperty>
  <wdr:propLookUp rdf:resource="POWDER-ns#HTTP-RESPONSE" />
</wdr:Scope>

The intention here is to indicate that the HTTP Response Headers will carry the information necessary to determine whether or not a given resource is in scope - in this case whether it is available in Italian. If the necessary data is available from the HTTP Response headers then a HEAD request should be sufficient to determine whether the resource is or is not in scope.
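A minimal sketch of the header-based check follows. It assumes that ex:language corresponds to the HTTP Content-Language response header, which these notes do not state. The function names are hypothetical, and the HEAD request is only prepared, not sent.

```python
from urllib.request import Request


def head_request(url: str) -> Request:
    """Prepare (but do not send) a HEAD request for the resource."""
    return Request(url, method="HEAD")


def language_in_scope(headers: dict, wanted: str) -> bool:
    """True if the Content-Language header lists the wanted language tag."""
    value = ""
    for name, content in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "content-language":
            value = content
    tags = [tag.strip().lower() for tag in value.split(",") if tag.strip()]
    return wanted.lower() in tags
```

A processor would send the prepared request, collect the response headers, and pass them to language_in_scope(headers, "it"). No response body is needed.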

Scope Semantics

The semantics of the constraints in a scope are currently implicit, following the principle that instances of the same property are combined with OR, whereas instances of different properties are combined with AND.

For instance, the scope

<wdr:Scope>
  <wdr:hasHost>example\.org$</wdr:hasHost>
  <wdr:hasHost>example\.net$</wdr:hasHost>
  <wdr:hasPath>^foo</wdr:hasPath>
  <wdr:hasPath>^bar</wdr:hasPath>
</wdr:Scope> 

denotes all the resources hosted either by example.org or example.net, where the path component of their URIs starts either with foo or bar.
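The implicit rule (OR within a property, AND across properties) can be sketched as follows. The dictionary representation of a scope and the name in_scope are assumptions made for this example, and the dots in the host patterns are escaped so that they match literally.

```python
import re


def in_scope(scope: dict, components: dict) -> bool:
    """scope maps each property to a list of regular expressions.

    Patterns of the same property are alternatives (OR);
    every property present in the scope must be satisfied (AND).
    """
    return all(
        any(re.search(p, components.get(prop) or "") for p in patterns)
        for prop, patterns in scope.items()
    )


SCOPE = {
    "hasHost": [r"example\.org$", r"example\.net$"],
    "hasPath": [r"^foo", r"^bar"],
}
```

With this reading, a resource on www.example.net whose path starts with foo is in scope, which is exactly the combination the next paragraph wants to be able to exclude.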

It may be the case that we wish to denote all the resources hosted either by example.org, where the path component of their URIs starts with foo, or example.net, where the path component of their URIs starts with bar.

This may be expressed by associating with the same DR two distinct scopes, considered as alternative:

<wdr:Scope>
  <wdr:hasHost>example\.org$</wdr:hasHost>
  <wdr:hasPath>^foo</wdr:hasPath>
</wdr:Scope> 

<wdr:Scope>
  <wdr:hasHost>example\.net$</wdr:hasHost>
  <wdr:hasPath>^bar</wdr:hasPath>
</wdr:Scope> 
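Treating the two scopes as alternatives amounts to an OR over scopes, each of which is an AND over its own properties. A minimal sketch, with hypothetical names and one pattern per property:

```python
import re


def matches_scope(scope: dict, components: dict) -> bool:
    """All properties of a single scope must match (AND)."""
    return all(
        re.search(pattern, components.get(prop) or "") is not None
        for prop, pattern in scope.items()
    )


def matches_any(scopes: list, components: dict) -> bool:
    """A resource is in scope if at least one alternative scope matches (OR)."""
    return any(matches_scope(scope, components) for scope in scopes)


SCOPES = [
    {"hasHost": r"example\.org$", "hasPath": r"^foo"},
    {"hasHost": r"example\.net$", "hasPath": r"^bar"},
]
```

Here example.org with a path starting with foo matches, and so does example.net with bar, but the crossed combinations (example.org with bar, example.net with foo) do not.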

Alternatively, if we use a RegEx-based approach, we may specify that there can be only ONE instance of each property type and use the RegEx structure to handle 'OR':

<wdr:Scope>
  <wdr:hasHost>example\.org$|example\.net$</wdr:hasHost>
  <wdr:hasPath>^foo|^bar</wdr:hasPath>
</wdr:Scope> 

To define the resources on example.org where the path component starts with foo and the resources on example.net where the path begins with bar we would then use the hasURI property thus:

<wdr:Scope>
  <wdr:hasURI>example\.org/foo|example\.net/bar</wdr:hasURI>
</wdr:Scope> 

The disadvantage of this is relative complexity: you need to understand RegExes to read the scope statement, which makes the standard more dependent on tools.

The advantage is that we can use OWL constraints to specify that there must be 0 or 1 instances of each thing, especially hasScope. Allowing multiple scope statements requires a little more work on the security side: if I publish a DR with a scope of example.org and you publish some triples that say the scope is also example.mobi, we need to examine the provenance of each triple to establish trust. That is perfectly possible, but it is the price to pay.

Associating Scopes with DRs

Scopes can be associated with a package (hasPackageScope) or with an individual DR (hasScope), as in Examples 6 and 7.

A question: do we really need two distinct properties?

A scope can be given in-line (Example 6) or defined once and referenced by its identifier (Example 7).

Example 6:

<wdr:Package rdf:ID="package">
  <wdr:hasPackageScope>
    <wdr:Scope>
      <wdr:hasHost>example.org</wdr:hasHost>
    </wdr:Scope>
  </wdr:hasPackageScope>
  <wdr:hasDRs rdf:parseType="Collection">
    <rdf:Description rdf:about="#dr_1" />
    <rdf:Description rdf:about="#dr_2" />
    <rdf:Description rdf:about="#dr_3" />
  </wdr:hasDRs>
</wdr:Package>

<wdr:WDR rdf:ID="dr_3">
  <foaf:maker rdf:resource="http://labellingauthority.example.org/foaf.rdf#me" />
  <dcterms:issued>2006-09-01</dcterms:issued>
  <wdr:validUntil>2007-09-01</wdr:validUntil> 
  <wdr:hasScope>
    <wdr:Scope>
      <wdr:hasHost>example.org</wdr:hasHost>
    </wdr:Scope>
  </wdr:hasScope>
  <wdr:hasDescription rdf:resource="#description_3" />
</wdr:WDR>

Example 7:

<wdr:Package rdf:ID="package">
  <wdr:hasPackageScope rdf:resource="#primaryScope" />
  <wdr:hasDRs rdf:parseType="Collection">
    <rdf:Description rdf:about="#dr_1" />
    <rdf:Description rdf:about="#dr_2" />
    <rdf:Description rdf:about="#dr_3" />
  </wdr:hasDRs>
</wdr:Package>

<wdr:WDR rdf:ID="dr_3">
  <foaf:maker rdf:resource="http://labellingauthority.example.org/foaf.rdf#me" />
  <dcterms:issued>2006-09-01</dcterms:issued>
  <wdr:validUntil>2007-09-01</wdr:validUntil> 
  <wdr:hasScope rdf:resource="#primaryScope" />
  <wdr:hasDescription rdf:resource="#description_3" />
</wdr:WDR>

<wdr:Scope rdf:ID="primaryScope">
  <wdr:hasHost>example.org</wdr:hasHost>
</wdr:Scope>

The main issue here is: are we going to specify inheritance? i.e. that a DR within a package automatically inherits the scope of the package? It avoids repeating data, yes, but what about the situation, like T-Online, where rules are not used and link/rel tags point to specific DRs?
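If inheritance were adopted, scope resolution could look like the sketch below. This is one possible reading only. The class and function names are hypothetical, and it assumes that a DR's own scope overrides the package scope rather than being combined with it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DR:
    dr_id: str
    scope: Optional[dict] = None  # None: the DR carries no hasScope of its own


@dataclass
class Package:
    scope: dict
    drs: List[DR] = field(default_factory=list)


def effective_scope(dr: DR, package: Optional[Package] = None) -> Optional[dict]:
    """Return the DR's own scope, else the package scope, else None."""
    if dr.scope is not None:
        return dr.scope
    if package is not None:
        return package.scope
    # DR reached directly (e.g. via a link/rel tag) with nothing to inherit.
    return None
```

The last branch is the awkward case raised above: a DR that relies on inheritance has no usable scope when a client reaches it directly rather than through its package.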