POWDER: Conjunction of set element properties

Background

POWDER is designed to allow a small amount of data to describe a large number of resources. An important step on this road is to be able to define a set of resources. If a candidate resource is an element of the set, then POWDER allows a processor to deduce RDF triples in which the candidate resource's URI is the subject.

The focus of this discussion document is how we define the sets, specifically, how we combine different statements from which set membership can be inferred.

For the purposes of this discussion we want to encode:

All resources on example.org and example.net where the path starts with foo or bar are described by http://www.example.org/description.

The RDF properties hasHost and pathStartsWith are among those being defined by the group.

There are several options. At present, the plan is to include them in a first public working draft and seek feedback. As there are so many options, we'd like to reduce the number a little before then if we can.

It's clear that none of the options encodes all the information needed to determine whether a candidate resource is, or is not, an element of the set; that is, the data will have a structure that must be understood by a POWDER processor.

Option 1

RDF/XML, Graph

Multiple instances of the same property are combined with logical OR; different properties are combined with logical AND. Thus we would encode the example as:

  <wdr:WDR>
    <wdr:hasScope>
      <wdr:Set>
        <wdr:hasHost>example.org</wdr:hasHost>
        <wdr:hasHost>example.net</wdr:hasHost>
        <wdr:pathStartsWith>foo</wdr:pathStartsWith>
        <wdr:pathStartsWith>bar</wdr:pathStartsWith>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasDescription rdf:resource="http://www.example.org/description" />
  </wdr:WDR>
For
This 'looks right' to some
Against
It is rather vague and relies on the processor making several assumptions. OWL cardinality cannot be used to constrain the number of instances of each property.
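
To make the implied processing rule concrete, here is a minimal Python sketch of how a processor might test a candidate URI against the Set in this option. The dictionary layout and the function names are hypothetical, and the matching rules (a host matches itself and its subdomains; the leading slash of the path is ignored) are assumptions made for illustration, not something the group has agreed.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory form of the Set from Option 1: each property
# maps to the list of values given for it in the RDF/XML.
scope = {
    "hasHost": ["example.org", "example.net"],
    "pathStartsWith": ["foo", "bar"],
}

def host_matches(uri, host):
    # Assumption: a host matches itself and any of its subdomains.
    candidate = urlsplit(uri).hostname or ""
    return candidate == host or candidate.endswith("." + host)

def path_starts_with(uri, prefix):
    # Assumption: the leading "/" of the path is ignored.
    return urlsplit(uri).path.lstrip("/").startswith(prefix)

MATCHERS = {"hasHost": host_matches, "pathStartsWith": path_starts_with}

def in_set(uri, scope):
    # OR across the values of one property, AND across different properties.
    return all(
        any(MATCHERS[prop](uri, value) for value in values)
        for prop, values in scope.items()
    )
```

Nothing in the RDF itself says that `any` belongs on the inside and `all` on the outside; that rule lives only in the processor, which is the vagueness noted under Against.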

Option 2

RDF/XML, Graph

All properties of Set are combined with logical AND, but we introduce a new property and class that allow properties to be combined with OR, thus:

  <wdr:WDR>
    <wdr:hasScope>
      <wdr:Set>
        <wdr:includes>
          <wdr:unionOf>
            <wdr:hasHost>example.org</wdr:hasHost>
            <wdr:hasHost>example.net</wdr:hasHost>
          </wdr:unionOf>
        </wdr:includes>

        <wdr:includes>
          <wdr:unionOf>
            <wdr:pathStartsWith>foo</wdr:pathStartsWith>
            <wdr:pathStartsWith>bar</wdr:pathStartsWith>
          </wdr:unionOf>
        </wdr:includes>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasDescription rdf:resource="http://www.example.org/description" />
  </wdr:WDR>
For
Logical, Sem-Web friendly.
Against
What a lot of processing!

Option 3

RDF/XML, Graph

Properties are combined with logical AND, but each takes a white-space-separated list, the members of which are combined with OR.

  <wdr:WDR>
    <wdr:hasScope>
      <wdr:Set>
        <wdr:hasHost>example.org example.net</wdr:hasHost>
        <wdr:pathStartsWith>foo bar</wdr:pathStartsWith>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasDescription rdf:resource="http://www.example.org/description" />
  </wdr:WDR>

For
Compact and logical; OWL cardinality can be used.
Against
Processing may be heavy, and values have additional structure which is only implied.
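
As a sketch of the extra parsing step this option implies, the following Python splits each literal on white space before any matching takes place. The function name is hypothetical. It also illustrates the implied structure: a value that itself contains a space cannot be told apart from two values.

```python
def parse_values(literal):
    """Split a white-space-separated property value into its members."""
    # str.split() with no argument splits on any run of white space
    # and discards leading and trailing white space.
    return literal.split()

hosts = parse_values("example.org example.net")
prefixes = parse_values("foo bar")

# A path prefix containing a space is silently read as two alternatives.
ambiguous = parse_values("my docs")
```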

Option 4

RDF/XML, Graph

Allow multiple Scope statements in a Description Resource (DR) and combine those with OR.

  <wdr:WDR>
    <wdr:hasScope>
      <wdr:Set>
        <wdr:hasHost>example.org</wdr:hasHost>
        <wdr:pathStartsWith>bar</wdr:pathStartsWith>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasScope>
      <wdr:Set>
        <wdr:hasHost>example.org</wdr:hasHost>
        <wdr:pathStartsWith>foo</wdr:pathStartsWith>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasScope>
      <wdr:Set>
        <wdr:hasHost>example.net</wdr:hasHost>
        <wdr:pathStartsWith>foo</wdr:pathStartsWith>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasScope>
      <wdr:Set>
        <wdr:hasHost>example.net</wdr:hasHost>
        <wdr:pathStartsWith>bar</wdr:pathStartsWith>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasDescription rdf:resource="http://www.example.org/description" />
  </wdr:WDR>
For
Logically consistent. Maintains tight control on individual set definition. Allows easy integration with sets defined by property. May aid processing efficiency.
Against
Allows multiple scope statements, so the scope of a DR is not closed: others can publish additional scope statements. Security therefore depends on recognising the source of each scope statement.
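
The four Sets in this option are the cross product of the two hosts and the two path prefixes. A short Python sketch (the variable names and dictionary layout are hypothetical) shows how a publishing tool might generate them, and why the number of Sets grows multiplicatively with the number of alternatives per property.

```python
from itertools import product

hosts = ["example.org", "example.net"]
prefixes = ["foo", "bar"]

# Each (host, prefix) pair becomes one Set whose properties are ANDed;
# the resulting Sets are then ORed together.
scopes = [
    {"hasHost": host, "pathStartsWith": prefix}
    for host, prefix in product(hosts, prefixes)
]

set_count = len(scopes)  # 2 hosts x 2 prefixes
```

With three hosts and five prefixes the same definition would need fifteen Sets, which may matter when weighing compactness against processing efficiency.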

Option 4A

RDF/XML, Graph

A variation on Option 4 with a closed list of scope statements:

  <wdr:WDR>
    <wdr:hasScope rdf:parseType="Collection">

      <wdr:Set>
        <wdr:hasHost>example.org</wdr:hasHost>
        <wdr:pathStartsWith>bar</wdr:pathStartsWith>
      </wdr:Set>

      <wdr:Set>
        <wdr:hasHost>example.org</wdr:hasHost>    
        <wdr:pathStartsWith>foo</wdr:pathStartsWith>
      </wdr:Set>

      <wdr:Set>
        <wdr:hasHost>example.net</wdr:hasHost>
        <wdr:pathStartsWith>foo</wdr:pathStartsWith>
      </wdr:Set>

      <wdr:Set>
        <wdr:hasHost>example.net</wdr:hasHost>
        <wdr:pathStartsWith>bar</wdr:pathStartsWith>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasDescription rdf:resource="http://www.example.org/description" />
  </wdr:WDR>
For
As Option 4 but with a closed list of possible sets. Looks Sem-Web friendly.
Against
Heavy processing, requiring multiple queries to extract the data.

Option 5

RDF/XML, Graph

None of the above: just use the Regular Expression property hasURI. Options 1 to 4 all use properties that match a string against a component of a URI. The hasURI property matches a RegEx against the whole URI, so we could write our example as:

  <wdr:WDR>
    <wdr:hasScope>
      <wdr:Set>
        <wdr:hasURI>^(([^:/?#]+):)?//([^:/?#]+\.)?example\.(org|net)/(foo|bar)</wdr:hasURI>
      </wdr:Set>
    </wdr:hasScope>

    <wdr:hasDescription rdf:resource="http://www.example.org/description" />
  </wdr:WDR>
For
Most sets are expected to be simple. If you need to write a complex Set definition, you probably know Regular Expressions or know someone who does (and they'd probably be written by machine anyway). So we can decide that all properties in the Set should be combined with AND, use OWL cardinality etc.
Against
Goes against some of the design goals in that it requires knowledge of Regular Expressions. Regular Expressions are also error-prone, which may lead to resources being included or excluded by mistake. Finally, it can't work with sets defined by resource properties.
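
Since Regular Expressions are easy to get subtly wrong, it may help to check a pattern against sample URIs before publishing it. The Python sketch below is illustrative only: the function name is hypothetical, and the pattern is written in the spirit of the example, with the `//` separator matched outside an optional subdomain group so that both bare hosts and subdomains are accepted.

```python
import re

# Illustrative pattern: optional scheme, "//", optional subdomain(s),
# then example.org or example.net and a path starting with foo or bar.
PATTERN = re.compile(
    r"^(([^:/?#]+):)?//([^:/?#]+\.)?example\.(org|net)/(foo|bar)"
)

def in_set(uri):
    # The pattern is anchored at the start only, so anything may follow
    # the path prefix.
    return PATTERN.search(uri) is not None

samples = [
    "http://example.org/foo",
    "https://www.example.net/bar/page.html",
    "http://badexample.org/foo",
    "http://example.com/foo",
    "http://example.org/baz",
]
results = {uri: in_set(uri) for uri in samples}
```

Note that a URI such as http://example.org/food is also accepted, because the pattern only tests that the path starts with foo; whether that is intended is exactly the kind of question a pattern author has to think about.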

Option 6

RDF/XML, Graph not available

Use an XML literal instead. Something like:

  <wdr:WDR>
    <wdr:hasScope rdf:parseType="Literal" xmlns:set="http://www.w3.org/2007/05/powder-set">
      <set:Set>
        <set:host>
          <set:match name="example.org" type="endsWith" />
          <set:match name="example.net" type="endsWith" />
          <set:path>
            <set:match name="foo" type="startsWith" />
            <set:match name="bar" type="startsWith" />
          </set:path>
        </set:host>
      </set:Set>
    </wdr:hasScope> 

    <wdr:hasDescription rdf:resource="http://www.example.org/description" />

  </wdr:WDR>
For
It's easier to build in flexibility using attributes. A discrete process based on an XPath query could determine whether a candidate resource was an element of the set or not, allowing the candidate URI to be the subject of a bunch of RDF triples. This approach has a history! See the incomplete http://www.w3.org/2005/Incubator/wcl/matching.html and its references, in particular http://www.w3.org/TR/urispace.
Against
It's no more logical to assume that this means example.org OR example.net than in, say, option 1. Also, as this is XML, not RDF, you need a separate XML processor to extract the result, a different processing model etc. Maybe GRDDL could help here?
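
The separate XML processing step might look something like the following Python sketch, which uses the limited XPath support in the standard library's xml.etree.ElementTree. The function names are hypothetical and the literal is copied from the example. Treating sibling match elements as alternatives (OR) and the host and path tests as joint conditions (AND) is an assumption, and is exactly the kind of rule the processor would have to be told about.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"set": "http://www.w3.org/2007/05/powder-set"}

LITERAL = """<set:Set xmlns:set="http://www.w3.org/2007/05/powder-set">
  <set:host>
    <set:match name="example.org" type="endsWith" />
    <set:match name="example.net" type="endsWith" />
    <set:path>
      <set:match name="foo" type="startsWith" />
      <set:match name="bar" type="startsWith" />
    </set:path>
  </set:host>
</set:Set>"""

def matches(value, match):
    # Apply one set:match element to a URI component.
    name, kind = match.get("name"), match.get("type")
    if kind == "endsWith":
        return value.endswith(name)
    if kind == "startsWith":
        return value.startswith(name)
    raise ValueError("unknown match type: %r" % kind)

def in_set(uri, root):
    parts = urlsplit(uri)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")  # assumption: leading "/" is ignored
    # Assumption: sibling matches are alternatives (OR) ...
    host_ok = any(matches(host, m)
                  for m in root.findall("set:host/set:match", NS))
    path_ok = any(matches(path, m)
                  for m in root.findall("set:host/set:path/set:match", NS))
    # ... and the host and path tests must both succeed (AND).
    return host_ok and path_ok

ROOT = ET.fromstring(LITERAL)
```

A plain endsWith test on the host also accepts badexample.org, so a real processor would probably need a rule about domain-label boundaries.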

Option 7

RDF/XML, Graph

Use RDF's native support for structured values (rdf:value).

RDF has support for situations where there is a 'main value' that then needs some sort of qualification. The classic example is a weight, where the main value is the number and the qualification is the unit. This might be useful to us here.

  <wdr:Set> 
    <wdr:hasHost rdf:parseType="Resource">
      <rdf:value>example.org</rdf:value>
      <rdf:value>example.net</rdf:value>
      <wdr:combine>OR</wdr:combine>
    </wdr:hasHost>
    <wdr:pathStartsWith rdf:parseType="Resource">
      <rdf:value>foo</rdf:value>
      <rdf:value>bar</rdf:value>
      <wdr:combine>OR</wdr:combine>
    </wdr:pathStartsWith>
  </wdr:Set>
For
It uses RDF's native support for structured values, making the structure unambiguous (we could define classes for OR and AND if necessary).
Against
Maybe this is an abuse of rdf:value? If the idea is to say "2.4 kg", that's one value and one unit. Here we're giving multiple values and an operator, so there is still structure that needs to be understood.
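
To show what an explicit combine operator buys us, here is a minimal Python sketch in which the processor looks the operator up rather than assuming it. The dictionary layout, the function name, and the matching rules (a host matches itself and its subdomains; the leading slash of the path is ignored) are all hypothetical. Combining the different properties with AND is still an assumption made by the processor.

```python
# The operator named in wdr:combine selects the built-in used to fold results.
COMBINERS = {"OR": any, "AND": all}

# Hypothetical in-memory form of the Set in Option 7.
scope = {
    "hasHost": {"values": ["example.org", "example.net"], "combine": "OR"},
    "pathStartsWith": {"values": ["foo", "bar"], "combine": "OR"},
}

def in_set(host, path, scope):
    checks = {
        # Assumption: a host matches itself and any of its subdomains.
        "hasHost": lambda v: host == v or host.endswith("." + v),
        # Assumption: the leading "/" of the path is ignored.
        "pathStartsWith": lambda v: path.lstrip("/").startswith(v),
    }
    # Values of one property are folded with the stated operator;
    # different properties are still combined with AND by assumption.
    return all(
        COMBINERS[node["combine"]](checks[prop](v) for v in node["values"])
        for prop, node in scope.items()
    )

# The same values with AND can never match, since no host is both.
strict = dict(scope, hasHost={"values": ["example.org", "example.net"],
                              "combine": "AND"})
```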