Powdering logos (again)


Quite a while ago (and on a different blog, actually) I wrote a short post on how to use the upcoming POWDER spec. The example was to create RDF triples expressing copyright information on the Semantic Web logos. A lot has happened with POWDER since, and most of what I wrote in that post is now technically outdated :-( So here is the updated example. The goal remains the same: a POWDER-aware processor should be able, following its nose, to find copyright information expressed in RDF for the various logos.

One starts with the index page of the logos. The page is, in fact, using RDFa; the extracted RDF information includes the triple:

<> wdrs:describedBy <http://www.w3.org/Icons/SW/CopyrightResources.xml>

which gives the URI for the real POWDER content. (There could be other ways to find that XML file but, well, that is what I chose to use.)

The first difference, compared to the previous version, is that this file is in XML and not in RDF. This XML format is one of the formats of POWDER; it is easier to read and write. Although a dedicated POWDER processor can handle this file directly, this file is, conceptually, converted to an OWL file (referred to as POWDER-S) to express its meaning. More about this later.

The XML file contains statements on a number of resources by defining constraints on their URI-s. These are the “IRI sets”. To take an example from our document:

<iriset>
  <includehosts>www.w3.org</includehosts>
  <includepathstartswith>/Icons/SW/</includepathstartswith>
  <includepathendswith>.png .svg .gif .eps</includepathendswith>
  <excludepathstartswith>/Icons/SW/Buttons</excludepathstartswith>
  <excluderegex>w3c</excluderegex>
</iriset>

One can read that relatively easily: the set includes URI-s like http://www.w3.org/Icons/SW/something.png but it does not include http://www.w3.org/Icons/SW/something-w3c.gif. (There are a number of other possibilities like constraining on the port numbers; see the “Grouping of Resources” document for further details.) The important point to remember is that this part of the XML file defines (RDF) resources by constraining their URI-s.
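
To make the reading of the IRI set concrete, here is a minimal Python sketch of a membership test. It is not a conformant POWDER implementation: the dictionary layout, the function names `host_matches` and `in_iriset`, and the rule that a host matches when it equals the listed value or is a subdomain of it are all my assumptions for illustration. The “Grouping of Resources” document remains the authority on the exact matching rules.

```python
import re
from urllib.parse import urlsplit

# Constraints copied from the <iriset> above. The dictionary layout is only
# an illustration, not a POWDER data structure.
IRISET = {
    "includehosts": ["www.w3.org"],
    "includepathstartswith": ["/Icons/SW/"],
    "includepathendswith": [".png", ".svg", ".gif", ".eps"],
    "excludepathstartswith": ["/Icons/SW/Buttons"],
    "excluderegex": ["w3c"],
}


def host_matches(host, allowed):
    # Assumption: a host matches if it equals the listed value
    # or is a subdomain of it.
    return host == allowed or host.endswith("." + allowed)


def in_iriset(uri, iriset=IRISET):
    """Return True if the URI satisfies every include constraint
    and none of the exclude constraints."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    path = parts.path
    included = (
        any(host_matches(host, h) for h in iriset["includehosts"])
        and any(path.startswith(p) for p in iriset["includepathstartswith"])
        and any(path.endswith(s) for s in iriset["includepathendswith"])
    )
    excluded = (
        any(path.startswith(p) for p in iriset["excludepathstartswith"])
        # Assumption: the exclusion regex is searched in the whole URI string.
        or any(re.search(r, uri) for r in iriset["excluderegex"])
    )
    return included and not excluded
```

With these assumptions, `in_iriset("http://www.w3.org/Icons/SW/something.png")` is true, while the `something-w3c.gif` URI and anything under `/Icons/SW/Buttons` are rejected.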

Once we have the resources we want to characterize, we have to define what properties those resources have. This is done in a separate portion of the file. Ours says, for example:

<descriptorset>
  <typeof             src="http://creativecommons.org/ns#Work"/>
  <cc:license         rdf:resource='http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231'/>
  <cc:attributionURL  rdf:resource='http://www.w3.org/2001/sw/'/>
  ...
</descriptorset>

By combining the two sets of information, a POWDER processor can deduce, for example, the following set of RDF triples for a URI it finds on the site:

<http://www.w3.org/Icons/SW/sw-cube.png>
  a <http://creativecommons.org/ns#Work>;
  cc:license <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>;
  cc:attributionURL <http://www.w3.org/2001/sw/> 
  ...
 

And voilà! Simple, isn’t it? Well… of course, as always, the devil is in the details. If you are interested in understanding how all this comes about in Semantic Web land, then read on; if not, just stop here…

You continued reading? Well, you have been warned:-) The trick is that the XML file is “canonically” converted into an OWL file (which I’ve put up on the Web, too: CopyrightResources.rdf). The fundamental structure of the OWL equivalent is as follows:

  1. the “IRI set” defines an (anonymous) class of resources, say, _:iriset.
  2. the “descriptor set” defines another, anonymous class of resources (say _:descriptorset) through the intersection of a number of owl:hasValue property restrictions.
  3. an extra OWL statement is issued, saying that _:iriset is a subclass of _:descriptorset. What this means, in human terms, is that each resource in _:iriset will be the subject of triples with the properties listed in the property restrictions.
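
The effect of the three items above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how an OWL reasoner or a POWDER processor is actually built: the names `DESCRIPTORSET` and `entail` are hypothetical, the `cc:` prefix is assumed to expand to `http://creativecommons.org/ns#`, and the type statement is folded into the same list as the restrictions for brevity.

```python
CC = "http://creativecommons.org/ns#"  # assumed expansion of the cc: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Item 2: the descriptor set as (property, value) pairs. Each pair stands in
# for one owl:hasValue restriction; the rdf:type pair is a simplification.
DESCRIPTORSET = [
    (RDF_TYPE, CC + "Work"),
    (CC + "license",
     "http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231"),
    (CC + "attributionURL", "http://www.w3.org/2001/sw/"),
]


def entail(iriset_members):
    """Item 3: every member of _:iriset is also a member of _:descriptorset,
    so it becomes the subject of one triple per restriction."""
    return {
        (subject, prop, value)
        for subject in iriset_members
        for prop, value in DESCRIPTORSET
    }
```

Calling `entail({"http://www.w3.org/Icons/SW/sw-cube.png"})` yields three triples with that URI as the subject, one per entry in the descriptor set. How a resource gets into `iriset_members` in the first place is item #1, which is the tricky part.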

Item #2 is actually fairly simple. Here is a portion of what we get in our example after the transformation to OWL (the file itself is in RDF/XML, but I use Turtle to make it more readable):

_:descriptorset a owl:Class;
  rdfs:subClassOf <http://creativecommons.org/ns#Work>;
  owl:intersectionOf (
    [ a owl:Restriction;
      owl:onProperty cc:license;
      owl:hasValue <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>
    ]
    [ a owl:Restriction;
      owl:onProperty cc:attributionURL;
      owl:hasValue <http://www.w3.org/2001/sw/>
    ]
    ...
  )

So far so good. But the tricky one is item #1 on our list. The constraints on the IRI-s are translated into restrictions again, but the values for these restrictions are regular expressions. What happens in our example is:

_:iriset a owl:Class;
  owl:intersectionOf (
    [ a owl:Restriction;
      owl:onProperty <http://www.w3.org/2007/05/powder-s#matchesregex>;
      owl:hasValue "://(([^/?#]*)@)?([^:/?#@]*)(:([0-9]+))?(/Icons/SW/)"
    ]
    ...
  )

This is not very readable but, well, regular expressions have never been pretty… But that is not the real complication for the IRI sets. There is also a subtle but important semantic point. What exactly is this “matchesregex” property? Well, it takes the URI of an RDF resource as a string and matches it against a regular expression. The problem is that this cannot be expressed in regular RDF+OWL semantics. The “space” of URI-s and that of resources are strictly separated, and there is no standard “bridge” between the two. There is no way around it: the POWDER document has to extend the standard semantics to accommodate this (see the formal document for details). What this means in practice is that to deduce the targeted RDF triples, e.g.,

<http://www.w3.org/Icons/SW/sw-cube.png>
  cc:license <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>;
  ...

the processor must be POWDER aware; a general RDF/OWL tool alone won’t do. It is not a complex extension, but it has to be there.
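
To show what that extension amounts to, here is a small Python sketch of the “bridge”: it takes the URI of a resource as a plain string and searches it with the regular expression from the restriction above. The function name `matchesregex` simply mirrors the property name and is not an API of any library. I also assume that Python's `re` module treats this particular pattern the same way as the regular expression dialect POWDER prescribes, which I have not verified in general.

```python
import re

# Regular expression copied verbatim from the owl:hasValue above.
PATH_STARTS_WITH = re.compile(
    r"://(([^/?#]*)@)?([^:/?#@]*)(:([0-9]+))?(/Icons/SW/)"
)


def matchesregex(resource_uri, pattern=PATH_STARTS_WITH):
    # The "bridge": treat the resource's URI as a string and search it.
    return pattern.search(resource_uri) is not None
```

Here `matchesregex("http://www.w3.org/Icons/SW/sw-cube.png")` holds, while `matchesregex("http://www.w3.org/2001/sw/")` does not. Note that this one restriction only checks the start of the path; the host and the file extensions are covered by the other restrictions in the intersection.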

You have been warned: the semantic side is a little bit more complex. But most users do not care what is under the hood, and these complexities are well hidden; they are of interest to implementers only. On the other hand, POWDER will be a really important piece of the Semantic Web landscape!

Formally, POWDER is in last call. If you are interested, then this is the time to send in your comments!

 
