Re: htmldata-ISSUE-4 (Property as subject): Should the registry allow a property name or URI to be used as an alias for @itemid [Microdata to RDF] from Gregg Kellogg on 2011-11-06 (public-html-data-tf@w3.org from November 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Sun, 6 Nov 2011 16:17:23 -0500
To: Ivan Herman <ivan@w3.org>
CC: Gregg Kellogg <gregg@kellogg-assoc.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>, Jeni Tennison <jeni@jenitennison.com>
Message-ID: <874A004C-9D38-4687-88B7-6AE912F439BB@greggkellogg.net>
On Nov 6, 2011, at 2:14 AM, Ivan Herman wrote:

> Gregg,
> 
> this line of adding more and more behavioural constraints into that registry becomes more and more complex to me. The other issue I have with this particular problem is that this would also mean a complication for the reproduction of the schema.org patterns in RDFa; indeed, the registry feature below would not work for RDFa, ie, schema.org examples encoded in RDFa would look quite different from their microdata equivalent. Not good. 

To a large extent, the registry discussion is there mostly as a rhetorical artifact useful for describing different types of behavior in microdata. It may be necessary to create a repository, as we will likely need to call out different processing rules for different vocabularies, but I agree that we should strive for simplicity. I've stuck with Jeni's dictate that we try to call out the different options for another working group to carry on, and this seems like a reasonable way to describe the different types of behavior that need to be considered.

Your point about these patterns also needing to be addressed in RDFa (particularly schema:url as subject) is something I had thought about too. It would be much simpler if there were an rdfs:sameAs with reasonable antecedent/consequent rules; then we could just extend the @vocab expansion used in RDFa and create a parallel for that in microdata. Something like:

	bbb schema:url uuu . bbb ppp ooo => uuu ppp ooo .
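
To make the effect of that rule concrete, here is a minimal sketch in Python, with plain tuples standing in for triples. The function name apply_url_rule and the blank node label _:b0 are made up for illustration; nothing here is a proposed implementation.

```python
SCHEMA_URL = "http://schema.org/url"

def apply_url_rule(triples):
    """Apply: bbb schema:url uuu . bbb ppp ooo => uuu ppp ooo ."""
    triples = set(triples)
    # Map each subject to the schema:url values asserted about it.
    aliases = {}
    for s, p, o in triples:
        if p == SCHEMA_URL:
            aliases.setdefault(s, set()).add(o)
    # The rule only adds triples; the originals (and their bnode) are kept.
    inferred = {(u, p, o) for s, p, o in triples for u in aliases.get(s, ())}
    return triples | inferred

graph = {
    ("_:b0", SCHEMA_URL, "foo-fighters-rope.html"),
    ("_:b0", "http://schema.org/name", "Rope"),
}
result = apply_url_rule(graph)
```

Note that the output still contains the two original triples on _:b0 next to the copies on the new subject, since a rule of this shape can only add statements.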

> I was wondering whether a totally different approach would not be better and more appropriate. Namely:
> 
> - there is one, canonical, mapping from microdata to RDF. For practical purposes, to make the mapping as close to the biggest microdata usage, namely schema.org, let us make it as close as possible to what schema.org proposes.

Having one transformation that incorporates schema.org, given the schema.org extension mechanism, probably means separating a URI between base and path, so that all path components become part of the vocabulary. Of course, this won't work for FOAF or DC, so I'm not sure, without a registry, how we might come up with a one-size-fits-all method.
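
A rough sketch of that base/path split, and of why it breaks down, might look like the following. The helper names are hypothetical, and the rule of treating scheme plus host as the vocabulary is only an assumption about what such a one-size-fits-all method could be.

```python
from urllib.parse import urlsplit

def vocabulary_base(itemtype):
    """Treat scheme and host as the vocabulary; everything in the path is a term."""
    parts = urlsplit(itemtype)
    return f"{parts.scheme}://{parts.netloc}/"

def property_uri(itemtype, name):
    """Expand a short property name against the itemtype's vocabulary base."""
    return vocabulary_base(itemtype) + name

# Works for schema.org, including a slash-style extension type (illustrative).
schema_name = property_uri("http://schema.org/MusicRecording", "name")
extended_name = property_uri("http://schema.org/Person/Minister", "name")

# Gives the wrong answer for FOAF, whose terms live under /foaf/0.1/.
foaf_name = property_uri("http://xmlns.com/foaf/0.1/Person", "name")
```

Here foaf_name comes out as http://xmlns.com/name rather than http://xmlns.com/foaf/0.1/name, which is the kind of mismatch a registry would have to correct.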

> - there would be a registry for vocabularies defining some rules that can transform RDF to RDF; application may, if necessary, make use of these rules.
> 
> The caveat is that, unfortunately, there is no real rule engine syntax well accepted by the RDF community. Except that we can use SPARQL CONSTRUCT queries for that purpose. The extra bonus is that converters may use SPARQL engines to get the conversion done, for example. Using this, I believe what you describe in the current issue is:
> 
> CONSTRUCT {
>  ?uri a ?t ;
>       ?p ?o .
> }
> WHERE {
>  ?b a ?t ;
>     schema:url ?uri ;
>     ?p ?o .
>  FILTER( ?p != schema:url && ?p != rdf:type )
> }
> 
> Thoughts?

This could be similar to the @vocab vocabulary expansion in RDFa, but looking for a SPARQL query rather than an OWL/RDFS file, perhaps using content negotiation. For example, if application/sparql-query were requested at a higher priority than application/rdf+xml or text/turtle, the server could return a SPARQL CONSTRUCT or UPDATE that would perform such a transformation, and it could work for either Microdata or RDFa (with suitable changes to @vocab expansion).
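
As a sketch of what such a request might look like from the processor side (the q-values and the function name are assumptions, and nothing implies that schema.org or any other vocabulary actually serves a SPARQL document):

```python
from urllib.request import Request

# Hypothetical preference order: SPARQL first, then Turtle, then RDF/XML.
ACCEPT = "application/sparql-query, text/turtle;q=0.9, application/rdf+xml;q=0.8"

def vocabulary_request(vocab_uri):
    """Build (but do not send) a request for a vocabulary's transformation rules."""
    return Request(vocab_uri, headers={"Accept": ACCEPT})

req = vocabulary_request("http://schema.org/")
```

A processor would then dispatch on the Content-Type of the response: run the query if it gets application/sparql-query back, or fall back to the existing @vocab expansion if it gets RDF.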

However, if ?uri is a literal, I'm not entirely sure how to construct an IRI from a literal. Also, if we'd like to create typed literals from plain literals, does SPARQL CONSTRUCT allow such a transformation?
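
Outside of SPARQL, in a processor, both conversions are straightforward. The following is only a sketch: the base URL is a made-up example, and the (lexical form, datatype) tuple is just one possible way to stand in for a typed literal.

```python
from urllib.parse import urljoin

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

def literal_to_iri(literal, base):
    """Resolve a literal holding a (possibly relative) URI reference against base."""
    return urljoin(base, literal.strip())

def typed_literal(lexical, datatype):
    """Pair a plain literal's lexical form with a datatype IRI."""
    return (lexical, datatype)

# Hypothetical document URL used as the base for resolution.
iri = literal_to_iri("foo-fighters-rope.html", "http://example.org/albums/index.html")
created = typed_literal("2011-11-06", XSD_DATE)
```

This resolves the relative reference the same way a link in the document itself would be resolved, so the item subject ends up next to the page that describes it.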

SPARQL does add a fair burden on a processor or client, substantially increasing the performance requirements of generating an appropriate output graph. Using syntactic constructs, such as those I described for literal coercion and subject insertion, can be more performant and produce fewer "junk" triples. Your query above, for example, would duplicate the triples for an item for each schema:url encountered and leave the original with its blank node. But I don't know how we could accommodate this in RDFa.
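
By subject insertion I mean something along these lines, done while the item is being serialized rather than as a pass over the finished graph. This is a sketch only: the item dictionary layout and the function name are invented for illustration, and it follows the behavior suggested in the issue (first 'url' value becomes the subject unless the item has an @itemid).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"

def item_to_triples(item, bnode="_:b0"):
    """Emit triples for one item, using its first 'url' value as the subject."""
    props = item["properties"]  # (property URI, value) pairs in document order
    urls = [v for p, v in props if p == SCHEMA + "url"]
    # @itemid wins; otherwise the first url value; otherwise a blank node.
    subject = item.get("itemid") or (urls[0] if urls else bnode)
    triples = [(subject, RDF_TYPE, item["type"])]
    # Every property, including url itself, is still emitted as a property.
    triples.extend((subject, p, v) for p, v in props)
    return triples

recording = {
    "type": SCHEMA + "MusicRecording",
    "properties": [
        (SCHEMA + "name", "Rope"),
        (SCHEMA + "url", "foo-fighters-rope.html"),
    ],
}
triples = item_to_triples(recording)
```

The item yields three triples, all on the same subject, and no blank node is ever generated for it.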

BTW, I think we can simplify your query, counting on duplicate triples being discarded and on BNodes acting as existentially quantified variables in SPARQL:

CONSTRUCT { ?s ?p ?o }
WHERE { [ schema:url ?s; ?p ?o ] }

Gregg

> Ivan
> 
> 
> On Nov 5, 2011, at 19:21 , HTML Data Task Force Issue Tracker wrote:
> 
>> 
>> htmldata-ISSUE-4 (Property as subject): Should the registry allow a property name or URI to be used as an alias for @itemid [Microdata to RDF]
>> 
>> http://www.w3.org/2011/htmldata/track/issues/4
>> 
>> Raised by: Gregg Kellogg
>> On product: Microdata to RDF
>> 
>> Schema.org provides a 'url' property which, in practice, is used to set the subject for an item. Moreover, in many examples, the property is used with a literal content model, rather than a URI content model.
>> 
>> For example, the following use case is common in schema.org examples:
>> 
>> 
>> <div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
>>  <span itemprop="name">Rope</span>
>>  <meta itemprop="url" content="foo-fighters-rope.html">
>>  ...
>> </div>
>> 
>> In this case the @content attribute is used where the value is expected to be a URI. And, it is clear that this URI is intended as the subject of the item.
>> 
>> A registry entry could be created which would affect processing of a microdata processor by specifying a content model for the property (URI reference) and that it is to be used as the subject of an item. Note that there is a special case where the item already has an @itemid attribute, or there is more than one 'url' property value. This could be resolved by using the first property value only if the item has no @itemid.
>> 
>> The suggested behavior would be to use the first 'url' property value both as the item subject and as a property and subsequent values as a property only.
>> 
>> For example, the previous microdata would produce the following Turtle:
>> 
>> <foo-fighters-rope.html> a schema:MusicRecording;
>> schema:name "Rope";
>> schema:url <foo-fighters-rope.html> .
>> 
>> A possible JSON representation of a registry that identifies this could be the following:
>> 
>> {
>> "http://schema.org/": {
>>   "propertyURI": "vocabulary",
>>   "multipleValues": "unordered",
>>   "@context": {
>>     "url": { "@datatype": ["@subject", "@uri"]},
>>     "dateCreated": {"@datatype": "http://www.w3.org/2001/XMLSchema#date"},
>>     ...
>>   }
>> }
>> }
>> 
>> This notation borrows some concepts from the JSON-LD context, but it is intended for discussion and is not proposed as a repository syntax. As the range is described, any use of this property would treat the value as a URI reference. As a by-product, this can be used for URIs (or IRIs) which are not URLs.
>> 
>> The 'url' refers to http://schema.org/url, and is defined both as having a URI reference data range, and to be used as an alias for the item subject. In contrast, http://schema.org/dateCreated is defined as having an xsd:date range, which would cause the resulting literal to have the associated datatype.
>> 
>> 
>> 
>> 
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 
> 
> 
> 
>
Received on Sunday, 6 November 2011 21:18:02 UTC