Re: htmldata-ISSUE-4 (Property as subject): Should the registry allow a property name or URI to be used as an alias for @itemid [Microdata to RDF] from Ivan Herman on 2011-11-07 (public-html-data-tf@w3.org from November 2011)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 7 Nov 2011 09:56:46 +0100
To: Gregg Kellogg <gregg@kellogg-assoc.com>
Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>, Jeni Tennison <jeni@jenitennison.com>
Message-Id: <D4A5B45D-D7A3-4F29-8DEC-C7967ACA8AE9@w3.org>
On Nov 6, 2011, at 22:17 , Gregg Kellogg wrote:

> On Nov 6, 2011, at 2:14 AM, Ivan Herman wrote:
> 
>> Gregg,
>> 
>> this line of adding more and more behavioural constraints to that registry is getting more and more complex to me. The other issue I have with this particular problem is that it would also complicate the reproduction of the schema.org patterns in RDFa; indeed, the registry feature below would not work for RDFa, i.e., schema.org examples encoded in RDFa would look quite different from their microdata equivalent. Not good.
> 
> To a large extent, the registry discussion is there mostly as a rhetorical artifact useful for describing different types of behavior in microdata. It may be necessary to create a repository, as we will likely need to call out different processing rules for different vocabularies, but I agree that we should strive for simplicity. I've stuck with Jeni's dictate that we try to call out the different options for another working group to carry on, and this seems like a reasonable way to describe the different types of behavior that need to be considered.
> 
> Your point about these patterns also needing to be addressed in RDFa (particularly schema:url as subject) is something I had thought about too. It would be much simpler if there was an rdfs:sameAs with reasonable antecedent/consequent rules, then we could just extend the @vocab expansion used in RDFa and create a parallel for that in microdata. Something like:
> 
> 	bbb schema:url uuu . bbb ppp ooo => uuu ppp ooo .
> 

Well, that is not a sameAs in the same way as owl:sameAs... Anyway, that is a detail for now. But what you do here is exactly what I proposed: you define a rule that is to be applied after the microdata->RDF conversion:-)
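
To make the idea of a post-conversion rule concrete, here is a minimal Python sketch of the rule quoted above. It is only an illustration under stated assumptions: triples are plain (subject, predicate, object) tuples of strings, the example IRI under example.org is made up, and apply_url_rule is a hypothetical name, not part of any converter.

```python
SCHEMA_URL = "http://schema.org/url"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_url_rule(triples):
    """Single pass of: bbb schema:url uuu . bbb ppp ooo => uuu ppp ooo ."""
    triples = set(triples)
    # Collect the schema:url values asserted for each subject.
    urls = {}
    for s, p, o in triples:
        if p == SCHEMA_URL:
            urls.setdefault(s, set()).add(o)
    # Re-state every triple of such a subject with each url value as subject.
    inferred = set()
    for s, p, o in triples:
        for u in urls.get(s, ()):
            inferred.add((u, p, o))
    # The rule only adds triples; the original (blank node) triples remain.
    return triples | inferred


# Hypothetical output of a microdata->RDF conversion for one item.
graph = {
    ("_:b0", RDF_TYPE, "http://schema.org/MusicRecording"),
    ("_:b0", "http://schema.org/name", '"Rope"'),
    ("_:b0", SCHEMA_URL, "http://example.org/foo-fighters-rope.html"),
}
result = apply_url_rule(graph)
```

Because the result is a set, re-deriving an existing triple has no effect; the blank node triples are kept alongside the derived ones, so an application that wants only the named subject would have to filter them out itself.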

>> I was wondering whether a totally different approach would not be better and more appropriate. Namely:
>> 
>> - there is one, canonical, mapping from microdata to RDF. For practical purposes, to make the mapping as close to the biggest microdata usage, namely schema.org, let us make it as close as possible to what schema.org proposes.
> 
> Having one transformation that incorporates schema.org, given the schema.org extension mechanism, probably means separating a URI between base and path, so that all path components become part of the vocabulary. Of course, this won't work for FOAF or DC, so I'm not sure, without a registry, how we might come up with a one-size-fits-all method.
> 
>> - there would be a registry for vocabularies defining some rules that can transform RDF to RDF; applications may, if necessary, make use of these rules.
>> 
>> The caveat is that, unfortunately, there is no real rule engine syntax well accepted by the RDF community. Except that we can use SPARQL CONSTRUCT queries for that purpose. The extra bonus is that converters may use SPARQL engines to get the conversion done, for example. Using this, I believe what you describe in the current issue is:
>> 
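>> PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> PREFIX schema: <http://schema.org/>
>> CONSTRUCT {
>> ?uri a ?t ;
>>      ?p ?o .
>> }
>> WHERE {
>> ?b a ?t ;
>>    schema:url ?uri ;
>>    ?p ?o .
>> FILTER( ?p != schema:url && ?p != rdf:type )
>> }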
>> 
>> Thoughts?
> 
> This could be similar to the @vocab vocabulary expansion in RDFa, but looking for a SPARQL query rather than an OWL/RDFS file, perhaps using content negotiation. For example, if application/sparql-query were offered at a higher priority than application/rdf+xml or text/turtle, this could return a SPARQL CONSTRUCT or UPDATE that would perform such a transformation, and it could work for either Microdata or RDFa (with suitable changes to @vocab expansion).
> 
> However, if ?uri is a literal, I'm not entirely sure how to construct an IRI from a literal. Also, if we'd like to create typed literals from plain literals, does SPARQL CONSTRUCT allow such a transformation?

That is again a detail, but yes, it exists now in SPARQL 1.1 (not in SPARQL 1.0). In the current draft, it is called the IRI function:

"The IRI function constructs an IRI by resolving the string argument (see RFC 3986 or any later RFC that superceeds RFC 3986. )."

and can be used with BIND, for example. But yes, this goes beyond SPARQL 1.0, ie, that *is* an extra complication...
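
Outside SPARQL, the same resolution step can be sketched in a few lines of Python. This assumes the literal holds a relative reference and that the base of the document is known; the base used here is made up, and literal_to_iri is a hypothetical helper name. Note that urllib.parse performs ordinary URL reference resolution and does no IRI-specific processing.

```python
from urllib.parse import urljoin


def literal_to_iri(literal, base):
    """Resolve the lexical form of a literal against a base, as a string."""
    # An absolute reference is returned unchanged; a relative one is
    # resolved against the base.
    return urljoin(base, literal)


# Hypothetical base for the page carrying the microdata.
iri = literal_to_iri("foo-fighters-rope.html", "http://example.org/music/")
```

With the assumed base, the content value from the issue's example resolves to http://example.org/music/foo-fighters-rope.html.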

> 
> SPARQL does add a fair burden on a processor or client, substantially increasing the performance requirements of generating an appropriate output graph. Using syntactic constructs, such as those I described for literal coercion and subject insertion, can be more performant and produce fewer "junk" triples. Your query above, for example, would duplicate the triples for an item for each schema:url encountered and leave the original with its blank node. But I don't know how we could accommodate this in RDFa.
> 
> BTW, I think we can simplify your query and count on duplicate triples being rejected and that BNodes are existential quantifiers in SPARQL:
> 
> CONSTRUCT { ?s ?p ?o }
> WHERE { [ schema:url ?s; ?p ?o ] }
> 

Yes, that is probably true.

Coming back to the bigger picture. The way I see it now, the microdata+vocabulary world is fairly messy (at least for my taste). It seems to be fairly impossible to define a registry format that would cover *all* the different possible mappings of vocabularies. The quite ad-hoc usage of the url property in schema.org is a sign of that. Hence my idea of pushing this somewhere else, so to say. (I see your point that *some* features may need a registry entry, essentially how to manage the URI in itemtype, as well as the lists. I would really like to stop there, and push everything else into the application domain if possible.)

But, after all: what we are talking about is *an* RDF representation of a microdata-annotated HTML file; a representation that makes sense, though it may not be an exact replica of the microdata data model (which seems to be quite inadequate for the RDF world). However... considering schema.org, and using the simple mapping that we had in mind, what we'd get is

[
   a <SomeSchemaType>;
   schema:url <URI of the entity>;
   ...
]

From an RDF point of view I do not see anything wrong with that representation! The 'semantics' of schema:url being documented, it probably also entails that there ought to be one (and only one?) such statement for a particular object; with those constraints, as a consumer of the schema.org RDF I can do whatever I want in my RDF-based application, I can happily combine it with other data, display it, etc.
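
As an illustration of what a consumer could do on its side, here is a small Python sketch that checks the "one and only one" constraint on the converted graph. It is a sketch under assumptions: triples are (subject, predicate, object) tuples of strings held in a set, the sample data is invented, and items_violating_single_url is a hypothetical name.

```python
from collections import defaultdict

SCHEMA_URL = "http://schema.org/url"


def items_violating_single_url(triples):
    """Return the subjects that do not carry exactly one schema:url value."""
    counts = defaultdict(int)
    subjects = set()
    for s, p, o in triples:
        subjects.add(s)
        if p == SCHEMA_URL:
            counts[s] += 1
    # A subject with no schema:url triple has a count of 0.
    return sorted(s for s in subjects if counts[s] != 1)


# Invented sample: b0 has one url, b1 has two, b2 has none.
sample = {
    ("_:b0", SCHEMA_URL, "http://example.org/a"),
    ("_:b1", SCHEMA_URL, "http://example.org/b"),
    ("_:b1", SCHEMA_URL, "http://example.org/c"),
    ("_:b2", "http://schema.org/name", '"No URL"'),
}
violations = items_violating_single_url(sample)
```

What to do with the violating items (ignore them, pick one value, report an error) is exactly the kind of decision that would stay with the application.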

The problem with my original approach is that we do not have (sadly!) a proper, and well accepted, syntax to express rules for RDF. (This has a long history, let us not go there.) Hence my usage of SPARQL. We can, of course, take a *much* simpler approach of using a very simple syntax, much like what you used (and what is in the RDF Semantics document), but it will take quite some work to do that properly, and to define the behaviour of the microdata->RDF converters... So I wonder whether this should not be left to the individual applications instead.

Just musing...

Ivan



> Gregg
> 
>> Ivan
>> 
>> 
>> On Nov 5, 2011, at 19:21 , HTML Data Task Force Issue Tracker wrote:
>> 
>>> 
>>> htmldata-ISSUE-4 (Property as subject): Should the registry allow a property name or URI to be used as an alias for @itemid [Microdata to RDF]
>>> 
>>> http://www.w3.org/2011/htmldata/track/issues/4
>>> 
>>> Raised by: Gregg Kellogg
>>> On product: Microdata to RDF
>>> 
>>> Schema.org provides a 'url' property which, in practice, is used to set the subject for an item. Moreover, in many examples, the property is used with a literal content model, rather than a URI content model.
>>> 
>>> For example, the following use case is common in schema.org examples:
>>> 
>>> 
>>> <div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
>>> <span itemprop="name">Rope</span>
>>> <meta itemprop="url" content="foo-fighters-rope.html">
>>> ...
>>> </div>
>>> 
>>> In this case the @content attribute is used where the value is expected to be a URI. And, it is clear that this URI is intended as the subject of the item.
>>> 
>>> A registry entry could be created which would affect the processing of a microdata processor by specifying a content model for the property (URI reference) and that it is to be used as the subject of an item. Note that there is a special case where the item already has an @itemid attribute, or there is more than one 'url' property value. This could be resolved by using the first property value only if the item has no @itemid.
>>> 
>>> The suggested behavior would be to use the first 'url' property value both as the item subject and as a property and subsequent values as a property only.
>>> 
>>> For example, the previous microdata would produce the following Turtle:
>>> 
>>> <foo-fighters-rope.html> a schema:MusicRecording;
>>> schema:name "Rope";
>>> schema:url <foo-fighters-rope.html> .
>>> 
>>> A possible JSON representation of a registry that identifies this could be the following:
>>> 
>>> {
>>> "http://schema.org/": {
>>>  "propertyURI": "vocabulary",
>>>  "multipleValues": "unordered",
>>>  "@context": {
>>>    "url": { "@datatype": ["@subject", "@uri"]},
>>>    "dateCreated": {"@datatype": "http://www.w3.org/2001/XMLSchema#date"},
>>>    ...
>>>  }
>>> }
>>> }
>>> 
>>> This notation borrows some concepts from the JSON-LD context, but it is intended for discussion and is not proposed as a repository syntax. As the range is described, any use of this property would treat the value as a URI reference. As a by-product, this can be used for URIs (or IRIs) which are not URLs.
>>> 
>>> The 'url' refers to http://schema.org/url, and is defined both as having a URI reference data range, and to be used as an alias for the item subject. In contrast, http://schema.org/dateCreated is defined as having an xsd:date range, which would cause the resulting literal to have the associated datatype.
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>> 
>> 
>> 
>> 
>> 
> 


Received on Monday, 7 November 2011 08:54:29 UTC