Re: Proposal to resolve ISSUE-1 from Gregg Kellogg on 2011-11-02 (public-html-data-tf@w3.org from November 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Tue, 1 Nov 2011 22:26:35 -0400
To: Ivan Herman <ivan@w3.org>
CC: HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-ID: <4CFAB3A3-9D15-4424-9535-E6141C9A14AD@greggkellogg.net>
On Nov 1, 2011, at 8:34 AM, Ivan Herman wrote:

> 
> On Oct 31, 2011, at 20:33 , Gregg Kellogg wrote:
> 
>> On Oct 31, 2011, at 2:04 AM, Ivan Herman wrote:
>> 
>>> Gregg,
>>> 
>>> I am not sure I understand exactly what you mean by the various choices. Can you give examples with some well known vocabularies, like schema, dc, foaf, vcard, gr?
>> 
>> Looking at the examples in [2] might be useful. If we had the following in the registry:
> 
> Thanks. It helps. 
> 
> [snip]
>> 
>> 
>>> I am also concerned by the fact microdata->rdf converters would have to consult the registry (ok, it can be cached, but nevertheless) for each and every @itemtype. This may be prohibitive. Also, a default mechanism should be made available in case the registry is unreachable, and this default should correspond to the most frequent usage (my feeling is that the most frequent usage was the approach you originally had, namely that the vocabulary base URI can be deduced from the @itemtype URI by cutting it back from its last component).
>> 
>> If there were a way to meet the requirements without a registry, that would be great. Aside from the WHATWG HTML5 spec defining vocabularies which use the _contextual_ URI generation scheme, there's schema.org's expansion [3], which makes detecting the proper vocabulary more challenging. Basically, a type (or property) can be expanded by adding to its URI path:
>> 
>> <http://schema.org/Person> could be a basic type, while <http://schema.org/Person/Deceased> could be an application-specific sub-type of Person. Without a registry, it would be more difficult to determine the proper URI prefix to use for the vocabulary.
> 
> Ouch. So _that_ is the real reason we need a differentiation between 'vocabulary' and 'type'...

Actually, other than the vCard and Event vocabularies in the (WHATWG version) HTML5 spec, I don't really know what vocabularies would use the 'type', and those might need contextual. However, they're not in the W3C spec, and I don't really think it's appropriate to have microformat vocabularies (with the MF namespace) in a W3C spec anyway. But a registry is still required to know unambiguously what the vocabulary is.

> B.t.w., your algorithm seems to fall back to 'contextual' as a default. I am not sure I would agree with that. I regard that case as being very rarely used in practice, if at all. 

Yes, this hasn't received any other support, and as there's been no other consequential feedback on this, I'm inclined to change the default choice to something that uses the previous heuristic for determining the vocabulary...

>> Another thought I raised earlier was to do away with special URI processing rules and settle on one in particular (the previous, I would think).
> 
> Previous being?

The previous algorithm was to strip everything after the last '/' or '#' in the @itemtype URI and use what remains as the base for resolving non-absolute URI property names.
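
As a rough illustration only (not taken from the spec draft), that heuristic could be sketched in Python as below, reading it the way Ivan describes it above: the @itemtype URI is cut back from its last component. The function names `vocab_base` and `property_uri` are hypothetical.

```python
from urllib.parse import urlparse


def vocab_base(itemtype: str) -> str:
    """Cut the @itemtype URI back to just after its last '/' or '#'."""
    cut = max(itemtype.rfind("/"), itemtype.rfind("#"))
    return itemtype[: cut + 1] if cut >= 0 else itemtype


def property_uri(itemtype: str, name: str) -> str:
    """Resolve a property name against the base derived from @itemtype."""
    # Assumption: a name that parses with a scheme is already an absolute URI
    # and is used unchanged.
    if urlparse(name).scheme:
        return name
    return vocab_base(itemtype) + name


print(property_uri("http://schema.org/Person", "name"))
print(property_uri("http://schema.org/Person/Deceased", "name"))
```

The second call yields a property under http://schema.org/Person/ rather than http://schema.org/, which is the schema.org expansion problem discussed above.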

>> Then we could rely on pre-defined prefixes, equivalent to those defined in the RDFa default context, to allow for saner URI production. So for example, we could have the following:
>> 
>> <div itemscope itemtype="schema:Person/Deceased">
>> <span itemprop="schema:name">Jane Doe</span>
>> <img itemprop="schema:image" src="janedoe.jpg" />
>> </div>
>> 
> 
> I think we can forget about that. Prefixes will never be accepted in the microdata context.
> 
> That being said... My personal feeling is that
> 
> - a vast majority of non-schema.org vocabularies will work with 'type' (I would take that as a default)

Actually, no, the majority would be 'vocabulary'. Type would take a type such as 'http://schema.org/Person' with a property 'name' and create 'http://schema.org/Personname'. It's really intended more for things like the (hypothetical) http://microformats.org/hcard to create http://microformats.org/hcard#name.
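
To make the difference concrete, here is a minimal Python sketch of the _vocabulary_ and _type_ behaviors as they are defined in the original proposal quoted further down in this message. The function names are hypothetical, and `urllib.parse.quote` is only a stand-in for "fragment-escaping", which the proposal does not spell out.

```python
from urllib.parse import quote


def vocabulary_uri(prefix: str, name: str) -> str:
    """_vocabulary_: append the escaped property name to the URI prefix."""
    return prefix + quote(name, safe="")


def type_uri(itemtype: str, name: str) -> str:
    """_type_: append '#' and the escaped property name to the @itemtype URI."""
    # The proposal says this is only valid when @itemtype has no fragment.
    if "#" in itemtype:
        raise ValueError("_type_ generation needs an @itemtype without a fragment")
    return itemtype + "#" + quote(name, safe="")


print(vocabulary_uri("http://schema.org/", "name"))
print(type_uri("http://microformats.org/hcard", "name"))
```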

> - we can define that registry the same way as we have the default set of prefixes for RDFa; implementations will hard-wire those into their implementation. I personally do not expect a large number of vocabularies there; the microdata deployment, at the moment, is probably schema.org only in terms of real numbers...

I'd vote for schema.org, foaf, and goodrelations in the set. The real issue, which I've punted on, is how to establish and update this set, and exactly what format to use. I used JSON in my examples because it's easy to parse, but you could also argue for a human-readable registry in Microdata.
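
For illustration, a hard-wired registry of this kind might be consulted as sketched below. Everything about the layout is an assumption: the JSON shape (URI prefix mapped to a behavior name), the choice of _vocabulary_ for all three entries, and the longest-prefix-wins rule are not defined anywhere in this thread. The fallback value follows the original proposal quoted below, although the discussion above leans toward changing it.

```python
import json

# Hypothetical registry layout: URI prefix -> property URI generation behavior.
REGISTRY_JSON = """
{
  "http://schema.org/": "vocabulary",
  "http://xmlns.com/foaf/0.1/": "vocabulary",
  "http://purl.org/goodrelations/v1#": "vocabulary"
}
"""

REGISTRY = json.loads(REGISTRY_JSON)


def lookup(itemtype, registry=REGISTRY, fallback="contextual"):
    """Return (matched prefix, behavior) for an @itemtype URI."""
    matches = [prefix for prefix in registry if itemtype.startswith(prefix)]
    if not matches:
        # No registered prefix applies: use the fallback behavior.
        return None, fallback
    # Assumption: when several prefixes match, the longest one wins.
    prefix = max(matches, key=len)
    return prefix, registry[prefix]


print(lookup("http://schema.org/Person/Deceased"))
print(lookup("http://example.org/Thing"))
```

Because the table is embedded in the implementation, no network access is needed per @itemtype, which addresses the caching concern raised earlier.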

Gregg

> Ivan
> 
> 
> 
>> Much simpler than requiring a registry. There's still the issue of multi-valued properties. I would support no longer placing them in RDF Collections, even though that puts us out of conformance with Microdata ordering; I think unordered values are much more what people want to use. That said, the unordered form could always be derived from the collections with post-processing entailment rules such as the following:
>> 
>> sss ppp bbb . bbb rdf:rest rrr => sss ppp rrr .
>> sss ppp bbb . bbb rdf:first vvv => sss ppp vvv .
>> 
>> And then ignoring or removing the BNode values.
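
A minimal Python sketch of those two rules, applied to a fixpoint over triples held as plain tuples, might look as follows. The function name is hypothetical, and two details are assumptions rather than part of the rules as written: the rules are not applied when ppp is itself rdf:first or rdf:rest, and rdf:nil is dropped together with the list nodes.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def flatten_lists(triples):
    """Apply the two entailment rules, then drop the list scaffolding."""
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, b in list(graph):
            if p in (FIRST, REST):
                continue  # assumption: do not rewrite the lists themselves
            for b2, q, o in list(graph):
                # sss ppp bbb . bbb rdf:first|rdf:rest ooo => sss ppp ooo .
                if b2 == b and q in (FIRST, REST) and (s, p, o) not in graph:
                    graph.add((s, p, o))
                    changed = True
    # "Ignoring or removing the BNode values": drop list nodes and rdf:nil.
    list_nodes = {s for s, p, _ in graph if p in (FIRST, REST)} | {NIL}
    return {
        (s, p, o)
        for s, p, o in graph
        if p not in (FIRST, REST) and o not in list_nodes
    }


example = {
    ("_:s", "ex:p", "_:l1"),
    ("_:l1", FIRST, "a"),
    ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "b"),
    ("_:l2", REST, NIL),
}
print(sorted(flatten_lists(example)))
```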
>> 
>> Gregg
>> 
>>> Ivan 
>>> 
>>> 
>>> On Oct 28, 2011, at 19:23 , Gregg Kellogg wrote:
>>> 
>>>> I am preparing an update to the Microdata to RDF specification. I propose we resolve ISSUE 1 as follows:
>>>> 
>>>> We define a registry mapping URI prefixes to property URI generation behavior with possible values of _vocabulary_, _type_, or _contextual_. @itemtypes which begin with a registered URI prefix will use the associated property URI generation behavior when generating property URIs, and otherwise fall back to _contextual_.
>>>> 
>>>> We also add a mapping from a URI prefix to the mechanism for serializing multi-valued properties with possible values _unordered_ and _list_.
>>>> 
>>>> The format of the registry is undefined, as is the update process. I think this is really a fairly complicated issue, and probably beyond the scope of this TF.
>>>> 
>>>> (Note there is some debate on whether "registry" is the proper term; I'm sticking with it for now.)
>>>> 
>>>> For non-URI property names:
>>>> 
>>>> _vocabulary_ URI generation constructs a URI by appending fragment-escaped property names to the URI prefix.
>>>> 
>>>> _type_ URI generation constructs a URI by appending '#' and the fragment-escaped property name to the @itemtype URI. This is only valid for @itemtype URIs which do not, themselves, contain a fragment.
>>>> 
>>>> _contextual_ URI generation uses the original property URI generation algorithm from [1].
>>>> 
>>>> 
>>>> When generating triples for multi-valued properties, given _subject_ and _predicate_, the list of values is serialized as follows:
>>>> 
>>>> _unordered_ generates a triple with _subject_, _predicate_ and _value_ for each _value_ in the list of values.
>>>> 
>>>> _list_ generates an RDF Collection.
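
The two serialization modes above can be sketched in Python as follows. This is an illustration under stated assumptions, not the spec algorithm: the function name and the `_:listN` blank node labels are hypothetical, and triples are represented as plain tuples.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def serialize_values(subject, predicate, values, mode):
    """Generate triples for a multi-valued property in the given mode."""
    values = list(values)
    if mode == "unordered":
        # One triple per value; document order is not preserved in the graph.
        return [(subject, predicate, v) for v in values]
    if mode == "list":
        # RDF Collection: a chain of blank nodes linked by rdf:first/rdf:rest.
        # Hypothetical labels; a real converter would mint fresh blank nodes.
        nodes = [f"_:list{i}" for i in range(len(values))] + [NIL]
        triples = [(subject, predicate, nodes[0])]
        for i, v in enumerate(values):
            triples.append((nodes[i], FIRST, v))
            triples.append((nodes[i], REST, nodes[i + 1]))
        return triples
    raise ValueError(f"unknown mode: {mode}")


print(serialize_values("_:s", "ex:p", ["a", "b"], "unordered"))
print(serialize_values("_:s", "ex:p", ["a", "b"], "list"))
```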
>>>> 
>>>> I'm marking the issue as PENDINGREVIEW.
>>>> 
>>>> Gregg
>>>> 
>>>> [1] http://www.w3.org/2011/htmldata/track/issues/1
>> [2] https://dvcs.w3.org/hg/htmldata/raw-file/74bd1c88b77d/microdata-rdf/index.html#markup-examples
>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C Semantic Web Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 2 November 2011 02:29:53 UTC