Re: My review of RDFa Core 1.1 (2011-12-15 version) from Niklas Lindström on 2012-01-26 (public-rdfa-wg@w3.org from January 2012)

From: Niklas Lindström <lindstream@gmail.com>
Date: Thu, 26 Jan 2012 03:08:00 +0100
To: Shane McCarron <shane@aptest.com>
Cc: public-rdfa-wg@w3.org
Message-ID: <CADjV5jfbckOFiPEwZ2-Xs6Vt8wcu1z8CG2tnFQok0m93A4WS8A@mail.gmail.com>
Hello Shane,

On Wed, Jan 25, 2012 at 12:04 AM, Shane McCarron <shane@aptest.com> wrote:
> My comments are in line.  I have mostly integrated your changes as you
> requested.  Note that there are questions in here for Nicklas, Manu, and
> Ivan.  Please look carefully and answer the questions directed at you.

That's really great!

Relevant parts kept and commented inline below. There are some points
which need further consideration, and it would be good to have input
from more WG members on those.


>> "2.2 Examples"
>> --------------
>>
>> * The first example uses the terms "author", "prev" and "next". But
>> these aren't mapped to property IRIs anymore, right? If they don't I
>> believe it's a poor example. Although I'm a bit confused by the
>> XHTML+RDFa 1.1 spec which still links to
>> <http://www.w3.org/2011/rdfa-context/xhtml-rdfa-1.1>, defining
>> these... In any case, it is probably confusing to have this in RDFa
>> Core 1.1 if it relies on terms defined for XHTML only...
>
>
> These examples are all properly in XHTML+RDFa and I am loathe to change them
> at this time.  The terms are defined for XHTML+RDFa and that is the only
> language we have any control over.  In particular when talking about terms
> we want to have some concrete examples, and the base only defines 3.

Ok. Then the initial paragraph here should say "In XHTML 1.1", since
these terms aren't available in HTML5 (nor XHTML5), right?


>> * I find the example a bit awkward since it builds up an event by
>> first implying that the current document is the event, before
>> enclosing it as a bnode of type cal:Vevent..
>
>
> I removed this example in favor of using something about books to show
> typeof as per a suggestion from Manu

Good. But there are still two examples above that use the cal:summary
and cal:dtstart properties to describe the current document
(compare to the full event described in section 8). Perhaps using
something like:

    <body>
      <h1 property="dc:title">My home-page</h1>
      <p>Last modified: <span property="dc:modified"
              content="2015-09-16T16:00:00-05:00"
              datatype="xsd:dateTime">today</span>.</p>
    </body>

is better?


>> * Is it wise to use urn:ISBN URIs for the bibo:Book examples? I'd
>> rather see even example.org IRIs, or ideally stable real world IRIs
>> from some linked data library service..
>
>
> We want to show that these sorts of references are legitimate.

Ok.


>> "3.4 Plain literals"
>> --------------------
>>
>> * As I already brought up, the description of literals in section "3.4
>> Plain literals" isn't entirely correct. It might be good to add an
>> example of a literal with language related to the ongoing example
>> here, such as:
>>
>>     <http://dbpedia.org/resource/German_Empire>
>>         rdfs:label "German Empire"@en;
>>         rdfs:label "Deutsches Kaiserreich"@de .
>
>
> I wasn't smart enough to do this in the time I had.  If you want to provide
> specific text I am happy to stick it in.  It is an editorial change and we
> can do it at any time.

How about this for "3.4 Plain literals": [[[

Although IRI resources are always used for subjects and predicates,
the object part of a triple can be either an IRI or a literal. In the
example triples, Einstein's name is represented by a plain literal,
specifically a basic string with no type or language information:

    <http://dbpedia.org/resource/Albert_Einstein>
      <http://xmlns.com/foaf/0.1/name> "Albert Einstein" .

A plain literal can also be given a language tag, to capture plain
text in a natural language. For example, Einstein's birthplace has
different names in English and German:

    <http://dbpedia.org/resource/German_Empire>
      rdfs:label "German Empire"@en;
      rdfs:label "Deutsches Kaiserreich"@de .

]]]


>> "3.6 Turtle"
>> ------------
>>
>> * Perhaps the first two examples should include the full data being
>> discussed for the sake of completeness?
>
>
> I couldn't decide what was missing.

The full data (including the suggested language literals above) seems to be: [[[

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbp: <http://dbpedia.org/property/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://dbpedia.org/resource/Albert_Einstein>
  foaf:name "Albert Einstein";
  dbp:birthPlace <http://dbpedia.org/resource/German_Empire>;
  dbp:dateOfBirth "1879-03-14"^^<http://www.w3.org/2001/XMLSchema#date>;
  foaf:depiction <http://en.wikipedia.org/wiki/Image:Albert_Einstein_Head.jpg> .

<http://dbpedia.org/resource/German_Empire>
  rdfs:label "German Empire"@en;
  rdfs:label "Deutsches Kaiserreich"@de .

]]]

The second example could probably do with just the data needed to
illustrate abbreviation of the subject and datatype, i.e.: [[[

@prefix dbp: <http://dbpedia.org/property/> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

dbr:Albert_Einstein dbp:dateOfBirth "1879-03-14"^^xsd:date .

]]]


>> "6. CURIE Syntax Definition"
>> ----------------------------
>>
>> * The following is stated:
>> [[[
>> A CURIE is comprised of two components, a prefix and a reference. The
>> prefix is separated from the reference by a colon (:). In general use
>> it is possible to omit the prefix, and so create a CURIE that makes
>> use of the 'default prefix' mapping; in RDFa the 'default prefix'
>> mapping is http://www.w3.org/1999/xhtml/vocab#. It's also possible to
>> omit both the prefix and the colon, and so create a CURIE that
>> contains just a reference which makes use of the 'no prefix' mapping.
>> This specification does not define a 'no prefix' mapping. RDFa Host
>> Languages must not define a 'no prefix' mapping.
>> ]]]
>>
>> I find this confusing on three accounts:
>>
>> - Is the default prefix mapping set? Shouldn't it be possible to set
>> it to what ever the default *vocabulary* is? So that someone can use
>> e.g.:
>>
>>     <a vocab="http://example.org/vocab#" rel=":describedby" href="">
>>
>> To mean:
>>
>>     <>  <http://example.org/vocab#describedby>  <>  .
>
>
> No.  In CURIEs :foo ALWAYS references the XHTML vocabulary.  We provide no
> way  to override this.

I see. So this means that there is no way to create such a triple
using @vocab (due to 'describedby' being a predefined term)? I.e. for
that, one has to use CURIEs with non-empty prefixes or full IRIs?


>> - This definition of CURIEs state that terms are also CURIEs, does it not?
>
>
> No.  TERMS are things that are looked for BEFORE CURIEs are matched.

Ok. The interplay of concepts is quite intricate here. It seems to me
that terms overlap with CURIEs that have neither a prefix nor a
leading ":"? I had the idea that terms were distinct from CURIEs in
that the latter always started with a (possibly empty) prefix and ":".
I assumed this since some terms are predefined and others are resolved
against the local default vocab (as defined in section "7.4.3 General
Use of Terms in Attributes"). But clearly not all references are terms.

So it seems that certain expressions like "item@a" or "value#b" would
fail to match the rules for terms, but are valid CURIEs. How would
they then be resolved? From the next piece I conclude: not at all.


>> - Although I interpret "RDFa Host Languages must not define a 'no
>> prefix' mapping" to mean TODO
>
>
> Not sure where you were going with this, but...

(Ugh; apparently I left this incomplete. I'm so sorry.) Well, at the
time I thought that host languages could, in their default context,
define such a mapping with the default vocabulary mechanism. But that
is only for terms, and not for this 'no prefix' mapping; right? (And
as said above, I conclude that those two expressions ("item@a" and
"value#b") would not resolve.)

Am I reading this right?
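
To make the question concrete, here is a minimal Python sketch of the
resolution order as read from this thread; it is not the spec's
algorithm. Assumptions (hypothetical throughout): TERM_RE is an
ASCII-only approximation of the term production; whether a local
default vocabulary is checked before or after predefined terms is
exactly what is unclear above, so it is a parameter (vocab_first);
the case-insensitive term fallback, "_:" bnodes and IRI validation
are left out.

```python
import re
from typing import Dict, Optional

# Assumption: ASCII-only approximation of the RDFa 1.1 'term' production
# (a name start character, then name characters or '/', but never ':').
TERM_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-/]*$")

# ':foo' always expands against the XHTML vocabulary (per the reply above).
XHV = "http://www.w3.org/1999/xhtml/vocab#"


def resolve(value: str, prefixes: Dict[str, str], terms: Dict[str, str],
            vocab: Optional[str] = None, vocab_first: bool = True) -> Optional[str]:
    """Hypothetical helper: return the IRI for an attribute token, or None."""
    if TERM_RE.match(value):
        # Term: local default vocabulary or predefined term mapping.
        # Their relative order is an assumption, controlled by vocab_first.
        if vocab is not None and (vocab_first or value not in terms):
            return vocab + value
        return terms.get(value)
    if ":" in value:
        prefix, reference = value.split(":", 1)
        if prefix == "":
            return XHV + reference          # the 'default prefix' mapping
        if prefix in prefixes:
            return prefixes[prefix] + reference
        return value                        # treated as an absolute IRI (not validated)
    # Neither a term nor a prefixed CURIE, and there is no 'no prefix'
    # mapping, so tokens like "item@a" or "value#b" do not resolve.
    return None


TERMS = {"describedby": XHV + "describedby"}
print(resolve("describedby", {}, TERMS, vocab="http://example.org/vocab#"))
print(resolve("value#b", {}, TERMS))
```

With vocab_first=False, "describedby" stays bound to the XHTML
vocabulary even under @vocab, which is the reading behind the question
about 'describedby' above.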


>> * I would like to see a note here about CURIEs effectively working
>> like protocol shorthands, with appropriate warnings about how they
>> *may* overshadow existing or future protocols (especially profiles
>> with many prefixes could cause this in a non-obvious way). This is
>> what we discussed when we resolved ISSUE-90 [1]. Note though that the
>> resolution of ISSUE-125 [2] might remove the need for this.
>
>
> Added some text

Great, this was definitely needed. However, I'm not satisfied with it yet.

 - In "it is possible though unlikely, that schemes will be introduced
in the future that will conflict with prefix mappings defined in a
document", can we really say "unlikely"? The creation of schemes is
entirely independent of the use of prefixes in RDF contexts, so we
really don't know. We've already seen, e.g., "http", "geo", "tag" and
"urn" registered as prefixes at <http://prefix.cc/>. Perhaps we could
say: "it is possible, though unlikely, that popular schemes will be
introduced in the foreseeable future that will conflict with (popular)
prefix mappings".

All of this requires monitoring and interception by people aware of
both contexts. This note is where we raise awareness of this need for
coordination. Of course problems won't appear overnight, maybe not
for many years; hopefully never. (And I do hope that the creation of
new IRI schemes will continue to decrease in popularity.) I gather
that our position is to expect people to understand this and inform
each other early on. (I'm just not sure.)

 - The example uses an @href, but those cannot contain CURIEs, so they
are safe. The attributes of concern are @about and @resource. (Of
course, if CURIEs were to catch on and become available in
"actionable" link attributes, the situation would be worse.)

 - "In neither case would this RDFa overshadowing of the underlying
scheme alter the way other consumers of the IRI treat that IRI." I
don't follow. Do you mean in the sense of consumers *not* using RDFa,
only the lexical value? That seems irrelevant to me for our purposes.

 - "It could, however, mean that the document author's intended use of
the CURIE is misinterpreted by another consumer as an IRI." That
should probably be: "It does mean that the document author's intended
use of an IRI is misinterpreted, since any RDFa consumer would expand
that as a CURIE and get a different IRI as a result."

Note that in attributes where a prefix overshadows a scheme, the
resulting IRI in the RDF data is irrevocably different from its
"CURIE-that-looks-like-an-IRI" origin. A consumer of the resulting
RDF may never detect this, and it becomes even more untraceable if the
triples are transmitted further.
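
A minimal Python sketch of the overshadowing, assuming a processor
that expands a value as a CURIE whenever the part before ":" is a
mapped prefix, and otherwise keeps it as an IRI (the helper name is
hypothetical). Here "geo" is mapped to the W3C Basic Geo vocabulary,
a common binding, while the author meant a geo: URI:

```python
from typing import Dict


def expand_curie_or_iri(value: str, prefixes: Dict[str, str]) -> str:
    """Hypothetical helper: the CURIE reading wins over the IRI reading
    whenever the part before ':' is a mapped prefix."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value  # unmapped prefix: kept as the IRI the author wrote


GEO_VOCAB = "http://www.w3.org/2003/01/geo/wgs84_pos#"
intended = "geo:37.786971,-122.399677"  # a geo: URI, as the author meant it

print(expand_curie_or_iri(intended, {}))
# -> geo:37.786971,-122.399677
print(expand_curie_or_iri(intended, {"geo": GEO_VOCAB}))
# -> http://www.w3.org/2003/01/geo/wgs84_pos#37.786971,-122.399677
```

Nothing in the second result records that the value was ever read as
a CURIE, which is why downstream consumers cannot detect it.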

 - "The working group considers this risk to be minimal at worst." I'd
strike "at worst". (The notion "CURIE injection" comes to mind in a
scenario where some social networking giant starts to publish snippets
of RDFa for unassuming web administrators to use, poised against
another proprietary protocol of some major digital artifact vendor,
where the scheme and prefix happen to be the same. Of course I'm
really exaggerating here. But perhaps you see my point.)

(And while I fear there's little support for disallowing
"prefix://"-like forms from being CURIEs, *if* that were accepted,
then we should of course reformulate this note to highlight the
lessened risk, and explain which kinds of schemes (forms of IRIs) are
still at risk and require this care.)


>> "7.2 Evaluation Context"
>> ------------------------
>>
>> * Should it be explained that "parent subject" is only used, as far as
>> I can see, for handling incomplete triples? (Or am I missing
>> something?)
>
>
> They are, but I felt it was clear enough from the language.

Ok.


>> * Perhaps the concept "parent object" could be renamed to "parent
>> resource", due to the way it is used in the processing.
>
>
> I felt it was too late to change this.

Ok.


>> "7.4.2 General Use of CURIEs in Attributes"
>> -------------------------------------------
>>
>> * The note states: "An empty attribute value (e.g., typeof='') is
>> still a CURIE, and is processed as such.". Is that really true? Isn't
>> it rather so that certain attributes have meaning (effect on the
>> processing) even when empty? The notion of an empty CURIE strikes me
>> as strange.
>
>
> I was not sure how to change this.  While it is odd, it is important for
> many steps in the sequence that empty attributes are not ignored.

Many RDFa attributes express meaning even when empty, so their
presence is in itself important information.
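
For implementers, the practical consequence is that attribute
presence, not a truthy value, has to be tested. A minimal Python
sketch using the standard library's html.parser (the class name is
hypothetical; only @typeof is tracked):

```python
from html.parser import HTMLParser


class TypeofPresence(HTMLParser):
    """Hypothetical sketch: record whether @typeof is present on each start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # typeof="" yields '' (falsy) but is still present, so test
        # membership, never truthiness, before ignoring the attribute.
        self.seen.append((tag, "typeof" in attrs, attrs.get("typeof")))


parser = TypeofPresence()
parser.feed('<div typeof=""><span>text</span></div>')
parser.close()
print(parser.seen)  # [('div', True, ''), ('span', False, None)]
```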


>> * The last sentence "As a special case, _: is also a valid reference
>> for one specific bnode." is the only explanation of what "_:" means. I
>> think it should be elaborated a little upon, making it clear how it
>> works and why. (Also I was under the impression that it should
>> generate a unique bnode each time it is used (and not represent the
>> same bnode across the document), but that does not seem to be the
>> case?)
>
>
> I have no idea at all what to do here.

Nor do I. I've never used it; I think an empty @typeof fulfills my
potential needs for what I thought it meant, and I don't really see
why a kind of bnode "singleton" would be useful at all. Can anybody
explain what it means and what it is used for?


>> "7.5 Sequence"
>> --------------
>>
>> * Step 1 (and 3). Shouldn't the local list of IRI mappings actually be
>> set to *a copy of* the list of IRI mappings from the evaluation
>> context? In step 3, this local list is mutated by adding to it, so we
>> must ensure that someone implementing it like this doesn't affect the
>> list for following sibling elements. Either that or express "adding to
>> the local list" differently, in functional terms.
>
>
> This is not really an issue.  'local' is local to each iteration of the
> depth-first processing loop.  Nothing is passed 'by reference' in this
> algorithm.

Ok, I see what you mean. Still, since it reads "the local list of IRI
mappings is set to the list of IRI mappings from the evaluation
context", and then "and these are added to the local list of IRI
mappings", might one not get that impression? It concerned me since in
step 13, the new evaluation context is either "a copy of the current
context that was passed in to this level", or it is constructed again.
Well, I may be splitting technical hairs here, and I suppose
implementers won't actually be misled into mutating a shared list for
all levels. And if they still do, basic testing will quickly show them
the error in that. :)
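
A minimal Python sketch of the non-mutating reading, where each level
of the depth-first walk works on its own copy of the inherited IRI
mappings. Element and walk() are hypothetical, and only prefix
declarations are modelled:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Element:
    name: str
    prefixes: Dict[str, str] = field(default_factory=dict)  # hypothetical parsed @prefix
    children: List["Element"] = field(default_factory=list)


def walk(element: Element, inherited: Dict[str, str],
         out: List[Tuple[str, List[str]]]) -> List[Tuple[str, List[str]]]:
    # Local list of IRI mappings: a *copy*, so additions made here never
    # leak into the caller's mappings or reach following siblings.
    local = dict(inherited)
    local.update(element.prefixes)
    out.append((element.name, sorted(local)))
    for child in element.children:
        walk(child, local, out)
    return out


doc = Element("root", children=[
    Element("a", {"foaf": "http://xmlns.com/foaf/0.1/"}),
    Element("b"),
])
print(walk(doc, {}, []))  # [('root', []), ('a', ['foaf']), ('b', [])]
```

If local were the inherited dict itself (no copy), sibling "b" would
also see "foaf", which is the mistake the wording might invite.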


>> * Step 9. Shouldn't there be an error or warning if both @rev and
>> @inlist are present?
>
>
> No - @rev is not relevant to @inlist, but an @rev on the element might still
> generate triples relevant to the subject (think of @rev, @rel and @inlist on
> the same element).

Ah, of course; very good point! Thanks.


>> * Step 10. It would be good to explain why "Also, current object
>> resource should be set to a newly created bnode".
>
>
> Fixed

Hm...  It now says: "Also, current object resource should be set to a
newly created bnode (so that the incomplete triples have a subject to
connect to if they are ultimately turned into triples)".

But I thought that the subject for the incomplete triples is to be the
*parent subject* resource (and the very reason for it to exist). Note
that this newly created bnode in step 13 is used for the next parent
*object*. All of this to support:

    <div about="#parent_subject" rel="dc:hasPart">
        <p property="rdfs:label">anonymous object</p>
    </div>

To generate:

    <#parent_subject> dc:hasPart [ rdfs:label "anonymous object" ] .

In the more common(?) cases of hanging rels, where nested markup
defines a new resource with @about, @typeof or a resource attribute,
this newly created bnode will be replaced and forgotten by that/those.

It's late, though, so I need someone to verify this! If I'm not totally
lost, the explanation for the newly created bnode should probably say
something like: "so that the incomplete triples have an object to
connect to if the first triple-generating item encountered in a
subsequent level is a predicate". (Well, some legible wording of
that.)
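
A heavily simplified Python sketch of this reading of the hanging-rel
path, assuming only @about, @rel and @property (with text content)
matter: no @typeof, @resource, @href, @rev, @inlist, datatypes or skip
element. El, process() and run() are hypothetical:

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[Optional[str], str, str]


@dataclass
class El:
    """Hypothetical element model: only @about, @rel and @property (with text)."""
    about: Optional[str] = None
    rel: Optional[str] = None
    prop: Optional[str] = None
    text: str = ""
    children: List["El"] = field(default_factory=list)


def process(el: El, parent_subject: Optional[str], parent_object: str,
            incomplete: List[str], triples: List[Triple], bnodes: Iterator[int]) -> None:
    # New subject: @about if present, otherwise inherit the parent object.
    new_subject = el.about if el.about is not None else parent_object
    # Complete the parent's incomplete triples (forward direction only):
    # parent subject -> predicate -> new subject.
    for predicate in incomplete:
        triples.append((parent_subject, predicate, new_subject))
    local_incomplete: List[str] = []
    current_object = None
    if el.rel is not None:
        # Hanging @rel: record an incomplete triple and mint a bnode that
        # nested content can use as its subject (i.e. the triple's object).
        local_incomplete.append(el.rel)
        current_object = f"_:b{next(bnodes)}"
    if el.prop is not None:
        triples.append((new_subject, el.prop, el.text))
    # Next parent object: the minted bnode if any, else the new subject.
    next_object = current_object if current_object is not None else new_subject
    for child in el.children:
        process(child, new_subject, next_object, local_incomplete, triples, bnodes)


def run(root: El) -> List[Triple]:
    triples: List[Triple] = []
    process(root, None, "", [], triples, itertools.count())
    return triples


doc = El(about="#parent_subject", rel="dc:hasPart",
         children=[El(prop="rdfs:label", text="anonymous object")])
for triple in run(doc):
    print(triple)
```

In the nested-@about case the child's own subject completes the
incomplete triple instead, and the minted bnode is simply never used.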


>> "7.6 Processor Status"
>> ----------------------
>>
>> * While this section is good, and the concept of a processor graph in
>> general can be useful, I do wonder if it's really tangential to
>> defining the core of RDFa processing? Have we considered placing it in
>> a separate document? If I understand correctly, it came to life as a
>> part of the now removed custom profile processing mechanics? Mostly
>> thinking out loud here though.
>
>
> It is needed for warnings still.

Ah, yes, of course.


>> * Section "10. RDFa Vocabulary Expansion" should include the relevant
>> triple from the cc vocabulary used for expansion?
>
>
> I think it does.    It shows the dc: reference that the cc: item refers to.

True, but I meant the triple causing this to expand. That is, in the
RDFa (nice!) from <http://creativecommons.org/ns#>, this triple:

   cc:license rdfs:subPropertyOf dct:license .

is specifically what makes the expansion infer dc:license from cc:license.
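
A minimal Python sketch of just the rdfs:subPropertyOf part of
vocabulary expansion, applied until nothing new is inferred; any other
entailment rules the section uses are omitted. Triples are plain
string tuples and expand() is hypothetical:

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
SUBPROP = "rdfs:subPropertyOf"


def expand(data: Set[Triple], vocab: Set[Triple]) -> Set[Triple]:
    # Superproperties per property, from the vocabulary's subPropertyOf triples.
    supers: Dict[str, Set[str]] = {}
    for s, p, o in vocab:
        if p == SUBPROP:
            supers.setdefault(s, set()).add(o)
    result = set(data)
    changed = True
    while changed:  # repeat to a fixpoint so chains of subPropertyOf are followed
        changed = False
        for s, p, o in list(result):
            for sup in supers.get(p, ()):
                if (s, sup, o) not in result:
                    result.add((s, sup, o))
                    changed = True
    return result


data = {("<doc>", "cc:license", "<http://creativecommons.org/licenses/by/3.0/>")}
vocab = {("cc:license", SUBPROP, "dct:license")}
for triple in sorted(expand(data, vocab)):
    print(triple)
```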


>> "B.4 Changes"
>> -------------
>>
>> Could this perhaps be merged with its only subsection "B.4.1 Major
>> differences with RDFa Syntax 1.0"?
>
>
> Hmm - maybe.  Not right now though.   Thinking about it.

Ok. Perhaps it'd be more valuable to have more than one subsection,
e.g. "Minor differences", "Complying to RDFa 1.0" or similar (though
that latter part in isolation is odd since it advises usage of
deprecated features like @xmlns).


That should be all.

Thanks for addressing my entire bulk of remarks! Impressive editorial
work (which I've gathered is your modus operandi).

Best regards,
Niklas
Received on Thursday, 26 January 2012 02:08:54 UTC