Re: Named Graphs in RDFa from Mark Birbeck on 2009-02-17 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Tue, 17 Feb 2009 09:54:47 +0000
To: Kjetil Kjernsmo <kjetil@kjernsmo.net>
Cc: RDFa <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <ed77aa9f0902170154h595e7258p31362d60b960a007@mail.gmail.com>
Hi Kjetil,

My apologies for the delay this time. :)

I probably didn't make it clear enough where the distinctions were
between our two approaches, for which I apologise.

So let's put Fresnel to one side -- I agree it's complicated, and if
that was to be part of a solution it would need to be greatly
simplified.

Also, put aside my jSPARQL technique -- that's just a way for one lot
of people to create 'formatters' that other people can use (the HTML
author wouldn't need to get involved in that, but that's by the by).

So let's simply say that there is _some_ templating language or
technique, but we don't yet know what it is...and in fact there may be
many.

Now, that leaves us with the core of our disagreement, which is that
in your technique, you would like to use a 'named graph' to identify
the templating rules, whilst I don't think that is necessary.

I do think that named graphs are an important concept and that RDFa
has the power to get it right. But I don't think that an RDFa document
should have the power to place triples into _any_ named graph, for
reasons of provenance (which I've mentioned).

But anyway, we can return to named graphs later. For this post I
want to stress that I believe the templating language (whatever it
might be) can be accessed by using 'well-known predicates', rather
than a named graph.

In other words, we don't actually _need_ to use named graphs as the solution.

To illustrate, in your example, you do this:

  <rat:graph xml:id="query1" endpoint="http://dbpedia.org/sparql">
    <div about="sub:resource">
      <div property="rdfs:label">Resource Description Framework</div>
      <div property="rdfs:comment" datatype="rdf:XMLLiteral">
        <rat:variable name="sub:comment"/>
      </div>
    </div>
  </rat:graph>

You want to overload the use of @xml:id (which, you'll notice, the
RDFa spec was *very* careful to avoid using, for reasons I won't go
into here), you want to add some extra elements and attributes to
XHTML, and you want to parse @about, etc., in different ways in
different contexts.

On the use of @xml:id for named graphs, as I said, I think we should
discuss that separately.

I think the biggest problem here is the use of an extra element, and
the fact that it seems unnecessary. The use of an extra element means
that a 'version 1' RDFa parser would parse your markup differently to
a 'rat-enabled' parser. The latter would realise that it has a
template, but the former would just parse @about as normal.

My main point, though, is that I don't think we need to do this at
all, because if we say that the core of your proposal is the notion of
expressing both a query and substitution template in one go, why not
just do this:

  <div typeof="rat:Template">
    <span rel="rat:endpoint" href="http://dbpedia.org/sparql"></span>
    <div property="rat:pattern" datatype="rdf:XMLLiteral">
      ...
    </div>
  </div>

Then all your processor has to do is to run through the triples
obtained from parsing the entire document, find any item of type
'rat:Template', and then begin processing the templates. (That's what
my processor does, except it looks for Fresnel rules, but the
principle is the main point here.)
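A minimal Python sketch of that discovery step, assuming the document has already been parsed into (subject, predicate, object) tuples with CURIEs left unexpanded. The rat: terms are the hypothetical vocabulary from the example above, and `find_templates` is an illustrative name, not part of any existing processor.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical 'well-known predicates' from the rat: example above.
TEMPLATE_CLASS = "rat:Template"
TEMPLATE_PROPS = ("rat:endpoint", "rat:pattern")


def find_templates(triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Return each subject typed rat:Template with its endpoint and pattern.

    The graph is only read, never altered, so the document yields the same
    triples whether or not the parser knows anything about templates.
    """
    templates: Dict[str, Dict[str, str]] = {
        s: {} for s, p, o in triples if p == "rdf:type" and o == TEMPLATE_CLASS
    }
    for s, p, o in triples:
        if s in templates and p in TEMPLATE_PROPS:
            templates[s][p] = o
    return templates


doc_triples = [
    ("_:t1", "rdf:type", "rat:Template"),
    ("_:t1", "rat:endpoint", "http://dbpedia.org/sparql"),
    ("_:t1", "rat:pattern", "<div>...</div>"),
    ("_:p", "rdf:type", "foaf:Person"),
]

for node, props in find_templates(doc_triples).items():
    print(node, props["rat:endpoint"])  # prints: _:t1 http://dbpedia.org/sparql
```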

You could even go further and indicate on an element which template to apply:

  <div about="[_:a]" typeof="rat:Template">
      ...
  </div>

  <div about="[_:b]" typeof="rat:Template">
      ...
  </div>

  <div rel="rat:applyTemplate" resource="[_:a]">
    ...
  </div>

  <div rel="rat:applyTemplate" resource="[_:b]">
    ...
  </div>

(And this would allow you to load templates from a library.)
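Applying the templates needs to know where each rat:applyTemplate triple came from in the DOM, which a processor can record while parsing. A Python sketch, assuming each triple is paired with a hypothetical element identifier; `plan_templates` and the element labels are illustrative only.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Located = Tuple[Triple, str]  # (triple, hypothetical id of its source element)


def plan_templates(located: List[Located]) -> Dict[str, str]:
    """Map each element carrying rat:applyTemplate to the template it names."""
    defined = {s for (s, p, o), _ in located
               if p == "rdf:type" and o == "rat:Template"}
    plan: Dict[str, str] = {}
    for (s, p, o), element in located:
        if p == "rat:applyTemplate":
            if o not in defined:
                # Not defined in this document; a processor might look in
                # an imported template library before giving up.
                raise KeyError(f"no template {o} for {element}")
            plan[element] = o
    return plan


# Assuming no @about on an enclosing element, both rat:applyTemplate triples
# in the markup above have the document itself ("<>" here) as subject, so
# only the recorded element location tells them apart.
located = [
    (("_:a", "rdf:type", "rat:Template"), "div#1"),
    (("_:b", "rdf:type", "rat:Template"), "div#2"),
    (("<>", "rat:applyTemplate", "_:a"), "div#3"),
    (("<>", "rat:applyTemplate", "_:b"), "div#4"),
]
print(plan_templates(located))  # {'div#3': '_:a', 'div#4': '_:b'}
```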

Of course, these triples would be part of the default graph, but in my
view that is correct, since it is this document that contains the
templates, i.e., they are part of the named graph that the
query/substitution belongs to. (To put it a different way, the
provenance of the templates is clear.)

With all of that as the context, I'll address some of your points
directly. Note one rider, though: when I read your template proposal I
mentally parsed the named graph support as if it was in line with the
document you wrote with Toby:

  <http://buzzword.org.uk/2009/rdfa4/spec>

That document proposes that a named graph can be 'any URI', which is
what I am very much arguing against, for the reasons I've outlined below.

However, I realise that the template proposal uses a kind of 'graph
collection' approach, which doesn't suffer from the same problems. I'd
like to discuss that separately, so as I've said, this email is mainly
about the fact that I don't think we need any new features in RDFa to
handle templates.


> Hi Mark! Thanks for the insightful comments, and sorry for taking so
> long to respond.

Not at all... now it's my turn to apologise. :)


>> I'm not convinced that named graphs are required to support the
>> use-case that you describe, and I'd like to show another approach to
>> templating that doesn't require them.
>
> I'm all ears!
>
>> But the only way this is manifest at the RDFa document level is that
>> the URI of the document becomes the named graph.
>
> Right!
>
> I think this is valuable too, and so I finally got around to actually
> reading the excellent spec and found that it mandates a single default
> graph, and I would not suggest that this is changed, as it would break
> both this useful feature and backwards compatibility. Thus, we are
> suggesting that the triples are in the named graphs in *addition* to
> the default graph.

The ability to support additional graphs is, of course, by design. This
allows RDFa processors to do other things, including adding their own
processing rules, whilst still retaining the ability to say in the
specification that 'a conforming processor must produce these
triples'.

For example, in my opinion the 'alt' attribute on the (X)HTML 'img' tag
could be regarded as an rdfs:label for the image:

  <img alt="Me on holiday" src="holiday.png" />

However, in past discussions there has been no agreement on that, and
so we were left with a tricky situation; if I was to add that feature
to my RDFa processor, then I would be non-conformant because my
processor does not produce the set of triples that the RDFa spec says
should be produced. But on the other hand, we don't want to stifle
innovation, and stop people generating additional triples that they
can do clever things with.

So by allowing me to add an rdfs:label to some separate graph in my
processor, I can still achieve 100% conformance -- because my default
graph matches the one described by the spec -- at the same time as
allowing me to experiment and try out new things.
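One way to picture this, as a Python sketch: the spec-mandated triples (assumed to come from an ordinary RDFa parse) go into the default graph untouched, while the alt-as-label heuristic writes to a separate graph. The graph names and the `process_page` and `promote` helpers are hypothetical, and the sample input simply has no spec triples for the image.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

DEFAULT = "default"            # must match the spec's triples exactly
EXPERIMENTAL = "experimental"  # hypothetical name for the processor's extras


def process_page(spec_triples: Iterable[Triple],
                 images: Iterable[Tuple[str, str]]) -> Dict[str, Set[Triple]]:
    """Keep the conformant output intact; put heuristic triples elsewhere."""
    graphs: Dict[str, Set[Triple]] = {DEFAULT: set(spec_triples),
                                      EXPERIMENTAL: set()}
    for src, alt in images:
        if alt:
            # Non-standard: treat @alt as an rdfs:label for the image.
            graphs[EXPERIMENTAL].add((src, "rdfs:label", alt))
    return graphs


def promote(graphs: Dict[str, Set[Triple]], predicate: str) -> None:
    """If the spec later adopts a heuristic, move its triples across."""
    moved = {t for t in graphs[EXPERIMENTAL] if t[1] == predicate}
    graphs[EXPERIMENTAL] -= moved
    graphs[DEFAULT] |= moved


graphs = process_page([], [("holiday.png", "Me on holiday")])
assert graphs[DEFAULT] == set()  # still conformant
promote(graphs, "rdfs:label")
print(graphs[DEFAULT])  # {('holiday.png', 'rdfs:label', 'Me on holiday')}
```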

And of course, if one day some feature that someone has been
experimenting with in their processor gets incorporated into the spec,
then you can simply move the processing in your processor, so that the
triples get stored in the default graph.

But I don't think your templates fall into this category. I don't see
anything wrong with you storing your template rules directly in the
main graph -- after all, from the point of view of 'provenance', the
origin of the triples about the template really is the HTML document
currently being parsed.


> The main argument against this approach is duplication of data, but that
> is a minor thing compared to the potential usefulness of the approach.

:)

I know that people always say that, but for me, if I end up with
duplication, alarm bells start ringing and I try to find a more
efficient way.

But anyway, that's not an argument for, or against, so let's put this
to one side.


>> So I'm keen to see us preserve a one-to-one mapping between an
>> HTML+RDFa document and a named graph.
>
> Sure, but that we say they need to be in both graphs takes care of that,
> right?

[These comments are mainly relevant to your named graph proposal, in
the situation where the graph attribute has @about-like properties.]

No, it doesn't. I'm talking about a one-to-one mapping which means the
identity works in both directions; you are talking about a one-to-many
mapping, in that each graph could contain triples that come from many
different sources.

I'll try to explain.

Say I have a document with the URI of A, and it contains some triples.
I also have a document with the URI of B, and that also contains some
triples. If I store them in my triple store, in two named graphs, then
it's very easy to keep the triples apart. If I visit A again at some
point in the future, I can simply delete all of the triples, and
reinsert them again, without having to worry that named graph A has
been 'polluted' by triples from somewhere else (i.e., that I am
deleting too many triples).

All the triples are still usable of course, because I can run a SPARQL
query across all graphs at the same time. And of course, I can also
query just one graph.

(And since a named graph is just a URI, I can also query for 'all
graphs created by Kjetil', before then querying those graphs.)
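A toy Python model of that one-to-one arrangement. The example.org URIs, the creator names as plain strings, and the class itself are illustrative assumptions; a real system would use a quad store and SPARQL's GRAPH keyword and FROM NAMED clause rather than Python sets.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class PerDocumentStore:
    """Toy quad store holding exactly one named graph per source document."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = {}
        self.meta: Set[Triple] = set()  # statements *about* the graphs

    def load(self, doc_uri: str, triples: Iterable[Triple], creator: str) -> None:
        # Revisiting a document: replace its graph wholesale. This is safe only
        # because no other document is allowed to write into this graph.
        self.graphs[doc_uri] = set(triples)
        self.meta = {t for t in self.meta if t[0] != doc_uri}
        self.meta.add((doc_uri, "dc:creator", creator))

    def query(self, predicate: str, graph: Optional[str] = None) -> Set[Triple]:
        # graph=None searches the union of all graphs; otherwise just one.
        if graph is None:
            sources = list(self.graphs.values())
        else:
            sources = [self.graphs.get(graph, set())]
        return {t for g in sources for t in g if t[1] == predicate}

    def graphs_by(self, creator: str) -> Set[str]:
        return {s for s, p, o in self.meta if p == "dc:creator" and o == creator}


store = PerDocumentStore()
store.load("http://example.org/A", [("ex:x", "foaf:name", "X")], "Kjetil")
store.load("http://example.org/B", [("ex:y", "foaf:name", "Y")], "Mark")
store.load("http://example.org/A", [("ex:x", "foaf:name", "X (edited)")], "Kjetil")
print(sorted(store.query("foaf:name")))
# [('ex:x', 'foaf:name', 'X (edited)'), ('ex:y', 'foaf:name', 'Y')]
print(store.graphs_by("Kjetil"))  # {'http://example.org/A'}
```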

Now, what if any HTML document that contains RDFa (which is
essentially a named graph) can add triples to any other named graph?
That is the model you are describing, and that allows document A to
contain triples that will end up in *both* named graphs A and B (the
same goes for document B).

At the end we still have two named graphs, just like in my scenario,
but the problem now is that we can't separate the triples that are in
graph A that came from your document, from the triples in graph A that
came from my document.

You could argue that this is a mere implementation detail, and that
you could keep track of the origin of each triple; to some extent that
is true, but there are two important issues here. The first is that whilst
most people are talking about using 'quads' to keep track of their
triples (i.e., the triple plus named graph, which is equivalent to
origin), your solution would require 'pents' (i.e., triple, plus named
graph, plus origin).
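A toy Python model of the problem (the document names "A" and "B" and the `revisit` helper are hypothetical). Once document B may write into graph A, clearing graph A when A is re-crawled also discards B's statement; keeping it requires a fifth 'origin' field on every statement.

```python
from typing import List, NamedTuple


class Pent(NamedTuple):
    s: str
    p: str
    o: str
    graph: str   # the named graph the triple was written into
    origin: str  # the document it was actually parsed from


def revisit(store: List[Pent], doc: str, fresh: List[Pent],
            track_origin: bool) -> List[Pent]:
    """Re-crawl `doc`: drop its old statements, then add the fresh ones."""
    if track_origin:
        kept = [q for q in store if q.origin != doc]  # needs the 5th field
    else:
        kept = [q for q in store if q.graph != doc]   # quads: graph == origin
    return kept + fresh


# Document B has written one triple into graph A.
store = [
    Pent("ex:x", "foaf:name", "X", graph="A", origin="A"),
    Pent("ex:x", "rdfs:comment", "said by B", graph="A", origin="B"),
]
fresh = [Pent("ex:x", "foaf:name", "X2", graph="A", origin="A")]

print(len(revisit(store, "A", fresh, track_origin=False)))  # 1: B's triple is lost
print(len(revisit(store, "A", fresh, track_origin=True)))   # 2: B's triple survives
```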

The second issue is that we already have a mechanism for querying
named graphs in SPARQL, and that mechanism could be used when dealing
with things like 'give me all statements that came from document A'.
But we don't have any way of querying across 'pents', so that would
need to be invented.


>> TEMPLATING
>>
>> I'm really excited to see the proposals you've made on templating,
>> but perhaps I can explain the approach I've taken to the questions
>> that you have raised, to show how I don't think you need named graphs
>> to do what you want.
>>
>> In the library I mentioned, I've taken an approach to templating that
>> is based on Fresnel [2]. To be brutally honest, I think Fresnel is a
>> bit over-complicated :), but I felt that since it already existed, it
>> would probably make sense to start with that, and then add things as
>> it became clear what else was needed.
>
> Yeah, we also looked at Fresnel, and we came to the opposite
> conclusion. :-)

Which conclusion? I said that it's over-complicated, which I think you
agree with. ;)


> To explain where I come from: We do mostly ontology engineering, big-O
> and little-o, reasoning, SKOS thesauri, search and that kind of stuff.
> The web work we do is currently trivial, thus it is not where we'd like
> to spend time, and moreover, we'd like to give the styling to someone
> else in the company, who might be good at CSS and know a little XSLT,
> but we're not there yet.
>
> I suspect that we might have this in common with some web developers who
> only want to use a bit of data from the Semantic Web with their
> relatively simple web pages.
>
> So, if visualization was important to me, I'd certainly go with Fresnel,
> and I think I might find use for the full complexity of it. I've been
> advocating that we pick up Fresnel for a long time, but it was hard to
> sell, not least because it meant that a designer whom we might use
> for styling the site would need to learn it in addition to CSS.
> So, I was thinking in terms of "as simple as possible, but not simpler"
> (I'm no fan of KISS, because it tends to result in things that don't
> do the job).
>
> A solution that could let me write the HTML and the designer the CSS
> would be the right tool for the job right now. Again, writing
> something more advanced, where the designer should control the HTML too,
> would require a Separation of Concerns regime that would make my RDFa
> Template proposal the wrong tool for the job.

I don't necessarily disagree, but I apologise again that I've 'hidden'
my main point behind a discussion about what the templating language
should look like. The solution you have proposed is certainly
workable, and my only criticism is that I don't believe that it
requires anything more than is already available in XHTML/HTML with
embedded RDFa.


>> As you're probably aware, the Fresnel format contains a set of RDF
>> that describes rules such as 'given an item of this type, add this
>> CSS class'. This works quite nicely with RDFa because any triples
>> that are queried from an RDFa document have a definite location. For
>> example, if you have:
>>
>>   <div typeof="foaf:Person">
>>     ...
>>   </div>
>>
>> then querying for all items of type 'foaf:Person' leads naturally to
>> the div that contains the RDFa, making it easy to set a CSS class on
>> it.
>
> Right, but we have some examples where different foaf:Persons should be
> treated very differently in our apps.

Hence the use of a SPARQL derivative, to get finer-grained access to
the data in the page.
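To illustrate with the kind of case Kjetil raises (one person as author, another as audience): two foaf:Person descriptions can be identical in shape and still be told apart by how they are linked to the page. The Python below is a naive stand-in for SPARQL-style pattern matching, not jSPARQL itself; `ex:audience` is a hypothetical predicate and "<>" stands for the current document.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Binding] = None) -> List[Binding]:
    """Naive basic-graph-pattern matching: every pattern must match, and a
    variable ('?x') must take the same value everywhere it appears."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results: List[Binding] = []
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                ok = candidate.setdefault(term, value) == value
            else:
                ok = term == value
            if not ok:
                break
        if ok:
            results.extend(match(rest, triples, candidate))
    return results


page = [
    ("_:p1", "rdf:type", "foaf:Person"), ("_:p1", "foaf:name", "Alice"),
    ("_:p2", "rdf:type", "foaf:Person"), ("_:p2", "foaf:name", "Bob"),
    ("<>", "dc:creator", "_:p1"),
    ("<>", "ex:audience", "_:p2"),  # hypothetical predicate
]
authors = match([("?s", "rdf:type", "foaf:Person"),
                 ("<>", "dc:creator", "?s"),
                 ("?s", "foaf:name", "?name")], page)
print([b["?name"] for b in authors])  # ['Alice']
```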

But as I said...that's a digression. :)


>> So, the Fresnel example I just gave would be expressed in RDFa (and
>> jSPARQL), like this:
>>
>>   <div
>>    xmlns:fresnel="http://www.w3.org/2004/09/fresnel#"
>>    typeof="fresnel:Group"
>>    style="display: none;">
>>
>>     <div rev="fresnel:group">
>>       <div typeof="fresnel:Format">
>>         <div property="fresnel:instanceFormatDomain">
>>           select: [ "s", "item" ],
>>           where:
>>           [
>>             { pattern: [ "?s", "http://ebay.com/item", "?item" ], setUserData: true }
>>           ]
>>         </div>
>>
>>         <span property="fresnel:resourceStyle" datatype="fresnel:styleClass">ebay-item</span>
>>       </div>
>>     </div>
>>   </div>
>
> I see! But I feel that this is a lot further from RDFa than my proposal.

Mmm... with respect, I'm *only* using RDFa. :)

Your proposal on the other hand, has additional elements and
attributes in your own namespace, has devised a use for @xml:id, where
currently one doesn't exist, has added the ability to support named
graphs, and seeks to suspend normal processing of @about and other
attributes under certain circumstances. ;)

But as I say, how the templating language looks is not the key thing;
I'm just stressing for now that a templating language can be achieved
using current RDFa.
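As a rough Python sketch of what a processor does with the Fresnel rule quoted above: take its single pattern, find matching triples, and add the fresnel:styleClass value to the element where each matching subject was defined. The subject-to-element map (and its element ids) is an assumption about what the parser records, and Fresnel itself is considerably richer than this.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def apply_format(pattern: Triple, style_class: str, triples: List[Triple],
                 subject_element: Dict[str, str]) -> Dict[str, Set[str]]:
    """Add `style_class` to the element that introduced each matching subject.

    Only single-pattern rules are handled, and each '?x' term is treated as an
    independent wildcard.
    """
    classes: Dict[str, Set[str]] = {}
    for triple in triples:
        if all(t.startswith("?") or t == v for t, v in zip(pattern, triple)):
            element = subject_element.get(triple[0])
            if element is not None:
                classes.setdefault(element, set()).add(style_class)
    return classes


triples = [
    ("_:i1", "http://ebay.com/item", "1234"),
    ("_:p", "foaf:name", "Mark"),
]
where = {"_:i1": "div#listing", "_:p": "div#person"}  # recorded while parsing
result = apply_format(("?s", "http://ebay.com/item", "?item"), "ebay-item",
                      triples, where)
print(result)  # {'div#listing': {'ebay-item'}}
```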


> You'd have to understand SPARQL and jSPARQL much more deeply to actually
> use it than to just use a bit of XML and there you go. Also, it has a lot
> more implementation infrastructure behind it. I was also thinking along
> those lines for an XSLT-like RDF transformation language, but I
> rejected it. If I required that much knowledge about the data, I'd use
> some kind of ontology class-OO class mapper and do the work in the
> application View.
>
> But again, that's the thing I would do in an application that had
> complex requirements for the Web interface. My current use is for the
> applications that only require a very simple Web interface.
>

No problem -- it's the named graph v. using current RDFa that we're
really talking about.


>> Note that this is in the same document as the data itself, and for
>> the reasons I gave in the first part of this email, I think that it
>> is correct that the formats and the data end up in the same named
>> graph.
>
> Sure! As stated, there is no conflict with the proposal.

Except I mean that there is no need for additional named graphs.


>> Anyway, the key point I'm driving at here is that there is no need to
>> keep the templating rules separate from the main document's graph,
>> since they are part and parcel of it. They are much like CSS rules,
>> in that they operate on the DOM, but they use semantic selectors,
>> rather than DOM selectors. All that is required is to use various
>> predicates as the trigger for what to do, rather than segmenting
>> things with named graphs.
>
> Hmmmm, I don't feel you quite demonstrated this...
>
> Importantly, the same predicates in different parts of the document
> could be used in very different ways. So, I'd have to at least use
> some "triple fingerprinting" to resolve such problems, and without having
> tried, I think that too would fail. For example, in some uses, we have
> a foaf:Person that in one case is an author and in one case is the
> audience. They are known as different to the page author, thus
> identifying them with different named graphs would be trivial, but
> their data structure is identical. And the implementation complexity
> would be much larger, I fear, and I'm out for something that's really
> simple to implement.

By all means implement away, but I think you should avoid adding
things that would not be conformant with other RDFa parsers.

The solution I am suggesting -- not the use of Fresnel, but the
general algorithm that you create some 'well-known predicates' which
your parser then finds after loading -- would create exactly the same
triples in both your parser, and a parser that is unaware of your
templating rules.

However, the solution that you've outlined in your document would
actually produce different triples in your parser. For example:

  <rat:graph xml:id="query2" endpoint="http://dbpedia.org/sparql">
    <tr about="sub:resource">
      <td property="foaf:name" datatype="rdf:XMLLiteral"><rat:variable name="sub:name"/></td>
      <td property="dbo:produced">1973</td>
      <td property="dbp:firstFlight" datatype="rdf:XMLLiteral"><rat:variable name="sub:first"/></td>
    </tr>
  </rat:graph>

In this example, in your templating language, @about is not a subject
but a placeholder for a subject that will be substituted. This means
that your parser would generate no triple from it, whereas another
parser would.


> Well, to sum up, my key point is that at this point, it is important to
> have several different approaches flourish. I can certainly see that
> yours has a very important use (though I would do more in the View and
> not inline (j)SPARQL in the page)...

But don't forget, having inline rules is merely the simplest example.
Since the templates and queries are defined using RDF, any mechanism
that can be used to import triples can be used to import template
rules. As I said before, this might be something as simple as
@rel="owl:imports".
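A Python sketch of that import step, with `fetch` standing in for whatever actually retrieves and parses a document (a hypothetical callback here, as are the document names). One detail worth flagging: templates meant to live in a library need URIs rather than blank nodes such as [_:a], since blank node labels are local to the document they appear in.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]


def load_with_imports(root: str,
                      fetch: Callable[[str], List[Triple]]) -> List[Triple]:
    """Merge the triples of `root` and of every document it transitively
    owl:imports, fetching each document at most once."""
    seen: Set[str] = set()
    merged: List[Triple] = []
    pending = [root]
    while pending:
        uri = pending.pop()
        if uri in seen:
            continue  # also stops import cycles
        seen.add(uri)
        for triple in fetch(uri):
            merged.append(triple)
            if triple[1] == "owl:imports":
                pending.append(triple[2])
    return merged


# Hypothetical documents; the library names its template with a URI.
docs = {
    "page": [("page", "owl:imports", "lib"),
             ("page", "rat:applyTemplate", "lib#listing")],
    "lib": [("lib#listing", "rdf:type", "rat:Template"),
            ("lib", "owl:imports", "page")],  # deliberate cycle
}
merged = load_with_imports("page", docs.__getitem__)
print(len(merged))  # 4
```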


>... but I still feel that it is not the
> right solution for us, and also too complex for many web developers
> just out to get a little data from the Semantic Web into his
> application.

Sure...no problem. :) As I said, I blurred the issue by discussing
templating solutions, when in fact my problem is with the use of named
graphs. Here I'm afraid I cannot be so laissez-faire; using @xml:id
and named graphs is something that has to be designed right for all of
RDFa, not just one use case. (And as I've tried to show, in this
particular use-case I don't believe it is even needed.)


> Certainly, one day, all lenses that will ever be needed will have been written,
> which will change the picture, but up to then, I think several
> directions should be left open.

But by using RDF to define 'lenses' you don't need to imagine that
there will be a finite set of lenses. And in fact, the beauty of using
RDF to define these kinds of rules is that we can even use reasoning
to decide which lenses to deliver to you.
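A small Python sketch of that idea using only rdfs:subClassOf reasoning: a lens written for foaf:Agent is offered for a foaf:Person because one class is declared a subclass of the other. The ex: lens names and the ex:lensDomain property are hypothetical placeholders, not Fresnel terms.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def superclasses(cls: str, triples: List[Triple]) -> Set[str]:
    """`cls` plus every class it is transitively an rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def lenses_for(cls: str, triples: List[Triple]) -> Set[str]:
    """Lenses whose (hypothetical) ex:lensDomain covers `cls` or a superclass."""
    classes = superclasses(cls, triples)
    return {s for s, p, o in triples if p == "ex:lensDomain" and o in classes}


kb = [
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("ex:agentLens", "ex:lensDomain", "foaf:Agent"),
    ("ex:docLens", "ex:lensDomain", "foaf:Document"),
]
print(lenses_for("foaf:Person", kb))  # {'ex:agentLens'}
```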


CONCLUSION

With such a lengthy email -- sorry about that! -- it might not be
clear what exactly my conclusions are.

The key thing is that I'm all for having a templating language, but
would urge that it is done using 'normal' RDFa, rather than adding new
features.

I don't disagree that there *is* a discussion to be had about named
graphs, but I think that should be had separately.

Regards,

Mark

-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)
Received on Tuesday, 17 February 2009 09:55:32 UTC