microdata feedback / Was: Re: Why bound prefixes are an anti-pattern in language design

Hi, 
just to add an alternative view: I've been using eRDF, RDFa, and several
homegrown RDF-in-HTML syntaxes over the last few years, and none of them
was really satisfying.

Now I've tried microdata and it's actually a very refreshing experience. 
RDF authoring finally feels like HTML authoring (and not just like an RDF 
syntax that partly looks like HTML). I definitely want a prefix mechanism 
in SPARQL and Turtle, but in practical *HTML*-based apps (especially 
semwebby ones where data is pulled in from remote sources and users are 
enabled to interact with the data inline), prefixes simply suck, not only
in terms of complexity, but also for server-side efficiency in publishing
systems. As I wrote elsewhere, it takes some time (and practical projects)
to a) realize that fact, and b) admit it (at least if you are used to 
RDF formats).

Apart from the prefix (non-)issue, I discovered a lot of unexpectedly nice
things about microdata. Accessing data items and interacting with them 
through JavaScript/AJAX is amazingly easy (just find the parent @item),
in contrast to RDFa, whose possible syntax variations always made me feel
unsafe about my app's behaviour.
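
To illustrate the "find the parent @item" lookup outside the browser, here
is a minimal Python sketch. It assumes well-formed markup and uses the
standard library's ElementTree instead of a real DOM; the snippet, the
item/itemprop values, and the helper name find_parent_item are made up for
the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet; "item" is the attribute name used in this mail.
SNIPPET = """
<div>
  <div item="post">
    <h2 itemprop="title">Hello</h2>
    <p><a itemprop="link" href="/hello">permalink</a></p>
  </div>
</div>
"""


def find_parent_item(root, node):
    """Return the nearest ancestor of node that carries an item attribute."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    current = parents.get(node)
    while current is not None:
        if "item" in current.attrib:
            return current
        current = parents.get(current)
    return None


root = ET.fromstring(SNIPPET.strip())
link = root.find(".//a[@itemprop='link']")
owner = find_parent_item(root, link)
print(owner.get("item"))  # prints "post"
```

In a browser the same walk is a short loop over parentNode, which is why
the lookup stays predictable regardless of how the page is laid out.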

But the really great thing is the ability to blend Web 2.0-style 
semantics with RDF ones. RDF is awesome for agile software development
(on-the-fly schema creation and evolution), but you still lose time 
thinking about type and predicate URIs. Unlike other RDF syntaxes, 
Microdata lets me define preliminary/local types and predicates, because 
URIs are not mandatory and non-URI tokens will be auto-grounded in 
"..custom#" or "../vocab#" for the time being. Instant RDF which I can 
already aggregate, query, and post-process to map it to my app's 
selection of RDF vocabularies. This is nothing less than folksonomy-style
semantic publishing. (Read that last sentence again, this is huge.)
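
A rough sketch of what such auto-grounding could look like on the consumer
side, in Python. The fallback namespace http://example.org/vocab# and the
function name ground are placeholders I made up; the test for "is this
already a URI" (scheme plus host) is a simplifying assumption, not what
any particular microdata parser does.

```python
from urllib.parse import urlparse

# Placeholder namespace; the real fallback vocabulary is up to the parser.
LOCAL_VOCAB = "http://example.org/vocab#"


def ground(token, base=LOCAL_VOCAB):
    """Return token unchanged if it looks like an absolute URI,
    otherwise append it to the local fallback namespace."""
    parsed = urlparse(token)
    if parsed.scheme and parsed.netloc:
        return token
    return base + token


print(ground("http://xmlns.com/foaf/0.1/name"))  # unchanged
print(ground("raw-content"))  # http://example.org/vocab#raw-content
```

Once every type and predicate is a URI, the mapping to "proper"
vocabularies can happen later as a plain post-processing step.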

The only thing I'm still missing is the ability to have plain literals
and also literals with markup. I don't need full datatyping, I just want
to efficiently create backups from my published posts, or possibly from
other sites, without losing links and formatting. My idea so far is
to use a special item type, a @class=markup (or somesuch), or possibly
the existing HTML5/Atom means, like <article /> to tell the parser that
markup should be preserved for the current @itemprop, e.g.:

...
<div itemprop="raw-content" class="markup">
   <p>...</p>
   <p>...</p>
</div>
...

...
<div itemprop="raw-content" item="CDATA|Literal">
   <p>...</p>
   <p>...</p>
</div>
...

The special item type would be similar to how the pre-semweb ontology 
editors worked (literals as first-class resources), but I'd still prefer
a more official way, or a community-wide agreement (RDF/HTML5, anyone?).
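
For the @class=markup variant, an extractor could behave roughly like the
following Python sketch. This is only an illustration of the proposal
above, not existing parser behaviour: it assumes well-formed markup, and
the function name literal_value and the sample content are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the class="markup" idea from this mail.
SNIPPET = """
<div itemprop="raw-content" class="markup">
   <p>First <a href="/a">link</a></p>
   <p>Second</p>
</div>
"""


def literal_value(el):
    """Return (value, is_markup) for an element carrying itemprop."""
    if "markup" in el.get("class", "").split():
        # Preserve markup: leading text plus each serialized child
        # (ET.tostring also emits the text that follows each child).
        inner = (el.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in el
        )
        return inner.strip(), True
    # Default: plain literal, with all tags flattened away.
    return "".join(el.itertext()).strip(), False


value, is_markup = literal_value(ET.fromstring(SNIPPET.strip()))
print(is_markup)
print(value)
```

Without the class, the same element would come out as a plain literal,
with links and formatting lost, which is exactly the backup problem
described above.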

Side note: the RDF community could probably use the reverse DNS
notation for the 10 or 20 core vocabularies, too, to simplify
RDF-in-HTML publishing for a wide range of use cases (and authors).
There are quite a few opportunities around microdata that we should 
have a closer look at.

So, before you dismiss microdata, maybe ask yourself if your arguments
are mainly politically motivated, and, more importantly, build an app with
it first before you argue in favor of other solutions. You might be 
surprised.

Cheers,
Benji

--
Benjamin Nowack
http://bnode.org/
http://semsol.com/

Received on Tuesday, 11 August 2009 17:58:34 UTC