namespace documents (thoughts and issues)

I've been working on what to put at the rdf: and rdfs: namespaces. This 
is a long email, with three sections.

     1. goals for the project
     2. what my software will look for in RDF data
     3. specific issues with the two RDF namespaces

Perhaps the most interesting/novel bit is about HTML and language tags.  
Plus my idea for handling rdf:_100.  :-)

This is progress on ACTION-98, my last remaining W3C action item (!).

=== PART 1:  Goals ===

1. If someone puts the IRI of a term in the rdf: or rdfs: namespaces 
into their browser, they'll get some nice documentation on that term.  
The URL field will continue to show that term (it won't redirect).

2. That documentation will link to other terms.  When it does so, 
clicking will repeat the experience as above: the IRI of the term will 
be in the URL field of the browser, and the user will see decent 
documentation.

3. The documentation will be available in multiple languages.  We don't 
need this on day one, but we currently have the dcat schema in English, 
Spanish, Arabic, Greek, French, and Japanese (thanks to Phil Archer's 
pushing on that).  I'm still learning how to do a multi-lingual webapp: 
there's an early version at 
http://www.w3.org/2013/vocabspec/examples/dcat.html -- in that version, 
you use the gear in the upper-right to change the language.   I'm in the 
process of changing it to use the browser's language setting as a 
starting point, then allow a simpler selection control.

4. The software will be available so other people can do this with their 
namespace documents easily enough.

5. The documentation will be entirely driven by RDF triples that one 
gets by dereferencing the namespace documents while asking for 
text/turtle in the Accept header.  In the dcat example above, view 
source shows the .html file is just a shell; the content is generated 
at browse-time from the Turtle at the dcat namespace.

6. In the real deployment, there will also be content-negotiated static 
versions at the namespace URL so that search engines and non-javascript 
browsers can see the content as well.   Folks hosting namespace 
documents, if they want this, will have to run a node.js program to 
re-generate the static files whenever the turtle is changed.

7. Over time, I want to evolve the code to include social features like 
crowdsourced translations, stars (aka "like", "+1", "endorsement", 
"bookmarks"), and links to code and public data sources that use the 
term.  Obviously that will have to be done carefully to avoid detracting 
from the official documentation. (This part is a research project.)

I think that's it.
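
To make goal 5 a bit more concrete, here is a minimal Python sketch of asking a namespace for Turtle. It only builds the request and does not touch the network. The function name turtle_request is made up for illustration, and I'm assuming the server honours the Accept header.

```python
import urllib.request

RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

def turtle_request(namespace_iri: str) -> urllib.request.Request:
    """Build a GET request that asks for Turtle via the Accept header."""
    # The fragment is never sent to the server, so drop it up front.
    url = namespace_iri.split("#", 1)[0]
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})

req = turtle_request(RDFS_NS)
# urllib.request.urlopen(req) would perform the actual fetch (needs network).
```

Whatever comes back would then be handed to a Turtle parser; which parser to use is left open here.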

===  PART 2: What my software will look for in RDF data ===

One challenge is that for expressing the documentation in RDF, I don't 
know of any consensus around a vocabulary or how to use it. Here's my 
best guess, but I'm making some of this up.  Feel free to correct me 
(but soon, please).

The basic predicates:

* rdfs:label - a name, usually one or two words; the English version 
will usually be the same as the end part of the IRI.

* rdfs:comment - a descriptive phrase, usually 5-10 words; might be in 
rdf:HTML, especially if it needs to link to other terms in the vocabulary

* dc:description - a longer, definitional description, usually 1-5 
paragraphs (using rdf:HTML for formatting).   For rdf and rdfs, I plan 
to copy the HTML out of RDF Schema 1.1.  For example, for rdf:type it'll 
be the stuff at 
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-schema/index.html#ch_type

* vann:usageNote - arbitrary length, more practical and less 
definitional than dc:description.  (I don't plan to use this for rdf: 
and rdfs:, but it's used by dcat and others.)

* dc:title - for the title of the namespace document; ignored on terms

It'll also show the rdfs:domain, rdfs:range, rdf:type, rdfs:subClassOf, 
rdfs:subPropertyOf, and any other bits I can work out how to include 
without making things too complicated.   Also, rdfs:isDefinedBy linking 
to the right section of the spec.

On language/HTML handling:

I think this is how to do it:

    <some term> dc:description
        'description of that term in plaintext English',
        'description of that term in plaintext English'@en,
        '<div lang="en">description of that term in HTML English</div>'^^rdf:HTML,
        'description of that term in plaintext French'@fr,
        '<div lang="fr">description of that term in HTML French</div>'^^rdf:HTML,
        ... etc

The xs:string is there for non-multilingual apps, and to use as the 
fallback (with a warning) if no matching languages are found.

This approach implies that predicates with natural language expressions 
as their range MUST be conceptually single-valued.   You can't do 
this:   {  <s> rdfs:comment "some comment"@en, "some other comment"@en. 
}  I expect I'll have my software display a warning if this kind of 
thing (two values with the same language) occurs in the data.  See 
[1] for some more discussion of this.
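
A rough Python sketch of that warning check follows. The Literal tuple and the function name are invented for illustration and are not taken from any existing code. I'm also assuming that a plaintext value and an HTML value in the same language may coexist, as in the dc:description example, so duplicates are counted per form.

```python
from collections import Counter
from typing import NamedTuple, Optional

class Literal(NamedTuple):
    text: str
    lang: Optional[str] = None  # language tag; None for a plain string
    html: bool = False          # True for an rdf:HTML literal

def duplicate_languages(values):
    """Return language tags used more than once within the same form."""
    # Key on (form, language) so plaintext "en" plus HTML "en" is not flagged.
    counts = Counter((v.html, v.lang.lower()) for v in values if v.lang)
    return sorted({lang for (_, lang), n in counts.items() if n > 1})

comments = [
    Literal("some comment", "en"),
    Literal("some other comment", "en"),
    Literal("un commentaire", "fr"),
]
for tag in duplicate_languages(comments):
    print(f"warning: more than one value tagged '{tag}'")
```

For an HTML value, lang here is assumed to have been filled in already from the lang attribute on its first element.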

I guess I'll treat an HTML literal without a lang attribute on the 
first element like the xs:string literal -- a fallback for when no 
available values lang-match the user's preferences.
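
Here is a hedged Python sketch of that selection logic. All names are hypothetical. Two things are my own assumptions rather than anything decided above: language tags are compared by exact match only (no "en-US" to "en" range matching), and the HTML form is preferred when both forms exist for a language.

```python
import re
from typing import NamedTuple, Optional

class Literal(NamedTuple):
    text: str
    lang: Optional[str] = None  # language tag of a plain literal, if any
    html: bool = False          # True for an rdf:HTML literal

# lang attribute on the first element of an HTML fragment, e.g. <div lang="fr">
# (a crude regex, good enough for a sketch; a real HTML parser would be safer)
FIRST_LANG = re.compile(r'^\s*<\w+[^>]*?\blang\s*=\s*["\']([^"\']+)["\']')

def effective_lang(value):
    """Language of a value: its tag, or the first element's lang for HTML."""
    if value.html:
        match = FIRST_LANG.match(value.text)
        return match.group(1).lower() if match else None
    return value.lang.lower() if value.lang else None

def pick(values, preferred):
    """Return (value, used_fallback) for the first preferred language found."""
    for want in (p.lower() for p in preferred):
        matches = [v for v in values if effective_lang(v) == want]
        if matches:
            # Assumption: prefer the HTML form when both forms are present.
            return max(matches, key=lambda v: v.html), False
    # No language matched: fall back to an untagged value, if there is one.
    untagged = [v for v in values if effective_lang(v) is None]
    return (untagged[0] if untagged else None), True

values = [
    Literal("plain English"),
    Literal("plain English", "en"),
    Literal('<div lang="fr">HTML French</div>', html=True),
]
chosen, used_fallback = pick(values, ["ja", "fr"])
```

When used_fallback comes back True, the caller would show the warning mentioned earlier alongside the untagged text.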

I plan to ignore triples giving domain, range, or subClassOf as 
rdfs:Resource, since they're meaningless.
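
As a sketch of that filter in Python, with triples modelled as plain (subject, predicate, object) tuples of IRI strings. That representation and the function names are assumptions made for the example only.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicates for which an object of rdfs:Resource says nothing useful.
VACUOUS_PREDICATES = {RDFS + "domain", RDFS + "range", RDFS + "subClassOf"}

def is_vacuous(triple):
    """True for domain/range/subClassOf triples whose object is rdfs:Resource."""
    _subject, predicate, obj = triple
    return predicate in VACUOUS_PREDICATES and obj == RDFS + "Resource"

def drop_vacuous(triples):
    """Keep only the triples worth showing in the documentation."""
    return [t for t in triples if not is_vacuous(t)]

triples = [
    (RDFS + "label", RDFS + "domain", RDFS + "Resource"),
    (RDFS + "label", RDFS + "range", RDFS + "Literal"),
]
shown = drop_vacuous(triples)
```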

===  PART 3: Specific issues with the two RDF namespaces ===

* Should we include any dct:creator or dct:contributor triples? It's 
hard to make that helpful and fair given all the people who've been 
involved with these namespaces over the years.

* Should we leave out the meaningless triples giving domain, range, or 
subClassOf as rdfs:Resource?   There's some pretty odd stuff there now.

* What should we do about rdf:_1, etc?    I'd think having the first few 
in the namespace document would make sense, maybe rdf:_1 through 
rdf:_20.   I *could* put in special javascript for arbitrary ones, but 
that seems kind of goofy.

* Can we say *anything* about how investing in implementing reification 
systems might not be your best bet?  Pretty, pretty please?  Or do we 
have to let that wait for the commenting mechanism?

* What's the title of the rdf: namespace document?  I propose, "The Core 
RDF Vocabulary"

* What formats do we serve the schemas in?  They've been just RDF/XML so 
far.   Left to myself, I'd just do Turtle.  I'm okay with including any 
other format for which there's a serializer available for node.js, so I 
can generate them out of the same system.  If someone wants json-ld, 
could they please write a @context that makes the schema look nice in json?
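
On the rdf:_1 question above, a small Python sketch of both options: recognising an arbitrary rdf:_n IRI, and generating the stanzas for a fixed range such as rdf:_1 through rdf:_20. The function names and the choice of label text are made up for illustration. The pattern follows the RDF Schema rule that n is a positive decimal integer with no leading zeros.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# rdf:_n, where n is a positive decimal integer with no leading zeros
MEMBERSHIP = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")

def membership_index(iri):
    """Return n if iri is rdf:_n, otherwise None."""
    match = MEMBERSHIP.fullmatch(iri)
    return int(match.group(1)) if match else None

def membership_turtle(first=1, last=20):
    """Generate Turtle stanzas for rdf:_first through rdf:_last."""
    stanzas = [
        f"rdf:_{n} a rdfs:ContainerMembershipProperty ;\n"
        f'    rdfs:label "_{n}" .'
        for n in range(first, last + 1)
    ]
    return "\n\n".join(stanzas)

n = membership_index(RDF + "_100")
snippet = membership_turtle(1, 2)
```

The generated text assumes the rdf: and rdfs: prefixes are declared elsewhere in the Turtle file.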

So...   thoughts?

       -- Sandro


[1] http://lists.w3.org/Archives/Public/public-rdf-wg/2013Dec/0151.html

Received on Monday, 10 February 2014 02:30:41 UTC