Difference between revisions of "RDF Core Work Items"

From Semantic Web Standards

Revision as of 07:44, 29 June 2010

__NUMBEREDHEADINGS__

STATUS: Rough draft. Developed by group editing at the RDF/NextStepWorkshop. Not all text has been reviewed by anyone.

These are the work items that were discussed at some length, including ones which do not have much support.

See table showing support from workshop participants.


1 Possible Charter Language

1.1 Background

[terse, choppy version for now]

The Resource Description Framework (RDF) was first published as a W3C Recommendation in 1999. In 2001 a new Working Group (called "RDF Core") was chartered to "rearticulate" the 1999 specification, and add some new features, including datatyped values. That group completed its work with a set of Recommendations in 2004.

Since then, RDF has been adopted widely, but practitioners still encounter situations where (1) minor aspects of the current version cause problems, (2) the current design is not well explained, and (3) the documents suggest usage patterns which are not currently considered to be good practice.

This new RDF Core Working Group is intended to make editorial changes and to alter the design where necessary to align with what users actually want and implementors are actually implementing. The group must not do anything to break mainstream deployments of RDF and should try to avoid breaking conformant but idiosyncratic deployments.

The group is to make RDF even more stable, not to make it unstable. It should be careful in its communications to help the market understand that.

1.2 Mission

This Working Group is chartered to advance the adoption and utility of RDF systems by producing W3C Recommendations and Working Group Notes. It will revise existing RDF Recommendations and create new documents to address developments in the field while maintaining compatibility with deployed systems and prevailing best practice.

1.3 Deliverables

The group is chartered to produce certain documents, detailed below. Each specification is to be a W3C Recommendation (or part of one), and guidance is to be text in either a W3C Recommendation or a Working Group Note. The group must decide how to group the deliverable specifications and guidance into documents, and how and whether to revise the current RDF documents.

The names used here for deliverables are for reference only; the Working Group is free to choose different names in keeping with its mission.

During charter development, structured discussion about the charter item is done using Template:CharterItem.

2 Syntax Work Items

@@@ update to include discussion resulting in don't-break-it/don't-fix-it about RDF/XML. MAYBE rdf:graph attribute is small enough to still be essentially compatible.

All syntaxes must (after being updated, if necessary) support the entire RDF model, including whatever other changes are made, such as named graphs.

Alternative: support the full RDF model in Turtle and JSON. Let RDF/XML and RDFa deal with subsets. The reality may be that RDFa for HTML5 is completed before any future RDF model changes are done.

May need a survey of implementers to see if they would be motivated to update code for syntax changes.

Break existing syntaxes with caution:

  • We do not want to break RDF/XML, but if there are changes, we might as well break it thoroughly: remove all weakly deprecated parts and change the media type and file name extension. That amounts to making a new XML syntax.
  • Reluctant to break existing Turtle-RDF but it is not a W3C REC yet, so there is some wiggle room. If Turtle needs named graphs, maybe it should be a different mime type for the same grammar / spec / format.

(Breaking means adding new features that are incompatible and cause existing code to fail.)

2.1 Namespace and Profiles Management

Guidance on techniques for managing collections of namespaces, such as profiles/context in RDFa 1.1 (but this may apply to other syntaxes). This may be used in new syntaxes (below) and added to existing syntaxes, if it can be done while maintaining backward compatibility.

The concept is a syntax document or a protocol header that points to a separate profile document. The profile document contains a set of short prefix / URI pairs that are used to abbreviate long URIs.

Should include consideration of adding profiles to remove the need for a bunch of namespaces - see Twitter annotations and Facebook open graph. Make it friendlier at the top of the document to avoid scaring users.

All use of profiles / namespace management must follow the same pattern across syntaxes.
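
As a rough illustration of the profile idea, the Python sketch below expands prefixed names using a prefix table of the kind a profile document might supply. The PROFILE contents and the expand helper are assumptions made for this example; they are not taken from the RDFa 1.1 draft or from any existing syntax.

```python
# Hypothetical profile: short prefixes mapped to namespace URIs.
# In practice this table would be loaded from a separate profile document.
PROFILE = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name, profile):
    """Expand 'prefix:local' to a full URI.

    Names whose prefix is not in the profile are returned unchanged.
    """
    prefix, sep, local = name.partition(":")
    if sep and prefix in profile:
        return profile[prefix] + local
    return name


print(expand("foaf:name", PROFILE))
```

Because the table lives outside the document, the same expansion step could be shared by every syntax that adopts profiles, which is the point of requiring one pattern across syntaxes.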


Why do this work now?
  • Author convenience and removing confusing boilerplate.
  • Reducing file size.
  • Don't repeat yourself - reduction of errors.
  • Encodes and encourages best practice and reuse in schema and vocabulary choices.
Reasons to not do this now
  • Requires changing existing parsers and standards if adding to an existing format.
Proposals

See RDFa 1.1 draft.

Likely Technical Issues
  • Expecting this to be applied to JSON and RDFa.
  • If it was applied to Turtle, this would only be worth doing if something else broke the syntax (such as new data formats).
  • Not expecting this to apply to RDF/XML.
People Interested in Doing The Work

-


2.2 Turtle

Decide the syntax stack of how Turtle themed languages fit together (N-Triples, any future N-Quads, Turtle, maybe N3) including how the media types work.

A specification for Turtle, generally compatible with existing systems which read and write it.

Consider some syntax extensions such as allowing raw date / date time literals (timbl) to improve validation and ease of use.

Consider making this the recommended RDF syntax.


Why do this work now?
  • Because it is awesome.
  • It is in widespread use - in tutorials, W3C docs and code.
Reasons to not do this now
  • Yet another new syntax - all syntaxes must be aligned and capable of encoding the same models.
  • May need new media type (but the current one is not IETF approved).
  • It is in widespread use - possibly cannot make major changes.
  • May need a new name for the named graph format. Qurtle (Dave B)
Proposals

See W3C Team Submission on Turtle.

See Andy S for blog points, Dave B for RDF next steps paper.

Likely Technical Issues
  • Whether to include date.
  • Better formal explanation of mapping from model to syntax.
  • Alignment with SPARQL formats.
  • Errata...
  • May need two mime types for turtle doc that encodes 1 graph only and turtle doc that encodes multiple graphs (sparql dataset) because you may need to know in advance whether to stream an incoming document into a graph or a dataset.
People Interested in Doing The Work

Willing to work on it: Dave, Andy S


2.3 Turtle Support for Graph Identification

A specification for an extension to Turtle which includes support for graph metadata.


Why do this work now?
  • Provide support for the named graph model changes.
  • Align with SPARQL dataset work.
  • Alignment with other serializations if/when they support named graphs.
Reasons to not do this now
  • Concern that you want to know when a document has 1 graph, versus may have many graphs.
Proposals

Trig and N-Quads. (Note Trig is not a true superset of Turtle or N-Quads - ask Andy S)

Likely Technical Issues

Should this be a superset of Turtle?

Expect this to be a different mime type to Turtle, maybe a different named spec.

People Interested in Doing The Work

list of people volunteering to do the necessary writing/editing, or 'None Yet'.


2.4 Named graphs support in RDF/XML

A specification for an extension to RDF/XML which includes support for graph metadata.

[Support for this at the workshop was not polled.]

Consider leaving RDF/XML to support just a single graph or the RDF (2004) data model.



Why do this work now?
  • Supports named graphs in RDF/XML.
Reasons to not do this now
  • Changes RDF/XML.
Proposals

Possible starting point: RDF Source. Named Graphs document.

or just add an rdf:graph (e.g.) attribute to rdf:RDF if you do not want nested graphs.

Likely Technical Issues
  • Decide whether to do nested graphs (model question?)
People Interested in Doing The Work

list of people volunteering to do the necessary writing/editing, or 'None Yet'.


2.5 JSON

A Specification for a way to serialize RDF graphs in JSON.

Should include consideration of adding profiles to remove the need for a bunch of namespaces - see Twitter annotations and Facebook open graph. Make it friendlier at the top of the document to avoid scaring users.

Suggest a survey of existing work and a community building "event" or process to bring alignment since this seems urgent to start soon.


Why do this work now?
  • Allows web authors (Javascript, HTML5, ... developers) to more easily use RDF data with existing tools and techniques.
  • Multiple JSON formats and implementations (some interoperable) already exist showing interest in this work.
Reasons to not do this now
  • Current JSON formats are not aligned. They take different approaches: making the format friendly to JSON users versus making it familiar to existing RDF users.
  • Needs some R&D and alignment.
  • Risk that the result would be some standard that would not be adopted if it was not 'web author' friendly.
Proposals

Possible starting points include: Talis RDF JSON, RDFj, JSON-LD, and JRON.

See Linked Data API JSONFormats.

Likely Technical Issues
  • Should support named graphs if they are added to the rdf model.
People Interested in Doing The Work

Manu Sporny


2.6 Possible RDF/XML weak deprecations

Revise the RDF/XML specification to advise that certain syntactic constructs are "weakly deprecated" or "archaic": they should not be used in new vocabularies, but should still be supported in software. There is no plan to formally remove them from the specifications.

Candidates for weak deprecation in the RDF/XML syntax include but are not limited to:

  • reification rdf:ID on property elements - align with some "named graph" support
  • rdf:XMLLiteral datatype - use plain literals instead, with quoted XML. Problems with people misusing it with quoting markup, do not want XML C14N. Does not work with RDFa.
  • rdf:ID on node elements (typed or rdf:Description) - use rdf:about instead
  • xs:string used as a datatype - use plain literals instead OR equate them (like the RDFS rule)



Why do this work now?
  • Some of these syntax parts confuse new users - training burden.
  • Experience has shown several of these are not (widely) used or implemented.
  • Records actual usage.
  • Interoperability problems such as using xs:string, rdfxml literal, plain literals not matching / clashing
Reasons to not do this now
  • Keep RDF/XML stable
  • The problems are mostly minor.
  • RDF/XML has been very interoperable
Proposals

See above for candidates.

Likely Technical Issues
People Interested in Doing The Work

-


2.7 Data Model Issues

Consider weakly deprecating some of the data model vocabulary, especially those tied to RDF/XML syntax.

Candidates for data model changes

  • reification vocabulary rdf:subject, rdf:predicate, rdf:object and rdf:Statement
  • rdf:Alt
  • rdf:Bag
  • rdf:Seq (e.g. use rdf:List instead; it's costly to have two similar options)
  • rdf:value - maybe a best practice to use a more specific term if it exists



Why do this work now?
  • data model changes to back up removing confusing, unused, unpopular syntax forms
Reasons to not do this now
  • Do not make changes that are disruptive, low value.
Proposals
Likely Technical Issues
People Interested in Doing The Work

-


2.8 RDF/XML and RDF Concepts Errata

Apply RDF/XML and RDF Concepts spec errata, not discussed at this time.

Typos, errata folded in, clarifications.



Why do this work now?
  • Make the RDF specs match the latest URI work in IRIs
Reasons to not do this now
  • Not sure of the implication of the IRI change.
Proposals

See Jeremy Carroll's analysis of the RDF Core (v1) list of postponed issues.

e.g. Revise the specifications to globally substitute the term RDF URI Reference with an up-to-date reference to IRIs. This has an issue with allowing spaces: yes in RDF URI Reference, no in IRIs.

Likely Technical Issues

-

People Interested in Doing The Work

-


2.9 Collections syntax in RDF/XML

Request for a less annoying syntax for collections as subjects and for literals in collections (timbl).

We do not expect to be breaking RDF/XML and therefore this would not be possible.


Why do this work now?
  • Less annoying syntax, helps some users.
  • Collections for literals are very useful and common.
Reasons to not do this now
  • Any change here very likely is new syntax that breaks existing RDF/XML code parsers and requires code changes.
Proposals

-

Likely Technical Issues

-

People Interested in Doing The Work

-


3 Semantics Work Items

Minutes at [1].

@@@ some places it says "Revise..." should be "Investigate, and if a suitable solution is found, revise...."

3.1 Inference Rules

Fix the inference rules in the semantics, as they are currently incomplete.


Why do this work now?
  • It's a "bug".
Reasons to not do this now
  • None known.
Proposals
  • Fix them (as per Herman ter Horst or preferably as per the Chileans - references needed for both).
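
For readers unfamiliar with what an inference rule looks like in practice, here is a minimal Python sketch that applies two of the RDFS entailment rules (rdfs9, type inheritance through rdfs:subClassOf, and rdfs11, transitivity of rdfs:subClassOf) until nothing new is derived. It is only an illustration of rule application; it is not the proposed fix and says nothing about which rules are missing. The triple representation and the ex: names are assumptions.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Apply rdfs9 and rdfs11 repeatedly until no new triples appear."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in graph:
                # rdfs9: (s subClassOf o), (x type s) => (x type o)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
                # rdfs11: (s subClassOf o), (o subClassOf z) => (s subClassOf z)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
        if new <= graph:
            return graph
        graph |= new


EXAMPLE = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
inferred = closure(EXAMPLE)
```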
Likely Technical Issues
  • None known.
People Interested in Doing The Work

The work has been done, what remains is editorial


3.2 Blank Nodes

Revise the treatment of blank nodes in RDF, so that they correspond more closely to practice, e.g., SPARQL, updates, many systems. This might encompass only the mapping from syntax to graphs, or might also affect the RDF semantics.

Suppose one person has an RDF/XML document, including blank nodes, and sends it to two parties, who each load it into their systems, write it out again, and send it to a fourth party. The fourth party should be able to tell that it has received two copies of the same document.


Why do this work now?
  • Nearly every system converts blank nodes into some kind of labeled node now; they just do it in a way which is kept inside and not interoperable.
  • SPARQL also does something like this.
  • Having a stable way to refer to a given blank node makes change propagation much easier.
Reasons to not do this now
  • Any change in this space could well make blank nodes even more confusing.
  • The problem is over-constrained; there may be no consensus for any change.
Proposals
  • Allow blank nodes to remain in RDF syntaxes, but systems must convert them to labeled nodes (i.e., Skolemize them) in some special URI scheme or in some other global naming scheme.
Likely Technical Issues
  • Should the new IRIs be dereferenceable or not?
  • If dereferenceable, then who serves it, and with what content?
  • Can the "label" be removed from the RDF graph, e.g., via SPARQL querying? Can the "label" be used in another RDF graph?
  • Inputting the same document twice might (probably would) end up with non-isomorphic graphs. (Is this really a problem?)
  • If we have a spreadsheet with 10 rows and 5 columns, and one person converts it to RDF in any old simple way, the result is 10 blank nodes with 5 triples off each blank node. If you apply the same algorithm twice, you get the same graph. However, if you skolemize, then you get two different graphs.
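
The spreadsheet concern can be sketched in Python as follows. The conversion is the "any old simple way" from the bullet: one node per row, one triple per cell. The skolem IRI scheme (urn:example:skolem: plus a random identifier) is purely hypothetical; the draft does not fix a scheme.

```python
import uuid


def rows_to_triples(rows, columns, new_node):
    """Convert a table to triples: one node per row, one triple per cell."""
    triples = set()
    for row in rows:
        node = new_node()
        for col, cell in zip(columns, row):
            triples.add((node, col, cell))
    return triples


def skolem_iri():
    # Hypothetical naming scheme; a fresh global name on every call.
    return "urn:example:skolem:" + uuid.uuid4().hex


columns = ["ex:c%d" % i for i in range(5)]
rows = [["r%dc%d" % (r, c) for c in range(5)] for r in range(10)]

first = rows_to_triples(rows, columns, skolem_iri)
second = rows_to_triples(rows, columns, skolem_iri)

# Both runs yield 50 triples, but no triple is shared between them,
# because every row received a different global name in each run.
print(len(first), len(second), len(first & second))
```

With blank nodes the two runs would be isomorphic graphs; with freshly minted skolem names they are simply two different graphs, which is the issue raised above.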
People Interested in Doing The Work

none yet


3.3 Simplified RDF Semantics

Revise the specifications so the RDF semantics aligns with current practice, as seen in SPARQL systems. This may involve recasting RDF itself as a data description language instead of a knowledge representation language.


Why do this work now?
  • Many systems don't implement the semantics (correctly). For example, few systems lean graphs.
  • SPARQL querying sometimes produces different results for RDF graphs that are equivalent under the simple entailment.
  • Most RDF users do not understand the implications of the RDF semantics, even whether entailment is important.
Reasons to not do this now
  • The mandated behavior of RDF systems on blank nodes will have to change.
  • People are willing to live with this W3C Recommendation diverging from practice.
  • Revising the semantics in this way may be expensive, as there are potentially thorny issues.
Proposals
  • Make the corrected inference rules on unleaned graphs be the normative specification and either:
    • Remove the model-theoretic semantics; or
    • Use instead a unique interpretation semantics (e.g., Herbrand semantics).
  • Collapse all the various semantics into one.
  • Use a unique-interpretation semantics for RDF, and the current possible world semantics for RDFS. NB: This option seems the worst, unless there are technical reasons that it is required.
Likely Technical Issues
  • Making the inference rules normative makes their correctness more important, and this can be problematic.
  • Determining the best semantics for the unique-interpretation semantics.
People Interested in Doing The Work

Stefan Decker.


3.4 Archaisms

Revise the semantic specification to reiterate advice not to use archaic syntactic constructs.

Remove semantic conditions on container membership properties. Remove semantics for rdf:XMLLiteral. Make xs:string and plain literals be the same in simple entailment.


Why do this work now?
  • These conditions cause confusion.
  • These semantic conditions are often incorrectly implemented.
Reasons to not do this now
  • This would change some of the underlying meaning of RDF.
  • The changes to xsd:string and plain literals could result in a change to SPARQL.
Proposals
  • See above.
  • Normalize xs:string and plain literals on input, possibly into rdf:PlainLiteral.
  • Make rdf:XMLLiteral be the same as xs:string.
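
A minimal Python sketch of the "normalize on input" proposal follows. Literals are modelled as (lexical form, datatype, language tag) tuples, and xs:string-typed literals are folded into the plain form. Both the tuple model and the direction of normalization are assumptions for illustration; the proposal above leaves the target form open.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalize(literal):
    """Fold an xs:string-typed literal into the equivalent plain literal.

    Language-tagged literals and other datatypes are left untouched.
    """
    lexical, datatype, lang = literal
    if datatype == XSD_STRING and lang is None:
        return (lexical, None, None)
    return literal


plain = ("London", None, None)
typed = ("London", XSD_STRING, None)

# Without normalization the two terms do not match; afterwards they do.
print(plain == typed, normalize(plain) == normalize(typed))
```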
Likely Technical Issues

Some changes would be quite simple. The changes to xs:string would be a large change to e.g. the SPARQL results from existing data. Changes to rdf:XMLLiteral would need to take into account the original I18N motivations for this feature (e.g. Ruby markup, bidi etc)

People Interested in Doing The Work

None yet.  :-)


3.5 Literals as Subjects

Should literals be allowed as subjects of triples (or even blank nodes and literals as predicates)?

It was the sense of the workshop that this change would not be worth doing.


Why do this work now?
  • Literals as subjects has been on the list of pain points for RDF for a long time, e.g., 3 is the square root of 9, "5678301" is the title of a B52 song. Another use case: in describing HTTP semantics, it is necessary to talk about URIs as strings (rather than as RDF names). Another use case: it would facilitate datatype reasoning, rather than making a lot of blank nodes.
  • The change to some implementations would be trivial. Many triple stores would even be simplified.
  • Some or all filtering of inference results would not be required.
Reasons to not do this now
  • The inference rules would change.
  • The changes to allow this in RDF/XML would be quite complex - some shorthands might not be allowed.
  • More people would inappropriately use strings instead of other entities, e.g., "London" for http://...London.
  • This would diverge RDF and OWL even further.
  • Changes to the OWL documents might be "required".
Proposals
  • It is possible to do _:n owl:sameAs literal. and then use _:n. This can be written as a Best Practice.
  • Collect use cases to see if any are sufficiently compelling.
Likely Technical Issues
  • Determining the correct set of inference rules may be problematic.
People Interested in Doing The Work

none yet


3.6 Semantics for the Next Steps

Updating the semantics to handle extensions added to RDF, e.g., named graphs. This could be very tricky for the current style of the RDF semantics, particularly if there is interesting intended meaning to capture.


Why do this work now?
  • Whatever the rationale is for the extension.
Reasons to not do this now
  • Tricky semantics may be needed.
Proposals
  • Go to the unique-model semantics, where much would be easier.
  • Require that any extension come with a semantics.
Likely Technical Issues
  • If the RDF and RDFS semantics become unique-model semantics then any extension may become simple.
  • Issues from Carroll et al. include special interpretation of graph names and a built-in subGraphOf predicate.
  • A further issue with named graphs is the interactions, if any, between multiple graphs.
People Interested in Doing The Work

Willing to work on it: None Yet


4 Graph Identification and Metadata Work Items

Produce a W3C Recommendation which provides for interoperability for selected use cases for reification, named graphs, graph literals, annotations, etc.

Breakout session minutes are available at http://www.w3.org/2010/06/27-rdfn-meta-minutes.html

4.1 Graph Identification

Why do this work now?
  • widely used by the community
  • part of SPARQL already
  • numerous use cases
  • clarify confusion in implementation
Reasons to not do this now
  • adds complication and may not solve the issue nevertheless
  • complicates the RDF model (potentially)
  • risks with backward compatibility should be assessed (e.g., syntax)
  • does it need standardization?
Proposals

http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/carroll-ISWC2004.pdf

Likely Technical Issues
  • mutual roles of quads vs. singleton named graphs vs. named graphs
  • extension of the RDF(S) semantics?
  • new RDF(S) terms? rdf:Graph, rdf:subGraphOf, rdf:equivalentGraph, etc.
  • syntax (TRIG, INRIA Member submission, Web, Graphs and Semantics, n3)
  • graph inclusion, can named graphs share triples
  • whether blank nodes can be shared among multiple graphs
  • whether blank nodes can be used as graph names
  • named graphs do not fully replace reification
  • how would follow your nose apply to named graphs?
  • relationships to SPARQL
  • effects on the SW stack
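
To make the quads versus named graphs question concrete, the Python sketch below groups quads by their fourth element into a mapping from graph name to a set of triples. The data and the ex: names are invented for illustration, and the sketch takes no position on the semantics; it only shows that, in a quad view, the same triple can appear in more than one named graph.

```python
from collections import defaultdict


def group_quads(quads):
    """Group (s, p, o, graph_name) quads into {graph_name: set of triples}."""
    dataset = defaultdict(set)
    for s, p, o, g in quads:
        dataset[g].add((s, p, o))
    return dict(dataset)


quads = [
    ("ex:a", "ex:knows", "ex:b", "ex:g1"),
    ("ex:a", "ex:knows", "ex:b", "ex:g2"),
    ("ex:b", "ex:name", "Bob", "ex:g2"),
]
dataset = group_quads(quads)

# Triples that occur in both named graphs.
shared = dataset["ex:g1"] & dataset["ex:g2"]
print(shared)
```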
People Interested in Doing The Work

Axel Polleres, Thomas Lörtsch, Fabien Gandon, Elisa Kendall, Jeremy Carroll


4.2 Annotations

Discussion of whether or not this should be in the charter, or whether this should be a 'time permitting' / 'may'. Use cases for provenance, annotations in general, time dependent features, etc. should be considered in defining named graphs.


Why do this work now?
  • Provenance incubator needs the foundation for providing annotations in RDF, which is currently unclear, but support for named graphs is a first step towards addressing this - there may be additional features needed, which will depend on how named graphs is implemented
  • lack of agreement on how to implement annotations generally
  • there are widely deployed implementations using named graphs and annotations, but there needs to be guidance on best practices for annotations in particular
Reasons to not do this now
  • outside the direct scope of the RDF Core effort, should be done in an XG or elsewhere
  • RDF is domain independent, and should remain so
  • practical reasons, including resource availability at W3C
  • other tools for vocabulary development
Proposals
  • dependent on the solution for named graphs
Likely Technical Issues
  • how do you say something about a named graph - if you make statements about a particular named graph, where should this be stored
  • solution would require clear specification of the impact on the semantics
People Interested in Doing The Work

None Yet - someone from provenance incubator group


5 Linked Data Work Items

Provide guidance and recommendations that support the publication and use of Linked Data.

Break-out meeting IRC minutes: http://www.w3.org/2010/06/27-rdfn-lod-minutes.html

5.1 Codify the follow-your-nose approach to using URIs in RDF

Despite the centrality of URIs in the RDF data model, the RDF specifications treat URIs simply as opaque identifiers that conform to a certain syntax. This reflects neither current practice nor the intention behind RDF and URIs. The W3C's RDF Recommendations lack any indication as to why http://google.com/ does not make a good identifier for David Booth, or why it would be a good idea to consult http://xmlns.com/foaf/spec/ to find out what http://xmlns.com/foaf/0.1/name might refer to.

The linked data community has adopted a particular interpretation of URIs in RDF that is widely deployed and well-documented in tutorials and other non-W3C documents. The W3C documents should be updated to reflect this. Most of the ingredients can actually already be found scattered throughout W3C materials: the AWWW document, the httpRange-14 TAG Finding (which is just an email message), the Cool URIs for the Semantic Web Note. They should be tied together into a coherent document.
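
As a small illustration of the first step of follow-your-nose, the Python sketch below splits a URI into the document part a client would dereference and the fragment that names something within it. It performs no network access. For URIs without a fragment ("slash" URIs) the whole URI is what gets requested, and what happens next depends on the server's response (see the 303 and hash vs. slash issues listed below). The helper name is an assumption.

```python
from urllib.parse import urldefrag


def document_to_fetch(uri):
    """Return (document URI, fragment); a client dereferences the document part."""
    doc, fragment = urldefrag(uri)
    return doc, fragment


# Hash URI: the fragment is stripped before any request is made.
print(document_to_fetch("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))

# Slash URI: nothing to strip; the URI itself is requested.
print(document_to_fetch("http://xmlns.com/foaf/0.1/name"))
```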


Why do this work now?
  • Current practice is not reflected in the specs
  • RDF spec treats URIs as opaque names, and that does not reflect deployed reality
  • Clear guidance from a W3C recommendation will aid adoption
  • Foundation for further work on provenance, operational protocols for linked data clients, etc
Reasons to not do this now
  • Current practice is already in place and is working (somewhat).
  • This may not be RDF Core
  • Use of URIs in RDF is still evolving
Proposals
  • Update RDF Semantics to reflect the fact that most URIs are not just opaque strings, but can be resolved to learn more about the referent
  • Named graphs as formalism: The web can be seen as a set of named graphs. Good foundation for further work on provenance etc
Likely Technical Issues
  • How does this relate to RDF semantics?
  • 303 and other
  • Hash vs. slash
  • Widespread unhappiness with the AWWW definition of information resource?
People Interested in Doing The Work
  • David Booth
  • Richard Cyganiak
  • David Wood


References

5.2 Document the social contract around the intended referent of a URI

What is a URI intended to name? What are the responsibilities and expectations of the URI owner, RDF statement author and RDF consumer? How far to dig, in doing follow-your-nose? Compute the complete transitive closure? Resolve ontology URIs? How can the RDF consumer determine what meaning the RDF author intended to convey (whether or not the RDF consumer chooses to use it)?


Why do this work now?
  • Different people are taking different approaches and this is creating lots of confusion.
  • W3C cannot afford more than one WG, so anything that needs to be done in RDF needs to be here.
Reasons to not do this now
  • Seems too much like R&D—too early to standardize
  • Doesn't belong in RDF core—more of a best practices WG.
  • This sounds more like social practice, not technical standard. (Counter argument: AWWW)
Proposals
Likely Technical Issues
  • How does this relate to RDF semantics?
  • How deep to dig in computing the transitive closure of following links?
  • Relationship to inference
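
One way to picture the "how deep to dig" question is a breadth-first traversal with a hop limit. The Python sketch below works over an in-memory link table rather than the live web; the LINKS data, the function name and the choice of a simple depth cut-off are all assumptions, not a proposed policy.

```python
from collections import deque


def follow_links(start, links, max_depth):
    """Return every URI reachable from `start` within `max_depth` hops.

    `links` maps a URI to the URIs mentioned in the document it resolves to.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not dig any deeper from here
        for nxt in links.get(uri, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


# Hypothetical link structure: a -> b -> c -> d.
LINKS = {"ex:a": ["ex:b"], "ex:b": ["ex:c"], "ex:c": ["ex:d"]}
print(sorted(follow_links("ex:a", LINKS, 1)))
```

A large enough max_depth yields the complete transitive closure; a small one corresponds to a consumer that only looks at directly referenced documents.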
People Interested in Doing The Work
  • David Booth


References

5.3 Co-reference vocabulary as alternative to owl:sameAs

Co-reference and other similarity relationships cause issues in linked data. Current practice is to overload the use of owl:sameAs. This is causing difficulties. There are complaints of misuse. We need more guidance and/or vocabulary. One proposal might provide only guidance; another might provide new vocabulary or adopt existing vocabulary.
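
The difficulty with overloading owl:sameAs can be sketched in Python: because owl:sameAs is an equivalence relation, a consumer that takes it at face value merges everything connected by a chain of such links. The example below computes those merged groups with a small union-find; the identifiers are invented, and the second link stands for a loose "similar to" assertion that was published as owl:sameAs.

```python
def same_as_classes(pairs):
    """Group identifiers into equivalence classes implied by sameAs pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


pairs = [
    ("ex:London", "ex2:London"),          # genuine co-reference
    ("ex2:London", "ex3:LondonOntario"),  # loose link misusing sameAs
]
classes = same_as_classes(pairs)
print(classes)
```

A weaker co-reference predicate would let a consumer keep the second link without being forced to merge all three identifiers.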


Why do this work now?
  • Linked data techniques that include owl:sameAs are now being more broadly deployed in other projects. We need to address this before it gets worse.
Reasons to not do this now
  • Maybe too late.
  • SKOS has already defined additional terms. (Counter argument: there still needs to be a best practice document that explains when to use owl:sameAs and when to use SKOS.)
Proposals
Likely Technical Issues
  • Defining the semantics of a weaker owl:sameAs
  • Choosing the weaker owl:sameAs vocabulary
  • Does this belong in RDF or in OWL? This may lead to lengthy arguments. Is the predicate supposed to be viewed under RDF semantics or OWL semantics?
People Interested in Doing The Work
  • David Booth
  • Guus Schreiber
  • Bernard Vatant


References