DereferenceURI

From W3C Wiki
Revision as of 11:00, 11 August 2012 by Zruset (Talk | contribs)


Dereferencing a URI to RDF

The operation of dereferencing a URI is a crucial one on the Web. This article discusses the operation on the Semantic Web. The Semantic Web is in essence a mapping of URIs to logic statements in some language.

In Cwm, the mapping is represented by the log:semantics property, which relates an information resource to the N3 formula that results from accessing the information resource on the web and extracting the N3 semantics from it.

While in the long term this function may be extended with new technology, in general a common understanding of that mapping is really valuable for consistent communication. This is an attempt to capture existing practices. It may also show a need for standards to bridge a gap in the FollowYourNose path.

It is useful if, when a URI for something is dereferenced, the information returned includes information about the thing identified by the URI.

This article currently deals only with the dereferencing of HTTP URIs, and only with the RDF semantics of resources.

The Hash

The hash, #, separates the part of the URI to be looked up, here called the 'document URI', from the rest of the URI (the fragment identifier). When a hash is present, the lookup operation is performed on the document URI.
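As a minimal sketch of this split, Python's standard urllib.parse.urldefrag can be used to recover the document URI. The function name document_uri and the example.org URI are illustrative assumptions, not part of any specification.

```python
from urllib.parse import urldefrag


def document_uri(uri: str) -> str:
    """Return the part of the URI before the hash, which is what gets looked up."""
    doc, _fragment = urldefrag(uri)
    return doc


# Hypothetical URI, for illustration only.
print(document_uri("http://example.org/foaf.rdf#me"))
```

A URI without a hash is returned unchanged, so the same function can be applied to every URI before lookup.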

HTTP use

HTTP Accept headers

The Accept headers should be set to the languages supported; according to this convention, application/rdf+xml and text/rdf+n3.
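A client following this convention might build its request roughly as follows. This is a sketch using Python's standard urllib.request; the names RDF_ACCEPT and build_request and the example URI are assumptions made for illustration, and no quality values are set because the article does not specify any.

```python
from urllib.request import Request

# The two media types named by this convention.
RDF_ACCEPT = "application/rdf+xml, text/rdf+n3"


def build_request(doc_uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF/XML or N3."""
    return Request(doc_uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical document URI; pass the request to urllib.request.urlopen to send it.
request = build_request("http://example.org/data")
print(request.get_header("Accept"))
```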

HTTP Content-type

The content type returned is definitive, and determines the way information can be extracted. The following sections detail the treatment of various content types.

application/rdf+xml: RDF/XML

This is parsed according to the spec @@ref. An RDF graph is returned.

text/rdf+n3: Notation 3

This is parsed according to the spec, returning an N3 formula, in many cases an RDF graph.

text/html: XHTML

Information is extracted from an HTML document in any of the following ways.

text/html: HTML

If the document is not valid XHTML, then it is tidied until it is, and then treated as XHTML, as follows.

Embedding RDF in XHTML

GRDDL

If

  • there is a profile URI as a profile attribute to the root element (only for an XHTML or HTML document), OR
  • there is a data-view:transformation attribute (where xmlns:data-view="http://www.w3.org/2003/g/data-view#") on the root element

then the GRDDL spec gives the algorithm for finding and using a transform file to apply to the page to get an RDF graph. See GRDDL spec.

This is the way that Microformats, non-RDF RSS, etc. can have RDF semantics. It is important, therefore, that users of these formats remember to include the GRDDL hook attribute for the benefit of the RDF-seeking client.
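The two conditions above could be tested along these lines. This is only a sketch of the check as this article words it (both attributes looked for on the root element), using Python's standard xml.etree.ElementTree; the GRDDL spec remains the authority on where the hooks live and how the transformation is then applied. The function name has_grddl_hook is an assumption.

```python
import xml.etree.ElementTree as ET

DATA_VIEW_NS = "http://www.w3.org/2003/g/data-view#"


def has_grddl_hook(xml_text: str) -> bool:
    """Return True if the root element carries either hook described above."""
    root = ET.fromstring(xml_text)
    # Condition 1: a profile attribute on the root element.
    if root.get("profile") is not None:
        return True
    # Condition 2: a transformation attribute in the data-view namespace.
    return root.get("{%s}transformation" % DATA_VIEW_NS) is not None
```

The input must already be well-formed XML, so an HTML document would need to be tidied first, as described earlier.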

Rel Equals Meta

If there is a link element which has an attribute rel="meta" (see HTML4 spec), then the href attribute gives the URI of a specification which may be dereferenced for RDF semantics. This is common practice, not a spec (yet?): "meta" is not in the list of link types in the HTML4 spec.
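Finding such links can be sketched with Python's standard html.parser. The class and function names, and the example.org URIs in the usage lines, are assumptions for illustration; resolving a relative href against the page URI is also an assumption, since this article does not say how it should be handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaLinkFinder(HTMLParser):
    """Collect the href of every <link> whose rel attribute includes "meta"."""

    def __init__(self, base_uri: str):
        super().__init__()
        self.base_uri = base_uri
        self.meta_uris = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated link types.
        rels = (attributes.get("rel") or "").lower().split()
        if "meta" in rels and attributes.get("href"):
            self.meta_uris.append(urljoin(self.base_uri, attributes["href"]))


def find_meta_links(html_text: str, base_uri: str) -> list:
    finder = MetaLinkFinder(base_uri)
    finder.feed(html_text)
    finder.close()
    return finder.meta_uris


page = '<html><head><link rel="meta" href="about.rdf"/></head></html>'
print(find_meta_links(page, "http://example.org/page.html"))
```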

(There are other linking systems which have been used, but none which has a clear consensus route to RDF. See for example RSS autodiscovery)

XML

[This section is highly presumptive and is not licensed by any of the stack of specifications. Moreover, RDF/XML documents need not have a root rdf:RDF.]

If the document is XML (the content-type contains "+xml" or is "application/xml"), then:

  • If the root element's namespace is the RDF namespace and the root element is <RDF>, then it is parsed as RDF/XML (q.v.)
  • If the root element's namespace is the XHTML namespace and the root element is <html>, then the document is treated as XHTML (q.v.)
  • If the document is XML but not recognized as being in the RDF or XHTML namespace, then the GRDDL method is used (see above)
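The three rules above can be sketched as a check on the root element, here with Python's standard xml.etree.ElementTree. The function name classify_xml and the returned labels are assumptions; like the section itself, this is a presumptive heuristic and not something licensed by the specifications.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def classify_xml(xml_text: str) -> str:
    """Decide how to treat an XML document from its root element alone."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace}localname".
    if root.tag == "{%s}RDF" % RDF_NS:
        return "rdfxml"
    if root.tag == "{%s}html" % XHTML_NS:
        return "xhtml"
    # Any other XML falls through to the GRDDL method.
    return "grddl"
```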

text/plain

This is NOT a way to serve RDF, or anything other than human-readable content.

Any attempt to parse text/plain as anything else should be accompanied by warnings and beeps and fireworks.
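The content-type treatment described in the sections above can be summarised as a small dispatch function. This is a sketch only: the function names and the returned labels are assumptions, and stripping parameters such as charset before comparing is an assumption the article does not spell out.

```python
def media_type(content_type: str) -> str:
    """Strip parameters such as "; charset=utf-8" and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def choose_treatment(content_type: str) -> str:
    """Map a Content-Type header value to the treatment described above."""
    mt = media_type(content_type)
    if mt == "application/rdf+xml":
        return "rdfxml"
    if mt == "text/rdf+n3":
        return "n3"
    if mt == "text/html":
        return "xhtml"   # tidy first if the document is not valid XHTML
    if mt == "application/xml" or "+xml" in mt:
        return "xml"     # inspect the root element
    if mt == "text/plain":
        return "refuse"  # not a way to serve RDF
    return "unknown"
```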

Prospective Dereferencing

It's worth noting that for a URI without a definitive content-type returned by a request (with Accept headers) there is some merit in attempting to parse it as Notation 3 first. A Notation 3 document will always require less effort to parse than its RDF/XML equivalent. Upon failing an attempt to parse a Notation 3 document in this scenario, the client could then proceed through some incremental steps (stopping at any point of having successfully extracted RDF) to 'sniff' RDF from the unknown URI. For example:

  • Attempt to parse as XML (following the steps outlined above if successful)
  • Attempt to tidy the content and treat as XHTML (following the steps above)
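The try-in-order strategy above might be sketched as follows. The parsers themselves are deliberately left as caller-supplied callables, since this article does not name a particular N3, XML, or tidy implementation; the name sniff_rdf and the convention that a parser signals failure by raising an exception are assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

# A parser takes the response body and either returns a result or raises.
Parser = Callable[[str], object]


def sniff_rdf(body: str,
              attempts: Sequence[Tuple[str, Parser]]) -> Optional[Tuple[str, object]]:
    """Try each parser in order and stop at the first one that succeeds."""
    for name, parse in attempts:
        try:
            result = parse(body)
        except Exception:
            # Assumption: any exception means "not this format"; try the next one.
            continue
        return name, result
    return None
```

A caller would pass the attempts in the order suggested here, for instance Notation 3 first, then XML, then tidy-and-treat-as-XHTML.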

HTTP Caching

Wherever possible, HTTP caching should be leveraged to minimize any redundancy when dereferencing a URI repeatedly. Redfoot's program loading mechanism stores the HTTP headers of a dereferenced URI as RDF statements about the URI in a provenance graph for this sole purpose. A general protocol is outlined. See also: A RESTful Scutter Protocol for Redfoot Kernel
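One common way to reuse stored response headers is an HTTP conditional request. The sketch below is not Redfoot's mechanism: it assumes the previously seen headers are kept in a plain dictionary rather than a provenance graph, and the function name conditional_headers is hypothetical.

```python
def conditional_headers(cached_headers: dict) -> dict:
    """Turn validators from an earlier response into conditional request headers."""
    headers = {}
    etag = cached_headers.get("ETag")
    if etag:
        headers["If-None-Match"] = etag
    last_modified = cached_headers.get("Last-Modified")
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

If the server answers such a request with 304 Not Modified, the client can keep using the graph it extracted last time instead of fetching and parsing the document again.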

Recursive Dereferencing

RDF Graphs are very often linked by the following relations:

The last two have stronger semantics and almost always link to an RDF graph. It's common practice for RDF clients to attempt to follow such links to a fixed recursion depth, and doing so introduces a high risk of circular references (think FOAF networks). HTTP caching can help alleviate this risk by essentially allowing a traversal through such links to avoid previously visited nodes.

Recursive dereferencing results in a Web Closure of the linked graphs.
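A depth-limited traversal that skips already visited documents might look like the following sketch. It tracks visited URIs in an in-memory set, which is a simplification of the cache-based approach described above. The name web_closure is an assumption, and fetch_links stands in for whatever dereferences a URI and returns the URIs its graph links to.

```python
from typing import Callable, Iterable, Set


def web_closure(start_uri: str,
                fetch_links: Callable[[str], Iterable[str]],
                max_depth: int) -> Set[str]:
    """Follow links breadth-first to a fixed depth, visiting each URI once."""
    visited: Set[str] = set()
    frontier = [start_uri]
    for _depth in range(max_depth + 1):
        next_frontier = []
        for uri in frontier:
            if uri in visited:
                continue  # circular reference: already dereferenced
            visited.add(uri)
            next_frontier.extend(fetch_links(uri))
        frontier = next_frontier
    return visited
```

Because each URI is dereferenced at most once, a cycle such as two FOAF files pointing at each other ends the traversal instead of looping.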