Linking patterns

At the heart of Linked Data is a simple idea: a client searches its “current data” for links, follows them (normally via HTTP), and receives “more data”.

This “more data” is subject to a number of variables. Among them:

  • Significance and/or authority. “More data” can be critical to any processing application, or merely extra bits of limited interest.
  • Model and syntax. “More data” could be in any of a number of RDF serializations, or in other machine-readable formats like “plain old” XML, or even human-readable prose in HTML or PDF.
  • Vocabulary. In the case of machine-readable formats, especially RDF, the same information can be expressed with various common or custom vocabularies.
  • Size. Data about a city could include as little as its name, or as much as a complete list of streets and major buildings, potentially multiple MiB in size.

Different applications have different needs when it comes to gathering “more data” about resources. For example, a generic RDF crawler might follow all links to build up a comprehensive database, but it might not be interested in non-RDF objects. On the other hand, specialized user agents may seek specific info about a resource of interest, and may be prepared to glean it from a variety of sources from RDF to natural language.

All these applications could benefit from a shared set of linking patterns, which would allow a link to describe itself with regard to the above variables. Then, a Linked Data client could simply look at a link and decide if it wants to follow it or not.

A number of such patterns already exist, but they are scattered over many documents or, often, not specified at all. There is also some confusion, as evidenced by a January 2011 thread on the public-lod mailing list. This wiki page aims to gather informal recommendations on common linking patterns.

“Follow your nose”

The most basic linking mechanism in Linked Data stems from the original guideline:

When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

Dereferencing a URI in search of information about it is sometimes called “following your nose”. It works because URIs in Linked Data are HTTP URIs and, moreover, are specially designed for such usage.

Information thus received is often thought of as “authoritative”, because it comes from the same domain as the URI, so the “owner” of the information and the “owner” of the resource are the same. The Linked Data tutorial discusses in detail what this “authoritative” description should include.
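The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example.org URI and the media types in the Accept header are not taken from any specification cited here, and comparing host names is just a rough stand-in for “same domain”.

```python
from urllib.parse import urldefrag, urlsplit
from urllib.request import Request

# Assumed preference order; the text does not prescribe particular media types.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, */*;q=0.1"


def build_lookup_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET for the document describing `uri`."""
    # A fragment identifier is never sent over HTTP, so strip it first.
    document_url, _fragment = urldefrag(uri)
    return Request(document_url, headers={"Accept": RDF_ACCEPT})


def same_host(resource_uri: str, document_url: str) -> bool:
    """Rough authority check: was the description served by the URI's own host?"""
    return urlsplit(resource_uri).hostname == urlsplit(document_url).hostname


# Hypothetical resource URI, used for illustration only.
request = build_lookup_request("http://example.org/id/alice#me")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup. The URL of the response (after any redirects) could then be handed to same_host to decide whether to treat the description as “authoritative”.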

“Nose-following” does not work with information resources that cannot embed RDF. For example, the W3C logo in PNG format cannot describe itself at its own URI. In this case, other approaches can be followed.

Existing RDF terms

rdfs:seeAlso

rdfs:seeAlso is a widely-used predicate for linking a resource to information about it. It is defined by the RDF Schema specification:

A triple of the form: S rdfs:seeAlso O states that the resource O may provide additional information about S. It may be possible to retrieve representations of O from the Web, but this is not required. When such representations may be retrieved, no constraints are placed on the format of those representations.

However, some believe that rdfs:seeAlso should only point to RDF data of limited size, in particular because of how it is used in the FOAF project and the Tabulator data browser.

Other examples of seeAlso usage:

  • the WordPress SIOC exporter mirrors the blog’s pagination in its RDF output and uses rdfs:seeAlso to point to other pages
  • Yahoo! SearchMonkey recommends rdfs:seeAlso (together with media:image) for pointing to an image representing a product

rdfs:isDefinedBy

The RDF Schema spec also introduces isDefinedBy—a subproperty of seeAlso that is defined thus:

A triple of the form: S rdfs:isDefinedBy O states that the resource O defines S.

This property is used for linking RDFS classes and properties to the schema documents that define them; see, for example, the organization ontology.
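Because isDefinedBy is a subproperty of seeAlso, a client that follows seeAlso links and honours the subproperty relation would also pick up isDefinedBy links. A minimal sketch, assuming triples are held as plain (subject, predicate, object) tuples of URI strings; the function name and the example.org data are hypothetical:

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SEE_ALSO = RDFS + "seeAlso"
IS_DEFINED_BY = RDFS + "isDefinedBy"


def more_data_links(triples, subject, definitions_only=False):
    """Return the objects linked from `subject` via rdfs:seeAlso or its subproperty."""
    # rdfs:isDefinedBy is a subproperty of rdfs:seeAlso, so every
    # isDefinedBy link also counts as a seeAlso link.
    wanted = {IS_DEFINED_BY} if definitions_only else {SEE_ALSO, IS_DEFINED_BY}
    return [o for s, p, o in triples if s == subject and p in wanted]


# Hypothetical data, for illustration only.
TRIPLES = [
    ("http://example.org/ns#Widget", IS_DEFINED_BY, "http://example.org/ns"),
    ("http://example.org/ns#Widget", SEE_ALSO, "http://example.org/widgets.rdf"),
    ("http://example.org/ns#Gadget", SEE_ALSO, "http://example.org/gadgets.rdf"),
]

print(more_data_links(TRIPLES, "http://example.org/ns#Widget"))
print(more_data_links(TRIPLES, "http://example.org/ns#Widget", definitions_only=True))
```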

Questions: What exactly does “define” mean? Can isDefinedBy be used as an equivalent to nose-following when the latter is not available for technical reasons?

wdrs:describedby

wdrs:describedby is a predicate defined by the POWDER specification, intended for linking a resource to its description. It has been discussed as an equivalent to rdfs:isDefinedBy for instance (non-vocabulary) data, especially in the context of 303-less “toucan publishing”. It might therefore serve as an equivalent to nose-following, again in cases where the latter is problematic. This is, however, complicated by an error in the POWDER spec.

FIXME: The “toucan publishing” link is now broken. The current one might be http://blog.iandavis.com/2010/11/04/is-303-really-necessary/ .

foaf:page and foaf:homepage

foaf:page, defined in the FOAF vocabulary, “relates a thing to a document about that thing”. Since FOAF’s notion of “document” includes all kinds of data, foaf:page is theoretically equivalent to rdfs:seeAlso. In practice, however, foaf:page is more often used to link to human-readable documents, whereas seeAlso is better suited for links to more RDF.

foaf:homepage is a specialization of foaf:page:

A 'homepage' in this sense is a public Web document, typically but not necessarily available in HTML format. The page has as a topic the thing whose homepage it is. The homepage is usually controlled, edited or published by the thing whose homepage it is; as such one might look to a homepage for information on its owner from its owner. This works for people, companies, organisations etc.

Note that foaf:homepage is an inverse functional property, i.e. two things cannot have the same foaf:homepage.
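One practical consequence is that a client may treat two identifiers sharing a foaf:homepage as naming the same thing. A minimal sketch of that grouping step, assuming triples are plain (subject, predicate, object) tuples; the function name and the example URIs are hypothetical:

```python
from collections import defaultdict

FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"


def group_by_homepage(triples):
    """Map each homepage to the set of subjects claiming it, keeping only shared ones."""
    by_page = defaultdict(set)
    for s, p, o in triples:
        if p == FOAF_HOMEPAGE:
            by_page[o].add(s)
    # Only homepages claimed by more than one identifier tell us anything new.
    return {page: subjects for page, subjects in by_page.items() if len(subjects) > 1}


# Hypothetical data, for illustration only.
TRIPLES = [
    ("http://a.example/people/1", FOAF_HOMEPAGE, "http://alice.example/"),
    ("http://b.example/users/alice", FOAF_HOMEPAGE, "http://alice.example/"),
    ("http://a.example/people/2", FOAF_HOMEPAGE, "http://bob.example/"),
]

for page, subjects in group_by_homepage(TRIPLES).items():
    print(page, sorted(subjects))
```

Whether such identifiers should actually be merged is a policy decision for the application; the sketch only reports the candidates.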

XHTML vocabulary

The XHTML+RDFa spec defines several navigational predicates, among them xhv:prev, xhv:next, xhv:section, xhv:first, xhv:last. It doesn’t mention any limits on the resources’ types, so presumably they could be RDF as well as anything else.

FIXME: where’s the equivalent for RDFa 1.1?

  • The Linked Data API, in particular, uses these terms for linking between pages in list output. However, their semantics are broader: for example, xhv:next can point to the next chapter in a book.
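Following paged output via xhv:next can be sketched as a simple loop. This is an illustration only: fetch_triples is a hypothetical callable that returns the (subject, predicate, object) tuples found at a page URL, the xhv: prefix is assumed to expand to the XHTML vocabulary namespace, and the in-memory PAGES table stands in for real HTTP lookups.

```python
XHV_NEXT = "http://www.w3.org/1999/xhtml/vocab#next"


def walk_pages(first_page, fetch_triples, max_pages=100):
    """Visit pages in order by following xhv:next, guarding against loops."""
    visited = []
    seen = set()
    current = first_page
    while current is not None and current not in seen and len(visited) < max_pages:
        seen.add(current)
        visited.append(current)
        triples = fetch_triples(current)
        # Take the first xhv:next link whose subject is the current page, if any.
        current = next(
            (o for s, p, o in triples if s == current and p == XHV_NEXT), None
        )
    return visited


# Hypothetical paged data, for illustration only.
PAGES = {
    "http://example.org/list?page=1": [
        ("http://example.org/list?page=1", XHV_NEXT, "http://example.org/list?page=2"),
    ],
    "http://example.org/list?page=2": [
        ("http://example.org/list?page=2", XHV_NEXT, "http://example.org/list?page=3"),
    ],
    "http://example.org/list?page=3": [],
}

print(walk_pages("http://example.org/list?page=1", lambda url: PAGES.get(url, [])))
```

The max_pages limit and the seen set are design choices of this sketch: a client has no guarantee that a chain of xhv:next links is finite or free of cycles.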

dc:format

A “format” property is available both in the new and old Dublin Core namespaces. It could be used in conjunction with linking predicates to indicate the format of data in advance. For example:

</id/something> foaf:page </paper.pdf> .
</paper.pdf> dc:format <http://example.net/application/pdf> .

→ From this, an LD client would know that paper.pdf is a PDF document and that there is probably no use in retrieving it (if the client only expects RDF triples).
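The decision described here could look roughly like the following. It is a sketch under assumptions: triples are plain (subject, predicate, object) tuples, both Dublin Core namespaces are checked, and the format URIs are the same kind of example.net placeholders used above, not real media type identifiers.

```python
# "format" exists in both the old (elements) and new (terms) Dublin Core namespaces.
DC_FORMAT_PREDICATES = {
    "http://purl.org/dc/elements/1.1/format",
    "http://purl.org/dc/terms/format",
}

# Placeholder format URIs; real identifiers for media types are an open question.
PDF = "http://example.net/application/pdf"
TURTLE = "http://example.net/text/turtle"


def should_follow(target, triples, wanted_formats):
    """Decide whether a link target is worth retrieving, based on its declared format."""
    declared = {o for s, p, o in triples if s == target and p in DC_FORMAT_PREDICATES}
    # With no declared format nothing can be ruled out in advance, so follow the link.
    return not declared or bool(declared & wanted_formats)


# Hypothetical data mirroring the example above.
TRIPLES = [
    ("/id/something", "http://xmlns.com/foaf/0.1/page", "/paper.pdf"),
    ("/paper.pdf", "http://purl.org/dc/terms/format", PDF),
]

print(should_follow("/paper.pdf", TRIPLES, {TURTLE}))
print(should_follow("/other-doc", TRIPLES, {TURTLE}))
```

Treating an undeclared format as “follow” is one possible policy; a more conservative client might skip such links instead.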

FIXME: what are the real URIs for Internet media types?

xtypes

</id/something> rdfs:seeAlso </mysterious-url> .
</mysterious-url> rdf:type xtypes:Document-RDFSerialisation .

See http://purl.org/xtypes/ . Suggestions on improvement of this namespace can be sent to cjg@ecs.soton.ac.uk.