SIOC/LinkedDataChecklist

From W3C Wiki

For examples of how to implement the guidelines presented on this page for SIOC concepts, see URIs for SIOC

Slicing SIOC data into individual RDF documents

  • Having one RDF document equivalent to each HTML document might be a good idea
    • Pro: simplifies autodiscovery linkage and content negotiation
    • Con: harder to implement because page structure is different for each blog/forum engine
  • Offer different ways to navigate the content, e.g. recent posts, monthly archive, posts by user, posts by topic
    • There will be redundancy in the data; that's not a problem
  • If there is a triple "<aaa> sioc:foo <bbb>", then that triple should be in both the document about <aaa> and the document about <bbb> (backlinks)
  • For Linked Data, links are usually sioc:something properties (domain links), but rdfs:seeAlso is also fine where no adequate domain property exists (see Paging, for example)
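
The backlink rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain (subject, predicate, object) string tuples, an object counts as a resource if it starts with http:// or https://, and the document describing a resource is its URI without the fragment. The function name slice_triples and the example URIs are hypothetical.

```python
from urllib.parse import urldefrag

def slice_triples(triples):
    """Group triples into per-document sets, copying each link to both ends."""
    documents = {}
    for s, p, o in triples:
        # The triple always belongs in the document about its subject.
        documents.setdefault(urldefrag(s).url, set()).add((s, p, o))
        # Backlink: if the object is a resource (assumed: an http(s) URI),
        # the same triple also goes into the document about the object.
        if o.startswith(("http://", "https://")):
            documents.setdefault(urldefrag(o).url, set()).add((s, p, o))
    return documents

post = "http://blog.example.com/sioc/post/1"
user = "http://blog.example.com/sioc/user/thomas#user"
docs = slice_triples([
    (post, "sioc:has_creator", user),
    (post, "dc:title", "Hello"),
])
```

Here the sioc:has_creator triple ends up both in the document for the post and in the document for the user, while the title literal stays only with the post. The redundancy is intentional.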

URIs, information resources vs. things

  • It's important to distinguish between information resources and other things
    • Posts are obviously information resources (documents)
    • People are obviously not
    • Some other things are in a grey area, e.g. is a usergroup an information resource?
    • Test: Do I want to say something about the thing that would not apply to the document about the thing? Thomas is 31 years old, but the page about him is not 31 years old, so they are different. But a blog post is usually exactly as old as the page that shows the blog post (unless perhaps when we also want to model versions, drafts etc)
  • Recipe to generate URIs for information resources:
    • If we do content negotiation: use same as web page URI
    • If we don't do content negotiation: have a new URI, e.g. http://blog.example.com/sioc/{webpageuri}, where {webpageuri} is the path part of the web page's URI
    • Call this information resource the “SIOC profile for XYZ”
    • Serve RDF description of XYZ
  • Recipe to generate URIs for other resources:
    • Take the URI of the RDF document that describes the resource, and append #something, e.g. #user, #site, #forum (or just #thing, #it, #id)
    • In the case of content negotiation, be careful that the #something can't clash with an anchor on the HTML page
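
The two recipes above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names profile_uri and thing_uri are made up for this example, and the /sioc prefix is simply the one from the example URI above.

```python
from urllib.parse import urlsplit

def profile_uri(webpage_uri, conneg, sioc_prefix="/sioc"):
    """URI of the information resource (the "SIOC profile") for a web page."""
    if conneg:
        # With content negotiation the RDF shares the web page's URI.
        return webpage_uri
    parts = urlsplit(webpage_uri)
    # Without it, mint a new URI from the path part of the web page URI.
    return f"{parts.scheme}://{parts.netloc}{sioc_prefix}{parts.path}"

def thing_uri(profile, fragment):
    """URI of a non-document resource described by the given profile."""
    return f"{profile}#{fragment}"

profile = profile_uri("http://blog.example.com/users/thomas", conneg=False)
user = thing_uri(profile, "user")
```

With these inputs, profile is http://blog.example.com/sioc/users/thomas and user is the same URI with #user appended. When conneg is true, the fragment still has to be checked by hand against the anchors used on the HTML page.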

Fun with content negotiation

Content negotiation is quite complicated and takes a lot of work to get right. It's very easy to get it wrong!

  • Consider cases like application/rdf+xml;q=0.1 or text/* or combinations of such patterns
  • Ideally, you want a URI for the negotiated resource (/mypost) and for the HTML (/mypost.html) and for the RDF (/mypost.rdf); the latter two are not strictly required
  • There are two options for implementing negotiation
    • 1. by redirecting from /mypost to the appropriate /mypost.{rdf|html} (downside: an extra HTTP roundtrip, so you lose about 0.3 seconds)
    • 2. by serving the content directly at /mypost and adding a Content-Location: /mypost.{rdf|html} header
    • The second option should be preferred in almost all cases
  • Any response that is subject to negotiation should carry a Vary: Accept header
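
To make the points above concrete, here is a rough sketch of the second option (serving directly with Content-Location). It handles q-values and wildcard ranges such as text/* and */*, but it is deliberately simplified and not a full implementation of HTTP content negotiation. The function names, the VARIANTS table and the choice to prefer HTML on ties are assumptions made for this example.

```python
def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges

def quality(media_type, ranges):
    """q of the most specific range matching media_type, or 0.0 if none."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q

# Listed in order of preference: max() keeps the first entry on ties.
VARIANTS = [("text/html", "html"), ("application/rdf+xml", "rdf")]

def negotiate(path, accept_header):
    """Response headers for the chosen variant, or None (406 Not Acceptable)."""
    ranges = parse_accept(accept_header or "*/*")
    media_type, ext = max(VARIANTS, key=lambda v: quality(v[0], ranges))
    if quality(media_type, ranges) == 0.0:
        return None
    return {
        "Content-Type": media_type,
        "Content-Location": f"{path}.{ext}",
        "Vary": "Accept",
    }
```

For example, negotiate("/mypost", "application/rdf+xml;q=0.1") selects the RDF variant, because HTML is not acceptable at all under that header, while negotiate("/mypost", "text/*") selects HTML.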

Paging

Blogs and forums require paging in many places, e.g. for long lists of posts or comments. Without paging, the HTML pages would become too large and would require too much bandwidth and database resources. The same applies to the SIOC RDF documents: a very large SIOC Forum needs to be broken down into several smaller RDF documents, each with, for example, 20 posts.

  • Use rdfs:seeAlso links from the SIOC item (e.g. forum) to the URLs of the next and previous RDF documents
  • A dedicated paging vocabulary would be good (with pv:nextPage rdfs:subPropertyOf rdfs:seeAlso), because other projects also need this, but this is left as future work
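
A small sketch of the rdfs:seeAlso paging links described above. The ?page=N URL pattern, the function name page_links and the example URIs are assumptions for illustration; any stable URL scheme for the page documents would do.

```python
def page_links(item_uri, doc_uri, total_posts, page, page_size=20):
    """rdfs:seeAlso triples from a SIOC item to its neighbouring page documents."""
    # Ceiling division; an empty forum still has one (empty) page.
    last_page = max(1, (total_posts + page_size - 1) // page_size)
    if not 1 <= page <= last_page:
        raise ValueError(f"page must be between 1 and {last_page}")
    links = []
    if page > 1:
        links.append((item_uri, "rdfs:seeAlso", f"{doc_uri}?page={page - 1}"))
    if page < last_page:
        links.append((item_uri, "rdfs:seeAlso", f"{doc_uri}?page={page + 1}"))
    return links

forum_doc = "http://forum.example.com/sioc/forum/7"
links = page_links(forum_doc + "#forum", forum_doc, total_posts=45, page=2)
```

With 45 posts and 20 posts per page there are three page documents, so page 2 links to both page 1 and page 3, while the first and last pages each get a single link.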

Document metadata

  • Add metadata to RDF documents (SIOC profiles): dc:title, dc:description, foaf:primaryTopic etc
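
As an illustration, the metadata for one SIOC profile could be emitted as Turtle along these lines. This is a sketch with simplifying assumptions: the function names and example values are hypothetical, only backslashes, double quotes and line breaks are escaped in literals, and the URIs are assumed to be already valid for use inside angle brackets.

```python
PREFIXES = (
    "@prefix dc: <http://purl.org/dc/elements/1.1/> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
)

def turtle_literal(text):
    """Quote a string as a Turtle literal (minimal escaping only)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def profile_metadata(doc_uri, topic_uri, title, description):
    """Turtle describing the RDF document itself, not the thing it is about."""
    return (
        PREFIXES
        + f"\n<{doc_uri}>\n"
        + f"    dc:title {turtle_literal(title)} ;\n"
        + f"    dc:description {turtle_literal(description)} ;\n"
        + f"    foaf:primaryTopic <{topic_uri}> .\n"
    )

doc = "http://blog.example.com/sioc/users/thomas"
ttl = profile_metadata(doc, doc + "#user", "SIOC profile for Thomas",
                       "RDF description of the user Thomas")
```

Note that the subject of these triples is the document URI, while foaf:primaryTopic points at the #user resource, in line with the distinction between information resources and things made earlier.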