This document contains information about embedding metadata in W3C Technical Reports (TR) using RDFa.
This document is for review by the Semantic Web Deployment Working Group (SWD) and is subject to change without notice. This document has no formal standing within W3C. Please consult the group's home page and the W3C technical reports index for information about the latest publications by this group.
W3C publishes a number of Technical Reports (TR). Prior to publication, these documents are checked against some strict publication rules ("pubrules"). Once published, these documents are indexed at http://www.w3.org/TR/.
In their current version, pubrules do not require documents to carry explicit, comprehensive, machine-readable metadata. However, they do dictate that each document contain a notable amount of self-descriptive data in its header and first paragraphs, formatted and edited according to fixed conventions.
The W3C internal "TR Automation" Project aims to simplify the publication of Technical Reports. It has produced an XSLT style sheet [XSLT2] that exploits the strict formatting rules of Technical Reports to generate metadata about them in RDF [RDFPrimer]. This style sheet is used at W3C to keep an up-to-date RDF document containing descriptions of all the documents published under http://www.w3.org/TR/. The present document discusses a different approach, based on making the metadata explicit in the document using RDFa [RDFaPrimer].
A combination of some W3C and third-party vocabularies can be used to formally capture the Technical Reports metadata in RDF. The following list summarizes these vocabularies:
Note that some of these vocabularies are published by W3C, but they have no formal standing (they are not W3C Recommendations).
The rest of this document assumes the namespace aliases declared on the html element of the RDFa example below (rec, org, mat, doc, con, dct, owl and xsd).
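As a minimal sketch of how these aliases resolve: a prefixed name such as rec:WD stands for the namespace URI followed by the local name. The prefix table below copies the xmlns declarations from the XHTML example in this document. The function name expand_curie is illustrative only and does not belong to any W3C tool.

```python
# Prefix table copied from the xmlns declarations of the XHTML example.
PREFIXES = {
    "rec": "http://www.w3.org/2001/02pd/rec54#",
    "org": "http://www.w3.org/2001/04/roadmap/org#",
    "mat": "http://www.w3.org/2002/05/matrix/vocab#",
    "doc": "http://www.w3.org/2000/10/swap/pim/doc#",
    "con": "http://www.w3.org/2000/10/swap/pim/contact#",
    "dct": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a prefixed name (hypothetical helper) into a full URI."""
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + reference


print(expand_curie("rec:WD"))
print(expand_curie("dct:title"))
```

A prefix that has not been declared raises KeyError here, which mirrors the fact that an undeclared prefix cannot be resolved to a property or class URI.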
An analysis of W3C Technical Reports and their associated publication process shows that several pieces of metadata could usefully be associated with the documents. The following table is a non-exhaustive list of this metadata. For each item, it suggests which RDF properties can be used to encode it:
|Metadata item||Suggested properties||Use notes|
|Document title and subtitle||dct:title||Editors' Note: Which property for subtitles?|
|Maturity level of the document: Working Draft, Note, Recommendation...||Declare the document an instance of one of these classes: rec:REC, rec:NOTE, rec:WD...|
|Link to the first published version|
|Link to previous published version||doc:obsoletes|
|Link to previous documents that are obsoleted or superseded by the present version (i.e.: "replaces")||rec:supersedes|
|Link to the most up-to-date published version of the current document||doc:versionOf|
|Link to the implementation report||mat:hasImplReport|
|Link to the errata||mat:hasErrata|
|Link to translated versions||mat:hasTranslations|
|Link to the W3C Activity that has produced the document||rec:cites|
|Link to the W3C Working Group that has produced the document||org:deliveredBy, con:homePage||Describe the WG as an anonymous resource|
|Name, affiliation and contact address of the editors / authors||rec:editor, con:fullName, con:mailbox||Describe each editor as a different resource. The FOAF vocabulary [FOAF] can be also used to create expressive descriptions.|
|Link to the patent policy the Working Group is operating under||org:patentRules|
|Deadline for feedback (e.g., for comments to Last Call documents, implementation feedback, etc.)||rec:lastCallFeedBackDue, rec:implementationFeedbackDue|
|Links to / full citations of referenced documents, which can be normative and non-normative|
|Contact details to send feedback to, typically the Working Group public mailing list|
|Link to the archive of received feedback (typically, a link to the archives of the Working Group mailing list)|
|Links to companion documents, for documents which are released as part of a series, such as the RDF specifications.|
|Name of the series editor|
|Link to the patent disclosure statements|
|Link to the changelog|
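To make the table more concrete, the following sketch shows how a few of its rows could translate into RDF triples, represented here as plain Python tuples. It is an illustration under stated assumptions, not part of the TR automation tooling: the function describe_report and the two example.org URIs marked as hypothetical are invented for this example. The namespace URIs are those declared in the XHTML example in this document.

```python
# Namespace URIs as declared in the XHTML example of this document.
REC = "http://www.w3.org/2001/02pd/rec54#"
DOC = "http://www.w3.org/2000/10/swap/pim/doc#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_report(uri, title, maturity, previous=None, latest=None):
    """Return (subject, predicate, object) tuples for one report.

    Simplification: literals and URIs are both plain strings here;
    a real RDF library would distinguish them.
    """
    triples = [
        (uri, RDF_TYPE, REC + maturity),  # maturity level as a class
        (uri, DCT + "title", title),      # document title
    ]
    if previous:
        triples.append((uri, DOC + "obsoletes", previous))
    if latest:
        triples.append((uri, DOC + "versionOf", latest))
    return triples


triples = describe_report(
    "http://www.example.org/TR/2008/WD-example-20080831/",  # hypothetical URI
    "Adding Metadata to W3C Technical Reports",
    "WD",
    previous="http://www.example.org/TR/2006/WD-20060314/",
    latest="http://www.example.org/TR/example/",  # hypothetical URI
)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```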
The list above is a superset of the metadata that is extracted by the style sheet of the TR automation process. The latter can be easily obtained by means of the W3C Online XSLT 2.0 service. For instance, the RDF metadata extracted by the style sheet for this document follows:
Editors' Note: Replace this mock example with more realistic data. The XSLT fails to extract some of these triples because the headings of this document are not complete.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:doc="http://www.w3.org/2000/10/swap/pim/doc#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rec="http://www.w3.org/2001/02pd/rec54#"
         xmlns:org="http://www.w3.org/2001/04/roadmap/org#"
         xmlns="http://www.w3.org/2001/02pd/rec54#"
         xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <TRPub rdf:about="">
    <dc:date>0001-01-01</dc:date>
    <dc:title>Adding Metadata to W3C Technical Reports</dc:title>
    <doc:versionOf rdf:resource=""/>
    <editor rdf:parseType="Resource">
      <contact:fullName>UNKNOWN Diego Berrueta</contact:fullName>
    </editor>
  </TRPub>
</rdf:RDF>
Note that the XSLT style sheet simply extracts the full name of the editors/authors and their contact address. As part of the W3C internal process to automate the listing of TR documents, this information is later matched against a manually-maintained list of "known" people. The insufficient mark-up in the original documents makes it impossible to fully automate the extraction of people's data.
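The extracted RDF/XML can be consumed with any XML or RDF toolkit. The sketch below reads a trimmed copy of the mock listing with Python's standard xml.etree.ElementTree module and collects the title, date and editor names. It assumes the Dublin Core elements are written with the dc prefix declared in the listing's namespace list. The function name summarize is hypothetical, and a generic XML parser only handles this particular RDF/XML serialization. A real consumer would use an RDF parser instead.

```python
import xml.etree.ElementTree as ET

# Prefixes used in the XPath-like queries below.
NS = {
    "rec": "http://www.w3.org/2001/02pd/rec54#",
    "contact": "http://www.w3.org/2000/10/swap/pim/contact#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed copy of the mock style sheet output.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.w3.org/2001/02pd/rec54#"
    xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <TRPub rdf:about="">
    <dc:date>0001-01-01</dc:date>
    <dc:title>Adding Metadata to W3C Technical Reports</dc:title>
    <editor rdf:parseType="Resource">
      <contact:fullName>UNKNOWN Diego Berrueta</contact:fullName>
    </editor>
  </TRPub>
</rdf:RDF>"""


def summarize(rdf_xml):
    """Return one dict per TRPub element found in the RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    reports = []
    for pub in root.findall("rec:TRPub", NS):
        editors = pub.findall("rec:editor/contact:fullName", NS)
        reports.append({
            "title": pub.findtext("dc:title", namespaces=NS),
            "date": pub.findtext("dc:date", namespaces=NS),
            "editors": [editor.text for editor in editors],
        })
    return reports


print(summarize(SAMPLE))
```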
Although the RDFa technology [RDFaSyntax] has not yet reached W3C Recommendation status, the pubrules allow Technical Reports (except for Recommendations) to use XHTML+RDFa (see June 24, 2008 announcement and current TR pubrules concerning normative representations).
RDFa can be used to enrich Technical Reports with comprehensive metadata. Moreover, the strict structure enforced by the pubrules makes it easy to decorate the markup with RDFa attributes. In many cases, there is no need to introduce redundant mark-up or data, although fine-grained annotation may require auxiliary mark-up.
At the moment, RDFa has only been specified for XHTML 1.1. Technical Reports that use HTML4 or XHTML 1.0 cannot include RDFa attributes, because their mark-up would no longer validate. Similarly, TR editors who write their documents in non-HTML formats (e.g., XML Spec) and later convert them to (X)HTML must wait until RDFa support becomes available in the tools they use.
The use of RDFa to add metadata to a W3C Technical Report is illustrated by this document, which has been augmented with RDFa markup. It successfully passes the W3C markup validator, and its metadata can be extracted with the W3C RDFa Distiller service. Check the HTML source of this document for details, or read the example below.
Some steps to add RDFa to a Technical Report are described below. Note, however, that the authoritative sources on RDFa usage are the RDFa Syntax [RDFaSyntax] and the RDFa Primer [RDFaPrimer]. The present document is not a substitute for either of these sources.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  ...
</html>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rec="http://www.w3.org/2001/02pd/rec54#"
      xmlns:org="http://www.w3.org/2001/04/roadmap/org#"
      xmlns:mat="http://www.w3.org/2002/05/matrix/vocab#"
      xmlns:doc="http://www.w3.org/2000/10/swap/pim/doc#"
      xmlns:con="http://www.w3.org/2000/10/swap/pim/contact#"
      xmlns:dct="http://purl.org/dc/terms/"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  ...
</html>
<body xml:lang="en" typeof="rec:WD">
  ...
</body>
Editors' Note: The class of "Editor's drafts" is not defined in the rec: ontology. Therefore, this document (and this example) uses the rec:WD class, although at this point the document is not a WD but an ED.
<h1 id="title" property="dct:title">
  Adding Metadata to W3C Technical Reports
</h1>
<h2 id="w3c-doctype">
  W3C Working Draft
  <span property="dct:date" datatype="xsd:date" content="2008-08-31">
    31 August 2008
  </span>
</h2>
<dl>
  ...
  <dt>Previous version:</dt>
  <dd>
    <a rel="doc:obsoletes"
       href="http://www.example.org/TR/2006/WD-20060314/"
       >http://www.example.org/TR/2006/WD-20060314/</a>
  </dd>
  ...
</dl>
<dl>
  ...
  <dt>Editors:</dt>
  <dd rel="rec:editor">
    <span typeof="con:Person">
      <span property="con:firstName">Diego</span>
      <span property="con:familyName">Berrueta</span>
      <span rel="owl:sameAs" resource="http://berrueta.net/foaf.rdf#me"/>
    </span>, Fundación CTIC
  </dd>
  ...
</dl>
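The title and date markup in the steps above relies on one RDFa rule: for an element carrying a property attribute, the literal value is taken from the content attribute when it is present, and from the element text otherwise. The sketch below illustrates only that rule, using Python's standard library. It is not a conforming RDFa processor, and the W3C RDFa Distiller should be used for real extraction. The helper names literal_properties and looks_like_xsd_date are hypothetical. The copy of the markup writes the date as 2008-08-31, the YYYY-MM-DD lexical form that xsd:date requires. The date check is deliberately simplified and ignores time zones and negative years.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

SNIPPET = """<div xmlns="http://www.w3.org/1999/xhtml">
  <h1 id="title" property="dct:title">
    Adding Metadata to W3C Technical Reports
  </h1>
  <h2 id="w3c-doctype">
    W3C Working Draft
    <span property="dct:date" datatype="xsd:date" content="2008-08-31">
      31 August 2008
    </span>
  </h2>
</div>"""

_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def literal_properties(xhtml):
    """Map each property attribute to its literal value."""
    found = {}
    for element in ET.fromstring(xhtml).iter():
        prop = element.get("property")
        if prop is None:
            continue
        # The content attribute, when present, overrides the element text.
        value = element.get("content")
        if value is None:
            value = "".join(element.itertext())
        # Whitespace is collapsed here for readability only.
        found[prop] = " ".join(value.split())
    return found


def looks_like_xsd_date(value):
    """Simplified check for the YYYY-MM-DD form of xsd:date."""
    match = _DATE.fullmatch(value)
    if match is None:
        return False
    try:
        date(*map(int, match.groups()))
    except ValueError:
        return False
    return True


props = literal_properties(SNIPPET)
print(props)
print(looks_like_xsd_date(props["dct:date"]))
```

Under this simplified check, a value written without hyphens, such as 20080831, is rejected, so a date in that form would not be a valid xsd:date literal.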
GRDDL [GRDDL] is a W3C Recommendation that defines a mechanism for declaring that a document contains RDF-compatible data and for linking to algorithms that can extract this data from the document. Typically, these algorithms are written in XSLT [XSLT2].
Unfortunately, the XSLT style sheet produced by the TR automation project cannot be directly used with GRDDL due to its internal modular structure.
(To be completed).