Why the QA Matrix in RDF?

Or why RDF has made our lives easier.

The QA Matrix presents information about W3C specifications that is useful for implementors, formatted as a summary table. This document tells the story of why it has been beneficial to use the RDF model to manage this information. If you are looking for information on how the plumbing of The Matrix works (programs, scripts, etc.), you will find it in the manual for editing The QA Matrix.

The information is published on the Web site by using an XSLT stylesheet that transforms the RDF/XML information into XHTML.

The past and its hurdles

At first, The Matrix was managed using a custom XML vocabulary. A typical entry for a specification was organized as in this excerpt for CSS 1:

<spec>

   <uri>http://www.w3.org/TR/REC-CSS1</uri>
   <name>Cascading Style Sheets, level 1</name>
   <acronym>CSS1</acronym>

   <history>
      <status name="rec"
         uri="http://www.w3.org/TR/1999/REC-CSS1-19990111"
         date="1999-01-11"/>
   </history>

   <validator status="yes">
      <uri>http://jigsaw.w3.org/css-validator/</uri>
   </validator>

   <ts origin="w3c">
      <uri>http://www.w3.org/Style/CSS/Test/</uri>
   </ts>

   <related>

      <info>
         <uri>http://www.w3.org/Style/CSS/Errata/REC-CSS1-19990111-errata</uri>
         <text>Errata</text>
      </info>

      <info>
         <uri>http://www.w3.org/TR/CSS-access</uri>
         <text>Accessibility feature of CSS</text>
      </info>

   </related>

   <conf>
      <uri>http://www.w3.org/TR/1999/REC-CSS1-19990111#css1-conformance</uri>
   </conf>

</spec>

This snippet seems easy to read, but the file, although human-readable, is very constraining to edit. We identified several issues with this model:

Information redundancy: Part of the information present in The Matrix is already available on the W3C technical reports index (the "TR page"). The W3C Webmaster, who is in charge of publishing W3C technical reports, maintains an exact record of all the information related to the TR space. The same information therefore lived in two different places: in the XML file and on the TR page.
Typos, spelling mistakes: The information (specification names, URIs, etc.) had to be copied from other sources, e.g. the TR page. During this step, it was quite easy to make typos or spelling mistakes, resulting in information that differed between the TR space and the QA Matrix.
Unreadability of the file: We said earlier that the XML file was easy to read; in fact that's not completely true. As the file grows, it becomes very difficult to locate and update information in a dense stream of data.
Weakness of the tree structure: If the maintainer of The Matrix was not very careful about the ordering of the XML tags in a given section, the XSLT which creates the XHTML version could break. XML by itself provides no data model that establishes the relationships between pieces of data. Therefore the final result depends on the way the XML file was edited.
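The ordering problem can be illustrated with a small sketch. The original stylesheet is not reproduced here, so the positional lookup below is only an assumption about the kind of rule that breaks; the function names are hypothetical, and Python stands in for XSLT.

```python
import xml.etree.ElementTree as ET

# Two renderings of the same facts; only the order of the child elements differs.
ORIGINAL = (
    "<spec>"
    "<uri>http://www.w3.org/TR/REC-CSS1</uri>"
    "<name>Cascading Style Sheets, level 1</name>"
    "</spec>"
)
REORDERED = (
    "<spec>"
    "<name>Cascading Style Sheets, level 1</name>"
    "<uri>http://www.w3.org/TR/REC-CSS1</uri>"
    "</spec>"
)

def link_by_position(xml_text):
    """Fragile (hypothetical) rule: assumes <uri> comes first and <name> second."""
    spec = ET.fromstring(xml_text)
    return (spec[0].text, spec[1].text)

def link_by_name(xml_text):
    """Sturdier rule: looks each child up by its element name."""
    spec = ET.fromstring(xml_text)
    return (spec.findtext("uri"), spec.findtext("name"))
```

With the reordered input, the positional rule silently swaps the URI and the title, while the name-based rule still returns the same pair.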

Each of these reasons on its own is an argument for managing The Matrix with an RDF model.

A new model for The Matrix

But was switching to RDF really necessary? We first had to establish a set of requirements for the new system. A database would have been possible, but it would be overkill for a task that consists of maintaining one file and publishing an HTML version of it.

Requirements

Easier to maintain (more tolerant, more straightforward)
Avoiding duplication of information

Using an RDF system to manage The Matrix was not a religious choice, but a practical one. The abstract model of RDF, which establishes formal relationships between objects, gives a more constrained model for the data and therefore a less constraining way of managing it.

The W3C Webmaster keeps a record of the W3C specifications in an RDF file, called tr.rdf. This file contains information like the title, status of the specification, publication date, names of the editors, etc. The document Technical Reports Management Automation explains how the TR space and the RDF file for specifications are managed.

Preparation of the data

We had to design a QA vocabulary for The Matrix as an RDF schema (done once, and extensible if needed), and we had to convert the information in the existing XML file into a knowledge base written in Notation3 (N3). We chose N3 (a non-XML syntax for RDF) for its compactness, which keeps the file we maintain less bloated.

This is an excerpt of The Matrix base file in N3 for the CSS1 Recommendation:

<http://www.w3.org/TR/REC-CSS1> a :Work;

:hasAcronym "CSS1";
:hasConformanceSection 
    <http://www.w3.org/TR/1999/REC-CSS1-19990111#css1-conformance>;
:hasErrata 
    <http://www.w3.org/Style/CSS/Errata/REC-CSS1-19990111-errata>;
:hasTestSuite 
    <http://www.w3.org/Style/CSS/Test/>;
:hasValidator 
    <http://jigsaw.w3.org/css-validator/>;
:isRelatedTo 
    <http://www.w3.org/TR/CSS-access> .

At first glance, you can see that it is easier to read than the XML version. Notice that the information about the latest version, status, and title of the specification has disappeared. We don't have to add it to this file; it is managed by another person (the W3C Webmaster) in another file.

Why is it easier?

Let us discuss how the management of the QA matrix has been made easier by using an RDF model.

Easier: no redundancy

We no longer have to look up information already defined in the W3C Webmaster's file for the TR space (tr.rdf). The only reference the two files share is the URI of the specification. The rest, like the title or the latest status of the specification, is extracted automatically. We make fewer mistakes (typos, spelling), and we save time by not having to look for the information and type it in.

Let's look again at the previous excerpt for CSS 1.

<http://www.w3.org/TR/REC-CSS1> a :Work;

:hasAcronym "CSS1";

[...]

:isRelatedTo 
    <http://www.w3.org/TR/CSS-access> .

As you can see, the file contains no information about the status of the specification (Working Draft, Proposed Recommendation, etc.), nor about its title ("Cascading Style Sheets, level 1" in this case). All this information is extracted automatically from the W3C Webmaster's file.
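The way the two files meet on the specification URI can be sketched in a few lines. This is a minimal illustration, not the actual pipeline: the triples are plain tuples, the property names used for the TR data (dc:title, tr:status) and the status value are assumptions, and describe is a hypothetical helper.

```python
CSS1 = "http://www.w3.org/TR/REC-CSS1"

# Stand-in for facts maintained by the W3C Webmaster in tr.rdf.
# The property names and the status value are assumptions.
tr_triples = {
    (CSS1, "dc:title", "Cascading Style Sheets, level 1"),
    (CSS1, "tr:status", "REC"),
}

# Stand-in for facts maintained in the N3 file of The Matrix.
matrix_triples = {
    (CSS1, "rdf:type", ":Work"),
    (CSS1, ":hasAcronym", "CSS1"),
    (CSS1, ":hasValidator", "http://jigsaw.w3.org/css-validator/"),
}

def describe(subject, *graphs):
    """Collect every property of `subject` from the union of the graphs."""
    merged = set().union(*graphs)
    description = {}
    for s, p, o in merged:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description

row = describe(CSS1, tr_triples, matrix_triples)
```

Because both sets of statements use the same URI as subject, their union describes a single resource, and the title never has to be retyped in the second file.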

Easier: RDF is for the lazy person

Maintaining

RDF is definitely made for the lazy person. Let's say a new specification has been released and we want to add it to the QA Matrix. We just write its reference URI and a set of properties with their respective values, and that's it. For example, suppose we have LazyML:

# this is an excerpt of the file matrix-base.n3

<http://www.w3.org/TR/LazyML> a :Work;

:hasAcronym "Lazy" .

We don't have all the information yet. It's not a problem; we will add it later without breaking anything.
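A small sketch shows why an incomplete entry is harmless. How the real stylesheet renders missing values is not described here, so the empty-cell behaviour, the column list, and the matrix_row helper are all assumptions made for illustration.

```python
# Hypothetical subset of the columns shown in the published table.
COLUMNS = [":hasAcronym", ":hasValidator", ":hasTestSuite"]

def matrix_row(properties):
    """Return one table row; a property not yet recorded yields an empty cell."""
    return [properties.get(column, "") for column in COLUMNS]

# The LazyML entry only states an acronym so far.
lazyml = {":hasAcronym": "Lazy"}
row = matrix_row(lazyml)
```

The row is still produced, with empty cells for the properties that will be filled in later.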

Adding more information

Imagine we decide to start keeping track of implementation reports for each specification. We add the property hasImplementationReport to the RDF schema. The maintainer of the RDF schema is not necessarily the same person as the one who adds information to the QA Matrix file (matrix-base.n3). Once the new property is defined in the RDF schema, we can use it without having to supply it for every existing entry in the file. We will just add it step by step.

Suppose we only have the implementation report for CSS1. The maintainer is now able to add a new entry to The QA Matrix file.

# this is an excerpt of the file matrix-base.n3

<http://www.w3.org/TR/REC-CSS1> a :Work;

:hasImplementationReport 
    <http://www.w3.org/example/CSS1-IR> .


<http://www.w3.org/example/CSS1-IR> 
    dc:title """Implementation Report for CSS 1""" .

It is straightforward, and we don't depend on any ordering constraints. We could add it in the middle of the file, at the end, or at the start.
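The absence of ordering constraints follows from the RDF model itself: a graph is a set of statements. The sketch below uses plain tuples as triples; as_graph is a hypothetical helper, not part of the tools used for The Matrix.

```python
CSS1 = "http://www.w3.org/TR/REC-CSS1"
REPORT = "http://www.w3.org/example/CSS1-IR"

# Statements already present in the file.
existing = [
    (CSS1, "rdf:type", ":Work"),
    (CSS1, ":hasAcronym", "CSS1"),
]

# Statements for the new implementation report.
added = [
    (CSS1, ":hasImplementationReport", REPORT),
    (REPORT, "dc:title", "Implementation Report for CSS 1"),
]

def as_graph(statements):
    """An RDF graph is a set of triples, so statement order carries no meaning."""
    return frozenset(statements)

at_end = as_graph(existing + added)
at_start = as_graph(added + existing)
in_middle = as_graph(existing[:1] + added + existing[1:])
```

All three placements yield the same graph, which is why the new statements can go anywhere in the file.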

Easier: Extreme flexibility

We have seen that the RDF model gives us flexibility in editing our file. It also gives flexibility in the overall management of information. Once the schema and vocabulary are defined, we no longer have to worry about updating information that we use from other files. In practice, this means that the W3C Webmaster manages the TR file and the person in charge of the QA Matrix manages the N3 file, without needing to coordinate updates. The RDF model gives great flexibility in the management of shared but separately maintained information.

Time for a drink

As we have seen, managing The Matrix in RDF has made maintenance easier and saved us time. It's time to have a drink with this new free time. Oh yes, before we disappear for this drink: another benefit, not for us this time, but for you. If you want to reuse the data of The Matrix in RDF, you don't have to write an XSLT to transform it from one XML format to another and figure out how to make it compatible with your own data. You can just use the file, as we did for the TR space. The information can be used right away without worrying about its organization, because it's already organized.