Role

W3C maintains an internationally distributed network of servers and services that support Public, Member, and Team audiences in pursuit of the Consortium's technical and social objectives. W3C uses these systems to manage its Activities and Working Groups according to the W3C Process.

Design

W3C's systems infrastructure is based almost entirely on open source software running on Debian GNU/Linux servers. Many of our tools are built on the popular LAMP platform (Linux, Apache, MySQL, and the Perl, PHP, or Python scripting languages).

Status

We do our best to document known outages and disruptions to our services on a Web page; should you encounter a problem with one of our services that is not documented there, please let us know at <web-human@w3.org>.


Tracking requests

W3C is a fifteen-year-old organization where plenty of people come to collaborate, and they vary widely in terms of operating systems, computer proficiency, corporate setup, etc. A number of our users manage to be even geekier than we are in the Systems Team, while for many others, a computer is just a tool that really ought to "just work" (which it still often doesn't).

The result of that interesting mix is that, over time, we have set up a fairly large number of tools to facilitate collaboration: several hundred mailing lists, an IRC server with a few handy bots, a fine-grained access control system, a questionnaire system, a flexible editing system combined with a robust mirroring scheme, wikis, blogs, various bug and issue tracking systems, etc.

And as all these tools are entirely bug-free and work seamlessly together and for all our users (NOT!), we have always had a need to track requests from our users to create and manage various accounts, set up new instances of these tools, correct or work around bugs, etc.

Over the years, the way we have tracked and managed all these requests has evolved toward somewhat more formality as the number of our users and tools grew.

When I joined the Systems Team in 2000, our tracking was entirely based on manually scanning mail threads sent to one of our internal mailing lists: basically, each of us watched for what looked like a request and checked whether anybody had responded to it; if nobody had, the request was something you could handle, and you had some time on your hands, you would take it on.

Given how informal it was, it actually worked quite well, although over time we added a few more conventions: requiring the use of a specific mailing list for a specific type of request, and adding a [closed] flag to the subject of a thread to signal that the request had been dealt with (so that others could simply delete the thread without bothering to read it).

But as the number of tools and users continued to grow, we started to get complaints that some requests had not been dealt with at all, and it became clear that we lacked an overall view of what still needed a response.

Back in 2002, I wrote a quick XSLT style sheet to give us more visibility on the state of our requests: it took the threaded view of our request mailing list archive, looked for any thread that didn't contain a message starting with our now-conventional [closed] flag, and presented a report showing all the requests that hadn't been closed, as well as those that hadn't had an answer at all.
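The report's logic is simple enough to sketch in a few lines of Python. This is not the original XSLT: it assumes each thread has already been extracted from the archive as a list of messages (oldest first, with the request first), that a thread counts as closed if any message subject starts with [closed], and that a request counts as answered once someone other than the requester has replied. The `Message` class and `request_report` function are hypothetical names.

```python
from dataclasses import dataclass

CLOSED_FLAG = "[closed]"


@dataclass
class Message:
    subject: str
    sender: str


def request_report(threads):
    """Split threads into (not closed, not closed and never answered).

    Each thread is a list of Message objects, oldest first; the first
    message is assumed to be the original request.
    """
    open_threads, unanswered = [], []
    for thread in threads:
        if any(m.subject.lower().startswith(CLOSED_FLAG) for m in thread):
            continue  # someone flagged the request as dealt with
        open_threads.append(thread)
        requester = thread[0].sender
        # Follow-ups from the requester alone don't count as an answer.
        if not any(m.sender != requester for m in thread[1:]):
            unanswered.append(thread)
    return open_threads, unanswered
```

Printing the subjects of the first message of each thread in both lists gives roughly the two-part report described above.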

And again, that fairly simple system served us well for quite a few years; some other W3C groups even started reusing it to track their issues, and a similar version of the tool, based more on Semantic Web technologies, was used by a couple of groups to track Last Call comments on their specifications.

But no matter how well that solution worked, we decided last year to finally move to a proper ticket tracking system, the well-known open-source RT, to get the following advantages over our existing hack:

  • get a clear view of who was working on which ticket;
  • be able to assign a ticket to someone, even if that person hadn't picked it up yet;
  • easily find tickets that were stalled because the requester hadn't responded (as opposed to because we hadn't acted on them).

The first few months with RT weren't quite so rosy, actually, as we had to find ways to integrate it as smoothly as possible into our existing procedures, infrastructure, and mailing list habits.

Some of the changes we've brought to it include:

  • make it treat messages sent as part of a given thread (as identified by the In-Reply-To header) as belonging to the same ticket, even without the ticket id number included, using an existing patch to that end;
  • change the way it modifies message subjects (with the Branded Queues extension);
  • partially fix, through the configuration UI, the way it sends messages and notices;
  • make it understand our [closed] convention, so that we could keep using mail as our primary way to close a ticket, using a simple "Scrip" inspired by another RT user's contribution.
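To illustrate the first and last of those changes, here is a toy Python model of routing incoming mail to tickets. It is not RT's code (RT is written in Perl), nor the patch or Scrip we actually used; the `TicketRouter` class, the "[RT #N]" subject tag pattern and the status values are all hypothetical. A message goes to the ticket named in its subject tag if there is one, otherwise to the ticket of the message it replies to (via In-Reply-To), otherwise to a new ticket; a subject starting with [closed] resolves the ticket.

```python
import email
import re
from itertools import count

# Hypothetical subject tag, e.g. "Re: [RT #42] Need a new list".
TICKET_TAG = re.compile(r"\[RT #(\d+)\]")
CLOSED_FLAG = "[closed]"


class TicketRouter:
    """Toy model of mail-to-ticket routing (not RT's actual code)."""

    def __init__(self):
        self._next_id = count(1)
        self.ticket_of_message = {}  # Message-ID -> ticket id
        self.status = {}             # ticket id -> "open" or "resolved"

    def route(self, raw_message):
        msg = email.message_from_string(raw_message)
        subject = msg.get("Subject", "")
        tag = TICKET_TAG.search(subject)
        parent = msg.get("In-Reply-To", "").strip()
        if tag:
            ticket = int(tag.group(1))
        elif parent in self.ticket_of_message:
            # Reply without the ticket number: follow the mail thread instead.
            ticket = self.ticket_of_message[parent]
        else:
            ticket = next(self._next_id)
            self.status[ticket] = "open"
        msg_id = msg.get("Message-ID", "").strip()
        if msg_id:
            self.ticket_of_message[msg_id] = ticket
        # Mirror the [closed] mail convention: closing by mail resolves it.
        if subject.lower().startswith(CLOSED_FLAG):
            self.status[ticket] = "resolved"
        return ticket
```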

There are still some rough edges: for some reason, RT seems particularly reluctant to send messages in CC when using the Web interface; we need to integrate it better with our existing accounts system so that our users can more easily follow progress on their requests; and there are some user interface and HTTP behavior problems that make me cringe.

But overall, I think the tool has certainly helped us regain control over our growing number of requests, and is also hopefully steadily allowing us to offer a better experience to our community.


Validator Dev Watch: fuzzy matching for unknown elements/attributes

Unknown elements and attributes at the top of the validation error charts

According to MAMA's survey of the validation of a few million web pages, the most common validation errors are “There is no attribute X” and “Element X undefined”: in other words, cases where the document uses elements or attributes that are not standard. As explained in the Validator's documentation of errors, the most likely reasons for these errors are:

  1. Typos. The user wrote <acornym> when what was really meant was <acronym>. I am not sure whether this is the most common cause, but it can be a terribly frustrating one. “What do you mean, acronym is not a standard element? Of course it is! Oh, wait, I made a typo…”
  2. Non-standard elements. Again, I don't have statistics about which elements or attributes trigger this error most often, but I would bet on the <embed> element and the target attribute (which, by the way, is only available in Transitional doctypes). For those we can't do much, other than recommending another doctype and pointing to standard ways of using <object> to display Flash content.
  3. Case-sensitive XHTML. This one bites me more often than I'd like to admit. Copy and paste a snippet of code that uses, e.g., the onLoad attribute, test the functionality in a few browsers (they will gladly oblige), then watch the validator throw an error, because XHTML is case-sensitive and its names are lowercase: onLoad isn't a known attribute, onload is.

What makes these errors frustrating is not how hard they are to fix: anyone who carefully reads the error message and the explanation that comes with it will easily fix their markup. Unfortunately, for a number of good and bad reasons, few of us ever read the explanations: they tend to be a bit long, listing possible causes for the problem and potential solutions, so most people just ignore or gloss over them.

Suggestive power

One way we found to make the validator more user-friendly here is to promote the most likely solution into the error message itself. In other words, compare:

Error Line 12, Column 14: there is no attribute "crass"

<spam crass="foo">typos in attribute and element</span>

    lengthy explanation here…

with:

Error Line 12, Column 14: there is no attribute "crass". Maybe you meant "class" or "classid"?

<spam crass="foo">typos in attribute and element</span>

    same lengthy explanation here…

The former is what the latest stable release of the markup validator outputs. The latter is what I implemented last week, and it can be tried on our test instance of the validator.

How is it implemented?

Since the validator is written in Perl, we looked for Perl modules implementing algorithms to calculate the edit distance between strings. We found String::Approx, which is based on the Levenshtein edit distance. Take this algorithm, plug in a list of all known elements and attributes in HTML, and after moderate hacking, my code would very easily find that <spam> should be <span>; some extra tweaking yielded good results, suggesting that <acornym> could be corrected to <acronym>.
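To make the idea concrete, here is a minimal Python sketch of suggestion by edit distance. It is not the validator's code, which uses Perl's String::Approx with its own matching rules; this version computes plain Levenshtein distance with the textbook dynamic-programming recurrence. The known-name lists, the `suggest` and `error_message` helpers, and the threshold of 3 edits are assumptions for illustration (that threshold is what lets "classid" through for "crass").

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical subsets of known names; the real list covers the whole DTD.
KNOWN_ATTRIBUTES = ["class", "classid", "id", "style", "title", "onload", "onclick"]
KNOWN_ELEMENTS = ["span", "acronym", "abbr", "div", "object"]


def suggest(name, known, max_distance=3):
    """Known names within max_distance edits of name, closest first."""
    scored = sorted((levenshtein(name, k), k) for k in known)
    return [k for d, k in scored if d <= max_distance]


def error_message(name, known):
    """Build the error text, appending suggestions when there are any."""
    message = f'there is no attribute "{name}"'
    hits = suggest(name, known)
    if hits:
        message += ". Maybe you meant " + " or ".join(f'"{h}"' for h in hits) + "?"
    return message
```

For example, `error_message("crass", KNOWN_ATTRIBUTES)` returns `there is no attribute "crass". Maybe you meant "class" or "classid"?`. A fixed threshold is generous for very short names; one proportional to the name's length may behave better on real data.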

For some reason, however, I could not find a way to make String::Approx reliably suggest onload as a replacement for onLoad: it seems to treat a character substitution as costly even when the substitution is merely from a character to its uppercase equivalent. A trivial additional test took care of this glitch, and we seem to be all set to have a more usable validator in the upcoming release.
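The extra test can be as simple as a case-insensitive lookup run before any fuzzy matching. Here is a sketch in Python, again not the validator's Perl code: the known-name list is a hypothetical subset, and the fuzzy fallback uses the standard library's `difflib.get_close_matches`, which ranks candidates by a similarity ratio rather than by Levenshtein distance, purely to keep the example self-contained.

```python
import difflib

# Hypothetical subset of known XHTML attribute names (all lowercase in XHTML).
KNOWN_ATTRIBUTES = ["onload", "onclick", "class", "classid", "id", "style"]


def suggest(name, known=KNOWN_ATTRIBUTES):
    """Suggest replacements for an unknown element or attribute name."""
    lowered = name.lower()
    # Case-only mistake (onLoad vs onload): since XHTML names are lowercase,
    # an exact case-insensitive hit is the only suggestion worth making.
    if lowered in known:
        return [lowered]
    # Otherwise fall back to fuzzy matching; difflib's similarity ratio
    # stands in here for an edit-distance library such as String::Approx.
    return difflib.get_close_matches(lowered, known, n=3, cutoff=0.6)
```

Checking case first also keeps the suggestion for onLoad to exactly one candidate, instead of whatever else happens to fall within the fuzzy threshold.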

What do you think?

What do you think of this feature? Would you have implemented it differently?

Any suggestions for a better way to word or present the suggested correction for unknown elements/attributes? Any thoughts on other small improvements that would dramatically improve the validator's usability?



W3C Systems Team