As anyone who has been involved in W3C will tell you, making open standards means a lot of discussions, and in the case of W3C, a lot of discussions by e-mail.
W3C mailing-lists are where a lot of things happen, and so the organization has been dedicating a lot of resources to its mail infrastructure over the years, often being at the forefront of the fight against spam, so much so that the “World Wide Web Consortium” sometimes gets nicknamed the “SMTP Consortium”.
The flow of mail can be dreadful sometimes. For example, I think I must be receiving several hundred, perhaps a thousand, mails per day, from W3C participants or mailing-lists. And our system tells me that in my few years around here, I’ve sent over 10,000 mails to W3C lists.
It is fortunate that we have some simple but cool systems to manage this information overload:
- All W3C mailing-lists get archived on the Web
- Almost every e-mail has a Message-Id identifier. That’s convenient! So we can use that to provide a simple but effective gateway from Mail to Web, and immediately find any mail ever archived at W3C by its identifier
- Web archives mean… Web-based search, too
- There are also feeds for list discussions, and even a Mail to Usenet gateway
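To give an idea of how simple the Message-Id trick is, here is a minimal sketch in Python. It is not the actual W3C gateway code: the `GATEWAY_BASE` URL, the helper names and the sample mail are all made up for illustration, and the real gateway may well build its URLs differently.

```python
from email import message_from_string
from urllib.parse import quote

# Hypothetical gateway base URL, used only for illustration.
GATEWAY_BASE = "https://example.org/mid/"


def message_id_of(raw_mail: str) -> str:
    """Return the Message-Id of a raw e-mail, without the angle brackets."""
    msg = message_from_string(raw_mail)
    header = msg.get("Message-Id")  # header lookup is case-insensitive
    if header is None:
        raise ValueError("this e-mail has no Message-Id header")
    return header.strip().strip("<>")


def archive_url(raw_mail: str, base: str = GATEWAY_BASE) -> str:
    """Build a lookup URL for the archived copy of the mail (assumed URL scheme)."""
    # Percent-encode the identifier, keeping "@" readable in the URL.
    return base + quote(message_id_of(raw_mail), safe="@")


# A made-up message, just enough headers to demonstrate the idea.
SAMPLE = (
    "From: someone@example.org\n"
    "To: some-list@example.org\n"
    "Subject: Hello\n"
    "Message-Id: <1234.abcd@example.org>\n"
    "\n"
    "Body text.\n"
)

print(archive_url(SAMPLE))
```

The point is that the identifier already travels with every copy of the mail, so anyone who received the message can work out where its archived version lives without searching for it.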
What all these systems bring, in a nutshell, is a choice in how one can find, access, and work on this mass of information exchanged by e-mail. This choice is good: it really leverages the power of the public web, the ability to harvest, process and enhance data. So it is quite exciting to see how a new actor can take this data and do cool things with it.
Yesterday, the Markmail team announced that they were opening a new search interface to the public W3C lists, with more than 400,000 messages loaded (fewer than our in-house search system, but quite impressive nevertheless!). With some nice ideas and a lot of Web Technology magic (XML, XQuery, HTML, PNG, CSS, DOM…) they managed to come up with a very interesting interface indeed. I can only wish all this were open source ;) .
Beyond the mere value of being able to access the e-mail discussion contents via many interfaces, there is a serious lesson to be noted: what gets published on the public web is likely to stay on the public web forever. Indeed, even what we publish on our personal sites, content we feel free to “remove” at any point, may be indexed, cached, quoted, or even mirrored, copied, reused… Better try and behave accordingly.