W3C Group Pages: metadata extraction experiment

This page describes Amy and Dan's experiments with extracting information from W3C group pages. This document, and the sample document (longer template) that accompanies it, are publicly visible. Some work referenced here may be Team or Member access only.

Draft templates

XSLT experiment

(based on the web service created for the RSS extraction tool)

XSL file:

XML data:


Related work


We want to make it easier to find out various things about the Working, Interest and Coordination Groups chartered within the W3C process. We have some of this information in RDF already, while some of it is available only as prose within charter, home page and activity statement documents. Amy has collected (by hand) some information on group expiry dates. Rather than do this again, we're trying to come up with some recommended HTML-based conventions that could be included in all such documents, so that this information can be extracted automatically (e.g. with XSLT) next time. We're starting by working on a demo of a WG home page and/or charter document that includes some usefully extractable data...

See the sample group page for some properties related to working groups that we are interested in gathering. The basics are those relating to schedule, associated people, and activity/domain. A more ambitious project might try to extract a lot more information, e.g. inter-group dependencies, versioning history, deliverables, full schedule, etc.
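
To make the idea of an extractable HTML convention concrete, here is a minimal sketch. It is not the XSLT from this experiment: it uses Python's standard library instead, and the class names (group-name, charter-expires, chair, activity), the sample page, and the function name are all hypothetical, invented for illustration. It assumes a well-formed XHTML page in which each property of interest is marked with a class attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page. The class names are illustrative only and are
# not a convention defined by this experiment.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example Working Group</title></head>
  <body>
    <h1 class="group-name">Example Working Group</h1>
    <p>Chartered until <span class="charter-expires">2003-01-31</span>.</p>
    <p>Chair: <span class="chair">A. Chair</span></p>
    <p>Activity: <span class="activity">Example Activity</span></p>
  </body>
</html>"""

# Assumed property names, covering schedule, people and activity.
PROPERTIES = ("group-name", "charter-expires", "chair", "activity")


def extract_group_metadata(xhtml_text, properties=PROPERTIES):
    """Return {property: [values]} for elements whose class names a property."""
    root = ET.fromstring(xhtml_text)
    found = {}
    for element in root.iter():
        # The class attribute may hold several space-separated names.
        for name in element.get("class", "").split():
            if name in properties:
                # Join all nested text so inline markup does not lose content.
                text = "".join(element.itertext()).strip()
                found.setdefault(name, []).append(text)
    return found


if __name__ == "__main__":
    for name, values in extract_group_metadata(SAMPLE_PAGE).items():
        print(name, "=", ", ".join(values))
```

Values are collected in lists because a real group page could plausibly carry more than one value per property (several chairs, for example). An XSLT version of the same idea would match on the class attribute in much the same way.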

Issues, Next steps

Stuff we've run into, or to think about later...

$Id: Overview.html,v 1.11 2002/04/23 17:33:05 danbri Exp $