W3C Working Group Home Page Style Guidelines
Quick Summary

This summary condenses the full guide into its major points. The full guide gives the deeper explanations and examples; this document serves as a quick reference sheet.

Test Your Page

Use the W3C XSLT Service to check if the scraper understands your page. Replace the tidy URI if your page is already in XHTML. Also supply a base URI (probably the same as the URI of your page).

Examples: see the QA Working Group's implementation (view the page source) and the resulting information extraction (in RDF), the Web Ontology Language Working Group's implementation and results, and the CSS Working Group's implementation and static results.

The test form takes a page URI, a base URI, a working group name, and an option to proxy basic authentication for your page.

Processing working group pages is known to work for public XHTML pages and for isolated member-confidential XHTML or HTML 4 pages. Support for cases where other working group pages need to be accessed, either for processing or for tidying via the XSLT, is in the works. Testing is possible on a page-by-page basis. Report problems and issues to Ryan Lee, with a copy to w3c-tools@w3.org.

The XSLT processing may take some time.


The <head> element must have a profile attribute with the value http://www.w3.org/2002/12/wg.
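The profile requirement can be checked mechanically. The following is a minimal sketch, not the W3C XSLT Service itself: the sample page and the names ProfileFinder and has_wg_profile are hypothetical, and it relies only on the HTML 4 rule that profile may hold a space-separated list of URIs.

```python
from html.parser import HTMLParser

WG_PROFILE = "http://www.w3.org/2002/12/wg"

# Hypothetical minimal page; only the profile attribute matters here.
PAGE = """<html>
<head profile="http://www.w3.org/2002/12/wg">
<title>Example Working Group</title>
</head>
<body></body>
</html>"""


class ProfileFinder(HTMLParser):
    """Records the profile attribute of the first <head> element."""

    def __init__(self):
        super().__init__()
        self.profile = None
        self._seen_head = False

    def handle_starttag(self, tag, attrs):
        if tag == "head" and not self._seen_head:
            self._seen_head = True
            self.profile = dict(attrs).get("profile")


def has_wg_profile(html_text):
    """Return True if <head> lists the working group profile URI."""
    finder = ProfileFinder()
    finder.feed(html_text)
    # HTML 4 allows several space-separated URIs in profile.
    return WG_PROFILE in (finder.profile or "").split()


print(has_wg_profile(PAGE))
```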

Group Characteristics

The name of the working group is derived from the <title> element if the name is not provided to the parser beforehand.

The activity name is found within an element with class activity.

A summary of the group's purpose is found within an element with class summary.

The link to the charter is found in an <a> element with rel attribute of value charter.
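As an illustration of the four group characteristics, the sketch below marks up a hypothetical page and reads the values back. It is an approximation written for this summary, not the scraper's XSLT: the page content, the GroupInfo class, and the extract_group_info function are all assumptions.

```python
from html.parser import HTMLParser

# Hypothetical page; the URIs and text are made up for illustration.
PAGE = """<html>
<head profile="http://www.w3.org/2002/12/wg">
<title>Example Working Group</title>
</head>
<body>
<p>Part of the <a class="activity" href="/Example/Activity">Example Activity</a>.</p>
<p class="summary">The group develops example specifications.</p>
<p><a rel="charter" href="/Example/charter">Charter</a></p>
</body>
</html>"""


class GroupInfo(HTMLParser):
    """Collects name, activity, summary, and charter link from a page."""

    def __init__(self):
        super().__init__()
        self.info = {}
        self._open = []  # stack of (tag, field or None) for open elements

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        field = None
        if tag == "title":
            field = "name"
        elif "activity" in classes:
            field = "activity"
        elif "summary" in classes:
            field = "summary"
        if tag == "a" and "charter" in (a.get("rel") or "").split() and a.get("href"):
            self.info["charter"] = a["href"]
        self._open.append((tag, field))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text belongs to every marked element that is currently open.
        for _, field in self._open:
            if field:
                self.info[field] = self.info.get(field, "") + data


def extract_group_info(html_text, name=None):
    parser = GroupInfo()
    parser.feed(html_text)
    info = {key: value.strip() for key, value in parser.info.items()}
    if name:  # a name supplied beforehand takes precedence over <title>
        info["name"] = name
    return info


print(extract_group_info(PAGE))
```

Passing name="..." to extract_group_info models the case where the group name is given to the parser beforehand, so the <title> text is not used.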

Referencing External Documents

Authoritative information located on pages other than the main page can be referred to by using a rel attribute on anchor or link elements with a value of one of the respective information classes: news, drafts, deliverables, participants, meetings, teleconferences.
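A sketch of how such references might be written and collected follows. The file names and the ExternalRefs class are hypothetical, and the lookup is a simplification for illustration rather than the scraper's actual logic.

```python
from html.parser import HTMLParser

INFO_CLASSES = {
    "news", "drafts", "deliverables",
    "participants", "meetings", "teleconferences",
}

# Hypothetical page that keeps participants and news on separate pages.
PAGE = """<html>
<head profile="http://www.w3.org/2002/12/wg">
<title>Example Working Group</title>
<link rel="participants" href="participants.html" />
</head>
<body>
<p><a rel="news" href="news.html">Group news</a></p>
<p><a href="unrelated.html">An ordinary link without rel</a></p>
</body>
</html>"""


class ExternalRefs(HTMLParser):
    """Maps each information class to the hrefs that point at it."""

    def __init__(self):
        super().__init__()
        self.refs = {}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        a = dict(attrs)
        href = a.get("href")
        # rel may hold several space-separated values.
        for rel in (a.get("rel") or "").split():
            if rel in INFO_CLASSES and href:
                self.refs.setdefault(rel, []).append(href)


def find_external_refs(html_text):
    parser = ExternalRefs()
    parser.feed(html_text)
    return parser.refs


print(find_external_refs(PAGE))
```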

Blocked Method

See the guidelines on Blocked Method. Recognized block-level tags are <ul>, <ol>, <dl>, and <div> blocks with <p> items; <table> is discussed below. A summary of the class and rel values recognized for each information class:

Information Class   class                                              rel
(none)              activity, summary                                  charter, activity
news                title, link, description, date
drafts              title, description, date                           details
deliverables        title, description, date;                          details, versionof
                    one of: note, wd, lc, ends, cr, pr, rec
participants        name, email, lastname, firstname, organization,
                    phone, role
meetings            description, date                                  agenda, minutes
teleconferences     description, date                                  agenda, minutes

See also the rules for parsing an item for each information class.
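For a rough picture of the Blocked Method, the sketch below marks up a meetings list and extracts its items. It rests on assumptions not spelled out in this summary: that the block element carries the information class in its class attribute, that each <li> is one item, and that fields are marked with the class and rel values listed for meetings. The sample data and the MeetingsBlock class are hypothetical; consult the full guide for the actual parsing rules.

```python
from html.parser import HTMLParser

FIELD_CLASSES = {"description", "date"}  # class values listed for meetings
FIELD_RELS = {"agenda", "minutes"}       # rel values listed for meetings

# Hypothetical block; dates, text, and URIs are made up.
PAGE = """<ul class="meetings">
  <li><span class="date">2003-03-05</span>:
      <span class="description">Face-to-face meeting</span>
      (<a rel="agenda" href="f2f/agenda.html">agenda</a>,
       <a rel="minutes" href="f2f/minutes.html">minutes</a>)</li>
  <li><span class="date">2003-06-10</span>:
      <span class="description">Second face-to-face meeting</span></li>
</ul>"""


class MeetingsBlock(HTMLParser):
    """Reads one dict per <li> inside <ul class="meetings">.

    Nested lists are not handled in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._open = []        # stack of (tag, field or None)
        self._in_block = False
        self._item = None      # dict for the <li> being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        field = None
        if tag == "ul" and "meetings" in classes:
            self._in_block = True
        elif tag == "li" and self._in_block:
            self._item = {}
        elif self._item is not None:
            field = next((c for c in classes if c in FIELD_CLASSES), None)
            if tag == "a" and a.get("href"):
                for rel in (a.get("rel") or "").split():
                    if rel in FIELD_RELS:
                        self._item[rel] = a["href"]
        self._open.append((tag, field))

    def handle_endtag(self, tag):
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break
        if tag == "li" and self._item is not None:
            self.items.append(self._item)
            self._item = None
        elif tag == "ul":
            self._in_block = False

    def handle_data(self, data):
        if self._item is None:
            return
        for _, field in self._open:
            if field:
                self._item[field] = self._item.get(field, "") + data


def parse_meetings(html_text):
    parser = MeetingsBlock()
    parser.feed(html_text)
    return parser.items


for meeting in parse_meetings(PAGE):
    print(meeting)
```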

Tabled Method

See the guidelines on Tabled Method. The <table> element uses the information class, and each cell of the first row may take one of the classes listed for that information class. Unmarked columns are ignored.

See the table above for the table classes and column classes. Note that a column class applies to the entire column; classes are not read from individual cells when extracting information. The semantics differ for this method, though the keywords remain the same.
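The column behavior can be sketched as follows. The example assumes that the first row is a header row that only declares column classes and that each later row is one item; the sample data and the TabledParser class are hypothetical and stand in for the real XSLT.

```python
from html.parser import HTMLParser

COLUMN_CLASSES = {"description", "date"}  # classes listed for teleconferences

# Hypothetical table; the middle column is unmarked and should be ignored.
PAGE = """<table class="teleconferences">
  <tr><th class="date">Date</th><th>Chair</th><th class="description">Topic</th></tr>
  <tr><td>2003-02-03</td><td>A. Chair</td><td>Issue triage</td></tr>
  <tr><td>2003-02-10</td><td>B. Chair</td><td>Draft review</td></tr>
</table>"""


class TabledParser(HTMLParser):
    """Reads rows of a table whose class is the given information class."""

    def __init__(self, info_class):
        super().__init__()
        self.info_class = info_class
        self.items = []
        self._in_table = False
        self._columns = None  # field name (or None) per column, from row one
        self._marks = None    # classes seen on the cells of the current row
        self._row = None      # cell texts of the current row
        self._cell = None     # text of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and self.info_class in classes:
            self._in_table = True
            self._columns = None
        elif not self._in_table:
            return
        elif tag == "tr":
            self._row, self._marks = [], []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = ""
            self._marks.append(
                next((c for c in classes if c in COLUMN_CLASSES), None))

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._columns is None:
                # The first row fixes the meaning of every column.
                self._columns = self._marks
            else:
                self.items.append({
                    field: text
                    for field, text in zip(self._columns, self._row)
                    if field  # unmarked columns are ignored
                })
            self._row = None
        elif tag == "table":
            self._in_table = False


def parse_table(html_text, info_class):
    parser = TabledParser(info_class)
    parser.feed(html_text)
    return parser.items


for item in parse_table(PAGE, "teleconferences"):
    print(item)
```

Classes on cells in later rows are collected but never consulted, which mirrors the rule that a column class set in the first row governs the whole column.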

Advanced Method

See the style guideline notes on Advanced Method. The Advanced Method is based on marking individual items that belong to an information class. An item may contain marked elements that hold more specific information.

See also the rules for parsing an item for each information class (the same rules as Blocked Method items).