Source Format Refactoring

From WCAG WG


This page collects requirements and considerations for modernizing the source format for WCAG 2.0 and supporting resources.

Introduction

For the past decade or longer, the WCAG sources have been maintained in a customized version of XML Spec, an XML language intended to facilitate the creation of W3C specifications. The WCAG extensions support cross-relationships among the guidelines, Understanding, and Techniques, and enforce a specific structure for the materials. XSLT generators output various forms of the materials with cross-references and inclusions.

This approach to producing the sources has been very helpful for maintaining consistency and managing the many interrelationships in the content. However, editing documents requires specific knowledge of a highly customized document type, the format is highly sensitive to variation, and ongoing maintenance has made the toolchain very unwieldy. With a small set of dedicated editors, these were acceptable costs. Now the group wants to expand opportunities for editing and public input, and a different format built on more modern technologies would better support that.

A brief introduction to the current source format is available at WCAG Sources. The sources are now maintained in a WCAG repository on GitHub.

Problems with the current format

Problems with the current format that the new format should avoid include:

  • The XML Spec format uses elements that are output as HTML but are not the same as their HTML counterparts, which makes them hard to learn.
  • The WCAG customizations have diverged from the XML Spec format considerably, reducing its utility.
  • Massive amounts of content are maintained in a single XML file, making it harder to make local edits.
  • Much English-language content exists in XSLT, making translation from the sources difficult.
  • The XSLTs use a complex set of includes where templates interact with each other in unpredictable ways, making maintenance very difficult.
  • Creation of diff indicators must be done manually.
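Automating diff indicators is mostly a matter of comparing two revisions of the same text. As a minimal sketch, assuming plain-text input and word-level granularity, Python's standard difflib module can wrap changes in <ins> and <del> markup. The function name mark_diff is hypothetical and not part of any existing WCAG tooling.

```python
import difflib
import html


def mark_diff(old: str, new: str) -> str:
    """Return `new` with word-level changes wrapped in <ins>/<del> tags.

    Hypothetical helper: a sketch of automated diff marking, not part of
    any existing WCAG build tool. Input is treated as plain text.
    """
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run of words, escaped so it is safe to emit as HTML.
            out.append(html.escape(" ".join(new_words[j1:j2])))
        if tag in ("delete", "replace"):
            # Words present only in the old revision.
            out.append("<del>" + html.escape(" ".join(old_words[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            # Words present only in the new revision.
            out.append("<ins>" + html.escape(" ".join(new_words[j1:j2])) + "</ins>")
    return " ".join(out)


print(mark_diff("content must be perceivable", "content must be clearly perceivable"))
```

A real generator would likely diff element by element rather than over flattened text, so that inserted markup never splits an existing HTML element.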

Requirements for new format

The Working Group is considering the following requirements for source formats. These will be prioritized and some may not be met, but all should be considered.

  • Use a source format that is easy to learn, particularly by people familiar with HTML.
  • Use validation features to support consistency of structure.
  • Do not impose structural rules that are not needed for the chosen level of consistency.
  • Avoid duplication of content.
  • Break the content into smaller chunks that can be updated separately.
  • Support change histories so dependent entities can know when major or minor changes have been made.
  • Automate the creation of diff indicators.
  • Continue to produce all of the formats currently generated:
    • Single-file versions of Understanding and Techniques
    • Per-SC versions of Understanding
    • Per-Technology versions of Techniques
    • Per-Technique versions of Techniques
    • Diff-marked versions of Understanding and Techniques
    • How to Meet / Quickref
    • PDF versions
    • Epub versions
    • Zipped downloadable versions
    • Translations
  • Make it easy for WG members to edit the content.
  • Clearly separate proposed edits from consensus-approved edits.
  • Make it easy for the public to suggest edits.
  • Make it easy for the public to submit techniques that meet the WG's acceptance requirements.
  • Support annotations.
  • When people commit edits, they can easily preview what the generated version will look like.
  • Do not allow URIs and IDs of the output to change.
  • Support translatability of the sources.
  • Indicate changes that have been made but are contested.
  • Allow multiple test procedures, with meaningful categorizations (ACTION-113).
  • Make publication-specific metadata, such as pub date and maturity level, part of the build rather than part of the source.
  • Output a structured data format.
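The requirement that output URIs and IDs not change is one a build can check mechanically. A minimal sketch, assuming both the previously published and the newly generated pages are available as HTML strings, collects every id attribute with Python's standard html.parser module and reports any that disappeared. The names IdCollector, collect_ids, and removed_ids are hypothetical.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute value in an HTML document (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags are also routed here by the default
        # handle_startendtag implementation.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def collect_ids(markup: str) -> set[str]:
    parser = IdCollector()
    parser.feed(markup)
    parser.close()
    return parser.ids


def removed_ids(previous_build: str, new_build: str) -> set[str]:
    """IDs present in the previous output but missing from the new output.

    A non-empty result means existing fragment links (page#id) would break.
    """
    return collect_ids(previous_build) - collect_ids(new_build)


print(removed_ids('<h2 id="intro">Intro</h2><p id="sc-1-1-1">x</p>',
                  '<h2 id="intro">Intro</h2>'))
```

Comparing against the last published version, rather than the previous commit, would catch IDs lost over several intermediate edits.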

Technical considerations

Some technical decisions that need to be made when considering how to meet the requirements:

  • Should we use an XML format (optionally one that includes a lot of HTML), or vanilla HTML with special markup (sections, IDs, classes) to enforce structure? Or something else (e.g., Markdown or wiki markup)?
  • Should we use an XSLT-based generator, a script-based generator, or something else?
  • The current expectation is that sources will continue to be hosted in GitHub. This provides features for branching, history tracking, and outside input that we may want to exploit.
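If the group chooses vanilla HTML with special markup, a script-based generator could enforce structure by checking for required elements. The sketch below assumes, purely for illustration, that each Understanding page must contain section elements with the ids intent, benefits, examples, and techniques; that list and the function names are hypothetical, not an agreed WG schema.

```python
from html.parser import HTMLParser

# Hypothetical structural rule: each Understanding page must contain these
# <section> ids. The names are illustrative, not an agreed WG schema.
REQUIRED_SECTIONS = ("intent", "benefits", "examples", "techniques")


class SectionCollector(HTMLParser):
    """Record the id of every <section> element in the document."""

    def __init__(self):
        super().__init__()
        self.section_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            section_id = dict(attrs).get("id")
            if section_id:
                self.section_ids.append(section_id)


def missing_sections(markup, required=REQUIRED_SECTIONS):
    """Return required section ids absent from `markup`, in rule order."""
    parser = SectionCollector()
    parser.feed(markup)
    parser.close()
    found = set(parser.section_ids)
    return [name for name in required if name not in found]


page = '<section id="intent"></section><section id="benefits"></section>'
print(missing_sections(page))
```

A generator could run such checks on every commit and fail the build when the list is non-empty, which provides validation without a custom XML document type and only imposes the rules the group decides it needs.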

Work plan

The refactoring of sources will proceed according to the following plan:

  • Collect and prioritize requirements.
  • Develop a proposed format that appears to meet the requirements.
  • Transform the existing materials into the proposed formats, in a testing branch.
  • Develop the generator that outputs the desired final materials.
  • Test the generator's output. Revise the format as needed, and revisit requirements if needed.
  • Approve the format and generator output.
  • Freeze edits on the sources prior to the following steps. This should probably be done immediately after a TR publication is completed.
  • Convert the latest version of the sources into the chosen new format.
  • Commit the new version of the sources and generator into the working branch.
  • Un-freeze edits, using the new sources henceforth.