Architectural Specifications for the World Wide Web and their Role for Language Resources

Henry S. Thompson, Liam Quin, Felix Sasaki, Michael Sperberg-McQueen
World Wide Web Consortium (W3C)
27 May 2008

Slides from a talk at LREC 2008 Language Resources and Standards workshop.
The full paper is available.



What is the W3C?

Interoperability: A shared goal

Some history

What place standards?

Validation and interoperability

W3C XML Schema (XSD) 1.0

XSD 1.1

XSD 1.1 changes to support change

Integrity matters

XPath/XQuery Full Text: Getting exactly what you want

What is XML pipelining?

Divide and conquer

Pipeline benefit summary

Example XProc pipeline

Internationalization of XML

Summary of ITS data categories (1)

Summary of ITS data categories (2)

Using ITS locally

<help xmlns:its="" its:version="1.0"> [...]
  <p>To re-compile all the modules of the Zebulon toolkit you need to go in the
    <path its:translate="no">\Zebulon\Current Source\binary</path> directory.
    Then from there, run batch file <cmd its:translate="no">Build.bat</cmd>.</p> [...]
</help>



The next sections were shown only briefly during the talk

XML pipeline worldview

Benefits of 'divide and conquer'

Pipeline requirement

"Terminology" example

<doc its:version="1.0" xmlns:its="">
 <section xml:id="S001">
  <par>A <kw its:term="yes" its:termInfoRef="[...]">motherboard</kw>,
   also known as a <kw its:term="yes">logic <span its:term="yes">board</span></kw> on
   Apple Computers, is the primary circuit board making up a modern computer.</par>
 </section>
</doc>

The next sections were not shown at all during the talk

Pipeline Concepts

Atomic Pipeline Steps

Compound Pipeline Steps

Steps can be grouped into pipelines

Pipelines are Steps

A Little Terminology

The XML Processing Model WG

Some observations


Compound steps

Language Constructs

Conditional processing


Selective processing

Exception handling

Building libraries

Standard step library


XProc conclusions

What's new in XML Schema 1.1?

More powerful all-groups


Conditional type assignment

Wildcard changes


Room to grow? The version problem

Open content

Multiple versions of XML Schema after 1.1

XML Schema 1.1 conclusions

Overall conclusions

Using ITS globally (1)

<its:rules version="1.0">
 <its:translateRule selector="//path | //cmd" translate="no"/>
</its:rules>
<help> [...]
  <p>To re-compile all the modules of the Zebulon toolkit you need to go in the
    <path>\Zebulon\Current Source\binary</path> directory.
    Then from there, run batch file <cmd>Build.bat</cmd>.</p> [...]
</help>
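
A rough sketch of how a consumer might act on a global rule like the one above, written in Python with the standard library only. It is not an ITS processor: the helper names `select_elements` and `translatable_text` are hypothetical, the selector handling covers only unions of simple `//name` paths (which is all this rule uses), and the document is a trimmed copy of the slide's example with the elided parts left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's document; the "[...]" parts are simply omitted.
DOC = r"""<help>
  <p>To re-compile all the modules of the Zebulon toolkit you need to go in the
    <path>\Zebulon\Current Source\binary</path> directory.
    Then from there, run batch file <cmd>Build.bat</cmd>.</p>
</help>"""


def select_elements(root, selector):
    """Resolve a union of simple '//name' paths (assumption: no other XPath)."""
    matched = []
    for branch in selector.split("|"):
        name = branch.strip().lstrip("/")
        matched.extend(root.iter(name))
    return matched


def translatable_text(elem, blocked, inherited=True):
    """Collect text that is still translatable; translate="no" is inherited."""
    translate = False if elem in blocked else inherited
    parts = []
    if translate and elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.extend(translatable_text(child, blocked, translate))
        # Text after a child belongs to the current element, not the child.
        if translate and child.tail:
            parts.append(child.tail)
    return parts


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    blocked = select_elements(root, "//path | //cmd")
    print(" ".join("".join(translatable_text(root, blocked)).split()))
```

The point of the sketch is that the document itself carries no ITS markup; the rule alone decides that the contents of `path` and `cmd` are left out of the translatable text.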

Using ITS globally (2)

<its:rules xmlns:its="" version="1.0">
 <its:langRule selector="//*[@langinfo]" langPointer="@langinfo"/>
</its:rules>

"Elements within Text" example

  <its:rules version="1.0" xmlns:its="">
   <its:withinTextRule withinText="yes" selector="//b|//u|//i"/>
   <its:withinTextRule withinText="nested" selector="//fn"/>
  </its:rules>
  <p>This is a paragraph with <b>bold</b>, <i>italic</i>, and <u>underlined</u>.</p>
  <p>This is a paragraph with a footnote
   <fn>This is the text of the footnote</fn> at the middle.</p>
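
One way to picture the effect of these two rules is as text segmentation: elements marked "yes" stay inside the surrounding run of text, while elements marked "nested" become a unit of their own. The Python sketch below illustrates that reading under simplifying assumptions: the `WITHIN_TEXT` table hard-codes the two rules instead of evaluating their selectors, the `<doc>` wrapper and the function name `text_units` are hypothetical, and elements with the default value "no" are treated like "nested" ones.

```python
import xml.etree.ElementTree as ET

# Hard-coded stand-in for the two withinTextRule elements (assumption).
WITHIN_TEXT = {"b": "yes", "u": "yes", "i": "yes", "fn": "nested"}

# The two paragraphs from the example, wrapped in a hypothetical <doc> root.
SAMPLE = """<doc>
  <p>This is a paragraph with <b>bold</b>, <i>italic</i>, and <u>underlined</u>.</p>
  <p>This is a paragraph with a footnote
<fn>This is the text of the footnote</fn> at the middle.</p>
</doc>"""


def text_units(elem):
    """Return text units: the element's own run first, then separate units."""
    run, separate = [elem.text or ""], []
    for child in elem:
        if WITHIN_TEXT.get(child.tag, "no") == "yes":
            # Inline element: its text is part of the parent's run.
            # Simplification: any markup inside it is flattened.
            run.append("".join(child.itertext()))
        else:
            # "nested" (and, in this sketch, "no"): handled as its own unit.
            separate.extend(text_units(child))
        run.append(child.tail or "")
    text = " ".join("".join(run).split())
    return ([text] if text else []) + separate


if __name__ == "__main__":
    for unit in text_units(ET.fromstring(SAMPLE)):
        print(unit)
```

With these rules the footnote no longer interrupts the sentence around it: the second paragraph is segmented as one run, and the footnote text is handed over as a separate unit.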

Ruby example

<text xmlns:its="">
 <head> ...
  <its:rules version="1.0">
   <its:rubyRule selector="/text/body/img[1]/@alt">
    <its:rubyText>World Wide Web Consortium</its:rubyText>
   </its:rubyRule>
  </its:rules>
 </head>
 <body>
  <img src="w3c_home.png" alt="W3C"/> ...
 </body>
</text>

Application areas of ITS