Architectural Specifications for the World Wide Web and their Role for Language Resources

Henry S. Thompson, Liam Quin, Felix Sasaki, Michael Sperberg-McQueen
World Wide Web Consortium (W3C)
27 May 2008


Slides from a talk at the LREC 2008 Language Resources and Standards workshop.
The full paper is available.

Acknowledgements

Introduction

What is the W3C?

Interoperability: A shared goal

Some history

What place standards?

Validation and interoperability

W3C XML Schema (XSD) 1.0

XSD 1.1

XSD 1.1 changes to support change

Integrity matters

XPath/XQuery Full Text: Getting exactly what you want

What is XML pipelining?

Divide and conquer

Pipeline benefit summary

Example XProc pipeline

Internationalization of XML

Summary of ITS data categories (1)

Summary of ITS data categories (2)

Using ITS locally

<help xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> [...]
  <p>To re-compile all the modules of the Zebulon toolkit you need to go in the
    <path its:translate="no">\Zebulon\Current Source\binary</path> directory.
    Then from there, run batch file
    <cmd its:translate="no">Build.bat</cmd>.</p> [...]
</help>
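
A minimal sketch of how a localization tool might read the local markup above, using Python's standard xml.etree.ElementTree. The function name protected_spans is hypothetical, the elided "[...]" parts of the slide are left out, and the sketch only looks at elements that carry its:translate="no" directly; it does not implement the inheritance and defaulting behaviour defined by ITS.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Simplified copy of the slide's example (elided "[...]" parts omitted).
DOC = r"""<help xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <p>To re-compile all the modules of the Zebulon toolkit you need to go in the
    <path its:translate="no">\Zebulon\Current Source\binary</path> directory.
    Then from there, run batch file
    <cmd its:translate="no">Build.bat</cmd>.</p>
</help>"""


def protected_spans(root):
    """Return (tag, text) pairs for elements marked its:translate="no".

    Hypothetical helper: it checks only the attribute on each element itself
    and ignores ITS inheritance and defaults.
    """
    attr = f"{{{ITS_NS}}}translate"
    return [(el.tag, el.text) for el in root.iter() if el.get(attr) == "no"]


spans = protected_spans(ET.fromstring(DOC))
for tag, text in spans:
    print(f"do not translate <{tag}>: {text}")
```

A translation tool would pass everything else in the paragraph to the translator and copy these two spans through unchanged.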

XSL FO

Conclusions

The next sections were shown only briefly during the talk

XML pipeline worldview

Benefits of 'divide and conquer'

Pipeline requirement

"Terminology" example

<doc its:version="1.0" xmlns:its="http://www.w3.org/2005/11/its">
 <section xml:id="S001">
  <par>A <kw its:term="yes"
    its:termInfoRef="http://en.wikipedia.org/wiki/Motherboard">motherboard</kw>,
    also known as a <kw its:term="yes">logic <span its:term="yes">board</span></kw> on
    Apple Computers, is the primary circuit board making up a modern computer.</par>
 </section>
</doc>

The next sections were not shown at all during the talk

Pipeline Concepts

Atomic Pipeline Steps

Compound Pipeline Steps

Steps can be grouped into pipelines

Pipelines are Steps

A Little Terminology

The XML Processing Model WG

Some observations

Inputs

Compound steps

Language Constructs

Conditional processing

Iteration

Selective processing

Exception handling

Building libraries

Standard step library

Implementations

XProc conclusions

What's new in XML Schema 1.1?

More powerful all-groups

Assertions

Conditional type assignment

Wildcard changes

Versions

Room to grow? The version problem

Open content

Multiple versions of XML Schema after 1.1

XML Schema 1.1 conclusions

Overall conclusions

Using ITS globally (1)

<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
 <its:translateRule selector="//path | //cmd" translate="no"/>
</its:rules>
<help> [...]
  <p>To re-compile all the modules of the Zebulon toolkit you need to go in the
    <path>\Zebulon\Current Source\binary</path> directory.
    Then from there, run batch file
    <cmd>Build.bat</cmd>.</p> [...]
</help>
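
A rough sketch of how the global rule above could be applied, again with Python's standard xml.etree.ElementTree. Several things here are assumptions rather than part of the slides: the rules and the help document are treated as two separate documents, the function name apply_translate_rules is hypothetical, and the selector handling is deliberately naive. ElementTree supports only a small XPath subset, so the union is split on "|" and each "//name" is rewritten as ".//name"; a real ITS processor would need a full XPath 1.0 engine.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
 <its:translateRule selector="//path | //cmd" translate="no"/>
</its:rules>"""

# Simplified copy of the slide's example (elided "[...]" parts omitted).
DOC = r"""<help>
  <p>To re-compile all the modules of the Zebulon toolkit you need to go in the
    <path>\Zebulon\Current Source\binary</path> directory.
    Then from there, run batch file
    <cmd>Build.bat</cmd>.</p>
</help>"""


def apply_translate_rules(rules_root, doc_root):
    """Map each selected element to the translate value of its rule."""
    decisions = {}
    for rule in rules_root.iter(f"{{{ITS_NS}}}translateRule"):
        # Naive selector handling (assumption): only unions of simple
        # "//name" paths are supported, rewritten to ElementTree's ".//name".
        for part in rule.get("selector").split("|"):
            for el in doc_root.findall("." + part.strip()):
                decisions[el] = rule.get("translate")
    return decisions


decisions = apply_translate_rules(ET.fromstring(RULES), ET.fromstring(DOC))
untranslatable = sorted(el.tag for el, value in decisions.items() if value == "no")
print(untranslatable)
```

The document itself carries no ITS attributes in this variant; the decision about path and cmd comes entirely from the rule.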

Using ITS globally (2)

<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
 <its:langRule selector="//*[@langinfo]" langInfoPointer="@langinfo"/>
</its:rules>

"Elements within Text" example

<doc>
 <head>
  <its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its">
   <its:withinTextRule withinText="yes" selector="//b|//u|//i"/>
   <its:withinTextRule withinText="nested" selector="//fn"/>
  </its:rules>
 </head>
 <body>
  <p>This is a paragraph with <b>bold</b>, <i>italic</i>, and <u>underlined</u>.</p>
  <p>This is a paragraph with a footnote
   <fn>This is the text of the footnote</fn> at the middle.</p>
 </body>
</doc>

Ruby example

<text xmlns:its="http://www.w3.org/2005/11/its">
 <head> ...
  <its:rules version="1.0">
   <its:rubyRule selector="/text/body/img[1]/@alt">
    <its:rubyText>World Wide Web Consortium</its:rubyText>
   </its:rubyRule>
  </its:rules>
 </head>
 <body>
  <img src="w3c_home.png" alt="W3C"/> ...
 </body>
</text>

Application areas of ITS