Search Engines take on Structured Data

Structured data on the web got a boost this week, with Google’s announcement of Rich Snippets and Rich Snippets in Custom Search. Structured data at such a large scale raises at least three issues:

  1. Syntax
  2. Vocabulary
  3. Policy

Google’s documentation shows support for both microformats and RDFa. It follows the hReview microformat syntax with small vocabulary changes (name vs fn). Support for RDFa syntax, in theory, means support for vocabularies that anyone makes; but in practice, Google is starting with a clean slate. That’s a place to start, though it doesn’t provide synergy with anyone who uses FOAF or Dublin Core or the like to share their data.

The policy questions are perhaps the most difficult. Structured data is a pointy instrument; if anyone can say anything about anything, surely the system will be gamed and defrauded. Google’s rollout is one step at a time, starting with some trusted sites and an application process to get your site added. The O’Reilly interview with Guha and Hansson is an interesting look at where they hope to go after this first step; if you’re curious about how this fits into HTML standards, see Sam Ruby’s microdata.

While issues remain–there are syntactic i’s to dot and t’s to cross and even larger policy issues to work out–between Google’s rollout and Yahoo’s SearchMonkey and the UK Central Office of Information rollout, it seems that the industry is ready to take on the challenges of using structured data in search engines.

Data interchange problems come in all sizes

I had a pretty small data interchange problem the other day: I just
wanted to archive some playlists that I had compiled using various
music player daemon (mpd) clients.
The mpd server stores playlists as simple m3u files,
i.e. line-oriented files with a path to the media file on each line. But
that’s too fragile for archive and interchange purposes.
I had a similar problem a while back with iTunes playlists. In that episode,
I chose hAudio, an
HTML dialect in progress in the microformats
community, as my target.
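
To show how little structure the m3u format carries, here is a minimal Python sketch that reads such a playlist. The function name `read_m3u` and the sample paths are invented for illustration. The sketch also assumes that lines starting with `#` are comments or extended-m3u directives and skips them.

```python
def read_m3u(text):
    """Return the media paths listed in an m3u playlist, one per line."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' lines (comments / extended-m3u directives).
        if not line or line.startswith("#"):
            continue
        paths.append(line)
    return paths


# Hypothetical playlist content, for illustration only.
sample = "albums/disc1/track01.mp3\n\n# a comment\nalbums/disc2/track07.ogg\n"
print(read_m3u(sample))
```

All that survives is a list of file paths: no album, artist, or title. That is why the format is fragile for archiving; move or rename the files and the playlist means nothing.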

Unfortunately, hAudio changed out from under me between when I
started and when I finished. So this time, a simple search found the
music ontology and I tried it
with RDFa, which
lets you use any RDF vocabulary in HTML*.
I’m mostly pleased with the results:

  1. from A Song’s Best Friend_ The Very Best Of John Denver [Disc 1]

    by John Denver

    Poems, Prayers And Promises
  2. from WOW Worship (orange)

    by Compilations

    Did you Feel the Mountains Tremble
  3. from Family Music Party

    by Trout Fishing In America

    Back When I Could Fly

The album names come before the track names because I didn’t read
enough of the RDFa primer when I
was coding; RDFa includes @rev as well as @rel
for reversing subject/object order.
See the advogato episode
for details about the code.

The Music Ontology was developed by a handful of people who
staked out a claim in URI space
and happily took comments from
as big a review community as they could manage, but they had no
obligation to get a really global consensus. The microformats process
is intended to reach a global consensus so that staking out a claim in
URI space is superfluous; it works well given certain initial
conditions about how common the problem is and availability of pre-web
designs to draw from. Perhaps playlists (and media syndication, as
hAudio seems to be expanding in scope to hMedia) will eventually reach
these conditions, but the music ontology already meets my needs, since
I’m the sort who doesn’t mind declaring my data vocabulary with URIs.

My view of Web architecture is shaped by episodes such as this
one. While giga-scale deployment is always impressive and definitely
something we should design for, small scale deployment is just as
important. The Web spread, initially, not because of global phenomena
such as Wikipedia and Facebook but because you didn’t need
your manager’s permission to try it out; you didn’t even
need a domain name; you could just run it on your LAN
or even on just one machine with no server at all.

In an Oct 2008 tech plenary session on web architecture, Henri Sivonen said:

I see the Web
as the public Web that people can access. The resources you can
navigate publicly. I define Web as the information space accessible to
the public via a browser.
If a mobile operator operates behind
walls, this is not part of the Web.

I can’t say that I agree with that perspective. I’m no great fan of
walled gardens either, but freedom means freedom to do things we don’t
like as well as freedom to do things we do like. And architecture and
policy should have a sort of church-and-state separation between them.

Plus, data interchange happens not just at planetary scale, but
also within mobile devices, across devices, and across communities
and enterprises of all shapes and sizes.

* I’ve gone a little outside the scope of current
standards; RDFa has only been specified for use in modular XHTML, with
the application/xhtml+xml media type, so far.

Once more into Versioning — this time with HTML

The W3C TAG has worked on the general issue of “versioning” for many years, and many TAG members may be worn out on the issue.
However, undeterred by past history, I’m taking another run at it, this time trying to look specifically at the issues around versioning of HTML, CSS, JavaScript and other parts of the standard web browser landscape.
Part of what’s new (I think) is looking at the cost/benefits around deployment. See the www-tag mailing list archive for the HTML and versioning threads.

Palm webOS approach to HTML extensibility: x-mojo-*

I got pretty excited about the iPhone,
and even more about the openness of Android and the G1, and then I
learn that the Palm Pre developer platform is basically just the open
web platform: HTML, CSS, and JavaScript.

Just after the mobile buzz at Web Directions North and the TAG declared victory on how to build The Self-Describing Web with URI-based Extensibility, I get some details on how Palm is building on the open web platform:

A widget is declared within your HTML as an empty div with an x-mojo-element attribute.

<div x-mojo-element="ToggleButton" id="my-toggle"></div>

Oh great; x- tokens… aren’t those passé by now?

The suggestion in the HTML 5 draft is data-* attributes. The ARIA draft suggests @role. The Palm design looks like new information for issue-41, Decentralized-extensibility, in the HTML WG.
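
To make the comparison concrete, here is a small Python sketch showing that a generic HTML parser surfaces both spellings as ordinary attributes, with no declaration needed for either. The `data-widget` attribute name and the second `div` are invented for illustration; only the `x-mojo-element` example comes from the Palm material quoted above.

```python
from html.parser import HTMLParser


class ExtensionAttrFinder(HTMLParser):
    """Collect attributes whose names start with 'x-' or 'data-'."""

    def __init__(self):
        super().__init__()
        self.found = []  # (tag, attribute name, value) triples

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith(("x-", "data-")):
                self.found.append((tag, name, value))


finder = ExtensionAttrFinder()
finder.feed(
    '<div x-mojo-element="ToggleButton" id="my-toggle"></div>'
    # Hypothetical data-* spelling of the same widget declaration.
    '<div data-widget="ToggleButton" id="other-toggle"></div>'
)
print(finder.found)
```

From the parser's point of view the two are equivalent. The difference is one of policy: which prefix the host language reserves for private use.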

Anybody know how frozen the Palm design is? Or if they looked at ARIA, data-* or URI-based namespaces?

JavaScript required for basic textual info? TRY AGAIN

Sam says he’s Online and Airborne. “Needless to say, this is seriously cool.” I’ll say! But when I follow the link to details from the service provider, I get:

Sorry. You must have JavaScript enabled to view this page. Click the
BACK button below or enable JavaScript in your browser preferences and
click TRY AGAIN.

Let’s turn that around, shall we? Sorry, if you’re a network provider and you want my business, read up on unobtrusive javascript (aka the rule of least power), go BACK to work on your web site design and TRY AGAIN.

How to evaluate Web Applications security designs?

I could use some help getting my head around security for Web
Applications and mashups.

The first time someone told me W3C should be working on specs to help
the browser prevent sensitive data from leaking out of enterprises, I
didn’t get it. “Use the browser as part of the trusted computing base?
Are you kidding?” was my response. I didn’t see the bigger picture.
Crockford explains in an April 2008 item:

… there are multiple interests involved in a web
application. We have here the interests of the user, of the site, and
of the advertiser. If we have a mashup, there can be many more

Most of my study of security protocols concentrated on whether a
request from party A should be granted by party B. You know, Alice and
Bob. Using BAN logic
to analyze the Kerberos protocols was very interesting.

I also enjoyed studying capability
security and the E system,
which is a fascinating model of secure
multi-party communication (not to mention lockless concurrency),
though it seems an impossibly high bar to reach, given the
worse-is-better tendency in software deployment, and it seemed to me
that capabilities are a poor match for the way linking and access
work in the Web:

The Web provides several mechanisms
to control access to resources; these mechanisms do not rely on
hiding or suppressing URIs for those resources.

On the other hand, after wrestling with the patchwork of JavaScript
security policies in browsers in the past few weeks, the capability
approach in ADsafe looks simple and elegant by comparison. Is there any
chance we can move the state-of-the-art that far? And what do we do in
the meantime? Crockford’s Jan 2008 post is quite critical of W3C’s current

This same sort of wrong-end-of-the-network thinking can be seen today
in the HTML 5 working group’s crazy XHR access control language.

Access Control for Cross-Site Requests
is a mouthful, and “Access Control” is too generic, which leads to “W3C
Access Control”. Didn’t we already go through this with “W3C XML
Schema”? Generic names are awkward. I think I’ll call it WACL…
yeah… rhymes with spackle… let’s see if it sticks. Anyway…

Crockford’s comment cites his proposal and argues…

does not allow the server to abdicate its responsibility of deciding if
the data should be delivered to the browser. Therefore, no policy
language is needed. JSONRequest requires explicit authorization.
Cookies and other tokens of ambient authority are neither sent nor

I’m not sure I understand that. I’m glad to learn there’s more to
the difference between XMLHttpRequest and JSONRequest than just
<pointy-brackets> vs {curly-braces}, but I’d like to understand
better how “ambient authority” relates to the interests of users,
sites, advertisers, and the like.

In response, the FAQ in the WACL spec says:

JSONRequest has been considered by the Web Applications Working
Group and the group has concluded that it does not meet the documented
requirements. E.g., requests originating from the JSONRequest API
cannot include credentials and JSONRequest is format specific.

Including credentials seems more like a solution than a
requirement; can someone help me understand how it relates to the
multiple interests involved in a web application?

Caching XML data at install time

The W3C web server is spending most of its time serving DTDs to
various bits of XML processing software. In a follow-up comment on an item on DTD traffic, Gerald says:

To try to help put these numbers into perspective, this blog post
is currently #1 on slashdot, #7 on reddit, the top page of , etc; yet is still serving more than
650 times as many DTDs as this blog post, according to a 10-min
sample of the logs I just checked.

Evidently there’s software out there that makes a lot of use of the
DTDs at W3C and they fetch a new copy over the Web for each use. As
far as this software is concerned, these DTDs are just data files,
much like the timezone database your operating system uses to convert
between UTC and local times. The tz database
is updated with respect to changes by various jurisdictions from time
to time and the latest version is published on the Web, but your
operating system doesn’t go fetch it over the Web for each use. It
uses a cached copy. A copy was included when your operating system
was installed and your machine checks for updates once a week or so
when it contacts the operating system vendor for security updates and
such. So why doesn’t XML software do likewise?
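
The principle is simple enough to sketch in a few lines of Python. This is not an install-time cache in the strict sense; it is a fetch-on-first-use cache, which is the nearest thing that fits in a short example. The names `cached_read` and `fake_fetch` are invented, and the fetch function is a stand-in for a real network request so the example runs offline.

```python
import os
import tempfile


def cached_read(name, cache_dir, fetch):
    """Return the bytes for `name`, calling `fetch` only on a cache miss."""
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        # Cache miss: fetch once and keep a local copy for next time.
        data = fetch(name)
        with open(path, "wb") as f:
            f.write(data)
    with open(path, "rb") as f:
        return f.read()


calls = []


def fake_fetch(name):
    # Stand-in for a network fetch; records each call so we can count them.
    calls.append(name)
    return b"<!-- pretend DTD -->"


with tempfile.TemporaryDirectory() as cache_dir:
    for _ in range(3):
        cached_read("xhtml1-transitional.dtd", cache_dir, fake_fetch)

print(len(calls))  # three reads, one fetch
```

A real deployment would ship the files with the software and refresh them on the vendor's update schedule, as the operating system does with the tz database.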

It’s pretty easy to put together an application out of components
in such a way that you don’t even realize that it’s fetching DTDs
all the time. For example, if you use xsltproc like this…

$ xsltproc agendaData.xsl weekly-agenda.html >,out.xml

… you might not even notice that it’s fetching the DTD and several
related files. But with a tiny HTTP
proxy, we can see the traffic. In one window, start the proxy:

$ python
Any clients will be served...
Serving HTTP on port 8000 ...

And in another, run the same XSLT transformation with a proxy:

$ http_proxy= xsltproc agendaData.xsl weekly-agenda.html

Now we can see what’s going on:

	connect to
localhost - - [05/Sep/2008 15:35:00] "GET HTTP/1.0" - -
connect to
localhost - - [05/Sep/2008 15:35:01] "GET HTTP/1.0" - -
connect to
localhost - - [05/Sep/2008 15:35:01] "GET HTTP/1.0" - -
connect to
localhost - - [05/Sep/2008 15:35:01] "GET HTTP/1.0" - -

This is the default behaviour of xsltproc, but
it’s not the only choice:

  • Using xsltproc --novalid tells it to skip DTDs altogether.
  • You can set up an
    XML catalog
    as a form of local cache.

To set up this sort of cache, first grab copies of
what you need:

$ mkdir xhtml1
$ cd xhtml1/
$ wget
=> `xhtml1-transitional.dtd'
Resolving,,, ...
Connecting to||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32,111 (31K) [application/xml-dtd]
100%[====================================>] 32,111       170.79K/s
15:29:04 (170.65 KB/s) - `xhtml1-transitional.dtd' saved [32111/32111]
$ wget
$ wget
$ wget
$ ls
xhtml1-transitional.dtd  xhtml-lat1.ent  xhtml-special.ent  xhtml-symbol.ent

And then in a file such as
xhtml-cache.xml, put a little catalog:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
rewritePrefix="./" />

Then point xsltproc to the catalog file and try it again:

$ export XML_CATALOG_FILES=~/xhtml1/xhtml-cache.xml
$ http_proxy= xsltproc agendaData.xsl weekly-agenda.html

This time, the proxy won’t show any traffic. The data was all
accessed from local copies.
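
The idea behind a rewritePrefix catalog entry can be sketched in Python as plain prefix substitution. This is only a rough model: real catalog resolvers handle more cases than this. The function name, the `http://example.org/dtds/` prefix and the local `file:` URL are all hypothetical placeholders, not the values used above.

```python
def rewrite_system_id(system_id, start_string, rewrite_prefix):
    """Map an identifier to a local location if it starts with start_string.

    Returns None when the identifier does not match, meaning the caller
    would fall back to fetching it over the network.
    """
    if not system_id.startswith(start_string):
        return None
    # Replace the matched prefix with the local one, keeping the remainder.
    return rewrite_prefix + system_id[len(start_string):]


# Hypothetical remote prefix and local cache location, for illustration only.
START = "http://example.org/dtds/"
LOCAL = "file:///home/me/xhtml1/"

print(rewrite_system_id(START + "xhtml1-transitional.dtd", START, LOCAL))
print(rewrite_system_id("http://elsewhere.example/other.dtd", START, LOCAL))
```

One entry covers the DTD and every entity file under the same prefix, which is why a single catalog line is enough for the four files fetched above.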

While XSLT processors such as xsltproc and Xalan have no
technical dependency on the XHTML DTDs, I suspect they’re used with
XHTML enough that shipping copies of the DTDs along with the XSLT
processing software is a win all around. Or perhaps the traffic comes
from the use of XSLT processors embedded in applications, and the DTDs
should be shipped with those applications. Or perhaps shipping the
DTDs with underlying operating systems makes more sense. I’d have to
study the traffic patterns more to be sure.

p.s. I’d rather not deal with DTDs at all; newer schema technologies make them obsolete as far as I’m concerned. But

  • some systems were designed before better schema technology came along, and W3C’s commitment to persistence applies to those systems as well, and
  • the point I’m making here isn’t specific to DTDs; catalogs work for all sorts of XML data, and the general principle of caching at install time goes beyond XML altogether.

The details of data in documents: GRDDL, profiles, and HTML5

GRDDL, a mechanism for putting RDF data in XML/XHTML documents, is
specified mostly at the XPath data model level. Some GRDDL software
goes beyond XML and supports HTML as she are spoke, aka tag soup. HTML 5 is intended to standardize the connection between tag soup and XPath. The tidy use case for GRDDL anticipates that using HTML 5 concrete syntax rather than
XHTML 1.x concrete syntax involves no changes at the XPath level.

But in GRDDL and HTML5,
Ian Hickson, editor of HTML 5, advocates dropping the profile attribute
of the HTML head element in favor of rel="profile" or some such. I
dropped by the #microformats channel to think out loud about this stuff, and Tantek said similarly, “we may solve this with rel="profile" anyway.” The rel-profile topic in the microformats wiki shows the idea goes pretty far back.

Possibilities I see include:

  • GRDDL implementations add support for rel="profile" along with HTML 5 concrete syntax.
  • GRDDL implementors don’t change their code, so people who want to use GRDDL
    with HTML 5 features such as <video> stick to XML-wf-happy HTML 5
    syntax and they use the head/@profile attribute anyway, despite what
    the HTML 5 spec says.
  • People who want to use GRDDL stick to XHTML 1.x.
  • People who want to put data in their HTML documents use RDFa.

I don’t particularly care for the rel="profile" design, but one should
choose one’s battles and I’m not inclined to choose this one. I’m
content for the market to choose.

life without MIME type sniffing?

In a recent item on IE8 Security, Eric Lawrence, Security Program Manager for Internet Explorer, introduced a work-around to the security risks associated with content-type sniffing: an authoritative=true parameter on the Content-Type header in HTTP. This re-started discussion of the content-type sniffing rules and the Support Existing Content design principle of HTML 5. In response to a challenge asking for evidence that supporting existing content requires sniffing, Adam made a suggestion that I’d like to pass along:

I encourage you to build a copy of Firefox without content sniffing
and try surfing the web. I tried this for a while, and I remember
there being a lot of broken sites …

That reminded me of an idea I heard in TAG discussions of MIME types and error recovery: a browser mode for “This is my content, show me problems rather
than fixing them for me silently.”
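
Here is a rough Python sketch of how the authoritative=true parameter mentioned above could gate sniffing. The function names are invented, and `toy_sniff` is a deliberately naive stand-in for a browser's heuristics, not a description of what IE or Firefox actually do.

```python
from email.message import Message


def effective_type(content_type_header, body, sniff):
    """Pick a media type, skipping the sniffer when the header opts out."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    declared = msg.get_content_type()
    if (msg.get_param("authoritative") or "").lower() == "true":
        # Server says: trust the declared type, do not second-guess it.
        return declared
    return sniff(declared, body)


def toy_sniff(declared, body):
    # Deliberately naive stand-in for a browser's sniffing heuristics.
    if declared == "text/plain" and body.lstrip().lower().startswith(b"<html"):
        return "text/html"
    return declared


page = b"<html><body>hello</body></html>"
print(effective_type("text/plain", page, toy_sniff))
print(effective_type("text/plain; authoritative=true", page, toy_sniff))
```

A "show me problems" mode would amount to treating every response as if it carried the parameter and reporting any mismatch instead of silently repairing it.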

Though Adam offered a patch, building Firefox is not something I have mastered yet, so I’m interested to learn about run-time configuration options in IE (notes Julian) and Opera (notes Michael). Eric Lawrence’s reply points out:

Please do keep in mind, however, that most folks (even the ultra-web engaged on these lists) see but a small fraction of the web, especially considering private address space/intranets, etc.

A report from one developer suggests there’s light at the end of the tunnel, at least for sniffing associated with feeds:

I did, partly as an experiment, stop sniffing text/plain in the latest release of SimplePie (which, inevitably, isn’t the nicest of things to do, seeming there are tens of thousands of users). Next to nothing broke. I know for a fact this couldn’t have been done a year or two ago: things have certainly moved on in terms of the MIME types feeds are served with …

If you get a chance to try life without MIME type sniffing, please let us know how it goes.

Syntax for ARIA: Cost-benefit analysis

1. Introduction

This analysis is intended to be neutral with respect to ideology,
history and constituency. For a useful overview of how we got here, see WAI-ARIA Implementation Concerns (member-only link) by Michael Cooper.

The W3C’s WAI PF Working Group recently published the first
public working draft of the Accessible Rich Internet Applications (WAI-ARIA) specification, which “describes mappings of user interface controls and navigation to accessibility APIs”.

The ARIA spec. defines roles, states and properties to manage the interface
between rich web documents and assistive technologies. The primary expression
of roles, states and properties in markup languages is via attributes. Since
ARIA is meant to augment web applications across a range of languages and user
agents, ARIA has to specify how its vocabulary of attributes and values can be
integrated into both existing and future languages.

In preparing this analysis, I have reviewed the available concrete evidence
bearing on the matter, and have carried out a considerable amount of work to
replicate and, in some cases, correct or extend, testing which has been done in the
past. The details are available in a report entitled Some test results concerning ARIA attribute syntax.

2. The core issue: How should the ARIA attributes be spelled?

ARIA is useful only if it is widely supported. It therefore needs to
integrate cleanly into existing and future languages as easily as possible. Before looking at possible answers to the spelling question, we need to consider
exactly what supporting ARIA means.

We can distinguish two levels of support for ARIA on the part of user
agents, which I’ll call ‘passive’ and ‘active’ support. By passive support, I
mean that documents with ARIA-conformant markup are not rejected by the agent,
and the markup is available in the same way any other markup is, e.g. via a DOM
API or for matching by CSS selectors. By ‘active’ support
I mean the user agents actually implement their part of ARIA semantics, that is, reflecting changes to ARIA-defined states and properties via
accessibility APIs.

Although already deployed implementations cannot offer active support, an
optimal answer to the spelling question would maximise passive support from
existing languages, as well as encouraging active support from subsequent implementations.

3. Possible approaches: land-grab, colon or dash

There are in principle three possible approaches to the spelling question:

  • land-grab  Just use ‘role’ and the names of the properties (e.g.
    ‘checked’, ‘hidden’) as attribute names.
  • colon  Use ‘aria:’ as a distinguishing prefix, giving e.g. ‘aria:role’,
    ‘aria:checked’ as attribute names.
  • dash  Use ‘aria’ plus some other punctuation character, e.g.
    dash, as a distinguishing prefix, giving e.g. ‘aria-role’,
    ‘aria-checked’ as attribute names.

The land-grab approach is pretty clearly unacceptable, because
of clashes with existing vocabularies and the likelihood
of clashes with future ones, and will not be considered further.

The current draft
specifies a combination of the colon and
dash approaches, with the colon being specified for use
with XML-based
languages, with the necessary additional requirement that ‘aria’ is bound to
the ARIA namespace in the usual way, i.e.
xmlns:aria="", and the
dash approach being specified for use with non-XML languages. We’ll
call this the mixed approach hereafter.

My understanding is that as of the date of this note, the WAI PF working
group have indicated that their intention is that the next draft of
the ARIA specs will move to the dash approach.

4. The status quo: languages and implementations

Choosing an approach is made complicated by the landscape of language
and infrastructure standards it has to fit in to, and by the fact that these are
moving targets. We therefore have to distinguish between what is currently
in place, what we have reason to expect in the near future, and what we can
foresee in the longer term. Furthermore, for existing languages we have
two categories: XML-based languages, with more or less explicit provision for
extensibility in general, typically namespace-based, and non-XML languages,
which for the purposes of this analysis we will take to be HTML 4.01 and nothing else.

As noted above, the best we can expect from deployed user agents is passive
support. The table below sets out the extent of passive support which is
available for the colon and dash approaches for each
of three host languages, which exemplify the major relevant categories: HTML
4.01 (for the non-XML languages), XHTML (an XML language, but not always treated
as such, so we actually get two columns for it below) and SVG (only an XML language).

at all
  • HTML 4.01
      colon: Yes, by ‘should ignore’ advice
      dash: Yes, by ‘should ignore’ advice
  • XHTML (as if HTML)[0]
      colon: Yes, by ‘should ignore’ advice
      dash: Yes, by ‘should ignore’ advice
  • XHTML (as XML)
      colon: Yes, by ‘must ignore’ rule
      dash: Yes, by ‘must ignore’ rule
  • SVG
      colon: Yes, by ‘must ignore’ rule
      dash: In principle, no; in practice[1], yes

via DOM
  • HTML 4.01
      colon: Yes, via GetAttribute
      dash: Yes, via GetAttribute
  • XHTML (as if HTML)[0]
      colon: Yes, via GetAttribute
      dash: Yes, via GetAttribute
  • XHTML (as XML)
      colon: Yes[2], via GetAttributeNS and GetAttribute
      dash: Yes[2], via GetAttribute
  • SVG
      colon: Yes[3], via GetAttributeNS and GetAttribute
      dash: Yes[3], via GetAttribute

CSS selector
  • HTML 4.01
      colon: Yes[4], using [aria\:attr]
      dash: Yes[5]
  • XHTML (as if HTML)[0]
      colon: Yes[4], using [aria\:attr]
      dash: Yes[5]
  • XHTML (as XML)
      colon: Yes, using [aria|attr]
      dash: Yes[5]
  • SVG
      colon: No
      dash: No


  • 0  This column applies to the IE family, and to other browsers
    whenever treating XHTML as HTML
  • 1  Firefox, IE7 + Adobe 3.03 SVG plugin
  • 2  All browsers which treat XHTML as XML
  • 3  Firefox (unable to test IE+plugin so far)
  • 4  Except IE family
  • 5  If attribute selectors supported at all, i.e. not IE5, IE6

It should be noted that some of the entries above disagree with assertions
made in the past about browser behaviour. At least some of those assertions
were based on flawed test materials—see the discussion
of experiments 1 and 2
in my testing report for details on the information
summarised above.
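
The general shape of the "via DOM" difference can be imitated in Python, with the caveat that Python's parsers only stand in for browser DOMs here. ElementTree, for instance, has no qualified-name lookup comparable to DOM GetAttribute, so this is an analogy rather than a reproduction of the test results. `ARIA_NS` is a placeholder URI invented for the sketch, not the real ARIA namespace.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ARIA_NS = "http://example.org/hypothetical-aria-ns"  # placeholder, not the real URI


class AttrCollector(HTMLParser):
    """Record the attributes of every start tag, as an HTML parser sees them."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        self.attrs.update(attrs)


html_doc = '<div aria:checked="true" aria-checked="true"></div>'
xml_doc = ('<div xmlns:aria="%s" aria:checked="true" aria-checked="true"/>'
           % ARIA_NS)

collector = AttrCollector()
collector.feed(html_doc)
print(sorted(collector.attrs))  # names exactly as written in the markup

elem = ET.fromstring(xml_doc)
print(sorted(elem.attrib))      # colon form expanded to {namespace}local
```

The dash spelling looks the same to both parsers, while the colon spelling is a plain string in one and a namespace-qualified name in the other. That asymmetry is what the colon rows of the table are tracking.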

5. The near future

A number of browser implementors have responded positively to the ARIA
initiative and have included experimental active support for ARIA in pre-release
versions of their products. Most of the test materials and implementation
information I can find suggests that only the dash approach, and only
HTML or XHTML, are currently being implemented.

With regard to improving passive support, it seems very possible that
IE8 will support attribute selectors of the form [aaa\:checked],
which would remove the qualification recorded in the table above by footnote 4.

5.1. HTML5

The situation with respect to HTML5 is complicated. As it
currently stands, the HTML5 draft
supports namespaces internally, and all HTML elements are
parsed into the DOM nodes in the HTML namespace, regardless of whether they are
parsed “as HTML” or “as XML”. But when parsing documents “as HTML”, no
other namespaces are recognised. Unless this changes before HTML5
is completed, the HTML/“XHTML (as if HTML)” columns above will apply to
HTML5-conformant user agents in at least some cases.

6. Cost-benefit analysis

On the basis of the above survey, there follows below an attempt at a
cost-benefit analysis with respect to the colon and
dash approaches, as well as the mixed
approach as currently specced in the ARIA working draft and a fourth approach, as proposed by me in
a message to www-tag,
which I’ll call the xcolon approach.
The xcolon approach attempts to address some of the problems
revealed in the passive support table by defining a
pair of getter/setter Javascript functions for access to ARIA information in the
DOM, and giving a design pattern for duplicated CSS selectors (one using
[aria\:xxx] and the other [aria|xxx]).
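
The accessor idea can be sketched as follows. The proposal is for JavaScript functions working on a browser DOM; this is only a Python analogue using ElementTree, with invented names (`get_aria`, `set_aria`) and a placeholder namespace URI. It assumes a tree built without namespace processing stores the attribute under the literal name `aria:checked`.

```python
import xml.etree.ElementTree as ET

ARIA_NS = "http://example.org/hypothetical-aria-ns"  # placeholder URI


def get_aria(elem, name):
    """Look up an ARIA attribute under either representation."""
    value = elem.get("{%s}%s" % (ARIA_NS, name))
    if value is None:
        # Fall back to the literal 'aria:name' spelling of non-namespace trees.
        value = elem.get("aria:" + name)
    return value


def set_aria(elem, name, value):
    """Set an ARIA attribute, reusing the literal spelling if already present."""
    literal = "aria:" + name
    if literal in elem.attrib:
        elem.set(literal, value)
    else:
        # Arbitrary choice for this sketch: default to the namespaced form.
        elem.set("{%s}%s" % (ARIA_NS, name), value)


xml_elem = ET.fromstring('<div xmlns:aria="%s" aria:checked="true"/>' % ARIA_NS)
# Built by hand to mimic what a non-XML parser might produce.
soup_elem = ET.Element("div", {"aria:checked": "false"})

print(get_aria(xml_elem, "checked"), get_aria(soup_elem, "checked"))
```

Callers never see which representation they have, which is the uniformity benefit claimed for xcolon. The price is that every access goes through the indirection.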

colon
  • Benefits: Consistency for page authors; uniform DOM access (using
    Get/SetAttribute); orthogonal in XML languages; consistent with
    extensibility for XML (and for HTML5?[1])
  • Costs: Uniform DOM access ignores namespace[2]; no uniform CSS selector;
    no CSS selector at all for IE legacy[3]; modest re-implementation cost[4]

dash
  • Benefits: Consistency for page authors; uniform DOM access; uniform CSS selector
  • Costs: Inconsistent with XML namespace-based extensibility[5]; new paradigm for
    ‘namespace’[6]; scope creep[7]

mixed
  • Benefits: Orthogonal in XML languages; consistent with namespace-based
    extensibility for XML (and for HTML5?[1])
  • Costs: Confusing for authors; no uniform DOM access; no uniform CSS selector;
    uncertainty wrt XHTML; new paradigm for ‘namespace’[6]; scope creep[7]

xcolon
  • Benefits: Consistency for page authors; orthogonal for XML languages; consistent with
    extensibility for XML (and for HTML5?[1]); uniform DOM access; uniform CSS selector
  • Costs: Requires indirection through accessor functions for DOM access;
    requires duplicate CSS selectors; no uniform DOM representation;
    no CSS selector at all for IE legacy[3]; modest re-implementation cost[4]

  • 1  HTML5’s provision for extensibility, whether compatible with
    XML namespaces or not, is an open area of discussion at the moment.
  • 2  That is, it requires the use of a fixed aria prefix
    and may not (i.e. in some browsers) correctly set the namespaceURI
    property even when targeting an XML DOM.
  • 3  That is, in the IE family, only (putatively) IE8 and successors
    will recognize [aria\:...] selectors
  • 4  See discussion of re-implementation cost below
  • 5  See discussion of XML extensibility below
  • 6  That is, adds the concept of a fixed, dash-delimited, prefix as
    a way of managing distinct symbol spaces to the existing non-fixed, colon-delimited
    prefix for the same purpose.
  • 7  That is, requires all embedding languages to explicitly allow
    and manage an inventory of fixed prefixes and, possibly, their vocabularies.

6.1. Implementation cost

For wholly commendable reasons, development of the ARIA spec. and pilot
implementation work have proceeded in parallel. Most if not all existing
implementations support only the dash approach. What is the likely
cost for those implementations of any decision to adopt any other approach? My
conclusion, having examined one implementation in some detail, is that the cost is
likely to be very modest.

Michael Cooper, WAI PF staff contact, captured the reason for this very
well, albeit unintentionally:

“The ARIA roles and properties are conceptually simple enough, but
they are designed to provide a bridge between HTML and desktop accessibility APIs,
a bridge which is exploited by the operating system, user agent, and assistive
technology all working together. There’s a complex set of interdependencies there
and the feasibility and details of many of the ARIA features could only be worked
out by testing in deployed systems, and therefore doing early implementation.”

The complexity referred to above is fundamentally one of architecture, both
static and dynamic. Not surprisingly, therefore, syntactic concerns account for a
tiny fraction of the code needed to implement ARIA as it stands. Furthermore, and
again not surprisingly, as it’s what sound software engineering practice requires,
the details of the concrete syntax are isolated, and the vast bulk of
the code I looked at refers to it only indirectly. The consequence of all this is
that the changes necessary to manage any change away from the dash
approach will be very straightforward. For more details, see the discussion
of experiment 3
in my testing report.

6.2. XML extensibility and SVG

Many existing XML languages make explicit, generic, provision for
extensibility by including in their formal schemas and/or spec. prose allowance for
any namespace-qualified elements and attributes from namespaces other than those
which make up the language itself. Tools such as NVDL and, to a lesser extent, W3C
XML Schema and RelaxNG, make it possible to combine the schemas for multiple XML
languages to give a complete characterisation of mixed-language documents.

One particularly important example of this approach is SVG. ARIA
integration into SVG is clean and straightforward under the colon or
approaches, but will require amending the spec. under the dash approach.

6.3. Short- vs. long-range considerations

In trying to weigh the tradeoffs which must of necessity be considered when
confronted by the information given above, the matter of timescale is particularly
hard to address. Any assertion about how things will look five, or even two, years
hence can always be countered with a contrary assertion. None-the-less, the
centrality of the HTML languages for the Web, and the fundamental importance of
accessibility for all of us, suggest that we must take the long-term
impact of this decision seriously, and be prepared to discount some short-term
discomfort in return for long-term stability and simplicity.