Re: TAG ACTION-23: URIs for XML Schema datatypes from Henry S. Thompson on 2012-02-15 (www-tag@w3.org from February 2012)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Wed, 15 Feb 2012 16:50:00 +0000
To: www-tag@w3.org
Message-ID: <f5bzkck59dz.fsf@calexico.inf.ed.ac.uk>
ht writes:

> . . .
> I'll follow up.

Part I

So, per the analysis in [1], where do we stand today wrt the W3C XML
Schema Datatypes specification's claim [2] that

  Each built-in datatype defined in this specification can be uniquely
  addressed via a URI Reference constructed as follows: 
   * the base URI is the URI of the XML Schema namespace
   * the fragment identifier is the name of the datatype
  For example, to address the int datatype, the URI is:
    http://www.w3.org/2001/XMLSchema#int

?
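
For concreteness, the construction quoted above amounts to the
following.  This is only a sketch: the names XSD_NS and datatype_uri
are mine, for illustration, and appear nowhere in the spec.

```python
from urllib.parse import urldefrag

# Namespace URI as quoted from the Datatypes spec above.
XSD_NS = "http://www.w3.org/2001/XMLSchema"


def datatype_uri(name: str) -> str:
    """Build base URI + '#' + datatype name (hypothetical helper)."""
    return f"{XSD_NS}#{name}"


# Splitting the result recovers the two parts the spec talks about.
base, fragid = urldefrag(datatype_uri("int"))
```

Here base is what gets sent in the GET, and fragid is what has to be
interpreted against whatever representation comes back.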

As of today, three representations are available via content
negotiation from the XML Schema namespace URI.  One of them is
returned with a 200 in response to any GET of
http://www.w3.org/2001/XMLSchema, depending on the Accept header in
the request, as follows:

 R1) Accept: text/html, or no Accept: header: text/html + DOC1
 R2) Accept: application/xhtml+xml: application/xhtml+xml + DOC1
 R3) Accept: application/xml: application/xml + DOC2

DOC1 is an XHTML+RDDL document, with DOCTYPE

 <!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN"
                "http://www.w3.org/2001/rddl/rddl-xhtml.dtd" >

and is valid.

DOC2 is an XML document, a W3C XML Schema schema document, valid per
the DTD for schema documents.
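
The negotiation behaviour in R1 to R3 above can be sketched as a
lookup table.  This is a deliberately simplified model, not the
server's actual configuration: it looks only at the first media range
in the Accept header and ignores q-values and wildcards.  The strings
"DOC1" and "DOC2" are just labels for the two documents, and since I
have said nothing above about any other Accept value, the sketch
returns None for those.

```python
from typing import Optional, Tuple

# (media type served, document label), per R1-R3 above.
REPRESENTATIONS = {
    "text/html": ("text/html", "DOC1"),
    "application/xhtml+xml": ("application/xhtml+xml", "DOC1"),
    "application/xml": ("application/xml", "DOC2"),
}


def select_representation(accept: Optional[str]) -> Optional[Tuple[str, str]]:
    """Pick a representation for an Accept header value (simplified)."""
    # R1: no Accept header behaves like Accept: text/html.
    if accept is None or not accept.strip():
        return REPRESENTATIONS["text/html"]
    # Assumption: consider only the first media range, minus parameters.
    first = accept.split(",")[0].split(";")[0].strip().lower()
    # Anything outside R1-R3 is not covered here, so return None.
    return REPRESENTATIONS.get(first)
```

Note that R1 and R2 differ only in the media type label: both hand
back DOC1, which is why the fragid semantics of the two media types
matter in what follows.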

Are the media types of these three representations "type-consistent
for int", as defined in my previous email [1]?
 
 For text/html (R1), per the "[r]egistration of MIME media type
 text/html" [3]

  "[a] fragment identifier designates the correspondingly named
   element; any element may be named with the 'id' attribute, and A,
   APPLET, FRAME, IFRAME, IMG and MAP elements may be named with a
   'name' attribute."

 so fragids like 'int' identify elements.

 For application/xhtml+xml (R2), per the "[r]egistration of MIME media
 type application/xhtml+xml" [4]

  "fragment identifiers for XHTML documents designate the element
   with the corresponding ID attribute value (see [XML] section
   3.3.1); any XHTML element with the "id" attribute."

 so fragids like 'int' identify XHTML elements.

I think we can say that text/html and application/xhtml+xml are
type-consistent for int on that basis.

 For application/xml (R3), per the "[a]pplication/xml Registration" [5]

  "As of today, no established specifications define identifiers for
   XML media types."

As far as the official story goes, then, text/html and application/xml
are type-consistent for int, as are application/xhtml+xml and
application/xml, since application/xml _has_ no definition.

But in the unofficial story, widely implemented in practice and now
_nearly_ official in the not-yet-approved updated media type
registration for the XML Media Types [6], we find

  "Conformant applications MUST interpret such fragment identifiers
   as designating that part of the retrieved representation specified
   by [XPointerFramework] and whatever other specifications define any
   XPointer schemes used. Conformant applications MUST support the
   'element' scheme as defined in [XPointerElement]."

This amounts to saying that two forms of fragid, namely barenames and
strings roughly of the form 'element([barename](/digits)*)', "identify
element[s] in the resource's information set" [7][8].
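
The two forms of fragid just mentioned can be told apart mechanically.
The following is a rough sketch only: classify_fragid is a name I made
up, and the name pattern is an ASCII-only approximation of NCName, so
it will reject some legal names.  Child-sequence steps are taken to be
positive integers without leading zeros.

```python
import re

# ASCII-only approximation of an XML NCName (assumption, see above).
NAME = r"[A-Za-z_][A-Za-z0-9._-]*"
STEP = r"/[1-9][0-9]*"

BARENAME = re.compile(NAME)
# element(name), element(name/1/2) or element(/1/2)
ELEMENT = re.compile(rf"element\((?:{NAME}(?:{STEP})*|(?:{STEP})+)\)")


def classify_fragid(fragid: str) -> str:
    """Return 'barename', 'element-scheme' or 'other' (hypothetical helper)."""
    if BARENAME.fullmatch(fragid):
        return "barename"
    if ELEMENT.fullmatch(fragid):
        return "element-scheme"
    return "other"
```

On this reading 'int' is a barename and 'element(int/1)' is an element
scheme pointer; anything else, e.g. a pointer using some other scheme,
falls outside the two forms discussed here.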

Again, I think that's close enough.  So even on the "in practice/in
the future" story about application/xml, we can conclude that
text/html and application/xml are type-consistent for int, as are
application/xhtml+xml and application/xml.

So _that's_ good: the W3C is not committing a server management error
as things stand.

And, trivially, in _none_ of the three representations are _any_
fragids multiply defined in inconsistent ways.

Some fragids have definitions in both R1 and R2, not surprisingly, but
they are, again not surprisingly, for "the same" element, so no
problem there.  On some readings of the above definitions for
text/html vs. application/xhtml+xml, some fragids have definitions only
in R1, but again, that's not a problem.

A large number of fragids have definitions in R3, but there is no
intersection between this set and the R1/R2 set, so _that's_ alright
too.  [We leave as an exercise to the reader whether, supposing my
advice in [9] had been followed, and the document at [10] been served
in place of R1/R2, things would have broken at the token-inconsistency
level]
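
The disjointness check described above is easy to mechanise.  What
follows is a sketch under loud assumptions: the two documents are toy
stand-ins I invented for illustration, NOT the real DOC1 and DOC2, and
any attribute literally named 'id' is treated as defining a fragid,
which glosses over the differences between the media type definitions
quoted earlier.

```python
import xml.etree.ElementTree as ET


def ids_in(xml_text: str) -> set:
    """Collect the values of all 'id' attributes in a document."""
    root = ET.fromstring(xml_text)
    return {el.get("id") for el in root.iter() if el.get("id") is not None}


# Toy stand-ins (hypothetical content, for illustration only).
TOY_RDDL = '<html><body><div id="intro"/><div id="resources"/></body></html>'
TOY_SCHEMA = '<schema><simpleType id="int"/><simpleType id="string"/></schema>'

# Fragids defined in both representations; empty means no overlap.
shared = ids_in(TOY_RDDL) & ids_in(TOY_SCHEMA)
```

An empty intersection is the situation described above for R1/R2
vs. R3.  A non-empty one would not by itself be an error: one would
then have to ask whether the shared fragids are defined consistently.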

So with respect to (my expanded version of) WebArch, nothing is
fundamentally _broken_ about the _status quo_.  The only problem is
with the broken _promise_ in the Schema spec.  Jonathan Rees has made,
and Tim Berners-Lee has endorsed and elaborated, a recommendation that
RDFa is the answer.  My next message will explore this from the
perspective elaborated here.

ht

[1] http://www.w3.org/mid/f5baa4k8cub.fsf_-_%2540calexico.inf.ed.ac.uk
[2] http://www.w3.org/TR/xmlschema11-2/#built-in-datatypes
[3] http://www.rfc-editor.org/rfc/rfc2854.txt
[4] http://www.rfc-editor.org/rfc/rfc3236.txt
[5] http://www.rfc-editor.org/rfc/rfc3023.txt
[6] http://www.w3.org/2006/02/son-of-3023/latest.html#frag
[7] http://www.w3.org/TR/xptr-framework/#shorthand
[8] http://www.w3.org/TR/xptr-element/#model
[9] https://www.w3.org/Bugs/Public/show_bug.cgi?id=1974
[10] http://www.w3.org/XML/Group/2005/12/XMLSchema.html
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]
Received on Wednesday, 15 February 2012 16:50:29 UTC