This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11427 - should allow use of xml:id for XHTML5
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
Reported: 2010-11-29 14:29 UTC by brian m. carlson
Modified: 2011-08-04 05:35 UTC

Description brian m. carlson 2010-11-29 14:29:48 UTC
Because the XML serialization of HTML5 does not require or specify a DOCTYPE (and therefore, a DTD), generic XML processing tools have no way of determining the proper attribute to use as an ID.  This causes problems when using e.g. XInclude with the "element" xpointer, which uses the ID to determine the subtree to include.  I use this currently to include relevant portions of XHTML into an Atom feed, and it is broken with XHTML5.

Currently, the specification is silent about whether xml:id is permitted at all and if so, what its interaction is with id.  I suggest that xml:id be permitted and be preferred over id for the XML serialization, while being forbidden in the HTML serialization.  Whatever the decision may be, please make it clear in the text.
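
The failure mode described above can be reproduced with a generic DOM implementation. The following is a minimal sketch, assuming Python's standard-library xml.dom.minidom as a stand-in for a generic XML tool (the report itself concerns XInclude processing, and later comments name Xalan and Xerces). The sample markup and the id value "summary" are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical XHTML5 fragment: no DOCTYPE, so no DTD declares id as type ID.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p id="summary">Feed summary</p></body>'
    '</html>'
)

doc = minidom.parseString(XHTML)

# The parser knows of no ID-typed attributes, so the ID lookup finds nothing.
by_id = doc.getElementById("summary")

# The attribute is still there, but only as an ordinary attribute.
paragraph = doc.getElementsByTagName("p")[0]
plain_value = paragraph.getAttribute("id")

print(by_id, plain_value)
```

Any ID-based addressing layered on such a parser has nothing to resolve against, even though the attribute is plainly in the document.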
Comment 1 David Carlisle 2010-11-29 16:14:06 UTC
Note this would presumably also affect MathML in XHTML.

MathML2 specifies id, as does MathML3. (Some drafts of MathML3 switched to using xml:id in preference, but in the end backwards compatibility and feedback from implementers in browsers and elsewhere caused us to stick with id.)

Not insisting on <!DOCTYPE syntax in the serialization is a good thing, but can't you assume in the spec (and implement in your Atom pipeline) a catalog that defaults a DTD or schema that implies the IDness of the id attribute?
Comment 2 Anne 2010-11-29 16:35:30 UTC
I think we should just put this at the DOM level: attributes named id give the ID for the element.
Comment 3 brian m. carlson 2010-11-29 16:39:56 UTC
(In reply to comment #1)
> Note this would presumably also affect mathml in xhtml.
> 
> MathML2 specifies id as does MathMl3. (Some drafts of MathML3 switched to using
> xml:id in preference but in the end, backwards compatibility and feedback from
> implementers in browsers and elsewhere caused us to stick with id.
> 
> Not insisting on <!DOCTYPE syntax in the serialization is a good thing, but
> can't you assume in the spec (and implement in your atom pipeline) a catalog
> that defaults a dtd or schema that implies the IDness of the id attribute?

No.  I've never seen a piece of code that does that, unless it has special defaults for HTML (in the non-XML serialization).  Xalan and Xerces, two very popular XML tools, don't handle that.  (Fixing that requires patching the source.)  And since the Atom is autogenerated (as is the XHTML), it cannot include an internal DTD subset.  Even if I could do this, it makes every user and toolchain have to work around it when the solution is easy and well-specified.  xml:id is intended to handle specifically this case, so people don't have to play guess-the-attribute (and it's a Recommendation).

It looks like, from <http://wiki.whatwg.org/wiki/HTML_vs._XHTML>, that this is intended to be allowed, but it's just not codified.  I'm okay with language that prohibits both of them from being used together, but without xml:id, XML tools that work on IDs won't work without a DTD.  I think allowing xml:id is the better solution here.
Comment 4 David Carlisle 2010-11-29 16:44:33 UTC
(In reply to comment #2)
> I think we should just put at the DOM-level. That attributes named id give the
> ID for the element.


Although that doesn't necessarily help the original poster, since many generic
XML tools don't use the DOM. Given the DOM-centric view of the (X)HTML
specification, though, that's probably as much as you can do in this set of
specs; people using generic XML processing will then have to do whatever it
takes to have an equivalent effect, defaulting the IDness somehow.
Comment 5 David Carlisle 2010-11-29 16:46:36 UTC
(In reply to comment #3)

> No.  I've never seen a piece of code that does that, unless it has special
> defaults for HTML (in the non-XML serialization).  Xalan and Xerces, two very
> popular XML tools, don't handle that. 

Xerces (and hence Xalan) can be configured to use an XML catalog, and an XML catalog can default a DTD.
Comment 6 brian m. carlson 2010-11-29 16:59:21 UTC
(In reply to comment #5)
> (In reply to comment #3)
> 
> > No.  I've never seen a piece of code that does that, unless it has special
> > defaults for HTML (in the non-XML serialization).  Xalan and Xerces, two very
> > popular XML tools, don't handle that. 
> 
> xerces (and hence xalan) can be configured to use an xml catalog, and an xml
> catalog can default a DTD

I don't believe the OASIS XML Catalog specification allows that.  Please specify where on <http://www.oasis-open.org/committees/entity/spec-2001-08-06.html> (within the normative text) you see that.  And I'm not really interested in going through a full analysis of my setup.

And I don't see the recalcitrance to allowing it for the XML serialization.  I'm not requesting that xml:id be the only option, or even the preferred option.  I'm just requesting that it be *an* option.  xml:id is a Recommendation.  Its use with XHTML5 has tangible technical and practical benefits, including compatibility with a wide range of existing XML software.  Is there a specific technical reason that you think it should be forbidden, in light of the fact that DTDs are not used with XHTML5?
Comment 7 David Carlisle 2010-11-29 17:11:12 UTC
You need the optional but specified functionality in appendix E of the XML catalog spec (or use an SGML Open catalog):

http://www.oasis-open.org/committees/entity/spec-2001-08-06.html#s.doctype

But I guess you knew that and discounted this in advance with the "normative" rider.

> And I don't see the recalcitrance to allowing it for the XML serialization. 
> I'm not requesting that xml:id be the only option, or even the preferred
> option.

But I think that you were requesting that a valid HTML5 DOM that is serialised as xhtml should use xml:id rather than id for attributes of type ID (that is, that the name of the attribute be changed on serialisation)?

xml:id is usable with any XML document type and confers IDness if used with an xml:id processor, but usually (as here) it makes the document invalid. Whether that matters or not depends on what you are doing.
Comment 8 brian m. carlson 2010-11-29 17:30:39 UTC
(In reply to comment #7)
> You need the optional but specified functionality in appendix E of the xml
> catalog spec (or use an sgml-open catalog:
> 
> http://www.oasis-open.org/committees/entity/spec-2001-08-06.html#s.doctype
> 
> But I guess you knew that and discounted this in advance with the "normative"
> rider.

You can't expect non-normative parts of specifications to be implemented.  I have in the past used (and continue to use) other tools that don't support appendix E.  And I think my arguments have technical merit without regard to my particular setup.

> > And I don't see the recalcitrance to allowing it for the XML serialization. 
> > I'm not requesting that xml:id be the only option, or even the preferred
> > option.
> 
> But I think that you were requesting that a valid HTML5 DOM that is serialised
> as xhtml should use xml:id rather than id for attributes of type ID (that is,
> that the name of the attribute be changed on serialisation)?

I'm not requesting that at all.  I'm requesting that it be an acceptable (valid) serialization to serialize the DOM as XHTML using xml:id instead of id.  I'm also requesting that it be valid when parsing XHTML to the DOM to use the xml:id attribute as the ID if no id attribute is specified.
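
The parsing half of this request (use xml:id as the ID only when no id attribute is present) can be sketched as follows. This is an illustration only, assuming Python's xml.etree.ElementTree; the helper name effective_id and the sample markup are hypothetical, and nothing here reflects behaviour defined by the HTML specification.

```python
import xml.etree.ElementTree as ET

# Clark-notation key that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def effective_id(element):
    """Return the id value, falling back to xml:id when id is absent."""
    value = element.get("id")
    if value is None:
        value = element.get(XML_ID)
    return value


# Hypothetical sample covering the three cases.
SAMPLE = (
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    '<p id="a">plain id</p>'
    '<p xml:id="b">xml:id only</p>'
    '<p id="c" xml:id="d">both present</p>'
    '</div>'
)

root = ET.fromstring(SAMPLE)
ids = [effective_id(p) for p in root]
print(ids)
```

When both attributes are present, this sketch lets id win, matching the "if no id attribute is specified" wording above.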

Alternatively, you could specify a DTD for the XHTML serialization, but even I think that's an unacceptable alternative.

> xml:id is usable with any XML document type and infers IDness if used with an
> xml:id processor, but usually (as here) it makes the document invalid, whether
> that matters or not depends on what you are doing.

Right.  I'm trying to use valid XHTML5 here, since the pages will end up on the Internet.  I'm not interested in solutions that produce invalid documents.
Comment 9 Henri Sivonen 2011-01-04 09:02:39 UTC
I think we shouldn't make xml:id conforming or in any way facilitate its use.

Even though it doesn't look like it on the surface, breaking the assumption that nodes have at most one attribute that has the IDness nature causes quite a bit of complexity. A patch for implementing xml:id in Gecko was written, but the patch ended up touching a lot of code and got rejected due to adverse performance effects. I initially implemented xml:id support in Validator.nu contrary to advice given to me by more experienced people, and supporting more than one attribute with IDness caused unanticipated complexity in various parts of the code base. WebKit has decided against supporting xml:id. At least some SVG people seem to regret that they put xml:id in SVG 1.2 Tiny.

Instead of using xml:id, XML tooling should have a processing stage that assigns IDness to the id attribute in no namespace. (You can call this processing stage "XHTML id processor" analogously to an "xml:id processor".)
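
Such a processing stage could look roughly like the sketch below. It assumes Python's standard-library xml.dom.minidom and its DOM Level 3 setIdAttribute method; the function name assign_idness and the sample document are hypothetical, and this is one possible shape for the idea rather than anything a specification defines.

```python
from xml.dom import minidom


def assign_idness(doc):
    """Mark every no-namespace id attribute in the document as an ID."""
    for element in doc.getElementsByTagName("*"):
        # "id" matches only the unprefixed attribute, not e.g. xml:id.
        if element.hasAttribute("id"):
            element.setIdAttribute("id")


# Hypothetical XHTML5 fragment without a DOCTYPE.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p id="summary">Feed summary</p></body>'
    '</html>'
)

doc = minidom.parseString(XHTML)
assign_idness(doc)

# After the processing stage, ID lookup works without any DTD.
found = doc.getElementById("summary")
print(found.tagName)
```

Because only one attribute per element is ever marked, the assumption that a node has at most one attribute with IDness still holds.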
Comment 10 Ian 'Hixie' Hickson 2011-02-07 22:28:11 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: (In reply to comment #0)
> Because the XML serialization of HTML5 does not require or specify a DOCTYPE
> (and therefore, a DTD), generic XML processing tools have no way of determining
> the proper attribute to use as an ID.

There are several ways for generic XML processing tools to know that the id="" attribute on elements in the HTML, MathML, and SVG namespaces is an ID attribute. The tool can have hardcoded namespace knowledge. The tool can be given a DTD. The tool can be given an XML Schema. The tool can be configured with namespace-specific information.
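
The first option, hardcoded namespace knowledge, can be sketched as a small lookup table. This is an assumption-laden illustration using Python's xml.etree.ElementTree: the names ID_NAMESPACES and build_id_index, the urn:example:other namespace, and the sample document are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespaces in which the no-namespace id attribute is treated as an ID.
ID_NAMESPACES = {
    "http://www.w3.org/1999/xhtml",
    "http://www.w3.org/1998/Math/MathML",
    "http://www.w3.org/2000/svg",
}


def build_id_index(root):
    """Map id values to elements, for elements in the known namespaces only."""
    index = {}
    for element in root.iter():
        tag = element.tag
        if not isinstance(tag, str):
            continue  # skip comments and processing instructions, if present
        # ElementTree spells namespaced tags as "{namespace}localname".
        namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        value = element.get("id")
        if namespace in ID_NAMESPACES and value is not None:
            index.setdefault(value, element)  # first occurrence wins
    return index


# Hypothetical wrapper vocabulary containing an embedded XHTML fragment.
SAMPLE = (
    '<wrapper xmlns="urn:example:other">'
    '<item id="not-an-id"/>'
    '<div xmlns="http://www.w3.org/1999/xhtml"><p id="summary">text</p></div>'
    '</wrapper>'
)

index = build_id_index(ET.fromstring(SAMPLE))
print(sorted(index))
```

The id attribute on the element outside the three namespaces is ignored, which is the point of keying the knowledge on the namespace.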


> This causes problems when using e.g.
> XInclude with the "element" xpointer, which uses the ID to determine the
> subtree to include.

If the tool implements XInclude, it can also implement HTML (and MathML and SVG), at least the minimum required for its users to use HTML (or MathML or SVG) with XInclude.


> I use this currently to include relevant portions of XHTML
> into an Atom feed, and it is broken with XHTML5.

You can use an XPath XPointer in XInclude, instead of relying on IDness.
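
The selection step behind this suggestion can be illustrated with an XPath-style query that matches on the attribute's name and value, so no ID typing is needed. This is a sketch only, assuming Python's xml.etree.ElementTree and its limited XPath subset; it shows the selection, not an XInclude processor, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML5 fragment without a DOCTYPE.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p id="summary">Feed summary</p><p id="other">Other</p></body>'
    '</html>'
)

root = ET.fromstring(XHTML)

# Match any descendant whose id attribute equals "summary".
# The attribute is compared as a plain string; its type is irrelevant.
match = root.find(".//*[@id='summary']")
print(match.text)
```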


> Currently, the specification is silent about whether xml:id is permitted at all

That's out of scope of the HTML spec. Nothing stops you from using xml:id with HTML. If you decide xml:id is a specification that applies to your stack, then it is conforming in HTML for you.


> and if so, what its interaction is with id.

If there are any specific interactions that are not defined, that's an oversight. The HTML spec, as far as I am aware, _does_ define the behaviour of its APIs with respect to other features that introduce IDs (there are many more than just xml:id).


> I suggest that xml:id be permitted

That's up to you. Only you can decide what specs are in scope for your documents.


> and be preferred over id for the XML serialization

xml:id doesn't seem to do anything that the id="" attribute in HTML doesn't already do, so recommending that people use it seems like a bad idea.


> while being forbidden in the HTML serialization. 

It's not forbidden, it's impossible.


> Whatever the decision may be, please make it clear in
> the text.

It's not clear to me that there is any relevant decision to make clear in the spec.
Comment 11 Michael[tm] Smith 2011-08-04 05:35:15 UTC
mass-move component to LC1