Bug 25149 - [Ser 30] Normative references to unstable documents must be removed before publication
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Serialization 3.0
Version: Proposed Recommendation
Hardware: PC Linux
Importance: P2 blocker
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2014-03-25 22:30 UTC by Liam R E Quin
Modified: 2014-04-27 00:13 UTC

Description Liam R E Quin 2014-03-25 22:30:48 UTC
The Serialization 3.0 PR contains normative references to
HTML 5
POLYGLOT
XSLT 3

In order to be published as a Rec we mustn't be in a situation where a change to an unstable spec like POLYGLOT (likely to have a 2nd last call soon) would break XQuery or XPath implementations.

I don't think we are actually in such a situation, but some editorial work is needed.

I suggest
(1) for POLYGLOT, this should become a non-normative reference; the one place where it's used in normative text seems to be about namespaces, and copying the appropriate paragraph out of the polyglot draft would cure that.

(2) for HTML 5, we refer to things like the list of void elements, which is now out of date (there are new ones). However, the new ones can have end tags, so we're *probably* OK on HTML 5; referring to specific sections of that huge spec might help. See http://www.w3.org/html/wg/tests-cr-exit/index.html - the items with green check marks are considered stable.

(3) XSLT 3 references can probably say "XSLT 2 or later"?? In cases where an XQuery alternative is given, the XSLT 3 reference could be non-normative.

The document (and hence XQuery 3, XQueryX 3, XPath 3, etc., which refer normatively to this spec) can't advance to Recommendation until this is resolved.

It probably also applies to the 3.1 spec.
Comment 1 C. M. Sperberg-McQueen 2014-03-26 21:04:15 UTC
One complication arises regarding XSLT 3.0:  section 6 of Serialization defines the (normatively required) process of namespace prefix normalization [1] by appealing normatively to an XSLT 3.0 stylesheet.  As written, this stylesheet is not a legal XSLT 2.0 stylesheet, because it uses the namespace axis in a match pattern. 

[1] http://www.w3.org/TR/xslt-xquery-serialization-30/#PREFIXNORMALIZATION

One possibility would be to rewrite the stylesheet in XSLT 2.0; I think this is possible.  A first attempt at such a rewrite follows.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:svg="http://www.w3.org/2000/svg"
  xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  
  <xsl:template match="xhtml:*|svg:*|mathml:*">
    <xsl:element name="{local-name()}" namespace="{namespace-uri()}">
      <xsl:call-template name="copy-namespace-nodes"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
  
  <xsl:template match="node()|@*">
    <xsl:copy copy-namespaces="no">      
      <xsl:call-template name="copy-namespace-nodes"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template name="copy-namespace-nodes">
    <xsl:for-each select="namespace::*
      [not(. = ('http://www.w3.org/1999/xhtml',
                'http://www.w3.org/2000/svg', 
                'http://www.w3.org/1998/Math/MathML'))]">
      <xsl:copy/>
    </xsl:for-each>
  </xsl:template>
  
</xsl:stylesheet>

(Changes made:  remove namespace::* (etc.) from all select patterns; suppress the template for the three magic namespaces; add the named template 'copy-namespace-nodes'; call that template from the two other templates.)

This is important enough and error-prone enough (I don't usually muck about with namespace nodes in my stylesheets, so I'm out of practice) that it would be a very good idea to have two or three additional pairs of eyes on this before concluding that the new stylesheet is correct.  (Volunteers, please check it and report your results in a comment on this bug.)

Another approach to the problem would be to embed the stylesheet in a note, on the theory that the transformation is adequately described in the prose definition of prefix normalization.
Comment 2 Michael Kay 2014-03-26 22:24:04 UTC
I think your rewrite is correct, but it can be simplified: the copy-namespace-nodes template can be written as

  <xsl:template name="copy-namespace-nodes">
    <xsl:copy-of select="namespace::*
      [not(. = ('http://www.w3.org/1999/xhtml',
                'http://www.w3.org/2000/svg', 
                'http://www.w3.org/1998/Math/MathML'))]"/>
  </xsl:template>

My only other reservation is whether the existing code is precisely equivalent to the prose. The prose says:

Any namespace node for any of those three namespaces that was previously present on any element node in the instance of the data model is also removed, unless the prefix that that namespace node declared is used as the prefix on the name of an attribute on that element or an ancestor of that element.

Firstly, the prose is ambiguous: does the last phrase mean "the name of an ancestor of that element" or "the name of an attribute on an ancestor of that element"? I think it must be the latter.

Secondly, there seems to be no code to implement this final provision. Given

<a svg:x="1" xmlns:svg="http://www.w3....">
  <b/>
</a>

the prose says that <b> will have a namespace node for the svg namespace, but the code says that it will not. This only makes a difference, of course, if the resulting tree is serialized with xml-version="1.1", where if we believe the code, there will be a namespace undeclaration on <b>, but if we believe the prose, there will not.

So I'm sympathetic to the idea of making the code non-normative, especially if the above-mentioned ambiguity in the prose is corrected.
Comment 3 Michael Kay 2014-03-26 22:27:30 UTC
Correction. I think I was mistaken that the code is inconsistent with the prose. I forgot about the rules for namespace inheritance. Henry Z is usually right.
Comment 4 C. M. Sperberg-McQueen 2014-03-26 23:19:57 UTC
Note that the normative reference to "Character Model for the World Wide Web 1.0: Normalization," ed. Addison Phillips, Tex Texin, Richard Ishida, et al., also goes to a non-stable document (in this case a non-last-call draft).  The text of the spec refers to this document three times:

1 In section 4 "Phases of Serialization" [1], item 3.d of the list contains a definition of the term "Unicode normalization" which reads in part:

    For specific recommendations for character normalization on the 
    World Wide Web, see [Character Model for the World Wide Web 1.0: 
    Normalization].]

[1] http://www.w3.org/TR/xslt-xquery-serialization-30/#serphases

I think this sentence could if necessary be moved to a note; it expresses no normative requirement.  (I am divided in my mind whether it SHOULD be moved to a note; after some thought, I lean toward saying it should stay where it is, since it's in text clearly marked as the definition of a term.  But we can make it a note if the WG chooses.)

2 In section 5.1.9 XML Output Method: the normalization-form Parameter [2], one of the bullet items in normative text reads:

    NFC specifies the serialized result will be in Normalization 
    Form C, using the rules specified in [Character Model for the 
    World Wide Web 1.0: Normalization].

[2] http://www.w3.org/TR/xslt-xquery-serialization-30/#XML_NORMALIZATION-FORM

I may be missing something, but I don't see any special rules for normalization form C in the Character Model spec that apply to our situation.  At first glance, what the Character Model spec provides that the Unicode definition of NFC does not provide is a set of rules for getting there from legacy encodings.  That can be relevant for a parser, but not for a serializer.

I think the thing to do here is (a) replace the reference to Character Model with a reference to UAX #15, and (b) add a note pointing to Character Model for further information and rules for dealing with legacy encodings. 

3 Again in section 5.1.9, another bullet item reads:

    fully-normalized specifies the serialized result will be in 
    fully normalized text, as specified in [Character Model for 
    the World Wide Web 1.0: Normalization].

The term 'fully normalized' is (as far as I can tell) not standard Unicode terminology, but fortunately (as the note immediately below this passage suggests) it is defined not only by the Character Model spec but also by the XML 1.1 spec.  What we should do is (a) replace the reference to Character Model with a reference to XML 1.1 and (b) add a note pointing to Character Model for further information and motivation.
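
To illustrate item 2 above: Normalization Form C is defined by Unicode itself (UAX #15), so a serializer can produce it without consulting the Character Model draft. The following is only a minimal sketch, assuming Python's standard unicodedata module as a stand-in for any UAX #15 implementation; the helper names to_nfc and is_nfc are hypothetical and appear in neither spec.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (UAX #15)."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """Report whether text is already in Normalization Form C."""
    return to_nfc(text) == text


# "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
decomposed = "e\u0301"
composed = to_nfc(decomposed)
```

The sketch covers NFC only. It says nothing about "fully normalized" text, which (as noted in item 3) has to be taken from XML 1.1 or the Character Model rather than from UAX #15.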
Comment 5 Michael Kay 2014-03-26 23:30:35 UTC
The references for Unicode normalization could be done by referring to the fn:normalize-unicode() function in F+O, which was rewritten to avoid normative reference to the character model spec.
Comment 6 C. M. Sperberg-McQueen 2014-03-27 00:13:53 UTC
(In reply to Michael Kay from comment #3)
> Correction. I think I was mistaken that the code is inconsistent with the
> prose. I forgot about the rules for namespace inheritance. [...]

OK, but now that the issue has come up, you have to explain it to me and we should get it in writing (either in the text of the spec or in a comment in the source explaining why everything is OK).  I had overlooked that part of the prose, and no matter how the ambiguity is resolved, it was not obvious to me at first glance how either the original 3.0 or the revised 2.0 stylesheet avoids losing the namespace declaration for the 'svg' namespace in your example.

A few minutes (or more) of leafing through the serialization spec and the XSLT spec make me conjecture that the reason no code is needed for the case described is the rule in 5.7.3 of XSLT 2.0 (5.8.3 of XSLT 3.0) that says, essentially, that the namespace node for svg cannot be dropped here, because the svg:x attribute is copied out without changing its name.  If some other rationale explains your conclusion, please record it.
Comment 7 C. M. Sperberg-McQueen 2014-03-27 01:23:26 UTC
I've been working through the list of normative references, checking for (a) stability of the target and (b) the nature of the normative dependency.  Some points seem worth recording.

(1) Character model (already dealt with in comment #4); not stable, needs to become a non-normative reference.

(2) Polyglot.  Some references appear non-normative in force and can be left alone (or moved into notes).  The references in section 6 on the XHTML serialization method have normative force; the relevant rules in the Polyglot spec are brief and can be copied into the serialization spec with a note indicating their provenance, and optionally a provision that conforming implementations MAY adjust their behavior to match that specified in any future Recommendation version of the Polyglot spec, with (or without) requiring or recommending a user option to control the behavior. 

(3) RFC 2854.  Only reference is in a note; should be moved to non-normative references.

(4) RFC 3236.  As for RFC 2854.

(5) XML Schema.  Only reference is in a note; should be moved to non-normative references.

(6) XSLT 3.0.  The following references to XSLT 3.0 appear to me to be non-normative in force, so they can be left alone (or moved into notes, or rephrased appropriately):

  - Abstract
  - 1 Intro
  - 9 Character maps (refers reader to XSLT 3.0 for examples)
  - 10 Conformance (mentions XSLT as a possible host language)

The following have normative force, but could easily refer to XSLT 2.0 instead.

  - 9 Character maps (reference to disable-output-escaping as an XSLT feature)

The following have normative force, and it's not immediately obvious to me how to rewrite them to avoid a normative dependency on XSLT 3.0 (it may be possible but not obvious; it may be obvious but not immediately obvious; it may be immediately obvious to any intelligent observer but not to me):

  - 1.1 Terminology

    Where this specification indicates that an XSLT instruction is 
    evaluated, the behavior is as specified by [XSL Transformations 
    (XSLT) Version 3.0].

  - 2 Sequence normalization (reference to XSLT 3.0 deep copy)

  - 3.1 Setting Serialization Parameters by Means of a Data Model Instance 
    (the static context component "Set of available instructions" is 
    defined as including the set of all XSLT 3.0 instructions)

(7) HTML 5.  Some references (I won't list them) have no particular normative force; others seem to me to require a normative reference, or else the replication of constraints from the current version of HTML 5.  Copying constraints will require us to decide what conforming users of Serialization should do when HTML 5 progresses, if it changes any of those rules.  The few references that seem most clearly normative to me are:

  - In 6 XHTML Output Method, HTML 5 is listed as a source of information to use when deciding to recognize an element as an HTML element.

  - In 6.1.4 XHTML Output Method: the indent and suppress-indentation Parameters, HTML 5 is the source of information about which elements are phrasing elements.  7.4.3 HTML Output Method: the indent and suppress-indentation Parameters refers to HTML 5 for the same information.

  - 7.2 Writing Attributes (in the HTML method) refers to HTML 5 as one source of information about which attributes to treat as boolean.

I suppose that in all of these cases, we could change the wording to generically refer to existing and future definitions of HTML, and make it implementation-defined which versions of HTML are binding for purposes of identifying HTML elements, phrasing elements, and boolean attributes.  If we do that, we should also do it for the list of void elements (already copied into the serialization spec, presumably for reasons like those occupying my mind today).
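
As a rough sketch of what "implementation-defined which versions of HTML are binding" might mean for a serializer, consider a lookup keyed by the HTML version the implementation has chosen. Everything here is an assumption made for illustration: the VOID_ELEMENTS table, the version keys "html4" and "html5", and the function is_void_element are hypothetical, and the element sets are deliberately small illustrative subsets, not the lists from any HTML version or from the Serialization spec.

```python
# Illustrative subsets only; NOT the normative lists of any HTML version
# or of the Serialization spec.
VOID_ELEMENTS = {
    "html4": frozenset({"br", "hr", "img", "input", "link", "meta"}),
    "html5": frozenset({"br", "hr", "img", "input", "link", "meta", "wbr"}),
}


def is_void_element(local_name: str, html_version: str) -> bool:
    """Report whether the chosen HTML version treats the element as void.

    html_version stands for the implementation-defined choice of which
    HTML definition is binding; an unknown version raises KeyError.
    """
    return local_name in VOID_ELEMENTS[html_version]
```

The same shape would presumably serve for phrasing elements and boolean attributes: the spec names the category, and the implementation documents which HTML definition populates it.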

It's clear that removing the normative dependencies of Serialization 3.0 is going to involve changes large enough to require WG consideration and approval.  The editors will set about preparing a suitable change proposal.
Comment 8 Michael Kay 2014-03-27 08:17:38 UTC
Well, the XSLT code actually does what most people would expect, and it's only a few people like you and me who would worry about how it manages to achieve this.

Basically, if the stylesheet outputs an attribute in (say) the SVG namespace, then namespace fixup (section 5.8.3 in the 3.0 spec) ensures that the containing element will have a namespace node for the SVG namespace, and namespace inheritance (rule 12 of 5.8.1) ensures that its descendant elements will also have one.

The namespaces are not present in the output in the absence of such an attribute because neither <xsl:element> nor <xsl:copy copy-namespaces="no"> copies namespaces from the source to the result; in both cases the only namespace nodes on the constructed element will be those generated by namespace fixup or namespace inheritance.
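
The interplay of fixup and inheritance described above can be mimicked with a toy model. This is only a sketch under loud assumptions: it is not XSLT and not how any processor is implemented; the Element class and the fix_up function are hypothetical; and it ignores the xml prefix, prefixes on element names, prefix conflicts, and namespace undeclarations. It shows just one thing: a prefix used on an attribute yields a namespace node on that element (fixup) and on its descendants (inheritance).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

SVG = "http://www.w3.org/2000/svg"


@dataclass
class Element:
    local_name: str
    # One (prefix, namespace URI, local name) triple per namespaced attribute.
    attributes: List[Tuple[str, str, str]] = field(default_factory=list)
    children: List["Element"] = field(default_factory=list)
    # Filled in by fix_up: prefix -> namespace URI.
    namespaces: Dict[str, str] = field(default_factory=dict)


def fix_up(element: Element,
           inherited: Optional[Dict[str, str]] = None) -> Element:
    """Assign namespace nodes by toy versions of fixup and inheritance."""
    # Fixup: every prefix used on an attribute name needs a binding.
    bindings = {prefix: uri for prefix, uri, _ in element.attributes}
    # Inheritance: take the parent's bindings for prefixes not bound here.
    for prefix, uri in (inherited or {}).items():
        bindings.setdefault(prefix, uri)
    element.namespaces = bindings
    for child in element.children:
        fix_up(child, bindings)
    return element


# The example from comment 2: <a svg:x="1"><b/></a>
b = Element("b")
a = fix_up(Element("a", attributes=[("svg", SVG, "x")], children=[b]))
```

In this model both a and b end up with a binding for the svg prefix, and removing the svg:x attribute leaves both without one, which matches the behavior described in this comment.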
Comment 9 C. M. Sperberg-McQueen 2014-03-31 15:28:12 UTC
A proposal intended to resolve this issue is now on the server at [1] and [2].  The changes made (and some that have NOT been made) are listed in the abstract.

[1] https://www.w3.org/XML/Group/qtspecs/specifications/xslt-xquery-serialization-30/html/Overview-diff.html
[2] https://www.w3.org/XML/Group/qtspecs/specifications/xslt-xquery-serialization-30/html/Overview.html
(Both are member-only links.)
Comment 10 Jim Melton 2014-04-27 00:13:35 UTC
The 3.0 Serialization REC has been published and the offending normative references were, indeed, removed.

I am therefore marking this bug RESOLVED/FIXED and request that the commenter mark it CLOSED.