Minutes for the XML Core f2f 2005 Mar 3-4

$Date: 2005/03/07 19:25:37 $

General

The XML Core WG held a face-to-face meeting on Thursday and Friday of the W3C Technical Plenary week that occurred 2005 February 28-March 4 in Boston.

Thursday

xml:id
xml-stylesheet pi type pseudo-attribute
XML futures
Joint meeting with the TAG
    xml:id, C14N
    xml-stylesheet pi type pseudo-attribute
    XML futures including binary XML
    Meaning of namespace name
XML and Namespace errata issues
base URI
XLink

Friday

XInclude errata issues
Canonicalization (briefly)
Document review tasks

Thursday

Attendees

Paul Grosso
Norm Walsh
Richard Tobin
Dmitry Lenkov
Philippe Le Hégaret
Henry Thompson (part time)
Daniel Veillard on the phone
John Cowan on the phone

xml:id

We published our xml:id CR on 2005 February 8.

The only big issue with xml:id is the C14N. We hear that almost everyone uses exclusive C14N which isn't broken, and almost no one uses the (broken) (non-exclusive) C14N, so it shouldn't hurt too much to fool with it. Also the problem only surfaces when working with fragments as opposed to whole documents. The leading contender is to create a C14N 1.1 with a different namespace name. This implies saying that only xml:space and xml:lang inherit and doing something about xml:base. Both DV and Norm note there are possibly other issues to get C14N right which we will discuss later.

We still need to improve the xml:id test suite.

xml-stylesheet PI type attribute

See the Associating Stylesheets Recommendation.

There has been some discussion on the XML CG list.

We generally have consensus (at least until JohnC got on the phone) to use the URI to find the resource, get its media type, and use that media type's defined fragment identifier syntax to handle the fragment identifier and determine the specified subresource; finally the application uses the type attribute to parse the returned (sub)resource. In the case of something without a media type, do whatever you do in that case to attempt to resolve the URI-with-frag-id—this is outside the realm of the xml-stylesheet PI spec. Since the SS PI spec says the semantics of the type attribute is defined by the HTML spec, we need to get them to agree with what we're saying the type attribute does. Once that is the case, all we would need to do is change the SS spec to refer to the latest HTML spec with this new explanation. (Currently the SS spec points to HTML 4.0.)

Perhaps the best way to handle this is to write an erratum that both HTML 4.0 and the SS spec can point to until the HTML spec can be revised, at which point the SS spec can just point to that.

We had gotten to this point when JohnC dialed in. He objects to calling it an erratum, and prefers not to change the current HTML-defined meaning of the type attribute. Norm thinks it isn't very feasible to add another attribute. We'll need to discuss this more with the TAG.

XML 1.2, 2.0, etc.

Philippe says maybe we should talk about XML 1.1 first. Paul asks if this is a technical issue for the WG or a tactical issue for the W3C. On the other hand, the lack of success of XML 1.1 does certainly make it seem pointless to do an XML 1.2 or XML 2.0 that is an incremental change. And the XML Core WG isn't the place for a revolutionary change to XML.

We could write an umbrella spec that points to the other specs in an attempt to “collect” XML, namespace, xml:base, xml:id, etc.

We should review the documents being produced by the XML Binary WG.

Assuming we don't write an XML 2.0 that incorporates binary XML, then we figure the binary XML WG-to-be could write a Recommendation that would allow binary XML within the current stack of specs, e.g., feed into the Infoset.

We believe there is no compelling justification for a new version of XML.

Joint meeting with the TAG

TAG members joining XML Core are: Vincent, Stuart, TimBL, Noah, PaulC, Ed, Roy (starting with the second topic) as well as Henry and Norm with two hats each.

xml:id, C14N

We reviewed the issue with (non-exclusive) C14N which has problems with xml:base and xml:id when working with fragments of documents. We spent most of the time discussing details of how there can be problems with xml:base and non-exclusive C14N. PaulG reported the WG's plan for a C14N 1.1 (which would get used by other specs via a new canonicalization scheme URI). We would plan to continue with xml:id as currently presented by the CR draft.

Later on, Daniel asked for a statement that C14N 1.0 is in error. We re-opened the discussion, but didn't really come up with an answer.

The key open issue seems to be how to handle xml:base.

xml-stylesheet pi type pseudo-attribute

We reviewed the issue and some discussion to date on this topic. Tim explained a scenario about using media types to interpret fragment identifiers and then using namespaces of the retrieved names to determine the language of the returned fragment (TAG issue xmlFunctions-34).

We know that there is much industry practice here that doesn't always correspond to existing standards and specs, yet it needs to be taken into account in any work.

If anything, we expanded the universe around this issue rather than narrowing it.

[As an aside, the TAG may wish to pick up the topic of the use of unregistered MIME types by W3C specs.]

Henry will probably raise some form of this as a TAG issue; the WG plan is to wait for the TAG to take the heat. Paul will ensure we have coordination with the Hypertext CG on this.

XML futures including binary XML

PaulG reports that the XML Core WG believes there is no compelling justification for a new version of XML. We briefly discussed the possibility of producing an umbrella spec. (It would basically be an exercise in “branding”: W3C Preferred XML Suite? Not clear if it would be a Rec or just a Note. Not clear if it would have “conformance” to it or not.) We didn't spend much time on this topic so that we could get to others.

Meaning of namespace name—can one add names to an existing namespace

First, can the XML Core WG add a name to the XML namespace? Henry points out that you can't add a name to the XML namespace because all names already exist, but the WG can define the meaning of any name in the XML namespace that doesn't already have a meaning. John Boyer has asserted that the namespace spec says that a namespace consists of a particular, specified set of names and a URI, and adding another name [which he claims is what is done by defining the meaning for a heretofore unmentioned name] changes that namespace.

We looked at URIs for W3C Namespaces for policies on namespace names.

WGs should state their policy on whether they will add names to a given namespace when they define that namespace. Though not made explicit enough, it is generally agreed that the implicit understanding is that names can be added to the XML namespace, as done in the case of xml:base. No one at this meeting said that the XML Core WG couldn't add xml:id (and such) to the XML namespace.

The XML Core WG should make the policy for adding names to the xml namespace explicit in the namespace document for this namespace—action to Paul and Henry.

XML and Namespace errata issues

Richard suggested we take NS 1.1 and revert the two substantive changes (IRI and undeclared namespaces) to create NS 1.0 2nd Ed. The WG has consensus to do that, and we got approval from the team to do so.

We note that the IRI spec is now finished—RFC 3987—so we have to issue an erratum for NS 1.1 for this.

We had one comment on namespaces effectively requesting a sort of minimal “schema-like” capability. We don't think this is within the scope of the namespace spec or our WG charter.

base URI

Note: The replacement to RFC 2396 is 3986.

Norm asks the question about the absolutivity of [base URI] and suggests we should make an erratum to the infoset spec that says that the [base URI] is always absolute (unless there is no absolute [base URI] at all for the resource). We have CONSENSUS that base URIs are always absolute. We decided that the Infoset references xml:base which references RFC 2396 which makes base URI always absolute, so although some of us prefer making it explicit in the Infoset, we DECIDED not to bother.

Henry explains his view that we need to allow frag ids in [base URI]. However, the last sentence of the first paragraph of section 5.1 of RFC 3986 says “If the base URI is obtained from a URI reference, then that reference must be converted to absolute form and stripped of any fragment component prior to its use as a base URI.”

Since the infoset and xml:base refer to 2396, it's not clear whether the fragment identifier is part of the infoset's [base URI] or not as life stands today. We didn't resolve just what if anything to do about this.
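
As an illustration of the RFC 3986 rule quoted above, the following sketch uses Python's standard urllib.parse module. The example URIs are invented. It shows only how a resolver that follows the RFC treats a fragment on the base; it says nothing about what the infoset's [base URI] property should contain.

```python
from urllib.parse import urldefrag, urljoin

# Invented example: a base URI that carries a fragment identifier.
base_with_fragment = "http://example.org/docs/spec.xml#section2"

# Strip the fragment, as RFC 3986 section 5.1 requires before use as a base.
base, fragment = urldefrag(base_with_fragment)

# Resolving a relative reference gives the same result either way,
# because the fragment of the base plays no part in resolution.
resolved_from_stripped = urljoin(base, "images/fig1.png")
resolved_from_original = urljoin(base_with_fragment, "images/fig1.png")

print(base)                    # the base without its fragment
print(resolved_from_stripped)  # the resolved reference
```

So a fragment kept in [base URI] would be invisible to ordinary relative reference resolution; whether it should be kept for other purposes is the part left unresolved.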

XLink

The Extending XLink 1.0 WG Note (27 January 2005) describes some useful changes that could be incorporated into an XLink 1.1 Specification.

Martin Dürst sent an email thread with some more notes about IRIs and XLink; see also his announcement of the publication of more recent IRI specs.

John Cowan sent email detailing places where we need wording changes with respect to attributes in the XLink namespace. We should also add a note saying that all attributes in the xlink namespace are reserved for use by the XLink family of specs. We should make an explicit statement that the namespace is open to additive changes and therefore we might want to say that attributes in the xlink namespace that are not recognized by the current xlink processor must be ignored.

Norm suggested we should now be able to point to RFC 3987 (IRIs) and remove our wording. However, since 3987 doesn't allow spaces in IRIs, and XLink 1.0 does allow spaces, we either need to say “IRIs with spaces (and other 'unwise' characters)” or we must realize we are making a backward incompatible change.

We had Martin Dürst join us at this point. 3987 does have some wording about what used to be called unwise characters: “Systems accepting IRIs MAY also deal with the printable characters in US-ASCII that are not allowed in URIs.... Protocols and formats that have used earlier definitions of IRIs including these characters MAY require percent-encoding of these characters as a preprocessing step to extract the actual IRI from a given field.” We would therefore put in wording that allows us to say “IRI but note the 'MAY' paragraph” without making values with spaces (and such) invalid.
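
The percent-encoding preprocessing step mentioned in the quoted paragraph could look roughly like the sketch below. The helper name escape_unwise is hypothetical, and the character set (space, angle brackets, double quote, braces, vertical bar, backslash, caret, backquote) is an assumption about which printable US-ASCII characters are meant. Non-ASCII characters are left alone, since mapping them is a separate IRI-to-URI step.

```python
# Printable US-ASCII characters that are not allowed in URIs
# (assumed set: space < > " { } | \ ^ `).
UNWISE = set(' <>"{}|\\^`')


def escape_unwise(value: str) -> str:
    """Percent-encode only the characters in UNWISE (hypothetical helper).

    Everything else, including existing %XX escapes and non-ASCII
    characters, is passed through unchanged.
    """
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in UNWISE else ch for ch in value
    )


# A value that XLink 1.0 allows but that is not an IRI as written.
print(escape_unwise("http://example.org/my doc.xml"))
# prints http://example.org/my%20doc.xml
```

Because the step only rewrites characters that were never legal in an IRI, values without such characters come out unchanged, which is why existing documents would not become invalid.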

We will also need to make parallel changes to XML 1.0 and 1.1 (for system ids). Daniel is concerned; Richard and John point out this is really just editorial—we aren't changing what is legal XML, implementations don't need to change.

NS 1.1 should have an erratum pointing to 3987 but without reference to the “MAY” paragraph.

We might want to consider switching base URI to base IRI in xml:base someday.

Friday

Attendees

Paul Grosso
Norm Walsh
Richard Tobin
Philippe Le Hégaret
Daniel Veillard on the phone

XInclude

PEX1 Fatal XInclude errors in unactivated fallbacks

Are what would otherwise be fatal XInclude errors still fatal when they occur within a fallback that is never processed? ERH suggests it shouldn't be fatal. DV and Richard agree that it shouldn't be fatal and shouldn't need to be reported. Specifically, if one never needs to process the fallback, an XInclude processor shouldn't have read the fallback at all.

Section 4.4 Fallback Behavior explains how to process fallback in the case of resource errors. We had CONSENSUS to add a paragraph to this section saying that there are no fatal errors relating to fallback (neither errors within fallback elements nor errors due to the wrong number of fallback elements) unless there is a resource error with the xinclude element surrounding the fallback element(s). We will also add something after the third sentence of section 3.2 to this effect.

Both Richard's and ERH's implementations test for the wrong number of fallback elements and treat it as a fatal error even when fallback is not used, but neither inspects the contents of the fallback element if no fallback processing occurs.
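
A minimal sketch of the consensus behavior follows, using Python's xml.etree.ElementTree. The names resolve_include, ResourceError, FatalError and the caller-supplied fetch function are all hypothetical; this is not how either implementation mentioned above is written. The point is only that fallback children are looked at after a resource error and not before.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"


class ResourceError(Exception):
    """The referenced resource could not be retrieved."""


class FatalError(Exception):
    """Processing must stop."""


def resolve_include(include, fetch):
    """Return fetched content, or the xi:fallback element on a resource error.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that returns the included content or raises ResourceError.
    """
    try:
        return fetch(include.get("href"))
    except ResourceError:
        # Fallback children are examined only now, after a resource error.
        fallbacks = include.findall(XI + "fallback")
        if len(fallbacks) != 1:
            raise FatalError("expected exactly one xi:fallback")
        return fallbacks[0]


# An include with two fallback children: wrong, but only if fallback is used.
include = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="a.xml">'
    "<xi:fallback/><xi:fallback/></xi:include>"
)


def fetch_ok(href):
    return "content of " + href


def fetch_missing(href):
    raise ResourceError(href)
```

With fetch_ok the duplicate fallback is never noticed; with fetch_missing the same element raises FatalError. An implementation that checks the fallback count eagerly, as the two mentioned above do, would reject both cases.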

PEX2 URI or IRI errors handling

We have already said that not reporting invalid IRIs as a fatal error is okay because some implementations could not distinguish such from resource errors. What if the implementations don't catch an invalid IRI as an error at all?

We don't expect implementors of XInclude to implement IRI processing, so whatever ends up happening with respect to IRIs isn't really the fault of the XInclude processor. However, we don't want to license such behavior, so we don't want to change our current wording here. [Paul doesn't really understand how to explain this, so someone else will have to write the proposed response here.]

PEX3 What is an error

The question is about (non-fatal) errors, specifically, from the last bit of section 2:

XInclude processors must stop processing when encountering errors other than resource errors, which must be handled as described in 4.4 Fallback Behavior.

There is no such thing as a non-fatal non-resource error. We will replace the above sentence with:

XInclude processors must stop processing when encountering a fatal error. Resource errors must be handled as described in 4.4 Fallback Behavior.

Also, at the end of the intro to 4.5, change “and does not produce an inclusion loop error” to “and does not produce an inclusion loop.”

As for the accept="foo" case (mentioned in the comment), the definition of the accept attribute in section 3.1 does not make this an XInclude error of any kind, though it may cause an error in a lower level that might result in a resource error, for example. Those present didn't seem quite satisfied with this resolution, so after some discussion, we decided to revisit this later.

PEX4 The encoding attr should trigger Accept-Charset

This comment is made obsolete by the fact that we have removed accept-charset from the final Rec.

PEX5 XML encoding detection in parse="text"

We had CONSENSUS to skip this comment for now.

PEX6 parse="text" and Byte-Order Mark

We explain how to determine the encoding in section 4.3 Included Items when parse="text" in a four point list where the last point simply concludes it is UTF-8. What happens in that case if the resource starts with U+FEFF? Should this be considered a byte order mark or a character?

We at first figured that we should treat the U+FEFF as a character in this case. But then we had second thoughts, so we plan to ask Martin. Later, we noticed 3629 was written by François, so we will ask him too.
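
The two readings can be shown with Python's built-in codecs, used here purely as a stand-in for an XInclude processor's decoder. The "utf-8" codec keeps a leading U+FEFF as a character, while "utf-8-sig" treats it as a signature and drops it. The sample bytes are invented.

```python
# U+FEFF encoded in UTF-8 (EF BB BF), followed by the text "hello".
data = b"\xef\xbb\xbfhello"

# Reading 1: U+FEFF is an ordinary character and stays in the included text.
as_character = data.decode("utf-8")

# Reading 2: U+FEFF is a byte order mark and is removed before inclusion.
as_bom = data.decode("utf-8-sig")

print(len(as_character), len(as_bom))  # prints 6 5
```

Whichever reading is chosen, the included text differs by exactly one character, so the choice is visible to anything downstream that compares or counts characters.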

PEX7 Illegal value in encoding attribute

Similar situation to PEX2 and to the accept="foo" part of PEX3. We don't want to require XInclude processors to have to error check the values of the encoding attribute. Norm suggests we change the wording for the encoding attribute in section 3.1 to change the sentence “The value of this attribute is an EncName as defined in XML specification, section 4.3.3, rule [81].” to say something like “The value of this attribute SHOULD be a valid encoding name.” So an invalid encoding attribute isn't a fatal xinclude error, though it might cause a resource error at some point.
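
The difference between the current wording and the suggested one can be sketched as two separate checks. The regular expression transcribes the EncName production; the second check uses Python's codec registry as a stand-in for whatever encodings a given processor supports. Both function names are hypothetical.

```python
import codecs
import re

# EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*   (XML 1.0, production [81])
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_enc_name(value: str) -> bool:
    """Syntactic check only: does the value match the EncName production?"""
    return ENC_NAME.fullmatch(value) is not None


def is_known_encoding(value: str) -> bool:
    """Does this Python installation have a codec registered under the name?"""
    try:
        codecs.lookup(value)
    except LookupError:
        return False
    return True


for name in ("UTF-8", "8859-1", "no-such-charset"):
    print(name, is_enc_name(name), is_known_encoding(name))
```

A name such as "no-such-charset" passes the syntactic check but cannot be decoded, which is the kind of problem that would surface later as a resource error and not as a fatal XInclude error.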

PEX8 encoding attribute priority

This comment was against the PR draft. We believe we've now made this clear in section 4.3 of the Rec.

PEX9 encoding attribute content

Same issue/resolution as PEX7.

PEX10 accept-* attributes improperly defined

First, the comment suggests we misstate what RFC 2616 says, so we need to investigate that.

Next, we need to decide what to do about error checking the value of the accept-* attributes. We are pretty sure we want to avoid linefeeds in these attribute values, and we are pretty sure we don't want to require complete validation of these values, so we appear to be leaning toward doing some checking based on character ranges. Per 2616, section 2.2, the correct character range includes (among other things) linefeeds which we don't want to allow for security reasons. Therefore, we want to be more restrictive than required by 2616. So, we plan to change:

Values containing characters outside the range #x20 through #x7E are disallowed in HTTP headers, and must be flagged as fatal errors.

instead to be worded as follows:

Values containing characters outside the range #x20 through #x7E must be flagged as fatal errors.
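
The character-range check in the revised wording is small enough to sketch directly. The function name has_disallowed_char and the sample values are made up for illustration; the range itself is the one given above.

```python
def has_disallowed_char(value: str) -> bool:
    """True if any character falls outside #x20 through #x7E."""
    return any(not (0x20 <= ord(ch) <= 0x7E) for ch in value)


# An ordinary accept value passes the check.
print(has_disallowed_char("text/html, application/xml;q=0.9"))  # prints False

# A value carrying CR LF, which could smuggle in an extra header line, fails.
print(has_disallowed_char("text/html\r\nX-Injected: 1"))  # prints True
```

The check rejects linefeeds and other control characters without attempting to validate the value against the full header grammar, which matches the intent of checking by character range only.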

PEX11 Reference to RFC 2279 should be RFC 3629

This is the RFC for UTF-8. We note that 2279 is listed in the normative references but there is actually no reference to it in the document. We will ask François for his opinion of how to handle this. Note that 3629 has a section about the BOM.

PEX12 getElementById

We sympathize. Unfortunately, we can't change the DOM, but at least some of these problems are said to be fixed in DOM Level 3.

PEX13 Normalize newlines when parse="text"?

The question is “When processing an xi:include element with parse="text", must newlines in the included document be normalized to LF?”

Our answer is “no”. Such normalization would be part of XML processing, not text processing.
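
To make the distinction concrete, the sketch below contrasts the line-end handling an XML parser applies (XML 1.0 section 2.11: CRLF and lone CR both become LF) with text inclusion, where by the answer above the characters pass through as they are. The function name and sample string are invented.

```python
def xml_normalize_newlines(text: str) -> str:
    """Line-end handling done by an XML parser: CRLF and lone CR become LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "line one\r\nline two\rline three\n"

# parse="text": no XML processing, so the characters are included unchanged.
included_as_text = raw

# What the same characters would look like after XML line-end handling.
included_as_xml = xml_normalize_newlines(raw)

print(repr(included_as_text))
print(repr(included_as_xml))
```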

Canonicalization

Canonical XML Version 1.0

We had several discussions about C14N earlier during this f2f, one among ourselves and another with the TAG.

Non-exclusive C14N has a design flaw as it stands, but to conform to it, implementations must reproduce the flaw and possibly put xml:id on several elements, thereby making non-unique and/or otherwise incorrect id assignments.

The WG plans to create a C14N 1.1 that fixes the flaws.
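
To make the flaw concrete, here is a minimal sketch in Python of the inheritance step only; it is not a canonicalizer. It assumes the behavior discussed above, in which the top element of a document fragment picks up the nearest value of each attribute in the xml namespace from its omitted ancestors. The function name inherited_xml_attrs and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace name.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def inherited_xml_attrs(root, target):
    """Collect the xml:* attributes a fragment rooted at `target` would
    carry if every xml:* attribute on an ancestor were inherited
    (hypothetical helper; models only the inheritance step)."""
    parents = {child: parent for parent in root.iter() for child in parent}
    chain = []
    node = target
    while node is not None:
        chain.append(node)
        node = parents.get(node)
    result = {}
    # Walk from the outermost ancestor inward so the nearest value wins.
    for element in reversed(chain):
        for name, value in element.attrib.items():
            if name.startswith(XML_NS):
                result[name[len(XML_NS):]] = value
    return result


doc = ET.fromstring(
    '<doc xml:id="d1" xml:lang="en"><sec><p>text</p></sec></doc>'
)
p = doc.find("./sec/p")
attrs = inherited_xml_attrs(doc, p)
print(attrs)
```

Here the p element ends up carrying the xml:id of doc, which is the incorrect id assignment described above. Inheriting only xml:lang and xml:space, as proposed for C14N 1.1, would leave the id behind.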

Document reviews

The WG needs to review the Binary XML documents: Paul will look at them, and we need another volunteer (Norm is reviewing them already for the TAG).

Richard has been volunteered to review the XPath 2.0/XQuery 1.0 Data Model document, to be published as a LC WD in late March or early April (though there are no plans to make any changes to it at this time).

Henry Thompson <ht@w3.org>, staff contact
Paul Grosso <paul@arbortext.com>, co-chair and
Norman Walsh <Norman.Walsh@Sun.com>, co-chair
$Revision: 1.5 $ by $Author: PaulGrosso $