This document contains a list of issues regarding the DOM Level 3 Load and Save specification Last Call period. All comments or issues regarding the specification or this document must be reported to email@example.com (public archives) before July 31, 2003. After this date, we don't guarantee to take your comments into account before moving to Candidate Recommendation.
An XML version of this issues list is also available.
|harold1 : DOMInput.certified attribute meaning||agreed||clarification||No reply from reviewer|
|clover1 : Various edits||agreed||editorial||No reply from reviewer|
|clover2 : DOMImplementationLS.createDOMOutput||agreed||request||No reply from reviewer|
|manian2a : DOMParser interface name||agreed||request||No reply from reviewer|
|manian2b : DOMParser and EventTarget||declined||request||No reply from reviewer|
|manian2c : Progress events||agreed||clarification||No reply from reviewer|
|manian2d : DOMParser.config||declined||clarification||No reply from reviewer|
|manian3 : A DOMParser's DOMResourceResolver||agreed||clarification||No reply from reviewer|
|manian4a : DOMSerializer interface name||agreed||request||No reply from reviewer|
|manian4b : DOMSerializer.config||declined||clarification||No reply from reviewer|
|cparpart1 : DOMParser.parseWithContext||agreed||clarification||No reply from reviewer|
|clover3 : unbound-namespace-in-entity||agreed||clarification||No reply from reviewer|
|honkala1 : DOMParserFilter inconsistencies||agreed||clarification||No reply from reviewer|
|honkala2 : DOMSerializerFilter inconsistencies||agreed||clarification||No reply from reviewer|
|honkala3 : Make convenience interfaces mandatory||declined||request||No reply from reviewer|
|cparpart2 : newline handling||declined||clarification||No reply from reviewer|
|xmlcorewg1 : Unicode references||agreed||request||No reply from reviewer|
|xmlcorewg2 : Empty input source||agreed||clarification||No reply from reviewer|
|xmlcorewg3 : DOMParser and CDATA as structure.||agreed||request||No reply from reviewer|
|xmlcorewg4 : DOMParser and "unbound-namespace-in-entity"||agreed||clarification||No reply from reviewer|
|xmlcorewg5 : DOMParser XML namespace reference.||agreed||editorial||No reply from reviewer|
|xmlcorewg6 : Finding data in DOMInput||agreed||clarification||No reply from reviewer|
|xmlcorewg7 : DOMSerializer and node types||declined||request||No reply from reviewer|
|xmlcorewg8 : DOMSerializer.writeURI||declined||request||No reply from reviewer|
|xmlcorewg9 : DOMOutput||agreed||request||No reply from reviewer|
|xmlcorewg10 : DOMOutput relative systemID||declined||request||No reply from reviewer|
|i18n1 : DOMParser character normalization||declined||request||Objection|
|i18n2 : DOMParser check character normalization error||agreed||request||Proposal incorporated|
|i18n3 : DOMSerializer output clarification||declined initial suggestion, agreed on follow-up||request||Proposal incorporated|
|i18n5 : DOMSerializer encoding pseudo attribute handling||agreed||request||No reply from reviewer|
|i18n6 : DOMSerializer.writeURI and encoding||declined||request||Agreement|
|i18n7 : DOMSerializer.writeURI naming||agreed||request||Agreement|
|i18n8 : DOMSerializer UTF8 & UTF16 & byteorders||declined initial suggestion, agreed on follow-up||request||Proposal incorporated|
|i18n9 : DocumentLS.load config parameters||declined||request||Agreement|
|i18n10 : Unicode 4.0||agreed||editorial||Agreement|
The new certified attribute of DOMInput has a far from obvious meaning. I suggest renaming it to "normalized", "certifiedNormalized", or perhaps even "certifiedUnicodeNormalized".
DOMInput > systemId > "relative URI's" : Shouldn't have an apostrophe.
DOMParser > parse : The parameter 'is' should probably be called 'input', for consistency.
DOMSerializer > writeURI : The parameter 'URI' should probably not be in capitals, for consistency.
createDOMOutput method seems to be missing
This interface was called DOMBuilder in the earlier version(s) of the spec. Is there any specific reason why the name was changed to DOMParser? The change to "DOMParser" is confusing to our users since we already have a public class called DOMParser (oracle.xml.parser.v2.DOMParser) and, from a quick Google search, it looks like Xerces might also have one (namely org.apache.xerces.parser.DOMParser). If there is no specific reason for changing the name to DOMParser, we would prefer that the name be changed back to DOMBuilder.
Alternatively, the interface could be changed to DOMParserLS or DOMBuilderLS (consistent with DOMImplementationLS, DocumentLS etc).
"Asynchronous DOMParser objects are expected to also implement the events::EventTarget interface so that event listener can be registered on asynchronous DOMParser objects."
It would be much cleaner and clearer if DOMParser extended the events::EventTarget interface instead of expecting the implementation to extend and support EventTarget. It could be argued that a synchronous DOMParser is not required to implement events::EventTarget and so should not be forced to implement it. In that case, a possible solution is to have a generic DOMParser interface and two other interfaces, namely DOMParserSynchronous and DOMParserAsynchronous, which extend DOMParser. DOMParserAsynchronous could then be made to implement the events::EventTarget interface.
The spec is not very clear about when progress events are fired. The spec should probably either include some scenarios in which a progress event should be fired or include a sentence saying that signaling of progress events is implementation dependent.
The spec has been updated to clearly state that this is implementation dependent. In addition, the spec now includes an example of how an implementation *might* dispatch progress events, but that is only an example.
"In addition to the parameters recognized in DOM Level 3 Core...":
Does this mean that all the parameters listed in the DOMConfiguration interface in DOM Level 3 Core need to be recognized by a DOM LS implementation?
If yes, then some of the parameters are repeated here like "infoset", "namespace" etc. They need to be removed.
If not, it should be explicitly stated which parameters (if any) from DOM Level 3 Core need to be recognized by the DOM LS module.
DOMParser does not have the attribute entity resolver. Therefore, it is not clear how the DOMResourceResolver is associated with the DOMParser.
Is there any specific reason why DOMWriter was changed to DOMSerializer? We would prefer that DOMSerializer be changed to DOMSerializerLS or to DOMWriterLS.
Same as manian2d, but for DOMSerializer.config.
It is not clear how parseWithContext interacts with the DOMBuilderFilter/DOMParserFilter and with the ACTION TYPE passed to parseWithContext itself. Which one takes precedence? Or will the filter be ignored and interpreted as accept?
In DOMSerializer: [["unbound-namespace-in-entity" [warning] Raised if the configuration parameter "entities" is set to true and an unbound namespace prefix is encountered in a referenced entity.]]
Does this mean...
a. prefixes unbound in an entity declaration cause an error (as for DOMParser), or
b. prefixes unbound in an entity declaration cause an error only if they are referenced somewhere in the document, or
c. prefixes unbound in an entity reference cause an error?
The WG found numerous problems with the way this error was defined. The spec now defines an implementation dependent "unbound-prefix-in-entity" warning on DOMParser, and a fatal "unbound-prefix-in-entity-reference" error in DOMSerializer.
1.1 says DOMParserFilter filters only elements, while 1.3 says all kinds of nodes (e.g. attributes and text nodes) can be filtered. Which is right? The preferred answer is that of 1.3. Please fix this in the spec.
1.1 says DOMSerializerFilter can be used to filter out nodes, but 1.3 says that only elements can be filtered. Why doesn't this interface include attributes? An example of a use case: in XForms the 'relevant' attribute can be set to false on an attribute, which removes it from the serialization. Please fix this so that attributes and text nodes can also be filtered out.
Why are these interfaces optional? If the claim is right that they are just convenience methods, they should be trivial to implement. For users it will be a pain to check whether an implementation supports these interfaces. Please fix this by making them mandatory.
Whitespace handling clarification. See email.
In several places (1.2.3, 1.2.4, DOMInput, DOMOutput), it is said that UTF-16 is defined in [Unicode] and Amendment 1 of [ISO/IEC 10646]. That last part is obsolete: UTF-16 was defined in Amd 1 of 10646:1993, but integrated in an Appendix of 10646:2000. Just say "...in [Unicode] and in [ISO/IEC 10646]".
In interface DOMImplementationLS, method createDOMInput(), it says "Create a new empty input source." "Empty" is not defined. Does it mean that all attributes are null?
This comment will also probably apply to createDOMOutput() when the latter is added (see previous comment).
In interface DOMParser, 1st bullet after 3rd para, it is wrong to claim that CDATA sections are structure. It also seems wrong to set expectations that CDATA sections will show up after parsing when in fact parsers are not required to report them.
In interface DOMParser, in the description of the "unbound-namespace-in-entity" warning, how can an unbound prefix be found in an entity *declaration*? Perhaps you mean in an entity's replacement text?
In interface DOMParser, in the description of the "namespaces" parameter, shouldn't there also be a reference to [XML Namespaces 1.1]?
In interface DOMInput, it says "The DOMParser will use the DOMInput object to determine how to read data. The DOMParser will look at the different inputs specified in the DOMInput in the following order to know which one to read from, the first one through which data is available will be used: "
It is not clear how the DOMParser does that, i.e. how it determines if data is available. Is there an expectation that, say, DOMInput.characterStream will be null if data is not available there? What about stringData? Null or empty? Is this binding-specific?
Same comment for DOMOutput.
In interface DOMSerializer, the statement "For all other types of nodes the serialized form is not specified, but should be something useful to a human for debugging or diagnostic purposes." seems a bit weak. It should be possible to specify more, especially for Element nodes.
The WG discussed this but decided not to attempt to clarify this further in the spec. Instead, the WG chose to replace the above sentence with "For all other types of nodes the serialized form is implementation dependent.".
In interface DOMSerializer, method writeURI(), it would be desirable to specify more how to write to a URI, at least for very common schemes such as HTTP(S) and mailto.
In HTTP, it would seem desirable to actually be able to choose which verb (POST or PUT) is used. POST is supposed to be used when posting forms, which XForms does with XML data. PUT is supposed to be used for uploading data, here an XML document. The DOM user should be able to specify which to use, perhaps using an additional parameter to the method.
The spec should also specify to include a Content-Type header with a media type (which? need a parameter to the method?) and a charset parameter.
Same comment for DOMOutput when the systemID ends up being used.
The DOM WG discussed this issue and decided to specify that when writing to an HTTP URI, an HTTP PUT is always performed. For other types of URIs, the mechanism for writing the data to the URI is implementation dependent. The WG did not want to extend the API to let the user specify a content type, though it was decided to make the spec state that the implementation is responsible for associating the appropriate media type with the serialized data. As for charset, use DOMSerializer.write() and specify the charset in the DOMOutput. (DOMSerializer.writeURI() is now simply a convenience method that acts as if write() were called with the URI passed in the DOMOutput argument.)
In interface DOMOutput, the descriptions of encoding and systemID seem to have been more or less copy-pasted from DOMInput, not fully taking into account the fact that output is involved, not input. Setting encoding indicates an intention, not a knowledge of the encoding of some existing data.
In interface DOMOutput, is it not possible to specify the behavior when the systemID is a relative URI? Wouldn't "relative to baseURI of Document" work?
The WG discussed this and if anything, the systemId is relative to the caller's current location, but whether or not that's possible, and what that means, is implementation dependent. Therefore, the spec remains unchanged.
Interface DOMParser: character normalization checking is now controlled by the "check-character-normalization" parameter of DOMConfiguration defined in Core. The fact that the "true" value (do check) is marked as [optional] (not the default, not even required to implement) is not acceptable. Whereas Charmod says that normalization SHOULD be checked, users are not even able to check if the "true" value is not implemented. Furthermore, the DocumentLS.load() and loadXML() methods automatically do the wrong thing and have no way to do the right thing if the default is false.
Users *are* able to check if the "true" value is implemented or not. Using the DOMConfiguration object, a user can call config.canSetParameter("check-character-normalization", true), and that will tell them if the implementation supports character normalization checking. The DOM Level 3 Load and Save (and DOM Level 3 Core) specs do not *require* that implementations *must* support character normalization.
As for DocumentLS, that interface is no longer part of the LS spec.
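The capability check described in the response above can be sketched in Java as follows. This is only an illustration, under two assumptions: it uses the Java binding in the org.w3c.dom.ls package, where the interface this document calls DOMParser is named LSParser, and it relies on DOMImplementationRegistry finding an implementation that supports the "LS" feature. The class and method names (NormalizationCheckProbe, supportsNormalizationChecking) are hypothetical.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

final class NormalizationCheckProbe {
    // Hypothetical helper: asks the parser's DOMConfiguration whether
    // "check-character-normalization" could be set to true. Nothing is
    // actually changed on the parser; canSetParameter only probes.
    static boolean supportsNormalizationChecking() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            // Assumption: the registry finds an implementation with the "LS" feature.
            DOMImplementationLS ls =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            DOMConfiguration config = parser.getDomConfig();
            return config.canSetParameter("check-character-normalization", Boolean.TRUE);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM implementation registry available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("check-character-normalization=true supported: "
            + supportsNormalizationChecking());
    }
}
```

The result depends on the implementation in use, which is exactly the point of the reviewer's objection: a program can detect the missing feature, but cannot rely on it being present.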
This doesn't address our issue that making implementation optional is not acceptable.
For performance reasons, we believe character normalization checking cannot be enabled by default. Also, no one in the WG committed to implementing this feature.
Interface DOMParser: There should be an error type defined for failure to check normalization (sugg. "normalization-checking-failure") in addition to the existing "unknown-character-denormalization".
A non-fatal "check-character-normalization-failure" error was added to the spec.
Please reword to:
Raised if the parameter "check-character-normalization" is set to true and a string is encountered that fails normalization checking.
or something similar.
In the discussion of interface DOMSerializer (above the IDL definition), it would be nice if character references were specified to be hexadecimal (preferred) or decimal. One way or the other determined by the spec, not implementation-dependent. Similarly (still within DOMSerializer), it would be better to specify serialization of attribute values to be always in quotes (or apostrophes, you choose), with escaping as necessary.
The DOM WG discussed this before, and the WG has always decided against doing this. If you want canonicalized output, set the "canonical-form" parameter; if not, you'll get implementation dependent output.
Reluctantly accepted. Given the apparently zero implementation burden of choosing one way or the other in the spec, one wonders why the WG resists this. Of course, the benefit is not great either, but given the rather severe under-specification of serializing anything but Documents and Entities, any amount of predictability would seem desirable...
We would appreciate at least some text encouraging implementers to use hex for character references, since that is what all character encoding standards use.
One of the reasons this request was rejected is that the WG wants existing DOM serializers to be wrappable in an LS serializer without changes to the existing serializer (which may or may not be under the control of whoever is wrapping it in an LSSerializer interface) while still being able to claim compliance (which would not be possible if the existing serializer wrote character references in a way that did not follow what the LS spec requires).
Text encouraging implementers to use hex for character references was inserted.
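For readers unfamiliar with the two forms under discussion: the character U+00E9 can be written as the decimal reference &#233; or as the hexadecimal reference &#xE9;. The sketch below shows the hexadecimal form that the inserted text encourages. It is a hypothetical helper written for this illustration, not part of the Load and Save API, and its choice to escape every non-ASCII character is an assumption made to keep the example small.

```java
final class HexCharRef {
    // Hypothetical helper: replaces every non-ASCII code point with a
    // hexadecimal character reference. ASCII characters are copied as-is,
    // so markup characters such as '&' and '<' are NOT escaped here.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                // Work on code points, not chars, so characters outside the
                // Basic Multilingual Plane yield one reference, not two.
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("caf\u00E9")); // prints caf&#xE9;
    }
}
```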
In DOMSerializer, the contents of the encoding pseudo-attribute of the XML (or text) declaration is underspecified. It should be specified that this MUST be the actual encoding that is used for output, whatever the source that determined that was.
In DOMSerializer, method writeURI(): there is no way to control the encoding that will be used to output. The method itself doesn't have a parameter, and the order of priorities is Document.actualEncoding followed by Document.xmlEncoding. Document.actualEncoding being read-only, the user has no way to specify the output encoding, except if by chance Document.actualEncoding is null. There should be an additional "encoding" parameter (nullable, to fall back to actualEncoding and xmlEncoding) to the method.
DOMSerializer.writeURI() is merely a convenience method (and is now defined as such). If you need to pass encoding information when writing to a URI, use DOMSerializer.write() and set the encoding on the DOMOutput.
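The approach recommended in this response might look as follows in Java. This is a sketch under assumptions, not text from the spec: it uses the org.w3c.dom.ls Java binding, where the interfaces this document calls DOMSerializer and DOMOutput are named LSSerializer and LSOutput, and it writes to an in-memory byte stream rather than to a URI so that it runs without network access. The class name EncodedWrite and the sample document are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

final class EncodedWrite {
    // Serializes a small sample document using the requested output encoding.
    static byte[] serialize(String encoding) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            // Assumption: the registry finds an implementation with the "LS" feature.
            DOMImplementation dom = registry.getDOMImplementation("LS");
            DOMImplementationLS ls = (DOMImplementationLS) dom;

            Document doc = dom.createDocument(null, "greeting", null);
            doc.getDocumentElement().setTextContent("hello");

            // The encoding is stated on the output object, which is how the
            // caller expresses the intended encoding of the serialized data.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            LSOutput output = ls.createLSOutput();
            output.setByteStream(bytes);
            output.setEncoding(encoding);

            LSSerializer serializer = ls.createLSSerializer();
            serializer.write(doc, output);
            return bytes.toByteArray();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM implementation registry available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("UTF-8").length + " bytes written");
    }
}
```

To write to a URI instead, the same output object would carry a system identifier (LSOutput.setSystemId) in place of the byte stream; how the data is then delivered is, as stated above, implementation dependent for most schemes.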
In DOMSerializer, method writeURI(): the name writeURI is a little unfortunate, it seems to imply that a URI is written, not that it is written *to*.
It should be specified that DOMSerializers MUST be able to serialize in UTF-8 and both byte-orders of UTF-16, to close the loop with XML parsers which are obligated to read these.
The DOM WG decided against this; however, it did decide to require that one of those encodings be supported.
Please reconsider this one. It seems to be asking for non-compatibility of code. I think a minimum of one encoding should be required for all implementations, preferably UTF-8; and I really don't think it would be that onerous to require all three.
While this is sufficient for strict interoperability, it is not for compatibility of code. If there is not at least one required encoding, it is not possible to write a DOM program that will work over any DOM implementation. We insist that at least UTF-8 be required. Furthermore, since XML 1.0 did it back in 1998, it cannot be so onerous to require all 3. Please reconsider.
Agreed, the spec now requires that those 3 encodings must be supported when dealing with XML data.
In DocumentLS.load(), it is said that 'the parameters used in the DOMParser interface are assumed to have their default values with the exception that the parameters "entities", "normalize-characters", "check-character-normalization" are set to "false".', which is strange as the last 2 of these parameters do default to false anyway. "check-character-normalization" should default to true (see other comment).
The reference to Unicode 3.0 should be updated to Unicode 4.0, ISBN 0-321-18578-1.
Last update: $Date: 2003/12/17 21:24:15 $