[Bug 28479] New: [Ser 3.1] Character Maps

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28479

            Bug ID: 28479
           Summary: [Ser 3.1] Character Maps
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Serialization 3.1
          Assignee: cmsmcq@blackmesatech.com
          Reporter: mike@saxonica.com
        QA Contact: public-qt-comments@w3.org

Some changes have occurred in the 3.0 and 3.1 specs regarding character maps,
whose implications do not appear to have been fully thought through.

In 3.0, item-separator was added. There's no clear indication as to whether
character-map substitution is applied before or after insertion of item
separators. The closest we get is:

Character mapping is applied to the characters that actually appear in a text
or attribute node in the instance of the data model, before any other
serialization operations such as escaping or Unicode Normalization are applied.

which I propose we change to:

Character mapping is applied to the characters that actually appear in a text
or attribute node in the instance of the data model, before any other
serialization operations such as sequence normalization, escaping, or Unicode
normalization are applied.
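
To illustrate why the ordering matters, here is a minimal Python sketch. It
is not spec text: the function names, the sample sequence, and the character
map are all hypothetical, and sequence normalization is reduced to joining
strings with the item separator.

```python
def apply_character_map(text, char_map):
    """Replace each character that has an entry in char_map."""
    return "".join(char_map.get(ch, ch) for ch in text)


def map_then_join(items, separator, char_map):
    """Character mapping first, then item-separator insertion."""
    return separator.join(apply_character_map(s, char_map) for s in items)


def join_then_map(items, separator, char_map):
    """Item-separator insertion first, then character mapping."""
    return apply_character_map(separator.join(items), char_map)


items = ["a,b", "c"]      # hypothetical sequence of strings
char_map = {",": ";"}     # hypothetical character map
separator = ","           # hypothetical item-separator value

print(map_then_join(items, separator, char_map))
print(join_then_map(items, separator, char_map))
```

With the first ordering the separator survives untouched; with the second it
is itself subject to the character map, so the two outputs differ.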

In 3.1, character mapping is applied to strings as well as to text and
attribute nodes. This change was presumably intended primarily for JSON, though
it is not clear what the expected use case is.

We say "If a character is mapped, then it is not subjected to XML or HTML
escaping, nor to Unicode Normalization." I would think that for character maps
to be useful with JSON, this should say "XML or HTML or JSON escaping" (indeed,
the rest of the paragraph could be interpreted as implying this).
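
A rough Python sketch of the difference, assuming one possible reading in
which mapped characters bypass JSON escaping. The helper names and the sample
character map are hypothetical, and the escape table is deliberately
incomplete.

```python
# Simplified subset of JSON string escapes (assumption: enough for the demo).
JSON_ESCAPES = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def json_escape_char(ch):
    """Escape a single character for use inside a JSON string literal."""
    if ch in JSON_ESCAPES:
        return JSON_ESCAPES[ch]
    if ord(ch) < 0x20:
        return "\\u%04x" % ord(ch)
    return ch


def serialize_exempting_mapped(value, char_map):
    """Mapped characters are emitted verbatim; all others are JSON-escaped."""
    out = []
    for ch in value:
        if ch in char_map:
            out.append(char_map[ch])          # replacement is not escaped
        else:
            out.append(json_escape_char(ch))
    return '"' + "".join(out) + '"'


def serialize_escaping_everything(value, char_map):
    """Map first, then JSON-escape the whole result, replacements included."""
    mapped = "".join(char_map.get(ch, ch) for ch in value)
    return '"' + "".join(json_escape_char(ch) for ch in mapped) + '"'


char_map = {"/": "\\/"}   # hypothetical map: write "/" as the two characters \/
print(serialize_exempting_mapped('a/b"c', char_map))
print(serialize_escaping_everything('a/b"c', char_map))
```

In the second variant the backslash introduced by the character map is itself
escaped, which defeats the purpose of mapping the solidus in the first place.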

(We actually say thrice that it is not subjected to XML or HTML escaping.
Presumably this is on the theory that "what I say three times is true").

I'm slightly worried that the extension of character maps to apply to strings
causes a backwards incompatibility for the XML and HTML output methods. In XSLT
2.0 it wasn't possible for the XSLT processor to send a string to the
serializer: only XML result trees were sent, which means any string would be
turned into a text node, which would then be subject to character mapping. But
XQuery 3.0 could certainly send a string to an XML or HTML output method, and
if we accept that character mapping was supposed to happen before sequence
normalization, then under 3.0 character maps would not be applied to the
string, whereas under 3.1 they would be.

Finally, for the JSON case, I think it's not quite precise enough to say that
character mapping applies to "strings". We treat miscellaneous data types such
as dates, times, anyURIs and untypedAtomics by conversion to strings: does it
apply to these? Does it apply to the keys in maps as well as the values?

I would also point out that the idea of applying character maps early in the
serialization pipeline, and then treating mapped characters differently from
unmapped characters in later stages of the pipeline, is very messy from an
implementation viewpoint. We're saddled with this for XML and HTML
serialization, but do we really want to do this for JSON?
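
To make the implementation burden concrete, here is a minimal Python sketch
of a pipeline that has to remember which output came from the character map.
The segment representation, the function names, and the order of the later
stages are assumptions of the sketch, not something taken from the spec.

```python
import unicodedata
from xml.sax.saxutils import escape


def map_characters(text, char_map):
    """Early stage: split text into (chunk, was_mapped) segments.

    Adjacent unmapped characters are merged into one run so that later
    stages can process them together.
    """
    segments = []
    for ch in text:
        if ch in char_map:
            segments.append((char_map[ch], True))
        elif segments and not segments[-1][1]:
            segments[-1] = (segments[-1][0] + ch, False)
        else:
            segments.append((ch, False))
    return segments


def normalize_unmapped(segments, form="NFC"):
    """Later stage: Unicode-normalize only the unmapped runs."""
    return [(s if mapped else unicodedata.normalize(form, s), mapped)
            for s, mapped in segments]


def escape_unmapped(segments):
    """Later stage: XML-escape only the unmapped runs."""
    return [(s if mapped else escape(s), mapped) for s, mapped in segments]


def serialize_text(text, char_map):
    """Run all stages and drop the was_mapped flag at the very end."""
    segments = map_characters(text, char_map)
    segments = normalize_unmapped(segments)
    segments = escape_unmapped(segments)
    return "".join(s for s, _ in segments)


# Hypothetical map: no-break space is written as an entity reference.
print(serialize_text("e\u0301 < \u00a0", {"\u00a0": "&nbsp;"}))
```

Every stage after the character map has to carry and respect the was_mapped
flag; a JSON output method specified the same way would need equivalent
bookkeeping.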

Received on Monday, 13 April 2015 09:54:10 UTC