6909 – is a pre-lexical facet magic?

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: is a pre-lexical facet magic?

Status:	CLOSED FIXED

Alias:	None

Product:	XML Schema
Classification:	Unclassified
Component:	Datatypes: XSD Part 2
Version:	1.1 only
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	David Ezell
QA Contact:	XML Schema comments list

URL:
Whiteboard:
Keywords:	resolved

Depends on:
Blocks:

Reported:	2009-05-14 16:10 UTC by Dave Peterson
Modified:	2009-10-10 02:29 UTC
CC List:	2 users

See Also:

Description Dave Peterson 2009-05-14 16:10:50 UTC

We carefully explain that a lexical facet, by constraining the lexical space, may remove values from the value space.  We explain that a value-based facet may remove lexical representations from the lexical space.  But we don't anywhere (that I can find) say that when a pre-lexical facet is used, the lexical space loses those character strings that cannot be obtained by subjecting some character string to the processing that is required by the facet.

E.g., why does the lexical space of normalizedString not contain strings containing carriage returns?  The spec asserts that it does not, but there is no reason given anywhere that makes that loss other than magic.  (Much like in 1.0, the loss of decimal points in the canonical representations of integers was magic.)

Granted, we have explicitly (by magic) ensured that every datatype using a whitespace facet does lose the strings in question, but we have also authorized implementation-defined facets.  Presumably the rules about lexical and value-based facets apply to implementation-defined lexical and value-based facets.  But there is no reason the rule I imputed above should hold for implementation-defined pre-lexical facets.

Shouldn't there be?
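
As an illustration of the normalizedString case, here is a minimal Python sketch of whiteSpace="replace" (tab, line feed, and carriage return each become a space). The function names are hypothetical and do not come from any schema processor. It shows only that a string containing a carriage return is altered before the lexical mapping could ever see it; because this particular transformation is idempotent, "left unchanged" and "obtainable as an output" describe the same set of strings.

```python
# Sketch only: models whiteSpace="replace" from XSD Part 2.
# ws_replace and survives_unchanged are hypothetical names.

def ws_replace(s: str) -> str:
    """Replace each tab (#x9), line feed (#xA), and carriage return (#xD) with a space."""
    return s.translate({0x9: " ", 0xA: " ", 0xD: " "})

def survives_unchanged(s: str) -> bool:
    """True if the pre-lexical transformation leaves s exactly as it was."""
    return ws_replace(s) == s

print(survives_unchanged("a b"))    # True
print(survives_unchanged("a\rb"))   # False: the carriage return is replaced first
```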

Comment 1 David Ezell 2009-06-26 16:00:44 UTC

The WG agrees that we intend the removal of strings from derived datatypes to follow from a property of pre-lexical facets, not from magic.

Comment 2 Dave Peterson 2009-07-06 14:59:12 UTC

Formal proposal:

In 2.3, following the definition of "literal", replace the next two Notes (with the paragraph between remaining unchanged) with:

[new paragraph and changed Note:]
  If a derivation introduces a ·pre-lexical· facet value (a new value
  for whiteSpace or an implementation-defined ·pre-lexical· facet), the
  corresponding ·pre-lexical· transformation of a character string, if
  indeed it changed that string, would prevent that string from ever
  having the ·lexical mapping· of the derived datatype applied to it.
  Character strings that a ·pre-lexical· transformation would change
  are always dropped from the derived datatype's ·lexical space·.

     Note: Systems other than XML schema-validity assessment utilizing
     this specification may or may not implement these transformations.
     If they do not, then input character strings that would have been
     transformed into correct lexical representations, when taken "raw",
     may not be correct ·lexical representations·.

[paragraph unchanged:]
  Should a derivation be made using a derivation mechanism that removes
  ·lexical representations· from the ·lexical space· to the extent that
  one or more values cease to have any ·lexical representation·, then
  those values are dropped from the ·value space·.

[following Note changed:]
     Note: This could happen by means of a pattern or other ·lexical· facet,
     or by a ·pre-lexical· facet as described above.

Comment 3 Dave Peterson 2009-07-20 14:43:37 UTC

(In reply to comment #2)

During its telecon of 17 July, the WG noted that a pre-lexical facet can do more than either leave a candidate character string alone or lose it from the range of the transformation it describes.  It could also "scramble" the incoming character strings: while it changes string S to another string, it also changes a different string to string S, so that string S is neither left alone nor lost from the transformation range.  Such strings should not a priori be removed from the lexical space.  Accordingly, I propose the following amendment, which seems to be a minimal change that avoids this problem.  Only the first ("new") paragraph above is amended:

> [new paragraph and changed Note:]
>   If a derivation introduces a ·pre-lexical· facet value (a new value
>   for whiteSpace or an implementation-defined ·pre-lexical· facet), the
>   corresponding ·pre-lexical· transformation of a character string, if
>   indeed it changed that string, would prevent that string from ever

Change "would" to "could".

>   having the ·lexical mapping· of the derived datatype applied to it.
>   Character strings that a ·pre-lexical· transformation would change

Change "would change" to "blocks in this way (i.e., they are not in the range of the ·pre-lexical· facet's transformation)".

>   are always dropped from the derived datatype's ·lexical space·.

These two changes seem to be the minimum needed.  This is a formal proposal to the WG, but I invite alternative proposals.
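
To make the difference between the two wordings concrete, here is a toy Python sketch over an invented three-string universe. The mapping and all names are hypothetical, chosen only to exhibit "scrambling". It compares the rule from comment #2 (keep only strings the transformation leaves unchanged) with the amended rule (keep the strings in the transformation's range).

```python
# Toy model only: UNIVERSE and SCRAMBLE are invented for illustration.
UNIVERSE = ["A", "B", "C"]
SCRAMBLE = {"A": "B", "B": "A", "C": "A"}  # A and B trade places; C collapses onto A

def transform(s: str) -> str:
    """Hypothetical pre-lexical transformation."""
    return SCRAMBLE[s]

# Comment #2 wording: drop every string the transformation would change.
kept_if_unchanged = {s for s in UNIVERSE if transform(s) == s}

# Comment #3 wording: drop only strings outside the transformation's range.
kept_if_in_range = {transform(s) for s in UNIVERSE}

print(sorted(kept_if_unchanged))  # []
print(sorted(kept_if_in_range))   # ['A', 'B']
```

Under the first rule nothing survives, even though "A" and "B" can both reach the lexical mapping; under the second rule only "C", which no input is ever transformed into, is dropped.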

Comment 4 Michael Kay 2009-07-20 15:22:27 UTC

Seems OK. In Saxon I have an example where I'm using the ability to have vendor-defined pre-lexical transformations to allow decimal numbers to be written in the European format of 1,00. The pre-lexical transformation converts all commas to periods and vice-versa, so the fact that 1,000 is converted to 1.000 does not mean that 1.000 is not part of the lexical space.

The tricky thing about such a transformation is that it needs to be reversible, but since a schema processor never does serialization, that's arguably out of scope for the XSD spec. Unless of course we think the datatype system is trying to support more than just validation.
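
A minimal Python sketch of the kind of transformation described above, assuming it simply swaps every comma and period. This is not Saxon's API; the names are hypothetical. Because the swap is its own inverse, every string is the image of some other string, so no string falls outside the transformation's range.

```python
# Sketch only: swap_separators is a hypothetical stand-in for a
# vendor-defined pre-lexical transformation, not an actual Saxon function.
SWAP = str.maketrans({",": ".", ".": ","})

def swap_separators(s: str) -> str:
    """Convert all commas to periods and all periods to commas."""
    return s.translate(SWAP)

print(swap_separators("1,000"))                   # 1.000
print(swap_separators("1.000"))                   # 1,000
print(swap_separators(swap_separators("1,000")))  # 1,000 (applying it twice restores the input)
```

The last line is the reversibility property mentioned above: applying the same swap to a canonical form yields the string a user would have to write to obtain it.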

Comment 5 David Ezell 2009-07-24 17:04:36 UTC

The WG agreed that the change proposed by Dave P. in comment #3 is a definite improvement.  MSM expressed trepidation about the need for an even clearer resolution.

The WG decided to leave it to the editors to define a better solution; failing that, we will adopt the resolution proposed in comment #3.

Comment 6 C. M. Sperberg-McQueen 2009-10-10 01:47:06 UTC

The change described in comments 2 and 3 is now in the status quo document on the server, so I'm marking this resolved.  

DaveP, if you would do the honors please, to signal your agreement with the change you proposed?

Comment 7 Dave Peterson 2009-10-10 02:29:02 UTC

(In reply to comment #6)

> DaveP, if you would do the honors please, to signal your agreement with the
> change you proposed?

Of course.  Done.