Minutes for XML Schema Face to Face, Chapel Hill, 14-15 October 2002

Mary Holstege, David Ezell, C. M. Sperberg-McQueen

$Id: xml-schema-ftf-minutes.html,v 1.1 2002/10/24 03:12:44 cmsmcq Exp $




Appointment of a Scribe

Done: MH for Monday.

Review of the Agenda

The goal is to be able to leave here with 2e and to get forward traction on 1.1.


DECISION: Approval of the minutes: without objection.


R-137 Substitution groups contain local elements?

Clarification: While we perhaps could argue that this is illegal from other points in the spec, we have been getting lots of questions from Query about this point, so making it more explicit is a good thing.

Clarification: This applies to the members of substitution groups: the head of the substitution group was already clearly not allowed to be a local element.

DECISION: Shall we resolve R-137 by adding an additional point to the element declaration properties correct schema component constraint, text as given by editor? WITHOUT OBJECTION

R-159 Attribute defaults bug

Clarification: The switch from "{value constraint}" to "effective value constraint" means that you look at the attribute use first, and finding nothing there, then look at the attribute declaration. Otherwise you would not pay attention to the value constraint coming from the global attribute declaration.
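Here is a minimal Python sketch of the lookup order described above. The classes `AttributeUse` and `AttributeDeclaration` are hypothetical stand-ins, not names from the spec. The attribute use is consulted first. Only if it carries no value constraint does the constraint on the global attribute declaration apply.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical model: a value constraint is a (kind, value) pair such as
# ("default", "en") or ("fixed", "1.0"); None means no constraint.
ValueConstraint = Optional[Tuple[str, str]]

@dataclass
class AttributeDeclaration:
    name: str
    value_constraint: ValueConstraint = None

@dataclass
class AttributeUse:
    declaration: AttributeDeclaration
    value_constraint: ValueConstraint = None

def effective_value_constraint(use: AttributeUse) -> ValueConstraint:
    """Look at the attribute use first; finding nothing there,
    fall back to the value constraint on the attribute declaration."""
    if use.value_constraint is not None:
        return use.value_constraint
    return use.declaration.value_constraint

decl = AttributeDeclaration("lang", ("default", "en"))
print(effective_value_constraint(AttributeUse(decl)))                   # ('default', 'en')
print(effective_value_constraint(AttributeUse(decl, ("fixed", "fr"))))  # ('fixed', 'fr')
```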

DECISION: Shall we resolve R-159 by changing "{value constraint}" to "effective value constraint" in the definitions of [schema normalized value] and [schema default]? WITHOUT OBJECTION

R-100 Overlapping for UPA

Clarification: local element A can still cause a problem

Correction: s/that/the/

DECISION: Shall we resolve R-100 by changing text in Appendix H as corrected? WITHOUT OBJECTION

R-107 Note on Include

Clarification: This refers to a logical sequence, not a temporal one. There is an order in which things must be done for redefine to work correctly, and this is defined elsewhere in the rec. This also applies to missing subcomponents.

Clarification: It isn't an error to have a missing subcomponent, only to attempt to use it.

Clarification: When we talk about subsequent processing providing a referent: this is within the same schema assessment episode.

Discussion: Many members were very uncomfortable with the language, in that it comes dangerously close to specifying the order in which a processor must perform certain tasks, something we have been careful to avoid. For example, we have already defined away subsequent schemaLocations by saying you have to behave equivalently to having had them all at the beginning. So being able to "later" find a schema component definition isn't right.

Some had found the language less objectionable in a non-normative note about how processors "might" behave, and now find that making it a normative statement of how processors "must" behave is problematic. In particular, it says "how" rather than "what".

Q: Why should this be normative? (The decision record gives no particular rationale.)
A: Because otherwise the behaviour of a certain class of chameleon include is underspecified. But we cannot reconstruct an example right now.

[[At this point Paul Biron joined the meeting]]

HST points to some prose in 3.1.3 that speaks of logico-temporal ordering and suggests that the prose in the note can be made parallel to this text without the objectionable phrase "subsequent processing". He proposes to do that.

DECISION: We ask the editor to revise the text to allay the expressed fears about normative temporal language. WITHOUT OBJECTION.

R-68 Complex types with simple content derived from mixed type?

Clarification: Is the parenthetical remark "(which must be present)" an additional constraint, or is that always the case?
A: Always the case.

Clarification: The non-highlighting of if-then is a stylesheet bug.

Clarification: Rules 1 and 3 used to have parallel text and now they don't. Why is that? Shouldn't they be parallel?

A: We used to have 3 cases:

Now we have:

We have made the minimal change needed to accommodate that.

Q: Got impression from Alexander's comment that there was excess text that ought to be deleted.
A: Didn't find that to be the case.

Q: Looks like a backdoor for deriving simple types to complex types.
A: That's not a new door: yes can restrict away the attributes, always could. This just lets you start with mixed instead. Not actually a simple type, is a complex type with simple content; extensionally equivalent to simple type.

DECISION: Shall we resolve R-68 with the text as given by allowing 5.1.2 to rule? WITHOUT OBJECTION

R-107, revisited

HST has changed "in case subsequent processing provides a referent" to read "in case an appropriately named component becomes available to discharge the reference by the time it is actually needed for validation" which is text culled from 3.1.3.

Quick read from those who expressed prior discomfort is that while there are things one might wish were otherwise, we can live with it.

Clarification: Is "for validation" too limiting? Are there other cases?
A: "for validation" is from 3.1.3, but that doesn't mean it is correct; we could just drop "for validation", and that might even be clearer.

Straw poll shows strong preference for dropping "for validation"

DECISION: Shall we resolve R-107 by accepting the text as amended? WITHOUT OBJECTION
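One way to picture the amended wording is the following sketch. `SchemaComponentTable` and `Reference` are hypothetical names, and the code does not describe how any real processor is built. A reference is recorded without checking its target. A missing component becomes an error only when something actually tries to use it. So the order in which components become available does not matter, as long as they are there when needed.

```python
class SchemaComponentTable:
    """Hypothetical table of named components, keyed by (kind, QName)."""

    def __init__(self):
        self._components = {}

    def add(self, kind, qname, component):
        self._components[(kind, qname)] = component

    def resolve(self, kind, qname):
        # A dangling reference is only an error when it is actually used.
        try:
            return self._components[(kind, qname)]
        except KeyError:
            raise LookupError(f"no {kind} named {qname!r} is available") from None


class Reference:
    """Hypothetical reference: recorded at construction, resolved only on use."""

    def __init__(self, table, kind, qname):
        self.table, self.kind, self.qname = table, kind, qname

    def target(self):
        return self.table.resolve(self.kind, self.qname)


table = SchemaComponentTable()
ref = Reference(table, "type", "po:USAddress")   # no error yet
table.add("type", "po:USAddress", "USAddress")   # becomes available later
print(ref.target())                              # USAddress
```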

Members show some continuing uneasiness about the logical/temporal sequence text, but a willingness to move forward.

R-120 canonical form of date

AM proposes separating 3 things:

  1. value space
  2. whether value space for dates with and without timezones is different
  3. canonical representation (inc. algorithm for how you get there)

He observes that we have not heard a lot of debate about the canonical representation; people seem relatively happy with it. He thinks the value space should be 24-hour intervals starting at 00:00:00, and that the value spaces ought to be separate for dates with and without timezones.

HST: Wasn't aware that any of the proposals included any change in the relationship between dates with and without timezones in the value space, so doesn't believe that is on the table.
AM: OK, fine. Yes, they are separate and there is a partial order between them.
MSM: Do believe that question of how we describe the relationship comes into play, however.

So, with respect to (1) and the value space:

AM asks: can we agree that value space is 24 hour intervals starting at 00:00:00?

Q: What do you mean to convey when you say "starting at 00:00:00"?
A: Two ways to look at it: Our dates only start in UTC (no, not that); or UTC starting at 1:00, 2:00 etc. to show timezone offset from UTC.
Q: So, then that date cannot be said to start at 00:00:00. So I'm confused?
A: Yes, meant 00:00:00 in a particular timezone.

DP: I think AM is proposing same thing that I did. If take 00:00:00 moment given a timezone, is a moment on the timezoned timeline of datetime and we have said what that is, I hope, is not 00:00:00 in Zulu but is in some timezone.
AM: Right.
DP: So think we're all trying to get the same value space, and only is a question of how to describe it, right?
AM: Right.

MSM: I am troubled that some of your proposals seem to countenance points that would not correspond to 00:00:00 in any timezone.
DP: Constraints elsewhere forbid this.

Summary of consensus: we all agree on what the value space should be, but still fail to agree on how to express that.

MSM isn't so sure he agrees. He is concerned about non-timezoned dates. WRT timezoned dates, there are two notions of value space:

  1. intervals on datetime line; so type is logically dependent on datetime
  2. tuples

We could interpret tuple as denoting the interval, but strictly speaking, it doesn't say that.

WRT non-timezoned dates, I think of them as discrete objects, not as intervals on a continuum. It may be important if you want to map non-timezoned dates onto main timeline, but isn't crucial.

DP: True, but speaking of it that way makes it easier to explain and less likely to be misunderstood.
AM: MSM, does a date without a timezone span 24 hours or something larger?
A: Yes, 24+28 = 52 hours.
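MSM's figure of 52 hours follows from the timezone offsets XML Schema allows, which run from -14:00 to +14:00. A date in the +14:00 zone starts 14 hours before UTC midnight. A date in the -14:00 zone ends 14 hours after the following UTC midnight. The Python sketch below checks that arithmetic. It treats a timezoned date as a half-open interval on the UTC timeline, which is the interval view rather than the tuple view. The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Extreme timezone offsets allowed for date/time values: +14:00 and -14:00.
EARLIEST_TZ = timezone(timedelta(hours=14))
LATEST_TZ = timezone(timedelta(hours=-14))

def utc_interval(year, month, day, tz):
    """Half-open UTC interval [start, end) covered by a date in timezone tz."""
    start = datetime(year, month, day, tzinfo=tz)
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)

def untimezoned_span(year, month, day):
    """Span of every instant a non-timezoned date could denote, over all offsets."""
    earliest_start, _ = utc_interval(year, month, day, EARLIEST_TZ)
    _, latest_end = utc_interval(year, month, day, LATEST_TZ)
    return latest_end - earliest_start

print(untimezoned_span(2002, 10, 14))  # prints 2 days, 4:00:00 (52 hours)
```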

NM: Point raised by implementors: turns out SOAP folks were very confused about our date/time types at interop meeting, when explained it, they saw it as hard to implement (specifically because timezoned and untimezoned are mixed in one type), and cannot easily separate them into distinct types, so could you at least give us derived types. Don't know if this affects our thinking, wanted to pass on this information.
AM: Please ask them to read F&O doc where this is spelt out. What we have said is that there are no dates without timezones -- when validate and store, give it a default timezone. When people speak about a date without a timezone, they usually mean "here and now".

This discussion led rapidly to a rathole and the chair called a halt and asked us to move on.

HST: I wrote the proposal the way I did because I did not want to leave anyone in any doubt about what the value space is. I had trouble with the somewhat ineffable referent to "a Gregorian date", so I went as far as I could the other way. I am willing to step out of the way since DP and AM are coming close, but retain a certain amount of discomfort about that approach because I think they will need to explain a lot more that we haven't answered yet.
DP responds: People want to impute functions and relations upon these value spaces that we don't define, and want these functions and operations to work appropriately and consistently; therefore we need human-understandable prose. There would be no difference between decimal and datetime and string otherwise.

MF: If question is a user interface one, perhaps we should ask opinion of F&O task force, who are the clients of this explanation after all.
DP: Questions whether they are the only important audience.

We conduct a straw poll with the clarification that if we go for tuples, we could add the explanatory note that they map to intervals.

There is some small support for intervals, more support for tuples, and a greater number who are unsure or don't care.

Q: What would it take to make you sure?

Some members suspected that a coherent account of partial ordering in the tuple account would get very hairy, leaving us better off with intervals. The counter to this argument was that tuple comparison is well defined and straightforward for any tuples, needed no special rules for dates. Others feared that informal text was apt to cloud the issue more than clarify it, and we should say no more than we need to. The counter to this argument was that we got into this situation because of different inchoate opinions held by various members, which opinions turned out to be horrifying to other members once revealed, so saying too little is a real danger. A couple of members reiterated the request for external input by those who know and care more, also expressing some reservations about putting in place an informal description of the type that the F&O folks would feel they had to reconcile with their own work.

Note: The tuple proposal does not rule out impossible dates such as 30th of February. DP's proposal does, by extra appeal to Gregorian calendar rules. DP expressed concern that HST's proposal allows 30th February and 1st/2nd March and these are separate dates.
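The following Python sketch illustrates both points, assuming that non-timezoned dates are modelled as plain `(year, month, day)` tuples. Ordinary tuple comparison orders them without any date-specific rules. The same comparison also places an impossible 30 February between 28 February and 1 March, which only an explicit Gregorian check excludes. Timezones and the partial order between timezoned and non-timezoned values are deliberately left out.

```python
import calendar

def days_in_month(year: int, month: int) -> int:
    """Days in a month of the proleptic Gregorian calendar."""
    if month == 2:
        return 29 if calendar.isleap(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31

def is_gregorian_date(year: int, month: int, day: int) -> bool:
    """True only for (year, month, day) tuples that name a real calendar day."""
    return 1 <= month <= 12 and 1 <= day <= days_in_month(year, month)

feb28, feb30, mar1 = (2002, 2, 28), (2002, 2, 30), (2002, 3, 1)

# Tuples compare lexicographically (year, then month, then day), so no
# date-specific ordering rules are needed...
print(feb28 < feb30 < mar1)        # True
# ...but nothing in the bare tuple model rules out 30 February.
print(is_gregorian_date(*feb30))   # False
```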

Not done, will come back to errata tomorrow afternoon.

R-54/R-93 et al., aka "the Big Kahuna"


1.1 Proposals

We walked through the requirements document and looked at whether we have any proposals wrt specific requirements.

Localization issues with datatypes

No concrete proposal in view.

Clarification: Does this mean things like , and . as decimal separator?
A: No. Refers to issues raised by I18N in last call.

Unit of length for all primitive types

PB and AM have a concrete proposal: number of characters in URI + number of characters in the NCName.

Q: Are encodings dealt with or not?
A: Characters in the name not in the encoding of the name.
Q: Is proposal somewhere I can find it?
A: No, not yet.
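The proposal was not yet written, so the following is only a guess at the counting rule as stated above. The length is the number of characters in the namespace URI plus the number of characters in the local NCName. Characters are counted as Unicode code points here, not as bytes in some encoding. The function name, and treating an absent namespace as an empty URI, are assumptions.

```python
def qname_length(namespace_uri: str, local_name: str) -> int:
    """Hypothetical QName length: characters in the namespace URI plus
    characters in the local NCName. An absent namespace is passed as ""
    (an assumption). Python's len() counts code points, not encoded bytes."""
    return len(namespace_uri) + len(local_name)

uri, local = "http://example.org/ns", "caf\u00e9"
print(qname_length(uri, local))             # 25 characters
print(len((uri + local).encode("utf-8")))   # 26 bytes: the e-acute takes two
```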

ACTION: PB, AM to provide proposal.

Regex or BNF for all primitive types

Yes, have a proposal from Alexander Falk.

Get link to proposal.

Point of information: Where do we stand on larger proposal to do overall formalization of part 2? Would like to sync up.
Q: Are all our primitive types regular languages?
A: Yes.
Statement: But BNF allows for naming of parts, which may fit better into larger semantic description.
Statement: BNF tends to reify concepts that might not in fact be relevant; regexp just defines the language.

Fundamental facets

HST: Moves to add "and vice versa" -- both value space to canonical form and lexical form to value space.

AM: Confused. I thought this was about redefining order and equality and taking out the other fundamental facets.
A: That is here too.

We have a proposal from DaveP apropos cleaning up the fundamental facets.

For canonical forms (missing for QName): we have a proposal from Xan on the January IG and supposedly one from HST as well but he doesn't remember it.

DP: Also need to define what happens when restrict away canonical form.

ACTION: Ashok, add new candidate requirement wrt restricting away canonical forms.

Rules for generating canonical form: no proposal on the table, but DP volunteers to do it.

ACTION: DP to provide proposal.

Regex/EBNF for lexical forms and canonical forms. This would be part of DP's proposal.

HST: Formally, can we add as desideratum explicit description of mapping from lexical forms to values and from values to canonical forms?
NM would prefer it explicitly say "formal mapping"

DECISION: WITHOUT OBJECTION editors will add that under 3.1.2
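This sketch shows what an explicit pair of mappings could look like for `boolean`. Its lexical space is `true`, `false`, `1` and `0`, and its canonical forms are `true` and `false`. Whitespace handling is omitted. The table-driven style is only one possible presentation, not the form the editors will necessarily use.

```python
# Lexical mapping for xs:boolean: each literal and the value it denotes.
LEXICAL_TO_VALUE = {"true": True, "1": True, "false": False, "0": False}

# Canonical mapping: exactly one lexical form per value.
VALUE_TO_CANONICAL = {True: "true", False: "false"}

def to_value(lexical: str) -> bool:
    """Lexical mapping; raises ValueError for strings outside the lexical space."""
    try:
        return LEXICAL_TO_VALUE[lexical]
    except KeyError:
        raise ValueError(f"{lexical!r} is not a legal xs:boolean literal") from None

def canonical(lexical: str) -> str:
    """Compose the two mappings: lexical form -> value -> canonical form."""
    return VALUE_TO_CANONICAL[to_value(lexical)]

print(canonical("1"))  # true
```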

Clarify status of anySimpleType; define its value space if any.

Q: Does Big Kahuna [R-54/R-93] cover this?
A: No. Doesn't cover the value space.

We have concrete proposal aligning parts 1 and 2 along lines of defining it as the union of all primitive types.

Get link to proposal

Clarify assignment of types to nodes [information-item] in absence of relevant schema components.

Yes, have a proposal.

Get link to proposal.

Distinguish our identity relation from mathematical equality.

Covered in various errata. DP's facets proposal allows us to have a notion of equality that is closer to mathematical equality.

Interactions with legacy types

Clarification: At issue is what happens if you have a list whose itemType is union of ID and token. If it isn't unique do you treat it as token?

We have already clarified in an erratum the distinction for ID etc. between type validity (simple lexical check) and full document validity (uniqueness). So we believe this is no longer a live issue.
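The distinction the erratum draws can be sketched as two separate checks. The sketch assumes a simplified ASCII-only NCName pattern in place of the real production, and it reflects one reading of the erratum. The member type of the union is chosen by the lexical (type validity) check alone. Uniqueness is checked separately, across the whole document. Under this reading a repeated value is still an `ID`, and the result is an invalid document, not a fallback to `token`.

```python
import re

# Simplified, ASCII-only stand-in for the NCName production (an assumption;
# real NCNames allow many more characters).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def union_member(item: str) -> str:
    """Member type chosen for one list item of union(ID, token): the first
    member whose lexical check passes. Uniqueness is not consulted here."""
    return "ID" if NCNAME.fullmatch(item) else "token"

def duplicate_ids(items: list) -> list:
    """Document-level check: ID-typed values that occur more than once."""
    ids = [item for item in items if union_member(item) == "ID"]
    return sorted({item for item in ids if ids.count(item) > 1})

items = "a1 7b a1".split()
print([union_member(i) for i in items])  # ['ID', 'token', 'ID']
print(duplicate_ids(items))              # ['a1']
```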

So proposal is that we could drop this, given that it is no longer an issue.

Clarification: So list of ID is a separate issue. Yes.

3.2.1 Canonical representation of float and double

AM: This is related to the fact that float and double are not exact types. Don't want to go there.
Q: Do we have canonical form for float and double?
A: Yes.

Statement: e.g. 1.0...[1000 0s]01E2 vs 1.0E2 map to same value so what is the canonical form? Current specification does not quite do it. Two ways of thinking about canonical form:

  1. Canonical form := multiple lexical representations for a value, tell you which one to use; maps all legal lexical representations for that one value.
  2. Canonical form := one legal float value stands for an infinite number of real numbers.

Do we have a proposal for clarifying canonical form? Yes, HST gives one -- add "minimum length" to list of current requirements. AM proposes to write something up.
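The following Python sketch shows one possible reading of HST's "minimum length" requirement. It is not AM's proposal, which was still to be written. The sketch takes the shortest decimal digit string that still round-trips to the same double, which is what Python's `repr` produces for floats. It then writes that string with a single non-zero digit before the decimal point, at least one digit after it, and an `E` exponent. Under this reading, both literals in the example above yield `1.0E2`. The special values and the zero case follow the 1.0 canonical rules as we understand them.

```python
from decimal import Decimal

def canonical_double(x: float) -> str:
    """Sketch of a minimum-length canonical form for xs:double (an assumed
    reading of the requirement, not adopted text)."""
    if x != x:
        return "NaN"
    if x in (float("inf"), float("-inf")):
        return "INF" if x > 0 else "-INF"
    if x == 0.0:
        return "0.0E0"  # sign of zero ignored in this sketch
    # repr() gives the shortest digit string that round-trips to x.
    sign, digits, exponent = Decimal(repr(x)).as_tuple()
    digits = list(digits)
    while len(digits) > 1 and digits[-1] == 0:  # drop trailing zeros
        digits.pop()
        exponent += 1
    # The value is int(digits) * 10**exponent; shift to one digit before the point.
    sci_exponent = exponent + len(digits) - 1
    fraction = "".join(map(str, digits[1:])) or "0"
    return f"{'-' if sign else ''}{digits[0]}.{fraction}E{sci_exponent}"

print(canonical_double(1.0e2))                             # 1.0E2
print(canonical_double(float("1." + "0" * 1000 + "1E2")))  # 1.0E2
```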

ACTION: AM to provide the proposal.

scientific notation for decimals

Clarification: No, this is not precision decimal. Could want this independently of precision decimal.

Question about whether we know how to handle all our facets with this type. We do have a proposal.

Get link to proposal.

Negative scale

We have a proposal.

Get link to proposal

Wait, do we really? We don't have a scale facet, so what facet did you have in mind?
Q: totalDigits=5, fractionDigits=-4, want to denote 50000; is "5" legal form?
A: Yes, denotes 5.
A: Yes, denotes 50000.
Q: So do we really have a proposal that we understand?
AM & PB concur: I believe there is a proposal on the table, but am not for it.
MSM: Think this is mostly useful in line with precision decimal.
NM: Mike's scale is not what our fractionDigits is. We do not have positive or negative scale at all.

General sense that this is not an opportunity that anyone in the room is likely to seize. We think there's no coherent proposal here. We have ample evidence from prior discussion that there are many proposals that are inconsistent in their interpretation, and absent a specific proposal with a specific use case, we propose not to burn any more time on it.

Decimal types with trailing zeros

Covered by precision decimal.

We have a proposal by DaveP and a proposal from Microsoft for precision decimal.

Canonical representation of duration

This stems from a desire for duration to have total order.

Q: Don't we have a canonical form?
A: Yes, it just isn't what some folks want.
Q: But on looking at spec don't see anything there.
A: Because there is only one lexical form.
Q: Yeah, but what about leading zeros and such in the integers?


  1. Spec currently does not have a description of canonical form
  2. 24 hours != 1 day

There are several proposals on the table for this, from Michael Kay and Andrew Eisenberg. All amount to using some sort of approximation for days in a month and days in a year. But note that we have talked through such proposals at length in the past, and rejected them for various reasons. We also have a proposal from Microsoft to split the types.

Get links to proposals

PB: HL7's date/time type system subdivides: calendar aligned types vs non-calendar aligned types. In calendar-aligned, 1month not comparable to 30days; in non-calendar aligned use approximations. Very much against just having non-aligned as the only type. Will put proposal on the table for additional type.
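The difficulty PB describes can be seen with the ordering rule from XML Schema 1.0 Part 2. That rule compares two durations by adding each one to four reference dates (1696-09-01, 1697-02-01, 1903-03-01, 1903-07-01). The durations count as ordered only if all four comparisons agree. The Python sketch below simplifies this to durations with only month and day parts, with no hours, timezones, or negative durations. The function names are illustrative.

```python
from datetime import date, timedelta

# Reference dates used by the XML Schema 1.0 duration order relation.
REFERENCE_DATES = (date(1696, 9, 1), date(1697, 2, 1), date(1903, 3, 1), date(1903, 7, 1))

def add_duration(start: date, months: int, days: int) -> date:
    """Add months first, then days. Every reference date is the 1st,
    so the day of the month never overflows when months are added."""
    years, month0 = divmod(start.month - 1 + months, 12)
    return date(start.year + years, month0 + 1, start.day) + timedelta(days=days)

def compare_durations(p, q):
    """p and q are (months, days) pairs. Return -1, 0 or 1 if all four
    reference dates agree, otherwise None (the durations are incomparable)."""
    outcomes = set()
    for ref in REFERENCE_DATES:
        a, b = add_duration(ref, *p), add_duration(ref, *q)
        outcomes.add((a > b) - (a < b))
    return outcomes.pop() if len(outcomes) == 1 else None

print(compare_durations((1, 0), (0, 30)))  # None: P1M vs P30D
print(compare_durations((1, 0), (0, 27)))  # 1: P1M > P27D
```

With this rule P1M and P30D come out incomparable: February 1697 has 28 days and March 1903 has 31.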

ACTION: PB to provide proposal.

date and time localization

Needs to be expanded.

timezone normalization

Believe subsumed by R-120, but regardless, we have proposals.

Get links to proposals.

Ordered duration types

F&O has done something that matches up to SQL types: splitting into two types.

Get link to F&O work.

First class objects

We have a proposal (NUNs).

Localization of structures

Needs to be broken out.

Inline schemas

We have a proposal: do nothing and this will happen once fragment syntax allowable for XML. MSM clarifies that we ought to explain this in a note.

ACTION: AM and CC will draft the note based on content HST has sent in email "several times".

Redo restriction rules

Clarification: "on the language" meaning "on the language accepted by the schema".

HST has proposal in the archives.

Is this the right message to link to?

MSM has proposal as well: implicitly define intersection that you can check by checking both, or being smarter if you can manage it.

NM: We should not consider this in isolation from what we do about co-occurrence constraints.

Point of information: Is determination of the subset possible?
A: Exploiting UPA check can be done with FSAs and is linear in size of FSA; size of FSA blows out with min/maxOccurs. Detailed message in IG archives explains.

Pointless occurrences

Mooted by But do have proposal for this from Microsoft.

Choice:choice rules

Mooted by

Simplify final and block

MF believes he has an old proposal. Will dredge up the link.

ACTION: MF to provide proposal.

Abstract simple types

Have a sketch of a proposal from DaveP for abstract number. We also have a proposal from Microsoft for an abstract binary. Had old proposals; MSM would like this but doesn't have a proposal.

NM: Need to disambiguate: DaveP's and MSM's are quite different things.

AM: Would like detailed proposal quickly; DaveP's proposal is just a sketch

ACTION: DP to provide proposal.

Local references

Statement is a concrete proposal already.

Normalized value for complex/mixed element

HST: Will pull this together as a proposal.

Will need deadline for proposals to be delivered; will discuss this later.

ACTION: HST to provide proposal.

wildcards and substitution groups

Mooted by

Expand wildcard namespace constraints

We have a proposal from Asir and a proposal from Microsoft.

4.4.1 interaction and exclusions and disallowed substitution

MF believes he wrote some stuff up about this a long time ago. Could dredge up and write it up.

ACTION: MF to do so.

4.5.1 Improve named model group syntax and component

AV has a proposal relating to occurrence indicators.

No schema components for selector/field annotations

HST believes this is subsumed by 4.7.1.

Problematic restriction of identity constraints

Proposal: rule out identity constraint definitions on local element declarations.

Identity constraint definition in schema schema component

Proposal: add a property on schema schema component that contains all identity constraint definitions.

co-occurrence constraints

PB will put proposal on the table for subset of ccc's that will not be invalidated by whatever solution in 2.0 for all kinds of ccc's: occurrence-based vs. value-based. i.e. that set supported by Relax-NG, but not Schematron. cf RQ-118 (choice around attrs and model groups). Hung off the type.

NM: Point of information, by keeping UPA we are not quite as expressive as Relax-NG, true.
A: True.

HST: Gloss as "reconstruct conref"

ACTION: PB to provide proposal.

Key constraints based on element types

Q: Why can't we do this today?
A: Don't know what root element is? Need to know in advance all the element names?

What is this issue really? Has to do with: RQ-33, LC-151; cool tricks with IDREF

MSM observes we have this: use key/keyref.

Value-based co-constraints

No proposal here.

Annotations in PSVI

HST will draft proposal that makes {annotations} property on all schema components that pulls in out of band attributes and annotations. Will need to pull in [owner] for attrs. MH expressed some reservations with the details of this approach.

ACTION: HST and MH to provide proposal.

Normalized default value for attrs

Proposal in the requirement.

HST in two minds: this is just one of the attr properties, and doesn't ask for the others. But notes that we don't have a story about repeated validation which depends on this property and putting it in is a good thing.

NM: Argues not now. Not clear that [x] on output should be treated as [x] on input is the right architecture for pipelined processing. Other solutions are possible.

Candidate requirements

DP: Believes he had discovered three errors in our date/time types that must be fixed. Point at recent messages in IG.

Get links to the messages.

ACTION: AM get these items listed in candidate requirements document.

We now considered the unclassified candidate requirements.

Our job here is to classify these (for 1.1) as requirement, desideratum, opportunistic desideratum, or non-goal.

RQ-97 typed wildcards

It was pointed out that we had a proposal of this in the PR that we took out, in part due to problems in the area of UPA and non-determinism. On the other hand, this proposal is different and may not run afoul of those problems. Also, if we went to extensional restriction and required processors to compute subsumption for that we could do this: making processors figure out whether there was overlap is algorithmically very much the same as calculating subsumption.

Clarification: The intention is that typed wildcard would be strictly processed.

Q: Would this require lookahead?
A: Could be.

Classification: opportunistic desideratum

RQ-98 deprecate unused language features

Great concern was expressed about the operationalization of this requirement. Applying the test of deprecating or fixing anything in the test cases list that did not have 3 schema processors agreeing on the correct outcome would not fly. Given 40000 test cases in the test database, 1000s of those cases would fail this standard because many processors forbid comparative evaluations in their EULAs (including ones from the major vendors). Of the processors that have been tested, XSV doesn't even claim to implement simple types, so all cases involving them would necessarily fail this test. Other members further objected on the grounds that doing this would require the dedication of a great deal of resources, in an area where we have historically been unable to provide them.

NIST hopes to get conformance testing to happen, and prevail on vendors to do the testing necessary and report any variances.

NM: Suggest we not tie this to specific test cases, propose we agree to put on to-do list for 2.0 to look for major features of language whose interoperability is proving problematic. Don't believe we even have time to do this for 1.1.
CC: Way too early to do this. Not enough people are doing schemas in the real world to start taking things out of it.

Point of clarification: end user license agreements for some XML Schema processors disallow publishing results on tests.
Q: Then nobody can report bugs??!!
A: Yes.

Concern was expressed about difficulty making an operational requirement from this, concern that it is too early and 2.0 is better time frame, concern about thinness of our database.

Classification: non-goal for 1.1, requirement for 2.0

RQ-99 Multiple substitution group

We have a proposal on the table from HST.

MF stated that there are some problems with this proposal that have to do with UPA and there is absolutely no way to fix it with multiple substitution groups. This claim met with a great deal of skepticism. While it was acknowledged that it is certainly easier to run afoul of the UPA constraint with multiple substitution groups, many members did not see how adding multiple heads changed the facts of the case. MSM cited his experience with TEI using parameter entities in this kind of idiom and noted that while it could be a bitch to work out, he didn't believe there were any logic bombs.

MF: Happy to put this on the table and discuss it; just believe there are severe problems with it. Let it go and if I'm right it dies and if I'm wrong it lives.
HST: Believe it is no different from what we have now. Proposal is simple: from QName to list of QNames and likewise at component level. Should be easy to check. Given that there are doubts about its possibility should not make it a requirement, but would like it to be desideratum.
MSM: Clarification: your position is that if you add multiple substitution groups and get UPA violations the answer is your gun, your foot, your bullet?
A: Yes
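HST's proposal replaces the single head QName with a list of QNames. Below is a rough sketch of what that means for computing group membership. Plain strings stand in for QNames, and the function name is hypothetical. The sketch says nothing about the UPA concerns MF raised.

```python
from collections import defaultdict

def substitution_group_members(heads_of):
    """heads_of maps each element name to the list of heads it names (the
    proposed list-valued property). Returns, for every head, the set of
    elements that may substitute for it, directly or transitively; the head
    itself is not included."""
    direct = defaultdict(set)
    for member, heads in heads_of.items():
        for head in heads:
            direct[head].add(member)

    def collect(head, seen):
        for member in direct.get(head, ()):
            if member not in seen:  # guard against cycles
                seen.add(member)
                collect(member, seen)
        return seen

    return {head: collect(head, set()) for head in list(direct)}

groups = substitution_group_members(
    {"p": ["block"], "note": ["block", "inline"], "footnote": ["note"]}
)
print(sorted(groups["inline"]))  # ['footnote', 'note']
```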

Classification: desideratum.

RQ-100 Canonical form for language

PB: This is misphrased. Because our equality is identity we don't have a way of saying that two values in the value space compare as equal.
HST: Why not restrict value space to all lowercase?
PB: Yes, we could do that.

Clarification: Should EN be mapped to en or declared illegal?
A: EN would be mapped to en, which makes language magic (i.e. primitive).

MSM: Point of order. I believe we dispatched this in Redmond. We talked about case folding and decided to say this is a point where our notions of equality and identity differ from RFC. So I think we have dealt with this. Ashok has written erratum on this. So we decided on the erratum (R-130), but not for 1.1.

Therefore RQ-100 has been overtaken by events.

PB: Another issue wrt the language datatype. XML Core is to issue an erratum that says the value of xml:lang can be the empty string. Our regexp does not allow the language type to match the empty string. So we could again revise the regexp to allow the empty string, or we could change the schema for the XML namespace to make the type of xml:lang the union of language and the empty string.

Q: Should we hand off schema for XML namespace to XML Core?
A: Not yet.

Overwhelming support for changing the legal value of xml:lang rather than changing the legal value for the language type.

ACTION: HST to fix schema for XML namespace at the appropriate moment.

HST: Note, that document has no normative status.
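This is a sketch of the option the meeting preferred: giving xml:lang a type that is the union of `language` and the empty string. It is written in Python rather than schema syntax. The regular expression is assumed to be the revised `language` pattern, `[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*`, not the first-edition one.

```python
import re

# Assumed (revised) pattern for xs:language: a primary subtag of 1-8
# letters, then any number of subtags of 1-8 letters or digits.
LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

def is_language(value: str) -> bool:
    return LANGUAGE.fullmatch(value) is not None

def is_xml_lang(value: str) -> bool:
    """Hypothetical check for xml:lang as union(language, empty string)."""
    return value == "" or is_language(value)

print(is_language(""), is_xml_lang(""))  # False True
```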

RQ-101 PSVI representation of untyped character data

The message pointed at in the requirements list doesn't elucidate what the problem is exactly.

HST: Subsumed by section 4.2.3 opportunistic desiderata for types and type derivation item 3, normalized value for complex/mixed elements.
NM: Relates to a proposal that comes forward at times, that we consider processes that perform type assignment with or without validation and record that in xsi:type attributes, which he considers a bad idea and something we don't want to do.

Need action for someone to get back to Phil Wadler.
NM: No, that isn't who originated this anyway.

101 dies.


Administrative notes, Tuesday 2002-10-15

<Lisa Martin on the phone>


Mary and Asir presented Nuns.

Mary Holstege on basic approach

MH said that the editors' goal was to get some blessing for the basic approach they were taking. They suggest we distinguish two distinct problems, which can be called the left hand and the right hand:

She identified several open questions:

The problem of designation. There are two ways of looking at this:

  1. What you're designating are schema components with regard to a schema.
  2. What you're designating is a class of components with a common name (like 'p', as in "the HTML 'p' element").

How universal does the scope of the name have to be? We need input from Query. There are a lot of different kinds of schema components. Do we need all of these, or should we focus on the "big three" (attributes, elements, types)?

Do we need to distinguish attribute use from Attribute declarations?

How should NUNs behave in the presence of redefinition? What does that do to the component model? Do we need NUNs for both new and old (original and redefined) components?

There are issues with the component model as it is. There are fixes we might make. Can we rely on fixes in the timeframes for various scheduled releases? How should the design of NUNs relate to possible changes to the component model?

Syntax. What should the syntax look like? XPath? Extended QNames? XPointer?

Are we traversing a schema component graph? or a canonicalized dump?

We can distinguish several LHS options:

  1. take XPath through canonical XML dump. Need to define the canonical form, but XPath is well known. The left-hand side resolves to "a document". Downside: whole new schema model. New XPath axes? Anonymous components are a problem.
  2. XPointer scheme (what we've done in the draft). Note that XPointer isn't published. Need an XML resource and a Mime type. It may not be a good fit. Invent an XPointer scheme? Make it look like XPath, but not really?

RHS questions:

  1. What URI is it? a) namespace (what about non-namespaced components?) b) schema location (what if there's no schema document?) c) assembly document.
  2. What is the URI designating? a) resource, b) assembly?

Asir Vedamuthu on the content of the draft

USCI - Universal Schema component identifier. (presentation on W3C Archive http://lists.w3.org/Archives/Member/w3c-archive/2002Oct/0028.html). OOB notes... How do we determine "shortest path" to a component? Mary -- we intentionally went for a "completest" solution.

Comments & clarifications:

HT -- should the predicate be put into the sequence() designator? This doesn't work for "choice"s. How do you identify the branch of a choice... numeric designator won't do. Xan -- does "17th" banana mean in a specific sequence? Matt -- issue is what are the use cases? Noah -- since you "skip particles", you can address min and maxoccurs separately? is there a component with the property (counting) I might want to look at? Mary -- it wasn't clear there was a use for it.

HT -- it's worth noting that attr use and attr decls have props in common, whereas particles are disjoint.

MSM -- string comparison is not good enough, but I believe what you've described would allow string comparison.

Asir -- what's not in there yet: error reporting, namespace binding context, additional XPointer schemes, URI escaping, and the "left-hand side".

Technical discussion

Paul -- the question "what's designated" is a rathole; discussion is going on right now in the TAG. Hopefully we won't go there. We should go there only if we have to, i.e. if the TAG doesn't resolve it. MSM -- if from the structure of our SCDs we can distinguish a component from a namespace identifier, that would be a good thing, e.g. the left-hand side will never be confusable with the right-hand side. Matt -- but the LHS is underspecified. We can get away with saying "the left thing" designates a set of components, and "the right thing" is a traversal method.

MSM -- we're not discussing the left side (other than distinguishing it from the right). HT -- your conclusion follows from Mary's premise that we separate left and right. We should establish a syntax that keeps LHS and RHS designators distinct. Noah -- 2 things. 1) It's important that namespaces participate lightly in our spec; there are questions we can't resolve with regard to namespaces. 2) Web architecture assumes that what follows the '#' is interpreted according to a MIME type. XPointer complies with that. I believe that such is possible. MSM -- the MIME type is associated with the representation served, not with the resource. If you show me a URI, there is no guarantee of a single answer to the question "what is the MIME type?"

Matt -- is lhs out of bounds, or are you saying "take it to email".

Somewhat lost in here: MSM -- had expected that we had general assent to the lhs assumptions, but that's obviously not the case. I think it would be a waste of time to follow that discussion here, since we can discuss the rhs.

MSM -- a striking new thing in this design is that, by using QNames as the arguments for each "pseudo-function", you have to keep redeclaring the namespace. Would NCNames work OK? MSM -- slight modification: I used to think form-default = qualified. Asir -- if you derive from a different namespace, it pulls in things from that namespace (without namespace designations). David -- could we use the "element name default = unqualified" idiom? Noah -- I suggest we not rely on idiomatic usage.


Mary -- target refresher.

1) need sense of direction on approach 2) given list of 28 schema components, which should we designate 3) the left-hand side

General approach

general approach -- WG is happy.

HT -- focus on details of navigation proposal. MIME type issues are a potential rathole. Nothing we do there will affect resolutions in the larger scheme.

Which components need designators?

MSM -- two points of view 1) only the big three or 2) write a lang which can designate any component. I don't want to have to explain why some can be designated, and others can't be. Matt -- problem is that's fine as long as the difficulty of finding "usual" components isn't complicated by the need to find other components not as "usual". MSM -- I thought one characteristic of the proposal was that details don't complicate the main functions. Paul -- the "big three" are what people want to talk about, but the other things are "side paths".

HT -- this seems to be a use case supporting removing local element declarations from content definitions. Given the current component structure, should we make it possible to distinguish between an element called "XY" locally, and an element declaration at a specific point in the CM. Can we encourage "eliding steps" in the designators? Mary -- that's an open issue in the spec already. Noah -- there is an interpretation of the current spec that gets us close to an "element use" construct. It is already the case that there is a particle that goes with each such use. We've got an issue of identified particles already. Asir -- not sure we can resolve today. Kongyi -- why do we have to go through groups to identify components. Asir -- we could end up with duplicate schema component designators. David -- why do we need so much detail in the NUN, at some point you have to read the schema. Matt -- this is a dangerous complex type. MSM -- I'm worried about the desire to elide the distinctions between different local names, and how this affects UPA.

Lefthand side

HT -- identify a set of schema components by either private UUID appropriate for built-in schemas, and urls for all the named schema documents in the composition. Noah -- a key piece to talk clearly about is the media type. We need to tell folks "if you invent a new type, you have to define this". Worried about complexity of the flat uris, but like the fact that they're flat.

Asir -- chameleon include and redefine are problems. A URI or app-dependent name with a MIME type designating a set of schema components. Jim B -- we risk confusing people when we put too much on the LHS. Matt -- agree with Asir, meaning we punt or push on "how do you compose". We don't have to deal with that now. Xan -- agree with Matt. Seems similar to how we construct documents with hints (etc.) Paul -- the first level (designate the "SCHEMA" component) should be a URI. Don't go too deep. Mary -- preserve the carefully crafted space between "namespace" and "schema document". Would like to think of a way of referring to the collection.

MSM -- declare discussion of NUNs done for today. Move to discussion of the test suite and its current state and what we need to decide to move it forward. Ask John Tebitt to discuss this. John sent a note on Sept 24 "Subj: Conformance Test Suite" on WG list.

Conformance testing

John -- I don't actually know about the current suite... what I want to do is get V2 out.

HT -- we folded in the MS contribution (up to 40,000 tests) in Spring of 2002. May proposal for restructuring the test suite was discussed, and how to fulfill the plan agreed to at that time is where John comes in.

John -- we at NIST are ready to go. We need a couple of things from W3C. 1) place to store the tests. 2) mailing lists for discussions.

I sent out a proposed directory structure for the repository. Also methods for submitting both tests and the results of those tests. The reason for the 2 schemas and directory structure is to automate the process of testing. What we hope is that we can get folks to participate on an internal basis. When an implementor posts their results, it naturally leads to a discussion -- implementation at fault, rec at fault, etc. Use a filter up through various review layers. Structure works for XML Core. Requires a few minutes a week of telcon time.

MSM -- you've put 3 things before us: 1) a commitment to review schemas for the test suite and to report results. 2) There's an issue with confidentiality vs. publication of test results: does the submission of results have a property stating that publication is allowed? 3) We need a formal mechanism for handling this situation, i.e. the "filter-up" process, where we need either to issue an erratum or to establish an issue.

Paul -- clarification on confidentiality. Set of test cases for FO were member confidential (W3C) but not WG confidential. I think it should be W3C confidential, not WG confidential. MSM -- I think our web server is set up with access for W3C access, but WG access will be difficult to acc

<mary (NIST) joins by phone> <peter chen arrives>

HT -- I would like to be able to demonstrate to world our commitment to interoperability without hindrance. Paul -- I'd like as much visibility as possible. Mary, can you speak to the releasability of test results for Core? Mary -- Core didn't publish, comments only. FO was published to the membership, sanitized results (no names) were published afterward. MSM -- we have sentiment for making the results as open as possible. We have rough consensus that public access is a desirable goal. Some support for accepting test results with selectable confidentiality. David -- W3C blessing for a system? HT -- no. W3C is worried about that. Chuck -- might be an encouragement to have a tool like that. Noah -- caveats. (not representing IBM, invited expert). I think there are two positions: 1) it's in the spirit of this group not to restrict tests and their use. 2) if we want to go to the next level, we could say something like "you can't claim conformance with these tests, unless you allow others to publish results (of your stuff) against these tests, too." MSM -- John and Mary: are such arrangements common or workable in your experience? Mary -- we don't really have any experience with this. MSM -- do companies advertise test results? Mary -- some do, issue press releases, etc. Chuck -- what did you do with the FIPS testing for SQL? Mary -- yes, we did that with a "stamp of approval".

Volunteers for TF -- Kongyi, Chuck, MSM(chair), John, report in about 3 weeks.

MSM -- on confidentiality, do I take it that we encourage publication of results, but that we'd rather have confidential results than no results?

John -- discussion of the tests and the rec are the most important things.

MSM -- if you had to choose between 1) striving to encourage publication of results, or 2) treating information and discussion of the tests as the most important thing, which would you choose?

Preponderance for #2, but not unanimity. Some reject proposition of both.

Third point: filter-up model. We need a written description of how it works, and want to align this with our errata handling (in process document -- recommendation comments).

Action -- Lisa and John to review the process document for changes required (3 to 4 weeks).


Overall goals and schedule

Goals -- from Redmond f2f.

XML Schema 2e, the end game

MSM -- do we revise the "must have" list? 12 outstanding issues on structures. Some outstanding on datatypes. HT -- I can table drafts for all of these before our next telcon. 6 of these are must haves. Paul -- If I do 6 hours a day on this, I can get 9 or 10 of those left that I consider "really must haves" on the table. Those would be no-brainers for the group to agree to. Asir -- appeal to editors to give us a few days before the call to review the changes. MSM -- do we meet this week? (many can't). We could work on 1.1 requirements. Upshot... no meetings this Thursday and Friday. Reserve the time for prep.

MSM -- if we have errata text for 9 or 10 before next Tuesday, and for the remainder the following Tuesday, we can publish on the first of Nov., and the meeting focus becomes 1.1. Or, bag everything and publish now. WG votes to finish errata by Nov 1, and publish after the AC moratorium (Dec 1?). Editors accept the errata targets (9 to 10) above.


MSM -- our plan was for Mary and Asir to work on a working draft, and publish coming out of this (Oct 14-15) f2f. David -- did we give decisions? Mary -- I'd just as soon go ahead and publish and get feedback. Paul -- have we talked about publication of the WD toward a recommendation vs. a Note? MSM -- no. Paul -- recommend "recommendation". WG -- decision to publish NUNs as a "recommendation" (consensus) when we publish. MSM -- should we publish now? HT -- I'd like to spend a little further work on making sure that we've answered the 3 questions. Maybe we should recognize that we should publish in Dec. (AC blackout). MSM -- I don't see an advantage in publishing it, know approximate direction, and know the issues we're aware of. I see only upside. I don't see a downside. Noah -- another reason to support publishing is that WSDL is starting to work on similar problems, and a WD would make it easier to coordinate discussion. Paul -- I know one side: how close is the current draft to passing pubrules? If we can get through pubrules before Nov 10, I say go ahead.

Amended proposal -- if possible publish before moratorium, or if not possible, do light revision and publish after the moratorium. Matt -- I say we publish what's there now, before or after the moratorium. Future discussion goes in a future draft.

MSM -- formally I'll treat this as an amendment. 1) WG decided -- Matthew's proposal. Publish what's there now. 2) WG decided -- unanimous decision to publish first public WD.

Henry -- to fulfill this requirement, Director or COO must approve.

XML Schema 1.1

MSM -- we discussed in Redmond that we wanted to make last call in January. That doesn't seem feasible. Paul -- first WD by the end of Q1 2003, with no promise about last call (with respect to publishing or being feature complete). Noah -- where are we with respect to the process of determining the scope of 1.1? I think we're still iterating on a feature list. Some people define scope based on time. MSM -- that's exactly what we said in Redmond: the timeline determines the scope. Strong support for making it a schedule-driven release. The WG *did not* decide that it had to be released in Dec. We decided to reconsider after we knew more. Noah -- it seems like a bad idea to reconsider the schedule over and over. What is it about a feature that allows us to recompute the date? MSM -- it's not part of consensus that members agree on "why" the answer to the publication question is what it is. HT -- for my part, I'm comfortable with the suggestion from Paul that our requirements review has given us a reasonable outline for 1.1, and that we can make the progress suggested (publication in Q1). Dave P -- we'd tentatively said that nothing would be done in time for our January meeting without a push for December. We need to set a schedule now to meet the revised goal. Asir -- what about publication of the 1.1 requirements document? Are we ready? MSM -- now we're discussing requirements publication instead of 1.1. Paul -- we need to finish classifying the last few requirements, do some editorial work, and publish it as soon as possible. Suggest we finish classifying the requirements during the next two weeks. Use ballots? MSM -- we can put these remaining requirements in the strawpoll system. MSM -- question for implementors: in Redmond, I recall reps from implementor organizations saying that if we show progress on 1.1 soon, their implementors will stay interested, and that later than that could be problematic (Ashok). I'd like to know if that's still the case. Noah -- not sure whether publishing vague requirements is progress or not.
HT -- a decent requirements document with a one-paragraph summary of every feature (before the end of the year) would be the best thing we can do toward reassuring the developer and requirements community that we're on our way. Propose end of year as target. Paul -- I second that. End of year is a good target. MSM -- what I hear is mostly support for: 1) use the web-based strawpoll system to do preliminary classification of the remaining requirements, discuss, propose; 2) integrate the results into the requirements document; 3) publish the requirements before the end of the year; 4) WD sometime in Q1 2003 (Feb). MSM -- I register my deep concern with this schedule slip. David E. -- I share the concern (with due respect to the editors). And we need to get input from the editors.

<call Lisa M. on the phone> Lisa M. -- the proposed slip is not that big a concern. So I think it will work. MSM -- so I'm hearing rough consensus on the schedule as outlined above (dissent W3C and NACS).

John -- is it possible to skip 1.1 and go on to 2.0? MSM -- my reading of the sense of the WG is that 2.0 will take a while because of expectations. We're chartered to do 1.1 and prep work for 2.0; our charter will expire before we could get 2.0 out.

Rest of schedule

MSM -- things which depend on 1.1 pub must move back (interop, RDF, etc.) FD delayed some.

HT -- apologize for leaving early. The next candidate requirement is the one I care about, and I would like to enlarge it: as part of XML Schema 1.1, a completely determined serialization of the PSVI. Such a requirement would subsume the "validation attempted, validity (102)" requirement. Don't roll over this.

<Mary (proxy MSM), Matthew(proxy David E.), and Henry(proxy MSM) leave> <break> <John left>

MSM -- in light of where we are.... schedule must publish. Paul -- redux... requirements (with reflective paragraphs) by Dec 31, 1.1 by end of Q1. Allen B. -- probably too much of a slip. We need a "syntactic" version by Feb 1. Wired down.

MSM -- list of things to talk about... 1) annotations, probably not today. 2) CR for 1.1 3) use cases 4) what next on 2e 5) errata...

Resume with candidate requirements

RQ-102 validation attempted and validity.

MSM -- discussion? MSM -- reminds us that HT suggested expanding this to a canonical serialization of the PSVI. Asir -- canonical PSVI could be a separate REC. MSM -- my recollection is that the sense of the WG is not to make this a part of 1.1. Allen -- my recollection is that the majority was interested in having it on an independent path. Paul -- as somebody who has to explain how you assemble all these specs, I discourage making the PSVI a different specification, since that makes it harder to explain. Chuck -- how about a parallel track, and bring it back together? MSM -- that's problematic. Paul -- editorially, it can be a separate document, but I don't want to run the risk of the AC not approving a publication track. MSM -- I see this as being on a separate track, feeding into 2.0. MSM -- Henry cares a lot about knowing somehow about the relative success of validity checking. Important for Query. Asir -- Henry's requirement came in a lot earlier. We decided to defer after he posted. Paul -- Michael Kay has suggested for XPath or XSLT (and XQuery) some sort of mechanism to be able to say "match on Foo only if it is of type Bar" to fire template rules. Extend to "fire only if 'valid'". Want to build off what they are going to do anyway. MSM -- as a WG member: anything we can do that folds into XPath 2.0, we should. I should mention that other work going on in W3C is moving people toward the question of infoset augmentation, depending on which kinds of processors have fired in the upstream processing. Henry is proposing that any augmentation be described in XML form (a general model for infoset extensions). For now, I'll observe that there are two proposals: the Feb note (I don't really understand it), and the more general extension Henry proposed before he left, which I agree with. Is there new information? Possibly showing that the XPath 2.0 connection is important. No one here is moving us to move this to "goal" status for 1.1.
WG straw poll: non-goal 4, don't know 1.

RQ-103 typing of nodes with no schema

Noah -- not sure why this is a goal; it seems to be based on a note I sent clarifying something... MSM -- I believe that this did come up in the context of a question from XQuery about what to do about documents for which there's no schema. I think I can assure you that there's no requirement here for us to consider. It is being taken care of in normal inter-WG cooperation. WG decides to classify as not under consideration.

RQ-104 delimiters

Asir -- we discussed this at Edinburgh (Ashok). Paul -- XPath and XQuery have a simple notion of sequence; repeated children in simple types are possible. Microparsing is available in these applications, but it is hard if every application has to recreate the microparser. MSM -- not sure why Query has decided to spend time on this issue... because they think sequences of integers are important; not sure why we should 1) rewrite XML, or 2) implement microparsing. Noah -- I agree with Michael on this one. We have to remember that we've almost used up our budget for feature complexity. I don't think this feature measures up. MSM -- I think we can make a preliminary disposition. Dave P -- this requirement could drive us to an unpleasant place, near where SGML was. WG -- Paul (req), 0 desideratum, 2 oppdes (kognee and lisa), rest non-goal (preponderance).


Asir -- this is an issue + feature request. Anli Shundi (tibco) -- I think he's wrong about that. Paul -- "for compatibility..." means if you use a schema that uses id/idref as types for anything other than attributes, that schema cannot be backtracked into a DTD. MSM -- I think it also expresses a wish on the part of the WG that people use attributes for ids, but the use of "should" instead of "must" is important. Paul -- the other way to get an element with more than one id is to use a datatype "list of id". MSM -- I still think of XML as a subset of SGML. International standards structure has been around longer than we have. Paul -- I remember discussions in the pre-history of schema about the Schema 1.0 requirements. I remember "it was a bad idea for SGML to allow only one id, but because of the code base we should keep the restriction." Not sure that rationale still holds. Allen -- it's an awful idea having only one id. Paul -- HL7 policy is to use "list of ids" as any id. Anli Shundi -- you want to compare ids. MSM -- there's a certain convenience, but I sense we're coming into a discussion of a fundamental web design principle, namely: "you can always learn more...", i.e. if I'm comparing URIs and they're the same, they point to the same resource. If two URIs are different, I can't use that fact to assert identity, nor can I deny that they point to the same thing. WG -- 0 req, des(paul, allen, melany), opdes(4), nong(0), not sure(1). Class - opportunistic desideratum.

RQ-106 QName ambiguous

Dave P -- he's complaining that the QName lexical-to-value-space mapping is dependent on context. Paul -- QNames are context dependent. A way to answer this problem is to come up with a canonical representation of a QName. I suggest "opportunistic desideratum". I think he wants a lexical form for QNames analogous to James Clark's expanded-name form. Noah -- observe that after such munging it's not a QName.
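To make the context dependence concrete: the same lexical QName can map to different expanded names depending on the in-scope namespace bindings. The following is a minimal Python sketch, assuming James Clark's `{namespace-URI}local-name` notation as the context-free form; the function name `clark_name` and the plain-dictionary representation of bindings are hypothetical, for illustration only.

```python
from typing import Mapping


def clark_name(qname: str, bindings: Mapping[str, str]) -> str:
    """Expand a prefixed QName to {namespace-URI}local-name form.

    `bindings` maps prefixes to namespace URIs; this sketch assumes the
    default namespace, if any, is stored under the empty-string key.
    """
    prefix, sep, local = qname.rpartition(":")
    if sep:
        if prefix not in bindings:
            raise ValueError(f"unbound prefix: {prefix!r}")
        uri = bindings[prefix]
    else:
        # Unprefixed name: fall back to the default namespace, if bound.
        uri = bindings.get("", "")
    # Names in no namespace are written without braces in this sketch.
    return f"{{{uri}}}{local}" if uri else local


# The same lexical form "a:item" yields two different expanded names.
print(clark_name("a:item", {"a": "urn:example:one"}))  # {urn:example:one}item
print(clark_name("a:item", {"a": "urn:example:two"}))  # {urn:example:two}item
```

This also illustrates Noah's observation: the braced result is a context-free string, but it is no longer a QName, since `{` and `}` are not legal name characters.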

<Noah departs (3:36)>

MSM -- two candidate reqs.. 1 change lexical form, or 2 (Paul B. suggests) expanded name type. 1) non-goal (for lex form) unanimous. 2) reengineering of this candidate req, to create an expanded name. oppdes(3) non-goal(7). MSM -- like to remark that I'd like a "triple" design, but probably too much for right now.

RQ-107 XML Schema Version

Asir -- here's my interpretation. For a given namespace, I might have multiple schemas. What dan chang is asking for is versioning of a document, and being able to capture it at the component level. Need a metamarker which designates *which* schema version to pick up. David -- too quick and dirty. Paul -- since it's a hint, why not use the schema location hint? Not sure what he's asking for. Chuck -- just guessing, but when I look at this I might want to simply flag exceptions. MSM -- mostly convinced that you can give this kind of hint using schema location. Noah would remind us that versioning is part social and part technical. Allen -- I have a definition. The reason this doesn't work is that a version is a closure. You must have a way to point at the thing being closed. Some SGML database systems actually do this, so there is implementation experience. WG -- req(0) des(0) opdes(0) non-goal(unanimous).

RQ-108 URI datatype distinct from "anyURI" which is a reference

MSM -- '#' at the right makes a reference. URI must be absolute, only references can be relative. Paul -- BNF for URIs has ambiguity WRT "can I write a regex to deal with whitespace" in the URI. The regex is probably pretty complex. No objection to putting it in the rec, but it will take some resources to work out the pattern regex. Suggest opportunistic desideratum. WG -- req(0) des(0) oppdes(8) non-goal(2)

RQ-109 multiple target namespaces in a schema

MSM -- not sure I understand the rationale for having to put schemas for two different namespaces in one XML document. Why not rename our existing element to xs:schema, and allow several such in a document? Paul -- a different alternative (for 2.0): why not allow a different target on each element? But only later. Cognee -- the schema element doesn't have to be a document element; it's a non-issue. Asir -- in theory, this should affect only syntax; it is already present in the model. Lumping lots of target schemas into one document is confusing. MSM -- the answer seems to be that this isn't really an issue. Asir -- what does "redirect" a namespace mean? MSM -- I think it means this: imagine my target namespace is built, and another document says to redefine the schema in that document to target namespace X. Sounds too much like chameleon include, and that's a bad association.

WG -- allow multiple noop(unanimous) WG -- redefine/redirect opdes(1) non-goal(rest) WG -- put in note (yes)

Paul -- proposal for 2.0 is to allow the component to be serialized. MSM -- don't do that. Too controversial.

<break (5 min)>


MSM -- all thumbs down. WG -- non-goal(unanimous)

RQ-111 default on element

David -- the TAG has signalled discomfort with defaults. MSM -- seems like there could be lots of ambiguity. WG -- non-goal(unanimous)

RQ-112 extensible enumerations

MSM -- argue that this is a no-op. David -- I've solved using a union of enumeration and a type. Melanie -- we've solved it with an abstract type and xsi:type.

<Allen leaves>

Melanie -- can't use a union if I want to restrict it. If I want to modify it, I have to create an extension. WG -- req(1), des(0), opdes(4), non-goal(3).

RQ-113 equivalence comparisons on anyURI

MSM -- string compare doesn't work. I think this requirement is subsumed by clarifications we've already specified. I propose this is subsumed by 3.1.2 item 2.

WG -- follow MSMs recommendation.

RQ-114 add anyString datatype as parent of string

David -- is the point CDATA? Anli Shundi -- point is to put encodings on all strings. MSM -- seems like a level violation... asking to put into the schema information about the wire transmission semantics. It goes against the XML idea that Unicode is as far down as the encoding goes. David -- are there two problems here? Maybe there are two questions. MSM -- maybe solving the "database dump problem". XML 1.1 provides for numeric character references in the control code areas. Anli Shundi -- I agree that there can be value here. MSM -- not sold. WG -- opdes(1), non-goal(rest)


Xan -- partial reconstruction of the DTD feature defining the element. Dave P -- DTD doesn't contain DOCTYPE anyway. David E -- you could still use DOCTYPE. MSM -- I think of this as a simple case of a general problem. The general problem is that of writing modular parts of document grammars. "Import" and "Export" are what is needed, and it was turned down. I don't think this is useful, and I have sympathy, but this doesn't seem like the right way to go. WG -- des(1), opdes(1), non-goal(rest)

RQ-116 atomic types as derivation of existing types

Asir -- wants to make new atomic types based on one or more atomic types. Wants composite types (like currency value and currency code). Dave P -- not for 1.1, since if you try this you'd have all sorts of safeguards making sure that combining the 2 requires that you can always parse it back out. WG -- non-goal(unanimous)

RQ-117 allow keys on simple and complex types (rather than on elements)

MSM -- too much for 1.1. Dave P -- lots of support for 2.0. MSM -- the fact that ids attach to elements instead of types is the cause of lots of problems. WG -- non-goal(unanimous) with interest for 2.0


Asir -- wants the content model level to define co-occurrence constraints. MSM -- subsumed by occurrence-based co-occurrence constraints. WG -- subsumed.

RQ-119 allow vendors to extend the XML schema

MSM -- this reopens a question from the design of 1.0, when we said that the only extensions are for attributes and appinfo. Is the ability to have elements from foreign namespaces interspersed a problem? Asir -- WSDL allows extensions all over the place. These cause interoperability issues. Dave P -- one of the big problems is that HTML has lots of vendor-specific extensions, to the point that the big vendors agreed not to do that with XML. Melanie -- we wouldn't want vendors to have the ability to extend.

RQ-120 list and union, inconsistent use of the term derived

MSM -- Ashok has sent this back until we get it right. We use the term derived for *both* extension and restriction. Replace "derived" with "constructed" for lists and unions. WG -- make this editorial desideratum.

R-178 / RQ-121

R-178 (probably RQ-121, copy paste error) use = prohibited and fixed is specified???

MSM -- class this as a rec comment gone astray. Lisa -- because this causes backward compatibility problems, it needs to be on 1.1 list. WG -- requirement(all)

Any other business

MSM -- succeeded in getting preliminary dispositions to all of these. Thanks.

Asir -- comments from OAGIS questions and comments. Came in Aug 21. MSM -- haven't seen it here. Make a note to get them under consideration.

Dave P -- just put several other probable errata into the hopper. Lisa -- is the requirements door closed yet? MSM -- no.

MSM -- no telcons Thursday and Friday this week.

MSM -- move a vote of thanks to Tibco and UNC for being great hosts. The video conferencing has been very helpful.