This document contains the issues raised by comments on XML Schema during its Last-Call period. Officially, the Last-Call comment period began 7 April 2000 and ended 12 May 2000; this document does not in general contain issues raised earlier or later (though there are some exceptions). In its current form it has been prepared by Michael Sperberg-McQueen.
The process by which the XML Schema WG plans to handle these issues is described at http://www.w3.org/2000/04/24-xmlschema-lcprocess.html.
Material reproduced from comments has been marked up, and obvious typos have been corrected. Postings and documents which raise several issues have been silently divided among several issues. To read the original postings, consult the archive of the comments list.
Commentators have been requested to consult the records for the issues they have raised, and check to make sure we have correctly understood and paraphrased the issue. (Note that in a few cases the paraphrase poses a slightly broader question than the commentator appears to have had in mind.)
In addition to the postings to the XML Schema comments list, some postings to the XML Schema Interest-Group mailing list have been included here; this list is W3C-internal and only those with member access to the W3C web site will be able to follow the relevant hyperlinks. Where we have received permission to quote the original posting in this public document, we have done so; in other cases, a paraphrase enclosed in square brackets has been supplied. Links to member-only material included in postings to the public list have been left intact in the interests of completeness (for those who do have member access) and simplicity (for those maintaining this document).
In addition, the following documents have been consulted:
Num | Cl | Cluster | Locus | Originator | Responsible | Description |
---|---|---|---|---|---|---|
LC-1 | C | 04 datetime | datatypes 3.3.22 | Martin Bryan, Paul Cotton | editor | Specify date/time validity better? |
LC-2 | D | 23 constructors | datatypes | Curt Arnold | Martin Gudgin | Conjunction types? |
LC-3 | C | 01 facets | datatypes 2.4.1.2 | Paul Cotton | editor | Why allow divergent order relations? |
LC-4 | C | 01 facets | datatypes 2.4.2.5 | Paul Cotton | editor | Enumerations should inherit ordering from underlying type |
LC-5 | C | 01 facets | datatypes 3.2.1 | Paul Cotton | editor | Ordering information is missing |
LC-6 | D | 15 strings | datatypes 3.2.1 | Paul Cotton | Jonathan Robie | Do strings need collation sequence? |
LC-7 | D | 26 numeric | datatypes 3.2.5 | Paul Cotton | Jonathan Robie | Arbitrary-precision decimal too much? |
LC-8 | C | 04 numeric | datatypes 3.3.9 | Paul Cotton | editor | Integers should not allow non-significant leading or trailing zeroes |
LC-9 | A | strings | datatypes | Peter Canning | Biron, Sperberg-McQueen | How do I restrict strings to ASCII or Latin 1? |
LC-10 | D | 28 keys | structures | Mary F. Fernandez | David Beech, Ashok Malhotra | Clarify the exposition of identity-constraint tables |
LC-11 | C | 04 datetime | datatypes | Aram Airapetian | editor | Date and time period |
LC-12 | A | 19 modules | structures | Aaron M. Cohen | Dan Connolly | How can a module creator make attributes available to other modules? |
LC-13 | A | 19 modules | structures | Aaron M. Cohen | Sperberg-McQueen | How can a module creator add things to content models in other modules? |
LC-14 | D | 15 xpath | datatypes | Curt Arnold | Don Box | Define an XPath type |
LC-15 | A | 14 attldecl | structures | Curt Arnold | Sperberg-McQueen | Allow hints for initial value of an attribute? |
LC-16 | D | 24 content-models | structures | Martin J. Duerst | Matt Timmermans | Allow arbitrary order with occurrence > 1? |
LC-17 | C | regex | datatypes | TAMURA Kent, Alexander Falk | editor | Give BNF for regular-expression language? |
LC-18 | C | regex | datatypes | TAMURA Kent | editor | Clarify character-set subtraction? |
LC-19 | C | regex | datatypes | TAMURA Kent | editor | Make - unambiguous in regexes? |
LC-20 | C | regex | datatypes | TAMURA Kent | editor | Clean up definition of multi-character escape? |
LC-21 | D | 27 i18n-datetime | datatypes | David RR Webber | Sperberg-McQueen | Allow non-gregorian dates? |
LC-22 | C | publication | both | Murray Altheim | editor | Where's the glossary? |
LC-23 | D | 21 sfs | both A | Alexander Falk | Dan Connolly | Use 2000 not 1999 in XML Schema namespace name? |
LC-24 | C | ed-str | structures G | Alexander Falk | editor | Improve or drop tabulation of changes in Structures? |
LC-25 | C | 01 facets | datatypes 3.2.2.2 | Alexander Falk | editor | Why have pattern facet for Boolean? |
LC-26 | C | binary | datatypes 3.2.8.1 | Alexander Falk | editor | Drop pattern from binary? |
LC-27 | D | 27 i18n-numeric | datatypes 3.2.3 - 3.2.5 | Alexander Falk, Dario De Judicibus | Sperberg-McQueen | Allow multiple lexical spaces for floats? |
LC-28 | C | 01 facets | datatypes 3.3 | Alexander Falk | editor | Don't list fixed-value facets? |
LC-29 | C | 04 datetime | datatypes | Alexander Falk | editor | Fix lexical form for recurringDay? |
LC-30 | C | publication | datatypes A | Alexander Falk | editor | Where is xml.xsd? |
LC-31 | C | publication | both | Alexander Falk | editor | Provide archive of spec, DTDs, stylesheets, and XSDs? |
LC-32 | D | 15 regex | datatypes | Alexander Falk | Dave Peterson | Add shorthands to regex syntax? |
LC-33 | A | regex | datatypes | Alexander Falk | Matt Timmermans | Why is {0,0} there? |
LC-34 | C | regex | datatypes | Alexander Falk | Paul Biron | Define single-character escape for vertical bar? |
LC-35 | C | typos | primer | David Wang | editor | Typo in example in Primer section 5? |
LC-36 | B | process | requirements | David RR Webber | Dave Hollander, Michael Sperberg-McQueen | Reconsider project requirements and design? |
LC-37 | A | 20 keys | both | Ani Pedersen | Noah Mendelsohn | Multi-field keys? |
LC-38 | C | 13 occurs | primer | Nick K. Aghazarian | editor | Fix defaulting text for maxOccurs? |
LC-39 | C | typos | primer 4.7 | Peter A. Berggren | editor | Typo: for finalDefault read blockDefault? |
LC-40 | C | 20 keys | both | Henry Thompson | editor | xsi:null and keyref |
LC-41 | C | typos | datatypes 3.3.22 | Curt Arnold | editor | Typos in datatypes? |
LC-42 | C | typos | primer 3.1 | Adrian Robert | editor | Typos in Primer Section 3.1? |
LC-43 | A | 02 enums | both | Martin Bryan | Martin Gudgin | Defining lists of permitted attribute values |
LC-44 | A | dt-queries | datatypes | Martin Bryan | Ashok Malhotra | Questions relating to data types |
LC-45 | A | 21 ns | structures | gmacri@libero.it | Henry Thompson | Questions |
LC-46 | B | 11 infoset | structures | Mikael Ståldal | chairs | Remove default values? |
LC-47 | C | typos | structures | Gregor Meyer | editor | Typo in example? |
LC-48 | C | sfs | structures | Curt Arnold | editor | Fix declaration of simpleType element? |
LC-49 | D | 12 content-models | structures | Curt Arnold | MSM, Matt Fuchs | Streamline restriction of content models? |
LC-50 | C | sfs | structures | Curt Arnold | editor | Suppress multiple local declarations of attribute element? |
LC-51 | D | 12 content-models | structures | Curt Arnold | Don Box, Dan Connolly | Can XML Schema define XSLT? |
LC-52 | D | 13 occurs | structures 4.3.3 | Curt Arnold | Don Box | Clarify minOccur/maxOccur defaulting? |
LC-53 | C | typos | primer 2.7 | Ray Gates | editor | Shipper/biller vs. Shippee/billee in part 0 |
LC-54 | D | 04 numeric | datatypes | Gregor Meyer | Ashok Malhotra | Drop nonPositiveInteger? |
LC-55 | D | 21 ns | datatypes | Curt Arnold | Henry Thompson | Proper home namespace/resource for built-in datatypes |
LC-56 | D | 19 modules | structures | Curt Arnold | Mark Reinhold* | Add schemaPrefix, targetPrefix attributes? |
LC-57 | D | 19 impl-modules | structures | Curt Arnold | Matt Fuchs | Add maximum depth for includes? |
LC-58 | D | 58 impl-modules | structures 6.3.2 | Curt Arnold | Matt Fuchs | How to deal with nested imports? |
LC-59 | D | 03 constructors | datatypes | Dario de Judicibus | Mary Holstege | Allow user-defined list separators? |
LC-60 | A | 10 openness | structures | Dario de Judicibus | Roger Costello* | Change meaning of anyAttribute? |
LC-61 | D | 03 constructors | datatypes | Dario de Judicibus | Frank Olken | Allow record-style simple types? |
LC-62 | D | 25 facets | datatypes | Ray Waldin | Mary Holstege, Ashok Malhotra | Doubly specified facets |
LC-63 | C | 12 content-models | structures | Richard Tobin | editor | Forbid recursion in content models? |
LC-64 | A | entities | structures | Dario De Judicibus | Priscilla Walmsley | Entities |
LC-65 | C | 14 attldecl | structures 3.2 | James Tauber | editor | Why are {min occurs}/{max occurs} optional in Attribute Declaration? |
LC-66 | A | urtype | structures 3.4 | James Tauber | Henry Thompson | {name} of the Ur-Type |
LC-67 | C | sfs | structures 3.4 | James Tauber | editor | Why does absent point to definition for null in glossary? |
LC-68 | C | urtype | structures 3.4/3.13 | James Tauber | editor | Should there be a Simple Type Definition of the Ur-Type? |
LC-69 | C | 14 attldecl | structures 4.3.3 | James Tauber | editor | Should "anyAttribute" have a "processContents" attribute? |
LC-70 | C | 10 openness | structures 4.3.7 | James Tauber | editor | Can ##local stand alone in namespace attribute or must it be in a list? |
LC-71 | C | 14 attldecl | structures 3.2/4.3.1 | James Tauber | editor | {value constraint} in top-level attribute declarations |
LC-72 | C | 14 attldecl | structures 4.3.1 | James Tauber | editor | Representation of {target namespace} in second case has parent instead of ancestor |
LC-73 | D | 21 publication | structures | Henry Thompson | Dan Connolly*, Henry Thompson | XML Schema Namespace versioning |
LC-74 | D | 31 urtype | structures | Noah Mendelsohn | Paul Grosso, Noah Mendelsohn* | Define an explicit name for the urType? |
LC-75 | A | 09 appinfo | structures 4.3.10 | David Vun Kannon | Frank Olken | Using appinfo annotations to store integrity constraints |
LC-76 | A | 14 attldecl | structures | David Vun Kannon | Rick Jelliffe* | Defining inherited attribute values |
LC-77 | D | 17 sfs | datatypes | Curt Arnold | Paul Biron* | Namespace of has-facet, has-property |
LC-78 | C | 21 ns | both | Alexander Falk | editor | Possible schema validation issue in 3.0b3 |
LC-79 | C | sfs | datatypes A | Dan Vint | editor | Stray data in Datatypes schema? |
LC-80 | C | sfs | datatypes | Dan Vint | editor | Stray slash in declaration of base attribute for simple types? |
LC-81 | C | publication | both | Dan Vint | editor | Schema-for-schema files |
LC-82 | A | 09 appinfo | requirements | Robert Miller | Matt Fuchs, Jonathan Robie | XML Schema considered inadequately extensible |
LC-83 | A | 09 appinfo | both | Robert Miller | Jim Barnette | Better support for semantics? |
LC-84 | A | 15 arrays | both | Robert Miller, MPEG-7 | Frank Olken | Arrays? |
LC-85 | C | 14 attldecl | structures 4.3.1 | James Tauber | editor | 4.3.1: second scenario: should value constraint default to FIXED? |
LC-86 | C | 21 ns | structures 3 | Peter Canning | editor | Optional component != mandatory but absent? |
LC-87 | C | xsinull | primer 2.3 | Curt Arnold | editor | Value space for xsi:null attribute |
LC-88 | C | 03 constructors | primer 2.3 | Curt Arnold | editor | Limits on lists |
LC-89 | A | 19 modules | primer 3.4 | Curt Arnold | Henry Thompson | Undeclared and unnamed namespaces |
LC-90 | D | 12 content-models | primer 4.2 | Curt Arnold | David Fallside* | Extension of mixed types |
LC-91 | D | 31 entities | structures | Steven Pemberton | Don Mullen* | Support declaration of character entities? |
LC-92 | A | 19 modules | structures | Dr. Ardeshir Bahreininejad | Mary Holstege | Dynamic element name specification. |
LC-93 | D | 23 constructors | datatypes | David Vun Kannon | Ashok Malhotra | Disjoint datatypes? |
LC-94 | A | 07 typedecl | structures 5.11 | Asir S Vedamuthu | Henry Thompson* | Clarify Structures 5.11 Complex Type Definition Constraints - Type Derivation OK |
LC-95 | A | 19 modules | structures | Jane Hunter | Sperberg-McQueen | Use of xml:lang |
LC-96 | D | 10 openness | structures 2.2.2.2 | Curt Arnold | Henry Thompson, Ugo Corda* | equivClass: common ancestor type |
LC-97 | D | 04 numeric | datatypes | Doug Ransom | Frank Olken | Allow hex notation for integers? |
LC-98 | A | 19 modules | structures | gmacri@libero.it | David Ezell* | Clarifying schema location and namespace |
LC-99 | A | 20 keys | structures | gmacri@libero.it | Jonathan Robie* | XPath expressions in key definitions |
LC-100 | C | ed-str | structures 3.8 | Michael K Smith | editor | Suggested rewording in '3.8 Particle Details' |
LC-101 | A | 14 attldecl | structures | achille@us.ibm.com | Chuck Campbell | Help on Wildcard |
LC-102 | D | 03 constructors | both | Anders W. Tell | David Ezell | Suggestion: Microparsing support in XML Schema |
LC-103 | A | 01 facets | datatypes | Michael Anderson | Lew Shannon | Restricting Facets |
LC-104 | A | requirements | both | Joseph M. Reagle Jr. | Sperberg-McQueen | XML Signature WG's review of XML Schema |
LC-105 | A | requirements | both | Joseph M. Reagle Jr. | Sperberg-McQueen | Please stabilize syntax |
LC-106 | C | 08 complexity | structures 2.2 | Joseph M. Reagle Jr. | editor | Restructure Structures 2.2.*, or define components? |
LC-107 | C | 08 complexity | both | Joseph M. Reagle Jr. | editor | Reagle's comments on XML Schema |
LC-108 | C | 08 complexity | both | Joseph M. Reagle Jr. | editor | Clear relation of xsi and xsd namespaces |
LC-109 | C | publication | structures A | Joseph M. Reagle Jr. | editor | Make schema and DTD locations more explicit? |
LC-110 | C | publication | both | Joseph M. Reagle Jr. | editor | Simplify default values, or document better |
LC-111 | C | 08 complexity | primer 3 | Joseph M. Reagle Jr. | editor | Clearer guidelines in Primer section 3? |
LC-112 | A | 12 content-models | structures | Bob Schloss | Henry Thompson | Determinism and Choices of Sequences |
LC-113 | C | misc | structures | Susan Lesch | editor | Minor comments for WD-xmlschema-1-20000407 |
LC-114 | C | 08 complexity | structures | Curt Arnold | editor | Minor part 1 comments |
LC-115 | A | 19 modules | structures 6.2.2 | Curt Arnold | Roger Costello | Resolution of QNames references |
LC-116 | D | 19 modules | structures 6.3.2 | Curt Arnold | Noah Mendelsohn | Require declaration before use of schema location? |
LC-117 | D | 19 modules | structures 6.3 | Curt Arnold | Noah Mendelsohn | (Locating Schema resources) |
LC-118 | D | 27 i18n-boolean | both | Martin J. Duerst | Ashok Malhotra | Making it easy to create signable schemas |
LC-119 | A | 19 modules | structures | gmacri@libero.it | Ugo Corda | Question on include |
LC-120 | C | 04 datetime | datatypes | Ninggang Chen | editor | [Clarifications] Part 2 Datatypes |
LC-121 | D | 05 fco | structures | Tim Berners-Lee | Henry Thompson | Element names should be xml:ids in schemas |
LC-122 | C | 05 fco-appinfo | structures | Ralph R. Swick | Paul Biron | xsd:appinfo |
LC-123 | D | 17 sfs/content-models | structures | Curt Arnold | Henry Thompson | Inconsistency on content model for group |
LC-124 | D | formal | both | Jane Hunter | Frank Olken | MPEG-7 Feedback to Last Call for Review |
LC-125 | B | process | requirements | David RR Webber | Dave Hollander | Extend last-call comment period? |
LC-126 | D | 12 content-models | structures | Henry Thompson | Henry Thompson | Content model of <complexType> |
LC-127 | D | 11 infoset | structures | Henry Thompson | Henry Thompson | Error logging in the infoset |
LC-128 | D | 31 modules | structures | Henry Thompson | Lee Buck | Lee Buck's use case returns: a hesitant proposal for type modification |
LC-129 | C | typos | datatypes | Susan Lesch | editors | Minor typos in WD-xmlschema-2-20000407 |
LC-130 | D | formal | both | Roger L. Costello | Roger Costello, Noah Mendelsohn | XML Schema Comments |
LC-131 | - | - | - | - | - | [Dummy] |
LC-132 | A | 12 content-models | structures | Dan Rupe | Martin Gudgin | Contents which may occur in any order |
LC-133 | D | 05 fco | datatypes | Ralph R. Swick | Henry Thompson | Well-known URIs for all built-in datatypes and facets |
LC-134 | - | - | - | - | - | [Dummy] |
LC-135 | A | 06 localtypes | structures | Ralph R. Swick | Noah Mendelsohn | 'form' and 'elementFormDefault' appear harmful |
LC-136 | D | 15 constants | both | Steven Goldfarb | Frank Olken | Symbolic constants |
LC-137 | - | - | - | - | - | [Dummy] |
LC-138 | - | - | - | - | - | [Dummy] |
LC-139 | A | 12 content-models | structures | Ninggang Chen | Dan Connolly* | Mixed Content Model |
LC-140 | - | - | - | - | - | [Dummy] |
LC-141 | - | - | - | - | - | [Dummy] |
LC-142 | D | formal | both | Jim Trezzo | Ashok Malhotra, David Beech | Oracle Comments on XML Schema Last Call: |
LC-143 | D | 30 complexity | structures | Philip Wadler | Matt Fuchs | Simplifying XML Schema |
LC-144 | A | 15 arrays | datatypes | Don Brutzman | Frank Olken | X3D-related comments on Schema datatypes |
LC-145 | A | 09 appinfo | datatypes | Don Brutzman | Priscilla Walmsley | How do I specify additional numeric constraints? |
LC-146 | A | 12 content-models | structures | gmacri@libero.it | Henry Thompson | Question on "ref" attribute |
LC-147 | - | - | - | - | - | [Dummy] |
LC-148 | - | - | - | - | - | [Dummy] |
LC-149 | A | 10 openness | structures | Jane Hunter (MPEG-7) | Frank Olken, Lee Buck | Provide guidance on extending schema for schemas? |
LC-150 | D | 15 arrays | both | Jane Hunter (MPEG-7) | Frank Olken | Allow specification of size constraints in instance? |
LC-151 | A | 20 keys | structures | Jane Hunter (MPEG-7) | Frank Olken | How do I restrict IDREFs to particular (element) types? |
LC-152 | D | 07 typedecl | structures | Jane Hunter (MPEG-7) | Frank Olken | Simultaneous restriction and extension? |
LC-153 | D | 24 occurs | structures | Jane Hunter (MPEG-7) | Frank Olken | Re-align occurrence indications for elements and attributes? |
LC-154 | D | 19 modules | structures | Jane Hunter (MPEG-7) | Frank Olken | Allow explicit specification of name import/export? |
LC-155 | D | 29 openness | structures | Roger L. Costello | Roger Costello, Noah Mendelsohn | Restore global openness |
LC-156 | D | 29 openness | structures | Roger L. Costello | Roger Costello, Noah Mendelsohn | Make schema for schemas open? |
LC-157 | D | 29 openness | structures | Roger L. Costello | Roger Costello, Noah Mendelsohn | Clarify schema evolution |
LC-158 | D | 08 complexity | both | Roger L. Costello | Roger Costello, Noah Mendelsohn | Split the specification? |
LC-159 | D | 28 keys-infoset | both | Jim Trezzo | Ashok Malhotra, David Beech | Add PSV infoset properties for keyref info? |
LC-160 | D | 10 openness | both | Jim Trezzo | Ashok Malhotra, David Beech | Default equivclass blocking? |
LC-161 | B | 11 infoset | both | Jim Trezzo | Ashok Malhotra, David Beech | An API needed for the PSV info set? |
LC-162 | D | 28 infoset | both | Jim Trezzo | Ashok Malhotra, David Beech | Add items to PSV infoset for schemas? |
LC-163 | D | 04 datetime | both | Jim Trezzo | Ashok Malhotra, David Beech | Recurring durations |
LC-164 | D | 18 xsi | structures | Philip Wadler | Matt Fuchs | Drop xsi:type? |
LC-165 | D | 23 xsi-nulls | structures | Philip Wadler | Matt Fuchs | Drop xsi:null? |
LC-166 | D | 12 content-models | structures | Philip Wadler | Matt Fuchs | Align simple and complex types more fully? |
LC-167 | D | 06 localtypes | structures | Philip Wadler, Murray Altheim | Matt Fuchs | Local declarations: less or more |
LC-168 | A | 07 typedecl | both | Murray Altheim | Peter Chen | Drop/clarify anonymous types? |
LC-169 | A | ed-primer | primer | Murray Altheim | David Fallside | Primer notes |
LC-170 | D | 19 modules-include | structures | Murray Altheim | Aki Yoshida | Drop include? |
LC-171 | D | 20 keys | structures | Murray Altheim | David Cleary | Drop/clarify keys? |
LC-172 | D | 10 openness | structures | Murray Altheim | MSM, Murray Maloney | Drop any wildcard? |
LC-173 | C | 08 complexity | structures | Murray Altheim | editor | Miscellaneous notes on structures |
LC-174 | D | 08 complexity | structures | Murray Altheim | Mary Holstege | Drop abstract model? |
LC-175 | D | 10 openness | structures | Murray Altheim | Norm Walsh | Drop equivalence classes? |
LC-176 | A | notations | structures | Murray Altheim | Dave Peterson | Drop/clarify notation? |
LC-177 | D | 16 validation | structures | Murray Altheim | Matt Fuchs | Tighten conformance rules? |
LC-178 | A | 08 complexity | structures | Murray Altheim | Sperberg-McQueen | Clarify! |
LC-179 | A | str-queries | structures | Peter Canning | Peter Chen | May components not at the top level be named? |
LC-180 | A | 08 complexity | both | XML Query | Sperberg-McQueen | Locality of exposition |
LC-181 | C | 12 content-models | structures | XML Query | editor | Allow abstract types in element declarations? |
LC-182 | D | 16 localtypes | structures | XML Query | Rick Jelliffe | Relax single-binding rule? |
LC-183 | C | validation | structures | XML Query | editor | Clarify details of lax validation? |
LC-184 | A | 19 modules | structures | Steve Monk | Dan Connolly | How do I combine schemas for two namespaces in one schema? |
LC-185 | D | 11 infoset | both | XML Core WG | Sperberg-McQueen | Comments from XML Core WG |
LC-186 | D | 19 modules | structures | Daniel Veillard | Matt Timmermans | Naturalizing names while declaring their relation to their original namespace |
LC-187 | D | 04 numeric | datatypes | Graham Kline | Paul Biron | Type hierarchy for numerics |
LC-188 | C | ed-primer | both | Martin Duerst | editor | Notes on the primer |
LC-189 | D | 02 enums | datatypes | Martin Duerst | Asir Vedamuthu | Easier (more compact) enumerations? |
LC-190 | D | 07 typedecl | structures | Martin Duerst | MSM | Allow attributes and content model in any order? |
LC-191 | D | 22 sui-generis | both | XForms Group | John McCarthy | Comments from XForms Group |
LC-192 | D | 11 infoset | both | DOM WG | Sperberg-McQueen | Comments from DOM WG |
LC-193 | A | 14 attldecl | structures | Peter van de Hoef | Sperberg-McQueen | How do I specify co-occurrence constraints on attributes? |
LC-194 | D | 12 content-models | both | Michael Stonebraker | Dave Hollander | Align better with OR schema |
LC-195 | C | ed-str | structures | Martin Duerst | editor | Eliminate the term 'obtains'? |
LC-196 | C | 08 complexity | both | Martin Duerst | editor | Need graphics? |
LC-197 | D | 26 numeric | datatypes | Martin Duerst | Don Mullen | Allow negative scale? |
LC-198 | D | 28 infoset | both | XML Query | Henry Thompson | Provide type-information in PSV Infoset? |
LC-199 | D | 28 infoset | both | XML Query | Henry Thompson et al. | A schema for the schemaless? |
LC-200 | D | 24 content-models | both | XML Query | David Beech | Distinguish sequences from sets? |
LC-201 | D | 20 keys | structures | XML Query | Jonathan Robie | Support cross-document keyref? |
LC-202 | B | 11 infoset | datatypes | XML Query | chairs | Make physical representation an optional PSV property? |
LC-203 | C | 11 infoset | both | XML Query | | Align part 1 and part 2 better on datatypes infoset? |
LC-204 | D | 15 dt-misc | both | XML Query | Sperberg-McQueen | Type coercions |
LC-205 | C | 27 i18n | primer | I18n WG | editor | I18n notes on primer, misc |
LC-206 | C | 27 i18n | structures | I18n WG | editor | I18n notes on structures |
LC-207 | C | 27 i18n | datatypes | I18n WG | editor | I18n notes on datatypes, misc |
LC-208 | D | 24 content-models | structures | Philip Wadler | | Rename equivalence classes? |
LC-209 | D | 23 enums | both | XForms | Don Box, Martin Gudgin | Open enumerations |
LC-210 | D | 31 misc | structures | Beech | | Make DTD non-normative? |
LC-211 | D | 22 xforms | datatypes | XForms WG | John McCarthy* | Add masks? |
LC-212 | D | 22 xforms | both | XForms WG | John McCarthy* | Currencies |
LC-213 | D | 22 xforms | datatypes | XForms WG | John McCarthy* | Drop facets? |
LC-214 | D | 22 xforms | datatypes | XForms WG | John McCarthy* | Add facets? |
LC-215 | D | 27 i18n | structures | I18n WG | Martin Gudgin | Easy add-ins |
LC-216 | D | 27 i18n | both | I18n WG | | Merge mixed, text-only, and string? |
LC-217 | D | 27 i18n | structures | I18n WG | | Allow pattern on complex types? |
LC-218 | B | 27 i18n | both | I18n WG | chairs | Solve C0 control-character issue? |
LC-219 | D | 27 i18n | datatypes | I18n WG | | Lay foundation for multiple lexical representations? |
LC-220 | D | 27 i18n | datatypes | I18n WG | | Single lexical representations? |
LC-221 | D | 27 i18n | datatypes | I18n WG | | I18n on date/time types |
LC-222 | D | unassigned | structures | XML Query | | Revamp occurrence indicators? |
Should the datatypes part of XML Schema specify the legal value ranges for the parts of dates and times? In particular, should it point out that 0000 is not a valid year in the Gregorian calendar?
Input from Martin Bryan:
(Martin Bryan to XML Schema comments list, 29 February 2000.)
There would appear to be no mechanism for entering a data prior to 0 AD.
Could -CCYY be allowed, with the proviso that 0000 is not a valid year?
Input from Paul Cotton:
(Paul Cotton to XML Schema comments list, 9 March 2000)
The spec should specify the legal value ranges of the CC, YY, MM, and DD parts of a date (even if this information is given in ISO 8601), and also the various parts of a time value.
Input from C. M. Sperberg-McQueen:
"C. M. Sperberg-McQueen" <cmsmcq@acm.org> to XML Schema Comments list, Mon, 08 May 2000 20:11:05 -0600
At 16:24 00/02/29 +0000, Martin Bryan wrote: There would appear to be no mechanism for entering a data prior to 0 AD. Could -CCYY be allowed, with the proviso that 0000 is not a valid year?
I believe that this form is allowed by the schema spec. Section 3.3.24.1 says
"To accommodate year values outside the range from 0 to 9999, additional digits can be added to the left of this representation and an preceding '-' is allowed."
I believe there are two typos here (there is no year 0 in the Gregorian calendar, so the range should be 1-9999, and for 'an' read 'a'), but the phrase about the preceding minus sign is not a typo.
Personally, I agree that there should be a note pointing out that '0000' is not a valid year; otherwise, too many implementors will get it wrong.
Input from Martin Bryan <mtbryan@sgml.u-net.com>:
"Martin Bryan" <mtbryan@sgml.u-net.com> to XML Schema Comments list on Sun, 14 May 2000 08:01:08 +0100
Ashok
You have sent a number of notes with comments on the Datatypes part of the XML Schema specification. We appreciate your input. I believe I have responded to all your concerns but would like to ask you formally whether you feel your concerns have been addressed and the issues you raised can be closed.
The vast majority of my concerns have been answered, for which I thank you. There are still some relatively minor ones, such as the sentence that reads 'To accommodate year values outside the range from 0 to 9999, additional digits can be added to the left of this representation and an preceding "-" is allowed.' This really needs the 0 changed to 1, and a statement to indicate that negative numbers represent Before Christian Era (BC/BE) dates....
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:18:32 -0400
Appendix D now contains information on the legal ranges of the parts of date and time values.
Formal response to commentator. Martin Bryan confirms he is satisfied, for the most part. A later message says he thinks there are still problems.
Paul Cotton indicates he is satisfied.
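The year rule discussed in this issue can be illustrated with a minimal Python sketch. It assumes the lexical form described in the sentence quoted above: an optional preceding '-', at least four digits, and no year 0000. The function name `parse_year` and the regular expression are illustrative assumptions, not text taken from the specification.

```python
import re

# Assumed lexical form: optional leading '-', then four or more ASCII digits.
_YEAR = re.compile(r"-?[0-9]{4,}")

def parse_year(lexical: str) -> int:
    """Return the year as a signed integer, rejecting the year 0000."""
    if not _YEAR.fullmatch(lexical):
        raise ValueError(f"not a year: {lexical!r}")
    year = int(lexical)
    if year == 0:
        # There is no year 0 in the Gregorian calendar.
        raise ValueError("0000 is not a valid year")
    return year
```

Under these assumptions, `parse_year("-0044")` yields a negative year, while `parse_year("0000")` raises `ValueError`.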
Should XML Schema introduce constructors for simple types based on Boolean logic?
Input from Curt Arnold:
Curt Arnold to XML Schema Comments list, 3 March 2000:
The following facets would seem to address quite a few constructs that appear within Schema for Schema and a few things like multiple, disjunctive value ranges from http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0034.html
Examples:
    <simpleType name="maxOccur" base="string">
      <or>
        <conform type="non-negative-integer"/>
        <enumeration><literal value="*"/></enumeration>
      </or>
    </simpleType>

    <simpleType name="noTeens" base="integer">
      <or>
        <and>
          <minInclusive value="0"/>
          <maxExclusive value="10"/>
        </and>
        <minInclusive value="20"/>
      </or>
    </simpleType>

    <simpleType name="targetOrNamespace" base="string">
      <or>
        <enumeration>
          <literal value="##targetNamespace"/>
        </enumeration>
        <conform type="uri-reference"/>
      </or>
    </simpleType>
    <simpleType name="targetOrNamespaces" base="targetOrNamespace" derivedBy="list"/>
    <simpleType name="anyAttribute" base="string">
      <or>
        <enumeration>
          <literal value="##any"/>
          <literal value="##other"/>
          <literal value="##local"/>
        </enumeration>
        <conform type="targetOrNamespaces"/>
      </or>
    </simpleType>
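To make the intended semantics of the `noTeens` example concrete, here is a minimal Python sketch of its value test, reading `<or>` and `<and>` as Boolean disjunction and conjunction over facet checks. The function name `no_teens` is hypothetical, and this reading of the facets is an assumption drawn from the example itself, not from any adopted syntax.

```python
def no_teens(value: int) -> bool:
    """Accept 0 <= value < 10 (the <and> branch) or value >= 20."""
    in_low_range = 0 <= value < 10   # minInclusive 0 and maxExclusive 10
    at_least_twenty = value >= 20    # minInclusive 20
    return in_low_range or at_least_twenty
```

On this reading the type accepts 9 and 20 but rejects every value from 10 through 19, as well as negative values.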
Discussed at Edinburgh ftf.
Task force assigned to work on the problem.
Discussed in call of 2000-07-14.
The question is on the 'union types proposal'. The WG discussed the question.
Paul Biron spoke in favor of the proposal: the functionality is important for our language and for other people's schemas. XSL documented this very early, so it's a well-known need which this proposal meets.
David Beech expressed a number of reservations: there's an even stronger requirement for codependency constraints, which we've done nothing about. This proposal adds to the complexity of the spec, and picks some low-hanging fruit, but not necessarily the most important fruit. The proposal to change the syntax for complex types does so in a way that does some strange things (e.g. requiring 'restriction' as part of the definition of a complex type without any explicit base!).
It's much more important to just put 'type' in the element name and merge simple and complex types. The uniformity proposed here has several restrictions. Is this a slippery slope? What about unions for complex types? You soon get into needing discriminators. Unions are not normally order-dependent; they are usually commutative.
The union-types proposal from the task force was sent 30 June 2000 to the IG by Henry Thompson.
Discussed again at face to face meeting 1-2 August 2000.
Clarifications: the PSV infoset should give you both the union type and also which member of the union you actually got. It is possible to use xsi:type with union types, as a discriminator. Open enumerations (e.g. a union of the enumeration small - medium - large with string) are possible.
In discussion, some WG members said they liked the general idea better than the specifics of the proposed concrete syntax, and were worried about the verbosity of common cases; other WG members felt that the concrete syntax was a net usability win, since the proposal reduces the variation in declarations and makes their formal definitions much tighter, and thus much easier to work with in XML-aware tools. Some WG members felt that too much effort was being put into a relatively minor problem, when more important problems (e.g. co-occurrence constraints) had been shelved for now. The disambiguation rule has the perhaps unpleasant effect that A union B is not the same as B union A. An alternative syntax which used attributes and not nested subelements seemed attractive to some WG members; it would however in practice require that all member types have names, and had been considered and rejected by the task force. Some members of the WG held that the attribute-based alternative would in fact be harder to understand and teach.
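The order dependence noted in this discussion can be seen in a small Python sketch. It assumes a first-match disambiguation rule, in which a lexical value is assigned to the first member type that accepts it; the helper names and the two toy member types are hypothetical and are not taken from the task-force proposal.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

# A member type is a (name, validity test) pair; both are toy stand-ins.
Member = Tuple[str, Callable[[str], bool]]

INTEGER: Member = ("integer", lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None)
STRING: Member = ("string", lambda s: True)

def matched_member(lexical: str, members: Sequence[Member]) -> Optional[str]:
    """Return the name of the first member type that accepts the value."""
    for name, accepts in members:
        if accepts(lexical):
            return name
    return None  # no member type accepts the value

print(matched_member("42", [INTEGER, STRING]))  # integer
print(matched_member("42", [STRING, INTEGER]))  # string
```

Both unions accept the same lexical values; only the member type reported for "42" differs, which is the sense in which A union B is not the same as B union A.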
Resolved: to resolve issues LC-2 and LC-93 by adopting the proposal. Dissenting: Vedamuthu. Abstaining: Beech, Grosso.
On the follow-on question of aligning the syntax of complex types with that proposed for simple types, a majority (12:4) was inclined to pursue the question. A task force met overnight and reported the next day.
Resolved: to adopt the task force proposal, leaving open the question of the generic identifiers for the elements and the correct solution to the design problem identified in the proposal.
Discussed the follow-on issues in call of 2000-08-10. Resolved: to retain the element names simpleContent and complexContent.
Resolved without dissent: to allow the 'mixed' attribute on the 'complexType' element, specifying that when the complexType element has a simpleContent child, then the attribute has no effect (but is not illegal). Abstaining: Beech, Corda, Mendelsohn, Olken, Vedamuthu by proxy.
Should the discussion of order relations state or imply that each datatype must define a different order relation on the value space?
Input from Paul Cotton:
(Paul Cotton to XML Schema comments list, 9 March 2000)
1. Section 2.4.1.2 Order
This section states "In such cases each datatype will define a different order relation on the value space". I do not understand why this must be done. Certainly at worst it should say "may define". Better even would be to delete the sentence entirely.
Formal response to commentator. Paul Cotton indicates resolution is satisfactory.
Should the discussion of enumerations be revised to specify that enumerations inherit their ordering from their base type?
Input from Paul Cotton:
(Paul Cotton to XML Schema comments list, 9 March 2000)
2. Section 2.4.2.5 enumeration
This section states "No order or any other relationship is implied ...". This seems to imply that enumerations are not ordered. I think this sentence needs to be reworded to imply that "No further ordering is implied" since certainly the ordering of the underlying data type must be inherited. If not then XML Query will have no means of ordering enumerations.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:18:32 -0400
Section has been reworded to reflect the intent above.
Formal response to commentator. Paul Cotton indicates resolution is satisfactory.
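As a rough illustration of the intent, an enumeration can be read as restricting which values are allowed while leaving comparison to the base type. The helper `make_enumeration` and the type name `shoe_size` are hypothetical, and Python's `int` stands in for the integer datatype.

```python
def make_enumeration(base_parse, allowed_lexicals):
    """Restrict a base type to the listed values.

    The enumeration adds no ordering of its own: the values it returns are
    base-type values, so they compare exactly as the base type compares.
    """
    allowed = {base_parse(lexical) for lexical in allowed_lexicals}

    def parse(lexical):
        value = base_parse(lexical)
        if value not in allowed:
            raise ValueError(f"{lexical!r} is not in the enumeration")
        return value

    return parse

# The order in which the enumerated values are declared plays no role.
shoe_size = make_enumeration(int, ["10", "8", "12"])
values = sorted(shoe_size(lexical) for lexical in ["12", "8", "10"])
print(values)  # [8, 10, 12]
```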
Should all primitive datatypes specify how their values are ordered?
Input from Paul Cotton:
(Paul Cotton to XML Schema comments list, 9 March 2000)
3. Section 3.2.1 string
This section states "The ordered property of string is the Unicode character number sequence." The string data type is the only primitive datatype that makes an explicit statement about how the ordering relation (not property) is defined. I expect the ordering information is missing from other primitive datatype sections.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:18:32 -0400
We've added order relations for the other primitive datatypes.
Formal response to commentator. Paul Cotton indicates resolution is satisfactory.
Should XML Schema allow schema authors to specify that a particular subtype of string should be sorted according to a particular collation sequence? If so, should XML Schema provide a way of defining collation sequences, or simply a way to provide a name, with the further details left out of band (as in SQL)?
Input from Paul Cotton:
(Paul Cotton to XML Schema comments list, 9 March 2000)
4. Section 3.2.1 string This section states "The ordered property of string is the Unicode character number sequence." I wonder why the definition of the string datatype does not permit a user to define the "collation" to be used? "Unicode character number sequence" is only one "collation" and is not very useful. In addition the specification does not explain why this "collation" is needed.
XML Query will need to support different collations for the string data type. It would be preferable if the collation was defined as part of the <data type> not as part of the query <predicate>s. I would recommend you consider a solution such as one adopted by SQL to permit the type definer to simply name the collation to be used. No exact definition of the actual collation needs to be provided since there are several other sources for this information.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:18:32 -0400
Collation is needed to enable max/min on strings. The WG discussed user-defined collations but decided not to do anything about this in V1. My personal viewpoint is that, except for the min/max case, schema never concerns itself with the relation between 2 strings and so this is not a schema problem. Others disagree with this position but, regardless, we will not add anything in V1.
Discussed in call of 2000-06-29.
Some members of the WG suggested that this issue is tied in with the proposal from the i18n WG to remove minimum- and maximum-value facets from strings and string-based types.
After discussion, a straw poll was taken, covering the four possibilities of keeping/adding or removing/not-adding the min/max facets and user-identified collation sequences. The results showed a small amount of support for retaining min/max values and adding user-identified collation sequences, some support for removing min/max and adding collations, no support for the status quo, and substantial support for removing min- and max-values and declining to add user-identified collation sequences.
RESOLVED: to remove minimum- and maximum-value facets from strings and string-based types, and to reply to LC-6 by saying no (with rationale). Dissenting: Olken.
This decision left us with a follow-on question. Up until now, all ordered value spaces have had (a) a specified ordering relation and (b) minimum- and maximum-value facets. In removing those facets from string, we have removed the need to specify any collation sequence at all. Do we wish:
There was no support for claiming that strings are intrinsically unordered; there was some support for each of the others, with a very strong preponderance of support for saying only that XML Schema does not specify any ordering relation for strings.
Paul Biron expressed the intention of the editors to add a note making clear that any description of a value space as 'unordered' should not be taken as precluding other applications from defining an ordering relation for that value space, only as indicating that XML Schema defines no such relation. There was no audible objection to that statement of intent.
RESOLVED: to specify that XML Schema defines no ordering relation on strings. Dissenting: Maloney, Olken. (Rationale: we should say that strings are ordered, and that the order relation is locale-dependent).
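The difference between the "Unicode character number sequence" and a named collation can be seen in a short sketch. The registry `COLLATIONS` and the collation name `case-insensitive` are hypothetical; they only illustrate the SQL-style idea of naming a collation without defining it in the schema.

```python
words = ["apple", "Zebra", "banana"]

# Python compares strings code point by code point, which corresponds to
# the "Unicode character number sequence" ordering discussed above.
by_code_point = sorted(words)
print(by_code_point)  # ['Zebra', 'apple', 'banana']

# Hypothetical registry: a collation is identified by name only, and its
# definition lives outside the schema.
COLLATIONS = {"case-insensitive": str.casefold}

def sort_with(collation_name, items):
    """Sort items using the collation registered under collation_name."""
    return sorted(items, key=COLLATIONS[collation_name])

print(sort_with("case-insensitive", words))  # ['apple', 'banana', 'Zebra']
```

The two results differ for the same input, which is why minimum- and maximum-value facets on strings would have depended on which ordering was meant.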
Is the requirement that XML Schema implementations must support arbitrary-precision decimals an excessive burden on implementors of a query language? Should XML Schema instead specify that the maximum precision for decimal numbers should be an "implementation-defined number not less than X", with the value of X to be determined?
Input from Paul Cotton:
(Paul Cotton to XML Schema comments list, 9 March 2000)
5. Section 3.2.5 decimal
The Note in this section asks "Our design discussions did not reveal convincing evidence of undue burden because of arbitrary precision decimal numbers in this design, but we welcome further input from implementors".
I believe that you may want to consider the impact on implementors of a query language based on this data type that must implement <predicate>s and arithmetic operators for an "arbitrary precision decimal number". I believe we will find this to be too expensive and that implementations will in fact constrain the precision of this data type. If the XML Schema specification does not do this then interoperability will be heavily constrained.
I do not accept the argument that XML Schemas needs an arbitrarily precise decimal datatype just to be able to model the length of names in XML which are in turn unconstrained in length.
I suggest that the document be modified to state that the maximum precision for decimal numbers should be an "implementation-defined number not less than X" where X can be agreed upon by implementors as a practical lower limit for this amount. "Implementation-defined" means that a conforming implementation must state in its conformance statement what the value is.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:18:32 -0400
There is now a note in 3.2.5 asking for feedback on this issue.
Discussed at Edinburgh ftf.
Task force consisting of Robie and Malhotra will examine this with a goal of formulating a new design which (1) sets minimum precision and magnitude that all processors must support and (2) makes a formal proposal for what schema processors must accept (presumably includes unrestricted integer) in a schema as well as an instance.
1. The XML Schema spec set down the minimum number of digits that must be supported by a conforming XML processor for the numeric datatypes integer, decimal, float and double.
2. XML processors are free to support more than the minimum number of digits. If they do so, they should advertise this fact as part of their specifications.
3. The minimum number of digits required for (1) should be derived based on the number of digits supported by some standard programming languages such as C and Java. These are discussed below. In earlier notes I had proposed that the precisions be based on the number of digits supported by 32-bit processors but I realized that languages often use multiple words to store numeric values.
Also, as most processors will translate values encoded in XML documents into values in some programming language, it seems more sensible to base precisions on those supported by common programming languages.
SUGGESTED PRECISIONS
The Java Language Specification (Gosling, Joy, Steele) says
C allows compilers to choose how many digits to use for int and long. The limits.h library defines the minimum and maximum values for long consistent with the values for Java int above.
For floating point numbers float.h defines precisions that are significantly lower: for float, 6 digits of precision in the mantissa and a maximum of +/- 37 for the exponent. For double, 10 digits of precision and still +/- 37 as a maximum for the exponent.
I do not understand the lower precision for floating point numbers in C. Perhaps this is because float.h also allows you to specify the radix for float and double or merely that I used Kernighan and Ritchie and newer compilers allow more digits.
RECOMMENDATION
If we set a single minimum standard then, based on the Java figures above, I would recommend 18 digits for integers and decimals. For float and double I would recommend 15 digits for the mantissa and 2 digits for the exponent.
If these figures are felt to be too generous we could go with a 2-tier system.
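The recommended figures can be checked with a little arithmetic. The sketch assumes that the Java figures referred to are the 64-bit signed long and the IEEE 754 double; the passage quoted from the Java specification is not reproduced in this record, so that reading is an assumption.

```python
import sys

# Largest value of a 64-bit two's-complement signed integer (Java long).
JAVA_LONG_MAX = 2**63 - 1

# JAVA_LONG_MAX has 19 decimal digits, but not every 19-digit number fits,
# so the largest digit count that is always safe is one less.
safe_integer_digits = len(str(JAVA_LONG_MAX)) - 1
print(safe_integer_digits)  # 18

# Number of decimal digits that survive a round trip through the platform
# double; this is 15 wherever Python floats are IEEE 754 doubles.
double_digits = sys.float_info.dig
print(double_digits)
```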
Discussed in call of 2000-07-21.
The question is on a proposal from Paul Cotton to specify some minimal level of support for precision of decimals, which all implementations must support as a minimum. The argument in favor is that this clarifies where the interoperability boundary lies much more clearly than will otherwise be the case. The argument against is that the minimum level of support will turn into the maximum and no one will support any more.
Concrete proposal by Ashok Malhotra is: If we set a single minimum standard then, based on the Java figures above, I would recommend 18 digits for integers and decimals. For float and double I would recommend 15 digits for the mantissa and 2 digits for the exponent.
RESOLVED: to dispose of issue LC-7 by saying in principle yes, we will specify some minimal level of support for precision of decimals. Dissenting: Biron, Corda, Gudgin. Abstaining: none.
RESOLVED without dissent: to specify that conforming processors may support any number of digits of precision greater than or equal to 18 digits precision, and to make the minimum level a priority feedback issue. Abstaining: Biron, Corda, Fuchs, Gudgin, Mendelsohn, Peterson.
RESOLVED: to require processors to specify the maximum number of digits they support. Dissenting: Fuchs, Hollander, Mendelsohn, Vedamuthu. Rationale for dissent: because we don't specify anything about the form of documentation, this is not really enforceable as a requirement.
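A minimal sketch of how a processor might apply these resolutions follows. It assumes that "digits of precision" means the total number of digits in the lexical form once non-significant zeroes are ignored; the class `DecimalProcessor` and the helper `digit_count` are hypothetical.

```python
MINIMUM_REQUIRED_DIGITS = 18  # floor taken from the resolution above

def digit_count(lexical: str) -> int:
    """Count the digits of a decimal literal, ignoring non-significant zeroes."""
    text = lexical.strip().lstrip("+-")
    integer_part, _, fraction_part = text.partition(".")
    integer_part = integer_part.lstrip("0")
    fraction_part = fraction_part.rstrip("0")
    return max(1, len(integer_part) + len(fraction_part))

class DecimalProcessor:
    """A processor that documents the maximum number of digits it supports."""

    def __init__(self, max_digits: int):
        if max_digits < MINIMUM_REQUIRED_DIGITS:
            raise ValueError("a conforming processor must support at least 18 digits")
        self.max_digits = max_digits  # the value it must state in its documentation

    def accepts(self, lexical: str) -> bool:
        return digit_count(lexical) <= self.max_digits

processor = DecimalProcessor(max_digits=18)
print(processor.accepts("123456789012345678"))   # True
print(processor.accepts("1234567890123456789"))  # False
```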
Michael Rys would prefer a different solution.
Should XML Schema forbid the use of non-significant leading and trailing zeroes?
Cf. Allow multiple lexical spaces for floats?
Input from Paul Cotton:
(Paul Cotton to XML Schema comments list, 9 March 2000)
6. Section 3.3.9 integer
The definition of the lexical representation of the integer datatype does not correctly reflect that non-significant leading and trailing zeroes should not be used. Non-significant zeroes are leading zeroes to the left of the decimal point or trailing zeroes to the right of the decimal point. I suggest using this concept in the descriptive material.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:18:32 -0400
This is a good addition to integer. For decimal, trailing zeroes sometimes provide information.
Discussed at Edinburgh ftf.
CONSENSUS: make no change to status quo (meaning keep multiple lexical representations for integers and decimals), but add explicit definition of canonical representations for them (and other types with multiple lexical representations).
RESPONSE to originator will be that we are unwilling to impose that usability cost. ACTION to editors to propose which forms are canonical.
OPEN QUESTION: what about the PSV infoset? Do you get a single value in the original form? In the canonical form? Both? Status quo is you get the string as it was in the document and its type.
These questions were resolved, in the course of September, in favor of a schema-normalized form of any simple type, which has undergone whitespace normalization and has no comments or processing instructions, but which is not canonicalized (so that, for example, nonsignificant leading and trailing zeroes may occur). Members of the WG deeply involved with database implementation said that most commercial DBMS are capable of handling leading and trailing zeroes.
Formal response to commentator. Paul Cotton indicates that XML Query can live with this decision, although it would prefer a different one. "I am still concerned that users of the XML Query language will want to distinguish between occurrences of XML Schema Integers with the values 01 and 1. This is why I wanted to prohibit the former. Since XML Schema has not prohibited the first alternative then XML Query will simply have to ensure that it clearly states which values can be searched for."
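The distinction between the schema-normalized form and a canonical form can be sketched as follows. The function names are hypothetical, whitespace normalization is assumed to mean collapsing and trimming, and the canonical form for integers is assumed to be the one without sign prefix `+` or leading zeroes.

```python
import re

INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def schema_normalized(lexical: str) -> str:
    """Whitespace-normalize only; leading zeroes are kept."""
    return " ".join(lexical.split())

def canonical_integer(lexical: str) -> str:
    """Map every lexical form of an integer to a single canonical form."""
    text = schema_normalized(lexical)
    if INTEGER_LEXICAL.fullmatch(text) is None:
        raise ValueError(f"not an integer: {lexical!r}")
    return str(int(text))

print(schema_normalized("  01\n"))  # 01
print(canonical_integer("  01\n"))  # 1
print(canonical_integer("+007"))    # 7
```

Under these assumptions "01" and "1" denote the same value but remain distinguishable in the schema-normalized form, which is the situation Paul Cotton describes.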
How should a schema designer go about restricting the string type to contain only characters from a specific coded character set or encoding, such as ASCII or ISO Latin 1 (ISO 8859-1)?
Input from Peter Canning:
(Peter Canning to XML Schema IG, 14 March 2000 -- member-only)
[What if I want a type 'ASCII' or 'Latin1', to model an existing string type?]
(The following draft reply was prepared by C. M. Sperberg-McQueen.)
Because XML Schema applies only at the info-set level, it is not possible to restrict a string to a specific character encoding, nor to what ISO standards used to call a coded character set, such as ISO 8859-1. All XML processors are required to accept data in the UTF-16 and UTF-8 encodings, and no schema can override that.
It is, however, possible to restrict a string to a particular set of characters (a particular character repertoire). The process might go something like this:
If you are defining a self-contained schema (i.e. you don't expect elements from other namespaces to appear in your documents), this should be all you are looking for.
If you are defining a schema module, which is expected to be used in conjunction with other modules, designed by other people, and you had been hoping for a way to restrict the other modules, then it will be disappointing to learn that such after-the-fact restriction of other people's schemas is not supported in XML Schema.
(The following draft reply was prepared by Paul Biron and posted to the XML Schema IG on 15 March 2000.)
Assuming the character set you wish to restrict to is a Unicode Block, then you could define a subtype of string restricted to just that block as in:
<simpleType name='ascii-string' base='string'>
  <pattern value='\p{IsBasicLatin}*'/>
</simpleType>
or
<simpleType name='latin-1-string' base='string'>
  <pattern value='\p{IsLatin-1Supplement}*'/>
</simpleType>
This is mentioned in the regex design that was part of the 1999-12-17 draft but was accidentally left out of the 2000-02-25 draft. The recognized block names are those in the Unicode Database (file blocks.txt), with whitespace stripped out.
If the character set you wish to restrict to is not a single block, but can be constructed by combining some set of Unicode properties, then you can do something like (which is described in the 2000-02-25 draft):
<simpleType name='letters-and-punctuation' base='string'>
  <pattern value='[\p{L}\p{P}]*'/>
</simpleType>
As a last resort, you could always construct the character class which comprises the character set by enumerating all members of the class [2], as in:
<simpleType name='my-strings' base='string'>
  <pattern value='[{̻ᕡ...]*'/>
</simpleType>
Commentator responds privately 21 September "I am satisfied with the response of the working group on this issue."
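The same repertoire check can be sketched outside a schema. Python's `re` module does not support the `\p{Is...}` block escapes, so the sketch spells the blocks out as code point ranges: Basic Latin is U+0000 to U+007F and Latin-1 Supplement is U+0080 to U+00FF. The function names are hypothetical. Note that a string type meant to model ISO 8859-1 as a whole would need both blocks, since the supplement block on its own excludes the ASCII characters.

```python
import re

BASIC_LATIN = re.compile(r"[\u0000-\u007F]*")
# Basic Latin plus Latin-1 Supplement, i.e. the full ISO 8859-1 repertoire.
LATIN_1 = re.compile(r"[\u0000-\u00FF]*")

def is_ascii_string(text: str) -> bool:
    """True if every character lies in the Basic Latin block."""
    return BASIC_LATIN.fullmatch(text) is not None

def is_latin1_string(text: str) -> bool:
    """True if every character lies in Basic Latin or Latin-1 Supplement."""
    return LATIN_1.fullmatch(text) is not None

print(is_ascii_string("plain"))      # True
print(is_ascii_string("caf\u00e9"))  # False
print(is_latin1_string("caf\u00e9")) # True
print(is_latin1_string("\u20ac"))    # False (euro sign, U+20AC)
```

As in the schema versions, the check works on characters, not on the bytes of any particular encoding.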
Should the exposition of identity-constraint tables be revised in the interests of clarity?
Input from Mary Fernandez:
(Mary Fernandez to XML Schema Comments list, 15 March 2000)
I am writing as a representative of the XML Query working group. Currently, we are specifying the data model for XML Query. Part of that exercise requires specifying formally the mapping from an instance of the PSV Infoset to an instance of the Query data model.
This message regards the definition of the Schema Infoset Contribution for Identity-constraint tables described in: http://www.w3.org/TR/xmlschema-1/#Identity-constraint_Definition_details.
This message has 2 parts:
Part 1.
A request to review the example in the attached text file. Please confirm that this example is correct w.r.t. the definition above.
If not, please explain how we misinterpreted the definition.
The attached example contains an abbreviated version of the <purchaseReport> example by David Fallside in: http://www.w3.org/XML/Group/xmlschema-current/new-design/exposition.html.
For the given example, we define a fragment of the corresponding PSV Infoset instance, which contains the Identity-constraint table for the <purchaseReport> element item.
We use the following notation:
Here's a DTD for an identity-constraint table that I inferred from the XML Schema document:
<!ELEMENT psvis:identityConstraintTable (psvis:identityConstraint, psvis:nodeTable)*>
<!ELEMENT psvis:identityConstraint is:ref>
<!ELEMENT psvis:nodeTable (psvis:keySequence, psv:qualifiedNode)*>
<!ELEMENT psvis:keySequence (is:ref)*>
<!ELEMENT psvis:qualifiedNode is:ref>
<!ELEMENT is:ref EMPTY>
<!ATTLIST is:ref idref >
Part 2
A question and a suggestion follow.
In the subsection "Schema Information Set Contribution: Identity-constraint Table", I believe "key sequence k" should be changed to "key sequence b". If not, then please explain the relationship between the b & k key sequences.
[On 17 March 2000 Henry Thompson replies: "You're right about the typo, that should be 'key sequence k' throughout."]
Part of the difficulty in understanding the definition of the Identity-Constraint table is that the document tries to describe in prose both what the table is and how to construct it at the same time. I tried to translate the prose into a pseudo-code specification (see below). If we assume that a node table is a table of (key-sequence, qualified-node) pairs and is keyed on key-sequence, we can compute a new node table for element E, eligible constraint C as follows:
fun nodeTable(E, C) {
  let
    /* compute node table for element E & constraint C */
    table = UNION(forall (keyseq, qualnode) in eligibleConstraint(E,C))
    /* inherit non-conflicting constraints from children */
    kidsTable = UNION(forall K in children(E), nodeTable(K, C))
    inheritTable = kidsTable - table
  in
    table UNION inheritTable
}
// inheritTable is really a project on key-sequence, difference of the
// two tables and then a join with kidsTable to reconstruct the
// inherited table.
You might not want to present the definition in such a form, but for anyone who must implement or use the Schema definition (such as the Query working group), this is precisely what must be inferred.
[On 5 April 2000, Henry Thompson replied: "As near as I can tell your algorithm is correct, and the example infoset is also correct."]
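For readers who prefer something executable, the pseudo-code can be transcribed into a short Python sketch. It is a reading of the pseudo-code, not of the specification text: the `Element` class and its `eligible` field (standing in for eligibleConstraint(E, C)) are hypothetical, and the pseudo-code does not say what happens when two children contribute the same key sequence, so here a later child simply overwrites an earlier one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

KeySequence = Tuple[str, ...]
NodeTable = Dict[KeySequence, str]  # key sequence -> qualified node id

@dataclass
class Element:
    ident: str
    # Hypothetical stand-in for eligibleConstraint(E, C):
    # constraint name -> entries selected directly at this element.
    eligible: Dict[str, NodeTable] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)

def node_table(element: Element, constraint: str) -> NodeTable:
    """Compute the node table for an element and one constraint."""
    table = dict(element.eligible.get(constraint, {}))
    # Union of the children's tables (assumption: later children win on clashes).
    kids_table: NodeTable = {}
    for child in element.children:
        kids_table.update(node_table(child, constraint))
    # Inherit only entries whose key sequence does not conflict with our own.
    inherit_table = {key: node for key, node in kids_table.items() if key not in table}
    return {**table, **inherit_table}

# Hypothetical tree: the child defines two keys, the parent redefines one.
child = Element("child", {"pNumKey": {("872-AA",): "element#8", ("455-BX",): "element#9"}})
parent = Element("parent", {"pNumKey": {("455-BX",): "local-part"}}, [child])

result = node_table(parent, "pNumKey")
print(result[("455-BX",)])  # local-part
print(result[("872-AA",)])  # element#8
```

Keying the table on the key sequence makes the "project, difference, join" remark in the pseudo-code comment a simple dictionary filter.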
A. Abbreviated Example Data (same Schema as in full example)
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA"/>
      <part number="455-BX"/>
    </zip>
    <zip code="63143">
      <part number="455-BX"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
B. Information-Set Instance for Example Data in A. This is an XML serialization of the infoset instance.
<is:Document id="document#0">
  <is:children>
    <is:ref idref="element#0"/>
    <!-- reference to infoset item for schema -->
    <psvis:schema idref="schema#0"/>
  </is:children>
</is:Document>
<is:Element is:id="element#0">
  <is:localName>purchaseReport</is:localName>
  <is:children>
    <is:ref idref="element#1"/>
    <is:ref idref="element#7"/>
  </is:children>
  <psvis:identityConstraintTable>
    <!-- Skip <unique> constraint -->
    <!-- Constraint : <key name="pNumKey"> -->
    <psvis:identityConstraint>
      <!-- reference to infoset item for <key name="pNumKey"> -->
      <is:ref idref="identityConstraint#0"/>
    </psvis:identityConstraint>
    <psvis:nodeTable>
      <psvis:keySequence>
        <!-- number="872-AA" -->
        <is:ref idref="attribute#5"/>
      </psvis:keySequence>
      <psvis:qualifiedNode>
        <is:ref idref="element#8"/>
      </psvis:qualifiedNode>
      <psvis:keySequence>
        <!-- number="455-BX" -->
        <is:ref idref="attribute#6"/>
      </psvis:keySequence>
      <psvis:qualifiedNode>
        <is:ref idref="element#9"/>
      </psvis:qualifiedNode>
    </psvis:nodeTable>
    <!-- Constraint : <keyref refer="pNumKey"> -->
    <psvis:identityConstraint>
      <!-- reference to infoset item for <keyref refer="pNumKey">...</keyref> -->
      <is:ref idref="identityConstraint#1"/>
    </psvis:identityConstraint>
    <psvis:nodeTable>
      <psvis:keySequence>
        <!-- number="872-AA" -->
        <is:ref idref="attribute#1"/>
      </psvis:keySequence>
      <psvis:qualifiedNode>
        <is:ref idref="element#8"/>
      </psvis:qualifiedNode>
      <psvis:keySequence>
        <!-- number="455-BX" -->
        <is:ref idref="attribute#2"/>
      </psvis:keySequence>
      <psvis:qualifiedNode>
        <is:ref idref="element#9"/>
      </psvis:qualifiedNode>
      <psvis:keySequence>
        <!-- number="455-BX" -->
        <is:ref idref="attribute#4"/>
      </psvis:keySequence>
      <psvis:qualifiedNode>
        <is:ref idref="element#9"/>
      </psvis:qualifiedNode>
    </psvis:nodeTable>
  </psvis:identityConstraintTable>
</is:Element>
<is:Element is:id="element#1">
  <is:localName>regions</is:localName>
  <is:children>
    <is:ref idref="element#2"/>
    <is:ref idref="element#5"/>
  </is:children>
</is:Element>
<is:Element is:id="element#2">
  <is:localName>zip</is:localName>
  <is:attributes>
    <is:ref idref="attribute#0"/>
  </is:attributes>
  <is:children>
    <is:ref idref="element#3"/>
    <is:ref idref="element#4"/>
  </is:children>
</is:Element>
<is:Element is:id="element#3">
  <is:localName>part</is:localName>
  <is:attributes>
    <is:ref idref="attribute#1"/>
  </is:attributes>
</is:Element>
<is:Element is:id="element#4">
  <is:localName>part</is:localName>
  <is:attributes>
    <is:ref idref="attribute#2"/>
  </is:attributes>
</is:Element>
<is:Element is:id="element#5">
  <is:localName>zip</is:localName>
  <is:attributes>
    <is:ref idref="attribute#3"/>
  </is:attributes>
  <is:children>
    <is:ref idref="element#6"/>
  </is:children>
</is:Element>
<is:Element is:id="element#6">
  <is:localName>part</is:localName>
  <is:attributes>
    <is:ref idref="attribute#4"/>
  </is:attributes>
</is:Element>
<is:Element is:id="element#7">
  <is:localName>parts</is:localName>
  <is:children>
    <is:ref idref="element#8"/>
    <is:ref idref="element#9"/>
  </is:children>
</is:Element>
<is:Element is:id="element#8">
  <is:localName>part</is:localName>
  <is:attributes>
    <is:ref idref="attribute#5"/>
  </is:attributes>
  <is:children>
    <!-- references to CDATAItems for "Lawnmower" -->
  </is:children>
</is:Element>
<is:Element is:id="element#9">
  <is:localName>part</is:localName>
  <is:attributes>
    <is:ref idref="attribute#6"/>
  </is:attributes>
  <is:children>
    <!-- references to CDATAItems for "Sturdy Shelves" -->
  </is:children>
</is:Element>
<is:Attribute is:id="attribute#0">
  <is:localName>code</is:localName>
  <value>95819</value>
</is:Attribute>
<is:Attribute is:id="attribute#1">
  <is:localName>number</is:localName>
  <value>872-AA</value>
</is:Attribute>
<is:Attribute is:id="attribute#2">
  <is:localName>number</is:localName>
  <value>455-BX</value>
</is:Attribute>
<is:Attribute is:id="attribute#3">
  <is:localName>number</is:localName>
  <value>872-AA</value>
</is:Attribute>
<is:Attribute is:id="attribute#4">
  <is:localName>number</is:localName>
  <value>455-BX</value>
</is:Attribute>
This issue was discussed and resolved together with issue LC-159; see there for details of decisions in this area.
Should the date and time types have the same value for period?
Input from Aram Airapetian:
(Aram Airapetian to XML Schema Comments list, 30 March 2000)
How come the date and the time types have the same period? The "000000T2400" value is period for time. It is 'unit' for date, though. For date (CCYY-MM-DD, see 3.3.22.1) period is "010000". Am I missing something?
The notions of 'period' and 'unit' might be beneficial for simplification of numeric type definition. All types 3.2.2 - 3.2.5 and 3.3.9 - 3.3.21 could be defined (derived from decimal) by assigning corresponding 'period', 'unit' and 'signed/unsigned' constraints.
Input from Ashok Malhotra:
My apologies for taking so long to reply to your note below. We have made some changes to the text and corrected some typos. Now, "date" has a period of 0 (no recurrence) and a duration of 24 hours. "time" has a period of 24 hours and a duration of zero. I trust that addresses your concerns.
How does a schema author go about adding new attributes to elements declared in a different module?
Input from Aaron M. Cohen:
(Aaron M. Cohen to XML Schema Comments list, 3 April 2000)
It has recently been brought to my attention that the current draft of XML-Schemas does not provide a mechanism for incrementally building up an attribute set on an element, perhaps from several separate modules containing attributes, and also for adding attributes to elements already declared.
This is essential for modularizing SMIL with XML Schemas, and is something that can (and has) already be done with DTD's. Maybe my information is incorrect, or my understanding is faulty, so I am sending this to make you aware of our needs, and asking for guidance in applying XML Schemas in this manner. If my information and understanding are correct, then please consider this a requirement for the SMIL modules to have associated XML Schemas. I am under the impression that XHTML has very similar needs, and my impression is based in part on my exchanges with some of the XHTML folks.
A concrete example will probably make our needs clear. So I'll discuss our modularization needs in the context of a vastly simplified view of SMIL.
SMIL timing is a set of reusable modules that can be used to incorporate timing relationships into XML languages. This language could be "SMIL", or "XHTML", or something else. We provide time containers, which are grouping elements that contain other elements and provide semantics for the timing relationships between the grouped elements. We also provide attributes that are added to elements to allow authors to specify explicit timing relationships.
For example, SMIL includes the parallel time container, <par>, which allows for several things to be played at once. To play an audio and video clip at the same time (assuming that the language, like SMIL 1.0, includes elements to express playing each of these):
<par>
  <video .../>
  <audio .../>
</par>
Note that the elements could be from some other language, such as XHTML. Here's how two paragraphs of text might be displayed at the same time:
<par>
  <p>First Paragraph.</p>
  <p>Second Paragraph.</p>
</par>
This becomes much more powerful when the elements that are timed can have their own timing-specific attributes. For this discussion, we'll only have two. 'begin' tells when to start displaying something, and 'dur' specifies the duration, or how long to display it. Building off the XHTML example:
<par>
  <p begin="0s" dur="1s">First Paragraph.</p>
  <p begin="2s" dur="1s">Second Paragraph.</p>
</par>
This displays the first paragraph immediately for 1 second, then there is one second where nothing is displayed, and then at time=2s, the second paragraph is displayed for 1 second.
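The timing arithmetic in this example can be written out as a small sketch. It handles only clock values of the form "<number>s" used in the example; the function names are hypothetical and this is not an implementation of SMIL timing.

```python
from typing import List, Tuple

Interval = Tuple[float, float]

def parse_seconds(value: str) -> float:
    """Parse a clock value of the simple form '<number>s'."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported clock value: {value!r}")
    return float(value[:-1])

def active_interval(begin: str, dur: str) -> Interval:
    """Return (start, end) in seconds for an element with begin and dur."""
    start = parse_seconds(begin)
    return (start, start + parse_seconds(dur))

def idle_gaps(intervals: List[Interval]) -> List[Interval]:
    """Return the stretches between consecutive intervals where nothing is shown."""
    ordered = sorted(intervals)
    gaps = []
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > end:
            gaps.append((end, next_start))
    return gaps

paragraphs = [active_interval("0s", "1s"), active_interval("2s", "1s")]
print(paragraphs)             # [(0.0, 1.0), (2.0, 3.0)]
print(idle_gaps(paragraphs))  # [(1.0, 2.0)]
```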
This admittedly very simple example gets to the root of what we need to be able to accomplish with XML Schemas. The <p> element is already defined in a module by XHTML. For a language designer to be able to incorporate or combine timing with elements already defined, we need to be able to extend the attribute set of an element defined in a different module. For this particular example, the language designer needs an XML Schema method of adding the begin and dur attributes defined in a SMIL XML Schema module to the <p> element defined in an XHTML Schema module. It is not sufficient to just be able to declare the members of an attribute set from several places when the element is initially defined.
Please feel free to contact me with any questions you may have about anything that I have said here.
Input from Dan Connolly <connolly@w3.org>:
Dan Connolly <connolly@w3.org> to XML Schema Comments list on Fri, 12 May 2000 17:18:11 -0500
It has recently been brought to my attention that the current draft of XML-Schemas does not provide a mechanism for incrementally building up an attribute set on an element, perhaps from several separate modules containing attributes,
I believe the attribute group mechanism provides this. http://www.w3.org/TR/2000/WD-xmlschema-1-20000407/#Attribute_Group_Definition
and also for adding attributes to elements already declared.
I presume you're talking about this idiom...
"6.1. Defining additional attributes [...] This works because XML permits the definition or extension of the attribute list for an element at any point in a DTD. " http://www.w3.org/TR/2000/WD-xhtml-building-20000105/developing.html#s_dev_attrs
The Schema spec provides an analog of this idiom; if you want your element declarations to allow other attributes this way, just include
<anyAttribute namespace="##any" processContents="strict"/>
cf http://www.w3.org/TR/2000/WD-xmlschema-1-20000407/#element-anyAttribute
I would actually recommend ##other rather than ##any; requiring "mixed in attributes" to be declared as coming from another namespace seems more appropriate than allowing unqualified attributes or attributes from the same namespace.
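The difference between the two wildcard settings, as described in this reply, can be sketched as a small predicate. The function `wildcard_allows` is hypothetical; `None` stands for an unqualified attribute, and the treatment of ##other follows the description given here (neither unqualified nor from the same namespace) rather than any particular draft of the specification.

```python
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"
SMIL_NS = "http://www.w3.org/2000/TR/smil-animation10"

def wildcard_allows(constraint: str, target_namespace: str,
                    attribute_namespace: Optional[str]) -> bool:
    """Decide whether an attribute wildcard admits an attribute's namespace."""
    if constraint == "##any":
        return True
    if constraint == "##other":
        # Only attributes that are qualified and come from another namespace.
        return attribute_namespace is not None and attribute_namespace != target_namespace
    raise ValueError(f"unsupported namespace constraint: {constraint!r}")

# t:begin from the SMIL namespace on an XHTML <p> element.
print(wildcard_allows("##other", XHTML_NS, SMIL_NS))   # True
# An unqualified attribute is admitted by ##any but not by ##other.
print(wildcard_allows("##any", XHTML_NS, None))        # True
print(wildcard_allows("##other", XHTML_NS, None))      # False
```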
This is essential for modularizing SMIL with XML Schemas, and is something that can (and has) already be done with DTD's.
Well... it can be done with a specific set of conventions layered on top of DTDs, using parameter entities. It cannot be done for arbitrary combinations of DTDs. And the XHTML modularization mechanisms are susceptible to name collisions between modules.
cf http://www.w3.org/TR/2000/WD-xhtml-building-20000105/developing.html#s_dev_attrs
Maybe my information is incorrect, or my understanding is faulty, so I am sending this to make you aware of our needs, and asking for guidance in applying XML Schemas in this manner. If my information and understanding are correct, then please consider this a requirement for the SMIL modules to have associated XML Schemas. I am under the impression that XHTML has very similar needs, and my impression is based in part on my exchanges with some of the XHTML folks.
A concrete example will probably make our needs clear.
Yes, thanks.
So I'll discuss our modularization needs in the context of a vastly simplified view of SMIL.
SMIL timing is a set of reusable modules that can be used to incorporate timing relationships into XML languages.
I started drafting a schema for SMIL animation; see: http://www.w3.org/XML/2000/04schema-hacking/smil-animation.xsd, revision 1.1, date: 2000/05/04 19:35:09, aka http://www.w3.org/XML/2000/04schema-hacking/smil-animation.xsd.txt
This language could be "SMIL", or "XHTML", or something else. ...
This admittedly very simple example gets to the root of what we need to be able to accomplish with XML Schemas. The <p> element is already defined in a module by XHTML. For a language designer to be able to incorporate or combine timing with elements already defined, we need to be able to extend the attribute set of an element defined in a different module.
Or, alternatively, you need the definition of the P element to allow attributes declared elsewhere (as noted above, this is the default in XML 1.0)
I mocked up this example... http://www.w3.org/XML/2000/04schema-hacking/h+s.html
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xsi='http://www.w3.org/1999/XMLSchema-instance'
      xmlns:t='http://www.w3.org/2000/TR/smil-animation10'
      xsi:schemaLocation="http://www.w3.org/1999/xhtml html.xsd
                          http://www.w3.org/2000/TR/smil-animation10 smil-animation.xsd">
  <head><title>Example of mixing HTML and SMIL timing</title></head>
  <body>
    <h1>Some stuff</h1>
    <!-- example from http://www.w3.org/TR/NOTE-HTMLplusTIME -->
    <p t:begin="1"> This is a paragraph of text that appears after one second </p>
    <p t:begin="2"> This is a paragraph of text that appears after two seconds </p>
    <p t:begin="3"> This is a paragraph of text that appears after three seconds </p>
  </body>
</html>
(The schemaLocation gobbledygook is only necessary until schemas for XHTML and SMIL are available by dereferencing their namespace identifiers)
then in html.xsd: (i.e. http://www.w3.org/XML/2000/04schema-hacking/html.xsd)
<element name='p'>
  <complexType content='mixed'>
    <anyAttribute namespace="##other" processContents="strict"/>
    [...]
  </complexType>
</element>
and in smil-animation.xsd :
<attribute name='begin' type='string'/>
<attribute name='dur' type='string'/>
<attribute name='end' type='string'/>
<attribute name='restart'>
  <simpleType base='string'>
    <enumeration value='always'/>
    <enumeration value='never'/>
    <enumeration value='whenNotActive'/>
  </simpleType>
</attribute>
<attribute name='repeatCount' type='string'/>
<attribute name='repeatDur' type='string'/>
<attribute name='fill'>
  <simpleType base='string'>
    <enumeration value='remove'/>
    <enumeration value='freeze'/>
  </simpleType>
</attribute>
You can check the results using http://cgi.w3.org/cgi-bin/xmlschema-check i.e. http://cgi.w3.org/cgi-bin/xmlschema-check?docAddrs=http%3A%2F%2Fwww.w3.org%2FXML%2F2000%2F04schema-hacking%2Fh%2Bs.html
I have used qualified names for the "mixed in" attributes, per this principle:
"The syntax must unambiguously associate an identifier in a document with the related schema without requiring inspection of that or another schema." -- http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210#Ambiguity
The schema spec allows you to violate that principle by using <anyAttribute namespace="##any"/> on the P element declaration, and attributeForm="unqualified" in the smil-animation schema, but I don't recommend that.
For this particular example, the language designer needs an XML Schema method of adding the begin and dur attributes defined in a SMIL XML Schema module to the <p> element defined in an XHTML Schema module.
I believe I have demonstrated this above.
It is not sufficient to just be able to declare the members of an attribute set from several places when the element is initially defined.
... To this end, we need confirmation that it can be done with XML Schemas, and guidance in how to proceed.
Does the explanation above provide enough confirmation and guidance?
I've been told to expect a note on modularization using XML Schemas. Will our use cases be covered in that note?
I'm not sure.
Input from Cohen, Aaron M <aaron.m.cohen@intel.com>:
"Cohen, Aaron M" <aaron.m.cohen@intel.com> to XML Schema Comments list on Mon, 15 May 2000 15:31:05 -0700
Dan:
Thanks for taking a look at this. I'll take a closer look at your SMIL Animation Schema later this week when I have a chance.
Meanwhile, I have a question. You integrated the SMIL timing attributes with the "P" element by making your own version of xhtml.xsd that allowed including qualified attribute names from other schemas. I'm not sure that this is a total solution.
However, in my understanding of our discussion of namespaces, I got the impression that redefining elements to include new content and/or attributes and then "putting" those elements in a "new" namespace was okay. So it seems that you can define hybrid languages, it's just that they don't have a "lineage" reflected in a namespace structure, or in an XML Schema structure (because the attributes on an element can only be defined in one place).
This will lead to integrations that need to be done very differently in DTD's vs. XML Schemas. What I mean is that in some cases, the schema will have to repeat a lot of stuff defined in a "base" schema, while the DTD will not. The case in point is XHTML+SMIL, which it seems, will have to repeat the definitions of all of the XHTML-based elements, add the timing attributes (which can be done nicely in attribute groups in the smil modules), and declare them in a new namespace. So while there will be very tight semantics coupling between XHTML and XHTML+SMIL, this coupling will not be reflected in either the namespace or the schema for XHTML+SMIL.
About the best that I think could be done is if the XHTML-WG creates "proto-HTML" types that can be used in the derived languages to define the actual elements.
I think that you can do something like this for timed <a> link's (correct me if I am wrong):
In the XHTML.xsd:
<xsd:complexType content="mixed" name="aType">
  <xsd:attribute name="href" type="xsd:uriReference"/>
  ...other linking attribute declarations
</xsd:complexType>
In the SMIL.xsd:
<xsd:attributeGroup name="smilTimingAttrs">
  <xsd:attribute name="begin" type="xsd:string"/>
  <xsd:attribute name="end" type="xsd:string"/>
  ...other timing attribute declarations
</xsd:attributeGroup>
And then in the XHTML+SMIL.xsd:
<xsd:element name="a" type="xhtml:aType">
  <xsd:attributeGroup ref="smilTimingAttrs"/>
</xsd:element>
Of course, the means to do this has to be set up in XHTML.xsd and SMIL.xsd. I can see us doing this kind of thing for SMIL, but of course the other stuff is up to XHTML, SVG, and the other XML-based languages using XML Schemas.
Input from Dan Connolly:
Meanwhile, I have a question. You integrated the SMIL timing attributes with the "P" element by making your own version of xhtml.xsd that allowed including qualified attribute names from other schemas.
Yes, I expect the "official" XHTML schema to work that way.
I'm not sure that this is a total solution.
However, in my understanding of our discussion of namespaces, I got the impression that redefining elements to include new content and/or attributes and then "putting" those elements in a "new" namespace was okay. So it seems that you can define hybrid languages, it's just that they don't have a "lineage" reflected in a namespace structure, or in an XML Schema structure (because the attributes on an element can only be defined in one place).
Not so: (a) as I say, I expect the XHTML schema to allow anyAttribute namespace="##other" in the first place, and (b) if you did create a new XHTML namespace, you certainly could design it so that it has a discoverable "lineage", i.e. it derived from the canonical XHTML schema.
This will lead to integrations that need to be done very differently in DTD's vs. XML Schemas. What I mean is that in some cases, the schema will have to repeat a lot of stuff defined in a "base" schema, while the DTD will not.
I don't know what leads you to think that; it's not so.
The case in point is XHTML+SMIL, which it seems, will have to repeat the definitions of all of the XHTML-based elements, add the timing attributes (which can be done nicely in attribute groups in the smil modules), and declare them in a new namespace. So while there will be very tight semantics coupling between XHTML and XHTML+SMIL, this coupling will not be reflected in either the namespace or the schema for XHTML+SMIL.
About the best that I think could be done is if the XHTML-WG creates "proto-HTML" types that can be used in the derived languages to define the actual elements.
I certainly expect the HTML WG to use things like <anyAttribute namespace="##other"> to specify that XHTML is extensible.
I think that you can do something like this for timed <a> link's (correct me if I am wrong): ...
Yes, that's another way to design a schema for XHTML+SMIL.
But I hope that we don't use that approach; as you say:
Of course, the means to do this has to be set up in XHTML.xsd and SMIL.xsd. I can see us doing this kind of thing for SMIL, but of course the other stuff is up to XHTML, SVG, and the other XML-based languages using XML Schemas.
I think it's clearly preferable to have one schema for XHTML, one for SMIL, one for SVG, and one for MathML that can be used together in compound documents; rather than one for XHTML+MathML, one for XHTML+MathML+SVG, etc. for a total of N! schemas.
See clarifications above.
How does a schema author go about adding new elements as children of elements declared in a different module?
Input from Aaron M. Cohen:
(Aaron M. Cohen to XML Schema Comments list, 3 April 2000)
The same thing goes for the content model of the elements. It may be necessary to extend the permissible child element set of a module beyond how it was initially defined. SMIL Boston is planning on using "levels" of modules. As the functionality goes up in higher levels, we add some elements to the set of allowed children. As above, we'll also add new attributes to existing elements in order to make them more powerful in the higher level modules.
Furthermore, we will use the above process to define our next version of the SMIL Language, and also to define a language/profile that combines XHTML and SMIL timing structure (as well as some other aspects of SMIL), and to define a baseline "SMIL-Basic". So you can see that the use case that we are asking for is a key element in SMIL being a successful and reusable technology. The DTD experts in the group are comfortable that we can produce DTD's that reflect this structure, and we would also like to have a set XML Schemas before SMIL becomes a Recommendation. To this end, we need confirmation that it can be done with XML Schemas, and guidance in how to proceed. I've been told to expect a note on modularization using XML Schemas. Will our use cases be covered in that note?
Please feel free to contact me with any questions you may have about anything that I have said here.
Formal response to commentator.
Aaron Cohen replies that "Yes, I think that the answer below is sufficient for advance into CR, but I do think that the director should be aware of the difficulties in using XML Schemas for the modular design of reusable technology and "language families" and I'd like to see more experience with that as a requirement of XML Schemas coming out of CR. In particular, it is important that someone write a modular XHTML to show that it can be done in a useful and acceptable manner."
Should XML Schema define a built-in type for XPath?
Input from Curt Arnold:
(Curt Arnold to XML Schema Comments list, 6 April 2000)
It would seem a minimal burden to add a built-in datatype that allows you to declare an attribute (or element content) as conceptually being an XPath. Since XPath is intended to be used across W3C technologies, it would seem that the best place for it would be as a built-in type in Schema instead of every technology that uses it trying to kludge it with their own regular expressions.
<datatype base="string" name="XPath"/>
The difficulty is in the implied validation a schema aware processor is expected to do when it encounters an attribute that uses an XPath or derived datatype (in the same manner the parser is anticipated to validate that a uri or Qname is valid beyond what is in the explicit Schema for Schema definitions). If that seems like too much complexity, you could except conforming processors from doing any implied validation of XPath's. But compared to the overall complexity of Schema, an XPath type validation seems fairly trivial.
Noah_Mendelsohn@lotus.com to XML Schema Comments list, Tue, 18 Apr 2000 11:48:26 -0400
Just my opinion, not speaking for the WG or anyone else, but I think that an XPath datatype would be a fine thing for the XSL workgroup to declare. I think it is a mistake to ask schemas to go too far down the road in baking in every string-type that is motivated by some other W3C spec. Schemas gives other groups the power to create their own target namespaces, and to publish schemas with the appropriate type definitions. As noted below, validation of XPath strings can at best be somewhat loose, but you can easily provide a standard W3C-wide means to express that a string is intended as an XPath. Admittedly, there is a slight circularity in the fact that schemas makes some use of XPath in structures. I would still prefer to do the architecturally correct thing, and get the XSL WG lined up to publish an XPath type if the world needs one. I would like to believe that we could sort out the corresponding trivial change to the schema for schemas during the CR period. In general, groups that own particular namespaces should own the schemas for the corresponding datatypes, I think. Yes, there is room for exceptions for convenience.
Discussed in call of 2000-06-29.
A straw poll showed a preponderance of opinion against defining an XPath data type (4 in favor, 11 opposed).
RESOLVED: to dispose of issue LC-14 by saying no (with rationale). Dissenting: Beech, Biron, Jelliffe, Peterson, Robie, Sperberg-McQueen.
Rationale for the decision: if there should be such a datatype, it should be defined as part of the XPath specification, not as part of XML Schema.
Rationale for the dissent (at least Sperberg-McQueen's): there should be such a datatype, we cannot now arrange for it to be defined in an XPath spec which is already a recommendation, and schema processors are already required to be able to type-check strings purported to be XPath expressions, so there is no additional implementation burden.
Should XML Schema provide a mechanism to allow a schema author to provide an initial or hint value for an element or attribute, which would be used by user interfaces but which unlike a default value would not add anything to the information set?
Input from Curt Arnold:
(Curt Arnold to XML Schema Comments list, 6 April 2000)
I meant to mention this at XTech, but it may be useful to allow for element and attribute a way that a user agent could obtain an initial or hint value for an element or attribute that didn't add anything to the information set. Possibly something like:
<!-- For 'element' and 'attribute' -->
<attributeGroup name="valueConstraint">
  <attribute name="default" type="string"/>
  <attribute name="fixed" type="string"/>
  <attribute name="initial" type="string"/>
</attributeGroup>
When initial is specified, a user agent may use it to provide an initial value for a user interface field, but an XML processor doesn't use it to provide a value if it isn't specified.
<element name="order">
  <attribute name="quantity" type="non-negative-integer" initial="1"/>
  <attribute name="item" type="uri-Reference"/>
</element>
This could result in a Schema-generated UI having a quantity field initialized to 1, but would not result in:
<order item="http://www.buyme.com/item.xml?5555"/>
being interpreted as having a quantity of 1.
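The distinction between a default and the proposed hint can be sketched as follows. This is a minimal illustration, not a description of any schema processor: `infoset_attributes` and `ui_prefill` are hypothetical names, and declarations are modelled as plain dictionaries.

```python
def infoset_attributes(instance_attrs, declarations):
    """Attributes a processor would report: instance values plus schema
    defaults. The proposed 'initial' hint is deliberately ignored here."""
    result = dict(instance_attrs)
    for name, decl in declarations.items():
        if name not in result and "default" in decl:
            result[name] = decl["default"]
    return result


def ui_prefill(declarations):
    """Values a form-generating user agent might pre-populate."""
    prefill = {}
    for name, decl in declarations.items():
        if "initial" in decl:
            prefill[name] = decl["initial"]
        elif "default" in decl:
            prefill[name] = decl["default"]
    return prefill


# The 'order' example: quantity carries only the hypothetical initial hint.
order_decls = {"quantity": {"initial": "1"}, "item": {}}
instance = {"item": "http://www.buyme.com/item.xml?5555"}
```

With these declarations the user interface would start with a quantity of 1, while the instance that omits `quantity` is still reported without one.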
Explicitly supporting this hint in XML Schema might make things a little cleaner with XForms which is using default to mean what I called initial.
Should the all group allow occurrence indicators with maxOccurs > 1?
Cf. Contents which may occur in any order
Input from Martin J. Duerst:
(Martin Duerst to XML Comments list, 7 April 2000)
On the XML schema side, if it's currently not possible to express arbitrary order with occurrence constraints, that may be a problem independent of whether P3P needs it; I'm sure there are other uses where this is a requirement.
Input from Yuichi Koike:
"Yuichi Koike" <koike@w3.org> to XML Schema Comments list, Tue, 18 Apr 2000 14:00:30 -0400
Though P3P WG decided to have fixed the element order in P3P 1.0 spec, we would like XML schema to have the ability to express arbitrary element order in a compact form.
And the answer to the following question is "Yes".
At 00/04/04 22:12 +0100, Henry S. Thompson wrote:
If what you want is arbitrary order, just what do you mean by that, e.g. , is the following OK?
<extension>...</extension>
<statement>...</statement>
<disclosure>...</disclosure>
<statement>...</statement>
<extension>...</extension>
<statement>...</statement>
The above question is pressing; if you want the WG to consider "arbitrary order with occurrence constraints", we really need clear input on this.
Discussed in call of 2000-07-20.
The question is whether to allow maxOccurs > 1 inside an all-group. If so, do we require that the occurrences of a given element type be contiguous (as in SGML) or not (counting just the overall number of occurrences of the type)?
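The two readings of maxOccurs > 1 inside an all-group can be compared with a small sketch. The occurrence limits below are invented for illustration; only the element sequence is taken from the example quoted above.

```python
from collections import Counter
from itertools import groupby


def valid_by_count(children, limits):
    """Non-contiguous reading: only the total number of each type matters."""
    counts = Counter(children)
    if any(name not in limits for name in counts):
        return False
    return all(lo <= counts.get(name, 0) <= hi for name, (lo, hi) in limits.items())


def valid_contiguous(children, limits):
    """Contiguous reading: the counts must fit, and all occurrences of a
    given element type must be adjacent to one another."""
    runs = [name for name, _ in groupby(children)]
    return len(runs) == len(set(runs)) and valid_by_count(children, limits)


# Hypothetical (minOccurs, maxOccurs) pairs, chosen only for this example.
limits = {"extension": (0, 2), "statement": (1, 3), "disclosure": (1, 1)}
sequence = ["extension", "statement", "disclosure",
            "statement", "extension", "statement"]
```

The quoted sequence passes the count-only check but fails the contiguous one, which is exactly the case the question asks commentators to decide.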
RESOLVED: to close LC-16 with polite no. Dissenting: Peterson.
Rationale: complexity, the fact that the interpretation usually desired is incompatible with that of SGML's ampersand connector, and the feeling on the part of some WG members that this is not a pattern of document design to be recommended or supported. Formal response to commentator.
Martin Duerst replies that he is "not at all satisfied".
Should the datatypes spec give a formal definition in BNF or EBNF (or some similar formalism) for the regular-expression language?
Input from TAMURA Kent:
(TAMURA Kent to XML Schema Comments list, 10 April 2000)
I have some comments on the regular expressions section in the last call draft.
Re: The entire It is hard to know concrete syntax of the regular expression from the draft. I want readable rules like BNF.
Input from Alexander Falk:
Alexander Falk to XML Schema Comments list, 11 April 2000
I would certainly also hope to see a compact EBNF description for the Regular Expressions in the final draft - for the time being I have created my own condensed version for use in our own development, which I'll gladly share with you:
regExp ::= branch ('|' branch)* >Regular Expression (branch|branch|...)
branch$ ::= piece+ >Branch (piece+)
piece$ ::= atom quantifier? >Piece (atom quantifier?)
quantifier$ ::= [?*+] | ( '{' quantity '}' ) >Piece quantifier (? | * | + | {quantity})
quantity$ ::= quantRange | quantMin | QuantExact >Numeric quantity
quantRange$ ::= QuantExact ',' QuantExact >Quantity range {n,m}
quantMin$ ::= QuantExact ',' >Minimum quantity {n,}
QuantExact$ ::= [0-9]+ >Exact quantity {n}
atom$ ::= Char | charClass | ( '(' regExp ')' ) >Atom (char | charclass | (regexp))
Char$ ::= [^.\?*+()|#x5B#x5D] >Normal character (any non-metacharacter)
charClass ::= charClassEsc | charClassExpr >Character class (escape | expression)
charClassExpr$ ::= '[' charGroup ']' >Character class expression ( [charGroup] )
charGroup ::= negCharGroup | posCharGroup | charClassSub >Character group
negCharGroup$ ::= '^' posCharGroup >Negative character group
charClassSub$ ::= ( posCharGroup | negCharGroup ) '-' charClassExpr >Character class subtraction
posCharGroup$ ::= ( charRange | charClassEsc )+ >Positive character group (character range | character class escape)+
charRange$ ::= seRange | XmlCharRef | XmlChar >Character range (XML character|s-e range)
seRange$ ::= charOrEsc '-' charOrEsc >s-e character range
charOrEsc$ ::= XmlChar | SingleCharEsc >XML character or single-character escape
XmlChar$ ::= [^\#x2D#x5B#x5D] >XML character (all except \[])
XmlCharRef ::= ('&#' [0-9]+ ';') | ('&#x' [0-9a-fA-F]+ ';') >Character-Reference (&#217; or &#xEA;)
charClassEsc ::= ( SingleCharEsc | MultiCharEsc | catEsc | complEsc ) >Character class escape
SingleCharEsc ::= '\' [nrt\.?*+()|{}#x2D#x5B#x5D#x5E] >Single character escape
MultiCharEsc ::= '.' | ('\' [sSiIcCdDwW]) >Multi-character escape
catEsc$ ::= '\p{' charProp '}' >Category escape
complEsc$ ::= '\P{' charProp '}' >Category escape complement
charProp$ ::= Letters | Marks | Numbers | Punctuation | Separators | Symbols | Other | IsBlock >Unicode character property
IsBlock ::= 'Is' [a-zA-Z]+ >Unicode block name
Letters ::= 'L' [ultmo]? >Unicode letters category
Marks ::= 'M' [nce]? >Unicode marks category
Numbers ::= 'N' [dlo]? >Unicode numbers category
Punctuation ::= 'P' [cdseifo]? >Unicode punctuation category
Separators ::= 'Z' [slp]? >Unicode separators category
Symbols ::= 'S' [mcko]? >Unicode symbols category
Other ::= 'C' [cfson]? >Unicode other category
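As a rough check of how such a grammar reads in practice, the quantifier productions above (quantifier, quantity, quantRange, quantMin, QuantExact) can be transcribed into a Python regular expression. This is a sketch covering only that fragment of the condensed grammar; `is_quantifier` is a hypothetical helper, and Python's own regex dialect is used merely as the implementation vehicle.

```python
import re

# One pattern fragment per production of the condensed grammar.
QUANT_EXACT = r"[0-9]+"                      # QuantExact ::= [0-9]+
QUANT_RANGE = QUANT_EXACT + "," + QUANT_EXACT  # quantRange ::= QuantExact ',' QuantExact
QUANT_MIN = QUANT_EXACT + ","                # quantMin ::= QuantExact ','
QUANTITY = "(?:%s|%s|%s)" % (QUANT_RANGE, QUANT_MIN, QUANT_EXACT)

# quantifier ::= [?*+] | ( '{' quantity '}' )
QUANTIFIER = re.compile(r"(?:[?*+]|\{%s\})" % QUANTITY)


def is_quantifier(text):
    """Return True if text is a complete quantifier under the grammar."""
    return QUANTIFIER.fullmatch(text) is not None
```

Strings such as `?`, `{3}`, `{2,}` and `{2,5}` are accepted, while `{,5}` and `{}` are rejected because the grammar offers no production with an omitted minimum.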
Should the discussion of regular expressions explain how to use character-class subtraction?
Input from TAMURA Kent:
(TAMURA Kent to XML Schema Comments list, 10 April 2000)
Re: Character class subtraction E.1:
[Definition:] A character class subtraction is a character class expression subtracted from a positive character group or negative character group, using the - character.
This definition does not explain how to use '-'. The next paragraph says "G-C is a valid character class subtraction", but there is no restriction on other usages of '-', like "GC-", "-GC" :-)
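The intended meaning of a subtraction G-C (characters in G that are not in C) can be illustrated in Python, whose `re` module has no subtraction syntax of its own. The sketch below emulates it with a negative lookahead; `subtract_class` is a hypothetical helper, and this says nothing about how the draft's syntax itself should be parsed.

```python
import re


def subtract_class(group, subtracted):
    """Build a Python pattern matching one character that is in `group`
    but not in `subtracted`; both arguments are bracket expressions."""
    return "(?!%s)%s" % (subtracted, group)


# Lowercase ASCII letters minus the vowels.
consonant = re.compile(subtract_class("[a-z]", "[aeiou]"))
```

Here `consonant` matches `b` but neither `e` (subtracted) nor `B` (never in the group).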
Should the datatypes spec be revised in order to make the minus sign unambiguous in regular expressions?
Input from TAMURA Kent:
(TAMURA Kent to XML Schema Comments list, 10 April 2000)
Re: '-' in character range A '-' in a character class has many meanings. So, interpretation of '-' can be ambiguous. For example:
[+--/]
We can interpret this character class as:
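The following sketch spells out one plausible pair of readings of this class. The two readings are supplied here only for illustration and are not taken from the original posting; `expand` is a hypothetical helper.

```python
def expand(items):
    """Expand single characters and (low, high) ranges into a set."""
    chars = set()
    for item in items:
        if isinstance(item, tuple):
            low, high = item
            chars.update(chr(c) for c in range(ord(low), ord(high) + 1))
        else:
            chars.add(item)
    return chars


# Reading A: the range '+'..'-' followed by a literal '/'.
reading_a = expand([("+", "-"), "/"])
# Reading B: a literal '+' followed by the range '-'..'/'.
reading_b = expand(["+", ("-", "/")])
```

The two sets differ in exactly two characters: reading A contains `,` and reading B contains `.`.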
Should the definition of multi-character escape in regular expressions be revised to avoid giving the impression that &#x0000; and &#xFFFF; are valid character references in XML?
Input from TAMURA Kent:
(TAMURA Kent to XML Schema Comments list, 10 April 2000)
Re: Definition of multi-character escape: "\w" is defined as "[&#x0000;-&#xFFFF;]-[\p{P}\p{S}\p{C}]", but both &#x0000; and &#xFFFF; are invalid character references in XML. I don't know whether characters in &#x10000;-&#x10FFFF; should be in "\w".
Should XML Schema define, or allow the definition of, non-Gregorian date types?
Input from David RR Webber:
(David RR Webber to XML Schema IG, 2 May 2000 -- member only)
[This is of immense importance: many people live their lives organized around non-gregorian calendars. A neutral date-time code (e.g. days and seconds since some epoch) would solve the problem.]
This issue was subsumed by a proposal to introduce abstract simple types (including an abstract date type), from which schema authors could derive concrete types with variant lexical forms. The abstract-type proposal was raised at the Edinburgh face to face (June 2000), discussed extensively in email (the i18n WG in particular was strongly opposed to it), adopted on the basis of a task-force proposal at the Redmond meeting (August 2000), and then rejected after the editors reported difficulties integrating it into the specification.
The net result is that there is no provision, in XML Schema 1.0, for the definition of dates using calendars other than the Gregorian, or lexical forms other than that of ISO 8601. Many members of the WG were sympathetic to the goal of allowing each of these, but the WG as a whole was of the opinion that such provisions, with the design work necessary to support them, were better left for a later version of XML Schema, and that the abstract-type proposal caused too many undesirable side effects in our type system to be introduced in XML Schema 1.0.
Should a glossary for XML Schema be prepared before the spec is promoted to CR? to PR?
Input from Murray Altheim:
(Murray Altheim to XML Schema IG, 5 May 2000 [member only])
[A glossary would be a big help.]
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:14 -0400
Glossary; and Model Groups, Model Group Definitions, and Element Declarations
I find the distinction between these things confusing, perhaps it could be simplified or more text could be spent on describing how these things are different. Actually, I look forward to the glossary being completed as this will help me in understanding the specification. See http://lists.w3.org/Archives/Public/xmlschema-dev/2000Apr/0021.html for more:
Should the namespace for XML Schema use the date 1999 or the date 2000?
Cf. XML Schema Namespace versioning
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
I was studying the new April 7 version of the XML Schema working draft throughout the weekend, as we are in the process of finalizing the beta 3 version of XML Spy 3.0 (see http://www.xmlspy.com/version30.asp), and I have a first list of comments and questions - especially regarding the changes to the datatypes (part 2).
Part 1 - Structures
A. Schema for Schemas
Why does the Public Identifier URN for the DOCTYPE statement still use 19991216 as its date, when the DTD for Schemas (Appendix B) is v1.1 dated 2000/04/06? This Public Identifier URN seems to imply that the Schema for Schemas is itself written in compliance with the old December 1999 XML Schema draft, which it is not.
Along the same lines: the year in the XML Schema namespace URI is also still fixed at 1999. Is that going to change for the final recommendation? While it is understandable from an implementor's point of view that the URN should remain constant over the time of the draft and recommendation creation, it would IMHO be rather confusing for all future schema authors if the date given here is not identical to the date of the final recommendation.
Discussed in call of 2000-07-13.
See issue LC-73 for discussion.
Formal response to commentator.
Duplicate formal response to commentator.
Commentator replies (by private mail) that "Yes, this is very much appreciated and it gives us - as a schema editor developer - the option to detect/support both old (April 7) and new (Sep 22) style schemas and to 'upgrade' them as well."
Should the Tabulation of Changes be revised or dropped in future versions of the structures spec?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
G. Tabulation of Changes The comments in this list are not very useful at all. Compared with "H Revisions from previous draft" in Part 2, which is ideal for implementors and saves us the burden of re-reading the entire Specs again and again, the list of changes in Part 1 is too minimal. Comments like "Lots of edits" or "more from Noah" are simply not comprehensible without the background that only insiders of the WG can have. Please provide a more meaningful change history in the future (or none at all).
Should the pattern facet be dropped from the boolean type?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
Part 2 - Datatypes
3.2.2.2 Constraining facets on boolean datatype Other than specifically restricting the lexical space to either {0,1} or {true, false} for a certain schema, what is the intention of allowing a pattern facet for booleans?
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:44:22 -0400
You could specify a pattern that only allowed "0", for example.
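The effect described in this reply can be sketched by treating a pattern as a filter on the lexical space. The four-literal lexical space below is an assumption for illustration (the question above mentions both {0,1} and {true, false}); `restricted_lexical_space` is a hypothetical helper.

```python
import re

# Assumed lexical space of boolean, for illustration only.
BOOLEAN_LEXICAL_SPACE = {"true", "false", "0", "1"}


def restricted_lexical_space(pattern):
    """Return the boolean literals that match the whole pattern."""
    compiled = re.compile(pattern)
    return {lit for lit in BOOLEAN_LEXICAL_SPACE if compiled.fullmatch(lit)}
```

A pattern of `0` leaves a single permitted literal, and `true|false` excludes the numeric forms.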
Should the pattern facet be dropped from the binary type?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
3.2.8.1 Constraining facets on binary datatype As binary currently only offers two different encodings that specify the respective lexical spaces, defining a pattern facet on binary doesn't make much sense - other than e.g. restricting the letters a-f to uppercase-only or lower-case only. However, with base64 the alphabet is strictly defined in the RFC. To answer the question contained in the Ed.Note of this chapter, I would, therefore, suggest to omit the pattern facet here from an implementor's standpoint, as its benefits are rather limited and the potential confusion would be worse.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:44:22 -0400
The utility of "pattern" is questionable for some datatypes. Thanks for your feedback.
Should the datatypes spec define multiple lexical spaces for floating-point and decimal numbers?
Cf. Allow hex notation for integers?
Cf. Integers should not allow non-significant leading or trailing zeroes
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
3.2.3 - 3.2.5 Lexical notation of floating-point numbers While it is very nice from an implementor's standpoint to know that all sorts of float, double, or decimal numbers will only use the period as a decimal separator, I wonder if this is really satisfying for many European and other non-US users. Specifically, when XML is being used to supplant existing systems, it is often necessary to interpret floating-point or decimal numbers with other decimal separators (most notably ',') and in some cases also including thousands separators (e.g. 4,560,758.99 vs. 4.560.758,99). Why is there no means provided to support these formatting styles in the XML schema draft? Just like the encoding facet for binaries, this "formatting" or "picture" facet (to use an old COBOL-coined term that was also suggested in the DCD submission to the W3C in July 1998) could be used to specify the various aspects of the lexical space of these datatypes. If we were to consider XML schemas for B2B e-Commerce scenarios only, it would be understandable to only allow one format that can be easily processed - but XML schemas should be thought of in much broader terms.
Input from Dario de Judicibus:
"Dario de Judicibus" <ddj@mclink.it> to XML Schema Comments list, Tue, 25 Apr 2000 23:12:02 +0200
Similarly we might define a new facet for decimal, dates, and other locale based data types, to support locale format. For example
<xsd:simpleType name="italianDecimal" base="xsd:decimal" derivedBy="xsd:format">
  <xsd:locale value="IT-it" />
</xsd:simpleType>
The derivation by format means that we do not restrict the scope of type, but we work on the lexical representation.
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Wed, 26 Apr 2000 07:56:59 -0500
Localized numerical representations: The lexical representations were chosen for their unambiguity. For instance, the timeInstant format is an undesirable presentation format in all locales. In general it is assumed that localization of presentation would be done in transformations or applications. Definitely, wanted to avoid the case of not being able to determine (or worse guessing) whether a comma was a digit separator or a decimal point. You could create an Italian decimal as a derived class from string, however min/max would be interpreted as string comparisons.
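The last point, that bounds on a string-derived type would compare as strings, can be seen in a short sketch. The helper `italian_to_decimal` is hypothetical and assumes '.' as the grouping separator and ',' as the decimal separator.

```python
from decimal import Decimal


def italian_to_decimal(text):
    """Convert a literal such as '1.200,50' to a Decimal (assumed convention:
    '.' groups digits, ',' marks the decimal point)."""
    return Decimal(text.replace(".", "").replace(",", "."))


low, high = "9,5", "10,0"
string_order = low < high                                          # character by character
numeric_order = italian_to_decimal(low) < italian_to_decimal(high)  # by value
```

As strings, `9,5` sorts after `10,0` because `9` is greater than `1`, so a string-based minimum or maximum would give the wrong answer for these values.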
Input from Dario De Judicibus:
dejudicibus@it.ibm.com to XML Schema Comments list, Thu, 27 Apr 2000 12:35:14 +0200
Thank you for your reply, Curt. I still have a doubt about the localization issue, anyway. You said:
"Definitely, wanted to avoid the case of not being able to determine (or worse guessing) whether a comma was a digit separator or a decimal point."
I understand your point. However I am wondering the following. Let us suppose that you defined a language for legal documents which states that numbers are xsd:decimal and dates are in user-defined typical US format (MM/DD/YY). An Italian company publishes in its site a contract for web ordering of products. The contract uses that language and contains a price EUR 8.500 and a date 03/04/01. For that company, the price is eight thousand and five hundreds euros, and the date is April 3rd, 2001, but for an American customer price is eight euros and fifty cents, and date is March 4th, 2001. It would be very useful if the browser would be able to automatically convert those value to the current locale, that is the locale of customer. This is possible anyway, only if the webmaster specified in the document the locale in which those values had been written.
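The price ambiguity in this scenario can be reproduced with a small sketch. `parse_number` is a hypothetical helper whose separator arguments stand in for the locale information that the posting says must travel with the document.

```python
from decimal import Decimal


def parse_number(text, decimal_sep, group_sep):
    """Parse a number literal given explicit decimal and grouping separators."""
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


us_reading = parse_number("8.500", decimal_sep=".", group_sep=",")
italian_reading = parse_number("8.500", decimal_sep=",", group_sep=".")
```

The same literal yields 8.5 under the US convention and 8500 under the Italian one, so without a declared locale a reader cannot tell which was meant.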
If you fix the format of decimal in XML Schema, you force me to use US-like format in Italian pages, or not use your language at all. That is, instead of
<product partNumber="AS45">
  <description>Ink-jet Printer AS45</description>
  <price currency="EUR">1200.00</price>
</product>
which contains a US-form price, I will have to use
<product partNumber="AS45">
  <description>Ink-jet Printer AS45</description>
  1200,00 EUR
</product>
hoping that product content is defined as mixed. I would prefer
<product partNumber="AS45">
  <description>Ink-jet Printer AS45</description>
  <price currency="EUR" xsi:locale="IT-it">1200,00</price>
</product>
As you can see, there is no need to change the meaning of decimal, but rather adding a new attribute for XML languages, and add in xsd:complexType a new property called xsd:localisable or something like that:
<xsd:element name="price">
  <xsd:complexType base="xsd:decimal" derivedBy="xsd:extension" localisable="true">
    <xsd:attribute name="currency" type="IsoCurrencyCodes" />
  </xsd:complexType>
</xsd:element>
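The ambiguity described in this posting can be made concrete with a short sketch. It is Python, used only for illustration; the function names and the two convention tags "us" and "it" are hypothetical and do not correspond to the proposed xsi:locale attribute or to any real locale library.

```python
from datetime import date, datetime
from decimal import Decimal


def read_price(text: str, convention: str) -> Decimal:
    """Interpret a price string under a hypothetical 'us' or 'it' convention."""
    if convention == "it":
        # Italian usage: dot groups digits, comma is the decimal point.
        return Decimal(text.replace(".", "").replace(",", "."))
    # US usage: comma groups digits, dot is the decimal point.
    return Decimal(text.replace(",", ""))


def read_date(text: str, convention: str) -> date:
    """Interpret NN/NN/YY as day/month (it) or month/day (us)."""
    fmt = "%d/%m/%y" if convention == "it" else "%m/%d/%y"
    return datetime.strptime(text, fmt).date()


price_it = read_price("8.500", "it")
price_us = read_price("8.500", "us")
date_it = read_date("03/04/01", "it")
date_us = read_date("03/04/01", "us")

print(price_it, price_us, date_it, date_us)
```

The same two strings yield different values depending on the assumed convention, which is why either a single lexical form or an explicit locale marker is needed.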
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:44:22 -0400
This argument was made by several people but there was a strong sentiment for a single lexical representation.
Discussed in call of 2000-07-27.
Discussion made clear that this issue is tied up with both LC-21 and LC-220.
The entire cluster of issues was discussed at the face to face meeting of 1 August 2000, and the proposal for abstract types (intended to support the definition of multiple lexical spaces) was discussed again at the face to face meeting of 1 September 2000.
The net result is that there is no provision for multiple distinct lexical spaces for the same value space in XML Schema 1.0. There was some sentiment in the WG in favor of supporting this facility at some point, but the abstract-simple-types proposal which was intended to lay the groundwork was judged, in the end, to have too many problems and raise too many difficult design choices to allow it to be included in XML Schema 1.0.
Formal response to commentator. Falk replies (privately) "Overall, I think that the WG resolution is reasonable and acceptable, because it turns out that in the real world, XML instance documents will mostly be generated by software and received by software and as such it is desirable to only have one lexical representation. Furthermore, most human beings will not want to read the XML instance documents in their 'raw' form, but will probably view the output of some XSLT transformation or other processing of the XML data into a presentation form suited for the target audience."
Should the list of constraining facets omit facets which have fixed values for all members of a type? Should facets which have fixed values for all members of a type be signaled in some way? Should such facets appear in the post-schema-validation infoset?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
3.3 A general question concerning constraining facets in derived types: Most of the derived datatypes have certain facets that distinguish them from the primitive types. However, each one of the derived types still lists the very facets that were used to generate it from the primitive types in its list of applicable constraining facets. Consider the case of recurringDay, which is derived from recurringDuration by fixing the duration facet with "PT24H" and the period facet with "P1M". This type still lists duration and period as possible constraining facets - yet they are absolutely fixed by the very definition of recurringDay. How should a validating processor treat a new type derived from recurringDay that actually tries to use one of these facets in its definition? I see two possible solutions to this dilemma:
Input from Martin Bryan:
"Martin Bryan" <mtbryan@sgml.u-net.com> to XML Schema Comments list, Thu, 13 Apr 2000 12:15:46 +0100
Another, unrelated, problem concerns your listing of scale as a valid facet for integer based derived datatypes. Under what conditions is it valid to specify a scale property for an integer?
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:44:22 -0400
The facets that have been given values during the refinement process cannot be changed. They are included in the post-validation infoset because their actual values may be useful in some cases.
Should the lexical form for recurringDay be changed from ---DD to ----DD?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
3.3.29.1 Lexical representation of recurringDay If this is a left truncated ISO-8601 day, then it should be ----DD, not ---DD.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:44:22 -0400
ISO 8601 says that the definition in the document is correct.
Should the file part2.xsd be changed so that it can be fetched and displayed without complaint by common software?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
A. Schema for Datatype Definitions The part.xsd schema document includes the namespace "http://www.w3.org/XML/1998/namespace" from a schemaLocation "../structures/xml.xsd" yet I was unable to locate this file on the W3C web-server. Can you please provide a URL that will allow me to access the xml.xsd file?
Input from Curt Arnold:
"Arnold, Curt" <Curt.Arnold@hyprotech.com> to XML Schema Comments list, Wed, 12 Apr 2000 10:32:39 -0600
Microsoft IE5 complains about xmlns:x not being fixed in the following line in Part2.xsd:
<!ATTLIST element xmlns:x CDATA #IMPLIED> <!-- keep this schema XML1.0 valid -->
My interpretation is that the parser's behavior is well-intentioned but wrong. However since the line is in the DTD for compatibility to begin with, changing the line to:
<!ATTLIST element xmlns:x CDATA #FIXED "http://www.w3.org/XML/1998/namespace"> <!-- keep this schema XML1.0 valid -->
should preserve compatibility with the most ubiquitous XML parser.
Should single-file archives of the spec, the DTD file(s), the stylesheets, and the XSD files be provided in future published versions of XML Schema?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
Would it be possible for a future draft or the final recommendation to include one downloadable archive file (ZIP, gzip, or any other common formats) that includes all required files in one neat package (i.e. the specs and their respective DTDs and XSL files plus the non-normative Schema DTDs, XSDs, and any other required file).
Should {,m} be defined as a shorthand for {0,m} in the syntax of regular expressions?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
E. Regular Expressions From an implementor's position I don't see why defining {,m} as a shorthand form of {0,m} would be a problem. It would seem logical to add this, now that {n,} is allowed. I don't think it is relevant whether or not Perl includes such a quantifier. If it is more consistent and could potentially help schema authors, then it should be added.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:44:22 -0400
These are good suggestions. We decided to stick closely to Perl for the sake of consistency.
Discussed in call of 2000-06-29.
RESOLVED: to dispose of issue LC-32 by saying no (with rationale as described by Matt Timmermans in his email). Dissenting: Peterson (Rationale: the shorthands should be added)
Formal response to commentator. Commentator replies by private mail that he finds the rationale acceptable.
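For readers who want to see what the proposed shorthand would mean, the sketch below uses Python's re module, which happens to accept {,m} as a synonym for {0,m}. This illustrates the proposed semantics only; it says nothing about what XML Schema or Perl accept, and the helper name and sample strings are assumptions.

```python
import re


def same_matches(pattern_a: str, pattern_b: str, samples: list[str]) -> bool:
    """Return True if both patterns accept exactly the same sample strings."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in samples
    )


samples = ["", "a", "aa", "aaa", "aaaa"]

# In Python's dialect, a{,3} and a{0,3} accept the same strings.
shorthand_equivalent = same_matches(r"a{,3}", r"a{0,3}", samples)

# Four repetitions exceed the upper bound in both spellings.
too_long = re.fullmatch(r"a{,3}", "aaaa")

print(shorthand_equivalent, too_long)
```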
Should {0,0} be removed from the table of regular expression syntax? from the regular expression syntax itself?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
Along these same lines: I doubt that there is any meaningful use for {0,0} apart from effectively "commenting out" the preceding atom. Furthermore, {0,0} could then potentially be written as {,} which is even more confusing. Apart from being a logical consequence of the {n,m} quantifier, what was the reason for adding {0,0} to the table as a separate line?
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:44:22 -0400
These are good suggestions. We decided to stick closely to Perl for the sake of consistency.
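The "commenting out" effect of {0,0} mentioned in the comment can be checked with a small sketch. Python's re module is used here only as a convenient stand-in for a regular-expression engine that supports {n,m}; the patterns are made-up examples.

```python
import re

# With {0,0} the atom "b" may occur zero times and no more, so it is effectively removed.
without_b = re.fullmatch(r"ab{0,0}c", "ac")
with_b = re.fullmatch(r"ab{0,0}c", "abc")

print(without_b is not None, with_b is not None)
```

Under this reading the pattern behaves exactly like "ac", which is the sense in which the quantifier comments out the preceding atom.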
Should a single-character escape for vertical bar (\|) be defined as part of the regular-expression syntax?
Input from Alexander Falk:
("Falk, Alexander" <falk@icon.at> to XML Schema Comments list, 10 April 2000)
Another problem: it is currently impossible to define a pattern that uses the vertical bar '|' as a character, because this is defined as a separator between branches, and there is no single character escape defined for \|. The only workaround is to include the vertical bar inside of a positive character group in a character class escape: [|]. Wouldn't it be better (i.e. more consistent) to add \| as a single char escape?
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:44:22 -0400
These are good suggestions. We decided to stick closely to Perl for the sake of consistency.
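The workaround described in the comment, and the escape it asks for, can be compared in a short sketch. Python's re module is used for illustration; it accepts both spellings, which need not be true of the XML Schema regular-expression syntax under discussion.

```python
import re

text = "a|b"

# Unescaped, the bar separates two branches ("a" or "b"), so the whole string does not match.
as_branches = re.fullmatch(r"a|b", text)

# Workaround from the comment: put the bar in a character group.
in_group = re.fullmatch(r"a[|]b", text)

# Requested form: a single-character escape for the bar.
escaped = re.fullmatch(r"a\|b", text)

print(as_branches is None, in_group is not None, escaped is not None)
```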
Is there a typo in section 5 of the Primer?
Input from David Wang:
David Wang <dwang@mitre.org> to XML Schema Comments list, Mon, 10 Apr 2000 16:06:09 -0400
I think the example for xmlschema-0/Section 5. Advanced Concepts III: The Quarterly Report has a small typo in either 4Q99.xml or report.xsd because the XML uses
<regions>
  <zip code="95819"> ...
while the schema says it should be
<regions>
  <zipcode code="..."> ...
I think the schema is the one that has the typo.
Input from Ray Gates:
Ray_Gates@manulife.com to XML Schema Comments list, Tue, 18 Apr 2000 16:17:26 -0400
In section 5 Advanced Concepts III: The Quarterly Report, in the listing of report.xsd:
under
<complexType name="RegionsType">
  <element name="zipcode" ....
If I am reading this correctly, this should be
<element name="zip" ....
to be consistent with references to this element.
XML Schema should be revised to take better account of the needs of eBusiness. The project should be put on hold for three months to enable a review in the context of ebXML technical requirements. It should adopt a three-tier design. It should be subjected to field testing for six months before formal adoption. A test suite should be provided to help ensure consistency across implementations.
Input from David RR Webber:
David RR Webber <Gnosis_@compuserve.com> to XML Schema Comments list, Tue, 11 Apr 2000 11:34:51 -0400
... the current W3C Schema work is fundamentally flawed and does not provide a functional system that can support broad eBusiness interchanges in the same way that X12 and EDIFACT have provided for EDI. There are too many omissions, too many shortcomings and not enough regard to basic usability.
... The requirements fail to encompass what is now transpiring with ebXML, eCo and eSpeak to name some. Then a simple review of industry initiatives such as FIXML, wfmXML, RosettaNet, show the need for eBusiness directed mechanisms that are just not being addressed. Again, the requirements for Schema were written nearly two years ago - it's time to address this and revisit the requirements in the context of 2000/2001 and eBusiness needs.
All this is documented at http://www.bizcodes.org/eDTD/xml-eDTDWP.htm.
1) Moratorium of 3 months to allow Schema Specifications to be re-visited, particularly in the context of ebXML technical requirements.
2) 3 Tier syntax strategy to be adopted that allows hierarchy of representational levels -
3) Field testing for 6 months prior to formal adoption, with selection of industry groups providing evaluations, not just a set of vendors.
4) Interoperability test-suite development to ensure consistency.
Of course there are issues with this - but nothing that cannot be resolved by setting working parameters and putting together a cross-management group to oversee the technical work. We have plenty of precedence for this with efforts like RosettaNet and the standards groups X12, HL7, EDIFACT, et al. Yes this takes time, but this is one instance when that's exactly the path we should be taking now.
I'd rather have an objective Schema system that has been developed with broad involvement, rather than spending the next several years fixing up an inadequate system.
History teaches us that EDIFACT's semantics are much cleaner than X12's because X12 is a hodgepodge that evolved in an ad hoc fashion. Right now we are looking at history repeating itself - and ebXML is our one bright hope to ensure that two years from now we're not looking at a dozen variants of XML/edi - all of which are not interoperable.
Input from Bruce Peat:
("Bruce Peat" <BPeat@eProcessSolutions.com> to XML Schema Comments list, Wed, 12 Apr 2000 16:06:53 -0400)
As to the 'three month timeframe', I think the members of the W3C should decide the schedule. This hierarchy approach should allow the working group to concentrate on what constitutes the 'base', and consider what best makes sense for the other 'representational levels'. As you suggest, this would give us more time to work and prove the extensions before going to recommendation status following a period of time after we have had a chance to begin implementation using the base recommendation.
This approach I think would be accepted with open arms in the community. For a recommendation with a larger scope and without proper constraints for exchange would force a subset of the specification to be used in industry and will keep the various non-interoperable implementations on other critical items. IMHO: The W3C decision here could either save or cost industry billions of dollars over the next few years.
Input from David RR Webber:
David RR Webber <Gnosis_@compuserve.com> to XML Schema Comments list, Wed, 12 Apr 2000 16:45:03 -0400
The WG believes this is out of scope. Formal response.
Commentator's answer: suggestion has been overtaken by events.
How does the schema author define a multi-field key?
Input from Ani Pedersen:
Ani Pedersen <APeders@plexus.ca> to XML Schema Comments list, Tue, 11 Apr 2000 14:22:40 -0700
I have just started digging into XML and I need some help regarding multiple field keys.
The new XML schema - Structures does not give any solution in how to implement multiple field keys. The only comment in section 3.10 is a note that mentions that is not supported by xsl:key.
Is there an alternative way of defining multi-field keys? A workaround?
Maybe I am missing something?
This is an extract of what I am working on and I need to define two fields as key values (they have to be together). Unfortunately the structure I came up with allows me to indicate that they both should be present and in that order (with group) and that both are keys. However, here I indicate that they are independent keys and I don't want that. Is there a way of restricting this looseness?
<element name="primaryCustomer" type="PrimaryCustomer" >
  <complexType name="PrimaryCustomer">
    <element ref="accountNumber" minOccurs = "0"/>
    ......
    <group name = "customerKey" >
      <sequence>
        <element ref="customerNumber"/>
        <element ref="customerSuffix"/>
      </sequence>
    </group>
  </complexType>
  <key name = "customerNumber" >
    <selector> customerKey/customerNumber </selector>
    <field> @name </field>
  </key>
  <key name = "customerSuffix" >
    <selector> customerKey/customerSuffix </selector>
    <field> @name </field>
  </key>
</element>
Noah Mendelsohn to XML Schema comments list, 11 April 2000:
<field> can be repeated to create a multi-field key. I think that's what you need, and "it's in there."
Formal response to commentator. Commentator agrees this is OK.
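The difference between two independent keys and one multi-field key can be sketched as follows. This is an illustration in Python of the uniqueness rule only, not of how a schema processor evaluates <key>; the function name and the sample records are assumptions.

```python
from collections import Counter


def duplicate_keys(records: list[dict], fields: list[str]) -> list[tuple]:
    """Return the key tuples (one value per field) that occur more than once."""
    counts = Counter(tuple(rec[f] for f in fields) for rec in records)
    return [key for key, n in counts.items() if n > 1]


customers = [
    {"customerNumber": "1001", "customerSuffix": "A"},
    {"customerNumber": "1001", "customerSuffix": "B"},
    {"customerNumber": "1002", "customerSuffix": "A"},
]

# One key with two fields: every (number, suffix) pair is distinct.
combined = duplicate_keys(customers, ["customerNumber", "customerSuffix"])

# Two independent single-field keys: each field alone has repeats.
number_only = duplicate_keys(customers, ["customerNumber"])
suffix_only = duplicate_keys(customers, ["customerSuffix"])

print(combined, number_only, suffix_only)
```

With the sample data, the combined key reports no duplicates while each single-field key does, which is the looseness the commentator wanted to avoid.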
Should the description of the default value for maxOccurs be changed in the Primer? in the Structures spec?
Cf. Clarify minOccur/maxOccur defaulting?
Input from Ace:
Ace <Ace@AceProgrammer.com> to XML Schema Comments list, Tue, 11 Apr 2000 16:42:00 -0700
I noticed that in the examples that are in XML Schema Part 0: Primer, that the comment element is usually defined:
<xsd:element ref="comment" minOccurs="0"/>
The Primer explicitly says this means the element is optional. However, after looking at the spec and the explanation in the Primer, it seems to me that this actually makes the comment prohibited because it falls into the third case below. I hope I am mistaken. I'd rather the above syntax mean that the element is optional.
Input from Dario de Judicibus:
"Dario de Judicibus" <ddj@mclink.it> to XML Schema Comments list, Tue, 25 Apr 2000 23:12:02 +0200
Another problem is related to maxOccurs. It is said to be equal to minOccurs if not provided. But it is clear in specs that
<xsd:element ref="comment" minOccurs="0" />
means
<xsd:element ref="comment" minOccurs="0" maxOccurs="1" />
and not
<xsd:element ref="comment" minOccurs="0" maxOccurs="0" />
Is that a typo? Is it intended?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Wed, 26 Apr 2000 07:56:59 -0500
I believe that the maxOccurs issue was an oversight, has been reported here, and will be corrected.
Henry Thompson to XML Schema comments list:
You're right, it should, and the defaulting text for maxOccurs is buggy. It should read {max occurs}
Is there a typo in the last paragraph of Primer 4.7?
Input from Peter A. Berggren:
berggren@lr.net (Peter A. Berggren) to XML Schema Comments list, Wed, 12 Apr 2000 10:54:16 -0400
Regarding the following extract from the last paragraph of Section 4.7 in XML Schema Part 0 : Primer...
"...As with final, there exists also an optional finalDefault attribute on the schema element whose value can be one of the values allowed for the final attribute. The effect of specifying the finalDefault attribute is equivalent to specifying a final attribute on every type definition and element declaration in the schema. ..."
This was apparently copied from a preceding, similar paragraph regarding finalDefault, without changing the word "final" to "block". I believe the text should read:
"...As with final, there exists also an optional blockDefault attribute on the schema element whose value can be one of the values allowed for the block attribute. The effect of specifying the blockDefault attribute is equivalent to specifying a block attribute on every type definition and element declaration in the schema. ..."
Should key references with fields whose elements are xsi:null='true' be treated as if the node were not found? (Or, alternatively, as if it contained an empty string?)
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 12 Apr 2000 15:56:40 +0100
Not key, but keyref, for a change.
Seems to me we missed a case: if the node picked out by a keyref's field is xsi:null='true', then this should be treated as if the node had not been found. As the PWD reads, it will just use an empty string as that field's contribution to the key sequence to be looked up.
Ditto for unique fields.
Input from Ashok Malhotra:
Your note of 4/12 confuses me. keyref fields can only refer to key fields and these are, by definition, non-nullable.
Are there typos in the schema-pointer and the description of time in the datatypes spec?
Input from Curt Arnold:
"Arnold, Curt" <Curt.Arnold@hyprotech.com> to XML Schema Comments list, Wed, 12 Apr 2000 10:05:26 -0600
Schema and DTD links in "This Version" header point to the Part 1 schema and DTD, not a text version of the schema or DTD for datatypes that I expected.
Section 3.3.22 time "date is generated from recurringDuration by setting the value of the duration facet equal to P0Y and the value of the period facet equal to PY24H (24 hours)."
I believe "date" should be "time".
Henry Thompson to XML Schema comments list, 12 April 2000:
Those now are the schema and DTD for datatypes -- if you want the datatypes all by themselves, in another namespace, with no schema apparatus, use the pointer specifically for that, i.e. http://www.w3.org/1999/XMLSchema-datatypes.xsd
Should Primer 3.1 paragraph 3 read "unqualified" instead of "qualified"?
Input from Adrian Robert:
Adrian Robert <arobert@dtai.com> to XML Schema Comments list, Wed, 12 Apr 2000 11:55:11 -0700
In the third paragraph in Section 3.1 (below), it appears that "qualified" should be changed to "unqualified". (Also, note there is a small agreement-related ungrammaticality in the 2nd sentence.)
"In po1.xsd we globally specify the qualification of elements and attributes by setting the values of both elementFormDefault and attributeFormDefault to qualified. Strictly speaking, this is unnecessary because these are the default values of the two attributes, but we do so to highlight the contrast between this case and others we describe in subsequent sections."
[It is not clear which of the following questions is being raised by the commentator. -MSM] Is there any method, other than by enumerations, of defining lists of permitted attribute values? Is there a convenient way of importing lists of allowed values from outside the schema document? Is there a convenient way to specify that an attribute must have as its value one of an enumerated set of legal values? Is there a convenient way to specify that an attribute must have as its value a list or sequence of tokens, each of them from an enumerated set of legal tokens?
Input from Martin Bryan:
"Martin Bryan" <mtbryan@sgml.u-net.com> to XML Schema Comments list, Thu, 13 Apr 2000 12:15:46 +0100
I am trying to work out whether or not the derivedBy facet can be used to identify an element that contains a list of permitted values for an attribute. It seems to me to be a useful feature but I cannot see how it would work as the element referenced by derivedBy has, according to the Primer at least, and by implication from Part 2, to be in the instance and not in the schema. I was thinking about something along the following lines:
<xsd:attribute name="Code" base="xsd:string">
  <xsd:simpleType name="CodeList" base="xsd:string" derivedBy="xsd:list"/>
</xsd:attribute>
<MyCodeList xsi:type="CodeList">AB1 CD2 EF3</MyCodeList>
Is this valid? Where would MyCodeList need to be defined? (Is it permitted as part of the Schema?)
Incidentally the examples shown in Part 2 and the Primer conflict. Part 2 shows the use of the xsi:type attribute to link the instance to the derived type. The example in the primer does not include this attribute. Some clearer explanation of the role of this attribute and the position of the element containing the list of permitted values, might help to clarify this point.
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 13 Apr 2000 14:24:46 +0100
Confusion of levels/locations/facilities, I guess.
If you want to constrain element content, use an element declaration:
<xs:schema xmlns='[URI:martinbryan]' targetNamespace='[URI:martinbryan]' xmlns:xs='http://www.w3.org/1999/XMLSchema'>
  <xs:simpleType name='CodeList' base='xs:string' derivedBy='list'/>
  <xs:element name='MyCodeList' type='CodeList'/>
</xs:schema>
<docroot xmlns='[URI:martinbryan]'>
  ...
  <MyCodeList>AB1 CD2 EF3</MyCodeList>
</docroot>
The <docroot> instance above is, as far as we can tell from the fragments given, schema-valid per the schema corresponding to the schema document above it.
If you want to constrain an element beyond its schema declaration in an instance, use xsi:type:
<xs:schema xmlns='[URI:martinbryan]' targetNamespace='[URI:martinbryan]' xmlns:xs='http://www.w3.org/1999/XMLSchema'>
  <xs:simpleType name='CodeList' base='xs:string' derivedBy='list'/>
  <xs:element name='MyContainer' content='textOnly'/>
</xs:schema>
<docroot xmlns='[URI:martinbryan]' xmlns:xsi='http://www.w3.org/1999/XMLSchema-instance'>
  ...
  <MyContainer xsi:type='CodeList'>AB1 CD2 EF3</MyContainer>
</docroot>
Again, the <docroot> instance above is, as far as we can tell from the fragments given, schema-valid per the schema corresponding to the schema document above it.
Neither of these examples define the restricted element type in the instance. To do that you have to play games with what amounts to an internal subset:
schema3.xsd:
<xs:schema xmlns='[URI:martinbryan]' targetNamespace='[URI:martinbryan]' xmlns:xs='http://www.w3.org/1999/XMLSchema'>
  <xs:element name='MyContainer' content='textOnly'/>
</xs:schema>
<container xmlns='[URI:martinbryan]' xmlns:xsi='http://www.w3.org/1999/XMLSchema-instance' xmlns:xs='http://www.w3.org/1999/XMLSchema' xsi:schemaLocation='[URI:martinbryan] #xpointer(*/xs:schema)'>
  <xs:schema targetNamespace='[URI:martinbryan]'>
    <include 'schema3.xsd'/>
    <xs:simpleType name='CodeList' base='xs:string' derivedBy='list'/>
  </xs:schema>
  <docroot>
    ...
    <MyContainer xsi:type='CodeList'>AB1 CD2 EF3</MyContainer>
  </docroot>
</container>
Input from Martin Bryan:
"Martin Bryan" <mtbryan@sgml.u-net.com> to XML Schema Comments list, Thu, 13 Apr 2000 15:43:56 +0100
Thanks for the response, but you are on the wrong track. I see how constraining element content works quite nicely. The problem was that I specifically want to apply the same technique to an enumerated list of attribute values, hence my question:
I am trying to work out whether or not the derivedBy facet can be used to identify an element that contains a list of permitted values for an attribute.
The area I am trying to get working is the ebXML electronic business area. We have a lot of elements which have "qualifier" attributes whose values are taken from code lists. Ideally I would like to be able to "import" up-to-date codelists as part of the schema so that maintenance of the code list can be made independent of maintenance of the schema. Having to define such lists as enumeration lists is very long winded, so I would like to use the derived by method, but I don't see how it applies to attributes, hence my attempted example:
<xsd:attribute name="Code" base="xsd:string">
  <xsd:simpleType name="CodeList" base="xsd:string" derivedBy="xsd:list"/>
</xsd:attribute>
<MyCodeList xsi:type="CodeList">AB1 CD2 EF3</MyCodeList>
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 13 Apr 2000 18:03:06 +0100
What confuses me about your example is that you don't show an attribute in the instance, but rather an element. Your schema fragment, if it appeared within the type definition for the <banana> element, would schema-validate the following instance just fine:
<banana Code='AB1 CD2 EF3'>...</banana>
Is what you want to be able to restrict the list elements to some enumerated list defined elsewhere?
Input from Martin Bryan:
"Martin Bryan" <mtbryan@sgml.u-net.com> to XML Schema Comments list, Fri, 14 Apr 2000 07:43:18 +0100
Yes, with a single value for the attribute taken from that list.
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 14 Apr 2000 11:27:45 +0100
Right, sorry not to have understood sooner. This is a request we've had before, and will certainly consider carefully.
Formal response to commentator. Commentator replies that with some reservations he is "happy that we have a workable solution to the separation of the management of enumeration list values from the use of these values in specific applications".
Various points in need of clarification.
Input from Martin Bryan:
"Martin Bryan" <mtbryan@sgml.u-net.com> to XML Schema Comments list, Thu, 13 Apr 2000 15:27:21 +0100
What does it mean to apply a pattern to a timeDuration?
Why does the example given of a timeInstance not have a Z before the last hyphen?
Why is the period facet of time shown as PY24H rather than PT24H? (Again there is no Z preceding the timezone details in the example.)
If you define a minInclusive or minExclusive date for a recurring duration whose period is 7 days will the days always fall on the same day of the week as the first date within the period (i.e. the minInclusive date)?
How can enumeration be applied to the binary datatype?
Input from Martin Bryan <mtbryan@sgml.u-net.com>:
"Martin Bryan" <mtbryan@sgml.u-net.com> to XML Schema Comments list on Sun, 14 May 2000 08:01:08 +0100
You have sent a number of notes with comments on the Datatypes part of the XML Schema specification. We appreciate your input. I believe I have responded to all your concerns but would like to ask you formally whether you feel your concerns have been addressed and the issues you raised can be closed.
The vast majority of my concerns have been answered, for which I thank you. There are still some relatively minor ones, such as the sentence that reads 'To accommodate year values outside the range from 0 to 9999, additional digits can be added to the left of this representation and an preceding "-" is allowed.' This really needs the 0 changed to 1, and a statement to indicate that negative numbers represent Before Christian Era (BC/BE) dates. I would have loved to be able to deal with non-Gregorian calendars (e.g. Jewish, Arabic, Chinese, Japanese, Julian, ....) but don't expect everything to be in place at this stage.
The one area I still expect we are going to have problems in using datatype for electronic commerce is measurements. For example, how can I check that 100cm and 1m are exactly equivalent, but 1yd is not. But again I do not expect you to have addressed these problems at this state. (Schema2 will be along within a few years!)
Nevertheless you have done an impressive job and are to be highly commended.
petsa@us.ibm.com to XML Schema Comments list, Fri, 14 Apr 2000 13:35:22 -0400
What does it mean to apply a pattern to a timeDuration?
The pattern facet allows you to constrain the lexical representation of the datatype. For time duration you could, for example, write a pattern that starts with P100Y. This would mean that the duration must be greater than 100 years.
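A small sketch may help show what such a pattern does and does not constrain. It is Python, for illustration only; the pattern string and the sample durations are assumptions, and the whole-string match is meant to mirror the idea that a pattern applies to the entire lexical form.

```python
import re

# Hypothetical pattern facet value: lexical forms that begin with "P100Y".
PATTERN = r"P100Y.*"


def satisfies_pattern(lexical: str) -> bool:
    """Check the whole lexical form against the pattern."""
    return re.fullmatch(PATTERN, lexical) is not None


results = {form: satisfies_pattern(form) for form in ["P100Y", "P100Y6M", "P99Y", "P1000Y"]}
print(results)
```

In this sketch "P100Y" itself is accepted, and "P1000Y" is rejected even though it denotes a longer duration, because the facet tests the written form rather than the value.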
Why does the example given of a timeInstance not have a Z before the last hyphen?
The example is correct. A "Z" is not required and would be an error.
Why is the period facet of time shown as PY24H rather than PT24H? (Again there is no Z preceding the timezone details in the example.)
It should be PT24H. Thanks. A "Z" is not required.
If you define a minInclusive or minExclusive date for a recurring duration whose period is 7 days will the days always fall on the same day of the week as the first date within the period (i.e. the minInclusive date)?
Yes, it would.
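The weekday claim follows from simple date arithmetic, sketched here in Python for illustration. The start date and the helper name are assumptions chosen for the example, not values from the spec.

```python
from datetime import date, timedelta


def occurrences(first: date, period_days: int, count: int) -> list[date]:
    """List the first `count` occurrences of a recurrence starting at `first`."""
    return [first + timedelta(days=period_days * k) for k in range(count)]


# Hypothetical lower bound (the minInclusive date) and a period of 7 days.
days = occurrences(date(2000, 4, 13), 7, 5)
weekdays = {d.weekday() for d in days}

print(days, weekdays)
```

Every occurrence is a whole number of weeks after the first date, so the set of weekdays contains a single value.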
How can enumeration be applied to the binary datatype?
In theory you could define an enumeration by quoting hunks of, say, base64 encoded binary data.
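As a rough illustration of that idea, the sketch below builds an enumeration from base64-encoded byte strings and checks candidate lexical forms against it. It is Python, for illustration only; the byte values are made up, and comparing lexical forms as plain strings is a simplifying assumption.

```python
import base64

# Hypothetical enumeration: two permitted binary values, stored in base64 form.
ALLOWED = {
    base64.b64encode(b"\x00\x01").decode("ascii"),
    base64.b64encode(b"\xff").decode("ascii"),
}


def in_enumeration(lexical: str) -> bool:
    """Return True if the base64 text is one of the enumerated values."""
    return lexical in ALLOWED


print(sorted(ALLOWED), in_enumeration("AAE="), in_enumeration("AAA="))
```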
Informal response to commentator. Commentator confirms he is satisfied.
Some requests for clarification.
Input from gmacri@libero.it:
"gmacri@libero.it"<gmacri@libero.it> to XML Schema Comments list, Fri, 14 Apr 2000 15:25:15 +0200
I'm a student of Politecnico in Turin. I have written you to ask some information:
When I write an XML document, the URI used in the declaration of a namespace, for example xmlns:book="http://www.somewhere.org/Book", must have some relation with some component defined in the related schemas?
[Note re-sent 1 May 2000.]
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 01 May 2000 18:43:26 +0100
When I write an XML document, the URI used in the declaration of a namespace, for example xmlns:book="http://www.somewhere.org/Book", must have some relation with some component defined in the related schemas?
There doesn't need to be anything at that URI. But if you want to schema-validate elements/attributes from that namespace, you may want one there, or you'll need to define components for those elements in some schema document for that namespace.
The attribute "BLOCK" used in the definition of some schema's component is equivalent to old attribute "EXACT"?
Yes.
Should default values and other similar contributions to the standard info-set be removed from XML Schema?
Input from Mikael Ståldal:
Mikael Ståldal <d96-mst@d.kth.se> to XML Schema Comments list, Mon, 17 Apr 2000 12:13:24 +0200 (MET DST)
Section 1.1 in the XML Schema Structures WD says:
The purpose of an XML Schema: Structures schema is to define and describe a class of XML documents by using schema components to constrain and document the meaning, usage and relationships of their constituent parts: datatypes, elements and their content and attributes and their values.
Schemas may also provide for the specification of additional document information, such as default values for attributes and elements. -----
I consider this as two different purposes, and I don't think it's a good idea to mix them together as the schema WD does (DTDs have the same problem). The inclusion of default values in the schema means that the output of parsing the same XML document with and without the schema can be different, and I don't like that. I think that validation and supplying default values should be clearly separated processing steps. Likewise, I think that the schema for validation and the definition for default values should be clearly separated data entities. It should be possible to apply default values without validation, and omitting the validation step should not affect the result for valid input.
My suggestion is to remove default values, and everything else that may cause the output from validating and non-validating parsing to be different, from the XML Schema spec and leave that for some other mechanism (perhaps internal DTD subset for attribute value defaults).
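The separation the commentator asks for can be pictured with a small Python sketch in which defaulting is a standalone step that never consults a validator. The function name, the shape of the `defaults` table, and the sample document are all assumptions made for illustration; nothing here is taken from the Working Draft.

```python
import xml.etree.ElementTree as ET


def apply_attribute_defaults(root, defaults):
    """Fill in missing attributes from a table of defaults.

    `defaults` maps an element name to a dict of attribute defaults.
    This is a hypothetical, validation-free defaulting step.
    """
    for elem in root.iter():
        for attr, value in defaults.get(elem.tag, {}).items():
            # Only supply the value when the document did not give one.
            elem.attrib.setdefault(attr, value)
    return root


doc = ET.fromstring("<order><item/><item currency='EUR'/></order>")
apply_attribute_defaults(doc, {"item": {"currency": "USD"}})
currencies = [item.get("currency") for item in doc.iter("item")]
```

Run without the defaulting step, the first item would carry no currency attribute at all, which is the difference in output the commentator objects to.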
Commentator responds 21 September "I am satisfied with the decision."
Is there a typo in an example in xmlschema-1?
Input from Gregor Meyer:
GRMEYER@de.ibm.com to XML Schema Comments list, Mon, 17 Apr 2000 22:55:58 +0200
there are two minor typos in an example in xmlschema-1.html
<xs:complexType name="length1" base="dt:non-negative-integer" derivedBy="extension"/> ... ... <xs:element name="size" type="dt:non-positive-integer"/> |
The base type names should probably be written without '-'
Should the declaration for the simpleType element type have an explicit reference to the complex type simpleType?
Input from Curt Arnold:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list, Mon, 17 Apr 2000 23:54:08 -0500
The definition for the simpleType element does not have an explicit reference to the simpleType complexType. I assume this is an error in the schema for schemas and not an indication that there is an implicit typing to an identically named type. However, if I'm wrong, could you point out where this behavior is described?
Should the mechanism for restricting complex types be revised to make it less verbose and awkward?
Cf. Simultaneous restriction and extension?
Input from Curt Arnold:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list, Mon, 17 Apr 2000 23:54:08 -0500
Restriction of element content in complex types still looks extremely awkward and underdocumented. If I've overlooked a concise description of how it should work, please point out the appropriate section. Mimicking the expanded content model up to the point of restriction would be extremely verbose when you are tweaking something fairly deep in a content model.
Input from Jane Hunter:
5. Derivation Issues
5.1 Simplified Derivation By Restriction
The current XML Schema WD requires a complex type derived by restriction to repeat all the declarations it inherited from its base. This becomes tedious and hard to manage as the inheritance hierarchy grows deeper. It would be preferable if only the declarations that are further constrained in the derived type needed to be specified. It's possible that in a deeply nested structure, such repetition might be the only practical way to specify restrictions to the structure without causing ambiguity.
Discussed in call of 2000-06-16.
Gudgin agreed with the commentator that when writing content models by hand our current rules might indeed be tedious, but observed that for schemas generated by machine (which he expected to be a more common case) the problem was not important. MSM observed that we had spent considerable time exploring alternatives in this area, and that in fact all of the alternatives proposed are not less tedious than the current rule, only tedious in different ways. Anyone writing a schema by hand can be assumed to have access to an editor with cut and paste facilities; David Beech observed that using cut and paste and the current rule is actually somewhat simpler than any of the alternatives, and is much less tedious than constructing the necessary XPaths, or counting out the necessary sic elements.
RESOLVED without dissent to stand by the current design.
Formal response to commentators.
Commentator not satisfied (n.b. has typo in issue number): 20 July 2000.
Should the schema for schemas be revised to use a global declaration for the attribute element type, rather than multiple local declarations?
Input from Curt Arnold:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list, Mon, 17 Apr 2000 23:54:08 -0500
Multiple context-specific definitions of the attribute element are declared, however they are all identical. It would be confusing to someone looking at a help system when they are presented with the attributeGroup form of the attribute element, the complexType definition of element and don't see any obvious difference.
Can XML Schema be used to define the XSLT language?
Should the syntax for declaring complex types be revamped? In particular, should the content attribute be dropped?
Cf. Content model of <complexType>
Input from Don Box:
I just spent the afternoon massaging my schema for XSLT. If you want to check it out, it is at:
http://www.develop.com/dbox/xml/xslt.xsd
I'd love to get feedback, especially from those who have April 7-compliant schema parsers/validators. I still need to go through and tighten up the references to simple types, add xsd:unique, and catch any bugs I am too tired to see today.
Input from Curt Arnold:
"Arnold, Curt" <Curt.Arnold@hyprotech.com> to XML Schema Comments list, Tue, 18 Apr 2000 15:00:55 -0600
... Schema doesn't seem to have the ability to adequately represent the content model of <xsl:template> or <xsl:for-each>. <xsl:template> content should be zero or more <xsl:param> elements followed by template content. <xsl:for-each> content should be zero or more <xsl:sort> elements followed by template content.
Your schema represented these as:
<complexType name='template' content='mixed'> <element ref='xsl:instruction' /> <any namespace='##other' /> </complexType> <complexType name='named-template' base='xsl:template-with-space' derivedBy='extension' > <element name='param' type='xsl:variable-definition'/> <!-- should have a minOccurs/maxOccurs here --> <attribute name='match' type='xsl:pattern' /> <attribute name='name' type='QName' /> <attribute name='priority' type='xsl:XPathNumber' /> <attribute name='mode' type='QName' /> </complexType> <complexType name='template-with-space' base='xsl:template' derivedBy='extension' > <attribute ref='xml:space' /> </complexType> <complexType name='for-each' base='xsl:template-with-space' derivedBy='extension' > <element name='sort' type='xsl:sort' minOccurs='0' maxOccurs='unbounded' /> <attribute name='select' type='xsl:expr' use='required' /> </complexType> |
My interpretation of derivedBy='extension' is that any content defined in the derived type appears after the content in the base type. (I looked but couldn't see any definition of special behavior if the base complexType was mixed) So that your definitions would allow template content then <xsl:sort> or <xsl:param> elements.
However, since the only mechanism to get mixed content is through a content='mixed' attribute on a <complexType> element and that the only mechanism to build a content model off of a complexType is restriction or extension, there does not seem to be a mechanism for doing what you would really want.
The best approximation you could do with the working draft is to not use derivation and create a mixed model that allows <xsl:sort> or <xsl:param> to appear anywhere in the mixed content.
If, however, you had a <mixed> grouping element, then you could adequately represent the content model like:
<complexType name="for-each"> <element name="sort" type="xsl:sort" minOccurs='0' maxOccurs='unbounded'/> <mixed> <element ref='xsl:instruction' /> <any namespace='##other' /> </mixed> </complexType> |
That led me to look at the content attribute of complexType which appears to only provide information in a very few places and has substantial potential to be inconsistent with other parts of the type declaration. Elimination of the content attribute would seem to eliminate some complexity.
The content attribute can have values of elementOnly, textOnly, mixed and empty.
textOnly can usually be implied by the type attribute referencing a simple type or a <simpleType> child element. The equivalent of content='textOnly' would be nothing more than type='string'. There doesn't seem to be a case where content='textOnly' adds value.
elementOnly can be implied by a type attribute referencing a complexType or a <complexType> child element.
If we had the <mixed> group tag, then mixed content is just a particular flavor of complex type.
That leaves empty. Either an <empty/> element like in previous drafts or better yet defining an "empty" complex type in the schema for schema that is the default base type if no type is defined and would be the default base type for complexType elements.
<!-- these would all be equivalent (unless someone defined a locally name empty complex type) --> <element name="apply-imports" type="empty"/> <element xmlns:xsd="http://www.w3.org/1999/XMLSchema" name="apply-imports" type="xsd:empty"/> <element name="apply-imports"/> |
I've posted an HTMLHelp file (http://home.houston.rr.com/curta/xslt.chm) based on a simplified version of Don's schema for XSLT (http://home.houston.rr.com/curta/xslt.xsd) on my home page.
Input from Don Box:
"Box, Don" <dbox@develop.com> to XML Schema Comments list, Tue, 18 Apr 2000 15:21:10 -0700
Second, Schema doesn't seem to have the ability to adequately represent the content model of <xsl:template> or <xsl:for-each>. <xsl:template> content should be zero or more <xsl:param> elements followed by template content.
Yeah, I thought about alternative ways to model that. One way would have been to use a named model group (that was my first pass btw). The problem is that for mixed content, you can't use sequence constraints. This is a problem with older technologies as well.
<xsl:for-each> content should be zero or more <xsl:sort> elements followed by template content.
Same problem.
[snip]
I don't know that anyone has the will to add more complexity to the schema language to handle mixed content.
Input from Curt Arnold:
"Arnold, Curt" <Curt.Arnold@hyprotech.com> to XML Schema Comments list, Tue, 18 Apr 2000 16:56:40 -0600
Actually, I thought my suggestions allowed you to appropriately constrain the content plus simplified things by eliminating the content attribute.
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 20 Apr 2000 17:18:32 +0100
Curt Arnold <Curt.Arnold@hyprotech.com> writes:
My interpretation of derivedBy='extension' is that any content defined in the derived type appears after the content in the base type. (I looked but couldn't see any definition of special behavior if the base complexType was mixed) So that your definitions would allow template content then <xsl:sort> or <xsl:param> elements.
Correct.
However, since the only mechanism to get mixed content is through a content='mixed' attribute on a <complexType> element and that the only mechanism to build a content model off of a complexType is restriction or extension, there does not seem to be a mechanism for doing what you would really want.
I'd do it by defining template with a disjunction, and then restricting one or the other branch of the disjunction out of existence for the derived types.
That led me to look at the content attribute of complexType which appears to only provide information in a very few places and has substantial potential to be inconsistent with other parts of the type declaration. Elimination of the content attribute would seem to eliminate some complexity.
The content attribute can have values of elementOnly, textOnly, mixed and empty.
Your discussion below seems to assume that 'content' is an attribute on <element>, when in fact it belongs on <complexType>.
textOnly can usually be implied by the type attribute referencing a simple type or a <simpleType> child element. The equivalent of content='textOnly' would be nothing more than type='string'. There doesn't seem to be a case where content='textOnly' adds value.
There's one corner case:
<xs:element name='foo'> <xs:complexType content='textOnly'/> </xs:element> |
is weaker than base='string' (or type='string' on the xs:element), in that it can be restricted by a complex type with any simple type as its base.
elementOnly can be implied by a type attribute referencing a complexType or a <complexType> child element.
How can we distinguish mixed from element only in that case?
If we had the <mixed> group tag, then mixed content is just a particular flavor of complex type.
If we had such a group tag, it would re-introduce the pernicious mixed content bug from SGML.
That leaves empty. Either an <empty/> element like in previous drafts or better yet defining an "empty" complex type in the schema for schema that is the default base type if no type is defined and would be the default base type for complexType elements.
I think we're open to re-designs in this area, but this one isn't quite there yet.
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 20 Apr 2000 17:25:06 +0100
Don Box <dbox@develop.com> writes:
Yeah, I thought about alternative ways to model that. One way would have been to use a named model group (that was my first pass btw). The problem is that for mixed content, you can't use sequence constraints. This is a problem with older technologies as well.
Yes you can -- the whole point of making 'mixed' orthogonal to the content model is that e.g.
<xs:complexType name='haystack' content='mixed'> <xs:sequence> <xs:element name='thread'/> <xs:element name='needle'/> </xs:sequence> </xs:complexType> |
schema-validates only <haystack> elements with exactly one <thread> and one <needle> daughters, in that order.
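The orthogonality described here can be sketched in Python: the sequence constraint is checked on the element children alone, and a separate flag decides whether non-whitespace character data may appear around them. The helper names are hypothetical and the check handles only a fixed sequence of element names, which is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET


def has_character_data(elem):
    """True if any non-whitespace text occurs directly inside `elem`."""
    chunks = [elem.text or ""] + [child.tail or "" for child in elem]
    return any(chunk.strip() for chunk in chunks)


def validate_children(elem, sequence, mixed):
    """Check a fixed sequence of child names; `mixed` only governs text."""
    if not mixed and has_character_data(elem):
        return False
    # The content model itself is the same whether or not text is allowed.
    return [child.tag for child in elem] == list(sequence)


model = ["thread", "needle"]
with_text = ET.fromstring(
    "<haystack>hay <thread/> more hay <needle/> hay</haystack>"
)
wrong_order = ET.fromstring("<haystack><needle/><thread/></haystack>")
```

With `mixed=True` the first document is accepted and the second is rejected for its order; with `mixed=False` the first is rejected because of its character data.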
Yet another reason to think again about the default <choice> wrapper when you simply say
<xs:complexType content='mixed'> <xs:element .../> . . . </xs:complexType> |
Discussed in call of 2000-06-22.
Email discussion had shown that many of the problems associated in SGML with 'pernicious mixed content' could probably be solved in XML, because XML does not require (or allow) the processor to suppress white space, even in cases where the white space is 'insignificant'.
Some WG members argued, however, that the current design has virtues other than simply avoiding pernicious mixed content: the function of a content model in XML Schema is to show where subelements can occur, how often, and in what sequence. There is, separately, a flag to indicate whether character data, or only white space, can occur between them. The simplicity of this design is a virtue in itself; it covers many more cases than XML 1.0 DTDs, and it covers virtually all realistic content models: the cases not covered are not useful or realistic enough to merit the change. As a case not covered, Matt Timmermans offered the example of a paragraph title: one would like to be able to write the equivalent of
<!ELEMENT p (head, (#PCDATA | %phrase;)*)> |
This was accepted as (a) not covered, (b) desirable, and (c) a good illustration of the relatively good coverage of the current design: if inability to handle paragraph headings is the worst that happens, ...
The simple fact that in some places within the element, character data is allowed, and in other cases it is forbidden, suggested that there is some semantic difference between those two regions. But it is an inherently questionable design (if not necessarily always a wrong design) to identify an important semantic unit and then to define no element type in the markup language corresponding to that semantic unit. It need not be the business of XML Schema to make questionable design decisions (among which we number, however reluctantly, the design of the XSL 'template' element's content model) easy to implement.
RESOLVED: to dispose of issue LC-51 by explaining our rationale and declining to make the suggested change. Dissenting: Connolly; Abstaining: Timmermans.
There is some uncertainty about the default values of the minOccurs and maxOccurs attributes in various situations. Should the defaulting rules be changed?
Cf. Re-align occurrence indications for elements and attributes?
Input from Curt Arnold:
"Arnold, Curt" <Curt.Arnold@hyprotech.com> to XML Schema Comments list, Tue, 18 Apr 2000 15:00:55 -0600
The param element reference in the named-template type definition should have a minOccur="0" and a maxOccur="unbounded". As written, a template has to have one and only one param.
Input from Don Box:
"Box, Don" <dbox@develop.com> to XML Schema Comments list, Tue, 18 Apr 2000 15:21:10 -0700
My reading of rule 4.3 under the {content type} definition (found under section 4.3.3) implies that there is an implicit <choice minOccurs='0' maxOccurs='unbounded' > particle over the particle children of a content=mixed complex type. I'll defer to Henry on this.
Input from Curt Arnold:
"Arnold, Curt" <Curt.Arnold@hyprotech.com> to XML Schema Comments list, Tue, 18 Apr 2000 16:56:40 -0600
I guess it depends on the interpretation of "extending" a complex type that ultimately derives from a complex type that has a content of mixed. If extending means that param is magically added into the mixed content of its base type, then the multiplicity would be implied. If extension in this context means that content declared in the derived type appears after the base complexType's content has been satisfied, then param is in the wrong place with the wrong multiplicity. Possibly the behavior is already explicit in the schema docs. But for me to get my mind around the schema doc, I have to work out from the schema for schemas to the prose and not the other way around.
Input from Noah Mendelsohn:
Noah_Mendelsohn@lotus.com to XML Schema Comments list, Thu, 20 Apr 2000 01:11:45 -0400
Schemas treats mixed differently than DTDs. In schemas, both element-only and mixed take a full content model. The only difference with mixed is that the instance can have character information item children before, after, and in between the elements validated by the model. So, you have the full power of content models with mixed. Also, mixed does not imply any defaults for min/maxOccurs. This is NOT the DTD model, but it can express every constraint allowed by DTD mixed (and more).
Input from Don Box:
Don Box <dbox@develop.com> to XML Schema Comments list, Wed, 19 Apr 2000 23:23:19 -0700
This is actually a bit confusing, but I think I finally have my head around it (I certainly didn't two days ago).
If one looks at Section 4.3.3 of Part 1, the description of the {content type} deserialization rules discusses the EXPLICIT PARTICLE that is introduced as a parent of most complex type content models. To paraphrase, unless the complexType's content model is a lone all, group, sequence, or choice, the model is interpreted as if a compositor has been introduced. In the case of content='mixed', it is a choice compositor marked minOccurs='0'/maxOccurs='unbounded'.
That stated, I believe (but may be wrong) that the following:
<complexType name='bob' content='mixed' > <element name='a'/> <element name='b'/> <element name='c'/> </complexType> |
is equivalent to:
<complexType name='bob' content='mixed' > <choice minOccurs='0' maxOccurs='unbounded' > <element name='a'/> <element name='b'/> <element name='c'/> </choice> </complexType> |
This is pretty much the DTD story. If one really wants the "revolutionary structured mixed content model" that acts like elementOnly but allows non-whitespace character data, one would have needed to write this:
<complexType name='bob' content='mixed' > <sequence minOccurs='m' maxOccurs='n' > <element name='a'/> <element name='b'/> <element name='c'/> </sequence> </complexType> |
where m and n are the values that match your expectations ;-)
Taking this into account, I believe my original schema for XSLT was using content='mixed' correctly, although I now see at least one opportunity to tighten up some constraints.
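The wrapper rule as paraphrased in this posting can be sketched in Python. The sketch covers only the content='mixed' case stated in the message; the function name, the set of particle names, and the use of unprefixed element names are assumptions made for the example, and the code is not a rendering of the normative text of section 4.3.3.

```python
import xml.etree.ElementTree as ET

COMPOSITORS = {"all", "group", "sequence", "choice"}
# Assumed set of children that count as particles for this sketch.
PARTICLES = COMPOSITORS | {"element", "any"}


def add_implied_choice(complex_type):
    """Wrap the particles of a mixed complexType in an implied choice."""
    if complex_type.get("content") != "mixed":
        return complex_type  # other content kinds are left untouched here
    particles = [child for child in complex_type if child.tag in PARTICLES]
    if not particles:
        return complex_type
    if len(particles) == 1 and particles[0].tag in COMPOSITORS:
        return complex_type  # a lone compositor needs no wrapper
    position = list(complex_type).index(particles[0])
    choice = ET.Element("choice", {"minOccurs": "0", "maxOccurs": "unbounded"})
    for particle in particles:
        complex_type.remove(particle)
        choice.append(particle)
    complex_type.insert(position, choice)
    return complex_type


bob = ET.fromstring(
    "<complexType name='bob' content='mixed'>"
    "<element name='a'/><element name='b'/><element name='c'/>"
    "</complexType>"
)
add_implied_choice(bob)
```

Applied to the first 'bob' definition, the sketch yields the structure of the second; a type whose only particle is a sequence is returned unchanged.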
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 20 Apr 2000 17:29:37 +0100
Curt Arnold <Curt.Arnold@hyprotech.com> writes:
The param element reference in the named-template type definition should have a minOccur="0" and a maxOccur="unbounded". As written, a template has to have one and only one param.
Don wrote:
My reading of rule 4.3 under the {content type} definition (found under section 4.3.3) implies that there is an implicit <choice minOccurs='0' maxOccurs='unbounded' > particle over the particle children of a content=mixed complex type. I'll defer to Henry on this.
Don is right, but as I've said I think this is sowing enough confusion that the expected benefit is being overwhelmed.
Curt replied:
I guess it depends on the interpretation of "extending" a complex type that ultimately derives from a complex type that has a content of mixed. If extending means that param is magically added into the mixed content of its base type, then the multiplicity would be implied. If extension in this context means that content declared in the derived type appears after the base complexType's content has been satisfied, then param is in the wrong place with the wrong multiplicity. Possibly the behavior is already explicit in the schema docs. But for me to get my mind around the schema doc, I have to work out from the schema for schemas to the prose and not the other way around.
I suspect there's some unclarity in this case -- I'll get back to you.
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 20 Apr 2000 17:32:08 +0100
Don Box <dbox@develop.com> writes:
This is actually a bit confusing, but I think I finally have my head around it (I certainly didn't two days ago).
If one looks at Section 4.3.3 of Part 1, the description of the {content type} deserialization rules discusses the EXPLICIT PARTICLE that is introduced as a parent of most complex type content models. To paraphrase, unless the complexType's content model is a lone all, group, sequence, or choice, the model is interpreted as if a compositor has been introduced. In the case of content='mixed', it is a choice compositor marked minOccurs='0'/maxOccurs='unbounded'.
...
The above analysis is entirely correct, as far as I can tell.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Primer
§2.2 Complex Type Definitions, Element & Attribute Declarations
The third paragraph prior to Table 1 (beginning "The comment element...") describes default values for minOccurs and maxOccurs. This seems counterintuitive. If I leave off maxOccurs it seems that a maximum has not been set, so I'd expect it to be unbounded. The default is however either equal to minOccurs or 1, if minOccurs is absent.
Discussed in call of 2000-06-23.
The WG first noted that the discussion of this issue eventually shifted from a proposal to change the defaulting rules for min- and maxOccurs to a proposal to change the defaulting rules for the top-level grouping element; the latter had been dealt with as issue LC-126. The former had not yet been dealt with, so we focused on that. The WG identified four possible ways of dealing with the defaults for min- and maxOccurs (for elements -- attributes are, it will be recalled, now handled differently):
It was observed that if literal defaults of 1 and 1 are supplied, a schema author could create an error by raising the minOccurs value and neglecting to edit the maxOccurs value. Some proponents of literal defaults agreed that this was a drawback to the use of literal defaults but argued that it was an acceptable cost for the advantage of eliminating conditional defaults. Some argued that since defaults of 1 and 1 are so easy to remember, it would be a rare schema author who didn't realize that maxOccurs needed to be raised if minOccurs was raised.
It was clarified that (given defaults of 1 and 1) processors would be required to flag <element ref='foo' minOccurs='2'/> as an error; it was observed that processors might choose to recover from this error by assuming maxOccurs='2' -- this might be reassuring to some, or alarming to others, but it is a consequence of our error handling policy, which requires all errors to be reported and does not specify behavior in the presence of errors.
A straw poll showed support for both the 'corrected status quo' and literal defaults, with a preponderance of both active support and toleration for literal defaults. The chair put the formal question accordingly.
RESOLVED: to specify literal defaults for min / max Occurs in the XML transfer syntax. Dissenting: Maloney (by proxy).
The WG then discussed what the defaults should be. In a straw poll, there was a preponderance of opinion for 1 and 1 over 0 and unbounded. RESOLVED without dissent: to make the default for minOccurs and maxOccurs 1.
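The effect of the resolution can be sketched in Python: each attribute gets its own literal default of 1, independent of the other, and a declaration such as <element ref='foo' minOccurs='2'/> is then an error because the defaulted maxOccurs is smaller than minOccurs. The function name and the use of `math.inf` for 'unbounded' are assumptions made for the illustration.

```python
import math


def effective_occurs(attrs):
    """Return (minOccurs, maxOccurs) using literal defaults of 1 and 1.

    `attrs` is a dict of the attributes written on the element declaration.
    """
    min_occurs = int(attrs.get("minOccurs", "1"))
    raw_max = attrs.get("maxOccurs", "1")
    max_occurs = math.inf if raw_max == "unbounded" else int(raw_max)
    if min_occurs > max_occurs:
        # The error must be reported; how to recover is left unspecified.
        raise ValueError(
            f"minOccurs={min_occurs} exceeds maxOccurs={max_occurs}"
        )
    return min_occurs, max_occurs
```

For example, `effective_occurs({})` gives `(1, 1)`, while `effective_occurs({"minOccurs": "2"})` raises an error rather than silently assuming maxOccurs='2'.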
Should the description of the example in Primer 2.7 be changed to replace shipper and biller with shippee (or recipient) and billee (or payer)?
Input from Ray Gates:
Ray_Gates@manulife.com to XML Schema Comments list, Tue, 18 Apr 2000 16:21:54 -0400
In part 0 (April 7), in section 2.7 Building Content Models, end of para. three, refers to "a single address for those cases where the shipper and biller are co-located."
Surely, this should read "shippee and billee".
Should the simple type nonPositiveInteger be dropped as unnecessary?
Input from Gregor Meyer:
petsa@us.ibm.com to XML Schema Comments list, Tue, 18 Apr 2000 10:29:41 -0400 (forwarding note from Gregor Meyer)
A minor comment on the recent XML Schema definitions: xmlschema-2.html defines a type nonPositiveInteger; I have never seen an application where this type would have been useful. The type nonNegativeInteger is often used, though. Is it intended to have an almost complete set of basic types defined by the standard? In my humble opinion there is no need for a standardized type nonPositiveInteger.
Discussed at Edinburgh ftf.
The nonNegativeInteger type is needed for the Schema for Schemas. The nonPositiveInteger type is supplied for symmetry. It is true that it will be needed only rarely, but implementation seems unlikely to be a burden, and the WG agreed to retain the type in the interests of symmetry.
What is the proper namespace for built-in datatypes? In particular, should the built-in datatypes be defined both in the XSD namespace and in their own namespace?
(Cf. issues QNames and reproSubstitution in the development-period issues list.)
Input from Curt Arnold:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list, Fri, 21 Apr 2000 10:48:02 -0500
I know that there is some reorganization of the physical structure of the schema for schemas intended in the near future and this thought occurred to me while working on the previously mentioned schema compilation project.
Basically, all documents whether they are schemas or not will need to have implied definitions of the builtin datatypes in the same manner as they have an implied definition of xsi:type and xsi:null (especially since a built-in datatype name could be a value for an xsi:type attribute).
Only a minuscule fraction of documents need to have knowledge of the schema definition elements.
This seems to indicate that the definition of the built-in datatypes need to be defined in the xsi namespace (http://www.w3.org/1999/XMLSchema-instance).
Datatypes used in the schema definition elements that are not intended as generally available datatypes (typically commented as utility class not for public use) should be defined in XMLSchema.xsd and would be in the http://www.w3.org/1999/XMLSchema namespace.
When a processor is trying to resolve a type name that is not qualified, it would first look within the current schema and if there was no match, would then attempt to resolve within the schema instance namespace.
So, I'd suggest something like (freehanded definitions, not validated)
xsi.xsd
<schema targetNamespace="http://www.w3.org/1999/XMLSchema-instance"> <simpleType name="urSimpleType"/> <simpleType name="string"/> <simpleType name="integer"/> .... <attribute name="type" type="QName"/> <attribute name="null" type="boolean"/> </schema> |
XMLSchema.xsd
<schema targetNamespace="http://www.w3.org/1999/XMLSchema"> <import targetNamespace="http://www.w3.org/1999/XMLSchema-instance" schemaLocation="xsi.xsd"/> <!-- import of xml namespace goes here --> <!-- since this is not in the instance namespace, it will not be used to resolve unqualified names in other schemas --> <simpleType name="XPathApprox"/> ... <attribute name="type" type="QName"/> <attribute name="null" type="boolean"/> </schema> |
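The lookup order proposed in this posting (current schema first, then the schema-instance namespace) can be sketched in Python. This illustrates the commentator's suggestion only, not behaviour adopted by the WG; the function name, the sets of type names, and the example target namespace are assumptions.

```python
XSI_NS = "http://www.w3.org/1999/XMLSchema-instance"


def resolve_unqualified(name, target_ns, schema_types, builtin_types):
    """Resolve an unqualified type name to a (namespace, local name) pair.

    Types declared in the current schema win; otherwise fall back to the
    built-in types that the proposal would place in the xsi namespace.
    """
    if name in schema_types:
        return (target_ns, name)
    if name in builtin_types:
        return (XSI_NS, name)
    raise LookupError(f"no type named {name!r}")


# Hypothetical schema that declares its own 'Address' type.
local_types = {"Address"}
builtins = {"string", "integer"}
target = "http://example.org/po"
```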
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 21 Apr 2000 17:12:10 +0100
There is already a namespace and a schema for precisely what you have in mind, namely http://www.w3.org/1999/XMLSchema-datatypes
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Fri, 21 Apr 2000 11:32:55 -0500
I had seen that, but I wasn't sure what the intentions were. It seemed like it was defining a parallel namespace. It wasn't clear that the instance and schema definition namespaces would eventually import it and part2.xsd would no longer be included. Definitely seems the right way to go.
Input from Noah Mendelsohn:
Noah_Mendelsohn@lotus.com to XML Schema Comments list, Thu, 27 Apr 2000 00:16:22 -0400
Uh...I'm not sure it's quite that simple. If we are going to encourage coders of schema instance documents to use:
<el xmlns:dt="http://www.w3.org/1999/XMLSchema-datatypes" xsi:type="dt:integer"/> |
as opposed to:
<el xsi:type="xsd:integer"/> |
then we have to be very careful about the semantic implications. Did we finally manage to make these two "integer" types identical in the sense that they would be the same in the augmented infoset? If not, then your suggestion leads users into big problems. I do not typically want my infosets labeled with two subtly different sorts of integers.
Last I heard in New Orleans, we did not have this level of identity. If that is the case, then I think http://www.w3.org/1999/XMLSchema-datatypes is appropriate for use specifically in languages which will not interact with our schemas. We have been quite clear that the typical (though not required) idiom in a schema is:
<xsd:element name="el" type="xsd:integer"/> |
This suggests to me that the intended instance is the 2nd one above. That said, I have reservations about combining dt: and xsi:; the types can be used in many situations in which xsi: itself would be inappropriate.
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Thu, 27 Apr 2000 00:00:13 -0500
I reviewed the schema for datatypes and there are several things that concern me about it:
My current thinking on this is that:
I believe this arrangement provides a near optimal partition. Built-in types can be accessed either through unqualified names or qualified with the datatypes namespace and generic XML documents do not need to import the symbol space for schema definition.
If not, however I would strongly recommend dropping or avoiding the parallel type hierarchy as currently defined in schema-datatypes.
Discussed in call of 2000-07-13.
The question is on the commentator's proposal that we revisit the allocation of our constructs to namespaces. We began consideration of this question in Austin in April 1999, and decided it most recently in New Orleans in March of this year.
The status quo is that all built-in simple datatypes are in a datatypes namespace, and also that all constructs (datatypes and structures) are in the 'xsd' namespace.
The commentator proposes that we remove the overlap, and have two disjoint namespaces, one for the built-in simple datatypes and one for everything else.
The WG discussed the issue.
RESOLVED unanimously: to dispose of issue LC-55 with a polite no, explaining that a separate namespace for datatypes does exist, that the relationship between the xsd and datatypes namespaces is clearly defined by the type derivation mechanism, and that the primer does say to prefer the xsd form over the other in schemas. Making the namespaces disjoint would only make it harder to exploit the default-namespace mechanism when defining schemas.
Should schemaPrefix and targetPrefix attributes be added to the import and schema elements, in order to work around the problems caused by the inaccessibility of namespace-declaration information in XSLT and similar systems?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Fri, 21 Apr 2000 12:12:23 -0500
Namespace prefix definitions (xmlns:prefix="uri"
attributes) are not accessible from XSLT (and probably from anything
else that tries to hide the details of namespacing from you) which
means that it is not possible to resolve qualified name references to
types, for example, in those environments. I'd suggest adding an
explicit schemaPrefix attribute to the import and targetPrefix
attribute to the schema element. At least, this information could be
used as decoration in the documentation. I am able to work around this
using equivalent attributes from another namespace, but it seems like
a general problem.
Discussed in call of 2000-06-30.
The suggestion has been withdrawn; the issue should be closed.
Should implementations be allowed to have a maximum depth for includes? Should the spec define a minimum value for that maximum depth?
Cf. Arbitrary-precision decimal too much?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Fri, 21 Apr 2000 12:12:23 -0500
Of course, any implementation has to have some limit on the depth of nested includes.
Discussed in call of 2000-06-30.
It might be a theoretical problem (if there is no limit, then for any processor it might be possible to build a valid schema document which the processor could not process -- i.e. it might become impossible to build a conforming processor, if conforming processors accept all and only valid schema documents), or it might be a practical problem. In fact, it seems very unlikely to be a practical problem in XML Schema, any more than it is in XML.
Olken suggested that we require schema processors to support at least 32 nested includes. Thompson observed that we don't limit the maximum depth of content models, or require support for any minimum level of nesting. There are all sorts of potential implementation limits we don't mention; the issue is spurious.
Peterson and Olken argued that file descriptor limits are rather different, in practice, from stack space, and might need special treatment on that ground. Sperberg-McQueen (who had begun by supporting Olken's proposal) observed with an air of surprise that, owing to the declarative nature of XML Schemas, the sequence of declarations has no significance, and it is not necessary to nest includes: they can be queued, and a processor could in theory make do with a single file descriptor for reading schema documents.
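The queuing observation can be illustrated with a minimal sketch. It assumes a hypothetical callback, read_includes, that opens one schema document, returns the locations it includes, and closes it again; the file names and include graph are invented for the demonstration and do not come from the spec.

```python
from collections import deque


def load_schema_documents(root, read_includes):
    """Visit every schema document reachable from `root` through includes.

    `read_includes(location)` is a hypothetical callback: it reads one
    document and returns the list of locations that document includes.
    """
    pending = deque([root])  # documents still waiting to be read
    seen = {root}            # guards against cycles and repeated includes
    order = []
    while pending:
        location = pending.popleft()
        order.append(location)
        # The callback finishes with `location` before the next document is
        # dequeued, so at most one document is open at any moment.
        for included in read_includes(location):
            if included not in seen:
                seen.add(included)
                pending.append(included)
    return order


# Toy include graph standing in for real files (an assumption for the demo).
INCLUDES = {
    "a.xsd": ["b.xsd", "c.xsd"],
    "b.xsd": ["d.xsd"],
    "c.xsd": ["d.xsd", "a.xsd"],  # includes the root again: a cycle
    "d.xsd": [],
}

print(load_schema_documents("a.xsd", lambda loc: INCLUDES[loc]))
```

Because declaration order carries no significance, the order in which the queue happens to deliver documents does not matter; a stack-based (nested) traversal would reach the same set of documents.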
RESOLVED without dissent: to dispose of this issue with thanks but no change, on the grounds that it is not a serious implementation problem.
How should software react to conflicts between imports, when different schema components import the same namespace from different resources?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Fri, 21 Apr 2000 12:12:23 -0500
A complicated issue is nested imports. It should be moderately common for multiple imports to in turn import some common namespace but possibly from different resources. Trying to resolve the potential conflicts seemed untenable.
How I've currently addressed it in my preprocessor is that only imports that appear in the schema being compiled add information to the validation package. If an import appears in an include or import and its namespace didn't have an import in the schema being compiled, an informative message is issued (but which might result in some unsatisfied references).
Input from Curt Arnold <carnold@houston.rr.com>:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list (and xml-dev) on Thu, 11 May 2000 01:19:19 -0500
Section 6.3.2: Point 4
There would also be the need for some sort of statement when inconsistent schemaLocations appear for the same namespace. I would assume that the first would take precedence.
Input from Jim Trezzo <jtrezzo@us.oracle.com>:
Jim Trezzo <jtrezzo@us.oracle.com> to XML Schema Comments list on Fri, 12 May 2000 16:03:28 -0700
3. SchemaLocation problems
The XMLSchema spec seems to allow one to create XML schemas which can contain references to element declarations and type definitions from other namespaces, without having to provide the schemas that contain those definitions. In effect, each instance document can then suggest the location of the other schemas, or the schema processor must have some other way of finding them.
This presents a problem when we are trying to validate multiple documents against a schema (or create a repository for documents conforming to a schema), since each document might suggest changing the type structure for the elements in this schema.
The upshot of this is that implementations are required to provide a way of locating schemas outside of the <import> system if they want to avoid using the <schemaLocation> in the instance documents (which is horrible).
We would like to require specification of the schemaLocation attribute in an <import> element in the schema, so that we don't have to provide an alternative schema location mechanism for schemas that come to us without <import schemaLocation=>. The spec should still allow the <import> to be ignored like <schemaLocation>, for systems that will find schemas by other means.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§6.3.2 How schema definitions are located on the Web
Under Schema Representation Constraint: Schema Document Location Strategy, I'm simply amazed at the lack of constraint here. From my reading of this section, almost anything goes, and I don't understand how a conformance definition could include this normatively.
Discussed in calls of 2000-06-30 and 2000-07-06.
MSM suggested that conflicts among multiple imports are not a problem, because the schemaLoc information is only a hint; even if we did specify a rule for resolving conflicts, the processor could ignore all the hints -- it's not clear it would be possible to tell whether any processor implemented the conflict-resolution rule. The issue might be reclassified A, except that that might appear patronizing.
RESOLVED unanimously: to dispose of this issue by responding that the spec already tells you what to do: the schemaLoc values are hints. If you come across multiple imports for the same namespace, you are free to ignore the hints; if you process a document for a namespace, and discover it has a conflicting definition for something you already have a definition for, it's an error. [Unless you back out everything you've done with the second file and pretend you never opened it.]
The relevant sections of the spec should be mentioned in the response (since it might be just a question of the commentator not having found them). They are 6.2, and the section that forbids conflicts among declarations for the same item.
This issue also contains a separate point, namely Oracle's proposal to require a schemaLoc hint on the import element.
Some said the namespace name might suffice, but it is not guaranteed to produce a schema. Requiring a schema location hint places no particular burden on the processor or the author, since the processor can always ignore it.
At least two grounds for objection were identified: (1) schemaLoc on the import statement is now, and should be, parallel in usage to schemaLoc on elements in the document instance, and (2) requiring schemaLoc would suggest that in principle the namespace name cannot be expected to serve as a URI for the schema; in the view of some WG members (at least), the normal case should be to use the namespace name to retrieve the schema, and schemaLoc should be reserved for unusual (if not for pathological) cases.
RESOLVED unanimously: make this a priority feedback issue.
Should the list type constructor be modified to allow the schema author to specify an arbitrary character (or string, or regular expression) for the item separator?
Input from Dario de Judicibus:
"Dario de Judicibus" <ddj@mclink.it> to XML Schema Comments list, Tue, 25 Apr 2000 23:12:02 +0200
The proposals first. The first one is related to lists of strings (simple types). The standard states that no list of strings can be defined if some strings contain spaces, because lists are space-separated. In my personal opinion this is a weakness of the standard, especially if you consider that we are speaking of Unicode strings, where the concept (and code point) of space may differ from language to language. What about adding a new facet for simple types which allows one to specify the list separator? Default would be blank space 0x20. So we might have
<xsd:simpleType name="ThreeCountries" base="Country" derivedBy="xsd:list">
  <xsd:length value="3" />
  <xsd:separator value=";" />
</xsd:simpleType>
<xsd:element name="threeCountries" type="ThreeCountries" />
<threeCountries>United States of America;Italia;San Marino</threeCountries>
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Wed, 26 Apr 2000 07:56:59 -0500
I am a long time observer of the W3C XML Schema spec and not a member of the W3C or the working group, so what I'm going to say is not official W3C, but things covered in previous posts.
List separators: The working group explicitly decided to allow only space-separated lists at this time. Even getting that took a whole lot of work. Some other W3C work (Scalable Vector Graphics, for instance) does use other delimiters, and if this is going to be changed it will be done to accommodate the uses in other W3C efforts.
Discussed at Edinburgh ftf.
While we know this can be useful, it is a slippery slope; lists are included only for legacy purposes in the first place. Microstructure such as this requires the presence of a schema to parse, and in general we wish to limit the amount of secondary notation that processors need to parse.
So, thank you but we don't believe this would be a wise change.
Formal response sent 11 July. Commentator has not replied.
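To make the trade-off concrete, here is a small sketch contrasting the whitespace rule with the commentator's proposed separator. The function name and the separator parameter are hypothetical; the separator facet was proposed in the comment and is not part of the spec.

```python
def split_list_value(lexical, separator=None):
    """Split the lexical form of a list-typed value into items.

    With separator=None the value is split on whitespace, as the draft
    requires. A non-None separator models the *proposed* facet only.
    """
    if separator is None:
        return lexical.split()
    return [item.strip() for item in lexical.split(separator)]


value = "United States of America;Italia;San Marino"

# Whitespace rule: multi-word country names are broken into separate items.
print(split_list_value(value))

# Proposed separator facet: the three intended items are recovered.
print(split_list_value(value, ";"))
```

The second call only gives the intended result because the caller knows the separator, which is the WG's point: a consumer without access to the schema could not tell how to tokenize the content.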
Should the anyAttribute particle be changed to mean not "any attribute from a particular namespace" but instead "any attribute from a particular attribute group in a particular namespace"? [Unsure of specific change intended. -MSM]
Input from Dario de Judicibus:
"Dario de Judicibus" <ddj@mclink.it> to XML Schema Comments list, Tue, 25 Apr 2000 23:12:02 +0200
The comments now. The first is related to <anyAttribute>. Differently from <any>, it does not look useful to allow any attribute of a specific schema to be used in a specific element of another schema. It makes more sense to allow any attribute belonging to an attribute group of a specific schema to be used in another element. For example
<xsd:element name="image">
  <xsd:complexType>
    <anyAttribute namespace="http://www.w3.org/xhtml" group="attributesForImg" />
  </xsd:complexType>
</xsd:element>
Should the datatypes spec be modified to allow the construction of types with simple internal structure (e.g. to allow both quantity and units of measure to be captured in the same simple type)?
Cf. Suggestion: Microparsing support in XML Schema
Input from Dario de Judicibus:
"Dario de Judicibus" <ddj@mclink.it> to XML Schema Comments list, Tue, 25 Apr 2000 23:12:02 +0200
Final comment: there is no way to combine types. For example, if I have
<xsd:simpleType name="units">
  <enumeration value="cm" />
  <enumeration value="in" />
</xsd:simpleType>
I have no way to define
<height>12.4cm</height>
by combining xsd:decimal and units. That might be very useful. We might use a variant of pattern for that:
<xsd:complexType name="heightWithUnits">
  <xsd:pattern>
    <xsd:part type="xsd:decimal" />
    <xsd:part value="\p{Zs}*" />
    <xsd:part type="units" />
  </xsd:pattern>
</xsd:complexType>
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Wed, 26 Apr 2000 07:56:59 -0500
The schema group stated that aggregate types were outside the scope of the initial version. Derivation by list was a hard fought exception to that principle.
If you are interested in discussions of dimensional units in XML, I can send you URL's to quite a few discussions.
Input from Martin Bryan <mtbryan@sgml.u-net.com>:
"Martin Bryan" <mtbryan@sgml.u-net.com> to XML Schema Comments list on Sun, 14 May 2000 08:01:08 +0100
The one area I still expect we are going to have problems in using datatype for electronic commerce is measurements. For example, how can I check that 100cm and 1m are exactly equivalent, but 1yd is not. But again I do not expect you to have addressed these problems at this state. (Schema2 will be along within a few years!)
Discussed at Edinburgh ftf.
We think it would be unwise to introduce secondary notations for representing structures which can be represented satisfactorily using XML.
Another commentator is prepared to wait until version 2.0.
How should schema processors treat derivations of simple types which re-specify facets? In particular, should or must they signal an error if the re-specified facets are less restrictive than the original values, not more? Can period and duration be usefully respecified at all?
Input from Ray Waldin:
Ray Waldin <rwaldin@pacbell.net> to XML Schema Comments list, Wed, 26 Apr 2000 03:45:10 -0700 (as corrected 26 Apr 2000 03:51:53 -0700).
I have some questions and a comment concerning SimpleTypes derived "by restriction". To illustrate:
<simpleType name="firstType" base="decimal">
  <minInclusive value="1"/>
  <maxInclusive value="10"/>
</simpleType>
<simpleType name="secondType" base="firstType">
  <minInclusive value="2"/>
  <maxInclusive value="5"/>
</simpleType>
This seems perfectly acceptable as secondType is derived from firstType "by restricting its value space", by specifying "more restrictive" values for some facets. Here's a case that's not so obvious, using "less restrictive" values:
<simpleType name="thirdType" base="firstType">
  <minInclusive value="0"/>
  <maxInclusive value="11"/>
</simpleType>
My questions:
Is this disallowed or just pointless? In other words, should a schema processor regard this type derivation as an error, or simply produce a thirdType which is no more restrictive than firstType?
What does "more restrictive" mean for the period and duration facets? Can period and duration be re-specified in any meaningful way or is re-specifying either of these values disallowed?
If a derived type re-specifies the pattern facet (as in the case of NCName and Name), are schema processors expected to: A) ensure that a derived type specifies a "more restrictive" pattern than its base type, or B) check all patterns in a type's derivation hierarchy when validating an instance of that type?
My comment:
The datatypes spec should elaborate on the expected behavior of schema processors when encountering derived types which re-specify values for each facet.
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Wed, 26 Apr 2000 07:33:29 -0500
My personal preference would be that "looser" facets would be tolerated, though an ideal schema profiler might tell you that you are wasting cycles or optimize the evaluation. I don't think that suboptimality is a good reason to reject a document and the complexity involved in determining the active constraint set is not justified in my opinion. While figuring out the looser constraint is fairly obvious for min/max, what if we were trying to determine whether one pattern is looser than another (and both could be active)?
As a pattern, I know of no programming language that would fail to compile a conditional like:
if(a > 1 && a > 2 && a < 5 && a <10) {}
It would be a conceptual error for the duration facet to be specified with two distinct values in a type hierarchy. You shouldn't be able to derive from day and stretch the day to 25 hours. The resulting type (25 hour periods starting at a particular midnight) doesn't fit into the value space of 24 hour periods starting at midnight. I could see making a duplicate specification of duration in a hierarchy an error, I would not try to determine if the values of the duration were identical (say if one had said P1D and another PT3600s)
Multiple period facets can make sense in a hierarchy. You could create a derived type from time with a period of 48 hours that represented a particular time of day every other day. I can't see anything a schema validation can do with the period facet. It seems like a piece of information only the application uses (if it wants) to determine the meaning of the type.
Discussed at Edinburgh ftf. RESOLVED to require processors to detect restrictions which specify a broader range than the base, and report them as errors.
Open questions: how to zero out the value set, whether vacuous restrictions should be errors or not.
Discussed in call of 2000-07-21.
In Edinburgh we agreed that it was an error to re-specify a facet if the re-specification did not restrict the value space (or at least leave it unchanged). (Not clear whether this applies to regexes in the pattern facet or not.)
AM recollects that this does not apply to regex patterns, on account of technical cost. Allowed to issue warning.
RESOLVED: make failure to check this constraint on 'pattern' a priority feedback issue.
We left open questions relating to how to zero out value set, vacuous restrictions.
The minutes read in part:
Diagram of the state of play:
1-10 ==> 1-10 postponed
1-10 ==> 0-10 error, decided
1-10 ==> 11-12 postponed
RESOLVED unanimously to make Vacuous restriction legal.
RESOLVED: to make zeroing out the value space illegal. Dissent: Sperberg-McQueen (rationale: the empty set is an important tool in reasoning about sets; forbidding it because we think it won't 'normally' be a good idea is not good language design) Abstentions: none.
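The combined effect of these resolutions on inclusive bounds can be sketched as follows. This is one reading of the minutes, not text from the spec: the function name, the tuple representation of a range, and the result strings are all assumptions, and a re-specified range disjoint from the base (the 11-12 row of the diagram) is treated here as zeroing out the value space.

```python
def classify_restriction(base, derived):
    """Classify re-specified (minInclusive, maxInclusive) bounds.

    `base` and `derived` are (low, high) pairs. The categories follow one
    reading of the resolutions above; the labels themselves are invented.
    """
    b_lo, b_hi = base
    d_lo, d_hi = derived
    # No value can satisfy both the base and the derived bounds.
    if d_lo > d_hi or d_hi < b_lo or d_lo > b_hi:
        return "error: empty value space"
    # The derived bounds admit values the base excludes.
    if d_lo < b_lo or d_hi > b_hi:
        return "error: broader than base"
    # Same bounds as the base: legal, though it restricts nothing.
    if (d_lo, d_hi) == (b_lo, b_hi):
        return "legal: vacuous restriction"
    return "legal: proper restriction"


for derived in [(2, 5), (1, 10), (0, 10), (0, 11), (11, 12)]:
    print(derived, "->", classify_restriction((1, 10), derived))
```

The sketch covers only numeric bounds; it says nothing about the pattern facet, which the WG treated separately.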
Should the structures spec be modified to forbid recursion in named content models?
Input from Richard Tobin:
Richard Tobin <richard@cogsci.ed.ac.uk> to XML Schema Comments list, Wed, 26 Apr 2000 15:12:10 +0100
The use of named model groups allows content models that are not regular expressions. For example:
<s:group name="recur">
  <s:sequence>
    <s:element name="open"/>
    <s:group ref="recur" minOccurs="0" maxOccurs="unbounded"/>
    <s:element name="close"/>
  </s:sequence>
</s:group>
Useful though this would be, it is probably not intended.
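A short sketch shows why the group above goes beyond regular expressions: recognizing it requires matching each open with its own close at arbitrary depth, which the recursive function below does and a finite automaton cannot. The function names and the representation of child elements as a list of names are assumptions made for illustration.

```python
def match_recur(names, pos=0):
    """Try to match one `recur` group starting at index `pos`.

    `names` is a list of child element names. Returns the index just past
    the matched group, or None if no group starts at `pos`.
    """
    if pos >= len(names) or names[pos] != "open":
        return None
    pos += 1
    # Zero or more nested `recur` groups (minOccurs=0, maxOccurs=unbounded).
    while True:
        after_nested = match_recur(names, pos)
        if after_nested is None:
            break
        pos = after_nested
    if pos < len(names) and names[pos] == "close":
        return pos + 1
    return None


def is_recur(names):
    """True if the whole sequence is exactly one `recur` group."""
    return match_recur(names) == len(names)


print(is_recur(["open", "open", "close", "open", "close", "close"]))
print(is_recur(["open", "open", "close"]))
```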
What happened to general entities? Should XML Schema be modified to make it possible to declare general entities?
Cf. Support declaration of character entities?
Input from Dario de Judicibus:
dejudicibus@it.ibm.com to XML Schema Comments list, Thu, 27 Apr 2000 12:41:19 +0200
Maybe this is trivial for those of you who joined this group a long time ago, but I cannot find info about:
in one of the old drafts of XML Schema, it was possible to define entities too. For example
<textEntity name='rights'>All rights reserved.</textEntity>
Why that is no more possible? And what can I use to define entities, now?
Is there an error in the description of the Attribute Declaration component? Why are the properties {min occurs} and {max occurs} optional?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 02:42:09 -0400
Why are {min occurs} and {max occurs} optional in Attribute Declaration?
The meaning of "absent" for these properties doesn't seem to be defined in 3.2 and I can't think of what it would be.
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 28 Apr 2000 08:54:07 +0100
Because they don't make sense on top-level declarations. It's entirely parallel to element declarations, just less obvious because we make Particle explicit between content model and element declaration, but leave what my parser calls AttributeUse implicit between type def and attr decl.
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 03:55:45 -0400
Because they don't make sense on top-level declarations.
So, could a sentence be added to this effect?
Is there an error in the description of the Ur-Type? Why is its {name} property shown as "Not specified?" Does that mean "absent"?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 04:16:55 -0400
Why is the {name} of the Ur-Type (as a Complex Type) in 3.4 shown as "Not specified"? How is this different from "absent"?
I assume that anonymous complex types in general have "absent" as their {name} or is this not true?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 28 Apr 2000 09:22:11 +0100
Why is the {name} of the Ur-Type (as a Complex Type) in 3.4 shown as "Not specified"? How is this different from "absent"?
No particular reason - should probably be 'absent'.
I assume that anonymous complex types in general have "absent" as their {name} or is this not true?
Correct.
Formal response to commentator observes that the question is now moot, since the urType now has an explicit name (see issue LC-74).
Is there an error in the links from the term "absent"? Why does it link to the glossary entry for "null"?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 04:18:37 -0400
Why does "absent" in the "Complex Type Definition of the Ur-Type" tableau (and possibly elsewhere) point to the definition for "null" in the glossary?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 28 Apr 2000 10:33:33 +0100
That's a bug, arising from a late decision to change from 'null' to 'absent'.
Should there be a simpleType definition of the urtype?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 04:20:18 -0400
I thought there used to be a Simple Type Definition of the Ur-Type? Should there be one in 3.13 parallel to the one for Complex Types in 3.4? Or am I missing something?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 28 Apr 2000 10:36:39 +0100
This is a tricky issue. The ur-type is neither simple nor complex. What's given in 3.4 is what it looks like as the base of a complex type definition. There's no way to show what it looks like as the base of a simple type definition, or rather, that would simply be a simple type definition with no values for any of the properties, which would not be terribly helpful.
I agree the prose explaining all this is less than satisfactory.
Input from James Tauber:
I get it totally now that you have explained to me, which seems to suggest that an additional sentence or two in the prose could do a great deal.
Should the DTD and schema for schemas define a processContents attribute for the anyAttribute element?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 07:28:18 -0400
In the XML Representation Summary for anyAttribute in 4.3.3, anyAttribute is not shown as taking a processContents attribute. The schema for schemas and the DTD for schemas do not allow it, either.
However, 1.1 of the {attribute wildcard} property correspondence mentions it.
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 28 Apr 2000 14:29:32 +0100
It should allow it, you're right.
##local stand alone in namespace attribute or must it be in a list?
Should the representation summary for the any element be changed to clarify the usage of ##local?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 09:16:47 -0400
In 4.3.7, in the representation summary for the "any" element information item, it indicates that the attribute "namespace" takes either:
##any | ##other | ##local | list of {uri, ##targetNamespace}
However, in the "otherwise" section of the correspondences to the Wildcard Schema Component, it refers to ##local as being a substring in the list (and this is in fact the only mention of ##local).
Should the representation summary really read:
namespace = ##any | ##other | list of {uri, ##targetNamespace, ##other}
?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 28 Apr 2000 14:33:10 +0100
Your analysis is correct. But I think you typoed above -- the corrected summary should read
namespace = ##any | ##other | list of {uri, ##targetNamespace, ##local}
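The corrected summary can be read as a small grammar. The sketch below parses an attribute value accordingly; the function name and the return representation (the string "any", a ("not", namespace) pair, or a set in which None stands for "no namespace") are assumptions for illustration, not the component model of the spec.

```python
def parse_wildcard_namespace(value, target_namespace):
    """Parse a wildcard `namespace` value per the corrected summary:

        ##any | ##other | list of {uri, ##targetNamespace, ##local}

    The return shapes are illustrative assumptions, not spec-defined.
    """
    tokens = value.split()
    if tokens == ["##any"]:
        return "any"
    if tokens == ["##other"]:
        return ("not", target_namespace)
    allowed = set()
    for token in tokens:
        if token == "##targetNamespace":
            allowed.add(target_namespace)
        elif token == "##local":
            allowed.add(None)  # None stands for "no namespace"
        elif token.startswith("##"):
            # ##any and ##other may not appear inside a list; any other
            # ##-keyword is unknown.
            raise ValueError(f"{token!r} is not allowed in a namespace list")
        else:
            allowed.add(token)  # an ordinary namespace URI
    return allowed


print(parse_wildcard_namespace("##local", "http://example.org/ns"))
print(parse_wildcard_namespace("##targetNamespace ##local", "http://example.org/ns"))
```

Under this reading ##local standing alone is simply a one-item list, which is consistent with Thompson's corrected summary.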
Should the meaning of the value-constraint property for attributes be clarified?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 15:52:55 -0400
In 3.2, the {value constraint} property of Attribute Declaration components seems to allow a (value, absent) pair. This is supported by the property correspondences for "attribute" element information items (4.3.1) directly under "schema" where {value constraint}'s representation is:
If there is a value attribute, then a pair consisting of the lexical value of that attribute and absent, otherwise absent
What is the meaning of a {value constraint} that is a (value, absent) pair?
In this particular case, it seems the value of the "use" attribute is ignored, but doesn't it default to "optional"?
Should the word parent be changed to ancestor in the representation of the {target namespace} property for attribute declaration components (case 2)?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Fri, 28 Apr 2000 16:10:56 -0400
In the second case in 4.3.1, a representation is given for when the attribute element information item is not a direct child of schema (that's the first case). However, in the representation of {target namespace} it refers to the targetNamespace attribute of the parent schema element.
Should this say "ancestor"?
Should the URI reference used for the XML Schema namespace change with each revision of the spec? Or not? Does the status of the spec (draft, CR, PR, Rec) matter?
Cf. Use 2000 not 1999 in XML Schema namespace name?
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 28 Apr 2000 23:04:12 +0100
"Joseph M. Reagle Jr." <reagle@w3.org> writes:
I used to both schema and DTD validate, but I didn't realize these things had moved. I'll try using these URLs and see if it still works. However, this policy of locating the schema and DTD at the namespace is pretty confusing. I appreciate you don't want to change the namespace every time you issue a new draft and I hope you would try every time you made a substantive change, because now the result is that even if I write my XML instance that (today) validates under [1,2] next time you put out a new draft it won't! Before, not updating your namespace violated a philosophical point (but the actual dtd and schema were in a more specific (month) date space). Now you are violating a more practical point, if I have an example that works now based on something in date space it won't in the future. (I think, right?)
[1] http://www.w3.org/1999/XMLSchema.dtd
[2] http://www.w3.org/1999/XMLSchema.xsd
Well, it's a difficult point. I'd say we don't have a namespace yet, we're just working towards having one, and using the same name so people will get used to it as the XML Schema namespace.
I appreciate your point about things going stale -- if we ever make a backwards incompatible change, we'll put the old schema and dtd somewhere in date space and you can use them for your stale documents.
We're in new territory here (never before has there been a definitive and operational anything at a namespace URI). Dan and I have been making policy by the seat of our pants; by all means let's keep discussing this, but move the archiving to the public comments list (see addresses above).
Input from Dan Connolly:
Dan Connolly <connolly@w3.org> to XML Schema Comments list, Fri, 28 Apr 2000 17:48:22 -0500
"Henry S. Thompson" wrote:
Well, it's a difficult point. I'd say we don't have a namespace yet, we're just working towards having one, and using the same name so people will get used to it as the XML Schema namespace.
Really? That seems like an odd way of looking at things, to me. It's quite clear to me that we have a namespace; the definition is what you get back from http://www.w3.org/1999/XMLSchema.
And if we change the definition, it's sort of antisocial to re-use the old address, as Joseph has observed. Though this is work-in-progress stuff, and we don't really plan to support drafts once they've been superseded, we might as well avoid the sort of problems Reagle is having when it's easy to do.
I appreciate your point about things going stale -- if we ever make a backwards incompatible change, we'll put the old schema and dtd somewhere in date space and you can use them for your stale documents.
Again, that seems backwards; if we make backwards-incompatible changes, the thing to do is to leave the old one alone and use a new identifier for the new definition. It's not really fair to expect folks to change pointers in old documents.
If we decide to break their old documents, i.e. to not support them, that's one thing. But the way to support them, if we're going to go to any trouble at all, is to just leave the old definition in place if we make a new, incompatible one.
Input from Joseph M. Reagle Jr.:
Joseph M. Reagle Jr. <reagle@w3.org> to XML Schema Comments list, Fri, 28 Apr 2000 18:15:46 -0400
At 23:04 2000-04-28 +0100, Henry S. Thompson wrote: I appreciate your point about things going stale -- if we ever make a backwards incompatible change, we'll put the old schema and dtd somewhere in date space and you can use them for your stale documents.
But I can't touch my WD in the TR space, which used to work and won't when you make these changes.
Input from Noah Mendelsohn:
Noah_Mendelsohn@lotus.com to XML Schema Comments list, Fri, 28 Apr 2000 19:12:23 -0400
Dan, I think you're presuming answers to a versioning architecture for XML and namespaces. I believe that to be a known hard problem which, in spite of my suggestions to the contrary [1], the XML activity has so far declined to formally consider (I believe it was discussed informally at a CG meeting, perhaps in Montreal last year.)
Everything you propose, i.e. immutable namespaces, makes sense in isolation. The problem I see is that none of the other necessary XML machinery has been developed. Let's assume that some particular vocabulary undergoes within a year 20 minor modifications, mostly bug fixes, introducing little incompatibilities that are not of concern to the vast majority of users. So, over the course of the year, 100,000 documents are written to this vocabulary, 5,000 in each of the 20 namespaces. Question: how do I build and maintain 30 XSL stylesheets that do the right thing with these documents? For the sake of discussion, none of the stylesheets happen to make use of any of the features that were affected by the 20 bug fixes. Were it not for the decision to make namespaces immutable, a single set of 30 stylesheets would suffice, and none of the 30 would have required change through the year. Presuming immutable namespaces, which do indeed have many desirable architectural properties, I either need 600 stylesheets (30 useful sheets x 20 namespaces used in the instances), or some rather messy disjunctions in each of my XPaths.
I do not propose that we go into an extensive discussion of versioning here. I merely wish to agree with Henry that the answers are far from clear, and in that sense we are feeling our way. I think we are far from having worked out the practical ramifications of any particular fixed design for versioning, including any that might be based on immutable namespaces. It is my opinion that almost anything practical we do for robust versioning of XML vocabularies will require some serious engineering in one or another of our existing XML specifications (e.g. XPath, if you believe the analysis above). Pending such developments, I think we in the schemas group will have to make decisions that are somewhat ad hoc at times, perhaps republishing minor fixes as changes to the same namespace, with some means of deploying new ones for major changes. In short, I think we are about to get bitten by an overall lack of investment in figuring out how to do namespace and vocabulary versioning in a robust manner. Maybe I am just being too pessimistic.
[1] http://lists.w3.org/Archives/Member/w3c-xml-plenary/1999Oct/0019.html (My apologies to those on the schema comments list who cannot access this member-only e-mail archive. The note basically suggests that versioning is an important problem that will rear its head soon, and points out some of the issues to be considered.)
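The arithmetic in the posting above, and the "messy disjunction" alternative, can be illustrated with a small sketch. This is only an illustration, not part of the original posting: the namespace URIs under http://example.org/, the element names order and item, and the helper find_in_any_namespace are all hypothetical, and Python's standard ElementTree module stands in for an XSLT/XPath processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical: one immutable namespace URI per minor revision of the vocabulary.
NAMESPACES = [f"http://example.org/vocab/v{n}" for n in range(1, 21)]
STYLESHEETS = 30

# Option 1 from the posting: one copy of every stylesheet per namespace.
copies_needed = STYLESHEETS * len(NAMESPACES)


def find_in_any_namespace(root, local_name):
    """Option 2: match an element by local name in any of the known namespaces.

    This plays the role of the disjunction that each XPath would need.
    """
    wanted = {f"{{{ns}}}{local_name}" for ns in NAMESPACES}
    return [el for el in root.iter() if el.tag in wanted]


# Two instances written against different revisions of the same vocabulary.
doc_v3 = ET.fromstring(
    '<order xmlns="http://example.org/vocab/v3"><item/><item/></order>'
)
doc_v17 = ET.fromstring(
    '<order xmlns="http://example.org/vocab/v17"><item/></order>'
)

items_v3 = find_in_any_namespace(doc_v3, "item")
items_v17 = find_in_any_namespace(doc_v17, "item")
```

The set of accepted names has to grow with every new revision, which is the maintenance cost the posting describes.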
Input from Dan Connolly:
Dan Connolly <connolly@w3.org> to XML Schema Comments list, Fri, 28 Apr 2000 20:44:40 -0500
Noah_Mendelsohn@lotus.com wrote:
Dan, I think you're presuming answers to a versioning architecture for XML and namespaces.
I'm observing an answer. Not the only answer, but one that is known to work (i.e. to avoid the problem Joseph ran into).
I believe that to be a known hard problem which, in spite of my suggestions to the contrary [1], the XML activity has so far declined to formally consider (I believe it was discussed informally at a CG meeting, perhaps in Montreal last year.)
Declined to consider versioning? Hardly! Evolution of specs is one of W3C's core values. cf "Web Architecture: Extensible Languages" http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210
I guess it could depend on your definition of 'formal'. But... I look hard at forward/backward compatibility of all the specs, and I'm not the only one. We insisted on some last-minute changes to XSLT (a way to make xslt:message act as a halt-and-catch-fire instruction) for exactly this reason.
Everything you propose, i.e. immutable namespaces, makes sense in isolation. The problem I see is that none of the other necessary XML machinery has been developed. Let's assume that some particular vocabulary undergoes within a year 20 minor modifications, mostly bug fixes, introducing little incompatibilities that are not of concern to the vast majority of users. So, over the course of the year, 100,000 documents are written to this vocabulary, 5,000 in each of the 20 namespaces. Question: how do I build and maintain 30 XSL stylesheets that do the right thing with these documents?
I think you've answered your own question. You just do. It's hard and awkward, but clearly it's possible.
The answer I'm talking about meets some requirements (namely, that you can write a document and be assured that it will be interpreted consistently henceforth) but doesn't meet others, i.e. easy maintenance of stylesheets.
For the sake of discussion, none of the stylesheets happen to make use of any of the features that were affected by the 20 bug fixes. Were it not for the decision to make namespaces immutable, a single set of 30 stylesheets would suffice, and none of the 30 would have required change through the year. Presuming immutable namespaces, which do indeed have many desirable architectural properties, I either need 600 stylesheets (30 useful sheets x 20 namespaces used in the instances), or some rather messy disjunctions in each of my XPaths.
Right.
I do not propose that we go into an extensive discussion of versioning here. I merely wish to agree with Henry that the answers are far from clear, and in that sense we are feeling our way.
I agree that the whole general problem of language evolution is messy. But I maintain that there's one mechanism, immutable resources, that's known to avoid the problem Joseph ran into.
I think we are far from having worked out the practical ramifications of any particular fixed design for versioning, including any that might be based on immutable namespaces. It is my opinion that almost anything practical we do for robust versioning of XML vocabularies will require some serious engineering in one or another of our existing XML specifications (e.g. XPath, if you believe the analysis above). Pending such developments, I think we in the schemas group will have to make decisions that are somewhat ad hoc at times, perhaps republishing minor fixes as changes to the same namespace, with some means of deploying new ones for major changes.
Sure... I just disagree that Henry's approach of "we'll put the old one someplace that you can find it" is very useful.
In short, I think we are about to get bitten by an overall lack of investment in figuring out how to do namespace and vocabulary versioning in a robust manner. Maybe I am just being too pessimistic.
If we had to design all parts of the Web before we deployed anything, where would we be? I guess I have a little more faith that economical solutions will present themselves in a timely fashion ;-)
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 29 Apr 2000 10:13:24 +0100
I'll make three observations:
1) In the nicest possible way, I'm not sure what we owe to Joseph on this one. We have not published this namespace yet, in the sense of asserting publicly in a process-blessed way that http://www.w3.org/1999/XMLSchema is the namespace URI for a namespace whose semantics are defined in some officially approved REC.
2) The thrust of Dan's remarks are at odds with my memory of the discussions surrounding the decision to enforce http://www.w3.org/1999/XSL/Transform as the namespace URI for XSLT, which strongly suggested a philosophical stance about persistent identity clearly at odds with the practical reality of changing details. In particular the decision to not add a version number to that NS URI was taken after considerable thought.
I'm afraid this takes us back to my concern about too-tight connection between namespace URI and presumed resource (= XML Schema in this case) location.
3) There's certainly precedent for 'stable' resources evolving: the XML Spec DTD still lives at http://www.w3.org/XML/1998/06/ although it's currently in its 21st edition.
Input from Dan Connolly:
Dan Connolly <connolly@w3.org> to XML Schema Comments list, Sat, 29 Apr 2000 09:27:15 -0500
"Henry S. Thompson" wrote:
... 1) In the nicest possible way, I'm not sure what we owe to Joseph on this one. We have not published this namespace yet, in the sense of asserting publicly in a process-blessed way that http://www.w3.org/1999/XMLSchema is the namespace URI for a namespace whose semantics are defined in some officially approved REC.
While it's true that we're not 100% bound to support old drafts, I hope that the spec will gradually stabilize; i.e. that we'll gradually make more of an attempt to avoid problems like the ones Joseph ran into.
2) The thrust of Dan's remarks are at odds with my memory of the discussions surrounding the decision to enforce http://www.w3.org/1999/XSL/Transform as the namespace URI for XSLT, which strongly suggested a philosophical stance about persistent identity clearly at odds with the practical reality of changing details. In particular the decision to not add a version number to that NS URI was taken after considerable thought.
I'm not sure what you mean by "enforce", but perhaps that's no matter...
My remarks aren't at odds with the way the XSLT namespace works; they just don't apply. The XSL WG decided not to promise that the XSLT namespace won't change. I think they made some promise about never changing the namespace in such a way that would change the semantics of a stylesheet that conforms to XSLT 1.0, but they left open the possibility of backward-compatible changes to the namespace (where "backwards-compatible change" is not formally specified).
I have not said that every namespace resource must be immutable. I have just said that using immutable resources as namespaces has some desirable characteristics, including avoiding the "that schema just changed out from under me" problem that Joseph ran into.
I'm afraid this takes us back to my concern about too-tight connection between namespace URI and presumed resource (= XML Schema in this case) location.
In any particular way? Or just general unease?
3) There's certainly precedent for 'stable' resources evolving: the XML Spec DTD still lives at http://www.w3.org/XML/1998/06/ although it's currently in its 21st edition.
I'm not sure what you mean by 'stable'. That resource isn't "stable published" in the Ted Nelson sense; it changes whenever Eve feels like it. Contrast that with http://www.w3.org/TR/1998/REC-xml-19980210 or http://www.ietf.org/rfc/rfc0822.txt which are guaranteed by their publishers not to change, or mid:f5b66t16zij.fsf@cogsci.ed.ac.uk which is guaranteed by the definition of the URI scheme not to change.
Input from Joseph M. Reagle Jr.:
Joseph M. Reagle Jr. <reagle@w3.org> to XML Schema Comments list, Mon, 01 May 2000 15:39:35 -0400
At 09:27 2000-04-29 -0500, Dan Connolly wrote: BTW: Has there ever been any consideration in schema (I thought I saw this but don't see it presently) to include a location attribute in the schema element type? If a namespace need not be dereferencable (like PUBLIC) then wouldn't it make sense to include a SYSTEM as well?
Found it, I knew that I saw it before: 2.6.3 xsi:schemaLocation, xsi:noNamespaceSchemaLocation The xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes http://www.w3.org/TR/xmlschema-1/#xsi:schemaLocation
So including these attributes in the schema for schema example in the spec would be helpful.
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 01 May 2000 21:42:20 +0100
I'm confused by the above extracts, perhaps the context would have clarified things.
SYSTEM and PUBLIC are part of the mechanisms by which an XML instance points to external entities, in particular to (the external subset of) DTDs.
The analogous aspects of XML instances for schemas are xsi:schemaLocation and xmlns, where xsi is declared as http://www.w3.org/1999/XMLSchema-instance.
You can put a schemaLocation attribute from that namespace on any element in any document without it needing a declaration.
There's a lengthy discussion of all this in section 3 of chapter 6 of the spec. [1].
I'm slightly at a loss to know exactly what you are asking for -- are you suggesting that the schema for schemas should be modified to include a namespace declaration for http://www.w3.org/1999/XMLSchema-instance and a schemaLocation attribute from that namespace? We could do that, but since the schema which applies to the schema for schemas, which is of course itself (:-), actually does live at its namespace URI, there didn't seem any point.
I'm also perplexed by the analogy with SYSTEM and PUBLIC, in that in vanilla XML 1.0, you don't find those inside DTD documents at all.
But maybe that's not what you meant.
Input from Joseph M. Reagle Jr.:
Joseph M. Reagle Jr. <reagle@w3.org> to XML Schema Comments list, Tue, 02 May 2000 18:38:42 -0400
I'm not surprised as I wasn't being very cogent. <smile> I was thinking something like the following would've mitigated my confusion (particularly as represented in the spec so I would know where to find things).
<?xml version='1.0'?>
<!-- XML Schema schema for XML Schemas: Part 1: Structures -->
<!DOCTYPE schema PUBLIC "-//W3C//DTD XMLSCHEMA 19991216//EN"
          "http://www.w3.org/1999/XMLSchema.dtd">
<schema xmlns="http://www.w3.org/1999/XMLSchema"
        targetNamespace="http://www.w3.org/1999/XMLSchema"
        blockDefault="#all"
        elementFormDefault="qualified"
        version="Id: XMLSchema.xsd,v 1.1 2000/04/06 13:51:05 aqw Exp"
        xsi:schemaLocation="http://www.w3.org/1999/XMLSchema.xsd">
Input from Henry S. Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 03 May 2000 09:36:37 +0100
Right, we could have done that, and anticipating that there may be processors which choose never to attempt dereferencing namespace URIs, perhaps we should.
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:14 -0400
The fact that you are using the same namespace "http://www.w3.org/1999/XMLSchema" across different specifications with substantively different syntaxes may cause problems for applications that expect the definition of a dated namespace to be stable. See http://lists.w3.org/Archives/Public/xmlschema-dev/2000Apr/0026.html for more discussion on this topic.
Discussed in call of 2000-07-13.
The first question concerns the policy we should follow with regard to changes in our namespace and changes in our spec. Frequent changes inconvenience implementors and schema authors; infrequent changes can lead to confusion and silent invalidation of existing data.
This question leads us to further questions about our own evolution policy. Some WG members would like to adopt a forward-compatibility policy similar to that of XSLT. Some would like to have two namespace names, with different policies for changing them.
On question 1, MSM suggested there are several policies we could follow:
On question 2 (compatibility policy), shall we introduce a policy similar to that in XSLT, or shall we not?
On question 3 (one NS name or two?), shall we or shall we not?
After discussion, a straw poll was taken, which showed that there was not consensus on these issues.
We did at least agree on one thing:
RESOLVED unanimously: to make the target namespace of the schema for schemas be http://www.w3.org/2000/07/xml-schema (or 08, or whatever month is appropriate) in our next non-CR Working Draft.
Should XML Schema define a specific name for the urtype?
Input from Noah Mendelsohn:
Noah_Mendelsohn@lotus.com to XML Schema Comments list, Fri, 28 Apr 2000 19:24:14 -0400
I would like to raise a new issue for consideration in last call of XML schema. I think this request is based on new information, and I request that it be placed on our list for consideration.
As James Tauber has noted, we do not have an explicit name for the urType. The new information is that, during the design of the latest SOAP specification [1], we realized that even if the schema language itself can get by without a string name for the type, other systems have real need for such a name.
In a nutshell, SOAP has its own mechanisms for declaring array-like structures beyond those currently offered by schemas themselves. So, you will see SOAP elements labeled with attributes like:
SOAP-ENC:arrayType="xsd:int[3]"
which refers to an array encoded as three XML elements each of which is conformant with xsd:integer. Or:
SOAP-ENC:arrayType="yourNS:address[3]"
I happen not to be thrilled with that particular syntax, because I would prefer explicit markup and not to have QNames buried within a string, but neither of those is the issue here. The requirement is for some means of saying things like:
SOAP-ENC:arrayType="xsd:urType[3]"
to indicate an array of three elements each of which must be a (subtype of) our urType. As with our schema design, the intention is to allow both simple types and complex types in the instance, so it truly is our notion of urType.
I believe that SOAP is not the only system that will emerge with such a requirement.
If we as a workgroup decide not to provide such a name for the type, then it is likely that SOAP will wind up defining something like SOAP-ENC:urType with an indication that it refers to the urType of XML schemas (indeed, that was supposed to make it into the SOAP 1.1 specification and didn't quite because the problem was noticed too late.) I think everyone involved believes that would be undesirable. So, the request is for an officially-blessed name for the urType. (Note: this is not a request to try to split into separate urTypes for simple and complex... just a name for what we have got.)
By the way, I think most ordinary mortals will find the term urType to be unduly obscure. Perhaps something like "xsd:base"? I'm sure we could amuse ourselves with a little name-the-urType contest. Thank you very much.
[1] http://www.ibm.com/software/developer/library/soap/soapv11.html
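The arrayType values quoted in the posting above bury a QName and a size inside one string, so a consumer has to take the string apart before it can resolve the type name. A minimal sketch of that step follows. It is an assumption-laden illustration: the name parse_array_type and the ArrayType tuple are hypothetical, only the single-dimension prefix:name[size] shape shown in the posting is handled, and the name pattern is a rough ASCII approximation of an NCName rather than the real production.

```python
import re
from typing import NamedTuple


class ArrayType(NamedTuple):
    """The three parts packed into an arrayType string (hypothetical type)."""
    prefix: str
    local_name: str
    size: int


# Rough ASCII approximation of prefix:localName[size]; not the full NCName rule.
_ARRAY_TYPE = re.compile(
    r"^(?P<prefix>[A-Za-z_][\w.-]*):(?P<local>[A-Za-z_][\w.-]*)\[(?P<size>\d+)\]$"
)


def parse_array_type(value: str) -> ArrayType:
    """Split a value such as 'xsd:int[3]' into prefix, local name and size."""
    match = _ARRAY_TYPE.match(value.strip())
    if match is None:
        raise ValueError(f"not a recognised arrayType value: {value!r}")
    return ArrayType(match["prefix"], match["local"], int(match["size"]))


ints = parse_array_type("xsd:int[3]")
anything = parse_array_type("xsd:urType[3]")
```

The prefix still has to be resolved against the in-scope namespace declarations of the element carrying the attribute, which is one reason the posting prefers explicit markup to QNames inside strings.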
Input from Curt Arnold:
On this issue of urType, I think I would prefer an explicit declaration of an urType and the derivation of all types in a domain from it.
<!-- circular reference could indicate a urType -->
<complexType name="urType" base="urType">
  <any minOccurs="0" maxOccurs="unbounded"/>
  <anyAttribute use="optional"/>
</complexType>
<!-- simpleType can inherit from complexType as long as the complex type
     has no required attributes or content. Could restrict it further
     to self references. -->
<simpleType name="urSimpleType" base="urType"/>
<!-- this could replace content="empty" and serve as the basis
     for most derivations by extension -->
<complexType name="empty" base="urType" derivedBy="restriction">
  <any maxOccurs="0"/>
  <anyAttribute use="prohibited"/>
</complexType>
My entry for the name the ur contest is root or rootType.
Discussed in call of 2000-06-29.
The WG discussed this issue, and distinguished four questions: Shall we provide a name for the urType? Shall we provide names for both the 'basic' urType and the 'urSimpleType'? If so, what names should be used? Should the schema for schemas provide definitions of these types?
RESOLVED: to provide names for both the 'basic' urType and the 'simple' urType. Dissenting: Biron
On the question of providing declarations, the WG was evenly divided among pro, con, and uncertain. This question should be discussed by email; we will come back to it in Friday's meeting and if we can reach a conclusion, we will; otherwise we will launch further email discussion and put the issue at the bottom of the schedule.
The question of names was discussed by email and in the face to face meeting of 1-2 August 2000.
A straw poll was taken on the various proposals. A head to head comparison of the top two candidate terms showed 14 in favor of anyType, 11 in favor of urType. For the basic simple type, anySimpleType was chosen without objection.
Commentator confirms that decision is OK.
Is it an appropriate use of appinfo annotations to use them to store application-specific integrity constraints (e.g. SQL CHECK constraints)?
Cf. XML Schema considered inadequately extensible
Cf. Provide guidance on extending schema for schemas?
Input from Vun Kannon, David:
"Vun Kannon, David" <dvunkannon@kpmg.com> to XML Schema Comments list, Mon, 1 May 2000 16:28:00 -0400
I am considering, as the subject line says, using appinfo annotations to store integrity constraints. Consider a document as the transfer syntax for a database predicate. An integrity constraint might be "no worker earns more than their supervisor" or "pay_rate > 0". These integrity constraints could be expressed as CHECK constraints in SQL, for instance.
I was considering trying to achieve the same effect with XSL-T templates in appinfo elements. Unfortunately, it appears that even in the April 7 draft, annotation and appinfo are poorly documented. Annotation is used but not defined in either the schema for schemas or DTD, and appinfo (and documentation) similarly. What is the content model ()+ supposed to mean, in sec 4.3.10?
Your comments appreciated on the appropriateness of the idea, and my understanding of appinfo.
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 01 May 2000 21:46:13 +0100
"Vun Kannon, David" <dvunkannon@kpmg.com> writes:
I am considering, as the subject line says, using appinfo annotations to store integrity constraints. Consider a document as the transfer syntax for a database predicate. An integrity constraint might be "no worker earns more than their supervisor" or "pay_rate > 0". These integrity constraints could be expressed as CHECK constraints in SQL, for instance.
That's exactly the sort of thing appinfo is designed for. Sorry the documentation is less complete in this area than it should be.
As the schema for schemas reveals, the content model for appinfo is constrained only in so far as it may not contain elements from the XML Schema namespace itself -- anything else, in any combination, is fine.
So declare a namespace at the top of your schema, and put whatever you like from that namespace inside appinfo. If you give your schema validator a schema for that namespace as well as the schema for schemas, the contents of appinfo from that namespace will get schema-validated as well.
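As a rough illustration of the approach described above, the sketch below puts a constraint from a separate namespace inside appinfo and then reads it back. Everything application-specific here is an assumption: the namespace http://example.org/constraints, the db:check element and the collect_checks helper are invented for the example, and the schema fragment has not been checked against the draft's schema for schemas.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/1999/XMLSchema"
APP_NS = "http://example.org/constraints"  # hypothetical application namespace

SCHEMA = """\
<schema xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:db="http://example.org/constraints">
  <element name="worker">
    <annotation>
      <appinfo>
        <db:check>pay_rate &gt; 0</db:check>
      </appinfo>
    </annotation>
  </element>
</schema>
"""


def collect_checks(schema_text):
    """Return the text of every db:check element found inside an appinfo."""
    root = ET.fromstring(schema_text)
    checks = []
    for appinfo in root.iter(f"{{{XSD_NS}}}appinfo"):
        for child in appinfo:
            # Only elements from the application namespace are of interest here.
            if child.tag == f"{{{APP_NS}}}check":
                checks.append((child.text or "").strip())
    return checks


checks = collect_checks(SCHEMA)
```

An application would then hand each collected string to whatever evaluates its constraints; a schema processor that does not know the namespace can ignore the appinfo content.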
Input from Vun Kannon, David:
"Vun Kannon, David" <dvunkannon@kpmg.com> to XML Schema Comments list, Wed, 3 May 2000 11:53:02 -0400
Got it.
Formal response. Commentator replies "I do consider Henry's explanation an adequate answer to my question."
Should XML Schema add a mechanism to allow the schema author to define an attribute as having a value inherited from a specific attribute (e.g. one with the same name) on an enclosing element?
Input from Vun Kannon, David:
"Vun Kannon, David" <dvunkannon@kpmg.com> to XML Schema Comments list, Wed, 3 May 2000 11:53:02 -0400
I'm also thinking about appinfo for what I've taken to calling my "#IMPLIED resolution strategy". I want my schema to state explicitly what to do if a particular attribute is absent from an instance of an element for which that attribute is declared. For instance, the strategy that most of my attributes follow is:
ancestor-or-self::*[@implied-attribute][1]/@implied-attribute
which means that the attribute must exist somewhere among the containing elements. Choosing the nearest value gives this useful behavior that the attribute can be given a default, then the default overridden for any subtree.
Is this idea of "absent attribute resolution strategy" useful enough to all schemas that it should be part of XSchema itself, as opposed to hiding it under the appinfo bushel?
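The resolution strategy in the posting above (take the attribute from the nearest ancestor-or-self element that carries it) can be sketched outside XPath as follows. This is an illustration only: resolve_inherited, the sample document and the currency attribute are hypothetical, and because ElementTree elements have no parent pointer the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET


def resolve_inherited(root, element, attr_name):
    """Return attr_name from the nearest ancestor-or-self element that has it.

    Returns None when no enclosing element carries the attribute.
    """
    # ElementTree has no parent links, so build a child -> parent lookup.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        if attr_name in node.attrib:
            return node.attrib[attr_name]
        node = parent_of.get(node)
    return None


doc = ET.fromstring(
    '<report currency="USD">'
    '<section><para id="p1"/></section>'
    '<section currency="EUR"><para id="p2"/></section>'
    '</report>'
)
p1 = doc.find(".//para[@id='p1']")
p2 = doc.find(".//para[@id='p2']")

inherited_default = resolve_inherited(doc, p1, "currency")
overridden = resolve_inherited(doc, p2, "currency")
```

Here the root supplies a default and the second section overrides it for its own subtree, which is the behavior the posting calls useful.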
Input from Rick JELLIFFE:
Rick JELLIFFE <ricko@geotempo.com> to XML Schema Comments list, Thu, 04 May 2000 01:32:15 +0800
From: Vun Kannon, David (dvunkannon@kpmg.com) Is this idea of "absent attribute resolution strategy" useful enough to all schemas that it should be part of XSchema itself, as opposed to hiding it under the appinfo bushel?
Being able to declare that an attribute should have a value inherited from its parents (if that is the idea) was considered but not taken up. It would have been useful for xml:lang too. The line has to be drawn somewhere, of course (in the absence of some kind of extensible implied-attribute-value resolution framework).
Should the has-facet and has-property elements of part2.xsd be placed outside the main XML Schema namespace (e.g. in order to allow them to occur within the appinfo element)?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Mon, 1 May 2000 23:48:22 -0500
hst wrote:
As the schema for schemas reveals, the content model for appinfo is constrained only in so far as it may not contain elements from the XML Schema namespace itself -- anything else, in any combination, is fine.
I was just thinking about this, the has-facet and has-property elements in part2.xsd should be outside of the XMLSchema namespace.
Discussed in call of 2000-06-30.
An answer of 'yes' is entailed by our decision on issue LC-122. This issue should be closed.
Should types derived by the list constructor be defined as derivedBy="list" or as derivedBy="xsd:list"?
Cf. Proper home namespace/resource for built-in datatypes
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 02 May 2000 09:10:34 +0100
"Falk, Alexander" <falk@icon.at> writes:
Interesting question... I've looked at both the primer and the (normative) schema for schemas and DTD for schemas. The first seems to indicate that 'xsd:list' should be used, but the normative meta-schema and DTD clearly say that it should be 'list' all by itself.
Maybe someone else can shed some authoritative light on this: if the schema-namespace is defined with prefix xsd, and we have the following simpleType definition
<xsd:simpleType base="PermissionType" derivedBy="?????"/>
should the derivedBy be 'list' (like the meta-schema and DTD say), or should it be 'xsd:list' (like the primer says)?
My opinion is that the primer is the mistaken one here. The 'derivedBy' attribute has an enumerated type, with values 'extension', 'list' and 'restriction'. It's not a reference, which would have type QName.
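The distinction drawn above can be made concrete with a small check. The sketch assumes only what the posting states, namely that derivedBy is an enumerated token with the values extension, list and restriction; the helper name derived_by_is_valid is hypothetical, and the check is a plain string comparison, not schema validation.

```python
import xml.etree.ElementTree as ET

# The three values named in the posting above.
DERIVED_BY_VALUES = frozenset({"extension", "list", "restriction"})


def derived_by_is_valid(element):
    """True if the element's derivedBy attribute is one of the enumerated tokens.

    The value is compared as a plain string; no prefix is resolved, because
    an enumerated value is not a QName reference.
    """
    return element.get("derivedBy") in DERIVED_BY_VALUES


XSD = 'xmlns:xsd="http://www.w3.org/1999/XMLSchema"'
plain = ET.fromstring(
    f'<xsd:simpleType {XSD} base="PermissionType" derivedBy="list"/>'
)
prefixed = ET.fromstring(
    f'<xsd:simpleType {XSD} base="PermissionType" derivedBy="xsd:list"/>'
)

plain_ok = derived_by_is_valid(plain)
prefixed_ok = derived_by_is_valid(prefixed)
```

Under this reading the prefixed form fails simply because the string xsd:list is not one of the three tokens.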
There appears to be stray data in the middle of the schema in Datatypes section A (an annotation element, two enumeration elements, and an orphaned end-tag for a simpleType element, between the definitions of the complex types annotated and simpleType).
Input from Dan Vint:
Dan Vint <DVint@lexica.net> to XML Schema Comments list, Tue, 2 May 2000 07:56:33 -0700
I've been trying to use the schema as presented in Appendix A of Part 2 and have come across this piece of data that seems to be out of place or at least missing some more markup:
<complexType name="annotated" base="openAttrs" derivedBy="extension" ...
<!-- Error here!
<annotation>
  <documentation>All the things which can occur in any of the attributes controlling derivation or use of derived definitions</documentation>
  <documentation>A utility type, not for public use</documentation>
</annotation>
<enumeration value="list"/>
<enumeration value="restriction"/>
</simpleType>
-->
<complexType name="simpleType" base="annotated" derivedBy="extension" ...
Between the two complexType definitions is this annotation and simpleType that is broken.
There appears to be a stray slash in the start-tag for the declaration of the base attribute within the declaration of the complex type named simpleType, in Datatypes section A. The start-tag looks like an empty-element tag, but the element is not empty.
Input from Dan Vint:
Dan Vint <DVint@lexica.net> to XML Schema Comments list, Tue, 2 May 2000 08:00:53 -0700
In the Appendix A (of part 2) the definition of this complexType has the attribute base as an empty tag, but the close tag appears later.
<complexType name="simpleType" base="annotated" derivedBy="extension" abstract="true">
  <element ref="facet" minOccurs="0" maxOccurs="unbounded"/>
  <attribute name="name" type="NCName">
    <annotation>
      <documentation>Can be restricted to required or forbidden</documentation>
    </annotation>
  </attribute>
  <attribute name="base" type="QName" use="required"/>
    <simpleType base="NMTOKEN">
      <enumeration value="list"/>
      <enumeration value="restriction"/>
    </simpleType>
  </attribute>
</complexType>
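A plain well-formedness check is enough to surface the problem reported above, since the stray slash leaves an end-tag with no matching start-tag. The sketch below is an illustration under assumptions: the fragment is a trimmed copy of the reported text, is_well_formed is a hypothetical helper, and the "fixed" variant simply removes the slash, which may or may not be the correction the editors intended.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the reported fragment; the attribute start-tag ends in "/>"
# even though a matching </attribute> end-tag follows later.
BROKEN = """\
<complexType name="simpleType">
  <attribute name="base" type="QName" use="required"/>
    <simpleType base="NMTOKEN">
      <enumeration value="list"/>
      <enumeration value="restriction"/>
    </simpleType>
  </attribute>
</complexType>
"""

# Assumed repair: drop the slash so the start-tag opens a non-empty element.
FIXED = BROKEN.replace('use="required"/>', 'use="required">')


def is_well_formed(text):
    """True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


broken_ok = is_well_formed(BROKEN)
fixed_ok = is_well_formed(FIXED)
```

Running such a check over the schema text extracted from the appendix would also catch the orphaned simpleType end-tag reported in the previous issue, once the surrounding comment markers are removed.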
The separate DTD and XSD files which should be identical to the text given in the appendices of parts 1 and 2 appear not to be identical. What gives?
Input from Dan Vint:
Dan Vint <DVint@lexica.net> to XML Schema Comments list, Tue, 2 May 2000 08:35:46 -0700
1) On the webpage http://www.w3.org/TR/xmlschema-1/ at the top are links to:
"with separate provision of the schema and DTD for schemas described herein. "
The thought was these would be the same as the Appendix versions of the schema and DTD; I believe they are at least different versions. The relationship between these links and the later appendix information should be clarified and a clearer statement about these top level links should be made as well.
2) In the Appendix for part 1 and part 2, the SCCS ID variables show different names for the actual files than what appear to be the specified INCLUDE and DOCTYPE values for these files.
The current draft of XML Schema appears to have insufficient support for extensions to the language. This could be remedied by defining a syntax for associating active service packages with particular elements.
Cf. Using appinfo annotations to store integrity constraints
Cf. Provide guidance on extending schema for schemas?
Input from Robert Miller:
"Miller, Robert (GXS)" <Robert.Miller@gxs.ge.com> to XML Schema Comments list, Tue, 2 May 2000 15:43:11 -0400
I suppose my greatest concern is that the capabilities represented in the Schema work are not further extendible without also extending the Schema syntax. That's a steep hill for proposed new extensions to climb, and will likely act as a squelch on such extensions. As one who sees shortcomings in what is supported in the current Schema work, I find the closed Schema syntax disturbing.
...
Amid the complexity of the Schema specification is some much wished for capability, and I've been among those making wishes, as the DTD capability provides little of what is needed for Business Information Exchange. But as much as I want such capability, I fear that Schema is all we'll get, it won't be enough, and we'll have to pass it by for something better. That would be disheartening.
A design more in keeping with my desires for extensibility would define a syntax by which active service packages could be associated with XML elements. Edit constraints might be one such service, and one which might (and should) be pre-defined for use. The addition of new services would not require a change to the XML Schema syntax, it would simply require the definition of the new service and access to the process(es) supporting the extended service.
If a service approach were to be considered, some thought should be given to other services that might be desired (such as an array processing service), such that service syntactic support needs are adequately addressed in the underlying Schema syntax, even if the considered services are not fully defined and implemented.
Discussed in call of 2000-06-08.
The WG discussed this issue; some WG members argued that the particular mechanism proposed was probably NOT the best way to associate particular software handlers with elements in instances. The appinfo element, or stand-off annotation schemes (such as the Schema Adjunct Framework being defined by Extensibility) seem to do the job better. Others said that schemas should be more open than they are, but that the specific proposal here is not one they support.
RESOLVED without dissent: to make no change to the spec in response to this comment.
Robert Miller replies (privately): "Actually, it's more like resigned than satisfied. However, ...
"From the responses I have received on each of my issues, I am satisfied that the WG did give serious consideration to these issues, and acted upon them in a fair and open manner. Finalization of the work of the WG on XML Schema is anxiously awaited by many organizations. I have been impressed with the degree of effort put into resolving the outstanding issues in a prompt and thorough manner, and I look forward to final publication. I'll close with a quote from Dave Torma, founding Chair of X12C Communications and Controls, upon the approval of a specification that was not all he had hoped for: 'It's ugly, but I can live with it.'
XML Schema needs better support for semantics; in particular, the ability to link to a repository of semantic information about a particular object would be useful.
Input from Robert Miller:
"Miller, Robert (GXS)" <Robert.Miller@gxs.ge.com> to XML Schema Comments list, Tue, 2 May 2000 15:43:11 -0400
Perhaps the emphasis on `syntax' that underlies the XML DTD specification has too strongly influenced the efforts of people who recognized the limitations of DTD's. In my opinion, more attention needs to be paid to the semantics of information, and less to the syntax of information. There is little support for access to semantic information in the Schema work, where much work is needed.
The ebXML work in which I am a participant envisions repositories of semantic information. Certain of the semantic information in such repositories, such as value constraints, can be represented in the Schema syntax. But other important semantic attributes (e.g., a pointer to a repository of semantic information about an XML element) have no specific representation in the Schema.
Discussed in call of 2000-06-08.
Our requirements document does list linking to some specification of the semantics of an element type as a requirement; we thus agree with the commentator that linking to semantic repositories is a needed capability. The view of the WG is that the annotation scheme adopted at our Reston meeting (October/November 1999), and in particular the appinfo element, meets this requirement.
Should XML Schema be modified to provide support for arrays?
Input from Robert Miller:
"Miller, Robert (GXS)" <Robert.Miller@gxs.ge.com> to XML Schema Comments list, Tue, 2 May 2000 15:43:11 -0400
Just today, I received an Email from someone who had seen my earlier Email to the Schema Work Group pointing out the need to support arrays of information. Spreadsheets, a simple array construct, are not provided a common representation in the XML Schema work.
...
If a service approach were to be considered (cf. issue XML Schema considered inadequately extensible), some thought should be given to other services that might be desired (such as an array processing service), such that service syntactic support needs are adequately addressed in the underlying Schema syntax, even if the considered services are not fully defined and implemented.
1. Datatypes Issue MPEG-7 requires both arrays and matrices. We would prefer to have built-in array (1D) and matrix (2D, 3D) datatypes, instead of simply the 'derivedBy = list' mechanism.
If these cannot be provided then the alternative is to use lists. In the current WD, you can only create lists from atomic data types and since a list is not an atomic data type then you cannot create matrices using 'lists of lists' e.g.:
<simpleType name="ArrayOfInteger" base="integer" derivedBy="list">
  <length value="2"/>
</simpleType>
<simpleType name="MatrixOfInteger" base="ArrayOfInteger" derivedBy="list">
  <length value="4"/>
</simpleType>
Alternatively we can simply convert matrices to flattened lists which can be 1D, 2D or 3D and add a dim facet to lists to specify multi-dimensionality:
<simpleType name="MatrixOfInteger" base="ArrayOfInteger">
  <dim value="2 4"/>
</simpleType>
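The flattened-list alternative can be sketched in Python as follows. This is only an illustration, not schema syntax: the dim value "2 4" is the commentator's proposed facet, the helper names parse_dim and matrix_get are hypothetical, and row-major order is an assumption the comment does not state.

```python
from math import prod

def parse_dim(dim_value):
    """Split a whitespace-separated dim value such as "2 4" into integers."""
    return tuple(int(token) for token in dim_value.split())

def matrix_get(flat, dims, index):
    """Look up one cell of a flattened matrix, assuming row-major order."""
    if len(flat) != prod(dims):
        raise ValueError("list length does not match the declared dimensions")
    if len(index) != len(dims):
        raise ValueError("index rank does not match the declared dimensions")
    offset = 0
    for size, i in zip(dims, index):
        if not 0 <= i < size:
            raise IndexError("index out of range")
        offset = offset * size + i
    return flat[offset]

dims = parse_dim("2 4")            # two rows of four integers
flat = [1, 2, 3, 4, 5, 6, 7, 8]    # the list value as it would appear in an instance
```

Under these assumptions, matrix_get(flat, dims, (1, 2)) reads the third cell of the second row.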
Formal response to commentator.
Don Brutzman of X3D confirms this is something of a problem for them.
Bob Miller replies "I do not consider XML Schema 1.0 'good enough' on this topic", but adds that he does not want to 'stop the presses' to add arrays to XML Schema 1.0.
Is a co-occurrence constraint missing in the second scenario of Structures 4.3.1 (where 'ref' is absent and 'schema' is not the parent)?
Input from James Tauber:
James Tauber <JTauber@bowstreet.com> to XML Schema Comments list, Tue, 2 May 2000 21:09:11 -0400
In 4.3.1, in the second scenario (where ref is absent and schema is not the parent) the value of {value constraint} is given as "fixed" if the "use" attribute is not "default".
In other words, if a value is given, but the use attribute is one of "optional", "fixed" or "required", the value constraint is taken to be "fixed".
Is this correct?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 03 May 2000 09:26:50 +0100
There's a co-occurrence constraint missing, you're right. There should be something under Attribute Declaration Representation OK which says "if 'value' is present, then 'use' must be 'default', 'fixed' or 'required'".
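A minimal Python sketch of the co-occurrence check described in this reply. The allowed set is taken literally from the wording above; the function name and the representation of a declaration as two strings are assumptions made for illustration.

```python
# Values of 'use' that may accompany a 'value' attribute, per the reply above.
ALLOWED_USE_WITH_VALUE = {"default", "fixed", "required"}

def check_attribute_declaration(use, value=None):
    """Return a list of violations for one attribute declaration (hypothetical helper)."""
    problems = []
    if value is not None and use not in ALLOWED_USE_WITH_VALUE:
        problems.append(f"'value' is present but use={use!r}")
    return problems
```

With this check, a declaration such as use="optional" value="x" is reported, while use="fixed" value="x" is not.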
What is the difference between an optional property (e.g. {min occurs}) and a mandatory property whose legal values may include "absent" (e.g. {target namespace})?
Input from Peter Canning:
Peter Canning <canning@vitria.com> to XML Schema Comments list, Tue, 02 May 2000 20:31:45 -0700
The structure specification identifies some schema component properties (e.g. the "min occurs" property in the "Attribute Declaration" component) as optional and states (section 3 paragraph 1) that optional properties that are missing have "absent" as their value. It also describes some properties (e.g. the "target namespace" property in the "Attribute Declaration" component) as mandatory, but includes "absent" in the list of legal values.
What is the difference between an optional property, and a mandatory property whose value can be "absent"?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 03 May 2000 09:34:30 +0100
Peter Canning <canning@vitria.com> writes:
Good question. I think (leaving aside the question of what happens when a required property is absent without leave because of a failed reference) that {target namespace} is the only such property, and we should revisit the nomenclature here. There was a change in terminology in this area very late in the publication process, and some more work needs to be done.
The intended distinction is that {target namespace} is always relevant, its value always significant, even when it is the 'absent' value == no namespace. For e.g. {min occurs}, there are circumstances, e.g. for top-level attribute declarations, where the property is irrelevant, the value never looked at and hence not supplied.
Should the xsi:null attribute be changed so that it accepts the values 0 and 1 as well as the values true and false?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Tue, 2 May 2000 22:33:01 -0500
Section 2.3
Among these, the enumeration facet is one of the most useful and it can be used to constrain the values of almost every simple type, except the boolean type.
Why not the boolean type? xsi:null appears to use an enumerated boolean (allowing true but not 1). Anyway, I believe that xsi:null should accept false, 0, true and 1, with only true and 1 signifying null content.
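The behaviour requested here can be sketched in Python as a mapping from the four lexical forms to two values. The function name is hypothetical, and the sketch shows the commentator's proposal rather than what the draft specifies.

```python
# Lexical forms proposed in the comment: true and 1 signify null content.
NULL_LEXICAL_FORMS = {"true": True, "1": True, "false": False, "0": False}

def parse_null_attribute(lexical):
    """Map the attribute's lexical form to a boolean, rejecting anything else."""
    try:
        return NULL_LEXICAL_FORMS[lexical.strip()]
    except KeyError:
        raise ValueError(f"not a boolean lexical form: {lexical!r}") from None
```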
Should the Primer point out that lists cannot be derived from complex or list types?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Tue, 2 May 2000 22:33:01 -0500
Section 2.3
(you cannot create lists from complex types).
For the narrative, I think it is probably more appropriate that you mention lists can't be derived from other list types.
Should XML Schema be changed to revise the method of binding schema components to the namespace without a name and using them to validate unqualified elements in a document? (E.g. to specify that every schema document must specify a target namespace, and that the unqualified elements in a document instance are bound to a namespace at validation time, using a modification of the noNamespaceSchemaLocation mechanism.)
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Tue, 2 May 2000 22:33:01 -0500
3.4 Undeclared target namespaces
I have a problem with this. If you are going to the point of defining a schema, these elements and attributes are in a conceptual namespace whether or not you give it a name. The way targetNamespace is defined, I cannot write one schema that could be used to validate an XML 1.0 sans namespace document and also used to validate a document with namespace support. I can't even use include since the targetNamespaces wouldn't match.
What I would recommend is that targetNamespace be required for schema definition. However, the XML 1.0 binding mechanism could specify both a validation namespace and schema location. The noNamespaceSchemaLocation could be a list of two URI's with the first being the validation namespace and the second being the schema location.
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 03 May 2000 09:29:50 +0100
Curt Arnold <carnold@houston.rr.com> writes:
... The way targetNamespace is defined, I cannot write one schema that could be used to validate an XML 1.0 sans namespace document and also used to validate a document with namespace support.
That's precisely what you can do. Write the schema with no targetNS, it can be used as is for no-namespace documents, and included into schemas with a targetNS and thereby appropriated for use with that target.
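The inclusion behaviour described in this reply can be sketched in Python as follows. The model is deliberately simplified and its names are hypothetical: a schema document is reduced to a list of local names, and None stands for "no target namespace".

```python
def include_components(target_namespace, included_names, included_namespace=None):
    """Return (namespace, local name) pairs for the components of an included document.

    A document with no target namespace (None) takes on the target namespace of
    the including schema; any other mismatch is treated as an error.
    """
    if included_namespace is not None and included_namespace != target_namespace:
        raise ValueError("included schema has a different target namespace")
    return [(target_namespace, name) for name in included_names]

# The namespace URI below is a placeholder chosen for this example.
shared_names = ["p", "em"]
in_namespace = include_components("http://example.org/doc", shared_names)
no_namespace = include_components(None, shared_names)
```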
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Wed, 3 May 2000 09:39:09 -0500
Primer Section 4.1 said:
The one important caveat to using the include is that the target namespace of the included constructions must be the same as the target namespace of the including schema...
Maybe this is imprecise and Section 1 allows the included target namespace to be blank.
However, even if you could use the include to achieve reuse, it is still a bad thing to have to publish two near-identical schemas, one for use in XML 1.0 sans Namespace usage and one for namespace aware usage. This is really a distinction in usage and should be addressed in the binding of a schema with a document and not in the schema itself.
For example, you would have to have two schemas for XHTML. Both would truly be using names within the context of the http://www.w3.org/1999/XHTML (whatever the true namespace is), but one would be for XML 1.0 compatible documents and one for XML 1.0+namespaces. This is just not good.
However, simply moving the statement of which namespace to match with elements that are not namespace-qualified to the binding of the document and schema allows you to use one schema for both XML 1.0 and XML+Namespaces usages.
Input from Curt Arnold <carnold@houston.rr.com>:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list (and xml-dev) on Thu, 11 May 2000 01:19:19 -0500
Note on defaultNamespace:
A previous comment (http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000AprJun/0124.html) discussed Section 3.4 on Undeclared target namespaces. The current treatment would require one schema for XHTML, for example, for documents where the namespace was declared and another schema for XHTML where the namespace was not declared. To me, it seems that the creation of a schema implies that you are defining elements that are in a conceptual namespace and that the additional burden of picking out a URI for this namespace is minimal. If an instance document doesn't want to use an xmlns attribute, that is a usage issue that could be addressed by an xsi:defaultNamespace attribute (or schema PI) that provides a weaker binding of unqualified names to a namespace that is only used for schema validation.
Should the model of extension-by-suffixation now supported by XML Schema be revised (e.g. with a view toward achieving simpler behavior when the base type has mixed content)?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Tue, 2 May 2000 22:33:01 -0500
4.2 Deriving Types by Extension
Furthermore, the two content models are treated as two children of a sequential group.
I actually prefer this behavior, but it might be out of synch with our recent discussion about behavior when the base type has content model mixed.
Discussed in call of 2000-06-22.
Since this issue was raised by the commentator only as a hypothetical change made logical by the suggestion of LC-51, and since we had not adopted the suggestion of LC-51, the WG felt there was no need to adopt the implicit suggestion in this issue. RESOLVED without dissent: to retain the current suffixation rule for extending types, whether mixed or element-only.
Should XML Schema be extended to support the declaration of general entities (or at least of entities which represent special characters, e.g. eacute)? N.B. character entities are named entities; they are distinct from numeric character references.
Cf. Entities
Input from Steven Pemberton:
"Steven Pemberton" <steven.pemberton@cwi.nl> to XML Schema Comments list, Wed, 3 May 2000 15:17:30 +0200
The HTML WG has requested me to relay to you a request that XML Schemas include a facility to define at least character entities (such as &eacute;).
While we recognise that the full entity mechanism might be a burden, HTML markup typically contains a lot of character entities, and we would like to be able to define them when using schemas without having to fall back to a DTD subset.
Input from Dan Connolly:
Dan Connolly <connolly@w3.org> to XML Schema Comments list, Fri, 05 May 2000 08:44:04 -0500
Steven Pemberton wrote: The HTML WG has requested me to relay to you a request that XML Schemas include a facility to define at least character entities (such as &eacute;).
We tried that before, but it didn't work out well; we weren't sure we liked it, but we put the idea out for review, and the feedback was overwhelmingly negative:
"The provision within XML Schema: Structures of a mechanism for defining parsed entities presents problems for the relationship between schema-validity and XML 1.0 well-formedness, since references to entities defined only in a schema are undefined from the XML 1.0 perspective. Strictly speaking, a well-formed XML document may contain references to undefined entities only if it is declared as standalone='no' and contains either an external subset or one or more references to external parameter entities in their internal subset. We get around this by [Definition: ] defining a nearly well-formed XML document to be one which either is well-formed per XML 1.0, or which fails to be well-formed only because of undefined general entity references, but which would be well-formed if it were standalone='no' and identified an external subset. We consider this justified on the grounds that the use of a namespace declaration which refers to a schema functions rather as an external subset, and from the XML 1.0 perspective such a reference almost of necessity renders the document non-standalone when schema-validation is applied."
-- http://www.w3.org/1999/05/06-xmlschema-1/#conformance-schemaValidity
If you can think of a less awkward way to do it, let us know.
Otherwise, I think it's most likely that the WG will decline your request.
If you find this explanation to be satisfactory justification for us to decline your request, please let us know by withdrawing your request.
While we recognise that the full entity mechanism might be a burden, HTML markup typically contains a lot of character entities, and we would like to be able to define them when using schemas without having to fall back to a DTD subset.
I'm afraid that's about the only way I can see to make it work.
Another option is to use <eacute/> instead of &eacute;, but that requires application-level support rather than being handled by the XML processor, and it won't work in attribute value literals.
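The well-formedness problem behind this exchange can be observed with any XML 1.0 parser. The sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for "the XML processor"; the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

def parses(document):
    """Report whether the parser accepts the document as well-formed."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# A named character entity with no declaration: rejected by the parser.
undeclared = "<p>caf&eacute;</p>"
# A numeric character reference needs no declaration.
numeric = "<p>caf&#233;</p>"
# The same named entity, declared in an internal DTD subset.
declared = '<!DOCTYPE p [<!ENTITY eacute "&#233;">]><p>caf&eacute;</p>'

text_with_dtd = ET.fromstring(declared).text
```

Only the first document is rejected, which is why a schema-only entity mechanism would conflict with XML 1.0 well-formedness as the quoted passage explains.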
Discussed in call of 2000-06-08.
Alternatives:
Proposal to provide detailed guidance and instruction on these alternatives. Instruct the editor of the Primer to do so. WITHOUT OBJECTION.
Question: Is there a future solution with a change to XML 1.0? Answer: Not trivially. Follow-up: But should we take it upon ourselves to forward a request to XML Core that they address this issue? Shall we suggest to the CG that this go on a list of candidate requirements for XML 2.0? Agreed by majority vote to instruct chairs to take this issue to the XML CG for consideration as a possible candidate requirement for XML 2.0.
Formal response to commentator. HTML WG dissents:
The HTML working group has instructed me to forward their dissent from your WG's decision, and to ask you to send the issue for review by the director.
The group is unhappy with the idea that a user agent would have to be able to process schemas as well as DTD fragments, when an aim of schemas was to replace DTDs.
How can XML Schema be used to support dynamic element-type naming?
Input from Dr. Ardeshir Bahreininejad:
"Dr. Ardeshir Bahreininejad" <bahreininejad@yahoo.com> to XML Schema Comments list, Wed, 3 May 2000 07:57:59 -0700 (PDT)
I wish to define an element in a schema document where the "name" of the element is not known. Let's say, the name of the element may be decided by other parties using the schema for example:
<element name="Cat"/>
<element name="Dog"/>
<element name="????"/>
where a different user may decide on the ????. How do we define such dynamic name allocation?
Should XML Schema be modified to allow the creation of union types? (Or, less strongly, enumerated types whose value spaces are the unions of the value spaces of their base types, with the base types required to have disjoint value spaces.)
Input from David Vun Kannon:
"Vun Kannon, David" <dvunkannon@kpmg.com> to XML Schema Comments list, Wed, 3 May 2000 12:04:24 -0400
Can I build a datatype out of pieces that are disjoint? A simple example might be US_states + Canadian_provinces. Say my schema already contains declarations of these two enumerations. The new enumeration I want is the merger of the two. I don't want to extend one with the members of the other, explicitly written out.
A more complex case is SQL style datatypes: integer + NULL. Can I build such a datatype in XSchema?
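The merger asked for here can be sketched in Python as a set union with a disjointness check. This illustrates the requested semantics only; the function name is hypothetical and the two enumerations are abbreviated samples rather than full lists.

```python
def union_enumeration(*enumerations):
    """Merge enumerations, insisting that their value spaces are disjoint."""
    merged = set()
    for values in enumerations:
        overlap = merged & set(values)
        if overlap:
            raise ValueError(f"value spaces are not disjoint: {sorted(overlap)}")
        merged |= set(values)
    return frozenset(merged)

US_STATES = {"NY", "CA", "TX"}            # abbreviated sample
CANADIAN_PROVINCES = {"ON", "QC", "BC"}   # abbreviated sample
NORTH_AMERICAN_REGIONS = union_enumeration(US_STATES, CANADIAN_PROVINCES)
```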
Input from Ashok Malhotra:
petsa@us.ibm.com to XML Schema Comments list, Wed, 3 May 2000 13:12:49 -0400
No, sorry, not in version 1.
You cannot create a type out of the union of disjoint types, say, integer and string.
You also cannot union two separately declared enumerations.
Discussed at Edinburgh ftf.
This appears to be a narrowing of issue LC-2 which the task force will look at.
Formal response to commentator. David Vun Kannon replies that this looks satisfactory at first blush.
Can the rules defining the Constraint on Schemas "Type Derivation OK (Complex)" be clarified?
Input from Asir S Vedamuthu:
"Asir S Vedamuthu" <asirv@webmethods.com> to XML Schema Comments list, Thu, 4 May 2000 09:59:09 -0400
I am having a hard time understanding how to evaluate 'The item type definition is validly derived from the {type definition}'. Here is the text from the spec, http://www.w3.org/TR/xmlschema-1/#coss-ct
"Constraint on Schemas: Type Derivation OK (Complex)
A complex type definition (call it D, for derived) is validly derived from a type definition (call this B, for base) given a subset of {extension, restriction} if:
Please help me understand (maybe an example) & I greatly appreciate your help.
Does xml:lang need to be declared / imported like any other attribute? If so, where is the xml namespace?
Input from Jane Hunter:
Jane Hunter <jane@dstc.edu.au> to XML Schema Comments list, Fri, 05 May 2000 09:54:28 +1000
We're using XML Schema within MPEG-7 to define descriptions of audiovisual content. Certain descriptive elements have a language attribute. For consistency, we'd like to use the xml:lang attribute.
Does this need to be declared like any other attribute for the elements to which it applies? For example:
<complexType name="FrameAnnotation">
  <element name="Who" type="string" minOccurs="0"/>
  <element name="What" type="string" minOccurs="0"/>
  ....
  <attribute ref="xml:lang"/>
</complexType>
I assume that we also need to import the xml namespace? If so, where is this located?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 05 May 2000 09:07:59 +0100
There is an example of this in the schema for schemas [1] itself, which also uses xml:lang.
Short answer, it's at the XML namespace URI, i.e.
http://www.w3.org/XML/1998/namespace
Should the structures spec be modified by dropping the rule that all element types in an equivalence class must be declared as having types which are derived from the type of the exemplar of the equivalence class?
Input from Curt Arnold:
Curt Arnold <carnold@houston.rr.com> to XML Schema Comments list, Fri, 5 May 2000 00:05:30 -0500
Section 2.2.2.2
All such members must have type definitions which are either the same as the exemplar's type definition or restrictions or extensions of it. Therefore, although the names of elements can vary widely as new namespaces and members of the equivalence class are defined, the content of member elements is strictly limited according to the type definition of the equivalence class exemplar
This implies that the justification for the constraint is somehow related to simplification of validation because content is strictly limited....
However, this "limitation" is circumventable by creating an abstract exemplar with a minimal type such as empty.
<complexType name="empty"/>
<element name="resourceDefRef" abstract="true" type="empty"/>
<element name="resource" equivClass="resourceDefRef">
  <complexType base="empty" content="mixed">
    <any maxOccurs="unbounded" minOccurs="0"/>
  </complexType>
</element>
<element name="resourceRef" equivClass="resourceDefRef">
  <complexType base="empty">
    <attribute name="href" type="uriReference"/>
  </complexType>
</element>
By which we have created an equivClass with two members that are about as structurally dissimilar as possible. I think it is a good thing to be able to do that since it should be common, as in this example, that logically equivalent things have radically different XML representation.
But then why go through the hoops of manufacturing a common ancestor type? I have a guess that it is so that final='restriction' or final='extension' makes sense, but I think that is putting the cart before the horse.
Like the concept of interfaces in OOP, equivClass's reason for being is to allow structurally dissimilar but conceptually similar items to be substitutable. Interfaces do not make any demands on the structure of the implementing class and one class may support many interfaces.
I would strongly recommend that:
true would inhibit the use of the element as an exemplar.
I believe this would simplify schema authoring by eliminating the manufacturing of common ancestor classes, would simplify schema validation by eliminating the need to ensure common ancestor classes and would be consistent with the use of interfaces in OOP.
Discussed in call of 2000-06-08.
The WG discussed the current rule that all members of an equivalence class (or 'substitution class', as the chair suggested calling it instead) must have types derived from the exemplar of the class. In favor of retaining it, WG members noted that
The WG also observed in passing that in some cases, eliminating the rule might make it easier to mix in new element types (e.g. when internationalizing a schema) -- the set of cases affected seemed too limited to be compelling.
RESOLVED without dissent: to retain the status quo in this area.
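The rule retained by this resolution, that every member of an equivalence class must have a type derived from the exemplar's type, can be sketched in Python as a walk up the chain of base types. The data model is an assumption made for illustration: types are plain names, base_of maps each type to its base, and the type names other than empty are invented stand-ins for the anonymous types in the commentator's example.

```python
def is_derived_from(type_name, base_name, base_of):
    """True if type_name is base_name or reaches it by following base_of links."""
    seen = set()
    current = type_name
    while current is not None and current not in seen:
        if current == base_name:
            return True
        seen.add(current)
        current = base_of.get(current)
    return False

def invalid_members(exemplar_type, member_types, base_of):
    """List the member elements whose type is not derived from the exemplar's type."""
    return [element for element, type_name in member_types.items()
            if not is_derived_from(type_name, exemplar_type, base_of)]

base_of = {"empty": None, "resourceType": "empty",
           "resourceRefType": "empty", "price": None}
members = {"resource": "resourceType", "resourceRef": "resourceRefType",
           "cost": "price"}
rejected = invalid_members("empty", members, base_of)
```

Here only the invented cost element is rejected, since its type has no path back to empty.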
Should hexadecimal notation be allowed for numbers (at least for integer and non-negative integer)?
Cf. Allow multiple lexical spaces for floats?
Input from Doug Ransom:
Doug Ransom <Doug_Ransom@pml.com> to XML Schema Comments list, Fri, 5 May 2000 11:53:52 -0700
I think it would be very unfortunate if Integer and NonNegative integer could not have hex lexical structure, i.e. 0xffaa00bb instead of 4289331387.
Binary is really inappropriate for this -- people often want to represent numbers as hex (e.g. HTML colours).
Discussed at Edinburgh ftf.
The general tide of comments has run against allowing multiple lexical forms for the same type, even to the point that some comments criticize the decision to allow leading zeroes for integers. So the WG believes this would not be a wise change.
In discussions of issue LC-21, a proposal has been made for a built-in abstract type corresponding to each of the major existing built-in types, to allow derivation of types which share a value space with the existing built-in type but use a different lexical form. If we adopt that proposal as a general thing, then schema authors could specify hex notation for integers, though schema processors would not be required to understand the mapping from lexical form to value.
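The mapping from a hexadecimal lexical form to an integer value is small, as this Python sketch suggests. The function name and the insistence on a 0x prefix are assumptions based on the commentator's example; negative values are not handled.

```python
def hex_to_integer(lexical):
    """Map a lexical form such as "0xffaa00bb" to the non-negative integer it denotes."""
    if not lexical.lower().startswith("0x"):
        raise ValueError("expected a 0x prefix")
    return int(lexical, 16)   # int() accepts the 0x prefix when the base is 16
```

Applied to the commentator's example, hex_to_integer("0xffaa00bb") yields 4289331387.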
How does a system know which schema to search for a declaration of an element in a particular namespace?
Input from gmacri@libero.it:
"gmacri@libero.it"<gmacri@libero.it> to XML Schema Comments list, Sat, 6 May 2000 14:20:01 +0200
In this example there are three declarations of namespace and two declarations of schemaLocation that are independent.
<Library xmlns:book  ="http://www.somewhere.org/Book"
         xmlns:person="http://www.somewhere-else.org/Person"
         xmlns:xsi   ="http://www.w3.org/1999/XMLSchema/instance"
         xsi:schemaLocation="http://www.somewhere.org/Examples
                             http://www.somewhere.org/Examples/Book.xsd
                             http://www.somewhere-else.org/Person
                             http://www.somewhere-else.org/Person/Person.xsd">
  <BookCatalogue>
    <book:Book>
      <book:Title>Illusions The Adventures of a Reluctant Messiah</book:Title>
      <book:Author>Richard Bach</book:Author>
      <book:Date>1977</book:Date>
      <book:ISBN>0-440-34319-4</book:ISBN>
      <book:Publisher>Dell Publishing Co.</book:Publisher>
    </book:Book>
When I analyze this document with an application (automatically), how I can understand that, for example, for "Book" element I must search it in the schema "Book.xsd" and not in "Person.xsd", to verify the validity of this document?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 08 May 2000 08:58:58 +0100
The above document is just fine even if it doesn't in itself provide the answer to your question for a processor. The <Book> element is in the {http://www.somewhere.org/Book} namespace, for which no schema document is identified explicitly in the value of xsi:schemaLocation. That doesn't mean the document is not schema valid. Note also that the document element (<Library>) is not in any namespace, and no schema is provided explicitly (with xsi:noNamespaceSchemaLocation) for that element either.
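The observation that no schema document is identified for the Book namespace can be checked mechanically. The Python sketch below reads the xsi:schemaLocation value from the example as namespace/location pairs; the function name is hypothetical.

```python
def parse_schema_location(value):
    """Turn an xsi:schemaLocation value into a {namespace: location} mapping."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))

hints = parse_schema_location(
    "http://www.somewhere.org/Examples "
    "http://www.somewhere.org/Examples/Book.xsd "
    "http://www.somewhere-else.org/Person "
    "http://www.somewhere-else.org/Person/Person.xsd"
)

# The book prefix is bound to .../Book, for which the attribute gives no hint.
book_hint = hints.get("http://www.somewhere.org/Book")
```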
But there are at least two other ways schema components for these elements might be found:
http://www.somewhere.org/Examples/Book.xsd and http://www.somewhere-else.org/Person/Person.xsd might <import> schemas with appropriate target namespaces (that is, http://www.somewhere.org/Book and none, for <Book> and <Library> respectively);
Can a key contain fields which lie outside the element identified by the selector?
Input from gmacri@libero.it:
"gmacri@libero.it"<gmacri@libero.it> to XML Schema Comments list, Mon, 8 May 2000 14:24:33 +0200
In this example there is a definition of a key:
<xs:element name="car">
  <xs:complexType model="empty">
    . . .
    <xs:attribute name="regRef" type="dt:integer"/>
    <xs:attribute name="regState" type="twoLetterCode"/>
  </xs:complexType>
</xs:element>
</xs:complexType>
<xs:key name="carRef">
  <xs:selector>.//car[@regRef]</xs:selector>
  <xs:field>@regRef</xs:field>
  <xs:field>@regState</xs:field>
</xs:key>
</xs:element>
If I want to define a key similar to the key defined above, can I use as content of <field> a name of element that is not a descendant of "car" but is a name of an element that is, for instance, the child of an ancestor of car?
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list, 08 May 2000 14:52:50 +0100
Yes. As the spec. says [1], the things picked out by <field> are not constrained to lie within the element picked out by <selector>, and no other constraint as to locality is imposed.
[1] http://www.w3.org/TR/xmlschema-1/#Identity-constraint_Definition_details
Should the wording in Structures 3.8 be changed for clarity?
Input from Michael K Smith:
"Smith, Michael K" <michael.smith@eds.com> to XML Schema Comments list, Mon, 8 May 2000 13:59:09 -0500
In '3.8 Particle Details':
The 'either' preceding paragraphs 1.1.1 through 1.1.3 (see below) and 1.2 is confusing. It initially appears that 1.1.1 through 1.1.3 are modified by the 'either', while upon further reading they must be joined by an implicit 'and' and the 'either' relates the implicit 1.1 to 1.2. I don't know what the best solution to this might be, perhaps parentheses or an explicit 1.1 with an 'and' conjoining its children. Perhaps your notational introduction explained this, but I didn't see it.
A sequence (possibly empty) of element information items is schema-valid with respect to a particle if either
1.1.1 The length of the sequence is greater than or equal to the {min occurs};
1.1.2 If {max occurs} is a number, the length of the sequence is less than or equal to the {max occurs};
One aspect of this separation of 1.1.1 and 1.1.2 that seems less than optimal is that it makes it easy to define a schema that cannot be satisfied. E.g. minOccurs = 2 and maxOccurs = 1. (Such schemas might be generated mechanically, say from database entries.) Whereas if 1.1.1 and 1.1.2 were combined we could at least have a schema that could be satisfied by instances that did not contain the excluded element.
1.1.1 Let range = [{min occurs}...{max occurs}]. If range is empty then the sequence must be empty, otherwise the length of the sequence must be contained in the range.
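The combined rule proposed here can be sketched in Python as follows. The function name is hypothetical, and None is used as an assumed stand-in for a {max occurs} that is not a number (unbounded).

```python
def length_ok(length, min_occurs, max_occurs=None):
    """Apply the proposed combined rule to the length of a sequence.

    max_occurs=None stands for an unbounded {max occurs}.
    """
    range_is_empty = max_occurs is not None and min_occurs > max_occurs
    if range_is_empty:
        return length == 0            # only the empty sequence is acceptable
    if length < min_occurs:
        return False
    return max_occurs is None or length <= max_occurs
```

With minOccurs = 2 and maxOccurs = 1, only the empty sequence passes, which is the behaviour the commentator suggests.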
How does a schema author indicate that a complex type can have any namespace-qualified attribute, regardless of the namespace?
Input from achille@us.ibm.com:
In section 3.9, it is explained that: "the {namespace constraint} property (for a wildcard) provides for validation of elements that: 3. (not and absent) are namespace qualified."
I assume that it is also true for attribute wildcard. I look at the XML representation of Wildcard and it does not seem obvious how to express something like "any attribute with qualified namespace". Maybe the XML representation of wildcard should allow something like <any namespace="##qualified"/> (and for Attribute wildcard <anyAttribute namespace="##qualified"/>) which will correspond to the {namespace constraint} = (not and absent).
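At the component level, the constraint the commentator wants to express can be sketched in Python as below. The encoding is an assumption made for illustration: the string "any" stands for an unconstrained wildcard, the tuple ("not", None) stands for (not and absent), and None stands for an unqualified name.

```python
ABSENT = None   # assumed stand-in for the 'absent' value

def wildcard_allows(constraint, namespace):
    """Check a name's namespace against a simplified {namespace constraint}."""
    if constraint == "any":
        return True
    if constraint == ("not", ABSENT):
        return namespace is not ABSENT     # any namespace-qualified name
    raise ValueError("constraint form not covered by this sketch")
```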
Should XML Schema be modified to allow the definition of abstract information models, together with rules for encoding the information either as elements or as strings (for use as attribute values)?
Cf. Allow record-style simple types?
Input from Anders W. Tell:
Anders W. Tell (<anderst@toolsmiths.se>) to WWW XML Schema Comment list on Wed, 10 May 2000 09:47:27 +0200
Problem:
A common phenomenon which now and then surfaces in the markup world is the occurrence of what some authors call "Micro-parsing". This is the situation when Schema writers define that an XML attribute should contain structured information and therefore create a need for customized parsers, hence the above term.
Two examples are
match="/cars/car[@name='volvo']"
<path d="M 100 100 L 140 100 L 120 140 z"/>
Is this not a paradox? A markup language which cannot be used for markup anymore? Of course all markup languages have a limit and maybe XML's limit has been reached.
Why:
What are the reasons for encoding complex information in a single attribute?
The reasons I have seen so far are:
The following suggestion is an attempt to "internalize" these encoding scenarios, to capture as much as possible of the encoding information inside XML Schemas instead of relying on externally created and managed documentation.
Another side effect of the proposal is that it's now possible to have DOM access to structured attributes as if they were XML element encoded.
For Grove enthusiasts it is also possible to view (with a little effort ;)) attributes as hierarchical nodes.
So here goes...
Solution:
First a few initial short definitions:
Encoding "Stereotype" <=> something that should be encoded, is defined by an information model which may be defined in terms of one or more information items (nodes/properties,...).
Encoding "Form" <=> principles for how nodes/properties in a Stereotype's information model must be encoded as strings or XML elements. (the following suggestion implies two forms, one for attribute encoding and one for XML element encoding)
"Attribute-Micro-Parser" <=> A software artifact which encodes and decodes XML attribute strings to/from XML elements.
  interface DOMAttributeMicroParser {
    readonly attribute string name;
    readonly attribute string namespace;
    /* parse attribute string and create the corresponding element tree */
    long parse(in DOMAttribute from, out DOMElement to);
    /* Traverse the element tree and create corresponding attribute string expression */
    long construct(in DOMElement from, out DOMAttribute to);
  };

  interface DOMParsedAttribute : DOMAttribute {
    attribute DOMElement fParsed; /* parsed attribute */
  };
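As an illustration only, the parse/construct pair of the proposed interface might be sketched in Python for the path example quoted earlier. The element names `path-data`, `cmd`, and `arg` are hypothetical, chosen for this sketch, and are not part of the commentator's proposal; the tokenizer handles only single-letter commands and plain decimal numbers.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: one <cmd op="..."> per path command, one <arg> per number.
_TOKEN = re.compile(r"[A-Za-z]|-?\d+(?:\.\d+)?")

def parse(attr_value):
    """Decode a path string such as 'M 100 100 L 140 100 z' into an element tree."""
    root = ET.Element("path-data")
    current = None
    for token in _TOKEN.findall(attr_value):
        if token.isalpha():
            current = ET.SubElement(root, "cmd", {"op": token})
        elif current is not None:
            ET.SubElement(current, "arg").text = token
        else:
            raise ValueError("number before first command")
    return root

def construct(root):
    """Traverse the element tree and rebuild the attribute string."""
    parts = []
    for cmd in root:
        parts.append(cmd.get("op"))
        parts.extend(arg.text for arg in cmd)
    return " ".join(parts)
```

With this sketch, `construct(parse(d))` gives back the original string whenever `d` uses single spaces between tokens.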
Discussed at Edinburgh ftf.
The consensus of the Working Group is that this would not be a wise change to the language. It amounts to an invitation to reinvent the SGML features DATATAG and SHORTREF. It is our conviction that XML is a useful language for structured information, and if it is desired to make the internal structure of information explicit, XML may usefully be applied to the problem.
There is also no clear upper boundary on the complexity to be required of a microparser: if it can handle simple patterns, then why not regular languages? If it can handle regular languages, then why not context-free languages?
Formal response to commentator (originally sent 12 July 2000). Commentator not persuaded. Followup.
Can a schema author constrain values of the time-duration type to be measured only or at most in days? (E.g. by using the pattern facet?) Also: must all values which match the pattern facet be members of the value space of the base type?
Input from Michael Anderson <michael@research.canon.com.au>:
Michael Anderson <michael@research.canon.com.au> to XML Schema Comments list (and xml-dev) on Wed, 10 May 2000 11:01:32 +1000
Is it possible to define a new simple type to constrain the pattern of a time duration so that the duration can at most be measured in days? For instance,

  <xsd:simpleType name="MyTimeDuration" base="xsd:timeDuration">
    <xsd:pattern value="-?P(\d+D)?(T(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?" />
  </xsd:simpleType>

According to the XML Schema working drafts, the pattern facet is allowed for timeDuration. My interpretation is that any pattern facet, once specified for a subtype of the timeDuration type, can further restrict the format of timeDuration provided that it is still of a valid timeDuration format. Is this correct? Furthermore, does the parser need to check this, i.e. check that the pattern of MyTimeDuration is indeed still a valid pattern for timeDuration?
Cheers, Michael.
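The behaviour of the pattern in this posting can be tried out with a short Python sketch. It assumes that a pattern facet must match the whole lexical form (hence `fullmatch`) and that Python's `re` syntax agrees with the schema regex syntax for this particular expression; the function name is hypothetical.

```python
import re

# Pattern copied from the MyTimeDuration example above.
DAYS_ONLY = re.compile(r"-?P(\d+D)?(T(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?")

def matches_facet(lexical):
    """True if the whole lexical form matches the pattern facet."""
    return DAYS_ONLY.fullmatch(lexical) is not None
```

The pattern accepts `P3DT4H30M` and `-P10D` and rejects `P1Y2M`, as intended. It also accepts the bare strings `P` and `PT`, which carry no duration component at all; whether such strings must still belong to the value space of the base type is the second question raised in this issue.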
The XML Schema language suffices to meet the requirements of XML Signature. In particular, the content types (mixed, empty, elementOnly, textOnly) and the wildcard facility are useful.
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:18 -0400
http://www.w3.org/Signature/2000/05/03-schema-review.html
The XML Signature WG thanks the XML Schema WG for their work and the opportunity to review the last call Working Draft [1]. This comment does not address the ease of implementation but only whether the functionality as specified meets our requirements. To that end, the last call specification easily meets our requirements. In particular, the content types (elementOnly | empty | mixed | textOnly) and the Wildcard Schema Component <ANY/> are very useful for dealing with mixed content scenarios which are common to the signature domain. In time, the type extension capabilities might be a useful feature in constructing other cryptographic (key and certificate) syntaxes but we are presently not employing these typing features.
Since the XML Signature specification should enter the W3C Recommendation and IETF Standard tracks soon, we ask that the schema WG give priority to the need for a stabilized syntax and for expediently advancing the schema specification towards Recommendation.
Joseph Reagle, on behalf of the XML Signature WG
[1]
Formal response to commentator.
Commentator replies "The decision is acceptable though as stated I need a little more help in understanding its effects."
Please give priority to stabilizing the syntax of XML Schema, and to speed in completing the Recommendation.
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> (on behalf of XML Signature WG) to XML Schema Comments list on Wed, 10 May 2000 12:48:18 -0400
Since the XML Signature specification should enter the W3C Recommendation and IETF Standard tracks soon, we ask that the schema WG give priority to the need for a stabilized syntax and for expediently advancing the schema specification towards Recommendation.
Formal response to commentator.
Commentator replies "The decision is acceptable though as stated I need a little more help in understanding its effects."
Should short definitions of the twelve types of components be added, for clarity and to assist readers? Should there be a clearer correspondence between the types of components and the headings in section 2.2?
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:14 -0400
2.2 XML Schema Abstract Data Model
I could understand this chapter better if the 12 components listed somehow corresponded more closely to the 2.2.* section headings. Perhaps, a quick definition on each of the 12 components, or a move away from the "primary" and "secondary" and "helper" designations (towards others) if those terms aren't substantively used elsewhere.
Should the examples in the spec use consistent prefixes (e.g. to make relevant examples easier to find with grep)?
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:14 -0400
Namespace Prefixes
When trying to understand the specifications, I frequently found myself bouncing between the primer, structures, and datatypes documents, frequently using find or grep facilities to find bits of examples. Using a consistent namespace prefix (xs: or xsd:) through all documents would be helpful.
Should the instance (xsi) namespace be more clearly related to the schema (xsd) namespace?
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:14 -0400
2.6 Schema-Related Markup in Documents Being Schema-Validated
Could the Schema Instance namespace somehow relate to the Schema namespace? For instance, I'd find it easier to understand who defined the schema instance namespace with something like:
http://www.w3.org/1999/XMLSchema/Instance
http://www.w3.org/1999/XMLSchema#Instance
Should the DTD and schema for schemas be modified to make the system identifiers used more explicit (e.g. by using absolute URIs instead of relative URIs)?
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:14 -0400
Appendix A (normative) Schema for Schemas
It would be useful for XML declarations to include more explicit declarations of DTD and schema locations. For instance:
  <?xml version='1.0'?>
  <!-- XML Schema schema for XML Schemas: Part 1: Structures -->
  <!DOCTYPE schema PUBLIC "-//W3C//DTD XMLSCHEMA 19991216//EN"
                          "http://www.w3.org/1999/XMLSchema.dtd">
  <schema xmlns="http://www.w3.org/1999/XMLSchema"
          targetNamespace="http://www.w3.org/1999/XMLSchema"
          blockDefault="#all"
          elementFormDefault="qualified"
          version="Id: XMLSchema.xsd,v 1.1 2000/04/06 13:51:05 aqw Exp"
          xsi:schemaLocation="http://www.w3.org/1999/XMLSchema.xsd">
Should the calculation of default values for properties of schema components be simplified? (Alternatively, should a table showing all default values and the conditions under which they apply be provided?)
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:14 -0400
Defaults
The more explicit representation of default values in schema component definitions is useful. However, the many varied defaults can still be confusing; perhaps this could be simplified, or a table could be provided that includes all default values.
Should Primer section 3 be revised to provide simpler, more explicit guidelines for schema authors (in cookbook style)?
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Wed, 10 May 2000 12:48:14 -0400
3. Advanced Concepts I: Namespaces, Schemas & Qualification
This topic (not necessarily the exposition) is difficult, both in terms of comprehending the concepts and as a potential source of validation errors in the instances I create. Perhaps some guidelines could be given, such as, "If you want to create an instance that has no prefixes in children elements then X; if you want to create an instance ... Y", so readers can easily jump-start their own schema writing.
Some aspects of the Unique Particle Attribution (determinism) constraint appear to need clarification.
Input from Bob Schloss:
Bob Schloss <rschloss@us.ibm.com> to XML Schema Comments list on Wed, 10 May 2000 19:42:22 -0400
My understanding is that the Unique Particle Attribution constraint exists so that parsers do not have to look ahead.
If I define a complex type as follows:
  <xsd:complexType name="ChoiceOfSequences1">
    <xsd:choice>
      <xsd:sequence>
        <xsd:element name="a" type="typeOfA"/>
        <xsd:element name="b" type="typeOfB"/>
        <xsd:element name="d" type="typeOfD"/>
      </xsd:sequence>
      <xsd:sequence>
        <xsd:element name="a" type="typeOfA"/>
        <xsd:element name="c" type="typeOfC"/>
      </xsd:sequence>
    </xsd:choice>
  </xsd:complexType>
is this permitted (a legal type definition)?
I could imagine the answer is yes, because a parser doesn't have to look ahead during parsing.
I could imagine the answer is no in the case where the equivalence classes of c and of b are not completely independent, but only because the first sequence requires that d follow b.
If this second thought is the one intended by the working group, shouldn't Structures appendix E say something about this under 'Unique Particle Attribution'?
Here is a different case:
If I define a complex type as follows:
  <xsd:complexType name="ChoiceOfSequences2">
    <xsd:choice>
      <xsd:sequence>
        <xsd:element name="a" type="typeOfA1"/>
        <xsd:element name="b" type="typeOfB"/>
      </xsd:sequence>
      <xsd:sequence>
        <xsd:element name="a" type="typeOfA2"/>
        <xsd:element name="c" type="typeOfC"/>
      </xsd:sequence>
    </xsd:choice>
  </xsd:complexType>
is this permitted (a legal type definition)?
Does this depend on whether there is a common parent type for typeOfA1 and typeOfA2 other than the ur-type? (Since if there was, and the xsi:type attribute was not used on the a element in the instance document with a value of either typeOfA1 or typeOfA2, the parser would have to look ahead before determining which branch of the choice was being processed).
I think the general problem relates to choices (which may contain choices, which may contain choices) which contain sequences.
Here is a third case:
If I define a complex type as follows:
  <xsd:complexType name="ChoiceOfChoices3">
    <xsd:choice>
      <xsd:choice>
        <xsd:element name="a" type="typeOfA1"/>
        <xsd:element name="b" type="typeOfB"/>
      </xsd:choice>
      <xsd:choice>
        <xsd:element name="a" type="typeOfA2"/>
        <xsd:element name="c" type="typeOfC"/>
      </xsd:choice>
    </xsd:choice>
  </xsd:complexType>

is this permitted? I don't think it is, because a choice of choices is like flattening to one choice, and then there are two different types that can appear with element a. (Appendix E does rule this out if the user had flattened it, and in Section 5.7 on model groups, I think this is covered by Element Declarations Consistent.) But it still seems to me that an "Also see..." in Appendix E should cover this case.
I guess I'm asking for clarification of these examples now, and also that Appendix E be more complete in the next spec working draft.
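The three cases in this posting can be compared with a small Python sketch of the one-element lookahead the commentator describes. It is a deliberate simplification (no occurrence counts, wildcards, or equivalence classes), it is not the definition given in Structures, and it does not settle whether the draft permits these type definitions; the function and variable names are hypothetical.

```python
from collections import defaultdict

def first_element_conflicts(choice):
    """choice: list of branches, each a list of (element name, type name) pairs.
    Returns the element names that can begin more than one branch."""
    starters = defaultdict(list)
    for index, branch in enumerate(choice):
        if branch:
            name, type_name = branch[0]
            starters[name].append((index, type_name))
    return {name: hits for name, hits in starters.items() if len(hits) > 1}

choice_of_sequences_1 = [
    [("a", "typeOfA"), ("b", "typeOfB"), ("d", "typeOfD")],
    [("a", "typeOfA"), ("c", "typeOfC")],
]
choice_of_sequences_2 = [
    [("a", "typeOfA1"), ("b", "typeOfB")],
    [("a", "typeOfA2"), ("c", "typeOfC")],
]
# ChoiceOfChoices3 flattened by hand: every alternative is a one-element branch.
choice_of_choices_3 = [
    [("a", "typeOfA1")], [("b", "typeOfB")],
    [("a", "typeOfA2")], [("c", "typeOfC")],
]
```

In all three cases the element `a` can begin two branches. The cases differ in whether the two occurrences carry the same type (case 1) or different types (cases 2 and 3), which is the distinction the commentator asks Appendix E to address.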
Various editorial suggestions.
Input from Susan Lesch <lesch@w3.org>:
Susan Lesch <lesch@w3.org> to XML Schema Comments list on Wed, 10 May 2000 19:41:36 -0800
These are just a few minor editorial comments on your Last Call draft, XML Schema Part 1: Structures "work in progress." Please feel free to ignore or use them as you see fit.
In the interest of advancing schemas in practice, perhaps in the Abstract or in Introduction section 1, you could identify your audience, encourage them, and (like MathML) explain that this is not a user's guide for the general public. This specification is carefully and beautifully done, but it was a mystery for me on one reading, even after reading the Primer.
Both Webster's (en-US) and the concise Oxford dictionaries list the "z" rather than the "s" form of these words first: normalisation, normalised, optimise, standardised, and characterisation. They could be changed to z's.
The text does not refer to most of the References (Cambridge Communique, DCD, DDML, ISO-11404, and so on). I'm not certain they need to be there, especially the old URI RFCs.
Though I didn't correct this here, apparently the use of "we" is frowned on in specifications. I don't yet have a proper reference for you. One reason is in the final paragraph of http://lists.w3.org/Archives/Public/www-xml-linking-comments/2000JanMar/0079.html which explains that first person English is hard to translate.
muzmo.com and foo.com are registered domains. You could consider using example.com, example.net, and example.org which IANA registered for examples. (See RFC 2606 section 3 at http://www.ietf.org/rfc/rfc2606.txt.)
Minor typos (quotes are followed by a suggestion)
2.2.1.3 par. 1 with respect to particular simple type ==> with respect to a particular simple type
2.2.1.3 par. 2, list items 1 and 3 A restriction ==> a restriction
3.4 table - {content type}, and 3.4 par. 11, and 3.13 par. 7 I.e ==> i.e.,
3.7 par. 3 {compositor}determines ==> {compositor} determines
3.13 par. 11 . therein. ==> .
4.3.3 table {content type} 4.4.2.3 the the ==> the
5.1 list item 4 has a subheading 4 (that should perhaps be 4.1 or 1).
In 5.11, the first sentence is repeated in the third line
5.11 note in 1.1.6 It is trivially ==> It is trivial
6.3.1 par. 1 mime ==> MIME
6.3.2 list item 2 note, and 6.3.2 last par. recommendation ==> Recommendation
B. line 3 The the ==> The
B. line 172 exculsive ==> exclusive
D. HTML 4.0 Specification ==> HTML 4.01 Specification
RFC 1808,Relative ==> RFC 1808, Relative
RFC 1738,Uniform ==> RFC 1738, Uniform
RFC 2141,URN ==> RFC 2141, URN
XSchema c/xscspecv4.htm For more ==> c/xscspecv4.htm. For more
Input from ht@cogsci.ed.ac.uk (Henry S. Thompson):
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list on 11 May 2000 10:23:47 +0100
Thanks for your careful reading.
Susan Lesch <lesch@w3.org> writes:
These are just a few minor editorial comments on your Last Call draft, XML Schema Part 1: Structures [1] "work in progress." Please feel free to ignore or use them as you see fit.
In the interest of advancing schemas in practice, perhaps in the Abstract or in Introduction section 1, you could identify your audience, encourage them, and (like MathML) explain that this is not a user's guide for the general public. This specification is carefully and beautifully done, but it was a mystery for me on one reading, even after reading the Primer.
Good idea.
Both Webster's (en-US) and the concise Oxford dictionaries list the "z" rather than the "s" form of these words first: normalisation, normalised, optimise, standardised, and characterisation. They could be changed to z's.
I'll check my UK dictionary -- as a UK (naturalised :-) speaker, I've used UK spelling throughout.
The text does not refer to most of the References (Cambridge Communique, DCD, DDML, ISO-11404, and so on). I'm not certain they need to be there, especially the old URI RFCs.
Right.
Though I didn't correct this here, apparently the use of "we" is frowned on in specifications. I don't yet have a proper reference for you. One reason is in the final paragraph of http://lists.w3.org/Archives/Public/www-xml-linking-comments/2000JanMar/0079.html which explains that first person English is hard to translate.
But I hate the academic passive, which is the obvious alternative . . .
muzmo.com and foo.com are registered domains. You could consider using example.com, example.net, and example.org which IANA registered for examples. (See RFC 2606 section 3 at http://www.ietf.org/rfc/rfc2606.txt.)
Agreed.
Various editorial suggestions and requests for clarification.
Input from Curt Arnold <carnold@houston.rr.com>:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list on Wed, 10 May 2000 23:56:29 -0500
Part 1/Section 3.4
A complex type for which {abstract} is true must not appear as the {type definition} of an ElementDeclaration...
Actually, it would seem that you would allow it, but it would require that the xsi:type specify a non-abstract type derived from the type used in the declaration.
Part 1/Section 4.3.7
Wildcards are subject to the same ambiguity.... If an instance element could match either an explicit particle and a wildcard, or one of two wildcards, within the content model of a type, that model is in error.
If I had version 1.0 of a schema and wanted to create a variant that would be forward compatible for a processing application (so that the processing application would accept 1.0 valid documents and later revisions), I'd be inclined just to add <any> and <anyAttribute> elements mechanically, like:

  <!-- definition in 1.0 schema -->
  <complexType name="pipe">
    <element ref="material" minOccurs="0" maxOccurs="1"/>
  </complexType>

  <!-- definition in 1.0+ schema -->
  <complexType name="pipe">
    <element ref="material" minOccurs="0" maxOccurs="1"/>
    <any minOccurs="0" maxOccurs="unbounded"/>
    <anyAttribute/>
  </complexType>
Unfortunately, this would run into the Unique Particle Attribution issue and would be in error by my reading. In this simple case, it is fairly easy to rewrite the permissive complexType as:
  <complexType name="pipe">
    <any minOccurs="0" maxOccurs="unbounded"/>
    <anyAttribute/>
  </complexType>
However, that could be much more difficult in complex real-life schemas. Some sort of lower priority for wildcard matches that would allow the first formulation while avoiding the attribute issue would be beneficial.
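One way to read the "lower priority for wildcard matches" idea is sketched below in Python. The sketch covers only the shape of the 1.0+ example (a sequence of optional, at-most-once named elements followed by an unbounded trailing wildcard); the function name and the attribution rule are assumptions made for illustration, not anything the drafts define.

```python
def attribute_children(children, explicit):
    """Attribute each child element name to an explicit declaration when one
    is still available at that point in the sequence, otherwise to the wildcard.

    explicit: ordered element names, each optional and allowed at most once.
    """
    result = []
    next_explicit = 0      # index of the first explicit particle still usable
    in_wildcard = False    # once the wildcard has matched, the sequence has moved on
    for name in children:
        if not in_wildcard and name in explicit[next_explicit:]:
            next_explicit = explicit.index(name, next_explicit) + 1
            result.append((name, "element"))
        else:
            in_wildcard = True
            result.append((name, "wildcard"))
    return result
```

Under this rule a leading `material` child is always attributed to the explicit declaration, and anything after it (including a second `material`) falls to the wildcard, so no child is ever attributable in two ways.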
Constraint on Schemas: Particle Restriction OK (Elt:Elt - Name and Type OK)/Point 1.1
{nullable} are the same: wouldn't it be a valid restriction if the base type was nullable and the derived type inhibited xsi:null="true"?
Section 5.11/Constraint on Schemas: Derivation Valid (Extension)/Point 1.1.2
Either I'm reading it wrong, or it is saying that you must fully repeat all the attributes defined in the base type in the derived type.
Point 1.2.2
So if I have:
  <complexType name="base">
    <anyAttribute/>
  </complexType>

  <complexType name="derived" base="base" derivedBy="restriction">
    <attribute name="value" use="required"/>
  </complexType>

Does this still allow any other attribute to appear, while value is required? If so, doesn't that run into the Unique Particle Attribution issue?
Point 1.3:
Is this saying that if my base type definition has a required attribute, I have to repeat it in a type derived by restriction?
Section 5.12/Constraint on Schemas:Derivation Valid
Point 1.2(.1?) This would seem to disallow adding new facets in the derived type
  <simpleType name="derived" base="xsdt:string">
    <!-- this facet isn't declared for string -->
    <pattern value="\d*"/>
  </simpleType>
Should QName resolution seek first in the target namespace for unqualified names, then in the namespace of simple datatypes? If no change is made, should the current rules be stated more prominently (both in Structures and in Primer)? [Uncertain about specifics of proposal. -MSM]
Input from Curt Arnold <carnold@houston.rr.com>:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list on Thu, 11 May 2000 00:08:22 -0500
Section 6.2.2: References to schema components across namespaces
The comments in the example are the only place I have found that indicates how unqualified QName references are to be resolved. From a usability standpoint, it would be much preferable for an unqualified name to be looked up first within the target namespace and then within the datatypes (or schema) namespace. This, of course, would be a more complex resolution than would be used for an unqualified tagname, but it seems to be consistent with previous usage.
If unqualified names are strictly going to be resolved this way, an explicit statement should be made prominently in both Primer and Part 1.
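The difference between the two resolution rules can be sketched in Python. Both functions, the `components` set, and the example namespace `http://example.com/po` are hypothetical; the schema namespace URI is the one used in the drafts quoted in this document, and the separate datatypes namespace is ignored for brevity.

```python
XSD_NS = "http://www.w3.org/1999/XMLSchema"

def resolve_strict(qname, prefixes):
    """Ordinary QName resolution: an unprefixed name takes the default
    namespace, which is stored under the empty prefix ""."""
    prefix, _, local = qname.rpartition(":")
    return (prefixes.get(prefix), local)

def resolve_with_fallback(qname, prefixes, components, target_ns):
    """The commentator's suggestion: for an unprefixed name, try the target
    namespace first, then the schema namespace."""
    prefix, _, local = qname.rpartition(":")
    if prefix:
        return (prefixes[prefix], local)
    for ns in (target_ns, XSD_NS):
        if (ns, local) in components:
            return (ns, local)
    raise LookupError(local)
```

With the schema namespace as default, `resolve_strict("Address", ...)` lands in the schema namespace whether or not a component of that name exists there, whereas the fallback rule needs access to the set of known components before it can answer. That dependence on more than the in-scope namespace declarations is what makes the second rule something other than plain QName resolution.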
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list on 11 May 2000 10:26:34 +0100
They wouldn't be QNames in the meaning of the word if we did it that way. We didn't think it wise to invent a new concept almost-QName-but-not-exactly.
Commentator confirms he is satisfied on this issue ("with proper documentation").
Should XML Schema require that schema locations be declared before or above the elements which claim validity according to the schema in question?
Input from Curt Arnold <carnold@houston.rr.com>:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list (and xml-dev) on Thu, 11 May 2000 01:19:19 -0500
Section 6.3.2: Point 4
Having declarations of schemaLocations anywhere in the document having document scope, of course, seriously complicates event-based validation. It would seem reasonable to require that schemaLocations appear before the need for the corresponding schema information.
Discussed in call of 2000-07-06.
The question is on the commentator's proposal to require that schemaLoc information appear 'before' occurrences of constructs from the namespace it relates to. The three known possibilities are:
[Discussion ...]
RESOLVED unanimously: to make this a priority feedback issue.
RESOLVED unanimously: to dispose of issue LC-116 by saying yes, some such restriction will be added.
There was some sentiment for scoping schemaLoc information in the same way that namespace declarations are scoped (i.e. specifying that it must occur 'above or in the start-tag of' any use of constructs in the namespace). The preponderance of opinion, however, was for specifying that it must occur above or to the left.
RESOLVED unanimously: to require that schemaLoc information, if it occurs, should appear above or to the left of any use of names from that namespace.
The sense of the word 'require' in the resolution just agreed to was discussed. Two formulations were offered:
The second formulation is an attempt to avoid making it an error, while still making clear that one-pass schemaLoc-aware processing must be possible. A straw poll showed a preponderance of support for the first formulation.
RESOLVED unanimously: to adopt the first formulation.
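A one-pass check of the adopted restriction might look like the following Python sketch. The instance namespace URI is assumed to be the draft-era one; only element names count as "uses" of a namespace (attribute names are ignored); and a hint on an element's own start tag is treated as arriving in time for that element, which is one possible reading of "above or to the left" and may be more lenient than the WG intended.

```python
import io
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/1999/XMLSchema-instance"  # assumed draft-era URI
SCHEMA_LOCATION = "{%s}schemaLocation" % XSI_NS

def late_schema_locations(xml_bytes):
    """Return the namespaces whose schemaLocation hint appears only after an
    element from that namespace has already been seen."""
    used, late = set(), []
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        pairs = elem.attrib.get(SCHEMA_LOCATION, "").split()
        for ns in pairs[0::2]:              # hints are namespace/location pairs
            if ns in used and ns not in late:
                late.append(ns)
        if elem.tag.startswith("{"):        # tag has the form {namespace}local
            used.add(elem.tag[1:].split("}", 1)[0])
    return late
```

Because the check needs only the set of namespaces seen so far, it runs in a single streaming pass, which is the property the commentator wanted event-based validators to keep.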
Commentator's answer suggests refinement.
Should the rules for locating schema components be modified to use public identifiers and otherwise improve the matchup among schemas, namespaces, and resources?
Input from Curt Arnold <carnold@houston.rr.com>:
"Curt Arnold" <carnold@houston.rr.com> to XML Schema Comments list (and xml-dev) on Thu, 11 May 2000 01:19:19 -0500
In general, I think locating schema resources has a couple of serious deficiencies.
First, there is not a one-to-one correspondence between namespaces and schemas. For example, the XHTML namespace has three distinct DTDs associated with it, which are distinguished using public identifiers. There may also be successive versions of schemas for the same namespace.
Second, a single schema resource may contain many distinct (possibly tens if not hundreds) namespaces through inclusions. I believe the typical usage would be to have a single schema resource that would contain definitions for all the expected namespaces and then, occasionally, one or more additional schema resources for unanticipated namespaces. Having to enumerate all the namespaces that appear in a mega resource would get very long and prone to error.
Third, there is no conflict resolution mechanism for the case where multiple schema locations are declared for a namespace, either implicitly (through an import within a schema) or explicitly (through a schemaLocation attribute).
Fourth, there is not a mechanism to identify a schema resource to be used to validate an XML 1.0 (pre-namespace) compatible document.
It would seem the best approach would be to use public identifiers (fortunately having a rebirth of interest on xml-dev) to explicitly identify a specific schema resource instead of relying on an ambiguous combination of namespace and namespaceLocation to resolve whether a particular cached version of a schema is appropriate.
What I would suggest is that:
  <?xsi:schema defaultNamespace PUBLIC pubid sysid ?>
  <?xsi:schema defaultNamespace SYSTEM sysid ?>

When xsi:schemaPublic and xsi:schemaSystem appear on the same element, there must be a one-to-one correspondence between entries, so that if the second public identifier cannot be resolved, the second URI could be used to retrieve the resource. I'm assuming that there can be an acceptable mechanism for representing a null public identifier and a null URI.
When schema information appears for one namespace in multiple schema resources, the first appearance would be used for validation.
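The pairing rule in this proposal can be illustrated with a minimal Python sketch. The `catalog` mapping from public identifiers to local resources, and the function name, are hypothetical; the proposal does not say how public identifiers would be resolved, nor how null entries would be written.

```python
def resolve_schema(public_ids, system_ids, catalog):
    """Pair the n-th public identifier with the n-th system identifier and
    return one location per pair: the catalog entry for the public identifier
    if there is one, otherwise the system identifier."""
    if len(public_ids) != len(system_ids):
        raise ValueError("xsi:schemaPublic and xsi:schemaSystem must pair up")
    return [catalog.get(pub, sys_id) for pub, sys_id in zip(public_ids, system_ids)]
```

The length check reflects the requirement that the two attributes correspond entry for entry; without it, a missing entry would silently shift every later pairing.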
C. M. Sperberg-McQueen to IG, 5 July 2000
The commentator proposes that we introduce explicit support for public identifiers into the XML Schema language, and use them to address some of the puzzles which result from the current fact that any namespace may be formalized by more than one schema document; HTML is a good example.
I believe there is likely to be consensus in the WG that this is probably not a good idea.
If I had to provide a rationale, I'd say:
Discussed in call of 2000-07-06.
The question is on the commentator's proposal that we add facilities to relate namespaces and schemas to public identifiers.
There was no visible support for the proposal.
RESOLVED unanimously: to respond to the suggestion with a polite no, giving a rationale which is substantially that outlined in MSM's message of 5 July.
Commentator's reply, second reply, and third reply. Net: he is satisfied.
...
Input from Martin J. Duerst <duerst@w3.org>:
"Martin J. Duerst" <duerst@w3.org> to XML Schema Comments list on Thu, 11 May 2000 16:53:52 +0900
This review deals with the suitability of XML Schema to describe the constructs used in XML-DSig (Schema/DTD).
What is missing, and what I would hope the XML DSig group could contribute significantly to, is some kind of analysis, e.g. of what potential and what problems XML Schema offers with regard to describing data that can easily be signed. Two examples:
In other words, try to make sure that for appropriately designed XML Schemas, no additional 'data canonicalization' step is necessary to sign some data.
What do the DSig experts in this group think about such issues?
Input from Joseph M. Reagle Jr. <reagle@w3.org>:
"Joseph M. Reagle Jr." <reagle@w3.org> to XML Schema Comments list on Thu, 11 May 2000 13:00:22 -0400
At 04:53 PM 5/11/00 +0900, Martin J. Duerst wrote: Because the 'boolean' datatype has four lexical values (true, false, 1, 0; this is in the spec, no kidding) instead of two lexical values, that means that additional effort (at least) is necessary if somebody wants to create a schema for some data containing boolean values.
Martin, thank you for reminding me about this. I recall you've mentioned this before and I believe we had an agreement from Michael to do something about ensuring a consistent lexical representation of data types. I can't find a URL for that agreement (I think it was sometime last year) but I can find evidence that the WG was trying to satisfy that requirement (for floating points at least):
3.2.3 - 3.2.5 Lexical notation of floating-point numbers [Where the author requested other notations] This argument was made by several people but there was a strong sentiment for a single lexical representation. http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000AprJun/0043.html
However, I'm not sure to what extent this is a problem (I'm expressing ignorance, not arguing it isn't). If an XML instance uses '0' that is signed, is there an expectation that since the schema permits a 'false' as well, intermediary processors would change it? I appreciate this might happen with character code mappings, but I tend to view schemas as constraints on permissible values, and not a processor (in the vein of infoset/C14N/DOM). (For instance, just because a schema permits an unconstrained string, one wouldn't presume it would change the string ...?)
I don't quite follow. Element of the same element type? Can you give an example?
Input from Martin J. Duerst <duerst@w3.org>:
"Martin J. Duerst" <duerst@w3.org> to XML Schema Comments list on Fri, 12 May 2000 16:38:15 +0900
However, I'm not sure to what extent this is a problem (I'm expressing ignorance, not arguing it isn't). If an XML instance uses '0' that is signed, is there an expectation that since the schema permits a 'false' as well, intermediary processors would change it?
Not necessarily, but that may well happen. We already see in the DSig group that people want to use the DOM, and don't want to keep around e.g. whether an attribute was single-quoted or double-quoted. As we move up the semantic ladder (well, it feels more like a very flat slope, but that's a different issue), exactly the same will very easily happen one step higher.
I appreciate this might happen with character code mappings, but I tend to view schemas as constraints on permissible values, and not a processor (in the vein of infoset/C14N/DOM).
Constraints on permissible values is one function, and probably the most important one the way the spec is written. But for datatypes, the 'infoset' aspect is already there, and C14N is what we are just discussing here, and would be very very easy to add at this point in time compared to having to start another group,... in a few months. Something like DOM is not done yet, but conversion from data to XML streaming and back is an important application of XML Schema, probably the most important one.
(For instance, just because a schema permits an unconstrained string, one wouldn't presume it would change the string ...?)
There is a clear difference between changing the value, and producing a different lexical representation for the same value. Changing from 0 to false is the latter, changing the string is the former.
I don't quite follow. Element of the same element type? Can you give an example?
Well, let's assume you have a list of students, with student id, birthday, and a boolean for 'male' (gender). The task is to produce a signable XML document from this data. In order for the signature to be reproducible, the XML document has to be exactly the same for the same data. Assuming that the structure looks something like

  <student>
    <id>...</id>
    <birthday>date</birthday>
    <male>boolean</male>
  </student>
  ...
and is described as an XML Schema, the 'missing pieces' for the above task are to make sure the students are always in the same order (e.g. by id) and that date and boolean are always in a canonical form (and of course that the underlying XML is in C14N).
Probably the above is not the most appropriate example, but I hope you get the idea.
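The 'missing pieces' in the student example can be made concrete with a small Python sketch: fix the record order and map the four boolean lexical forms to one. The wrapper element `students`, the choice of `true`/`false` as the canonical pair, and the function name are assumptions for illustration; canonical date forms and XML canonicalization (C14N) are not handled here.

```python
import xml.etree.ElementTree as ET

# The four lexical forms of boolean mentioned in this thread, mapped to two.
_BOOLEAN = {"true": "true", "1": "true", "false": "false", "0": "false"}

def canonical_students(records):
    """records: iterable of (id, birthday, male) string triples, with male in
    any of the four boolean lexical forms. Returns one reproducible string."""
    root = ET.Element("students")
    for sid, birthday, male in sorted(records, key=lambda r: r[0]):
        student = ET.SubElement(root, "student")
        ET.SubElement(student, "id").text = sid
        ET.SubElement(student, "birthday").text = birthday
        ET.SubElement(student, "male").text = _BOOLEAN[male]
    return ET.tostring(root, encoding="unicode")
```

Two inputs that hold the same data in a different order, or with `1` in place of `true`, then serialize to identical text and would therefore yield the same signature.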
Input from John Boyer <jboyer@PureEdge.com>:
"John Boyer" <jboyer@PureEdge.com> to XML Schema Comments list on Fri, 12 May 2000 09:42:08 -0700
<martin> Not necessarily, but that may well happen. We already see in the DSig group that people want to use the DOM, and don't want to keep around e.g. whether an attribute was single-quoted or double-quoted. As we move up the semantic ladder (well, it feels more like a very flat slope, but that's a different issue), exactly the same will very easily happen one step higher. </martin>
<john> Actually, I'm pretty sure we would argue that if you want to schema normalize an XML document, then you would need another transform for that. Whether we define such a transform in this version of the spec is a decision of the chairs.
Input from Martin J. Duerst <duerst@w3.org>:
"Martin J. Duerst" <duerst@w3.org> to XML Schema Comments list on Sun, 14 May 2000 11:49:02 +0900
At 00/05/12 09:42 -0700, John Boyer wrote: <john> Actually, I'm pretty sure we would argue that if you want to schema normalize an XML document, then you would need another transform for that.
Can you give some examples for what you mean by 'schema normalize', i.e. a short document before and after normalization, or so?
What I'm saying is that with some rather minor tweaks to the current Schema drafts, it will be possible to easily write Schemata that will make sure that documents that validate against these Schemata will already be normalized and won't need any normalization on the schema level anymore.
Whether we define such a transform in this version of the spec is a decision of the chairs.
How could such a transformation look?
Discussed in call of 2000-07-28.
The issue is a general exhortation to the XML Schema WG to make the schema language conduce to the creation of data that can easily be signed. No specific proposals are made, beyond use of single lexical representation; the DSig experts (i.e. Joseph Reagle) who participated in the discussion express a certain skepticism that multiple lexical representations are really a problem.
RESOLVED: to close LC-118 with thanks and cross-refer, for the substantive proposal, to LC-220.
A request for clarification on include
Input from gmacri@libero.it <gmacri@libero.it>:
"gmacri@libero.it" <gmacri@libero.it> to XML Schema Comments list on Thu, 11 May 2000 14:20:06 +0200
When I write an XML schema, for instance schema1.xsd, which includes another schema, schema2.xsd, must the names of the elements, attributes and types that are children of schema1.xsd differ from the names of the elements, attributes and types that are children of schema2.xsd?
Input from ht@cogsci.ed.ac.uk (Henry S. Thompson):
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list on 11 May 2000 15:19:27 +0100
That's correct, assuming you mean elements, attributes and types declared at the top level.
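The rule confirmed above can be checked mechanically. The following is a rough sketch under stated assumptions: the function names are invented for illustration, the namespace URI is the one used by the schema examples elsewhere in this document, and simple and complex types are assumed to share a single symbol space.

```python
import xml.etree.ElementTree as ET

# Namespace used by the schema examples quoted in this document.
XSD_NS = "http://www.w3.org/1999/XMLSchema"

# Top-level constructs and the symbol space each is assumed to belong to.
KINDS = {
    "element": "element",
    "attribute": "attribute",
    "simpleType": "type",
    "complexType": "type",
}


def top_level_names(schema_text):
    """Return the set of (symbol space, name) pairs declared at the top level."""
    root = ET.fromstring(schema_text)
    names = set()
    for child in root:
        namespace, _, local = child.tag.rpartition("}")
        if namespace == "{" + XSD_NS and local in KINDS and "name" in child.attrib:
            names.add((KINDS[local], child.attrib["name"]))
    return names


def clashes(including, included):
    """Names declared at the top level of both schema documents."""
    return top_level_names(including) & top_level_names(included)
```

An element and an attribute with the same name do not clash in this sketch, because they are placed in different symbol spaces; only the pairs returned by `clashes` would violate the rule.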
A variety of questions about the datatypes spec, in particular about time durations expressed in abstract or concrete months, years, etc.
Input from Ashok Malhotra:
petsa@us.ibm.com to w3c-xml-schema-ig@w3.org and others, Wednesday, May 03, 2000 2:39 PM
"Asir S Vedamuthu" <asirv@webmethods.com>@w3.org on 05/03/2000 01:36:48 PM
(1) <Snip> Section 3.3.8 integer http://www.w3.org/TR/xmlschema-2/#integer
Integer has the following constraining facets: precision, scale, .. </Snip>
Please explain the validation contribution of 'precision' & 'scale' if the {base type definition} is integer.
Integer is derived from decimal by setting scale=0. The PSV infoset for derived datatypes indicates all the facets of the base datatype and their fixed values.
(2) <Snip> Section 3.2.6 timeDuration http://www.w3.org/TR/xmlschema-2/#timeDuration
.. where nY represents the number of years, nM the number of months, .. </Snip>
<Snip> Section 4.2.6 - 4.2.9
.. if the {base type definition} is one of date and time related datatypes, then the value must be chronologically less than or equal to {value} </Snip>
'nY' - Does a year have 360 or 365 days?
'nM' - Does a month have 28, 30, 31, or 400 days?
Note: I scanned thru ISO 8601 and it does not say anything about it
The number of days in the month or year will depend on when the period occurs. We take the position that this is always known and, thus, there is no ambiguity but would like to see counterexamples. ISO 8601, second edition, does say in 3.15 that "in certain applications a month is regarded as a unit of time of 30 days".
(3) <Snip> Section 4.2.1 length http://www.w3.org/TR/xmlschema-2/#dc-length Constraint on Schemas: length and minLength - it is an error for both length and minLength to be members of {facets} </Snip>
Is it ok if length and maxLength are members of {facets}?
No, if length is specified neither minLength nor maxLength can be specified.
(4) <Snip> Section 4.2.6 maxInclusive http://www.w3.org/TR/xmlschema-2/#dc-maxInclusive Constraint on Schemas: It is an error for the value specified for minInclusive to be greater than the value specified for maxInclusive for the same datatype</Snip>
Is it ok for minExclusive to be greater than maxInclusive? This question also applies to Section 4.2.7 - 4.2.9
No.
(5) <Snip> Section 4.2.5 enumeration http://www.w3.org/TR/xmlschema-2/#dc-enumeration [value] a set of values from the value space of the {base type definition}</Snip>
Let's say I have a <simpleType/> A that restricts a {base type definition} B. In addition, <simpleType/> A has a set of <enumeration> values, say e1, e2, e3, .., en. Is the set {e1, e2, .. } of values from the value space of {base type definition} B or <simpleType/> A? Note: it is the synthesis of facet values which together determine the value space and properties of the datatype. Please clarify
The enumerated values must be from the value space of A.
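The answers to items (3) and (4) amount to consistency rules on a set of facets. As a minimal sketch, assuming numeric facet values and an invented `facet_errors` helper (the draft defines the constraints, not this function), they could be checked as follows:

```python
from decimal import Decimal


def facet_errors(facets):
    """Check a dict of facet name -> lexical value against the rules above.

    Only numeric bounds are handled; this is an illustration, not the
    normative constraint text.
    """
    errors = []
    # Item (3): length excludes both minLength and maxLength.
    if "length" in facets:
        for other in ("minLength", "maxLength"):
            if other in facets:
                errors.append(f"length and {other} must not both be specified")
    # Item (4): no lower bound may exceed any upper bound.
    lower = [name for name in ("minInclusive", "minExclusive") if name in facets]
    upper = [name for name in ("maxInclusive", "maxExclusive") if name in facets]
    for low in lower:
        for high in upper:
            if Decimal(facets[low]) > Decimal(facets[high]):
                errors.append(f"{low} must not be greater than {high}")
    return errors
```

For example, `facet_errors({"length": "5", "maxLength": "8"})` reports the length/maxLength conflict, while a consistent pair of bounds yields an empty list.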
Input from Asir Vedamuthu:
Asir S Vedamuthu <asirv@webmethods.com> to <w3c-xml-schema-ig@w3.org> and others, (date?)
Item 1
AM: Integer is derived from decimal by setting scale=0. The PSV infoset for derived datatypes indicates all the facets of the base datatype and their fixed values
If so, integer datatype has -
Item 2
AM: The number of days in the month or year will depend on when the period occurs. We take the position that this is always known and, thus, there is no ambiguity but would like to see counterexamples. ISO 8601, second edition, does say in 3.15 that "in certain applications a month is regarded as a unit of time of 30 days".
However, 'period' is not a facet or constraining facet of timeDuration. Need more clarification.
Item 3, 4 & 5
Cool. Then item 3, 4 & 5 call for editorial corrections to Part 2 Datatypes spec.
Input from Ashok Malhotra:
Ashok Malhotra to w3c-xml-schema-ig-request@w3.org, Thursday, May 04, 2000 10:42 AM
[The number of days will depend on when the period occurs; this should always be known, no? If you have another use case, it would be useful to see it.]
Input from Asir S Vedamuthu:
Asir S Vedamuthu [mailto:asirv@webmethods.com] to petsa@us.ibm.com, Thursday, May 04, 2000 2:19 PM
Here is a simple use case -
    <project>
      <title>Demolish I-95 and build it from scratch</title>
      <startDate>unknown</startDate>
      <duration xsi:type="duration">P30Y45M55D</duration>
    </project>
&
    <simpleType name="duration" base="dt:timeDuration">
      <minInclusive value="P0Y40M25D"/>
      <maxInclusive value="P35Y53M22DT45H23M68S"/>
    </simpleType>
Input from Ninggang Chen <nchen@webMethods.com>:
"Ninggang Chen" <nchen@webMethods.com> to XML Schema Comments list on Thu, 11 May 2000 11:26:34 -0400
According to ISO 8601, there are four ways to express a period of time. The timeDuration datatype uses the second way, which is expressed "in one or more specific components but not associated with any specific start or end". ([ISO 8601] section 5.5.1)
There is no facet in timeDuration specifying the starting point of the time period and as we can see from Asir's example that the starting point is not necessarily known. Therefore, I have an impression that the definition of timeDuration is incomplete. If we don't have an unambiguous definition of year and month, this data type is simply not usable.
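The dependence on the starting point can be made concrete. The sketch below counts the days covered by a duration of years, months and days from a given start date; `add_months`, `days_in_duration` and the rule of clamping the day of month are assumptions made for illustration and are not taken from the Datatypes draft or from ISO 8601.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Move a date forward by whole calendar months, clamping the day of month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def days_in_duration(start, years=0, months=0, days=0):
    """Number of days spanned by the duration when it begins on `start`."""
    end = add_months(start, 12 * years + months)
    return (end - start).days + days
```

One month starting on 1 January 2000 spans 31 days, the same month count starting on 1 February 2000 spans 29, and starting on 1 February 2001 it spans 28. Without a start date, a comparison such as "one month versus thirty days" therefore has no single answer, which is the point the commentators raise against the min/max facets.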
Formal response to commentator. Commentator is unsatisfied.
Shall XML Schema be modified so that:
Input from Tim Berners-Lee <timbl@w3.org>:
"Tim Berners-Lee" <timbl@w3.org> to XML Schema Comments list on Thu, 11 May 2000 16:07:19 -0400
Comments on xml-schemas
Firstly, it is essential that important things be referenceable by URI. It is much easier and safer to use a barenames #id xpointer reference than a complex xpointer expression. Given that HTML element name and complex types are really important concepts in a schema, and ones which people will want to refer to from other languages and other schemas, please use type IDs for that. (I note that unfortunately this cannot apply to attributes: the creators of schemas will have to think of [an] appropriate ID and give it explicitly if they want others to be able to use a barename reference to refer to the attribute name.) It is I think worth forcing complex types and element names to share the same space, because little is lost (except for some confusion!) and one gains the power of xml ID and barename xpointer references for both.
Secondly, I note that the schema spec used name= for element types instead of the more natural id="". I made that mistake a long time ago with HTML <A name=> ... and it has been a thorn in everyone's side ever since! Please do not make that mistake again!!!! Specs should use id= for things of type ID. (I would be in favor of reserving and predefining it in all namespaces myself, as it would allow an xpointer reference to be followed without having to look up the DTD or schema, and I feel that without that XML becomes unbearably complex)
The same applies to data types. If schema foo defines a datatype bar then it really is too clumsy unless foo#bar is the datatype's URI.
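To illustrate why a barename reference is easy to follow, here is a small sketch of resolving a URI of the form foo#bar. It assumes, as the comment proposes, that an attribute literally named `id` carries the ID; the `documents` dictionary stands in for fetching the schema document, and the function name and example URI are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag


def resolve_barename(uri, documents):
    """Return the element whose id matches the fragment of `uri`, or None.

    `documents` maps a document URI to its XML text and replaces real
    retrieval for the purpose of this sketch.
    """
    base, fragment = urldefrag(uri)
    root = ET.fromstring(documents[base])
    for element in root.iter():
        # Assumption: the attribute named "id" is of type ID.
        if element.get("id") == fragment:
            return element
    return None
```

No DTD or schema lookup is needed to follow the reference, which is the simplification the comment argues for; with name= instead, the resolver would first have to learn which attributes are of type ID.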
Discussed at Edinburgh ftf.
After much discussion, agreed that this is an important basic problem. The WG didn't buy the proposed solution. We don't believe we have a good long term solution. We will in the short term explain better what is available and in long term will provide something that defines names for all important constructs.
Should the XML Schema for Datatypes be modified so that the has-facet and has-property elements (which occur within appinfo) (1) are assigned to a (documented) namespace, and (2) are made first-class objects by being given clear names?
Input from Ralph R. Swick <swick@w3.org>:
"Ralph R. Swick" <swick@w3.org> to XML Schema Comments list on Thu, 11 May 2000 16:23:33 -0400
The xsd:appinfo schema component in the schema-for-schemas http://www.w3.org/TR/2000/WD-xmlschema-1-20000407/#element-appinfo appears to adequately address the request for a mechanism to permit an XML Schema schema document to hold declarations for mapping to other application data structures documented in item 3.2 of http://www.w3.org/TR/1999/NOTE-schema-arch-19991007.
I thank the XML Schema WG for including this feature. I also commend the WG for illustrating one usage scenario in the Schema for Datatype definitions http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#schema
My enthusiasm is tempered, however, by the lack of any guidance on how to discover the semantics of the two example elements has-facet and has-property. Having constructed all this wonderful machinery for extending the declarative power of an XML Schema, the sole example of its use fails to lead the way in applying that same machinery to the extension. has-facet and has-property do not themselves have first-class names, nor are they even clearly part of a namespace. I encourage the spec editors to lead by example in your application of appinfo.
Discussed at Edinburgh ftf.
Datatypes editors to provide the requested namespace and documentation for it and associate these elements with it.
Shall the schema for schemas and the DTD for schemas be changed to align their content models for the group element more closely?
Input from Curt Arnold:
"Arnold, Curt" <Curt.Arnold@hyprotech.com> to XML Schema Comments list on Thu, 11 May 2000 15:43:15 -0600
The narrative and schema for schemas allow groups to contain annotation, group, and any elements as children. The DTD only allows all, choice and sequence.
Discussed in call of 2000-06-30.
RESOLVED without dissent: to instruct HST to bring the DTD and the schema for schemas into alignment with each other as regards the content model for group, and to bring the content model for group into alignment with complexType.
[This issue has been split. See the cross references below for the location of the various parts.]
Cf. Arrays?
Cf. Provide guidance on extending schema for schemas?
Cf. Allow specification of size constraints in instance?
Cf. How do I restrict IDREFs to particular (element) types?
Cf. Streamline restriction of content models?
Cf. Simultaneous restriction and extension?
Cf. Re-align occurrence indications for elements and attributes?
Cf. Allow explicit specification of name import/export?
Input from Jane Hunter <jane@dstc.edu.au>:
Jane Hunter <jane@dstc.edu.au> to XML Schema Comments list on Fri, 12 May 2000 11:31:17 +1000
Here's MPEG-7's feedback to the current XML Schema Language WDs: http://archive.dstc.edu.au/mpeg7-ddl/issues.html
It's based on problems which have arisen during encodings of MPEG-7 Descriptors and Description Schemes. Examples of these can be found at: http://archive.dstc.edu.au/mpeg7-ddl/
In some cases there may be alternative methods for solving a problem, so we'd appreciate suggestions from the WG.
I'm going to be at WWW9 in Amsterdam next week so I'll be available to discuss any of these issues face-to-face then if any of the WG are interested.
Should the last-call comment period be extended to allow comment by ebXML?
Input from David RR Webber <Gnosis_@compuserve.com>:
David RR Webber <Gnosis_@compuserve.com> to XML Schema Comments list on Fri, 12 May 2000 03:16:54 -0400
Group.
I'm reporting here direct from the lobby of the ebXML meeting in Brussels.
Some significant decisions have been made this week, and it is very clear that ebXML is one of the major potential users of XML Schema.
It is also clear that several of the potentially mission critical features required are either missing (support for ISO11179 datatypes) or unclearly defined in the current Schema draft.
Also - ebXML is very focused on providing sustainable and low-cost business solutions that are scalable to a global level.
Looking at this there are just too many issues with the current W3C Schema draft to make it viable.
Simply put - if I were asked today by an MIS Manager if I would recommend W3C Schema in its current form as the basis for a major new implementation I would emphatically have to say - NO.
I would therefore ask that the current period for the review of this release candidate be extended at least another 4 weeks to allow time for discussion of the issues relating specifically to support of the ebXML requirements, and that we develop a set of metrics by which we can measure the suitability-to-task in the context of maintainability and use in a global eBusiness environment.
I will work on a draft of these today and post these tonight when I get back to the USA.
Formal response. Commentator concurs.
Should the content model for complexType be changed to require an explicit grouping element (sequence, choice, or all) instead of supplying an implicit choice when it has particles as children?
Cf. Can XML Schema define XSLT?
Cf. Clarify minOccur/maxOccur defaulting?
Input from ht@cogsci.ed.ac.uk (Henry S. Thompson):
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list on 12 May 2000 10:21:11 +0100
Currently this is as follows:
    <sequence>
      <choice>
        <element ref="facet" minOccurs="0" maxOccurs="unbounded"/>
        <group ref="particle" minOccurs="0" maxOccurs="unbounded"/>
      </choice>
      <group ref="attrDecls"/>
    </sequence>
where 'particle' is
    <group name="particle">
      <choice>
        <element name="element" type="localElement"/>
        <element name="group" type="groupRef"/>
        <element ref="all"/>
        <element ref="choice"/>
        <element ref="sequence"/>
        <element ref="any"/>
      </choice>
    </group>
I'd like to change this to
    <sequence>
      <choice>
        <element ref="facet" minOccurs="0" maxOccurs="unbounded"/>
        <group ref="explicitGroup"/>
      </choice>
      <group ref="attrDecls"/>
    </sequence>
where 'explicitGroup' is
    <group name="explicitGroup">
      <choice>
        <element name="group" type="groupRef"/>
        <element ref="all"/>
        <element ref="choice"/>
        <element ref="sequence"/>
      </choice>
    </group>
I think the defaulting of the wrapper is messy, confusing and is getting in our way: the increased clarity in requiring a single <all>, <choice> or <sequence> outweighs the extra verbosity.
The one potential problem is that people will have to learn to write e.g.
    <complexType name="paraContent" content="mixed">
      <choice minOccurs="0" maxOccurs="1">
        <element ref="emph"/>
        <element ref="strong"/>
        . . .
      </choice>
    </complexType>
for backward-compatible mixed content.
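The effect of the proposed change on existing schemas can be sketched as a simple check over the children of a complexType. The helper names and message texts below are invented for illustration; the sets of element names follow the 'particle' and 'explicitGroup' groups quoted above.

```python
import xml.etree.ElementTree as ET

# Children allowed under the proposal (the 'explicitGroup' choice).
EXPLICIT_GROUPS = {"group", "all", "choice", "sequence"}
# Particles that would no longer be allowed directly under complexType.
BARE_PARTICLES = {"element", "any"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rpartition("}")[2]


def content_model_problems(complex_type_xml):
    """List the ways a complexType would break under the proposed model."""
    root = ET.fromstring(complex_type_xml)
    children = [local_name(child.tag) for child in root]
    problems = []
    bare = [name for name in children if name in BARE_PARTICLES]
    if bare:
        problems.append(
            "particles must be wrapped in an explicit group: " + ", ".join(bare)
        )
    groups = [name for name in children if name in EXPLICIT_GROUPS]
    if len(groups) > 1:
        problems.append("at most one explicit group is allowed")
    return problems
```

A complexType that lists its element children directly is reported, while the same children wrapped in a single sequence, choice or all pass the check.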
Discussed in call of 2000-06-23.
Possible resolutions include:
[...] content="elementOnly" and when content="mixed"
[...] content = 'elementOnly', but no default when content='mixed'
It was observed that if there is only one child, the fourth option will require that it be a one-item sequence or a one-item choice.
A straw poll showed no support at all for the status quo, and some tolerance, but no preference, for making sequence the default for both element-only and mixed content. As between the third and fourth choices, there was approximately equal preference for each, but greater toleration (more 'can-live-with' votes) for the fourth, on which the chair accordingly put the formal question. RESOLVED: to remove all defaulting in the association between XML representation of content models and the schema component, and make the relevant portion of the content model of complexType read (choice | all | seq | grpref)?
Dissenting: Beech.
Rationale (supplied post hoc by chair): the conditional defaulting rules, though intended to simplify life for schema authors and readers by making the markup load lighter, have the opposite effect: they are confusing and simply make schema writing and reading more error-prone. Implicit grouping elements are (in the view of some though not all WG members) useful to some users, just as end-tag omission is. But like end-tag omission, their cost in confusion to new and occasional users outweighs their advantages to frequent users. Removing the defaults simplifies the spec, makes it unnecessary for readers to learn and understand the defaulting rules, and makes schemas easier to read and write and less prone to mistakes caused by misunderstanding the default rules.
Commentator confirms that the resolution of the issue is acceptable.
Shall the post-schema-validation info set have properties to carry information about schema-validation errors (i.e. to identify all the different ways a document, subtree, element, attribute, or other construct can fail to be schema-valid)?
Input from ht@cogsci.ed.ac.uk (Henry S. Thompson):
To: www-xml-schema-comments@w3.org From: ht@cogsci.ed.ac.uk (Henry S. Thompson) Date: 12 May 2000 10:24:21 +0100 Message-ID: <f5bsnvoi0je.fsf@cogsci.ed.ac.uk> Subject: Error logging in the infoset
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list on 12 May 2000 10:24:21 +0100
Pursuant to our decision in Berkeley to allow the editors the option to systematise and tabulate 'errors', the next internal draft will contain a table of ways to lose and explicit short identifiers for all of them. I'd like to add a post-schema-validation infoset property associated with element and attribute infoitems which records these when appropriate.
Discussed in call of 2000-06-15.
RESOLVED: to add a property to the PSV infoset with the characteristics described: if the subtree is not schema-valid, and the processor identifies the cause with one of the identified error codes, then the processor may optionally record the reason by placing the appropriate error code in the new property -- errors in the schema currently being used for validation are not addressed by this proposal, although an analogous property might be added to any schema-info-set that results from our action on LC-162 and LC-198. Dissenting: Beech.
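As a rough picture of the resolved property, the sketch below models an information item that may optionally carry error codes. The class, field and function names, and the error code used in the usage note, are hypothetical; the real identifiers are to come from the table of errors in the spec.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InfoItem:
    """Stand-in for an element or attribute item in the PSV infoset."""
    name: str
    children: List["InfoItem"] = field(default_factory=list)
    valid: bool = True
    # Hypothetical property: None means the processor recorded nothing.
    error_codes: Optional[List[str]] = None


def record_failure(item, code=None):
    """Mark an item as not schema-valid; recording the code is optional."""
    item.valid = False
    if code is not None:
        if item.error_codes is None:
            item.error_codes = []
        item.error_codes.append(code)


def collect_errors(item, path=""):
    """Gather (path, code) pairs for every recorded error in the subtree."""
    here = f"{path}/{item.name}"
    found = [(here, code) for code in (item.error_codes or [])]
    for child in item.children:
        found.extend(collect_errors(child, here))
    return found
```

Because recording is optional, an item can be invalid with no codes attached: calling `record_failure(root)` without a code leaves `error_codes` as None, whereas `record_failure(leaf, "hypothetical-bad-date")` makes the code visible to `collect_errors`.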
Commentator confirms that resolution is acceptable.
Should XML Schema be modified so as to allow module users to redeclare types, named model groups, and named attribute groups, by providing the modified definition within the schema import or include element? This might make modules easier to use.
Input from ht@cogsci.ed.ac.uk (Henry S. Thompson):
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list on 12 May 2000 10:40:26 +0100
Someone recently observed to me that we don't explicitly rule out circular type definition. We certainly should, but I wonder if we should consider a carefully controlled exception to this, and to the no redefinition/redeclaration rule: You can redefine or redeclare imported or included components inside the include/import element, and only there. Furthermore, such redefinitions may be circular in the case of types, named model groups and attribute groups, and all such redefinitions are retrospective, i.e. they affect the included/imported components which reference those definitions.
Simple example
schema1.xsd
    <schema xmlns='http://www.w3.org/1999/XMLSchema'
            targetNamespace='http:///www.example.com/stuff'
            xmlns:m='http:///www.example.com/stuff'>
      <complexType name="personName">
        <sequence>
          <element name="forename"/>
          <element name="middle" minOccurs="0" maxOccurs="unbounded"/>
          <element name="surname"/>
        </sequence>
      </complexType>
      <element name="person" type="m:personName"/>
      <element name="record">
        <complexType>
          <sequence>
            <element ref="m:person"/>
            <element name="position">...</element>
            ...
          </sequence>
        </complexType>
      </element>
    </schema>
schema2.xsd
    <schema xmlns='http://www.w3.org/1999/XMLSchema'
            targetNamespace='http:///www.example.com/stuff'>
      <include "schema1.xsd">
        <complexType name="personName" base="m:personName" derivedBy="extension">
          <element name="genMark" minOccurs="0"/>
        </complexType>
      </include>
    </schema>
Documents using schema2 for the http:///www.example.com/stuff namespace would be able to have <genMark> as a daughter of <person> in their <record>s, whereas those using schema1 would not.
I certainly haven't thought through all the ramifications of this, but if we want to do anything at all to allow ourselves to reuse our own work, this is the best thing I've thought of.
Discussed in call of 2000-07-06.
The question is on the proposal to support light-weight modifications when including. HT has clarified that this is only for modifications, not for arbitrary replacements of the definition.
The chair proposed to discuss this for a few minutes, to see whether consensus exists; if not, he proposed, we should then take the discussion to email. The WG agreed.
The WG discussed the proposal without reaching any clear conclusion. One point of concern was the sense that a type derivation that does not generate a new type but modifies the old one in place is tricky and dangerous, and possibly inconsistent with our other practice. Some WG members suggested that this trickiness is inherent in the use case; others denied this claim.
Another point of concern was that we need to ensure that there is a clear definition of the rules for handling multiple includes.
It seems plausible to allow such in-place modification to apply to simple types, probably only for user-defined simple types. It is an open question whether changing to a list type should be allowed or not.
Discussed at face to face meeting of 1-2 August 2000.
Discussion had produced two variants of a modifying include, which (in the chair's analysis) varied in two ways:
In neither variant is the old type T available in any sense. In the resulting type lattice, the old type is pruned. Depending on where we end up on the second point, we may or may not want to use the term 'include'. We do not currently have any mechanism to do this when the namespaces are different.
A straw poll showed a strong majority (16 to 4, with two abstentions) in favor of adopting some proposal similar to those on the table.
A use case was described, in order to clarify the namespace implications of the proposals. Imagine a user has a schema, using XHTML stuff, but wants to change so that XHTML elements in the user's documents have a certain extra required attribute. On the "must not be in same namespace" account, the user can do this. On the "must be same namespace" account the user can do it, but only by writing their own schema for the XHTML namespace and importing it. A second use case is the one described to us by the definers of XHTML, who want to have conditions of the form "When you have this module, you get this attribute." A third use case: defining v2 of XML Schema.
Some WG members urged caution: this is a piece of versioning, and we want to be careful not to get in the way of a full story later. Some preferred to see the new type T derived from the old type T because it solves the use case while preserving important invariants with respect to derived types. It doesn't preclude us from going to the other option in due course; leveraging existing mechanisms is the right way to proceed cautiously. Other WG members felt that anything we do, including nothing, is risky. Some proposed to make it a priority feedback item.
A problem arose: If we derive the new type T from the old type T, we have said the old T effectively disappears. So if the old T was derived from Q by restriction and the new T is derived by extension, we have a problem: we can't go from Q to the new T in one step without losing our normal constraints on the abstract component set. The consensus appeared to be that an anonymous type identical to the old T, which is the type of nothing in the schema, continues to exist in the type lattice.
Question: if I had a type S derived from the old T, and I derive a new T such that type S breaks, what happens? A: Yes, it breaks, just as if you did it by cut and paste in original schema.
A straw poll showed a very strong preponderance of opinion (19 to one, with four abstentions) in favor of deriving the new type T from the old type T.
The WG discussed the namespace question. Some WG members preferred to require that the two schema documents apply to the same namespace, in part because the mechanism could not otherwise be used for developing complex languages like XHTML and in part because it felt "more honest" in owning up to the fact that the author of the second schema document is changing someone else's namespace. Others felt it was "more honest" if the namespaces were required to be different, and that the XHTML use case relies on bad practice, which should not be encouraged. Still others felt that it was wrong to assume (as both 'honesty' arguments do) that the authors of the two schema documents are different people, and that the modular design of XHTML represents good practice, not bad practice, and must be supported.
A straw poll showed majority support (12 to 4 to 5, with one abstention) for requiring that the two schema documents involved in a modifying-include relationship define the same namespace.
The WG then discussed what to call the resulting construct; proposals included include, patch, update, override, occlude, redefine, pollute, revise, and alter. The top three (alter, redefine, revise) were considered, and the WG eventually decided for redefine.
Resolved unanimously: to dispose of LC-128 by adding a 'redefine' element which can take type derivations as its children as originally proposed by HST, with the proviso that each type must be derived in this pseudo-circular way and the namespaces have to be the same. This includes attribute groups and model groups.
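The behaviour agreed in this discussion (the new T is derived from the old T, the old T survives only as an anonymous type, and both documents must define the same namespace) can be modelled in a few lines. This is a sketch under stated assumptions: the `TypeDef` record, the `redefine` function, the flat tuple of field names and the `#anonymous-` key are all invented for illustration, and only derivation by extension is modelled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class TypeDef:
    namespace: str
    fields: Tuple[str, ...]
    base: Optional[str] = None  # key of the base type in the type table


def redefine(types: Dict[str, TypeDef], name: str, namespace: str,
             extra_fields: Iterable[str]) -> Dict[str, TypeDef]:
    """Replace `name` in place by an extension of its previous definition."""
    old = types[name]
    if old.namespace != namespace:
        raise ValueError(
            "redefine requires both schema documents to define the same namespace"
        )
    # The old definition stays in the lattice under a key nothing refers to.
    hidden = f"#anonymous-{len(types)}"
    types[hidden] = old
    types[name] = TypeDef(namespace, old.fields + tuple(extra_fields), base=hidden)
    return types
```

Since other components refer to the type by name, every reference to personName picks up the extended definition after the call, which mirrors the 'retrospective' effect described in the original proposal.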
Various editorial suggestions.
Input from Susan Lesch <lesch@w3.org>:
Susan Lesch <lesch@w3.org> to XML Schema Comments list on Fri, 12 May 2000 03:15:32 -0800
These are a few possible minor typos in your Last Call draft, XML Schema Part 2: Datatypes "work in progress." A section number is followed by a quote and then a suggestion.
In section 3, there are 17 occurrences of double spaces under various "Derived datatypes," for example, in 3.2.1.2. You could remove the 's or put the pair of terms on one line. <a href="#dt-built-in">built-in</a> <a href="#dt-derived">derived</a>
Status of this document - last par. recommendation [twice] ==> Recommendation
2.1 c) a set of facets [You might link to facet's definition the first time it is mentioned.]
2.4.1.4 par. 2 cardinality, there are ==> cardinality; there are [or] ==> cardinality. There are
2.4.1.12 par. 2 octect ==> octet
3.2 par. 1 derived from this the datatype ==> derived from this datatype
3.2.7 par. 2 ocurrence ==> occurrence
3.3.4.2 has an empty list item.
3.3.5 par. 1 fininte-length ==> finite-length
3.3.6 The value space of Name the set ==> The value space of Name is the set
3.3.8.1 a lexical consisting of ==> a lexical representation consisting of
3.3.15 This results in is the ==> This results in the
3.3.24.1, 3.3.25.1, 3.3.26.1, and 3.3.27.1 and an preceding "-" ==> and a preceding "-"
4. par. 1 present, optional ==> present; optional
4.2.12 note associated encoding. ==> associated with encoding.
5.1 table {variety} or any of its any ancestor [Sorry I didn't understand that phrase.]
5.1.2 par. 1 dataype ==> datatype
5.2 par. 1 specificing [Not sure. Do you mean specifying?]
5.2.10 example and only allow whole cents. ==> and only allows whole cents. [assuming "application" is the subject of the sentence]
5.2.12 par. 1 is a encoding ==> is an encoding
5.2.13 example Is "HMO" (Health Maintenance Organization) an internationally understood acronym? If not you might say "health insurance company."
A. and B. line 6 need closing parentheses followed by an ending period after "just in case". To be a little more formal, you could end the sentence after "entity expansions)."
A. lines 28 and 30 builtin ==> built-in
B. line 26 Customisation ==> Customization
E. note after table 3 it it logicall ==> it is logically
E.1 par. 11 chacters ==> characters
E.1.1 par. 4 and par. 8 compliment ==> complement
E.1.1 par. 8 Suppliment ==> Supplement
[This issue has been split. See the cross-references below for the location of the various parts.]
Cf. Make schema for schemas open?
Input from Roger L. Costello <costello@mitre.org>:
"Roger L. Costello" <costello@mitre.org> to XML Schema Comments list on Fri, 12 May 2000 08:51:46 -0400
The below comments summarize the points made in our white paper: <http://www.xfront.org/EvolvableSchemas.html>
Please read the white paper to obtain the full context of the comments.
How do I specify that the elements in a group can occur in any order? Should XML Schema allow this not just at the top level but at any level?
Cf. Allow arbitrary order with occurrence > 1?
Input from Dan Rupe <Dan_Rupe@go.com>:
Dan Rupe <Dan_Rupe@go.com> to XML Schema Comments list on Fri, 12 May 2000 08:21:18 -0700 (PDT)
I would like to be able to specify the opposite of <sequence> (unsequential) within complexTypes and groups.
Consider the following example in a schema:
    <complexType name="A_complexType">
      <element name="B" type="int" />
      <element name="C" type="int" />
    </complexType>
    <element name="A_complex" type="A_complexType" />
In my instance document, I would simply like <C> to be able to come before <B>, for example.
Or, in this example schema snippet:
    <complexType name="A_complexType">
      <element name="B" type="int" minOccurs="0" maxOccurs="unbounded"/>
      <element name="C" type="int" minOccurs="0" maxOccurs="unbounded"/>
    </complexType>
    <element name="A_complex" type="A_complexType" />
I would like to be able to specify that any number of <B>'s and <C>'s can appear any number of times and in any order. Like this:
    <C>1</C>
    <C>2</C>
    <B>3</B>
    <B>4</B>
    <C>5</C>
    <B>6</B>
    <C>7</C>
    etc...
It's my understanding that the order of elements declared within a complexType in the schema must appear in that same order in the instance document. (Of course a minOccurs with a value of zero can make an element optional, but following elements must still appear sequentially in the instance document.) The Primer also mentions that a named group is <sequence> by default.
It seems the <all> element may be an attempt to provide unsequential capabilities, but I'm not sure if this is what <all> really provides. (I can't really tell for sure because XML Spy has not implemented it yet. Are there other validators out there that have?) If it does, its use seems limited since it can only be used at the top-level of any content model, and its children must be simple elements, not contained groups.
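The difference between the sequential reading and the requested any-order reading of the second snippet can be shown with two small checks over a list of child element names. Both function names are assumptions for illustration; the first models "any number of B's followed by any number of C's", the second models "B's and C's in any order".

```python
def matches_sequence(children, names=("B", "C")):
    """True if the children appear in declaration order (B's, then C's)."""
    position = 0
    for child in children:
        if child not in names:
            return False
        index = names.index(child)
        if index < position:
            return False  # an earlier-declared element appeared too late
        position = index
    return True


def matches_any_order(children, names=("B", "C")):
    """True if every child is one of the declared elements, in any order."""
    return all(child in names for child in children)
```

The instance shown above (C, C, B, B, C, B, C) fails the sequential check and passes the any-order check, which corresponds to a repeated choice between B and C rather than a sequence.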
Since <sequence> is the default behavior of complex types and groups, I suggest one of two possible scenarios:
Should the XML Schema spec specify URIs which can/should be used for references to the built-in datatypes and facets? (Even in the absence of a generally satisfactory way to construct a URI for arbitrary schema constructs.)
Input from Ralph R. Swick <swick@w3.org>:
"Ralph R. Swick" <swick@w3.org> to XML Schema Comments list on Fri, 12 May 2000 13:01:09 -0400
The comments in Schema Structures, Section 4.2.1, regarding the use of xpointer to identify schema components do not go quite far enough to permit broad reuse of important features defined by XML Schema. http://www.w3.org/TR/2000/WD-xmlschema-1-20000407/#section-References-to-Schema-Components-from-Elsewhere
Specifically, after lengthy discussion and deliberation the RDF Schema WG was persuaded to defer all base datatyping specifications until after an XML Schema specification was completed so as to be able to leverage any base datatypes from XML.
It will be a significant benefit to the wider XML application community if canonical URIs were defined for each of the built-in datatypes as well as for each of the specified facets.
This request is certainly a subset of the broader request for an algorithm to derive a full URI for every Schema component from the namespace URI and the local name (and, perhaps the type or symbol space). I want to emphasize that even if the general case remains unaddressed in version 1, it is important and useful to address the specific case of the built-in datatypes. This partially addresses bullet 3.9 of http://www.w3.org/TR/1999/NOTE-schema-arch-19991007.
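As an editorial illustration (not part of the posting), one possible derivation rule of the kind mentioned here is "namespace URI, then '#', then local name". The sketch below assumes that rule purely for illustration; the function name `component_uri` is hypothetical, and the namespace URI is the one used by the draft-era schema documents quoted elsewhere in this list.

```python
from urllib.parse import urldefrag


def component_uri(namespace_uri, local_name):
    """Hypothetical rule: namespace URI + '#' + local name.

    Any fragment already present on the namespace URI is dropped first,
    so that the result carries exactly one fragment identifier.
    """
    base, _fragment = urldefrag(namespace_uri)
    return f"{base}#{local_name}"


# Namespace URI of the 1999/2000 drafts (an assumption about which
# namespace a final specification would use).
XSD_DRAFT_NS = "http://www.w3.org/1999/XMLSchema"

integer_uri = component_uri(XSD_DRAFT_NS, "integer")  # a built-in datatype
length_uri = component_uri(XSD_DRAFT_NS, "length")    # a facet
```

This rule does not distinguish symbol spaces, so a datatype and a facet with the same local name would collide; the posting's parenthetical about "the type or symbol space" points at exactly that gap.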
Input from Perry A. Caro <caro@Adobe.COM>:
"Perry A. Caro" <caro@Adobe.COM> to XML Schema Comments list on Fri, 12 May 2000 12:12:08 -0700
It will be a significant benefit to the wider XML application community if canonical URIs were defined for each of the built-in datatypes as well as for each of the specified facets.
Hear, hear! We emphatically agree with this request. Lack of reliable URI "id's" for the built-in datatypes and specified facets would be a huge obstacle to wide adoption and interoperability. This cannot be overstated.
It will be hard enough to implement compliant, interoperable, real-world processors--human error and the variety of character and URI encodings being what they are--without also having to worry about how to disambiguate the names of these vital items. That being the case, also consider:
To be sure, these last two items are mere wishes compared to the overriding requirement for canonical URI's of any shape or form to be part of the normative matter of the spec.
Input from ht@cogsci.ed.ac.uk (Henry S. Thompson):
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list on 13 May 2000 10:24:58 +0100
In principle, which would you prefer:
Discussed at Edinburgh ftf.
There is some lack of clarity on what should be pointed at in the case of facets. The has-facet in the appinfo on the particular types? Or the element declaration defining a particular facet?
ACTION: HST is respondent for 133. Check with Ralph about what he meant.
Do we have a preference? For facets, most prefer pointing at the element declaration defining a particular facet. There is a preference for pointing to declarations in the schema for schemas on the grounds that it is more likely to be useful, but a strong minority prefers pointing to the specification text instead, as pointing at the schema for schemas sets a precedent for the future that may be at odds with our long-term direction.
Resolved: to resolve this issue by supplying IDs and stipulating what the URI should be. Dissenting: Mendelsohn (against putting ids on facet elements in the schema for schemas because (1) this suggests that putting ids on declarations is the right way to name abstractions, and (2) if it were, then I think an id on an element declaration should be for that element, not for the abstraction defined by that element), Beech, Thompson, Gudgin, Ezell. Abstaining: Shannon.
It was pointed out that it would be useful to have a docinfo backpointer to the relevant parts of the specification in the schema for schemas. QUESTION: Shall we instruct the editors that the schema for schemas should point to the relevant part of text using an annotation? Preference (11) but not binding majority.
The rules for specifying whether or not namespace prefixes appear in document instances on elements declared local to some complex type appear to change the rules for processing default namespace names in very profound ways. Should this be clarified, or fixed?
Cf. Local declarations: less or more
Input from Ralph R. Swick <swick@w3.org>:
"Ralph R. Swick" <swick@w3.org> to XML Schema Comments list on Fri, 12 May 2000 16:30:53 -0400
The control over namespace qualification provided by form and elementFormDefault appear, if I understand them correctly, to change default namespace processing in a very fundamental way. http://www.w3.org/TR/2000/WD-xmlschema-1-20000407/#declare-element
It appears that the default case, 'elementFormDefault="unqualified"', would change what a processor currently understands to be the namespace of some unqualified elements within the scope of some parents. To understand which unqualified elements are bound to which namespaces requires schema processing.
This seems to make default namespace declarations unusable (or, conversely, to make schema processing a requirement rather than an option). Neither conclusion feels good to me. I hope I'm fundamentally misunderstanding Schema Structures section 4.3.2 and that the statement in the final paragraph of Schema Primer section 3.1 is inaccurate. http://www.w3.org/TR/2000/WD-xmlschema-0-20000407/#UnqualLocals
Input from Noah_Mendelsohn@lotus.com:
Noah_Mendelsohn@lotus.com to XML Schema Comments list on Fri, 12 May 2000 17:16:15 -0400
I understand your confusion, but I think you have misinterpreted the schema specification. The namespace qualification or lack thereof for any element in a document to be validated is established according to the rules of the Namespaces Recommendation, and does not change during the course of schema validation. Indeed, validation is defined as an operation on an infoset, so all the mechanisms of namespace defaults and so on are applied before schema validation begins.
I think your confusion is about the case where we have an instance document that looks like this:
<n:a xmlns:n="uria">
  <!-- element b is neither prefixed nor qualified -->
  <b>5</b>
</n:a>
First of all, neither XML 1.0 nor namespaces itself introduces any notion of local scoping for elements. Therefore, it is not surprising that we cannot tell whether the declaration for <b> is locally scoped. Whether you use qualified or unqualified forms, local scoping is always an artifact of schema processing, never fundamental to the instance.
Furthermore, and I think this is your point of confusion, even if the declaration for <b> is locally scoped within the declaration for <n:a>, <b> itself is still not considered to be in a namespace. We cannot change that; the namespaces recommendation says it is not. So, the namespace URI in the infoset for the element (or lack of namespace URI in this case) is not changed by validation. What does happen as a result of the validation is that additional information is added to the infoset, and from that information you can make some determination as to the actual declaration for <b>. So, it is not <b> itself which appears in a namespace, but <b> is associated during validation with definitions and declarations which may well be in the target namespace "uria".
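As an editorial illustration (not part of the posting), the namespace assignments described here can be observed with a namespace-aware parser before any schema is consulted. The sketch below uses Python's standard `xml.etree.ElementTree`, which is our choice for the illustration. The first instance follows the example above with the prefix `n` declared; the second, with a default namespace declaration, is an added contrast case.

```python
import xml.etree.ElementTree as ET

# Prefixed parent, unprefixed child: the child is in no namespace.
prefixed = ET.fromstring('<n:a xmlns:n="uria"><b>5</b></n:a>')

# Default namespace declared on the parent: the unprefixed child is in
# that namespace, exactly as the Namespaces Recommendation prescribes.
defaulted = ET.fromstring('<a xmlns="uria"><b>5</b></a>')

# ElementTree spells a namespace-qualified name as "{uri}local".
prefixed_tags = (prefixed.tag, prefixed[0].tag)
defaulted_tags = (defaulted.tag, defaulted[0].tag)
```

In both cases the names are fixed by parsing alone; whether `b` is matched against a local or a top-level declaration is a separate question that only schema processing answers.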
Note too that there is a direct analogy to the treatment of attributes per the namespaces recommendation: the typical attribute appears unqualified, but has a definition scoped to a possibly namespace qualified element on which it appears (you know it is scoped that way because two attributes named 'atr' appearing on two different elements in the same namespace can have two different default values. I don't think anyone disputes this. The namespaces recommendation makes clear that such unprefixed attributes are not specifically in the namespace of the element on which they appear, although a non-normative concept of namespace partition is discussed. Indeed, this analogy is often cited by proponents of 'elementFormDefault=qualified'.)
Now, having prepared this glib explanation, I will admit to one point of discomfort on my part. I had expected that we had put into the augmented infoset a contribution indicating the element declaration for <b>, from which you could surely determine its local scoping. I am a little surprised to see that we only supply the type, which in the example above may be something as simple as integer.
Note that this entire discussion has a direct analogy in the case where 'elementFormDefault="qualified"'. You still do not know whether the element is locally scoped until you check the schema, and namespaces are still assigned exactly according to the rules of the namespaces recommendation.
So, I think the fundamental design is sound. I would like to hear from other members of the schema workgroup whether we actually have the infoset contribution quite right. Henry?
Input from Noah_Mendelsohn@lotus.com:
Noah_Mendelsohn@lotus.com to XML Schema Comments list on Fri, 12 May 2000 17:27:59 -0400
Indeed, this analogy is often cited by proponents of 'elementFormDefault=qualified'.
Ooops. Of course I meant:
Indeed, this analogy is often cited by proponents of 'elementFormDefault=unqualified'.
Formal response to commentator sent 25 May and 29 June; commentator has not responded.
Should XML Schema provide support for typed symbolic constants?
Input from Steven Goldfarb <Steven.Goldfarb@cern.ch>:
Steven Goldfarb <Steven.Goldfarb@cern.ch> to XML Schema Comments list on Fri, 12 May 2000 14:17:02 -0700
We would like to request a modification to the current working draft of the XML Schema, Working Draft 7 April 2000. Specifically, we are interested in the implementation of a mechanism for the usage of symbolic constants in XML Schema.
Efforts have recently begun in the High Energy Physics community to use XML to describe the geometry of our detectors. Several languages have already been developed toward this aim and we have recently begun work toward merging our efforts into a standard.
Our geometrical description of a detector involves the construction of a complex structure from simple components in an iterative manner. We create elementary solids and position instances of them in space. These actions require the entry of explicit values for the dimensions of the solids and the coordinates of the positions. In our current model, we store these values as attributes. This involves the entry of thousands of values to describe these complex detectors, making it essential to avoid data repetition.
Ideally, we would like to be able to define a symbolic constant in the XML implementation which could be referenced throughout the document. The constant therefore needs to have a type, a name, a value and possibly a unit. In our case, we would include the constant as an attribute or as element content. Most likely, this implies the need for a mechanism to differentiate between a symbol and plain text when referencing.
To give an example for the XML implementation, we would like to be able to describe a piece of geometry like:
<constant name="chamber_width" value="105.254" unit="mm" />
<constant name="chamber_length" value="210.508" unit="mm" />
<box name="Upper Chamber" X="$chamber_width" Y="$chamber_length" Z="32" material="Lead" />
<box name="Lower Chamber" X="$chamber_width" Y="$chamber_length" Z="14" material="Copper" />
In this example, the parser would replace the name of the symbol with the value of the symbol. It is important that this also support type checking, i.e. in the example above, the attributes X, Y and Z should all be defined as 'double'.
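As an editorial illustration (not part of the posting), the requested substitution can be approximated today in application code. The sketch below is a minimal Python version: the `<geometry>` wrapper element and the function name `resolve_dimensions` are hypothetical, units are ignored, and `float()` serves only as a rough stand-in for checking against the 'double' type.

```python
import xml.etree.ElementTree as ET

# The <geometry> wrapper is hypothetical; the posting shows only the children.
SOURCE = """<geometry>
  <constant name="chamber_width" value="105.254" unit="mm"/>
  <constant name="chamber_length" value="210.508" unit="mm"/>
  <box name="Upper Chamber" X="$chamber_width" Y="$chamber_length" Z="32" material="Lead"/>
  <box name="Lower Chamber" X="$chamber_width" Y="$chamber_length" Z="14" material="Copper"/>
</geometry>"""


def resolve_dimensions(root):
    """Replace $name references in X, Y and Z with the constant's value.

    Values that float() cannot parse raise ValueError, and references to
    undefined constants raise KeyError.
    """
    constants = {c.get("name"): float(c.get("value")) for c in root.iter("constant")}
    resolved = {}
    for box in root.iter("box"):
        dims = {}
        for axis in ("X", "Y", "Z"):
            raw = box.get(axis)
            # A leading "$" marks a symbol reference rather than a literal.
            dims[axis] = constants[raw[1:]] if raw.startswith("$") else float(raw)
        resolved[box.get("name")] = dims
    return resolved


boxes = resolve_dimensions(ET.fromstring(SOURCE))
```

The "$" prefix is the posting's own convention for telling a symbol from plain text; a schema-level mechanism would need an equivalent marker.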
We realise that this feature is not implemented for XML and we hope that this will be revised in the future. However, XML schema could benefit from this by facilitating the creation of default values. In the current XML Schema working draft, the default values must be typed explicitly, and cannot be a reference to a previously defined constant.
During the course of our discussions with other experts we found that the 'Express' language contains a similar mechanism to handle symbolic constants, as well as being able to use arithmetic expressions. Clearly, this community would also benefit from our request.
Discussed in call of 2000-06-29.
Several positions could be distinguished (not mutually exclusive):
A straw poll showed little support for adding the constructs, and substantial support for the view that this is not functionality essential for XML Schema 1.0. The various workarounds had varying degrees of support: most for entities, a lot for union construction, and some for identity constraints.
RESOLVED: to dispose of issue LC-136 by saying no, mentioning the various partial or possible workarounds suggested, and saying this does not seem to be functionality essential to version 1.0.
How do I represent, in a schema, the equivalent of the DTD notation <!ELEMENT P (#PCDATA|a|b|c)*>?
Input from nchen <nchen@webMethods.com>:
"Ninggang Chen" <nchen@webMethods.com> to XML Schema Comments list on Fri, 12 May 2000 17:38:17 -0400
In the schema spec, it says "1.2.4 If the {content type} is element-only or mixed, the sequence of the element information item's element information item [children], if any, taken in order, is schema-valid with respect to the {content type}'s particle, as defined in Element Sequence Valid (Particle) (§3.8)" ( http://www.w3.org/TR/xmlschema-1/#Complex_Type_Definition_details).
So how do we represent <!ELEMENT P (#PCDATA|a|b|c)> in schema?! The order and number of child elements are not constrained in XML 1.0, and therefore many industry users have already used this model in their documents. How can they easily migrate these documents to schema? Or should we suggest that they use DTDs for old documents but schema when creating new documents?
Input from Dan Connolly <connolly@w3.org>:
Dan Connolly <connolly@w3.org> to XML Schema Comments list on Fri, 12 May 2000 17:25:03 -0500
nchen wrote: ... So how do we represent <!ELEMENT P (#PCDATA|a|b|c)> in schema?! The order and number of child elements are not constrained in XML 1.0
I believe you are mistaken. That syntax is not legal in XML 1.0. There is a similar syntax that has the property that the order and number of children are not constrained:
<!ELEMENT P (#PCDATA|a|b|c)*>
(note the *).
This has a straightforward analog in XML Schema:
<element name='P'>
  <complexType content='mixed'>
    <choice minOccurs='0' maxOccurs='unbounded'>
      <element ref='t:a'/>
      <element ref='t:b'/>
      <element ref='t:c'/>
    </choice>
  </complexType>
</element>
[This issue has been split. See the cross-references below for pointers to the locations of the parts.]
Cf. Add PSV infoset properties for keyref info?
Cf. Default equivclass blocking?
Cf. How to deal with nested imports?
Cf. An API needed for the PSV info set?
Cf. Add items to PSV infoset for schemas?
Input from Jim Trezzo <jtrezzo@us.oracle.com>:
Jim Trezzo <jtrezzo@us.oracle.com> to XML Schema Comments list on Fri, 12 May 2000 16:03:28 -0700
Oracle Comments on XML Schema Last Call:
Here are the comments we have collected within Oracle.
David Beech and Jim Trezzo
[This issue has been split. See the cross references below for pointers to the locations of the parts.]
Cf. Drop xsi:type?
Cf. Drop xsi:null?
Cf. Align simple and complex types more fully?
Input from Philip Wadler <wadler@research.bell-labs.com>:
Philip Wadler <wadler@research.bell-labs.com> to XML Schema Comments list on Fri, 12 May 2000 19:08:51 -0400
Simplifying XML Schema
The current Schema proposal is complex. Programmers have shown a remarkable ability to put up with complexity, but we do not yet know whether the XML community will be so forgiving. We would like to suggest that it is possible to greatly simplify XML Schema, while not unduly limiting its power. Indeed, some of the suggestions below would both simplify Schema and extend its power at the same time.
We are also asking the XML Query working group to support changes along these lines, but in writing this letter we are not acting as representatives of XML Query.
Yours sincerely,
Philip Wadler, Lucent
Jerome Simeon, Lucent
Mary Fernandez, AT&T
Discussed and resolved at the face to face meeting 1-2 August 2000.
On the table were the MSL proposal from Matthew Fuchs and a proposal for modularizing the XML Schema language, from Rick Jelliffe.
In discussion, some WG members suggested that a large part of what is perceived as complexity in the design is complexity of the documentation: implementors appear not to be having trouble implementing the language, or understanding from the document what implementations need to do. End-users and prospective schema authors, on the other hand, are reporting trouble reading the spec.
The language might be modularized, suggested some, by defining validation and validation-related conformance without speaking about the schema-document to abstract-component mapping; some implementors report that their conversion code for reading schema documents and building components is much larger than their validation code. Such modularization would also address the desires of those who would like to write DTDs in an XML instance syntax without all the other features of XML Schema. Modularization of this kind, however, would take several FTE months of work. Some WG members argued against Jelliffe's modularization proposal on the grounds that it proposes to change the language from the ground up, and it is now much more important to get the spec out and get user feedback instead of sitting in a room and thinking about it.
Several WG members spoke in favor of MSL as providing the beginnings of a formal model which accounts for all of XML Schema. Some objected to its elimination (or apparent elimination, near elimination, or apparent near elimination) of the tag/type distinction, but confirmed that its formalism seemed more accessible to implementors in their organizations. This surprised some WG members, who said that the coders in their shops were not, in general, happy to see specifications formalized in first-order predicate calculus. Some WG members suggested that the apparent simplicity of MSL resulted from the fact that it does not, as presented, actually cover the entire design; if the necessary missing details are added, readers could find MSL just as complex as the current draft.
Several WG members said they would like to rewrite large portions of the spec, but would not like to change the design and would prefer to make the editorial changes during CR rather than postponing CR.
The WG agreed to make it an exit criterion for leaving the Candidate Recommendation phase that there be a formalization of XML Schema which could accompany or be included in the specification. Editorial work on making the prose clearer will also proceed during the CR period.
Various comments from the point of view of the X3D Consortium.
Cf. Allow specification of size constraints in instance?
Input from Don Brutzman <brutzman@nps.navy.mil>:
Don Brutzman <brutzman@nps.navy.mil> to XML Schema Comments list on Fri, 12 May 2000 20:35:59 -0700
The following comments are based on document review and known DTD shortfalls. We have only done preliminary examinations of schema instantiations for the X3D tagset. Summary: looks good.
References:
1. The Extensible 3D (X3D) specification makes heavy use of float, double and integer lists. The list support for float/double/integer appears useful and usable.
2. X3D lists of floats/doubles/integers are often lists of 2-tuples, 3-tuples or 4-tuples. Such typing is commonplace for 3D graphics (e.g. translations are 3-tuples, orientations are axis-angle 4-tuples). Regular-expression patterns will let us express these relationships (hopefully without redefining the numeric base types, not yet sure). No draft schema appears in the current SVG draft - pertinent examples are welcome.
A helpful facet might be to specify the tuple-ordinality of a list type, so that only appropriate multiples of the typed data are allowed. Please be aware that wrapping such X3D tuples in their own type tags has been considered, but is impractical due to unnecessary redundancy and the extremely large volumes of numeric data involved in many scenes.
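As an editorial illustration (not part of the posting), the proposed tuple-ordinality constraint amounts to requiring that the number of items in a list be a multiple of the tuple size. A minimal Python sketch, with the hypothetical helper name `group_tuples` and whitespace-separated floats assumed as the lexical form:

```python
def group_tuples(lexical, size):
    """Split a whitespace-separated list of numbers into tuples of `size`.

    Raises ValueError if the item count is not a multiple of `size`,
    which is the condition a tuple-ordinality facet would enforce.
    """
    values = [float(token) for token in lexical.split()]
    if len(values) % size != 0:
        raise ValueError(
            f"{len(values)} values cannot be grouped into {size}-tuples"
        )
    return [tuple(values[i:i + size]) for i in range(0, len(values), size)]


# Two 3-tuples, e.g. two translations.
translations = group_tuples("0 0 1 1 0 0", 3)
```

The same check could be expressed as a regular-expression pattern on the lexical form, as the posting suggests, at the cost of repeating the number syntax once per tuple position.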
...
4. Thanks for this excellent work. If further issues are identified during eventual production of the X3D schema, we'll report them back.
An example of a needed kind of extension / extensibility.
[N.B. This issue number has been opportunistically recycled.]
Input from Don Brutzman <brutzman@nps.navy.mil>:
Don Brutzman <brutzman@nps.navy.mil> to XML Schema Comments list on Fri, 12 May 2000 20:35:59 -0700
3. A further feature X3D might use is the ability to identify & evaluate further numerical constraints to be placed on data. For example, Normal (i.e. perpendicular vector) 3-tuples should have unit magnitude. It appears language-specific data validation will be needed to check such cases, probably corresponding to named simpleType or complexType definitions. So Schema capabilities at least appear sufficient to enable such checking independently, even if not supporting it directly.
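As an editorial illustration (not part of the posting), the unit-magnitude constraint on Normal 3-tuples is the kind of check that would live in application code layered over schema validation. A minimal Python sketch; the function name `is_unit_normal` and the tolerance value are assumptions:

```python
import math


def is_unit_normal(lexical, tolerance=1e-6):
    """Return True if the three whitespace-separated numbers form a unit vector.

    The comparison uses an absolute tolerance because decimal lexical
    forms rarely yield a magnitude of exactly 1.0 in floating point.
    """
    x, y, z = (float(token) for token in lexical.split())
    magnitude = math.sqrt(x * x + y * y + z * z)
    return math.isclose(magnitude, 1.0, abs_tol=tolerance)
```

A schema can still constrain the value to be a list of exactly three numbers; only the relationship among the three values falls outside what the datatype facets express.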
4. Thanks for this excellent work. If further issues are identified during eventual production of the X3D schema, we'll report them back.
Can ref attributes refer to constructs not at the top level?
Cf. May components not at the top level be named?
Input from gmacri@libero.it <gmacri@libero.it>:
"gmacri@libero.it"<gmacri@libero.it> to XML Schema Comments list on Sun, 14 May 2000 16:11:28 +0200
When I write an XML schema such as the following:
<schema xmlns="http://www.w3.org/1999/XMLSchema"
        targetNamespace="http://www.somewhere.org/BookCatalogue"
        xmlns:cat="http://www.somewhere.org/BookCatalogue">
  <element name="BookCatalogue">
    <type>
      <element name="Book" minOccurs="0" maxOccurs="*">
        <type>
          <group ref="BookElements"/>
          <attribute name="Category" minOccurs="1">
            <datatype source="string">
              <enumeration value="autobiography"/>
              <enumeration value="non-fiction"/>
              <enumeration value="fiction"/>
            </datatype>
          </attribute>
          <attribute name="InStock" type="boolean" default="false"/>
          <attribute name="Reviewer" type="string" default=""/>
        </type>
      </element>
    </type>
  </element>
  <group name="BookElements" order="seq">
    <element name="Title" type="string"/>
    <element name="Author" type="string"/>
    <element name="Date" type="string"/>
    <element name="ISBN" type="string"/>
    <element name="Publisher" type="string"/>
  </group>
</schema>
must the group's "ref" attribute always refer to a group that is a (top-level) child of schema?
Input from ht@cogsci.ed.ac.uk (Henry S. Thompson):
ht@cogsci.ed.ac.uk (Henry S. Thompson) to XML Schema Comments list on 14 May 2000 20:57:40 +0100
"gmacri@libero.it"<gmacri@libero.it> writes:
must the group's "ref" attribute always refer to a group that is a (top-level) child of schema?
Yes.
In general (except for local element and attribute declarations, that is), named things must occur at the top level.
Without exception, only things named at the top level can be 'ref'ed.
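As an editorial illustration (not part of the posting), the rule that only top-level names can be referenced suggests a simple consistency check over a schema document. The Python sketch below uses a cut-down schema in the older draft syntax of the question above, with a deliberately dangling reference added. The function name is hypothetical, and namespace prefixes on `ref` values are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace of the draft-era schema documents, in ElementTree notation.
XSD = "{http://www.w3.org/1999/XMLSchema}"

SCHEMA = """<schema xmlns="http://www.w3.org/1999/XMLSchema">
  <element name="BookCatalogue">
    <type>
      <group ref="BookElements"/>
      <group ref="Missing"/>
    </type>
  </element>
  <group name="BookElements" order="seq">
    <element name="Title" type="string"/>
  </group>
</schema>"""


def unresolved_group_refs(schema_root):
    """List group references that match no named group at the top level."""
    top_level = {
        child.get("name")
        for child in schema_root
        if child.tag == XSD + "group" and child.get("name")
    }
    refs = [g.get("ref") for g in schema_root.iter(XSD + "group") if g.get("ref")]
    return [ref for ref in refs if ref not in top_level]


unresolved = unresolved_group_refs(ET.fromstring(SCHEMA))
```

Only direct children of the schema element contribute names, so a named group nested inside another construct would not satisfy a reference.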
Should the spec be revised to provide clearer guidance on recommended methods for extending the schema for schemas and adding new constructs to satisfy requirements not covered by the spec? In particular, should the behavior of a conforming processor in the presence of attributes and elements from outside the schema namespace be described?
Cf. Using appinfo annotations to store integrity constraints
Cf. XML Schema considered inadequately extensible
Input from Jane Hunter:
2. Extensibility Issue
MPEG-7 requires clarification of:
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.2.6 Annotation Components
While it certainly makes sense that the Schema spec would not define the annotation mechanism, I would have hoped that the DTD design (and the Schema for Schemas) would make it easy to extend. Perhaps rather than (from part2.dtd):
<!ELEMENT %annotation; (%appinfo; | %documentation;)*>
this could be declared as:
<!ENTITY % annotation.content "(%appinfo; | %documentation;)*">
<!ELEMENT %annotation; %annotation.content; >
although I'm not sure how I'd modify the DTD or Schema for Schemas if I wanted to, say, insert a <xhtml:div> here.
Should XML Schema be modified to allow the specification of size constraints (e.g. for a series of elements representing an array, or for a list of tokens representing an array)? If so, should the ability to specify size constraints in the instance always be available, or be available only if the schema author calls for it?
Cf. X3D-related comments on Schema datatypes
Input from Jane Hunter:
3. Parameterization of Array and Matrix Sizes
We would like to define the size of lists (arrays and matrices) at the time of instantiation. In the example below we suggest a valuePar construct which gives the name of the attribute whose value will be used for the facet. The attribute data type must match the facet data type. This example is currently problematic because facets only apply to simple types and attributes can only be added to complex types. The other issue is whether VectorI is being restricted or extended.
<simpleType name="listOfInteger" base="integer" derivedBy="list"/>
<complexType name="VectorI" base="listOfInteger" derivedBy="extension">
  <length valuePar="Size"/>
  <attribute name="Size" type="nonNegativeInteger" use="required" />
</complexType>
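As an editorial illustration (not part of the posting), the effect of the proposed valuePar construct is that the length of the list must equal the value of an attribute on the same element. A minimal Python sketch of that check on an instance; the sample instances and the function name `size_matches` are hypothetical:

```python
import xml.etree.ElementTree as ET


def size_matches(element, size_attr="Size"):
    """Return True if the number of list items equals the Size attribute.

    The list is taken to be the element's whitespace-separated text
    content, as for a type derived by list.
    """
    declared = int(element.get(size_attr))
    actual = len((element.text or "").split())
    return declared == actual


good = ET.fromstring('<VectorI Size="3">10 20 30</VectorI>')
bad = ET.fromstring('<VectorI Size="4">10 20 30</VectorI>')
```

The check needs both the attribute and the content at once, which is why it does not fit a model in which facets belong to simple types and attributes to complex types.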
Input from Don Brutzman <brutzman@nps.navy.mil>:
Don Brutzman <brutzman@nps.navy.mil> to XML Schema Comments list on Fri, 12 May 2000 20:35:59 -0700
2. X3D lists of floats/doubles/integers are often lists of 2-tuples, 3-tuples or 4-tuples. Such typing is commonplace for 3D graphics (e.g. translations are 3-tuples, orientations are axis-angle 4-tuples). Regular-expression patterns will let us express these relationships (hopefully without redefining the numeric base types, not yet sure). No draft schema appears in the current SVG draft - pertinent examples are welcome.
A helpful facet might be to specify the tuple-ordinality of a list type, so that only appropriate multiples of the typed data are allowed. Please be aware that wrapping such X3D tuples in their own type tags has been considered, but is impractical due to unnecessary redundancy and the extremely large volumes of numeric data involved in many scenes.
Discussed in call of 2000-06-29.
RESOLVED: to dispose of this issue by saying no (with rationale). Dissenting: Jelliffe (It's useful and should be included.) Abstaining: Connolly
How does a schema author use key constraints to specify that a value (which otherwise behaves like an SGML or XML ID) is restricted to pointing at one (or more) particular element type(s)?
Input from Jane Hunter:
4. Typed References
MPEG-7 requires 'typed references' or the ability to constrain IDs and IDREFs to particular elements:
<element name="SummaryDS">
  <complexType>.....</complexType>
</element>
<attribute name="SummaryDSRef" type="IDREF" refType="SummaryDS"/>
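As an editorial illustration (not part of the posting), a typed reference adds one condition to ordinary IDREF checking: the element that carries the referenced ID must have a particular name. The Python sketch below checks this on an instance. The sample document, the `id` attribute name and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: one good reference, one to the wrong element
# type, and one that points at no element at all.
INSTANCE = """<Description>
  <SummaryDS id="s1"/>
  <OtherDS id="o1"/>
  <Link SummaryDSRef="s1"/>
  <Link SummaryDSRef="o1"/>
  <Link SummaryDSRef="nope"/>
</Description>"""


def bad_typed_refs(root, ref_attr, target_tag, id_attr="id"):
    """Return the reference values that do not point at a `target_tag` element."""
    by_id = {
        el.get(id_attr): el for el in root.iter() if el.get(id_attr) is not None
    }
    bad = []
    for el in root.iter():
        ref = el.get(ref_attr)
        if ref is None:
            continue
        target = by_id.get(ref)
        if target is None or target.tag != target_tag:
            bad.append(ref)
    return bad


violations = bad_typed_refs(ET.fromstring(INSTANCE), "SummaryDSRef", "SummaryDS")
```

Plain ID/IDREF checking would reject only the last reference; the typed variant also rejects the reference to `o1`.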
Should XML Schema be modified to allow complex types to be derived from other types not only by restriction or extension but also by simultaneous restriction and extension?
Cf. Streamline restriction of content models?
Input from Jane Hunter:
5. Derivation Issues
...
5.2 Combined Restriction and Extension in One Step
The current XML Schema WD allows a new complex type to be derived from an existing one either by restriction or extension but not both at the same time. Nevertheless, it is quite common for a new type to both restrict and extend a base type. The constraint imposed by the current XML Schema WD means that such type derivation has to go through two steps: a restriction followed by an extension or vice versa. This will create a large number of dummy types. For example:
<complexType name="A">
  <element name="B" type="string" minOccurs="0" maxOccurs="unbounded"/>
  <element name="C" type="string"/>
</complexType>
We then define "D" which restricts B to a single occurrence and adds a new element E. We would like to do this in one step (not two):
<complexType name="D" base="A" derivedBy="both">
  <element name="B" type="string" minOccurs="1"/>
  <element name="E" type="integer"/>
</complexType>
5.3 Unambiguous derivation of nested elements
If we allow derivation by both restriction and extension in a single step then there needs to be a mechanism for specifying the exact path of element derivations. This is required to avoid ambiguity when there are anonymous embedded type definitions. For example:
<complexType name="A">
  <element name="B">
    <complexType>
      <element name="C" type="string" minOccurs="0" maxOccurs="unbounded"/>
    </complexType>
  </element>
  <element name="D" type="integer" />
</complexType>
<complexType name="E" base="A" derivedBy="both">
  <element name="C" type="string" minOccurs="1" />
  <element name="F" type="integer" />
</complexType>
Is "C" an extension (that is, a new element) or a restriction of "B.C"? If the latter, then it would be better to use a path specification such as B.C:
<complexType name="E" base="A" derivedBy="both">
  <element name="B.C" type="string" minOccurs="1"/>
  <element name="F" type="integer"/>
</complexType>
5.4 Derivation by Restriction/Extension using Derived Complex Type
We would like to be able to do the following kinds of derivation by restriction/extension based on an existing restricted/extended complex type:
<complexType name="A">
  <element name="B" type="BType"/>
</complexType>
<complexType name="restrictedB" base="BType" derivedBy="restriction">
  <!-- derivation by restriction -->
</complexType>
<complexType name="restrictedA" base="A" derivedBy="restriction">
  <element name="B" type="restrictedB"/>
</complexType>
<complexType name="extendedB" base="BType" derivedBy="extension">
  <!-- derivation by extension -->
</complexType>
<complexType name="extendedA" base="A" derivedBy="extension">
  <element name="B" type="extendedB"/>
</complexType>
Discussed at Edinburgh ftf.
This has been raised within the WG before. The argument at the time: we have defined two type derivation mechanisms and may yet define more; we don't believe we want to define all logical combinations of these. Again it is a slippery slope. It is already tricky because one can extend complex types from simple types; we don't want to think about what the implications of that would be with restricting extensions in the mix. It is also the case that we have said that we want schema processors to check restriction steps to ensure that they are restrictions. Cannot imagine how to check both at once. Too hard. Don't go there.
Shall the XML transfer syntax for occurrence information for attributes and elements be made identical? (Cf. issue 222 attributeOcc: Change the specification of attribute occurrence? in the development-issues list.)
Cf. Clarify minOccur/maxOccur defaulting?
Input from Jane Hunter:
6. Inconsistencies in Occurrence Constraints
In the current working draft there are inconsistencies between the specification of occurrence constraints for elements and attributes, e.g. default="37" vs. use="default" value="37". It would be much better if the same attributes were used.
<element name="myElement" type="integer" minOccurs="0" maxOccurs="1" default="37" />
<attribute name="myAttribute" type="integer" use="default" value="37"/>
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
2.4 Problems with minoccurs and maxoccurs
A. The default for maxOccurs behaves counter-intuitively. When maxOccurs is not explicitly specified, it inherits the value of minOccurs (which defaults to 1 if not specified). This is confusing. For example, po.xsd in XML-Schema Part-0 (Primer) contains the declaration <xsd:element ref="comment" minOccurs="0"/>. This effectively prohibits comments in the instance document.
The XML Query Working Group suggests that Schema require that minOccurs and maxOccurs occur together or that Schema normatively adopt the default-rule mentioned in Appendix B of XML Schema Part-1: "maxOccurs defaults to 1 or minOccurs, whichever is greater".
B. The XML Query Working Group finds the different treatment of the properties minOccurs/maxOccurs, fixed, default, and value in the XML representation for element-declarations and for attribute-declarations confusing. The XML Query Working Group suggests using the same representation for element-declarations and attribute-declarations, and constraining the allowed values for minOccurs and maxOccurs in attribute-declarations to "0" or "1". This would allow queries such as:
"Select all attributes and elements that may occur at most once"
to be evaluated more efficiently.
C. There is an inconsistency between '*' and 'unbounded'. Primer uses "*" to mean Infinity; Data Type spec uses "*" in appendix B. Other places in the spec use "unbounded".
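Point A of the Query WG comment contrasts two defaulting rules for maxOccurs. The Python sketch below states both rules side by side. The function names are hypothetical, the "draft" rule is taken from the commentators' description rather than from the text of the spec, and only integer values are modelled (no "unbounded" or "*").

```python
from typing import Optional, Tuple


def occurs_draft(min_occurs: Optional[int] = None,
                 max_occurs: Optional[int] = None) -> Tuple[int, int]:
    """Rule as described in point A: an absent maxOccurs takes the value of minOccurs."""
    lo = 1 if min_occurs is None else min_occurs
    hi = lo if max_occurs is None else max_occurs
    return lo, hi


def occurs_appendix_b(min_occurs: Optional[int] = None,
                      max_occurs: Optional[int] = None) -> Tuple[int, int]:
    """Rule quoted from Appendix B: maxOccurs defaults to 1 or minOccurs, whichever is greater."""
    lo = 1 if min_occurs is None else min_occurs
    hi = max(1, lo) if max_occurs is None else max_occurs
    return lo, hi
```

Under these assumptions, `occurs_draft(min_occurs=0)` yields `(0, 0)`, which is the Primer case the commentators object to, while `occurs_appendix_b(min_occurs=0)` yields `(0, 1)`.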
Discussed in call of 2000-07-20.
The question is on a proposal to reverse our decision on development issue 222 attributeOcc: Change the specification of attribute occurrence?
On 2000-03-16, the WG first voted to suppress minOccurs and maxOccurs (dissenting: Box, Brown, Thompson. abstaining: Biron, Shannon), and then (without dissent) to choose the second proposal above.
On 2000-03-23, the WG declined to reconsider that decision, and there was no consensus in favor of changing the behavior of attributes to make them required by default.
The WG discussed the issue, and distinguished two changes proposed by the commentators: changing the 'use' attribute back to minOccurs and maxOccurs, and changing the name of the 'value' attribute to 'default'.
An initial straw poll showed support for at least the first change, but after further discussion the WG decided against the changes.
RESOLVED: to retain the 'use' attribute instead of reverting to minOccurs and maxOccurs. Dissenting: Biron, Olken, Thompson (by proxy).
Rationale: the arguments in favor of reverting to the earlier syntax are flawed: elements are not the same as attributes (as illustrated by the difference in range of legal values for both min and max, as well as the difference in defaults), and making the syntax suggest they are would be misleading. RDF's decision not to come to grips with the distinction, and to attempt to make attributes and child elements interchangeable, is (in the view of some WG members) one of the biggest mistakes in the design of RDF syntax, and this is not a mistake we should repeat or encourage.
On the second question, some members of the WG suggested that the use of the term 'default' for both default and fixed values might be mildly confusing in such close proximity to the sharp distinction being made between fixed and default values in the 'use' attribute. There was, in the end, not enough serious support for the suggested change.
RESOLVED unanimously: to close issue LC-153 part 2 with a polite no.
Should XML Schema be revised to make it possible to import specified names from other modules (rather than the entire set of names), and to specify in the definition of a module what names from it are legitimately used by other modules (are exported)?
Input from Jane Hunter:
7. Element-specific Importation and Exportation It would be useful to be able to specify the importation of specific individual elements, types, attributes or groups rather than only complete schemas.
We would also like to be able to specify which particular components (elements, types, attributes, groups etc.) of a schema are exportable.
Discussed in call of 2000-07-06.
The question is on MPEG-7's proposal that we support explicit control over name export, and explicit support for (selective) name import.
RESOLVED unanimously to dispose of LC-154 by saying that some WG members (at least) see utility in such a suggestion, and no one argued against it on principle, but that experience has made us feel such functionality is better omitted from 1.0. Our earlier design did have similar functionality but was widely criticized as needlessly complex in this area; even those of us who believe the functionality is important also agree that the mechanisms we have thus far thought of seem complicated. We hope that experience will suggest lighter-weight methods of achieving this function. We do foresee the possibility of compatible extensions later to provide this functionality.
It was noted that control of name export can be accomplished, within limits, by making only those things top-level which the module author wishes to have exported. Complex types and element declarations which are local to some complex type cannot be referred to from other modules, and are thus effectively hidden. We are not sure whether this completely satisfies all needs for control over the export of names, or not; we do believe it is useful in many circumstances. We do not see any way, in the current draft, to control import or inclusion of names; users of third-party modules will end up having all the top-level names in those modules visible.
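The observation that only top-level components are visible to other modules can be illustrated with a small Python sketch. The schema text, the component names, and the helper functions are invented for illustration, and the namespace URI is an assumption (it is the one used by the later Recommendation, not necessarily the one in the Last-Call draft).

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: "order" and "OrderType" are top-level,
# "internalNote" is declared locally inside the complex type.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="order" type="OrderType"/>
  <xsd:complexType name="OrderType">
    <xsd:sequence>
      <xsd:element name="internalNote" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def top_level_names(schema_text):
    """Names of components that are direct children of the schema element."""
    root = ET.fromstring(schema_text)
    return {child.get("name") for child in root if child.get("name")}


def all_declared_names(schema_text):
    """Names of every named component, at any depth."""
    root = ET.fromstring(schema_text)
    return {el.get("name") for el in root.iter() if el.get("name")}


visible = top_level_names(SCHEMA)
hidden = all_declared_names(SCHEMA) - visible
```

Here `visible` holds the two top-level names and `hidden` holds only `internalNote`, which other modules cannot refer to. The sketch says nothing about import: every name in `visible` is exposed to any module that imports or includes the schema.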
(The chair notes after the fact that perhaps this should be a priority feedback issue.)
Shall XML Schema be revised to allow schema authors to define complex types which allow unknown elements to be inserted anywhere within the type? (or: within the type or within any of its descendants?)
Cf. Make schema for schemas open?
Input from Roger L. Costello <costello@mitre.org>:
"Roger L. Costello" <costello@mitre.org> to XML Schema Comments list on Fri, 12 May 2000 08:51:46 -0400
[1] Please reinstate the capability to specify open content using an attribute (i.e., content = "open").
Rationale:
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.2.1.3 Complex Type Definition
The failure of complex type definitions to allow only appending extensions seems very limiting. In a DTD one might see a content model defined as:
( frontmatter, preface?, chapter*, backmatter ) |
It sounds as if it would be impossible to extend this to:
( frontmatter, preface?, introduction?, chapter*, backmatter ) |
Being able to extend by appending particles after backmatter is pretty useless in this context. It seems that this would require a schema author to resort to redefinition rather than extension, thus losing any advantages of type inheritance.
Discussed in meeting of 1-2 August 2000.
The proposal amounts to this:
content="mixed"
switch.ANY
inclusion
exception in an SGML DTD.
It was common ground that:
There were differences of opinion in the WG on whether the switch, or the globally-open style of declaration it makes easier, is a good idea or not.
A straw poll showed a large preponderance of opinion in favor of not going there.
Resolved: to dispose of LC-155 by respectfully declining the proposal. Dissenting: Buck, Costello, Hollander.
Shall the element types declared in the schema for schemas be declared open? If so,
Input from Roger L. Costello <costello@mitre.org>:
"Roger L. Costello" <costello@mitre.org> to XML Schema Comments list on Fri, 12 May 2000 08:51:46 -0400
[2] Please make the schema for schemas open.
Rationale:
Discussed at meeting of 1-2 August 2000.
The status quo is that on every major schema construct, there is an annotation element, which may contain arbitrary content; there are no content-model wildcards elsewhere in the schema for schemas. All elements in schema documents can have attributes from any other namespace.
Resolved: to dispose of LC-156 by politely declining to change the status quo. Dissenting: Buck, Costello, Hollander, Holstege, Olken. Abstaining: Corda, Ezell, Mendelsohn.
Shall the XML Schema spec be modified to describe clearly what behavior is expected of an application which is confronted with documents which conform not to the schema it expects but to a schema derived from that schema?
Input from Roger L. Costello <costello@mitre.org>:
"Roger L. Costello" <costello@mitre.org> to XML Schema Comments list on Fri, 12 May 2000 08:51:46 -0400
[3] Please define the expected behavior of an application configured to process (e.g., extract data from) documents conforming to schema 'X' when it receives documents conforming to schemas derived from schema 'X'.
Discussed at the face to face meeting of 1-2 August 2000.
If schema Y is derived from schema X, and processor P is hard-coded for schema X, what does P do when confronted with data conforming to schema Y?
Clarification: "schema Y is derived from schema X" means that schema X has derivations by extension or restriction.
Some WG members argued that since the spec doesn't actually talk about conformant application-processors, this question is out of scope. It is also not clear how to give it a formulation that crisply defines the relation of schemas X and Y. RLC withdrew the question; it may be discussed offline as a point of interest.
Should the functionality of XML Schema be divided, with the simpler part being normative, and the more complex part of the functionality being in a non-normative part of the spec?
Input from Roger L. Costello <costello@mitre.org>:
"Roger L. Costello" <costello@mitre.org> to XML Schema Comments list on Fri, 12 May 2000 08:51:46 -0400
[4] Simplify the schema by making it open and moving the more complex features to a non-binding portion of the schema spec. The resulting simplified version of the XML Schema spec can then gradually evolve to incorporate the more complex features (if the market dictates).
Discussed at Edinburgh ftf.
This issue can be folded into LC-143.
Should XML Schema specify that schema processors must make the results of their keyref checking available to downstream applications by adding it (on request, or by default) to the PSV infoset?
Input from Jim Trezzo <jtrezzo@us.oracle.com>:
Jim Trezzo <jtrezzo@us.oracle.com> to XML Schema Comments list on Fri, 12 May 2000 16:03:28 -0700
Part 1: Structures
1. Identity-constraint table
For reasons of performance, and avoidance of duplicate implementation, we believe that a conforming schema processor should always be prepared to pass the results of its keyref checking in the PSV-infoset. It could be very expensive for applications such as query processors to have to redo this work, both in run-time performance and in implementation effort.
Of course, a schema processor could have an option to allow applications to say when they did not wish to receive the information. However, for interoperability, applications should be able to rely on conforming processors making this information available on request, otherwise they would be forced to always include their own implementation of what they needed, just in case.
A part of the problem may be that the Last Call draft of Structures distributes the information across identity-constraint tables attached to different element information items. A single table per document would seem preferable, both conceptually and as a guide to a practical realization.
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
2.5 Identity-constraints tables
XML Schema Part 1: Structures section 3.10 discusses the Infoset contributions for identity constraints.
In order to verify that identity constraints are satisfied, it defines identity-constraint tables to be added to Element Information Items. These tables in effect would let a query processor find the element referred to by any keyref.
A. The note at the end of the section says, however, that these tables are optional. Conformant schema processors are *not* required to expose them. This means that a query processor working with a PSV Infoset created by a conformant processor that does not expose such tables may be forced to reconstruct some or all of them -- possibly an expensive process, and clearly unnecessary as the schema processor would have created them to check the identity constraints and then thrown them away!
We suggest that all conformant XML Schema processors must be able to expose the identity-constraint tables, but need not do so if requested otherwise.
B. We would like to request a reformulation as a single "identity-constraint index" from which it would also be easy to find all the elements whose keyrefs referred to a key.
A simpler representation would promote interoperability of conformant XML Schema processors. We are thinking both of conceptual simplicity and of a corresponding API that could support transfer of this information in practice.
Discussed in ftf meeting of 1-2 August 2000.
David Beech's proposal (mail to IG, 6 July) is in essence:
identity-constraint info item
  [definition]       an {identity-constraint definition}
  [scoping element]  an EII in the PSV infoset
  [node table]       a set of (EII, key-sequence, EII?) triples
This differs from the current design in two main ways: first, it pulls identity-constraint information up to top level: this becomes a document property in the PSV infoset. Second, it changes the description of validation by putting the check at the end of processing.
Clarifications: a flag in the definition is used to distinguish keys from keyrefs. If the schema processor doesn't provide access to the component level, then some kind of fallback will be needed. The proposal calls for one table, rather than one table per constraint, because this way it is (in the view of the task force) conceptually easier to visualize, more encouraging to implementors, and easier to transfer whole as an optional piece of the PSV infoset.
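One way to picture the proposed node table of (EII, key-sequence, EII?) triples is the Python sketch below. It is an illustration only: the class and function names are hypothetical, strings stand in for element information items, and reading the optional third member as "the element a keyref resolves to" is an assumption not stated in the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    node: str                      # stand-in for an element information item (EII)
    key_sequence: Tuple[str, ...]  # the field values selected for this node
    target: Optional[str] = None   # assumed: EII a keyref resolves to; None for keys


def build_node_table(keys, keyrefs) -> List[Entry]:
    """keys and keyrefs are lists of (node, key_sequence) pairs."""
    index = {}
    for node, seq in keys:
        if seq in index:
            raise ValueError(f"duplicate key sequence {seq!r}")
        index[seq] = node
    table = [Entry(node, seq) for node, seq in keys]
    for node, seq in keyrefs:
        if seq not in index:
            raise ValueError(f"keyref {seq!r} matches no key")
        table.append(Entry(node, seq, index[seq]))
    return table


def referrers(table: List[Entry], key_node: str) -> List[str]:
    """All nodes whose keyref resolved to key_node."""
    return [entry.node for entry in table if entry.target == key_node]


table = build_node_table(
    keys=[("part-1", ("p1",)), ("part-2", ("p2",))],
    keyrefs=[("line-1", ("p1",)), ("line-2", ("p1",))],
)
```

With a single table of this shape, the lookup the Query WG asked for (all elements whose keyrefs refer to a given key) is a single scan, as in `referrers(table, "part-1")`.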
Some WG members were concerned about prescribing (or even appearing to prescribe) the order in which processors are to do things, and about requiring that processors keep all this information around to the end of the validation process. Everything else we have done is proportional to the size of the document instance or to the size of the schema; this is not, and may grow much larger than the schema itself.
Some WG members objected to the proposal on the grounds that it changes the scope of the validation domain. Some felt it did not actually make life easier for implementors. There was discussion of the validation implications of forward references, and their description in the current specification.
The question was divided.
(A) Infoset contribution be pulled up to "the top". (B) Change formulation of semantics of keyref to make it a document scope formulation rather than element scope formulation.
On the question Shall this information be optionally provided to downstream processors in the form given in the proposal (with the crucial difference being the attachment of this information to the element at which validation started)?, there was one vote in favor, three abstentions, and a preponderance of opinion against. The proposal failed.
On the question Shall we change the formulation of how identity constraints are checked, to specify that it happens at the end of validation, rather than using the current language about logical priority?, there were a few votes in favor, but a majority opposed. This proposal also failed.
Resolved unanimously: to instruct the editor to use the proposal as given above (minus the scoping-element property) to answer the request from the Query WG for clarification.
Should XML Schema make the default behavior in element declarations be to block the specification of other elements as belonging to the equivalence class of the element being declared?
Input from Jim Trezzo <jtrezzo@us.oracle.com>:
Jim Trezzo <jtrezzo@us.oracle.com> to XML Schema Comments list on Fri, 12 May 2000 16:03:28 -0700
2. Default for Element Equivalence Classes
Since the flexibility introduced by element equivalence classes adds complexity to schema design and to parsing, we feel that the default should be to block it, rather than as at present to allow it and affect users who are not even aware of the feature.
For example, the designer of schema A who declares element <e> and does not block 'equivClass' thereby allows the designer of any other schema B for a different namespace to add elements to the equivalence class for <e> so that they become valid substitutions for <e> when it is being validated using schema A.
Not only may this be unintended by the designer of schema A, but it will increase the parsing complexity and we fear that it may even lead to ambiguity in the content models for complex types that reference the declaration of e.
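The difference between the two defaults can be sketched in Python as follows. The model is deliberately simplified and the names are hypothetical; the only details taken from the comment are that a declaration can name an equivalence class and that the head declaration can block 'equivClass'.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ElementDecl:
    name: str
    equiv_class: Optional[str] = None    # head element this declaration claims equivalence to
    block: FrozenSet[str] = frozenset()  # status quo default: nothing is blocked


def may_substitute(head: ElementDecl, candidate: ElementDecl) -> bool:
    """True if candidate may appear where head is expected (simplified model)."""
    return candidate.equiv_class == head.name and "equivClass" not in head.block


# Schema A declares <e>; schema B adds <e2> to its equivalence class.
e_status_quo = ElementDecl("e")
e_proposed_default = ElementDecl("e", block=frozenset({"equivClass"}))
e2 = ElementDecl("e2", equiv_class="e")
```

Under the status quo `may_substitute(e_status_quo, e2)` holds without the author of schema A having done anything; under the commentator's proposed default the author would have to opt in explicitly.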
Discussed in Edinburgh ftf.
Request feedback about values of schema-level defaults and schema-level defaults in general.
Should a standard API to the post-schema-validation infoset be defined?
Input from Jim Trezzo <jtrezzo@us.oracle.com>:
Jim Trezzo <jtrezzo@us.oracle.com> to XML Schema Comments list on Fri, 12 May 2000 16:03:28 -0700
4. API for PSV-Infoset
The information added to the PSV-Infoset will be very valuable to applications if there is an efficient practical realization of it, e.g. in a standard API.
Discussed in call of 2000-06-16.
RESOLVED without dissent: to reclassify this as class B, and have the chairs respond that this is formally out of our scope.
Should schema processors which know they are processing a schema place information about the schema components into the post-schema-validation infoset?
Cf. Provide type-information in PSV Infoset?
This issue was subsumed, in practice, in issue LC-198.
Input from Jim Trezzo <jtrezzo@us.oracle.com>:
Jim Trezzo <jtrezzo@us.oracle.com> to XML Schema Comments list on Fri, 12 May 2000 16:03:28 -0700
5. Augmentation of PSV-Infoset for schema documents
Since a schema document is an XML document (and a major argument in favor of using XML syntax to represent XML schemas has been that generic XML tools would then be applicable to them), the validation of an XML schema against the Schema for Schemas will produce a PSV-infoset.
Chapter 4 of Structures transforms what is effectively this information into a component and property model, adding information. However, no generic tools exist for processing this information, which would be useful if available to applications such as repositories or query processors.
Would it be possible to augment the PSV-infoset with additional information when the instance being validated is itself a schema document?
Discussed at face to face meeting 1 August 2000.
See LC-198 for details.
Some apparent typos and a technical problem in the description of the recurringDuration type.
Input from Jim Trezzo <jtrezzo@us.oracle.com>:
Jim Trezzo <jtrezzo@us.oracle.com> to XML Schema Comments list on Fri, 12 May 2000 16:03:28 -0700
Part 2: Datatypes
6. recurringDuration
There appears to be a typo in 3.2.7. The second sentence: "The order-relation on timeDuration ..." should read: "The order-relation on recurringDuration ...".
This also brings out a technical problem. Since recurring duration has two facets (duration and period) which should enter into determining the order-relation, the specified rule (x<y iff y - x is positive) is not adequate. We could say that when either duration or period is fixed, the variable facet would be used to determine order-relation.
There also seems to be a conflict between what the text of 3.2.7 (paragraph 3) says and what the explanatory box says. The text says: "... it can be used as a datatype on its own ...", where the box says: "It is an error for recurringDuration to be used directly in a schema".
(AM)
Discussed at Edinburgh ftf.
JT (originator) says that a satisfactory resolution will be: the specification says more clearly that it is an abstract type and that the ordering relation is not defined on the abstract type. When you create a specific derived type (including anonymous) then specify order on that.
Agreed.
Should XML Schema drop xsi:type (e.g. in the interests of better separation of schema and data)?
Input from Philip Wadler <wadler@research.bell-labs.com>:
Philip Wadler <wadler@research.bell-labs.com> to XML Schema Comments list on Fri, 12 May 2000 19:08:51 -0400
1. Clear separation between schema and data
One of the nice features of XML is that documents are "self-describing". Schema has two features which run counter to this philosophy, xsi:type and xsi:null. Our motto here is, `Keep schema out of the data!'
1.1. xsi:type
Schema permits refinement in two forms: an element may be declared as being a subclass of another element, and a type may be declared as a subtype of another type. This is explained in Section 4 of the primer.
(When one element is a subclass of another element, Schema says the first element is `in the equivalence class' of the second. We use `subclass' because it has the right connotations, whereas `equivalence class' does not.)
When subtyping is used without subclassing, the document is required to include type information. Here's an example from Section 4 of the primer.
<shipTo export-code="1" xsi:type="ipo:UK-Address">
  <name>Helen Zoe</name>
  <street>47 Eden Street</street>
  <city>Cambridge</city>
  <postcode>CB1 1JR</postcode>
</shipTo>
<billTo xsi:type="ipo:US-Address">
  <name>Robert Smith</name>
  <street>8 Oak Avenue</street>
  <city>Old Town</city>
  <state>PA</state>
  <zip>95819</zip>
</billTo>
If subclassing is combined with subtyping, the use of xsi:type can be avoided.
<shipTo export-code="1">
  <UK-Address>
    <name>Helen Zoe</name>
    <street>47 Eden Street</street>
    <city>Cambridge</city>
    <postcode>CB1 1JR</postcode>
  </UK-Address>
</shipTo>
<billTo>
  <US-Address>
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </US-Address>
</billTo>
This latter form is easily read by anyone who understands XML, even if they do not understand XML Schema.
We feel that the extra complexity of xsi:type outweighs any of its advantages. We suggest that subtyping be tied to subclassing, and that xsi:type be removed.
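The practical difference between the two instance styles can be seen from the point of view of a consumer that wants to know which kind of address it has been given. The Python sketch below is illustrative only: the helper name is hypothetical, the instances are shortened versions of the commentator's examples, and both namespace URIs are assumptions.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"  # assumed instance-namespace URI

# Style 1: subtyping without subclassing, so the type is named in xsi:type.
WITH_XSI_TYPE = f"""<shipTo xmlns:xsi="{XSI}" xmlns:ipo="http://www.example.com/IPO"
        export-code="1" xsi:type="ipo:UK-Address">
  <name>Helen Zoe</name>
</shipTo>"""

# Style 2: subtyping tied to subclassing, so the element name carries the type.
WITH_SUBCLASS = """<shipTo export-code="1">
  <UK-Address>
    <name>Helen Zoe</name>
  </UK-Address>
</shipTo>"""


def address_kind(xml_text):
    """Return the local name of the address type, whichever style is used."""
    elem = ET.fromstring(xml_text)
    declared = elem.get(f"{{{XSI}}}type")
    if declared is not None:
        # xsi:type holds a prefixed name such as "ipo:UK-Address".
        return declared.split(":")[-1]
    # Otherwise the first child element names the kind of address.
    return elem[0].tag
```

In the first style the consumer has to know about the xsi namespace and interpret a prefixed name held in an attribute value; in the second, ordinary element names suffice, which is the commentator's point about readability without knowledge of XML Schema.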
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.6 Schema-Related Markup in Documents Being Schema-Validated
The concept of having a universal instance namespace ("xsi:") baffles me. Isn't this polluting the actual target namespace(s) of my instances? I don't want to see xsi:type or xsi:null attributes in document instances. This seems like a terrible hack. §6.3.2 provides the reasoning for schema locating, but other uses of the xsi: namespace should be strictly prohibited.
Discussed in call of 2000-06-30.
Ashok Malhotra reported that PW had changed his mind.
RESOLVED: to close issue LC-164 with polite no.
Dissenting: Olken (it was a good idea)
Formal response to commentator. Philip Wadler replies (privately) "I won't dissent on xsi:type."
Should XML Schema drop xsi:null (e.g. in the interests of better separation of schema from data)?
Input from Philip Wadler <wadler@research.bell-labs.com>:
Philip Wadler <wadler@research.bell-labs.com> to XML Schema Comments list on Fri, 12 May 2000 19:08:51 -0400
1.2. xsi:null
Reading between the lines, it seems clear that xsi:null is included in Schema to support some ways of using relational databases. That is, Schema is trying to help Query. But it is not at all clear what the Query group will decide about nulls. We believe that xsi:null should be removed from Schema. Query should first decide on what mechanism is required for nulls, and then discuss the situation with Schema if Schema support is required.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Primer
§2.9 Null Values
This seems like one that shouldn't have made it past the 80/20 threshold, and seems more appropriate to application-level validation. I'm also shocked to see the instance namespace polluted with the xsi:null attribute.
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
3.2 Treatment of NULLS
The Query WG has not reached a consensus regarding the definition of NULLs. We expect that the Query WG will submit comments regarding nulls in the future, once we have determined their potential impact on the Query algebra. In the interim, we have asked individual members of the Query WG to send their comments regarding NULLs directly to the Schema WG.
Discussed in call of 2000-07-14. No consensus.
Discussed at face to face meeting of 1-2 August 2000.
A straw poll showed the WG very evenly divided (nine for keeping xsi:null, nine for removing it, nine abstaining, and three voting to concur with the majority).
Most of the proposals brought forward for redefining xsi:null had already been brought forward at the time the facility was originally designed; they had not commanded a majority of the WG at that time, and more recent arguments appear not to have been more persuasive than those given earlier.
In discussion, some WG members said they had accepted the xsi:null proposal on the grounds that the database specialists in the WG said they needed and wanted it, and asked that the database people get their act together and make a unified recommendation. In response, some database vendor representatives said they were implementing and using it, and noted that the negative feedback amounted largely to a request for tighter semantics and greater expressiveness. Other database people said that while the current xsi:null construct clearly makes sense in a SQL context, its meaning is less clear for rich XML databases; they wanted to make sure the DB community agreed on how to do the right thing. Others suggested that the entire problem should be out-sourced to the XML Query WG, as the primary clients for a null facility (to which others responded that commercial database vendors, not the Query WG, are the primary clients for it). Some WG members argued that in the absence of a concrete proposal for mapping between SQL and XML Schema, it was impossible to state that xsi:null was either necessary or useful; others countered that on the contrary, the discussion in the IG had shown that it is entirely possible to construct arguments for its utility even without a prescribed mapping. Some WG members said that while they or their organization would have preferred a different design which went further, the existing design was all right as it is written. Several argued that the key point was whether it would be easier to take xsi:null out at the end of CR if implementation experience showed that was desirable, or easier to add it in, if implementation experience showed that it was essential.
Resolved: to instruct the editor to make xsi:null a priority feedback issue.
The chair requested comments from the chair of the XML Query WG (who was observing the meeting). He said he took the discussion as a sign that the priority of this issue needed to be raised, and the implications worked out. Do searching for the absence of elements and searching for null elements require different predicates in the XML Query language? He suggested that the two WGs needed to discuss this problem together.
On the question "Shall we resolve issue LC-165 by dropping the xsi:null facility?" the votes were:
The proposal to drop the xsi:null facility therefore failed.
Formal response to commentator. Philip Wadler replies (privately) "I won't personally dissent on this decision."
Should XML Schema allow simple types to be used wherever complex types may now be used?
Cf. Can XML Schema define XSLT?
Input from Philip Wadler <wadler@research.bell-labs.com>:
Philip Wadler <wadler@research.bell-labs.com> to XML Schema Comments list on Fri, 12 May 2000 19:08:51 -0400
2. Simple types vs. complex types
One lack of orthogonality in XML Schema Part 1: Structures is that simple types and complex types cannot always be used in the same way. We suggest that simple types be permitted wherever complex types are.
This would result in a number of simplifications:
For example, we can now specify a LETTER element that consists of a SALUTATION element, followed by some text, followed by a CLOSING element.
<xsd:element name='LETTER'>
  <xsd:element name='SALUTATION' type='xsd:string'/>
  <xsd:simpleType type="string"/>
  <xsd:element name='CLOSING' type='xsd:string'/>
</xsd:element>
This is more precise than using `mixed', and, because it lists the components in the order they appear, it is easier to read.
Of course, types must be parseable and serializable. Usually, values of primitive type can be space separated, the exception being strings (which may themselves contain spaces). Therefore, it is not allowed to specify two successive occurrences of primitive type if one or both of them is a string.
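The adjacency restriction stated in the last paragraph can be written as a simple check over a content model. The Python sketch below is a minimal illustration: representing a content model as a list of tagged pairs, and the function name, are assumptions made for the example.

```python
def violates_string_adjacency(particles):
    """Check the proposed rule on a content model.

    particles is a list of ("element", name) or ("simple", type_name) pairs
    in content-model order. Two successive simple-type particles are
    rejected if one or both of them is a string.
    """
    for (kind_a, type_a), (kind_b, type_b) in zip(particles, particles[1:]):
        if kind_a == kind_b == "simple" and "string" in (type_a, type_b):
            return True
    return False


# The LETTER example: the string particle sits between two elements.
letter = [("element", "SALUTATION"), ("simple", "string"), ("element", "CLOSING")]
```

The LETTER model passes the check because the string particle is delimited by element markup on both sides; a model such as an integer followed directly by a string would be rejected, since a parser could not tell where the string begins.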
Input from XML Query WG:
4. Simple types vs. complex types
One lack of orthogonality in XML Schema Part 1: Structures is that all kinds of types cannot always be used in the same way. Some members of the Query working group felt that simple types [should] be permitted wherever complex types are.
This would result in a number of simplifications:
We would welcome an explanation of why the Schema group has chosen to support a lack of orthogonality in this area.
Discussed in call of 2000-06-23.
MSM noted that the overt statement of the suggestion appeared to be misleading: the names of simple types are already legal in the same places where the names of complex types are legal (except in the derivation of new complex types): what the commentators appear to want, and show in their examples, is the ability to write content models in which simple types can appear as content-model particles.
In resolving LC-51, we had decided not to introduce an untyped (or string-typed) PCDATA particle into content models. The chair asked whether allowing such particles to be typed with any simple type changed anyone's mind. It did not.
The same rationale as for LC-51 appeared to apply, with the added observation that extending our determinism rule to cover typed PCDATA particles would be complex. (Scribe's note, post hoc: for example, a content model which read, in part:
<choice>
  <simpleType type='string'/>
  <simpleType type='date'/>
  <simpleType type='arithmetic-expression'/>
</choice>
would be not only non-deterministic but ambiguous. We would need to decide whether processors were required to detect the ambiguity or not, and either decision would raise problems and require substantial design effort.)
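The ambiguity described in the scribe's note can be made concrete with a small Python sketch. The three matchers are rough stand-ins: 'arithmetic-expression' is the hypothetical type from the note, and the regular expressions are illustrative approximations, not the lexical rules of any datatype in the spec.

```python
import re
from datetime import date


def is_date(text):
    """Accept CCYY-MM-DD strings that name a real calendar date."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date(*map(int, text.split("-")))
    except ValueError:
        return False
    return True


def is_arithmetic_expression(text):
    """Accept integers joined by the binary operators +, -, * and /."""
    return re.fullmatch(r"\d+(\s*[-+*/]\s*\d+)*", text) is not None


# One matcher per alternative in the <choice>, in document order.
MATCHERS = [
    ("string", lambda text: True),
    ("date", is_date),
    ("arithmetic-expression", is_arithmetic_expression),
]


def matching_types(text):
    """Names of all alternatives whose lexical space contains text."""
    return [name for name, accepts in MATCHERS if accepts(text)]
```

For the character data `2000-06-23`, `matching_types` returns all three alternatives: it is a string, a date, and (read as 2000 minus 6 minus 23) an arithmetic expression. A processor would need a rule for detecting or resolving such overlaps, which is the design effort the scribe's note refers to.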
RESOLVED without dissent: to sustain the status quo, with the rationale given.
Should XML Schema revise its treatment of local declarations, either by eliminating them entirely or by relaxing some of the current onerous restrictions?
Input from Philip Wadler <wadler@research.bell-labs.com>:
Philip Wadler <wadler@research.bell-labs.com> to XML Schema Comments list on Fri, 12 May 2000 19:08:51 -0400
3. Context-independent types vs context-dependent types
One of the great structuring principles of DTDs is that the elements with the same name always have content of the same type. Many users of SGML take this as the foundation stone for structuring a document.
Schema departs from this: elements with the same name may have contents of differing type, depending on the context where they appear. However, Schema goes only halfway toward this, as there are some complex restrictions (apparently intended to ease parsing).
We suggest that the design should be a good horse or a good elephant, not a hybrid beast. Either choose a completely context-independent design, similar to DTDs, or choose a completely context-dependent approach, similar to that pursued by, for instance, the work on XDuce at the University of Pennsylvania. In mathematicians' terms, we should either deal with trees that represent context-free grammars (which can be parsed by top-down deterministic tree automata), or with regular trees (which can be parsed by either bottom-up deterministic, bottom-up non-deterministic, or top-down non-deterministic automata; the three are equivalent).
3.1 Context-independent types
To make types context-independent, all that is needed is to change Schema to only allow global element declarations.
Advantages:
3.2 Context-dependent types
To make types fully context-dependent, Schema should (at least) remove the restriction that all sibling elements with the same name should have the same type.
Advantages:
Disadvantages:
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Primer
§2.7 Building Content Models
When name tokens occur in DTD content models, they always refer to a singular type definition matching the name. That local type definitions occur seems very problematic. For example, I can imagine this making XSLT stylesheets very difficult to work with, both in transforming instances and XML Schemas.
§3.1 Target Namespace & Unqualified Locals
[I'm feeling a bit like an unqualified local at this point...] Regarding the last paragraph beginning "When local elements...": how can this be considered acceptable? It seems so capricious to base designs on conjecture, which the 'checkerboard' of prefixes must surely be in many instances. Do you expect schema authors to actually "know" if an entire schema is globally-declared? Especially if parts of the schema are distributed?
§3.3 Global vs. Local Declarations
I can't imagine that this is a good idea, especially as regards XSLT transformations. In order to write an XSLT stylesheet, common document authors (who know their instance syntax and the rudiments of XSLT) will suddenly have to read the XML Schema to determine that their XSLT selectors are exclusive enough to only select the correct contexts.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.2.2.1 Element Declaration
While the idea that XML Schemas allow for both globally- and locally-scoped type definitions is appealing, I'm not convinced that the world into which Schemas are being introduced is ready for such a notion. We've gone from "a <p> is a <p> is a <p>" to "a <p> is not a <p> when it's in a <q>", which is even enormously simplified from how it could be stated, given how complex context is capable of becoming.
Talking about, documenting and using an element type is going to get remarkably more confusing, especially with the rapid proliferation of vocabularies and desire to intermix them. Now, we'll truly have no way of knowing to which <book:title> is being referred, since there can now be many. I realize that the workaround is requiring that namespaces only refer to globally-declared types, but I don't buy that as a solution to the problem this introduces. This to me steps over the "80/20" point for a version 1.0, and I would seriously recommend eliminating the ability to create context-dependent types.
§2.5 Names and Symbol Spaces
Perhaps this is necessary due to the ability to define components globally as well as locally, but this seems enormously confusing, and I would expect the general community to have great difficulty keeping the various local namespaces distinct. Even the definition wavers, providing some contradictions to the concept ("within a target namespace, simple type definitions and complex type definitions share a symbol space"). Perhaps some examples clearly showing the boundaries would help.
Discussed at Edinburgh ftf.
We don't want to remove locally scoped elements from schemas, because that moves the complexity to implementors of other applications which will map data in and out of XML. We do not believe we should drop the restriction, because doing so gets in the way of simply treating elements based on fully qualified GI, and as far as we can tell the current design does not take us into the realm of tree grammars. We will add a specific request for input in CR, saying that we believe locally scoped elements handle these certain use cases and that we would like feedback on whether they do or not.
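The difference between the two designs discussed in this issue can be sketched as a difference in how a processor looks up the type of an element. The sketch below is a simplification with hypothetical element and type names (`title`, `book`, `person`, `honorific`); it is not drawn from any schema in the specification.

```python
# Context-independent (DTD-like): the element name alone selects the type.
GLOBAL_TYPES = {"title": "string"}

# Context-dependent (local declarations): the parent element also matters.
# Keying on (parent, name) still gives all same-named siblings one type,
# which is the restriction the commentator proposes removing.
LOCAL_TYPES = {
    ("book", "title"): "string",
    ("person", "title"): "honorific",
}

def type_context_independent(name):
    """Look up a type by element name only."""
    return GLOBAL_TYPES[name]

def type_context_dependent(parent, name):
    """Prefer a local declaration for this parent, else fall back to the global one."""
    return LOCAL_TYPES.get((parent, name), GLOBAL_TYPES.get(name))

print(type_context_independent("title"))
print(type_context_dependent("person", "title"))
```

With local declarations, a stylesheet or application that matches on the name `title` alone may select elements of more than one type, which is the concern raised about XSLT above.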
Should XML Schema drop its facilities for anonymous types, requiring all types to be given names?
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Primer
§2.4 Anonymous Type Definitions
I'm unclear as to the real value of anonymous types. Schemas are so incredibly verbose anyway, this seems to have little value in relation to the ambiguity it seems to create. It would seem that an anonymous type should simply be for elements one simply didn't care to type, but anonymous types seem to inherit, as quantity does from integer in the example provided.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
The definition for "Schema component" states that primary and secondary components may have names. May? I don't understand the circumstances under which a component wouldn't have a name. The spec also doesn't mention what the implications are to not being named. This section doesn't mention what differentiates "primary" from "secondary", and this certainly doesn't seem to have anything to do with the differentiation made between "definition" and "declaration".
Miscellaneous editorial notes on the Primer
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Primer
§2.8 Attribute Groups
Can attribute groups contain attribute groups? By derivation?
§5.5 Any Element, Any Attribute
The value of the xmlns attribute on <table> is incorrect. It should end in .../1999/xhtml. Likewise, in the prose text, the namespace URI value has "XHTML" in uppercase.
Should XML Schema drop the include facility?
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Primer
§4.1 A Schema in Multiple Documents
I don't see the need for yet another inclusion mechanism, given XInclude, XLink and entities. At very least, XInclude seems to mimic <include> sufficiently and without the potential processing order mismatch of XLink and entities. Can't we just use XInclude?
It also states "nesting is legal only if all the included parts of the schema are declared with the same target namespace." In a scenario where I was mixing XHTML + XLink + SVG + MathML, how could this suffice? XHTML 2.0, SVG and MathML include XLink, and I'm at a loss to understand how one could mix all four given this restriction.
Note that the example ipo.xsd is missing an end quote mark on the xmlns:ipo attribute on <schema>.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
The NOTE regarding no requirement on schemas sharing a target namespace is a bit troubling. One of the difficulties I've had is how XML Schemas are used to create schemas containing multiple namespaces, indeed this has been pointed out as a "failure" in DTDs, and part of the rationale for the XML Schema activity. If at the abstract level schemas may contain multiple target namespaces, but in actual XML Representation they may not, how can one implement a multiple target namespace representation? If this is possible, why doesn't this NOTE refer to how this is accomplished? What am I missing here?
Discussed in call of 2000-07-06.
RESOLVED unanimously: to close issue LC-170 with a polite no and give a rationale for the 'include' element.
The include element is not a simple file inclusion, but can be parameterized and used in such a way as to override the declarations in the schema document being included. That means it cannot be replaced by the use of general entities or XInclude.
Should XML Schema drop its facilities for declaring keys, unique values, and key references? (Or at least clarify their purpose, utility, and difference from the XML 1.0 ID type.)
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Primer
§5.1 Specifying Uniqueness
My first reaction: wow! My second: this is a 2.0 feature.
§5.2 Defining Keys and their References
What problem are keys solving? If one doesn't have access to the DTD, one cannot know which attributes are IDs. If one doesn't have access to the XML Schema, one doesn't know the scope of keys either. I can understand that keys are more flexible than IDs, but why have both? Can't IDs be simply defined as a key with a particular scope? How do fragment identifiers deal with keys?
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.2.4 Identity-constraint Definition Components
The first paragraph of this section is completely indecipherable to me.
Discussed in call of 2000-07-07.
David Cleary summarized and clarified the argument of the commentator. His real issue is with the UNIQUE functionality. But there is no need to drop UNIQUE if keys are around, so effectively this can be read as a proposal to drop key constraints. The key constraint feature is separable from the rest of the spec, so we could do this.
The addressability of IDs and keys in the absence of DTDs or schemas is a second part of the issue.
Mary Holstege pointed out that we could resolve the tension felt by the commentator between keys and ID/IDREF by dropping the latter, as well as by dropping the former.
The chair distinguished three parts of the question, and put them separately.
RESOLVED unanimously: to dispose of the first part of issue LC-171 by saying no, we don't want to drop keys.
RESOLVED unanimously: to dispose of the second part of LC-171 by saying yes, we agree that addressability is an important topic, and that is one reason that LC-159 is an issue; our disposition of LC-159 may affect this area. But N.B. it's a matter of convenience, not of functionality: if you know that something is a key, you can use that knowledge in writing the XPath expression for an XPointer. Exposing the key constraint information in the PSV infoset may affect convenience, and would provide hooks for XPointer extension, but does not change what can be done on the basis of XML Schema.
As for addressability, using DTD or schema-based information, in the absence of a DTD or schema: without a DTD or a schema, all bets are off anyway.
There was some discussion of the difficulty we have in defining formally the value space of IDs, and of whether our key constraint mechanism can express the constraints imposed by ID and IDREF and IDREFS.
RESOLVED: to dispose of the third part of issue LC-171 by saying no, we don't want to drop ID/IDREF/IDREFS at this time. Dissenting: Connolly, Holstege, Timmermans
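Two points from this discussion can be illustrated with a short sketch: keys may be unique within a scope rather than across the whole document (unlike XML 1.0 IDs), and a user who knows which attribute is a key can address an element with an ordinary path expression. The document, the element names and the `partNum` attribute below are hypothetical, and the helper is an illustration rather than the identity-constraint algorithm of the specification. It uses the limited XPath subset of Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: partNum is meant to be unique within each <order>.
DOC = """<orders>
  <order id="o1">
    <item partNum="A1"/>
    <item partNum="B2"/>
  </order>
  <order id="o2">
    <item partNum="A1"/>
  </order>
</orders>"""

def duplicate_keys(scope, selector, field):
    """Return key values that occur more than once among the selected elements."""
    seen, duplicates = set(), []
    for element in scope.findall(selector):
        value = element.get(field)
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates

root = ET.fromstring(DOC)

# Scoped check: each <order> is its own scope, so "A1" may recur across orders.
per_order = {o.get("id"): duplicate_keys(o, "item", "partNum")
             for o in root.findall("order")}

# Document-wide check, roughly what an ID-like constraint would demand.
whole_document = duplicate_keys(root, ".//item", "partNum")

# Addressing by key: knowing the key and its scope is enough to write the path.
found = root.findall("./order[@id='o2']/item[@partNum='A1']")

print(per_order, whole_document, len(found))
```

In this example the scoped check reports no duplicates, while the document-wide check reports `A1`, which is one way of seeing why keys are more flexible than IDs.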
Should XML Schema drop its facilities for wildcards in content models?
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Primer
§5.5 Any Element, Any Attribute
This definitely fails the 80/20 test in my opinion.
The value of the xmlns attribute on <table> is incorrect. It should end in .../1999/xhtml. Likewise, in the prose text, the namespace URI value has "XHTML" in uppercase.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.2.3.3 Wildcard
Appropriate for a version 1.0? I have a difficult time understanding how the availability of wildcards and the variety of validation levels (strict, skip, and lax) might interact. Powerful perhaps, but certainly confusing.
Discussed in call of 2000-06-08.
The case for eliminating wildcards lies in their complexity, and the simplification to the spec which would result from eliminating them.
In discussion, the WG identified several reasons that wildcards of some kind or another are essential. The ANY keyword of SGML and XML 1.0 does not allow elements to be defined which allow arbitrary blocks of well-formed XML as their content; it is thus impossible, in XML 1.0, to define a DTD for (say) a protocol-oriented envelope, which carries arbitrary XML as its payload. Our wildcards do that, and thus make it possible to have schemas with 'black-box' areas. This is one of the things Schema does which is clearly more expressive than DTDs, and it is essential to allow the 'X' of XML to be meaningful not only in cases where validation is foregone, but in cases where document types are formally defined and validated.
The use of wildcards also makes it much easier to apply different schemas to the same document instance.
David Beech observed that issues 155-157 might lead us to abandon our current ANY wildcard in favor of the interleaving style of openness described in earlier drafts.
RESOLVED without dissent: not to drop the ANY wildcard on account of this comment.
Miscellaneous notes on Part 1, Structures.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
The Structures document describes in detail the XML Schema language, "offering facilities for describing the structure and constraining the contents of XML 1.0 documents." This is not entirely true: XML post-namespaces is a very different beast than simply "XML 1.0", and I believe XML Schemas suffer the same complexities as any other specification attempting to deal with this underspecified concept.
§1.2 Dependencies on Other Specifications
I note that even as late as this last week of the Last Call period, the specifications for XML 1.0, URIs, and XML Namespaces are all perhaps likely to be changed. To my knowledge, problems with XPath have been identified. How can this specification move to Recommendation if many of its dependencies are still evolving? For example, I'm still unclear as to what decision was made as to interpretation of relative URIs as namespace identifiers. While several of these dependencies are noted in this section, this makes it difficult to interpret how the Schema specification will interact with them.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
What's the difference between a "model group" and a "model group definition"? This may become clear later, but why not now?
Under the definition for "target namespace", does the target namespace "determine the namespace URI of the information items it may validate", or the namespace URI determine (or "select" as in the XML Namespace Rec) the target namespace? It sounds like this is backwards.
§2.2.1.1 Type Definition Hierarchy
This introduces the "ur-type definition", a name that perhaps has significance to the authors but has none to me. What class was I asleep during? Mathematics? Linguistics? Computer Science? Isn't there a more accessible term available? [I note that the Schema WG includes this in its list of Last Call comments, so I guess I'm not alone.]
Reading the definition for "ur-type definition" I don't see what it does, nor does the spec mention how many "distinguished" ur-type definitions a schema may have, nor whether there may be more than one Type Definition Hierarchy per schema document. Also, it may have the "unique characteristic" of "functioning as" either complex or simple type definition, but why? How? Is this blurring of definitions a bug or a feature? Confusing.
The term "facet" is introduced without definition. I think the term is overloaded, and in the various contexts I've seen it used (such as lately in topic maps) its definition varies.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§4.2 References to Schema Components
Regarding the subsection Schema Representation Constraint: QName resolution (Schema Document), I read this over three or four times and could make no sense of it.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§4.3.11 (non-normative) XML Representation of Simple Type Definition Schema Components
If anyone can sit down with me and explain exactly what the heck this section is trying to say (in English) I'll buy them a beer:
For each facet in the {facets} of B, there is a facet of the same kind in R, which is the facet of the same kind in S, if there is one, otherwise a facet of the same kind whose {value} is the {value} of the facet of the same kind in the {facets} of B and whose {annotation} is absent.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§5 Schema Component Validity Constraints
By this point my eyes had begun to glaze over completely. Does anybody completely understand the implications of this section? I can't imagine trying to write a truly conformant validating parser that correctly handled all of this, nor a test suite that would cover the gamut of possibilities.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
Appendix A (normative) Schema for Schemas
It would be helpful for those printing the specification if a line break or two could be added (particularly between attribute specifications) so that certain portions currently running off the right side of the page can be read.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
Appendix B (normative) DTD for Schemas
In attempting to validate the Schema for Schemas I note that the attribute derivedBy is not declared on <simpleType>, producing validation errors. In reading through the specification it seems that this attribute should be included on simple type in some places, but my understanding is that simple types are not derived from anything else. In any case, this needs resolution.
Should XML Schema drop the distinction it makes between the abstract component level and the XML transfer syntax?
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.2 XML Schema Abstract Data Model
While I certainly understand the rationale for defining schemas in the abstract, the result is that the schema specification itself becomes very difficult to interpret. Could the specification have been written without resorting to abstraction? In the end I don't find the concept of an "infoset" so appealing as to believe the complexity engendered to be justified. If it keeps XML Schemas from being widely accepted, was it worth it? [rhetorical questions, really, as I'm sure there are those who disagree quite violently]
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§4.0 XML Representation of Schemas and Schema Components
The structure of this section is exceedingly difficult to read. It consists of very large "if this" and "otherwise" blocks, but as I've mentioned before, the interaction between different components is so complex as to render this (at least to me) at many times indecipherable.
Discussed at Edinburgh ftf.
We recognize that the abstraction makes for steep reading, but abstracting away from concrete syntax is a help to implementors, is a requirement from query and our charter, and allows us to define what it means to have a schema without having it in a specific transfer syntax (e.g. as a DOM). We will consider ways to restructure the specification to improve readability.
No reply from commentator.
Should XML Schema drop the equivalence-class construct?
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.2.2.2 Element Equivalence Class
I'm sure this feature was deemed important by someone, but adding onto the complexity of having locally- and globally-scoped element type definitions we also must allow for substitution. So now when we refer to <book:title> we need to understand that it might also show up as <book:foo> or <book:bar>. This section states that "the content of member elements is strictly limited according to the type definition of the equivalence class exemplar" yet this is allowed to be either the same as, restrictions of or extensions to it. Lost me on that one. Given that any member elements may themselves have equivalence classes, this would get mighty confusing very quickly.
Finally, "element equivalent classes are not represented as separate components. They are specified in the property values for element declarations." This is followed by a reference to the preceding section, which doesn't seem to say anything whatsoever about element equivalence classes. But I guess we're speaking in the abstract anyway, correct?
Discussed in call of 2000-06-09.
MSM noted that the commentator had not explicitly suggested dropping equivalence classes, but that that was one natural way to interpret the consequences of his remarks. Ezell noted that the important use case for this construct is late binding. The receiving processor may not understand xsi:type per se, but may understand the type bound to some GI. What's missing is an explanation of how equivalence classes and the complex-type hierarchy are different and why we have them both. An informal poll showed a preference for retaining the construct.
RESOLVED without dissent: to retain the current equivalence-class construct (leaving open, however, the issue of whether to rename it), responding to the commentator by saying that it is essential for certain functionality which we do not know how to provide otherwise, and noting for our own purposes that we may wish to elaborate on the explanation of the construct given in the spec.
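The substitution behaviour the commentator describes, including the case where a member element is itself the exemplar of another equivalence class, can be sketched as a walk up a chain of exemplars. The table below reuses the commentator's illustrative names (`book:title`, `book:foo`, `book:bar`); the chain between them is an assumption made for the example, and the check ignores the type-derivation conditions that the specification also imposes.

```python
# Hypothetical declarations: member element -> exemplar of its equivalence class.
EXEMPLAR_OF = {
    "book:foo": "book:title",   # book:foo may stand in for book:title
    "book:bar": "book:foo",     # book:bar may stand in for book:foo
}

def can_appear_as(name, expected):
    """True if `name` is `expected` or reaches it by following exemplars."""
    visited = set()
    while name is not None and name not in visited:
        if name == expected:
            return True
        visited.add(name)          # guards against a circular declaration
        name = EXEMPLAR_OF.get(name)
    return False

print(can_appear_as("book:bar", "book:title"))
print(can_appear_as("book:title", "book:foo"))
```

The relation is one-directional: a member may appear where its exemplar is expected, but not the reverse.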
What is the purpose of the notation declaration?
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§2.2.2.4 Notation Declaration
I've yet to understand the XML Schema differentiation between notations and datatypes. Isn't a notation simply a complex datatype? Doing a word search on "notation" in the specification finds almost nothing. Is this simply a grandfathering of XML notations?
It was in attempting to answer the above question that I realized that it's almost impossible to discern a conclusive definition from the specification. The last paragraph of this section includes three section references (a relatively common number, many have more), which themselves spiral inward. Travelling down just from the first reference:
...an interesting exercise better left to a computer. I think it would improve the specification to include the detail section at the point of definition. As I've mentioned earlier, it would also make things much more digestible if the specification did not define an abstract model, but simply the XML representation.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§3.11 Notation Declaration Details
This section states that "notation declarations do not participate in schema-validation as such. They are referenced in the course of schema-validating strings as members of the NOTATION simple type." Shouldn't there be a hook here to attach a notation processor? I skip back to §5.8 and find a reference back to this section ("property tableau"?), modulo the impact of Missing Sub-components, §7.3. Nothing on notation processing, and checking §3.2.13 in Datatypes still leaves this open. Perhaps I put more interest in notations than other people. Notation=ECMAScript?
Shall XML Schema tighten the rules governing schema-validation?
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§6.1 Layer 1: Summary of the schema-validation core
Another instance of befuddlement. How can this be considered acceptable? (highlighting mine):
The obligation of a schema-aware processor as far as the schema-validation core is concerned is to implement the definitions of schema-valid given below in Schema Validation of Documents (§7.2). Neither the choice of element information item to be schema-validated, nor which of three means of initiating validation are used, is within the scope of this specification.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Part 1 : Structures
§7.9 Missing Sub-components
I've tried three or four times to write up something about this section. Because of my incomplete understanding of the rest of the spec it's difficult to confidently summarize, but my reaction in general is one of mild shock. I long for the days of 'draconian' error handling, and can only attempt to imagine a Web where §7.9 becomes the norm for XML processing.
Currently, I believe that the rules, and their rationales, are (roughly) these (corrections invited):
A Within a document, the schemaLoc attribute can be used on any element to provide a suggestion for where to locate a (not 'the') schema for a particular namespace.
(Rationale: since there may be any number of things at the URI which identifies the namespace, and since content negotiation is currently, ah, imperfectly and incompletely implemented by software and incompletely understood by the average user, it's useful to have a safety valve for cases where the namespace name is not enough.)
[NM adds:] We have been over this ground many times, and without repeating all arguments, that is far from the only reason why a namespace URI might not identify the schema document. I think we have thoroughly analyzed and carefully stated exactly what we want to say about the senses in which a namespace URI does or does not potentially locate the schema. We have similarly agonized over exactly how to state the options available to a processor in dealing with the namespace URI.
I strongly urge you to not try to restate that particular aspect of our specification, which we worked so hard to design and to wordsmith.
B The schemaLoc attribute is, formally, a *hint*, not an instruction. It may be taken as a claim that a schema for the namespace in question may be found at the location indicated. The schema validator is not required to take the hint. The exact method by which a schema validator finds a schema is out of scope and system dependent. We expect schema validators to use mechanisms like command-line options and arguments, menus, environment variables, and any other user-interface mechanism implementors think their users will find helpful.
(Rationale: if I am receiving data from you, either I trust you or I validate the data. If I don't trust your claim that the document is valid, how on earth can I be expected to trust your claim that the schema at a given URI is the one we agreed to validate against? I can't be. So I need to have the right to tell the schema processor, "I don't care what the other guy said is a good schema, the schema *I* trust for this namespace is right *here*." Since the authoritative word must come from the user, not the document, and since we don't want to interfere with user interface design, it would be a huge mistake to prescribe a particular approach to allowing the user to say where to find schemas. Obviously, a processor can provide a 'trust the schemaLoc' option which will work in many cases.)
C The schemaLoc attribute also implies a claim that the relevant parts of the document conform to that schema.
(Rationale: we discussed this; some WG members would have been just as happy for schemaLoc to mean 'a schema for this NS is over there', without embodying any claims about the validity of the document. The WG as a whole, however, thought it preferable to adopt the view that the schemaLoc attribute further embodies a claim to validity against the schema referred to.)
[HT adds:] As the spec says: "On the other hand, in case a document author (human or not) created a document with a particular schema in view, and warrants that some or all of the document is schema-valid per that schema, we provide the schemaLocation and noNamespaceSchemaLocation [attributes] (in the XML Schema instance namespace, that is, http://www.w3.org/1999/XMLSchema-instance) (hereafter xsi:schemaLocation and xsi:noNamespaceSchemaLocation)."
[NM adds:] One subtlety relating to "C". Though one could argue that it is an obscure case, it is quite plausible for a document to provide schemaLocations for namespaces that it does not explicitly reference. For example, the attribute:
width="3 in"
is not qualified, but it is quite possible that the simple type used to validate the attribute contents is in a target namespace for which the document is free to offer a schemaLocation hint.
I don't think we should push too hard on the notion that the document is warranting its validity with respect to a certain schema or definition of a target namespace. I think there are fewer risks in stating that a schemaLocation allows the opportunity for the instance document to provide hints on the likely location of useful schemas for particular target namespaces. There is certainly no requirement that I am aware of that such namespaces be explicitly referenced in the document, or that they play a particular role in validation (though in practice, few processors would bother to find a schema document known to be for a namespace completely irrelevant to the validation.)
D The presence of a schemaLoc attribute does *not* constitute a request for validation.
(Rationale: there are many situations in which a document should be read, possibly by a processor which understands how to validate it, but does not need to be, or SHOULD NOT be, validated. A request for validation is a transaction between a user and a piece of software, or between two pieces of software. It is not a declarative fact about a document. It is best left to a user interface.)
E If more than one schema location is suggested for a particular namespace, it is not an error, but no particular priority is assigned to the two.
(Rationale: they are HINTS, right?)
F A validation process may start at any element in the document and work down.
(Rationale: Launching a validation process is taken to be a matter between a user and a piece of software, or between two pieces of software. It may sometimes be important to validate the entire document; sometimes only certain parts of the document need to be validated. Since the presence of a schemaLoc attribute does not constitute a request for validation (and its absence cannot be taken as a binding request *not* to validate), the user is free to select any point as the starting point. It may be expected that some schema validators will, by default, start at the top of the document. But it is important that they are not REQUIRED to do so.)
G A validation process may work in strict mode, lax mode, or skip mode. It may -- or rather, it must -- switch from mode to mode on the basis of the {process contents} property on the relevant schema component.
(Rationale: For some applications, it's essential to check every element and every attribute, and to insist that they be declared, roughly as in a DTD.
For some applications (black-box applications), it's essential to be able to specify that the schema applies only to some outer envelope, which contains well-formed XML as a payload, and that the payload does not need to conform to the schema and should be skipped entirely. Think of defining an information retrieval protocol like Z39.50 as a set of XML messages going back and forth. The envelope needs to conform to the schema, but the payload does not need to conform, and it would normally be a waste of cycles to try to validate the payload.
For some applications (white box applications), there may be a payload which need not be validated, and the elements in it need not be declared, but if elements are encountered for which declarations *are* available, they should be validated. In a template in an XSL stylesheet, for example, I may not care about validating the elements in the target namespace. But if I see another XSL element inside a target element, I probably do want to validate it.
So strict, skip, and lax are each necessary, because each describes a plausible approach to validation and to coexistence of schemas and namespaces.)
H An application may be guided by the {process contents} property on the relevant schema components, but it NEED NOT be.
(Rationale: the schema may have been devised for skip-processing, but for my purposes I may insist on lax or strict processing. My business partners may not care about the contents of the payload, but for my purposes I want to know that if the payload contains anything that claims to be a purchase order, then it jolly well conforms to my schema for purchase orders.)
[HST adds:] As currently written, the spec does not provide for conforming schema processors to ignore the 'processContents' attribute directly, i.e. when encountered, its impact on the [validation attempted] and [validity] properties must be as specified. Applications which want to be stricter have the following options (this is in the spec.) for documents which are apparently valid (their root EII has [validity]!='not'):
I think this is sufficient, but if we want to allow processors to force processContents to be stricter than a schema specifies, we need to add this to the spec.
I If in the schema the relevant {process contents} property has the value 'strict' or 'lax' or 'skip', this may be interpreted as a declarative statement that documents which conform to this schema must have no errors when processed in the specified mode. It follows that if a schema processor processes a black-box payload (declared with processContents='skip') in lax mode, and finds an error, the error in question is not a schema-validity error.
(Rationale: all schema processors should give the same results, as regards schema validity. If the schema says something should be skip-conformant, you do have the right to check it in strict or lax mode, but you and your processor do not have the right to call failure to conform to the rules of strict or lax mode a schema validity error. As long as the distinction is made between failure to conform with the restrictions laid out in the schema, and other failures, all is well. You might also want a processor to check to make sure the document is in ASCII, not UTF-8 or UTF-16. That's your right, and it's OK. But the processor is not allowed to claim that a UTF-16 document is ill formed on that account.)
[HST adds:] This point is important for the proper understanding of points G and H above: you can define your own validation property, say [strict validity], and get your processor to compute it, but you can't produce a PSV Infoset that records strict validity in the [validity] property -- the spec. defines what that property means, and you can't change that.
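The interplay of points G, H and I can be illustrated with a toy model. The sketch below is Python written for illustration only and is not drawn from the spec: the `DECLARED` table, the `check` and `report` functions, and the notion that a declaration is just a set of allowed child names are all simplifying assumptions. It shows only that an application may run a stricter check than the schema asks for, provided the extra findings are kept apart from schema-validity errors.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations: element name -> names allowed as children.
DECLARED = {
    "purchaseOrder": {"item"},
    "item": set(),
}

def check(elem, mode):
    """Collect problems found below `elem` when processed in `mode`."""
    if mode == "skip":
        return []                      # black box: nothing is examined
    problems = []
    for child in elem:
        if child.tag in DECLARED:
            # A declaration is available, so the child is checked against it.
            allowed = DECLARED[child.tag]
            for grandchild in child:
                if grandchild.tag not in allowed:
                    problems.append(f"{grandchild.tag} not allowed in {child.tag}")
        elif mode == "strict":
            # Only strict mode insists that every element be declared.
            problems.append(f"{child.tag} is not declared")
        problems.extend(check(child, mode))
    return problems

def report(payload, schema_mode, extra_mode=None):
    """Schema-validity errors come only from `schema_mode`; findings from a
    stricter `extra_mode` chosen by the application are reported separately."""
    schema_errors = check(payload, schema_mode)
    extra = check(payload, extra_mode) if extra_mode else []
    return {"schema_errors": schema_errors, "application_findings": extra}

payload = ET.fromstring(
    "<payload><purchaseOrder><price/></purchaseOrder><note/></payload>"
)
result = report(payload, schema_mode="skip", extra_mode="lax")
```

Here the schema (by assumption) marks the payload as skip, so `result["schema_errors"]` stays empty even though the lax pass chosen by the application finds a misplaced `price` element.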
I believe that the commentator was mostly shocked by rules B and F; I have included the others partly because I think they help make the picture more complete, and partly because some of them are becoming hobbyhorses of mine. And also because if I am wrong about any of them, now would be a better time to learn it than later.
The summary above may also be helpful for responding to issue 183.
I believe the WG has hashed out the rationales for our various validation rules at enough length that everyone is satisfied that what we have is what is needed, and that there won't be any consensus for changing rules B and F as implicitly suggested by issue LC-177.
If anyone does want to change rules B or F, or any of the others, or if anyone believes I have gotten either the rules or the rationales wrong, I am open to correction and/or discussion. But otherwise I propose that we reply to the commentator explaining why the rules are as they are, using the rationales above.
Discussed in call of 2000-06-30.
RESOLVED without dissent: to dispose of this issue with a polite no, and a rationale largely as described in MSM's note on this topic, with changes as per suggestions from Thompson and Mendelsohn.
MSM asked what form point C should take. In particular, should the provision of schemaLoc involve a warrant of validity by the author, or not? The WG discussed the issue. A straw poll showed a preponderance of evidence in favor of the status quo, according to which schemaLoc information does warrant the validity of constructs in the namespace, as defined by the schema document named.
Since there was no consensus for any change, the status quo was reaffirmed. MSM will edit [has edited] point C accordingly.
A plea for simpler prose in Structures.
Input from Murray Altheim:
Murray Altheim, Review of XML Schema Specification, 11 May 2000
Last Word
I have certainly realized during the writing of this document that at a certain point it ceases to be really useful as a review. Toward the end, my handwritten notes on my printed copy began to read "AAAAHHH" and "Huh?" more than intelligent comments. Had I several more weeks (and the fortitude to keep struggling with the text) I perhaps could produce comments on sections that at this time I do not understand. One of the things I found most troubling was that my early confidence in understanding the specification eroded as I delved more deeply into it.
The definition in the abstract, a very convoluted conformance section, global and local definitions, equivalence classes, the handling of namespaces, a difficult to understand validation definition, all contributed to shaking this early understanding out of me. I also have serious reservations about various features, such as the use of the xsi: namespace. I also am not confident that creating mixed-namespace schemas is a reasonable task.
I do not at this point lay any claim to understanding XML Schemas, except at roughly the same surface level as when I started. What has changed is that I no longer have any confidence that my current understanding will suffice in writing anything more than trivial XML Schemas that are functionally correct and validate an intended document model.
This may be seen as a harsh indictment of the specification, but it merely reflects my own inability to come to terms with what must be admitted is an exceedingly complex specification. I looked back over ISO 8879:1986 to see how it compared (since for whatever reason I never had this level of difficulty understanding SGML) and I can only suggest that the ISO specification is littered with short examples. If at all possible, I would recommend removal of the abstract definitions, localizing definitions to one place in the specification, providing a glossary, and attempting to simplify some of the features that (at least to me) deserve a reevaluation for inclusion in a later version. I don't think the public is ready for the current spec.
I fully realize the amount of energy that has gone into its production and sincerely regret that I wasn't able to provide a more favourable review.
-- Murray
May simple and complex type definitions which are not at the top level be named? What are the rules governing name uniqueness?
Cf. Question on "ref" attribute
Input from Peter Canning:
Peter Canning <canning@vitria.com> to www-xml-schema-comments@w3.org, Tue, 16 May 2000 18:36:47 -0700
Can a (simple or complex) type definition inside an element declaration have a name attribute? I can't find anything in the Structures draft of 4/7/2000 that disallows it.
For example (a slight modification from the Primer):
<xsd:element name='internationalPrice'>
  <xsd:complexType name='myEmptyType' content='empty'>
    <xsd:attribute name='currency' type='xsd:string' />
    <xsd:attribute name='value' type='xsd:decimal' />
  </xsd:complexType>
</xsd:element>
If that is legal, am I correct in assuming it's illegal to have two type definitions with the same name inside different element declarations?
PS: References to the text in 4/7/2000 spec to confirm the answers would be appreciated.
Should the XML Schema spec be reorganized with a view toward improving locality of exposition? (i.e. so that as far as possible, all the information in the spec about a given topic is given at the same place).
Input from XML Query:
Jerome Simeon <simeon@research.bell-labs.com> to www-xml-schema-comments@w3.org, Thu, 18 May 2000 11:47:39 -0400 (EDT), Subject: XML Query Comments to XML Schema (1st part)
Here is the first set of comments from the XML Query Working Group on the XML Schema last call Working Draft.
In this version, we address the following issues:
This list is not exhaustive and the XML Query WG will provide additional feedback at a later date.
- Jerome Simeon, on behalf of the XML Query WG
0. Introduction: Usage of schema for queries
There are many ways in which a schema might be useful information for a query language. Here are some of the uses of schema information that the XML Query Working Group finds important. This part can be seen as an XML Query use case of XML Schema.
A. query formulation: knowing the structure of the document can help the user writing the appropriate query
B. query typing: knowing the structure of the document can be used to detect errors in the queries
C. query optimization: knowing the structure of the document can be used, for example, to avoid unnecessary navigation in certain portions of the document
D. querying the schema: one might want to query the schema information itself
E. query semantics: knowing the type of values can be used to choose necessary coercions, e.g., when performing comparisons.
The following comments are formulated with these scenarios in mind.
1.1 Complexity of the XML Schema specification
The XML Query group is concerned with the difficulty in understanding the XML Schema specification, both in terms of conceptual complexity and in terms of presentation complexity. Notably, information is often scattered throughout the document in a way that makes it almost impossible to read sequentially. Commonality between schema components is not explicitly captured, and there are no overview tables to help with the problem. As a result, the naive reader finds it difficult to answer even simple questions about the abstract data model found in the Structures spec.
To facilitate the understanding of the document, we suggest it would be useful to enumerate all aspects of each Schema component at a single place. Notably, it would be useful to define what a complex type is at a single place.
Input from XML Query WG:
3. Suggestions for simplifying Schema Structures
The Query working group (QWG) think that the Schema Datatypes spec is well written and well designed. QWG do not know whether the Schema Structures spec is well designed; QWG do know that it is definitely badly explained, and until it is explained better, it is difficult to tell if it is badly designed. In general, it is difficult to get a good overview by reading the spec, and difficult to give good feedback, because the spec is very hard to read.
After extensive review, QWG think the best approach might be to reorganize the Structures specification so that there are two main sections, one devoted to the Abstract Data Model, the other devoted to the declaration syntax and the meaning of each kind of declaration. Concrete suggestions follow:
Formal response. Paul Cotton responds that the XML Query WG is satisfied. "We look forward to seeing any changes that you will make to improve the exposition in the XML Schema specifications."
Should XML Schema allow abstract types to be given as the type of an element in an element declaration?
Input from XML Query:
Jerome Simeon <simeon@research.bell-labs.com> to www-xml-schema-comments@w3.org, Thu, 18 May 2000 11:47:39 -0400 (EDT), Subject: XML Query Comments to XML Schema (1st part)
1.2 Abstract Types
Section 4.6 in XML Schema Part 0 describes the following use of abstract types:
<schema xmlns='http://www.w3.org/1999/XMLSchema'
        targetNamespace='http://cars.example.com/schema'
        xmlns:target='http://cars.example.com/schema'>
  <complexType name='Vehicle' abstract='true'/>
  <complexType name='Car' base='target:Vehicle' />
  <complexType name='Plane' base='target:Vehicle' />
  <element name='transport' type='target:Vehicle' />
</schema>
On the other hand Section 3.4 in XML Schema Part 1 says:
"A complex type for which {abstract} is true must not appear as the {type definition} of an Element Declaration (§2.2.2.1), and must not be referenced from an xsi:type (§2.6.1) attribute in an instance document; such abstract complex types can be used as {base type definition}s, but they are never used directly to validate element content."
This effectively forbids the schema in Section 4.6/XML Schema Part 0. In addition, it does not seem to be compliant with the constraints on Schemas in Section 5.2 (Element Declaration Properties Correct) and in Section 5.11 (Complex Type Definition Properties Correct) - although the latter is somewhat cyclic (referring back to Section 3.4).
The XML Query WG would like to use abstract types in element declarations. Therefore, the XML Query WG finds the above paragraph overly restrictive and asks to change it as follows:
"A complex type for which {abstract} is true must not be referenced from an xsi:type (§2.6.1) attribute in an instance document; such abstract complex types can be used as {base type definition}s and {element type definition}s, but they are never used directly to validate element content. Instead an xsi:type (§2.6.1) attribute must specify explicitly the non-abstract derived type for every element which is declared with an abstract type". Also the sentence "{type definition} must not be an abstract type definition." should be deleted from section 3.3.
For example, the following instance-fragment should be allowed
<transport xsi:type='target:Car'>Driving Directions ....</transport>
whereas, the following instance-fragment should not be allowed
<transport xsi:type='target:Vehicle'>Driving Directions and Flying Directions ...</transport>
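The rule requested here (an abstract type may appear in an element declaration, but an instance must then name a non-abstract derived type through xsi:type) can be sketched as a small check. This is an illustrative Python model only: the `TYPES` table and the function names are hypothetical, and type names are treated as plain strings rather than resolved QNames.

```python
# Hypothetical type table for the Vehicle example: name -> (base type, abstract?)
TYPES = {
    "Vehicle": (None, True),
    "Car": ("Vehicle", False),
    "Plane": ("Vehicle", False),
}

def derives_from(type_name, ancestor):
    """True if `type_name` is `ancestor` or is derived from it."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = TYPES[type_name][0]
    return False

def usable_type(declared_type, xsi_type=None):
    """Return the type used to validate an element, or None if not allowed."""
    chosen = xsi_type if xsi_type is not None else declared_type
    if chosen not in TYPES or not derives_from(chosen, declared_type):
        return None
    if TYPES[chosen][1]:
        # Abstract types are never used directly to validate content.
        return None
    return chosen
```

Under these assumptions, `usable_type("Vehicle", "Car")` selects `Car`, while `usable_type("Vehicle", "Vehicle")` and `usable_type("Vehicle")` are both rejected, matching the two instance fragments above.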
Discussed in call of 2000-06-23.
Henry Thompson proposed that this should be assigned to class C (bugs and simple errors): we intended it to work as described (except that the commentators propose wording that would rule out the use of abstract complex types for the exemplars of an equivalence class, which we do not wish to rule out). RESOLVED without dissent: to classify issue LC-181 as class C, and instruct the editor to make the appropriate changes.
Should XML Schema change its current rule that requires the binding of a generic identifier to a type to be the same for each occurrence of a generic identifier within a single complex type?
Cf. Local declarations: less or more
Input from XML Query:
Jerome Simeon <simeon@research.bell-labs.com> to www-xml-schema-comments@w3.org, Thu, 18 May 2000 11:47:39 -0400 (EDT), Subject: XML Query Comments to XML Schema (1st part)
1.3 Typing documents and queries with local types
If one considers the following simple XML document:
<authors>
  <author>Serge Abiteboul</author>
  <author>Peter Buneman</author>
  <author><first>Dan</first><last>Suciu</last></author>
</authors>
This document is well-formed. It can easily be defined by a user or generated by a query. However, because XML Schema does not allow distinct types for local elements with the same name, it is very difficult to provide a schema for it. The best we could come up with used a mixed element type for authors, which loses a fair amount of information. As a consequence, this particular limitation could make type checking for queries quite difficult.
The XML Query group does not yet fully understand which is the best way to solve this issue. However, the two following concrete proposals are considered as a means to address some aspects of the problem.
Proposal 1: Removing limitations on local elements
The limitation comes from XML Schema Part I: Structures, section 5.7
"If the {particles} contains, either directly, indirectly (that is, within the {particles} of a contained model group, recursively) or implicitly two or more element declaration particles with the same {name} and {target namespace}, all their {type definition}s must be the same."
Removing this limitation would address the problem, as it would allow one to write, for instance, the following type for the above document:
<xsd:element name="result">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="author" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string"/>
      <xsd:element name="author">
        <xsd:complexType>
          <xsd:element name="first" type="xsd:string"/>
          <xsd:element name="last" type="xsd:string"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
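The constraint quoted from section 5.7 amounts to a uniqueness check over the element declaration particles of one content model. A minimal Python sketch of such a check follows; the function name, the tuple layout and the placeholder name for the anonymous type are assumptions made for illustration, not anything defined by the spec.

```python
from collections import defaultdict

def conflicting_declarations(particles):
    """Find element names bound to more than one type in one content model.

    `particles` is an iterable of (name, target_namespace, type_name) tuples,
    one per element declaration particle reachable in the content model.
    """
    types_by_name = defaultdict(set)
    for name, namespace, type_name in particles:
        types_by_name[(name, namespace)].add(type_name)
    return {key: types for key, types in types_by_name.items() if len(types) > 1}

# Particles of the 'result' content model shown above; the third 'author'
# has an anonymous type, given a placeholder name here.
result_particles = [
    ("author", None, "xsd:string"),
    ("author", None, "xsd:string"),
    ("author", None, "#anonymous-first-last"),
]
conflicts = conflicting_declarations(result_particles)
```

With the rule in force, the non-empty `conflicts` mapping is what makes the schema above illegal; Proposal 1 amounts to dropping this check.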
Proposal 2: Using abstract types
Another approach could be to use abstract types along the lines suggested in 1.2 above. With this approach, a schema for the above instance could be constructed as follows:
<schema xmlns='http://www.w3.org/1999/XMLSchema'
        targetNamespace='http://used.science.org/schema'
        xmlns:target='http://used.science.com/schema'>
  <complexType name='Author' abstract='true'/>
  <annotation> This assumes that this is the ur-type definition</annotation>
  <complexType name='SimpleAuthor' base='target:Author' derivedBy='restriction' type='string'/>
  <annotation>This assumes that this is an allowed derivation from the ur-type</annotation>
  <complexType name='ComplexAuthor' base='target:Author' derivedBy='restriction'>
    <annotation>This assumes that this is an allowed derivation from the ur-type</annotation>
    <element name="first" type="string"/>
    <element name="last" type="string"/>
  </complexType>
  <element name='authors'>
    <complexType>
      <element name='author' minOccurs='0' maxOccurs='unbounded' type='target:Author'/>
    </complexType>
  </element>
</schema>
'ComplexAuthor' and 'SimpleAuthor' are both (complex) types derived from the abstract type Author. In effect 'author' in 'authors' can take either the concrete type 'SimpleAuthor' or the concrete type 'ComplexAuthor'.
Note that the instance must be changed to indicate the concrete type explicitly:
<authors>
  <author xsi:type="target:SimpleAuthor">Serge Abiteboul</author>
  <author xsi:type="target:SimpleAuthor">Peter Buneman</author>
  <author xsi:type="target:ComplexAuthor"><first>Dan</first><last>Suciu</last>
  </author>
</authors>
Input from XML Query WG:
2. Data Integration
There is a tension in Schema between expressiveness and ease of parsing. Schema disallows sibling elements to have the same name but different types, in order to ensure that a document can be parsed in a top down manner. This restriction makes difficult some aspects of data integration, as explained in Section 1.3 of the following.
XML Query Comments to XML Schema (1st part): <http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2000May/0146.html>
There appears to be a simple way to significantly increase expressiveness while not greatly increasing the complexity of parsing. Namely, remove the above restriction on sibling elements, and replace it with a different restriction: if sibling elements may have the same name but different types, then these elements must be labelled with xsi:type in the data.
We explain what this means by considering again the example in Section 1.3 cited above. This mentioned a data integration query which yielded data in the following form.
<authors>
  <author>Serge Abiteboul</author>
  <author>Peter Buneman</author>
  <author><first>Dan</first><last>Suciu</last></author>
</authors>
One might wish to describe this data with a schema of the following sort.
<xsd:element name="result">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="author" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string"/>
      <xsd:element name="author" type="first-last"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="first-last-type">
  <xsd:element name="first" type="xsd:string"/>
  <xsd:element name="last" type="xsd:string"/>
</xsd:complexType>
The given data cannot be parsed top-down if serialized as above. However, it could be parsed top-down if serialized with xsi:type information, as required by the above proposal.
<authors>
  <author xsi:type="xsd:string">Serge Abiteboul</author>
  <author xsi:type="xsd:string">Peter Buneman</author>
  <author xsi:type="first-last"><first>Dan</first><last>Suciu</last></author>
</authors>
Most data would not require xsi:type (in particular, any data that is permitted under the current proposal and does not require xsi:type under the current proposal would also not require xsi:type under the new proposal). However, data with sibling elements with the same name, which is not permitted under the current proposal, would be permitted under the new proposal, so long as xsi:type information is present. This would greatly ease data integration.
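The proposed replacement rule can be sketched as a single top-down pass: a child whose name has only one possible type needs no label, while a child whose name has several must carry xsi:type. The Python below is an illustration under stated assumptions: `CANDIDATE_TYPES` and `assign_types` are hypothetical names, and xsi:type values are compared as opaque strings rather than resolved as QNames.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/1999/XMLSchema-instance"

# Hypothetical content model: child name -> types that name may be bound to.
CANDIDATE_TYPES = {"author": {"xsd:string", "first-last"}}

def assign_types(parent):
    """Bind each child to a type in one top-down pass, or raise ValueError."""
    assigned = []
    for child in parent:
        candidates = CANDIDATE_TYPES.get(child.tag, set())
        label = child.get(f"{{{XSI}}}type")
        if len(candidates) == 1 and label is None:
            chosen = next(iter(candidates))      # unambiguous: no label needed
        elif label in candidates:
            chosen = label                       # ambiguous name, labelled in the data
        else:
            raise ValueError(
                f"<{child.tag}> needs an xsi:type from {sorted(candidates)}"
            )
        assigned.append((child.tag, chosen))
    return assigned

labelled = ET.fromstring(
    f'<authors xmlns:xsi="{XSI}">'
    '<author xsi:type="xsd:string">Serge Abiteboul</author>'
    '<author xsi:type="first-last"><first>Dan</first><last>Suciu</last></author>'
    '</authors>'
)
unlabelled = ET.fromstring("<authors><author>Peter Buneman</author></authors>")
```

In this model `assign_types(labelled)` succeeds without any lookahead, whereas `assign_types(unlabelled)` raises `ValueError` because the type of `author` cannot be decided from the name alone.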
To further ease data integration, it would be helpful for xsi:type to be able to refer to any type in a schema, including anonymous types. This might be achievable by use of XPath to select the anonymous type.
Discussed in call of 2000-06-30.
RESOLVED: to dispose of this issue with a polite no, saying that some members of the WG would like to relax the rule, in theory, but that in practice it appears to complicate matters too much.
Formal response to commentator. Philip Wadler says he does not wish to dissent.
Should the spec be revised to make the details of (and rationale for) lax validation clearer?
Input from XML Query:
Jerome Simeon <simeon@research.bell-labs.com> to www-xml-schema-comments@w3.org, Thu, 18 May 2000 11:47:39 -0400 (EDT), Subject: XML Query Comments to XML Schema (1st part)
1.4 Partially validated Instance and lax validation
We wonder about the reasons for and details of lax schema validation. Lax schema validation seems to allow for schema instances which make arbitrary extensions to the structure allowed explicitly by a schema in the form of additional elements or attributes.
Another issue is what type the query data model assumes for simple types of lax elements or attributes. The ur-type?
How does a schema author go about specifying a schema with parts in several namespaces, each namespace defined in a separate schema document?
Input from Steve Monk:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to Steve Monk <smonk@geocities.com> Cc: www-xml-schema-comments@w3.org, 18 May 2000 18:16:57 +0100, Subject: Re: [Moderator Action] problem with validating against multiple, independent schemas
I'm trying to validate an instance document that has been created using two separate schemas. Most of the examples of using multiple schemas that I've encountered are cases where one schema is imported into, and used to define elements in, the other. But I want the two schemas to remain independent until validation time. The example given in sections 5.5 and 5.6 of "XML Schema Part 0: Primer" seems to be doing almost what I want, so it seems like it should be possible.
Below is some markup to illustrate what I'm after. I've tried running this through two processors: Oracle XML Schema Validator 0.9 Alpha and XML Spy 3.0 Beta. The Oracle processor doesn't like the reference to the second schema -- it complains that the "Data" element (in the instance) is invalid (and then, of course, that all children have invalid namespaces). XML Spy has the opposite problem: it claims that everything from the second namespace is valid in the instance even when I change the instance so that it DOES contain invalid elements. (For example, I tried mis-spelling "Item" as "Ite" and XML Spy went happily right over it.) So, one processor refuses to accept the second namespace, and the other processor accepts it but then doesn't use it. (By the way, I also tried obvious alternatives such as in-lining the namespace in the "Data" element, as in the Primer example mentioned above, but nothing seems to work.)
So, I'm stuck, and my questions are: Is it in fact possible to do what I'm trying to do? If so, is my markup all right? If so, does anyone know of a processor that will validate the instance properly?
INSTANCE:
<?xml version="1.0"?>
<ImportedStuff xmlns="http://www.myorg.com/languageOne"
               xmlns:two="http://www.myorg.com/languageTwo"
               xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
               xsi:schemaLocation="http://www.myorg.com/languageOne languageOne.xsd http://www.myorg.com/languageTwo languageTwo.xsd">
  <Description>
    <Comment>Here is some data based on the other schema</Comment>
    <Author>Steve Monk</Author>
  </Description>
  <ExternalData>
    <two:Data>
      <two:Order>
        <two:Item>Amplifier</two:Item>
        <two:Quantity>2</two:Quantity>
      </two:Order>
      <two:Order>
        <two:Item>CD Player</two:Item>
        <two:Quantity>4</two:Quantity>
      </two:Order>
    </two:Data>
  </ExternalData>
</ImportedStuff>
SCHEMA ONE ("languageOne.xsd"):
<schema targetNamespace="http://www.myorg.com/languageOne"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:one="http://www.myorg.com/languageOne">
  <element name="ImportedStuff">
    <complexType>
      <element ref="one:Description" maxOccurs="1"/>
      <element ref="one:ExternalData"/>
    </complexType>
  </element>
  <element name="Description" type="one:descriptionType"/>
  <element name="ExternalData" type="one:externalDataType"/>
  <complexType name="descriptionType">
    <element name="Comment" type="string"/>
    <element name="Author" type="string"/>
  </complexType>
  <complexType name="externalDataType">
    <any namespace="##other" processContents="strict" minOccurs="1"/>
  </complexType>
</schema>
SCHEMA TWO ("languageTwo.xsd"):
<schema targetNamespace="http://www.myorg.com/languageTwo"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:two="http://www.myorg.com/languageTwo">
  <element name="Data">
    <complexType>
      <element name="Order" type="two:orderType" maxOccurs="unbounded"/>
    </complexType>
  </element>
  <element name="Item" type="string"/>
  <element name="Quantity" type="integer"/>
  <complexType name="orderType">
    <element ref="two:Item"/>
    <element ref="two:Quantity"/>
  </complexType>
</schema>
Input from Henry Thompson:
ht@cogsci.ed.ac.uk (Henry S. Thompson) to Steve Monk <smonk@geocities.com> Cc: www-xml-schema-comments@w3.org, 18 May 2000 18:16:57 +0100, Subject: Re: [Moderator Action] problem with validating against multiple, independent schemas
Your plan is fine, your instance is fine, your schemas have a small bug in them, which doesn't explain the failures you report.
Here's a fragment of the output from the validator you didn't try, namely XSV [1]:
Validation error: in unnamed entity at line 9 char 3 of file:/projects/ltg/users/ht/xml/xmlschema/monk/instance.xml:
element {http://www.myorg.com/languageOne}:Comment not allowed here in element {http://www.myorg.com/languageOne}:Description
1: {None}:Comment->2
2: {None}:Author->3
* 3:
There were 17 errors of this sort, all arising from your (perfectly reasonable) use of local element declarations. As currently spec'ed, elements declared locally must occur unqualified in instances. To validate your document as written, with no errors and the <any ...strict/> operating as you intended, I added the following to the <schema> element of your schemas:
elementFormDefault="qualified"
Try making that edit and giving the result to XSV -- you'll see you've got the effect you wanted.
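The mismatch behind those errors can be seen by comparing expanded names. The Python sketch below uses only the standard library parser on a cut-down copy of the instance; the variable names are illustrative, and the two "expected" names simply restate the rule described above (local declarations match unqualified names unless elementFormDefault="qualified" is set).

```python
import xml.etree.ElementTree as ET

TARGET_NS = "http://www.myorg.com/languageOne"

# Cut-down copy of the instance: the default namespace declaration
# qualifies every unprefixed element name, including Comment.
instance = ET.fromstring(
    f'<ImportedStuff xmlns="{TARGET_NS}">'
    '<Description><Comment>text</Comment></Description>'
    '</ImportedStuff>'
)
comment = instance.find(f"{{{TARGET_NS}}}Description")[0]
name_in_instance = comment.tag

# Name that a local declaration of Comment matches under each setting.
expected_if_unqualified = "Comment"
expected_if_qualified = f"{{{TARGET_NS}}}Comment"
```

`name_in_instance` equals `expected_if_qualified`, not `expected_if_unqualified`, which is why the instance validates only once the schema asks for qualified local elements.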
Review of XML Schema by the XML Core WG
Input from XML Core WG:
Paula Angerstein <paulaa@vignette.com> to www-xml-schema-comments@w3.org, Thu, 18 May 2000 16:10:19 -0500, Subject: Core WG comments on XML Schema wrt Infoset
The Core WG has reviewed the XML Schema Last Call public working draft of XML Schema 1.0 (dated 7 April 2000) with respect to the Infoset working draft. We appreciate the effort made to define validation in infoset terms and to cleanly layer schema infoset contributions onto the existing Infoset specification.
At this time we do not see any major issues. We encourage the Schema WG to review the infoset-related passages in the Schema spec as the Infoset spec moves from Last Call to CR to PR, as changes may be made in the spec.
Also, some members of the Core WG feel there may need to be additional schema information exposed in the Infoset, for example, to provide all the annotations associated with an element via its declaration and type hierarchy.
Another example of something that should be exposed via the XML Schema Information Set is an indicator of whether an element name is locally scoped. Without this information, XPath/XSLT processors and other software are not able to tell the difference between locally scoped names and global names (if they have the same local part and namespace) without looking at the schema.
More detailed comments on these topics may be submitted to the Schema WG during the CR period.
Respectfully submitted for the Core WG by Paula Angerstein
Should XML Schema provide a mechanism to allow names in another (foreign) namespace to be imported and 'naturalized' in the (native) namespace being defined (e.g. to allow use of those names without namespace prefixes or namespace qualification, or to allow more flexibility in reuse of existing namespaces by removing some awkwardness of reuse)?
Input from Daniel Veillard:
Daniel Veillard <Daniel.Veillard@w3.org> to www-xml-schema-comments@w3.org, Fri, 19 May 2000 08:41:30 +0200, Subject: Attribute remapping support in XML Schemas
This is too late for Last Call, but I would feel bad for not having raised publicly the following:
It seems that XML Schemas does not support attribute remapping, though this is a feature which would significantly ease the work of people trying to reuse attributes from a namespace different from that of the element. There seems to be some resistance to using namespace prefixes to distinguish attributes from foreign namespaces, and the XML namespace specification allows unprefixed attributes to be anchored in a namespace other than that of the container element.
"Note that default namespaces do not apply directly to attributes."
It would be extremely useful if XML Schemas could provide a reliable mechanism to assert the namespace associated with an attribute when it is not prefixed. One very good candidate for such support would be retrofitting the old unprefixed HTML link construct. Being able to anchor unprefixed "href" attributes into the XLink namespace using a Schema could solve in a rigorous way the problem of retrofitting HTML links within the XLink framework. I assume this would prove extremely useful in a variety of similar situations.
I also note that we don't need the name remapping per se, only the ability to anchor unprefixed attributes in a namespace.
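The behaviour quoted from the Namespaces recommendation is easy to observe with a namespace-aware parser. The Python sketch below is illustrative only; the element and the two `href` attributes are made up for the purpose, and the XHTML and XLink namespace names are used simply as familiar examples.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XLINK = "http://www.w3.org/1999/xlink"

link = ET.fromstring(
    f'<a xmlns="{XHTML}" xmlns:xlink="{XLINK}" '
    'href="plain.html" xlink:href="linked.html"/>'
)

element_name = link.tag                  # the default namespace applies here
attribute_names = sorted(link.attrib)    # but not to the unprefixed attribute
```

The element name comes back qualified with the XHTML namespace, while the unprefixed `href` is reported with no namespace at all; only the prefixed attribute lands in the XLink namespace. The mechanism asked for here would let a schema state that the unprefixed one belongs there too.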
thanks,
Daniel, speaking for himself.
Discussed in call of 2000-07-07.
Discussion showed that although there was some support for Henry Thompson's proposal to extend the equivalence class mechanism to global attributes, which would address this issue in part, there was not consensus in favor of adopting that proposal, or for keeping the question open.
RESOLVED: to dispose of 186 with polite not now, maybe later. Dissenting: Campbell (by proxy), Hollander (by proxy), Olken, Sperberg-McQueen, Thompson (by proxy).
Should XML Schema derive all numeric types from rationals?
Input from Graham Klyne:
Graham Klyne <GK@Dial.pipex.com> to XML schema comments <www-xml-schema-comments@w3.org>, Sat, 20 May 2000 16:51:51 +0100, Subject: XML schema -- numeric values
I would like to argue for a different approach to defining numeric values. Specifically, rather than defining three primitive numeric types based on IEEE(?) 'float', 'double' and decimal character sequences, I argue for defining a _single_ primitive numeric type, and defining the existing types in terms of that.
I think the primitive numeric type should be the rational numbers; i.e. the set of numbers that can be expressed as n/m, where n is an integer, and m is an integer greater than zero. The textual representation could be "n/m" or "(n,m)" using decimal radix representation. A canonical representation would have numerator and denominator reduced to lowest terms, or m=1 if n=0, and all leading zeros suppressed.
The current numeric types would retain their current syntax and semantics, except that their values would be defined by a projection from the space of rational values. (I am getting this comment out in a bit of a rush, so I apologize for not being more specific at this time.)
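The canonical representation described above (lowest terms, denominator 1 for zero, no leading zeros) can be sketched in a few lines. This is an illustrative Python helper, not a proposed lexical definition: the function name `canonical` is hypothetical, and only the "n/m" form is handled.

```python
from fractions import Fraction

def canonical(text):
    """Return the canonical 'n/m' form of a rational written as 'n/m'."""
    numerator, denominator = (int(part) for part in text.split("/"))
    if denominator <= 0:
        raise ValueError("denominator must be an integer greater than zero")
    value = Fraction(numerator, denominator)   # reduces to lowest terms
    return f"{value.numerator}/{value.denominator}"
```

For instance, `canonical("6/4")` gives `"3/2"`, `canonical("0/7")` gives `"0/1"`, and `canonical("-010/004")` gives `"-5/2"`.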
Why change?
I have come to this issue from my work in the CC/PP working group, which is defining a format for describing client capabilities and preferences, using RDF as a base. As part of this work, we are looking at other systems that perform similar functions with a view to designing a system that does not have semantics gratuitously incompatible with those systems.
One such effort is the IETF CONNEG format [RFC 2533] (of which I happen to be an editor). This uses rational number values. These have proven to be especially useful for handling millimetre values expressed in inches, etc. (e.g. RFC 2531 and <draft-ietf-fax-T30-mapping-xx.txt>.)
The plan (as I understand it) is that CC/PP, through RDF, will inherit the XML schema system for (at least) simple types.
I understand that rational numbers can be represented in XML schema by using a pair of integers, but that such an approach would not provide comparison (<=, >=, etc.) for such values.
I happen to believe that rational numbers are the natural underlying set for numbers processed by a computer system: all such numbers are some subset of the rational numbers (integers, floats, doubles, etc.)
So I would suggest the following reasons for a different approach to numeric values, using rational values as the basis for all numbers, with value restrictions for decimal, float, etc.
I believe that the definition of the value set should be separated from its lexical representation in a character string. I think this should apply _even_ in the XML schema environment where all attribute values must be represented as Unicode character sequences.
I understand it is intended that future RDF developments will inherit the XML schema type system, and also that the current RDF approach of having resources and literals as distinct values is under consideration. I don't think that it is appropriate that the RDF model, as a generic metadata representation format, should limit itself to dealing in values that are literal strings. In particular, I don't think it is right that RDF should end up using numeric types that are based on specific textual representations rather than a well-understood framework of values and associated operators (such as the rationals).
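As a rough illustration of the proposal in the comment above (it is not part of the comment, and not something the WG adopted), the sketch below models the rational value space with Python's standard `fractions.Fraction`. The helper names `parse_rational` and `canonical_rational` are hypothetical, and only the "n/m" lexical form is handled.

```python
from decimal import Decimal
from fractions import Fraction


def parse_rational(text: str) -> Fraction:
    """Parse the hypothetical "n/m" lexical form (m > 0) into a rational value."""
    numerator, denominator = text.split("/")
    n, m = int(numerator), int(denominator)
    if m <= 0:
        raise ValueError("denominator must be an integer greater than zero")
    return Fraction(n, m)


def canonical_rational(value: Fraction) -> str:
    """Canonical form: lowest terms, no leading zeros, and m=1 when n=0."""
    # Fraction always stores lowest terms, and stores zero as 0/1.
    return f"{value.numerator}/{value.denominator}"


# Different lexical forms project onto the same value.
half = parse_rational("0050/0100")
as_decimal = Fraction(Decimal("0.5"))

# One inch expressed exactly in millimetres, the kind of value the comment mentions.
inch_in_mm = Fraction(254, 10)
```

Because the value space is separate from the lexical form, `half == as_decimal` holds, and ordering comparisons such as `<=` work directly on the values rather than on pairs of integers.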
Discussed at Edinburgh ftf.
Agreed to endorse Mark Reinhold's detailed response to this comment.
Formal response to commentator.
Various notes and suggestions for the Primer from the Internationalization working group.
Input from I18n WG:
"Martin J. Duerst" <duerst@w3.org> to www-xml-schema-comments@w3.org, Sun, 21 May 2000 21:49:35 +0900, Subject: Some last-call comments
Some last call comments on XML Schema Part 0: Primer:
[numbers are chapters/sections]
1.: 'does not provide a definitive (from the W3C's point of view)': This can be read to mean that it may be the definitive view of somebody else; as the W3C has the definitive say over what's XML schema, this is misleading.
2.2 Why is ZIP decimal? What happens with 9-digit zips, e.g. 12345-6789?
2.2 'Address may appear with an attribute called country': Say that whether or not the attribute appears, it is added to the infoset.
2.2 'there is no default value for maxOccurs' ... 'must occur exactly once': This is confusing/conflicting.
2.4 'immediately following un-named type definition': Not following, but enclosed.
2.5 Explain why 'content="empty"' is necessary; if an element doesn't have content, isn't it empty automatically? (this comes at the end of 2.5, but it should be explained when 'empty' is introduced).
2.9 last sentence: 'may not have any element content' -> 'is not allowed to have any ...' [check all 'may' in the specs, there are other, similar cases]
2.3 'consist of a very few constraints' -> 'consists of very few constraints'
Formal response (directed to i18n WG by mistake).
Should XML Schema define a more compact method of defining enumerations of values, for the case when the schema author does not need to provide separate annotation for each?
Input from Martin Duerst:
"Martin J. Duerst" <duerst@w3.org> to www-xml-schema-comments@w3.org, Sun, 21 May 2000 18:10:53 +0900, Subject: Easier way to define enumerations
This is a last call comment to XML Schema Structures.
It should be possible to write
    <xsd:enumeration value='AK AL AR...' />
instead of
    <xsd:enumeration value='AK' />
    <xsd:enumeration value='AL' />
    <xsd:enumeration value='AR' />
    ...
[the example is from sect 2.3 of the Primer]
to get a more reasonable and compact notation, in particular for cases where the number of enumerated items is large.
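The relationship between the two notations can be sketched as a simple expansion. This is only an illustration of the commentator's idea, assuming a whitespace-separated list; the function name `expand_enumeration` is hypothetical and nothing like it appears in the specification.

```python
from xml.sax.saxutils import quoteattr


def expand_enumeration(compact_value: str) -> list[str]:
    """Expand a whitespace-separated value list into one enumeration facet per value."""
    return [
        f"<xsd:enumeration value={quoteattr(value)} />"
        for value in compact_value.split()
    ]


facets = expand_enumeration("AK AL AR")
```

One consequence visible in the sketch: a whitespace-separated form cannot express an enumerated value that itself contains whitespace, so the long form would still be needed for such values.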
Discussed at Edinburgh ftf.
General reluctance to make this change, since it would complicate the specification without adding expressive power.
Martin Duerst replies "Please take me down as 'not particularly pleased with outcome, but don't want to push it' on this one. [I.e. I want my dissent to be recorded, but not for the attention of the W3C Director.]"
Should XML Schema drop the current requirement that in the declaration of a complex type, the attribute declarations should follow the content-model declaration (if any)?
Input from Martin Duerst:
"Martin J. Duerst" <duerst@w3.org> to www-xml-schema-comments@w3.org, Sun, 21 May 2000 22:14:47 +0900, Subject: Attribute declarations after complex type definitions
This is a last call comment to XML Schema: Structures.
Currently, all attribute-related stuff in an element decl. has to come after all content-related stuff. This seems not very well motivated and definitely inconvenient, and should be changed to allow either complete mixture or having these both at the start and at the end (and just add those at the start and those at the end together), or if that's not possible, preferably at the start rather than at the end.
It is more natural to have them at the start, because that's how they appear in the instance. DTD syntax made most people have attribute decl. after element content decls., but there is no need to do so for Schemas.
Discussed at Edinburgh ftf.
We think the evidence actually points the other way (i.e. left to their own devices, people wish to write or read the content model first): DTD authors are not constrained by DTD syntax to declare the element and its attributes in a specific order.
In general, if there is no particular reason to allow flexible ordering, it is better to impose an arbitrary ordering; in this case fixing an order appears to us a better design decision.
Since we have been requested to minimize arbitrary changes in the transfer syntax, it seems better to us not to make this change.
Response from MSM, 20 June 2000.
Martin Duerst not convinced.
Summary of comments from XForms Group (formally part of the XHTML WG). Several points of common interest are identified:
Input from XForms WG:
Micah Dubinko <MDubinko@cardiff.com> to "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>, Mon, 22 May 2000 12:06:06 -0700, Subject: Last Call comments from the XForms Working group
At the request of the chairs of the XForms group, and Dan Connolly, I am forwarding this to the public last call comments list.
[Full text of the attached message is at http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000AprJun/att-0237/01-xschema-last-call-response.html
To summarize, this document notes possible differences in the target audiences of XForms and Schema, and calls out the following areas where greater integration between XML Schema, XForms, and P3P can be pursued:
We would like to see the formation of a "Core Data Model Task Force", to forge a mutual understanding of the requirements of the XSchema, XForms, and P3P groups, with particular emphasis on the syntax complexity issue. We would like to work together to build a core data model, or set of complementary data models that can be deployed across the W3C.
Input from XForms WG:
Ray Waldin <rwaldin@lexica.net> to "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>, Wed, 24 May 2000 12:32:55 -0700, Subject: Analysis of using XML Schemas Datatypes in the XForms Data Model
The chairs of the XForms Working Group have asked me to deliver the following gap analysis document to the XML Schema Working group.
Analysis of using XML Schemas Datatypes in the XForms Data Model (http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000AprJun/att-0243/01-xform.datamodel.via.xschema.html)
Discussed in call of 2000-07-13.
No action seems to be needed.
RESOLVED unanimously: to close this issue number, noting that all the substantive points raised are tracked by other issues.
Comments from the DOM Working Group.
Input from DOM WG:
"Lauren Wood" <lauren@sqwest.bc.ca> to www-xml-schema-comments@w3.org, Thu, 25 May 2000 10:52:21 -0700, Subject: DOM WG comments on the schema Last Call
The DOM WG unfortunately has not been able to go through the Last Call Schemas WD in depth. We have not noticed any problems in our overview. There is one issue we would like to raise, which is that there is no infoset defined for the schema. The lack of an infoset for XML caused the DOM WG problems in Level 1, and we would like to avoid such problems reoccurring by recommending that the Schemas WG define an infoset to go with the XML Schema definition language.
regards,
Lauren Wood, Chair, DOM WG
Input from Noah Mendelsohn:
Noah_Mendelsohn@lotus.com to www-xml-schema-comments@w3.org, Thu, 25 May 2000 15:35:30 -0400, Subject: DOM WG comments on the schema Last Call
>> there is no infoset defined for the schema
I am increasingly intrigued by the notion (which I have mentioned privately to one or two members of the workgroup) that we should rename our schema components "element declaration information item", "complex type definition item", etc.. We have gone to great lengths to define the analog of infoset for schemas, and it is obvious that there is confusion about what we have done. Furthermore, this change makes clear the relationship of our PSV contribution components to the rest of the infoset.
If the legal values for one attribute depend on the value given for some other attribute (or vice versa), how can that be expressed in XML Schema?
Input from Peter van de Hoef:
Peter van de Hoef <peter.vandehoef@springsite.com> to www-xml-schema-comments@w3.org, Thu, 25 May 2000 21:44:58 +0200, Subject: Instance-parameterised constraints
We are working on an application based on XML Schemas that supports configurable products. We would like to specify constraints in schemas based on the values of certain elements in the instance document. This is what you call instance-parameterised constraints. The current version of the XML Schema specs do not support this but could you give me some hints about the way you would solve this?
We are thinking in the direction of specifying expressions instead of constant values in occurrence constraints or other attributes. See example below. What's your opinion?
For example the following should be possible. If the value "VW Golf" is specified for "Model" then a "CarrierSet" can be chosen too (maxOccurs="1"). If the value "VW Cabrio" is specified for "Model" then a "CarrierSet" is not allowed (maxOccurs="0").
    <complexType name="Car">
      <element name="Model">
        <simpleType base="string">
          <enumeration value="VW Golf" />
          <enumeration value="VW Cabrio" />
        </simpleType>
      </element>
      <!-- CarrierSet. Not allowed when Model "VW Cabrio" is chosen. -->
      <!-- CarrierSet. Allowed when another Model is chosen. -->
      <element name="CarrierSet" type="boolean" minOccurs="0"
               maxOccurs="'Model' == 'VW Cabrio' ? '0' : '1'" /> <!-- Expression instead of constant value -->
    </complexType>
With regards,
Peter van de Hoef
Springsite
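Since the schema language described here cannot express such a constraint, one way to picture it is as an application-level check run on the instance. The sketch below is an assumption-laden illustration, not a proposed mechanism: the function name `carrier_set_count_ok` is hypothetical, and the instance shape (a `Car` element with `Model` and `CarrierSet` children) is inferred from the commentator's example.

```python
import xml.etree.ElementTree as ET


def carrier_set_count_ok(car: ET.Element) -> bool:
    """Mirror the expression 'Model' == 'VW Cabrio' ? '0' : '1' used for maxOccurs."""
    model = car.findtext("Model")
    max_occurs = 0 if model == "VW Cabrio" else 1
    # minOccurs is 0 in the example, so only the upper bound needs checking.
    return len(car.findall("CarrierSet")) <= max_occurs


golf = ET.fromstring(
    "<Car><Model>VW Golf</Model><CarrierSet>true</CarrierSet></Car>"
)
cabrio = ET.fromstring(
    "<Car><Model>VW Cabrio</Model><CarrierSet>true</CarrierSet></Car>"
)
```

Here `carrier_set_count_ok(golf)` is true and `carrier_set_count_ok(cabrio)` is false, which is the behaviour the comment asks the schema itself to express.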
Should XML Schema be aligned more closely with object-relational schema languages?
Input from Michael Stonebraker:
Charles Campbell <campbelc@informix.com> to www-xml-schema-comments@w3.org, Fri, 26 May 2000 23:17:39 -0700, Subject: Fwd: XMLSchema
Here are the comments I received from Mike Stonebraker. I have others within Informix looking at the document to see how they would affect the OR, Object Relational Model.
I hope this is the correct list to submit these comments on. Let me know if I need to do something else.
Thanks,
chuck
Mike Stonebraker <mike@informix.com> to herbach@informix.com, Fri, 26 May 2000 09:17:19 +0100, Subject: XMLSchema
I come from the following point of view.
It must be possible to map XMLSchema into an OR Schema, so that a "shredder" will work. In addition, it is highly desirable if the OR schema can be updated directly (and not through XML). For example, one might want to directly enter purchase orders and then export them as XML objects, obeying XMLSchema.
Hence, it is desirable if XMLSchema constraints can be mapped to OR structural constraints and not supported by an ingest function.
In this regard, the following stuff seems problematic:
1) minoccurs and maxoccurs -- there is no corresponding SQL construct
2) choice -- this is basically to support union types, and SQL doesn't have them
3) all -- allows items to appear in any order, and XML is order sensitive. Not clear that there is an easy way to support this....
4) equivalence -- allows two types to be declared equivalent -- no corresponding construct in SQL.
There are probably some more cases that are ugly -- these are just the ones that occurred to me on a quick reading of the XMLSchema document.
/mike
Discussed in call of 2000-06-23.
The WG discussed the points raised. Some WG members observed that mappings into the OR model are in fact possible for some of these items: SQL 99 has variable arrays which effectively provide a maxOccurs value, though not a minOccurs; choices require the use of flags for unions, or other special handling, but are not impossible to map. The ALL feature, ironically, was added largely in response to repeated requests from people who felt it would simplify the task of mapping to and from relational databases.
In any case, all the features identified are firm requirements for users other than those working with OR database management systems, and cannot be removed, even if they do make life a bit more awkward for DBMS vendors. The applications of XML, and therefore of XML Schema, extend both to DBMS-style data management and to documents and document management.
RESOLVED without dissent: to close this issue without change to our design, with the rationale outlined.
Oral discussion with commentator indicates he is, if not happy, at least resigned.
Eliminate the term obtains in its intransitive usage (e.g. on the grounds that it is archaic)?
Input from Martin Duerst:
"Martin J. Duerst" <duerst@w3.org> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:09:34 +0900, Subject: 'obtain' in the sense of 'succeed' is archaic
Dear XML Schema WG,
Part 1 repeatedly uses 'obtain' in the intransitive sense of 'succeed'. According to Webster (http://www.m-w.com/), this use is archaic. Please replace it by 'succeed' throughout.
Regards, Martin.
Formal response (directed by mistake to i18n WG).
Should the XML Schema spec be revised to include graphic representations of the important concepts?
Input from Martin Duerst:
"Martin J. Duerst" <duerst@w3.org> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 13:08:06 +0900, Subject: These specs need some graphics
Dear XML Schema WG,
Both part 1 and part 2 would gain enormously by the addition of a few simple graphics, e.g. showing the type hierarchy(ies) or the various ways to derive/define elements, attributes,...
Regards, Martin.
Formal response (directed to i18n WG by mistake).
Should the datatypes spec allow negative values for precision?
Input from I18n WG:
"Martin J. Duerst" <duerst@w3.org> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 19:16:57 +0900, Subject: Part 2: Precision/scale
The current design of precision/scale excludes a serious part of the solution space by restricting 'scale' to be nonNegativeInteger. It would be easy to change this to Integer; a negative value of x would mean that the lowest -x values before the decimal point are zero. This makes the value space more uniform and allows to address a large range of applications, in particular in the financial market area, where storing/transmitting data in thousands/millions/billions is very frequent.
Discussed at Edinburgh ftf.
Requested clarification from commentator.
Discussed in call of 2000-07-27.
Jim Trezzo, as respondent, summarized the question. We should separate out the value-space implications from the lexical-space implications. The value-space implications would just be that the last n digits of the integer must be 0. Oracle and RDB do allow this; as far as JT can tell, this usage is not widespread. The lexical-space implications are that one might say if scale is -3, you write 5 and mean 5000. Or perhaps you would write 5000, and say that 5123 would be outside the value space.
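The two readings distinguished in this summary can be sketched as follows. This is an illustration under stated assumptions, not WG-agreed semantics: the function names `in_value_space` and `read_scaled_literal` are hypothetical, and the treatment of non-negative scale in the second function (literal taken at face value) is an assumption made only to keep the sketch complete.

```python
from decimal import Decimal
from fractions import Fraction


def in_value_space(value: Decimal, scale: int) -> bool:
    """Value-space reading: no non-zero digits below the 10**(-scale) position."""
    # Exact rational arithmetic avoids any dependence on decimal context precision.
    return (Fraction(value) * Fraction(10) ** scale).denominator == 1


def read_scaled_literal(literal: str, scale: int) -> Decimal:
    """Lexical-space reading: with a negative scale, the literal counts units of 10**(-scale)."""
    if scale >= 0:
        return Decimal(literal)  # assumption: literal is taken at face value
    return Decimal(literal).scaleb(-scale)
```

With scale -3, `in_value_space(Decimal("5000"), -3)` holds while `in_value_space(Decimal("5123"), -3)` does not; under the lexical reading, `read_scaled_literal("5", -3)` yields a value equal to 5000. The gap between those two readings is the potential for confusion that the WG members raised.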
Some clarifications. Some WG members argue that it could easily be confusing. There was no desire to do this for 1.0.
RESOLVED: to close issue LC-197 with a polite no, not for 1.0. Rationale: confusion, need to consider this in the context of the general units question, general lack of user interest. N.B. in precision scale, exponent is positive, and should be negative. up-caret s, should be up-caret -1. Should be fixed in next release. Current point release is wrong.
Formal response to commentator. Commentator indicates it's acceptable, if the issue comes up again in considering new functionality for later versions.
Should XML Schema processors provide type information (full type declarations) as part of the PSV Infoset?
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
Here is the second set of comments from the XML Query Working Group on the XML Schema last call Working Draft.
In this version, we address the following issues:
This list is not exhaustive and the XML Query WG will provide additional feedback at a later date.
- Paul Cotton, on behalf of the XML Query WG
2. XML Query data model related issues
2.1 Treatment of anonymous types
XML Query will require access to explicit schema information for every element and attribute in order to know, e.g. what kind of operations are legal on those nodes.
The current prescription in XML Schema Part 1: Structures, section 3.3, is that if the name of the actual type definition "is absent, schema processors may, but need not, provide a value unique to the {type definition} of the declaration." Besides that this unique value is rather mysterious, Query will require something which is both mandatory, and consistent with the treatment of named types.
For the query language, we may not need the "identity" of an anonymous type - wouldn't it be sufficient to have the type definition itself? For anonymous types, equality of type can reasonably be defined as structural equivalence. For heavy users of anonymous types, that would lead to enormous redundancy in the PSV-infoset, and suggests that the infoset contributions should also include new Type Information Items that could be referenced from the EIIs.
This would be advantageous for named types too, to save users of the PSV-Infoset from having to locate schemas (except for more general schema investigation). We would like to offer the following proposal for consideration:
Schema Infoset Contribution: Element Validated by Type (Structures 3.3):
First, insert a Type Information Item for the actual type definition into the set of TIIs (see below). Since the TIIs form a set, duplicates are not inserted.
[Note that this requires some work on detailed definition of equality of anonymous types. Also namespaces must be added to named type definitions to avoid false elimination of apparent duplicates. However, for anonymous types one probably wants to ignore the namespace if the types are structurally the same.]
Then add the following to the EII:
Schema Infoset Contribution: Type Information Item
The set of TIIs that need to be referenced within the PSV-Infoset (except for the builtin simple types - and other types defined within the Schema spec?).
A TII has the structure of an EII in the Infoset for the schema that defines the corresponding <simpleType> or <complexType> element.
So navigating a TII would be equivalent to going to the schema and navigating the type definition.
Basically, a user of the PSV-Infoset would always have the content of any type definition handy (or known already from the Schema spec if in that namespace), and would also have the names of named types for strong type checking where needed.
The TII would carry the simple|complex information, so [type definition type] is not needed in the element SISC.
Also [type definition anonymous] can be omitted, since it is redundant with absence or presence of a [type definition name].
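The set-of-TIIs idea in the proposal above can be pictured with a small data-structure sketch. Everything here is a simplification for illustration: the class names `TypeInfoItem` and `TypeInfoSet` are hypothetical, the `structure` tuple is a stand-in for a real type definition, and the equality rule follows the bracketed note in the proposal (namespace plus name for named types, structural equivalence ignoring namespace for anonymous ones).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeInfoItem:
    """Hypothetical stand-in for a Type Information Item (TII)."""
    namespace: Optional[str]  # target namespace of the declaring schema
    name: Optional[str]       # None for anonymous types
    kind: str                 # "simple" or "complex"
    structure: tuple          # simplified stand-in for the type definition

    def key(self):
        # Named types: namespace + name, to avoid false elimination of
        # apparent duplicates. Anonymous types: structure only.
        if self.name is not None:
            return ("named", self.namespace, self.name)
        return ("anonymous", self.kind, self.structure)


class TypeInfoSet:
    """Set of TIIs; element information items would hold references into it."""

    def __init__(self):
        self._items = {}

    def insert(self, tii: TypeInfoItem) -> TypeInfoItem:
        """Insert unless an equal TII is already present; return the shared item."""
        return self._items.setdefault(tii.key(), tii)

    def __len__(self) -> int:
        return len(self._items)
```

Because `insert` returns the already-stored item when a duplicate arrives, many elements using structurally identical anonymous types would share one TII, which is the redundancy saving the proposal is after.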
Discussed in face to face meeting of 1-2 August 2000.
What type information does / should / may / must a schema processor provide to downstream apps? Five answers are on the table:
Old options (no longer with any support?)
Newer options (still supported by some WG members?)
The difference between the type-information-items and component-information-items proposals is that the element information item for complexType has lots of things in it relating to the structure of the schema in the transfer syntax; the component information item has different properties, which relate to concepts in the abstract model rather than to structures in the transfer syntax. At one level: one deals with material discussed in chapter 3, the other with material discussed in chapter 4 of the spec.
One important point of difference among the proposals is in the handling of anonymous types; this is rather difficult in the status quo, and may be difficult (suggested some WG members) with the augmented-schemadoc proposal. Proponents of the augmented-schemadoc proposal suggested that one might think of querying the schema as one thinks of querying an instance document; there is a clear analogy with the system tables in SQL systems. One might want to take a 'view' of that syntax, however, that doesn't have imports and includes and is thus disconnected from the information one would see in an ordinary instance document. Some WG members wondered whether it would be possible to formulate the component-information-item proposal with a kind of concrete syntax 'view' so that it can be looked at the same way, thus getting the best of both worlds. Some suggested that what was required would be related to the work on an XML Schema dump format that some WG members have talked about; there were concerns, however, that this work would not be completed early enough for inclusion in a CR.
Some WG members wished to continue work (in the longer term) on an XML wrapper for the component-information-item proposal, but emphatically did not want to expose schema information at the schema-document level: we have carefully distinguished the abstract and transfer-syntax levels partly because some conforming schema processors will never have schema documents in the transfer syntax; it would be wrong to force them to generate a schema-document form of their schema just for the sake of exposing the schema to downstream applications.
Some WG members suggested that we should identify properties that might be expensive as infoset items and make them optional as infoset items.
Resolved: to specify that conforming schema processors may, but need not, make schema information available to downstream applications. (This does not affect the ability of downstream applications to say that such information is required as part of their input.) It is not required that conforming schema processors provide a mode in which this information is available.
Among the options before the WG, the large majority preferred the last. Resolved unanimously: to dispose of issues 162 and 198 by instructing editor appropriately wrt CIIs.
[Issue of whether the 'entire' schema must or may be exposed, or only the parts of the schema exercised by the document, was decided in the call of 2000-08-10: since there would be no way to test conformance of a requirement that the entire schema be dumped, it is not required that a processor make any components accessible other than those actually used in validating the document. Processors are not forbidden, however, to make other components available to downstream applications, and thus may, in practice, dump the entire schema.]
RESOLVED unanimously: to instruct the editor to make clear in the spec that any schema processor which provides schema-component information to downstream applications must provide access to:
Should XML Schema provide a standard representation for the schema information items of schemaless (DTD-only, or DTD-less too) documents?
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
2.2 Schema for schemaless documents
We do require a standard way to represent the "schema" of documents which have DTD's or do not have any schema at all. In particular, we need to have a representation for the ur-type.
Discussed in call of 2000-06-16.
Thompson noted that when a user requests schema validation of a given document it need not (depending on environment and run-time options we don't specify) be an error if no schemas are found for the document.
Connolly proposed to reclassify this issue as class A, and reply along the vein suggested by Thompson. Various WG members suggested that leaving the type information properties absent in the cases described by HST was probably not useful for XML Query, and argued that we should respond by providing the information requested. There will be interactions with provisions for lax vs. strict validation, and possibly with the user's run time environment, which require some thought. We should show XML Query a first draft and ask if it answers their needs.
RESOLVED: to appoint Thompson, Robie, Beech, and Malhotra as respondents for this issue, and ask them to prepare a draft response.
Discussed at face to face meeting of 1 August 2000.
The task-force proposal (mailed to IG on 17 July by Ashok Malhotra) was summarized thus:
There was some discussion of what interpretation to give the urType in this context. In one view, it is the union of all value spaces; in another it is simply the string type. This matters, for example, in a search for all the strings in a document: either it will return everything known to be strings or else it will return all the character sequences in the document except those in mixed content.
A large majority of the WG preferred to interpret an undeclared attribute or an element with only #PCDATA content as having the urSimpleType, rather than having the type string.
Resolved unanimously: to dispose of LC 199 by adopting the proposal of the task force, with the interpretation just agreed on.
Sometimes the order of children is significant in an XML element, and sometimes it is not. For some types, order is never significant; for others, order is sometimes significant; even when order is significant, it may be irrelevant and ignored for certain kinds of processing.
Should XML Schema provide a mechanism for allowing a schema author and/or a document author to specify that the order of children is, or is not, significant?
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
2.3 Treatment of collections
In processing a query, sometimes the order of children in an element is relevant and sometimes it is not. In the case where order is not relevant, additional optimizations may be performed. It would be helpful if schema could provide some way to indicate whether the order of the children is significant. For instance, this might be done by giving a type an `ordered' property. Thus, just as the content of a non-empty element is always either mixed or elementOnly, it also might be either ordered or unordered.
Discussed in teleconference of 2000-06-23:
The WG discussed this issue; it was clarified that the ALL group does not address this question, both because the question applies to any case where maxOccurs > 1, and because the sequence of items in an ALL group may be used to carry information (precisely because it is not constrained). Some WG members felt that the proposal would impose an unacceptable burden on schema authors, who would be forced to make in advance a distinction which is actually relevant only seldom; others felt that the distinction between ordered and unordered children in an element was fundamental and would be answered as a matter of course by any plausible document analysis procedure -- it is the inability to record that information, they maintained, that constitutes a burden on schema authors.
It was believed by some WG members that this issue is tied in with other issues relating to min- and maxOccurs, and should be decided in tandem with them. Specific issue numbers were not identified, however; the issues in question may be among those raised in the third message from the Query WG, which has not yet been processed into the lcissues document.
Discussed further in call 2000-07-20.
The question is on a proposal by Paul Cotton (reviving, in some sense, a proposal discussed during development) to allow a schema author and/or a document author to specify that the order of children is, or is not, significant. The chair suggested that there were three basic possibilities:
N.B. proposals all have override only in the direction from significant to non-significant, not vice versa: it's a sticky bit.
Discussion elicited two further variations on the proposal:
A number of points were raised in the discussion; some but not all are noted here.
One argument which swayed several WG members was the observation that the ordered/unordered information would not have any effect on validation: it is there in order to be passed through to downstream applications. Some inferred from this that the information doesn't really belong in the schema proper, but instead in the appinfo; this led to proposal D.
Some noted that the rationale given for including provision for such information always seems to involve relational data, but SQL and other relational schemas do not provide hooks for this kind of information: why need we do so? Others said that relational schemas provide no such hooks, because relational systems consistently specify that rows (and columns) are intrinsically unordered, and a user who wishes to impose an order on the presentation of rows and columns must always do so explicitly. Since XML documents intrinsically have ordered children, a query system might simply always assume that sequence is always and everywhere significant and must always be preserved -- but many search optimizations rely on being able to destroy or ignore the ordering of elements. It was observed that a system might ignore ordering for all query operations (and thus be able to use these optimizations), while keeping track of original document position in order to allow resorting of the result into document order at the end; other WG members conceded that this is true but argued the final sort is too expensive an operation to perform when unnecessary.
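The "ignore order during the query, then re-sort at the end" approach mentioned in this discussion can be sketched as below. This is a toy illustration, not a description of any query system: the function name `select_in_document_order` is hypothetical, and a Python set merely stands in for whatever order-destroying optimisation a real engine might use.

```python
import xml.etree.ElementTree as ET


def select_in_document_order(root, predicate):
    """Filter children without relying on order, then restore document order."""
    # Record each child's original position once, up front.
    positioned = list(enumerate(root))
    # Stand-in for an order-destroying optimisation: a set has no defined order.
    matches = {(pos, child) for pos, child in positioned if predicate(child)}
    # The final sort is the extra cost that is wasted when order is not significant.
    return [child for pos, child in sorted(matches, key=lambda pair: pair[0])]


root = ET.fromstring("<r><a n='3'/><a n='1'/><a n='2'/></r>")
result = select_in_document_order(root, lambda e: e.get("n") != "1")
```

If a schema (or appinfo) could state that order is not significant for these children, the final `sorted` step could simply be skipped, which is the saving the proponents had in mind.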
The focus on query systems is slightly misleading, said some: many formalisms distinguish systematically among sets, bags, and sequences, and these distinctions often determine the legality or meaning of various operations.
A straw poll taken as time was running out showed a preponderance of preference and acceptability for proposal D (define an element, possibly in a namespace other than XSD, for use in appinfo, and show how to use it).
Matt Fuchs, Jonathan Robie, and Peter Chen agreed to serve on a task force to work out a more detailed description of proposal D.
Discussed further in call 2000-07-21.
We had concluded, at the end of Thursday's call, that we have a preponderance of opinion in favor of defining an element to distinguish sequences from sets, for use in appinfo, and that we wish to see a more fully worked out proposal before making a final decision. Matt Fuchs, Jonathan Robie, and Peter Chen had agreed to serve on a task force to draft such a proposal, after the bridge connection had dropped.
Jim Trezzo and Frank Olken volunteered to serve on the task force. Noah Mendelsohn agreed to chair.
Paul Cotton replies on behalf of XML Query that the XML Query WG is satisfied.
Should XML Schema support key references across document boundaries?
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
2.6 Referential mechanisms across multiple documents
Query has a requirement to query across collections of documents, which implies that we will need referential mechanisms other than URI references (e.g., keys/keyRefs) across multiple documents. In version 1, the reference mechanisms defined by Schema are restricted to a single document. Mechanisms such as XPointer might address inter-document references if extended to support the keyRef datatype. We believe there is a future requirement for referential mechanisms between documents.
Discussed in call of 2000-07-07.
RESOLVED unanimously: to dispose of LC-201 by saying yes, we agree this is a relevant topic for further work and possible extensions to various specs, though we do not believe it is feasible to address this question in XML Schema 1.0. There *is* relevant functionality for cross-document reference in XML Schema 1.0, though there is none for cross-document key checking.
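Illustration (not part of the resolution): the kind of cross-document key checking that XML Schema 1.0 does not provide would have to be done by the application. The sketch below assumes two hypothetical documents, one declaring `part` elements with an `id` attribute and one referring to them through a `partRef` attribute; these names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: keys live in one, references in the other.
PARTS_DOC = "<parts><part id='p1'/><part id='p2'/></parts>"
ORDER_DOC = "<order><item partRef='p1'/><item partRef='p9'/></order>"


def dangling_refs(key_doc: str, ref_doc: str) -> list[str]:
    """Return references in ref_doc that match no key in key_doc."""
    keys = {part.get("id") for part in ET.fromstring(key_doc).iter("part")}
    refs = [item.get("partRef") for item in ET.fromstring(ref_doc).iter("item")]
    # A reference is dangling when the other document declares no such key.
    return [ref for ref in refs if ref not in keys]
```

Here `dangling_refs(PARTS_DOC, ORDER_DOC)` reports the one reference that has no key in the other document.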
Should the post-schema-validation info set have properties for the 'internal' representation of data values (e.g. for a binary form of integer)? Should the XML Schema WG cooperate with other WGs (XML Query, DOM, ...) to decide this issue?
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
2.7 Internal representation of datatypes
Schema defines datatypes in the PSV Infoset for Query to access. The PSV is extracted from the XML document by a PSV enabled parser. The Query WG is interested in working together with the Schema WG and other working groups, e.g., DOM, to determine whether the physical representation of each schema primitive datatype (e.g., floating point numbers) should be an optional PSV characteristic. This would increase interoperability by moving the conversion of datatypes into the realm of a PSV Schema processor.
Some members of the Query WG believe that this comment encroaches on implementation details, but would like to further discuss this issue with the Schema WG.
Discussed in call of 2000-06-16.
There was discussion of merging this with other infoset-related issues, but in the end the WG agreed that this question is out of scope. Some WG members (at least MSM) thought it might be in scope for DOM, but the majority felt otherwise.
RESOLVED without dissent: to reclassify this issue as B (out of scope), and note in our response that a strong majority of the WG feel that an internal representation of values is an implementation issue which has, in principle, no place in the definition of an information set. It is thus out of scope for XML Schema.
Formal response. Commentator is satisfied. "We expect to have to deal with this issue when we define operators for the Schema data types. We hope Schema will work together with us on this problem."
Parts 1 and 2 seem to be out of alignment on the properties of simple types.
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
2.8 Infoset contributions for simple types
There are differences in the infoset for simple types (datatypes) between part 1 and part 2 of the schema spec:
A. The part 1 spec has an [abstract] property. The part 2 spec does not.
B. The part 1 spec does not have the property [fundamental facets]. Except for "bounds", the other fundamental facets (equal, order, cardinality, numeric) are constant for a base datatype and its derived types. There is no need to represent this constant information in the PSV Infoset.
C. The structures spec has 2 properties [base type definition] and [primitive type definition]. The datatypes spec has a single property [base type definition]. The primitive type can be obtained by following the base type chain, but storing the primitive type is more efficient for certain kinds of type inference.
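Illustration (not part of the original comment): point C notes that the primitive type can be recovered by following the base type chain, at some cost compared with storing it directly. A minimal sketch, assuming a small illustrative type hierarchy (`myPartCount` is a hypothetical user-defined type, and the table is not quoted from either part of the spec):

```python
# Illustrative [base type definition] links; None marks a primitive type.
BASE_TYPE = {
    "decimal": None,
    "integer": "decimal",
    "nonNegativeInteger": "integer",
    "positiveInteger": "nonNegativeInteger",
    "myPartCount": "positiveInteger",  # hypothetical user-defined type
}


def primitive_type(name: str) -> str:
    """Follow the base type chain until a type with no base is reached."""
    while BASE_TYPE[name] is not None:
        name = BASE_TYPE[name]
    return name


# Storing the answer once per type is what a separate
# [primitive type definition] property amounts to.
PRIMITIVE_OF = {name: primitive_type(name) for name in BASE_TYPE}
```

The chain walk costs one step per derivation level, while the precomputed table answers in a single lookup; that is the efficiency argument made in the comment.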
Type coercions will be needed to support operations on values. No operations are defined by XML Schema, but they will be defined by XML Query, which would like to work with Schema on the coercion question.
Input from XML Query WG:
Paul Cotton <paulcotton@alumni.uwaterloo.ca> to www-xml-schema-comments@w3.org, Mon, 29 May 2000 12:28:34 -0400, Subject: XML Query Comments to XML Schema (2nd part)
3. Algebra related issues
3.1 Operations
There is a need for operations to be defined on base types. Schema doesn't define any built-in operations or provide any mechanism for user-defined operations on types. As a result, the Query WG needs to define these. The Query WG will also need to determine the type of the arguments to select the right operator (e.g., floating point vs. integer arithmetic) and do the appropriate type coercion. The type coercion rules need to be defined. The Query WG is intending to define these operations and looks forward to doing this in cooperation with the Schema WG.
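Illustration (not part of the original comment): selecting an operator from the argument types and coercing the operands might look like the sketch below. The promotion order (integer, then decimal, then floating point) is purely an assumption for the example; at the time of this comment neither WG had defined coercion rules.

```python
from decimal import Decimal

# Assumed promotion order for this sketch only: int < Decimal < float.
RANK = {int: 0, Decimal: 1, float: 2}


def coerce_pair(a, b):
    """Convert both operands to the higher-ranked of their two types."""
    target = max(type(a), type(b), key=RANK.__getitem__)
    return target(a), target(b)


def add(a, b):
    """Addition whose arithmetic (integer, decimal, float) follows the operands."""
    x, y = coerce_pair(a, b)
    return x + y
```

With these assumed rules, adding two integers stays in integer arithmetic, while mixing an integer with a floating-point value promotes both to floating point before the operator is applied.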
Discussed in call of 2000-06-29.
RESOLVED unanimously: to dispose of this issue by agreeing that a joint task force to examine this question is a good idea, and naming members of the XML Schema WG willing to serve on such a task force. Paul Biron, David Cleary, Martin Gudgin, Ashok Malhotra, and Jonathan Robie volunteered to serve on the task force.
Formal response. Commentator is satisfied.
A variety of i18n-related issues relating to the Primer.
Input from Martin Duerst:
From: "Martin J. Duerst" <duerst@w3.org> to w3c-xml-schema-wg@w3.org, Sun, 21 May 2000 21:48:58 +0900, Subject: I18N Last call comments on Schema Part 0
These are the last call comments on XML Schema Part 0: Primer from the I18N WG/IG.
These comments don't discuss changes that may have to be done to address changes in Parts 1/2 as a consequence of i18n comments to those two specs.
[1] The I18N WG/IG is very pleased that internationalization (i18n) is used as an example to show some core concepts. The comments we make below should not lead to changing to another example domain.
[2] However, the examples chosen give a very inappropriate impression of i18n. I18n is not the extension of a US solution to the UK. This can easily be corrected, and should be corrected. More details mainly in comments [5], [10], but also [3], [4], [8].
[3] All examples have to use xml:lang for every piece of readable text, e.g. at least all things such as product names, comments,..., and all elements that contain the aforementioned kind of elements. xml:lang can only be avoided on things like date, price, quantity. This applies both to schemas and instances.
[4] Addresses are flagged from the start with "country='US'". Prices also have to be flagged from the start with a currency to show good practice. Also, 'weight' should either have a comment indicating which metric unit this is, or have an attribute that gives or fixes the (hopefully metric) unit.
[5] Names of elements should be chosen carefully from the beginning. If an element has an attribute 'country' fixed to 'US', then it should from the start be called 'USAddress'. There are a lot of reasons for this, from making sure people know what kind of type the name refers to both short-time and long-time to making sure the naming is 'politically correct'.
[6] In the examples in 2.3, show that and where datatypes can be international. In particular, include e.g. an accented word in the 'string' example. Where such an example is used (e.g. Table C1), it should use the mechanisms of HTML/XML and display the actual character (in the right column at least).
[7] Notations such as [A-Z]{2} in regexp may work in that specific case, but 'two upper-case letters' is not correct (there are many more upper-case letters, including accented Latin ones, Greek and Cyrillic ones,...), it should be 'two ASCII-only upper-case letters' or something similar, and should say that in the general case, letter categories or lists of letters should be used.
[8] Currency codes should use standards, i.e. EUR and not just EU. (Section 2.5). [sorry, don't remember the ISO standard number] Of course each schema can do what it wants, but W3C examples should conform to good practice unless there is a specific point in diverging.
[9] At the end of sect 2.5, there is an alternative given between 'any' and 'string'. The choice and the i18n consequences have to be explained clearly. [see our Schema comments on this issue]
[10] There is more to addresses than US addresses and UK addresses. As examples, Japanese addresses are completely block-based; there are not many street names, and street names don't turn up in addresses. In Singapore (and Hong Kong), the city field is superfluous, because it's the same as the country field. So the 'IPO' schema has to be adapted.
[11] The IPO schema suggests to include all the derivations for the various countries. This will lead to a very long file. Some kind of division into files,... should be used or at least should be suggested in the text.
[12] The first paragraph of sect. 2.4 should say that it may be important to use explicit types to help later modification.
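Illustration (not part of the original comment): comment [7] above observes that [A-Z]{2} covers only ASCII upper-case letters. The sketch below contrasts that pattern with a check based on the Unicode general category Lu; the function names are invented for the example, and Python's standard `re` and `unicodedata` modules stand in for a schema processor.

```python
import re
import unicodedata

ASCII_UPPER_PAIR = re.compile(r"[A-Z]{2}")


def is_ascii_upper_pair(s: str) -> bool:
    """True for exactly two ASCII upper-case letters, as [A-Z]{2} requires."""
    return ASCII_UPPER_PAIR.fullmatch(s) is not None


def is_unicode_upper_pair(s: str) -> bool:
    """True for exactly two characters of Unicode category Lu (upper-case letter)."""
    return len(s) == 2 and all(unicodedata.category(ch) == "Lu" for ch in s)
```

Accented Latin capitals such as U+00C9 U+00C0, or Greek capitals such as U+0391 U+03A9, pass the second check but not the first, which is why the comment asks for the wording 'two ASCII-only upper-case letters'.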
Input from Misha Wolf:
Misha Wolf <misha.wolf@reuters.com> to w3c-xml-schema-wg@w3.org, Thu, 25 May 2000 21:05:24 +0000 (GMT) Subject: More I18N Last call comments on Schema Part 0
These are some more last call comments on XML Schema Part 0: Primer from the I18N WG/IG. One of the comments [MW6] relates also to XML Schema Part 2: Datatypes. Some of the comments are not I18N specific.
As some of our comments are being written up by Martin, and some by me, I am prefixing each comment number with "MW", to avoid numbering clashes.
[MW1] Section 2.2 -- "An element is required to appear when the value of minOccurs is 1." Suggestion -- "An element is required to appear when the value of minOccurs is 1 or more."
[MW2] Section 2.3 (and others) -- "Sku (shorthand for a product number)" Is "Sku" an acronym? If so, what is its derivation?
[MW3] Section 2.5.1 --
In our comment [8], Martin wrote:
> [8] Currency codes should use standards, i.e. EUR and not just EU. (Section 2.5). [sorry, don't remember the ISO standard number] Of course each schema can do what it wants, but W3C examples should conform to good practice unless there is a specific point in diverging.
The relevant standard is ISO 4217.
[MW4] Section 2.8 -- "... by adding attributes to the item element indicating whether or not the item is in stock, ..." Where is this attribute?
[MW5] Section 4.3 -- "goods are shipped to England" ... "UK-Address" Not surprisingly, some people don't like the equation: England = UK. Please change to: "goods are shipped to the UK".
[MW6] Section C -- "\w" ... "XML 1.0 Letter or Digit" (linking to http://www.w3.org/TR/REC-xml#CharClasses)
This raises serious questions of functionality. My first response was "Why XML 1.0"? If the XML specification were, at some point in the future, reissued with a higher version, would "\w" continue to stand for an "XML 1.0 Letter or Digit"? Surely not.
As the primer is non-normative, I then consulted the "XML Schema Part 2: Datatypes" specification: http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dt-ccesN
The definition there is quite different, but equally worrying:
"[&#x0000;-&#xFFFF;]-[\p{P}\p{S}\p{C}] (all characters except the set of "punctuation", "separator" and "control" characters)"
Why does the range end at 0xFFFF?
[MW7] Section C -- "defined by Unicode" links to "http://www.w3.org/TR/xmlschema-0/" (multiple occurrences)
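Illustration (not part of the original comment): the concern in [MW6] about a range ending at 0xFFFF can be made concrete with a character outside the Basic Multilingual Plane. The sketch below assumes current Unicode data as shipped with Python's `unicodedata` module, and approximates the quoted definition by excluding the general categories whose names begin with P, S or C; the function name is invented for the example.

```python
import unicodedata

BMP_LIMIT = 0xFFFF
UNICODE_LIMIT = 0x10FFFF


def in_word_range(ch: str, upper: int) -> bool:
    """True if ch lies at or below `upper` and is not in category P*, S* or C*."""
    if ord(ch) > upper:
        return False
    return unicodedata.category(ch)[0] not in {"P", "S", "C"}
```

A letter such as U+10400 is rejected when the upper bound is 0xFFFF and accepted when it is 0x10FFFF, although nothing about the character itself differs between the two cases.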
Input from C.M. Sperberg-McQueen:
>[MW5] Section 4.3 -- "goods are shipped to England" ... "UK-Address" Not surprisingly, some people don't like the equation: England = UK. Please change to: "goods are shipped to the UK".
Speaking only for myself, I observe that as far as I can tell, the text does not make the objectionable equation. Since England is part of the UK, and since the postal standards apply uniformly within the UK, shipping goods to England requires a UK address.
If one were to write of shipping goods to Illinois, and then use a type named 'US-Address', would one be equating Illinois and the U.S.?
I recognize that sensibilities are touchy with respect to terms of nationality in virtually all parts of the British Isles. But I think the objection to be met is more accurately described as one against using the term 'England' in proximity to the term 'U.K.', since such proximity distracts some readers from the point at hand and leads them into meditations on the power relations in the British Isles which are not germane to the task of introducing the fundamental constructs of XML Schema 1.0.
Rather than substitute the colorless "goods shipped to the U.K.", however, I would rather suggest that the editor change it to "goods shipped to Wales" or "goods shipped to Scotland", or perhaps a city name ("goods shipped to Manchester"?). I won't suggest shipping to Northern Ireland, for fear of triggering a different but equally distracting train of non-germane reflections.
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
A variety of i18n-related issues relating to Structures.
Cf. Easy add-ins
Item [7] of this list is reflected in issue LC-215.
Cf. Merge mixed, text-only, and string?
Item [8] of this list is reflected in issue LC-216.
Cf. Allow pattern on complex types?
Item [9] of this list is reflected in issue LC-217.
Input from Martin J. Duerst:
"Martin J. Duerst" <duerst@w3.org> to w3c-xml-schema-wg@w3.org, Tue, 30 May 2000 18:11:01 +0900, Subject: I18N Last call comments on Schema Part 1
These are the last call comments on XML Schema Part 1: Structures from the I18N WG/IG.
The comments are numbered by [n], but their order does not reflect their importance.
[1] The spec repeatedly contains language such as "the string composed of the [character code] of each of the element information item's character information item [children] in order" This is overly complex and confusing. First, a string is composed of characters, not of character codes (which are numbers). This has to be corrected. Second, the phrase is used so often and the concept behind it so obvious that it would help a lot to define a term for it once. [Similar phrases are also found in Part 2, this comment should also be reflected there; it is made here only once for both parts.]
[2] Section 3.12 says: 'In the case of {user information}, indication may be given as to the identity of the (human) language used in the contents, using the xml:lang attribute.' Please change 'may' to 'should'. Also see points [3], [4], [5].
[3] Please indicate how annotations in multiple languages are done. Being able to make annotations in multiple languages in a clearly defined and interoperable way is important.
[4] Section 5.9, in point 2, says that the value of xml:lang must conform to the req's set out in XML 1.0. There are two problems here:
[5] It should be made clear that <documentation> can contain additional markup. As neither <annotation> nor <documentation> is defined in App. A, this isn't clear.
[6] It should be clear that for all references of URIs/URI References, this is to be understood as including the provisions of relevant section of the W3C Character Model (http://www.w3.org/TR/charmod/#URIs). Please see point [30] of our comments to Part 2.
[7] In http://lists.w3.org/Archives/Member/w3c-xml-schema-wg/1999Nov/0007.html we have made a detailed request to make sure that XML Schemas can address the problems of i18n-related markup. This detailed request was listed as issue 209 but summarily abandoned. http://www.w3.org/XML/Group/xmlschema-current/issues.html#easyAddIns We have not received any response that would allow us to determine that these issues are addressed satisfactorily in the current spec. We herewith resubmit the abovementioned mail as part of this last call comment, and request the XML Schema WG to provide a detailed answer as part of the resolution process so that we can decide whether our requirements are met. Apart from this general answer with followup, we mention a few specific points below [8].
[8] The mail mentioned in [7] mentions addition of elements and attributes in general, but one particular and particularly frequent case is the addition of child elements to elements that do not have any child elements defined yet. In the current draft, such elements can be defined in two ways, either as 'mixed' without any elements specified or as 'string'. [There may be a third one, 'textOnly', as guessable from 4.3.3. However, the spec seems not consistent on this. For example, there is: {base type definition} The type definition resolved to by the value of the base [attribute], if present, otherwise the simple ur-type definition if the content [attribute] is textOnly, otherwise the complex ur-type definition. but earlier, there is only one ur-type, so this is confusing.]
In order to make extensions easy, the 'mixed' type without child elements and the string type (as long as not restricted by a facet, and see point [9]) should be merged. In terms of functionality, this should not provide any problems at all, because it is just a question of deferring decisions until they really are necessary.
It may be claimed that instead of merging 'mixed' and 'string' as above, it would suffice to always use 'mixed' in cases further addition of elements is desired. However, we feel that this is not sufficient, 'string' is too easy to use and will be used in too many instances.
[9] As explained in item [35]/[36] of our comments to part 2, it will often be necessary to include character repertoire constraints in XML Schema. Such constraints should also be applicable to character children even if an element also has element children. This can easily be done by allowing a pattern facet even on complex types provided that this pattern facet only consists of a character class expression. This does not pose any problems with respect to the interleaving order of characters conforming to the pattern and elements conforming to the content model.
[10] The verbal complexity of the XML Schema specs, in particular part 1, is extremely high. We have serious doubts regarding understandability by non-native speakers as well as translatability. We ask the XML Schema WG and the editors to undertake every effort to use clear and simple language.
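Illustration (not part of the original comment): comment [9] above proposes applying a pattern consisting only of a character class to the character children of an element that also has element children. The sketch below assumes a Latin-1 repertoire as the constraint and uses Python's `xml.etree.ElementTree`; the function names are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed repertoire constraint for this sketch: Latin-1 characters only.
LATIN1_ONLY = re.compile(r"[\u0000-\u00FF]*")


def character_children(elem: ET.Element) -> str:
    """Concatenate the element's own character children, skipping child elements."""
    parts = [elem.text or ""]
    for child in elem:
        # Text that follows a child element still belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


def repertoire_ok(xml_text: str) -> bool:
    """Check the character children of the root element against the repertoire."""
    root = ET.fromstring(xml_text)
    return LATIN1_ONLY.fullmatch(character_children(root)) is not None
```

Because the pattern is a single character class applied to every character, the positions at which child elements interrupt the text make no difference to the result, which is the point the comment makes about interleaving order.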
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
A variety of i18n-related issues relating to Datatypes.
Cf. Solve C0 control-character issue?
Point [4] of this list is reflected in issue LC-218.
Cf. Lay foundation for multiple lexical representations?
Points [7], [8], and [9] of this list are reflected in issue LC-219.
Cf. Single lexical representations?
Points [10]-[15] and [27] of this list are reflected in issue LC-220.
Points [17]-[26] of this list are reflected in issue LC-221.
Input from Martin Duerst:
"Martin J. Duerst" <duerst@w3.org> to w3c-xml-schema-wg@w3.org, Mon, 29 May 2000 18:57:09 +0900, Subject: I18N Last call comments on Schema Part 2
These are the last call comments on XML Schema Part 2: Datatypes from the I18N WG/IG.
The comments are numbered by [n], but their order does not reflect their importance.
[1] The definition of 'match' has been copied from XML 1.0. There are proposals for clarifying XML 1.0. The Schema WG should work together with the XML Core WG and the I18N IG to make sure everything is in sync.
[2] The spec says that length for strings is measured in terms of [Unicode] codepoints. This is technically correct, but it should say it's measured in terms of characters as used in the XML Recommendation (production [2]).
[3] In 2.4.2.12, it says 'For example, "20" is the hex encoding for the US-ASCII space character'. It should say something like '"20" encodes a byte value represented e.g. in C as 0x20, which may stand for the space character if US-ASCII (or UTF-8) is used to encode it.' But actually this is a bad example, because encoding text with base64 is a bad idea and is against the spirit of XML.
[4] ...
[5] related to [4]: 3.2.1 seems to allow all Unicode/ISO 10646 characters, this is not true (see [4]).
[6] 3.2.1: Expand 'UCS' to Universal Character Set.
...
[16] For elementary types, there may be a desire to allow whitespace around the actual data. To be clear, the spec should explicitly say that this is disallowed. (except for cases where it has to be allowed for XML/SGML conformance, i.e. ENTITY, ID,...). Another way of expressing this comment is to say that the spec should make clear for which datatypes CDATA attribute-value normalization should be chosen, and for which datatypes not.
...
[28] String length: There should be a note saying that string length as defined here does not always coincide with string length as perceived by the user or with an actual amount of storage units in some digital representation, and that therefore care should be taken both when specifying some bounds as well as when using these bounds to try to derive some storage requirements. [Although this is not an i18n issue, our group also found the simultaneous availability of 'length', 'minLength', and 'maxLength' highly confusing.]
[29] String ordering: This feature seems to be present for no real use, and should be removed. User-oriented string ordering is highly complex and locale-dependent, and is dealt with in other standards (ISO/IEC 14651 and Unicode TR #10). Locale-independent ordering only makes sense if it is usable for something. This may be actually the case if it were possible to specify that all subelements of a given element have to appear in a given order (just to avoid variation). If this is possible with XML Schema, the orderedness of string may be kept. If not, orderedness as a facet should be removed altogether. In any case, the related facets min/maxIn/Exclusive must be removed, because they never lead to any useful subset of strings. (E.g. assume minInclusive='a' and maxExclusive='b'. This makes sure the first letter is a lower case 'a', but allows any letter whatsoever (from the whole Unicode repertoire) after the 'a'. This is most probably not what a naive user is expecting (but as good as we can get), and for an advanced user, this (and many other useful things) are much easier specified by patterns).
[30] URI Reference: This definition must be changed to allow for characters not allowed in URI References, in order to be in accordance with the relevant section of the W3C Character Model (http://www.w3.org/TR/charmod/#URIs) and all the W3C Recommendations and upcoming Recommendations in accordance with it (HTML 4.0, XML 1.0, RDF, XPointer, XLink,...). [While at it, please also remove the definitions of 'absolute uriReference' and 'relative uriReference' if you don't use it, and make sure you mention that RFC 2396 has been updated by RFC 2732: Format for Literal IPv6 Addresses in URL's R. Hinden, B. Carpenter, L. Masinter, December 1999. e.g. at http://www.ietf.org/rfc/rfc2732.txt]
[31] 3.3.1 language: The 'LanguageID' production in XML 1.0 is too narrow. It fits the currently allowed languageIDs of RFC 1766 tightly, but RFC 1766 is being upgraded (see http://search.ietf.org/internet-drafts/draft-alvestrand-lang-tags-v2-01.txt). The I18N WG/IG are working together with the XML Core WG to make sure XML can be adjusted appropriately, and that no premature overly restrictive decisions are taken. The XML Schema WG should work together with the above WG to coordinate this issue.
[32] The 'length/minLength/maxLength' facets on 'language' are highly doubtful; they do not correspond to any useful concepts in the value domain of this datatype.
[33] It is unclear why certain datatypes are derived from 'string' (e.g. language, nmtoken, name, ncname), but not others (e.g. ID, idref, entity, notation, qname).
[34] Pattern combinations: Section 5.2.4 says that multiple patterns in a derivation of a single type are combined as if they were separate branches of a regular expression. Branches result in an 'OR' combination, i.e. the actual string can conform to either branch. It seems much better to change this to an 'AND' condition, i.e. the actual string has to conform to BOTH regular expressions. There are several reasons for this:
- Restrictions on all kinds of facets, on the same derivation or on subsequent derivations, can very generally be modeled as AND conditions (i.e. for a derived simple type, all conditions on that type and any base types apply simultaneously). This allows to deal uniformly with all such restrictions, and to avoid special cases. E.g. instead of saying that having both a minInclusive and a minExclusive on the same derivation is illegal, one of them just becomes redundant.
- The regular expression syntax does not allow AND conditions. However, such conditions are frequently used in programming. In programming, they don't have to be part of the regexp syntax, because they can be modelled as two subsequent checks. In XML Schema, there is no device for subsequent checks.
- AND conditions on regular expressions are in particular important for i18n (see point [35]).
[35] It has to be possible to specify various restrictions on a string simultaneously. In particular, we expect that combining a restriction regarding the character repertoire (e.g. to deal with encoding restrictions in legacy systems) and a restriction on the structure of a string will be quite frequent. See also point [36].
[36] Some of the regular expressions needed will be quite long. As an example, the regular expression to limit the repertoire to those characters expressible in the traditional Japanese encodings results in a character class with about 6000 characters. To make this reasonably possible, we suggest:
- To allow XML spaces in regular expressions (including character classes) in the same way they are allowed in the newer Perl versions. This will lead to greater readability for many other applications, too.
- To allow to define character classes or regular expressions in general as objects of their own that can be referenced either in a 'pattern' element or directly in a regular expression or character class.
- If the point just above is not possible, in any case to make sure that patterns are combined by 'AND' in the derivation hierarchy.
[37] In appendix E, remove the 'CS Surrogate' character property. Surrogates do not appear on the level that XML Schema is working.
[38] For character sequence '\w', please make sure that the character class does not end at &#xFFFF; but at &#x10FFFF;, and that this is consistent in the primer.
[39] Upgrade the reference to ISO 10646 to the year 2000 version, removing the reference to the amendments.
[40] Upgrade the reference to Unicode to version 3.0.
[41] Make sure whether/that block escapes are normative (i.e. change the various 'may' in their definition to something more appropriate).
[42] Try to give less US-centric examples.
[43] Make sure that the character property categories and block escape classes for Unicode characters are not bound to a single version of Unicode. This would create an update problem as soon as Unicode is updated, which is sure to happen rather soon. XML Schema should be independent of such upgrades, otherwise this part of it will soon be less and less useful. The pointer to version 3.0.0 of the Unicode Database should be changed to a generic pointer to the latest version.
[44] The current regular expression syntax does not take into account combinations of base characters and combining marks easily. This can be inconvenient for certain scripts, and will become more and more inappropriate because the encoding of precomposed characters has been stopped. There should be a note pointing out this problem, and the XML Schema WG should have a plan of how and when to address this (i.e. the upgrade to the next level of regular expressions according to Unicode TR #18).
[45] Several examples could be less US-centric. In particular, the example in 5.2.11 should be changed from Fahrenheit to Celsius.
[46] In appendix A, all prose should fall under xml:lang='en'.
[47] There are a number of inconsistencies and typos, but given the large number of needs for changes as discussed above, it seems more appropriate to check and report such problems on a second reading after an update.
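Illustration (not part of the original comment): points [34] and [35] above contrast combining multiple patterns by 'OR' with combining them by 'AND'. The sketch below uses two assumed patterns, one constraining structure and one constraining repertoire to printable ASCII, and Python's `re` module in place of a schema processor.

```python
import re

# Two assumed restrictions on the same string, as in point [35]:
STRUCTURE = r".{3}-.{4}"            # shape: three characters, hyphen, four characters
REPERTOIRE = r"[\u0020-\u007E]*"    # repertoire: printable ASCII only
PATTERNS = [STRUCTURE, REPERTOIRE]


def matches_any(value: str, patterns: list[str]) -> bool:
    """'OR' combination: the value need conform to only one pattern."""
    return any(re.fullmatch(p, value) for p in patterns)


def matches_all(value: str, patterns: list[str]) -> bool:
    """'AND' combination: the value must conform to every pattern."""
    return all(re.fullmatch(p, value) for p in patterns)
```

A value with the right shape but a character outside the repertoire is accepted under 'OR' and rejected under 'AND'; only the latter enforces both restrictions simultaneously, as point [35] requires.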
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
Shall the equivalence class construct be renamed to avoid confusion? (Equivalence relations are by definition symmetric, but the relation specified by the equivClass attribute is not symmetric.)
Discussed in call of 2000-07-21.
The WG took a negative straw poll on the live possibilities (i.e. the series of dots below show the number of WG members who found the possibility in question problematic, unclear, or otherwise undesirable).
RESOLVED: to dispose of this issue by instructing the editor to use one of the top preferred alternatives for each of these:
w1: 'bold' ________ 'phrase'
is in the equivalence class of ==>
w2: 'phrase' ________ 'bold'
w3: 'bold' and 'italic' are in the ________ of 'phrase'
equivalence class ==>
w4: 'phrase' ________ a/an &w3;
==>
w5: <element name='bold' ________='phrase'/>
equivClass= ==>
Formal response to commentator. Wadler replies (by private mail) that "'Substitution groups' is an awkward name without regard for the long history of subtyping, but at least it does not suggest a symmetric relation as 'equivalence classes' does. Thank you for the improvement. I see no reason to record a dissent."
Shall XML Schema be modified to make it easier to extend a set of enumerated values? That is, should there be some mechanism to make it easier to define an enumerated type with specified values (which have meanings specified, for example, in the documentation) without actively restricting the ability of a user to supply other values, or of a schema author to derive another enumerated type by extending the set?
Input from Curt Arnold:
I think the actual resolution of the symbolic constants is outside what you could expect from a generic XML processor, however a resolution of LC-2 conjunction types (http://www.w3.org/2000/05/12-xmlschema-lcissues.xml#conjunction-types) and/or open enumerations (http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0022.html and http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0023.html) might enable the behavior that you desire.
Discussed in call of 2000-07-14.
The question is on a proposal to make it easier to add new items to an enumerated type (or alternatively to specify 'suggested values' in an unlimited value space).
RESOLVED unanimously: to dispose of this issue by observing that the resolution will follow clearly from our eventual decision on the union type question (LC-2, LC-93). If we have union types, open (non-final) enumerations come along for free. If we don't have union types, we do not think it wise to add an ad hoc construct to support non-final enumerations.
In the event, the WG did adopt a proposal for union types, so that open enumerations are in fact easy to define.
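Illustration (not part of the resolution): the reasoning that open enumerations "come along for free" with union types can be sketched as a value being valid for a union when it is valid for any member. The sketch does not reproduce the XSD syntax the WG adopted; the set of suggested values and the stand-in for a general member type are assumptions.

```python
import re

SUGGESTED_COLORS = {"red", "green", "blue"}  # hypothetical enumerated values


def in_enumeration(value: str) -> bool:
    """Member 1: the closed enumeration of documented values."""
    return value in SUGGESTED_COLORS


def is_token(value: str) -> bool:
    """Member 2: a stand-in for a general type (here, any non-empty run of non-space)."""
    return re.fullmatch(r"\S+", value) is not None


def valid_for_union(value: str, members) -> bool:
    """A value is valid for a union if at least one member type accepts it."""
    return any(member(value) for member in members)
```

With only the enumeration as a member the type is closed; adding the general member leaves the documented values in place while admitting others.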
XForms WG (members-only) confirms that the resolution is acceptable.
Shall the DTD for schemas be made non-normative? (This question arose in connection with issue LC-123 and was assigned a separate number for tracking purposes.)
The question was raised and discussed in call of 2000-06-30.
A straw poll showed a preponderance of opinion for making the DTD non-normative. An email ballot was requested, but was inconclusive. The question was resolved at the face to face meeting of 1-2 August 2000.
Arguments in favor of making the DTD non-normative: it removes a dangerous redundancy, makes clearer that the DTD imposes no extra requirements on data or processors but is only a notational variant on the schema for schemas, and allows us to dodge the problems of supporting arbitrary and multiple redundant namespace prefixes in DTD notation (support for single, consistent prefixes is already built in). Arguments against the change: DTDs are the currently recognized notation for defining XML document types; changing the DTD's status to non-normative will inevitably shake the faith of users in the DTD.
A straw poll showed a preponderance of preference (15:7 preferred, 18:13 can-live-with) for making the DTD non-normative.
Resolved: to dispose of issue LC-210 by making the DTD for schemas non-normative. Dissenting: Holstege, Robie, Thompson. Abstaining: Buck, Chen, Ezell, Grosso, Gudgin, Mendelsohn, Walsh.
Formal response to commentator.
David Beech confirms that he is happy with the WG's decision.
Shall the pattern facet support picture-style masks as well as regular expressions as a language in which to express patterns? Cf. issue picture-or-regex: Pictures, regular expressions, both or neither? (member-only link) in the development-issues list.
Discussed in call of 2000-07-13.
Ashok Malhotra noted that the XForms WG feels very strongly about masks, and holds that regular expressions are difficult. They will not be happy with a decision to stick with regular expressions.
There was discussion.
RESOLVED unanimously: to close issue LC-211 with a polite no, noting the arguments given by MSM, RJ, and HT in email. We can discuss this topic further in the context of the general alignment between XForms and XML Schema, or in the context of future versions of XML Schema. Right now, there is no proposal for masks that is complete enough to work with, and the proposals we have seen (or generated ourselves) are invariably less expressive than regular expressions (so we cannot change from regexes to masks without loss of functionality). The definition of two notations for the same function is not a step toward greater simplicity; if masks are to be used (as suggested by Rick Jelliffe) to provide internal structure for strings, we believe the UXSD principle (Use XML for Structured Data) applies.
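The expressiveness argument can be illustrated with a small sketch. No mask syntax was proposed in enough detail to work with, so the mask alphabet below (`9` for a digit, `A` for a letter, `X` for any character, everything else literal) is purely hypothetical, as are the names `MASK_CLASSES`, `mask_to_regex` and `matches_mask`. Under that assumption, any mask translates mechanically into a regular expression, whereas a regular expression with repetition has no fixed-width mask equivalent.

```python
import re

# Hypothetical mask alphabet, assumed for this sketch only.
MASK_CLASSES = {"9": "[0-9]", "A": "[A-Za-z]", "X": "."}


def mask_to_regex(mask: str) -> str:
    """Translate a picture-style mask into an equivalent regular expression."""
    return "".join(MASK_CLASSES.get(ch, re.escape(ch)) for ch in mask)


def matches_mask(mask: str, value: str) -> bool:
    """Check a value against a mask by way of the translated regex."""
    return re.fullmatch(mask_to_regex(mask), value) is not None


def matches_regex(regex: str, value: str) -> bool:
    """Check a value directly against a regular expression."""
    return re.fullmatch(regex, value) is not None
```

For instance, `matches_mask("999-9999", "555-1234")` holds, while a pattern such as `[0-9]+` accepts both `"1"` and `"12345"`, which no single mask of this kind can do.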
Shall XML Schema supply a built-in type for currency information?
Discussed in call of 2000-07-14.
RESOLVED unanimously: dispose of this issue by saying yes, we agree that it would be useful to have a predefined (complex) type for currency-labeled amounts. We do not believe that this type should be defined as part of XML Schema, but as part of a library of types, in a separate namespace. We will take steps to invite representatives of appropriate WGs to participate in drafting a NOTE to define such a type library, with the goal of publishing the note during our CR period. We will also instruct the primer editor to put examples into the primer.
Shall the datatypes spec be modified to make it possible for a schema author to derive a type by dropping facets from the base type?
Discussed in call of 2000-07-14.
Sperberg-McQueen observed that the XForms working group might have two slightly different things in mind:
Biron noted that the XForms WG's comparison between the type systems showed that we have some facets which they simply don't have. It would appear from this that A is meant. But if we can supply a facility for B, it would at least have the result that type derivations within the affected subtree would not be in a position to use the facet in question.
We could benefit, ourselves, from having a way to fix a facet. This would allow us to specify that the 'period' facet in the 'date'
After further discussion, a straw poll showed that the WG was leaning strongly toward adopting some mechanism roughly as Biron had described it, either conditionally upon confirmation from the XForms WG that it would in fact be useful to them, or unconditionally. The WG was not ready to make a final decision without seeing a more detailed design, however.
Discussed further at face to face meeting of 1-2 August 2000.
Malhotra reported that what the XForms WG had in mind under the term facet is not what we mean by the term, so they are not really asking for this. However, the editors do believe it would be useful to fix the value of a facet; this would prevent you from creating idiotic subtypes of date, for example. Specifically, the editors propose:
Resolved unanimously: to adopt this proposal.
Our response to the XForms WG should simply say that what they mean by facet is not what we mean by facet.
Shall the datatypes spec be modified to make it possible for a schema author to derive a type by adding new facets to the base type?
Discussed in call of 2000-07-14.
RESOLVED unanimously: to close issue LC-214 with a polite no, and ask for more information for use in developing requirements for later versions of XML Schema.
Shall XML Schema be modified to ensure that it is easy to add new child elements or attributes to existing types in schemas (e.g. in order to internationalize them or to add accessibility features)?
Input from Martin J. Duerst:
The basic problem we are considering is the following:
A lot of schemas will be written without sufficient thought given to i18n markup needs. It should be very easy to change such a schema to add the necessary things for i18n markup. These additions are mostly at the phrasal level, in some cases at the block level. The chances are that WAI has very similar requirements.
Some examples of i18n markup include:
As an example, imagine a book schema written in the US, and somebody who wants to add bidirectional features for Arabic or Hebrew.
I want to make clear at this point that we are not expecting XML Schema to provide a fixed set of attributes/elements (that seems to have been the way our earlier comments were understood).
However, we are requesting the XML WG to make sure that the effort to add such attributes/elements is as effortless as possible, certainly much smaller than copying and changing the original schema.
In the following, I'm trying to give a few ideas for how this could be achieved; I have to admit that I'm not an XML Schema expert.
I hope this is sufficiently clear.
Discussed in call of 2000-07-28.
The issue reflects a general exhortation to the Schema WG to make it 'easy' to add new elements to common types. Cf. LC-216 below.
The WG discussed the issue. Several points should be made in response.
RESOLVED: to adopt these points as a response.
The WG also discussed adding xml:lang to the urType, or defining a complex type with xml:lang as part of a library of complex types.
A straw poll showed relatively little support for building xml:lang into the standard urType; there was substantial support for inviting the i18n WG to participate in the inter-WG task force to define a library of useful complex types, although 'do nothing' and 'invite them to define a namespace of their own' also were acceptable to a large fraction of the WG.
RESOLVED unanimously: to invite the i18n WG to participate in the inter-WG task force for a common library of complex types, noting in the invitation that they might prefer to create a separate library of i18n-related types, and we are willing to collaborate with them on that if they prefer. RESOLVED unanimously: to dispose of issue LC-215 with the response outlined above, including the invitation.
Responses in same thread; commentators uncertain whether they are satisfied or not.
Shall XML Schema be modified to merge the notions of mixed content without child elements, textOnly content, and the simple type string (e.g. in the interests of ensuring that schema authors don't unintentionally foreclose the possibility of adding child elements to a type later)?
Input from Martin J. Duerst:
"Martin J. Duerst" <duerst@w3.org> to w3c-xml-schema-wg@w3.org, Tue, 30 May 2000 18:11:01 +0900, Subject: I18N Last call comments on Schema Part 1
[8] The mail mentioned in [7] mentions addition of elements and attributes in general, but one particular and particularly frequent case is the addition of child elements to elements that do not have any child elements defined yet. In the current draft, such elements can be defined in two ways, either as 'mixed' without any elements specified or as 'string'. [There may be a third one, 'textOnly', as guessable from 4.3.3. However, the spec seems not consistent on this. For example, there is: {base type definition} The type definition resolved to by the value of the base [attribute], if present, otherwise the simple ur-type definition if the content [attribute] is textOnly, otherwise the complex ur-type definition. but earlier, there is only one ur-type, so this is confusing.]
In order to make extensions easy, the 'mixed' type without child elements and the string type (as long as not restricted by a facet, and see point [9]) should be merged. In terms of functionality, this should not provide any problems at all, because it is just a question of deferring decisions until they really are necessary.
It may be claimed that instead of merging 'mixed' and 'string' as above, it would suffice to always use 'mixed' in cases where further addition of elements is desired. However, we feel that this is not sufficient: 'string' is too easy to use and will be used in too many instances.
Discussed in call of 2000-07-28.
The question is on a proposal from the i18n WG to specify, in effect, that the string simple type and textOnly complex types are not legal for elements, so that all such elements will be of complex type mixed, so as to make add-ins simpler.
The WG discussed the issue at some length. Best-practice guidelines might go far to minimize the problems foreseen; we cannot eliminate those problems, because it is necessary to provide the existing functionality for the cases where it is what is actually intended. (I.e. we need to provide enough rope to allow the schema author to do the job; it follows that schema authors will have enough rope to hang themselves, and this is unavoidable.) Forcing all application software always to be prepared for subelements, and never allowing it to expect only strings, simply transfers the burden from one set of shoulders to another. Instead of making hard things easy, this proposal would end up making simple things hard.
Some WG members also suggested that the proposal would do too much violence to our type system: mixed-content elements do not have children of the String simple datatype, just characters.
RESOLVED unanimously: to dispose of LC-216 with a polite no.
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
Shall XML Schema be modified to allow patterns on complex types with content='mixed', so as to allow control over the character repertoire allowed in the character content of elements with a particular complex type?
Input from Martin J. Duerst:
"Martin J. Duerst" <duerst@w3.org> to w3c-xml-schema-wg@w3.org, Tue, 30 May 2000 18:11:01 +0900, Subject: I18N Last call comments on Schema Part 1
[9] As explained in item [35]/[36] of our comments to part 2, it will often be necessary to include character repertoire constraints in XML Schema. Such constraints should also be applicable to character children even if an element also has element children. This can easily be done by allowing a pattern facet even on complex types provided that this pattern facet only consists of a character class expression. This does not pose any problems with respect to the interleaving order of characters conforming to the pattern and elements conforming to the content model.
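What the commentator proposes can be sketched as follows: the character-class pattern is applied only to the character children of an element, independently of where the child elements fall. This is an illustration under assumptions, not a proposed implementation; the function names `character_children` and `conforms` are hypothetical, and the character class is written in Python regular-expression syntax rather than in the XML Schema pattern language.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator


def character_children(element: ET.Element) -> Iterator[str]:
    """Yield the runs of character data that are direct children of element."""
    yield element.text or ""
    for child in element:
        # Text following a child element belongs to the parent, not the child.
        yield child.tail or ""


def conforms(element: ET.Element, char_class: str) -> bool:
    """True if every direct character child consists only of allowed characters."""
    allowed = re.compile(f"(?:{char_class})*")
    return all(allowed.fullmatch(run) is not None
               for run in character_children(element))
```

For example, `conforms(ET.fromstring("<p>abc <b>DEF</b> ghi</p>"), "[a-z ]")` holds, because the upper-case text is content of the child element `b` and would be governed by that element's own type.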
Discussed in call of 2000-07-28.
The question is on a proposal from the i18n WG to allow pattern facets on complex types, in order to allow restriction of the character repertoire.
The WG discussed the proposal; few spoke in favor of it.
Most who spoke suggested that the commentators have mistaken the nature of mixed content, and taken it for data with some simple datatype (e.g. String). Mixed content does not map, however, to any simple type. If it is ever reasonable to restrict data to a particular character repertoire, either this is a general problem of supporting a restricted character set in some system or locale, or else it is a problem specific to particular fields -- in which case the fields should be typed, and the types can be restricted.
RESOLVED: to close issue LC-217 with a polite no, on the grounds that the functionality required is appropriate to fields with specific simple datatypes, and can already be supported for such fields, or else the functionality is required in order to support legacy systems which cannot handle Unicode, in which case out-of-band measures are preferable.
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
Shall XML Schema be modified to address problems related to the occurrence, in existing systems, of control characters within data?
Cf. I18n notes on datatypes, misc
Input from Martin Duerst:
"Martin J. Duerst" <duerst@w3.org> to w3c-xml-schema-wg@w3.org, Mon, 29 May 2000 18:57:09 +0900, Subject: I18N Last call comments on Schema Part 2
[4] related to [3]: XML is based on Unicode and therefore allows to represent a huge range of characters. However, XML explicitly excludes most control characters in the C0 range. There are fields in databases and programming languages that allow and potentially contain these characters. A user of XML and XML Schema has various alternatives, all not very satisfactory:
This is a serious problem, and should be duly addressed by XML Schema. [There is a related problem with respect to names (GIs in SGML terminology), but this is more an XML 1.0 problem than an XML Schema problem, and there is no danger to lose all i18n information just because of a single character.]
Discussed in call of 2000-07-28.
RESOLVED unanimously: to classify LC-218 as class B (out of scope).
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
Shall XML Schema take various steps to support (in some future version) the definition of locale-dependent datatypes (i.e. datatypes with locale-specific lexical spaces)? To wit:
Input from Martin Duerst:
[7] Make sure that functionality for locale-independent representation and locale-dependent information is clearly distinguished. This is the only way to assure both appropriate localization (which we consider to be very important) and worldwide data exchange. The specification is rather close to half of this goal, namely to provide locale-independent datatypes. Some serious improvements however are still possible and necessary (see below). It is clearly desirable that W3C also address locale-dependent data representations. We think that these go beyond simple datatyping/exchange issues and include:
These issues therefore have to be examined on a wider level involving various groups such as the XML Schema WG, the XSL WG, the CSS WG, the XForms WG, and so on, and this should be done as soon as possible.
We would like to repeat that any mixup between locale-independent and locale-dependent data representation will lead to confusion and will hurt, and not benefit, internationalization and localization. (This point is further addressed in detail in some of the points below: [8], [9], [10]-[16], [20]).
[8] Say explicitly in the specification and in the primer that the lexical representations you provide for various datatypes (in particular things such as date, numbers,...) are designed for locale-independent data exchange, and that they are inappropriate for locale-dependent data representation. In the primer, an example such as <date value='2000-05-16'>Tuesday, 16th of March, 2000</date> (or even just something like <date value='2000-05-16'>next Tuesday</date>) with value defined as a date and the <date> content as string, would help. Also, explicitly warn that where there is some similarity between localized representations and the locale-independent representation, this must not be exploited when presenting the data to a user, and that similarities are due to - Having to choose *some* kind of representation - Making this representation somewhat manageable in raw text for when raw text is needed (debugging, plain text editing,...) and that the fact that some representations are more similar to some locales than others is done reluctantly, and not explicitly to disadvantage certain users. [Indeed, where possible, we would prefer representations that avoid any similarity to any existing locale.]
[9] As said above and explained below, addressing localized representations as a whole is a huge problem. The one contribution that seems most appropriate and relevant from XML Schema is to associate locale-independent and locale-dependent representations. Taking the example above, <date value='2000-05-16'>Tuesday, 16th of March, 2000</date>, the association between the locale-independent 'value' and the locale-dependent element content is implicit; XML Schema should provide a way to make this association explicit. Including in the association some way to indicate the local format used / the conversion functions necessary seems also desirable, although we are not yet aware of an interoperable way to do so.
Discussed in call of 2000-07-28.
MSM noted that the only proposal on the table is the abstract-type proposal brought forward as part of the discussion of LC 220/36. It is ironic that the I18n WG has raised this issue, but opposes the abstract-types proposal.
In later work, the abstract-types proposal was adopted, and then rejected, by the WG. The net result is that while there is sympathy in the WG for (longer-term) support for multiple lexical spaces, there is no consensus as to what form that support should take, nor for what steps can be taken now to prepare for it.
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
Shall XML Schema be modified so that each built-in type has only a single legal lexical representation for each value in its value space?
Cf. Integers should not allow non-significant leading or trailing zeroes
Input from Martin Duerst:
[10] Several datatypes have more than one lexical representation for a single value. This gives the impression that these lexical representations actually allow some kind of localization or variation of representation. However, as explained above, such an impression is a dangerous misunderstanding, and has to be avoided at all costs. We therefore strongly request that all duplicate lexical representations be removed. The following points ([11]-[16],[20], [22], [27]) give details for each affected datatype. For each datatype, we indicate where duplicate representations exist, and how it may be removed. Unless otherwise indicated, we do not have any particular preferences of how to remove the duplicates; we just explain one way to do so to allow you to reuse the analysis we (mostly Mark Davis) have already done. We would like to point out that reducing the lexical representations to a single one for each value also makes using digital signatures on such data a lot easier, and to a large extent and at very little cost, avoids the creation of another WG and spec like in the case of XML Canonicalization.
[11] 3.2.2 'boolean': There are currently four lexical reps. for two values. This has to be reduced to two lexical reps. The I18N WG/IG here has a clear preference: most desirable: 0/1 less desirable: true/false clearly absolutely undesirable: 0/1/true/false
[12] 3.2.3.1 'float' allows multiple representations. This must be fixed, e.g. as follows:
Float values have a single standard lexical representation consisting of a mantissa, followed by the character "E" (upper case only), followed by an exponent. The exponent must be an integer. The mantissa must be a decimal number. The representations for exponent and mantissa must follow the lexical rules for integer and decimal numbers discussed above[below?]. The absolute value of the mantissa must be either zero, or greater than or equal to 1 and less than 10. If the mantissa is zero, then the exponent must be zero. For example: Valid: "-1.23E5", "9.9999E14", "1.0000001E-14", "0E0", "1E0" Invalid: "+1.23E5", "100000.0E3", "1.0E3", "1.0E0", "012.E3", "0E1" [This leaves one issue open, namely the issue of too high precision. One way to solve this is to define that the lexical rep. chosen is the one with the shortest lexical rep of the mantissa that corresponds to the desired value according to [Clinger/Gay], or if two lexical reps with the same shortest mantissa correspond, then the closer one should be chosen, and if both are equally close, then the one with an even end digit is chosen. [This should cover all cases, but there may be more accurate or more easy to calculate alternatives, and this should be checked by experts.]] [Some people may claim that e.g. the free choice of exponent or the use of leading digits is necessary to be able to mark up existing data; we would like to point out that if such claims should be made, we would have to request that not only such variations, but also other variations, e.g. due to the use of a different series of digits (Arabic-Indic, Devanagari,... Thai,..., Tibetan,..., ideographic,...) and so on be dealt with at the same level.]
[13] 3.2.4.1 'double' allows multiple representations. This must be fixed. The solution outlined in [12] can be applied.
[14] 3.2.5.1 'decimal' allows multiple representations. This must be fixed, e.g. as follows:
Decimal values have a single, unique, lexical representation. This consists of a string of digits (x30 to x39) with a period (x2E) as a decimal indicator (in accordance with the scale and precision facets), and a leading minus sign (x2D) to indicate a negative number. The decimal indicator must be omitted if there are no fraction digits. Leading and trailing zeros are illegal, except for zero itself (which is written as "0"). For example: Valid: "-1.23", "100000", "12678967.543233", "0" Invalid: "+1.23", "100000.0", "12,678,967.543233", "12,678,967.543233", "0.0", "012."
[15] Lexical representation of derived datatypes: The lexical representation of all datatypes derived (directly or indirectly) from 'decimal' (13 types from 'integer' to 'positiveInteger') must be changed to be unique. The easiest and most consistent way to do this is to just specify for each datatype that the lexical representation for all the values of the type is the same as for 'decimal'. If you want to be specific, you can find some details at: http://lists.w3.org/Archives/Member/w3c-i18n-wg/1999Nov/0007.html (members only). In any case, disallowing a '+' (done on some types, but not consistently) and disallowing leading zeroes should do the job.
[27] The lexical representation of 'hex' encoding (2.4.2.12) must be changed to allow only one case (e.g. only upper case) for hex digits.
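The single-representation rules proposed in [12] and [14] are compact enough to be written as regular expressions. The Python sketch below is one reading of those rules, checked only against the valid and invalid examples given in the comment; it is not text from any draft of the specification. The names `DECIMAL`, `FLOAT`, `is_unique_decimal` and `is_unique_float` are hypothetical. One point is an assumption: the comment does not say how values between -1 and 1 are written, and the sketch allows a single "0" before the decimal point (as in "0.5").

```python
import re

# Unique decimal form per [14]: optional "-", no "+", no leading or trailing
# zeros, no decimal point without fraction digits, zero written as "0".
# Assumption: a single "0" may precede the point for |value| < 1.
DECIMAL = re.compile(r"0|-?(?:[1-9][0-9]*|(?:0|[1-9][0-9]*)\.[0-9]*[1-9])")

# Unique float form per [12]: mantissa with 1 <= |mantissa| < 10 written as a
# unique decimal, upper-case "E", exponent written as a unique integer.
# A zero mantissa requires a zero exponent, so zero is exactly "0E0".
FLOAT = re.compile(r"0E0|-?[1-9](?:\.[0-9]*[1-9])?E(?:0|-?[1-9][0-9]*)")


def is_unique_decimal(text: str) -> bool:
    """True if text is in the single decimal representation sketched above."""
    return DECIMAL.fullmatch(text) is not None


def is_unique_float(text: str) -> bool:
    """True if text is in the single float representation sketched above."""
    return FLOAT.fullmatch(text) is not None
```

The expressions check lexical form only; the open question raised in [12] about choosing the shortest mantissa for a given binary value is not addressed here.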
This issue was discussed at some length by email and in face to face meetings. In part, it was subsumed by a proposal to introduce abstract simple types, from which schema authors could derive concrete types with variant lexical forms. The abstract-type proposal was raised at the Edinburgh face to face (June 2000), discussed extensively in email (the i18n WG in particular was strongly opposed to it), adopted on the basis of a task-force proposal at the Redmond meeting (August 2000), and then rejected after the editors reported difficulties integrating it into the specification.
The net result of discussion is that most simple types have a single lexical representation for each value in their value space; a few have multiple lexical representations, where the cost of enforcing a single lexical representation seemed high, and the cost of allowing multiple lexical representations seemed low, to the XML Schema WG.
A canonical representation for each value of each built-in simple type has been defined, which may be used in applications where lexical variation must absolutely be avoided.
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
Various comments on the date/time types.
Input from Martin Duerst:
[17] The time-related datatypes (timeDuration and recurringDuration and derived datatypes) need to be redesigned to avoid a number of serious problems. For details, please see points [18]-[25].
[18] The specification assumes that usual arithmetic can be done with TimePeriod, but due to the representation chosen, this is not the case. For example, it is absolutely unclear which of P3.01M or P90.5D is greater, or whether they are equal. There are two ways to solve this, either to choose a different representation or to remove orderedness and min/maxIn/Exclusive. The former is clearly desirable because of additional reasons, please see [19].
[19] The use of culture-specific time length units is highly problematic. This in particular applies to years and months in timeDuration. Various calendars use different month and year lengths; the main distinction being the one between lunar calendars and solar calendars. The Islamic, Hebrew, and Chinese months and years, for example, are all different from the corresponding western units. A system either has to be able to represent these units in all calendars (extremely difficult) or should be limited to representations that are to an extremely high degree culturally neutral. In order to deal with [18], too, we propose to do the latter.
[20] Unique representation of timeDuration: There must be only one lexical representation for each timeDuration. This can be achieved as follows: Based on the representation of ISO 8601, only PnDTnHnMnS is used (i.e. no years or months). If any unit is zero, the number and the letter after it are removed, except for the zero duration, which is represented as P0D. If only Days are present, 'T' is omitted. Overflows in lower units have to be converted to higher units (i.e. PT24H -> P1D, PT60M -> PT1H, PT60S -> PT1M; except for leap second cases). Decimal fractions are only allowed for seconds, and do not allow trailing zeroes. [A serious alternative to this would be to remove timeDuration altogether.]
[21] The problems with timeDuration ([19]-[21]) heavily affect recurringDuration and all datatypes derived from it. In addition to the arguments above, recurringDuration is clearly of very limited use even for the areas of the world that use the Gregorian calendar for all their activities. Being able to specify e.g. the 5th of May every year is only of limited value; most events are decided according to a much more complex pattern. The 3rd Wednesday of each month, a certain date if it is not a Sunday, otherwise the Monday after it, and so on, are easier examples, and things can get more complex. With the current solution only a small part of the actual requirements can be addressed. Therefore, the datatype 'recurringDuration' must be removed. Several derived datatypes will be removed as a consequence (e.g. timePeriod, recurringDate, recurringDay, time,...). [The only viable alternative to this is to work on a more powerful representation that can address both various cultures and more complicated rules.]
[22] Having a datatype for timeInstant is clearly desirable. The current derived type should be promoted to a base type. Ideally, the representation should be based only on days (and seconds within the day) from an arbitrary but clearly specified base time instant (this would greatly simplify conversions to internal representation of all kinds of OSs and libraries). If this is judged to be not enough readable in plain text, the current scheme based on ISO 8601 may be kept (but should be verified to be absolutely clean of double lexical representations). Please note that while the representation in this case would not be culturally neutral, each timeInstant can with appropriate calculations be represented in a different calendar without problems.
[23] It may be reasonable to consider a datatype 'date', which is related to timeInstant but most probably best defined as a separate base type. 'month', 'year', and 'century' have to be removed for the reasons given above. It may be worth defining a 'composite' datatype 'actualTimePeriod', which consists of a start timeInstant and an end timeInstant. This would cover a lot more (and a lot more useful) cases in a much more uniform manner than what is currently possible, and could even replace 'date'.
[24] ISO 8601 is based on the Gregorian calendar, but there seems to be no indication as to whether this is applicable before 1582, nor how exactly it would be applied. Also, it is unclear how far into the future the Gregorian calendar will be used without corrections. A representation purely based on days and seconds would avoid these problems; if this is not possible, then the spec needs some additional explanations or references.
[25] Several details in appendix D have to be fixed. It has to be clear that leading zeroes for months and days are needed. Hours obviously go from 0 to 23, minutes from 0 to 59. Seconds indeed can go to 60 in the case of leap seconds, but only in that case.
[26] For international data interchange, a uniform way to transmit measurements not only for time lengths and time instants, but all kinds of other units, seems highly desirable. If this cannot be provided in the first version of XML Schema, it clearly should be taken up soon for the next version.
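The ordering problem raised in [18] can be made concrete with a few lines of Python. The sketch below is illustrative only: it uses whole months rather than the fractional P3.01M of the comment, assumes the Gregorian calendar as implemented by Python's `datetime` module, and the helper names `add_months` and `days_in_three_months` are hypothetical. It shows that the number of days covered by "three months" depends on the starting date, so a comparison with a fixed number of days (such as 90.5) has no single answer.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add whole months to a date.

    Assumes start.day <= 28 so that the resulting day always exists.
    """
    index = start.month - 1 + months
    return date(start.year + index // 12, index % 12 + 1, start.day)


def days_in_three_months(start: date) -> int:
    """Number of days spanned by a three-month period beginning at start."""
    return (add_months(start, 3) - start).days
```

Starting on 1 February 2001 the period covers 89 days, and starting on 1 June 2001 it covers 92 days; a duration of 90.5 days is longer than the first and shorter than the second.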
Discussed in face to face meeting of 1-2 August 2000.
The DT editors suggested we quickly step through the points raised.
Discussion on points 18 and 20 (time periods with only days and less): Strictly speaking, the value space of the underlying datatype specifies clearly that they are not culturally variable, but dependent on the Gregorian calendar; they are therefore not ambiguous and do not vary by culture. What's more, a day is not always 24 hours where 24 hours is a certain number of seconds, so the problem of variability will persist anyway. As has been made clear on the IG, applications do need to say "1 month" independent of the number of seconds. Since ISO 8601 is pegged to UTC, which has leap seconds, the only way to eliminate variability in the larger units would be to go to universal atomic time, which is expressed in seconds. Some WG members would prefer to use that in the scientific arena, but it is not what the commercial world uses. Several WG members agreed that it is unnatural to express time durations at that level of granularity. Several WG members observed that the world is filled with software libraries that know how to do Gregorian dates with years and months, and argued that we should not invent something new and difficult for which there is no software support.
Proposal: Shall we disallow years and months in time periods, so that they are written in terms of days, hours, minutes, seconds, and fractions?
The proposal failed. Our response to the commentators should be that this appears to us to be closer to the 80/20 point. The dependency on the Gregorian calendar reflects the fact that it is in widespread use (is, in fact, dominant) for commercial purposes; we want to support it. It would be useful to be able to support other calendars as well; we would like to see such support proposed as an alternative, perhaps for 1.1. We do not think that eliminating support for the dominant calendar is a good way of improving support for other calendars.
Item 23: A proposal that we adopt the more restricted duration as an alternative built-in type (now or later) failed to achieve consensus.
Item 25: Proposal: That we reduce the variation in lexical form by defining overflow rules and specifying that values for each unit must fall within certain ranges. (e.g. PT60M->PT1H etc).
Several WG members objected to this on the grounds that it contradicts conventional practice: leases run for 36 months, not three years, 90-day contracts are not three-month contracts, etc. PVB noted that ISO 8601 actually requires the overflow rules suggested by the commentators; it had been the editors' intention to specify those rules in the spec. The WG noted that, as currently defined, the value space for durations does not possess any distinction between (for example) 120 seconds and two minutes; this is because we decided early on that the value corresponding to a lexical form is its extension, not its intension. We had intended to define the overflow rules just in order to ensure that there were clear denotations for all lexical forms, and we would have a flat value space of abstract durations, with the rules for mapping coming out of ISO 8601.
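The kind of overflow rule under discussion (PT60M becoming PT1H) can be sketched as follows. This is an illustration only, not the rule set of ISO 8601 or of the specification: the carry thresholds (60 seconds per minute, 60 minutes per hour, 12 months per year) are assumptions of the sketch, hours and days are deliberately not carried, fractional seconds are omitted, and the names `Duration`, `apply_overflow`, and `to_lexical` are hypothetical.

```python
from typing import NamedTuple


class Duration(NamedTuple):
    """A time duration as six non-negative integer components."""
    years: int = 0
    months: int = 0
    days: int = 0
    hours: int = 0
    minutes: int = 0
    seconds: int = 0


def apply_overflow(d: Duration) -> Duration:
    """Carry seconds into minutes, minutes into hours, months into years.

    Assumed thresholds: 60 s per minute, 60 min per hour, 12 months per year.
    Hours are not carried into days and days are not carried into months,
    because those unit lengths are not fixed.
    """
    minutes, seconds = divmod(d.seconds, 60)
    minutes += d.minutes
    hours, minutes = divmod(minutes, 60)
    hours += d.hours
    years, months = divmod(d.months, 12)
    years += d.years
    return Duration(years, months, d.days, hours, minutes, seconds)


def to_lexical(d: Duration) -> str:
    """Render a duration in the PnYnMnDTnHnMnS style, omitting zero units."""
    date = "".join(f"{v}{u}" for v, u in
                   ((d.years, "Y"), (d.months, "M"), (d.days, "D")) if v)
    time = "".join(f"{v}{u}" for v, u in
                   ((d.hours, "H"), (d.minutes, "M"), (d.seconds, "S")) if v)
    if not date and not time:
        return "PT0S"
    return "P" + date + ("T" + time if time else "")


print(to_lexical(apply_overflow(Duration(minutes=60))))
print(to_lexical(apply_overflow(Duration(months=36))))
```

Under these assumed rules a 36-month lease is rewritten as three years, which is exactly the loss of conventional phrasing that several WG members objected to.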
Resolved: that we instruct editors to make overflow rules of 8601 more explicit in the specification. Dissenting: Olken.
Resolved (after further discussion): to define the value space of time duration as points in an n-dimensional space closed under addition but not under subtraction and to specify that the canonical form is the lexical form. Dissenting: Campbell.
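One possible reading of this resolution can be sketched as follows. The sketch assumes six dimensions (years, months, days, hours, minutes, seconds) with non-negative integer coordinates; the record itself says only "n-dimensional", so the choice of six, the non-negativity constraint, and the function names `add` and `subtract` are all assumptions made for illustration.

```python
from typing import Tuple

# Assumed layout: (years, months, days, hours, minutes, seconds),
# each coordinate a non-negative integer.
Duration = Tuple[int, int, int, int, int, int]


def add(a: Duration, b: Duration) -> Duration:
    """Componentwise sum; always stays inside the space."""
    return tuple(x + y for x, y in zip(a, b))


def subtract(a: Duration, b: Duration) -> Duration:
    """Componentwise difference; may fall outside the space."""
    diff = tuple(x - y for x, y in zip(a, b))
    if any(c < 0 for c in diff):
        raise ValueError(f"{diff} is not a point in the duration space")
    return diff


one_month: Duration = (0, 1, 0, 0, 0, 0)
thirty_days: Duration = (0, 0, 30, 0, 0, 0)

print(add(one_month, thirty_days))  # one month and thirty days

try:
    subtract(one_month, thirty_days)
except ValueError as err:
    print("not closed under subtraction:", err)
```

In such a space 120 seconds and two minutes are distinct points, since no coordinate is converted into another; that is consistent with taking the lexical form itself as the canonical form.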
Formal response to commentator.
On 12 October, Misha Wolf asks that the status of this issue be set to not-ok (member-only link).
Shall XML Schema be modified at the transfer syntax level, the abstract level, or both, to define separate element types for repetition (occurrence) indications, instead of using the minOccurs and maxOccurs attributes on both element elements and groups?
Input from XML Query WG:
XML Query Comments to XML Schema (3rd part)
Here is the third set of comments from the XML Query Working Group on the XML Schema last call Working Draft.
In this version, we address the following issues:
This list may not be exhaustive and the XML Query WG may provide additional feedback at a later date.
-- Philip Wadler, on behalf of the XML Query WG
1. Repetition
The grammar of regular expressions in DTDs features three separate operators, sequence (comma), choice (bar), and repeat (star). In XML Schema, the first two of these are denoted by `sequence' and `choice' elements. However, the third does not appear separately, and instead `minOccurs' and `maxOccurs' may appear on every particle. It would better reflect the underlying structure of regular expressions to have a separate `repeat' element, with `min' and `max' attributes.
For example, consider the DTD
    (a, b?, (c|d)+)
In the current XML Schema syntax, this is rendered as follows:
    <sequence>
      <element ref="a"/>
      <element ref="b" minOccurs="0" maxOccurs="1"/>
      <choice minOccurs="1" maxOccurs="unbounded">
        <element name="c"/>
        <element name="d"/>
      </choice>
    </sequence>
It would be better to use a syntax along the following lines:
    <sequence>
      <element name="a"/>
      <repeat minOccurs="0" maxOccurs="1">
        <element name="b"/>
      </repeat>
      <repeat minOccurs="1" maxOccurs="unbounded">
        <choice>
          <element name="c"/>
          <element name="d"/>
        </choice>
      </repeat>
    </sequence>
One could also define
    <star>...</star> to abbreviate <repeat minOccurs="0" maxOccurs="unbounded">...</repeat>
    <plus>...</plus> to abbreviate <repeat minOccurs="1" maxOccurs="unbounded">...</repeat>
    <option>...</option> to abbreviate <repeat minOccurs="0" maxOccurs="1">...</repeat>
With these abbreviations, the above becomes
    <sequence>
      <element name="a"/>
      <option>
        <element name="b"/>
      </option>
      <plus>
        <choice>
          <element name="c"/>
          <element name="d"/>
        </choice>
      </plus>
    </sequence>
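The correspondence between a separate repetition operator and the DTD operators `?`, `*`, and `+` can be made concrete with a small sketch. The classes `Element`, `Sequence`, `Choice`, and `Repeat` and the function `to_dtd` are hypothetical names invented for this illustration; they model the proposed structure and are not part of XML Schema or of any library.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Element:
    name: str


@dataclass
class Sequence:
    items: List["Particle"]


@dataclass
class Choice:
    items: List["Particle"]


@dataclass
class Repeat:
    body: "Particle"
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "unbounded"


Particle = Union[Element, Sequence, Choice, Repeat]

# Occurrence ranges that have a single DTD operator.
DTD_SUFFIX = {(0, 1): "?", (0, None): "*", (1, None): "+", (1, 1): ""}


def to_dtd(p: Particle) -> str:
    """Render a content model in DTD notation."""
    if isinstance(p, Element):
        return p.name
    if isinstance(p, Sequence):
        return "(" + ", ".join(to_dtd(i) for i in p.items) + ")"
    if isinstance(p, Choice):
        return "(" + "|".join(to_dtd(i) for i in p.items) + ")"
    if isinstance(p, Repeat):
        key = (p.min_occurs, p.max_occurs)
        if key not in DTD_SUFFIX:
            raise ValueError(f"no single DTD operator for range {key}")
        return to_dtd(p.body) + DTD_SUFFIX[key]
    raise TypeError(f"unknown particle: {p!r}")


model = Sequence([
    Element("a"),
    Repeat(Element("b"), 0, 1),                               # <option>
    Repeat(Choice([Element("c"), Element("d")]), 1, None),    # <plus>
])
print(to_dtd(model))
```

Because repetition is its own node here, sequence, choice, and repeat each map to exactly one operator of the regular-expression grammar; ranges such as 5 to 5 have no single DTD operator, which the sketch reports as an error rather than expanding.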
There are two related but separate questions here.
The examples above dealt with (a) for conciseness, but point (b) is equally important, if not more so. It is important that the PSV infoset have a simple and uniform structure to aid its use in query processing (and other processing).
This design is better for the following reasons.
    <element name="c" type="xsd:integer" fixed="5" minOccurs="5" maxOccurs="5"/>
The proposed new form
    <repeat min="5" max="5">
      <element name="c" fixed="5" type="xsd:integer"/>
    </repeat>
specifies much more clearly that there are 5 elements with fixed value 5.
Note that in XML Schema, by using the `group' element one can already structure specifications in a way similar to repeat.
    <sequence>
      <element name="a"/>
      <group minOccurs="0" maxOccurs="1">
        <element name="b"/>
      </group>
      <group minOccurs="1" maxOccurs="unbounded">
        <choice>
          <element name="c"/>
          <element name="d"/>
        </choice>
      </group>
    </sequence>
Thus, `repeat' introduces no new issues not already dealt with by XML Schema.
As a compromise position, some members of the Query working group felt that it would be acceptable for Schema to support both the current minOccurs and maxOccurs syntax and the new repeat syntax, so long as the PSV infoset used the equivalent of the repeat syntax.
There was no consensus in the WG in favor of making this change. Some WG members felt it would be an improvement to the transfer syntax (though probably not to the abstract component level); others felt it would be detrimental to the transfer syntax. All felt that it would delay the spec; even some of those who felt the change would be an improvement felt the improvement was too minor to be worth incurring the editorial and other costs.
Formal response to commentator. Philip Wadler responds (by private email) "I would like to dissent from this decision. I believe the awkward syntax will considerably impair learning and use of XML Schema."
Paul Cotton responds that the XML Query WG is "disappointed that the Schema WG did not accept our proposed change. We accept your decision but we point out that your response does not really give a technical rationale for the Schema WG decision.
"This is an issue which may impede alignment of Schema and Query and we will continue to work on it."