Reports

From the W3C SGML ERB to the SGML WG

And from the W3C XML ERB to the XML SIG


Compiled for the use of the WG and SIG by

C. M. Sperberg-McQueen

4 December 1997



This document contains the text of reports to the World-Wide-Web Consortium's SGML Work Group (SGML WG) and later XML Special Interest Group (XML SIG), from the SGML Editorial Review Board (SGML ERB) and later XML Work Group (XML WG), as posted to the appropriate email discussion lists. The text has not been changed substantively, though some typographic errors have been silently corrected, and asterisks and similar devices used to signal emphasis or list structure have typically been replaced by appropriate SGML tagging.

It is intended that this document reproduce all the reports which describe decisions taken by the ERB/WG and their rationales; if readers are aware of any such reports which have been overlooked, they should contact the author. The rationales sometimes given here are useful, but much of the reasoning behind the decisions summarized here lies in the extensive discussion in the WG/SIG before and after the decisions, which should be consulted by anyone interested in a fuller understanding of the decisions.

The reports are arranged chronologically by date of the meeting. A subject index would be desirable but would go beyond the time available for preparing this compilation.

9 October 1996


Date: Wed, 09 Oct 1996 12:56:39 -0700
From: Tim Bray <tbray@textuality.com>
Subject: Report from the SGML ERB meeting of Oct. 9th

The SGML ERB met Wed. Oct 9th and voted on quite a few items. In attendance: Bosak, Bray, Clark, DeRose, Hollander, Kimber, Maler, Paoli, Sperberg-McQueen, and Sharpe. Absent: Magliery and Connolly. By a recent resolution of the ERB, and at Dan Connolly's request, he is now a non-voting liaison member. Thus, "Unanimous" means 10 in favor. No votes were close enough that Tom's presence or absence would have made a difference.

Several issues were left unresolved at the end of the meeting; the ERB will be meeting tomorrow and Saturday to get through this stuff.

A.1 XML will have only one concrete syntax, fixed at XML specification time, not document-instance parse time (0.2, 13.3, 13.4).

Passed, Unanimous

A.2 All or virtually all the information provided by a normal SGML declaration will be fixed for all documents; no SGML declaration will be necessary. (Possible exception: character-set information may vary document to document, but will be conveyed in other ways.) (6.2.3)

Passed, Unanimous

A.3 XML will have no OMITTAG, DATATAG, SHORTREF, LINK, CONCUR, RANK, or SUBDOC features (7.3.1, 7.3.1.1, 7.3.1.2, 7.3.2, 7.4, 7.5, 7.6, 7.7, 7.8, 9.4.6, 11.2, 11.5, 11.6, 13.5).

Passed, Unanimous

A.4 XML will make only partial use of the SHORTTAG feature. But the final point, on omitted attribute-value specifications, raises the general question of how XML systems will behave when no DTD, or a partial DTD, is provided -- if such omitted or partial DTDs are allowed. It also raises the question of providing a way for a document to signal that its DTD can be skipped without loss of information (e.g. because it has no default attribute values, or no empty elements, etc.). These questions are to be discussed and decided separately.

Passed, Unanimous

A.5 XML will have no quantities or capacities (7.3.3, 7.4.2, 7.9.2, 7.9.4.5, 9.4.1, 9.4.2, 9.8, 11.3.1, 13.2).

Passed, Unanimous

A.6 XML will not allow asynchronous marked sections -- marked sections must begin and end in the same element.

Passed, Unanimous. As Harvey Bingham pointed out, this needs careful phrasing to avoid ambiguity.

A.7 Should XML have CDATA, RCDATA, and TEMP marked sections or not?

XML will have CDATA marked sections, which must begin with the 9-character literal string "<![CDATA[" and end with the 3-character literal string "]]>". This is essentially Charles Goldfarb's proposal, although we may not call them CLEARDATA.

XML will not have RCDATA or TEMP marked sections.

Both Unanimous.
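A minimal Python sketch of writing and reading CDATA marked sections with the delimiters fixed by A.7. Because a section ends at the first "]]>", text that itself contains "]]>" has to be split across two adjacent sections; that splitting trick is an assumption of this sketch, not part of the decision, and the function names are hypothetical.

```python
def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA marked section(s) using the A.7 delimiters."""
    # "]]>" cannot appear inside a section, so each occurrence is split:
    # the first section ends after "]]" and the next one begins with ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def unwrap_cdata(marked: str) -> str:
    """Recover the character data from one or more adjacent CDATA sections."""
    opener, closer = "<![CDATA[", "]]>"
    parts, pos = [], 0
    while pos < len(marked):
        if not marked.startswith(opener, pos):
            raise ValueError(f"expected '<![CDATA[' at offset {pos}")
        start = pos + len(opener)
        end = marked.index(closer, start)  # the first "]]>" ends the section
        parts.append(marked[start:end])
        pos = end + len(closer)
    return "".join(parts)
```

For ordinary text the result is a single section, e.g. wrap_cdata("<b>x</b>") gives "<![CDATA[<b>x</b>]]>".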

A.8 Should XML have INCLUDE and IGNORE marked sections or not? (If this question is answered YES, it leads to a separate question, how to achieve conditional inclusion in XML markup declarations. This related question is to be decided separately.)

Split in two: XML will not have INCLUDE or IGNORE marked sections in document instances; Unanimous. The question of conditional markup in declarations is still open.

A.9 XML will have no CDATA or RCDATA elements (11.2.3).

Passed, Unanimous.

A.10 How should XML escape markup delimiter characters in content (especially if (R)CDATA elements and marked sections are not allowed)?

Unanimously agreed that CDATA marked sections are to be used for blocks of text. See A.19 for more on this.

A.11 XML will retain the distinction between element content and mixed content (7.6, 11.2.4). (Applies only if DTD supplied and used.)

Passed, DeRose dissenting.

A.12 XML will require all attribute-value specifications to take the form of attribute-value literals (7.9.3, 7.9.3.1).

Passed, Unanimous.

A.13 XML will not allow RE to end an entity or character reference; an explicit refc must be provided, and it must be a semicolon (9.4.4).

Passed, Unanimous.

A.16 XML will stipulate that character references within processing instructions should be resolved by the XML parser (8).

Defeated, Sperberg-McQueen dissenting.

A.18 XML will have declarations for elements and attributes, but not for short-references or links (11.1).

Passed, Unanimous, for elements and attributes. Notations and entities remain open.

A.19 XML will retain fundamentally the same parsing rules as SGML, though they may be expressed differently. (N.B. there is some sentiment for making XML's rules more restrictive than SGML's.)

Agreed unanimously that the rules should be stricter than SGML in that the characters '&', '<', and '>' are deemed always to delimit markup, and must always be escaped, specifically as "&amp;", "&lt;", and "&gt;", when appearing in parsed character data. The ERB recognizes that this impinges on the user's name space in an un-SGML-like way, but feels that this has already, de facto, happened.
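The escaping rule in A.19 amounts to three string substitutions. A minimal Python sketch (the function names are hypothetical); Python's standard html.escape(text, quote=False) performs the same three replacements.

```python
def escape_pcdata(text: str) -> str:
    """Escape the three characters that always delimit markup (A.19)."""
    # '&' must go first; escaping it last would turn "&lt;" into "&amp;lt;".
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def unescape_pcdata(text: str) -> str:
    """Reverse escape_pcdata for these three references only."""
    # '&amp;' must go last, so that "&amp;lt;" yields the literal text "&lt;".
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")
```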

A.21 Like SGML, XML will forbid empty strings as attribute values for non-CDATA attributes, require FIXED attributes to take their default values (7.9.4.1, 7.9.4.2), and distinguish IMPLIED values from null-string values (11.3.4).

Passed, 7 in favor, DeRose, Hollander, and Sperberg-McQueen dissenting.

A.23 XML will have no CURRENT attributes, but it will have FIXED, REQUIRED, and IMPLIED attributes, and attributes with explicit defaults.

Passed, Unanimous.

A.24 Unlike SGML, XML will not allow direct references to external data entities from within parsed character data (9.4).

Passed, Unanimous.

A.25 Like SGML, XML will forbid recursive entity reference (9.4).

Passed, Unanimous.

A.26 Like SGML, XML will allow elements to be declared ANY (11.2.4). (Whether other similar shorthand declarations will be defined, e.g. for any subelements but not allowing PCDATA, will be decided separately.)

Passed, Bray dissenting.

A.27 XML will behave like SGML as regards behavior and precedence of occurrence indicators and connectors in content models (11.2.4.1, 11.2.4.2). (Whether to abolish the AND connector will be decided separately.)

Passed, Unanimous.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

16 October 1996


Date: Thu, 17 Oct 96 11:01:39 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: some ERB decisions

The SGML ERB met Wed. Oct 16th and voted on several items already submitted to the SGML WG. Participating: Bosak, Bray, Clark, Maler, Paoli, Sperberg-McQueen, and Sharpe. Absent: DeRose, Hollander, Kimber, Magliery, and Connolly. All decisions were by consensus of all those participating in the call, and thus carry a majority of the membership of the ERB.

Several issues were left unresolved at the end of the meeting; the ERB will be meeting today and Saturday to discuss them further and resolve them.

A.20 XML will retain the notion and syntax of comments (= 8879's 'comment declarations') (7.6, 10.3), but comment declarations will contain at most one comment: comments will take the form '<!>' or else will begin with '<!--' and end with '-->' (no space allowed), and may not contain '--'.

Comments will take the form '<!--' ... '-->', no internal '--' is allowed and no white space between the final '--' and the final '>'.

Empty comments (<!>) will not be allowed in XML.
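A minimal Python sketch of checking a single comment against the A.20 rules as stated above. Rejecting a body that ends in '-' is this sketch's reading of the no-internal-'--' rule (otherwise '--' would occur just before the closing '-->'); the function name is hypothetical.

```python
def is_xml_comment(s: str) -> bool:
    """Check one comment against the A.20 rules summarized above."""
    # The shortest legal comment is "<!---->" (7 characters, empty body).
    # The length test also rejects "<!>" and "<!-->", where the opening
    # and closing delimiters would overlap.
    if len(s) < 7 or not s.startswith("<!--") or not s.endswith("-->"):
        return False
    body = s[4:-3]
    # No '--' inside; a trailing '-' would form '--' with the closing '-->'.
    return "--" not in body and not body.endswith("-")
```

Because the closing delimiter is matched as the exact string "-->", a space between the final '--' and '>' is rejected automatically.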

B.1 What should XML's character-set rules be? Should conforming XML documents be restricted to particular character sets? Should conforming XML processors be required to be able to parse all conforming XML documents (13.1)?

Agreed:

Still open: details of the mechanism to be used for signaling the encoding and/or coded character set in use.

B.2 Should XML require each document instance to have a DTD or not (7.1)?

XML will not require each document instance to have a DTD.

Open question: details of partial DTDs or DTD summaries, if any, and possible declarations indicating whether the correct ESIS is derivable for a document if its DTD is not read.

B.4 Should XML forbid comments and processing instructions in mixed content, as a way of simplifying RE handling (7.6)?

Assuming that a satisfactory RE rule can be agreed on, XML will not forbid comments and processing instructions in mixed content.

B.5 Should XML restrict the use of the PCDATA token in content models, to simplify RE handling or eliminate the Mixed Content Problem? (7.6.1, 11.2.4)
B.5 restrict PCDATA to models of the form (#PCDATA)

No.

B.6 restrict PCDATA to models of the form (#PCDATA | x ... | z)*

Yes.

B.8 Should XML use MSOCHAR, MSSCHAR, and MSICHAR strings (9.7)?

No.

B.11 Should XML forbid, allow, or require empty end-tags (7.5)?

Forbid.

-C. M. Sperberg-McQueen
University of Illinois at Chicago

17 October 1996


Date: Thu, 17 Oct 96 14:10:29 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: B.1 and B.2 results

The SGML ERB met today, Oct 17th, and voted on several items already submitted to the SGML WG. Participating: Bosak, Clark, Maler (in part), Magliery (in part), Paoli, Sperberg-McQueen, and Sharpe. Absent: Bray, DeRose, Hollander, Kimber, and Connolly. All decisions were by consensus of all those participating in the call, and thus carry a majority of the membership of the ERB.

The text below is substantially the same as the drafts discussed by the ERB, but was edited after the meeting to reflect the ERB's decisions; the ERB has thus not seen and approved the precise wording given, and may choose to correct any editorial errors made in the revision.

-C. M. Sperberg-McQueen

Character-set Rules

B.1 What should XML's character-set rules be? Should conforming XML documents be restricted to particular character sets? Should conforming XML processors be required to be able to parse all conforming XML documents (13.1)?

It had already been agreed that:

In discussing the mechanism to be used for signaling the encoding and/or coded character set in use, the ERB decided the following. [Editorial note: if the ERB decides that XML will have external text entities, then everything said below about documents will also apply to all external text entities.]

The character repertoire of XML documents is that of ISO 10646. All XML processors are required to accept documents in the UTF-8 and UCS-2 encodings of 10646. It is recognized that accepting documents in the UTF-16 variant would be desirable. Documents encoded in UCS-2 must begin with the Byte Order Mark described by ISO 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, U+FEFF) -- this is an encoding signature, and not (for SGML purposes) part of the document. XML processors must be able to use this character to differentiate between UTF-8 and UCS-2 encoded documents.
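A minimal Python sketch of the signature check described above. Python's UTF-16 codecs stand in for UCS-2 here (they agree for characters in the Basic Multilingual Plane). Only the two byte orders of U+FEFF are tested, and everything else is read as UTF-8. The function names are hypothetical.

```python
import codecs


def sniff_encoding(data: bytes) -> tuple[str, int]:
    """Return (Python codec name, number of signature bytes to skip)."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le", 2
    # No UCS-2 signature: the document is taken to be UTF-8.
    return "utf-8", 0


def decode_document(data: bytes) -> str:
    codec, skip = sniff_encoding(data)
    # The signature is not part of the document, so it is dropped first.
    return data[skip:].decode(codec)
```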

XML does not explicitly sanction the use of any other encodings. It is recognized, however, that many documents exist in other encodings. To support processors in dealing with this situation, an XML document may contain at its beginning, before any other text, markup, PIs, or white space, an Encoding Declaration PI matching

 
EncDecl ::=
  '<?XML' S 'encoding' Eq (("'" Encoding "'") | ('"' Encoding '"')) S? '>'
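The production leaves S, Eq, and Encoding undefined. Assuming Eq ::= S? '=' S?, that S is space, tab, CR, or LF, and that encoding names use letters, digits, '.', '_', and '-' (all assumptions), the declaration can be recognized with a Python regular expression. The function name is hypothetical.

```python
import re

S = r"[ \t\r\n]"               # assumed white-space characters
ENC_NAME = r"[A-Za-z0-9._-]+"  # assumed shape of an encoding name
ENC_DECL = re.compile(
    r"<\?XML" + S + r"+encoding" + S + r"*=" + S + r"*"  # Eq assumed S? '=' S?
    + r"(?:'(?P<sq>" + ENC_NAME + r")'"                   # 'single-quoted' name
    + r'|"(?P<dq>' + ENC_NAME + r')")'                   # or "double-quoted" name
    + S + r"*>"
)


def declared_encoding(text: str):
    """Return the encoding named by a leading Encoding Declaration PI, or None."""
    # match() anchors at offset 0: nothing may precede the declaration.
    m = ENC_DECL.match(text)
    if m is None:
        return None
    return m.group("sq") or m.group("dq")
```

Anchoring the match at the first character enforces the rule that no text, markup, PI, or white space may precede the declaration.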

An XML processor may choose to read Encoding Declaration PIs and accept nonstandard encodings so declared. In validating processors such behavior must be at user option.

An XML document which lacks both the Byte Order Mark and an Encoding Declaration PI must be in the UTF-8 encoding. It is an error for a document to be in an encoding other than that declared in its Encoding Declaration PI.

The XML specification shall include (possibly by reference to relevant IETF documentation) a list of standard declarations for the nonterminal "Encoding" in the above production, to support interoperability, including names for at least ISO-Latin-X and the JIS family.

DTDs

B.2 Should XML require each document instance to have a DTD or not (7.1)?

In discussing this item, the ERB made the following decisions:

1. Well-formedness

The XML spec shall define two characteristics which an XML document may possess, called "well-formedness" and "validity". A well-formed document, informally, is one for which no content model checking has been done, but which can be read by an XML processor with confidence in producing a correct ESIS.

Questions remaining open include:

  1. the specific definition of well-formedness -- it is expected to include at least (1) a containing root element with no text outside it, (2) properly nested elements, (3) properly structured tags, and possibly other constraints on entity references, empty elements, etc.
  2. whether two distinct levels of well-formedness (e.g. strong and weak) are necessary
  3. the nature of well-formedness when there is no DTD or only a partial DTD.

2. Required Markup Declaration (votable Y/N)

XML markup declarations are divided into DTDs pointed at by the <!DOCTYPE, and internal subsets contained within the <!DOCTYPE. Markup declarations necessary to produce a correct parse may be contained either in the DTD or in the subset. XML will include a signalling method whereby instances may contain statements indicating whether the declarations in the DTD and/or the subset are necessary to produce a correct parse.

XML documents may contain a Required Markup Declaration PI as follows:

 
RMDDecl ::= '<?XML' S 'rmd' Eq ('NONE'|'INTERNAL'|'ALL') S? '>'

The RMD PI must appear after the Encoding Declaration PI, if any, and before the document type declaration itself, if any.

Should the RMD state that the DTD is required ('DTD' or 'ALL'), it is a reportable error if the DTD cannot be retrieved.

3. Interpretation of Required Markup Declaration

If no RMD PI is given, then

If an RMD PI is given, then

19 October 1996


Date: Sat, 19 Oct 96 12:33:48 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decisions on A.17, B.9, and other questions

The SGML ERB met today, Saturday Oct. 19th, and voted on several items already discussed by the SGML WG. Participating: Bosak, Clark, Kimber, Maler, Paoli, Sharpe, Sperberg-McQueen. Absent: Bray (represented in part by written votes on open issues), DeRose, Hollander, Magliery. All decisions were by consensus of all those participating in the call, and thus carry a majority of the membership of the ERB.

I should note that the wording of the rationales given below reflects the understanding, and is the responsibility, of the author. The rationales have not been reviewed or approved by the ERB; they are thus subject to correction when I have misunderstood or misstated the ERB's intention.

The ERB agreed on the following position statements:

The rationale for the list and for its inclusion in the specification is to allow some topics to be postponed until there is more time for their resolution, and to inform users of XML of the expected lines of development.

In the light of these agreements, the ERB reconfirmed its earlier decision that XML 1.0 will not have SDATA entities. It is thought that most uses of SDATA entities are adequately served by character references to Unicode characters (see example below). Techniques for dealing with non-Unicode characters, specification of glyphs rather than characters, and related topics (such as possible mechanisms for document private agreements governing the ISO 10646 Private Use Areas) will be addressed in future revisions.

Instead of a declaration like

  <!ENTITY auml SDATA "[auml    ]">

any XML processor can work properly with a declaration of the form

  <!ENTITY auml "&#228;"> <!-- auml = a umlaut, U+00E4 -->
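Expanding a replacement text like the one above only requires turning each decimal character reference into the ISO 10646 character it names. A minimal Python sketch that handles decimal references only and leaves entity references untouched (the function name is hypothetical):

```python
import re

CHAR_REF = re.compile(r"&#([0-9]+);")


def expand_char_refs(literal: str) -> str:
    # Each decimal reference is the code point itself, so chr() suffices.
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), literal)
```

With this, the replacement text "&#228;" for auml becomes the single character U+00E4.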

On question A.17 (Should XML have entities or not?), the ERB had already decided that XML would have internal entities (either text or CDATA, not both). Today we decided further:

The rationale for allowing internal text entities was this: CDATA entities are very easy to implement (because they need not be expanded at parse time, but can be expanded later without changing the structure of the parse tree); text entities are more complex (if they are synchronous, they may require the replacement of a leaf node with an arbitrarily complex subtree; if they are asynchronous, they must be expanded at parse time and complicate the parser). Nevertheless, internal text entities are so useful to the user that they justify the cost of implementation.

Whether XML will have external text entities remains an open question.

On question B.9, the ERB decided:

Whether system identifiers in XML 1.0 will be allowed to carry the <url> label remains an open question.

Addition of public identifiers and extension of system identifiers to other formats will be taken up in preparation of future versions of XML.

The rationale for these decisions was that URLs are well understood and well established, and can handle both remote and local addresses. Restricting external identifiers to URLs helps keep the specification simple. In the long run, however, public identifiers are desired by many users and may provide solutions to the well-known fragility problems associated with URLs. Better infrastructure, in the form of catalog management tools and HTTP-based catalog resolution services, would help make the introduction of public identifiers into XML smoother.

-C. M. Sperberg-McQueen

23 October 1996


Date: Wed, 23 Oct 96 17:39:24 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decisions, 23 October 1996

The ERB met today, 23 October 1996, and decided a number of questions. All members of the ERB were present (Bosak, Bray, Clark, DeRose, Hollander, Kimber, Magliery, Maler, Paoli, Sharpe, Sperberg-McQueen); decisions were taken by consensus except as noted.

As usual, summaries of the rationale for the decisions made have not been reviewed by the ERB and are thus subject to correction and further explanation.

A.17 Should XML have entities, or not?

The ERB had already agreed that XML should have internal text entities and external NDATA entities. Today, after discussion, we agreed that support for external text entities would be an optional feature of XML 1.0 (dissenting: Clark, Paoli, Sharpe).

The rationale for the decision was that support for external entities is (a) essential if XML is to be useful as an authoring language, but (b) a heavy burden for network-based client software. A proposal to define XML in such a way that external text entities were legal only if in local files (and thus not legal in network use of XML) attracted some support, but not enough.

The dissenting view on this decision was that allowing an optional feature and losing the monolithic definition of XML was too high a cost; the dissenters all also felt that external text entities should be disallowed unconditionally.

External text entities will be placed on the list of topics to be reviewed in preparing future versions of XML.

This topic may also be revisited in the near future (i.e. before version 1.0), depending on reports on the progress and status of W3C-based work on this and related topics.

The question of SDATA entities will be taken up again before XML 1.0 is published.

C.1 Should XML require all entities to be synchronous with the document's logical structure?

Agreed unanimously that XML will require all entities to be synchronous with the document's element structure.

The rationale is that this simplifies parsing somewhat, allows entity expansion to be delayed if the implementation desires to do so, and makes possible simple checks for the well-formedness of external (and internal) entities.

C.2 Should XML prescribe the use of an ENTITY-END character as the canonical method of handling entity boundaries, as a way of simplifying exposition and implementation (6.2.2)?

Agreed unanimously not to prescribe any particular method of handling entity ends; rationale: the proposal would tend to confuse, not simplify, the issue.

C.3 Should XML retain or relax SGML's prohibition on ENTITY attributes referring to SGML text entities (7.9.4.3)?

Agreed unanimously to retain the prohibition. Rationale: compatibility.

C.4 If XML makes DTDs optional and allows partial DTDs, what must or may a parser do when it encounters references to undeclared entities (9.4)? Should XML declare any set of entities automatically?

Agreed unanimously that reference to an entity not declared and not included in the list of 'automatic' declarations is a reportable error. No particular error recovery strategy will be prescribed. Rationale: defining this as a non-error would weaken validation too much; error recovery should be left to the implementation, as different strategies are appropriate for different purposes.

Agreed unanimously to define automatically the entities lt, gt, amp, and two entities for double and single quotation (for use in attribute value literals), names to be determined in separate discussion.

Proposals to declare other sets of entities automatically (e.g. all of ISO Latin 1 or all entities declared in HTML 3.2) remain open questions.

C.5 If XML uses ISO 10646, should there be a special form of character reference using hexadecimal, not decimal, numbers, since most references to ISO 10646 and Unicode use hex, not decimal (9.5)?

Agreed (Clark dissenting) to specify that XML documents may refer to characters in ISO 10646 using the form '&u-' or '&U-' followed by four hexadecimal digits, followed by a semicolon.

Rationale: Unicode and ISO 10646 documentation is in hexadecimal, not decimal, so this constitutes a small but important convenience and aid to reliability. The proposal to use '&u' was preferred to the '&#u' proposal since it is believed to allow SGML systems to handle these references (which appear to an SGML parser to be general entity references) using a default entity declaration. (Consult James Clark for details.)
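A minimal Python sketch of resolving references in the '&u-'/'&U-' form decided under C.5: exactly four hexadecimal digits, then a semicolon. The function name is hypothetical, and references with the wrong number of digits are simply left untouched rather than reported as errors.

```python
import re

# '&u-' or '&U-', exactly four hexadecimal digits, then ';'.
HEX_REF = re.compile(r"&[uU]-([0-9A-Fa-f]{4});")


def expand_hex_refs(text: str) -> str:
    # The four hex digits are the ISO 10646 code point, read in base 16.
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)
```

For example, "M&u-00FC;ller" expands to "Müller".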

C.6 Should XML retain SGML's prohibition on multiple declarations for the same element (11.2.1)?

Agreed unanimously to retain the prohibition. Rationale: compatibility. (Some ERB members may also apply the same rationale as for the dissent on question C.8.)

C.7 Should XML prohibit the use of inclusion and exclusion exceptions in element declarations (11.2.4, 11.2.5)?

Agreed unanimously to prohibit their use in XML 1.0, and (with dissents from Bray, Magliery, and Sperberg-McQueen) to place them on the list of topics to be considered in preparing future versions. Rationale: simplification of validation and harmonization of XML parsing model with standard formal-language theory and practice.

C.8 Should XML prohibit content-model references to undeclared elements (11.2.4)?

Agreed (Bray, DeRose, and Sharpe dissenting) to allow such references. Rationale: this is a useful technique in the construction of large public DTDs which may be subsetted locally or document-by-document. Rationale for the dissent: clean grammars are easier to process and parse than dirty grammars. (N.B. 'clean' and 'dirty' here have the technical senses usual in discussions of formal grammars.)

C.9 Should XML forbid use of the '&' connector in content models (11.2.4.1)?

Agreed unanimously to forbid use of the '&' connector in XML. Rationale: harmonization with conventional regular expressions.

C.11 Should XML retain SGML's prohibition on multiple attribute-list declarations for the same element (11.3.1) or on multiple declarations for the same attribute (11.3.2)?

Agreed unanimously to retain the prohibition. Rationale: compatibility. (Some ERB members may also apply the same rationale as for the dissent on question C.8.)

C.12 Should XML change the set of types available for attributes? E.g. by suppressing NAME(S), NUMBER(S), NMTOKEN(S), NUTOKEN(S) and adding constraints in the form of regular expressions, ISO dates, language-code, external-id, type IDREF, ... (7.9.4, 11.3.3)

After discussion, agreed unanimously that XML should have the following attribute types: ID, IDREF, IDREFS, ENTITY, ENTITIES, CDATA, enumerated attribute types, NOTATION attribute type, NMTOKEN and NMTOKENS. The types NUMBER(S), NUTOKEN(S), and NAME(S) are to be dropped.

Rationale: the distinctions among the lexically defined types are not useful enough to justify retaining all of them, but they do provide convenient case-folding and white-space normalization. If just one is to be kept, it should be NMTOKENS, since it subsumes all the others and the other lexical types of SGML can be translated into XML by retyping them as NMTOKENS and adding an application-level check on the specific type of token required. Such application-level checks are in any case common among users of these types. The type NMTOKEN was retained in order to preserve the singular/plural symmetry with IDREF and ENTITY.

Extensions to the set of declared-value types in ISO 8879, though supported by Sperberg-McQueen, commanded no support for inclusion in XML 1.0.
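A minimal Python sketch of the retyping described in the C.12 rationale: an SGML NUMBERS attribute becomes NMTOKENS, and the numeric check moves into the application. The name-character set below (letters, digits, '.', '-', as in SGML's reference concrete syntax) is an assumption, since XML's own name characters are not settled in these reports. The function names are hypothetical.

```python
import re

# Assumed name characters: letters, digits, '.', '-'.
NMTOKEN = re.compile(r"[A-Za-z0-9.-]+")


def parse_nmtokens(value: str) -> list[str]:
    """Normalize white space in an NMTOKENS value and check each token."""
    tokens = value.split()  # no argument: splits on, and discards, any run of white space
    if not tokens:
        raise ValueError("an NMTOKENS value needs at least one token")
    for tok in tokens:
        if NMTOKEN.fullmatch(tok) is None:
            raise ValueError(f"not a name token: {tok!r}")
    return tokens


def check_numbers(value: str) -> list[int]:
    """Application-level check standing in for SGML's NUMBERS type."""
    tokens = parse_nmtokens(value)
    for tok in tokens:
        if not tok.isdigit():  # ASCII digits only, given the NMTOKEN pattern
            raise ValueError(f"not a number: {tok!r}")
    return [int(tok) for tok in tokens]
```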

Other decisions in batch C are still pending.

-C. M. Sperberg-McQueen

24 October 1996


Date: Thu, 24 Oct 96 13:02:11 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decisions, 24 October 1996

The ERB met today, 24 October 1996, and decided a number of questions. Present: Bosak, Clark, Kimber, Magliery, Maler, Paoli, Sharpe, Sperberg-McQueen; absent: Bray (represented in part by proxy votes), DeRose, Hollander. Decisions were taken by consensus except as noted.

As usual, summaries of the rationale for the decisions made have not been reviewed by the ERB and are thus subject to correction and further explanation.

A.15' XML will use a sort of 'formal processing instruction': the first token of the PI's system data will be a Name (e.g. <?TeX \vskip> or <?application-name application-specific instructions>) (7.6, 8)
Should the Name be required to be the name of a declared NOTATION?

Agreed unanimously that the Name need not be that of a declared NOTATION; if it is, however, the spec should state that the meaning is that the PI in question is in the notation (or: appertains to the notation processor) indicated.

Rationale: making the association explicit is a useful semantic clue, but requiring it is excessively burdensome.
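A minimal Python sketch of splitting a PI of the A.15' form into its leading Name and the remaining system data. It includes an optional lookup against declared notations, which serves only as a hint and is never required. The name pattern and the use of '>' as the PI close (as in the examples given under A.15') are assumptions, and the function name is hypothetical.

```python
import re

# Assumed Name: a letter followed by letters, digits, '.', or '-'.
PI = re.compile(r"<\?([A-Za-z][A-Za-z0-9.-]*)(?:[ \t\r\n]+([^>]*))?>")


def split_pi(pi: str, notations=frozenset()):
    """Return (name, system data, whether name is a declared NOTATION)."""
    m = PI.fullmatch(pi)
    if m is None:
        raise ValueError("PI does not begin with a Name")
    name, data = m.group(1), m.group(2) or ""
    # A matching NOTATION is a semantic clue only; it is not required.
    return name, data, name in notations
```

For example, split_pi("<?TeX \\vskip>") yields the name "TeX" and the data "\vskip".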

A.18' Should XML have declarations for notations (11.1)?

Agreed unanimously that it should. Rationale: needed for NDATA entities (and PIs).

B.12 Should XML retain SGML's prohibition on multiple declarations for the same notation (11.4)?

Agreed unanimously to retain the prohibition. Rationale: compatibility.

B.13 Should XML remove SGML's prohibition on ENTITY attributes for notations (11.4.1)?

Agreed unanimously to retain the prohibition. Rationale: compatibility.

B.13 bis. Should XML allow any attributes at all for notations (11.4.1)?

Agreed (EK dissenting) to drop attributes on notations in XML 1.0. Agreed (MSM and EM dissenting) to place this topic on the list of topics to be (re-)considered in the preparation of future revisions of XML.

C.13 Should XML remove SGML's prohibition on multiple ID or NOTATION attributes on the same element (11.3.3)?

Agreed unanimously to retain the prohibition. Rationale: compatibility.

C.15 Should XML define new specific methods of inferring values for attributes with no attribute-value specifications (11.3.4)? E.g. INHERITED, to signify that the value is taken from the attribute of the same name (and type) on the smallest enclosing element with such an attribute.

Agreed (MSM dissenting) to define neither INHERITED nor any other new method of value-inference. Rationale: this topic will be treated in the second stage of the project.

Several other topics were discussed without achieving consensus. The ERB will meet again Saturday to continue these discussions.

-C. M. Sperberg-McQueen

26 October 1996


Date: Sat, 26 Oct 1996 11:37:27 -0700
From: Tim Bray <tbray@textuality.com>
Subject: SGML ERB Meeting of October 26: RE's resolved!

The ERB met on Saturday October 26. All members were present.

As usual, summaries of the rationale for the decisions made have not been reviewed by the ERB and are thus subject to correction and further explanation.

1. Reservation of name space

Agreed unanimously to add "." to the set of legal name-start characters for XML, and to reserve the portion of all name-spaces beginning ".XML." for the purposes of the language. This includes at least element GI's, attribute names, entity names, and element ID's.

2. RS/RE handling

Executive summary: use HTML rules, but provide an escape to RE Delenda Est.

Agreed unanimously, except for one sub-clause noted below, that:

2.1 There will be a mechanism, using a reserved attribute, to toggle, per element, between two modes of white-space handling. In "White Space Preservation" mode, all white space including RE is passed through to the application, with the exception of a single leading and trailing RE if they are alone on a line with the start- or end-tag. Note: "alone on a line with" assumes that comments have already been stripped. In "White Space Collapse" mode, all initial and trailing white space in an element is eaten by the parser, and all internal white space, including successive blank lines, is replaced by a single space character before passing to the application.

2.2 The setting of this toggle is by default inherited from the parent element. The root element of any document, by default, has the toggle set to "White Space Collapse" mode.

2.3 The White Space mode is orthogonal to the use of CDATA marked sections; that is to say, CDATA marked sections will still ignore markup delimiters, but will respect the current White Space mode. [On this decision, Bray, Sperberg-McQueen, and Sharpe dissented, preferring to have White Space Preservation mode built into CDATA marked sections.]

Notes and Rationale:

  1. A large part of the world builds and uses applications that generally collapse white space. The objective of causing the minimal number of surprises mandates this default behavior.
  2. If XML is going to deviate from 8879 compatibility, it should be in a subtractive way; i.e., XML should eat at least as many RE's as SGML does. The HTML behavior seems the simplest way to achieve this.
  3. James Clark feels that there is a slight 8879 incompatibility in that if there are comments or PI's in an element that has White Space Preservation set, SGML will eat some RE's that XML will pass through. Clearly, given the vote, this is felt to be liveable.
  4. I have since discovered that beginning GI's with .XML. may cause massive incompatibility with CSS - we may end up having to reserve "-XML-".
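A minimal Python sketch of rules 2.1 and 2.2. It assumes that RE is represented as "\n", that comments have already been stripped, and that the reserved attribute has been reduced to a per-element setting. Its name and values are not given in this report, so the constants below are hypothetical. CDATA marked sections (2.3) are not modeled.

```python
import re

COLLAPSE, PRESERVE = "collapse", "preserve"  # hypothetical mode values


def effective_mode(own_setting, parent_mode):
    """2.2: inherit from the parent unless the element sets the toggle.

    For the root element, pass parent_mode=None; it then defaults to
    White Space Collapse mode.
    """
    if own_setting is not None:
        return own_setting
    return parent_mode if parent_mode is not None else COLLAPSE


def apply_white_space(text: str, mode: str) -> str:
    """Apply 2.1 to the character data of one element."""
    if mode == COLLAPSE:
        # Every run of white space (blank lines included) becomes one
        # space; leading and trailing white space is dropped.
        return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")
    # Preservation: drop only a single RE right after the start-tag and a
    # single RE right before the end-tag.
    if text.startswith("\n"):
        text = text[1:]
    if text.endswith("\n"):
        text = text[:-1]
    return text
```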

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

30 October 1996


Date: Wed, 30 Oct 96 13:34:02 CST
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB meeting, 30 October 1996

The ERB met this morning, 30 October 1996. Present: Bosak, Bray, Clark, DeRose, Hollander, Kimber, Maler, Paoli, Sharpe, Sperberg-McQueen. Absent: Magliery.

The rationale given has not been checked by the ERB and is subject to correction and supplementation.

B.10 What form should EMPTY elements take, if there are EMPTY elements in XML: <e>, <e/>, <e></e>, or <@e> (where the NET string is assumed to be '/>' and '@' is assumed to be an XML-specific flag for names of EMPTY elements; in SGML systems, '@' would be added to the set of name-start characters).

Agreed unanimously:

Rationale: Allowing the form <e> simplifies learning and conversion for existing SGML and HTML documents and users -- one of the rare cases where these two populations seem to have the same requirement. Allowing some self-identifying form simplifies the parsing of documents significantly, and makes it much easier to work without explicit declarations. Allowing both forms was felt to be a useful compromise -- part of the committee would have preferred to allow only one form, but was evenly split between the 8879 form and the self-identifying form. The entire committee felt unanimously, however, that allowing both forms was workable, particularly if the spec makes reasonably clear that one is the preferred form and the other is included only for compatibility reasons.

The choice among the proposed self-identifying forms was motivated in part by pragmatic considerations and in part by aesthetics. If XML EMPTY elements carry end-tags then the EMPTY keyword will have different meanings to an XML and an SGML system; this was felt to entail too many complications, so <e></e> was ruled out. The form <@e> was not felt significantly easier or harder to implement than the form <e/>, though this may vary in different implementations. Both <@e> and <e/> may be compatible or incompatible with whatever delimiters for empty elements are present in SGML-97. There was a clear preference, however, for the form <e/>, based in part on the visual effect of the slash.
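To illustrate why a self-identifying form simplifies parsing without declarations, the following minimal Python sketch classifies a single tag. The regular expression, the function `classify`, and its return values are hypothetical, and the sketch ignores details such as '>' inside quoted attribute values. `<e/>` is recognized as empty from its syntax alone, whereas `<e>` can only be resolved if the caller supplies the set of element types declared EMPTY.

```python
import re
from typing import Optional, Set

# Simplified tag syntax: optional "/", a name, anything but "<" or ">",
# then an optional "/" before the closing ">".
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^<>]*?)(/?)>")


def classify(tag: str, declared_empty: Optional[Set[str]] = None) -> str:
    """Return "end", "empty", "start", or "unknown" for a single tag.

    declared_empty is the set of element types declared EMPTY, or None
    when no declarations are available."""
    m = TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a tag: {tag!r}")
    end_slash, name, _attributes, empty_slash = m.groups()
    if end_slash:
        return "end"
    if empty_slash:
        return "empty"    # <e/> identifies itself; no declaration needed
    if declared_empty is None:
        return "unknown"  # <e> might be EMPTY, or might be awaiting </e>
    return "empty" if name in declared_empty else "start"


print(classify("<br/>"))         # empty
print(classify("<br>"))          # unknown
print(classify("<br>", {"br"}))  # empty
```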

-C. M. Sperberg-McQueen

31 October 1996


Date: Thu, 31 Oct 96 13:01:34 CST
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decision, 31 October 1996

The ERB met today, 31 October 1996. Present: Bosak, Bray, Clark, Hollander, Kimber, Maler, Paoli, Sharpe, Sperberg-McQueen. Absent: DeRose, Magliery.

We discussed the issue of external text entities and agreed unanimously to rescind the decision of 23 October on question A.17 making support for external text entities an optional feature of XML. Instead, the XML spec will distinguish the treatment required for external text entities in validating and non-validating systems, using language something like this:

A reference to an external text entity is a signal to an XML processor that the entity's contents are to be included at the point of the reference. For purposes of validation, an XML processor must fetch and read the external text entity at the point of reference; for other purposes, processor behavior is not constrained.

This is draft, not final, wording.

If a network client chooses to fetch and process external entities at the point of reference, it may do so; if it chooses instead to insert a small icon, and fetch and display the entity only on request, or behave in some other way, it may do that too. This has the effect that, if they wish, browsers may from a technical point of view treat external text entities much the way they treat links to other documents or links to embedded graphics -- the user interface may well differ, but text entities do not require a browser to change its internal organization or way of working.

Rationale: distinguishing validation behavior from other behavior allows us to reduce the number of optional features in XML to zero, while retaining (a) the information provider's ability to segment documents into several files and (b) the network client's ability to handle documents without required waits during network fetches. Since text entities in XML must be synchronous, adding an entity to the data structure requires only the replacement of a leaf with one or more subtrees; this allows entity fetching to be delayed if delay is useful to the application.

As usual, the rationale just given is subject to correction.
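The leaf-replacement argument in the rationale above can be sketched in Python as follows. The `Element` and `EntityRef` classes and the `fetch` callback are hypothetical stand-ins (`fetch` might perform a network retrieval). The sketch assumes, as the rationale does, that every external text entity is synchronous, so its content is one or more complete subtrees or text nodes that can be spliced in wherever the reference leaf sits, whenever the application chooses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class EntityRef:
    """A leaf standing in for a not-yet-fetched external text entity."""
    name: str


@dataclass
class Element:
    gi: str
    children: List[Union["Element", EntityRef, str]] = field(default_factory=list)


Node = Union[Element, EntityRef, str]


def expand(elem: Element, fetch: Callable[[str], List[Node]]) -> None:
    """Replace every EntityRef leaf under elem with the nodes fetch returns.

    Because the entity is synchronous, the splice touches only the parent's
    child list; nothing else in the tree has to change. References inside
    the fetched content are left for a later call."""
    new_children: List[Node] = []
    for child in elem.children:
        if isinstance(child, EntityRef):
            new_children.extend(fetch(child.name))
        else:
            if isinstance(child, Element):
                expand(child, fetch)
            new_children.append(child)
    elem.children = new_children


# A browser could display the EntityRef as an icon and expand only on request.
doc = Element("book", [Element("chap", ["One"]), EntityRef("chap2")])
store = {"chap2": [Element("chap", ["Two"])]}  # stands in for a network fetch
expand(doc, lambda name: store[name])
print([c.gi for c in doc.children])  # ['chap', 'chap']
```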

-C. M. Sperberg-McQueen

6 November 1996


Date: Wed, 06 Nov 1996 12:30:23 -0800
From: Tim Bray <tbray@textuality.com>
Subject: Recent ERB votes

In a recent series of mail votes and meetings, the ERB has resolved several XML design issues. Under pressure of time, we moved very rapidly and votes may not have been fully and exactly recorded where the sense of the ERB on some issue became quickly obvious. It is possible that ERB members may wish to correct their reported votes. As always, accompanying rationales, where present, have not been reviewed by the ERB and may be subject to correction.

[No item number] Decided unanimously to change PIC for XML to be '?>'. This will allow a lot of things to fit into PI's that currently can't (most notably some proposed server-side scripting languages).

A.8, B.7 XML will have INCLUDE/IGNORE marked sections in DTD's

Passed, Bray and Paoli dissenting.

A.20' XML will change the COM delimiter from '--' to some other string, to minimize user errors. (Candidates: !!, /*, //, **, ??, ;;, ~~, !?, ?!, (), [], others ...)

Defeated, Sperberg-McQueen voting in favor

A.22 XML will have no CONREF attributes (11.3.3, 7.3, 7.9.4.4).

Passed (no CONREF), Kimber and Maler dissenting

B.9' Should XML require system and public identifiers to be FORMAL (13.5)?

This had actually become a discussion of whether to allow the <url> formulation in front of external identifiers, which must be URL's in XML.

Decided, DeRose, Kimber, Maler, and Sperberg-McQueen dissenting, not to allow the <url> prefix.

C.10 Should XML allow nondeterministic content models (11.2.4.3)?

Voted (Bray, Paoli, and Sharpe dissenting) to retain SGML's restriction in this area.

Rationale: Existing SGML tools, for example the SP family, have this rule wired deeply into their logic, and those who wish to use these tools on XML documents won't be able to if they have non-deterministic content models.

C.14 Should XML allow more than one enumerated type (name-group declared value) to contain the same possible value (11.3.3)?

Voted unanimously to remove SGML's restriction in this area.

Rationale: This is incompatible with 8879, but there is every expectation that WG8 will fix this problem soon; furthermore, making this change is not expected to cause serious inconvenience to existing SGML products, whereas the rule is a very serious inconvenience to users of XML and authors of XML software.

D.2 Should XML provide shorthand ways of summarizing the salient points of a document's DTD?

Discussion:

This turned out to be one of the hardest problems the ERB dealt with, and the key issue became that of EMPTY elements. Remember that in a previous decision we had agreed to recommend the <e/> syntax, but accept the 8879 syntax. Here are some of the sticky parts:

Bearing all this in mind, the ERB voted, Maler dissenting, that:

Rationale: For technical reasons, requiring and allowing only <e/> is a big winner. However, many of us who anticipate an uphill struggle selling XML to web-heads felt that the marketing advantages of making it possible for HTML documents to be valid, and of being able to say "XML processors can read HTML", were impossible to give up. In opposition, Eve Maler in particular felt it was unconscionable to kowtow to the requirements of one particular DTD. The ERB acknowledged that allowing <BR>, etc., does not enable XML to grandfather, on a large scale, the existing inventory of HTML; it simply allows us to state that (at least some normalized) HTML documents can be read by XML processors.

D.3 Should XML specify short-hand element declaration keywords (e.g. %ANY-ELEMENT;) for element content in which any element in the DTD is legal (same as ANY, but element not mixed content)?

Defeated, Sperberg-McQueen voting in favor.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

7-9 November 1996

COM Delimiter


Date: Sat, 9 Nov 1996 14:51:08 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: A.20' (COM delimiter)

This is one of a series of reports on recent decisions of the SGML ERB.

A.20' XML will change the COM delimiter from '--' to some other string, to minimize user errors. (Candidates: !!, /*, //, **, ??, ;;, ~~, !?, ?!, (), [], others ...)

Decision: No. Dissenting: Sperberg-McQueen.

Conditional Sections in DTDs


Date: Sat, 9 Nov 1996 14:52:45 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: A.8, B.7 (INCLUDE/IGNORE in DTDs)

This is one of a series of reports on recent decisions of the SGML ERB.

A.8, B.7 (merged) XML will have INCLUDE and IGNORE marked sections in DTDs.

Decision: Yes. Dissenting: Paoli, Bray.

CONREF Attributes


Date: Sat, 9 Nov 1996 14:55:09 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: A.22 (no CONREF)

This is one of a series of reports on recent decisions of the SGML ERB.

A.22 XML will have no CONREF attributes (11.3.3, 7.3, 7.9.4.4).

Decision: Yes (no CONREF in XML). Dissenting: Maler.

System Identifiers


Date: Sat, 9 Nov 1996 14:56:02 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: B.9 (<URL> in SYSTEM identifiers)

This is one of a series of reports on recent decisions of the SGML ERB.

B.9 XML will allow SYSTEM identifiers (which in XML 1.0 are required to be URLs) to be prefixed by "<url>".

Decision: No. Dissenting: Maler, Kimber, Sperberg-McQueen, DeRose.

Entity Resolution


Date: Sat, 9 Nov 1996 14:57:46 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: C.4 (Predefined entities)

This is one of a series of reports on recent decisions of the SGML ERB.

Due to some confusion on the part of the Chair, this question got resolved in several pieces, which for purposes of simplicity are somewhat condensed in the following report.

(a) XML will declare a number of entities automatically.

Decision: Yes. Dissenting: Bray.

(b) Users will be able to override the predefined entities.

Decision: No. Dissenting: Sperberg-McQueen.

(Thus, processors shall behave as though declarations for the predefined entities are encountered at the end of the external DTD subset.)

(c) In addition to "lt", "amp", and "gt" (decided in a previous vote), the predefined entities shall include "quot" (for hex 22 -- same as in HTML) and "squot" (for hex 27 -- undefined in HTML).

Decision: Yes.

(d) The predefined entities shall include all those entities specified in the HTML 3.2 specification (the Latin 1 entities plus "copy", "reg", and "nbsp").

Decision: Yes. Dissenting: Bray, Clark.

(e) The predefined entities shall include all the entities recently approved by the HTML ERB for inclusion in the "Cougar" DTD. This means, basically, all of the HTML 3.2 entities plus all of the ISO entities for which characters exist in the Adobe Symbol font set, which is supported across Windows, X11, and Macintosh platforms.

Decision: Yes. Dissenting: Bray, Clark. Abstaining: Maler.

Thus, the list of ISO entities predefined in XML is as follows (list courtesy of Bob Stayton, SCO):

******list******
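For the core set named in decision (c), a minimal Python sketch of reference expansion follows. It is hypothetical: it covers only the five names in (c) (not the HTML 3.2 or "Cougar" additions of (d) and (e)), uses a simplified name pattern, and models decision (b) by consulting the predefined table before any user declarations, so a user declaration of `lt` has no effect.

```python
import re
from typing import Dict, Optional

# The five entities named in decision (c); quot is hex 22, squot is hex 27.
PREDEFINED: Dict[str, str] = {
    "lt": "<",
    "amp": "&",
    "gt": ">",
    "quot": "\x22",
    "squot": "\x27",
}

# Simplified entity-reference syntax: "&", a name, ";".
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_refs(text: str, declared: Optional[Dict[str, str]] = None) -> str:
    """Expand entity references in a single left-to-right pass.

    Predefined names are looked up first, so a user declaration of, say,
    "lt" cannot override them (decision (b))."""
    declared = declared or {}

    def replace(m):
        name = m.group(1)
        if name in PREDEFINED:
            return PREDEFINED[name]
        if name in declared:
            return declared[name]
        raise KeyError(f"undeclared entity: {name}")

    return ENTITY_REF.sub(replace, text)


print(expand_refs("a &lt; b &amp;&amp; c"))  # a < b && c
print(expand_refs("&amp;lt;"))               # &lt; (expanded only once)
```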

Incomplete DTDs


Date: Sat, 9 Nov 1996 14:58:24 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: C.16 (Attribute values if DTD not complete)

This is one of a series of reports on recent decisions of the SGML ERB.

C.16 When given an incomplete DTD, XML processors and applications may make any assumptions about the treatment of attributes and their values which are consistent with the document. They will be required neither to assume that all attributes are implicitly declared CDATA, nor that attributes with names beginning IDREF, ID, ENTITY, etc. have the types IDREF, ID, ENTITY, etc.

Decision: No. After much discussion, this became:

C.16 In the absence of a declaration, attributes shall behave as if they had been declared CDATA.

Decision: Yes. Dissenting: Hollander.

(Note: The ERB intends to revisit the possibility of reserving certain attribute names such as "ID" during Phase II of this project, during which it will no doubt have to standardize other incursions into the name space in order to specify hypertext mechanisms.)

ANY-ELEMENT


Date: Sat, 9 Nov 1996 14:59:02 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: D.3 (XML version of ANY)

This is one of a series of reports on recent decisions of the SGML ERB.

D.3 XML shall specify a short-hand element declaration keyword (e.g. %ANY-ELEMENT;) for element content in which any element in the DTD is legal (same as ANY, but element not mixed content).

Decision: No. Dissenting: DeRose, Hollander.

(Note: several members of the ERB feel that this should be dealt with in the SGML revision.)

Parameter Entities


Date: Sat, 9 Nov 1996 14:59:35 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: D.4 (parameter entities)

This is one of a series of reports on recent decisions of the SGML ERB.

D.4 XML shall allow parameter entities and parameter-entity references.

(a) XML shall allow internal parameter entities.

Decision: Yes. Dissenting: Paoli, Bray.

(b) XML shall allow external parameter entities.

Decision: Yes. Dissenting: Paoli, Bray, DeRose.

Entities


Date: Sat, 9 Nov 1996 15:17:52 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: (Repeat) Decision: C.4 (Predefined entities)

[In my first attempt at this posting, I omitted the list of entities at the end. -- Jon]

This is one of a series of reports on recent decisions of the SGML ERB.

Due to some confusion on the part of the Chair, this question got resolved in several pieces, which for purposes of simplicity are somewhat condensed in the following report.

(a) XML will declare a number of entities automatically.

Decision: Yes. Dissenting: Bray.

(b) Users will be able to override the predefined entities.

Decision: No. Dissenting: Sperberg-McQueen.

(Thus, processors shall behave as though declarations for the predefined entities are encountered at the end of the external DTD subset.)

(c) In addition to "lt", "amp", and "gt" (decided in a previous vote), the predefined entities shall include "quot" (for hex 22 -- same as in HTML) and "squot" (for hex 27 -- undefined in HTML).

Decision: Yes.

(d) The predefined entities shall include all those entities specified in the HTML 3.2 specification (the Latin 1 entities plus "copy", "reg", and "nbsp").

Decision: Yes. Dissenting: Bray, Clark.

(e) The predefined entities shall include all the entities recently approved by the HTML ERB for inclusion in the "Cougar" DTD. This means, basically, all of the HTML 3.2 entities plus all of the ISO entities for which characters exist in the Adobe Symbol font set, which is supported across Windows, X11, and Macintosh platforms.

Decision: Yes. Dissenting: Bray, Clark. Abstaining: Maler.

Thus, the list of ISO entities predefined in XML is as follows (list courtesy of Bob Stayton, SCO):

Case Sensitivity


Date: Sat, 9 Nov 1996 15:21:04 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: Case sensitivity

In its meeting of Thursday, November 7, the SGML ERB decided that for purposes of the XML 1.0 specification, the 8879 rules for case folding in markup shall be extended by folding lowercase to uppercase according to the Unicode mapping tables (see previous correspondence in the WG for an exhaustive list).

Abstaining: Clark, Hollander.
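As a rough illustration of the folding rule, names in markup can be compared after mapping both to uppercase. This Python sketch is an assumption, not the ERB's specification: the helpers `fold` and `same_name` are hypothetical, and `str.upper` applies Unicode's full case mappings (so "ß" becomes "SS"), which may not match the exact mapping tables referred to above in every case.

```python
def fold(name: str) -> str:
    """Fold a markup name to uppercase. str.upper applies Unicode's full
    case mappings, used here as a stand-in for the mapping tables."""
    return name.upper()


def same_name(a: str, b: str) -> bool:
    """True if two names in markup (e.g. GIs) match after folding."""
    return fold(a) == fold(b)


# Declarations keyed by folded name, so <para>, <Para>, and <PARA> all match.
declared = {fold("para"): "declaration for PARA"}
print(declared[fold("Para")])   # declaration for PARA
print(same_name("été", "ÉTÉ"))  # True
print(fold("straße"))           # STRASSE (a one-to-many mapping)
```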

13 November 1996


Date: Thu, 14 Nov 96 15:55:55 CST
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB discussions and decisions

The ERB met yesterday, 13 November 1996, to discuss the XML working draft and approve the distribution of the current text at SGML '96 next week. We considered a number of topics arising from the draft, some of which have already been discussed, or are still being discussed, on this list, and others of which have not received much discussion. Present: Bosak, Bray (intermittently), Clark, DeRose, Kimber, Maler, Magliery, Paoli, Sharpe, and Sperberg-McQueen. Absent: Hollander.

The author's apologies to busy members of the WG who would prefer a shorter account of the decisions; recent claims on the WG list that the ERB does not explain or discuss its decisions with the rest of the WG have led me, perhaps mischievously, to provide as full a discussion and explanation as my fingers can handle.

There's an executive summary at the end.

Given the number of major topics on which the WG appears not to have reached consensus and the volume of comment lately, it seems safe to say that some issues will require ongoing consideration and discussion, and the text of the working draft which we can distribute next week will be subject to change in non-trivial ways before we can leave this phase of the project's work behind. We considered dropping the plan to distribute printed copies at SGML '96, in order not to give a false impression of completeness. On the whole, however, the ERB thought that having printed copies available would be worthwhile, and we decided to go ahead with the plan. The cover page will, like the current Web copies, identify the document as a Working Draft, so the fact that it's not completely stable should be visible to any reader. And as the experience of the ERB and WG shows, having something that appears completed is one of the best ways to get people to read a draft and comment on it.

Since Henry Thompson raised the question directly: no, it's not too late for comments on substantive issues. The document is a Working Draft, and when the ERB stops work on it and moves to the next phase, it will still be a Working Draft until the W3C advances it to Draft Recommendation status, using the normal W3C procedures. There is some sentiment for avoiding the kind of violent swings in philosophy and technical direction that characterize some working drafts in some organizations, but in principle and in practice, working drafts are subject to change, and discussion about what changes to make is always appropriate unless the rules of the WG make it out of order (e.g. while we focus on some specific issue).

In the meantime, it is too late for typographic corrections to be included in the version distributed at SGML '96.

Other items remaining undiscussed and undecided were implicitly declared editorial questions for purposes of getting copy to the printer in time for SGML '96 distribution. The editors resisted the temptation to seize this opportunity to restore the DSD syntax for markup declarations.

The spec has now gone to the printer; the editors would like to thank those members of the WG who sent us corrections and pointed out errors. It'll be a materially more complete, correct, and less confusing document thanks to your efforts.

- C. M. Sperberg-McQueen

Summary:

11 December 1996


Date: Wed, 11 Dec 1996 12:05:51 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB discussion of public identifiers

We spent a lot of time on this question on Dec. 11th, and it is clear we need some more help from the WG.

The ERB is, by at least a substantial majority, convinced that there is a real need for PUBLIC identifiers in XML.

The ERB is highly concerned, in at least a significant minority, about the effects of putting this facility in without specifying a resolution mechanism. Doing so would contravene one of the major design goals of XML - that any compliant XML processor should be able to read any compliant XML document.

On the other hand, there is also substantial concern about giving an unconditional blessing to any particular name resolution mechanism at this point in history.

Thus, there are a variety of options open to us.

  1. Leave it as it is
  2. Agree that we'll put PUBLIC identifiers into XML when we are ready to specify the resolution mechanism; the practical effect is almost certainly that they don't go in for now.
  3. Put a slot in the syntax for PUBLIC identifiers...

Note that there is a continuum between 3b and 3c; we could place varying strengths of recommendation behind one resolution mechanism, with homilies about document portability.

In the area of which resolution mechanism to (perhaps nonexclusively) bless, SGML/Open catalogs (hereinafter Socats) stand out, and would probably be the ERB's choice. On another hand, a lot of work has gone into the URN effort; on another hand, that work has not yet borne practical fruit in terms of ubiquitous implementations; on another hand, the FPI syntax is repellent to some and it is not clear how well it supports internationalization; on another hand, it may be the case that FPI's really are URN's as they stand.

I suspect that if a binding vote were taken today, the ERB would either (a) reinstate the PUBLIC keyword, and put in a nonexclusive recommendation for Socat support, or (b) refuse to put PUBLIC in until there was agreement on a required resolution mechanism.

Input, please.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

18 December 1996


Date: Wed, 18 Dec 1996 11:17:13 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB decisions on RS/RE and whitespace

On December 18, the ERB took up the question of RS/RE and whitespace. All members were present except James Clark and Eve Maler.

The vote in favor of the following was unanimous.

XML processors, when operating without a DTD, are required to consider all bytes that are not markup to be data and to pass them to the application. When operating with a DTD, the processor may, but is not required to, pass on to the app white space known to be insignificant because it's in element content. In the case where it passes white space on, it must also inform the app that this is element content and so cannot be significant.

The XML Specification will contain an appendix which provides a set of recommendations which, if followed by authors, will ensure that they get a parse tree that will be the same whether or not the DTD is taken into consideration. We didn't discuss the exact contents of this: it will include at least [a] no white space where it might be element content and [b] no defaulted attributes; careful attention will be required from everyone on the list to make sure we get this right.

The -XML-SPACE attribute will be retained, but its role becomes advisory; an XML processor will always pass the data as noted above, and must also pass the value of -XML-SPACE when specified. The allowed values of -XML-SPACE change to "PRESERVE" and "DEFAULT". Formally

  
   -XML-SPACE (PRESERVE|DEFAULT) #IMPLIED

PRESERVE is a signal from the author to the application that all the whitespace bytes are to be considered significant. DEFAULT means that the application's default handling is considered OK. The attribute's value is considered to be inherited by descendant elements of an element for which it's specified. For the root element, the default is DEFAULT. It is an error, in the context of a DTD, for an element with element content to have -XML-SPACE="PRESERVE".

Further discussion of these issues is unlikely to be read by anyone in their right mind.
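A minimal Python sketch of how an application might compute the effective -XML-SPACE value for each element under these rules: an unspecified value is inherited from the parent, and the root defaults to DEFAULT. The `Elem` class and `resolve_space` function are hypothetical. The sketch assumes the element-content error applies only where PRESERVE is specified explicitly on such an element, since the decision does not say whether an inherited value counts.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Elem:
    gi: str
    xml_space: Optional[str] = None  # "PRESERVE", "DEFAULT", or None if unspecified
    element_content: bool = False    # True if the DTD gives this type element content
    children: List["Elem"] = field(default_factory=list)


def resolve_space(elem: Elem, inherited: str = "DEFAULT") -> List[Tuple[str, str]]:
    """Return (GI, effective -XML-SPACE value) pairs in document order.

    An unspecified value is inherited from the parent; the root's
    default is DEFAULT."""
    if elem.element_content and elem.xml_space == "PRESERVE":
        raise ValueError(f"<{elem.gi}> has element content but -XML-SPACE=PRESERVE")
    value = elem.xml_space if elem.xml_space is not None else inherited
    pairs = [(elem.gi, value)]
    for child in elem.children:
        pairs.extend(resolve_space(child, value))
    return pairs


tree = Elem("doc", children=[Elem("p"), Elem("pre", "PRESERVE", children=[Elem("b")])])
print(resolve_space(tree))
# [('doc', 'DEFAULT'), ('p', 'DEFAULT'), ('pre', 'PRESERVE'), ('b', 'PRESERVE')]
```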

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

15 January 1997


Date: Wed, 15 Jan 1997 10:22:07 -0800
From: Tim Bray <tbray@textuality.com>
Subject: Changed comment syntax

At a meeting of the ERB, with all present except Magliery and Hollander, it was resolved unanimously to change XML comment syntax so that comments begin '<--*' and end '*-->'. We will still require, pending clarification of this by WG8, that for compatibility, neither '--' nor '--*' can appear in the body of a comment; we are hopeful that the '--' restriction can be lifted soon. This will require changes to section 2.5, and to productions 19 and 21. - Tim

22 January 1997


Date: Wed, 22 Jan 1997 13:21:58 -0900
From: "W. Eliot Kimber" <eliot@isogen.com>
Subject: Relationship Taxonomy Questions

All,

The ERB would like to ask the list to address the following questions relating to relationship (link) typing.

The questions:

Assuming that there is a distinction between link behavior and the relationship types that links represent, and in particular, a distinction between behavioral "primitives" and relationship "primitives":

1. Is it necessary or useful for XML to define some finite set of well-defined relationship types or primitives?

Our presumption, as yet unproved, is that the interoperation of XML documents within some general purview (e.g., the Web, as opposed to domain-specific purviews, such as a particular intranet) requires some basic set of link types whose meaning is well defined and understood. This presumption is based in part on the opinion that typing links is in fact a useful thing to do for some types of information.

We take it as a given that the set of possible relationship types is unbounded.

2. If the answer to question 1 is "yes", what is the list of types?
3. Given such a list, A. can these types be considered to be a set of supertypes from which new types may be derived? B. If so, what mechanisms could or should be used to define such a class hierarchy?
4. Is there a preferred formalism, in terms of prose rhetoric, formal notation, or both, by which the meaning of relationship types should be expressed? NOTE: this formalism cannot consist only of behavior specification (although it may include a behavior specification).

NOTE: The issue of behavior primitives is not open for discussion at this time. The behavior issue will be taken up after the base link representation syntax and link typing issues have been sufficiently resolved.

Definition of terms:

behavior
What happens when some agent interacts with the link, either directly or by interaction with one of its link ends. Behavior includes all of what happens in user interfaces, and could also include behaviors of translators, processors, query engines, etc. In the general case, behavior is not permanently (and exclusively) bound to data objects (i.e., the SGML content vs. style model). However, some element types or base element type classes may have semantics that largely or exclusively suggest a particular behavior (e.g., <font>), although it is generally regarded as poor practice for most applications (partly because implementation of the suggested behavior cannot be universally enforced).

In the SGML model, behavior can be considered an aspect of "style" or presentation and may be defined explicitly through "style sheets" or "processing specifications" or may be embedded into a particular browser or processor (e.g., HTML browsers pre CSS). In this broad definition of the term style, mechanisms such as scripts, controls, and plug-ins could all be considered aspects of style specification.

At this point we are assuming that behavior will be specified both in some normative way in an XML specification and, at user option, through some as-yet-undetermined behavior specification system or systems (e.g. "link style sheet").

relationship
A semantic association among two or more objects intended to describe the nature of the association. Relationship types may be thought of as analogous to element types in SGML, such that where element types classify data objects, relationships (and thus links) classify associations among data objects. Like element types, relationship types can range from the very general ("linked") to the very specific ("Counterargument").

Our assumption is that links always represent relationships of some defined (albeit possibly very general) type. In the likely syntax design model, the link type will be named either through the element type of a link element or through an attribute that defines the type name.

Relationships are distinguished by relationship type. In addition, some relationship description models may further describe relationships by naming the roles in the relationship (e.g., the HyTime hyperlink model). As for element types, some relationship types may largely or exclusively suggest a particular behavior (e.g., <goto>). Such relationship types are poor practice only when their use fails to identify a more specific relationship type that would enhance the value of the information in the scope of its expected application and use (in other words, when you don't care about the link type, there's little value in being more specific than "link", but if your system expects and depends on typed links, you'd better type them).

Relationships may be implicit in the data structure (i.e., the hierarchical relationships defined by SGML markup) or explicit through hyperlinks or other associative systems (i.e., relational databases).

semantic
The "meaning" associated with a type. The term "semantic" is dangerous because it is overloaded and can mean different things in different contexts. In this discussion, we are trying to clearly differentiate meaning, which is abstract, from behavior, which is concrete. In general, there is a one-to-one relationship between a type and its semantic, but a one-to-many relationship between a type and its possible behaviors. In other words, a type's semantic doesn't change, but that semantic may be interpreted into specific behaviors in a variety of ways depending on the use to which the type is put or the arbitrary whim of the behavior specifier.

Cheers,

Eliot.


--
W. Eliot Kimber (eliot@isogen.com)
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be re-educated soon..." --Austin Lounge Lizards, "1984 Blues"

31 January 1997


Date: Fri, 31 Jan 1997 10:47:54 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB publishes discussion framework and vote schedule for XML-Link

The ERB met today and agreed to proceed with building the XML-Link spec as follows.

I have abstracted from the initial draft spec a set of 82 discussion points, each of which is identified with reference to language in the draft spec.

The basic idea is to proceed as we did with XML - I will email out each of these to the WG as a discussion item, requesting people, where possible, to use the subject line so created in order to keep us moving forward. Obviously, there are some cases where a set of questions is closely related; in quite a few cases, I will exercise editorial judgement and mail out a small batch of questions under a single title, where it seems sane.

These fall nicely into 5 groups, corresponding to the 5 top-level sections in the draft spec. The plan goes like this:

We have to have our draft nailed down in time for distribution at WWW6, which begins April 7; this means the copy must be finalized by roughly the end of March. The fact that the planned voting ends on March 12 gives us a (painfully small) bit of slack to deal with hot items or others that will undoubtedly spill out of the voting process.

To provide a larger-scale context for the discussion, I have a primitively HTML-ified version of the full question list at http://www.textuality.com/sgml-erb/xml-link-work.html

Please note that the numbers do not predispose any decisions about the final spec; they are simply a convenient mechanism for associating questions with existing spec language that gives background for them.

Stand by for batches 0 and 1.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

12 February 1997


Date: Wed, 12 Feb 1997 13:42:50 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB meeting of Feb. 12th

The ERB met Feb. 12th. Present were Bosak, Bray, Clark, DeRose, Magliery, Maler, Paoli. All decisions were unanimous.

1. The title of the spec will be "Hypertext Links in XML". There will be no new acronym, XHL or XHA or anything. The URL fragment will be WD-xml-link. The URL fragment for the XML syntax spec will be WD-xml-syntax. The URL fragment WD-xml will point to a tiny document just containing pointers to WD-xml-link, WD-xml-syntax, and presumably at least one more part in the future.

2. Links will be expressed as XML elements. We will write the spec so that the only other spec it depends on is xml-syntax. Obviously, links will be SGML elements as well.

3. We deferred the question of a link processor until we have more of the spec done; if we need to define a link processor in order to meet our specification goals, we will.

4. We deferred the question of a mechanism for signaling what link machinery is being used until we know what machinery is available.

5. We decided that formatting issues are outside the scope of XML linking, and we will neither discuss them nor provide a special attribute nor any other machinery in this specification for communicating formatting information. Note that we fully appreciate that the distinction between formatting and behavior is troublesome at best; this decision does not prejudice the possibility that XML links may contain behavior attributes and that the spec may predefine certain behaviors. In the ERB discussion it became obvious that lots more work is needed on this particular area.

6. We agreed that if we say that the links are elements and attributes, this provides all the syntax definition that we need; thus no additional BNF is required in the specification. [Ed note: Yes!]

7. We agreed that no special language is required in the spec to say that the links must be well-formed in the XML sense. While this spec is primarily designed for use in the XML domain, there seems nothing to gain in placing barriers in the way of full SGML processors that may wish to use this machinery in non-well-formed documents.

8. We spent the rest of the meeting arguing over details of terminology, without coming to a resolution; an additional meeting has been scheduled for Saturday morning [yecch] to enable us to finish working through this and the remaining 1.* questions.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

15 February 1997


Date: Sat, 15 Feb 1997 20:10:49 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB terminology votes

The ERB met Sat. Feb. 15th. Present: Bosak, Bray, DeRose, Magliery, Maler, Paoli, Sperberg-McQueen. All decisions were unanimous.

We spent most of the time on the issue of terminology detail. Although this was not articulated formally, some underlying design principles seem to have guided us:

  1. We should re-use Web terminology where appropriate (thanks to Dan for this input)
  2. We should not be afraid of lengthier English compound constructions as opposed to single words, when this makes things easier to understand and explain (thanks to Liora)
  3. We should distinguish clearly between terms for the underlying Platonic concepts and those for the syntactic constructs (thanks to Henry)

We had discovered that, even at this late date, there was still room for confusion as to which bits were which; so Steve and I, inspired by Henry, cooked up a simple picture that was very helpful:

  
<BOOK><A NAME="foo" HREF="http://x.com/y/z.html#SEC1">Click here</A></BOOK>
|------------------------------p0-----------------------------------------|
      |------------------------p1----------------------------------|
                    |----------p2-------------------|
                          |----p3------------------|
                          |----p4-------------| |p5|
 
<BOOK><SEC ID="SEC1">Thank you for clicking to get here.</SEC></BOOK>
|------------------------------q0-----------------------------------|
      |------------------------q1----------------------------|
 

1. The relationship which the "<A>" element asserts the existence of is called a "link".

There is an interesting ontological debate as to whether the link is in fact the assertion, or whether the link already existed and the linking machinery merely describes it, but it is probably not necessary to resolve this for the purposes of the spec. I will cheerfully argue this point with anyone as long as they keep buying the necessary beer. WWW theory, as pointed out by Dan Connolly, is explicit that the link is the assertion.

2. An XML or SGML element (example: p1) which serves as the syntactic expression of a link is called a "linking element".

3. A participant in a link relationship (example: q1) is called a "resource". Our definition will be very similar to the official WWW definition, found in http://www.w3.org/pub/WWW/Architecture/Terms which everyone on this list should go and read. That definition is:

an addressable unit of information or service in the Web. Examples include files, images, documents, programs, query results, etc.

In our case we should not limit it to "in the Web". Note that a resource could include the results of an SQL query, a temporally limited section of a video clip, or the invocation of a script that flushes a toilet in Tuktoyaktuk.

There is an interesting debate, in the case of the example, as to whether one or two resources are involved. Clearly, "q1" is a resource. If there is another resource, it is probably the linking element itself, "p1". It is clear that in some cases (independent links or out-of-line links or whatever), a linking element need not be a resource. Unlike the ontological debate mentioned above, we are going to have to decide this one to get a clean spec.

4. A string used to specify a resource (example: p3) is called a "locator". It might be a name or an address or a query expression; one way or another it is undeniably used to locate the resource.

5. An attribute containing a locator (example: p2) is called a "locator attribute". Should we end up, in the case of multi-ended links, using subelements to hold locators, they would be called "locator elements".

Note that a few items that are labeled in the picture do not appear in this discussion. They appear because our discussion revealed that we may not be finished with the terminology battle; there may be some more concepts that are worthwhile nailing down. My next message will present these issues for further discussion.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

19 February 1997


Date: Thu, 20 Feb 1997 15:08:29 -0500
From: "Steven J. DeRose" <sjd@ebt.com>
Subject: DRAFT: Summary of ERB conference call

The following is a summary of the ERB's telephone conference of 2/19/97, ending with a question to the WG:

We considered questions 2.1 through 2.4.

2.1.a Should we allow link recognition via a reserved attribute?

The ERB is strongly leaning to yes.

2.1.b If so, should we generalize this and say that it's an AF?

2.1.c If so, should we provide an introduction to AF's?

We achieved consensus that we should provide clear self-sufficient documentation; readers should not have to understand the extended facilities annex to understand what we are talking about. We should not go further: no general introduction to AFs. We should mention at least once that it is an AF, probably together with the reference to HyTime.

2.1.d If we allow such recognition, what should the attribute be, and what should be the values for each element type we define?

This remains undecided. We are leaning toward XML-Link, but the nature of values must remain uncertain until we decide about the constructs the values are naming. There is some thought of having xml-tlink and xml-ilink as attributes, whose value has all the information we need, e.g., something roughly like

 
  xml-tlink="url (http://www.uic.edu/orgs/tei/p3) id (foo) child (3 p)"

(This is an illustrative example ONLY. It is NOT a proposal.)

2.2.a Should we allow link recognition via a reserved GI?

This remains hotly in question; see below. It should be noted that the HyTime TC introduces some of this effect by default: a GI that matches the name of an active architectural form defaults to being of that form.

2.2.b If so, what should the set of GI's be?

The same as the keyword values of xml-link attribute (if it has keyword values ...)

2.3 Should we provide a PI or other signaling mechanism whereby a document can specify that particular elements ought to be processed as link elements?

This remains unclear; see below. If the proposed SGML TC fails for any reason and we never have multiple attlists, this is the way to go (this appears to be the consensus, or at least the majority view).

2.4: Should we allow that processors can decide that something is a link for their own unspecified reasons, e.g. hardwired knowledge of their own private element types, or external interaction?

The consensus was that we should not forbid, discourage, encourage, or mention this more than necessary.

Overall:

We almost all agree that the best long-term solution is to allow multiple ATTLIST decls, so a document can start

  
   <!DOCTYPE tei.2 public "-//TEI//DTD P3//EN" [
   <!ATTLIST xref
             xml-link    CDATA   #fixed "xml-tlink" >
   ]>

This documents that the tei element <xref> is a tlink in XML terms.

We think we can have multiple attlists when the TC passes. It's not clear what to do in the meantime. Choices include:

1 Play it Safe. Document the magic attribute and go no further yet. To use it with validating systems, you'll have to monkey with the actual DTD. XML docs will otherwise be verbose.

After the TC passes, we'll add documentation for doing it with multiple attlists, so you don't have to monkey with the DTD.

2 Count on Utopian ATTLISTs. Document the magic attribute and the use of multiple attlists. After the TC passes, XML will be kosher. Until it passes, it's non-conforming (although individual documents can choose to be SGML-conformant or not). If the TC fails, we have to go back and add a PI.

3 Stopgap PI. If we can't stand the verbosity of Playing it Safe, and can't risk counting on Utopian ATTLISTs, we need a stopgap. The simplest seems to be to define a PI for the necessary function, e.g.

  
  <?ATTLIST xref xml-link CDATA 'xml-tlink'  ?>
or
 
  <?xml-attlist xref xml-link cdata 'xml-tlink' ?>

In the short term, we have no verbosity problems. In the long term, after the TC passes, we withdraw the PI and use only multiple attlists. If the TC fails, we change nothing.

Drawbacks: planned obsolescence of this syntax may be hard to enforce. Once we have XML legacy data, removing the PI in version 2 may be hard. (Of course, XML link won't be final until November or so; we should know by then. Until November, we should, by definition, not have legacy data. All XML data made before then is experimental and has no claim on our protection.)

4 GI escape hatch. Like Playing it Safe, but to avoid the verbosity problem we at least allow GIs to be recognized. So in the long term you can be terse with multiple attlists; in the short term, by using magic GIs.

The ERB requests that the WG consider the question of how best to deal with this signalling issue, given the options and tradeoffs (and any others the WG may perceive).

Steve
(with copious thanks to Michael and his notes)

22 February 1997


Date: Mon, 24 Feb 1997 16:22:11 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB decisions on linking element recognition

The ERB met Sat. Feb. 22. Present: all but Clark and DeRose. Discussion was on the subject of recognition of linking elements. The decisions we have apparently taken do leave us with some fairly serious concerns, so we are submitting both the decisions and the concerns to the WG on the theory that someone may convince us either that the concerns are overblown, or that they are understated and that we should proceed to a fallback position.

I. Recognizing linking elements via attributes

With respect to recognition of linking elements, the ERB has consensus that the best way to do this is with reserved attributes. The attribute name should probably be "XML-LINK". There is, however, a consequence which could lead to direct conflict between the desire for operational simplicity and that for document validity.

The problem is, how to declare and provide a default value for the XML-LINK attribute?

1. The case where no markup declarations are provided: plan A: supply the attribute for each element that is a linking element.

2. The case where only the internal subset is to be provided: plan B: declare the attribute with a #FIXED default value in an <!ATTLIST in the internal subset.

Both of these are perfectly viable; plan A might be sensible even with an internal subset, if the number of linking attributes is small.

3. The case where there is an external DTD subset:

But plan C means the client has to fetch the external subset in order to get the necessary declaration. Which is a violation, we think, of our axioms. To avoid that, stick it in the internal subset. Oops, then either

Plan C1 is operationally infeasible. C2 violates another of our axioms, that XML should support operations on valid documents.

Of course, the problem goes away if 8879 is modified to remove the current prohibition on multiple <!ATTLIST declarations; and we hope that this will happen in the not-too-distant future.

However, the ERB is convinced that for Web viability, there must be a signaling mechanism that's within the document instance that gets sent down the pipe. So, if it seems that the conflict with 8879 doesn't go away, we will avail ourselves of an escape hatch; at the moment there seem to be four options:

There was sharp dissension in the ERB on what to do given the uncertainty on what WG8 will do. Several members, while respecting the integrity and appropriateness of the <!LINKTYPE technique, find the syntax, and the prospect of explaining it, repellent. Nonetheless, the <!LINKTYPE technique, if only as an interim measure, remains the choice of at least one member. The PI technique is nicer looking and easier to explain, but requires extra implementation. It also (I think) remains the first choice of one or more ERB members. Some feel that the <!ATTLIST in the subset has the advantage of requiring no extra syntax beyond that in base XML, and the problem of the conflict between manageability and maintaining SGML validity would be a non-issue, operationally. There is also concern on the part of ERB members about adopting interim measures at all, especially while there is active work going on among WG8 members in an effort to address the concerns raised by the XML work.

By a vote of 8 to 1 (Sperberg-McQueen dissenting), the ERB tentatively decided to

II. Recognizing linking elements via GI

The ERB voted as follows: In favor of allowing recognition via GI: Bray, Magliery, Maler, Kimber. Against: Bosak, Hollander, Paoli, Sharpe, Sperberg-McQueen. Thus the measure fails.

The arguments here are simple. In favor of doing this are the facts that it's easy to explain, and that this is the specified default behavior for an architecture anyhow. Against it are the benefits of having only one way to do things, with the accompanying desirable shrinkage and simplification in the specification. Given the closeness of the vote, I think I can speak for most of us in saying that it was a pretty close call, and most of the ERB, regardless of the way they voted, probably could have lived with it going either way; the arguments on both sides are palpably good, and the consequences of a wrong choice don't seem that severe.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

26 February / 1 March 1997


Date: Sat, 01 Mar 1997 18:39:22 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB work on 3.* (Linking Elements) issues

The ERB has now put two meetings' work in on this set of issues and is nowhere near done. Not surprising, given the importance of the issues. One of the factors holding us back a bit has been the fact that the discussion in the WG on the 3.* issues has been lacking in both volume and depth. Reasons for this might be (a) that the WG is tired (the ERB is), (b) that the WG is busy on other things, and (c) that the WG has substantially less experience in these issues than in those that came up in the XML language discussion.

Partially as a consequence, the following decisions include some constructs and ideas that were made up on-the-spot in the ERB without WG discussion. For these reasons, this set of decisions would benefit from particularly close review by the WG.

Unfortunately, due to workload and brain dysfunction caused by illness, I do not have ERB attendance rosters and votes. However I believe that none of the decisions below had any dissenters; if I've missed any I'd appreciate corrections from the wronged ERB members.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

3.a List link required?

3.a. The initial draft does not have any construct analogous to the "List Link" construct in Eliot's proposal. Do we need one?

The ERB detects no strong requirement to proceed on this in the near term. Once the shape of the Extended Link is better-defined, we may wish to revisit this.

3.1.a All linkage info in markup not data?

3.1.a Should we have a principle that all linkage information is encoded in GIs and/or attribute values, never in character data?

No. ERB Consensus was that this doesn't need saying and may unduly constrain design decisions.

3.1 b-h: ROLE & LOCATOR attributes

Dimension 1: Which pieces of information should be specified for possible inclusion in linking elements?

ERB consensus:

The decision on LOCATOR SCHEME is deferred until we tackle addressing

The question of whether these are optional or required remains undecided.

3.1 b-h: LABEL attributes

Dimension 1: Which pieces of information should be specified for possible inclusion in linking elements? ... e. caption ...

ERB consensus:

However, the ERB is unhappy with allowing only a simple character string; how does one support multilingual labeling, or labels which are graphic? Our provisional solution is to support a RESOURCE LABEL attribute and/or a RESOURCE LABEL LOCATOR attribute, the latter being used to locate and retrieve a structured label. But then do we need a LABEL LOCATOR SCHEME? Or make it use the same scheme as the resource locator? We decided to defer this until we had a better understanding of linktypes and addresses.

3.2.b Locators in attributes or elements?

3.2.b Should the locators of a general link be packaged in attributes as in HyTime, or as child elements as in the initial draft?

Consensus of the ERB is to use subelements to package up locators in extended links. The simplicity of doing it all with attributes was appealing, but there are two big problems:

Eliot assures us that this is legal HyTime, given the right grove plan.

3.4 New Item: unify simple and extended links

During the meeting of March 1st, the ERB agreed to package up xlink locators in subelements. Jean Paoli pointed out [but I have agreed to bring this forward since he's on the road] that the declarations and attlists for simple and extended links are very similar; and it might be appealing to allow one locator to exist within the start-tag of an extended link, such that an extended link is just a simple link with child elements. E.g.

 
<a role="3-way" href="#lab1"><extra href="#lab2"><extra href="#lab3"></a>

One virtue of this is that we can go to the web-heads and say "not only have we given you a powerful extended link facility, but you can do it just by adding children to your existing <A> elements".
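To make the proposal concrete, here is a minimal Python sketch of how a processor might gather the locators of such a unified link. It assumes a well-formed XML rendering of the example (the <extra> elements written as empty elements) and treats href as the only locator attribute; the function name collect_locators is hypothetical, and nothing here reflects a decision of the ERB.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical well-formed rendering of the example above:
# the <extra> elements are written as empty elements.
EXAMPLE = '<a role="3-way" href="#lab1"><extra href="#lab2"/><extra href="#lab3"/></a>'


def collect_locators(link: ET.Element) -> List[str]:
    """Return the start-tag locator, then any locators on child elements.

    Under this proposal a simple link is just the case with no children."""
    locators = []
    if link.get("href") is not None:
        locators.append(link.get("href"))
    for child in link:  # direct children only
        href = child.get("href")
        if href is not None:
            locators.append(href)
    return locators


if __name__ == "__main__":
    print(collect_locators(ET.fromstring(EXAMPLE)))
    print(collect_locators(ET.fromstring('<a href="#only"/>')))
```

The same function handles the plain <A> case unchanged, which is the appeal of the proposal: an extended link is read as a simple link that happens to have locator-bearing children.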

Some problems come up:

Input requested.

3.1 b-h: BEHAVIOR

Dimension 1: Which pieces of information should be specified for possible inclusion in linking elements? ... f. behavior ...

This one has been tough. Since we put in immense amounts of time, I will try to reproduce some of our discussions.

Some of the options:

  a. say nothing about behavior; leave it to the apps and to stylesheeting
  b. provide a behavior bucket; an attribute in which to pass behavior info, but specify nothing about what goes in there
  c. provide one or two attributes governing simple abstract axes of link behavior policy, with lots of room for user-agents/clients to devise mechanisms to meet the policies
  d. provide a rich, detailed, set of behavior specification rules that people such as users of EBT products, TEI, and HyTime have come to expect.

(a) and (b) seem the safest in terms of avoiding doing something really stupid. However, there was stringent opposition from those speaking on behalf of the authors, who wanted some (even if only abstract) way to signal whether a link, when activated, should cause the replacement of the current display, or transclusion behavior; they claimed that without this, there was no interoperability. It is also material that on the WWW, there are at least two well-defined link behaviors, that of the <A HREF= and that of the <IMG SRC= in the HTML case, accomplished by associating application semantics with well-known GI's. Also, perhaps <FORM ACTION= is really a link?

(d) is probably the right thing to do, had we but world enough and time, but also carries a lot of risk in terms of overspecifying and getting things wrong.

So, we ended up with (c). The ERB consensus is that there will be two attributes specified to control behavior; the names and values given below are provisional.

Each attribute may be provided per-linking-element or per-resource; while there was a sense that per-linking-elements could probably be used to provide defaults, and be overridden by per-resource elements, this was not discussed thoroughly, nor really voted on.

There will be an attribute named SHOW, which may have one of three values:

INCLUDE
means that upon traversal of the link, the indicated resource should be embedded, for the purposes of display or processing, in the body of the resource where the traversal started. (e.g. like HTML <IMG>)
REPLACE
means that upon traversal of the link, the indicated resource should, for the purposes of display or processing, replace the resource where the traversal started. (e.g. like HTML <A>)
NEW
means that upon traversal of the link, the indicated resource should be displayed or processed in a new context, not affecting that of the resource where the traversal started (e.g. like HTML <A TARGET="NEW"> [I think])

There will be an attribute named ACTUATE, which may have one of three values:

AUTO
means that the link should be traversed and used when encountered; that the display or processing of the resource where the traversal started is not considered complete until this is done (e.g. HTML <IMG>)
USER
means that the link should not be traversed until there is an explicit external request for this to happen (e.g. HTML <A>)
PUSH
means that the resource is volatile, subject to change, and should be processed immediately and continuously.

Notes:

  1. HTML <A> is equivalent to SHOW="REPLACE" ACTUATE="USER"
  2. HTML <IMG> is equivalent to SHOW="INCLUDE" ACTUATE="AUTO"
  3. Obviously, it is legal for user-agents to ignore these settings and do as they will; for example, turning image loading off.
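The provisional SHOW and ACTUATE values can be modelled as two enumerations. The following Python sketch assumes the provisional names and values given above and encodes the HTML equivalences from notes 1 and 2. The helper effective is hypothetical, and its rule that a per-resource value overrides a per-linking-element default is only the unvoted sense recorded earlier, not an ERB decision.

```python
from enum import Enum
from typing import Optional


class Show(Enum):
    INCLUDE = "INCLUDE"  # embed in the resource where traversal started (like <IMG>)
    REPLACE = "REPLACE"  # replace the resource where traversal started (like <A>)
    NEW = "NEW"          # display or process in a new context


class Actuate(Enum):
    AUTO = "AUTO"  # traverse when encountered
    USER = "USER"  # wait for an explicit external request
    PUSH = "PUSH"  # volatile resource: process immediately and continuously


# HTML equivalences from notes 1 and 2.
HTML_EQUIVALENTS = {
    "A": (Show.REPLACE, Actuate.USER),
    "IMG": (Show.INCLUDE, Actuate.AUTO),
}


def effective(per_resource: Optional[str], per_link: Optional[str], enum_type):
    """ASSUMPTION (never voted on): a per-resource value overrides the
    per-linking-element default. Returns None if neither is given."""
    value = per_resource if per_resource is not None else per_link
    return enum_type(value.upper()) if value is not None else None
```

For example, effective("new", "REPLACE", Show) yields Show.NEW, while effective(None, "USER", Actuate) falls back to the linking element's Actuate.USER. As note 3 says, a user-agent remains free to ignore whatever this returns.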

With reference to the values of the ACTUATE attribute, it may be possible to use the event names that are being proposed for use in the W3C's work in progress on the "Document Object Model"; we have an action item to check this out and do so if possible.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

3.2.a Name for generalized multilink

3.2.a What should the generalized multiway link be called? Multilink, General Link, Independent link, anything else?

There has been no vote in the ERB on this. But "Extended Link" seems to have taken the world by storm, and that's what we say in ERB discussions now; if anyone has a problem with this they should holler now or it will just become a de facto choice.

8 / 12 March 1997


Date: Fri, 14 Mar 1997 08:48:28 -0800
From: Tim Bray <tbray@textuality.com>
Subject: Recent ERB work - miscellaneous

More reports from the March 8 and 12 meetings. Attendance was confusing: every member of the ERB was in attendance in part, but there was a certain amount of checking in and out based on travel plans and interruptions; my notes record no dissenting votes to any of the following, but some member who missed a particular vote may choose to record a dissent:

1. The XML-LINK attribute will get a new declared value. So far, we have (LINK|XLINK), which inconveniently doesn't provide a this-ain't-no-steenking-link value to override a defaulted value from the subset. So we'll change it to (LINK|XLINK|FALSE).

2. We discussed the idea of having a way to provide a base address; the LOCATION-SOURCE and IMPLIED-LOCATION-SOURCE stuff in the initial draft. The ERB is powerfully in favor of giving entities a way to specify the canonical address by which they'd like to be referred to, bookmarked, relative address computed, etc. But we realize this is really not XML-LINK stuff, just a very convenient convenience feature for the XML language itself. So we provisionally decided (provisionally because this hasn't had WG exposure) to create a new per-entity <?XML-BASE PI and write that into the language spec. We could not, at this time, muster support for the IMPLIED-LOCATION-SOURCE stuff.

3. We took up the question of what subset of TEI Xpointers we're going to need. We developed consensus that:

4. We decided that we were not going to specify support for any query language, built-in or FOREIGN, in XML-Link 1.0, beyond TEI Xpointers, which are in fact a query language. Support in some way for SDQL remains firmly on the agenda, but not in this release.

5. On the subject of Extended Link Groups, the ERB is unanimous that we must have some way of providing this function - of pointing to other documents that contain links into a current one. However, there is another standardization effort underway in the W3C called Web Collections that may well be, in parallel, cooking up a solution to this same problem. Furthermore, the rumor mill says that the idea of using XML for Web Collections is in play. We have an action item to check this out - if we can outsource this job to existing web machinery, it would clearly be a good thing.

6. We discussed whether or not we should specify a process model, i.e. assert that a locator containing a '#' should be split in two, the part before the '#' being handled by the server to retrieve a resource, the part after being used by the client to track down a more specific resource within the retrieved one. This is universal Web behavior. We decided not to specify this, and leave it to the implementors; clearly we can think of cases (humungous SGML docs) where we'd like the server to help out with the after-the-# part; while it's hard to see how that could happen in the current Web architecture, we are reluctant to rule it out.
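Item 1 above implies a simple recognition rule: an explicit XML-LINK value on the start-tag wins over a value defaulted from the subset, and FALSE switches linking off. A minimal Python sketch, assuming attribute values have already been parsed into dictionaries; the function link_kind and the subset_defaults map are hypothetical stand-ins for whatever a real parser would provide.

```python
from typing import Dict, Optional

ALLOWED = {"LINK", "XLINK", "FALSE"}  # the declared value (LINK|XLINK|FALSE)


def link_kind(gi: str, attrs: Dict[str, str],
              subset_defaults: Dict[str, str]) -> Optional[str]:
    """Return 'LINK' or 'XLINK' for a linking element, otherwise None.

    attrs: attributes specified on the start-tag.
    subset_defaults: hypothetical map from element type name to the
    XML-LINK value defaulted by an ATTLIST in the subset."""
    # An explicitly specified value overrides the defaulted one.
    value = attrs.get("XML-LINK", subset_defaults.get(gi))
    if value is None:
        return None  # no XML-LINK attribute at all: not a linking element
    value = value.upper()
    if value not in ALLOWED:
        raise ValueError(f"invalid XML-LINK value: {value!r}")
    return None if value == "FALSE" else value
```

With subset_defaults = {"xref": "XLINK"}, a bare <xref> is an extended link, while <xref XML-LINK="FALSE"> is the this-ain't-no-steenking-link case. Note that this only works for a defaulted value; a #FIXED default could not be overridden in this way.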

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

15 March 1997

Report


Date: Sun, 16 Mar 1997 18:58:24 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB: decision and conundrum

More on addressing. On March 15, the ERB agreed that:

1. Contrary to our decision of last time, we will support subelement addressing by a simple search operator. We will make it clear that bit-for-bit matching without respect to words or tokens is compliant behavior; if implementations wish to compete on the basis of case-folding or other fancy search optimization, that's fine.

2. Locators shall consist of a URL, optionally followed by a '#'.

3. The '#' may be followed by the string "<tei>", in which case the remainder of the locator is to be treated as a TEI extended pointer. Michael Sperberg-McQueen has an action item to figure out the required changes to TEI xptr syntax to fit them into a URL.
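Items 2 and 3 describe a small locator grammar. The Python sketch below splits a locator at the first '#' and recognizes the "<tei>" prefix. It is only an illustration under assumptions: the names Locator and parse_locator are hypothetical, it ignores the URL-escaping questions covered by Michael Sperberg-McQueen's action item, and, like the ERB, it leaves the meaning of a bare fragment open.

```python
from typing import NamedTuple, Optional

TEI_PREFIX = "<tei>"


class Locator(NamedTuple):
    url: str
    kind: str                # 'none' (no '#'), 'tei', or 'bare'
    fragment: Optional[str]  # text after '#', minus any "<tei>" prefix


def parse_locator(locator: str) -> Locator:
    # Split at the first '#': URL before it, fragment after it.
    url, sep, rest = locator.partition("#")
    if not sep:
        return Locator(url, "none", None)
    if rest.startswith(TEI_PREFIX):
        # Remainder is to be treated as a TEI extended pointer.
        return Locator(url, "tei", rest[len(TEI_PREFIX):])
    # A bare string after '#': its interpretation is still undecided.
    return Locator(url, "bare", rest)
```

So parse_locator("http://x.com/y/z.html#SEC1") gives a 'bare' fragment "SEC1", whereas a locator whose fragment starts with "<tei>" is handed on, prefix stripped, as a TEI extended pointer.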

Note: with respect to our previous concerns on internationalization, we investigated and it appears that both Netscape and MSIE are trying to do the right thing; while there remain bugs in this area, our policy seems to be reasonable.

On another subject, we agonized further over the fact that current implementations of '#' in URLs always fetch the whole document and then navigate to the fragment in the client. For SGML, this is probably often unreasonable. Too bad - this behavior is not carved in stone; early implementations that stupidly try to fetch the entire OED or Physicians' Desk Reference, just to pull out a fragment, will not succeed in the marketplace.

CONUNDRUM:

4. If the '#' is followed only by a string, then.... what? This should be an IDREF, right? Maybe. And if it is, how do you know how to find ID attributes in an XML document out at the far end of a URL? Can you be sure of finding the appropriate declaration in the internal DTD subset? Can you be sure of finding the external subset?

On the Web, in the URL "http://foo.bar.com/baz.html#sec1.2", the "sec1.2" should correspond to a <A NAME='sec1.2'. It is not, in the HTML DTD, an ID attribute. They want to use more characters than SGML ID allows, and they don't want to enforce uniqueness. If there is more than one matching NAME=, few browsers will do anything reasonable, but it's not an error. In fact, the semantics of #-fragments in HTML are easily expressed in a simple TEI xptr query saying "find the first A element whose NAME attribute has the value whatever". We could duplicate that in XML, but it feels limiting. We could duplicate it but, in the linking element, provide other attributes to say what the element type and attribute name you're trying to match are. But then you're duplicating something you could do with a "#<tei>" string. Or, we could say that it is an IDREF, and by default look for an attribute named 'ID' with the indicated value, and also, if it's possible, look in the internal subset or the whole DTD to find out what attributes are IDs. This would be weaker than HTML in the allowed values (SGML NAME) and requirement for only one match. Big deal?
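The two candidate readings of a bare fragment can be compared side by side. This Python sketch assumes the target has already been parsed as well-formed XML with upper-case element and attribute names, as in the examples. first_a_by_name mimics the HTML behavior described above, while by_id_attribute implements the IDREF default of looking for an attribute literally named ID and does not consult the DTD for other ID-typed attributes. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def first_a_by_name(root: ET.Element, fragment: str) -> Optional[ET.Element]:
    """HTML-style: the first A element whose NAME attribute equals the fragment.
    Duplicate NAMEs are tolerated; later matches are simply ignored."""
    for elem in root.iter("A"):  # document order
        if elem.get("NAME") == fragment:
            return elem
    return None


def by_id_attribute(root: ET.Element, fragment: str) -> Optional[ET.Element]:
    """IDREF-style default: the element whose attribute named 'ID' has the value.
    Unlike the HTML reading, more than one match is treated as an error."""
    matches = [e for e in root.iter() if e.get("ID") == fragment]
    if len(matches) > 1:
        raise ValueError(f"ID {fragment!r} is not unique")
    return matches[0] if matches else None
```

Run against the earlier example, <BOOK><SEC ID="SEC1">...</SEC></BOOK>, by_id_attribute finds the SEC element for "SEC1". The contrast in how the two functions treat duplicate values is the difference in strictness that the paragraph above weighs.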

What we want is to have a simple behavior that makes sense, specified simply. No surprise that it's hard to be simple. Input and inspiration from the WG are solicited.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

Correction


Date: Mon, 17 Mar 1997 07:34:06 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB reporting group

At 06:58 PM 3/16/97 -0800, Tim Bray wrote:

1. Contrary to our decision of last time, we will support subelement addressing by a simple search operator. We will make it clear that bit-for-bit matching without respect to words or tokens is compliant behavior; if implementations wish to compete on the basis of case-folding or other fancy search optimization, that's fine.

Oops. I am reminded that we never got around to voting on that one, and that it's not a done deal. Slap the court reporter's wrists. - Tim

19 March 1997


Date: Wed, 19 Mar 97 13:07:22 CST
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: the return of the Public Identifier Question

Executive summary: ERB will take up the question of public identifiers next week. Current leaning appears to be toward

During its meeting today (19 March 1997), the ERB discussed (among other things) the preparation of a corrected draft of the XML language spec in time for distribution at the Web conference next month.

In particular, we agreed to vote, next week, on the issue of public identifiers and their inclusion in the XML-Lang spec. This issue was discussed in the WG when the subcommittee draft was released on 31 January, but without reaching an absolutely clear consensus. The ERB discussion (and an informal straw poll) made fairly clear that the ERB is leaning to the position(s) described below. Those with an interest in this topic have a week to confirm the ERB in its leanings, or make a conclusive case for another solution. (Simple reiterations of arguments already made do not, however, qualify as a conclusive case.)

1 There is strong but not unanimous sentiment for changing the syntax of external identifiers in XML to allow public identifiers. Some on the ERB would strongly prefer that a resolution mechanism be specified as well, but at least some of the pro-resolution camp are willing to add the syntax even if no consensus can be reached on a resolution method.

2 There is also a general leaning toward the view that if public identifiers are included, a resolution mechanism should also be defined. (Pro: an implementer can read the spec and know what is involved in supporting it. Con: there is no currently accessible resolution mechanism that appears to command consensus, so there is nothing ready for inclusion in the XML language spec.)

If we can agree on a suitable resolution mechanism, we'll include it in the revised spec (see below for an explanation of why this seems unlikely); if we can't, we'll include the syntax in the spec anyway, with a note that work on a resolution method is continuing.

3 There appear to be three approaches to resolution that command or could command non-negligible support:

  1. SGML Open Catalogs, as specified in the current version of the relevant SGML Open technical resolution
  2. a simplified form of SGML Open Catalogs, not necessarily that proposed on 31 January by the subcommittee
  3. reliance on URN resolution mechanisms

With regard to these, the ERB leanings appear to be:

  1. Support for full SGML Open catalogs is probably more work than should be demanded of XML implementors; the relevant TR should probably be mentioned as a relevant standard, but not incorporated in full.
  2. The ERB is leaning toward including a suitable simplification of SGML Open catalogs in the XML-Lang spec as a required minimum for XML implementations. Conforming implementations may support additional resolution techniques as well, but should all support at least this one. However, there is no consensus that the current subcommittee draft hits the right note here, and it seems likely that the draft of 31 March (or whenever) will have just a promissory note.
  3. URNs will be a plausible mechanism to consider when they are complete, but this appears not yet to be the case.

4 If an external id contains both a system identifier and a public identifier, the XML spec might specify which to try first, when to try both, etc., or it might leave such things unspecified. The possible policies appear to be these:

  1. Forbid this combination: one or the other is allowed, but not both. A minority of the ERB favored this; others in the ERB felt they could live with it, but favored another approach.
  2. System first, then public (if the system id 'fails', whatever an implementation decides that means). No support for this.
  3. Public first, then system (if the public id is not found in the catalog). One vote for this.
  4. Implementations may choose which to try first, but if the first ID it tries fails, then the implementation should try the other one. I.e. implementations may not say "If both a PUBLIC and a SYSTEM identifier are given, the XXXXX one is processed and the YYYYY one is ignored." Strong support for this view.
  5. Leave unspecified, as in 8879. No support at all for this approach.
  6. Leave for future decision. Minority support for this.

If anyone on the WG has reached enlightenment on these issues in the time since they were last discussed, please share your light with the rest of us.

-C. M. Sperberg-McQueen

22 March 1997


Date: Sun, 23 Mar 1997 18:12:27 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB call on addressing

After endless further discussion, and realization that co-existing with the web is hard, the ERB, on March 22, voted as follows:

A locator is a string which may contain a URL, a TEI extended pointer [Xptr], or both. The URL indicates a resource; if the Xptr appears, the desired resource is a "sub-resource" of the one indicated by the URL. The URL must appear first in the locator string. If the URL does not appear, the Xptr is to be applied to the current document. If an Xptr appears, it must be preceded by a separator character. There are three possible separator characters:

# - means that the user-agent is to fetch the resource described by the URL, and use the Xptr to extract the desired sub-resource. e.g.: http://www.xml.com/faq.xml#ID(a27)

? - means that the user-agent is to transmit the URL and Xptr to the server, which is to use the Xptr to extract the desired sub-resource and transmit it to the user-agent. In this case, the Xptr must be preceded by the string "XML-PTR=" e.g.: http://www.xml.com/faq.xml?XML-PTR=ID(A27)

| - means that this locator only expresses the fact that the desired sub-resource is to be retrieved by applying the Xptr to the resource identified by the URL. No constraint is placed on the system as to how this should be accomplished. e.g.: http://www.xml.com/faq.xml|ID(A27)
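A minimal sketch of how a user-agent might split a locator into its URL, separator and Xptr under the rules voted on 22 March. The function and type names are hypothetical, and the sketch assumes that the first '#', '?' or '|' in the string starts the Xptr, so it would misread a URL that already carries its own query string or fragment.

```python
from typing import NamedTuple, Optional

class Locator(NamedTuple):
    url: Optional[str]        # None means: apply the Xptr to the current document
    separator: Optional[str]  # '#', '?' or '|'; None when there is no Xptr
    xptr: Optional[str]

SEPARATORS = "#?|"

def parse_locator(locator: str) -> Locator:
    """Split a locator into URL, separator and Xptr (hypothetical helper)."""
    # The Xptr begins at the first separator character; the URL comes before it.
    positions = [locator.index(c) for c in SEPARATORS if c in locator]
    if not positions:
        return Locator(locator, None, None)  # plain URL, no sub-resource
    i = min(positions)
    url = locator[:i] or None
    separator, xptr = locator[i], locator[i + 1:]
    if separator == "?":
        # Server-side resolution: the Xptr must be introduced by "XML-PTR=".
        prefix = "XML-PTR="
        if not xptr.startswith(prefix):
            raise ValueError("'?' separator requires the XML-PTR= prefix")
        xptr = xptr[len(prefix):]
    return Locator(url, separator, xptr)
```

The separator is returned rather than acted on, because what happens next differs by case: with '#' the user-agent fetches the whole resource and extracts the sub-resource itself, with '?' it sends the Xptr to the server, and with '|' the choice is left open.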

Notes:

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

26 March 1997


Date: Wed, 26 Mar 1997 11:48:29 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB Decisions of March 26th

The ERB met Wed. March 26th. All members were present except Dave Hollander, who was represented by proxy [there had been plenty of advance discussion of the items to be voted]; Peter Sharpe missed the final vote due to having to leave early.

1. Should the spec be changed to allow attribute values (specifically the nonterminal QuotedCData) to include unescaped "<"?

After some discussion, we were unable to develop any consensus in favor of re-opening this question. The spec stands as is.

2. XML requires the string ']]>' to be escaped as ']]&gt;' when it is data. Should the draft specify that this is "for compatibility" only?

Unanimous: Yes, this is for compatibility only.
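As an illustration of what the compatibility rule asks of software that writes XML, here is a small sketch of a character-data escaper. The function name is hypothetical; besides the ']]>' rule it also applies the usual escaping of '&' and '<' in content.

```python
def escape_char_data(text: str) -> str:
    """Escape text for use as XML character data (illustrative sketch)."""
    # '&' first, so the entity references introduced below are not re-escaped.
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    # ']]>' may not appear literally in data; escaping its '>' is enough.
    return text.replace("]]>", "]]&gt;")
```

A lone '>' is left alone, since only the three-character sequence is restricted.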

3. Should the XML declaration at the front of the document entity be made optional?

[Ed. note: a lot of discussion on this one; made more difficult by the fact that the people who wanted it optional could see good reasons for making it compulsory, and those who wanted it compulsory could see good reasons for making it optional]

Optional: Bray, Clark, DeRose, Kimber, Magliery, Sharpe, Bosak
Required: Maler, Paoli, Sperberg-McQueen

So it's now optional.

4a. Should we change the way the draft spec describes when and where parameter entity references are legal?

Unanimous: Yes. We have a proposal from Michael and me for major cleanups to describe 4 straightforward ways to use PE's, and much more controversial language for another more general way to use them. The 4 straightforward ones are going into the [imminent] draft spec, the ERB still has to chew on the hard one.

5. Should production 69 (external ID) be changed to make the SystemLiteral optional?

Unanimous: leave it required.

6. In section 4.2.2 "External Entities," should the following sentences be dropped (or modified)?

barring an external mechanism for establishing the base... Relative URLs are relative to the location of the entity or file within which the entity declaration occurs. Relative URLs in entity declarations within the internal DTD subset are thus relative to the location of the document; those in entity declarations in the external subset are relative to the location of the files containing the external subset.

Unanimous: leave as is.

7a. Should production 69 be changed to allow public identifiers?

No issue since DSD's has caused the ERB so much trouble. The vote went as follows:

Yes, allow PUBLIC: Kimber, DeRose, Sperberg-McQueen, Maler, Hollander
No, no PUBLIC ID: Paoli, Sharpe, Magliery, Clark, Bray, Bosak

So in this draft, no public IDs. It should be noted that every person on the No side would change their vote to Yes if there was an agreed-on resolution mechanism for PUBLIC identifiers.

8. Should the predefined entities be removed or altered?

Proposal: Drop all predefined entities

Yes: Kimber, Bosak, Maler
No: Bray, Clark, Paoli, Sperberg-McQueen, Hollander, Sharpe, Magliery, DeRose

Proposal: Well-formed XML docs are considered to have &lt;, &gt;, &apos;, &quot;, and &amp; predefined. Valid XML docs must have them declared if they use them; the spec will give a precise definition of what the declaration must be.

Passed unanimously. Outstanding item to get the declaration just right.
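A sketch of what "considered to have them predefined" means for a processor handling a well-formed document without a DTD. The helper is hypothetical, and because the exact declarations required of valid documents were still an outstanding item, the sketch does not attempt them.

```python
import re

# The five entities a well-formed XML document may use without declaring them.
PREDEFINED = {"lt": "<", "gt": ">", "apos": "'", "quot": '"', "amp": "&"}

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

def expand_predefined(text: str) -> str:
    """Replace references to the predefined entities (other names left alone)."""
    def substitute(match: re.Match) -> str:
        return PREDEFINED.get(match.group(1), match.group(0))
    # One left-to-right pass, so '&amp;lt;' becomes '&lt;', not '<'.
    return ENTITY_REF.sub(substitute, text)
```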

9. Should we allow and ignore the tag omission [-O] [-O] syntax? Another close call. Pro: eases conversion and DTD management. Con: non-functional in XML, another irritant in explaining to the world.

Yes, allow them: DeRose, Kimber, Maler, Sperberg-McQueen
No, keep it as is: Bray, Bosak, Clark, Magliery, Paoli

So no tag omission indicators for now.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

1 April 1997


Date: Tue, 01 Apr 1997 08:09:56 -0800
From: Tim Bray <tbray@textuality.com>
Subject: Feature adjustment

Meeting 97/04/01 [printing deadlines for the WWW6 conference are creating severe time pressures], the ERB voted as follows:

Add sections 6.1-6.17 (draft forthcoming) enabling the use of the CONCUR feature in XML.

Yes: Bray, Bosak, Clark, Maler, Sperberg-McQueen, Sharpe, Paoli, Magliery, DeRose, Hollander
No: Kimber

-Tim

2 April 1997

Underscores in Names


Date: Wed, 2 Apr 1997 11:29:38 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: ERB decision on underscores in names

By mail vote and in conference 1997.04.02, the SGML ERB decided the following question:

Should the XML rule for Name allow underscore within names (by adding it to the class of name-start characters)?

Yes: Hollander, Sharpe, Clark, Bosak, Magliery, DeRose, Kimber, Paoli
No: Sperberg-McQueen, Bray, Maler
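To show the effect of the decision, here is an ASCII-only approximation of the Name rule with underscore admitted as a name-start character. This is an assumption for illustration: the draft's actual productions admit many more Unicode letters, digits and combining characters, and the name-character set shown here is simplified.

```python
import re

# Simplified, ASCII-only approximation (assumption): a letter or '_' first,
# then letters, digits, '.', '-' or '_'.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_name(s: str) -> bool:
    """True if s is a Name under the simplified rule."""
    return NAME.fullmatch(s) is not None
```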

Comments


Date: Wed, 2 Apr 1997 11:32:45 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: ERB decision on comment delimiters

By mail vote and in conference 1997.04.02, the SGML ERB decided the following question:

Should we (a) change the draft TC to ask for a SIMPLE COMMENT flag, for the SGML declaration, which would have the effect of suppressing recognition of the com delimiter within comment declarations (so the second com delimiter is recognized only in the context com, mdc), and simultaneously (b) change XML comment delimiters back to '<!--' and '-->'?

Yes: Bray, Bosak, DeRose, Paoli
No: Maler, Hollander, Sharpe, Sperberg-McQueen, Clark, Magliery, Kimber

Second proposal (considered only if the preceding fails):

Should we change the XML comment delimiters back to '<!--' and '-->' and forbid '--' within XML comments?

Yes: Maler, Hollander, Sharpe, Bray, Bosak, Clark, Magliery, Paoli, Kimber
No: Sperberg-McQueen

[Editorial note: In other words, the ERB feels that the SGML prohibition against "--" in comments is better than the alternatives currently available to us.]
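A sketch of a checker for the rule the second proposal adopted: comments are delimited by '<!--' and '-->', and '--' may not occur inside. The function name is hypothetical; the extra check on a trailing '-' is this sketch's reading of the rule, and it matches the behavior of XML 1.0 as eventually published.

```python
def is_well_formed_comment(comment: str) -> bool:
    """Check a comment against '<!--' ... '-->' with no '--' in the body."""
    # Seven characters is the shortest possible comment: '<!---->'.
    if len(comment) < 7 or not (comment.startswith("<!--") and comment.endswith("-->")):
        return False
    body = comment[4:-3]
    # Also reject a body ending in '-': '--->' puts '--' just before the
    # closing delimiter.
    return "--" not in body and not body.endswith("-")
```

The length check matters because '<!-->' both starts with '<!--' and ends with '-->', yet is not a complete comment.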

Public Identifiers


Date: Wed, 2 Apr 1997 11:38:57 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: ERB decision on public identifiers

By mail vote and in conference 1997.04.02, the SGML ERB decided the following question:

Should the XML spec allow external identifiers to take the form
 
   'PUBLIC' Publicid SystemLiteral
with an appropriate definition of Publicid and the note:
In addition to a system literal, an external identifier may include a public identifier. A system may use the public identifier to try to generate an alternative URL. If a system is unable to do so, it must use the URL specified in the system literal.

Yes: Maler, Hollander, Sperberg-McQueen, Clark, Bosak, Magliery, DeRose, Kimber
No: Bray, Sharpe, Paoli
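The note's fallback rule can be sketched as follows. The `catalog` mapping is a hypothetical stand-in for whatever resolution mechanism a system supports, since the ERB had not agreed on one; the only behavior the note fixes is that the system literal must be used when no alternative URL can be generated.

```python
from typing import Mapping, Optional

def resolve_external_id(system_literal: str,
                        public_id: Optional[str] = None,
                        catalog: Optional[Mapping[str, str]] = None) -> str:
    """Return the URL to fetch for an external identifier.

    `catalog` is a hypothetical public-id -> URL mapping standing in for an
    unspecified resolution mechanism.
    """
    if public_id is not None and catalog is not None:
        alternative = catalog.get(public_id)
        if alternative is not None:
            return alternative  # the system generated an alternative URL
    # Otherwise the system must use the URL given in the system literal.
    return system_literal
```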

30 April 1997


Date: Wed, 30 Apr 1997 20:20:40 -0700
From: Tim Bray <tbray@textuality.com>
Subject: News from the ERB

The ERB has somewhat recovered from the run-up to the WWW6 conference and has been considering its future work items. Meeting today, April 30, the ERB decided to approach decision-making in the following order:

  1. Error handling
  2. Stylesheet linkage (simple and full-function modes)
  3. Unfinished work on XML-link

Once this has been finished (shouldn't take too long) we will consider in what order we should attack the near-infinite number of items on the input stack. One obvious candidate is the original phase 3 work item, the specification of a stylesheet facility suitable for delivering SGML over the Web.

There are also a large number of new work items, all of them clearly of real importance, that would reward consideration. These include:

In related news, we also agreed to aim for new XML-lang and XML-link drafts by July 1.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

7 May 1997


Date: Wed, 07 May 1997 11:32:50 -0700
From: Tim Bray <tbray@textuality.com>
Subject: ERB votes on error handling

The ERB met on May 7th. All members were present in person or by proxy. The chief subject under discussion was error handling; I have been asked to report on the discussion and results. The arguments on both sides have been exhaustively covered, and I won't repeat them. There were, however, a few new issues that came up in the course of the meeting.

1. WF-ness may not be as easy to check as I have been claiming - getting the grammar right for a complex ATTLIST inside an INCLUDed marked section is nontrivial.

2. We have a strong political reality to deal with here in that for the first time, the big browser manufacturers have noticed XML and have together made a strong request: that error-handling be completely deterministic, and that browsers not compete on the basis of excellence in handling mangled documents. It was observed that if they wanted to do this, they could just do it; but it was then pointed out that this is exactly why standards exist - to codify the desired practices shared between competitors. In any case, if we want XML to succeed on the Web, it will be difficult to throw the first serious request from M & N back in their face.

3. In fact, everyone on the ERB substantially agrees with M&N's goal, in that we do not, ever, want an XML user-agent to encounter a WF error and proceed as though everything were OK. Our disagreements centre on how to use the spec machinery to achieve this.

4. We're not worried that XML editors will silently recover from errors, because they exist precisely to create and manipulate correct content and to fix incorrect content. XML processors that are "read-only" are the things that have the problem, because users have no incentive to prefer error-free documents.

5. We considered an alternative proposal, which makes two major changes to the XML spec by defining the concept of an XML-conformant application, and the concept of a human user. This proposal would require an XML-conformant application, when confronted with a WF error, to refuse to proceed until a human user had been notified of the error and explicitly authorized error recovery. After some discussion, this proposal failed to win majority support - concerns included

However, this proposal did get serious consideration, and quite likely would have attracted significant numbers of votes from the Tolerants in the crowd.

6. If it turns out that there are common classes of WF errors that are bedeviling users, we should be willing to fix the language to address the problem.

7. There are some detailed operational concerns about the draconian model. First, it allows processors to feed parsed info to the app up to the point of error; but is this required, i.e. can a processor refuse to cough up a single byte because the doc is non-WF? Second, it is important that the processor be able to feed the app raw unparsed text to aid in error repair - given that the processor knows where it is in the entity tree, it's much easier for the processor to do this than the app - and this should probably include portions of the doc before the error.

8. It was pointed out that if we adopt the draconian policy, and then at some later point decide that error recovery should be allowed in some or all circumstances, we can relax it. The reverse is not perceived to be true.

So after all this, the vote:

The question is [note special terms 'must', and 'may']:

1. The XML-lang spec should be modified (probably in the conformance section) to state that for well-formed documents, an XML processor must make available to the application, at a minimum, the character data extracted from among the markup in the document, and a description of the logical document structure expressed by the markup.

2. The XML-lang spec should be modified to state:

When an XML processor encounters a violation of a well-formedness constraint, it must report this error to the application. It may continue processing the data to search for further errors, and report such errors to the application. In order to support correction of errors, it may make the unprocessed text from the document, with intermingled character data and markup, available to the application.
Once such a violation is detected, however, the processor must not continue the process, described in [ref. to language in point 1], of passing character data extracted from markup, and description of the logical document structure expressed by the markup, to the application.

Yes: Bosak, Bray, DeRose, Magliery, Maler, Paoli, Wood*
No: Clark, Hollander, Kimber, Sperberg-McQueen

(* Lauren Wood was substituting for Peter Sharpe, with the approval of the Chair and ERB)

On a related point, the ERB agreed to put some application notes in the spec covering the points raised in items 4 and 7 above.
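A sketch of a processor that follows the voted rule. It uses Python's standard-library expat binding, which implements the fatal-error handling that XML 1.0 eventually adopted; the function name and return shape are assumptions for illustration. Structure is passed on up to the first well-formedness error and no further. After the error, only the error report and the raw text of the offending line are made available, for repair.

```python
import xml.parsers.expat
from typing import List, Optional, Tuple

def parse_draconian(data: str) -> Tuple[List[Tuple[str, str]], Optional[str], Optional[str]]:
    """Report parse events up to the first well-formedness error, then stop.

    Returns (events, error, raw_line). Once an error is found no further
    structure is passed on; only the raw text of the offending line is
    offered, to help the application with repair.
    """
    events: List[Tuple[str, str]] = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of character data in one piece
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("text", text))
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        lines = data.splitlines()
        index = err.lineno - 1  # expat line numbers are 1-based
        raw_line = lines[index] if 0 <= index < len(lines) else ""
        return events, str(err), raw_line
    return events, None, None
```

Feeding the application events as they arrive, rather than only after a successful parse, corresponds to the first option raised in item 7. Whether a processor may instead withhold everything from a non-WF document was left open.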

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

21 May 1997


Date: Wed, 21 May 1997 11:59:30 -0700
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: ERB disposition of "SD" issues

The SGML ERB met today and decided to proceed on the "SD" questions posted last week as described below. Please note that time constraints and the need to firm up the xml-link spec figured heavily in making these decisions. The ERB has concluded (rather sadly) that it will have to start meeting twice a week again to get through the issues that need to be addressed in order to have new xml-lang and xml-link drafts out on July 1, so your cooperation in focusing email discussions on the questions determined to be of the highest priority will be greatly appreciated.

SD1 - Short End Tags

Finding: This is a religious issue that we can't resolve in the XML 1.0 time frame. A great deal of discussion went into this question early in the design of the language, and the case made for changing this basic feature over the last week has not proven persuasive.

Action: Take any further discussion of this issue elsewhere (for example, comp.text.sgml).

SD2 - Structured Attributes

Finding: The goal for this request needs to be stated more clearly. Jean Paoli has agreed to formulate another statement of what needs to be accomplished here.

Action: Suspend discussion of this question pending receipt of Jean's clarification.

SD3 - Data Types

Finding: We discussed this issue at length based on input received from the WG. We agree that there is a real need here and we are hopeful that we can find a solution that will solve the majority of the most important user requirements in this area. Steve DeRose has taken an action to formulate and post a straw proposal based on a direction that seems promising.

Action: Suspend discussion of this question pending receipt of Steve's proposal.

SD4 - Schema Format

Finding: This question in a related form nearly destroyed the XML effort back in September. The political climate has changed somewhat since then, but several of us feel strongly that an architectural issue of this magnitude has to be undertaken in cooperation with WG8 and is way too big to tackle in the 1.0 time frame. However, we are also hearing that this could be a make-or-break feature for some other W3C activities that are considering XML as their data format.

Action: We will check with the leaders of the related W3C activities and with the W3C Coordination Group responsible for XML liaison within the W3C to better understand the requirements before proceeding any further with this. Please suspend discussion of this question until we have a clearer understanding of the situation.

SD5 - Namespaces

Finding: This one is very difficult, but we are agreed that it is the most important xml-lang question facing us in the near term. We have not yet opened up discussion of the details in the ERB.

Action: Please focus on this question and on the current xml-link issues as top priorities for the time being.

Jon

4 June 1997

Record-end Handling


Date: Tue, 10 Jun 97 18:40:49 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: RE deleta est

During the ERB meeting of 5 [read: 4 -CMSMcQ] June 1997, the ERB voted unanimously to change the rules for white-space handling in section 2.8. Present: Bosak, Bray, Clark, Connolly, DeRose, Hollander, Magliery, Maler, Paoli, Sperberg-McQueen, Wood; absent: Kimber.

In particular, the paragraph

An XML processor which does not read the DTD must always pass all characters in a document that are not markup through to the application. An XML processor which does read the DTD must always pass all characters in mixed content through to the application. It may also choose to pass white space occurring in element content to the application; if it does so, it must signal to the application that the white space in question is not significant.
will be changed more or less as follows:
An XML processor must always pass all characters in a document that are not markup through to the application. An XML processor which reads the DTD must distinguish white space in element content from other non-markup characters, and signal to the application that white space in element content is not significant.

Rationale: eliminating the optional behavior of suppressing white space in element content eliminates the potential inconsistency among XML processors in the counting of pseudo-elements (this topic came up as a digression from the discussion of pseudo-element counting for CHILD, NEXT, PREV, etc.). Since the Technical Corrigendum to 8879 will provide a KEEPALL keyword for the SGML declaration which will specify that all white space should be passed to the application, the exception for element content is no longer necessary for the sake of compatibility with 8879. Downside: the new rule does mean that existing SGML parsers will need to be modified to retain all white space (e.g. using a run-time switch), but the parser makers in the group considered this to be a relatively simple surgery to perform on existing code.
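A sketch of the revised rule. Every character is passed through. A processor that reads the DTD additionally flags white space in element content as insignificant. Both inputs are assumptions: `chunks` pairs each run of character data with its parent element, and `element_content` names the elements the DTD declares with element content.

```python
from typing import Iterable, List, Set, Tuple

XML_WHITESPACE = " \t\r\n"

def classify_text(chunks: Iterable[Tuple[str, str]],
                  element_content: Set[str]) -> List[Tuple[str, bool]]:
    """Pass every chunk through, paired with an 'insignificant' flag."""
    result = []
    for parent, text in chunks:
        ignorable = parent in element_content and text.strip(XML_WHITESPACE) == ""
        result.append((text, ignorable))  # nothing is dropped, only flagged
    return result
```

A processor that does not read the DTD corresponds to an empty `element_content` set: everything is passed through and nothing is flagged. That is why the pseudo-element count comes out the same either way.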

That afternoon, when this was announced at the joint conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, David Durand was crowned with a victor's wreath. At least, he would have been if Kingston, Ontario, had had any olive trees. He settled for a paper crown instead.

Other decisions bearing not on XML-lang but on XML-link will be reported separately.

-C. M. Sperberg-McQueen

XML-Link Issues


Date: Wed, 11 Jun 1997 13:11:39 -0400
From: "Steven J. DeRose" <sjd@eps.inso.com>
Subject: Report on ERB work last week

(sorry this took a while to get through -- phoneline probs while out of country)

The ERB met last week with Bosak, Bray, Clark, Connolly, DeRose, Hollander, Kimber, Magliery, Maler, Paoli, Sperberg-McQueen and Wood present.

We set as agenda resolving several summary questions under discussion.

Link decisions: 1. syntax

None of the 9 syntactic options in question for distinguishing "HERE" as the EPN keyword from "HERE" as an ID in a URL found enthusiastic support, though several seemed acceptable. James introduced a tenth suggestion, namely to require an empty parameter list after those EPN keywords that do not already require them. For example:
   HERE     is an ID
   HERE()   is an extended pointer

This met with immediate enthusiasm for several reasons, including that it increases the consistency of EPN itself and appears to be easy to teach/learn/implement/document. This option was unanimously approved.
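Under the adopted rule, a fragment can be classified by inspection alone, because every EPN keyword now carries a parenthesized parameter list and an ID (being a name) cannot contain parentheses. A minimal sketch, assuming only the first step needs examining; the function name is hypothetical.

```python
import re

# A keyword followed by a (possibly empty) parameter list, e.g. HERE() or ID(a27).
EPN_STEP = re.compile(r"[A-Za-z]+\([^()]*\)")

def classify_fragment(fragment: str) -> str:
    """Return 'xptr' if the fragment starts with an EPN step, else 'id'."""
    return "xptr" if EPN_STEP.match(fragment) else "id"
```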

Link decisions: 2. Pseudo-elements

Discussion involved several aspects of the problem of whether to count pseudo-elements.

<NOTE TYPE="terminological"> A pseudo-element is a portion of #PCDATA content uninterrupted by markup. A "real" element, in contrast, is one that has a GI. "Subelement addressing" involves addresses like "the third word". </NOTE>

Whitespace relates to both pseudo-element and sub-element addressing. We tabled the pseudo-element issue to discuss how the SGML TC changes regarding whitespace relate to it. The result was RE deleta est, as already reported by Michael.

Returning to the pseudo-element question, we noted that the removal of ambiguity about the presence of whitespace removes ambiguity in how to count pseudo-elements (though not about whether to).

The great cost of not counting pseudo-elements is that then you cannot address them. It was pointed out that if you do still allow sub-element addressing (such as character offsets into #PCDATA), you can get at pseudo-elements that way, but that character counting across markup boundaries is itself complex and relatively fragile. It also imposes a subtle incompatibility with HyTime and with TEI pointers (and not just for CHILD, but for several other keywords including complex cases such as PRECEDING and DESCENDANT).

After much discussion the ERB is leaning toward a proposal under which both options are available to the user, distinguished by the GI parameter. This has not been voted, but seems at this time to be the best compromise. Thus:
   CHILD (3)     locates the 3rd real subelement
   CHILD (3 *)   locates the 3rd real subelement
   CHILD (3 !)   locates the 3rd real or pseudo subelement

(the particular reserved value to flag the last case is to be determined; "!" is merely for illustration).

It was also suggested that pseudo-elements consisting only of whitespace not be counted. This may enhance intuitiveness and compatibility with SGML systems that do not yet support the TC.
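A sketch of the counting the (unvoted) proposal describes. Children are modeled as ("elem", gi) or ("text", data) pairs, and "!" stands in for the still-undecided reserved value, as in the examples above. Filtering on a specific GI follows TEI practice and is included as an assumption. The `skip_blank_text` switch implements the whitespace suggestion.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, str]  # ("elem", gi) for a real element, ("text", data) for #PCDATA

def child(children: List[Node], n: int, gi: str = "*",
          skip_blank_text: bool = False) -> Optional[Node]:
    """Return the n-th child (1-based) under the proposed CHILD semantics."""
    count = 0
    for kind, value in children:
        if kind == "text":
            if gi != "!":
                continue  # pseudo-elements are invisible unless "!" is used
            if skip_blank_text and value.strip(" \t\r\n") == "":
                continue  # whitespace-only pseudo-elements not counted
        elif gi not in ("*", "!") and value != gi:
            continue  # a specific GI restricts the count to that element type
        count += 1
        if count == n:
            return (kind, value)
    return None
```

The last two test calls show the compatibility point. Once whitespace-only pseudo-elements are skipped, a stray newline between elements no longer shifts the numbering.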

This proposal will be presented to the WG for discussion.

Link decisions: 3. Sets & singletons

Discussion here centered on our relationship to the DOM work, since both require an explicit definition of what the document structure representation is before we can give a complete formal specification of what information is in fact referenced by a locator, particularly in the more complex cases, where the destination is not a single element.

Lauren will be coordinating this liaison effort, and seek to present a first cut proposal for a DOM/XML data schema (or grove plan) by July 1. Michael and James will be contributing to this effort.

As for locating spans, there are complexities because a span is not generally representable as a set, list, or tree of elements. The span from the 2nd to the 4th <P> within SEC ID=SEC3 can be; but the span from the last word of one <P> through the 4th word of the next <P> is not. Neither including nor excluding the <P>'s involved, or their common ancestor, fully represents the link: All those elements are partly included in the resource.

The end proposal was to include spans in the location syntax, specified as a start/end pair, with the meaning defined in the same manner as in TEI: as a reference to the included range. At the same time, we will acknowledge that the precise details are not yet specified, and that we expect that to be accomplished via the DOM effort, with which we are working.

This was approved, with James and Dave dissenting.

Steven J. DeRose, Ph.D., Chief Scientist
Inso Electronic Publishing Solutions
(formerly EBT)

11 June 1997


Date: Sat, 14 Jun 1997 19:14:03 -0700
From: Jon.Bosak@Eng.Sun.COM (Jon Bosak)
Subject: Associating stylesheets with documents

Way back on April 23 the ERB discussed methods for linking stylesheets to documents and decided to take a two-tiered approach: adopt a method based on processing instructions and already implemented experimentally for the simple cases, and work to define a more elaborate method based on some kind of "binding document" for the more complex cases. This would allow us to quickly put in place a simple mechanism for the typical case while giving us more time to come up with a good design for the tougher but less common cases.

I should have reported this decision, but some details remained to be specified, and then we got embroiled in the error-handling controversy while simultaneously trying to steer the WebSGML TC through WG8 in Barcelona, and stylesheet linking just sort of fell through the cracks. I brought this back up during last Wednesday's ERB meeting, and we made sufficient progress to finally report to the WG (and early implementors) where we currently stand and where we seem to be headed.

The simple mechanism is easily described. A stylesheet is associated with an XML document by inserting into the prolog of the document a processing instruction of the form

   <?XML-stylesheet type="text/dsssl" href="duckbook.dsl"?>
where

   type   specifies a stylesheet language such as text/dsssl or text/css
   href   is a system identifier such as a file name or URL

Thus, a typical XML document might begin:

 
   <?XML version="1.0"?>
   <!DOCTYPE chapter SYSTEM "duckbook.dtd" [
      <?XML-stylesheet type="text/dsssl" href="duckbook.dsl"?>
   ]>
   <chapter><title>....
   </chapter>

XML-stylesheet processing instructions can appear only in the document prolog; if they appear anywhere else, they are simply ignored. Note, however, that under the rules for XML prologs, the following would be legal:

  
   <?XML version="1.0"?>
   <?XML-stylesheet type="text/dsssl" href="duckbook.dsl"?>
   <chapter><title>....
   </chapter>
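A sketch of how a user agent might pick up stylesheet PIs while honoring the prolog-only rule. It makes several simplifying assumptions. The prolog is approximated as everything before the first start tag. Only double-quoted pseudo-attributes are read. The PI name is matched case-insensitively, since the messages spell it both "XML-stylesheet" and "xml-stylesheet".

```python
import re
from typing import Dict, List

PI = re.compile(r"<\?XML-stylesheet\s+(.*?)\?>", re.IGNORECASE)
PSEUDO_ATTR = re.compile(r'([A-Za-z][\w-]*)\s*=\s*"([^"]*)"')
ROOT_START = re.compile(r"<[A-Za-z_]")  # '<!' and '<?' do not match

def stylesheet_pis(document: str) -> List[Dict[str, str]]:
    """Collect XML-stylesheet PIs from the prolog, ignoring any that come later."""
    root = ROOT_START.search(document)
    prolog = document[:root.start()] if root else document
    return [dict(PSEUDO_ATTR.findall(m.group(1))) for m in PI.finditer(prolog)]
```

A PI inside the internal subset is still in the prolog, so both document beginnings shown above yield the same result. A PI inside <chapter> is dropped.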

This simple method has already been implemented in Jade and in HyBrick, the SGML/HyTime/DSSSL Web browser demonstrated by Fujitsu Labs at WWW6. In last week's ERB meeting, we informally agreed that we need to add the following in parallel with the latest WD-style draft (http://www.w3.org/pub/WWW/TR/WD-style-970324):

1. An optional MEDIA attribute with the same semantics as the MEDIA attribute specified in WD-style.

2. An optional TITLE attribute with the same semantics as the TITLE attribute specified in WD-style.

3. A form <?XML-alt-stylesheet ... ?> with the same semantics as REL="alternate stylesheet" in WD-style.

Note that WD-style uses the TITLE and MEDIA attributes to group stylesheet options for the user in various ways and to indicate (in conjunction with the REL attribute) whether a stylesheet is "persistent," "default," or "alternate." The specification of these interactions in WD-style appears to me to assume certain features of CSS that may or may not apply to DSSSL; this could need some further exploration.

One question that certainly needs resolution is the implied meaning of multiple stylesheet PIs. The corresponding structure in WD-style (a series of LINK REL=stylesheet elements) specifies a cascade of the stylesheets in the order in which the LINK elements appear in the HEAD. The alternatives here seem to be:

1. Make the appearance of more than one xml-stylesheet PI in a prolog an error. (You could still have multiple xml-alt-stylesheet PIs.)

2. Allow multiple xml-stylesheet PIs and cascade them if they are written in a language such as CSS for which cascading has been defined, but leave the behavior system-dependent if they are not. (No cascading rules have been defined for DSSSL, but there seems to be nothing preventing this at some time in the future.)

3. Allow multiple xml-stylesheet PIs but either

(a) state that the various stylesheets should always be presented as user options, or

(b) allow the treatment to be completely system-dependent.

One argument for approach #3 is that user agents that can cascade stylesheets are probably significantly harder to implement than user agents that simply allow the application of alternative stylesheets, and our emphasis on simplicity means that we won't require user agents that support stylesheets to also support cascading.

Another question is what to specify as the behavior when multiple stylesheet PIs are given and they are of more than one type. This could be decided by the system, the user, or some combination of the two. For the near term, it would seem sufficient to state that the behavior is currently considered to be system-dependent and that we intend to sort this out later.

As for the more complex "binding document" approach, that's on hold until we can confer with other W3C activities having similar requirements to see whether we can arrive at one solution that will work for all of us. Given everything else that's happening right now, it will probably be a couple of months before we get to that. But the simple method above should carry us well into the initial wave of stylesheet implementations if we can get some resolution of the open questions just stated.

Jon

18 June 1997

General


Date: Wed, 18 Jun 1997 11:31:59 -0700
From: Tim Bray <tbray@textuality.com>
Subject: ERB meeting of June 18

The ERB met June 18; present: Bosak, Kimber, Maler, Wood (for Sharpe), DeRose, Paoli, Hollander, Bray, Sperberg-McQueen, Magliery.

1. On the issue of contextual/inline/independent links (formerly known as "Link-5: Extended Contextual Links", the "tesuji" example), we agreed unanimously:

We will add a new attribute INLINE to both simple and extended links. The values are TRUE and FALSE, with the default being TRUE. TRUE means that all content of the linking element, excluding pieces of the linking element's own machinery (i.e. child locator elements) are to be considered a resource of the link.

We will add new attributes LOCAL-ROLE and LOCAL-LABEL to both simple and extended links, which may be used to provide ROLE and LABEL information for the content, which, assuming INLINE="TRUE", will be referred to as the local resource. If INLINE="FALSE" but LOCAL-* attribute(s) are present, this is not an error, but the LOCAL-* attributes have no effect.

2. We agreed that the only sensible content model for both SIMPLE and EXTENDED linking elements is ANY.

3. On the subject formerly known as "Link-4: Extended Linking Group Indirection", we decided that:

  - the explanation obviously needs to be improved; many readers didn't get the idea.
  - we will add a new XML-XLG-STEPS attribute to the <group> element saying how many steps the author thinks a processor should chain out, in order to build the appropriate set of links. However, this will be treated only as a hint and will have no required effect on XML-Link processing.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

Parameter Entities


Date: Thu, 19 Jun 1997 08:30:37 -0400
From: "Eve L. Maler" <elm@arbortext.com>
Subject: Parameter entities vs. GI name groups

The ERB, in its meeting yesterday, discussed the issue of parameter entities in XML. We are strongly leaning towards voting to remove parameter entities entirely from V1.0 as long as GI name groups are reinstated in ELEMENT and ATTLIST declarations:

  
<!ELEMENT (a|b|c) ...>
<!ATTLIST (x|y|z) ...>

We plan to vote on this at next Wednesday's meeting; your input is welcome.

The reasons to prefer GI name groups:

One consequence:

Eve

External Entities


Date: Sun, 22 Jun 1997 21:40:20 -0700
From: Jon.Bosak@Eng.Sun.COM (Jon Bosak)
Subject: ERB decision: External entities

In its meeting of June 18, the ERB considered the recent discussions of external entities and concluded that no compelling reasons have been presented that would justify our reconsideration of this question in the XML 1.0 time frame. WG members are requested to conduct further discussion of this question off the list.

Jon

Character Sets


Date: Sun, 22 Jun 1997 21:41:05 -0700
From: Jon.Bosak@Eng.Sun.COM (Jon Bosak)
Subject: ERB decision: Character sets

In its meeting of June 18, the ERB considered the recent discussions of character sets and decided to defer consideration of this question until after the next version of xml-lang is out. WG members are requested to hold their comments on this question or conduct further discussion off the list for the moment.

Jon

25 June 1997

Colon as Name Character


Date: Wed, 25 Jun 97 13:03:49 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decision: colon as name character

In its meeting today (25 June 1997), the ERB discussed the problem of namespaces, and in particular the proposal to include ':' as a name character in XML. The group decided unanimously that:

Other decisions will be reported separately.

-C. M. Sperberg-McQueen

Other Decisions


Date: Thu, 26 Jun 1997 09:30:11 -0700
From: Tim Bray <tbray@textuality.com>
Subject: Some progress on PE's

The ERB met on June 25; everyone but Dave Hollander was present. There was considerable discussion of Parameter Entities, an unofficial summary follows:

  1. The discussion in the WG makes it clear that XML's utility as an authoring environment would be severely compromised by omission of PE's, or even by constraining them much more than they are now.
  2. It is generally agreed that implementation of the full suite of constraints on the placement of and replacement text for PE's is beyond what should be expected of a lightweight nonvalidating parser.
  3. It would be dangerous to relax the constraints, i.e. say that PE references can go anywhere in the DTD, as this would tend to create a large legacy class of instances that would be well-formed but not 8879-conformant, and hard to make conformant.

Thus it was unanimously agreed to have two sets of PE rules, one for the external and one for the internal subset. In the external subset, the rules will stand as they are now, although we'll try to improve the explanation in the spec. (Those who have said it could be explained more clearly are hereby invited to submit specific suggestions).

In the internal subset, PEs must expand to match "markupdecl" (prod. 28 in the current draft), and references can only be placed where a markupdecl can be recognized. The feeling is that this level of recognition is well within the capabilities of the most modest parsers.

In follow-on discussion, we realized that this highlights a weakness in the spec. Currently, it discusses only validating and non-validating processors; the former are required to read the DTD. In fact, there are already non-validating parsers which do read and use the DTD, if only to extract default attributes and entity declarations. This seems like an obviously good thing to do. Yet such processors are unlikely to want to retrieve the whole external subset, since if they're not validating they don't care about content models; it seems reasonable for there to be a common class of XML documents which group the markup declarations not required for validation, but useful to a processor, in the internal subset for efficient transfer down the wire.

This is closely related to the question of the RMD; a conformant processor cannot refuse, at the moment, to read the external subset; should this be allowed in some class of nonvalidating parsers? And yet the phrase "some class" suggests the use of an option, something we have to date vigorously resisted introducing into XML.

Solving this problem may be made easier by removing discussion of the processor entirely from the spec, as suggested by both Henry Thompson and Dan Connolly.

We judged that this lack of clarity is not fatal to the progress toward a July 1st version of XML-lang (processors are de facto apparently doing what seems like the right thing) - but there will be a work item on our agenda later this year to address and clean up this area. Furthermore, the next release of the draft will contain an editorial note acknowledging the existence of this set of issues.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

25 June 1997


Date: Sun, 29 Jun 97 15:30:55 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decision: colon as name character

This note was originally posted to the SGML WG list server on 25 June; thanks to Jon Bosak for pointing out that it never went out to the list as a whole. -CMSMcQ

In its meeting today (25 June 1997), the ERB discussed the problem of namespaces, and in particular the proposal to include ':' as a name character in XML. The group decided unanimously that:

Other decisions will be reported separately.

-C. M. Sperberg-McQueen

27 August 1997


Date: Sun, 31 Aug 97 12:38:25 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: XML WG decisions of 27 August 1997

The XML Work Group discussed the following questions, and made the decisions indicated, in the meeting of 27 August 1997.

Present: Jon Bosak, James Clark, Steve DeRose, Eliot Kimber, Eve Maler, Makoto Murata, Peter Sharpe, C. M. Sperberg-McQueen.

Case Folding

1. A decision on case folding was postponed.

Background: The current draft XML spec requires that most names (i.e. generic identifiers, attribute names, IDs, IDREFs, name tokens in attribute values, PI targets, notation names, and document type names) be case-folded, while entity names are case sensitive. It has been repeatedly urged that this be changed and that all names be case-sensitive. The arguments are familiar:

For case folding: since the reference concrete syntax requires case folding, many current users of SGML and HTML are familiar with and have come to expect this behavior.

For case sensitivity: since SGML parsers are required to fold up, rather than down, the XML spec is inconsistent with recommended Unicode practice. (Unicode recommends folding down rather than up since there are slightly fewer unpleasant surprises and inconsistencies that way.) There is no rule for case folding which works in the culturally expected manner for all speakers of all alphabetic languages: a lower-case e with acute accent is (correctly) uppercased one way in Quebec and a different way in metropolitan France. Lowercase i (with a dot) is uppercased one way in Turkish and another way in other languages using the Latin alphabet.
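The Turkish case is easy to reproduce with any locale-independent upper-casing routine. The Python sketch below uses the built-in str.upper() purely as a stand-in for a fold-up rule (an assumption for illustration, not how any particular SGML parser works); the element names are hypothetical.

```python
def fold_up(name: str) -> str:
    """Fold a name to upper case, ignoring language and locale.

    Python's str.upper() applies Unicode's default (locale-independent)
    case mapping; it is used here only as a stand-in for a fold-up rule.
    """
    return name.upper()


# Hypothetical element names: Turkish treats dotted i (U+0069) and
# dotless ı (U+0131) as two different letters.
dotted = "kid"
dotless = "k\u0131d"

# Both fold to "KID", because the default mapping sends both letters to
# Latin capital I (U+0049). A Turkish reader would expect dotted i to
# become İ (U+0130) instead, giving "K\u0130D".
print(fold_up(dotted))                      # KID
print(fold_up(dotless))                     # KID
print(fold_up(dotted) == fold_up(dotless))  # True: two distinct names collide
```

Folding down instead would not rescue the example in general: whichever direction is chosen, some language's expectations are violated, which is the point of the argument above.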

A strong majority of those participating felt that we should make XML case sensitive and drop case folding, but in view of the sensitive nature of the decision, it was decided to postpone the decision until a larger fraction of the work group was present.

20.25-bit Characters

2. XML characters range from #x0 to #x10FFFF.

Decision: Legal XML characters are those representable in UTF-16 / Unicode 2.0, i.e. those in the first seventeen planes of ISO/IEC 10646. Unanimous.
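The range decided here can be checked mechanically. A minimal Python sketch follows; the function and constant names are illustrative and do not come from the spec, and the gaps the spec leaves for forbidden characters are deliberately not modelled.

```python
MAX_XML_CHAR = 0x10FFFF  # last code point of plane 16, i.e. the 17th plane
PLANE_SIZE = 0x10000     # each plane holds 65,536 code points


def plane_of(cp: int) -> int:
    """Return the ISO/IEC 10646 plane number of a code point (0 is the BMP)."""
    return cp // PLANE_SIZE


def in_xml_range(cp: int) -> bool:
    """True if cp falls in #x0-#x10FFFF.

    Only the overall range is checked; excluded (forbidden) characters
    inside that range are not handled here.
    """
    return 0 <= cp <= MAX_XML_CHAR


print(plane_of(0xFFFF), plane_of(0x10000), plane_of(0x10FFFF))  # 0 1 16
print(in_xml_range(0x10FFFF), in_xml_range(0x110000))           # True False
print((MAX_XML_CHAR + 1) // PLANE_SIZE)                         # 17
```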

Rationale: The current spec says that XML characters may include any character defined by ISO/IEC 10646. Currently, that standard defines characters only within the Basic Multilingual Plane, each of which can be represented by a string of 16 bits; in principle, however, ISO/IEC 10646 defines a 31-bit character space, and production 2 accordingly defines Character as covering the range #x0 to #x7FFFFFFF, with some gaps for forbidden characters.

XML processors, however, are not required to support the flat 32-bit character encoding UCS-4, only the 16- and 8-bit encodings of UCS-2 and UTF-8. (The latter can represent all the characters of the 31-bit character space, but UCS-2 cannot.) In many places, the XML spec suggests, or at least allows incautious readers to believe, that XML characters are only 16 bits wide.

Either way, it's important to eliminate the ambiguity in the spec.

In favor of restricting XML characters to 16 bits: it simplifies life for users of Java and other tools. It seems clear that the full 31-bit space of 10646 will not be needed, even for extremely specialized applications, in the foreseeable future.

In favor of defining XML characters to be 31 bits wide: 16 bits is manifestly too few for anyone working with historical texts in Han characters. Politically, it would be unwise to give the impression that only the Basic Multilingual Plane is of importance. The surrogate method, while clever, is clearly a hack which demonstrates that the original Unicode claim (16 bits is enough to build an absolutely flat character space which will last for all time) has fallen apart under the pressure of fact; the surrogate method abandons the flat character space which is one of the most important advantages of Unicode.

The compromise (BMP plus the next 16 planes) appears

UTF-16 Support

3. Processors must support UTF-16, not just UCS-2.

Background: the current draft spec says (4.3.3): "All XML processors must be able to read entities in either UTF-8 or UCS-2." It has been proposed to change this to require support for UTF-8 and UTF-16 (which is UCS-2 plus support for the surrogate-character mechanism by which characters outside the Basic Multilingual Plane may be encoded).

Decision: (i) XML processors must support 16-bit data streams (i.e. UTF-16) for input. (ii) They must not corrupt surrogate characters. (iii) If the processor uses a 16-bit buffer or a 16-bit interface to the downstream application, it must correctly represent numeric character references to non-BMP characters as pairs of surrogate characters. Unanimous.

Rationale: since all name characters in XML are in the Basic Multilingual Plane, characters outside the BMP can only appear in XML documents as data. Since an XML processor is required to do nothing more to data than store it and pass it to the downstream application without corrupting it, no special handling is required for surrogate characters. The only new requirement is that processors understand the surrogate-character mechanism for characters outside the BMP, and use it, when necessary, to handle numeric character references correctly.
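Requirement (iii) amounts to the standard UTF-16 surrogate computation. The Python sketch below shows it for hexadecimal character references only; the function names are hypothetical, and for brevity it neither handles decimal references nor rejects references to surrogate code points themselves.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point in #x10000-#x10FFFF into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"#x{cp:X} is not a non-BMP character")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trailing) surrogate
    return high, low


def hex_char_ref_to_units(ref: str) -> list[int]:
    """Expand a hex character reference such as '&#x10000;' into the
    16-bit code units a processor with a 16-bit interface would pass on."""
    if not (ref.startswith("&#x") and ref.endswith(";")):
        raise ValueError(f"not a hex character reference: {ref!r}")
    cp = int(ref[3:-1], 16)
    if cp < 0x10000:
        return [cp]                   # BMP character: a single code unit
    return list(to_surrogate_pair(cp))


print([f"{u:04X}" for u in hex_char_ref_to_units("&#x41;")])     # ['0041']
print([f"{u:04X}" for u in hex_char_ref_to_units("&#x1F600;")])  # ['D83D', 'DE00']
```

Data characters that already arrive as surrogate pairs in a UTF-16 stream need no such computation; they only have to be passed through intact, as requirement (ii) says.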

Character-Set Standard References

4. XML will refer to Unicode 2.0 and ISO/IEC 10646 with Am. 1-7.

The current draft spec refers to Unicode 2.0 and ISO/IEC 10646 with Amendments 1 through 5. It has been suggested (a) that XML should refer only to Unicode, and (b) that the reference should be to "the current version" of Unicode, so that as Unicode is revised, XML automatically accepts the revisions.

Decision: refer to 10646 with Amendments 1 through 7, but otherwise retain the current reference. I.e. do not drop the reference to ISO/IEC 10646, and do not phrase the reference so as to incorporate changes to Unicode automatically. Unanimous.

Rationale: the agreement between ISO/IEC JTC1/SC2 and the Unicode Consortium to keep Unicode and 10646 synchronized is extremely important to all users. A joint reference to both standards makes clear to both parties that we, as users, wish them to honor that agreement. A reference solely to Unicode would imply clearly that XML would follow Unicode even if Unicode were to diverge from ISO/IEC 10646. The joint reference makes clear our intent: if the Unicode Consortium and SC2 fail to keep the two standards in synch, then XML is not guaranteed to follow either of them.

Reference to as yet unpublished standards (which is what reference to "the most recent version" amounts to) is unwise because there is and can be no guarantee that revisions in Unicode and 10646 will not require corresponding revisions to the XML spec.

Encoding of External Text Entities

5. Encoding of external text entities is kept as is.

It has been suggested that by allowing external entities to be in different character encodings, XML is incompatible with ISO 8879, which does not allow this.

The WG unanimously reaffirmed its belief that the current draft spec is in fact compatible with ISO 8879 under what is sometimes called the 'new' character model. SGML documents must have a single document character set declaration and thus a single document character set, but this reflects the output from, not the input to, the entity manager, and is thus independent of the character encoding encountered in the actual data stream of the external text entity.

Ideographic Space

6. Ideographic space is not white space.

Decision (unanimous): ideographic space (#x3000) will be removed from the non-terminals S and PubidCharacter.

Rationale: Ideographic space corresponds more closely to the no-break space (#xA0, &nbsp;) than to the standard space character (#x20). #xA0 is not allowed in S, and neither should ideographic space be. It is unlikely, with current standard input methods for kanji, that any operator would unintentionally or accidentally insert an ideographic (#x3000) rather than a Latin (#x20) space within a tag.
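In code, the decision means that a scanner for S must use an explicit character set rather than a general-purpose white-space test. The Python sketch below assumes S is left with space, tab, carriage return and line feed (the set XML 1.0 uses); the names are illustrative, and only S, not PubidCharacter, is modelled.

```python
# Characters of production S, assumed here to be those left once #x3000 is
# removed: space, tab, carriage return and line feed.
XML_SPACE = frozenset("\u0020\u0009\u000D\u000A")


def is_xml_space(ch: str) -> bool:
    """True only for the characters S allows; #x3000 and #xA0 are excluded."""
    return ch in XML_SPACE


for ch in ("\u0020", "\u00A0", "\u3000"):
    # str.isspace() follows Unicode's broader notion of white space, so it
    # accepts all three characters; a tag scanner should not rely on it.
    print(f"U+{ord(ch):04X}", is_xml_space(ch), ch.isspace())
# U+0020 True True
# U+00A0 False True
# U+3000 False True
```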

Sources of Encoding Information

7. Binding sources of information for character encodings will be specified.

The current draft spec says nothing about the priority of various sources of information regarding character encodings. Some participants (notably Gavin Nicol and Makoto Murata) have argued that this should be specified.

Decision: The spec should include wording to the following effect:

If an XML document or entity is in a file, the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.
If an XML document is delivered via the HTTP protocol with a MIME type of text/xml, then the HTTP header determines the character encoding method; all other heuristics and sources of information are solely for error recovery.
If an XML document is delivered via the HTTP protocol with a MIME type of application/xml, then the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.
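The three rules, as worded in this report, can be read as a small decision procedure. The Python sketch below is one such reading; the function name and the "file"/"http" labels are hypothetical, and consulting the Byte-Order Mark before the encoding-declaration PI is an assumption, since the rules do not rank the two.

```python
from typing import Optional


def binding_encoding(source: str,
                     mime_type: Optional[str] = None,
                     http_charset: Optional[str] = None,
                     bom_encoding: Optional[str] = None,
                     declared_encoding: Optional[str] = None) -> Optional[str]:
    """Return the encoding given by the binding source of information.

    source is "file" or "http" (hypothetical labels). A result of None means
    no binding source applies; any heuristics used after that serve only
    for error recovery.
    """
    if source == "http" and mime_type == "text/xml":
        # The HTTP header decides; the BOM and encoding PI do not.
        return http_charset
    if source == "file" or (source == "http" and mime_type == "application/xml"):
        # The BOM and encoding-declaration PI decide. Checking the BOM first
        # is an assumption; the rules do not rank the two.
        return bom_encoding or declared_encoding
    return None  # a delivery method the rules do not cover


print(binding_encoding("http", "text/xml", http_charset="Shift_JIS",
                       declared_encoding="UTF-8"))         # Shift_JIS
print(binding_encoding("http", "application/xml", http_charset="Shift_JIS",
                       declared_encoding="UTF-8"))         # UTF-8
```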

-C. M. Sperberg-McQueen

3 September 1997


Date: Wed, 3 Sep 1997 20:50:31 +0200 (MET DST)
From: "C. M. Sperberg-McQueen" <cmsmcq@hd.uib.no>
Subject: XML WG decisions of 3 September 1997

The XML Work Group met today (3 Sept 1997) and made the decisions described below. Present were Jon Bosak (JB), Tim Bray (TB), James Clark (JC), Dan Connolly (DC), Steve DeRose (SJD), Paul Grosso (PG), Dave Hollander (DH), Eliot Kimber (EK), Murray Maloney (MMa), Makoto Murata (MMu), Joel Nava (JN), Jean Paoli (JP), Peter Sharpe (PS), and Michael Sperberg-McQueen (MSM).

1. Procedures for determination of character encoding to be described in an appendix.

Background: last week's report of decisions (31 August, posting from U35395@UICVM.UIC.EDU), included as item 7 a decision regarding "Binding sources of information for character encodings". The WG revisited the issue, noted that in fact no formal vote on it had been taken (error in the report), and discussed whether such rules belong in the XML language spec or not.

Against inclusion: the rules really apply to the delivery of XML in very specific protocol environments, and should be included in the specification of the protocol. XML will be delivered by many protocols, some of them not yet invented; the language spec should not have to be revised every time a new protocol is deployed or invented.

For inclusion: such conventions are important for encouraging interoperability of XML software. Conforming processors reading the same material in the same environment should make the same decisions about the character encoding.

Decision: The rules for locating binding information about the character encoding of XML entities (reported last week) will be described in an appendix. They will be accompanied by a note making clear that the rules about HTTP service properly belong in the RFCs defining the MIME types text/xml and application/xml, and that when those RFCs are available their text will supersede the recommendations of the appendix.

The wording given in the posting of 31 August will be changed by replacing the phrases 'XML document or entity' and 'XML document' with the phrase 'XML entity'. (It has been argued that the term 'entity' is not currently well defined in the XML spec; if the usage of the term is later revised, this occurrence may be changed.)

In favor: all present.

2. A decision on case-folding was postponed again.

A summary of the issues and a request for discussion by the SIG will be posted shortly.

3. XML processors to normalize CR, LF, and CRLF to LF.

Background: the current draft XML spec says nothing about whether or how XML processors or applications should normalize the common line-break sequences CR, LF, and CRLF.

For normalization: since the three sequences are intended, in practice, to have the same meaning, they can be normalized without loss of useful information. If the XML processor does not normalize these sequences, every single downstream XML application will be forced to do so; experience shows that relying on them to do so will result in broken applications and inconsistent behavior.

Against normalization: right now the spec has no concept of line or line break; there is no need to introduce one, so for the sake of economy (and clarity) none should be introduced.

For normalizing to LF: thanks to C's standard IO model, it's what most program libraries provide, and thus what most programs and most programmers expect.

For normalizing to CRLF: it's more consistent with the specifications governing the Web. Last time anybody looked at the ASCII spec, CRLF was the preferred form of this information.

Against CRLF: specifications? On the Web?

Decision: When an XML processor encounters any of the character sequences CR (UTF-16 x000D), LF (UTF-16 x000A), or CR LF (UTF-16 x000D x000A), the processor must pass a single LF character to the downstream application.

(Note: this formulation of the decision presupposes that the set of information which XML processors may or must make visible to downstream applications will be described more fully than it is in the current draft spec. If the WG decides against such a description, this substantive decision will need to be expressed in some other form. If the processor disappears from the XML language specification, as has been proposed, this decision may be expressed as a constraint on whether the differences among line-break sequences in the input stream are 'visible' or 'significant'.)
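The normalization itself is simple; the only subtlety is order of operations. A minimal Python sketch (the function name is illustrative) that replaces CR LF pairs before lone CRs, so that the CR of a pair does not become an extra LF:

```python
def normalize_line_breaks(text: str) -> str:
    """Pass every CR LF pair, lone CR, or lone LF on as a single LF."""
    # Replace CR LF pairs first, so the CR of a pair is not turned into a
    # separate LF by the second step.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_line_breaks("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
print(repr(normalize_line_breaks("x\r\r\ny")))      # 'x\n\ny' (CR, then CR LF)
```

This works on a complete string. A processor reading input in buffers would also have to hold back a CR that ends one buffer until it knows whether the next buffer begins with LF.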

-C. M. Sperberg-McQueen
University of Illinois at Chicago
tei@uic.edu

10 September 1997


Date: Thu, 11 Sep 1997 21:16:13 -0700
From: Tim Bray <tbray@textuality.com>
Subject: XML WG decisions of Wed. Sep. 10

The XML WG met on Wed. Sep. 10th. Present: Bosak, Kimber, Murata, Clark, Sperberg-McQueen, Wood, Nava, Bos, Maler, Bray, Tigue, Maloney, Paoli, DeRose.

Errors in discussion summaries are, as usual, mine.

1. Discussion of case sensitivity

Few new arguments arose in the discussion of case sensitivity, aside from Steve DeRose's observation that disallowing case folding will, by removing the possibility that attribute values are case-folded, reduce the number of instances where the results of parsing can be affected by the presence/absence of a DTD. (Note that the handling of white space can still be affected in the case where attribute values are known to be tokenized, so the problem hasn't entirely gone away).

This is a summary of points made in a brief last-chance-to-speak-your-mind go-around:

For Case Sensitivity:

For Case Folding:

The Question:

Modify the XML specification to achieve the effect of NAMECASE GENERAL NO in SGML.

Yes: Bosak Kimber Murata Clark Sperberg-McQueen Nava Bos Bray Tigue Maloney Paoli DeRose
No: Wood
Abstain: Maler

So XML is now case-sensitive.
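Under NAMECASE GENERAL NO, a start tag and its end tag must match exactly. The Python sketch below is a hypothetical illustration, not Lark's code; its first error message borrows the wording of the Lark output in Tim Bray's postscript, and it singles out names that differ only in case because those are the errors most likely in documents written with case folding in mind.

```python
from typing import Optional


def check_end_tag(start_gi: str, end_gi: str) -> Optional[str]:
    """Match an end tag to its start tag with no case folding, as under
    NAMECASE GENERAL NO. Returns None on a match, else an error message."""
    if start_gi == end_gi:
        return None
    if start_gi.casefold() == end_gi.casefold():
        # These would have matched under case folding; flag that explicitly.
        return f"Start/End tags differ only in case: {start_gi}/{end_gi}"
    return f"End tag {end_gi} does not match start tag {start_gi}"


print(check_end_tag("item", "item"))  # None
print(check_end_tag("ITEM", "item"))  # Start/End tags differ only in case: ITEM/item
```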

1a: Since XML is case sensitive, we must specify the case of our keywords, i.e. <!ELEMENT or <!element. Names were not recorded; the vote was Upper: 7, Lower: 3, Abstain: 4. (In this vote, some of the abstentions should be taken as don't-cares.)

2. Chris Maden's suggestion that NOTATION system identifiers should be MIME types. The WG liked the idea, but declined to modify the spec to achieve this effect; among other things, URLs and MIME types are not syntactically distinguishable. It was the feeling of the group that it would be desirable for a new URL scheme to be created to allow a URL to locate a MIME type.

3. Discussion of the proposition that the XML spec should say more about what the processor passes to the application. John Tigue has volunteered to write an XML Grove Plan; while there is little sentiment that this should be made normative, it might serve usefully as either a separate application note or an appendix.

The WG agreed that the editors should enrich the language of the spec sufficiently to make it clear (as it does with PIs and comments) what a processor may and must make available to an application.

Cheers, Tim Bray
tbray@textuality.com
http://www.textuality.com/

PS: For your amusement, I attach the output produced by a moments-ago-updated Lark when asked to process the XML spec:

Loading
Testing: Lark V0.92 Copyright (c) 1997 Tim Bray.
 All rights reserved; the right to use these class files for any purpose
 is hereby granted to everyone.
Parsing...
Syntax error at line 127:57: Start/End tags differ only in case: p/P
Syntax error at line 367:23: Start/End tags differ only in case: ITEM/item
Syntax error at line 369:51: Start/End tags differ only in case: ITEM/item
Syntax error at line 370:69: Start/End tags differ only in case: item/ITEM
Syntax error at line 454:4: Start/End tags differ only in case: P/p
Syntax error at line 457:50: Start/End tags differ only in case: p/P
Syntax error at line 750:50: Start/End tags differ only in case: termdef/TERMDEF
Syntax error at line 752:34: Start/End tags differ only in case: lhs/LHS
Syntax error at line 755:71: Start/End tags differ only in case: prod/PROD
Syntax error at line 955:43: Start/End tags differ only in case: P/p
Syntax error at line 956:7: Start/End tags differ only in case: ITEM/item
Syntax error at line 959:19: Start/End tags differ only in case: p/P
Syntax error at line 959:26: Start/End tags differ only in case: item/ITEM
Syntax error at line 991:7: Start/End tags differ only in case: list/LIST
Syntax error at line 1031:22: Start/End tags differ only in case: P/p
Syntax error at line 1039:4: Start/End tags differ only in case: p/P
Syntax error at line 1062:4: Start/End tags differ only in case: P/p
Syntax error at line 1137:31: Start/End tags differ only in case: p/P
Syntax error at line 1140:4: Start/End tags differ only in case: p/P
Syntax error at line 1207:4: Start/End tags differ only in case: P/p
Syntax error at line 1278:4: Start/End tags differ only in case: P/p
Syntax error at line 1289:60: Start/End tags differ only in case: p/P
Syntax error at line 1453:7: Start/End tags differ only in case: DIV2/div2
Syntax error at line 1544:4: Start/End tags differ only in case: P/p
Syntax error at line 1586:4: Start/End tags differ only in case: P/p
Syntax error at line 1652:14: Start/End tags differ only in case: P/p
Syntax error at line 1655:19: Start/End tags differ only in case: p/P
Syntax error at line 1675:4: Start/End tags differ only in case: P/p
Syntax error at line 1706:22: Start/End tags differ only in case: P/p
Syntax error at line 1721:36: Start/End tags differ only in case: p/P
Syntax error at line 1726:45: Start/End tags differ only in case: P/p
Syntax error at line 1935:40: Start/End tags differ only in case: P/p
Syntax error at line 2072:4: Start/End tags differ only in case: P/p
Syntax error at line 2376:8: Start/End tags differ only in case: SCRAP/scrap
Syntax error at line 2377:4: Start/End tags differ only in case: P/p
Syntax error at line 2438:8: Start/End tags differ only in case: SCRAP/scrap
Syntax error at line 2530:7: Start/End tags differ only in case: div3/DIV3
Syntax error at line 2595:8: Start/End tags differ only in case: SCRAP/scrap
Syntax error at line 2665:10: Start/End tags differ only in case: p/P
Syntax error at line 2858:7: Start/End tags differ only in case: DIV2/div2
Syntax error at line 3650:19: Start/End tags differ only in case: p/P
Done.

24 September 1997


Date: Wed, 24 Sep 1997 20:59:33 +0200 (MET DST)
From: "C. M. Sperberg-McQueen" <cmsmcq@hd.uib.no>
Subject: WG meeting 24 September 1997

The XML WG met Wednesday, 24 September 1997 and discussed the following questions, with the results indicated. The summaries of arguments and rationales for the decisions are intended as a correct record of the views expressed in the meeting, but they have not been reviewed by the work group and are subject to correction.

Present: Jon Bosak, Tim Bray, James Clark, Dan Connolly, Dave Hollander, Eliot Kimber, Andrew Layman, Eve Maler, Murray Maloney (in part), Makoto Murata, Joel Nava, C. M. Sperberg-McQueen, John Tigue.

Should XML require support for a subset of SGML-Open catalog files (e.g. as described at http://www.uic.edu/~cmsmcq/tech/xml/pisocat.html)?

Consensus: no.

Rationale: if XML were to use a catalog-based mechanism, it should use catalogs expressed in XML instance syntax. But there is in any case no clear consensus (in the WG, or in the SIG, or in industry) in favor of this or any other specific resolution mechanism.

Should the XML spec be revised to omit all mention of 'XML processors'?

Arguments: (pro) the processor is not completely specified by the XML spec, and mentions of an unspecified, or incompletely specified, object are not useful and violate the design goal of having a formal, concise spec.

(con) The notion of an 'XML processor' is a useful editorial device, not unlike that of the 'C pre-processor', which is described as a processor separate from the main compiler but need not be implemented that way and often is not. Removing references to the processor will only make the spec more cumbersome; it also involves pretending we have no expectations whatever about how XML will be processed, which is simply not true: we do have such expectations and there is no harm and a lot of use in letting the reader of the spec see some of them.

Consensus: section 2.10 (Required Markup Declaration) currently requires more attentive reading than ought to be required; several sentences are actively misleading until the last few words. This section should be changed for XML 1.0 to remove the potentially misleading passages.

For the rest of the spec, however, the consensus was that it was not necessary to remove all mention of the XML processor. The question may be considered anew when the time comes to work on XML 1.1. (Dissents: DC would remove all mentions of the processor now; MSM does not believe the question should come up again for XML 1.1.)

Rationale: section 2.10 is difficult enough that it must be fixed without delay, but no further changes should be required lest they delay version 1.0 unnecessarily.

Should XML restrict the content of processing instructions to characters &#x20;-&#x7F;?

Background (as well as the reporter understands it): when translating a document from one encoding (say, UTF-16) to another (say, ISO 8859-1), a translator may encounter some characters present in the document which are not directly representable in the target encoding (e.g. kanji, which are not present in ISO 8859-1). In data, such characters may be represented by character references, but in processing instructions character references are not recognized by the XML processor. How can we ensure that the necessary information is conveyed to the applications needing it?

Consensus: no restriction should be made. The ability to use any legal character in PIs is too important to restrict it for what is, after all, a problem which only afflicts those not using Unicode. At least two workarounds are possible: (1) those facing such problems can adopt the convention of using character references in PIs and allowing the application to resolve them (i.e. pass the buck to the application; this was not a popular suggestion), and (2) the entire PI can be replaced by a general entity reference, with the original text moved into the declaration of the entity. Since the XML processor does recognize character references within the entity text of an entity declaration, such character references will be handled by the XML processor. (The document may be made ugly, but the results will be reliable.) It may also be regarded as a problem to be solved by another protocol layer, not the XML layer.

Should XML restrict the content of the XML declaration to characters &#x20;-&#x7F;?

Consensus: the grammar already restricts the XML declaration and the encoding declaration in this way; no further note is needed.

Should XML change SGML reserved words as suggested by Paul Prescod and [amended by] Dave Peterson? To wit:

Consensus: No.

Rationale: any SGML processor which can handle XML (in particular, the change to the NET delimiter) is almost certain to be able to handle changes to the reserved words. But any such change would carry a high cost in confusion to users already familiar with the default forms. Such a cost might be accepted, if we could formulate a solution which removed all of the deficiencies felt to be present in the set of default names. But prominent among such deficiencies is the use of the keyword CDATA in several senses perceived by some as distinct. The 8879 renaming rules do not allow CDATA to be replaced differently in different contexts, so no renaming of reserved words can solve all the perceived problems with the default set of values. In this case, the cost of any change is high enough that a partial solution was felt to be worse than no solution at all.

The reserved words in question are, in any case, mostly seen by those writing DTDs (who were felt to be "priests already") or reading DTDs (who are, it was suggested, best classed with the mythical Managers Who Read COBOL Programs and probably not worth designing for).

Should XML drop the notion of external (general) text entities?

Consensus: no, it should retain them.

Rationale: no new arguments have been proposed since this question was addressed in the autumn of 1996. Those arguments were: general text entities do not guarantee reusability of document fragments; everything that can be done with general text entities can be done with links or (in an SGML environment) SUBDOCs; and there is no point having general text entities if you can do the job with links. They were not deemed compelling. To wit: reusability is not the only reason to use general text entities. Neither links nor subdocs provide the same well-understood transparency to validation that general text entities provide. The prospect of eventually being able to validate across links is not a compelling reason to give up a facility that is understood, widely implemented, and relied on today. XML processors are not currently required to implement XLL, and should not be. Finally, even if we drop external general text entities, we will continue to need external parameter entities to do things like embed DTD fragments or standard entity sets; XLL-based links will not be available within the DTD, at least not as they are currently designed and documented.

As always, errors, misleading summaries, and poorly worded explanations are the responsibility of the reporter, not the WG.

-C. M. Sperberg-McQueen

1 October 1997

Namespaces


From: Andrew Layman <andrewl@microsoft.com>
Date: Wed, 1 Oct 1997 22:35:16 -0700
Subject: Namespaces -- Universal Names

At its meeting on Wednesday night, 10/1/97, the XML WG decided on a direction for namespaces (Universal Names) in XML 1.0, consisting of the following syntactic rules:

Rules Which Enable Namespaces

1. Colons are allowed in names. Colons can be used in any name: element names, attribute names, notation names, and IDs. The portion of a name preceding a colon (if any) is known as the "namespace qualifier."

2. There is a reserved Processing Instruction of the form <?XML:namespace href="someURI" as="someshortname"?> which associates the shortname with a URI for the purpose of namespace qualification.

Rules Which Limit Namespace Syntax

3. There is at most one colon per name.

4. Such processing instructions must appear at the beginning of the document, before any elements.

5. If a namespace qualifier is used within any name, it must match the value of the "as" attribute in a namespace PI.

6. No two namespace PIs may have the same value for their "as" attribute.

Notes

a. The rules which limit namespaces are to prevent namespace qualifiers from being used in ways that would conflict with future expansion or refinement of the namespace facility.

b. XML makes no special interpretation of qualified names, whether in DTDs or instances. In particular, namespace qualification has no effect on validation.

c. These are syntactic rules. Applications may use the qualified names and the namespace PI to gain the effects described in earlier mail as "Universal Names" but it is not the responsibility of the parser to universalize names.
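Rules 3 to 6 are mechanical enough to check with a short script. The sketch below is hypothetical and deliberately crude. It uses regular expressions over raw text instead of a real XML parser, it checks only element names (GIs), not attribute names, notation names, or IDs, and, in keeping with note c, it checks syntax only and never resolves qualifiers to URIs.

```python
import re
from typing import List

# Namespace PI in the form given in rule 2 (a direction, not a decision).
NS_PI = re.compile(r'<\?XML:namespace\s+href="([^"]*)"\s+as="([^"]*)"\s*\?>')
# Crude start-tag matcher: '<' followed by a name character, so '<?', '<!'
# and '</' are skipped.
START_TAG = re.compile(r"<([A-Za-z_:][-A-Za-z0-9._:]*)")


def check_namespace_rules(doc: str) -> List[str]:
    """Return messages for violations of rules 3 to 6 (element names only)."""
    problems: List[str] = []
    pis = list(NS_PI.finditer(doc))
    first_tag = START_TAG.search(doc)

    # Rule 4: namespace PIs must precede every element.
    if first_tag is not None:
        for pi in pis:
            if pi.start() > first_tag.start():
                problems.append(f"rule 4: PI for '{pi.group(2)}' follows an element")

    # Rule 6: no two namespace PIs may share an 'as' value.
    declared = set()
    for pi in pis:
        short_name = pi.group(2)
        if short_name in declared:
            problems.append(f"rule 6: duplicate 'as' value '{short_name}'")
        declared.add(short_name)

    for name in START_TAG.findall(doc):
        if name.count(":") > 1:
            # Rule 3: at most one colon per name.
            problems.append(f"rule 3: '{name}' has more than one colon")
        elif ":" in name:
            # Rule 5: the qualifier must match some 'as' value.
            qualifier = name.split(":", 1)[0]
            if qualifier not in declared:
                problems.append(f"rule 5: qualifier '{qualifier}' in '{name}' is undeclared")
    return problems
```

A document such as `<?XML:namespace href="http://example.org/a" as="a"?><a:doc><a:p/></a:doc>` produces no messages, while one that places its namespace PIs inside the root element, repeats an `as` value, or uses an undeclared qualifier is reported rule by rule.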

Review

This is the direction of the XML WG but not a decision. Please review these rules. If you see syntactic difficulties with them, please speak up.

--Andrew Layman
Microsoft Corporation

Parameter Entities


Date: Thu, 02 Oct 97 12:45:26 CDT
From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
Subject: WG decisions 1 October 1997

The XML WG met Wednesday, 1 October 1997, and discussed the issues of parameter entities and namespaces, with the outcomes described below.

Present: Jon Bosak, Tim Bray, James Clark, Dan Connolly, Dave Hollander, Andrew Layman (alternate for Jean Paoli), Eve Maler, Murray Maloney, Makoto Murata, Joel Nava, Michael Sperberg-McQueen, and John Tigue.

1. Parameter entities

The work group agreed, without long discussion, that it is essential to retain parameter entities with something like their current power. Proposals to simplify parameter-entity resolution by restricting the locations where parameter entities can occur (e.g. only where full markup declarations are legal, or within entity text; or at any location EXCEPT within entity text) were discussed briefly but gained no significant support.

Decision: After somewhat more discussion, the group reached consensus on a proposal based on the separate suggestions made by James Clark and Henry Thompson, and agreed to adopt it, if it proves possible to insert a corresponding change in the pending WebSGML Technical Corrigendum to ISO 8879.

  a) The description of the rules for recognizing and resolving parameter-entity references will be changed. The % operator will be dropped; instead, the spec will state that

    A note will point out that this rule ensures that, except within entity text, parameter-entity references force token boundaries at the beginning and end of the replacement text; the behavior of a PE-resolver thus resembles that of a macro preprocessor in languages like C.

  b) The constraints on PE replacement text now enforced by the % operator will in part be relaxed and in part be replaced by normative prose, as in ISO 8879. ISO 8879 imposes three constraints on parameter entities beyond those covered by the resolution rules above. The first two will be retained by XML:

    ISO 8879 imposes a third constraint:

    The work group agreed that this constraint complicates life for parser writers without simplifying it for DTD writers, and should not be retained in XML. In order to allow it to be relaxed for XML, however, it must be relaxed (or relaxable) in Full SGML; this will require a change to the WebSGML TC.

  c) The existing rules on the resolution of references (i.e. PE, general entity, and character references) remain unchanged; Appendix C (on "Expansion of Entity and Character References") therefore cannot be omitted from the spec.
  d) As a consequence of point a) above, the rules forbidding white space on either side of the keyword in a conditional section will be relaxed; the rule for CDATA sections, in contrast, will not change.
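The token-boundary behavior described in the note can be illustrated with a toy expander. This is a sketch under stated assumptions, not the rule the spec will contain: it pads replacement text with a single space on each side outside quoted literals, treats every quoted literal as "entity text", does not rescan replacement text for further references, and uses a hypothetical function name.

```python
import re
from typing import Dict

# Parameter-entity reference: '%' Name ';' (ASCII names only, an assumption).
PE_REF = re.compile(r"%([A-Za-z_][-A-Za-z0-9._]*);")


def expand_pe_refs(dtd: str, entities: Dict[str, str]) -> str:
    """Expand %name; references, padding with spaces outside literals."""
    out = []
    quote = None  # quote character of the literal we are inside, if any
    i = 0
    while i < len(dtd):
        ch = dtd[i]
        match = PE_REF.match(dtd, i) if ch == "%" else None
        if match:
            text = entities[match.group(1)]
            # Outside literals, a space on each side forces token boundaries;
            # inside a literal (entity text) the text is spliced in as-is.
            out.append(text if quote else f" {text} ")
            i = match.end()
            continue
        if quote is None and ch in "\"'":
            quote = ch
        elif ch == quote:
            quote = None
        out.append(ch)
        i += 1
    return "".join(out)
```

With `draft` defined as `INCLUDE`, the input `<![%draft;[` expands to `<![ INCLUDE [`, which shows why the keyword of a conditional section must now tolerate surrounding white space. Inside a literal, `"a%x;b"` simply becomes `"aXb"` when `x` is defined as `X`.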

The work group agreed without dissent that XML would adopt these rules if the relaxation of the third normative-prose constraint (connectors) can be introduced into the WebSGML Technical Corrigendum to ISO 8879.

Rationale: in the SIG, the large majority of those most immediately concerned (those who actually write DTDs) seems to agree that omitting PEs entirely, or restricting their locations radically, fatally cripples XML DTDs, so that XML without PEs is effectively XML without DTDs; this is not acceptable.

If we retain PEs, we need a simple, straightforward way of explaining the constraints on their location and replacement text. The % operator appears, in practice, not to be that simple, straightforward explanation. Many readers, including implementors of parsers, report finding it confusing; one WG member doubted that the spec is clear enough to ensure that different implementations will produce the same results. The consensus proposal appears, in contrast, to be relatively easy to understand. Point a) resembles the standard explanation of comments and similar 'semi-visible' constructs in programming languages (not part of the grammar, but with an effect on tokenization). For implementors working with a conventional lexer/parser organization, it is fairly clear, from the description, how to implement parameter-entity recognition in the lexical scanner, with some simple ad hoc checks on the extra constraints in the parser proper.

The main drawback identified with the consensus proposal is that point b) relies on normative prose rather than on the grammatical formalism; this violates design goal 8 (The design of XML shall be formal and concise), which is glossed in document DD 1996-0001 (available at http://www.textuality.com/sgml-erb/dd-1996-0001.html and possibly somewhere on the W3C server) as meaning (inter alia) 'avoid normative prose'. For implementors using parser-generator tools, this means the constraints of point b) must be checked ad hoc instead of being checked automatically by the parser. On the whole, the WG felt that if we can get the desired addition to the WebSGML TC, the normative prose can be kept very simple, and the gain in clarity would outweigh the loss in formality.

2. Namespaces

The work group tentatively agreed on a direction for the solution of the name-space question; details have already been reported separately by Andrew Layman.

8 October 1997

Report


Date: Wed, 8 Oct 1997 21:17:38 +0100 (MET)
From: "C. M. Sperberg-McQueen" <cmsmcq@hd.uib.no>
Subject: WG decisions, 8 Oct 97

The XML WG met on Wednesday, 8 October 1997 and discussed name spaces, with the results summarized below.

Present: Jon Bosak, Bert Bos (for Dan Connolly), Tim Bray, James Clark, Steve DeRose, Dave Hollander, Eliot Kimber, Andrew Layman (for Jean Paoli), Eve Maler, Joel Nava, Peter Sharpe, Michael Sperberg-McQueen, John Tigue.

The decision taken can (it seems to this reporter) best be understood in the context of the WG's discussion, so this report takes a more discursive form than usual; the reporter apologizes to those who prefer a briefer, drier summary. (To cut to the chase, search for the word 'Decision:' below.)

In informal discussion, the group first established that no firm consensus has yet formed on several issues related to name spaces.

If there is a name-space mechanism based on compound names separated by colons (as described in the report on last week's meeting by Andrew Layman, at http://lists.w3.org/Archives/Member/w3c-xml-sig/1997Oct/0043.html and elaborated since), all of those present accept the colonization of generic identifiers, a bare majority the colonization of attribute names (but several pointed out that Murata-san was not present to argue against them and talk us out of colonizing attribute names). On the use of colons in attribute values (for attributes with enumerated types), IDs and IDREFs, notation names, entity names, and as targets in processing instructions, the group was divided not quite equally among the ayes, the nays, and the not sure. Several votes shifted in the course of the discussion. Arguments in favor of allowing colonization of all names: it's simpler to treat all names alike; we don't know where it will not be needed; cut and paste will be materially easier if IDs and IDREFs (or more generally ALL names) can be (must be) colonized in the process. Against: the arguments for colons in GIs simply don't apply to (say) entity names; since we don't have much user experience in this area we should be conservative and avoid buildup of large volumes of legacy data; and colonizing names constitutes neither a full nor (really) even a partial solution to the cut-and-paste problem.

A second round of discussion revealed roughly equal groups favoring colons in GIs only, colons in GIs and attribute names only, and colons in all names, though in some cases the choice depended both on whether or not the use of colons was to be constrained as outlined in Andrew Layman's report (cited above) or Tim Bray's summary of constraints at http://lists.w3.org/Archives/Member/w3c-xml-sig/1997Oct/0121.html or Henry Thompson's proposal for changes to the grammar at http://lists.w3.org/Archives/Member/w3c-xml-sig/1997Oct/0129.html and also on whether or not the intended application (with or without the word 'semantics') was to be described in the spec, outside the spec, or not at all. Ignoring the conditions, qualifications, and hesitations, a clear majority of those participating favored constraining the occurrence of colon within names (though we did not discuss the particular case of null prefixes, for which the results might differ), with minorities favoring no constraints, or constraints expressed only in an external document. A large majority of participants favored explaining what the colon is intended to be used for; opinions were divided on whether that should happen within or outside the XML spec.

In clarifying his position on this last issue (explanation of colons), James Clark propounded a position which, it developed, was able after discussion and elaboration to attract unanimous approval. If the XML spec defines a processing instruction for declaring name spaces (he argued), then the meaning of that processing instruction must be explained, and its use constrained, in the XML spec. If the XML spec does not define such a PI, then neither explanation nor constraints are needed in the spec.

This raised the possibility of deciding the name space issue by retaining the status quo, in which colons are allowed as name characters, with a note to the effect that colon is intended for use in name space experimentation and that when name space usage is standardized all documents using colons are likely to need updating. Discussion showed that this commanded a slim majority, but only on condition that the name space discipline propounded last week be documented in a separate document (whether to be called a Note or a Technical Report or something else may be subject to W3C rules; for now, the WG rather prefers to call it a Note), so that those who wish for name spaces can follow the recommendations in that document, or follow other recommendations which seem to them superior.

Jon Bosak reminded the group (with assistance from Dave Hollander) that in fact the WG's most recent work plan calls for a direction on name spaces to be documented in a white paper which the RDF group and other groups with similar concerns can use in their work. The work plan foresees inclusion of a namespace mechanism in the XML spec itself only in version 1.1. We have been discussing the issue now because it seemed useful to see whether we could reach enough consensus to get a mechanism into XML 1.0, but our commitment to RDF and others is currently only for a white paper. A Note seems an appropriate way to discharge that commitment.

Decision: it was agreed

Rationale: the external Note will provide the guidance RDF and others seem to desire. By externalizing it, we avoid locking the XML spec itself into a particular name space mechanism which still seems to many WG members unready for standardization. Alternative proposals can be formulated and implemented (as with XLL and XSL, for example, or document-type-constraint schemas in document-instance syntax) without conflict. If usage shows that one proposal (including the WG proposal, but possibly a different one) is superior to the others, that proposal can be built into XML 1.1 without problem. The warning note in the XML spec, together with the processing instructions or other labeling which competing name space proposals will presumably carry, will help ensure that any legacy data not conforming to the mechanism eventually included in 1.1 can be converted without excessive headaches. There is some uncertainty in the WG about whether a three to six month period will be sufficient to gain the required user experience; it was agreed that this is a question best considered after three to six months elapse.

A draft of the external note will be prepared by Tim Bray, Dave Hollander, and Andrew Layman; the WG instructed them to include (at least most of) the constraints discussed in the SIG, but left them to try to find their own consensus on the applicability of the colon mechanism to names other than GIs and attribute names.

Note: as usual, the WG has not approved this discussion, so the description of the rationale for the WG decision is the responsibility of the writer, and subject to correction and augmentation by other WG members.

-C. M. Sperberg-McQueen

Correction


From: Andrew Layman <andrewl@microsoft.com>
Date: Fri, 10 Oct 1997 12:00:13 -0700
Subject: RE: WG decisions, 8 Oct 97

One point of clarification: The WG directed us (did not leave to our discretion) to write the motivation, description and syntax in a way that uses colonized namespaces for both GIs and attribute names, and to include in the section on competing arguments both the arguments in favor of restricting namespace qualification to GIs only and also the arguments in favor of extending namespace qualification to a larger set of names.

--Andrew Layman
AndrewL@microsoft.com

15 October 1997


Date: Wed, 15 Oct 1997 21:05:57 +0100 (MET)
From: "C. M. Sperberg-McQueen" <cmsmcq@hd.uib.no>
Subject: WG decisions of 15 October 1997

The XML WG met on Wednesday, 15 October 1997, and discussed a number of issues, with the results summarized below.

Present: Jon Bosak, Tim Bray, Steve DeRose, Dave Hollander, Eliot Kimber, Andrew Layman (in part; alternate for Jean Paoli), Eve Maler, Makoto Murata, Joel Nava, Michael Sperberg-McQueen, John Tigue.

Should name groups be allowed in element declarations?

Decision: Not for XML 1.0.

Rationale: It would be a convenience, but it is not essential and would complicate the spec. If there is a serious demand for this facility, it can be added later without difficulty. Unanimous.

Should XML require PI targets to be declared notation names? (Status quo: it is recommended but not required).

Decision: No; the status quo will be retained. Unanimous.

Rationale: Requiring notation declarations would effectively mean that well-formed but DTD-less documents could not have any processing instructions at all. It would also mean that simple uses of processing instructions would become significantly more complex.

Should XML incorporate a conditional-inclusion mechanism for the document instance? If so, what should that mechanism be?

Decision: Not in version 1.0. Unanimous. This topic is expected to be revisited later, and in the meantime we welcome proposals.

Rationale: The conditional marked section construct of SGML is not quite what is desired. In particular, it fails to guarantee that an application can have both/all versions of a document (so that one can switch back and forth under stylesheet or user control); it lacks an ELSE construct; Boolean conditions are notoriously difficult to express with it; and it is felt by some to be unattractive syntactically. There is no clear alternative, however, and certainly nothing ready for inclusion in version 1.0. There are a number of interesting and difficult technical problems involved, and we do not wish to delay version 1.0 until they are solved.

Should XML provide a general attribute-remapping facility? If so, what form should it take? Standard architectural forms? A simplified notation that maps to architectural forms? Something else?

Decision: no.

Rationale: there is no consensus in the WG that such a facility is needed in the base XML spec.

Should XML define global attributes analogous to XML-space for the identification of the language and script of the content of elements? Proposed names: XML-lang and XML-script.

Decision: yes, in principle the WG would like to include one or more attributes of this description, if we can formulate a proposal which seems acceptable to most of those interested in internationalization. A subcommittee consisting of Murata and Sperberg-McQueen was appointed to draft a proposal for the WG; they are to consult with James Clark if possible, but to post the proposal to the SIG for general comment within the next several days. The decision was unanimous, but Kimber and Maler felt that such global attributes should ideally go not in the XML spec but in a basic or global document architecture which the W3C or some similar organization should construct on top of XML.

Rationale: XML is intended for the representation of human documents, i.e. of human language. Knowing the language of a document or of a particular passage is central to proper display, to correct information retrieval (particularly if stemming and morphological analysis are performed), and to correct formatting. The Unicode characters in the CJK region are underspecified with regard to proper display: some characters need to be represented by distinct font images in order to provide correct display, depending on whether the text is Chinese, Japanese, or Korean. (N.B. the locale of the display engine is not the relevant factor, but the language of the text.) A global attribute of the type proposed is extremely important for interoperability of XML systems, and potentially a huge gain for users. The gain is worth the slight ugliness of including something which might not strictly belong in a spec of the kind we are writing.

The XML WG will take a keen interest in the work of other bodies in this field, and if general W3C-wide recommendations for information of this type are made, we will take this topic up again in order to harmonize XML with such recommendations. But we should not delay defining such an attribute until such recommendations are ready, if we can define one which seems to do the job.

What does the RMD mean?

Decision: none.

Rationale: We aren't ready to decide this yet.

Discussion revealed several points of disagreement, several possible ways of interpreting the existing RMD, and several pieces of information which it might be desirable to signal to users and software acting for users (whether or not the existing RMD signals that or other information). In particular, various members of the WG distinguished:

There was vigorous discussion of how likely it is that typical document authors will be in a position to know whether given pieces of information are in fact important for processing (by users they do not know, or even by users they do know), or for full understanding of the document, or whether given types of declarations are or are not present in various parts of the DTD.

It was proposed that there was, really, something close to consensus in the WG, but this was disputed. It was however generally agreed that if there is consensus, it is proving hard to put into words, and that the words currently in the spec (including the name of the declaration itself) could benefit from rewording, even if we agree that what they say is what the spec should continue to say.

It was decided to ask the SIG for comment on a set of well focused questions, but no volunteer was found willing to formulate the questions. (OK, I'll take a stab at it: see separate posting. -CMSMcQ)

As usual, the WG has not reviewed the text of this note, and the summaries of our rationales for decisions are subject to correction and augmentation by members of the WG.

-C. M. Sperberg-McQueen

22 October 1997


Date: Wed, 22 Oct 1997 11:40:56 -0700
From: Tim Bray <tbray@textuality.com>
Subject: XML Working Group Meeting of October 22

The XML group met on October 22nd. Present: Bos, Bray, Clark, DeRose, Hollander, Maler, Maloney, Murata, Nava, Paoli, Sperberg-McQueen, Tigue

Michael chaired, as Jon is on a (well-earned) vacation. All decisions today were unanimous. Errors in discussion summaries are all mine, as usual.

1. The XML:LANG attribute proposal

We observed with pleasure that there seemed to be real consensus emerging around a basic minimal XML:LANG attribute and there seems to be a good prospect of getting this into 1.0. Thanks to the SIG for the intelligent input. We'll try to vote this in next week.

2. On the complex of issues surrounding the RMD, well-formedness, and external declarations

We agreed unanimously that:

  1. We will retain the current provision that non-validating parsers need not fetch and parse external entities. We will make it clear that this applies also to the external subset of the DTD, and to any external PEs referenced from the internal subset. We will need to introduce a careful definition of the term "External Declarations" for these cases.
  2. We will change the constraint that entities must be declared from a well-formedness constraint to a validity constraint (because the declarations might be external).
  3. We will extend the provision that non-validating parsers need not expand entities known to be external, to include entities declared externally (e.g. an internal entity defined in the external subset).
  4. The RMD will be retained (but more on this later).
  5. The RMD's assertion about whether markup declarations affect the parse tree will be made into a validity constraint.

This, in effect, legitimizes a class of network-oriented processors that wish to work only on the document entity, and avoid multiple network fetches, in order to achieve their parsing goals.

James pointed out a contradiction: we talk about well-formed documents, which in principle include external entities, but a WF-only checker doesn't have to read them, so how can full WF-checking be done? We will have to adjust the spec either to allow lazy evaluation of well-formedness, or to talk about well-formed entities (which would require special handling of the document entity, which, unlike the others, requires a root element).

3. On the RMD

The RMD will be rewritten to remove all discussion of processors and required behavior. It will simply be an assertion that certain classes of markup will affect the document's parse.

There was discussion of what exactly the RMD should say, and whether it should remain three-valued. I'll outline this for the SIG's input in a subsequent message.

4. New classes of processors

We took up the question of whether the spec should describe a new class of processor that is not only non-validating but DTD-oblivious. While there was no consensus that we should do so at this time in the spec, there is proof by existence of such a class of processors, and we'd like to allow them to claim XML legitimacy if we can do so without compromising XML's design. We should look at this more closely after we've fully absorbed the implications of the 5-part decision outlined in #2 above, which in itself may go a long way in this respect.

Cheers, Tim Bray tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

29 October 1997


Date: Wed, 29 Oct 1997 11:17:53 -0800
From: Tim Bray <tbray@textuality.com>
Subject: XML WG Meeting of Oct. 29th

The XML WG met on Oct. 29th.

Present: Bosak, Bray, Clark, Connolly, DeRose, Kimber, Layman (for Paoli), Maler, Maloney, Mikula (for Tigue), Murata, Nava, Wood (for Sharpe)

All decisions were unanimous. Discussion summaries are mine and are subject to error.

1. Process Issues

The WG will publish an interim XML draft as soon as possible to capture the (many) recent WG discussions.

The WG will submit XML 1.0 to the W3C Director with sufficient lead time that it becomes a W3C Proposed Recommendation on December 8th, 1997.

The WG will vote on and submit the Namespace draft with sufficient lead time that it becomes a W3C WG Note on December 8th.

The WG has an outstanding work item to update Working Draft and PR dates for XML 1.1 and for XLL 1.0. There was some sentiment that XLL 1.0 is of higher priority than XML 1.1, and in particular that it would be desirable for XLL 1.0 to become a W3C Proposed Recommendation at the time of the WWW7 conference in Brisbane in April 1998.

2. The Language Attribute

XML 1.0 will define such an attribute, and its name will be XML:LANG. There was strong sentiment that this attribute, and probably the current XML-SPACE, should eventually go into a related but separate application-conventions document that will be expressed using the namespace/schema mechanism. As a corollary, XML-SPACE probably becomes XML:SPACE - this is on our to-do list.

The spec will state that the semantic of XML:LANG is inherited as with XML-SPACE, but will leave this processing to the application.
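Since the inheritance processing is left to the application, here is one way an application might compute the language in effect for each element. This is a minimal sketch: the `Element` class is hypothetical and not taken from any XML library, and the attribute is spelled `XML:LANG` as in this report.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Element:
    """Minimal hypothetical element node."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)


def effective_languages(
    elem: Element, inherited: Optional[str] = None
) -> List[Tuple[str, Optional[str]]]:
    """List (element name, language in effect) pairs in document order.

    An element's own XML:LANG value wins; otherwise it inherits the value
    in effect on its parent (None if no ancestor specifies one).
    """
    lang = elem.attrs.get("XML:LANG", inherited)
    result = [(elem.name, lang)]
    for child in elem.children:
        result.extend(effective_languages(child, lang))
    return result
```

For a `doc` element marked `en` containing a `p` and a `q` marked `fr`, the `p` inherits `en` and anything inside `q` inherits `fr`.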

For consistency with and by reference to RFC 1766, XML:LANG will be allowed to contain only one language code. At some future point, we may want to change its declared value from NMTOKEN to NMTOKENS to support multiple languages.

For consistency with and by reference to RFC 1766, XML:LANG will use only two-letter language codes; at such time as ISO ratifies the three-letter codes in ISO 639-2 and/or RFC 1766 blesses them, we should probably start allowing their use.

There will be no facility for renaming the XML:LANG attribute.

For consistency with and by reference to RFC 1766, XML:LANG values will be case-insensitive. The XML spec should contain a note pointing this fact out, as it is inconsistent with XML naming practice.

XML:LANG will separate languages and subcodes with hyphens and only with hyphens.
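Taken together, the constraints above on XML:LANG values are simple enough to express as a pattern. The checker below is a hypothetical sketch. It assumes each subcode is one to eight ASCII letters, following the Subtag production of RFC 1766; the report itself does not state the subcode length.

```python
import re

# One code only: a two-letter primary code, then optional subcodes joined
# only by hyphens. Subcode length (1-8 letters) is an assumption.
LANG_VALUE = re.compile(r"[A-Za-z]{2}(?:-[A-Za-z]{1,8})*")


def is_valid_xml_lang(value: str) -> bool:
    """True if value is a single code of the permitted shape."""
    return LANG_VALUE.fullmatch(value) is not None


def same_language(a: str, b: str) -> bool:
    """Compare two values case-insensitively, as RFC 1766 requires."""
    return a.lower() == b.lower()
```

Under these rules `en` and `en-US` are accepted, while `eng` (a three-letter code), `en_US` (underscore separator), and `en fr` (two codes) are rejected; `EN-us` and `en-US` compare as equal.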

At this time, we will not define an additional attribute for SCRIPT processing. There is sentiment that this is perhaps a good idea, but the idea is not sufficiently cooked to have developed consensus support.

Cheers, Tim Bray tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

12 November 1997

Date: Tue, 18 Nov 1997 20:34:51 -0800
From: Jon.Bosak@eng.Sun.COM (Jon Bosak)
Subject: Report from XML WG Meeting of 13 November

[Posted on behalf of C. M. Sperberg-McQueen, who is out of email contact at the moment]

Report from XML WG Meeting of 13 November

C. M. Sperberg-McQueen

18 November 1997

The XML WG met 13 November (sic: should be '12 November' -Ed.) and made the decisions described below.

Present: Jon Bosak, Tim Bray, James Clark, Dan Connolly, Steve DeRose, Dave Hollander, Eliot Kimber, Eve Maler, Andrew Layman (for Jean Paoli), Makoto Murata (in part), Joel Nava, Peter Sharpe, Michael Sperberg-McQueen, John Tigue.

S.14.d How shall XML handle the issues around modifying someone's DTD? Is it necessary to enable designers to prevent this? Do we need a flag to enable a document to signal that it has modified a DTD in some way?

Decision: No action.

Rationale: If the DTD is designed to be modified, no particular notice is needed. If the DTD is not designed to be modified, a modified version of it is effectively a different DTD and may be treated, for XML purposes, as unrelated to the original DTD from which it was derived. (If the relationship is to be specified, this must happen out of band from XML.)

S.31. Should attributes and keywords defined by XML be uppercase, lowercase, camelcase, or some mixture (e.g. XML:lang, XML:space, etc.)?

Decision: keywords defined by SGML are uppercase; keywords defined by XML are lowercase.

Specifically, the keywords in the following productions will be uppercase or lowercase as shown (the list was prepared on the basis of the current editors' working copy, which may differ in some respects both from the last published draft and from the next published snapshot):

Uppercase:

[21]  CDStart ::= '<![CDATA['
[29]  doctypedecl ::= '<!DOCTYPE' ... '>'
[40]  elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
[41]  contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
[46]  Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
              | '(' S? '#PCDATA' S? ')*'
[47]  AttlistDecl ::= '&lt;!ATTLIST' S Name S AttDef+ S? '>'
[50]  StringType ::= 'CDATA'
[51]  TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' |
         'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
[53]  NotationType ::= 'NOTATION' ...
[57]  Default ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
[59]  includeSect ::= '&lt;![' S? 'INCLUDE' S? '[' markupdecls* ']]&gt;'
[60]  ignoreSect ::= '&lt;![' S? 'IGNORE' S? '[' ignoreSectContents* ']]&gt;'
[66]  EntityDecl ::= '&lt;!ENTITY' ... '>'
[69]  ExternalID ::= 'SYSTEM' S SystemLiteral
                   | 'PUBLIC' S PubidLiteral S SystemLiteral
[70]  NDataDecl ::= S 'NDATA' S Name
[71]  NotationDecl ::= '&lt;!NOTATION' S Name S ExternalID S? '>'

Lowercase:

[25]  XMLDecl ::= '<?xml' ... '?>'
[26]  VersionInfo ::= S 'version' Eq ('"1.0"'|"'1.0'")
[33]  RMDecl ::= S 'rmd' Eq "'" ('none' | 'internal' | 'all') "'"
             |   S 'rmd' Eq '"' ('none' | 'internal' | 'all') '"'
[72]  EncodingDecl ::= S 'encoding' Eq QEncoding
[72]  EncodingPI ::= '<?xml' S 'encoding' Eq QEncoding S? '?>'
      xml-space
      xml-lang

Actually, production 33 will be changed to read something like this:

[33]  SADecl ::= S 'standalone' Eq "'" ('true' | 'false') "'"
               | S 'standalone' Eq '"' ('true' | 'false') '"'

Rationale: There is some likelihood that the WG would have preferred to have all keywords in lowercase; this is not feasible because there is a slight flaw in the definition of the SGML declaration. Because all keywords are parsed as NAMEs, SGML parsers are required to casefold the keywords. There is an obvious fix for this, which several members of WG4 have already endorsed, but using lowercase keywords would still be impossible for users of unmodified pre-TC SGML parsers. So defining all keywords as lowercase was agreed to be unacceptable.

Defining all keywords as uppercase would have the advantage of providing a very simple rule. For some WG members, the primary (and successful) objection to this approach was the familiar one that all-uppercase is unappealing and resembles shouting. For others, delaying the onset of carpal tunnel syndrome was a more important goal.

The solution chosen (lowercase what XML defines, leave SGML keywords uppercase) initially seemed to some WG members to be too complex to explain to 12-year-old users. Not all XML users will know offhand whether a keyword in the XML spec comes from ISO 8879 or not. When it was observed that the SGML keywords visible in XML are all declaration keywords, while the XML (and XLL and XSL) keywords are all generic identifiers, attribute names, attribute values, or PI tokens which are carefully designed to resemble these (`pseudo-GIs', `pseudo-attributes', etc.), the initial reservations were removed and this solution was approved by all those present, with the dissent of Dan Connolly.

Note: in an earlier meeting, the WG also resolved issue S.17. This decision has not yet been reported because it involves no change to the status quo; I take this opportunity to report it so the SIG knows that the WG has closed the issue.

S.17. Should the element form <e/> be required, allowed, or forbidden for various types of elements? The main proposals appear to be:

Decision: retain the status quo. Both forms are allowed for any element which has no content (whether elements of its type are allowed to have content or not). There should be an interoperability warning discouraging the use of the <e></e> form for elements declared EMPTY, since that form cannot be used for those elements in pre-TC SGML systems.

Rationale: The WG was persuaded by James Clark's arguments on this topic in the SIG discussion.

If the first choice were taken, a well-formedness constraint would have to be added to insist on consistency in all the occurrences of a given element type, in order to preserve the invariant condition that every well-formed document can be made a valid document by the addition of a suitable prolog. The WG was unwilling to jettison this invariant condition. If a DTD exists, the WF constraint would have to insist on consistency with the DTD; in all cases (even if no DTD is named), all WF-parsers would have to keep track of every element-type used in a document. Such a constraint would in turn violate the general rule that WF constraints are constraints checkable without reference to the DTD. The WG was unwilling to jettison this general rule.

If the third choice were taken, it would be impossible to parse valid XML documents with elements declared EMPTY using pre-TC SGML parsers.

17 November 1997

Date: Tue, 18 Nov 1997 00:13:12 -0800
From: Tim Bray <tbray@textuality.com>
Subject: WG meeting of Nov. 17, 1997

The WG met on Nov. 17th - an extra meeting to get through agenda items before our time runs out.

Present: Bosak, Bray, Clark, DeRose, Hollander, Kimber, Maler, Maloney, Murata, Nava, Paoli, Sharpe. Errors in summaries of discussions are mine.

The Internal Subset

The related questions center around what should be allowed to appear in the internal subset, what processors should be required to do with it, and what the RMD is about.

The feeling is that lightweight non-validating processors, which do not want to fetch external entities, should not be required to parse or act on markup declarations that are chiefly concerned with validation; this includes element declarations and most uses of parameter entities.

Agreed, Kimber and Maler abstaining, that:

  1. non-validating processors need not expand any external entities
  2. non-validating processors need not expand any parameter entities
  3. element declarations may not appear in the internal DTD subset
  4. parameter entity references in the internal subset are limited to `the top level' - i.e. where they can match markupdecl.

Note that this means that the internal subset can be parsed with a relatively simple regular expression.
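As an illustration only, the Python sketch below splits an internal subset into its top-level items under the restrictions just listed (no conditional sections; parameter-entity references only where a markupdecl could occur). The pattern and the name tokenize_internal_subset are hypothetical; only declaration boundaries are found, and nothing here is a conforming XML processor.

```python
import re

# One alternative per kind of item allowed at the top level of the restricted
# internal subset. Quoted literals are matched as units, so a '>' inside an
# entity value does not end the declaration. (Hypothetical sketch only.)
TOKEN = re.compile(r"""
    (?P<comment><!--.*?-->)
  | (?P<pi><\?.*?\?>)
  | (?P<decl><![A-Z]+(?:[^'">]|"[^"]*"|'[^']*')*>)
  | (?P<peref>%[A-Za-z_][\w.-]*;)
  | (?P<space>\s+)
""", re.VERBOSE | re.DOTALL)


def tokenize_internal_subset(subset: str) -> list[tuple[str, str]]:
    """Split an internal subset into (kind, text) pairs, dropping white space.

    Only declaration boundaries are found; the contents of each declaration
    are not checked. Anything else (e.g. a conditional section) is rejected.
    """
    tokens, pos = [], 0
    while pos < len(subset):
        m = TOKEN.match(subset, pos)
        if m is None:
            raise ValueError(f"unexpected text at offset {pos}: {subset[pos:pos + 20]!r}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

A conditional section such as <![INCLUDE[ ... ]]> matches none of the alternatives and is rejected; a real processor would also have to check the syntax inside each declaration.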

What processors must do

Having established that even a non-validating processor must read the internal subset, must it use those declarations? Agreed unanimously that all processors must process and use ATTLIST declarations (providing apps with default attribute values where appropriate) and internal ENTITY declarations, expanding such entity references in the instance.

The issue here is that some have postulated the existence of super-dumb processors that are completely DTD-oblivious. We've really cut down what a non-validating processor has to deal with in the internal DTD subset, but if it can't at least parse and use ATTLIST and internal ENTITY declarations, then it isn't a conforming XML parser.
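A minimal sketch of those two obligations follows, assuming the internal subset has already been parsed into two hypothetical tables, ATT_DEFAULTS and INTERNAL_ENTITIES. It deliberately ignores predefined entities, character references, and references nested inside replacement text.

```python
import re

# Hypothetical results of parsing an internal subset such as
#   <!ATTLIST p lang CDATA "en" status CDATA "draft">
#   <!ENTITY wg "XML Working Group">
ATT_DEFAULTS = {"p": {"lang": "en", "status": "draft"}}
INTERNAL_ENTITIES = {"wg": "XML Working Group"}

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def attributes_for_app(element: str, specified: dict[str, str]) -> dict[str, str]:
    """Merge ATTLIST defaults with specified attributes; specified values win."""
    return {**ATT_DEFAULTS.get(element, {}), **specified}


def expand_internal_entities(text: str) -> str:
    """Replace &name; with the replacement text of an internal entity.

    Simplifications: no predefined entities, no character references, and no
    recursive expansion of references inside replacement text.
    """
    def replace(m: re.Match) -> str:
        name = m.group(1)
        if name not in INTERNAL_ENTITIES:
            raise ValueError(f"reference to undeclared entity {name!r}")
        return INTERNAL_ENTITIES[name]
    return ENTITY_REF.sub(replace, text)
```

For example, a <p status="final"> element is passed to the application with lang="en" supplied from the default, and "Report of the &wg;." becomes "Report of the XML Working Group."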

RMD

RMD is toast. We have renamed it standalone, with two formulations, standalone='true' and standalone='false'. The value false means that there are markup declarations found externally (external subset, or ref to external PE) that will have an effect on what the processor will pass the application. The value true means that any such markup declarations, if they exist, are in the internal subset. This was unanimous.

We took up the question of the default value for standalone and lean strongly to making it true, but wanted a couple of days to mull this over. Note that it has no meaning unless there is a reference to the external subset or another external PE; another possibility would be to have no default, and simply say that a standalone= is required if there are external declarations.

Appendix F - Trivial Grammar

The participants in the call were unanimous, given the simplifications above, in wanting to remove this appendix, feeling it adds little and suggests that we bless the existence of nonconforming parsers. Since this hasn't really been discussed recently, we decided to let it sit for a day or two.

xml-space

We discussed whether this is still of value assuming the presence of stylesheets, and found no consensus to remove it.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

19 November 1997

Date: Thu, 20 Nov 1997 17:39:32 -0600
From: C M Sperberg-McQueen <cmsmcq@TIGGER.CC.UIC.EDU>
Subject: XML WG decisions, 19 November 1997

The XML WG met on Wednesday, 19 November 1997, and discussed a number of issues, with the results outlined below.

Present: Jon Bosak, Tim Bray, Steve DeRose, Eliot Kimber, Andrew Layman (in part; for Jean Paoli), Eve Maler, Murray Maloney, Joel Nava, Peter Sharpe (in part), Michael Sperberg-McQueen.

Absent: James Clark, Dan Connolly, Dave Hollander, Makoto Murata, Jean Paoli (represented in part by Andrew Layman and in part by Jon Bosak as proxy), John Tigue.

Question: is a document which declares external NDATA entities in its internal subset, and names them in an ENTITY attribute on some element (e.g. for an image) allowed to say standalone='yes'? What about a document which declares external text entities?

Answer: standalone='yes' in both cases.

N.B. the values are yes/no, not true/false (typo in the report to the SIG).

(In the discussions which follow, issue numbers are from "Outstanding Issues in XML-lang"; the current version is kept at http://www.uic.edu/~cmsmcq/tech/xml/issues.html and older versions are kept in the same directory, with obvious names.)

S.14.c. Should the section on the standalone declaration (or some other section) have a note describing the normalization process required to allow an arbitrary document to use RMD='NONE' or RMD='INTERNAL'? If so, should it be a fairly detailed description of the process, or just a `motherhood note' intended to suggest to implementors that such normalization processes would be useful?

Decision: yes, at least an apple-pie statement that most documents with standalone='no' can be translated mechanically into equivalent documents with standalone='yes'.

Rationale: it does no harm and may serve to draw the attention of implementors to the possibility and utility of such normalization.

S.14.d. What should the default value of the standalone declaration be?

Decision: when there is no document type declaration, the standalone declaration has no meaning, so it doesn't matter. When there is a document type declaration, the default value is standalone='false'.

Rationale: for most real-life documents, standalone will be false -- if only on account of insignificant white space in element content -- unless the document has been run through a normalizer. It is easy for a normalizer to change the standalone value, or for the information provider to do so. If the default is made standalone='false', XML processors which act on that information will read the external DTD subset and all documents will be read correctly; if the default is made standalone='true', non-validating parsers which act on that information will skip the external subset and produce incorrect results for some documents.
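The decision can be restated as a small function. The function name is hypothetical, and the sketch uses the values 'yes' and 'no' rather than 'true' and 'false', following the N.B. earlier in this report.

```python
from typing import Optional


def reads_external_declarations(has_doctype: bool, has_external_decls: bool,
                                standalone: Optional[str] = None) -> bool:
    """Would a processor that honors the standalone declaration read external markup?

    standalone is the declared value ('yes' or 'no'), or None if absent.
    """
    if not has_doctype or not has_external_decls:
        return False                  # nothing external exists; the value is moot
    if standalone is None:
        standalone = "no"             # the default adopted when a DOCTYPE is present
    if standalone not in ("yes", "no"):
        raise ValueError(f"invalid standalone value: {standalone!r}")
    return standalone == "no"
```

With the default set to 'no', an unlabeled document with an external subset is read in full, which is the safe outcome the rationale describes.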

S.16.a Should the name prefixes XLL and XSL be reserved?

Failed to garner sufficient interest to open this question. (In favor: Bosak, Sperberg-McQueen, Maloney, DeRose. Against: Maler, Nava, Bray, Kimber.)

S.16.b Should XML define the name XML-stylesheet and its meaning?

Decision: this is a request for enhancement; defer to version 1.1. (Dissenting: J. Nava.)

S.19 Should XML forbid empty system identifiers?

Failed to garner sufficient interest to open this question. The empty string is a valid URL; it is unlikely to be useful or correct in the declaration of entities, but it's impossible to require useful and correct URLs by syntactic means.

S.20 Should hexadecimal character references be uppercase, lower-case, mixed-case, or some combination?

Decision: the HCRO (hex character-reference open) delimiter should be '&#x' (i.e. lowercase) in XML. The hex characters may be uppercase, lowercase, or mixed case.

Rationale: the lowercase delimiter is preferable for reasons of aesthetics and legibility. There was some sentiment for lowercase hex characters, but allowing them to be either case makes it easier to cut and paste from Unicode tables in uppercase.
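For illustration, a Python sketch of character-reference expansion under this decision. The pattern and function name are assumptions, and the decimal form &#233; is included only for comparison.

```python
import re

# The delimiter '&#x' must be lowercase; the hex digits may be any case.
CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);")


def expand_char_refs(text: str) -> str:
    """Replace hexadecimal and decimal character references with characters."""
    def replace(m: re.Match) -> str:
        hex_digits, dec_digits = m.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        return chr(int(dec_digits))
    return CHAR_REF.sub(replace, text)
```

So caf&#xE9; and caf&#xe9; both yield "café", while &#XE9; (uppercase delimiter) is not recognized and passes through unchanged; a real processor would report it as an error.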

S.21 Should the XML spec and related documents refer to URLs or to URIs?

Failed to garner sufficient support to open this question, so the status quo (URL) is maintained. In the discussion, the main argument in favor of URIs was their intended greater inclusiveness; the main arguments against were (a) that while most readers can be expected to have a fairly precise understanding of URL, few would know precisely what a URI is, and (b) it is not clear that any binding specification of what counts as a URI could be found, to which the readers or implementors of the XML spec could be directed.

S.23 (Murray-Rust) Should XML formalize in some way the notion of attribute value inheritance used in XML:lang and in various XLL constructs?

Decision: an RFE; to be considered for 1.1. (Some WG members think this should not be reconsidered.)

S.24.a Must XML processors detect the error of using an encoding declaration other than at the beginning of an external entity?

Decision: yes, they should. Using an encoding declaration other than at the beginning of an external entity is a fatal error.

Rationale: as Murata-san has pointed out, this state of affairs might arise either because users believe wrongly that they can change encoding in the middle of an entity or because they have concatenated two originally distinct data streams into a single data stream (e.g. by a Unix cat command). The mid-stream encoding declaration is the only chance a processor has to detect and signal such an error.

If the two encoding declarations actually named the same encoding, a processor could in theory `recover' from the error and issue a warning instead of a fatal error; the WG felt having a single, simple rule was better than allowing such recovery. (If the spec still had the concept of a `reportable error', some members of the WG would have voted to make this one, but no one felt we should reintroduce that notion for this case.)
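The rule is easy to check mechanically. The following sketch (hypothetical function name; it scans raw text and so would also flag a declaration quoted inside a comment or CDATA section) treats any '<?xml' declaration not at the very start of an entity as a fatal error, which is exactly the situation produced by concatenating two entities with cat.

```python
import re

# '<?xml' followed by white space begins an XML or encoding declaration;
# '<?xml-stylesheet' and similar PI targets do not match.
XML_DECL = re.compile(r"<\?xml\s")


def check_decl_position(entity_text: str) -> None:
    """Treat any XML/encoding declaration after offset 0 as a fatal error.

    Simplification: declarations quoted inside comments or CDATA sections
    would also be flagged by this sketch.
    """
    for m in XML_DECL.finditer(entity_text):
        if m.start() != 0:
            raise ValueError(f"fatal error: declaration at offset {m.start()}")
```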

S.24.b (Nava) Should XML allow an encoding declaration at the beginning of internal text entities?

Decision: No. Logic is the same as for S.24.a.

S.24.c (Murata) May encoding declarations be specified via text entities? That is, after a declaration of the form
   <!ENTITY decl "<?XML ENCODING='EUC-JP'?>">
may an external general text entity begin with the following reference?
   &decl;

Decision: No, it may not.

Rationale: allowing ampersand followed by a name as the first characters of an entity which needs an encoding declaration would complicate the algorithm for detecting enough of the encoding to read the declaration.
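To see the difficulty, consider a simplified sketch of the byte-pattern sniffing a processor does before it can read the declaration, modeled loosely on the non-normative autodetection appendix later included in the XML 1.0 Recommendation. The function name, the return labels, and the fallback to UTF-8 are simplifications of the sketch.

```python
def guess_encoding_family(head: bytes) -> str:
    """Guess enough about the encoding to read an encoding declaration.

    Only a few byte patterns are checked; anything unrecognized falls back
    to UTF-8 in this sketch.
    """
    if head.startswith(b"\xfe\xff") or head.startswith(b"\x00<\x00?"):
        return "UTF-16BE"
    if head.startswith(b"\xff\xfe") or head.startswith(b"<\x00?\x00"):
        return "UTF-16LE"
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    if head.startswith(b"<?xml"):
        return "ASCII-compatible: read the declaration to learn the encoding"
    # An entity that began with a reference such as '&decl;' would land here,
    # and the processor would have to resolve the reference before it could
    # know how to decode the rest of the entity.
    return "UTF-8 (assumed)"
```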

S.27 Should XML prescribe that if no ATTLIST declaration is present, attributes named ID should be treated as if declared ID ID #IMPLIED?

Decision: an RFE, to be considered for XML 1.1. (Dissenting: Murray Maloney.)

Rationale: too complex to introduce without more lead time. Also, some WG members felt that if adopted, this rule should apply only when no document type declaration at all is provided.

S.29 Should XML revise its methods of escaping text for the inclusion of scripting language? (In particular, using a keyword other than CDATA.)

Insufficient support for reopening this question.

Rationale: no good term suggests itself as a replacement for CDATA: any candidate must apply equally to all expected uses of CDATA sections, and to CDATA attributes. Some WG members felt that anyone engaged in embedding scripting language material into an XML document was necessarily at a technical level adequate to grasp the notion of a magic formula which makes no intuitive sense and must be repeated with great accuracy; they questioned whether changing from a wholly opaque term to a partially opaque one was a sufficient improvement.

Errors in the summaries are mine.

-C. M. Sperberg-McQueen

24 November 1997

Main Report

Date: Wed, 26 Nov 1997 16:25:51 -0600
From: C M Sperberg-McQueen <cmsmcq@TIGGER.CC.UIC.EDU>
Subject: XML WG meeting of 24 November 1997

The XML WG met on 24/25 November (i.e. it was 24 November for some participants and 25 November for others) and discussed several issues, with the results outlined below. As usual, issue numbers (where used) refer to the relevant paragraphs of "Outstanding Issues in XML-lang" at www.uic.edu/~cmsmcq/tech/xml/issues.html. Several issues were taken up from email, and their numbers assigned retrospectively.

Present: Jon Bosak, James Clark, Steve DeRose, Eliot Kimber (in part), Eve Maler, Murray Maloney, Makoto Murata, Joel Nava, Jean Paoli, Peter Sharpe, C. M. Sperberg-McQueen, and John Tigue.

Namespace note. The WG agreed to accept the change proposed by Andrew Layman, to allow both a name and a URL for the namespace being identified in a Namespace PI. This was unanimous, but Sperberg-McQueen wished the record to show that he continued to believe the Namespace PI should provide slots for the URL of both the machine-readable schema and the human-readable documentation, and disagreed with Layman's suggestion that Layman's proposal was responsive to Sperberg-McQueen's concern.

S.29 Scripting languages and CDATA sections.

As reported earlier, the WG felt on 19 November that there was no useful change to be made to the specification in this area. It was felt that the recent discussion of CDATA sections in the SIG had raised no new issues and provided no new information. The WG accordingly reconfirmed its decision of 19 November to make no change in the relevant portions of the spec, but agreed to treat the issue as an RFE (request for enhancement). Paoli registered his violent objection to retaining CDATA sections in the spec.

Should XML remove the character classes Ignorable and Extender from the class of Name characters?

The WG discussed this issue, as raised by Murata during the last week and in several previous messages dating back several months, deciding ultimately to delay final decision until Wednesday, so that WG members not fully conversant with the salient issues could review the discussion and gather their thoughts.

S.26 Should the productions and texts for EntityValue be revised to change or clarify the processing of percent signs (in particular percent signs which do not mark parameter-entity references)? Should XML forbid literal percent signs to be used, and add percent to the list of predefined entities?

Decision: the productions for general entities and parameter entities definitely need to be split, to ensure that parameter entities are not declared with the keyword NDATA. There was some uneasiness with the current spec's treatment of percent-sign and ampersand within entity values (the syntax treats them identically, but the parser must treat general- and parameter-entity references differently), but no consensus on an alternative treatment. It was agreed that if the editors or other WG members can think of a better way to describe the treatment of entity values, the current description should be replaced. Unanimous.

S.28 Should XML drop the attribute XML-space?

Withdrawn by the proposer.

S.30 (Clark, Nava) Should XML define (for use of DTD-oblivious processors) an XML:atts attribute? It would be declared as NMTOKENS and contain a series of attribute name / attribute type pairs:

  <e id=foo ref1=bar ref2=bars XML-ATTS="id ID ref1 IDREF ref2 IDREF"/>

Decision: an RFE, to be reconsidered for version 1.1. (Dissenting: Sperberg-McQueen, on the grounds that the proposal should be rejected, not postponed.)

Rationale: Sorry, this reporter is unable to provide a rationale for this decision, which fills him with amazed incomprehension.
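The proposal was deferred, not adopted, so the following is only a hypothetical sketch of how a DTD-oblivious processor might have used such an attribute. The attribute name XML-ATTS is taken from the example above; the function names are invented for illustration.

```python
def parse_xml_atts(value: str) -> dict[str, str]:
    """Turn 'id ID ref1 IDREF' into {'id': 'ID', 'ref1': 'IDREF'}."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("XML-ATTS must hold attribute name / type pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


def id_values(attrs: dict[str, str]) -> list[str]:
    """Values of the attributes that XML-ATTS says are of type ID."""
    types = parse_xml_atts(attrs.get("XML-ATTS", ""))
    return [attrs[name] for name, att_type in types.items()
            if att_type == "ID" and name in attrs]
```

For the example element above, parse_xml_atts yields {'id': 'ID', 'ref1': 'IDREF', 'ref2': 'IDREF'}, and id_values returns ['foo'].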

S.32 (Connolly, Clark) Should the concept of well-formedness be revised to apply only to entities, rather than to documents?

Decision: yes, as far as possible.

S.33 Should the global attributes defined by XML have names with hyphens (xml-space, xml-lang) or colons (xml:space, xml:lang)?

Decision (Sperberg-McQueen dissenting): Colons.

Rationale for the decision: this is a natural application of our name-space rules. Rationale for the dissent: XML currently has no name-space rules; until it does, we should not pretend we do.

S.34 Should AttlistDecls be allowed to have zero Attdefs?

Decision: Yes.

Rationale: this is legal in the WebSGML annex, and very useful.

S.35 Should EncodingPI be allowed to include XML version information as well as encoding information?

Decision: Yes.

Rationale: this allows entities to function both as external entities and as document entities, for documents without a DTD. Without this decision such behavior would have been restricted to entities which do not need an encoding declaration (i.e. entities in UTF-8 or UTF-16).

S.36 Should NOTATION declarations be allowed to have a PUBLIC identifier without a following system identifier?

Decision (Sperberg-McQueen dissenting): Yes.

Rationale: there is no suitable interpretation of a URL in the case of a notation. A pointer to the documentation is not particularly useful to a software system; a pointer to a program to handle data in that notation is not guaranteed to be system-independent. (Rationale for the dissent: what's good for the goose is good for the gander. System identifiers should be optional whenever a public identifier is provided.)

S.37 Should XML require ID attributes to have a default value of either #IMPLIED or #REQUIRED to agree with ISO 8879 11.3.4?

Decision: Yes.

Rationale: ISO 8879 requires this, and there is no reason at all to diverge from SGML in this area. In many cases, any other declaration would lead to invalid documents, in which multiple elements have the same (default) ID, but there are other cases.
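The check is simple to state in code. The sketch assumes ATTLIST declarations have already been parsed into hypothetical (element, attribute, type, default) tuples; a literal default such as "s1" is flagged because every element relying on it would carry the same ID.

```python
# Hypothetical pre-parsed attribute definitions: (element, attribute, type, default).
AttDef = tuple[str, str, str, str]

ALLOWED_ID_DEFAULTS = {"#IMPLIED", "#REQUIRED"}


def id_default_errors(attdefs: list[AttDef]) -> list[str]:
    """Report ID attributes whose default is neither #IMPLIED nor #REQUIRED."""
    return [f"{element}/{name}: ID attribute may not default to {default}"
            for element, name, att_type, default in attdefs
            if att_type == "ID" and default not in ALLOWED_ID_DEFAULTS]
```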

S.38 Should the grammar for version information be changed to allow XML processors to accept version="1.1" without a fatal error, in the interests of a smoother transition to later versions of XML?

Decision: Yes, a form of simple name token should be defined for use there. Accompanying prose should say that the intent is that version 1.0 will mean this version of this spec and other values will be used for later revisions. Processors may signal an error if they receive documents labeled with versions they do not support. (But they need not do so: a document may be labeled HTML 4.0 and be, at the same time, legal HTML 1.0, and similarly for XML.) It is an error for a document to use the value 1.0 if it does not conform to this version of this spec.

Rationale: otherwise, the transition from version 1.0 of XML to later versions will be fraught with problems. There is anecdotal evidence that some other specifications which made this mistake have continued to require that data conforming to versions 2 and later of the spec be labeled version=1.0, precisely because the installed base of parsers had hard-coded the value.
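A sketch of the intended processor behavior, assuming a name-token pattern for the version value. The pattern is modeled on the VersionNum production of the published XML 1.0 Recommendation; the set of supported versions and the function name are hypothetical.

```python
import re

# Name-token-like pattern for the version value (modeled on XML 1.0 VersionNum).
VERSION_NUM = re.compile(r"[A-Za-z0-9_.:-]+")
SUPPORTED_VERSIONS = {"1.0"}


def classify_version(value: str) -> str:
    """Return 'supported' or 'unsupported'; only a malformed value is an error."""
    if not VERSION_NUM.fullmatch(value):
        raise ValueError(f"malformed version value: {value!r}")
    return "supported" if value in SUPPORTED_VERSIONS else "unsupported"
```

Under the decision, an 'unsupported' result permits, but does not require, the processor to signal an error; only a malformed value is rejected outright.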

As always, summaries of the WG's rationale for its decisions are the responsibility of the reporter.

-C. M. Sperberg-McQueen
University of Illinois at Chicago
ACH/ACL/ALLC Text Encoding Initiative
cmsmcq@uic.edu, tei@uic.edu
+1 (312) 413-0317, fax +1 (312) 996-6834

Supplement

From: "Joel Nava" <jnava@Adobe.COM>
Date: Wed, 26 Nov 1997 16:20:56 -0800
Subject: Re: XML WG meeting of 24 November 1997

On Nov 26, 4:25pm, C M Sperberg-McQueen wrote:

S.30 (Clark, Nava) Should XML define (for use of DTD-oblivious processors) an XML:atts attribute? It would be declared as NMTOKENS and contain a series of attribute name / attribute type pairs:
  <e id=foo ref1=bar ref2=bars XML-ATTS="id ID ref1 IDREF ref2 IDREF"/>

Decision: an RFE, to be reconsidered for version 1.1. (Dissenting: Sperberg-McQueen, on the grounds that the proposal should be rejected, not postponed.)

Rationale: Sorry, this reporter is unable to provide a rationale for this decision, which fills him with amazed incomprehension.

As always, summaries of the WG's rationale for its decisions are the responsibility of the reporter.

Well then, let me provide a rationale to be added to the archive.

This proposal is intended to provide for the needs of DTD-oblivious parsers. One thing that a DTD-oblivious parser may still find very useful is attribute typing. Keeping this proposal as an RFE leaves a placeholder for this issue and for 2 other related proposals to solve the same problem. One involves a new xml PI, and the other involves allowing attlist declarations in the prolog when there is no DTD. I think some of us in the WG would like to consider this at a later time, thus the rationale for making this an RFE.

Joel

26 November 1997

Date: Mon, 1 Dec 1997 14:40:27 -0600
From: C M Sperberg-McQueen <cmsmcq@tigger.cc.uic.edu>
Subject: WG meeting of 26 November 1997

The XML WG met Wednesday, 26 November 1997, and reached the decisions summarized below.

Present: Jon Bosak (JB), Tim Bray (TB), James Clark (JC), Dan Connolly (DC), Steve DeRose (SD), Dave Hollander (DH), Eve Maler (in part -- EM), Murray Maloney (MMal), Makoto Murata (MMur), Joel Nava (JN), Jean Paoli (JP), Peter Sharpe (in part -- PS), Michael Sperberg-McQueen (MSM), John Tigue (in part -- JT).

The WG began by revisiting issues S.39, S.21, and S.22, and then continued with other items not yet addressed.

S.39 Should the syntax of XML names:

Decision: The syntax of XML names will resemble that given in Unicode section 5.14, and in the current draft of the XML spec, with the following differences:

Note in particular that the class of 'Extender' characters is retained.

In favor: JB, TB, SD, DH, MMal, MMur, JP, PS, MSM
Opposed: JC, JN, JT
Abstaining: DC

Rationale: Several WG members objected strongly to the status quo on the grounds that it included too many characters of extremely dubious utility. Since the XML spec does not require processors to be Unicode-conformant (in particular, it does not require that they treat such 'equivalent' forms as precomposed and decomposed e with acute accent as identical), a close adherence to the Unicode definition of identifiers is unnecessary. Since the Unicode rule may plausibly be regarded as not fully cooked, a close adherence to it would be unwise.
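The distinction can be seen in a few lines of Python, using the standard unicodedata module purely as a source of Unicode data: the two spellings of "café" differ code point by code point and compare equal only after normalization, which XML processors were not required to perform.

```python
import unicodedata

precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Compared code point by code point, the two spellings are different names.
print(precomposed == decomposed)  # False

# They become identical only after Unicode normalization (NFC here).
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```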

The ISO technical report appears very promising, and had as many strong supporters as any of the choices discussed, but its publication date is too far in the future to allow us to rely on it; several WG members objected strongly to it on this ground.

It was suggested that XML follow the rules of Java identifiers; the Java rule consists of allowing everything within a relatively small number of character ranges. This had almost no opponents, though no strong supporters, until discussion made clear that we would be forced either to identify Java as the source of the rule (in which case there were several strong opponents) or not identify Java as the source of the rule (in which case there was a different set of strong opponents).

The alternatives of 'radical inclusion' and 'radical exclusion' were proposed, with the senses "include everything not defined by XML as a delimiter", and "exclude anything anyone could object to", respectively. Radical inclusion was agreed to be simple to define, but half of those present were strongly opposed to it. Radical exclusion was deemed impractical in its pure form, since creating a full list of what anyone might object to would take a lot of time, and effectively reduce to choice (e) in the list above. The WG was virtually unanimous in its belief that the XML WG and SIG are not suitable bodies for a character-by-character consideration of the entire character inventory of Unicode.

A more limited form of exclusion was rapidly defined and agreed to, namely the omission of 'compatibility decompositions' and 'Ignorable' characters. This weeds out the majority of items found objectionable, but is easily defined in terms of the Unicode character properties.
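A rough sketch of how such an exclusion can be defined from Unicode character properties, using Python's unicodedata module. The set of general categories treated as name characters is a hypothetical stand-in, not the list the WG adopted, and the sketch does not attempt the Ignorable exclusion.

```python
import unicodedata

# Hypothetical approximation of "letters and digits"; NOT the class adopted by the WG.
NAME_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Nd", "Nl"}


def has_compatibility_decomposition(ch: str) -> bool:
    """Compatibility decompositions carry a tag, e.g. '<compat> 0066 0069' for U+FB01."""
    return unicodedata.decomposition(ch).startswith("<")


def is_name_char(ch: str) -> bool:
    """Accept letters/digits by category, then drop compatibility characters."""
    return (unicodedata.category(ch) in NAME_CATEGORIES
            and not has_compatibility_decomposition(ch))
```

Under this sketch, "é" (which has only a canonical decomposition) is accepted, while the ligature "ﬁ" and the fullwidth "Ａ" are excluded.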

S.21 Should the XML spec and related documents refer to URLs or to URIs?

This was first addressed on 17 November 1997, but was reconsidered in view of the subsequent discussion and in view of DC's report that the W3C is trying to unify its specs on the term URI.

Decision: URI.

In favor: JB, JC, DC, SD, DH, MMur, JN, JP, PS, MSM, JT
Opposed: TB, MMal

Rationale: the work on URNs is not as moribund as has been alleged, there is in fact a spec for URIs, and soon (reports say) the spec we refer to for URLs will itself be obsolete. These observations led some WG members to feel that the term 'URI' is not as problematic as had been felt in the earlier decision. DC's report that the W3C as an institution is trying to unify on the term 'URI', coupled with a sense that URIs should in fact be embraced rather than cold-shouldered, led to the conclusion that the term URI is not only acceptable but preferable. In dissent, TB argued that URIs are not in fact currently deployed, that most readers will be familiar with the term URL but not with the term URI, and that until the new specs are actually adopted, any speculation that the term URI is about to become the standard term remains just speculation. Some members of the WG asserted that some off-the-shelf browsers do perform URN lookup, or can be made to do so (I am not sure I followed all the details); as regards the terminology, the majority felt that a definition of URI and some reader education should do the trick.

S.22 Should XML make any syntactic distinction between the internal and external subsets of the DTD? (At various times, it has been proposed and sometimes accepted that XML should forbid some parameter-entity references, all parameter-entity references, and conditional sections in the internal subset.) Should any distinctions apply to external parameter entities referred to from the internal subset?

This was discussed 17 November 1997, at which time the WG voted to forbid element declarations, conditional sections, and parameter-entity references within declarations (other than the doctype declaration); it was reconsidered, however, in light of the subsequent discussion in the SIG, notably Henry Thompson's compromise suggestion to make element declarations legal without requiring conforming processors to parse them fully, and Charles Goldfarb's proposal to remove the external subset entirely from XML.

Decisions:

S.22.a Element declarations will be restored to the internal subset. Like every other part of the grammar, they should be understood and parsed by all conforming processors (i.e. Henry Thompson's compromise proposal was rejected).

In favor: JB, TB, SD, EM, MMal, JN, JP, MSM, JT
Opposed: DC
Abstaining: JC, DH, MMur, PS

Rationale: several WG members reported that their vote on 17 November had been based on a misapprehension. Given a document known to be valid (or even merely well-formed), it is possible for a simple process to skip past the internal DTD subset using a regular expression, even if element declarations are allowed within the internal subset. That is, forbidding element declarations in the internal subset makes no difference to the ability of a 'desperate Perl hacker' to perform an ad hoc transformation of an XML document instance using regular expressions. It makes a difference only to the creators of non-validating conforming XML processors. Other WG members reported that in the light of subsequent discussion they no longer entertained the hope that eliminating element declarations would make a serious difference to this class of developer, or to developers of RDF processors, SMIL processors, etc.

Excluding element declarations from the internal subset thus seemed to the WG to have none of the advantages originally hoped for. All of its known disadvantages, however, remained every bit as large as had been thought. Under the circumstances, the compromise proposal put forward by Henry Thompson (allow element declarations but not require non-validating processors to check them) was felt to involve an unnecessary complication of the relationships among the concepts of well-formedness, conformance, Draconian error handling, and the EBNF grammar in the spec. Some members of the WG felt they could live either with or without element declarations in the internal subset, but the one thing they were seriously opposed to was the Thompson compromise.
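The point about regular expressions in the rationale can be illustrated with a hypothetical Python sketch that skips an entire document type declaration, element declarations included, to reach the document instance. It assumes a well-formed document, handles quoted literals and comments but not processing instructions containing ']' or quotation marks, and is not a conforming processor.

```python
import re

# A document type declaration, with an optional internal subset. Quoted literals
# and comments are consumed whole, so a ']' or '>' inside them cannot end the
# match early. Element declarations need no special treatment at all.
DOCTYPE = re.compile(r"""
    <!DOCTYPE
    (?:[^'"\[>]|"[^"]*"|'[^']*')*                          # name and external identifier
    (?:\[ (?:<!--.*?-->|"[^"]*"|'[^']*'|[^\]'"])* \])?     # internal subset
    \s*>
""", re.VERBOSE | re.DOTALL)


def instance_after_doctype(doc: str) -> str:
    """Return the text after the DOCTYPE declaration, or doc unchanged if none."""
    m = DOCTYPE.search(doc)
    return doc[m.end():] if m else doc
```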

S.22.b Should XML eliminate all distinctions between the internal subset and external parameter entities and the external subset (i.e. allow conditional sections and unrestricted parameter-entity references in the internal subset)?

Decision: no.

In favor: JN, JP, MSM
Opposed: JB, TB, JC, DC, DH, EM, MMal, PS, JT
Abstaining: SD, MMur

Rationale: Unlike element declarations, conditional sections do complicate the task both of the non-validating processor writer and of the desperate Perl hacker. Parameter-entity references similarly complicate life, at least for the processor writer.

S.22.c Should XML eliminate the external subset itself (as proposed by Charles Goldfarb)?

Failed to generate enough consensus to (re-)open this question.

S.40 Should Entity Declared be a VC or a WFC?

Decision: In a standalone document (one without a DTD, one with only an internal subset and no references to external parameter entities, or one with "standalone='yes'"), this constraint should be treated as a WFC: i.e. it must be checked by all conforming processors. In a document with a DTD and "standalone='no'", it should be treated as a VC.

Unanimous (MMal and EM abstaining).

Rationale: it cannot be a WFC in every case without serious injury to the notion of Draconian error handling. As the current draft (97-11-17) makes explicit, a non-validating processor cannot be expected to know whether an entity declaration for an entity being referred to does or does not occur in some external parameter entity or external DTD subset. But if the constraint were a well-formedness constraint, even a non-validating processor would be required to catch the error. So for "standalone='no'", it should be a VC -- a constraint enforceable only if one reads the entire DTD.

For documents without a DTD, however, or with "standalone='yes'", a non-validating processor can in fact be expected to know what entities have been declared (if there is no DTD, none have been; if standalone='yes', only those declared in the internal subset) and to tell whether an entity referred to in the document is one of them. Failure to require all processors to detect this as a fatal error would lead to all sorts of ad hoc bad practice. Some WG members speculated that failing to make this a WFC would mean the entity 'today' would immediately be declared to mean 'the date given by the system clock' and so on -- an entire API could be defined as a set of magic entity names, with parts of the name separated by dots or hyphens being treated as parameters ... A brief consideration of this possibility quickly led all doubters to agree that undeclared entities should be a WFC in all cases where a non-validating processor can be expected to detect them.
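
The decision can be restated as a small decision procedure. The sketch below is hypothetical (the function and parameter names are not from the draft), and it assumes that a document whose DTD is only an internal subset with no external parameter-entity references falls under the WFC case whatever its standalone declaration says; the decision text does not settle that combination explicitly.

```python
def entity_declared_constraint(has_external_subset: bool,
                               refs_external_pes: bool,
                               standalone_yes: bool) -> str:
    """Return "WFC" or "VC" for the Entity Declared constraint (hypothetical helper)."""
    if standalone_yes:
        # Only internal-subset declarations count, and every processor reads those.
        return "WFC"
    if not has_external_subset and not refs_external_pes:
        # No DTD at all, or an internal subset that pulls in nothing external.
        return "WFC"
    # Declarations may exist that a non-validating processor never reads,
    # so only a validating processor can enforce the rule.
    return "VC"
```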

All summaries are the responsibility of the reporter and are subject to correction and elaboration.

-C. M. Sperberg-McQueen
University of Illinois at Chicago
ACH/ACL/ALLC Text Encoding Initiative
cmsmcq@uic.edu, tei@uic.edu
+1 (312) 413-0317, fax +1 (312) 996-6834

1 December 1997

Date: Thu, 4 Dec 1997 18:54:21 -0600
From: C M Sperberg-McQueen <cmsmcq@tigger.cc.uic.edu>
Subject: XML WG decisions of 1 December 1997

The XML WG met on the 1st of December, 1997, and discussed several issues, with the results shown below.

Present: J. Bosak, T. Bray, J. Clark, S. DeRose, D. Hollander, E. Kimber, E. Maler, M. Maloney, M. Murata, J. Nava, J. Paoli, P. Sharpe, C. M. Sperberg-McQueen, J. Tigue.

The issues and their resolutions are shown in the numerical order of their identifying numbers, not in the chronological order of discussion.

S.41 Should the list of declarations which affect the standalone document declaration be changed to include attributes of type ID, or be changed in other ways?

Decision: No change. Unanimous.

Rationale: this was originally proposed on the grounds that DSSSL processes (for example) might depend on knowing whether a given attribute was or was not declared as an ID. When the WG discussed it, however, the DSSSL experts among us suggested it was not such a good idea after all.

M.1 Should the draft be re-organized as suggested by Dan Connolly (at www.w3.org/XML/Group/9708/xml-lang-dc.html and in mail to the work group at various times)?

Decision: most of DC's suggestions should be treated as requests for enhancement (RFEs) and taken up in work on any later version of the spec. Some specific items were considered, as follows.

M.1.a Move origin and goals section to an appendix.

No support for this.

M.1.b Move section 1.2 (Relationship to Existing Standards) to the back, rename it "Normative References", and move remarks about the SGML TC to the "Status of this Document" note at the beginning.

Agreed unanimously.

M.1.c Drop section 1.3 ("Terminology") entirely; insert definition of each of these terms at first use.

An RFE.

M.1.d Move section 1.4 ("Notation") to the end of the document.

Agreed.

M.1.e Drop section 1.5 ("Common Syntactic Constructs") entirely; include these productions in the main body of the text, at the point where each is first referred to.

Accepted as a desirable change for version 1.1; not for 1.0, as moving so much so fast is too error-prone.

M.1.f Drop section 2.8 ("White Space Handling") entirely.

Already decided as item S.28.

M.2 Passim: Should the non-terminal S be dropped from the grammar and replaced with a single rule for lexical scanning within markup and DTD? (CMSMcQ, 1 July)

An RFE.

M.3 Passim: Should the selection of non-terminals be systematically revised?
M.3.a In particular, should we make all subexpressions of the form (Char* - (Char* 'some-literal' Char*)) into independent non-terminals? This would result in productions for CommentData, PIData, and CDSectData. Cf. EntityValue, AttValue, etc. "The breakdown of things into named syntax components is very haphazard and needs careful re-analysis. I think the naming of more things would lead to better modularity (by exposing things that were irregularly defined) and to better interaction among implementors (by regularizing naming)" (Kent Pitman, 1 July 97)
M.3.b In particular, should definitions of character references and of Name and LatinName be regularized? (Kent Pitman, 1 July)

An RFE.

M.4 Sec. 1.2 Should the reference to ISO 8879 be labeled a "normative reference"? (Sperberg-McQueen) (Bullard)

Decided (Sperberg-McQueen dissenting) not to make the reference a normative reference.

M.5 Should sec. 4.3.3 allow external entities to inherit encoding info from the referring entity? (Murata)

Withdrawn.

M.6.a Sec. 1.5 Should the XML spec define URLchar, since RFC 1738 does not? (Thompson)
M.6.b Should URLchar be renamed xchar since that is what 1738 calls it?
M.6.c Should URLchar be rewritten (uchar | reserved) since that is what 1808 uses (Thompson)?
M.6.d Should a note be added showing how to include all Unicode characters in URLs? (Duerst)

Decided unanimously:

  1. No attempt will be made to reproduce the character restrictions on URLs or URIs; processors will not be required to enforce such restrictions.
  2. The prose (not the grammar) will say that the system identifier is a URI.
  3. The prose will also point out that strictly speaking the hashmark and fragment identifier familiar to HTML users is not part of the URI.
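
Point 3 can be seen in the way generic URI parsers treat a system identifier. In this sketch the system identifier is hypothetical, and Python's urllib.parse.urldefrag stands in for whatever resolver an XML processor might actually use; it separates the fragment identifier from the URI proper.

```python
from urllib.parse import urldefrag

# Hypothetical system identifier carrying an HTML-style fragment identifier.
system_id = "http://www.example.com/dtds/poem.dtd#top"

# urldefrag splits off everything after '#'; the part before it is the URI proper.
uri, fragment = urldefrag(system_id)
```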

M.7 Should there be a digression (in Clause 4.3.2) or appendix on escaping characters in system identifiers? "There is frequent confusion in HTML over when to URL-escape a character and when to SGML-escape it. For instance, to query the engine http://foo.com/cgi-bin/search for the poem with id jack&jill and the company at&t, one must URL-escape the meaningful (to URLs) ampersand as %26: jack%26jill, at%26t. One then creates a URL:
http://foo.com/cgi-bin/search?poem=jack%26jill&comp=at%26t
Then, to include this URL as an href attribute, one must SGML-escape the meaningful (to SGML) ampersand as &amp;:
<a href = 'http://foo.com/cgi-bin/search?poem=jack%26jill&amp;comp=at%26t'>
This quickly exceeds the understanding of most users, and should be explicit in the XML spec." (Maden)

Decision (Paoli dissenting): no.

Rationale: this belongs in a user's guide to XML, not the spec.
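
Although the WG left this to user documentation, the two escaping layers Maden describes can be sketched with Python's standard library. The helper build_link is hypothetical; it assumes the query string is built with urllib.parse.urlencode and the attribute value is escaped with xml.sax.saxutils.quoteattr, which here delimits the value with double rather than single quotation marks.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def build_link(base: str, params: dict) -> tuple:
    """Return (url, start_tag) for a link whose query values may contain '&'."""
    # Layer 1 (URL): urlencode percent-encodes '&' inside each value as %26
    # and joins the name=value pairs with a bare '&'.
    url = base + "?" + urlencode(params)
    # Layer 2 (SGML/XML): quoteattr turns each remaining '&' into '&amp;'
    # and wraps the result in quotation marks for use as an attribute value.
    start_tag = "<a href=" + quoteattr(url) + ">"
    return url, start_tag


url, tag = build_link("http://foo.com/cgi-bin/search",
                      {"poem": "jack&jill", "comp": "at&t"})
```

The order matters: URL-escaping happens first, on the individual values, and SGML-escaping last, on the finished URL.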

M.8 Should sec. 4.3.3 outlaw hankaku katakana characters (range #xFF66 - #xFF9F), restricting such characters to numeric references, and make corresponding changes to NAMESTRT (drop 65382-65391) and BaseChar (production [74])? (Murata) (So also Jelliffe)

Resolved by our decision on S.39 (97-11-26).

M.9 Various character reclassifications (Murata, 3 July)

Resolved by our decision on S.39 (97-11-26).

M.10 Should sec. 4.3.3 outlaw EUC control functions SS2 and SS3, restricting the necessary characters to numeric references? (Murata)

Withdrawn.

M.11 Should sec. 4.3.3 deprecate (but not outlaw) all compatibility-zone characters? (Jelliffe)

No.

M.13.a Should Appendix A remove #xff10-#xff19 (fullwidth digits) from Digit? (Murata)

Resolved by our decision on S.39 (97-11-26).

M.13.b Should Appendix A remove everything from Digit except the (Western) Arabic numerals 0..9 (i.e. drop Arabic-Indic, Eastern Arabic-Indic, Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Tibetan, and fullwidth decimal digits)? (Jelliffe) (Murata) (Peterson)

An RFE.

M.14 Should hex character references use (a) upper-case X, (b) lower-case X, or (c) either? Should they use upper-, lower-, or either-case digits A-F?

Same as issue S.20: the 'x' will be lower-case, and the hexadecimal digits A-F can be in either case.
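
A recognizer for the two forms of character reference under this decision might look like the following sketch; the pattern and the helper char_ref_code_point are illustrative names, not taken from the spec.

```python
import re
from typing import Optional

# '&#' decimal digits ';'  or  '&#x' hexadecimal digits ';'
# The 'x' must be lower-case; the digits A-F may be in either case.
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def char_ref_code_point(ref: str) -> Optional[int]:
    """Return the code point a character reference denotes, or None if malformed."""
    m = CHAR_REF.fullmatch(ref)
    if m is None:
        return None
    decimal, hexadecimal = m.groups()
    return int(decimal) if decimal is not None else int(hexadecimal, 16)
```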

M.15 Should the spec refer to XML as "The Extensible Markup Language" or as "Extensible Markup Language" without a definite article (e.g. in the first sentence)?

The WG elected to give no guidance to the editors on this issue (in the full expectation that the result would depend on which editor touched the file last).

Rationale: after several minutes' discussion and increasing hilarity, no consensus had been reached, but the end of the allotted time for the conference call had.

M.16 Should NDATA entities be referred to as binary or as unparsed entities? Should other entities be referred to as text or as parsed entities?

Decided unanimously (Murata and Sperberg-McQueen abstaining) to use the terms 'parsed' and 'unparsed' instead of 'text' and 'binary' (and in preference to the other suggestions 'opaque' and 'NDATA').

The WG also considered whether XML processors must check that the characters of an XML document are legal Unicode characters, or legal XML characters.

Decided by consensus that the definition of well-formedness implies that all processors must check that characters in the document fall within the definition of character in the spec (nonterminal Char).

Considered further whether to modify the production for Char to ensure that XML documents contain only legal Unicode 2.0 characters; no serious support for this possibility.

Considered whether to add a validity constraint requiring validating XML processors to check characters for legality in Unicode 2.0. The proposal failed.

In favor: Bray, Murata
Opposed: Bosak, Clark, DeRose, Maler, Maloney, Nava, Paoli, Sperberg-McQueen
Abstaining: Hollander, Kimber, Sharpe, Tigue

Considered further whether characters introduced by character references should or should not be subject to the same rule. Agreed unanimously that they should.
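
Taken together, these decisions mean every processor checks both literal characters and the targets of character references against Char. A minimal sketch follows; it assumes the Char ranges of the XML 1.0 Recommendation (February 1998), which may differ in detail from the draft discussed here, and the helper names are hypothetical. For brevity it scans the whole text, so it would also reject reference-like strings inside comments or CDATA sections, where they are not character references.

```python
import re

# Char, as assumed here from the XML 1.0 Recommendation:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
def is_xml_char(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


# Decimal or lower-case-'x' hexadecimal character references.
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def check_characters(text: str) -> None:
    """Raise ValueError for any literal character or character reference outside Char."""
    for offset, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            raise ValueError(f"illegal character U+{ord(ch):04X} at offset {offset}")
    for m in CHAR_REF.finditer(text):
        decimal, hexadecimal = m.groups()
        cp = int(decimal) if decimal is not None else int(hexadecimal, 16)
        if not is_xml_char(cp):
            raise ValueError(f"character reference {m.group(0)} denotes an illegal character")
```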

As usual, the summary of the WG rationales is the responsibility of the author and is subject to correction and elaboration.

-C. M. Sperberg-McQueen
University of Illinois at Chicago
ACH/ACL/ALLC Text Encoding Initiative
cmsmcq@uic.edu, tei@uic.edu
+1 (312) 413-0317, fax +1 (312) 996-6834


15 Dec 1997