Reports

From the W3C SGML ERB to the SGML WG

And from the W3C XML ERB to the XML SIG


Compiled for the use of the WG and SIG by

C. M. Sperberg-McQueen

4 December 1997

This document contains the text of reports to the World-Wide-Web Consortium's SGML Work Group (SGML WG) and later XML Special Interest Group (XML SIG), from the SGML Editorial Review Board (SGML ERB) and later XML Work Group (XML WG), as posted to the appropriate email discussion lists. The text has not been changed substantively, though some typographic errors have been silently corrected, and asterisks and similar devices used to signal emphasis or list structure have typically been replaced by appropriate SGML tagging.

It is intended that this document reproduce all the reports which describe decisions taken by the ERB/WG and their rationales; if readers are aware of any such reports which have been overlooked, they should contact the author. The rationales sometimes given here are useful, but much of the reasoning behind the decisions summarized here lies in the extensive discussion in the WG/SIG before and after the decisions, which should be consulted by anyone interested in a fuller understanding of the decisions.

The reports are arranged chronologically by date of the meeting. A subject index would be desirable but would go beyond the time available for preparing this compilation.

9 October 1996


Date: Wed, 09 Oct 1996 12:56:39 -0700
From: Tim Bray <tbray@textuality.com>
Subject: Report from the SGML ERB meeting of Oct. 9th

The SGML ERB met Wed. Oct 9th and voted on quite a few items. In attendance: Bosak, Bray, Clark, DeRose, Hollander, Kimber, Maler, Paoli, Sperberg-McQueen, and Sharpe. Absent: Magliery and Connolly. By a recent resolution of the ERB, and at Dan Connolly's request, he is now a non-voting liaison member. Thus, "Unanimous" means 10 in favor. No votes were close enough that Tom's presence or absence would have made a difference.

Several issues were left unresolved at the end of the meeting; the ERB will be meeting tomorrow and Saturday to get through this stuff.

A.1 XML will have only one concrete syntax, fixed at XML specification time, not document-instance parse time (0.2, 13.3, 13.4).

Passed, Unanimous

A.2 All or virtually all the information provided by a normal SGML declaration will be fixed for all documents; no SGML declaration will be necessary. (Possible exception: character-set information may vary document to document, but will be conveyed in other ways.) (6.2.3)

Passed, Unanimous

A.3 XML will have no OMITTAG, DATATAG, SHORTREF, LINK, CONCUR, RANK, or SUBDOC features (7.3.1, 7.3.1.1, 7.3.1.2, 7.3.2, 7.4, 7.5, 7.6, 7.7, 7.8, 9.4.6, 11.2, 11.5, 11.6, 13.5).

Passed, Unanimous

A.4 XML will make only partial use of the SHORTTAG feature:

The final point, on omitted attribute-value specifications, raises the general question of how XML systems will behave when no DTD, or a partial DTD, is provided -- if such omitted or partial DTDs are allowed. It also raises the question of providing a way for a document to signal that its DTD can be skipped without loss of information (e.g. because it has no default attribute values, or no empty elements, etc.). These questions are to be discussed and decided separately.

Passed, Unanimous

A.5 XML will have no quantities or capacities (7.3.3, 7.4.2, 7.9.2, 7.9.4.5, 9.4.1, 9.4.2, 9.8, 11.3.1, 13.2).

Passed, Unanimous

A.6 XML will not allow asynchronous marked sections -- marked sections must begin and end in the same element.

Passed; Unanimous. As Harvey Bingham pointed out, this needs careful phrasing to avoid ambiguity.

A.7 Should XML have CDATA, RCDATA, and TEMP marked sections or not?

XML will have CDATA marked sections, which must begin with the 9-character literal string "<![CDATA[" and end with the 3-character literal string "]]>". This is essentially Charles Goldfarb's proposal, although we may not call them CLEARDATA.

XML will not have RCDATA or TEMP marked sections.

Both Unanimous.
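
As an illustration only (not part of the ERB report), the fixed delimiters decided under A.7 make CDATA marked sections easy to locate with plain string searches. The sketch below is in Python; the function name `cdata_sections` and the constants are hypothetical, and the code makes no attempt to handle the rest of XML syntax.

```python
CDATA_OPEN = "<![CDATA["   # the 9-character opening string
CDATA_CLOSE = "]]>"        # the 3-character closing string

def cdata_sections(text):
    """Return the literal contents of each CDATA marked section in text."""
    sections = []
    pos = 0
    while True:
        start = text.find(CDATA_OPEN, pos)
        if start == -1:
            break
        body_start = start + len(CDATA_OPEN)
        # The first "]]>" after the opener ends the section; nothing
        # in between is treated as markup.
        end = text.find(CDATA_CLOSE, body_start)
        if end == -1:
            raise ValueError("unterminated CDATA marked section")
        sections.append(text[body_start:end])
        pos = end + len(CDATA_CLOSE)
    return sections
```

Because the closing string is a fixed literal, the contents of a section can never themselves contain "]]>".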

A.8 Should XML have INCLUDE and IGNORE marked sections or not? (If this question is answered YES, it leads to a separate question, how to achieve conditional inclusion in XML markup declarations. This related question is to be decided separately.)

Split in two: XML will not have INCLUDE or IGNORE marked sections in document instances; Unanimous. The question of conditional markup in declarations is still open.

A.9 XML will have no CDATA or RCDATA elements (11.2.3).

Passed, Unanimous.

A.10 How should XML escape markup delimiter characters in content (especially if (R)CDATA elements and marked sections are not allowed)?

Unanimously agreed that CDATA marked sections are to be used for blocks of text. See A.19 for more on this.

A.11 XML will retain the distinction between element content and mixed content (7.6, 11.2.4). (Applies only if DTD supplied and used.)

Passed, DeRose dissenting.

A.12 XML will require all attribute-value specifications to take the form of attribute-value literals (7.9.3, 7.9.3.1).

Passed, Unanimous.

A.13 XML will not allow RE to end an entity or character reference; an explicit refc must be provided, and it must be a semicolon (9.4.4).

Passed, Unanimous.

A.16 XML will stipulate that character references within processing instructions should be resolved by the XML parser (8).

Defeated, Sperberg-McQueen dissenting.

A.18 XML will have declarations for elements, and attributes, but not for short-references or links (11.1).

Passed, Unanimous, for elements and attributes. Notations and entities remain open.

A.19 XML will retain fundamentally the same parsing rules as SGML, though they may be expressed differently. (N.B. there is some sentiment for making XML's rules more restrictive than SGML's.)

Agreed unanimously that the rules should be stricter than SGML in that the characters '&', '<', and '>' are deemed always to delimit markup, and must always be escaped, specifically as "&amp;", "&lt;", and "&gt;", when appearing in parsed character data. The ERB recognizes that this impinges on the user's name space in an un-SGML-like way, but feels that this has already, de facto, happened.
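
A minimal sketch of the escaping rule in A.19, assuming Python; the function name `escape_pcdata` is hypothetical. The ampersand has to be replaced first, since otherwise the ampersands introduced by the other two replacements would themselves be escaped again.

```python
from xml.sax.saxutils import escape  # standard-library helper, used here only as a cross-check

def escape_pcdata(text):
    """Escape '&', '<', and '>' for use in parsed character data."""
    text = text.replace("&", "&amp;")   # must come first
    text = text.replace("<", "&lt;")
    return text.replace(">", "&gt;")

sample = "if a < b && b > c"
# The hand-written version agrees with the library routine on this input.
assert escape_pcdata(sample) == escape(sample)
```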

A.21 Like SGML, XML will forbid empty strings as attribute values for non-CDATA attributes, require FIXED attributes to take their default values (7.9.4.1, 7.9.4.2), and distinguish IMPLIED values from null-string values (11.3.4).

Passed, 7 in favor, DeRose, Hollander, and Sperberg-McQueen dissenting.

A.23 XML will have no CURRENT attributes, but it will have FIXED, REQUIRED, and IMPLIED attributes, and attributes with explicit defaults.

Passed, Unanimous.

A.24 Unlike SGML, XML will not allow direct references to external data entities from within parsed character data (9.4).

Passed, Unanimous.

A.25 Like SGML, XML will forbid recursive entity reference (9.4).

Passed, Unanimous.

A.26 Like SGML, XML will allow elements to be declared ANY (11.2.4). (Whether other similar shorthand declarations will be defined, e.g. for any subelements but not allowing PCDATA, will be decided separately.)

Passed, Bray dissenting.

A.27 XML will behave like SGML as regards behavior and precedence of occurrence indicators and connectors in content models (11.2.4.1, 11.2.4.2). (Whether to abolish the AND connector will be decided separately.)

Passed, Unanimous.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

16 October 1996


Date: Thu, 17 Oct 96 11:01:39 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: some ERB decisions

The SGML ERB met Wed. Oct 16th and voted on several items already submitted to the SGML WG. Participating: Bosak, Bray, Clark, Maler, Paoli, Sperberg-McQueen, and Sharpe. Absent: DeRose, Hollander, Kimber, Magliery, and Connolly. All decisions were by consensus of all those participating in the call, and thus carry a majority of the membership of the ERB.

Several issues were left unresolved at the end of the meeting; the ERB will be meeting today and Saturday to discuss them further and resolve them.

A.20 XML will retain the notion and syntax of comments (= 8879's 'comment declarations') (7.6, 10.3), but comment declarations will contain at most one comment: comments will take the form '<!>' or else will begin with '<!--' and end with '-->' (no space allowed), and may not contain '--'.

Comments will take the form '<!--' ... '-->', no internal '--' is allowed and no white space between the final '--' and the final '>'.

Empty comments (<!>) will not be allowed in XML.
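
The comment rule decided under A.20 can be checked with a single regular expression. The following Python sketch is illustrative only; the names `COMMENT` and `is_comment` are hypothetical, and the pattern covers just the three constraints stated above (fixed opener, no internal '--', no white space before the final '>').

```python
import re

# '<!--', then a body in which no position starts a '--', then '-->'.
COMMENT = re.compile(r"<!--(?:(?!--).)*-->", re.DOTALL)

def is_comment(s):
    """True if s is exactly one comment in the form decided under A.20."""
    return COMMENT.fullmatch(s) is not None
```

Under this reading, '<!-- note -->' is accepted, while '<!>', '<!-- a -- b -->', and '<!-- a -- >' are all rejected.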

B.1 What should XML's character-set rules be? Should conforming XML documents be restricted to particular character sets? Should conforming XML processors be required to be able to parse all conforming XML documents (13.1)?

Agreed:

Still open: details of the mechanism to be used for signaling the encoding and/or coded character set in use.

B.2 Should XML require each document instance to have a DTD or not (7.1)?

XML will not require each document instance to have a DTD.

Open question: details of partial DTDs or DTD summaries, if any, and possible declarations indicating whether the correct ESIS is derivable for a document if its DTD is not read.

B.4 Should XML forbid comments and processing instructions in mixed content, as a way of simplifying RE handling (7.6)?

Assuming that a satisfactory RE rule can be agreed on, XML will not forbid comments and processing instructions in mixed content.

B.5 Should XML restrict the use of the PCDATA token in content models, to simplify RE handling or eliminate the Mixed Content Problem? (7.6.1, 11.2.4)
B.5 restrict PCDATA to models of the form (#PCDATA)

No.

B.6 restrict PCDATA to models of the form (#PCDATA | x ... | z)*

Yes.

B.8 Should XML use MSOCHAR, MSSCHAR, and MSICHAR strings (9.7)?

No.

B.11 Should XML forbid, allow, or require empty end-tags (7.5)?

Forbid.

-C. M. Sperberg-McQueen
University of Illinois at Chicago

17 October 1996


Date: Thu, 17 Oct 96 14:10:29 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: B.1 and B.2 results

The SGML ERB met today, Oct 17th, and voted on several items already submitted to the SGML WG. Participating: Bosak, Clark, Maler (in part), Magliery (in part), Paoli, Sperberg-McQueen, and Sharpe. Absent: Bray, DeRose, Hollander, Kimber, and Connolly. All decisions were by consensus of all those participating in the call, and thus carry a majority of the membership of the ERB.

The text below is substantially the same as the drafts discussed by the ERB, but was edited after the meeting to reflect the ERB's decisions; the ERB has thus not seen and approved the precise wording given, and may choose to correct any editorial errors made in the revision.

-C. M. Sperberg-McQueen

Character-set Rules

B.1 What should XML's character-set rules be? Should conforming XML documents be restricted to particular character sets? Should conforming XML processors be required to be able to parse all conforming XML documents (13.1)?

It had already been agreed that:

In discussing the mechanism to be used for signaling the encoding and/or coded character set in use, the ERB decided the following. [Editorial note: if the ERB decides that XML will have external text entities, then everything said below about documents will also apply to all external text entities.]

The character repertoire of XML documents is that of ISO 10646. All XML processors are required to accept documents in the UTF-8 and UCS-2 encodings of 10646. It is recognized that accepting documents in the UTF-16 variant would be desirable. Documents encoded in UCS-2 must begin with the Byte Order Mark described by ISO 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, U+FEFF) -- this is an encoding signature, and not (for SGML purposes) part of the document. XML processors must be able to use this character to differentiate between UTF-8 and UCS-2 encoded documents.

XML does not explicitly sanction the use of any other encodings. It is recognized, however, that many documents exist in other encodings. To support processors in dealing with this situation, an XML document may contain at its beginning, before any other text, markup, PIs, or white space, an Encoding Declaration PI matching

 
EncDecl ::=
  '<?XML' S 'encoding' Eq ("'" Encoding "'")|('"' Encoding '"') S? '>'

An XML processor may choose to read Encoding Declaration PIs and accept nonstandard encodings so declared. In validating processors such behavior must be at user option.

An XML document which lacks both the Byte Order Mark and an Encoding Declaration PI must be in the UTF-8 encoding. It is an error for a document to be in an encoding other than that declared in its Encoding Declaration PI.

The XML specification shall include (possibly by reference to relevant IETF documentation) a list of standard declarations for the nonterminal "Encoding" in the above production, to support interoperability, including names for at least ISO-Latin-X and the JIS family.
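
The detection order described above (Byte Order Mark first, then the Encoding Declaration PI, otherwise UTF-8) might be sketched as follows in Python. This is an illustration under stated assumptions, not the ERB's wording: S is taken to be ASCII white space, Eq to be '=' with optional white space around it, and Encoding to be any non-empty run of characters other than the closing quote, since the report does not define these nonterminals. The names `ENC_DECL` and `detect_encoding` are hypothetical.

```python
import re

S = rb"[ \t\r\n]"  # assumed definition of the S nonterminal

# Assumed rendering of the EncDecl production given above.
ENC_DECL = re.compile(
    rb"<\?XML" + S + rb"+encoding" + S + rb"*=" + S + rb"*"
    + rb"""(?:'([^']+)'|"([^"]+)")""" + S + rb"*>"
)

def detect_encoding(data: bytes) -> str:
    """Guess the encoding of a document from its first bytes."""
    # U+FEFF at the start signals UCS-2; its byte order shows which variant.
    if data[:2] == b"\xfe\xff":
        return "UCS-2 (big-endian)"
    if data[:2] == b"\xff\xfe":
        return "UCS-2 (little-endian)"
    # The PI must come before any other text, so match at offset 0 only.
    m = ENC_DECL.match(data)
    if m:
        return (m.group(1) or m.group(2)).decode("ascii")
    # Neither signal present: the document must be UTF-8.
    return "UTF-8"
```

A processor that does not support a declared nonstandard encoding would still need to decide how to report it; the sketch only identifies the name.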

DTDs

B.2 Should XML require each document instance to have a DTD or not (7.1)?

In discussing this item, the ERB made the following decisions:

1. Well-formedness

The XML spec shall define two characteristics which an XML document may possess, called "well-formedness" and "validity". A well-formed document, informally, is one for which no content model checking has been done, but which can be read by an XML processor with confidence in producing a correct ESIS.

Questions remaining open include:

  1. the specific definition of well-formedness -- it is expected to include at least (1) a containing root element with no text outside it, (2) properly nested elements, (3) properly structured tags, and possibly other constraints on entity references, empty elements, etc.
  2. whether two distinct levels of well-formedness (e.g. strong and weak) are necessary
  3. the nature of well-formedness when there is no DTD or only a partial DTD.

2. Required Markup Declaration (votable Y/N)

XML markup declarations are divided into DTDs pointed-at by the <!DOCTYPE, and internal subsets contained within the <!DOCTYPE. Markup declarations necessary to produce a correct parse may be contained either in the DTD or the subset. XML will include a signalling method whereby instances may contain statements indicating whether the declarations in the DTD and/or the subset are necessary to produce a correct parse.

XML documents may contain a Required Markup Declaration PI as follows:

 
RMDDecl ::= '<?XML' S 'rmd' Eq ('NONE'|'INTERNAL'|'ALL') S? '>'

The RMD PI must appear after the Encoding Declaration PI, if any, and before the document type declaration itself, if any.

Should the RMD state that the DTD is required ('DTD' or 'ALL'), it is a reportable error if the DTD cannot be retrieved.
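
For illustration, the RMDDecl production can be matched with a regular expression. The Python sketch below assumes S is ASCII white space and Eq is '=' with optional surrounding white space; the names `RMD_DECL` and `parse_rmd` are hypothetical. It follows the production literally, so only NONE, INTERNAL, and ALL are accepted; the keyword 'DTD' mentioned in the preceding sentence does not appear in the production and is therefore rejected here.

```python
import re

WS = r"[ \t\r\n]"  # assumed definition of the S nonterminal

RMD_DECL = re.compile(
    r"<\?XML" + WS + r"+rmd" + WS + r"*=" + WS + r"*(NONE|INTERNAL|ALL)" + WS + r"*>"
)

def parse_rmd(pi):
    """Return the RMD keyword of a Required Markup Declaration PI, or None."""
    m = RMD_DECL.fullmatch(pi)
    return m.group(1) if m else None
```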

3. Interpretation of Required Markup Declaration

If no RMD PI is given, then

If an RMD PI is given, then

19 October 1996


Date: Sat, 19 Oct 96 12:33:48 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decisions on A.17, B.9, and other questions

The SGML ERB met today, Saturday Oct. 19th, and voted on several items already discussed by the SGML WG. Participating: Bosak, Clark, Kimber, Maler, Paoli, Sharpe, Sperberg-McQueen. Absent: Bray (represented in part by written votes on open issues), DeRose, Hollander, Magliery. All decisions were by consensus of all those participating in the call, and thus carry a majority of the membership of the ERB.

I should note that the wording of the rationales given below reflects the understanding, and is the responsibility, of the author. The rationales have not been reviewed or approved by the ERB; they are thus subject to correction when I have misunderstood or misstated the ERB's intention.

The ERB agreed on the following position statements:

The rationale for the list and for its inclusion in the specification is to allow some topics to be postponed until there is more time for their resolution, and to inform users of XML of the expected lines of development.

In the light of these agreements, the ERB reconfirmed its earlier decision that XML 1.0 will not have SDATA entities. It is thought that most uses of SDATA entities are adequately served by character references to Unicode characters (see example below). Techniques for dealing with non-Unicode characters, specification of glyphs rather than characters, and related topics (such as possible mechanisms for document private agreements governing the ISO 10646 Private Use Areas) will be addressed in future revisions.

Instead of a declaration like

  <!ENTITY auml SDATA "[auml    ]">

any XML processor can work properly with a declaration of the form

  <!ENTITY auml "&#228;"> <!-- auml = a umlaut, U+00E4 -->
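
To make the arithmetic in this example explicit: decimal 228 is hexadecimal E4, so the character reference resolves to U+00E4. A small Python sketch of decimal character-reference expansion follows; the function name `expand_decimal_refs` is hypothetical, and the code ignores every other kind of reference.

```python
import re

def expand_decimal_refs(text):
    """Replace each decimal character reference (&#N;) with its character."""
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)

assert 228 == 0xE4  # the code point named in the comment above
```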

On question A.17 (Should XML have entities or not?), the ERB had already decided that XML would have internal entities (either text or CDATA, not both). Today we decided further:

The rationale for allowing internal text entities was this: CDATA entities are very easy to implement (because they need not be expanded at parse time, but can be expanded later without changing the structure of the parse tree); text entities are more complex (if they are synchronous, they may require the replacement of a leaf node with an arbitrarily complex subtree; if they are asynchronous, they must be expanded at parse time and complicate the parser). Nevertheless, internal text entities are so useful to the user that they justify the cost of implementation.

Whether XML will have external text entities remains an open question.

On question B.9, the ERB decided:

Whether system identifiers in XML 1.0 will be allowed to carry the <url> label remains an open question.

Addition of public identifiers and extension of system identifiers to other formats will be taken up in preparation of future versions of XML.

The rationale for these decisions was that URLs are well understood and well established, and can handle both remote and local addresses. Restricting external identifiers to URLs helps keep the specification simple. In the long run, however, public identifiers are desired by many users and may provide solutions to the well known fragility problems associated with URLs. Better infrastructure, in the form of catalog management tools and http-based catalog resolution services, would help make the introduction of public identifiers into XML smoother.

-C. M. Sperberg-McQueen

23 October 1996


Date: Wed, 23 Oct 96 17:39:24 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decisions, 23 October 1996

The ERB met today, 23 October 1996, and decided a number of questions. All members of the ERB were present (Bosak, Bray, Clark, DeRose, Hollander, Kimber, Magliery, Maler, Paoli, Sharpe, Sperberg-McQueen); decisions were taken by consensus except as noted.

As usual, summaries of the rationale for the decisions made have not been reviewed by the ERB and are thus subject to correction and further explanation.

A.17 Should XML have entities, or not?

The ERB had already agreed that XML should have internal text entities and external NDATA entities. Today, after discussion, we agreed that support for external text entities would be an optional feature of XML 1.0 (dissenting: Clark, Paoli, Sharpe).

The rationale for the decision was that support for external entities is (a) essential if XML is to be useful as an authoring language, but (b) a heavy burden for network-based client software. A proposal to define XML in such a way that external text entities were legal only if in local files (and thus not legal in network use of XML) attracted some support, but not enough.

The dissenting view on this decision was that allowing an optional feature and losing the monolithic definition of XML was too high a cost; the dissenters all also felt that external text entities should be disallowed unconditionally.

External text entities will be placed on the list of topics to be reviewed in preparing future versions of XML.

This topic may also be revisited in the near future (i.e. before version 1.0), depending on reports on the progress and status of W3C-based work on this and related topics.

The question of SDATA entities will be taken up again before XML 1.0 is published.

C.1 should XML require all entities to be synchronous with the document's logical structure?

Agreed unanimously that XML will require all entities to be synchronous with the document's element structure.

The rationale is that this simplifies parsing somewhat, allows entity expansion to be delayed if the implementation desires to do so, and makes possible simple checks for the well-formedness of external (and internal) entities.

C.2 should XML prescribe the use of an ENTITY-END character as the canonical method of handling entity boundaries, as a way of simplifying exposition and implementation (6.2.2)?

Agreed unanimously not to prescribe any particular method of handling entity ends; rationale: the proposal would tend to confuse, not simplify, the issue.

C.3 should XML retain or relax SGML's prohibition on ENTITY attributes referring to SGML text entities (7.9.4.3)?

Agreed unanimously to retain the prohibition. Rationale: compatibility.

C.4 if XML makes DTDs optional and allows partial DTDs, what must or may a parser do when it encounters references to undeclared entities (9.4)? Should XML declare any set of entities automatically?

Agreed unanimously that reference to an entity not declared and not included in the list of 'automatic' declarations is a reportable error. No particular error recovery strategy will be prescribed. Rationale: defining this as a non-error would weaken validation too much; error recovery should be left to the implementation, as different strategies are appropriate for different purposes.

Agreed unanimously to define automatically the entities lt, gt, amp, and two entities for double and single quotation (for use in attribute value literals), names to be determined in separate discussion.

Proposals to declare other sets of entities automatically (e.g. all of ISO Latin 1 or all entities declared in HTML 3.2) remain open questions.

C.5 if XML uses ISO 10646, should there be a special form of character reference using hexadecimal, not decimal, numbers, since most references to ISO 10646 and Unicode use hex, not decimal (9.5)?

Agreed (Clark dissenting) to specify that XML documents may refer to characters in ISO 10646 using the form '&u-' or '&U-' followed by four hexadecimal digits, followed by semicolon.

Rationale: Unicode and ISO 10646 documentation is in hexadecimal, not decimal, so this constitutes a small but important convenience and aid to reliability. The proposal to use '&u' was preferred to the '&#u' proposal since it is believed to allow SGML systems to handle these references (which appear to an SGML parser to be general entity references) using a default entity declaration. (Consult James Clark for details.)
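
A sketch of how the hexadecimal form decided under C.5 could be expanded, assuming Python; the names `HEX_REF` and `expand_hex_refs` are hypothetical. The pattern requires exactly four hexadecimal digits, as the decision states, and leaves anything else untouched.

```python
import re

# '&u-' or '&U-', four hexadecimal digits, then a semicolon.
HEX_REF = re.compile(r"&[uU]-([0-9A-Fa-f]{4});")

def expand_hex_refs(text):
    """Replace each &u-XXXX; reference with the character U+XXXX."""
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)
```

With this form, the a-umlaut can be written as &u-00E4;, matching the way the character is listed in Unicode and ISO 10646 documentation.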

C.6 Should XML retain SGML's prohibition on multiple declarations for the same element (11.2.1)?

Agreed unanimously to retain the prohibition. Rationale: compatibility. (Some ERB members may also apply the same rationale as for the dissent on question C.8.)

C.7 Should XML prohibit the use of inclusion and exclusion exceptions in element declarations? (11.2.4, 11.2.5)?

Agreed unanimously to prohibit their use in XML 1.0, and (with dissents from Bray, Magliery, and Sperberg-McQueen) to place them on the list of topics to be considered in preparing future versions. Rationale: simplification of validation and harmonization of XML parsing model with standard formal-language theory and practice.

C.8 Should XML prohibit content-model references to undeclared elements (11.2.4)?

Agreed (Bray, DeRose, and Sharpe dissenting) to allow such references. Rationale: this is a useful technique in the construction of large public DTDs which may be subsetted locally or document-by-document. Rationale for the dissent: clean grammars are easier to process and parse than dirty grammars. (N.B. 'clean' and 'dirty' here have the technical senses usual in discussions of formal grammars.)

C.9 Should XML forbid use of the '&' connector in content models (11.2.4.1)?

Agreed unanimously to forbid use of the '&' connector in XML. Rationale: harmonization with conventional regular expressions.

C.11 Should XML retain SGML's prohibition on multiple attribute-list declarations for the same element (11.3.1) or on multiple declarations for the same attribute (11.3.2)?

Agreed unanimously to retain the prohibition. Rationale: compatibility. (Some ERB members may also apply the same rationale as for the dissent on question C.8.)

C.12 Should XML change the set of types available for attributes? E.g. by suppressing NAME(S), NUMBER(S), NMTOKEN(S), NUTOKEN(S) and adding constraints in the form of regular expressions, ISO dates, language-code, external-id, type IDREF, ... (7.9.4, 11.3.3)

After discussion, agreed unanimously that XML should have the following attribute types: ID, IDREF, IDREFS, ENTITY, ENTITIES, CDATA, enumerated attribute types, NOTATION attribute type, NMTOKEN and NMTOKENS. The types NUMBER(S), NUTOKEN(S), and NAME(S) are to be dropped.

Rationale: the distinctions among the lexically defined types are not useful enough to justify retaining all of them, but they do provide convenient case-folding and white-space normalization. If just one is to be kept, it should be NMTOKENS, since it subsumes all the others and the other lexical types of SGML can be translated into XML by retyping them as NMTOKENS and adding an application-level check on the specific type of token required. Such application-level checks are in any case common among users of these types. The type NMTOKEN was retained in order to preserve the singular/plural symmetry with IDREF and ENTITY.

Extensions to the set of declared-value types in ISO 8879, though supported by Sperberg-McQueen, commanded no support for inclusion in XML 1.0.

Other decisions in batch C are still pending.

-C. M. Sperberg-McQueen

24 October 1996


Date: Thu, 24 Oct 96 13:02:11 CDT
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decisions, 24 October 1996

The ERB met today, 24 October 1996, and decided a number of questions. Present: Bosak, Clark, Kimber, Magliery, Maler, Paoli, Sharpe, Sperberg-McQueen; absent: Bray (represented in part by proxy votes), DeRose, Hollander. Decisions were taken by consensus except as noted.

As usual, summaries of the rationale for the decisions made have not been reviewed by the ERB and are thus subject to correction and further explanation.

A.15' XML will use a sort of 'formal processing instruction': the first token of the PI's system data will be a Name (e.g. <?TeX \vskip> or <?application-name application-specific instructions>) (7.6, 8)
Should the Name be required to be the name of a declared NOTATION?

Agreed unanimously that the Name need not be that of a declared NOTATION; if it is, however, the spec should state that the meaning is that the PI in question is in the notation (or: appertains to the notation processor) indicated.

Rationale: making the association explicit is a useful semantic clue, but requiring it is excessively burdensome.

A.18' Should XML have declarations for notations (11.1)?

Agreed unanimously that it should. Rationale: needed for NDATA entities (and PIs).

B.12 Should XML retain SGML's prohibition on multiple declarations for the same notation (11.4)?

Agreed unanimously to retain the prohibition. Rationale: compatibility.

B.13 Should XML remove SGML's prohibition on ENTITY attributes for notations (11.4.1)?

Agreed unanimously to retain the prohibition. Rationale: compatibility.

B.13 bis. Should XML allow any attributes at all for notations (11.4.1)?

Agreed (EK dissenting) to drop attributes on notations in XML 1.0. Agreed (MSM and EM dissenting) to place this topic on the list of topics to be (re-)considered in the preparation of future revisions of XML.

C.13 Should XML remove SGML's prohibition on multiple ID or NOTATION attributes on the same element (11.3.3)?

Agreed unanimously to retain the prohibition. Rationale: compatibility.

C.15 Should XML define new specific methods of inferring values for attributes with no attribute-value specifications (11.3.4)? E.g. INHERITED, to signify that the value is taken from the attribute of the same name (and type) on the smallest enclosing element with such an attribute.

Agreed (MSM dissenting) to define neither INHERITED nor any other new method of value-inference. Rationale: this topic will be treated in the second stage of the project.

Several other topics were discussed without achieving consensus. The ERB will meet again Saturday to continue these discussions.

-C. M. Sperberg-McQueen

26 October 1996


Date: Sat, 26 Oct 1996 11:37:27 -0700
From: Tim Bray <tbray@textuality.com>
Subject: SGML ERB Meeting of October 26: RE's resolved!

The ERB met on Saturday October 26. All members were present.

As usual, summaries of the rationale for the decisions made have not been reviewed by the ERB and are thus subject to correction and further explanation.

1. Reservation of name space

Agreed unanimously to add "." to the set of legal name-start characters for XML, and to reserve the portion of all name-spaces beginning ".XML." for the purposes of the language. This includes at least element GI's, attribute names, entity names, and element ID's.

2. RS/RE handling

Executive summary: use HTML rules, but provide an escape to RE Delenda Est.

Agreed unanimously, except for one sub-clause noted below, that:

2.1 There will be a mechanism, using a reserved attribute, to toggle, per element, between two modes of white-space handling. In "White Space Preservation" mode, all white space including RE is passed through to the application, with the exception of a single leading and trailing RE if they are alone on a line with the start- or end-tag. Note: "alone on a line with" assumes that comments have already been stripped. In "White Space Collapse" mode, all initial and trailing white space in an element is eaten by the parser, and all internal white space, including successive blank lines, is replaced by a single space character before passing to the application.

2.2 The setting of this toggle is by default inherited from the parent element. The root element of any document, by default, has the toggle set to "White Space Collapse" mode.

2.3 The White Space mode is orthogonal to the use of CDATA marked sections; that is to say, CDATA marked sections will still ignore markup delimiters, but will respect the current White Space mode. [On this decision, Bray, Sperberg-McQueen, and Sharpe dissented, preferring to have White Space Preservation mode built-in to CDATA Marked Sections].
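
The two modes in 2.1 might be sketched as follows in Python. This is a simplified reading, not the ERB's rule: a line break is modelled as a single "\n", white space is taken to be space, tab, carriage return, and line feed, and the preservation sketch drops a leading or trailing line break only when it is the very first or very last character of the content (it does not handle indentation before an end-tag). The function names are hypothetical.

```python
import re

def collapse(content):
    """'White Space Collapse' mode: trim the ends, squeeze internal runs."""
    return re.sub(r"[ \t\r\n]+", " ", content).strip(" ")

def preserve(content):
    """'White Space Preservation' mode: keep everything except a single
    line break directly after the start-tag and directly before the end-tag."""
    if content.startswith("\n"):
        content = content[1:]
    if content.endswith("\n"):
        content = content[:-1]
    return content
```

For example, content consisting of a line break, "line 1", a line break, "  line 2", and a final line break keeps its internal line break and indentation in preservation mode, but becomes "line 1 line 2" in collapse mode.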

Notes and Rationale:

  1. A large part of the world builds and uses applications that generally collapse white space. The objective of causing the minimal number of surprises mandates this default behavior.
  2. If XML is going to deviate from 8879 compatibility, it should be in a subtractive way; i.e., XML should eat at least as many RE's as SGML does. The HTML behavior seems the simplest way to achieve this.
  3. James Clark feels that there is a slight 8879 incompatibility in that if there are comments or PI's in an element that has White Space Preservation set, SGML will eat some RE's that XML will pass through. Clearly, given the vote, this is felt to be liveable.
  4. I have since discovered that beginning GI's with .XML. may cause massive incompatibility with CSS - we may end up having to reserve "-XML-".

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

30 October 1996


Date: Wed, 30 Oct 96 13:34:02 CST
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB meeting, 30 October 1996

The ERB met this morning, 30 October 1996. Present: Bosak, Bray, Clark, DeRose, Hollander, Kimber, Maler, Paoli, Sharpe, Sperberg-McQueen. Absent: Magliery.

The rationale given has not been checked by the ERB and is subject to correction and supplementation.

B.10 What form should EMPTY elements take, if there are EMPTY elements in XML: <e>, <e/>, <e></e>, or <@e> (where the NET string is assumed to be '/>' and '@' is assumed to be an XML-specific flag for names of EMPTY elements; in SGML systems, '@' to be added to the set of name-start characters).

Agreed unanimously:

Rationale: Allowing the form <e> simplifies learning and conversion for existing SGML and HTML documents and users -- one of the rare cases where these two populations seem to have the same requirement. Allowing some self-identifying form simplifies the parsing of documents significantly, and makes it much easier to work without explicit declarations. Allowing both forms was felt to be a useful compromise -- part of the committee would have preferred to allow only one form, but was evenly split between the 8879 form and the self-identifying form. The entire committee felt unanimously, however, that allowing both forms was workable, particularly if the spec makes reasonably clear that one is the preferred form and the other is included only for compatibility reasons.

The choice among the proposed self-identifying forms was motivated in part by pragmatic considerations and in part by aesthetics. If XML EMPTY elements carry end-tags then the EMPTY keyword will have different meanings to an XML and an SGML system; this was felt to entail too many complications, so <e></e> was ruled out. The form <@e> was not felt significantly easier or harder to implement than the form <e/>, though this may vary in different implementations. Both <@e> and <e/> may be compatible or incompatible with whatever delimiters for empty elements are present in SGML-97. There was a clear preference, however, for the form <e/>, based in part on the visual effect of the slash.
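
The parsing argument in the rationale can be illustrated with a small Python sketch. It is an assumption-laden toy, not a parser: tags carry no attributes, and the names `classify_tags` and `declared_empty` are hypothetical. The point it shows is that <e/> identifies itself as empty, while <e> is recognizable as empty only if a declaration says so.

```python
import re

# Toy tag pattern: optional leading "/", a name, optional trailing "/".
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)\s*(/?)>")

def classify_tags(markup, declared_empty=frozenset()):
    """Return a list of (name, kind), kind being 'start', 'end' or 'empty'."""
    result = []
    for closing, name, slash in TAG.findall(markup):
        if closing:
            kind = "end"
        elif slash or name in declared_empty:
            # <e/> is self-identifying; <e> needs a declaration.
            kind = "empty"
        else:
            kind = "start"
        result.append((name, kind))
    return result

print(classify_tags("<p>a<br/>b</p>"))
print(classify_tags("<p>a<br>b</p>"))
print(classify_tags("<p>a<br>b</p>", declared_empty=frozenset({"br"})))
```

Without a declaration, the second call has to treat <br> as an ordinary start-tag, which is the situation the self-identifying form avoids.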

-C. M. Sperberg-McQueen

31 October 1996


Date: Thu, 31 Oct 96 13:01:34 CST
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB decision, 31 October 1996

The ERB met today, 31 October 1996. Present: Bosak, Bray, Clark, Hollander, Kimber, Maler, Paoli, Sharpe, Sperberg-McQueen. Absent: DeRose, Magliery.

We discussed the issue of external text entities and agreed unanimously to rescind the decision of 23 October on question A.17 making support for external text entities an optional feature of XML. Instead, the XML spec will distinguish the treatment required for external text entities in validating and non-validating systems, using language something like this:

a reference to an external text entity is a signal to an XML processor that the entity's contents are to be included at the point of the reference. For purposes of validation, an XML processor must fetch and read the external text entity at the point of reference; for other purposes, processor behavior is not constrained.

This is draft, not final, wording.

If a network client chooses to fetch and process external entities at the point of reference, it may do so; if it chooses instead to insert a small icon, and fetch and display the entity only on request, or behave in some other way, it may do that too. This has the effect that, if they wish, browsers may from a technical point of view treat external text entities much the way they treat links to other documents or links to embedded graphics -- the user interface may well differ, but text entities do not require a browser to change its internal organization or way of working.

Rationale: distinguishing validation behavior from other behavior allows us to reduce the number of optional features in XML to zero, while retaining (a) the information provider's ability to segment documents into several files and (b) the network client's ability to handle documents without required waits during network fetches. Since text entities in XML must be synchronous, adding an entity to the data structure requires only the replacement of a leaf with one or more subtrees; this allows entity fetching to be delayed if delay is useful to the application.
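
The "replacement of a leaf with one or more subtrees" mentioned in the rationale can be sketched as follows. This is an editorial illustration in Python; the classes `Element` and `EntityRef`, the function `expand`, and the `fetch` callback are all hypothetical names, and no real network access is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

@dataclass
class Element:
    name: str
    children: List["Node"] = field(default_factory=list)

@dataclass
class EntityRef:
    name: str  # an external text entity that has not been fetched yet

Node = Union[Element, EntityRef, str]

def expand(parent: Element, fetch: Callable[[str], List[Node]]) -> None:
    """Replace each EntityRef leaf directly under parent with the
    subtrees that fetch returns for it; other children are untouched."""
    expanded: List[Node] = []
    for child in parent.children:
        if isinstance(child, EntityRef):
            expanded.extend(fetch(child.name))
        else:
            expanded.append(child)
    parent.children = expanded

doc = Element("doc", ["intro ", EntityRef("chap1"), " outro"])
# A non-validating client may show doc as it stands and call expand later.
expand(doc, lambda name: [Element("chapter", ["text of " + name])])
print(doc)
```

Because the unexpanded reference occupies exactly one leaf position, deferring the call to `expand` leaves the rest of the tree unaffected, which is what makes delayed fetching workable.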

As usual, the rationale just given is subject to correction.

-C. M. Sperberg-McQueen

6 November 1996


Date: Wed, 06 Nov 1996 12:30:23 -0800
From: Tim Bray <tbray@textuality.com>
Subject: Recent ERB votes

In a recent series of mail votes and meetings, the ERB has resolved several XML design issues. Under pressure of time, we moved very rapidly and votes may not have been fully and exactly recorded where the sense of the ERB on some issue became quickly obvious. It is possible that ERB members may wish to correct their reported votes. As always, accompanying rationales, where present, have not been reviewed by the ERB and may be subject to correction.

[No item number] Decided unanimously to change PIC for XML to be '?>'. This will allow a lot of things to fit into PI's that currently can't (most notably some proposed server-side scripting languages).
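
To illustrate why the two-character PIC helps: with the single '>' of the SGML reference concrete syntax, a processing instruction ends at the first '>', so script text containing a comparison cannot be carried. The Python sketch below is an editorial example under stated assumptions; the function name `split_pi` and the convention that the target ends at the first space are hypothetical.

```python
def split_pi(markup: str, start: int = 0):
    """Return (target, data, end_index) for the PI opening at markup[start],
    treating '?>' as the closing delimiter."""
    if not markup.startswith("<?", start):
        raise ValueError("no processing instruction at this position")
    close = markup.find("?>", start + 2)
    if close == -1:
        raise ValueError("unterminated processing instruction")
    body = markup[start + 2:close]
    target, _, data = body.partition(" ")
    return target, data, close + 2

source = '<?script if (a > b) emit("x") ?>rest'
target, data, end = split_pi(source)
print(target, repr(data), repr(source[end:]))
```

The '>' inside the data no longer terminates the instruction; only the sequence '?>' does.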

A.8, B.7 XML will have INCLUDE/IGNORE marked sections in DTD's

Passed, Bray and Paoli dissenting.

A.20' XML will change the COM delimiter from '--' to some other string, to minimize user errors. (Candidates: !!, /*, //, **, ??, ;;, ~~, !?, ?!, (), [], others ...)

Defeated, Sperberg-McQueen voting in favor

A.22 XML will have no CONREF attributes (11.3.3, 7.3, 7.9.4.4).

Passed (no CONREF), Kimber and Maler dissenting

B.9' Should XML require system and public identifiers to be FORMAL (13.5)?

This had actually become a discussion of whether to allow the <url> formulation in front of external identifiers, which must be URL's in XML.

Decided, DeRose, Kimber, Maler, and Sperberg-McQueen dissenting, not to allow the <url> prefix.

C.10 Should XML allow nondeterministic content models (11.2.4.3)?

Voted (Bray, Paoli, and Sharpe dissenting) to retain SGML's restriction in this area.

Rationale: Existing SGML tools, for example the SP family, have this rule wired deeply into their logic, and those who wish to use these tools on XML documents won't be able to if they have non-deterministic content models.
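
For readers unfamiliar with the restriction, a model such as (a,b)|(a,c) is the usual example: on seeing "a" a parser cannot tell, without look-ahead, which token it has matched. The Python sketch below is an editorial illustration of one common way to test for this (a position-based check in the style of Glushkov automata). The tuple encoding of content models and the name `is_deterministic` are assumptions of this sketch, and SGML's "&" connector is not modeled.

```python
def is_deterministic(model) -> bool:
    """model is a nested tuple: ("name", x), ("seq", ...), ("or", ...),
    ("opt", m), ("star", m) or ("plus", m)."""
    symbols = []  # symbols[i] is the element name at position i
    follow = {}   # follow[i]: positions that may come directly after i

    def walk(node):
        """Return (nullable, first, last) for node, filling follow."""
        kind = node[0]
        if kind == "name":
            pos = len(symbols)
            symbols.append(node[1])
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind in ("opt", "star", "plus"):
            nullable, first, last = walk(node[1])
            if kind != "opt":  # repetition: the end loops back to the start
                for p in last:
                    follow[p] |= first
            return nullable or kind != "plus", first, last
        parts = [walk(child) for child in node[1:]]
        if kind == "or":
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(t for _, _, t in parts)))
        # kind == "seq": fold the parts from left to right
        nullable, first, last = True, set(), set()
        for n, f, t in parts:
            for p in last:
                follow[p] |= f
            if nullable:
                first = first | f
            last = (last | t) if n else set(t)
            nullable = nullable and n
        return nullable, first, last

    _, first, _ = walk(model)
    # Deterministic if no set of competing positions repeats a name.
    for candidates in [first, *follow.values()]:
        names = [symbols[p] for p in candidates]
        if len(names) != len(set(names)):
            return False
    return True

a, b, c = ("name", "a"), ("name", "b"), ("name", "c")
print(is_deterministic(("or", ("seq", a, b), ("seq", a, c))))
print(is_deterministic(("seq", a, ("or", b, c))))
```

The second model, a,(b|c), accepts the same documents as the first but passes the check, which is why the restriction is usually a matter of rewriting the model rather than losing expressiveness.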

C.14 Should XML allow more than one enumerated type (name-group declared value) to contain the same possible value (11.3.3)?

Voted unanimously to remove SGML's restriction in this area.

Rationale: This is incompatible with 8879, but there is every expectation that WG8 will fix this problem soon; furthermore, making this change is not expected to cause serious inconvenience to existing SGML products, whereas the rule is a very serious inconvenience to users of XML and authors of XML software.

D.2 Should XML provide shorthand ways of summarizing the salient points of a document's DTD?

Discussion:

This turned out to be one of the hardest problems the ERB dealt with, and the key issue became that of EMPTY elements. Remember that in a previous decision we had agreed to recommend the <e/> syntax, but accept the 8879 syntax. Here are some of the sticky parts:

Bearing all this in mind, the ERB voted, Maler dissenting, that:

Rationale: For technical reasons, requiring and allowing only <e/> is a big winner. However, many of us who anticipate an uphill struggle selling XML to web-heads felt that the marketing advantage in making it possible for HTML documents to be valid, and being able to say "XML processors can read HTML", was impossible to give up. In opposition, Eve Maler in particular felt it was unconscionable to kowtow to the requirements of one particular DTD. The ERB acknowledged that allowing <BR>, etc., does not enable XML to grandfather, on a large scale, the existing inventory of HTML; it simply makes it possible to state that (at least some normalized) HTML documents can be read by XML processors.

D.3 Should XML specify short-hand element declaration keywords (e.g. %ANY-ELEMENT;) for element content in which any element in the DTD is legal (same as ANY, but element not mixed content)?

Defeated, Sperberg-McQueen voting in favor.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

7-9 November 1996

COM Delimiter


Date: Sat, 9 Nov 1996 14:51:08 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: A.20' (COM delimiter)

This is one of a series of reports on recent decisions of the SGML ERB.

A.20' XML will change the COM delimiter from '--' to some other string, to minimize user errors. (Candidates: !!, /*, //, **, ??, ;;, ~~, !?, ?!, (), [], others ...)

Decision: No. Dissenting: Sperberg-McQueen.

Conditional Sections in DTDs


Date: Sat, 9 Nov 1996 14:52:45 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: A.8, B.7 (INCLUDE/IGNORE in DTDs)

This is one of a series of reports on recent decisions of the SGML ERB.

A.8, B.7 (merged) XML will have INCLUDE and IGNORE marked sections in DTDs.

Decision: Yes. Dissenting: Paoli, Bray.

CONREF Attributes


Date: Sat, 9 Nov 1996 14:55:09 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: A.22 (no CONREF)

This is one of a series of reports on recent decisions of the SGML ERB.

A.22 XML will have no CONREF attributes (11.3.3, 7.3, 7.9.4.4).

Decision: Yes (no CONREF in XML). Dissenting: Maler.

System Identifiers


Date: Sat, 9 Nov 1996 14:56:02 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: B.9 (<URL> in SYSTEM identifiers)

This is one of a series of reports on recent decisions of the SGML ERB.

B.9 XML will allow SYSTEM identifiers (which in XML 1.0 are required to be URLs) to be prefixed by "<url>".

Decision: No. Dissenting: Maler, Kimber, Sperberg-McQueen, DeRose.

Entity Resolution


Date: Sat, 9 Nov 1996 14:57:46 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: C.4 (Predefined entities)

This is one of a series of reports on recent decisions of the SGML ERB.

Due to some confusion on the part of the Chair, this question got resolved in several pieces, which for purposes of simplicity are somewhat condensed in the following report.

(a) XML will declare a number of entities automatically.

Decision: Yes. Dissenting: Bray.

(b) Users will be able to override the predefined entities.

Decision: No. Dissenting: Sperberg-McQueen.

(Thus, processors shall behave as though declarations for the predefined entities are encountered at the end of the external DTD subset.)

(c) In addition to "lt", "amp", and "gt" (decided in a previous vote), the predefined entities shall include "quot" (for hex 22 -- same as in HTML) and "squot" (for hex 27 -- undefined in HTML).

Decision: Yes.

(d) The predefined entities shall include all those entities specified in the HTML 3.2 specification (the Latin 1 entities plus "copy", "reg", and "nbsp").

Decision: Yes. Dissenting: Bray, Clark.

(e) The predefined entities shall include all the entities recently approved by the HTML ERB for inclusion in the "Cougar" DTD. This means, basically, all of the HTML 3.2 entities plus all of the ISO entities for which characters exist in the Adobe Symbol font set, which is supported across Windows, X11, and Macintosh platforms.

Decision: Yes. Dissenting: Bray, Clark. Abstaining: Maler.

Thus, the list of ISO entities predefined in XML is as follows (list courtesy of Bob Stayton, SCO):

******list******
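
As a minimal sketch of item (c) only, the five entity names given there can be put in a lookup table and expanded in one pass. This is an editorial Python illustration; the names `PREDEFINED` and `expand_entities` are hypothetical, the larger HTML and ISO sets of items (d) and (e) are not reproduced, and references to names outside the table are left untouched.

```python
import re

# The five names from item (c); values follow the hex codes given there.
PREDEFINED = {
    "lt": "<",
    "gt": ">",
    "amp": "&",
    "quot": "\x22",   # hex 22
    "squot": "\x27",  # hex 27
}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")

def expand_entities(text: str) -> str:
    """Expand references to the entities in PREDEFINED in a single pass;
    any other reference is passed through unchanged."""
    def replace(match):
        return PREDEFINED.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(replace, text)

print(expand_entities("a &lt; b &amp;&amp; c &gt; d"))
print(expand_entities("&squot;x&squot; and &quot;y&quot;"))
```

The single pass matters: "&amp;lt;" expands to the text "&lt;" and is not expanded a second time.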

Incomplete DTDs


Date: Sat, 9 Nov 1996 14:58:24 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: C.16 (Attribute values if DTD not complete)

This is one of a series of reports on recent decisions of the SGML ERB.

C.16 When given an incomplete DTD, XML processors and applications may make any assumptions about the treatment of attributes and their values which are consistent with the document. They will be required neither to assume that all attributes are implicitly declared CDATA, nor that attributes with names beginning IDREF, ID, ENTITY, etc. have the types IDREF, ID, ENTITY, etc.

Decision: No. After much discussion, this became:

C.16 In the absence of a declaration, attributes shall behave as if they had been declared CDATA.

Decision: Yes. Dissenting: Hollander.

(Note: The ERB intends to revisit the possibility of reserving certain attribute names such as "ID" during Phase II of this project, during which it will no doubt have to standardize other incursions into the name space in order to specify hypertext mechanisms.)
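
The rule as adopted amounts to a lookup with a fixed fallback. The following Python fragment is an editorial sketch; the table layout and the name `declared_value` are hypothetical. It also shows the consequence noted above: an undeclared attribute that happens to be named "ID" is still CDATA.

```python
from typing import Dict, Tuple

# Maps (element name, attribute name) to the declared value type.
AttlistTable = Dict[Tuple[str, str], str]

def declared_value(attlists: AttlistTable, element: str, attribute: str) -> str:
    """Return the declared type, or CDATA when no declaration was seen."""
    return attlists.get((element, attribute), "CDATA")

attlists = {("sec", "label"): "ID"}
print(declared_value(attlists, "sec", "label"))  # declared, so ID
print(declared_value(attlists, "para", "ID"))    # undeclared, so CDATA
```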

ANY-ELEMENT


Date: Sat, 9 Nov 1996 14:59:02 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: D.3 (XML version of ANY)

This is one of a series of reports on recent decisions of the SGML ERB.

D.3 XML shall specify a short-hand element declaration keyword (e.g. %ANY-ELEMENT;) for element content in which any element in the DTD is legal (same as ANY, but element not mixed content).

Decision: No. Dissenting: DeRose, Hollander.

(Note: several members of the ERB feel that this should be dealt with in the SGML revision.)

Parameter Entities


Date: Sat, 9 Nov 1996 14:59:35 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: D.4 (parameter entities)

This is one of a series of reports on recent decisions of the SGML ERB.

D.4 XML shall allow parameter entities and parameter-entity references.
(a) XML shall allow internal parameter entities.

Decision: Yes. Dissenting: Paoli, Bray.

(b) XML shall allow external parameter entities.

Decision: Yes. Dissenting: Paoli, Bray, DeRose.

Entities


Date: Sat, 9 Nov 1996 15:17:52 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: (Repeat) Decision: C.4 (Predefined entities)

[In my first attempt at this posting, I omitted the list of entities at the end. -- Jon]

This is one of a series of reports on recent decisions of the SGML ERB.

Due to some confusion on the part of the Chair, this question got resolved in several pieces, which for purposes of simplicity are somewhat condensed in the following report.

(a) XML will declare a number of entities automatically.

Decision: Yes. Dissenting: Bray.

(b) Users will be able to override the predefined entities.

Decision: No. Dissenting: Sperberg-McQueen.

(Thus, processors shall behave as though declarations for the predefined entities are encountered at the end of the external DTD subset.)

(c) In addition to "lt", "amp", and "gt" (decided in a previous vote), the predefined entities shall include "quot" (for hex 22 -- same as in HTML) and "squot" (for hex 27 -- undefined in HTML).

Decision: Yes.

(d) The predefined entities shall include all those entities specified in the HTML 3.2 specification (the Latin 1 entities plus "copy", "reg", and "nbsp").

Decision: Yes. Dissenting: Bray, Clark.

(e) The predefined entities shall include all the entities recently approved by the HTML ERB for inclusion in the "Cougar" DTD. This means, basically, all of the HTML 3.2 entities plus all of the ISO entities for which characters exist in the Adobe Symbol font set, which is supported across Windows, X11, and Macintosh platforms.

Decision: Yes. Dissenting: Bray, Clark. Abstaining: Maler.

Thus, the list of ISO entities predefined in XML is as follows (list courtesy of Bob Stayton, SCO):

Case Sensitivity


Date: Sat, 9 Nov 1996 15:21:04 -0800
From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
Subject: Decision: Case sensitivity

In its meeting of Thursday, November 7, the SGML ERB decided that for purposes of the XML 1.0 specification, the 8879 rules for case folding in markup shall be extended by folding lowercase to uppercase according to the Unicode mapping tables (see previous correspondence in the WG for an exhaustive list).

Abstaining: Clark, Hollander.
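
As a rough illustration of what folding by Unicode mapping means for name matching, the Python sketch below compares names after uppercasing. It is an editorial example: `fold_name` and `same_name` are hypothetical names, and Python's built-in `str.upper()` is used as a stand-in for the mapping tables circulated in the WG, from which it may differ in detail.

```python
def fold_name(name: str) -> str:
    """Fold a markup name to uppercase using Python's Unicode case mapping."""
    return name.upper()

def same_name(left: str, right: str) -> bool:
    """Two names match if they are identical after folding."""
    return fold_name(left) == fold_name(right)

print(fold_name("para"))
print(same_name("r\u00e9sum\u00e9", "R\u00c9SUM\u00c9"))  # accented Latin letters
print(same_name("\u03b5\u03bb", "\u0395\u039b"))          # Greek letters
```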

13 November 1996


Date: Thu, 14 Nov 96 15:55:55 CST
From: Michael Sperberg-McQueen <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: ERB discussions and decisions

The ERB met yesterday, 13 November 1996, to discuss the XML working draft and approve the distribution of the current text at SGML '96 next week. We considered a number of topics arising from the draft, some of which have already been discussed, or are still being discussed, on this list, and others of which have not received much discussion. Present: Bosak, Bray (intermittently), Clark, DeRose, Kimber, Maler, Magliery, Paoli, Sharpe, and Sperberg-McQueen. Absent: Hollander.

The author's apologies to busy members of the WG who would prefer a shorter account of the decisions; recent claims on the WG list that the ERB does not explain or discuss its decisions with the rest of the WG have led me, perhaps mischievously, to provide as full a discussion and explanation as my fingers can handle.

There's an executive summary at the end.

Given the number of major topics on which the WG appears not to have reached consensus and the volume of comment lately, it seems safe to say that some issues will require ongoing consideration and discussion, and the text of the working draft which we can distribute next week will be subject to change in non-trivial ways before we can leave this phase of the project's work behind. We considered dropping the plan to distribute printed copies at SGML '96, in order not to give a false impression of completeness. On the whole, however, the ERB thought that having printed copies available would be worthwhile, and we decided to go ahead with the plan. The cover page will, like the current Web copies, identify the document as a Working Draft, so the fact that it's not completely stable should be visible to any reader. And as the experience of the ERB and WG shows, having something that appears completed is one of the best ways to get people to read a draft and comment on it.

Since Henry Thompson raised the question directly: no, it's not too late for comments on substantive issues. The document is a Working Draft, and when the ERB stops work on it and moves to the next phase, it will still be a Working Draft until the W3C advances it to Draft Recommendation status, using the normal W3C procedures. There is some sentiment for avoiding the kind of violent swings in philosophy and technical direction that characterize some working drafts in some organizations, but in principle and in practice, working drafts are subject to change, and discussion about what changes to make is always appropriate unless the rules of the WG make it out of order (e.g. while we focus on some specific issue).

In the meantime, it is too late for typographic corrections to be included in the version distributed at SGML '96.

Other items remaining undiscussed and undecided were implicitly declared editorial questions for purposes of getting copy to the printer in time for SGML '96 distribution. The editors resisted the temptation to seize this opportunity to restore the DSD syntax for markup declarations.

The spec has now gone to the printer; the editors would like to thank those members of the WG who sent us corrections and pointed out errors. It'll be a materially more complete, correct, and less confusing document thanks to your efforts.

- C. M. Sperberg-McQueen

Summary:

11 December 1996


Date: Wed, 11 Dec 1996 12:05:51 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB discussion of public identifiers

We spent a lot of time on this question on Dec. 11th, and it is clear we need some more help from the WG.

The ERB is, by at least a substantial majority, convinced that there is a real need for PUBLIC identifiers in XML.

The ERB is highly concerned, in at least a significant minority, about the effects of putting this facility in without specifying a resolution mechanism. Doing so would contravene one of the major design goals of XML - that any compliant XML processor should be able to read any compliant XML document.

On the other hand, there is also substantial concern about giving an unconditional blessing to any particular name resolution mechanism at this point in history.

Thus, there are a variety of options open to us.

  1. Leave it as it is
  2. Agree that we'll put PUBLIC identifiers into XML when we are ready to specify the resolution mechanism; the practical effect is almost certainly that they don't go in for now.
  3. Put a slot in the syntax for PUBLIC identifiers...

Note that there is a continuum between 3b and 3c; we could place varying strengths of recommendation behind one resolution mechanism, with homilies about document portability.

In the area of which resolution mechanism to (perhaps nonexclusively) bless, SGML/Open catalogs (hereinafter Socats) stand out, and would probably be the ERB's choice. On another hand, a lot of work has gone into the URN effort; on another hand, that work has not yet borne practical fruit in terms of ubiquitous implementations; on another hand, the FPI syntax is repellent to some and it is not clear how well it supports internationalization; on another hand, it may be the case that FPI's really are URN's as they stand.

I suspect that if a binding vote were taken today, the ERB would either (a) reinstate the PUBLIC keyword, and put in a nonexclusive recommendation for Socat support, or (b) refuse to put PUBLIC in until there was agreement on a required resolution mechanism.

Input, please.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

18 December 1996


Date: Wed, 18 Dec 1996 11:17:13 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB decisions on RS/RE and whitespace

On December 18, the ERB took up the question of RS/RE and whitespace. All members were present except James Clark and Eve Maler.

The vote in favor of the following was unanimous.

XML processors, when operating without a DTD, are required to consider all bytes that are not markup to be data and to pass them to the application. When operating with a DTD, the processor may, but is not required to, pass on to the app white space known to be insignificant because it's in element content. In the case where it passes white space on, it must also inform the app that this is element content and so cannot be significant.

The XML Specification will contain an appendix which provides a set of recommendations which, if followed by authors, will ensure that they get a parse tree that will be the same whether or not the DTD is taken into consideration. We didn't discuss the exact contents of this: it will include at least [a] no white space where it might be element content and [b] no defaulted attributes; careful attention will be required from everyone on the list to make sure we get this right.

The -XML-SPACE attribute will be retained, but its role becomes advisory; an XML processor will always pass the data as noted above, and must also pass the value of -XML-SPACE when specified. The allowed values of -XML-SPACE change to "PRESERVE" and "DEFAULT". Formally

  
   -XML-SPACE (PRESERVE|DEFAULT) #IMPLIED

PRESERVE is a signal from the author to the application that all the whitespace bytes are to be considered significant. DEFAULT means that the application's default handling is considered OK. The attribute's value is considered to be inherited by descendent elements of an element for which it's specified. For the root element, the default is DEFAULT. It is an error, in the context of a DTD, for an element with element content to have -XML-SPACE="PRESERVE".
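
The inheritance rule for -XML-SPACE can be sketched as a walk up the ancestor chain. This Python fragment is an editorial illustration; the class `Elem` and the function `effective_space` are hypothetical names, and the error case for elements with element content is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Elem:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Elem"] = None

def effective_space(element: Elem) -> str:
    """Return the -XML-SPACE value in force for element: the nearest
    specified value on the element or an ancestor, else DEFAULT."""
    node: Optional[Elem] = element
    while node is not None:
        value = node.attributes.get("-XML-SPACE")
        if value is not None:
            return value
        node = node.parent
    return "DEFAULT"  # nothing specified up to and including the root

root = Elem("doc")
listing = Elem("listing", {"-XML-SPACE": "PRESERVE"}, root)
line = Elem("line", {}, listing)
para = Elem("p", {}, root)
print(effective_space(line), effective_space(para))
```

Since the attribute is advisory, the value returned here would only tell the application which handling the author asked for; the processor passes the data through either way.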

Further discussion of these issues is unlikely to be read by anyone in their right mind.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

15 January 1997


Date: Wed, 15 Jan 1997 10:22:07 -0800
From: Tim Bray <tbray@textuality.com>
Subject: Changed comment syntax

At a meeting of the ERB, with all present except Magliery and Hollander, it was resolved unanimously to change XML comment syntax so that comments begin '<--*' and end '*-->'. We will still require, pending clarification of this by WG8, that for compatibility, neither '--' nor '--*' can appear in the body of comment; we are hopeful that the '--' restriction can be lifted soon. This will require changes to section 2.5, and to productions 19 and 21. - Tim

22 January 1997


Date: Wed, 22 Jan 1997 13:21:58 -0900
From: "W. Eliot Kimber" <eliot@isogen.com>
Subject: Relationship Taxonomy Questions

All,

The ERB would like to ask the list to address the following questions relating to relationship (link) typing.

The questions:

Assuming that there is a distinction between link behavior and the relationship types that links represent, and in particular, a distinction between behavioral "primitives" and relationship "primitives":

1. Is it necessary or useful for XML to define some finite set of well-defined relationship types or primitives?

Our presumption, as yet unproved, is that the interoperation of XML documents within some general purview (e.g., the Web, as opposed to domain-specific purviews, such as a particular intranet) requires some basic set of link types whose meaning is well defined and understood. This presumption is based in part on the opinion that typing links is in fact a useful thing to do for some types of information.

We take it as a given that the set of possible relationship types is unbounded.

2. If the answer to question 1 is "yes", what is the list of types?
3. Given such a list, A. can these types be considered to be a set of supertypes from which new types may be derived? B. If so, what mechanisms could or should be used to define such a class hierarchy?
4. Is there a preferred formalism, in terms of prose rhetoric, formal notation, or both, by which the meaning of relationship types should be expressed? NOTE: this formalism cannot consist only of behavior specification (although it may include a behavior specification).

NOTE: The issue of behavior primitives is not open for discussion at this time. The behavior issue will be taken up after the base link representation syntax and link typing issues have been sufficiently resolved.

Definition of terms:

behavior
What happens when some agent interacts with the link, either directly or by interaction with one of its link ends. Behavior includes all of what happens in user interfaces, and could also include behaviors of translators, processors, query engines, etc. In the general case, behavior is not permanently (and exclusively) bound to data objects (i.e., the SGML content vs. style model). However, some element types or base element type classes may have semantics that largely or exclusively suggest a particular behavior (e.g., <font>), although it is generally regarded as poor practice for most applications (partly because implementation of the suggested behavior cannot be universally enforced).

In the SGML model, behavior can be considered an aspect of "style" or presentation and may be defined explicitly through "style sheets" or "processing specifications" or may be embedded into a particular browser or processor (e.g., HTML browsers pre CSS). In this broad definition of the term style, mechanisms such as scripts, controls, and plug-ins could all be considered aspects of style specification.

At this point we are assuming that behavior will be specified both in some normative way in an XML specification and, at user option, through some as-yet-undetermined behavior specification system or systems (e.g. "link style sheet").

relationship
A semantic association among two or more objects intended to describe the nature of the association. Relationship types may be thought of as analogous to element types in SGML, such that where element types classify data objects, relationships (and thus links) classify associations among data objects. Like element types, relationship types can range from the very general ("linked") to the very specific ("Counterargument").

Our assumption is that links always represent relationships of some defined (albeit possibly very general) type. In the likely syntax design model, the link type will be named either through the element type of a link element or through an attribute that defines the type name.

Relationships are distinguished by relationship type. In addition, some relationship description models may further describe relationships by naming the roles in the relationship (e.g., the HyTime hyperlink model). As for element types, some relationship types may largely or exclusively suggest a particular behavior (e.g., <goto>). Such relationship types are poor practice only when their use fails to identify a more specific relationship type that would enhance the value of the information in the scope of its expected application and use (in other words, when you don't care about the link type, there's little value in being more specific than "link", but if your system expects and depends on typed links, you'd better type them).

Relationships may be implicit in the data structure (i.e., the hierarchical relationships defined by SGML markup) or explicit through hyperlinks or other associative systems (i.e., relational databases).

semantic
The "meaning" associated with a type. The term "semantic" is dangerous because it is overloaded and can mean different things in different contexts. In this discussion, we are trying to clearly differentiate meaning, which is abstract, from behavior, which is concrete. In general, there is a one-to-one relationship between a type and its semantic, but a one-to-many relationship between a type and its possible behaviors. In other words, a type's semantic doesn't change, but that semantic may be interpreted into specific behaviors in a variety of ways depending on the use to which the type is put or the arbitrary whim of the behavior specifier.

Cheers,

Eliot.


--
W. Eliot Kimber (eliot@isogen.com)
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be re-educated soon..." --Austin Lounge Lizards, "1984 Blues"

31 January 1997


Date: Fri, 31 Jan 1997 10:47:54 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB publishes discussion framework and vote schedule for XML-Link

The ERB met today and agreed to proceed with building the XML-Link spec as follows.

I have abstracted from the initial draft spec a set of 82 discussion points, each of which is identified with reference to language in the draft spec.

The basic idea is to proceed as we did with XML - I will email out each of these to the WG as a discussion item, requesting people, where possible, to use the subject line so created in order to keep us moving forward. Obviously, there are some cases where a set of questions is closely related; in quite a few cases, I will exercise editorial judgement and mail out a small batch of questions under a single title, where it seems sane.

These fall nicely into 5 groups, corresponding to the 5 top-level sections in the draft spec. The plan goes like this:

We have to have our draft nailed down in time for distribution at WWW6, which begins April 7; that means the copy must be finalized by about the end of March. The fact that the planned voting ends on March 12 gives us a (painfully small) bit of slack to deal with hot items or others that will undoubtedly spill out of the voting process.

To provide a larger-scale context for the discussion, I have a primitively HTML-ified version of the full question list at http://www.textuality.com/sgml-erb/xml-link-work.html

Please note that the numbers do not predispose any decisions about the final spec; they are simply a convenient mechanism for associating questions with existing spec language that gives background for them.

Stand by for batches 0 and 1.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

12 February 1997


Date: Wed, 12 Feb 1997 13:42:50 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB meeting of Feb. 12th

The ERB met Feb. 12th. Present were Bosak, Bray, Clark, DeRose, Magliery, Maler, Paoli. All decisions were unanimous.

1. The title of the spec will be "Hypertext Links in XML". There will be no new acronym, XHL or XHA or anything. The URL fragment will be WD-xml-link. The URL fragment for the XML syntax spec will be WD-xml-syntax. The URL fragment WD-xml will point to a tiny document just containing pointers to WD-xml-link, WD-xml-syntax, and presumably at least one more part in the future.

2. Links will be expressed as XML elements. We will write the spec so that the only other spec it depends on is xml-syntax. Obviously, links will be SGML elements as well.

3. We deferred the question of a link processor until we have more of the spec done; if we need to define a link processor in order to meet our specification goals, we will.

4. We deferred the question of a mechanism for signaling what link machinery is being used until we know what machinery is available.

5. We decided that formatting issues are outside the scope of XML linking, and we will neither discuss them nor provide a special attribute nor any other machinery in this specification for communicating formatting information. Note that we fully appreciate that the distinction between formatting and behavior is troublesome at best; this decision does not prejudice the possibility that XML links may contain behavior attributes and that the spec may predefine certain behaviors. In the ERB discussion it became obvious that lots more work is needed on this particular area.

6. We agreed that if we say that the links are elements and attributes, this provides all the syntax definition that we need; thus no additional BNF is required in the specification. [Ed note: Yes!]

7. We agreed that no special language is required in the spec to say that the links must be well-formed in the XML sense. While this spec is primarily designed for use in the XML domain, there seems nothing to gain in placing barriers in the way of full SGML processors that may wish to use this machinery in non-well-formed documents.

8. We spent the rest of the meeting arguing over details of terminology, without coming to a resolution; an additional meeting has been scheduled for Saturday morning [yecch] to enable us to finish working through this and the remaining 1.* questions.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

15 February 1997


Date: Sat, 15 Feb 1997 20:10:49 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB terminology votes

The ERB met Sat. Feb. 15th. Present: Bosak, Bray, DeRose, Magliery, Maler, Paoli, Sperberg-McQueen. All decisions were unanimous.

We spent most of the time on the issue of terminology detail. Although this was not articulated formally, some underlying design principles seem to have guided us:

  1. We should re-use Web terminology where appropriate (thanks to Dan for this input)
  2. We should not be afraid of lengthier English compound constructions as opposed to single words, when this makes things easier to understand and explain (thanks to Liora)
  3. We should distinguish clearly between terms for the underlying Platonic concepts and those for the syntactic constructs (thanks to Henry)

We had discovered that, even at this late date, there was still room for confusion as to which bits were which; so Steve and I, inspired by Henry, cooked up a simple picture that was very helpful:

  
<BOOK><A NAME="foo" HREF="http://x.com/y/z.html#SEC1">Click here</A></BOOK>
|------------------------------p0-----------------------------------------|
      |------------------------p1----------------------------------|
                    |----------p2-------------------|
                          |----p3------------------|
                          |----p4-------------| |p5|
 
<BOOK><SEC ID="SEC1">Thank you for clicking to get here.</SEC></BOOK>
|------------------------------q0-----------------------------------|
      |------------------------q1----------------------------|
 

1. The relationship which the "<A>" element asserts the existence of is called a "link".

There is an interesting ontological debate as to whether the link is in fact the assertion, or whether the link already existed and the linking machinery merely describes it, but it is probably not necessary to resolve this for the purposes of the spec. I will cheerfully argue this point with anyone as long as they keep buying the necessary beer. WWW theory, as pointed out by Dan Connolly, is explicit that the link is the assertion.

2. An XML or SGML element (example: p1) which serves as the syntactic expression of a link is called a "linking element".

3. A participant in a link relationship (example: q1) is called a "resource". Our definition will be very similar to the official WWW definition, found in http://www.w3.org/pub/WWW/Architecture/Terms which everyone on this list should go and read. That definition is:

an addressable unit of information or service in the Web. Examples include files, images, documents, programs, query results, etc.

In our case we should not limit it to "in the Web". Note that a resource could include the results of an SQL query, a temporally limited section of a video clip, or the invocation of a script that flushes a toilet in Tuktoyaktuk.

There is an interesting debate, in the case of the example, as to whether one or two resources are involved. Clearly, "q1" is a resource. If there is another resource, it is probably the linking element itself, "p1". It is clear that in some cases (independent links or out-of-line links or whatever), a linking element need not be a resource. Unlike the ontological debate mentioned above, we are going to have to decide this one to get a clean spec.

4. A string used to specify a resource (example: p3) is called a "locator". It might be a name or an address or a query expression; one way or another it is undeniably used to locate the resource.

5. An attribute containing a locator (example: p2), is called a "locator attribute". Should we end up, in the case of multi-ended links, using subelements to hold locators, they would be called "locator elements".

Note that a few items that are labeled in the picture do not appear in this discussion. They appear because our discussion revealed that we may not be finished with the terminology battle; there may be some more concepts that are worthwhile nailing down. My next message will present these issues for further discussion.
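The terms above can be traced through the picture with a minimal Python sketch. It is illustrative only: the reading of p4 and p5 as the URL proper and the fragment identifier is an assumption (the report leaves those labels undefined), and the lookup of q1 by its ID attribute is simply one plausible way to resolve the fragment.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# The two documents from the picture above.
source = '<BOOK><A NAME="foo" HREF="http://x.com/y/z.html#SEC1">Click here</A></BOOK>'
target = '<BOOK><SEC ID="SEC1">Thank you for clicking to get here.</SEC></BOOK>'

book = ET.fromstring(source)            # p0: the document element
linking_element = book.find("A")        # p1: the linking element
locator = linking_element.get("HREF")   # p3: the locator, held in locator attribute p2

# Assumption: p4 and p5 are taken to be the URL part and the fragment identifier.
url_part, fragment = urldefrag(locator)

# q1: the resource, found here by matching the fragment against an ID attribute.
resource = next(el for el in ET.fromstring(target).iter() if el.get("ID") == fragment)

print(linking_element.tag, locator, url_part, fragment, resource.tag)
```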

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

19 February 1997


Date: Thu, 20 Feb 1997 15:08:29 -0500
From: "Steven J. DeRose" <sjd@ebt.com>
Subject: DRAFT: Summary of ERB conference call

The following is a summary of the ERB's telephone conference of 2/19/97, ending with a question to the WG:

We considered questions 2.1 through 2.4.

2.1.a Should we allow link recognition via a reserved attribute?

The ERB is strongly leaning to yes.

2.1.b If so, should we generalize this and say that it's an AF?

2.1.c If so, should we provide an introduction to AF's?

We achieved consensus that we should provide clear self-sufficient documentation; readers should not have to understand the extended facilities annex to understand what we are talking about. We should not go further: no general introduction to AFs. We should mention at least once that it is an AF; with the reference to HyTime probably.

2.1.d If we allow such recognition, what should the attribute be, and what should be the values for each element type we define?

This remains undecided. We are leaning toward XML-Link, but the nature of values must remain uncertain until we decide about the constructs the values are naming. There is some thought of having xml-tlink and xml-ilink as attributes, whose value has all the information we need, e.g. something roughly like

 
  xml-tlink="url (http://www.uic.edu/orgs/tei/p3) id (foo) child (3 p)"

(This is an illustrative example ONLY. It is NOT a proposal.)
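To make the shape of such a value concrete, here is a minimal Python sketch that splits it into keyword/argument pairs. Like the value itself, it is illustrative only: the "keyword (argument)" grammar is inferred from the single example above, and the function name parse_steps is hypothetical.

```python
import re

def parse_steps(value):
    """Split a value like 'url (x) id (foo)' into (keyword, argument) pairs."""
    # Assumed grammar: a keyword, optional spaces, then a parenthesized argument
    # that itself contains no parentheses.
    pairs = re.findall(r"([\w-]+)\s*\(([^)]*)\)", value)
    return [(keyword, argument.strip()) for keyword, argument in pairs]

steps = parse_steps("url (http://www.uic.edu/orgs/tei/p3) id (foo) child (3 p)")
for keyword, argument in steps:
    print(keyword, "->", argument)
```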

2.2.a Should we allow link recognition via a reserved GI?

This remains hotly in question; see below. It should be noted that the HyTime TC introduces some of this effect by default: a GI that matches the name of an active architectural form defaults to being of that form.

2.2.b If so, what should the set of GI's be?

The same as the keyword values of xml-link attribute (if it has keyword values ...)

2.3 Should we provide a PI or other signaling mechanism whereby a document can specify that particular elements ought to be processed as link elements?

This remains unclear; see below. If the proposed SGML TC fails for any reason and we never have multiple attlists, this is the way to go (this appears to be the consensus, or at least the majority view).

2.4: Should we allow that processors can decide that something is a link for their own unspecified reasons, e.g. hardwired knowledge of their own private element types, or external interaction?

The consensus was that we should not forbid, discourage, encourage, or mention this more than necessary.

Overall:

We almost all agree that the best long-term solution is to allow multiple ATTLIST decls, so a document can start

  
   <!DOCTYPE tei.2 public "-//TEI//DTD P3//EN" [
   <!ATTLIST xref
             xml-link    CDATA   #fixed "xml-tlink" >
   ]>

This documents that the tei element <xref> is a tlink in XML terms.
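A minimal Python sketch of how a processor might recognize linking elements from such a declaration follows. It makes several assumptions: the document is hypothetical and carries only an internal subset (no public identifier), the keyword is written #FIXED in upper case as XML requires, and the parser is the expat-based one in the Python standard library, which reports attribute defaults declared in the internal subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal subset fixes xml-link on every <xref>.
doc = """<!DOCTYPE text [
  <!ATTLIST xref xml-link CDATA #FIXED "xml-tlink">
]>
<text><p>See <xref target="ch2">chapter 2</xref>.</p></text>"""

root = ET.fromstring(doc)

# Any element carrying the reserved attribute (explicitly or by default)
# is treated as a linking element.
links = [el for el in root.iter() if el.get("xml-link") is not None]

for el in links:
    print(el.tag, el.get("xml-link"))
```

The instance itself never mentions xml-link; the value arrives through the declaration, which is the terseness the multiple-ATTLIST approach is after.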

We think we can have multiple attlists when the TC passes. It's not clear what to do in the meantime. Choices include:

1 Play it Safe. Document the magic attribute and go no further yet. To use it with validating systems, you'll have to monkey with the actual DTD. XML docs will otherwise be verbose.

After the TC passes, we'll add documentation for doing it with multiple attlists, so you don't have to monkey with the DTD.

2 Count on Utopian ATTLISTs. Document the magic attribute and the use of multiple attlists. After the TC passes, XML will be kosher. Until it passes, it's non-conforming (although individual documents can choose to be SGML-conformant or not). If the TC fails, we have to go back and add a PI.

3 Stopgap PI. If we can't stand the verbosity of Playing it Safe, and can't risk counting on Utopian ATTLISTs, we need a stopgap. The simplest seems to be to define a PI for the necessary function, e.g.

  
  <?ATTLIST xref xml-link CDATA 'xml-tlink'  ?>
or
 
  <?xml-attlist xref xml-link cdata 'xml-tlink' ?>

In the short term, we have no verbosity problems. In the long term, after the TC passes, we withdraw the PI and use only multiple attlists. If the TC fails, we change nothing.

Drawbacks: planned obsolescence of this syntax may be hard to enforce. Once we have XML legacy data, removing the PI in version 2 may be hard. (Of course, XML-Link won't be final until November or so; we should know by then. Until November, we should -- by definition -- not have legacy data. All XML data made before then is experimental and has no claim on our protection.)

4 GI escape hatch. Like playing it safe, but to avoid the verbosity problem we at least allow GIs to be recognized, so in the long term you can be terse by using multiple attlists and, in the short term, by using magic GIs.
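For option 3, the extra implementation a stopgap PI would require can be sketched in a few lines of Python. This is a sketch under stated assumptions, not a proposed design: it uses the second PI spelling shown above, assumes the PI data is four whitespace-separated fields, and the function name link_types_from_pis is hypothetical.

```python
import xml.parsers.expat

def link_types_from_pis(document):
    """Collect {element name: link type} from <?xml-attlist ...?> PIs."""
    table = {}

    def on_pi(target, data):
        if target != "xml-attlist":
            return
        # Assumed layout: element-name attribute-name declared-type 'value'
        element, attribute, _declared_type, value = data.split()
        if attribute == "xml-link":
            table[element] = value.strip("'\"")

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(document, True)
    return table

doc = "<?xml-attlist xref xml-link cdata 'xml-tlink' ?><text><xref>chapter 2</xref></text>"
table = link_types_from_pis(doc)
print(table)
```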

The ERB requests that the WG consider the question of how best to deal with this signalling issue, given the options and tradeoffs (and any others the WG may perceive).

Steve
(with copious thanks to Michael and his notes)

22 February 1997


Date: Mon, 24 Feb 1997 16:22:11 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB decisions on linking element recognition

The ERB met Sat. Feb. 22. Present: all but Clark and DeRose. Discussion was on the subject of recognition of linking elements. The decisions we have apparently taken do leave us with some fairly serious concerns, so we are submitting both the decisions and the concerns to the WG on the theory that someone may convince us either that the concerns are overblown, or that they are understated and that we should proceed to a fallback position.

I. Recognizing linking elements via attributes

With respect to recognition of linking elements, the ERB has consensus that the best way to do this is with reserved attributes. The attribute name should probably be "XML-LINK". There is, however, a consequence which could lead to direct conflict between the desire for operational simplicity and that for document validity.

The problem is, how to declare and provide a default value for the XML-LINK attribute?

1. The case where no markup declarations are provided: plan A: supply the attribute for each element that is a linking element.

2. The case where only the internal subset is to be provided: plan B: declare the attribute with a #FIXED default value in an <!ATTLIST in the internal subset.

Both of these are perfectly viable; plan A might be sensible even with an internal subset, if the number of linking attributes is small.

3. The case where there is an external DTD subset:

But plan C means the client has to fetch the external subset in order to get the necessary declaration. Which is a violation, we think, of our axioms. To avoid that, stick it in the internal subset. Oops, then either

Plan C1 is operationally infeasible. C2 violates another of our axioms, that XML should support operations on valid documents.

Of course, the problem goes away if 8879 is modified to remove the current prohibition on multiple <!ATTLIST declarations; and we hope that this will happen in the not-too-distant future.

However, the ERB is convinced that for Web viability, there must be a signaling mechanism that's within the document instance that gets sent down the pipe. So, if it seems that the conflict with 8879 doesn't go away, we will avail ourselves of an escape hatch; at the moment there seem to be four options:

There was sharp dissension in the ERB on what to do given the uncertainty on what WG8 will do. Several members, while respecting the integrity and appropriateness of the <!LINKTYPE technique, find the syntax, and the prospect of explaining it, repellent. Nonetheless, the <!LINKTYPE technique, if only as an interim measure, remains the choice of at least one member. The PI technique is nicer looking and easier to explain, but requires extra implementation. It also (I think) remains the first choice of one or more ERB members. Some feel that the <!ATTLIST in the subset has the advantage of requiring no extra syntax beyond that in base XML, and the problem of the conflict between manageability and maintaining SGML validity would be a non-issue, operationally. There is also concern on the part of ERB members about adopting interim measures at all, especially while there is active work going on among WG8 members in an effort to address the concerns raised by the XML work.

By a vote of 8 to 1 (Sperberg-McQueen dissenting), the ERB tentatively decided to

II. Recognizing linking elements via GI

The ERB voted as follows: In favor of allowing recognition via GI: Bray, Magliery, Maler, Kimber. Against: Bosak, Hollander, Paoli, Sharpe, Sperberg-McQueen. Thus the measure fails.

The arguments here are simple. In favor of doing this are the facts that it's easy to explain, and that this is the specified default behavior for an architecture anyhow. Against it are the benefits of having only one way to do things, with the accompanying desirable shrinkage and simplification in the specification. Given the closeness of the vote, I think I can speak for most of us in saying that it was a pretty close call, and most of the ERB, regardless of the way they voted, probably could have lived with it going either way; the arguments on both sides are palpably good, and the consequences of a wrong choice don't seem that severe.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

26 February / 1 March 1997


Date: Sat, 01 Mar 1997 18:39:22 -0800
From: Tim Bray <tbray@textuality.com>
Subject: ERB work on 3.* (Linking Elements) issues

The ERB has now put two meetings' work in on this set of issues and is nowhere near done. Not surprising, given the importance of the issues. One of the factors holding us back a bit has been the fact that the discussion in the WG on the 3.* issues has been lacking in both volume and depth. Reasons for this might be (a) that the WG is tired (the ERB is), (b) that the WG is busy on other things, and (c) that the WG has substantially less experience in these issues than in those that came up in the XML language discussion.

Partially as a consequence, the following decisions include some constructs and ideas that were made up on-the-spot in the ERB without WG discussion. For these reasons, this set of decisions would benefit from particularly close review by the WG.

Unfortunately, due to workload and brain dysfunction caused by illness, I do not have ERB attendance rosters and votes. However I believe that none of the decisions below had any dissenters; if I've missed any I'd appreciate corrections from the wronged ERB members.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

3.a List link required?

3.a. The initial draft does not have any construct analogous to the "List Link" construct in Eliot's proposal. Do we need one?

The ERB detects no strong requirement to proceed on this in the near term. Once the shape of the Extended Link is better-defined, we may wish to revisit this.

3.1.a All linkage info in markup not data?

3.1.a Should we have a principle that all linkage information is encoded in GIs and/or attribute values, never in character data?

No. ERB Consensus was that this doesn't need saying and may unduly constrain design decisions.

3.1 b-h: ROLE & LOCATOR attributes

Dimension 1: Which pieces of information should be specified for possible inclusion in linking elements?

ERB consensus:

The decision on LOCATOR SCHEME is deferred until we tackle addressing

The question of whether these are optional or required remains undecided.

3.1 b-h: LABEL attributes

Dimension 1: Which pieces of information should be specified for possible inclusion in linking elements? ... e. caption ...

ERB consensus:

However, the ERB is unhappy with allowing only a simple character string; how does one support multilingual labeling, or labels which are graphic? Our provisional solution is to support both a RESOURCE LABEL attribute and/or a RESOURCE LABEL LOCATOR attribute, the latter being used to locate and retrieve a structured label. But then do we need a LABEL LOCATOR SCHEME? Or make it use the same scheme as the resource locator? We decided to defer this until we had a better understanding of linktypes and addresses.

3.2.b Locators in attributes or elements?

3.2.b Should the locators of a general link be packaged in attributes as in HyTime, or as child elements as in the initial draft?

Consensus of the ERB is to use subelements to package up locators in extended links. The simplicity of doing it all with attributes was appealing, but there are two big problems:

Eliot assures us that this is legal HyTime, given the right grove plan.

3.4 New Item: unify simple and extended links

During the meeting of March 1st, the ERB agreed to package up xlink locators in subelements. Jean Paoli pointed out [but I have agreed to bring this forward since he's on the road] that the declarations and attlists for simple and extended links are very similar; and it might be appealing to allow one locator to exist within the start-tag of an extended link, such that an extended link is just a simple link with child elements. E.g.

 
<a role="3-way" href="#lab1"><extra href="#lab2"><extra href="#lab3"></a>

One virtue of this is that we can go to the web-heads and say "not only have we given you a powerful extended link facility, but you can do it just by adding children to your existing <A> elements".
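A minimal Python sketch of what the unification would mean for a processor: one routine gathers locators from either kind of link. It rests on assumptions beyond the text: the <extra> children are written as empty-element tags so that the example is well-formed XML, and the function name locators is hypothetical.

```python
import xml.etree.ElementTree as ET

def locators(linking_element):
    """Return the element's own locator (if any), then those of its children."""
    own = [linking_element.get("href")] if linking_element.get("href") else []
    return own + [child.get("href") for child in linking_element if child.get("href")]

# A simple link and the three-way extended link from the example above.
simple = ET.fromstring('<a href="#lab1">Click here</a>')
extended = ET.fromstring(
    '<a role="3-way" href="#lab1"><extra href="#lab2"/><extra href="#lab3"/></a>'
)

print(locators(simple))
print(locators(extended))
```

Under this reading, a simple link is just the degenerate case in which the child list is empty.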

Some problems come up:

Input requested.

3.1 b-h: BEHAVIOR

Dimension 1: Which pieces of information should be specified for possible inclusion in linking elements? ... f. behavior ...

This one has been tough. Since we put in immense amounts of time, I will try to reproduce some of our discussions.

Some of the options:

  1. say nothing about behavior; leave it to the apps and to stylesheeting
  2. provide a behavior bucket; an attribute in which to pass behavior info, but specify nothing about what goes in there
  3. provide one or two attributes governing simple abstract axes of link behavior policy, with lots of room for user-agents/clients to devise mechanisms to meet the policies
  4. provide a rich, detailed, set of behavior specification rules that people such as users of EBT products, TEI, and HyTime have come to expect.

Options 1 and 2 seem the safest in terms of avoiding doing something really stupid. However, there was stringent opposition from those speaking on behalf of the authors, who wanted some (even if only abstract) way to signal whether a link, when activated, should cause the replacement of the current display, or transclusion behavior; they claimed that without this, there was no interoperability. It is also material that on the WWW, there are at le