]>
`SGML-Lite' is a name given to a set of conventions that, when applied to an SGML document, enable a parser to extract all information contained in that document from the document instance alone, without reading the DTD or the document subset.
The creator of an SGML-Lite document must make sure that certain features of SGML are not used. When the document is created `by hand', that may mean that an SGML normalizer, like spam by James Clark, is used to insert omitted tags, etc. A document created by an SGML application is usually already normalized.
An SGML-Lite document is also an SGML document, so it can still be used in SGML-conformant applications. SGML-Lite only restricts the way the markup can be formatted, but it doesn't affect the document structure or contents. In other words, the ESIS (Element Structure Information Set) may be represented in a certain concrete syntax, but not in others.
Fortunately, nearly all SGML documents can be reformatted in this way. Only documents that use LINK, CONCUR or SUBDOC cannot be represented in SGML-Lite, but these features are rarely used anyway.
These are the restrictions that make an SGML document SGML-Lite conformant:
The character set must be Unicode. Note that Latin 1, as a subset of Unicode, is also acceptable. To be precise, the SGML declaration must contain, explicitly or implicitly:
    CHARSET
    BASESET "ISO 10646:199?//CHARSET ..."
    DESCSET
      0 9 unused     9 2 9      11 2 unused     13 1 13
      14 18 unused   32 95 32   127 65408 127
  
  or a more restrictive set.
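
To make the DESCSET above concrete, here is a minimal sketch in Python that reads each triple as (first character number, number of characters, base-set number or `unused') and tests whether a given character number is usable. The names DESCSET and is_allowed are illustrative and are not part of SGML-Lite; the table is copied from the declaration fragment above.

```python
# Each entry: (first character number, count, base-set number or None for "unused").
# Values are copied from the DESCSET shown above.
DESCSET = [
    (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13),
    (14, 18, None), (32, 95, 32), (127, 65408, 127),
]


def is_allowed(code_point):
    """Return True if the character number is described and not marked unused."""
    for start, count, base in DESCSET:
        if start <= code_point < start + count:
            return base is not None
    # Character numbers outside every range are not described at all.
    return False
```

Read this way, the only usable control characters below 32 are 9, 10 and 13, which are the TAB, RS and RE characters of the reference concrete syntax.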
The syntax of delimiters must be as in the Reference Concrete Syntax (RCS). That means that `<' and `>' delimit tags, `&' and `;' delimit entity references, etc. The SGML declaration for the document must contain:
    SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Core//EN"
  
  or, equivalently (SGML, clause 14):
    SYNTAX
    SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
      16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
    BASESET "ISO 646:1983//CHARSET International Reference
      Version (IRV)//ESC 2/5 4/0"
    DESCSET 0 128 0
    FUNCTION RE 13 RS 10 SPACE 32 TAB SEPCHAR 9
    NAMING LCNMSTRT "" UCNMSTRT "" LCNMCHAR "-." UCNMCHAR "-."
      NAMECASE GENERAL YES ENTITY NO
    DELIM GENERAL SGMLREF SHORTREF SGMLREF
    NAMES SGMLREF
    QUANTITY SGMLREF
  
  except that QUANTITY need not be obeyed by the document. (This means that an SGML-Lite application won't know whether a document is too large without actually trying, but this is seldom a problem.)
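
Because the delimiters are fixed, markup can be recognized in the document instance without consulting the DTD. The following Python fragment is a rough sketch of that idea, not a conforming parser: it assumes a normalized document, ignores comments, declarations and processing instructions, and does not handle `>' inside quoted attribute values. The names NAME, TOKEN and tokenize are invented for this illustration. Element names are folded to upper case and entity names are left alone, following NAMECASE GENERAL YES ENTITY NO above.

```python
import re

# A name starts with a letter and continues with letters, digits, '-' or '.',
# as given by the NAMING section above.
NAME = r"[A-Za-z][A-Za-z0-9.\-]*"

TOKEN = re.compile(
    rf"</(?P<etag>{NAME})\s*>"                # end tag:          </name>
    rf"|<(?P<stag>{NAME})(?P<attrs>[^>]*)>"   # start tag:        <name ...>
    rf"|&(?P<ent>{NAME});"                    # entity reference: &name;
    rf"|(?P<data>[^<&]+|[<&])"                # anything else is data
)


def tokenize(instance):
    """Yield (kind, value) pairs for a normalized document instance."""
    for m in TOKEN.finditer(instance):
        if m.group("etag") is not None:
            yield ("end", m.group("etag").upper())
        elif m.group("stag") is not None:
            # Attribute text is skipped in this simplified sketch.
            yield ("start", m.group("stag").upper())
        elif m.group("ent") is not None:
            yield ("entity", m.group("ent"))
        else:
            yield ("data", m.group("data"))
```

For example, list(tokenize("<p>A &amp; B</P>")) gives a start tag P, the data "A ", the entity reference amp, the data " B", and an end tag P.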
The document may not use DATATAG or RANK. OMITTAG and SHORTTAG may be used with some restrictions:
LINK, CONCUR, and SUBDOC may not be used:
    LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
    OTHER CONCUR NO SUBDOC NO
  Ignorable whitespace in content may not be used, except for some RS and RE characters as noted below. In other words, all whitespace in an SGML-Lite document will always be data characters. The only exceptions are that RS characters may occur in content and are always ignored, and that some ignorable RE characters (a subset of those that are ignored under clause 7.6.1 of SGML) may occur. In SGML-Lite, clause 7.6.1 simplifies to:
There is still the problem of recognizing empty elements. There are several possible solutions:
Bert Bos, 4 July 1995