`SGML-Lite' is a name given to a set of conventions that, when applied to an SGML document, enable a parser to extract all information contained in that document from the document instance alone, without reading the DTD or the document subset.
The creator of an SGML-Lite document must make sure that certain features of SGML are not used. When the document is created `by hand', that may mean that an SGML normalizer, like spam by James Clark, is used to insert omitted tags, etc. A document created by an SGML application is usually already normalized.
An SGML-Lite document is also an SGML document, so it can still be used in SGML-conformant applications. SGML-Lite only restricts the way the markup can be formatted, but it doesn't affect the document structure or contents. In other words, the ESIS (Element Structure Information Set) may be represented in a certain concrete syntax, but not in others.
Fortunately, nearly all SGML documents can be reformatted in this way. Only documents that use LINK, CONCUR or SUBDOC cannot be represented in SGML-Lite, but these features are rarely used anyway.
These are the restrictions that make an SGML document SGML-Lite conformant:
The character set must be Unicode. Note that Latin 1, as a subset of Unicode, is also acceptable. To be precise, the SGML declaration must contain (or contains implicitly):
CHARSET
  BASESET "ISO 10646:199?//CHARSET ..."
  DESCSET   0     9 unused
            9     2 9
           11     2 unused
           13     1 13
           14    18 unused
           32    95 32
          127 65408 127
or a more restrictive set.
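To make the DESCSET notation concrete, here is a minimal Python sketch of how a reader might check a text against it. Each triple is read as (first character number, number of characters, base-set number or unused). The names DESCSET, base_code and undeclared_characters are made up for this illustration and are not part of SGML or of any existing parser; the triples are copied as they appear above.

```python
# Triples copied from the DESCSET above:
# (first character number, count, base-set number or None for "unused").
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 65408, 127),
]


def base_code(code, descset=DESCSET):
    """Return the base-set number for `code`, or None if it is unused or not described."""
    for first, count, base in descset:
        if first <= code < first + count:
            return None if base is None else base + (code - first)
    return None


def undeclared_characters(text, descset=DESCSET):
    """List the characters of `text` that the DESCSET does not permit."""
    return [ch for ch in text if base_code(ord(ch), descset) is None]
```

For example, undeclared_characters applied to a string that contains a form feed (character 12) reports that form feed, because 12 falls in the range "11 2 unused".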
The syntax of delimiters must be as in the Reference Concrete Syntax (RCS). That means that `<' and `>' delimit tags, `&' and `;' delimit entity references, etc. The SGML declaration for the document must contain:
SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Core//EN"
or, equivalently (SGML, clause 14):
SYNTAX
  SHUNCHARS CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
            19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
  BASESET "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
  DESCSET 0 128 0
  FUNCTION RE 13
           RS 10
           SPACE 32
           TAB SEPCHAR 9
  NAMING LCNMSTRT ""
         UCNMSTRT ""
         LCNMCHAR "-."
         UCNMCHAR "-."
         NAMECASE GENERAL YES
                  ENTITY NO
  DELIM GENERAL SGMLREF
        SHORTREF SGMLREF
  NAMES SGMLREF
  QUANTITY SGMLREF
The one exception is that QUANTITY need not be obeyed by the document. (This means that an SGML-Lite application won't know if a document is too large without actually trying, but this is seldom a problem.)
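The NAMING part of the syntax above fixes what a name looks like: it starts with a letter, continues with letters, digits, `-' or `.', and general names (such as element and attribute names) are folded to upper case while entity names keep their case. The following Python sketch shows one way an application might apply these rules. The function names are hypothetical, and in line with the QUANTITY exception no length limit is checked.

```python
import re

# Name start characters: letters only (LCNMSTRT and UCNMSTRT are empty).
# Name characters: letters, digits, and the "-." added by LCNMCHAR/UCNMCHAR.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")


def check_name(name):
    """Raise ValueError if `name` is not a name in this concrete syntax."""
    if NAME_PATTERN.fullmatch(name) is None:
        raise ValueError("not a valid name: %r" % name)
    return name


def normalize_general_name(name):
    """NAMECASE GENERAL YES: element, attribute, etc. names are upper-cased."""
    return check_name(name).upper()


def normalize_entity_name(name):
    """NAMECASE ENTITY NO: entity names are case-sensitive and kept as written."""
    return check_name(name)
```

With these definitions, normalize_general_name("Sec-1.a") gives "SEC-1.A", while normalize_entity_name("Eacute") and normalize_entity_name("eacute") stay distinct.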
The document may not use DATATAG and RANK. OMITTAG and SHORTTAG may be used with some restrictions:
LINK, CONCUR, and SUBDOC may not be used:
LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
OTHER CONCUR NO SUBDOC NO
Ignorable whitespace in content may not be used, except for some RS/RE's as noted below. In other words, all whitespace in an SGML-Lite document will always be data characters. The only exception is that RS characters may occur in content and are always ignored, and some ignorable RE characters (a subset of those that are ignored under clause 7.6.1 of SGML) may occur. In SGML-Lite, clause 7.6.1 simplifies to:
There is still the problem of recognizing empty elements. There are several possible solutions:
Bert Bos, 4 July 1995