Addressing in SGML/HTML style sheets

A style sheet is a collection of rules that assign properties to specific elements in the SGML tree. The order of rules is unimportant. Each rule consists of three or four parts:

A priority (optional): how important is this rule
A selector: to what element instance(s) does the rule apply
A property: what aspect of the layout is specified
A value: what value to assign to the property

The four parts are recognizable by their punctuation, e.g.:

  !prio selector : property = value
  (element selector :prio prio '(property value))

Some `syntactic sugar' can be added to this, for example to allow more than one property/value pair to be entered after a single selector:

  !prio selector : property = value; property = value
  (element selector :prio prio '((property value) (property value)))

Conceptually, every element has all properties. If there is no explicit rule to assign a value to a property, then the value comes either from the parent of that element (inheritance) or is the default value. Which scheme is used depends on the property (see the list of properties).
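The lookup order described above (explicit rule, then inheritance or default, depending on the property) could be sketched as follows. This is only an illustration: the property names, their defaults and the `resolve` function are hypothetical and do not come from the list of properties.

```python
# Hypothetical property table: name -> (is_inherited, default_value).
PROPERTIES = {
    "font-size": (True, "12pt"),
    "margin-left": (False, "0"),
}


def resolve(prop, explicit, parent_values):
    """Return the value of `prop` for one element.

    explicit      -- values assigned to this element by matching rules
    parent_values -- fully resolved values of the parent, or None at the root
    """
    inherited, default = PROPERTIES[prop]
    if prop in explicit:
        return explicit[prop]          # an explicit rule always wins
    if inherited and parent_values is not None:
        return parent_values[prop]     # inherited property: copy from parent
    return default                     # otherwise fall back to the default


# The root sets both properties; the child sets neither.
root = {p: resolve(p, {"font-size": "10pt", "margin-left": "2em"}, None)
        for p in PROPERTIES}
child = {p: resolve(p, {}, root) for p in PROPERTIES}
```

Here the child ends up with the root's font size (inherited) but with the default left margin (not inherited).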

Restrictions

The selector selects elements based on their GI (generic identifier, i.e., the element name) and attributes, and on their context: ancestors and elder siblings in the element tree. To make matching a selector to an element instance efficient, some restrictions are necessary:

The parser should need only a stack to represent the currently open elements. A full parse tree containing already closed elements should not be necessary.
Information about elder siblings of each open element needs to be available only up to a certain fixed depth, so that a fixed amount of space can be reserved for this information in the stack frame.
To match a selector against an element and its context, a deterministic algorithm should suffice; backtracking should not be needed.
No knowledge of the DTD is assumed.

One concrete choice of restrictions could be the following:

Elements and their ancestors can be selected with GIs or GIs plus attributes. (Question: just attributes, without a GI, is also possible, but less efficient; do we need it?)
Wildcards can be used to match exactly one or any number of elements in the current context.
Every element in the context can be further qualified with the GI of its immediate predecessor (elder sibling). That is: zero or one sibling, and then only the GI, not its attributes.
The style sheet may have to help the parser to recognize empty elements, since there seems to be no other place to store this information. (Question: the parser may also have trouble recognizing CDATA and RCDATA, can we require of the SGML document that such contents do not occur?)
See `SGML-Lite' for a DTD-less subset of SGML.
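As a rough sketch of what these restrictions imply for the parser, the stack of open elements could look like the code below. It assumes the concrete choice of a sibling depth of one (only the GI of the immediate predecessor is kept); the names `Frame` and `OpenElements` are made up for the illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """One open element; the space needed per frame is fixed."""
    gi: str
    attrs: dict = field(default_factory=dict)
    prev_sibling: Optional[str] = None  # GI of the immediate elder sibling
    last_child: Optional[str] = None    # GI of the most recently closed child


class OpenElements:
    """Stack of currently open elements; closed elements are forgotten."""

    def __init__(self):
        self.stack = []

    def start(self, gi, attrs=None):
        # The elder sibling of the new element is the child that the
        # parent closed most recently, if any.
        prev = self.stack[-1].last_child if self.stack else None
        self.stack.append(Frame(gi, attrs or {}, prev_sibling=prev))

    def end(self):
        closed = self.stack.pop()
        if self.stack:
            # Only the GI survives; attributes and content are dropped.
            self.stack[-1].last_child = closed.gi
        return closed


doc = OpenElements()
doc.start("DIV")
doc.start("H1")
doc.end()
doc.start("P")   # this P immediately follows an H1 inside the DIV
```

After these calls the stack holds two frames (DIV and P), and the frame for P records H1 as its elder sibling. No parse tree of closed elements is kept.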

Syntax

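A possible syntax for the selector could be the following, where names stand for GIs, bracketed parts are attributes, the asterisk is a wildcard that matches zero or more elements, the question mark is a wildcard that matches exactly one element, and a pair of slashes encloses an element together with its (optional) immediate elder sibling.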

  *                                        (1)
  * E1 E2                                  (2)
  * E3 * E4                                (3)
  E5 * E6                                  (4)
  * E7 *                                   (5)
  * E8 [ attr1 = val1 ]                    (6)
  * E9 [ attr2 = val2 ] [ attr3 = val3 ]   (7)
  * /E10 E11/                              (8)
  * /E12 E13/ [ attr4 = val4 ]             (9)
  * E14 * /E15 E16/                       (10)
  * E17 ? E18                             (11)
  * E18 [ attr5 ]                         (12)
  * /E19/                                 (13)

Whitespace can be omitted as long as no ambiguities can arise. Names that contain whitespace or punctuation must be quoted. The details of the syntax need to be worked out.

(1) matches every element;
(2) matches every E2 element that is a child of an E1;
(3) matches every E4 that is a descendant of an E3;
(4) matches every E6 that is part of a document with an E5 as root;
(5) matches every element that has an E7 as ancestor;
(6) matches every E8 that has an attribute attr1 with value val1;
(7) matches every E9 that has an attribute attr2 with value val2 and an attribute attr3 with value val3;
(8) matches an E11 if it immediately follows an E10;
(9) matches an E13 if it has an attribute attr4 with value val4 and immediately follows an E12;
(10) matches an E16 if it immediately follows an E15 and both have an ancestor E14;
(11) matches an E18 if it is a grandchild of an E17;
(12) matches an E18 if it has an attribute attr5 with any value;
(13) matches an E19 that has no elder siblings.
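To illustrate that such selectors can be matched without backtracking, here is a minimal sketch that handles only GIs and the two wildcards; attributes and the slash notation for elder siblings are left out. It assumes the selector is already split into tokens and that the context is given as the list of GIs from the root down to the current element. The function name `matches` is hypothetical. The technique is a simulation of all selector positions at once, so each element of the context is looked at only once.

```python
def matches(selector, path):
    """Return True if `selector` matches the element at the end of `path`.

    selector -- list of tokens: "*", "?" or a GI
    path     -- GIs of the open elements, root first, current element last
    """
    def closure(states):
        # A "*" may match zero elements, so it can always be skipped.
        out = set()
        for s in states:
            out.add(s)
            while s < len(selector) and selector[s] == "*":
                s += 1
                out.add(s)
        return out

    # `states` holds every selector position reachable so far.
    states = closure({0})
    for gi in path:
        nxt = set()
        for s in states:
            if s == len(selector):
                continue              # selector used up, but elements remain
            tok = selector[s]
            if tok == "*":
                nxt.add(s)            # the star absorbs this element
            elif tok == "?" or tok == gi:
                nxt.add(s + 1)        # exactly one element consumed
        states = closure(nxt)
    return len(selector) in states


# Selector (3): every E4 that is a descendant of an E3.
deep = matches(["*", "E3", "*", "E4"], ["DOC", "E3", "X", "Y", "E4"])
# Selector (11): an E18 that is a grandchild of an E17.
grandchild = matches(["*", "E17", "?", "E18"], ["E17", "A", "E18"])
child_only = matches(["*", "E17", "?", "E18"], ["E17", "E18"])
```

The set of positions can never be larger than the selector itself, so the space needed per rule is fixed in advance.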

Some attributes are more likely to be needed in the selector than others. In particular, SGML formats that use architectural forms or pseudo-architectural forms (e.g., the `CLASS' attribute in HTML) are prime candidates for style sheet based formatting. Some syntactic sugar can be added to make specifying those attributes more convenient.

One possibility is to allow the style sheet language to be parametrized with the name of the attribute that is to be treated specially. For example:

  @archform CLASS
  [... and further down:]
  * DIV@ABSTRACT

is a convenience form for the longer selector:

  * DIV[CLASS=ABSTRACT]
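The expansion of this convenience form is purely textual and could be done before any matching takes place. A small sketch, assuming that the value after the `@` is a simple name (numbers and quoted strings are not handled) and using the hypothetical function name `expand_archform`:

```python
import re

# A name as used for GIs and attribute names in the selectors.
NAME = r"[A-Za-z][A-Za-z0-9_.-]*"


def expand_archform(selector, attr_name):
    """Rewrite every @VALUE in `selector` as [attr_name=VALUE]."""
    return re.sub(
        "@(" + NAME + ")",
        lambda m: "[" + attr_name + "=" + m.group(1) + "]",
        selector,
    )


expanded = expand_archform("* DIV@ABSTRACT", "CLASS")
```

With `CLASS` as the special attribute, `expanded` is the longer selector `* DIV[CLASS=ABSTRACT]`.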

Attributes that are declared as IDs are also likely to play special roles in selecting elements. It may be necessary to provide a shorthand notation for such attributes, although it is a problem that IDs can be stored in attributes of different names and even several times in one element.

(Minor) priorities

It is possible that several rules with the same (major) priority match the same element instance, e.g., *P and *P[KEY], or *P and *DIV P. The intuitive rule is that more specific selectors have priority. In these examples that rule is clear enough, but in more complex selectors there seems to be no obvious priority. One method to disambiguate rules is the following:

replace in the selector every * with a `1'
replace every occurrence of [...] with a `2'
replace every GI with a `3'
interpret the result as a number (in base-4 notation or larger).

Higher numbers have higher priority. This has the effect of giving longer selectors higher priority. It also assigns higher priority to ``DIV*P'' than to ``*DIV P'', and higher priority to ``DIV P'' than to ``P[key]''.
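The four steps can be tried out with a few lines of code. The sketch below is an assumption about the details: it treats a whole bracketed part as one `2`, regardless of its contents, and it simply ignores the question mark and the slashes, since the method above assigns no digit to them. The name `minor_priority` is hypothetical.

```python
import re

# One alternative per kind of token that receives a digit:
# a star, a complete [...] part, or a GI.
TOKEN = re.compile(r"\*|\[[^\]]*\]|[A-Za-z][A-Za-z0-9_.-]*")


def minor_priority(selector, base=4):
    """Compute the minor priority of `selector` as a number in `base`."""
    value = 0
    for tok in TOKEN.findall(selector):
        if tok == "*":
            digit = 1
        elif tok.startswith("["):
            digit = 2
        else:
            digit = 3                  # a GI
        value = value * base + digit   # append the digit on the right
    return value


p_star_inside = minor_priority("DIV*P")    # digits 3 1 3
p_star_front = minor_priority("*DIV P")    # digits 1 3 3
p_two_gis = minor_priority("DIV P")        # digits 3 3
p_gi_attr = minor_priority("P[key]")       # digits 3 2
```

In base 4 these come out as 55, 31, 15 and 14, which gives the ordering stated above.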

Formal syntax

Here is a formal CFG for the selectors, defining the syntax used above.

selector: STAR [anchored_element element* STAR?]?
| anchored_element element* STAR?
;
element: [STAR | QUESTIONMARK]? anchored_element
;
anchored_element: NAME attr_selector*
| pair attr_selector*
;
pair: SLASH NAME NAME? SLASH
;
attr_selector: LBRACK NAME [EQUALS attr_value]? RBRACK
| ATSIGN attr_value
;
attr_value: NAME
| NUMBER
| STRING
;

The terminals of this grammar are:

STAR = \*
QUESTIONMARK = \?
LBRACK = \[
RBRACK = \]
NAME = [A-Za-z][A-Za-z0-9_.-]*
NUMBER = [0-9]+|[0-9]*\.[0-9]+
EQUALS = \=
SLASH = \/
STRING = \".*\" | \'.*\'
ATSIGN = \@
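A tokenizer for these terminals could be sketched as below. It departs from the definitions above in two places, both of which are assumptions made for the sketch: the two alternatives of NUMBER are tried in the opposite order, so that `1.5` is read as one number, and STRING stops at the first closing quote instead of the last one. Whitespace between tokens is skipped. The name `tokenize` is hypothetical.

```python
import re

TOKEN_SPEC = [
    ("STAR", r"\*"),
    ("QUESTIONMARK", r"\?"),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("EQUALS", r"="),
    ("SLASH", r"/"),
    ("ATSIGN", r"@"),
    ("NAME", r"[A-Za-z][A-Za-z0-9_.-]*"),
    ("NUMBER", r"[0-9]*\.[0-9]+|[0-9]+"),   # decimal form tried first
    ("STRING", r'"[^"]*"|' r"'[^']*'"),      # stops at first closing quote
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Split a selector into a list of (terminal, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(
                "unexpected character %r at offset %d" % (text[pos], pos))
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


tokens = tokenize("* /E12 E13/ [ attr4 = val4 ]")
```

For selector (9) this yields STAR, SLASH, NAME, NAME, SLASH, LBRACK, NAME, EQUALS, NAME, RBRACK, which is the input a parser for the grammar above would expect.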

Bert Bos, 30 May 1995