Bert Bos | Selectors
This isn't a definition of the class of document formats that CSS can be applied to, just a set of heuristics to guide the design of the language.
For a while it looked as if XSL could cover the needs of complex documents, and that allowed us to refuse features that would have made CSS difficult to use.
But the lack of a standard language for describing GUIs on the one hand and the current lack of resources to develop XSL-FO 2 on the other have meant that the pressure on CSS to add features for GUIs and for complex layout has become high. The goal of keeping CSS usable for the normal Web author seems to be largely abandoned. How to fix that situation is currently unknown.
In SGML, XML and HTML5, the element type serves as the type of a node. The attributes are the attributes that CSS uses, with any values normalized (as per SGML/XML rules) and represented as Unicode strings.
In HTML/HTML5, MathML and SVG, the classes are given by the class attribute. In other formats, they may be specified in other ways, or not exist.
In HTML/HTML5, MathML and SVG, the unique name is given by the ID attribute. In other formats it may be specified differently, or be absent.
The namespace wasn't part of the original model. It was added in 1999, when XML Namespaces were added to XML. As far as CSS is concerned, a namespace is an arbitrary string. It may be specified with the url() notation, because in XML it is a URL, but CSS never dereferences it and doesn't care whether it is a valid URL or not.
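As a minimal sketch of how a namespace shows up in a style sheet (the prefix 'svg' and the rule are illustrative choices, not taken from the slides), the '@namespace' rule binds a prefix to the namespace string, which is then only compared, never fetched:

```css
/* Bind the prefix "svg" to a namespace name. CSS treats the value as an
   opaque string: it is compared, never dereferenced. */
@namespace svg url(http://www.w3.org/2000/svg);

/* Matches only "a" elements in that namespace, not HTML anchors. */
svg|a { fill: blue }
```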
The named links are proposed (for level 4 of CSS) to correspond to IDREF attributes in SGML and XML (and the equivalent in HTML5). The reason is that the LABEL element in HTML uses an IDREF to link itself to an INPUT element and thus it would be nice to style the INPUT that belongs to a certain LABEL, or all LABELs that point to a certain INPUT. These slides do not treat this proposed feature.
Text nodes are leaf nodes. They have no further substructure and they cannot be selected with selectors. Of course, for the formatting part of CSS they do have some structure, even structure that may overlap the tree structure, in the form of lines, words and bidi-fragments. But that is outside the scope of these slides.
CSS is not like Perl or other tools to transform documents: it doesn't parse any text itself to create the parse tree. The input is abstract. It's not text, it's a tree, without any concrete representation.
CSS thus doesn't have to deal with concrete syntaxes or with parse errors. How a document is converted to a tree is out of scope.
Note that XPath (and thus XSL) gives access to much more of the XML “Infoset.” CSS ignores most of SGML and XML and assumes its input is a simple tree. Such a tree can be made from an SGML or XML document in a fairly obvious way, but also from other kinds of formats.
In particular, the “fairly obvious way” includes expanding entities, ignoring processing instructions, ignoring comments, etc.
Attributes still play a role in selectors, although you cannot select them by themselves: they can be used to distinguish elements from one another.
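A short sketch of what that means in practice (the attribute names are ordinary HTML ones, chosen here only for illustration): the selector always ends up matching an element, and the attribute merely narrows down which elements qualify.

```css
/* Only INPUT elements whose type attribute is exactly "checkbox". */
input[type="checkbox"] { margin-right: 0.5em }

/* Any A element that has an hreflang attribute, whatever its value. */
a[hreflang] { font-style: italic }
```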
CSS was never meant for all SGML documents, or to make use of all possible information in an SGML document. We only planned to use a subset of the information that SGML provided (see below).
I personally expected the Web to use only a subset of the capabilities of SGML, able to be parsed without the need for out-of-band information about the concrete syntax. I called my proposal for such a subset “SGML-lite.” (It wasn't a complete specification, it proposed some goals and a few alternatives for the concrete syntax.)
When other people later took the initiative to define such a format and make it a standard, under the name of XML, SGML-Lite was one of the inputs, which meant that CSS could support XML right away. Only XML Namespaces, which were added to XML a little later, had not been foreseen and required an addition to the CSS model (see below).
We have ideas for relying less on magic for the document semantics. E.g., we could use XLink, HLink, or CSS properties to indicate which elements are links (hyperlinks or transclusions, such as images). But then selectors such as ':link' become difficult to define…
Apart from '~' these are all old and well-known, so I'll not say any more about them here. The '~' is the generalization of the '+': not just the immediately following sibling, but any following sibling.
For '+' (and for the pseudo-classes in the next slides), only elements are counted. Intervening text nodes do not matter. Thus the EM is the immediately following sibling of the SPAN in '…<span>word</span> between <em>words</em>…'.
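The two sibling combinators can be sketched side by side. The markup in the comment is an assumption made for this illustration, as are the declarations:

```css
/* Markup assumed for illustration:
   <p><span>word</span> between <em>words</em> and <em>more</em></p> */

/* Adjacent sibling: only the first EM, the element right after the SPAN.
   The text " between " is not an element and is not counted. */
span + em { color: red }

/* General sibling: every EM that follows the SPAN under the same parent. */
span ~ em { font-weight: bold }
```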
In most modern programming languages, white space is not significant, other than that it is sometimes necessary to separate tokens: 'if true then' needs spaces, because 'iftruethen' would be a single token. But otherwise you can omit it: 'a := b + 7' is the same as 'a:=b+7'.
Not so in CSS selectors: 'H2.sub' is a conjunction (an element with type H2 and class sub), while 'H2 .sub' is a descendant selector (an element with class sub that is a descendant of an element of type H2). Programmers often complain about this.
But it was a conscious decision: there are far fewer programmers than other people…
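Written out as rules, the difference a single space makes looks like this (type names are in lowercase here, which matches the same elements in HTML; the declarations are placeholders):

```css
/* No space: one element that is an H2 and also has class "sub". */
h2.sub { color: green }

/* With a space: an element with class "sub" somewhere inside an H2. */
h2 .sub { color: gray }
```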
(There are other places in CSS where the syntax doesn't follow the recent tradition of programming languages: 'font-family' accepts font names with and without quotes, and the white space is added in the obvious way. And grid templates also mix quoted strings and bare identifiers: 'flow: c' refers to slot c in 'grid: "a b c"'.)
'nth-child' is explained below.
'nth-of-type' is typically used together with a type selector: 'dd:nth-of-type(2)'.
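A sketch of why the type selector matters, using an assumed definition list (the markup in the comment is not from the slides):

```css
/* Markup assumed:
   <dl><dt>A</dt><dd>a1</dd><dd>a2</dd><dt>B</dt><dd>b1</dd></dl> */

/* The second DD among its siblings, ignoring the DTs: matches "a2". */
dd:nth-of-type(2) { color: red }

/* The second child of any type, provided it is a DD: matches "a1". */
dd:nth-child(2) { color: blue }
```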
Some attributes may be defined as case-insensitive by the document format, but for those that are not, it may still be useful to match them case-insensitively, e.g., if they contain human-readable text.
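Selectors Level 4 defines an 'i' flag inside the attribute selector for this purpose. The sketch below assumes that syntax, which may differ from what was on the slide; the attribute and value are arbitrary:

```css
/* Default: the value is compared case-sensitively, unless the document
   format defines the attribute as case-insensitive. */
[title="note"] { color: gray }

/* With the "i" flag: also matches title="Note", title="NOTE", and so on. */
[title="note" i] { color: black }
```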
These selectors are missing in part because they are difficult for the average user. Until we have a new, easy-to-use style sheet language, we may want to hold off on adding them.
Language can come, e.g., from protocol headers and be overridden by attributes (such as xml:lang).
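The difference between matching on the language and matching on an attribute can be sketched as follows (French and the declarations are arbitrary choices for illustration):

```css
/* Matches every element whose language is French, wherever that
   information comes from: the element itself, an ancestor, or a header. */
:lang(fr) { quotes: "« " " »" }

/* Matches only elements that carry a lang attribute themselves. */
[lang|=fr] { font-style: italic }
```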
If you follow a link to somewhere in the middle of another document, then you can give the element that you jumped to a special style by means of ':target'.
Of course, you can also jump within the same document, e.g., from the table of contents to a section. Some tricks rely on jumping within a document to change the style of the document with every click, such as showing and hiding tabbed cards. (But hopefully one day we'll have ways to do such style changes directly, without the limitations of this trick.)
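The tabbed-cards trick might look roughly like this. The class name 'card' and the markup in the comment are hypothetical:

```css
/* Markup assumed:
   <a href="#tab2">Tab 2</a> … <div class="card" id="tab2">…</div> */

/* All cards are hidden by default. */
.card { display: none }

/* The card whose ID equals the fragment of the current URL is shown. */
.card:target { display: block }
```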
CSS selectors don't have fully general boolean logic. There is a top-level OR (the comma) only. And the NOT only applies to the simple selectors. There is a proposal for level 4 to allow a kind of parentheses to have an OR inside a selector, and even inside a NOT.
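In level 3 syntax, the available logic can be sketched like this (the selectors and declarations are illustrative only):

```css
/* OR: the comma joins complete selectors at the top level. */
h1, h2, h3 { font-family: sans-serif }

/* NOT: the argument is a single simple selector. */
a:not([href]) { color: inherit }

/* AND is written by concatenation: a P with class "note" but not "draft". */
p.note:not(.draft) { border: thin solid }
```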
Negative selectors are difficult to use, especially in a contextual selector (a selector that includes ancestors, siblings or descendants). Often the easier way is to style all except for a specific one. But sometimes that is not possible, and the ':not()' is the only way.
The example is one such. It selects elements in German of which the parent is not in German. Listing all possible languages that are not German is impossible if you do not know in advance which languages are being used. That is the case here, because this is from a style sheet for a set of pages with a growing number of translations. Adding a style rule whenever a new translation is added would be tedious.
Note also how this uses both the ':lang()' selector (to match the parent, whose language need not come from an attribute, but could be inherited), and the '|=' operator to check the LANG attribute. In this case, the '[lang|=de]' could actually have been ':lang(de)' as well.
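The slide's example is not reproduced in these notes. A selector that fits the description might look like the following sketch; the declaration is a placeholder:

```css
/* Elements marked as German whose parent is not in German.
   :lang() is used for the parent because its language may be inherited
   rather than set by an attribute on the parent itself. */
:not(:lang(de)) > [lang|=de] { font-style: italic }
```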
:checked obviously needs a checkbox in the document; :target only needs a link (which can be made to look like a button, e.g.).
But :target has the disadvantage that every activation is added to the history. Going back then doesn't go back to the previous document, but reopens or closes the text.
Both have to be preceding siblings of the text to hide (or of an ancestor of the text to hide), because in level 3 there are no selectors that can go “back up” the tree.
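A sketch of the ':checked' variant, with the checkbox placed before the text it controls. The ID 'more', the class 'details' and the markup in the comment are hypothetical:

```css
/* Markup assumed:
   <input type="checkbox" id="more"> <label for="more">Details</label>
   <div class="details">…</div> */

/* The text is hidden by default. */
.details { display: none }

/* Once the checkbox is checked, its following sibling is shown. */
#more:checked ~ .details { display: block }
```

Moving the checkbox after the text would break this, because no level 3 combinator looks at preceding siblings or ancestors.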
To understand how the CSS selectors work and what the different kinds of punctuation mean, it helps to know the original and current goals of CSS. Although the selectors can be (and indeed are) used without CSS, they were designed to be easy and compact for applying certain kinds of style to certain kinds of documents. Extension mechanisms built in to the syntax allow other uses, but they aren't necessarily as compact or easy to read.
The high quality is a function of the implementation. We don't expect the author to be a typographer, he just selects some fonts and margins. The UA's task is to do the best it can with the author's hints. The reader's task is then to select the UA that satisfies his needs best. UAs may offer excellent typography that is suitable for printing, high speed, automatic tables of contents, transposing tables, advanced search, user style sheets, hypertext features, intra- and inter-document navigation, etc. CSS is just a style sheet language, not a typesetting system.
That typography follows the tree structure, even in simple documents, is a simplification that has many exceptions. We decided to ignore most of them. '::first-line' and '::first-letter' are some that we did not ignore. On the other hand, we ignored the rule in some typographical traditions (American, but not French) that punctuation should be in the same style as the word that precedes it. (One way to deal with this is to include a transformation step in the formatting, e.g., by means of XSLT.)
The limitation to simple layout is necessary, because complex layouts are almost certainly difficult to make, especially on the Web, where you don't know the reader's window size.
We hoped HTML would have a long life, at least 50, if not 100 years, but we couldn't be sure of that. It might be that CSS would outlive HTML. And besides, there are other useful document formats (TEI, DocBook, etc.). And so CSS should not be bound too tightly to HTML, in as far as that was possible without making it too difficult to use.
For complex documents, I expected one of two things to happen: either we would learn enough from CSS in five or ten years to be able to make a language that was as easy to use, but also allowed complex layouts, or, more likely, we would make two new languages, an easy one for normal users and an advanced one suitable for complex documents.
What happened was that we made XSL (consisting of XSLT and XSL-FO). It had the right model, based on DSSSL, for complex layouts. But we never replaced CSS. And XSL, although very successful in the printing industry, never became popular for online or interactive documents. That had consequences for CSS, see below.