From HTML WG Wiki

Eliminating authoring distinctions between block and non-block semantics

also to permit varied content models within the Q (or another) element

Corresponds to issue number 45

Problem statements / use cases

  • Authors need a richer element for conveying paragraphs: one that can contain other structural / grouping elements.
  • Ideally the P element should not have an implicit close tag, but that cannot be changed without significantly breaking existing content.
  • The current draft (and the HTML 4.01 recommendation) does not really provide clear and presentation-independent content models and element kinds. The current HTML5 draft often slips between discussing content models and element kinds where it should keep these distinct. Basically, we have three distinct content models: "block", "structured inline" and "strictly inline". An element's kind is largely dependent upon the content model the element supports. On the other hand, an element's content model is also dependent upon the context in which it appears. This two-way dependency should be made clearer.
  • HTML currently borrows from the HTML 4.01 recommendation in defining content models and element kind. This relies too much upon language concerned with presentation and therefore tends to muddy the distinctions between content model and element kind on the one hand and display-model and display-role on the other. While these are likely related (i.e., CSS3 display-role and display-model will have some dependencies on element kind and content model), they should be made more distinct in the draft.
  • Quotations are one of a few examples of semantic elements that must be expressed across two separate elements, because presentation depends on element kind and element kind in turn depends on content state. Authors must therefore concern themselves not only with marking up a quotation, but also with whether the quoted material contains structural element content or only sub-paragraph (inline or phrase) element content. This has often led to confusion among authors who may want a quotation to have a block display-role (because a phrase quotation exceeds a certain number of words) as well as authors using BLOCKQUOTE for its semantic purpose of marking up a quotation with a block display-model (to use CSS3 terminology, which should bear some relation to the more semantic content model and element kind definitions).
  • The original decision in HTML to allow the omission of the closing tag for the P element implies that Q and BLOCKQUOTE are both necessary when HTML could otherwise get by with only the Q element. (This is likewise a problem for any semantic element that can potentially be used within a P element and that might also need to contain P elements or other structures.)

Dependency of element kind on element content model/state

This table illustrates the dependencies of CSS display-model, CSS display-role and HTML5 element kind on the content model and content state needed by an author. In other words, this table starts from the left side with a particular author's needs for semantic markup of a quotation, together with the content model and content state (including number of words) of that quotation. From there it moves to the right to determine the element kind. The next pair of columns indicates the CSS display-role, which can sometimes clash with the semantics expressed in HTML (and leads to 'misuse' of the BLOCKQUOTE element). Finally, the last column pair shows the display-model, which is very dependent on the semantics in an HTML5 document.

Quotation dependencies →               element kind          display-role          display-model
content state ↓                        structured  phrase    block       inline    block-inside  inline-inside
strictly inline (phrase)               -           Q         -           Q         -             Q
strictly inline (phrase) (> n words)   -           Q         BLOCKQUOTE  -         -             Q

The distinctions illustrated here for Q and BLOCKQUOTE could be repeated for other elements that share semantics but whose content models can differ greatly (SPAN/DIV, CODE/BLOCKCODE, MAP). Yet other elements typically have a display-role of block (or always start a new line anyway), and so no attempt is made within the semantic markup to capture this distinction (e.g. P, LI, DD, TD, TH, PRE, DETAILS). Still other elements might benefit from the flexibility to have alternate content models if the display dependencies could be handled appropriately (without the need for redundant semantic markup).

Proposed solutions

Add a type attribute to the DIV element

DIV elements are used for three broad purposes in HTML documents:

  • as a paragraph with text content (significant characters and text elements)
  • as a section element (in the absence of a dedicated section element)
  • as a component element, defining different logical segments of a web application that may have some presentational or page layout implications

HTML5 (as well as XHTML2) introduces new elements and modified content models to largely eliminate the need for DIV except for the last case of a web component element. However, while new content models for P and SECTION elements will work automatically in XML serializations and in UAs updated to handle the text/html serialization of HTML5, they will not work in legacy browsers, and authors cannot start using them until all non-HTML5 UAs are no longer part of the author’s targeted UAs.

By adding a type attribute and allowing authors to use the DIV element as a synonymous substitute element, authors can start taking advantage of these features of HTML5 immediately. Such a use of the DIV element would also allow straightforward conversion of the serialization to HTML5 or XHTML5 at any time.

<div type='p' >
As we can see from this extended quotation, this is clearly not the author's intent:

<div type='p'>
<div type='p'>

For an author only targeting XML or HTML5 aware text/html processing UAs the same document fragment could be authored as:

As we can see from this extended quotation, this is clearly not the author's intent:


As one can see from this example, the need for the BLOCKQUOTE element is completely obviated from a semantic point of view. It is clear from the contents of the element that the quotation involves block content. Requiring authors to provide this information redundantly by using a separate element is unnecessary.

The only use for BLOCKQUOTE that remains is in its inappropriate use as a styling mechanism for text quotes (not involving block/structure/grouping elements) but exceeding a particular number of words as many style guides and publishers direct.
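The conversion from the DIV type=p form to the P form mentioned earlier in this section could be sketched roughly as follows. This is only an illustration in Python using the standard html.parser module; the class name DivTypeConverter and the function convert are hypothetical names, not part of the proposal, and the sketch assumes the type attribute value is written exactly as p.

```python
import html
from html.parser import HTMLParser


class DivTypeConverter(HTMLParser):
    """Re-emit markup, turning <div type="p">...</div> into <p>...</p>."""

    def __init__(self):
        # Keep character references unconverted so they can be re-emitted as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.div_stack = []  # one flag per open DIV: True if it was type=p

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            is_p = dict(attrs).get("type") == "p"
            self.div_stack.append(is_p)
            if is_p:
                # Drop the type attribute, keep every other attribute.
                kept = [(k, v) for k, v in attrs if k != "type"]
                parts = [
                    f" {k}" if v is None else f' {k}="{html.escape(v, quote=True)}"'
                    for k, v in kept
                ]
                self.out.append("<p" + "".join(parts) + ">")
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack and self.div_stack.pop():
            self.out.append("</p>")
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def convert(markup):
    """Return markup with every DIV type=p rewritten as a P element."""
    parser = DivTypeConverter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Note that the output nests P elements inside Q inside P, so it is only meaningful for the XML serialization or for UAs that implement the new HTML5 content models, as discussed above.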

Recommend authors use DIV type=p for text/html serialization

This proposal considers whether UAs may be able to lift the need for authors to mark up different quotations with different markup depending on the content of the quotation, especially since the display-role presentation of the quotation may depend on different circumstances than the content model distinction between Q and BLOCKQUOTE (a distinction that really is redundant and adds nothing to the markup beyond the quotation markup itself).

In the XML serialization deprecate the BLOCKQUOTE element and recommend authors only use the Q element with a new content model of: inline or block but not both.
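The "inline or block but not both" content model for Q could be checked along the following lines. This is a minimal Python sketch under stated assumptions: BLOCK_ELEMENTS is a hypothetical, incomplete list of block element names (the draft would define the real categories), q_content_state is a hypothetical helper name, and text runs are represented by the placeholder string "#text".

```python
# Hypothetical, incomplete set of block element names, for illustration only.
BLOCK_ELEMENTS = {"p", "div", "ul", "ol", "dl", "pre", "table"}


def q_content_state(children):
    """Classify the children of a Q element as 'inline' or 'block'.

    children is a list of lowercase child element names, with "#text"
    standing for a run of significant text. Mixed content is rejected.
    """
    has_block = any(child in BLOCK_ELEMENTS for child in children)
    has_inline = any(child not in BLOCK_ELEMENTS for child in children)
    if has_block and has_inline:
        raise ValueError("Q content must be inline or block, not both")
    return "block" if has_block else "inline"
```

For example, q_content_state(["#text", "em", "#text"]) yields "inline", q_content_state(["p", "p"]) yields "block", and q_content_state(["#text", "p"]) raises ValueError.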


  • rework content model and element kind (especially their names) to be less presentation oriented
  • add a 'contentModel' DOM attribute to the HTMLElement interface
  • add 'threshold' and 'words' to the HTMLQuoteElement interface
  • liaise with the CSS3 WG

In summary, the proposal adds one new DOM method (threshold) and one new DOM attribute (words) on a new HTMLQuoteElement interface, and one new DOM attribute (contentModel) on the HTMLElement interface.

contentModel DOM attribute added to HTMLElement interface

The contentModel DOM attribute on the HTMLElement interface would return the state of an element's content. Return values are 'phrase', 'text' or 'structured' (either as the integers 0, 1 and 2 respectively, or as those actual strings). This would give DOM API consumers a convenient way to evaluate the content state of an element easily, consistently and accurately. For elements whose content model can vary, this would provide information that the element name cannot. For example, the names Q and BLOCKQUOTE unambiguously indicate the content state of a conforming element: 'phrase' for Q and 'structured' for BLOCKQUOTE. However, for the P element in an XML de-serialized DOM, the element can be either 'phrase' or 'text' depending on the situation. {NOTE: these return values could be changed depending on the categories established in our CR HTML5 draft}

New HTMLQuoteElement interface with threshold DOM method and words DOM attribute

 interface HTMLQuoteElement : HTMLElement {
         attribute DOMString cite;
 +       boolean threshold (unsigned long words);
 +       attribute unsigned long words;
 +       attribute DOMString marks;
 +       attribute DOMString pages;
 +       attribute DOMString annotation;
 };

The threshold method returns true if the number of words in the quotation element's innerText exceeds the unsigned long words argument passed to the method. Otherwise it returns false.
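The behaviour described for threshold could be approximated outside a UA as in the Python sketch below. It is an illustration under assumptions rather than a definition: the text is extracted with the standard html.parser module instead of a real innerText implementation, words are taken to be whitespace-separated runs, BLOCK_TAGS is a hypothetical list of elements treated as word boundaries, and the names word_count and threshold (as a free function taking markup) are chosen here for illustration.

```python
from html.parser import HTMLParser

# Hypothetical set of elements whose boundaries separate words.
BLOCK_TAGS = {"p", "div", "li", "br", "pre"}


class _TextCollector(HTMLParser):
    """Collect character data, inserting a break around block-like tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def word_count(markup):
    """Count whitespace-separated words in the text content of markup."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return len("".join(collector.parts).split())


def threshold(markup, words):
    """Return True if the quotation's word count exceeds words."""
    return word_count(markup) > words
```

With this sketch, a six-word quotation such as "To be, or not to be" exceeds a threshold of 5 but not a threshold of 6, since the comparison is strict.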

Liaison with CSS3 WG

Liaise with the CSS3 WG to provide pseudo-class selectors that distinguish between elements based on their content state and between quotation elements based on their word length. The CSS WG has come up with some fairly ingenious mechanisms, both simple and elegant, to solve similar problems. It would be nice to be able to specify a word length property through CSS and then use a pseudo-class selector to style only those quotations exceeding (or not exceeding) that number of words.

Discussion and Evaluation


WG members should post feedback and other discussion to the WG’s mailing list (the URI for the links below provides date information). Search on this email subject.

Originally introduced in a review of the HTML5 draft section on phrase elements.

  1. Review by Rob Burns

Discussion of an alternate proposal on the wai-xtech mailing list with the subject: "Alternate Additional Attribute Set for a Single Quote Element"

See also