HTML 4.01 Test Suite - Assertions
Testable Assertions: Section 9 Text
9 Text - Paragraphs, Lines, and Phrases
(informative) The document character set includes a wide variety of white space characters. Many of these are typographic elements used in some applications to produce particular visual spacing effects. In HTML, only the following characters are defined as white space characters:
ASCII space (&#x0020;)
ASCII tab (&#x0009;)
ASCII form feed (&#x000C;)
Zero-width space (&#x200B;)
(informative) Line breaks are also white space characters. Note that although &#x2028; and &#x2029; are defined in [ISO10646] to unambiguously separate lines and paragraphs, respectively, these do not constitute line breaks in HTML, nor does this specification include them in the more general category of white space characters.
(informative) For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters").
(should) When formatting text, user agents should identify the sequences of non-white space characters and lay them out according to the conventions of the particular written language (script) and target medium.
(may) This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script.
(may) A sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element).
(should) User agents should collapse input white space sequences when producing output inter-word space. This can and should be done even in the absence of language information.
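(informative) As an illustration only, and not part of the specification text, the collapsing behaviour described above can be sketched in Python. The function names collapse_white_space and split_words are hypothetical, and replacing each run with a single U+0020 is a simplifying assumption, since actual inter-word spacing depends on the script.

```python
import re

# White space as enumerated above: space, tab, form feed, zero-width space,
# plus the line-break characters CR and LF. U+2028 and U+2029 are excluded.
_WHITE_SPACE_RUN = re.compile(r"[ \t\f\u200B\r\n]+")


def collapse_white_space(text: str) -> str:
    """Replace every run of white space with one space (assumed output form)."""
    return _WHITE_SPACE_RUN.sub(" ", text)


def split_words(text: str) -> list:
    """Return the "words": maximal sequences of non-white space characters."""
    return [word for word in _WHITE_SPACE_RUN.split(text) if word]
```

This sketch applies to elements other than PRE; preformatted text would bypass it.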
(must) EM: Indicates emphasis. Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) STRONG: Indicates stronger emphasis. Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) CITE: Contains a citation or a reference to other sources. Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) DFN: Indicates that this is the defining instance of the enclosed term. Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) CODE: Designates a fragment of computer code. Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) SAMP: Designates sample output from programs, scripts, etc. Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) KBD: Indicates text to be entered by the user. Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) VAR: Indicates an instance of a variable or program argument. Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) ABBR: Indicates an abbreviated form (e.g., WWW, HTTP, URI, Mass., etc.). Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) ACRONYM: Indicates an acronym (e.g., WAC, radar, etc.). Phrase elements add structural information to text fragments. Start tag and end tag are required.
(must) EM and STRONG are used to indicate emphasis. The other phrase elements have particular significance in technical documents.
(should) The presentation of phrase elements depends on the user agent. Generally, visual user agents present EM text in italics and STRONG text in bold font.
(may) Speech synthesizer user agents may change the synthesis parameters, such as volume, pitch and rate accordingly.
(informative) The ABBR and ACRONYM elements allow authors to clearly indicate occurrences of abbreviations and acronyms. Western languages make extensive use of acronyms such as "GmbH", "NATO", and "F.B.I.", as well as abbreviations like "M.", "Inc.", "et al.", "etc.". Both Chinese and Japanese use analogous abbreviation mechanisms, wherein a long name is referred to subsequently with a subset of the Han characters from the original occurrence. Marking up these constructs provides useful information to user agents and tools such as spell checkers, speech synthesizers, translation systems and search-engine indexers.
(may) The title attribute of these elements may be used to provide the full or expanded form of the expression.
(informative) Abbreviations and acronyms often have idiosyncratic pronunciations.
(must) Start tags and end tags for quotations are required.
(should) BLOCKQUOTE and Q: cite = uri [CT].
The value of this attribute is a URI that designates a source document or message. This attribute is intended to give information about the source from which the quotation was borrowed.
(should) Visual user agents generally render BLOCKQUOTE as an indented block.
(must) Visual user agents must ensure that the content of the Q element is rendered with delimiting quotation marks. Authors should not put quotation marks at the beginning and end of the content of a Q element.
(should) User agents should render quotation marks in a language-sensitive manner (see the lang attribute). Many languages adopt different quotation styles for outer and inner (nested) quotations, which should be respected by user agents.
(should) Since the language of both quotations is American English, user agents should render them appropriately, for example with single quote marks around the inner quotation and double quote marks around the outer quotation:
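(informative) As an illustration only, nested quotation rendering of this kind can be sketched in Python. The nested-list representation of Q content, the QUOTE_MARKS table, and the render_q function are all hypothetical; only the American English convention named above is filled in.

```python
# Quotation mark pairs per nesting depth, outermost first.
# Only "en-US" is listed; other languages would need their own entries.
QUOTE_MARKS = {
    "en-US": [("\u201C", "\u201D"), ("\u2018", "\u2019")],
}


def render_q(content, lang="en-US", depth=0):
    """Render Q content given as a list of strings and nested lists.

    A nested list stands for a nested Q element. Marks alternate by depth:
    double marks for the outer quotation, single marks for the inner one.
    """
    pairs = QUOTE_MARKS[lang]
    open_mark, close_mark = pairs[depth % len(pairs)]
    inner = "".join(
        part if isinstance(part, str) else render_q(part, lang, depth + 1)
        for part in content
    )
    return open_mark + inner + close_mark
```

Because the marks are generated by the renderer, the content itself carries no quotation marks, which matches the advice to authors on the Q element.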
(should) It is recommended that style sheet implementations provide a mechanism for inserting quotation marks before and after a quotation delimited by BLOCKQUOTE in a manner appropriate to the current language context and the degree of nesting of quotations.
(should) However, as some authors have used BLOCKQUOTE merely as a mechanism to indent text, in order to preserve the intention of the authors, user agents should not insert quotation marks in the default style.
(must) Start tags and end tags for subscripts and superscripts are required.
(should) Many scripts (e.g., French) require superscripts or subscripts for proper rendering. The SUB and SUP elements should be used to mark up text in these cases.
(must) Start tags for paragraphs are required. End tags for paragraphs are optional.
(must) The P element represents a paragraph. It cannot contain block-level elements (including P itself).
(should) Authors are discouraged from using empty P elements. User agents should ignore empty P elements.
(must) A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.
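(informative) As an illustration only, the line break definition can be sketched in Python. The function name split_lines is hypothetical. The pattern tries the carriage return/line feed pair first so that the pair counts as one line break rather than two.

```python
import re

# CR LF pair, lone CR, or lone LF. Order matters: the pair must match first.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list:
    """Split text at HTML line breaks; U+2028 and U+2029 are not line breaks."""
    return _LINE_BREAK.split(text)
```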
(must) Start tags for controlling line breaks are required. End tags for controlling line breaks are forbidden.
(must) The BR element forcibly breaks (ends) the current line of text.
(must) For visual user agents, the clear attribute can be used to determine whether markup following the BR element flows around images and other objects floated to the left or right margin, or whether it starts after the bottom of such objects.
(should) With respect to bidirectional formatting, the BR element should behave the same way the [ISO10646] LINE SEPARATOR character behaves in the bidirectional algorithm.
(must) Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character.
(should) For operations such as searching and sorting, the soft hyphen should always be ignored.
(must) The plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;).
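(informative) As an illustration only, the soft hyphen semantics can be sketched in Python. The function names are hypothetical, and the caller is assumed to have already decided where (if anywhere) the line is broken; no line-breaking algorithm is implied.

```python
SOFT_HYPHEN = "\u00AD"


def render_unbroken(text: str) -> str:
    """Line not broken at any soft hyphen: no hyphen character is displayed."""
    return text.replace(SOFT_HYPHEN, "")


def render_broken_at(text: str, index: int) -> list:
    """Break the line at the soft hyphen located at position `index`.

    The first line ends with a visible hyphen; soft hyphens not used for
    the break stay invisible.
    """
    if text[index] != SOFT_HYPHEN:
        raise ValueError("no soft hyphen at the given index")
    first = text[:index].replace(SOFT_HYPHEN, "") + "-"
    second = text[index + 1:].replace(SOFT_HYPHEN, "")
    return [first, second]


def search_key(text: str) -> str:
    """Key for searching and sorting: soft hyphens are always ignored."""
    return text.replace(SOFT_HYPHEN, "")
```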
(must) Start tags and end tags for preformatted text are required.
(must)(deprecated) PRE: width = number [CN]
Deprecated. This attribute provides a hint to visual user agents about the desired width of the formatted block. The user agent can use this information to select an appropriate font size or to indent the content appropriately. The desired width is expressed in number of characters. This attribute is not widely supported currently.
(must) The PRE element tells visual user agents that the enclosed text is "preformatted".
(may) When handling preformatted text, visual user agents:
1. May leave white space intact.
2. May render text with a fixed-pitch font.
3. May disable automatic word wrap.
4. Must not disable bidirectional processing.
(may) Non-visual user agents are not required to respect extra white space in the content of a PRE element.
(informative) The DTD fragment above indicates which elements may not appear within a PRE declaration. This is the same as in HTML 3.2, and is intended to preserve constant line spacing and column alignment for text rendered in a fixed pitch font.
(should) The horizontal tab character (decimal 9 in [ISO10646] and [ISO88591] ) is usually interpreted by visual user agents as the smallest non-zero number of spaces necessary to line characters up along tab stops that are every 8 characters.
(should) Using horizontal tabs in preformatted text is strongly discouraged since it is common practice, when editing, to set the tab-spacing to other values, leading to misaligned documents.
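(informative) As an illustration only, the usual tab interpretation can be sketched in Python. The function name expand_tabs and its tab_stop parameter are hypothetical. With the default of 8 the result agrees with Python's built-in str.expandtabs(8) for single-line input.

```python
def expand_tabs(line: str, tab_stop: int = 8) -> str:
    """Replace each tab with the smallest non-zero number of spaces that
    reaches the next tab stop (stops are every `tab_stop` columns)."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Always between 1 and tab_stop, never zero.
            count = tab_stop - (column % tab_stop)
            out.append(" " * count)
            column += count
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

Calling expand_tabs("ab\tc") and expand_tabs("ab\tc", 4) places "c" in different columns, which is the misalignment risk that motivates discouraging tabs in preformatted text.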
(should) How paragraphs are rendered visually depends on the user agent. Paragraphs are usually rendered flush left with a ragged right margin. Other defaults are appropriate for right-to-left scripts.
(may) HTML user agents have traditionally rendered paragraphs with white space.
(should) Following the precedent set by the NCSA Mosaic browser in 1993, user agents generally don't justify both margins, in part because it's hard to do this effectively without sophisticated hyphenation routines. The advent of style sheets and of anti-aliased fonts with subpixel positioning promises to offer HTML authors richer choices than were previously possible.
(should) Style sheets provide rich control over the size and style of a font, the margins, space before and after a paragraph, the first line indent, justification and many other details. The user agent's default style sheet renders P elements in a familiar form.
(should) By convention, visual HTML user agents wrap text lines to fit within the available margins. Wrapping algorithms depend on the script being formatted.
(should) In Western scripts, for example, text should only be wrapped at white space. Early user agents incorrectly wrapped lines just after the start tag or just before the end tag of an element, which resulted in dangling punctuation.
(must) Start tags and end tags for INS and DEL elements are required.
(must) INS and DEL: cite = uri [CT].
The value of this attribute is a URI that designates a source document or message. This attribute is intended to point to information explaining why a document was changed.
(must) INS and DEL: datetime = datetime [CS].
The value of this attribute specifies the date and time when the change was made.
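(informative) As an illustration only, a datetime attribute value can be checked and parsed in Python. This sketch assumes the YYYY-MM-DDThh:mm:ssTZD form that HTML 4.01 defines for datetime values, where TZD is "Z" or a signed hh:mm offset; that form is not restated in this section. The function name parse_html_datetime is hypothetical, and Python 3.7 or later is assumed for the "%z" handling of "Z" and colon-separated offsets.

```python
import re
from datetime import datetime

# Strict shape check: strptime alone would accept some looser inputs.
_DATETIME_SHAPE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})"
)


def parse_html_datetime(value: str) -> datetime:
    """Parse a datetime attribute value into a timezone-aware datetime."""
    if not _DATETIME_SHAPE.fullmatch(value):
        raise ValueError("not in YYYY-MM-DDThh:mm:ssTZD form: %r" % value)
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
```

Because the value is case-sensitive ([CS]), the pattern accepts only an uppercase "T" and "Z".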
(may) INS and DEL are unusual for HTML in that they may serve as either block-level or inline elements (but not both). They may contain one or more words within a paragraph or contain one or more block-level elements such as paragraphs, lists and tables.
(must) The INS and DEL elements must not contain block-level content when these elements behave as inline elements.
(should) User agents should render inserted and deleted text in ways that make the change obvious.