A Complete Set of Constructs

The recommended usage is incomplete; it only includes those constructs that are easy to implement and explain. This section discusses a few more constructs that allow you to do anything that can legally be done. There are constructs beyond these, but they can all be reduced to constructs shown here.

Document Structure

The DTD actually allows a superset of the recommended (HEAD, BODY) structure. In fact, the structure of the HTML document element is just a list of elements and/or text characters.

While the content of the HTML element is somewhat arbitrary, the content of the HEAD and BODY elements is not. For example, a TITLE is not allowed in the BODY, and a DL is not allowed in the HEAD.

This structure is flexible, but it makes some weird structures legal. There is no limit on the order or frequency of elements at the top level. So something like (ISINDEX, BODY, TITLE, HEAD, TITLE, BODY) is legal, though certainly discouraged. (Read: watch out, parser implementors!)

The PLAINTEXT tag signals the end of the HTML text entity, and the beginning of a non-SGML data entity. (The format of the data is governed by the MIME text/plain content type.)

Header Elements

TITLE

The title can have an '<' character, as long as it's not followed by a '/' and a letter. See the section on SGML delimiters in RCDATA.

Body Elements

Note that conforming SGML parsers will treat &, <, </, and <! as normal text characters when they are not followed by a letter. HTML producers are discouraged from taking advantage of this feature.
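The rule above can be sketched as a small scanner. This is a rough illustration in Python, not part of the DTD: the name find_markup_starts is made up for the example, and it only models the "followed by a letter" test stated here. It deliberately ignores other cases such as comment declarations and numeric character references.

```python
import re

# Delimiters that begin markup only when the very next character is a letter.
# Longer delimiters are listed first so "</" and "<!" win over a bare "<".
_MARKUP_START = re.compile(r"(</|<!|<|&)(?=[A-Za-z])")

def find_markup_starts(text):
    """Return (offset, delimiter) pairs where a delimiter is followed by a letter."""
    return [(m.start(), m.group(1)) for m in _MARKUP_START.finditer(text)]

# Every delimiter here is followed by a space or punctuation, so all of it is data.
print(find_markup_starts("a < b & c </ d <! e"))

# Here "&T" looks like the start of an entity reference.
print(find_markup_starts("AT&T"))
```

A producer that wants to stay on the safe side would escape every '&' and '<' in text rather than rely on what follows them, which is what the advice above amounts to.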

The normal text content of body elements may include several kinds of markup.

A comment that you shouldn't see: For copyrights, RCS keywords, etc.

Inline Markup

We can mix classical TeX emphasis, typewriter, and the TeXinfo strong emphasis all together in one ransom note -- er, paragraph.

Use the expression variable := f(z); to compute, for example, this sample text.

Then type this keyboard text. The excellent book Dirk Gently's Holistic Detective Agency chronicles the use of raw bold, italic, and markup.

ISO Latin-1 Characters

The HTML DTD references the public text "ISO 8879:1986//ENTITIES Added Latin 1//EN" to define entities for Latin-1 characters, for example Gödel was a famous mathematician. See also: all the Latin 1 entities.
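As a quick illustration of what such an entity reference expands to, here is a sketch using Python's standard html module. The wrapper name expand_entities is made up for the example. Note the assumption: html.unescape uses a later and much larger entity table than the ISO 8879 Added Latin 1 set, so it shows the idea rather than the DTD's exact entity set.

```python
import html

def expand_entities(text):
    """Replace named and numeric character references with the characters they denote."""
    return html.unescape(text)

# "&ouml;" names the letter o with a diaeresis.
print(expand_entities("G&ouml;del was a famous mathematician."))
```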

Anchors

Order and Appearance of Attributes

name implied

HREF implied

HREF before name

Note: Interpretation of Literals

Section 7.9.3 of the SGML standard states

An attribute value literal is interpreted as an attribute value by replacing references within it, ignoring Ee and RS, and replacing RE or SEPCHAR with SPACE.

For the SGML-impaired, Ee is Entity End (like EOF); RS is '\n'; RE is '\r'; SEPCHAR is '\t' and SPACE is ' '.

Note that this creates an interaction between the syntax of URLs and SGML syntax. We could resolve this issue by removing '&' from the URL syntax.
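The interpretation rule quoted above, and the way it collides with '&' in URLs, can be sketched roughly as follows. This is an illustration only. The function name interpret_literal and the tiny ENTITIES table are hypothetical, and the handling of unknown names (left untouched) is a choice made for the example. Ee has no counterpart in a Python string, so it is not modelled.

```python
import re

RS = "\n"       # record start: ignored
RE = "\r"       # record end: becomes a space
SEPCHAR = "\t"  # separator character: becomes a space

# Hypothetical, deliberately tiny entity table for the example.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# One pass over the literal, so characters produced by a reference
# are not themselves treated as RS, RE, or SEPCHAR.
_TOKEN = re.compile(r"&#(\d+);?|&([A-Za-z][A-Za-z0-9.-]*);?|[\n\r\t]")

def interpret_literal(literal):
    """Turn an attribute value literal into an attribute value."""
    def replace(match):
        number, name = match.group(1), match.group(2)
        if number is not None:
            return chr(int(number))
        if name is not None:
            # Unknown names are left as written; a real parser would complain.
            return ENTITIES.get(name, match.group(0))
        return "" if match.group(0) == RS else " "
    return _TOKEN.sub(replace, literal)

# A line break inside a literal collapses to a single space.
print(repr(interpret_literal("a\r\nb\tc")))

# The URL problem: "&lt" in a query string is read as an entity reference.
print(interpret_literal("/search?a=1&lt=2"))
```

Writing the query separator as &amp; in the literal avoids the second surprise, at the cost of making URLs in attributes look different from URLs elsewhere.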

Headings

Six levels of headings are defined:

Level four heading

Another level four heading. It's long. It's only conventional and suggested that lines be less than 72 characters long. It's certainly not specified, defined, or required.

Level five heading

Level six heading

Paragraphs

Normal paragraphs consist of words, sentences, and other stuff. Line breaks are not significant. This is still the first paragraph of this section.

Here's the second paragraph. It's long. It's only conventional and suggested that lines be less than 72 characters long. It's certainly not specified, defined, or required.

A P tag isn't needed between a paragraph and some other element, like a heading.

Ordered lists

These are for things like lists of steps, where the order is significant.

This is the first item of an ordered list.
This is the second item. It's kinda long, and should wrap around on most screens.
This is the third item.
This is the fourth and final item.

Case of names is not significant: different cases

Case of names is not significant: both lower case

PRE

Anything you could put on a typewriter (or an ASCII display
device, more precisely) can be represented in a PRE
element:

Tags: <start> </end>
Character references: &#60; &#38;

Tables made from tabs:

col 1	col 2	col 3	col 4
1		3	4
	2	3	4
1	2	3	4

Plus, you can use hypertext links.

Linebreaks _are_ significant. There should be three blank lines from here



to here.

The ASCII Horizontal Tab (HT) character should be interpreted as the smallest positive number of spaces that leaves the number of characters so far on the line a multiple of 8. Its use is not recommended, however.
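The tab rule is easy to get off by one, so here is a small sketch of it. The names spaces_for_tab and expand_tabs are made up for the example, and the code assumes a single line with no line breaks in it.

```python
def spaces_for_tab(column, tab_width=8):
    """Spaces an HT expands to when `column` characters are already on the line."""
    # Always at least one space: a tab at a multiple of 8 advances a full 8.
    return tab_width - (column % tab_width)

def expand_tabs(line, tab_width=8):
    """Expand every HT in a single line according to the rule above."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            n = spaces_for_tab(column, tab_width)
            out.append(" " * n)
            column += n
        else:
            out.append(ch)
            column += 1
    return "".join(out)

print(expand_tabs("col 1\tcol 2\tcol 3\tcol 4"))
print(expand_tabs("1\t\t3\t4"))
```

For single lines this should agree with Python's built-in str.expandtabs(8), which is handy for cross-checking.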

Literal Text Elements

XMP and LISTING

These elements are used when you want to type the characters into the source document and have them show up in the output just like you typed them.

These elements act much like the PRE element, but only entity references and end tags are recognized.

You can draw pictures    /\
in example elements     /  \
see:                    \__/

This is literal text.
THIS word should line up under
THIS word.

There should be exactly three blank lines between here



and here.

These elements are the source of the most errors in HTML implementations. They should be used only for simple examples that don't contain SGML markup constructs.

Comment declaration as data follows:

Markup declaration as data follows: <!this would be a markup declaration, which would be an error in PCDATA. It's data in RCDATA.>

Start tag follows: <start> tags are fine!
</end> tags can be done with entities.
& as long as it's not followed by a letter or '#', it's fine!
&# is even ok, unless it's followed by a letter or a number.

Tabs in XMP content:
This is literal text with tabs.
THESE words should line up under
THESE words.