The recommended usage is
incomplete; it only includes those constructs that are easy to
implement and explain. This section discusses a few more constructs
that allow you to do anything that can legally be done. There are constructs beyond these, but they
can all be reduced to constructs shown here.
The DTD actually allows a superset of the recommended (HEAD, BODY)
structure. In fact, the structure of the HTML document element is just
a list of elements and/or text characters.
While the content of the HTML element is somewhat arbitrary, the
content of the HEAD and BODY elements is not. For example, a TITLE
is not allowed in the BODY, and a DL is not allowed in the HEAD.
This structure is flexible, but it makes some weird structures legal.
There is no limit on the order or frequency of elements at the top
level. So something like (ISINDEX, BODY, TITLE, HEAD, TITLE, BODY) is
legal, though certainly discouraged. (read: watch out, parser
implementors!)
The PLAINTEXT tag signals the end of the HTML text entity, and
the beginning of a non-SGML data entity. (The format of the data
is governed by the MIME text/plain content type.)
Header Elements
The title can have an '<' character, as long as it's not followed
by a '/' and a letter. See the
section on SGML delimiters in RCDATA.
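The rule above can be sketched as a small check. This is only an illustration, not part of the DTD: the function name title_text_is_safe is hypothetical, and the check assumes that "letter" means an ASCII letter.

```python
import re

# In RCDATA content such as a TITLE, "</" acts as the end-tag open
# delimiter only when a letter follows it; a lone "<" is plain data.
_END_TAG_OPEN = re.compile(r"</[A-Za-z]")


def title_text_is_safe(text: str) -> bool:
    """Return True if text contains no "</" followed by a letter."""
    return _END_TAG_OPEN.search(text) is None


print(title_text_is_safe("x < y"))     # a bare "<" is fine
print(title_text_is_safe("a </ b"))    # "</" followed by a space is fine
print(title_text_is_safe("oops </b>")) # "</b" would be read as an end tag
```

A producer could run such a check on title text before emitting it, and fall back to an entity reference for '<' when the check fails.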
Body Elements
Note that conforming SGML parsers will treat &,
<, </, and <! as
normal text characters when they are not followed by a letter. HTML
producers are discouraged from taking advantage of this feature.
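A producer that does not want to rely on this feature can simply escape every '&' and '<' in body text. The sketch below uses Python's standard html.escape for that. The wrapper name escape_body_text is hypothetical, and note that html.escape also rewrites '>' as an entity reference, which is harmless here.

```python
import html


def escape_body_text(text: str) -> str:
    """Replace &, < and > with entity references; leave quotes alone."""
    # quote=False keeps " and ' untouched, since this is element
    # content rather than an attribute value.
    return html.escape(text, quote=False)


print(escape_body_text("if a < b & c"))  # prints: if a &lt; b &amp; c
```

With every delimiter character escaped, the output no longer depends on what happens to follow a '&' or '<' in the text.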
The normal text content of body elements may include several kinds of
markup.
A comment that you shouldn't see: For copyrights, RCS keywords, etc.
Inline Markup
We can mix classical TeX emphasis, typewriter,
and the TeXinfo strong emphasis all together in one
ransom note -- er, paragraph.
Use the expression variable := f(z);
to
compute, for example, this sample text.
Then type this keyboard text. The excellent book Dirk Gently's Holistic Detective
Agency chronicles the use of raw bold, italic,
and markup.
ISO Latin-1 Characters
The HTML DTD references the public text "ISO 8879:1986//ENTITIES Added
Latin 1//EN" to define entities for Latin-1 characters, for example
Gödel was a famous mathematician. See also: all the Latin 1 entities.
Anchors
Order and Appearance of Attributes
name implied
HREF implied
HREF before name
Note: Interpretation of Literals
Section 7.9.3 of the SGML standard states
- An attribute value literal is interpreted as an attribute value by
replacing references within it, ignoring Ee and RS, and replacing RE
or SEPCHAR with SPACE.
For the SGML-impaired, Ee is Entity End (like EOF); RS is '\n'; RE is
'\r'; SEPCHAR is '\t' and SPACE is ' '.
Note that this creates an interaction between the syntax of URLs and
SGML syntax. We could resolve this issue by removing '&' from the
URL syntax.
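The interpretation rule quoted from section 7.9.3 can be sketched roughly as follows. Several things here are assumptions made for illustration: the function name interpret_literal is hypothetical, html.unescape stands in for real SGML reference replacement (it follows HTML entity rules, not a DTD), and the record characters are handled before references are replaced so that characters produced by a reference are not altered afterwards.

```python
import html

RS = "\n"       # record start
RE = "\r"       # record end
SEPCHAR = "\t"  # separator character
SPACE = " "


def interpret_literal(literal: str) -> str:
    """Approximate the interpretation of an attribute value literal."""
    # Ee (entity end) has no character form in a Python string,
    # so there is nothing to ignore for it here.
    value = literal.replace(RS, "")                          # ignore RS
    value = value.replace(RE, SPACE).replace(SEPCHAR, SPACE) # RE/SEPCHAR -> SPACE
    # Stand-in for replacing references within the literal.
    return html.unescape(value)


print(interpret_literal("two\r\nlines"))        # prints: two lines
print(interpret_literal("/cgi?a=1&amp;b=2"))    # prints: /cgi?a=1&b=2
```

The second call shows the interaction with URL syntax: a '&' that belongs to the URL (a made-up one here) has to be written as a reference in the literal, or the parser may try to read what follows it as an entity name.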
Headings
Six levels of headings are defined:
Level four heading
Another level four heading. It's long. It's only conventional and suggested that lines be less than 72 characters long. It's certainly not specified, defined, or required.
Level five heading
Level six heading
Paragraphs
Normal paragraphs consist of text consisting of words, sentences, and
other stuff.
Line breaks are not significant.
This is still the first
paragraph of this section.
Here's the second paragraph. It's long. It's only conventional and suggested that lines be less than 72 characters long. It's certainly not specified, defined, or required.
A P tag isn't needed between a paragraph and some other element, like
a heading.
Ordered lists
These are for things like lists of steps, where the order is
significant.
- This is the first item of an ordered list.
- This is the second item. It's kinda long, and should wrap around
on most screens.
- This is the third item.
- This is the fourth and final item.
Case of names is not significant: different cases
Case of names is not significant: both lower case
PRE
Anything you could put on a typewriter (or an ASCII display
device, more precisely) can be represented in a PRE
element:
Tags: <start> </end>
Character references: < &
Tables made from tabs:
col 1 col 2 col 3 col 4
1 3 4
2 3 4
1 2 3 4
Plus, you can use hypertext links.
Linebreaks _are_ significant. There should be three blank lines from here
to here.
The ASCII Horizontal Tab (HT) character should be interpreted as the
smallest positive number of spaces which will leave the number
of characters so far on the line as a multiple of 8. Its use is not
recommended, however.
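The tab rule above amounts to padding to the next multiple of 8. A minimal sketch, assuming a single line with no embedded newlines and one column per character (the name expand_tabs and the tab_width parameter are illustrative only):

```python
def expand_tabs(line: str, tab_width: int = 8) -> str:
    """Expand HT characters to spaces using fixed tab stops."""
    out = []
    column = 0  # number of characters so far on the line
    for ch in line:
        if ch == "\t":
            # Smallest positive count that makes column a multiple
            # of tab_width; always between 1 and tab_width.
            n = tab_width - (column % tab_width)
            out.append(" " * n)
            column += n
        else:
            out.append(ch)
            column += 1
    return "".join(out)


print(repr(expand_tabs("a\tb")))  # prints: 'a       b'
```

For a single line this gives the same result as Python's built-in str.expandtabs(8). A tab that falls exactly on a tab stop still advances a full 8 columns, which is what "smallest positive" implies.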
Literal Text Elements
XMP and LISTING
These elements are used when you want to type the characters into the
source document and have them show up in the output just like you
typed them.
These elements act much like the PRE element, but only entity
references and end tags are recognized.
You can draw pictures /\
in example elements / \
see: \__/
This is literal text. THIS word
should line up under THIS word.
There should be exactly three blank lines between here
and here.
These elements are the source of the most errors in HTML
implementations. They should be used only for simple examples that
don't contain SGML markup constructs.
Comment declaration as data follows:
Markup declaration as data follows:
Start tag follows:
tags are fine! </end> tags can be done with entities.
& as long as it's not followed by a letter or '#', it's fine!
&# is even ok, unless it's followed by a letter or a number.
Tabs in XMP content:
This is literal text with tabs. THESE words
should line up under THESE words.
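The notes above on '&' in literal text suggest a simple way to locate the places where a parser would start reading a reference. The sketch below reads those notes as: '&' opens a reference when a letter follows it, and '&#' opens one when a letter or a digit follows it. That reading, the function name find_reference_opens, and the restriction to ASCII letters and digits are all assumptions.

```python
import re

# "&" + letter, or "&#" + letter or digit, is taken as the start
# of a reference; any other "&" is treated as plain data.
_REFERENCE_OPEN = re.compile(r"&(?:[A-Za-z]|#[A-Za-z0-9])")


def find_reference_opens(text: str) -> list[int]:
    """Return the index of each '&' that would open a reference."""
    return [m.start() for m in _REFERENCE_OPEN.finditer(text)]


print(find_reference_opens("a & b"))   # prints: []
print(find_reference_opens("AT&T"))    # prints: [2]
print(find_reference_opens("&#60;"))   # prints: [0]
```

An author could use a check like this to find ampersands in example text that need to be written as entity references.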