Hypertext Markup Language - 2.0


There are a number of syntactic idioms that are not supported or are supported inconsistently in some historical user agent implementations. These idioms are identified in notes like this throughout this specification.


The document character set is somewhat independent of the character encoding scheme used to represent a document. For example, the `ISO-2022-JP' character encoding scheme can be used for HTML documents, since its repertoire is a subset of the [ISO-10646] repertoire. The critical distinction is that numeric character references agree with [ISO-10646] regardless of how the document is encoded.
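
The distinction can be illustrated with a minimal sketch. It assumes Python's standard `iso2022_jp' codec and `html.unescape' as stand-ins for a user agent's decoding and reference-resolution steps; the function name `decode_document' is hypothetical and not part of this specification.

```python
import html

def decode_document(raw: bytes, encoding: str) -> str:
    """Decode the bytes with the given encoding scheme, then resolve
    character references against the document character set."""
    text = raw.decode(encoding)
    # html.unescape maps &#N; to the character with code point N,
    # independent of the encoding scheme that carried the bytes.
    return html.unescape(text)

# First character is literal; the second is a numeric character reference.
source = "\u65e5&#26412;"
as_jis = source.encode("iso2022_jp")
as_utf8 = source.encode("utf-8")

# The byte sequences differ, but the resolved text is the same.
print(as_jis != as_utf8)
print(decode_document(as_jis, "iso2022_jp") == decode_document(as_utf8, "utf-8"))
```

The numeric reference is written identically in both byte streams; only the representation of the literal character changes with the encoding scheme.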


In the interest of robustness and extensibility, there are a number of widely deployed conventions for handling non-conforming documents. See section Undeclared Markup Error Handling for details.


To support non-western writing systems, HTML user agents are encouraged to support `ISO-10646-UCS-2' or similar character encoding schemes and as much of the character repertoire of [ISO-10646] as is practical.


There are SGML mechanisms, CDATA and RCDATA declared content, that allow most `<', `>', and `&' characters to be entered without the use of entity references. Because these mechanisms tend to be used and implemented inconsistently, and because they conflict with techniques for reducing HTML to 7 bit ASCII for transport, they are deprecated in this version of HTML. See section Example and Listing: XMP, LISTING.


The SGML declaration for HTML specifies SHORTTAG YES, which means that there are other valid syntaxes for tags, such as NET tags, `<EM/.../'; empty start-tags, `<>'; and empty end-tags, `</>'. Until support for these idioms is widely deployed, their use is strongly discouraged.


Some historical implementations consider any occurrence of the `>' character to signal the end of a tag. For compatibility with such implementations, when `>' appears in an attribute value, it should be represented with a numeric character reference or an entity reference. For example, `<IMG SRC="eq1.jpg" alt="a>b">' should be written `<IMG SRC="eq1.jpg" alt="a&#62;b">' or `<IMG SRC="eq1.jpg" alt="a&gt;b">'.
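
An authoring tool could apply this rule mechanically. The following is a sketch only: the function name `escape_attribute_value' is hypothetical, and escaping `&', `"', and `<' alongside `>' is an assumption made so that the result is safe inside a double-quoted attribute value.

```python
def escape_attribute_value(value: str) -> str:
    """Escape a string for use inside a double-quoted attribute value."""
    # '&' is replaced first so that the references added by the later
    # replacements are not escaped a second time.
    return (value.replace("&", "&#38;")
                 .replace('"', "&#34;")
                 .replace("<", "&#60;")
                 .replace(">", "&#62;"))

tag = '<IMG SRC="eq1.jpg" alt="%s">' % escape_attribute_value("a>b")
print(tag)  # <IMG SRC="eq1.jpg" alt="a&#62;b">
```

With the value escaped this way, the only `>' left in the tag is the one that closes it.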


Some historical implementations allow any character except space or `>' in a name token.


Some historical implementations only understand the minimized syntax.


Some historical HTML implementations incorrectly consider any `>' character to be the termination of a comment.


If the body of a `text/html' message entity does not begin with a document type declaration, an HTML user agent should infer the above document type declaration.


The length of a title is not limited; however, long titles may be truncated in some applications. To minimize this possibility, titles should be fewer than 64 characters.
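
As a sketch, an authoring tool might check titles against this guideline before publishing. The names `MAX_TITLE_LENGTH' and `title_is_safe' are hypothetical.

```python
# Titles should be fewer than this many characters to avoid truncation.
MAX_TITLE_LENGTH = 64

def title_is_safe(title: str) -> bool:
    """Return True if the title is short enough to be unlikely to be truncated."""
    return len(title) < MAX_TITLE_LENGTH

print(title_is_safe("Hypertext Markup Language - 2.0"))
```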


The META element should not be used where a specific element, such as TITLE, would be more appropriate. Rather than a META element with a URI as the value of the CONTENT attribute, use a LINK element.


The method by which the server extracts document meta-information is unspecified and not mandatory. The META element only provides an extensible mechanism for identifying and embedding document meta-information -- how it may be used is up to the individual server implementation and the HTML user agent.


References to the "beginning of a new line" do not imply that the renderer is forbidden from using a constant left indent for rendering preformatted text. The left indent may be constrained by the width required.


Constraints on the processing of PRE content may limit or prevent the ability of the HTML user agent to faithfully render phrase markup.


Some historical documents contain P tags in PRE elements. User agents are encouraged to treat this as a line break. A P tag followed by a newline character should produce only one line break, not a line break plus a blank line.
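
One way a user agent might implement this treatment is sketched below. The function name `normalize_pre_content' is hypothetical, and the pattern assumes the P start-tag carries no attributes.

```python
import re

# A P start-tag in either case, together with the newline that follows
# it, if there is one.
P_TAG = re.compile(r"<p>(\r\n|\r|\n)?", re.IGNORECASE)

def normalize_pre_content(content: str) -> str:
    """Replace each P tag in PRE content with a single line break."""
    # The optional newline is consumed with the tag, so a tag followed
    # by a newline yields one line break rather than two.
    return P_TAG.sub("\n", content)

print(normalize_pre_content("line one<P>\nline two"))
```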


In a previous draft of the HTML specification, the syntax of XMP and LISTING elements allowed closing tags to be treated as data characters, as long as the tag name was not XMP or LISTING, respectively.


In a previous draft, HTML included a PLAINTEXT element that is similar to the LISTING element, except that there is no closing tag: all characters after the PLAINTEXT start-tag are data.


User agents may support the DFN element, not included in this specification, as it has been deployed to some extent. It is used to indicate the defining instance of a term, and it is typically rendered in italic or bold italic.


User agents may support some typographic elements not included in this specification, as they have been deployed to some extent. The STRIKE element indicates a horizontal line through the characters, and the U element indicates an underline.


Some HTML user agents can process graphics linked via anchors, but not IMG graphics. If a graphic is essential, it should be referenced from an A element rather than an IMG element. If the graphic is not essential, then the IMG element is appropriate.


In practice, the media types of image resources are limited to a few raster graphic formats, typically `image/gif' and `image/jpeg'. In particular, `text/html' resources are not intended to be used as image resources.


Use of the non-breaking space and soft hyphen indicator characters is discouraged because support for them is not widely deployed.


To support non-western writing systems, a larger character repertoire will be specified in a future version of HTML. The document character set will be [ISO-10646], or some subset that agrees with [ISO-10646]; in particular, all numeric character references must use code positions assigned by [ISO-10646].


The URI from a query form submission can be used in a normal anchor style hyperlink. Unfortunately, the use of the `&' character to separate form fields interacts with its use in SGML attribute values as an entity reference delimiter. For example, the URI `http://host/?x=1&y=2' must be written `<a href="http://host/?x=1&#38;y=2">' or `<a href="http://host/?x=1&amp;y=2">'. HTTP server implementors, and in particular CGI implementors, are encouraged to support the use of `;' in place of `&' to save users the trouble of escaping `&' characters this way.
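
Both halves of this note can be sketched briefly. The first function escapes a URI for use in an HREF attribute using Python's standard `html.escape'; the second shows how a server-side script might accept either `;' or `&' as the field separator. The names `href_attribute' and `split_query' are hypothetical, and decoding of URL-encoded characters is omitted for brevity.

```python
import html

def href_attribute(uri: str) -> str:
    """Build an HREF attribute, escaping '&' so that it is not read as
    an entity reference delimiter."""
    return 'href="%s"' % html.escape(uri, quote=True)

def split_query(query: str) -> list:
    """Split a query string into (name, value) pairs, treating ';' and
    '&' as equivalent field separators."""
    pairs = []
    for field in query.replace(";", "&").split("&"):
        if field:
            name, _, value = field.partition("=")
            pairs.append((name, value))
    return pairs

print(href_attribute("http://host/?x=1&y=2"))
print(split_query("x=1;y=2") == split_query("x=1&y=2"))
```

A URI written with `;' as the separator contains nothing that needs escaping, so it can be placed in an attribute value unchanged.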


The URL encoding may result in very long URIs, which cause some historical HTTP server implementations to exhibit defective behavior. As a result, some HTML forms are written using `METHOD=POST' even though the form submission has no side-effects.