6 Basic HTML data types

This section of the specification describes the basic data types that may appear as an element's content or an attribute's value.

For introductory information about reading the HTML DTD, please consult the SGML tutorial.

6.1 Case information

Each attribute definition includes information about the case-sensitivity of its values. The case information is presented with the following keys:

CS: The value is case-sensitive (i.e., user agents interpret "a" and "A" differently).
CI: The value is case-insensitive (i.e., user agents interpret "a" and "A" as the same).
CN: The value is not subject to case changes, e.g., because it is a number or a character from the document character set.
CA: The element or attribute definition itself gives case information.
CT: Consult the type definition for details about case-sensitivity.

If an attribute value is a list, the keys apply to every value in the list, unless otherwise indicated.

6.2 SGML basic types

The document type declaration specifies the syntax of HTML element content and attribute values using SGML tokens (e.g., PCDATA, CDATA, NAME, and ID). Information on what these tokens mean and how user agents should interpret them may be found in [GOLD90].

The following is a summary of key information:

CDATA is a sequence of characters from the document character set and may include character entities. User agents should interpret attribute values as follows:
- Replace character entities with characters,
- Ignore line feeds,
- Replace each carriage return or tab with a single space.
User agents may ignore leading and trailing white space in CDATA attribute values (e.g., " myval " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.
For some HTML 4.0 attributes with CDATA attribute values, the specification imposes further constraints on the set of legal values for the attribute that may not be expressed by the DTD.
Although the STYLE and SCRIPT elements use CDATA for their data model, user agents must handle CDATA differently for these elements. Markup and entities must be treated as raw text and passed to the application as is. The first occurrence of the character sequence "</" (end-tag open delimiter) is treated as terminating the element's content. In valid documents, this would be the end tag for the element.
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
IDREF and IDREFS are references to ID tokens defined by other attributes. IDREF is a single token and IDREFS is a space-separated list of tokens.
NUMBER tokens must contain at least one digit ([0-9]).
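
The rules summarized above can be sketched in Python. This is an illustrative sketch, not a normative algorithm: the function and pattern names are hypothetical, the steps are applied in the order listed, leading and trailing white space is stripped by default (which the text permits but does not require), and `html.unescape` recognizes a larger entity set than HTML 4.0 defines. NUMBER is read here as "digits only", which is an assumption beyond the stated "at least one digit".

```python
import html
import re

# ID and NAME tokens: a letter, then letters, digits, "-", "_", ":" or ".".
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.\-]*")
# NUMBER tokens: at least one digit (assumed here to be digits only).
NUMBER_TOKEN = re.compile(r"[0-9]+")


def normalize_cdata_attribute(raw: str, strip: bool = True) -> str:
    """Apply the three CDATA attribute-value steps, in the listed order."""
    value = html.unescape(raw)             # replace character entities
    value = value.replace("\n", "")        # ignore line feeds
    value = re.sub(r"[\r\t]", " ", value)  # each CR or tab becomes one space
    return value.strip() if strip else value


def is_name_token(token: str) -> bool:
    """True if the whole string is a legal ID or NAME token."""
    return NAME_TOKEN.fullmatch(token) is not None


def is_number_token(token: str) -> bool:
    """True if the whole string is a NUMBER token under the assumption above."""
    return NUMBER_TOKEN.fullmatch(token) is not None


def split_idrefs(value: str) -> list:
    """IDREFS is a space-separated list of tokens."""
    return value.split()
```

For example, `normalize_cdata_attribute(" myval ")` yields `"myval"`, and `is_name_token("1abc")` is false because the token does not begin with a letter.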

6.3 Text strings

A number of attributes (%Text; in the DTD) take text that is meant to be "human readable". For introductory information about attributes, please consult the tutorial discussion of attributes.

6.4 URLs

This specification uses the term URL for the general case of resource identifiers called "URI" in [RFC1630], including the term "URL" as defined in [RFC1738] and [RFC1808], and the term "URN" as defined in [RFC2141].

URLs are represented in the DTD by the parameter entity %URL.

URLs in general are case-sensitive. There may be URLs, or parts of URLs, where case doesn't matter (e.g., machine names), but identifying these may not be easy. Users should always assume that URLs are case-sensitive.

Relative URLs are resolved to full URLs using a base URL. [RFC1808] defines the normative algorithm for this process. For more information about base URLs, please consult the section on base URLs in the chapter on links.
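
As a rough illustration of resolving a relative URL against a base URL, the following sketch uses `urljoin` from the Python standard library. Note the assumptions: `urljoin` follows the semantics of the later RFC 3986 rather than [RFC1808], so it is not the normative algorithm cited above, although the two agree on simple cases such as these. The host name and paths are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base URL of the current document.
base = "http://www.example.com/docs/chapter1/intro.html"

# A sibling document in the same directory as the base.
sibling = urljoin(base, "notes.html")

# A document one directory level up from the base.
parent_level = urljoin(base, "../images/logo.gif")

print(sibling)       # http://www.example.com/docs/chapter1/notes.html
print(parent_level)  # http://www.example.com/docs/images/logo.gif
```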

Please consult the appendix for information about representing non-ASCII characters in URLs.

6.5 Colors

The attribute value type "color" (%Color) refers to color definitions as specified in [SRGB]. A color value may either be a hexadecimal number (prefixed by a hash mark) or one of the following sixteen color names. The color names are case-insensitive.

**Color names and sRGB values**
	Black = "#000000"		Green = "#008000"
	Silver = "#C0C0C0"		Lime = "#00FF00"
	Gray = "#808080"		Olive = "#808000"
	White = "#FFFFFF"		Yellow = "#FFFF00"
	Maroon = "#800000"		Navy = "#000080"
	Red = "#FF0000"		Blue = "#0000FF"
	Purple = "#800080"		Teal = "#008080"
	Fuchsia = "#FF00FF"		Aqua = "#00FFFF"

Thus, the color values "#800080" and "Purple" both refer to the color purple.
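
The equivalence of names and hexadecimal values can be sketched as a small lookup. This is a minimal sketch under stated assumptions: the function name `parse_color` is hypothetical, and hexadecimal values are assumed to have exactly six digits, as in the table above.

```python
import re

# The sixteen color names from the table, keyed in lower case
# because color names are case-insensitive.
COLOR_NAMES = {
    "black": "#000000", "green": "#008000",
    "silver": "#C0C0C0", "lime": "#00FF00",
    "gray": "#808080", "olive": "#808000",
    "white": "#FFFFFF", "yellow": "#FFFF00",
    "maroon": "#800000", "navy": "#000080",
    "red": "#FF0000", "blue": "#0000FF",
    "purple": "#800080", "teal": "#008080",
    "fuchsia": "#FF00FF", "aqua": "#00FFFF",
}


def parse_color(value: str):
    """Return an (r, g, b) tuple for a color name or "#RRGGBB", else None."""
    hex_value = COLOR_NAMES.get(value.lower(), value)
    if re.fullmatch(r"#[0-9A-Fa-f]{6}", hex_value) is None:
        return None
    return tuple(int(hex_value[i:i + 2], 16) for i in (1, 3, 5))


print(parse_color("Purple") == parse_color("#800080"))  # True
```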

6.5.1 Notes on using colors

Although colors can add significant amounts of information to documents and make them more readable, please consider the following guidelines when including color in your documents:

The use of HTML elements and attributes for specifying color is deprecated. You are encouraged to use style sheets instead.
Don't use color combinations that cause problems for people with color blindness in its various forms.
If you use a background image or set the background color, then be sure to set the various text colors as well.
Colors specified with the BODY and FONT elements and bgcolor on tables look different on different platforms (e.g., workstations, Macs, Windows, and LCD panels vs. CRTs), so you shouldn't rely entirely on a specific effect. In the future, support for the [SRGB] color model together with ICC color profiles should mitigate this problem.
When practical, adopt common conventions to minimize user confusion.

6.6 Lengths

HTML specifies three types of length values for attributes:

Pixel: The value (%Pixels in the DTD) is an integer that represents the number of pixels of the canvas (screen, paper). Thus, the value "50" means fifty pixels. For normative information about the definition of a pixel, please consult [CSS1].
Length: The value (%Length in the DTD) may be either a %Pixels; or a percentage of the available horizontal or vertical space. Thus, the value "50%" means half of the available space.
MultiLength: The value (%MultiLength in the DTD) may be a %Length; or a relative length. A relative length has the form "i*", where "i" is an integer. When allotting space among elements competing for that space, user agents allot pixel and percentage lengths first, then divide up remaining available space among relative lengths. Each relative length receives a portion of the available space that is proportional to the integer preceding the "*". The value "*" is equivalent to "1*". Thus, if 60 pixels of space are available after the user agent allots pixel and percentage space, and the competing relative lengths are 1*, 2*, and 3*, the 1* will be allotted 10 pixels, the 2* will be allotted 20 pixels, and the 3* will be allotted 30 pixels.

Length values are case-neutral.
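
The proportional division among relative lengths can be sketched as follows. The function names are hypothetical, and the sketch covers only the final step: it assumes that pixel and percentage lengths have already been allotted and that `remaining` is the space left over. How a user agent rounds fractional pixels is not addressed here.

```python
import re
from typing import List


def relative_weight(value: str) -> int:
    """Return the integer i of a relative length "i*"; "*" counts as "1*"."""
    match = re.fullmatch(r"([0-9]*)\*", value)
    if match is None:
        raise ValueError(f"not a relative length: {value!r}")
    return int(match.group(1)) if match.group(1) else 1


def allot_relative(remaining: int, values: List[str]) -> List[float]:
    """Divide the remaining space in proportion to each relative length."""
    weights = [relative_weight(v) for v in values]
    total = sum(weights)
    if total == 0:
        # Every weight is zero, so there is nothing to divide proportionally.
        return [0.0] * len(weights)
    return [remaining * w / total for w in weights]


print(allot_relative(60, ["1*", "2*", "3*"]))  # [10.0, 20.0, 30.0]
```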

6.7 Content types (MIME types)

Note. A media type (defined in [RFC2045] and [RFC2046]) specifies the nature of a linked resource. This specification employs the term content type rather than media type in accordance with current usage.

This type is represented in the DTD by %ContentType.

Content types are case-insensitive.

Examples of content types include "text/html", "image/png", "image/gif", "video/mpeg", "audio/basic", "text/tcl", "text/javascript", and "text/vbscript". For the current list of registered MIME types, please consult [MIMETYPES].

Note. The content type "text/css", while not currently registered with IANA, should be used when the linked resource is a [CSS1] style sheet.

6.8 Language codes

The value of attributes whose type is a language code (%LanguageCode in the DTD) refers to a language code as specified by [RFC1766]. For information on specifying language codes in HTML, please consult the section on language codes. White space is not allowed within a language code.

6.9 Character encodings

The "charset" attributes (%Charset in the DTD) refer to a character encoding as described in the section on character encodings. Values must be strings (e.g., "euc-jp") from the IANA registry (see [CHARSETS] for a complete list). Names for character encodings are case-insensitive.

User agents must follow the steps set out in the section on specifying character encodings in order to determine the character encoding of an external resource.

6.10 Single characters

Certain attributes call for a single character from the document character set. These attributes take the %Character type in the DTD.

Single characters may be specified with character references (e.g., "&amp;").

6.11 Dates and times

[ISO8601] allows many options and variations in the representation of dates and times. The current specification uses one of the formats described in the profile [DATETIME] for its definition of legal date/time strings (%Datetime in the DTD).

The format is:

  YYYY-MM-DDThh:mm:ssTZD

where:

     YYYY = four-digit year
     MM   = two-digit month (01=January, etc.)
     DD   = two-digit day of month (01 through 31)
     hh   = two digits of hour (00 through 23) (am/pm NOT allowed)
     mm   = two digits of minute (00 through 59)
     ss   = two digits of second (00 through 59)
     TZD  = time zone designator

The time zone designator is one of:

Z: indicates UTC (Coordinated Universal Time). The "Z" must be upper case.
+hh:mm: indicates that the time is a local time which is hh hours and mm minutes ahead of UTC.
-hh:mm: indicates that the time is a local time which is hh hours and mm minutes behind UTC.

Exactly the components shown here must be present, with exactly this punctuation. Note that the "T" appears literally in the string (it must be upper case), to indicate the beginning of the time element, as specified in [ISO8601].

If a generating application does not know the time to the second, it may use the value "00" for the seconds (and minutes and hours if necessary).

Note. [DATETIME] does not address the issue of leap seconds.
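
A syntax check for this format might look like the sketch below. The names are hypothetical, and several simplifications are assumed: months are limited to 01 through 12, days are checked only against the range 01 through 31 (not against the length of the given month), and the hours and minutes of the time zone offset are not range-checked.

```python
import re

DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"   # YYYY-MM-DD
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})"  # literal upper-case T, then hh:mm:ss
    r"(Z|[+-][0-9]{2}:[0-9]{2})"          # time zone designator
)


def is_legal_datetime(value: str) -> bool:
    """True if value has exactly the components and punctuation shown above."""
    match = DATETIME.fullmatch(value)
    if match is None:
        return False
    _year, month, day, hour, minute, second = (
        int(group) for group in match.groups()[:6]
    )
    return (
        1 <= month <= 12
        and 1 <= day <= 31
        and hour <= 23
        and minute <= 59
        and second <= 59
    )
```

Under these assumptions, "1997-07-16T19:20:30+01:00" is accepted, while "1997-07-16t19:20:30Z" (lower-case "t") and "1997-07-16T19:20Z" (missing seconds) are rejected.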

6.12 Link types

Authors may use the following recognized link types, listed here with their conventional interpretations. These are defined as being case-insensitive, i.e., "Alternate" has the same meaning as "alternate". In the DTD, %LinkTypes refers to a space-separated list of link types. White space characters are not permitted within link types.

User agents, search engines, etc. may interpret these link types in a variety of ways. For example, user agents may provide access to linked documents through a navigation bar.

Alternate: Designates substitute versions for the document in which the link occurs. When used together with the lang attribute, it implies a translated version of the document. When used together with the media attribute, it implies a version designed for a different medium (or media).
Stylesheet: Refers to an external style sheet. See the section on external style sheets for details. This is used together with the link type "Alternate" for user-selectable alternate style sheets.
Start: Refers to the first document in a collection of documents. This link type tells search engines which document is considered by the author to be the starting point of the collection.
Next: Refers to the next document in a linear sequence of documents. User agents may choose to preload the "next" document, to reduce the perceived load time.
Prev: Refers to the previous document in an ordered series of documents. Some user agents also support the synonym Previous.
Contents: Refers to a document serving as a table of contents. Some user agents also support the synonym ToC (from "Table of Contents").
Index: Refers to a document providing an index for the current document.
Glossary: Refers to a document providing a glossary of terms that pertain to the current document.
Copyright: Refers to a copyright statement for the current document.
Chapter: Refers to a document serving as a chapter in a collection of documents.
Section: Refers to a document serving as a section in a collection of documents.
Subsection: Refers to a document serving as a subsection in a collection of documents.
Appendix: Refers to a document serving as an appendix in a collection of documents.
Help: Refers to a document offering help (more information, links to other sources of information, etc.).
Bookmark: Refers to a bookmark. A bookmark is a link to a key entry point within an extended document. The title attribute may be used, for example, to label the bookmark. Note that several bookmarks may be defined in each document.

Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types. Please see the profile attribute of the HEAD element for more details.

For further discussions about link types, please consult the section on links in HTML documents.
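
Because link types form a space-separated, case-insensitive list, a value can be reduced to a list of lower-case tokens before comparison. The sketch below is illustrative only: the function names are hypothetical, the synonyms "Previous" and "ToC" are deliberately left out of the recognized set, and `str.split()` separates on any white space, which is broader than the space character alone.

```python
from typing import List

# Link types listed in this section, folded to lower case.
RECOGNIZED_LINK_TYPES = {
    "alternate", "stylesheet", "start", "next", "prev", "contents",
    "index", "glossary", "copyright", "chapter", "section",
    "subsection", "appendix", "help", "bookmark",
}


def parse_link_types(value: str) -> List[str]:
    """Split a %LinkTypes value on white space and fold case."""
    return [token.lower() for token in value.split()]


def recognized_link_types(value: str) -> List[str]:
    """Keep only the link types described in this section."""
    return [t for t in parse_link_types(value) if t in RECOGNIZED_LINK_TYPES]


print(parse_link_types("Alternate StyleSheet"))  # ['alternate', 'stylesheet']
```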

6.13 Media descriptors

The following is a list of recognized media descriptors (%MediaDesc in the DTD).

screen: Intended primarily for non-paged computer screens, but also applicable to printed and projected presentations.
tty: Intended for media using a fixed-pitch character grid, such as teletypes, terminals, or portable devices with limited display capabilities.
tv: Intended for television-type devices (low resolution, color, limited scrollability).
projection: Intended for projectors.
handheld: Intended for handheld devices (small screen, monochrome, bitmapped graphics, limited bandwidth).
print: Intended for paged, opaque material and for documents viewed on screen in print preview mode.
braille: Intended for braille tactile feedback devices.
aural: Intended for speech synthesizers.
all: Suitable for all devices.

Future versions of HTML may introduce new values and may allow parameterized values. To facilitate the introduction of these extensions, user agents conforming to this specification must be able to parse the media attribute value as follows:

Comma characters (Unicode decimal 44) are used to break the media attribute value into a list of entries, e.g.:
```
media="screen, 3d-glasses, print and resolution > 90dpi"
```
is mapped to:
```
"screen"
"3d-glasses"
"print and resolution > 90dpi"
```
Each entry is truncated just before the first character that isn't a US ASCII letter [a-zA-Z] (Unicode decimal 65-90, 97-122), digit [0-9] (Unicode decimal 48-57), or hyphen (Unicode decimal 45). In the example, this gives:
```
"screen"
"3d-glasses"
"print"
```
A case-sensitive match is then made with the set of media types defined above. User agents may ignore entries that don't match. In the example we are left with screen and print.
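
The three parsing steps can be sketched as follows. The function name `parse_media` is hypothetical, and one assumption is made explicit: white space around each entry is dropped before truncation, as the mapped entries in the example suggest. Entries that match no known media type are ignored, which the text permits.

```python
import re
from typing import List

MEDIA_TYPES = {
    "screen", "tty", "tv", "projection", "handheld",
    "print", "braille", "aural", "all",
}


def parse_media(value: str) -> List[str]:
    """Return the recognized media descriptors in a media attribute value."""
    recognized = []
    for entry in value.split(","):           # step 1: break on commas
        entry = entry.strip()                # assumption: trim white space
        # Step 2: keep the leading run of ASCII letters, digits and hyphens.
        name = re.match(r"[A-Za-z0-9-]*", entry).group(0)
        if name in MEDIA_TYPES:              # step 3: case-sensitive match
            recognized.append(name)
    return recognized


print(parse_media("screen, 3d-glasses, print and resolution > 90dpi"))
# ['screen', 'print']
```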

Note. Style sheets may include media-dependent variations within them (e.g., the CSS @media construct). In such cases it may be appropriate to use "media=all".

6.14 Script data

The content of the SCRIPT element and the value of intrinsic event attributes is script data (indicated by %Script; in the DTD). As such, this data must not be evaluated by the user agent as HTML markup. The user agent must pass it on as data to a script engine. The case-sensitivity of script data depends on the scripting language.

HTML parsers must be able to recognize script data as beginning immediately after the start tag and ending as soon as the ETAGO ("</") delimiters are followed by a name character ([a-zA-Z]). The script data does not necessarily end with the </SCRIPT> end tag, but is terminated by any "</" followed by a name character.
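
The termination rule can be sketched as a search for the first "</" that is followed by a letter. The names below are hypothetical, and the input is assumed to be the text that immediately follows the SCRIPT start tag.

```python
import re

# ETAGO ("</") followed by a name character ([a-zA-Z]).
SCRIPT_DATA_END = re.compile(r"</[A-Za-z]")


def script_data(text_after_start_tag: str) -> str:
    """Return the script data: everything before the terminating "</"."""
    match = SCRIPT_DATA_END.search(text_after_start_tag)
    if match is None:
        return text_after_start_tag  # no terminator found in this text
    return text_after_start_tag[:match.start()]


# An unescaped end tag inside a string literal ends the script data early.
broken = 'document.write ("<EM>text</EM>")\n</SCRIPT>'
# Writing "<\/" keeps "</" from being followed by a name character.
escaped = 'document.write ("<EM>text<\\/EM>")\n</SCRIPT>'

print(repr(script_data(broken)))   # 'document.write ("<EM>text'
print(repr(script_data(escaped)))  # 'document.write ("<EM>text<\\/EM>")\n'
```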

Consequently, any HTML markup that is meant to be sent to a script engine (which may do whatever it wants with the markup) must be "escaped" so as not to confuse the HTML parser. Designers of each scripting language should recommend language-specific support for resolving this issue.

ILLEGAL EXAMPLE:
The following code is invalid due to the presence of the "</" characters found, as part of "</EM>", inside the SCRIPT element:

    <SCRIPT type="text/javascript">
      document.write ("<EM>This won't work</EM>")
    </SCRIPT>

A conforming parser must treat the "</" characters as the end of script data, which is clearly not what the author intended.

In JavaScript, this code can be expressed legally by ensuring that the apparent ETAGO delimiter does not appear immediately before an SGML name start character:

    <SCRIPT type="text/javascript">
      document.write ("<EM>This will work<\/EM>")
    </SCRIPT>

In Tcl, one may accomplish this as follows:

    <SCRIPT type="text/tcl">
      document write "<EM>This will work<\/EM>"
    </SCRIPT>

In VBScript, the problem may be avoided with the Chr() function:

    "<EM>This will work<" & Chr(47) & "EM>"

6.15 Frame target names

Except for the reserved names listed below, frame target names (%FrameTarget; in the DTD) must begin with an alphabetic character (a-zA-Z). User agents should ignore all other target names.

The following target names are reserved and have special meanings.

_blank: The user agent should load the designated document in a new, unnamed window.
_self: The user agent should load the document in the same frame as the element that refers to this target.
_parent: The user agent should load the document into the immediate FRAMESET parent of the current frame. This value is equivalent to _self if the current frame has no parent.
_top: The user agent should load the document into the full, original window (thus cancelling all other frames). This value is equivalent to _self if the current frame has no parent.
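
A check for acceptable target names might be sketched as follows. The function name is hypothetical, and the reserved names are assumed to match exactly as written (lower case), since this section does not state their case-sensitivity.

```python
import re

RESERVED_TARGETS = {"_blank", "_self", "_parent", "_top"}


def is_valid_frame_target(name: str) -> bool:
    """True for a reserved name or a name beginning with a letter (a-zA-Z)."""
    if name in RESERVED_TARGETS:
        return True
    return re.match(r"[A-Za-z]", name) is not None
```

With this sketch, "_blank" and "main" are accepted, while "_custom" and "1frame" are names a user agent should ignore.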