There are various places in HTML that accept particular data types, such as dates or numbers. This section describes what the conformance criteria for content in those formats is, and how to parse them.
The space characters, for the purposes of this specification, are U+0020 SPACE, "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), and "CR" (U+000D).
The White_Space characters are
  those that have the Unicode property "White_Space" in the Unicode
  PropList.txt data file. [UNICODE]
This should not be confused with the "White_Space"
  value (abbreviated "WS") of the "Bidi_Class" property in the Unicode.txt data file.
A number of attributes are boolean attributes. The presence of a boolean attribute on an element represents the true value, and the absence of the attribute represents the false value.
If the attribute is present, its value must either be the empty string or a value that is an ASCII case-insensitive match for the attribute's canonical name, with no leading or trailing whitespace.
The values "true" and "false" are not allowed on boolean attributes. To represent a false value, the attribute has to be omitted altogether.
Here is an example of a checkbox that is checked and disabled.
   The checked and disabled attributes are the
   boolean attributes.
<label><input type=checkbox checked name=cheese disabled> Cheese</label>
This could be equivalently written as this:
<label><input type=checkbox checked=checked name=cheese disabled=disabled> Cheese</label>
You can also mix styles; the following is still equivalent:
<label><input type='checkbox' checked name=cheese disabled=""> Cheese</label>
Some attributes are defined as taking one of a finite set of keywords. Such attributes are called enumerated attributes. The keywords are each defined to map to a particular state (several keywords might map to the same state, in which case some of the keywords are synonyms of each other; additionally, some of the keywords can be said to be non-conforming, and are only in the specification for historical reasons). In addition, two default states can be given. The first is the invalid value default, the second is the missing value default.
If an enumerated attribute is specified, the attribute's value must be an ASCII case-insensitive match for one of the given keywords that are not said to be non-conforming, with no leading or trailing whitespace.
When the attribute is specified, if its value is an ASCII case-insensitive match for one of the given keywords then that keyword's state is the state that the attribute represents. If the attribute value matches none of the given keywords, but the attribute has an invalid value default, then the attribute represents that state. Otherwise, if the attribute value matches none of the keywords but there is a missing value default state defined, then that is the state represented by the attribute. Otherwise, there is no default, and invalid values must be ignored.
When the attribute is not specified, if there is a missing value default state defined, then that is the state represented by the (missing) attribute. Otherwise, the absence of the attribute means that there is no state represented.
The empty string can be a valid keyword.
A string is a valid integer if it consists of one or more characters in the range ASCII digits, optionally prefixed with a U+002D HYPHEN-MINUS character (-).
A valid integer without a "-" (U+002D) prefix represents the number that is represented in base ten by that string of digits. A valid integer with a "-" (U+002D) prefix represents the number represented in base ten by the string of digits that follows the U+002D HYPHEN-MINUS, subtracted from zero.
A string is a valid non-negative integer if it consists of one or more characters in the range ASCII digits.
A valid non-negative integer represents the number that is represented in base ten by that string of digits.
A string is a valid floating point number if it consists of:
A valid floating point number represents the number obtained by multiplying the significand by ten raised to the power of the exponent, where the significand is the first number, interpreted as base ten (including the decimal point and the number after the decimal point, if any, and interpreting the significand as a negative number if the whole string starts with a "-" (U+002D) character and the number is not zero), and where the exponent is the number after the E, if any (interpreted as a negative number if there is a "-" (U+002D) character between the E and the number and the number is not zero, or else ignoring a "+" (U+002B) character between the E and the number if there is one). If there is no E, then the exponent is treated as zero.
The Infinity and Not-a-Number (NaN) values are not valid floating point numbers.
A valid list of integers is a number of valid integers separated by U+002C COMMA characters, with no other characters (e.g. no space characters). In addition, there might be restrictions on the number of integers that can be given, or on the range of values allowed.
In the algorithms below, the number of days in month month of year year is: 31 if month is 1, 3, 5, 7, 8, 10, or 12; 30 if month is 4, 6, 9, or 11; 29 if month is 2 and year is a number divisible by 400, or if year is a number divisible by 4 but not by 100; and 28 otherwise. This takes into account leap years in the Gregorian calendar. [GREGORIAN]
The digits in the date and time syntaxes defined in this section must be characters in the range ASCII digits, used to express numbers in base ten.
A month consists of a specific proleptic Gregorian date with no time-zone information and no date information beyond a year and a month. [GREGORIAN]
A string is a valid month string representing a year year and month month if it consists of the following components in the given order:
A date consists of a specific proleptic Gregorian date with no time-zone information, consisting of a year, a month, and a day. [GREGORIAN]
A string is a valid date string representing a year year, month month, and day day if it consists of the following components in the given order:
A yearless date consists of a month and a day, but with no associated year.
A string is a valid yearless date string representing a month month and a day day if it consists of the following components in the given order:
In other words, if the month is
  "02", meaning February, then the day can be
  29, as if the year was a leap year.
A time consists of a specific time with no time-zone information, consisting of an hour, a minute, a second, and a fraction of a second.
A string is a valid time string representing an hour hour, a minute minute, and a second second if it consists of the following components in the given order:
The second component cannot be 60 or 61; leap seconds cannot be represented.
A local date and time consists of a specific proleptic Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, but expressed without a time zone. [GREGORIAN]
A string is a valid local date and time string representing a date and time if it consists of the following components in the given order:
A string is a valid normalized local date and time string representing a date and time if it consists of the following components in the given order:
A time-zone offset consists of a signed number of hours and minutes.
A string is a valid time-zone offset string representing a time-zone offset if it consists of either:
A "Z" (U+005A) character, allowed only if the time zone is UTC
Or, the following components, in the given order:
This format allows for time-zone offsets from -23:59 to +23:59. In practice, however, the range of offsets of actual time zones is -12:00 to +14:00, and the minutes component of offsets of actual time zones is always either 00, 30, or 45.
See also the usage notes and examples in the global date and time section below for details on using time-zone offsets with historical times that predate the formation of formal time zones.
A global date and time consists of a specific proleptic Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, expressed with a time-zone offset, consisting of a signed number of hours and minutes. [GREGORIAN]
A string is a valid global date and time string representing a date, time, and a time-zone offset if it consists of the following components in the given order:
Times in dates before the formation of UTC in the mid twentieth century must be expressed and interpreted in terms of UT1 (contemporary Earth solar time at the 0° longitude), not UTC (the approximation of UT1 that ticks in SI seconds). Time before the formation of time zones must be expressed and interpeted as UT1 times with explicit time zones that approximate the contemporary difference between the appropriate local time and the time observed at the location of Greenwich, London.
The following are some examples of dates written as valid global date and time strings.
0037-12-13 00:00Z"1979-10-14T12:00:00.001-04:00"8592-01-01T02:09+02:09"Several things are notable about these dates:
T" is replaced by a space, it
    must be a single space character. The string "2001-12-21  12:00Z" (with two spaces
    between the components) would not be parsed successfully.A string is a valid normalized forced-UTC global date and time string representing a date, time, and a time-zone offset if it consists of the following components in the given order:
A week consists of a week-year number and a week number representing a seven-day period starting on a Monday. Each week-year in this calendaring system has either 52 or 53 such seven-day periods, as defined below. The seven-day period starting on the Gregorian date Monday December 29th 1969 (1969-12-29) is defined as week number 1 in week-year 1970. Consecutive weeks are numbered sequentially. The week before the number 1 week in a week-year is the last week in the previous week-year, and vice versa. [GREGORIAN]
A week-year with a number year has 53 weeks if it corresponds to either a year year in the proleptic Gregorian calendar that has a Thursday as its first day (January 1st), or a year year in the proleptic Gregorian calendar that has a Wednesday as its first day (January 1st) and where year is a number divisible by 400, or a number divisible by 4 but not by 100. All other week-years have 52 weeks.
The week number of the last day of a week-year with 53 weeks is 53; the week number of the last day of a week-year with 52 weeks is 52.
The week-year number of a particular day can be different than the number of the year that contains that day in the proleptic Gregorian calendar. The first week in a week-year y is the week that contains the first Thursday of the Gregorian year y.
A string is a valid week string representing a week-year year and week week if it consists of the following components in the given order:
A duration consists of a number of seconds.
Since months and seconds are not comparable (a month is not a precise number of seconds, but is instead a period whose exact length depends on the precise day from which it is measured) a duration as defined in this specification cannot include months (or years, which are equivalent to twelve months). Only durations that describe a specific number of seconds can be described.
A string is a valid duration string representing a duration t if it consists of either of the following:
A literal U+0050 LATIN CAPITAL LETTER P character followed by one or more of the following subcomponents, in the order given, where the number of days, hours, minutes, and seconds corresponds to the same number of seconds as in t:
One or more digits followed by a U+0044 LATIN CAPITAL LETTER D character, representing a number of days.
A U+0054 LATIN CAPITAL LETTER T character followed by one or more of the following subcomponents, in the order given:
This, as with a number of other date- and time-related microsyntaxes defined in this specification, is based on one of the formats defined in ISO 8601. [ISO8601]
One or more duration time components, each with a different duration time component scale, in any order; the sum of the represented seconds being equal to the number of seconds in t.
A duration time component is a string consisting of the following components:
Zero or more space characters.
One or more digits, representing a number of time units, scaled by the duration time component scale specified (see below) to represent a number of seconds.
If the duration time component scale specified is 1 (i.e. the units are seconds), then, optionally, a "." (U+002E) character followed by one, two, or three digits, representing a fraction of a second.
Zero or more space characters.
One of the following characters, representing the duration time component scale of the time unit used in the numeric part of the duration time component:
Zero or more space characters.
This is not based on any of the formats in ISO 8601. It is intended to be a more human-readable alternative to the ISO 8601 duration format.
A string is a valid date string with optional time if it is also one of the following:
A simple color consists of three 8-bit numbers in the range 0..255, representing the red, green, and blue components of the color respectively, in the sRGB color space. [SRGB]
A string is a valid simple color if it is exactly seven characters long, and the first character is a "#" (U+0023) character, and the remaining six characters are all in the range ASCII digits, U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F, U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F, with the first two digits representing the red component, the middle two digits representing the green component, and the last two digits representing the blue component, in hexadecimal.
A string is a valid lowercase simple color if it is a valid simple color and doesn't use any characters in the range U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F.
A set of space-separated tokens is a string containing zero or more words (known as tokens) separated by one or more space characters, where words consist of any string of one or more characters, none of which are space characters.
A string containing a set of space-separated tokens may have leading or trailing space characters.
An unordered set of unique space-separated tokens is a set of space-separated tokens where none of the tokens are duplicated.
An ordered set of unique space-separated tokens is a set of space-separated tokens where none of the tokens are duplicated but where the order of the tokens is meaningful.
Sets of space-separated tokens sometimes have a defined set of allowed values. When a set of allowed values is defined, the tokens must all be from that list of allowed values; other values are non-conforming. If no such set of allowed values is provided, then all values are conforming.
How tokens in a set of space-separated tokens are to be compared (e.g. case-sensitively or not) is defined on a per-set basis.
A set of comma-separated tokens is a string containing zero or more tokens each separated from the next by a single "," (U+002C) character, where tokens consist of any string of zero or more characters, neither beginning nor ending with space characters, nor containing any "," (U+002C) characters, and optionally surrounded by space characters.
For instance, the string " a ,b,,d d " consists of four
  tokens: "a", "b", the empty string, and "d d". Leading and
  trailing whitespace around each token doesn't count as part of the
  token, and the empty string can be a token.
Sets of comma-separated tokens sometimes have further restrictions on what consists a valid token. When such restrictions are defined, the tokens must all fit within those restrictions; other values are non-conforming. If no such restrictions are specified, then all values are conforming.
A valid hash-name reference to an element of type type is a string consisting of a "#" (U+0023) character followed by a string which exactly matches the value
  of the name attribute of an element with type
  type in the document.
A string is a valid media query if it matches the
  media_query_list production of the Media
  Queries specification. [MQ]
A string matches the environment of the user if it is the empty string, a string consisting of only space characters, or is a media query that matches the user's environment according to the definitions given in the Media Queries specification. [MQ]