3. Conformance Definition
This section is normative.
In order to ensure that XHTML-family documents are maximally
portable among XHTML-family user agents, this specification rigidly
defines conformance requirements for both of these and for XHTML-family document types. While the conformance
definitions can be found in this section, they necessarily reference
normative text within this document, within the base XHTML
specification [XHTML1], and within other
related specifications. It is only possible to fully comprehend the
conformance requirements of XHTML through a complete reading of all normative
references.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as
described in [RFC2119].
3.1. XHTML Host Language Document Type Conformance
It is possible
to modify existing document types and define wholly new document types
using both modules defined in this specification and other
modules. Such a document type is "XHTML Host Language Conforming"
when it meets the following criteria:
- The document type must be defined using one of the implementation
methods defined by the W3C. Currently this is limited to XML DTDs, but XML
Schema will be available soon. The rest of this section refers to "DTDs" although other implementations
are possible.
- The DTD which defines the document type must have a unique
identifier as defined in Naming Rules that uses the string "XHTML" in its first token of the public text description.
- The DTD which defines the document type must include,
at a minimum, the Structure, Hypertext,
Text, and List modules defined in this specification.
- For each of the W3C-defined modules that are included,
all of the elements, attributes, types of attributes (including any required
enumerated value lists), and any required
minimal content models must be included (and optionally extended) in the document type's content
model. When content models are extended, all of the elements and attributes
(along with their types or any required enumerated value lists)
required in the original content model must continue to be required.
- The DTD which defines the document type may define additional
elements and attributes.
However, these must be in their own XML namespace
[XMLNS].
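The module and content model criteria above can be sketched as two set comparisons. This is only an illustration: it assumes the document type has already been summarized as plain Python collections, and the function names and data shapes are hypothetical rather than part of this specification. A real check would have to parse the DTD itself.

```python
# Modules that a Host Language conforming document type must include.
REQUIRED_HOST_MODULES = {"Structure", "Hypertext", "Text", "List"}


def missing_host_modules(included_modules):
    """Return the required modules that the document type omits, sorted."""
    return sorted(REQUIRED_HOST_MODULES - set(included_modules))


def dropped_requirements(original_required, extended_required):
    """Return names required by the original content model that the
    extended content model no longer requires, sorted."""
    return sorted(set(original_required) - set(extended_required))
```

An empty result from both functions means the two criteria hold for the summarized document type. For instance, `missing_host_modules(["Structure", "Text", "List"])` reports that the Hypertext module is absent.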
3.2. XHTML Integration Set Document Type Conformance
It is also possible
to define document types that are based upon XHTML, but do not adhere to its
structure.
Such a document type is "XHTML Integration Set Conforming"
when it meets the following criteria:
- The document type must be defined using one of the implementation
methods defined by the W3C. Currently this is limited to XML DTDs, but XML
Schema will be available soon. The rest of this section refers to "DTDs" although other implementations
are possible.
- The DTD which defines the document type must have a unique
identifier as defined in Naming Rules that contains the string "XHTML", but not in the first token of the public text description.
- The DTD which defines the document type must include,
at a minimum, the Hypertext,
Text, and List modules defined in this specification.
- For each of the W3C-defined modules that are included,
all of the elements, attributes, types of attributes (including any required
enumerated value lists), and any required
minimal content models must be included (and optionally extended) in the document type's content
model. When content models are extended, all of the elements and attributes
(along with their types or any required enumerated value lists)
required in the original content model must continue to be required.
- The DTD which defines the document type may define
additional elements and attributes.
However, these must be in their own XML namespace
[XMLNS].
3.3. XHTML Family Module Conformance
This specification defines a method for defining
XHTML-conforming modules. A module conforms to this specification when
it meets all of the following criteria:
- The module must be defined using one of the implementation
methods defined by the W3C. Currently this is limited to XML DTDs, but XML
Schema will be available soon. The rest of this section refers to "DTDs" although other implementations
are possible.
- The DTD which defines the module must have a unique identifier
as defined in Naming Rules.
- When the module is defined using an XML DTD, the module must
insulate its parameter entity names
through the use of unique prefixes or other, similar methods.
- The module definition must have a prose definition that describes the syntactic
and semantic requirements of the elements, attributes, and/or content
models that it declares.
- The module definition must not reuse any element names that are
defined in other
W3C-defined modules, except when the content model and semantics of
those elements are either identical to the original or an extension
of the original, or when the reused element names are within their own
namespace (see below).
- The module definition's elements and attributes must be part of
an XML namespace [XMLNS]. If the module is defined by
an organization other than the W3C, this namespace must NOT be the same as the
namespace in which other W3C modules are defined.
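The parameter entity criterion can be checked roughly as follows. This is a sketch under stated assumptions: the module is implemented as an XML DTD, a single prefix is used for insulation, and a regular expression is good enough to find the declarations (it does not skip declarations inside comments). The prefix `MyML.` and the sample declarations are made up for illustration.

```python
import re

# Captures the name in a parameter entity declaration: <!ENTITY % name ...>
PARAM_ENTITY = re.compile(r"<!ENTITY\s+%\s+([^\s]+)")


def unprefixed_parameter_entities(dtd_text, prefix):
    """Return parameter entity names that do not start with the prefix."""
    names = PARAM_ENTITY.findall(dtd_text)
    return [name for name in names if not name.startswith(prefix)]


# Hypothetical module fragment; only the first two names are insulated.
MODULE_DTD = """
<!ENTITY % MyML.myelement.qname "myml:myelement" >
<!ENTITY % MyML.myelement.content "( #PCDATA )" >
<!ENTITY % Common.attrib "id ID #IMPLIED" >
"""

print(unprefixed_parameter_entities(MODULE_DTD, "MyML."))  # ['Common.attrib']
```

A name reported here is not necessarily an error, since a module may deliberately redeclare an entity owned by another module, but it is a candidate for a collision and worth reviewing.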
3.4. XHTML Family Document Conformance
A conforming XHTML family document is a valid instance of an
XHTML Host Language Conforming Document Type.
3.5. XHTML Family User Agent Conformance
A conforming user agent must meet all of the following
criteria (as defined in [XHTML1]):
- In order to be consistent with the XML 1.0 Recommendation [XML], the user agent must parse and evaluate
an XHTML document for well-formedness. If the user agent claims
to be a validating user agent, it must also validate documents
against their referenced DTDs according to
[XML].
- When the user agent claims to support
facilities defined within this specification or required by
this specification through normative reference, it must do so in
ways consistent with the facilities' definition.
- When a user agent processes an XHTML document as generic [XML],
it shall only recognize attributes of type
ID
(e.g., the id
attribute on most XHTML elements)
as fragment identifiers.
- If a user agent encounters an element it does not recognize,
it must continue to process the children of that element. If the content is
text, the text must be presented to the user.
- If a user agent encounters an attribute it does not
recognize, it must ignore the entire attribute specification
(i.e., the attribute and its value).
- If a user agent encounters an attribute value it does not
recognize, it must use the default attribute value.
- If a user agent encounters an entity reference (other than one
of the predefined entities) for which it has
processed no declaration (which could happen if the declaration
is in the external subset and the user agent has not read it), the entity
reference should be rendered as the characters (starting
with the ampersand and ending with the semicolon) that
make up the entity reference.
- When rendering content, user agents that encounter
characters or character entity references that are recognized but not renderable should display the document in such a way that it is obvious to the user that normal rendering has not taken place.
- White space is handled according to the following rules. The following
characters are defined in [XML]
as white space characters:
- SPACE (&#x0020;)
- HORIZONTAL TABULATION (&#x0009;)
- CARRIAGE RETURN (&#x000D;)
- LINE FEED (&#x000A;)
The XML processor normalizes different systems' line end codes into a
single LINE FEED character, which is passed up to the application.
The user agent must process white space characters in the data received
from the XML processor as follows:
- All white space surrounding block elements should be removed.
- Comments are removed entirely and do not affect white space handling.
One white space character on either side of a comment is treated as two
white space characters.
- When the 'xml:space' attribute is set to 'preserve', white space characters must be preserved and
consequently LINE FEED characters within a block must not be converted.
- When the 'xml:space' attribute is not set to 'preserve', then:
- Leading and trailing white space inside a block element must be
removed.
- LINE FEED characters must be converted into one of the following
characters: a SPACE character, a ZERO WIDTH SPACE character (&#x200B;), or
no character (i.e., removed). The choice of the resulting character is user
agent dependent and is conditioned by the script property of the characters
preceding and following the LINE FEED character.
- A sequence of white space characters without any LINE FEED characters must
be reduced to a single SPACE character.
- A sequence of white space characters with one or more LINE FEED characters
must be reduced in the same way as a single LINE FEED character.
White space in attribute values is processed according to
[XML].
Note (informative): In determining how to convert a LINE FEED
character, a user agent should consider the following cases, whereby the script
of characters on either side of the LINE FEED determines the choice of
the replacement. A character of COMMON script (such as punctuation) is
treated as belonging to the script of the character on the other side of the LINE FEED:
- If the characters preceding and following the LINE FEED character
belong to a script in which the SPACE character is used as a word
separator, the LINE FEED character should be converted into a SPACE
character. Examples of such scripts are Latin, Greek, and Cyrillic.
- If the characters preceding and following the LINE FEED character
belong to an ideographic-based script or writing system in which there
is no word separator, the LINE FEED should be converted into no
character. Examples of such scripts or writing systems are Chinese
and Japanese.
- If the characters preceding and following the LINE FEED character
belong to a non-ideographic script in which there is no word
separator, the LINE FEED should be converted into a ZERO WIDTH SPACE
character (&#x200B;) or no character. Examples of such scripts are Thai
and Khmer.
- If none of the three preceding conditions holds, the LINE FEED
character should be converted into a SPACE character.
The Unicode [UNICODE]
technical report TR#24 (Script Names) provides an assignment of script
names to all characters.
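The white space rules for a block whose 'xml:space' attribute is not set to 'preserve' can be sketched as below. Several things here are assumptions rather than requirements of this specification: Python's standard library exposes no Unicode script property, so `script_class` approximates it with a few hard-coded code point ranges and treats punctuation and symbols as COMMON; the sketch always picks ZERO WIDTH SPACE where the note allows either that or no character; and all function names are hypothetical.

```python
import re
import unicodedata

ZWSP = "\u200b"
WS_RUN = re.compile(r"[ \t\r\n]+")  # the four XML white space characters


def script_class(ch):
    """Crude stand-in for the Unicode script property (an assumption)."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF:
        return "ideographic"      # CJK ideographs, Hiragana, Katakana
    if 0x0E00 <= cp <= 0x0E7F or 0x1780 <= cp <= 0x17FF:
        return "no_separator"     # Thai, Khmer
    if unicodedata.category(ch)[0] in "PS":
        return "common"           # punctuation and symbols
    return "space_separated"


def line_feed_replacement(before, after):
    """Choose the replacement for a LINE FEED from its neighbours."""
    a, b = script_class(before), script_class(after)
    if a == "common":             # COMMON takes the script of the other side
        a = b
    if b == "common":
        b = a
    if a == b == "ideographic":
        return ""
    if a == b == "no_separator":
        return ZWSP
    return " "


def normalize_block_text(text):
    """Normalize white space inside a block (xml:space not 'preserve')."""
    text = text.strip(" \t\r\n")  # leading and trailing white space removed

    def replace(match):
        if "\n" not in match.group(0):
            return " "            # run without LINE FEED: single SPACE
        # Run with LINE FEED: treated like a single LINE FEED character.
        return line_feed_replacement(text[match.start() - 1],
                                     text[match.end()])

    return WS_RUN.sub(replace, text)
```

With these assumptions, `normalize_block_text("  Hello \n  world\t\tagain ")` yields `"Hello world again"`, while a LINE FEED between two CJK ideographs is dropped entirely.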
3.6. Naming Rules
XHTML Host Language document types must adhere to strict
naming conventions so that it is possible for software and users to readily
determine the relationship of document types to XHTML. The names
for document types implemented as XML document type definitions
are defined through Formal Public
Identifiers (FPIs). Within FPIs, fields are separated by double slash
character sequences (//). The various fields must be composed as
follows:
- The leading field must be "-" to indicate a privately defined
resource.
- The second field must contain the name of the organization responsible for
maintaining the named item. There is no formal registry for these organization
names. Each organization should define a name that is unique. For example, the name used
by the W3C is W3C.
- The third field
contains two constructs: the public text class followed by the
public text description.
The first token in the third field is the public text class, which
should adhere to ISO 8879 Clause 10.2.2.1 Public Text Class.
Only XHTML Host Language conforming document types should begin the
public text description with the token XHTML.
The public text description should contain the string XHTML if
the document type is Integration Set conforming.
The field must also contain
an organization-defined unique identifier (e.g., MyML 1.0).
This identifier should be composed of a unique name and a version
identifier that can be updated as the document type evolves.
- The fourth field defines the language in which the item is developed
(e.g., EN).
Using these rules,
the name for an XHTML Host Language conforming document type might be
-//MyCompany//DTD XHTML MyML 1.0//EN. The name for an XHTML
family conforming module might be
-//MyCompany//ELEMENTS XHTML MyElements 1.0//EN. The name for an
XHTML Integration Set conforming document type might be
-//MyCompany//DTD Special Markup with XHTML//EN.
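To make the field layout concrete, the following sketch splits an FPI into its four fields and reports which naming pattern a document type identifier follows. It is an illustration only: the function names and returned labels are hypothetical, and the check covers naming alone, not the other conformance criteria.

```python
def parse_fpi(fpi):
    """Split an FPI into its fields, following the rules in this section."""
    fields = fpi.split("//")
    if len(fields) != 4:
        raise ValueError("expected four fields separated by '//'")
    leading, owner, text, language = fields
    if leading != "-":
        raise ValueError("the leading field must be '-'")
    # The first token of the third field is the public text class;
    # the remainder is the public text description.
    text_class, _, description = text.partition(" ")
    return {
        "owner": owner,
        "class": text_class,
        "description": description,
        "language": language,
    }


def classify_document_type(fpi):
    """Report which naming pattern a document type FPI follows."""
    parsed = parse_fpi(fpi)
    if parsed["class"] != "DTD":
        return "not a document type"
    tokens = parsed["description"].split()
    if tokens[:1] == ["XHTML"]:
        return "host language naming"
    if "XHTML" in tokens:
        return "integration set naming"
    return "no XHTML in description"


print(classify_document_type("-//MyCompany//DTD XHTML MyML 1.0//EN"))
# host language naming
print(classify_document_type("-//MyCompany//DTD Special Markup with XHTML//EN"))
# integration set naming
```

The module example from this section, `-//MyCompany//ELEMENTS XHTML MyElements 1.0//EN`, is reported as "not a document type" because its public text class is ELEMENTS rather than DTD.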
3.7. XHTML Module Evolution
Each module defined in this specification is given a
unique identifier that adheres to the naming rules in the previous section. Over time,
a module may evolve. A logical ramification of such evolution may be that some aspects of
the module are no longer compatible with its previous definition. To help ensure that
document types defined against modules defined in this specification continue to operate,
the identifiers associated with a module that changes will be updated. Specifically,
the Formal Public Identifier and System Identifier of the module will be changed by
modifying the version identifier included in each. Document types that wish to
incorporate the updated functionality will need to be similarly updated.
In addition, the earlier version(s) of the
module will continue to be available via its earlier, unique identifier(s). In this
way, document types developed using XHTML modules will continue to function seamlessly
using their original definitions even as the collection expands and evolves.
Similarly, document instances written against such document types will continue to
validate using the earlier module definitions.
Other
XHTML Family Module and Document Type authors are encouraged to adopt a similar strategy to
ensure the
continued functioning of document types based upon those modules and document
instances based upon those document types.
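One way to picture this strategy is a catalog that maps each Formal Public Identifier to a system identifier and never overwrites an entry once it has been published. The sketch below is an assumption about how an author might enforce that locally; the catalog structure, the function name, and the file names are all hypothetical.

```python
def register_module(catalog, fpi, system_id):
    """Publish a module under a new identifier without replacing old ones."""
    if fpi in catalog:
        # A changed module needs a new version identifier instead.
        raise ValueError("identifier already published: " + fpi)
    catalog[fpi] = system_id


catalog = {}
register_module(catalog,
                "-//MyCompany//ELEMENTS XHTML MyElements 1.0//EN",
                "myelements-1.mod")
# The updated module gets a new version in both identifiers.
register_module(catalog,
                "-//MyCompany//ELEMENTS XHTML MyElements 1.1//EN",
                "myelements-1_1.mod")
```

After both calls the catalog resolves the 1.0 and 1.1 identifiers independently, so a document type written against version 1.0 keeps validating against the original definition.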