Copyright © 2010 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This specification defines rules and guidelines for adapting the RDF in XHTML: Syntax and Processing (RDFa) specification for use in the HTML5 and XHTML5 members of the HTML family. The rules defined in this specification not only apply to HTML5 documents in non-XML and XML mode, but also to HTML4 and XHTML documents interpreted through the HTML5 parsing rules.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a Working Draft of the "HTML+RDFa: A mechanism for embedding RDF in HTML" specification for review by W3C members and other interested parties.
This Working Draft includes the following changes:
If you wish to make comments regarding this document, please send them to public-rdf-in-xhtml-tf@w3.org (subscribe, archives) or to public-html-comments@w3.org (subscribe, archives), or submit them using the W3C's public bug database.
Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should note the aforementioned status, and are encouraged to join the RDFa Working Group.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The publication of this document by the W3C as a W3C Working Draft does not imply that all of the participants in the W3C HTML working group endorse the contents of the specification. Indeed, for any section of the specification, one can usually find many members of the working group or of the W3C as a whole who object strongly to the current text, the existence of the section at all, or the idea that the working group should even spend time discussing the concept of that section.
The latest stable version of the editor's draft of this specification is always available on the W3C CVS server. The latest editor's working copy (which may contain unfinished text in the process of being prepared) is also available.
This specification has been jointly developed by the RDFa Task Force and the HTML Working Group and is currently being published by the HTML Working Group to further discussions there.
This specification is an extension to the HTML5 language. All normative content in the HTML5 specification, unless specifically overridden by this specification, is intended to be the basis for this specification.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This section is informative.
Today's web is built predominantly for human consumption. Even as machine-readable data begins to permeate the web, it is typically distributed in a separate file, with a separate format, and very limited correspondence between the human and machine versions. As a result, web browsers can provide only minimal assistance to humans in parsing and processing web data: browsers only see presentation information. RDFa is intended to solve the problem of machine-readable data in HTML documents. RDFa provides a set of HTML attributes to augment visual data with machine-readable hints. Using RDFa, authors may turn their existing human-visible text and links into machine-readable data without repeating content.
In early 2004, Mark Birbeck published a document named [ XHTMLRDF ] via the XHTML2 Working Group wherein he laid the groundwork for what would eventually become RDFa (The Resource Description Framework in Attributes).
In 2006, the work was co-sponsored by the Semantic Web Deployment Working Group, which began to formalize a technology to express semantic data in XHTML. This technology was successfully developed, reached consensus at the W3C, and was later published as an official W3C Recommendation. While HTML provides a mechanism to express the structure of a document (title, paragraphs, links), RDFa provides a mechanism to express the meaning in a document (people, places, events).
The document, titled "RDF in XHTML: Syntax and Processing" [ XHTML+RDFa ], defined a set of attributes and rules for processing those attributes that resulted in the output of machine-readable semantic data. While the document applied to XHTML, the attributes and rules were always intended to operate across any tree-based structure containing attributes on tree nodes (such as HTML4, SVG and ODF).
While RDFa was initially specified for use in XHTML, adoption by a number of large organizations on the Web spurred RDFa's use in non-XHTML languages. Its use in HTML4, before an official specification was developed for those languages, caused concern regarding document conformance.
Over the years, the members of the RDFa Task Force [ RDFaTF ] had discussed the possibility of applying the same attributes and processing rules outlined in the XHTML+RDFa specification to all HTML family documents. By design, a unified semantic data expression mechanism across all HTML and XHTML family documents was squarely in the realm of possibility.
This section describes the modifications to the original XHTML+RDFa specification that permit the use of RDFa in all HTML family documents. By using the attributes and processing rules described in the XHTML+RDFa specification and heeding the minor changes in this section, authors can expect to generate markup that produces the same semantic data output in HTML4, HTML5 and XHTML5.
This section is normative.
Section 5.5: Sequence , of the [ XHTML+RDFa ] specification defines a generic processing model for extracting RDF from a tree-based model. The method of transforming an input document into a model suited for the RDFa processing rules is intentionally not defined in the XHTML+RDFa specification. The method of transformation was intended to be defined in the implementation language, in this case, this section of the HTML+RDFa specification.
The HTML5 and XHTML5 DOMs are each a super-set of the tree-based model on which the RDFa processing rules operate. Therefore, a mapping mechanism to translate from a DOM to a tree-model is not necessary. The HTML5 and XHTML5 DOM, or equivalent data structure, should be used as input to the RDFa processing rules. The normative language for construction of the HTML5 DOM and XHTML5 DOM is contained in the HTML5 specification.
This section is informative.
RDFa's tree-based processing rules, outlined in Section 5.5: Sequence of the XHTML+RDFa specification, allow an input document to be automatically corrected, cleaned-up, re-arranged, or modified in any way that is approved by the host language prior to processing. For example, element nesting issues in HTML documents may be corrected before the input document is translated into the DOM, a valid tree-based model, on which the RDFa processing rules will operate.
Any mechanism that generates a data structure equivalent to the HTML5 or XHTML5 DOM, such as the html5lib library, may be used as the mechanism to construct the tree-based model provided as input to the RDFa processing rules.
This section is normative.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [ RFC2119 ].
In order for a document to claim that it is a conforming HTML+RDFa document, it must provide the facilities described as mandatory in this section. The document conformance criteria are listed below, of which only a subset are mandatory:
version attribute on the html element. The value of the version attribute should be "HTML+RDFa 1.0" if the document is a non-XML mode document, or "XHTML+RDFa 1.0" if the document is an XML mode document.
link element contained in the head element that contains profile for the rel attribute and http://www.w3.org/1999/xhtml/vocab for the href attribute.
A conforming RDFa user agent must:
A conforming RDFa Processor must implement all of the mandatory features specified in the XHTML+RDFa specification. It must also support any mandatory features specified in this specification.
This section is normative.
The [ XHTML+RDFa ] Recommendation is the base document on which this specification builds. XHTML+RDFa specifies the attributes, in Section 2.1: The RDFa Attributes , and processing model, in Section 5: Processing Model , for extracting RDF from an XHTML document. This section specifies changes to the attributes and processing model defined in XHTML+RDFa in order to support extracting RDF from HTML documents.
The requirements and rules, as specified in XHTML+RDFa and further modified in this document, apply to all HTML5 documents. The RDFa Processor operating on HTML and XHTML documents, specifically the resulting DOMs, must apply the same processing rules for both types of serializations and DOMs.
The lang attribute must be processed in the same manner as the xml:lang attribute is in the XHTML+RDFa specification, Section 5.5: Sequence, step #3.
If an author is editing an HTML fragment and is unsure of the final encapsulating MIME type for their markup, it is suggested that the author specify both lang and xml:lang where the value in both attributes is exactly the same.
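The language rule above can be sketched in a few lines of Python. This is a minimal illustration, not text from either specification: the function name `element_language` and the attribute mapping it receives are hypothetical, and the choice to let xml:lang win when both attributes are present is an assumption, since the text above only recommends that authors keep the two values identical.

```python
from typing import Mapping, Optional


def element_language(attrs: Mapping[str, str],
                     inherited: Optional[str] = None) -> Optional[str]:
    """Return the language in effect for an element.

    `attrs` maps attribute names to values for the current element;
    `inherited` is the language in effect on the parent element.
    """
    # Assumption: xml:lang is checked first when both are present.
    for name in ("xml:lang", "lang"):
        if name in attrs:
            return attrs[name]
    # Neither attribute is present, so the parent's language carries over.
    return inherited
```

Treating both attributes through one code path keeps the resulting literals identical whether the markup is served in non-XML or XML mode, provided the author has kept the two values the same.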
When generating literals of type XMLLiteral, the processor must ensure that the output XMLLiteral is a namespace well-formed XML fragment. A namespace well-formed XML fragment has the following properties:
xmlns attribute as well as all currently active attributes starting with xmlns: must be preserved in the generated XMLLiteral. This preservation must be accomplished by placing all active namespaces in each top-level element in the generated XMLLiteral, taking care to not over-write pre-existing namespace values.
An RDFa Processor that transforms the XML fragment must use the Coercing an HTML DOM into an Infoset rules, as specified in the HTML5 specification, prior to generating the triple containing the XMLLiteral. The serialization algorithm that must be used for generating the XMLLiteral is normatively defined in the Serializing XHTML Fragments section of the HTML5 specification.
Transformation to a namespace well-formed XML fragment is required because an application that consumes XMLLiteral data expects that data to be a namespace well-formed XML fragment.
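The namespace-preservation rule can be sketched with Python's standard `xml.dom.minidom`. This is an illustrative sketch only: the function name `xml_literal` and the `active_ns` mapping are hypothetical, the input is assumed to be already well-formed XML, and `toxml()` stands in for the HTML5 coercion and serialization algorithms that the text above normatively requires. It follows the rule as worded above by copying every active declaration onto each top-level element.

```python
from typing import Dict
from xml.dom import minidom


def xml_literal(element: minidom.Element, active_ns: Dict[str, str]) -> str:
    """Serialize the children of `element` as an XMLLiteral string.

    `active_ns` maps declaration attribute names (for example "xmlns" or
    "xmlns:ex") to the namespace URIs in scope at `element`.
    """
    parts = []
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            for attr_name, uri in active_ns.items():
                # Do not over-write a declaration the child already carries.
                if not child.hasAttribute(attr_name):
                    child.setAttribute(attr_name, uri)
        parts.append(child.toxml())
    return "".join(parts)


doc = minidom.parseString(
    '<p xmlns:ex="http://example.org/vocab#">'
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect width="300" height="100"/></svg></p>'
)
svg = doc.getElementsByTagName("svg")[0]
active = {
    "xmlns": "http://www.w3.org/2000/svg",
    "xmlns:ex": "http://example.org/vocab#",
}
literal = xml_literal(svg, active)
```

Because the declarations are written onto each top-level element, the literal can be parsed on its own by a consumer that never sees the surrounding document.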
The transformation requirement does not apply to input data that are text-only, such as literals that contain a datatype attribute with an empty value (""), or input data that contain only text nodes.
An example transformation demonstrating the preservation of namespace values is provided below. The → symbol is used to denote that the line is a continuation of the previous line and is included purely for the purposes of readability:
<p xmlns:ex="http://example.org/vocab#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   Two rectangles (the example markup for them are stored in a triple):
   <svg xmlns="http://www.w3.org/2000/svg" property="ex:markup" datatype="rdf:XMLLiteral">
→ <rect width="300" height="100"
→ style="fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)"/>
→ <rect width="50" height="50"
→ style="fill:rgb(255,0,0);stroke-width:2;
→ stroke:rgb(0,0,0)"/></svg>
</p>
The markup above should produce the following triple:
<> <http://example.org/vocab#markup> "<rect xmlns=\"http://www.w3.org/2000/svg\" width=\"300\"
→ height=\"100\" style=\"fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)\"/>
→ <rect xmlns=\"http://www.w3.org/2000/svg\" width=\"50\"
→ height=\"50\" style=\"fill:rgb(255,0,0);stroke-width:2;
→ stroke:rgb(0,0,0)\"/>"^^http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral
Note the preservation of the SVG namespace by injecting a new xmlns attribute. Since the ex and rdf namespaces are not used in either rect element, they are not preserved in the XMLLiteral.
xmlns:-Prefixed Attributes
While this section outlines xmlns: processing in RDFa, the support for distributed extensibility in non-XML mode HTML5 (using xmlns and xmlns:) is still an open issue. This section may be further modified before Last Call based on progress made on the distributed extensibility issue.
CURIE prefix mappings specified using attributes prepended with xmlns: must be processed using the rules specified in Section 5.4, CURIE and URI Processing, contained in the XHTML+RDFa specification.
Since CURIE prefix mappings have been specified using xmlns:, and since HTML attribute names are case-insensitive, CURIE prefix names declared using the xmlns: attribute-name pattern xmlns:<PREFIX>="<URI>" should be specified using only lower-case characters. For example, the text "xmlns:" and the text in "<PREFIX>" should be lower-case only. This is to ensure that prefix mappings are interpreted in the same way between HTML (case-insensitive attribute names) and XHTML (case-sensitive attribute names) document types.
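An authoring tool could flag prefix declarations that break the lower-case recommendation. The following Python sketch is only an illustration of such a check; the function name `non_lowercase_prefix_attrs` is hypothetical, and the attribute names are assumed to be available exactly as the author wrote them, before any parser has lower-cased them.

```python
from typing import Iterable, List


def non_lowercase_prefix_attrs(attr_names: Iterable[str]) -> List[str]:
    """Return xmlns:-style attribute names that are not entirely lower-case."""
    return [
        name for name in attr_names
        # Match "xmlns:" case-insensitively, then report any name that
        # changes when lower-cased, since HTML and XHTML would disagree on it.
        if name.lower().startswith("xmlns:") and name != name.lower()
    ]
```

For instance, `xmlns:DC` would be reported, because a non-XML mode parser would expose it as `xmlns:dc` while an XML mode parser would keep the prefix `DC`.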
Status: ISSUE-41 (decentralized extensibility) blocks progress to Last Call
This section is normative.
There are a few changes that are required to the HTML5 specification in order to fully support RDFa. The following sub-sections outline the necessary modifications to the base HTML5 specification.
All RDFa attributes and valid values (including CURIEs), as listed in Section 2.1: The RDFa Attributes , are conforming when used in an HTML5 or XHTML5 document.
xmlns:-Prefixed Attributes
While this section outlines xmlns: conformance criteria for HTML+RDFa, the support for distributed extensibility in non-XML mode HTML5 (using xmlns and xmlns:) is still an open issue. This section may be further modified before Last Call based on progress made on the distributed extensibility issue.
Since RDFa uses attributes starting with xmlns: to specify CURIE prefixes, it is important that any attribute starting with a case-insensitive match on the text string "xmlns:" be preserved in the DOM or other tree-like model that is passed to the RDFa Processor. While it is specified that HTML5 must preserve these attributes in the DOM, it must also accept these attributes as conforming in non-XML HTML5.
For documents conforming to this specification, attributes with names that have the case-insensitive prefix "xmlns:" are conforming in both HTML5 and XHTML5.
This section needs feedback from the user agent vendors to ensure that this feature does not conflict with user agent architecture and has no technical reason that it cannot be implemented.
Because RDFa is currently dependent on the xmlns: pattern to declare prefix mappings, it is imperative that namespace information declared in non-XML mode HTML5 documents is mapped to an Infoset correctly. In order to ensure this mapping is performed correctly, the "Coercing an HTML DOM into an infoset" rules defined in [ HTML5 ] must be modified to include the following rule:
If the XML API is namespace-aware, the tool must ensure that proper ([ namespace name ], [ local name ], [ normalized value ]) namespace tuples are created when converting the non-XML mode DOM into an Infoset.
For example, given the following input text:
<div xmlns:audio="http://purl.org/media/audio#">
The div element above, when coerced from an HTML DOM into an Infoset, should contain an attribute in the [namespace attributes] list with a [namespace name] set to "http://www.w3.org/2000/xmlns/", a [local name] set to audio, and a [normalized value] of "http://purl.org/media/audio#".
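The coercion rule can be illustrated with a small Python sketch that builds the tuple for a single attribute. The names `NamespaceAttribute` and `coerce_xmlns_attribute` are hypothetical, and the attribute value is assumed to have been normalized already by the parser; this is not the HTML5 coercion algorithm itself.

```python
from typing import NamedTuple, Optional

# Namespace name that the example above assigns to namespace declarations.
XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"


class NamespaceAttribute(NamedTuple):
    namespace_name: str
    local_name: str
    normalized_value: str


def coerce_xmlns_attribute(name: str, value: str) -> Optional[NamespaceAttribute]:
    """Build the Infoset tuple for an xmlns:-prefixed attribute, else None."""
    prefix, sep, local = name.partition(":")
    # Only names of the form xmlns:<PREFIX> declare a prefix mapping.
    if sep and prefix.lower() == "xmlns" and local:
        return NamespaceAttribute(XMLNS_NAMESPACE, local, value)
    return None
```

Applied to the div element above, `coerce_xmlns_attribute("xmlns:audio", "http://purl.org/media/audio#")` yields a tuple whose local name is `audio`.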
Status: First draft
This section is informative
While the intent of the RDFa processing instructions was to provide a set of rules that are as language and toolchain agnostic as possible, for the sake of clarity, detailed methods of extracting RDFa content from processors operating on an XML Information Set are provided below.
Extracting namespaced RDFa attributes while operating from within an Infoset-based RDFa processor can be achieved using the following algorithm:
While processing an element as described in [ XHTML+RDFA ], Section 5.5, Step #2:
xmlns:, create a [URI mapping] by storing the [local name] part with the xmlns: characters removed as the value to be mapped, and the [normalized value] as the value to map.
To demonstrate, assume that the following markup is processed by an Infoset-based RDFa processor:
<div xmlns:audio="http://purl.org/media/audio#" ...
After the markup is processed, there should exist a [URI mapping] in the [local list of URI mappings] that contains a mapping from audio to http://purl.org/media/audio#.
There are a number of non-prefixed attributes that are associated with RDFa Processing in HTML5. If an XML Information Set based RDFa processor is used to process these attributes, the following algorithm should be used to detect and extract the values of the attributes.
While processing an element as described in [ XHTML+RDFA ], Section 5.5, Step #4 through Step #9:
http://www.w3.org/1999/xhtml, extract and use the [normalized value].
This section is informative
This mechanism should be double-checked against all of the RDFa Javascript implementations to ensure correctness.
While the intent of the RDFa processing instructions was to provide a set of rules that are as language and toolchain agnostic as possible, for the sake of clarity, detailed methods of extracting RDFa content from processors operating in a DOM2 environment are provided below.
Extracting namespaced RDFa attributes while operating from within a DOM Level 2 based RDFa processor can be achieved using the following algorithm:
While processing each [Element] as described in [ XHTML+RDFA ], Section 5.5, Step #2:
xmlns, create a [URI mapping] by storing the [local name] as the value to be mapped, and the [Node.nodeValue] as the value to map.
xmlns:, create a [URI mapping] by storing the [local name] part with the xmlns: characters removed as the value to be mapped, and the [Node.nodeValue] as the value to map.
To demonstrate, assume that the following markup is processed by a DOM2-based RDFa processor:
<div xmlns:audio="http://purl.org/media/audio#" ...
After the markup is processed, there should exist a [URI mapping] in the [local list of URI mappings] that contains a mapping from audio to http://purl.org/media/audio#.
There are a number of non-prefixed attributes that are associated with RDFa processing in HTML5. If a DOM2-based RDFa processor is used to process these attributes, the following algorithm should be used to detect and extract the values of the attributes.
While processing an element as described in [ XHTML+RDFA ], Section 5.5, Step #4 through Step #9:
http://www.w3.org/1999/xhtml, extract and use the [Node.nodeValue] as the value.
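The two DOM-based extraction steps above can be sketched with Python's `xml.dom.minidom`, used here as a stand-in for a DOM Level 2 environment. The function names `uri_mappings` and `rdfa_attribute` are hypothetical. Matching on the attribute's qualified name, and accepting attributes that are either in no namespace or in the XHTML namespace, are assumptions about how the steps would be applied, not requirements taken from the text.

```python
from typing import Dict, Optional
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
XMLNS_PREFIX = "xmlns:"


def uri_mappings(element: minidom.Element) -> Dict[str, str]:
    """Collect prefix-to-URI mappings declared on a single element."""
    mappings: Dict[str, str] = {}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        # Strip the "xmlns:" characters to obtain the value to be mapped.
        if attr.nodeName.startswith(XMLNS_PREFIX):
            mappings[attr.nodeName[len(XMLNS_PREFIX):]] = attr.nodeValue
    return mappings


def rdfa_attribute(element: minidom.Element, name: str) -> Optional[str]:
    """Return the value of a non-prefixed RDFa attribute, or None if absent."""
    node = element.getAttributeNode(name)
    # Assumption: accept attributes in no namespace or the XHTML namespace.
    if node is not None and node.namespaceURI in (None, XHTML_NS):
        return node.nodeValue
    return None


doc = minidom.parseString(
    '<div xmlns:audio="http://purl.org/media/audio#" about="#track"/>'
)
div = doc.documentElement
```

With the div element from the demonstration above, `uri_mappings(div)` contains a mapping from `audio` to `http://purl.org/media/audio#`, and `rdfa_attribute(div, "about")` returns the attribute's node value.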