The terms defined by this document are also provided in RDF Schema format.
Copyright © 2017 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This document is a specification for a vocabulary to represent content in the Resource Description Framework (RDF). This vocabulary is intended to provide a flexible framework within different usage scenarios to semantically represent any type of content, be it on the Web or in local storage media. For example, it can be used by web quality assurance tools such as web accessibility evaluation tools to record a representation of the assessed web content, including text, images, or other types of formats. In many cases, it can be used together with HTTP Vocabulary in RDF 1.0, which allows quality assurance tools to record the HTTP headers that have been exchanged between a client and a server. This is particularly useful for quality assurance testing, conformance claims, and reporting languages like the W3C Evaluation And Report Language (EARL).
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
Representing Content in RDF 1.0 is published as a W3C Working Group Note because the Evaluation and Repair Tools Working Group (ERT WG) reached the end of its Charter.
Representing Content in RDF 1.0 is a supporting document for the Evaluation and Report Language (EARL) 1.0 Schema but can be used in other contexts too. It is considered to be complete and mature but at this time there are not sufficient implementations to finalize this work.
If you wish to make comments regarding this Representing Content in RDF 1.0 document, please send them to public-earl10-comments@w3.org (publicly visible mailing list archive).
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the Evaluation and Repair Tools Working Group (ERT WG) as part of the Web Accessibility Initiative (WAI) Technical Activity.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 September 2015 W3C Process Document.
This document is the specification for a vocabulary to represent content in the Resource Description Framework (RDF). A representation of content, whether on the Web or in local storage media, is necessary in a wide variety of scenarios (see the section below). This specification provides an RDF application that allows such content to be represented semantically. The vocabulary is built in a flexible manner, and no limitations were known at the time of writing this specification. It also provides opportunities for extensions to match particular needs of its users.
This document assumes the following background knowledge:
The terms defined by this document can be used as part of the W3C Evaluation And Report Language (EARL) and in other contexts too. Developer Guide for Evaluation and Report Language (EARL) 1.0 explains how to implement and use EARL, including conformance requirements for software tools.
Although the concepts of the Semantic Web are simple, their abstraction with RDF is known to cause difficulties for beginners. It is recommended to read the aforementioned references and other tutorials found on the Web carefully. It must also be borne in mind that RDF is primarily designed to be machine processable, and therefore some of its expressions are not very intuitive for developers who are used to working with XML only. The examples are serialized using the abbreviated RDF/XML notation.
The keywords must, required, recommended, should, may, and optional are used in accordance with [RFC2119].
For limitations of this vocabulary, see section 5.
Table 1 presents the namespaces typically used by this vocabulary. The core namespace has the URI http://www.w3.org/2011/content# and the prefix cnt. The prefix notation presents the typical conventions used on the Web and in this document to denote a given namespace, and can be freely modified.
Namespace prefix | Namespace URI | Description |
---|---|---|
cnt | http://www.w3.org/2011/content# | Namespace for Representing Content in RDF. |
dct | http://purl.org/dc/terms/ | Namespace for Dublin Core Metadata Terms. |
owl | http://www.w3.org/2002/07/owl# | Namespace for OWL [OWL]. |
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | Namespace for RDF [RDF]. |
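As a minimal sketch of how the prefixes in Table 1 might be bound in code, the following Python fragment registers them with the standard library's `xml.etree.ElementTree` module and builds one qualified element name. The `NAMESPACES` dictionary and the `qname` helper are illustrative names chosen for this sketch; they are not part of the vocabulary.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI bindings copied from Table 1.
NAMESPACES = {
    "cnt": "http://www.w3.org/2011/content#",
    "dct": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Ask ElementTree to use these prefixes when serializing.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def qname(prefix: str, local: str) -> str:
    """Return the '{uri}local' form ElementTree expects (hypothetical helper)."""
    return "{%s}%s" % (NAMESPACES[prefix], local)


element = ET.Element(qname("cnt", "ContentAsText"))
serialized = ET.tostring(element, encoding="unicode")
print(serialized)
```

Because the prefixes can be freely modified, only the namespace URIs are significant when such output is consumed by an RDF parser.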
As stated earlier, this framework is designed in an open way to facilitate different implementation scenarios. The origin of the application comes from vocabularies describing testing scenarios like the Evaluation And Report Language (EARL) [EARL]. Typical applications could be:
This section describes the classes of this RDF vocabulary. Every class is presented together with its subclasses and, where relevant, short snippets and examples.
The cnt:Content class is an overarching class for any content that could be found on the Web, in an intranet or in local storage media, for example. It is recommended always to use one of its subclasses. There is no restriction within the vocabulary scope on what can be represented with this class: textual content, XML files, binary files (e.g., images or movies), etc.
There are three subclasses of the cnt:Content class: cnt:ContentAsBase64, cnt:ContentAsText and cnt:ContentAsXML.
To connect resources of different cnt:Content sub-types with each other, use the dct:hasFormat and dct:isFormatOf properties. For example, if an XML resource is transmitted via HTTP, the original version would be a cnt:ContentAsBase64 resource. cnt:ContentAsText and cnt:ContentAsXML resources could also be created and point to the cnt:ContentAsBase64 resource.
Example 2.1: This example shows how to relate derived resources to the original resource.
<cnt:ContentAsBase64 rdf:ID="xml0"> <!-- ... --> <dct:isFormatOf rdf:resource="http://www.example.org/index.html"/> <dct:hasFormat rdf:resource="#xml1"/> <dct:hasFormat rdf:resource="#xml2"/> </cnt:ContentAsBase64> <cnt:ContentAsText rdf:ID="xml1"> <!-- ... --> <dct:isFormatOf rdf:resource="http://www.example.org/index.html"/> <dct:hasFormat rdf:resource="#xml0"/> <dct:hasFormat rdf:resource="#xml2"/> </cnt:ContentAsText> <cnt:ContentAsXML rdf:ID="xml2"> <!-- ... --> <dct:isFormatOf rdf:resource="http://www.example.org/index.html"/> <dct:hasFormat rdf:resource="#xml0"/> <dct:hasFormat rdf:resource="#xml1"/> </cnt:ContentAsXML>
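The cross-linking pattern in Example 2.1 can be sketched in a few lines of Python. This is only an illustration: the `has_format_triples` function and the plain tuple representation of triples are assumptions of this sketch, not something defined by the vocabulary.

```python
from itertools import permutations


def has_format_triples(resource_ids):
    """Return one (subject, predicate, object) tuple per ordered pair of
    distinct resources, so that every derived resource points to all others."""
    return [(s, "dct:hasFormat", o) for s, o in permutations(resource_ids, 2)]


# The three resource identifiers used in Example 2.1.
triples = has_format_triples(["#xml0", "#xml1", "#xml2"])
for triple in triples:
    print(triple)
```

Each resource would additionally carry a dct:isFormatOf property pointing to the original URI, as in the example above.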
The cnt:ContentAsBase64 class is a subclass of the cnt:Content class for Base64 encoded binary content (as defined by [RFC2045]). It can be used for any type of content, although its most typical use case is binary files.
Example 2.2: This example displays the representation of the W3C logo as a ContentAsBase64 resource. (Note: due to its length, the encoded string has been truncated at {...}.)
<cnt:ContentAsBase64> <cnt:bytes>iVBORw0KGgoAAAANSUhEUgAAAEgAAAAwCAMAAA{...}</cnt:bytes> <dct:isFormatOf rdf:resource="http://www.w3.org/Icons/w3c_home.png"/> </cnt:ContentAsBase64>
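A resource of this kind could be produced along the following lines. This is a minimal sketch using only the Python standard library; the `content_as_base64` function name and the example URI are assumptions made for illustration. The input bytes must be the raw bytes as received: decoding them as text first would corrupt any byte that is not valid in the assumed encoding.

```python
import base64
import xml.etree.ElementTree as ET

CNT = "http://www.w3.org/2011/content#"
DCT = "http://purl.org/dc/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
for prefix, uri in (("cnt", CNT), ("dct", DCT), ("rdf", RDF)):
    ET.register_namespace(prefix, uri)


def content_as_base64(data: bytes, source_uri: str) -> ET.Element:
    """Build a cnt:ContentAsBase64 element from raw bytes (hypothetical helper)."""
    node = ET.Element("{%s}ContentAsBase64" % CNT)
    # cnt:bytes holds the Base64 encoding of the untouched byte sequence.
    ET.SubElement(node, "{%s}bytes" % CNT).text = base64.b64encode(data).decode("ascii")
    # dct:isFormatOf points back to the resource the bytes came from.
    ET.SubElement(node, "{%s}isFormatOf" % DCT, {"{%s}resource" % RDF: source_uri})
    return node


# The eight-byte PNG file signature, used here as a stand-in for a full image.
png_signature = b"\x89PNG\r\n\x1a\n"
node = content_as_base64(png_signature, "http://www.example.org/image.png")
print(ET.tostring(node, encoding="unicode"))
```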
The cnt:ContentAsText class is a subclass of the cnt:Content class for any type of textual content.
Example 2.3: The following example represents a Cascading Style Sheet (CSS) file as a ContentAsText resource.
<cnt:ContentAsText> <cnt:characterEncoding>UTF-8</cnt:characterEncoding> <cnt:chars>body { color: #000; background: #fff } h1 { font-size: 1.6em } h2 { font-size: 1.3em }</cnt:chars> <dct:isFormatOf rdf:resource="http://example.org/example.css"/> </cnt:ContentAsText>
The cnt:ContentAsXML class is a subclass of the cnt:Content class for well-formed XML content only.
See the Mapping between the Document Object Model (DOM) and the Content-in-RDF vocabulary.
Example 2.4: The XHTML page with the following source code:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> <title>The title</title> </head> <body> <p>Some paragraph.</p> </body> </html>
could be represented as this ContentAsXML resource:
<cnt:ContentAsXML> <cnt:rest rdf:parseType="Literal"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> <title>The title</title> </head> <body> <p>Some paragraph.</p> </body> </html></cnt:rest> <dct:isFormatOf rdf:resource="http://example.org/example203.html"/> </cnt:ContentAsXML>
For the use of leadingMisc and dtDecl see Appendix A: A practical example.
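Since cnt:ContentAsXML is reserved for well-formed XML, a tool may want to check well-formedness before creating such a resource. The following is a minimal sketch of one way to do so with Python's standard XML parser; the `is_well_formed` function name is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """Return True if the parser accepts the source as well-formed XML."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Some paragraph.</p>"))  # closed element
print(is_well_formed("<p>Some paragraph."))      # missing end tag
```

Content that fails this check could still be recorded as a cnt:ContentAsText or cnt:ContentAsBase64 resource.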
The cnt:DoctypeDecl class represents a document type declaration. This class is normally used in conjunction with the ContentAsXML class, when the corresponding XML resource contains a document type declaration. The relation is expressed via the dtDecl property.
See the Mapping between the Document Object Model (DOM) and the Content-in-RDF vocabulary.
Example 2.6: A typical XHTML 1.0 Strict document type declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
could be represented as the following DoctypeDecl resource:
<cnt:DoctypeDecl rdf:ID="dtd0"> <cnt:doctypeName>html</cnt:doctypeName> <cnt:publicId>-//W3C//DTD XHTML 1.0 Strict//EN</cnt:publicId> <cnt:systemId>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</cnt:systemId> </cnt:DoctypeDecl>
This section presents a description of the properties of this RDF vocabulary.
cnt:bytes
Character string representing the Base64 encoded byte sequence of the given content.
Domain: cnt:ContentAsBase64
Range: http://www.w3.org/2001/XMLSchema#base64Binary

cnt:characterEncoding
The character encoding.
ContentAsBase64: If the byte sequence was created from a given character sequence, this property can be used to store the character encoding that was applied to create the byte sequence.
ContentAsText: If the character sequence was created from a given byte sequence, this property can be used to store the character encoding that was applied to create the character sequence.
ContentAsXML: If the parser's input character stream was created from a given byte stream, this property can be used to store the character encoding that was applied to create the character stream. Note: This is the character encoding actually used, not the one declared in an XML declaration.
Domain: cnt:Content
Range: RDF Literal

cnt:chars
The character sequence of the given content.
Domain: cnt:ContentAsText
Range: RDF Literal

cnt:declaredEncoding
The character encoding specified in the XML declaration.
Domain: cnt:ContentAsXML
Range: RDF Literal

cnt:doctypeName
The document type name.
Domain: cnt:DoctypeDecl
Range: RDF Literal

cnt:dtDecl
This property relates an XML Content to its Document Type Declaration.
Domain: cnt:ContentAsXML
Range: cnt:DoctypeDecl

cnt:internalSubset
The internal subset of a document type declaration.
Domain: cnt:DoctypeDecl
Range: RDF Literal

cnt:leadingMisc
The part of the XML information items (whitespace, comments and processing instructions) following the XML declaration and preceding the document type declaration, if there is one.
Domain: cnt:ContentAsXML
Range: XML Literal

cnt:publicId
The formal public identifier of a document type declaration.
Domain: cnt:DoctypeDecl
Range: RDF Literal

cnt:rest
The XML content following the document type declaration. It contains comments, processing instructions and the root element.
Domain: cnt:ContentAsXML
Range: XML Literal

cnt:standalone
The standalone document declaration.
Domain: cnt:ContentAsXML
Range: RDF Literal

cnt:systemId
The system identifier of a document type declaration.
Domain: cnt:DoctypeDecl
Range: http://www.w3.org/2001/XMLSchema#anyURI

cnt:version
The XML version specified in the XML declaration.
Domain: cnt:ContentAsXML
Range: RDF Literal
We have identified some situations to make clear when to create which type of content resources. The following are only recommendations and are non-normative:
Situation A: non-text content. This includes images, multimedia, or other non-text resources. The byte sequence is recorded in Base64 format and represented as a literal using the cnt:bytes property of a cnt:ContentAsBase64 resource. Non-text content should not be represented using cnt:ContentAsText.
Situation B: text content with an appropriate character encoding. This includes HTML, CSS, client-side script, or other text-based resources. Given the byte sequence of text content (byteSeq) received from a Web server and an appropriate character encoding (ce), byteSeq is recorded in Base64 format and represented as a literal using the cnt:bytes property of a cnt:ContentAsBase64 resource. After transforming byteSeq to a character sequence charSeq using character encoding ce, charSeq is represented as a literal using the cnt:chars property of a cnt:ContentAsText resource, and ce as a literal using the cnt:characterEncoding property.
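The steps for text content received with an appropriate character encoding can be sketched as follows. This is a minimal illustration, not a normative algorithm: the `record_text_content` function and the dictionaries keyed by property name are assumptions of this sketch, standing in for real RDF resources.

```python
import base64


def record_text_content(byte_seq: bytes, ce: str):
    """Derive property values for a ContentAsBase64 and a ContentAsText
    resource from the same byte sequence (hypothetical helper)."""
    as_base64 = {
        # The bytes exactly as received, Base64 encoded.
        "cnt:bytes": base64.b64encode(byte_seq).decode("ascii"),
    }
    as_text = {
        # charSeq: byteSeq decoded with the character encoding ce.
        "cnt:chars": byte_seq.decode(ce),
        "cnt:characterEncoding": ce,
    }
    return as_base64, as_text


byte_seq = "café".encode("utf-8")  # stands in for bytes received from a server
as_base64, as_text = record_text_content(byte_seq, "utf-8")
print(as_base64)
print(as_text)
```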
Situation C: text content with an inappropriate character encoding. Given the byte sequence of text content (byteSeq) received from a Web server and an inappropriate character encoding (ce), byteSeq is recorded in Base64 format and represented as a literal using the cnt:bytes property of a cnt:ContentAsBase64 resource. Because transforming byteSeq to a character sequence charSeq using character encoding ce fails, no cnt:ContentAsText resource can be created.
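The failure case can be detected by attempting the transformation and catching the error. The sketch below assumes Python's built-in codecs; the `try_decode` function name is hypothetical. A result of `None` signals that only the cnt:ContentAsBase64 resource should be created.

```python
from typing import Optional


def try_decode(byte_seq: bytes, ce: str) -> Optional[str]:
    """Return charSeq, or None if byteSeq cannot be decoded with ce."""
    try:
        return byte_seq.decode(ce)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are not valid in this encoding.
        # LookupError: the encoding name itself is unknown.
        return None


latin1_bytes = b"caf\xe9"  # "café" encoded as ISO-8859-1
print(try_decode(latin1_bytes, "utf-8"))       # inappropriate encoding
print(try_decode(latin1_bytes, "iso-8859-1"))  # appropriate encoding
```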
Situation D: text content created in memory. Given the character sequence of text content (charSeq) created in memory and an appropriate character encoding (ce), a cnt:ContentAsText resource may be created with a cnt:chars property with an object literal created from charSeq. After transforming charSeq to byte sequence byteSeq using character encoding ce, a cnt:ContentAsBase64 resource may be created with a cnt:bytes property with an object literal created from byteSeq and a cnt:characterEncoding property with the object literal ce.
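For text created in memory the transformation runs in the opposite direction, and the character encoding is recorded on the Base64 resource. A minimal sketch, in which the `text_to_resources` function and the dictionary representation are assumptions made for illustration:

```python
import base64


def text_to_resources(char_seq: str, ce: str):
    """Derive property values for both resources from an in-memory string."""
    byte_seq = char_seq.encode(ce)  # charSeq -> byteSeq using ce
    as_text = {"cnt:chars": char_seq}
    as_base64 = {
        "cnt:bytes": base64.b64encode(byte_seq).decode("ascii"),
        # Here ce documents how the byte sequence was created.
        "cnt:characterEncoding": ce,
    }
    return as_text, as_base64


as_text, as_base64 = text_to_resources("h1 { font-size: 1.6em }", "UTF-8")
print(as_text)
print(as_base64)
```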
Situation E: well-formed XML content. Given the byte sequence of well-formed XML content (byteSeq) received from a Web server and an appropriate character encoding (ce), cnt:ContentAsBase64 and cnt:ContentAsText resources may be created as in situation B. Additionally, a cnt:ContentAsXML resource may be created.
Situation F: DOM document in memory. Given a DOM Document in memory, originally created by parsing some XML source but afterwards changed by DOM operations, a cnt:ContentAsXML resource may be created. Its cnt:version, cnt:declaredEncoding and cnt:standalone properties are taken from the information in the Document node itself (version, declared encoding and standalone), and a cnt:DoctypeDecl resource is created from the information in the DocumentType node. The object literals for cnt:leadingMisc (serialize Comment and ProcessingInstruction nodes preceding a DocumentType node) and cnt:rest (serialize nodes following a DocumentType node) are created by serializing the relevant child nodes of the Document node. See the Mapping between the Document Object Model (DOM) and Content-in-RDF properties.
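The DOM-based procedure above can be sketched with Python's `xml.dom.minidom`, which exposes the XML declaration as `version`, `encoding` and `standalone` attributes on the Document node rather than under the DOM Level 3 names. The `describe_document` function, the dictionary output, and the choice to place all child nodes in cnt:rest when there is no document type declaration are assumptions of this sketch.

```python
from xml.dom import minidom

SOURCE = b"""<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- this is a comment -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<body><p>Some paragraph.</p></body>
</html>"""


def describe_document(doc):
    """Collect Content-in-RDF property values from a minidom Document."""
    children = list(doc.childNodes)
    doctype = doc.doctype
    if doctype is None:
        # Assumption: without a doctype, everything goes into cnt:rest.
        leading_nodes, rest_nodes = [], children
    else:
        position = children.index(doctype)
        leading_nodes = children[:position]
        rest_nodes = children[position + 1:]
    result = {
        "cnt:version": doc.version,
        "cnt:declaredEncoding": doc.encoding,
        "cnt:standalone": doc.standalone,
        # Serialize the nodes before and after the DocumentType node.
        "cnt:leadingMisc": "".join(n.toxml() for n in leading_nodes),
        "cnt:rest": "".join(n.toxml() for n in rest_nodes),
    }
    if doctype is not None:
        result["cnt:dtDecl"] = {
            "cnt:doctypeName": doctype.name,
            "cnt:publicId": doctype.publicId,
            "cnt:systemId": doctype.systemId,
            "cnt:internalSubset": doctype.internalSubset,
        }
    return result


result = describe_document(minidom.parseString(SOURCE))
for name, value in result.items():
    print(name, "=", value)
```

Note that `standalone` is reported as a boolean by this parser, so a tool would still need to map it to the literal form it wants to record.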
The vocabulary provides a framework that allows the representation of any type of content. There are many possibilities for extensions that would allow the inclusion of additional metadata, such as the metadata embedded in some multimedia formats. Typical scenarios for extensions could be:
However, at the time of writing this specification, the Working Group has decided to provide the basic framework that supports the immediate needs of vocabularies using this specification, such as the Evaluation and Report Language (EARL) [EARL], leaving room for further extensions as new use cases are presented.
To understand the versatility of the vocabulary, let us assume we have a given XHTML page containing an XML declaration, a comment preceding a document type declaration and some XHTML elements.
Example 2.6: A typical XHTML page.
<?xml version="1.0" encoding="UTF-8" standalone="no" ?> <!-- this is a comment --> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en"> <head> <title>The title</title> </head> <body> <p>Some paragraph.</p> </body> </html>
This page could be represented as a simple ContentAsText resource:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cnt="http://www.w3.org/2011/content#" xmlns:dct="http://purl.org/dc/terms/"> <cnt:ContentAsText> <cnt:chars>&lt;?xml version="1.0" encoding="UTF-8" standalone="no" ?&gt; &lt;!-- this is a comment --&gt; &lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt; &lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"&gt; &lt;head&gt; &lt;title&gt;The title&lt;/title&gt; &lt;/head&gt; &lt;body&gt; &lt;p&gt;Some paragraph.&lt;/p&gt; &lt;/body&gt; &lt;/html&gt;</cnt:chars> <dct:isFormatOf rdf:resource="http://example.org/example207.html"/> </cnt:ContentAsText> </rdf:RDF>
or likewise as a ContentAsXML resource. As the comment <!-- this is a comment --> precedes the document type declaration, a cnt:leadingMisc property is created with its object literal containing the comment. The document type declaration is modelled as a DoctypeDecl resource and referred to from the cnt:ContentAsXML resource by the cnt:dtDecl property.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cnt="http://www.w3.org/2011/content#" xmlns:dct="http://purl.org/dc/terms/" xml:base="http://example.org/example208.html"> <cnt:DoctypeDecl rdf:ID="dtd0"> <cnt:systemId>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</cnt:systemId> <cnt:publicId>-//W3C//DTD XHTML 1.0 Strict//EN</cnt:publicId> <cnt:doctypeName>html</cnt:doctypeName> </cnt:DoctypeDecl> <cnt:ContentAsXML> <cnt:version>1.0</cnt:version> <cnt:declaredEncoding>UTF-8</cnt:declaredEncoding> <cnt:standalone>no</cnt:standalone> <cnt:leadingMisc rdf:parseType="Literal"><!-- this is a comment --></cnt:leadingMisc> <cnt:dtDecl rdf:resource="#dtd0" /> <cnt:rest rdf:parseType="Literal"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>The title</title> </head> <body> <p>Some paragraph.</p> </body> </html></cnt:rest> <dct:isFormatOf rdf:resource="http://example.org/example208.html"/> </cnt:ContentAsXML> </rdf:RDF>
The following terms are defined by this specification:
Class name | Label | Comment | Refinements | Related properties |
---|---|---|---|---|
cnt:Content | Content | The content. | cnt:ContentAsBase64, cnt:ContentAsText, cnt:ContentAsXML | |
cnt:ContentAsBase64 | Base64 Content | The base64 encoded content (can be used for binary content). | - | cnt:bytes, cnt:characterEncoding |
cnt:ContentAsText | Text Content | The text content (can be used for text content). | - | cnt:chars, cnt:characterEncoding |
cnt:ContentAsXML | XML content | The XML content (can be used for XML-wellformed content). | - | cnt:version, cnt:declaredEncoding, cnt:standalone, cnt:leadingMisc, cnt:dtDecl, cnt:rest, cnt:characterEncoding |
cnt:DoctypeDecl | Document type declaration | The document type declaration. | - | cnt:doctypeName, cnt:internalSubset, cnt:publicId, cnt:systemId |
Property name | Label | Comment | Domain | Range |
---|---|---|---|---|
cnt:bytes | Base64 encoded byte sequence | The Base64 encoded byte sequence of the content. | cnt:ContentAsBase64 | http://www.w3.org/2001/XMLSchema#base64Binary |
cnt:characterEncoding | Character encoding | The character encoding used to create a character sequence from a byte sequence or vice versa. | cnt:Content | RDF Literal |
cnt:chars | Character sequence | The character sequence of the text content. | cnt:ContentAsText | RDF Literal |
cnt:declaredEncoding | XML character encoding | The character encoding declared in the XML declaration. | cnt:ContentAsXML | RDF Literal |
cnt:doctypeName | Document type name | The document type name. | cnt:DoctypeDecl | RDF Literal |
cnt:dtDecl | Document type declaration | The document type declaration. | cnt:ContentAsXML | cnt:DoctypeDecl |
cnt:internalSubset | Internal DTD subset | The internal document type definition subset within the document type declaration. | cnt:DoctypeDecl | RDF Literal |
cnt:leadingMisc | XML leading misc | The XML content preceding the document type declaration. | cnt:ContentAsXML | XML Literal |
cnt:publicId | Public ID | The document type declaration's public identifier. | cnt:DoctypeDecl | RDF Literal |
cnt:rest | XML rest | The XML content following the document type declaration. | cnt:ContentAsXML | XML Literal |
cnt:standalone | XML standalone document declaration | The standalone declaration in the XML declaration. | cnt:ContentAsXML | RDF Literal |
cnt:systemId | System ID | The document type declaration's system identifier (typed: xsd:anyURI). | cnt:DoctypeDecl | http://www.w3.org/2001/XMLSchema#anyURI |
cnt:version | XML version | The XML version declared in the XML declaration. | cnt:ContentAsXML | RDF Literal |
DOM property | Content-in-RDF property |
---|---|
Document.xmlVersion | version |
Document.xmlEncoding | declaredEncoding |
Document.xmlStandalone | standalone |
Document.doctype | dtDecl |
DocumentType.name | doctypeName |
DocumentType.publicId | publicId |
DocumentType.systemId | systemId |
DocumentType.internalSubset | internalSubset |
http://www.w3.org/TR/EARL10/
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
http://www.w3.org/TR/rdf-primer/
http://www.w3.org/TR/rdf-schema/
http://www.w3.org/TR/rdf-syntax-grammar/
http://www.ietf.org/rfc/rfc2119.txt
http://www.ietf.org/rfc/rfc2045.txt
http://www.w3.org/TR/xml/
Contributors to this Working Draft: Shadi Abou-Zahra, Philip Ackermann, Carlos Iglesias, Johannes Koch, Michael Squillace, and Carlos Velasco.
The following is a list of substantial changes since the 29 October 2009 Working Draft:
Removed dct:source from Content class
Added dct:hasFormat and dct:isFormatOf to Content class