Abstract

This document is a specification for a vocabulary to represent content in the Resource Description Framework (RDF). This vocabulary is intended to provide a flexible framework within different usage scenarios to semantically represent any type of content, be it on the Web or in local storage media. For example, it can be used by web quality assurance tools such as web accessibility evaluation tools to record a representation of the assessed web content, including text, images, or other types of formats. In many cases, it can be used together with HTTP Vocabulary in RDF 1.0, which allows quality assurance tools to record the HTTP headers that have been exchanged between a client and a server. This is particularly useful for quality assurance testing, conformance claims, and reporting languages like the W3C Evaluation And Report Language (EARL).

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This 2 May 2011 Editor's Draft of Representing Content in RDF 1.0 is an update of the previous Representing Content in RDF Working Draft of 29 October 2009, and it incorporates all comments received since. This document is part of the W3C Evaluation And Report Language (EARL) but can be reused in other contexts too. This document is intended to be published and maintained as a W3C Working Group Note after review and refinement.

The Evaluation and Repair Tools Working Group (ERT WG) believes it has addressed all issues brought forth through previous Working Draft iterations. The Working Group encourages feedback about this document, Representing Content in RDF 1.0, by developers and researchers who have interest in software-supported evaluation and validation of websites, and by developers and researchers who have interest in Semantic Web technologies for content description, annotation, and adaptation. In particular, feedback from the groups involved in the W3C Semantic Web Activity, especially the Semantic Web Coordination Group, the Semantic Web Deployment Working Group, and the Semantic Web Interest Group would be greatly appreciated.

Please send comments on this Representing Content in RDF 1.0 document by @@@ to public-earl10-comments@w3.org (publicly visible mailing list archive).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced by the Evaluation and Repair Tools Working Group (ERT WG) as part of the Web Accessibility Initiative (WAI) Technical Activity.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


Table of Contents

  1. Introduction
  2. Classes
  3. Properties
  4. Usage scenarios
  5. Limitations of the vocabulary

Appendices

  1. A practical example
  2. Terms
  3. Mapping between DOM and Content-in-RDF properties
  4. References
  5. Contributors
  6. Document Changes

1 Introduction

This document is the specification for a vocabulary to represent content in the Resource Description Framework (RDF). A representation of content, whether on the Web or in local storage media, is necessary in a wide variety of scenarios (see section 1.2, Use cases). This specification provides an RDF application for representing such content semantically. The vocabulary is designed to be flexible, and no limitations on the types of content it can represent are known at the time of writing. It also provides opportunities for extensions to match the particular needs of its users.

This document assumes the following background knowledge:

The terms defined by this document can be used as part of the W3C Evaluation And Report Language (EARL) and in other contexts too. Developer Guide for Evaluation and Report Language (EARL) 1.0 explains how to implement and use EARL, including conformance requirements for software tools.

Although the concepts of the Semantic Web are simple, their abstraction with RDF is known to be difficult for beginners. Readers are advised to read the aforementioned references and other tutorials found on the Web carefully. It must also be borne in mind that RDF is primarily intended to be machine processable; some of its expressions are therefore not very intuitive for developers used to working only with XML. The examples in this document are serialized using the abbreviated RDF/XML notation.

The keywords must, required, recommended, should, may, and optional are used in accordance with [RFC2119].

For limitations of this vocabulary, see section 5.

1.1 Namespaces

Table 1 presents the namespaces typically used by this vocabulary. The core namespace has the URI http://www.w3.org/2011/content# and the prefix cnt. The prefix notation presents the typical conventions used in the Web and in this document to denote a given namespace, and can be freely modified.

Table 1: namespaces used by this document.
Namespace prefix | Namespace URI | Description
cnt | http://www.w3.org/2011/content# | Namespace for Representing Content in RDF.
dct | http://purl.org/dc/terms/ | Namespace for Dublin Core Metadata Terms.
owl | http://www.w3.org/2002/07/owl# | Namespace for OWL [OWL].
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | Namespace for RDF [RDF].

1.2 Use cases

As stated earlier, this framework is designed in an open way to facilitate different implementation scenarios. The origin of the application comes from vocabularies describing testing scenarios like the Evaluation And Report Language (EARL) [EARL]. Typical applications could be:

2 Classes

This section describes the classes of this RDF vocabulary. Every class is presented together with its subclasses. Where relevant, short snippets and examples are included.

2.1 Content Class

[Editor's note: The Working Group has introduced the use of the properties dct:hasFormat and dct:isFormatOf to allow different representations of the same content; comments on this approach are welcome.]

The cnt:Content class is an overarching class for any content that could be found on the Web, in an Intranet or in local storage media, for example. It is recommended always to use one of its subclasses. There is no restriction within the vocabulary scope on what can be represented with this class: textual content, XML files, binary files (e.g., images or movies), etc.

There are three subclasses of the cnt:Content class: cnt:ContentAsBase64, cnt:ContentAsText and cnt:ContentAsXML.

To connect resources of different cnt:Content subtypes with each other, use the dct:hasFormat and dct:isFormatOf properties. For example, if an XML resource is transmitted via HTTP, the original version would be a cnt:ContentAsBase64 resource. cnt:ContentAsText and cnt:ContentAsXML resources could also be created and point to the cnt:ContentAsBase64 resource.

Examples

Example 2.1: This example shows how to relate derived resources to the original resource.

<cnt:ContentAsBase64 rdf:ID="xml0">
  <!-- ... -->
  <dct:isFormatOf rdf:resource="http://www.example.org/index.html"/>
  <dct:hasFormat rdf:resource="#xml1"/>
  <dct:hasFormat rdf:resource="#xml2"/>
</cnt:ContentAsBase64>

<cnt:ContentAsText rdf:ID="xml1">
  <!-- ... -->
  <dct:isFormatOf rdf:resource="http://www.example.org/index.html"/>
  <dct:hasFormat rdf:resource="#xml0"/>
  <dct:hasFormat rdf:resource="#xml2"/>
</cnt:ContentAsText>

<cnt:ContentAsXML rdf:ID="xml2">
  <!-- ... -->
  <dct:isFormatOf rdf:resource="http://www.example.org/index.html"/>
  <dct:hasFormat rdf:resource="#xml0"/>
  <dct:hasFormat rdf:resource="#xml1"/>
</cnt:ContentAsXML>

Related Properties

2.1.1 ContentAsBase64 Class

The cnt:ContentAsBase64 class is a subclass of the cnt:Content class for Base64 encoded binary content (as defined by [RFC2045]) and can be used for any type of content, although its more typical use case is for binary files.

Related Properties
Examples

Example 2.2: This example shows the representation of the W3C logo as a ContentAsBase64 resource. (Note: due to its length, the encoded string has been truncated at {...}.)

<cnt:ContentAsBase64>
  <cnt:bytes>77+9UE5HDQoaCgAAAA1JSERSAAAASAAAADAIAwAAAO+{...}</cnt:bytes>
  <dct:isFormatOf rdf:resource="http://www.w3.org/Icons/w3c_home.png"/>
</cnt:ContentAsBase64>
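
The following non-normative Python sketch illustrates how a tool might produce the value of cnt:bytes from raw bytes. The helper names bytes_literal and content_as_base64 and the URI http://example.org/image.png are assumptions made for illustration and are not defined by this vocabulary. The fragment is built as a plain string; a real tool would more likely use an RDF library.

```python
import base64
from xml.sax.saxutils import quoteattr


def bytes_literal(raw: bytes) -> str:
    """Return the Base64 text usable as a cnt:bytes literal (hypothetical helper)."""
    return base64.b64encode(raw).decode("ascii")


def content_as_base64(raw: bytes, source_uri: str) -> str:
    """Build a cnt:ContentAsBase64 fragment as a string (hypothetical helper)."""
    return (
        "<cnt:ContentAsBase64>\n"
        f"  <cnt:bytes>{bytes_literal(raw)}</cnt:bytes>\n"
        # quoteattr adds the surrounding quotes and escapes the URI for XML.
        f"  <dct:isFormatOf rdf:resource={quoteattr(source_uri)}/>\n"
        "</cnt:ContentAsBase64>"
    )


# The eight-byte signature that starts every PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"
fragment = content_as_base64(png_signature, "http://example.org/image.png")
print(fragment)
```

Decoding the literal with base64.b64decode returns the original byte sequence, so the recorded content can be restored exactly.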

2.1.2 ContentAsText Class

The cnt:ContentAsText class is a subclass of the cnt:Content class for any type of textual content.

Related Properties
Examples

Example 2.3: The following example represents a Cascading Style Sheet (CSS) file as a ContentAsText resource.

<cnt:ContentAsText>
  <cnt:characterEncoding>UTF-8</cnt:characterEncoding>
  <cnt:chars>body {
  color: #000;
  background: #fff
}
h1 {
  font-size: 1.6em
}
h2 {
  font-size: 1.3em
}</cnt:chars>
  <dct:isFormatOf rdf:resource="http://example.org/example.css"/>
</cnt:ContentAsText>
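
When text content contains markup characters, these have to be escaped before the text is placed inside cnt:chars in RDF/XML. The following non-normative Python sketch shows one way to do this with the standard library; the helper name chars_element and the sample style rule are assumptions made for illustration.

```python
from xml.sax.saxutils import escape, unescape


def chars_element(text: str) -> str:
    """Wrap text in a cnt:chars element, escaping &, < and > (hypothetical helper)."""
    return f"<cnt:chars>{escape(text)}</cnt:chars>"


# A style rule containing characters that are special in XML.
css = 'a > b::before { content: "<&>" }'
element = chars_element(css)
print(element)
```

Unescaping the element content returns the original character sequence unchanged.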

2.1.3 ContentAsXML Class

The cnt:ContentAsXML class is a subclass of the cnt:Content class for well-formed XML content only.

Related Properties

See the Mapping between the Document Object Model (DOM) and the Content-in-RDF vocabulary.

Examples

Example 2.4: The XHTML page with the following source code:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>The title</title>
  </head>
  <body>
    <p>Some paragraph.</p>
  </body>
</html>

could be represented as this ContentAsXML resource.

<cnt:ContentAsXML>
  <cnt:rest rdf:parseType="Literal"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>The title</title>
  </head>
  <body>
    <p>Some paragraph.</p>
  </body>
</html></cnt:rest>
  <dct:isFormatOf rdf:resource="http://example.org/example203.html"/>
</cnt:ContentAsXML>

For the use of leadingMisc and dtDecl see Appendix A: A practical example.

2.2 DoctypeDecl Class

A document type declaration. This class is normally used in conjunction with the ContentAsXML class, when the corresponding XML resource contains a document type declaration. The relation is expressed via the dtDecl property.

Related Properties

See the Mapping between the Document Object Model (DOM) and the Content-in-RDF vocabulary.

Examples

Example 2.5: A typical XHTML 1.0 Strict document type declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

could be represented as the following DoctypeDecl resource:

<cnt:DoctypeDecl rdf:ID="dtd0">
  <cnt:doctypeName>html</cnt:doctypeName>
  <cnt:publicId>-//W3C//DTD XHTML 1.0 Strict//EN</cnt:publicId>
  <cnt:systemId>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</cnt:systemId>
</cnt:DoctypeDecl>

3 Properties

This section presents a description of the properties of this RDF vocabulary.

3.1 bytes Property

Character string representing the Base64 encoded byte sequence of the given content.

Domain:
cnt:ContentAsBase64
Range:
Base64 encoded bytes (http://www.w3.org/2001/XMLSchema#base64Binary)

3.2 characterEncoding Property

The character encoding.

Domain:
cnt:Content
Range:
Literal

3.3 chars Property

The character sequence of the given content.

Domain:
cnt:ContentAsText
Range:
Literal

3.4 declaredEncoding Property

The character encoding specified in the XML declaration.

Domain:
cnt:ContentAsXML
Range:
Literal

3.5 doctypeName Property

The document type name.

Domain:
cnt:DoctypeDecl
Range:
Literal

3.6 dtDecl Property

This property relates an XML Content to its Document Type Declaration.

Domain:
cnt:ContentAsXML
Range:
cnt:DoctypeDecl

3.7 internalSubset Property

The internal subset of a document type declaration.

Domain:
cnt:DoctypeDecl
Range:
Literal

3.8 leadingMisc Property

The XML information items (whitespace, comments and processing instructions) that follow the XML declaration and precede the document type declaration, if there is one.

Domain:
cnt:ContentAsXML
Range:
XML Literal

3.9 publicId Property

The formal public identifier of a document type declaration.

Domain:
cnt:DoctypeDecl
Range:
Literal

3.10 rest Property

The part of the XML content following the document type declaration, if there is one: comments, processing instructions and the root element.

Domain:
cnt:ContentAsXML
Range:
XML Literal

3.11 standalone Property

The standalone document declaration.

Domain:
cnt:ContentAsXML
Range:
Literal

3.12 systemId Property

The system identifier of a document type declaration.

Domain:
cnt:DoctypeDecl
Range:
Literal typed as URI (http://www.w3.org/2001/XMLSchema#anyURI)

3.13 version Property

The XML version specified in the XML declaration.

Domain:
cnt:ContentAsXML
Range:
Literal

4 Usage scenarios

The following situations illustrate which types of content resources to create in which circumstances. They are recommendations only and are non-normative:

Situation A: byte sequence of non-text content

This includes images, multimedia, or other non-text resources. The byte sequence is recorded in Base64 format and represented as a literal using the cnt:bytes property of a cnt:ContentAsBase64 resource. Non-text content should not be represented using cnt:ContentAsText.

Situation B: byte sequence of text content with appropriate character encoding information

This includes HTML, CSS, client-side script, or other text-based resources. Given the byte sequence of text content (byteSeq) received from a Web server and an appropriate character encoding (ce), byteSeq is recorded in Base64 format and represented as a literal using the cnt:bytes property of a cnt:ContentAsBase64 resource.

After transforming byteSeq to a character sequence charSeq using character encoding ce, charSeq is represented as a literal using the cnt:chars property of a cnt:ContentAsText resource, and ce as a literal using the cnt:characterEncoding property.

Situation C: byte sequence of text content with inappropriate character encoding information

Given the byte sequence of text content (byteSeq) received from a Web server and an inappropriate character encoding (ce), byteSeq is recorded in Base64 format and represented as a literal using the cnt:bytes property of a cnt:ContentAsBase64 resource. Because transforming byteSeq to a character sequence charSeq using character encoding ce fails, no cnt:ContentAsText resource can be created.
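
Situations B and C can be sketched together as follows. This non-normative Python example assumes that "inappropriate" means strict decoding fails; the function name describe_bytes and the dictionary layout are hypothetical. Note that some encodings (for example ISO-8859-1) accept any byte sequence, so a failed decode is only one possible signal of a wrong encoding.

```python
import base64


def describe_bytes(byte_seq: bytes, ce: str) -> dict:
    """Return the resources that could be created for byte_seq (hypothetical helper)."""
    resources = {
        "cnt:ContentAsBase64": {
            "cnt:bytes": base64.b64encode(byte_seq).decode("ascii"),
        }
    }
    try:
        # Strict decoding raises an error on bytes that are invalid in ce.
        char_seq = byte_seq.decode(ce)
    except (UnicodeDecodeError, LookupError):
        # Situation C: only the Base64 representation can be recorded.
        return resources
    # Situation B: the character sequence and its encoding are recorded too.
    resources["cnt:ContentAsText"] = {
        "cnt:chars": char_seq,
        "cnt:characterEncoding": ce,
    }
    return resources


byte_seq = "é".encode("utf-8")
situation_b = describe_bytes(byte_seq, "UTF-8")
situation_c = describe_bytes(byte_seq, "US-ASCII")
print(situation_b)
print(situation_c)
```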

Situation D: character sequence of text content with appropriate character encoding information

Given the character sequence of text content (charSeq) created in memory and an appropriate character encoding (ce), a cnt:ContentAsText resource may be created with a cnt:chars property whose object literal is created from charSeq. After transforming charSeq to a byte sequence byteSeq using character encoding ce, a cnt:ContentAsBase64 resource may be created with a cnt:bytes property whose object literal is the Base64 encoding of byteSeq, and a cnt:characterEncoding property whose object literal is ce.
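
A non-normative Python sketch of Situation D follows. The function name describe_chars and the dictionary layout are assumptions made for illustration. The example also shows that the recorded bytes depend on the chosen character encoding.

```python
import base64


def describe_chars(char_seq: str, ce: str) -> dict:
    """Return the resources that could be created for char_seq (hypothetical helper)."""
    # Transform the character sequence to a byte sequence using encoding ce.
    byte_seq = char_seq.encode(ce)
    return {
        "cnt:ContentAsText": {"cnt:chars": char_seq},
        "cnt:ContentAsBase64": {
            "cnt:bytes": base64.b64encode(byte_seq).decode("ascii"),
            "cnt:characterEncoding": ce,
        },
    }


as_utf8 = describe_chars("é", "UTF-8")
as_latin1 = describe_chars("é", "ISO-8859-1")
print(as_utf8["cnt:ContentAsBase64"])
print(as_latin1["cnt:ContentAsBase64"])
```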

Situation E: byte sequence of XML content with appropriate character encoding information

Given the byte sequence of well-formed XML content (byteSeq) received from a Web server and an appropriate character encoding (ce), cnt:ContentAsBase64 and cnt:ContentAsText resources may be created as in situation B. Additionally, a cnt:ContentAsXML resource may be created.

Situation F: Document Object Model (DOM) changed in memory

Given a DOM Document in memory, originally created by parsing some XML source but afterwards changed by DOM operations, a cnt:ContentAsXML resource may be created. Its cnt:version, cnt:declaredEncoding and cnt:standalone properties may be created from the information in the Document node itself, and a cnt:DoctypeDecl resource from the information in the DocumentType node. The object literals for cnt:leadingMisc and cnt:rest are created by serializing the relevant child nodes of the Document node: Comment and ProcessingInstruction nodes preceding the DocumentType node for cnt:leadingMisc, and nodes following the DocumentType node for cnt:rest. See the Mapping between the Document Object Model (DOM) and Content-in-RDF properties.
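
The serialization step of Situation F can be sketched as follows, using Python's xml.dom.minidom. This is a non-normative illustration: the function name split_document is hypothetical, and placing all child nodes in cnt:rest when the document has no DocumentType node is an assumption of this sketch, not a rule of the vocabulary.

```python
from xml.dom import minidom


def split_document(doc):
    """Serialize Document children into (leadingMisc, rest) strings (hypothetical helper)."""
    leading, rest = [], []
    # Assumption of this sketch: without a doctype, every node goes to rest.
    seen_doctype = doc.doctype is None
    for node in doc.childNodes:
        if node.nodeType == node.DOCUMENT_TYPE_NODE:
            seen_doctype = True
        elif seen_doctype:
            rest.append(node.toxml())
        else:
            leading.append(node.toxml())
    return "".join(leading), "".join(rest)


source = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<!-- this is a comment -->"
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head><title>The title</title></head>'
    b"<body><p>Some paragraph.</p></body></html>"
)
doc = minidom.parseString(source)

# Change the DOM in memory before serializing it.
doc.getElementsByTagName("title")[0].firstChild.data = "A new title"

leading_misc, rest = split_document(doc)
print(leading_misc)
print(rest)
```

The serialized strings reflect the changed DOM rather than the original source, which is why they are created from the Document node and not from the bytes that were parsed.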

5 Limitations of the vocabulary

The vocabulary provides a framework that allows the representation of any type of content. There are many possibilities for extensions that allow the inclusion of additional metadata, such as the metadata included in some multimedia formats. Typical scenarios for extensions could be:

However, at the time of writing this specification, the Working Group has decided to provide the basic framework that supports the immediate needs of vocabularies using this specification, such as the Evaluation and Report Language (EARL) [EARL], leaving room for further extensions as new use cases are presented to the Working Group.

Appendix A: A practical example

To understand the versatility of the vocabulary, let us assume we have a given XHTML page containing an XML declaration, a comment preceding a document type declaration and some XHTML elements.

Example 2.6: A typical XHTML page.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- this is a comment -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>The title</title>
  </head>
  <body>
    <p>Some paragraph.</p>
  </body>
</html>

This page could be represented as a simple ContentAsText resource:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2011/content#"
    xmlns:dct="http://purl.org/dc/terms/">

  <cnt:ContentAsText>
    <cnt:chars>&lt;?xml version="1.0" encoding="UTF-8" standalone="no" ?&gt;
&lt;!-- this is a comment --&gt;
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"&gt;
  &lt;head&gt;
    &lt;title&gt;The title&lt;/title&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p&gt;Some paragraph.&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;</cnt:chars>
    <dct:isFormatOf rdf:resource="http://example.org/example207.html"/>
  </cnt:ContentAsText>

</rdf:RDF>

or likewise as ContentAsXML. As the comment <!-- this is a comment --> precedes the document type declaration, a cnt:leadingMisc property is created with its object literal containing the comment. The document type declaration is modelled as a DoctypeDecl resource and referred to from the cnt:ContentAsXML resource by the cnt:dtDecl property.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2011/content#"
    xmlns:dct="http://purl.org/dc/terms/"
    xml:base="http://example.org/example208.html">

  <cnt:DoctypeDecl rdf:ID="dtd0">
    <cnt:systemId>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</cnt:systemId>
    <cnt:publicId>-//W3C//DTD XHTML 1.0 Strict//EN</cnt:publicId>
    <cnt:doctypeName>html</cnt:doctypeName>
  </cnt:DoctypeDecl>

  <cnt:ContentAsXML>
    <cnt:version>1.0</cnt:version>
    <cnt:declaredEncoding>UTF-8</cnt:declaredEncoding>
    <cnt:standalone>no</cnt:standalone>
    <cnt:leadingMisc rdf:parseType="Literal"><!-- this is a comment --></cnt:leadingMisc>
    <cnt:dtDecl rdf:resource="#dtd0" />
    <cnt:rest rdf:parseType="Literal"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>The title</title>
  </head>
  <body>
    <p>Some paragraph.</p>
  </body>
</html></cnt:rest>
    <dct:isFormatOf rdf:resource="http://example.org/example208.html"/>
  </cnt:ContentAsXML>

</rdf:RDF>

Appendix B: Terms

The following terms are defined by this specification:

Classes

Classes in the Content-in-RDF namespace
Class name | Label | Comment | Refinements | Related properties
cnt:Content | Content | The content. | cnt:ContentAsBase64, cnt:ContentAsText, cnt:ContentAsXML |
cnt:ContentAsBase64 | Base64 Content | The base64 encoded content (can be used for binary content). | - | cnt:bytes, cnt:characterEncoding
cnt:ContentAsText | Text Content | The text content (can be used for text content). | - | cnt:chars, cnt:characterEncoding
cnt:ContentAsXML | XML content | The XML content (can be used for well-formed XML content). | - | cnt:version, cnt:declaredEncoding, cnt:standalone, cnt:leadingMisc, cnt:dtDecl, cnt:rest, cnt:characterEncoding
cnt:DoctypeDecl | Document type declaration | The document type declaration. | - | cnt:doctypeName, cnt:internalSubset, cnt:publicId, cnt:systemId

Properties

Properties in the Content-in-RDF namespace
Property name | Label | Comment | Domain | Range
cnt:bytes | Base64 encoded byte sequence | The Base64 encoded byte sequence of the content. | cnt:ContentAsBase64 | http://www.w3.org/2001/XMLSchema#base64Binary
cnt:characterEncoding | Character encoding | The character encoding used to create a character sequence from a byte sequence or vice versa. | cnt:Content | RDF Literal
cnt:chars | Character sequence | The character sequence of the text content. | cnt:ContentAsText | RDF Literal
cnt:declaredEncoding | XML character encoding | The character encoding declared in the XML declaration. | cnt:ContentAsXML | RDF Literal
cnt:doctypeName | Document type name | The document type name. | cnt:DoctypeDecl | RDF Literal
cnt:dtDecl | Document type declaration | The document type declaration. | cnt:ContentAsXML | cnt:DoctypeDecl
cnt:internalSubset | Internal DTD subset | The internal document type definition subset within the document type declaration. | cnt:DoctypeDecl | RDF Literal
cnt:leadingMisc | XML leading misc | The XML content preceding the document type declaration. | cnt:ContentAsXML | XML Literal
cnt:publicId | Public ID | The document type declaration's public identifier. | cnt:DoctypeDecl | RDF Literal
cnt:rest | XML rest | The XML content following the document type declaration. | cnt:ContentAsXML | XML Literal
cnt:standalone | XML standalone document declaration | The standalone declaration in the XML declaration. | cnt:ContentAsXML | RDF Literal
cnt:systemId | System ID | The document type declaration's system identifier (typed: xsd:anyURI). | cnt:DoctypeDecl | http://www.w3.org/2001/XMLSchema#anyURI
cnt:version | XML version | The XML version declared in the XML declaration. | cnt:ContentAsXML | RDF Literal

Appendix C: Mapping between the Document Object Model (DOM) and Content-in-RDF properties

DOM property | Content-in-RDF property
Document.xmlVersion | version
Document.xmlEncoding | declaredEncoding
Document.xmlStandalone | standalone
Document.doctype | dtDecl
DocumentType.name | doctypeName
DocumentType.publicId | publicId
DocumentType.systemId | systemId
DocumentType.internalSubset | internalSubset
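
The mapping above can be applied programmatically. The following non-normative Python sketch uses xml.dom.minidom, which exposes the XML declaration under the attribute names version, encoding and standalone rather than the DOM names xmlVersion, xmlEncoding and xmlStandalone. The function name dom_to_properties and the dictionary layout are assumptions made for illustration.

```python
from xml.dom import minidom

# DocumentType attribute name -> Content-in-RDF property, following the mapping.
DOCTYPE_MAPPING = (
    ("name", "cnt:doctypeName"),
    ("publicId", "cnt:publicId"),
    ("systemId", "cnt:systemId"),
    ("internalSubset", "cnt:internalSubset"),
)


def dom_to_properties(doc):
    """Collect Content-in-RDF property values from a minidom Document (hypothetical helper)."""
    xml_props, doctype_props = {}, {}
    # minidom names for Document.xmlVersion, xmlEncoding and xmlStandalone.
    if doc.version is not None:
        xml_props["cnt:version"] = doc.version
    if doc.encoding is not None:
        xml_props["cnt:declaredEncoding"] = doc.encoding
    if doc.standalone is not None:
        # minidom stores the standalone declaration as a boolean.
        xml_props["cnt:standalone"] = "yes" if doc.standalone else "no"
    if doc.doctype is not None:
        for dom_name, rdf_name in DOCTYPE_MAPPING:
            value = getattr(doc.doctype, dom_name)
            if value is not None:
                doctype_props[rdf_name] = value
    return {"cnt:ContentAsXML": xml_props, "cnt:DoctypeDecl": doctype_props}


source = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="no" ?>\n'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    b'  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head><title>The title</title></head>'
    b"<body><p>Some paragraph.</p></body></html>"
)
properties = dom_to_properties(minidom.parseString(source))
print(properties)
```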

Appendix D: References

[EARL]
Evaluation and Report Language (EARL) 1.0 Schema. W3C Working Draft 28 April 2009.
http://www.w3.org/TR/EARL10/
[RDF]
Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation, 22 February 1999.
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[RDF-PRIMER]
RDF Primer. W3C Recommendation, 10 February 2004.
http://www.w3.org/TR/rdf-primer/
[RDFS]
RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation 10 February 2004.
http://www.w3.org/TR/rdf-schema/
[RDF-XML]
RDF/XML Syntax Specification (Revised). W3C Recommendation 10 February 2004.
http://www.w3.org/TR/rdf-syntax-grammar/
[RFC2119]
Request for Comments: 2119. Key words for use in RFCs to Indicate Requirement Levels, March 1997 (IETF).
http://www.ietf.org/rfc/rfc2119.txt
[RFC2045]
Request for Comments: 2045. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, November 1996 (IETF).
http://www.ietf.org/rfc/rfc2045.txt
[OWL]
OWL Web Ontology Language Overview. W3C Recommendation 10 February 2004.
http://www.w3.org/TR/owl-features/
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition). W3C Recommendation 26 November 2008.
http://www.w3.org/TR/xml/

Appendix E: Contributors

Contributors to this Working Draft: Shadi Abou-Zahra, Philip Ackermann, Carlos Iglesias, Johannes Koch, Michael Squillace, and Carlos Velasco.

Appendix F: Document Changes

The following is a list of substantial changes since the 29 October 2009 Working Draft: