Abstract

This document is a specification for a vocabulary to represent Content in RDF. This vocabulary is intended to provide a flexible framework for semantically representing any type of content, whether on the Web or in local storage media, in different usage scenarios. For example, it can be used by Web accessibility evaluation tools to record a representation of the assessed Web content in an Evaluation And Report Language (EARL) 1.0 Schema evaluation report. The document also contains introductory information on its usage and some examples.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This First Public Working Draft of the Representing Content in RDF document was published on 8 September 2008 by the Evaluation and Repair Tools Working Group (ERT WG). It presents an RDF approach for recording different representations of content on the Web or in local storage media. This document is part of the W3C Evaluation And Report Language (EARL). It is expected to be published as a W3C Working Group Note after further review and refinement.

The RDF terms defined by this document can be used to extend the Evaluation And Report Language (EARL) 1.0 Schema, but can also be used separately to record different representations of content for any purpose. For example, it can be used together with the HTTP Vocabulary in RDF to describe Web content that has been retrieved from a server using specific HTTP headers. The Working Group believes that this document is fairly stable, although it is a First Public Working Draft. The group encourages feedback on the approach, as well as on the completeness and maturity of this document, from developers and researchers interested in representing content in RDF. Feedback from the W3C Quality Assurance Interest Group, the W3C Semantic Web Interest Group, and the Protocol for Web Description Resources Working Group is particularly welcome. Please send comments on this document by 29 September 2008 to the public mailing list of the working group, public-wai-ert@w3.org. The archives of the working group mailing list are publicly available.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


Table of Contents

  1. Introduction
  2. Classes
  3. Properties
  4. Usage scenarios
  5. Limitations of the vocabulary

Appendices

  1. A practical example
  2. Schema in RDF/XML
  3. Mapping between DOM and Content-in-RDF properties
  4. Document Changes
  5. References

1 Introduction

This document is the specification for a vocabulary to represent Content in RDF. In a wide variety of scenarios (see section 1.2), a representation of content of any type, whether on the Web or in local storage media, is needed. This specification provides an RDF application that allows such content to be represented semantically. The vocabulary is designed to be flexible: at the time of writing, no limitations on the types of content it can represent were known. It can also be extended to match the particular needs of its users.

This document assumes the following background knowledge:

Although the concepts of the Semantic Web are simple, their abstraction in RDF is known to be difficult for beginners. Readers are advised to read the aforementioned references carefully, as well as other tutorials available on the Web. It must also be borne in mind that RDF is primarily intended to be machine-processable, and therefore some of its expressions are not very intuitive for developers used to working only with XML. The examples in this document are serialized using the abbreviated RDF/XML syntax.

The keywords must, required, recommended, should, may, and optional are used in accordance with [RFC2119].

For limitations of this vocabulary, see section 5.

1.1 Namespaces

Table 1 presents the namespaces typically used with this vocabulary. The core namespace has the URI http://www.w3.org/2008/content# and the prefix cnt. The prefixes shown follow the conventions used on the Web and in this document; they can be freely changed, as long as they are bound to the same namespace URIs.

Table 1: Namespaces used by this document.

Namespace prefix | Namespace URI | Description
cnt | http://www.w3.org/2008/content# | The default namespace for Representing Content in RDF.
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | Default RDF namespace [RDF].

1.2 Use cases

As stated earlier, this framework is designed in an open way to support different implementation scenarios. It originates from the needs of vocabularies that describe testing scenarios, such as EARL [EARL]. Typical applications could be:

2 Classes

This section describes the classes of this RDF vocabulary. Each class is presented together with its subclasses and, where relevant, with short snippets and examples.

2.1 Content Class

The Content class is an overarching class for any content that could be found, for example, on the Web, in an intranet or in local storage media. It is recommended to always use one of its subclasses. Within the scope of the vocabulary, there is no restriction on what can be represented with this class: textual content, binary files (e.g., images or movies), XML files, etc.

There are three subclasses from the Content class: Base64Content, TextContent and XMLContent.

2.1.1 Base64Content Class

The Base64Content class is a subclass of the Content class for Base64-encoded content (as defined by [RFC2045]). It can be used for any type of content, although its most typical use is for binary files.

Properties with domain Base64Content: bytes
Properties with range Base64Content: none

Conformance Note: A Base64Content must have exactly one bytes property, and zero or one characterEncoding property.

Example 2.1: This example shows the representation of the W3C logo as a Base64Content resource. (Note: due to its length, the encoded string has been truncated at {...}.)

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2008/content#">

  <cnt:Base64Content rdf:about="http://www.w3.org/Icons/w3c_home.png">
    <cnt:bytes>iVBORw0KGgoAAAANSUhEUgAAAEgAAAAwCAMAAA{...}</cnt:bytes>
  </cnt:Base64Content>

</rdf:RDF>
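The cnt:bytes literal is the plain Base64 encoding of the original byte sequence. The following Python sketch, using only the standard library base64 module, shows how a tool might compute it. The helper names and the 28-byte PNG prefix (the PNG file signature followed by the start of an IHDR chunk for a 72 x 48 palette image) are illustrative assumptions. The sketch also shows a pitfall worth avoiding: if the bytes are first decoded as text, invalid sequences such as the leading 0x89 of every PNG file are replaced by U+FFFD, and the resulting literal no longer represents the file.

```python
import base64

# First 28 bytes of a PNG file: the 8-byte PNG signature and the start of
# an IHDR chunk (72 x 48 pixels, bit depth 8, colour type 3 = palette).
PNG_PREFIX = bytes.fromhex(
    "89504E470D0A1A0A"  # PNG signature
    "0000000D49484452"  # IHDR chunk length (13) and type "IHDR"
    "0000004800000030"  # width = 72, height = 48
    "08030000"          # bit depth, colour type, compression, filter
)


def to_cnt_bytes(data: bytes) -> str:
    """Base64-encode raw bytes for use as a cnt:bytes literal (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii")


def mangled_cnt_bytes(data: bytes) -> str:
    """Pitfall: decoding binary data as UTF-8 text first turns invalid bytes
    (such as 0x89) into U+FFFD, so the literal no longer matches the file."""
    text = data.decode("utf-8", errors="replace")
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(to_cnt_bytes(PNG_PREFIX))       # iVBORw0KGgoAAAANSUhEUgAAAEgAAAAwCAMAAA==
    print(mangled_cnt_bytes(PNG_PREFIX))  # begins with 77+9, the Base64 form of U+FFFD
```

Because Base64 output uses only ASCII letters, digits, "+", "/" and "=", the literal can be embedded in RDF/XML without further escaping.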

2.1.2 TextContent Class

The TextContent class is a subclass of the Content class for any type of textual content.

Properties with domain TextContent: chars
Properties with range TextContent: none

Conformance Note: A TextContent must have exactly one chars property, and zero or one characterEncoding property.

Example 2.2: The following example represents a CSS file as a TextContent resource.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2008/content#">

  <cnt:TextContent rdf:about="http://example.org/example.css">
    <cnt:characterEncoding>UTF-8</cnt:characterEncoding>
    <cnt:chars>body {
  color: #000;
  background: #fff
}
h1 {
  font-size: 1.6em
}
h2 {
  font-size: 1.3em
}</cnt:chars>
  </cnt:TextContent>

</rdf:RDF>
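When a TextContent resource is written as RDF/XML, characters such as "<" and "&" in the chars literal must be escaped. The following Python sketch uses the standard library xml.sax.saxutils functions to do so and xml.etree.ElementTree to read the literal back. The helper names text_content_rdf and read_chars are hypothetical; the output simply mirrors the structure of Example 2.2.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CNT_NS = "http://www.w3.org/2008/content#"


def text_content_rdf(uri: str, chars: str, encoding: Optional[str] = None) -> str:
    """Serialize one cnt:TextContent resource as RDF/XML (hypothetical helper)."""
    lines = [
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cnt="{CNT_NS}">',
        f"  <cnt:TextContent rdf:about={quoteattr(uri)}>",
    ]
    if encoding is not None:
        # Zero or one cnt:characterEncoding per TextContent.
        lines.append(f"    <cnt:characterEncoding>{escape(encoding)}</cnt:characterEncoding>")
    # Exactly one cnt:chars; escape() replaces &, < and > with entity references.
    lines.append(f"    <cnt:chars>{escape(chars)}</cnt:chars>")
    lines += ["  </cnt:TextContent>", "</rdf:RDF>"]
    return "\n".join(lines)


def read_chars(rdf_xml: str) -> str:
    """Parse the RDF/XML again and return the unescaped cnt:chars text."""
    root = ET.fromstring(rdf_xml)
    node = root.find(f"{{{CNT_NS}}}TextContent/{{{CNT_NS}}}chars")
    return "" if node is None or node.text is None else node.text


if __name__ == "__main__":
    css = 'a[title="x&y"] > b { color: #000 }'
    doc = text_content_rdf("http://example.org/example.css", css, "UTF-8")
    print(doc)
```

Reading the literal back with read_chars(doc) returns the original CSS text unchanged, which is a simple way to check the escaping.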

2.1.3 XMLContent Class

The XMLContent class is a subclass of the Content class for XML content.

Properties with domain XMLContent: doctypeDecl, xmlDecl, xmlLeadingMisc, xmlRest
Properties with range XMLContent: none

Conformance Note: An XMLContent must have exactly one xmlRest property, and zero or one each of the xmlDecl, xmlLeadingMisc, doctypeDecl and characterEncoding properties.

See Appendix C: Mapping between DOM and Content-in-RDF properties.

Example 2.3: The XHTML page with the following source code:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>The title</title>
  </head>
  <body>
    <p>Some paragraph.</p>
  </body>
</html>

could be represented as the following XMLContent resource:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2008/content#">

  <cnt:XMLContent rdf:about="http://example.org/example203.html">
    <cnt:xmlRest rdf:parseType="Literal"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>The title</title>
  </head>
  <body>
    <p>Some paragraph.</p>
  </body>
</html>
</cnt:xmlRest>
  </cnt:XMLContent>

</rdf:RDF>

For the use of xmlDecl, xmlLeadingMisc and doctypeDecl see Appendix A: A practical example.

2.2 XMLDecl Class

An XML declaration. This class is normally used in conjunction with the XMLContent class, when the corresponding XML resource contains an XML declaration. The relation is expressed via the xmlDecl property.

Properties with domain XMLDecl: xmlEncoding, xmlStandalone, xmlVersion
Properties with range XMLDecl: xmlDecl

Conformance Note: An XMLDecl must have exactly one xmlVersion property, and zero or one each of the xmlEncoding and xmlStandalone properties.

See Appendix C: Mapping between DOM and Content-in-RDF properties.

Example 2.4: A typical XML declaration:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

can be expressed as the following XMLDecl resource:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2008/content#">

  <cnt:XMLDecl rdf:ID="xmld0">
    <cnt:xmlStandalone>no</cnt:xmlStandalone>
    <cnt:xmlEncoding>UTF-8</cnt:xmlEncoding>
    <cnt:xmlVersion>1.0</cnt:xmlVersion>
  </cnt:XMLDecl>

</rdf:RDF>
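A tool that starts from raw XML bytes could obtain the xmlVersion, xmlEncoding and xmlStandalone values from an XML parser instead of parsing the declaration itself. The following Python sketch uses the XmlDeclHandler callback of the standard xml.parsers.expat module; the function name xml_decl_properties and the mapping of expat's numeric standalone flag to the literals "yes" and "no" are assumptions for illustration. Note that xmlEncoding is the declared encoding, which may differ from the characterEncoding actually used (see section 3.2).

```python
import xml.parsers.expat
from typing import Dict, Optional


def xml_decl_properties(data: bytes) -> Optional[Dict[str, str]]:
    """Return cnt:XMLDecl property values, or None if there is no XML declaration."""
    found: Dict[str, str] = {}

    def on_xml_decl(version, encoding, standalone):
        # expat passes None for an omitted encoding declaration, and
        # standalone = -1 (omitted), 0 (declared "no") or 1 (declared "yes").
        if version is not None:
            found["xmlVersion"] = version
        if encoding is not None:
            found["xmlEncoding"] = encoding
        if standalone != -1:
            found["xmlStandalone"] = "yes" if standalone == 1 else "no"

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True: this is the final (and only) chunk
    return found or None


if __name__ == "__main__":
    src = b'<?xml version="1.0" encoding="UTF-8" standalone="no" ?>\n<html/>'
    print(xml_decl_properties(src))
```

For the declaration in Example 2.4, the returned dictionary holds the same three values as the XMLDecl resource above.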

2.3 DoctypeDecl Class

A document type declaration. Like XMLDecl, this class is normally used in conjunction with the XMLContent class, when the corresponding XML resource contains a document type declaration. The relation is expressed via the doctypeDecl property.

Properties with domain DoctypeDecl: doctypeName, internalSubset, publicId, systemId
Properties with range DoctypeDecl: doctypeDecl

Conformance Note: A DoctypeDecl must have exactly one doctypeName property, and zero or one each of the publicId, systemId and internalSubset properties.

See Appendix C: Mapping between DOM and Content-in-RDF properties.

Example 2.5: A typical XHTML 1.0 Strict document type declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

could be represented as the following DoctypeDecl resource:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2008/content#">

  <cnt:DoctypeDecl rdf:ID="dtd0">
    <cnt:systemId rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</cnt:systemId>
    <cnt:publicId>-//W3C//DTD XHTML 1.0 Strict//EN</cnt:publicId>
    <cnt:doctypeName>html</cnt:doctypeName>
  </cnt:DoctypeDecl>

</rdf:RDF>
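Similarly, the DoctypeDecl values can be taken from the StartDoctypeDeclHandler callback of Python's standard xml.parsers.expat module, as in the following sketch. The function name doctype_decl_properties is hypothetical. expat only reports whether an internal subset is present, not its text, so this sketch does not fill in internalSubset; by default expat also does not fetch the external DTD named by the system identifier.

```python
import xml.parsers.expat
from typing import Dict, Optional


def doctype_decl_properties(data: bytes) -> Optional[Dict[str, str]]:
    """Return cnt:DoctypeDecl property values, or None if there is no DOCTYPE."""
    found: Dict[str, str] = {}

    def on_doctype(doctype_name, system_id, public_id, has_internal_subset):
        found["doctypeName"] = doctype_name
        if public_id is not None:   # None when no public identifier is given
            found["publicId"] = public_id
        if system_id is not None:   # emitted as an xsd:anyURI typed literal in RDF
            found["systemId"] = system_id
        # has_internal_subset is only a flag; the subset text is not captured here.

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return found or None


if __name__ == "__main__":
    src = (b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
           b'  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
           b'<html xmlns="http://www.w3.org/1999/xhtml"/>')
    print(doctype_decl_properties(src))
```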

3 Properties

This section presents a description of the properties of this RDF vocabulary.

3.1 bytes Property

Character string representing the Base64 encoded byte sequence of the given content.

Domain: Base64Content
Range: Literal

3.2 characterEncoding Property

The character encoding.

When used with Base64Content: if the byte sequence was created from a given character sequence, this property can be used to store the character encoding that was applied to create the byte sequence.

When used with TextContent: if the character sequence was created from a given byte sequence, this property can be used to store the character encoding that was applied to create the character sequence.

When used with XMLContent: if the parser's input character stream was created from a given byte stream, this property can be used to store the character encoding that was applied to create the character stream. Note: this is the character encoding actually used, which may differ from the one declared in the XML declaration (see the xmlEncoding property).

Domain: Content
Range: Literal

3.3 chars Property

The character sequence of the given content.

Domain: TextContent
Range: Literal

3.4 doctypeDecl Property

This property relates an XMLContent resource to its document type declaration.

Domain: XMLContent
Range: DoctypeDecl

3.5 doctypeName Property

The document type name.

Domain: DoctypeDecl
Range: Literal

3.6 internalSubset Property

The internal subset of a document type declaration.

Domain: DoctypeDecl
Range: Literal

3.7 publicId Property

The formal public identifier of a document type declaration.

Domain: DoctypeDecl
Range: Literal

3.8 systemId Property

The system identifier of a document type declaration.

Domain: DoctypeDecl
Range: Literal typed xsd:anyURI

3.9 xmlDecl Property

This property relates an XMLContent resource to its XML declaration.

Domain: XMLContent
Range: XMLDecl

3.10 xmlEncoding Property

The character encoding specified in the XML declaration.

Domain: XMLDecl
Range: Literal

3.11 xmlLeadingMisc Property

The part of the XML content (comments and processing instructions) that follows the XML declaration and precedes the document type declaration, if there is one.

Domain: XMLContent
Range: XML Literal

3.12 xmlRest Property

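The remainder of the XML content, following the XML declaration, the leading comments and processing instructions (xmlLeadingMisc) and the document type declaration, where present. It contains comments, processing instructions and the root element.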

Domain: XMLContent
Range: XML Literal

3.13 xmlStandalone Property

The standalone document declaration.

Domain: XMLDecl
Range: Literal

3.14 xmlVersion Property

The XML version specified in the XML declaration.

Domain: XMLDecl
Range: Literal

4 Usage scenarios

The following situations illustrate which types of content resources to create in which circumstances. They are recommendations only and are non-normative:

Situation A: byte sequence of non-text content

This includes images, multimedia, or other non-text resources. The byte sequence is recorded in Base64 format and represented as a literal using the cnt:bytes property of the cnt:Base64Content. Non-text content should not be represented using cnt:TextContent.

Situation B: byte sequence of text content with appropriate character encoding information

This includes HTML, CSS, client-side script, or other text-based resources. Given the byte sequence of text content (byteSeq) received from a Web server and an appropriate character encoding (ce), byteSeq is recorded in Base64 format and represented as a literal using the cnt:bytes property of a cnt:Base64Content resource.

After byteSeq is transformed into a character sequence charSeq using character encoding ce, charSeq is represented as a literal using the cnt:chars property of a cnt:TextContent resource, and ce as a literal using its cnt:characterEncoding property.

Situation C: byte sequence of text content with inappropriate character encoding information

Given the byte sequence of text content (byteSeq) received from a Web server and an inappropriate character encoding (ce), byteSeq is recorded in Base64 format and represented as a literal using the cnt:bytes property of a cnt:Base64Content resource. Because transforming byteSeq into a character sequence using character encoding ce fails, no cnt:TextContent resource can be created.
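Situations B and C differ only in whether byteSeq can be decoded with ce. The following Python sketch makes that decision explicit; the function name content_resources and the dictionaries standing in for RDF resources are hypothetical, and Python's codecs stand in for whatever decoder a tool actually uses. Treating an unknown encoding name as inappropriate is also an assumption. A failed strict decode is only one signal of an inappropriate encoding: some encodings, such as ISO-8859-1, map every byte to a character, so decoding with them never fails.

```python
import base64
from typing import Dict, List


def content_resources(byte_seq: bytes, ce: str) -> List[Dict[str, str]]:
    """Return stand-ins for the cnt resources to create for received text content."""
    resources = [{
        "type": "cnt:Base64Content",
        "cnt:bytes": base64.b64encode(byte_seq).decode("ascii"),
    }]
    try:
        char_seq = byte_seq.decode(ce)  # strict decoding: raises on invalid bytes
    except (UnicodeDecodeError, LookupError):
        # Situation C: undecodable bytes (or an unknown encoding name),
        # so only the Base64Content resource is created.
        return resources
    # Situation B: the decoded characters and the encoding go on the TextContent.
    resources.append({
        "type": "cnt:TextContent",
        "cnt:chars": char_seq,
        "cnt:characterEncoding": ce,
    })
    return resources


if __name__ == "__main__":
    print(content_resources("Grüße".encode("utf-8"), "utf-8"))       # Situation B
    print(content_resources("Grüße".encode("iso-8859-1"), "utf-8"))  # Situation C
```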

Situation D: character sequence of text content with appropriate character encoding information

Given the character sequence of text content (charSeq) created in memory and an appropriate character encoding (ce), a cnt:TextContent resource may be created with a cnt:chars property whose object literal is created from charSeq. After charSeq is transformed into a byte sequence byteSeq using character encoding ce, a cnt:Base64Content resource may be created with a cnt:bytes property whose object literal is the Base64 encoding of byteSeq, and a cnt:characterEncoding property whose object literal is ce.
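In Situation D the direction is reversed, and the character encoding is attached to the Base64Content rather than to the TextContent, because it describes how the bytes were produced (see section 3.2). A minimal Python sketch, again with a hypothetical function name and dictionaries standing in for RDF resources:

```python
import base64
from typing import Dict, List


def resources_from_chars(char_seq: str, ce: str) -> List[Dict[str, str]]:
    """Situation D: start from characters and derive the bytes using encoding ce."""
    # Raises UnicodeEncodeError if ce cannot represent every character.
    byte_seq = char_seq.encode(ce)
    return [
        {"type": "cnt:TextContent", "cnt:chars": char_seq},
        {
            "type": "cnt:Base64Content",
            "cnt:bytes": base64.b64encode(byte_seq).decode("ascii"),
            "cnt:characterEncoding": ce,  # the encoding used to create the bytes
        },
    ]


if __name__ == "__main__":
    for resource in resources_from_chars("héllo", "utf-8"):
        print(resource)
```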

Situation E: byte sequence of XML content with appropriate character encoding information

Given the byte sequence of well-formed XML content (byteSeq) received from a Web server and an appropriate character encoding (ce), cnt:Base64Content and cnt:TextContent resources may be created as in Situation B. Additionally, a cnt:XMLContent resource may be created.

Situation F: DOM Document changed in memory

Given a DOM Document in memory that was originally created by parsing some XML source but was afterwards changed by DOM operations, a cnt:XMLDecl resource may be created from the information in the Document node itself (xmlVersion, xmlEncoding and xmlStandalone), and a cnt:DoctypeDecl resource from the information in the DocumentType node. A cnt:XMLContent resource may be created after serializing the relevant child nodes of the Document node to create object literals for cnt:xmlLeadingMisc (serialize the Comment and ProcessingInstruction nodes preceding the DocumentType node) and cnt:xmlRest (serialize the nodes following the DocumentType node). See Appendix C: Mapping between DOM and Content-in-RDF properties.
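The following sketch applies Situation F and the DOM mapping of Appendix C using Python's standard xml.dom.minidom module. The function name xml_content_from_dom is hypothetical. minidom exposes the DOM Level 3 attributes xmlVersion, xmlEncoding and xmlStandalone under the names version, encoding and standalone, and it may not record the text of an internal subset. When no DocumentType node exists, this sketch puts every child of the Document into xmlRest, which is an assumption. minidom's serialization is also not guaranteed to reproduce the original markup byte for byte.

```python
from typing import Dict
from xml.dom import Node
from xml.dom.minidom import Document, parseString


def xml_content_from_dom(doc: Document) -> Dict[str, object]:
    """Map a (possibly modified) minidom Document to cnt property values."""
    result: Dict[str, object] = {}

    # cnt:XMLDecl from the Document node (minidom: version, encoding, standalone).
    decl: Dict[str, str] = {}
    if doc.version is not None:
        decl["xmlVersion"] = doc.version
    if doc.encoding is not None:
        decl["xmlEncoding"] = doc.encoding
    if doc.standalone is not None:  # True, False or None (not declared)
        decl["xmlStandalone"] = "yes" if doc.standalone else "no"
    if decl:
        result["xmlDecl"] = decl

    # cnt:DoctypeDecl from the DocumentType node.
    dt = doc.doctype
    if dt is not None:
        dtd: Dict[str, str] = {"doctypeName": dt.name}
        if dt.publicId:
            dtd["publicId"] = dt.publicId
        if dt.systemId:
            dtd["systemId"] = dt.systemId
        if dt.internalSubset:
            dtd["internalSubset"] = dt.internalSubset
        result["doctypeDecl"] = dtd

    # Comment/PI nodes before the DocumentType node go into xmlLeadingMisc,
    # all nodes after it into xmlRest.
    leading, rest = [], []
    before_doctype = dt is not None
    for child in doc.childNodes:
        if child.nodeType == Node.DOCUMENT_TYPE_NODE:
            before_doctype = False
        elif before_doctype:
            if child.nodeType in (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE):
                leading.append(child.toxml())
        else:
            rest.append(child.toxml())
    if leading:
        result["xmlLeadingMisc"] = "".join(leading)
    result["xmlRest"] = "".join(rest)
    return result


if __name__ == "__main__":
    src = b"""<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- this is a comment -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>The title</title></head></html>"""
    doc = parseString(src)
    # Change the DOM in memory, as described in Situation F.
    doc.getElementsByTagName("title")[0].firstChild.data = "Changed title"
    print(xml_content_from_dom(doc))
```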

5 Limitations of the vocabulary

The vocabulary provides a framework that allows the representation of any type of content. There are, however, many possible extensions that would allow the inclusion of additional metadata, such as the metadata embedded in some multimedia formats. Typical scenarios for extensions could be:

However, at the time of writing, the Working Group has decided to provide the basic framework that supports the immediate needs of vocabularies using this specification, such as EARL [EARL], leaving room for further extensions as new use cases are presented.

Appendix A: A practical example

To illustrate the versatility of the vocabulary, let us assume a given XHTML page containing an XML declaration, a comment preceding a document type declaration, and some XHTML elements.

Example 2.6: A typical XHTML page.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- this is a comment -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>The title</title>
  </head>
  <body>
    <p>Some paragraph.</p>
  </body>
</html>

This page could be represented as a simple TextContent resource:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2008/content#">

  <cnt:TextContent rdf:about="http://example.org/example207.html">
    <cnt:chars>&lt;?xml version="1.0" encoding="UTF-8" standalone="no" ?&gt;
&lt;!-- this is a comment --&gt;
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"&gt;
  &lt;head&gt;
    &lt;title&gt;The title&lt;/title&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p&gt;Some paragraph.&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;</cnt:chars>
  </cnt:TextContent>

</rdf:RDF>

or, likewise, as XMLContent. The information from the XML declaration is modelled as an XMLDecl resource and referred to from the XMLContent resource by the cnt:xmlDecl property. As the comment <!-- this is a comment --> precedes the document type declaration, a cnt:xmlLeadingMisc property is created with its object literal containing the comment. The document type declaration is modelled as a DoctypeDecl resource and referred to from the cnt:XMLContent resource by the cnt:doctypeDecl property.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cnt="http://www.w3.org/2008/content#"
    xml:base="http://example.org/example208.html">

  <cnt:DoctypeDecl rdf:ID="dtd0">
    <cnt:systemId rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</cnt:systemId>
    <cnt:publicId>-//W3C//DTD XHTML 1.0 Strict//EN</cnt:publicId>
    <cnt:doctypeName>html</cnt:doctypeName>
  </cnt:DoctypeDecl>

  <cnt:XMLDecl rdf:ID="xmld0">
    <cnt:xmlStandalone>no</cnt:xmlStandalone>
    <cnt:xmlEncoding>UTF-8</cnt:xmlEncoding>
    <cnt:xmlVersion>1.0</cnt:xmlVersion>
  </cnt:XMLDecl>

  <cnt:XMLContent rdf:about="#">
    <cnt:xmlLeadingMisc rdf:parseType="Literal"><!-- this is a comment --></cnt:xmlLeadingMisc>
    <cnt:doctypeDecl rdf:resource="#dtd0" />
    <cnt:xmlDecl rdf:resource="#xmld0" />
    <cnt:xmlRest rdf:parseType="Literal"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>The title</title>
  </head>
  <body>
    <p>Some paragraph.</p>
  </body>
</html></cnt:xmlRest>
  </cnt:XMLContent>

</rdf:RDF>

Appendix B: Schema in RDF/XML

An RDF Schema can be found at @@@TBD@@@.

Appendix C: Mapping between DOM and Content-in-RDF properties

DOM property | Content-in-RDF property
Document.xmlVersion | xmlVersion
Document.xmlEncoding | xmlEncoding
Document.xmlStandalone | xmlStandalone
Document.doctype | doctypeDecl
DocumentType.name | doctypeName
DocumentType.publicId | publicId
DocumentType.systemId | systemId
DocumentType.internalSubset | internalSubset

Appendix D: Document Changes

@@@TBD@@@ Changes from this version to WD-Content-in-RDF-20080908

Since the 23 June editorial draft, only minor editorial changes have been made. Since the 20 February 2008 editorial draft, this document has changed as follows:

Appendix E: References

[EARL]
Evaluation and Report Language (EARL) 1.0 Schema. W3C Working Draft 23 March 2007.
http://www.w3.org/TR/EARL10/
[RDF]
Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation 22 February 1999.
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[RDF-PRIMER]
RDF Primer. W3C Recommendation, 10 February 2004.
http://www.w3.org/TR/rdf-primer/
[RDFS]
RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation 10 February 2004.
http://www.w3.org/TR/rdf-schema/
[RDF-XML]
RDF/XML Syntax Specification (Revised). W3C Recommendation 10 February 2004.
http://www.w3.org/TR/rdf-syntax-grammar/
[RFC2119]
Request for Comments: 2119. Key words for use in RFCs to Indicate Requirement Levels, March 1997 (IETF).
http://www.ietf.org/rfc/rfc2119.txt
[RFC2045]
Request for Comments: 2045. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, November 1996 (IETF).
http://www.ietf.org/rfc/rfc2045.txt
[OWL]
OWL Web Ontology Language Overview. W3C Recommendation 10 February 2004.
http://www.w3.org/TR/owl-features/
[XML]
Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation 16 August 2006, edited in place 29 September 2006.
http://www.w3.org/TR/xml/