W3C

Refactoring RDF/XML Syntax

W3C Working Draft 06 September 2001

This version:
http://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20010906/
Latest version:
http://www.w3.org/TR/rdf-syntax-grammar/
Previous version:
None.
Editor:
Dave Beckett (University of Bristol)

Abstract

This RDF Core WG Working Draft describes updates to the grammar for the XML syntax of the RDF model, as originally given in RDF Model & Syntax, following amendments and clarifications made by the RDF Core WG.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is a W3C Working Draft of the RDF Core Working Group, produced as part of the W3C Semantic Web Activity. It incorporates decisions made by the Working Group that update the XML syntax for RDF given in the original RDF Model & Syntax document.

This document is being released for review by W3C members and other interested parties to encourage feedback and comments, especially on how the changes affect existing implementations and how the grammar can be formalized with schema languages. It reflects the current state of ongoing work on the syntax: it does not yet record all the related decisions, nor does it include the descriptive text from the grammar section of the original document.

This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use it as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.

Table of contents

1. Introduction
2. Original Grammar
3. Updated Grammar after RDF Core decisions
  3.1. Updated Grammar
  3.2. Grammar Changes by Issue
4. Grammar Using XML Infoset Terms
  4.1. Infoset Notation
  4.2 - 4.22 Infoset Grammar
  4.23 Transformation Notes
5. Infoset Conformance
Appendix A: References


1. Introduction

RDF Model & Syntax used an EBNF form plus explanatory text to describe the XML syntax. Subsequent implementations of this syntax, and comparison of the RDF models they produced, have shown that the specification was ambiguous: implementations generated different models, and certain syntax forms were not widely implemented. These issues were generally raised either as feedback to www-rdf-comments@w3.org (archive) or in discussions on the RDF Interest Group list www-rdf-interest@w3.org (archive).

The RDF Core Working Group is chartered to respond to the need for a number of fixes, clarifications and improvements to the specification of RDF's abstract model and XML syntax. The working group invites feedback from the developer community on the effects of its proposals on existing implementations and documents.

Several decisions, including amendments to and deletions from the grammar, are referred to below. The definitive record of the decisions is the RDF Core WG issues list.

This document records the process of updating the existing grammar, showing the changes step by step. The original grammar uses EBNF in terms of characters such as '<'. It was then transformed into a representation in terms of XML Information Set items, which abstracts away low-level details such as the particular form used for an empty element. This allows the grammar to be recorded more precisely and the mapping from the XML syntax to the RDF model to be shown more clearly.
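To illustrate the kind of low-level detail the Infoset view removes, the following Python sketch parses three spellings of the same element with the standard library's xml.etree.ElementTree module and compares what remains. ElementTree is used here only as a convenient stand-in: it approximates, but does not implement, the XML Infoset (for example, it does not report comments or namespace declarations). The namespace http://example.org/ns# and the helper name summarize are hypothetical.

```python
import xml.etree.ElementTree as ET

def summarize(elem):
    """Reduce an element to (qualified name, attributes, text, child summaries)."""
    return (elem.tag, dict(elem.attrib), elem.text or "",
            [summarize(child) for child in elem])

# Three character-level spellings: a self-closing tag, a start/end tag pair,
# and single-quoted attributes with a space before "/>".
forms = [
    '<ex:a xmlns:ex="http://example.org/ns#" ex:b="1"/>',
    '<ex:a xmlns:ex="http://example.org/ns#" ex:b="1"></ex:a>',
    "<ex:a xmlns:ex='http://example.org/ns#' ex:b='1' />",
]
views = [summarize(ET.fromstring(form)) for form in forms]

print(all(view == views[0] for view in views))  # True
print(views[0])
```

All three reduce to the same namespace-qualified element name, the same attribute and no content, which is the level at which the Infoset grammar in Section 4 is written.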

This process is not yet complete: the final step is to define, for each syntax production, which RDF statements (if any) are added to the resulting model. This mapping from the XML syntax to the RDF model must be specified more precisely than before, preferably in a machine-checkable language. Doing so requires formalizing the grammar with one or more technologies such as XML Schema (Primer, Structures, Datatypes), RELAX, TREX, RELAX NG and Schematron (the list is not exhaustive).

At present we are evaluating which of these technologies are sufficient and appropriate for this formalization but would appreciate feedback on this approach and suggestions for other formalisms that could be used.

2. Original Grammar

This section contains the EBNF grammar of the RDF/XML syntax from the Formal Grammar for RDF section of RDF Model & Syntax. The only changes made here were to make it legal XHTML (using tidy) and to change the links to the productions to point to those in the original document.

  [6.1] RDF            ::= ['<rdf:RDF>'] obj* ['</rdf:RDF>']
  [6.2] obj            ::= description | container
  [6.3] description    ::= '<rdf:Description' idAboutAttr? bagIdAttr? propAttr* '/>'
                         | '<rdf:Description' idAboutAttr? bagIdAttr? propAttr* '>'
                                propertyElt* '</rdf:Description>'
                         | typedNode
  [6.4] container      ::= sequence | bag | alternative
  [6.5] idAboutAttr    ::= idAttr | aboutAttr | aboutEachAttr
  [6.6] idAttr         ::= ' ID="' IDsymbol '"'
  [6.7] aboutAttr      ::= ' about="' URI-reference '"'
  [6.8] aboutEachAttr  ::= ' aboutEach="' URI-reference '"'
                         | ' aboutEachPrefix="' string '"'
  [6.9] bagIdAttr      ::= ' bagID="' IDsymbol '"'
 [6.10] propAttr       ::= typeAttr
                         | propName '="' string '"' (with embedded quotes escaped)
 [6.11] typeAttr       ::= ' type="' URI-reference '"'
 [6.12] propertyElt    ::= '<' propName idAttr? '>' value '</' propName '>'
                         | '<' propName idAttr? parseLiteral '>'
                               literal '</' propName '>'
                         | '<' propName idAttr? parseResource '>'
                               propertyElt* '</' propName '>'
                         | '<' propName idRefAttr? bagIdAttr? propAttr* '/>'
 [6.13] typedNode      ::= '<' typeName idAboutAttr? bagIdAttr? propAttr* '/>'
                         | '<' typeName idAboutAttr? bagIdAttr? propAttr* '>'
                               propertyElt* '</' typeName '>'
 [6.14] propName       ::= Qname
 [6.15] typeName       ::= Qname
 [6.16] idRefAttr      ::= idAttr | resourceAttr
 [6.17] value          ::= obj | string
 [6.18] resourceAttr   ::= ' resource="' URI-reference '"'
 [6.19] Qname          ::= [ NSprefix ':' ] name
 [6.20] URI-reference  ::= string, interpreted per [URI]
 [6.21] IDsymbol       ::= (any legal XML name symbol)
 [6.22] name           ::= (any legal XML name symbol)
 [6.23] NSprefix       ::= (any legal XML namespace prefix)
 [6.24] string         ::= (any XML text, with "<", ">", and "&" escaped)
 [6.25] sequence       ::= '<rdf:Seq' idAttr? '>' member* '</rdf:Seq>'
                         | '<rdf:Seq' idAttr? memberAttr* '/>'
 [6.26] bag            ::= '<rdf:Bag' idAttr? '>' member* '</rdf:Bag>'
                         | '<rdf:Bag' idAttr? memberAttr* '/>'
 [6.27] alternative    ::= '<rdf:Alt' idAttr? '>' member+ '</rdf:Alt>'
                         | '<rdf:Alt' idAttr? memberAttr? '/>'
 [6.28] member         ::= referencedItem | inlineItem
 [6.29] referencedItem ::= '<rdf:li' resourceAttr '/>'
 [6.30] inlineItem     ::= '<rdf:li' '>' value </rdf:li>'
                         | '<rdf:li' parseLiteral '>' literal </rdf:li>'
                         | '<rdf:li' parseResource '>' propertyElt* </rdf:li>'
 [6.31] memberAttr     ::= ' rdf:_n="' string '"' (where n is an integer)
 [6.32] parseLiteral   ::= ' parseType="Literal"'
 [6.33] parseResource  ::= ' parseType="Resource"'
 [6.34] literal        ::= (any well-formed XML)

(Note: production 6.30 contains EBNF errors: the closing </rdf:li> tags are not fully quoted and should read '</rdf:li>'.)

3. Updated Grammar after RDF Core decisions

This section updates the original grammar in Section 2 by amending and deleting productions according to the recorded RDF Core WG decisions. Some further productions are then removed because they are no longer needed once those changes are made.

3.1. Updated Grammar

Key:
{+This text has been added.+}
{-This text has been deleted.-}

Updated RDF/XML grammar productions

  [6.1] RDF            ::= "<rdf:RDF>" {-obj-}{+description+}* "</rdf:RDF>"
                         | description
{-[6.2] obj            ::= description | container-}
  [6.3] description    ::= "<rdf:Description" idAboutAttr? bagIdAttr? propAttr* "/>"
                         | "<rdf:Description" idAboutAttr? bagIdAttr? propAttr* ">"
                               propertyElt* "</rdf:Description>"
                         | typedNode
{-[6.4] container      ::= sequence | bag | alternative-}
  [6.5] idAboutAttr    ::= idAttr | aboutAttr | aboutEachAttr
  [6.6] idAttr         ::= " {+rdf:+}ID=\"" IDsymbol "\""
  [6.7] aboutAttr      ::= " {+rdf:+}about=\"" URI-reference "\""
  [6.8] aboutEachAttr  ::= " {+rdf:+}aboutEach=\"" URI-reference "\""
                         {-| " aboutEachPrefix=\"" string "\""-}
  [6.9] bagIdAttr      ::= " {+rdf:+}bagID=\"" IDsymbol "\""
 [6.10] propAttr       ::= typeAttr
                         | propName "=\"" string "\"" (with embedded quotes escaped)
 [6.11] typeAttr       ::= " {+rdf:+}type=\"" URI-reference "\""
 [6.12] propertyElt    ::= "<" propName idAttr? ">" value "</" propName ">"
                         | "<" propName idAttr? parseLiteral ">"
                               literal "</" propName ">"
                         | "<" propName idAttr? parseResource ">"
                               propertyElt* "</" propName ">"
                         | "<" propName idRefAttr? bagIdAttr? propAttr* "/>"
 [6.13] typedNode      ::= "<" typeName idAboutAttr? bagIdAttr? propAttr* "/>"
                         | "<" typeName idAboutAttr? bagIdAttr? propAttr* ">"
                               propertyElt* "</" typeName ">"
 [6.14] propName       ::= Qname
 [6.15] typeName       ::= Qname
 [6.16] idRefAttr      ::= idAttr | resourceAttr
 [6.17] value          ::= {-obj-}{+description+} | string
 [6.18] resourceAttr   ::= " {+rdf:+}resource=\"" URI-reference "\""
 [6.19] Qname          ::= [ NSprefix ":" ] name
 [6.20] URI-reference  ::= string, interpreted per [URI]
 [6.21] IDsymbol       ::= (any legal XML name symbol)
 [6.22] name           ::= (any legal XML name symbol)
 [6.23] NSprefix       ::= (any legal XML namespace prefix)
 [6.24] string         ::= (any XML text, with "<", ">", and "&" escaped)
{-[6.25] sequence       ::= "<rdf:Seq" idAttr? ">" member* "</rdf:Seq>"
                         | "<rdf:Seq" idAttr? memberAttr* "/>"-}
{-[6.26] bag            ::= "<rdf:Bag" idAttr? ">" member* "</rdf:Bag>"
                         | "<rdf:Bag" idAttr? memberAttr* "/>"-}
{-[6.27] alternative    ::= "<rdf:Alt" idAttr? ">" member+ "</rdf:Alt>"
                         | "<rdf:Alt" idAttr? memberAttr? "/>"-}
{-[6.28] member         ::= referencedItem | inlineItem-}
{-[6.29] referencedItem ::= "<rdf:li" resourceAttr "/>"-}
{-[6.30] inlineItem     ::= "<rdf:li" ">" value </rdf:li>"
                         | "<rdf:li" parseLiteral ">" literal </rdf:li>"
                         | "<rdf:li" parseResource ">" propertyElt* </rdf:li>"-}
{-[6.31] memberAttr     ::= " rdf:_n=\"" string "\"" (where n is an integer)-}
 [6.32] parseLiteral   ::= " {+rdf:+}parseType=\"Literal\""
 [6.33] parseResource  ::= " {+rdf:+}parseType=\"Resource\""
 [6.34] literal        ::= (any well-formed XML)

3.2. Grammar Changes by Issue

The decided issues that changed the grammar are recorded here, but this is not the definitive list or description; see the RDF Core WG issues list. Other decided issues did not affect the EBNF grammar but do affect the syntax by amending the descriptive text accompanying the original grammar; these are not recorded here at this time. Decided issues may also have associated test cases, which can be found in the RDF Test Cases document (work in progress at this date).

Changes to the RDF/XML grammar listed by RDF Core WG Issue

Productions: 6.6, 6.7, 6.8, 6.9, 6.11, 6.18, 6.32, 6.33
Issue: rdf-ns-prefix-confusion
Description: On 25th May 2001, the WG decided that ALL attributes must be namespace qualified. There is a description of the decision, including detail on the grammar productions affected, and a collection of test cases.

Productions: 6.8
Issue: rdfms-abouteachprefix
Description: On 1st June 2001, the WG decided that aboutEachPrefix would be removed from the RDF Model and Syntax Recommendation on the grounds that there is a lack of implementation experience, and that it therefore should not be in the Recommendation. A future version of RDF may consider support for this feature.

Productions: 6.25, 6.26, 6.27, 6.28, 6.29, 6.30, 6.31
Issues: rdf-containers-syntax-ambiguity, rdf-containers-syntax-vs-schema
Description: On 29th June 2001, the WG decided that containers will match the typed node production in the grammar (production 6.13) and that the container-specific productions (productions 6.25 to 6.31), and any references to them, be removed from the grammar. rdf:li elements will be translated to rdf:_nnn elements when they are found matching either a propertyElt (production 6.12) or a typedNode (production 6.13). The decision includes a set of test cases.

Productions: 6.4 (definition)
Issue: -
Description: container is no longer needed once all its sub-productions (6.25, 6.26 and 6.27) are removed.

Productions: 6.2 (definition), 6.1 (use)
Issue: -
Description: obj is no longer needed once the container production is removed.
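The rdf:li translation described in the containers decision can be pictured as a tree rewrite applied before the grammar is matched. The Python sketch below assumes ElementTree as the tree model and assumes that numbering starts at 1 for each parent element and counts only rdf:li elements. It renames every rdf:li element it meets rather than checking which production the element matched, so it illustrates the renaming rather than restating the decision. The function name expand_li and the example URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LI = "{%s}li" % RDF_NS

def expand_li(elem):
    """Rename rdf:li children to rdf:_1, rdf:_2, ... in document order, recursively."""
    counter = 0  # assumption: the count restarts for every parent element
    for child in elem:
        if child.tag == LI:
            counter += 1
            child.tag = "{%s}_%d" % (RDF_NS, counter)
        expand_li(child)
    return elem

doc = ET.fromstring(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Bag rdf:about="http://example.org/bag">'
    '<rdf:li rdf:resource="http://example.org/a"/>'
    '<rdf:li rdf:resource="http://example.org/b"/>'
    '</rdf:Bag></rdf:RDF>')
expand_li(doc)
bag = doc[0]
print([child.tag.rsplit("}", 1)[1] for child in bag])  # ['_1', '_2']
```

After the rewrite, the rdf:Bag element is an ordinary typedNode and its former rdf:li children are ordinary propertyElt elements carrying an rdf:resource attribute.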

4. Grammar Using XML Infoset Terms

This section takes the updated EBNF grammar in Section 3.1 and removes the low-level XML syntax characters, replacing them with descriptions in terms of XML Infoset information items.

4.1. Infoset Notation

The following notation is used for XML Infoset information items and EBNF.

Notation for XML Infoset information items and EBNF.

[property]=value
    An XML Infoset information item property and its value.
element([prop1]=value1, [prop2]=value2, ...)
    An XML Infoset Element Information Item with the given properties.
attribute([prop1]=value1, [prop2]=value2, ...)
    An XML Infoset Attribute Information Item with the given properties.
character()
    An XML Infoset Character Information Item with any allowed character code.
list(item1, item2, ...); list()
    An ordered list of items in document order; an empty list.
set(item1, item2, ...); set()
    An unordered set of items; an empty set.
*
    Zero or more of the preceding term.
?
    Zero or one of the preceding term.
+
    One or more of the preceding term.
A | B | ...
    The terms A, B, ... are alternatives, with left-to-right priority: for example, if term A matches, it is chosen even if term B also matches.
"ABC"
    A literal string value of a property, such as [local name] or [normalized value].
any
    Any legal property value.

4.2 Production RDF (was 6.1 RDF)

element([namespace name]=rdf-ns,
    [local name]="RDF",
    [children]=list(node*),
    [attributes]=set())
| node
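The two alternatives of production RDF can be sketched as follows, again assuming Python's ElementTree as the tree model. The production writes the [children] of rdf:RDF as list(node*) and so does not mention the whitespace that usually separates nodes; this sketch assumes that whitespace-only text is ignored and rejects any other character content. The function name top_level_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def top_level_nodes(root):
    """Return the node elements described by production RDF."""
    if root.tag == "{%s}RDF" % RDF_NS:
        # First alternative: [attributes]=set(), [children]=list(node*).
        # ElementTree omits namespace declarations, so attrib holds only real attributes.
        if root.attrib:
            raise ValueError("rdf:RDF must not carry attributes")
        texts = [root.text] + [child.tail for child in root]
        if any(text and text.strip() for text in texts):
            raise ValueError("character content between nodes")
        return list(root)
    # Second alternative: the document element is itself a single node.
    return [root]

wrapped = ET.fromstring(
    '<rdf:RDF xmlns:rdf="%s"> <rdf:Description rdf:about="http://example.org/x"/> </rdf:RDF>'
    % RDF_NS)
bare = ET.fromstring(
    '<rdf:Description xmlns:rdf="%s" rdf:about="http://example.org/x"/>' % RDF_NS)
print(len(top_level_nodes(wrapped)), len(top_level_nodes(bare)))  # 1 1
```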

4.3 Production node (new)

description | typedNode

4.4 Production description (was 6.3 description)

element([namespace name]=rdf-ns,
    [local name]="Description",
    [attributes]=set(idAboutAttr?, bagIdAttr?, propertyAttr*),
    [children]=list())
| element([namespace name]=rdf-ns,
    [local name]="Description",
    [attributes]=set(idAboutAttr?, bagIdAttr?, propertyAttr*),
    [children]=list(propertyElt+))

4.5 Production typedNode (was 6.13 typedNode)

element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAboutAttr?, bagIdAttr?, propertyAttr*),
    [children]=list())
| element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAboutAttr?, bagIdAttr?, propertyAttr*),
    [children]=list(propertyElt+))
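Because node is defined as description | typedNode with left-to-right priority, and typedNode accepts any [namespace name] and [local name], an rdf:Description element is always a description and every other element, including rdf-ns elements such as rdf:Bag, is a typedNode. The Python sketch below, which assumes ElementTree and a hypothetical classify_node function, reports the production and which of its two alternatives applies. It does not check attributes, and it ignores whitespace-only text when choosing between list() and list(propertyElt+).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def classify_node(elem):
    """Return (production, alternative) for a node element."""
    # description is tried first, so rdf:Description never falls through to typedNode.
    if elem.tag == "{%s}Description" % RDF_NS:
        production = "description"
    else:
        production = "typedNode"
    # Assumption: only element children count; whitespace-only text is ignored.
    alternative = "propertyElt+" if len(elem) else "empty"
    return production, alternative

doc = ET.fromstring(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:ex="http://example.org/ns#">'
    '<rdf:Description rdf:about="http://example.org/a"/>'
    '<ex:Person rdf:about="http://example.org/b"><ex:name>Ann</ex:name></ex:Person>'
    '<rdf:Bag/>'
    '</rdf:RDF>')
print([classify_node(n) for n in doc])
# [('description', 'empty'), ('typedNode', 'propertyElt+'), ('typedNode', 'empty')]
```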

4.6 Production propertyElt (was 6.12 propertyElt)

element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAttr?),
    [children]=list(node))
| element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAttr?),
    [children]=list(character()+))
| element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAttr?),
    [children]=list())
| element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAttr?, parseLiteral),
    [children]=list(literal))
| element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAttr?, parseLiteral),
    [children]=list())
| element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAttr?, parseResource),
    [children]=list(propertyElt*))
| element([namespace name]=any,
    [local name]=any,
    [attributes]=set(idAttr?, parseOther),
    [children]=any)
| element([namespace name]=any,
    [local name]=any,
    [attributes]=set((idAttr | resourceAttr)?, bagIdAttr?, propertyAttr*),
    [children]=list())
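The eight propertyElt alternatives are tried in order, so a sketch of the matcher is mostly a question of getting that order right. The Python function below assumes ElementTree as the tree model, returns hypothetical labels for the alternatives, ignores whitespace-only text around a single child node, and does not recurse into parseResource content. Because propAttr accepts any attribute, the last alternative as written accepts any set of attributes on an empty element, and the sketch follows the grammar literally in that respect.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def rdf(local):
    """ElementTree's '{namespace}local' form of a name in rdf-ns."""
    return "{%s}%s" % (RDF_NS, local)

def classify_property_elt(elem):
    """Return a label for the first propertyElt alternative that matches."""
    # idAttr? is optional in every alternative, so set rdf:ID aside first.
    attrs = {k: v for k, v in elem.attrib.items() if k != rdf("ID")}
    children = list(elem)
    text = elem.text or ""
    # Assumption: whitespace-only text around child elements is not significant.
    only_whitespace = not text.strip() and all(not (c.tail or "").strip() for c in children)

    if not attrs:
        if len(children) == 1 and only_whitespace:
            return "node"             # [children]=list(node)
        if not children and text:
            return "string"           # [children]=list(character()+)
        if not children:
            return "emptyString"      # [children]=list()
    parse_type = attrs.get(rdf("parseType"))
    if parse_type is not None and len(attrs) == 1:
        if parse_type == "Literal":
            # list(literal) when there is any content, list() otherwise
            return "literal" if (children or text) else "emptyLiteral"
        if parse_type == "Resource":
            return "parseResource"    # child propertyElt items are not checked here
        return "parseOther"           # [children]=any
    if not children and not text:
        # (idAttr | resourceAttr)?, bagIdAttr?, propertyAttr* with [children]=list()
        return "emptyWithAttributes"
    raise ValueError("no propertyElt alternative matches")

NS = 'xmlns:rdf="%s" xmlns:ex="http://example.org/ns#"' % RDF_NS
samples = [
    '<ex:p %s>hello</ex:p>',
    '<ex:p %s/>',
    '<ex:p %s><ex:Thing/></ex:p>',
    '<ex:p %s rdf:parseType="Literal"><b>bold</b></ex:p>',
    '<ex:p %s rdf:parseType="Resource"><ex:q>v</ex:q></ex:p>',
    '<ex:p %s rdf:parseType="Other"/>',
    '<ex:p %s rdf:resource="http://example.org/r"/>',
]
for sample in samples:
    print(classify_property_elt(ET.fromstring(sample % NS)))
```

The order of the tests mirrors the order of the alternatives: for instance, an element with no attributes and no content is reported as the empty-string alternative even though the final alternative would also accept it.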

4.7 Production idAboutAttr (was 6.5 idAboutAttr)

idAttr | aboutAttr | aboutEachAttr

4.8 Production idAttr (was 6.6 idAttr)

attribute([namespace name]=rdf-ns,
    [local name]="ID",
    [normalized value]=rdf-id)

4.9 Production aboutAttr (was 6.7 aboutAttr)

attribute([namespace name]=rdf-ns,
    [local name]="about",
    [normalized value]=URI-reference)

4.10 Production aboutEachAttr (was 6.8 aboutEachAttr)

attribute([namespace name]=rdf-ns,
    [local name]="aboutEach",
    [normalized value]=URI-reference)

4.11 Production bagIdAttr (was 6.9 bagIdAttr)

attribute([namespace name]=rdf-ns,
    [local name]="bagID",
    [normalized value]=rdf-id)

4.12 Production propertyAttr (new)

typeAttr | propAttr

4.13 Production propAttr (was 6.10 propAttr)

attribute([namespace name]=any,
    [local name]=any,
    [normalized value]=CDATA)

4.14 Production typeAttr (was 6.11 typeAttr)

attribute([namespace name]=rdf-ns,
    [local name]="type",
    [normalized value]=URI-reference)

4.15 Production resourceAttr (was 6.18 resourceAttr)

attribute([namespace name]=rdf-ns,
    [local name]="resource",
    [normalized value]=URI-reference)

4.16 Production parseLiteral (was 6.32 parseLiteral)

attribute([namespace name]=rdf-ns,
    [local name]="parseType",
    [normalized value]="Literal")

4.17 Production parseResource (was 6.33 parseResource)

attribute([namespace name]=rdf-ns,
    [local name]="parseType",
    [normalized value]="Resource")

4.18 Production parseOther (new)

attribute([namespace name]=rdf-ns,
    [local name]="parseType",
    [normalized value]=CDATA)
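Read together with the left-to-right priority of A | B, the attribute productions above amount to a lookup that tries the rdf-ns productions first and falls back to propAttr. The Python sketch below shows that lookup for a single attribute, given its [namespace name], [local name] and [normalized value]. Which productions are actually permitted still depends on the enclosing element production, which the sketch does not check. The names classify_attribute and split_clark are hypothetical, and ElementTree is assumed only for the usage example. Under the grammar as written, an unqualified about attribute does not match aboutAttr and so falls through to propAttr.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# (production, [local name] in rdf-ns, required [normalized value] or None),
# listed in the order they are tried; propAttr is the fallback.
RDF_ATTRIBUTE_PRODUCTIONS = [
    ("idAttr", "ID", None),
    ("aboutAttr", "about", None),
    ("aboutEachAttr", "aboutEach", None),
    ("bagIdAttr", "bagID", None),
    ("typeAttr", "type", None),
    ("resourceAttr", "resource", None),
    ("parseLiteral", "parseType", "Literal"),
    ("parseResource", "parseType", "Resource"),
    ("parseOther", "parseType", None),
]

def classify_attribute(namespace, local_name, value):
    """Name the first attribute production that matches."""
    for production, name, required in RDF_ATTRIBUTE_PRODUCTIONS:
        if namespace == RDF_NS and local_name == name and required in (None, value):
            return production
    # propAttr: [namespace name]=any, [local name]=any
    return "propAttr"

def split_clark(name):
    """Split ElementTree's '{namespace}local' key; unqualified names get namespace None."""
    if name.startswith("{"):
        namespace, local = name[1:].split("}", 1)
        return namespace, local
    return None, name

elem = ET.fromstring(
    '<ex:Doc xmlns:ex="http://example.org/ns#" xmlns:rdf="%s"'
    ' rdf:about="http://example.org/d" rdf:type="http://example.org/T"'
    ' ex:title="Draft" about="x"/>' % RDF_NS)
for key, value in elem.attrib.items():
    print(key, "->", classify_attribute(*split_clark(key), value))
```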

4.19 Production URI-reference (was 6.20 URI-reference)

CDATA interpreted as a URI reference, as defined by the URI-reference production of the RFC 2396 BNF.

ISSUE: is this the best way to specify this?

4.20 Production literal (was 6.34 literal)

Any non-empty well-formed XML.

ISSUE: This is not precise enough. What to do here? Need to link to Infoset terms including character().

4.21 Production rdf-ns (new)

The URI http://www.w3.org/1999/02/22-rdf-syntax-ns#

4.22 Production rdf-id (new)

CDATA matching the XML 1.0 Nmtoken (name token) production.

ISSUE: Should this be changed from any legal XML Nmtoken to be the same as that for XML IDs? In XML 1.0 (Second Edition) XML IDs must match Validity constraint: ID which requires the identifiers to match the Name production - a more restricted identifier than Nmtoken.
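The difference raised in the issue above is that Nmtoken allows a token to begin with any name character, whereas Name requires a letter, '_' or ':' first. The Python sketch below shows the difference using regular expressions that cover only the ASCII subset of the XML 1.0 Letter and NameChar classes; this restriction is an assumption made for brevity, and the real productions admit many more characters. The names is_name and is_nmtoken are hypothetical.

```python
import re

# ASCII-only approximations of XML 1.0 productions [5] Name and [7] Nmtoken.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def is_name(value):
    return NAME.fullmatch(value) is not None

def is_nmtoken(value):
    return NMTOKEN.fullmatch(value) is not None

for candidate in ["item1", "123", "-x", "a b"]:
    print(candidate, is_name(candidate), is_nmtoken(candidate))
```

Under this sketch, 123 and -x are legal Nmtokens but not legal Names, so they would be acceptable rdf:ID values under production rdf-id as currently written but not under the XML ID rules.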

4.23 Transformation Notes

  1. 6.17 value was removed and merged into its single remaining use, in propertyElt, as description | string.

  2. 6.24 string was removed and merged into its single remaining use, in propertyElt, as two expanded terms: list(character()+) and list().

  3. 6.34 literal was modified to be non-empty XML (although this is not yet precise), and the empty/non-empty distinction was preserved at its single use, in propertyElt, by expanding it to list(literal) and list().

  4. In 6.13 typedNode and 6.3 description, propertyElt* was replaced with propertyElt+ so that there is no ambiguity about which alternative applies when an element contains no propertyElt.

  5. The production numbers were removed and the productions reordered approximately into element, attribute and terminal-term order.

  6. node was added to distinguish description from typedNode; it replaces the former uses of description.

  7. propertyAttr was added to make the alternatives typeAttr and propAttr explicit; it replaces the former uses of propAttr.

  8. idRefAttr was removed and expanded inline at its single use, in propertyElt.

  9. A | B was redefined to have left-to-right priority so that, for example, typeAttr matches before propAttr.

  10. parseOther was added to provide a place to define later what happens with parseType values other than Literal and Resource.

5. Infoset Conformance

This specification requires an information set as defined in XML Infoset which supports at least the following information items and properties:

Attribute Information Item
[local name], [namespace name], [normalized value]
Character Information Item
[character code]
Element Information Item
[local name], [namespace name], [children], [attributes]

This specification does not require any destructive alterations to the input information set; no items are added or removed.
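As a rough check of how small this required subset is, the Python sketch below builds dictionaries holding exactly these properties from a parsed document. It assumes ElementTree as the parser and treats its attribute values as [normalized value], which is an approximation; [attributes] is represented as a list even though the Infoset property is an unordered set. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def split_name(clark):
    """Split ElementTree's '{namespace}local' form into ([namespace name], [local name])."""
    if clark.startswith("{"):
        namespace, local = clark[1:].split("}", 1)
        return namespace, local
    return None, clark

def character_items(text):
    """One Character Information Item per character, keeping only [character code]."""
    return [{"character code": ord(ch)} for ch in (text or "")]

def element_item(elem):
    """Element Information Item with [namespace name], [local name], [attributes], [children]."""
    namespace, local = split_name(elem.tag)
    attributes = []
    for name, value in elem.attrib.items():
        attr_namespace, attr_local = split_name(name)
        attributes.append({"namespace name": attr_namespace,
                           "local name": attr_local,
                           "normalized value": value})
    # [children] interleaves character items and child elements in document order.
    children = character_items(elem.text)
    for child in elem:
        children.append(element_item(child))
        children.extend(character_items(child.tail))
    return {"namespace name": namespace, "local name": local,
            "attributes": attributes, "children": children}

item = element_item(ET.fromstring(
    '<ex:p xmlns:ex="http://example.org/ns#" ex:lang="en">hi</ex:p>'))
print(item["namespace name"], item["local name"])
print(item["attributes"])
print([c["character code"] for c in item["children"]])  # [104, 105]
```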

This section is intended to satisfy the requirements for Conformance to the XML Infoset specification.

Appendix A: References

Normative References

RDF Model & Syntax
World Wide Web Consortium. Resource Description Framework (RDF) Model and Syntax Specification, 22 February 1999.
XML 1.0 Recommendation (Second Edition)
World Wide Web Consortium. Extensible Markup Language (XML) 1.0, Second Edition.
Namespaces in XML
World Wide Web Consortium. Namespaces in XML.
XML Information Set
World Wide Web Consortium, XML Information Set - W3C Proposed Recommendation, 10 August 2001.
RFC 2396 - URIs
T. Berners-Lee, R. Fielding and L. Masinter, RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax, August 1998.
XML Schema Part 0: Primer
World Wide Web Consortium, XML Schema Part 0: Primer - W3C Recommendation, 2 May 2001.
XML Schema Part 1: Structures
World Wide Web Consortium, XML Schema Part 1: Structures - W3C Recommendation, 2 May 2001.
XML Schema Part 2: Datatypes
World Wide Web Consortium, XML Schema Part 2: Datatypes - W3C Recommendation, 2 May 2001.
RELAX NG
RELAX NG Specification, James Clark and MURATA Makoto, editors, OASIS, 11 August 2001.
RDF Test Cases
RDF Test Cases, RDF Core WG Internal Draft, work in progress.
TREX
TREX - Tree Regular Expressions for XML, James Clark, Thai Open Source Software Center, 2001.
RELAX
RELAX (Regular Language description for XML), MURATA Makoto, INSTAC (Information Technology Research and Standardization Center), 2001.
Schematron
Schematron, Rick Jelliffe, Academia Sinica Computing Centre, Taibei.

Informational References

Other ways to express the existing grammar, new syntaxes and grammars and other new ideas.

N-Triples
World Wide Web Consortium. N-Triples, RDF Core Work Group Internal Working Draft.
RDF Data Model strawman
Resource Description Framework: Data Model Summary, RDF Interest Group Discussion Document, Dan Brickley.
Formal Grammar for RDF 1.0
Forest grammar/tree regular expression for RDF 1.0, Proposal and RELAXNG schema by Jonathan Borden, Open Healthcare Group, 20 June 2001, announced on the RDF Interest list.
Basic Semantic Web Language
Basic Semantic Web Language, Proposal by Sean B. Palmer, 16 July 2001.
Reforming RDF
A modest proposal for reforming RDF Version 0.1, Proposal by Drew McDermott, 13 Dec 2000, announced on www-rdf-logic.
RDF Abstract Syntax
Formal Grammar for RDF 1.0, Proposal by Jonathan Borden, Pat Hayes and Drew McDermott, 26 June 2001.
Proposal for clarification of RDF
Proposal for clarification of RDF by Rick Jelliffe, 20 June 2001.
Blindfold
Blindfold Grammar System - allows the defining of annotated grammars for XML and non-XML documents that can extract RDF statements, Sandro Hawke, 30 August 2001.
ARP
ARP: Another RDF Parser - a parser written in Java by Jeremy Carroll, 27 July 2001.
Meta-BNF
A meta-grammar for describing XML-based formats by Bert Bos, 8 Feb 1999.
RDFS for XML Infoset
An RDF Schema for the XML Information Set, W3C Note, Richard Tobin, 6 April 2001.
RXP XML Parser
RXP XML Parser which can emit the XML infoset as XML suitable for transforming, Richard Tobin.