Turtle

6 Turtle Grammar

A Turtle document is a Unicode [UNICODE] character string encoded in UTF-8. Only Unicode characters in the range U+0000 to U+10FFFF inclusive are allowed.

6.5 Grammar

The RDF Working Group proposes to make the following changes to align Turtle with SPARQL.

The addition of sparqlPrefix and sparqlBase, which allow SPARQL-style BASE and PREFIX directives in a Turtle document.

Feedback, both positive and negative, is invited by sending email to the mailing list public-rdf-comments@w3.org.

The EBNF used here is defined in XML 1.0 [EBNF-NOTATION]. Production labels consisting of a number and a final 's', e.g. [60s], reference the production with that number in the SPARQL Query Language for RDF grammar [RDF-SPARQL-QUERY].

Notes:

Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
Escape sequences UCHAR and ECHAR are case-sensitive.
When tokenizing the input and choosing grammar rules, the longest match is chosen.
The Turtle grammar is LL(1) and LALR(1) when the rules with uppercased names are used as terminals.
The entry point into the grammar is turtleDoc.
In signed numbers, no white space is allowed between the sign and the number.
The [162s] ANON ::= '[' WS* ']' token allows any amount of white space and comments between []s. The single space version is used in the grammar for clarity.
The strings '@prefix' and '@base' match the pattern for LANGTAG, though neither "prefix" nor "base" is a registered language subtag. This specification does not define whether a quoted literal followed by either of these tokens (e.g. "A"@base) is in the Turtle language.
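
The case rules and the LANGTAG overlap described in the notes above can be checked with a few regular expressions. The following is a minimal, informative Python sketch and not part of the grammar; the names AT_KEYWORD, SPARQL_KEYWORD, LANGTAG_RE and the three helper functions are illustrative assumptions.

```python
import re

# '@base' and '@prefix' are single-quoted keywords: case-sensitive.
AT_KEYWORD = re.compile(r"@(?:base|prefix)")

# "BASE" and "PREFIX" are double-quoted keywords: case-insensitive.
SPARQL_KEYWORD = re.compile(r"(?:BASE|PREFIX)", re.IGNORECASE)

# [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
LANGTAG_RE = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def is_at_keyword(token: str) -> bool:
    """True if the whole token is '@base' or '@prefix' (exact case)."""
    return AT_KEYWORD.fullmatch(token) is not None


def is_sparql_keyword(token: str) -> bool:
    """True if the whole token is BASE or PREFIX in any letter case."""
    return SPARQL_KEYWORD.fullmatch(token) is not None


def is_langtag(token: str) -> bool:
    """True if the whole token matches the LANGTAG pattern."""
    return LANGTAG_RE.fullmatch(token) is not None
```

Because `is_langtag("@base")` and `is_at_keyword("@base")` both hold, a tokenizer has to decide from context which token it has seen. That is the ambiguity the last note leaves undefined for input such as "A"@base.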

	[1]	`turtleDoc`	::=	statement`*`
	[2]	`statement`	::=	directive `\|` triples '`.`'
	[3]	`directive`	::=	base `\|` prefixID `\|` sparqlBase `\|` sparqlPrefix
	[4]	`prefixID`	::=	'`@prefix`' PNAME_NS IRIREF '`.`'
	[5]	`base`	::=	'`@base`' IRIREF '`.`'
	[5s]	`sparqlBase`	::=	"`BASE`" IRIREF
	[6s]	`sparqlPrefix`	::=	"`PREFIX`" PNAME_NS IRIREF
	[6]	`triples`	::=	subject predicateObjectList `\|` blankNodePropertyList predicateObjectList?
	[7]	`predicateObjectList`	::=	verb objectList ('`;`' (verb objectList)?)`*`
	[8]	`objectList`	::=	object ('`,`' object)`*`
	[9]	`verb`	::=	predicate `\|` '`a`'
	[10]	`subject`	::=	iri `\|` BlankNode `\|` collection
	[11]	`predicate`	::=	iri
	[12]	`object`	::=	iri `\|` BlankNode `\|` collection `\|` blankNodePropertyList `\|` literal
	[13]	`literal`	::=	RDFLiteral `\|` NumericLiteral `\|` BooleanLiteral
	[14]	`blankNodePropertyList`	::=	'`[`' predicateObjectList '`]`'
	[15]	`collection`	::=	'`(`' object`*` '`)`'
	[16]	`NumericLiteral`	::=	INTEGER `\|` DECIMAL `\|` DOUBLE
	[128s]	`RDFLiteral`	::=	String (LANGTAG `\|` '`^^`' iri)?
	[133s]	`BooleanLiteral`	::=	'`true`' `\|` '`false`'
	[17]	`String`	::=	STRING_LITERAL_QUOTE `\|` STRING_LITERAL_SINGLE_QUOTE `\|` STRING_LITERAL_LONG_SINGLE_QUOTE `\|` STRING_LITERAL_LONG_QUOTE
	[135s]	`iri`	::=	IRIREF `\|` PrefixedName
	[136s]	`PrefixedName`	::=	PNAME_LN `\|` PNAME_NS
	[137s]	`BlankNode`	::=	BLANK_NODE_LABEL `\|` ANON
Productions for terminals
	[18]	`IRIREF`	::=	'`<`' ([^#x00-#x20<>\"{}\|^`\] `\|` UCHAR)`*` '`>`'
	[139s]	`PNAME_NS`	::=	PN_PREFIX? '`:`'
	[140s]	`PNAME_LN`	::=	PNAME_NS PN_LOCAL
	[141s]	`BLANK_NODE_LABEL`	::=	'`_:`' (PN_CHARS_U `\|` [`0-9`]) ((PN_CHARS `\|` '`.`')`*` PN_CHARS)?
	[144s]	`LANGTAG`	::=	'`@`' [`a-zA-Z`]`+` ('`-`' [`a-zA-Z0-9`]`+`)`*`
	[19]	`INTEGER`	::=	[`+-`]? [`0-9`]`+`
	[20]	`DECIMAL`	::=	[`+-`]? [`0-9`]`*` '`.`' [`0-9`]`+`
	[21]	`DOUBLE`	::=	[`+-`]? ([`0-9`]`+` '`.`' [`0-9`]`*` EXPONENT `\|` '`.`' [`0-9`]`+` EXPONENT `\|` [`0-9`]`+` EXPONENT)
	[154s]	`EXPONENT`	::=	[`eE`] [`+-`]? [`0-9`]`+`
	[22]	`STRING_LITERAL_QUOTE`	::=	'`"`' ([`^#x22#x5C#xA#xD`] `\|` ECHAR `\|` UCHAR)`*` '`"`'
	[23]	`STRING_LITERAL_SINGLE_QUOTE`	::=	"`'`" ([`^#x27#x5C#xA#xD`] `\|` ECHAR `\|` UCHAR)`*` "`'`"
	[24]	`STRING_LITERAL_LONG_SINGLE_QUOTE`	::=	"`'''`" (("`'`" `\|` "`''`")? [`^'\`] `\|` ECHAR `\|` UCHAR)`*` "`'''`"
	[25]	`STRING_LITERAL_LONG_QUOTE`	::=	'`"""`' (('`"`' `\|` '`""`')? [`^"\`] `\|` ECHAR `\|` UCHAR)`*` '`"""`'
	[26]	`UCHAR`	::=	'`\u`' HEX HEX HEX HEX `\|` '`\U`' HEX HEX HEX HEX HEX HEX HEX HEX
	[159s]	`ECHAR`	::=	'`\`' [`tbnrf\"'`]
	[161s]	`WS`	::=	`#x20` `\|` `#x9` `\|` `#xD` `\|` `#xA`
	[162s]	`ANON`	::=	'`[`' WS`*` '`]`'
	[163s]	`PN_CHARS_BASE`	::=	[`A-Z`] `\|` [`a-z`] `\|` [`#x00C0-#x00D6`] `\|` [`#x00D8-#x00F6`] `\|` [`#x00F8-#x02FF`] `\|` [`#x0370-#x037D`] `\|` [`#x037F-#x1FFF`] `\|` [`#x200C-#x200D`] `\|` [`#x2070-#x218F`] `\|` [`#x2C00-#x2FEF`] `\|` [`#x3001-#xD7FF`] `\|` [`#xF900-#xFDCF`] `\|` [`#xFDF0-#xFFFD`] `\|` [`#x10000-#xEFFFF`]
	[164s]	`PN_CHARS_U`	::=	PN_CHARS_BASE `\|` '`_`'
	[166s]	`PN_CHARS`	::=	PN_CHARS_U `\|` '`-`' `\|` [`0-9`] `\|` `#x00B7` `\|` [`#x0300-#x036F`] `\|` [`#x203F-#x2040`]
	[167s]	`PN_PREFIX`	::=	PN_CHARS_BASE ((PN_CHARS `\|` '`.`')`*` PN_CHARS)?
	[168s]	`PN_LOCAL`	::=	(PN_CHARS_U `\|` '`:`' `\|` [`0-9`] `\|` PLX) ((PN_CHARS `\|` '`.`' `\|` '`:`' `\|` PLX)`*` PN_CHARS `\|` '`:`' `\|` PLX)?
	[169s]	`PLX`	::=	PERCENT `\|` PN_LOCAL_ESC
	[170s]	`PERCENT`	::=	'`%`' HEX HEX
	[171s]	`HEX`	::=	[`0-9`] `\|` [`A-F`] `\|` [`a-f`]
	[172s]	`PN_LOCAL_ESC`	::=	'`\`' ('`_`' `\|` '`~`' `\|` '`.`' `\|` '`-`' `\|` '`!`' `\|` '`$`' `\|` '`&`' `\|` "`'`" `\|` '`(`' `\|` '`)`' `\|` '`*`' `\|` '`+`' `\|` '`,`' `\|` '`;`' `\|` '`=`' `\|` '`/`' `\|` '`?`' `\|` '`#`' `\|` '`@`' `\|` '`%`')
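
As an informative illustration of the numeric terminals, the sketch below transcribes INTEGER, DECIMAL, DOUBLE and EXPONENT into Python regular expressions and classifies a complete token. The function name classify_numeric and the returned datatype strings are assumptions made for this example; a real tokenizer would apply longest match over an input stream rather than test whole tokens.

```python
import re
from typing import Optional

# [154s] EXPONENT ::= [eE] [+-]? [0-9]+
EXPONENT = r"[eE][+-]?[0-9]+"

# INTEGER ::= [+-]? [0-9]+
INTEGER = re.compile(r"[+-]?[0-9]+")

# DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")

# DOUBLE ::= [+-]? ([0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT)
DOUBLE = re.compile(
    r"[+-]?(?:[0-9]+\.[0-9]*" + EXPONENT
    + r"|\.[0-9]+" + EXPONENT
    + r"|[0-9]+" + EXPONENT + r")"
)


def classify_numeric(token: str) -> Optional[str]:
    """Return the datatype for a complete numeric token, or None."""
    if DOUBLE.fullmatch(token):
        return "xsd:double"
    if DECIMAL.fullmatch(token):
        return "xsd:decimal"
    if INTEGER.fullmatch(token):
        return "xsd:integer"
    return None
```

Note that `1.` is not a DECIMAL, because at least one digit must follow the dot; in a document it tokenizes as the INTEGER `1` followed by the statement terminator. A sign separated from the digits by white space is likewise rejected.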

7 Parsing

The RDF Concepts and Abstract Syntax [RDF-CONCEPTS] specification defines three types of RDF Term: IRIs, literals and blank nodes. Literals are composed of a lexical form and an optional language tag [BCP47] or datatype IRI. An extra type, prefix, is used during parsing to map string identifiers to namespace IRIs. This section maps a string conforming to the grammar in section 6.5 Grammar to a set of triples by mapping strings matching productions and lexical tokens to RDF terms or their components (e.g. language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.

7.1 Parser State

Parsing Turtle requires a state of five items:

IRI baseURI: When the base production is reached, the second rule argument, IRIREF, is the base URI used for relative IRI resolution (test: base1 base2).
Map[prefix -> IRI] namespaces: The second and third rule arguments (PNAME_NS and IRIREF) in the prefixID production assign a namespace name (IRIREF) for the prefix (PNAME_NS). Outside of a prefixID production, any PNAME_NS is substituted with the namespace (test: prefix1 escapedNamespace1). Note that the prefix may be an empty string, per the PNAME_NS production: (PN_PREFIX)? ":" (test: default1).
Map[string -> blank node] bnodeLabels: A mapping from string to blank node.
RDF_Term curSubject: The curSubject is bound to the subject production.
RDF_Term curPredicate: The curPredicate is bound to the verb production. If the token matched was "a", curPredicate is bound to the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type (test: type).
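
The five state items listed above can be held in a small record. The following Python sketch is informative only; the class name ParserState, its method names, and the `_:bN` naming scheme for allocated blank nodes are assumptions of this example, not requirements of the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class ParserState:
    base_uri: Optional[str] = None                              # IRI baseURI
    namespaces: Dict[str, str] = field(default_factory=dict)    # prefix -> IRI
    bnode_labels: Dict[str, str] = field(default_factory=dict)  # label -> blank node
    cur_subject: Optional[str] = None                           # RDF_Term curSubject
    cur_predicate: Optional[str] = None                         # RDF_Term curPredicate

    def set_prefix(self, prefix: str, iri: str) -> None:
        """Record a namespace; the prefix may be the empty string."""
        self.namespaces[prefix] = iri

    def bnode_for(self, label: str) -> str:
        """Return the blank node for a label, allocating one on first use."""
        if label not in self.bnode_labels:
            # Hypothetical naming scheme for freshly allocated blank nodes.
            self.bnode_labels[label] = f"_:b{len(self.bnode_labels)}"
        return self.bnode_labels[label]

    def set_verb(self, verb: str) -> None:
        """Bind curPredicate; verb is the token 'a' or an already resolved IRI."""
        self.cur_predicate = RDF_TYPE if verb == "a" else verb
```

Keeping the blank node map on the state object means the same label always yields the same node within one document, which is the behaviour bnodeLabels is meant to provide.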

7.2 RDF Term Constructors

This table maps productions and lexical tokens to RDF terms or components of RDF terms listed in section 7 Parsing:

production	type	procedure
IRIREF	IRI	The characters between "<" and ">" are taken, with the numeric escape sequences unescaped, to form the unicode string of the IRI. Relative IRI resolution is performed per Section 6.3.
PNAME_NS	prefix	When used in a prefixID or sparqlPrefix production, the `prefix` is the potentially empty unicode string matching the first argument of the rule; it is a key into the namespaces map.
PNAME_NS	IRI	When used in a PrefixedName production, the `iri` is the value in the namespaces map corresponding to the first argument of the rule.
PNAME_LN	IRI	A potentially empty prefix is identified by the first sequence, `PNAME_NS`. The namespaces map MUST have a corresponding `namespace`. The unicode string of the IRI is formed by unescaping the reserved characters in the second argument, `PN_LOCAL`, and concatenating this onto the `namespace`.
STRING_LITERAL_SINGLE_QUOTE	lexical form	The characters between the outermost "'"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_QUOTE	lexical form	The characters between the outermost '"'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_LONG_SINGLE_QUOTE	lexical form	The characters between the outermost "'''"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_LONG_QUOTE	lexical form	The characters between the outermost '"""'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
LANGTAG	language tag	The characters following the `@` form the unicode string of the language tag.
RDFLiteral	literal	The literal has a lexical form of the first rule argument, `String`, and either a language tag of `LANGTAG` or a datatype IRI of `iri`, depending on which rule matched the input. If neither a language tag nor a datatype IRI is provided, the literal has a datatype of `xsd:string`.
INTEGER	literal	The literal has a lexical form of the input string, and a datatype of `xsd:integer`.
DECIMAL	literal	The literal has a lexical form of the input string, and a datatype of `xsd:decimal`.
DOUBLE	literal	The literal has a lexical form of the input string, and a datatype of `xsd:double`.
BooleanLiteral	literal	The literal has a lexical form of `true` or `false`, depending on which matched the input, and a datatype of `xsd:boolean`.
BLANK_NODE_LABEL	blank node	The string following `_:` is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated.
ANON	blank node	A blank node is generated.
blankNodePropertyList	blank node	A blank node is generated. Note the rules for `blankNodePropertyList` in the next section.
collection	blank node	For non-empty lists, a blank node is generated. Note the rules for `collection` in the next section.
collection	IRI	For empty lists, the resulting IRI is `rdf:nil`. Note the rules for `collection` in the next section.

The Escape Sequences section defines a mapping from escaped unicode strings to unicode strings. The following lexical tokens are unescaped to produce unicode strings: IRIREF, STRING_LITERAL_QUOTE, STRING_LITERAL_SINGLE_QUOTE, STRING_LITERAL_LONG_SINGLE_QUOTE and STRING_LITERAL_LONG_QUOTE.
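
Two of the constructors above, the lexical form of a string and the IRI of a PNAME_LN, can be sketched in a few lines of Python. This is an informative sketch under stated assumptions: the function names unescape_string and expand_pname_ln are hypothetical, the input is assumed to have already matched the grammar, and relative IRI resolution is not shown.

```python
import re
from typing import Dict

# ECHAR: a backslash followed by one of t b n r f " ' or a backslash.
ECHAR_MAP = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
             '"': '"', "'": "'", "\\": "\\"}

# A backslash followed by a UCHAR body (uXXXX or UXXXXXXXX) or any one character.
_ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)", re.DOTALL)

# [172s] PN_LOCAL_ESC: a backslash followed by one reserved character.
_PN_LOCAL_ESC = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")


def unescape_string(body: str) -> str:
    """Unescape numeric (UCHAR) and string (ECHAR) escapes, left to right."""
    def replace(match: "re.Match[str]") -> str:
        seq = match.group(1)
        if len(seq) > 1:                      # uXXXX or UXXXXXXXX
            return chr(int(seq[1:], 16))
        if seq not in ECHAR_MAP:
            raise ValueError("unknown escape: \\" + seq)
        return ECHAR_MAP[seq]
    return _ESCAPE.sub(replace, body)


def expand_pname_ln(pname: str, namespaces: Dict[str, str]) -> str:
    """Concatenate the namespace with the unescaped local part."""
    prefix, local = pname.split(":", 1)       # the prefix never contains ':'
    if prefix not in namespaces:
        raise KeyError("undeclared prefix: " + repr(prefix))
    return namespaces[prefix] + _PN_LOCAL_ESC.sub(r"\1", local)
```

Percent-encoded sequences in the local part (PERCENT) are left untouched by this sketch; only the backslash escapes of PN_LOCAL_ESC are removed.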

7.3 RDF Triples Constructors

A Turtle document defines an RDF graph composed of a set of RDF triples. The subject production sets the curSubject. The verb production sets the curPredicate. Each object N in the document produces an RDF triple: curSubject curPredicate N .

Property Lists:

Beginning the blankNodePropertyList production records the curSubject and curPredicate, and sets curSubject to a novel blank node B. Finishing the blankNodePropertyList production restores curSubject and curPredicate. The node produced by matching blankNodePropertyList is the blank node B.

Collections:

Beginning the collection production records the curSubject and curPredicate. Each object in the collection production has a curSubject set to a novel blank node B and a curPredicate set to rdf:first. Each object object_n after the first also produces a triple B_n-1 rdf:rest B_n . linking the blank node allocated for the previous object to the one allocated for object_n. For non-empty lists, finishing the collection production creates an additional triple curSubject rdf:rest rdf:nil . and restores curSubject and curPredicate. The node produced by matching collection is the first blank node B for non-empty lists and rdf:nil for empty lists.
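
The collection rules can be illustrated with a short, informative Python sketch that returns the node standing for the collection together with the triples it contributes. The function name collection_triples, the tuple representation of triples, and the `_:bN` blank node names are assumptions of this example; saving and restoring curSubject and curPredicate is left to the caller.

```python
from itertools import count
from typing import Iterator, List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def collection_triples(
    objects: List[str], counter: Optional[Iterator[int]] = None
) -> Tuple[str, List[Triple]]:
    """Return (node, triples) for a collection of already constructed objects."""
    if counter is None:
        counter = count(1)
    if not objects:
        # An empty collection is rdf:nil and contributes no triples.
        return RDF + "nil", []

    triples: List[Triple] = []
    head: Optional[str] = None
    previous: Optional[str] = None
    for obj in objects:
        node = f"_:b{next(counter)}"          # novel blank node for this object
        if previous is None:
            head = node
        else:
            triples.append((previous, RDF + "rest", node))
        triples.append((node, RDF + "first", obj))
        previous = node

    # Close the list after the last object.
    triples.append((previous, RDF + "rest", RDF + "nil"))
    return head, triples
```

For a two-element collection this yields four triples: one rdf:first per object, one rdf:rest linking the two list nodes, and a final rdf:rest pointing to rdf:nil.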

7.4 Parsing Example

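The following informative example shows the semantic actions performed when an LALR(1) parser processes a Turtle document that declares the prefix ericFoaf and the empty prefix, then states givenName and knows properties for ericFoaf:ericP: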

Map the prefix ericFoaf to the IRI http://www.w3.org/People/Eric/ericP-foaf.rdf#.
Map the empty prefix to the IRI http://xmlns.com/foaf/0.1/.
Assign curSubject the IRI http://www.w3.org/People/Eric/ericP-foaf.rdf#ericP.
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/givenName.
Emit an RDF triple: <...rdf#ericP> <.../givenName> "Eric" .
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/knows.
Emit an RDF triple: <...rdf#ericP> <.../knows> <...who/dan-brickley> .
Emit an RDF triple: <...rdf#ericP> <.../knows> _:1 .
Save curSubject and reassign to the blank node _:1.
Save curPredicate.
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/mbox.
Emit an RDF triple: _:1 <.../mbox> <mailto:timbl@w3.org> .
Restore curSubject and curPredicate to their saved values (<...rdf#ericP>, <.../knows>).
Emit an RDF triple: <...rdf#ericP> <.../knows> <http://getopenid.com/amyvdh> .
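
The actions listed above can be replayed step by step. The following informative Python sketch does so with plain variables and a stack for the saved state; the function name replay_example and the string representation of terms are assumptions of this example, and the IRI that the listing abbreviates as <...who/dan-brickley> is kept in that abbreviated form rather than expanded.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Kept abbreviated exactly as in the listing above.
DAN = "<...who/dan-brickley>"


def replay_example() -> List[Triple]:
    namespaces: Dict[str, str] = {}
    triples: List[Triple] = []
    saved: List[Tuple[str, str]] = []

    # Prefix declarations.
    namespaces["ericFoaf"] = "http://www.w3.org/People/Eric/ericP-foaf.rdf#"
    namespaces[""] = "http://xmlns.com/foaf/0.1/"

    # Subject and first predicate.
    cur_subject = "<" + namespaces["ericFoaf"] + "ericP>"
    cur_predicate = "<" + namespaces[""] + "givenName>"
    triples.append((cur_subject, cur_predicate, '"Eric"'))

    # Second predicate with three objects.
    cur_predicate = "<" + namespaces[""] + "knows>"
    triples.append((cur_subject, cur_predicate, DAN))

    # Blank node property list: emit, save state, descend.
    bnode = "_:1"
    triples.append((cur_subject, cur_predicate, bnode))
    saved.append((cur_subject, cur_predicate))
    cur_subject = bnode
    cur_predicate = "<" + namespaces[""] + "mbox>"
    triples.append((cur_subject, cur_predicate, "<mailto:timbl@w3.org>"))

    # Restore the saved subject and predicate, then emit the last object.
    cur_subject, cur_predicate = saved.pop()
    triples.append((cur_subject, cur_predicate, "<http://getopenid.com/amyvdh>"))
    return triples
```

Calling `replay_example()` returns the five triples in the order they are emitted in the listing. A stack is used for the saved state so that nested blank node property lists would restore correctly as well.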