6 Turtle Grammar

A Turtle document is a Unicode [UNICODE] character string encoded in UTF-8. Only Unicode characters in the range U+0000 to U+10FFFF inclusive are allowed.
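As a minimal sketch, assume a Python reader that receives the document as raw bytes. Strict UTF-8 decoding already enforces both requirements: malformed sequences, encoded surrogates and values above U+10FFFF are rejected. The function name decode_turtle is hypothetical.

```python
def decode_turtle(data: bytes) -> str:
    """Decode the bytes of a Turtle document as strict UTF-8 (hypothetical helper).

    Python's UTF-8 codec raises UnicodeDecodeError for malformed byte
    sequences, for encoded surrogates (U+D800..U+DFFF) and for anything
    above U+10FFFF, so every character in the result is in U+0000..U+10FFFF.
    """
    return data.decode("utf-8", errors="strict")


doc = '<http://example.org/s> <http://example.org/p> "café" .'
print(decode_turtle(doc.encode("utf-8")))

try:
    decode_turtle(b"\xed\xa0\x80")  # UTF-8-style encoding of the surrogate U+D800
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```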

6.5 Grammar

The RDF Working Group proposes to make the following changes to align Turtle with SPARQL.

  • The addition of sparqlPrefix and sparqlBase, which allow SPARQL-style BASE and PREFIX directives to be used in a Turtle document.

Feedback, both positive and negative, is invited by sending email to the mailing list public-rdf-comments@w3.org.

The EBNF used here is defined in XML 1.0 [EBNF-NOTATION]. Production labels consisting of a number and a final 's', e.g. [60s], reference the production with that number in the SPARQL Query Language for RDF grammar [RDF-SPARQL-QUERY].

Notes:

  1. Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
  2. Escape sequences UCHAR and ECHAR are case-sensitive.
  3. When tokenizing the input and choosing grammar rules, the longest match is chosen.
  4. The Turtle grammar is LL(1) and LALR(1) when the rules with uppercased names are used as terminals.
  5. The entry point into the grammar is turtleDoc.
  6. In signed numbers, no white space is allowed between the sign and the number.
  7. The [162s] ANON ::= '[' WS* ']' token allows any amount of white space and comments between []s. The single space version is used in the grammar for clarity.
  8. The strings '@prefix' and '@base' match the pattern for LANGTAG, though neither "prefix" nor "base" are registered language subtags. This specification does not define whether a quoted literal followed by either of these tokens (e.g. "A"@base) is in the Turtle language.
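Note 1 can be illustrated with a small keyword matcher. This is a hypothetical helper, not part of any parser API; it assumes keywords arrive as already-tokenized strings and only shows which spellings each keyword accepts.

```python
from typing import Optional

# Note 1: single-quoted keywords are case-sensitive,
# double-quoted keywords (the SPARQL-style directives) are case-insensitive.
CASE_SENSITIVE_KEYWORDS = {"@base", "@prefix", "a", "true", "false"}
CASE_INSENSITIVE_KEYWORDS = {"BASE", "PREFIX"}


def match_keyword(token: str) -> Optional[str]:
    """Return the keyword spelled by `token`, or None if it is not a keyword."""
    if token in CASE_SENSITIVE_KEYWORDS:
        return token  # exact spelling required
    # Restrict folding to ASCII so that characters such as U+0131 (dotless i),
    # which str.upper() maps to "I", cannot spell PREFIX.
    if token.isascii() and token.upper() in CASE_INSENSITIVE_KEYWORDS:
        return token.upper()
    return None


for t in ["@prefix", "@PREFIX", "prefix", "PrEfIx", "TRUE", "a", "A"]:
    print(t, "->", match_keyword(t))
```

With this sketch, "PrEfIx" is accepted as PREFIX, while "@PREFIX", "TRUE" and "A" are not keywords.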
  [1] turtleDoc ::= statement*
  [2] statement ::= directive | triples '.'
  [3] directive ::= base | prefixID | sparqlBase | sparqlPrefix
  [4] prefixID ::= '@prefix' PNAME_NS IRIREF '.'
  [5] base ::= '@base' IRIREF '.'
  [5s] sparqlBase ::= "BASE" IRIREF
  [6s] sparqlPrefix ::= "PREFIX" PNAME_NS IRIREF
  [6] triples ::= subject predicateObjectList | blankNodePropertyList predicateObjectList?
  [7] predicateObjectList ::= verb objectList (';' (verb objectList)?)*
  [8] objectList ::= object (',' object)*
  [9] verb ::= predicate | 'a'
  [10] subject ::= iri | BlankNode | collection
  [11] predicate ::= iri
  [12] object ::= iri | BlankNode | collection | blankNodePropertyList | literal
  [13] literal ::= RDFLiteral | NumericLiteral | BooleanLiteral
  [14] blankNodePropertyList ::= '[' predicateObjectList ']'
  [15] collection ::= '(' object* ')'
  [16] NumericLiteral ::= INTEGER | DECIMAL | DOUBLE
  [128s] RDFLiteral ::= String (LANGTAG | '^^' iri)?
  [133s] BooleanLiteral ::= 'true' | 'false'
  [17] String ::= STRING_LITERAL_QUOTE | STRING_LITERAL_SINGLE_QUOTE | STRING_LITERAL_LONG_SINGLE_QUOTE | STRING_LITERAL_LONG_QUOTE
  [135s] iri ::= IRIREF | PrefixedName
  [136s] PrefixedName ::= PNAME_LN | PNAME_NS
  [137s] BlankNode ::= BLANK_NODE_LABEL | ANON
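The four directive forms differ in two ways that are easy to miss: the '@' forms are case-sensitive and end with '.', while the SPARQL-style forms are case-insensitive and take no final '.'. The sketch below recognizes a single directive with regular expressions. PNAME_NS and IRIREF are simplified ASCII-only approximations (no UCHAR escapes, no non-ASCII prefix characters), whitespace after each keyword is assumed, and parse_directive is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Simplified, ASCII-only stand-ins for PNAME_NS and IRIREF (assumptions).
PNAME_NS = r"(?P<prefix>(?:[A-Za-z](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)?):"
IRIREF = r"<(?P<iri>[^\x00-\x20<>\"{}|^`\\]*)>"

DIRECTIVES = [
    # [4] prefixID and [5] base: case-sensitive '@' keywords, then '.'
    ("prefixID", re.compile(r"@prefix\s+" + PNAME_NS + r"\s*" + IRIREF + r"\s*\.")),
    ("base", re.compile(r"@base\s+" + IRIREF + r"\s*\.")),
    # [6s] sparqlPrefix and [5s] sparqlBase: case-insensitive, no final '.'
    ("sparqlPrefix", re.compile(r"[Pp][Rr][Ee][Ff][Ii][Xx]\s+" + PNAME_NS + r"\s*" + IRIREF)),
    ("sparqlBase", re.compile(r"[Bb][Aa][Ss][Ee]\s+" + IRIREF)),
]


def parse_directive(text: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Return (production, prefix or None, IRI) if `text` is exactly one directive."""
    candidate = text.strip()
    for production, pattern in DIRECTIVES:
        m = pattern.fullmatch(candidate)
        if m:
            # Base directives have no "prefix" group, so .get() yields None.
            return production, m.groupdict().get("prefix"), m.group("iri")
    return None


print(parse_directive("@prefix ex: <http://example.org/> ."))
print(parse_directive("prefix ex: <http://example.org/>"))
print(parse_directive("PREFIX ex: <http://example.org/> ."))  # trailing '.' not allowed: None
```

A caller could use the returned prefix and IRI to populate the namespaces map described in section 7.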

Productions for terminals

  [18] IRIREF ::= '<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>'
  [139s] PNAME_NS ::= PN_PREFIX? ':'
  [140s] PNAME_LN ::= PNAME_NS PN_LOCAL
  [141s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
  [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
  [19] INTEGER ::= [+-]? [0-9]+
  [20] DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
  [21] DOUBLE ::= [+-]? ([0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT)
  [154s] EXPONENT ::= [eE] [+-]? [0-9]+
  [22] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
  [23] STRING_LITERAL_SINGLE_QUOTE ::= "'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'"
  [24] STRING_LITERAL_LONG_SINGLE_QUOTE ::= "'''" (("'" | "''")? ([^'\] | ECHAR | UCHAR))* "'''"
  [25] STRING_LITERAL_LONG_QUOTE ::= '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'
  [26] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
  [159s] ECHAR ::= '\' [tbnrf\"']
  [161s] WS ::= #x20 | #x9 | #xD | #xA
  [162s] ANON ::= '[' WS* ']'
  [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
  [164s] PN_CHARS_U ::= PN_CHARS_BASE | '_'
  [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
  [167s] PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
  [168s] PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX) ((PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX))?
  [169s] PLX ::= PERCENT | PN_LOCAL_ESC
  [170s] PERCENT ::= '%' HEX HEX
  [171s] HEX ::= [0-9] | [A-F] | [a-f]
  [172s] PN_LOCAL_ESC ::= '\' ('_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%')
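The longest-match rule matters for numeric literals. In the following sketch (hypothetical helper match_numeric), the INTEGER, DECIMAL and DOUBLE patterns are transcribed from the productions above and tried at the same position, and the longest match wins; each result is paired with the xsd datatype that the term constructors in section 7.2 assign. Because DECIMAL requires a digit after '.', the input ':s :p 1.' yields the INTEGER 1 followed by the statement terminator.

```python
import re
from typing import Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
EXPONENT = r"[eE][+-]?[0-9]+"  # [154s]

# [19] INTEGER, [20] DECIMAL and [21] DOUBLE, each with its xsd datatype.
NUMERIC_TERMINALS = [
    (XSD + "integer", re.compile(r"[+-]?[0-9]+")),
    (XSD + "decimal", re.compile(r"[+-]?[0-9]*\.[0-9]+")),
    (XSD + "double", re.compile(
        r"[+-]?(?:[0-9]+\.[0-9]*" + EXPONENT
        + r"|\.[0-9]+" + EXPONENT
        + r"|[0-9]+" + EXPONENT + r")")),
]


def match_numeric(text: str, pos: int = 0) -> Optional[Tuple[str, str, int]]:
    """Longest-match a numeric token at `pos`.

    Returns (lexical form, datatype IRI, end position), or None if no
    numeric terminal matches there. A sign followed by white space never
    matches, because the patterns do not allow anything between them.
    """
    best: Optional[Tuple[str, str, int]] = None
    for datatype, pattern in NUMERIC_TERMINALS:
        m = pattern.match(text, pos)
        if m and (best is None or m.end() > best[2]):
            best = (m.group(0), datatype, m.end())
    return best


print(match_numeric("1.5e3"))        # DOUBLE beats INTEGER "1" and DECIMAL "1.5"
print(match_numeric(":s :p 1.", 6))  # only INTEGER "1"; the final '.' ends the statement
```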

7 Parsing

The RDF Concepts and Abstract Syntax [RDF-CONCEPTS] specification defines three types of RDF term: IRIs, literals and blank nodes. Literals are composed of a lexical form and an optional language tag [BCP47] or datatype IRI. An extra type, prefix, is used during parsing to map string identifiers to namespace IRIs. This section maps a string conforming to the grammar in section 6.5 Grammar into a set of triples by mapping strings that match productions and lexical tokens to RDF terms or their components (e.g. language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.

7.1 Parser State

Parsing Turtle requires a state of five items:

7.2 RDF Term Constructors

This table maps productions and lexical tokens to the RDF terms, or components of RDF terms, listed in section 7 Parsing:

Production | Type | Procedure
IRIREF | IRI | The characters between "<" and ">" are taken, with the numeric escape sequences unescaped, to form the Unicode string of the IRI. Relative IRI resolution is performed per Section 6.3.
PNAME_NS | prefix | When used in a prefixID or sparqlPrefix production, the prefix is the potentially empty Unicode string matching the first argument of the rule; it is a key into the namespaces map.
PNAME_NS | IRI | When used in a PrefixedName production, the IRI is the value in the namespaces map corresponding to the first argument of the rule.
PNAME_LN | IRI | A potentially empty prefix is identified by the first sequence, PNAME_NS. The namespaces map MUST have a corresponding namespace. The Unicode string of the IRI is formed by unescaping the reserved characters in the second argument, PN_LOCAL, and concatenating this onto the namespace.
STRING_LITERAL_SINGLE_QUOTE | lexical form | The characters between the outermost "'"s are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form.
STRING_LITERAL_QUOTE | lexical form | The characters between the outermost '"'s are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form.
STRING_LITERAL_LONG_SINGLE_QUOTE | lexical form | The characters between the outermost "'''"s are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form.
STRING_LITERAL_LONG_QUOTE | lexical form | The characters between the outermost '"""'s are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form.
LANGTAG | language tag | The characters following the "@" form the Unicode string of the language tag.
RDFLiteral | literal | The literal has a lexical form of the first rule argument, String, and either a language tag of LANGTAG or a datatype IRI of iri, depending on which rule matched the input. If neither a language tag nor a datatype IRI is provided, the literal has a datatype of xsd:string.
INTEGER | literal | The literal has a lexical form of the input string, and a datatype of xsd:integer.
DECIMAL | literal | The literal has a lexical form of the input string, and a datatype of xsd:decimal.
DOUBLE | literal | The literal has a lexical form of the input string, and a datatype of xsd:double.
BooleanLiteral | literal | The literal has a lexical form of true or false, depending on which matched the input, and a datatype of xsd:boolean.
BLANK_NODE_LABEL | blank node | The string after "_:" is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated.
ANON | blank node | A blank node is generated.
blankNodePropertyList | blank node | A blank node is generated. Note the rules for blankNodePropertyList in the next section.
collection | blank node | For non-empty lists, a blank node is generated. Note the rules for collection in the next section.
collection | IRI | For empty lists, the resulting IRI is rdf:nil. Note the rules for collection in the next section.
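Two of the procedures in the term constructor table, unescaping a string literal's lexical form and expanding a PNAME_LN, can be sketched as follows. The helper names are hypothetical, and the input is assumed to have already matched the grammar, so ECHAR never appears inside an IRIREF and the same UCHAR handling can be reused there. Percent-encoded %XX sequences in local names are kept verbatim rather than decoded; only the backslash of each reserved-character escape is dropped.

```python
import re
from typing import Dict

# ECHAR: \t \b \n \r \f \" \' \\    UCHAR: \uXXXX and \UXXXXXXXX
ECHAR_VALUES = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
                '"': '"', "'": "'", "\\": "\\"}
ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tbnrf\"'\\]))")
# PN_LOCAL_ESC: a backslash before one of the reserved characters.
PN_LOCAL_ESC = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")


def unescape_string(body: str) -> str:
    """Unescape the characters between a string literal's delimiters."""
    def replace(m):
        hex_digits = m.group(1) or m.group(2)
        if hex_digits:
            # chr() raises ValueError for code points above U+10FFFF.
            return chr(int(hex_digits, 16))
        return ECHAR_VALUES[m.group(3)]
    # One left-to-right pass: an escaped backslash followed by "n" stays
    # a backslash and an "n" instead of becoming a newline.
    return ESCAPE.sub(replace, body)


def expand_pname(pname: str, namespaces: Dict[str, str]) -> str:
    """Expand a PNAME_LN against the namespaces map."""
    prefix, _, local = pname.partition(":")  # PN_PREFIX never contains ':'
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix {prefix!r}")
    # Drop the backslash of each PN_LOCAL_ESC; %XX sequences are kept as-is.
    return namespaces[prefix] + PN_LOCAL_ESC.sub(r"\1", local)


ns = {"ex": "http://example.org/"}
print(unescape_string(r"caf\u00E9\tbar"))
print(expand_pname(r"ex:a\,b%20c", ns))  # http://example.org/a,b%20c
```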

7.3 RDF Triples Constructors

A Turtle document defines an RDF graph composed of a set of RDF triples. The subject production sets the curSubject. The verb production sets the curPredicate. Each object N in the document produces an RDF triple: curSubject curPredicate N .

Property Lists:

Beginning the blankNodePropertyList production records the curSubject and curPredicate, and sets curSubject to a novel blank node B. Finishing the blankNodePropertyList production restores curSubject and curPredicate. The node produced by matching blankNodePropertyList is the blank node B.
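The state changes described above can be traced with a small sketch. ParserState and its methods are hypothetical; nodes are plain strings (prefixed names such as :s are left unexpanded, and blank nodes are written _:b0, _:b1, ...), and a stack is used so that nested property lists restore the right curSubject and curPredicate. labelled_bnode mirrors the bnodeLabels rule for BLANK_NODE_LABEL.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


class ParserState:
    """Hypothetical subset of the Turtle parser state."""

    def __init__(self) -> None:
        self.cur_subject: Optional[str] = None
        self.cur_predicate: Optional[str] = None
        self.bnode_labels: Dict[str, str] = {}
        self.triples: List[Triple] = []
        self._saved: List[Tuple[Optional[str], Optional[str]]] = []
        self._next_id = 0

    def new_bnode(self) -> str:
        """Allocate a novel blank node, written here as _:bN."""
        node = f"_:b{self._next_id}"
        self._next_id += 1
        return node

    def labelled_bnode(self, label: str) -> str:
        """Reuse the blank node for `label`, allocating one if needed."""
        if label not in self.bnode_labels:
            self.bnode_labels[label] = self.new_bnode()
        return self.bnode_labels[label]

    def emit_object(self, node: str) -> None:
        """Each object N produces the triple: curSubject curPredicate N ."""
        if self.cur_subject is None or self.cur_predicate is None:
            raise RuntimeError("object seen before subject and verb")
        self.triples.append((self.cur_subject, self.cur_predicate, node))

    def begin_property_list(self) -> str:
        """'[': record curSubject/curPredicate, make a novel blank node B the subject."""
        self._saved.append((self.cur_subject, self.cur_predicate))
        self.cur_subject = self.new_bnode()
        return self.cur_subject

    def end_property_list(self, node: str) -> str:
        """']': restore the recorded state; B is the node the production yields."""
        self.cur_subject, self.cur_predicate = self._saved.pop()
        return node


# Semantic actions for:  :s :p [ :q :o ] .
state = ParserState()
state.cur_subject = ":s"                       # subject
state.cur_predicate = ":p"                     # verb
b = state.begin_property_list()                # '['
state.cur_predicate = ":q"                     # verb inside the brackets
state.emit_object(":o")                        # object inside the brackets
state.emit_object(state.end_property_list(b))  # ']' yields B, an object of :s :p
print(state.triples)
```

Running it prints the triples ('_:b0', ':q', ':o') and (':s', ':p', '_:b0'): the inner triple is emitted first, and the blank node is used as an object only after the recorded state has been restored.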

Collections:

Beginning the collection production records the curSubject and curPredicate. Each object in the collection production has a curSubject set to a novel blank node B and a curPredicate set to rdf:first. Writing B_n for the blank node allocated for the n-th object, object_n, each object after the first also produces a triple B_(n-1) rdf:rest B_n . Finishing the collection production restores curSubject and curPredicate; for a non-empty list it first creates an additional triple curSubject rdf:rest rdf:nil ., where curSubject is the blank node allocated for the last object. The node produced by matching collection is the first blank node, B_1, for non-empty lists and rdf:nil for empty lists.
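The triples a collection produces can be sketched non-incrementally. expand_collection is a hypothetical helper that assumes every object in the collection has already been reduced to a node; it allocates one novel blank node per object, emits rdf:first and rdf:rest triples, terminates the list with rdf:nil and returns the node produced by the collection (the first blank node, or rdf:nil for an empty list). A streaming parser interleaves these steps with parsing, but the resulting set of triples is the same.

```python
import itertools
from typing import Callable, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def expand_collection(items: List[str],
                      new_bnode: Callable[[], str]) -> Tuple[str, List[Triple]]:
    """Return (node produced by the collection, triples it generates).

    `items` are the nodes already produced by each object in the collection.
    """
    if not items:
        return RDF + "nil", []  # empty list: no triples, the node is rdf:nil
    nodes = [new_bnode() for _ in items]  # one novel blank node per object
    triples: List[Triple] = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        triples.append((node, RDF + "first", item))
        # Link to the next list node, or terminate the list with rdf:nil.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples


counter = itertools.count()
head, list_triples = expand_collection([":a", ":b"], lambda: f"_:b{next(counter)}")
print(head)
for t in list_triples:
    print(t)
```

The returned head node is then an ordinary object: the enclosing curSubject curPredicate head . triple is produced once the recorded state has been restored.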


7.4 Parsing Example

The following informative example shows the semantic actions performed when parsing this Turtle document with an LALR(1) parser: