A Turtle document is a Unicode [UNICODE] character string encoded in UTF-8. Only Unicode characters in the range U+0000 to U+10FFFF inclusive are allowed.
The RDF Working Group proposes to make the following changes to align Turtle with SPARQL.
The addition of sparqlPrefix and sparqlBase, which allow SPARQL-style BASE and PREFIX directives in a Turtle document.
Feedback, both positive and negative, is invited by email to the mailing list public-rdf-comments@w3.org.
The EBNF used here is defined in XML 1.0 [EBNF-NOTATION]. Production labels consisting of a number and a final 's', e.g. [60s], reference the production with that number in the SPARQL Query Language for RDF grammar [RDF-SPARQL-QUERY].
- When tokenizing the input and choosing grammar rules, the longest match is chosen. The strings @prefix and @base match the pattern for LANGTAG, though neither "prefix" nor "base" is a registered language subtag. This specification does not define whether a quoted literal followed by either of these tokens (e.g. "A"@base) is in the Turtle language.
+ Notes:
Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive.
Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
Escape sequences UCHAR and ECHAR are case-sensitive.
The entry point into the grammar is turtleDoc.
The ANON ::= '[' WS* ']' token allows any amount of white space and comments between []s. The single space version is used in the grammar for clarity.
The strings '@prefix' and '@base' match the pattern for LANGTAG, though neither "prefix" nor "base" is a registered language subtag.
This specification does not define whether a quoted literal followed by either of these tokens (e.g. "A"@base) is in the Turtle language.
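The overlap between the directive keywords and LANGTAG can be checked directly. The following is a minimal sketch, assuming the LANGTAG production is transcribed into a Python regular expression; the helper name `is_langtag` is hypothetical and not part of this specification.

```python
import re

# LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
LANGTAG = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def is_langtag(token: str) -> bool:
    """Return True if the whole token matches the LANGTAG pattern."""
    return LANGTAG.fullmatch(token) is not None


if __name__ == "__main__":
    # '@prefix' and '@base' match the pattern even though neither
    # "prefix" nor "base" is a registered language subtag.
    for token in ("@en", "@en-GB", "@prefix", "@base", "@1st"):
        print(token, is_langtag(token))
```

Because both keywords satisfy the pattern, a tokenizer that relies only on the terminal patterns cannot tell `"A"@base` apart from a literal with a language tag, which is why the specification leaves that case undefined.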
[1] turtleDoc ::= statement*
[2] statement ::= directive | triples '.'
[3] directive ::= prefixID | base | sparqlPrefix | sparqlBase
[4] prefixID ::= '@prefix' PNAME_NS IRIREF '.'
[5] base ::= '@base' IRIREF '.'
- [28*] sparqlPrefix ::= [Pp] [Rr] [Ee] [Ff] [Ii] [Xx] PNAME_NS IRIREF
- [29*] sparqlBase ::= [Bb] [Aa] [Ss] [Ee] IRIREF
+ [5s] sparqlBase ::= "BASE" IRIREF
+ [6s] sparqlPrefix ::= "PREFIX" PNAME_NS IRIREF
[6] triples ::= subject predicateObjectList | blankNodePropertyList predicateObjectList?
[7] predicateObjectList ::= verb objectList (';' (verb objectList)?)*
[8] objectList ::= object (',' object)*
[9] verb ::= predicate | 'a'
- [10] subject ::= iri | blank
+ [10] subject ::= iri | BlankNode | collection
[11] predicate ::= iri
- [12] object ::= iri | blank | blankNodePropertyList | literal
+ [12] object ::= iri | BlankNode | collection | blankNodePropertyList | literal
[13] literal ::= RDFLiteral | NumericLiteral | BooleanLiteral
- [14] blank ::= BlankNode | collection
[14] blankNodePropertyList ::= '[' predicateObjectList ']'
[15] collection ::= '(' object* ')'
[16] NumericLiteral ::= INTEGER | DECIMAL | DOUBLE
[128s] RDFLiteral ::= String (LANGTAG | '^^' iri)?
[133s] BooleanLiteral ::= 'true' | 'false'
[17] String ::= STRING_LITERAL_QUOTE | STRING_LITERAL_SINGLE_QUOTE | STRING_LITERAL_LONG_SINGLE_QUOTE | STRING_LITERAL_LONG_QUOTE
[135s] iri ::= IRIREF | PrefixedName
[136s] PrefixedName ::= PNAME_LN | PNAME_NS
[137s] BlankNode ::= BLANK_NODE_LABEL | ANON
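The case rules for the two directive styles can be illustrated with a small recognizer. This is a sketch only, assuming a line-oriented check that a real tokenizer would not use; the function name `directive_kind` and the lookahead for white space or '<' are assumptions made for the example.

```python
import re
from typing import Optional

# '@prefix' and '@base' are single-quoted keywords: case-sensitive.
AT_DIRECTIVE = re.compile(r"@(prefix|base)(?=[\s<])")
# "PREFIX" and "BASE" are double-quoted keywords: case-insensitive.
SPARQL_DIRECTIVE = re.compile(r"(PREFIX|BASE)(?=[\s<])", re.IGNORECASE)


def directive_kind(line: str) -> Optional[str]:
    """Name the directive production a line starts with, or None."""
    text = line.lstrip()
    m = AT_DIRECTIVE.match(text)
    if m:
        return "prefixID" if m.group(1) == "prefix" else "base"
    m = SPARQL_DIRECTIVE.match(text)
    if m:
        return "sparqlPrefix" if m.group(1).upper() == "PREFIX" else "sparqlBase"
    return None


if __name__ == "__main__":
    for line in (
        "@prefix ex: <http://example/> .",
        "PrEfIx ex: <http://example/>",
        "@PREFIX ex: <http://example/> .",
        "base <http://example/>",
    ):
        print(repr(line), "->", directive_kind(line))
```

The lookahead keeps a prefixed name such as `base:x` from being mistaken for a directive, which mirrors the longest-match rule only approximately.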
Productions for terminals
[18] IRIREF ::= '<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>'
[139s] PNAME_NS ::= PN_PREFIX? ':'
[140s] PNAME_LN ::= PNAME_NS PN_LOCAL
[141s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
[144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[19] INTEGER ::= [+-]? [0-9]+
- [21] DECIMAL ::= [+-]? ([0-9]* '.' [0-9]+)
+ [20] DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
[21] DOUBLE ::= [+-]? ([0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT)
[154s] EXPONENT ::= [eE] [+-]? [0-9]+
[22] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
[23] STRING_LITERAL_SINGLE_QUOTE ::= "'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'"
[24] STRING_LITERAL_LONG_SINGLE_QUOTE ::= "'''" (("'" | "''")? [^'\] | ECHAR | UCHAR)* "'''"
[25] STRING_LITERAL_LONG_QUOTE ::= '"""' (('"' | '""')? [^"\] | ECHAR | UCHAR)* '"""'
[26] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
[159s] ECHAR ::= '\' [tbnrf\"']
- [160s] NIL ::= '(' WS* ')'
[161s] WS ::= #x20 | #x9 | #xD | #xA
[162s] ANON ::= '[' WS* ']'
- [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#00C0-#00D6] | [#00D8-#00F6] | [#00F8-#02FF] | [#0370-#037D] | [#037F-#1FFF] | [#200C-#200D] | [#2070-#218F] | [#2C00-#2FEF] | [#3001-#D7FF] | [#F900-#FDCF] | [#FDF0-#FFFD] | [#10000-#EFFFF]
+ [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[164s] PN_CHARS_U ::= PN_CHARS_BASE | '_'
- [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #00B7 | [#0300-#036F] | [#203F-#2040]
+ [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[167s] PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
[168s] PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX) ((PN_CHARS | '.' | ':' | PLX)* PN_CHARS | ':' | PLX)?
[169s] PLX ::= PERCENT | PN_LOCAL_ESC
[170s] PERCENT ::= '%' HEX HEX
[171s] HEX ::= [0-9] | [A-F] | [a-f]
[172s] PN_LOCAL_ESC ::= '\' ('_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%')
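As an illustration of how the numeric terminals partition their input, the sketch below transcribes INTEGER, DECIMAL (the revised form) and DOUBLE into Python regular expressions and reports the XSD datatype a matching token would receive. The function name `numeric_datatype` is hypothetical, and whole-token matching stands in for a real tokenizer's longest-match behaviour.

```python
import re
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

EXPONENT = r"[eE][+-]?[0-9]+"
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
DOUBLE = re.compile(
    rf"[+-]?(?:[0-9]+\.[0-9]*{EXPONENT}|\.[0-9]+{EXPONENT}|[0-9]+{EXPONENT})"
)


def numeric_datatype(token: str) -> Optional[str]:
    """Return the datatype IRI for a numeric token, or None if it is not one."""
    for pattern, name in ((DOUBLE, "double"), (DECIMAL, "decimal"), (INTEGER, "integer")):
        if pattern.fullmatch(token):
            return XSD + name
    return None


if __name__ == "__main__":
    for token in ("42", "-4.2", ".5", "1e10", "1.e3", "1."):
        print(token, numeric_datatype(token))
```

Note that `1.` matches none of the three patterns, since DECIMAL requires at least one digit after the point; in a document the trailing '.' is read as the statement terminator instead.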
The RDF Concepts and Abstract Syntax [RDF-CONCEPTS] specification defines three types of RDF Term: IRIs, literals and blank nodes.
Literals are composed of a lexical form and an optional language tag [BCP47] or datatype IRI.
An extra type, prefix, is used during parsing to map string identifiers to namespace IRIs.
This section maps a string conforming to the grammar to a set of triples by mapping strings matching productions and lexical tokens to RDF terms or their components (e.g. language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.
Parsing Turtle requires a state of five items:
baseURI: When the base production is reached, the second rule argument, IRIREF, is the base URI used for relative IRI resolution (test: base1 base2).
namespaces: The second and third rule arguments (PNAME_NS and IRIREF) in the prefixID production assign a namespace name (IRIREF) for the prefix (PNAME_NS). Outside of a prefixID production, any PNAME_NS is substituted with the namespace (test: prefix1 escapedNamespace1). Note that the prefix may be an empty string, per the PNAME_NS production: (PN_PREFIX)? ":" (test: default1).
bnodeLabels: A mapping from string to blank node.
curSubject: The curSubject is bound to the subject production.
curPredicate: The curPredicate is bound to the verb production. If the token matched was "a", curPredicate is bound to the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type (test: type).
This table maps productions and lexical tokens to RDF terms or components of RDF terms:
| production | type | procedure |
|---|---|---|
| IRIREF | IRI | The characters between "<" and ">" are unescaped¹ to form the unicode string of the IRI. Relative IRI resolution is then performed. |
| IRIREF | IRI | The characters between "<" and ">" are taken, with the numeric escape sequences unescaped, to form the unicode string of the IRI. Relative IRI resolution is performed per Section 6.3. |
| PNAME_NS | prefix | The potentially empty unicode string matching the first sequence of the rule, PN_PREFIX, is a key into the namespaces map. |
| PNAME_NS | prefix | When used in a prefixID or sparqlPrefix production, the prefix is the potentially empty unicode string matching the first argument of the rule; it is a key into the namespaces map. |
| PNAME_NS | IRI | When used in a PrefixedName production, the iri is the value in the namespaces map corresponding to the first argument of the rule. |
| PNAME_LN | IRI | A prefix is identified by the first argument, PNAME_NS. The namespaces map has a corresponding namespace. The unicode string of the IRI is formed by concatenating this namespace and the second argument, PN_LOCAL. |
| PNAME_LN | IRI | A potentially empty prefix is identified by the first sequence, PNAME_NS. The namespaces map MUST have a corresponding namespace. The unicode string of the IRI is formed by unescaping the reserved characters in the second argument, PN_LOCAL, and concatenating this onto the namespace. |
| STRING_LITERAL1 | lexical form | The characters between the outermost "'"s are unescaped¹ to form the unicode string of a lexical form. |
| STRING_LITERAL2 | lexical form | The characters between the outermost '"'s are unescaped¹ to form the unicode string of a lexical form. |
| STRING_LITERAL_LONG1 | lexical form | The characters between the outermost "'''"s are unescaped¹ to form the unicode string of a lexical form. |
| STRING_LITERAL_LONG2 | lexical form | The characters between the outermost '"""'s are unescaped¹ to form the unicode string of a lexical form. |
| STRING_LITERAL1 | lexical form | The characters between the outermost "'"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
| STRING_LITERAL2 | lexical form | The characters between the outermost '"'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
| STRING_LITERAL_LONG1 | lexical form | The characters between the outermost "'''"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
| STRING_LITERAL_LONG2 | lexical form | The characters between the outermost '"""'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
| LANGTAG | language tag | The characters following the @ form the unicode string of the language tag. |
| RDFLiteral | literal | The literal has a lexical form of the first rule argument, String, and either a language tag of LANGTAG or a datatype IRI of iri, depending on which rule matched the input. If neither a language tag nor a datatype IRI is provided, the literal has a datatype of xsd:string. |
| INTEGER | literal | The literal has a lexical form of the input string, and a datatype of xsd:integer. |
| DECIMAL | literal | The literal has a lexical form of the input string, and a datatype of xsd:decimal. |
| DOUBLE | literal | The literal has a lexical form of the input string, and a datatype of xsd:double. |
| BooleanLiteral | literal | The literal has a lexical form of true or false, depending on which matched the input, and a datatype of xsd:boolean. |
| BLANK_NODE_LABEL | blank node | The string matching the second argument, PN_LOCAL, is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated. |
| ANON | blank node | A blank node is generated. |
| blankNodePropertyList | blank node | A blank node is generated. Note the rules for blankNodePropertyList in the next section. |
| collection | blank node | A blank node is generated. Note the rules for collection in the next section. |
| collection | blank node | For non-empty lists, a blank node is generated. Note the rules for collection in the next section. |
| collection | IRI | For empty lists, the resulting IRI is rdf:nil. Note the rules for collection in the next section. |
¹ Escape Sequences defines a mapping from escaped unicode strings to unicode strings. The following lexical tokens are unescaped to produce unicode strings: IRIREF, STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1 and STRING_LITERAL_LONG2.
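The parser state and two of the term mappings in the table can be sketched as follows. This is an illustrative outline, not a conforming implementation: the class `ParserState`, the helper names, and the blank node naming scheme `_:bN` are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# String escapes (ECHAR) and numeric escapes (UCHAR).
ECHAR_MAP = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
             '"': '"', "'": "'", "\\": "\\"}
ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL)
# Reserved characters that may be backslash-escaped in PN_LOCAL (PN_LOCAL_ESC).
LOCAL_ESC = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")


def unescape_string(body: str) -> str:
    """Unescape the characters between the quotes of a string literal."""
    def repl(m: "re.Match[str]") -> str:
        hexdigits = m.group(1) or m.group(2)
        if hexdigits:
            return chr(int(hexdigits, 16))
        if m.group(3) not in ECHAR_MAP:
            raise ValueError(f"unknown escape: \\{m.group(3)}")
        return ECHAR_MAP[m.group(3)]
    return ESCAPE.sub(repl, body)


@dataclass
class ParserState:
    """The five items of parser state listed above."""
    base_uri: Optional[str] = None
    namespaces: Dict[str, str] = field(default_factory=dict)
    bnode_labels: Dict[str, str] = field(default_factory=dict)
    cur_subject: Optional[str] = None
    cur_predicate: Optional[str] = None

    def expand(self, pname: str) -> str:
        """Map a PNAME_LN or PNAME_NS token to an IRI string."""
        prefix, _, local = pname.partition(":")
        if prefix not in self.namespaces:
            raise KeyError(f"undeclared prefix: {prefix!r}")
        return self.namespaces[prefix] + LOCAL_ESC.sub(r"\1", local)

    def bnode(self, label: str) -> str:
        """Return the blank node for a label, allocating one if needed."""
        return self.bnode_labels.setdefault(label, f"_:b{len(self.bnode_labels)}")
```

For instance, with the empty prefix mapped to `http://xmlns.com/foaf/0.1/`, `state.expand(":givenName")` yields `http://xmlns.com/foaf/0.1/givenName`. Relative IRI resolution against `base_uri` is deliberately left out of the sketch.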
A Turtle document defines an RDF graph composed of a set of RDF triples.
The subject production sets the curSubject.
The verb production sets the curPredicate.
Each object N in the document produces an RDF triple: curSubject curPredicate N .
Beginning the blankNodePropertyList production records the curSubject and curPredicate, and sets curSubject to a novel blank node B.
Finishing the blankNodePropertyList production restores curSubject and curPredicate.
The node produced by matching blankNodePropertyList is the blank node B.
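The save and restore behaviour described for blankNodePropertyList can be sketched with an explicit stack. The class `TripleBuilder` and its method names are hypothetical, and blank nodes are represented simply as strings of the form `_:N`.

```python
from typing import List, Optional, Tuple

Triple = Tuple[Optional[str], Optional[str], str]


class TripleBuilder:
    """Tracks curSubject and curPredicate and collects emitted triples."""

    def __init__(self) -> None:
        self.cur_subject: Optional[str] = None
        self.cur_predicate: Optional[str] = None
        self.triples: List[Triple] = []
        self._saved: List[Tuple[Optional[str], Optional[str]]] = []
        self._counter = 0

    def new_bnode(self) -> str:
        self._counter += 1
        return f"_:{self._counter}"

    def emit_object(self, node: str) -> None:
        # Each object N produces the triple: curSubject curPredicate N .
        self.triples.append((self.cur_subject, self.cur_predicate, node))

    def begin_property_list(self) -> str:
        # Record curSubject and curPredicate, then switch to a novel blank node B.
        self._saved.append((self.cur_subject, self.cur_predicate))
        self.cur_subject = self.new_bnode()
        return self.cur_subject

    def finish_property_list(self, bnode: str) -> str:
        # Restore the recorded values; the production yields the blank node B.
        self.cur_subject, self.cur_predicate = self._saved.pop()
        return bnode


if __name__ == "__main__":
    b = TripleBuilder()
    b.cur_subject = "http://www.w3.org/People/Eric/ericP-foaf.rdf#ericP"
    b.cur_predicate = "http://xmlns.com/foaf/0.1/knows"
    node = b.begin_property_list()
    b.cur_predicate = "http://xmlns.com/foaf/0.1/mbox"
    b.emit_object("mailto:timbl@w3.org")
    b.emit_object(b.finish_property_list(node))
    for triple in b.triples:
        print(triple)
```

A stack is used rather than a single saved pair so that nested property lists restore correctly. In this sketch the triple linking the outer subject to B is emitted after the list is finished; a parser may equally emit it before descending.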
Beginning the collection production records the curSubject and curPredicate.
Each object in the collection production has a curSubject set to a novel blank node B and a curPredicate set to rdf:first.
Each object object_n after the first produces a triple: object_(n-1) rdf:rest object_n .
Finishing the collection production creates an additional triple curSubject rdf:rest rdf:nil . and restores curSubject and curPredicate.
The node produced by matching collection is the first blank node B for non-empty lists and rdf:nil for empty lists.
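The triples produced for a collection under the rules above can be sketched as a function from a list of already-parsed objects to a head node and a list of triples. The names `collection_triples` and `make_bnode_allocator` are hypothetical, and the sketch follows the variant in which an empty list yields rdf:nil.

```python
import itertools
from typing import Callable, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def make_bnode_allocator() -> Callable[[], str]:
    """Return a function that hands out fresh blank node labels."""
    counter = itertools.count()
    return lambda: f"_:b{next(counter)}"


def collection_triples(objects: List[str],
                       new_bnode: Callable[[], str]) -> Tuple[str, List[Triple]]:
    """Return (node produced by the collection, triples it generates)."""
    if not objects:
        return RDF + "nil", []
    # One novel blank node per object; each is the subject of rdf:first.
    nodes = [new_bnode() for _ in objects]
    triples: List[Triple] = []
    for i, (node, obj) in enumerate(zip(nodes, objects)):
        triples.append((node, RDF + "first", obj))
        # Link to the next list node, or close the list with rdf:nil.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples


if __name__ == "__main__":
    head, triples = collection_triples(['"1"', '"2"'], make_bnode_allocator())
    print(head)
    for triple in triples:
        print(triple)
```

The rdf:rest links here connect the list's blank nodes, which is how the rule about consecutive objects is interpreted in this sketch.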
Beginning the collection production records the curSubject and curPredicate, sets curSubject to a novel blank node Bhead and sets curSubject and curPredicate to Bhead and rdf:first respectively.
Each object in collection allocates a novel blank node Bn, creates an additional triple curSubject rdf:rest Bn . and sets curSubject to Bn.
Finishing the collection production creates an additional triple curSubject rdf:rest rdf:nil . and restores curSubject and curPredicate.
The node produced by matching collection is the blank node Bhead.
The following informative example shows the semantic actions performed when parsing this Turtle document with an LALR(1) parser:
Map the prefix ericFoaf to the IRI http://www.w3.org/People/Eric/ericP-foaf.rdf#.
Map the empty prefix to the IRI http://xmlns.com/foaf/0.1/.
Assign curSubject the IRI http://www.w3.org/People/Eric/ericP-foaf.rdf#ericP.
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/givenName.
Emit an RDF triple: <...rdf#ericP> <.../givenName> "Eric" .
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/knows.
Emit an RDF triple: <...rdf#ericP> <.../knows> <...who/dan-brickley> .
Emit an RDF triple: <...rdf#ericP> <.../knows> _:1 .
Save curSubject and reassign to the blank node _:1.
Save curPredicate.
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/mbox.
Emit an RDF triple: _:1 <.../mbox> <mailto:timbl@w3.org> .
Restore curSubject and curPredicate to their saved values (<...rdf#ericP>, <.../knows>).
Emit an RDF triple: <...rdf#ericP> <.../knows> <http://getopenid.com/amyvdh> .