A Turtle document is a Unicode [UNICODE] character string encoded in UTF-8. Only Unicode characters in the range U+0000 to U+10FFFF inclusive are allowed.
The RDF Working Group proposes to make the following changes to align Turtle with SPARQL:
Allow BASE and PREFIX directives in a Turtle document.
Feedback, both positive and negative, is invited by sending email to the mailing list public-rdf-comments@w3.org (subscribe, archives).
The EBNF used here is defined in XML 1.0 [EBNF-NOTATION]. Production labels consisting of a number and a final 's', e.g. [60s], reference the production with that number in the SPARQL Query Language for RDF grammar [RDF-SPARQL-QUERY].
- When tokenizing the input and choosing grammar rules, the longest match is chosen. The strings @prefix and @base match the pattern for LANGTAG, though neither "prefix" nor "base" are registered language subtags. This specification does not define whether a quoted literal followed by either of these tokens (e.g. "A"@base) is in the Turtle language.
+ Notes:
Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive.
Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
The escape sequences UCHAR and ECHAR are case-sensitive.
The entry point into the grammar is turtleDoc.
The ANON ::= '[' WS* ']' token allows any amount of white space and comments between []s. The single space version is used in the grammar for clarity.
The strings '@prefix' and '@base' match the pattern for LANGTAG, though neither "prefix" nor "base" are registered language subtags. This specification does not define whether a quoted literal followed by either of these tokens (e.g. "A"@base) is in the Turtle language.
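The case rules above can be illustrated with a small recognizer for directive keywords. This is a minimal Python sketch and is not part of the specification. The function name directive_kind is hypothetical. The sketch also assumes, as a simplification, that white space follows the keyword. Under the longest-match rule, that assumption means input such as PREFIX: is left to the PNAME_NS terminal and @prefix-x is left to LANGTAG.

```python
import re
from typing import Optional

# '@prefix' and '@base' must match exactly (case-sensitive keywords).
AT_DIRECTIVE = re.compile(r"@(prefix|base)(?=\s)")
# "PREFIX" and "BASE" match in any letter case (case-insensitive keywords).
SPARQL_DIRECTIVE = re.compile(r"(prefix|base)(?=\s)", re.IGNORECASE)


def directive_kind(text: str) -> Optional[str]:
    """Name the directive production that `text` starts with, or None.

    Hypothetical helper: assumes the keyword is followed by white space,
    so longer tokens such as the prefixed name 'PREFIX:' are not taken
    for a directive.
    """
    m = AT_DIRECTIVE.match(text)
    if m:
        return "prefixID" if m.group(1) == "prefix" else "base"
    m = SPARQL_DIRECTIVE.match(text)
    if m:
        return "sparqlPrefix" if m.group(1).upper() == "PREFIX" else "sparqlBase"
    return None


print(directive_kind("@prefix ex: <http://example.org/> ."))  # prefixID
print(directive_kind("Prefix ex: <http://example.org/>"))     # sparqlPrefix
print(directive_kind("@PREFIX ex: <http://example.org/> ."))  # None
```

The last call returns None because '@prefix' is case-sensitive, whereas "Prefix" is accepted as the case-insensitive SPARQL keyword.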
[1] turtleDoc ::= statement*
[2] statement ::= directive | triples '.'
[3] directive ::= base | prefixID | sparqlBase | sparqlPrefix
[4] prefixID ::= '@prefix' PNAME_NS IRIREF '.'
[5] base ::= '@base' IRIREF '.'
- [28*] sparqlPrefix ::= [Pp] [Rr] [Ee] [Ff] [Ii] [Xx] PNAME_NS IRIREF
- [29*] sparqlBase ::= [Bb] [Aa] [Ss] [Ee] IRIREF
+ [5s] sparqlBase ::= "BASE" IRIREF
+ [6s] sparqlPrefix ::= "PREFIX" PNAME_NS IRIREF
[6] triples ::= subject predicateObjectList | blankNodePropertyList predicateObjectList?
[7] predicateObjectList ::= verb objectList (';' (verb objectList)?)*
[8] objectList ::= object (',' object)*
[9] verb ::= predicate | 'a'
- [10] subject ::= iri | blank
+ [10] subject ::= iri | BlankNode | collection
[11] predicate ::= iri
- [12] object ::= iri | blank | blankNodePropertyList | literal
+ [12] object ::= iri | BlankNode | collection | blankNodePropertyList | literal
[13] literal ::= RDFLiteral | NumericLiteral | BooleanLiteral
- [14] blank ::= BlankNode | collection
[14] blankNodePropertyList ::= '[' predicateObjectList ']'
[15] collection ::= '(' object* ')'
[16] NumericLiteral ::= INTEGER | DECIMAL | DOUBLE
[128s] RDFLiteral ::= String (LANGTAG | '^^' iri)?
[133s] BooleanLiteral ::= 'true' | 'false'
[17] String ::= STRING_LITERAL_QUOTE | STRING_LITERAL_SINGLE_QUOTE | STRING_LITERAL_LONG_SINGLE_QUOTE | STRING_LITERAL_LONG_QUOTE
[135s] iri ::= IRIREF | PrefixedName
[136s] PrefixedName ::= PNAME_LN | PNAME_NS
[137s] BlankNode ::= BLANK_NODE_LABEL | ANON
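The NumericLiteral production [16] chooses one of the INTEGER, DECIMAL and DOUBLE terminals defined in the terminal productions, and the longest match wins. The following Python sketch illustrates that choice. It transcribes productions [19] to [21] and [154s] into regular expressions, using the [20] DECIMAL form. Each expression is paired with the datatype that the term-mapping table assigns to it. The function longest_numeric and its return shape are assumptions for illustration only.

```python
import re
from typing import Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
EXPONENT = r"[eE][+-]?[0-9]+"  # [154s]

# (terminal name, compiled pattern, datatype IRI of the resulting literal)
NUMERIC_TERMINALS = [
    ("INTEGER", re.compile(r"[+-]?[0-9]+"), XSD + "integer"),
    ("DECIMAL", re.compile(r"[+-]?[0-9]*\.[0-9]+"), XSD + "decimal"),
    ("DOUBLE", re.compile(
        rf"[+-]?(?:[0-9]+\.[0-9]*{EXPONENT}|\.[0-9]+{EXPONENT}|[0-9]+{EXPONENT})"),
     XSD + "double"),
]


def longest_numeric(text: str, pos: int = 0) -> Optional[Tuple[str, str, str]]:
    """Return (terminal, lexical form, datatype IRI) for the longest
    numeric terminal matching at `pos`, or None if none matches."""
    best = None
    for name, pattern, datatype in NUMERIC_TERMINALS:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group(), datatype)
    return best
```

With input 1. the sketch returns INTEGER 1 and leaves the '.' to end the statement. This is why a triple written as ex:s ex:p 1. has an integer object, while 1.5e3 is read as a DOUBLE rather than as the shorter DECIMAL 1.5.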
Productions for terminals
[18] IRIREF ::= '<' ([^#x00-#x20<>\"{}|^`\] | UCHAR)* '>'
[139s] PNAME_NS ::= PN_PREFIX? ':'
[140s] PNAME_LN ::= PNAME_NS PN_LOCAL
[141s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
[144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[19] INTEGER ::= [+-]? [0-9]+
- [21] DECIMAL ::= [+-]? ([0-9]* '.' [0-9]+)
+ [20] DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
[21] DOUBLE ::= [+-]? ([0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT)
[154s] EXPONENT ::= [eE] [+-]? [0-9]+
[22] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
[23] STRING_LITERAL_SINGLE_QUOTE ::= "'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'"
[24] STRING_LITERAL_LONG_SINGLE_QUOTE ::= "'''" (("'" | "''")? [^'\] | ECHAR | UCHAR)* "'''"
[25] STRING_LITERAL_LONG_QUOTE ::= '"""' (('"' | '""')? [^"\] | ECHAR | UCHAR)* '"""'
[26] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
[159s] ECHAR ::= '\' [tbnrf\"']
- [160s] NIL ::= '(' WS* ')'
[161s] WS ::= #x20 | #x9 | #xD | #xA
[162s] ANON ::= '[' WS* ']'
- [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#00C0-#00D6] | [#00D8-#00F6] | [#00F8-#02FF] | [#0370-#037D] | [#037F-#1FFF] | [#200C-#200D] | [#2070-#218F] | [#2C00-#2FEF] | [#3001-#D7FF] | [#F900-#FDCF] | [#FDF0-#FFFD] | [#10000-#EFFFF]
+ [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[164s] PN_CHARS_U ::= PN_CHARS_BASE | '_'
- [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #00B7 | [#0300-#036F] | [#203F-#2040]
+ [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[167s] PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
[168s] PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX) ((PN_CHARS | '.' | ':' | PLX)* PN_CHARS | ':' | PLX)?
[169s] PLX ::= PERCENT | PN_LOCAL_ESC
[170s] PERCENT ::= '%' HEX HEX
[171s] HEX ::= [0-9] | [A-F] | [a-f]
[172s] PN_LOCAL_ESC ::= '\' ('_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%')
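The PNAME_LN entry in the term-mapping table describes how a prefixed name is expanded. Each PN_LOCAL_ESC sequence in the local part has its backslash dropped, and the result is appended to the namespace bound to the prefix. The following Python sketch illustrates this under two assumptions. First, the namespaces map is taken to be a plain dict. Second, the function name expand_pname is hypothetical. PERCENT sequences such as %20 are deliberately left as they are, because only the reserved-character escapes are unescaped.

```python
import re
from typing import Dict

# A backslash followed by one of the [172s] PN_LOCAL_ESC characters.
PN_LOCAL_ESC = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")


def expand_pname(pname: str, namespaces: Dict[str, str]) -> str:
    """Expand a PNAME_LN or PNAME_NS token into an IRI string.

    The prefix (possibly empty) is everything before the first ':';
    PN_PREFIX cannot contain ':', so that split is unambiguous.
    """
    prefix, sep, local = pname.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {pname!r}")
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # Remove the escaping backslash; PERCENT (%XX) sequences are kept unchanged.
    return namespaces[prefix] + PN_LOCAL_ESC.sub(r"\1", local)


ns = {"ex": "http://example.org/", "": "http://xmlns.com/foaf/0.1/"}
print(expand_pname(":knows", ns))        # http://xmlns.com/foaf/0.1/knows
print(expand_pname(r"ex:a\.b%20c", ns))  # http://example.org/a.b%20c
```

Raising KeyError for an undeclared prefix mirrors the requirement that the namespaces map MUST have a namespace for the prefix. A real parser would report this as a syntax error.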
The RDF Concepts and Abstract Syntax [RDF-CONCEPTS] specification defines three types of RDF Term: IRIs, literals and blank nodes. Literals are composed of a lexical form and an optional language tag [BCP47] or datatype IRI. An extra type, prefix, is used during parsing to map string identifiers to namespace IRIs.
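The following Python sketch shows one way to model these term types in code. The class names are assumptions for illustration. They are not an API defined by this specification or by RDF Concepts. The sketch encodes only the constraint stated above: a literal carries a lexical form plus at most one of a language tag or a datatype IRI. The parsing-time prefix type is not modelled here, since it is simply a key into the namespaces map.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    language: Optional[str] = None   # e.g. "en" from LANGTAG, without the '@'
    datatype: Optional[IRI] = None

    def __post_init__(self) -> None:
        # A literal has a language tag or a datatype IRI, not both.
        if self.language is not None and self.datatype is not None:
            raise ValueError("literal cannot have both a language tag and a datatype")


Term = Union[IRI, BlankNode, Literal]
```

Frozen dataclasses compare by value and are hashable, so terms built this way can be placed directly in a set of triples.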
This section maps a string conforming to the grammar to a set of triples by mapping strings that match productions and lexical tokens to RDF terms or their components (e.g. language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.
Parsing Turtle requires a state of five items:
baseURI: When the base production is reached, the second rule argument, IRIREF, is the base URI used for relative IRI resolution (test: base1 base2).
namespaces: The second and third rule arguments (PNAME_NS and IRIREF) in the prefixID production assign a namespace name (IRIREF) for the prefix (PNAME_NS). Outside of a prefixID production, any PNAME_NS is substituted with the corresponding namespace (test: prefix1 escapedNamespace1). Note that the prefix may be an empty string, per the PNAME_NS production: (PN_PREFIX)? ":" (test: default1).
bnodeLabels: A mapping from string to blank node.
curSubject: The curSubject is bound to the subject production.
curPredicate: The curPredicate is bound to the verb production. If the token matched was "a", curPredicate is bound to the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type (test: type).
This table maps productions and lexical tokens to the RDF terms, or components of RDF terms, listed above:
production | type | procedure |
---|---|---|
IRIREF | IRI | The characters between "<" and ">" are unescaped¹ to form the unicode string of the IRI. Relative IRI resolution is performed per . |
IRIREF | IRI | The characters between "<" and ">" are taken, with the numeric escape sequences unescaped, to form the unicode string of the IRI. Relative IRI resolution is performed per Section 6.3. |
PNAME_NS | prefix | The potentially empty unicode string matching the first sequence of the rule, PN_PREFIX, is a key into the namespaces map. |
PNAME_NS | prefix | When used in a prefixID or sparqlPrefix production, the prefix is the potentially empty unicode string matching the first argument of the rule, and is a key into the namespaces map. |
PNAME_NS | IRI | When used in a PrefixedName production, the iri is the value in the namespaces map corresponding to the first argument of the rule. |
PNAME_LN | IRI | A prefix is identified by the first argument, PNAME_NS. The namespaces map has a corresponding namespace. The unicode string of the IRI is formed by concatenating this namespace and the second argument, PN_LOCAL. |
PNAME_LN | IRI | A potentially empty prefix is identified by the first sequence, PNAME_NS. The namespaces map MUST have a corresponding namespace. The unicode string of the IRI is formed by unescaping the reserved characters in the second argument, PN_LOCAL, and concatenating this onto the namespace. |
STRING_LITERAL1 | lexical form | The characters between the outermost "'"s are unescaped¹ to form the unicode string of a lexical form. |
STRING_LITERAL2 | lexical form | The characters between the outermost '"'s are unescaped¹ to form the unicode string of a lexical form. |
STRING_LITERAL_LONG1 | lexical form | The characters between the outermost "'''"s are unescaped¹ to form the unicode string of a lexical form. |
STRING_LITERAL_LONG2 | lexical form | The characters between the outermost '"""'s are unescaped¹ to form the unicode string of a lexical form. |
STRING_LITERAL1 | lexical form | The characters between the outermost "'"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
STRING_LITERAL2 | lexical form | The characters between the outermost '"'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
STRING_LITERAL_LONG1 | lexical form | The characters between the outermost "'''"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
STRING_LITERAL_LONG2 | lexical form | The characters between the outermost '"""'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
LANGTAG | language tag | The characters following the @ form the unicode string of the language tag. |
RDFLiteral | literal | The literal has a lexical form of the first rule argument, String, and either a language tag of LANGTAG or a datatype IRI of iri, depending on which rule matched the input. If neither a language tag nor a datatype IRI is provided, the literal has a datatype of xsd:string. |
INTEGER | literal | The literal has a lexical form of the input string, and a datatype of xsd:integer. |
DECIMAL | literal | The literal has a lexical form of the input string, and a datatype of xsd:decimal. |
DOUBLE | literal | The literal has a lexical form of the input string, and a datatype of xsd:double. |
BooleanLiteral | literal | The literal has a lexical form of true or false, depending on which matched the input, and a datatype of xsd:boolean. |
BLANK_NODE_LABEL | blank node | The string matching the second argument, PN_LOCAL, is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated. |
ANON | blank node | A blank node is generated. |
blankNodePropertyList | blank node | A blank node is generated. Note the rules for blankNodePropertyList in the next section. |
collection | blank node | A blank node is generated. Note the rules for collection in the next section. |
collection | blank node | For non-empty lists, a blank node is generated. Note the rules for collection in the next section. |
collection | IRI | For empty lists, the resulting IRI is rdf:nil. Note the rules for collection in the next section. |
¹ The Escape Sequences section defines a mapping from escaped unicode strings to unicode strings. The following lexical tokens are unescaped to produce unicode strings: IRIREF, STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1 and STRING_LITERAL_LONG2.
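The unescaping referred to in this footnote can be sketched in Python as a single left-to-right substitution over the text between the delimiters. This is an illustration under assumptions, not the normative Escape Sequences algorithm. The name unescape is hypothetical. The function decodes both UCHAR and ECHAR, as string literals require; for IRIREF only the UCHAR branch would apply. Decoding both kinds of escape in one pass is important. For example, \u005Cn must become a backslash followed by n, not a newline.

```python
import re

# ECHAR [159s]: the character after the backslash and what it stands for.
ECHAR_MAP = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
             '"': '"', "'": "'", "\\": "\\"}

# UCHAR [26] (\uXXXX or \UXXXXXXXX) or ECHAR; 'u' and 'U' are case-sensitive.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})|\\([tbnrf\"'\\])")


def unescape(body: str) -> str:
    """Decode UCHAR and ECHAR sequences in one left-to-right pass."""
    def replace(m):
        hex_digits = m.group(1) or m.group(2)
        if hex_digits:
            return chr(int(hex_digits, 16))  # numeric escape -> code point
        return ECHAR_MAP[m.group(3)]         # string escape -> character
    return ESCAPE.sub(replace, body)


print(repr(unescape(r"caf\u00E9\tbar")))  # 'café\tbar'
```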
A Turtle document defines an RDF graph composed of a set of RDF triples.
The subject production sets the curSubject.
The verb production sets the curPredicate.
Each object N in the document produces an RDF triple: curSubject curPredicate N .
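The five-item parser state and the triple-producing rule can be sketched as a small Python class. The class and method names (ParserState, resolve, bnode, on_subject, on_verb, on_object) are hypothetical. Terms are represented as plain strings. Relative IRIs are resolved with urllib.parse.urljoin, which this sketch assumes is close enough to RFC 3986 resolution for illustration.

```python
import itertools
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class ParserState:
    """baseURI, namespaces, bnodeLabels, curSubject and curPredicate."""

    def __init__(self, base_uri: Optional[str] = None) -> None:
        self.base_uri = base_uri
        self.namespaces: Dict[str, str] = {}
        self.bnode_labels: Dict[str, str] = {}
        self.cur_subject: Optional[str] = None
        self.cur_predicate: Optional[str] = None
        self.triples: List[Tuple[str, str, str]] = []
        self._ids = itertools.count()

    def resolve(self, iri: str) -> str:
        # Relative IRI resolution against baseURI (urljoin as a stand-in).
        return urljoin(self.base_uri, iri) if self.base_uri else iri

    def bnode(self, label: Optional[str] = None) -> str:
        # The same label always maps to the same blank node; no label -> fresh node.
        if label is not None and label in self.bnode_labels:
            return self.bnode_labels[label]
        node = f"_:b{next(self._ids)}"
        if label is not None:
            self.bnode_labels[label] = node
        return node

    def on_subject(self, term: str) -> None:
        self.cur_subject = term

    def on_verb(self, token: str) -> None:
        # The keyword 'a' stands for rdf:type.
        self.cur_predicate = RDF_TYPE if token == "a" else token

    def on_object(self, term: str) -> None:
        # Each object N produces the triple: curSubject curPredicate N .
        self.triples.append((self.cur_subject, self.cur_predicate, term))
```

For example, with base_uri set to http://example.org/a/b, calling on_subject(resolve("c")), then on_verb("a"), then on_object("http://example.org/Thing") records the triple (http://example.org/a/c, rdf:type, http://example.org/Thing).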
Beginning the blankNodePropertyList production records the curSubject and curPredicate, and sets curSubject to a novel blank node B.
Finishing the blankNodePropertyList production restores curSubject and curPredicate.
The node produced by matching blankNodePropertyList is the blank node B.
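Blank node property lists can nest, so a parser can keep the recorded curSubject and curPredicate values on a stack. The following Python sketch uses hypothetical names and represents terms as plain strings. It assumes the caller emits the enclosing triple only after the production has finished and returned B.

```python
from typing import List, Optional, Tuple


class PropertyListState:
    def __init__(self) -> None:
        self.cur_subject: Optional[str] = None
        self.cur_predicate: Optional[str] = None
        self.triples: List[Tuple[Optional[str], Optional[str], str]] = []
        self._saved: List[Tuple[Optional[str], Optional[str]]] = []
        self._count = 0

    def emit(self, obj: str) -> None:
        self.triples.append((self.cur_subject, self.cur_predicate, obj))

    def begin_property_list(self) -> str:
        # Record curSubject/curPredicate, then make a novel blank node B the subject.
        self._saved.append((self.cur_subject, self.cur_predicate))
        self._count += 1
        self.cur_subject = f"_:{self._count}"
        return self.cur_subject

    def end_property_list(self, b: str) -> str:
        # Restore the recorded values; the node produced by the production is B.
        self.cur_subject, self.cur_predicate = self._saved.pop()
        return b


# Roughly: ericP knows [ mbox <mailto:timbl@w3.org> ]
st = PropertyListState()
st.cur_subject, st.cur_predicate = "ericP", "knows"
b = st.begin_property_list()
st.cur_predicate = "mbox"
st.emit("<mailto:timbl@w3.org>")
st.emit(st.end_property_list(b))  # emitted after restore: ericP knows _:1
```

A stack is used rather than a single saved pair so that a property list nested inside another one restores the correct outer values.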
Beginning the collection production records the curSubject and curPredicate.
Each object in the collection production has a curSubject set to a novel blank node B and a curPredicate set to rdf:first.
Each object objectₙ after the first produces a triple: objectₙ₋₁ rdf:rest objectₙ .
Finishing the collection production creates an additional triple curSubject rdf:rest rdf:nil and restores curSubject and curPredicate.
The node produced by matching collection is the first blank node B for non-empty lists and rdf:nil for empty lists.
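These collection rules can be sketched in Python as a function that returns the node produced by the production together with the triples it creates. The sketch reads the rdf:rest triple as linking consecutive list nodes (the novel blank nodes B), which is what an RDF list requires. The names build_collection and new_bnode are hypothetical.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"
Triple = Tuple[str, str, str]


def build_collection(objects: Sequence[str],
                     new_bnode: Callable[[], str]) -> Tuple[str, List[Triple]]:
    """Return (node produced by the collection, triples created)."""
    triples: List[Triple] = []
    head = prev = None
    for obj in objects:
        node = new_bnode()                          # novel blank node B for this object
        triples.append((node, RDF_FIRST, obj))
        if prev is None:
            head = node                             # first B is the collection's node
        else:
            triples.append((prev, RDF_REST, node))  # link previous list node to this one
        prev = node
    if prev is None:
        return RDF_NIL, triples                     # empty list () produces rdf:nil
    triples.append((prev, RDF_REST, RDF_NIL))       # terminate the list
    return head, triples


ids = itertools.count(1)
head, triples = build_collection(['"a"', '"b"'], lambda: f"_:b{next(ids)}")
```

In this example, head is _:b1. The four triples give each node its rdf:first value, link _:b1 to _:b2 through rdf:rest, and end with _:b2 rdf:rest rdf:nil.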
Beginning the collection production records the curSubject and curPredicate, allocates a novel blank node Bhead, and sets curSubject and curPredicate to Bhead and rdf:first respectively.
Each object in collection allocates a novel blank node Bₙ, creates an additional triple curSubject rdf:rest Bₙ and sets curSubject to Bₙ.
Finishing the collection production creates an additional triple curSubject rdf:rest rdf:nil and restores curSubject and curPredicate.
The node produced by matching collection is the blank node Bhead.
The following informative example shows the semantic actions performed when parsing this Turtle document with an LALR(1) parser:
Map the prefix ericFoaf to the IRI http://www.w3.org/People/Eric/ericP-foaf.rdf#.
Map the empty prefix to the IRI http://xmlns.com/foaf/0.1/.
Assign curSubject the IRI http://www.w3.org/People/Eric/ericP-foaf.rdf#ericP.
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/givenName.
Emit an RDF triple: <...rdf#ericP> <.../givenName> "Eric" .
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/knows.
Emit an RDF triple: <...rdf#ericP> <.../knows> <...who/dan-brickley> .
Emit an RDF triple: <...rdf#ericP> <.../knows> _:1 .
Save curSubject and reassign it to the blank node _:1.
Save curPredicate.
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/mbox.
Emit an RDF triple: _:1 <.../mbox> <mailto:timbl@w3.org> .
Restore curSubject and curPredicate to their saved values (<...rdf#ericP>, <.../knows>).
Emit an RDF triple: <...rdf#ericP> <.../knows> <http://getopenid.com/amyvdh> .