W3C

N-Triples

A line-based syntax for an RDF graph

W3C Working Group Note 09 April 2013

This version:
http://www.w3.org/TR/2013/NOTE-n-triples-20130409/
Latest published version:
http://www.w3.org/TR/n-triples/
Latest editor's draft:
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/n-triples.html
Previous version:
Editor:
Gavin Carothers, Lex Machina, Inc
Author:
David Beckett

Abstract

N-Triples is a line-based, plain text format for encoding an RDF graph.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

N-Triples was originally defined as a syntax for the RDF Test Cases [RDF-TESTCASES] document. Due to its popularity as an exchange format, the RDF Working Group decided to publish an updated version. This document is intended to become a Working Group Note.

This document was published by the RDF Working Group as a First Public Working Group Note. If you wish to make comments regarding this document, please send them to public-rdf-comments@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

This document defines an easy to parse line-based subset of Turtle [TURTLE-TR] named N-Triples.

The syntax is a revised version of N-Triples as originally defined in the RDF Test Cases [RDF-TESTCASES] document. Its original intent was for writing test cases, but it has proven to be popular as an exchange format for RDF data.

An N-Triples document contains no parsing directives.

N-Triples triples are a sequence of RDF terms representing the subject, predicate and object of an RDF Triple. These may be separated by white space (spaces #x20 or tabs #x9). This sequence is terminated by a '.' and a new line (optional at the end of a document).

Example 1

N-Triples triples are also Turtle simple triples, but Turtle includes other representations of RDF terms and abbreviations of RDF Triples. When parsed by a Turtle parser, data in the N-Triples format will produce exactly the same triples as a parser for the restricted N-Triples language.

The RDF graph represented by an N-Triples document contains exactly each triple matching the N-Triples triple production.

2. N-Triples Language

2.1 Simple Triples

The simplest triple statement is a sequence of (subject, predicate, object) terms, separated by whitespace and terminated by '.' after each triple.

Example 2

2.2 IRIs

IRIs may be written only as absolute IRIs. IRIs are enclosed in '<' and '>' and may contain numeric escape sequences (described below). For example, <http://example.org/#green-goblin>.
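The following non-normative Python sketch illustrates how an IRI token could be checked against the IRIREF production and how its numeric escape sequences could be decoded. The names parse_iriref, decode_uchar and SCHEME are hypothetical, and the scheme-based test for an absolute IRI is a simplifying assumption of the sketch, not a full RFC 3987 validation.

```python
import re

# UCHAR, production [12]: \uXXXX or \UXXXXXXXX with hexadecimal digits.
UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"

# IRIREF, production [10]: '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
IRIREF = re.compile(r'<((?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r")*)>")

# Simplified absoluteness test (an assumption of this sketch): the IRI must
# start with a scheme such as "http:".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def decode_uchar(text):
    """Replace each numeric escape sequence with the character it denotes."""
    return re.sub(UCHAR, lambda m: chr(int(m.group(0)[2:], 16)), text)


def parse_iriref(token):
    """Return the IRI written as `token`, or raise ValueError."""
    match = IRIREF.fullmatch(token)
    if match is None:
        raise ValueError("not an IRIREF: %r" % token)
    iri = decode_uchar(match.group(1))
    if SCHEME.match(iri) is None:
        raise ValueError("relative IRIs are not allowed: %r" % token)
    return iri
```

Under these assumptions, parse_iriref("<http://example.org/#green-goblin>") returns the IRI without its delimiters, while a relative form such as </some/path/> is rejected.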

2.3 RDF Literals

Literals are used to identify values such as strings, numbers, and dates.

Literals (Grammar production Literal) have a lexical form followed by a language tag, a datatype IRI, or neither. The representation of the lexical form consists of an initial delimiter " (U+0022), a sequence of permitted characters, numeric escape sequences, or string escape sequences, and a final delimiter. Literals may not contain the characters ", LF, or CR. In addition '\' (U+005C) may not appear in any quoted literal except as part of an escape sequence. The corresponding RDF lexical form is the characters between the delimiters, after processing any escape sequences. If present, the language tag is preceded by a '@' (U+0040). If there is no language tag, there may be a datatype IRI, preceded by '^^' (U+005E U+005E). If there is no datatype IRI and no language tag, the datatype is xsd:string.
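As a non-normative illustration, the Python sketch below splits a literal token into its lexical form, datatype IRI and language tag, processing escape sequences along the way. The function names and the returned tuple layout are hypothetical. The set of string escapes follows the ECHAR production as printed in this document, and the datatype IRI is matched loosely (anything between '<' and '>'), which is an assumption made to keep the sketch short.

```python
import re

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# String escapes from the ECHAR production as printed in this document.
ECHAR_MAP = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
             '"': '"', "'": "'"}

# One escape sequence: an ECHAR or a UCHAR.
ESCAPE = re.compile(r'\\(?:([tbnrf"\'])|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))')

# Quoted string followed by an optional datatype IRI or language tag.
LITERAL = re.compile(
    r'"((?:[^"\\\n\r]|\\[tbnrf"\']|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*)"'
    r'(?:\^\^<([^<>]*)>|@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*))?'
)


def unescape(body):
    """Process string and numeric escape sequences in a validated body."""
    def replace(match):
        if match.group(1) is not None:
            return ECHAR_MAP[match.group(1)]
        return chr(int(match.group(2) or match.group(3), 16))
    return ESCAPE.sub(replace, body)


def parse_literal(token):
    """Return (lexical form, datatype IRI or None, language tag or None)."""
    match = LITERAL.fullmatch(token)
    if match is None:
        raise ValueError("not a literal: %r" % token)
    lexical = unescape(match.group(1))
    datatype, language = match.group(2), match.group(3)
    if language is not None:
        return (lexical, None, language)
    # No language tag and no datatype IRI: the datatype is xsd:string.
    return (lexical, datatype or XSD_STRING, None)
```

For instance, the token "line one\nline two"@en (with the two characters \ and n written out) yields a lexical form containing a real line feed and the language tag en, whereas a token containing an unescaped line feed is rejected.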

Issue 1

Include examples with a few escapes for new lines, etc

2.4 RDF Blank Nodes

RDF blank nodes in N-Triples are expressed as _: followed by a blank node label which is a series of name characters. The characters in the label are built upon PN_CHARS_BASE, liberalized as follows:

A fresh RDF blank node is allocated for each unique blank node label in a document. Repeated use of the same blank node label identifies the same RDF blank node.
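The allocation rule above can be sketched in Python as follows. This is a non-normative illustration: the BlankNode and BlankNodeAllocator classes are hypothetical names, one allocator instance is assumed to be created per document, and the label pattern is transcribed from the BLANK_NODE_LABEL production in section 5.

```python
import re

# Character classes transcribed from productions [157s], [158s] and [160s].
PN_CHARS_BASE = (
    r"A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
PN_CHARS_U = PN_CHARS_BASE + "_:"
PN_CHARS = PN_CHARS_U + r"\-0-9\u00B7\u0300-\u036F\u203F-\u2040"

# BLANK_NODE_LABEL, production [141s].
BLANK_NODE_LABEL = re.compile(
    "_:([" + PN_CHARS_U + "0-9](?:[" + PN_CHARS + ".]*[" + PN_CHARS + "])?)"
)


class BlankNode:
    """Opaque node; two blank nodes are the same only if they are identical."""


class BlankNodeAllocator:
    """Maps blank node labels to blank nodes for a single document."""

    def __init__(self):
        self._nodes = {}

    def node_for(self, token):
        match = BLANK_NODE_LABEL.fullmatch(token)
        if match is None:
            raise ValueError("not a blank node label: %r" % token)
        label = match.group(1)
        # A fresh node on first use, the same node on every later use.
        if label not in self._nodes:
            self._nodes[label] = BlankNode()
        return self._nodes[label]
```

With this sketch, two occurrences of _:bob in one document resolve to the same object, while the same label read through a second allocator (a different document) resolves to a different node.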

Example 3

3. Changes from RDF Test Cases format

This section is non-normative.

4. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [RFC2119].

This specification defines conformance criteria for:

A conforming N-Triple document is a Unicode string that conforms to the grammar and additional constraints defined in section 5. Grammar, starting with the ntriplesDoc production. An N-Triple document serializes an RDF graph.

A conforming N-Triple parser is a system capable of reading N-Triple documents on behalf of an application. It makes the serialized RDF graph, as defined in section 6. Parsing, available to the application, usually through some form of API.

Issue 2

N-Triple serializers are not defined here; they will likely include optional behavior to conform to the RDF Test Cases syntax as well.

The IRI that identifies the N-Triple language is: http://www.w3.org/ns/formats/N-Triple

4.1 Media Type and Content Encoding

The media type of N-Triples is application/n-triples. The content encoding of N-Triples is always UTF-8. See N-Triples Media Type for the media type registration form.

4.1.1 Other Media Types

N-Triples has historically been provided with other media types. N-Triples may also be provided as text/plain. When used in this way, N-Triples MUST use the escaped form of any character outside US-ASCII. As N-Triples is a subset of Turtle, an N-Triples document MAY also be provided as text/turtle. In both of these cases the document is not an N-Triples document, as an N-Triples document is only provided as application/n-triples.
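A minimal, non-normative Python sketch of the escaping needed for text/plain delivery is shown below. The function name escape_non_ascii is hypothetical. It rewrites every character outside US-ASCII using the \uXXXX or \UXXXXXXXX form and is meant to be applied to the content of IRIs and literals.

```python
def escape_non_ascii(text):
    """Rewrite characters outside US-ASCII as \\uXXXX or \\UXXXXXXXX."""
    pieces = []
    for char in text:
        code_point = ord(char)
        if code_point < 0x80:
            pieces.append(char)                     # US-ASCII stays as-is
        elif code_point <= 0xFFFF:
            pieces.append("\\u%04X" % code_point)   # four hex digits
        else:
            pieces.append("\\U%08X" % code_point)   # eight hex digits
    return "".join(pieces)
```

Because the grammar in section 5 allows numeric escape sequences only inside IRIREF and STRING_LITERAL_QUOTE, a blank node label containing non-ASCII characters cannot be escaped this way and would have to be replaced by an ASCII label instead.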

5. Grammar

An N-Triples document is a Unicode [UNICODE] character string encoded in UTF-8. Only Unicode code points in the range U+0 to U+10FFFF inclusive are allowed.

The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].

Escape sequence rules are the same as Turtle [TURTLE-TR]. However, as only the STRING_LITERAL_QUOTE production is allowed, new lines in literals MUST be escaped.

[1] ntriplesDoc ::= triple? (EOL triple)* EOL?
[2] triple ::= WS* subject WS+ predicate WS+ object WS* '.' WS*
[3] subject ::= IRIREF | BLANK_NODE_LABEL
[4] predicate ::= IRIREF
[5] object ::= IRIREF | BLANK_NODE_LABEL | literal
[6] literal ::= STRING_LITERAL_QUOTE ('^^' IRIREF | '@' LANG)?

Productions for terminals

[7] LANG ::= [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[8] EOL ::= [#xD#xA]+
[9] WS ::= [#x20#x9]
[10] IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
[11] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
[141s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
[12] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
[153s] ECHAR ::= '\' [tbnrf"']
[157s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[158s] PN_CHARS_U ::= PN_CHARS_BASE | '_' | ':'
[160s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[162s] HEX ::= [0-9] | [A-F] | [a-f]
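The productions above translate fairly directly into regular expressions. The Python sketch below is a non-normative illustration of that translation: parse_ntriples is a hypothetical name, terms are returned as raw tokens without escape processing, and the blank node label pattern is an ASCII-only simplification of production [141s].

```python
import re

UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"
IRIREF = r'<(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r")*>"
STRING = r'"(?:[^\x22\x5C\x0A\x0D]|\\[tbnrf"\']|' + UCHAR + r')*"'
LANG = r"[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"
LITERAL = STRING + r"(?:\^\^" + IRIREF + "|@" + LANG + ")?"
# Assumption: ASCII-only stand-in for BLANK_NODE_LABEL, production [141s].
BNODE = r"_:[A-Za-z0-9_:](?:[A-Za-z0-9_:.\-]*[A-Za-z0-9_:\-])?"
WS = r"[\x20\x09]"

# triple, production [2].
TRIPLE = re.compile(
    WS + "*(?P<s>" + IRIREF + "|" + BNODE + ")"
    + WS + "+(?P<p>" + IRIREF + ")"
    + WS + "+(?P<o>" + IRIREF + "|" + BNODE + "|" + LITERAL + ")"
    + WS + r"*\." + WS + "*"
)
# EOL, production [8].
EOL = re.compile(r"[\r\n]+")


def parse_ntriples(text):
    """Return the set of (subject, predicate, object) tokens in `text`."""
    graph = set()
    for line in EOL.split(text):
        if line == "":
            continue  # left over from a leading or trailing EOL
        match = TRIPLE.fullmatch(line)
        if match is None:
            raise ValueError("not an N-Triples triple: %r" % line)
        graph.add((match.group("s"), match.group("p"), match.group("o")))
    return graph
```

Splitting on EOL first is safe because no term may contain an unescaped CR or LF, so every non-empty piece must match the triple production. Returning a set reflects that an RDF graph contains each triple at most once.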

6. Parsing

Issue 3

It may be simple, but should still be defined.

A. Summary of differences in N-Triples and Turtle

This section is non-normative.

A.1 Triples

This section is non-normative.

N-Triples only allows for simple triple statements which MUST NOT contain new lines. A single triple is always a single line of the document.

A.2 IRI Representations

Representation                Turtle  N-Triples  Example
absolute IRI                  yes     yes        <http://a.example/some/path/>
relative IRI                  yes     no         </some/path/>
prefixed name                 yes     no         rdfs:label
a for the predicate rdf:type  yes     no         a

A.3 Literal Representations

Representation                                    Turtle  N-Triples  Example
single-quoted single-line lexical representation  yes     no         'some literal'
double-quoted single-line lexical representation  yes     yes        "some literal"
single-quoted multi-line lexical representation   yes     no         '''some
                                                                     literal'''
double-quoted multi-line lexical representation   yes     no         """some
                                                                     literal"""
abbreviated numeric                               yes     no         13
abbreviated boolean                               yes     no         true

A.4 Summary of Blank Node Representations in N-Triples and Turtle

Representation            Turtle  N-Triples  Example
labeled blank node        yes     yes        <http://a.example/who#Alice> <http://xmlns.com/foaf/0.1/knows> _:bob .
anonymous node            yes     no         <http://a.example/who#Alice> foaf:knows [] .
blank node property list  yes     no         <http://a.example/who#Alice> foaf:knows [ foaf:name "Bob" ] .

B. N-Triples Internet Media Type, File Extension and Macintosh File Type

Contact:
Eric Prud'hommeaux
See also:
How to Register a Media Type for a W3C Specification
Internet Media Type registration, consistency of use
TAG Finding 3 June 2002 (Revised 4 September 2002)

The Internet Media Type / MIME Type for N-Triples is "application/n-triples".

It is recommended that N-Triples files have the extension ".nt" (all lowercase) on all platforms.

It is recommended that N-Triples files stored on Macintosh HFS file systems be given a file type of "TEXT".

The information that follows will be submitted to the IESG for review, approval, and registration with IANA.

Type name:
application
Subtype name:
n-triples
Required parameters:
None
Optional parameters:
None
Encoding considerations:
The syntax of N-Triples is expressed over code points in Unicode [UNICODE]. The encoding is always UTF-8 [UTF-8].
Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-F].
Security considerations:
N-Triples is a general-purpose assertion language; applications may evaluate given data to infer more assertions or to dereference IRIs, invoking the security considerations of the scheme for that IRI. Note in particular, the privacy issues in [RFC3023] section 10 for HTTP IRIs. Data obtained from an inaccurate or malicious data source may lead to inaccurate or misleading conclusions, as well as the dereferencing of unintended IRIs. Care must be taken to align the trust in consulted resources with the sensitivity of the intended use of the data; inferences of potential medical treatments would likely require different trust than inferences for trip planning.
N-Triples is used to express arbitrary application data; security considerations will vary by domain of use. Security tools and protocols applicable to text (e.g. PGP encryption, MD5 sum validation, password-protected compression) may also be used on N-Triples documents. Security/privacy protocols must be imposed which reflect the sensitivity of the embedded information.
N-Triples can express data which is presented to the user, for example, RDF Schema labels. Application rendering strings retrieved from untrusted N-Triples documents must ensure that malignant strings may not be used to mislead the reader. The security considerations in the media type registration for XML ([RFC3023] section 10) provide additional guidance around the expression of arbitrary data and markup.
N-Triples uses IRIs as term identifiers. Applications interpreting data expressed in N-Triples should address the security issues of Internationalized Resource Identifiers (IRIs) [RFC3987] Section 8, as well as Uniform Resource Identifier (URI): Generic Syntax [RFC3986] Section 7.
Multiple IRIs may have the same appearance. Characters in different scripts may look similar (a Cyrillic "о" may appear similar to a Latin "o"). A character followed by combining characters may have the same visual representation as another character (LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT has the same visual representation as LATIN SMALL LETTER E WITH ACUTE). Any person or application that is writing or interpreting data in N-Triples must take care to use the IRI that matches the intended semantics, and avoid IRIs that may look similar. Further information about matching of similar characters can be found in Unicode Security Considerations [UNISEC] and Internationalized Resource Identifiers (IRIs) [RFC3987] Section 8.
Interoperability considerations:
There are no known interoperability issues.
Published specification:
This specification.
Applications which use this media type:
No widely deployed applications are known to use this media type. It may be used by some web services and clients consuming their data.
Additional information:
Magic number(s):
None.
File extension(s):
".nt"
Macintosh file type code(s):
"TEXT"
Person & email address to contact for further information:
Eric Prud'hommeaux <eric@w3.org>
Intended usage:
COMMON
Restrictions on usage:
None
Author/Change controller:
The N-Triples specification is the product of the RDF WG. The W3C reserves change control over this specification.

C. References

C.1 Normative references

[EBNF-NOTATION]
Tim Bray; Jean Paoli; C. M. Sperberg-McQueen; Eve Maler; François Yergeau. EBNF Notation. 26 November 2008. W3C Recommendation. URL: http://www.w3.org/TR/REC-xml/#sec-notation
[RDF-TESTCASES]
Jan Grant; Dave Beckett. RDF Test Cases. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[RFC3023]
M. Murata; S. St.Laurent; D. Kohn. XML Media Types. January 2001. Internet RFC 3023. URL: http://www.ietf.org/rfc/rfc3023.txt
[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet RFC 3986. URL: http://www.ietf.org/rfc/rfc3986.txt
[RFC3987]
M. Dürst; M. Suignard. Internationalized Resource Identifiers (IRIs). January 2005. Internet RFC 3987. URL: http://www.ietf.org/rfc/rfc3987.txt
[TURTLE-TR]
Eric Prud'hommeaux; Gavin Carothers. Turtle: Terse Triple Language. 19 February 2013. W3C Candidate Recommendation. URL: http://www.w3.org/TR/2013/CR-turtle-20130219/
[UNICODE]
The Unicode Consortium. The Unicode Standard. Defined by: The Unicode Standard, Version 6.2.0 (Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-07-8), as updated from time to time by the publication of new versions. URL: http://www.unicode.org/standard/versions/enumeratedversions.html
[UTF-8]
F. Yergeau. UTF-8, a transformation format of ISO 10646. IETF RFC 3629. November 2003. URL: http://www.ietf.org/rfc/rfc3629.txt

C.2 Informative references

[UNISEC]
Mark Davis; Michel Suignard. Unicode Security Considerations. 4 August 2010. URL: http://www.unicode.org/reports/tr36/