N-Triples

W3C RDF Core WG Internal Working Draft

This version: http://www.w3.org/2001/sw/RDFCore/ntriples/
Revision: 1.9
Latest version: http://www.w3.org/TR/rdf-testcases/#ntriples
Previous version: None

Status of this Document

PLEASE NOTE: This document has been superseded by the RDF Test Cases Working Draft. See N-Triples for more information.

1. Introduction

N-Triples is a line-based, plain text format for representing the correct answers for parsing RDF/XML[RDFMS] test cases as part of the RDF Core working group.

Test cases in N-Triples can be found linked from the RDF Core issues tracking area, especially the "attention developers" and "closed issues" sections.

This format was designed to be a fixed subset of N3[N3][N3-Primer] and hence N3 tools such as cwm, n-triples2kif.pl and Euler can be used to read and process it. cwm can output this format when invoked as "cwm -triples".

It is recommended, but not required, that N-Triples content is stored in files with an '.nt' suffix to distinguish them from N3.

The Internet Media Type / MIME Type of N-Triples is text/plain and the character encoding is 7-bit US-ASCII.

2. Extended Backus-Naur Form (EBNF) Grammar

An N-Triples document is a sequence of US-ASCII characters and is defined by the ntripleDoc grammar term below. Parsing it results in a sequence of RDF statements formed from the subject, predicate and object terms. The meaning of these terms is either defined in [RDFMS] or is still being worked out as part of the RDF Core WG activity.

This EBNF is the notation used in XML 1.0 second edition.

ntripleDoc	::=	line*
line	::=	ws* (comment | triple) ? eoln
comment	::=	'#' (character - ( cr | lf ) )*
triple	::=	subject ws+ predicate ws+ object ws* '.' ws*
subject	::=	uriref | namedNode
predicate	::=	uriref
object	::=	uriref | namedNode | literal
uriref	::=	'<' absoluteURI '>'
namedNode	::=	'_:' name
literal	::=	'"' string '"'
ws	::=	space | tab
eoln	::=	cr | lf | cr lf
space	::=	#x20 /* US-ASCII space - decimal 32 */
cr	::=	#xD /* US-ASCII carriage return - decimal 13 */
lf	::=	#xA /* US-ASCII linefeed - decimal 10 */
tab	::=	#x9 /* US-ASCII horizontal tab - decimal 9 */
string	::=	character* with escapes. Defined in section Strings
name	::=	[A-Za-z][A-Za-z0-9]*
absoluteURI	::=	( character - ( '<' | '>' | space ) )+
character	::=	[#x20-#x7E] /* US-ASCII space to decimal 126 */
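
As an illustration only, the productions above can be transcribed into regular expressions. The following is a minimal sketch in Python, which is not part of this draft. The names parse_line, parse_document, TRIPLE, COMMENT and BLANK are hypothetical, and parse_line expects a line whose eoln has already been removed. Literal content is matched using the escapes defined in section Strings.

```python
import re

# Building blocks transcribed from the grammar (illustrative names).
WS = r"[ \t]"
# uriref: '<' then one or more characters except '<', '>' and space, then '>'
URI = r"<[\x21-\x3B\x3D\x3F-\x7E]+>"
# namedNode: '_:' followed by a name that starts with a letter
NODE = r"_:[A-Za-z][A-Za-z0-9]*"
# literal: printable US-ASCII except '"' and '\', or one of the five escapes
LIT = r'"(?:[\x20\x21\x23-\x5B\x5D-\x7E]|\\[\\"nrt])*"'

TRIPLE = re.compile(
    rf"{WS}*(?P<s>{URI}|{NODE}){WS}+(?P<p>{URI}){WS}+"
    rf"(?P<o>{URI}|{NODE}|{LIT}){WS}*\.{WS}*"
)
COMMENT = re.compile(rf"{WS}*#[\x20-\x7E]*")
BLANK = re.compile(rf"{WS}*")


def parse_line(line):
    """Return (subject, predicate, object) as written, or None for a
    blank or comment line. Raises ValueError for anything else."""
    if BLANK.fullmatch(line) or COMMENT.fullmatch(line):
        return None
    match = TRIPLE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a valid N-Triples line: {line!r}")
    return match.group("s"), match.group("p"), match.group("o")


def parse_document(text):
    """Split on eoln (cr lf, cr or lf) and collect the triples.
    Simplification: a final line without an eoln is also accepted."""
    lines = re.split(r"\r\n|\r|\n", text)
    return [t for t in map(parse_line, lines) if t is not None]
```

For example, parse_line applied to a line holding a namedNode subject, a uriref predicate and a literal object returns the three terms with their delimiters intact; removing the delimiters and decoding escapes is left to the caller.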

3. Strings

N-Triples strings are characters from the character production range, with selected characters outside that range made available by \-escape sequences as follows:

Escape sequence	Encodes character
\\	US-ASCII backslash character (decimal 92, #x5c)
\"	US-ASCII double quote (decimal 34, #x22)
\n	US-ASCII linefeed (decimal 10, #xA) - lf character
\r	US-ASCII carriage return (decimal 13, #xD) - cr character
\t	US-ASCII horizontal tab (decimal 9, #x9) - tab character

This is a subset of the escapes allowed in [N3], which are in turn based on Python string literals. Only a subset is used because the full complexity is not needed. For example, a \ before a real newline (cr or lf) cannot be allowed, since that would break the line basis of N-Triples.
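
The escape table above can be read as a pair of mappings. The sketch below, in Python, is one possible reading and not a normative implementation; the names escape and unescape are hypothetical. Both functions work on the content between the double quotes of a literal, not on the quotes themselves.

```python
# Escape letter (the character after the backslash) -> decoded character.
DECODE = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t"}
# Decoded character -> full escape sequence.
ENCODE = {value: "\\" + key for key, value in DECODE.items()}


def unescape(string):
    """Decode the content of an N-Triples literal."""
    out = []
    i = 0
    while i < len(string):
        ch = string[i]
        if ch == "\\":
            nxt = string[i + 1 : i + 2]  # empty if the backslash is last
            if nxt not in DECODE:
                raise ValueError(f"bad escape at offset {i}")
            out.append(DECODE[nxt])
            i += 2
        elif ch == '"' or not (" " <= ch <= "~"):
            # An unescaped quote or a character outside #x20-#x7E.
            raise ValueError(f"character not allowed at offset {i}")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def escape(text):
    """Encode text as the content of an N-Triples literal."""
    out = []
    for ch in text:
        if ch in ENCODE:
            out.append(ENCODE[ch])
        elif " " <= ch <= "~":
            out.append(ch)
        else:
            # This draft defines no escape for other characters.
            raise ValueError(f"cannot represent {ch!r}")
    return "".join(out)
```

With these definitions, unescape(escape(text)) returns text for any text made of printable US-ASCII characters plus tab, linefeed and carriage return.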

4. Example

The following N-Triples file consists of three RDF statements:

<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett" .
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/creator> "Art Barstow" .
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/publisher> <http://www.w3.org/> .

which represents the following RDF/XML:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/2001/sw/RDFCore/ntriples/">
    <dc:creator>Art Barstow</dc:creator>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher rdf:resource="http://www.w3.org/"/>
  </rdf:Description>
</rdf:RDF>
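
The correspondence between the two forms can be sketched in code. The Python below is a deliberately narrow illustration and not an RDF/XML parser: it handles only rdf:Description elements with an rdf:about attribute whose children are simple property elements, which is all this example uses. The function name description_to_ntriples and the variable EXAMPLE are hypothetical. The statements come out in document order, which differs from the listing above; the order of statements carries no meaning.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def description_to_ntriples(xml_text):
    """Return N-Triples lines for the simple RDF/XML shape shown above."""
    root = ET.fromstring(xml_text)
    lines = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}", 1)
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                obj = f"<{resource}>"
            else:
                text = prop.text or ""
                # Only backslash and double quote are escaped in this sketch.
                text = text.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{text}"'
            lines.append(f"<{subject}> <{namespace}{local}> {obj} .")
    return lines


EXAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/2001/sw/RDFCore/ntriples/">
    <dc:creator>Art Barstow</dc:creator>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher rdf:resource="http://www.w3.org/"/>
  </rdf:Description>
</rdf:RDF>
"""

for line in description_to_ntriples(EXAMPLE):
    print(line)
```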

5. Issues

N-Triples is a text/plain MIME type format but we require it to express all Unicode[Unicode] characters. Consider the following solutions:
1. Add the following escapes (after Python) to allow N-Triples to encode Unicode characters:
  \uxxxx for characters [#x0-#x8],[#xB#xC],[#xE-#x1F],[#x7F-#xFFFF]
  \Uxxxxxxxx for characters [#x10000-#xFFFFFFFF]
  and maybe \xhh for characters [#x0-#x8],[#xB#xC],[#xE-#x1F],[#x7F-#xFF]
  but make sure each character has only one way to encode it - as recommended by Character Escaping in [Charmod].
  
  This does not match the recommendation that escapes should have an explicit terminator.
2. Add one escape \u [A-Fa-f0-9]{1,8} ';' to match the recommendation in Charmod that there should be an explicit terminator for escapes.
3. Make N-Triples a UTF-8 format and have no special escaping. There is growing language support for this; however, it makes it impossible to generate Unicode characters in N-Triples with plain-text tools.
  [Charmod] section Reference Processing Model recommends new text based formats that require compatibility with ASCII to use UTF-8 character encoding. This does not prevent having character escaping also.
Once one of 1-3 is chosen, add it to absoluteURI, following the recommendation in section Character Encoding in URI References in [Charmod].
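
To make the difference between solutions 1 and 2 concrete, the following Python sketch encodes the same text both ways. None of this is decided by this draft. The function names are hypothetical, the optional \xhh form is left out, and two details are assumptions made only for illustration: hexadecimal digits are written in upper case, and solution 2 is applied to the same set of characters as solution 1.

```python
# The five escapes already defined in section Strings.
SHORT = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def _encode(text, escape_code_point):
    out = []
    for ch in text:
        if ch in SHORT:
            out.append(SHORT[ch])
        elif " " <= ch <= "~":
            out.append(ch)  # printable US-ASCII is written as-is
        else:
            out.append(escape_code_point(ord(ch)))
    return "".join(out)


def encode_solution1(text):
    """Solution 1: fixed width, \\uxxxx up to #xFFFF, \\Uxxxxxxxx above."""
    def esc(cp):
        return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}"
    return _encode(text, esc)


def encode_solution2(text):
    """Solution 2: \\u, one to eight hex digits, then a ';' terminator."""
    return _encode(text, lambda cp: f"\\u{cp:X};")


sample = "caf\u00e9 \U00010348"
print(encode_solution1(sample))
print(encode_solution2(sample))
```

Solution 1 relies on the fixed number of digits to find the end of each escape, while solution 2 relies on the ';' terminator, which is the property [Charmod] recommends.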

6. References

[RDFMS] Lassila and Swick (ed.), RDF Model and Syntax, W3C Recommendation, 22 February 1999.

[N3] Tim Berners-Lee, Notation 3, over period 1998-.

[N3-Primer] Tim Berners-Lee, Primer: Getting into RDF & Semantic Web using N3, period 2000-.

[Unicode] The Unicode Standard, Version 3.0, Addison Wesley, Reading MA, 2000, ISBN: 0-201-61633-5.

[Charmod] Martin J. Dürst, François Yergeau, Character Model for the World Wide Web, W3C Working Draft, 26 January 2001.

7. History

N-Triples was named and decided as the test-case format in the RDF Core WG 2001-06-01 meeting; see the minutes and chat logs.

It was based on discussions in the long RDF Core WG thread of May 2001, "Test cases: format of input and output", mostly summarised in this message from Dan Connolly.

The original version of the grammar was based on emails to the RDF Core list from Art Barstow: 1, 2 and a change to the 'triple' rule to allow ws* before '.' from Jan Grant.

Dave Beckett made several changes as documented in changes.

A. Changes

2001-09-06 V1.9
 Updated to point to the RDF Test Cases draft, into which this has been merged.
   Dave Beckett
  
2001-08-09 V1.8
 Added example section
 Minor edits
   Dave Beckett

2001-07-25 V1.7
 Added reference to Unicode specification
 Added references to Charmod and its impact on character encoding
 Updated issue resolutions.
 Moved references section before history.
   Dave Beckett

2001-07-24 V1.6
 Now a US-ASCII format (only characters 0..127 allowed).
 Changed eoln to be cr | lf | cr lf
 Renamed anonNode to namedNode to remove any implied meaning.
 Made character production use the correct notation for character range.
 Renamed qLiteral to literal.
   Dave Beckett


2001-07-19 V1.5

 Updated after comments from Graham Klyne:
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Jul/0231.html
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Jul/0235.html

 Grammar now uses the correct EBNF from the XML notation section.
 Changed name to always start with a letter
 Updated issues with suggested solutions for encoding, eoln.
 
   Dave Beckett

2001-07-10 V1.4

	Removed blankline, use ? in line production instead.
	  Dave Beckett

2001-07-09

	Added space term
	Added character term
	Added name term for defining anonNode (was Nmtoken)
	Defined absoluteURI inline rather than via URI spec.
	Moved grammar history to history section.
	Added changes section.
   Dave Beckett

2001-07-08

	Reworked grammar to bring out line basis.
	Remove vertical tab from ws.
	Made eoln be cr? lf
	Added string escaping section
	Split references into history + references sections
	Added section numbers, links
	Updated unicode encoding issues
	Lots of rewording.
	  Dave Beckett

2001-07-05

	grammar is now a table

	New file.
	  Dave Beckett

Dave Beckett, Institute for Learning and Research Technology, University of Bristol

Art Barstow, World Wide Web Consortium