W3C

Media Types Issues for Text RDF Formats

Status of this Document

Prepared by ericP for media type discussions with the IETF and a W3C Team discussion in January 2008.

Abstract

An XML serialization of RDF has been a standard since 1998. Two textual representations of RDF, n3 and turtle, have been in use since 2000 and 2001 respectively. This document describes the issues associated with selecting media types for these languages.

Cast of Characters

name: RDF
role: data model
current media type: N/A

name: RDFXML
role: XML serialization of RDF
current media type: application/rdf+xml
example:
<rdf:Description rdf:about="http://purl.org/commons/record/ncbi_gene/1812">
  <rdfs:label>human DRD1</rdfs:label>
</rdf:Description>

name: ntriples
role: simple serialization of RDF
current media type: text/plain
example:
<http://purl.org/commons/record/ncbi_gene/1812> <http://www.w3.org/2000/01/rdf-schema#label> "human DRD1" .

name: turtle
role: textual serialization of RDF
current media type: application/x-turtle
example:
ncbi_gene:1812 rdfs:label "human DRD1" .

name: n3
role: extension¹ of turtle language expressing a superset of RDF
current media type: text/rdf+n3 (not registered)
example:
hcls:kb foo:asserts { ncbi_gene:1812 rdfs:label "human DRD1" } .

¹ Note that n3 actually predates turtle, but considering it an extension is practical for media type consideration.

Issues

subtree (e.g. text/ vs. application/)
turtle and n3 are textual expressions of RDF and are intended to be human-readable/editable. They are more human-readable than ntriples as they add namespace prefixes, relative URIs, and abbreviations for some atoms.
charset parameter
Both turtle and n3 use only the UTF-8 character encoding. MIME (RFC 2046) and the media type registration process (RFC3023) allow a language to prescribe any character encoding as long as it doesn't confuse the interpretation of CR and LF. For compatibility with very old implementations of HTTP, HTTP/1.1 prescribes the use of a charset parameter for anything other than US-ASCII or ISO-8859-1 data that is served with a text/ Content-Type.
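
The HTTP/1.1 rule described above can be sketched from the recipient's side. This is a minimal illustration, assuming a strict RFC 2616 recipient; the function name effective_charset is hypothetical, and real browsers are reported elsewhere in this document to behave differently.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Charset a strict HTTP/1.1 recipient would assume for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_param("charset")  # surrounding quotes are stripped
    if explicit is not None:
        return str(explicit).lower()
    if msg.get_content_maintype() == "text":
        # HTTP/1.1 default for text/ types served without a charset parameter.
        return "iso-8859-1"
    # No HTTP-level default outside text/; the format's own rules apply.
    return None


for value in ("text/turtle", 'text/turtle; charset="UTF-8"', "application/x-turtle"):
    print(value, "->", effective_charset(value))
```

Under this reading, a turtle document served as text/turtle without a charset parameter would be taken as ISO-8859-1, which is the practical pressure behind the subtree question.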

Registration Application

The registration application included the following parameters:

Type name:
text
Subtype name:
n3
turtle
Mandatory parameters:
none
Optional parameters:
charset: this parameter may be required when transferring non-ASCII data across some protocols. If present, the value of charset is always UTF-8.
Encoding considerations:
The encoding is always UTF-8

with the expectation that, until HTTP sets no default charset (or a default charset of UTF-8), each HTTP transaction which includes non-ASCII characters will include a charset parameter:

Content-type: text/n3; charset="UTF-8"
Content-type: text/turtle; charset="UTF-8"
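
A server script could apply that expectation roughly as follows. This is only a sketch: the helper name content_type_for is hypothetical, and it assumes the body is available as a decoded string before it is serialized as UTF-8.

```python
def content_type_for(subtype: str, body: str) -> str:
    """Build a Content-Type value for a turtle or n3 document.

    The charset parameter is added only when the body contains non-ASCII
    characters, matching the expectation stated in the registration text.
    """
    if subtype not in ("turtle", "n3"):
        raise ValueError(f"unexpected subtype: {subtype!r}")
    if body.isascii():
        return f"text/{subtype}"
    return f'text/{subtype}; charset="UTF-8"'


print(content_type_for("turtle", 'ncbi_gene:1812 rdfs:label "human DRD1" .'))
print(content_type_for("n3", 'ex:md rdfs:label "Martin Dürst" .'))
```

Always sending the parameter would be simpler and equally valid, since the value is UTF-8 in every case.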

Discussion

subtree

While the choice of subtree may be forced for practical reasons (the desire to not include a charset parameter in every HTTP transaction), there remains the ideological choice between text/ and application/. The two camps principally divide on whether one's grandmother should see turtle and whether the text/ tree is meaningful if that is the only litmus test.

charset

MIME (RFC2046) says that the default character encoding is us-ascii unless specified in the registration process (RFC3023). n3 and turtle, as well as related languages like SPARQL, are explicitly UTF-8. As stated above, HTTP would require a charset parameter for these languages if they used text/ media types. While MIME applies to technologies like SMTP and NNTP, the vast majority of custom code that will need to accommodate media types for these is for HTTP server scripts serving turtle or n3 and HTTP client scripts requesting them.

Choosing an application/ media type obviates the need for including ;charset="UTF-8" in HTTP transactions. However, Martin Dürst asserts that modern browsers use the user preferences instead of 8859-1; there is no observable default charset in HTTP any more. If the community agrees that this is the only practice we need to worry about, we should update HTTP. The contraindications would be security considerations and the introduced need for user intervention to view documents that users used to view without manually selecting a charset. Re security: no UTF-8 document interpreted as 8859-1 will have any unintended control characters (u01-u1f). The principal debate here appears to be between Frank Ellermann preferring "default ASCII" and Martin Dürst preferring "no default", which allows the application to use the internal charset.
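
The security remark about control characters can be checked with a short sketch. The sample string and the helper names are illustrative assumptions; the check only covers the u01-u1f range named above.

```python
def c0_controls(text: str) -> set:
    """Characters of text that fall in the range U+0001 to U+001F."""
    return {ch for ch in text if "\x01" <= ch <= "\x1f"}


def misread_as_latin1(text: str) -> str:
    """Serialize text as UTF-8, then decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


sample = 'ex:md rdfs:label "Martin Dürst" .\n'
misread = misread_as_latin1(sample)

# Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so the only
# u01-u1f characters after misreading are the ones already in the source.
assert c0_controls(misread) == c0_controls(sample)
print(repr(misread))
```

The misread text is garbled for the reader, and bytes in the 0x80-0x9F range can still decode to other code points outside the u01-u1f range, so this supports only the narrow claim made in the paragraph.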

Mail Archive

Media types for RDF languages N3 and Turtle

Dec 17 Eric Prud'homme (8.2K) Media types for RDF languages N3 and Turtle
                              │ initial summary of the problem space
Dec 17 Garret Wilson   (7.6K) ├─>Re: Media types for RDF languages N3 and Turtle
                              │ │ pointer to discussion of media type for ntriples
Dec 17 Sean B. Palmer  (2.9K) │ └─>
                              │   │ is us-ascii the default encoding? changing to UTF-8 would require an RFC
Dec 17 Garret Wilson   (6.2K) │   └─>
                              │     │ text/plain defaults to us-ascii; other subtrees may pick their own defaults
                              │     │ could render text/ obsolete
Dec 17 Sean B. Palmer  (2.1K) │     └─>
                              │       │ does interpreting utf-8 as US-ASCII cause security issues?
                              │       │ DanC noted that civil disobedience was an option
Dec 17 Garret Wilson   (0.7K) │       └─>
                              │         │ +1 to civil disobedience if we document it
Dec 30 Garret Wilson   (1.6K) │         └─>
                              │             note that RFC2616 — HTTP 1.1 has a default charset of ISO-8859-1
Dec 17 Eric Prud'homme (8.3K) └─>[RESEND] Media types for RDF languages N3 and Turtle
                                │ same summary of the problem space (some thinkos and typos corrected)
Dec 19 Graham Klyne    (7.3K)   └─>Re: [RESEND] Media types for RDF languages N3 and Turtle
                                  │ use text/ *only* for media types intended *primarily* for human consumption
                                  │ use '-' instead of '+', e.g. application/rdf-turtlec (or provide good use case for '+')
Dec 20 Eric Prud'homme ( 10K)     ├─>Re: Re: [RESEND] Media types for RDF languages N3 and Turtle
                                  │ │ is the above metric (primarily for human consumption) shared by the community?
                                  │ │ is there precedent for '-'?
Dec 20 Graham Klyne    ( 10K)     │ └─>Re: [RESEND] Media types for RDF languages N3 and Turtle
                                  │     Ned Freed asserts that text/html was a mistake
                                  │     no precedent for '-'
                                  │     point of RDF is to *not* end up with a family of syntactically related languages
Dec 21 Garret Wilson   (2.1K)     └─>Re: [RESEND] Media types for RDF languages N3 and Turtle
                                    │ above metric (primarily for human consumption) precludes everything but text/plain
                                    │ propose criteria that match spirit of RFC2046:
                                    │ • all bytes compose text characters
                                    │ • always editable in a text editor
                                    │ • abstract values represented in text
Dec 21 Garret Wilson   (1.1K)       └─>
                                        • revision control system (e.g. CVS) treats it as text                              
    

Request for review of Turtle (an RDF serialization) media type: text/turtle

Dec 18 Eric Prud'homme (5.5K) Request for review of Turtle (an RDF serialization) media type: text/turtle
                              │ strawman request for text/turtle
Dec 18 Julian Reschke  (1.6K) ├─>Re: Request for review of Turtle (an RDF serialization) media type: text/turtle
                              │ │ why not application/?
                              │ │ note HTTP 1.1 default charset
                              │ │ see also HTTPbis issue 20
Dec 18 Frank Ellermann (0.3K) │ ├─>Unknown text/* subtypes (was: Request for review of Turtle (an RDF serialization) media type: text/turtle)
                              │ │ │ HTTP oddities shouldn't affect MIME registration
                              │ │ │ 2616bis can switch to "unknown text is ASCII"
Dec 18 Julian Reschke  (0.6K) │ │ ├─>Re: Unknown text/* subtypes
                              │ │ │ │ why the HTTP 1.1 rule?
Dec 18 Frank Ellermann (1.3K) │ │ │ └─>
                              │ │ │   │ some RFC history
Dec 26 Martin Duerst   (3.3K) │ │ │   └─>
                              │ │ │       necessary for backwards-compatibility with very early HTTP versions
                              │ │ │       backwards compatibility is no longer necessary
Dec 26 Martin Duerst   (1.2K) │ │ └─>Re: Unknown text/* subtypes (was: Request for review of Turtle (an RDFserialization) media type: text/turtle)
                              │ │   │ "there is no default" so the application can look for internal charset info
                              │ │   │ "unknown text is ASCII" would prohibit app from using the internal charset
Dec 26 Eric Prud'homme (3.2K) │ │   ├─>Re: Re: Unknown text/* subtypes (was: Request for review of Turtle (an RDFserialization) media type: text/turtle)
                              │ │   │   how will the browser know how to look for charset info?
Dec 28 Frank Ellermann (2.2K) │ │   ├─>Re: Unknown text/* subtypes
                              │ │   │ │ "default ASCII" would be consistent with MIME
                              │ │   │ │ docs with chars outside 00-7f should be erroneous without some charset
                              │ │   │ │ years after 2616bis, could migrate to "default UTF-8" 
                              │ │   │ │ current apps allow internal charsets to override if there is no explicit charset parameter (**contradicts Martin's assertion**)
                              │ │   │ │ nobody treats text/html as "default Latin-1"
Dec 28 Anne van Kester (0.7K) │ │   │ ├─>
                              │ │   │ │   existing software treats text/xml as it treats application/xml
Jan 13 Ian Hickson     (1.6K) │ │   │ └─>
                              │ │   │   │ (HTML4, HTML5, CSS) override both MIME and HTTP
                              │ │   │   │ defining behavior in HTTP likely to be ignored
                              │ │   │   │ use lower level (MIME) or app-level (XML, HTML, CSS, ...)
Jan 13 Eric Prud'homme (7.6K) │ │   │   ├─>Re: Re: Unknown text/* subtypes
                              │ │   │   │ │ consistent with Martin's assertion
                              │ │   │   │ │ what should the CRLF rules be?
                              │ │   │   │ │ when is "default" charset applied?
Jan 13 Ned Freed       (7.5K) │ │   │   │ └─>
                              │ │   │   │     MIME specs cover some SMTP-specific stuff
                              │ │   │   │     CRLF rules strengthened recently
                              │ │   │   │     text/ defaults to UTF-8, then 8859-1 won't fly for mail
Jan 15 Frank Ellermann (1.1K) │ │   │   └─>Re: Unknown text/* subtypes
                              │ │   │     │ assume-Latin-1 rule ignored by everyone, including the W3C validator
Jan 15 Ian Hickson     (0.6K) │ │   │     └─>
                              │ │   │       │ handling should be in [media-specific specs like] HTML
Jan 16 Frank Ellermann (0.9K) │ │   │       └─>
                              │ │   │           @@@ lost
Jan 04 Julian Reschke  (2.4K) │ │   └─>Re: Unknown text/* subtypes
                              │ │       2046 (text/* is US-ASCII), 2616 (text/* over HTTP is ISO8859-1), 3023 (text/xml is US-ASCII)
                              │ │       2616 should get out of it
Dec 18 Eric Prud'homme (4.6K) │ └─>Re: Re: Request for review of Turtle (an RDF serialization) media type: text/turtle
                              │   turtle is the most human-readable RDF format
                              │   text/ is useless if we can't use it here
Dec 19 James Cloos     (0.5K) └─>Re: Request for review of Turtle (an RDF serialization) media type: text/turtle
                                │ should unicode reference be to UCS (ISO 10646)?
Dec 20 Felix Sasaki    (1.0K)   └─>
                                  │ see Charmod Referencing Unicode and Charmod C062
Dec 20 James Cloos     (0.8K)     └─>
                                    │ IETF point of view is to prefer ISO over an industry organization
Dec 21 Felix Sasaki    (1.1K)       └─>
                                        Unicode provides additional semantics useful for implementers

media type for N-Triples (www-rdf-comments)

Oct 15 Tim Berners-Lee (  21) N-Triples MIME type should not be text/plain -- comment on RDF Test Cases.
                              │ propose text/rdf+n3 or text/rdf+n3; level=nt for NTriples
Oct 24 Graham Klyne    (  47) └─>Re: N-Triples MIME type should not be text/plain -- comment on RDF  Test Cases.
                                │ +xml convention assumed common consumers could fallback to suffix
                                │ ntriples isn't principally human-consumable
Nov 03 Garret Wilson   (  71)   └─>Re: N-Triples MIME type should not be text/plain -- comment on RDF   Test Cases.
                                  │ use of text/ is problematic RFC 4329
Nov 04 Garret Wilson   ( 108)     └─>Re: N-Triples MIME type should not be text/plain -- comment on RDF    Test Cases.
                                    │ background: 1 2
Nov 04 Graham Klyne    ( 148)       ├─>
                                    │ │ argument against text/
                                    │ │ history of +xml
Nov 04 Garret Wilson   (  27)       │ ├─>Re: N-Triples MIME type should not be text/plain -- comment on RDF     Test Cases.
                                    │ │ recipe use case introducing application/recipe+rdf+n3 and application/config+rdf+n3
Nov 04 Graham Klyne    (  71)       │ │ └─>
                                    │ │   │ distinguishing between RDF super-languages not useful due to open content model
Nov 04 Garret Wilson   (  39)       │ │   └─>Re: N-Triples MIME type should not be text/plain -- comment on RDF      Test Cases.
                                    │ │       having browser render application/recipe+rdf+n3 still useful
Nov 04 Garret Wilson   (  29)       │ └─>Re: N-Triples MIME type should not be text/plain -- comment on RDF     Test Cases.
                                    │ why *NOT* render application/...+rdf+n3 ?
Nov 05 Dan Brickley    (  22)       └─>Re: N-Triples MIME type should not be text/plain -- comment on RDF     Test Cases.
                                      │ note text/xml broken
Nov 04 Garret Wilson   (  61)         └─>Re: N-Triples MIME type should not be text/plain -- comment on RDF      Test Cases.
                                          *why* is it broken? default interpretation by the browser, and allowed/default encoding
                                          c.f. RFC2045 — MIME bodies and RFC2046 — MIME media types