Data on the Web URI Best Practices
This best practice specifies a simple use of URIs to address data on the web. One URI can have multiple versions of the same information (variants). For example, http://example.com/rome can identify the Treaty of Rome in 24 languages, with each language available in HTML, PDF and plain text, in addition to metadata and other useful information.
One of the objectives is to allow natural language to be treated as a dimension using the current web specifications. The intention is the proper use of existing specifications in the simplest possible way: it is not about creating new specifications. It can also be viewed as a return to the source (condensing the four points into two): use URIs and provide useful information.
Get a record-like size of data – XHTML with IDs
Get the whole database – one file, SQLite, compressed
Get the metadata and other useful information – XHTML with IDs
Get data branch
Language as a dimension – parameters as file extensions
http://example.com/1122.fr
http://example.com/1122.xhtml
http://example.com/1122.es.txt
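As a rough illustration of how a server might read parameters written as file extensions, the sketch below splits the last path segment into a resource name, a language and a format. The function name `split_extensions` and the `KNOWN_FORMATS` set are assumptions made for this example, not part of the best practice.

```python
from urllib.parse import urlsplit

# Hypothetical set of format extensions; a real server would configure its own.
KNOWN_FORMATS = {"html", "xhtml", "pdf", "txt"}


def split_extensions(uri):
    """Split the last path segment of a URI into (resource, lang, fmt)."""
    path = urlsplit(uri).path
    segment = path.rsplit("/", 1)[-1]
    resource, *extensions = segment.split(".")
    lang = fmt = None
    for ext in extensions:
        if ext in KNOWN_FORMATS:
            fmt = ext
        else:
            # Assumption: any extension that is not a known format is a language tag.
            lang = ext
    return resource, lang, fmt


for uri in (
    "http://example.com/1122.fr",
    "http://example.com/1122.xhtml",
    "http://example.com/1122.es.txt",
):
    print(uri, split_extensions(uri))
```

A missing extension is returned as `None`, which leaves room for the server to apply its defaults.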
Language as a dimension – parameters as query string
http://example.com/1122?lang=fr
http://example.com/1122?format=xhtml
http://example.com/1122?lang=es;format=txt
http://example.com/1122?lang=en;format=pdf
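The query-string form can be read in a similar way. The sketch below is one possible reading, assuming that `;` separates the key-value pairs as in the URIs above (and also accepting `&`). The function name `query_params` is hypothetical. The split is done by hand because Python's standard query parsers split only on `&` by default in recent versions.

```python
from urllib.parse import unquote, urlsplit


def query_params(uri):
    """Return the key-value pairs of the query string as a dict."""
    query = urlsplit(uri).query
    params = {}
    # Assumption: ';' and '&' are both accepted as pair separators.
    for pair in query.replace("&", ";").split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params[unquote(key)] = unquote(value)
    return params


print(query_params("http://example.com/1122?lang=es;format=txt"))
```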
URI and HTTP
A Uniform Resource Identifier (URI) identifies a resource. One resource can have multiple variants, plus associated metadata and other useful information. The parameters requesting a variant can be sent in the HTTP header fields, in the URI query string, or in both; in the query string they are usually key-value pairs.
Some relevant request header fields
- Accept (content type, format)
- Accept-Charset
- Accept-Language
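To make the header-field route concrete, the sketch below builds a request carrying the three fields listed above, without sending it. `build_request` is a hypothetical helper and the header values are only illustrative.

```python
from urllib.request import Request


def build_request(uri, lang, media_type, charset="utf-8"):
    """Build (but do not send) a request asking for one variant of a resource."""
    return Request(
        uri,
        headers={
            "Accept": media_type,       # desired format (media type)
            "Accept-Language": lang,    # desired natural language
            "Accept-Charset": charset,  # desired character encoding
        },
    )


request = build_request("http://example.com/rome", "fr", "application/pdf")
for name, value in request.header_items():
    print(f"{name}: {value}")
```

Passing the request to `urllib.request.urlopen` would send it. The same URI then yields different variants depending only on the header values.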
Order of preference
- Header fields
- Server configuration
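The order of preference above might be sketched as follows for the language dimension: the Accept-Language header field is consulted first, and the server configuration supplies the fallback. The function names, the set of available languages and the simplified matching (exact tag, then primary subtag) are assumptions made for this example.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language value, most preferred first."""
    weighted = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # a tag without an explicit q value has quality 1
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        if quality > 0:
            weighted.append((quality, tag.lower()))
    # The sort is stable, so tags with equal quality keep their header order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in weighted]


def choose_language(header, available, server_default):
    """Pick a language: header fields first, then the server configuration."""
    for tag in parse_accept_language(header or ""):
        if tag in available:
            return tag
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    return server_default


print(choose_language("fr-CH, fr;q=0.9, en;q=0.8", {"en", "es"}, "es"))
```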
Human and machine readable
Humans and machines must be able to read and process the data. It is recommended to have one format valid for both. Other formats might be more specialized for humans or for machines.
Structure the data: the how is secondary.
Domain versus path
XHTML with IDs
In most browsers, the code below will be readable by humans, and with proper markup it is machine processable, though it is better if the server can supply the format desired by the requester. See also [Microformats].
<html>
  <head>
    <title>example</title>
  </head>
  <body>
    <div>
      <span name="key">date</span>
      <span name="value">2011-08-24</span>
    </div>
    <div>
      <span name="key">creator</span>
      <span name="value">M.T. Carrasco Benitez</span>
    </div>
    <div>
      <span name="key">version</span>
      <span name="value">1</span>
    </div>
  </body>
</html>
Example of transformations.
<example>
  <date>2011-08-24</date>
  <creator>M.T. Carrasco Benitez</creator>
  <version>1</version>
</example>
date=2011-08-24
creator=M.T. Carrasco Benitez
version=1
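A minimal sketch of such transformations follows, assuming the XHTML is well-formed XML without a namespace declaration and that each record is a `div` holding one `key` span and one `value` span, as in the example. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = """<html><head><title>example</title></head><body>
<div><span name="key">date</span> <span name="value">2011-08-24</span></div>
<div><span name="key">creator</span> <span name="value">M.T. Carrasco Benitez</span></div>
<div><span name="key">version</span> <span name="value">1</span></div>
</body></html>"""


def extract_pairs(xhtml):
    """Return the document title and the (key, value) pairs found in the divs."""
    root = ET.fromstring(xhtml)
    title = root.findtext("head/title")
    pairs = []
    for div in root.iter("div"):
        key = div.find("span[@name='key']")
        value = div.find("span[@name='value']")
        if key is not None and value is not None:
            pairs.append((key.text, value.text))
    return title, pairs


def to_xml(title, pairs):
    """Serialize the pairs as XML, using the title as the root element name."""
    root = ET.Element(title)
    for key, value in pairs:
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_key_value(pairs):
    """Serialize the pairs as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in pairs)


title, pairs = extract_pairs(XHTML)
print(to_xml(title, pairs))
print(to_key_value(pairs))
```

Using the title and keys as element names only works when they are valid XML names, which holds for this example but would need checking for arbitrary data.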
There should be defaults, in particular for the format.
Big and small data
One must be able to address both a whole database (e.g., 1 TB) and a single record (e.g., 1 kB).
The particular case of multilingual data must be resolved in the wide context of web data: it must be treated as a dimension in the same fashion as the format (media type). Another dimension might be time (previous versions).
The practice must be valid for both tabular and prose data, whether raw or clean.
"Feature negotiation intends to provide for all areas of negotiation not covered by the type, charset, and language dimensions. Examples are negotiation on
- HTML extensions
- Extensions of other media types
- Color capabilities of the user agent
- Screen size
- Output medium (screen, paper, ...)
- Preference for speed vs. preference for graphical detail"
Durip is an abbreviation of Data on the Web URI Best Practices. It is a noun that follows the morphology of the language in which it is used. For example: Durip, durip, durips.
- Linked Data
- Uniform Resource Identifier (URI): Generic Syntax
- Hypertext Transfer Protocol – HTTP/1.1
- Transparent Content Negotiation in HTTP
- HTTP Framework for Time-Based Access to Resource States -- Memento
- Tags for Identifying Languages
- Media Types
- List of file formats
- Data formats
M.T. Carrasco Benitez
Manuel.Carrasco-Benitez AT ec.europa.eu