Best Practice for Web Data URI

From Data on the Web Best Practices
Revision as of 13:07, 21 February 2014 by Mcarrasc


This best practice specifies a simple use of URIs to address data on the web. One URI can identify multiple versions of the same information (variants). For example, http://example.com/rome can offer the Treaty of Rome in 24 languages, each in HTML, PDF and plain text, together with metadata and other useful information.

One objective is to allow natural language to be treated as a dimension using current web specifications. The intention is to use existing specifications properly and in the simplest possible way; it is not about creating new specifications. It can also be viewed as a return to the source (summarizing the four points in two): use URIs and provide useful information.

Examples

Get a record-sized piece of data – XHTML with IDs

http://example.com/1122

Get the whole database – one file, SQLite, compressed

http://example.com/?all

Get the metadata and other useful information – XHTML with IDs

 http://example.com/1122?info

Get data branch

 http://example.com/1122/foo/bar

Language as a dimension – parameters as file extensions

 
  http://example.com/1122.fr
  http://example.com/1122.xhtml
  http://example.com/1122.es.txt
  

Language as a dimension – parameters as query string

 
  http://example.com/1122?lang=fr
  http://example.com/1122?format=xhtml
  http://example.com/1122?lang=es;format=txt
  http://example.com/1122?lang=en;format=pdf
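
The two parameter styles above can be read by the same small routine. The following is a minimal sketch, not a definitive implementation: the function name `variant_params` and the `LANGUAGES` and `FORMATS` token sets are assumptions made for illustration and are not part of this best practice.

```python
import re
from urllib.parse import urlsplit

# Hypothetical token sets; this page does not define which tokens are valid.
LANGUAGES = {"en", "es", "fr"}
FORMATS = {"xhtml", "txt", "pdf"}


def variant_params(uri):
    """Return (resource path, requested parameters) for a URI.

    Parameters may come as file extensions (/1122.es.txt) or as a
    query string (?lang=es;format=txt).
    """
    parts = urlsplit(uri)
    params = {}

    # Extension style: split the last path segment on "."
    head, _, last = parts.path.rpartition("/")
    name, *tokens = last.split(".")
    for token in tokens:
        if token in LANGUAGES:
            params["lang"] = token
        elif token in FORMATS:
            params["format"] = token
        # Unknown tokens are silently ignored in this sketch.

    # Query style: pairs separated by ";" or "&"
    for pair in filter(None, re.split(r"[;&]", parts.query)):
        key, _, value = pair.partition("=")
        if key in ("lang", "format") and value:
            params[key] = value

    return f"{head}/{name}", params
```

In this sketch a query-string value overrides an extension when both are present; that ordering is a choice of the example, not something this page prescribes.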
  

URI and HTTP

A Uniform Resource Identifier (URI) identifies a resource. One resource can have multiple variants, plus associated metadata and other useful information. The parameters that select a variant can be sent in the HTTP header fields, in the URI query string, or both, usually as key-value pairs.

Some relevant request header fields

 Accept:  (content type, format)
 Accept-Charset
 Accept-Language

Order of preference

  • URI
  • Header fields
  • Server configuration
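
The order of preference above can be sketched as a simple fallback chain. This is an illustration under stated assumptions: the function name `resolve` and its arguments are hypothetical, and the header handling is deliberately simplified (it takes the first listed value and ignores q-weights).

```python
def resolve(dimension, uri_params, header_value, server_default):
    """Pick the value for one dimension (e.g. "lang" or "format").

    Order of preference: URI, then header field, then server configuration.
    """
    # 1. A parameter given in the URI wins.
    if dimension in uri_params:
        return uri_params[dimension]

    # 2. Otherwise use the request header field, if present.
    #    Simplified: first listed value only, q-weights are ignored.
    if header_value:
        first = header_value.split(",")[0].split(";")[0].strip()
        if first and first != "*":
            return first

    # 3. Fall back to the server configuration.
    return server_default
```

For example, `resolve("lang", {"lang": "fr"}, "es, en;q=0.8", "en")` yields the URI value, while the same call with an empty parameter dictionary falls through to the Accept-Language header.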

Human and machine readable

Humans and machines must both be able to read and process the data. It is recommended to have one format that is valid for both. Other formats might be more specialized for humans or for machines.

Structure the data: the how is secondary.

Domain versus path

Example

http://en.example.com/foo
http://example.com/foo.en

XHTML with IDs

In most browsers, the code below is readable by humans, and with proper markup it is machine processable, though it is better if the server can supply the format desired by the requester. See also [Microformats].

<html>
 <head>
  <title>example</title>
 </head>
 <body>
  <div>
   <span name="key">date</span>
   <span name="value">2011-08-24</span>
  </div>
  <div>
   <span name="key">creator</span>
   <span name="value">M.T. Carrasco Benitez</span>
  </div>
  <div>
   <span name="key">version</span>
   <span name="value">1</span>
  </div>
 </body>
</html>

Examples of transformations.

XML

<example>
 <date>2011-08-24</date>
 <creator>M.T. Carrasco Benitez</creator>
 <version>1</version>
</example>

Key-value

date=2011-08-24
creator=M.T. Carrasco Benitez
version=1
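
The transformations above can be done mechanically once the key and value spans are marked. The following is a minimal sketch, assuming the XHTML is well-formed XML without a namespace declaration, as in the example on this page; the function names `extract_pairs`, `to_key_value` and `to_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XHTML example above.
XHTML = """<html>
 <head><title>example</title></head>
 <body>
  <div><span name="key">date</span><span name="value">2011-08-24</span></div>
  <div><span name="key">creator</span><span name="value">M.T. Carrasco Benitez</span></div>
  <div><span name="key">version</span><span name="value">1</span></div>
 </body>
</html>"""


def extract_pairs(xhtml_text):
    """Collect (key, value) pairs from <div> elements holding marked spans."""
    root = ET.fromstring(xhtml_text)
    pairs = []
    for div in root.iter("div"):
        fields = {s.get("name"): (s.text or "").strip() for s in div.findall("span")}
        if "key" in fields and "value" in fields:
            pairs.append((fields["key"], fields["value"]))
    return pairs


def to_key_value(pairs):
    """Render the pairs as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in pairs)


def to_xml(pairs, root_name="example"):
    """Render the pairs as one XML element per key."""
    root = ET.Element(root_name)
    for key, value in pairs:
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_key_value(extract_pairs(XHTML))` gives the key-value form and `to_xml(extract_pairs(XHTML))` gives the XML form, without indentation. The sketch assumes every key is a valid XML element name.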

Avoid techniques such as JavaScript.

There should be defaults, in particular for the format.

Big and small data

One must be able to address both a whole database (e.g., 1 TB) and a single record (e.g., 1 kB).

Multilingual data

The particular case of multilingual data must be resolved in the wider context of web data: language must be treated as a dimension in the same fashion as the format (media type). Another dimension might be time (previous versions).

The approach must be valid for both tabular and prose data, and for both raw and clean data.

Dimensions

Priorities

  • Language
  • Format
  • Location
  • Time

Existing specifications

Feature

"Feature negotiation intends to provide for all areas of negotiation not covered by the type, charset, and language dimensions. Examples are negotiation on

  • HTML extensions
  • Extensions of other media types
  • Color capabilities of the user agent
  • Screen size
  • Output medium (screen, paper, ...)
  • Preference for speed vs. preference for graphical detail"

Terminology

  • durip


Abbreviation of Data on the Web URI Best Practices. It is a noun that follows the appropriate morphology of the language in use. For example: Durip, durip, durips.

References

Editor

M.T. Carrasco Benitez
Manuel.Carrasco-Benitez AT ec.europa.eu

Feel free to modify this page or email the editor/mailing list public-dwbp-wg - (archive).