Best Practice for Web Data URI

From Data on the Web Best Practices
Revision as of 13:07, 21 February 2014 by Mcarrasc


This best practice specifies a simple use of URIs to address data on the web. One URI can identify multiple versions of the same information (variants). For example, one URI can identify the Treaty of Rome in 24 languages, each language in HTML, PDF and plain text, in addition to metadata and other useful information.

One of the objectives is to allow natural language to be treated as a dimension using the current web specifications. The intention is to use existing specifications properly and in the simplest possible way; it is not about creating new specifications. It can also be viewed as a return to the source, summarizing the four points into two: use URIs and provide useful information.


Get a record-like size of data – XHTML with IDs

Get the whole database – one file, SQLite, compressed

Get the metadata and other useful information – XHTML with IDs

Get data branch

Language as a dimension – parameters as file extensions

Language as a dimension – parameters as query string (e.g. …;format=txt, …;format=pdf)
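The two ways of carrying language (and format) as a dimension listed above can be sketched in Python. This is a minimal sketch, not a prescribed layout: the `name.lang.format` file-extension convention, the `lang` and `format` query keys, the function names and the example.org URIs are all hypothetical. Query pairs are split on both `&` and `;`, since the fragments above use `;` as a separator.

```python
import re
from urllib.parse import unquote_plus, urlsplit


def variant_from_extensions(uri):
    """Hypothetical layout: /<name>.<lang>.<format>, e.g. /treaty-of-rome.fr.pdf."""
    last_segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    parts = last_segment.split(".")
    if len(parts) != 3:
        return {}  # not in the assumed name.lang.format layout
    name, lang, fmt = parts
    return {"name": name, "lang": lang, "format": fmt}


def variant_from_query(uri):
    """Hypothetical layout: /<name>?lang=<lang>;format=<format>."""
    split = urlsplit(uri)
    params = {"name": split.path.rsplit("/", 1)[-1]}
    # Accept both "&" and ";" as separators between key=value pairs.
    for pair in re.split(r"[&;]", split.query):
        if pair:
            key, _, value = pair.partition("=")
            params[unquote_plus(key)] = unquote_plus(value)
    return params


# Both print {'name': 'treaty-of-rome', 'lang': 'fr', 'format': 'pdf'}
print(variant_from_extensions("http://example.org/treaty-of-rome.fr.pdf"))
print(variant_from_query("http://example.org/treaty-of-rome?lang=fr;format=pdf"))
```

Either way, every variant gets its own URI, so it can be addressed, linked and bookmarked directly.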


A Uniform Resource Identifier (URI) identifies a resource. One resource can have multiple variants, as well as associated metadata and other useful information. The parameters requesting a variant can be sent in the HTTP request header fields, in the URI query string (usually as key-value pairs), or in both.

Some relevant request header fields

  • Accept: (content type, format)

Order of preference

  • URI
  • Header fields
  • Server configuration
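Reading the order of preference above as highest priority first, the choice of a variant can be sketched as a small resolution function. A minimal Python sketch under stated assumptions: only the `lang` and `format` dimensions are handled; the query keys, the media-type table, the server defaults and the function names are hypothetical; and header parsing is simplified (q-values are respected, while wildcards such as `*/*` and `*` simply fall through to the next source).

```python
import re
from urllib.parse import unquote_plus, urlsplit

# Hypothetical server configuration: the last resort.
SERVER_DEFAULTS = {"lang": "en", "format": "html"}

# Hypothetical mapping from Accept media types to format names.
MEDIA_TYPE_TO_FORMAT = {
    "text/html": "html",
    "application/xhtml+xml": "html",
    "application/pdf": "pdf",
    "text/plain": "txt",
}


def uri_params(uri):
    """Key-value pairs from the query string, split on '&' or ';'."""
    params = {}
    for pair in re.split(r"[&;]", urlsplit(uri).query):
        if pair:
            key, _, value = pair.partition("=")
            params[unquote_plus(key)] = unquote_plus(value)
    return params


def ranked(header_value):
    """Values of an Accept-style header, highest q first (q=0 dropped)."""
    entries = []
    for position, item in enumerate(header_value.split(",")):
        fields = [f.strip() for f in item.split(";")]
        value, q = fields[0].lower(), 1.0
        for field in fields[1:]:
            name, _, number = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(number)
                except ValueError:
                    q = 0.0
        if value and q > 0:
            entries.append((-q, position, value))
    return [value for _, _, value in sorted(entries)]


def resolve(uri, headers):
    """Choose lang and format: URI first, then header fields, then server config."""
    headers = {name.lower(): value for name, value in headers.items()}
    from_uri = uri_params(uri)
    from_headers = {}
    # Simplification: keep only the primary language subtag (fr-BE -> fr).
    languages = [t for t in ranked(headers.get("accept-language", "")) if t != "*"]
    if languages:
        from_headers["lang"] = languages[0].split("-")[0]
    # First acceptable media type that this hypothetical server can produce.
    for media_type in ranked(headers.get("accept", "")):
        if media_type in MEDIA_TYPE_TO_FORMAT:
            from_headers["format"] = MEDIA_TYPE_TO_FORMAT[media_type]
            break
    return {
        key: from_uri.get(key) or from_headers.get(key) or default
        for key, default in SERVER_DEFAULTS.items()
    }


headers = {"Accept-Language": "fr-BE, en;q=0.8", "Accept": "text/plain"}
print(resolve("http://example.org/treaty-of-rome?format=pdf", headers))
# {'lang': 'fr', 'format': 'pdf'}  (format from the URI, lang from the header)
print(resolve("http://example.org/treaty-of-rome", {}))
# {'lang': 'en', 'format': 'html'}  (server defaults)
```

Putting the URI first means a given variant URI always returns the same variant, whatever header fields the client sends.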

Human and machine readable

Humans and machines must be able to read and process the data. It is recommended to have one format that serves both. Other formats might be more specialized for humans or for machines.

Structure the data; how it is structured is secondary.

Domain versus path


XHTML with IDs

In most browsers, the code below is readable by humans and, with proper markup, machine-processable, although it is better if the server can supply the format the requester wants. See also [Microformats].

   <span class="key">date</span>
   <span class="value">2011-08-24</span>
   <span class="key">creator</span>
   <span class="value">M.T. Carrasco Benitez</span>
   <span class="key">version</span>
   <span class="value">1</span>

Example of transformations of the creator entry into other formats.

XML:

 <creator>M.T. Carrasco Benitez</creator>

Plain text:

 creator=M.T. Carrasco Benitez
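The transformation above can be sketched as reading the key/value spans and writing the same pairs out as XML elements and as key=value lines. A minimal Python sketch using the standard-library `xml.etree.ElementTree`; it assumes the fragment is well-formed XHTML wrapped in a single element and that each key is a valid XML element name. It reads the key/value role from a `class` attribute (the Microformats style) or, failing that, from `name`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The key/value fragment, wrapped in a <div> so it parses as one element.
FRAGMENT = """<div>
  <span class="key">date</span>
  <span class="value">2011-08-24</span>
  <span class="key">creator</span>
  <span class="value">M.T. Carrasco Benitez</span>
  <span class="key">version</span>
  <span class="value">1</span>
</div>"""


def read_pairs(markup):
    """Pair each 'key' span with the 'value' span that follows it."""
    pairs, key = [], None
    for span in ET.fromstring(markup).iter("span"):
        role = span.get("class") or span.get("name")  # class preferred, name accepted
        if role == "key":
            key = span.text
        elif role == "value" and key is not None:
            pairs.append((key, span.text))
            key = None
    return pairs


def to_xml(pairs):
    """One element per pair, e.g. <creator>M.T. Carrasco Benitez</creator>."""
    lines = []
    for key, value in pairs:
        element = ET.Element(key)  # assumes key is a valid XML name
        element.text = value
        lines.append(ET.tostring(element, encoding="unicode"))
    return "\n".join(lines)


def to_key_value(pairs):
    """One line per pair, e.g. creator=M.T. Carrasco Benitez."""
    return "\n".join(f"{key}={value}" for key, value in pairs)


pairs = read_pairs(FRAGMENT)
print(to_xml(pairs))
print(to_key_value(pairs))
```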

Avoid techniques such as JavaScript: the data should be readable without running scripts.

There should be defaults, in particular for the format.

Big and small data

One must be able to address both a whole database (e.g., 1 TB) and a single record (e.g., 1 KB).
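The two ends of the scale can be served from the same store. A minimal Python sketch using the standard-library `sqlite3` and `gzip` modules, in the spirit of the "one file, SQLite, compressed" option listed earlier: the schema, the placeholder rows, the ids and the function names are hypothetical, and a real service would stream a 1 TB file rather than read it into memory as `get_whole_database` does here.

```python
import gzip
import sqlite3
import tempfile
from pathlib import Path

COLUMNS = ("id", "title")  # hypothetical schema


def build_database(path):
    """Create a tiny example table with placeholder rows."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE record (id TEXT PRIMARY KEY, title TEXT)")
        conn.executemany(
            "INSERT INTO record (id, title) VALUES (?, ?)",
            [("r1", "alpha"), ("r2", "beta")],
        )
        conn.commit()
    finally:
        conn.close()


def get_record(path, record_id):
    """Small data: a single record, addressed by its id."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT id, title FROM record WHERE id = ?", (record_id,)
        ).fetchone()
    finally:
        conn.close()
    return None if row is None else dict(zip(COLUMNS, row))


def get_whole_database(path):
    """Big data: the whole database as one gzip-compressed SQLite file.
    Reads the file into memory, which is only sensible for small examples."""
    return gzip.compress(Path(path).read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "data.sqlite"
    build_database(db_path)
    print(get_record(db_path, "r2"))  # {'id': 'r2', 'title': 'beta'}
    blob = get_whole_database(db_path)
    print(gzip.decompress(blob)[:16])  # b'SQLite format 3\x00'
```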

Multilingual data

The particular case of multilingual data must be resolved in the wider context of web data: language must be treated as a dimension in the same fashion as the format (media type). Another dimension might be time (previous versions).

These practices must be valid for both tabular and prose data, and for both raw and clean data.

Dimensions

  • Language
  • Format
  • Location
  • Time

Existing specifications


"Feature negotiation intends to provide for all areas of negotiation not covered by the type, charset, and language dimensions. Examples are negotiation on

  • HTML extensions
  • Extensions of other media types
  • Color capabilities of the user agent
  • Screen size
  • Output medium (screen, paper, ...)
  • Preference for speed vs. preference for graphical detail"


  • durip

Abbreviation of Data on the Web URI Best Practices. It is a noun that follows the morphology of the language in which it is used; for example, Durip, durip, durips.



M.T. Carrasco Benitez
Manuel.Carrasco-Benitez AT

Feel free to modify this page or email the editor or the mailing list public-dwbp-wg (archive).