Other data sharing protocols

From Data on the Web Best Practices

This page lists data sharing protocols. Each one is either part of one of the Use_Cases or outside the scope of the group but considered related.

Web/HTTP based

The following protocols use HTTP as a transport layer, though not necessarily in a RESTful way.

  • OPeNDAP : designed to share massive amounts of data, with the ability to resume transfers and retrieve subsets of the data. URL patterns are used to reference a data set, and a custom "Dataset Descriptor Structure (DDS)" format is used to describe the metadata of the data set. GET parameters are then used to access part of the data; for example, "http://data?lat" gets all the "lat" entries of "data" (see the quick-start guide).
  • OAI-PMH : specification used to implement resource synchronization across digital libraries. The specification defines URL patterns for requesting the metadata and the content of the response to be returned. This response is serialized in XML and uses standard vocabularies such as Dublin Core and XML Schema.
  • ResourceSync : successor of OAI-PMH, which extends it by using Sitemaps for documenting resources and features a more RESTful usage of HTTP.
  • WebDAV : HTTP extension which provides an interface similar to that of a file system on top of an HTTP server. This protocol can be used to manage documents.
  • Memento : HTTP extension providing access to archived versions of a Web resource. The response to a GET is enriched with a pointer to a Memento timegate from which archived versions of the resource can be requested. For example, http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/page/Amsterdam is the timegate for http://dbpedia.org/resource/Amsterdam. By default the timegate points to the latest archived version of the resource; other versions can be requested by setting a timestamp in the Memento URL or by using additional HTTP headers.
  • OSLC : Open Services for Lifecycle Collaboration (OSLC) is an open community creating specifications for integrating tools. These specifications allow conforming independent software and product lifecycle tools to integrate their data and workflows in support of end-to-end lifecycle processes. Examples of lifecycle tools in software development include defect tracking tools, requirements management tools and test management tools. There are many more examples in software and product development, and more still in “IT operations” (sometimes called service management), where deployed applications are managed.
OSLC is based on W3C Linked Data. As a reminder, these are the four rules of linked data, authored by Tim Berners-Lee and documented on the W3C web site: http://www.w3.org/DesignIssues/LinkedData.html.
   Use URIs as names for things
   Use HTTP URIs so that people can look up those names
   When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
   Include links to other URIs. so that they can discover more things. (sic)


In OSLC, each artifact in the lifecycle (for example, a requirement, defect, test case, source file, or development plan) is an HTTP resource that is manipulated using the standard methods of the HTTP specification (GET, PUT, POST, DELETE).
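As a minimal sketch of what "manipulated using the standard methods" can look like, the snippet below builds (but does not send) one request per HTTP method for a single artifact. The URI `http://example.org/oslc/defects/42`, the action names, the placeholder body, and the choice to POST to the parent collection are all assumptions made for illustration; they are not taken from the OSLC specifications.

```python
from urllib.request import Request

# Hypothetical URI of a defect resource; not a real OSLC server.
DEFECT_URI = "http://example.org/oslc/defects/42"


def build_requests(uri: str, body: bytes) -> dict:
    """Return one unsent Request per standard HTTP method for one artifact."""
    headers = {"Content-Type": "application/rdf+xml"}
    # Assumption: new artifacts are created by POSTing to the parent collection.
    collection = uri.rsplit("/", 1)[0]
    return {
        "read": Request(uri, method="GET"),
        "update": Request(uri, data=body, headers=headers, method="PUT"),
        "create": Request(collection, data=body, headers=headers, method="POST"),
        "delete": Request(uri, method="DELETE"),
    }


# The body is a placeholder, not a valid RDF/XML document.
requests_by_action = build_requests(DEFECT_URI, b"<placeholder/>")
for action, req in requests_by_action.items():
    print(action, req.get_method(), req.full_url)
```

Nothing is sent over the network here; passing one of these objects to `urllib.request.urlopen` would perform the actual call against a real server.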

Following the third rule of linked data, each resource has an RDF representation. OSLC mandates RDF/XML, which is the most widely adopted RDF notation, but a resource can also have representations in other formats, such as JSON or HTML.
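To illustrate how a client might ask for the RDF/XML representation and read it, here is a small sketch. The resource URI, the sample document, and the use of `dcterms:title` are assumptions chosen for the example. A plain XML parser is used only because the sample has a single, simple shape; a general client would more likely rely on a proper RDF library.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Hypothetical RDF/XML body, as a server might return it for one defect.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/oslc/defects/42">
    <dcterms:title>Login page rejects valid passwords</dcterms:title>
  </rdf:Description>
</rdf:RDF>"""


def rdf_request(uri: str) -> Request:
    """Build an unsent GET request that asks for the RDF/XML representation."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


def titles(rdf_xml: str) -> dict:
    """Map each described resource URI to its dcterms:title, if present."""
    root = ET.fromstring(rdf_xml)
    found = {}
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        about = description.get(f"{{{RDF_NS}}}about")
        found[about] = description.findtext(f"{{{DCTERMS_NS}}}title")
    return found


print(rdf_request("http://example.org/oslc/defects/42").get_header("Accept"))
print(titles(SAMPLE))
```

Changing the `Accept` header to another media type is how the same resource would be requested in a different representation, provided the server offers one.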

The OSLC Core specification defines a number of simple usage patterns of HTTP and RDF and a small number of resource types that help tools integrate and make the lifecycle work. The OSLC domain workgroups specify additional resource types specific to their lifecycle domain, but do not add new protocols.
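Several of the HTTP-based protocols listed earlier in this section are driven by URL patterns and request headers. The sketch below builds such requests for OPeNDAP, OAI-PMH and Memento without contacting any server. The function names are hypothetical, and `http://example.org/oai` is a made-up OAI-PMH endpoint; the OPeNDAP and Memento URLs reuse the examples given in the list. The `Accept-Datetime` header is assumed to be the "additional HTTP header" that Memento uses to select an archived version.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlencode


def opendap_subset_url(dataset_url: str, variables: list) -> str:
    """Append the requested variable names as a comma-separated query string."""
    return dataset_url + "?" + ",".join(variables)


def oai_pmh_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request: the verb plus its arguments as GET parameters."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


def memento_request(timegate_prefix: str, original_uri: str, when: datetime):
    """Return the timegate URL and the header asking for a version near `when`.

    `when` must be a timezone-aware datetime.
    """
    url = timegate_prefix + original_uri
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return url, {"Accept-Datetime": http_date}


print(opendap_subset_url("http://data", ["lat"]))
print(oai_pmh_url("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
print(memento_request(
    "http://mementoarchive.lanl.gov/dbpedia/timegate/",
    "http://dbpedia.org/page/Amsterdam",
    datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc),
))
```

Real OPeNDAP servers typically also expect a response-type suffix on the data set URL; that detail is left out here to stay close to the example in the list.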

Other

The following protocols are not based on HTTP and use a custom data transfer method over a network. They may nevertheless use URIs to address the resources they share.

  • Rsync : a software utility and network protocol that synchronizes files and directories from one location to another while minimizing data transfer by using delta encoding when appropriate.
  • FTP : File Transfer Protocol lets clients (users and software) interact directly with a remote file storage system, including uploading and downloading files. Related alternatives include SFTP (SSH File Transfer Protocol), which uses Secure Shell (SSH) for both authentication and data transfer, and FTP over Transport Layer Security (TLS), also known as FTPS, which encrypts the control channel that carries authentication and can optionally encrypt the data channel as well.
  • GridFTP : an extension of FTP used to support scientific big data applications. Its features include certificate-based security, data striping and fault tolerance.
  • SCP : Secure copy protocol (SCP) uses Secure Shell (SSH) for data transfer and authentication, ensuring authenticity and confidentiality of the data in transit. Options are available to automatically compress files prior to transmission.
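The delta encoding mentioned for Rsync can be illustrated with a deliberately simplified sketch: the receiver's old copy is cut into fixed-size blocks, and the sender describes the new file as a mix of "copy block at offset N" instructions and literal bytes. This is not rsync's actual algorithm, which relies on a rolling checksum to make the scan efficient; the function names, the tiny block size, and the use of SHA-256 are assumptions made to keep the example short and easy to trace.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def block_index(old: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map the hash of each full block of the old data to its offset."""
    index = {}
    for offset in range(0, len(old) - block_size + 1, block_size):
        digest = hashlib.sha256(old[offset:offset + block_size]).digest()
        index.setdefault(digest, offset)
    return index


def make_delta(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Describe `new` as copies of blocks from `old` plus literal bytes."""
    index = block_index(old, block_size)
    delta, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block_size]
        offset = None
        if len(chunk) == block_size:
            offset = index.get(hashlib.sha256(chunk).digest())
        if offset is not None:
            if literal:  # flush pending literal bytes before the copy
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", offset))
            i += block_size
        else:
            literal.append(new[i])
            i += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new data from the old data and the delta instructions."""
    out = bytearray()
    for kind, value in delta:
        out += old[value:value + block_size] if kind == "copy" else value
    return bytes(out)


old = b"abcdefghijkl"
new = b"abcdXefghijkl"  # one byte inserted in the middle
delta = make_delta(old, new)
print(delta)
print(apply_delta(old, delta) == new)
```

In this example only the single inserted byte travels as literal data; the rest of the new file is expressed as references to blocks the receiver already holds.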