Best practices notes

From Data on the Web Best Practices

notes on best practices

URI design and management for persistence

[See this page of working group notes]

From Andy Mabbett's talk on OpenStreetMap at our Face-to-face in London: // This may be too much for this section

  • OpenStreetMap tags essentially form triples (attributes of nodes). Not expressed that way though.
  • Reference external sites
    • For example, openplaques_plaque = 1563. It is not referenced with a URL; they'd like it to be.
    • english_heritage_ref = 468884. The URL could go to a text-based article or images.
      • key:english_heritage_ref isn't defined. The community may have minted their own key (english_heritage_number, for example). Some editing tools autosuggest existing keys, but not all do, and users can ignore the suggestions.
    • They do include the URL of a website, but not necessarily of structured data.
    • Store wikidata value
      • key:wikidata isn't documented. It's still new, and the community hasn't agreed that it should be used. Small community and slow consensus process.
    • Store reference terms to build a Wikipedia page URL, but don't explicitly build it because
      • a) it's long,
      • b) there is more than one Wikipedia URL per article (http, https, etc.)
  • People are reluctant; they don't understand why they should. Volunteers think entering the number into a table is enough. We could run a script to replace the data with a URL, or we could make the fact that the data is part of a URL clearer on the (human-readable) wiki.
  • Subkeys: wikipedia:architect or wikidata:architect.
  • Community conflicts: members in different parts of the world will mint different terms meaning the same thing. They slowly come together and conflict.
  • Unique reference number per node (in this case, for the way) is in the URL.
  • Tree. (node 2642471054) Attribute: Ref = 0141. Whose ref is that? In this case, it's a local council reference. No URL to be made there — nothing to link to.
  • When a node is replaced with a new node or new data, there is no way to know what happened to the old node. (Building destroyed? Data was just there in error?) The old one just returns an HTTP 200 response.
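
The two ideas above (tags read as triples, and a script that replaces a bare reference number with a URL) can be sketched as follows. This is only an illustration, not how OpenStreetMap does it: the URL templates are made-up placeholders on example.org, since the notes do not record the real target patterns, and node 1 in the usage lines is a made-up identifier. The only pattern taken from the notes is that the node's unique reference number appears in its URL.

```python
# Hypothetical URL templates (assumption): real patterns would have to be
# agreed by the community and documented on the wiki.
URL_TEMPLATES = {
    "openplaques_plaque": "https://example.org/openplaques/{value}",
    "english_heritage_ref": "https://example.org/english-heritage/{value}",
}


def node_uri(node_id):
    """Build the subject URI from the node's unique reference number."""
    return f"https://www.openstreetmap.org/node/{node_id}"


def tags_to_triples(node_id, tags):
    """Read each key/value tag as a (subject, predicate, object) triple.

    Values whose key has a known URL template are expanded to a URL;
    all other values stay as plain literals (nothing to link to).
    """
    subject = node_uri(node_id)
    triples = []
    for key, value in sorted(tags.items()):
        template = URL_TEMPLATES.get(key)
        obj = template.format(value=value) if template else value
        triples.append((subject, key, obj))
    return triples


if __name__ == "__main__":
    # The tree from the notes: a local council ref, so it stays a literal.
    print(tags_to_triples(2642471054, {"ref": "0141"}))
    # A made-up node carrying the plaque reference from the notes.
    print(tags_to_triples(1, {"openplaques_plaque": "1563"}))
```

Keeping the templates in one table means the stored tag values stay as short numbers, which is what volunteers already enter, while the URL can still be derived mechanically.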

use of core vocabularies to improve interoperability

Day 1: A discussion was held on whether or not vocabularies should be standardized. Challenges for vocabularies include:

  • common vocabularies are not used
  • should vocabularies be open?
  • what about licensing constraints on vocabularies?
    • ISO, Apache, and open licenses were used as examples.
    • Vocabularies can be licensed and used.
  • Standardized and referenced vocabularies were discussed:
    • If reference vocabularies are not available or suitable, try to:
      • extend existing vocabularies,
      • propose that a vocabulary be created in a cooperative setting, or
      • create your own


  • Existing reference vocabularies should be reused where possible (resolved)
  • Reference vocabularies should be shared in an open way (resolved)

guidance on the provision of metadata

Day 1: Discussion of the requirements for metadata:

  • The initial focus was on metadata being machine readable.
  • This requirement was expanded to metadata being both machine and human readable.
  • A question was raised whether human readable and machine readable should be separate discussions.
  • Use cases for metadata that perhaps have not been considered thus far include:
  • Another question was whether machine readable means not “natural language”, using PDF as an example.
    • PDF on its own is only usable by humans. If your PDF includes tables, you could use metadata to provide an explanation of the tables. One recommendation was that when you scrape a PDF, the result should have metadata.
    • The following use case was identified:
      • User A publishes a PDF file. User B reads the PDF file over the next 3 weeks and decides to create a table based on the PDF. You need metadata to refer back to the original source that you generated the table from.
      • From the CSVW Working Group: if User B scrapes a table from the PDF file, the PDF file is considered an external source, like a database. The concern of the CSVW Working Group is the form in which it is parsed and how the resulting tabular data is organized. From the Data Activity perspective this is an interesting use case, because the DWBP Working Group could describe the best practices for using metadata to associate the derived table with the original PDF file.
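
The PDF use case above can be sketched as a small metadata record published next to the derived table. This is only an illustration under assumptions: the notes do not name a metadata vocabulary, so the property names are borrowed from Dublin Core Terms, and the function name, URLs, and date in the usage lines are made up.

```python
import json


def derived_table_metadata(table_url, source_url, creator, created):
    """Describe a table that was derived from another document.

    The "dcterms:source" property points back to the original PDF, so a
    consumer of the table can find where the data came from.
    """
    return {
        "url": table_url,
        "dcterms:source": source_url,
        "dcterms:creator": creator,
        "dcterms:created": created,
    }


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    record = derived_table_metadata(
        table_url="https://example.org/data/table.csv",
        source_url="https://example.org/reports/report.pdf",
        creator="User B",
        created="2014-01-01",
    )
    # Serialise as a JSON sidecar file that travels with the table.
    print(json.dumps(record, indent=2))
```

A sidecar record like this is readable by both people and machines, which matches the requirement recorded earlier in this discussion.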


  • There should be metadata (resolved)
  • Metadata vocabulary, or values if vocabulary is not standardised, should be well-documented (resolved)

publishing and accessing versions of datasets

making controlled vocabularies accessible as URI sets

technical factors for consideration when choosing data sets for publication

technical factors affecting potential use of open data for innovation, efficiency and commercial exploitation

data preservation