Supporting the Linked Data Consumer
This is a potential eGov Demo project to make consuming the Linked Open Data (LOD) cloud more user friendly and compelling. It is focused on the consumer of data: making data consumption more compelling is required to make the LOD story viable - why publish if we can’t consume? It is also focused on the web client user, not the programmer. Many of the capabilities we need exist and there is a lot to build on. We are not ignoring the great work that has already been done, such as Tabulator and Sig.ma, but suggesting how we may be able to put it together. We should be able to achieve a state where the simple act of publishing uses generic capabilities, configured with RDF resources, to bring any LOD resource into the consumable data cloud that is then directly usable by citizens. We are not suggesting all this can or should be done at once, but that we can build up a consumer-centric capability from a solid foundation. What comes after “Tabulator”?
Communicating the LOD Vision
Consider the situation where you have just explained the linked open data vision to a government executive. You have explained the “cloud picture”, all the data that is being made available in data.gov and data.gov.uk, and how data that is linked and accessible as web resources has more value. You start to see a glimmer of understanding in their eyes; they look excitedly at you and ask: “This sounds great, can you show me this linked data cloud?”
You cast your eyes down and say, “Well, not exactly.” There are several interesting applications coming online and a few great demos. You take them to this web site or that and show them all the hard work that has gone into LOD applications, but they just look like any web site serving data. You show them the links on data.gov, where the data can be downloaded, and some of the other dataset registries, but the difference between this and some CSV files is not evident. You explain that an application really has to be written to get at the data.
The glimmer starts to fade in the eyes of your executive. The vision that they started to see in their mind, of a sea of connected data, is not something for them; it is for programmers who don’t want to convert data from its native format. If they are thinking of publishing data they still need to create an application for the data, just like they did before. If they want to use data they have to go through a specific program or programs for each dataset. There is no cohesive web of data that they can see, touch and get value from directly. Your executive starts checking email on her BlackBerry.
So, what would we need to have available when someone says “This sounds great, can you show me this linked data cloud?” Our proposition is that we must make the LOD cloud a visible, usable resource for consuming the data that is being published. We should be able to point to one or more well-defined URLs where the LOD cloud comes alive, where the data is visible, browsable, searchable and usable for the average web user. While many of the parts and pieces of this exist, they do not seem to have been put together for the data consumer. The next section suggests a vision of what the user experience for consuming the semantic web may be.
There are multiple ways a user may “enter” the LOD cloud. They may want to browse or search for a particular kind of data, they may be following a link that someone gave them, or they may be using a LOD-aware application that links externally. In any of these cases they are probably starting with a web page URL and expect to see some data or a data search/query capability. Let’s consider a few use cases:
A LOD “Front page”
A LOD front page is an entry point into the LOD cloud. It does not provide the data consumer with any particular data point, but with a way to find the data they may be looking for. Some front pages may be general and some more targeted to a specific purpose. The UK HM Government page is an example of such an entry point: from here you can find and search for the data sets that may interest you. Wouldn’t it be great if there were more of these, and a few that were of very wide scope, where you could point someone and they could start using the semantic web directly? From such a front page you should be able to:
- Do generic searches and queries
- Browse for data sets (raw, processed or mashed up), applications that use that data, as well as other “front pages”
- Click on something of interest and see it
- Enter your credentials to see data, if required
- Express some preference as to which data providers you trust and the context of your interest
We have been careful to say “A” front page, not “the” front page. In keeping with web philosophy we don’t want to “lock down” the LOD cloud but keep it open. Of course some front pages will emerge as predominant and some technologies will become commonplace to provide them.
A LOD Entity List
Through a front page or other means you have clicked on an interesting data set. What do you see? If the data set is small, perhaps you see a summary list. If the data set is larger, perhaps you see a more targeted search/query/taxonomy capability which then returns a summary list. You may also see some general information about the list, such as who maintains it, and links to other resources relevant to the list, such as special analysis programs or viewers. Since we have a list of a particular kind of data, you may expect a faceted browsing capability and some analysis capability built in. In summary, the features you would expect on an entity list are:
- To be able to see a summary of the entities, with the most important properties in a grid, like Excel
- To be able to click on any entity and see it in more detail
- To be able to search the entities
- To execute canned queries and build queries
- To be able to follow taxonomies and facets
- To be able to understand the source(s) of the data and link to those sources
- To be able to link to other resources relevant to that data – particularly those that analyze and/or query the data, perhaps merged with other data sets.
- To merge other data sources that may provide additional information about entities in the data set
- To be able to see summary information in generic lists or in specifically styled presentations, such as a map or graph
- To be able to export data in various formats
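As one possible illustration of the summary grid described above, the following Python sketch builds a SPARQL SELECT query that lists entities of one type together with a few configured "most important" properties. The function name, the `?p0`, `?p1` variable scheme and the `http://example.org/` type URI are assumptions made for this example; they are not part of any existing tool.

```python
from typing import Sequence


def _check_iri(iri: str) -> str:
    """Reject characters that cannot appear inside a SPARQL <...> IRI reference."""
    if any(ch in iri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"not a usable IRI: {iri!r}")
    return iri


def build_entity_list_query(rdf_type: str, columns: Sequence[str], limit: int = 25) -> str:
    """Build a SELECT query: one row per entity, one variable per summary column."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # One variable per configured column: ?p0, ?p1, ...
    variables = [f"?p{i}" for i in range(len(columns))]
    lines = [
        "SELECT " + " ".join(["?entity", *variables]),
        "WHERE {",
        f"  ?entity a <{_check_iri(rdf_type)}> .",
    ]
    # OPTIONAL keeps an entity in the grid even when a column value is missing.
    for var, prop in zip(variables, columns):
        lines.append(f"  OPTIONAL {{ ?entity <{_check_iri(prop)}> {var} . }}")
    lines += ["}", "ORDER BY ?entity", f"LIMIT {limit}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical type URI; rdfs:label is the standard RDF Schema label property.
    print(build_entity_list_query(
        "http://example.org/ns#School",
        ["http://www.w3.org/2000/01/rdf-schema#label"],
        limit=10,
    ))
```

In a configurable node, the list of columns would presumably come from an RDF configuration resource for the entity type rather than being passed in by hand.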
A LOD Entity
Viewing a LOD entity should provide a well-organized view of information about something. Of course that view should also provide links to other information, which will bring up other entities or lists. Specific “important” information should be highlighted, other properties shown and, for a large entity, some hidden behind “more detail”. Where sub-entities are very tightly coupled, those entities should be visible “in line” with the focus entity, showing their most important properties. The result should be a very unsurprising view of a particular data entity, much like you would expect from Microsoft Access or any data-oriented web application. Generically the entities would look very much alike, but there should also be the capability to “style” any entity or property with a special-purpose presentation. Other capabilities that should be available from an entity are “faceted browsing” and related queries.
In summary an entity view should:
- Provide differentiation between the entity “summary”, properties that should always be visible and those that are hidden. It would be great if this could somehow be sensitive to the user’s context.
- Show nested data in summary or tabular form, with links to more
- Show where various data elements came from
- Provide faceted browsing to other entities
- Allow the entity to be used as the basis for a query
- Link to other, special purpose, viewers and analyzers for that entity
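The split between summary, always-visible and hidden properties could be driven by a small per-type configuration. The Python sketch below shows one way this might work; `ViewConfig`, `EntityView` and `layout_entity` are hypothetical names invented for this example, and in the proposal the configuration would itself be an RDF resource rather than a Python object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ViewConfig:
    """Hypothetical view configuration: which properties go in which tier."""
    summary: List[str]
    visible: List[str]


@dataclass
class EntityView:
    summary: Dict[str, List[str]] = field(default_factory=dict)
    visible: Dict[str, List[str]] = field(default_factory=dict)
    hidden: Dict[str, List[str]] = field(default_factory=dict)


def layout_entity(properties: Dict[str, List[str]], config: ViewConfig) -> EntityView:
    """Sort an entity's property values into summary, visible and hidden tiers."""
    view = EntityView()
    # Summary properties come first, in the order the configuration lists them.
    for prop in config.summary:
        if prop in properties:
            view.summary[prop] = properties[prop]
    # Always-visible properties, skipping any already shown in the summary.
    for prop in config.visible:
        if prop in properties and prop not in view.summary:
            view.visible[prop] = properties[prop]
    # Anything not mentioned in the configuration sits behind "more detail".
    for prop, values in properties.items():
        if prop not in view.summary and prop not in view.visible:
            view.hidden[prop] = values
    return view


if __name__ == "__main__":
    entity = {"label": ["Springfield School"], "pupils": ["420"], "lastInspected": ["2009-11-02"]}
    view = layout_entity(entity, ViewConfig(summary=["label"], visible=["pupils"]))
    print(view)
```

Making the view sensitive to the user's context would then amount to choosing a different configuration per context, leaving the layout logic unchanged.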
Aggregated or Analyzed Data
There is a limit as to what can be done generically, so the LOD consumer should be able to call on special-purpose analyzers and viewers, much like the demos done by the Tetherless World Project. These purpose-specific applications should be able to link back to the generic data views as their source.
Read/Write and Executable LOD
While read-only access to the LOD cloud provides a lot of value, the next obvious step is write access. A mantra of the semantic web is that “anyone can say anything about anything, anywhere”. A unique feature of the LOD design is that augmenting and linking data can be done outside of the original data source - making the data environment much more open. It should be as easy or easier to create and modify LOD data as it is to create or update a spreadsheet. Once modifications to data are done, the next step is to allow for executable models and business rules to be triggered.
Therefore a next step in the support for the linked data consumer is full read/write capability with the management of rights and data ownership that this implies.
A lot of the capabilities outlined above exist or are in development. What seems to be missing is bringing it together into a consumer-centric and cohesive experience. You should be able to experience the global web of data.
Tools and capabilities that can be leveraged include:
- voiD and RDF catalogs
- Tetherless World Project
- Various RDF triple stores and SPARQL engines
- Various implementations of federated query
- All the great work at data.gov.uk
- The EnACKing project and their mapping demonstration
- Semantic Pingback
Gaps and issues that would need to be addressed include:
- Much of the above requires programming, which should be replaced by convention and configuration
- The practice of linking to an entire data set is not practical; all data must be viewed through SPARQL
- Combined queries either need to be federated, brought into a common repository or both. We suspect we will need “trusted SPARQL endpoints” that we know provide the capabilities we need.
- Need a consistent registry (or registry of registries) for metadata about datasets, probably using voiD as the ontology
- Clean generic user interface with “plug in” capability to configure purpose-specific data widgets for types
- Configurations of data sets may need to be merged with other data sets to provide all the information the consumer may want. Links and vocabularies should not have to be tightly coupled with the raw data.
- Context-specific views which bind the right views of the data, and select desired data and UI widgets based on user and data context. [This is probably a stretch goal]
- Identity and sameAs processing
- Knowing what configuration of data sets is relevant for a specific view or query
- Binding URIs and URLs
- For write access - where to write various kinds of data based on the user and configuration
- Rights management
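To make the “identity and sameAs processing” item concrete, here is a minimal Python sketch that groups URIs connected by owl:sameAs links into identity clusters using a union-find structure. This is only one possible approach, not a description of an existing component; the function name and the `example.org` / `example.com` URIs are invented for the example, and issues such as trust in the source of each link are ignored.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into clusters, treating each (a, b) pair as 'a owl:sameAs b'."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        """Return the representative URI of the cluster containing uri."""
        parent.setdefault(uri, uri)
        root = uri
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path straight at the root.
        while parent[uri] != root:
            next_uri = parent[uri]
            parent[uri] = root
            uri = next_uri
        return root

    # sameAs is symmetric and transitive, so each link merges two clusters.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: Dict[str, Set[str]] = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


if __name__ == "__main__":
    links = [
        ("http://example.org/id/london", "http://example.com/place/London"),
        ("http://example.com/place/London", "http://example.org/geo/2643743"),
        ("http://example.org/id/paris", "http://example.com/place/Paris"),
    ]
    for cluster in same_as_clusters(links):
        print(sorted(cluster))
```

An entity view could then merge the properties of every URI in a cluster, while still recording which source each value came from.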
The “implementation” we have in mind is an out of the box LOD “NODE” that is free and open source. This would enable access to any published LOD data. We would deploy one of these nodes as our “front page” to the web of data. This would be modularized as follows:
- Overall architecture
- Generic but “configurable” and “pluggable” web UI, including:
  - Front pages and registries
  - Entity lists
  - Query UI (canned and built)
  - Entity pages
- Context and contextual binding, binding of URIs and URLs
- Back end that provides consistent SPARQL access to all data in the web of data. This will probably require caching a lot of the data that is not accessible via trusted SPARQL endpoints
- Federated search and query
- Identity and same-as management
- Metadata management
- Event notification to maintain currency between LOD nodes and with other clients
- Server stuff, including user management and authorities
- Content negotiation
- Supporting ontologies for configuration and metadata
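Content negotiation is what would let the same entity URI serve a readable page to a browser and RDF to a program. The Python sketch below picks a representation from an HTTP Accept header. It is a simplified illustration under stated assumptions: the `SUPPORTED` list and function names are hypothetical, and media type parameters other than `q` are ignored.

```python
from typing import List, Optional, Sequence, Tuple

# Media types the hypothetical node can serve, in server preference order.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type or pattern, q-value) pairs."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not give one
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type, q))
    return entries


def quality_for(offered: str, accepted: Sequence[Tuple[str, float]]) -> float:
    """Return the q-value of the most specific pattern matching the offered type."""
    major = offered.split("/")[0]
    for pattern in (offered, major + "/*", "*/*"):
        for media_type, q in accepted:
            if media_type == pattern:
                return q
    return 0.0


def negotiate(header: str, supported: Sequence[str] = SUPPORTED) -> Optional[str]:
    """Pick the supported type the client prefers; None if nothing is acceptable."""
    # An empty header is treated as "anything is acceptable".
    accepted = parse_accept(header) or [("*/*", 1.0)]
    best, best_q = None, 0.0
    for offered in supported:
        q = quality_for(offered, accepted)
        if q > best_q:  # strict comparison: ties go to the server's preference
            best, best_q = offered, q
    return best


if __name__ == "__main__":
    print(negotiate("text/turtle, text/html;q=0.5"))
    print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))
```

A browser that lists `text/html` first would get the human-readable entity page, while a client asking for `text/turtle` would get the underlying data from the same URI.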
The next step will be to look at the existing resources and chart a course.