Use Cases And Requirements

From Linked Data Platform
Revision as of 14:56, 9 July 2012 by Eric

1 Linked Data Platform Use Cases And Requirements

This is a working document used to collect use cases and requirements for consideration by the WG. The starting point comes from Linked Data Basic Profile Use Cases and Requirements.

1.1 Use Cases

1.1.1 Maintaining Social Contact Information

Many of us have multiple email accounts that include information about the people and organizations we interact with – names, email addresses, telephone numbers, instant messenger identities and so on. When someone’s email address or telephone number changes (or they acquire a new one), our lives would be much simpler if we could update that information in one place and have all copies of it updated automatically. In other words, those copies would all be linked to some definition of “the contact.” There might also be good reasons (like offline email addressing) to maintain a local copy of the contact, but ideally any copies would still be linked to some central “master.”

Agreeing on a format for “the contact” is not enough, however. Even if all our email providers agreed on the format of a contact, we would still need to use each provider’s custom interface to update or replace the provider’s copy, or we would have to agree on a way for each email provider to link to the “master”. If we look outside our own personal interests, it would be even more useful if the person or organization exposed their own contact information so we could link to it.

What would work in either case is a common understanding of the resource, a small set of formats, and guidance on how to access these resources. This would cover how to acquire a link to a contact and how to use that link to interact with the contact (including reading, updating, and deleting it), as well as how to easily create a new contact and add it to my contacts, and how a deleted contact is removed from my list of contacts. It would also be good to be able to add some application-specific data about my contacts that the original design didn’t consider. Ideally we’d like to eliminate multiple copies of contacts; additional valuable information about my contacts may be stored on separate servers, and there should be a simple way to link this information back to the contacts. Regardless of whether a contact collection is my own, shared by an organization, or all contacts known to an email provider (or to a single email account at an email provider), it would be nice if they all worked pretty much the same way.

1.1.2 Keeping Track of Personal and Business Relationships

In our daily lives, we deal with many different organizations in many different relationships, and they each have data about us. However, it is unlikely that any one organization has all the information about us. Each of them typically gives us access to the information (at least some of it), often through websites where we are uniquely identified by some string – an account number, user ID, and so on. We have to use their applications to interact with the data about us, however, and we have to use their identifier(s) for us. If we want to build any semblance of a holistic picture of ourselves (more accurately, collect all the data about us that they externalize), we as humans must use their custom applications to find the data, copy it, and organize it to suit our needs.

Would it not be simpler if at least the Web-addressable portion of that data could be linked to consistently, so that instead of maintaining various identifiers in different formats and instead of having to manually supply those identifiers to each one’s corresponding custom application, we could essentially build a set of bookmarks to it all? When we want to examine or change their contents, would it not be simpler if there were a single consistent application interface that they all supported? Of course it would.

Our set of links would probably be a simple collection. The information held by any single organization might be a mix of simple data and collections of other data, for example, a bank account balance and a collection of historical transactions. Our bank might well have a collection of accounts for each customer in its collection of customers.

1.1.3 System and Software Development Tool Integration

System and software development tools typically come from a diverse set of vendors and are built on various architectures and technologies. These tools are purpose-built to meet the needs of a specific domain (modeling, design, requirements, and so on). Tool vendors often view integration with other tools as a necessary evil rather than as a way to provide additional value to their end users. Even more of an afterthought is how these tools’ data (such as people, projects, and customer-reported problems and needs) integrates and relates to corporate and external applications that manage data such as customers, business priorities, and market trends. The problem can be contained by standardizing on a small set of tools or on tools from a single vendor, but this rarely happens, and when it does, it is usually only within small organizations. As these organizations grow in both size and complexity, they need to work with outsourced development teams and with diverse internal organizations that have their own sets of tools and processes. There is a need for better support of more complete business processes (system and software development processes) that span the roles, tasks, and data addressed by multiple tools. This demand has existed for many years, and the tools vendor industry has tried several different architectural approaches to address the problem. Here are a few:

  • Implement an API for each application, and then, in each application, implement “glue code” that exploits the APIs of other applications to link them together.
  • Design a single database to store the data of multiple applications, and implement each of the applications against this database. In the software development tools business, these databases are often called “repositories.”
  • Implement a central “hub” or “bus” that orchestrates the broader business process by exploiting the APIs described previously.

It is fair to say that although each of those approaches has its adherents and can point to some successes, none of them is wholly satisfactory. The use of Linked Data as an application integration technology has a strong appeal [OSLC].

1.1.4 Library Linked Data

The W3C Library Linked Data working group has a number of use cases cited in its Use Case Report [LLD-UC]. These use cases focus on the need to extract and correlate library data from disparate sources. Variants of these use cases that provide consistent formats, as well as ways to improve or update the data, would enable simpler methods both for efficiently sharing this data and for producing incremental updates without repeated full extractions and imports of data.

1.1.5 Municipality Operational Monitoring

Across various cities, towns, counties, and other municipalities, a growing number of services managed and run by municipalities produce and consume a vast amount of information. This information is used to help monitor services, predict problems, and handle logistics. Collecting, producing, and analyzing all this data effectively and efficiently requires a fundamental set of loosely coupled, standard data sources. A simple, low-cost way to expose data from the diverse set of monitored services is needed, one that integrates easily with the municipalities’ other systems that inspect and analyze the data. All these services have links and dependencies on other data and services, so a simple and scalable linking model is key.

1.1.6 Healthcare

Analyzing, diagnosing, and proposing treatment for patients requires physicians to draw on a vast amount of complex, changing, and growing knowledge. This knowledge needs to come from a number of sources, including physicians’ own subject knowledge, consultation with their network of other healthcare professionals, public health sources, food and drug regulators, and other repositories of medical research and recommendations.

Diagnosing a patient’s condition requires current data on the patient’s medications and medical history. In addition, recent pharmaceutical advisories about these medications are linked into the patient’s data. If the patient experiences adverse effects from medications, the physicians need to publish information about this to an appropriate regulatory source. Other medical professionals require access to both validated and emerging effects of the medication. Similarly, when geographical patterns around outbreaks reveal new symptoms and treatments, this information needs to reach a widely distributed and diverse set of medical information systems quickly. Also, reporting back to these regulatory agencies regarding new occurrences of an outbreak, including additional details of symptoms and causes, is critical in producing the most effective treatment for future incidents.

1.1.7 Metadata Enrichment in Broadcasting

Broadcasters are interested in metadata enrichment for many different use cases, for example to:

  • enrich archive or news metadata by linking facts, events, locations and personalities
  • enrich metadata generated by automatic extraction tools such as person identification, etc.
  • enrich definitions of terms in classification schemes or enumeration lists

This supports more effective information management and data/content mining (if you can't find your content, it is as if you don't have it, and you must either recreate or reacquire it, which is not cost-effective).

However, solutions are needed that facilitate linking to other data sources and that address issues such as discovery, automation, and disambiguation. Other important issues that broadcasters face are the editorial quality of the linked data, its persistence, and usage rights.

1.1.8 Aggregation and Mashups of Infrastructure Data

For infrastructure management (such as storage systems, virtual machine environments, and similar IaaS and PaaS concepts), it is important to provide an environment in which information from different sources can be aggregated, filtered, and visualized effectively. Specifically, the following use cases need to be taken into account:

  • While some data sources are based on Linked Data, others are not, and aggregation and mashups must work across these different sources.
  • Consumers of the data sources and aggregated/filtered data streams do not necessarily implement Linked Data themselves; they may be off-the-shelf components such as dashboard frameworks for composing visualizations.
  • Simple versions of this scenario are pull-based, where the data is requested from data sources. In more advanced settings, it should be possible, without a major change in architecture, to move to a push-based interaction model, where data sources push notifications to subscribers and provide different services that consumers can subscribe to (such as "informational messages" or "critical alerts only").

In this scenario, the important factors are to have abstractions that allow easy aggregation and filtering, are independent from the internal data model of the sources that are being combined, and can be used for pull-based interactions as well as for push-based interactions.
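The following is a minimal sketch of such an abstraction, in Python and using only the standard library. Everything in it is hypothetical: the `Source` and `FilteredAggregate` names, and the assumption that each observation is a plain dict with a `severity` key regardless of how the underlying source models it. What it illustrates is that one filter definition (here, a "critical alerts only" service) can serve both pull-based and push-based consumers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical uniform item shape: a plain dict, independent of each
# source's internal data model.
Item = Dict[str, str]
Predicate = Callable[[Item], bool]
Callback = Callable[[Item], None]


@dataclass
class Source:
    """A data source that can be read (pull) or subscribed to (push)."""
    name: str
    items: List[Item] = field(default_factory=list)
    subscribers: List[Callback] = field(default_factory=list)

    def pull(self) -> List[Item]:
        # Pull model: the consumer asks for the current data.
        return list(self.items)

    def subscribe(self, callback: Callback) -> None:
        # Push model: the consumer asks to be notified of new data.
        self.subscribers.append(callback)

    def publish(self, item: Item) -> None:
        # Store a new item and push it to every subscriber.
        self.items.append(item)
        for notify in self.subscribers:
            notify(item)


class FilteredAggregate:
    """Combines several sources behind one filter usable for pull or push."""

    def __init__(self, sources: List[Source], keep: Predicate) -> None:
        self.sources = sources
        self.keep = keep

    def pull(self) -> List[Item]:
        return [i for s in self.sources for i in s.pull() if self.keep(i)]

    def subscribe(self, callback: Callback) -> None:
        def forward(item: Item) -> None:
            # The same filter is applied to pushed items.
            if self.keep(item):
                callback(item)
        for s in self.sources:
            s.subscribe(forward)


# Usage: a "critical alerts only" view over two sources.
storage = Source("storage", [{"source": "storage", "severity": "info"}])
vms = Source("vms", [{"source": "vms", "severity": "critical"}])
critical = FilteredAggregate([storage, vms], lambda i: i["severity"] == "critical")

snapshot = critical.pull()          # pull-based read
alerts: List[Item] = []
critical.subscribe(alerts.append)   # push-based subscription
storage.publish({"source": "storage", "severity": "critical"})
storage.publish({"source": "storage", "severity": "info"})
```

A real deployment would push over the network (for example via webhooks or a message broker) rather than through in-process callbacks; the sketch only shows that the filtering abstraction need not change when the interaction model does.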

1.1.9 Data Sharing

In a downscaled context, where a central data repository is replaced by several smaller servers, it is necessary to be able to ship information among the servers. A device in the network may publish information on a server with another device as the target receiver. This message then has to be forwarded from server to server until the target is reached. A set of common standards for updating the content of containers and the description of the resources will be necessary to implement such a feature (the routing aspect is not considered here).

1.1.10 RESTful Interactions

REST's main focus is on building interactions around the transfer of state between clients and servers. For this to work, it must be possible to define and communicate expectations for certain state transfers. In this gist the discussion centers around book orders, but pretty much any interaction in a SOA context could be used: some interaction requires a specific state transfer between client and server, and there must be a way for this state transfer to be

  • captured in the context of a bigger interaction flow (what a media type defines on the web), and
  • expressed by means of expectations/constraints that apply to specific representations, so that a server can validate against those expectations/constraints and only accept those representations which satisfy them (what is often done with a combination of schemas and prose in the context of a media type's conversation).

"What Are Linked Data Services?" describes these requirements as the "service surface" that needs to be defined by any platform that provides some sort of services. This becomes particularly important in any kind of loosely coupled scenario, where servers/services cannot trust clients to always do the "right thing" or "behave cooperatively". Instead, the platform must provide support so that misbehaving and adversarial clients can be dealt with effectively. That means the "service contract" needs to define the service surface in terms of the state representations that are acceptable in the context of the use case addressed by the service, so that anything else can be easily rejected.

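As an illustration of a service surface defined by acceptable representations, here is a minimal Python sketch of a server-side check for the book-order example. The contract, the property names (`isbn`, `quantity`, `giftWrap`), and the choice of 400 Bad Request for rejected representations are all assumptions made for this example, and a representation is simplified to a flat property/value dict rather than an RDF graph.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Representation = Dict[str, Any]


@dataclass(frozen=True)
class Constraint:
    prop: str                      # property name the constraint applies to
    required: bool                 # must the property be present?
    check: Callable[[Any], bool]   # validity test for the property's value
    message: str                   # error reported when the check fails


# Hypothetical service contract for a book-order resource.
BOOK_ORDER_CONTRACT: List[Constraint] = [
    Constraint("isbn", True, lambda v: isinstance(v, str) and v != "",
               "isbn must be a non-empty string"),
    Constraint("quantity", True,
               lambda v: isinstance(v, int) and not isinstance(v, bool) and v > 0,
               "quantity must be a positive integer"),
    Constraint("giftWrap", False, lambda v: isinstance(v, bool),
               "giftWrap must be a boolean"),
]


def violations(rep: Representation, contract: List[Constraint]) -> List[str]:
    """Return every way in which rep falls outside the contract."""
    errors: List[str] = []
    for c in contract:
        if c.prop not in rep:
            if c.required:
                errors.append(f"missing {c.prop}")
        elif not c.check(rep[c.prop]):
            errors.append(c.message)
    # Anything not declared in the contract is outside the service surface.
    declared = {c.prop for c in contract}
    for extra in sorted(set(rep) - declared):
        errors.append(f"unexpected property {extra}")
    return errors


def accept(rep: Representation,
           contract: List[Constraint] = BOOK_ORDER_CONTRACT) -> Tuple[int, List[str]]:
    """Return an HTTP-style status: 201 if accepted, 400 with reasons if not."""
    errors = violations(rep, contract)
    return (201, []) if not errors else (400, errors)
```

Rejecting undeclared properties is a deliberately strict choice that matches the goal of rejecting "anything else"; a platform built on RDF's open-world model might instead ignore or preserve properties it does not recognize.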
1.1.11 Hosting POSTed Resources

<http://dev.example/bugs> is a factory resource for creating new bugs (well, documenting existing bugs). It accepts <Bug>s of the form:

 _:newBug a <Bug> ;
          <product> <> ;
          <issueText> "kills people" ;
          dc:author "Bob" ;
          dc:date "2012-07-04T23:54:00"^^xsd:dateTime .

In this context, "hosting" means replacing _:newBug with <http://dev.example/bug/7>. LDBP doesn't provide any guidance on that.
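One possible reading of "hosting" is sketched below in Python: the server mints a new IRI, substitutes it for the blank node in every triple of the POSTed graph, and returns the IRI (which an HTTP server would typically send in the Location header of a 201 Created response). The `BugFactory` class, the sequential numbering scheme, the tuple-of-strings triple representation, and the assumption that the POSTed graph contains exactly one blank node are all hypothetical. The relative IRI `<>` is deliberately left untouched; how it should be resolved is a separate question.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

# Terms are Turtle-style strings: "<iri>", "_:label", or "\"literal\"".
Triple = Tuple[str, str, str]


class BugFactory:
    """Hypothetical handler for POSTs to <http://dev.example/bugs>."""

    def __init__(self, base: str = "http://dev.example/bug/", first_id: int = 1) -> None:
        self.base = base
        self._ids: Iterator[int] = itertools.count(first_id)
        self.store: Dict[str, List[Triple]] = {}

    def host(self, graph: List[Triple]) -> str:
        """Mint an IRI for the graph's single blank node and store the result."""
        blanks = {t for s, _, o in graph for t in (s, o) if t.startswith("_:")}
        if len(blanks) != 1:
            raise ValueError("expected exactly one blank node (the new bug)")
        (label,) = blanks
        iri = f"<{self.base}{next(self._ids)}>"
        # Substitute the minted IRI for the blank node wherever it appears.
        hosted: List[Triple] = [
            (iri if s == label else s, p, iri if o == label else o)
            for s, p, o in graph
        ]
        self.store[iri] = hosted
        return iri  # would be sent back as the Location of the new resource


factory = BugFactory(first_id=7)
location = factory.host([
    ("_:newBug", "a", "<Bug>"),
    ("_:newBug", "<product>", "<>"),
    ("_:newBug", "<issueText>", '"kills people"'),
])
```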

1.2 Requirements

  1. Define a minimal set of RDF media-types/representations
  2. Define a limited number of literal value types
  3. Use standard vocabularies as appropriate
  4. Update resources, either RDF-based or not
  5. Use optimistic collision detection on updates
  6. Ensure clients are ready for resource format and type changes
  7. Apply minimal constraints for creation and update
  8. Add a resource to an existing container
  9. Remove a resource, including any associations with a container
  10. Get members of a container
  11. When getting members of a container, provide data about the members
  12. Get just data about a container, without all the members
  13. Handle a large number of members of a container, breaking up the representation into pages
  14. Allow pages to have order information for members, within a page and across all pages
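Requirement 5 is commonly met on the Web with HTTP entity tags: a client remembers the ETag it received on GET and sends it back in an If-Match header on update, and the server answers 412 Precondition Failed if the resource has changed in the meantime. The Python sketch below simulates that exchange in memory; the `ResourceStore` class, the content-hash ETags, and the policy of answering 428 Precondition Required (RFC 6585) to unconditional updates of existing resources are assumptions, not requirements stated here.

```python
import hashlib
from typing import Dict, Optional, Tuple


class ResourceStore:
    """In-memory stand-in for an HTTP server with ETag-based concurrency."""

    def __init__(self) -> None:
        self._bodies: Dict[str, str] = {}

    @staticmethod
    def etag(body: str) -> str:
        # Strong ETag derived from the content (quoted, as HTTP requires).
        return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'

    def get(self, uri: str) -> Tuple[int, Optional[str], Optional[str]]:
        """Return (status, body, etag)."""
        if uri not in self._bodies:
            return 404, None, None
        body = self._bodies[uri]
        return 200, body, self.etag(body)

    def put(self, uri: str, body: str,
            if_match: Optional[str] = None) -> Tuple[int, Optional[str]]:
        """Return (status, new etag); refuse updates based on stale state."""
        current = self._bodies.get(uri)
        if if_match is not None and (current is None or if_match != self.etag(current)):
            return 412, None  # Precondition Failed: resource changed or is gone
        if current is not None and if_match is None:
            return 428, None  # Precondition Required: policy against blind overwrites
        self._bodies[uri] = body
        return (201 if current is None else 200), self.etag(body)


# Two clients read the same version; only the first update wins.
store = ResourceStore()
created, _ = store.put("/contacts/alice", "tel: 555-0100")
_, _, tag = store.get("/contacts/alice")
first, _ = store.put("/contacts/alice", "tel: 555-0199", if_match=tag)
second, _ = store.put("/contacts/alice", "tel: 555-0142", if_match=tag)
```

The second client, whose ETag is now stale, is refused and would need to re-read the resource before retrying.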
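Requirements 13 and 14 can be illustrated with a minimal Python sketch of container paging. It sorts the full member set once by a chosen key (breaking ties by member URI so the order is total), slices out one page, and records each member's position in the overall ordering, which gives order information both within a page and across pages. Each member carries some data about itself (here just a name), in the spirit of requirement 11. The container URI, the `?page=` query parameter used for the next-page link, and the member properties are hypothetical.

```python
from typing import Any, Dict, List, Optional

Member = Dict[str, Any]


def page_of(container: str, members: List[Member], order_by: str,
            page: int, page_size: int) -> Dict[str, Any]:
    """Return one page of a container's members in a stable global order."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    # Sort the whole member set; the URI tie-break makes the order total,
    # so every page is cut from the same ordering.
    ordered = sorted(members, key=lambda m: (m[order_by], m["uri"]))
    start = (page - 1) * page_size
    chunk = ordered[start:start + page_size]
    next_page: Optional[str] = (
        f"{container}?page={page + 1}" if start + page_size < len(ordered) else None
    )
    return {
        "container": container,
        "orderBy": order_by,
        # rank = position across all pages, not just within this one
        "members": [dict(m, rank=start + i + 1) for i, m in enumerate(chunk)],
        "next": next_page,
    }


contacts = [
    {"uri": "<http://example.org/contacts/3>", "name": "Carol"},
    {"uri": "<http://example.org/contacts/1>", "name": "Alice"},
    {"uri": "<http://example.org/contacts/2>", "name": "Bob"},
]
p1 = page_of("http://example.org/contacts", contacts, "name", page=1, page_size=2)
p2 = page_of("http://example.org/contacts", contacts, "name", page=2, page_size=2)
```

Re-sorting on every request keeps the sketch short; a real server would more likely page over an index, and would need a policy for members added or removed between page requests.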