2. Related Work and State of the Art

Expanded version of Chapter 2 in the W3C Incubator Group Report 15 April 2012

2.1 Decisions and Decision Making
2.2 Decision Support Systems
2.3 An Example Domain: Emergency Management
2.4 Semantic Web and Linked Data - The Basics
2.5 Engineering the Semantic Web
2.6 Semantic Web Applications

This section gives a brief overview of the areas related to the work of the incubator group. It does not aim to be complete with respect to the research state of the art; it is merely a brief introduction to some of the ideas that have been used as a basis for the incubator or that have come up during the incubator meetings. The initial inspiration came from traditional decision support research, as well as from applications in the emergency management domain, which are described in the following three sections.

One of the fundamental developments that have pushed this incubator forward, however, is the emergence of the Semantic Web, and in particular the endorsement by the W3C of a set of standards for the Semantic Web. The incubator has explored this as a foundation for developing an interchange format for decisions. A basic introduction to the Semantic Web and the related W3C standards can be found in Sections 2.4 and 2.5. The final section contains a by no means comprehensive description of Semantic Web applications related to decision support. These are the kinds of applications that have given this incubator food for thought with respect to both the requirements for a decision format and the possible applications that could be built using such a format.

2.1 Decisions and Decision Making

[Bennet2008, p.4] describes the direction of a decision as follows: "The decision direction is to change a currently unsatisfactory complex situation into a future satisfactory situation. [...] It is not possible to make just one decision to act with respect to a complex situation because there is no single action that will produce a solution. Rather, to move a complex situation to a desired complex situation requires a continuing process which must be built into a decision solution strategy that plans for a sequence of actions." This describes decision making as the process of planning and moving towards a desired complex situation. From this we can conclude that decisions are complex structures that can contain a number of steps or sub-decisions, each of which takes the decider one step closer to the goal situation. Decisions involve a certain amount of "guessing" [Bennet2008], i.e. estimating and predicting what the result of a certain decision will be. Ultimately, making a decision is about trusting one such guess: the one that predicts a situation that leads us on a path towards our goal.

Viewed in this way, a decision results in taking some action in order to change the situation, making it more similar to a desired goal situation. This means that as a starting point we have some (partial and uncertain) knowledge about our current complex situation, we have an (incomplete) description of the goal we are trying to reach, and we have some set of possible actions that can be taken in our current situation, each of which may or may not lead towards the goal.

[Holsapple2008, p.21-22] describes one of the sub-problems of decision making today as follows: "From a diverse mass of knowledge, the decision maker strives to identify the specific knowledge that is both relevant and important for the decision at hand." This indicates that one particularly important part of decision making is having access to knowledge, both about the current complex situation and about possible future situations that may result from a decision being made. As we are all aware, a vast amount of information is available today, and with the emergence of the Semantic Web [Berners-Lee2001] the intention is to provide information on the web that is ready to be used by software systems as well, as opposed to only human-readable web pages. In order to reuse such vast amounts of information, the Semantic Web introduces the possibility to describe the information using ontologies with formal semantics, to let software systems and users alike assess the relevance of the information for a certain task, based on its inherent meaning.

Another sub-problem is of course the actual decision making, i.e. the selection of a certain set of changes to the current complex situation. Usually even human decision makers do not know exactly what parameters and factors are used in a certain decision. Experience and previous encounters with similar situations seem to play an important role in human decision making [Bennet2008, Holsapple2008]. Under this assumption, we as humans have ways of unconsciously recording our past decisions as "patterns" that we can retrieve and apply again when similar situations occur if the previous outcome was successful, or avoid otherwise. In AI, researchers have for many years been trying to mimic such capabilities in software systems. Although overall success has been limited, most researchers agree that one important part is to have access to both procedural and domain knowledge, i.e. both information about how to make decisions and what to decide on. Among the vast information available on the Semantic Web, e.g. as Linked Data (see Section 2.4), there is a lot of information directly concerning decisions and decision making, but also a vast amount of information that could act as the basis for decision making. Currently, however, decisions are represented in arbitrary and sometimes even proprietary formats, and they are usually not explicitly linked to the information they were based on.

In [Holsapple2008] the authors distinguish between the traditional view of decision making and more complex, modern views involving additional aspects. The traditional view describes decision making as a three-step process: (1) identify the alternatives, (2) study the implications of each alternative, and (3) compare the alternatives based on criteria such as goals, purposes, constraints and pressures, and finally pick one of the alternatives as the decision. In addition to this, a decision may also be viewed as a piece of knowledge. This knowledge can be descriptive, i.e. describing some commitment to future actions, but it can also be procedural, i.e. describing how to actually perform those actions. This knowledge-based view leads to the conclusion that decision making is also a process of knowledge creation: we create new knowledge, a representation of the decision, based on a set of premises.
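The traditional three-step view can be illustrated with a minimal Python sketch. It is not taken from [Holsapple2008]: the weighted-sum comparison, the class name Alternative, the function choose, and the example routes and weights are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alternative:
    name: str
    # Step 2: estimated implications of this alternative, scored per
    # criterion on an assumed 0..1 scale (higher is better).
    scores: Dict[str, float]


def choose(alternatives: List[Alternative], weights: Dict[str, float]) -> Alternative:
    """Step 3: compare alternatives on weighted criteria and pick the best."""
    def total(alt: Alternative) -> float:
        return sum(w * alt.scores.get(criterion, 0.0)
                   for criterion, w in weights.items())
    return max(alternatives, key=total)


# Step 1: identify the alternatives (hypothetical evacuation routes).
alternatives = [
    Alternative("route_a", {"speed": 0.9, "safety": 0.4}),
    Alternative("route_b", {"speed": 0.6, "safety": 0.8}),
]
# Hypothetical criteria weights expressing that safety matters more than speed.
weights = {"speed": 0.3, "safety": 0.7}

decision = choose(alternatives, weights)
print(decision.name)  # route_b (0.74 versus 0.55 for route_a)
```

A weighted sum is only one of many possible comparison schemes. The point of the sketch is that the alternatives, the criteria and the chosen alternative are all explicit data, which is what makes the decision representable as a piece of knowledge.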

2.2 Decision Support Systems

Decision support systems attempt to support the user in making decisions on several levels. Some systems simply try to provide the right information, so that the user can make sense of it and make an informed decision. Other systems go further, and also try to perform some parts of the assessment of the information, or the prediction of future situations resulting from decisions, based on the available information. Depending on the nature of the decision-making process (for instance as described by [Byström1995]), a system may be able to completely automate the process if all the information needs are well specified and the criteria and procedure for decision making are well known. On the other hand, for a less well specified process the steps may not all be known, the information needed to make the decision may not be clear, and even the outcome of the process, i.e. what kind of decision is needed, may not be clear. Obviously, the latter case poses a much more difficult challenge for decision support systems, but it also increases the need for such systems in order for humans to make good decisions. While decision support systems on the lower levels of complexity are about reducing the workload of a user, e.g. through automating simple but tedious tasks such as making calculations and summarizing data, decision support in more complex cases is about guessing, predicting, and making sense of uncertain information.

In [Power2008] five general types of decision support systems (DSS) are distinguished:

Model-driven DSS
Data-driven DSS
Communications-driven DSS
Document-driven DSS
Knowledge-driven DSS

Model-driven DSSs operate on some model of reality, in order to optimize or simulate outcomes of decisions based on the data provided. In these systems the model is in focus, and can be accessed and manipulated by the decision maker in order to analyze a certain situation, while the amount of data is usually not large. A simple case could be a quantitative model for calculating the effect of some operation, e.g. in the financial domain. Data-driven DSSs, on the other hand, focus on the access and manipulation of large amounts of data, e.g. data warehousing systems tailored for certain tasks, or even more elementary systems such as file systems with search and retrieval capabilities. While data-driven DSSs focus on retrieving and manipulating data, document-driven DSSs use text or multimedia document collections as their basis of decision information. Document analysis and text retrieval systems are simple examples from this category.

Communications-driven DSSs, in contrast, focus on the interaction and collaboration aspects of decision making. Simple examples include groupware and video-conferencing systems that allow distributed and networked decision making. Finally, knowledge-driven DSSs are those that actually recommend or suggest actions to the users. Rather than just retrieving information relevant to a certain decision, or allowing for collaboration among decision makers, these systems try to perform some part of the actual decision making for the user through special-purpose problem-solving capabilities.

Although the simple examples given above may not give much hint as to where a decision representation format would be needed in such systems, it is at least clear that many of these systems use some sort of model of "relevance" when providing information to the user, or apply some criteria or special-purpose procedures to the information. Such notions might be more useful if they were made explicit. What does relevance mean in a specific case? What criteria were applied? How can we represent the fact that this information is used as the basis of a certain decision-making process? Can we learn from others by studying what information was used in other processes? For the more advanced systems, e.g. knowledge-driven DSSs, it soon becomes evident that the system could also benefit from some learning capabilities, e.g. keeping track of what previous decisions have been made, on what grounds, and with what outcome, as well as from the possibility to drill down into the motivation of a decision proposal.
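To make the questions above concrete, the following is a minimal sketch of what an explicit decision record could look like. It is not the format developed by the incubator; the class DecisionRecord, its fields, the function decisions_based_on and the "ex:" identifiers are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DecisionRecord:
    question: str                 # what had to be decided
    chosen: str                   # the alternative that was picked
    criteria: List[str]           # criteria that were applied
    # Identifiers of the information the decision was based on.
    evidence: List[str] = field(default_factory=list)
    outcome: Optional[str] = None  # filled in once the outcome is known


def decisions_based_on(records: List[DecisionRecord], source: str) -> List[DecisionRecord]:
    """Return the past decisions that used a given information source."""
    return [r for r in records if source in r.evidence]


# A hypothetical decision log with explicit grounds and outcomes.
log = [
    DecisionRecord("Which hospital?", "hospital_1",
                   ["bed availability"], ["ex:bed-report-17"], "successful"),
    DecisionRecord("Which route?", "route_b",
                   ["safety"], ["ex:road-status-3", "ex:bed-report-17"]),
]

for record in decisions_based_on(log, "ex:bed-report-17"):
    print(record.question, "->", record.chosen, "| outcome:", record.outcome)
```

Once criteria, evidence and outcome are recorded explicitly, questions such as "what information was used in other processes?" reduce to simple queries over the log.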

2.3 An Example Domain: Emergency Management

Sharing decisions across a broad and diverse set of users and systems is an important aspect of situational awareness in many domains. Perhaps no domain needs such capability more than the domain of emergency management. During an emergency, a diverse set of decisions must be shared among emergency managers and first responders from multiple organizations, jurisdictions, and functional capabilities. For example, decisions to route patients must be shared among the first responders in the field who are sending the patients, those who are doing the transport, the medical facilities that are receiving the patients, and the patients' families and relatives. As another example, decisions for establishing evacuation routes must be coordinated with police, fire, transportation agencies and the general public. Also, decisions being made in the field provide an important component of the situational awareness which must be shared both horizontally and vertically across all participants to ensure a synchronized and efficient response effort.

First responders and emergency managers do an amazing job under extremely difficult conditions using their current mechanisms for information sharing; however, their need for improved solutions is great. For example, there are paper-based Incident Command forms, which can be used to help provide an initial standardization of emergency information. There exist several examples of such incident command forms. There is also the Incident Command System (ICS), which can be used to organize responders into a hierarchical structure of sections (e.g. Operations, Planning, Logistics, Finance) and roles (e.g. Incident Commander, Public Information Officer, Safety Officer) commonly needed in an emergency. In addition, XML-based standards are being developed to further the effort toward improved sharing of emergency information. The Organization for the Advancement of Structured Information Standards (OASIS) has a family of emergency management standards known as the Emergency Data Exchange Language (EDXL). Wikipedia provides a good overview of EDXL. The EDXL family of standards is available at the OASIS website. These standards include the Common Alerting Protocol (CAP), Hospital AVailability Exchange (HAVE), Resource Messaging (RM) and the emerging Situation Reporting (SITREP) and Tracking of Emergency Patients (TEP). Together, these emergency management XML-based standards provide a more machine-friendly format for exchanging emergency information in a non-proprietary manner.

An important next step for improved sharing of emergency information is to utilize the Semantic Web standards: to enable conversion of the XML-based information into the more flexible RDF format as needed, and to show that the various kinds of emergency management information can be integrated for dynamic queries across datasets, for inferencing with underlying ontology support, and for a more expressive representation of policies. Initial steps in this direction are already being taken. For example, the OASIS Distribution Element (DE) is one of the EDXL standards designed to support packaging and addressing emergency management information for purposes such as routing. The standard has links to externally managed "lists" representing concepts such as "senderRole", "receiverRole" and "keywords". The vision is that these links would point to ontologies which encapsulate, in a machine-understandable format, the information sharing policies that jurisdictions and organizations put in place to define who can or should receive what types of information. Implied in these emergency management standards, whether the paper-based ICS forms, the XML-based EDXL standards or the RDF representations, is the underlying decision-making process that continues at all levels throughout an emergency.

2.4 Semantic Web and Linked Data - The Basics

Since the first vision of the Semantic Web [Berners-Lee2001], it has been a focus of research for many groups worldwide. Although the Semantic Web as such is more of an application area than a research field in itself, it has given rise to many new inventions and trends. The main idea of the Semantic Web is to extend the web of documents, i.e. the web that mainly contains documents suitable for human consumption, to also include data, represented in some structured format so that its semantics can be interpreted and used by software systems.

A basic facilitator for this development has been the standardization of Semantic Web languages, e.g. RDF, OWL and SPARQL, where W3C has been the main actor. RDF provides a basic graph-based data model consisting of triples, i.e. statements of the form subject-predicate-object, where the parts are resources identified through URIs. Resources may reside on the web, but URIs can also refer to real world things, which are in this way given an “online identity”. The object of a statement can also be a literal, e.g. a string or a number. RDF provides a common format for publishing data on the web, and it has an XML serialization that provides interoperability with previous web standards. Using the SPARQL query language, RDF graphs can be queried much like other structured data sources.
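The triple model can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not an RDF library and not SPARQL: the function match and the example.org resources are hypothetical, URIs and literals are both represented as plain strings for brevity, and only the FOAF namespace and its knows and name properties are real vocabulary terms.

```python
# A triple is a (subject, predicate, object) tuple. Passing None for a
# position in match() acts as a wildcard, loosely analogous to a variable
# in a SPARQL triple pattern.
EX = "http://example.org/"  # hypothetical namespace
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

triples = {
    (EX + "alice", FOAF_KNOWS, EX + "bob"),  # object is a resource (URI)
    (EX + "alice", FOAF_NAME, "Alice"),      # object is a literal
    (EX + "bob", FOAF_NAME, "Bob"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern, in sorted order."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Find the names of all resources that have one."
names = [o for (_, _, o) in match(triples, p=FOAF_NAME)]
print(names)  # ['Alice', 'Bob']

# "Whom does alice know?"
print(match(triples, s=EX + "alice", p=FOAF_KNOWS))
```

A real application would typically use an RDF library and a SPARQL engine instead. The sketch only shows that a graph is a set of statements and that a query is a pattern over those statements.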

However, RDF does not provide the means to express any domain semantics, e.g. a predicate that is part of a triple does not have any specific meaning in the sense of formal semantics. RDFS (RDF Schema) is a simple language that can be used to create a model of the data represented using RDF, i.e. to create ontologies. In computer science, an ontology is commonly defined as an explicit specification of a shared conceptualization [Gruber1993], which is usually interpreted as a collection of concepts and relations between those concepts, with their formal definitions expressed in some logic formalism. The “shared” keyword indicates that it should be shared by some domain or community, and could also be interpreted to mean that it should be available to those who need to interpret the data it models. RDFS provides a set of simple language primitives, such as the ability to express subclass relations in a taxonomy, and domains and ranges of properties. However, RDFS does not set any restrictions on the usage of the language, which means that automated reasoning becomes difficult. Additionally, its expressivity is too low to express a number of axioms that are common in many domains, e.g. disjointness of concepts and inverse relations.
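As a rough illustration of what subclass relations and property domains buy us, the following sketch applies three RDFS-style inference rules to a small set of triples until nothing new can be derived. It covers only a fragment of the RDFS semantics, uses abbreviated names such as "rdfs:subClassOf" as plain strings, and the "ex:" classes, property and individuals are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def rdfs_closure(triples):
    """Apply three RDFS-style rules until a fixpoint is reached:
    subclass transitivity, type propagation along subclass relations,
    and typing of subjects via property domains."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for (s, p, o) in inferred:
            if p == SUBCLASS:
                for (s2, p2, o2) in inferred:
                    # (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                    if p2 == SUBCLASS and s2 == o:
                        new.add((s, SUBCLASS, o2))
                    # (s2 type s), (s subClassOf o) => (s2 type o)
                    if p2 == RDF_TYPE and o2 == s:
                        new.add((s2, RDF_TYPE, o))
            elif p == DOMAIN:
                # (s domain o), (s2 s o2) => (s2 type o)
                for (s2, p2, o2) in inferred:
                    if p2 == s:
                        new.add((s2, RDF_TYPE, o))
        changed = not new <= inferred
        inferred |= new
    return inferred


data = {
    ("ex:Paramedic", SUBCLASS, "ex:FirstResponder"),
    ("ex:FirstResponder", SUBCLASS, "ex:Person"),
    ("ex:transports", DOMAIN, "ex:Paramedic"),
    ("ex:kim", "ex:transports", "ex:patient42"),
}

closure = rdfs_closure(data)
# kim is never explicitly typed, yet the schema lets us infer it.
print(("ex:kim", RDF_TYPE, "ex:Person") in closure)  # True
```

Note what the sketch cannot do: there is no way to state that two classes are disjoint or that one property is the inverse of another, which is the kind of expressivity gap described above.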

To address these limitations, OWL, the Web Ontology Language, has been a W3C Recommendation since 2004 (in 2009 the specification was extended to what is called OWL 2; here we will simply call it OWL). OWL builds on the semantics of a certain Description Logic, which has some desirable properties, for instance with respect to automated reasoning. OWL builds on RDFS in the sense that it extends the set of language primitives provided by RDFS, but it also restricts RDFS in order to allow for automated reasoning. The work performed by this incubator group has been based on these Semantic Web standards, in order to be able to exploit the emerging Semantic Web for decision making as well, and for representing decision information.

One effort in the context of the Semantic Web is particularly interesting from our perspective: the Linked Data initiative [Bizer2009]. Linked Data is a set of principles for publishing and reusing interlinked data on the web, by representing the data using RDF and identifying the data with URIs. Linked Data first emerged on a large scale through the Linking Open Data (LOD) project, which started to transform open datasets to RDF, publish them online, and subsequently interlink them. The LOD cloud is today a huge resource of interlinked datasets, which is commonly used by researchers and applications alike to provide data and background knowledge for a wide variety of tasks. We envision several possible usages of Linked Data relevant to this incubator: decisions can be based on this vast amount of data, but information about decisions and decision processes can also add to Linked Data and be the basis of new applications.

2.5 Engineering the Semantic Web

In order to exploit the benefits of the Semantic Web, e.g. exploiting all this data with well-defined semantics, the data first has to be published. The Linked Data initiative has provided a set of rules for publishing RDF data on the web, and there exist a number of tools for transforming different types of data sources into RDF triples. This process is commonly called triplification. However, there is less work so far on actually aligning the data produced to ontologies, i.e. vocabularies that formally express the semantics of the dataset. Some tools, such as Semion [Nuzzolese2010], provide a two-step process: first syntactically reengineering the data source into RDF, and subsequently refactoring it with respect to some ontology based on a set of alignment rules provided by the user. Such tools provide an opportunity to publish data in an appropriate format, i.e. RDF, that is also expressed using some formal model, i.e. an ontology. This is an important facilitator for the Semantic Web, since we cannot assume that all data published in the future will be directly provided in RDF format or expressed using an appropriate ontology.
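The two-step idea of triplification followed by refactoring can be sketched as follows. This is not Semion's actual interface or algorithm; the functions triplify and refactor, the "col:" predicate prefix, the example CSV data, and the example.org URIs are assumptions for illustration. Only the rdfs:label URI is a real vocabulary term.

```python
import csv
import io

# Hypothetical source data: one row per hospital.
CSV_TEXT = "id,name,beds\nh1,General Hospital,12\nh2,City Clinic,3\n"


def triplify(csv_text, base="http://example.org/row/"):
    """Step 1, syntactic reengineering: every non-key cell becomes one
    triple whose predicate is derived mechanically from the column name."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base + row["id"]
        for column, value in row.items():
            if column != "id":
                triples.append((subject, "col:" + column, value))
    return triples


# Step 2 input: user-provided alignment rules that map the mechanical
# predicates onto ontology properties (the second target is hypothetical).
ALIGNMENT = {
    "col:name": "http://www.w3.org/2000/01/rdf-schema#label",
    "col:beds": "http://example.org/onto#availableBeds",
}


def refactor(triples, alignment):
    """Step 2, refactoring: rewrite predicates according to the alignment;
    predicates without a rule are left unchanged."""
    return [(s, alignment.get(p, p), o) for (s, p, o) in triples]


aligned = refactor(triplify(CSV_TEXT), ALIGNMENT)
for triple in aligned:
    print(triple)
```

Keeping the two steps separate means the same raw triples can later be realigned to a different ontology by changing only the alignment rules.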

The second question is then: where do the ontologies come from? There are a few well-known and much reused vocabularies available online, e.g. the ones used by Linked Data, but for more specific domains or tasks the ontologies have to be constructed and published together with the data. The reengineering mentioned above also gives us the opportunity to reengineer, for instance, XML schemas into ontologies. However, such ontologies will of course never be more expressive than the XML schema language allows. Such reengineered vocabularies can therefore give us the link to align existing XML vocabularies to other information on the Semantic Web, but it is also an important task to produce native Semantic Web vocabularies, i.e. OWL ontologies.

Ontology Engineering is a popular field of research within the Semantic Web community. Numerous methodologies and tools exist for producing OWL ontologies (described on several websites as well as in [OE2009]), but many of the earlier methods have been tailored towards large monolithic ontologies, e.g. foundational ontologies like SUMO and DOLCE, or large domain or application ontologies tailored to one specific application. The Semantic Web, especially when focusing on Linked Data, needs smaller and more modular ontologies that describe some particular aspect of a dataset. An example success story is the FOAF (friend of a friend) ontology, which describes people, their interests, and their interrelations (through the foaf:knows property). Although originally developed to create a description of the social network of the web, it is now used in a multitude of applications where there is a need for describing people. This shows how a small vocabulary can be very useful precisely because it is small, easy to understand, and easy to reuse. By reusing it, the data expressed becomes immediately interoperable with a lot of other data on the web using the same vocabulary.

This leads us to believe that ontologies to express decisions and decision processes need to be similarly small and modular, so that they can easily be reused and so that users do not have to commit to a huge foundational ontology, but rather to a small vocabulary expressing exactly those concepts that are useful for their dataset or application. One effort that promotes this view of ontologies is the Ontology Design Patterns [Gangemi2009] (as collected in the ODP Portal). Ontology Design Patterns (ODPs) come in many flavors, e.g. logical patterns for expressing certain complex axioms in the OWL language, reengineering patterns for transforming certain types of resources into OWL, reasoning patterns for producing certain kinds of inferences based on the OWL semantics, and so on. One particularly interesting class of ODPs is the Content ODPs. Content ODPs are domain specific, as opposed to the logical patterns that only deal with language expressivity, which means that they propose actual modeling solutions to concrete problems (they could be compared to the catalogues of data model patterns that exist for the database domain [Hay1996]). Although Content ODPs in general are abstract and representation-independent solutions to common problems, they also come with example implementations in OWL. Such example implementations are reusable OWL building blocks, i.e. small well-documented ontologies that target certain well-defined modelling problems. With inspiration from the ODPs, the decision formats that this incubator is targeting could be very similar to Content ODPs, or could even become Content ODPs.

Methodologies for creating such small modular ontologies, and their interrelations, partly require a new way of viewing the ontology engineering process. Recently, methodologies have been suggested that incorporate this kind of scenario, e.g. the NeOn methodology [Gomez-Perez2009]. One particularly interesting part of the NeOn methodology is eXtreme Design (XD) [Presutti2009]. XD proposes an agile ontology engineering process, where problems are broken down into small sub-parts and each part is then addressed through an ontology module that realizes a small set of well-defined ontological requirements. The method also suggests heavy reuse of existing components, e.g. ODPs. This methodology has been an inspiration for the practical work of this incubator group when drafting the decision format.

2.6 Semantic Web Applications

Some Semantic Web applications are already closely related to decision support, i.e. they try to retrieve and/or make sense of the information available on the Semantic Web. The most substantial amount of data freely available today is the Linked Open Data, and many applications are making use of this data. There are simple browsers for finding and querying Linked Data, but also more advanced semantic recommender systems [Peis2008] that make use of Linked Data as a source for recommendations. One of the largest efforts in this area, which is now considered a reference for many other datasets, is DBpedia. DBpedia consists of data extracted from Wikipedia, but the data is aligned to an ontology, hence it can be queried and reasoned over using this ontology.

By looking at the submissions to the most recent Semantic Web challenge (a yearly competition for Semantic Web applications) it may be noted that a number of those systems could be classified as decision support systems. Some explicitly claim to target decision support, such as the InSciTe system [Lee2010] supporting R&D decision-makers, while others focus on more general data analysis and presentation tasks, such as the HadoopRDF system [Tian2010] for large scale RDF data analysis. Many systems are also aimed at making sense of, aggregating, using, and maintaining Linked Data, such as FalconsExplorer [Cheng2010] and Shortipedia [Vrandecic2010]. Additional efforts in the context of Linked Data are now also considering the modelling of statistical data, e.g. the Data Cube effort.