Uncertainty Reasoning for the World Wide Web

W3C Incubator Group Report 31 March 2008

This version: http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331/
Latest version: http://www.w3.org/2005/Incubator/urw3/XGR-urw3/
Editors: Kenneth J. Laskey, MITRE; Kathryn B. Laskey, George Mason University; Paulo C. G. Costa, George Mason University; Mieczyslaw M. Kokar, Northeastern University; Trevor Martin, University of Bristol; Thomas Lukasiewicz, Oxford University
Contributors: See Acknowledgments.

Abstract

This is the report of the W3C Uncertainty Reasoning for the World Wide Web Incubator Group (URW3-XG) as specified in the Deliverables section of its charter.

In this report we present requirements for better defining the challenge of reasoning with and representing uncertain information available through the World Wide Web and related WWW technologies.

Specifically the report:

identifies and describes situations on the scale of the World Wide Web for which uncertainty reasoning would significantly increase the potential for extracting useful information,
identifies methodologies that can be applied to these situations and the fundamentals of a standardized representation that could serve as the basis for information exchange necessary for these methodologies to be effectively used,
includes a set of use cases illustrating conditions under which uncertainty reasoning is important,
provides an overview and discusses the applicability to the World Wide Web of prominent uncertainty reasoning techniques and the information that needs to be represented for effective uncertainty reasoning to be possible,
includes a bibliography of work relevant to the challenge of developing standardized representations for uncertainty and exploiting them in Web-based services and applications.

The report identifies various areas which require further investigation and debate.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of Final Incubator Group Reports is available. See also the W3C technical reports index at http://www.w3.org/TR/.

This document was developed by the W3C Uncertainty Reasoning for the World Wide Web Incubator Group. It represents the consensus view of the group, in particular the editors of this document and those listed in the acknowledgments, on the issues regarding the representation of uncertainty for the World Wide Web.

Publication of this document by W3C as part of the W3C Incubator Activity indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. Participation in Incubator Groups and publication of Incubator Group Reports at the W3C site are benefits of W3C Membership.

Incubator Groups have as a goal to produce work that can be implemented on a Royalty Free basis, as defined in the W3C Patent Policy. Participants in this Incubator Group have made no statements about whether they will offer licenses according to the licensing requirements of the W3C Patent Policy for portions of this Incubator Group Report that are subsequently incorporated in a W3C Recommendation.

1. Introduction

1.1 The Problem of Uncertainty Representation and Reasoning

The World Wide Web community envisions effortless interaction between humans and computers, seamless interoperability and information exchange among web applications, and rapid and accurate identification and invocation of appropriate Web services. As work with semantics and services grows more ambitious, there is increasing appreciation of the need for principled approaches to representing and reasoning under uncertainty. In this Report, the term "uncertainty" is intended to encompass a variety of aspects of imperfect knowledge, including incompleteness, inconclusiveness, vagueness, ambiguity, and others. The term "uncertainty reasoning" is meant to denote the full range of methods designed for representing and reasoning with knowledge when Boolean truth values are unknown, unknowable, or inapplicable. Commonly applied approaches to uncertainty reasoning include probability theory, fuzzy logic, Dempster-Shafer theory, and numerous other methodologies.

To illustrate, consider a few web-relevant reasoning challenges that could be addressed by reasoning under uncertainty.

Uncertainty is an intrinsic feature of many of the required tasks, and a full realization of the World Wide Web as a source of processable data and services demands formalisms capable of representing and reasoning under uncertainty. Although it is possible to use semantic markup languages such as OWL to represent qualitative and quantitative information about uncertainty, there is no established foundation for doing so. Therefore, each developer must come up with his/her own set of constructs for representing uncertainty. This is a major liability in an environment so dependent on interoperability among systems and applications.

Apart from the interoperability issues caused by proprietary uncertainty representations, there are ancillary issues such as how to balance representational power against simplicity of uncertainty representations, which uncertainty representation technique(s) address uses such as the examples listed above, and how to ensure the consistency of representational formalisms and ontologies. None of these issues can be addressed in a principled way by current Web standards.

1.2 The Incubator Group Activity (XG)

Given the current state of the overall subject of uncertainty representation and reasoning for the WWW, it became clear that the best approach would be to create an Incubator Group, which provides an opportunity to share perspectives on the topic with all the advantages already cited in the W3C's Incubator Activity. Once the group was launched, the group's Charter was posted with all the details regarding the group's assignments, rules, and deliverables. Among the instructions, URW3-XG members were reminded that membership conditions include patent disclosure obligations as set out in Section 6 of the W3C Patent Policy and of their goal to produce work that can be implemented on a Royalty Free basis, as defined in the W3C Patent Policy.

2. Intent and Process

2.1 Scope of the URW3-XG

For the first objective, the URW3-XG has compiled a set of use case descriptions to expand on the examples noted above, and solicited and further developed other examples of the kinds of information management challenges that would benefit (and if available, have already benefited) most from mechanisms for reasoning under uncertainty.

For the second objective, the URW3-XG has identified methodologies that may be applied to address the use cases developed under the first objective and that show promise as candidate solutions for uncertainty reasoning on the scale of the World Wide Web. The combination of use cases and associated methodologies was examined to determine the most commonly required information and also the information that, while not common, may be especially important in particular situations.

The results of this one-year effort pursuing the above objectives are presented below as a total of 16 use cases, some of which include comprehensive information and details on how uncertainty reasoning would help to address issues that cannot be properly addressed with current deterministic approaches.

It is our expectation that the URW3-XG would recommend those aspects that are considered most important to be included in a standard representation of vagueness and uncertainty. The information below was written so as to avoid any suggestion that this group advocates the choice of any one uncertainty methodology over others. Instead, we complied with our directives of seeking to identify the type of information that would need to be saved as part of a general resource description and transmitted to a reasoning engine for useful processing. The recommended set does not include all identified information or address every use case in the initial collection. Instead, the entire use case collection below provides a basis for discussing whether the recommended set is sufficient to advocate further actions along the W3C Recommendation Track, either as a separate Recommendation or as part of other related work.

Finally, our scope was not to recommend a single methodology but to investigate whether standard representations of uncertainty can be identified that will support requirements across a wide spectrum of reasoning approaches.

2.2 Process

To achieve the objectives cited above, the XG was chartered, eventually comprising 25 participants from North and South America, Europe, and Australia, spread across time zones spanning 18 hours. It conducted over 20 telecons, each typically lasting between 90 and 120 minutes, plus partial face-to-face meetings held at the 6th ISWC (Busan, Korea) and the SUM conference in College Park, Maryland, USA. The telecons were supported by W3C resources (e.g., telecon bridge, IRC, RRSAgent), and their results and action items were all catalogued in online Minutes. Every telecon also had an agenda with items to be discussed, the first always being to approve the last telecon's minutes.

Most of the issues being discussed were posted on the group's website in the form of wiki pages, which were updated as new information became available and conclusions were drawn. Three months before the group's scheduled end, a draft of this report was posted in wiki format to enable participants to actively contribute to the material included here.

This Report is the major deliverable of the URW3-XG and describes the work done by the XG, identifies the elements of uncertainty that need to be represented to support reasoning under uncertainty for the World Wide Web, and includes a set of use cases illustrating conditions under which uncertainty reasoning is important. Along with the use cases (Section 5), this report also includes the Uncertainty Ontology (Section 3) that was developed during the discussions within our work, an overview of the applicability to the World Wide Web of numerous uncertainty reasoning techniques and the information that needs to be represented for effective uncertainty reasoning to be possible (Section 4), and a discussion on the benefits of standardization of uncertainty representation to the WWW and the Semantic Web (Section 6). Finally, it includes a Reference List of work relevant to the challenge of developing standardized representations for uncertainty and exploiting them in Web-based services and applications.

3. Uncertainty Ontology

A simple ontology was developed to demonstrate some basic functionality of exchanging uncertain information. In addition, this ontology was used to classify the use cases developed by this group with the intent of obtaining a relatively complete coverage of the functionalities related to uncertainty reasoning about information available on the World Wide Web.

While this ontology served the purpose of focusing discussion of the use cases, allowing us to show examples of annotation of uncertainty, it should be clear that this ontology is just a first iteration in a larger process. We recommend that an effort to develop a more complete ontology for annotating uncertainty be undertaken following this XG activity.

A short description of the ontology is presented below. First, the top level of the ontology is shown. Then the classes in the ontology are described. Finally, the relations among the classes are discussed.

3.1 Classes of the Uncertainty Ontology

OBS: The * means that the multiplicity constraint on the given property has not been specified, i.e., the property can have zero or more values for a given instance of the domain of the property.

3.1.1 Sentence

An expression in some logical language that evaluates to a truth-value (formula, axiom, assertion). It is then assumed that information will be presented in the form of sentences. So the uncertainty will be associated with sentences.

3.1.2 World

3.1.3 Agent

This is the class representing whoever makes the statement. It can be either a human or a computer agent (machine).

3.1.4 Uncertainty

3.1.5 Uncertainty Nature

This captures the information about the nature of the uncertainty, i.e., whether the uncertainty is inherent in the phenomenon expressed by the sentence, or it is the result of lack of knowledge of the agent.

Aleatory - the uncertainty comes from the world; uncertainty is an inherent property of the world.

3.1.6 Uncertainty Derivation

3.1.7 Uncertainty Type

Ambiguity - The referents of terms in a sentence about the world are not clearly specified and therefore it cannot be determined whether the sentence is satisfied, see also http://en.wikipedia.org/wiki/Ambiguity.

Empirical - a sentence about a world (an event) is either satisfied or not satisfied in each world, but it is not known in which worlds it is satisfied; this can be resolved by obtaining additional information (e.g., an experiment).

Randomness - sentence is an instance of a class for which there is a statistical law governing whether instances are satisfied.

Vagueness - there is not a precise correspondence between terms in the sentence and referents in the world, see also http://en.wikipedia.org/wiki/Vagueness.

Incompleteness - information about the world is incomplete, some information is missing.

3.1.8 UncertaintyModel

This class contains information on the mathematical theories for the uncertainty types. The specific types of theories include, but are not limited to, the following:

3.2 Properties of the Uncertainty Ontology

nature - uncertainty U has nature S (either aleatory or epistemic (lack of knowledge)).

uncertaintyModel - uncertainty U is modeled using the mathematical theory M.
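As a rough illustration of how the classes and properties above might fit together in a single annotation, the following Python sketch models an uncertainty annotation as a plain data record. The class and field names (UncertaintyAnnotation, sentence, agent, model) and the example values ("HumanAgent", "FuzzySets") are assumptions made for this sketch and are not the ontology's normative vocabulary; only the nature and type values are taken from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class UncertaintyNature(Enum):
    # The two natures named by the "nature" property.
    ALEATORY = "Aleatory"
    EPISTEMIC = "Epistemic"


class UncertaintyType(Enum):
    # The uncertainty types listed in Section 3.1.7.
    AMBIGUITY = "Ambiguity"
    EMPIRICAL = "Empirical"
    RANDOMNESS = "Randomness"
    VAGUENESS = "Vagueness"
    INCOMPLETENESS = "Incompleteness"


@dataclass(frozen=True)
class UncertaintyAnnotation:
    # Hypothetical record layout; field names are illustrative only.
    sentence: str                      # the statement the uncertainty is attached to
    agent: str                         # who makes the statement (human or machine)
    nature: UncertaintyNature
    uncertainty_type: UncertaintyType
    model: str                         # name of the mathematical theory (free text here)


def describe(annotation):
    """Render the annotation in the 'UncAnn - Class: Value' style used in the use cases."""
    return [
        f"UncAnn - Sentence: {annotation.sentence}",
        f"UncAnn - Agent: {annotation.agent}",
        f"UncAnn - UncertaintyNature: {annotation.nature.value}",
        f"UncAnn - UncertaintyType: {annotation.uncertainty_type.value}",
        f"UncAnn - UncertaintyModel: {annotation.model}",
    ]


# Illustrative instance; the chosen values are assumptions, not a classification
# endorsed by the report.
example = UncertaintyAnnotation(
    sentence="John is tall",
    agent="HumanAgent",
    nature=UncertaintyNature.EPISTEMIC,
    uncertainty_type=UncertaintyType.VAGUENESS,
    model="FuzzySets",
)

for line in describe(example):
    print(line)
```

A real exchange format would more likely be expressed in OWL or RDF; the record above is only meant to make the relationship between a sentence and its uncertainty properties concrete.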

4. Most Commonly Used Approaches to Uncertainty for the WWW

4.1 Probability Theory

Probability theory provides a mathematically sound representation language and formal calculus for rational degrees of belief, which gives different agents the freedom to have different beliefs about a given hypothesis. This provides a compelling framework for representing uncertain, imperfect knowledge that can come from diverse agents. Not surprisingly, there are many distinct approaches using probability for the Semantic Web. This section briefly mentions the most commonly used of these approaches. Appendix C gives a more detailed view of the subject.

4.1.1 Bayesian Networks (BNs)

Bayesian networks are a powerful graphical language for representing probabilistic relationships among large numbers of uncertain hypotheses. They have been applied to a wide variety of problems including medical diagnosis, classification systems, multi-sensor fusion, and legal analysis for trials. However, Bayesian networks are insufficiently expressive to cope with many real-world reasoning challenges found in the WWW. For example, a standard Bayesian network can represent the relationship between the type of an object, the object’s features, and sensor reports that provide information about the features, but cannot cope with reports from a large number of sensors reporting on an unknown number of objects, with uncertain associations of reports to objects. Section C.1.1 gives a detailed explanation of BNs within the context of the WWW.
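The single-object case mentioned above (object type, feature, sensor report) can be sketched as a three-node chain with inference by enumeration. This is a minimal illustration only: the variable names, the states, and every probability in the tables below are hypothetical values chosen for the example, not figures from this report.

```python
# Hypothetical conditional probability tables for the chain
#   object type -> feature -> sensor report
P_TYPE = {"truck": 0.3, "car": 0.7}
P_FEATURE_GIVEN_TYPE = {
    "truck": {"large": 0.9, "small": 0.1},
    "car": {"large": 0.2, "small": 0.8},
}
P_REPORT_GIVEN_FEATURE = {
    "large": {"big_echo": 0.8, "small_echo": 0.2},
    "small": {"big_echo": 0.1, "small_echo": 0.9},
}


def posterior_type(report):
    """Return P(type | report), summing out the unobserved feature."""
    joint = {}
    for obj_type, p_type in P_TYPE.items():
        # P(report | type) = sum over features of P(feature | type) * P(report | feature)
        likelihood = sum(
            p_feature * P_REPORT_GIVEN_FEATURE[feature][report]
            for feature, p_feature in P_FEATURE_GIVEN_TYPE[obj_type].items()
        )
        joint[obj_type] = p_type * likelihood
    normalizer = sum(joint.values())
    return {obj_type: value / normalizer for obj_type, value in joint.items()}


print(posterior_type("big_echo"))
```

The sketch handles exactly one object and one report; it has no way to express an unknown number of objects or uncertain report-to-object associations, which is the limitation the paragraph above describes.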

4.1.2 Probabilistic Extensions to Description Logics

Most of the probabilistic extensions aimed at the ontology engineering domain are based on description logics (DLs), which Baader and Nutt (2003, page 47) define as a family of knowledge representation formalisms that represent the knowledge of an application domain (the “world”) by first defining the relevant concepts and roles of the domain (its terminology), which represent classes of objects/individuals and binary relations between such classes, respectively, and then using these concepts and roles to specify properties of objects/individuals occurring in the domain (the world description).

There are several probabilistic extensions of description logics in the literature and some existing systems as well (e.g. Pronto), which can be classified according to the generalized description logics, the supported forms of probabilistic knowledge, and the underlying probabilistic reasoning formalism. These are covered in section C.1.2 in the appendices.

4.1.3 First-Order Probabilistic Approaches

In recent years, a number of languages have appeared that extend the expressiveness of probabilistic graphical models in various ways. This trend reflects the need for probabilistic tools with more representational power to meet the demands of real world problems, and is consistent with the need for Semantic Web representational schemes compatible with incomplete, uncertain knowledge. A clear candidate logic to fulfill this requirement for extended expressivity is first-order logic (FOL), which according to Sowa (2000, page 41) “has enough expressive power to define all of mathematics, every digital computer that has ever been built, and the semantics of every version of logic, including itself.” However, systems based on classical first-order logic lack a theoretically principled, widely accepted, logically coherent methodology for reasoning under uncertainty. Below are some of the approaches addressing this issue.

A workable solution for the Semantic Web requires a general-purpose formalism that gives ontology designers a range of options to balance tractability against expressiveness. Current research on SW formalisms using first-order probabilistic logics is still in its infancy, and generally lacks a complete set of publicly available tools. Examples include PR-OWL (Costa, 2005), which is an upper ontology for building probabilistic ontologies based on MEBN logic, and KEEPER (Pool and Aiken, 2004), an OWL-based interface for the relational probabilistic toolset Quiddity*Suite, developed by IET, Inc. A more detailed account of first-order probabilistic approaches is given in section C.1.3 of the appendices.

4.2 Fuzzy Logic

In contrast to probabilistic formalisms, which allow for representing and processing degrees of uncertainty about ambiguous pieces of information, fuzzy formalisms allow for representing and processing degrees of truth about vague (or imprecise) pieces of information. It is important to point out that vague statements are truth-functional, that is, the degree of truth of a vague complex statement (which is constructed from elementary vague statements via logical operators) can be calculated from the degrees of truth of its constituents, while the degree of uncertainty of an uncertain complex statement is generally not a function of the degrees of uncertainty of its constituents (Dubois and Prade, 1994).

Vagueness abounds especially in multimedia information processing and retrieval. Another typical application domain for vagueness, and thus for fuzzy formalisms, is natural language interfaces to the Web. Furthermore, fuzzy formalisms have also been successfully applied in ontology mapping, information retrieval, and e-commerce negotiation tasks. Section C.2 of the appendices treats the subject in greater detail.

4.2.1 Fuzzy Propositional Logics

Rather than being restricted to a binary truth value among false and true, vague propositions may also have a truth value strictly between false and true. One often assumes the unit interval [0, 1] as the set of all possible truth values, where 0 and 1 represent the ordinary binary truth values false and true, respectively. For example, the vague proposition “John is tall” may be more or less true, and it is thus associated with a truth value in [0, 1], depending on the body height of John.
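The "John is tall" example and the truth-functionality of fuzzy connectives can be sketched as follows. The linear ramp and its thresholds (160 cm and 190 cm) are assumptions chosen for illustration, and the min/max/complement operators are just one common choice of fuzzy connectives; the text above does not prescribe a particular membership function or operator family.

```python
def tall(height_cm, low=160.0, high=190.0):
    """Degree of truth of 'x is tall' as a linear ramp (thresholds are assumptions)."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


# One common choice of truth-functional connectives (min, max, complement).
def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_not(a):
    return 1.0 - a


john = tall(175.0)
# The degree of a complex statement is computed only from its constituents' degrees.
print(john, fuzzy_and(john, fuzzy_not(john)), fuzzy_or(john, fuzzy_not(john)))
```

Note that under these operators "tall and not tall" can have a degree strictly greater than 0, which is one visible difference from Boolean logic.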

4.2.2 Fuzzy Description Logics and Ontology Languages

Syntactically, as in the fuzzy propositional case, one then also allows for formulas that restrict the truth values of concept assertions, role assertions, concept inclusions, and role inclusions. Important new ingredients of fuzzy description logics often include fuzzy concrete domains, which provide fuzzy predicates on concrete domains, and fuzzy modifiers (such as “very” or “slightly”), which are unary operators that change the membership functions of fuzzy concepts.

4.3 Belief Functions

Belief functions are closely related to probabilities. Belief in a hypothesis is calculated as the sum of the masses of all sets it encloses. A belief function differs from a Bayesian probability model in that one does not condition on those parts of the evidence for which no probabilities are specified. This ability to explicitly model the degree of ignorance makes the theory very appealing, and it has been applied in areas such as inconsistency handling in OWL ontologies (Nikolov et al., 2007) and ontology mapping (e.g. Yaghlane and Laamari, 2007).
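The rule that belief in a hypothesis is the sum of the masses of all sets it encloses can be sketched in a few lines. The frame of discernment ("a", "b", "c") and the mass values are hypothetical; plausibility is included as the usual companion measure from Dempster-Shafer theory, although the text above does not discuss it.

```python
def belief(mass, hypothesis):
    """Sum the masses of all focal sets that are subsets of the hypothesis."""
    h = frozenset(hypothesis)
    return sum(m for focal, m in mass.items() if focal <= h)


def plausibility(mass, hypothesis):
    """Sum the masses of all focal sets that intersect the hypothesis."""
    h = frozenset(hypothesis)
    return sum(m for focal, m in mass.items() if focal & h)


# Hypothetical mass assignment over the frame {a, b, c}. The mass on the whole
# frame represents ignorance: evidence not committed to any specific subset.
MASS = {
    frozenset({"a"}): 0.5,
    frozenset({"b"}): 0.2,
    frozenset({"a", "b", "c"}): 0.3,
}

print(belief(MASS, {"a", "b"}), plausibility(MASS, {"a", "b"}))
print(belief(MASS, {"c"}), plausibility(MASS, {"c"}))
```

In this example nothing directly supports "c", so its belief is zero, yet its plausibility is positive because the uncommitted mass does not rule it out; the gap between the two values is how the theory makes ignorance explicit.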

5. Use Cases

The subsections below provide a brief description of the use case scenarios that have been studied by the incubator group and that corroborate its conclusions and recommendations. Detailed descriptions of all use cases can be found in Appendix A.

5.1 Discovery

Service oriented architecture (SOA) assumes a world of distributed resources which are accessible across a network. It is assumed that catalogues will exist for different classes of resources, such as SOA services, and the user will be able to search these catalogs for a desired item. Note, a class of items will be described using a list of relevant properties and items belonging to that class will be described by assigning values to these properties. For discovery to occur, there must be some alignment of or mediation between the list of properties used by those populating the catalogue and those searching it. There must also be some alignment of or mediation between the nonnumeric values assigned to the properties, both in describing items for the catalogue and defining the search criteria.

5.2 Wine Sweetness

The wine domain is very attractive to both experts and non-experts. The main reasons for its attractiveness are:

The “Wine Sweetness” use case focuses on one particular wine property, namely sweetness. The goal is to present the sweetness of a particular unknown wine to the user, according to his/her personal and possibly vague sweetness criteria. This is done by consulting a reference knowledge base, which could have a finer or coarser classification, or could use a terminology that is different from the one adopted by the user.

Furthermore, even when the same terminology is used, the interpretation of a vague classification label (such as “dry”) may differ between the creator of the knowledge base and the user who queries the knowledge base.

5.3 Belief Fusion and Opinion Pooling

A typical situation for web users is the need to aggregate information from multiple sources on the web. Issues related to uncertainty arise when the set of information acquired from multiple sources about the same fact is inconsistent (UncAnn - UncertaintyType: Inconsistency), or, more generally, when multiple information sources attribute different grades of belief (for example uncertain or mutually inconsistent beliefs) to the same statement (UncAnn - UncertaintyNature: Epistemic). If the user is not able to decide in favor of a single alternative (due to insufficient trust in the respective information sources, which can be seen as the default situation on the web), the aggregated statement resulting from the fusion of multiple statements is typically uncertain (UncAnn - UncertaintyNature: Epistemic; the types of uncertainty in this situation can vary, e.g., we could have UncAnn - UncertaintyType: Empirical).

A similar situation can be observed when a single information artifact on the web (e.g., a knowledge base, an ontology, a product rating, meta data, or even an ordinary web page) shall be created from multiple possibly contradictory information sources (e.g., expert opinions, existing ontologies, product recommendations, meta data, web pages...). The result needs to reflect and weight multiple input information appropriately, which typically yields uncertainty in case of heterogeneous input information.

There are several approaches to belief fusion. Examples for belief aggregation operators which can yield uncertain results are logarithmic and linear pools (LogOP, LinOP), and Bayesian Network Aggregation. One possible criterion for a successful fusion is the minimization of the divergence of the resulting probability distribution from the input probability distributions.
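The two pooling operators named above can be sketched as follows, using their standard definitions: a linear pool is a weighted arithmetic mean of the input distributions, and a logarithmic pool is a normalized weighted geometric mean. The source distributions and the equal weights are hypothetical, and the sketch assumes that all sources share the same outcomes and that at least one outcome has positive probability under every source.

```python
import math


def lin_op(distributions, weights):
    """Linear opinion pool: weighted arithmetic mean of the probabilities."""
    outcomes = distributions[0].keys()
    return {
        o: sum(w * d[o] for d, w in zip(distributions, weights))
        for o in outcomes
    }


def log_op(distributions, weights):
    """Logarithmic opinion pool: normalized weighted geometric mean."""
    outcomes = distributions[0].keys()
    raw = {
        o: math.prod(d[o] ** w for d, w in zip(distributions, weights))
        for o in outcomes
    }
    normalizer = sum(raw.values())
    return {o: value / normalizer for o, value in raw.items()}


# Two hypothetical sources that disagree about the same statement.
source_a = {"true": 0.9, "false": 0.1}
source_b = {"true": 0.6, "false": 0.4}
weights = [0.5, 0.5]  # equal trust in both sources (an assumption)

print(lin_op([source_a, source_b], weights))
print(log_op([source_a, source_b], weights))
```

The weights are where trust in the individual sources would enter; how to obtain them is outside the scope of this sketch.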

5.4 Ontology Based Reasoning and Retrieval from Large-Scale Databases

Consider a production company which has a knowledge base that consists of videos and images about persons (which usually are actors or models), TV spots, advertisements, etc. This company wants to publish its content on the Web so that advertisement or other production companies can use this knowledge base to look for either video footage like films, TV spots, etc or for persons to be employed for advertisements (casting). Each entry in the knowledge base contains a photo or a video, and some specific information like body and face characteristics, age or profession-like characteristic, in the case of persons, or video annotations in the case of spots or sceneries. The casting company has created a user interface for inserting the information of persons as instances of a predefined ontology or for performing semantic annotation of its multimedia content. It also provides a query engine to perform ontology-based search for its content through the web. A user can query the knowledge base providing information like the name, the height, the type of the hair (e.g. good quality, perfect, punk), the body (e.g. slim, athletic, plump), age range (e.g. 30s, 50s, MiddleAged), and more, in the case of persons, or information like the place the video spot is taking place (indoors vs. outdoors), the time of day (morning, afternoon, night), the landscape it depicts (mountain, sea), a sky being cloudy or not, a sea being wavy or not, and many more.

The knowledge engineer of the application has identified that applying a classical (Boolean) knowledge based system in the above scenario is very problematic due to the nature of the knowledge and information. For example, an attempt to assign a Boolean meaning to concepts like "30s", "MiddleAged", "Teen", "Kid", "Slim", "Tall", ... would lead to intuitive paradoxes. On the other hand, it is also very difficult to define other more expressive concepts, like the concept "StudentLooks" in terms of the already problematic concepts "Teen" and "Kid". Similarly, a sky being cloudy, a sea being wavy, or the time being morning or afternoon is also a matter of degree.

His solution to the problem is to use fuzzy ontologies where the membership of an individual (person) or image object to a Concept is annotated with a degree of membership. So one is able to classify "model1" as Tall, Thin, MiddleAged, to degrees 0.6, 0.9, 0.7, respectively, depending on the model's actual height, weight and age. Then, one is able to infer that "model1" is StudentLooking or AccademicLooking to specific degrees according to the definition of the concepts in the ontology and the interpretation of them according to the theory of fuzzy ontologies. Interestingly, the developed system also provides an easy and natural way to provide end-users with rankings in the query results, which is not easily supported by Boolean models, and moreover allows end-users to specify preferences and weights over the atoms (ingredients) of their queries, thus allowing for far more expressivity.
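The ranking of query results by degree of membership can be sketched as follows. The degrees for "model1" are the ones given in the text; "model2", "model3" and their degrees are hypothetical, and interpreting a conjunctive query with the minimum of the constituent degrees is one common fuzzy semantics assumed here, not necessarily the one used by the system described.

```python
# Degrees of membership of individuals in fuzzy concepts.
KB = {
    "model1": {"Tall": 0.6, "Thin": 0.9, "MiddleAged": 0.7},  # degrees from the text
    "model2": {"Tall": 0.8, "Thin": 0.4, "MiddleAged": 0.2},  # hypothetical
    "model3": {"Tall": 0.3, "Thin": 0.7, "MiddleAged": 0.9},  # hypothetical
}


def query_degree(individual, concepts):
    """Degree to which an individual satisfies all queried concepts (min semantics)."""
    degrees = KB[individual]
    return min(degrees.get(concept, 0.0) for concept in concepts)


def rank(concepts):
    """Return (individual, degree) pairs sorted from best to worst match."""
    scored = [(name, query_degree(name, concepts)) for name in KB]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank(["Tall", "Thin"]))
```

A Boolean knowledge base could only return the set of individuals that satisfy the query; here every individual receives a degree, so an ordering of the results falls out directly.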

5.5 SOA Execution Context

As defined in the OASIS Reference Model for Service Oriented Architecture (SOA-RM), the execution context of a service interaction is the set of infrastructure elements, process entities, policy assertions and agreements that are identified as part of an instantiated service interaction, and thus forms a path between those with needs and those with capabilities.

As discussed in SOA-RM, the service description (and a corresponding description associated with the service consumer and its needs) contains information that can include preferred protocols, semantics, policies and other conditions and assumptions that describe how a service can and may be used. The participants (providers, consumers, and any third parties) must agree and acknowledge a consistent set of agreements in order to have a successful service interaction, i.e. realizing the described real world effects. The execution context is the collection of this consistent set of agreements.

5.6 Recommendation

Recommender systems form a rapidly growing category of web-based system. A recommender system takes input from a user in the form of a query or an exemplar of the kind of item the user seeks, and returns recommendations for information or products. For example, the user might input a list of keywords and the system would return a list of recommended books, articles and/or web sites. The user might input one or a few movies, and the system might return a list of suggested movies for the user to view. Many e-commerce sites employ recommender systems to suggest products that customers might want to purchase. Another well-known example for this use case is the search for web pages using a search engine.

This use case discusses uncertainties that typically occur in the context of recommender systems or recommendations generated using other technical means (e.g., agents). The main scenario is as follows: A single or multiple recommendation searcher(s) express(es) her/their preferences in a machine readable format. A recommender system then combines a set of recommendations (obtained by a number of agents or other recommender systems) into an aggregated recommendation and ranking. For example, a user might input a movie, and the system would form its recommendation by aggregating recommendations provided by consumers who have seen the movie.

In order to enable formal inference to be carried out on the set of recommendations, the semantics of recommendation needs to be cleanly defined and an appropriate formal framework for the representation of recommendations is required. Also, an ability is needed to express preferences, scales and rankings in a formal way.

5.7 Extraction / Annotation

The motivating situation is a user (or a web service) that wants a web-scale overview of available information, e.g. an overview of all car dealers or of online shops selling notebooks. The advantage would be the possibility of comparing different market offers. Another application is a competitor tracking system.

The main problem is the size of the data and the fact that these data are mainly designed for human consumption.

The solution is extraction and annotation tools. There are many annotation tools linked on the SW Annotation & Authoring Website, mainly using proprietary uncertainty representations (or built-in uncertainty handling). Here uncertainty annotation of results would be especially helpful.

Assume that a user is looking for notebooks and we would like to provide machine support for his/her search. A typical statement which is a subject of uncertainty assignment in this use case is: (UncAnn - Sentence) An HTML-coded web page with URL contains information which, according to an ontology O1 (UncAnn - World: DomainOntology) about notebooks, can be expressed by an RDF triple (ntb1, O1:has_priceProperty, 20000). The agent producing this statement is a machine agent (UncAnn - Agent: MachineAgent), specifically an inductive agent (UncAnn - Agent: MachineAgent:InductiveAgent). For extensions of concepts see a finer grained version of Uncertainty Ontology.

The uncertainty nature of this statement is (UncAnn - UncertaintyNature: Epistemic:MachineEpistemic), and the uncertainty type is usually (UncAnn - UncertaintyType: Empirical:Randomness). Instances used for training an extraction tool (UncAnn - World:DomainOntology:Instances) are web pages. The uncertainty model is usually complicated (a mixture of HTML structure, regular expressions, annotation ontology and similarity measures) and a combination of several models, typically (UncAnn - UncertaintyModel:CombinationOfSeveralModels:ProbabilityAndFuzzySetsCombinationModels). Depending on this, the evidence for this uncertainty statement (UncAnn - World:DomainOntology:Instances:Evidence) is the precision and recall on this training set.

5.8 Soft Shopping Agent

Suppose a car-selling web site offers cars and we would like to buy one. Descriptions of the cars are stored in databases and some ontology encodes information about the domain. Now suppose that we would prefer to pay around €11000 and that the car should have fewer than 15000 km on the odometer. Also, if there are leather seats then we would like to have air conditioning, the color is preferably blue, and the car should not be old.

Of course, most of our constraints, e.g. on price and kilometers, are not crisp: we may still accept, e.g., a car that costs €11200 and has an odometer reading of 16000 km. Hence these constraints are rather vague (fuzzy), and we may model them by means of so-called fuzzy membership functions. We may also give preference weights to our requirements.
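A minimal sketch of such membership functions follows, in Python. It treats both constraints as soft upper limits; the points at which satisfaction drops to zero (€12000 and 20000 km) and the preference weights are assumptions chosen for illustration, since the use case does not specify them.

```python
def at_most(value, fully_ok, not_ok):
    """Left-shoulder membership: 1 up to fully_ok, falling linearly to 0 at not_ok."""
    if value <= fully_ok:
        return 1.0
    if value >= not_ok:
        return 0.0
    return (not_ok - value) / (not_ok - fully_ok)


def price_ok(price_eur):
    # ASSUMPTION: fully satisfied up to EUR 11000, unacceptable from EUR 12000.
    return at_most(price_eur, 11000, 12000)


def mileage_ok(km):
    # ASSUMPTION: fully satisfied up to 15000 km, unacceptable from 20000 km.
    return at_most(km, 15000, 20000)


def buyer_satisfaction(price_eur, km, w_price=0.6, w_km=0.4):
    """Weighted combination of the two degrees; the weights are hypothetical."""
    return w_price * price_ok(price_eur) + w_km * mileage_ok(km)


degree = buyer_satisfaction(11200, 16000)
```

Under these assumptions the car costing €11200 with 16000 km on the odometer is not rejected outright, as it would be under crisp constraints, but receives a degree of satisfaction below 1.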

On the other hand, the seller may offer a discount on the car's catalogue price, but the bigger the discount, the less satisfied the seller is. For instance, for the sale of a Mazda3, the seller may consider it optimal to sell above €15000, but can go down to €13500 with a lesser degree of satisfaction.

For each car there will be an optimal price at which it can be sold, namely the price that maximises the product of the buyer's degree of satisfaction and the seller's degree of satisfaction. This is the so-called Nash equilibrium of the matching. Each car thus gets an optimal degree of joint buyer/seller satisfaction.
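The optimal price can be illustrated with a short Python sketch that searches candidate prices for the one maximising the product of the two degrees of satisfaction. The seller's bounds (€13500 and €15000) come from the Mazda3 example; the assumption that the seller's satisfaction is zero at €13500, the linear shapes, and the buyer's bounds are hypothetical.

```python
def seller_satisfaction(price):
    """Seller: 0 at or below EUR 13500, rising linearly to 1 at EUR 15000 and above."""
    return min(1.0, max(0.0, (price - 13500) / (15000 - 13500)))


def buyer_satisfaction(price):
    """Buyer (ASSUMPTION): 1 up to EUR 13000, falling linearly to 0 at EUR 16000."""
    return min(1.0, max(0.0, (16000 - price) / (16000 - 13000)))


def joint_degree(price):
    """Product of buyer and seller satisfaction at a given price."""
    return buyer_satisfaction(price) * seller_satisfaction(price)


# Search whole-euro prices over the range where either party has any satisfaction.
candidate_prices = range(13000, 16001)
best_price = max(candidate_prices, key=joint_degree)
best_degree = joint_degree(best_price)
```

With these particular functions the search settles on €14750, the midpoint between the price at which the seller's satisfaction vanishes and the price at which the buyer's does. A simple grid search is used here only for clarity; any optimisation method could be substituted.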

From the buyer's perspective, the buyer asks for the top-k cars and their optimal prices, ranked by the optimal degree of satisfaction.

From the seller's perspective, the seller may ask for the top-k buyers for a given car and their optimal prices, ranked by the optimal degree of satisfaction.
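Once each car has an optimal price and degree, the top-k query reduces to ranking. The following Python sketch shows the buyer's side; the offers and their degrees are made-up values, and the same function would serve the seller's side with buyers in place of cars.

```python
import heapq

# ASSUMPTION: (car, optimal price in EUR, optimal degree of satisfaction),
# with illustrative values standing in for the result of the matching step.
offers = [
    ("Mazda3", 14750, 0.35),
    ("car_b", 11200, 0.80),
    ("car_c", 12900, 0.10),
    ("car_d", 10900, 0.65),
]


def top_k(offers, k):
    """Return the k offers with the highest optimal degree, best first."""
    return heapq.nlargest(k, offers, key=lambda offer: offer[2])


best_two = top_k(offers, 2)
```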

5.9 A Chain from the Web to the User

To get information from the web to the user we have to use a chain of tools: typically web crawling, web data extraction, middleware transformation, user querying and delivery of the answer. There are several use cases dealing with particular problems of uncertainty along such a chain. Usually there is a middleware connecting those.

Our approach is to view the whole chain of models, methods and tools from the web to the user, and in particular to handle the combination of uncertainty along this chain (UncAnn - UncertaintyModel: could be a combination of several models).

5.10 Appointment Making

This use case was inspired by the 2001 Scientific American article, The Semantic Web, by Berners-Lee, Hendler and Lassila. The article describes a scenario in which Lucy and her brother Pete must schedule their mother for a sequence of visits to a physical therapist. They agree to share the chauffeuring, and Lucy tasks her Semantic Web agent to set up the appointments:

It is clear that many uncertainties arise in handling this classic Semantic Web use case. For example, both the provider's and the consumer's schedules may be uncertain, and in traffic-clogged metropolitan areas, the amount of time it takes to get from the consumer's location to the place where the service is rendered may be highly uncertain.

5.11 User Preference Modeling for top-k Answers

This is in a sense a generalization of some aspects of the Discovery use case. Given a catalogue of items populated by some extraction tool (see the use case about extraction) and a user’s criteria and/or multicriterial utility function for items potentially listed in the catalogue, retrieve the best (top-k) matches.

Usually, the main problem is to learn user preferences. This can be done either by implicit information collection (system tracks user behavior, click streams, …) or by explicit information collection (system poses questions, user answers). Sometimes a recommender system finds similar users (UncAnn - UncertaintyModel:SimilarityModels). Another problem is effective retrieval of search results ordered by these preferences (usually top-k answers suffice).

Like the results of any data mining procedure, the results of such user preference mining will be uncertain.

A typical sentence that is subject to uncertainty assignment is: (UncAnn - Sentence) User1 most prefers item1 (the list of the top-k most preferred items for User1 consists of item1, ..., itemk).

5.12 Ontology Mediated Multimedia Information Retrieval

Suppose we want to devise an ontology-mediated multimedia information retrieval system, which combines logic-based retrieval with multimedia feature-based similarity retrieval. An ontology layer may be used to define (in terms of a Semantic Web-like language) the relevant abstract concepts and relations of the application domain, while a content-based multimedia retrieval system is used for feature-based retrieval. We would like to be able to make queries such as

5.13 Buying Speakers

The main point of this use case is to show that in some cases one needs to combine different kinds of uncertainty. In this particular use case two types of uncertainty are considered: Randomness and Vagueness.

The scenario includes a customer who is interested in purchasing a set of speakers, but the question is (1) whether to go to a store today or wait until tomorrow to buy speakers, (2) which speakers to buy and (3) at which store. Customer is interested in two speaker features: wattage and price. Customer has a valuation formula that combines the likelihood of availability of speakers on a particular day in a particular store, as well as the two features. The features of wattage and price are fuzzy. Optionally, Customer gets the formulas from CustomerService, an ontology based Web service that collects information about products, stores, statistics, evaluations.

It is assumed that there is a known probability distribution on the availability of a particular speaker type in particular stores on a particular day in the future. It is also assumed that both the customer's agent and the CustomerService agent share the same Uncertainty Ontology. The customer's agent issues a query (a sentence) using terms from the Uncertainty Ontology: Sentence. It is a complex sentence consisting of three basic sentences: one related to the availability, one to the wattage and one to the price of speakers. Each of these sub-sentences will have an Uncertainty associated with it. The uncertainty type related to the availability of a particular speaker type in the stores is UncAnn - UncertaintyType: Empirical. The uncertainty nature is UncAnn - UncertaintyNature: Aleatory. The uncertainty model is UncAnn - UncertaintyModel: Probability. The customer has (or obtains from CustomerService) definitions of the features of wattage and price in terms of fuzzy membership functions. For wattage, Customer has three such functions: weak, medium and strong. These are trapezoid-shaped membership functions. Similarly, for price Customer has three such membership functions: cheap, reasonable and expensive.
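The combination of the two kinds of uncertainty can be sketched in Python as follows. The trapezoid breakpoints, the availability probabilities and, in particular, the valuation formula (availability probability multiplied by the minimum of the two fuzzy degrees) are assumptions made for illustration; the use case does not state the customer's actual formula.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def medium_wattage(watts):
    # ASSUMPTION: breakpoints of the "medium" wattage function.
    return trapezoid(watts, 40, 60, 100, 120)


def reasonable_price(price):
    # ASSUMPTION: breakpoints of the "reasonable" price function.
    return trapezoid(price, 100, 150, 250, 300)


def valuation(p_available, watts, price):
    """ASSUMED formula: availability probability times the weaker fuzzy degree."""
    return p_available * min(medium_wattage(watts), reasonable_price(price))


# Hypothetical (store, day, speakers) options with availability probabilities.
options = {
    ("StoreA", "today", "speakers_x"): valuation(0.9, 50, 200),
    ("StoreB", "tomorrow", "speakers_y"): valuation(0.6, 80, 275),
}
best_option = max(options, key=options.get)
```

The probabilistic part (availability) and the fuzzy part (wattage and price) stay separate until the final valuation, which is where a shared Uncertainty Ontology would need to tell the two agents what kind of number each value is.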

In the end, the customer gets necessary information about the availability and types of speakers from stores. This information is sufficient for the customer to compute the required metric and to make the decision on which speakers to buy, where and when.

5.14 Healthcare and Life Sciences

The entire Healthcare and Life Sciences spectrum involves the creation and manipulation of uncertain information and knowledge. A collection of use cases is presented, characterized by a simple taxonomy.

5.14.1 Hypothesis Uncertainty

Some examples of Uncertainty in the context of Hypothesis Generation and Validation are enumerated below:

5.14.2 Interpretation/Classification Uncertainty

5.14.3 Prediction-oriented Uncertainty

Some examples of Uncertainty in the context of predicting some phenomena based on currently available information are enumerated below:

5.14.4 Belief oriented uncertainty

Some examples of Uncertainty in the context of believing (or not believing) certain hypotheses and theories are enumerated below.

5.14.5 Data Source based Uncertainty

Some examples of Uncertainty in the context of trusting various data sources are enumerated below.

5.14.6 Data Uncertainty

Some examples that illustrate the inherent uncertainty of the data generated in the Healthcare and Life Sciences are enumerated below.

6. Benefits of Standardization

6.1 Where use cases imply standardization benefits

We can consider the use cases above as processes in which a consumer of information makes a request to a provider (or multiple providers) of web-accessible information or services, and receives a response (or multiple responses).

The use cases illustrate several examples where uncertainty arises during this interaction and there are a number of topics that are common across use cases. Specifically, we can sub-divide into three areas - the producer’s specification of what can be provided, the consumer’s request (description of what is wanted) and the result. Taking these in turn:

6.1.1 Uncertainty in Provider's Specification

This relates primarily to the provider’s descriptors (i.e. properties used to describe the topic / item / service provided) and the values assigned to these descriptors. Such values may be based on perception rather than measurement (for example, a picture of someone with an ‘athletic physique’), or on overlapping categories where an item can belong to multiple categories at different membership levels (e.g. a film could belong strongly to the genre ‘comedy’ and weakly to the genre ‘adventure’).

Additional uncertainty may arise where the provider makes assertions related to the use of the information or service provided. Standardization could assist (for example) in determining intersection with similar assertions by the consumer, e.g. privacy policies.

6.1.2 Uncertainty in Consumer's Request

The provider has to deal with cases of incomplete and/or inconsistent information in the request from a consumer. Further uncertainty may arise where a request is based partly on submitted data and partly on background information, such as known consumer preferences or history.

As above, further uncertainty may arise where the consumer makes assertions related to the use of the information provided in the request.

6.1.3 Result Returned to Consumer

The consumer may have to deal with uncertainty in the result from a single provider or in results from multiple providers. In the first case, the most obvious possibility is that the result is incomplete or inconsistent in some way. Inconsistency is not a binary state - in many cases, a small inconsistency in a result may not affect the usefulness of the answer. It is however an area in which standardization of uncertainty could aid uniform handling of results. Similarly, incompleteness in a result may not affect its usefulness.

Further uncertainty may arise from the provider’s use of consumer preferences, the process of finding responses to a partially matched request, etc. Inconsistency is possible from a single provider but is more likely where results are aggregated from multiple providers. In cases where a consumer is dealing with more than one provider, these problems are multiplied because different providers may have different interpretations of descriptors and values, or even different sets of descriptors, as well as different approaches to processing requests, variation in use of consumer preferences, different historical data on a particular consumer, etc. Clearly standardization would clarify the uncertainty in this process to the benefit of both producers and consumers.

Underlying these aspects are the fundamental questions that motivate standardization - how do the different parties assess uncertainty, and can these assessments be meaningfully combined, particularly when they are derived from different methodologies. The work of this XG is not to develop or even identify many of the mechanisms that these use cases imply are needed to process uncertainty. The current effort intends to identify the types of information that are likely to be valuable for such processing to occur and to provide guidance to those who would develop the syntax to convey this information in a machine-processible way.

6.2 Goals of Standardization

The challenges related to uncertainty reasoning on the scale of the World Wide Web have been introduced in Section 1, and the goal of standardization would be to enable the understanding and processing needed for consistent use of available information when uncertainty is present. Many applications which generate data for the web already handle uncertainty in some form. For example, information retrieval systems may rank pages in terms of “relevance” on a scale of 0-100, weather forecasts are frequently qualified (e.g. 30% chance of showers), product finders return lists which are ordered according to the quality of match with a user’s requirements. These applications implicitly or explicitly define and handle uncertainty, and communicate it to the user. Standardization is not necessary for these individual applications which handle uncertainty internally in a (hopefully) consistent manner.

However, as soon as an application incorporates externally produced uncertain data, there is a need to standardize the representation of the characterization of the uncertainty. The notion of interoperability - being able to access and process data from any web source - is fundamental to a web of distributed information, and cannot be achieved unless all sources conform to common standards. As argued in the introduction, much of the available data on the web is subject to uncertainty - so that without standardization of uncertainty, applications using this information are either (i) inaccurate or (ii) have to make assumptions that enable them to ignore uncertainty. Neither of these options is likely to lead to practical, accurate reasoning about real-world data, except in a limited set of cases.

The availability of an uncertainty mark-up language for annotating web data would make it possible to (semi)automate and manage the trustworthiness of the information on the Web. Indeed, there are many cases in which the same data can have different reliability depending on the source from which they are generated, the context in which they are produced, and the time at which they are made available. Currently, such information generally cannot be managed simply because there is no way of knowing the associated uncertainty. With the availability of uncertainty mark-up annotation, such information can be properly treated for the first time.

6.3 Aspects of Uncertainty Reasoning for Potential Standardization

Many approaches to uncertainty use a numerical scale (e.g. from 0 to 1 or 0 to 100) but interpret and process these values in different ways. It is not necessary for every implemented system to interpret and process every form of uncertainty. The aim should be for common understanding and interpretation of the core forms, and the ability to extend the framework as necessary. For example, if data is published with probabilities attached, any other application would be able to perform specified operations on those probabilities and know that the results were meaningful.

As such, we conclude the following as guidelines when considering possible standards development efforts related to uncertainty:

7. Conclusions and Recommendations

To motivate our debate, we have studied several use cases in which uncertainty would play a significant role. In all of those cases, we assumed the need for a unified model of uncertainty annotation of web resources and the need for those annotations to be made automatically, due to the size of the web. We also found that in order to address the situations presented here, an ontology of uncertainty is needed, so that deductive engines would be able to use uncertainty information properly and third-party users would understand the annotated resources. In all the use cases we studied, a successful outcome would mean automatic processing of web resources with greater accuracy, while a failed outcome would just leave the web as it is today.

7.1 Conclusions Relating to UncertaintyOntology

In automated Web data processing, we often face situations where Boolean truth values are unknown, unknowable, or inapplicable. Our conclusions follow the UncertaintyOntology as described above but also imply that a finer grained version of UncertaintyOntology might be useful. Such an extension could provide a means to visualize a possible evolution of the upper level UncertaintyOntology and to emphasize uncertainty issues connected to machine processing (many situations are perfectly certain when considering human consumption of web resources). In the current discussion, we focus especially on a finer classification of Machine Agents (UncAnn Agent: MachineAgent) and on uncertainty caused by a machine agent's lack of knowledge (UncAnn UncertaintyNature:Epistemic:MachineEpistemic).

We recommend the following as the aspect most important to include in a standard representation of uncertainty: extensions of UncertaintyOntology that prove useful in annotating web resources in order to improve their machine processing.

7.2 Conclusions Relating to Kinds of Uncertainty

The use cases demonstrate that there are two very different kinds of uncertainty that we need to consider in standardization.

Each of (a) and (b) represents different standardization motivations and requirements.

In the first case, the standardization should be done at the representation level. When sharing information that has an inherent level of uncertainty, it is useful to have a single syntactical system so that people can identify and process this information quickly. Use cases of this kind may require something like uncertain extensions to OWL (i.e., probabilistic, fuzzy, belief function, random set, rough set, and hybrid uncertain extensions to OWL; see Section 3.1.8). For example, in the case of probabilistic uncertainty, we may want to be able to pass on information that Study X shows that people with property Y have a Z% increased likelihood of this disease, or that the probability of a four of a kind given pocket Aces and an Ace in the flop is 0.043. This simply requires a standard syntax.
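The poker figure can be checked with a short calculation, sketched here in Python. It assumes Texas hold'em, exactly one Ace among the three flop cards, and that "four of a kind" means four Aces; under those assumptions one Ace remains among the unseen cards and two more cards (turn and river) are dealt.

```python
from fractions import Fraction

# ASSUMPTION: Texas hold'em; the player sees 2 pocket cards and 3 flop cards.
unseen_cards = 52 - 2 - 3  # cards the player has not seen
remaining_aces = 1         # three Aces are already visible

# Probability that neither the turn nor the river is the remaining Ace.
p_miss = (
    Fraction(unseen_cards - remaining_aces, unseen_cards)
    * Fraction(unseen_cards - 1 - remaining_aces, unseen_cards - 1)
)

# Probability of completing four Aces by the river.
p_four_aces = 1 - p_miss
```

The exact value is 2/47, which rounds to the 0.043 quoted above. A receiving application can only reproduce such a check if the published number is marked unambiguously as a probability together with its conditioning event, which is the role of the standard syntax.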

But many of the use cases we've considered involve uncertainty reasoning on the part of the tools used to access and share web information, not the information itself. For example, if a web service uses uncertainty reasoning to find and rank hotel rooms for me, the uncertain information would not reside on the web. In such situations the role of standardization is different and the motivation may be less clear. After all, if the hotel room information is useful and rankings are roughly accurate, many users will be unconcerned with the reasoning process or the uncertainties attached to the rankings. So, here, we'd want to use standardization for a different purpose. It would be used to represent meta-information about the reasoning models and assumptions. And it would also play a different role, e.g., developing trust models, finding compatible web services. However, it could also require a very extensive representation task. Standardization questions here include determining how to represent this information, how detailed it would be, where it would reside.

7.3 Specific Recommendations for Standardization

It is acknowledged that while uncertainty is pervasive in both normal life and its reflections on the Web, it is not always necessary to characterize this uncertainty. However, there are significant instances where knowledge of uncertainties could be used to positively support decision making, and it is with such instances in mind that the URW3-XG makes the following recommendations:

The recommendations point to the desirability of having a means to annotate information with relevant uncertainty information. The mechanism could be similar to that specified under Semantic Annotations for WSDL and XML Schema (SAWSDL), where the annotation approach is described as follows:

The specification defines how semantic annotation is accomplished using references to semantic models, e.g. ontologies. Semantic Annotations for WSDL and XML Schema (SAWSDL) does not specify a language for representing the semantic models. Instead it provides mechanisms by which concepts from the semantic models, typically defined outside the WSDL document, can be referenced from within WSDL and XML Schema components using annotations.

In the realm of uncertainty representation, we would specify uncertainty models and uncertainty annotations rather than SAWSDL's semantic counterparts. For such uncertainty annotations, a possible standard would need to support both inherent uncertainty in the data and uncertainty connected to results of processing that data, but at this point it is unclear whether separate portions of the syntax are needed for data uncertainty and for processing uncertainty or whether a single syntax would be able to cover the entire range.

In addition, a question that remains with this approach but one which is outside the scope of the URW3-XG is whether existing languages (e.g. OWL, RDFS, RIF) are sufficiently expressive to support the necessary annotations. If so, the development of such annotations might merely require work on a more complete uncertainty ontology and possibly rules; otherwise, the expressiveness of existing languages might need to be extended. As an example of the latter, it might be advisable to develop a probabilistic extension to OWL (e.g. PR-OWL) or a Fuzzy-OWL format or profiles associated with the type of uncertainty to be represented. Further work is required to investigate the adequacy of the existing languages against the compiled use cases.

An eventual goal in continuing the current work would be to define a format to represent uncertainty in an agreed upon way to enable reliable communications for situations such as the compiled use cases. The work of the URW3-XG has made a significant contribution in defining the problem space and identifying continuing work. These questions will likely be the subject of discussion at a proposed 4th Uncertainty Reasoning for the Semantic Web (URSW) workshop at ISWC 2008, and just as the 2nd URSW workshop decided to pursue the question of uncertainty representation through what became the URW3-XG, a proposal for continued work may be an output of the 2008 URSW workshop.

8. Acknowledgements

The editors acknowledge significant contributions from the following persons (in alphabetical order):

Reference List

This is a collection of references that were added by XG members. Their intent was to collect a set of references for various methodologies and to "investigate proposed and implemented methodologies that may be applied to address the use cases developed under the first objective and that show promise as candidate solutions for uncertainty reasoning on the scale of the World Wide Web. The combination of use cases and associated methodologies would be examined to determine the most commonly required information and also that information that while not common may be especially important in select situations."

The list is by no means exhaustive and should be merely regarded as a set of recommended reading for people interested in the subject of uncertainty representation and reasoning in general.

Agarwal, S.; and Lamparter, S. (2005) sMART - A Semantic Matchmaking Portal for Electronic Markets. Proceedings of the 7th International IEEE Conference on E-Commerce Technology. Munich, Germany, 2005.

Baader, F.; and Nutt, W. (2003). Basic Description Logics. In Baader, F., Calvanese, D., McGuinness, D., Nardi, D., & Patel-Schneider, P. (Eds.), The Description Logics Handbook: Theory, Implementation and Applications. 1st edition, chapter 2, pp. 47-100. Cambridge, UK: Cambridge University Press.

Bangsø, O., & Wuillemin, P.-H. (2000) Object Oriented Bayesian Networks: A Framework for Topdown Specification of Large Bayesian Networks and Repetitive Structures. Technical Report No. CIT-87.2-00-obphw1. Department of Computer Science, Aalborg University, Aalborg, Denmark.

Bednarek, D.; Obdrzalek, D.; Yaghob, J.; and Zavoral, F. (2005) Data Integration Using Data Pile Structure, in Advances in Databases and Information Systems, Springer-Verlag, 2005, ISBN 3 540 42555 1, pp. 178-188.

Berners-Lee, T.; and Fischetti, M. (2000). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. 1st edition. New York, NY, USA: HarperCollins Publishers.

Berners-Lee, T.; Hendler, J.; and Lassila, O. (2001) The Semantic Web, Scientific American (pp. 29-37).

Bonatti, P.; and Tettamanzi, A. (2006) Some Complexity Results on Fuzzy Description Logics. In Di Gesu, V., Masulli, F., & Petrosino, A. (Eds.), Fuzzy Logic and Applications, Vol. 2955 of LNCS, pp. 19-24. Springer.

Brachman, R. J. (1977). What's in a Concept: Structural Foundations for Semantic Networks. International Journal of Man-Machine Studies, 9(2), 127-152.

Buntine, W. L. (1994) Learning with Graphical Models. Technical Report No. FIA-94-03. NASA Ames Research Center, Artificial Intelligence Research Branch.

Calì, Andrea; Lukasiewicz, Thomas; Predoiu, Livia; and Stuckenschmidt, Heiner (2008) Tightly Integrated Probabilistic Description Logic Programs for Representing Ontology Mappings. Proceedings of the 5th International Symposium on Foundations of Information and Knowledge Systems (FoIKS 2008).

Calvanese, D.; and De Giacomo, G. (2003). Expressive Description Logics. In Baader, F., Calvanese, D., McGuinness, D., Nardi, D., & Patel-Schneider, P. (Eds.), The Description Logics Handbook: Theory, Implementation and Applications. 1st edition, chapter 5, pp. 184-225. Cambridge, UK: Cambridge University Press.

Carvalho, R. N.; Santos, L. L.; Ladeira, M.; and Costa, P. C. G. (2007) A Tool for Plausible Reasoning in the Semantic Web using MEBN. In Proceedings of the Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007). Mourele, L.; Nedjah, N.; Kacprzyk, J.; and Abraham, A. (eds.); pp. 381-386. October 22-24, 2007, Rio de Janeiro, Brazil.

Codd, E. F. (1970). A Relational Model for Large Shared Data Banks. Communications of the ACM, 13(6), 377-387.

Costa, P. C. G. (2005). Bayesian Semantics for the Semantic Web. Doctoral Dissertation. Department of Systems Engineering and Operations Research. 2005, George Mason University: Fairfax, VA, USA. p. 312.

Costa, P. C. G.; and Laskey, K. B. (2006). PR-OWL: A Framework for Probabilistic Ontologies. In Proceedings of the International Conference on Formal Ontology in Information Systems (FOIS 2006). November 9-11, 2006, Baltimore, MD, USA.

Costa, P. C. G.; Ladeira, M.; Carvalho, R. N.; Laskey, K. B.; Santos, L. L.; and Matsumoto, S. (2008) A First-Order Bayesian Tool for Probabilistic Ontologies. To appear at the 21st International Florida Artificial Intelligence Research Society Conference (FLAIRS-21). May 15-17, 2008, Coconut Grove, Florida, USA.

Damasio, C., Pan, J., Stoilos, G., & Straccia, U. (2006). An Approach to Representing Uncertainty Rules in RuleML. In Proceedings of the 2nd International Conference on Rules and Rule Markup Languages for the Semantic Web (RuleML-06). IEEE Computer Society. Available at

Ding, Z. (2005). BayesOWL: A Probabilistic Framework for Semantic Web. Doctoral dissertation. Computer Science and Electrical Engineering. 2005, University of Maryland, Baltimore County: Baltimore, MD, USA. p. 168.

Ding, Z., & Peng, Y. (2004). A Probabilistic Extension to Ontology Language OWL. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04). Jan, 5-8, 2004. Big Island, Hawaii, USA.

Dubois, D.; and Prade, H. (1994) Can We Enforce Full Compositionality in Uncertainty Calculi? Proceedings AAAI-1994, pp. 149-154. AAAI Press.

Enderton, H. B. (2001) A Mathematical Introduction to Logic. 2nd Edition. Harcourt Academic Press

Fagin, R.; Lotem, A.; and Naor, M. (2003) Optimal Aggregation Algorithms for Middleware. In J. Computer and System Sciences 66 (2003), pp. 614-656.

Frege, G. (1879). Begriffsschrift, 1879, translated in Jean van Heijenoort, ed., From Frege to Gödel, Cambridge, MA: Harvard University Press, 1967.

Fukushige, Y. (2004). Representing Probabilistic Knowledge in the Semantic Web, W3C Workshop on Semantic Web for Life Sciences. Cambridge, MA, USA.

Jousselme, A. L.; Maupin, P.; and Bosse, E. (2003). Uncertainty in a Situation Analysis Perspective. In Proceedings of the Sixth International Conference of Information Fusion, vol. 2, pages 1207-1214. July 8-11, 2003, Cairns, Queensland, Australia.

Getoor, L.; Friedman, N.; Koller, D.; and Pfeffer, A. (2001). Learning Probabilistic Relational Models. New York, NY, USA: Springer-Verlag.

Getoor, L.; Koller, D.; Taskar, B.; and Friedman, N. (2000). Learning Probabilistic Relational Models with Structural Uncertainty. Paper presented at the ICML-2000 Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries. Stanford, CA, USA.

Gilks, W.; Thomas, A.; and Spiegelhalter, D. J. (1994). A Language and Program for Complex Bayesian Modeling. The Statistician, 43, 169-178.

Giugno, R.; and Lukasiewicz, T. (2002) P-SHOQ(D): A Probabilistic Extension of SHOQ(D) for Probabilistic Ontologies in the Semantic Web. Proceedings of the 8th European Conference on Logics in Artificial Intelligence (JELIA 2002). Extended version: Lukasiewicz, Thomas (2007) Expressive Probabilistic Description Logics, Artificial Intelligence, 172(6-7), 852-883, April 2008.

Gu, T.; Pung, H. K.; and Zhang, D. Q. (2004) A Bayesian Approach for Dealing with Uncertainty Contexts, in Second International Conference on Pervasive Computing. 2004. Vienna, Austria: Austrian Computer Society.

Gurský, P.; Horváth, T.; Jirásek, J.; Krajči, S.; Novotný, R.; Vaneková, V.; and Vojtáš, P. (2007) Knowledge Processing for Web Search – An Integrated Model, In: C. Badica and M. Paprzycki (eds.) Proceedings of the 1st International Symposium on Intelligent and Distributed Computing (IDC 2007), Studies in Computational Intelligence (vol. 78), Springer, 2007, pp: 95-104.

Gurský, P.; Horváth, T.; Jirásek, J.; Krajči, S.; Novotný, R.; Vaneková, V.; and Vojtáš, P. (2007) Web Search with Variable User Model. In Datakon 2007, L. Popelinsky and O. Vyborny eds. Masarykova Univerzita, 111-121.

Hájek, P. (2005). Making Fuzzy Description Logics More Expressive. Fuzzy Sets and Systems, 154(1), 1-15.

Hájek, P. (2006). What Does Mathematical Fuzzy Logic Offer to Description Logic? In Sanchez, E. (Ed.), Capturing Intelligence: Fuzzy Logic and the Semantic Web. Elsevier.

Heckerman, D.; Mamdani, A.; and Wellman, M. P. (1995). Real-World Applications of Bayesian Networks. Communications of the ACM, 38(3), 24-68.

Heckerman, D.; Meek, C.; and Koller, D. (2004) Probabilistic Models for Relational Data. Technical Report MSR-TR-2004-30, Microsoft Corporation, March 2004. Redmond, WA, USA.

Heinsohn, J. (1994) Probabilistic Description Logics. Paper presented at the Tenth Conference on Uncertainty in Artificial Intelligence (UAI-94), Jul 29-31. Seattle, WA, USA.

Hölldobler, S.; Khang, T. D.; and Störr, H.-P. (2002) A Fuzzy Description Logic with Hedges as Concept Modifiers. In Proceedings InTech/VJFuzzy-2002, pp. 25-34.

Hölldobler, S.; Nga, N. H.; and Khang, T. D. (2005) The Fuzzy Description Logic ALCflh. In Proceedings DL-2005.

Horrocks, I. (2002) DAML+OIL: A Reasonable Web Ontology Language. Keynote talk at the WES/CAiSE Conference. Toronto, Canada.

Horrocks, I.; and Sattler, U. (2001) Ontology Reasoning in the SHOQ(D) Description Logic. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI 2001), Aug 4-10. Seattle, WA, USA.

Horvath, T.; and Vojtas, P. (2006) Ordinal Classification with Monotonicity Constraints. In ICDM 2006, LNAI 4065, Springer, 2006, p. 217-225.

Jaeger, M. (1994) Probabilistic Reasoning in Terminological Logics. Paper presented at the Fourth International Conference on Principles of Knowledge Representation and Reasoning (KR94), May 24-27. Bonn, Germany.

Jaeger, M. (1997) Relational Bayesian Networks. Paper presented at the 13th Annual Conference on Uncertainty in Artificial Intelligence (UAI-97), August 1-3, Providence, RI, USA.

Jaeger, M. (2006) Probabilistic Role Models and the Guarded Fragment. In Proceedings IPMU-2004, pp. 235–242. Extended version in Int. J. Uncertain. Fuzz., 14(1), 43–60, 2006.

Koller, D.; Levy, A. Y.; and Pfeffer, A. (1997) P-CLASSIC: A Tractable Probabilistic Description Logic. Paper presented at the Fourteenth National Conference on Artificial Intelligence (AAAI-97), July 27-31. Providence, RI, USA.

Koller, D.; and Pfeffer, A. (1997) Object-Oriented Bayesian Networks. Paper presented at the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97). San Francisco, CA, USA.

Kolmogorov, A. N. (1960) Foundations of the Theory of Probability. 2nd edition. New York, NY, USA: Chelsea Publishing Co. Originally published in 1933.

Langseth, H.; and Nielsen, T. (2003) Fusion of Domain Knowledge with Data for Structured Learning in Object-Oriented Domains. Journal of Machine Learning Research, Special Issue on the Fusion of Domain Knowledge with Data for Decision Support, vol. 4, pp. 339-368, July 2003.

Laskey, K. B.; and Costa, P. C. G. (2005). Of Klingons and Starships: Bayesian Logic for the 23rd Century, in Uncertainty in Artificial Intelligence: Proceedings of the Twenty-first Conference. 2005, AUAI Press: Edinburgh, Scotland.

Laskey, K. B.; and Mahoney, S. M. (1997). Network Fragments: Representing Knowledge for Constructing Probabilistic Models. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97), August, 1997. Providence, RI, USA.

Liu, B. (2005) WWW-2005 Tutorial: Web Content Mining. Presented at the Fourteenth International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

Li, Y.; Xu, B.; Lu, J.; and Kang, D. (2006) Discrete Tableau Algorithms for SHI. In Proceedings DL-2006.

Li, Y.; Xu, B.; Lu, J.; Kang, D.; and Wang, P. (2005a) Extended Fuzzy Description Logic ALCN. In Proceedings KES-2005, Vol. 3684 of LNCS, pp. 896-902. Springer.

Li, Y.; Xu, B.; Lu, J.; Kang, D.; and Wang, P. (2005b). A Family of Extended Fuzzy Description Logics. In Proceedings COMPSAC-2005, pp. 221-226. IEEE Computer Society.

Lloyd, J.; and Ng, K. S. (2008) Probabilistic Reasoning in a Classical Logic. In Journal of Applied Logic.

Lukasiewicz, T. (2002). Probabilistic Default Reasoning with Conditional Constraints. Ann. Math. Artif. Intell., 34(1-3), 35-88, 2002.

Lukasiewicz, T. (2005). Probabilistic Description Logic Programs. In Proceedings ECSQARU 2005, Barcelona, Spain, July 2005. Vol. 3571 of LNCS, pp. 737-749. Springer. Extended version: International Journal of Approximate Reasoning 45(2), 288-307, 2007.

Lukasiewicz, T. (2006). Fuzzy Description Logic Programs under the Answer Set Semantics for the Semantic Web. In Proceedings of the 2nd International Conference on Rules and Rule Markup Languages for the Semantic Web (RuleML-06), pp. 89-96. IEEE Computer Society. Extended version: Fundamenta Informaticae 82, 1-22, 2008.

Lukasiewicz, T. (2007) Tractable Probabilistic Description Logic Programs. Proceedings of the 1st International Conference on Scalable Uncertainty Management (SUM 2007).

Lukasiewicz, T. (2008). Expressive Probabilistic Description Logics. Artificial Intelligence, 172(6-7), 852-883.

Lukasiewicz, T.; and Schellhase, J. (2006) Variable-Strength Conditional Preferences for Ranking Objects in Ontologies. Proceedings of the 3rd European Semantic Web Conference (ESWC 2006). Extended version: Variable-Strength Conditional Preferences for Ranking Objects in Ontologies. Journal of Web Semantics, 5(3), 180-194, September 2007.

Lukasiewicz, T.; and Straccia, U. (2007a) Description Logic Programs under Probabilistic Uncertainty and Fuzzy Vagueness. In Proceedings ECSQARU 2007, Hammamet, Tunisia, October/November 2007. Vol. 4724 of LNCS, pp. 187-198. Springer.

Lukasiewicz, T.; and Straccia, U. (2007b) Top-k Retrieval in Description Logic Programs Under Vagueness for the Semantic Web. Proceedings of the 1st International Conference on Scalable Uncertainty Management (SUM 2007).

Lukasiewicz, T.; and Straccia, U. (2007c) Tightly Integrated Fuzzy Description Logic Programs Under the Answer Set Semantics for the Semantic Web. Proceedings of the 1st International Conference on Web Reasoning and Rule Systems (RR 2007).

Minsky, M. L. (1975). A Framework for Representing Knowledge. In The Psychology of Computer Vision. P. H. Winston (Ed.), pp. 211-277. New York, NY: McGraw-Hill.

Mitra, P.; Noy, N. F.; and Jaiswal, A. R. (2004) OMEN: A Probabilistic Ontology Mapping Tool. Workshop on Meaning Coordination and Negotiation at the Third International Conference on the Semantic Web (ISWC-2004), November, 2004. Hiroshima, Japan.

Mitra, P.; Noy, N.; and Jaiswal, A. R. (2005) Ontology Mapping Discovery with Uncertainty. Presented at the Fourth International Semantic Web Conference (ISWC 2005). November 7, 2005, Galway, Ireland.

Morgan, M. G.; and Henrion, M. (1990) Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. New York, Cambridge University Press.

Neapolitan, R. E. (1990) Probabilistic Reasoning in Expert Systems: Theory and Algorithms. New York, NY, USA: John Wiley and Sons, Inc.

Neapolitan, R. E. (2003) Learning Bayesian Networks. New York, NY, USA: Prentice Hall.

Pan, R.; Ding, Z.; Yu, Y.; and Peng, Y. (2005). A Bayesian Approach to Ontology Mapping. In Proceedings of the Fourth International Semantic Web Conference (ISWC-2005), November, 2005. Galway, Ireland.

Pan, J. Z.; Stoilos, G.; Stamou, G.; Tzouvaras, V.; and Horrocks, I. (2006). f-SWRL: A Fuzzy Extension of SWRL. In Data Semantics, special issue on Emergent Semantics, Volume 4090/2006: 28-46.

Parsons, S. (1996). Current Approaches to Handling Imperfect Information in Data and Knowledge Bases. In IEEE Transactions on Knowledge and Data Engineering, vol. 8, issue 3, June 1996, pages 353-372. Los Alamitos, CA, USA: IEEE Computer Society.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA, USA: Morgan Kaufmann Publishers.

Peng, Y.; Ding, Z.; Pan, R.; Yu, Y.; Kulvatunyou, B.; Izevic, N.; Jones, A.; and Cho, H. (2007). A Probabilistic Framework for Semantic Similarity and Ontology Mapping. In Proceedings of the 2007 Industrial Engineering Research Conference (IERC), May, 2007. Nashville, TN, USA.

Peirce, C. S. (1885) On the Algebra of Logic. American Journal of Mathematics, 7:180-202.

Pfeffer, A. (2001) IBAL: A Probabilistic Rational Programming Language. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001), August 4-10, vol. 1, pp. 733-740. Seattle, WA, USA.

Pfeffer, A.; Koller, D.; Milch, B.; and Takusagawa, K. T. (1999) SPOOK: A System for Probabilistic Object-Oriented Knowledge Representation. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 541-550, July 30 – August 1. Stockholm, Sweden.

Pool, M.; and Aikin, J. (2004). KEEPER and Protégé: An Elicitation Environment for Bayesian Inference Tools. Paper presented at the Workshop on Protégé and Reasoning held at the Seventh International Protégé Conference, July 6 – 9. Bethesda, MD, USA.

Ragone, A.; Straccia, U.; Di Noia, T.; Di Sciascio, E; and Donini, F. M. (2007) Vague Knowledge Bases for Matchmaking in P2P E-Marketplaces. In Proceedings of the Fourth European Semantic Web Conference (ESWC-07). pp. 414-428.

Ramsey, F. P. (1931) The Foundations of Mathematics and other Logical Essays. London, UK: Kegan Paul, Trench, Trubner & Co.

Richardson, M.; Agrawal, R.; and Domingos, P. (2003) Trust Management for the Semantic Web. Proceedings of the Second International Semantic Web Conference, 2003.

Sanchez, D.; and Tettamanzi, A. (2004) Generalizing Quantification in Fuzzy Description Logics. In Proceedings 8th Fuzzy Days in Dortmund.

Sanchez, D.; and Tettamanzi, A. (2006). Fuzzy Quantification in Fuzzy Description Logics. In Sanchez, E. (Ed.), Capturing Intelligence: Fuzzy Logic and the Semantic Web. Elsevier.

Sanchez, E. (2006) Fuzzy Logic and the Semantic Web. 1st edition, April 3, 2006. Oxford, UK: Elsevier Science.

Schmidt-Schauß, M.; and Smolka, G. (1991) Attributive Concept Descriptions with Complements. Artificial Intelligence, 48(1), 1-26.

Schum, D. (1994). Evidential Foundations of Probabilistic Reasoning. New York, Wiley.

Spiegelhalter, D. J.; Thomas, A.; and Best, N. (1996) Computation on Graphical Models. Bayesian Statistics, 5, 407-425.

Sowa, J. F. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove, CA, USA: Brooks/Cole.

Stoilos, G.; Stamou, G.; Tzouvaras, V.; Pan, J. Z.; and Horrocks, I. (2005a). Fuzzy OWL: Uncertainty and the Semantic Web. In Proceedings of the International Workshop on OWL: Experience and Directions (OWL-ED2005).

Stoilos, G.; Stamou, G.; Tzouvaras, V.; Pan, J. Z.; and Horrocks, I. (2005) A Fuzzy Description Logic for Multimedia Knowledge Representation. In Proceedings of the International Workshop on Multimedia and the Semantic Web.

Stoilos, G.; Straccia, U.; Stamou, G.; and Pan, J. Z. (2006) General Concept Inclusions in Fuzzy Description Logics. In Proceedings ECAI-2006, pp. 457-61. IOS Press.

Stoilos, G.; Simou, N.; Stamou, G.; and Kollias, S. (2006) Uncertainty and the Semantic Web. IEEE Intelligent Systems, 21(5), p. 84-87, 2006.

Straccia, U. (1998) A Fuzzy Description Logic. In Proceedings AAAI-1998, pp. 594-599. AAAI Press/MIT Press.

Straccia, U. (2001) Reasoning within Fuzzy Description Logics. J. Artif. Intell. Res., 14, 137-166.

Straccia, U. (2004) Transforming Fuzzy Description Logics into Classical Description Logics. In Proceedings JELIA-2004, Vol. 3229 of LNCS, pp. 385-399. Springer.

Straccia, U. (2005a) Description Logics with Fuzzy Concrete Domains. In Proceedings UAI-2005, pp. 559-567. AUAI Press.

Straccia, U. (2005b) Fuzzy ALC with Fuzzy Concrete Domains. In Proceedings DL-2005, pp. 96-103.

Straccia, U. (2005c) Towards a Fuzzy Description Logic for the Semantic Web. In Proceedings of the Second European Semantic Web Conference, ESWC 2005.

Straccia, U. (2007) Towards Vague Query Answering in Logic Programming for Logic-based Information Retrieval. In Proceedings of the World Congress of the International Fuzzy Systems Association (IFSA-07). Cancun, Mexico.

Tresp, C.; and Molitor, R. (1998) A Description Logic for Vague Knowledge. In Proceedings ECAI-1998, pp. 361-365. J. Wiley & Sons.

Viegas Damasio, C.; Pan, J. Z.; Stoilos, G.; and Straccia, U. (2007) Representing Uncertainty in RuleML. To Appear in Fundamenta Informaticae.

Vojtas, P. (2007) EL Description Logic with Aggregation of User Preference Concepts. M. Duzi et al. Eds. Information modelling and Knowledge Bases XVIII, pp. 154-165. Amsterdam: IOS Press.

Vojtas, P.; and Vomlelova, M. (2006) On Models of Comparison of Multiple Monotone Classifications. In Proceedings of the IPMU'2006, B. Bouchon-Meunier and R. R. Yager eds., pp. 1236-1243. Paris: Editions EDK.

Yaghlane, B. B.; and Laamari, N. (2007) OWL-CM: OWL Combining Matcher Based on Belief Functions Theory. In Proceedings of the 2nd International Workshop on Ontology Matching (OM-2007). November 11, 2007. Busan, Korea.

Yaghob, J.; and Zavoral, F. (2006) Semantic Web Infrastructure using Data Pile. In WI IATW 06, IEEE Los Alamitos, ISBN 0 7695 2749 3, 2006, pp.630-633.

Yang, Y.; and Calmet, J. (2005) OntoBayes: An Ontology-Driven Uncertainty Model. Presented at the International Conference on Intelligent Agents, Web Technologies and Internet Commerce (IAWTIC2005). Vienna, Austria.

Yen, J. (1991) Generalizing Term Subsumption Languages to Fuzzy Logic. In Proceedings IJCAI-1991, pp. 472-477. Morgan Kaufmann.

Yelland, P. M. (2000) An Alternative Combination of Bayesian Networks and Description Logics. In Proceedings KR-2000, pp. 225–234. Morgan Kaufmann.

Appendix A: Use Cases

A.1. Use case 1: Discovery

A.1.1. Purpose and Goals

Given a populated catalogue of items and a user’s criteria for a particular item potentially listed in the catalogue, identify the best match.

A.1.2 Assumptions and Preconditions

A.1.3 Required Resources

A.1.4 Successful End

The user finds an item sufficiently close to their search criteria and is not hampered by vocabulary differences with those who populated the catalogue.

A.1.5 Failed End

The user does not find an item sufficiently close to their search criteria but has an explanation of how the criteria were not met, e.g. there were no screws of the length needed.

A.1.6 Main Scenario

A.1.7 Additional Background Information or References

Service-oriented architecture (SOA) assumes a world of distributed resources that are accessible across a network. It is assumed that catalogues will exist for different classes of resources, such as SOA services, and that the user will be able to search these catalogues for a desired item. Note that a class of items will be described using a list of relevant properties, and items belonging to that class will be described by assigning values to these properties. For discovery to occur, there must be some alignment of or mediation between the list of properties used by those populating the catalogue and the list used by those searching it. There must also be some alignment of or mediation between the nonnumeric values assigned to the properties, both in describing items for the catalogue and in defining the search criteria.

A.1.8 General Issues and Relevance to Uncertainty

Storage drawer on a cooking stove is missing a part that allows the drawer to easily roll in and out, but it is not clear what that part is called.

Given the class of items, the user may not unambiguously understand the meaning of the properties by which the class is described. Ambiguity on properties of items (UncAnn - UncertaintyNature: Epistemic; UncAnn - UncertaintyType: Vagueness)

The missing drawer part appears to have a screw as part of it, but it is not clear whether the screw size is part of the part description. If size is one of the descriptive properties, it is unknown whether the screw size is in English or metric units.

Given a set of properties of interest to the user, there is uncertainty in trading off the user’s target property values against those of items in the catalogue. Value Assessment (UncAnn - UncertaintyNature: Epistemic; UncAnn - UncertaintyType: Empirical)

The user is a 42-year-old male graphic designer who wants a notebook with the properties hasDisplaySize="15", hasWeight="lessThan5Pounds", hasBluetooth="true", hasBatteryAutonogy="?Between2and3Hours", and hasColor="Black". The catalogue has no item that fulfills all five requirements, but it has one notebook that meets four and two that meet three. Instead of reporting "no item found", the system needs to return a prioritized list. The item that meets four requirements will not necessarily be the best option for the customer.
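One possible way to build such a prioritized list is to score each catalogue item by the weighted share of the user's requirements it satisfies. The following is only a minimal sketch under stated assumptions: the short requirement keys, the notebook names, the match flags and the importance weights are all hypothetical and are not taken from the use case.

```python
# Hypothetical importance weights for the five requirements (assumed values).
WEIGHTS = {"display": 0.15, "weight": 0.35, "bluetooth": 0.10, "battery": 0.30, "color": 0.10}

# Hypothetical catalogue: for each notebook, which requirements it meets.
CATALOGUE = {
    # meets 4 of 5 requirements
    "notebook_A": {"display": True, "weight": False, "bluetooth": True, "battery": True, "color": True},
    # meets 3 of 5 requirements
    "notebook_B": {"display": False, "weight": True, "bluetooth": True, "battery": True, "color": False},
    # meets 3 of 5 requirements
    "notebook_C": {"display": True, "weight": True, "bluetooth": False, "battery": False, "color": True},
}


def score(matches, weights):
    """Weighted share of the requirements that an item satisfies, in [0, 1]."""
    total = sum(weights.values())
    return sum(w for prop, w in weights.items() if matches.get(prop, False)) / total


def rank(catalogue, weights):
    """Return (item, score) pairs ordered from best to worst match."""
    scored = [(name, score(m, weights)) for name, m in catalogue.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(CATALOGUE, WEIGHTS)
for name, s in ranking:
    print(f"{name}: {s:.2f}")
```

With these assumed weights, the notebook that meets only three requirements but satisfies the two most important ones is ranked above the notebook that meets four, which illustrates why a simple count of satisfied requirements may not reflect the customer's preference.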

A.2. Use case 2: Wine Sweetness

A.2.1 Purpose and Goals

Given a set of knowledge bases containing information about wine, present a user with an approximate classification (according to his/her personal, and possibly vague criteria) of a particular unknown wine’s sweetness.

A.2.2 Assumptions and Preconditions

A.2.3 Required Resources

A.2.4 Successful End

The user is given (an approximation of) the sweetness classification of wine w, e.g. (an interval containing) the degree to which w is dry, off-dry or sweet.

A.2.5 Failed End

The classification of w cannot be obtained (e.g., it can only be established that the degree to which w is dry is in [0,1]).

A.2.6 Main Scenario

A.2.7 Additional Background Information or References

A.2.8 General Issues and Relevance to Uncertainty

Even when using the same terminology, the interpretation of a vague classification label (like “dry”) may differ between the creator of the knowledge base and the user who queries the knowledge base. (UncAnn - UncertaintyNature: Epistemic; UncAnn - UncertaintyType: Vagueness; UncAnn - UncertaintyDerivation: Subjective)

A.3. Use case 3: Belief fusion / Opinion pooling

A.3.1 Purpose and Goals

If a single information artifact (e.g., a knowledge base, an ontology, or a product rating on the web) shall be created from multiple possibly contradictory information sources (e.g., expert opinions, existing ontologies, or product recommendations), the user (e.g., the knowledge engineer) applies a fusion operator in order to yield possibly uncertain fused beliefs from multiple input beliefs provided by the information sources.

A.3.2 Assumptions and Preconditions

For the most basic version of this use case, the only assumption is that the user is able to retrieve information from multiple sources on the Web. For more complex versions, a technical infrastructure for the aggregation of distributed information sources needs to exist (e.g., a distributed knowledge or data base, a news aggregator...).

A.3.3 Required Resources

A.3.4 Successful End

The user managed to create a single, coherent merger of the contributions such that the merger i) is in a format which supports the representation of uncertainty, ii) is consistent, and iii) reflects the support (belief grades) provided by each information source for each contained statement appropriately.

A.3.5 Failed End

A.3.6 Main Scenario

A.3.7 Additional Background Information or References

Uncertainty approaches to belief fusion and opinion pooling (sometimes also called probability aggregation) have long been researched in AI, although no consensus regarding the "best" approach exists.

Similar approaches, although partly on a different epistemic level, exist in research communities such as sensor fusion and database integration.

A related area is the theory of social choice, especially judgement aggregation, and that of computational trust for the assessment of those knowledge sources which shall be aggregated. The latter issue is dealt with in Richardson et al. (2003).

Instance-based learning and particularly similarity-based learning could also be used for coping with the presented knowledge fusion problem. An example is presented in d'Amato et al. (2006).

An approach which relates the problem of belief fusion with Context Ontologies and provenance annotation on the Web can be found in Nickles (2007).

A.3.8 General Issues and Relevance to Uncertainty

This use case is especially relevant with respect to uncertainty when the set of information acquired from multiple sources about the same fact is inconsistent (UncAnn - UncertaintyNature: Epistemic; UncAnn - UncertaintyType: Inconsistency), or (more generally) when multiple information sources attribute multiple and mutually different grades of belief to the same statement. If the user does not want to decide in favor of a single alternative, and wants the merger to weight all input beliefs adequately (instead of discarding some of them), the single statement resulting from the fusion of multiple statements will be uncertain (UncAnn - UncertaintyNature: Epistemic; UncAnn - UncertaintyType: usually Empirical, but this might also depend on the uncertainty type of the input statements).

It might also be reasonable to consider additional factors during aggregation, such as the trustworthiness of the information contributors.

Examples for belief aggregation operators which can yield uncertain results are logarithmic and linear pools (LogOP, LinOP), and Bayesian Network Aggregation. One possible criterion for a successful fusion is the minimization of the divergence of the resulting probability distribution from the input probability distributions.
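As a hedged illustration of two of the operators named above, the sketch below implements a linear pool (weighted arithmetic mean) and a logarithmic pool (normalized weighted geometric mean) for discrete distributions. The two source distributions and the equal weights are hypothetical, and the Kullback-Leibler divergence is only one possible choice of divergence measure; the report does not prescribe a specific one.

```python
import math


def linear_pool(distributions, weights):
    """LinOP: weighted arithmetic mean of the input probability distributions."""
    n = len(distributions[0])
    return [sum(w * p[i] for p, w in zip(distributions, weights)) for i in range(n)]


def log_pool(distributions, weights):
    """LogOP: normalized weighted geometric mean of the input distributions."""
    n = len(distributions[0])
    unnormalized = [
        math.prod(p[i] ** w for p, w in zip(distributions, weights)) for i in range(n)
    ]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two hypothetical sources that disagree about a binary statement (true, false).
sources = [[0.9, 0.1], [0.4, 0.6]]
weights = [0.5, 0.5]  # assumed equal trust in both sources; weights sum to 1

lin_result = linear_pool(sources, weights)
log_result = log_pool(sources, weights)
print("LinOP:", lin_result)
print("LogOP:", log_result)
print("Divergence of source 1 from LinOP result:", kl_divergence(sources[0], lin_result))
```

Both results remain proper probability distributions, so the fused statement carries a degree of belief rather than a forced true/false decision. Unequal weights could be used to reflect differing trust in the sources.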

To yield an uncertain fusion result, it is of course not necessary that any of the input statements to be fused is itself uncertain. One important open issue is how to adapt this essentially "probabilistic use case" for the use of fuzzy logic (if possible).

A.4. Use case 4: Ontology Based Reasoning and Retrieval from Large-Scale Databases

A.4.1. Purpose and Goals

Given a large company database that contains a high degree of vague and imprecise data, make the content available on the Semantic Web for ontology-based querying.

A.4.2 Assumptions and Preconditions

A database containing a huge amount of vague data and a method to produce fuzzy assertions (knowledge) based on the original numerical database values.

A.4.3 Required Resources

A.4.4 Successful End

The user can efficiently and effectively find a set of models, spots or other footage without having to browse through the entire collection and pick out the best matches by his own criteria. Successful accomplishment can dramatically improve the productivity of production companies, or of other companies with databases of this nature.

A.4.5 Failed End

The users have to browse through the entire collection of images or videos with almost no help from the system and try to match the viewed content against the search criteria they have in mind.

A.4.6 Main Scenario

The data are fuzzified according to metrics defined by the knowledge engineer expert of the specific application. Then, an ontology is created to enable ontology-based content retrieval. Finally, users can use the system and issue queries to the system. The system performs fuzzy reasoning with the data and provides the user with a ranked set of recommendations.

A.4.7 Additional Background Information or References

This use case scenario comes from a real-world industrial application that already has such a footage database: the production company database of CINEGRAM S.A. The use case can be handled by, and has been tested with, several fast and scalable fuzzy reasoning systems:

ONTOSEARCH2 (see also http://dipper.csd.abdn.ac.uk/OntoSearch/about.jsp), a tractable DL reasoner based on the description logic DL-Lite. It currently also supports fuzzy DL-Lite. Because ONTOSEARCH2 is based on DL-Lite, it is able to cope with millions and even billions of fuzzy data items and to perform expressive fuzzy querying over such data, and

Expressive Fuzzy DL reasoning engine: FiRE (see also http://www.image.ece.ntua.gr/~nsimou/), a very expressive fuzzy DL reasoner based on Fuzzy-SHIN. It also supports storing and querying fuzzy knowledge over the Sesame triple-store platform, which allows expressive fuzzy querying over hundreds of thousands of fuzzy data items.

A.4.8 General Issues and Relevance to Uncertainty

Vague concepts and information (UncAnn - UncertaintyNature: Aleatory; UncAnn - UncertaintyType: Vagueness). The user wants a recommendation of videos, images or models, but the database currently holds only non-semantic, low-level information.

Example: The user wants all tall and thin models who have a Student look. The database contains the height (in cm), weight (in kg) and age of each model. There are two cases here.

1) In the absence of an ontology that defines the concepts Tall, Thin and Student, the user has to define these concepts in terms of height, weight, etc. in his query. In other words, he has to issue a query like the following:

2) In the presence of an ontology, the ontology engineer has defined the concepts Tall, Thin and Student using definitions similar to the above, e.g. he has defined someone above 185 cm as Tall.

Obviously, a model that satisfies all but the weight condition is missed, and similarly one that satisfies all but the age restriction. This is counterintuitive, since these persons might still qualify according to the user's needs.
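The contrast between crisp thresholds and fuzzy concept definitions can be sketched as follows. This is not the query from the original scenario: apart from the 185 threshold for Tall (read here as centimetres), all bounds, the linear membership shape, the minimum t-norm used for conjunction, and the sample models are assumptions chosen for illustration.

```python
def crisp_tall(height_cm, threshold=185):
    """Crisp definition: a model counts as Tall only above the threshold."""
    return height_cm > threshold


def ramp(x, low, high):
    """Degree in [0, 1] rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_tall(height_cm):
    return ramp(height_cm, 175, 185)  # assumed bounds


def fuzzy_thin(weight_kg):
    return 1.0 - ramp(weight_kg, 60, 70)  # assumed bounds


def fuzzy_student(age):
    return 1.0 - ramp(age, 25, 30)  # assumed bounds


def match_degree(model):
    """Fuzzy conjunction of the three concepts using the minimum t-norm."""
    return min(
        fuzzy_tall(model["height"]),
        fuzzy_thin(model["weight"]),
        fuzzy_student(model["age"]),
    )


# Hypothetical database rows.
models = [
    {"name": "m1", "height": 190, "weight": 58, "age": 22},
    {"name": "m2", "height": 188, "weight": 62, "age": 23},  # just over a crisp 60 kg limit
    {"name": "m3", "height": 170, "weight": 55, "age": 21},
]

ranked = sorted(models, key=match_degree, reverse=True)
for m in ranked:
    print(m["name"], round(match_degree(m), 2))
```

Under a crisp weight limit of 60 kg the second model would be discarded outright, whereas the fuzzy definitions keep it in the result list with a reduced degree, which is the ranked behaviour the scenario asks for.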

A.5. Use case 5: SOA Execution Context

A.5.1. Purpose and Goals

Establish the set of conditions that ensures that a prospective consumer of a SOA service has identified the appropriate service to address its needs and that ensures, for a prospective provider of a SOA service, that the consumer is authorized to use the service. In addition, the consumer and provider must agree on the specifics of the message exchange and on the semantics of the exchanged messages, both message type and content.

A.5.2 Assumptions and Preconditions

A.5.3 Required Resources

A.5.4 Successful End

The service consumer is satisfied that the service provides the required results (i.e. real world effects) and the consumer and provider have aligned their assumptions, technical and business requirements, and policies so that use of the service may proceed.

A.5.5 Failed End

The service interaction will not proceed but in the best case the consumer and provider have sufficient information to know what broke down in establishing the execution context.

A.5.6 Main Scenario

A.5.7 Additional Background Information or References

A.5.8 General Issues and Relevance to Uncertainty

Service consumer may have insufficient information about provider's capabilities and associated requirements to ascertain whether service provider can meet needs at acceptable cost. (UncAnn - UncertaintyNature: Epistemic; UncAnn - UncertaintyType: could be Ambiguity or Vagueness)

The provider requires address and email information but does not specify a privacy policy that would let the consumer know whether the information will be used for other purposes, such as marketing lists. The consumer needs to decide whether to supply the information or terminate the interaction.

No matter how much information is supplied by the provider, it is always possible there will be some additional information that the consumer would like (or needs) to have.

Bayesian decision theory could be used by a user assistant to form a probabilistic expectation of provider intent and to recommend to the consumer whether to provide the information, based on previous provider information and consumer preferences.

Service provider may have insufficient information about consumer's need to know whether the functionality it provides can meet that need. (UncAnn - UncertaintyNature: Epistemic; UncAnn - UncertaintyType: could be Ambiguity or Vagueness)

The consumer initiates a transaction but does not specify whether they are interested in a detailed log of the exchange or just a confirmation of the final results. The provider executes a different process to capture the detailed log and provides the detailed log for an extra fee. The provider could capture the detailed log and offer it at the end for a premium fee, but there is some cost to the provider in capturing the detailed log.

A simple standard default does not allow the provider to generate additional income from consumers who decide on the detailed log at the end.

Bayesian decision theory could be used to estimate the likelihood that the consumer wants the detailed log, based on knowledge of the consumer, e.g. a categorization based on collected information about the consumer. The provider will then choose the course of action that maximizes expected financial return.
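The provider's decision in this scenario can be sketched as a small expected-return calculation. This is an illustrative sketch only: the prior, the likelihoods for the observed consumer category, the fee and the capture cost are hypothetical numbers, and the two actions are a simplification of the options described above.

```python
def posterior_wants_log(prior, likelihood_if_wants, likelihood_if_not):
    """Bayes' rule: P(consumer wants the log | observed consumer category)."""
    evidence = prior * likelihood_if_wants + (1 - prior) * likelihood_if_not
    return prior * likelihood_if_wants / evidence


def expected_return(action, p_wants_log, log_fee, capture_cost):
    """Expected financial return of a provider action under uncertainty."""
    if action == "capture_log":
        # The capture cost is always paid; the fee is earned only if the
        # consumer decides at the end that they want the detailed log.
        return p_wants_log * log_fee - capture_cost
    if action == "skip_log":
        # No cost is incurred, but the log cannot be offered afterwards.
        return 0.0
    raise ValueError(f"unknown action: {action}")


def best_action(p_wants_log, log_fee, capture_cost):
    """Pick the action with the highest expected return."""
    actions = ["capture_log", "skip_log"]
    return max(actions, key=lambda a: expected_return(a, p_wants_log, log_fee, capture_cost))


# Hypothetical figures: 20% of all consumers buy the log; the observed category
# occurs for 70% of log buyers and for 20% of the other consumers.
p = posterior_wants_log(prior=0.2, likelihood_if_wants=0.7, likelihood_if_not=0.2)
print("posterior:", round(p, 3))
print("decision with category information:", best_action(p, log_fee=5.0, capture_cost=1.5))
print("decision from the prior alone:", best_action(0.2, log_fee=5.0, capture_cost=1.5))
```

With these assumed numbers the decision changes once the consumer category is taken into account, which is the kind of benefit over a simple standard default that the scenario describes.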

A.6. Use case 6: Recommendation

A.6.1 Purpose and Goals

Develop an ability to express a set of recommendations by a number of agents and an ability to express an aggregated recommendation and ranking such that formal inference can be carried out on the set of recommendations. Also, an ability to express preferences and scales in a formal way.

A.6.2 Assumptions and Preconditions

A.6.3 Required Resources

A.6.4 Successful End

An aggregated recommendation and ranking has been obtained such that formal inference can be carried out on the set of recommendations.

A.6.5 Failed End

A.6.6 Main Scenario

One or more recommendation searchers express their preferences in a machine-readable format. A recommender system then combines a set of recommendations (obtained by a number of agents or other recommender systems) into an aggregated recommendation and ranking. For example, a user might input a movie, and the system would form its recommendation by aggregating recommendations provided by consumers who have seen the movie.

A.6.7 Additional Background Information or References

A.6.8 General Issues and Relevance to Uncertainty

Various agents may have different scales and, more importantly, different preferences and different confidence levels. A set of recommendations or a set of preferences may be inconsistent.

- the set of recommendations obtained from multiple agents might be inconsistent.

- a recommendation possibly matches the preference(s) of the recommendation searcher only imperfectly.

- a recommender system might have obtained its recommendation using statistical means (e.g., using collaborative filtering).

- an agent or a recommender system might have a low confidence in its recommendation, i.e., it is uncertain whether its recommendation actually matches the preference.

- the preferences provided by a recommendation searcher can be uncertain or inconsistent, e.g., because a user might not be able to express his preferences in sufficient detail.

- sometimes a single recommendation shall reflect multiple preferences provided by multiple recommendation searchers (e.g., if a group of users seeks a single recommendation which reflects the preferences of all group members as well as possible). Uncertainty arises, e.g., when these input preferences are inconsistent.
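Two of the issues listed above, differing rating scales and differing confidence levels, can be illustrated with a small aggregation sketch. The normalization to [0, 1] and the confidence-weighted mean are just one possible aggregation scheme, and the agents, items, ratings, scales and confidence values are hypothetical.

```python
def normalize(rating, scale_min, scale_max):
    """Map a rating from an agent's own scale onto [0, 1]."""
    return (rating - scale_min) / (scale_max - scale_min)


def aggregate(recommendations):
    """Confidence-weighted mean of normalized ratings, or None without any confidence."""
    total_conf = sum(r["confidence"] for r in recommendations)
    if total_conf == 0:
        return None
    weighted = sum(
        r["confidence"] * normalize(r["rating"], *r["scale"]) for r in recommendations
    )
    return weighted / total_conf


def rank_items(recs_by_item):
    """Rank items by aggregated score, skipping items that cannot be scored."""
    scored = [(item, aggregate(recs)) for item, recs in recs_by_item.items()]
    scored = [(item, s) for item, s in scored if s is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical recommendations: agent_a rates on 1..5, agent_b on 0..10.
recs_by_item = {
    "movie_X": [
        {"agent": "agent_a", "rating": 4, "scale": (1, 5), "confidence": 0.9},
        {"agent": "agent_b", "rating": 6, "scale": (0, 10), "confidence": 0.5},
    ],
    "movie_Y": [
        {"agent": "agent_a", "rating": 5, "scale": (1, 5), "confidence": 0.3},
        {"agent": "agent_b", "rating": 7, "scale": (0, 10), "confidence": 0.8},
    ],
}

ranking = rank_items(recs_by_item)
for item, s in ranking:
    print(item, round(s, 3))
```

A formal representation of scales and confidence, as called for in this use case, would let such a computation be carried out across recommender systems without hard-coding each agent's conventions.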

A.7 Use Case 7: Extraction-Annotation

A.7.1 - Purpose/Goals

The motivating situation is a user (or a web service) that wants a web-scale overview of available information, e.g. an overview of all shops selling cars. The advantage would be the possibility of comparing different market offers. Another application is a competitor tracking system.

The main problem is the size of the data and the fact that these data are mainly designed for human consumption.

Many of our use cases assume that e.g. "web resources has been populated using a property set and property values that have a machine processable representation of the vocabulary used". On the other hand, the W3C activity Gleaning Resource Descriptions from Dialects of Languages (GRDDL) introduces markup based on existing standards for declaring that an XML document includes data compatible with the Resource Description Framework (RDF) and for linking to algorithms (typically represented in XSLT) for extracting this data from the document (e.g. products in an e-shop).

Our approach tries to generalize this to arbitrary HTML and XHTML sources, extending "Dialects of Languages" to semi-structured HTML pages and also to predominantly textual pages (e.g. accident reports). The main goal is to do this "gleaning" (also called web content mining or extraction) automatically for a large number of resources. The task is easy for humans, but humans cannot process a large number of pages. The task is difficult for machines, but machines can process a large number of resources. The main trick is to find a trade-off between the amount of human assistance (especially in training and ontology creation) and automation. The second issue is domain dependence. One can easily write a script that extracts RDF triples from a single page; the goal is to extract data from pages never visited before. The third dimension of the problem is the "machine difficulty" of the resource. Some pages (e.g. those generated from a database) are easier for machine extraction than others that are predominantly textual.

Our task is: given such a resource and an ontology, extract the data contained in this resource (to obtain an instance of some parts of the ontology, typically instances of a class and some properties of that class) and annotate the original resource (with respect to the given ontology).

A.7.2 - Assumptions/Preconditions

A.7.3 - Required Resources

A.7.4 - Successful End

We will be able to extract RDF data from pages which are plain HTML files (both structured and textual) with respect to a given ontology and annotate the original page with RDFa. Moreover, the result should be machine understandable and serve as input for further processing (see the Discovery use case).

A.7.5 - Failed End

Many pages will not be machine processable, and much information will be practically unobtainable for a human.

A.7.6 - Main Scenario

The first type of scenario describes the process of extraction and annotation (details above or in links), e.g.

Alternatively, we can understand this scenario as the sequence of steps needed for inclusion in the final report; then this scenario has the following steps

A.7.7 - Additional background information or references

A.7.8 - General Issues and Relevance to Uncertainty

The solution consists of extraction and annotation tools. Many annotation tools are linked at http://annotation.semanticweb.org/annotationtool_view, mainly using proprietary uncertainty representations (or built-in uncertainty handling). One of the main tasks of this XG is to provide the fundamentals of a standardized representation of uncertainty that could serve as the basis for information exchange. Here, uncertainty annotation of results would be especially helpful.

In what follows we draw on experience with uncertainty issues in experiments with web content mining as described in Eckhardt et al. (2007) (see also the presentation slides).

In what follows we present issues and relevance to uncertainty which are specific to this use case and we annotate them (UncAnn) with reference to Uncertainty Ontology and extensions to classes and properties described in a fine-grained version of Uncertainty Ontology.

Assume that a user is looking for notebooks and we would like to provide machine support for his/her search. A typical statement which is subject to uncertainty assignment in this use case is: (UncAnn Sentence) An HTML-coded web page with a given URL contains information which, according to an ontology o1 about notebooks (UncAnn World: DomainOntology), can be expressed by an RDF triple (ntb1, O1:has_priceProperty, 20000). The agent producing this statement is a machine agent (UncAnn Agent: MachineAgent), specifically an inductive agent (UncAnn Agent:MachineAgent:InductiveAgent).

The uncertainty nature of this statement is (UncAnn - UncertaintyNature:Epistemic:MachineEpistemic); the uncertainty type is usually (UncAnn - UncertaintyType:Empirical:Randomness). The instances used for training an extraction tool (UncAnn - World:DomainOntology:Instances) are web pages. The uncertainty model is usually complicated (a mixture of HTML structure, regular expressions, annotation ontology and similarity measures) and a combination of several models, typically (UncAnn - UncertaintyModel:CombinationOfSeveralModels:ProbabilityAndFuzzySetsCombinationModels). Accordingly, the evidence for this uncertainty statement (UncAnn - World:DomainOntology:Instances:Evidence) consists of precision and recall on this training set.
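
As a minimal sketch of how precision and recall on a training set could be computed as evidence, the following Python fragment compares triples produced by a hypothetical extraction tool with hand-annotated triples for the same page. The identifiers ntb2 to ntb4, the price values, and the names `gold`, `extracted` and `precision_recall` are illustrative assumptions and do not come from any tool referenced in this report.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, int]

def precision_recall(extracted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float]:
    """Return (precision, recall) of extracted triples against a gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical hand-annotated triples for one training page.
gold = {
    ("ntb1", "O1:has_priceProperty", 20000),
    ("ntb2", "O1:has_priceProperty", 25000),
    ("ntb3", "O1:has_priceProperty", 18000),
    ("ntb4", "O1:has_priceProperty", 30000),
}

# Hypothetical output of an extraction tool on the same page.
extracted = {
    ("ntb1", "O1:has_priceProperty", 20000),
    ("ntb2", "O1:has_priceProperty", 25000),
    ("ntb3", "O1:has_priceProperty", 81000),  # wrong value extracted
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch a triple counts as correct only if subject, property and value all match; a real evaluation might instead use a similarity measure, which is one reason the uncertainty model tends to combine several approaches.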

The goal of this use case is to find out which models of uncertainty and vagueness are appropriate. In particular, it is clear that a more detailed ontology is needed, one containing information that supports successful automatic extraction (this is machine uncertainty, not human uncertainty). One can expect that the system learns and improves during usage and that the extraction ontology is extended.

Extraction from textual pages needs another type of knowledge, e.g. transforming a sentence into a (Subject Verb Object) tree (full or partial).

A.8. Use case 8: Soft Shopping Agent

A.8.1 Purpose and Goals

Of course, most of our constraints, e.g. on price and kilometers, are not crisp, as we may still accept, e.g., a car costing €11200 with an odometer reading of 16000 km. Hence, these constraints are rather vague (fuzzy); we may model this by means of so-called fuzzy membership functions. We may also give some preference weight to our requirements.

On the other hand, the seller may offer a discount on the car's catalogue price, but the bigger the discount, the less satisfied he is. For instance, for a Mazda3, the seller may consider it optimal to sell above €15000, but can go down to €13500 with a lesser degree of satisfaction.

From the buyer's perspective, he asks for the top-k cars and their optimal price, ranked by the optimal degree of satisfaction.

From the seller's perspective, he may ask for the top-k buyers for a given car and their optimal price, ranked by the optimal degree of satisfaction.
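
The soft constraints described above can be sketched with piecewise-linear fuzzy membership functions. The following Python fragment is only an illustration under stated assumptions: the buyer's thresholds (10000/12000 for price, 15000/17500 km for the odometer), the car data, the use of `min` to combine degrees, and the choice that the seller's satisfaction reaches 0 at €13500 are all hypothetical; only the €15000 and €13500 figures come from the text.

```python
import heapq

def falling(x, full, zero):
    """Degree 1 up to `full`, decreasing linearly to 0 at `zero` (full < zero)."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

def rising(x, zero, full):
    """Degree 0 up to `zero`, increasing linearly to 1 at `full` (zero < full)."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)

def buyer_degree(price, km):
    # Assumed buyer preferences: both soft constraints must hold,
    # so the overall degree is the minimum of the two degrees.
    return min(falling(price, 10000, 12000), falling(km, 15000, 17500))

def seller_degree(price):
    # Fully satisfied at 15000 or more; assumed to drop to 0 at 13500.
    return rising(price, 13500, 15000)

def top_k(cars, k):
    """Return the k cars with the highest buyer satisfaction, best first."""
    scored = [(buyer_degree(price, km), name) for name, price, km in cars]
    return heapq.nlargest(k, scored)

# Hypothetical offers: (name, price in euros, odometer reading in km).
cars = [
    ("car_a", 11200, 16000),
    ("car_b", 9500, 14000),
    ("car_c", 10500, 17000),
    ("car_d", 12500, 12000),
]

ranking = top_k(cars, 2)
print(ranking)
```

With these assumed thresholds the car costing €11200 with 16000 km is still acceptable, but only to a degree below 1, which is the behaviour a crisp constraint cannot express.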

A.8.2 Assumptions and Preconditions

A.8.3 Required Resources

A.8.4 Successful End

A.8.5 Failed End

A.8.6 Main Scenario

A.8.7 Additional Background Information or References

A.8.8 General Issues and Relevance to Uncertainty

Matchmaking in eCommerce with soft constraints is about vague reasoning. The uncertainty type is vagueness (UncAnn - UncertaintyType:Vagueness), as matchings are found only to some degree. The possible model is based on mathematical fuzzy logic (UncAnn - UncertaintyModel:Fuzzy Sets). An optimization and reasoning procedure is involved (UncAnn - Agent:Machine:Machine Deduction - optimizing finding top-k answers on the web). One important issue that remains open is large-scale top-k retrieval algorithms for Semantic Web languages.

A.9 Use Case 9: A Chain from the Web to the User

A.9.1 - Purpose/Goals

A.9.2 - Assumptions/Preconditions

A.9.3 - Required Resources

A.9.4 - Successful End

Extend web standards in such a way that the whole chain from the Web to the user is covered by uncertainty application in a unified and reusable way.

A.9.5 - Failed End

A.9.6 - Main Scenario

A.9.7 - Additional background information or references

An experimental chain of tools (more focused on the user side of the chain) is described in Gurský et al. (2007). See also http://nazou.fiit.stuba.sk for more experiments on a chain of tools.

A.9.8 - General Issues and Relevance to Uncertainty

In what follows we present issues and relevance to uncertainty which are specific to this use case and we annotate them (UncAnn) with reference to Uncertainty Ontology (UncertaintyOntology) and extensions to classes and properties described in a fine-grained version of Uncertainty Ontology.

In what follows we draw on experience with uncertainty issues in experiments with a chain connecting the web and the user as described in Eckhardt et al. (2007) (see also the presentation slides).

First there is the Web. Only a few resources are annotated (tagged or made understandable to a computer). Some of them are structured in html tables (meaning less or difficult to understand). Many pages contain information in natural language and are designed for human consumption. This is typically difficult to understand for machines (UncAnn - UncertaintyNature:Epistemic:MachineEpistemic).

Once (at least a candidate) set of resources has been allocated, we face a decision: whether to download, and what (a snippet or the whole source code)? Are there uncertainty problems in wrappers, crawlers and search engines (see e.g. L. Galambos, ''Egothor - Java search engine'')?

Some middleware between the user and the indexed resources has to take care of matching and uncertainty too (see, e.g., ''Semantic Web Infrastructure using Data Pile'' and ''Data Integration Using Data Pile Structure'').

How to model users is another problem involving many uncertainty issues, especially regarding the combination of different models, e.g. UncAnn - UncertaintyModel: Probability and UncAnn - UncertaintyModel: PreferenceModels. Considering the whole chain may help us to understand different sorts of uncertainty in the context of an integrated Web application.

A.10 Use Case 10: Making an Appointment

A.10.1 - Purpose/Goals

Schedule an interaction with a provider of a business service (e.g., doctor, lawyer, etc.), taking into account geographic proximity, schedule constraints of consumer(s), schedule constraints of provider, and possibly other constraints.

A.10.2 - Assumptions/Preconditions

A.10.3 - Required Resources

A.10.4 - Successful End

Consumer is given one or more recommended times when provider is available for scheduling. Consumer selects among these. Agreed-upon time is entered into consumer’s calendar and scheduled with provider.

A.10.5 - Failed End

No provider is found who meets consumer’s constraints and is available at times consumer can make.

A.10.6 - Main Scenario

A.10.7 - Additional background information or references

This use case was inspired by the article Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web, Scientific American (pp. 29-37).

A.10.8 - General Issues and Relevance to Uncertainty

A.11 Use Case 11: User Preference Modeling for top-k Answers

A.11.1 - Purpose/Goals

A.11.2 - Assumptions/Preconditions

A.11.3 - Required Resources

A.11.4 - Successful End

The user gets answers fitting his/her preferences. He/she gets answers fast, because top-k retrieval is performed without computing all answers.

A.11.5 - Failed End

There is a danger in uncertainty methods, namely combinatorial explosion: replacing the two-valued Boolean model can lead to the point where everything is relevant to some nonzero degree.

A.11.6 - Main Scenario

A.11.7 - Additional background information or references

Efficient algorithms for top-k answering were studied in Fagin et al. (2003); learning user preferences from explicit information is described in Horvath et al. (2006). In Vojtas et al. (2007), it is shown that these models can be described in description logic and hence are compatible with web modeling standards.
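
The idea of retrieving the top-k answers without computing all answers can be sketched as follows. This is a simplified illustration in the style of threshold-based top-k algorithms from the literature cited above, not the authors' implementation. The attribute names, the grades and the function `threshold_top_k` are hypothetical, every object is assumed to have a grade for every attribute, and the aggregation function is assumed to be monotone (here `min`).

```python
import heapq

def threshold_top_k(grades, k, aggregate=min):
    """Find k objects with the highest aggregated grade.

    `grades` maps attribute -> {object: grade in [0, 1]}.
    Returns (list of (object, aggregated grade), number of sorted-access rounds).
    """
    attrs = list(grades)
    # Sorted access: one list per attribute, best grade first.
    sorted_lists = {
        attr: sorted(grades[attr].items(), key=lambda item: item[1], reverse=True)
        for attr in attrs
    }
    n = len(sorted_lists[attrs[0]])
    seen = {}  # object -> aggregated grade
    for depth in range(n):
        last_grades = []
        for attr in attrs:
            obj, grade = sorted_lists[attr][depth]
            last_grades.append(grade)
            if obj not in seen:
                # Random access: look up the object's grade in every attribute.
                seen[obj] = aggregate(grades[a][obj] for a in attrs)
        # No unseen object can score higher than this threshold.
        threshold = aggregate(last_grades)
        best = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        if len(best) >= k and best[-1][1] >= threshold:
            return best, depth + 1
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1]), n

# Hypothetical preference grades for five items on two attributes.
grades = {
    "price": {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2, "e": 0.1},
    "quality": {"a": 0.8, "b": 0.9, "c": 0.4, "d": 0.1, "e": 0.2},
}

best, rounds = threshold_top_k(grades, k=2)
print(best, rounds)
```

In this example the search stops after two rounds of sorted access out of five, because the threshold has dropped to the grade of the current second-best object; items c, d and e are never examined.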

A.11.8 - General Issues and Relevance to Uncertainty

As with the result of any data mining procedure, the results of such user preference mining will be uncertain.

A typical sentence which is subject to uncertainty assignment is: (UncAnn - Sentence) User1 prefers most item1 (the list of top-k most preferred items for User1 consists of item1, ..., itemk).

This statement can be made by the user himself or by another human (UncAnn - Agent: HumanAgent). For the semantic web, the more interesting case is a statement made by a machine agent (UncAnn - Agent: MachineAgent). The statement can be produced by a combination of an inductive procedure (UncAnn - Agent: MachineAgent:InductiveAgent - mining user preferences) and a deductive procedure (UncAnn - Agent:MachineAgent: DeductiveAgent - optimizing finding top-k answers on the web).

Uncertainty assigned to the above statement typically has the nature (UncAnn - UncertaintyNature:Epistemic:MachineEpistemic).

A user's preference is in no case Boolean (yes-no); the typical uncertainty type (UncAnn - UncertaintyType:Vagueness) is vagueness, which arises when the boundaries of meaning of the user's objective are indistinct.

There are models using partially ordered sets to represent preferences. Different ad hoc ranking approaches are used. A possible model is UncAnn - UncertaintyModel: FuzzySets or UncAnn - UncertaintyModel: PreferenceModels. To make these uncertainty annotations usable by other machine agents, a fine-grained specification of UncAnn - World:DomainOntology:Instances and/or UncAnn - World:DomainOntology:Evidence has to be made to support an agent's decision on how to proceed with this information.

Efficient algorithms for top-k answering were studied in Fagin et al., ''Making Optimal aggregation algorithms for middleware''; learning user preferences from explicit information is described in ''Ordinal Classification with Monotonicity Constraints''. In ''EL description logic with aggregation of user preference concepts'' it is shown that these models can be described in description logic and hence are compatible with web modeling standards.

We have developed an inductive method for learning user preferences from a given evaluation of a sample of objects (see reference list).

We recommend aspects that are considered most important to be included in a standard representation of vagueness and uncertainty: a concept of truth value as a comparative notion of relevance / preference

A.12. Use case 12: Ontology Mediated Information Retrieval

A.12.1 Purpose and Goals

- Find top-k ranked video passages made by Umberto whose title is about 'tour'
- Find top-k ranked images similar to a given one, which is about an animal

A.12.2 Assumptions and Preconditions

A.12.3 Required Resources

A data source containing the multimedia data and an ontology, a top-k logical reasoner and a multimedia information retrieval system.

A.12.4 Successful End

A.12.5 Failed End

A.12.6 Main Scenario

A.12.7 Additional Background Information or References

Further information can be found in Straccia (2007) and Straccia and Visco (2007).

A.12.8 General Issues and Relevance to Uncertainty

The notion of "Relevance" in Multimedia Information Retrieval can be formalized as a vague relation between an information need and a multimedia object. The uncertainty type is vagueness (UncAnn - UncertaintyType:Vagueness), as matchings are found only to some degree. The possible model is based on mathematical fuzzy logic (UncAnn - UncertaintyModel:Fuzzy Sets). A reasoning procedure is also involved (UncAnn - Agent:Machine:Machine Deduction - optimizing finding top-k answers on the web).

A.13 Use Case 13: Buying Speakers

A.13.1 - Purpose/Goals

Customer needs to make a decision on (1) whether to go to a store today or wait until tomorrow to buy speakers, (2) which speakers to buy and (3) at which store. Customer is interested in two speaker features: wattage and price. Customer has a valuation formula that combines the likelihood of availability of speakers on a particular day in a particular store, as well as the two features. The features of wattage and price are fuzzy. Optionally, Customer gets the formulas from CustomerService, a Web based service that collects information about products, stores, statistics, evaluations.
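
One possible shape for such a valuation formula is sketched below. The report does not specify the formula, so everything here is an assumption made for illustration: the fuzzy thresholds for wattage and price, the availability probabilities, the store and speaker names, and the choice to multiply the availability likelihood by the minimum of the two fuzzy degrees.

```python
def ramp_up(x, low, high):
    """Fuzzy degree rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def ramp_down(x, low, high):
    """Fuzzy degree falling linearly from 1 at `low` to 0 at `high`."""
    return 1.0 - ramp_up(x, low, high)

def valuation(p_available, wattage, price):
    # Assumed preferences: more wattage is better (50 W to 100 W),
    # a lower price is better (100 to 300 currency units).
    fuzzy_fit = min(ramp_up(wattage, 50, 100), ramp_down(price, 100, 300))
    # Weight the fuzzy fit by the likelihood that the speakers are in stock.
    return p_available * fuzzy_fit

# Hypothetical offers: (day, store, speakers, P(available), wattage, price).
offers = [
    ("today", "StoreA", "S1", 0.90, 80, 200),
    ("today", "StoreB", "S2", 0.50, 100, 150),
    ("tomorrow", "StoreA", "S2", 0.95, 100, 150),
]

best = max(offers, key=lambda o: valuation(o[3], o[4], o[5]))
print(best[:3])
```

The sketch answers all three questions of the use case at once, since the best offer fixes the day, the store and the speakers; here waiting until tomorrow wins because the better speakers are then more likely to be available.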

A.13.2 - Assumptions/Preconditions

A.13.3 - Required Resources

A.13.4 - Successful End

Customer gets necessary information about the availability and types of speakers from stores. This information is sufficient for customer to compute the required metric.

A.13.5 - Failed End

Customer does not get the necessary information and thus needs to go to multiple stores, wasting a lot of time.

A.13.6 - Main Scenario

A.13.7 - Additional background information or references

A.13.8 - General Issues and Relevance to Uncertainty

Appendix B: UncAnn

The process being followed by the XG is to propose use cases illustrating different aspects of uncertainty and in parallel to develop an ontology of these uncertainty aspects so we have some clarity about the domain we are trying to describe. The ontology being developed is UncertaintyOntology.

As a step to bring these portions of the work program together, we are adding annotations to the use cases indicating what types of uncertainty, as defined by UncertaintyOntology, we see illustrated by various parts of the use case.

The UncAnn prefix enables users unfamiliar with the process to access this explanation.

Appendix C: Bayesian and Fuzzy Approaches

This Appendix is heavily based on material extracted from the chapter "Uncertainty Representation and Reasoning in the Semantic Web" of the book "Semantic Web Engineering in the Knowledge Society," edited by Miltiadis Lytras and Jorge Cardoso. Copyright 2008, IGI Global, www.igi.pub.com. Used with permission of the publisher.

C.1 Bayesian Models

Bayesian probability provides a mathematically sound representation language and formal calculus for rational degrees of belief, which gives different agents the freedom to have different beliefs about a given hypothesis. This provides a compelling framework for representing uncertain, imperfect knowledge that can come from diverse agents. Not surprisingly, there are many distinct approaches using Bayesian probability for the Semantic Web.

Bayesian knowledge representation and reasoning systems have their formal basis in the axioms of probability theory (e.g., Ramsey, 1931; Kolmogorov, 1960/1933). Probability theory allows propositions to be assigned truth-values in the range from zero, meaning certain falsehood, to one, meaning certain truth. Values intermediate between zero and one reflect degrees of likelihood of a proposition that may be either true or false. Bayes Rule, a theorem that can be derived from the axioms of probability theory, provides a method of updating the probability of a proposition when information is acquired about a related proposition. The standard format of Bayes rule is:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

On the right side of the formula, P(A) is called the prior probability of A, and represents our belief in event A before obtaining information on event B. Likewise, P(B) is called the prior probability of B. There is also P(A|B), which is the likelihood of event A given that event B has happened. On the left side of the formula there is P(B|A), which is the posterior probability of B, and represents our new belief in event B after applying Bayes rule with the information collected from event A. Bayes rule provides the formal basis for the active and rapidly evolving field of Bayesian probability and statistics. In the Bayesian view, inference is a problem of belief dynamics. Bayes rule provides a principled methodology for belief change in the light of new information.
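
A small numerical sketch may help to make the update concrete. In the notation of the paragraph above, the posterior is P(B|A) = P(A|B) P(B) / P(A), where P(A) can be obtained by summing over the two cases "B" and "not B". The scenario and all numbers below are hypothetical: B stands for "an extracted statement is correct" and A for "the extraction tool reports high confidence".

```python
def bayes_posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """Return P(B | A) from the prior P(B) and the two likelihoods of A."""
    # Total probability of the evidence A over the cases B and not-B.
    p_a = p_a_given_b * prior_b + p_a_given_not_b * (1.0 - prior_b)
    return p_a_given_b * prior_b / p_a

# Hypothetical numbers: 70% of statements are correct a priori; the tool
# reports high confidence for 90% of correct and 20% of incorrect ones.
posterior = bayes_posterior(prior_b=0.7, p_a_given_b=0.9, p_a_given_not_b=0.2)
print(round(posterior, 3))
```

Under these assumed numbers, observing A raises the belief in B from 0.7 to roughly 0.91, which illustrates belief change in the light of new information.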

C.1.1 Bayesian Networks (BNs)

BNs provide a means of parsimoniously expressing joint probability distributions over many interrelated hypotheses. A Bayesian network consists of a directed acyclic graph (DAG) and a set of local distributions. Each node in the graph represents a random variable. A random variable denotes an attribute, feature, or set of hypotheses about which we may be uncertain. Each random variable has a set of mutually exclusive and collectively exhaustive possible values. That is, exactly one of the possible values is or will be the actual value, and we are uncertain about which one it is. The graph represents direct qualitative dependence relationships; the local distributions represent quantitative information about the strength of those dependencies. The graph and the local distributions together represent a joint probability distribution over the random variables denoted by the nodes of the graph.

Bayesian networks have been successfully applied to create consistent probabilistic representations of uncertain knowledge in diverse fields. Heckerman et al. (1995) provide a detailed list of recent applications of Bayesian Networks. The prospective reader will also find comprehensive coverage of Bayesian Networks in a large and growing literature on this subject, such as Pearl (1988), Neapolitan (1990, 2003), and others. Figure 1 shows an example of a BN representing part of a highly simplified ontology for wines and pizzas.

In this toy example, we assume that domain knowledge about gastronomy was gathered from sources such as statistical data collected among restaurants and expertise from sommeliers and pizzaiolos. Moreover, the resulting ontology also considered imperfect knowledge to establish a probability distribution among features of the pizzas ordered by customers (i.e. type of base and topping) and characteristics of the wines ordered to accompany the pizzas.

Consider a customer who enters a restaurant and requests a pizza with cheese topping and a thin and crispy base. Using the probability distribution stored in the BN of Figure 1, the waiter can apply Bayes rule to infer the best type of wine to offer the customer given his pizza preferences and the body of statistical and expert information previously linking features of pizza to wines. Such computation would be difficult when there are very many features. Bayesian networks provide a parsimonious way to express the joint distribution and a computationally efficient way to implement Bayes rule. This inferential process is shown in Figure 2, where evidence (i.e., the customer’s order) was entered in the BN and its result points to Beaujolais as the most likely wine the customer would order, followed by Cabernet Sauvignon, and so on.
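
The kind of inference described here can be sketched with a very small network. The structure (topping and base as parents of wine), the variable values and all probabilities below are hypothetical and are not taken from Figures 1 and 2; inference is done by brute-force enumeration of the joint distribution, which is only feasible for toy networks.

```python
from itertools import product

VALUES = {
    "topping": ["cheese", "meat"],
    "base": ["thin", "deep"],
    "wine": ["Beaujolais", "Cabernet Sauvignon", "Chardonnay"],
}

# Local distributions (all numbers are made up for illustration).
P_TOPPING = {"cheese": 0.6, "meat": 0.4}
P_BASE = {"thin": 0.5, "deep": 0.5}
P_WINE = {  # P(wine | topping, base), in the order of VALUES["wine"]
    ("cheese", "thin"): [0.6, 0.3, 0.1],
    ("cheese", "deep"): [0.4, 0.4, 0.2],
    ("meat", "thin"): [0.2, 0.7, 0.1],
    ("meat", "deep"): [0.1, 0.8, 0.1],
}

def joint(topping, base, wine):
    """Joint probability as the product of the local distributions."""
    wine_index = VALUES["wine"].index(wine)
    return P_TOPPING[topping] * P_BASE[base] * P_WINE[(topping, base)][wine_index]

def query(target, evidence):
    """Return P(target | evidence) by summing the joint over consistent worlds."""
    names = list(VALUES)
    totals = {value: 0.0 for value in VALUES[target]}
    for combo in product(*(VALUES[name] for name in names)):
        world = dict(zip(names, combo))
        if all(world[var] == val for var, val in evidence.items()):
            totals[world[target]] += joint(**world)
    norm = sum(totals.values())
    return {value: p / norm for value, p in totals.items()}

# Predictive query: which wine for a cheese pizza with a thin base?
wine_given_order = query("wine", {"topping": "cheese", "base": "thin"})
# Diagnostic query: which topping is likely, given the wine ordered?
topping_given_wine = query("topping", {"wine": "Beaujolais"})
print(wine_given_order, topping_given_wine)
```

The same `query` function answers both directions of reasoning, from pizza features to wine and from wine back to pizza features, because both are computed from the one joint distribution that the graph and local distributions define.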

Although this is just a toy example, it is useful to show how incomplete information about a domain can be used to improve decisions. In an ontology without uncertainty, there would not be enough information for a logical reasoner to infer a good choice of wine to offer the customer, and the decision would have to be made without optimal use of all the information available.

As Bayesian networks have grown in popularity, their shortcomings in expressiveness for many real-world applications have become increasingly apparent. More specifically, Bayesian Networks assume a simple attribute-value representation – that is, each problem instance involves reasoning about the same fixed number of attributes, with only the evidence values changing from problem instance to problem instance. In the pizza and wine example, the PizzaTopping random variable conveys general information about the class of pizza toppings (i.e., types of toppings for a given pizza and how it is related to preferences over wine flavor and color), but the BN in Figures 1 and 2 is valid for pizzas with only one topping. To deal with more elaborate pizzas, it is necessary to build specific BNs for each configuration, each one with a distinct probability distribution. Figure 3 depicts a BN for a 3-topping pizza with a specific customer preference displayed. Also, the information conveyed by the BNs (i.e., for 1-topping, 2-toppings, etc.) relates to the class of pizza toppings, and not to specific instances of those classes. Therefore, the BN in Figure 3 cannot be used for a situation in which the customer asks for two 3-topping pizzas. This type of representation is inadequate for many problems of practical importance. Similarly, these BNs cannot be used to reason about a situation in which a customer orders several bottles of wine that may be of different varieties. Many domains require reasoning about varying numbers of related entities of different types, where the numbers, types, and relationships among entities usually cannot be specified in advance and may have uncertainty in their own definitions.

In spite of their limitations, BNs have been used in specific applications for the SW where the limitations on expressivity can be overcome by clever knowledge engineering workarounds. One example is BayesOWL (Ding and Peng, 2004; Ding, 2005), which augments OWL semantics to allow probabilistic information to be represented via additional markups. The result is a probabilistic annotated ontology that could then be translated to a Bayesian network. Such a translation is based on a set of translation rules that rely on the probabilistic information attached to individual concepts and properties within the annotated ontology. After successfully achieving the translation, the resulting Bayesian network will be associated with a joint probability distribution over the application domain. Although a full translation of an ontology to a standard BN is impossible given the limitations of the latter in terms of expressivity, the scheme can be successfully used to tackle specific problems involving uncertainty.

Also focusing on Bayesian extensions geared towards the Semantic Web is the work of Gu et al. (2004), which takes an approach similar to that of BayesOWL. A related effort is the set of RDF extensions being developed by Yoshio Fukushige (2004). Generally speaking, SW approaches that rely on BNs will have to compensate for their lack of expressiveness by specializing in a specific type of problem, such as the BN-focused approaches for solving the ontology mapping problem (e.g., Mitra et al., 2004; Pan et al., 2005; Peng et al., 2007).

C.1.2 Probabilistic Extensions to Description Logics

Description logics divide a knowledge base into two components: a terminological box, or T-Box, and the assertional box, or A-Box. The first introduces the terminology (i.e., the vocabulary) of an application domain, while the latter contains assertions about instances of the concepts defined in the T-Box. Description logics are a subset of first-order logic (FOL) that provide a very good combination of decidability and expressiveness. In fact, an important desired property of description logics is the decidability of their reasoning tasks. Description logics are also the basis of the web ontology language OWL, whose sublanguages OWL Lite and OWL DL correspond to the expressive description logics SHIF(D) and SHOIN(D), respectively.

There are several probabilistic extensions of description logics in the literature, which can be classified according to the generalized description logics, the supported forms of probabilistic knowledge, and the underlying probabilistic reasoning formalism.

Heinsohn (1994) presents a probabilistic extension of the description logic ALC (a member of the AL-languages (Schmidt-Schauß and Smolka, 1991) obtained by adding full existential quantification and the union constructor to the basic AL (attributive language)), which allows terminological probabilistic knowledge about concepts and roles to be represented, and which is essentially based on probabilistic reasoning in probabilistic logics. Heinsohn, however, does not allow for assertional knowledge about concept and role instances. Jaeger (1994) proposes another probabilistic extension of the description logic ALC, which allows for terminological and assertional probabilistic knowledge about concepts and roles and about concept instances, respectively, but does not support assertional probabilistic knowledge about role instances (although he mentions a possible extension in this direction). The uncertain reasoning formalism in Jaeger (1994) is essentially based on probabilistic reasoning in probabilistic logics, like the one in Heinsohn (1994), but coupled with cross-entropy minimization to combine terminological probabilistic knowledge with assertional probabilistic knowledge. Jaeger’s recent work (2006) focuses on interpreting probabilistic concept subsumption and probabilistic role quantification through statistical sampling distributions, and develops a probabilistic version of the guarded fragment of first-order logic.

The work by Koller et al. (1997) gives a probabilistic generalization of the CLASSIC description logic, called P-CLASSIC. In short, each probabilistic component is associated with a set P of p-classes, and each p-class C in set P is represented using a Bayesian network. Like Heinsohn’s work (1994), the work by Koller et al. (1997) allows for terminological probabilistic knowledge about concepts and roles, but does not support assertional probabilistic knowledge about instances of concepts and roles. However, differently from Heinsohn (1994), it is based on inference in Bayesian networks as underlying probabilistic reasoning formalism. Closely related work by Yelland (2000) combines a restricted description logic close to FL with Bayesian networks. It also allows for terminological probabilistic knowledge about concepts and roles, but does not support assertional knowledge about instances of concepts and roles.

Another description logic with a probabilistic extension is SHOQ(D) (Horrocks and Sattler, 2001). SHOQ(D) is the basis of DAML+OIL (Horrocks, 2002), the language that came from merging two ontology languages being developed in the US (DAML) and Europe (OIL) and has been superseded by OWL. Its probabilistic extension is called P-SHOQ(D) (Giugno and Lukasiewicz, 2002) (see also (Lukasiewicz, 2008)) and allows for expressing both terminological probabilistic knowledge about concepts and roles, as well as assertional probabilistic knowledge about instances of concepts and roles. P-SHOQ(D) is based on probabilistic lexicographic entailment from probabilistic default reasoning (Lukasiewicz, 2002) as underlying probabilistic reasoning formalism, which treats terminological and assertional probabilistic knowledge in a semantically very appealing way as probabilistic knowledge about random and concrete instances, respectively.

Description logics are highly effective and efficient for the classification and subsumption problems that they were designed to address. However, their ability to represent and reason about other commonly occurring kinds of knowledge is limited. One restrictive aspect of DL languages is their limited ability to represent constraints on the instances that can participate in a relationship. As an example, a probabilistic DL version of the toy example in Figures 1 to 3 would allow us to instantiate (say) three pizzas. However, suppose we want to express that for a given pizza to be compatible with another pizza in a specific type of situation (e.g., a given mixture of toppings for distinct pizzas), it is mandatory that the two individuals of class pizza involved in the situation are not the same. In DLs, making sure that the two instances of class pizza are different in a specific situation is only possible if we actually instantiate/specify the tangible individuals involved in that situation. Indeed, stating that two “fillers” (i.e., the actual individuals of class Pizza that will “fill the spaces” of concept pizza in our statement) are not equal without specifying their respective values would require constructs such as negation and equality role-value-maps, which cannot be expressed in description logics. While equality and role-value-maps provide additional useful means to specify structural properties of concepts, their inclusion makes the logic undecidable (Calvanese and De Giacomo, 2003, page 223).

C.1.3 First-Order Probabilistic Approaches

In recent years, a number of languages have appeared that extend the expressiveness of probabilistic graphical models in various ways. This trend reflects the need for probabilistic tools with more representational power to meet the demands of real world problems, and matches the need for Semantic Web representational schemes compatible with incomplete, uncertain knowledge. A clear candidate logic to fulfill this requirement for extended expressivity is first-order logic (FOL), which according to Sowa (2000, page 41) “has enough expressive power to define all of mathematics, every digital computer that has ever been built, and the semantics of every version of logic, including itself.”

FOL was invented independently by Frege and Peirce in the late nineteenth century (Frege, 1879/1967; Peirce, 1885) and is by far the most commonly used, studied, and implemented logical system. A theory in first-order logic assigns definite truth-values only to sentences that have the same truth-value (either true or false) in all interpretations of the theory. The most that can be said about any other sentence is that its truth-value is indeterminate. A logical system is complete if all valid sentences can be proven and negation complete if for every sentence, either the sentence or its negation can be proven. Kurt Gödel proved both that first-order logic is complete, and that no consistent logical system strong enough to axiomatize arithmetic can be negation complete (cf. Stoll, 1963; Enderton, 2001). However, systems based on classical first-order logic lack a theoretically principled, widely accepted, logically coherent methodology for reasoning under uncertainty. Below are some of the approaches addressing this issue.

Object-Oriented Bayesian Networks (Koller and Pfeffer, 1997; Bangsø and Wuillemin, 2000; Langseth and Nielsen, 2003) represent entities as instances of object classes with class-specific attributes and probability distributions. Probabilistic Relational Models (PRM) (Pfeffer et al., 1999; Getoor et al., 2000; Getoor et al., 2001; Pfeffer, 2001) integrate the relational data model (Codd, 1970) and Bayesian networks. PRMs extend standard Bayesian Networks to handle multiple entity types and relationships among them, providing a consistent representation for probabilities over a relational database. PRMs cannot express arbitrary quantified first-order sentences and do not support recursion. Although PRMs augmented with DBNs can support limited forms of recursion, they still do not support general recursive definitions. Jaeger (1997) extends relational probabilistic models to allow recursion, but it is limited to finitely many random variables. Plates (Buntine, 1994; Gilks et al., 1994; Spiegelhalter et al., 1996) represent parameterized statistical models as complex Bayesian networks with repeated components.
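
The notion of a network with repeated components can be illustrated by "unrolling" a template into a ground graph. The sketch below is a simplification for illustration only and does not reproduce any of the cited formalisms: the function `unroll_plate` and the node names are hypothetical, only the graph structure is generated, and every edge is assumed to end at a node inside the repeated part.

```python
def unroll_plate(shared_nodes, plate_nodes, plate_edges, size):
    """Copy the nodes in `plate_nodes` `size` times; shared nodes are not copied.

    `plate_edges` is a list of (parent, child) pairs whose child lies
    inside the plate. Returns (ground nodes, ground edges).
    """
    nodes = list(shared_nodes)
    edges = []
    for i in range(1, size + 1):
        for node in plate_nodes:
            nodes.append(f"{node}_{i}")
        for parent, child in plate_edges:
            # Parents inside the plate are indexed; shared parents are not.
            source = f"{parent}_{i}" if parent in plate_nodes else parent
            edges.append((source, f"{child}_{i}"))
    return nodes, edges

# One template, grounded for a pizza with three toppings (names are made up).
nodes, edges = unroll_plate(
    shared_nodes=["WinePreference"],
    plate_nodes=["Topping"],
    plate_edges=[("WinePreference", "Topping")],
    size=3,
)
print(nodes)
print(edges)
```

Changing `size` yields the graph for a different number of toppings from the same template, instead of requiring a separately built network for each configuration.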

DAPER (Heckerman et al., 2004) combines the entity-relational model with DAG models to express probabilistic knowledge about structured entities and their relationships. Any model constructed in Plates or PRM can be represented by DAPER. Thus, DAPER is a unifying language for expressing relational probabilistic knowledge. DAPER expresses probabilistic models over finite databases, and cannot represent arbitrary first-order sentences involving quantifiers. Therefore, like other languages discussed above, DAPER does not achieve full first-order representational power.

MEBN (Laskey and Mahoney, 1997; Laskey and Costa, 2005; Laskey, 2007) represents the world as consisting of entities that have attributes and are related to other entities. Knowledge about the attributes of entities and their relationships to each other is represented as a collection of MEBN fragments (MFrags) organized into MEBN Theories (MTheories). An MFrag represents a conditional probability distribution for instances of its resident random variables given their parents in the fragment graph and the context nodes. An MTheory is a set of MFrags that collectively satisfies consistency constraints ensuring the existence of a unique joint probability distribution over instances of the random variables represented in each of the MFrags within the set. MEBN semantics integrates the standard model-theoretic semantics of classical first-order logic with random variables as formalized in mathematical statistics.

Although the above approaches are promising where applicable, a workable solution for the Semantic Web requires a general-purpose formalism that gives ontology designers a range of options to balance tractability against expressiveness. Current research on SW formalisms using first-order probabilistic logics is still in its infancy and generally lacks a complete set of publicly available tools. Examples include PR-OWL (Costa, 2005), which is an upper ontology for building probabilistic ontologies based on MEBN logic, and KEEPER (Pool and Aiken, 2004), an OWL-based interface for the relational probabilistic toolset Quiddity*Suite, developed by IET, Inc. Their constructs are similar in spirit and provide an expressive method for representing uncertainty in OWL ontologies. Costa (2005) gives a definition for Probabilistic Ontologies, develops rules for constructing PR-OWL ontologies in a manner that can be translated into Quiddity*Suite, and describes how to perform the translation. Carvalho et al. (2007) and Costa et al. (2008) present an open source, Java-based, PR-OWL/MEBN GUI and reasoner package, UnBBayes-MEBN, that greatly facilitates the process of building probabilistic ontologies and reasoning with them.

As an illustration of the expressiveness of a first-order probabilistic logic, Figure 4 presents a graphical depiction of the MFrags for the wine and pizza toy example (1). It conveys both the structural relationships (implied by the arcs) among the nodes and the numerical probabilities (embedded in the probability distributions and not depicted in the figure). The MFrags depicted in Figure 4 form a consistent set that supports probabilistic reasoning about a domain and can be stored in an OWL file using the classes and properties defined in the PR-OWL upper ontology. The MFrags can be used to instantiate situation-specific Bayesian networks to answer queries about the domain of application being modeled. In other words, a PR-OWL probabilistic ontology consists of both deterministic and probabilistic information about the domain of discussion (e.g., wines and pizzas), stored in an OWL file that can be used for answering specific queries for any configuration of the instances given the evidence at hand.

In particular, the toy ontology of Figure 4 can be applied to reason about situations involving any number of pizzas with any number of toppings on each, accompanied by any number of bottles of wine, and including any possible interactions among specific instances of those. Figure 5 illustrates this concept, depicting a situation in which the evidence is that a customer has ordered one thin and crispy pizza with three toppings (cheese, meat, and sauce) and is planning to order one bottle of wine. The BN represents the response to a request to suggest a good wine to go with the pizza.

In MEBN syntax (3), the knowledge base is augmented by an instance of pizza (!P0), three instances of topping types (!T0, !T1, !T2), and an instance of wine (!W0). To answer the query on the wine suggestion, a probabilistic reasoner will use the evidence available to build a Situation Specific Bayesian Network (SSBN). This example was constructed to yield the same BN as Figure 3. This illustrates the point that the MFrags in Figure 4 have captured all information that is needed to build SSBNs for any specific configuration of pizzas and wines for this toy example.

Clearly, this example is oversimplified, but it suffices to illustrate how PR-OWL can be used to build a probabilistic ontology combining legacy ontologies of pizzas and wines. This example illustrates the use of an expressive probabilistic language to capture knowledge that cannot be expressed with standard Bayesian networks. Probabilistic ontologies are an increasingly important topic in forums devoted to best practices in systems development. Given the nature of the domain knowledge embedded in their systems, system developers in general would profit most from the advantages of being able to convey such knowledge with a principled treatment for uncertainty.

(1) Inspired by the wine ontology available at http://protege.cim3.net/cgi-bin/wiki.pl?ProtegeOntologiesLibrary and the pizza ontology presented in Horridge et al. (2004).

(2) The pentagon nodes are context nodes, representing constraints that must be satisfied for the distributions in the MFrag to apply. The trapezoid nodes are input nodes, whose probability distribution is defined outside the MFrag. The oval nodes are resident nodes, whose distributions are defined in the MFrag.

(3) In MEBN, RVs take arguments that refer to entities in the domain of application. An interpretation of the theory uses entity identifiers as labels to refer to entities in the domain. Entity identifiers are written either as numerals or as alphanumeric strings beginning with an exclamation point, e.g., 48723 or !M3.

C.2 Fuzzy Logic Models

Consider the statement “it will be blue sky tomorrow”. This statement is uncertain, that is, it is either true or false, depending on tomorrow's weather conditions, but we generally do not have complete knowledge about whether it will be blue sky tomorrow or not. In probabilistic formalisms, we thus assume a set of possible worlds, each of which is associated with a probability. Intuitively, we are uncertain about which possible world is the right one. In each world, we only allow for binary truth-values, and thus in each world the statement “it will be blue sky tomorrow” is either true or false. This way, we can quantify our ignorance about whether it will be blue sky tomorrow or not. For example, we may say that the probability that it will be blue sky tomorrow is 0.7, which means that the probabilities of all worlds in which it will be blue sky tomorrow sum up to 0.7.
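
The possible-worlds reading can be made concrete with a minimal sketch. The four worlds and their probabilities below are hypothetical, chosen only so that the worlds in which the statement holds sum to 0.7; they are not part of the report.

```python
# Hypothetical possible worlds for tomorrow's weather. The world names and
# probabilities are illustrative assumptions, not data from the report.
worlds = {
    "clear_and_warm": {"blue_sky": True, "prob": 0.5},
    "clear_and_cold": {"blue_sky": True, "prob": 0.2},
    "overcast": {"blue_sky": False, "prob": 0.2},
    "rain": {"blue_sky": False, "prob": 0.1},
}


def probability(statement, worlds):
    """Sum the probabilities of the worlds in which the statement is true."""
    return sum(w["prob"] for w in worlds.values() if w[statement])


# In every single world the statement is simply true or false; the
# uncertainty lies only in which world is the actual one.
p_blue_sky = probability("blue_sky", worlds)
print(round(p_blue_sky, 3))
```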

Consider next the statement “John is tall.” This statement is vague, that is, it is more or less true, depending on the body height of John, but we are unable to say whether this statement is completely true or false due to the involvement of the vague concept “tall,” which does not have a precise definition. In fuzzy formalisms, we assume fuzzy interpretations, which directly generalize binary interpretations by mapping elementary vague propositions into a truth value space between false and true. For example, we may say that John is tall with the degree of truth 0.7, which intuitively means that John is relatively tall but not completely tall.

It is also important to point out that vague statements are truth-functional, that is, the degree of truth of a vague complex statement (which is constructed from elementary vague statements via logical operators) can be calculated from the degrees of truth of its constituents, while uncertain complex statements are generally not a function of the degrees of uncertainty of their constituents (Dubois and Prade, 1994).
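
The contrast can be illustrated with a small sketch. The two probability distributions below are hypothetical; both assign probability 0.5 to A and to B, yet they disagree on the probability of the conjunction, so no function of the two marginals alone can return it. The fuzzy conjunction is shown with the minimum function, which is one possible choice among several.

```python
# Two hypothetical distributions over the truth assignments to (A, B).
independent = {(True, True): 0.25, (True, False): 0.25,
               (False, True): 0.25, (False, False): 0.25}
correlated = {(True, True): 0.5, (True, False): 0.0,
              (False, True): 0.0, (False, False): 0.5}


def prob(dist, event):
    """Probability of an event: total weight of the worlds where it holds."""
    return sum(p for world, p in dist.items() if event(world))


def a_holds(world):
    return world[0]


def b_holds(world):
    return world[1]


def a_and_b_hold(world):
    return world[0] and world[1]


# Same marginals, different probability for the conjunction.
for name, dist in (("independent", independent), ("correlated", correlated)):
    print(name, prob(dist, a_holds), prob(dist, b_holds),
          prob(dist, a_and_b_hold))


def fuzzy_and(a, b):
    """Truth-functional conjunction (minimum, assumed here for illustration)."""
    return min(a, b)


# The degree of the vague conjunction depends only on its constituents.
print(fuzzy_and(0.5, 0.5))
```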

Vagueness abounds especially in multimedia information processing and retrieval. Another typical application domain for vagueness, and thus for fuzzy formalisms, is natural language interfaces to the Web. Furthermore, fuzzy formalisms have also been successfully applied in ontology mapping, information retrieval, and e-commerce negotiation tasks.

C.2.1 Fuzzy Propositional Logics

To combine and modify the truth values in [0, 1], one assumes combination functions, namely, conjunction, disjunction, implication, and negation functions, denoted ⊗, ⊕, ⊳, and ⊖, respectively, which are functions ⊗, ⊕, ⊳: [0, 1] × [0, 1] → [0, 1] and ⊖: [0, 1] → [0, 1] that generalize the ordinary logical operators ∧, ∨, →, and ¬, respectively, to the set of truth values [0, 1]. As usual, we assume that the combination functions have some natural algebraic properties, namely, the properties shown in Tables 1 and 2. Note that in Table 1, Tautology and Contradiction follow from Identity, Commutativity, and Monotonicity. Note also that conjunction and disjunction functions (with the properties shown in Table 1) are also called triangular norms and triangular co-norms (Hájek, 1998), respectively. The combination functions of some well-known fuzzy logics are shown in Table 3.
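
As a minimal sketch, the combination functions of four commonly cited fuzzy logics can be written down directly. The definitions below are the standard textbook ones for Łukasiewicz, Gödel, Product, and Zadeh logic; the dictionary layout and the helper `agrees_with_classical` are our own illustrative choices, and the selection of logics is an assumption rather than a transcription of Table 3.

```python
# Combination functions on truth values in [0, 1], keyed by logic name.
LOGICS = {
    "Lukasiewicz": {
        "and": lambda a, b: max(a + b - 1.0, 0.0),
        "or": lambda a, b: min(a + b, 1.0),
        "implies": lambda a, b: min(1.0 - a + b, 1.0),
        "not": lambda a: 1.0 - a,
    },
    "Goedel": {
        "and": lambda a, b: min(a, b),
        "or": lambda a, b: max(a, b),
        "implies": lambda a, b: 1.0 if a <= b else b,
        "not": lambda a: 1.0 if a == 0.0 else 0.0,
    },
    "Product": {
        "and": lambda a, b: a * b,
        "or": lambda a, b: a + b - a * b,
        # a <= b covers a == 0, so the division is never by zero.
        "implies": lambda a, b: 1.0 if a <= b else b / a,
        "not": lambda a: 1.0 if a == 0.0 else 0.0,
    },
    "Zadeh": {
        "and": lambda a, b: min(a, b),
        "or": lambda a, b: max(a, b),
        "implies": lambda a, b: max(1.0 - a, b),
        "not": lambda a: 1.0 - a,
    },
}


def agrees_with_classical(ops):
    """Check that the functions reduce to the Boolean operators on {0, 1}."""
    for a in (0.0, 1.0):
        if ops["not"](a) != 1.0 - a:
            return False
        for b in (0.0, 1.0):
            classical_implies = 0.0 if (a, b) == (1.0, 0.0) else 1.0
            if ops["and"](a, b) != min(a, b):
                return False
            if ops["or"](a, b) != max(a, b):
                return False
            if ops["implies"](a, b) != classical_implies:
                return False
    return True


for name, ops in LOGICS.items():
    print(name,
          round(ops["and"](0.7, 0.6), 3),
          round(ops["or"](0.7, 0.6), 3),
          round(ops["implies"](0.7, 0.6), 3),
          round(ops["not"](0.7), 3),
          agrees_with_classical(ops))
```

On the two classical truth values all four families coincide with ∧, ∨, →, and ¬; they differ only on intermediate degrees.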

More formally, a fuzzy (propositional) interpretation I maps each elementary vague proposition p into the set of truth values [0,1], and is then extended inductively to all (complex) vague propositions (which are constructed from the elementary vague propositions by using the binary and unary logical operators ∧, ∨, →, and ¬) as follows (where ⊗, ⊕, ⊳, and ⊖ are conjunction, disjunction, implication, and negation functions, respectively, as described above):

I(φ ∧ ψ) = I(φ) ⊗ I(ψ),
I(φ ∨ ψ) = I(φ) ⊕ I(ψ),
I(φ → ψ) = I(φ) ⊳ I(ψ),
I(¬φ) = ⊖ I(φ).

A fuzzy (propositional) knowledge base consists of a finite set of fuzzy formulas, which have one of the forms φ ≥ l, φ ≤ l, φ > l, or φ < l, where φ is a vague proposition, and l is a truth value from [0,1]. Such statements express that φ has a degree of truth of at least, at most, greater than, and lower than l, respectively. For example, tall_John ≥ 0.6 says that tall_John has a degree of truth of at least 0.6. Any such fuzzy knowledge base represents a set of fuzzy interpretations, which can be used to define the notions of satisfiability, logical consequence, and tight logical consequence, as usual. Here, it is important to point out the difference to Bayesian networks: rather than encoding one single probability distribution (over a set of binary interpretations), fuzzy knowledge bases encode a set of fuzzy interpretations.
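
The following sketch shows one way to extend an interpretation inductively to complex propositions and to test whether it satisfies a fuzzy knowledge base. The tuple encoding of formulas, the function names, the proposition heavy_John, and the choice of Zadeh combination functions are all assumptions made for illustration.

```python
import operator

# Zadeh combination functions; any other family could be substituted.
ZADEH = {
    "and": min,
    "or": max,
    "implies": lambda a, b: max(1.0 - a, b),
    "not": lambda a: 1.0 - a,
}

COMPARISONS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def evaluate(formula, interpretation, ops=ZADEH):
    """Extend an interpretation of atoms inductively to complex propositions.

    A formula is an atom name (str) or a tuple of the form ("not", f),
    ("and", f, g), ("or", f, g), or ("implies", f, g).
    """
    if isinstance(formula, str):
        return interpretation[formula]
    connective, *args = formula
    values = [evaluate(arg, interpretation, ops) for arg in args]
    return ops[connective](*values)


def satisfies(interpretation, knowledge_base, ops=ZADEH):
    """True if every fuzzy formula (phi, relation, l) holds under I."""
    return all(
        COMPARISONS[relation](evaluate(phi, interpretation, ops), bound)
        for phi, relation, bound in knowledge_base
    )


# tall_John >= 0.6 comes from the text; the second formula is hypothetical.
knowledge_base = [
    ("tall_John", ">=", 0.6),
    (("and", "tall_John", "heavy_John"), "<=", 0.5),
]

model = {"tall_John": 0.7, "heavy_John": 0.4}
non_model = {"tall_John": 0.5, "heavy_John": 0.4}

print(satisfies(model, knowledge_base))
print(satisfies(non_model, knowledge_base))
```

The knowledge base does not pin down a single interpretation: any assignment with tall_John at least 0.6 and the conjunction at most 0.5 is a model, which is the sense in which a fuzzy knowledge base encodes a set of interpretations.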

C.2.2 Fuzzy Description Logics and Ontology Languages

In fuzzy description logics and ontology languages, concept assertions, role assertions, concept inclusions, and role inclusions have a degree of truth rather than a binary truth value. Semantically, this extension is essentially obtained by (i) generalizing binary first-order interpretations to fuzzy first-order interpretations and (ii) interpreting all the logical operators by a corresponding combination function. Syntactically, as in the fuzzy propositional case, one then also allows for formulas that restrict the truth values of concept assertions, role assertions, concept inclusions, and role inclusions. Some important new ingredients of fuzzy description logics are often also fuzzy concrete domains, which include fuzzy predicates on concrete domains, and fuzzy modifiers (such as “very” or “slightly”), which are unary operators that change the membership functions of fuzzy concepts.

As a fictional example, an online shop may use a fuzzy description logic knowledge base to classify and characterize its products. For example, suppose (1) textbooks are books, (2) PCs and laptops are mutually exclusive electronic products, (3) books and electronic products are mutually exclusive products, (4) PCs have a price, a memory size, and a processor speed, (5) pc1 is a PC with the price 1300€, the memory size 3 GB, and the processor speed 4 GHz, (6) pc2 is a PC with the price 500€, the memory size 1 GB, and the processor speed 2 GHz, (7) pc3 is a PC with the price 900€, the memory size 2 GB, and the processor speed 3 GHz, (8) ibm, acer, and hp are the producers of pc1, pc2, and pc3, respectively. These relationships are expressed by the following description logic knowledge base:

(1) Textbook ⊑ Book;
(2) PC ⊔ Laptop ⊑ Electronics; PC ⊑ ¬Laptop;
(3) Book ⊔ Electronics ⊑ Product; Book ⊑ ¬Electronics;
(4) PC ⊑ ∃hasPrice.Integer ⊓ ∃hasMemorySize.Integer ⊓ ∃hasProcessorSpeed.Integer;
(5) (PC ⊓ ∃hasPrice.1300 ⊓ ∃hasMemorySize.3 ⊓ ∃hasProcessorSpeed.4)(pc1);
(6) (PC ⊓ ∃hasPrice.500 ⊓ ∃hasMemorySize.1 ⊓ ∃hasProcessorSpeed.2)(pc2);
(7) (PC ⊓ ∃hasPrice.900 ⊓ ∃hasMemorySize.2 ⊓ ∃hasProcessorSpeed.3)(pc3);
(8) produces(ibm, pc1); produces(acer, pc2); produces(hp, pc3).

The notions “expensive PCs”, “PCs having a large memory”, and “PCs having a fast processor” can then be defined as fuzzy concepts by adding the following three fuzzy concept definitions:

ExpensivePC ≡ PC ⊓ ∃hasPrice.PCExpensive,
LargeMemoryPC ≡ PC ⊓ ∃hasMemorySize.MemoryLarge,
FastProcessorPC ≡ PC ⊓ ∃hasProcessorSpeed.ProcessorFast.

Here, PCExpensive, MemoryLarge, and ProcessorFast are fuzzy unary datatype predicates, which are defined by PCExpensive(x) = rs(x; 600, 1200), MemoryLarge(x) = rs(x; 1, 3), and ProcessorFast(x) = rs(x; 2, 4), respectively, where rs(x; a, b) is the so-called right-shoulder function (see Figure 6). Informally, as for the fuzzy concept “expensive PCs”, every PC costing at least 1200€ (resp., at most 600€) is definitely expensive (resp., not expensive), while every PC costing between 600€ and 1200€ is expensive to some degree between 0 and 1.
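
As a sketch, the right-shoulder function and the resulting membership degrees of pc1, pc2, and pc3 can be computed as follows. We assume that rs rises linearly between a and b, which is the usual definition; the dictionary layout and the helper `degrees` are illustrative. Since each PC is a crisp member of PC and has a single value per attribute, its degree of membership in each fuzzy concept is simply the value of the corresponding datatype predicate.

```python
def rs(x, a, b):
    """Right-shoulder function: 0 up to a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Attribute values of the three PCs from the knowledge base above
# (price in euros, memory size in GB, processor speed in GHz).
pcs = {
    "pc1": {"price": 1300, "memory": 3, "speed": 4},
    "pc2": {"price": 500, "memory": 1, "speed": 2},
    "pc3": {"price": 900, "memory": 2, "speed": 3},
}


def degrees(pc):
    """Membership degrees of one PC in the three fuzzy concepts."""
    return {
        "ExpensivePC": rs(pc["price"], 600, 1200),
        "LargeMemoryPC": rs(pc["memory"], 1, 3),
        "FastProcessorPC": rs(pc["speed"], 2, 4),
    }


for name, pc in pcs.items():
    print(name, degrees(pc))
```

Under these definitions pc1 belongs fully to all three concepts, pc2 belongs to none of them, and pc3 belongs to each with degree 0.5.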

Similarly, the notions “costs at most about 1000€” and “has a memory size of around 2 GB” in a buyer’s request can be expressed through the following fuzzy concepts C and D, respectively:

C ≡ ∃hasPrice.LeqAbout1000,
D ≡ ∃hasMemorySize.Around2,

where LeqAbout1000(x) = ls(x; 500, 1500) and Around2(x) = tri(x; 1.5, 2, 2.5) (see Figure 6).

Figure 6 – (a) triangular function tri(x; a, b, c), (b) left-shoulder function ls(x; a, b), and (c) right-shoulder function rs(x; a, b)
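
The left-shoulder and triangular functions can be sketched in the same way and applied to the buyer's request. We assume piecewise-linear shapes, as is usual for these functions. Combining the two degrees with the minimum (the Zadeh conjunction) to rank the PCs is our own illustrative choice and is not prescribed by the text.

```python
def ls(x, a, b):
    """Left-shoulder function: 1 up to a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def tri(x, a, b, c):
    """Triangular function: 0 outside (a, c), peaking at 1 for x = b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def leq_about_1000(price):
    return ls(price, 500, 1500)


def around_2(memory):
    return tri(memory, 1.5, 2, 2.5)


# Prices (euros) and memory sizes (GB) of the three PCs from the example.
pcs = {"pc1": (1300, 3), "pc2": (500, 1), "pc3": (900, 2)}

# Degree to which each PC matches both parts of the request, using min as
# the conjunction (an assumption made for illustration).
match = {
    name: min(leq_about_1000(price), around_2(memory))
    for name, (price, memory) in pcs.items()
}

for name, degree in sorted(match.items(), key=lambda kv: -kv[1]):
    print(name, round(degree, 3))
```

With these assumptions only pc3 matches the request to a non-zero degree: its price satisfies “at most about 1000€” to degree 0.6 and its memory size is exactly 2 GB.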

The literature contains many different approaches to fuzzy extensions of description logics and ontology languages. They can be roughly classified according to (a) the description logics or the ontology languages that they generalize, (b) the fuzzy constructs that they allow, (c) the fuzzy logics that they are based on, and (d) their reasoning algorithms. Below we summarize some of the main approaches.

The earliest work is due to Yen (1991), who proposes a fuzzy extension of a quite restricted sublanguage of ALC. Yen considers fuzzy terminological knowledge (without terminological axioms that are true with some degree in [0,1]), along with fuzzy modifiers, but no fuzzy assertional knowledge, and he uses Zadeh Logic as the underlying fuzzy logic. Yen’s work also includes a reasoning algorithm, which allows for testing crisp subsumptions. Tresp and Molitor’s work (1998) presents a more general fuzzy extension of ALC. Like Yen's work, it also includes fuzzy terminological knowledge (without terminological axioms that are true with some degree in [0,1]), along with a special form of fuzzy modifiers, but no fuzzy assertional knowledge, and it is based on Zadeh Logic. The reasoning algorithm of Tresp and Molitor’s work is a tableaux calculus for computing subsumption degrees.

Another important fuzzy extension of ALC is due to Straccia (1998, 2001), who allows for both fuzzy terminological and fuzzy assertional knowledge, but not for fuzzy modifiers, and again assumes Zadeh Logic as the underlying fuzzy logic. Straccia’s work also includes a tableaux calculus for deciding logical consequences and computing tight logical consequences. Hölldobler et al. (2002, 2005) extend Straccia's fuzzy ALC with fuzzy modifiers of the form fm(x) = x^β, where β > 0, and present a sound and complete reasoning algorithm for the graded subsumption problem.

Straccia (2004) shows how reasoning in fuzzy ALC under Zadeh Logic can be reduced to reasoning in classical ALC. This idea has also been explored by Li et al. (2005a, 2005b).

Approaches towards more expressive fuzzy description logics include the works by Sanchez and Tettamanzi (2004, 2006), who consider the description logic ALCQ. They introduce the new notion of fuzzy quantifiers. As underlying fuzzy logic, they also assume Zadeh Logic. Their reasoning algorithm calculates the satisfiability interval for a fuzzy concept. Straccia (2005c) defines the semantics of a fuzzy extension of SHOIN(D), which is the description logic that stands behind OWL DL. Stoilos et al. (2005a) use this semantics to define a fuzzy extension of the OWL language, and also propose a translation of fuzzy OWL to fuzzy SHOIN.

Other works include the one by Hájek (2005, 2006), who considers ALC under arbitrary t-norms and proposes especially a reasoning algorithm for testing crisp subsumptions. Bonatti and Tettamanzi (2006) provide some complexity results for reasoning in fuzzy description logics.

Straccia (2005a, 2005b) presented a calculus for ALC(D), which works whenever the connectives, the fuzzy modifiers, and the concrete fuzzy predicates are representable as bounded mixed integer linear programs. For example, Łukasiewicz logic satisfies these conditions. The method has been extended to fuzzy SHIF(D), which is the description logic standing behind OWL Lite, and a reasoner (called fuzzyDL) supporting Zadeh, Łukasiewicz, and classical semantics has been implemented and is available from Straccia's web page.

Towards reasoning in fuzzy SHOIN(D), Stoilos et al. (2007) show results providing a tableaux calculus for fuzzy SHIN without fuzzy general concept inclusions and under the Zadeh semantics. Stoilos et al. (2006) provide a generalization thereof that additionally allows for fuzzy general concept inclusions. In closely related work, Li et al. (2006) provide a tableaux calculus for fuzzy SHI with fuzzy general concept inclusions.

Some reasoners for fuzzy description logics and ontology languages are the following: (1) fuzzyDL, a fuzzy OWL-Lite reasoner with a full-fledged mixed integer programming environment; (2) DLMedia, a fuzzy DLR-Lite reasoner with a built-in top-k database retrieval engine; (3) FiRE, which has grown out of the algorithm of SHIN (Stoilos et al., 2006); and (4) the ONTOSEARCH2 platform, which is a known DL-Lite reasoner, and which supports reasoning and querying over fuzzy DL-Lite (Pan et al., 2008).