Use Case OLD Explorations 1

From Decision XG
Revision as of 02:35, 25 June 2010 by Jwaters2 (Talk | contribs)


Using Open Linked Data for Decision-Making

Let's say I need to write a report about recent significant earthquakes. To begin, I need to decide which earthquakes are significant. What are my options? First, I note that there is a nice dataset of earthquake data, "Dataset 34", already available in RDF format. The description of the dataset is here: http://data-gov.tw.rpi.edu/wiki/Dataset_34. (Many thanks and credit to RPI for their excellent work on the open linked data. The RDF data representation and sample SPARQL query below are from their page.) The dataset itself is here: http://data-gov.tw.rpi.edu/raw/34/data-34.rdf. The data consists of a list of earthquake "entries" in the following format:

<rdf:Description rdf:about="#entry00002">
 <src>ci</src>
 <eqid>10622493</eqid>
 <version>1</version>
 <datetime>Monday, April 19, 2010 05:10:36 UTC</datetime>
 <lat>32.6808</lat>
 <lon>-115.8198</lon>
 <magnitude>1.9</magnitude>
 <depth>0.10</depth>
 <nst>27</nst>
 <region>Southern California</region>
 <rdf:type rdf:resource="http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#DataEntry"/>
</rdf:Description>
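As a rough illustration of working with this data outside a triple store, the entry above can be flattened into a plain dictionary with Python's standard xml.etree module. This is only a sketch, not the dataset's official access method; an xmlns:rdf declaration is added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# The entry from the dataset excerpt above, with an xmlns:rdf declaration
# added so the fragment is well-formed XML on its own.
ENTRY_XML = """\
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 rdf:about="#entry00002">
 <src>ci</src>
 <eqid>10622493</eqid>
 <version>1</version>
 <datetime>Monday, April 19, 2010 05:10:36 UTC</datetime>
 <lat>32.6808</lat>
 <lon>-115.8198</lon>
 <magnitude>1.9</magnitude>
 <depth>0.10</depth>
 <nst>27</nst>
 <region>Southern California</region>
 <rdf:type rdf:resource="http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#DataEntry"/>
</rdf:Description>"""

def entry_to_dict(xml_text):
    """Collect the un-namespaced child elements of an entry as field -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text
            for child in root
            if not child.tag.startswith("{")}  # skip namespaced rdf:type etc.

fields = entry_to_dict(ENTRY_XML)
print(fields["magnitude"], fields["region"])  # -> 1.9 Southern California
```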

So this dataset looks appropriate for helping me with my decision. My decision "options" will be these entries. Now, what are my "metrics" for making the decision? Let's assume I decide that significant earthquakes for my report are those within the last 7 days, with a magnitude greater than 3 and a depth of less than 50 kilometers, and that I would then like to rank-order those by magnitude. From this assessment of significance, I will choose the earthquakes I wish to include in my report.
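In plain code, this assessment amounts to a filter followed by a sort. Here is a minimal Python sketch over a few hand-made entries; the identifiers, magnitudes, and depths below are invented for illustration, whereas the real values would come from the RDF data.

```python
# Hand-made sample entries standing in for the RDF data
# (eqids, magnitudes, and depths are invented for illustration).
entries = [
    {"eqid": "a", "magnitude": 4.2, "depth": 10.0},
    {"eqid": "b", "magnitude": 1.9, "depth": 0.1},
    {"eqid": "c", "magnitude": 5.6, "depth": 80.0},
    {"eqid": "d", "magnitude": 3.4, "depth": 33.0},
]

# The "metrics": magnitude greater than 3, depth less than 50
# (the dataset only covers the last 7 days, so no date test is shown).
significant = [e for e in entries
               if e["magnitude"] > 3 and e["depth"] < 50]

# Rank-order by magnitude, largest first.
significant.sort(key=lambda e: e["magnitude"], reverse=True)

print([e["eqid"] for e in significant])  # -> ['a', 'd']
```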

SPARQL For Decision Assessment

If I had an ontology that represented these decision concepts (question, options, metrics, assessments, answers), then I could perhaps associate them with open linked data in a way that lets me use such data (not just this earthquake data, but any data) to represent decisions and help me make them. In fact, SPARQL allows me to do my "assessment" in this case with the following query:

PREFIX dgp32:  <http://data-gov.tw.rpi.edu/vocab/p/32/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#> 
SELECT ?id ?label ?datetime  ?lat ?lon ?magnitude ?depth  ?region ?src ?uri
FROM <http://data-gov.tw.rpi.edu/raw/34/data-34.rdf>
WHERE { 
        ?uri dgp32:eqid ?id. 
        ?uri dgp32:eqid ?label. 
        ?uri dgp32:region ?region.
        ?uri dgp32:datetime ?datetime. 
        ?uri dgp32:magnitude ?magnitude.
        ?uri dgp32:depth ?depth. 
        ?uri dgp32:lat ?lat.
        ?uri dgp32:lon ?lon. 
        ?uri dgp32:src ?src.
        filter ( xsd:float(?magnitude) >= 3 && xsd:float(?depth) <= 50 )
}
ORDER BY DESC (xsd:float(?magnitude))


In this query, we pull out all of the fields from the earthquake data for every entry with a magnitude of at least 3 and a depth of at most 50, ordered by descending magnitude. (NOTE: You can run this query by pasting it into the online SPARQL query engine at http://www.sparql.org/sparql.html. The dataset itself only contains data from the last week, so no date filter is needed.) If we had an ontology with decision labels, what would it add to what is already available? First, the labels help to make explicit, and to categorize, what is being done with the data. Second, some aspects of decision-making involve extra components or attributes. For example, for many decisions I would like to weight my metrics: I might not want to weight "magnitude", "depth", and "recency" all the same.
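One way a weighted assessment could work is to combine several metrics into a single score per option. The sketch below normalizes each metric to [0, 1] across the options (min-max normalization) and applies user-chosen weights; the weights, the sample values, and the normalization scheme are all assumptions for illustration, not part of the dataset or any standard.

```python
# Hypothetical weights chosen by the decision-maker:
# magnitude counts most, then depth, then recency.
weights = {"magnitude": 0.5, "depth": 0.3, "recency": 0.2}

# Invented sample options; "recency" is days since the event (smaller = newer).
options = [
    {"eqid": "a", "magnitude": 4.2, "depth": 10.0, "recency": 1.0},
    {"eqid": "d", "magnitude": 3.4, "depth": 33.0, "recency": 6.0},
]

# For metrics where smaller is better (depth, recency), invert the
# normalized value so that higher always means "more significant".
smaller_is_better = {"depth", "recency"}

def normalize(metric, value):
    vals = [o[metric] for o in options]
    lo, hi = min(vals), max(vals)
    x = 0.5 if hi == lo else (value - lo) / (hi - lo)  # min-max to [0, 1]
    return 1.0 - x if metric in smaller_is_better else x

def score(option):
    return sum(w * normalize(m, option[m]) for m, w in weights.items())

ranked = sorted(options, key=score, reverse=True)
print([o["eqid"] for o in ranked])  # -> ['a', 'd']
```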

Weighting Metrics

Perhaps I want to give magnitude more weight than depth, and depth more weight than recency. Also note that these "metrics" are properties (owl:DatatypeProperty) that associate an object (in this case an earthquake entry) with a literal value (such as a float or integer, which is useful for ordering and assessment). So in the general case, a decision-maker might potentially use any owl:DatatypeProperty as a metric (for this data, that might include lat, lon, magnitude, etc.). A user might want a list of the possible "metrics" and then be able to pick and choose among them. It might therefore be handy to create an ancillary ontology that imports an open data ontology and states that any owl:DatatypeProperty is a type of "metric". Here is a minimal first step on this "metric" portion of the ontology as it might be used in an application:

1) Define a new class "Metric" and state that "owl:DatatypeProperty" is a subclass of "Metric".
2) Define a property "metricWeight" with domain "Metric" and range "float".
3) Import an open data ontology, such as the earthquake dataset's.
4) Allow a user to browse through the "metrics", selecting metrics and setting thresholds.
5) Create a SPARQL query as a first-cut assessment and ordering of the options.
6) Run the query and present the ordered assessment for user selection and refinement.
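Steps 4 and 5 above could be sketched as simple string assembly: given the metrics, comparison operators, and thresholds a user has picked, emit the FILTER clause for the SPARQL query. The property names and the xsd:float casts follow the query shown earlier; the rest (the picks, the helper function) is illustrative.

```python
# A user's picks: (metric, operator, threshold), mirroring steps 4 and 5.
picks = [("magnitude", ">=", 3), ("depth", "<=", 50)]

def build_filter(picks):
    """Assemble a SPARQL FILTER clause from user-selected metric thresholds."""
    tests = [f"xsd:float(?{m}) {op} {t}" for m, op, t in picks]
    return "FILTER ( " + " && ".join(tests) + " )"

print(build_filter(picks))
# -> FILTER ( xsd:float(?magnitude) >= 3 && xsd:float(?depth) <= 50 )
```

The resulting clause matches the filter line in the query above, so the same pattern could generate first-cut assessments for any set of datatype-property metrics.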

Issues

(1) Does having owl:DatatypeProperty as a subclass of "Metric" make sense?
(2) Why doesn't the specified query appear to run correctly in one of the online SPARQL engines?
    RESOLVED: The query above runs fine in http://www.sparql.org/sparql.html.
(3) What is the best way to handle a separate ontology that gracefully links/subclasses/etc.
     both to owl and to the open data set?
(4) Are the earthquake RDF dataset's properties and classes sufficiently described for this use?
     If not, what more is needed?