Reaching out onto the Web

So what is this Semantic Web, anyway? Where does the Web come in? So far you've shown us just a language for writing data.

(You do need to know this bit!)

Looking inside web resources: `log:semantics` and `log:includes`

Life in the real world is full of data from different places. Rather than putting all the data into one big pot and believing it, rules often have to look specifically at which document said what.

Cwm has built-in functions to allow it to interact with the web. The concept of a formula - a set of RDF statements - allows it to consider separate sets of data.

The basic function which connects RDF to the web is log:semantics. The log:semantics of a document is the formula which one gets by parsing that document. As it is a built-in function, when cwm needs to evaluate it, it will pick up the document (N3 or RDF/XML) and parse it, returning the formula¹.

{ <foo.rdf> log:semantics ?f } => { ?f a :InterestingFormula}.

Having got a formula, we can test what it says using log:includes. One formula log:includes a second formula if, for each statement in the second, there is a corresponding one in the first. This is the same pattern matching which happens with log:implies rules: the names of variables do not have to match.
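The matching behind log:includes can be illustrated with a toy model in Python. This is only a sketch, not how cwm implements it: formulae are modeled as sets of (subject, predicate, object) tuples, strings beginning with "?" stand in for variables, and the names is_var, match and includes are invented for this illustration.

```python
def is_var(term):
    """Treat strings beginning with '?' as variables (an assumption of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, bindings):
    """Match one pattern triple against one fact; return extended bindings or None."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if new.setdefault(p, f) != f:
                return None  # variable is already bound to a different term
        elif p != f:
            return None
    return new


def includes(formula, pattern, bindings=None):
    """True if every triple in `pattern` has a corresponding triple in `formula`
    under one consistent assignment of the variables."""
    bindings = {} if bindings is None else bindings
    pattern = list(pattern)
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for fact in formula:
        extended = match(first, fact, bindings)
        if extended is not None and includes(formula, rest, extended):
            return True
    return False


home_page = {
    (":joe", "a", ":Vegetarian"),
    (":joe", ":likes", ":tofu"),
}

print(includes(home_page, [("?who", "a", ":Vegetarian")]))  # True
print(includes(home_page, [("?who", "a", ":Vegetarian"),
                           ("?who", ":likes", ":beef")]))   # False
```

The variable name in the pattern is irrelevant; only the shape of the statements has to be found in the first formula.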

So let's say we have a concept of a semantic web home page for a person. We decide on the policy that if someone's home page says that they are a vegetarian, then we believe that they are a vegetarian.

@forAll :x.
{ :x.:homePage.log:semantics log:includes { :x a :Vegetarian } } => { :x a :Vegetarian }.

Why didn't we use the ?x form there? Well, we have some nested formulae. The variable :x is quantified in the scope of the whole document. A ?x variable, by definition, is quantified not in the immediate formula but in the next enclosing formula, so we can't use it at different levels of nesting. We are not asking whether the home page includes "For all x, x is vegetarian". This is an example of how one must be explicit about the scope of variables when they are used at more than one level.

Implementing defaults and `log:notIncludes`

Because a formula has a finite size, you can test for what it does not say, with log:notIncludes. Here, we have a rule that if the specification for a car doesn't say what color it is, then it is black.

@forAll :car.
{ :car.auto:specification log:notIncludes {:car auto:color []}}
    => {:car auto:color auto:black}.

Note the use of [] here in the nested formula as a blank node. If the spec said that a car had color green, then that would mean that the car had color something, so we would say that the formula included :car auto:color [] . You can think of a statement with a [] in it as a weaker version of one with a value for the color.

This is a way to do defaults. Notation3 as it is doesn't have defaults, because on the web, you can't say "if nothing says it is another color". You can never know in the whole web whether anyone has given a color. Also, if we start to talk loosely about defaults in the sense of "if you don't already know a color", then different agents will end up drawing different conclusions from the same data, which is not a good foundation for a scalable web. So, you handle defaults by first running rules to work out everything which is specified, and then, on the result of that, running a notIncludes rule like the one above to implement the default values.
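The two-phase idea can be sketched in Python as a toy model. The function name apply_color_default and the sample data are made up for illustration, and the spec passed in is assumed to be the finished result of the first phase, with all rules already run.

```python
def apply_color_default(spec, car, default="auto:black"):
    """Phase two of the default: look only at the finite, already-computed `spec`.

    If the spec states any color for `car`, return those statements;
    otherwise return a single statement giving the default color.
    """
    stated = {t for t in spec if t[0] == car and t[1] == "auto:color"}
    if stated:
        return stated
    return {(car, "auto:color", default)}


green_spec = {(":car1", "auto:color", "auto:green"), (":car1", "auto:doors", "4")}
plain_spec = {(":car2", "auto:doors", "2")}

print(apply_color_default(green_spec, ":car1"))  # {(':car1', 'auto:color', 'auto:green')}
print(apply_color_default(plain_spec, ":car2"))  # {(':car2', 'auto:color', 'auto:black')}
```

Because the test is made against one specific, finite formula, every agent given the same spec reaches the same answer.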

Thinking inside thinking with `log:conclusion`

To do some inference within another set of rules, a useful relationship is that between a formula and the result of thinking about it - running any rules in the formula on all the data, recursively, just like cwm's --think command line option. This relationship is log:conclusion. To make the initial formula, you can use log:conjunction to merge a list of formulae.

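@forAll :f, :g.
{   # Merge the three parsed documents into one formula :f
    ( <input.data>.log:semantics
      <axioms.n3>.log:semantics
      <system-rules.n3>.log:semantics ) log:conjunction :f.
    # :g is everything that follows from :f
    :f log:conclusion :g.
    :g log:notIncludes {  :request a :ValidRequest }
} => {
    :request a :InvalidRequest
}.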

This means: if what you get by taking the input data, the axioms and the system rules together and thinking about it doesn't tell you that the request is valid, then it is invalid.
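As a rough illustration of what "thinking about it" means, here is a toy model in Python. It is a sketch under simplifying assumptions, not cwm's algorithm: formulae are sets of triples, rules are kept outside the formula as (premises, consequences) pairs of ground triples with no variables, and the names conjunction, conclusion and the sample data are all hypothetical.

```python
def conjunction(formulae):
    """Merge several formulae (sets of triples) into one."""
    merged = set()
    for formula in formulae:
        merged |= formula
    return merged


def conclusion(facts, rules):
    """Apply the rules again and again until nothing new can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, consequences in rules:
            if premises <= known and not consequences <= known:
                known |= consequences
                changed = True
    return known


input_data = {(":request", ":signedBy", ":alice")}
axioms = {(":alice", "a", ":TrustedParty")}
system_rules = [
    (frozenset({(":request", ":signedBy", ":alice"),
                (":alice", "a", ":TrustedParty")}),
     frozenset({(":request", "a", ":ValidRequest")})),
]

g = conclusion(conjunction([input_data, axioms]), system_rules)
is_invalid = (":request", "a", ":ValidRequest") not in g
print(is_invalid)  # False: the request was shown to be valid
```

The final test is made against the closed result g, which is why the absence of the statement can be treated as meaningful.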

More peeking: `log:content`, `log:n3String`, `log:uri`

The nice thing about log:semantics is that it deals with all the web protocols in one simple function. This is a simple, clean view of the web as a set of interwoven formulae.

It is possible also to get a little more involved, using the following functions which separate the looking up from the parsing.

log:uri: The URI of a resource. <http://example.org/> log:uri "http://example.org/".
Normal logic processing doesn't look at URIs, but in some cases one needs to. This is a level-breaker: it lets an N3 system look at its infrastructure. It is a function which cwm can evaluate either way: resource to URI or URI to resource.
log:content: For a document, the string which was returned as the document contents when the URI of the resource was looked up.
log:parsedAsN3: For a string, the formula you get by parsing it as a Notation3 document.
log:n3String: For a formula, a string which expresses that formula in N3.

One of the uses of this, as we will see in the next section, is to test the digital signature of a string before accepting the data encoded in it.
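The value of separating lookup from parsing can be sketched in Python. Everything here is an assumption made for illustration: FAKE_WEB is an in-memory stand-in for the web, parse_triples is a very crude stand-in for log:parsedAsN3, and an HMAC with a shared key stands in for a real digital signature.

```python
import hashlib
import hmac

# Hypothetical in-memory "web": URI -> document text (plays the role of log:content).
FAKE_WEB = {
    "http://example.org/data": ":joe a :Vegetarian .\n:joe :likes :tofu .\n",
}


def content(uri):
    """Look up the document text for a URI, without parsing it."""
    return FAKE_WEB[uri]


def parse_triples(text):
    """Crude stand-in for parsing: one 'subject predicate object .' per line."""
    formula = set()
    for line in text.splitlines():
        parts = line.rstrip(" .").split()
        if len(parts) == 3:
            formula.add(tuple(parts))
    return formula


def sign(key, text):
    """Compute an authentication tag over the raw string."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verified_semantics(uri, key, expected_tag):
    """Fetch, check the tag on the raw string, and only then parse."""
    text = content(uri)
    if not hmac.compare_digest(sign(key, text), expected_tag):
        return None  # refuse to look inside unauthenticated data
    return parse_triples(text)


key = b"shared-secret"
tag = sign(key, content("http://example.org/data"))
print(verified_semantics("http://example.org/data", key, tag))
print(verified_semantics("http://example.org/data", key, "bogus"))  # None
```

The check is made on the string exactly as it was retrieved, which is only possible because the content is available before it is turned into a formula.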

Getting results from the web: `log:definitiveService` and `log:definitiveDocument`

There are some properties for which there is just a well-defined set of values. The state codes of the US states are an example. There are 50 states, and each has one state code and one state name. Once you know them, you know them. Once you know where to look them up, you can resolve any query about them. That's got to be useful. It is represented by giving, somewhere, a log:definitiveDocument for the property. This is metadata - data about data. You have a lot of control over where cwm will look for metadata, and how it will use it.

This behaviour is controllable in cwm by the --mode flags. By default cwm won't do anything about it at all. (See: Cwm's --mode flags in the manual page.) When cwm runs with the r and s mode flags, and it finds in the query it is trying to match a statement whose predicate has a definitive document, then it will read the document, and search it (alone) to resolve that query.

The r flag is necessary for any of this to work. It causes cwm to check for metadata in the working formula - the current dataset.

The s flag makes cwm also check for schemas. It does this only when trying to resolve a query. If the predicate is from a namespace it doesn't know, it will go to the web to see what it can learn. Unless the e flag is set, it won't mind if there are any errors in this process; it will just abandon it.

Most vocabularies, or ontologies, have all kinds of metadata which help a query engine resolve questions posed using those terms. Much of the interesting semantic web development will be in seeing what kind of metadata is most useful to leave, and how best to use it. The definitiveDocument property is a simple one, and the simplest way of using it is to let cwm pick it up either from the current store or a schema.
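The effect of a definitive document on query resolution can be pictured with a toy model in Python. This is a sketch of the idea only, not of cwm's internals: the DEFINITIVE and DOCUMENTS tables, the resolve function, and the sample triples are all hypothetical.

```python
# Hypothetical metadata: predicate -> URI of its definitive document.
DEFINITIVE = {"state:code": "data/USRegionState.n3"}

# Hypothetical documents, already parsed into sets of triples.
DOCUMENTS = {
    "data/USRegionState.n3": {
        ("state:Massachusetts", "state:code", "MA"),
        ("state:Vermont", "state:code", "VT"),
    },
}


def resolve(pattern, store):
    """Return the triples matching (s, p, o), where None is a wildcard for s or o.

    If the predicate has a definitive document, search that document alone;
    otherwise search the local store.
    """
    s, p, o = pattern
    source = DOCUMENTS[DEFINITIVE[p]] if p in DEFINITIVE else store
    return {
        t for t in source
        if t[1] == p and (s is None or t[0] == s) and (o is None or t[2] == o)
    }


local_store = {(":me", ":livesIn", "state:Vermont")}

print(resolve((None, "state:code", "MA"), local_store))
print(resolve((":me", ":livesIn", None), local_store))
```

In this model the local store is never consulted for state:code, which mirrors the statement that the definitive document is searched alone.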

For an example, look at this file expressing a query involving US state information.

@prefix : <#>.

@prefix log: <http://www.w3.org/2000/10/swap/log#>.
@prefix state: <data/USRegionState.n3#> .
@prefix city: <data/USCity.n3#>.
# Question: What cities are in states bordering Massachusetts?

{"MA"^state:code.state:borderstate^city:state city:name ?n}
   =>{    ?n a :NAME_OF_CITY_IN_A_STATE_BORDERING_MASSACHUSETTS }.

It's a rather random example (think of a better one? let us know! ;-), to look up the states bordering Massachusetts, and specifically the names of major cities in those states. Massachusetts is identified as the state with state code "MA". So what happens if we load this into cwm and do a --think? Nothing. Cwm just prints out the file reformatted. Now run it in remote query mode with schema fetch:

cwm http://www.w3.org/2000/10/swap/test/dbork/defdoc2.n3 --mode=rse --think

and -- voila -- we get:

    "Albany"     a :NAME_OF_CITY_IN_A_STATE_BORDERING_MASSACHUSETTS .
    
    "Amherst"     a :NAME_OF_CITY_IN_A_STATE_BORDERING_MASSACHUSETTS .
    
    "Avon"     a :NAME_OF_CITY_IN_A_STATE_BORDERING_MASSACHUSETTS .

    ...

and so on.

To do more fancy things, you can run cwm in a mode which loads the

By now you should know how to publish tables of useful information on the semantic web. You should know how to use published data and semantic web services and SQL servers to answer parts of your queries. You are starting to get into some useful scalable stuff, and the next thing you know you'll be needing to rein in your system before it explores the whole world. You'll be needing to think about trust. Fortunately, you already have some of the tools: you can write rules which keep track of data from different places separately. Now all you need is some crypto ....

Next: Web of Trust

Footnotes

1. Of course, the value of this function depends on the real world, which can change. Many systems either assume that other documents won't change, or accept that the information derived from them will change with them. It would also be possible to model the time at which a given representation of a document was returned by a server, and what expiry date was given, and so on, and the reader is welcome to experiment with such schemes where they are needed. Cwm does not currently (2003/2) provide the functionality of looking inside the HTTP response to extract the protocol headers which convey time-related information.

References

Gerd points to ..."Local Closed World" as a term for what definitiveDocument and log:notIncludes are doing. Jeff Heflin Paper.

Tim BL, with his director hat off

$Id: Reach.html,v 1.15 2006/01/16 15:26:40 timbl Exp $
