Weaving Meaning: The Semantic Web

Eric Miller, W3C Semantic Web Activity Lead

American Association of Law Libraries
2002-07-23
Orlando, FL USA

Slides available at:
http://www.w3.org/Talks/0723-aall-em/

Overview

Background and Context
The Problem
Introduction to Enabling Technologies
Semantic Web and Law Libraries
Future Directions
Additional Information

The Semantic Web: What is it?

Many things to many people...

[Image: elephant]

The Semantic Web

An interesting bed-time story... still!

[Image: a bedtime story]

The Current Web

Resources:: identified by URIs; untyped
Links:: href, src, ...; limited, non-descriptive

User:: Exciting world - semantics of the resource, however, gleaned from content
Machine:: Very little information available - significance of the links only evident from the context around the anchor.

[Image: the current web]

The Semantic Web - A Simple Extension to the Current Web

Resources:: Globally identified by URIs, or locally scoped (blank); Extensible; Relational
Links:: Identified by URIs; Extensible; Relational

User:: Even more exciting world, richer user experience
Machine:: More processable information is available
Computers and people:: Work, learn and exchange knowledge effectively

[Image: the semantic web]

What is the Semantic Web?

The Semantic Web is an extension of the current web, in which information is given well defined meaning, better enabling computers and people to work in cooperation.

Information that has well defined meaning is in a form that machines can understand, rather than simply display.

Machine-understandable documents do not imply some magical artificial intelligence that lets machines comprehend human speech. Rather, they rely solely on a machine's ability to solve well-defined problems by performing well-defined operations on well-defined data.

or, another way to think about it...

The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale.

You can think of it as being an efficient way of representing data on the World Wide Web, or as a globally linked database.

Overview

1. Web resources
2. Non-Web resources

2.1 Physical objects

Cars, people, houses, etc.

2.2 Abstract concepts

Sizes, colors, verbs, "love", etc.

"Creator" (e.g., the creator of a document)

"Location"

"Airline reservation"

"Airline reservation service"

Unambiguously Identifying Web Resources

Solution (trivial): URLs

http://www.example.org/index.html

Unambiguously Identifying Physical Objects

Many human systems:

Vehicle Identification Numbers (VIN)
Product serial numbers
UPC product codes
Employee numbers
Etc.

Problems:

Too many formats
Most are not global in scope

Solution: Convert to URIs

http://www.example.com/employeeid/85740
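As a rough sketch of the conversion above, a locally scoped identifier can be made global by placing it under a namespace that the issuing organization controls. The helper name `mint_uri` and the path layout are illustrative assumptions, not part of any standard.

```python
from urllib.parse import quote


def mint_uri(base: str, scheme: str, local_id: str) -> str:
    """Build a URI for a locally scoped identifier (hypothetical helper)."""
    # Percent-encode both parts so that spaces or slashes inside an
    # identifier cannot change the structure of the resulting URI.
    return f"{base.rstrip('/')}/{quote(scheme, safe='')}/{quote(local_id, safe='')}"


# The employee number from the slide, placed under the example.com namespace.
print(mint_uri("http://www.example.com", "employeeid", "85740"))
```

The same function would serve for VINs or serial numbers, as long as each identifier scheme gets its own path segment.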

Unambiguously Identifying Abstract Concepts

Solution: Use URIs

Problem: Which URIs?

Need to agree on common vocabulary
Helpful if communities develop naming policies

Solution: Ontology

Ontology

"Formal description of concepts and their relationships"

In other words:

Vocabulary of terms
- "book", "publication", "derivative", "right"
And their relationships
- "book is-a-kind-of publication"
- "derivative is-a-kind-of right"
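A minimal sketch of what "terms plus relationships" can look like in code, using only the two is-a-kind-of statements above. A real ontology language is far richer; the dictionary layout and the function name `is_kind_of` are assumptions made for illustration.

```python
# Each term maps to its direct parent ("is-a-kind-of") term.
IS_A = {
    "book": "publication",
    "derivative": "right",
}


def is_kind_of(term, ancestor):
    """Return True if `term` is `ancestor` or inherits from it."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)         # guard against accidental cycles
        term = IS_A.get(term)  # climb one level; None once we reach the top
    return False


print(is_kind_of("book", "publication"))  # True
print(is_kind_of("book", "right"))        # False
```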

Dublin Core

Dublin Core
One well-known ontology
Defines 15 basic terms useful for discovery:
- "title", "author/creator", "subject", "publisher"
Each term unambiguously identified by URI
- http://purl.org/dc/elements/1.1/creator
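To illustrate how each term gets its own URI, the sketch below appends the 15 element names of the Dublin Core Metadata Element Set 1.1 to the namespace shown above. The dictionary name `DC` is just a convenience chosen for this example.

```python
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

# Term name -> globally unique URI for that term.
DC = {name: DC_NS + name for name in DC_ELEMENTS}

print(DC["creator"])
```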

One Global Ontology?

No. Not realistic.

Multiple ontologies will co-exist
Often specialized for problem domain

But:

Can be merged later
"Popularity contest"
URIs => No danger of accidentally using the same name for different concepts

Does an Ontology Really Define Meaning?

Meaning is determined by use.
But an ontology helps

Example: RFC 2119

Defines terms "MUST", "SHOULD", "MAY", etc.
RFC 2119 could be ambiguous
But its adoption in our specs is still helpful

Ontologies and Legal Community

What does <foo:Plaintiff> mean?
Is <foo:Plaintiff> the same as <bar:Plaintiff>?
Is <foo:Plaintiff> a kind of <legal:Entity>?
Web applications must agree on semantics
Ontologies can help
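The questions above can be made concrete with a small sketch. The namespace URIs and the two ontology statements below are invented for illustration. Whether the terms really are equivalent is exactly what the communities involved would have to agree on.

```python
# Hypothetical namespace URIs; the talk does not say what the prefixes map to.
NAMESPACES = {
    "foo": "http://foo.example.org/terms#",
    "bar": "http://bar.example.org/terms#",
    "legal": "http://legal.example.org/terms#",
}


def expand(qname):
    """Turn a prefixed name such as 'foo:Plaintiff' into a full URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


# Assumed ontology statements: one declared equivalence, one is-a-kind-of link.
SAME_AS = {frozenset({expand("foo:Plaintiff"), expand("bar:Plaintiff")})}
KIND_OF = {expand("foo:Plaintiff"): expand("legal:Entity")}


def same_term(a, b):
    """Two URIs name the same concept if identical or declared equivalent."""
    return a == b or frozenset({a, b}) in SAME_AS


foo_p, bar_p = expand("foo:Plaintiff"), expand("bar:Plaintiff")
print(foo_p == bar_p)                          # False: different URIs
print(same_term(foo_p, bar_p))                 # True: the ontology says so
print(KIND_OF[foo_p] == expand("legal:Entity"))  # True
```

Without the explicit statements, an application has no basis for treating the two Plaintiff terms as the same thing, even though the local names match.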

Example of Unambiguous Identification

To say: "Web page foo.html was created by John Smith"

Need to unambiguously identify 3 things:

Web page:
http://www.example.org/foo.html
"was created by":
http://purl.org/dc/elements/1.1/creator
"John Smith":
http://www.example.org/staffid/85740

Summary

Adding meaning to the web, making the web more "semantic"

Identifying the key problems:
- Ambiguity
- Complexity of information formats
Solving the ambiguity problem
- URIs
- Ontologies

Problem 2: Complexity of Information Formats

Web pages use complex information formats:
- English grammar, page layout, etc.
Easy for humans to parse / understand
Hard for machines to parse / "understand"

Example: "Time flies like an arrow"

How to parse?
Which is Subject? Verb? Object?

Need a common, machine-processible information format

Important Characteristics for a Machine-Processible Format

Scalable (the whole Web!)
General
- Allow any info to be expressed
Extremely flexible:
- Allow new data to be added
  - From any source
  - Without breaking existing data/systems
- Allow any kind of query
- Easily combine/join data in new ways

A solution: RDF

What Is RDF?

"Resource Description Framework"
- (But think: "Relational Data Format")
W3C Recommendation
Language for making statements about things
Primarily for metadata
- Author, title, subject, date-of-last-access
Can be used for any kind of statements
Has XML syntax: "RDF/XML"
Uses (simplified, explicit) entity-relational data model

Why a Relational Data Model?

30+ years of database history
Hierarchical before ~1975
Relational since ~1975
Relational model is remarkably flexible
Supports graceful evolution
- Change => Add another table
- Existing queries are unaffected
Easily accommodates new data
- Without affecting existing queries
Allows data to be easily combined ("joined") in new ways

Adapting the Relational Model for the Web

Use URIs as keys
Use subject-verb-object triples instead of tables
- (Simplified relational model)

URIs and Database Keys

How do you unambiguously identify something?

Relational database mantra:

"The key, the whole key, and nothing but the key"

Web mantra:

"The URI, the whole URI, and nothing but the URI"

RDF Triples

All info expressed as triples:

<subject> <verb> <object>
<subject> <property> <value>

Simplified relational model

Example Triple

(Not RDF/XML syntax):

`http://www.example.org/foo.html`	(Subject)
`http://purl.org/dc/elements/1.1/creator`	(Verb/Property)
`http://www.example.org/staffid/85740`	(Object/Value)

Meaning: "Web page foo.html was created by John Smith"
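The same triple can be held in a very small data structure. The sketch below is one possible representation, not the RDF/XML syntax. The class name `Triple` and the helper `to_ntriples` are assumptions, and the helper only handles URI-valued objects (no literals).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str  # the "verb" or "property"
    obj: str        # the "object" or "value"


created_by = Triple(
    subject="http://www.example.org/foo.html",
    predicate="http://purl.org/dc/elements/1.1/creator",
    obj="http://www.example.org/staffid/85740",
)


def to_ntriples(t):
    """Write a URI-only triple as a single N-Triples-style line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(to_ntriples(created_by))
```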

Representing Relational Data as Triples

Table with individual row labeled "Subject", column labeled "Property" and cell value labeled "Value"

Representing Tables as Triples

Any relational data can be represented as triples

Row Key --> Subject
Column --> Verb/Property
Value --> Value
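The mapping above can be sketched in a few lines. The table contents, the base URIs, and the function name `table_to_triples` are hypothetical. Only the staff ID and the name "John Smith" come from the earlier example.

```python
def table_to_triples(rows, key_column, subject_base, property_base):
    """Flatten table rows into (subject, property, value) triples."""
    triples = []
    for row in rows:
        # Row key -> subject URI
        subject = subject_base + str(row[key_column])
        for column, value in row.items():
            if column == key_column:
                continue  # the key is already encoded in the subject
            # Column name -> property URI, cell -> value
            triples.append((subject, property_base + column, value))
    return triples


# Hypothetical one-row staff table.
staff = [{"staffid": 85740, "name": "John Smith", "office": "B-12"}]

for triple in table_to_triples(
    staff,
    key_column="staffid",
    subject_base="http://www.example.org/staffid/",
    property_base="http://www.example.org/terms/",
):
    print(triple)
```

Each non-key cell becomes exactly one triple, so a table with R rows and C non-key columns yields R * C triples.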

Table as Collection of Triples

Joining Triples to Create a Graph

Triples can be viewed as links in a graph
Equivalent of "joining" in relational database
Joining is automatic in RDF, because:
- Nodes are URIs (unambiguous)

Nodes connected by blue labeled arcs.

Joining Data from Multiple Sources

Trivial: Same URI => same node.

Combination of blue, red and green networks (or subgraphs). The subgraphs are connected by the nodes that they have in common.
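A small sketch of why the join is trivial: if two sources use the same URI for the same thing, merging them is a plain set union, and following arcs across the merged graph needs no extra matching step. The second source and its `name` property URI are invented for illustration.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
NAME = "http://www.example.org/terms/name"  # hypothetical property
PAGE = "http://www.example.org/foo.html"
SMITH = "http://www.example.org/staffid/85740"

# Two independently published sets of triples.
source_a = {(PAGE, DC_CREATOR, SMITH)}
source_b = {(SMITH, NAME, "John Smith")}

# Same URI => same node, so merging is just a union of the two sets.
merged = source_a | source_b


def objects(graph, subject, predicate):
    """All values reachable from `subject` along arcs labeled `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# Follow two arcs that originate in different sources.
creator_names = [
    name
    for creator in objects(merged, PAGE, DC_CREATOR)
    for name in objects(merged, creator, NAME)
]
print(creator_names)  # ['John Smith']
```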

Application Integration: XML Versus RDF


XML: N*N complexity	RDF: N*1 complexity
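One way to read the complexity figures above is as a count of the converters needed to let N applications exchange data. The sketch below is an interpretation, not something the slide spells out. It assumes one converter per ordered pair of distinct formats in the pairwise case, and one mapping per application in the shared-model case.

```python
def pairwise_converters(n):
    """Each application needs a converter to each of the other n - 1 formats."""
    return n * (n - 1)


def shared_model_converters(n):
    """Each application maps to and from one common data model."""
    return n


for n in (2, 5, 10):
    print(n, pairwise_converters(n), shared_model_converters(n))
```

For ten applications this comes to 90 pairwise converters against 10 mappings to a shared model, which is the roughly quadratic versus linear growth the slide refers to.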

Summary

Concepts of RDF

What is RDF?
RDF as relational data
RDF graphs
RDF and application integration

SECTION 5: Conclusions, Example Applications and Demo

Solutions to Key Problems

Goal: "Machine processible information"

Problem 1: Ambiguity
- Solution: URIs (and ontologies)
Problem 2: Complexity of information formats
- Solution: RDF (or mappable to RDF)

Purpose: Find, share and combine information more easily

What information could be machine processible?

Ideally: All Web data. (Not realistic)

Consumer	Producer	Solution
Machine	Machine	Easy. Use RDF/mappable*
Machine	Human	Easy. Use RDF/mappable*
Human	Machine	Easy. Include RDF/mappable* with human format
Human	Human	Harder: Must manually add RDF/mappable*, requiring expertise, tools (e.g., in HTML editors), or AI

*"RDF/mappable" = RDF or RDF-mappable

Where to put machine-processible information?

A: Several possibilities:

Mixed in with XHTML
In HTTP headers
In linked documents
In databases (e.g., search engines)
Etc.

Example RDF / Semantic Web Applications

Open Directory Project (Directory of the Web)
- Used by Google Directory
DSpace (MIT digital library archive)
epinions.com (Product & vendor reviews)
TAP (Semantic search)
Others, see: http://www.w3.org/RDF/#projects

Demo of TAP Semantic Search

W3C's Current Search Service

W3C's Semantic Search Service

What I Hoped to Achieve

Convey basics of Semantic Web
- Purpose
- Reasoning behind it
- (Fill in gaps)
Purpose of ontologies
Purpose of RDF

Creative Commons

RDF Legal Dictionaries

http://rdf.lexml.de/
Open source development of a multilingual and multi-jurisdictional RDF dictionary for the legal world.

Legal Citations

Example...

Example

Caveat: Everything I learned about legal citations I learned from

http://philip.greenspun.com/politics/litigation/reading-cites.html

 
    -- the promise of the "semantic web" is automating
    (parts of) social protocols, and those social protocols
    are often grounded in law

    -- there are established conventions for legal citations

    -- more and more legal proceedings are published via the web
    all the time

    -- those legal proceedings are often copied in many places,
    and there's no recognized canonical URI for them, so
        -- caches don't help
        -- my browser doesn't tell me I've been there before
        -- etc.

So... some ideas...
    -- an RDF schema for legal citations
        (probably one schema per jurisdiction, with lots of
        sharing and subclassing)

    -- a corresponding HTML form for each jurisdiction that, in effect,
    allows you to compute the address of a document

To take the example from philg's tutorial:

    Ford Motor Co. v. Lonon, 217 Tenn. 400, 398 S.W.2d 240 (1966)

Perhaps in RDF, I'd spell that:

    <rdf:Description rdf:about="uri">
      <plaintiff>Ford Motor Co.</plaintiff>
      <defendant>Lonon</defendant>
      <volume>217</volume>
      <jurisdiction>Tenn</jurisdiction>
      <page>400</page>
    </rdf:Description>
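A hedged sketch of how such a description might be produced automatically from a citation string. The regular expression covers only the simple "Plaintiff v. Defendant, volume reporter page" shape of the example and would not handle multi-word reporter abbreviations. The `cite` namespace URI is hypothetical, and unlike the hand-written fragment above, the sketch puts the properties in that namespace so the output is namespace-qualified XML.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CITE_NS = "http://www.example.org/legal-cite#"  # hypothetical vocabulary

# Matches e.g. "A v. B, <volume> <reporter> <page>"; the reporter is one word
# with an optional trailing period, and anything after the page is ignored.
CITATION = re.compile(
    r"^(?P<plaintiff>.+?) v\. (?P<defendant>.+?), "
    r"(?P<volume>\d+) (?P<jurisdiction>[A-Za-z.]+?)\.? (?P<page>\d+)"
)
FIELDS = ("plaintiff", "defendant", "volume", "jurisdiction", "page")


def parse_citation(text):
    """Split a simple case citation into its named parts."""
    match = CITATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized citation: {text!r}")
    return match.groupdict()


def to_rdf_xml(fields, about):
    """Serialize the parsed fields as an rdf:Description about `about`."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("cite", CITE_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name in FIELDS:
        ET.SubElement(desc, f"{{{CITE_NS}}}{name}").text = fields[name]
    return ET.tostring(root, encoding="unicode")
```

Passing the citation from the example to `parse_citation` should give the same five fields as the hand-written description. The `about` URI still has to come from somewhere, which is the "compute the address of a document" problem noted earlier.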