AI and SW
- SW is not AI and AI is not SW
- AI is a field; SW is a project
- SW owes a debt: it has used much from AI
- SW should be a great playground for AI
- AI projects should use SW to interoperate
What is AI anyway?
- Lots of bits of CS including functional languages,
machine learning, recognition, speech, etc., etc.?
- An attempt to make smart machines?
- An attempt to make machines which behave like people?
- An attempt to be collectively more powerful?
What is SW?
- Data Interoperability across applications and organizations (for IT)
- A set of interoperable standards for knowledge exchange
- An architecture for interconnected communities and vocabularies
The RDF data bus
The RDF bus connects data sources and applications
Sem Web architecture 101
- Give important concepts URIs.
- Each URI identifies one concept.
- Share these symbols between many languages
- Support URI lookup
Define symbols:
- Using natural language (bootstrap?)
- By reference to existing systems (eg GPS)
- By mathematical relation to others (raft)
Chris Welty/IBM: "In the Semantic Web, it is not the Semantic
which is new, it is the Web which is new".
Web architecture 101
- Things are denoted by URIs.
- Use them to denote things.
- Serve useful information at them.
- Dereference them.
Example
http://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card (in N3,
summarized):
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <#>.
:i a foaf:Person;
foaf:family_name "Berners-Lee";
foaf:givenname "Timothy";
foaf:homepage <http://www.w3.org/People/Berners-Lee>;
foaf:mbox <mailto:timbl@w3.org>.
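A minimal sketch of how the prefixed names in the N3 above expand to full URIs, assuming the document was fetched from the card URI shown on the slide. The `expand` helper and the `PREFIXES` table are illustrative names, not part of any N3 library; a real parser does considerably more.

```python
from urllib.parse import urljoin

# URI of the document the N3 was retrieved from (taken from the slide).
DOC_URI = "http://www.w3.org/People/Berners-Lee/card"

# Prefix table as declared in the N3; the empty prefix maps to the
# relative reference "#", i.e. "this document, plus a local name".
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "": "#",
}

def expand(qname: str, base: str = DOC_URI) -> str:
    """Expand a prefixed name such as 'foaf:Person' or ':i' to a full URI."""
    prefix, _, local = qname.partition(":")
    # Join namespace and local name first, then resolve against the
    # document URI so that relative namespaces like "#" work.
    return urljoin(base, PREFIXES[prefix] + local)

print(expand(":i"))
print(expand("foaf:Person"))
```

Under these assumptions `:i` expands to the first URI on this slide, which is why fetching the document gives information about that identifier.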
URI + HTTP architecture 1
The hash is an operator which
joins a local identifier to a document URI to give a global
identifier.
http://example.com/foo#bar
- Strip off #bar
- Look up http://example.com/foo using HTTP:
- Look up example.com giving 128.0.0.1
- Request foo from 128.0.0.1
- 200 OK is returned
- Parse the result according to the Internet content
type
- This gives you information about bar
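The steps above can be sketched as follows. This is an illustration only: `fetch` is a hypothetical callable standing in for the DNS lookup and HTTP GET, and `fake_fetch` returns made-up content so the example runs without a network.

```python
from urllib.parse import urldefrag, urlsplit

def plan_lookup(uri: str) -> dict:
    """Strip the #fragment and work out what to ask which host for."""
    doc_uri, local_id = urldefrag(uri)
    parts = urlsplit(doc_uri)
    return {
        "document": doc_uri,     # what is fetched over HTTP
        "local_id": local_id,    # what the fetched document tells us about
        "host": parts.hostname,  # name to resolve to an IP address
        "path": parts.path,      # what to request from that host
    }

def lookup(uri: str, fetch):
    """Fetch the document part of a hash URI; fetch(doc_uri) is assumed
    to return (status, content_type, body)."""
    plan = plan_lookup(uri)
    status, content_type, body = fetch(plan["document"])
    if status != 200:
        raise LookupError(f"{plan['document']} returned {status}")
    # A real client would now parse body according to content_type.
    return plan["local_id"], content_type, body

def fake_fetch(doc_uri: str):
    # Hypothetical response, invented for this example.
    return 200, "text/n3", "@prefix : <#> . :bar a :Example ."

print(plan_lookup("http://example.com/foo#bar"))
print(lookup("http://example.com/foo#bar", fake_fetch))
```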
URI + HTTP architecture 2
Post TAG resolution of HTTPRange-14, an optional possible
operation is:
given http://example.com/foo/bar
- There is no #fragment to strip off
- Look up http://example.com/foo/bar using HTTP as amended
by TAG:
- Look up example.com giving 128.0.0.1
- Request /foo/bar from 128.0.0.1
- You get a redirection 303 See Other response,
indicating that the URI did not denote an information
resource, but mentioning a new resource
http://example.com/foo-schema.rdf
- Request http://example.com/foo-schema.rdf
- Get a 200 OK response
- Parse the result according to the Internet
content type
- This gives you information about <http://example.com/foo/bar>
Not recommended by me.
Other issues: Content negotiation between HTML and RDF.
LSIDs
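A rough sketch of the 303 flow on this slide, assuming the client requests the full URI http://example.com/foo/bar and follows the Location header of the 303 response. `fake_fetch` is a hypothetical stand-in for HTTP GET with invented responses; the content type shown is an assumption.

```python
def fake_fetch(uri: str):
    """Hypothetical stand-in for an HTTP GET: returns (status, headers, body)."""
    responses = {
        "http://example.com/foo/bar": (
            303, {"Location": "http://example.com/foo-schema.rdf"}, ""),
        "http://example.com/foo-schema.rdf": (
            200, {"Content-Type": "application/rdf+xml"}, "<rdf:RDF/>"),
    }
    return responses[uri]

def describe(uri: str, fetch, max_hops: int = 5) -> dict:
    """Follow 303 See Other responses until a 200 OK document is reached."""
    current = uri
    redirected = False
    for _ in range(max_hops):
        status, headers, body = fetch(current)
        if status == 303:
            # The original URI is not served directly; another
            # resource is offered that describes it.
            current = headers["Location"]
            redirected = True
            continue
        if status == 200:
            return {
                "about": uri,            # what the information concerns
                "source": current,       # the document actually retrieved
                "redirected_via_303": redirected,
                "content_type": headers["Content-Type"],
                "body": body,
            }
        raise LookupError(f"{current} returned {status}")
    raise LookupError("too many redirects")

print(describe("http://example.com/foo/bar", fake_fetch))
```

Note the extra round trip compared with the hash-URI case: two requests are needed before any data arrives.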
Breadcrumbs ethos
- Leave information for others to follow
- A breadcrumbs protocol = what to leave where + what links
to follow when
- Delegated query is a special case of breadcrumbs
- Challenge: For delegated query, how to describe the
sort of query a service or document can answer
Mythbusting
Myth: "The Semantic Web technology is
Description Logic"
No, OWL is one semantic web language.
It is important that applications which need different
expressiveness can use it.
But other languages must interoperate to the greatest extent
possible.
They should use URIs
They should not reinvent functionality already provided by
standards.
SW Arch: Same symbols, multiple languages
Mythbusting: Not just public data
- The SW is not just about public data.
- It is also about personal, group, agency and enterprise
data.
- Historically, intranet servers preceded extranet
servers.
For example, in BioPAX
When will the patterns all connect?
[Diagram: Joanne Luciano, Predictive Medicine; Drug discovery
demo using RDF, Siderean Seamark and Oracle 10g]
Other myths
- "The semantic Web is metadata for classifying documents"
- "The semantic web is about hand-annotated web pages"
Such pages are interesting, but not the mainstay of semantic
web: too much trouble!
- "The semantic web is mainly about content extracted from text"
No, it is primarily an interlingua for relational data and
logic. Bridges will always be important
- "The Semantic Web is about making one big ontology"
The semantic web is about a fractal mess of
interconnected ontologies....
- "The semantic web ontologies must all be consistent"
Only the parts I am using together
"Ontology": two patterns
Schema | Taxonomy |
E.g. Bank statement: account, date, amount | E.g. Human anatomy and diseases |
Simple ontology. Often documents existing practice | Complex ontology, difficult to make |
Changes rarely | Changes continually |
Domain knowledge is in the data not the ontology. | Domain knowledge is in the ontology itself |
Use now in enterprise and science IT | Specific fields e.g. life sciences |
Both patterns are important. Both patterns use OWL.
NLP vs Semantic Web
NLP | Semantic Web |
Words | Terms of logic |
Meaning is use | Meaning is defined in words or code or specific use. |
Word reused by everyone, no ownership | URI ownership - go get your own |
"Hydrogen" | http://example.com/foo.rdf#Hydrogen |
Defining words in terms of ontology complex, unsatisfying, never complete and a waste of time. | Defining terms using words is never perfect but useful. |
Natural language constantly changing | Ontologies basically static |
Can't benefit from injected logic | Can't benefit from cloudy stats from corpora data |
Machine can find stuff | Machine can make widespread inference |
Distractions: Meaning of meaning
Meaning as definition | Community Standards | Meaning as Use |
Everyone uses the same precise definition of each term. | Specific communities agree to share good-enough definitions | Language changes with time |
Works in small closed systems | If communities overlap, can be global | Works for poetry, makes rich natural language. |
Very hard work to set up | Finite work to set up | No effort up front, much afterward. |
When we build a system (the SW), this is a choice,
not an observation
- Communities will be of many sizes.
- There will be very many small ones (6×10^9 of size 10^0)
and a few global ones (e.g. W3C Rec'n)
- Kleinberg shows that fractal (1/f) distribution is optimal
under some assumptions
- Swoogle results for example (right)
- We have less experience when fractal is not constrained
to a 2D surface.
Total Cost of Ontologies (TCO)
Assume :-) ontologies evenly spread across orders of magnitude;
committee size as log(community), time as committee^2, cost
shared across community.
Scale | Eg | Committee size | Cost per ontology (weeks) | My share of cost |
0 | Me | 1 | 1 | 1 |
10 | My team | 4 | 16 | 1.6 |
100 | Group | 7 | 49 | 0.49 |
1000 | | 10 | 100 | 0.10 |
10k | Enterprise | 13 | 169 | 0.017 |
100k | Business area | 16 | 256 | 0.0026 |
1M | | 19 | 361 | 0.00036 |
10M | | 22 | 484 | 0.000048 |
100M | National, State | 25 | 625 | 0.000006 |
1G | EU, US | 28 | 784 | 0.000001 |
10G | Planet | 31 | 961 | 0.000000 |
Total cost of 10 ontologies: 3.2 weeks. Serious project: 30
ontologies, TCO = 10 weeks.
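The arithmetic behind the table can be reproduced with a short sketch. Two assumptions are not stated on the slide: the committee size is taken as 3*log10(community) + 1, which is inferred from the table's values (1, 4, 7, ...), and the "Me" row is treated as a community of size 1.

```python
def tco_rows(max_exponent: int = 10, members_per_decade: int = 3):
    """Rows of (community, committee, cost_weeks, my_share) for 10^0 .. 10^max_exponent."""
    rows = []
    for k in range(max_exponent + 1):
        community = 10 ** k                     # scale: 1, 10, 100, ...
        committee = members_per_decade * k + 1  # assumed: grows with log10(community)
        cost_weeks = committee ** 2             # time as committee^2
        my_share = cost_weeks / community       # cost shared across the community
        rows.append((community, committee, cost_weeks, my_share))
    return rows

rows = tco_rows()
for community, committee, cost_weeks, my_share in rows:
    print(f"{community:>12} {committee:>3} {cost_weeks:>4} {my_share:.6f}")

# One ontology at every scale: sum of the "my share" column.
total_share = sum(row[3] for row in rows)
print(f"total: {total_share:.1f} weeks")
```

Summing the last column gives roughly 3.2 weeks, matching the total quoted above; three ontologies per scale gives about 10 weeks.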
Lesson:
Do your bit. Others
will do theirs.
Thank those who do working groups!
User Interface challenges
Domain-specific user interfaces are
blossoming... but what about generic ones?
- Generality: can browse any data anywhere
- Dynamically pick up from ontology: Lenses, style,
forms
- Independent control of: style, provenance, domains
(vocabulary groups)
- Blow spreadsheet tools away
Goals for Rules
- Interoperability between Rule-based systems
- Extend the expressive power of shared knowledge
The next step in the Semantic Web roadmap
- New markets for rule systems
- Mapping between ontologies
- Interlocking with other Semantic Web languages RDF, OWL,
SPARQL
- Serendipitous re-use of knowledge in Rule form
Web attitude
- Anyone can say anything about anything
- No one knows everything about anything
-> scoped negation as failure
- My system is most valuable because of its interconnection
to its peers
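One way to picture scoped negation as failure: "not stated" is only ever evaluated against an explicitly named, finite set of documents, never against the Web as a whole. The sketch below is an illustration under that reading; the document URIs, the triples and the `scoped_not` helper are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical documents, each a finite set of (subject, predicate, object).
DOCUMENTS: Dict[str, Set[Triple]] = {
    "http://example.org/staff.rdf": {("alice", "worksFor", "acme")},
    "http://example.org/alumni.rdf": {("bob", "graduatedFrom", "mit")},
}

def scoped_not(triple: Triple, scope: Iterable[str],
               documents: Dict[str, Set[Triple]] = DOCUMENTS) -> bool:
    """True if the triple is absent from every document named in scope.

    This says nothing about documents outside the scope: someone
    elsewhere on the Web may still assert the triple.
    """
    return all(triple not in documents[name] for name in scope)

staff = ["http://example.org/staff.rdf"]
print(scoped_not(("bob", "worksFor", "acme"), staff))    # not stated in staff.rdf
print(scoped_not(("alice", "worksFor", "acme"), staff))  # stated in staff.rdf
```

The conclusion is always "not stated here", which stays valid when new documents appear elsewhere.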
Artificial Intelligence as powerful systems
- @@
- AI operating on SW data
Thank you
http://www.w3.org/2006/Talks/0718-tbl/
Tim Berners-Lee
CSAIL, MIT
END
You have gone too far.
Components: Adapting random files
Keep your existing systems running - adapt them
Components: Triple store
Virtual servers actually figure stuff out as well as look up
data
Adapting SQL Databases
Keep your existing systems running - adapt them
Adapting XML
Remember: RDF on an HTTP server can always be virtual
Adapting XML: GRDDL
Remember: RDF on an HTTP server can always be virtual
Components: Smart servers
Virtual servers actually figure stuff out as well as look up
data
Evolution: Urges and Capacities
A selection
The bits computers find hard
- Most of them a puppy can do
- There is a balance of motivations
- Wetware can imitate hardware, and vice-versa, but extremely inefficiently
- Evolution and Genetics and Computation all connect
Emergent Systems?
FAQ: Will the WWW produce emergent phenomena?
- Its own agenda
- A code for passing on design
- A selective environment
Example: The corporation
- A very individual agenda
- Code: Corporate law and corporate culture
- A very selective environment
- Do Asimov's 3 laws apply to tobacco & soda companies?
We are not necessarily in control of things we create.