Tutorial/RDF Vocabularies

From W3C eGovernment Wiki

This is part 2 of the page where John and Sandro are developing the slides for their gov2expo talk.

There is a simple script which turns this page into the slidy (real) version.


Title: Mission Possible: Deploying Government Linked Data (Pt2)
Authors: Sandro Hawke (sandro@w3.org), W3C/MIT, @sandhawke
         John L. Sheridan, @johnlsheridan
Event: gov 2.0 expo, May 25-26, 2010, Washington DC

Part 2

Viewing Your Data as Triples

  1. Good URIs
  2. Properties (relationships and attributes)
  3. Overlap and Competition
  4. Classes (Description Logic)
  5. Finding and/or Creating Vocabularies

Vocabulary as Interface

How do programs communicate via triples?

Alice and Bob publish triples, Charlie's software tries to use their data.

It's all about the specific URIs, the vocabulary.

Essentially, when using RDF, the vocabulary is the syntax, the API.
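
A minimal sketch of this idea in Python, assuming triples are plain (subject, property, value) tuples. All URIs here are hypothetical example.org names made up for illustration. The point is only that Charlie's code can use Alice's and Bob's data because they chose the same property URI.

```python
# Hypothetical vocabulary term; any URI the three parties agree on would do.
LOCATION = "http://example.org/vocab/crime#location"

# Alice and Bob publish independently, but use the same property URI.
alice_triples = [
    ("http://alice.example.org/incident/1", LOCATION, "Main St & 1st Ave"),
]
bob_triples = [
    ("http://bob.example.org/report/77", LOCATION, "Elm St"),
    # A private term Charlie does not know; his code simply ignores it.
    ("http://bob.example.org/report/77", "http://bob.example.org/internal#code", "X9"),
]


def values_of(triples, prop):
    """Return (subject, value) pairs for every triple that uses `prop`."""
    return [(s, v) for (s, p, v) in triples if p == prop]


# Charlie merges both sources by concatenation, then selects by property URI.
merged = alice_triples + bob_triples
locations = values_of(merged, LOCATION)
```

Had Alice and Bob picked different URIs for "location", Charlie would need a mapping between them before this query returned both results.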

Use Case: Crime Reports

Data Providers:

  • Springfield Police Department "Police Blotter"
  • Springfield FBI Field Office
  • Springfield Citizen's Watch

Data Consumers:

  • Journalists
  • Mobile Apps (AreYouSafe)
  • Real Estate Listings
  • Driving Directions

Identify Your Subjects

What are the things your data is about?

(items, entities, objects, individuals, resources)

In our scenario: incident, location, suspect/convict, trial, stolen/damaged property, victim

Look for them in your UML models, database records, and web site.

Assign Good URIs

Give them good, long-term URI names.

Pick URIs that:

  • no one will want for something else
  • may become unfashionable, but won't become wrong
  • someone will web-serve forever

Such as:

See Designing URI Sets for the UK Public Sector

Some Bad (Linked Data) URIs

Misleading Names

Can't Dereference

Dereference Doesn't Work

Conflating Item and Page About Item

Unreadable names (usually):

Namespaces

Shared Leading Prefix:

Namespace name: "http://dbpedia.org/resource/"

Confusingly similar to XML namespaces; sometimes the same, sometimes different.

Namespace document:

Often the same as what you get by dereferencing things in the namespace.
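
As a rough illustration of the "shared leading prefix" idea, the sketch below expands a prefixed name into a full URI by plain string concatenation, and shortens it again. The prefix table is an assumption for the example: the dbpedia entry is the namespace name quoted above, and the foaf entry is the well-known FOAF namespace.

```python
# Prefix table: short prefix -> namespace name (the shared leading part of the URIs).
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI by string concatenation."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return prefixes[prefix] + local


def shorten(uri, prefixes=PREFIXES):
    """Inverse of expand(); returns the URI unchanged if no namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri
```

For example, `expand("dbpedia:Boston")` gives `http://dbpedia.org/resource/Boston`. Nothing more than concatenation is involved, which is one way RDF namespaces differ from XML namespaces.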

URI Lifecycle

Stage 1: Unstable

  • Coordinate with all users about every change in meaning
    • Easy when you're the only user
    • Gets harder, slower as user community grows
    • Some users will avoid unstable terms

Stage 2: Stable

  • No incompatible changes in meaning.
    • okay to improve/rewrite documentation
    • don't break people's running code
    • maybe change meaning to resolve harmful ambiguity
      • (break minority code)

URI Lifecycle (cont)

Stage 3: Deprecated

  • Still stable, still served, but not recommended for use
  • Should no longer be produced; still consumed for a while.

Stage 4: Dead

  • URI can no longer be de-referenced. Best to avoid this.

URIs in a namespace can be at different stages.

Don't version namespaces. Don't be fooled by:

(They were chosen before we knew this, and now they're stuck.)

Properties

The middle part of a triple; essential to understanding it

Also known as:

  • relation, relationship,
  • predicate
  • column, column-name
  • attribute
  • field, member, slot (sort of)

[Figure: triple.png]

Property As Question

Each triple states the answer to a question.

  • The subject: the item the question is about
    • you, me, Massachusetts, the moon, crude oil, ... whatever
  • The property: the question
    • who created it? when was it created? where is it located?
  • The value: the answer to the question

Object and Data Properties

A Data Property:

  • value is a literal (string, number, date, etc)
    • date_created, height, weight, name, name_of_owner

An Object Property:

  • value is another first-class entity (not just a literal)
    • owner, near, friend, hometown, capital
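
One way to picture the distinction in code is to give literals and entities different types, as in the sketch below. The class names `URIRef` and `Literal` and the example.org URIs are assumptions chosen for this illustration, not part of any particular library's API as used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    """A first-class entity, named by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain data value such as a string, number, or date."""
    value: object


def property_kind(value):
    """Classify how a property is being used, based on the kind of value it has."""
    if isinstance(value, URIRef):
        return "object property"
    if isinstance(value, Literal):
        return "data property"
    raise TypeError(f"not an RDF value: {value!r}")


# Hypothetical URIs for two entities.
boston = URIRef("http://example.org/id/Boston")
massachusetts = URIRef("http://example.org/id/Massachusetts")

triples = [
    (massachusetts, "capital", boston),                  # value is another entity
    (massachusetts, "name", Literal("Massachusetts")),   # value is a literal
]
kinds = {prop: property_kind(value) for (_, prop, value) in triples}
```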

Data Types

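RDF uses some of the datatypes defined in XML Schema Part 2: Datatypes, usually these ones.

Each datatype is a mapping:

  • from character strings, e.g. "3" or "3.0" or "0003"
  • to their values, e.g. the number three

Some of the datatypes (nested entries are derived from their parent):

  • xs:decimal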
    • xs:integer
      • xs:int
        • xs:byte
  • xs:time
  • xs:dayTimeDuration

... etc
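
The "datatype as mapping" idea can be sketched with Python's own parsers standing in for the XML Schema ones. This is an approximation: `int()` and `Decimal()` accept a few forms that xs:integer and xs:decimal do not (for example exponents or "NaN" for `Decimal`), so treat it as an illustration rather than a validator.

```python
from decimal import Decimal


def parse_integer(lexical):
    """Map an integer lexical form such as "3" or "0003" to its value."""
    return int(lexical)


def parse_decimal(lexical):
    """Map a decimal lexical form such as "3" or "3.0" to its value."""
    return Decimal(lexical)


# Different character strings, same value: the number three.
same_value = parse_decimal("3") == parse_decimal("3.0") == parse_decimal("0003")


def fits_byte(lexical):
    """xs:byte restricts integers to the range -128..127."""
    return -128 <= parse_integer(lexical) <= 127
```

The nesting of xs:byte under xs:integer shows up here as a narrower range check on the same kind of value.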

Some Well-Known Properties


rdfs:label

rdfs:comment

owl:sameAs

foaf:name

dc:creator

Overlapping, Competing Vocabularies

two terms for the same thing

owl:sameAs

owl:equivalentProperty

  • dc:creator owl:equivalentProperty dcterms:creator
  • foaf:name owl:equivalentProperty vcard:FN
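
A consumer can act on such equivalence statements by rewriting incoming triples to one preferred term. The sketch below assumes the two pairs listed above, treats prefixed names as plain strings, and arbitrarily picks the first term of each pair as the canonical one. The `eg:` subjects are hypothetical.

```python
# Pairs stated to be equivalent above, written as prefixed names.
EQUIVALENT = [
    ("dc:creator", "dcterms:creator"),
    ("foaf:name", "vcard:FN"),
]

# Map each alias to its preferred term (first of the pair; an arbitrary choice).
CANONICAL = {alias: preferred for (preferred, alias) in EQUIVALENT}


def normalize(triples):
    """Rewrite every triple to use the canonical property of its equivalence pair."""
    return [(s, CANONICAL.get(p, p), v) for (s, p, v) in triples]


data = [
    ("eg:report1", "dcterms:creator", "eg:officerSmith"),
    ("eg:officerSmith", "vcard:FN", "Pat Smith"),
    ("eg:officerSmith", "foaf:name", "Pat Smith"),
]
normalized = normalize(data)
```

After normalizing, a query for `foaf:name` finds names that were originally published as `vcard:FN` as well.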

two terms for possibly-identical things. splitting hairs:

  • eg: Peter Pan (various fictional works, productions, editions, copies, characters)

Compare: City of Boston (political entity) vs. City of Boston (geographic region)

The Pedantic Web

Subproperty

(dc refines)

rdfs:subPropertyOf

  • sandro contact john
    • sandro friend john
      • sandro recentFriend john
    • sandro presentedWith john
  • friend rdfs:subPropertyOf contact
  • recentFriend rdfs:subPropertyOf friend
  • presentedWith rdfs:subPropertyOf contact
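
The hierarchy above can be turned into a small inference step: whenever a triple uses a subproperty, the same triple with each of its superproperties also holds. The sketch below assumes each property has at most one direct superproperty, which is true for this example but not required by RDFS.

```python
# Direct rdfs:subPropertyOf links from the example above.
SUBPROPERTY_OF = {
    "friend": "contact",
    "recentFriend": "friend",
    "presentedWith": "contact",
}


def superproperties(prop):
    """Yield every property implied by `prop`, walking up the hierarchy."""
    seen = set()
    while prop in SUBPROPERTY_OF and prop not in seen:  # `seen` guards against cycles
        seen.add(prop)
        prop = SUBPROPERTY_OF[prop]
        yield prop


def infer(triples):
    """Return the input triples plus those implied by rdfs:subPropertyOf."""
    result = set(triples)
    for (s, p, o) in triples:
        for parent in superproperties(p):
            result.add((s, parent, o))
    return result


facts = infer({("sandro", "recentFriend", "john")})
```

Publishing only the most specific triple (`recentFriend`) is enough; consumers that know the hierarchy can recover `friend` and `contact`.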

geographically near, overlapping, contained-within

Messy Overlap

Not all related properties are equivalent properties or subproperties

foaf:firstName, foaf:givenName, foaf:lastName, foaf:familyName, foaf:name


Conversion Rules

        if { ?x foaf:firstName ?first;
                foaf:lastName ?last }
        then
           { ?x foaf:familyName ?last;
                foaf:givenName ?first;
                foaf:name func:concat(?first, " ", ?last)
           }

        if { ?x foaf:name ?name } and
           pred:contains(?name, " ")
        then
           # splits at the first space, so it is incorrect when a name part
           # itself contains a space, like Hillary Rodham Clinton
           { ?x foaf:firstName func:substring-before(?name, " ");
                foaf:lastName func:substring-after(?name, " ")
           }

See Rule Interchange Format
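
For readers without a rules engine, the two rules above can be approximated in custom code. This is a minimal sketch, assuming triples are (subject, property, value) tuples with prefixed names as strings and at most one first and last name per subject. It makes a single pass and does not re-apply rules to the triples it derives.

```python
def apply_name_rules(triples):
    """Apply both conversion rules once and return the enlarged set of triples."""
    result = set(triples)
    first = {s: v for (s, p, v) in triples if p == "foaf:firstName"}
    last = {s: v for (s, p, v) in triples if p == "foaf:lastName"}

    # Rule 1: firstName + lastName => givenName, familyName, and a joined name.
    for x in first.keys() & last.keys():
        result.add((x, "foaf:givenName", first[x]))
        result.add((x, "foaf:familyName", last[x]))
        result.add((x, "foaf:name", f"{first[x]} {last[x]}"))

    # Rule 2: a name containing a space => split at the first space.
    # Same caveat as the rule text: wrong when a name part contains a space.
    for (x, p, name) in triples:
        if p == "foaf:name" and " " in name:
            before, _, after = name.partition(" ")
            result.add((x, "foaf:firstName", before))
            result.add((x, "foaf:lastName", after))
    return result
```

A producer could run this before publishing, or a consumer could run it after loading, which matches the two placements discussed under Applying Rules.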

Applying Rules

  • In custom code, or using rules engine
  • In producer (publish using many vocabs)
  • In consumer (accept many vocabs)

Advice?

The world is full of competing standards. That's good, but painful.

Probably best to follow Postel's Law:

 Be conservative in what you do; be liberal in what you accept from others.

Research topic: automatic downloading of conversion rules

Classes and Subclasses

Sets of objects with something in common.

Instances / rdf:type

Subclass hierarchy

plant / large_plant / tree / mature_horse_chestnut, the one in my back yard

Domain and Range

domain: the class of things which might have this property.

range: the class of possible values for this property

Domain         Property         Range
foaf:Person    foaf:firstName   (string)
dcat:Catalog   dcat:record      dcat:CatalogRecord
eg:Parent      eg:daughter      eg:Female
rdf:Property   rdfs:domain      rdfs:Class
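
In RDFS, domain and range declarations are used for inference rather than validation: a triple that uses the property implies class membership for its subject and value. The sketch below shows that reading for two rows of the table. The individuals `eg:alice` and `eg:carol` are hypothetical.

```python
# property -> (domain, range), taken from the table above.
SCHEMA = {
    "dcat:record": ("dcat:Catalog", "dcat:CatalogRecord"),
    "eg:daughter": ("eg:Parent", "eg:Female"),
}


def infer_types(triples):
    """Return the rdf:type triples implied by domain and range declarations."""
    inferred = set()
    for (s, p, o) in triples:
        if p in SCHEMA:
            domain, range_ = SCHEMA[p]
            inferred.add((s, "rdf:type", domain))  # the subject is in the domain
            inferred.add((o, "rdf:type", range_))  # the value is in the range
    return inferred


types = infer_types({("eg:alice", "eg:daughter", "eg:carol")})
```

So stating that Alice has a daughter Carol is enough for a reasoner to conclude that Alice is an `eg:Parent` and Carol is an `eg:Female`.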

OWL

Powerful way of declaring how properties, classes, and individuals relate to each other.

"Ontologies"

http://www.w3.org/TR/owl2-primer/

  • class expressions ( US_Citizen and Irish_Citizen and not Minor )
  • ChildOfSandro (anything with 'parent' being Sandro)
  • BigTree (trees with height over 30 feet)
  • inverse properties
    • Sandro child Gregorian
    • Gregorian parent Sandro
    • => parent owl:inverseOf child
  • negative assertions (some triple is false)
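
The class expressions above can be pictured as operations on sets of individuals. The sketch below is a closed-world approximation with made-up individuals: "not Minor" is computed as "not listed as a minor", whereas OWL reasons under an open-world assumption and would only exclude individuals known not to be minors.

```python
# Hypothetical individuals and class memberships, for illustration only.
us_citizens = {"ann", "bob", "cat"}
irish_citizens = {"bob", "cat", "dan"}
minors = {"cat"}

# US_Citizen and Irish_Citizen and not Minor
adult_dual_citizens = (us_citizens & irish_citizens) - minors

# ChildOfSandro: anything whose 'parent' is Sandro.
triples = {("Gregorian", "parent", "Sandro"), ("bob", "parent", "ann")}
children_of_sandro = {s for (s, p, o) in triples if p == "parent" and o == "Sandro"}

# BigTree: trees with height over 30 feet.
tree_heights_ft = {"oak1": 45, "sapling2": 6}
big_trees = {tree for tree, height in tree_heights_ft.items() if height > 30}
```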

OWL can be conveyed in triples, but also has some easier-to-read syntaxes. I suggest Manchester Syntax when you don't need triples.

Inference

machines ("reasoners") can process these ontologies

given:

  • Gregorian parent Sandro
  • parent owl:inverseOf child

they will infer

  • Sandro child Gregorian

Which is great if you're querying for child and you have some parent data.
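
A toy version of this inference, assuming triples are plain tuples and the only axiom is the `parent`/`child` inverse from the example. A real reasoner handles far more than this single rule.

```python
# owl:inverseOf is symmetric, so record both directions.
INVERSE_OF = {"parent": "child", "child": "parent"}


def with_inverses(triples):
    """Return the input triples plus those implied by owl:inverseOf."""
    result = set(triples)
    for (s, p, o) in triples:
        if p in INVERSE_OF:
            result.add((o, INVERSE_OF[p], s))
    return result


def query(triples, subject, prop):
    """Return all values of `prop` for `subject`."""
    return {o for (s, p, o) in triples if s == subject and p == prop}


data = {("Gregorian", "parent", "Sandro")}
children_without_inference = query(data, "Sandro", "child")
children = query(with_inverses(data), "Sandro", "child")
```

Without the inference step the query for `child` comes back empty, even though the answer is implied by the `parent` data.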

Also helps find errors in data and modeling.

Technology from AI research

http://www.w3.org/2001/sw/wiki/Category:Reasoner

SKOS

A less formal way to document your URIs.

Everything is a Concept. General broader/narrower.

Good when you want to quickly leverage existing controlled vocabulary.

http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/

Finding Vocabularies

watson

swse

sindice

Falcons

sameas.org

http://www.w3.org/2001/sw/wiki/Category:Search_Engine

Browsing Vocabularies

use the HTML documentation

use an ontology viewer http://www.w3.org/2001/sw/wiki/Category:Visualizer

Creating Vocabularies

text editor

protege

topbraid composer

neologism

http://www.w3.org/2001/sw/wiki/Category:Editor

Good Modeling

  1. Which items are you communicating about?
  2. What are the logical groups (classes) of those items?
  3. What properties can each kind of item have?
  • use your existing models (UML, SQL, Spreadsheets)
  • expect to change your design over time