
Mission Possible: Deploying Government Linked Data (Pt2)

Sandro Hawke, (sandro@w3.org), W3C/MIT, @sandhawke
John L. Sheridan, @johnlsheridan
Gov 2.0 Expo, May 25-26, 2010, Washington DC
http://www.w3.org/2010/Talks/0525-rdf-vocabularies (wiki)

Part 2

Viewing Your Data as Triples

  1. Good URIs
  2. Properties (relationships and attributes)
  3. Overlap and Competition
  4. Classes (Description Logic)
  5. Finding and/or Creating Vocabularies

Vocabulary as Interface

How do programs communicate via triples?

Alice and Bob publish triples, Charlie's software tries to use their data.

It's all about the specific URIs, the vocabulary.

Essentially, when using RDF, the vocabulary is the syntax, the API.
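To make "vocabulary as API" concrete, here is a minimal Python sketch. It assumes triples are plain (subject, property, object) string tuples, and the example.org data for Alice and Bob is hypothetical, including Bob's made-up property. Charlie's code only finds data published with the exact property URI it looks for.

```python
# Triples are plain (subject, property, object) tuples; all data here is hypothetical.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

alice_triples = [
    ("http://alice.example.org/#me", FOAF_NAME, "Alice"),
]
bob_triples = [
    # Bob invented his own (hypothetical) property for the same idea.
    ("http://bob.example.org/#me", "http://bob.example.org/vocab#fullName", "Bob"),
]

def names(triples):
    """Charlie's code: return every value published with foaf:name."""
    return [o for (s, p, o) in triples if p == FOAF_NAME]

print(names(alice_triples))  # ['Alice']
print(names(bob_triples))    # []: Bob used a different URI, so Charlie sees nothing
```

The property URI plays the role a function name plays in an ordinary API: agreeing on it is the whole contract.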

Use Case: Crime Reports

Data Providers:

Data Consumers:

Identify Your Subjects

What are the things your data is about?

(items, entities, objects, individuals, resources)

In this scenario: incident, location, suspect/convict, trial, stolen/damaged property, victim

Look at your UML diagrams, database records, and web site.

Assign Good URIs

Give them good, long-term URI names.

Pick URIs that:

Such as:

See Designing URI Sets for the UK Public Sector
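One way to avoid conflating an item with the page about it is to mint two URIs per item. The sketch below is loosely modeled on the /id/ and /doc/ pattern described in Designing URI Sets for the UK Public Sector, as we read it; the domain crime.example.gov and the reference "2010-0525-17" are hypothetical.

```python
from urllib.parse import quote

BASE = "http://crime.example.gov"  # hypothetical domain

def identifier_uri(concept: str, reference: str) -> str:
    """URI naming the thing itself (e.g. the incident)."""
    return f"{BASE}/id/{quote(concept)}/{quote(reference)}"

def document_uri(concept: str, reference: str) -> str:
    """URI naming the web page *about* the thing."""
    return f"{BASE}/doc/{quote(concept)}/{quote(reference)}"

print(identifier_uri("incident", "2010-0525-17"))
# http://crime.example.gov/id/incident/2010-0525-17
print(document_uri("incident", "2010-0525-17"))
# http://crime.example.gov/doc/incident/2010-0525-17
```

With this split, a server would typically answer a request for the /id/ URI with a 303 See Other redirect to the /doc/ URI, so the two never get confused.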

Some Bad (Linked Data) URIs

Misleading Names

Can't Dereference

Dereference Doesn't Work

Conflating Item and Page About Item

Unreadable names (usually):

Namespaces

Shared Leading Prefix:

Namespace name: "http://dbpedia.org/resource/"

Confusingly similar to XML namespaces; sometimes the same, sometimes different.

Namespace document:

Often the same document you get by dereferencing the things in the namespace.
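In RDF a namespace name is only a shared leading string, so expanding a prefixed name such as dbpedia:Boston is plain concatenation. XML namespaces work differently: there, a name is formally a (namespace, local name) pair. A minimal sketch; the prefix labels are local conventions chosen for illustration, while the namespace names are the real ones.

```python
# Prefix labels are assumed local conventions; the namespace names are real.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' to a full URI by string concatenation."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local

def shorten(uri: str) -> str:
    """Inverse direction: use the longest namespace name that prefixes the URI."""
    matches = [(ns, p) for p, ns in PREFIXES.items() if uri.startswith(ns)]
    if not matches:
        return uri  # no known namespace; leave the full URI
    ns, p = max(matches, key=lambda m: len(m[0]))
    return f"{p}:{uri[len(ns):]}"

print(expand("dbpedia:Boston"))                               # http://dbpedia.org/resource/Boston
print(shorten("http://www.w3.org/2000/01/rdf-schema#label"))  # rdfs:label
```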

URI Lifecycle

Stage 1: Unstable

Stage 2: Stable

URI Lifecycle (cont)

Stage 3: Deprecated

Stage 4: Dead

URIs in a namespace can be at different stages.

Don't version namespaces. Don't be fooled by:

(They were chosen before we knew this, and now they're stuck.)

Properties

The middle part of a triple; essential to understanding it.

Also known as:

(Figure: diagram of a triple)

Property As Question

Each triple states the answer to a question.

Object and Data Properties

A Data Property:

An Object Property:
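The difference shows up in the object position. A data property's value is a literal, while an object property's value is another resource. OWL declares the two kinds with owl:DatatypeProperty and owl:ObjectProperty. In this sketch, the Literal class and the ex: crime-report terms are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A literal value: a lexical form plus a datatype (hypothetical helper class)."""
    lexical: str
    datatype: str = "xsd:string"

# Hypothetical crime-report triples; the ex: terms are made up for illustration.
triples = [
    ("ex:incident42", "ex:reportedOn", Literal("2010-05-25", "xsd:date")),  # data property
    ("ex:incident42", "ex:location", "ex:place7"),                          # object property
]

def kind(value) -> str:
    """A Literal value comes from a data property; a URI from an object property."""
    return "data" if isinstance(value, Literal) else "object"

for s, p, o in triples:
    print(p, kind(o))
```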

Data Types

RDF uses some of the datatypes from XML Schema Part 2: Datatypes, usually these ones.

Each datatype is a mapping:

... etc
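Each datatype maps strings in its lexical space to values in its value space, and the mapping can be many-to-one. The sketch below covers only xsd:integer and xsd:boolean, with simplified checks. It is not a full XML Schema validator.

```python
import re

def xsd_integer(lexical: str) -> int:
    """xsd:integer: an optional sign followed by decimal digits."""
    text = lexical.strip()  # leading/trailing whitespace is collapsed for this type
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError(f"not a valid xsd:integer: {lexical!r}")
    return int(text)

# xsd:boolean has exactly four lexical forms.
XSD_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}

def xsd_boolean(lexical: str) -> bool:
    return XSD_BOOLEAN[lexical.strip()]  # KeyError for anything else

# Different lexical forms, same value:
print(xsd_integer("42"), xsd_integer("042"), xsd_integer("+42"))
print(xsd_boolean("true"), xsd_boolean("1"))
```

This is why comparing literals by their strings is unreliable: "042"^^xsd:integer and "42"^^xsd:integer denote the same number.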

Some Well-Known Properties


rdfs:label

rdfs:comment

owl:sameAs

foaf:name

dc:creator

Overlapping, Competing Vocabularies

two terms for the same thing

owl:sameAs

owl:equivalentProperty
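When merging data, owl:sameAs (for individuals) and owl:equivalentProperty (for properties) are often applied by "smushing": pick one representative URI per group of equivalent names and rewrite the triples to use it. Both properties are symmetric and transitive, which union-find captures. A minimal sketch; the city.example.gov URI is hypothetical, and using the alphabetically smallest URI as representative is an arbitrary choice.

```python
def make_find(pairs):
    """Union-find over URIs declared equivalent (owl:sameAs or owl:equivalentProperty)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Arbitrary policy: the alphabetically smallest URI represents the group.
            parent[max(ra, rb)] = min(ra, rb)
    return find

def smush(triples, pairs):
    """Rewrite every position of every triple to its group's representative."""
    find = make_find(pairs)
    return {(find(s), find(p), find(o)) for s, p, o in triples}

BOS_DBPEDIA = "http://dbpedia.org/resource/Boston"
BOS_LOCAL = "http://city.example.gov/id/city/boston"  # hypothetical local URI

triples = {
    (BOS_DBPEDIA, "rdfs:label", "Boston"),
    (BOS_LOCAL, "rdfs:comment", "example record"),
}
merged = smush(triples, [(BOS_DBPEDIA, BOS_LOCAL)])
print(sorted(merged))
```

Smushing is only safe when the two URIs really name the same thing. It would wrongly merge Boston the political entity with Boston the geographic region.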

two terms for possibly identical things; splitting hairs:

Compare: City of Boston (political entity) vs. City of Boston (geographic region)

The Pedantic Web

Subproperty

(dc refines)

rdfs:subPropertyOf

geographically near, overlapping, contained-within
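The RDFS rule for rdfs:subPropertyOf is: if x P y holds and P is a subproperty of Q, then x Q y holds too, following chains of subproperties. A minimal sketch; the eg: geographic properties and places are hypothetical, and so is the modeling choice that "contained within" implies "overlaps" implies "near".

```python
# Hypothetical eg: properties; each maps to its direct rdfs:subPropertyOf targets.
SUBPROPERTY_OF = {
    "eg:containedWithin": ["eg:overlaps"],
    "eg:overlaps": ["eg:near"],
}

def super_properties(p):
    """All properties that p is a direct or indirect subproperty of."""
    seen, stack = set(), list(SUBPROPERTY_OF.get(p, []))
    while stack:
        q = stack.pop()
        if q not in seen:
            seen.add(q)
            stack.extend(SUBPROPERTY_OF.get(q, []))
    return seen

def infer(triples):
    """Add (x Q y) for every (x P y) where P is a subproperty of Q."""
    out = set(triples)
    for s, p, o in triples:
        for q in super_properties(p):
            out.add((s, q, o))
    return out

facts = {("eg:BackBay", "eg:containedWithin", "eg:Boston")}
for t in sorted(infer(facts)):
    print(t)
```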

Messy Overlap

Not all related properties are equivalent properties or subproperties.

foaf:firstName, foaf:givenName, foaf:lastName, foaf:familyName, foaf:name


Conversion Rules

        if { ?x foaf:firstName ?first;
                foaf:lastName ?last }
        then
           { ?x foaf:familyName ?last;
                foaf:givenName ?first;
                foaf:name func:concat(?first " " ?last)
           }

        if { ?x foaf:name ?name } and
           pred:contains(?name " ")
        then
           # splits at the first space, so it is wrong for middle names:
           # "Hillary Rodham Clinton" gives lastName "Rodham Clinton"
           { ?x foaf:firstName func:substring-before(?name " ");
                foaf:lastName func:substring-after(?name " ")
           }

See Rule Interchange Format
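For readers without a rule engine, here are the two conversion rules translated by hand into plain Python. This is a sketch that works on one person's values at a time, not a RIF implementation, and the function names are hypothetical.

```python
def names_from_parts(first: str, last: str) -> dict:
    """Rule 1: firstName + lastName -> givenName, familyName, and full name."""
    return {
        "foaf:givenName": first,
        "foaf:familyName": last,
        "foaf:name": f"{first} {last}",
    }

def parts_from_name(name: str):
    """Rule 2: split foaf:name at the first space (a lossy heuristic)."""
    if " " not in name:
        return None  # the rule's condition fails, so nothing is inferred
    first, _, last = name.partition(" ")
    return {"foaf:firstName": first, "foaf:lastName": last}

print(names_from_parts("Sandro", "Hawke"))
print(parts_from_name("Hillary Rodham Clinton"))
# {'foaf:firstName': 'Hillary', 'foaf:lastName': 'Rodham Clinton'}
```

The second rule shows why conversions between overlapping vocabularies are often only approximate.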

Applying Rules

Advice?

The world is full of competing standards. That's good, but painful.

Probably best to follow Postel's Law:

 Be conservative in what you do; be liberal in what you accept from others.

Research topic: automatic downloading of conversion rules

Classes and Subclasses

Sets of objects with something in common.

Instances / rdf:type

Subclass hierarchy

plant / large_plant / tree / mature_horse_chestnut; the one in my back yard is an instance of mature_horse_chestnut
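Subclassing matters because rdf:type propagates upward: an instance of a class is also an instance of every superclass. A minimal sketch using the class names from the hierarchy above. It assumes each class has a single direct superclass and that there are no cycles; RDFS itself allows both multiple superclasses and cycles.

```python
# Direct rdfs:subClassOf links (single superclass per class, an assumption of this sketch).
SUBCLASS_OF = {
    "mature_horse_chestnut": "tree",
    "tree": "large_plant",
    "large_plant": "plant",
}

def all_types(direct_type: str) -> list:
    """Follow rdfs:subClassOf upward, collecting every class the instance belongs to."""
    types = [direct_type]
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types

# The tree in the back yard, typed only as a mature_horse_chestnut:
print(all_types("mature_horse_chestnut"))
# ['mature_horse_chestnut', 'tree', 'large_plant', 'plant']
```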

Domain and Range

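domain: the class of things that have this property (the subject of any triple using it is inferred to be in that class)

range: the class of this property's values (the object of any triple using it is inferred to be in that class)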

Domain          Property         Range
foaf:Person     foaf:firstName   (string)
dcat:Catalog    dcat:record      dcat:CatalogRecord
eg:Parent       eg:daughter      eg:Female
rdf:Property    rdfs:domain      rdfs:Class
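In RDFS, domain and range declarations license inferences rather than reject data. Seeing x eg:daughter y lets a system conclude that x is an eg:Parent and y is an eg:Female. A minimal sketch using rows of the table above; the individuals eg:alice and eg:carol are hypothetical.

```python
# Declarations from the table; foaf:firstName's range is a literal type, so it is skipped.
DOMAIN = {"foaf:firstName": "foaf:Person", "dcat:record": "dcat:Catalog", "eg:daughter": "eg:Parent"}
RANGE = {"dcat:record": "dcat:CatalogRecord", "eg:daughter": "eg:Female"}

def infer_types(triples):
    """(x P y) with P rdfs:domain C gives x rdf:type C; with P rdfs:range C gives y rdf:type C."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))
    return inferred

print(sorted(infer_types({("eg:alice", "eg:daughter", "eg:carol")})))
# [('eg:alice', 'rdf:type', 'eg:Parent'), ('eg:carol', 'rdf:type', 'eg:Female')]
```

A common surprise: putting the "wrong" kind of thing in the subject position does not raise an error. It just produces a (possibly wrong) inferred type.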

OWL

Powerful way of declaring how properties, classes, and individuals relate to each other.

"Ontologies"

http://www.w3.org/TR/owl2-primer/

OWL can be conveyed in triples, but it also has some easier-to-read syntaxes. I suggest the Manchester Syntax when you don't need triples.

Inference

machines ("reasoners") can process these ontologies

given:

they will infer

That is useful when you query for children but your data only records parents.

Also helps find errors in data and modeling.

Technology from AI research

http://www.w3.org/2001/sw/wiki/Category:Reasoner
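A minimal sketch of the parent/child case, applying a single OWL rule by hand instead of running a real reasoner. It assumes hypothetical properties eg:hasChild and eg:hasParent declared inverses with owl:inverseOf, and hypothetical individuals eg:alice and eg:carol.

```python
# Hypothetical axiom: eg:hasChild owl:inverseOf eg:hasParent.
INVERSE_OF = {"eg:hasChild": "eg:hasParent"}
# owl:inverseOf works in both directions, so register the reverse mapping too.
INVERSE_OF.update({v: k for k, v in list(INVERSE_OF.items())})

def infer_inverses(triples):
    """(x P y) and P owl:inverseOf Q  =>  (y Q x)."""
    out = set(triples)
    for s, p, o in triples:
        if p in INVERSE_OF:
            out.add((o, INVERSE_OF[p], s))
    return out

data = {("eg:carol", "eg:hasParent", "eg:alice")}  # only parent data is recorded
children_of_alice = [o for s, p, o in infer_inverses(data)
                     if s == "eg:alice" and p == "eg:hasChild"]
print(children_of_alice)  # ['eg:carol']
```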

SKOS

A less formal way to document your URIs.

Everything is a Concept. General broader/narrower.

Good when you want to quickly reuse an existing controlled vocabulary.

http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
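In SKOS, skos:broader is not itself transitive. Its transitive super-property is skos:broaderTransitive, so walking chains of skos:broader links gives what broaderTransitive entails. skos:narrower is the inverse of skos:broader. A minimal sketch; the ex: crime-type concepts are hypothetical.

```python
# Hypothetical controlled vocabulary: concept -> its skos:broader concepts.
BROADER = {
    "ex:bicycleTheft": ["ex:theft"],
    "ex:theft": ["ex:propertyCrime"],
    "ex:vandalism": ["ex:propertyCrime"],
}

def broader_transitive(concept):
    """Every concept reachable by following skos:broader links."""
    found, stack = [], list(BROADER.get(concept, []))
    while stack:
        c = stack.pop()
        if c not in found:
            found.append(c)
            stack.extend(BROADER.get(c, []))
    return found

def narrower_of(concept):
    """skos:narrower, computed as the inverse of skos:broader."""
    return [c for c, parents in BROADER.items() if concept in parents]

print(broader_transitive("ex:bicycleTheft"))  # ['ex:theft', 'ex:propertyCrime']
print(narrower_of("ex:propertyCrime"))        # ['ex:theft', 'ex:vandalism']
```

A query for "all property crimes" can then include bicycle thefts without anyone having tagged them that way directly.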

Finding Vocabularies

Watson

SWSE

Sindice

Falcons

sameas.org

http://www.w3.org/2001/sw/wiki/Category:Search_Engine

Browsing Vocabularies

use the HTML documentation

use an ontology viewer http://www.w3.org/2001/sw/wiki/Category:Visualizer

Creating Vocabularies

text editor

Protégé

TopBraid Composer

Neologism

http://www.w3.org/2001/sw/wiki/Category:Editor

Good Modeling

  1. Which items are you communicating about?
  2. What are the logical groups (classes) of those items?
  3. What properties can each kind of item have?