SemWalker
- A data browser, aka "semantic web browser" (cf. Longwell, Tabulator)
- Demonstrates "downhill steps" to decentralizing the Data Web
- In progress
What is a Semantic Web Browser?
Software which lets users browse RDF data published on the web
- Single-Item Page
- Multi-Item Page
- Class-Specific Views
- Real-Time Harvesting
- Real-Time Inferencing
- Control Over Sources
- A better way to make data-oriented web sites
Hopefully the term "Data Browser" will come to mean this
Single-Item Page
see a lot of information about a single item
- Person
- Book
- Meeting
- Photograph
- RDF Property
Multi-Item Page
see a little information about each of many items
- The people danbri foaf:knows
- Books on Danny's wish list
- DIG meetings
- Photographs of the W3C team
- Properties with the word "sister" in their documentation
Class-Specific Views
different content and layout for items of different classes
- People: cf. Friendster, Orkut, department pages, OkCupid
- Book: cf. Amazon, library card catalogs
- Meeting: cf. conference websites, agenda e-mails
- Photograph: cf. Flickr, custom photo sites
- RDF Property: cf. Ontaria 9-board, ...?
Real-Time Harvesting
fetch the data from the web when the user wants it (with caching)
delays in publication are often unacceptable
- Author's feedback-loop
- Meeting agenda changes
- Photographs we just took
- (isn't this obvious?)
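The "with caching" parenthetical above is the key engineering point: data is fetched when the user asks for it, but a fresh-enough cached copy is reused. A minimal sketch of that policy, with hypothetical names (this is not SemWalker's actual API):

```python
import time

class HarvestCache:
    """Re-fetch a source only when the cached copy is older than max_age seconds."""

    def __init__(self, fetch, max_age=300):
        self.fetch = fetch          # fetch(url) -> harvested data (e.g. parsed triples)
        self.max_age = max_age
        self._cache = {}            # url -> (timestamp, data)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]         # fresh enough: serve the cached copy
        data = self.fetch(url)      # otherwise harvest in real time
        self._cache[url] = (now, data)
        return data
```

In practice the real harvester also honors the server's cache-control headers (see the Harvester slide below) rather than a single fixed max-age.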
Real-Time Inference
Let people rely on the formal semantics; let the machines do the work
- subclass/subproperty
- vocabulary mapping rules
- ...?
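The subclass case above is the classic example of "letting the machines do the work": an item asserted to be a foaf:Person should also show up as a foaf:Agent. An illustrative sketch (not SemWalker's actual Prolog code) of closing rdf:type under rdfs:subClassOf:

```python
def types_of(resource, type_facts, subclass_facts):
    """Return all classes of `resource`, including inferred superclasses.

    type_facts:     dict resource -> set of asserted classes
    subclass_facts: dict class -> set of direct superclasses
    """
    result = set()
    todo = list(type_facts.get(resource, ()))
    while todo:
        cls = todo.pop()
        if cls not in result:
            result.add(cls)                          # record this class
            todo.extend(subclass_facts.get(cls, ())) # then walk up the hierarchy
    return result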
Control Over Sources
Let readers see (and control) where data came from
one possible design:
- blue -- on reader's trusted-sites list
- white -- trusted by uri-owner
- grey -- 3rd party
- red -- on reader's highly-suspect list
- combined items (eg inferences) colored at the least-trusted level
- mouse-over or nearby [!?] icon for more details
- list of Sources at the bottom of the page, with Mark-As-Trusted and Mark-As-Highly-Suspect
- harvesting and inference depths much greater on more-trusted information
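The "least-trusted level wins" rule for combined items can be stated in a few lines. A sketch using the four colors from the design above (names are illustrative):

```python
# Least to most trusted, per the design sketch:
# red = highly-suspect, grey = 3rd party, white = trusted by uri-owner,
# blue = on the reader's trusted-sites list.
TRUST_ORDER = ["red", "grey", "white", "blue"]

def combined_color(source_colors):
    """An item derived from several sources (eg by inference) gets the
    color of its least-trusted contributing source."""
    return min(source_colors, key=TRUST_ORDER.index)
```

So an inference drawn from one blue source and one grey source displays as grey, which is what keeps a suspect source from laundering its claims through trusted ones.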
A Better Way To Make Data-Oriented Web Sites?
- What big web-sites are not data oriented?
- Content-Providers:
- expose RDF views of their database(s)
- host an off-the-shelf SemWalker installation (or contract for one)
- can decentralize internally (eg between sales, marketing, shipping, customer relations, etc -- all are automatically integrated on "their" website)
Consumers (Data Users)
- Get better data-oriented web-sites
- Get to use their own data browser
- Incidentally get to query/integrate across provider sites
Market Forces
- Providers looking for best server-side data browsers (SemWalker-clones)
- Consumers looking for best data browsers
- Providers become notably absent from the "Data Web" for consumers using their own data browsers (pressure to expose data)
SemWalker Strategy
- Offers software that lets people build world-class websites just by publishing their public data as RDF
- ... (they might have to add data-organization metadata)
- ... (and an RDF-based transactional-services interface)
Downhill steps to the Semantic Web
Applications
(@@@ expand these examples)
- social networking (view of Person)
- photography (view of a Photo)
- shared vocabularies (view of Ontology, Class, Property)
- shared calendars (view of TimeRange, Event)
- collaboration (view of Project)
- shopping (view of ItemForSale)
@@? Some Requirements
- Allow data source branding
- As usable as a typical data-oriented website (simple, fast, reliable)
- Customizable views for new Classes
- Privileging of term-origin data (over 3rd-party data)
- Quiet-but-available inference
- Good Caching
- Access to search features
The Code
- just my work, so far
- most development driven by Ontaria
- Key item classes: Ontology, Class, Property
- Central-hosting plan (portal style), at w3.org
- Extensive shared indexing - plan for 10^8 triples?
Harvester
- prioritized threads (hi-queue, low-queue)
- all harvest data visible as RDF
- reads RDF/XML, HTML
- fast, but there's lots of cruft out there
- breaks large graphs into chunks (10K triples)
- can keep old copies of data
- follows redirects, cache-control, etc.
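Breaking a large harvested graph into fixed-size chunks is simple but worth seeing; here is a sketch (illustrative names, using the 10K-triple size the slide mentions):

```python
def chunk_triples(triples, size=10_000):
    """Yield successive lists of at most `size` triples, so a huge
    harvested graph can be stored and paged in piecewise."""
    for i in range(0, len(triples), size):
        yield triples[i:i + size]
```

Chunks, not whole graphs, then become the unit of indexing and paging for the Indexer and RDF Store described below.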
Indexer
- separate thread, running behind harvester
- maps keyword -> (subject, chunk)
- map uri -> chunks that use it
- Berkeley DB implementation (currently broken)
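A plain in-memory sketch of the two maps described above, standing in for the Berkeley DB implementation (names and the literal-vs-URI test are illustrative):

```python
import re
from collections import defaultdict

def build_indexes(chunks):
    """chunks: dict chunk_id -> list of (subject, predicate, object) triples.

    Returns (keyword_index, uri_index):
      keyword_index: word -> set of (subject, chunk_id)
      uri_index:     uri  -> set of chunk_ids that use it
    """
    keyword_index = defaultdict(set)
    uri_index = defaultdict(set)
    for chunk_id, triples in chunks.items():
        for s, p, o in triples:
            uri_index[s].add(chunk_id)
            uri_index[p].add(chunk_id)
            if o.startswith("http"):            # crude URI-vs-literal test
                uri_index[o].add(chunk_id)
            else:                               # literal: index its words
                for word in re.findall(r"\w+", o.lower()):
                    keyword_index[word].add((s, chunk_id))
    return keyword_index, uri_index
```

The keyword map answers queries like the earlier example (properties with "sister" in their documentation); the URI map is what pagein_around consults.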
RDF Store
- library('semweb/rdf_db') in memory (an 8-way indexed quad store)
- rdfpage library provides fast loading from disk (pre-parsed)
- pagein_around
- loads all the chunks that use this URI as subject/value
- (may need ontology smarts -- ie pagein_around(rdf:Resource))
- uris given to chunks are all hosted/served
- backward-chaining of rules (as prolog, but incremental depth-limited)
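A sketch of pagein_around in Python rather than Prolog (the real code lives alongside SWI-Prolog's semweb/rdf_db; names here are illustrative): given an index from URI to the chunk ids that mention it, load every not-yet-resident chunk that uses the URI as subject or value.

```python
def pagein_around(uri, uri_index, load_chunk, loaded):
    """Page in all chunks that use `uri`.

    uri_index:  dict uri -> set of chunk ids mentioning that uri
    load_chunk: callable paging one chunk into the in-memory store
    loaded:     set of chunk ids already resident (mutated in place)
    """
    for chunk_id in uri_index.get(uri, ()):
        if chunk_id not in loaded:
            load_chunk(chunk_id)
            loaded.add(chunk_id)
```

The ontology-smarts caveat above is about very common URIs: paging in everything around rdf:Resource would pull in nearly the whole store, so such terms need special handling.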
Inference
(say more?) forward chaining, too, eg for the index
- experimentally reads surnia rules for OWL inference
Views
rules for mapping from triples to XHTML trees
- Current implementation makes a tab for each view that applies
- I think they're pretty easy/fun to write
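The one-tab-per-applicable-view behavior is easy to sketch: each view rule names a target class and a render function, and an item gets a tab for every view whose class it belongs to. All names below are illustrative (the real rules map triples to XHTML trees, not strings):

```python
def applicable_views(item_classes, views):
    """views: list of (view_name, target_class, render_fn).
    Return (name, render_fn) pairs for every view whose target class
    the item belongs to -- one tab per applicable view."""
    return [(name, render) for name, cls, render in views
            if cls in item_classes]

def render_person(triples):
    """Toy class-specific view: render a person's foaf:name as XHTML."""
    props = dict((p, o) for _s, p, o in triples)
    return "<div class='person'><h1>%s</h1></div>" % props.get("foaf:name", "?")
```

With real-time inference feeding item_classes (eg foaf:Person implying foaf:Agent), an item can pick up tabs for views written against its superclasses too.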