SIOC/PythonSurfTalk

http://www.europython.eu/talks/cfp/

Structure / Outline

Introduction

(an intro to DERI and ourselves)

There is all this interesting data out there: FOAF, SIOC, DBPedia, GeoNames, ...

Cool things can be done in exploring this information, especially when combining data from multiple sources. We will show you how.

SPARQL - the query language for RDF, playing the role that SQL plays for relational databases - works the same regardless of what tools you use (can demo a simple / non-programmatic web SPARQL query form just to show the principle).

Examples of some cool SPARQL queries
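
As a starting point for such an example, here is a minimal sketch of how a SPARQL query travels to an endpoint over plain HTTP. It only builds the request and does not send it. The DBpedia endpoint URL and the query itself are illustrative assumptions, not material from the talk; any SPARQL endpoint and query could be substituted.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for illustration: the public DBpedia SPARQL service.
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: up to ten resources typed as foaf:Person, with names.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    # The SPARQL protocol passes the query text in the `query` parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute the query. The same request works against any endpoint, which is the "works the same regardless of tools" point.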

The product - SURF

SPARQL is cool, but you do not need to go to this level of detail (unless you want it).

Just as an ORM hides the complexity of relational databases from programmers, tools like SURF hide the intricacies of RDF and SPARQL. You don't need to worry about them - it's MAGIC :D
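
To make the ORM analogy concrete, here is a minimal sketch of the general idea: RDF triples about one subject are exposed as ordinary Python attributes. This is not SuRF's implementation. The class name `TripleBackedObject`, the `prefix_localname` attribute convention, and the example.org data are all assumptions made up for illustration.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical in-memory "store": (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
]

# Known namespace prefixes, used to map attribute names to predicate URIs.
PREFIXES = {"foaf": FOAF}


class TripleBackedObject:
    """Expose the triples about one subject as Python attributes.

    Attribute names follow a made-up `prefix_localname` convention,
    e.g. `foaf_name` for <http://xmlns.com/foaf/0.1/name>.
    """

    def __init__(self, subject, triples):
        self.subject = subject
        self._values = defaultdict(list)
        for s, p, o in triples:
            if s == subject:
                self._values[p].append(o)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        prefix, _, local = name.partition("_")
        if prefix not in PREFIXES or not local:
            raise AttributeError(name)
        # RDF properties are multi-valued, so always return a list.
        return list(self._values.get(PREFIXES[prefix] + local, []))


alice = TripleBackedObject("http://example.org/alice", TRIPLES)
print(alice.foaf_name)   # ['Alice']
print(alice.foaf_knows)  # ['http://example.org/bob']
```

The caller never writes a query or touches a URI for the predicate; that is the part an Object-RDF mapper hides.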

Introduction to SURF.

Inspired by ActiveRDF for Ruby...

Installation, usage, examples

(we probably need to show people how to collect some data in the datastore. otherwise they will not have data to play with. what we can do is provide some example datasets and simple utilities that allow them to load it into a datastore. they can always load more data by supplying URLs to utility code or doing WGET and then loading resulting RDF)

IGNORE THIS FOR NOW SPARQL is cool, but you do not need to go to this level of detail (unless you want it).

Just as an ORM hides the complexity of relational databases from programmers, tools like SURF hide the intricacies of RDF and SPARQL. You don't need to worry about them - it's MAGIC :D

Introduction to SURF.

SuRF is inspired by the work done on ActiveRDF for Ruby and follows similar concepts, closing the gap between RDF data and the object-oriented paradigm for Python programmers. SuRF allows one to navigate the structured data exposed on the web by various RDF producers. For example, the Python Cheese Shop exposes DOAP data describing Python packages, and a person may expose their FOAF profile on their personal web page. SuRF is designed to give access to all this data by encapsulating it in a Resource class, which the programmer can manipulate at will.

Installation: SuRF is packaged as a standard egg file, so it can be installed just by calling easy_install.

Architecture and usage: Due to the nature of the data access methods developed for the Semantic Web (query languages such as SPARQL, or access protocols such as the Sesame HTTP API), SuRF is built around a Store class which provides access to the data through installed plugins. The Store exposes a reader plugin and a writer plugin. Some data stores are read-only: DBpedia, the Semantic Web counterpart of Wikipedia, exposes its data through a read-only SPARQL endpoint. Others offer read-write access to the data, e.g. any Sesame-based store, AllegroGraph or Virtuoso. By default SuRF comes prepackaged with plugins that support reading from SPARQL endpoints, and reading and writing through the Sesame2 HTTP protocol and the Sesame2 API (used by the AllegroGraph RDF data store).

Creating and registering plugins is an easy process and is based on the metaclass concept. Let's create a plugin FooReader that reads a Foo data source which exposes RDF through a query interface of some sort.

from surf.store.plugins import RDFReader, UnsupportedResultType

class FooReader(RDFReader):
    # this must be a unique plugin key as registered in the metaclass
    __type__ = 'foo-reader'
    # ... implement methods from RDFReader to provide functionality

To register the plugin with the Store, one only needs to import the FooReader class before using it. On import, the metaclass used by RDFReader automatically registers the FooReader type as a plugin.

import FooReader # registers the plugin with SuRF
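
For readers unfamiliar with the pattern, the following is a generic, self-contained sketch of metaclass-based registration. It is not SuRF's code: `PluginMeta`, `Reader`, `make_reader` and the `get` method are hypothetical names chosen for illustration. Only the idea (a class is registered under its `__type__` key at the moment it is defined) is taken from the description above.

```python
class PluginMeta(type):
    """Metaclass that records every subclass declaring a __type__ key."""

    registry = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        key = namespace.get("__type__")
        if key is not None:  # base classes without a key are skipped
            if key in PluginMeta.registry:
                raise ValueError("duplicate plugin key: %r" % key)
            PluginMeta.registry[key] = cls


class Reader(metaclass=PluginMeta):
    """Hypothetical base class standing in for a reader plugin base."""

    def get(self, subject):
        raise NotImplementedError


class FooReader(Reader):
    # Defining the class is enough: the metaclass registers it here.
    __type__ = "foo-reader"

    def get(self, subject):
        return {"subject": subject, "source": "foo"}


def make_reader(key):
    """Look up a plugin class by key and instantiate it."""
    return PluginMeta.registry[key]()


reader = make_reader("foo-reader")
print(type(reader).__name__)  # FooReader
```

Because class definition happens at import time, importing the module that contains the plugin class is all that is needed to make it available by key.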

To get access to SuRF resources, all you need to do is import surf like this:

from surf import *

create a Store object [TO BE CONTINUED]

Installation, usage, examples

(we probably need to show people how to collect some data in the datastore. otherwise they will not have data to play with. what we can do is provide some example datasets and simple utilities that allow them to load it into a datastore. they can always load more data by supplying URLs to utility code or doing WGET and then loading resulting RDF) END IGNORE THIS

Advanced stuff (if time permits)

Here we can cover advanced topics (e.g., publishing of linked data, ...)

Conclusion

Interested in feedback, developers, ...

Notes

(if the Redland command-line tools are installed, can mention that "rapper" is useful for debugging, to check whether RDF data is valid - but this goes into a level of complexity we may well want to skip)

CFP Requirements

Author name(s)

Cosmin Basca

Contact Information

we know where Cosmin lives

Preferred timeslot (30 minutes, 45 minutes, 60 minutes)

45 minutes

Title of proposed presentation

Tapping into the Web of Data with Python and SuRF

one line summary

Learn how to use structured data on the Web, avoid screen scraping and access data in a unified way

Summary of proposed presentation

Uldis intro + Cosmin's description of SURF

Ever since the dawn of mankind :D

There are vast amounts of data available on the Web, published by the Linking Open Data community project and by many online community sites (using data formats such as FOAF, SIOC and DOAP). Just as the Web already allows documents to link to each other, linked data refers to interlinked, machine-readable data sets on the Web.

The Linking Open Data project is a community effort to make existing, well-known data sets like Wikipedia, GeoNames or MusicBrainz available as RDF data on the Web and ready for reuse. Developers get a unified interface to data from many sources instead of one API for each site.

In order to take advantage of this, there is a need for easy-to-use tools that allow developers to work with this data without knowing the specifics of RDF, SPARQL or other related Semantic Web technologies.

In this talk, we will outline the SuRF library, which gives Python developers an easy way to build Semantic Web applications using RDF data. We will show how it can be used to provide advanced browsing interfaces and mash-up services on top of existing data, and how it can be efficiently combined with other frameworks such as Pylons for Web-based applications.

SuRF is an open-source "Object-RDF Mapper", similar to object-relational mappers (e.g., SQLAlchemy). One can access SuRF objects and use their properties through a virtual API generated on the fly by reflecting the underlying RDF data. SuRF is built around a plugin architecture with built-in support for the most common Semantic Web data access protocols and APIs.

Talk outline: (1) Introduction; (2) Architecture of SuRF; (3) Usage of SuRF; (4) Advanced applications (e.g. integration with Pylons)

Note to organisers: SuRF will be publicly available under an open-source (BSD-type) license at the end of April 2009. This proposal offers the first public demo of the library.

scrapbook (text that is cut out for now)

(e.g., SPARQL is read-only, while the Sesame protocol also supports writing).

Extending the plugin base is simply a matter of subclassing the RDFReader or RDFWriter class and importing the subclass into the project; the underlying metaclass-based plugin repository will automatically register it with SuRF's session object.

By default SuRF supports the SPARQL protocol, the Sesame2 HTTP protocol with the AllegroGraph extension, and Franz's AllegroGraph replication of the Sesame2 API in Python.

is the analogue of ORM for RDF data. Thus, people can use SURF to browse and query linked RDF data just using regular Python object access.

"Object-RDF Mapper" similar to object-relational mappers (such as SQLAlchemy). It was inspired by ActiveRDF, a Ruby-on-Rails library for

Benjamin's version

As the Web has recently turned 20 years old, recent advances in the research community have focused on putting structured data on the Web, called "Linked Data". The Web already allows documents to link to each other in a decentralised manner, without any central point of control. In a similar way, Linked Data refers to structured and machine-readable data sets that use the RDF standard and contain links to other data sets. The Linking Open Data project is a community effort to make existing, well-known data sets like Wikipedia, GeoNames or MusicBrainz available. In this talk we will explain how Linked Data gives developers a unified interface to data from many sources, and how to use this interface and the resulting data in Python. We will discuss the standards and the architecture behind Linked Data, demonstrate how to use our tool SURF to access RDF, and give examples using data from Wikipedia and social networking sites.

Alexandre's version

As the Web recently turned 20 years old, recent advances in the research community have focused on the Semantic Web, i.e. an extension of the current Web enabling global interoperability between applications, while keeping intact the distributed architecture of the Web. More recently, thanks to the Linking Open Data community project, lots of data have been provided in RDF (the representation format for Semantic Web data), freely available to reuse for querying and mash-up purposes. Yet, there is still a need for easy-to-use tools so that developers can take advantage of this data without having to learn the underlying principles and related Semantic Web technologies. In this talk, we will outline the SURF framework, which gives Python developers an easy way to build Semantic Web applications using RDF data. We will show how it can be used to provide advanced browsing interfaces and mash-up services on top of existing data, and how it can be efficiently combined with other frameworks such as Django for Web-based applications.

Presentation outline

Benjamin, extra short outline:

Overview of the Linked Data idea:

one API for every site versus having a unified API based on a common data format
Linked Data uses the new standard RDF on top of good old HTTP and URLs
the Linking Open Data community implements the 4 simple Linked Data principles

Using Linked Data with Python and SURF:

SURF provides object-oriented access on top of Redland's RDF storage
show how to install everything

Examples and demonstrations:

Do very complex queries or combine data from multiple sites
Many different flavours of data are available, like Wikipedia or social networking sites

Benjamin version 1, still too long:

Overview of the Linked Data idea:

the web currently has a multitude of standards and APIs for data
Linked Data is a unified approach to publishing and accessing data on the web
Linked Data reuses the existing architecture of the Web: HTTP and URIs
new standard: RDF specifies how to structure the data
explain Linked Data principles
show how the Linking Open Data community effort implements them

Using Linked Data with Python and SURF:

introduce Python tools for using RDF: Redland and SURF
explain the different parts: Redland as the local RDF store, SURF as the ORM
explain installation

Examples using different flavours of data:

give a generic example of accessing RDF properties
complex query from one data source: Wikipedia
complex query combining two data sources: Wikipedia and GeoNames
compare to screen scraping
example with SIOC and FOAF
compare using one API for each site that has social profiles with just getting one FOAF file from every site and using one interface
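
The "combining two data sources" items above could be illustrated along these lines. This is a minimal sketch, assuming both sites describe the same thing under the same URI; the triples, URIs and vocabulary are invented for illustration and are not real Wikipedia or GeoNames data. Because every source uses the same triple model, merging is just a set union and one access function works for the result.

```python
# Hypothetical triples, as if fetched from two different sites.
PLACE = "http://example.org/id/exampleville"
VOCAB = "http://example.org/vocab/"

SITE_A = {
    (PLACE, VOCAB + "name", "Exampleville"),
    (PLACE, VOCAB + "country", "Exampleland"),
}
SITE_B = {
    (PLACE, VOCAB + "latitude", "10.0"),
    (PLACE, VOCAB + "longitude", "20.0"),
}


def describe(graph, subject):
    """Collect all predicate/object pairs known about one subject."""
    return {p: o for s, p, o in graph if s == subject}


# No per-site API: the two graphs are combined with a plain set union.
merged = SITE_A | SITE_B

for predicate, value in sorted(describe(merged, PLACE).items()):
    print(predicate, "=", value)
```

With one API per site, the same result would need two clients and hand-written glue code to match the records.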

Alexandre's version:

The Semantic Web
The LOD initiative
Limits of current Python APIs for SW / LOD
SURF
Application examples
Integrating SURF in existing applications

Intended audience (non-programmers, beginning programmers, advanced users, CPython developers, etc.)

Session lengths include time for audience questions, and session switching. You should budget at least five minutes for questions and five minutes for changeover; for example, a 30-minute talk will be 20 minutes of presentation, 5 minutes of questions, and allow 5 minutes for delegates to switch sessions.

Submit form: http://www.europython.eu/talks/submit

Bios

Cosmin

Cosmin is a PhD student at the Digital Enterprise Research Institute (DERI) in Galway. He does research in the field of the Semantic Web and likes to use Python in his work.

picture:

Uldis

Benjamin