SWAD-Europe Workshop on Semantic Web Storage and Retrieval

09:12:48 <libby> hello nick

09:21:15 <libby> intros:

09:21:28 <libby> daveb: redland, storage, open source

09:22:10 <libby> frank: inferencing, european projects, case studies, thesauri, portals, sw apps

09:22:42 <libby> ...run 2 courses on sw for students

09:23:42 <libby> mark ? : phd, ontology mapping

09:23:56 <jeen> mark van assem

09:24:15 <libby> mike: a new developer; new user of the technology

09:24:45 <libby> jack: startup, lgpl software

09:24:57 <libby> crap, keep missing names :(

09:25:31 <libby> ?? & jeen - sesame - storage, query and inferencing, java

09:25:57 <libby> jeen: main interest in query, also inferencing

09:25:59 <jeen> arjohn

09:26:01 <nmg> arjohn and jeen

09:26:07 <libby> d'oh

09:26:35 <libby> AndyS: network retrieval, query; now a user

09:27:09 <jeen> libby: ilrt, squish, interested in applications, web services, photo metadata, coauthor of foaf

09:27:14 <libby> ta :)

09:27:29 <nmg> nmg: AKT project (http://www.aktors.org/), interactive SW applications (CS AKTiveSpace), 3store RDF storage and inference

09:28:10 <libby> martin pike: stilo; xml, using and implementing SW technologies, aerospace

09:28:45 <libby> nick james: stilo - not research, industrial usage, lots of experience using databases

09:28:59 <libby> ?? works with frank

09:29:04 <libby> (sorry missed name)

09:29:16 <jeen> heiner stuckenschmidt

09:29:33 <jeen> postdoc researcher, works on distributed querying

09:29:34 <libby> ...how to use sesame for querying, physically distributed repositories, sub-parts of a query in different places and rejoin

09:29:50 <libby> steve harris: 3-store and other bits

09:30:28 <libby> jo walsh: free software developer; perl+rdf; bots, http query; rather like joseki/serql

09:30:48 * jeen is a victim of his preconceptions again :/

09:31:00 <libby> dave reynolds: hplabs, sw, wrote jena 1 db backend; now inference, owl support

09:31:52 <libby> daniel krech, mindlab; before rdflib, bdb, zope, storage, apps, foaf,

09:32:20 <libby> kevin wilkinson, hplabs, palo alto - database guy, issues of scaling, using rdf in db

09:32:41 <libby> gregorio from crete (??): visiting Frank, storage, retrieval, KR, rules

09:32:51 <libby> gregorius

09:33:14 <libby> frank: he and gregorius have written a SW textbook

09:34:35 <libby> zack: @semantics, building real, commercial sw apps

09:35:07 <libby> alberto: @semantics, rdfstore, rdql, exchange of ideas

09:35:49 <libby> dirk: @semantics; non-rdf things are important also for commercial apps, e.g. free text

09:36:10 <libby> ?? (sorry) an end-user of sesame, web services based on that

09:36:45 <jeen> sander van splunter

09:36:46 <libby> ?? (sorry) a developer of SWI prolog, recently storage and handling of rdf in prolog

09:36:57 <arjohn> jan wielemaker

09:36:58 <dajobe-lap> ^jan

09:37:39 <libby> guus: now at vrij university; recent interest in ontology engineering, multimedia; webont co-chair; working on best practices wg

09:37:59 <libby> ...draft charter

09:42:43 <libby> ----

09:42:52 <libby> talking briefly about position papers

09:43:43 <libby> jeen: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/aduna.pdf

09:44:03 <libby> compositionality; datatypes; schema awareness

09:45:00 <libby> rdf output

09:46:38 <arjohn> demo server: http://sesame.aduna.biz/sesame/

09:47:06 <libby> ta

09:47:36 <libby> alberto: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/asemantics-2.html

09:49:07 <libby> RDFStore: C/perl toolkit; rdfstore: graph, freetext, content searching; bsd license

09:52:07 <libby> btrees: was bdb; now custom; no sql, yet; would not use features of sql

09:54:42 <libby> find a way of searching using bitmaps
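The bitmap approach can be pictured with a minimal sketch, assuming one bitset of statement IDs per (position, term) pair, so that a query ANDs the bitsets of its bound terms and decodes only the surviving bits. This is not RDFStore's actual on-disk format; `BitmapIndex` is a hypothetical name, and Python integers stand in for real compressed bitmaps.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical bitmap index: one bitset (a Python int) per (position, term)."""

    def __init__(self):
        self.statements = []             # statement ID -> (s, p, o)
        self.bitmaps = defaultdict(int)  # (position, term) -> bitset of statement IDs

    def add(self, s, p, o):
        sid = len(self.statements)
        self.statements.append((s, p, o))
        for key in (("s", s), ("p", p), ("o", o)):
            self.bitmaps[key] |= 1 << sid  # set this statement's bit

    def match(self, s=None, p=None, o=None):
        """Statements matching every bound position (None acts as a wildcard)."""
        result = (1 << len(self.statements)) - 1  # start with all statements
        for key in (("s", s), ("p", p), ("o", o)):
            if key[1] is not None:
                result &= self.bitmaps.get(key, 0)  # intersect the bitsets
        matches = []
        while result:
            lowest = result & -result  # isolate the lowest set bit
            matches.append(self.statements[lowest.bit_length() - 1])
            result ^= lowest  # clear it and move on to the next hit
        return matches
```

The decoding loop runs once per matching statement, which is the intuition behind lookup cost following the number of hits rather than the store size.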

10:01:27 <libby> interesting stats on retrieval speeds; dirk: no dependence on N (num statements), only on M, number of hits. hence no theoretical limit to db size

10:02:30 <libby> ditto with rdql queries

10:04:06 <libby> lond-run, queries won;t scale

10:05:02 <libby> long-run

10:05:21 <libby> 'swiss-knife problem'

10:08:53 <libby> Jan Wielemaker(?): http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/amsterdam.pdf

10:09:08 <libby> wants to have 3m triples running on laptop, using prolog

10:10:49 <libby> prolog not good for transitive closure

10:11:51 <libby> 3m triples loaded in 10 secs

10:12:04 <libby> from rdf/xml source about 10x slower

10:12:16 <libby> owl is a challenge

10:17:06 <libby> dave beckett: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/bristol-1.html

10:18:26 <libby> redalnd stores everything, not hashes, because disk access is slow

10:19:50 <libby> redland, even

10:20:23 <libby> no schema support....thinks freetext is useful

10:22:08 <libby> jeen/dirk using special namespaces as escape-hatch e.g. for free-text searching

10:25:15 <libby> jeen: queries and contexts. 3-store does this

10:25:24 <jeen> ...and so should we :/

10:25:53 <dajobe-lap> :)

10:27:04 <libby> yeah, I was fiddling with this cos it gets right down to the problem of storage. otherwise you can't get at provenance via query...or not quickly anyway

10:27:34 <libby> kevin wilkinson(?): http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/hp-1.html

10:28:28 <jeen> true; though in our case the main bottleneck is adding support in the API. the actual storage can be solved, but changing the API without breaking stuff...

10:28:43 <jeen> we have a legacy problem it seems :)

10:29:22 <libby> bummer

10:30:11 <libby> jim ley and I have been playing with simple SQL storage patterns and: provenance makes it much slower, but, uniqueness seems to be the most important aspect.

10:30:38 <swh> libby: I didnt find that context added much to the cost

10:30:52 <nmg> uniqueness in what sense? of triples in a context?

10:31:06 <libby> kevin: jena uses sql dbs for backend storage. big diff is move from jena1: 2 tables; jena 2 - more

10:31:40 <libby> yeah, unique per context

10:32:24 <nmg> from what I remember, we found this to be expensive as well (building indexes across SPOM)

10:33:19 * libby talking at 500,000 triple level, small-scale, but significant differences at that stage

10:33:22 <swh> yeah, SPOM indexes bite too much

10:33:58 <swh> are uniq-ing code is at the app level

10:33:59 <libby> SPOM?

10:34:01 <swh> *our

10:34:09 <swh> subj-pred-obj-model

10:34:12 <libby> sub pred ob provenance

10:34:15 <libby> ah, gotya

10:34:17 <nmg> (expect we should probably call them SPOC now that 'model' is unfashionable)

10:34:17 <swh> model-context

10:34:27 <swh> heh

10:34:29 <jeen> SPOMC?

10:34:37 <libby> heh, spoc!

10:34:46 <swh> SPO[CM]?

10:35:06 <AndyS> Subj-Pred-Obj-Context
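Uniqueness per context means the uniqueness key is the whole subject-predicate-object-context tuple: the same triple may occur once in each context. A minimal in-memory sketch, assuming contexts are plain identifiers; `QuadStore` is a hypothetical name. A SQL backend would express the same rule as a unique index over four columns, which is the kind of SPOC index whose cost came up here.

```python
class QuadStore:
    """Hypothetical in-memory store keeping each (s, p, o, c) quad unique."""

    def __init__(self):
        self.quads = set()     # the uniqueness constraint: a set of SPOC tuples
        self.by_context = {}   # context -> set of (s, p, o) triples

    def add(self, s, p, o, c):
        """Add a quad; return False if it was already present in that context."""
        quad = (s, p, o, c)
        if quad in self.quads:
            return False
        self.quads.add(quad)
        self.by_context.setdefault(c, set()).add((s, p, o))
        return True

    def triples(self, c=None):
        """Triples in one context, or the union over all contexts when c is None."""
        if c is not None:
            return set(self.by_context.get(c, set()))
        return {(s, p, o) for (s, p, o, _) in self.quads}
```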

10:35:19 <libby> kevin: interested in legacy data. sorta columns are preds in general rdbms

10:36:25 <libby> ...denormalized schema; faster than jena 1 but 2x storage

10:37:20 <libby> now looking at: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/hp-2.html

10:37:24 <libby> (I think)

10:37:38 <libby> 'pattern mining'

10:42:27 <libby> synthetic rdf generator

10:43:15 <libby> Daniel Krech up.

10:43:23 * libby doesn't see paper

10:43:36 <libby> d'oh: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/mindswap.html

10:44:37 <AndyS> Joseki has an RDFLib-bases

10:44:45 <libby> does it?

10:44:51 <AndyS> Joseki has an RDFLib-based client library

10:44:58 <AndyS> "Pyseki"

10:45:03 <libby> heh

10:45:23 <libby> daniel: contexts. python

10:45:26 <libby> zope btrees

10:45:28 <AndyS> Want a Perl one as well

10:45:50 <AndyS> Any other language requests? (no promises!)

10:47:24 <libby> is this Jack Rusher?

10:47:25 <libby> http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/rusher.html

10:47:34 <dajobe-lap> yes

10:47:36 <jeen> yep

10:48:15 <libby> b-tree impl from scratch; read rather than write, long ints; 3 indexes

10:48:37 <libby> sorted longint sets; binary search, mapped to string table

10:48:52 <jeen> sounds interesting as a backend system.

10:50:01 <libby> java. no ql stuff yet. seems to scale well and perform well
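The design described (integer IDs mapped to a string table, sorted sets searched by binary search, three indexes) can be sketched as follows. This is a hypothetical Python illustration, not the Java implementation presented; the choice of SPO, POS and OSP as the three index orders is an assumption.

```python
import bisect


class SortedTripleIndex:
    """Hypothetical read-optimised store: integer IDs, three sorted index orders."""

    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        self.ids = {}      # string -> integer ID (the string table)
        self.strings = []  # integer ID -> string
        encoded = [tuple(self._intern(t) for t in triple) for triple in triples]
        # One sorted list of permuted ID tuples per index order.
        self.indexes = {
            name: sorted({tuple(t[i] for i in perm) for t in encoded})
            for name, perm in self.ORDERS.items()
        }

    def _intern(self, term):
        if term not in self.ids:
            self.ids[term] = len(self.strings)
            self.strings.append(term)
        return self.ids[term]

    def _scan(self, name, prefix):
        """Binary-search to the first key with this prefix, then walk forward."""
        index = self.indexes[name]
        pos = bisect.bisect_left(index, prefix)
        while pos < len(index) and index[pos][: len(prefix)] == prefix:
            yield index[pos]
            pos += 1

    def objects(self, s, p):
        """All objects for subject s and predicate p, via the SPO order."""
        if s not in self.ids or p not in self.ids:
            return []
        return [self.strings[k[2]] for k in self._scan("spo", (self.ids[s], self.ids[p]))]

    def subjects(self, p, o):
        """All subjects for predicate p and object o, via the POS order."""
        if p not in self.ids or o not in self.ids:
            return []
        return [self.strings[k[2]] for k in self._scan("pos", (self.ids[p], self.ids[o]))]
```

Sorting once and never updating in place is what makes this read-friendly rather than write-friendly.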

10:53:01 <libby> steve: 3-store http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/soton.html

10:53:12 <libby> needed somewhere to put 50m triples

10:53:37 <libby> soundness but not completeness; some corner cases not covered.

10:53:39 <libby> mysql

10:54:02 <nmg> it was 15M not 50M...

10:54:02 * AndyS interested in finding out what the corner cases are

10:54:13 <dajobe-lap> later...

10:54:39 <libby> oops sorry nmg

10:55:07 <libby> lots of interactive apps, need to keep speed up. inserts slower, needs more work.

10:55:45 <nmg> AndyS: the gist is that we evaluate the entailment rules in a fixed order and only pass over most of them once - we get the useful entailments that we need for our applications from this approach
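A sketch of the fixed-order, single-pass idea, assuming a handful of RDFS rules over string triples with qname-style terms. Each rule runs once, in order, so the result is sound but not complete: a subClassOf chain is followed only one step. The rule selection and `single_pass_rdfs` are hypothetical, not 3store's actual rule set.

```python
RDF_TYPE = "rdf:type"
SUB_CLASS = "rdfs:subClassOf"
SUB_PROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def single_pass_rdfs(triples):
    """Apply a few RDFS rules once each, in a fixed order (sound, not complete)."""
    g = set(triples)
    # 1. subPropertyOf: (s p o), (p subPropertyOf q) => (s q o)
    sub_props = {(p, q) for (p, r, q) in g if r == SUB_PROP}
    g |= {(s, q, o) for (s, p, o) in list(g) for (p2, q) in sub_props if p == p2}
    # 2. domain and range: derive rdf:type from property declarations
    domains = {(p, c) for (p, r, c) in g if r == DOMAIN}
    ranges = {(p, c) for (p, r, c) in g if r == RANGE}
    g |= {(s, RDF_TYPE, c) for (s, p, o) in list(g) for (p2, c) in domains if p == p2}
    g |= {(o, RDF_TYPE, c) for (s, p, o) in list(g) for (p2, c) in ranges if p == p2}
    # 3. subClassOf: (x type C), (C subClassOf D) => (x type D), one step only
    sub_classes = {(c, d) for (c, r, d) in g if r == SUB_CLASS}
    g |= {
        (x, RDF_TYPE, d)
        for (x, r, c) in list(g)
        if r == RDF_TYPE
        for (c2, d) in sub_classes
        if c == c2
    }
    return g
```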

10:55:48 <libby> now about 26m triples, 152 bytes/triple; 5gig. ints, free-text searching

10:56:12 <libby> ...Joseki-like remote api

10:56:43 <AndyS> Wonder what the schema for the result set is?

10:56:47 <libby> [that's cool. /me wants to steal your data...]

10:57:15 <libby> ...has context-extension to rdql

10:57:24 <nmg> AndyS: we don't have a formal schema for it (coughcough), but it would be easy enough to reverse engineer one from the data

10:57:31 <libby> ...store data about whether parse successful, date and time etc.

10:57:43 <nmg> libby: data is in http://triplestore.aktors.org/data/

10:57:48 <libby> plan to support 'owl-tiny'

10:57:49 <libby> ooh

10:58:09 <libby> distributed storage and query to increase scalablility

10:58:15 <AndyS> NMG: What about http://www.w3.org/2003/03/rdfqr-tests/recording-query-results.html ??

10:58:40 <nmg> OWL-Tiny is our name for the subset of OWL-Lite that contains property characteristics and equality/inequality relations from OWL Lite

10:59:04 <libby> are you supporting inverseFunctionalProperty?

10:59:45 <nmg> AndyS: ah. well, I'm sure we'll add support for that (I don't think that it was around when we wrote the XML return format)

11:00:05 <nmg> libby: as yet, no, but high on the wishlist

11:00:09 <libby> --lunchtime--

11:00:10 <libby> cool

11:00:25 <jeen> libby, you're only saying that because you've never been in our canteen yet...

11:00:32 <libby> heh

11:00:32 <nmg> context-based truth maintenance is probably the top of the list, OWL-Tiny probably after that

11:00:36 <nmg> swh may disagree, of course ;)

11:20:49 <CaptSolo> hi all :)

11:32:12 <CaptSolo> everybody's busy at the workshop?

11:37:56 <Wack> busy with lunch, probably :]

11:39:06 <CaptSolo> ah, true, lunch is a VERY important part :>

11:39:36 <CaptSolo> it would be interesting to hear about the workshop...

11:39:50 <CaptSolo> isn't somebody there who can do real-time blog? ;>

11:40:23 <Wack> libby is reporting stuff here

12:01:05 <CaptSolo> logger url

12:01:10 <CaptSolo> logger help

12:01:18 <CaptSolo> :/

12:04:21 * nmg returns from lunch

12:04:53 <CaptSolo> wack: you were right about lunch i guess ;]

12:05:28 <CaptSolo> (is there an URL where one can see this channel logged?)

12:07:06 <Wack> logger pointer?

12:07:08 <Wack> logger pointer

12:07:35 <libby> http://ilrt.org/discovery/chatlogs/swade/2003-11-13.html

12:09:12 <dajobe-lap> it should be live now

12:11:21 <libby> dajobe-lap: the other loggers seem to be dead :(

12:11:53 <dajobe-lap> hm

12:12:10 <libby> nick james: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/stilo.html

12:12:24 <libby> (there was a massive netsplit last night)

12:12:32 <libby> using maths with RDF

12:13:11 <afs> afs is now known as AndyS

12:13:13 <libby> simple rdf storage - for presentation

12:13:18 <libby> not searching

12:14:11 <libby> martin pike: trying to capture the process of making a plane, so that when people leave or die, don't lose that info

12:14:59 <libby> scaling will become important

12:15:49 <swh> version control for RDF is an interesting subject

12:16:29 <libby> jo walsh: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/walsh.html

12:16:35 <CaptSolo> swh: that might be interesting, true

12:16:37 <libby> interested in restful api

12:16:47 <CaptSolo> swh: has there been a presentation on version control?

12:16:58 <swh> CaptSolo: the last speaker mentioned it

12:17:13 <swh> (they have some support for branches or similar)

12:17:23 <CaptSolo> they = some software?

12:17:30 <swh> yup

12:17:38 <libby> jo: returns a graph that returns more than what you ask for

12:17:40 <CaptSolo> what software?

12:18:02 <libby> described here I think...http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/stilo.html

12:18:12 <jeen> omm support change tracking as well actually.

12:18:23 <libby> jo: better ways of representing context

12:18:42 <libby> ...e.g. ericp's summary

12:18:51 <CaptSolo> swh: CVS would not be sufficient for RDF as we must control the triples, not the order how it is presented in textual form

12:19:16 <libby> ...sending queries over http api...dreaming of an ideal ql - like serql

12:19:34 <swh> CaptSolo: yes, you need to diff the triples - I came up with a diff like algorithm, be efficient (linear or near) implementation is hard

12:19:47 <swh> ...but efficient...

12:20:12 <nmg> diffing the triples isn't enough - you need to take account of bNodes

12:20:33 <nmg> jeremy carroll had something about RDF graph canonicalisation at ISWC which is relevant here

12:20:44 <swh> bnodes are all different from all other bnodes, unless they have IDs

12:21:07 <swh> its a subset of the canon. problem

12:23:02 <nmg> well, the id assigned to a bNode may change from one version to the next, while the graph structure itself remains the same. is the bNode in one version really different from the bNode in the other?

12:23:29 <swh> in diff terms, yes
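The bNode point can be made concrete with a naive triple diff. Ground triples (no blank nodes) compare cleanly with set difference; triples containing blank nodes are set aside, since matching them properly needs graph canonicalisation or an isomorphism check, which this sketch does not attempt. The `_:` label convention and the function names are assumptions.

```python
def is_bnode(term):
    """Assumption: blank nodes are written with the N-Triples '_:' prefix."""
    return term.startswith("_:")


def diff_graphs(old, new):
    """Naive triple diff: exact for ground triples, bNode triples left unmatched."""

    def ground(g):
        return {t for t in g if not any(is_bnode(x) for x in t)}

    old_ground, new_ground = ground(old), ground(new)
    return {
        "removed": old_ground - new_ground,
        "added": new_ground - old_ground,
        # Blank-node triples cannot be matched reliably by label: a reparse may
        # relabel _:b1 as _:x9 without changing the graph structure at all.
        "unmatched_bnode_triples": (set(old) - old_ground, set(new) - new_ground),
    }
```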

12:25:52 <libby> ---

12:26:01 <libby> relational databases

12:26:06 <libby> one true schema?

12:28:00 <libby> kevin: built-in query optimisers

12:28:18 <libby> dirk: work with rdf schemas?

12:28:30 <jeen> was that what he asked?

12:29:15 <libby> I was trying to say: work with the sorts of db schemas used for RDF?

12:29:24 <libby> I'm not sure that he asked that either :)

12:29:29 <jeen> heh

12:30:09 <libby> daveR: an overflow structure and optimisations. not translated to a fixed schema

12:30:20 <libby> ==jena2

12:31:03 <libby> swh: many dbs don't like multiple joins

12:32:16 <libby> nmg: didn't have time to write our own

12:33:14 <libby> dirk: probably ought to reexamine backend if so many joins

12:34:30 <libby> nick james: optimisers can't gather statistics effectively in triples bucket structure

12:35:46 <libby> swh: uses partly backchaining, partially forward-chaining

12:36:10 <libby> jeen: exhaustive forward chaining operation

12:36:23 <libby> ...seems to work quite well :)
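Exhaustive forward chaining, unlike a single pass, repeats rule application until no new triples appear (a fixpoint). A minimal sketch with just two RDFS rules, rdfs9 (type inheritance) and rdfs11 (subClassOf transitivity), assuming triples are string tuples; the quadratic loop is for clarity only and is not how Sesame implements it.

```python
def forward_chain(triples):
    """Apply two RDFS rules repeatedly until no new triples appear (a fixpoint)."""
    g = set(triples)
    while True:
        new = set()
        for (a, p1, b) in g:
            for (c, p2, d) in g:
                if b != c or p2 != "rdfs:subClassOf":
                    continue
                if p1 == "rdfs:subClassOf":
                    # rdfs11: subClassOf is transitive
                    new.add((a, "rdfs:subClassOf", d))
                elif p1 == "rdf:type":
                    # rdfs9: instances inherit the superclass type
                    new.add((a, "rdf:type", d))
        new -= g
        if not new:
            return g
        g |= new
```

The full closure is materialised up front, so queries need no inference at read time; the price is paid again whenever the data changes.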

12:37:59 <libby> alberto: what about changing data - reindex every week?

12:39:10 <libby> dirk: if many blank nodes in data and queries then it will be difficult: optimise the rdf to remove the blank nodes

12:39:40 <libby> AndyS: transactions, management, backups, support for relational technology

12:40:06 <libby> ...all good reasons to use sql databases

12:41:38 <libby> arjohn: jdbc itself is a bottleneck

12:42:57 <AndyS> Jena experience of JDBC is that drivers return whole query to client before first result to app, not stream as required :-( (MySQL and PostgreSQL documented features)

12:43:28 <libby> boo

12:44:12 <arjohn> second andy's remark. this behaviour is very bad for memory footprint when big query results are returned:-(

12:44:17 <AndyS> Means whole result in memory at one time - big boo! No cursors

12:44:28 <libby> odd...

12:44:33 <swh> AndyS: there is a low level mysql api that has streaming

12:44:40 <swh> (but we dont use it ;)

12:45:16 <AndyS> Is that a network API? I want DB on a different machine

12:45:45 <swh> yes

12:46:13 <swh> I will move some of the 3s stuff to the streaming api, and it's all network transparent (of course)

12:46:22 <swh> the api is more fiddly than the chunked one

12:49:41 <AndyS> This is the C API isn't it? mysql_connect takes a host name

12:49:46 <swh> yup
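On the application side, the usual Python DB-API pattern for not materialising a whole result is to fetch in batches with `cursor.fetchmany()`. A minimal sketch, with `sqlite3` standing in for a networked database; as noted above for JDBC, this only bounds memory if the driver itself does not buffer the full result set.

```python
import sqlite3


def stream_rows(conn, sql, params=(), batch_size=1000):
    """Yield rows in fixed-size batches instead of calling fetchall().

    Whether this really bounds memory depends on the driver: sqlite3 steps
    through results lazily, but a driver that buffers the whole result set
    client-side gains nothing from batching here.
    """
    cursor = conn.cursor()
    cursor.execute(sql, params)
    try:
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            yield from rows
    finally:
        cursor.close()
```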

12:50:42 <AndyS> Do you use prepared statements? Could they be useful?

12:51:04 <swh> no, but I guess they could be - maybe query caching too, but it's not really an issue at the moment

12:51:10 <arjohn> we do

12:51:12 <swh> the ram is more useful for disk caches

12:51:24 <AndyS> I wondered about an app having common patterns of access

12:51:30 <arjohn> speeds up parsing but doesn't help getting the results

12:51:56 <AndyS> arjohn - is it for tuning? Or are there std prepared queries you use?

12:52:08 <swh> is parse speed an issue? I guess I could cache the pre-optimiser results

12:52:39 <arjohn> we use it while uploading data

12:52:50 <swh> ahh, right, I plan to look at that

12:56:48 <libby> jo: why can't every triple be reified in jena2?

12:57:01 <libby> daveR: because a statement can be reified multiple times

12:59:09 <libby> ---

12:59:10 <libby> joins

12:59:45 <libby> swh: left joins are bad; however for RDF they don't seem to be too bad (with indexes)

13:01:34 <swh> maybe, they were supposed to be bad, but my db theory is a little rough and probably out of date

13:05:13 <swh> urgh :-/ z39.50

13:07:02 <libby> yeah

13:09:29 <libby> ...discussion of pros and cons of extensions to RDF, e.g. for dates, geo stuff

13:11:08 <AndyS> This is a good role for a WG - identify a set of operators that all can expect

13:11:22 <AndyS> ... or is that the profiles problem again

13:11:32 <swh> profiles?

13:11:56 <libby> yeah, operators like those in xquery would be good

13:12:13 <AndyS> (application) profiles - a restriction of a broad standard to a subset

13:13:19 <AndyS> xquery ops are probably a baseline

13:14:18 <AndyS> http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm

13:15:13 <AndyS> "D2R MAP is a declarative language to describe mappings between relational database schemata and OWL ontologies"

13:22:16 <AndyS> Any use of (e.g. Lucene) text indexing engines?

13:22:21 <jeen> we do

13:23:10 <AndyS> How does it mix with the main store?

13:25:04 * AndyS waiting to hear from Jeen

13:27:01 <jeen> aw heck

13:27:42 <jeen> we integrate on the api level, so the text index is wrapped directly from source, we don't pull the data into the main (sql) store

13:27:55 <jeen> "lazy" evaluation, if you will

13:28:16 <AndyS> So it is triple match and feed value expressions to the text db?
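One possible reading of the API-level integration, sketched under assumptions: a separate inverted index over literal values is consulted at query time, and its hits are joined with an ordinary triple match, so no word-level data is copied into the main store. `TextIndex` and `text_match` are hypothetical names; a real deployment would use an engine such as Lucene rather than this toy index.

```python
import re
from collections import defaultdict


class TextIndex:
    """Hypothetical stand-alone inverted index over literal values."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of literal strings

    def add(self, literal):
        for word in re.findall(r"\w+", literal.lower()):
            self.postings[word].add(literal)

    def search(self, word):
        return self.postings.get(word.lower(), set())


def text_match(triples, predicate, word, text_index):
    """Triple-match on the predicate, keeping rows whose literal the text index returns.

    The text index is queried only when this function runs, and its hits are
    joined with the triple match; the triple store itself holds no word index.
    """
    hits = text_index.search(word)
    return [(s, p, o) for (s, p, o) in triples if p == predicate and o in hits]
```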

13:30:03 <libby> arjohn: any advantages to using digests or hashes? (digests are crypto)

13:30:15 <libby> dirk: sha1 is dirt cheap.

13:30:56 <libby> (??) sees no problem with generating any of these

13:31:07 <libby> swh: ditto - did some tests, esp md5

13:32:24 <libby> libby: tests for clashes?

13:32:36 <libby> swh: tests for it, but very very unlikely to happen

13:35:05 <swh> "probability of a hash collision occurring in a knowledge base of 500 million resources and literals is around 1:10^-10"

13:35:13 <libby> ta :)

13:35:18 <swh> np :)

13:35:23 <libby> and you use full md5 hashes?

13:35:29 <swh> nope, top half

13:35:38 <libby> ok

13:35:45 <swh> with all 128 bits it would be even safer, but too slow

13:35:45 <libby> I think I use part of a sha1

13:35:53 <libby> right, cool

13:35:55 <swh> if you like, but md5 is more common

13:36:06 <libby> not sure why we picked sha1

13:36:12 <libby> random, probably :)

13:36:23 <swh> doesnt really matter
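The truncated-hash node IDs discussed above can be sketched with Python's `hashlib`. Which 64 bits count as the "top half" is an assumption here (the first 8 bytes of the digest, read big-endian), and the collision estimate uses the standard birthday approximation n(n-1)/2^(b+1) for n uniformly random b-bit values.

```python
import hashlib


def node_id_md5_64(term: str) -> int:
    """64-bit node ID from the first half of the MD5 digest (assumed 'top half')."""
    digest = hashlib.md5(term.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest[:8], "big")


def node_id_sha1_64(term: str) -> int:
    """Same idea with a truncated SHA-1 digest (20 bytes, first 8 kept)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation for any collision among n random values of `bits` bits."""
    return n * (n - 1) / 2 / 2 ** bits
```

As the discussion suggests, the choice between MD5 and SHA-1 matters little here; the width of the kept prefix is what drives the collision odds.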

13:45:38 <libby> jeen: does more or less one-to-one mapping between serql and sql query

13:48:14 <jeen> my main point was actually that this mapping is an optional thing. if such a mapping is not possible, the gear is still there to evaluate the query.

13:48:40 <AndyS> Jena does similar - each store gets a chance to

13:48:51 <AndyS> optimize a query - or part of a query

13:49:37 <AndyS> sounds like Sesame's mapping is more sophisticated
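Mapping a query onto SQL typically turns each triple pattern into one alias of a single triples table, with shared variables becoming join conditions; this is also where the "many joins" concern comes from. A minimal sketch, assuming a `triples(s, p, o)` table and `?name` variables; it is not Sesame's or Jena's translator, and a real engine would fall back to its own evaluator for constructs it cannot map.

```python
import sqlite3


def patterns_to_sql(patterns):
    """Translate triple patterns ('?x' = variable) into one self-join SQL query.

    Each pattern becomes an alias t0, t1, ... of a single triples(s, p, o) table;
    shared variables become join conditions and constants become bound parameters.
    """
    selects, where, params = {}, [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in selects:
                    where.append(f"{ref} = {selects[term]}")  # join on shared variable
                else:
                    selects[term] = ref
            else:
                where.append(f"{ref} = ?")
                params.append(term)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    columns = ", ".join(f"{ref} AS {var[1:]}" for var, ref in selects.items()) or "1"
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params
```

Every extra pattern adds another self-join, which is why stores with long queries lean heavily on the database's optimiser.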

13:50:25 <jeen> It's not really that complex I think.

14:33:52 <jeen> http://swap.semanticweb.org/

14:34:33 <jeen> SWAP - Semantic Web and Peer 2 Peer

14:50:30 <jeen> http://bisw.ontoview.com/cgi-bin/SameIndivAs/SameIndivAs.pl

14:50:46 <jeen> (sort of relevant for aggregation discussion)

14:51:22 <jeen> tool by Borys Omelayenko that analyzes RDF data and finds 'same individuals'.

14:51:28 <jeen> application-specific though.

14:51:39 <swh> nmg wrote something similar - dont have url to hand

14:52:17 <AndyS> Any clues to find the paper (unique naming problem?!)

14:53:49 <AndyS> SIMILE is currently looking at this for matching dc:creators in image catalogues

14:54:11 <AndyS> SIMILE == http://web.mit.edu/simile/www/

15:00:27 <AndyS> nmg - any clues for that paper on "same individual" problem?

15:01:36 <nmg> let me have a look

15:01:45 <AndyS> jeen - is there a URL for Boris's work?

15:01:55 * AndyS wants to pass refs on to others on SIMILE

15:03:09 <AndyS> s/Boris/Borys/

15:06:45 <nmg> AndyS: we had a paper in EKAW2002 describing some aspects of the coreference/sameIndividualAs problem, available at http://eprints.aktors.org/archive/00000076/

15:07:48 <AndyS> Ta

15:09:43 <jeen> there's his homepage: http://www.cs.vu.nl/~borys

15:10:08 <jeen> I don't know if there is any publication about his SameIndividualAs tool, or if the code is available. I assume it is...

15:15:18 <AndyS> Getting close: http://bisw.ontoview.com/cgi-bin/SameIndivAs/SameIndivAs.pl

15:17:29 <jeen> yeah, but it's only the demo. no description or code download...

15:17:45 <AndyS> ISWC paper reference (well - name/title) anyone?

15:17:59 <libby> ---discussion on test data, - generated data better than real data, plus generated data not personal

15:18:13 <libby> ---discussion on provenance/contexts etc

15:19:06 <nmg> AndyS: Benchmarking DAML+OIL Repositories, Yuanbo Guo, Jeff Heflin and Zhengxiang Pan

15:19:28 * libby asks for contexts or something in rdf2 - for network retrieval

15:19:41 <AndyS> Thanks nmg

15:19:58 <nmg> AndyS: http://www.cse.lehigh.edu/~heflin/pubs/iswc2003.pdf

15:23:46 <jeen> http://km.aifb.uni-karlsruhe.de/ws/psss03/proceedings/macgregor-et-al.pdf

15:23:55 <jeen> paper by macgregor about contexts

15:51:10 <dajobe-lap> - end of meeting -

15:51:19 <dajobe-lap> back 09:00 GMT+1 tomorrow

The IRC chat here was automatically logged without editing and contains content written by the chat participants identified by their IRC nick. No other identity is recorded.

SWAD-Europe Workshop on Semantic Web Storage and Retrieval - IRC Chat Logs for 2003-11-13