Next day Friday 2003-11-14
02:19:49 <dmwaters> {server notice} Ok guys, that server is you. I will restart the server once configs are done pushing and when I've updated dns
02:22:43 Disconnected from irc.freenode.net (Connection reset by peer)
02:23:06 Users on #swade: @logger
08:29:27 <NickServ> This nickname is owned by someone else
08:29:27 <NickServ> If this is your nickname, type /msg NickServ IDENTIFY <password>
09:12:48 <libby> hello nick
09:21:15 <libby> intros:
09:21:28 <libby> daveb: redland, storage, open source
09:22:10 <libby> frank: inferencing, european projects, case studies, thesauri, portals, sw apps
09:22:42 <libby> ...run 2 courses on sw for students
09:23:42 <libby> mark ? : phd, ontology mapping
09:23:56 <jeen> mark van assem
09:24:15 <libby> mike: a new developer; new user of the technology
09:24:45 <libby> jack: startup, lgpl software
09:24:57 <libby> crap, keep missing names :(
09:25:31 <libby> ?? & jeen - sesame - storage, query and inferencing, java
09:25:57 <libby> jeen: main interest in query, also inferencing
09:25:59 <jeen> arjohn
09:26:01 <nmg> arjohn and jeen
09:26:07 <libby> d'oh
09:26:35 <libby> AndyS: network retrieval, query; now a user
09:27:09 <jeen> libby: ilrt, squish, interested in applications, web services, photo metadata, co-author of foaf
09:27:14 <libby> ta :)
09:27:29 <nmg> nmg: AKT project (http://www.aktors.org/), interactive SW applications (CS AKTiveSpace), 3store RDF storage and inference
09:28:10 <libby> martin pike: stilo; xml, using and implementing SW technologies, aerospace
09:28:45 <libby> nick james: stilo - not research, industrial usage, lots of experience using databases
09:28:59 <libby> ?? works with frank
09:29:04 <libby> (sorry missed name)
09:29:16 <jeen> heiner stuckenschmidt
09:29:33 <jeen> postdoc researcher, works on distributed querying
09:29:34 <libby> ...how to use sesame for querying, physically distributed repositories, sub-parts of a query in different places and rejoin
09:29:50 <libby> steve harris: 3-store and other bits
09:30:28 <libby> jo walsh: free software developer; perl+rdf; bots, http query; rather like joseki/serql
09:30:48 * jeen is a victim of his preconceptions again :/
09:31:00 <libby> dave reynolds: hplabs, sw, wrote jena 1 db backend; now inference, owl support
09:31:52 <libby> daniel krech, mindlab; before rdflib, bdb, zope, storage, apps, foaf,
09:32:20 <libby> kevin wilkinson, hplabs, palo alto - database guy, issues of scaling, using rdf in db
09:32:41 <libby> gregorius from crete (??): visiting Frank, storage, retrieval, KR, rules
09:33:14 <libby> frank: he and gregorius have written a SW textbook
09:34:35 <libby> zack: @semantics, building real, commercial sw apps
09:35:07 <libby> alberto: @semantics, rdfstore, rdql, exchange of ideas
09:35:49 <libby> dirk: @semantics; non-rdf things are important also for commercial apps, e.g. free text
09:36:10 <libby> ?? (sorry) an end-user of sesame, web services based on that
09:36:45 <jeen> sander van splunter
09:36:46 <libby> ?? (sorry) a developer of SWI prolog, recently storage and handling of rdf in prolog
09:36:57 <arjohn> jan wielemaker
09:36:58 <dajobe-lap> ^jan
09:37:39 <libby> guus: now at Vrije Universiteit; recent interest in ontology engineering, multimedia; webont co-chair; working on best practices wg
09:37:59 <libby> ...draft charter
09:42:43 <libby> ----
09:42:52 <libby> talking briefly about position papers
09:43:43 <libby> jeen: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/aduna.pdf
09:44:03 <libby> compositionality; datatypes; schema awareness
09:45:00 <libby> rdf output
09:46:38 <arjohn> demo server: http://sesame.aduna.biz/sesame/
09:47:06 <libby> ta
09:47:36 <libby> alberto: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/asemantics-2.html
09:49:07 <libby> RDFStore: C/perl toolkit, rdfstore: graph, free text, content searching; bsd license
09:52:07 <libby> btrees: was bdb; now custom; no sql, yet; would not use features of sql
09:54:42 <libby> find a way of searching using bitmaps
10:01:27 <libby> interesting stats on retrieval speeds; dirk: no dependence on N (num statements), only on M, number of hits. hence no theoretical limit to db size
10:02:30 <libby> ditto with rdql queries
10:04:06 <libby> long-run, queries won't scale
10:05:21 <libby> 'swiss-knife problem'
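A minimal sketch of bitmap-based triple matching, to illustrate the searching-with-bitmaps idea from the RDFStore talk. It is not RDFStore's code: the class name `BitmapIndex` is hypothetical, and plain Python ints stand in for the (presumably compressed) bitmaps. The bitwise AND here still touches bitmaps whose width grows with N, so this sketch does not reproduce the claim that speed depends only on M; only the final loop runs once per hit.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical toy index: one bitset per (position, term), stored as a Python int."""

    def __init__(self):
        self.triples = []                # statement id -> (s, p, o)
        self.bitmaps = defaultdict(int)  # (position, term) -> bitset of statement ids

    def add(self, s, p, o):
        sid = len(self.triples)
        self.triples.append((s, p, o))
        for pos, term in enumerate((s, p, o)):
            self.bitmaps[(pos, term)] |= 1 << sid

    def match(self, s=None, p=None, o=None):
        """Return the triples matching every bound position (None is a wildcard)."""
        bits = (1 << len(self.triples)) - 1  # start with every statement
        for pos, term in enumerate((s, p, o)):
            if term is not None:
                bits &= self.bitmaps.get((pos, term), 0)
        hits = []
        while bits:                          # one iteration per matching statement
            lowest = bits & -bits
            hits.append(self.triples[lowest.bit_length() - 1])
            bits ^= lowest
        return hits
```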
10:08:53 <libby> Jan Wielemaker(?): http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/amsterdam.pdf
10:09:08 <libby> wants to have 3m triples running on laptop, using prolog
10:10:49 <libby> prolog not good for transitive closure
10:11:51 <libby> 3m triples loaded in 10 secs
10:12:04 <libby> from rdf/xml source about 10x slower
10:12:16 <libby> owl is a challenge
10:17:06 <libby> dave beckett: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/bristol-1.html
10:18:26 <libby> redland stores everything, not hashes, because disk access is slow
10:20:23 <libby> no schema support....thinks freetext is useful
10:22:08 <libby> jeen/dirk using special namespaces as escape-hatch e.g. for free-text searching
10:25:15 <libby> jeen: queries and contexts. 3-store does this
10:25:24 <jeen> ...and so should we :/
10:25:53 <dajobe-lap> :)
10:27:04 <libby> yeah, I was fiddling with this cos it gets right down to the problem of storage. otherwise you can't get at provenance via query...or not quickly anyway
10:27:34 <libby> kevin wilkinson(?): http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/hp-1.html
10:28:28 <jeen> true; though in our case the main bottleneck is adding support in the API. the actual storage can be solved, but changing the API without breaking stuff...
10:28:43 <jeen> we have a legacy problem it seems :)
10:29:22 <libby> bummer
10:30:11 <libby> jim ley and I have been playing with simple SQL storage patterns and: provenance makes it much slower, but, uniqueness seems to be the most important aspect.
10:30:38 <swh> libby: I didnt find that context added much to the cost
10:30:52 <nmg> uniqueness in what sense? of triples in a context?
10:31:06 <libby> kevin: jena uses sql dbs for backend storage. big diff is move from jena1: 2 tables; jena 2 - more
10:31:40 <libby> yeah, unique per context
10:32:24 <nmg> from what I remember, we found this to be expensive as well (building indexes across SPOM)
10:33:19 * libby talking at 500,000 triple level, small-scale, but significant differences at that stage
10:33:22 <swh> yeah, SPOM indexes bite too much
10:33:58 <swh> our uniq-ing code is at the app level
10:33:59 <libby> SPOM?
10:34:09 <swh> subj-pred-obj-model
10:34:12 <libby> sub pred ob provenance
10:34:15 <libby> ah, gotya
10:34:17 <nmg> (expect we should probably call them SPOC now that 'model' is unfashionable)
10:34:17 <swh> model-context
10:34:27 <swh> heh
10:34:29 <jeen> SPOMC?
10:34:37 <libby> heh, spoc!
10:34:46 <swh> SPO[CM]?
10:35:06 <AndyS> Subj-Pred-Obj-Context
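A minimal sketch of the "unique per context" rule, with the duplicate check done in application code, as swh describes for their store, rather than through a composite SPOC index in the database. `QuadStore` and its methods are hypothetical. In memory a hash set makes the check cheap; the cost reported above comes from maintaining an SPOC index inside the database, which this sketch does not model.

```python
class QuadStore:
    """Hypothetical in-memory quad store with app-level (s, p, o, context) uniqueness."""

    def __init__(self):
        self.quads = []       # append-only list of (s, p, o, c)
        self.seen = set()     # application-level uniqueness check on the full quad
        self.by_subject = {}  # single-column index: s -> positions in self.quads

    def add(self, s, p, o, c):
        quad = (s, p, o, c)
        if quad in self.seen:
            return False      # same triple already asserted in this context
        self.seen.add(quad)
        self.by_subject.setdefault(s, []).append(len(self.quads))
        self.quads.append(quad)
        return True

    def triples(self, s, context=None):
        """Yield (s, p, o) for subject s, optionally restricted to one context."""
        for i in self.by_subject.get(s, []):
            qs, qp, qo, qc = self.quads[i]
            if context is None or qc == context:
                yield (qs, qp, qo)
```

Note that the same triple in two different contexts is kept twice, which is what keeps per-context provenance queryable.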
10:35:19 <libby> kevin: interested in legacy data. sorta columns are preds in general rdbms
10:36:25 <libby> ...denormalized schema; faster than jena 1 but 2x storage
10:37:20 <libby> now looking at: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/hp-2.html
10:37:24 <libby> (I think)
10:37:38 <libby> 'pattern mining'
10:42:27 <libby> synthetic rdf generator
10:43:15 <libby> Daniel Krech up.
10:43:23 * libby doesn't see paper
10:43:36 <libby> d'oh: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/mindswap.html
10:44:37 <AndyS> Joseki has an RDFLib-based client library
10:44:45 <libby> does it?
10:44:58 <AndyS> "Pyseki"
10:45:03 <libby> heh
10:45:23 <libby> daniel: contexts. python
10:45:26 <libby> zope btrees
10:45:28 <AndyS> Want a Perl one as well
10:45:50 <AndyS> Any other language requests? (no promises!)
10:47:24 <libby> is this Jack Rusher?
10:47:25 <libby> http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/rusher.html
10:47:34 <dajobe-lap> yes
10:47:36 <jeen> yep
10:48:15 <libby> b-tree impl from scratch; read rather than write, long ints; 3 indexes
10:48:37 <libby> sorted longint sets; binary search, mapped to string table
10:48:52 <jeen> sounds interesting as a backend system.
10:50:01 <libby> java. no ql stuff yet. seems to scale well and perform well
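A minimal sketch of the design outlined in this talk: terms interned into a string table as integers, three sorted orderings of the integer triples, and binary search to find the run of entries sharing a prefix. Python lists and `bisect` stand in for on-disk B-trees; the class and method names are hypothetical, and the choice of SPO, POS and OSP orderings is an assumption, since the talk only mentioned "3 indexes".

```python
import bisect


class SortedTripleIndex:
    """Hypothetical read-optimised store: string table plus three sorted int-tuple indexes."""

    def __init__(self):
        self.ids = {}      # string table: term -> int id
        self.strings = []  # int id -> term
        self.spo, self.pos, self.osp = [], [], []

    def _intern(self, term):
        if term not in self.ids:
            self.ids[term] = len(self.strings)
            self.strings.append(term)
        return self.ids[term]

    def add(self, s, p, o):
        # insort keeps each list sorted but costs O(n) per insert: reads are favoured over writes.
        s, p, o = self._intern(s), self._intern(p), self._intern(o)
        bisect.insort(self.spo, (s, p, o))
        bisect.insort(self.pos, (p, o, s))
        bisect.insort(self.osp, (o, s, p))

    @staticmethod
    def _range(index, prefix):
        # Tuples compare lexicographically, so every entry starting with `prefix`
        # lies between `prefix` and the same prefix with its last element plus one.
        lo = bisect.bisect_left(index, prefix)
        hi = bisect.bisect_left(index, prefix[:-1] + (prefix[-1] + 1,))
        return index[lo:hi]

    def objects(self, s, p):
        if s not in self.ids or p not in self.ids:
            return []
        rows = self._range(self.spo, (self.ids[s], self.ids[p]))
        return [self.strings[o] for _, _, o in rows]

    def subjects(self, p, o):
        if p not in self.ids or o not in self.ids:
            return []
        rows = self._range(self.pos, (self.ids[p], self.ids[o]))
        return [self.strings[s] for _, _, s in rows]

    def incoming(self, o):
        if o not in self.ids:
            return []
        rows = self._range(self.osp, (self.ids[o],))
        return [(self.strings[s], self.strings[p]) for _, s, p in rows]
```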
10:53:01 <libby> steve: 3-store http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/soton.html
10:53:12 <libby> needed somewhere to put 50m triples
10:53:37 <libby> soundness but not completeness; some corner cases not covered.
10:53:39 <libby> mysql
10:54:02 <nmg> it was 15M not 50M...
10:54:02 * AndyS interested in finding out what the corner cases are
10:54:13 <dajobe-lap> later...
10:54:39 <libby> oops sorry nmg
10:55:07 <libby> lots of interactive apps, need to keep speed up. inserts slower, needs more work.
10:55:45 <nmg> AndyS: the gist is that we evaluate the entailment rules in a fixed order and only pass over most of them once - we get the useful entailments that we need for our applications from this approach
10:55:48 <libby> now about 26m triples, 152 bytes/triple; 5gig; ints, free-text searching
10:56:12 <libby> ...Joseki-like remote api
10:56:43 <AndyS> Wonder what the schema for the result set is?
10:56:47 <libby> [that's cool. /me wants to steal your data...]
10:57:15 <libby> ...has context-extension to rdql
10:57:24 <nmg> AndyS: we don't have a formal schema for it (coughcough), but it would be easy enough to reverse engineer one from the data
10:57:31 <libby> ...store data about whether parse successful, date and time etc.
10:57:43 <nmg> libby: data is in http://triplestore.aktors.org/data/
10:57:48 <libby> plan to support 'owl-tiny'
10:57:49 <libby> ooh
10:58:09 <libby> distributed storage and query to increase scalability
10:58:15 <AndyS> NMG: What about http://www.w3.org/2003/03/rdfqr-tests/recording-query-results.html ??
10:58:40 <nmg> OWL-Tiny is our name for the subset of OWL-Lite that contains property characteristics and equality/inequality relations from OWL Lite
10:59:04 <libby> are you supporting inverseFunctionalProperty?
10:59:45 <nmg> AndyS: ah. well, I'm sure we'll add support for that (I don't think that it was around when we wrote the XML return format)
11:00:05 <nmg> libby: as yet, no, but high on the wishlist
11:00:09 <libby> --lunchtime--
11:00:10 <libby> cool
11:00:25 <jeen> libby, you're only saying that because you've never been in our canteen yet...
11:00:32 <libby> heh
11:00:32 <nmg> context-based truth maintenance is probably the top of the list, OWL-Tiny probably after that
11:00:36 <nmg> swh may disagree, of course ;)
11:20:49 <CaptSolo> hi all :)
11:32:12 <CaptSolo> everybody's busy at the workshop?
11:37:56 <Wack> busy with lunch, probably :]
11:39:06 <CaptSolo> ah, true, lunch is a VERY important part :>
11:39:36 <CaptSolo> it would be interesting to hear about the workshop...
11:39:50 <CaptSolo> isn't somebody there who can do real-time blog? ;>
11:40:23 <Wack> libby is reporting stuff here
12:01:05 <CaptSolo> logger url
12:01:10 <CaptSolo> logger help
12:01:18 <CaptSolo> :/
12:04:21 * nmg returns from lunch
12:04:53 <CaptSolo> wack: you were right about lunch i guess ;]
12:05:28 <CaptSolo> (is there an URL where one can see this channel logged?)
12:07:06 <Wack> logger pointer?
12:07:08 <Wack> logger pointer
12:07:35 <libby> http://ilrt.org/discovery/chatlogs/swade/2003-11-13.html
12:09:12 <dajobe-lap> it should be live now
12:11:21 <libby> dajobe-lap: the other loggers seem to be dead :(
12:11:53 <dajobe-lap> hm
12:12:10 <libby> nick james: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/stilo.html
12:12:24 <libby> (there was a massive netsplit last night)
12:12:32 <libby> using maths with RDF
12:13:11 <afs> afs is now known as AndyS
12:13:13 <libby> simple rdf storage - for presentation
12:13:18 <libby> not searching
12:14:11 <libby> martin pike: trying to capture the process of making a plane, so that when people leave or die, don't lose that info
12:14:59 <libby> scaling will become important
12:15:49 <swh> version control for RDF is an interesting subject
12:16:29 <libby> jo walsh: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/walsh.html
12:16:35 <CaptSolo> swh: that might be interesting, true
12:16:37 <libby> interested in restful api
12:16:47 <CaptSolo> swh: has there been a presentation on version control?
12:16:58 <swh> CaptSolo: the last speaker mentioned it
12:17:13 <swh> (they have some support for branches or similar)
12:17:23 <CaptSolo> they = some software?
12:17:30 <swh> yup
12:17:38 <libby> jo: returns a graph that returns more than what you ask for
12:17:40 <CaptSolo> what software?
12:18:02 <libby> described here I think... http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/stilo.html
12:18:12 <jeen> omm supports change tracking as well actually.
12:18:23 <libby> jo: better ways of representing context
12:18:42 <libby> ...e.g. ericp's summary
12:18:51 <CaptSolo> swh: CVS would not be sufficient for RDF as we must control the triples, not the order how it is presented in textual form
12:19:16 <libby> ...sending queries over http api...dreaming of an ideal ql - like serql
12:19:34 <swh> CaptSolo: yes, you need to diff the triples - I came up with a diff-like algorithm, but an efficient (linear or near) implementation is hard
12:20:12 <nmg> diffing the triples isn't enough - you need to take account of bNodes
12:20:33 <nmg> jeremy carroll had something about RDF graph canonicalisation at ISWC which is relevant here
12:20:44 <swh> bNodes are all different from all other bNodes, unless they have IDs
12:21:07 <swh> it's a subset of the canonicalisation problem
12:23:02 <nmg> well, the id assigned to a bNode may change from one version to the next, while the graph structure itself remains the same. is the bNode in one version really different from the bNode in the other?
12:23:29 <swh> in diff terms, yes
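A minimal sketch of the label-based triple diff discussed above, assuming N-Triples style `_:` labels mark bNodes; the function names are hypothetical. Hash-set difference keeps this near-linear, and it deliberately exhibits the limitation nmg raises: relabelling a bNode shows up as a removal plus an addition even when the graph structure is unchanged. Matching bNodes by structure instead is closely related to graph isomorphism, which is why that version needs a canonicalisation step.

```python
def is_bnode(term):
    """In this sketch, blank nodes are terms written with the '_:' prefix."""
    return term.startswith("_:")


def diff_graphs(old, new):
    """Return (removed, added) for two graph versions, each a collection of (s, p, o).

    Triples are compared as plain tuples, so a bNode only matches a bNode
    with the same label ("in diff terms, yes").
    """
    old, new = set(old), set(new)
    return old - new, new - old


def bnode_changes(changes):
    """Count changed triples that mention a bNode: the ones a relabelling might explain."""
    return sum(1 for triple in changes if any(is_bnode(t) for t in triple))
```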
12:25:52 <libby> ---
12:26:01 <libby> relational databases
12:26:06 <libby> one true schema?
12:28:00 <libby> kevin: built-in query optimisers
12:28:18 <libby> dirk: work with rdf schemas?
12:28:30 <jeen> was that what he asked?
12:29:15 <libby> I was trying to say: work with the sorts of db schemas used for RDF?
12:29:24 <libby> I'm not sure that he asked that either :)
12:29:29 <jeen> heh
12:30:09 <libby> daveR: an overflow structure and optimisations. not translated to a fixed schema
12:30:20 <libby> ==jena2
12:31:03 <libby> swh: many dbs don't like multiple joins
12:32:16 <libby> nmg: didn't have time to write our own
12:33:14 <libby> dirk: probbaly ought to reexamine backend if so many joins
12:34:30 <libby> nick james: optimisers can't gather statistics effectively in triples bucket structure
12:35:46 <libby> swh: uses partly backchaining, partially forward-chaining
12:36:10 <libby> jeen: exhaustive forward chaining operation
12:36:23 <libby> ...seems to work quite well :)
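A minimal sketch of exhaustive forward chaining: apply the rules to the whole store repeatedly until no new triples appear (a fixpoint). Only two RDFS entailment rules are shown, rdfs9 (type inheritance through `rdfs:subClassOf`) and rdfs11 (transitivity of `rdfs:subClassOf`); this is not Sesame's rule set or code, and the nested loops are quadratic, whereas a real store would use indexes and only re-examine newly derived triples.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def forward_chain(triples):
    """Apply rdfs9 and rdfs11 exhaustively; return the closed set of triples."""
    closure = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        derived = set()
        for sub, sup in subclass:
            # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
            for sub2, sup2 in subclass:
                if sup == sub2:
                    derived.add((sub, SUBCLASS, sup2))
            # rdfs9: (A subClassOf B), (x type A) => (x type B)
            for s, p, o in closure:
                if p == TYPE and o == sub:
                    derived.add((s, TYPE, sup))
        new = derived - closure
        if not new:  # fixpoint: nothing left to infer
            return closure
        closure |= new
```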
12:37:59 <libby> alberto: what about changing data - reindex every week?
12:39:10 <libby> dirk: if many blank nodes in data and queries then it can be difficult: optimise the rdf to remove the blank nodes
12:39:40 <libby> AndyS: transactions, management, backups, support for relational technology
12:40:06 <libby> ...all good reasons to use sql databases
12:41:38 <libby> arjohn: jdbc itself is a bottleneck
12:42:57 <AndyS> Jena experience of JDBC is that drivers return whole query to client before first result to app, not stream as required :-( (MySQL and PostgreSQL documented features)
12:43:28 <libby> boo
12:44:12 <arjohn> second andy's remark. this behaviour is very bad for memory footprint when big query results are returned:-(
12:44:17 <AndyS> Means whole result in memory at one time - big boo! No cursors
12:44:28 <libby> odd...
12:44:33 <swh> AndyS: there is a low level mysql api that has streaming
12:44:40 <swh> (but we dont use it ;)
12:45:16 <AndyS> Is that a network API? I want DB on a different machine
12:45:45 <swh> yes
12:46:13 <swh> I will move some of the 3s stuff to the streaming api, and it's all nw transparent (of course)
12:46:22 <swh> the api is more fiddly than the chunked one
12:49:41 <AndyS> This is the C API isn't it? mysql_connect takes a host name
12:49:46 <swh> yup
12:50:42 <AndyS> Do you use prepared statements? Could they be useful?
12:51:04 <swh> no, but I guess they could be - maybe query caching too, but it's not really an issue at the moment
12:51:10 <arjohn> we do
12:51:12 <swh> the ram is more useful for disk caches
12:51:24 <AndyS> I wondered about an app having common patterns of access
12:51:30 <arjohn> speeds up parsing but doesn't help getting the results
12:51:56 <AndyS> arjohn - is it for tuning? Or are there std prepared queries you use?
12:52:08 <swh> is parse speed an issue? I guess I could cache the pre-optimiser results
12:52:39 <arjohn> we use it while uploading data
12:52:50 <swh> ahh, right, I plan to look at that
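A minimal sketch of the two patterns in this thread, using Python's built-in `sqlite3` module purely as a stand-in for JDBC and MySQL: one parameterised INSERT reused for a bulk upload, and a query whose rows are pulled from the cursor one at a time rather than with `fetchall()`. The table layout and function names are hypothetical. Whether rows really arrive incrementally depends on the driver, which is exactly the complaint about the JDBC drivers above.

```python
import sqlite3
from itertools import islice


def load_triples(conn, triples):
    # One parameterised statement for the whole upload: prepared once,
    # then re-bound with each triple's values.
    conn.executemany("INSERT INTO triples (s, p, o) VALUES (?, ?, ?)", triples)
    conn.commit()


def stream_matches(conn, predicate):
    """Yield (s, o) rows one by one instead of materialising the whole result."""
    cursor = conn.execute("SELECT s, o FROM triples WHERE p = ? ORDER BY s", (predicate,))
    for row in cursor:  # rows are handed to Python one at a time
        yield row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
load_triples(conn, [(f"ex:item{i:04d}", "ex:value", str(i)) for i in range(1000)])
first_three = list(islice(stream_matches(conn, "ex:value"), 3))
```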
12:56:48 <libby> jo: why can't every triple be reified in jena2?
12:57:01 <libby> daveR: because a statement can be reified multiple times
12:59:09 <libby> ---
12:59:10 <libby> joins
12:59:45 <libby> swh: left joins are bad; however for RDF they don't seem to be too bad (with indexes)
13:01:34 <swh> maybe; they were supposed to be bad, but my db theory is a little rough and probably out of date
13:05:13 <swh> urgh :-/ z39.50
13:07:02 <libby> yeah
13:09:29 <libby> ...discussion of pros and cons of extensions to RDF, e.g. for dates, geo stuff
13:11:08 <AndyS> This is a good role for a WG - identify a set of operators that all can expect
13:11:22 <AndyS> ... or is that the profiles problem again
13:11:32 <swh> profiles?
13:11:56 <libby> yeah, operators like those in xquery would be good
13:12:13 <AndyS> (application) profiles - a restriction of a broad standard to a subset
13:13:19 <AndyS> xquery ops are probably a baseline
13:14:18 <AndyS> http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm
13:15:13 <AndyS> "D2R MAP is a declarative language to describe mappings between relational database schemata and OWL ontologies"
13:22:16 <AndyS> Any use of (e.g. Lucene) text indexing engines?
13:22:21 <jeen> we do
13:23:10 <AndyS> How does it mix with the main store?
13:25:04 * AndyS waiting to hear from Jeen
13:27:01 <jeen> aw heck
13:27:42 <jeen> we integrate on the api level, so the text index is wrapped directly from source, we don't pull the data into the main (sql) store
13:27:55 <jeen> "lazy" evaluation, if you will
13:28:16 <AndyS> So it is triple match and feed value expressions to the text db?
13:30:03 <libby> arjohn: any advantages to using digests or hashes? (digests are crypto)
13:30:15 <libby> dirk: sha1 is dirt cheap.
13:30:56 <libby> (??) sees no problem with generating any of these
13:31:07 <libby> swh: ditto - did some tests, esp md5
13:32:24 <libby> libby: tests for clashes?
13:32:36 <libby> swh: tests for it, but very very unlikely to happen
13:35:05 <swh> "probability of a hash collision occurring in a knowledge base of 500 million resources and literals is around 1:10^-10"
13:35:13 <libby> ta :)
13:35:18 <swh> np :)
13:35:23 <libby> and you use full md5 hashes?
13:35:29 <swh> nope, top half
13:35:38 <libby> ok
13:35:45 <swh> with all 128 bits it would be even safer, but too slow
13:35:45 <libby> I think I use part of a sha1
13:35:53 <libby> right, cool
13:35:55 <swh> if you like, but md5 is more common
13:36:06 <libby> not sure why we picked sha1
13:36:12 <libby> random, probably :)
13:36:23 <swh> doesnt really matter
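A minimal sketch of the hashing discussed above, using Python's `hashlib`: a 64-bit node id taken from half of an MD5 digest (as swh describes) or from part of a SHA-1 digest (as libby describes). Which half is used, the byte order, and how URIs are kept apart from literals with the same text are assumptions here, not details from either store; the function names are hypothetical.

```python
import hashlib


def md5_node_id(term: str) -> int:
    """64-bit id from the first 8 bytes of the MD5 digest (assumed to be the "top half")."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sha1_node_id(term: str, bits: int = 64) -> int:
    """Id from the leading `bits` bits of the SHA-1 digest; `bits` must be a multiple of 8."""
    if bits % 8 or not 8 <= bits <= 160:
        raise ValueError("bits must be a multiple of 8 between 8 and 160")
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")
```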
13:45:38 <libby> jeen: does more or less 1-2-1 mapping between serql and sql query
13:48:14 <jeen> my main point was actually that this mapping is an optional thing. if such a mapping is not possible, the gear is still there to evaluate the query.
13:48:40 <AndyS> Jena does similar - each store gets a chance to
13:48:51 <AndyS> optimize a query - or part of a query
13:49:37 <AndyS> sounds like Sesame's mapping is more sophisticated
13:50:25 <jeen> It's not really that complex I think.
14:33:52 <jeen> http://swap.semanticweb.org/
14:34:33 <jeen> SWAP - Semantic Web and Peer 2 Peer
14:50:30 <jeen> http://bisw.ontoview.com/cgi-bin/SameIndivAs/SameIndivAs.pl
14:50:46 <jeen> (sort of relevant for aggregation discussion)
14:51:22 <jeen> tool by Borys Omalyaenko that analyzes RDF data and finds 'same individuals'.
14:51:28 <jeen> application-specific though.
14:51:39 <swh> nmg wrote something similar - dont have url to hand
14:52:17 <AndyS> Any clues to find the paper (unique naming problem?!)
14:53:49 <AndyS> SIMILE is currently looking at this for matching dc:creators in image catalogues
14:54:11 <AndyS> SIMILE == http://web.mit.edu/simile/www/
15:00:27 <AndyS> nmg - any clues for that paper on "same individual" problem?
15:01:36 <nmg> let me have a look
15:01:45 <AndyS> jeen - is there a URL for Borys's work?
15:01:55 * AndyS wants to pass refs on to others on SIMILE
15:06:45 <nmg> AndyS: we had a paper in EKAW2002 describing some aspects of the coreference/sameIndividualAs problem, available at http://eprints.aktors.org/archive/00000076/
15:07:48 <AndyS> Ta
15:09:43 <jeen> there's his homepage: http://www.cs.vu.nl/~borys
15:10:08 <jeen> I don't know if there is any publication about his SameIndividualAs tool, or if the code is available. I assume it is...
15:15:18 <AndyS> Getting close: http://bisw.ontoview.com/cgi-bin/SameIndivAs/SameIndivAs.pl
15:17:29 <jeen> yeah, but it's only the demo. no description or code download...
15:17:45 <AndyS> ISWC paper reference (well - name/title) anyone?
15:17:59 <libby> ---discussion on test data: generated data better than real data, plus generated data not personal
15:18:13 <libby> ---discussion on provenance/contexts etc
15:19:06 <nmg> AndyS: Benchmarking DAML+OIL Repositories, Yuanbo Guo, Jeff Heflin and Zhengxiang Pan
15:19:28 * libby asks for contexts or something in rdf2 - for network retrieval
15:19:41 <AndyS> Thanks nmg
15:19:58 <nmg> AndyS: http://www.cse.lehigh.edu/~heflin/pubs/iswc2003.pdf
15:23:46 <jeen> http://km.aifb.uni-karlsruhe.de/ws/psss03/proceedings/macgregor-et-al.pdf
15:23:55 <jeen> paper by macgregor about contexts
15:51:10 <dajobe-lap> - end of meeting -
15:51:19 <dajobe-lap> back 09:00 GMT+1 tomorrow
The IRC chat here was automatically logged without editing and contains content written by the chat participants identified by their IRC nick. No other identity is recorded.