02:19:49 {server notice} Ok guys, that server is you. I will restart the server once configs are done pushing and when I've updated dns
02:22:43 Disconnected from irc.freenode.net (Connection reset by peer)
02:23:06 logger has joined #swade
02:23:06 Users on #swade: @logger
08:29:27 This nickname is owned by someone else
08:29:27 If this is your nickname, type /msg NickServ IDENTIFY
09:12:27 nmg has joined #swade
09:12:41 libby has joined #swade
09:12:48 hello nick
09:18:32 jeen has joined #swade
09:21:15 intros:
09:21:28 daveb: redland, storage, open source
09:22:10 frank: inferencing, european projects, case studies, thesauri, portals, sw apps
09:22:42 ...run 2 courses on sw for students
09:23:08 Wack has joined #swade
09:23:30 AndyS has joined #swade
09:23:42 mark ? : phd, ontology mapping
09:23:56 mark van assem
09:24:15 mike: a new developer; new user of the technology
09:24:45 jack: startup, lgpl software
09:24:57 crap, keep missing names :(
09:25:31 ?? & jeen - sesame - storage query and inferencing, java
09:25:57 jeen: main interest in query, also inferencing
09:25:59 arjohn
09:26:01 arjohn and jeen
09:26:07 d'oh
09:26:35 AndyS: network retrieval, query; now a user
09:27:09 libby: ilrt, squish, interested in applications, web services, photo metadata, coauthor of foaf
09:27:14 ta :)
09:27:29 nmg: AKT project (http://www.aktors.org/), interactive SW applications (CS AKTiveSpace), 3store RDF storage and inference
09:28:10 martin pike: stilo; xml, using and implementing SW technologies, aerospace
09:28:45 nick james: stilo - not research, industrial usage, lots of experience using databases
09:28:59 ??
works with frank
09:29:04 (sorry missed name)
09:29:16 heiner stuckenschmidt
09:29:33 postdoc researcher, works on distributed querying
09:29:34 ...how to use sesame for querying, physically distributed repositories, sub-parts of a query in different places and rejoin
09:29:50 steve harris: 3-store and other bits
09:30:28 jo walsh: free software developer; perl+rdf; bots, http query; rather like joseki/serql
09:30:48 * jeen is a victim of his preconceptions again :/
09:31:00 dave reynolds: hplabs, sw, wrote jena 1 db backend; now inference, owl support
09:31:06 dajobe-lap has joined #swade
09:31:52 daniel krech, mindlab; before rdflib, bdb, zope, storage, apps, foaf,
09:32:20 kevin wilkinson, hplabs, palo alto - database guy, issues of scaling, using rdf in db
09:32:41 gregorio from crete (??): visiting Frank, storage, retrieval, KR, rules
09:32:51 gregorius
09:33:14 frank: he and gregorius have written a SW textbook
09:34:35 zack: @semantics, building real, commercial sw apps
09:35:07 alberto: @semantics, rdfstore, rdql, exchange of ideas
09:35:08 arjohn has joined #swade
09:35:24 arjohn has quit
09:35:40 arjohn has joined #swade
09:35:49 dirk: @semantics; non-rdf things are important also for commercial apps, e.g. freetxt
09:36:10 ?? (sorry) an end-user of sesame, web services based on that
09:36:45 sander van splunter
09:36:46 ??
(sorry) a developer of SWI prolog, recently storage and handling of rdf in prolog
09:36:57 jan wielemaker
09:36:58 ^jan
09:37:39 guus: now at vrij university; recent interest in ontology engineering, multimedia; webont co-chair; working on best practices wg
09:37:59 ...draft charter
09:42:43 ----
09:42:52 talking briefly about position papers
09:43:43 jeen: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/aduna.pdf
09:44:03 compositionality; datatypes; schema awareness
09:45:00 rdf output
09:46:38 demo server: http://sesame.aduna.biz/sesame/
09:47:06 ta
09:47:36 alberto: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/asemantics-2.html
09:49:07 RDFStore: C/perl toolkit, rdfstore: graph, freetext, content searching; bsd license
09:52:07 btrees: was bdb; now custom; no sql, yet; would not use features of sql
09:54:32 swh has joined #swade
09:54:42 find a way of searching using bitmaps
10:01:27 interesting stats on retrieval speeds; dirk: no dependence on N (num statements) only on M, number of hits. hence no theoretical limit to db size
10:02:30 ditto with rdql queries
10:04:06 lond-run, queries won't scale
10:05:02 long-run
10:05:21 'swiss-knife problem'
10:08:53 Jan Wielemaker(?): http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/amsterdam.pdf
10:09:08 wants to have 3m triples running on laptop, using prolog
10:10:49 prolog not good for transitive closure
10:11:51 3m triples loaded in 10 secs
10:12:04 from rdf/xml source about 10x slower
10:12:16 owl is a challenge
10:17:06 dave beckett: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/bristol-1.html
10:18:26 redalnd stores everything, not hashes, because disk access is slow
10:19:50 redland, even
10:20:23 no schema support....thinks freetext is useful
10:22:08 jeen/dirk using special namespaces as escape-hatch e.g. for free-text searching
10:25:15 jeen: queries and contexts.
3-store does this
10:25:24 ...and so should we :/
10:25:53 :)
10:27:04 yeah, I was fiddling with this cos it gets right down to the problem of storage. otherwise you can't get at provenance via query...or not quickly anyway
10:27:34 kevin wilkinson(?): http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/hp-1.html
10:28:28 true; though in our case the main bottleneck is adding support in the API. the actual storage can be solved, but changing the API without breaking stuff...
10:28:43 we have a legacy problem it seems :)
10:29:22 bummer
10:30:11 jim ley and I have been playing with simple SQL storage patterns and: provenance makes it much slower, but, uniqueness seems to be the most important aspect.
10:30:38 libby: I didn't find that context added much to the cost
10:30:52 uniqueness in what sense? of triples in a context?
10:31:06 kevin: jena uses sql dbs for backend storage. big diff is move from jena1: 2 tables; jena 2 - more
10:31:40 yeah, unique per context
10:32:24 from what I remember, we found this to be expensive as well (building indexes across SPOM)
10:33:19 * libby talking at 500,000 triple level, small-scale, but significant differences at that stage
10:33:22 yeah, SPOM indexes bite too much
10:33:58 are uniq-ing code is at the app level
10:33:59 SPOM?
10:34:01 *our
10:34:09 subj-pred-obj-model
10:34:12 sub pred ob provenance
10:34:15 ah, gotya
10:34:17 (expect we should probably call them SPOC now that 'model' is unfashionable)
10:34:17 model-context
10:34:27 heh
10:34:29 SPOMC?
10:34:37 heh, spoc!
10:34:46 SPO[CM]?
10:35:06 Subj-Pred-Obj-Context
10:35:19 kevin: interested in legacy data. sorta columns are preds in general rdbms
10:36:25 ...denormalized schema; faster than jena 1 but 2x storage
10:37:20 now looking at: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/hp-2.html
10:37:24 (I think)
10:37:38 'pattern mining'
10:42:27 synthetic rdf generator
10:43:15 Daniel Krech up.
10:43:23 * libby doesn't see paper
10:43:36 d'oh: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/mindswap.html
10:44:37 Joseki has an RDFLib-bases
10:44:45 does it?
10:44:51 Joseki has an RDFLib-based client library
10:44:58 "Pyseki"
10:45:03 heh
10:45:23 daniel: contexts. python
10:45:26 zope btrees
10:45:28 Want a Perl one as well
10:45:50 Any other language requests? (no promises!)
10:47:24 is this Jack Rusher?
10:47:25 http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/rusher.html
10:47:34 yes
10:47:36 yep
10:48:15 b-tree impl from scratch; read rather than write, long ints; 3 indexes
10:48:37 sorted longint sets; binary search, mapped to string table
10:48:52 sounds interesting as a backend system.
10:50:01 java. no ql stuff yet. seems to scale well and perform well
10:53:01 steve: 3-store http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/soton.html
10:53:12 needed somewhere to put 50m triples
10:53:37 soundness but not completeness; some corner cases not covered.
10:53:39 mysql
10:54:02 it was 15M not 50M...
10:54:02 * AndyS interested in finding out what the corner cases are
10:54:13 later...
10:54:39 oops sorry nmg
10:55:07 lots of interactive apps, need to keep speed up. inserts slower, needs more work.
10:55:45 AndyS: the gist is that we evaluate the entailment rules in a fixed order and only pass over most of them once - we get the useful entailments that we need for our applications from this approach
10:55:48 now about 26m triples, 152 bytes/triple; 5gig. ints, freetext searching
10:56:12 ...joseki-like remote api
10:56:43 Wonder what the schema for the result set is?
10:56:47 [that's cool. /me wants to steal your data...]
10:57:15 ...has context-extension to rdql
10:57:24 AndyS: we don't have a formal schema for it (coughcough), but it would be easy enough to reverse engineer one from the data
10:57:31 ...store data about whether parse successful, date and time etc.
10:57:43 libby: data is in http://triplestore.aktors.org/data/
10:57:48 plan to support 'owl-tiny'
10:57:49 ooh
10:58:09 distributed storage and query to increase scalability
10:58:15 NMG: What about http://www.w3.org/2003/03/rdfqr-tests/recording-query-results.html ??
10:58:40 OWL-Tiny is our own name for the subset of OWL-Lite that contains property characteristics and equality/inequality relations from OWL Lite
10:59:04 are you supporting inverseFunctionalProperty?
10:59:45 AndyS: ah. well, I'm sure we'll add support for that (I don't think that it was around when we wrote the XML return format)
10:59:57 arjohn has quit
11:00:05 libby: as yet, no, but high on the wishlist
11:00:09 --lunchtime--
11:00:10 cool
11:00:25 libby, you're only saying that because you've never been in our canteen yet...
11:00:32 heh
11:00:32 context-based truth maintenance is probably the top of the list, OWL-Tiny probably after that
11:00:36 swh may disagree, of course ;)
11:20:45 CaptSolo has joined #swade
11:20:49 hi all :)
11:32:12 everybody's busy at the workshop?
11:37:56 busy with lunch, probably :]
11:38:35 jeen has quit
11:39:06 ah, true, lunch is a VERY important part :>
11:39:36 it would be interesting to hear about the workshop...
11:39:50 isn't somebody there who can do real-time blog? ;>
11:40:23 libby is reporting stuff here
12:01:05 logger url
12:01:10 logger help
12:01:18 :/
12:04:21 * nmg returns from lunch
12:04:53 wack: you were right about lunch i guess ;]
12:05:28 (is there an URL where one can see this channel logged?)
12:07:06 logger pointer?
12:07:08 logger pointer
12:07:35 http://ilrt.org/discovery/chatlogs/swade/2003-11-13.html
12:09:12 it should be live now
12:09:17 afs has joined #swade
12:09:49 AndyS has quit
12:11:21 dajobe-lap: the other loggers seem to be dead :(
12:11:53 hm
12:11:59 arjohn has joined #swade
12:12:10 nick james: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/stilo.html
12:12:24 (there was a massive netsplit last night)
12:12:32 using maths with RDF
12:13:04 jeen has joined #swade
12:13:11 afs is now known as AndyS
12:13:13 simple rdf storage - for presentation
12:13:18 not searching
12:14:11 martin pike: trying to capture the process of making a plane, so that when people leave or die, don't lose that info
12:14:59 scaling will become important
12:15:49 version control for RDF is an interesting subject
12:16:29 jo walsh: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/walsh.html
12:16:35 swh: that might be interesting, true
12:16:37 interested in restful api
12:16:47 swh: has there been a presentation on version control?
12:16:58 CaptSolo: the last speaker mentioned it
12:17:13 (they have some support for branches or similar)
12:17:23 they = some software?
12:17:30 yup
12:17:38 jo: returns a graph that returns more than what you ask for
12:17:40 what software?
12:18:02 described here I think...http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/stilo.html
12:18:12 omm support change tracking as well actually.
12:18:23 jo: better ways of representing context
12:18:42 ...e.g. ericp's summary
12:18:51 swh: CVS would not be sufficient for RDF as we must control the triples, not the order how it is presented in textual form
12:19:16 ...sending queries over http api...dreaming of an ideal ql - like serql
12:19:34 CaptSolo: yes, you need to diff the triples - I came up with a diff like algorithm, be efficient (linear or near) implementation is hard
12:19:47 ...but efficient...
12:20:12 diffing the triples isn't enough - you need to take account of bNodes
12:20:33 jeremy carroll had something about RDF graph canonicalisation at ISWC which is relevant here
12:20:44 bnodes are all different from all other bnodes, unless they have IDs
12:21:07 it's a subset of the canon. problem
12:23:02 well, the id assigned to a bNode may change from one version to the next, while the graph structure itself remains the same. is the bNode in one version really different from the bNode in the other?
12:23:29 in diff terms, yes
12:25:52 ---
12:26:01 relational databases
12:26:06 one true schema?
12:28:00 kevin: built-in query optimisers
12:28:18 dirk: work with rdf schemas?
12:28:30 was that what he asked?
12:29:15 I was trying to say: work with the sorts of db schemas used for RDF?
12:29:24 I'm not sure that he asked that either :)
12:29:29 heh
12:30:09 daveR: an overflow structure and optimisations. not translated to a fixed schema
12:30:20 ==jena2
12:31:03 swh: many dbs don't like multiple joins
12:32:16 nmg: didn't have time to write our own
12:33:14 dirk: probably ought to reexamine backend if so many joins
12:34:30 nick james: optimisers can't gather statistics effectively in triples bucket structure
12:35:46 swh: uses partly backchaining, partially forward-chaining
12:36:10 jeen: exhaustive forward chaining operation
12:36:23 ...seems to work quite well :)
12:37:59 albert: what about changing data - reindex every week?
12:39:10 dirk: if many blank nodes in data and queries then be difficult: optimise the rdf to remove the blank nodes
12:39:40 AndyS: transactions, management, backups, support for relational technology
12:40:06 ...all good reasons to use sql databases
12:41:38 arjohn: jdbc itself is a bottleneck
12:42:57 Jena experience of JDBC is that drivers return whole query to client before first result to app, not stream as required :-( (MySQL and PostgreSQL documented features)
12:43:28 boo
12:44:07 eikeon has joined #swade
12:44:12 second andy's remark. this behaviour is very bad for memory footprint when big query results are returned:-(
12:44:17 Means whole result in memory at one time - big boo! No cursors
12:44:28 odd...
12:44:33 AndyS: there is a low level mysql api that has streaming
12:44:40 (but we don't use it ;)
12:45:16 Is that a network API? I want DB on a different machine
12:45:45 yes
12:46:13 I will move some of the 3s stuff to the streaming api, and it's all nw transparent (of course)
12:46:22 the api is more fiddly than the chunked one
12:47:09 JackRusher has joined #swade
12:47:45 JackRusher has quit
12:47:49 JackRusher has joined #swade
12:49:41 This is the C API isn't it? mysql_connect takes a host name
12:49:46 yup
12:50:42 Do you use prepared statements? Could they be useful?
12:51:04 no, but I guess they could be - maybe query caching too, but it's not really an issue at the moment
12:51:10 we do
12:51:12 the ram is more useful for disk caches
12:51:12 JackRusher has quit
12:51:24 I wondered about an app having common patterns of access
12:51:30 speeds up parsing but doesn't help getting the results
12:51:56 arjohn - is it for tuning? Or are there std prepared queries you use?
12:52:08 is parse speed an issue? I guess I could cache the pre-optimiser results
12:52:30 JackRusher has joined #swade
12:52:39 we use it while uploading data
12:52:50 ahh, right, I plan to look at that
12:56:48 jo: why can't every triple be reified in jena2?
12:57:01 daveR: because a statement can be reified multiple times
12:59:09 ---
12:59:10 joins
12:59:45 swh: left joins are bad; however for RDF they don't seem to be too bad (with indexes)
13:01:34 maybe, were supposed to be bad, my db theory is a little rough and probably out of date
13:05:13 urgh :-/ z39.50
13:07:02 yeah
13:09:29 ...discussion of pros and cons of extensions to RDF, e.g. for dates, geo stuff
13:11:08 This is a good role for a WG - identify a set of operators that all can expect
13:11:22 ... or is that the profiles problem again
13:11:32 profiles?
13:11:56 yeah, operators like those in xquery would be good
13:12:13 (application) profiles - a restriction of a broad standard to a subset
13:13:19 xquery ops are probably a baseline
13:14:18 http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm
13:15:13 "D2R MAP is a declarative language to describe mappings between relational database schemata and OWL ontologies"
13:22:16 Any use of (e.g. Lucene) text indexing engines?
13:22:21 we do
13:23:10 How does it mix with the main store?
13:25:04 * AndyS waiting to hear from Jeen
13:27:01 aw heck
13:27:42 we integrate on the api level, so the text index is wrapped directly from source, we don't pull the data into the main (sql) store
13:27:55 "lazy" evaluation, if you will
13:28:16 So it is triple match and feed value expressions to the text db?
13:30:03 arjohn: any advantages to using digests or hashes? (digests are crypto)
13:30:15 dirk: sha1 is dirt cheap.
13:30:56 (??) sees no problem with generating any of these
13:31:07 swh: ditto - did some tests, esp md5
13:32:24 libby: tests for clashes?
13:32:36 swh: tests for it, but very very unlikely to happen
13:35:05 "probability of a hash collision occurring in a knowledge base of 500 million resources and literals is around 1:10^-10"
13:35:13 ta :)
13:35:18 np :)
13:35:23 and you use full md5 hashes?
13:35:29 nope, top half
13:35:38 ok
13:35:45 with all 128 bits it would be even safer, but too slow
13:35:45 I think I use part of a sha1
13:35:53 right, cool
13:35:55 if you like, but md5 is more common
13:36:06 not sure why we picked sha1
13:36:12 random, probably :)
13:36:23 doesn't really matter
13:45:38 jeen: does more or less 1-2-1 mapping between serql and sql query
13:48:14 my main point was actually that this mapping is an optional thing. if such a mapping is not possible, the gear is still there to evaluate the query.
13:48:40 Jena does similar - each store gets a chance to
13:48:51 optimize a query - or part of a query
13:49:37 sounds like Sesame's mapping is more sophisticated
13:49:37 libby has quit
13:49:41 JackRusher has quit
13:50:03 libby has joined #swade
13:50:25 It's not really that complex I think.
13:51:20 JackRusher has joined #swade
14:32:19 jeen has quit
14:32:47 jeen has joined #swade
14:33:52 http://swap.semanticweb.org/
14:34:33 SWAP - Semantic Web and Peer 2 Peer
14:48:34 arjohn has quit
14:50:30 http://bisw.ontoview.com/cgi-bin/SameIndivAs/SameIndivAs.pl
14:50:46 (sort of relevant for aggregation discussion)
14:51:22 tool by Borys Omalyaenko that analyzes RDF data and finds 'same individuals'.
14:51:28 application-specific though.
14:51:39 nmg wrote something similar - don't have url to hand
14:52:17 Any clues to find the paper (unique naming problem?!)
14:53:49 SIMILE is currently looking at this for matching dc:creators in image catalogues
14:54:11 SIMILE == http://web.mit.edu/simile/www/
15:00:27 nmg - any clues for that paper on "same individual" problem?
15:01:36 let me have a look
15:01:45 jeen - is there a URL for Boris's work?
15:01:55 * AndyS wants to pass refs on to others on SIMILE
15:03:09 s/Boris/Borys/
15:06:45 AndyS: we had a paper in EKAW2002 describing some aspects of the coreference/sameIndividualAs problem, available at http://eprints.aktors.org/archive/00000076/
15:07:48 Ta
15:09:43 there's his homepage: http://www.cs.vu.nl/~borys
15:10:08 I don't know if there is any publication about his SameIndividualAs tool, or if the code is available. I assume it is...
15:11:39 arjohn has joined #swade
15:15:18 Getting close: http://bisw.ontoview.com/cgi-bin/SameIndivAs/SameIndivAs.pl
15:17:29 yeah, but it's only the demo. no description or code download...
15:17:45 ISWC paper reference (well - name/title) anyone?
15:17:59 ---discussion on test data, - generated data better than real data, plus generated data not personal
15:18:13 ---discussion on provenance/contexts etc
15:19:06 AndyS: Benchmarking DAML+OIL Repositories, Yuanbo Guo, Jeff Heflin and Zhengxiang Pan
15:19:28 * libby asks for contexts or something in rdf2 - for network retrieval
15:19:41 Thanks nmg
15:19:58 AndyS: http://www.cse.lehigh.edu/~heflin/pubs/iswc2003.pdf
15:23:46 http://km.aifb.uni-karlsruhe.de/ws/psss03/proceedings/macgregor-et-al.pdf
15:23:55 paper by macgregor about contexts
15:30:26 jeen has quit
15:30:32 jeen has joined #swade
15:48:21 arjohn has quit
15:48:21 JackRusher has quit
15:49:23 jeen has quit
15:50:12 swh has quit
15:51:10 - end of meeting -
15:51:17 nmg has quit
15:51:19 back 09:00 GMT+1 tomorrow
15:53:25 AndyS has quit
15:57:55 eikeon has quit
16:06:54 dajobe-lap has quit
16:12:00 libby has quit
17:53:54 CaptSolo has quit
17:53:54 Wack has quit
17:54:01 CaptSolo has joined #swade
17:54:38 Wack has joined #swade
18:20:53 Wack has quit
18:20:53 CaptSolo has quit
18:21:40 Wack has joined #swade
18:21:40 CaptSolo has joined #swade
18:22:45 Wack has quit
18:23:07 Wack has joined #swade
18:23:10 CaptSolo has quit
18:23:13 CaptSolo has joined #swade