15:01:42 RRSAgent has joined #simile 15:01:58 hmm... waiting for chairperson... listening to musak... blech 15:02:06 mickBass has joined #simile 15:02:16 initiating teleconf 15:02:37 please stop the musak mick! :) 15:03:23 attending: eric, John, MacKenzie, Andy, Kevin 15:03:37 attending: Mark 15:03:39 kevins2 has joined #simile 15:04:06 Dialing ... 15:04:58 andy, I couldn't get the international sprint number to work, I dialled the toll free one, got a warning that I'd be charged, and then got accepted 15:06:01 Ah ... 15:07:03 Rob has joined #simile 15:08:33 mickBass: corpus data update 15:09:20 What was the result of the plenary discussion? 15:09:30 mackenzie: I sent metadata for 20 OCW courses written in IMS 15:09:42 unfortunately not the courses with a lot of images 15:10:01 I'm still looking for other IMS collections, they won't be big, but my proposal is we could take some IMS 15:10:13 or CIDOC data and convert it to IMS for the demo 15:10:45 ericm: On CIDOC, I haven't heard from Martin since the 13th, he took the sample license, but he needed to talk to the owners about the data 15:11:11 MacKenzie: with artstor we just didn't go there - if we needed some kind of agreement, we would have had to get the lawyers involved 15:11:25 ericm: Yes, I just sent the license as a basis for discussion. 15:11:36 mickBass: Eric, any update on the getty? 15:11:49 ericm: No, unfortunately not been able to get in touch with my contact 15:11:59 MacKenzie: you want to get it for free? 15:12:34 ericm: No. What we can get, what it will cost, and what we can do with it e.g. can we use it to create a web service? 15:12:55 MacKenzie: but that's not their intention in the short term? 15:13:15 mickBass: I have a tactical suggestion here. I think we should just get a license. I think it would be useful to examine this data. 15:13:46 ericm: the short term approach. There are two parts in my discussion - that's the first one. 
But what is holding the discussion up is just phone tag 15:14:12 mickBass: okay, we need to take the discussion offline about what it is likely to cost, what they consider the product they are offering to be 15:14:29 the conversation about webservices could be facilitated by some demos in the project 15:16:59 mark: I've been looking at the CIDOC data and starting to write a transform, but I've found an RDFS schema, and I can't see at the moment 15:17:27 mark: example CIDOC + IMS data in CVS 15:17:37 not done much with IMS 15:18:05 how to translate it from XML to RDF in a way that fits with the schema - see email on list. I welcome input 15:18:23 from the team, and I intend to speak to Martin Doerr also 15:18:38 kevins2: now ims. 15:18:54 There is an area where you can find things like authors, these would be contribute fields 15:19:15 but oftentimes, with the contribute field, it's hard to tell what the relation is between the field and the author 15:19:33 Sometimes it's the name of a Professor who created the course, sometimes it's the artist who created the work 15:19:48 Often it's in prose, it's not easy to extract from the description 15:19:55 q+ to ask about OCLC's involvement 15:20:19 As far as I can tell, there is nothing in the records to describe permissions for use of the work. 15:20:38 The description often says "used with permission" but nothing in the metadata. 15:20:50 jse has joined #simile 15:20:55 MS: That might be due to the Creative Commons license used in OCW. They are trying to save labor 15:21:04 If you go to OCW you can find the license. 15:21:31 kevins2: There is some fine art in the courses that we have metadata for, e.g. a course on Goya. Also a lot of the PDF files contain images even 15:21:36 though there are PDF files 15:21:49 MS: I've only tracked down one art class at the moment. 15:21:59 kevins2: The Goya image was from the "Age of Reason" course 15:22:17 I haven't looked at all of them, some of them are mechanical image drawings.
15:22:35 MS: Most courses have pictures associated with them, but they are more decoration than learning objects 15:22:57 mickBass: we had questions a couple of weeks ago about granularity? 15:23:19 kevins2: There is metadata information for images, but a lot of the time it's copied from the course metadata. 15:23:32 Often the title is just the image filename. 15:23:50 MS: That's often true with title. Sometimes it's things like "picture number 3" which isn't much use. 15:24:17 Often with IMS they don't have time to do item level cataloguing. There are research projects going on for doing automatic keyword assignment 15:24:33 but for images it is really hard to get the subject from the image itself. 15:24:40 mickBass: Any questions? 15:25:15 EM: Do we have commitment from OCLC here? 15:25:20 MS: Like a contract? 15:25:26 EM: No, like a web page? 15:25:53 MS: When I was there last, they showed me a prototype webservice working with DSpace. You type in the name, press this verify button, then 15:26:07 it goes to the webservice and does the authority lookup. 15:26:33 That isn't quite what we want, but they didn't have input from us. But I don't think this is a short-term goal e.g. committed to the demo. 15:26:49 EM: It would be good if they had something on their website? 15:27:07 AndyS: It sounds like that service is what we want during ingestion, e.g. when we process the Artstor records.
15:27:41 EM: What I would like is for someone to lurk from their side and get a feel for what kind of information we would need from the name authority files 15:28:06 MS: I think it's Ralph Levine (?spelling), I haven't brought it up because integrating this system with their webservice will take work 15:28:20 Ralph LeVan 15:28:29 EM: It'll take work, but maybe not a lot on our end 15:28:50 MS: Maybe I should put Andy or Mark in touch with Ralph, it sounds like we need a more technical discussion 15:29:19 mickBass: Rather than developing complex heuristics for splitting up the names in the Artstor data, we could just do this in the webservice 15:29:42 MS: I could write him and ask him in a hypothetical way if he is willing to do that? 15:30:11 mickBass: I'd ask that to Mark and Andy - I haven't talked about that yet. 15:30:37 AndyS: Well either of us, one of us will do it, Mark is actually processing the Artstor data 15:31:20 MS: Let me find out what their service is going to offer, so even if we do something stupid now we know we aren't going to have to do that in the future 15:31:41 mickBass: We wanted to have a discussion on next steps on ArtStor data with Mark, Andy and MacKenzie 15:32:10 They want to know it makes sense in a domain specific way, but also so that it's portable, movable 15:35:17 AndyS: There are two schemas: One when we go from XML to RDF. From there we go to a representation in the VRA core vocabulary. 15:35:37 We've broken these up to determine which bits are data dependent and which bits are modelling dependent. 15:36:01 I'd like to go through the VRA core schema with someone with domain expertise to check my assumptions. 15:36:16 MS: That's why you're asking for help from Eric or myself? 15:36:18 AndyS: Yes 15:36:38 MS: I'm not an expert in VRA myself, I know someone involved in the spec 15:36:54 AndyS: I don't think we need an expert just yet 15:37:22 MS: I think the person who would be good for this is Tony Gill.
He's very familiar with RDF, he's familiar with CIDOC too 15:37:33 AndyS: Could you broker a phone conference with him? 15:38:13 MS: When you have something I can look at, I'm happy to do that, then we can forward it to him as well. 15:38:43 mickBass: I think pulling Tony in that capacity would be good. It's interesting he's involved in CIDOC as well. 15:38:57 MS: He's involved in the Harmony / CIDOC harmonization. 15:39:36 Mark: Would anyone else like to look at the sample RDF data from Artstor? 15:39:54 EM: I have some feedback. From a programmatic standpoint the translation looked pretty close. 15:40:24 They use a lot of topics with these images e.g. architecture:site. One thing that might be helpful is to give these concepts URIs 15:40:53 Just as you've done with hasMediaFile - MacKenzie do you know what kind of taxonomy they are using here? 15:41:08 I need to know Topic, Subject and Type? 15:41:17 MS: I think they are using Getty here 15:41:29 EM: I don't know what topic is, is that AAT or not? 15:42:41 Mark: I could build a tool for investigating the taxonomy, but it might take a while 15:42:55 Could someone else take over the minutes? 15:43:07 OK 15:44:16 Mick: Could try and feed data through Getty identifier, see what exceptions are 15:45:31 Mark: Sometimes appropriate to build a controlled vocab: depends on how often terms are used 15:45:43 ... changing 'Chichester (England)--Cathedral' to .... 15:46:00 15:46:07 giving terms URIs makes them classes 15:46:07 Chichester (England)--Cathedral 15:46:09 15:46:48 (disagreement) 15:47:03 just a label, not necessarily a class 15:48:20 AndyS: Annotation needs URIs, not classes. 15:48:53 AndyS: There are relationships elsewhere in the controlled vocabularies. 15:49:15 Mark: Added complexity without advantage? 15:49:36 no need unless you're specifying relationships between terms 15:50:21 Will need to assign URIs when you try and combine vocabs 15:51:38 Eric: There are programmatic and declarative approaches to the data...
need both approaches 15:52:51 discussion of resolution property 15:56:13 Andy: The Artstor data will produce around 1-1.5 million triples 15:56:21 Mark: XML file had internationalisation issues 15:56:55 "Fr?ta (Spain)--San Martin" 15:57:47 Kevin: Perl script could fix encoding errors 15:58:29 summary 15:58:40 mark fixes stylesheet per eric suggestions 15:58:53 merge andy / mark observations & approaches 15:59:07 break up data set - run thru stylesheet - initial corpus in RDF 15:59:26 load initial corpus in RDF, check DB perf issues 15:59:55 choose a suitable namespace for schemas (MIT, W3C, DSpace/SIMILE?) 16:02:41 mickBass: There's one important question I want EM and MS to answer 16:02:54 It looks like we'll have lots of Artstor data, and lots of CIDOC data 16:03:02 MS: Eric, how much are we going to get? 16:03:11 EM: I'll check 16:03:43 mickBass: so a fundamental goal is get the demo script locked down, so an open question is we have two potential alternatives 16:04:06 one with VRA mapping to IMS, so we might be able to map or we might create some more 16:04:28 MS: CIDOC is more of an object model, I would have thought it doesn't deal with descriptive metadata directly 16:04:35 You can put anything in it. 16:04:48 EM: It's an object model of descriptive metadata. 16:05:06 MS: I'm just worried we are talking about apples and oranges. 16:05:13 EM: I think they'll connect. 16:05:30 MS: If you've got enough of them, it's no weirder than VRA to IMS. 16:05:50 EM: I think more of mini-mapping, there will be bits and pieces of things that connect 16:06:18 mickBass: Do we have a sense of the number of connection points for each? 16:06:44 MS: I know there will be some connection points for VRA and IMS. And we can construct a script for that. 16:06:59 EM: Yes, but the CIDOC data is an architectural dig. So there may be an overlap here.
16:07:19 (archaeology not architectural) 16:07:32 EM: Some of the connections happen in the topics, name authority files etc 16:08:14 There is work that people are doing on professors and students, to get academic genealogy. Not necessarily for the SIMILE project per se, 16:08:39 but when you try to draw different patterns from a faculty standpoint, areas and work, it becomes a big co-citation graph 16:08:57 there is a story here I haven't articulated but I think it would be cool. 16:09:49 mickBass: To return to next steps: on the demo script, we need to get the right set of individuals on the team to consider mapping to CIDOC and/or mapping to IMS from VRA 16:09:52 JSE: Sounds a bit like the mapping/graphing that the jibble.org people have been playing with... 16:10:02 It sounds like this is still an open question 16:10:28 mickBass: It will happen de facto if we can't get the CIDOC data. 16:10:40 EM: We have the data, I just don't know what we can do with it yet. 16:10:55 MS: We should hear back from Martin next week, then we share it around 16:11:14 rrsagent, pointer? 16:11:15 See http://www.w3.org/2003/10/16-simile-irc#T16-11-14 16:11:17 Then I don't think it will take long for us to nail down what we want to do.
I can get a little bit of IMS data but not a lot more 16:11:32 I don't see why we couldn't decide that in a week or two 16:11:47 http://www.w3.org/2003/10/16-simile-irc is now public 16:11:51 mickBass: I'm just pointing out it's on the critical path 16:12:15 once we have these three, the next steps are to continue the modelling discussions, see how it should be modelled in RDF 16:12:33 then we need to answer connections in the corpora 16:12:50 we can zero in on "these are the particular queries we would like to show" 16:13:03 so we are all on the same page WRT timing here 16:13:52 MS: I do think that's my goal, but it will evolve more at the plenary, it's hard to do this in a distributed way, but the server should be ready soon, we can put it on a protected website 16:14:07 mickBass: this will iterate, we want to be at first or second generation by then
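Note on the term-URI discussion (15:40:24 to 15:50:21): a minimal sketch of one way to give a subject heading such as 'Chichester (England)--Cathedral' a URI while keeping the original string as a plain label, so that no class is implied. The namespace below is a placeholder, since the summary left the real choice open (MIT, W3C, DSpace/SIMILE?). The function names and the rule of splitting on "--" are assumptions made for illustration, not anything agreed in the meeting.

```python
from urllib.parse import quote

# Hypothetical namespace: the meeting left the real choice open.
BASE = "http://example.org/simile/terms/"


def split_heading(heading):
    """Split a heading such as 'Chichester (England)--Cathedral' on '--'."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def term_uri(heading, base=BASE):
    """Mint a URI by percent-encoding each part of the heading (UTF-8)."""
    parts = split_heading(heading)
    return base + "/".join(quote(part, safe="") for part in parts)


def term_record(heading):
    """Pair the minted URI with the original string, kept as a plain label."""
    return {"uri": term_uri(heading), "label": heading}


if __name__ == "__main__":
    print(term_record("Chichester (England)--Cathedral"))
```

Percent-encoding keeps non-ASCII place names usable in the URI, but it cannot restore a character already lost upstream, as in "Fr?ta (Spain)--San Martin"; that still needs fixing in the source XML.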