IRC log of simile on 2003-10-16
Timestamps are in UTC.
- 15:01:42 [RRSAgent]
- RRSAgent has joined #simile
- 15:01:58 [ericm]
- hmm... waiting for chairperson... listening to Muzak... blech
- 15:02:06 [mickBass]
- mickBass has joined #simile
- 15:02:16 [mickBass]
- initiating teleconf
- 15:02:37 [ericm]
- please stop the Muzak mick! :)
- 15:03:23 [mickBass]
- attending: eric, John, MacKenzie, Andy, Kevin
- 15:03:37 [mickBass]
- attending: Mark
- 15:03:39 [kevins2]
- kevins2 has joined #simile
- 15:04:06 [AndyS]
- Dialing ...
- 15:04:58 [marbut]
- andy, I couldn't get the international Sprint number to work, I dialled the toll free one, got a warning that I'd be charged, and then got accepted
- 15:06:01 [AndyS]
- Ah ...
- 15:07:03 [Rob]
- Rob has joined #simile
- 15:08:33 [marbut]
- mickBass: corpus data update
- 15:09:20 [AndyS]
- What was the result of the plenary discussion?
- 15:09:30 [marbut]
- mackenzie: I sent metadata for 20 OCW courses written in IMS
- 15:09:42 [marbut]
- unfortunately not the courses with a lot of images
- 15:10:01 [marbut]
- I'm still looking for other IMS collections, they won't be big, but my proposal is we could take some IMS
- 15:10:13 [marbut]
- or CIDOC data and convert it to IMS for the demo
- 15:10:45 [marbut]
- ericm: On CIDOC, I haven't heard from Martin since the 13th, he took the sample license, but he needed to talk to the owners about the data
- 15:11:11 [marbut]
- MacKenzie: with artstor we just didn't go there - if we needed some kind of agreement, we would have had to get the lawyers involved
- 15:11:25 [marbut]
- ericm: Yes, I just sent the license as a basis for discussion.
- 15:11:36 [marbut]
- mickBass: Eric, any update on the Getty?
- 15:11:49 [marbut]
- ericm: No, unfortunately not been able to get in touch with my contact
- 15:11:59 [marbut]
- MacKenzie: you want to get it for free?
- 15:12:34 [marbut]
- ericm: No. What we can get, what it will cost, and what we can do with it e.g. can we use it to create a web service?
- 15:12:55 [marbut]
- MacKenzie: but that's not their intention in the short term?
- 15:13:15 [marbut]
- mickBass: I have a tactical suggestion here. I think we should just get a license. I think it would be useful to examine this data.
- 15:13:46 [marbut]
- ericm: the short term approach. There are two parts in my discussion - that's the first one. But what is holding the discussion up is just phone tag
- 15:14:12 [marbut]
- mickBass: okay, we need to take the discussion offline about what it is likely to cost, what they consider the product is they are offering
- 15:14:29 [marbut]
- the conversation about webservices could be facilitated by some demos in the project
- 15:16:59 [marbut]
- mark: I've been looking at the CIDOC data and starting to write a transform, but I've found an RDFS schema, and I can't see at the moment
- 15:17:27 [Rob]
- mark: example CIDOC + IMS data in CVS
- 15:17:37 [Rob]
- not done much with IMS
- 15:18:05 [marbut]
- how to translate it from XML to RDF in a way that fits with the schema - see email on list. I welcome input
- 15:18:23 [marbut]
- from the team, and I intend to speak to Martin Doerr also
- 15:18:38 [marbut]
- kevins2: now ims.
- 15:18:54 [marbut]
- There is an area where you can find things like authors, these would be contribute fields
- 15:19:15 [marbut]
- but oftentimes with the contribute field it's hard to tell what the relation is between the field and the author
- 15:19:33 [marbut]
- Sometimes it's the name of a professor who created the course, sometimes it's the artist who created the work
- 15:19:48 [marbut]
- Often it's in prose, it's not easy to extract from the description
- 15:19:55 [ericm]
- q+ to ask about OCLC's involvement
- 15:20:19 [marbut]
- As far as I can tell, there is nothing in the records to describe permissions for use of the work.
- 15:20:38 [marbut]
- The description often says "used with permission" but nothing in the metadata.
- 15:20:50 [jse]
- jse has joined #simile
- 15:20:55 [marbut]
- MS: That might be due to the Creative Commons license used in OCW. They are trying to save labor
- 15:21:04 [marbut]
- If you go to OCW you can find the license.
- 15:21:31 [marbut]
- kevins2: There is some fine art in the courses that we have metadata for, e.g. a course on Goya. Also a lot of the PDF files contain images even
- 15:21:36 [marbut]
- though they are PDF files
- 15:21:49 [marbut]
- MS: I've only tracked down one art class at the moment.
- 15:21:59 [marbut]
- kevins2: The Goya image was from the "Age of Reason" course
- 15:22:17 [marbut]
- I haven't looked at all of them, some of them are mechanical image drawings.
- 15:22:35 [marbut]
- MS: Most courses have pictures associated with them, but they are more decoration than learning objects
- 15:22:57 [marbut]
- mickBass: we had questions a couple of weeks ago about granularity?
- 15:23:19 [marbut]
- kevins2: There is metadata information for images, but a lot of the time it's copied from the course metadata.
- 15:23:32 [marbut]
- Often the title is just the image filename.
- 15:23:50 [marbut]
- MS: That's often true with title. Sometimes it's things like "picture number 3" which isn't much use.
- 15:24:17 [marbut]
- Often with IMS they don't have time to do item level cataloguing. There are research projects going on for doing automatic keyword assignment
- 15:24:33 [marbut]
- but for images it is really hard to get the subject from the image itself.
- 15:24:40 [marbut]
- mickBass: Any questions?
- 15:25:15 [marbut]
- EM: Do we have commitment from OCLC here?
- 15:25:20 [marbut]
- MS: Like a contract?
- 15:25:26 [marbut]
- EM: No, like a web page?
- 15:25:53 [marbut]
- MS: When I was there last, they showed me a prototype webservice working with DSpace. You type in the name, press this verify button, then
- 15:26:07 [marbut]
- it goes to the webservice and does the authority lookup.
- 15:26:33 [marbut]
- That isn't quite what we want, but they didn't have input from us. But I don't think this is a short-term goal e.g. committed to the demo.
- 15:26:49 [marbut]
- EM: It would be good if they had something on their website?
- 15:27:07 [marbut]
- AndyS: It sounds like that service is what we want during ingestion, e.g. when we process the Artstor records.
- 15:27:41 [marbut]
- EM: What I would like is for someone to lurk from their side and get a feel for what kind of information we would need from the name authority files
- 15:28:06 [marbut]
- MS: I think it's Ralph Levine (?spelling), I haven't brought it up because integrating this system with their webservice will take work
- 15:28:20 [ericm]
- Ralph LeVan
- 15:28:29 [marbut]
- EM: It'll take work, but maybe not a lot on our end
- 15:28:50 [marbut]
- MS: Maybe I should put Andy or Mark in touch with Ralph, it sounds like we need a more technical discussion
- 15:29:19 [marbut]
- mickBass: Rather than developing complex heuristics for splitting up the names in the Artstor data, we could just do this in the webservice
- 15:29:42 [marbut]
- MS: I could write him and ask him in a hypothetical way if he is willing to do that?
- 15:30:11 [marbut]
- mickBass: I'd ask that of Mark and Andy - I haven't talked about that yet.
- 15:30:37 [marbut]
- AndyS: Well either of us, one of us will do it, Mark is actually processing the Artstor data
- 15:31:20 [marbut]
- MS: Let me find out what their service is going to offer, so even if we do something stupid now we know we aren't going to have to do that in the future
- 15:31:41 [marbut]
- mickBass: We wanted to have a discussion on next steps on Artstor data with Mark, Andy and MacKenzie
- 15:32:10 [marbut]
- They want to know it makes sense in a domain specific way, but also so that it's portable, movable
- 15:35:17 [marbut]
- AndyS: There are two schemas: One when we go from XML to RDF. From then we go to a representation in the VRA core vocabulary.
- 15:35:37 [marbut]
- We've broken these up to determine which bits are data dependent and which bits are modelling dependent.
- 15:36:01 [marbut]
- I'd like to go through the VRA core schema with someone with domain expertise to check my assumptions.
- 15:36:16 [marbut]
- MS: That's why you're asking for help from Eric or myself?
- 15:36:18 [marbut]
- AndyS: Yes
- 15:36:38 [marbut]
- MS: I'm not an expert in VRA myself, I know someone involved in the spec
- 15:36:54 [marbut]
- AndyS: I don't think we need an expert just yet
- 15:37:22 [marbut]
- MS: I think the person who would be good for this is Tony Gill. He's very familiar with RDF, he's familiar with CIDOC too
- 15:37:33 [marbut]
- AndyS: Could you broker a phone conference with him?
- 15:38:13 [marbut]
- MS: When you have something I can look at, I'm happy to do that, then we can forward it to him as well.
- 15:38:43 [marbut]
- mickBass: I think pulling Tony in that capacity would be good. It's interesting he's involved in CIDOC as well.
- 15:38:57 [marbut]
- MS: He's involved in the Harmony / CIDOC harmonization.
- 15:39:36 [marbut]
- Mark: Would anyone else like to look at the sample RDF data from Artstor?
- 15:39:54 [marbut]
- EM: I have some feedback. From a programmatic standpoint the translation looked pretty close.
- 15:40:24 [marbut]
- They use a lot of topics with these images e.g. architecture:site. One thing that might be helpful is to give these concepts URIs
- 15:40:53 [marbut]
- Just as you've done with hasMediaFile - MacKenzie do you know what kind of taxonomy they are using here?
- 15:41:08 [marbut]
- I need to know Topic, Subject and Type?
- 15:41:17 [marbut]
- MS: I think they are using Getty here
- 15:41:29 [marbut]
- EM: I don't know what topic is, is that AAT or not?
- 15:42:41 [marbut]
- Mark: I could build a tool for investigating the taxonomy, but it might take a while
- 15:42:55 [marbut]
- Could someone else take over the minutes?
- 15:43:07 [Rob]
- OK
- 15:44:16 [Rob]
- Mick: Could try and feed data through Getty identifier, see what exceptions are
- 15:45:31 [Rob]
- Mark: Sometimes appropriate to build a controlled vocab: depends on how often terms are used
- 15:45:43 [ericm]
- ... changing 'Chichester (England)--Cathedral' to ....
- 15:46:00 [ericm]
- <topic rdf:about = "http://example.org/Chichester (England)--Cathedral">
- 15:46:07 [Rob]
- giving terms URIs makes them classes
- 15:46:07 [ericm]
- <rdfs:label>Chichester (England)--Cathedral</rdfs:label>
- 15:46:09 [ericm]
- </topic>
- 15:46:48 [Rob]
- (disagreement)
- 15:47:03 [Rob]
- just a label, not necessarily a class
- 15:48:20 [AndyS]
- AndyS: Annotation needs URIs, not classes.
- 15:48:53 [AndyS]
- AndyS: There are relationships elsewhere in the controlled vocabularies.
- 15:49:15 [Rob]
- Mark: Added complexity without advantage?
- 15:49:36 [Rob]
- no need unless you're specifying relationships between terms
- 15:50:21 [Rob]
- Will need to assign URIs when you try and combine vocabs
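
The point about giving terms URIs without turning them into classes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the namespace `http://example.org/topic/` and the helper names `topic_uri` and `label_triple` are hypothetical and do not come from the discussion. The label is percent-encoded so that spaces and parentheses are legal in the URI, and the original string is kept as an `rdfs:label`.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration; the choice of namespace is left open in the log.
BASE = "http://example.org/topic/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def topic_uri(label: str, base: str = BASE) -> str:
    """Percent-encode the whole label so spaces and parentheses are legal in a URI."""
    return base + quote(label, safe="")


def label_triple(label: str) -> str:
    """Return one N-Triples line attaching the original string as rdfs:label."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{topic_uri(label)}> <{RDFS_LABEL}> "{escaped}" .'


if __name__ == "__main__":
    print(label_triple("Chichester (England)--Cathedral"))
```

Nothing here asserts a class or a relationship between terms; the term only gains an identifier that annotations and other vocabularies can point at.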
- 15:51:38 [Rob]
- Eric: There are programmatic and declarative approaches to the data... need both approaches
- 15:52:51 [Rob]
- discussion of resolution property
- 15:56:13 [Rob]
- Andy: The Artstor data will produce around 1-1.5 million triples
- 15:56:21 [Rob]
- Mark: XML file had internationalisation issues
- 15:56:55 [ericm]
- "Fr?ta (Spain)--San Martin"
- 15:57:47 [Rob]
- Kevin: Perl script could fix encoding errors
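
A first step toward that kind of clean-up might be to find the damaged terms. The sketch below is in Python rather than the Perl that Kevin mentioned, and it only flags suspect strings instead of repairing them, because the log does not say what the original encoding was. The pattern and the name `suspect_terms` are assumptions made for illustration: a term is flagged if it contains the Unicode replacement character or a "?" wedged between two letters, as in the "Fr?ta" example.

```python
import re

# U+FFFD, or a "?" between two letters (as in "Fr?ta"), suggests a lost character.
SUSPECT = re.compile(r"[^\W\d_]\?[^\W\d_]|\ufffd")


def suspect_terms(terms):
    """Return the terms that look like they lost a character during conversion."""
    return [t for t in terms if SUSPECT.search(t)]


if __name__ == "__main__":
    sample = ["Fr?ta (Spain)--San Martin", "Chichester (England)--Cathedral"]
    print(suspect_terms(sample))
```

The flagged list could then be checked by hand or against the source records before any automatic repair is attempted.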
- 15:58:29 [mickBass]
- summary
- 15:58:40 [mickBass]
- mark fixes stylesheet per eric suggestions
- 15:58:53 [mickBass]
- merge andy / mark observations & approaches
- 15:59:07 [mickBass]
- break up data set - run thru stylesheet - initial corpus in RDF
- 15:59:26 [mickBass]
- load initial corpus in RDF, check DB perf issues
- 15:59:55 [mickBass]
- choose a suitable namespace for schemas (MIT, W3C, DSpace/SIMILE?)
- 16:02:41 [marbut]
- mickBass: There's one important question I want EM and MS to answer
- 16:02:54 [marbut]
- It looks like we'll have lots of Artstor data, and lots of CIDOC data
- 16:03:02 [marbut]
- MS: Eric, how much are we going to get?
- 16:03:11 [marbut]
- EM: I'll check
- 16:03:43 [marbut]
- mickBass: so a fundamental goal is to get the demo script locked down, so an open question is we have two potential alternatives
- 16:04:06 [marbut]
- one with VRA mapping to IMS, so we might be able to map or we might create some more
- 16:04:28 [marbut]
- MS: CIDOC is more of an object model, I would have thought it doesn't deal with descriptive metadata directly
- 16:04:35 [marbut]
- You can put anything in it.
- 16:04:48 [marbut]
- EM: It's an object model of descriptive metadata.
- 16:05:06 [marbut]
- MS: I'm just worried we are talking about apples and oranges.
- 16:05:13 [marbut]
- EM: I think they'll connect.
- 16:05:30 [marbut]
- MS: If you've got enough of them, it's no weirder than VRA to IMS.
- 16:05:50 [marbut]
- EM: I think more of mini-mapping, there will be bits and pieces of things that connect
- 16:06:18 [marbut]
- mickBass: Do we have a sense of the number of connection points for each?
- 16:06:44 [marbut]
- MS: I know there will be some connection points for VRA and IMS. And we can construct a script for that.
- 16:06:59 [marbut]
- EM: Yes, but the CIDOC data is an architectural dig. So there may be an overlap here.
- 16:07:19 [marbut]
- (archaeology not architectural)
- 16:07:32 [marbut]
- EM: Some of the connections happen in the topics, name authority files etc
- 16:08:14 [marbut]
- There is work that people are doing on professors and students, to get academic genealogy. Not necessarily for the SIMILE project per se,
- 16:08:39 [marbut]
- but when you try to draw different patterns from a faculty standpoint, areas and work, it becomes a big co-citation graph
- 16:08:57 [marbut]
- there is a story here I haven't articulated but I think it would be cool.
- 16:09:49 [marbut]
- mickBass: To return to next steps, the demo script needs the right set of individuals on the team to consider mapping to CIDOC and/or mapping to IMS from VRA
- 16:09:52 [jse]
- JSE: Sounds a bit like the mapping/graphing that the jibble.org people have been playing with...
- 16:10:02 [marbut]
- It sounds like this is still an open question
- 16:10:28 [marbut]
- mickBass: It will happen de facto if we can't get the CIDOC data.
- 16:10:40 [marbut]
- EM: We have the data, I just don't know what we can do with it yet.
- 16:10:55 [marbut]
- MS: We should hear back from Martin next week, then we share it around
- 16:11:14 [ericm]
- rrsagent, pointer?
- 16:11:15 [RRSAgent]
- See http://www.w3.org/2003/10/16-simile-irc#T16-11-14
- 16:11:17 [marbut]
- Then I don't think it will take long for us to nail down what we want to do. I can get a little bit of IMS data but not a lot more
- 16:11:32 [marbut]
- I don't see why we couldn't decide that in a week or two
- 16:11:47 [ericm]
- http://www.w3.org/2003/10/16-simile-irc is now public
- 16:11:51 [marbut]
- mickBass: I'm just pointing out it's on the critical path
- 16:12:15 [marbut]
- once we have these three, the next steps are to continue the modelling discussions, see how it should be modelled in RDF
- 16:12:33 [marbut]
- then we need to answer connections in the corpora
- 16:12:50 [marbut]
- we can zero in on "these are the particular queries we would like to show"
- 16:13:03 [marbut]
- so we are all on the same page WRT timing here
- 16:13:52 [marbut]
- MS: I do think that's my goal, but it will evolve more at the plenary, it's hard to do this in a distributed way, but the server should be ready soon, we can put it on a protected website
- 16:14:07 [marbut]
- mickBass: this will iterate, we want to be at first or second generation by then