IRC log of simile on 2003-10-16

Timestamps are in UTC.

15:01:42 [RRSAgent]
RRSAgent has joined #simile
15:01:58 [ericm]
hmm... waiting for chairperson... listening to Muzak... blech
15:02:06 [mickBass]
mickBass has joined #simile
15:02:16 [mickBass]
initiating teleconf
15:02:37 [ericm]
please stop the Muzak mick! :)
15:03:23 [mickBass]
attending: eric, John, MacKenzie, Andy, Kevin
15:03:37 [mickBass]
attending: Mark
15:03:39 [kevins2]
kevins2 has joined #simile
15:04:06 [AndyS]
Dialing ...
15:04:58 [marbut]
andy, I couldn't get the international sprint number to work, I dialled the toll free one, got a warning that I'd be charged, and then got accepted
15:06:01 [AndyS]
Ah ...
15:07:03 [Rob]
Rob has joined #simile
15:08:33 [marbut]
mickBass: corpus data update
15:09:20 [AndyS]
What was the result of the plenary discussion?
15:09:30 [marbut]
mackenzie: I sent metadata for 20 OCW courses written in IMS
15:09:42 [marbut]
unfortunately not the courses with a lot of images
15:10:01 [marbut]
I'm still looking for other IMS collections, they won't be big, but my proposal is we could take some IMS
15:10:13 [marbut]
or CIDOC data and convert it to IMS for the demo
15:10:45 [marbut]
ericm: On CIDOC, I haven't heard from Martin since the 13th, he took the sample license, but he needed to talk to the owners about the data
15:11:11 [marbut]
MacKenzie: with artstor we just didn't go there - if we needed some kind of agreement, we would have had to get the lawyers involved
15:11:25 [marbut]
ericm: Yes, I just sent the license as a basis for discussion.
15:11:36 [marbut]
mickBass: Eric, any update on the getty?
15:11:49 [marbut]
ericm: No, unfortunately not been able to get in touch with my contact
15:11:59 [marbut]
MacKenzie: you want to get it for free?
15:12:34 [marbut]
ericm: No. What we can get, what it will cost, and what we can do with it e.g. can we use it to create a web service?
15:12:55 [marbut]
MacKenzie: but that's not their intention in the short term?
15:13:15 [marbut]
mickBass: I have a tactical suggestion here. I think we should just get a license. I think it would be useful to examine this data.
15:13:46 [marbut]
ericm: the short term approach. There are two parts in my discussion - that's the first one. But what is holding the discussion up is just phone tag
15:14:12 [marbut]
mickBass: okay, we need to take the discussion offline about what it is likely to cost, what they consider the product is they are offering
15:14:29 [marbut]
the conversation about webservices could be facilitated by some demos in the project
15:16:59 [marbut]
mark: I've been looking at the CIDOC data and starting to write a transform, but I've found an RDFS schema, and I can't see at the moment
15:17:27 [Rob]
mark: example CIDOC + IMS data in CVS
15:17:37 [Rob]
not done much with IMS
15:18:05 [marbut]
how to translate it from XML to RDF in a way that fits with the schema - see email on list. I welcome input
15:18:23 [marbut]
from the team, and I intend to speak to Martin Doerr also
15:18:38 [marbut]
kevins2: now ims.
15:18:54 [marbut]
There is an area where you can find things like authors, these would be contribute fields
15:19:15 [marbut]
but often times the contribute field, it's hard to tell what the relation is between the field and the author
15:19:33 [marbut]
Sometimes it's the name of a professor who created the course, sometimes it's the artist who created the work
15:19:48 [marbut]
Often it's in prose, and it's not easy to extract from the description
15:19:55 [ericm]
q+ to ask about OCLC's involvement
15:20:19 [marbut]
As far as I can tell, there is nothing in the records to describe permissions for use of the work.
15:20:38 [marbut]
The description often says "used with permission" but nothing in the metadata.
15:20:50 [jse]
jse has joined #simile
15:20:55 [marbut]
MS: That might be due to the creative commons license used in OCW. They are trying to save labor
15:21:04 [marbut]
If you go to OCW you can find the license.
15:21:31 [marbut]
kevins2: There is some fine art in the courses that we have metadata for, e.g. a course on Goya. Also a lot of the PDF files contain images even
15:21:36 [marbut]
though they are PDF files
15:21:49 [marbut]
MS: I've only tracked down one art class at the moment.
15:21:59 [marbut]
kevins2: The Goya image was from the "Age of Reason" course
15:22:17 [marbut]
I haven't looked at all of them, some of them are mechanical image drawings.
15:22:35 [marbut]
MS: Most courses have pictures associated with them, but they are more decoration than learning objects
15:22:57 [marbut]
mickBass: we had questions a couple of weeks ago about granularity?
15:23:19 [marbut]
kevins2: There is metadata information for images, but a lot of the time it's copied from the course metadata.
15:23:32 [marbut]
Often the title is just the image filename.
15:23:50 [marbut]
MS: That's often true with title. Sometimes it's things like "picture number 3" which isn't much use.
15:24:17 [marbut]
Often with IMS they don't have time to do item level cataloguing. There are research projects going on for doing automatic keyword assignment
15:24:33 [marbut]
but for images it is really hard to get the subject from the image itself.
15:24:40 [marbut]
mickBass: Any questions?
15:25:15 [marbut]
EM: Do we have commitment from OCLC here?
15:25:20 [marbut]
MS: Like a contract?
15:25:26 [marbut]
EM: No, like a web page?
15:25:53 [marbut]
MS: When I was there last, they showed me a prototype webservice working with DSpace. You type in the name, press this verify button, then
15:26:07 [marbut]
it goes to the webservice and does the authority lookup.
15:26:33 [marbut]
That isn't quite what we want, but they didn't have input from us. But I don't think this is a short-term goal e.g. committed to the demo.
15:26:49 [marbut]
EM: It would be good if they had something on their website?
15:27:07 [marbut]
AndyS: It sounds like that service is what we want during ingestion, e.g. when we process the Artstor records.
15:27:41 [marbut]
EM: What I would like is for someone to lurk from their side and get a feel for what kind of information we would need from the name authority files
15:28:06 [marbut]
MS: I think it's Ralph Levine (?spelling), I haven't brought it up because integrating this system with their webservice will take work
15:28:20 [ericm]
Ralph LeVan
15:28:29 [marbut]
EM: It'll take work, but maybe not a lot on our end
15:28:50 [marbut]
MS: Maybe I should put Andy or Mark in touch with Ralph, it sounds like we need a more technical discussion
15:29:19 [marbut]
mickBass: Rather than developing complex heuristics for splitting up the names in the Artstor data, we could just do this in the webservice
15:29:42 [marbut]
MS: I could write him and ask him in a hypothetical way if he is willing to do that?
15:30:11 [marbut]
mickBass: I'd ask that to Mark and Andy - I haven't talked about that yet.
15:30:37 [marbut]
AndyS: Well either of us, one of us will do it, Mark is actually processing the Artstor data
15:31:20 [marbut]
MS: Let me find out what their service is going to offer, so even if we do something stupid now we know we aren't going to have to do that in the future
15:31:41 [marbut]
mickBass: We wanted to have a discussion on next steps on ArtStor data with Mark, Andy and MacKenzie
15:32:10 [marbut]
They want to know it makes sense in a domain specific way, but also so that it's portable, movable
15:35:17 [marbut]
AndyS: There are two schemas: One when we go from XML to RDF. From then we go to a representation in the VRA core vocabulary.
15:35:37 [marbut]
We've broken these up to determine which bits are data dependent and which bits are modelling dependent.
15:36:01 [marbut]
I'd like to go through the VRA core schema with someone with domain expertise to check my assumptions.
15:36:16 [marbut]
MS: That's why you're asking for help from Eric or myself?
15:36:18 [marbut]
AndyS: Yes
15:36:38 [marbut]
MS: I'm not an expert in VRA myself, I know someone involved in the spec
15:36:54 [marbut]
AndyS: I don't think we need an expert just yet
15:37:22 [marbut]
MS: I think the person who would be good for this is Tony Gill. He's very familiar with RDF, he's familiar with CIDOC too
15:37:33 [marbut]
AndyS: Could you broker a phone conference with him?
15:38:13 [marbut]
MS: When you have something I can look at, I'm happy to do that, then we can forward it to him as well.
15:38:43 [marbut]
mickBass: I think pulling Tony in that capacity would be good. It's interesting he's involved in CIDOC as well.
15:38:57 [marbut]
MS: He's involved in the Harmony / CIDOC harmonization.
15:39:36 [marbut]
Mark: Would anyone else like to look at the sample RDF data from Artstor?
15:39:54 [marbut]
EM: I have some feedback. From a programmatic standpoint the translation looked pretty close.
15:40:24 [marbut]
They use a lot of topics with these images e.g. architecture:site. One thing that might be helpful is to give these concepts URIs
15:40:53 [marbut]
Just as you've done with hasMediaFile - MacKenzie do you know what kind of taxonomy they are using here?
15:41:08 [marbut]
I need to know Topic, Subject and Type?
15:41:17 [marbut]
MS: I think they are using Getty here
15:41:29 [marbut]
EM: I don't know what topic is, is that AAT or not?
15:42:41 [marbut]
Mark: I could build a tool for investigating the taxonomy, but it might take a while
15:42:55 [marbut]
Could someone else take over the minutes?
15:43:07 [Rob]
OK
15:44:16 [Rob]
Mick: Could try and feed data through Getty identifier, see what exceptions are
15:45:31 [Rob]
Mark: Sometimes appropriate to build a controlled vocab: depends on how often terms are used
15:45:43 [ericm]
... changing 'Chichester (England)--Cathedral' to ....
15:46:00 [ericm]
<topic rdf:about = "http://example.org/Chichester (England)--Cathedral">
15:46:07 [Rob]
giving terms URIs makes them classes
15:46:07 [ericm]
<rdfs:label>Chichester (England)--Cathedral</rdfs:label>
15:46:09 [ericm]
</topic>
15:46:48 [Rob]
(disagreement)
15:47:03 [Rob]
just a label, not necessarily a class
15:48:20 [AndyS]
AndyS: Annotation needs URIs, not classes.
15:48:53 [AndyS]
AndyS: There are relationships elsewhere in the controlled vocabularies.
15:49:15 [Rob]
Mark: Added complexity without advantage?
15:49:36 [Rob]
no need unless you're specifying relationships between terms
15:50:21 [Rob]
Will need to assign URIs when you try and combine vocabs
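
A minimal sketch of the URI-assignment idea discussed above, not the stylesheet the team used. It assumes the placeholder namespace `http://example.org/` from Eric's example and percent-encodes the term, since a raw subject string containing spaces and parentheses is not a legal URI. The helper names `term_uri` and `label_triple` are hypothetical.

```python
from urllib.parse import quote

# Assumption: placeholder namespace borrowed from the example in the log.
BASE = "http://example.org/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def term_uri(term: str) -> str:
    """Percent-encode a controlled-vocabulary term so it forms a legal URI."""
    return BASE + quote(term, safe="")


def label_triple(term: str) -> str:
    """Return one N-Triples line keeping the original string as rdfs:label."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{term_uri(term)}> <{RDFS_LABEL}> "{escaped}" .'


if __name__ == "__main__":
    print(label_triple("Chichester (England)--Cathedral"))
```

This only gives each term an identifier and a label; it asserts nothing about the term being a class, which matches the "just a label, not necessarily a class" position.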
15:51:38 [Rob]
Eric: There are programmatic and declarative approaches to the data... need both approaches
15:52:51 [Rob]
discussion of resolution property
15:56:13 [Rob]
Andy: The Artstor data will produce around 1-1.5 million triples
15:56:21 [Rob]
Mark: XML file had internationalisation issues
15:56:55 [ericm]
"Fr?ta (Spain)--San Martin"
15:57:47 [Rob]
Kevin: Perl script could fix encoding errors
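
A rough sketch of what such an encoding fix might look like, written in Python rather than Perl. The log does not say what the actual encoding problem in the Artstor XML was, so this assumes one common case: UTF-8 bytes that were wrongly decoded as Latin-1. The function names are hypothetical, and strings where a character has already been replaced (as in the "Fr?ta" example) can only be flagged for manual review, not repaired.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1.

    Returns the input unchanged when the round trip is not possible,
    which is the case for text that was decoded correctly.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def is_lossy(text: str) -> bool:
    """Crude heuristic: a '?' or U+FFFD suggests a character was already lost."""
    return "\ufffd" in text or "?" in text


if __name__ == "__main__":
    # Simulate the assumed damage, then repair it.
    damaged = "Córdoba (Spain)".encode("utf-8").decode("latin-1")
    print(repair_mojibake(damaged))
    print(is_lossy("Fr?ta (Spain)--San Martin"))
```

The `is_lossy` check will also flag legitimate question marks, so its output is a review list rather than a list of confirmed errors.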
15:58:29 [mickBass]
summary
15:58:40 [mickBass]
mark fixes stylesheet per eric suggestions
15:58:53 [mickBass]
merge andy / mark observations & approaches
15:59:07 [mickBass]
break up data set - run thru stylesheet - initial corpus in RDF
15:59:26 [mickBass]
load initial corpus in RDF, check DB perf issues
15:59:55 [mickBass]
choose a suitable namespace for schemas (MIT, W3C, DSpace/SIMILE?)
16:02:41 [marbut]
mickBass: There's one important question I want EM and MS to answer
16:02:54 [marbut]
It looks like we'll have lots of Artstor data, and lots of CIDOC data
16:03:02 [marbut]
MS: Eric, how much are we going to get?
16:03:11 [marbut]
EM: I'll check
16:03:43 [marbut]
mickBass: so a fundamental goal is to get the demo script locked down, so an open question is we have two potential alternatives
16:04:06 [marbut]
one with VRA mapping to IMS, so we might be able to map or we might create some more
16:04:28 [marbut]
MS: CIDOC is more of an object model, I would have thought it doesn't deal with descriptive metadata directly
16:04:35 [marbut]
You can put anything in it.
16:04:48 [marbut]
EM: It's an object model of descriptive metadata.
16:05:06 [marbut]
MS: I'm just worried we are talking about apples and oranges.
16:05:13 [marbut]
EM: I think they'll connect.
16:05:30 [marbut]
MS: If you've got enough of them, it's no weirder than VRA to IMS.
16:05:50 [marbut]
EM: I think more of mini-mapping, there will be bits and pieces of things that connect
16:06:18 [marbut]
mickBass: Do we have a sense of the number of connection points for each?
16:06:44 [marbut]
MS: I know there will be some connection points for VRA and IMS. And we can construct a script for that.
16:06:59 [marbut]
EM: Yes, but the CIDOC data is an architectural dig. So there may be an overlap here.
16:07:19 [marbut]
(archaeology not architectural)
16:07:32 [marbut]
EM: Some of the connections happen in the topics, name authority files etc
16:08:14 [marbut]
There is work that people are doing on professors and students, to get academic genealogy. Not necessarily for the SIMILE project per se,
16:08:39 [marbut]
but when you try to draw different patterns from a faculty standpoint, areas and work, it becomes a big co-citation graph
16:08:57 [marbut]
there is a story here I haven't articulated but I think it would be cool.
16:09:49 [marbut]
mickBass: To return to next steps: for the demo script, we need to get the right set of individuals on the team to consider mapping to CIDOC and/or mapping to IMS from VRA
16:09:52 [jse]
JSE: Sounds a bit like the mapping/graphing that the jibble.org people have been playing with...
16:10:02 [marbut]
It sounds like this is still an open question
16:10:28 [marbut]
mickBass: It will happen de facto if we can't get the CIDOC data.
16:10:40 [marbut]
EM: We have the data, I just don't know what we can do with it yet.
16:10:55 [marbut]
MS: We should hear back from Martin next week, then we share it around
16:11:14 [ericm]
rrsagent, pointer?
16:11:15 [RRSAgent]
See http://www.w3.org/2003/10/16-simile-irc#T16-11-14
16:11:17 [marbut]
Then I don't think it will take long for us to nail down what we want to do. I can get a little bit IMS data but not a lot more
16:11:32 [marbut]
I don't see why we couldn't decide that in a week or two
16:11:47 [ericm]
http://www.w3.org/2003/10/16-simile-irc is now public
16:11:51 [marbut]
mickBass: I'm just pointing out it's on the critical path
16:12:15 [marbut]
once we have these three, the next steps are to continue the modelling discussions, see how it should be modelled in RDF
16:12:33 [marbut]
then we need to answer connections in the corpora
16:12:50 [marbut]
we can zero in on "these are the particular queries we would like to show"
16:13:03 [marbut]
so we are all on the same page WRT timing here
16:13:52 [marbut]
MS: I do think that's my goal, but it will evolve more at the plenary, it's hard to do this in a distributed way, but the server should be ready soon, we can put it on a protected website
16:14:07 [marbut]
mickBass: this will iterate, we want to be at first or second generation by then