DSpace W3 Orientation

On October 11, people from W3C met with people from HP/MIT's DSpace project. Following are the proceedings.
Participants

Ralph Swick - W3C
Dan Brickley - W3C
Marja Koivunen - W3C
Art Barstow - W3C, HP
Eric Prud'hommeaux - W3C
Dave Beckett - ILRT
Mick Bass - HP
Margaret Benchofsky - MIT library
David Stuve - MIT
Joyce - MIT master's student
Bill Catty - MIT information systems
Minutes

Ralph Swick, Dan Brickley, Art Barstow, Eric Prud'hommeaux and Dave Beckett arrived from W3C

met Mick Bass, Margaret Benchofsky and David Stuve.


Mick Bass:
  project leader for DSpace
  HP employee

  DSpace has MIT, HP, and contractors

Eric:

Ralph Swick:
  RDF is a long-time project.
  integrated with other work under name of metadata
    though it's all data
  Danny W and ??? have had prior conversations, but thanks to Art and Mick

Margaret Benchofsky:
  long-time librarian
  faculty contact for DSpace
  UI interests

Dan Brickley:
  RDF interest group
  RDF schema spec
  other .5 time: working with Dave in Bristol

Dave Beckett:
  working on metadata for 5 years
  developing metadata library called Redland

David Stuve:
  HP donation to MIT for next 1.5 years
  DSpace design work
    archive aspects
    metadata storage
  may also be doing some UI stuff

Ralph:
  we have several metadata projects
  in the middle of defining next phase of metadata activity
  Art is here to do RDF tool development
  1 year ago we got .5 danbri from Bristol
  Bristol work is separately coordinated

danbri:
  another connection - HP Labs in Bristol
  4-5 ILRT developers in lunch powwows

joyce:
  master's student working with Mick
  Hal Abelson is thesis advisor

sheet 1:
  purpose:
    establish W3C-DSpace relationship
    get feedback - avoid blindspots

  goals:
    W3C understand DSpace
    DSpace understand frontier/direction

sheet 2:
  Steps:
    intros - done
    DSpace Metadata Approach - dstuve
    W3C - RDF activities Review - art
    W3C - Metadata activities Review - ralph
    free-for-all
    lunch!

---- DSpace Metadata Approach ----
dstuve:
  sponsored by HP and MIT libraries
  goal: institutional intellectual capital archive
  others: LANL (physics + math), CogNet (MIT Press)
  lots of domain-specific 
<enter peter>
  umbrella archive for everything produced by MIT
  cross-discipline - therefore need metadata abstraction
  +------------------------------------+
  |          math subspace             |
  |  periodicals   MIT press   theses  |
  +------------------------------------+
  |        physics subspace            |
  ...

Mick:
  there is metadata associated with the subspace itself as well as the individual ?things?

dstuve:
  Asset store - metadata and document store
<enter bill catty>
  Services layer - web server, harvesting, sharing, replication

Ralph:
  maintain access to papers in perpetuity - how about intellectual property rights?

dstuve:
  indeed it's a big part - hoping to be a pioneer - hope to cause conflict in the area

Mick:
  Eric Celeste has 1.5 positions to look at business models
  view libraries as a source rather than a cost center

Bill Catty:
  MIT information systems
  learned from Dublin Core, Warwick Framework
    must let disciplines define their own metadata set [schema - ed]
    must integrate with existing discipline-specific metadata sets (e.g. MEDLINE)
  automatic import of metadata standards

Mick:
  fundamental hard problem:
    continuum between efficiency of search and ability to extend
    hard coded tables <-----------> abstract

Ralph:
  multiple metrics of efficiency: DB efficiency, query creation efficiency, schema maintenance efficiency

Eric:
  each select decomposes to a join
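
[ed: a minimal sketch of the trade-off discussed above, not something shown at the meeting. It assumes the "abstract" end of Mick's continuum is a single generic (subject, property, value) table. The table names, column names and sample values are hypothetical. With that assumption, each additional property selected costs one more self-join, while the hard-coded table needs a schema change for every new property.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Hard-coded tables" end of the continuum: one column per metadata field.
conn.execute("CREATE TABLE thesis (id TEXT PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO thesis VALUES ('doc1', 'On Widgets', 'A. Student')")

# "Abstract" end (assumed layout): one generic statement table.
conn.execute("CREATE TABLE statement (subject TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO statement VALUES (?, ?, ?)",
    [
        ("doc1", "title", "On Widgets"),
        ("doc1", "author", "A. Student"),
        ("doc1", "advisor", "B. Professor"),  # new property, no schema change
    ],
)

# Fixed table: selecting two fields is a single-table read.
fixed_row = conn.execute(
    "SELECT title, author FROM thesis WHERE id = 'doc1'"
).fetchone()

# Generic table: each selected property becomes one more self-join.
abstract_row = conn.execute(
    """
    SELECT t.value, a.value
    FROM statement AS t
    JOIN statement AS a ON a.subject = t.subject
    WHERE t.property = 'title'
      AND a.property = 'author'
      AND t.subject = 'doc1'
    """
).fetchone()

print(fixed_row)
print(abstract_row)
```

[ed: both queries return the same row; the difference is where the cost lands, which is the point of Ralph's "multiple metrics" remark.]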

art:
  do you see

Marg:
  yes

Ralph:
  slow evolution

Ralph:
  you hope to absorb a lot of data from existing repositories

Marg:
  will ask end users to submit

dstuve:
  bulk import is a way to jumpstart and build credibility/interest
  ?subspaces is an illusion to orient user?

Mick:
  there is an interest that disciplines publish their own stable schema
  how much can we automate schema import

Eric:
  not always necessary for mere data slurp and burp

danbri:
  how about UI extensions

Mick:
  that and what mods need to be made to the data store

Ralph:
  what sort of user incentives?

Marg:
  not a huge amount of incentive.
  math discipline has expressed this need before
  oceanography needs a system anyway

Ralph:
  need access to old data

Mick:
  feedthrough mechanism to share data with existing repositories that already have critical mass - probably possible
  Industrial Liaison Program has an incentive program

Ralph:
  goal: absorb *anything* that the author wants to say about their article

dstuve:
  MIT: if you build it, they won't come. you have to go to them.
  Margaret is on board to get influential users on board.
  extensible looks and feels
  policy-independent as we will adopt *your* policy

Margaret:
  some labs and centers have outside company support and have small DBs that are undeveloped and work to maintain


danbri:
  will you have folks maintain their own systems and export to DSpace?

dstuve:
  document evolution is a draw
  pick up incarnations from various domains (whitepaper - thesis - book)
    may pass-through DSpace to other repositories

Ralph:
  we want to represent relationships between resources
  over time, evolve a few that most are familiar with, but need to hold all

Mick:
  provenance - new class of metadata I hadn't considered

  need to offer harvesting, we don't have a lot of leverage to change people's behavior

Ralph:
  will you be content-type agnostic

Mick:
  "for your content to be useful in the future, we recommend x"

danbri:
  if documents are blobs it's a non-issue

dstuve:
  don't want arbitrary UI

Ralph:
  what about the data itself?

dstuve:
  import/export mechanism can move data between spaces

Ralph:
  what do you want to make efficient?

Mick:
  make certain structures efficient and general queries less so

danbri:
  "framework" is a problem word:
    a discipline's schemas are not germane to that discipline only

dstuve:
  author may mean different things depending on the discipline

  workflow items outside of the space

Ralph:
  it's different vocab, but does the system need to handle it differently

dstuve:
  metadata comes from user
  workflow comes from administrator

Mick:
  simply a question of roles

  let's move on to rights

  need to decide if there are two models

Eric:
  two? three? n? what about in-between data?

Ralph:
  can you issue workflow based queries?

---- Rights ----

dstuve:
  there is a god admin who is allowed to define rights and empower folks
  rights - one is the ability to hang code off a right
  rights may be time-based, person-based.
  we expect to get some rights stuff wrong
  attach rights to any element

Ralph:
  what is the granularity?

dstuve:
  element is any file in the silo
  code on elements for indexing
  [more refs to fedora]
  rights are namespaced to distinguish ASAP right from ... right
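
[ed: a hedged sketch of the rights model as described in these notes, not DSpace's actual design. It only assumes what dstuve said: rights attach to an element (a file in the silo), are namespaced, may be person-based or time-based, and may have code hung off them. All class, function and namespace names here are hypothetical.]

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Right:
    namespace: str                     # hypothetical namespaces, e.g. "dspace", "lab"
    name: str                          # e.g. "read"
    person: Optional[str] = None       # person-based; None means anyone
    not_before: Optional[date] = None  # time-based; None means no embargo
    hook: Optional[Callable[[str, str], None]] = None  # code hung off the right

    def permits(self, person: str, on: date) -> bool:
        if self.person is not None and self.person != person:
            return False
        if self.not_before is not None and on < self.not_before:
            return False
        return True


# Rights are attached per element; an element is identified by its file path.
rights_by_element: Dict[str, List[Right]] = {}
audit_log: List[Tuple[str, str]] = []


def grant(element: str, right: Right) -> None:
    rights_by_element.setdefault(element, []).append(right)


def check(element: str, namespace: str, name: str, person: str, on: date) -> bool:
    """Return True if some right on the element permits the request."""
    for right in rights_by_element.get(element, []):
        if (right.namespace, right.name) != (namespace, name):
            continue  # same name in another namespace is a different right
        if right.permits(person, on):
            if right.hook is not None:
                right.hook(element, person)  # run the attached code
            return True
    return False


def log_access(element: str, person: str) -> None:
    audit_log.append((element, person))


# An embargoed read right open to anyone, plus a person-based right
# with the same name in a different namespace.
grant("silo/thesis.pdf",
      Right("dspace", "read", not_before=date(2001, 1, 1), hook=log_access))
grant("silo/thesis.pdf", Right("lab", "read", person="alice"))
```

[ed: keying on (namespace, name) is one way to keep identically named rights from different policy owners apart; Ralph's granularity question maps to what is used as the element key.]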

Eric:
  wanna see my model? (documents, ACLs)
  organized by homogeneity of rights

Mick:
  how volatile are those rights?

Ralph:
  user feedback for expensive non-optimizable queries

dstuve:
  let users know when data is non-local

Peter:
  CORBA is being superseded by XML stuff


< discussion of HTTP NG >

danbri:
  are you looking at something like CORBA?

dstuve (or maybe bill?):
  user interface would correspond to hard java code in the server

Peter:
  Peter Bretten - consultant on the project
  need ad hoc ...
  CORBA is too heavy and slow
  looking at SOAP
  Perl can talk to Java in SOAP
  SOAP isn't really mature
  looking at HP's E-Speak
  SOAP addresses marshalling
  E-Speak addresses what can I do, what types?

< XML protocol charter not including service advertisement >

dstuve:
  pushing not for RDF but a standard entity relationship diagram and think about it in less abstract terms for the first cut
  RDF is a new tool set and new nomenclature
  may want to use present tools and nomenclature

danbri:
  what you do in the privacy of your database is your own business
  Sergey Melnik is looking at UML and RDF
  merging data from multiple domains may not be addressable with conventional relational tools

Ralph:
  implications have to do with long-term extensibility

danbri:
  large tool gap - many more tools for UML

Ralph:
  there are *no* tools that use XML namespaces the way they are meant to be used.

< lunch >

Art, Ralph, danbri:
  discussion of current tools

---- future-proofing ----
Ralph:


Eric:
  contents, deconstructed?
  if you peer into an image/gif, you can get the size
  if you peer into a text/xml, you can harvest RDF metadata
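
[ed: a small sketch of "peering into" an image/gif, added for illustration and not shown at the meeting. It relies only on the GIF header layout: a 6-byte signature followed by the logical screen width and height as little-endian 16-bit integers. The function name and the hand-built sample bytes are hypothetical.]

```python
import struct


def gif_size(data: bytes) -> tuple:
    """Return (width, height) read from a GIF's logical screen descriptor."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    # Bytes 6-9: width then height, each an unsigned little-endian 16-bit int.
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


# Hand-built header bytes for a 640x480 image; no pixel data is needed
# because the size lives entirely in the first ten bytes.
sample = b"GIF89a" + struct.pack("<HH", 640, 480)
print(gif_size(sample))
```

[ed: the same idea applies to text/xml, where the harvester would parse the document and pull out embedded RDF instead of reading fixed header bytes.]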

---- rights ----
Ralph:
  need to decide whether to serve documents as well as tell the client what their rights are

Ralph describes P3P:
  not actually in RDF

danbri:
  salvageable with XSLT

Ralph describes XSLT

Eric Prud'hommeaux, but please post comments to www-rdf-dspace@w3.org
@(#) $Id: 11-DSpace-minutes.html,v 1.2 2000/10/19 20:52:43 eric Exp $