On October 11, people from W3C met with people from HP/MIT's DSpace project. Following are the proceedings.
Ralph Swick, Dan Brickley, Art Barstow, Eric Prud'hommeaux and Dave Beckett arrived from W3C
and met Mick Bass, Margaret Benchofsky and David Stuve.
Mick Bass:
project leader for DSpace
HP employee
DSpace has MIT, HP, and contractors
Eric:
Ralph Swick:
RDF is a long-time project.
integrated with other work under name of metadata
though it's all data
Danny W and ??? have had prior conversations, but thanks to Art and Mick
Margaret Benchofsky
long time librarian
faculty contact for DSpace
UI interests
Dan Brickley:
RDF interest group
RDF schema spec
other .5 time: working with Dave in Bristol
Dave Beckett
working on metadata for 5 years
developing metadata library called Redland
David Stuve:
HP donation to MIT for next 1.5 years
DSpace design work
archive aspects
metadata storage
may also be doing some UI stuff
Ralph:
we have several metadata projects
in the middle of defining next phase of metadata activity
Art is here to do RDF tool development
1 year ago we got .5 danbri from Bristol
Bristol work is separately coordinated
danbri:
another connection - HP Labs in Bristol
4-5 ILRT developers in lunch powwows
joyce:
master's student working with Mick
Hal Abelson is thesis advisor
sheet 1:
purpose:
establish W3C-DSpace relationship
get feedback - avoid blindspots
goals:
W3C understand DSpace
DSpace understand frontier/direction
sheet 2:
Steps:
intros - done
DSpace Metadata Approach - dstuve
W3C - RDF activities Review - art
W3C - Metadata activities Review - ralph
free-for-all
lunch!
---- DSpace Metadata Approach ----
dstuve:
sponsored by HP and MIT libraries
goal: institutional intellectual capital archive
others: LANL (physics + math), CogNet (MIT Press)
lots of domain-specific
<enter peter>
umbrella archive for everything produced by MIT
cross discipline - therefore need metadata abstraction
+------------------------------------+
| math subspace |
| periodicals MIT press theses |
+------------------------------------+
| physics subspace |
...
Mick:
there is metadata associated with the subspace itself as well as the individual ?things?
dstuve:
Asset store - metadata and document store
<enter bill catty>
Services layer - web server, harvesting, sharing, replication
Ralph:
maintain access to papers in perpetuity - how about intellectual prop rights?
dstuve:
indeed it's a big part - hoping to be a pioneer - hope to cause conflict in the area
Mick:
Eric Celeste has 1.5 positions to look at business models
view libraries as source rather than cost center
Bill Catty
MIT Information Systems
learned from Dublin Core, Warwick Framework
must let disciplines define their own metadata set [schema - ed]
must integrate with existing discipline-specific metadata sets (eg MEDLINE)
automatic import of metadata standards
Mick:
fundamental hard prob:
continuum between efficiency of search and ability to extend
hard coded tables <-----------> abstract
Ralph:
multiple metrics of efficiency: DB efficiency, query creation efficiency, schema maintenance efficiency
Eric:
each select decomposes to a join
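[ed: a minimal sketch of the two ends of Mick's continuum, assuming SQLite and made-up table names (item, stmt) and sample rows; none of this is DSpace's actual schema. It illustrates why, in the abstract layout, selecting on one property and returning another becomes a self-join.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hard-coded end of the continuum: one column per metadata field.
cur.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
cur.execute("INSERT INTO item VALUES (1, 'Thesis A', 'Smith')")
cur.execute("INSERT INTO item VALUES (2, 'Paper B', 'Jones')")

# Abstract end: one generic (subject, property, value) table holds any schema.
cur.execute("CREATE TABLE stmt (subject INTEGER, property TEXT, value TEXT)")
cur.executemany(
    "INSERT INTO stmt VALUES (?, ?, ?)",
    [
        (1, "title", "Thesis A"), (1, "author", "Smith"),
        (2, "title", "Paper B"), (2, "author", "Jones"),
    ],
)


def titles_by_author_fixed(author):
    # Single-table select; adding a new field would need a schema change.
    rows = cur.execute(
        "SELECT title FROM item WHERE author = ? ORDER BY title", (author,)
    )
    return [r[0] for r in rows]


def titles_by_author_generic(author):
    # Each property mentioned in the query adds another self-join on stmt.
    rows = cur.execute(
        """
        SELECT t.value
        FROM stmt AS a JOIN stmt AS t ON a.subject = t.subject
        WHERE a.property = 'author' AND a.value = ?
          AND t.property = 'title'
        ORDER BY t.value
        """,
        (author,),
    )
    return [r[0] for r in rows]
```

[ed: in the generic table a new metadata field is just another row, while in the fixed table it is a new column; that is the extensibility side of the trade-off.]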
art:
do you see
Marg:
yes
Ralph:
slow evolution
Ralph:
you hope to absorb a lot of data from existing repositories
Marg:
will ask end users to submit
dstuve:
bulk import is a way to jumpstart and build credibility/interest
?subspaces is an illusion to orient user?
Mick:
there is an interest that disciplines publish their own stable schema
how much can we automate schema import
Eric:
not always necessary for mere data slurp and burp
danbri:
how about UI extensions
Mick:
that and what mods need to be made to the data store
Ralph:
what sort of user incentives?
Marg:
not a huge amount of incentive.
math discipline has expressed this need before
oceanography needs a system anyway
Ralph:
need access to old data
Mick:
feedthrough mechanism to share data with existing repositories that already have critical mass - probably possible
Industrial Liaison Program has an incentive program
Ralph:
goal: absorb *anything* that the author wants to say about their article
dstuve:
MIT: if you build it, they won't come. you have to go to them.
Margaret is on board to get influential users on board.
extensible looks and feels
policy-independent as we will adopt *your* policy
Margaret:
some labs and centers have outside company support and have small DBs that are undeveloped and work to maintain
danbri:
will you have folks maintain their own systems and export to DSpace?
dstuve:
document evolution is a draw
pick up incarnations from various domains (whitepaper - thesis - book)
may pass-through DSpace to other repositories
Ralph:
we want to represent relationships between resources
over time evolve a few that most are familiar with, but need to hold all
Mick:
provenance - new class of metadata i hadn't considered
need to offer harvesting, we don't have a lot of leverage to change people's behavior
Ralph:
will you be content-type agnostic?
Mick:
"for your content to be useful in the future, we recommend x"
danbri:
if documents are blobs it's a non-issue
dstuve:
don't want arbitrary UI
Ralph:
what about the data itself?
dstuve:
import/export mechanism system can put data between spaces
Ralph:
what do you want to make efficient?
Mick:
make certain structures efficient and general queries less so
danbri:
"framework" is a problem word:
discipline schemas are not germane to that discipline only
dstuve:
author may mean different things depending on the discipline
workflow items outside of the space
Ralph:
it's different vocab, but does the system need to handle it differently
dstuve:
metadata comes from user
workflow comes from administrator
Mick:
simply a question of roles
let's move on to rights
need to decide if there are two models
Eric:
two? three? n? what about in-between data?
Ralph:
can you issue workflow based queries?
---- Rights ----
dstuve:
there is a god admin who is allowed to define rights and empower folks
rights - one is the ability to hang code off a right
rights may be time-based, person based.
we expect to get some rights stuff wrong
attach rights to any element
Ralph:
what is the granularity?
dstuve:
element is any file in the silo
code on elements for indexing
[more refs to fedora]
rights are namespaced to distinguish ASAP right from ... right
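[ed: a minimal sketch of the rights model as described above, under stated assumptions. The class names (Right, Silo), the namespaces "ns-a" and "ns-b", the file name and the dates are all hypothetical; the notes do not record the actual design. It shows a right attached to a file in the silo, optionally limited by person or by time, with code hung off it.]

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Right:
    namespace: str                      # keeps same-named rights from different sources apart
    name: str                           # e.g. "read"
    person: Optional[str] = None        # None: applies to anyone
    not_before: Optional[date] = None   # None: no time restriction
    hook: Optional[Callable[[str, str], None]] = None  # code hung off the right


class Silo:
    """Hypothetical store mapping each element (file) to its rights."""

    def __init__(self):
        self._rights = {}

    def grant(self, element, right):
        self._rights.setdefault(element, []).append(right)

    def check(self, element, namespace, name, person, today):
        for r in self._rights.get(element, []):
            if (r.namespace, r.name) != (namespace, name):
                continue
            if r.person is not None and r.person != person:
                continue
            if r.not_before is not None and today < r.not_before:
                continue
            if r.hook is not None:
                r.hook(element, person)  # run attached code when the right is exercised
            return True
        return False


audit = []
silo = Silo()
# Time-based right with an attached hook (here: append to an audit list).
silo.grant("thesis.pdf", Right("ns-a", "read", not_before=date(2001, 1, 1),
                               hook=lambda el, who: audit.append((el, who))))
# Person-based right with the same name in a different namespace.
silo.grant("thesis.pdf", Right("ns-b", "read", person="alice"))
```

[ed: the namespace is part of the lookup key, so two "read" rights from different sources never collide.]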
Eric:
wanna see my model? (documents, ACLs)
organized by homogeneity of rights
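[ed: one possible reading of "organized by homogeneity of rights", sketched under assumptions: documents whose access lists are identical are grouped so they can share a single ACL. The function name, file names and (who, right) pairs are hypothetical; Eric's actual model is not recorded in these notes.]

```python
from collections import defaultdict


def group_by_acl(doc_acls):
    """Map each distinct ACL (a frozenset of (who, right) pairs) to its documents."""
    groups = defaultdict(list)
    for doc, acl in sorted(doc_acls.items()):
        # frozenset ignores the order in which entries were listed
        groups[frozenset(acl)].append(doc)
    return dict(groups)


doc_acls = {
    "a.html": [("staff", "read"), ("staff", "write")],
    "b.html": [("staff", "write"), ("staff", "read")],
    "c.html": [("world", "read")],
}
groups = group_by_acl(doc_acls)
```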
Mick:
how volatile are those rights?
Ralph:
user feedback for expensive non-optimizable queries
dstuve:
let users know when data is non-local
Peter:
CORBA is being superseded by XML stuff
< discussion of HTTP NG >
danbri:
are you looking at something like CORBA?
dstuve (or maybe bill?):
user interface would correspond to hard Java code in the server
Peter:
Peter Bretten - consultant on the project
need ad hoc ...
CORBA is too heavy and slow
looking at SOAP
Perl can talk to Java in SOAP
SOAP isn't really mature
looking at HP's E-Speak
SOAP addresses marshalling
E-Speak addresses what can i do, what types?
< XML protocol charter not including service advertisement >
dstuve:
pushing not for RDF but a standard entity relationship diagram and think about it in less abstract terms for the first cut
RDF is a new tool set and new nomenclature
may want to use present tools and nomenclature
danbri:
what you do in the privacy of your database is your own business
Sergey Melnik is looking at UML and RDF
merging data from multiple domains may not be addressable with conventional relational tools
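[ed: a minimal sketch of why merging is cheap in a triple model, assuming plain Python tuples for (subject, property, value) statements and made-up example.org namespaces and names. Because each property is qualified by its discipline's namespace, merging two data sets is a set union, and "author" from one discipline is not confused with "author" from another.]

```python
# Hypothetical namespaces for two disciplines' schemas.
MATH = "http://example.org/math#"
PHYS = "http://example.org/physics#"

math_data = {
    ("doc:1", MATH + "author", "A. Prover"),
}
phys_data = {
    ("doc:1", PHYS + "author", "Collaboration X"),
    ("doc:2", PHYS + "author", "B. Measure"),
}

# Merging data from both domains needs no schema reconciliation up front.
merged = math_data | phys_data


def values(triples, subject, prop):
    """Return the sorted values of one namespaced property for one subject."""
    return sorted(o for s, p, o in triples if s == subject and p == prop)
```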
Ralph:
implications have to do with long-term extensibility
danbri:
large tool gap - many more tools for UML
Ralph:
there are *no* tools that use XML namespaces the way they are meant to be used.
< lunch >
Art, Ralph, danbri:
discussion of current tools
---- future-proofing ----
Ralph:
Eric:
contents, deconstructed?
if you peer into an image/gif, you can get the size
if you peer into a text/xml, you can harvest RDF metadata
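[ed: a minimal sketch of "peering into" content by type, using only the Python standard library. The function names are hypothetical. The GIF part reads the width and height that follow the 6-byte signature; the XML part only picks up literal-valued properties of rdf:Description elements, which is far short of a full RDF parser.]

```python
import struct
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def gif_size(data):
    # GIF header: 6-byte signature, then little-endian 16-bit width and height.
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


def harvest_rdf(xml_text):
    # Collect (subject, property, literal) for every rdf:Description found.
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


def peer(content_type, data):
    # Dispatch on content type; unknown types stay opaque blobs.
    if content_type == "image/gif":
        return {"size": gif_size(data)}
    if content_type == "text/xml":
        return {"rdf": harvest_rdf(data.decode("utf-8"))}
    return {}
```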
---- rights ----
Ralph:
need to decide whether to serve documents as well as tell the client what their rights are
Ralph describes P3P:
not actually in RDF
danbri:
salvageable with XSLT
Ralph describes XSLT