Re: High-level annotation issues

To: liberte@ncsa.uiuc.edu (Daniel LaLiberte)
Subject: Re: High-level annotation issues
From: "R. Martin Roscheisen" <rmr@cs.stanford.edu>
Date: Sat, 14 Oct 1995 01:08:49 -0700
Cc: w3c-collab-annotation@w3.org
From roscheis@xingu.stanford.edu Sat Oct 14 04:08:09 1995
In-Reply-To: Your message of "Fri, 13 Oct 1995 22:58:02 CDT." <9510140358.AA06636@void.ncsa.uiuc.edu>
Sender: roscheis@xingu.stanford.edu

  >This is perhaps ambiguous, or at least confusing.  A single annotation
  >set consists of a set of URLs that have annotations in the set.  (Typically
  >the URLs will be related in some way.) For each URL in the set, there
  >may be a set of annotations.  All these annotations for all these URLs are
  >"in" the annotation set, though a two level structure is more correct.

I am getting lost here... An annotation set is, just as the name says, a
set of annotations -- at least this is how we have been using the term
here.  For example, there can be a set good_for_kids which
annotates documents which are good for kids.  That is, this set is
stored at some server (set meta-info plus meta-info describing the
annotations currently in it), and there is some database on this
server which keeps all the annotation info for this set ("contains the
annotations").  Now, you could say that the URLs of the annotated
documents in the good_for_kids set are related in the sense of this
set, but of course this is a result, not a cause.  In other words, they
need not be related in any substantial way, and whenever someone adds a
new annotation to an annotation set, this might change.  "Set of
(document) URLs that have annotations in it" would then mean something
different.  I am not sure how this would be useful (if you really mean
this, please use a different phrase, such as "annotated documents
set" or so...).

I guess you are pointing to the fact that the server has to know
whether a particular document is annotated; yes, this is true, but for
this we need not introduce the notion of a set -- a simple
database index is enough for this purpose.  On the other hand, what one
does need is a way of grouping annotations together according to
some semantic notion (such as topics, threads, SOAPs, tours, etc.).
This is the file system equivalent of a directory as opposed to a
file -- which is entirely independent of how the OS would lay this
out in a physical storage tree.  These are the levels which we are
separating out here.

Information on whether certain sites or collections are annotated by
a specific set would be kept as part of the set meta-info, and could
be exploited by smart clients (but need not be -- for this cache
coherency problem there is no perfect solution).

  >In any case, the generalization to collections composed of subcollections
  >is dominant enough in the web that I expect a collection protocol will
  >emerge, upon which we can build annotation sets.

Yup; this can be easily added to the protocol once the base protocol
is there -- you might be right that there is some value in it.

  >> Well, since supersets are built on top of sets, we could also decide
  >> to leave them out entirely for now, get the basic protocol done first,
  >> and then go back and extend the annotation and set request protocol to
  >> include things such as supersets and other things. [But if you already
  >> have thought it through...fine.]
  >
  >This is a reasonable plan - and it wouldn't get in the way of the
  >extension later to support subsetting. Later then (but not too much later).

OK.

  >The model I am working from is that http servers should not be
  >considered the information units.  Servers are administrative units

Yup.

  >that function more as proxies to the actual data than as the keepers of
  >all the data.  A single server might have numerous annotation sets
  >maintained by several independent groups that might not even talk to
  >each other.  Why should the server know about all the annotation sets
  >that might happen to be on the server?

Good point. The question is: which part of this cannot be controlled
by access control?  For example, if you allow people to specify
which groups (or whether the public) can see the description of a set
at a server, wouldn't this point then be dealt with?  That is, the
server knows the sets internally, much like the OS knows everything,
but you only actually see a set if you have the right access rights...
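
A minimal sketch of this idea in Python, assuming a hypothetical
in-memory registry. The names SetDescription, visible_to, PUBLIC and
describe_set are made up for illustration and are not part of any
agreed protocol. The server keeps every set internally, but only hands
out a description to callers who are allowed to see it.

```python
from dataclasses import dataclass, field
from typing import Optional

PUBLIC = "public"  # hypothetical marker meaning "anyone may see this set"


@dataclass
class SetDescription:
    name: str
    summary: str
    # Groups allowed to see that the set exists (assumed representation).
    visible_to: set[str] = field(default_factory=set)


class AnnotationServer:
    def __init__(self) -> None:
        # The server knows every set internally, much like an OS knows every file.
        self._sets: dict[str, SetDescription] = {}

    def add_set(self, desc: SetDescription) -> None:
        self._sets[desc.name] = desc

    def describe_set(self, name: str, caller_groups: set[str]) -> Optional[SetDescription]:
        """Return the description only if the caller is allowed to see it."""
        desc = self._sets.get(name)
        if desc is None:
            return None
        if PUBLIC in desc.visible_to or desc.visible_to & caller_groups:
            return desc
        # A hidden set looks exactly like a nonexistent one to the caller.
        return None


server = AnnotationServer()
server.add_set(SetDescription("good_for_kids", "Documents suitable for kids", {PUBLIC}))
server.add_set(SetDescription("internal_review", "Private review notes", {"reviewers"}))
```

Returning the same answer for "hidden" and "does not exist" is one
possible design choice; it keeps the mere existence of a set private.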

  >In this model, the data itself (an annotation set in this case) and
  >the code associated with the data (whether in cgi scripts or other
  >extension modules) are the real annotation server.  There isn't
  >necessarily a single piece of code that handles all annotation sets
  >that might be on the server.

I get your basic point, but we shouldn't end up saying that, hey, I
want a separate http server running on this machine to serve my
documents because I do not want to share it... As said above, I think
access control can deal adequately with your underlying concern that
people generally might want to create sets on a server without
other people knowing about it.

  >The main point I want to make is that all the annotation sets on a single
  >server are *not* necessarily related, so the server should not be 
  >providing a list of them as if they were related.  The fact that they
  >are on the same server is largely irrelevant.

Right. It is irrelevant.  The server should just serve
(access-controlled) descriptions (both human-intelligible as an html
page and computer-intelligible set meta-info) for each annotation set
-- upon request by the command annset_info_get (or whatever).
These pages can then be linked together in any way, and if someone
feels like it, this person can make a list of sets which are generally
accessible at a specific server -- but need not.

  >I have a similar complaint about the idea of searching through all
  >documents on a server.  It should be searching through documents in a
  >*collection*, not the server as a whole.  It may happen that all
  >documents on a particular server are related and so may be in the
  >same collection, but this is not necessarily so for all servers.

If this is a reply to what I might have appeared to claim, then there
is a misunderstanding.  I see no scenario where this would happen. (?)
To get a listing of annotations in a set on a server, the server just 
runs through the annotation meta-info for the annotations in a set and
extracts the "annotated document URL" from this meta-info, and returns
this listing.


  >"Set" has almost the same meaning as "collection", so all I was
  >suggesting was that the protocol for manipulating other kinds of
  >collections might apply here too.  In particular, searching for a
  >particular object in a collection works here too.

Indeed, sets can be used to define collections. Take all
annotatedDocumentURL's of the annotations in an annotation set and you
have a collection.

annotationSet = TYPE {
  annotationSetName String;
  annotationSetServer FQDN;
  annotations = SET of annotation;
}

annotation = TYPE {
  annotationName String;
  annotatedDocumentURL URL;   -- URL of the annotated document
  annotationAuthor String;
  annotationContent String;
  annotationRequestURL URL;   -- would return this annotation only
}
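
For illustration only, the two types can be mirrored in Python, with
the "take all annotatedDocumentURL's" step written out as a method.
This is a sketch under assumptions: the snake_case field names, the
collection() method, the use of a list instead of a true set (to keep
insertion order), and the example.org URLs are all mine, not part of
the proposal.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    annotation_name: str
    annotated_document_url: str   # URL of the annotated document
    annotation_author: str
    annotation_content: str
    annotation_request_url: str   # would return this annotation only


@dataclass
class AnnotationSet:
    annotation_set_name: str
    annotation_set_server: str    # fully qualified domain name
    annotations: list[Annotation] = field(default_factory=list)

    def collection(self) -> list[str]:
        """Distinct annotated-document URLs, in first-seen order."""
        seen: dict[str, None] = {}
        for ann in self.annotations:
            seen.setdefault(ann.annotated_document_url, None)
        return list(seen)


good_for_kids = AnnotationSet("good_for_kids", "annotations.example.org")
good_for_kids.annotations += [
    Annotation("a1", "http://example.org/zoo.html", "M", "Nice pictures",
               "http://annotations.example.org/ann?id=a1"),
    Annotation("a2", "http://example.org/zoo.html", "D", "Good for ages 5+",
               "http://annotations.example.org/ann?id=a2"),
    Annotation("a3", "http://example.org/math.html", "M", "Fun puzzles",
               "http://annotations.example.org/ann?id=a3"),
]
urls = good_for_kids.collection()
```

Two annotations on the same document contribute a single entry, so the
derived collection is a set of documents rather than of annotations.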


  >> - Get List of all annotations in set aSet with <select>
  >>     [this is very important to help people find stuff]
  >
  >That's what I have above for "Get Annotations" with a selector and
  >parts spec, without being too clear about it, I know.  

Actually, I noted the above, but this list command is supposed to
return a list of pointers to the annotatedDocumentURL's of all
annotations which fulfill the select criteria. (Indeed, more
precisely, annotatedDocumentURL#markForAnnotation -- so that
jumping to this location jumps directly to a certain annotation on its
page).  Thus, if the select criterion is "recency=all which are new
since yesterday", then this would return a list of links to documents
which have new annotations on them (that is, if this list is
realized as an HTML list, then I can click on a link -- it loads the
document plus the annotations and jumps to the newest one).
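
A rough Python sketch of such a list command, under stated
assumptions: the record layout, the `created` timestamp, the `mark`
fragment names and the helper functions are all hypothetical (the
annotation type above carries no timestamp), and the URLs are
placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from html import escape
from typing import Callable, Iterable


@dataclass(frozen=True)
class AnnotationRecord:
    annotated_document_url: str
    mark: str            # assumed fragment identifying the annotation on the page
    title: str
    created: datetime    # assumed timestamp, needed for a recency criterion


def new_since(cutoff: datetime) -> Callable[[AnnotationRecord], bool]:
    """Build a select criterion matching annotations created at or after cutoff."""
    return lambda rec: rec.created >= cutoff


def select_links(records: Iterable[AnnotationRecord],
                 select: Callable[[AnnotationRecord], bool]) -> list[tuple[str, str]]:
    """Return (link, title) pairs for matching annotations, newest first."""
    matching = sorted((r for r in records if select(r)),
                      key=lambda r: r.created, reverse=True)
    return [(f"{r.annotated_document_url}#{r.mark}", r.title) for r in matching]


def as_html_list(links: Iterable[tuple[str, str]]) -> str:
    """Render the links as a simple HTML list."""
    items = [f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
             for url, title in links]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


now = datetime(1995, 10, 14, 1, 0)
records = [
    AnnotationRecord("http://example.org/a.html", "ann1", "Old note",
                     now - timedelta(days=3)),
    AnnotationRecord("http://example.org/b.html", "ann2", "Fresh note",
                     now - timedelta(hours=2)),
]
links = select_links(records, new_since(now - timedelta(days=1)))
html_list = as_html_list(links)
```

Each link points at the document plus a fragment, so following it lands
on the annotation itself rather than at the top of the page.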

  >Notice, I've left out all management of groups and members.  We need
  >another protocol for this.  Do we need another working group or should
  >we take it on too?  I think it deserves a different group more closely
  >involved with security.  Things like roles and credentials should be
  >considered, as well as hierarchical and distributed groups, keys and
  >what not.

Yes, this should be fine.  For the time being, we can consider that
this is dealt with by the sysadmin at the server.  It is easily
extended later if we leave this out for now.

  >We also need a protocol for manipulation of access control lists that
  >identify who can do what functions to what objects.

Actually, we can say that during set creation (which should be fluidly
possible) the permissions are specified as one of the things in the
set meta-info.  This is just an attribute in the set creation
request, not really a different protocol.  [It is like 'chmod g+r
file', not like inserting a name into /etc/group]

  >The next step in specifying the protocol is to map it onto two different
  >lower level protocols: HTTP via CGI and HTTP via new methods.  CGI
  >because current servers can deal with it, and new methods because we
  >are extending the server to handle arbitrary new methods using a general
  >method handler interface (MHI).

Yes, so to get this clean, we probably should define method arguments
and results independently of the realizations, and then map them onto a
cgi interface and onto a more general one.

E.g.
QUOTE

METHOD ann_new

Description: Add new annotation into a set for a URL. 

Authentication: Member distinguished name, member password on server.
Arguments: 
 annset          (name) annotation set the annotation is being added to
 url             (url)  URL of the document being annotated.
 title           (text) Title of annotation
 pos             (posid) Position within document as defined in XY.
 type            (type) Type of the item.

 Optional:
 data            (text) Short text annotation (if needed)
 annurl          (url)  pointer to a separate page (if needed)
 name            (text) name of author (for Anyone)
 email           (email) email address of author (for Anyone)
 iconurl         (url) url of author's icon (for Anyone)
 picturl         (url) url of author's mug shot (for Anyone)
 homeurl         (url) url of author's homepage (for Anyone)

Adds an annotation meta-info record to the annotation database for the
appropriate annotation set.

Returns a document containing a text message, plus a meta-info and an
HTML representation of the annotation.

The icon, picture or home URLs are taken from the member's record
unless overridden by the fields in this command. The exception is the
member Anyone who is an anonymous entity. When accessing or posting to
a set anonymously the browser should provide the name, email etc. of
the user, since they are not on file at the public ann site. If
the user is a member on that server then the details are stored as
part of the member info, and they will be automatically retrieved from
the member's record.
END QUOTE
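
The author-detail rule in the quoted description can be sketched in
Python as follows. This is only an illustration under assumptions: the
function name resolve_author and the dict-based member record are
hypothetical, and the field names are simply taken from the optional
arguments listed above.

```python
from typing import Optional

ANYONE = "Anyone"  # the anonymous member named in the description
AUTHOR_FIELDS = ("name", "email", "iconurl", "picturl", "homeurl")


def resolve_author(member: str,
                   member_record: Optional[dict[str, str]],
                   request: dict[str, str]) -> dict[str, str]:
    """Merge author details: fields in the request override the member's record.

    For the anonymous member there is nothing on file, so only the
    fields supplied with the request are used.
    """
    base = {} if member == ANYONE else dict(member_record or {})
    resolved = {k: base[k] for k in AUTHOR_FIELDS if k in base}
    resolved.update({k: request[k] for k in AUTHOR_FIELDS if k in request})
    return resolved


record = {"name": "Dan", "email": "dan@example.org",
          "homeurl": "http://example.org/~dan/"}
member_view = resolve_author("dan", record, {"homeurl": "http://example.org/new/"})
anon_view = resolve_author(ANYONE, None, {"name": "Guest", "email": "guest@example.org"})
```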


This can then be mapped into

--> CGI

GET http://serverhttp/serverscript?cmd=ann_new&annset=Test&url=http://sun.com/

or POST http://serverhttp/serverscript
cmd=ann_new&annset=Test&url=http://sun.com/

--> CORBA ISL

method ann_new for annotation set object:

ann_metainfo ann_new(annset String, url URL);
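
As a rough sketch of the CGI half of this mapping, in Python: the
arguments are built once, independently of the realization, and then
turned into a GET URL or a POST body. The helper names are
hypothetical, and http://serverhttp/serverscript is just the
placeholder from the example above. Only the CGI realization is
covered here, not the ISL one.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://serverhttp/serverscript"  # placeholder from the example above


def ann_new_args(annset: str, url: str, **optional: str) -> dict[str, str]:
    """Realization-independent argument set for the ann_new method."""
    args = {"cmd": "ann_new", "annset": annset, "url": url}
    args.update(optional)
    return args


def to_cgi_get(args: dict[str, str]) -> str:
    """Map the arguments onto a GET request URL."""
    return f"{BASE}?{urlencode(args)}"


def to_cgi_post(args: dict[str, str]) -> tuple[str, bytes]:
    """Map the arguments onto a POST target and a form-encoded body."""
    return BASE, urlencode(args).encode("ascii")


args = ann_new_args("Test", "http://sun.com/", title="A first note")
get_url = to_cgi_get(args)
post_target, post_body = to_cgi_post(args)

# The server side can recover the original arguments from the query string.
decoded = parse_qs(urlsplit(get_url).query)
```

Unlike the hand-written example, urlencode percent-encodes the value of
url, which keeps a document URL containing "&" or "?" from being
confused with the script's own arguments.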

  >dan
  >

Cheers, 

- M
References:
Re: High-level annotation issues
From: liberte@ncsa.uiuc.edu (Daniel LaLiberte)