Re: vocabulary versioning and preservation from Herbert Van de Sompel on 2015-03-19 (public-dwbp-wg@w3.org from March 2015)

From: Herbert Van de Sompel <hvdsomp@gmail.com>
Date: Thu, 19 Mar 2015 03:39:00 -0600
To: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Cc: Herbert Van de Sompel <hvdsomp@gmail.com>
Message-ID: <CAOywMHcNz9rKV2smZnpNKzS_hsvdUQPuzwkoMSYra1fk6jUaJQ@mail.gmail.com>
Dear all,

Many thanks for your enthusiastic comments. Below, I respond to some
of your comments/questions.

Greetings

Herbert Van de Sompel

==

* I can only express my excitement that interest is expressed in
reading the Memento protocol spec, RFC 7089. A handy HTML version is
available at [1]. For a gentle introduction, see [2].

* Regarding Ghislain's remark about "storing" versions:

The Memento protocol has nothing to say about criteria used for
deciding when a resource is effectively a new version. The Memento
protocol comes into play once temporal resource versions have been
created, irrespective of the underlying approach used to create them.
Typical cases are:

(a) In web archiving, a temporal version (snapshot) is created after a
robot crawled a page and the resulting resources (the page and its
embedded/linked resources) are ingested in a web archive. The crawling
date will be the Memento-Datetime. It is the datetime of the
observation of the crawled web resources. Returning the "best" Memento
for a specified datetime is typically done on the basis of the
smallest delta between the specified datetime and a Memento-Datetime
value. This is the approach in all web archives, including the
Internet Archive.

(b) In CMS, software versioning systems, etc., a temporal version is
created subject to technical and editorial policies. The datetime of a
new temporal version becomes its Memento-Datetime. With CMS etc., one
typically knows the history of resource versions, i.e. one knows the
temporal interval in which they were the "live" versions.  Because
this history is known, returning the "best" Memento for a specified
datetime is typically done by returning the version that was
operational in the interval that includes the specified datetime, i.e.
the version that is closest in the past to the specified datetime is
returned. This approach is used e.g. in the Memento extensions for
MediaWiki [3][4].
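
As a rough illustration of the two selection strategies in (a) and (b),
here is a minimal Python sketch. It is not taken from any Memento
implementation; the function names, the sample datetimes, and the
fallback for requests that predate the first version are assumptions
made for the example.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def closest_memento(memento_datetimes, requested):
    """Case (a), web-archive style: smallest absolute delta to the request."""
    return min(memento_datetimes, key=lambda m: abs(m - requested))


def version_live_at(memento_datetimes, requested):
    """Case (b), CMS style: the version closest in the past to the request."""
    ordered = sorted(memento_datetimes)
    i = bisect_right(ordered, requested)
    # Assumption: if the request predates every version, return the first one.
    return ordered[max(i - 1, 0)]


# Hypothetical version history with two temporal versions.
versions = [
    datetime(2015, 1, 1, tzinfo=timezone.utc),
    datetime(2015, 3, 1, tzinfo=timezone.utc),
]
requested = datetime(2015, 2, 20, tzinfo=timezone.utc)

print(closest_memento(versions, requested))  # nearest snapshot in either direction
print(version_live_at(versions, requested))  # version that was live at that time
```

For this request the two strategies disagree: the nearest snapshot is the
March one, while the version that was live on 20 February is the January
one.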

* Regarding Ghislain's remark about URI syntax for Mementos:

The Memento protocol does not require any special URI syntax for
Mementos as everything is (according to REST and HATEOAS principles)
based on HTTP headers, typed links, and negotiation. However, the syntax
style exemplified by
<http://dbpedia.mementodepot.org/memento/20100316/http://dbpedia.org/page/DJ_Shadow>
is rather widely used/supported by web archives although definitely
not uniformly. The API associated with our Time Travel service
<http://bit.ly/webtimetravel> also supports the syntax. But CMS etc.
definitely do not use it.
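
For illustration only, the URI style of the example above can be
assembled as a prefix, a date stamp, and the original URI. The helper
name is hypothetical, and the 8-digit date stamp simply mirrors the
example; individual archives may use other timestamp lengths or
layouts, and the protocol itself does not depend on any of this.

```python
from datetime import date


def archive_style_memento_uri(prefix: str, snapshot: date, original_uri: str) -> str:
    # Hypothetical helper: <prefix>/<YYYYMMDD>/<original URI>.
    return f"{prefix.rstrip('/')}/{snapshot:%Y%m%d}/{original_uri}"


uri = archive_style_memento_uri(
    "http://dbpedia.mementodepot.org/memento",
    date(2010, 3, 16),
    "http://dbpedia.org/page/DJ_Shadow",
)
print(uri)
```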

* Steven says: "Your note addresses archiving published data, but I
also ask how an organization can assume best practices in publication
if they do not yet have policies to retain that which is not yet
decided to be published?" :

I guess retention is a bit of a different beast and typically subject
to a range of policies. There's also the question whether everything
that is decided to be retained is also public/published. Let's just
say that, if an organization decides to retain resources in the public
eye (i.e. publish them), the data principles apply. If the
organization would already apply the data principles internally, prior
to publishing, chances are high they would be in a better position to
adhere to the principles when they publish.

* Antoine says: "So in practice for the document I would be very happy
to say that the versioned vocabulary could be published following the
methods that are applied to the data itself. And count on the data
versioning section to refer on Memento.":

That would be an approach. But, as you mention, a lot of vocabularies
used in data are not controlled/published by the publisher of that
data. If data and vocabulary use a different approach for handling
versions, interoperability decreases.

* Lewis says: "… the DWBP WG is taking a data centric view of data
versioning meaning that a protocol which defines the data version
would be more part of the BP relating to Follow REST principles when
designing APIs. I think we need to be aware of the differences between
something like Memento (a specification and protocol for accessing
resources) and best practice of publishing versioning information
alongside dataset which are to be published on to the Web.":

This is a very good point, and goes straight to the two possible
perspectives one can take on Memento in the context of this
discussion:

- The Memento protocol, RFC 7089, is actually a RESTful "API" to
access temporal resource versions. API between quotes because it's
actually not an API, it's just a straightforward extension of HTTP
with datetime negotiation, a feature that Tim Berners-Lee suggested
ages ago [5] but was never specified. The protocol offers TimeGates
(datetime negotiation to access a single temporal version) and
TimeMaps (access to a temporal resource version history) as version
access mechanisms. Obviously, instead of having a multitude of APIs to
access temporal versions and version information, I would much prefer
a world in which this were uniformly done using the Memento protocol
;-) The uniform "API" exists and our experience with the TimeGate
server [6] shows that it is typically straightforward to implement
Memento support in cases where a bespoke version API exists.

- Aspects of the Memento protocol can be used to publish resource
version information without actually fully implementing the protocol.
This is the bit that I shared initially and that is described in [7].
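
To make the datetime negotiation mentioned in the first point a bit
more concrete: a client sends an Accept-Datetime request header to a
TimeGate, and a Memento carries a Memento-Datetime response header,
both in the HTTP date format. The Python sketch below only builds and
parses those header values; the function names are assumptions, and no
request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Request headers for datetime negotiation with a TimeGate.

    `when` is expected to be timezone-aware; it is converted to UTC
    and rendered in the HTTP date format with a GMT suffix.
    """
    value = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": value}


def parse_memento_datetime(value: str) -> datetime:
    """Parse a Memento-Datetime response header value into a datetime."""
    return parsedate_to_datetime(value)


headers = accept_datetime_header(datetime(2015, 3, 19, 9, 39, tzinfo=timezone.utc))
print(headers)
```

The resulting dictionary could be passed as request headers to any HTTP
client when addressing a TimeGate URI.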

[1] http://mementoweb.org/guide/rfc/
[2] http://mementoweb.org/guide/quick-intro/
[3] http://www.mediawiki.org/wiki/Extension:Memento
[4] http://www.mediawiki.org/wiki/Extension:MementoHeaders
[5] http://www.w3.org/DesignIssues/Generic.html
[6] https://github.com/mementoweb/timegate
[7] http://mementoweb.org/guide/howto/

On Wed, Mar 18, 2015 at 4:19 PM, Mcgibbney, Lewis J (398M)
<Lewis.J.Mcgibbney@jpl.nasa.gov> wrote:
> Hi Herbert,
>
>>
>>(1) vocabulary versioning
>>
>>The Memento-related comments I made about Data Versioning apply
>>equally to Vocabulary Versioning. All approaches described in
>><http://mementoweb.org/guide/howto/> apply to data and vocabulary. As
>>a matter of fact, when implementing Memento protocol support for both
>>data and vocabularies used in data, temporal versions of the data can
>>automatically be aligned with the temporally correct version of the
>>used vocabulary.
>>
>
> Right now the Best Practices document classifies Data Versioning as a
> part-of/child-component within the Metadata parent topic.
> This can be seen within the taxonomy provided within the BP document ToC
> [0].
> To me there is a distinction to be made here which indicates that your
> Memento-related comments do not necessarily apply equally to both Data
> versioning and Vocab versioning. The Mementos themselves, e.g. the
> instances of archived versions of web resources, could provide a
> Memento-Datetime which may be different from that published within and
> unique to the dataset.
> We need complete and utter clarification on this topic, however AFAICT the
> DWBP WG is taking a data centric view of data versioning meaning that a
> protocol which defines the data version would be more part of the BP
> relating to "Follow REST principles when designing APIs" [1].
> I think we need to be aware of the differences between something like
> Memento (a specification and protocol for accessing resources) and best
> practice of publishing versioning information alongside dataset which are
> to be published on to the Web.
> Thank you very much for your comments Herbert.
> Working Group... is it worth visiting some aspects of the data versioning
> commentary and use cases at one of the forthcoming meetings?
> Thanks
>
> [0] http://w3c.github.io/dwbp/bp.html#h-toc
> [1] http://w3c.github.io/dwbp/bp.html#BulkAccess2
>



-- 
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/

==
Received on Thursday, 19 March 2015 09:39:28 UTC