W3C

SDSVoc bar camp: Versions and archives - how to annotate and query

01 Dec 2016

See also: IRC log

Attendees

Present
newton, AxelPolleres, jrvosse, Javier, Jacco, sebastian, BernadetteLoscio, DavidBrowning, WillemVanGemert
Regrets
Chair
AxelPolleres
Scribe
Javier

<AxelPolleres> to join type /join #sdsvoc_versioning

<AxelPolleres> 1) How to describe your change frequency and change characteristics

<AxelPolleres> E.g. order preserving, monotonicity, no change in the ontology but changes in the instances, etc.

<AxelPolleres> Are there specific vocabularies? Should we join forces to create one? Which are the potential needs?

<AxelPolleres> How to query data across time?

<AxelPolleres> Collect requirements/use cases

<AxelPolleres> How to efficiently store and archive public datasets, allowing users to ask complex cross-time queries? how can metadata make that easier?

<AxelPolleres> Jacco: use case annotating artworks, vocabulary is updated regularly, assumed to be only additions

<AxelPolleres> … but some risk for semantic changes.

<AxelPolleres> … problems with querying archived data over time.

problems with concept drift

<AxelPolleres> scribe: Javier

david: validity period in one dimension but also the authors and changes

<AxelPolleres> David: version means for us schema updates, not the ongoing data (changes are always happening there)

<AxelPolleres> … could be streams.

David: in streaming, transaction is fairly complete
... we treat versions as software releases

<AxelPolleres> … common in the finance, data market.

<AxelPolleres> … customers don’t want to look at transaction logs, but they want to search/lookup events.

david: you don't want to track all the transaction logs, but we would like to have a most common practice with DCAT

<AxelPolleres> Transaction log vs. “full snapshots”

david: we have different representations for different needs: full snapshots + logs
... It may change every minute, but it is published once per day
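
The relation between the two representations can be sketched as follows: a state at any point in time is a full snapshot plus the logged changes up to that point. This is only an illustrative sketch; the record layout (timestamp, key, value), the use of None as a deletion marker, and the sample instrument names are assumptions, not anything described in the session.

```python
from datetime import datetime, timezone

def replay(snapshot, log, as_of):
    """Return the dataset state at `as_of`: the snapshot plus all logged changes up to then."""
    state = dict(snapshot)  # copy, so the published snapshot stays untouched
    for ts, key, value in sorted(log, key=lambda entry: entry[0]):
        if ts > as_of:
            break
        if value is None:
            state.pop(key, None)  # assumption: None marks a deletion
        else:
            state[key] = value
    return state

# Hypothetical daily snapshot and intraday transaction log
snapshot = {"ACME": 10.0, "INIT": 5.0}
log = [
    (datetime(2016, 12, 1, 9, 0, tzinfo=timezone.utc), "ACME", 10.5),
    (datetime(2016, 12, 1, 9, 1, tzinfo=timezone.utc), "INIT", None),
    (datetime(2016, 12, 1, 9, 2, tzinfo=timezone.utc), "ACME", 11.0),
]
state_0901 = replay(snapshot, log, datetime(2016, 12, 1, 9, 1, tzinfo=timezone.utc))
```

Publishing both forms as separate distributions lets a consumer choose between a cheap daily download and a full replay of the changes.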

<AxelPolleres> “here you get the realtime dataset and here you get the daily snapshot” might be different distributions of the same dataset.

<AxelPolleres> David: time dimension might be part of the data/snapshot or part of the metadata

<AxelPolleres> metadata needs either to be able to say “this column in the data contains the temporal validity of the data” or to describe the temporal extent of the whole dataset/distribution/resource

<AxelPolleres> Willem: Eurostat overwrites, does not provide history.

<AxelPolleres> newton: do you have different URIs for versions?

<AxelPolleres> Willem: no, just overwritten at the same URI

<AxelPolleres> David: similar case, but it’d be actually quite valuable to look at the changes.

<AxelPolleres> … but if you can monitor, knowing the changerate, you could follow that.

<AxelPolleres> … archiving and finding value in the differences is a common case.

<AxelPolleres> … use case in the legal domain, precedent cases that have been overturned…. which past cases they affect.

<AxelPolleres> Willem: Eurostat updates daily

<AxelPolleres> Axel: so advertising the change frequency would be helpful to monitor.

Willem: for taxonomies, we collect the changes and generate a new release, e.g. every 3 months

taxonomies/vocabularies

<AxelPolleres> Axel: could it be that certain parts of the datasets change at particular frequencies, e.g. thinking of statistical data, some may change daily, others annually, or at other frequencies.

David: regular rhythm of updates + emergency updates (e.g. news)

<AxelPolleres> David provides another example of schema change… e.g. splitting of a country, the historic data then needs to be split, which can’t be done algorithmically.

<AxelPolleres> David: most of our system uses versioned URIs

<AxelPolleres> … could be “update is every three months, except when it isn’t” … that might be too complicated to make it machine readable.

<newton> dct:accrualPeriodicity <http://purl.org/linked-data/sdmx/2009/code#freq-A> ;

<AxelPolleres> Newton: dct has periodicity
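
A monitoring client could use an advertised dct:accrualPeriodicity value to decide how often to re-check a dataset. The sketch below maps a few SDMX frequency codes (the namespace newton pasted above) to polling intervals; the interval lengths, the default, and the function name are assumptions chosen for illustration.

```python
from datetime import timedelta

SDMX_FREQ = "http://purl.org/linked-data/sdmx/2009/code#"

# Approximate polling intervals per frequency code (assumed values)
POLL_INTERVAL = {
    SDMX_FREQ + "freq-D": timedelta(days=1),    # daily
    SDMX_FREQ + "freq-W": timedelta(weeks=1),   # weekly
    SDMX_FREQ + "freq-M": timedelta(days=30),   # monthly
    SDMX_FREQ + "freq-Q": timedelta(days=91),   # quarterly
    SDMX_FREQ + "freq-A": timedelta(days=365),  # annual
}

def poll_interval(periodicity_uri, default=timedelta(days=1)):
    """Return how long to wait between checks; fall back to `default` for unknown codes."""
    return POLL_INTERVAL.get(periodicity_uri, default)

annual = poll_interval(SDMX_FREQ + "freq-A")
```

A fixed lookup like this cannot express irregular rhythms such as "every three months, except when it isn't", which is the limitation raised just before.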

<newton> https://www.w3.org/TR/dwbp/#AccessUptoDate

@David we use it for the external customer, but not internally

<AxelPolleres> Willem: dcat-ap has frequency of update

Axel: it would be interesting to have the growth rate

<AxelPolleres> Axel: or other characteristics, like order.

<AxelPolleres> Willem: dcat-ap also has special things like biweekly/fortnightly.

<AxelPolleres> Newton: travel restrictions are another example of data that changes over time.

<AxelPolleres> … some government agencies have only current, others back over 10 years.

<AxelPolleres> some common metadata would be useful to build such an application.

<BernadetteLoscio__> I don't know if you discussed that, but I think we also need a better definition for versioning

<AxelPolleres> Javier: what do you use to process temporal/archived data? SPARQL? something else?

<AxelPolleres> David: different, not a standard way to do it.

<newton> +1 BernadetteLoscio__

<AxelPolleres> … big data technologies, SQL, SPARQL, etc. different systems

<AxelPolleres> Axel: what would be helpful in terms of standards/metadata?

<AxelPolleres> David: if metadata was more standardised that might help. important is benefits in terms of ease of use, enabling automation.

<AxelPolleres> … anything that simplifies the exchange of the description of the interface.

<AxelPolleres> … frequency of update, etc.

<AxelPolleres> Javier: should we talk about memento?

<AxelPolleres> Jacco: … which relies on content negotiation (we had talked about that)

<AxelPolleres> Phila: the Internet Archive uses it.
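
Memento (RFC 7089) performs content negotiation in the time dimension: the client sends an Accept-Datetime header and is directed to the version closest to that date. A minimal sketch of building that header with the Python standard library follows; the function name is hypothetical and no particular archive endpoint is assumed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(when):
    """Build the Accept-Datetime request header used for Memento datetime negotiation."""
    # The header value is an HTTP-date in GMT, so normalise to UTC first.
    as_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}

headers = memento_headers(datetime(2016, 12, 1, tzinfo=timezone.utc))
```

The resulting dictionary can be passed to any HTTP client when requesting the resource from a server that supports Memento.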

<AxelPolleres> Willem: we are planning to use it.

<AxelPolleres> … on the publications office metadata (from the 50’s to today), CELLAR project.

<AxelPolleres> … in the commission this is used in production.

<AxelPolleres> … we have an issue how to model changes over time, e.g. organisational changes, country changes, etc.

<AxelPolleres> … e.g. change from kingdom to republic for a country.

<AxelPolleres> Javier: to some extent reflected on wikidata

<AxelPolleres> Axel: “concept drift” ?

<AxelPolleres> David: similar example, what if companies merge

<AxelPolleres> … mostly done manually.

<AxelPolleres> … doesn’t happen overnight, so there is some transition phase when it’s unclear.

<AxelPolleres> … in the financial area this matters, if over 24hrs it’s not clear what the share price is, how it translates to yesterday’s share price.

<AxelPolleres> Axel: when starting the session I was aiming at much lower-hanging fruit… i.e. making recommendations for best practices to a) describe dataset change frequency and characteristics, b) align existing vocabularies in that space, c) allow describing different practices of slicing datasets based on temporal extent, etc.

<AxelPolleres> Javier: another issue is online APIs… e.g. metadata for APIs that indicates whether the data behind them has changed.
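
One way an API could signal whether its underlying data has changed is a content-derived tag that clients compare between requests, in the spirit of HTTP ETag validators. The sketch below is an assumption-laden illustration: the function names are hypothetical, and hashing the whole payload with SHA-256 is just one possible choice.

```python
import hashlib

def content_tag(payload: bytes) -> str:
    """Derive a quoted, ETag-style tag from the payload bytes."""
    return '"' + hashlib.sha256(payload).hexdigest() + '"'

def has_changed(previous_tag, payload: bytes) -> bool:
    """True if the payload differs from what `previous_tag` was computed from."""
    return previous_tag != content_tag(payload)

tag = content_tag(b"population,2016\n")
unchanged = has_changed(tag, b"population,2016\n")
changed = has_changed(tag, b"population,2017\n")
```

A tag like this tells a client that something changed, but not what or when, so it complements rather than replaces change-frequency metadata.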

<BernadetteLoscio__> I also think that we should have a better definition for the basic concepts! to have a kind of agreement

<newton> rescuing BernadetteLoscio__ question: I don't know if you discussed that, but I think we also need a better definition for versioning

<AxelPolleres> David: we use e.g. kafka based.

<newton> Is it in the scope of a (new) wg to create a definition about it?

<AxelPolleres> Axel: is there anything needed/best practice for notifications of changes, i.e. push vs. pull?

<AxelPolleres> Jacco: openarchives.org/rs resource sync

<AxelPolleres> Newton: WebMention (W3C rec track)

<AxelPolleres> … could be useful/related.

<AxelPolleres> … also the data-usage vocabulary by DWBP WG

<AxelPolleres> Axel: who would be in to push in a WG for such issues to be addressed?

<AxelPolleres> Jacco: could be in DCAT 2.0 spatial and temporal coverage (which was mentioned in the panel).

<AxelPolleres> Axel: I think discussing use cases would make sense, because common “categories” of use cases might require different modeling (e.g. evolving datasets that represent snapshots vs. delta/updates)

<newton> W3C WebMention -> https://www.w3.org/TR/webmention/

<AxelPolleres> David: only talk more about time-granularity and how to model that in DCAT, or talk also about which other areas/standards might be affected?

<AxelPolleres> newton: my impression is we don’t have a clear definition of version.

<AxelPolleres> … e.g. software versioning is clearer than data versioning, e.g. dbpedia example.

<AxelPolleres> Axel: dbpedia is a good example, because you could split it per dbpedia version, vs. different versions by resource taken from the wikipedia edit history.

<AxelPolleres> Axel: take-home/conclusion: extensions to model temporal extent, changes and versioning should be categorized and use-case driven.

<AxelPolleres> David: this would help us to understand what is the scope of standardisation in this space…

<AxelPolleres> … we have discussed *some* examples in the breakout-session.

<AxelPolleres> Newton: it seems no group has taken this issue seriously or considered it seriously as “in scope”.

<AxelPolleres> Willem: not only use cases, but also examples and current solutions should be collected.

[End of minutes]