Annotation break-out, TPAC meeting, Sapporo, Japan

28 Oct 2015

See also: IRC log


csarven, rhiaro, tantek, rob sanderson, manu sporny, benjamin young



<m4nu> rob: Hi Rob Sanderson, co-chair for Web Annotations WG... Ben Young is a co-chair.

<m4nu> Rob: Academically, I've always been interested in annotations, allows people to give feedback on annotations.

<m4nu> David: Hi, David Burns - co-editor for web ??? specification - user emulation in the browser, wanted to see what this was about.

<m4nu> John: Hi John Jansen from Microsoft, working on Webdriver spec, mapping test suite to spec, leveraging annotations to map words in a paragraph to spec - interested in where it's going.

<m4nu> Malena: Hi Malena - working on automatic annotation on fashion data - image recognition on Semantic Web.

<m4nu> Amy: Hi just joined

<m4nu> Sarven: Joined web annotations recently, at MIT -

<m4nu> Alex: Hi Alex Milowski - worked on annotations and scientific data - anything web/annotations piques my interest.

<m4nu> ???: Wanted to see what's going on here.

<m4nu> Philippe: ? Hi, from Paris - research in the field, would like to see extension of annotation to semantic

<m4nu> Kazui: Kazui Sako - wanted to learn more about web annotations - personal data store, manage it, control it, etc, working on it.

<m4nu> Takari: Takari from ???

<m4nu> Ben: Hi Benjamin Young with Hypothesis and co-editor of data model.

Brief Walkthrough

<m4nu> Rob: Just wanted to do a brief walkthrough - five to ten minutes. Let's keep questions until the end, put yourself on queue.

<m4nu> Rob: Why people care about annotations - user comments on pages - don't read the comments, but want to solve that problem, want to tag posts, make review of products, academic paper, describe content that's not easily accessible for screen readers

<m4nu> Rob: Transcription of video, audio, replying in a threaded mode could be an annotation on an annotation, copyediting - instead of having editor wars, anyone might be able to propose a change by making an annotation.

<m4nu> Rob: System for annotating and moderating - those have been indirections on content, moving your own annotations between systems. Moving annotations from Kindle to another ereader.

<m4nu> Rob: Doesn't need to go to anyone outside your circle. We have a long list of use cases

<m4nu> Rob: We lay out a set of sorts of things you'd want to do.

<m4nu> Rob: A brief history of annotations on the Web. This was in Mosaic in 1993 - annotations. Running a server where annotations would be stored, would cause legal and operational problems.

<danbri> see also https://lists.w3.org/Archives/Public/www-annotation/

<m4nu> Rob: Ever since the very beginning of the Web, the notion of users annotating the Web - both read and write - has been part of the vision and part of technical reality. Now that we have superior capabilities, time is right again to try it.

<m4nu> Rob: In 2001, there was an Annotea protocol

<m4nu> Rob: In 2009, Google created Sidewiki, but stopped in 2011

<danbri> https://lists.w3.org/Archives/Public/www-talk/msg02698.html Announcing www-annotation@w3.org and www-collaboration@w3.org (1996)

<m4nu> Rob: In 2011, the Open Annotation Community Group was created - OAC was focused more on humanities.

<bigbluehat> also, those email lists are public, so *please* join in the conversation

<m4nu> Rob: In 2014 the Web Annotation Working Group was chartered.

The Annotation Ecosystem

<m4nu> Rob: You have someone that creates a page, then someone else annotates that web page.

<m4nu> Rob: We have a serialization using JSON-LD to write that annotation down, and write it to a persistent storage mechanism.
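
To illustrate the serialization Rob describes, here is a minimal sketch of an annotation written out as JSON-LD. The property names follow the shape of the Web Annotation data model as later published; the draft discussed at this meeting may have differed in detail. The function name, identifiers and example.org URLs are hypothetical.

```python
import json


def make_annotation(anno_id, body_text, target_url):
    """Build a minimal annotation as a plain dict ready for JSON serialization."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,                # hypothetical identifier for the annotation
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": body_text},
        "target": target_url,         # the page being annotated
    }


annotation = make_annotation(
    "http://example.org/anno/1",
    "Nice explanation of anchoring.",
    "http://example.org/page.html",
)
# The serialized string is what would be written to a persistent store.
serialized = json.dumps(annotation, indent=2)
print(serialized)
```

Developers can treat the result as ordinary JSON; the @context line is what lets an RDF-aware consumer interpret the same document.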

<danbri> see also e.g. 1998 debates around Netscape "what's related" functionality — http://www.interhack.net/pubs/whatsrelated/ (Netscape fetched RDF annotations from its related link service for each page you viewed)

<m4nu> Rob: We also have an anchoring mechanism to talk about a bit of the page.

<m4nu> Rob: If you have a small range of text, fragments don't help you. Commenting on regions of images rather than ranges of text.

<m4nu> Rob: The annotations may need to be read by the person or other people as well, new user might want to comment on the annotation.

<m4nu> Rob: They may have their own store. Final step, as far as protocol goes, for notification, for all annotations, it would be useful to have a system that aggregates them. Publish that there were annotations about image - tells publisher, maybe render them as a part of the display.

<m4nu> Rob: That's the architectural vision that you're making good progress towards - we have two WDs, data model

<m4nu> Rob: We are trying to simplify implementations, our focus is on looking at JSON and JSON-LD for ease of development, so that devs can look at document, rather than having to worry about lots of RDF stuff, even though it's RDF underneath.

<m4nu> Rob: Focus has been on CRUD - use hypermedia APIs to do that, also discovery, fortunate enough to have TimBL to join our meetings over last couple of days - if you can't discover annotations and where you create them, one to one mapping of client and server - that's one area that we're trying to solve w/ Web Architecture.

<m4nu> Rob: Ongoing implementation - we don't want new query languages - we're looking at what the minimal filtering requirements are that might have millions of annotations, internationalization

<m4nu> Rob: On client-side, we're looking for 'find text API' - context - we need to work on finding text.

<m4nu> Rob: We are trying to make it simpler, and make it more internationalized.

<m4nu> JohnJansen: Is there deep linking in the specs? A URI to point to an anchor?

<m4nu> Rob: There has been discussion around find text API and to see if that could be used in the fragment - you wouldn't want to put the entire text of a chapter into a URI - so there are technical issues - around fragments and URIs

<m4nu> Rob: Technically, fragments are defined by media type - lots of requirements and discussion, we may not deliver anything in that space. We do have people asking for exactly that.

<JohnJansen> :-)

<m4nu> Rob: This stuff is pretty simple (shows example)

<kevinmarks> how small a fragment is not useful?

<m4nu> Rob: There are technical details that we could go through.

<kevinmarks> across the entire web a ten word phrase is pretty good at uniqueness

<m4nu> Rob: if you want to comment on our specs, you can annotate them - we're dogfooding.

<kevinmarks> within a document how small a phrase do you need?

<m4nu> Rob: To try and dogfood, we enabled annotation on the working drafts.

<m4nu> Rob: There are specs on TR space and on Github.

<m4nu> ???: What about the use of this stuff for ePub standard?

<m4nu> Rob: Working with markus on epub - been implemented in a few reading systems, timing is always problematic.

<kevinmarks> yep

<kevinmarks> if someone has done work on uniqueness length I'd love to hear about it

<m4nu> Rob: There are a few changes from the CG spec that are not backwards compatible - for example, to embed text body - use EARL but it was never sent to CR.

<csarven> BartvanLeeuwen: There is http://www.w3.org/annotation/diagrams/annotation-architecture.svg

<JohnJansen> +1 to kevinmarks

<m4nu> Rob: There are a few changes that would have to be made, but relatively minor.

<m4nu> Rob: Several of us are also a part of digital publishing WG - we share a staff contact.

<m4nu> Rob: There are ongoing conversations between communities.

<m4nu> ???: All annotations are human produced?

<m4nu> Rob: At the moment, the majority of the use cases assume human produces them, in scientific area - a lot of work being done in NLP.

<m4nu> Rob: UIMA annotates text and uses CG spec to produce those annotations, all machine produced. We've done our best to allow for those use cases without modifying the model either way. If there are issues, we don't want to make them unusable.

<m4nu> Helena: Machines have special knowledge - the knowledge that they have is of a different quality.

<m4nu> Rob: One thing we don't have in there yet is the confidence of the implementation - only 50% confident (a machine might say that)

<m4nu> Rob: Since focus has been on annotating web resources, we haven't put it into the model - since it's JSON-LD, we can add those features later w/o breaking the model. We had a meeting w/ I18N folks, NLP interchange format on Monday, they're interested in assisting to see if NIF can use annotation model.

<m4nu> Rob: Confidence is one of the requirements.

<m4nu> Helena: I'm part of multi-modal interaction WG - we have EMA - supports annotation use cases also - have supported recommendation already - in discovery w/ another approach using semantic web also.

<Zakim> kevinmarks, you wanted to ask if there has been work done on uniqueness length

<kevinmarks> I'm not physically present, so if someone can read that out…

<m4nu> manu: (reads out kevinmarks' question: has there been work done on uniqueness length?)

<m4nu> Rob: There has been some work done on uniqueness

<m4nu> Rob: 32 characters was good enough for some very high percentage... 64 characters was almost 100% accurate - experiment was Wikipedia corpus, randomly select region of text, then see if you got back to the right block of text - so that test was done in English.
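
The experiment Rob recalls could be approximated along these lines. This is only a sketch of the idea (pick a random region of a given length, then check whether searching for it leads back to the same place); the function names, the trial count and the per-page scope are assumptions, not details of the original study.

```python
import random


def occurs_once(page_text, start, length):
    """Return True if the snippet at [start, start+length) appears exactly once in the page."""
    snippet = page_text[start:start + length]
    first = page_text.find(snippet)
    # Search again from one character later so overlapping repeats are caught too.
    second = page_text.find(snippet, first + 1)
    return first == start and second == -1


def unique_fraction(page_text, length, trials=1000, seed=0):
    """Estimate how often a randomly chosen snippet of `length` characters is unique in the page."""
    if len(page_text) < length:
        raise ValueError("page is shorter than the snippet length")
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        start = rng.randint(0, len(page_text) - length)
        if occurs_once(page_text, start, length):
            hits += 1
    return hits / trials
```

Running unique_fraction over a set of pages for lengths such as 32 and 64 would give per-page figures comparable in spirit to the ones quoted.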

<m4nu> Benjamin: If you are only sending the thing you want highlighted, in those scenarios, we provide prefix and suffix, which help w/ reanchoring. In Hypothesis, we have robust anchoring to provide edit space away from text.
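
A minimal sketch of how a prefix and suffix can help with re-anchoring: when the quoted text occurs more than once, the occurrence whose surrounding characters best match the stored context wins. This is an illustration under simple assumptions (exact match on the quote itself, character-by-character context scoring); it is not Hypothesis's actual algorithm, and the function names are hypothetical.

```python
def context_score(text, start, end, prefix, suffix):
    """Count how many characters of prefix/suffix match the text around [start, end)."""
    score = 0
    # Compare the prefix backwards from the start of the match.
    for i in range(1, len(prefix) + 1):
        if start - i < 0 or text[start - i] != prefix[-i]:
            break
        score += 1
    # Compare the suffix forwards from the end of the match.
    for i in range(len(suffix)):
        if end + i >= len(text) or text[end + i] != suffix[i]:
            break
        score += 1
    return score


def anchor_quote(text, exact, prefix="", suffix=""):
    """Return the start offset of the occurrence of `exact` with the best-matching context, or -1."""
    best_start, best_score = -1, -1
    start = text.find(exact)
    while start != -1:
        score = context_score(text, start, start + len(exact), prefix, suffix)
        if score > best_score:  # ties keep the earliest occurrence
            best_start, best_score = start, score
        start = text.find(exact, start + 1)
    return best_start
```

Because the context is scored rather than required to match exactly, a small edit near the quote lowers the score without losing the anchor.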

<Zakim> m4nu, you wanted to ask where are you in your timeline?

<kevinmarks> I suspect human generated ones would be word rather than character focused

<m4nu> manu: Where are you in your timeline?

<m4nu> Rob: We have a two year charter, we're pretty much halfway through - we're confident that we'll get the model and protocol and at least a stripped-down version of find text to CR by middle of next year.

<kevinmarks> is the 64/32 for unique across whole wikipedia corpus or within a page?

<m4nu> Rob: Then it's a question of how long CR takes - if we have to handle an extension - that would likely be granted.

<m4nu> Rob: We would hope that the next year will create enough momentum around things that we're not going to take to CR to get more use cases and requirements to know what to fulfill them, then sketch out pre-FPWD material.

<m4nu> Rob: We don't want there to be a gap

<m4nu> Rob: We're certainly thinking about it - people wanting to participate over next year would help w/ continuation of group.

<m4nu> Bart: How would this work w/ real-world objects?

<m4nu> Rob: If there is a URL to the object, you can talk about it - it's RDF, so all you need is a URI for the object.

<m4nu> Bart: If you take a real-world object, would augmented reality see the annotations.

<m4nu> Benjamin: Data model is RDF-based, you could put in geolocation - specific resource is more fine-grained, a component of a thing - highlight - form of selectors, prefix quote suffix, data position inside data file, text position, those are the ones that run into I18N problems. With a different set of selectors you could do a geolocation type thing.
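
One plausible reading of the I18N problems Benjamin mentions for position-based selectors is that an offset depends on how characters are counted. That reading is an assumption, not something stated in the discussion. The sketch below shows the same string measured in Unicode code points (what Python counts) and in UTF-16 code units (what JavaScript string indices count); the helper names are hypothetical.

```python
def utf16_length(s):
    """Number of UTF-16 code units needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def select_by_position(text, start, end):
    """Return the slice identified by code point offsets [start, end)."""
    return text[start:end]


# U+1F49C lies outside the Basic Multilingual Plane, so it takes two UTF-16 code units.
text = "I \U0001F49C annotations"
start = text.index("annotations")

print(len(text), utf16_length(text))
print(select_by_position(text, start, start + len("annotations")))
```

A client that counts code units and a server that counts code points would disagree by one on every offset after the heart, which is why a position selector needs an agreed unit.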

<m4nu> Bart: It's been a while since I read the spec, should work.

<m4nu> Rob: Yes

<m4nu> Benjamin: Yeah, should work.

<m4nu> Rob: Catarina's project after she left Flickr - HistoryPin? wants to annotate items in the real world.

<Zakim> kevinmarks, you wanted to ask if the uniqueness was within a page or through whole corpus.

<m4nu> Benjamin: Other use case that came up - digitally storing annotations in physical books - page number, use digital selector, text position to anchor inside book - closer shot at anchoring - we had most of what they needed.

<m4nu> Kevin: Was it the entire wikipedia corpus or per page?

<m4nu> Rob: It was on a per-page basis - so if you use a fragid on a page, what's the likelihood that you mis-link.

<m4nu> Rob: You need to see how long it takes for the anchoring to become obsolete... don't remember if anchoring was through time.

<Zakim> m4nu, you wanted to ask about credentials and digital signatures.

<m4nu> Manu: What about digital signatures, what about credentials, where does that fit into your timeline?

<m4nu> Rob: Very trivial agents that can be associated with annotation, target could be another organizations, however, w/o credentials or signatures you could trivially spoof information - publish annotations that claim you're the author of body - reputation models, that's an issue.

<m4nu> Rob: This is really important to get right - if you want to spam someone, million followers, a million followers get spammed w/ extraneous content - we know it's going to be important - we want to make sure it'll be possible in future, we don't have time to do that right now, but it's certainly on radar to work on actively - would want to get started - could be another tick mark on the ledger for seeking a second charter.

<Zakim> alexmilowski, you wanted to ask What is the status of the other items on the charter or have they been rolled up into the existing WD documents?

<m4nu> Alex: Are these things rolled into other things, are there things that still need to be done - six areas of work - serialization, data model, protocol, client-side APIs that use protocol.

<m4nu> Rob: Data model for specification, model + vocab + serializations - protocol stands alone, but doesn't have stuff for notification or search

<m4nu> Rob: We can't solve search in this iteration - client-side API, robust anchoring - great if you have experience requirements interest - find text API will start to be addressed for robust anchoring.

<m4nu> Rob: However, we do not yet have a client-side, make it easy to create/manipulate annotations in a browser - there was a pre-WD spec written up by Nick Stenning, but we haven't been able to take that forwards w/ enough momentum, one of the issues in the WG - lack of input from WebApps side. We've been trying to work with WebApps - find text, trying to make sure we don't spec something that doesn't work w/ other APIs.

<m4nu> Rob: Call for help in that space - collaboration.

<Zakim> m4nu, you wanted to ask about @id and @type

<m4nu> Rob: If our answers are not complete, please ask.

<m4nu> Benjamin: There are a lot of JSON databases that use ID and Type in a different way - annotator already uses UUID, for those developers, they would have to change the context than putting square brackets - so namespace is a new thing - sorry about brackets - lower pain point, change all IDs to URLs. We felt this could co-exist next to existing JSON - could be thrown away.

<m4nu> Benjamin: Leaving them around is problematic - but less so than ...

<m4nu> Benjamin: Spec - context is optional - could express JSON in this shape, or you could upgrade it - you could add a link header.
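
The link header Benjamin mentions corresponds to a mechanism in the JSON-LD specification: a server can serve plain JSON and point to a context with an HTTP Link header whose relation is http://www.w3.org/ns/json-ld#context. A minimal sketch of formatting such a header value follows; the context URL and function name are hypothetical.

```python
# Link relation defined by the JSON-LD specification for interpreting JSON as JSON-LD.
JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def context_link_header(context_url):
    """Format an HTTP Link header value pointing plain JSON at a JSON-LD context."""
    return '<{}>; rel="{}"; type="application/ld+json"'.format(
        context_url, JSONLD_CONTEXT_REL
    )


# Hypothetical context location, for illustration only.
header_value = context_link_header("http://example.org/contexts/annotation.jsonld")
print("Link: " + header_value)
```

With this approach the JSON body itself needs no @context key, so existing clients that ignore the header keep working unchanged.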

<scribe> scribenick: rhiaro_

m4nu: the best option seems to be to get rid of all the @ signs to make it easier for js developers
... alias @id to id and @type to type
... are you working with legacy data that has id and type?

bigbluehat: we're working with annotations systems all over the web, and we're trying to get them to upgrade
... this was the easiest way. So we're not destroying keys they already depend on
... It's still an open question
... It was put in and taken back out
... We could consider undoing that

m4nu: how much of your user base are you going to destroy by introducing this weird @ stuff into this data
... do you want to get more adoption at the risk of making the data uglier forever?

bigbluehat: I come from CouchDB land which has these ugly underscore prefix things
... people have just got used to these
... as being not their stuff
... everything else can be their stuff
... that's not great either
... we don't want to pollute someone else's keyspace at all
... Some amount of developers, mongo and couch, are okay with the shape of their json changing if I use that database, I now have these keys I don't like
... But the @ one is a little awkward because it requires 4 more characters, underscore does not

m4nu: we use mongo and couch and have aliased everything and it worked fine

bigbluehat: I just mentioned those as they are doing awkward ids

m4nu: just because they are doesn't mean you do
... The @ signs were put in there so legacy json data could easily be ported to json-ld
... I understand you have legacy data, I understand you don't want to alienate those devs or make them rewrite applications, that's valid
... but if you could change it and they could agree the change to their data and their apps, and have a nicer format that looks just like json, that would be best

azaroth: one other issue that came up in the discussion was we want to use things like activitystreams for having eg. a collection of annotations
... if at all possible we don't want to create another collection spec
... as2 uses @id and @type
... we were concerned that if we did aliasing and they don't and we wanted to use them together that would cause problems

m4nu: right, that would be a problem
... when we first did the @id thing we hoped that would never happen, but now it is
... and in schema.org

tantek: we can fix it

m4nu: alias it

danbri: just the @ sign?

m4nu: alias any json-ld keyword that starts with an @ sign
... to not have an @ sign
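
A minimal sketch of the aliasing m4nu describes. In JSON-LD a context can map a plain term such as id or type to the keyword @id or @type, so documents can use the plain names. The small function below only illustrates what those aliases mean by renaming keys back to keywords; it is not a JSON-LD processor, and the function name and sample data are hypothetical.

```python
# A context fragment aliasing two JSON-LD keywords to names without the @ sign.
ALIAS_CONTEXT = {"id": "@id", "type": "@type"}


def expand_aliases(node, context=ALIAS_CONTEXT):
    """Recursively rename aliased keys back to the JSON-LD keywords they stand for."""
    if isinstance(node, dict):
        return {context.get(key, key): expand_aliases(value, context)
                for key, value in node.items()}
    if isinstance(node, list):
        return [expand_aliases(item, context) for item in node]
    return node


aliased = {
    "@context": ALIAS_CONTEXT,
    "id": "http://example.org/anno/1",
    "type": "Annotation",
    "body": [{"id": "http://example.org/body/1"}],
}
print(expand_aliases(aliased))
```

The aliased form reads like ordinary JSON to a JavaScript developer, while a JSON-LD processor given the same context still sees @id and @type.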

bigbluehat: as1 has id but it is a uri
... so no problem there
... and means the same thing
... and used objectType earlier, so no cost to change
... For most of the formats an id and type shift is probably the most marginal change that they'd have to make
... Other things are harder
... Thank you. I didn't know about json-ld, could you put that in the spec

<tantek> m4nu: https://github.com/jasnell/w3c-socialwg-activitystreams/issues/

m4nu: we're trying to put together a best practices thing about that

tantek: could you file an issue against as2?

m4nu: okay

azaroth: after 5, q is empty!

<m4nu> scribe: m4nu

<tantek> https://indiewebcamp.com/annotation-use-cases

<danbri> m4nu, got an example of proper @context mapping syntax for @type -> type etc?

<danbri> i.e. for https://github.com/schemaorg/schemaorg/blob/sdo-phobos/api.py#L663

Tantek: Most of those annotations are post-types that they're annotating - if you want to look at more examples, is that compatible w/ your model - as input to social web work - best examples in the model - here are people posting replies/reviews. Mostly JSON-focused.

Benjamin: take found JSON, wordpress comments that are not JSON - upgrade those into the model
... Schema.org has JSON shapes that match or don't match - what you're doing, what you're not doing.

Dan: No way world will adopt single mechanism for annotations - there are too many different ways to do it.

Benjamin: This annotation, if you want 3 different types - knock yourself out.

xidorn: One more annotation use case - not aware of - why east asian video sites - Danmaku - text host where video is created, text will move with the video in some direction - in-video comment, probably another use case.

Benjamin: We don't have that written up as use case - fragment selector - media fragment on time-based positioning - this video and this 10 second mark.
... Any fragment-based selector ontology - 10 that you dereference - make it an RFC - reference non-RFC specs. - here's value hash, media time

<tantek> bigbluehat, hopefully you can mention https://indiewebcamp.com/fragmentions as well!

Benjamin: If XPointer had become something, it would've worked.

<tantek> fragmentions is essentially a modern HTML-based replacement for XPointer ranges

Rob: The list of things is not affected - these are examples. Most of them can be used in URIs. From 30 seconds to 60 seconds of this video, fragment according to fragment, media fragment
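
Rob's example of 30 to 60 seconds of a video can be written with the temporal dimension of the Media Fragments URI syntax, t=start,end. The sketch below builds and reads back such a fragment. It handles only plain seconds in the start,end form, not the full syntax, and the video URL and function names are hypothetical.

```python
from urllib.parse import urldefrag


def temporal_fragment(video_url, start, end):
    """Append a media fragment selecting [start, end) seconds of the video."""
    return "{}#t={},{}".format(video_url, start, end)


def parse_temporal_fragment(uri):
    """Return (start, end) in seconds from a '#t=start,end' fragment, or None if absent."""
    fragment = urldefrag(uri).fragment
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name == "t" and "," in value:
            start, end = value.split(",", 1)
            # An omitted start (as in 't=,60') means the beginning of the media.
            return float(start or 0), float(end)
    return None


uri = temporal_fragment("http://example.org/video.mp4", 30, 60)
print(uri)
print(parse_temporal_fragment(uri))
```

The same fragment can serve as the value of a fragment-based selector or be used directly in a URI, which matches Rob's point that most of these selectors can be expressed in URIs.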

<danbri> m4nu, ok i found http://www.w3.org/TR/json-ld/#aliasing-keywords :)

<Zakim> m4nu, you wanted to ask about type coercion in JSON-LD.

Manu: Why do you have so many @id?

Rob: We had a long discussion about this - let's take it to hallway discussion.

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.140 (CVS log)
$Date: 2015/10/29 00:46:20 $