Web Annotation Working Group F2F at TPAC 2014 (Santa Clara)

28 Oct 2014


See also: IRC log


Present: Frederick Hirsch, Rob Sanderson, Thomas Smailus, Ray Denenberg, Doug Schepers, Tim Cole, Paolo Ciccarese, Markus Gylling, Ben De Meester, Raphael Troncy, Randall Leeds, Dan Whaley, Jake Hartnell, Arron Eicholz, Benjamin Young, MarkS, Davis, John Pedersen (observer), Takeshi Kanai (observer), Erik Mannens, Tzviya Siegman, Gergely Újvári, Renoir Boulanger, Dave Lewis, Kristóf Csillag
Chairs: Frederick Hirsch, Rob Sanderson
Scribes: tilgovi, MarkS, nickstenn, TimCole


<trackbot> Date: 28 October 2014

<fjh> ScribeNick: tilgovi

Agenda review

fjh: we had a meeting with webapps which shepazu led
... we need to talk about the model, what we need to do to get an editor's draft out
... paoloC will give us an update
... we have to decide as a WG how to handle the issues
... we've added a topic for the social web group to visit us
... we'll take breaks and lunch
... we'll need to discuss implementations
... we should also discuss serialization
... HTTP API and Client side are lower priority
... we can also talk about i18n, security, privacy, but probably premature

azaroth: do we want to move implementation discussions up front?

fjh: maybe... it might inform what we're doing

rayd: we might want to give consideration for when some people have other meetings, when planning break times


fjh: is there any concern with the minutes from our last call?

paoloC: we discussed json-ld and sparql queries in the last meeting, we should revisit those

RESOLUTION: minutes from 15 Oct approved

Recap CSV

<bjdmeest> http://www.w3.org/2014/10/15-annotation-minutes.html

fjh: we met with the CSV WG, who is putting CSV files on the Web
... what they want to do is annotate that data -- cells, rows, etc
... the target is straightforward because they have a URL fragment that's well defined
... they have the CSV and then a separate metadata file
... they want to be able to embed the annotations into the metadata
... they understand annotations can be distributed separately, but they want this ability
... they have a JSON-LD format, where annotations may fit
... discussions about what's normative, what dependencies, etc. They will think about an extension point to decouple.
... They do not want the annotation representation to be complicated.
... Example: when the body is text, they just want text.
... so that discussion may resume
... they want to be done by August, but we should be okay because we can cover the data model this year
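
As a rough illustration of the CSV case fjh describes, here is a minimal sketch. It assumes the fragment syntax of RFC 7111 (`cell=row,column`) and the `hasBody`/`hasTarget` keys of the community group draft. The minutes do not say which fragment scheme the CSV WG uses, so the URL, the helper names and the fragment form are all assumptions.

```python
def csv_cell_target(csv_url: str, row: int, col: int) -> str:
    """Build a target URI for one cell, using an RFC 7111 style fragment (assumed)."""
    if row < 1 or col < 1:
        raise ValueError("row and col are 1-based")
    return f"{csv_url}#cell={row},{col}"


def simple_annotation(target: str, text: str) -> dict:
    """Hypothetical 'when the body is text, just use text' shape."""
    return {"hasTarget": target, "hasBody": text}


annotation = simple_annotation(
    csv_cell_target("http://example.org/data.csv", row=4, col=2),
    "This value looks like an outlier.",
)
```

Whether a bare string is acceptable as a body is exactly the open question taken up later in the data model discussion.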

<bjdmeest> minutes of the csv meeting: http://www.w3.org/2014/10/27-csvw-minutes.html

fjh: I don't think everyone in the room was convinced about the rationale for simplifying bodies

shepazu: I anticipate that most of the people that are supplying annotations are going to be things like (insert social media thing)
... they're not going to want (what they say are) complicated models
... Two things can address this:
... 1) A serialization. HTML or JSON that maps specifically, for simple cases, to the data model.
... 2) We might be able to do something with context headers.

fjh: We've talked about JSON-LD. There are tradeoffs, but it will help with the common cases.
... we'll talk about this later, though.

Recap WebApps Robust Anchoring

<MarkS> Minutes from discussion with webapps -> http://www.w3.org/2014/10/27-webapps-minutes.html#item27

fjh: We had an hour session with WebApps yesterday and talked about robust anchoring
... Doug (shepazu) presented
... Travis (Microsoft) gave some feedback to Doug
... Simpler summary: Doug gave the summary, things we've looked into, and got feedback about performance, etc
... We'll add robust anchoring to the agenda
... after serialization

azaroth: we should probably do an interest poll
... we should prioritize our time
... so everyone is feeling like they can contribute to the discussion
... straw poll for model, vocab, and serialization

fjh: We _have_ to do this

azaroth: HTTP API?
... Client side APIs?

TimCole: Can we wrap targeting into that?

fjh: We should maybe keep it separate.

azaroth: Robust anchoring?

<shepazu> https://specs.webplatform.org/anno-model/w3c/draft/specific.html

shepazu: an editor's draft is an informal specification, not necessarily consensus, it's scratch space
... when the WG says "we agree with this" then we can publish it as a working draft
... in this case we'll publish it as a first public working draft
... right now we have a draft that we're working from (the open annotation data model)
... published as an editor's draft on specs.webplatform.org
... the spec itself has annotation support so I propose we use annotations as a way to give feedback
... If anybody's interested in the break find me (shepazu) and we'll get you signed in
... There's a lot of value in us doing some simple annotations so that the AC understands what we're doing and how it impacts the W3C when Dan (dwhly) presents to the AC later

azaroth: Does anyone want to be the annotator scribe? While we discuss topics, they could add comments to the spec itself.

Implementation Review

UNKNOWN_SPEAKER: There's been a bunch of work at Hypothes.is, and by paoloC

paoloC: I've been focusing on the backend
... we have several applications that can produce annotations (CG model)
... interop has not been proven
... so the idea is to have a backend that can accommodate different clients

<fjh> interoperability at scale

paoloC: I've developed a backend that can receive annotations in the community group spec data model

<fjh> text mining services, entity recognition services can do work and store as annotations, including provenance

paoloC: And there is a framework to plug in text mining and entity recognition services to get back machine annotations
... I take existing services and build connectors to translate results into open annotations
... OpenCalais, DBPedia Spotlight, etc

<raphael> NERD (http://nerd.eurecom.fr) is a named entity recognition and disambiguation platform

paoloC: you can take these and connect them by translating results into open annotation
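
A connector of the kind paoloC describes might look roughly like the sketch below. The input (a document URL, character offsets and an entity URI) is a hypothetical stand-in for whatever a service such as OpenCalais or DBpedia Spotlight returns. The key names follow the community group draft's JSON-LD context as best recalled and should be treated as assumptions. This is not Annotopia's actual code, and provenance is left out.

```python
def entity_to_annotation(doc_url: str, start: int, end: int, entity_uri: str) -> dict:
    """Translate one (hypothetical) entity-recognition hit into an annotation dict."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return {
        "@type": "oa:Annotation",
        "motivatedBy": "oa:tagging",
        # The recognised entity becomes a semantic tag body.
        "hasBody": {"@id": entity_uri, "@type": "oa:SemanticTag"},
        # The matched character range becomes a selector on the source document.
        "hasTarget": {
            "@type": "oa:SpecificResource",
            "hasSource": doc_url,
            "hasSelector": {
                "@type": "oa:TextPositionSelector",
                "start": start,
                "end": end,
            },
        },
    }


hit = entity_to_annotation(
    "http://example.org/article.html", 120, 126, "http://dbpedia.org/resource/Berlin"
)
```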

<stain> I'll lurk from IRC

<raphael> NERD is summing up DBpedia Spotlight, OpenCalais and 12+ extractors

<fjh> ScribeNick: tilgovi

<paoloC> Annotopia server on github https://github.com/Annotopia

raphael: A commonality for all those tools is the ability to locate entity strings in the original text
... a lot of these tools use the same model, which is a NIF, an abstract way of locating strings in a text
... I think there is a connection here between robust anchoring and NIF (Natural Language Processing Interchange Format)

paoloC: there is another way emerging, called BioC, which was made by the biomedical community
... it's a bit tricky to translate into open annotation, but it would be nice to have a connector for that

<fjh> connectors have value to create open annotations from various sources

<paoloC> Annotopia server presentation http://www.slideshare.net/paolociccarese/annotopia-overview-by-paolo-ciccarese

<paoloC> Annotopia presentation at I Annotate 2014 https://www.youtube.com/watch?v=UGvUbFv0Zl8

<raphael> NLP Interchange Format (NIF) 2.0 - Overview and Documentation at http://persistence.uni-leipzig.org/nlp2rdf/index.html#specification

<paoloC> BioC - A Minimalist Approach to Interoperability for Biomedical Text Processing - http://bioc.sourceforge.net/

fjh: Nick, talk about what's new with implementation at Hypothes.is
... not everyone was at I Annotate or the Workshop, so some of this is new

nickstenn: I used to work on a project OpenShakespeare, and one of the things we built was an annotation tool
... it became obvious this might be a more general purpose tool

<azaroth> ScribeNick: tilgovi

nickstenn: it was extracted and became a thing called Annotator
... A "hack on a stick" of a javascript library for annotating HTML
... Slowly but surely more people got interested in it
... Eventually picked up by Hypothes.is and now sits at the core of a somewhat more mature annotation product
... In parallel, people who have worked on Annotator in the past are now working with or alongside Hypothes.is
... In the last six months or so, we've been focusing on making Annotator the library be a platform for implementing annotation applications on the Web.
... That means turning it from a ball of mud into an assembly of useful components.
... We are within a couple of months of being able to release Annotator v2
... It will hopefully have native support for something like Open Annotation
... At least harmonizing the vocabulary of the data model.
... Additionally, the UI has been pulled out, so you can pick what you need out of it

fjh: Is it Open Source?

nickstenn: Annotator is dual-license, MIT and GPL, which we may change and are interested to know what opinions are in the group
... Hypothes.is is also all open source

TimCole: We will probably get to the HTTP APIs today, but will not get to the Client API topic.
... How do we make use of this work, but still get our work done?

fjh: (summarizing) Question from Tim is, how does the WG activity relate to the tooling work?

shepazu: I think what's more relevant (HTTP API) than the Hypothes.is stuff is the stuff the Social Web WG is doing.
... I wouldn't want to create a new thing if there's something that is existing

nickstenn: I would add that we (Hypothes.is) are not here to shop our implementation.
... We're here to share our experiences (much of it is lessons about what is not good)

azaroth: At Stanford we built an LDP (Linked Data Platform) server that is compliant with the latest and then some middleware between the LDP and the server to make it easier to bridge annotations
... a very thin layer between the client and the LDP server

raphael: Having the CG data model as a starting point does not prevent us from making changes to it

shepazu: there have already been suggested changes

fjh: Another question about Open Source... do we have a sense for how Annotator has extended beyond the community here

nickstenn: Every time I talk to people about annotation I discover more people using annotator

paoloC: EdX is using it with extension for thousands of students

fjh: Is there more implementation work that's represented here that we should mention?

paoloC: Manchester University work, focused mostly on bio, is going to save the annotations through Annotopia in Open Annotation

TimCole: The Maryland Institute for the Humanities has some implementation

<paoloC> We are working to integrate Utopia http://utopiadocs.com/ for PDF with Annotopia using Open Annotation

raphael: Within the European LinkedTV work, there is some extension of the Open Annotation data model
... using media fragments URI for anchoring and the Open Annotation core for annotating
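
For readers unfamiliar with the approach raphael mentions, a temporal Media Fragments URI has the form `#t=start,end` (in seconds). The sketch below only shows how such a target could be built. The video URL and the function name are made up, and the resulting dict is an illustration rather than LinkedTV's actual output.

```python
def temporal_fragment(video_url: str, start_s: float, end_s: float) -> str:
    """Return a Media Fragments URI selecting the interval from start_s to end_s (seconds)."""
    if start_s < 0 or end_s <= start_s:
        raise ValueError("expected 0 <= start_s < end_s")
    # ':g' drops trailing zeros (10.0 -> '10'); very large values would need other formatting.
    return f"{video_url}#t={start_s:g},{end_s:g}"


clip_annotation = {
    "hasTarget": temporal_fragment("http://example.org/video.mp4", 10, 20),
    "hasBody": "http://example.org/notes/1",
}
```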

<raphael> LinkedTV core ontology: http://data.linkedtv.eu/ontologies/core/core.html


<azaroth> http://www.idpf.org/epub/oa/

<TimCole> Maryland Institute for the Humanities: http://umd-mith.github.io/OACVideoAnnotator/

mgylling: IDPF has been doing work on bundling annotations, etc
... I know of two implementations done, but we expect more to come
... we hope to follow any changes done in the model here in our work

azaroth: Who were the implementors?

<paoloC> Video that showcases the integration of Domeo Annotation tool and Utopia for PDF through Open Annotation and Annotopia https://www.youtube.com/watch?v=OrNX6Sfg_RQ

mgylling: One in Korea, and another, PubliWide (sp?)

<raphael> LinkedTV annotation server and SPARQL endpoint: http://api.linkedtv.eu/ and http://data.linkedtv.eu/sparql

alastair: We use CFI in iBooks to map references between editions and provide a fallback for annotation anchoring
... As far as storage, it's just Core Data
... Forward compatibility has been something we have to pay attention to. With Core Data that's very painful.
... We have had to leave some things empty to be populated in the future, so as not to change the schema

<fjh> CFI canonical fragment identifier

<azaroth> http://www.idpf.org/epub/linking/cfi/epub-cfi.html

Data Model Discussion

paoloC: Who is using a triple store in their implementation (besides me)?

raphael: <raises hand>

paoloC: In the last emails, the difference between model and serialization has been a little confused

<shepazu> ericP says Annotea used a triple store

paoloC: I just want to make sure that when we talk about model we talk about model and serialization as separate things

fjh: is that a consequence of how we're creating our deliverables?

paoloC: we decided in the oct 15 call that we would take the old spec and swap turtle with json-ld
... and a lot of the discussion came out of that
... (making lists in a nice way, etc)
... We agreed on that (the turtle -> json-ld) and taking out the sparql

nickstenn: Is one of the serializations less prone to attract discussion of the serialization? Is one better for focusing on the model?

fjh: I think the turtle is better for this

raphael: I think it depends on the people

bigbluehat: I'd like to see the turtle around

fjh: It's important not to lose the underlying benefits of the model and not get carried away with the serialization

nickstenn: It would seem to me (not knowing much about either) that having both side-by-side might be a sensible self-check

fjh: I think some people think there's a fear that some people will be scared off by the turtle
... I don't think that's founded
... I think we should have both in the document for now

shepazu: I think we've talked about this before and Ivan has experienced this... when people experience unfamiliar syntax and discover it's RDF they get scared off

fjh: We just won't tell them it's RDF
... I don't think we should yank it out

shepazu: I think it might hurt success

fjh: I think you're making assumptions
... we're assuming for some reason this is going to be a barrier to adoption

nickstenn: my take is that, as a rule, having more representations (if they're conceptually consistent) allows people to focus on the one that makes the most sense to them

<ericP> OWL 2 Primer gives readers a serialization choice

shepazu: We (W3C) have seen many people, especially in the browser community, move on when the discussion moves to RDF
... I think if we make it a representation of RDF we're making a big mistake, shooting ourselves in the foot
... companies we have interacted with (not just browser vendors) have said they are not interested in RDF
... These same companies often specifically prefer JSON-LD
... They do not realize that JSON-LD is RDF

bjdmeest: Why don't we make a tabbed interface (default on JSON)?

<Zakim> bigbluehat, you wanted to show off schema.org examples

bigbluehat: schema.org does that

fjh: we do make printable versions of recs, so I'm not sure how that would work, but I see.

<raphael> +1 for following the schema.org ways of tabbing multiple serializations regarding the examples to be used in the data model

fjh: I hear what you (shepazu) are saying, but I question whether the first document they're going to look at is the data model
... I think the browsers have other reasons for saying they're not interested, I think they're smart people, and I think they can read two representations.
... I don't buy the argument that RDF is such a scary thing. Maybe focusing on RDF a lot is, but not as one representation for examples.
... Maybe we should talk to some people, because I don't understand.

nickstenn: I have a lot of sympathy because I am one person who scratches my head when RDF is brought up
... If, as a group, we agree with Doug (fundamentally) that RDF is a thing that's scary then we should remove it entirely.
... Having a JSON-LD serialization isn't a solution if these people are scared by RDF
... I don't know who these people are that don't understand that JSON-LD is a serialization of RDF
... My preference would be to have both since I believe we cannot rip RDF entirely out of the spec

shepazu: I would be happy with a solution like schema.org (tabbed interface with different serializations)
... I would note that I was just with a schema.org person on Sunday and he joked that "at some point" we'll tell people this is RDF

azaroth: danbri

shepazu: People think of RDFa as something different from RDF
... before we go to first editor's draft we should think carefully about who we're trying to target

TimCole: Things are not static. One of two things will happen: people will hate RDF and recognize JSON-LD is RDF, or people will not be as scared when they realize they're already using it.
... let's make a decision before the first public draft, but I think times are changing

bigbluehat: <showing the json-ld.org playground for other examples of making things less scary and showing the equivalence>
... Turtle, to me, is this nice base.

shepazu: fwiw, as I was reading the spec, I was completely befuddled by the Turtle

fjh: I was the opposite

nickstenn: sometimes people are different from other people

raphael: we will not find a solution that fits all

tilgovi: <stares blankly, trying to scribe 5 people babbling at once>

<bigbluehat> JSON-LD by default, text/turtle as an additional tab...then...whatever else

azaroth: General consensus (maybe) is that a tabbed interface with JSON-LD as the primary one may have the most support?

<nickstenn> +1

fjh: The resolution would be that we'll keep both forms but add a mechanism to allow the reader to display the format they like.

PROPOSAL: We will keep multiple forms for serialization and use some mechanism to have JSON-LD as primary for the editor's draft

raphael: Process point: was that a resolution or a proposal for a resolution?

<mgylling> +1

fjh: we'll ask if there's consensus, but not have a vote
... I'm assuming, that you're asking if we need further discussion of the resolution that we're proposing

raphael: we should not scribe "RESOLUTION" before we have a resolution

azaroth: is there any objection to the proposed resolution?

nickstenn: We will have some examples, which will be serializations of a data model by virtue of the fact that the data model *is* abstract
... We were discussing whether or not we'd have Turtle in addition to JSON-LD... are we now discussing whether we'll have other ones?
... I'd like to move on.

fjh: I agree.

paoloC: Ultimately, we have to write a document. So let's have an understanding.
... Rob (azaroth) and I used pictures in the community group spec.
... We still used RDF because we have the Turtle
... I've seen abstract ways of defining these models that were not intimately connected to serialization.
... However, my personal position is that these ways always throw me off.
... Can we keep figures and the two serializations as we used to do?

fjh: I think we should agree to that now and keep that for the first editor's draft.

paoloC: We start with JSON-LD...

fjh: ... and we're going to iterate.

<raphael> +1 for the proposal


azaroth: I'll spend a minute describing the current issues
... Serialization of lists is a current issue. In the current data model the multiplicity constructs don't use RDF list consistently
... Do we change to a model that results in an easier serialization in JSON-LD?
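
To make the list question concrete: in JSON-LD a plain array is an unordered set unless order is requested, either inline with `@list` or once in the context with `"@container": "@list"`. The sketch below shows both conventions. The `item` key and the `http://example.org/vocab#` namespace are placeholders, not a proposal for the model.

```python
# Plain JSON array: JSON-LD treats this as an unordered set of values.
unordered = {"item": ["http://example.org/a", "http://example.org/b"]}

# Explicit @list wrapper: order is preserved (it maps to an RDF list).
ordered_inline = {"item": {"@list": ["http://example.org/a", "http://example.org/b"]}}

# Declaring the container in the context keeps the data itself a plain array.
ordered_via_context = {
    "@context": {"item": {"@id": "http://example.org/vocab#item", "@container": "@list"}},
    "item": ["http://example.org/a", "http://example.org/b"],
}


def is_ordered(doc: dict, key: str) -> bool:
    """Report whether `key` holds an ordered list under either convention above."""
    value = doc.get(key)
    if isinstance(value, dict) and "@list" in value:
        return True
    context = doc.get("@context")
    term = context.get(key) if isinstance(context, dict) else None
    return isinstance(term, dict) and term.get("@container") == "@list"
```

The context-based form is the one that gives the "easier serialization" for people writing ordinary JSON, at the cost of making the ordering invisible in the data itself.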

Data Model issues review

azaroth: The second issue is another multiplicity issue: should choice have a priority order or weighted items?
... Embedded content is a big issue: most current and previous annotations systems only allow textual comments
... so it is expected that the body of the annotation will be somehow embedded.
... The data model says that any resource can be used as a body.
... Hence, you could have a URI. If we have both at once we should have some way of embedding a representation within the graph.

<fjh> https://github.com/w3c/web-annotation/issues

azaroth: We used to use Content in RDF but they're not progressing so we *have* to change that.
... A recommendation is that we consider allowing string literals to be directly attached.
... Regardless of voting, this is a priority topic.
... Another issue is distinguishing semantic tags from resources.
... We currently distinguish a tag by giving it a class of oa:SemanticTag
... Another issue: if we have lists do we need both sets (unordered) and lists (ordered)
... Another issue: Do we need to be more expressive about agents and provenance?
... Intended audience. This issue is about some resources having intended audience. Example: some resources may be better for K-12, some for university ... how do you express that?
... Also, Annotations could have an intended audience.
... Should the annotation concept and document be distinguished? Another issue -- we decided not to have two URIs, one for the serialization and one for the conceptual annotation.
... This means we had to attach the annotator and the serializer to the same resource. This makes life easier for consumers but is maybe bad for provenance.

paoloC: The "Role of Target/Body" issue is when we have multiple targets and bodies how do we distinguish and relate them

azaroth: JSON-LD context issue: the current context uses exactly the same names (e.g. hasBody) as the RDF, but that may be unintuitive for those more familiar with "regular" JSON
... The final one: should we, and how, do we allow literals directly as a body.
... The embedded content issue is, I strongly believe, the big one.
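
The context issue azaroth raises can be pictured as two contexts that map different JSON keys onto the same properties. In the sketch below, `body` and `target` are hypothetical friendlier aliases, and `expand_keys` is a toy stand-in for real JSON-LD expansion that only handles top-level keys.

```python
OA = "http://www.w3.org/ns/oa#"

# Context in the style of the current draft: keys mirror the RDF property names.
rdf_style_context = {"hasBody": OA + "hasBody", "hasTarget": OA + "hasTarget"}

# Hypothetical aliases aimed at people used to "regular" JSON.
json_style_context = {"body": OA + "hasBody", "target": OA + "hasTarget"}


def expand_keys(doc: dict, context: dict) -> dict:
    """Replace each known key with its full IRI; unknown keys pass through unchanged."""
    return {context.get(key, key): value for key, value in doc.items()}


a = expand_keys({"hasBody": "http://example.org/b", "hasTarget": "http://example.org/t"}, rdf_style_context)
b = expand_keys({"body": "http://example.org/b", "target": "http://example.org/t"}, json_style_context)
```

Both documents expand to the same property IRIs, which is why the choice of key names is a serialization question rather than a model question.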

rayd: the proposal you made a week ago is still a viable way forward?

paoloC: <discussing adopting content as text> The only advantage is to have no dependencies, but the meaning will overlap almost exactly.

rayd: There's a draft spec for bibframe in the library community that directly references content in rdf

TimCole: For our purposes, we would not want the entire thing

nickstenn: from my perspective being able to embed small bodies is probably a requirement for many people embedding annotations in their system
... so it is probably important that we have support
... I have no technical insight into the right answer, but we certainly shouldn't build something if a good specification already exists

azaroth: We should ask the content in rdf group if it's alright with them if we adopt their namespace for work (minus the xml)
... or we should do it ourselves. These are the two options.

TimCole: Is anything else priority that we have to get to, or can we do it all on email after?
... Are there opinions in the room?

nickstenn: what's the difference in use case between a literal and an embedded resource?

rayd: I think there are two separate issues: supporting embedded content and allowing (or not) literals.

TimCole: even if we allow literals as a body, we may still want to accommodate embedded resources

nickstenn: to clarify, my understanding was that if we were to take content as text it would probably be a limited subset (utf-8, text)
... is the only difference between these two types of body that you could have different metadata?


azaroth: yes

shepazu: we have a tight charter, we should not do things that are about fundamental features of rdf. I suggest people who want to do that join the WG under whose charter this work is done.

TimCole: but the annotation wg still has to decide...

nickstenn: if we decide this is important for annotation then this working group has to do this work. why would we do it again ourselves?

raphael: this is all tied to the fate of whether the content in rdf document is still in progress anywhere in w3c

bigbluehat: I think shepazu's concern is that this conversation only affects rdf representations

shepazu: we'll talk offline and we'll address it some other time

fjh: maybe it's more productive to figure out first exactly what we need and then discuss which is the best way to address it

TimCole: the use case that motivated this was wanting to be able to talk about the annotation body rather than just the annotation
... in the case of text... is it HTML text? some other specialized text?
... (which may come up in the CSV use cases... or maybe not)
... We may have an annotation that has multiple bodies... these are use cases which I think are real
... that require a way to talk about the text that is the body of an annotation

<JohnPedersen> or if you wanted to annotate MathML - or does that count as text?

TimCole: that's not to say there aren't more use cases where you just want the string

azaroth: I sent an email last night with the various options we should look through

<paoloC> Rob's email http://lists.w3.org/Archives/Public/public-annotation/2014Oct/0107.html

azaroth: the simplest case is simpler than now if we allow a string in hasBody
... currently, we would do {"hasBody": {"rdf:value": "this is the comment"}}
... a proposed simplification is {"hasBody": "this is the comment"}
... however, if we allow literals, we have to allow this construct {"hasBody": {"@value": "This is the comment", "@language": "en"}}
... The concern is if you come from an RDF store that you'll get the language representation
... another concern is the distinction between (3) and (5) (using @type vs "dc:format", which allows any mime types)
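
From a consumer's point of view, allowing all three shapes quoted above means branching on the form of the body. The sketch below handles only those three shapes. The numbered options from Rob's email are not reproduced in these minutes, so no mapping to them is implied, and the function name is made up.

```python
def body_text(annotation: dict):
    """Return (text, language) for the three textual body shapes quoted in the minutes."""
    body = annotation["hasBody"]
    if isinstance(body, str):
        # {"hasBody": "this is the comment"}
        return body, None
    if isinstance(body, dict) and "@value" in body:
        # {"hasBody": {"@value": "This is the comment", "@language": "en"}}
        return body["@value"], body.get("@language")
    if isinstance(body, dict) and "rdf:value" in body:
        # {"hasBody": {"rdf:value": "this is the comment"}}
        return body["rdf:value"], None
    raise ValueError("body is not one of the three textual shapes handled here")
```

A body given as a URI string would be indistinguishable from a comment in the first branch, which is the ambiguity TimCole raises next.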

TimCole: Another concern is that sometimes we might get a string literal and sometimes we might get a URI
... is there really a problem with allowing all of the examples?

nickstenn: Is it harder, if you've implemented your own storage, to automatically spit out objects? No. But if you are aiming for true interoperability then yes, it's harder.

bigbluehat: the later ones that inline the context are tedious if it's just a string body
... one of the things I love most about JSON-LD is the context object

nickstenn: there is another difference between an implementation that provides for both (1) and (2)... there is an on-ramp
... the implementation complexity of having to deal with different types is comparable or lower than the complexity of dealing with big, complex resources

<fjh> randall: concern about allowing 1 and 8a, which might result in an ambiguity

<fjh> randall: want to have consistent context

<fjh> rob: they are mutually exclusive

azaroth: The last thing to point out is that there is no equivalent for (7) in literal form. It's not allowed to have a type and a language. That won't change any time soon.
... If you have to have a format and a language then you must use a resource.
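
The rule azaroth states (a literal cannot carry both a type and a language, so needing both forces a resource) can be sketched from the producer's side as below. Falling back to a resource whenever any format is given is a simplification made here, and `dc:language` is an assumption made by analogy with the `dc:format` key mentioned above.

```python
from typing import Optional, Union


def make_body(text: str, language: Optional[str] = None,
              media_type: Optional[str] = None) -> Union[str, dict]:
    """Pick the simplest body shape that can still carry the given metadata."""
    if media_type is None and language is None:
        return text  # bare string literal
    if media_type is None:
        return {"@value": text, "@language": language}  # language-tagged literal
    # A format is needed: use an embedded resource, which can hold both properties.
    body = {"rdf:value": text, "dc:format": media_type}
    if language is not None:
        body["dc:language"] = language  # assumed property name
    return body
```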

TimCole: what we're basically doing is ignoring the preference and saying "you have to parse it"
... one proposal was to have hasBody and hasLiteralBody, but most people (I think) thought that was unnecessary

nickstenn: brief point about the nature of the trade-off here
... one of the reasons people might accept that utf-8 was adopted more than utf-16 is that a broken utf-8 encoding is more useful than a broken utf-16 encoding
... the analogy is that we're imposing the cost of discovering implementation failures later if we allow simple cases because implementations may get on well for a while until suddenly something breaks in a complex example

azaroth: the distinction between (2) and (6), and (3) and (5), is going to be confusing to people. Having sometimes an "@" and sometimes not puts bumps in the on ramp.

TimCole: the only thing on this we've experienced in the past is that some people will assume the model cannot support the complexity they need if no one is implementing it

fjh: 80% or 90% are going to want (1) for usability, but a naive implementor may then not implement the other cases.

nickstenn: they may not be naive, just busy

paoloC: If you are creating your own client, you make a decision.
... but if I implement a backend for interop then I have to accept different clients
... Maybe not everyone wants to do that, but our goal is interop

shepazu: I want to make sure we understand there are classes of conforming agents here. We should make the conforming agents explicit in the document which we don't do now.
... If we think it's okay for a MUST and a SHOULD so that authors and implementors could conform to one, but not the most complex
... that isn't great for interop, but it is understandable.

nickstenn: my intuition is that we should write a spec that encourages people to implement more of the spec but that gives a starting point so there are at least some implementations that conform to some subset

<fjh> levels of conformance

azaroth: one way to avoid this is to have a separate property for literal bodies

<paoloC> Maybe we could think of having multiple levels of compliance to the specs

azaroth: when you see "body" it's always a resource. when you see "literal body" it's always a literal.

JohnPedersen: when we say "string literal" does that include being able to include mathml (without loss of generalization) in your comment?

azaroth: No, but you would have to specify the format of that literal.

TimCole: That's the motivation for case (3)... give the consuming application information about how to interpret that literal

<fjh> issue: how to specify language of annotation

<trackbot> Created ISSUE-8 - How to specify language of annotation. Please complete additional details at <http://www.w3.org/annotation/track/issues/8/edit>.

<fjh> ISSUE-8: annotation is relationship of body target, hence they have languages but not annotation itself

<trackbot> Notes added to ISSUE-8 How to specify language of annotation.

nickstenn: to me having a separate literal body relationship muddles the line between model and serialization
... from an implementation perspective, detecting the presence of a key is similar in complexity to detecting the data type of the value
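
nickstenn's point about comparable complexity can be seen by putting the two consumer strategies side by side. This is only a sketch: `hasLiteralBody` is the property name floated earlier in the discussion, not something the group adopted, and the function names are made up.

```python
def read_body_by_type(annotation: dict):
    """Single property: inspect the data type of the value."""
    body = annotation["hasBody"]
    if isinstance(body, str):
        return ("literal", body)
    return ("resource", body)


def read_body_by_key(annotation: dict):
    """Two properties: inspect which key is present."""
    if "hasLiteralBody" in annotation:
        return ("literal", annotation["hasLiteralBody"])
    return ("resource", annotation["hasBody"])
```

Either way the consumer makes one check per annotation; the difference is whether the distinction lives in the model (two properties) or in the serialization (one property, two value types).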

paoloC: I see a parallel to the tags. We had a separate relationship for semantic tags, but now we add a class to the body.

azaroth: Agree with Paolo bringing back tags to the discussion, a lot of people will want to use literals for tags and bodies and that makes it hard to understand what is a tag and what is a comment.

<JohnPedersen> resume at what time?

<JohnPedersen> what time are we starting up again?

<JohnPedersen> thanks fjh and assuming that is PDT :)

<shepazu> https://specs.webplatform.org/anno-model/w3c/draft/core.html

<JakeHart> ^^ Please annotate the above document.

<MarkS> scribeNick: MarkS

azaroth: recommend 4-7
... literals for the body
... option 1 and 4-7. should not use 2-3. onramp for simple annotations, anything more than baseline, you end up in a comparative model
... then it has to be 8b

NS: right solutions would be to define levels of implementation
... ordered by the levels of complexity
... recognize that implementers may want to start more simply

Paolo: levels are a good idea. we have levels for our collections, different assets, etc
... option 1, if I have a triplestore, and it comes back with non-conforming text, what should I do?

azaroth: you should convert to 6

NS: not something we should have to handle

azaroth: do we want to discuss levels then?

[agreement to table modeling for levels]

azaroth: if we have levels of compliance, we can then create rules for handling exceptions

nickstenn: levels are realistic assessment of complexity

alistair: SVG had something similar where people weren't able to implement the full spec and no mechanism for indicating full or partial implementation

azaroth: here is an example

shepazu: profiles are not very popular

nickstenn: what is the distinction of a profile, how does it differ from a level?

shepazu: i think we should talk to implementers to see how they feel about it.

nickstenn: in the IETF's URI Templates spec, every implementation of this spec is level 4. it's smaller than what we are taking on, but it's been successful here
... you can create a level 0 in no time, but everyone wants to be level 4
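
To give a sense of how small the lowest rung of the URI Templates ladder is, here is a sketch of simple string expansion only (RFC 6570 numbers its levels from 1, so this is Level 1). It is an illustration of the "start small" idea, not a conforming implementation, and the function name and example URL are made up.

```python
import re
from urllib.parse import quote

# Matches only simple expressions such as {id}; operators like {+path} are not matched.
_SIMPLE_EXPRESSION = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_level1(template: str, variables: dict) -> str:
    """Expand simple {var} expressions; anything more advanced is left untouched."""
    def substitute(match):
        value = variables.get(match.group(1))
        if value is None:
            return ""  # undefined variables expand to nothing
        # Percent-encode everything except unreserved characters.
        return quote(str(value), safe="")
    return _SIMPLE_EXPRESSION.sub(substitute, template)


url = expand_level1("http://example.org/annotations/{id}", {"id": "Hello World!"})
```

The higher levels add operators, multiple variables per expression and value modifiers, each building on the previous level, which is the property nickstenn is pointing to.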

azaroth: is this like CSS?

shepazu: they build on each other, much like levels
... they are sequential

paolo: the effect this will have on the spec, I will start with literal and then describe resource
... to avoid levels becoming different specs

shepazu: levels are not popular among browser vendors. people will push back against optional features.
... need to talk to community to collect feedback

alistair: i would look at this and the more complicated it gets and if it doesn't fit our needs, I can't justify it to my boss.
... we already have a "level 0"
... it will be good enough for us at some point
... even with ePub, we don't support all of ePub
... in fact, I don't know of any ePub reader that supports everything

shepazu: it may be that we do have the next level as an optional feature.

nickstenn: i agree with shepazu that this kind of thing can introduce complexities that I don't think we want to deal with.
... don't want to end up in feature detection land
... that's not interop
... clear precise levels, building on the previous one is the ideal
... allowing them to identify which piece they want to focus on first, and having a shared language they can use to describe level of support

shepazu: perhaps we should be flexible enough that if things go beyond the model, they are still conforming

azaroth: anything that is not prohibited is allowed

takeshi: certain profiles might be limited to device/platform

shepazu: i'm leaning towards simplicity here

alistair: simple fallback, if you don't understand what I'm sending, you can understand a basic set of it

paolo: in terms of writing the spec, is it better to start simple and increase in complexity?
... or should we write out the entire spec and then break it out into levels

shepazu: have a specification, then test for implementation support, whatever doesn't get implemented gets moved to a next level spec

bigbluehat: it's dangerous to do

paolo: this sounds like very short term planning

shepazu: deferring features from one version to the next. builds momentum very well

nickstenn: too early to be talking about testing at this point. in terms of whether or not we do levels or what we think people will implement... I think we have a small spec now, easy to implement

azaroth: too early to be worried about what will be implemented and what won't
... we should spec out what we ideally want/need

nickstenn: we need to give them a language to talk about what they are going to implement

<azaroth_> +1 to levels in the spec

<nickstenn> +1

<paoloC> +1

<bigbluehat> +1

<shepazu> -1 to levels as static feature of the spec

<JakeHart> +0

<bjdmeest> +1

<azaroth_> fjh: +1

<tilgovi> RESOLUTION: The WG will define monotonically increasing levels of conformance. The editors will attempt to include these in the specification, and if not will have a separate method for recording them

azaroth: if we have levels, does it help us embed bodies
... at level zero, we can require string literals only. option 1
... that is the simplest case. most basic version
... Level 1, you should use resources
... Level 2...

paolo: can we ask people to use literals without language and type

azaroth: level 1: 4-7 would be supported

paolo: level 0: option 1. then we have option 2 and 3 for language and type, allow 1 as long as you don't do 2 or 3. wondering if that is a fair request
... rdf allows language and type as a basic feature.
... you are saying you can use basic literal but only the way we say you can use it

nickstenn: serialization hides the details of lang and type
... do we require level 0 to understand that
... i would say no
... we are thinking about a default serialization method. as an implementer, i would like to say I don't have to worry about option 2 or 3.

ericP: you can further constrain RDF. I think it's useful to pass this info

<paoloC> Pass constraints on RDF to the Data Shapes Working Group

bigbluehat: i've seen implementations in json-ld that add a graph on to a simple annotation that adds all sorts of other meta

azaroth: the reason i didn't put that into the list was to foreground the RDF nature of this work

Frederick: wouldn't you want to be able to express a simple string in different languages?

azaroth: the proposal was to allow literals without lang and type, or use option 5 or 6. if you require both, you must use type 3 or 4

nickstenn: no matter what language it is, I'm just going to print it to the screen. if I need more, I will jump to option 4

shepazu: agreeing with bigbluehat, it's too early to be talking about breakpoints for levels at this point.
... plan to propose to WG is to work in github, and mirror on webplatform.org, editors draft, so that we can annotate them.

azaroth: can we use our existing repo?

shepazu: team says we want to have a different repo for each spec
... the data model spec should be in a new repo

azaroth: we start using this repo for only this specification and make new repos for related specs

<azaroth> ScribeNick: tilgovi

<azaroth> PROPOSAL: The model will allow literal bodies as the object of the hasBody relationship. Plain xsd:strings are the only thing supported in this manner, and not data typing or language tagging, or any other properties. Any other requirements will use a resource, structure to be defined

<azaroth> +1 to the proposal

<paoloC> +1

<azaroth> fjh: +1

<nickstenn> +1

tilgovi: +1

<bjdmeest> +1

<JakeHart> +1

RESOLUTION: The model will allow literal bodies as the object of the hasBody relationship. Plain xsd:strings are the only thing supported in this manner, and not data typing or language tagging, or any other properties. Any other requirements will use a resource, structure to be defined
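
A minimal sketch of what the resolution above could look like to a client, assuming a JSON-like serialization. The key names (`hasBody`, `hasTarget`, `value`, `language`, `format`), the example URLs, and the helper `body_text` are assumptions for illustration only; the resolution leaves the structure of the resource form "to be defined".

```python
from typing import Any, Dict, Optional

# Hypothetical annotation whose body is a plain string literal
# (no datatype, no language tag), as the resolution permits.
simple_annotation: Dict[str, Any] = {
    "@type": "oa:Annotation",
    "hasBody": "I love this paragraph",
    "hasTarget": "http://example.org/page1",
}

# Hypothetical annotation that needs a language tag, so the body
# becomes a resource. The shape of this resource is an assumption.
rich_annotation: Dict[str, Any] = {
    "@type": "oa:Annotation",
    "hasBody": {
        "value": "J'adore ce paragraphe",
        "language": "fr",
        "format": "text/plain",
    },
    "hasTarget": "http://example.org/page1",
}


def body_text(annotation: Dict[str, Any]) -> Optional[str]:
    """Return the body text whether the body is a literal or a resource."""
    body = annotation.get("hasBody")
    if isinstance(body, str):
        # Literal body: the string is the whole body.
        return body
    if isinstance(body, dict):
        # Resource body: the text lives in an (assumed) "value" property.
        return body.get("value")
    return None
```

A client that only ever prints the body can treat both forms alike, which is the on-ramp nickstenn described; anything that needs language or type has to inspect the resource form.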

<JohnPedersen> is the phone bridge going to open?

<JohnPedersen> or has it changed from this morning?

azaroth: We have a first case where choice is some number of representations that are equivalent
... for instance, different representations
... another case is where there is a defined ordering that matters
... perhaps where there is decreasing specificity, or some such

<npdoty> https://github.com/w3c/web-annotation/issues

<npdoty> https://github.com/w3c/web-annotation/issues/1

azaroth: A proposal is to use a real rdf:List construction which specifies "these things in this order"
... This solves three problems.
... 1) All constructs would serialize the same way, and the only thing that changes is the interpretation

<Davis> Repeating for offsite attendees: Is the phone bridge open?

azaroth: 2) The desire to have an unambiguous priority order for choices.
... 3) there are only two problems
... 3) Actually, the third is that in the current vocab there are three constructs. If we have a list, then there is no distinction between composite and list. Simpler.

paoloC: In terms of interop, if you send me something and it says "list", what does it means? What are the semantics?
... If you would like to classify the items of the list you must create separate constructs to do so. That's the only thing I have a problem with.

<fjh> is there a use case for the potential problem case

azaroth: The issue is that if you want to know all the items in the list you need to follow the chain.

nickstenn: I think I have a mild preference for the proposal as a whole on the basis that we're talking about data rather than a thing with behavior
... to me the difference between a composite and a list is a difference in how it's used

Zakim: agenda?

paoloC: maybe we should take an example and put it in a triple store and see what the queries look like so we know what's happening

<azaroth> PROPOSAL: Accept issue 1

<bjdmeest> +1 for simplicity

nickstenn: I don't think most people are using a triple store and we seem to be arguing from simplicity of querying in that particular storage

<azaroth> +1

azaroth: this is a 10% feature, most people probably don't even need the feature

<nickstenn> +1 in support of proposal

<azaroth> ( https://github.com/w3c/web-annotation/issues/1 )

<azaroth> RESOLUTION:Accepted

<fjh> [Post-Meeting Note: This resolution was replaced with the subsequent one after additional discussion]

<paoloC> It would be good to have an idea of how that construct translates to RDF and what the SPARQL queries would look like

tilgovi: +1 paoloC. I'm also curious to learn this.

<fjh> +1 to paoloC on getting additional information
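
As a rough answer to paoloC's question, the sketch below expands an ordered list into the standard RDF collection triples (`rdf:first`, `rdf:rest`, `rdf:nil`) and then walks the chain back, which is the "follow the chain" cost azaroth mentioned. The blank node labels and the function names `list_to_triples` and `walk_list` are made up for illustration; no triple store or RDF library is assumed.

```python
from typing import Dict, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def list_to_triples(items: List[str]) -> Tuple[str, List[Triple]]:
    """Return (head node, triples) for an RDF collection holding items in order."""
    if not items:
        # The empty list is just rdf:nil, with no triples of its own.
        return RDF + "nil", []
    nodes = [f"_:b{i}" for i in range(len(items))]  # hypothetical blank node labels
    triples: List[Triple] = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


def walk_list(head: str, triples: List[Triple]) -> List[str]:
    """Follow the rdf:first / rdf:rest chain from head and return the members in order."""
    first: Dict[str, str] = {s: o for s, p, o in triples if p == RDF + "first"}
    rest: Dict[str, str] = {s: o for s, p, o in triples if p == RDF + "rest"}
    members: List[str] = []
    node = head
    while node != RDF + "nil":
        members.append(first[node])
        node = rest[node]
    return members
```

Each member costs two triples, and reading the list back means one hop per member. In SPARQL 1.1 the members can be reached with a property path such as `rdf:rest*/rdf:first`, though that path alone does not return them in list order.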

nickstenn: What should the semantics of an object be if I have the same resource in it twice

azaroth: in the past these would have been duplicate triples and therefore not allowed
... You must not have the same object twice in the members of a composite

fjh: Why does it matter?

Alastair: Does order matter?

fjh: I would argue that if you create a testable assertion then you have to test the assertion and why bother?

nickstenn: when would I _need_ composite

Would it help to discuss use cases that place a list or a composite in a body as opposed to a target?

paoloC: If I annotate data, I can have multiple targets in the data, but most likely the order won't matter.
... to me that's a composite
... but what I want to convey is not that they are in order but that I am annotating them together

fjh: there's a risk that if you don't have a composite people will assume it's ordered

azaroth: we're not really weighing down the model by saying "this is a set"

Could we punt to individual use domains who need a composite to have a resource type that composites the things they need to group?

paoloC: in the composite or choice theoretically you can sub property and say something more about the roles of each item
... now we totally lose that
... we can build that outside, and maybe that's better... wait and see who needs that
... but we should understand what we lose

nickstenn: is it worth considering that we accept part of the proposal
... composite remains, but list and choice become list

I'd point out that choice is actually just a list of an item and a composite

<azaroth> {

<azaroth> "@type": "oa:Composite",

<azaroth> "item": [Target1, Target2]

<azaroth> }

<azaroth> {

<azaroth> "@type": "oa:List",

<azaroth> "members": [Target1, Target2]

<azaroth> }

<azaroth> {

<azaroth> "@type": "oa:Choice",

<azaroth> "members": [Target1, Target2]

<azaroth> }

azaroth: on the wire they are all the same, but
... in list and choice the object of the members predicate is an rdf:list
... in composite, the order is not important

Can you describe the difference between choice and list?

azaroth: in choice you pick one
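
One way a client might act on the three constructs, sketched under assumptions: the function name `interpret` is hypothetical, either `members` or `item` is accepted as the key, and a Choice is resolved by taking the first member as the preferred one (the discussion only says "you pick one" and that a priority order is desired).

```python
from typing import Any, Dict, FrozenSet, List, Optional, Union

Result = Union[FrozenSet[str], List[str], Optional[str]]


def interpret(node: Dict[str, Any]) -> Result:
    """Interpret a multiplicity construct according to its @type."""
    kind = node["@type"]
    # The pasted serialization uses "item" for Composite and "members" elsewhere.
    members: List[str] = node.get("members", node.get("item", []))
    if kind == "oa:Composite":
        # All members apply together and order carries no meaning.
        return frozenset(members)
    if kind == "oa:List":
        # All members apply and the order is significant.
        return list(members)
    if kind == "oa:Choice":
        # Exactly one member is used; taking the first is an assumption.
        return members[0] if members else None
    raise ValueError(f"unknown multiplicity construct: {kind}")
```

The three inputs look alike on the wire; only the `@type` changes what the client does with them.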

nickstenn: the most obvious use case seems to me to be multiple targets
... targets may or may not have an ordering

azaroth: "I like these six stars" and "These six stars make up the ___ construct"... in the first, order doesn't matter, but in the second we probably wish to put them in order

fjh: maybe we shouldn't belabor this and we should just keep all three for now

azaroth: I was going to agree with the optimization use case, where you may be incrementally finding targets in the order they appear in a document while incrementally parsing / scanning

nickstenn: seriously important optimization. consider searching for targets and stopping at the edge of the viewport, then searching for the rest as you continue

<azaroth> PROPOSAL: Accept the model that generates the above serialization

<nickstenn> +1

<fjh> +1

<paoloC> +1

<TimCole> +1

tilgovi: before I accept this, can we state whether there's any inheritance relationship between these multiplicity classes?

fjh: isn't it sensible to say the answer is no?

azaroth: I think the answer is no
... we should defer the question

paoloC: did we talk about list of composite things? just put in the notes that we don't disallow nesting.

<fjh> not defer, if nobody cannot live with it, let's decide now

<azaroth> +1


<ujvari> +1

<bjdmeest> +1

RESOLUTION: Accept the model that generates the above serialization

azaroth: the only other issue we starred at the beginning of the day was "should annotation concept and document be distinguished?"

<fjh> on issue #10


<fjh> https://github.com/w3c/web-annotation/issues/10

paoloC: It seems that the reason this has been brought up again is because of provenance
... sometimes I need to know who created the conceptual annotation and who created the digital artifact
... If you need to have a distinction and we don't want to deal with it in the model then we have to split the annotation into two
... I think we can probably sort that out using provenance vocabulary
... the way I do it is that I have one new relationship that avoids a lot of triples from prov

(everyone looking at Appendix A, Figure A: https://specs.webplatform.org/anno-model/w3c/draft/appendices.html#ProvMapping)

nickstenn: I do not understand how you would do this in a complicated case, but it is my intuition that such a case is a very small subset of people who would use this
... as long as we're not ruling out their use cases, we needn't have all that expressivity in our model

azaroth: we could say "annotatedBy" means ___ and if you want something else you can use something else

TimCole: It was thought to be a practical solution that you could have multiple annotatedBy's by having more resources

paoloC: I wonder if there are use cases within this community where this will matter

<JakeHart> paoloC: was talking about ebooks.

<azaroth> ISSUE: What are the exact semantics of oa:annotatedBy

<trackbot> Created ISSUE-9 - What are the exact semantics of oa:annotatedby. Please complete additional details at <http://www.w3.org/annotation/track/issues/9/edit>.

<azaroth> PROPOSAL: We will not split annotation into document/concept.

RESOLUTION: We will not split annotation into document/concept.

<fjh> tilgovi: can have more than one annotated by?

<fjh> azaroth: yes

<azaroth> ISSUE: Can we have a list of agents as the object of oa:annotatedBy?

<trackbot> Created ISSUE-10 - Can we have a list of agents as the object of oa:annotatedby?. Please complete additional details at <http://www.w3.org/annotation/track/issues/10/edit>.

azaroth: as far as I can tell we have gone through all the critical decisions that we have to before publishing the first working draft

<Davis> I have to leave for the day. Thanks everyone, very interesting stuff.

azaroth: (recapping issues)
... (describing robust anchoring) the issue of a client wanting to know when its selector should anchor to a particular piece of content or not
... for instance, a position selector is not enough information to know if the string at that position is the same as it was when the annotation was created
... for some use cases where the content of a resource changes
... if browsers would support this kind of functionality natively we would have a better time implementing things
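
The gap azaroth describes can be shown with a small sketch: a position alone always "anchors" as long as the offsets are in range, while a position paired with the expected text can detect that the content has changed. The function name `anchor_by_position` and its parameters are assumptions for illustration and are not taken from the model.

```python
from typing import Optional, Tuple


def anchor_by_position(
    text: str, start: int, end: int, expected: Optional[str] = None
) -> Optional[Tuple[int, int]]:
    """Return (start, end) if the span can be anchored, otherwise None.

    Without `expected`, any in-range span is accepted, so there is no
    guarantee it covers the text that was originally annotated.
    """
    if not (0 <= start <= end <= len(text)):
        return None
    if expected is not None and text[start:end] != expected:
        # The content at this position is no longer what was annotated.
        return None
    return (start, end)


original = "The quick brown fox"
changed = "A very quick brown fox"

# Position only: succeeds on both documents, silently pointing at
# different text in the changed one.
print(anchor_by_position(original, 4, 9), original[4:9])
print(anchor_by_position(changed, 4, 9), changed[4:9])

# Position plus expected quote: the change is detected.
print(anchor_by_position(changed, 4, 9, expected="quick"))
```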

nickstenn: I would like to know how the conversation went yesterday. I have a deep hesitation for specifying too much of this right now.
... I think it would be very useful if specifications encourage user agents to have a canonical representation of a document (in theory, this is already there, in practice it's not)
... The algorithms and selector types, etc, I have no idea what that would look like and I wouldn't want to stifle innovation
... But it's obvious that there's a need to extract a canonical string of text or range of video, etc.

Alastair: I have a fairly deep understanding of the anchors (I was part of the CFI authorship for the epub wg)
... basically, cfi breaks down into location and assertion
... a location is "this is where I think it is" and an assertion is "this is what I expect to be here"
... Things we've done with CFI to keep in mind: we slice and dice HTML to provide better user experience
... We do snippets of chapters and cut out parts and then show annotations on top
... CFI is pretty dumb in a nice way that means they're just numbers that are ordered in document order
... the way that epubs are authored they can be more or less robust, depending on whether ids are provided on spans, spine elements, paragraphs, etc
... we also, when we do an update, if we find things we orphan them but don't get rid of them

shepazu: So the robustness there relies upon an authoring practice?

Alastair: Yes, which you can't depend upon.
... It does, to a certain extent, have some robustness.

shepazu: does cfi work with other things besides text?

Tzviya: yes

shepazu: I think we agree that the dumbest, least computationally expensive, selectors should be used first
... that seems uncontroversial

azaroth: (hat off, personally) I think it's uncontroversial, but with the addendum that there's concern you use rich assertions to reconstruct an object which is otherwise DRM'd
... so there are situations where it's undesirable to use quotations

shepazu: Everything I have seen suggests that aggregation of works which serve a different purpose is not a problem
... it is the reconstruction, not the publication of the quotations, that might be a problem
... the WG might not be the best place to speculate about legal issues

tzviya: I've done a lot of work on the epub specification
... ebooks have so much legal stuff around them
... don't think about it or nothing will get done
... the legal implications are up to the implementors and the authors

Alastair: we prevent you from making annotations that are copied text that are too large
... doesn't need to be in the spec

csillag: I wanted to say that we have implemented our strategies for working with selectors and coming up with matches
... in a way that they expect to find the quote and context to be available in the selector
... but even if they are not, we just don't have an assertion. We can locate without assertions, there's just no guarantee it's the same thing.

shepazu: I don't think the spec is going to require that any given selector be saved
... no one is obliged to save a quote

paoloC_: legal issues aside, I need the text quote or interop is not going to happen in between formats

nickstenn: part of the reason that is true is because there is no concept of canonicalization across different representations
... if there were, then perhaps you could use more selectors on documents in different formats

azaroth: also applies to selection by quote
... character encodings, etc

paoloC_: for instance, for pdfs, they've tried to understand the structure of the document of the pdf so the pieces are very well defined
... I don't know if this is reasonable or possible for other formats, or to what extent

Joint meeting with Social Web WG

shepazu: I'm going to give a visual tour of what we're trying to do, and then you (social) can do the same
... maybe start with a couple 3-5m presentations
... is that alright?

<KevinMarks> I thought humming was an IETF thing

<shepazu> http://www.w3.org/2014/annotation/diagrams/annotation-architecture.svg

<scribe> ScribeNick: nickstenn

<evanpro> Hi everyone!

<fjh> Minutes for this session were recorded in the social web WG minutes log, see http://socialwg.indiewebcamp.com/irc/social/2014-10-28#t1414535621276

<fjh> Also see http://www.w3.org/2014/10/28-social-minutes.html

<Arnaud> #social

<cwebber2> hello

<evanpro> http://evan.prodromou.name/files/TPAC/

<evanpro> shepazu: ^^^^

<KevinMarks> is that this wiki? https://www.w3.org/wiki/Web_Annotations or is there another one?

<dwhly> KevinMarks: have you considered supporting prefix and postfix in the fragmention syntax? for when fragments are short, and or the document changes?

<KevinMarks> the choice of words should be unique within the document; you could use a fuzzy matching in the implementation

<dwhly> So "the" would not be a recommended fragmention?

<KevinMarks> no, not a good idea

<dwhly> k

<KevinMarks> I wrote a bit about this at http://www.kevinmarks.com/poemfragmentions.html

<KevinMarks> most spec-like text for fragmention is at http://indiewebcamp.com/fragmention
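
A small sketch of the uniqueness point made above: a phrase is only a safe pointer into a document if it occurs exactly once. The helper name `is_unambiguous` is hypothetical, and exact, case-sensitive substring matching is assumed; it does not reproduce the fragmention matching rules or any fuzzy matching.

```python
def is_unambiguous(text: str, phrase: str) -> bool:
    """True if phrase occurs exactly once in text (exact, case-sensitive match)."""
    return bool(phrase) and text.count(phrase) == 1


document = "the cat sat on the mat"

print(is_unambiguous(document, "the"))      # occurs twice, so a poor choice
print(is_unambiguous(document, "cat sat"))  # occurs once
```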

<fjh> ScribeNick: TimCole

bigbluehat: Suggests talking about HTTP API


bigbluehat: LDP might be a way to go

azaroth: azaroth is a member of LDP
... But LDP is in Final Call, so feedback would go towards re-chartering
... Should not expect to have our comments addressed in current draft
... Might be useful to list out our functional requirements that need to be addressed
... in order to identify what Social and LDP already addressed

<fjh> from charter: - HTTP API: An API specification to create, edit, access, search, manage, and otherwise manipulate annotations through HTTP

Takeshi Kanai: EPUB in Japan contains inline images

<fjh> http://www.w3.org/annotation/charter/

<azaroth> ISSUE: Anchoring needs to take into account images of glyphs, such as calligraphic japanese text. There's no offset or text to quote

<trackbot> Created ISSUE-11 - Anchoring needs to take into account images of glyphs, such as calligraphic japanese text. there's no offset or text to quote. Please complete additional details at <http://www.w3.org/annotation/track/issues/11/edit>.

fjh: notifying about creation, deletion, modification of annotations
... we can use ActivityStreams / Social potentially

<fjh> azaroth: notifications related to annotations might be from social web work

bigbluehat: Atom doesn't make sense for us?

<KevinMarks> notifications to users or to publishing site?

<KevinMarks> webmention is notification to publishing site

<tac> ISSUE: some letters in HTML are counted not as 1 letter even though the appearance is one letter. In case we use offset as counter, it might not reach to the expected destination in different systems (non web platform).

<trackbot> Created ISSUE-12 - Some letters in html are counted not as 1 letter even though the appearance is one letter. in case we use offset as counter, it might not reach to the expected destination in different systems (non web platform).. Please complete additional details at <http://www.w3.org/annotation/track/issues/12/edit>.
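
ISSUE-12 can be made concrete with a short sketch: the same visible text has different lengths depending on whether a system counts Unicode code points, UTF-16 code units, or UTF-8 bytes, so a numeric offset is only portable if the unit is agreed. The helper names are made up for illustration; only the Python standard library is used.

```python
import unicodedata


def code_points(s: str) -> int:
    """Length in Unicode code points (what Python's len() counts)."""
    return len(s)


def utf16_units(s: str) -> int:
    """Length in UTF-16 code units (what JavaScript's String.length counts)."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    """Length in UTF-8 bytes."""
    return len(s.encode("utf-8"))


def utf16_offset(s: str, code_point_offset: int) -> int:
    """Convert an offset counted in code points to one counted in UTF-16 code units."""
    return utf16_units(s[:code_point_offset])


# "e" followed by a combining acute accent renders as one letter.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# U+20BB7 is a single CJK ideograph outside the Basic Multilingual Plane.
text = "\U00020BB7\u91ce\u5bb6"

print(code_points(decomposed), code_points(composed))
print(code_points(text), utf16_units(text), utf8_bytes(text))
print(utf16_offset(text, 2))
```

The last character of `text` sits at offset 2 when counting code points but at offset 3 when counting UTF-16 code units, which is the kind of mismatch the issue describes.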

bigbluehat: Doug's SVG shows publication to author, not just publisher
... how to signal who should be notified?

<KevinMarks> the publisher could send a webmention to the author?

bigbluehat: Webmention is insufficient for everything we need to do. One option but not sufficient

tilgovi: Webmention does this intentionally

<KevinMarks> author webmention display http://www.kevinmarks.com/##webmentions

azaroth: formalized pingback or trackback

fjh: we do not want to do Identity Management

bigbluehat: the Data model identifies the entities, but putting these together for notification goes beyond our scope
... We should look at RFC 2068, which covers linking and unlinking (HTTP 1.1 beta)

<fjh> http://tools.ietf.org/html/rfc2068

<azaroth> And: http://tools.ietf.org/html/draft-snell-link-method-10

bigbluehat: in that is a defined protocol for sending annotations to other Websites
... we may want to look at bringing this bit back

<bigbluehat> the implementation in progress: http://www.relify.com/

azaroth: we should strongly consider the second link if it moves forward

<KevinMarks> see http://indiewebcamp.com/webmention#Implementations

shepazu: Can we talk about takeaways from the WebApps discussion yesterday?

Followup to WebApps Robust Anchoring discussion yesterday

shepazu: Doug is now showing the slides from that meeting yesterday.
... will export a Web version
... focused on idea of 'find.text' api, which would only be part of a full robust anchoring solution
... this includes cases for which FragMentions would be insufficient
... discussed some of the OA and Hypothes.is selectors, e.g., Context Wings
... then proposed a strawman 'element.findText()'
... might pass back an array of matches
... could include some options, e.g., prefix, postfix, etc. and would find closest match
... feedback from Web Apps was that this seems like a piece missing from the existing Web
... they also suggested using existing and prior art as much as possible

<bjdmeest> XPointer was a suggested technique as well, I think

shepazu: pointed out that the find.text() would need to be incremental (e.g., too slow to return every hit)

<azaroth> Yup XSLT / XQuery / XPointer / XPath ... lots of research done on this topic in the past to not reinvent

shepazu: so then you could iterate and refine your search
... idea of confidence score (how close a match)
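
A rough sketch of the strawman described above, written in Python rather than as a browser API. It yields matches lazily (so a caller can stop early or refine), accepts optional prefix and suffix context, and attaches a confidence score. The names `find_text` and `best_match`, the use of `difflib` for scoring, and the averaging of the two context scores are all assumptions for illustration; none of this is the proposed `element.findText()` signature.

```python
from difflib import SequenceMatcher
from typing import Iterator, Optional, Tuple

Match = Tuple[int, int, float]  # (start, end, confidence)


def find_text(text: str, quote: str, prefix: str = "", suffix: str = "") -> Iterator[Match]:
    """Lazily yield (start, end, confidence) for each exact occurrence of quote.

    Confidence is the mean similarity between the supplied prefix/suffix
    and the text actually surrounding the occurrence (1.0 if no context given).
    """
    if not quote:
        return
    start = text.find(quote)
    while start != -1:
        end = start + len(quote)
        scores = []
        if prefix:
            before = text[max(0, start - len(prefix)):start]
            scores.append(SequenceMatcher(None, prefix, before).ratio())
        if suffix:
            after = text[end:end + len(suffix)]
            scores.append(SequenceMatcher(None, suffix, after).ratio())
        confidence = sum(scores) / len(scores) if scores else 1.0
        yield start, end, confidence
        start = text.find(quote, start + 1)


def best_match(text: str, quote: str, prefix: str = "", suffix: str = "") -> Optional[Match]:
    """Return the occurrence whose surrounding context fits best, or None if there is none."""
    return max(find_text(text, quote, prefix, suffix), key=lambda m: m[2], default=None)


page = "the cat sat. the dog sat."
print(best_match(page, "sat", prefix="dog ", suffix="."))
```

Because `find_text` is a generator, a caller scanning a long document can take the first acceptable hit without computing the rest, which is the incremental behaviour WebApps asked for.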

<azaroth> Also regular expressions etc etc

shepazu: should have options like whole word, case insensitive, because it would be useful for a broader range of applications beyond annotation
... they mentioned polyfills
... also mentioned that when editing the application, if it knew about regions annotated, could update regions
... also talked about styling range objects
... pseudo element that uses these range objects (that you have found or selected)

<KevinMarks> fragmention implementation could be used to make a polyfill: https://github.com/chapmanu/fragmentions

shepazu: style would be allowed for properties that repaint but not those that require reflow
... good to do this for performance and security
... like they did with revisit link, these would be visible only in the view, not the page

fjh: Kristóf offered some suggestions of ways robust targeting could be generalized for other formats like images
... but not a first step

raphael: today we have been focused on text. Don't we also need to take into account other media?

fjh: yes.

Music Annotations

shepazu: Turning to music annotations
... musicians mark their scores up a lot.
... on the example shown there are annotations and time signature changes from two sources

<shepazu> http://mrgsmusic.weebly.com/assignment-5-sheet-music-annotation.html

<fjh> http://goingdigitalmusician.wordpress.com/page/2/

raphael: Can provide additional demos of music annotation.

shepazu: not yet a way to render music on the Web, but it is being discussed (e.g., on Wednesday)
... sometimes need to annotate the underlying model, not the visual representation

<azaroth> ISSUE: Need a generic method to relay to annotation clients how to annotate the underlying data, rather than the representation in the browser

<trackbot> Created ISSUE-13 - Need a generic method to relay to annotation clients how to annotate the underlying data, rather than the representation in the browser. Please complete additional details at <http://www.w3.org/annotation/track/issues/13/edit>.

TimCole: this comes up in other media, e.g., are you annotating the thought behind the paragraph or the translation or the grammar...

paoloC_: for new spec need to know where and schedule

fjh: We will be using Respec

<stain> thanks for all the scribing!

<KevinMarks> did TimCole get all Wittgenstein there?

Upcoming Meetings

<azaroth> 23-24 April, I Annotate in San Fran -- potential for f2f meeting on the 22nd

<raphael> iannotate conference is at http://iannotate.org/

<fjh> Teleconferences on 12, 19 Nov

<fjh> 3 Dec

<azaroth> no call 26 Nov (thanksgiving)

fjh: thanks for an effective meeting. We are done.


Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2014/11/01 12:17:16 $