W3C

Annotation WG F2F, First day

17 May 2016

Agenda

See also: IRC log

Attendees

Present
Ben De Meester, Benjamin Young (bigbluehat), Dan Whaley (dwhly), Doug Schepers (shepazu), Ivan Herman (ivan), Lena Gunn, Nick Stenning, Rob Sanderson (azaroth), TB Dinesh, Takeshi Kanai, Tim Cole
Guest
Richard Ishida (r12a), Felix Sasaki
Regrets
Frederick Hirsch, Raphaël Troncy
Chair
Rob, Tim
Scribe
dwhly, bigbluehat, tbdinesh

Contents

  1. Charter
  2. Issues
    1. Issue #205 (Dis/Allow both hasState and hasSelector on same SpecificResource?)
    2. Issue #206 ((model) vague definition of character position for text position selector)
    3. Issue #213 (exactly 0 or 1 language(s))
    4. Issue #210 ((model) have a note on logical and visual order of text for text quote selector implementations)
    5. Issue #216 ('generated' and 'modified' should use UTC by default)
    6. Issue #223 (String body must not have a language associated?)
    7. Issue #220 (Request header and language negotiation)
    8. Issue #224 (Base direction for annotations)
  3. Testing

TimCole: Discussion of agenda
... 3 parts
... Testing
... Identifying and Encouraging Implementations
... Misc

Ivan: We should also talk about preserving our F2F slot at TPAC

TimCole: Our goal is to very shortly be able to go to candidate recommendation.

Charter

Nick: Short overview would be helpful.

TimCole: We have a draft data model. We have a protocol draft.
... #5 and #6 are the things we don't have yet.

Ivan: Some of the selectors are aimed at robustness. Not a zero.

TimCole: Q: are these important enough that we should renew the charter. Sense is no from mailing list.
... CG may form and become active and continue these discussions.
... or other WGs may step in.
... Good signs that we may see uptake on the model.
... Primary serialization is JSON-LD
... Discussion of RDFa, script tags, etc. Might be worth documenting those.
... Should also recognize DPIG and IDPF interest.
... Important to find ways to make it easy as possible for folks to make use.

Shepazu: Controls for stating annotating preferences to reduce harassment might be worth discussing.
... it might be nice to state to folks at I Annotate and elsewhere to say we've discussed.

Nick: Should we renew the charter or let it expire?

Ivan: Terminology: extend not renew.
... Extension is easy to do.

TimCole: A year from now it's hard to renew. Extension can be used to finish work.

Ivan: For a CR we have to have all tech issues closed, respond to reviews.
... I18N, privacy. never got a reply from any others.

Ivan: So we have to close the issues and document.
... Not required to have all test cases done.
... We have to define what the exit criteria are.
... Mainly passing tests, etc.
... If we can close all the open issues tomorrow. Then it is more an administrative task to finish.
... Hopefully all of us will be involved in testing.
... Will use the opportunity at I Annotate to discuss progress.
... That we have a stable document and need implementers.

Ivan: CR takes as long as it takes. If we can exit CR, then the rest is administrative.
... If we're in CR and have implementations then extension should be no prob.

Shepazu: Extensions usually 6 months.

Issues

<azaroth> Github Issues: https://github.com/w3c/web-annotation/issues?q=is%3Aissue+is%3Aopen+-label%3Apostpone+-label%3Aeditor_action+sort%3Acreated-asc (from top to bottom)

Azaroth: First, mapping between Activity Streams and the data model
... similar to basic uses of annotation
... should be clear if you're an AS client what it would look like in annotation space
... They're about to enter CR
... They have no open issues, need to define exit criteria.

Shepazu: Will have some implementations.
... Doesn't look like there are renewed interests.

Ivan: Will they stop at CR and never go to Rec?

Shepazu: I can ask.
... If they don't get two implementations, then no.

BigBlueHat: Apache streams is implementing, and they use Activity Streams

Shepazu: Are we requiring things from AS?
... Do we have dependency on it?
... the solution would be to ask if they're going to exit CR, what timeframe. if they don't know, then we should extract those elements from our spec.

Ivan: We should mark them at risk.
... What would the consequences be?

Azaroth: Annotation collections.
... Choice is just an RDF list.
... Oh, you're right it's a subclass.

Ivan: Could we have a version of the doc in a separate branch, we can hot swap when there's an issue.

TimCole: Dependencies are a couple things in the vocab.
... And the generator.

Ivan: Can I just add in the editors section?

TimCole: +1s all around (by vote).
... HTML Serialization-- we'll discuss tomorrow. (All Agree).

Azaroth: Issue #199
... When you dereference the namespace, what should you get? We all agree HTML, but not exactly what.
... How about an interstitial page that switches to the specific form (Turtle, RDFa, etc)

Ivan: If it's easy to generate an RDFa page, then great.

Shepazu: Not a fan of content negotiation, but we could have multiple files pre-created.
... index.json, index.turtle

Ivan: More info is better.
... Another Q, should it be http, https

TimCole: Concerned about one thing.
... re: JSON-LD. Is that a context doc or something else?

Azaroth: It would have its own context.

TimCole: Is that a point of confusion
... I know it's practice in other groups, like Schema.org, etc.

Azaroth: Only way you'd get JSON document is by dereferencing the context document.

TimCole: When you have translators and you have a namespace in your JSON document.
... now, https?

Ivan: https is still under discussion. move at W3C to https all the things.
... separate discussion whether the namespace documents should be https
... URL is a formal identifier.
... Not totally clear.
... We may decide we don't care and do it in https

Azaroth: Other area of concern would be the context document.

Ivan: All JSON files should have the CORS flag set.

Shepazu: It's the security model of the web.
... how long will https be around
... we don't know if in 15 years what we'll be doing.
... TBL acquiesced to using schemeless URLs
... should we consider the same.

Ivan: Worry that RDF tools might fail on that.
... They can handle https today

BigBlueHat: Stick w/ http

Nick: A thing w/o a scheme is not a URI
... why not just use https

TimCole: In 10 years we're probably going to be on version 2

Ivan: we can say https
... W3 is pushing for it.
... The only downside is whether existing annotation providers will have a problem

Azaroth: If you're inputting the context document, if it lives at https

Ivan: I propose we use https. Call it out and get feedback.

TimCole: Proposed: https and an interstitial document. Are we confident enough to close this now?

Ivan: Let's pass and consider it closed.

<azaroth> PROPOSED RESOLUTION: Use interstitial HTML document, with RDFa of the vocab if possible

<ivan> +1

<azaroth> +1

<bjdmeest> +1

<takeshi> +1

<TimCole> +1

<nickstenn> +1

<tbdinesh> +1

<shepazu> +1

<bigbluehat> +1

RESOLUTION: Use interstitial HTML document, with RDFa of the vocab if possible

<azaroth> PROPOSED RESOLUTION: Use https for the namespace and context documents, not http

<azaroth> +1

<TimCole> +1

<bigbluehat> +1

<ivan> +1

<tbdinesh> +1

<bjdmeest> +1

<takeshi> +1

RESOLUTION: Use https for the namespace and context documents, not http

Issue #205 (Dis/Allow both hasState and hasSelector on same SpecificResource?)

Azaroth: Background: At TPAC there was some discussion around list in the use of selectors.
... Since then you can define one selector by another selector.

<bigbluehat> https://github.com/w3c/web-annotation/issues/205

Azaroth: Question that remains: meaning of multiple selectors, multiple states.

Ivan: Close to issue #207
... And that one is slightly better in the sense that they're specific use cases coming from Paolo.
... At the moment we can't cover what he wants.

TimCole: We don't have composite, order and choice

Ivan: Doesn't say what it means.
... We have a bug

TimCole: Say I'm annotating script of a play, want to annotate the dialog, not the bits in between.
... Three pieces want to annotate as a single target.
... Why did we take composite out.

Azaroth: Because list will work for it.
... Composite didn't have any order.
... We didn't put list back.
... Use of collection instead of Annotation collection.

TimCole: Do we need to distinguish between Annotation Collection and the thing we use for Composite.
... Would there be confusion.

Ivan: We need a way of expressing if I have several targets, what does it mean?
... Choice, Composite, etc.
... We could add another term AND, OR, etc.

TimCole: It's a question of being explicit
... if I have a target array.
... body applies to each element of the target equally. or everything in the array is an OR, or an AND.

Ivan: Separately, all together or "Pick one of those"

azaroth: translations.
... one of the requests for choice is for preferences of publishers.
... Need extra node to have order of list.

Ivan: Worried about doing something that's way more complicated.
... need to have consistency.
... For me saying that the target could be an array, AND, OR, etc.
... What I hear is we need choice, conjunction, etc.
... We may have a default case, but let's forget about that for now.
... We need a conjunction, disjunction, choice, ordered list.

TimCole: One reason we want to do this, b/c 80% need "here's a set of targets" and don't care about order.

Ivan: Issue: We did not define it, but discovered we needed to.
... For implementers it's an extra step.

Nick: The default is a distraction, what matters is the underlying complexity.

Azaroth: If it's just an array, then it's mini-triples.

Shepazu: Single structure best

BigBlueHat: Fiddly, but better.

Nick: Meaning of an array is well defined in JSON-LD

Azaroth: You can't have same key multiple times
... you need an array.

tbdinesh: are we trying not to have a predicate anywhere?

TimCole: My concern right now is we do not have a type for composite.

Azaroth: I agree.

Shepazu: Number of multi-target annotations is probably pretty high.

Ivan: Agree.

TimCole: For people that don't need this, not having to put in list is better.

Ivan: Let's write out the choices.

Azaroth: Choices:
... [DISCUSSION ON SKETCH PAD]

<TimCole> 1. "target": "uri" 2. "target": ["uri"] 3. "target": ["uri1", "uri2"] 4. "target": {"type": "____", "items": ["uri1", "uri2"] } 5. "target": [....] "combineTarget": "____" 6. "target": {"@list": [....] }
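
For readers following along, shapes 1 to 4 from the sketch pad can be folded into one internal form. The sketch below is only an illustration, not something the group agreed on: the function name `normalize_target` and the `(type, items)` return shape are assumptions made for this example, items are assumed to be plain URI strings (embedded resources are ignored), and shapes 5 and 6 are left out.

```python
from typing import List, Optional, Tuple, Union

Target = Union[str, list, dict]


def normalize_target(target: Target) -> Tuple[Optional[str], List[str]]:
    """Fold sketch-pad shapes 1-4 into (type, items).

    The type is None when the annotation does not say how multiple
    targets combine (shapes 1-3).
    """
    if isinstance(target, str):    # shape 1: "target": "uri"
        return None, [target]
    if isinstance(target, list):   # shapes 2 and 3: "target": ["uri", ...]
        return None, list(target)
    if isinstance(target, dict):   # shape 4: {"type": ..., "items": [...]}
        return target.get("type"), list(target.get("items", []))
    raise TypeError(f"unsupported target shape: {type(target).__name__}")
```

A consumer written this way has a single code path after normalization, which is the "single structure" argument made in the discussion, while producers can still emit the simple shape 1.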

Ivan: We did say that #1 is a must

NickStenn: A field can be a target, those targets can be resources, either as URIs or embedded. Item #1 #2 #3, either with URIs or embedded
... the question is what resource types will be in our spec that will allow for ordering that are not provided for by JSON-LD

Ivan: What we have is #1-3
... Currently we are not covering ordering, and we don't have a clear statement about what #3 means.
... We don't have use cases where they are combined either conjunctively or disjunctively.

Shepazu: I was a big proponent originally of 1, 2, 3.
... Wanted to have simple case of 1
... Even if it's always more complex, having a single structure is better.

BigBlueHat: Except to Nick's point.

Shepazu: You always have to pick a type. Get rid of 1-3.

TimCole: 5 looks like 2 and 3

Nick: Completely at odds with the mental model a programmer has.

Ivan: If we were only to take one structure. then 4 would cover it all.

Nick: What if we had virtual embedded resource that wraps other resources?
... (Agreeing w/ #4)

Shepazu: My argument in allowing 1-3 is complexity of implementation.

Ivan: Go with #4, accept #1.

Shepazu: Why

BigBlueHat: Could be a URI.

Azaroth: We've already made a lot of concessions for simple use cases. It would be a big change to disallow 1.

TimCole: notice, as in 4, one can always use the Example 10 structure - you can always use Choice

ivan: we should also say what #3 is for the sake of clarity

nickstenn: what about removing choice? we only do not cover admittedly esoteric use cases

azaroth: 3.26 covers cardinality of targets

ivan: at this point my feeling is.. lets put this additional structure as at risk

shepazu: i can see a case for each body applying to a particular target

ivan: can we propose a resolution by having 3 more classes: disjunction, conjunction, and ordered

azaroth: concept important not name

<azaroth> PROPOSED RESOLUTION: Extend the Choice structure by adding back Composite, List, plus a new one Individuals to explicitly state each is independent. The entire section to be marked At Risk

ivan: now moving on... i would close 205 as editorial issue by referring to 207

<ivan> +1

<azaroth> +1

<TimCole> +1

<takeshi> +1

<nickstenn> +1

<bigbluehat> +1

<bjdmeest> +1

+1

<shepazu> 0

RESOLUTION: Extend the Choice structure by adding back Composite, List, plus a new one Individuals to explicitly state each is independent. The entire section to be marked At Risk
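
One way to read the four structures named in this resolution is as rules for grouping targets. The sketch below is an interpretation of the discussion, not spec text: the function `target_groups` is hypothetical, and treating Choice as "take the first item" is an assumed default.

```python
from typing import List


def target_groups(kind: str, items: List[str]) -> List[List[str]]:
    """Return the groups of targets a body would apply to.

    Each inner list is one unit that the body applies to as a whole.
    """
    if kind == "Individuals":   # each target is independent of the others
        return [[item] for item in items]
    if kind == "Composite":     # all targets together; sorted only to show order is not significant
        return [sorted(items)]
    if kind == "List":          # all targets together, order preserved
        return [list(items)]
    if kind == "Choice":        # pick one; first item is an assumed default
        return [items[:1]]
    raise ValueError(f"unknown structure: {kind}")
```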

Issue #206 ((model) vague definition of character position for text position selector)

going on to https://github.com/w3c/web-annotation/issues/206

issue https://github.com/w3c/web-annotation/issues/206

fsasaki: talking about new issue sent by his group over email

takeshi: its a limitation of javascript. unicode is extended from 2 bytes to 4 bytes.

nickstenn: it would be lovely if unicode guys stated what a normalized code would be. for example, emoji code
... real impl prob is JS throws its hands up.

fsasaki: summarizing to richard. JS is not doing the right thing

rishida on phone: you pointed out grapheme clusters was the right thing to follow/point

<fsasaki> https://github.com/w3c/web-annotation/issues/206#issuecomment-217479442

<nickstenn> the fundamental tension is between users' selections, which are made on logical characters, and the realistic implementation complexity of counting logical characters in UCS-2-based javascript

ivan: go to practicality or purity?

shepazu: whole point thing about face and color being 2 code points..

nickstenn: its backwards compatibility.. unicode guys will not change that

fsasaki: from the i18n perspective.. using code points is natural; not just JS .. but how to avoid the divergence

azaroth: can we embrace the divergence.

ivan: how

azaroth: with a property.. on the Text*Selectors, such as characterCountingType with a value of "CodePoint" or "CodeUnit"

nickstenn: no!

r12a: you must not select from high surrogate to low surrogate... so ..???

TimCole: migrate from text selectors to codepoint selectors (also rob)

ivan: takeshi.. what is the percentage of cases where this fails

takeshi: now since people have started using emoji.. docs with emojis will fail

TimCole: what about using another selector..

shepazu: users would not know.. but implementers would

nickstenn: as an implementor.. i would rather use codepoints. as in JS its a lot of work
... optimizing for JS for all platforms.. is huge work

<bigbluehat> nickstenn: this one? https://github.com/RadLikeWhoa/Countable

ivan: propose to close by turning it back into an editor_action..

r12a: there is no scenario ever in code units.. where you need to select between high and low surrogates
... if you are dealing with utf8 you can figure out by looking at the bytes
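
The UTF-8 remark can be made concrete: continuation bytes always have the bit pattern 10xxxxxx, so a byte offset sits on a code point boundary exactly when the byte at that offset is not a continuation byte. A minimal sketch, with a function name invented for this example and assuming well-formed UTF-8 input:

```python
def is_utf8_boundary(data: bytes, offset: int) -> bool:
    """True if offset sits on a code point boundary in well-formed UTF-8."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):   # the end of the data is always a boundary
        return True
    # Continuation bytes look like 0b10xxxxxx; lead bytes and ASCII do not.
    return (data[offset] & 0xC0) != 0x80
```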

<azaroth> PROPOSED RESOLUTION: (Continue to) Accept code point and add a note on browser implementation details

r12a: are we not being too specific .. we need something re graphemes; maybe that is the level of detail we need in this spec

<azaroth> +1

<ivan> +1

<bigbluehat> +1

<bjdmeest> +1

<nickstenn> for the minutes: I'm still a bit nervous about codepoint vs logical character

<takeshi> +1

RESOLUTION: (Continue to) Accept code point and add a note on browser implementation details
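
The gap between the two counting schemes is easy to see outside the browser. The sketch below uses Python, where `len()` on a `str` counts code points, and derives the UTF-16 code unit count (the unit JavaScript string offsets use) by encoding. The helper names are made up for this illustration and are not part of any spec or library.

```python
def code_point_length(text: str) -> int:
    """Number of Unicode code points (the unit Python str indexing uses)."""
    return len(text)


def code_unit_length(text: str) -> int:
    """Number of UTF-16 code units (the unit JavaScript string indexing uses)."""
    return len(text.encode("utf-16-le")) // 2


def code_unit_to_code_point_offset(text: str, unit_offset: int) -> int:
    """Convert a UTF-16 code unit offset into a code point offset.

    Raises ValueError if the offset is out of range or falls between
    a high and a low surrogate.
    """
    if unit_offset < 0:
        raise ValueError("offset is negative")
    units = 0
    for index, char in enumerate(text):
        if units == unit_offset:
            return index
        # Characters outside the BMP take two code units in UTF-16.
        units += 2 if ord(char) > 0xFFFF else 1
        if units > unit_offset:
            raise ValueError("offset splits a surrogate pair")
    if units == unit_offset:
        return len(text)
    raise ValueError("offset is past the end of the text")
```

For a string made of "a", one emoji outside the Basic Multilingual Plane, and "b", the code point length is 3 and the code unit length is 4, so a position selector produced in a browser needs a conversion like this before it matches a code point count.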

Issue #213 (exactly 0 or 1 language(s))

<fsasaki> https://github.com/w3c/web-annotation/issues/213

<fsasaki> latest input from richard at https://github.com/w3c/web-annotation/issues/213#issuecomment-219731858

r12a: can you look at the bottom - last para
... can you have multiple languages or not.. it does not make sense to specify 1 lang at a time

fsasaki: we did some related discussion earlier; we can provide a priority list eg [fr, en]

<nickstenn> bigbluehat: [earlier discussion] yes, but the core of it is actually punycode.ucs2.decode from https://github.com/bestiejs/punycode.js/

<bigbluehat> "A robust Punycode converter that fully complies to RFC 3492 and RFC 5891, and works on nearly all JavaScript platforms."

r12a: yes, the language of the intended user.. choosing localized version, but when you present it to someone you do not know what lang that bit of text is.
... what is it that the actual lang that we are processing

TimCole: what time can you join us tmrw, we start at 9am. earlier the better

r12a: will let you know. i may not be able to even

fsasaki: https://github.com/w3c/web-annotation/issues/208 is a small one.
... the way it works.. BCP47 is always stable

ivan: we can close then. next small one..

TimCole: we dont need to describe the audience
... so closed.

azaroth: we explicitly draw the line at ???

Issue #210 ((model) have a note on logical and visual order of text for text quote selector implementations)

https://github.com/w3c/web-annotation/issues/210 next small one

fsasaki: give guidance for logical .. right to left or left to right.

azaroth: if its right to left but you look at left to right.. its still right to left

Issue #216 ('generated' and 'modified' should use UTC by default)

next https://github.com/w3c/web-annotation/issues/216

azaroth: any objections to UTC?

<nickstenn> +1 to not having to replicate the Olson database in javascript :p

<azaroth> PROPOSED RESOLUTION: Accept use of UTC as recommended to include explicitly and use as default if not present

<azaroth> +1

<ivan> +1

<bigbluehat> +1

<bjdmeest> +1

<TimCole> +1

<takeshi> +1

+1

RESOLUTION: Accept use of UTC as recommended to include explicitly and use as default if not present

215 is same / similar

azaroth: leave them both editorial

nickstenn: someone pls explain 216, is it to make it unambiguous?

azaroth: you cant have a timezone, without time. you only have a date

TimCole: json schema does not do just date

nickstenn: figuring actual date is not easy

bigbluehat: if you were to record historic annotations.. then having to express the time. it will be hard

azaroth: vast majority.. what time/date is it now
... i think the proposal is to go back to date time and require UTC

nickstenn: point that Addison is making in 216 is .. incremental time value to absolute

<bigbluehat> 5.3 of this for those who like to read things about incremental time https://www.w3.org/TR/timezone/#ivfbased

ivan: when i use it manually i cheat. i put a date and 0000 as midnight

nickstenn: but its unambiguous :)

ivan: that requires a resolution as we change the model

<azaroth> PROPOSED RESOLUTION: Revert to requiring xsd:dateTime, and to REQUIRE UTC for all timestamps

<ivan> +1

<azaroth> (And this overrules the previous recommended resolution above)

<azaroth> +1

<nickstenn> +1

<bjdmeest> +1

+1

<takeshi> +1

<TimCole> +1

<bigbluehat> +1

RESOLUTION: Revert to requiring xsd:dateTime, and to REQUIRE UTC for all timestamps
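
In practice this resolution means every timestamp carries a full date and time in UTC. A minimal sketch of producing and checking such values follows; the function names are hypothetical, and accepting only the "Z" suffix (rather than other UTC spellings such as "+00:00") is an assumption of this example, not something the resolution states.

```python
import re
from datetime import datetime, timezone

# Accepts only the "Z" form of xsd:dateTime; "+00:00" is deliberately
# rejected in this sketch.
UTC_DATETIME = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")


def utc_timestamp(moment: datetime) -> str:
    """Format an aware datetime as an xsd:dateTime string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: time zone is unknown")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_utc_timestamp(value: str) -> bool:
    """Check that a string looks like an xsd:dateTime with a Z suffix."""
    return UTC_DATETIME.match(value) is not None
```

A date-only value such as "2016-05-17" fails the check, which matches the point made above that a time zone cannot be attached to a value that has no time.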

now https://github.com/w3c/web-annotation/issues/218

Issue #223 (String body must not have a language associated?)

azaroth: 223

https://github.com/w3c/web-annotation/issues/223

azaroth: just give in context and not have language here

Issue #220 (Request header and language negotiation)

<tbdinesh_> now https://github.com/w3c/web-annotation/issues/220

azaroth: its an implementation specific thing

Issue #224 (Base direction for annotations)

(this was about 224 https://github.com/w3c/web-annotation/issues/224)

ivan: its out of scope

fsasaki: same as lang?

bigbluehat: he has hebrew mixed with english

takeshi: if that text is on browser rendering we do not need rtl

Testing

TimCole: we're stopping here for today. good job everybody
... let's shift gears and talk testing

now on testing

shepazu: several of us are working on testing, shane, ...
... i had said during our last telecon that i would break out of assertions and these two are working on it
... basically, rob put together a spread sheet. diff than the way i would..

<azaroth> Spreadsheet link: https://docs.google.com/spreadsheets/d/1QwhHYyEd-106nvwe_q-A9z02wO9R-Oa7l5vnmMlYTQ0/edit

shepazu: broke it down into indiv objects, 0, 1. many..
... i think this is useful
... i suggested MUST SHOULD..
... i am in the process of adding RFC key words
... i split out body/target..
... do we agree we need RFC2119 keywords? i think we do
... obviously we can generate tests automatically
... is this sufficient?

ivan: some value of certain things .. are not in this table

<bigbluehat> for those following along at home, we're talking about using JSON Schema-based tests built (potentially) from azaroth's link above and run through the Web Platform Test framework which has been customized by ShaneM and lives here https://github.com/Spec-Ops/web-platform-tests

TimCole: for context.. implementers will use their impls, use the test interface, run those by basically use json-schema to chk valid/not-valid
... to give implementers' feedback on where they are

shepazu: there are about 120 of these testable assertions, not including combinations or keywords
... we have the test, we have the text area, we paste a json from the test and test
... we dont have one test for each, we have maybe 150+ grouped tests

<azaroth> Huh, there's some missing rows ... will add them back

ivan: when we did RDFa we had much larger num

shepazu: i assume that once we have these tests, each impl can do their own framework for testing

bigbluehat: which is why we architected it as descriptive platform
... copy and paste thing is analogous to validate your html
... manual tests are for us to further our process

<azaroth> Anything that looks wrong probably is, it was done on the flight here with limited wifi

shepazu: i had hoped to be able to get away from manual tests. is there anybody there who think there is another way

TimCole: there are certain things we have to think about. as long as there is for example, json schema
... we have a couple of schemas, test docs, notes on certain combinations

ivan: i would like to see how the json schema holds these together. for the time being.. i am lost

shepazu: tim, can you demo something tomorrow?

TimCole: benjamin and i will talk about and see how much we can show.

bigbluehat: shane is possibly available tmrw to do screen tests demos

<bigbluehat> readme for current test system https://github.com/Spec-Ops/web-platform-tests/tree/master/annotation-model

<ShaneM> basically it just works.... for simple cases anyway.

TimCole: we may want to test together, eg our text says there should be an id and if there is an id there must be only 1
... 1 test, 2 possible results
... in json schema, there is test for URI format; is URI more restricted than IRI.. how do we reconcile
... thats part of the morning. we will finish all this and some postponed issues
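
The "1 test, 2 possible results" idea for the id can be sketched without any schema tooling. The function below is a hand-written stand-in, not the JSON Schema tests the group is building: its name and the result labels are assumptions, and it treats a missing id as a SHOULD-level warning and more than one id as a MUST-level failure, following the wording above.

```python
from typing import Any, Dict


def check_id(annotation: Dict[str, Any]) -> Dict[str, str]:
    """Report SHOULD-level and MUST-level outcomes for the annotation's id."""
    value = annotation.get("id")
    has_id = value is not None and value != []
    # More than one id can only show up in JSON as an array value.
    single = has_id and not (isinstance(value, list) and len(value) != 1)
    return {
        "should_have_id": "pass" if has_id else "warn",
        "must_have_at_most_one_id": "pass" if (not has_id or single) else "fail",
    }
```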

ivan: postponed might mean v2 next WG. lets not spend much time on that

TimCole: lets give all an opportunity and move on

bigbluehat: can i have an UTC to inform Shane

TimCole: he is 7hrs off from here.

shepazu: time zone math is so hard

Summary of Action Items

Summary of Resolutions

  1. Use interstitial HTML document, with RDFa of the vocab if possible
  2. Use https for the namespace and context documents, not http
  3. Extend the Choice structure by adding back Composite, List, plus a new one Individuals to explicitly state each is independent. The entire section to be marked At Risk
  4. (Continue to) Accept code point and add a note on browser implementation details
  5. Accept use of UTC as recommended to include explicitly and use as default if not present
  6. Revert to requiring xsd:dateTime, and to REQUIRE UTC for all timestamps
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.143 (CVS log)
$Date: 2016/06/01 16:25:12 $