W3C

- DRAFT -

Technical Architecture Group Teleconference

07 Jan 2014

Agenda

See also: IRC log

Attendees

Present
Yves, ODI, Phil Archer
Regrets
Chair
Dan Appelquist, Peter Linss
Scribe
Henry S. Thompson, JeniT, Anne van Kesteren

Contents

  * Agenda planning
  * Capability URIs
  * Closing issues
  * IETF London Action Plan
  * Data Activity
  * Summary of Action Items

<dka> trackbot, start meeting

<trackbot> Date: 07 January 2014

<ht> ScribeNick: ht

<scribe> Scribe: Henry S. Thompson

<scribe> Meeting: TAG F2F

Agenda planning

[Per Wiki: http://www.w3.org/wiki/TAG/Planning/2014-01-F2F]

YK: DRM? Draw a line and say "not our problem"?
... Too much politics for us?

TBL: I had hoped for some technical clarification

YK: AB more than us?

TBL: Interop is our business

YK: I think the consensus (minus TBL) was that the proposed technology _would_ harm interop

TBL: Users of [NetFlix] think it's useful -- they are worried about an open platform

YK: Tech. focus needed if we put this on the agenda

DA: Yes, we need to go back to the architecture of components and interfaces

HST: I'll do a 5-minute intro based on the thread from October (http://lists.w3.org/Archives/Public/www-tag/2013Oct/0050.html)

TBL: Focus on architecture

<dka> http://www.w3.org/wiki/TAG/Planning/2014-01-F2F

<Yves> last work on logfile was... in 96

Capability URIs

JT: Ref. previous set of slides

<JeniT> http://w3ctag.github.io/capability-urls/2014-01-03.html

JT: Document discusses why, whether, and how to use Capability URLs
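
[Editor's note: for context, a minimal sketch of minting a capability URL -- the unguessable random token in the path is the entire credential. The host name, in-memory store, and token length are illustrative assumptions, not from the discussion; base64url encoding of a Buffer needs a reasonably recent Node.]

    // TypeScript/Node sketch: mint a capability URL for a resource.
    import { randomBytes } from "crypto";

    const issued = new Map<string, { resource: string }>();

    function mintCapabilityUrl(resource: string): string {
      // 32 random bytes (~256 bits of entropy), base64url-encoded so the
      // token can sit safely in a URL path.
      const token = randomBytes(32).toString("base64url");
      issued.set(token, { resource });
      return `https://example.org/edit/${token}`;
    }
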

YK: APIs might need some thought

JT: Haven't got to recommendations on what standardization is needed to help take this forward
... My main question is whether we should encourage this

YK: Moot -- they're already in widespread use

JT: So, OK, is the work needed to improve things/standardize/etc. worth the potential improvement?

YK: Well, e.g., Github users risk getting sniffed, overlooked, etc.

JT: There are _lots_ of ways in which URLs get leaked, not just over-the-shoulder
... e.g. Referrer header

YK: Cross-domain when https?

JT: I think so

AvK: I think not
... Some amount of Referrer control under development, opt-in

YK: Good to have a list of exposure points

<annevk> Referer policy browsers are converging towards (I think only Chrome has this at the moment): http://wiki.whatwg.org/wiki/Meta_referrer
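
[Editor's note: a sketch of the opt-in control annevk links to above. The whatwg wiki draft used keywords like "never"; the later Referrer Policy standard renamed them (e.g. "no-referrer"). Injecting the tag from script is done here only to keep all examples in one language -- a static <meta name="referrer" content="never"> in the page head is the usual form.]

    // TypeScript/DOM sketch: suppress the Referer header for requests
    // made from this page, so capability URLs in the location don't leak.
    const meta = document.createElement("meta");
    meta.name = "referrer";
    meta.content = "never"; // "no-referrer" under the later spec
    document.head.appendChild(meta);
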

JT: I have some of these in the document -- plain http is always a bad idea, 3rd-party scripts, ...

YK: Best practice: Don't Leak This

DA: TAG recommendation along these lines?

YK: Header which says "this is secret"

AvK: CSP directive?

DA: Risk/exposure is in scope for the document

TBL: The simple observation is valuable: putting security in URLs, when URLs are in wide use, is intrinsically risky

YK: Too late to say "don't do this"

JT: Not saying that, saying: here are the risks, consider them before going ahead

YK: Yes, but also, look for mitigation strategies

DA: Suggest a WG do this?
... Focus on this document -- what more needs to be done before publishing it

YK: Looks good to me -- but check the Referrer facts

JT: Wrt risks, mitigations are listed: -- always use https, several levels; capability URLs should expire, no links to 3rd-party scripts
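
[Editor's note: a sketch of the expiry mitigation JT lists above; the store shape and parameter names are assumptions.]

    // TypeScript sketch: capability URLs minted with a time-to-live,
    // checked on each access.
    interface Capability { resource: string; expires: number }
    const issued = new Map<string, Capability>();

    function mintExpiring(token: string, resource: string, ttlMs: number): void {
      issued.set(token, { resource, expires: Date.now() + ttlMs });
    }

    function isLive(token: string): boolean {
      const cap = issued.get(token);
      return cap !== undefined && Date.now() <= cap.expires;
    }
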

YK: Github ones can't be expired

AvK: better _untrusted_ 3rd-party ...

SK: No use of capability URLs by robots
... Google Analytics, MS Mail, etc.
... because then open to e.g. searching

JT: Yes, search engines find URLs wherever they can

SK: Yes, once you find one, you can find lots via wildcarding

JT: robots.txt is only as good as crawlers let it be

TBL: Signalling that a cap URL is an important secret is a bit counter-productive -- it just tells bad guys where to focus their efforts. . .

DA: Well, how can we avoid cap. URLs being crawled if robots.txt isn't the right way

PL: I put poison links at the front and back of every page which I protect with robots.txt
... Anyone who follows them twice gets firewalled
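
[Editor's note: a sketch of PL's honeypot technique as described; the ban threshold comes from his "twice" remark, everything else (names, the firewall hook) is assumed.]

    // TypeScript sketch: pages carry hidden links to a path that
    // robots.txt disallows, so only misbehaving crawlers ever reach it;
    // repeat offenders are blocked.
    const offences = new Map<string, number>();

    declare function firewall(ip: string): void; // site-specific; out of scope

    function hitPoisonLink(ip: string): void {
      const n = (offences.get(ip) ?? 0) + 1;
      offences.set(ip, n);
      if (n >= 2) firewall(ip); // "follows them twice gets firewalled"
    }
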

TBL: Higher-level thought about using URLs for important stuff: if one _does_ leak, then I have no way of knowing who's accessing my data

HST: But you have server logs

TBL: You have IP addresses, but not identity
... So in the document, between 4.1 and 4.2, need something about _recognizing_ compromises
... That is, how can I tell that it's been compromised
... JAR would be arguing _for_ capabilities

AvK: Capabilities are great, but we're talking about using URLs for caps

YK: But URLs are the basic currency of the Web, it's natural to want to use them
... Trying for the perfect cap. system would be too complicated

JT: What about caps via email -- any recommendations?

AvK: At least make it expire quickly

YK: Suggestion: the shoulder-surfing section should move higher up

JT: Using replace-state means you can't bookmark
... So the back button won't work
... the swap mechanism fixes that
... but not the bookmarking problem
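
[Editor's note: a sketch of the replace-state swap JT describes; the URL shapes and storage key are assumptions. As she notes, the address bar stops leaking the token, but a bookmark of the cleaned URL no longer carries the credential.]

    // TypeScript/DOM sketch: after load, move the capability token out
    // of the visible URL so it stays out of history and over-the-shoulder view.
    const token = new URL(location.href).searchParams.get("token");
    if (token) {
      sessionStorage.setItem("capToken", token);   // keep it for this session
      history.replaceState(null, "", "/document"); // hypothetical clean path
    }
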

AvK: And that's important
... Note that gist.github would then be completely useless

JT: OK, so I'll take it out

HST: No, just explain what it doesn't work for/breaks

TBL: Suppose all cURLs were recognizable by browsers, then would it be obvious how to modify browsers to do the right thing

AvK: Treat it like a password -- blur, etc.

YK: Yes, doesn't go into history

TBL: But then you can't cut and paste it

YK: I said before, this is a big open-ended topic, suitable for a new (or existing) WG, not us

JT: Another issue wrt moving forward
... when you have a resource (a doc, e.g.), and there are cURLs to enable others to edit
... How do you indicate they are all for the same resource
... rel=canonical isn't really right

YK: Seems like a lot of the semantics are correct

TBL: If I give out two cURIs for a calendar, neither of them is canonical

YK: But one could be

TBL: Giving bad guys too much information?

JT: Not all would be listed, all would point only to the core one
... And it could be the one with access control

AvK: [Flickr example -- scribe didn't catch]

YK: Making the canonical one access-controlled is the right move

AvK: But we don't want them indexed. . .

YK: Similar to cache -- you want to cache the canonical one

AvK: Hunh?

JT: At least you have some ability to do comparisons across users

HST: So, something about this does belong in the document
... How it does correspond to the core use of canonical to some extent
... And what it does and doesn't give you

AvK: OK, but not v. important
... The redirecting thing is more important

JT: 301 Moved Permanently?

AvK: Yes
... Say anything about what happens if you try to use a cURL which has expired?

JT: Not sure what the right response is

AvK: 404?

HST: Too weak -- wait, I see, maybe that's right, doesn't give anything away

JT: 410 Gone might be more appropriate

AvK: Possible, but not required

YK: But 404 is retryable, while 410 is not

TBL: Does _anyone_ ever distinguish between the different 4xx codes?

YK: Yes, I did

<Yves> I don't know any implementation caring about the real meaning of 301 or 410

AvK: I tried using it, people kept re-fetching. . .

<Yves> is there any browser modifying/deleting bookmarks based on such response?

DA: So what does the doc. say?

HST: See YK's meta-point -- this is part of further work

DA: So add something saying 410 is right in principle, but may not be well-supported
... In practice, if you try for an expired cURL, do you get a 200 or a 404?

YK: Tried an example with gist.github, it gives a 404

AvK: To give a 410, you would have to have a history of your issued capabilities

Actually, keeping a history is probably a good idea anyway, IMO
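
[Editor's note: a sketch of the 404-vs-410 distinction under discussion (Node http module; tokens and paths are illustrative). It shows AvK's point concretely: answering 410 requires remembering tokens after they expire.]

    import { createServer } from "http";

    const live = new Set(["token-still-valid"]);     // currently issued
    const expired = new Set(["token-that-expired"]); // remembered history

    createServer((req, res) => {
      const token = (req.url ?? "").split("/").pop() ?? "";
      if (live.has(token)) {
        res.writeHead(200).end("the protected resource");
      } else if (expired.has(token)) {
        res.writeHead(410).end("Gone");      // known capability, now dead
      } else {
        res.writeHead(404).end("Not Found"); // reveals nothing about past capabilities
      }
    }).listen(8080);
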

[Meta discussion about publishing]

HST: +1 to Finding

<Yves> note that giving a 404 hides that there was a capability

<Yves> 401 leaks that there was one

<wycats_> I vote :shipit:

Yves, yes,

<Yves> like github giving out 404 for hidden/protected projects instead of 403

HST: Rec Track requires an AC review, I don't think we want to go there

<dka> Suggest process going forward: going to working draft, seeking some public feedback, and then publishing as a "finding."

JT: Use RFC 2119 words officially?

AvK: In accordance with them, but not 'officially' . . .

TBL: Referencing the RFC isn't necessary
... w/o a Conformance section it doesn't make sense

JT: Best Practice boxes. . .

DA: So, yes, publish a (F)PWD, seek feedback, we address, then publish a finding

JT: Not including standardisation

DA: Agreed, but identifying gaps/further work is good
... w/o discussing solutions

<dka> RESOLUTION - we move ahead with the publication of Capability URLs towards a TAG finding.

Closing issues

<dka> Open issues: http://www.w3.org/2001/tag/group/track/issues/open

<dka> TAG products: http://www.w3.org/2001/tag/products/

<dka> Github repo: https://github.com/w3ctag

<dka> Spec review list (github issue tracker): https://github.com/w3ctag/spec-reviews/issues

DA: Should we clarify where we're actually working?
... I've edited the home page to suggest the way we're moving to Github
... Do we want to keep any of these issues?
... Are there things we should bring forward? Or have they been abandoned or overtaken?

YK: URIs for packages?

DA: Charter says Issues are what we're working on
... And we have two places where we are recording them

YK: Propose that onus is on individual to move issue from old list to Github

<wycats_> ht: thanks for that crisper articulation

DA: Proposed resolution: Github issues list shows what we are committed to work on

JT: Archived?

PL: working on that, but not in place yet

<dka> This is our github issue tracker: https://github.com/organizations/w3ctag/dashboard/issues

YK: Can be exported at any time

DA: So if we move one, we would need to point back to the old Issue
... Happy not to go through the old list

<JeniT> Scribe: JeniT

dka: we should close some of these issues

<scribe> Scribe: ht

<dka> Re issue-60, I propose that we record this as closed since the TAG has published work on this topic.

<dka> issue-67: html and xml - we had a task force, we've done everything we intend to do here.

<trackbot> Notes added to issue-67 HTML and XML Divergence.

<dka> Henry: issue-64 and issue-65 can be closed

<dka> PROPOSED RESOLUTION: Summarily close all other issues except issue-57 unless TAG members wish to reopen them in github.

JT: Close 25, Deep Linking?

<dka> issue-25 can be closed - we have published work on this.

DKA: Yes

<dka> Issue-40 can be closed as we have completed work in this space - it can be re-opened if there are key URL/URI topics we need to work on.

Closing 40 should mention both capability URL and FragId drafts

PL: Used Postponed to indicate we 'closed' w/o review?

DA: Fooling ourselves?

HST: The substance will persist on the Web regardless of what we call it

<dka> PROPOSED RESOLUTION: Summarily mark as "postponed" all other issues (not explicitly noted above) except issue-57 unless TAG members wish to reopen them in github.

PL: Thought it was worth it

TBL: Better to make the distinction

HST: Right, OK, because 'Closed' means we actually _did_ something

<dka> RESOLUTION: Summarily mark as "postponed" all other issues (not explicitly noted above) except issue-57 unless TAG members wish to reopen them in github.

<dka> Products: http://www.w3.org/2001/tag/products/

DKA: Moving on to Products
... Obsolete this page, and ref. Github?
... Not updated for some time. . .

YK: +1

(Note also http://www.w3.org/2001/tag/group/track/products)

<dka> PROPOSED RESOLUTION: we obsolete the tag products page, explicitly state on our home page that the current tag work is in github and info can be found in the github readme files associated with each product.

DKA: We can move some things to Completed, and then note that no further changes will be made

<dka> RESOLUTION: we obsolete the tag products page, explicitly state on our home page that the current tag work is in github and info can be found in the github readme files associated with each product.

<dka> ACTION: dan to make edits to the tag home page and product page accordingly. [recorded in http://www.w3.org/2014/01/07-tagmem-minutes.html#action01]

<trackbot> Created ACTION-846 - Make edits to the tag home page and product page accordingly. [on Daniel Appelquist - due 2014-01-14].

DA: Github next, but we need AR for that

IETF London Action Plan

<dka> http://www.ietf.org/meeting/89/index.html

<dka> https://www.w3.org/2014/strint/Overview.html

Security workshop (aka STRINT) is 28 February-1 March, Friday and Saturday

DA: Will be in London
... I'll be there
... Emphasis is, I believe, on technical issues

IETF is at the Hilton Metropole, 3-7 March

DA: I'll attend the HTTP part of that, at least

Yves, are you going?

DA: What other APPSDIR stuff should we be looking at?

HST: Get Your Truck off my Lawn? We can ask MN tomorrow if he wants any help
... JSON?

DA: We'll come back to that

<Yves> dka, I don't know yet if I'll be there (london ietf) or not

<dka> We will re-convene at 13:00.

<timbl> http://weather.aol.com/2014/01/06/look-swirling-polar-vortex-over-northern-us-seen-from-space/

Pause for lunch

<scribe> ScribeNick: annevk

<ht> Scribe: Anne van Kesteren

Data Activity

<PhilA> http://www.w3.org/2014/Talks/0701_phila_tag/ -> PhilA slides

[Recap: capability URLs will become a TAG finding. If you have feedback, slightlyoff, please pass it on.]

[We did not close ISSUE-57. If you care about an issue you need to open it in GitHub. We did not talk about HTTP2.]

<timbl> http://www.opc.ncep.noaa.gov/UA/USA.gif

PA: A decade ago I ended up sniffing around this W3C organization. I ended up in one of Dan's groups.

DA: I take no blame!

PA: If it's data and not something else, say HTML or XML, it's part of the data activity.

[Goes through aforementioned slides.]

PA: Interested in government data (e.g. mapping criminal activity), but also scientific data, such as free access to papers
... in the scientific world there's a question how you can have open access while still have peer review

[Slide shows web applications on one side and spreadsheets/data/etc. on the other.]

PA: we'd like to bridge the gap between the data and the application, make it easier
... if you want to do more involved things with data you end up with semantic web technology
... a lot of the RDF stuff is done
... if you export to CSV (or tab-separated, etc.) you lose a lot of data
... we want to be able to describe the metadata separately
... so they can be independent actors

<slightlyoff> annevk: this is the open-vs-closed dataset issue I keep bringing up

PA: we want to find the middle ground between RDF and CSV

<slightlyoff> annevk: in a world where people have incentives to lie, and data isn't pre-groomed, schemas are suggestions

<slightlyoff> and say very little about quality

HT: I have a PhD student who works on extracting scientific data out of HTML tables and RDFa
... you partition datasets around dimensions such as geography and/or time
... then we map these partitions on URLs
... so they make sense within the context of the web architecture
... and then you end up with HTML tables with RDFa annotation (through a small vocabulary) so the data can be extracted

<ht> and generic visualisation and processing tools (e.g. Map-Reduce) can be deployed w/o prior knowledge of the particular format

PA: You can see this on e.g. Google when you query "Population of London"

<ht> I'll send a link to our XML Prague paper in a week or two

PA: The CSV group is trying a more practical approach
... taking existing data and annotating that in a structured way
... That's not the only thing, there's another WG on best practices
... we need to know how often datasets are updated
... whether they are kept alive
... how you cite datasets
... We have some workshops coming up [see slides]

Data Activity and the TAG

PA: we might need some work around packaging of data
... work around access control and payments

Summary of Action Items

[NEW] ACTION: dan to make edits to the tag home page and product page accordingly. [recorded in http://www.w3.org/2014/01/07-tagmem-minutes.html#action01]
 
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2014/01/07 13:32:37 $
