TAG telcon -- 24 Jun 2010

Admin

NM: Approve minutes of 27 May?
... Regrets from John Kemp

<DKA> +1 to approving 27-may minutes

NM: Minutes of 27 May are approved
... Minutes from f2f?

<DKA> I've read most of them...

<jar> maybe 1/2 or so...

NM: Minutes from f2f of 7 June -- 9 June approved

<DKA> +1 to approve

RESOLUTION: Minutes from f2f of 7 June -- 9 June approved

NM: Minutes of 17 June?
... Minutes of 17 June approved
... Telcons of 1 July and 8 July are cancelled
... Date of next meeting: 15 July, John Kemp is expected to scribe
... Availability tabulated in a [member-only] message to the TAG internal mailing list
... F2F at Google confirmed for 19--21 October, but hold off booking plane tickets until review at beginning of September

<DKA> I plan to organize other meetings in Silicon Valley that week so I hope we do stand by having the f2f those dates.

IRI Everywhere

LM: The most recent discussions about IRI[bis]/LEIRIs/etc. originally arose from a concern about venues (in how many places something about this should be said)
... But once we started looking at that, some technical issues emerged
... Some of these have begun to be addressed in IRIbis, which is progressing (slowly) at the IETF
... Here is my list of issues I think are on the table:

<noah> I'd like to know how Larry feels about Roy's position, which I take to be: HTML and similar "containers" don't directly contain URIs or IRIs, but rather reference strings in some document encodings; standardizing those isn't likely to be practical; let HTML5 define what it does.

<masinter> IRI -> URI via hex encoding, and then parse & reencode domain names

LM: 1) (small) -- processing of non-ASCII hostnames, e.g. http://[something in Chinese]/ -- IRIbis draft said to turn all non ASCII chars in an IRI into hex-encoded UTF-8, and then translate the domain name back into punycode

<noah> This recommendation is embodied where? I missed that.

LM: And that seems to be a bad approach, because the domain names didn't always get turned back into punycode
... at the right point

LM: And the browsers don't do that anyway, in practice
... they parse the UTF-8 and convert directly to punycode
... w/o going through hex-encoding
... Fixing this requires changes to IRIbis, and maybe also to the http and ftp URI schemes, maybe LDAP too
... That work can't happen in the HTML WG
... 2) (related) Security issues around spoofing of IRIs are made worse by the large number of homographs
... (different characters with similar glyphs)
... That's a small part of a larger issue of the visual presentation of IRIs that also arises with more complexity when characters from RtoL national languages are used
... Overall, getting a clear picture of the stages of processing wrt different tasks involving IRIs is still a job that needs to be done
... Writing ASCII URIs on the side of a bus was easy -- it's not easy now, with combining chars and bidi
... So that's three areas where we have work to do
... And there is the question of who has to do it, and whether we can do it so that HTML is neither overly involved nor overly delayed
... ABarth change proposal [http://lists.w3.org/Archives/Public/public-html/2010Jun/0394.html] seems more about venue than about actually doing the work
... Maastricht IETF meeting coming up might offer the opportunity to really get moving
... but browser vendor participation is in doubt, which is worrying . . .
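LM's first point contrasts two orders of operations for an IRI with a non-ASCII hostname. The sketch that follows is an illustration under stated assumptions, not the IRIbis algorithm and not any browser's code: it assumes the authority is a bare hostname (no userinfo or port) and uses Python's built-in `idna` codec (which implements IDNA 2003) for the punycode step. The hypothetical `iri_to_uri_hex_first` follows the order LM attributes to the IRIbis draft; the hypothetical `iri_to_uri_host_first` follows the order he describes browsers using.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_non_ascii(s: str) -> str:
    """Hex-encode each non-ASCII character as percent-encoded UTF-8 octets.

    ASCII characters, including delimiters such as '/', '?' and '#',
    are left untouched.
    """
    return "".join(c if ord(c) < 128 else quote(c, safe="") for c in s)


def iri_to_uri_hex_first(iri: str) -> str:
    """Order attributed to the IRIbis draft: hex-encode everything first,
    then turn the (now hex-encoded) hostname back into punycode."""
    parts = urlsplit(encode_non_ascii(iri))
    # Step 2, the one LM says did not always happen at the right point:
    # undo the %-encoding of the host and re-encode it as punycode.
    host = unquote(parts.netloc).encode("idna").decode("ascii")
    return urlunsplit(parts._replace(netloc=host))


def iri_to_uri_host_first(iri: str) -> str:
    """Order LM describes browsers using: parse first, convert the host
    directly to punycode, and hex-encode only the other components."""
    parts = urlsplit(iri)
    return urlunsplit((
        parts.scheme,
        parts.netloc.encode("idna").decode("ascii"),
        encode_non_ascii(parts.path),
        encode_non_ascii(parts.query),
        encode_non_ascii(parts.fragment),
    ))


iri = "http://例え.jp/パス?q=値"
print(iri_to_uri_host_first(iri))
print(iri_to_uri_hex_first(iri))
# Skipping step 2 leaves a %-encoded hostname, which is not a usable DNS name:
print(encode_non_ascii(iri))
```

On this input the two orders agree once the host fix-up is applied; the risk LM describes is that the fix-up is a separate step that can be missed or applied at the wrong point. Real browser behavior involves more detailed IDNA rules than Python's `idna` codec.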

NM: Helpful summary, thanks

<masinter> not so much in doubt as not quite aggressive enough

NM: I'm confused by some aspects of ABarth's change proposal which you didn't mention
... Particularly where he quotes Roy Fielding

<noah> From Roy by way of Adam:

RFC 3986 defines how to parse URIs (for recipients) and provides many rules for scheme-specific specs to define how to generate URIs of a given scheme (for producers) within the overall constraint of matching the URI syntax (the formal ABNF). Please understand that browsers almost never parse URI or IRI or anything in between. Browsers have input strings that contain one or more references, usually in the document encoding, and so there is a sequence of context-specific and charset-specific and media-type-specific processing that occurs before you even get to the individual URI-reference or IRI-reference that are defined by 3986/3987. Some people have proposed that most of that pre-processing be added to the IRIbis spec, but I have seen no evidence to suggest that such pre-processing is even remotely standardizable (it seems to be different for every input context). If you can demonstrate or get agreement on a single way to preprocess an input string, or at least a few named processes (like single-ref and multi-ref), then that would be useful.

NM: So Roy is focussing on the strings in browser input and the processing they do
... which he doesn't think can be standardized

<Yves> [was http://lists.w3.org/Archives/Public/public-iri/2010May/0008.html ]

NM: We have to confront that point -- either by saying "oh yes it can", or by being more careful and splitting it into two parts
... one of which can be done globally, and one of which remains with e.g. the HTML spec.
... So I think that makes it more than just a venue issue

<Zakim> noah, you wanted to look a little more closely at Adam's note and whether the word "venue" captures the issue

AM: LM, are those three technical points written up in more detail?

<masinter> http://tools.ietf.org/wg/iri/charters

LM: Some are to some extent in the IRI WG mailing list issue processing, and some have been (partly) addressed in the current IRIbis draft
... Follow your nose from the charter http://tools.ietf.org/wg/iri/charters to drafts

<masinter> http://tools.ietf.org/wg/iri/draft-ietf-iri-3987bis/

<masinter> that is the current draft that addresses many of the issues I discussed

<noah> To clarify: I think we need to come to a more considered opinion, on the technical merits, as to which aspects of parsing belong in the HTML 5 draft (I suspect at minimum things like leading/trailing blanks and document-encoding-specific concerns), vs. parts that belong in IRIbis. WRT the latter, we need to decide whether HTML5 needs to be held up waiting for them to be settled.

<masinter> and http://trac.tools.ietf.org/wg/iri/trac/report/1 notes issues still open

<noah> I don't feel that I have an informed opinion on the answers, but I believe those are the questions.

LM: So what ABarth and RFielding are saying is partly true: some processing is context-dependent and some is generic

<masinter> http://lists.w3.org/Archives/Public/public-html/2010Feb/0882.html

LM: My proposal for HTML issue 56 and the IRIbis draft embody my answer

LM: Specifically, my proposal includes draft rewrites of parts of the HTML5 spec. which make the split of responsibility wrt IRIbis clear
... You could make the cut somewhere else
... but you must connect to IRIbis somewhere

<noah> Too much there for me to grok in a hurry -- any guesses as to whether it's consonant with Roy's position?

<masinter> if you go back to http://tools.ietf.org/html/draft-ietf-iri-3987bis-00

<masinter> section 7.2

<noah> This seems to be what Roy is saying is a bad bet architecturally.

LM: That describes some preprocessing that might be appropriate for HTML alone, except that WebApps has also proposed to adopt that
... So it shouldn't go in the HTML spec.

<jar> LM is giving evidence then against Roy's "I have seen no evidence to suggest that such pre-processing is even remotely standardizable" ?

LM: I think RFielding is arguing that there is not a second context which might share it

NM: Not sufficiently common across different contexts to merit factoring
... He's betting against getting much value from saying "mostly do it this way" in IRIbis
... Which links to the venue issue via scheduling: why wait for something being done elsewhere that you believe is not, architecturally, amenable to good factoring anyway?

LM: We nonetheless do need to find a good venue for the technical issues to be addressed
... I am told anecdotally of a software package with 7--9 options for parsing IRIs, depending on the kind of IRI and the context

<noah> I think we also need to figure out what likely should have shared specs (I.e. shared between HTML and lots of other uses) vs. separate specs, before we can settle the venue question

LM: 9 is not 100, maybe it can be reduced by removing some inadvertent differences

<noah> OK, but that seems to be the discussion we need to help the community have. Looks to me like, to some degree, people are talking past each other.

LM: And if so, bringing the remainder together would make sense

NM: How can the TAG help?

<noah> Not presuming the answer is that we should do anything.

LM: There are still some problems with IRIbis, which is a fundamental document for (internationalizing) the Web
... Separating the presentation of IRIs to humans from the representation of IRIs as sequences of Unicode characters [for mechanical use] is an important architectural distinction, which we have not made in the past
... So, in particular, I invite the TAG to look at the issues still before the IRI WG
... Comparing IRIbis, which includes a change summary, with 3987 as published would be a big help

NM: So, the ultimate goal is to tell the whole story of getting between a string in a text/html representation which is delimited by double-quotes and may have leading/trailing space, and the characters needed for a GET request
... And getting the right layering of specs for that is looking like a lot of people talking past each other
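NM's summary, read alongside Roy's point that pre-processing is context-specific, suggests a split into a context-specific layer and a generic layer. The sketch that follows is purely illustrative and assumes where the cut falls: the hypothetical `html_attribute_to_reference` stands for the context-specific layer (here only trimming leading/trailing whitespace from an attribute value that has already been decoded from the document's character encoding and had its quotes removed), and the hypothetical `resolve_and_encode` stands for a generic layer (resolution against a base, then hex-encoding of non-ASCII characters). Non-ASCII hostnames and any encoding-dependent handling of the query component are deliberately left out.

```python
from urllib.parse import quote, urljoin

# Assumption: HTML-style whitespace (space, tab, LF, FF, CR).
HTML_WHITESPACE = " \t\n\f\r"


def html_attribute_to_reference(attribute_value: str) -> str:
    """Context-specific layer (hypothetical): the value is already decoded
    from the document encoding and unquoted; only whitespace trimming
    remains to be done here."""
    return attribute_value.strip(HTML_WHITESPACE)


def resolve_and_encode(reference: str, base: str) -> str:
    """Generic layer (hypothetical): resolve the reference against the base,
    then hex-encode each non-ASCII character as UTF-8 octets."""
    absolute = urljoin(base, reference)
    return "".join(c if ord(c) < 128 else quote(c, safe="") for c in absolute)


# e.g. an href value with stray whitespace around it
ref = html_attribute_to_reference("  caf\u00e9.html\n")
print(resolve_and_encode(ref, "http://example.org/menu/"))
# -> http://example.org/menu/caf%C3%A9.html
```

Where exactly the cut between the two layers belongs is the open question in the discussion; this sketch shows only one possible placement.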

<masinter> the issue of permanence of URIs interacts with things like character encodings, internationalization

NM: Getting input for Maastricht would be good, but it would have to be done w/o further discussion

[no volunteers]

NM: Rather than close 448, I will REOPEN it to cause us to come back to it after Maastricht

<noah> ACTION-448?

<trackbot> ACTION-448 -- Noah Mendelsohn to schedule discussion of http://lists.w3.org/Archives/Public/public-html/2010Jun/0394.html on 15 July (followup to 24 June discussion) -- due 2010-07-13 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/448

NM: That completes the agenda -- we could look at frag-ids
... but TBL is involved with that, and not here

<masinter> action-382?

<trackbot> ACTION-382 -- Larry Masinter to review Web Arch web material on W3C Web Site and make proposals for changes or TAG action -- due 2010-05-31 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/382

<jar> Can we just close action 382?

Generic Processing of Fragment Identifiers -- ISSUE-39

<noah> ISSUE-39?

<trackbot> ISSUE-39 -- Meaning of URIs in RDF documents -- open

<trackbot> http://www.w3.org/2001/tag/group/track/issues/39

<noah> ACTION-449?

<trackbot> ACTION-449 -- Noah Mendelsohn to schedule discussion of pushback on generic handling of fragment IDs in application/xxx+xml media types (self-assigned) -- due 2010-07-13 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/449

<masinter> relates to new issue on MIME and the web

<noah> FWIW, I have some sympathy with the suggestion that 3023bis should call out rdf+xml in particular as an exception that's being grandfathered.

LM: Including fragid definitions in media type registrations was new, in that it wasn't needed for the email use of media types

<masinter> but svg+xml and xhtml+xml also need exceptions?

LM: And it wasn't well-architected

<masinter> application/xhtml+xml for polyglot

LM: The URI RFC says the media type registration determines fragid semantics, but the guidance for media type registrations doesn't cover this

<Zakim> noah, you wanted to say that, in this case, media type registration is working dandy...except that there's a looming inconsistency if 3023bis goes forward

LM: There are at least three cases of conflict -- rdf+, svg+ and xhtml+

NM: Is there a problem with svg+?

YL: Not sure it's a conflict, it's an addition

NM: If a generic processor encounters it, will it do "the wrong thing"?

YL: Not sure

NM: Would you take an action?

<masinter> i think the horse is already out of the barn, and that there is no hope of generic fragment identifier handling for +xml MIME types

<jar> i tend to agree.

<noah> ACTION yves to investigate generic processing of svg+xml and XHTML+xml

<trackbot> Created ACTION-450 - Investigate generic processing of svg+xml and XHTML+xml [on Yves Lafon - due 2010-07-01].

NM: I agree with LM that the relationship between [guidance for] media reg docs and the URI RFC could be clarified
... but in practice the right thing has been happening
... But 3023bis should cause a red flag, if it goes forward as written
... because it contradicts an existing registration
... So the TAG is trying to prevent that, in the first instance by suggesting the removal of the entry for fragids in the list of generic processing
... We've gotten strong pushback

NM: So I'd like to look again at some kind of grandfathering as an alternative approach

<noah> So, my position is: 3023bis should say "process generically, except if it's rdf+xml (or, as necessary, svg+xml, etc.)"
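NM's position can be read as a simple dispatch rule for a generic processor. The sketch that follows is hypothetical: `fragid_handling` and `GRANDFATHERED` are illustrative names, only application/rdf+xml is named in NM's proposal, and image/svg+xml and application/xhtml+xml are included only because they were raised as possible further exceptions (see ACTION-450).

```python
# Hypothetical exception list. application/rdf+xml is the case NM names;
# the SVG and XHTML types are the candidates raised in the discussion.
GRANDFATHERED = frozenset({
    "application/rdf+xml",
    "image/svg+xml",
    "application/xhtml+xml",
})


def fragid_handling(media_type: str) -> str:
    """Return "generic" if a generic +xml processor may apply the generic
    (XPointer-based) fragment identifier rules, else "type-specific"."""
    # Ignore parameters such as "; charset=utf-8"; type names are case-insensitive.
    base = media_type.split(";", 1)[0].strip().lower()
    if base in GRANDFATHERED:
        return "type-specific"  # defer to that type's own registration
    if base.endswith("+xml"):
        return "generic"
    return "type-specific"  # not a +xml type: its registration governs


for t in ("application/rdf+xml", "application/atom+xml; charset=utf-8", "text/html"):
    print(t, "->", fragid_handling(t))
```

LM's objection that more exceptions will arise amounts to this set growing over time, which is what would stop a generic processor from relying on XPointer for every +xml type.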

<Zakim> jar, you wanted to ask how to involve Norm etc.

JR: I'm interested in the pushback; since we more or less covered the options they propose, maybe we should summarize our analysis

NM: We could do that, but with the new information, I'd like to start by reconsidering the whole thing ourselves
... HST would also like to come back to this, we have new input from outside, and from LM -- particularly wrt making clear what a media type registration involves
... so we should come back to this in July

<jar> umm... no, not interested in pushback or defense, just wanted to help open up the discussion by making sure www-tag folks knew about all the options that we considered at the F2F. this gives people a chance to say "no I like option Z" instead of just "no I don't like the option you chose"

LM: Contrasting a position of allowing/encouraging particular fragid handling vs. focussing on the generic, I prefer the former
... Better to encourage new registrations to adopt the generic processing
... of fragids, i.e. XPointer

<masinter> I think, netting out the pros and cons, the advantage lies with allowing MIME types -- even if they're +xml -- to have specific fragment ID processing.

NM: The alternative is to encourage non-generic-enabling types to not use +xml

LM: Need to net out the pros and cons
... I'm confident that more exceptions will arise

<noah> And in particular, to note that allowing non-XPointer for +xml means that generic processors can never safely gamble on using XPointer.

NM: Or you could play with the syntax

LM: I don't see the pro for {always generic except these 3 exceptions}, because I'm confident that more exceptions will arise

ACTION-382 -- WebArch and the W3C web site

<noah> ACTION-382?

<trackbot> ACTION-382 -- Larry Masinter to review Web Arch web material on W3C Web Site and make proposals for changes or TAG action -- due 2010-05-31 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/382

LM: I can't see getting to this any time soon

JR: I have a very similar action already

ACTION-381?

<trackbot> ACTION-381 -- Jonathan Rees to spend 2 hours helping Ian with http://www.w3.org/standards/webarch/ -- due 2010-06-23 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/381

<noah> close ACTION-382

<trackbot> ACTION-382 Review Web Arch web material on W3C Web Site and make proposals for changes or TAG action closed

NM: Adjourned

<jar> http://www.adobe.com/devnet/xmp/pdfs/DynamicMediaXMPPartnerGuide.pdf