Technical Architecture Group Teleconference -- 19 Oct 2010

<ht> trackbot, start meeting

<trackbot> Date: 19 October 2010

<ht> scribenick: ht

<scribe> scribe: Henry S. Thompson

<scribe> agenda: http://www.w3.org/2001/tag/2010/10/19-agenda

Admin and agenda review

NM: Recap of time as chair---coming up on two years
... Better focussed on issues coming from the community
... Some noticeable impact, but other areas where we haven't managed to achieve much
... Obvious structural problems with affecting HTML5, but we have made some impact there
... But on WebApps, where we said we would focus, we haven't achieved as much as I hoped we would
... Our history was for producing findings, which moved rapidly (some of them) and ended up being well-crafted
... but that has stopped, pretty much
... Partly because we said we wouldn't do Findings, but haven't succeeded in replacing that focus
... Some counter-examples, but mostly because we haven't been putting the work in
... People should be putting more time in
... It's destructive for a group to say: Here's what we're going to do
... and then not do it
... I'd like to see us figure out what we can commit to realistically, and then do it

AM: I agree, I'm frustrated with the lack of feedback on what I have written

NM: Meta-discussion now, or . . .

TBL: A bit, before we shift to substance
... I have put in less time over the last year, I'm hoping that's over and I'll be able to start contributing more
... I've written about net neutrality and people being cut off from the web
... which I'm passionate about
... Passion is important to the TAG's motivation
... Mime, URIs, yes, those are crucial
... What is it about webapps that is crucial?
... I'm tasked with bringing back two TAG goals for the next six months to Jeff Jaffe

LM: The driver for anyone is "Who wants it" -- the fear of not meeting people's interests is demotivating
... (to TBL) You and Jeff should drive us -- what should we be working on?

TBL: But goals for us don't come from management
... Cracks in the architecture aren't noticed by management

NM: The webapps problem is not that we've tried and found lack of substance, but we haven't tried (enough)
... We need to locate, for example wrt Client-side storage, what the main points ought to be.

LM: I need to understand the whole area before I can identify the main points
... One role of W3C is to apply constraints/feed important viewpoints as policies into WG activities which they might not think or care about intrinsically

<noah> I also think we need to, not right away but early in the process, identify the few main points we want to make, and to whom.

LM: TAG findings, and the Arch Doc(s), are how we express those policies
... So our role is to articulate policies in a way that is actionable

TVR: It resonates, but it takes two hands to clap
... But without respect from the WGs and the people outside the W3C who are driving the Web there's no leverage for what we do
... and that respect has been lost

NM: Use cases are at the heart of establishing credibility
... Saying "I have a principle" is less likely to have an impact than "I have a use case"

TBL: Getting the 'profile' attribute back is a clear example

<noah> I'll give this until 9:40, then on to storage

TVR: The successful cases are from the past, the new Web doesn't have simple axioms or examples that you can state
... The loss of community-level standing to speak by the W3C, not so much the TAG itself -- the TAG alone can't fix this

DKA: Wrt the W3C's role, my perspective is that in the geo-loc area, I see the W3C playing a leadership role and being successful
... Also, when we ran the privacy workshop, we got people from across the eco-system, and the W3C's coordination role was recognised
... We should try to build on that kind of example

TVR: Just emphasising that TAG impact has to be based on W3C effectiveness

<masinter> W3C effectiveness can be improved by the TAG doing a better job

TBL: Let's decouple these -- the TAG should just do the work, and we'll work in other arenas to improve the impact of the W3C

<noah> +1 to Tim

TBL: Agree with NM (and JJ) that we need to pick our focus, and get to work

<Zakim> masinter, you wanted to suggest we review this later in the meeting when we go to "next actions"

LM: When we talk about next steps, remember that the TAG being effective also feeds into W3C effectiveness
... bringing the broader community together
... and expanding the horizons of the WGs

NM: Quality speaks for itself
... I want to look at our work and be proud of it, on the basis of intrinsic merit

Web Applications: Client-side State

action-430?

<trackbot> ACTION-430 -- Ashok Malhotra to propose a plan for his contributions to section 5: Client-side state -- due 2010-09-09 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/430

NM: Background is our commitment to build a webapps document

AM: Two documents: state and storage

NM: Oops, I missed that there were two documents
... Let's start with storage, since it's linked from the agenda

AM: Actually, I'll start with client-side state, which originated wrt Google Maps

NM: Changes since May?

AM: Feedback from NM, LM, JAR, and that was very helpful
... In particular that not all states are reproducible, only some, and I'm happy to take that on board

NM: So apologies for not scheduling discussion on state
... but not much change in that area?

AM: Right

<jar_> (No news on state since early summer. The new draft is about storage, not state. The action was about state, but Ashok wrote about storage, and what we're about to talk about.)

AM: I would like to have a partner to actively respond to me on this

DKA: I'll take that on

<noah> ACTION: Ashok to write finding on client-side storage, DanA to review [recorded in http://www.w3.org/2010/10/19-tagmem-irc]

<trackbot> Created ACTION-475 - Write finding on client-side storage, DanA to review [on Ashok Malhotra - due 2010-10-26].

NM: Move to discussion of http://www.w3.org/2001/tag/2010/09/ClientSideStorage.html

<timbl> Ashok, what about evercookies?

<timbl> I see them, now

DKA: AM mentions evercookies, this is the kind of thing which we should focus on, it got HTML5 on the front page of the NYT
... We should try to move quickly here -- can we triage this area to get something out on this quickly?

<timbl> Note the W3C privacy workshop and the whole new currentness of privacy discussion in the community

<Zakim> masinter, you wanted to ask about IETF work on cookies https://datatracker.ietf.org/wg/httpstate/charter/, privacy issues, fragment identifiers ... (items missing that I thought

LM: There is an IETF WG on HTTP state, cookies etc.

<timbl> http://tools.ietf.org/wg/httpstate/

<timbl> http://tools.ietf.org/wg/httpstate/draft-ietf-httpstate-cookie/

YL: the goal is to produce an interoperable spec. about cookies
... both server and client

<noah> Noah needs to remind himself that ACTION-430 was about state, as in fwd/back stack, and this is storage, which is the new action-475

LM: What about sandboxing -- local storage is associated with a site, and there is some model about sandboxing and tainting -- that should be covered in more detail in this doc't

AM: Storage uses same-origin policy at the moment, subject to the same problems
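
[Illustration, not from the meeting: a minimal Python sketch of what "storage uses same-origin policy" can mean in practice. It assumes an origin is the (scheme, host, port) triple of a URL. The `OriginKeyedStorage` class and the example.org URLs are hypothetical names made up for this sketch, not a browser API.]

```python
from urllib.parse import urlsplit

# Default ports, so that http://example.org and http://example.org:80 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple used here as a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a, b):
    """True when both URLs share scheme, host and port."""
    return origin(a) == origin(b)


class OriginKeyedStorage:
    """Toy key/value store partitioned by origin (hypothetical, for illustration)."""

    def __init__(self):
        self._areas = {}

    def area_for(self, url):
        # Each origin gets its own dictionary; other origins cannot see it.
        return self._areas.setdefault(origin(url), {})


store = OriginKeyedStorage()
store.area_for("http://example.org/app")["visits"] = 1
```

[In this sketch a page at http://example.org:80/other would see the same storage area, while https://example.org/ would get a separate, empty one. Cross-site approaches such as CORS and UMP are outside what the sketch covers.]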

LM: What is the problem in this space -- there's more than one approach to cross-site, how is that playing in this new area? Exposing all that would be a good idea

AM: I borrowed stuff that JK had written about CORS and UMP, and included that

LM: Wrt privacy, the general operation of "clear all my stored information" was not in webarch, cookies weren't in webarch

TBL: There wasn't any local storage in WebArch

<Zakim> noah, you wanted to ask about relationship between storage and privacy

NM: This is a finding on storage, but we move quickly to privacy
... We're failing to notice other things which are fundamental
... cookies are one, we should start with that -- here's the pure Web, in its idealised form
... First look at cookies, SQL stores, etc. Are they antithetical to WebArch, or can they co-exist?
... How do they relate to our existing story about WebArch -- does this e.g. compromise our ability to give URIs to things which can be distributed effectively?

<timbl> You have to say now that cookies are definitely part of the web arch -- and client-side storage and state in general -- we should not ask whether they are just bad. We should define the two types of system, and give the properties of each.

<masinter> more questions: whether this is 'storage' or 'caching'? is it an optimization or essential? Is there a web for devices without local storage? "public terminals", "embedded clients", ... what are the requirements for how much storage there is, the possibility of malicious sites using up storage capabilities, ....?

NM: What are the tradeoffs? And only then move to privacy and security

<Zakim> timbl, you wanted to talk about accountability as a possible TAG finding

TBL: Historical stories, maybe. The first stateless story has been told. So we should recognise the existence of two systems, with different protocols, and then look at how to use the new one effectively

<noah> Should we walk through the document?

TBL: Privacy is really important and current, we've had all these workshops
... Move from a world of access control to a world of accountability
... Great talk by Hal Abelson at the last MIT Privacy workshop -- the five myths of privacy

<masinter> there's a leap between 'storage' to 'privacy' which really seems to be about sandboxing?

TBL: Proposed changing the way we think dramatically

<Zakim> masinter, you wanted to ask more questions

LM: I'm confused about the question of at what point local storage moves beyond being a cache or an optimisation
... Do you have to have local storage to be an agent at all? Public terminals versus privately-confined ones

<noah> I like the public terminal point that Larry is making

<noah> Of course, offline operation is very important in some cases.

LM: The stateless web had the advantage that local storage was not a requirement
... Is it legitimate to have a client that drops all cookies?

TVR: State rides on top of the stateless HTTP protocol
... Some people say that's a hack, we should have had state in HTTP all along
... Others say it's OK, it's good layering

<masinter> I don't think this is a moral issue or a question of "good" or "bad" design, the question is an architectural one, of whether there are categories

TVR: LM is right, in the process of layering we can no longer tell if a URI is basic (clean) or basic+ (cookies needed) or basic-super-plus (need SQL-storage)

<masinter> i would imagine an architectural requirement: web sites that use state should still work with clients that don't have state, although might be missing features
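
[Illustration, not from the meeting: a minimal Python sketch of the suggested requirement that a site using state should still work for a client that keeps none. The cookie name `edition` and the default value are hypothetical. The point is only that a missing cookie falls back to a usable default instead of blocking the user.]

```python
from http.cookies import SimpleCookie

# Hypothetical default served to clients that send no state at all.
DEFAULT_EDITION = "international"


def choose_edition(cookie_header):
    """Pick a site edition from an optional Cookie header value.

    A client that drops all cookies passes None (or an empty string)
    and still gets a working, if less personalised, answer.
    """
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)
        if "edition" in jar:
            return jar["edition"].value
    return DEFAULT_EDITION
```

[A stateless client still has to take the default on every visit. The sketch only shows that the site keeps working, with a feature missing.]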

<noah> Need to get link to Crest

AM: Roy Fielding sent us a pointer to a website about C-REST (Computational REST) -- a URL really points to a computation
... but it doesn't spell out state manipulation

<masinter> http://www.erenkrantz.com/CREST/ ?

AM: I thought you, TV, were talking about two or three classes, but I don't think you can separate them

TVR: I agree you can't separate them, but the question is is that a bug or a feature?
... If you go to CNN with a completely clean browser, it takes you ages to actually get the news, because it requires you to answer various questions first
... So a stateless client will always have that hassle
... The old days of getting a bag of bits, from a disk or a CGI script, which get shown, have gone
... Now that bag of bits is run again on the client, and that can iterate. . .
... For example the CNN video page I mentioned in my # in URL document

<masinter> "document" <=> "application which displays document"

<jar_> (Re TV saying "you can't tell" whether the resource is stateless, or whether there's state involved: I muse that this may be the age-old argument of 'typeless' vs 'typeful' languages - do types get reflected in names; are they inferrable from context; or do you have to dive deeper to "tell" what something is like)

NM: Indeed how does the new document relate to the # in URI doc't?

<Zakim> noah, you wanted to ask about doing it right

NM: TVR said look, we had a stateless web, now this new stuff is happening, mostly it's good, but we need to be clear what's more and less good
... There are some things which are pretty dangerous, parallel to using GET to update state
... Mostly we can set out the tradeoffs: what the consequences of e.g. using a persistent local store will be, both upside and downside

<masinter> What is it you have to do in order to build an application which works reasonably with many different kinds of devices, all with differing local storage capabilities? is that an architectural goal?

<Zakim> masinter, you wanted to explore some kinds of findings around state

NM: compare what we did with Cool URIs don't Change -- here's what goes wrong when they do

LM: Interoperability -- what does it mean? Two things are interoperable if they work identically? Or something 'works' on a small handheld, and on a big desktop screen?
... We're now looking at a range of UAs with respect to local storage capabilities. The goal should be that the architecture makes clear where the scalability has to come
... What are the principles that allow things to work with small or large storage, with any cookies/no cookies/no-cross-site cookies

<Zakim> ashok, you wanted to talk about Evercookies

JAR: Parallel to the accessibility story

AM: People are very upset about Evercookies
... Is this just one guy's hack, or has it become a Javascript hack?

NM: The guy himself did it to show how easy it was to be bad, as a warning

<masinter> proof of concept, but each of the methods is individually exploitable

NM: But the black hats just say "thank you" and use it

TVR: Evercookie shows how you can create a PNG file and get it back later, which may get deleted because unrecognised. But then just use space-char-distribution in a README, which looks quite harmless

<timbl> Yes, Larry -- there will be two parts to that, adapting to different client-side storage -- one is simply mapping the different sorts of facility (RDB, key/value etc., RDF etc. store) onto each other. The other will be writing an app to deal with very differing amounts of storage.

<timbl> The latter will be very tricky and difficult to generalize.

AM: Depressing, because it looks like an arms race, parallel to the virus situation

<timbl> (The former is still to an extent in progress with various layers)

NM: Right, so "clean local storage" is just undischargeable -- what is covered?

TBL: I wanted to watermark email to W3C forum so I could detect leaks. . .
... Rather than get depressed, we need to take the switch to accountability here

AM: You can't stop people, but you can hold them accountable, is that it?

JAR: That's the way security works in the real world, per Butler Lampson, for instance bank security works by accountability, not literal security

LM: "Gates, guards and guns"

JAR: Hal A's TAMI project is relevant

<masinter> that's the decomposition of security mechanisms: locked doors that keep people out, monitors that discover when people intrude, and punishment to hold you accountable if you intrude

TBL: [projects from Hal Abelson's "Seductive myths about privacy" slide deck]

<jar_> http://dig.csail.mit.edu/TAMI/ Transparent Accountable Datamining Initiative

<jar_> Also see http://www.bitsbook.com/ book _Blown to Bits: Life, Liberty, and Happiness after the Digital Explosion_ by Abelson et al

<masinter> AAAA Authentication, Authorization, Accounting and Auditing

<masinter> you can use my birth date but not my age?

AM: Yes, I have what I need

NM: Due date?

<masinter> about how to decouple storage finding from privacy

NM: Back to storage, we'll pick up on privacy tomorrow morning

<noah> http://www.w3.org/2001/tag/2010/10/19-agenda.html#privacy

<Zakim> DKA, you wanted to make a point on where client-side storage fits into the "apps vs. web" debate currently playing out in the mobile industry.

<Zakim> masinter, you wanted to ask about "information" boundaries

DKA: Client-side storage hugely important in the mobile industry -- apps vs. web is a hot topic
... Received wisdom is building apps is great
... this isn't consistent with the web-as-a-platform perspective
... This means client-side storage is important to support intermittent connectivity

TVR: Huge confusion about "what is a web application" -- web application vs. client application is just a marketing distinction
... Lots of so-called client apps are actually web-reliant

<masinter> apps vs. web is an interesting discussion.... what is in an 'app' that isn't in the web, besides local storage? "installation", different security model, installation of preferences

NM: But I can't run them on my Palm

TVR: That's just the old write-once-run-anywhere myth
... I don't want to limit web-apps to those which just run in the browser

[general uproar]

TBL: It's an important difference

TVR: It's a continuum

TBL: "Don't build a client app, build a webapp" is my litany these days, because you can link to it, because it _will_ run anywhere

LM: The technology used is orthogonal

TVR: My point

LM: There is a difference between an app and something on the web: there's a big security difference

<noah> There is also a difference of openness and ubiquity.

LM: An app can assume permanent storage until it's uninstalled
... An app can install a preferences dialogue

<noah> If I send a link to what Tim calls an application, it will with high probability work wherever you are. Not true of Android, Flash, iPhone etc. even if heavily using Web protocols under the cover.

TVR: Over time they are merging -- the distinction is only in your head

TBL: One has a URI and the other doesn't

TVR: I can write an android app which jumps in and out of the browser

<jar_> timbl: What do you mean they're merging? One has a URI and the other doesn't

TBL: URI is itunes: ?

TVR: No, http:

<masinter> applications can use web, invoke web, etc. can they limit web access to some sites or restrict ?

TVR: Pandora is a good example
... If the message is it's all URIs, bookmarkable, run on any browser -- yes, that's still there
... but the app universe is much less clear

DKA: I wasn't trying to say that people shouldn't write native apps
... But the client-side storage provision is relevant in a decision about whether to write a native app or a web app
... There are other differentiators, but client-side storage is an important one

<masinter> i think "installation" and "preferences" are other elements

DKA: So if we're trying to bolster web-as-platform, we need to support client-side storage

<Zakim> timbl, you wanted to point out re client-side state the problem of referring to a thing and a view of that thing, or a thing and an app which can treat the thing, and the issue o

TBL: There's a pattern which leads to grief: When we want the world to be represented by one URI
... but in fact it's represented by two
... Maybe we can fix this for two, even if we can't for N
... I'm looking at a location on ???, and I want to drag it to OpenStreetMap

<masinter> and a different security model. Perhaps applications could be designed such that applications also produce URIs so they can transition between app <==> webapp, if they're built on the same technology

<jar_> "in the browser" is an implementation detail. what are the essential aspects of the app / web app distinction from the point of view of issues that are observable to users?

TBL: I can't as a user take "this website" and make it view "this location"

[scribe is getting lost]

<masinter> what Tim is talking about is orthogonal -- can pieces of application 'state' be abstracted enough that they can be shared between applications.... vcard, map locations, etc.

TBL: Multi-dimensional reality about my interests in e.g. people -- sometimes I want them-as-located, sometimes them-as-issue-owner
... Maybe this is pushing us towards multiple-view-supporting URIs -- separate dimensions, in other words -- what object do I have in view, and what application do I want to apply to it?

[again, scribe lost]

<masinter> Tim wants application / web application designers to use common abstractions for common concepts (users, locations, telephones, ....) in a way that those abstractions can be shared between applications

TBL: for sanity, the user shouldn't have to live in a world where the thing and the view/facet/use are conflated into a single URI

<masinter> (to propose)

TBL: we need to untangle those things

TVR: But then we need a universal way of identifying, e.g. locations

TBL: URI of vcard or homepage can be a person-identifier

<Zakim> masinter, you wanted to talk about apps vs. web and to propose that we define what a "web application" is, in a way that helps clarify this

<Ashok> There is a project called OKKAM which will give URIs to locations -- among other things

LM: Native apps vs. webapps -- they bring different things to the table for the developer. As NM said, it starts with the installation story
... There is a natural desire to get a URI which links to an application _in a state_ . Perfectly possible to have a native app which generates URIs for its states, even if it's not "a web-only" app
... That's a separate discussion from the one about what's needed for sharing across applications -- that's harder
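
[Illustration, not from the meeting: a minimal Python sketch of a URI that links to an application in a state. It assumes the state is a small set of string key/value pairs carried in the fragment. The host maps.example and the keys lat, lon and zoom are hypothetical. Nothing here depends on whether the producer is a native app or a web app.]

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def state_to_uri(base, state):
    """Serialise an application state dict into the fragment of a base URI."""
    scheme, netloc, path, query, _old_fragment = urlsplit(base)
    # Sort the keys so that equal states always produce the same URI.
    fragment = urlencode(sorted(state.items()))
    return urlunsplit((scheme, netloc, path, query, fragment))


def uri_to_state(uri):
    """Recover the state dict from a URI produced by state_to_uri."""
    fragment = urlsplit(uri).fragment
    return {key: values[0] for key, values in parse_qs(fragment).items()}


uri = state_to_uri("http://maps.example/view",
                   {"lat": "42.36", "lon": "-71.09", "zoom": "15"})
```

[Sharing such state across different applications, which is the harder problem raised in the discussion, would additionally need the applications to agree on the vocabulary. The sketch does not attempt that.]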

[NM: Short break]

Resuming

ISSUE-54 (TagSoupIntegration-54): HTML / XML Unification

NM: Detailed discussion in June at the last f2f about this
... Suggested that TBL, with help from the TAG, might try to pull an effort together to tackle the HTML/XML convergence problem
... JJ has encouraged TBL to discuss this in public at TPAC
... TBL hesitant unless there is progress to report

TBL: This was at TVR's instigation
... There has been pushback from HTML people saying "Nothing's going to change in HTML land, nothing's going to change in XMLland, so no point"
... I got management support to create a task force
... If we are convinced that it has a very very crucial role to play, let's do it

<masinter> What are the requirements for which XML was designed which are not in HTML, and which communities need those requirements? Extensibility, modularity, small footprint.... don't think we can ignore those communities

TBL: But if we think there's no choice but to have two stacks, let's not do it

TVR: The recognition of two stacks seems like conceding victory to one side
... because if it goes forward, there's a real risk that [the XML] stack will atrophy and die

<masinter> I don't think the model of "two stacks" makes sense, describes the reality

TVR: I don't think the two-stack story has a future
... For XML to play a role on the browser-driven web, XML has to have a place in the text/html media type
... We have to find a way to sell the value of the XML tool chain
... Because the last mile is broken, the whole XML pipeline story is at risk

LM: XML was designed to meet some communities' requirements, neither the communities nor the requirements have gone away
... So maybe XHTML was going wrong, so we needed a course correction
... But that doesn't mean that we can't bring those original requirements back to the table
... Those requirements: a small, clear, consistent language with no surprises, and modularity, extensibility, archivability
... Not shared by the main driving force of the Web, but a big proportion of the W3C membership
... Someone argued that polyglot didn't matter, but a member (a client) said "that's all I had" -- and they wanted not just the last mile, but to roundtrip -- to take their website and push it _into_ a pipeline
... We have to bring that community with us
... The ebook / EPUB standards use XHTML

<Yves> http://en.wikipedia.org/wiki/EPUB

TVR: And the Daisy XML manifest

LM: The requirement for books is in part archivability, for which a smaller more constrained language is not
... a luxury, but a real value

TBL: So we mustn't leave the polyglot community behind

HST: More than that, the all-XML community

TBL: The HTML community will say that this represents too small a part of the Web for us to care

But it represents a much larger percentage of W3C's membership

<masinter> (a) "leading the web to its full potential" means not looking backward but forward... what interoperabilities can we enable?

<masinter> (b) there are too many web sites that claim they are xhtml compliant

TVR: I don't think the numbers game matters, or trying to 'prove' to the HTML people that they should care about XML on the Web
... What I do care about is that the XML ecosystem is preserved
... The W3C has created a very valuable collection of tools, where XML is at the heart, and sold it to the membership
... In particular, to be able to serve the results of their pipelines onto the Web

TBL: So protect and defend polyglot -- what are the other requirements
... So what other things go into the terms for taskforce?

TVR: Distributed extensibility

NM: The charter of any taskforce should take a clear stand on the HTML5 Last Call
... after which it might not change much
... So are we asking the TF to take responsibility for making HTML5 right for polyglot etc.
... _or_ is it focussed on life after HTML5

TVR: No win either way -- too late for Last Call, too soon for post-HTML5

<masinter> task force should focus on long-term requirements, but be encouraged to make short-term recommendations

TVR: The claim is that HTML5 is continually evolving, we can utilise that to get improvements done

YL: The main difference is that in the XML stack extensibility is both pervasive and easily handled, whereas in the HTML stack, largely driven by the browser implementors, extensibility is a problem to be circumscribed and worked around

TVR: The extensibility argument has been made before many times, but on the implementation side it has been hard to turn new tags into new behaviour
... whereas on the HTML side the addition of new behaviour is quite easy
... Making a raman:person element, moving it around and working with it is very important
... but so is my ability to write <div class="person">.... and get lots of leverage wrt appearance and behaviour from that

LM: There is more to XML than extensibility. Did we lose the value of the XML cleanup because of the pushback on extensibility?
... We've lost the conservative sender part of the duality -- for general web use maybe that's not so important, but for interoperability with the rest of the world it may matter a lot
... Here's a requirement (clean specification of clean sublanguage), here's the XML ability to support that, here's where HTML doesn't give it to us

<Zakim> timbl, you wanted to say but in fact HTML isn't the last mile any more, webapps is more than a mile and where a lot of the processing happens.

TBL: HTML isn't the last mile any more, but the newer architecture involves much more computation client-side, which means it's the client that needs both XML and HTML parsers

<masinter> but the client side deals with the DOM? XML dom vs. HTML dom independent of linearization?

TBL: So the client is (via Javascript) looking at a tagsoup-originating DOM and XML from XMLHttpRequest

<masinter> other requirements, how to apply to HTML: XML digital signatures, EXI....

TBL: Which pushes the question of where full HTML parser is required into lots of obscure places
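
[Illustration, not from the meeting: a minimal Python sketch of why a consumer may need both kinds of parser. It uses the standard library's `html.parser`, which is a tolerant tokenizer and not a conforming HTML5 parser, next to the strict `xml.etree.ElementTree`. The two sample strings are made up for this sketch.]

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start tags in document order, never rejecting the input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup):
    """Tokenize markup leniently and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


SOUP = "<div><p>one<p>two</div>"              # unclosed <p>: fine as HTML, not XML
POLYGLOT = "<div><p>one</p><p>two</p></div>"  # acceptable to both parsers
```

[In this sketch the lenient tokenizer reports the same start tags for both strings, while only the second string gets through an XML toolchain. That is the conservative-sender property discussed above.]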

<Zakim> jar_, you wanted to (a) suggest postmortem (b) ask about viewing html as a serialization for xml

TVR: Webkit and maybe Gecko do have tidy functionality, in that they can serialize their DOMs

<masinter> which other XML standards & applications ... how do we apply them to HTML?

JAR: The taskforce could look at doing a postmortem -- the current situation is evidently unprecedented in its nature -- how did we get here, so at least we understand how to avoid it another time
... LM mentioned that we lost the recognition of the value of conservative to produce/liberal to accept -- how did that happen?

<Zakim> ht, you wanted to deprecate arguments based on counting

JAR: So, as a preliminary, how did this happen?

<masinter> http://www.tbray.org/ongoing/When/201x/2010/02/15/HTML5 good perspective

<Zakim> masinter, you wanted to disagree with Noah and agree with Raman

<Zakim> ht2, you wanted to underline the vulnerability of xslt in the browser

<DKA> +1 to the idea of a dispassionate postmortem.

<jar_> HT: we risk losing all the advantages of XSLT

<raman> JAR's suggestion of learning from the mistakes of the past is a good one, and I was hoping that would happen by constituting the task force to have the right people in it. Writing the wikipedia history article however should not be the task force's job

<masinter> -1 I think *doing* a retrospective view as a document is not itself a good charter goal

s/advantages of XSLT/advantages of client-side XSLT in the browser/

LM: I think the post-mortem is not a good charter goal
... That's already been done, see e.g. the Tim Bray/Ian Hickson exchange [ref?]

<masinter> http://www.tbray.org/ongoing/When/201x/2010/02/15/HTML5

LM: I'd like to focus on getting back to the requirements for re-unification

[tbl reviews the whiteboard -- photograph to come]

<masinter> signatures, EXI....

<jar_> Maybe I haven't read the right things. For me I haven't seen an explanation of what was deeply different in this situation that caused the classical process (professional organization, working groups, etc.) to fail. If we don't understand this we'll just repeat the past.

TVR: Lost the ability to look at a web page and derive an API for it
... a value of XForms which has been lost

<masinter> 'screen scraping' wins

<masinter> "extensibility" actually needs to be put into a context where extensions aren't mandatory for 'browsers'.... e.g., using (X)HTML in help systems which might have additional extensions, which aren't browser extensions. Building other applications that integrate HTML

YL: The other direction -- Error recovery is better defined for HTML5

NM: Suspended until 1305

<masinter> error handling should be defined for particular application types, rather than assumed that what's appropriate for 'browser' is also appropriate for other applications

<noah> zakim +aaaa has F2FRoom

<DKA> Scribe: Dan

<DKA> ScribeNick: DKA

Generic Fragment ID Processing

issue-39?

<trackbot> ISSUE-39 -- Meaning of URIs in RDF documents -- open

<trackbot> http://www.w3.org/2001/tag/group/track/issues/39

Noah: we had a discussion around generic processing of fragment identifiers in http bis.
... the specification depended on an interpretation of frag ids that differed...
... the TAG decided that on balance the least damaging thing to do would be to write a letter to the http working group asking for it to be removed from bis.

<noah> http://www.w3.org/2001/tag/2010/10/19-agenda.html#generic

Noah: links there to the proposal, etc...

Henry: In a superficial reading... insofar as there is any official definition, there is no official definition of "ID-ness".
... bad news - XPointer spec says barenames are resolved to IDs and defines IDs in 3 ways...
... 3 ways to find out what an element's ID is - one from the DTD, one if there is an XML Schema, and one if "you just know."
... somebody could write an XPointer processor that "just knows" that RDF IDs are IDs.
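
[Illustration, not from the meeting: a minimal Python sketch of barename resolution in which the caller says which attributes count as IDs. It covers only the "you just know" route. DTD-declared and schema-declared IDs are not modelled, since `xml.etree.ElementTree` does not expose them. The function name, the sample document and the example.org namespace are hypothetical.]

```python
import xml.etree.ElementTree as ET

# Attribute names in ElementTree's {namespace}local notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDF_ID = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID"


def resolve_barename(root, name, known_id_attrs=(XML_ID,)):
    """Return the first element whose ID-typed attribute equals name, else None.

    known_id_attrs is the set of attributes this processor "just knows"
    to be IDs; by default only xml:id is trusted.
    """
    for elem in root.iter():
        for attr in known_id_attrs:
            if elem.get(attr) == name:
                return elem
    return None


DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:ex="http://example.org/ns#">
  <ex:Person rdf:ID="alice"/>
  <ex:Note xml:id="n1"/>
</rdf:RDF>"""

root = ET.fromstring(DOC)
```

[With the default setting the fragment "alice" resolves to nothing. Passing `(XML_ID, RDF_ID)` makes the same fragment resolve to the ex:Person element, which is the behaviour the discussion describes as possible.]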

Norm: yes you could do that but it would be [not good.]

Henry: therefore we do not need to ask for a change in bis.

TV: Should we write this down as guidance to others who might think there is a problem?

<Zakim> masinter`, you wanted to ask if there's a better definition of "fragment identifiers"

<jar_> (I think Norm said something closer to 'not very smart' than to 'not good'.)

larry: I've been thinking about fragments and what is a fragment, especially in web applications. Documents vs. applications. Is a document a kind of application where the application is "show me this document"?

tim: things which are not declarative are damaging - but if you determine state as "browser with something showing" then this interpretation of fragments makes sense.

<ht> s/In a superficial reading/Thanks to a prod from Jonathan Rees, investigation of the specs uncovered that on a superficial reading/

larry: I like "speech acts" - where semantics is action - I communicate my state to you through speech. This speech act causes changes in state.

<noah> As chair, I'll say that I can't tell if this is a good use of time or a rathole. Guidance would be welcome.

tim: [disagrees]

<Zakim> Norm, you wanted to say that I don't mean that by fragids

norm: for the applications I have in mind I consider fragids a way to reach into the document and get something out - to transclude it in an xproc pipeline, to count the number of characters, - not necessarily to display or change anything.
... i think this is contradictory to what larry said.

larry: I don't think so - if you want to communicate a route from one mapping application to another it could be via a fragment identifier.

<Zakim> noah, you wanted to say that we may need health warnings on registration of new +xml media types

norm: I think of it as a stake in the ground that I can navigate to.

tim: it's important not to generalize too much about frag ids. It's valuable for RDF to use frag ids in a certain way - languages in the future might choose to use them in a different way.

<Norm> +1

noah: I don't recall a health warning in the draft that those who register new media types need to be careful (about frag IDs).
... it seems to me that it might be worth encouraging the editors to say "in the cases where the frag IDs supported by your media type overlap syntactically with those provided by the generic then... [something]"

tim: how do you grandfather conflict in RDF?

Henry: there is no conflict with RDF.

JAR: Do the specs say that the frag ID, when it's defined by XML ID, has a certain meaning, or do the specs say "when you see a frag id then it's defined by xml id"? There is a difference.

<jar_> http://www.w3.org/2006/02/son-of-3023/draft-murata-kohn-lilley-xml-04.html section 5

[disagreement on whether this conflicts with RDF]

JAR: It conflicts with RDF.

Henry: if your arbitrary foo+xml defines frag ids whose syntax overlaps with xml id then there's a conflict.

<jar_> "Conformant applications MUST interpret such fragment identifiers as designating that part of the retrieved representation specified by [XPointerFramework]"

tim: either you want that behavior or you don't want that...

henry: I want to know what "conformant applications" means - generic?

<jar_> "such fragment identifiers" means what?

<ht> and "conformant application" means what?

Noah: The way to distinguish generic - did you as author of the code follow a path informed by the rdf media type registration? [on whether tabulator is generic or specific]

<jar_> ... for RDF, the application CAN'T interpret it that way, because the fragid isn't an xml id...

tim: no there is no code that looks up how to process things...

tim: any attempt to look at that RDF document with xpointer processing is broken...
... a test case would be a document that has both kinds of IDs...

jar: that would be a bad file - it would cause the id to be interpreted in different ways depending on which spec you are following...

noah: simple case: somebody decides : we like to use XML ids to manage our XML. Some of it happens to be RDF. The people who understand RDF....

[heated discussion]

jar: [writes example on whiteboard]

rdf:ID="a" -> Person ; xml:id="b"

jar: doesn't occur in nature because people don't put xml:ids in RDF documents.

noah: Neither is there a spec ruling that out.

<noah> The example rdf:ID="a" -> person, xml:id="b" -> element is NOT a problem

<noah> The example rdf:ID="a" -> person, xml:id="a" -> element IS a problem

<Norm> I'm not sure it's *possible* to define things such that conflicts are impossible, and in practice I don't think they ever happen, so...I'm not sure if I feel like I should worry.

<noah> That's what Jonathan put on the board

tim: fundamental principle: the semantics of a URI is context-free.

<noah> (Jonathan confirms I got that right).
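
[Illustration of the whiteboard case: a minimal Python sketch, assuming a hypothetical RDF/XML document in which the same name "a" is used both as an rdf:ID and as an xml:id. The two resolver functions and the ex: vocabulary are invented for this example; they stand in for an RDF-aware reading and a generic XML reading of the fragment, not for any real processor.]

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical document: "a" appears as an rdf:ID and as an xml:id.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:ex="http://example.org/terms#">
  <ex:Person rdf:ID="a"/>
  <ex:Note xml:id="a"/>
</rdf:RDF>"""


def resolve_generic(root, frag):
    """Generic XML reading: a bare name designates the element with that xml:id."""
    for el in root.iter():
        if el.get("{%s}id" % XML_NS) == frag:
            return ("element", el.tag)
    return None


def resolve_rdf(root, frag):
    """RDF reading: a bare name designates the resource described with that rdf:ID."""
    for el in root.iter():
        if el.get("{%s}ID" % RDF_NS) == frag:
            return ("resource", el.tag)
    return None


root = ET.fromstring(DOC)
generic = resolve_generic(root, "a")   # the ex:Note element
rdf = resolve_rdf(root, "a")           # the ex:Person resource
print(generic, rdf, generic == rdf)
```

[With xml:id="b" instead of xml:id="a", the generic reading of #a finds nothing and the two readings no longer disagree, which matches the "NOT a problem" case.]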

henry: you know that that's false - you can't tell what the semantics are until you know the media type of what the browser sends you.

tim: no.

jar: each representation delivers constraints...

tim: if I say foo#a is interesting...

henry: you can't say if that's coherent or not until you retrieve that URI...
... [restating] what it means to be a generic processor is: you're only interested in constraints that come from a certain class of media types. "I am going to view these documents through a set of lenses consistent with only the 3023bis semantics..."

<jar_> a generic processor is only getting a subset of the whole theory of the fragid. that's fine

noah: on one hand you're talking about a generic piece of code - written as a generic processor - then a second piece of code written as an RDF processor sees the same code and interprets it differently. [Is that OK?]
... if this exists in principle - I was worried about what I just described... that's why I think a health warning needs to be in [3023bis].

<noah> PROPOSED (by Noah) Registrations for media types of the form application/XXX+xml SHOULD NOT define semantics for fragments that would resolve in a manner that is inconsistent with the generic rules

henry: we're saying - documents like this shouldn't be written...

tim: which document - I've got two but I'm going to send you one of them based on my interpretation of your request...

henry: If it's an rdf processor and you give it the second document [on the board] it will have nothing to show you....

noah: 3023 does not currently give it that semantic...

<noahm> Is it clear that we're to make mistakes?

yves: how to process the fragments, based only on the mime type you get back. The purpose of 3023 bis is to say - if you don't know the semantic of the mimetype ...
... it's not a big issue. If you don't know RDF then you don't know RDF...

tim: it's an issue because if you get something else displayed...

yves: it's like displaying an image as text - you get some random bytes...

<jar_> tabulator can translate xml:id="a" into a set of RDF statements (and would be 'authorized' to do so by 3023bis)

<Yves> my point is that using the defaulted behaviour, you know that you don't know the meaning of the fragment

<ht> only if that processor was doing reflection on a bit of XML as an object in the RDF, I think

martin: I could imagine a piece of software that says "rdf / xml - ok I can do something with this as XML because I know XML and I know how to do something with RDF" - so then I see #a so in one case I find an RDF ID so I go to the rdf processor or I could find an XML ID and go into XML processing... There could be inconsistent [behaviour]. I think a health warning could be a good thing.

noah: [presents a proposal]

Henry: [disagrees with noah's proposal]

<noah> Which was:

<noah> I clarified that "consistent" means, roughly: "if it resolves in a given document per the generic rules, then it resolves to the same thing per the specific media-type registration" (note that it's OK for it to resolve per the specific rules and fail to resolve per the generic)
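
[Illustration of the clarified rule: a minimal Python sketch of "consistent" as stated above, assuming that a resolution result is any comparable value and that None stands for "does not resolve". The function name is hypothetical.]

```python
def consistent(generic_result, specific_result):
    """Return True if the media-type-specific resolution is consistent
    with the generic one. None means "does not resolve"."""
    if generic_result is None:
        # The generic rules fail to resolve: the specific rules are free
        # to resolve the fragment to anything (or to nothing).
        return True
    # The generic rules resolve: the specific rules must resolve to the same thing.
    return specific_result == generic_result


print(consistent(None, "person"))        # specific-only resolution is fine
print(consistent("element", "element"))  # both agree
print(consistent("element", "person"))   # the problem case
```

[On this literal reading, a registration whose rules fail to resolve a fragment that the generic rules do resolve also counts as inconsistent.]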

Henry: 6 or 8 years ago we were talking about scuds - schema components - how do you point to schema components? People said the obvious thing would be to use a #name - that's a problem. So we put XML ids in the schema document - and you can use #foo to refer to them. It would work but it would be messy and confusing.
... the fact that we were labelling an element but naming an underlying component seemed sensible...
... [similarly] I don't have a problem with inconsistency across levels.

<jar_> ht: The fragid names one thing, but identifies another. No one was bothered by the pun.

Noah: I'm talking about a health warning for the range of potential future media types that people might register.

<Norm> I think I agree. Inconsistency across levels doesn't bother me. I think.

Henry: I'm not happy with [Noah's proposal] because I don't want to rule out the "pun."

<ht> scribenick: ht

<Ashok> HT: I withdraw ... not convincing people

NM: Objections to my proposal?

LM: Yes

HST: Not happy unless 'inconsistent' can be spelled out

<noah> I tried with:

<noah> That help at all, Henry?

JAR: I guess we could try to converge on allowing the pun, but I think that would be inconsistent with past pronouncements
... I would prefer something with a MUST
... and I don't think any grandfathering would be needed

<timbl> logger, popinter?

<noah> I think SHOULD is appropriate given the precedent of conneg; that's another case in which the same fragid can resolve "inconsistently". that's a SHOULD NOT.

JAR: I want to say something that rules out overriding XPointer

<noah> So, if it's a SHOULD NOT there, why be stronger here?

<DKA> Scribe: Dan

<DKA> ScribeNick: DKA

<ht> JAR: "If XPointer defines the fragid to designate something successfully, it cannot be overridden"

<ht> TBL: I will disagree with anything that isn't crisp

<noah> I can certainly live with MUST NOT if you can get a formulation everyone likes.

jar: xpointer does define the frag IDs...

<Norm> I'm with Larry, this boils down to policy at the spec level and MUST seems too restrictive

<noah> Can we wordsmith specific text that might garner consensus?

<noah> Henry, please type what people were agreeing with..

jar: yes I agree with [what henry wrote]

<noah> My IRC missed it.

<ht> JAR: xml:id means an element, no problem -- that's consistent with RDF

<Zakim> ht, you wanted to ask Noah what he thinks of the SCD example

henry: if there is an anchor that xpointer can identify then that's what it means...

<ht> scribenick: DKA

<noah> The chair is trying to get to the queue...with limited success.

jar: if you want to be faithful in RDF then you need to get all of the alternative representations...

henry: the clearest example: in an xml literal you can use xml:id...

norm: As long as the solution does not change 3023bis in ways that I previously said I objected to then I could live with almost anything...

<jar_> +1 Henry's wording above "If XPointer defined the fragid to designate ..." (the only other options are punning and grandfathering rdfxml)

<Norm> +1 to Larry on that point.

larry: I'm skeptical around must requirements in documents that establish processes... Mime type registrations are not mandatory... we have a lot of unregistered mime types. The more barriers you add, the more impediments you put to registering things in the first place...

noah: the MUST is against registering a media type...

tim: must is used in a protocol definition - if you obey the musts then you obey the protocol and you get a specific effect.

<timbl> Larry, MUST is used in the sense of "if you do the things in MUST then you get the benefits of the protocol."

larry: let's avoid the imperative text and just say what the consequences are...

<Zakim> masinter`, you wanted to note that registration isn't mandatory, and more constraints on registration the more likely it is people just won't bother registering

<Zakim> duerst, you wanted to say (to Tim) that there is no requirement on RDF processors to interpret any and all graph information in a rdf+xml document. As an example, an RDF processor

martin: answering to Tim - he's worried that every RDF processor would need to do xpointer processing [if following Jonathan's proposal]. No...

tim: when the publisher of the document has asserted a set of triples - the intent of the document is the triples and only the triples.

martin: if someone put in an XML ID...

<noah> I think the "consequence" is: "if you register a media type that provides for resolutions that are not consistent (in the sense above), then generic processors WILL resolve identifiers per the XML rules"

tim: the XML ID has no significance...

martin: if you open it in emacs xml mode...

tim: then you are violating the architecture of the Web
... it's clear that when someone clicks on a link...

noah: with the plus syntax in 3023bis - it is called out specially...

tim: the +xml is to say to processors e.g. "you might run this through SAX..."

<Zakim> noah, you wanted to talk about should vs. must

<timbl> I challenge people to find any application which actually looks at the "+xml" in the media type -- so we can see what it does.

<Zakim> jar_, you wanted to answer Larry re "must" ... this is easy to fix... just say that certain fragids are defined according to xpointer

noah: I defended SHOULD - I find the fact that con-neg already produces inconsistent results ...

<noah> NM: What I said was, I think SHOULD would be consistent with the precedent of conneg, but I can live with either SHOULD or MUST if either generates consensus.

jar: it [3023] is already using MUST language - you can just make a statement that the frag IDs in this document are defined in the following way...

<noah> That's a MUST about what conforming processors do; now we're considering a SHOULD vs. MUST on media type registrations.

jar: it sounds like we really are talking about three different proposals. tim was suggesting that rdf+xml should be grandfathered...
... that [should] satisfy norm.
... henry's proposal won't break anything either.
... we have 3 different positions...

larry: I have a proposal.

<Zakim> duerst, you wanted to say that the TAG should wrap up and make sure work on 3023 starts again (currently, the official draft,

<Zakim> ht, you wanted to ask about the XSLT case

henry: on what Tim said - none of the options we discussed mean you need to change tabulator... suppose xslt stylesheets were served with a +xml media type - the obligation to interpret frag ids generically applies to xslt processing. Does this mean that every xslt processor writer needs to implement xpointer? No, unless they want to interpret xml ids...

tim: an xslt processor runs xslt - if I have foo.xslt and I serve it as +xml - and I make a link to foo.xslt#bar and the xslt spec says nothing about semantics - your browser is obliged to show you #bar...

henry: it's not wrong for it to show me the rdf xml if there wasn't a hash...
... so why shouldn't it show you the piece of the document marked by the frag id if there is a hash?

tim: it should say "invalid xpointer"...

<noah> I think it's (Xpointer doesn't resolve, it's not syntactically invalid)

<Zakim> masinter`, you wanted to propose my alternative, "just describe the consequences"

[discussion goes on with xml IDs in RDF, XML documents]

larry: this is my proposal.

noah: [calls for all proposals]

<ht> JAR said "If XPointer defines the fragid to designate something successfully, it cannot be overridden"

<noah> 1 = Larry's) NOTE: Some generic XML applications may treat documents labeled as application/XXX+xml using generic processing of fragment identifiers; this will result in inconsistent handling of fragments with those that have specific identification.

<jar_> If XPointer defines the fragid to be something (as opposed to error), that's what the fragid designates. Other fragids can be defined by +xml registrations.

<timbl> ^2

<timbl> 2 suffers from the problem that all RDF processors have to have a xpointer stage added

<Yves> well generic processing use only a fixed subset of possible xpointer schemes

<noah> 3 Noah = Registrations for media types of the form application/XXX+xml SHOULD NOT define semantics for any fragment that would cause it to resolve, in a particular document, to something other than the result of the generic processing.

<noah> If that becomes a MUST, rdf+xml must be grandfathered.

<jar_> inconsistency in the specs, larry.

<duerst> maybe it's the java convention?

<timbl> When the mime type is *+xml, then the semantics of fragment identifiers are defined by the xpointer specification, except for application/rdf+xml where they are defined by the RDF specs.

<jar_> 4. (jar's reading of tim's mind) rdf+xml is exempt from the terms of 3023. +xml only has the 3023 meaning if it is not rdf+xml. ("grandfathering")
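
[Illustration of the "grandfathering" option: a minimal Python sketch of a dispatcher that picks fragment semantics from the media type alone, with application/rdf+xml exempted as in Tim's wording and option 4. The function name and the returned labels are hypothetical; this shows the shape of the rule, not any deployed behaviour.]

```python
def fragment_semantics(media_type):
    """Choose which specification defines fragment identifiers for a media type.

    Assumed rule: *+xml types use XPointer (generic) semantics,
    except application/rdf+xml, which keeps the RDF-defined semantics.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = media_type.split(";")[0].strip().lower()
    if base == "application/rdf+xml":
        return "rdf"                  # grandfathered exception
    if base.endswith("+xml"):
        return "xpointer"             # generic XML processing
    return "media-type-specific"      # outside the scope of the +xml rule


for mt in ("application/rdf+xml", "image/svg+xml; charset=utf-8", "text/html"):
    print(mt, "->", fragment_semantics(mt))
```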

<timbl> logger, pointer?

<timbl> Tracker, can we make a vote?

noah: this is not a binding tag vote - it is a fact-finding exercise. Please type into IRC which you like best, and whether there are any you can't live with...

<duerst> any is okay, I'd prefer 2, but not strongly; I disagree with Tim that RDF processors would need additional implementation work.

<jar_> I'm OK with 1, 2, 4. 3 is a bit soft. some preference for 1.

<ht> I like 2 best. I can live with 1, 3 (with s/resolve/identify/), 4 and 4'

<noah> Like: 3 and 2, probably in that order. OK with: Timbls "When the mime type is *+xml, then..." Not happy with 1 or <jar_> 4

<timbl> live with 1, can't live with 2, No to 3, Like 4

<Ashok> I can live with 1

<Yves> I like 1 best. Can live with all the others.

<jar_> s/some preference for 1/some preference for 2/

1, 3, 4 are OK.

(in order)

<jar_> (thinking aloud) 4 is tidy compared to 1, it says no more weird uses of fragids in other +xml types... might be cleaner guidance

<jar_> s/to 1/to 2/

noah: we need someone to take an action to write a note "the TAG has reconsidered its previous action - a. we understand the need for generic XML processing so we are happy to have the text not removed b. we are [worried] about registration of future media types and recommend a warning be written along these lines "..."
... only other alternative - if nobody drafts it - I will take an action to say that the TAG removes its concerns ...

action rees to draft a short note to 3023bis editors reflecting the discussion / consensus...

<trackbot> Created ACTION-476 - Draft a short note to 3023bis editors reflecting the discussion / consensus... [on Jonathan Rees - due 2010-10-26].

martin: somebody should tell somebody to fix that rdf xml dtd that got people confused...

jar: we could put a comment at the top...
... I could write a sentence and send it to Yves...

IETF Draft on MIME and the Web

http://tools.ietf.org/id/draft-masinter-mime-web-info-00.html

Larry: I got some feedback that it wasn't about mime - or just about mime. It's not about all of mime.
... I've tried to give some context of web architecture - thinking back to how MIME was added to email.
... there were lots of document formats. How is it when you communicate from one to the other you tell someone what language it is.
... when you speak in natural language, you "sniff" - you sniff that I am speaking english. But computer languages you usually designate.
... so MIME was designed as a labelling system - for email.
... originally, HTTP didn't have content types - it only had html. Unless you wanted to send plain text.
... a popular system at the time was Gopher - it has code fields...
... single character mime type...
... and we had discussion about shouldn't we use the same labelling for file types.

<abarth> hi

Larry: I'll make a pass thru the document and go back and talk about what needs to happen - I'd rather this be a TAG document than a personal one.
... section 2- history - about how mime added to the web the notion that it didn't predefine what kind of documents you could have... allowed adding image types, etc...
... original "distributed extensibility" - the file type was independent of the URL was independent of the protocol.
... same as for email.
... some problems - ways in which email delivery isn't the same as web delivery. In the Web there is a request and the response is interpreted in the context of the request.
... section 3.3 deserves some expansion...
... section 4 - other notes - [e.g. charsets]
... 4.3 polyglot documents - a single content type which can be delivered as more than one label. The same content delivered in different labels could mean the same thing but could also mean different things depending on the labels...

abarth: Another word for polyglot that gets used a lot is "chameleon" .

larry: [moving on] what is the purpose of the MIME registry? [to enable interoperability around well-known media types] It's the out of band way in which messages are self-describing.
... but - languages evolve. I would like to distill the [previous] versioning discussion into something in this document...
... the registry points to a document, a specification, but also about what is being spoken "on the street."
... [lack of version numbering an issue]
... content negotiation and the use of mime types... web architecture says how frag IDs are supposed to be interpreted but MIME type registration doesn't say anything about fragment identifiers...
... still some problems with how we're executing on MIME and belonging in this discussion is [information on] sniffing.
... section 6 is to lay out some specific recommendations - lay out the requirements for a charter for a working group to actually make the changes to the mime registry. It's a matter of staging so we agree on the problem space.

noah: it's not clear to me how normatively media types registrations apply to file systems unless the file system chooses to adopt it.

larry: in the web, people still pass around ftp: urls - I'm trying to lay out some things that need to be in the registry....
... [back to the document] ...
... 6.2 - sniffing -
... first part of the document lays out the history and some of the problems; then goes to recommendations...

tv: I would like to see - all the rules we would like to see for MIME on the web should be applied consistently to mail attachments. It's important for example for when you are displaying mail in a web browser.

larry: part of the reason for pursuing this as an IETF document is to give it a position where the mail client implementers are participating in the discussion as well.

adam: imagine someone invented a new image format called webP and they wanted to use this on the web - what would the lifecycle look like?

larry: you register the type, you deploy viewers, ....

adam: with regard to sniffing, do you think sniffers should sniff for the new mimetype or should we forbid sniffing of this new image type?
... trying to see how things will evolve beyond our messy world.

larry: in a managed world, you don't guess that things are mislabeled unless in cases where they are frequently mislabeled.

tim: should we put energy into making the Web work better compared to http?

larry: it should also apply to a USB stick with web files on it...

tim: the http system does have defined types for files - as opposed to other systems like the filesystem - should we spend our energy trying to encourage people to use http or how to use sniffing when everything fails?

larry: don't see it as an either-or.
... we need to be clear about the current state of play and what needs to get fixed, but it's impossible to imagine a world where you never need to sniff.

noah: you're talking about what media type registration needs to say but FTP never refers to the media type...
... there are jpegs delivered through http with an explicit media type...
... if FTP doesn't use media types then it shouldn't talk about the media type registration

[imagine a spherical cow]

noah: I imagine a world - a file format specification for jpeg - does not refer to the media type registration. Most file systems will refer to that - not to the media type registration.
... I imagine a media type as a second document. In your messy situation there are certain user agents who say "i cheat."

larry: the registry is more than just a label.

noah: it [partially] duplicates the file format spec...

larry: the constituencies that need to know this information:
... there are tool chains...
... the middleware of the web delivery chain - people who run web servers, people who run photo sites, virus scanners, content distribution sites...
... as consumers of file types, I want to put the information that the middleware people might need to know ...

adam: the image resizer, when the request is flying by, needs to know it's jpeg...
... the jpeg case - you have a browser that's going to take jpegs and pngs ... suppose there is a response labelled as txt and the image resizer doesn't resize it - and it arrives at the browser not resized...

noah: there are parts of the architecture where it's a clean approach. We need to proceed carefully. The system will be more robust - be more "follow your nose" [if it follows the rules...]

larry: this document - does not propose any changes to how implementations work today. the goal is to move the place in where what we are implementing today is described in a way in which the changes we are making as the web evolves are [easier / more sensible]

adam: [is the way you guess going to change in the future? - paraphrased by henry]

<jar_> larry: I think there will always be workflows in which you'll have to guess the type

larry: there will be new filetypes.

<ht> but will there be new filetypes which need to be the result of sniffing

larry: I certainly would like to avoid having content that is labelled with syntactically correct labels being interpreted as something else without strong evidence that mislabelling is [prevalent].
... if receivers refuse to sniff then senders will not mislabel.
... it has been the case that if you get market leaders to not sniff new types then people will test their content.

adam: this is the prisoner's dilemma - think about video example.

larry: I think this dilemma belongs in section 3. Establishing what the problem is -
... what is the mechanism by which sniffing can be avoided?

adam: think about ie9 - they are making progress...

larry: precedent - I remember when we tried to introduce the host header into http - it was going to be required to send the host header, nobody wanted to do it, but apache got configured to pop up an error when you didn't send a host header, and within 4 months the browsers all changed...
... the people who wanted the host header were the big hosting companies... there was a financial incentive [to deploying this version of apache].

#webhistory

larry: if we can [do something similar we can encourage people] to fix their implementations.

adam: I tried to figure this out from a game theory perspective...

larry: some people - e.g. the firewall vendors - are going to sniff.

adam: they are going to sniff in the same way the browsers are going to sniff...

larry: every piece of content is possibly a level upgrade ...

adam: there is the file in linux with all the magic numbers for file types... We looked at how is this different from the browsers and could construct attacks based on this...

noah: you could imagine a firewall that silently blocks stuff - there can be false positives on sniffing... It's good to realize that.

adam: sniffing could be perfectly predictable - if everyone agrees on what algorithm to use.

tim: it could be predictable but it's not always going to be right.
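
[Illustration of signature-based sniffing: a minimal Python sketch, assuming a small hand-written table of well-known leading byte signatures (JPEG, PNG, GIF). The table, function names, and result labels are hypothetical and much smaller than what browsers or the Linux magic file use.]

```python
# (leading bytes, media type) pairs; checked in order.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff(data):
    """Return the media type whose signature prefixes the data, or None."""
    for magic, media_type in SIGNATURES:
        if data.startswith(magic):
            return media_type
    return None


def check_label(declared, data):
    """Compare a declared Content-Type with what the bytes look like."""
    sniffed = sniff(data)
    if sniffed is None:
        return "unknown"
    return "agrees" if sniffed == declared else "conflict"


print(check_label("image/png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(check_label("text/plain", b"\xff\xd8\xff\xe0"))
# Deterministic but not necessarily right: a plain-text note that
# happens to start with "GIF89a" is still sniffed as an image.
print(sniff(b"GIF89a is how this text note begins"))
```

[The last line shows the point made above: every implementation running the same table gives the same answer, and that answer can still be wrong about the sender's intent.]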

larry: there is no logical path to come from where we are to where everyone agrees on sniffing.
... the email vendors aren't following along - the firewall vendors aren't following along.

noah: you're also punching a hole in the space of data you can deal with...

<noah> http://webarch.noahdemo.com/Metadata/

noah: [points to example]
... imagine this is a bug database - and they put it there to say "this is not well-formed xml" - can't serve this as text/plain

larry: I'd like to cut off discussion on whether or not sniffing is good.
... the goal here is not to endorse practices but to acknowledge them, show a path forward, and show how it can evolve.

ashok: I heard - in this document add a standard sniffing algorithm...

larry: I want to have enough information in the registry so that the sniffing algorithm could point to it.

adam: Today, I am convinced that new content types will also end up with sniffing... [so we will need it...]

noah: why do the webP people want it?

adam: webP want to replace jpeg - they say "jpeg gets to sniff, why not us?"

noah: my ISP says that if I put a jpg file they will send it as jpeg, but if I put a new thing they will send it as octet stream or something...

larry: we need to tell this story.

noah: it would be nice if my ISP could get at the registry - thus closing the loop-hole.

larry: [back to the document] one of the things we haven't touched on - practices in the community we want to ask people to stop...

adam: you're proposing that the signatures be part of the MIME registry?

larry: some of them are in the mime registry already...

adam: if you have a separate registry - you might end up with fewer signatures and therefore less sniffing.

larry: I want them all to be filled out - I want the servers to do the sniffing so that the channel to the client is more reliable. If there is unlabelled content then you should do the sniffing closer to the source.

adam: the folks who maintain the mime registry - do they understand the issues?

larry: they are us
... we have to establish the procedures.

adam: the one benefit of putting it in a standards track document is that these documents go through a review process...

henry: after we launched the xpointer scheme registry - the only review was a mailing list - that generated real reviews...

noah: does that feel good enough for infrastructure of this criticality?

henry: the media type registration list does get watched.

adam: i think it's a generally reasonable direction.

larry: I want to identify clearly what the problems are.

adam: i am worried about the incentives of various parties.

<noah> http://lists.w3.org/Archives/Public/public-html/2010Jun/0394.html

IRIEverywhere-27

adam: at what level of detail should you describe - if you have a hyperlink in an html document how do you resolve it - the IRI spec is not accurate to what browsers do - what should we do?

larry: there is a proposed spec that is more accurate. we are trying to fix it.
... martin is not completely happy with all the changes...
... roy was skeptical about whether it was possible to capture all the ways applications can process strings and turn them into IRIs
... document that is the subject of the wg - 2 documents - update to the IRI spec, has an appendix (7.2) the preprocessing steps to take a web address and turn it into an IRI...

noah: adam has identified some problems, larry said we're making it better... is there a way forward?

adam: there are several communities ... using URIs and IRIs ... if you have to write one document that pleases every community ... the plan is to write two documents ... we will try to reconcile these documents if there are problems. My document will be written from scratch.

larry: working group chairs are interested in having you [adam] write a draft that they could incorporate as a working group work item...

[discussion on working with IETF]

<noah> HT: The history of the LEIRI note was that someone working on an XML specification found themselves yet again preparing to copy the same transformation rules, because there was no referenceable common text to which one could refer.

<noah> HT: The strings start out between quotes in XML text documents, and need to wind up as RequestURIs in HTTP.

<noah> HT: We thought, for awhile, that it would be in 3987bis, but that stalled for other reasons.

<noah> HT: We then got agreement from Martin and (Michel?) to include it in IRIbis? so that we could refer to it. I'd be very sorry to lose that.

<noah> AB: The section in question is an order of magnitude too small.

<noah> HT: I want to keep our concern "available". We still want that common reference point.

<noah> AB: Not 100% familiar with constraints. Not sure whether they're the same as HTML?

<noah> HT: I haven't looked at 7.1 recently.

<noah> LM: Let me make sure it's really 7.1 we care about.

<noah> AB: Consider an example of the query string. There's some character not representable in the character set. Some choices for how to represent it. In HTML we...

<noah> HT: I know the story.

<noah> AB: It's % escaped...

<noah> HT: Using the document encoding

<noah> HT: Pretty sure XML wants UTF-8, would have to check.

<noah> HT: Fairly sure it's independent of the character encoding of the XML document.
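
[Illustration of the escaping difference: a minimal Python sketch using the standard library's urllib.parse.quote, assuming a query character "é" and two candidate encodings. It only shows that the percent-escaped bytes depend on which encoding is applied first; it does not model what any particular HTML or XML specification requires.]

```python
from urllib.parse import quote

query_char = "\u00e9"  # "é", chosen as an example non-ASCII character

# Escaping via UTF-8, independent of the document's own encoding.
utf8_escaped = quote(query_char, encoding="utf-8")

# Escaping via the document encoding, here assumed to be ISO-8859-1.
latin1_escaped = quote(query_char, encoding="iso-8859-1")

print(utf8_escaped)    # %C3%A9
print(latin1_escaped)  # %E9
```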

<noah> AB: HTML browsers are simple.

<masinter> http://tools.ietf.org/html/draft-ietf-iri-3987bis-01#section-7.1

<noah> AB: Example http:///example.com (noting the intentional triple /)

<noah> AB: Some specifications give parses like // being the hostname and /example.com/ being the path.
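
[Illustration of the triple-slash case: a minimal Python sketch using the standard library's urllib.parse.urlsplit, which follows the generic URI syntax. It is shown only as one example of a parse with an empty authority and "/example.com" as the path; it is not a model of browser behaviour.]

```python
from urllib.parse import urlsplit

parts = urlsplit("http:///example.com")

print(repr(parts.scheme))   # 'http'
print(repr(parts.netloc))   # '' (empty authority between the second and third slash)
print(repr(parts.path))     # '/example.com'
print(parts.hostname)       # None: no host was found
```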

<noah> LM: File URIs...

<noah> AB: Don't want to talk about them

<noah> LM: But we need a common parsing rule, independent of scheme.

<noah> AB: There are 4 sets of parsing rules, depending on scheme

- DRAFT -

Technical Architecture Group Teleconference

19 Oct 2010

Attendees