10:10:23 RRSAgent has joined #urlstandard
10:10:23 logging to http://www.w3.org/2012/10/31-urlstandard-irc
10:10:31 Zakim has joined #urlstandard
10:10:53 Topic: URLs
10:11:00 I’m interested to know if I’m gonna need to rewrite the urlparse python module.
10:11:10 annevk has changed the topic to: http://url.spec.whatwg.org/
10:11:33 RRSAgent: make minutes public
10:11:33 I'm logging. I don't understand 'make minutes public', annevk. Try /msg RRSAgent help
10:11:40 RRSAgent, make logs public
10:12:16 URL: http://url.spec.whatwg.org/
10:12:35 timbl has joined #urlstandard
10:13:13 jeff_ has joined #urlstandard
10:13:49 tantek has joined #URLstandard
10:15:36 mchampion__ has joined #urlstandard
10:16:38 If anyone knows where Larry Masinter is please invite him to this session.
10:16:55 plinss has joined #urlstandard
10:18:51 [missed some discussion at the beginning. Starting logs now.]
10:19:11 annevk: Some characters, such as square brackets, are not percent escaped when sent over the wire.
10:19:27 annevk: It's better to be more conservative in what's escaped for compatibility
10:20:00 annevk: we're limited in what we escape. If we want to match STD 66 model, we have to escape more. That's not happening.
10:20:29 timbl: you want to have a standard that you can round trip.
10:20:48 annevk: with HTML, you try to enforce Postel's law. Strict in what you produce, liberal in what you accept.
10:20:58 … the syntax is strict, the parser is liberal
10:21:09 … The parser only stops at fatal errors.
10:21:27 timbl: I don't want URLs written according to the existing standard being interpreted differently.
10:22:13 annevk: I think that is largely not the case. I'm not sure how the current spec deals with relative URLs.
10:22:29 … the way it's implemented in browsers is that there's a certain class of relative URLs. It depends on the scheme.
10:23:01 paulc: When you say the way it's implemented in browsers, is the situation you're dealing with that the browsers are uniformly doing the same thing? Or large proportion of them?
10:23:17 annevk: all browsers require to not conform to that data model, but there is wiggle room in what they can do.
10:23:33 … at opera we found that being conservative is better for compat.
10:23:51 timbl: maybe you should do an overview
10:24:17 dbaron has joined #urlstandard
10:24:25 dirk: You say that URL will be based on URI.
10:24:43 annevk: if we can compare it to the RFC, the URL syntax matches IRI.
10:25:02 … the syntax is broad. It escapes characters over U+7F
10:25:02 s/dirk/krit
10:25:50 annevk: fragments are escaped in the parsing step.
10:26:23 … e.g. if you have an ID with a euro character, then the character in the URL will be percent encoded as a byte sequence.
10:27:26 annevk: you have to make it dependent on the document type.
10:28:02 larry: If the fragment contains non-normalised unicode, the matching might need to be a loose matching.
10:28:35 … it might be you have fragments that are written out not in percent encoding, but the matching algorithm does the reverse mapping
10:28:47 … Normally matching from a URI to IRI is a heuristic process.
10:28:58 annevk: it's part of the application layer.
10:29:10 … e.g. SVG takes the fragment, turns it into unicode.
10:29:23 larry: Does it turn the fragment into unicode or do a loose matching algorithm
10:29:28 annevk: HTML turns it into unicode
10:31:32 annevk: do you have some things in general to say about it?
10:31:35 JeniT has joined #urlstandard
10:31:39 larry: There are 4 specs. There should be 1.
10:31:48 tpacbot has joined #urlstandard
10:32:03 MikeSmith has joined #urlstandard
10:32:04 … The IETF documents were the documents of record. If they're no longer, they need to be closed off.
10:32:11 RRSAgent, make minutes
10:32:11 I have made the request to generate http://www.w3.org/2012/10/31-urlstandard-minutes.html MikeSmith
10:32:14 … what do you do with all of the other specs that refer to it?
10:32:26 … The IETF spec says more about comparison.
10:32:36 RRSAgent, make logs public
10:33:02 … The HTML document has something that was in progress, but needs to be replaced or something
10:33:31 paulc: When the URL work moved from HTML to WebApps, people thought it would be removed from the HTML spec. It wasn't.
10:33:44 larry: that will be easier to fix later.
10:34:12 annevk: the web apps document is obsolete.
10:34:33 MikeSmith: the editors draft refers to annevk's spec.
10:35:15 hsivonen has joined #URLstandard
10:35:19 MikeSmith: Julian's tests are great for what they are, but will need updating to state the expected result.
10:35:46 larry: The spec in the HTML spec has an advantage in its style in that it normatively references the IETF spec.
10:36:10 … If you look at the latest version of the IETF document, it was changed ...
10:37:22 http://tools.ietf.org/html/draft-ietf-iri-3987bis-13
10:38:11 larry: see section 7
10:38:28 … "Processing of URIs/IRIs/URLs by Web Browsers"
10:38:44 it points to annevk's document
10:38:44 SimonSapin has joined #urlstandard
10:38:57 s/it points/... it points/
10:39:34 larry: Any valid URI is a valid string that can be used in an HTML document if it's in UTF-8
10:39:42 … then the special processing of the query is the same.
10:39:51 annevk: I think that is true, but I have questions.
10:39:56 … If I write http:test
10:40:01 … that is valid URI?
10:40:28 timbl: It's not a valid HTTP uri, but syntactically correct.
10:40:30 paul has joined #urlstandard
10:40:38 annevk: it's considered a relative reference by browsers.
10:40:46 timbl: not in the standard.
10:40:57 JohnJansen has joined #urlstandard
10:41:12 larry: does it parse into HTTP scheme with test at the host or path?
10:41:34 annevk: the parser algorithm does both resolving against the base URL. If there's no base URL, you get HTTP scheme and test as the host.
10:41:42 … If there is, then it's resolved as a path.
10:41:58 timbl: I've got old code. It says if there's a colon, it's a scheme.
10:42:38 … If you're going to document this bizarre practice, then you're going to have something incompatible with a bunch of code, or a special section just for browsers.
10:43:17 hsivonen: I think this section on browsers is a distraction. There is other software that is not a browser, that I want to process content in a way compatible with browsers.
10:43:36 … e.g. the HTML validator needs to fetch URLs. It's not a browser, but it doesn't make sense to fetch something different.
10:43:53 … e.g. link checker or spider. I would want http:test to parse the same way as browsers.
10:44:11 … it's apps that want web compat vs. apps that don't
10:45:28 … In the HTML validator, I'm using a jenga (???) IRI library.
10:45:41 masinter has joined #urlstandard
10:45:56 … It has different behaviours for URIs, IRIs, etc.
10:45:59 the question is scope
10:46:25 scope: HTML, HTML on some OS, http: only, file:, other schemes, mailto:, etc.
10:46:28 … there's already 6 different configurations to make it behave in different ways. Not one is that I want it to be compatible with the web, and yet it's the best Java library available.
10:46:50 "how browsers do it" may not be the right scope
10:46:53 … for any Java based app that wants to spider the web, the most successful implementation would be compatible with browsers.
10:47:36 larry: many of the libraries are conditional about whether they're case sensitive depends on the system. e.g. for file URLs
10:47:50 annevk: file URLs are a bit hard. I'm working on them.
10:48:04 s/larry/masinter/
10:48:09 s/working on them/not working on them/ ?
10:48:38 "What browsers do" might only be "what some browsers do for http: URIs on some file systems"
10:48:39 SimonSapin: no, I'm trying to "fix" them
10:48:41 s/not//
10:48:44 SimonSapin: well, define
10:48:50 ok
10:49:26 I thought I heard that what some browsers do in some circumstances may depend....
10:49:38 [missed a bit]
10:50:42 hsivonen: The URL bar is UI. It needs to accept what's pasted into it, but can have different processing based on what users type.
10:51:15 is there any real-world content that depends on http:test being treated as a relative IRI in contexts where there is a base?
10:51:24 dbaron: I wouldn't expect any differences in those different places (HTTP, XML, etc.)
10:51:49 … Putting it in an HTML attribute may require some escaping. But I don't know of any software that's going to interpret URLs differently based on if it's in HTML or XML.
10:52:22 hsivonen: timbl's diagram represents what that java library does. To fix the bug, we should use the HTTP stuff for all of it.
10:52:37 s/HTTP/HTML/
10:53:06 annevk: You're confusing syntax and the model.
10:53:27 … HTML syntax escaping disappears once it's represented as a model.
10:54:18 annevk: the spec model is that you get a set of code points - a string - that goes into the parser.
10:54:55 … HTML handles entity references. The string passed to the URL parser excludes that HTML specific syntax.
10:54:57 test_resolve('bare host', 'http://example.org/', 'http:test', 'http://example.org/test');
10:55:10 … the DOM level passes it off to the URL level.
10:55:13 ScribeNick: Lachy
10:55:17 so http:test the test is treated as a path, not as a host name
10:55:25 in chrome on windows
10:55:32 paul has left #urlstandard
10:55:55 hsivonen: what happens is that the URL parser knows the original character encoding.
10:56:07 … the DOM has UTF-16 strings.
10:56:26 silvia1 has joined #urlstandard
10:56:35 … The leakage is that it remembers the original encoding that was parsed, and it is given to the URL parser.
10:56:52 dbaron: that extra parameter is used for less than it used to be, I think?
10:56:57 s/the editors draft refers to member:annevk's spec/we have a WebApps WG FPWD of URL but not a current editor's draft/
10:57:02 … I think there has been change over time more recently than I thought.
10:57:07 RRSAgent, make minutes
10:57:07 I have made the request to generate http://www.w3.org/2012/10/31-urlstandard-minutes.html MikeSmith
10:57:19 … There was a reduction in where IE changed incompatibly around IE7 or 8 and it reduced that leakage.
10:57:55 annevk: basic design: You have a string and a base URL and encoding override. Those go into the parser. The parser outputs an object representing the URL.
10:58:13 hsivonen: In addition to passing null for the encoding, you can also pass null for the base URL.
10:58:28 timbl: then there's a serialiser
10:58:39 annevk: the serialiser doesn't get the encoding override.
10:58:45 We have four specs for URL/IRI/URI either around or planned:
10:58:45
10:58:45 * IETF 3987bis
10:58:46 - draft http://tools.ietf.org/html/draft-ietf-iri-3987bis
10:58:48 - wg http://tools.ietf.org/wg/iri/charters, mailto:Public-iri@w3.org
10:58:51 - chairs Peter, Chris
10:58:55 - editors me, Martin
10:58:58 Reason: Group established for 3 years, IETF is "official" spec for URIs, etc, referenced by W3C HTML
10:59:00 … when you serialise it, you just output the string, you don't need to know the original encoding.
10:59:01
10:59:04 * W3C HTML WG
10:59:07 - Spec http://www.w3.org/TR/html5/urls.html#urls
10:59:11 - Wg http://www.w3.org/html/wg/ mailto:public-html@w3.org
10:59:14 - chairs Paul, Maciej, Sam
10:59:17 - editors team led by Robin Berjon
10:59:20 Reason: stable document, no plans for changing unless bug submitted
10:59:24
10:59:27 * W3C WebApps URL spec
10:59:27 - draft not done, currently pointing to Anne's spec, but plan is to create new spec by copying Anne's draft
10:59:30 - wg http://www.w3.org/2008/webapps/ group email public-webapps@w3.org
10:59:33 - Chairs Art, Charles
10:59:36 annevk: say we have windows-1251 as the encoding override.
10:59:37 Reason: now chartered by W3C to develop spec
10:59:41
10:59:44 * WHATWG spec
10:59:45 … this is our query: ?€
10:59:47 - draft http://url.spec.whatwg.org
10:59:51 - wg http://www.w3.org/community/whatwg/ email whatwg@whatwg.org
10:59:54 - editor Anne van Kesteren
10:59:54 Reason: WHATWG members defer to Anne
10:59:57
11:00:00 No group has plans to drop spec.
11:00:04
11:00:07 Other groups that have discussed this
11:00:11 - IETF/W3C liaisons (Mark, Thomas, Philippe) and IETF/W3C public-ietf-w3c@w3.org
11:00:14 - W3C Advisory Committee w3c-ac-forum@w3.org yesterday
11:00:15 … that goes into the parser and it comes up with: ?%80 in the parsed URL. The serialiser will output this.
11:00:17 - W3C TAG www-tag@w3.org various action items
11:00:21 - IETF discussion list ietf@ietf.org discussion spilled over
11:00:24 - URI mailing list uri@ietf.org also involved
11:00:27
11:00:27 Testing should help converge the specifications, so results of testing and plans for them should help (Chris, Simon, Kris)
11:00:30
11:01:33 annevk: eventually everyone should move to UTF-8 to avoid problems. The model we have is ugly.
11:02:29 masinter: we need to deprecate the idea that you can convert a URI with percent encoding into an IRI.
11:02:39 … it's a heuristic process.
11:02:55 … we have no idea whether it ever came from an IRI, so it shouldn't affect any definition of the syntax.
11:03:08 … There's just some funny things going on in the back ends of servers.
11:03:59 … it's a special case having non-ASCII characters in the query string.
11:04:10 annevk: it's very common in non-UTF-8 documents.
11:04:55 masinter: processing is in 2 steps. First parse it into components.
11:05:12 … then you translate as necessary.
11:05:17 annevk: no, you translate while parsing.
11:05:24 I think https://bugzilla.mozilla.org/show_bug.cgi?id=261929 is the bug documenting where Mozilla reduced the amount that the origin encoding was used for.
11:05:33 annevk: every browser implements one URL parser.
11:05:37 … except for WebKit
11:06:06 annevk: other applications try to converge with web browsers.
11:06:24 masinter: the host name, IDN, the parser translates them to Punycode?
11:06:26 annevk: yes
11:06:35 masinter: even in IE, Windows?
11:06:56 paulc: there's a standard Windows library parser in the OS.
11:07:07 timbl: 2 points of order
11:07:17 … 1 it's 12:06 and we've run out of time. 2. Lunch.
11:07:28 RRSAgent: make minutes
11:07:28 I have made the request to generate http://www.w3.org/2012/10/31-urlstandard-minutes.html Lachy
11:07:59 timbl: the other thing, when we started the W3C, we did HTML and HTTP URIs. We tried HTML in the IETF, didn't have the publishing people.
11:08:19 … but HTTP, we did in the IETF in order to get the people designing protocols to review it.
11:09:04 … If you're going to do this work, it's important to unify all this, a classic way to do that is to get all the people involved, who have written other URI parsers, who are going to the IETF.
11:09:19 … going to the trouble to go to their place is sensible.
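Two points from the discussion above can be checked with Python's standard urllib.parse module (the successor to the urlparse module mentioned at the start of the log). This is only a rough sketch, not the parser described in the URL spec: the helper name percent_encode_query is made up for illustration, and urllib.parse does not implement the spec's "encoding override" parameter, so the encoding is passed by hand. It shows that the euro sign percent-encodes to %80 when the bytes are windows-1252 (in windows-1251 the euro sign is byte 0x88), and that urljoin resolves http:test against a base URL as a path, matching the test_resolve expectation quoted in the log.

```python
from urllib.parse import quote, urljoin, urlsplit


def percent_encode_query(text: str, encoding: str) -> str:
    """Hypothetical helper: encode text to bytes in the given
    encoding, then percent-encode every non-safe byte."""
    return quote(text, safe="", encoding=encoding)


# The same query character yields different bytes per encoding.
for enc in ("windows-1252", "windows-1251", "utf-8"):
    print(enc, "?" + percent_encode_query("\u20ac", enc))

# With a base URL, urljoin treats "http:test" as a relative reference.
print(urljoin("http://example.org/", "http:test"))

# Without a base, it splits into scheme "http", empty host, path "test".
print(urlsplit("http:test"))
```

Note that the no-base case differs from what annevk describes for browsers (scheme http with test as the host): urlsplit reports an empty host and test as the path, which is closer to timbl's "if there's a colon, it's a scheme" reading.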
11:09:46 jeff_ has joined #urlstandard
11:10:38 RRSAgent: make minutes
11:10:38 I have made the request to generate http://www.w3.org/2012/10/31-urlstandard-minutes.html Lachy
11:15:03 MikeSmith has joined #urlstandard
12:04:58 cabanier has joined #urlstandard
12:09:17 MikeSmith has joined #urlstandard
12:13:46 cabanier has joined #urlstandard
12:24:05 timbl has joined #urlstandard
12:25:40 smaug has joined #urlstandard
12:28:20 MikeSmith has joined #urlstandard
12:28:30 MikeSmith has joined #urlstandard
12:28:37 smaug has left #urlstandard
12:29:21 SimonSapin has joined #urlstandard
12:29:45 Lachy has joined #urlstandard
12:29:47 Lachy has left #urlstandard
12:33:11 krit has joined #urlstandard
12:33:17 krit1 has joined #urlstandard
12:33:38 JeniT has joined #urlstandard
12:33:50 cabanier has joined #urlstandard
12:33:57 cabanier1 has joined #urlstandard
12:34:37 tantek has joined #urlstandard
12:36:07 cabanier1 has left #urlstandard
12:39:53 Cyril has joined #urlstandard
12:40:35 MikeSmith has joined #urlstandard
12:41:27 plinss_away has joined #urlstandard
12:41:30 Cyril has left #urlstandard
12:45:15 MikeSmith has joined #urlstandard
12:45:51 JeniT has left #urlstandard
12:47:50 tpacbot has joined #urlstandard
12:48:14 plinss has joined #urlstandard
12:48:52 tpacbot has joined #urlstandard
12:54:12 Zakim has left #urlstandard
13:05:39 masinter has joined #urlstandard
13:05:44 MikeSmith has joined #urlstandard
13:35:20 krit has joined #urlstandard
13:36:32 timbl has joined #urlstandard
13:42:48 masinter has joined #urlstandard
13:43:52 krit has left #urlstandard
13:45:54 SimonSapin has joined #urlstandard
14:04:21 SimonSapin has joined #urlstandard
14:07:05 tantek has joined #urlstandard
14:19:30 tantek has joined #urlstandard
15:07:10 timbl has joined #urlstandard
15:07:30 MikeSmith has joined #urlstandard
15:09:51 timbl has joined #urlstandard
15:11:29 tantek has joined #urlstandard
15:14:12 SimonSapin has joined #urlstandard
15:36:15 masinter has joined #urlstandard
15:36:30 opendata
16:06:15 timbl has joined #urlstandard
16:12:05 SimonSapin has joined #urlstandard
16:17:52 MikeSmith has joined #urlstandard
16:22:22 SimonSapin has left #urlstandard
16:45:06 silvia has joined #urlstandard
17:21:22 tpacbot has joined #urlstandard
17:49:52 timbl has joined #urlstandard
19:10:05 MikeSmith has joined #urlstandard
19:21:01 MikeSmith has joined #urlstandard
20:01:10 silvia has joined #urlstandard
20:31:11 timbl has joined #urlstandard
22:15:18 tantek has joined #urlstandard