W3C | TAG | Previous: 24 Feb teleconference | Next: 7 April 2003 teleconf

Minutes of 31 Mar 2003 TAG teleconference


1. Administrative

  1. Roll call: All present. SW (Chair), TBL, TB, DC, DO, PC, RF, CL, NW, IJ (Scribe)
  2. Accepted 24 Feb telecon minutes
  3. Accepted this agenda
  4. Next meeting: 7 April. Regrets: SW. NW to Chair.

    Action IJ: Arrange for call to NW

1.1 Meeting planning

1.2 Mailing list management

2. Technical

  1. Architecture document
  2. IRIEverywhere-27 and URIEquivalence-15

2.1 Architecture document

See also: findings.

  1. 26 Mar 2003 Working Draft of Arch Doc:
    1. Action DC 2003/02/06: Attempt a redrafting of 1st para under 2.2.4 of 6 Feb 2003 draft.

      DC: No progress.

    2. Action DC 2003/01/27: write two pages on correct and incorrect application of REST to an actual web page design

      DC: Progress two weeks ago (found co-author).

    3. Action DO 2003/01/27: Please send writings regarding Web services to tag@w3.org. DO grants DC license to cut and paste and put into DC writing.

      DO: I am in meetings now on this topic. No indication of when this will be done.

    4. Action CL 2003/01/27: Draft language for arch doc that takes language from internet media type registration, propose for arch doc, include sentiment of TB's second sentence from CP10.

      CL: This is about charset headers, and not generating them when they might be wrong. I hope to have something by next week.

    5. Action DC 2003/03/17: Write some text for interactions chapter of arch doc related to message passing, a dual of shared state.

      DC: I think I have some progress on the latest one. This is my scribbling on issue 8 and nearby. This is scribbling on message passing as a dual of shared state.

[Chris]
So i don't have to chase it up again:
CP10. Agents which receive a resource representation accompanied by an
Internet Media Type MUST interpret the representation according to the
semantics of that Media Type and other header information. Servers
which generate representations MUST not generate Media Types and other
header information (for example charsets) unless there is certainty that
the headers are correct.
[Chris]
so it's basically, *disagree* with the media type registration ...
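As a rough illustration of the server half of CP10, the sketch below builds a Content-Type value and adds a charset parameter only when the caller actually knows the encoding. The function name content_type_header and its parameters are hypothetical; they are not taken from the Arch Doc or from any server API.

```python
from typing import Optional


def content_type_header(media_type: str, charset: Optional[str] = None) -> str:
    """Return a Content-Type value, omitting charset unless it is known."""
    if charset is None:
        # Encoding unknown: say nothing rather than risk a wrong label.
        return media_type
    return f"{media_type}; charset={charset}"


print(content_type_header("text/html"))
print(content_type_header("text/html", "utf-8"))
```

The design choice is simply that a missing parameter is preferable to a guessed one, which is the sentiment of the second sentence of CP10.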

Discussion of how to move forward on Architecture Document

[Ian]
TB: Usual progress not happening here (incorporation of draft text after issue resolution). Although lots of people are queued up to write pieces, that's not happening. IJ could write an end-to-end draft and we could work from that.
DC: That one doesn't appeal to me. I'd be happy to say TBL "go".

DO: But one reason we are gathered around this table is lack of TBL time.

DC: I observe that TBL does the writing and that people don't react to it.
CL: I agree with TB on how docs progress: someone just writes stuff. Someone puts a stake in the ground and then you have something to push against.
DC: That happens when the editor is really the author.
TBL: The way the Process Doc works is that IJ puts it in his head. Not the same with the arch doc (for IJ). What are the reasons for the TAG document?
IJ: Time is no longer the primary issue. Primary issue is that I am not getting the sense of agreement that would allow me to run with small pieces and create text. One way to advance: small groups of editors to come up with more concrete TOCs for various sections.
CL: I think real text is important; people need something to complain about.
DC: To me the bottleneck is not writing up the decisions; it's making them.
[Chris]
would quite like the paired-off 'buddy' system
[Roy]
Section 5 should be just "Interaction" -- including UI to server actions
[Zakim]
DanC, you wanted to suggest that the arch doc actually does reflect the consensus of the group; there just isn't that much of it
[Ian]
IJ: I disagree slightly on section on protocols; we've not had many discussions. Hard to say we don't agree.
TB: On issues where we can hammer out consensus, writing them down will produce a meaty and useful arch doc. I think there are problems, but we have enough areas of consensus to publish something. E.g., look at our findings. On "5. Machine-to-machine interaction" I think we have consensus on device-independence, lessons of REST, but that material not really there yet.
TBL: Where is Web Services supposed to be?
IJ: I think section 5.
TBL: There's a fundamental difference between Web Services like things and ordinary Web things.
[DanC]
hm... perhaps, TBray; but it's not clear to me where Ian should go to get the material
[Ian]
DO: I got the impression that TBL thought about Web Services interactions as different from Web interactions...
TBL: Some Web Service interactions are very Web-like. But the architecture of Web Services is fundamentally messages that are part of larger protocols. They are not part of transferring the state of global information space. Some Web Services are retrieval operations, and some are not.
DO: Roy asked a question a few months ago - are we writing the arch of the current or future Web.
DO: I thought the answer at the time was more about "what has been". I'd love it if we worked on a more encompassing arch doc, but I wasn't aware that that was our mandate.
TBL: I agree that the arch doc should encompass what we are doing within W3C.
TB: The list under 4.5 has lots of useful consensus material that needs writing up.
RF: It would be easier for me if we talked about the future Web within this arch doc. It's easier for me to lay out the sections in my mind. E.g., I consider everything that TBL described to be under "Interactions"
SW: How do we relate to WSA WG?
CL to TBL: How is what you are saying about Web Services different from Semantic Web interactions?
TBL: Whether you send in a msg or a Web service, the Sem Web is for talking about real things in a logical way. You can send messages or create documents. The distinctions are orthogonal.
[Chris]
not finished yet btw
[Ian]
CL: So, e.g., the Sem Web is that "this nut is compatible with this bolt if they have the same thread size; as defined over here at this URI"
[timbl]
Chris how is SW different from WS?
[Ian]
CL: Whereas Web Services is "I want to buy 10k nuts if they fit on this bolt over here."
TBL: When you say "I want to buy them", that's part of the Web Services architecture.
CL: But that's the real world -- it costs you money. I'm not seeing a real difference here; this is still about machine-to-machine communication.
[Chris]
so, sw and ws seem like very closely related things, in summary
[Ian]
DO: I would like to talk about diff between Sem Web and Web Services in the arch doc. My personal opinion is that the focus of Web Services is "how do you send messages back and forth; and how do you describe the messages in the exchange." I see the Sem Web as how you make assertions about things and then make queries (in a reduced world) with constraints. Yes you can put queries in messages...these two worlds intersect.
[Zakim]
TBray, you wanted to note that the only real industrial SW tech applications I've seen have been in WS apps
[Ian]
TB: To me it's obvious that the overlap is very substantial. I've seen people doing Web Services deployment realize that they need vocabularies. I think that they are inextricably in bed with one another.
TBL: I think that you find that when you use one you often use the other. But conceptually, they are very different parts of the architecture. You often use TCP and HTTP together. But they provide different services. Sem Web moves information space from bits to statements about real-world things. Combining Web Services and Sem Web is powerful. But there will be apps that use one without the other. We are not writing an article for Business Week; we are writing up architecture principles. So, e.g., model real-world things in your app, not documents.
[Zakim]
timbl, you wanted to say that the overlap in application area is very large, but the overlap in concept (which befits the arch doc) is smaller
[Ian]
TBL: I can demonstrate cases where you use one without the other.
[TBray]
I don't think Web Services can really get to first base without shared semantics a la SW. And all the SW golden-future stories I read have observable effects that sound a lot like WS.
[Ian]
DO: I agree we are writing an arch doc. The WSA WG is also writing an architecture document. In the 14 Nov 2002 draft of the WSA Arch Doc (3 Basic and Extended Architecture), I proposed some text about Web Services using language of Roy's thesis.
[timbl]
Inextricably intermingled? test cases: Library catalog = sem web without WS; document validation sWS = WS without SW maybe? Ordering machine parts could be a SWS case.
[Ian]
DO: Maybe we can document differences and similarities.
[DanC]
I think the library folks would disagree with the "without WS", timbl. They're webizing Z39.50 using SOAP all over the place.
[Ian]
DO: We need to decide whether we're documenting one architecture or N.
DC: It is?
DO: I think the understanding we've had so far is that we are writing the Web architecture w.r.t. REST, excluding things like Web Services and Semantic Web.
[DanC]
(I consider our issues list as the authoritative list of decisions we owe)
[timbl]
SW without WS - foaf?
[DanC]
(and perhaps a decision that says 'this document is consistent with our decisions on all these issues')
[Ian]
TB: I think the arch doc will exhibit progress when one or more people take ownership of one or more sections and write them. I don't think we can divorce content from TOC. I am convinced we have consensus around enough points that we can produce a useful document. I'm not arguing that the current TOC is wrong; I'm arguing that we need text from start to finish, and we need to roll up sleeves and write it.

2.2 IRIEverywhere-27 and URIEquivalence-15

[Ian]
CL: Text of this IJ/MD/CL discussion needs to change ($Date: 2003/04/01 16:32:57 $)
[DanC]
to me, what chris just said is the heart of the issue.
[Ian]
CL: No longer true: " 1. RFC2396 should be modified so that hex digits (HEXDIG) are case-insensitive. "
DC: Please include the Kanji example in this document.
CL: TB's how to compare distinguishes the case where you dereference from the case where you do not. I am proposing that the cases where you don't dereference are treated by simple Unicode string matching. However the next level up works on all schemes and you are dereferencing; upper- and lowercase hex should be the same; same with Kanji example; I think that this should be scheme-independent.
RF: Why do IRIs affect RFC2396 bis?
CL: Would be weird if Kanji lowercase hex isn't the same as Kanji uppercase hex. Does not depend on URI scheme.
RF: Need to convert from IRIs to URIs for comparison, though.
CL: Aha! No need to do this comparison any more if you are not dereferencing the IRIs.
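CL's point about hex case can be sketched as follows: two escaped forms of the same Kanji octets differ under simple string matching but agree once the hex digits of each escape are case-folded. The helper name fold_hex_case and the example.org URIs are made up for illustration; this is not text from RFC2396 or from the IRI draft.

```python
import re

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def fold_hex_case(uri: str) -> str:
    """Uppercase the hex digits of every %XX escape; leave the rest alone."""
    return _ESCAPE.sub(lambda m: m.group(0).upper(), uri)


lower = "http://example.org/%e6%97%a5"  # escaped UTF-8 octets of U+65E5
upper = "http://example.org/%E6%97%A5"

print(lower == upper)                                # simple string matching
print(fold_hex_case(lower) == fold_hex_case(upper))  # after folding hex case
```

Only the escape triplets are touched, so the fold does not depend on the URI scheme.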
[timbl]
(RF said they compared the same because you convert to URI before comparison)
[Ian]
RF: If you are going to use an IRI as a namespace id, you're going to want to convert to URI first, otherwise, you'll have differences in the use of that identifier. You create a lot of unnecessary duplicates if you don't convert first.
CL: Encoding has nothing to do with this. These are Unicode character codes. Assume it's in IRI form. Why would you convert to URI?
[DanC]
"most common is URIs"... not in XML implementations
[Ian]
RF: Most common use of identifiers is as URIs; you'll have IRIs sometimes and URIs sometimes. If you don't convert to URIs first, you won't have uniformity.
CL: When you are not dereferencing, case matters. The available evidence from people who do namespace stuff is that they do string comparison. It's pushing too much water uphill for namespace case.
RF: So you are relying on "If you want uniformity, use things uniformly."
CL: I can't get my preferred position, so my new position is closer to what the world will accept.
RF: There are situations where IRIs are not as useful due to lack of deployment.
[timbl]
q+ to say that we were asked to fix a problem; if we documented the current situation, we would document a mess. We could suggest an alternative. Eg. We could recommend that string comparison be limited to URIs only, and people not use IRIs until they can do IRI-URI mapping.
[Ian]
SW: Larry, I think, is advising us to be patient and wait until there's an RFC.
[Chris]
http://www.w3.org/International/iri-edit/draft-duerst-iri.html
[Zakim]
DanC, you wanted to say why it maybe should depend on the scheme and to observe that the genie is out of the bottle here
[Ian]
DC: There's text in XML namespace spec that says you can't have two attribs with the same name after they are namespace qualified.
[Chris]
good test case there
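A minimal sketch of that test case, assuming Python's standard xml.etree.ElementTree parser as the namespace implementation under test. The element name, prefixes, and example.org namespace names are invented here. The first document binds two prefixes to the same namespace name, so a:x and b:x are the same qualified attribute; the second binds one prefix to the Kanji form and the other to its hexified form.

```python
import xml.etree.ElementTree as ET

KANJI = "http://example.org/\u65e5\u672c"        # IRI form
HEXED = "http://example.org/%E6%97%A5%E6%9C%AC"  # hexified form of the same


def count_attrs(ns_a: str, ns_b: str) -> int:
    """Parse <e a:x b:x> with the two namespace names; count attributes."""
    doc = f'<e xmlns:a="{ns_a}" xmlns:b="{ns_b}" a:x="1" b:x="2"/>'
    return len(ET.fromstring(doc).attrib)


# Same namespace name twice: does the parser reject the duplicate attribute?
try:
    count_attrs(KANJI, KANJI)
    collided = False
except ET.ParseError:
    collided = True
print("identical names collide:", collided)

# Kanji versus hexified: how many distinct attributes does the parser see?
print("kanji vs hexified attribute count:", count_attrs(KANJI, HEXED))
```

A parser that compares namespace names character by character accepts the second document with two attributes; one that converts to URI form before comparing would reject it as a duplicate.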
[Ian]
DC: You can tell (for Kanji case) whether your namespace implementation was happy. When I coded this up, I converted IRI to URI before comparison. It worked; that's a coherent position. I looked up in the world and I saw that there's an enormous tide of people doing it the other way: All XPath, XML Query, XML Schema implementations.
[Chris]
I agree it's a coherent position. it was my first choice. But it's not the majority position
[Ian]
DC: HTML implementations use heuristics to maximize @@missed@@
[Chris]
everyone does unicode character by character comparison
[Ian]
DC: The way you get shared understanding of a name is to repeat verbatim in lots of contexts. The more mangling we have, the higher the risk of confusion.
[Chris]
mixing up the hexified and non-hexified forms is a problem
they do not collide
the kanji form is the correct canonical form, written correctly
[Ian]
DC: I think that the right answer is: (1) In the specific case of namespace names - no they do not collide unless they match unicode-char by unicode-char (2) this is rationalized in the real world since a person wanted to write their name the way that person usually does.
[Zakim]
timbl, you wanted to agree with Roy and re explain it if chris hasn't got it and to say that while the URI spec licences the conversion between 8bit and hexified forms, then they
... will be equivalent. and to say that we were asked to fix a problem; if we documented the current situation, we would document a mess. We could suggest an alternative. Eg. We
... could recommend that string comparison be limited to URIs only, and people not use IRIs until they can do IRI-URI mapping.
[Ian]
TBL: If we documented the current situation, we would document a mess.
[timbl]
possible solutions -- insist on hexifying for comparison, or forbid hexifying by clients. (iri proposal suggest the former; danc proposal leads to the latter)
[Chris]
we *could* insist on hexifying always but this would not work
[Ian]
TBL: String comparison is fine BUT, you're not allowed to use IRIs until you've got your IRI-to-URI conversion working.
[Chris]
hexifying is a last-ditch, late conversion to get an IRI through a non-8-bit-clean protocol
which is basically what I understand to be the IRI position
[Ian]
TBL: Plan B says that namespace can do char-by-char Unicode and everyone else has to deal with it.
RF: Does this mean IRI comparison should convert URI to IRI first?
CL: No. All URIs are already IRIs.
DC: There is a plan C in RF's question - the two spaces are isomorphic. You peek inside hex encoding.
CL: It's better to say - once you've done hexing you've got a different thing. It's still an IRI, but doesn't use other chars than ASCII
[DanC]
yes, plan C conflicts with the "if you mean the same thing, say it the same way" principle
[Ian]
RF: An IRI that uses chars outside ASCII char set will be less "interoperable" than one that contains only ASCII chars.
[timbl]
The isomorphism creates some equivalences which break plan C
[Ian]
DC: This is like a health warning: don't use "l" and "1" to avoid confusion.
RF: Problems include email, RFCs munging text, ... I anticipate times when IRI will be used as namespace and people will want to use it in its URI state.
CL: For the data entry problem, you can already type things in in ASCII. That aspect is already dealt with for XML.
[Zakim]
TBray, you wanted to agree with TimBL
[Ian]
TB: I think that DC's experience with pushing water uphill is correct. If you are staying in the URI universe, then the approach taken so far (strcmp) is important - people should use the same name for the same thing. In a URI world, that's ok. (and bots can be more aggressive) For IRIs, I think we have to push some water uphill.
[Chris]
because bots do dereferencing type things
timbray asks if 'all uris are iris' are cast in stone
believe so, yes
[Ian]
TB on whether a URI is already an IRI: I think that the rules about hexification are non-deterministic.
[Chris]
hexification always when needed, never when not needed
[Ian]
TB: If we could require hexification always when needed and never when not needed, then we'd have more leverage to attack the IRI problem.
[Zakim]
timbl, you wanted to propose that there is lots more water to be pushed uphill in plan A
[Ian]
TBL: There is lots more water to be pushed uphill in plan A. I don't see Plan C working.
[Chris]
2.3 IRI Equivalence and Normalization
[TBray]
I propose we say a URI is *not* an IRI if it's got unnecessary hexification
[Chris]
"It follows from the above that IRIs SHOULD NOT be modified when being transported. "
"In some scenarios a definite answer to the question of IRI equivalence is needed that is independent of the scheme used and always can be calculated quickly and without accessing a network. An example of such a case might be XML Namespaces ([XMLNamespace]). "
[Ian]
TBL: I feel that when you compare pushing water uphill to number of places where URIs are communicated over 7 bits....that world is much more diverse. That is MUCH MORE water to push uphill. You'd have to divide the Web in two.
[Chris]
disagree that it splits the web in two
*not* doing IRI would certainly split the Web into n parts, though
[Ian]
TBL: If we try to move to the world where URIs are replaced by IRIs, I think it's easier to get namespace-aware software to change (convert then compare). That's a more numerable set of applications.
[Zakim]
DanC, you wanted to discuss world-wide charset and to agree that the rules around hexifying are frightening; cf XML query spec and to observe that we are in web phase 2
[Ian]
DC: Regarding interoperability of IRIs/URIs, my experience is that if you want to use a URI that will really work everywhere, you should use [2-9] and that's it. I agree that the hex rules are frightening; the XML Query folks ran across problems recently. To me, hexifying is something the URI owner gets to do and nobody else gets to look into.
[Chris]
'no peeking inside' - IRI spec agrees about that
3.2 Converting URIs to IRIs
In some situations, it may be desirable to try to convert a URI into an equivalent IRI. This section gives a procedure to do such a conversion. The conversion described in this section will always result in an IRI which maps back to the URI that was used as an input for the conversion (except for potential case differences in escape sequences). However, the IRI resulting from this conversion may not be exactly the same as the original IRI (if there ever was one).
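The round-trip property in the quoted passage can be illustrated with a small sketch. It is not the procedure from the draft: the function names hexify and uri_to_iri are hypothetical, and the rule used here (unescape only runs of escapes that decode as UTF-8, and keep any ASCII character escaped) is a simplifying assumption.

```python
import re

_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def hexify(iri: str) -> str:
    """IRI -> URI: escape the UTF-8 octets of every non-ASCII character."""
    return "".join(
        c if ord(c) < 128 else "".join(f"%{b:02X}" for b in c.encode("utf-8"))
        for c in iri
    )


def uri_to_iri(uri: str) -> str:
    """URI -> IRI: reveal escaped non-ASCII characters, keep ASCII escaped."""
    def convert(match):
        octets = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            text = octets.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not UTF-8: leave the escapes untouched
        # Escaped ASCII (e.g. %2F) stays escaped so reserved characters survive.
        return "".join(c if ord(c) >= 128 else f"%{ord(c):02X}" for c in text)
    return _RUN.sub(convert, uri)


uri = "http://example.org/%e6%97%a5%2fx"
iri = uri_to_iri(uri)
print(iri)
print(hexify(iri))
print(hexify(iri).lower() == uri.lower())  # same URI apart from hex case
```

In this example the converted IRI maps back to a URI that differs from the input only in the case of its escape sequences, which matches the exception the quoted text allows.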
[Ian]
DC: As to TBL's comment on water in 7-bit URI world: Maybe. But water not all flowing in the same direction. I remain convinced that using strings of Unicode chars for resource names is the correct option.
[DanC]
yes, or no, paulc: do we discuss issue 8 today?
[Chris]
would prefer to try and get this issue with some agreement so i can edit this document
[Ian]
[TAG agrees to address issue 8 next week]
[TBray]
I think we have agreement that
  1. IRIs are a good thing, and
  2. URIs are not going away so the URI->IRI conversion is a necessity,
  3. we need real clarity on URI<->IRI interconversion rules & when to do
[Chris]
yes, clearly on 1)
[Ian]
NW: I agree with RF's comment - I think it will remain the case for some time that URIs work where IRIs don't.
[Chris]
for months now
[Zakim]
Norm, you wanted to say that there is simply existing software that insists on ASCII
[DanC]
yeah, well, solving world hunger is a good thing. but I'm not going to 2nd a proposal to solve it, setting expectations that we're not going to meet.
[Ian]
TBL: I suggest that CL write down Plans A and B.
[Chris]
document both options and discuss them
[DanC]
roy, you're wrong. IRI does exist. It's the norm in practice.
;-)
[Ian]
CL: Yes, I agree that it's a good idea to write down both options.
RF: If the issue is that IRIs should be everywhere, then we need to be clear on what IRIs are. IRI docs are still undergoing big changes. I am happy to reconsider this issue when IRI and URI specs stabilize.
[Zakim]
Roy, you wanted to say that I agree with Larry's comment: the only finding is that IRI does not yet exist
[Chris]
the implementations have consensus and have done for a long time
[Ian]
DC: I observe more consensus in the implementations than in the spec. The fact that the users are happy to invest in this technology early suggests to me that we owe the community a spec for what they're doing.
[Chris]
because the rest of the world wants to use its own characters
[Ian]
DC: I'm talking about XML* specs, not just IRI specs.
[Roy]
Solution: the attribute is CDATA, not an IRI or URI
[Chris]
TAG should request that draft-duerst04 moves to an RFC asap
[Ian]
DC: The status quo is that people cut/paste IRI text directly into their specs.
[Zakim]
timbl, you wanted to say we need to define equivalence more widely than one WG
[Ian]
TBL: We've been asked to say when URIs/IRIs are equivalent. I think it would be a shame to leave it as "Pick your favorite of 7 levels"
[Chris]
it has been pointed out before that equivalence does indeed depend - it depends on the equivalence operator that you are using
and there are only two levels, in fact
[Ian]
TBL: Plan A says IRI_1 and IRI_2 equiv when same string of Unicode chars. Plan B says IRI_1 and IRI_2 are equiv when hexified and string compared.
[Chris]
well, not strcmp but wide-strcmp
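The two plans, as TBL states them here, can be compared side by side in a short sketch. The function names and the example.org IRIs are invented for illustration, and hexify is a simplified stand-in for the IRI-to-URI mapping, not the draft's algorithm. Python compares str values code point by code point, which plays the role of the "wide-strcmp".

```python
def hexify(iri: str) -> str:
    """Escape the UTF-8 octets of every non-ASCII character as %XX."""
    return "".join(
        c if ord(c) < 128 else "".join(f"%{b:02X}" for b in c.encode("utf-8"))
        for c in iri
    )


def equivalent_plan_a(iri_1: str, iri_2: str) -> bool:
    """Plan A: equivalent when the same string of Unicode characters."""
    return iri_1 == iri_2


def equivalent_plan_b(iri_1: str, iri_2: str) -> bool:
    """Plan B: equivalent when equal after hexifying both sides."""
    return hexify(iri_1) == hexify(iri_2)


kanji = "http://example.org/\u65e5\u672c"
hexed = "http://example.org/%E6%97%A5%E6%9C%AC"

print("Plan A:", equivalent_plan_a(kanji, hexed))
print("Plan B:", equivalent_plan_b(kanji, hexed))
```

The Kanji and hexified forms are distinct under Plan A and equivalent under Plan B, which is the disagreement the discussion turns on.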
[Ian]
DC: The only guy who gets to hexify is the guy who makes up the URI in the first place.
[Discussion of Internet Explorer behavior - been doing Plan B]
[DanC]
(except that I didn't really mean that. yes, I did say that.)
[Chris]
doing plan b when they dereference
doing plan a when they compare without dereference
[Ian]
RF: Like LM, I think we need to wait for the next IRI draft.
[Chris]
roy says that international domain names now ratified so martin will update the spec to reflect this
[TBray]
sorry bye
[Ian]
DC: So Martin writes some text. But we need to get agreement with lots of people (including W3C groups and outside W3C). So TAG should actively coordinate among these groups. E.g., provide tests.
SW: Another thing in LM's message - creation of an IRI mailing list. Is the IETF the venue for this meeting?
DC: An IETF WG could be.
RF: Discussion of draft is taking place on a W3C mailing list for the purpose of advancing spec within the IETF.
[Stuart]
http://www.w3.org/International/iri-edit/
[Chris]
The mailing list for public discussion of IRIs is public-iri@w3.org, with a public archive. To subscribe, send mail to public-iri-request@w3.org with subscribe as the subject.
[Ian]
RF: It would be ideal to endorse this as the home and get people to talk there.
[Some discussion that CL and DC would be uncomfortable with providing text to groups with a guarantee that it would pass muster at doc transition.]
Action CL: Revise the IRI position draft for next week.

2.3 Issues that have associated action items

3. Other actions


Ian Jacobs for Stuart Williams and TimBL
Last modified: $Date: 2003/04/01 16:32:57 $