I'm not sure the w3-mux list is active yet, so I'm sending my comments out early to you guys... something like this will probably go out to the list, if I don't see the holes in my own arguments and come to my senses in the meantime. So, what follows is drafty, but here goes:

This note discusses the hash-based notion of protocol IDs, and an alternative. To start with, I summarize the current hash-based system, as I understand it. I also run on for a bit about what are either problems with the scheme or goofs on my part; if you want to skip to my proposed alternative, look for the line of asterisks. Finally, for some notes on the need or non-need for registries in these systems, see after the line of dashes.

Bill's proposal for a hash-based system starts by assuming that any protocol stack (he explicitly allows for the possibility of several stacked transports, perhaps serving different purposes, such as compression and authentication) can be designated by a string of the form, say:

    http:zippy-zip:nsafe

where zippy-zip and nsafe are compression and cryptographic layers, respectively, which I just made up. Since some of these protocols may have options, he allows a syntax for specifying them, as in the following protocol stack designator (I'll call these things PSDs from here on):

    http;version=1.1:zippy-zip;dictsz=2048:nsafe;trusted-signatory="fred.net"

To save the latency hit incurred by transmitting a string of this length, the proposal tosses in a hash function (specifically CRC32, but nothing yet depends on this choice of hash, as opposed to another). So, when Alice has a MUX connection open to Bob, and wants to initiate a session using the protocol stack named by the above PSD, she runs the string through the hash function, and transmits the hash code as part of the session-initiating SYN-flagged MUX block.
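
To make the hashing step concrete, here is a minimal sketch in Python. It assumes that "CRC32" means the standard CRC-32 computed by zlib, and that the PSD is hashed as its ASCII bytes; the proposal as summarized above pins down neither, and the function name psd_hash is made up for the illustration.

```python
import zlib

def psd_hash(psd: str) -> int:
    """Hash a PSD with CRC-32 (assumed: zlib's CRC-32 over the ASCII bytes)."""
    # Mask to an unsigned 32-bit value so the result fits a 4-byte field.
    return zlib.crc32(psd.encode("ascii")) & 0xFFFFFFFF

# The example PSD from the text above.
psd = 'http;version=1.1:zippy-zip;dictsz=2048:nsafe;trusted-signatory="fred.net"'
code = psd_hash(psd)

# Alice would put `code` in the SYN block instead of the full string.
print(f"{len(psd)} characters -> hash 0x{code:08x} (4 bytes on the wire)")
```

The point of the sketch is only the size difference: a few dozen characters of PSD collapse to a fixed four-byte value, at the cost that the receiver can no longer read the PSD itself.
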
Bob is supposed to have a complete dictionary of the hash codes for all PSDs representing protocols that he understands (or to be able to compute one on the fly). So, when he receives Alice's SYN, with the hash-code, he can look it up in his dictionary. At this point, there are three possible cases:

  1) None of Bob's protocols have a PSD with that hash code
  2) Exactly one of Bob's protocols has a PSD with that hash code
  3) More than one of Bob's protocols has a PSD with that hash code

In case 1, the connection attempt is simply rejected; this is no problem.

In case 2, Bob accepts the connection, and uses the protocol designated by the PSD in his dictionary. There can be a problem here. Suppose that there are two PSDs, PSDa and PSDb with the same hash code. Suppose further that Alice speaks PSDa, and Bob speaks PSDb, but neither party speaks both protocols. Then, if Alice tries to open a session using the protocol named by PSDa, Bob will accept it and start using the protocol named by PSDb, with potentially confusing results. (As discussed below, Bill had assumed that Alice would simply never try to talk to Bob using PSDa unless she knew in advance, by some unspecified means, that Bob understood that protocol. If that assumption holds, then the troublesome cases can't arise).

Case 3 requires further treatment --- the proposal suggests that Bob should ask Alice for the full PSD of the protocol that she is trying to use, but the over-the-wire details of how he does that aren't fully worked out yet.

So much for outlining the proposal. The one remaining feasibility issue that I'm aware of is that the size of Bob's dictionary must stay within reasonable bounds for the scheme to be workable. To begin with, this implies some constraints on the grammar of the PSDs (to avoid having to stuff the dictionaries with multiple redundant names for the same protocol stack which differ only in, say, whitespace).
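
Bob's side of this, including the whitespace point, might be sketched as follows. Everything here is an assumption made for illustration: the canonical form (strip whitespace outside double quotes) is not part of the proposal, nor are the names normalize, build_dictionary and lookup, and CRC-32 is taken to be zlib's.

```python
import zlib
from collections import defaultdict

def normalize(psd: str) -> str:
    """Hypothetical canonical form: drop whitespace outside double quotes."""
    out, in_quotes = [], False
    for ch in psd:
        if ch == '"':
            in_quotes = not in_quotes
        if in_quotes or not ch.isspace():
            out.append(ch)
    return "".join(out)

def psd_hash(psd: str) -> int:
    """CRC-32 of the canonical PSD (assumed: zlib's CRC-32, ASCII bytes)."""
    return zlib.crc32(normalize(psd).encode("ascii")) & 0xFFFFFFFF

def build_dictionary(supported):
    """Map each hash code to the list of canonical PSDs that produce it."""
    table = defaultdict(list)
    for psd in supported:
        canon = normalize(psd)
        if canon not in table[psd_hash(canon)]:
            table[psd_hash(canon)].append(canon)
    return dict(table)

def lookup(table, code):
    """Classify an incoming hash code into the three cases from the text."""
    matches = table.get(code, [])
    if not matches:
        return ("reject", None)               # case 1: no PSD has this hash
    if len(matches) == 1:
        # case 2: accept, although Bob cannot tell whether Alice meant
        # this PSD or a different one that happens to share the hash.
        return ("accept", matches[0])
    return ("ask-for-full-psd", matches)      # case 3: ambiguous

bob = build_dictionary(["http:zippy-zip:nsafe", "http : zippy-zip : nsafe", "http"])
print(lookup(bob, psd_hash("http:zippy-zip:nsafe")))
```

With a canonical form, the two spellings of the three-layer stack occupy one dictionary entry rather than two.
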
However, a more significant issue is the number of *meaningfully distinct* PSDs Bob might support --- something that depends rather strongly on what options are supported by the protocols he understands. There is at least the possibility of blowup here; if Bob supports only one protocol stack with three layers, but those layers have five three-valued options each (giving 3^5 = 243 supported variants of each layer), then Bob has to put 243^3, or over fourteen million, total PSDs in his dictionary, which starts to get a little burdensome.

One potentially dangerous case is illustrated by the trusted-signatory option to my made-up nsafe authentication layer. If Bob knows in advance of some particular set of N signatories he is willing to trust, he can build a dictionary with N entries for each protocol stack involving nsafe, one for each signatory --- which isn't necessarily so bad. However, if Bob is willing to allow Alice to name a trusted signatory which he doesn't know about in advance (we presume he's got some test for the trustworthiness of the named signatory, or maybe Alice is the paranoid one and Bob just doesn't care), then Bob would have to somehow come up with dictionary entries for PSDs naming every possible trusted-signatory on the entire net. It isn't clear how Bob would go about doing this.

(I freely admit that the "trusted signatory" option is a bit fanciful, and perhaps symptomatic of poor design in my nonexistent nsafe layer, but the same problem can potentially arise with any protocol option whose values name an external third party or entity which is not known to both parties in advance).

****************************************************************

FWIW, all the potential problems above would not arise if Bob had access to Alice's PSD (protocol stack designator), instead of just having a hash code for it. The reason he only has a hash code is that transmitting full PSDs over and over would consume bandwidth, and we want to avoid that.
However, at least to my eyes, the complications look somewhat awkward. So, in the remainder of this note, I'll be sketching out an approach which has a lot fewer difficulties, and consumes (to my taste) little enough bandwidth to be worth the trade.

What I'll assume is that bandwidth is at least cheap enough that it's no tragedy to transmit any given full PSD exactly *once* on a given (hopefully long-lived) MUX connection. As a check, I'll note that the somewhat over-elaborate PSD used above as an example is a bit under 80 characters in length, and so should take roughly 40 ms. to transmit over a 14.4K modem. IIRC, delays have to be roughly a quarter second before people start getting really annoyed, so transmitting one PSD of this length won't annoy anybody much even over a relatively slow link, but transmitting it repeatedly might.

Anyway, the basic idea of my scheme is as follows: the first time Alice wants to open a session using a particular protocol, she transmits the full PSD to Bob, *and* an arbitrary brief designator which she will use to designate the same protocol stack in the future. (The brief designator only has to be unique among protocols used on *this* MUX connection, so fewer than sixteen bits ought to do). At this point, with the full PSD, Bob can determine whether he speaks the protocol without having to try to invert a one-way hash function.

For future sessions, Alice uses the brief designator, and need not transmit the full PSD. Note that she need not wait for any sort of acknowledgment from Bob before knowing this will work, since *she* assigned the brief ID, and since MUX blocks are guaranteed (by TCP) to be processed in the order of transmission. If Bob supports the protocol stack that Alice is trying to use, everything just works; if not, he simply notes that fact the first time the brief designator arrives (with a full PSD), and rejects future connection attempts which use the same brief designator.
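
A rough sketch of the bookkeeping this implies on each side, in Python. The dict-shaped messages stand in for SYN-flagged MUX blocks and are not a wire format; the class names, field names, and the choice to hand out brief designators as consecutive integers are all made up for the illustration.

```python
class Sender:
    """Alice's side: assigns brief designators, sends each full PSD once."""

    def __init__(self):
        self.brief_ids = {}  # full PSD -> brief designator chosen by Alice

    def open_session(self, psd):
        if psd in self.brief_ids:
            # Already announced on this connection: brief designator only.
            return {"syn": True, "brief": self.brief_ids[psd]}
        brief = len(self.brief_ids)  # hypothetical allocation: 0, 1, 2, ...
        self.brief_ids[psd] = brief
        # First use: send the full PSD along with the new brief designator.
        return {"syn": True, "brief": brief, "psd": psd}


class Receiver:
    """Bob's side: remembers what each brief designator stands for."""

    def __init__(self, supported):
        self.supported = set(supported)
        self.known = {}  # brief designator -> PSD, or None if unsupported

    def on_syn(self, msg):
        brief = msg["brief"]
        if "psd" in msg:
            psd = msg["psd"]
            # Bob compares full PSDs directly; no hash lookup is involved.
            self.known[brief] = psd if psd in self.supported else None
        # Accept only designators bound to a PSD that Bob supports.
        return self.known.get(brief) is not None


alice = Sender()
bob = Receiver({"http:zippy-zip:nsafe"})
first = alice.open_session("http:zippy-zip:nsafe")   # carries the full PSD
again = alice.open_session("http:zippy-zip:nsafe")   # brief designator only
print(bob.on_syn(first), bob.on_syn(again))
```

Because Alice never waits for a reply before reusing a designator, the sketch relies on the ordering guarantee mentioned above: Bob always sees the message carrying the full PSD before any message that uses the brief designator alone.
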

Now for the tricky details:

To avoid hash collisions, and all their attendant problems, it probably makes the most sense to just assign the IDs dynamically, per connection; this involves a little extra per-connection data structure for each peer, but maintaining that structure is likely to be less work for everybody than complex machinery for handling hash collisions.

Given dynamic assignment, it is clearly important to keep Alice and Bob from assigning the same brief designator to different protocol stacks; this is most easily accomplished, as with session IDs, by arbitrarily having one peer allocate the odd brief designators, and the other peer allocate the even ones. We could also simply require each peer not to try to open connections using brief designators assigned by the other, but that could require the same full PSD to be transmitted both ways in some fairly common cases, and that's just a waste of bandwidth.

----------------------------------------------------------------

Lastly, some notes on the notion of registries. In the ILU context, schemes of this sort don't need a registry, but it's important to note the reason why. First, a scenario:

*) Alice is working at SGI, where she's working on networked Virtual Reality. To save bandwidth, she has developed a protocol which allows one peer to incrementally transmit data on VRML objects as they come into the other peer's virtual view (anticipating what will come into view along the other peer's current path, and accepting updates and corrections about that peer's direction of motion). She calls this Virtual Reality Transport Protocol, and abbreviates it in PSDs as VRTP.

*) Bob is a househusband with a VCR in his apartment.
Since his friend's VCR broke down, he allows the poor fellow to set the timer on his own (Bob's) VCR, and to allow him to do it over the net, he's developed a Video Recorder Twiddling Protocol for transmitting the timer settings, and he abbreviates the name of that protocol in PSDs as VRTP.

So, we have two parties, with two very different protocols which they both designate in PSDs (which, remember, are the main feature I've retained from the hash-coding scheme) as VRTP. The question, now, is how we keep Alice from trying to download entire virtual worlds into Bob's VCR.

IIRC from Bill's email, the reason that this isn't supposed to happen in the ILU context is that Alice is never even supposed to *try* to speak (her version of) VRTP to Bob unless she *knows in advance*, by some unspecified, out-of-band means, that Bob speaks that protocol, and refers to it by the same name in his own PSDs. It is because of this assumption about the behavior of peers using the negotiation mechanism that the scheme doesn't require a global registry; mechanistic details, such as the use or non-use of hash functions, are not particularly key.

Now, how workable this is for unrelated parties on the global internet obviously depends on the unspecified, out-of-band means by which Alice finds out about Bob's capabilities. I can come up with schemes which do the job for completely unrelated parties, but all of mine involve either the creation of some global registry for contact information such as layer names (and what they signify), or at least the exploitation of an existing global registry, such as the DNS.

Of course, I may very well be missing something here...

rst