For review: day 1, day 2, day 3
RB Robin Berjon, OG Oliver Goldman, SP Santiago Pericas-Geertsen, PT Paul Thorpe, ML Michael Leventhal, TH Takanari Hayama, CB Carine Bournez, FD Fabrice Desré, ED Ed Day, DL Dmitry Lenkov, JS John Schneider, DB Don Brutzman, PH Peter Haggar, SW Stephen Williams, KR Kimmo Raatikainen, JK Jaakko Kangasharju, TK Takuki Kamiya
RB welcome! agenda reviewed. Be prepared for the Tech Plenary in Boston, including bad weather! http://www.w3.org/2004/12/allgroupoverview.html Please register for the workshop and hotel. The plenary session is Wednesday March 2 with a relevant plenary: Future of XML. We are meeting Thursday/Friday March 3-4. RB some of us have been contacted by journalists and the press; some coordination and internal discussion is advisable since this is challenging. RB the WG will shut down in 2 months; we appear to be on schedule to complete satisfactorily. Discussed issues around applying for a follow-on group charter. CB There is at least a 4-week AC review period for a new charter. May is very busy for the team; June 5-7 is the AC meeting, Cannes Royal Casino. RB Writing a charter is difficult but something we can do in March. Also issues to consider with getting tied to XML 2 or XML Core. PH Updates provided for the Measurements Status & Owners Document. RB cvs issues - is it working for others? apparently so. (discussed details) RB Review follows for the Measurements Status & Owners Document. PH 7.3 Compactness looks ready, list comments integrated http://www.w3.org/XML/Binary/Measurement/xbc-measurement.html#compactness RB What about getting 7.3.7 Measurement of XML 1.1 filled in? JS Is lossy/lossless defined? PH Properties 5.3.2 http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness JS Concerned it is not precise enough to be measurable, maybe too subjective. OG, RB tricky issues relating to "0" or "1" data model issues... (discussion) RB tradeoff between precision and measurability. JS difference between JPEG lossiness and losing extraneous whitespace. DB equals the difference between application-specific lossiness and infoset. RB round trip measurements definition, perhaps purposely a bit vague? 
http://www.w3.org/XML/Binary/Properties/xbc-properties.html#round-trippable let's first sort out 7.26 Round Trip Support http://www.w3.org/XML/Binary/Measurement/xbc-measurement.html#round-trippable OG discussed round trip may be XML->binary->XML or vice versa. DB two subheaders in 7.26.3 are hard to distinguish. RB editorial improvements will be good here. OG note that these closely similar names are in the Properties document also. JS question about "data models" (DMs) in 7.26, have we ever made a decision on these? OG discussed byte preserving versus character preserving. JS "the set of data models which can be represented in XML" -- what do we mean? OG there can be a set of data models; we likely don't want to come to a decision on which model is used. RB data models may be a decision by the implementer. Noted: both definition and property defined in Properties http://www.w3.org/XML/Binary/Properties/xbc-properties.html#xml-data-models http://www.w3.org/XML/Binary/Properties/xbc-properties.html#data-model-versatility JS,OG [discussion on data models] there can be many DMs implied in addition to those 3 defined. DB of the specified/well-defined DMs listed, is it possible to support all of their requirements simultaneously, or are there internal collisions? JS thinks they could all be composed compatibly. RB we might consider aligning with the XML Infoset. JS different applications can have quite different DM requirements. DL DM Versatility might well lead to configurability. RB considerations of a Generality property. RB among the many issues, transcodability to XML is paramount. DL the number of DMs is enumerable but not yet final. RB so, the round trippable measurement is still in progress, not yet golden. RB back to compactness. OG we should focus on the methodology required for XML Binary, not broader than necessary. DL the measurements document has been difficult; the other documents are pretty great. OG let's pop up and consider our goal recommendation for proceeding with XML Binary, and the measurements document purpose 
may be clearer. OG how will we refer to the measurements document in the characterization document? JS measurements are how we might someday compare and get beyond individual existence-proof examples. OG why not focus our efforts on measurements corresponding to critical properties? DB, SW want to use measurements in a tool that compares algorithms and produces metrics. JS measurements let us prove to ourselves and show others that solution(s) are possible for a successful standard. RB game plan for next session: assess properties critical/desirable/nice, go thru the decision tree, then go through corresponding measurements -- break Task: Establish critical properties for measurement taken from Use Cases OG: Has matrix of ranked properties, will update and minute changes RB: Establish list of currently defined properties, OG recording in his matrix RB: Changing Version Format Identification to Format Version Identification RB: Ubiquitous Implementation will be changed to Widespread Adoption RB: We have several properties that haven't been written: Generality, Directly Readable and Writable. RB: Support for Open Content? - listed in military UC JS: That's just dated, need to use the new name RB: Support for the RDF Family should be Support for the RDF Stack RB: That's all 40-something of them OG: I have 33 properties RB: Property document has 38 total - there should be 41 OG: I have 41 Now recording Must-have, Should-have, Helpful into matrix recorded by OG RB: Helpful was updated in telcon to "Nice" RB: Matrix filled out for Metadata in Broadcast Systems RB: Matrix filled out for Floating Point Arrays (by group, Roger not present) RB: Random access? JS: Random access in memory is important, but not as a property of the format. Doesn't require additional information in the format, just that things are constant size. OG: I think that this is still random access in the file. JS: To be clear, the objective is to get into memory fast, into structures. 
SW: In my use case they need the ability to access randomly by chunks. RB: I think it is Nice for this one. OG: Perhaps not relevant at all. RB: Leave off. RB: Next, matrix for X3D, DB present to give properties RB: Discussion of content type ... JS: I don't know what it means RB: I want to update it, minimal: defines content type or content encoding RB: It is in email from before the last meeting; people thought it was way too strict and it needs to be softened DB: Difference between Random Access and Efficient Update JS: Read versus Write OG: In Ottawa, we only had a small number; our standard for Must was "Would you not use the format if it didn't have this?" OG: Now we have 14 and 19 for the first two UCs. RB: Some Musts are kind of baseline and weren't listed before, such as content type - a W3C standard must deal with this DB: We would need a content type RB: But if it were not in the standard DB: We'd define our own RB: If you'd define your own, you would use the format and define your own, so it isn't a MUST OG: You could work around the lack of content type in the format DB: Encryptable ... 
top 10 requirement OG: Same RB: The way it is written, literally, the format doesn't prevent encryption DB: Don't want XML Encryption to be broken by binary JS: You want to use XML Encryption tools with the binary format DL: Two possible semantics of MUST - I won't operate, and I won't use it in my environment, in persistent store OG: I was referring to the latter RB: I wouldn't use it for my use case DL: but I could still interoperate OG: I would ignore the standard PH: I don't agree, I may not ignore the standard PH: e.g., compactness, I may require different levels, cell phone vs compactness PH: It is important that I have compactness, but what does it mean if it isn't there? I may still be able to use it RB: We have some sense of this, 10% compactness is not enough; generally say we want the standard to pay attention to this DB: I don't think our MUSTs are creating a wall, because we've achieved this with Fast Infoset and other efforts -- lunch break OG checking properties MUST/SHOULD/NICE in the properties spreadsheet. **ACTION ITEM : 3.6 (Embedding external data in XML documents) HAS BEEN RESOLVED DEAD 3 TIMES ALREADY** Use cases : - WS for small devices - WS in the enterprise - Electronic documents - FIXML in the Securities Industry JS: Why do you need human readable? TK: For archive accessibility. JS: Is it a MUST or a SHOULD? TK: Maybe SHOULD. Compactness is more important. - Multimedia XML Documents for Mobile Handsets - Intra/Inter Business Communication - XMPP Instant Messaging Compression - XML Documents in Persistent Store - Business and Knowledge Processing - XML Content-based Routing and Publish Subscribe ML: Are compactness and time complexity the same thing? RB: No. Time complexity is about the coding/decoding time. - Web Services Routing - Military Information Interoperability -- break OG - the spreadsheet is now much different from before. sorted by "must", "should" then "nice". 
maybe the first discussion should be fragmentable, assuming some threshold? DL - platform neutrality is currently surprisingly low. DL - we need to give more precise measurement definitions for the top ones. OG - noticed there are some props that have no must-have ucs. OG - there is another chart showing use-case by use-case statistics. DB - looks like some use cases are heavy-weight while the others are light-weight. OG - top props seem to have already been supported by xml 1.1, except for compactness and time-complexity. RB - the characterization doc will mention that. RB - noticed only one uc requires low implementation cost. OG - (mentioned something like "1/3 of the cases are...", sorry missed it.) DL - are we going to look at algorithmic props? e.g. compactness and time-complexity. doing so may not be appropriate. there's no way to standardize on an algorithm. JS - those are there just for not preventing a better algorithm. RB - xml 1.1 is not well designed for space-complexity, for example. OG - round-trippable is ranked relatively low. seems odd since integratable to xml family is higher on the other hand. RB - if directly readable and writable and integratable to xml, it does not matter whether it is round-trippable or not. DB - are there any examples that require integratable to xml stack but not round-trippable? RB - schema-based encodings. they are not round-trippable, but still are integratable to the xml family to a certain degree. OG - content-type management and data-model versatility, though they are not highly ranked, we may want to include them anyway. JS - the criteria we use for cut off should be based on which are the ones that are important for a successful standard. OG - let's start listing properties that are important for us? OG - there are two things meant in time complexity. one is "don't do silly things in the binary format that naturally make it slow", and the other is "do not prevent fast algorithms." DL - agreed. 
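The two readings of time complexity above (don't make the format inherently slow; don't prevent fast algorithms) imply measuring relative decode speed. A minimal, illustrative harness in Python - the "binary" form here is just a pre-tokenized stand-in invented for this sketch, not any format discussed by the group:

```python
import pickle
import time
import xml.etree.ElementTree as ET

def avg_seconds(fn, reps=50):
    """Average wall-clock time of fn over reps runs."""
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps

# A regular document with many small elements.
doc = ("<root>"
       + "".join(f'<item id="{i}">v{i}</item>' for i in range(500))
       + "</root>")

# Stand-in "binary" form: the same content pre-tokenized into tuples,
# so decoding skips character-level tokenization entirely.
binary = pickle.dumps([(e.tag, dict(e.attrib), e.text)
                       for e in ET.fromstring(doc).iter()])

text_time = avg_seconds(lambda: ET.fromstring(doc))
bin_time = avg_seconds(lambda: pickle.loads(binary))
print(f"text parse {text_time:.2e}s, pre-tokenized decode {bin_time:.2e}s")
```

As the discussion notes, such numbers measure an implementation pair as much as the formats themselves, which is exactly why the group wanted the methodology written down.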
OG - xquery, xpath, xslt - three big ones that come to mind when thinking about integratable to xml family. DL - there are two aspects: user impact and implementation impact. user impact needs to be minimal. JS - the thing is that some formats are unable to integrate into the xml stack. DL - integratable to XML stack does not sound like a property of the format. we still do not have a clear definition. should be more explicit. OG - how about dom or sax? having them will make it integratable. RB - what about signature? OG - it's a general desire not to reinvest in existing deployment. OG - content-type management will have to be included anyway. JS - it's quality of the spec. OG - it'll be there for additional consideration. OG - fragmentable? JS - required for the military use case. it provides a flexible way to share information and allows more dynamic exchange. you may not know the parties you are talking to in advance. RB - broadcasting requires fragmentability as well. is xml 1.1 fragmentable? SW - yes, but only poorly. OG - seems like 9 use cases like it anyway. DL - streamable means streamable in one pass. RB - at least for reading. OG - encryptable and signable. can they be layered? or do they need inherent support? RB - now should look at ones with a very strong should? SW - go through remaining ones? RB - wide-spread adoption. JS - that's the definition of a successful standard. SW - it is testing the future. SW - Random access? OG - not convinced random access and round-trippable ought to be in the list. SW - some trade off between compactness and random access, but not been proven to be impossible to implement both at the same time. RB - random access is a cool feature, but not convinced it should be on the list. there are three ucs that require it. SW - you cannot do random access on top of the format. PH - routing *requires* random access. OG - a better approach would be to have a place to put an index. TH - if a mobile handset receives it, what to do with it? 
SW - you can just ignore the index. RB - random access is not required to make the standard successful. JS - think about the decision tree. ask "is it good for xml 1.1"? RB - walking through the decision tree. JS - if you reserve a space to put an index table, it does not require updating the table. OG - if it is not being updated, it should not be in the standard. to be able to update, it needs to be made known to the processors. being opaque is not enough. SW - we should make a strongly-suggested list and add random access to it. PH - the one we are currently working on is a strict, minimum list. absolute requirements in other words. SW - understood, but still no properties should disappear. PT - where to keep it? RB - characterization doc. DB - half of the must-have ones are already supported by xml 1.1. RB - will revisit the issue later. PH - is compactness supposed to be an algorithmic prop? RB - it's space complexity. JS - small footprint covers code size. space complexity is about heap size. RB - somebody needs to get time and space complexity specified right. JS, SP, SW and TH need to coordinate to finish them according to the minutes. RB - space complexity is likely to be in the list. JS - support for arbitrary instances is not well understood. xml 1.1 has this prop: if an instance does not match the schema, an xml 1.1 instance can still be created though it is invalid. question is: if schema-invalid, you cannot create a binary file? RB - that's right, unless you have a fallback way to represent it. PT - is support for arbitrary instances required for widespread adoption? RB - adding it since there's no strong objection. PT - are we ordering the list? RB - look at the spreadsheet for formal ordering. OG - we should put an identifier anyway. RB - generality - put it on the list for now. will revisit after it has been well-defined. RB - we will have a 2nd list for later consideration. RB - efficient update will go to the "should" list. SW - disagreed. RB - robustness, accelerated sequential access, etc. 
should also go to the "should" list. SW - which ones should go to the "should" list? RB - will do the "should" list tomorrow. DB - having 6 use cases looks to be the threshold for getting 80-20% coverage. the minimum list that we have is insufficient, though not saying it is not useful. RB - just compactness and time-complexity will add tremendous motivation. DB - compactness and time-complexity are not enough to convince people. RB - we'll have a "should" list, which is the way we address the 80%.
[Properties required for wide-spread adoption]
- Directly readable & writable
- Transport independent
- Human-language independent
- Time complexity (not prevent fast implementation)
- Integratable to XML stack
- Compactness
- Royalty-free
- Platform neutrality
- Content-type management
- Fragmentability
- Streamability
- Small footprint
- Space complexity
- Support for arbitrary instance
- Format version identifier
- (Generality)
ACTION: OG to add notion of costs in decision tree ACTION: RB to merge draconian and error handling into robustness RB: Discussion has indicated there are some distinctions in the "must" list developed yesterday. Some are "real musts", some are "must not prevent." Have people given any thought to this? The second thing is what the shoulds are that should be kept in mind if a binary format is to be designed. DL: Also yesterday Santiago suggested maybe a separate group of W3C requirements, or something along those lines. RB: Those things would be: royalty free, platform neutrality, transport independence, human language independent, content type management. DL: I think there was something else on the W3C list. RB: Format version identifier is not required by W3C. KR: Should we mention royalty free? If we apply the patent policy, then the proposal has to be royalty free. RB: These are the minimal properties we believe are required to be successful as a W3C standard. Cutting this list into groups might help make the set seem tractable. Quickly reach a point where you only want something that is compact and fast. OG: We should also identify which use cases have musts which are not on this list. PH: Conversely, which of these on the list don't show up as musts anywhere. JS: Yesterday we did a cut line in the properties and reviewed things below to see if we needed them anyway. Should review if maybe those properties are use case requirements that were missed in the use cases. Otherwise it is more difficult to understand the results. DL: All W3C properties are yes/no properties. --OG's PC hung up RB: we may set a 'should not prevent' level, and let the next group decide on each property. 
---------------------------------------
W3C reqs:
- MUST be transport independent
- MUST be human language independent
- MUST be royalty free
- MUST be platform independent
- MUST integrate into the media type and encoding architecture
Hardcore reqs:
- MUST be directly readable & writable
- MUST have low time complexity / MUST be faster than XML 1.x
- MUST be more compact than XML 1.x
- MUST be fragmentable
- MUST be streamable
- MUST support arbitrary instances
- MUST have a format version identifier
- MUST support generality
- MUST not prevent small footprint implementations
- MUST not prevent low space complexity implementations
Should facilitate (but perhaps not support directly):
- Deltas
- Encryptable
- Signable
- Accelerated Sequential Access
- Random Access
---------------------------------------
OG: Random Access may not be in, as it requires people to keep the table up-to-date SW: there is a difference between diffs and deltas OG: is delta efficient update? or a way of keeping track of the update? SW: can cover both ML: delta seems to be a specific technique for efficient update DL: how about saying not to prevent efficient implementations of the properties OG: we have to answer whether everyone wants to pay the costs to facilitate DL: ML: we should have a single conformance DL: that's what generality is all about RB: if you have 1 format w/ 10 options, we have 100 formats DL: RB: we can do that w/ extension points SW: adding a delta layer is not a big deal OG: incremental update makes things complicated, and may not be able to extract correctly RB: i can implement a format like TIFF for binary which will not be implementable for everyone RB: what do people think about this 'should facilitate' OG: i like 'facilitate'. i just don't think random-access goes there. -- break RB: back to random access OG: there was one interesting idea. if we facilitate random access, and if a processor doesn't want to update the table, then they may drop it. 
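One way to read the "facilitate random access" idea above is a reserved index table at the front of the file: processors that care can seek via it, while purely sequential processors can skip (or drop) it. A toy Python sketch - the layout, field widths, and function names are invented for illustration, not any proposed format:

```python
import struct

def encode(records):
    # Hypothetical layout: [count][offset table][length-prefixed payloads].
    # Offsets are relative to the start of the payload area, so a
    # sequential processor can ignore the table entirely.
    payloads = [r.encode("utf-8") for r in records]
    offsets, pos = [], 0
    for p in payloads:
        offsets.append(pos)
        pos += 4 + len(p)
    header = struct.pack("<I", len(payloads))
    table = b"".join(struct.pack("<I", o) for o in offsets)
    body = b"".join(struct.pack("<I", len(p)) + p for p in payloads)
    return header + table + body

def read_nth(buf, n):
    # Random access: jump straight to record n via the index table.
    count, = struct.unpack_from("<I", buf, 0)
    off, = struct.unpack_from("<I", buf, 4 + 4 * n)
    base = 4 + 4 * count
    ln, = struct.unpack_from("<I", buf, base + off)
    return buf[base + off + 4 : base + off + 4 + ln].decode("utf-8")

def read_all(buf):
    # Sequential access: skip the index table and read length prefixes.
    count, = struct.unpack_from("<I", buf, 0)
    pos, out = 4 + 4 * count, []
    for _ in range(count):
        ln, = struct.unpack_from("<I", buf, pos)
        out.append(buf[pos + 4 : pos + 4 + ln].decode("utf-8"))
        pos += 4 + ln
    return out

buf = encode(["alpha", "beta", "gamma"])
assert read_nth(buf, 2) == "gamma"
assert read_all(buf) == ["alpha", "beta", "gamma"]
```

This also illustrates OG's update concern: any edit that changes payload lengths forces the writer to rebuild the table, which is the cost the group debates below.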
SW: or we can mark it to indicate the table may not be up-to-date OG: john was mentioning whether we do this for text as well. SW: meta data should be preserved. OG: maybe a PI is the perfect place for this. ML: putting the index in a PI is a terrible idea. it must be in a separate place. ML: simply layering a random-access index will result in bad design. RB: you can put a fixed PI in the beginning of the file, like in text. RB: for facilitating, we need extension points. OG: facilitating is ok, but random access will increase complexity RB: the benefit you may get is bigger than the cost for extension points JS: has the CDF working group decided how to handle unknown namespaces? RB: not yet, but working on it. JS: rather than forcing all processors to indicate that they don't understand, we should force processors that understand to fix the index table SP: using a different space for extension points doesn't work for many applications. JS: that's why i asked about CDF. SP: not possible to validate though. RB: do we add extension point to the list? OG: extension point has 1 should (broadcast), 4 nices (web3d et al) RB: the idea of having an extension point is that it could be used to extend the format in the future. RB: how many of the should properties could be helped by extension points RB: accel sequential access -> yes RB: delta -> no (by SW) RB: random access -> yes RB: error correction -> yes RB: efficient update -> no SW: embedding support? may be yes. RB: should we support extension points? DL: should is ok. must, i doubt. SP: why is 'fragmentable' MUST? DL: there is a strong use case for it, like information sharing. RB: robustness? broadcasting. you cannot drop the entire document even if you lost a part of it. JS: can't you wait for the next carousel? RB: no, it'll take several minutes to restart. DL: would robustness be helped by extension points? RB: no. if you find a not-well-formed part in an xml document, the xml processing rule requires considering the whole document an error. extension points won't help to change this rule. 
discussion goes on.... CB: what robin is saying is that if the base spec defines some error handling rule, an extension spec cannot go against it. RB: what i'm saying is that we can do it with extension points, but in the base spec, we should not force the text xml error handling rule, i.e. that you should throw away entire documents with errors. -- lunch break Property demand (continuation) Efficient update (one must) SW: should not prevent, in the same sense as in time complexity TH: delta is one way to achieve efficient update SW: tokenized length, update in place TH: the worst case is that you need to encode/decode everything RB: very related to random access SW: the most efficient way to implement efficient update is to have efficient handling of deltas OG: efficiency can be achieved on higher layers - I do not see a use case that requires efficient update on the format level RB: XMPP headers is not a MUST OG: Efficient update concerns local modification - not routing, when you need to copy the whole message from the in-port to the out-port OG: appending is a good way to understand this property SW: appending does not work well in all cases SW: we meant writable random access .... CB: This discussion starts to be out of our scope RB: Can this not be done as an extension point SW: This must be done on a low interoperable level OG: The XInclude example should be kept in mind SP: We should keep in mind that everything that can be layered should not necessarily be layered. 
Conclusion: This property is MAY NOT PROHIBIT Embedding support RB: Can be layered Round trippable KR: this may be a requirement independent of a specific use case, in the same sense as royalty free OG: I take an action item to draft an equivalence requirement for the characterization document RB: This is a must up to some lossless equivalence class RB: We should describe the problem and minimal requirements but not the solution KR: We tell the requirements to the next working group --break RB: Next up, grouping properties into those that are boolean for ease of the Measurements document. RB: Want more detailed descriptions than boolean for: small footprint, space complexity, time complexity. DL: Time complexity -> must not prevent DL: Small footprint -> must not prevent RB: Need descriptions in the Measurements document for sf, sc, tc. SP: Have we split time complexity into two things? RB: Will have a description and boolean for each of the algorithmic ones. DL: Maybe the discussion should stay in the properties document? RB: No, we're really talking about how we would measure them if one were going to be hardcore and precise. Leads to those measuring in greater depth. SP: The word "complexity" has a very specific meaning. Discussion about time complexity vs. processing speed, etc. RB: Should say it measures the relative speed of two formats. DB: Something that has processing parity doesn't have this property? OG: All formats have this property. The question is, what's the value? JS: Could have multiple implementations of the same format. Don't address how we separate other issues from the format. DL: We're not doing measurements, we're describing how to do measurement. JS: We need to describe how to separate measurement of format/algorithm vs. unrelated. OG: There are whole books about that subject. OG: Could take a first cut without doing measurements. SP: Looking at a specification and trying to do that? DL: Yes? 
SW: The theoretical and empirical measurement values are two different views that provide evidence of the fitness of the format. Additionally, something is shown about algorithms, but that is secondary. DL: Santiago and I feel the measurement aspects should be separate. In the Measurement document there should be only one. RB: Why? In requirements we have just a boolean, but in Measurements we should document measurement methodology. DL/RB: Discussion about whether speed measurements were in the methodology doc. We didn't resolve that they weren't; we need them. CB: The issue isn't just with that property, but all properties in general. The methodology document has guidelines... DL: The part that measures different parsers is not part of this or the next group. We can say how it's done. CB: That is useful to compare implementations, but it is not useful to prove that a binary format would be better. SP: My objection is not about percentages to explain our methodology, but that I would like an additional property. RB: I believe we are in violent agreement; the argument is how to organize it in the document. DL: There should be two different pieces. RB: Which two pieces? SP: The Property document boolean should talk about whether it prevents it? RB: The methodology doc is not just about the format. You are suggesting that the time complexity text, possibly renamed, gets into the measurements document. RB: Resolution: A) time complexity name update: Santiago B) Measurements doc, in properties section, needs a boolean about whether it is faster than XML or not. - DL C) Current complexity info goes to a new section of the measurements document that talks about how to measure implementations. -> goes into appendix - M editors RB: Santiago: update property side of small footprint editors: put measurement of small footprint description in the right place in the measurements document. Space complexity: Take guideline measurements and boolean measurement. RB: Accelerated sequential access: does not prevent. 
SW: Question: should we put everything about a property together or in another section? RB: Hold the question, see how measurements fill out. KR: Time is no limitation, could run for 2 weeks. A format could use caching for repeating entities. Could depend on how many times things appear in the data. ML: Delta is totally application usage specific. JS: There was an attempt to identify some important cases. (Data? Scenarios?) RB: I think the measurement is ok the way it is. Measurements will be in ranges of speed. DB: editorial: change 1/0 to true/false, etc. ED: Why are other aspects such as lossy, lossless, etc. there? RB: Those are important when looking at properties. Those that have the option of being non-lossy need specific details about measurement. SP: Should have links between documents. - AI: editors Group discussion: Data Model Versatility: Push to syntax level. Discussion about measurements, whether we have to do all of them: we must have a solid measurements document. At least the main properties. Could have important new use cases that have properties that we don't have measurements for. SP: Suggest that we delete the data model property. This is especially true given our syntax base decision on the last one. OG: The roundtrippable and data models discussion in characterization covers the same issue as data model versatility. PH: Are we doing all of the properties? RB: Have produced the priority list, shown. We are going to prioritize properties and create simple measurements for each one of them. Some will have methodology descriptions. Of the first 5, shouldn't need any discussion. RB: AI: take the first 5 and put them in the booleans section without further description. KR: Transport Independent may put requirements on the transport service. Need to note assumptions that the format makes about transport: ordered, no missing bytes, etc. SP: There are filesystem assumptions also. RB: That's not what Transport Independent is trying to do. 
The measurement is whether the format makes any assumptions (many, few, tied to transport). RB: vote: everyone agrees that XML is transport independent, therefore support for guaranteed transport is transport independence. RB: Directly readable and writable. Boolean? Yes RB: Must exhibit some compactness. JS: Won't satisfy my customers. Shouldn't be boolean. RB: It doesn't say boolean! (anymore) Current description from measurements document. RB: Fragmentable: boolean (some discussion) RB: Support for Arbitrary Instances, boolean RB: Format Version Identifier, XML 1.0 has it. RB: Generality: waiting on John's description. RB: Roundtrippable: boolean? Many booleans? byte for byte, equivalence class, doesn't support it. OG: Meets is boolean, but the measurement doesn't change: trinary. RB: Extension Points: boolean, has greater granularity, but not needed here. RB: Robustness: can be boolean. RB: Deltas: trinary RB: Encryptable: bool/trinary RB/JS: Trinary: prevents, does not prevent, supports directly RB: Signable: trinary. Accelerated sequential access is trinary RB: Random access, embedding support: trinary RB: Efficient Update: trinary
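The roundtrippable measurement above distinguishes byte-for-byte identity from membership in a lossless equivalence class. A small Python illustration of that difference - the normalization chosen here is just one plausible equivalence class for the sketch, not the group's agreed definition:

```python
import xml.etree.ElementTree as ET

def normalize(xml_text):
    """Reduce a document to element names, sorted attributes, text, and
    children, discarding lexical details such as attribute order and
    quote style."""
    def walk(e):
        return (e.tag, sorted(e.attrib.items()), e.text or "",
                [walk(c) for c in e])
    return walk(ET.fromstring(xml_text))

doc1 = '<a x="1" y="2">hi</a>'
doc2 = "<a y='2' x='1'>hi</a>"  # same content, different bytes

assert doc1 != doc2                        # fails byte-for-byte round trip
assert normalize(doc1) == normalize(doc2)  # same lossless equivalence class
```

A trinary measurement as proposed above would then report "byte-for-byte", "equivalent up to a stated class", or "not round-trippable".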
Actions: Everyone who has Property and Measurement updates must have them done by Feb 7; a few can trickle in on Feb 10th, but that's the latest in order for the editors to generate solid review copies of the respective documents. RB: Publication schedule. First doc out would be Measurements in the next two weeks. PH: Who has actions? RB: Takanari, Dimitry, Robin, Santiago SP: Should we publish a new version of Properties? RB: Yes, how close are you? SP: I can do it in 2 weeks. PH: We need an update to the UC doc too, as Property names have changed and we updated the Must, Should, and Nice lists. SP: Can't commit on the UC doc for the next 2 weeks. RB: Doesn't have to happen in the next 2, but needs to happen in the next 4. There is a publication moratorium around the TP time. RB: Where do we publish Characterizations during all this? KR: Time scale is a problem. RB: Could publish right after TP. KR: That would be good. Can't update all docs at the very end - too hard to keep track of what's going on. RB: Need a final stable document on these dates: UC - 2/17, Properties - 2/17, Measurements - 2/17, Characterizations - will pick a date at next F2F RB: We will request publication for the following week, the 21st or 22nd. SP: Do we have the final list of Properties? RB: I put the final list online. Draconian merged with Robustness and nuked Data Model. RB: Any property updates need to be done by 2/10. PH: We need all Measurements by 2/7 to have a draft to review on 2/10 in order to generate a final version on 2/17 that is ready for publication. SP: But we won't have all of the Properties by then. RB: Can you deal with a few of the properties not being ready until the 10th? PH: SW: Yes, not optimal but we'll deal with it. RB: People need to focus on Properties and Measurements. OG: I'm curious about the rest of today and our next F2F. It would be nice to have a conclusion now and use the next F2F as wrap up. RB: I agree. 
We need to look at: do we have enough to justify a binary standard? Is it possible to make a standard that corresponds to our requirements? OG: The categorized list is the most useful. Let's look at it. RB: I think we are agreed that a standard that had all these properties would be successful. OG: I claim that XML already does all of this. RB: I agree OG: You could do all of this if you tweaked XML in a couple of places. RB: Compactness? OG: That's the toughest one. OG: Introduce a new tag that allows binary encodings instead of string encodings for everything. You can make minor modifications to XML instead of writing an entirely new spec. PT: In my experience, the biggest overhead is the size of the tags in many cases. OG: For some UCs it's not the size of the tags - the energy industry, for example. OG: Is compactness the only thing that is difficult to achieve? SP: Processing Speed OG: Why? SW: You have to parse every character and do tokenization. OG: Why is it faster? You still have to read the binary file. RB: We need to be sure that what we have here is different enough from XML to justify a standard. OG: Different argument: does this justify binary XML or justify changing XML? DL: Do we need an alternate serialization? I think there is justification for an alternate serialization. OG: What if we tweaked a few things in XML and changed the version number to 2? Is that better than creating a whole new standard? I just want to push on this point. DL: XML works well; however, it has problems with Compactness, Processing Speed and Random Access. JS: XML works well for many applications but it is not compact or is too slow. We want what XML does, but more efficiently. DL: We are in agreement. XML does not have generality in the way John explained it. JS: XML does not have efficiency or compactness. OG: Can XML efficiency be improved? SW and SP: Yes, but not enough to satisfy some applications. ED: Do you want a different textual format? 
OG: How many properties do we have that are MUST? RB: About 16. OG: 20-25 on the screen total. XML 1 does all but compactness and speed. RB: Not great on the 'Should' list. OG: Yes, but the 'Shoulds' were there because we decided that they could be done on XML 1. Efficiency and compactness are the only ones that XML 1 does not do. So we have identified that XML 1 can do 95% of our list. SW & DL: No. OG: Where am I wrong? DL: Can't deal with XML's lack of compactness and speed. OG: I'm not proposing that we deal with XML the way it is; I'm suggesting we can maybe make some changes to XML 1 to solve these issues without a new standard. DL: I would put most of the items on the list as a MUST. OG: If you take the list of requirements, I think there is a strong argument for tweaking XML instead of a new standard. RB: It's not two properties that have to do with compactness and performance, there are at least 4. OG: I am going to claim that the only one you can't solve by tweaking XML is Compactness. JS: I don't think everyone understands your objective. OG: Make sure we go through this argument. I'm not convinced we have heard all the counter-arguments. PH: One of our documents was supposed to cover counter-arguments and argue against binary XML. DB: UCs are a great place to show where XML falls short. RB: Many UCs disappear if you don't have compactness. OG: Even if compactness is the only one, it may be sufficient to justify binary XML if achieving it requires something so fundamentally different. JS: I like what you are doing. You can process XML very efficiently in text, in low footprint. The big issue is size; my customers have bandwidth constraints. There is nothing you can do with text that will get them there. OG: Would a Fast Infoset encoding of these documents be good enough for your customers? JS: No. OG: How much smaller than Fast Infoset do they need? JS: An order of magnitude. OG: How much smaller is Fast Infoset? 
SP: 1.5 times smaller, but you can do other things to go smaller. PT: If you have a schema, we see 1/10 the size. JS: The CIO of the Air Force requires schemas to get the small sizes they need. DL: Me too; I use schemas to achieve the size reduction. SW: If you have an open-ended format like XML, you can add arbitrary stuff to it. What's interesting is adding capabilities while you are becoming more compact and faster. Text XML is just not efficient to process for general processing. JS: You can say it's not as efficient as binary. SW: It's just not efficient. SP: I'm not sure everyone would agree XML would support Fragmentable. Going back to Processing Speed: XML is slow because of tokenization. Binary XML has already made the decision of what is coming next. JS: That's what I meant by binary being more deterministic. SP: If it's binary, you can do numeric optimizations by storing stuff in binary. An example is what SOAP did in eliminating PIs and doctypes. DB: Not just tokens, but data that can be parsed and put into memory much faster. JS: There are assumptions when things are in binary. You don't have to determine if it's well-formed when in binary. It is more deterministic. Some of the assumptions you can make, people will push back to say they are invalid. PT: Regarding processing speed, a PER encoding could require 1/100th the processing power of XML in our tests. OG: I'd like to have something from JS and DL to understand what order of magnitude of compactness is needed for their UCs. JS: Stuff the Military is doing is in some cases 16 times smaller than text XML. OG: We need to get those numbers in the UC. JS: I will mention some of this in Generality as well as the UC. DB: We also need some of this information about size in Characterization. If this is our 'convince you' document, this data should go here. JS: Can you implement zip with ECN? It's all dynamic analysis. PH: There is something that we haven't discussed. 
I'm convinced that we can represent XML much smaller than text XML in a binary form. I'm convinced that we can process a binary form much faster. What I am not convinced of is that you can do all this faster in a real-time messaging environment. It takes time to compress the text data and put it into an alternate format. PT: Yes, we can do that with PER (Packed Encoding Rules). SP: You can do serialization in XML faster than parsing. If you do redundancy-based compression, it is slow. Your roundtrip is faster because serialization and parsing are faster. RB: You can make more assumptions in the binary format at serialization. PH: Are you assuming a schema-based compression? SP: No; if it's schema-based it will be several times faster. RB: We need to capture this in Characterizations. RB: You have to balance it. The more compact you make the format, the more time it takes, yes. PH: My concern is that if you achieve the right balance, will it be faster than XML by enough to justify all this? PT & RB: Yes. JS: You can do all this in a tunable way given the right algorithm. DB: Should we be mentioning candidate technologies by name and commenting on them? RB: The group is uncomfortable doing that; it is not our place. DB: I understand and agree, but how do we respond to people asking why we don't use X? RB: This is not what we are chartered to do. -- break RB: There are several things we need to turn into proper resolutions. 1. Should we recommend that the W3C develop a standard for binary XML? Sub-questions are: Is the strict set of requirements different enough from XML to justify another format? Is the cost of creating a "second format" justifiable? 2. If yes, is this doable? 3. What is the relationship between binary XML and XML? SW: I don't think it makes sense to just ask the first question. RB: We only need to create a minimal answer to move on. Something to look at is the charter. We need to make sure we have covered everything the charter says. 
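PH's balance concern above (more compression effort buys smaller output at the cost of time) and JS's "tunable" answer can be sketched with stock zlib compression levels. This is an illustration of the tradeoff only, not a candidate binary XML format; the sample payload is made up.

```python
import zlib

# Hypothetical repetitive message payload of the kind discussed
data = b"<reading sensor='a1' unit='kWh'>12345.678</reading>\n" * 2000

small = zlib.compress(data, level=9)   # most compact, most CPU time
fast  = zlib.compress(data, level=1)   # least compact, least CPU time

assert len(small) <= len(fast)         # more effort, smaller output
assert zlib.decompress(small) == data  # both round-trip losslessly
assert zlib.decompress(fast) == data
```

The level parameter is exactly the kind of tuning knob JS describes: a single algorithm, dialed toward compactness or toward speed depending on whether the use case is bandwidth-bound or latency-bound.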
CB: The charter says that we may identify some candidate solutions. SP: The answer to the first question is YES. ML: I will just say YES. It is easier to do a binary XML than to improve XML. SP: I have a comment on my YES: I believe the way to go is to fix other things in XML at the same time with an XML 2.0, not just do a binary XML. RB: That could take another 5 years. CB: Question for Michael: what about the binary approach? ML: I think we still need a binary format but can fix some of the problems in XML. FD: I would say YES, but tying it to XML 2.0 does not seem like a good idea since I am worried about the time it would take. TK: I vote YES, since Random Access and Speed are some of the problems we would address. OG: I have to ask a question first. If we recommend that the W3C develop *or adopt* a standard for "binary XML", then I would say YES. TH: Yes. CB: Abstain. RB: Yes. KR: Yes. SW: Yes. PH: I am not convinced yet. I am not convinced it is doable to optimize both speed and compactness at the same time. RB: I will rephrase it as YES if doable. PH: Take off the YES-if-doable and say Not convinced. PT: My answer is YES, and yes, it is doable since ASN.1 does it. JK: Yes. ED: Yes, something has got to be done. JS: Yes. DL: Yes. RB: How do we prove it is doable? OG: The issue I heard is whether speed and compactness can be done together. ML: We make a hardware device that does fast processing of text XML; with binary XML there are many more things you can do in hardware. Intel is looking at the XML problem and how it works with microprocessors. RB: If it is doable, we need to prove it by identifying some formats that do it. SW: We can do a first-pass survey to see who thinks it is doable, or not doable and why. PH: I am not convinced it is doable based on our internal testing. SW: What can we do to convince you? PH: What would convince me would be to have example software that we could try internally on our data to prove it is possible. 
JS: Are there some things you have seen in binary formats that you think would make your text things slower? PH: No. RB: You have a faster XML parser than those commonly available. Do you have an idea of how many use cases you would knock off with that text-based parser? PH: We would not knock off any that require compaction. SP: There are a lot of parsers that cut corners to improve performance. JS: There are a lot of ways to address performance, such as hardware, but it is more difficult to address compactness. OG: I think we want an existence proof. Why doesn't someone speak up? PT: I have: ASN.1. DB: XSBC. SP: Fast Infoset X.891. RB: BiM. TK: Fujitsu Binary. JS: Efficient XML. SW: ESXML. TH: XEUS. JK: XEBU. PH: CBXML. PH: Round-trip SOAP tests were as low as 12% and as high as 85% faster. They are decent, but not overwhelming. SP: Fast Web Services, using a schema-based approach, had performance up to an order of magnitude better than XML. SW: I am interested in more of the stack than just parsing. OG: Doability is not whether it speeds up Peter's test case, but whether it improves compactness and performance. SP: We need to make a distinction between micro-benchmarks and macro-benchmarks. ML: You should not assume that once you have the binary XML format you are going to do things the same way. DB: Optimizations are based on tradeoffs. We now have a prioritized set of properties suitable for multiple Web use cases - that didn't exist before. Therefore a new opportunity is available. Member companies with XML Binary experience and software can together produce a suitable, optimal XML Binary for the Web. SP: Parsing is not the only problem, but also data-binding. Schema-based approaches do extremely well with data-binding. RB: Of these formats, some of them are available, some are available in a "broken" form. Which of them are available? XSBC, X.69x, X.891, BiM (but not top notch). JS: Is this the place for PR of the available formats? 
DB: It is good for multiple things to be available, but it would be better to have lots of things up there. -- lunch break AI OG write existence proof for doability of binary xml to characterization document RB: back to second question, need action to write existence proof for characterization document RB: last question, relationship of binary xml to xml, not separate, benefit from progress of xml, not tied to xml 2.0 SP: imo if xml 2.0 happens, lots will be addressed, including infoset. that would be ideal for defining a binary xml. binary xml should go hand in hand with xml 2.0 if that happens RB: for 2.0 is it worth doing for just the identified fixes? SP: but good to do everything in same effort, even if takes longer, because changes in xml 2.0 can affect what binary xml is doing SW: would changes in 1.1 to 2.0 affect binary xml? ML: possibly RB: xml 2.0 would drop things from xml, but parsers would need to parse 1.x JS: what's the benefit? SP: binary xml should not go unnoticed like wbxml RB: it failed for reasons JS: people want something other than wbxml SP: xml and binary xml are not different things RB: xml and binary xml do not need to be same spec or same working group JS: i do not want binary to be coupled to xml, get more degrees of freedom, binary can inform 2.0. 
binary for 1.0 does not mean no binary for 2.0 JS: military guys want to move data from current data, that is xml 1.0 SP: if we have binary 1.x and then need more in binary for 2.0, that would be bad RB: 98% of changes proposed for 2.0 are syntax changes, nothing would add information items SP: qnames in content adds a new thing RB: that could be supported regardless of what we do with 2.0 SW: main goal of 2.0 would have to be to have a binary format SP: if binary was part of 2.0, it would have a bigger impact on industry than if it was separate SW: if decoupled, binary xml would move forward faster and lead to wider adoption, pressuring 2.0 to include it RB: a single working group with xml 2.0 and binary xml would be a single working group with two task forces, much like two working groups with liaison CB: task forces can be created more easily than liaisons between working groups SP: still think one working group is better than two DB: divide and conquer is good, need to deal with too many encodings with xml 2.0 going forward at the same time. need to track 2.0 effort and respond quickly if they are coming to consensus, significant portion of group's time would need to be outreach, to respond to people who see binary as a threat to xml DB: binary xml might also have recommendations for use, e.g. for long-term archival or cases where binary is a win ML: i am concerned about "integratable into xml stack" and how it might be interpreted. significant players need to be convinced for binary xml to succeed. if we require that every xml-related rec works with binary, that can be a straitjacket for us. RB: how much of the xml stack do you get for free and what are you willing to lose? 
losing just encrypt and sign might be acceptable ML: needs to be evaluated, xquery might also have problems ML: if someone asks for every xml standard if we are going to be compatible, do we have to answer or just let the next group figure out what integratable means RB: it is unlikely for that question to be asked, comes from w3c team, if next group does not get it right, rec will not be published JS: can always point to our documents, say we try to maximize interoperability with xml stack, interoperate except where it would cause too much harm DB: alternative term is consistent with the architecture of www, concern with that, there is for instance no dom there RB: we had the property, but split it into other properties JS: arch document is too high level to speak of dom RB: it says how to design a data format OG: consistency with www arch document would be less strict than integratable with xml, formats other than xml can be consistent with the arch document, we could end up defining binary goo, not binary xml JS: what does relationship in the question mean? RB: technical relationship, not procedural JS: has not been much talk about technical relationship JS: are you looking for things like alternate serialization of an xml document? alternate syntax to communicate same things, xml 1.0 does not separate concepts from syntax RB: binary xml is binary goo that can produce an infoset RB: binary xml needs something similar to infoset, in the sense of being syntax based SP: people have realized infoset is after-the-fact and do not want the same for 2.0, binary xml has to be in the same umbrella as xml 2.0, if binary xml becomes a niche and not an integral part of xml world, we have failed. if we can convince people, our chance of success is much higher RB: would binary xml need to be more than syntax? 
SP: not a bad idea to do everything together, if xml 2.0 does not have many changes, it should not take very long JS: alternate serialization for information in xml document to allow xml to be used in places where it cannot be used today SW: can become rigid, e.g. forbid new apis by requiring use of only dom and sax, need a standard to replace dom JS: not prohibit innovation SP: inventing a new api would to me be like binary goo, it would not be xml anymore RB: no prohibition for new apis in xml today SW: i do not want to be out of bounds to even discuss new apis RB: that is out of bounds for a format, apis done in another working group ML: e.g. random access would definitely require a new api OG: i could capture discussion to characterization document but possibly not the conclusion RB: characterization document makes no recommendations on process, that is not its purpose OG: we can recommend it be a part of xml 2.0 effort RB: we would get pushback RB: developing something with relationship to xml, to explain why it is xml and not goo, it is xml because it integrates into xml stack OG: a paragraph saying it could be a part of xml 2.0 but if there is no xml 2.0 it could be separate? JS: should not speculate about 2.0, what is relationship of binary xml to xml today? RB: not integrate with just existing but with future OG: future could also bring other xml-related specs, not just xml 2.0, need to integrate with that, trying to get away from 2.0 RB: i do not see value in making w3c process recommendations in characterization document SP: still need to discuss internally, but not put it in a document OG: what will we say in this section? JS: see above, alternate serialization OG: we have to talk about data models in characterization, what is the information in an xml document? 
SP: next working group makes that decision, js's proposal is precise enough RB: important to maintain investment in xml stack, that is why a lot of properties in must list are xml properties SW: should be a part of the recommendation OG: core of recommendation is that we recommend w3c to develop or adopt binary xml, which we believe doable OG: i still do not understand relationship, js's proposal does not describe relationship JS: it does not explain relationship to rest of xml stack JS: desirable for it to work with existing and future xml technologies OG: what about substitutable alternate serialization? JS: good OG: would any subsequent xml-related spec be needed to work as well both for xml and binary xml? RB: yes, that is good OG: i still think we need discussion on what is the same information SP: application needs to get the same information as it would expect from an xml document SP: very few specifications depend on xml syntax, rather on a data model RB: lots of xml-related specs in practice require xml syntax OG: maybe characterization should reference some data model SP: unlikely SP: liam said a strength of xml is not having a single data model OG: data model is required to interpret same information DB: xml c14n could be our data model JS: we do not want to go there, this document should not talk about data models OG: skipping over data models would not do DB: breaking any xml data model would be a problem RB: that is addressed by round-trippability SP: infoset is the information in an xml document OG: only one take on that RB: good to say it is impossible to pick a data model, so how do you define a serialization for something that does not have a data model? 
mark that we are aware of the problems but it is not our job to resolve those problems JS: xml has many different applications and selecting a data model cuts some of those applications off OG: something needs to go or you end up with byte-for-byte equivalence RB: question is what we need to say OG: every use case requires keeping elements and attributes, but comments can often be dropped SP: what use case will not work if infoset is picked as the information of an xml document? JS: people in military will want byte-for-byte OG: having to keep every byte does not give enough flexibility to design binary xml properly JS: you can do all that, a part of generality OG: you are saying alternate serialization should work on a sliding scale where you decide at each use what subset of information you want to preserve, which knocks out a lot of current binary xml formats JS: have to support high and low fidelity, according to what people want SP: how much more compact and efficient than xml? JS: you should be able to do better than twice as good, you can have schema-informed without being schema-dependent, use schema to deduce what is likely, non-schema data will be larger SW: only need to preserve things that are not what you expect, e.g. if default quotes are double, quotes need bits only when quotes are single OG: discussion about what information can or should be dropped relative to xml, important point that needs to be hashed out during actual drafting of standard, can do this without dropping anything, but possible to drop something, both of which are important for some use cases, giving binary xml more agnosticity than xml -- break ACTIONS: SW - check out the QA handbook to see if it can be used as ref for abstract scenarios SW - produce small proposal on abstract scenarios (may be canned). All people with products on product list: send in compliance with key properties ------------------ MINUTES: SW : would like to add items to meas doc. will E-mail to list. 
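SW's point that you "only need to preserve things that are not what you expect" is the deviation-encoding idea behind schema-informed compression. A toy sketch, with hypothetical field names and a simplified flag-per-field scheme rather than any real bit layout:

```python
def encode(values, defaults):
    """Deviation encoding sketch: emit one flag per field, and the
    value itself only when it differs from the expected default."""
    out = []
    for v, d in zip(values, defaults):
        if v == d:
            out.append((0,))        # expected value: flag only, no payload
        else:
            out.append((1, v))      # deviation: flag plus the actual value
    return out

def decode(stream, defaults):
    """Reconstruct the values, filling in defaults for flagged matches."""
    return [d if item[0] == 0 else item[1]
            for item, d in zip(stream, defaults)]

defaults = ["USD", "2005", "pending"]
values   = ["USD", "2005", "shipped"]   # only one field deviates
stream = encode(values, defaults)
assert decode(stream, defaults) == values
# Two of the three fields cost only a single flag each
assert sum(len(item) == 1 for item in stream) == 2
```

This also illustrates JS's "schema-informed without being schema-dependent": data matching the expectation shrinks to almost nothing, while unexpected (non-schema) data is still carried, just at a larger size.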
try to boil down things sent end-to-end, or router, doc on disk (i.e. different scenarios). OG : that is not what scenarios was about. one notion was that use case/reqs had items over different ranges. SW : abstract and property profiles SW : in each actual use case, there is a broad range of different things going on. abstract scenarios view this a different way. this describes how XML is produced and consumed. OG : what does all this lead to? SW : should be part of meas methodology, things that make sense in some and not others. ex. signing and encrypt. OG : how is it different from use cases? SW : use cases are user defs of what needs to happen, I am talking about a digested abstract version of what is actually happening. SP, OG, RB : why do we need this at all? RB : thought we needed it when we had a more robust meas document. not enough time left to go into this level of detail. SW : try to make as simple as possible (12 to 15 lines). point is if we don't have it, then with either theoretical or empirical comparisons of strategies, we don't have a common digested path and what is being tested/compared. analogous to having different kinds of test data. OG : I don't think we should bother with it. RB : agrees, maybe useful for next WG SW : thinks anyone looking at how this will be tested will miss scenarios RB : that is OK DB : QA handbook could provide a good framework for testing SW : ref handbook? DB : yes RB : agrees SP : thinks good idea, but we don't have enough time SW : does not think complete enough without this DB : canonicalization in char doc is important and would require some effort RB : not sure how it would fit in, believes editors have it in notes OG : confused, what does this have to do with bin XML? DB : compactness and speed would be affected ML : would have to serialize but not on the wire DB : example: sign a compressed document. if you round-trip and decompress, you must produce a canonical document so signing would work. 
OG : compression does not have anything to do with it. RB : is this covered in integ into XML stack? should be captured at a higher level somewhere; doesn't want to get into details. ML : bigger trap - must think of this as it is being done or it could be broken later OG : we could introduce a new canonicalization SP : can use decision tree to test this. all items on the wire may be affected RB : has been discussed DB : next group will examine some holy cows - should we suggest anything? OG : we have discussion between XML and bin XML of standards that might need to be modified SP : agrees. RB : other groups with an interest can look at it. OG : interesting and worthwhile to measure known standards against properties on must/should ML : can have creators do this RB : valuable to do one as an example
FastInfoset:
Transport Ind. : yes
human language : uses UTF-8, difference between tags and content
Royalty Free : yes
Platform neutral : yes
Media Types : yes
Direct RW : yes
compact : yes
fragmentable : yes
streamable : yes
self-contained : yes
version ID : yes
generality : yes
round-trippable : for some equiv class, not goop
small footprint : yes
space complex : depends on use of dictionaries - have options to include or exclude (can be external)
proc effic : yes
extension point : likely, text proposed as ballot comment
prevent robustness : depends on being well-formed - spec does not address this
DB : points out why problematic : can't assess any instance against this RB : syntax errors JS : once encoded, want to isolate problem to sub-trees RB : TLV example : can recover because of partitions (possibly). depends on where error occurs RB : XML spec says not allowed JS : impact of lengths on streaming? if you have long lengths, need to know in advance RB : can you have adjacent text nodes? 
SP : you can do that to do things in chunks
deltas : no
does it prevent : no
encryptable : yes, to extent of self-contained things
signable : canonical form not part of spec - a new one in the works will define it
ACA : yes using extension points (would not natively support)
random access : same as ACA
embedding : DNP
efficient update : no
BiM: supports all W3C (except RF) (see matrix for rest)
SW : how is X.891 streamable? PT : if you have an external dictionary, you can do it SP : one more comment on embedding support - can support MIME type SP : question on BiM signing - does it have a canon form without schema? RB : yes SP : are attributes ordered? RB : yes SP : does this affect performance? SP : does not think it can always be canonical RB : 3rd parties considered it signable SP : encryptable : since schema-based, less encryptable RB : does not prevent OG : where should this data go? RB : thinks it should not be included in the spec OG : would strengthen existence proofs; what should we put there? PT : would find it unusual not to mention them JS : disagrees SW : codecs are put through tests like this SP : should not be products, should be specs JS : might have a problem with this RB : thinks it would be good to do SW : should not matter if product or spec SP : should not be named PT : some names should be used SP : we should not publish matrix RB : comfortable with names of products and specs; example SVG shows impl names SP : disagrees - advertisement RB : except for Oracle and Fujitsu, all other names are products/specs -- adjourned
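The TLV partitioning raised in the discussion above (isolating a problem to sub-trees, and JS's question about needing lengths in advance) can be sketched generically. This is an illustration of the Type-Length-Value idea only, with a made-up record layout, not BiM, X.891, or any other candidate format:

```python
import struct

def tlv(tag: int, payload: bytes) -> bytes:
    """Type-Length-Value record: 1-byte tag, 2-byte big-endian length,
    then the payload. The length must be known before writing."""
    return struct.pack(">BH", tag, len(payload)) + payload

def walk(buf: bytes):
    """Walk top-level TLV records; the length field lets a reader jump
    past a record -- or a damaged subtree -- without parsing its bytes."""
    pos, tags = 0, []
    while pos < len(buf):
        tag, n = struct.unpack_from(">BH", buf, pos)
        tags.append(tag)
        pos += 3 + n          # skip header (3 bytes) plus payload
    return tags

doc = tlv(1, b"header") + tlv(2, b"\xff\xff garbage \xff") + tlv(3, b"footer")
# Record 2's payload is never inspected, so its bad bytes don't stop the walk
assert walk(doc) == [1, 2, 3]
```

The streaming cost JS points at is visible here: the writer must buffer each payload to compute its length up front, which is the tradeoff against the reader-side skipping and error isolation that the partitions buy.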