W3C Workshop on XML Schema 1.0 User Experiences and Interoperability

introduction [slides]

Michael Sperberg-McQueen: goes over process document introducing concept of w3c workshops. function of workshop: to advise membership and staff as to what the w3c should be doing. purpose: gather user experience; examine problems around xsd spec and test suite; examine proposed ways forward; gauge interest in further work in xsd; recommend course of action

Paul Downey: slots for each preso, 15 mins followed by 5 mins discussion. keep record of issues raised. at end of day, we'll spend an hour discussing what topics to discuss tomorrow

WSDL overview [slides]

Jonathan Marsh, Microsoft, Chair of the WSDL Working Group

Jonathan Marsh: Web Services is a simple concept… send some data to a service, it does work for you. what goes on the wire is data, typically xml in a message format. would describe that message using a schema language, typically xml schema. schema should be sufficient to describe the message. wsdl relies heavily on xml schema. response message also described in xml schema. wsdl is essentially a wrapper for schema. request and response messages are related in what is called a message exchange pattern (MEP). wsdl introduces a higher level concept of a group of related operations called an interface. have concept of binding. soap envelope itself is not described in a monolithic schema. wsdl uses the schema to describe the payload. wsdl is very similar to xml schema. has component model. has import and include. qname references. component designators. also some blurred lines of responsibility. wsdl-defined schema extensions. versioning. describing base64binary for describing binary data such as a jpeg. wsdl wg developed an annotation that can be added to schema components to specify the mimetype of binary content. describing references. wsa:EndpointReference… provides address and other metadata about an endpoint. created schema annotation: wsdlx:interface that can be added to schema component to indicate the interface of an endpoint reference. versioning. wsdl wg has one remaining open issue… versioning. known problems with <any/> wildcard and UPA, etc. some proposals have been made

Noah Mendelsohn: UPA [Unique Particle Attribution] [Analysis of the Unique Particle Attribution Constraint] means you know not only that an element is valid, but also which schema component makes it so.
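
A minimal sketch of the ambiguity UPA is meant to rule out, assuming a deliberately simplified content model: a flat sequence of particles, each either a named element or a wildcard, optionally skippable. The `Particle` class and `attributions` function are hypothetical illustrations, not part of any schema processor, and the real constraint is defined over the full content-model grammar.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Particle:
    name: Optional[str]      # None stands for an <any/> wildcard
    optional: bool = False   # True means minOccurs="0"

    def accepts(self, element: str) -> bool:
        return self.name is None or self.name == element


def attributions(model: List[Particle], element: str) -> List[int]:
    """Indexes of the particles that could claim `element` as first child.

    Walk the sequence, collecting matches, and stop at the first required
    particle, since nothing after it can match the first child.
    """
    claims = []
    for index, particle in enumerate(model):
        if particle.accepts(element):
            claims.append(index)
        if not particle.optional:
            break
    return claims


# Optional <ext> followed by a wildcard: two particles can claim <ext>.
ambiguous = [Particle("ext", optional=True), Particle(None)]
# Optional <ext> followed by a required <end>: only one particle can.
unambiguous = [Particle("ext", optional=True), Particle("end")]

print(attributions(ambiguous, "ext"))    # [0, 1]
print(attributions(unambiguous, "ext"))  # [0]
```

More than one index in the result means the element is valid but the model cannot say which particle made it so, which is the situation Noah describes.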

Jonathan Marsh: Henry Thompson came up with algorithm: "validate twice" [Versioning made easy with W3C XML Schema and Pipelines]

WS-I Report [slides]

Erik Johnson, Epicor, representing WS-I

Erik Johnson: lots of complaints from users in web services community regarding use of xml schema. not a lot of support for creating a profile of xml schema (subset of features) because you could end up profiling away important things for various domains. UPA rule not well understood and certain tools actually intentionally ignore it. easy to dig yourself into a hole designing extensibility models if you aren't an expert. no one really agrees on the correct way to do this. because modularity mechanisms are not explicit, difficult to recognize this if you are downstream of the original author of a schema. schema validation is rarely used in context of web services. performance issues with large documents. interoperability issues w/r/t xsd mostly manifested at design time. tried to make a list of schema constructs that were problematic

<Henry Thompson> Demo and web form for try-it-yourself validate-twice pipeline are linked from Public Pipeline Server

Noah Mendelsohn: to what degree is there consensus on which programming languages to support?

Paul Biron: roundtripping… if we start from this class structure, we know how to do this… the other side is much harder?

Paul Downey: interest in taking hash table, convert to xml and back to hashtable

Michael Sperberg-McQueen: finds that requirement puzzling or troubling

Jonathan Marsh: the way WSDL 2.0 is designed, you use the schema, but there are extensions that can help in mapping. all just frosting

Noah Mendelsohn: important point we may want to track: in some scenarios, only the typing of messages on the wire is important, but in other scenarios, we want the same sort of mapping for hashes as we get for integers. Many different programming languages have a concept of integer, and many of those types are close enough to support interop across languages. In those cases, users look to SOAP to help them get an integer in one language transmitted to an integer in another. To a lesser degree, the same is true of types that associate string keys with values, such as java.util.HashMap.

Michael Sperberg-McQueen: there isn't a well known idiom in schema to describe that. would it help if there were schema types that mapped to hash tables?
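
A minimal sketch of the hash table round trip under discussion, assuming string keys and string values. The `map`/`entry` element names and the `key` attribute are hypothetical choices for illustration. No such idiom is defined by XML Schema, and that gap is what prompted the question.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def dict_to_xml(table: Dict[str, str]) -> str:
    """Serialize a string-to-string table as <map><entry key="...">...</entry></map>."""
    root = ET.Element("map")
    for key, value in table.items():
        entry = ET.SubElement(root, "entry", {"key": key})
        entry.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_dict(document: str) -> Dict[str, str]:
    """Rebuild the table; an empty element is read back as an empty string."""
    root = ET.fromstring(document)
    return {entry.get("key"): (entry.text or "") for entry in root.findall("entry")}


original = {"sku": "A-100", "qty": "3"}
wire = dict_to_xml(original)
restored = xml_to_dict(wire)
print(restored == original)  # True
```

The round trip only works because both ends agree on the hypothetical `map`/`entry` convention. A receiver in another language would need the same agreement, and a shared schema type could supply it.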

Ümit Yalçınalp: in lang binding (jaxb) roundtripping is extremely hard… helps to be able to retain that information

BT report [slides]

Jon Calladine, BT

Jon Calladine: been involved in WS-I in the XML Schema workplan WG. manager of web services integration team in BT. me thanks. had a promising start in taking critical systems and representing them as web service. when we move from simple rpc to doc literal made things a little bit more difficult w/r/t interoperability at design time. interoperability was achieved using rpc/encoded. by virtue of simplicity. gave them the model that people wanted (data binding). code generation

Ümit Yalçınalp: what is meant by contract first

Jon Calladine: working from existing systems so they converted existing interfaces into wsdl interfaces. describes bearer mgt system architecture. 300k txns/hour. wanted to future proof the design. used contract first approach. started to use schema constructs that had not been used in previous generation of web services. had to constrain schema designers. give them a profile that they can use. not the answer because these things aren't carved in stone. trying to constrain developers is not a rewarding course of action. real killer for gateway application was mixed content support in toolkits. was either incorrectly represented or rejected. web services is actually supplanting existing b2b solutions. currently not publishing schemas. but this is an increasingly important demand by customers. don't want to process xml as documents. not all of BT, but a majority. tendency towards complexity can be constrained to something very simple minded. see graphic in jonc's charts!

Alexander Milowski: interested to know whether they plan on using validity assessment

Jon Calladine: not presently. partly because of performance

<Henry Thompson> curious to know if performance was improved, would their opinion change?

Michael Sperberg-McQueen: struck by regularity with which code-generation is coming up in discussion

<Henry Thompson> How much improvement would be required?

Noah Mendelsohn: some of us proposed in xsd wg but not formalized as a requirement

Paul Biron: recalls code-gen as being in the forefront of many people's minds

Michael Sperberg-McQueen: may have been blocking it out

OAGi report [slides]

Michael Rowell, OAGi

Michael Rowell: define schemas for such things as purchase orders, invoices, etc. use xsd as an alphabet if you will

<Henry Thompson> Is code generation == data binding, or more than that?

<Chris Ferris> yes, code generation == data binding

Michael Rowell: OAGi works with vertical orgs as well. identifies 3 ways to handle extension in xml schema xsi:type, any and extend/substitute. wants substitution groups for local elements.

Paul Biron: this has been asked for by many for several years

Michael Rowell: complexType derivation by restriction. tools could help in this area. schemaLocation consistency

Noah Mendelsohn: when you say redefine doesn't work across namespaces. if I'm going to write a redefine of a type, need a separate document

Michael Rowell: if I want to use a document you've defined but add an element to that, I have to redefine in your namespace

Noah Mendelsohn: for elements or types?

Michael Rowell: could be both

Noah Mendelsohn: also important to understand what the instance documents should look like. simpleType union, not a lot of tool support

Paul Biron: what do you mean by tool support

Michael Rowell: code generation/databinding issue mostly

Sridhar Guthula: does OAGi have a recommendation of constructs that should or should not be used?

Michael Rowell: yes

Michael Sperberg-McQueen: had question on charts regarding bags

Michael Rowell: model group all. treating a collection within a parent bag without particular order

Paul Downey: all used to bind java classes

Paul Biron: the problem is the restrictions on xs:all

Noah Mendelsohn: history - what we discovered is whether we like it or not, the order of elements in xml is important. whether your use case makes it interesting, xml as a spec says that they are different documents. order of attributes is not significant. asked to add a standard annotation that indicates that I really didn't care about the order. some of the trouble comes from the infoset spec
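
A small illustration of the point about ordering, using Python's standard `xml.etree.ElementTree`. The `po` documents are made-up examples. Swapping attributes leaves the parsed attribute set unchanged, while swapping child elements yields a different sequence of children.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def child_names(document: str) -> List[str]:
    """Child element names in document order."""
    return [child.tag for child in ET.fromstring(document)]


def attributes(document: str) -> Dict[str, str]:
    """Attributes of the root element as an unordered mapping."""
    return dict(ET.fromstring(document).attrib)


first = '<po id="1" status="open"><item/><note/></po>'
second = '<po status="open" id="1"><note/><item/></po>'

print(attributes(first) == attributes(second))    # True: attribute order is not significant
print(child_names(first) == child_names(second))  # False: element order is significant
```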

Michael Rowell: be nice if it were possible to have a bag of bags

Noah Mendelsohn: really infoset spec that I am appealing to. yes, we can add all kinds of annotation, but we've chosen not to do so thus far

??: co-occurrence constraints? are these a requirement?

Michael Rowell: yes. also capturing the constraints

Waleed Abdulla: any idea how many oagi users are using classes and code generation versus who are processing as documents?

Michael Rowell: we do track downloads, but as far as how they are used, we don't have stats, but a large majority are using databinding. discusses standalone BOD

Paul Downey: have we captured this point?

Michael Rowell: granularity of namespaces

[break]

HL7 report [slides]

Paul Biron, HL7

Paul Biron: healthcare messaging and documents. didn't see that many problems on the document side, more on the messaging side. things we like: xml syntax, separation of elements and types, type derivation having expressed in xml is a win. don't believe that relax NG can do types. interop problems: specification inaccessible for schema authors.
… :) complicated writing style, features incorrectly implemented.

Ashok: so it's not that the spec is hard, it's just not written that well?

Paul Biron: correct. we tried a lot of things for the spec in the working group. not sure if other ways to write specs would have made things different.

Noah Mendelsohn: it's hard to make the spec simple, because of the component model.

Paul Biron: schema writers only care about syntax, the CM makes it difficult for people to understand. but I wouldn't remove it. chameleon include, mixed contents, …

Jonathan Marsh: some validation tools have an option to turn off UPA

Paul Downey: xmlbeans has an option for that

Paul Biron: it would be nice if you could extend in the middle of things, not only at the end. all our schemas are generated from UML models. we would like to express that one complex type extends another. we'd like to have all of the UML attributes come first, then the associations, but the extension forces us to have a linear serialization: can't have all the attributes first for a Person instance

David Ezell: so HL7 found the readability of the documents to be more important?

Paul Biron: yes. I wouldn't want to impose the extension ordering, should be up to the authors to define it. because of co-occurrence constraints, almost everything in our schemas is optional. the element is required but the attribute is optional. all attributes are optional. for the case where we have co-occurrence constraints, we put schematron rules. originally we had an xpath in xsd:appinfo, but we now do it in schematron. occurrence-based vs value-based co-occurrence: first is "if this attribute is present, then those others must be as well", second is "if this attribute has this value, then …"
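
A minimal sketch of the two kinds of co-occurrence constraint, written as plain Python checks over an element's attributes rather than as Schematron rules. The attribute names (`unit`, `value`, `kind`, `description`) are hypothetical and are not taken from the HL7 schemas.

```python
from typing import Dict, List


def check_co_occurrence(attrs: Dict[str, str]) -> List[str]:
    """Return the constraint violations for one element's attributes."""
    errors = []

    # Occurrence-based: if `unit` is present, `value` must be present too.
    if "unit" in attrs and "value" not in attrs:
        errors.append("unit requires value")

    # Value-based: if `kind` has the value "other", `description` is required.
    if attrs.get("kind") == "other" and "description" not in attrs:
        errors.append('kind="other" requires description')

    return errors


print(check_co_occurrence({"unit": "mg", "value": "5"}))  # []
print(check_co_occurrence({"unit": "mg"}))                # ['unit requires value']
print(check_co_occurrence({"kind": "other"}))             # ['kind="other" requires description']
```

Every attribute here would be declared optional in the schema itself, and the dependencies live in the separate rule layer, which matches the split Paul describes.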

David Ezell: at NACS we had similar problems as well, ie having attributes optional related to EDI vocabularies

Paul Biron: local substitution groups would do well, compared to type wild cards. anonymous types don't have an identity :(

Michael Sperberg-McQueen: do you need groups within all groups?

Paul Biron: no, we don't need that. we have very complex constraints [and indeed, Paul gives a very complex one :) ]

QuickTree report [slides]

Sridhar Guthula, QuickTree

[Sridhar presents his SOAP security module (SS)]

<Paul Downey> isn't libxml2 written in C?

<Philippe Le Hégaret> Paul, correct

Sridhar Guthula: features: xml denial of service prevention, wsdl based access control, soap structural and parameter validation, sql and command injection protection, streaming mode interface. all of this is done in a streaming mode. wsdl based validation: xml schema 1.0 validation engine (written in C). all the wsdl get converted into schemas. ids, keys make streaming harder. for the datatype validation, we used an existing one. only developed the structural part. everyone develops their schemas but no one is checking at runtime because of performance hits. for any schema element, we have callbacks to implement the behaviors (e.g. ACLs). incremental levels of compliance? it would save us time. currently everyone just claims to support "XML Schema". most of the xml schema do not consider security, like checking facets. adding concepts of aspects: conformance/compliance levels, ignore order/c14n, DoS, xsi:type support. the same schema goes into several users, including your customers. having a security profile could help: "don't care about datatypes", etc.

Paul Biron: c14n is an important one but don't have a solution so far

Sridhar Guthula: I don't have one either

Paul Biron: ie write a schema that will validate an instance and its c14n instance

SAP report [slides]

Ümit Yalçınalp, SAP

Ümit Yalçınalp: SAP provides business applications that most companies rely on, such as your HR and CRM applications. Our platform is built as a SOA that relies on Web Services and common XML datatypes. We create and support lots of applications and we need common datatypes that will be utilized among these applications in our platform. We use XML Schema for data modeling and use WSDL integration of Web Services. Our platform supports two different language and development stacks, namely: Java and ABAP.
Usability: We need to rely on common datatypes and common dictionaries that our applications build upon and we need a controllable environment, not arbitrary designs with XML Schema. Otherwise it will be unmanageable since there are many methodologies that exist today as indicated in our paper. Since data modeling for our applications comes first, we're using UN/CEFACT Core component specification as the basis for our datamodeling. We regard UN/CEFACT Core component specification as a profile for XML schema (a profile that limits use of extension, restriction, forbids substitution groups, redefinition, element wildcards, only the facet aspects are allowed in extension/restriction). It constitutes the basis for our data modeling. This data model first approach allows us to drive both of our language stacks in a uniform way for the applications we enable.
Our message to this group based on input from various groups within the company is to "solve the versioning problem!" The versioning should define the semantics as well by answering specifically "what does it mean to have a major or a minor version?", namely compatibility vs non-compatibility. This is why we also like UN/CEFACT, since we can rely on a clear definition of versioning in our platform. Unfortunately, the LC124 issue in wsdl will not be resolved in a timely manner in order to create a consistent solution in the industry. The LC124 issue will exist with WSDL 2.0, since a solution with major/minor version discipline may not be specified on time by the XML Schema wg.
There are several problems we found with XML Schema which are listed in this slide. For example, until next version of JAXB 2.0 is adopted, substitution groups are not supported. We did not find utility for notations and the design of media types in the WSDL note even makes them useless. We find "redefine" to be very problematic and not used at all…

<Paul Downey> WSDL Primer on Versioning states "versioning is difficult, but you should anticipate and plan for change." guess that should be from version 0 : http://dev.w3.org/cvsweb/~checkout~/2002/ws/desc/wsdl20/wsdl20-primer.html#adv-versioning

<David Orchard> I wish the primer had the many examples that I submitted a while ago… But it will soon.

Noah Mendelsohn: is it a compatible change to add a country code to a phone number on a PO?

Ümit Yalçınalp: As I've defined the rules, yes, it's compatible.

Noah Mendelsohn: but I can point you to software that will break. we need to have a clearer understanding of partial understanding

Ümit Yalçınalp: there are two dimensions we have to distinguish: syntactic extensibility vs. semantic incompatibility. Only if you distinguish them do you have a way to talk about them and differentiate them

<Michael Sperberg-McQueen> [I think the only coherent way to describe partial understanding is with reference to a particular set of intended declarative or operational semantics]

<Chris Ferris> I think that there are also multiple dimensions to versioning… schema and also implementation.

<Mark Nottingham> Noah made an important point; that extensibility and versioning is dependent to some degree upon the intended use of the document, and a format might have many uses; i.e., what you consider backwards-compatible may not be by another user of the document.

Michael Sperberg-McQueen: can you define the versioning problem? it's not clear that everyone is thinking about the same thing

Ümit Yalçınalp: for example, have a rule that says "if backward incompatible change, use a new namespace". combine components and vocabularies of different versions
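
One possible reading of the "new namespace for incompatible changes" rule, sketched under the assumption that a major version bump marks an incompatible change and a minor bump a compatible one. The `Version` type, the `BASE` URI and both functions are hypothetical. Neither the UN/CEFACT rules nor any W3C specification is being quoted here.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int  # bumped on backward-incompatible changes
    minor: int  # bumped on backward-compatible changes


BASE = "http://example.org/purchase-order"  # hypothetical namespace stem


def namespace_for(version: Version) -> str:
    """Only the major version appears in the namespace name."""
    return f"{BASE}/v{version.major}"


def shares_namespace(old: Version, new: Version) -> bool:
    """Compatible revisions keep the namespace; incompatible ones change it."""
    return namespace_for(old) == namespace_for(new)


print(namespace_for(Version(1, 0)))                    # http://example.org/purchase-order/v1
print(shares_namespace(Version(1, 0), Version(1, 3)))  # True
print(shares_namespace(Version(1, 3), Version(2, 0)))  # False
```

The sketch says nothing about who decides that a change is incompatible, which is the point Noah raises with the country-code example.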

Noah Mendelsohn: the notion that a change is incompatible or not isn't inherent in the format. for example, is adding a country code in a purchase order incompatible? might depend on the software

Ümit Yalçınalp: incompatible to the structure, incompatible in changing the semantic

<David Orchard> And how does each "user" of a document describe/specify/advertise its partial understanding? Is there then a composite schema based upon the union of all the partials? Some thoughts prompted by Noah: Partial Understanding

<Chris Ferris> if the software that consumes the schema is say code generation software to generate a data binding for a web service… then the issue becomes whether the generated code is capable of dealing with instance documents that have been generated using subsequent revisions of the schema that incorporate compatible changes. even if you put in all the wildcards, etc. much of today's tooling simply ignores them in the context of the generated databinding code… tha

Semeiosis report [slides]

Steven Ericsson-Zenith, Semeiosis

Steven Ericsson-Zenith: dealing with open source platforms (PL/SQL, PHP, …). scenario: open content development http://xml.memeio.org/. interested in state questions: changes, author intent [draft, …], … the schema expresses glossaries, concept distinction, references. so started to look at the spec in January. the schema spec is a difficult read; it's not formal enough. there needs to be a clear statement about recursive types, needs to stop the recursion. needs to describe sub sections using recursive types, but it has to end. schema and instance licensing. if I do discovery on the Internet, can I use the data? if i discover data on the net, what can I do with it? how do I tell? who is responsible for enforcing the rights? so I issued a memeio instance license as well. interested in two types of constraints: data constraints, and rule constraints. "if this value reaches this level, then you need sign-off" I want to be able to trust the timestamp also. [see also Steven's revised user experience report]

<Michael Sperberg-McQueen> I think the Creative Commons license markup is designed to address this

<Steven Ericsson-Zenith> Creative Commons does not have a markup license - it is something that I have been trying to make CC aware of for some time. You can license the schema via CC but there is no suitable CC license for instances IMHO.

Paul Downey: how would that surface in schema?

Steven Ericsson-Zenith: as a constraint.

Paul Biron: not a constraint on the receiver, it's metadata

Paul Downey: so ability to annotate schemas?

Steven Ericsson-Zenith: I'd like to have constraints on types. when i specify a type, i'd like to say, about a particular time, the content of a particular element, that it must be automatically generated, that it be state, not declaration

Noah Mendelsohn: I think what you're looking for are annotations, outside the scope of schema.

Steven Ericsson-Zenith: I'm looking for strong inference, annotations are not strong enough. what i want is stronger than an annotation; I want a strong inference. I didn't use attributes at all in the design of my schema. not clear when to use elements vs attributes. I want to do one instance that defines the glossary, and I want to refer to that from the schema.

[lunch]

Microsoft report [slides]

Douglas Purdy, Microsoft

Douglas Purdy: Responsible for all remoting, including Indigo. Level set slide: We segment this space into two areas: Infrastructure protocols (SOAP, etc.), Application protocols (WSDL, XSD). We're in the business of supporting Infrastructure platform, and tools for Application protocols. Most customers don't care about angle brackets, we want to serve them as well as the non-angle folks. People want to be able to see the same programming model on each end, but we're working towards platform neutrality. The infrastructure protocols have been profiled. Application protocols have been profiled only in a limited way. E.g. XSD in WS-I BP doesn't make restrictions. Most users think there is a profile for XSD. Web services are protocol agnostic. That's what our customers want. MS has a runtime called the CLR. Want to make sure developers have a first-class experience in the CLR. We believe we need to allow versioning of CLR types in an interoperable way. E.g. I want to interoperate with IBM, BEA, don't care about angle brackets. Want to support v1->v2->v1 scenarios without loss, even if there is a schema validator in the way. Want to support reference and value semantics. We want to be able to support graph semantics. E.g. Want to send a hash table and preserve the references. At Microsoft we have three levels of users. We want to allow people to mess with the angle brackets (and the protocols themselves), we want the mid-range to use angle brackets but not implement the stack themselves, and the lower-range where they don't want to see any angle brackets at all. The programming model does actually matter to these users. They want intellisense. They want CORBA… But we also want to support all of XSD (vertical schemas, etc.) This means for some schemas you don't get intellisense, you just get a DOM. All schemas are supported, but some give a better programming model experience. In fact, there is a "de facto" profile of XSD, from the SOAPBuilders. 
Pretty much sequence and simple types.

<Paul Downey> implicit profile also available in what works with tools

Douglas Purdy: We wanted to add versioning, graphs, generics, but it must be completely legal schema. Some tools can't process anything beyond sequence, simpleTypes. E.g. need to extend int with attributes for versioning and graphs. We have solved the versioning problem without scratching UPA. We tried a full XSD->CLR mapping and it really isn't usable.

<Chris Ferris> will have them presently

Douglas Purdy: We have great XML programming models, but in some cases we rely on a fallback model which gives you a downlevel experience, eventually falling back to just getting the SOAP message.

Ümit Yalçınalp: What is the subset that gives you the best model?

Douglas Purdy: (Shows some details scribe missed). Versioning is solved by adding a version delimiter between extensions.

<Michael Sperberg-McQueen> addition of versioning-fencepost elements avoids UPA problems

Douglas Purdy: Also provide "unknown serialization data" in the programming model to get at extensions we don't know about. Things to explore: UPA required version delimiter, a bit messy in the instance. Data is exported as dc:int… Don't think profiling at W3C is good idea. Domain-specific profiles might be acceptable, e.g. WS-I for web services.
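
A rough sketch of how a version delimiter and "unknown serialization data" could support a v1 -> v2 -> v1 round trip without loss. The scribe missed the details of the actual design, so everything here is an assumption: the `versionDelimiter` element name, the `person` document, and the `read_v1`/`write_v1` helpers are hypothetical and do not describe Microsoft's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

DELIMITER = "versionDelimiter"  # hypothetical fencepost element name


def read_v1(document: str) -> Tuple[Dict[str, str], List[ET.Element]]:
    """Read the fields a v1 program knows; keep the rest as opaque data.

    Everything from the first delimiter onwards belongs to later versions
    and is retained untouched so it can be written back out.
    """
    children = list(ET.fromstring(document))
    cut = next((i for i, c in enumerate(children) if c.tag == DELIMITER), len(children))
    known = {child.tag: (child.text or "") for child in children[:cut]}
    unknown = children[cut:]
    return known, unknown


def write_v1(known: Dict[str, str], unknown: List[ET.Element]) -> str:
    """Serialize the v1 fields, then re-attach the retained later-version data."""
    root = ET.Element("person")
    for tag, text in known.items():
        ET.SubElement(root, tag).text = text
    root.extend(unknown)
    return ET.tostring(root, encoding="unicode")


v2_document = (
    "<person><name>Ann</name>"
    "<versionDelimiter/><email>ann@example.org</email></person>"
)
known, unknown = read_v1(v2_document)
known["name"] = "Ann B."          # the v1 program edits what it understands
result = ET.fromstring(write_v1(known, unknown))

print([child.tag for child in result])  # ['name', 'versionDelimiter', 'email']
print(result.findtext("email"))         # ann@example.org
```

Because the delimiter is a required, named element, a wildcard placed after it never competes with the known fields, which is presumably why the fencepost avoids UPA problems.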

Chris Ferris: xs:id, xs:ref like Guha's id?

Douglas Purdy: no. It's just an ordinal.

Danielle?!?: Same experience as with the Word schemas?

Douglas Purdy: No. Our customers are just programmers.

Mark Nottingham: You said there's support inside MS for a profile in WS-I?

Douglas Purdy: Yes, we were worried about a profile that limits Office docs sent to a service. A profile doesn't mean that everyone doesn't need to implement all of schema.

Chris Ferris: What's different about JAX RPC - there's a de facto subset that is defined and supported in all tools. There isn't an interoperability problem, just usability.

Douglas Purdy: There's some stability that a de jure profile gives. There are a set of best practices that will give you a good user experience.

Microsoft report [slides]

Derek Denny-Brown, Microsoft

Derek Denny-Brown: Representing toolkit builders. XSD has worked quite well. Nothing else that meets the requirements (e.g. RelaxNG insufficient for data binding). XSD adoption is accelerating. Office customers use it. That makes breaking changes hard to roll out. Takes years to roll out a complete set of products depending on XSD. Taken years to get a common understanding of the spec, it's just now starting to work across the board. People are moving beyond XSD, they are more interested in the business case. E.g. nobody cares about version delimiters, since fewer people are touching the raw XML. Many misunderstandings have been clarified lately. Many unresolved questions, like how to package a set of schema files. Key piece which is left to implementations, and they all do it differently. 5 years later, UPA is finally being resolved. Other examples abound. Test suite is a big help. It's taken us years to figure out how to do a complete set of tests, and to figure out how people really use it. Primer showed how to do simple stuff, real world is much messier. Customers are using substitution groups and more complicated elements. But the average user needs tool support. UPA vs Wildcards has been a big issue for us. Half our products love and depend upon UPA. Half hate it.

<Steven Ericsson-Zenith> I think it important to capture derek's points here about the confusion and the process associated with developing concepts. it is a classic problem. and often gets excluded from the minutes

Derek Denny-Brown: Derivation / Substitution groups. Physical representation ambiguous. Redefine has been a headache, it's hard to understand what it's modeling. Not made clear in the schema spec why redefine is done the way it is. Spec is complex and it takes a long time to figure out how the pieces interrelate. To figure out how to do validation requires scanning the whole Schema spec. No single set of rules. Must do cross-annotations in your head. That's why it's taken so long for all the vendors to get the model right. It would take years to roll out a complete set of products. Need a way to change out pieces of your infrastructure. Would take 5 years to get co-occurrence deployed across the board. Need to keep backwards-compatibility for customers.

Paul Biron: You said the number of errata: I believe there are so many errata because of the writing style. Hid the cross-relations inherent in the spec.

Derek Denny-Brown: Not sure how to fix, but this is part of the problem.

Paul Downey: No formal notation (though it looks kind of like one…)

Derek Denny-Brown: Many edge cases and holes are not apparent.

IBM report

Noah Mendelsohn, IBM

Noah Mendelsohn: IBM's experience, some of my own perspectives too. Will be at a higher level than many other talks. The core of the schema WG has felt some responsibility for the points of pain, but I'm gratified that: Schema has met its main goal. It is ubiquitous, though not as pretty or simple, and there are still some painful problems. The goal was to build a language everyone would use - which is where we are today. Some of the more controversial decisions like UPA were hard, remain hard. Screwing up the current value would be the only thing worse than failing to improve the situation. Web services aren't the only use case we need to consider. IBM and MS have similar breadth. Schema integrates both data and documents. Lotus Notes is architecturally much like XML inside. Integrates data and documents and allows correlation, with new powerful types of applications. Tricky balance between making this work for various communities. Java and .Net are important, but many pain points aren't about Schema. e.g. mixed content came from XML. Need to be careful when we set the bar about what kinds of changes would fly. Stability vs. change: Open question whether validation will become more common as perf improves. Can delegate more checking from the app to the Schema. Must be careful when we change something (ref XML 1.1). Among the clients are the applications which rely on validation. IBM builds APIs that are isomorphic to the component model. Prefer not to see changes there except for high-value reasons. XML and Schema are about loose coupling, need migration strategies. Punch line: "Changes must have such compelling value that both end users and middleware vendors will deploy aggressively and ubiquitously." Schema WG focuses on the right thing for 1.1, but doesn't always look backward. If the next version is similarly ubiquitous, we're on the right track. What are the problems with XML Schema? focused on some less-discussed. Complexity of local vs. global elements. 
Unnecessarily forbidding syntax, but it might be too late to fix that. Telling people just to implement part is death to interop. Need to tell people to implement it fully and correctly. Profiles are fine for optimization, but not for general-purpose work. Mixed content and java.lang.String don't match, but mixed content is useful. Coming up with schema features that really deal with a compelling subset of the needs for evolving XML vocabularies could be of sufficient benefit that users and vendors would adopt. However, we need to be realistic: we can devote time to trying to design something, but we should only go to Recommendation if we have feedback from users and vendors that the resulting Schema 1.1 will be welcomed and widely adopted. We should focus on tooling and conformance. We should consider only high-value work, and not repeat XML 1.1's failures. Weigh the amount of disruption of new features. Don't dumb down XML or Schema to fit programming languages. Lessons from working on XML Schema: "no such thing as a simple feature" there is combinatorial complexity. "big committee -> big language" though right now we have too small a committee. Need more resources to deliver more than maintenance. "documents and data together are cool" requires some compromise by each side. "make realistic schedules" thinks a pruning stage was short-circuited.

Chris Ferris: Devil's advocate: In XML 1.1 we added something, we can't roll it out. What's different?

Noah Mendelsohn: Need compelling value, e.g. aligning decimal with IEEE. Need to choose areas that don't unravel the whole spec. XML 1.1 hit some key types (names, strings) in Schema that made it very disruptive.

Sun report [slides]

Kohsuke Kawaguchi, Sun

Kohsuke Kawaguchi: Sun supports XML schema in Java APIs, NetBeans, J2EE deployment descriptors, Liberty, etc. Issues we experienced: JAXB allows type substitution by default. Makes it hard to map into the programming model, since they might be extensions. Need to account for the possibility that substitution happens. Good data binding is difficult, even when extensibility points aren't used. Complexity is staggering. Requires 5 people full time or you won't get a good implementation. Unfortunate, but not clear how to fix. Need better data-binding hints. E.g. root element designation is missing. Convoluted content models are allowed, and data-binding tools can't infer the intention. Identity constraints are hard to exploit by data-binding code. UPA has been mentioned before. More trivial things: no annotation for particles, attribute uses, … Wherever you can use named type you can substitute an anonymous type, except for some ugly exceptions. The W3C test suite contains some erroneous tests, which leads to bad implementations. Even though there are many tests, it's not clear what the coverage is. We estimate that only half of the spec is covered. Test suite isn't up to date with errata. There isn't an appeal process in place, so if you suspect a test is wrong there's no nice way to make that claim.

Philippe Le Hégaret: Even to the Schema WG?

Kohsuke Kawaguchi: Maybe it hasn't been advertised. User issues that they complain to us about: error messages are cryptic. Spec doesn't say how schema documents are assembled. Some tools ignore a malformed schema location, which makes it hard to figure out what's wrong. Innocent developers suffer. UPA violation is impossible to explain (and hard to detect, even for an expert). Tools generate bad schemas (mentioned the brand here :-). Newcomers find published best practices that are actually bad practices. Conclusion: XML Schema evolution will cost a lot. Have to implement across a large number of APIs, tools, specs, etc. Quite a gamble. Nice if a new version actually delivered value, but there is a bigger risk that 1.1 won't deliver the expected improvements. Focus more on interoperability and testing (better test suite) for 1.0.

Ashok: Test suite: typically yes/no validation test. Would you like PSVI tests instead?

Leonid Arbouzov: In Java conformance test suite, you get pass/fail.

Derek Denny-Brown: We don't necessarily expose the PSVI.

Paul Biron: Some parts of the spec can't be tested by pass/fail. The test is what's the value of a property in the PSVI.

Michael Sperberg-McQueen: Thinks further cultivation of the test suite is good. There is an appeal process in place, though it's never been exercised.

Michael Sperberg-McQueen: Go to the public schema page to see the process.

Oracle report [slides]

Ashok Malhotra, Oracle

Ashok Malhotra: Speaking for Oracle, difficult questions to be directed to my colleagues ;-) We map XML Schema to object classes and database structures. We map XML documents to object instances, and database fields. We've found it difficult to map choice, derivation by restriction, facets, derivation control, namespace restrictions, annotations, substitution groups. Choice allows variations in the structure of an object, which most object-oriented languages don't allow. Perhaps the languages need to be fixed. Derivation by restriction has no equivalent in OO. In the db we map both the original and the restriction to the same db field.

<Noah Mendelsohn> Doesn't derivation by restriction correspond to an OO construct in which the "setters" ensure that the values are bounded (or not exposed at all?)

Ashok Malhotra: Then check some stuff at runtime. Facets don't have an equivalent either. Fixed and final derivations and substitutions. No analog of block and block default, etc. Wildcards can be restricted to elements in a particular namespace, no equivalent.
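
A minimal sketch of the setter idea raised in this exchange, assuming a hypothetical base type that accepts any integer and a restricted type limited to 1..100. The class names and the bounds are illustrative assumptions, not anything presented at the workshop; the point is only that the restriction is enforced at runtime by the setter rather than by the static type.

```python
class Quantity:
    """Hypothetical base type: accepts any integer value."""

    def __init__(self, value: int) -> None:
        self._value = 0
        self.value = value  # go through the setter so the checks apply

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, new_value: int) -> None:
        self._check(new_value)
        self._value = new_value

    def _check(self, new_value: int) -> None:
        if not isinstance(new_value, int):
            raise TypeError("value must be an integer")


class SmallQuantity(Quantity):
    """Hypothetical restricted type: same interface, narrower value space."""

    def _check(self, new_value: int) -> None:
        super()._check(new_value)
        # The restriction is checked at runtime, in the setter.
        if not 1 <= new_value <= 100:
            raise ValueError("value must be between 1 and 100")
```

Every SmallQuantity is still usable wherever a Quantity is expected, which mirrors the rule that a valid instance of the restricted type is also valid against the base type.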

<Mary Holstege> i think there is a hidden assumption in these "there is no OO equivalent" statements, which we ought to try to tease out later on

Ashok Malhotra: Annotation problems: a dev wants to annotate a reference, but there is also an annotation on the element. Which has precedence?

Mary Holstege: You lose the annotation on the reference.

Ashok Malhotra: Don't like this. Many other similar cases. Some arbitrary decisions: Annotation on attribute ref. Annotation on a model group ref or an element ref is associated with the {particle}. We store schemas and manage them in the database. You can register a schema, validate docs against it, generate a schema from a doc, etc. In doing this, what do you do with substitution groups? You have a column corresponding to the head, and some extra stuff that tells you what the real type is. Also tells about namespaces, etc. Surprised no one has talked about mapping datatypes, like duration. Datetime where tz is optional is a problem. Generally just store as a VARCHAR. Standard mapping for many datatypes, but you can specify mapping using schema annotations too. Key/keyref aren't enforced at the schema level, since they apply within a doc, instead of across a doc collection, like SQL. Got the sense that key/keyref are not very useful from a db POV.

Kongyi Zhon: Not so useful in the db, e.g. a unique PO number across all the POs. In practice they don't create a POcollection doc. Disconnect between notion of collection in the db and the doc in the Schema world. Some workarounds, but nobody is doing it in practice.
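
The gap described here can be sketched as follows, assuming each purchase order is stored as its own document with a hypothetical `number` attribute on a `PO` element. A uniqueness check scoped to one document (which is where an identity constraint applies) passes for every document, while the duplicate only shows up when the check runs across the whole collection. The element and attribute names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two separately stored documents that happen to reuse the same PO number.
DOC_A = '<PO number="1001"/>'
DOC_B = '<PO number="1001"/>'


def po_numbers(doc_text):
    """Return the PO numbers found in one document (root included)."""
    root = ET.fromstring(doc_text)
    return [po.get("number") for po in root.iter("PO")
            if po.get("number") is not None]


def duplicates_within(doc_text):
    """Duplicates visible to a check scoped to a single document."""
    counts = Counter(po_numbers(doc_text))
    return sorted(n for n, c in counts.items() if c > 1)


def duplicates_across(docs):
    """Duplicates visible only when the whole collection is examined."""
    counts = Counter(n for doc in docs for n in po_numbers(doc))
    return sorted(n for n, c in counts.items() if c > 1)
```

Here `duplicates_within` finds nothing in either document, while `duplicates_across([DOC_A, DOC_B])` reports the shared number, which is the kind of constraint a database would normally enforce.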

Noah Mendelsohn: History was that the impetus came from db vendors. Ironic, eh? Lesson to be learned about complexity.

David Ezell: Didn't like key/keyref until started to work with Office docs.

Derek Denny-Brown: Our db guys like Key/keyref, but there isn't a good collection notion.

Ashok Malhotra: Wildcards don't allow excluding a specific namespace, but still allow the null namespace. Redefine is the only feature we don't support. No customer uses this!

Michael Sperberg-McQueen: Question for you and group at large. Hear UPA is a pain point, which indicates a willingness to entertain removal. But I'm also hearing the biggest problem is that databinding rules don't understand choice, which is quite fundamental. Can't handle substitution groups, which are well-defined. Don't do well with derivation by restriction. Will relaxing UPA help at all?

Noah Mendelsohn: Your paraphrase surprises me. UPA is a two-sided thing. Data-binders depend upon it. The other side is that users can't tell whether their schema is good.

Kohsuke Kawaguchi: No harder to bind with or without UPA.

[break]

Rogue Wave Software report [slides]

Allen Brookes, Rogue Wave Software

Allen Brookes: most of our points are the same, but with a different perspective. our product (implemented in C++) would be one of those "incomplete" ones people complain about. most experiences have been positive. we've had trouble with: difficult schema features, invalid schemas, versioning, and compiler limitations. this last is a problem with C++ compilers. issues include: choice, complexType restriction, and mixed content. "choice" is inefficient, must represent all possible types. restriction causes class members to change type -- e.g. from base type of "string" to derived type of integer. mixed content -- we haven't had any verified needs for it. common user errors: ambiguous schemas, circular includes, illegal restrictions, namespace issues, non-standard extensions. these last would normally make documents smaller (leave out nil, etc.) we understood the versioning model to be predicated on use of new namespaces for each rev. actually they want to ignore extra elements and attributes. compiler problems: large number of classes; # of symbols in shared libs on some platforms is limited (e.g. 7000 classes). schema doesn't directly support subdividing the schema to work around this problem. profiling might be helpful, but people will continue to ask for features even if they aren't in the profile. is the problem really with Schema or is it the datamodel (i.e. class model in C++)? XML Beans appears to be a viable alternative. call for open-source project to implement XML Beans in C++.
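
A rough sketch of the "ignore extra elements" behaviour mentioned above, assuming a hypothetical `item` element whose known children are `name` and `quantity`. The binder keeps what it recognizes and records, rather than rejects, anything else. It handles child elements only; unknown attributes would need the same treatment. All names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Fields the (hypothetical) generated class knows about.
KNOWN_FIELDS = ("name", "quantity")


def bind_item(xml_text):
    """Bind known child elements; collect unknown ones instead of failing."""
    root = ET.fromstring(xml_text)
    bound = {}
    ignored = []
    for child in root:
        if child.tag in KNOWN_FIELDS:
            bound[child.tag] = child.text
        else:
            ignored.append(child.tag)  # a newer revision may have added this
    return bound, ignored
```

A document from a later revision that adds a `color` child would still bind, with `color` reported in the ignored list.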

Paul Biron: circular includes (for example) is explicitly allowed.

Michael Sperberg-McQueen: yes, circular includes are not a problem.

Paul Biron: spec says something like -- if you've already included something and end up with a duplicate name treat it the same.

Michael Sperberg-McQueen: important to understand "union" in the sense of sets -- {1} U {1} -> {1}
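
The set-union reading can be sketched like this, assuming a simplified model in which each schema document is just a set of component names plus a list of included documents. Because the result is a set and each document is visited once, a circular include terminates and contributes nothing new. The data layout is an assumption for illustration and does not model conflicting definitions that share a name.

```python
def collect_components(start, schema_docs):
    """Union the components reachable from `start` through includes.

    `schema_docs` maps a document name to a pair:
    (set of declared component names, list of included document names).
    """
    seen = set()
    components = set()
    pending = [start]
    while pending:
        name = pending.pop()
        if name in seen:
            continue  # already included: a cycle or repeat adds nothing
        seen.add(name)
        declared, includes = schema_docs[name]
        components |= set(declared)  # {1} U {1} -> {1}
        pending.extend(includes)
    return components
```

With two documents that include each other, the result is simply the union of what both declare.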

Chris Ferris: it seems clear that clarification of how "include" works would be very much appreciated.

Jonathan Marsh: WSDL has had a similar experience.

(scribe note -- ask Jonathan Marsh about his added feature… missed it.)

Noah Mendelsohn: why "schemaLocation" is a hint has a history. But validators might not trust document authors.

<Michael Sperberg-McQueen> WSDL eventually decided to say that including two schemas in the same WSDL document is itself a *hint* that they should be accessible for import to each other i.e. the schema processor should perceive the context and DTRT

Acord report [slides]

Dan Vint, Acord

Dan Vint: I come from a publishing background, so I have perhaps a different point of view. Acord, a non-profit, represents the insurance industry. We have three specs, one for each of the major insurance lines (Property & Casualty, Life, Reinsurance). P&C started with EDI. Life was an MS initiative. RLC started in Europe. we like: have schema in XML, basic datatypes, built-in document, import/include/redefine. would like to see this support (including namespaces) back into DTDs. things we don't like: overly complex, need tools; tools are inconsistent; implementations are incomplete; having to continue to define entities in DTDs is distasteful. hint attributes! -- awful for interoperability. hinder versioning -- version attribute not usually supported. namespaces are the only way to force recognition of a change. using namespaces to designate extensions leads to "namespace pollution".

Alexander Milowski: can you elaborate on what the lowest common denominator is?

Dan Vint: choice, groups, xs:any… all these cause trouble. code lists need to be extendable. Need a way to identify extended code values.

<Chris Ferris> Martin Gudgin had a hack for extensible enumerations of QNames

Dan Vint: our type is based on QName to help identify the source.
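
One way to picture an extensible code list is as a union of a fixed enumeration and a prefixed, QName-like form that marks the value as an extension and identifies its source. The sketch below assumes hypothetical standard codes and a simplified pattern that is not the full QName grammar; it is not Acord's actual type definition.

```python
import re

# Hypothetical standard code list.
STANDARD_CODES = frozenset({"Auto", "Home", "Life"})

# Simplified prefix:local pattern; an assumption, not the full QName grammar.
EXTENSION_PATTERN = re.compile(r"[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*")


def classify_code(value):
    """Classify a code as 'standard', 'extension', or 'invalid'."""
    if value in STANDARD_CODES:
        return "standard"
    if EXTENSION_PATTERN.fullmatch(value):
        return "extension"  # the prefix identifies who defined the value
    return "invalid"
```

An unprefixed value outside the standard list is rejected, so extensions stay distinguishable from typos in standard codes.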

<Michael Rowell> OAGi uses the unioned simpleType to do this.

Dan Vint: need to be able to control validation: sometimes strict, sometimes without extensions. we use two methods… global elements and groups with restriction and extension, and <Extension> element. you >have< to have a tool to understand any reasonable schema. where we need help: integrated approach for all tools; need at least one profile of XSV that matches DTD functionality. we need much simpler APIs to work with XML. need a way to limit features we don't want. We've had bad experiences in trying to use schemas produced by others. Doesn't work well unless schemas were created at the same time.

Alexander Milowski: have you looked at OASIS XML catalogs. not sure I trust that… Xerces does work in this regard, except in regard to "no-namespace" namespace documents. finally, changing the namespace is one of the few methods I have to force acknowledgment of the change.

<Dan Vint> This still doesn't work if the namespace does not have some notion of version embedded in the namespace. Granted catalogs allow me to change www.foo.com to schema1.xsd, but how do I get to v1.2, v1.3, etc? In addition to understanding the namespace mapping I need catalogs to recognize at least one more piece of information which is the version attribute. In my read of XML catalogs I don't see anything to leverage this additional information. If XML catalogs supported this and I could get my members to recognize the use of catalogs, that would handle my versioning issues.
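
The lookup being asked for can be sketched as a table keyed on both the namespace and the version attribute. This is a hypothetical resolver, not a feature of OASIS XML Catalogs; the file names and version strings are assumptions for illustration.

```python
# Hypothetical catalog: (namespace, version) -> schema document location.
CATALOG = {
    ("www.foo.com", "1.2"): "schema-1.2.xsd",
    ("www.foo.com", "1.3"): "schema-1.3.xsd",
}


def resolve_schema(namespace, version, catalog=CATALOG):
    """Return the schema location for this namespace and version, or None."""
    return catalog.get((namespace, version))
```

Keying on the pair lets one unchanged namespace map to different schema documents per version, which a namespace-only mapping cannot express.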

Review of points from today, and creation of topic list for Day 2

Paul Downey: suggest that we skip the review of the thoughts of the program committee on the papers submitted, and go straight to the information we captured on the flip charts.

(scribe is shaking with anticipation… :-)

Jonathan Marsh: the list looks daunting.

Mark Nottingham: should we process the lists offline?

David Ezell: how about hearing suggestions from the floor.

Jonathan Marsh: here's one -- is "ignore-unknowns" tractable?

(MSM returns with sticky notes.)

Michael Sperberg-McQueen: the game is to tear your sticky into 3, and go up and stick your sticky next to the (three) issues you'd like to talk about tomorrow.

Candidate topics for tomorrow:

Schema profiles (13)
LC124 versioning (6)
Co-constraints (6)
Test suite (4)
Wild card deficiencies (4)
Inconsistent versioning (2)
specification inaccessible for Schema authors (2) + complexity
Formal notation (1)
Extend only at the end (2)
Anonymous types lack identity (2)
Graph semantics (2)
UPA (4)
Schema location hints (2) + import/include (2) + packaging
XML Schema 1.1 — just say no
Extension of enumerations (2)
Validation vs code generation (2) + roundtripping (1)
Annotations for data mining (3)
Business level validation (1)

How to improve life without changing the specification?
How to improve the language?

Michael Sperberg-McQueen: suggest we use the hum method for voting. note that "unwarranted complexity in the rec" is not the only source of complexity in the Rec.

Steven Ericsson-Zenith: we should discuss the style of the spec.

Paul Downey: to me, profiles seems more important.

Jonathan Marsh: yes, suggest profiles is less technical, more fruitful

Noah Mendelsohn: we need to be clear what success for this group is. I think it's to give guidance to W3C on how to recognize success.

Chris Ferris: we should focus discussion on things that could be actionable coming out of the workshop. if we're going to recommend profiles, (e.g.), who should be involved in defining them.

Michael Sperberg-McQueen: I see another more radical clump.
clump 1 — ways to improve the language
clump 2 — ways to improve things without changing the language
clump 3 — profiles (improve things without changing the language forward) this work can say, for the customers we stand for here, these things might be important enough to warrant a breaking change. providing input to the Schema WG of the relative value of these various options.

David Ezell: i agree, and I would choose: clump 2, and then clump 1.

Alexander Milowski: one other item might need to be included which is Schema 1.1.

Steven Ericsson-Zenith: I don't understand what "improving life" means.

Michael Sperberg-McQueen: for example, I'd class something in clump 2 "improve the accessibility of the test suite". the schema test suite is >not< a conformance test. It's possible to view improving the test suite as more than just clarifying the spec.

Steven Ericsson-Zenith: won't the test suite become a de facto conformance test?

(scribe notes general negative grumbling...)

Douglas Purdy: I think the green list (the clumps) are sub agendas underneath every blue topic.

Michael Sperberg-McQueen: e.g. you can't fix UPA without changing the rec.

Douglas Purdy: ask XML Spy. add a category, what can be achieved by ignoring the spec.

Michael Sperberg-McQueen: the chairs will now turn our backs and judge by your humming.

… we have 8 winners. (general consensus that organizers have enough info to proceed.)

Michael Sperberg-McQueen: program committee will convene briefly when we adjourn.

[adjourned]

Minutes of the second day are also available.

21 June 2005
