TAG F2F (Morning) -- 5 Oct 2006

Issue XMLVersioning-41

HT introduces "XML Languages, entailment and versioning": http://www.ltg.ed.ac.uk/~ht/language.html

HT: In the model-theoretic view, truth is what matters but "truth" isn't at the core of many of our languages.
... How are we going to talk about the text in different languages with respect to the information they contain? This is an attempt to answer that question.

<Noah> When Henry's done, I'd like to take a few minutes to talk about some of the ideas I've had in this space. I think they're complementary to where Henry is going, though approached a bit less formally.

HT: In the traditional model-theoretic view, the distinction between concrete and abstract syntaxes isn't very important. But in our story, it is.

Three levels:

1. The concrete syntax, defined by BNF or DTD/Schema or ...

2. The data model, Java classes, schema components, ...

3. The model out in the real world

HT: For XML languages, it's more complex.
... Theta maps from a syntax to a data model
... Phi maps from the data model to the real world model
... Consider SVG: there's a Theta that maps from XML to the infoset, then there's a Theta that maps from infoset to the SVG data model, then there's a Phi that maps from that data model to the bitmaps/splines/...
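A minimal Python sketch of the Theta/Theta/Phi chain HT describes for SVG, written as ordinary function composition. Everything here is a toy assumption and not SVG's actual processing model. ElementTree's tree stands in for the infoset. The "SVG data model" is just a set of circles, with no namespace handling. Phi rasterizes those circles onto a small, hypothetical pixel grid.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """Toy SVG data-model object (an assumption, not the SVG DOM)."""
    cx: int
    cy: int
    r: int


def theta_syntax(text: str) -> ET.Element:
    """Theta 1: concrete XML syntax -> tree (standing in for the infoset)."""
    return ET.fromstring(text)


def theta_svg(root: ET.Element) -> frozenset:
    """Theta 2: tree -> toy SVG data model; only un-namespaced <circle> is read."""
    return frozenset(
        Circle(int(c.get("cx")), int(c.get("cy")), int(c.get("r")))
        for c in root.iter("circle")
    )


def phi(model: frozenset, width: int = 8, height: int = 8) -> frozenset:
    """Phi: data model -> 'world' model, here the set of lit (x, y) pixels."""
    return frozenset(
        (x, y)
        for x in range(width)
        for y in range(height)
        if any((x - c.cx) ** 2 + (y - c.cy) ** 2 <= c.r ** 2 for c in model)
    )


def interpret(text: str) -> frozenset:
    """The whole chain: Phi after Theta 2 after Theta 1."""
    return phi(theta_svg(theta_syntax(text)))
```

For example, `interpret("<svg><circle cx='2' cy='2' r='0'/></svg>")` lights only the pixel (2, 2). Splitting the chain into separate functions keeps the syntax-to-data-model steps apart from the data-model-to-world step, which matches the focus HT proposes on levels 1 and 2.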

Noah: Can't we look at this as having several languages in play?

HT: I don't think we can ignore any of the three levels, but can we focus on 1 and 2?
... Goal is not to talk about what kinds of models or Phi relations there are.
... There are an enormous number of them. I think we can avoid going there and still say useful things.

Dan: I'm certain that we can't ignore level 3.

HT: We can't get by with the traditional truth definition of entailment.
... Consider schema, SVG, and purchase order XML. These have different flavors.

<Noah> To clarify, I was saying that one way to tell the story about the different abstractions that can be inferred from an SVG text is to treat them as several languages. Language 1 is XML+Infoset. It tells you that given an SVG text you can infer things like which element has which parent. Another "language" extracts from the same texts information about, e.g., which circles should be drawn on a screen or page.

HT: Consider the distinction between "snow is white" and "I hereby pronounce you man and wife"
... You can't say anything about the truth or falsehood of the second, it changes the state of the world.
... Two kinds of fit, word-to-world (snow is white) and world-to-word (I hereby pronounce you...)
... World-to-word or "performative" sentences change the world to match the words.

<DanC_lap> TBL: we've modelled these performatives in RDF by using the empty URI reference: { <> a Invoice }

HT: In between declarative and performative we also have imperatives.
... XML Schema is declarative: given a schema component and an infoset, the schema is true (satisfied) or false (not satisfied) of/by the infoset

Noah: I expected to hear a story about schema documents. That you'd say the information that you got out of the schema document is a set of element/attribute declarations.

HT: That's where I'm going but from a different direction.

DanC: The schema document isn't true or false without an infoset.

HT: This is just like "it's raining". That's not true independent of when and where.

Noah: There are lots of other things that you can ask or say about a schema.

HT: SVG can go either way. Given an SVG dataset and a bitmap, the dataset is true or false of/by the bitmap. Or you can view it as a set of instructions to construct a bitmap (to change the world)

TBL: An SVG document isn't a statement or sentence, it's a noun phrase.

HT: Until you've said what you're going to do with it, there's a limit to what you can say about versioning it.
... The purchase order is definitely performative. Receipt of a PO means package and ship the goods.

TBL: "I hereby order something"

HT: Footnote: you can still frame all of this in terms of claims on the world. Declaratives are true iff they match the world. Performatives can be viewed as pre and post conditions.
... Consider the traffic light (the RGB language)

HT describes the language and the UML diagram (data model)

HT: The RGB language has a declarative interpretation. An instance of RGB is satisfied by a bitmap iff all its lights are satisfied by that bitmap...
... The RGB language has a performative interpretation. Henry describes the performative semantics.
... Finally entailment. We can define entailment for either approach to the semantics of RGB
... Entailment for the declarative semantics: An instance of RGB (A) entails another instance of RGB (B) iff all the bitmaps that satisfy A also satisfy B.
... Entailment for the performative semantics: We say that a message A entails a message B iff for all possible initial states, the response to A performs at least all the actions involved in the response to B.
... Can we appeal to entailment w/o appealing to the model except insofar as entailment presupposes it?
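The two RGB entailment definitions above can be checked by brute force over a finite world. This sketch does not reproduce the RGB language from HT's paper. Purely for illustration, it assumes three hypothetical lights ("red", "amber", "green"). An instance is a mapping from some of those lights to on/off, and a "bitmap" is abstracted to a full on/off assignment for every light.

```python
from itertools import product

# Hypothetical light names; HT's actual RGB language is not reproduced here.
LIGHTS = ("red", "amber", "green")


def all_states():
    """Every full on/off assignment: stands in for 'all bitmaps' / 'all initial states'."""
    for values in product((False, True), repeat=len(LIGHTS)):
        yield dict(zip(LIGHTS, values))


# Declarative reading: an instance is satisfied by a state iff all its lights match.
def satisfies(instance: dict, state: dict) -> bool:
    return all(state[light] == on for light, on in instance.items())


def entails_declarative(a: dict, b: dict) -> bool:
    """A entails B iff every state that satisfies A also satisfies B."""
    return all(satisfies(b, s) for s in all_states() if satisfies(a, s))


# Performative reading: a message is a request; the response switches any light
# whose current state differs from the requested one (an assumed response rule).
def response(message: dict, state: dict) -> frozenset:
    return frozenset(
        (light, on) for light, on in message.items() if state[light] != on
    )


def entails_performative(a: dict, b: dict) -> bool:
    """A entails B iff, from every initial state, A's response includes B's actions."""
    return all(response(a, s) >= response(b, s) for s in all_states())
```

With `a = {"red": True, "green": False}` and `b = {"red": True}`, `a` entails `b` under both readings, and `b` entails `a` under neither. This is the "saying more" relation both definitions are meant to capture.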

TBL: I don't think you can do entailment for SVG.
... Consider an SVG that draws a purchase order.
... SVG entailment would allow a purchase order of 17 items to be substituted for a purchase order for 3 items.
... The entailment at the purchase order will be completely different.

DanC: The entailment relationship will preserve which pixels are red and green.

Noah: It gets worse. When you version a language, if all you're looking for is that only new documents will do new things, then I think that makes sense.
... The same PO in the SVG 1 spec will render the same way in the SVG 2 spec.
... That's not true of all languages.
... It's circular to say that "that is compatible" in the framework.

DanC: I think it would be useful to say that as long as your language satisfies these criteria then you don't have to say any more. But if it defines compatibility some other way, you have to think really hard.

HT: Is the functional assumption ok for Theta?
... The notion of the data model (in this paper) is too simple: any language with keys in it may be mapped directly into updates at the model level, violating the implicit appeal to some kind of context-free abstract syntax kind of story.
... An <address> might be interpreted immediately as an update to a database row keyed by name.

TBL: When an update has been made, there's some information content in that.
... When you're accumulating information, then special things occur. There's a 1:1 correspondence with the entailment in each of those cases.

HT: There are also conditional statements to be considered. There was a simplifying assumption that needs to be unpacked. I'm not saying there are going to be catastrophes, but...
... Consider adding <xs:attribute name="foo" use="prohibited">.
... This results in less information in the data model.
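Here is a toy illustration of the kind of non-additive Theta HT points to with `use="prohibited"`. It does not implement XML Schema's real component semantics. The hypothetical `theta` below maps a list of (name, use) attribute declarations to the set of permitted attribute names, starting from a hypothetical inherited set. The bigger text yields the smaller data model.

```python
# Hypothetical attributes inherited from a base type; not real XML Schema semantics.
BASE_ATTRIBUTES = frozenset({"foo", "bar"})


def theta(declarations: list) -> frozenset:
    """Toy Theta: (name, use) declarations -> set of permitted attribute names."""
    permitted = set(BASE_ATTRIBUTES)
    for name, use in declarations:
        if use == "prohibited":
            permitted.discard(name)  # a declaration that *removes* information
        else:
            permitted.add(name)
    return frozenset(permitted)


smaller_text = [("baz", "optional")]
bigger_text = smaller_text + [("foo", "prohibited")]

# The bigger text does not yield a bigger data model.
print(sorted(theta(smaller_text)))  # ['bar', 'baz', 'foo']
print(sorted(theta(bigger_text)))   # ['bar', 'baz']
```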

Noah: Doesn't this depend on your point of view? If your model is a running list of which attributes will be validated, then it's not monotonic.

HT: This is about two different schema documents, so there's no time component.

Noah: If you modeled it in a way that followed the syntax more closely you might get different results.

Critical point seems to have been that this is an existence proof of a certain class of problems.

Noah: We're trying to teach users what information is in their documents.

HT: It's perfectly coherent to define a Theta that isn't additive.

Noah: It's coherent, but is it the case that whenever you define Theta that way, you could have defined it another way?

HT: The punch line is that if this works, we'll have an opportunity to state two levels of relationship which will ground the versioning paper:

1. Two texts from a language are equivalent if they correspond to the same data model.

2. A text A from a language is compatible with another text B if A corresponds to a data model which entails the data model corresponding to B.

HT: Attributes with single and double quotes, for example, are data model equivalent.
... In general, this is informal. In a language with additive semantics, given any two texts where one is a superset of the other, the bigger one will entail the smaller one.
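Relations 1 and 2 can be sketched for a toy additive language. Here Theta collects (element, attribute, value) facts from a text, and entailment is simply the superset relation on those fact sets. The fact-set data model is an assumption made for illustration, and it ignores element nesting. Even so, it shows why single- and double-quoted attributes come out equivalent and why a larger text entails a smaller one.

```python
import xml.etree.ElementTree as ET


def theta(text: str) -> frozenset:
    """Toy additive Theta: text -> set of (element tag, attribute, value) facts.

    Assumption for illustration only: nesting and element order are ignored.
    """
    root = ET.fromstring(text)
    return frozenset(
        (el.tag, name, value)
        for el in root.iter()
        for name, value in el.attrib.items()
    )


def entails(model_a: frozenset, model_b: frozenset) -> bool:
    """With additive semantics, a model entails any model whose facts it contains."""
    return model_a >= model_b


def equivalent(text_a: str, text_b: str) -> bool:
    """Relation 1: both texts correspond to the same data model."""
    return theta(text_a) == theta(text_b)


def compatible(text_a: str, text_b: str) -> bool:
    """Relation 2: A's data model entails B's data model."""
    return entails(theta(text_a), theta(text_b))
```

For example, `equivalent("<po id='1'/>", '<po id="1"/>')` holds. `compatible('<po id="1"><item sku="x"/></po>', '<po id="1"/>')` also holds, but not the other way round.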

DanC: That's the definition of monotonicity.

TBL: This "compatible with" is something we needed when we were talking about I1 and I2.

Noah: I have a text A and in terms of Language 1, certain things follow.
... Now I kept the text, but I sent it to someone else using Language 2.
... I thought that we were going to have statements that told us something about that case.

HT: 3. A language L1 is compatible with a language L2 with respect to a text A if the data model corresponding to A in L1 entails the data model corresponding to A in L2.
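Relation 3 fixes the text and varies the language. In this sketch, two hypothetical versions of a purchase-order vocabulary differ only in which attributes their Theta understands. The vocabularies, the element names, and the superset-as-entailment rule are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def make_theta(vocabulary: set):
    """Build a toy Theta that keeps only (tag, attribute, value) facts it understands."""
    def theta(text: str) -> frozenset:
        root = ET.fromstring(text)
        return frozenset(
            (el.tag, name, value)
            for el in root.iter()
            for name, value in el.attrib.items()
            if name in vocabulary
        )
    return theta


# Hypothetical language versions: v2 understands one more attribute than v1.
theta_v1 = make_theta({"id", "qty"})
theta_v2 = make_theta({"id", "qty", "colour"})


def language_compatible(theta_l1, theta_l2, text: str) -> bool:
    """L1 is compatible with L2 w.r.t. text iff L1's model of it entails L2's (superset)."""
    return theta_l1(text) >= theta_l2(text)
```

For the text `<po id="7"><item qty="3" colour="red"/></po>`, `language_compatible(theta_v2, theta_v1, text)` is true and the reverse is false. The newer language extracts everything the older one does from that text, plus the colour.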

TBL: You talk about entailment in terms of two sets of actions. This is at level 3.
... Two languages with two different data models can have the same information (SVG and HTML documents that both say to close the door)
... If you allow yourself to say that the semantics of one entail the semantics of the other, then we can tie these things together.

TBL draws a picture

TBL: It would be nice if we could talk about sets of instructions or claims (some philosophical background work necessary)

<DanC_lap> (I actually did come up with the correspondence timbl just asked "you and the philosophers" to come up with... pointer coming...)

TBL: If we have a PO in XML and another in RDF, we need to be able to tell if they are compatible in the real world.

<DanC_lap> (how to make PHI functional: http://lists.w3.org/Archives/Public/www-tag/2006Sep/0040.html )

HT: I'm not comfortable talking about the real world as information.

TBL: For me level three isn't the lawnmower, it's information about the lawnmower.

DanC: Returning to the analogy between the "functional" terminology that you objected to, Pat, and the conventional terminology, here's what I have in mind...

DanC: In the conventional terminology, "An argument is valid if the truth of its premises guarantees the truth of its conclusion" (odd; that's easy to find in Suber's stuff, but I can't find it in Wikipedia). That glosses over a bunch of stuff that you have to elaborate in order to talk about multiple (versions of) languages. To elaborate, an argument from P to Q is L-valid iff for all L-interpretations I, if I(P) is true, then I(Q) is true. (The Wikipedia article calls them L-structures rather than L-interpretations, I think.) To map to the "functional meaning" terminology, flip things around just a little bit and let the "L-meaning" of P be a function from interpretations to True/False. Then we'd say: an argument from P to Q is valid iff for all interpretations I, if L-meaning(P)(I) is true, then L-meaning(Q)(I) is true.

DanC: Does that make sense, Pat? And do you see how it allows us to speak of _the_ meaning that SVG version 1.23 gives to the text "<svg>...</svg>"?
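DanC's "L-meaning" formulation translates almost directly into code. The meaning of a sentence is a function from interpretations to True/False, and an argument is valid iff no interpretation makes the premise true and the conclusion false. The sketch uses a tiny propositional language chosen for illustration, with nested tuples for sentences and dicts for interpretations. A different language version would simply be a different `meaning` function over the same sentences.

```python
from itertools import product


def meaning(sentence):
    """L-meaning: sentence -> function from interpretations to True/False."""
    if isinstance(sentence, str):  # an atom
        return lambda interp: interp[sentence]
    op, *args = sentence
    parts = [meaning(a) for a in args]
    if op == "not":
        return lambda interp: not parts[0](interp)
    if op == "and":
        return lambda interp: all(p(interp) for p in parts)
    if op == "or":
        return lambda interp: any(p(interp) for p in parts)
    raise ValueError(f"unknown operator {op!r}")


def atoms(sentence) -> set:
    """Atom names mentioned in a sentence."""
    if isinstance(sentence, str):
        return {sentence}
    return set().union(*(atoms(a) for a in sentence[1:]))


def valid(premise, conclusion) -> bool:
    """Valid iff for all interpretations I, meaning(P)(I) implies meaning(Q)(I)."""
    names = sorted(atoms(premise) | atoms(conclusion))
    for values in product((False, True), repeat=len(names)):
        interp = dict(zip(names, values))
        if meaning(premise)(interp) and not meaning(conclusion)(interp):
            return False
    return True
```

For example, `valid(("and", "p", "q"), "p")` is true, while `valid("p", ("and", "p", "q"))` is false.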

HT: The label Phi is not a label for the relationship between the data model and the set of claims it makes. It's a name for "is domain of discourse of". It maps to the real world.
... It's a separate question if there's a function that maps from that data model to the set of claims.

Chair calls a break

TBL's diagram http://www.w3.org/2001/tag/2006/10/04-whiteboard-2.jpg

Noah: What's the story we tell about information in Dave's finding? I was struggling with a few things.
... It seemed like a data model and that isn't the same for RDF/XML/key-value pairs, etc.
... The idea that came to me is that having information is being able to answer questions.
... I start out not knowing anything. Handed a document, I still don't know anything.
... Handed a description of the language that the document is written in, I can now start to answer questions. I have information.
... By phrasing it this way, I can cover not only the information but also its structure.
... The infoset language lets me answer questions like "what's the root element of the document", etc.
... Now I have access to structure.
... But some folks want to be able to connect information to things in the real world.
... Not all languages aspire to that. Consider keyword/value pairs.

<timbl> like plists, i imagine

Noah: Like all languages, the spec lets you answer certain questions. You can say what the value is for a given keyword. But that's pretty much it.
... But what if I do want to do RDF? Then the questions I get to answer in that language are much deeper and more interesting.
... We're also looking for a subset/superset relationship. I think it's in some sense pretty clear. To the extent that in two different languages the same text allows you to answer an overlapping set of questions, you've got a story to tell.
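Noah's framing treats information as the set of questions a language lets you answer about a text. That can be sketched by modelling each language as a function from a text to a question-to-answer mapping. Both languages below are hypothetical: a plain keyword/value language, and a richer one that answers one extra question over the same texts.

```python
def kv_language(text: str) -> dict:
    """Hypothetical keyword/value language: answers 'value of <key>?' and nothing else."""
    answers = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            answers[f"value of {key.strip()}?"] = value.strip()
    return answers


def kv_plus_language(text: str) -> dict:
    """Hypothetical richer language over the same texts: also answers a count question."""
    answers = kv_language(text)
    answers["how many keys?"] = str(len(answers))  # counted before this entry is added
    return answers


def shared_information(lang_a, lang_b, text: str) -> dict:
    """Questions both languages answer about the text, with the same answer."""
    a, b = lang_a(text), lang_b(text)
    return {q: a[q] for q in a.keys() & b.keys() if a[q] == b[q]}
```

For the text `"colour = red\nsize = 3"`, every question the keyword/value language answers is also answered, identically, by the richer language. That is the kind of subset/superset relationship Noah describes.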

DanC: You can write a language where the part 3 bit is elements.
... That's the more traditional way. There's syntax and semantics.
... This comes back to the discussion of SVG semantics. If you have an SVG purchase order, you have two languages. The SVG language tells you what pixels to turn on; the other language lets you order things.

Noah: This seems like the kind of story you could tell in simple words.
... In my formulation, there are just questions. It might be that this is the same as what Henry did, but I can't prove that.

DanC: I'm pretty sure that we're going to need more than that to do the finding.

Noah: Clearly where I say "question" you want something more formal.

DanC: I'm not convinced that three levels are necessary.

HT: I'm pretty sure that they are.

DanC engages in thirty minutes of joint editing on the UML versioning diagram

<Noah> Hmm. Was disconnected. In case it wasn't minuted, it seems important to me that the commonsense definition of "Information Set" is "All the things you can convey using this language".

<DanC_lap> link, again http://lists.w3.org/Archives/Public/www-tag/2006Sep/0040.html

<ht_vancouver> http://www.ltg.ed.ac.uk/~ht/language.html

Recessed for lunch

- DRAFT -
