Invisible XML – 20 February 2024

Meeting minutes

Date: 2024-02-20

Review of agenda

ACTION 2023-01-10-f continues

ACTION 2023-10-17-a completed

ACTION 2023-11-28-a continues

ACTION 2023-11-28-c continues

ACTION 2023-11-28-e continues

ACTION 2024-01-09-a completed

Status reports

John: I've updated the workbench to fix bugs. Names weren't being recognized unless they were followed by a space.

John: Working on experiments with string matching and regular expression matching. Tidied up the UX a bit.

Norm: No status yet.

Bethan: The PhD is submitted!

Bethan: Have also started working on my implementation again!

Michael: Nothing to report.

Steven: I've staged the next version, but haven't pushed the changes. Time has been focused on paper for Prague.

Steven: Paper is about round-tripping iXML.

John: I'm also doing some experiments on round-tripping. Can you create a stylesheet from the grammar such that, if you run it on the output, it will provide a flattened result?

Bethan: My instinct is that the grammar is to some extent a schema for the output XML, or at least embodies the same information as the part of the schema that covers the output you're producing.

John: Things become very complicated where operators are added back into the right places.

Steven: Someone else is submitting an iXML paper at Prague; he wants to use iXML to reparse XML, to extract information from the text nodes and put it back into the XML.

John: This is something like the work I've done parsing XPath out of XSLT.

Some additional discussion of the problems of round-tripping.

Bethan: Could you leverage a schema to produce a grammar to parse some texts into the grammar?

Nods of agreement: there's something interesting about the intersection between grammars and schemas.

Publication of ixml spec as W3C CG Report

Steven: This is finished, but there are some URL problems.

ACTION: Steven to contact W3C to get the report links fixed on the report and ixml group pages.

Issue #139 Sample grammars for IRIs and URIs

Steven: I published something today; we can discuss it next week.

Issue #202 Spec should say Unicode version is implementation-defined

ACTION: Steven to amend the specification to describe how Unicode is version-dependent.
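
As an illustration of the version dependence behind this issue (not something presented at the meeting): ixml character classes refer to Unicode general categories, and the category of a code point depends on which version of the Unicode Character Database the implementation uses. A minimal Python sketch follows; the helper name `category_report` is hypothetical, and the values reflect whatever Unicode data the local Python runtime bundles.

```python
import unicodedata

def category_report(chars):
    """Map each character to its general category under this runtime's Unicode data."""
    return {ch: unicodedata.category(ch) for ch in chars}

# The Unicode version bundled with this Python runtime; other runtimes may differ.
print(unicodedata.unidata_version)

# Long-assigned characters are stable; code points assigned in a newer Unicode
# version report "Cn" (unassigned) on a runtime with older data.
print(category_report("A1 "))
```

Two implementations built against different Unicode versions could therefore disagree on whether a recently assigned character belongs to a class such as `[L]`.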

Issue #199 Require whitespace between prolog and first rule?

Norm: Whitespace is required between rules but not between the prolog and the first rule.

Norm: I think we should be consistent.

Steven: The space is needed between rules to avoid ambiguity; this is a change for the sake of change.

John: If we start to put multiple things in the prolog, things may get ambiguous.

Norm: I'm not going to lie down in the road if we leave this until we need to do it.

Bethan: I think given it's backwards incompatible, we should do it sooner rather than later.

Steven: We also have a larger prolog issue that I raised in email.

ACTION: Michael to make sure that Steven's prolog issues are turned into trackable issues.

Issue #192 Normalizing line endings in ixml inputs

Steven: This is a request from the broader community for a way to specify a line ending that's not platform-dependent.

Norm: That's the issue with a particular spin; I think the user would like us to just normalize to #A and move on.

Michael: I don't know what they do on an IBM mainframe that uses variable-length records. I suppose the obvious thing to do is to say that a record boundary turns into a #A.

Some discussion of what IBM mainframes actually do for storing text files.

Michael: I think the proposal is that the iXML spec should say that an implementation presents end-of-lines as #A regardless of the platform.

Steven: My problem with that solution is that if I get files over the web, I don't know where they came from.

Norm: I think XML solves this; there's a simple algorithm for deciding whether, and which, sequences of characters are turned into a single #A.

Michael: I think it boils down to: when you're reading a character stream, you normalize line feeds. You have some built-in understanding of line boundaries and you recognize them.
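
For reference, the XML 1.0 rule Norm alludes to translates the two-character sequence #D #A, and any #D not followed by #A, into a single #A before parsing. A minimal Python sketch of that rule follows; the function name `normalize_line_endings` is hypothetical, and this is not a proposal for what the ixml spec should say. XML 1.1 treats a few more characters as line ends, which is not shown here.

```python
def normalize_line_endings(text: str) -> str:
    """Translate CRLF and lone CR to a single LF (#A), as XML 1.0 does."""
    # Replace the two-character sequence first so it yields one #A, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

With this applied to the input stream, a grammar only ever sees #A as a line boundary, wherever the file came from.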

Michael: The question in my mind is, if you wanted to use iXML to do something a little closer to the metal, how would you do it?

Norm: I think if you want to do that, you want to treat the input as some kind of binary, so it's out of scope.

Some discussion of the circumstances when you might want to process "binary" of one sort or another.

Bethan: Why not introduce an end-of-line marker for a non-platform-specific end-of-line?

Steven: That's what I proposed in response to Norm.

Bethan: My suggestion is that the character would be a shortcut for the expansion \n, \r\n, \r, etc.

John: Would you be able to do that in a member string?

Steven: You can't negate that easily, that was part of my example.

Michael: I may be misunderstanding, but I'm not a big fan of what Bethan suggests; I'm also not sure the problem Steven identifies is a real one.

Michael: Suppose we choose a single character for an abstract "end of line", say NEL. If we said NEL means any linefeed, then when that's in the grammar...

Steven: I think what Bethan proposed is a character in the grammar that represents this.

Michael: I can say, give me anything that's not that character.

Steven: But it's not only one character.

Michael: If we do normalize, I think normalizing the same way as XML would be wiser, therefore #A.
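
The negation point in the exchange above can be sketched with regular expressions rather than ixml; this is an analogy only, not a reconstruction of Steven's example. The names `EOL`, `LINE_RAW`, and `LINE_NORMALIZED` are hypothetical. When "end of line" stands for several sequences, "anything that is not an end of line" cannot be written as the negation of one character; once the input is normalized to #A, it can.

```python
import re

# Hypothetical abstract end-of-line: CRLF, lone CR, or lone LF.
# CRLF is listed first so the pair is consumed as one line ending.
EOL = r"(?:\r\n|\r|\n)"

# Unnormalized input: the line body must exclude every character that can
# start a line ending, because the ending itself is not a single character.
LINE_RAW = re.compile(r"([^\r\n]*)" + EOL)

# Input already normalized to #A: a plain negated set is enough.
LINE_NORMALIZED = re.compile(r"([^\n]*)\n")

def lines_raw(text):
    """Return the bodies of all terminated lines in unnormalized text."""
    return LINE_RAW.findall(text)

def lines_normalized(text):
    """Return the bodies of all terminated lines in LF-normalized text."""
    return LINE_NORMALIZED.findall(text)
```

Both functions return the same line bodies for equivalent inputs; the difference is that the normalized form only has to negate one character.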

Any other business?

John: I think we should try to keep track of where people are using iXML. I got a bug report from someone using iXML to parse some genetic data.

Adjourned.

<cmsmcq> FWIW, US daylight time starts 10 March. (Yikes.)

<cmsmcq> In the UK, it starts 31 March.

– DRAFT –

Attendees