Meeting minutes
Date: 2024-02-20
Review of agenda
ACTION 2023-01-10-f continues
ACTION 2023-10-17-a completed
ACTION 2023-11-28-a continues
ACTION 2023-11-28-c continues
ACTION 2023-11-28-e continues
ACTION 2024-01-09-a completed
Status reports
John: I've updated the workbench to fix bugs. Names weren't being recognized unless they were followed by a space.
John: Working on experiments with string matching and regular expression matching. Tidied up the UX a bit.
Norm: No status yet.
Bethan: The PhD is submitted!
Bethan: Have also started working on my implementation again!
Michael: Nothing to report.
Steven: I've staged the next version, but haven't pushed the changes. Time has been focused on the paper for Prague.
Steven: Paper is about round-tripping iXML.
John: I'm also doing some experiments on round-tripping. Can you create a stylesheet from the grammar such that, if you run it on the output, it will provide a flattened result?
Bethan: My instinct is that the grammar is to some extent a schema for the output XML, or at least embodies the same information as part of the schema for the output you're producing.
John: Things become very complicated where operators are added back into the right places.
Steven: Someone else is submitting an iXML paper at Prague; he wants to use iXML to reparse XML, to extract information from the text nodes and put it back into the XML.
John: This is something like the work I've done parsing XPath out of XSLT.
Some additional discussion of the problems of round tripping.
Bethan: Could you leverage a schema to produce a grammar to parse some texts into the grammar?
Nods of agreement: there's something interesting about the intersection between grammars and schemas.
Publication of ixml spec as W3C CG Report
Steven: This is finished, but there are some URL problems.
ACTION: Steven to contact W3C to get the report links fixed on the report and ixml group pages.
Issue #139 Sample grammars for IRIs and URIs
Steven: I published something today; we can discuss it next week.
Issue #202 Spec should say Unicode version is implementation-defined
ACTION: Steven to amend the specification to describe how Unicode is version-dependent.
Issue #199 Require whitespace between prolog and first rule?
Norm: Whitespace is required between rules but not between the prolog and the first rule.
Norm: I think we should be consistent.
Steven: The space is needed between rules to avoid ambiguity; this is a change for the sake of change.
John: If we start to put multiple things in the prolog, things may get ambiguous.
Norm: I'm not going to lie down in the road if we leave this until we need to do it.
Bethan: I think given it's backwards incompatible, we should do it sooner rather than later.
Steven: We also have a larger prolog issue that I raised in email.
ACTION: Michael to make sure that Steven's prolog issues are turned into trackable issues.
Issue #192 Normalizing line endings in ixml inputs
Steven: This is a request from the broader community for a way to specify a line ending that's not platform-dependent.
Norm: That's the issue with a particular spin. I think the user would like us to just normalize to #A and move on.
Michael: I don't know what they do on an IBM mainframe that uses variable-length records. I suppose the obvious thing to do is to say that a record boundary turns into a #A.
Some discussion of what IBM mainframes actually do for storing text files.
Michael: I think the proposal is that the iXML spec should say that an implementation presents end-of-lines as #A regardless of the platform.
Steven: My problem with that solution is that if I get files over the web, I don't know where they came from.
Norm: I think XML solves this; there's a simple algorithm for deciding whether, and which, sequences of characters are turned into a single #A.
Michael: I think it boils down to: when you're reading a character stream, you normalize line feeds. You have some built-in understanding of line boundaries and you recognize them.
Michael: The question in my mind is, if you wanted to use iXML to do something a little closer to the metal, how would you do it?
Norm: I think if you want to do that, you want to treat the input as some kind of binary, so it's out of scope.
Some discussion of the circumstances when you might want to process "binary" of one sort or another.
Bethan: Why not introduce an end-of-line marker for a non-platform-specific end-of-line?
Steven: That's what I proposed in response to Norm.
Bethan: My suggestion is that the character would be a shortcut for the expansion \n, \r\n, \r, etc.
John: Would you be able to do that in a member string?
Steven: You can't negate that easily, that was part of my example.
Michael: I may be misunderstanding, but I'm not a big fan of the idea of what Bethan suggests; but I'm not sure the problem Steven identifies is a real one.
Michael: Suppose we choose a single character for abstract "end of line"; NEL. If we said NEL means any linefeed, so when that's in grammar...
Steven: I think what Bethan proposed is a character in the grammar that represents this.
Michael: I can say, give me anything that's not that character.
Steven: But it's not only one character.
Michael: If we do normalize, I think normalizing the same way as XML would be wiser, therefore #A.
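For reference, the XML-style normalization that Norm and Michael mention can be sketched as follows. This is only a minimal sketch, assuming the XML 1.0 rule in which each CRLF pair and each lone CR becomes a single #A; the function name normalize_line_endings is hypothetical, and nothing here reflects a decision by the group.

```python
import re

# Assumption: XML 1.0 end-of-line handling, where CRLF and a lone CR
# are each translated to a single LF (#A). CRLF is listed first so a
# pair is replaced once rather than as two separate line ends.
_LINE_END = re.compile(r"\r\n|\r")


def normalize_line_endings(text: str) -> str:
    """Return text with every CRLF pair and every lone CR replaced by LF."""
    return _LINE_END.sub("\n", text)


if __name__ == "__main__":
    sample = "first\r\nsecond\rthird\nfourth"
    print(normalize_line_endings(sample).split("\n"))
```

With input normalized this way, a grammar would only need to mention #A, so "anything that is not an end of line" becomes the negation of a single character.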
Any other business?
John: I think we should try to keep track of where people are using iXML. I got a bug report from someone using iXML to parse some genetic data.
Adjourned.
<cmsmcq> FWIW, US daylight time starts 10 March. (Yikes.)
<cmsmcq> In the UK, it starts 31 March.