29496 2016-02-21 18:39:08 +0000 [FO31] parse-ietf-date with military timezones and leniency towards single-digit numbers 2016-07-21 15:56:23 +0000 1 1 1 Unclassified XPath / XQuery / XSLT Functions and Operators 3.1 Candidate Recommendation PC Windows NT CLOSED FIXED P2 minor --- 1 abel.braaksma mike andrew_coleman debbie liam public-qt-comments oldest_to_newest

125192 0 abel.braaksma 2016-02-21 18:39:08 +0000
If I understand the text in the internal draft and CR correctly, the function fn:parse-ietf-date is meant to parse a date that approximates the formats of RFC-822, RFC-1123, RFC-850, RFC-1036 and POSIX asctime. It is more liberal than the more restrictive grammar in RFC-2616. I have a few observations:

1) I am missing the military timezones allowed by RFC-822. Since format-dateTime can create them, it seems to make sense to allow them as input as well.

2) In a similar vein, given the note on "be liberal in what to accept", it seems to make sense to allow unmentioned timezones with an implementation-defined offset. Currently that is an error (but this may well be intentional).

3) The text explains for each absent token or partial token what the default is, but not for fractional seconds. Obviously this must be zero and perhaps it is a bit too pedantic to add it, but nevertheless, all the other optional parts of the grammar have such a mention.

4) The Note on leniency towards single-digit vs double-digit numeric values says "Accepts a single-digit value in place of a two-digit value with a leading zero". This appears to imply "in a place where two digits can be replaced by a single digit then...". But the grammar only allows this for the daynum, not for hours. Is "3:45" to be treated as an error or may it be parsed as "03:45"? If the latter was the intent of this Note, I think the grammar should reflect that, or the Note could perhaps give it as an example (or conversely, mention specifically that *only* daynum can be treated this way).
5) Perhaps the 4th paragraph of the Note could be written as follows to reflect point (4) above or, more generally, to remove any impression that the grammar should not be taken too strictly (which I doubt is the intent):

Suggestion to replace:
"Reflecting the internet tradition of being liberal in what is accepted, the function also:"
with:
"Reflecting the internet tradition of being liberal in what is accepted, the grammar of the function deliberately accepts:"

125441 1 andrew_coleman 2016-03-11 13:25:17 +0000
The WG agreed on 2016-03-01: DECISION: (bug 29496) accepted one technical change, allowing the hours value to be single digit, plus editorial clarifications as suggested in points (3) and (5).

125577 2 mike 2016-03-21 22:24:31 +0000
The changes have been applied.

125581 3 mike 2016-03-22 09:30:26 +0000
Note: I interpreted "the hours component" to include both the hours part of the time, and the hours part of the timezone. A test case has been added.

125701 4 abel.braaksma 2016-04-03 23:21:15 +0000
I took a moment to review the changes in the internal WD as written now, and it looks like the text and the grammar were updated as expected. The suggested text for the "liberal in what to accept" paragraph was changed differently than proposed, but I think the revised text is better/clearer. Thanks.

126044 5 debbie 2016-04-22 16:32:54 +0000
Being picky, please can you also update the following sentence of the Rules for parse-ietf-date():
"If a tzoffset is supplied then its first two digits supply the hours part of the timezone offset, and its next two digits, if present, supply the minutes part."
to say "...its first one or two digits..."
(Note there were a couple of other tests, parse-ietf-date-errs5 and parse-ietf-date-errs28, which expected errors for single digit hours components, which I have modified.)
126156 6 mike 2016-04-26 14:21:33 +0000
Having slight difficulty working out how best to say this in a way that actually tells people that a tzoffset of 130 means one hour and 30 minutes, without relying on the reader's common sense, or at the other extreme treating them like idiots...

The production rule is

    tzoffset ::= ("+"|"-") hours ":"? minutes?

where

    hours ::= digit digit?
    minutes ::= digit digit

So assuming people know how to parse from BNF, I think the best way would be to refer to the parts of the production rule:

"If a @tzoffset@ is supplied then @hours@ supplies the hours part of the timezone offset, and @minutes@, which defaults to zero if absent, supplies the minutes part."

That's dangerously close to being tautological but I think the font changes make clear the distinction between syntactic components of the supplied string and semantic components of the resulting value.

126214 7 abel.braaksma 2016-04-27 17:49:59 +0000
I'd like to suggest disallowing 130 and allowing 1:30, 01:30 and 0130. I don't think anyone would expect military time (which I believe this format comes from) to be other than four digits. That would mean a slight change in the production rules, for instance:

    tzoffset ::= ("+"|"-") (hours (":" minutes)? | miltime)
    miltime ::= milhours minutes
    milhours ::= digit digit
    hours ::= digit digit?
    minutes ::= digit digit

126234 8 liam 2016-04-27 21:50:32 +0000
I don't think it's worth making the change Abel suggests in comment 7. I'd have to go back and check the RFCs and implementations and data to see if we'd be rejecting in-use values used by automatically-generated datestamps (the primary use case for this function), which I could do but would rather not - people have implemented what they've implemented at this point I expect, and I don't think rejecting values because they look odd to us is a good approach. A note that 130 (for example) is short for 0130 or 01:30 might be helpful.
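To make the difference between the two candidate grammars concrete, here is a minimal Python sketch that transcribes the production rule quoted in comment 6 and the alternative proposed in comment 7 into regular expressions. The names `STATUS_QUO`, `PROPOSED` and `accepts` are hypothetical, chosen only for this illustration; the sketch covers just the `tzoffset` production and makes no claim about how any implementation of fn:parse-ietf-date works.

```python
import re

# Comment 6 rule: tzoffset ::= ("+"|"-") hours ":"? minutes?
# with hours ::= digit digit? and minutes ::= digit digit
STATUS_QUO = re.compile(r"[+-]\d\d?:?(?:\d\d)?")

# Comment 7 proposal:
# tzoffset ::= ("+"|"-") (hours (":" minutes)? | miltime)
# with miltime ::= digit digit digit digit
PROPOSED = re.compile(r"[+-](?:\d\d?(?::\d\d)?|\d\d\d\d)")


def accepts(rule: re.Pattern, tzoffset: str) -> bool:
    """Return True if the whole string matches the given rule."""
    return rule.fullmatch(tzoffset) is not None


if __name__ == "__main__":
    # Print, for each sample, whether each grammar accepts it.
    for tz in ["+1:30", "+01:30", "+0130", "+130"]:
        print(tz, accepts(STATUS_QUO, tz), accepts(PROPOSED, tz))
```

Under these transcriptions, the two rules agree on `+1:30`, `+01:30` and `+0130`, and differ only on the three-digit form `+130`, which the comment 6 rule accepts and the comment 7 proposal rejects.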
126245 9 abel.braaksma 2016-04-28 00:52:52 +0000
(In reply to Liam R E Quin from comment #8)
> I'd have to go back and check the RFCs and implementations and data to see if
> we'd be rejecting in-use values used by automatically-generated datestamps

Probably not needed, as my proposal is a partial reversion of a change following the accepted proposal in comment#1 and the editorial license mentioned in comment#3. In fact, I think it is closer to the original decision of comment#1. In other words, we remove something that was accidentally added.

126247 10 liam 2016-04-28 02:56:10 +0000
Thanks, Abel. I agree that a TZ offset of 130 is weird but I don't see it as a problem. Nonetheless I'm also OK with disallowing it and allowing only (1:30, 01:30, 0130). It doesn't come up often in practice but it does happen (Newfoundland is an example). So on rereading in that spirit I'm OK with your comment 7, although let's use tzoffset and not miltime for the name -- "Military time" is sometimes used to mean the 24-hour clock system, and I don't think you're proposing to disallow 130 as equivalent to 1:30am in the time part.

126363 11 andrew_coleman 2016-05-06 09:56:27 +0000
At the meeting on 2016-05-03, the WG decided to retain the status quo with no further changes, allowing TZ 130.

127020 12 mike 2016-07-21 15:56:23 +0000
Comment #5 had been overlooked. I have amended the relevant paragraph to read:

* If it contains a colon, this separates the hours part from the minutes part.
* Otherwise, the grammar allows a sequence of from one to four digits. These are interpreted as <code>H</code>, <code>HH</code>, <code>HMM</code>, or <code>HHMM</code> respectively, where <code>H</code> or <code>HH</code> is the hours part, and <code>MM</code> (if present) is the minutes part.</p></item>
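The interpretation rule in comment 12 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spec's or any processor's implementation: the function name `tzoffset_to_minutes` and the pattern `_TZOFFSET` are hypothetical, minutes after a colon are treated as optional (as in the grammar quoted in comment 6), and no range check is made on the resulting offset.

```python
import re

# Hypothetical pattern: a sign, then either "hours:minutes?" or 1-4 bare digits.
_TZOFFSET = re.compile(r"([+-])(?:(\d\d?):(\d\d)?|(\d{1,4}))")


def tzoffset_to_minutes(tzoffset: str) -> int:
    """Convert a tzoffset string such as "+130" to signed minutes."""
    m = _TZOFFSET.fullmatch(tzoffset)
    if m is None:
        raise ValueError(f"not a tzoffset: {tzoffset!r}")
    sign, hours, minutes, digits = m.groups()
    if digits is not None:
        # No colon: the digits are read as H, HH, HMM or HHMM.
        # The last two digits are minutes only for three or four digits.
        if len(digits) <= 2:
            hours, minutes = digits, None
        else:
            hours, minutes = digits[:-2], digits[-2:]
    # An absent minutes part defaults to zero.
    total = int(hours) * 60 + int(minutes or 0)
    return -total if sign == "-" else total


if __name__ == "__main__":
    print(tzoffset_to_minutes("+130"))   # 90
    print(tzoffset_to_minutes("-0330"))  # -210
```

With this reading, `+130`, `+0130` and `+1:30` all denote the same offset of 90 minutes, which is the outcome the WG retained on 2016-05-03.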