This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29496 - [FO31] parse-ietf-date with military timezones and leniency towards single-digit numbers
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 minor
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-02-21 18:39 UTC by Abel Braaksma
Modified: 2016-07-21 15:56 UTC

Description Abel Braaksma 2016-02-21 18:39:08 UTC
If I understand the text in the internal draft and CR correctly, the function fn:parse-ietf-date is meant to parse a date in a format that approximates RFC-822, RFC-1123, RFC-850, RFC-1036, or POSIX asctime. It is more liberal than the more restrictive grammar in RFC-2616.

I have a few observations:

1) I am missing the military timezones allowed by RFC-822. Since format-dateTime can create them, it seems to make sense to allow them as input as well.

2) In a similar vein, given the note on "be liberal in what to accept", it seems to make sense to allow unmentioned timezones with an implementation-defined offset. Currently that is an error (but this may well be intentional).

3) The text explains for each absent token or partial token what the default is, but not for fractional seconds. Obviously this must be zero and perhaps it is a bit too pedantic to add it, but nevertheless, all the other optional parts of the grammar have such a mention.

4) The Note on leniency towards single-digit vs double-digit numeric values says "Accepts a single-digit value in place of a two-digit value with a leading zero". This appears to imply "in a place where two digits can be replaced by a single digit then...". But the grammar only allows this for the daynum, not for hours. Is "3:45" to be treated as an error or may it be parsed as "03:45"? If the latter was the intent of this Note, I think the grammar should reflect that, or the Note could perhaps give it as an example (or conversely, mention specifically that *only* daynum can be treated this way).

5) Perhaps the 4th paragraph of the Note could be written as follows to reflect point (4) above or, more generally, to remove the confusion that the grammar should not be taken too strictly (which I doubt is the intent):

Suggestion to replace: "Reflecting the internet tradition of being liberal in what is accepted, the function also:"

with: "Reflecting the internet tradition of being liberal in what is accepted, the grammar of the function deliberately accepts:"
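
To make point (1) concrete, here is a minimal Python sketch of a lookup table for military timezone letters. It assumes the letter convention that format-dateTime is understood to use (Z = 0, A to M without J = +1 to +12 hours, N to Y = -1 to -12 hours). RFC-822 itself lists the opposite signs, which RFC-1123 describes as an error, so the signs below are an assumption rather than something this report settles. The function name is hypothetical.

```python
import string


def military_zone_offsets():
    """Return a dict mapping military timezone letters to hour offsets.

    Assumed convention (not confirmed by this bug report):
    Z = 0, A..M (skipping J) = +1..+12, N..Y = -1..-12.
    """
    # 24 letters remain once J (local time) and Z (zero offset) are set aside.
    letters = [c for c in string.ascii_uppercase if c not in "JZ"]
    table = {"Z": 0}
    for hours, letter in enumerate(letters[:12], start=1):
        table[letter] = hours      # A..I, K, L, M
    for hours, letter in enumerate(letters[12:], start=1):
        table[letter] = -hours     # N..Y
    return table
```

J is deliberately left out of the table, so a caller would have to decide separately how to treat it.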
Comment 1 Andrew Coleman 2016-03-11 13:25:17 UTC
The WG agreed on 2016-03-01:

DECISION: (bug 29496) accepted one technical change, allowing the hours value to be single digit, plus editorial clarifications as suggested in points (3) and (5).
Comment 2 Michael Kay 2016-03-21 22:24:31 UTC
The changes have been applied.
Comment 3 Michael Kay 2016-03-22 09:30:26 UTC
Note: I interpreted "the hours component" to include both the hours part of the time, and the hours part of the timezone.

A test case has been added.
Comment 4 Abel Braaksma 2016-04-03 23:21:15 UTC
I took a moment to review the changes in the internal WD as written now, and it looks like the text and the grammar were updated as expected. The suggested text for the "liberal in what to accept" was changed differently than proposed, but I think the revised text is better/clearer. Thanks.
Comment 5 Debbie Lockett 2016-04-22 16:32:54 UTC
Being picky, please can you also update the following sentence of the Rules for parse-ietf-date():

"If a tzoffset is supplied then its first two digits supply the hours part of the timezone offset, and its next two digits, if present, supply the minutes part."

to say "...its first one or two digits..."

(Note there were a couple of other tests, parse-ietf-date-errs5 and parse-ietf-date-errs28, which expected errors for single digit hours components, which I have modified.)
Comment 6 Michael Kay 2016-04-26 14:21:33 UTC
Having slight difficulty working out how best to say this in a way that actually tells people that a tzoffset of 130 means one hour and 30 minutes, without relying on the reader's common sense, or at the other extreme treating them like idiots...

The production rule is

tzoffset ::= ("+"|"-") hours ":"? minutes?

where 

hours	::=	digit digit?
minutes	::=	digit digit

So assuming people know how to parse from BNF, I think the best way would be to refer to the parts of the production rule:

"If a @tzoffset@ is supplied then @hours@ supplies the hours part of the timezone offset, and @minutes@, which defaults to zero if absent, supplies the minutes part."

That's dangerously close to being tautological but I think the font changes make clear the distinction between syntactic components of the supplied string and semantic components of the resulting value.
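
As an illustration only, the production rule quoted above can be transcribed into a regular expression. The Python sketch below assumes nothing beyond the three productions shown in this comment; the names TZOFFSET and parse_tzoffset are hypothetical, and no range checks on hours or minutes are applied because none are stated here.

```python
import re

# tzoffset ::= ("+"|"-") hours ":"? minutes?
# hours    ::= digit digit?
# minutes  ::= digit digit
TZOFFSET = re.compile(r"([+-])(\d\d?):?(\d\d)?")


def parse_tzoffset(text):
    """Split a tzoffset string into (sign, hours, minutes).

    minutes defaults to zero when absent, as in the proposed wording.
    """
    match = TZOFFSET.fullmatch(text)
    if match is None:
        raise ValueError(f"not a tzoffset: {text!r}")
    sign, hours, minutes = match.groups()
    return sign, int(hours), int(minutes) if minutes else 0
```

Because the whole string must match, "+130" can only be read as hours "1" and minutes "30": reading "13" as the hours would leave a lone digit that is not a valid minutes token. Taken literally, the rule also admits a trailing colon with no minutes (for example "+12:"), and the sketch follows it in that respect.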
Comment 7 Abel Braaksma 2016-04-27 17:49:59 UTC
I'd like to suggest disallowing 130 while allowing 1:30, 01:30 and 0130. I don't think anyone would expect military time (which I believe this format comes from) to be other than four digits.

That would mean a slight change in the production rules, for instance:

tzoffset ::= ("+"|"-") (hours (":" minutes)? | miltime)

miltime		::=	milhours minutes
milhours	::=	digit digit
hours		::=	digit digit?
minutes		::=	digit digit
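
For comparison, the stricter grammar proposed in this comment could be sketched in Python as follows. This is only a transcription of the productions above into a regular expression; the names STRICT_TZOFFSET and parse_tzoffset_strict are hypothetical.

```python
import re

# tzoffset ::= ("+"|"-") (hours (":" minutes)? | miltime)
# miltime  ::= milhours minutes      (exactly four digits)
# hours    ::= digit digit?
STRICT_TZOFFSET = re.compile(
    r"([+-])(?:(\d\d?)(?::(\d\d))?|(\d\d)(\d\d))"
)


def parse_tzoffset_strict(text):
    """Split a tzoffset into (sign, hours, minutes) under the stricter rule."""
    match = STRICT_TZOFFSET.fullmatch(text)
    if match is None:
        raise ValueError(f"not a tzoffset: {text!r}")
    sign, hours, minutes, milhours, milminutes = match.groups()
    if milhours is not None:
        # Four-digit form such as "0130".
        return sign, int(milhours), int(milminutes)
    return sign, int(hours), int(minutes) if minutes else 0
```

Under this variant "+1:30", "+01:30" and "+0130" are accepted, while the three-digit "+130" is rejected.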
Comment 8 Liam R E Quin 2016-04-27 21:50:32 UTC
I don't think it's worth making the change Abel suggests in comment 7. I'd have to go back and check the RFCs and implementations and data to see if we'd be rejecting in-use values used by automatically-generated datestamps (the primary usecase for this function), which I could do but would rather not - people have implemented what they've implemented at this point I expect, and I don't think rejecting values because they look odd to us is a good approach.

A note that 130 (for example) is short for 0130 or 01:30 might be helpful.
Comment 9 Abel Braaksma 2016-04-28 00:52:52 UTC
(In reply to Liam R E Quin from comment #8)
> I'd have to go back and check the RFCs and implementations and data to see if
> we'd be rejecting in-use values used by automatically-generated datestamps
Probably not needed, as my proposal is a partial reversion of a change following the accepted proposal in comment#1 and the editorial license mentioned in comment#3. In fact, I think it is closer to the original decision of comment#1. 

In other words, we remove something that was accidentally added.
Comment 10 Liam R E Quin 2016-04-28 02:56:10 UTC
Thanks, Abel. I agree that a TZ offset of 130 is weird but I don't see it as a problem. Nonetheless I'm also OK with disallowing it and allowing only (1:30, 01:30, 0130). It doesn't come up often in practice but it does happen (Newfoundland is an example). So on rereading in that spirit I'm ok with your comment 7, although let's use tzoffset and not miltime for the name -- "Military time" is sometimes used to mean the 24-hour clock system, and I don't think you're proposing to disallow 130 as equivalent to 1:30am in the time part.
Comment 11 Andrew Coleman 2016-05-06 09:56:27 UTC
At the meeting on 2016-05-03, the WG decided to retain the status quo with no further changes, allowing TZ 130.
Comment 12 Michael Kay 2016-07-21 15:56:23 UTC
Comment #5 had been overlooked. I have amended the relevant paragraph to read:

* If it contains a colon, this separates the hours part from the minutes part.

* Otherwise, the grammar allows a sequence of from one to four digits. These are interpreted as <code>H</code>, <code>HH</code>, <code>HMM</code>, or <code>HHMM</code> respectively, where <code>H</code> or <code>HH</code> is the hours part, and <code>MM</code> (if present) is the minutes part.
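
The two rules above can be sketched in Python by looking at the digit count rather than at the grammar. This is a minimal illustration, not the specification's algorithm: the function name offset_in_minutes is hypothetical, and the colon branch does not re-validate the number of digits on either side.

```python
def offset_in_minutes(tzoffset):
    """Convert a tzoffset such as "+130" or "-03:30" to signed minutes."""
    sign, body = tzoffset[:1], tzoffset[1:]
    if sign not in ("+", "-"):
        raise ValueError(f"missing sign: {tzoffset!r}")
    if ":" in body:
        # A colon separates the hours part from the minutes part.
        hours, _, minutes = body.partition(":")
    elif body.isascii() and body.isdigit() and 1 <= len(body) <= 4:
        if len(body) <= 2:
            hours, minutes = body, ""              # H or HH
        else:
            hours, minutes = body[:-2], body[-2:]  # HMM or HHMM
    else:
        raise ValueError(f"not a tzoffset: {tzoffset!r}")
    total = int(hours) * 60 + int(minutes or 0)
    return -total if sign == "-" else total
```

With this reading, "+130", "+0130" and "+1:30" all denote the same offset of 90 minutes, and "-0330" (the Newfoundland case mentioned in comment 10) gives -210.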