Re: [whatwg] <time>

Hi Tom (and David),

On Mar 12, 2009, at 6:57 PM, David Singer wrote:
> At 16:24  -0500 12/03/09, Robert J Burns wrote:
>> That was my point: we cannot get a clear answer out of ISO 8601.  
>> ISO 8601 only covers dates between 1582 and 9999 without  
>> supplemental norms.
>
> No, it says mutual *agreement*, not supplemental norms.  ISO  
> 8601:2004 seems perfectly clear that the year before 0001 is 0000,  
> and that -0002-04-12 means "The twelfth of April in the second year  
> before the year [0000] " (example directly from ISO 8601).  The HTML  
> spec. can constitute such agreement.

The problem isn't that negative integers are poorly defined or poorly
understood. The problem is understanding how an ISO 8601
representation – such as these examples – maps to an author's
or user's understanding of the year 2 BC (or is it 1 BC?). ISO 8601
provides no clear way out of that ambiguity. XSD Datatypes suggests a
specific solution that a future recommendation will likely include[1].
The wiki page provides an alternate approach[2]. We cannot simply leave
the ambiguity in place without creating problems for HTML for the
foreseeable future.

On Mar 14, 2009, at 5:43 PM, Tom Duhamel wrote:
>
> I feel we are close to a 'partial' consensus, to reuse your term.  
> Here is what I feel most people agree on:
>
> Use ISO 8601 with the following provisions:
> - Allow all four digit years, positive and negative

We cannot agree on this unless we go beyond ISO 8601 and define how  
those representations map to BC/BCE while also keeping consistent leap  
year rules in place (note this is an area that does require some  
calendar expertise).

> - Allow lower granularity dates: 2009-03-14, 2009-03, 2009

Agreed

> - Allow ranges: 2009-03-01/2009-03-14

Agreed
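
To make the two agreed provisions concrete, here is a minimal sketch
(my own illustration, not the algorithm from the HTML5 draft) of
parsing extended-format dates at year, month, or day granularity,
plus a "/"-separated range. The names parse_date and parse_range are
hypothetical, and the optional minus sign is accepted only
syntactically; what a negative year means is left open here.

```python
import re

# Extended format only: YYYY, YYYY-MM or YYYY-MM-DD (hyphens required).
# The leading "-" is accepted syntactically; its meaning is NOT defined here.
_DATE = re.compile(r"(-?\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_date(text):
    """Return (year, month, day); omitted components are None."""
    m = _DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"not an extended-format date: {text!r}")
    year, month, day = m.groups()
    return (
        int(year),
        int(month) if month is not None else None,
        int(day) if day is not None else None,
    )


def parse_range(text):
    """Return (start, end); a single date is treated as a one-point range."""
    start, sep, end = text.partition("/")
    if not sep:
        d = parse_date(text)
        return (d, d)
    return (parse_date(start), parse_date(end))
```

For instance, parse_range("2009-03-01/2009-03-14") yields
((2009, 3, 1), (2009, 3, 14)), while parse_date("20090314") raises
ValueError because the basic (unhyphenated) format is not accepted.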

> - Allow only extended format: 2009-03-14 (rather than 20090314)  
> which will help with simplification and future extensions

This may be more controversial. I think we could allow the omission of  
hyphens for years of four digits. Others have contended that we might  
also allow the omission of hyphens always if we omit support for  
ordinal dates (e.g., YYYY-DDD).

> I'm not sure many have arguments against any of the above. Sorry if  
> I missed anything. I don't claim we have actually reached a consensus.

I would add that keyword support for alternate calendars is an  
important part of keeping options open for HTML6 and also to help  
reduce authoring errors.

> Here are items which we are debating over, with my opinion on these:
>
> - Allow year 0000 or not?
> Actually I don't see why it's important.

It is important because we want dates before 0001-01-01 to be encoded  
unambiguously.  If we don't address such issues we need to limit  
ourselves to 0001-01-01 and later.

> I think it should be allowed. Historians deny the existence of year  
> 0, but astronomers use it.

It isn't a matter of denial or belief. This is an issue of precisely
defining a Gregorian calendar to apply to the past: a calendar which
was defined in 1582 with only the future in mind. ISO 8601 could have
stepped in and further defined the Gregorian calendar, but it punted. So
no one is denying that the year 0000 exists. The question is whether the
Gregorian calendar should call the year before 1 AD "year 0". Could
you provide some reference to astronomy's use of a year 0 between 1
AD and 1 BC? For example, the Starry Night astronomical software I use
does not have a year 0 between 1 AD and 1 BC. I often see such claims
made, but I've never seen such a definition or use of a specialized
Gregorian calendar.

> What if year 0 is accepted as a valid date for the purpose of HTML,  
> and then not used by authors? It would become available for those  
> authors who use year 0, and ignored by others. What's the implication?

If I understand current proleptic Gregorian calendar use correctly
(such as that used by astronomers), the implication is that HTML will
not match the other uses of that calendar. ISO 8601 has unambiguous
leap year rules which are to be applied to 0000 and negative years as
well. If we accept year 0, then -0001 would mean 2 BC. That's fine,
but we need to make it clear to authors, and there's some concern that
authors would make the mistake of thinking -0001 was instead 1 BC.
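
As a sketch of what "accepting year 0" would imply, the following
assumes the numbering in which 0000 is the year before 0001 and
applies the ordinary Gregorian leap year rule to every integer year.
The function names are hypothetical, and the BC/AD labels are just
one possible presentation.

```python
def era_label(year):
    """Map a year number (with a year 0) to a BC/AD label.

    Assumption: 0000 is the year before 0001, so 0 -> 1 BC, -1 -> 2 BC.
    """
    if year >= 1:
        return f"{year} AD"
    return f"{1 - year} BC"


def is_leap(year):
    """Gregorian leap year rule applied uniformly, including year 0
    and negative years."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for y in (1, 0, -1, -4):
    print(f"{y:+05d} -> {era_label(y)}, leap={is_leap(y)}")
# +0001 -> 1 AD, leap=False
# +0000 -> 1 BC, leap=True
# -0001 -> 2 BC, leap=False
# -0004 -> 5 BC, leap=True
```

The off-by-one between the written year and the BC label is exactly
the mistake authors are likely to make.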

> HTML parser does not perform math over dates, it merely displays  
> information based on what an author instructed.

However, if the UA displays the information in a non-machine-readable
form, how does it make that conversion for presentation purposes? Does
0000-01-01 get displayed as 1 January 1 BC? Or as 1 January 00?

> Here is the only debate I see over this:
> ISO 8601:2000 and above suggest that year 0000 be used and be  
> considered year 1 BC, and then -0001 is 2 BC, etc.

Could you cite specifically where you get that from ISO 8601:2000? I'm  
looking at ISO 8601:2004 and don't see a clear indication of that  
anywhere.

> Most, I believe, will want year -0004 = 4BC (and this is what I'd  
> suggest).

If -0004 equals 4 BC then that may require leap year rule adjustments
to match other proleptic Gregorian calendars (where -0005 would be a
leap year instead). Again the problem is that we do not have a
definitive standard to reference for a proleptic Gregorian calendar,
especially for years before 1 AD.
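
To illustrate the adjustment, here is a small sketch assuming a
numbering with no year 0, where -N is read as N BC. To stay on the
same physical leap years as a year-0 calendar, the written year has
to be shifted by one before the usual rule is applied. The function
names are hypothetical.

```python
def is_leap_with_year_zero(year):
    """Ordinary Gregorian rule on a numbering that includes year 0."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_no_year_zero(written_year):
    """Leap year test when -N is read as N BC and year 0 does not exist.

    Assumption: the underlying days are the same as in the year-0
    numbering, so N BC corresponds to year-0 number 1 - N.
    """
    if written_year == 0:
        raise ValueError("year 0 does not exist in this numbering")
    shifted = written_year if written_year > 0 else written_year + 1
    return is_leap_with_year_zero(shifted)


print(is_leap_no_year_zero(-4))  # False: 4 BC is not a leap year
print(is_leap_no_year_zero(-5))  # True: 5 BC is
print(is_leap_no_year_zero(-1))  # True: 1 BC is
```

So under this reading the simple "divisible by 4" test no longer
works directly on the written year for dates before 1 AD.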

> - Allow 5 or more digit dates? Like year 10,000 or year 100,000 BC?
> It's feasible: 10000-01-01, or -100000-01-01 (which is more likely  
> to be used with lower granularity I guess). I don't think this is  
> going to give parsers a greater complication.

That looks fine to me. Others have suggested we still allow  
hyphenation omission (which works if we exclude ordinal dates like  
YYYY-DDD). This would also then allow: "100000101" to represent 1  
January 10,000.
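
A minimal sketch of why this works once ordinal dates are excluded:
the last four digits are always MMDD, so whatever precedes them is
the year. The name parse_compact is hypothetical, and this is not
the parsing algorithm from the HTML5 draft.

```python
import re

# Optional sign, a year of four or more digits, then MM and DD.
_COMPACT = re.compile(r"(-?)(\d{4,})(\d{2})(\d{2})")


def parse_compact(text):
    """Parse an unhyphenated date; the final four digits are MMDD."""
    m = _COMPACT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a compact date: {text!r}")
    sign, year, month, day = m.groups()
    month, day = int(month), int(day)
    # Coarse range check only; per-month day limits are not enforced here.
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError(f"month or day out of range: {text!r}")
    return (-int(year) if sign else int(year), month, day)


print(parse_compact("100000101"))  # (10000, 1, 1)
print(parse_compact("20090314"))   # (2009, 3, 14)
print(parse_compact("10000101"))   # (1000, 1, 1)
```

The last example shows the conflict with ordinal dates: if YYYYYDDD
were also allowed, "10000101" could equally be read as day 101 of
the year 10000.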

> - Allow non Gregorian calendars?
> There are actually two debates on this issue. One is whether to  
> allow other calendars as the datetime attribute values, the other is  
> to always have the datetime attribute value specified as a Gregorian  
> date while adding a new attribute which would indicate what calendar  
> is used for the content. I would personally go against the use of  
> any non Gregorian calendar at all, since I do believe the use cases  
> are too few. However if it is considered that the use of non  
> Gregorian is to be supported, I would go with the latter solution  
> (allow non Gregorian as the content only, and have the datetime  
> attribute always defined as a Gregorian date), since that would not  
> put much complication on the parser (which does not have any  
> calculation/conversion to perform, leaving that to the author). The  
> new attribute would then be used by the browser to tell what  
> calendar is used, but the datetime attribute is still used to  
> indicate to the user the Gregorian equivalent. I do see too many  
> problems with accepting non Gregorian as the value of the datetime  
> attribute: too many calendars are way incompatible (try to represent  
> a Mayan Long Count calendar),...

We don't need to concern ourselves with how the Mayan Long Count  
calendar works. First of all, the concerns raised are largely over  
Julian and Revised Julian calendars. Other calendars of concern  
include Hebrew, Islamic and Buddhist all of which are used today as  
civil calendars. The other concern is that without clearly defined  
Gregorian and other calendars (and for some periods in history all of  
these calendars lack clear standards that map the representation  
unambiguously to a specific day on the Earth), it is more accurate and  
precise for authors to encode a date within the precise calendar than  
to attempt conversion (which is a lossy operation in that case).  
Conversion is better handled at runtime by UA algorithms that keep
current with the state of standards (but that requires some kind of
keyword differentiation of non-Gregorian and even clearly Gregorian
dates).
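
To show why the calendar needs to be identified, here is a sketch
using the widely published integer formulas for the Julian Day
Number. It assumes a year numbering that includes year 0 and is only
intended for years after -4800; the function names are hypothetical.
The same year-month-day triple lands on different days depending on
which calendar it is read in.

```python
def _shifted(year, month):
    """Shift to a March-based year so the leap day falls at year end."""
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3


def jdn_gregorian(year, month, day):
    """Julian Day Number of a proleptic Gregorian date."""
    y, m = _shifted(year, month)
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def jdn_julian(year, month, day):
    """Julian Day Number of a Julian calendar date."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# The first Gregorian day and the Julian date of that same day.
print(jdn_gregorian(1582, 10, 15))  # 2299161
print(jdn_julian(1582, 10, 5))      # 2299161

# The same written date, read in each calendar, is ten days apart.
print(jdn_julian(1582, 10, 4) - jdn_gregorian(1582, 10, 4))  # 10
```

Without a keyword (or a clear default), a processor cannot know
which of the two functions to apply to a given encoded date.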

> Those are pretty much the current debates, as I see it. Please feel  
> free to add anything I missed. My current recommendation would be  
> that we try to come to an agreement on current debates, rather than  
> come up with more debates. We have already determined that none of  
> us are calendar experts (and I have already proven I'm far from  
> one), but I think we can come up with something that would give  
> enough flexibility to anyone without giving much restrictions. One  
> idea that comes to mind (if ever allowed by the current HTML draft,  
> I am not sure) could be to force parsers into a minimum set of rules  
> (i.e. accept at least 4 digits for years) while giving them the  
> freedom to decide whether or not to implement extended features  
> (accept any number of digits for years).

I think it is important to keep three roles played by HTML UAs distinct.
First, they may need to parse a represented date. Second, UAs may also
be expected by another specification (other than HTML) to flexibly
present these encoded dates (but again, that's nothing HTML
requires so far). Third, and separately, the UA or a helper or other
application may need to compare dates, convert dates, calculate
intervals, etc. Again, HTML5 currently has no requirements of this
sort. Neither of the latter two is necessary for our present concerns.
Our main concern with HTML then should be the representation of dates
within HTML documents, leaving presentation and manipulation of dates
to other standards. Therefore we are better served by allowing the
representation of dates in whatever calendar the dates are available
within, than by forcing (shoehorning, as I said before) those dates
into a Gregorian calendar in an uncertain way (especially if HTML is
expected to define what a Gregorian calendar is).

> Most popular browsers would probably implement everything, while  
> smaller ones (those with more restrictions, such as cell phone  
> browsers) could go with the minimum (and still cover 95% of use  
> cases) while leaving the extended, more demanding features (although  
> I don't feel my long year example is a good one to represent a  
> 'demanding feature').

We aren't really asking much of UAs in terms of implementation here.
The HTML5 draft includes algorithms to parse the dates, but whatever
happens with them after they are parsed is left up to other
processors and other standards. This means that allowing keyword
differentiation of alternate calendars (with clear norms for an
omitted keyword and a "Gregorian" keyword) requires very little of
UAs. However, it at least allows future revisions of HTML to deal more
completely with the problems we now face.

Take care
Rob

[1]: <http://www.w3.org/TR/xmlschema-2/XSD#year-zero>
[2]: <http://esw.w3.org/topic/HTML/DateTime>

Received on Sunday, 15 March 2009 00:02:26 UTC