I18nFAQTimeZone

From W3C Wiki

Internationalization Core WG: Working with Times and Time Zones

This page is being used to build a W3C Note discussing the problem of date, time, and dateTime values with and without time zone offsets. When published, this note will be referred to from the XQuery 1.0 and XPath 2.0 Functions and Operators [8] specification. The examples mainly rely on XML Schema [6] and XQuery 1.0 and XPath 2.0 Functions and Operators [8], since these are the basis for XQuery and XSLT processing of date/time values.

[[RI Who should read this document, and what will they get from it, i.e. why should they?]]

by Addison Phillips, Felix Sasaki, Mark Davis, Martin Dürst

Working with Time Zones

Time-related data is a common requirement for many applications. XML Schema[6] provides a variety of data types for dates and times, such as date, time, and dateTime. These data types follow internationally friendly formats defined by ISO 8601 and can be used to address a variety of differing date or time applications.

The date, time, and dateTime types can either include or omit the time zone offset. The presence (or absence) of the offset means that the data value must be handled differently for certain kinds of operations. In addition, the particular application and the source of the date and time values affect how dates and times with different time zones or zone offsets should be handled, as well as how to handle values that lack any time zone or zone offset indication.

Note Well: Users and implementers of languages which handle time-related data (e.g. XQuery, XPath, and XSLT) should take the following recommendations into account even if time-zone-sensitive data is rarely used. Sooner or later some data will be affected by the issues described.

Background

There are three main uses of the date, time, and dateTime data types in applications.

Incremental or "Computer" Time. Most programming languages and development environments provide data types for handling time which are based on a numeric value: units of some specific length measured from a specific point in time (called the epoch). For example, the Java type java.util.Date is a long value for the number of milliseconds since midnight (00:00) January 1, 1970 in UTC (Coordinated Universal Time, sometimes also called GMT). Other systems use other units and epochs (see Universal Time below). Date and time values based on a construct of this type (which we'll call computer time) are time-zone-independent, since at any given moment it is the same time in UTC everywhere on Earth: the values can be transformed for display for any particular time zone offset, but the value itself is not tied to a specific location. Values of this type are commonly used in applications as "time stamps", showing when an event occurred. Some applications for these include:

  • labeling log file entries with a timestamp
  • recording actual process start or stop times
  • measuring the duration of an event
  • comparing two time values
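
As a minimal sketch of the idea (in Python rather than Java; the sample value and the variable names are illustrative assumptions, not part of the original text), the following takes one incremental value, a count of milliseconds since the 1970 epoch, and renders it for two different zone offsets. The underlying value is identical in every case; only the field-based display changes.

```python
from datetime import datetime, timedelta, timezone

# The epoch used by java.util.Date and many other systems.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical incremental ("computer") time value in milliseconds.
epoch_millis = 1_118_150_067_000
instant = EPOCH + timedelta(milliseconds=epoch_millis)

# Two fixed zone offsets chosen only for display purposes.
minus_seven = timezone(timedelta(hours=-7))
plus_nine = timezone(timedelta(hours=9))

as_utc = instant.isoformat()
as_minus_seven = instant.astimezone(minus_seven).isoformat()
as_plus_nine = instant.astimezone(plus_nine).isoformat()

# All three renderings denote the same instant.
same_instant = instant.astimezone(minus_seven) == instant.astimezone(plus_nine)

print(as_utc)
print(as_minus_seven)
print(as_plus_nine)
print(same_instant)
```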

Time Zone Independent Field-Based Time. The human representation of computer times is more complicated, and represents time using various separate field values, such as hour, minute, month, or year. One application for this type of representation is for values that are time zone independent, representing a logical event divorced from a particular location on the Earth. [[RI A person's birth date (as opposed to birthday) is not obviously divorced from time zone considerations, esp. if you are trying to work out who is older. You might make this clearer by saying something like "For example, recurring dates such as a person's birthday would normally fall into this category, partly because time is not expressed, and partly because the actual time of the start and end of the day for a given geographic location may not be considered important." I also think that the distinction between this type and the following type is often a question of choice by the user or application, rather than an intrinsic difference between the type of data - the current explanations seem to imply to me that there is an intrinsic difference between the types of data.]] For example, a person's birth date is independent of time zone. Some other examples of this application of dates and times include: [[RI Note that the following examples only include dates, no times. Would be good to add something like regular meeting times, school day start and end times, etc.]]

Time Zone Dependent Field-Based Time. In other cases, field-based dates and times are supposed to represent values linked to a particular location or time zone. For example, if you tell someone that you will make a telephone call to them at 14:00 from Paris, and that person is in London, they'll expect the phone to ring at 13:00. As with incremental time, the event happens at the same instant around the globe, and the meaning of the value depends on the offset from UTC. Some other examples of this application of dates and times include:

  • purchase order date
  • tracking information for a package

Identifying Time Zones and Zone Offsets

XML Schema [6] follows the ISO 8601 standard for its lexical representation. Date and time values in ISO 8601 are field-based using the definitions above and can indicate (or omit) the zone offset from UTC. A zone offset is not the same thing as a time zone, and the difference can be important. XML Schema only supports zone offset but, confusingly, calls it timezone; see for example [1] (Section 3.2.8.1, lexical representation).

Although ISO 8601 is expressed in terms of the Gregorian calendar, it can be used to represent values in any calendar system. The presentation of date and time values to end users using different calendar and timekeeping systems is separate from the lexical representation.

What is a "zone offset"? A "zone offset" is the difference in hours and minutes between a particular time zone and UTC. In ISO 8601, the particular zone offset can be indicated in a date or time value. The zone offset can be Z for UTC or it can be a value "+" or "-" from UTC. For example, the value 08:00-08:00 represents 8:00 AM in a time zone 8 hours behind UTC, which is the equivalent of 16:00Z (8:00 plus eight hours). The value 08:00+08:00 represents the opposite increment, or midnight (08:00 minus eight hours).
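
The arithmetic above can be sketched as follows. This is an illustration only: the helper name to_utc is hypothetical, and the calendar date inside it is an arbitrary placeholder needed because Python converts between offsets on full dateTime values.

```python
from datetime import datetime, timedelta, timezone

def to_utc(hour, minute, offset_hours):
    """Return the UTC clock time (HH:MM) for a time at a fixed zone offset."""
    fixed_offset = timezone(timedelta(hours=offset_hours))
    # The date is a placeholder; only the time-of-day fields matter here.
    local = datetime(2005, 1, 1, hour, minute, tzinfo=fixed_offset)
    return local.astimezone(timezone.utc).strftime("%H:%M")

behind_utc = to_utc(8, 0, -8)  # 08:00-08:00
ahead_of_utc = to_utc(8, 0, 8)  # 08:00+08:00

print(behind_utc)
print(ahead_of_utc)
```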

What is a "time zone"? A "time zone" is an identifier for a specific location or region which translates into a combination of rules for calculating the UTC offset. When an application (such as a website maintaining a group calendar) schedules a recurring meeting for 08:00 Pacific Time, it is referring to what is known as "wall time". [[RI Please explain why this is called 'wall time'. It will help users, such as myself, who are not familiar with this term to recall its meaning when you use it later in the document.]] This is not equivalent to either 08:00-08:00 or 08:00-07:00, because Pacific Time does not have a fixed offset from UTC; instead, the offset changes during the course of the year. [[RI This makes it sound like all time zones can vary. I think you should make it clearer that this is one case of a type of time zone where the relationship to UTC can vary. For example, for something like BST, such variation does not occur, and such time zones are only relevant for a part of a year.]] As was mentioned before, XML Schema only supports zone offset, and it does not make the terminological distinction between zone offset and time zone. [[RI I think that the remainder of this paragraph should be in a separate para, perhaps with the title "Rules for daylight savings".]] Note that the rules for computing when daylight savings takes effect may change from year to year, and from location to location. Indiana, for example, does not follow daylight savings time but that will change in April 2006 [2]. The Northern and Southern hemispheres perform Daylight/Summer Time adjustments during opposing times during the year (corresponding to seasonal differences in the two hemispheres). [[RI and the changes go in different directions.]]

To capture these situations, a calendar system must use an ID for the time zone. The most definitive reference for dealing with wall time is the TZ database (aka Olson time zone database, [3]), which is used by UNIX, Linux, Java, CLDR, ICU, and many other systems and libraries. In the TZ database, Pacific Time is denoted with the ID America/Los_Angeles. The TZ database also supplies aliases among different IDs; for example, Asia/Ulan_Bator is equivalent to Asia/Ulaanbaatar. From these alias relations, a canonical identifier can be derived. CLDR can be used to provide a localized form for the IDs: see Appendix J in [4]. [[RI I'm wondering how CLDR handles locations where there are multiple time zones due to daylight savings...]]
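
The difference between a time zone ID and a zone offset can be seen in a short sketch. It assumes Python's standard zoneinfo module, which reads the TZ database installed on the system (or the tzdata package); the two sample dates are arbitrary.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# A time zone ID names a set of rules, not a single offset.
pacific = ZoneInfo("America/Los_Angeles")

# The same wall time, 08:00, on a winter day and on a summer day.
winter = datetime(2005, 1, 15, 8, 0, tzinfo=pacific)
summer = datetime(2005, 7, 15, 8, 0, tzinfo=pacific)

winter_offset = winter.utcoffset()
summer_offset = summer.utcoffset()

print(winter.isoformat())
print(summer.isoformat())
```

A recurring 08:00 Pacific Time meeting therefore maps to 08:00-08:00 on some dates and 08:00-07:00 on others, which is why neither fixed offset describes it.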

Guidelines

Incremental vs. Field-Based Time

[[RI The first three paragraphs of this section and the last three should be two additional sections in the background material, in my mind. There are no guidelines expressed in this section at all, and I feel short-changed ;-).]]

Incremental time and field-based time differ in the way certain operations work. For example, incremental times can be directly compared (their integer values determine which is earlier or later), while field-based times must be normalized and their individual fields compared. Field-based times can have certain kinds of logical operations performed on them (for example, rolling the date forward or back), while incremental time requires a logical transformation. For example, to increment the date 2005-08-03 by one day, one can add one unit to the "day" field and adjust the month and/or year as appropriate. [[RI should that date have been 2005-08-30?]] In computer time, the system could increment the value by 24 hours * 60 minutes * 60 seconds * 1000 milliseconds, which is one logical day, but there may be errors when a particular day has more or fewer seconds in it (such as occurs during daylight savings transitions).
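
A small sketch of the "add one day" problem, assuming Python's zoneinfo module and a start date chosen to fall on the day before the 2005 spring transition in the U.S. Pacific time zone. Note that Python arithmetic on a zone-aware value is field-based (wall clock), so the incremental case is done on the UTC value instead.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo("America/Los_Angeles")

# Noon on the day before clocks moved forward (2005-04-03).
start = datetime(2005, 4, 2, 12, 0, tzinfo=pacific)

# Field-based: roll the "day" field forward, keeping the wall-clock time.
field_based = start + timedelta(days=1)

# Incremental: add 24 hours of elapsed time to the underlying UTC value.
start_utc = start.astimezone(timezone.utc)
incremental = (start_utc + timedelta(hours=24)).astimezone(pacific)

# Elapsed time actually covered by the field-based "one day".
elapsed = field_based.astimezone(timezone.utc) - start_utc

print(field_based.isoformat())
print(incremental.isoformat())
print(elapsed)
```

The two results differ by one hour because the calendar day in question is only 23 hours long.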

The SQL data types date, time, and timestamp are field-based time values which are intended to be zone offset independent. The data type timestamp with time zone is the zone offset-dependent equivalent of timestamp in SQL. Programming languages, by contrast, tend to use incremental time and convert to and from a localized textual representation on demand. Databases may use incremental time or either zone offset-dependent or independent field-based structures internally. For example, an Oracle 8 database treats a timestamp field as though it is in the local time of the database instance.

As a result, users may not be clear on the differences between these types or may create a mixture of different representations. For example, a Java programmer using JDBC will retrieve incremental times (java.util.Date objects) from a database, even though the actual field in the database is a (field-based) timestamp value.

In XML Schema, as with SQL, dates and times are always expressed using field-based time. The date or time may express the zone offset from UTC (for example using a format such as 08:00:00+01:00). UTC is indicated by the letter Z (for example 08:00:00Z). Or, the zone offset may be omitted completely.

[[RI Does XML Schema allow for half-hour and 45-minute increments?]]

Properly speaking, an XML Schema date or time value with a zone offset is field-based/zone offset dependent and one without is field-based/zone offset independent.
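
The distinction can be tested directly on the lexical form. The following is a minimal sketch, assuming a simple regular expression for the trailing zone offset ("Z" or a signed hh:mm value); the function name has_zone_offset is hypothetical and the check does not validate the rest of the value.

```python
import re

# Trailing zone offset in the lexical form: "Z" or (+|-)hh:mm.
ZONE_OFFSET = re.compile(r"(Z|[+-]\d{2}:\d{2})$")

def has_zone_offset(lexical):
    """True if a date, time, or dateTime string ends with a zone offset."""
    return ZONE_OFFSET.search(lexical.strip()) is not None

samples = [
    "08:00:00+01:00",        # time with an offset
    "08:00:00Z",             # time in UTC
    "08:00:00",              # time with no offset
    "2005-06-07",            # date with no offset
    "2005-06-07T11:00:00",   # dateTime with no offset
    "2005-06-07T16:30:00+05:30",  # offsets are written as hh:mm
]

classified = {s: has_zone_offset(s) for s in samples}
for lexical, dependent in classified.items():
    print(lexical, "zone offset dependent" if dependent else "zone offset independent")
```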

If the two types are mixed, then the interpretation of the zone offset is not adequately specified in XML Schema. In XQuery 1.0 and XPath 2.0 Functions and Operators [8], the interpretation is implementation-defined and is based on an implicit zone offset. This is usually either UTC or local time. The presence or absence of the zone offset in the XML Schema representation may not be indicative of the original data's intention because of the confusion described above. Proper comparisons or processing rely on normalizing all date and time values into zone offset-independent (or zone offset-dependent) forms and never mixing the two in a particular operation.

Working with Field-Based Times

Field-based times and dates, [[RI 'and dates'?]] such as a time stored without a fixed date, require the user to determine whether to use a fixed zone offset, a time zone, or nothing. While XML Schema times are field-based in terms of the lexical representation, the underlying data may use incremental time, as may the implementation processing the values. Each specific case requires specific handling.

[[RI It could be clearer who these guidelines relate to, i.e. end users, application developers, specification developers, etc.]]

  • If all of the data values represent incremental time, then the user should always use a specific zone offset (and UTC is strongly recommended as this offset, since most incremental time systems are based on it) and should always specify that zone offset. Values that do not specify a zone offset should be treated as if they use the same offset. If UTC is used, this produces the least amount of modification in the data.
  • If all of the data values represent time zone independent values (such as a list of employees' birth dates), then the zone offset should always be omitted. If any values do carry a zone offset, processing should probably ignore it (actually stripping it off, if possible), since zone changes are probably an artifact of other processing. If a zone offset must absolutely be applied to the data, then UTC should be used.
  • If all of the data values represent time zone dependent values, then the zone offset must always be supplied. Great care should be used to ensure that the correct offset is used and not just the current zone offset. For example, if a system in the U.S. Pacific time zone (America/Los_Angeles) generates a dateTime value '2005-02-11T11:23:04-07:00' on '2005-08-16', it may be an error (since the UTC offset during August in that time zone is UTC-7, but the zone offset in February is UTC-8).
  • If there are time values (with no date portion) with a fixed UTC offset, then the zone offset should always be indicated if and only if the time value really is fixed. That is, this would not apply to a meeting scheduled in Pacific Time, but would apply to a meeting that is always held at UTC-08:00 (and thus an hour later by the wall clock in Pacific Time while daylight saving time is in effect).
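
The third guideline (use the offset that was in effect on the value's own date, not the current one) can be checked mechanically. This is a sketch under stated assumptions: it uses Python's zoneinfo module, the helper name offset_matches_zone is hypothetical, and the value must carry a signed hh:mm offset.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def offset_matches_zone(lexical, tzid):
    """Check that a dateTime's offset is the one the zone used on that date.

    The date and time fields are read as wall time in the given zone, and the
    offset the zone rules give for them is compared with the stated offset.
    """
    stamped = datetime.fromisoformat(lexical)
    expected = stamped.replace(tzinfo=ZoneInfo(tzid)).utcoffset()
    return stamped.utcoffset() == expected

# A February value stamped with the August (daylight) offset: suspicious.
wrong = offset_matches_zone("2005-02-11T11:23:04-07:00", "America/Los_Angeles")

# The same fields with the offset in effect in February.
right = offset_matches_zone("2005-02-11T11:23:04-08:00", "America/Los_Angeles")

print(wrong, right)
```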

When You Really Need a Time Zone (as opposed to a Zone Offset)

Documents or systems can also choose to accompany a time value with the appropriate time zone ID (TZID) using a complex type. This is very important with recurring times, such as calendar meeting times. If a regular meeting is at 8:00 Pacific Time, it is insufficient to store and interchange just a zone offset.

There are different ways to compare two <datetime, TZID> pairs. If the date is fixed, say, (2004-09-30T01:30), then this can be done by computing the offsets on that date and at those times, using the TZ database. This order then reflects whether one datetime is (absolutely) before another.
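
A minimal sketch of this comparison, assuming Python's zoneinfo module as the interface to the TZ database. The helper name to_instant, the date 2004-09-30 (a valid calendar date), and the two zone IDs are chosen for illustration only.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_instant(local_fields, tzid):
    """Resolve a <datetime, TZID> pair to a UTC instant via the zone rules."""
    wall = datetime.fromisoformat(local_fields)
    return wall.replace(tzinfo=ZoneInfo(tzid)).astimezone(timezone.utc)

# The same field values in two different time zones.
paris = to_instant("2004-09-30T01:30", "Europe/Paris")
los_angeles = to_instant("2004-09-30T01:30", "America/Los_Angeles")

# 01:30 arrives in Paris before it arrives in Los Angeles.
paris_is_earlier = paris < los_angeles

print(paris.isoformat())
print(los_angeles.isoformat())
print(paris_is_earlier)
```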

If the dates are not fixed (such as <T01:30, TZID> -- notice that the date value is omitted) then in some sense, neither is 'before' the other, since each refers to a repeating, interleaved set of points in time. The simplest comparison mechanism where the dates may not be fully specified is simply to put both in canonical form, then order them first by time then by TZID (alphabetical, caseless order). The Olson database does not maintain a fixed canonical form; however, CLDR does provide such a form ([5]).
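
The simple ordering described above might look like the following sketch. It assumes the TZIDs have already been put in canonical form by some other means; the sample pairs and the name canonical_key are illustrative.

```python
from datetime import time

# <time, TZID> pairs with no date portion; IDs assumed already canonical.
pairs = [
    (time(1, 30), "Europe/Paris"),
    (time(1, 30), "America/Los_Angeles"),
    (time(0, 45), "Asia/Ulaanbaatar"),
]

def canonical_key(pair):
    """Order first by time, then by TZID in alphabetical, caseless order."""
    local_time, tzid = pair
    return (local_time, tzid.casefold())

ordered = sorted(pairs, key=canonical_key)

for local_time, tzid in ordered:
    print(local_time.isoformat(), tzid)
```

This ordering is stable and total, but it says nothing about which value occurs first on any particular date.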

(It is also possible to have a looser comparison, whereby <time0, TZID0> is compared to <time1, TZID1> over some interval of time: if one consistently has a smaller offset during that period, it is considered less. However, there are cases where this mechanism results in a partial ordering, so it is not generally recommended.)

Unfortunately, XML Schema date and time types do not provide for Olson IDs, so most time operations cannot use TZIDs directly. Time zone identification in the date and time types relies entirely on time zone offset from UTC. That means that for wall times it is up to the document designer to keep the TZID in a separate data field from the time.

Comparing Times

Conversion between or operations on data sets that mix values with and without zone offsets present certain problems.

For example:


  <aDateTime>2005-06-07T13:14:27Z</aDateTime>  <!-- with a zone offset -->
  <bDateTime>2005-06-07T11:00:00</bDateTime>   <!-- without -->


If one wishes to write a comparison between the value of <aDateTime> and <bDateTime>, then the two values must be reconciled to use the same reference point. <aDateTime> uses UTC and can easily be converted to computer time or shifted to another zone offset. <bDateTime> contains no indication of the zone offset. It may be UTC or any other value (currently up to 14 hours different in either direction from UTC).

It is good practice to use an explicit zone offset wherever possible. If one is not available, best practice is to use UTC as the implicit zone offset for conversions of this nature. This is because the values are exactly centered in the range of possibilities and because representation internally (as computer time) is usually based on UTC. Since a single reference point has been used it may be possible to unwind the change later even if erroneous conversion takes place. When working with multiple documents from various sources, the "implicit" offset of the document may vary widely from that of the implementation doing the processing. If UTC is widely used, the chances of error are reduced.
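
To illustrate why the choice of implicit zone offset matters, the sketch below compares the two sample values twice: once with UTC as the implicit offset, and once with a hypothetical local offset of -08:00. The helper name parse_datetime is an assumption of this example, and the "Z" handling is there because older Python versions do not accept it in fromisoformat.

```python
from datetime import datetime, timedelta, timezone

def parse_datetime(lexical, implicit=timezone.utc):
    """Parse a dateTime; apply the implicit offset if none is present."""
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    value = datetime.fromisoformat(lexical)
    if value.tzinfo is None:
        # Keep the field values and attach the implicit zone offset.
        value = value.replace(tzinfo=implicit)
    return value

a = parse_datetime("2005-06-07T13:14:27Z")

# Implicit offset UTC: b is read as 11:00Z, which is before a.
b_utc = parse_datetime("2005-06-07T11:00:00")
b_before_a_utc = b_utc < a

# Implicit offset -08:00: b is read as 19:00Z, which is after a.
local = timezone(timedelta(hours=-8))
b_local = parse_datetime("2005-06-07T11:00:00", implicit=local)
b_before_a_local = b_local < a

print(b_before_a_utc, b_before_a_local)
```

The same pair of values sorts in opposite orders depending on the implicit offset, which is the practical argument for agreeing on a single one.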

Content and query authors are warned that comparing or processing dateTimes with and without time offsets may produce odd results and such processing should be avoided whenever possible. Generating content that omits zone offset information (where it exists) is a recipe for errors later. Of course, data such as the SQL types cited earlier which is meant to represent wall time should continue to omit the zone offset. Query writers can check for the presence (or absence) of zone offset and should do so to modify dates and times explicitly (instead of allowing implicit conversion) whenever possible.

Recommendations for XQuery / XSLT

Users of XQuery 1.0 and XSLT 2.0 and other standards should take the following recommendations into account even if time-zone-sensitive data is rarely used. Sooner or later some data will be affected by the issues described:

  1. If possible, make sure that data always contains an explicit zone offset.
  2. Do not apply operations based on date or time types (such as indexing) to collections of data in which some data items may have zone offset information and other data items may not have zone offset information.
  3. If you have data that includes implicit and fixed explicit zone offsets, before applying any date- or time-sensitive operations adjust the zone offset of the implicit data to UTC with the functions for zone offset adjustment, cf. [6].
  4. If you have data that contains both implicit and fixed explicit zone offsets and you do not want to adjust the data subset which already has a zone offset, make sure that you recognize this data subset, for example via the component extraction functions [7].
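
Recommendations 2 to 4 can be sketched outside XQuery as well. The Python example below is only an analogy, with illustrative sample values: checking tzinfo plays the role of a component extraction function such as fn:timezone-from-dateTime, and attaching UTC to the values without an offset plays the role of fn:adjust-dateTime-to-timezone.

```python
from datetime import datetime, timezone

def parse(lexical):
    """Parse a dateTime, leaving values without an offset unadjusted."""
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)

values = [
    "2005-06-07T13:14:27Z",
    "2005-06-07T11:00:00",
    "2005-06-07T09:30:00-05:00",
]
parsed = [parse(v) for v in values]

# Recognize the two subsets before doing anything order-sensitive.
explicit = [v for v in parsed if v.tzinfo is not None]
implicit = [v for v in parsed if v.tzinfo is None]

# Adjust only the implicit subset to UTC; explicit values are left alone.
adjusted = [v.replace(tzinfo=timezone.utc) for v in implicit]

# Now every value has an offset and the collection can be sorted safely.
ordered = [v.isoformat() for v in sorted(explicit + adjusted)]
print(ordered)
```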

References

An example of a group calendar with time zones is the W3C Zakim teleconference calendar (member-only link): [8]

[1] RFC 3339 (describes ISO 8601): [9]
[2] Tex Texin's site (contains presentations about ISO 8601): [10]
[3] It's About Time [11]
[4] Time Interval Changes [12]
[5] Universal Time [13]
[6] XML Schema Part 2: Datatypes Second Edition [14]
[7] XQuery 1.0: An XML Query Language [15]
[8] XQuery 1.0 and XPath 2.0 Functions and Operators [16]