Re: ISSUE-3 (DTF): Date and Time Format

Here's my take on this. I'll break it down into sub-questions.


Q1: Should the range of dcterms:issued include plain literals for free-text descriptions (“sometime in the 70's or 80's”)?

A: Yes, although the use of such free-text date descriptions should be discouraged.


Q2: Should the date format allow placeholders such as “200?” for the previous decade or “2011-00-00” where month and day are unknown?

A: No. This is not allowed in W3CDTF or XML Schema Datatypes or ISO 8601 or SQL or any other date spec I'm aware of. Existing date code such as Java's java.util.Date or PHP's strtotime will in the best case just barf, and in the worst case produce nonsense such as turning 2012-02-00 into 2012-01-31. I'm also not aware of any existing government data catalog that codes dates in this notation, or in any other way that can be automatically transformed into this notation. We should not recommend a notation that requires manual re-coding and is incompatible with everything.
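
To illustrate with one more stock parser, here is a minimal Python sketch. Python is only my choice for the illustration, and the helper name is_valid_iso_date is made up. It feeds the placeholder notations to the standard library's ISO 8601 date parser:

```python
from datetime import date


def is_valid_iso_date(text: str) -> bool:
    """Return True if Python's ISO 8601 date parser accepts the text."""
    try:
        date.fromisoformat(text)
    except ValueError:
        # Raised for malformed strings and for out-of-range fields
        # such as month 00 or day 00.
        return False
    return True


# One complete date followed by the placeholder notations from Q2.
for candidate in ["2012-01-06", "2012-02-00", "2011-00-00", "200?"]:
    print(candidate, is_valid_iso_date(candidate))
```

Only the first value is accepted; the placeholder forms are all rejected, so a consumer would need custom parsing to handle them.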


Q3: Should the date format be dcterms:W3CDTF instead of xsd:date in order to support less specific dates such as yyyy and yyyy-mm?

A: No. If anything, it should allow the W3C-recommended datatypes xsd:gYear and xsd:gYearMonth in addition to xsd:date. But I would prefer not to go there as it makes the creation of clients significantly harder (e.g., correct ordering and filtering of dates). The current approach of filling in 01 for unknown months and days is a good compromise between simplicity and representational fidelity, IMO.
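
As a rough sketch of what the 01-filling convention amounts to on the publisher side, something like the following Python would do. The function name pad_to_xsd_date and the regular expression are my own illustration, not anything taken from the DCAT draft:

```python
import re
from datetime import date

# Accepts yyyy, yyyy-mm or yyyy-mm-dd (the date-only W3CDTF granularities).
_PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def pad_to_xsd_date(value: str) -> str:
    """Fill in 01 for a missing month and/or day and return yyyy-mm-dd."""
    match = _PARTIAL_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a yyyy, yyyy-mm or yyyy-mm-dd value: {value!r}")
    year, month, day = match.groups()
    # date() validates ranges, so inputs like 2012-13 or 2012-02-30
    # still raise ValueError instead of being silently "fixed".
    padded = date(int(year), int(month or "01"), int(day or "01"))
    return padded.isoformat()


for raw in ["2012", "2012-03", "2012-01-06"]:
    print(raw, "->", pad_to_xsd_date(raw))
```

The information about which parts were filled in is lost after padding, which is exactly the fidelity trade-off mentioned above.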


Q4: Should the date format be xsd:dateTime instead of xsd:date to support higher precision than day?

A: Most existing government catalogs seem to specify data release dates to the day, without a time component. Displaying and processing dateTimes is quite a bit more involved because we then have to deal with time zones, variable precision and so on, and it makes the problem of filling in 00 or 01 for unknown values even more prevalent. Again, I think that xsd:date strikes the right balance. (It is true that SPARQL has dedicated support for xsd:dateTime but not for xsd:date, but that's fine: unlike for xsd:dateTime, ordering xsd:dates doesn't require special code. As long as the values carry no time zone offset, lexical ordering will be correct.)
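
The ordering point can be checked with a small Python sketch. The sample values are invented for illustration, and the sketch assumes four-digit years and no time zone offsets:

```python
from datetime import date

# Illustrative values only: four-digit years, no time zone offsets.
issued = ["2012-01-06", "2011-12-31", "2012-01-01", "2009-07-15"]

# Plain string comparison, as a client without date support would do.
lexical = sorted(issued)

# Parse each value into a real date first, then compare.
chronological = sorted(issued, key=date.fromisoformat)

print(lexical == chronological)

# Filtering works the same way: a string comparison is enough.
since_2012 = [d for d in issued if d >= "2012-01-01"]
print(since_2012)
```

Both orderings agree for these values, and the filter needs no date parsing at all.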


Q5: Then how to deal with cases where the year is unknown, or where the time of day really matters, or where the ambiguity between “01” and “unknown” is really unacceptable?

A: Use a plain literal (see Q1), or deviate from the recommendation.
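
A publisher-side sketch of that fallback, in Python, might look like the following. The function name issued_literal is hypothetical, the output assumes an xsd: prefix is declared in the surrounding Turtle, and the escaping is deliberately simplistic (backslashes and double quotes only):

```python
from datetime import date


def issued_literal(value: str) -> str:
    """Return a Turtle literal for dcterms:issued.

    Complete yyyy-mm-dd values become xsd:date literals; anything else
    falls back to a plain literal.
    """
    try:
        # Round-trip through date so only canonical yyyy-mm-dd gets typed.
        canonical = date.fromisoformat(value).isoformat()
    except ValueError:
        canonical = None
    if canonical == value:
        return f'"{value}"^^xsd:date'
    # Fallback: plain literal; escape backslashes and double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(issued_literal("2012-01-06"))
print(issued_literal("sometime in the 70's or 80's"))
```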


My conclusion is that DCAT should stick with xsd:date (and allow but discourage plain literals) because it's the simplest approach that fulfils the use cases.

Best,
Richard



On 6 Jan 2012, at 15:19, Government Linked Data Working Group Issue Tracker wrote:

> 
> ISSUE-3 (DTF): Date and Time Format
> 
> http://www.w3.org/2011/gld/track/issues/3
> 
> Raised by: Phil Archer
> On product: 
> 
> The current version of DCAT seems a little confused wrt date and time formats. We use dcterms:issued and repeat the DC range declaration of rdfs:Literal and then say it should be datatyped as xsd:date. So far so good. But then the text refers to the W3CDTF document. And they're not the same. 
> 
> xsd:date requires that values be present for yyyy-mm-dd
> 
> W3CDTF is more flexible and allows any of:
> yyyy
> yyyy-mm
> yyyy-mm-dd (and then times can be added)
> 
> The DCAT spec says that if a day and/or month are not known then one should use the value 01. This assumes:
> 
> - that the year is always known;
> - that a date like 2012-01-06 is ambiguous since it includes '01'. 
> 
> There may be cases in which the year is not known. For example, 'the 1980s' might be written as 198?. That breaks W3CDTF but it's an approximation. As it happens this came up just yesterday in the EU work that Christophe and I are doing so it's fresh in my mind. Taking all that on board, my proposal is therefore that:
> 
> 1. Rather than specify a datatype of xsd:date we specify W3CDTF (which is what DC recommends). We can use the URI http://purl.org/dc/terms/W3CDTF to give the data type.
> 
> 2. We recommend using '00' not '01' for unknown dates.
> 
> 3. We explain that just giving the year or the year and month is valid.
> 
> 4. Where the year is uncertain, use the ? character to express this but recognise that this breaks the model and is not W3CDTF. Therefore the data should not be so typed.
> 
> 5. Where even strings like 198? cannot be provided, plain text such as "sometime in the 1970s or '80s" may be used but this should be avoided if at all possible.
> 
> Given DCAT's use cases the latter seems unlikely (it happens in public sector records for things like dates of birth) so maybe we could drop that bit, but 1 - 4 seem valid?
> 
> 
> 

Received on Tuesday, 17 January 2012 12:10:00 UTC