Warning:
This wiki has been archived and is now read-only.
Mapping SQL datatypes to XML Schema datatypes
The ISO/IEC 9075-14:2008 Part 14 XML-Related Specifications (SQL/XML) defines a mapping from SQL datatypes to XML Schema datatypes. Below are listed the Predefined SQL datatypes and their mapping to XML Schema datatypes. The specification further defines constructed and user-defined types.
Contents
Predefined SQL Datatypes
The following predefined (or built-in) SQL datatypes are defined in Section 4.1 of the ISO/IEC 9075-2:2008 SQL - Part 2: Foundation (SQL/Foundation):
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mapping SQL Datatypes to XML Schema Types
Mapping Predefined Datatypes
The XML Schema datatypes (XSD) are defined in XML Schema Part 2: Datatypes Second Edition, Oct. 2004. W3C Recommendation, http://www.w3.org/TR/xmlschema-2/. Additionally the xsd:dayTimeDuration and xsd:yearMonthDuration datatypes are defined in the XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition), W3C Recommendation, Dec. 2010, http://www.w3.org/TR/xpath-datamodel/.
The mapping of predefined SQL datatypes to XML Schema described in Section 4.10.5 of ISO/IEC 9075-14:2008 Part 14 XML-Related Specifications (SQL/XML) is represented in the following table. Facets are used to constrain the XSD since it's value space is normally more encompassing than the SQL datatype's value space. Although not represented here, XSD annotations may be used to specify additional information such as the original datatype.
SQL | XSD | Facet used |
---|---|---|
CHARACTER | xsd:string | xs:length or xs:maxLength |
CHARACTER VARYING | xsd:string | xs:maxLength |
CHARACTER LARGE OBJECT | xsd:string | xs:maxLength |
BINARY | xsd:hexBinary or xsd:base64Binary | xs:maxLength |
BINARY VARYING | xsd:hexBinary or xsd:base64Binary (implementation dependent) | xs:maxLength |
BINARY LARGE OBJECT | xsd:hexBinary or xsd:base64Binary (implementation dependent) | xs:maxLength |
NUMERIC | xsd:decimal | xs:precision and xs:scale |
DECIMAL | xsd:decimal | xs:precision and xs:scale |
SMALLINT | xsd:integer or subtype of xsd:integer (implementation dependent) | xs:maxInclusive and xs:minInclusive |
INTEGER | xsd:integer or subtype of xsd:integer (implementation dependent) | xs:maxInclusive and xs:minInclusive |
BIGINT | xsd:integer or subtype of xsd:integer (implementation dependent) | xs:maxInclusive and xs:minInclusive |
FLOAT | xsd:float (precision < 24bits) or xsd:double | |
REAL | xsd:float (precision < 24bits) or xsd:double | |
DOUBLE PRECISION | xsd:float (precision < 24bits) or xsd:double | |
BOOLEAN | xsd:boolean | |
DATE | xsd:date | xs:pattern |
TIME | xsd:time | xs:pattern |
TIMESTAMP | xsd:dateTime | xs:pattern |
year-month duration | xdt:yearMonthDuration | xs:pattern |
day-time duration | xdt:dayTimeDuration | xs:pattern |
The prefix xsd: expands to http://www.w3.org/2001/XMLSchema#.
The xsd:pattern is used to restrict/map the syntax of the date and time datatypes. For instance, Section 9.5 (Part 14) of the spec, maps the DATE type to:
<xs:simpleType> ANN <xs:restriction base="xs:date"> <xs:pattern value="\p{Nd}{4}-\p{Nd}{2}-\p{Nd}{2}"/> </xs:restriction> </xs:simpleType>
where ANN may indicate the original SQL datatype (implementation defined).
INTERVAL types contain information whether they are "year-month intervals" or a "day-time intervals" which can then be used to map to the appropriate XSD. Further information on this can be found in Sections "4.6.1 Introduction to datetimes and intervals" and "6.1 " of ISO/IEC 9075-2:2008 SQL - Part 2: Foundation (SQL/Foundation).
Mapping constructed and user-defined Datatypes
The constructed datatypes are:
- ARRAY
- MULTISET
- ROW
- REF (references a referenceable table)
According to Section 6.2 of the ISO/IEC 9075-2:2008 SQL - Part 2: Foundation (SQL/Foundation) the ARRAY and MULTISET datatypes can only be cast to between each other, ie, ARRAY to ARRAY, ARRAY to MULTISET, MULTISET to ARRAY and MULTISET to MULTISET.
The REF datatype can be cast into other datatypes (namely CHARACTER VARYING) by using user-defined cast descriptions. Same is valid for user-defined datatypes.
Mapping values of SQL Datatypes to XML Schema Type values
The mapping of values of SQL datatypes to values of XML Schema is described in Section 9.8 of ISO/IEC 9075-14:2008 Part 14 XML-Related Specifications (SQL/XML). A short summary is presented next.
SQL Datatype (SQLT) | XSD value generation () |
---|---|
CHARACTER | CSM(CV) |
CHARACTER VARYING | CSM(CV) |
CHARACTER LARGE OBJECT | CSM(CV) |
BINARY | hex or base64 encoding of SQLV |
BINARY VARYING | hex or base64 encoding of SQLV |
BINARY LARGE OBJECT | hex or base64 encoding of SQLV |
NUMERIC | CSM(CV) |
DECIMAL | CSM(CV) |
SMALLINT | CSM(CV) |
INTEGER | CSM(CV) |
BIGINT | CSM(CV) |
FLOAT | CSM(CV) |
REAL | CSM(CV) |
DOUBLE PRECISION | CSM(CV) |
BOOLEAN | LOWER (CV) |
DATE | CSM(SUBSTRING (CV FROM 6 FOR 10)) |
TIME | CSM(SUBSTRING (CV FROM 6 FOR 8 + Q + Z)) |
TIMESTAMP | CSM(SUBSTRING (CV FROM 11 FOR 10) || 'T' || SUBSTRING (CV FROM 22 FOR 8 + Q + Z)) |
INTERVAL YEAR | CSM(SIGN || 'P' || SUBSTRING (CVA FROM 10 FOR L) || 'Y') |
INTERVAL YEAR TO MONTH | CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'Y'|| SUBSTRING (CVA FROM 11 + L FOR 2) || 'M') |
INTERVAL MONTH | CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'M') |
INTERVAL DAY | CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'D') |
INTERVAL DAY TO HOUR | CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'DT' || SUBSTRING (CVA FROM 11 + L FOR 2) || 'H') |
INTERVAL DAY TO MINUTE | CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'DT' || SUBSTRING (CVA FROM 11 + L FOR 2)
|| 'H' || SUBSTRING (CVA FROM 14 + L FOR 2) || 'M') |
INTERVAL DAY TO SECOND | CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'DT'|| SUBSTRING (CVA FROM 11 + L FOR 2) || 'H'|| SUBSTRING (CVA FROM 14 + L FOR 2) || 'M'|| SUBSTRING (CVA FROM 17 + L FOR 2 + Q) || 'S') |
INTERVAL HOUR | CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'H') |
INTERVAL HOUR TO MINUTE | CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'H'|| SUBSTRING (CVA FROM 11 + L FOR 2) || 'M') |
INTERVAL HOUR TO SECOND | CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'H'|| SUBSTRING (CVA FROM 11 + L FOR 2) || 'M'|| SUBSTRING (CVA FROM
14 + L FOR 2 + Q) || 'S') |
INTERVAL MINUTE | CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'M') |
INTERVAL MINUTE TO SECOND | CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'M'|| SUBSTRING (CVA FROM 11 + L FOR 2 + Q) || 'S') |
INTERVAL SECOND | CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L + Q) || 'S') |
where
- SQLV is the SQL value
- CV is the result of 'CAST ( SQLV AS CHARACTER VARYING(M) )', M the maximum length of variable-length character strings
- CSM(Expr) is the implementation dependent Custom Mapping of Expr to Unicode
- For CHARACTER, CHARACTER VARYING and CHARACTER LARGE OBJECT there is the option of escaping '&', '<', '>' and '\n'
- For date and time types
- P be the
- If P is 0 (zero), then let Q be 0 (zero); otherwise, let Q be P + 1 (one).
- If SQLT specifies WITH TIME ZONE, then let Z be 6; otherwise, let Z be 0 (zero).
- For intervals
- If SQLV is negative, then let SIGN be '-' (a character string of length 1 (one) consisting of <minus sign>); otherwise, let SIGN be the zero-length string.
- Let SQLVA be ABS(SQLV).
- Let CVA be the result of: CAST ( SQLVA AS CHARACTER VARYING(M) )
- Let L be the <interval leading field precision> of SQLT.
- Let P be the <interval fractional seconds precision> of SQLT, if any.
- If P is 0 (zero), then let Q be 0 (zero); otherwise, let Q be P + 1 (one).
Further notes
- xsd:hexBinary yields 8000 Google hits, xsd:base64Binary yields 37000.
- For the implementation-dependent choices, we could choose xsd:double, xsd:integer and xsd:base64Binary.
- We can't express facets in RDF, therefore the generated RDF literals will simply be typed directly using the XSD type, without considering facets.
Open questions
- What does the and for INTERVAL mean?
- Precise rules for obtaining the target lexical form?
- Fairly obvious for strings and numbers
- What about dates, times and datetimes? What happens to time zones?
- What about booleans? "true"^^xsd:boolean, "TRUE"^^xsd:boolean?
- What exactly is the reference for base64 encoding?
- What about INTERVAL?
- Does anything surprising happen for numbers with NaN, +Inf, -Inf, -0? (I believe SQL number types don't support any of those, but we should check.)