Warning:
This wiki has been archived and is now read-only.

Mapping SQL datatypes to XML Schema datatypes

From RDB2RDF
Jump to: navigation, search

The ISO/IEC 9075-14:2008 Part 14 XML-Related Specifications (SQL/XML) defines a mapping from SQL datatypes to XML Schema datatypes. Below are listed the Predefined SQL datatypes and their mapping to XML Schema datatypes. The specification further defines constructed and user-defined types.


Predefined SQL Datatypes

The following predefined (or built-in) SQL datatypes are defined in Section 4.1 of the ISO/IEC 9075-2:2008 SQL - Part 2: Foundation (SQL/Foundation):

  • CHARACTER
  • DECIMAL
  • BOOLEAN
  • CHARACTER VARYING
  • SMALLINT
  • DATE
  • CHARACTER LARGE OBJECT
  • INTEGER
  • TIME
  • BINARY
  • BIGINT
  • TIMESTAMP
  • BINARY VARYING
  • FLOAT
  • INTERVAL
  • BINARY LARGE OBJECT
  • REAL
  • NUMERIC
  • DOUBLE PRECISION


Mapping SQL Datatypes to XML Schema Types

Mapping Predefined Datatypes

The XML Schema datatypes (XSD) are defined in XML Schema Part 2: Datatypes Second Edition, Oct. 2004. W3C Recommendation, http://www.w3.org/TR/xmlschema-2/. Additionally the xsd:dayTimeDuration and xsd:yearMonthDuration datatypes are defined in the XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition), W3C Recommendation, Dec. 2010, http://www.w3.org/TR/xpath-datamodel/.

The mapping of predefined SQL datatypes to XML Schema described in Section 4.10.5 of ISO/IEC 9075-14:2008 Part 14 XML-Related Specifications (SQL/XML) is represented in the following table. Facets are used to constrain the XSD since it's value space is normally more encompassing than the SQL datatype's value space. Although not represented here, XSD annotations may be used to specify additional information such as the original datatype.

SQL XSD Facet used
CHARACTER xsd:string xs:length or xs:maxLength
CHARACTER VARYING xsd:string xs:maxLength
CHARACTER LARGE OBJECT xsd:string xs:maxLength
BINARY xsd:hexBinary or xsd:base64Binary xs:maxLength
BINARY VARYING xsd:hexBinary or xsd:base64Binary (implementation dependent) xs:maxLength
BINARY LARGE OBJECT xsd:hexBinary or xsd:base64Binary (implementation dependent) xs:maxLength
NUMERIC xsd:decimal xs:precision and xs:scale
DECIMAL xsd:decimal xs:precision and xs:scale
SMALLINT xsd:integer or subtype of xsd:integer (implementation dependent) xs:maxInclusive and xs:minInclusive
INTEGER xsd:integer or subtype of xsd:integer (implementation dependent) xs:maxInclusive and xs:minInclusive
BIGINT xsd:integer or subtype of xsd:integer (implementation dependent) xs:maxInclusive and xs:minInclusive
FLOAT xsd:float (precision < 24bits) or xsd:double
REAL xsd:float (precision < 24bits) or xsd:double
DOUBLE PRECISION xsd:float (precision < 24bits) or xsd:double
BOOLEAN xsd:boolean
DATE xsd:date xs:pattern
TIME xsd:time xs:pattern
TIMESTAMP xsd:dateTime xs:pattern
year-month duration xdt:yearMonthDuration xs:pattern
day-time duration xdt:dayTimeDuration xs:pattern

The prefix xsd: expands to http://www.w3.org/2001/XMLSchema#.


The xsd:pattern is used to restrict/map the syntax of the date and time datatypes. For instance, Section 9.5 (Part 14) of the spec, maps the DATE type to:

<xs:simpleType>
 ANN
 <xs:restriction base="xs:date">
   <xs:pattern value="\p{Nd}{4}-\p{Nd}{2}-\p{Nd}{2}"/>
 </xs:restriction>
</xs:simpleType>

where ANN may indicate the original SQL datatype (implementation defined).

INTERVAL types contain information whether they are "year-month intervals" or a "day-time intervals" which can then be used to map to the appropriate XSD. Further information on this can be found in Sections "4.6.1 Introduction to datetimes and intervals" and "6.1 " of ISO/IEC 9075-2:2008 SQL - Part 2: Foundation (SQL/Foundation).

Mapping constructed and user-defined Datatypes

The constructed datatypes are:

  • ARRAY
  • MULTISET
  • ROW
  • REF (references a referenceable table)

According to Section 6.2 of the ISO/IEC 9075-2:2008 SQL - Part 2: Foundation (SQL/Foundation) the ARRAY and MULTISET datatypes can only be cast to between each other, ie, ARRAY to ARRAY, ARRAY to MULTISET, MULTISET to ARRAY and MULTISET to MULTISET.

The REF datatype can be cast into other datatypes (namely CHARACTER VARYING) by using user-defined cast descriptions. Same is valid for user-defined datatypes.

Mapping values of SQL Datatypes to XML Schema Type values

The mapping of values of SQL datatypes to values of XML Schema is described in Section 9.8 of ISO/IEC 9075-14:2008 Part 14 XML-Related Specifications (SQL/XML). A short summary is presented next.

SQL Datatype (SQLT) XSD value generation ()
CHARACTER CSM(CV)
CHARACTER VARYING CSM(CV)
CHARACTER LARGE OBJECT CSM(CV)
BINARY hex or base64 encoding of SQLV
BINARY VARYING hex or base64 encoding of SQLV
BINARY LARGE OBJECT hex or base64 encoding of SQLV
NUMERIC CSM(CV)
DECIMAL CSM(CV)
SMALLINT CSM(CV)
INTEGER CSM(CV)
BIGINT CSM(CV)
FLOAT CSM(CV)
REAL CSM(CV)
DOUBLE PRECISION CSM(CV)
BOOLEAN LOWER (CV)
DATE CSM(SUBSTRING (CV FROM 6 FOR 10))
TIME CSM(SUBSTRING (CV FROM 6 FOR 8 + Q + Z))
TIMESTAMP CSM(SUBSTRING (CV FROM 11 FOR 10) || 'T' || SUBSTRING (CV FROM 22 FOR 8 + Q + Z))
INTERVAL YEAR CSM(SIGN || 'P' || SUBSTRING (CVA FROM 10 FOR L) || 'Y')
INTERVAL YEAR TO MONTH CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'Y'|| SUBSTRING (CVA FROM 11 + L FOR 2) || 'M')
INTERVAL MONTH CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'M')
INTERVAL DAY CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'D')
INTERVAL DAY TO HOUR CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'DT' || SUBSTRING (CVA FROM 11 + L FOR 2) || 'H')
INTERVAL DAY TO MINUTE CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'DT' || SUBSTRING (CVA FROM 11 + L FOR 2)

|| 'H' || SUBSTRING (CVA FROM 14 + L FOR 2) || 'M')

INTERVAL DAY TO SECOND CSM(SIGN || 'P'|| SUBSTRING (CVA FROM 10 FOR L) || 'DT'|| SUBSTRING (CVA FROM 11 + L FOR 2) || 'H'|| SUBSTRING (CVA FROM 14 + L FOR 2) || 'M'|| SUBSTRING (CVA FROM 17 + L FOR 2 + Q) || 'S')
INTERVAL HOUR CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'H')
INTERVAL HOUR TO MINUTE CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'H'|| SUBSTRING (CVA FROM 11 + L FOR 2) || 'M')
INTERVAL HOUR TO SECOND CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'H'|| SUBSTRING (CVA FROM 11 + L FOR 2) || 'M'|| SUBSTRING (CVA FROM

14 + L FOR 2 + Q) || 'S')

INTERVAL MINUTE CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'M')
INTERVAL MINUTE TO SECOND CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L) || 'M'|| SUBSTRING (CVA FROM 11 + L FOR 2 + Q) || 'S')
INTERVAL SECOND CSM(SIGN || 'PT'|| SUBSTRING (CVA FROM 10 FOR L + Q) || 'S')

where

  • SQLV is the SQL value
  • CV is the result of 'CAST ( SQLV AS CHARACTER VARYING(M) )', M the maximum length of variable-length character strings
  • CSM(Expr) is the implementation dependent Custom Mapping of Expr to Unicode
  • For CHARACTER, CHARACTER VARYING and CHARACTER LARGE OBJECT there is the option of escaping '&', '<', '>' and '\n'
  • For date and time types
    • P be the
    • If P is 0 (zero), then let Q be 0 (zero); otherwise, let Q be P + 1 (one).
    • If SQLT specifies WITH TIME ZONE, then let Z be 6; otherwise, let Z be 0 (zero).
  • For intervals
    • If SQLV is negative, then let SIGN be '-' (a character string of length 1 (one) consisting of <minus sign>); otherwise, let SIGN be the zero-length string.
    • Let SQLVA be ABS(SQLV).
    • Let CVA be the result of: CAST ( SQLVA AS CHARACTER VARYING(M) )
    • Let L be the <interval leading field precision> of SQLT.
    • Let P be the <interval fractional seconds precision> of SQLT, if any.
    • If P is 0 (zero), then let Q be 0 (zero); otherwise, let Q be P + 1 (one).

Further notes

  • xsd:hexBinary yields 8000 Google hits, xsd:base64Binary yields 37000.
  • For the implementation-dependent choices, we could choose xsd:double, xsd:integer and xsd:base64Binary.
  • We can't express facets in RDF, therefore the generated RDF literals will simply be typed directly using the XSD type, without considering facets.

Open questions

  • What does the and for INTERVAL mean?
  • Precise rules for obtaining the target lexical form?
    • Fairly obvious for strings and numbers
    • What about dates, times and datetimes? What happens to time zones?
    • What about booleans? "true"^^xsd:boolean, "TRUE"^^xsd:boolean?
    • What exactly is the reference for base64 encoding?
    • What about INTERVAL?
    • Does anything surprising happen for numbers with NaN, +Inf, -Inf, -0? (I believe SQL number types don't support any of those, but we should check.)