Introduction to GML

Geography Markup Language

Ron Lake
Galdos Systems Inc

1.0 Introduction:

This paper provides a brief introduction to Geography Markup Language (GML). The paper is the first in a series of papers to get you acquainted with this exciting way to represent and manipulate geographic information. Following articles on this site will introduce you to a variety of GML topics including GML map making, GML data transformations, spatial queries and geographic analysis, GML-based spatial databases, and a variety of GML applications including applications to mobile computing systems. We expect GML to revolutionize the treatment of spatial information. GML is web friendly. For the first time spatial information will have a truly public encoding standard.

2.0 What is GML ?

2.1 Status

GML, or Geography Markup Language, is an XML-based encoding standard for geographic information developed by the OpenGIS Consortium (OGC). Its current status is that of an RFC under review within the OpenGIS Consortium. The RFC is supported by a variety of vendors including Oracle Corporation, Galdos Systems Inc, MapInfo, CubeWerx and Compusult Ltd. GML was implemented and tested through a series of demonstrations which formed part of the OpenGIS Consortium's Web Mapping Test Bed (WMT) conducted in September 1999. These tests involved GML mapping clients interacting with GML data servers and service providers.

2.2 Geography, Graphics and Maps

Before we look at GML itself, it is important that we draw some clear distinctions between geographic data (which is encoded in GML) and graphic interpretations of that data as might appear on a map or other form of visualization. Geographic data is concerned with a representation of the world in spatial terms that is independent of any particular visualization of that data. When we talk about geographic data we are trying to capture information about the properties and geometry of the objects which populate the world about us. How we symbolize these on a map, and what colors or line weights we use, is something quite different. Just as XML is now helping the Web to clearly separate content from presentation, GML will do the same in the world of geography.

GML is concerned with the representation of the geographic data content. Of course we can also use GML to make maps. This might be accomplished by developing a rendering tool to interpret GML data; however, this would go against the GML approach to standardization and to the separation of content and presentation. To make a map from GML we need only to style the GML elements into a form which can be interpreted for graphical display in a web browser. Potential graphical display formats include W3C Scalable Vector Graphics (SVG), the Microsoft Vector Markup Language (VML), and X3D. A map styler is thus used to locate GML elements and interpret them using particular graphical styles. The next article in this series will deal with generating a map from GML using SVG and X3D.

2.3 GML is Text

Like any XML encoding, GML represents geographic information in the form of text. While a short while ago this might have been considered verboten in the world of spatial information systems, the idea is now gaining a lot of momentum. Text has a certain simplicity and visibility on its side. It is easy to inspect and easy to change. Add XML and it can also be controlled.

Text formats for geometry and geography have been employed before. The pioneering work of the Province of British Columbia with its SAIF format is just one such example. In the Province of British Columbia, more than 7000 files of 1:20,000 scale data including topography, planimetry (hydrography, buildings, roads etc.) and toponymy are available in the SAIF format. The Province has shown that text formats are practical and easy to use. Another example of the use of text for complex geometric data sets is that of VRML (Virtual Reality Modeling Language). Large and complex VRML models have been built and navigated over the Web, all using text-based encoding. Interestingly enough, the VRML geometry and behaviour are themselves now being recast in XML through the efforts of the X3D Working Group.

2.4 GML Encodes Feature Geometry and Properties

GML is based on the abstract model of geography developed by the OGC. This describes the world in terms of geographic entities called features. Essentially a feature is nothing more than a list of properties and geometries. Properties have the usual name, type and value description. Geometries are composed of basic geometry building blocks such as points, lines, curves, surfaces and polygons. For simplicity, the initial GML specification is restricted to 2D geometry; however, extensions will appear shortly which will handle 2 1/2D and 3D geometry, as well as topological relationships between features.

GML encoding already allows for quite complex features. A feature can, for example, be composed of other features. A single feature like an airport might thus be composed of other features such as taxiways, runways, hangars and air terminals. The geometry of a geographic feature can also be composed of many geometry elements. A geometrically complex feature can thus consist of a mix of geometry types including points, line strings and polygons.

To encode the geometry of a feature like a building we simply write:

<Feature fid="142" featureType="school" Description="A middle school">
    <Polygon name="extent" srsName="epsg:27354">
        <LineString name="extent" srsName="epsg:27354">
            <CData>
                491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
                491908.999999462,5458064.99963358 491924.999999461,5458064.99963358
                491925.999999462,5458079.99963359 491977.999999466,5458120.9996336
                491953.999999466,5458017.99963357
            </CData>
        </LineString>
    </Polygon>
</Feature>

Note that this has no properties (other than the geometry). These we can readily add and the building would look something like:

<Feature fid="142" featureType="school">
    <Description>Balmoral Middle School</Description>
    <Property Name="NumFloors" type="Integer" value="3"/>
    <Property Name="NumStudents" type="Integer" value="987"/>
    <Polygon name="extent" srsName="epsg:27354">
        <LineString name="extent" srsName="epsg:27354">
            <CData>
                491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
                491908.999999462,5458064.99963358 491924.999999461,5458064.99963358
                491925.999999462,5458079.99963359 491977.999999466,5458120.9996336
                491953.999999466,5458017.99963357
            </CData>
        </LineString>
    </Polygon>
</Feature>
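
To give a feel for how little machinery is needed to consume such a feature, here is a minimal sketch in Python using only the standard library. It assumes the element and attribute names used in the listing above (Feature, Description, Property, Polygon, LineString, CData); the function names read_feature and parse_coordinates are our own and are not part of GML.

```python
import xml.etree.ElementTree as ET

GML_FEATURE = """
<Feature fid="142" featureType="school">
  <Description>Balmoral Middle School</Description>
  <Property Name="NumFloors" type="Integer" value="3"/>
  <Property Name="NumStudents" type="Integer" value="987"/>
  <Polygon name="extent" srsName="epsg:27354">
    <LineString name="extent" srsName="epsg:27354">
      <CData>
        491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
        491908.999999462,5458064.99963358 491924.999999461,5458064.99963358
        491925.999999462,5458079.99963359 491977.999999466,5458120.9996336
        491953.999999466,5458017.99963357
      </CData>
    </LineString>
  </Polygon>
</Feature>
"""


def parse_coordinates(text):
    """Turn 'x,y x,y ...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def read_feature(xml_text):
    """Collect the attributes, properties and geometry of one Feature."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.findall("Property"):
        value = prop.get("value")
        if prop.get("type") == "Integer":
            value = int(value)
        properties[prop.get("Name")] = value
    polygon = root.find("Polygon")
    return {
        "fid": root.get("fid"),
        "featureType": root.get("featureType"),
        "description": root.findtext("Description"),
        "properties": properties,
        "srsName": polygon.get("srsName"),
        "coordinates": parse_coordinates(polygon.findtext("LineString/CData")),
    }


feature = read_feature(GML_FEATURE)
print(feature["description"], feature["properties"])
```

The properties come back as an ordinary dictionary and the geometry as a list of coordinate pairs, which is all most applications need to get started.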

2.5 GML Encodes Spatial Reference Systems

An essential component of a geographic system is a means of referencing the geographic features to the earth's surface or to some structure related to the earth's surface. The current version of GML incorporates an earth-based spatial reference system which is extensible and which incorporates the main projection and geocentric reference frames in use today. This is capable of encoding all of the reference systems which can be found at the European Petroleum Survey Group (EPSG) web site. In addition the encoding scheme allows for user-defined units and reference system parameters. Future versions of GML will likely provide even more flexible encodings in order to handle local coordinate systems such as used for mile logging etc.

Why encode a spatial reference system ? Why not just provide a unique name and be done with it ? In many cases such an approach does suffice, and GML does not require that the sender of geographic data also send an encoding of the reference system to which the data's coordinate values are referenced. There are cases, however, where such information is very valuable. These include:

Client validation of a server-specified Spatial Reference System. The client can request the SRS description (an XML document) and compare it to its own specifications or show it to a user for verification.
Client display of a server-specified Spatial Reference System.
Use by a Coordinate Transformation Service to validate an input data source's Spatial Reference System. The service can compare the SRS description with its own specifications to see if the SRS is consistent with the selected transformation.
Control of automated coordinate transformation by supplying input and output reference system names and argument values.

Watch the GeoJava site for future GML services that transform GML data from one spatial reference system to another.

With the GML encoding for spatial references, it is possible to create a web site which stores any number of spatial reference system definitions. Stay tuned to the GeoJava site for standard encodings of common spatial reference systems.

2.6 GML Feature Collections

The XML 1.0 Recommendation from the W3C is based on the notion of a document. The current version of GML is based on XML 1.0, and uses a FeatureCollection as the basis of its document. A FeatureCollection is a collection of GML Features together with an Envelope (which bounds the set of Features), a collection of Properties that apply to the FeatureCollection and an optional list of Spatial Reference System Definitions. A FeatureCollection can also contain other FeatureCollections, provided that the Envelope of the bounding FeatureCollection bounds the Envelopes of all of the contained FeatureCollections.

When a request is made for GML data from a GML server, data is always returned in FeatureCollections. There is no limit in the GML RFC on the number of features which can be contained in a FeatureCollection. Because FeatureCollections can contain other FeatureCollections it is a relatively simple procedure to "glue together" FeatureCollections received from a server into still larger collections.
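
The "gluing" rule can be sketched in a few lines of Python. This is only an illustration of the bounding constraint described above, not GML itself: the Envelope and FeatureCollection classes and the glue function are hypothetical names of our own, envelopes are assumed to be simple axis-aligned boxes, and features are stood in for by plain strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Envelope:
    """Axis-aligned bounding box (an assumption made for this sketch)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, other):
        """True if this envelope bounds the other one."""
        return (self.min_x <= other.min_x and self.min_y <= other.min_y
                and self.max_x >= other.max_x and self.max_y >= other.max_y)

    def union(self, other):
        """Smallest envelope that bounds both envelopes."""
        return Envelope(min(self.min_x, other.min_x), min(self.min_y, other.min_y),
                        max(self.max_x, other.max_x), max(self.max_y, other.max_y))


@dataclass
class FeatureCollection:
    envelope: Envelope
    features: list = field(default_factory=list)
    collections: list = field(default_factory=list)


def glue(collections):
    """Wrap several collections in a parent whose envelope bounds them all."""
    envelope = collections[0].envelope
    for collection in collections[1:]:
        envelope = envelope.union(collection.envelope)
    return FeatureCollection(envelope=envelope, collections=list(collections))


west = FeatureCollection(Envelope(0, 0, 10, 10), features=["f1", "f2"])
east = FeatureCollection(Envelope(10, 2, 25, 12), features=["f3"])
merged = glue([west, east])
print(merged.envelope)
```

Because the parent envelope is computed as the union of the child envelopes, the containment requirement holds by construction.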

2.7 GML - More than a Data Transport

While GML is an effective means for transporting geographic information from one place to another, we expect that it will also become an important means of storing geographic information. The key elements here are XLink and XPointer. While these two specifications lag in the development and implementation area, they hold great promise for building complex and distributed geographic data sets. Geographic data is, well, geographic. It is naturally distributed over the face of the earth. Interest in data about Flin Flon, Saskatchewan is much, much higher near Flin Flon than it would be in Pasadena, California. At the same time there are applications which need to reach out and obtain data on a global basis for large scale analysis or because of interest in a narrow vertical domain. Applications of the latter sort also abound in a diverse collection of fields from environmental protection to mining, highway construction, and disaster management. How nice it would be if data could be developed on the local scale and readily integrated to the regional and the global scale !

In most jurisdictions, geographic data is collected by particular agencies for a particular purpose. Forest bureaus collect information on the disposition of trees (tree diameters, site conditions, growth rates) for the effective management of commercial forests. Environmental departments collect information on the distribution of animals and animal habitat. Development interests maintain information on demographics and existing features in the built environment. Real world problems seldom, however, respect the parochial boundaries of departments, ministries and bureaus. How nice it would be if data developed for one purpose could be readily integrated with data developed for another !

We believe that GML as a storage format, combined with XLink and XPointer, will make some useful contributions to solving these problems. Watch the GeoJava site for our article on GML Spatial Databases.

2.8 On What Technologies Does it Depend ?

GML is based on XML. XML, while sometimes talked about as a replacement for HTML, is best thought of as a language for data description. More correctly, XML is a language for expressing data description languages. XML is, however, not a programming language. There are no mechanisms in XML to express behaviour or to perform computations. That is left for other languages such as Java and C++.

2.8.1 XML Version 1.0

XML 1.0 provides a means of describing (marking up) data using user-defined tags. Each segment of an XML document is bounded by start and end tags. This looks as follows:

<Feature>
    .... more XML descriptions ...
    ....
</Feature>

The valid tag names are determined by the Document Type Definition. Which tags can appear enclosed within an opening and closing tag pair is also determined by the DTD.

XML tags can also have attributes associated with them. These are also constrained by the DTD in name and in some cases in terms of the values that the attributes can assume.

XML is typically read by an XML parser. All XML parsers check that the data is well formed so that data corruption (e.g. missing closing tag) cannot pass undetected. Many XML parsers are also validating, meaning that they check that the document conforms to the associated DTD.

Using XML it is comparatively easy to generate and validate complex hierarchical data structures. Such structures are common in geographic applications.
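
The well-formedness check is easy to see in practice. The following sketch uses the XML parser in Python's standard library, which is non-validating: it checks that the document is well formed but does not check it against a DTD. The helper name is_well_formed and the two sample documents are our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = '<Feature fid="1"><Description>ok</Description></Feature>'
# The closing </Description> tag has been lost, e.g. through data corruption.
broken = '<Feature fid="1"><Description>ok</Feature>'

print(is_well_formed(good), is_well_formed(broken))
```

A validating parser would go one step further and also reject a well-formed document whose tags or attributes are not permitted by the DTD.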

2.8.2 XSL and XSLT (Transforming the WWW)

The original focus of XML was to provide a means of describing data separate from its presentation, especially in the context of the world wide web. XML Version 1.0 deals with the description of data. A companion technology, called XSL, was to deal with the presentation side. Over time it has become apparent that XSL is actually two different technologies. One, now called XSLT (the T stands for Transformation), is focused on the transformation of XML. The other technology is concerned with the actual formatting of text or images and is referred to in terms of formatting objects or flow objects. In our discussions we are only concerned with XSLT. Since many tools (e.g. MS IE 5.0) were developed before the XSLT label had stuck, XSL is still often used when only XSLT is intended. We will follow that practice.

If you follow xml.com, you may recall a great deal of discussion about the merits of XSL. The XSLT clarification has helped to dampen this discussion somewhat; however, there is still a great deal of skepticism regarding the utility and the need for XSL in some sectors of the XML community. We stand on the opposite side of the issue. We believe that it is the transformational character of XML that is most important, and XSL (XSLT) provides a clean declarative means for expressing these transformations. In our view XSLT is as essential to GML as XML itself.

XSL is a fairly simple language. It provides a powerful syntax for expressing pattern matching and replacement. It is declarative. You can easily read what the XSLT says to do. You do not get to see how it is accomplished. Using its companion specifications (XPath and XQL) you can specify some very powerful queries on an XML document. Furthermore XSLT incorporates the ability to call functions in another programming language such as VBScript or Java through the use of Extension Functions. This means that XSL can be used to do the querying and selection, and then call out to Java or another language to perform needed computation or string manipulation. For simple tasks, XSLT provides built in string handling and arithmetic capabilities.
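
To illustrate the declarative flavour of such queries without requiring an XSLT processor, the sketch below uses the limited XPath subset supported by Python's standard library. It is a stand-in for the idea, not XSLT itself. The sample document is hypothetical: it reuses the Feature and Property names from Section 2.4 and omits the Envelope that a real FeatureCollection would carry.

```python
import xml.etree.ElementTree as ET

DOC = """
<FeatureCollection>
  <Feature fid="142" featureType="school">
    <Property Name="NumFloors" type="Integer" value="3"/>
  </Feature>
  <Feature fid="143" featureType="hospital">
    <Property Name="NumFloors" type="Integer" value="5"/>
  </Feature>
  <Feature fid="144" featureType="school">
    <Property Name="NumFloors" type="Integer" value="1"/>
  </Feature>
</FeatureCollection>
"""

root = ET.fromstring(DOC)

# Pattern matching on an attribute: say what you want, not how to find it.
schools = root.findall("./Feature[@featureType='school']")
school_ids = [f.get("fid") for f in schools]

# The XPath subset in the standard library has no numeric comparisons,
# so the arithmetic test is done in the host language.
tall = [f.get("fid") for f in root.findall("Feature")
        if int(f.find("Property[@Name='NumFloors']").get("value")) >= 3]

print(school_ids, tall)
```

The division of labour mirrors the one described above: the path expression does the selection, and the surrounding language does any computation the path syntax cannot express.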

2.8.3 SVG, VML and X3D - Vector Graphics for the Web

XML has made its presence felt in many different quarters, not the least of which is vector graphics. Several XML-based specifications for describing vector graphic elements have been developed, including Scalable Vector Graphics (SVG), Microsoft's Vector Markup Language (VML), and X3D, the XML incarnation of the syntax and behaviour of VRML (Virtual Reality Modeling Language). These specifications are in many ways similar to GML, but have a very different objective. Each has a means of describing geometry. The graphical specifications, however, are focused on appearance and hence include properties and elements for colors, line weights and transparency, to name but a few aspects. To view an SVG, VML or X3D data file, it is necessary to have a suitable graphical data viewer. In the case of VML this is built into IE 5.0 (and nowhere else). In the case of SVG, Adobe is developing a series of plug-ins for Internet Explorer and Netscape Communicator as well as Adobe Illustrator, while IBM and several other companies are developing, or have already developed, SVG viewers or supporting graphics libraries. Several all-Java SVG viewers are available or under development.

To draw a map from GML data you need to transform the GML into one of the graphical vector data formats such as SVG, VML or VRML. This means associating a graphical "style" (e.g. symbol, colour, texture) with each type of GML feature or feature instance. We will have more to say on this in the GeoJava article, Making Maps from GML.

Figure 1 illustrates the drawing of a map using an XSLT style sheet on a suitable mapping client.

Figure 1. Making a Map with XSLT and SVG
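
The styling step can be sketched in a few lines. The example below does in Python what an XSLT style sheet would do declaratively: it looks up a style by feature type and emits an SVG polygon. It is a rough illustration under several assumptions: the STYLES table and the function name feature_to_svg are hypothetical, the coordinates are small made-up values rather than real map coordinates, and the only geometry handled is the Polygon/LineString/CData form shown in Section 2.4.

```python
import xml.etree.ElementTree as ET

# Hypothetical style table: one graphical style per feature type.
STYLES = {"school": {"fill": "red", "stroke": "black"}}
DEFAULT_STYLE = {"fill": "none", "stroke": "grey"}

GML = """
<Feature fid="142" featureType="school">
  <Polygon name="extent">
    <LineString name="extent">
      <CData>0,0 40,0 40,30 0,30</CData>
    </LineString>
  </Polygon>
</Feature>
"""


def feature_to_svg(gml_text, height):
    """Style one GML feature as an SVG document of the given height."""
    feature = ET.fromstring(gml_text)
    style = STYLES.get(feature.get("featureType"), DEFAULT_STYLE)
    text = feature.findtext("Polygon/LineString/CData")
    points = []
    for pair in text.split():
        x, y = (float(v) for v in pair.split(","))
        # SVG's y axis points down while map coordinates point up, so flip y.
        points.append(f"{x:g},{height - y:g}")
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width="100", height=str(height))
    ET.SubElement(svg, "polygon", points=" ".join(points), **style)
    return ET.tostring(svg, encoding="unicode")


svg_text = feature_to_svg(GML, height=100)
print(svg_text)
```

Note that the GML itself carries no colour or line weight. Swapping the style table changes the map without touching the data, which is the separation of content and presentation discussed in Section 2.2.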

2.8.4 XLink and XPointer - Linking one place to another

With current HTML technology it is possible to build linked geographic data sets. One can readily build image maps which are linked to other image maps. The HTML linking mechanism has, however, many limitations, and as a result it is not practical to build large complex distributed data sets as occur in real world systems. The most significant limitation is that an HTML link is effectively hard coded in both the source (<a href = ... >) and target (anchor) documents, a fact which would make any significant system both fragile and impossible to scale. XLink gets around these problems by allowing "out of line" links. In an out of line link, the source points only to a link database and it is the link database that provides the pointer to specific XML elements in the target document. The link is thus not hard coded in either document. This is of great importance in relation to GML as it makes it possible to build scalable, distributed geographic data sets. Even more importantly, XLink and XPointer make it possible to build application-specific indexes for a data set. Need to have a group of buildings organized by street address ? Want to create a farm plot index based on crop type ? With XLink and XPointer, these and many other indexing schemes can be readily constructed, and all without altering the source data itself. We will have much more to say about this in coming articles.
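
The idea of an out of line link can be illustrated without any XLink syntax at all. In the sketch below the link database is just a list held apart from the documents it points into, and an index by crop type is built from it. Everything here is hypothetical: the example.org URIs, the plot identifiers and the function names are made up for illustration, and the "document#element" form only stands in for what XPointer would express.

```python
from collections import defaultdict

# Hypothetical link database, kept apart from the GML documents it points into.
# Each entry pairs an index key with a target of the form "document#element-id".
LINKS = [
    ("wheat", "http://example.org/north.gml#plot-17"),
    ("canola", "http://example.org/north.gml#plot-18"),
    ("wheat", "http://example.org/south.gml#plot-03"),
]


def build_index(links):
    """Group link targets by key without reading or changing the targets."""
    index = defaultdict(list)
    for key, target in links:
        index[key].append(target)
    return dict(index)


def split_target(target):
    """Separate the document location from the element identifier."""
    document, _, element_id = target.partition("#")
    return document, element_id


crop_index = build_index(LINKS)
print(crop_index["wheat"])
```

A second index, say by street address, would simply be another link database over the same untouched source documents.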

3.0 Why GML ?

Why introduce GML at all ? There are already a host of encoding standards for geographic information including COGIF, MDIFF, SAIF, DLG and SDTS, to name only a few. What is so different about GML ? In some ways nothing. GML is a simple text-based encoding of geographic features. Some of these other formats are not text based; however, some of them (e.g. SAIF) certainly are. GML is based on a common model of geography (OGC Abstract Specification) which has been developed and agreed to by the vast majority of all GIS vendors in the world. More importantly, however, GML is based on XML. Why should this matter ? There are several reasons why XML is important. To begin with, XML provides a method to verify data integrity. Secondly, any XML document can be read and edited using a simple text editor. Nothing more than MS Notepad is required to view or change an XML document. Thirdly, since there are an increasing number of XML languages, it will be increasingly easy to integrate GML data with non-spatial data. This is the case even for non-spatial data that is not in XML. Perhaps most importantly, XML is easy to transform. Using XSLT or almost any other programming language (VB, VBScript, Java, C++, JavaScript) we can readily transform XML from one form to another. A single mechanism can thus be employed for a host of transformations from data visualization to coordinate transforms, spatial queries, and geo-spatial generalization.

GML rests securely on a widely adopted public standard, that of XML. This ensures that GML data can be viewed, edited and transformed by a wide variety of commercial and freeware tools. For the first time we can truly talk about open geographic information.

3.1 Automated Verification of Data Integrity

One of the important features of XML is the ability to verify data integrity. In the XML 1.0 Recommendation this is achieved through the Document Type Definition (DTD). The DTD specifies the structure of an XML document in such a way that a validating parser can verify that a given document instance complies with this DTD. GML is specified by such a DTD. Future versions of GML will also be supported by XML Schema, a more flexible integrity mechanism than the DTD that should become a W3C Recommendation early in 2000.

Using the GML DTD, servers and clients can readily verify that the data they are to send or receive complies with the specification. Furthermore, this can be accomplished with a variety of parsing tools from at least half a dozen different vendors on a wide variety of operating systems, databases, application servers and browsers.

3.2 GML can be Read by Public Tools

As we have already noted, GML is text and one need have nothing more than a simple text editor to read it. GML, however, is structured, and any of a variety of XML editors can be employed to display that structure. This makes viewing and navigating GML data very easy as shown in Figure 2.

Figure 2. Sample GML File Viewed in XML Spy

3.3 GML can be Easily Edited

Using the many XML editors described in Section 3.2 it is also very easy to edit GML data. Want to add a new feature property or change a property value ? Need to adjust a feature's geometry ? These are easily accomplished with a standard XML editor. Unlike with many other text-based formats, however, you need not risk corrupting the data when editing by hand: an XML editor can be made to ensure that any data which is created or modified complies with the DTD.

It is also not difficult to create a graphical editor for GML and such products are expected to appear on the market within the coming year. Again the GML DTD can be used to ensure data integrity. Note that when one edits GML graphically an intermediate graphic representation is required (perhaps SVG) which is then used to define the geometry of the associated GML feature. We will have more to say on this subject in our upcoming article on Making Maps from GML to appear on the GeoJava site.

3.4 GML can readily Integrate with Non-Spatial Data

Binary data structures are typically very difficult to integrate with one another. A classic example is that of associating a text document, or a parameter list, with a separately developed and maintained spatial database of parcels or land tenure boundaries. With a binary data structure one must understand the file structure or database schema and be able to modify it. In many legacy systems using flat files the data structure cannot be modified without breaking the applications which rely on the existing data structure. With GML it is comparatively easy to provide links to other XML data elements and this will dramatically improve with the introduction of XLink and XPointer. Even links to non-XML elements can be readily handled using the well established URI syntax.

3.5 GML is Transformable

The most important aspect of XML in our view is its transformability. It is quite easy to write a transformation which carries XML data relative to one DTD to XML relative to another. This is exactly what we do when we generate an SVG graphical element stream from a GML data file. Such transformations can be accomplished using a variety of mechanisms including XSLT, Java, JavaScript and C++, to name only a few. XSLT in our view is of particular interest. With XSLT it is very easy to write a style sheet which locates and transforms GML elements into other XML elements. Where XSLT is not up to the task, one can readily incorporate XSLT extension functions written in Java or VB (the exact languages supported depend on the implementation) to perform tasks such as string manipulation or mathematical computation. XSLT can also make use of powerful searching syntax (XPath/XQL) so as to retrieve elements that satisfy complex boolean expressions on the elements and their attributes. Using these techniques an XSLT style sheet can perform a wide variety of querying, analysis and transformation functions. Consider the following examples:

Using XSLT with suitable extension functions we can extract spatial elements which satisfy various spatial and attribute queries. Galdos Systems Inc will be providing just such a set of spatial extension functions in the near future on the GeoJava site. Using these functions it will be straightforward to write a spatial query that extracts features of a given type which lie within a specified region or which intersect a particular feature.

Change the XSLT style sheet and we can accomplish a totally different function. We can, for example, write a style sheet that performs coordinate transformation, as was demonstrated in the OGC WMT IOC in Washington, September 10, 1999. This immediately provides us with a coordinate transformation service. Locate GML data in one part of the world in reference system X, pass its URI to the service, specify the target reference system, and presto, you will have GML in the new frame of reference. Look on the GeoJava site for an upcoming coordinate transformation service for GML data.

Change the XSLT style sheet and we can accomplish yet another function. We can for example generate an SVG, VML or X3D map on the server. Select different style sheets for different viewing devices or different types of maps.
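
The first of these examples, a combined attribute and spatial query, can be sketched as follows. This is not the set of XSLT extension functions mentioned above, which are not described in this article; it is a plain Python stand-in that selects features of a given type lying entirely within a rectangular region. The sample document, its coordinates and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<FeatureCollection>
  <Feature fid="1" featureType="school">
    <Polygon><LineString><CData>10,10 20,10 20,20 10,20</CData></LineString></Polygon>
  </Feature>
  <Feature fid="2" featureType="school">
    <Polygon><LineString><CData>80,80 90,80 90,90 80,90</CData></LineString></Polygon>
  </Feature>
  <Feature fid="3" featureType="hangar">
    <Polygon><LineString><CData>12,12 18,12 18,18 12,18</CData></LineString></Polygon>
  </Feature>
</FeatureCollection>
"""


def coordinates(feature):
    """Read the 'x,y x,y ...' text of a feature into (x, y) tuples."""
    text = feature.findtext("Polygon/LineString/CData")
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def within(feature, min_x, min_y, max_x, max_y):
    """True if every vertex of the feature lies inside the region."""
    return all(min_x <= x <= max_x and min_y <= y <= max_y
               for x, y in coordinates(feature))


def query(root, feature_type, region):
    """Attribute test via a path expression, spatial test via within()."""
    return [f.get("fid")
            for f in root.findall(f"Feature[@featureType='{feature_type}']")
            if within(f, *region)]


root = ET.fromstring(DOC)
hits = query(root, "school", (0, 0, 50, 50))
print(hits)
```

The path expression plays the role of the XSLT pattern and the within function plays the role of a spatial extension function. An intersection test would slot into the same place.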

The transformability of GML also means that we can readily construct application specific indexes or at least we will be able to once XLink and XPointer implementations start to move toward reality. Look for this to have a huge impact on the utility of GML data sets.

3.6 GML can Transport Behaviour

XML is a language for describing data description languages. GML does not itself encode behaviour. GML can, however, be used in conjunction with languages like Java or C++ to in effect transport geographic behaviour from one place to another. This can be done using a simple object factory which instantiates objects based on received GML data, mapping the GML element names into object classes. In the Java case this would mean mapping the GML elements into Java classes as listed in the OGC Java Simple Features RFC. This "re-hydration" of the GML data then creates Java objects which have the OGC interfaces for Simple Features (of course we did not transport the interfaces). GML and Java (or COM or CORBA) Simple Features can thus get along very well with one another. In many applications one only needs the behaviour for a small number of the elements. With this approach one might receive 10,000 GML elements but only need to construct a hundred or so Java objects on an as needed basis.
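
The object factory idea is not tied to Java. Here is a minimal sketch of the same pattern in Python: a registry maps element names to classes, and objects are built only for the elements an application actually needs. The Polygon class and its area method are our own illustration and are not the OGC Simple Features interfaces; the sample document and the rehydrate function are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


class Polygon:
    """Behaviour attached to received data; not an OGC interface."""

    def __init__(self, element):
        text = element.findtext("LineString/CData")
        self.ring = [tuple(float(v) for v in p.split(",")) for p in text.split()]

    def area(self):
        """Shoelace formula over the ring, treated as implicitly closed."""
        total = 0.0
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


# The factory: element names are mapped to the classes that give them behaviour.
REGISTRY = {"Polygon": Polygon}


def rehydrate(element):
    """Instantiate the class registered for this element's name."""
    cls = REGISTRY.get(element.tag)
    if cls is None:
        raise ValueError(f"no class registered for <{element.tag}>")
    return cls(element)


DOC = """
<FeatureCollection>
  <Feature fid="1"><Polygon><LineString><CData>0,0 4,0 4,3 0,3</CData></LineString></Polygon></Feature>
  <Feature fid="2"><Polygon><LineString><CData>0,0 2,0 2,2 0,2</CData></LineString></Polygon></Feature>
</FeatureCollection>
"""

root = ET.fromstring(DOC)
# Only the element we need is turned into an object; the rest stays as data.
shape = rehydrate(root.find("Feature[@fid='2']/Polygon"))
print(shape.area())
```

The received document can hold thousands of elements, yet only the one selected here ever becomes an object with behaviour.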

4.0 What's Coming Down the Road ?

I think we have made it pretty clear that we think GML is pretty cool. Once you have had the opportunity to play with it you will think it is pretty cool as well. Over the next 6 months a series of articles and services extending your understanding of GML, and how to apply it in real world problems, will appear on the GeoJava website. Look for articles on Map Making, Making maps in SVG, Geographic Transformations, GML Spatial databases, Mobile applications and much more.

What will happen to GML itself ? We expect quite a lot. The current version of GML is based on linear geometry and provides no notion of topology. Over the next several months, new versions of GML will be introduced adding topology, non-linear feature geometries, 2 1/2D and 3D geometry, support for OGC Coverages, XSLT spatial query extension functions, XLink/XPointer support, and an XML Schema implementation.

5.0 Conclusion

GML is a powerful new way to look at spatial information using XML encoding. It promises, however, much more than a mere encoding standard. The inherent transformability and accessibility of GML will open a whole new domain in geo-spatial information management.