For years, a growing trend from one industry to the next has been to use XML-based markup languages to represent domain models in a way that facilitates platform- and software-independent interchange. These markup languages can be easily defined (originally via DTDs and later via XML Schemas and/or RELAX NG schemas), parsed (often via DOM- or SAX-based parsers, readily available in any development environment), and queried (usually via XPath or XQuery). To further enable cross-enterprise interoperability, XML documents can be naturally embedded in extensible transport protocols such as SOAP.
However, designing XML dialects with an eye towards interoperability often means imposing an artificial rigidity that makes the XML model diverge from how subject-matter experts (SMEs) think about the domain. For example, most XML document families impose a required element order that may not naturally fit the standard workflow of an SME gathering and working with the data in the model. Or perhaps the markup language mints its own coding or identifier scheme to uniquely identify elements in its domain that do not necessarily have such an identifier or code inside an enterprise. And to facilitate both extensibility and interchange at the same time, many domain-specific XML vocabularies contain elements specifically designed to hold custom or user-specific data that is not otherwise part of the model and is likely opaque if the document's recipient is different from its sender.
There is a trade-off, then, between designing for interoperability and designing for flexibility and familiarity to SMEs. Whereas XML markup languages come down definitively on the interoperability side of this trade-off, Semantic Web technologies favor flexibility and familiarity. Thus, RDF provides a data model that can easily accommodate new facts at any time, regardless of schema and order. And RDF Schema and OWL provide ontological constructs to express domain semantics at the same conceptual level that industry experts understand.
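As a rough illustration of why an RDF-style model accommodates new facts regardless of order, the sketch below treats a graph as a plain set of subject-predicate-object triples. It is a simplification that does not use an RDF library, and every identifier in it (ex:well-42, ex:operator, ex:spudDate, and so on) is hypothetical.

```python
from typing import List, Set, Tuple

# A graph modeled as a plain set of (subject, predicate, object) triples.
# All identifiers below are hypothetical and chosen only for illustration.
Triple = Tuple[str, str, str]


def add_facts(graph: Set[Triple], facts: List[Triple]) -> Set[Triple]:
    """Return a new graph that also contains the given facts."""
    return graph | set(facts)


initial = [
    ("ex:well-42", "rdf:type", "ex:Well"),
    ("ex:well-42", "ex:operator", "ex:AcmeDrilling"),
]
# A fact that arrives later and uses a property not mentioned before.
later = [("ex:well-42", "ex:spudDate", "2009-03-14")]

# Build the same graph twice, adding the facts in different orders.
graph_a = add_facts(add_facts(set(), initial), later)
graph_b = add_facts(add_facts(set(), later), list(reversed(initial)))

# Order of arrival does not matter: both graphs hold the same three facts.
same_graph = graph_a == graph_b
```

Because the graph is a set, adding a fact with a previously unseen property needs no schema change, and two parties who record the same facts in different orders end up with equal graphs.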
The question then becomes: Can we leverage the benefits of semantics in order to more easily allow people to create interoperable XML documents?
Two of the leading XML vocabularies of this kind in the Oil & Gas industry are WITSML (the Wellsite Information Transfer Standard Markup Language) and PRODML (the Production Optimization Markup Language). WITSML comprises both an object-oriented API and a large collection of XML Schemas that provide interchange formats for almost all aspects of drilling operations, while PRODML covers completion and production operations. Both WITSML and PRODML are intended to facilitate interoperability between organizations; the Energistics overview characterizes WITSML as being for the "seamless flow of well data between operators and service companies."
Given the growing adoption of WITSML and PRODML for information transfer across the industry and the potential for Semantic Web technologies to model drilling, completion, and production operations in a way accessible to industry SMEs, we're left with a challenge: Internally, the various organizations participating in a drilling/production operation store the information that must be integrated with data from upstream and downstream partners in various, non-standard data silos. On the wire, seamless interoperability requires the production and consumption of standard WITSML and/or PRODML documents. How do we use semantics to bridge between the data silos and the wire?
One of the most common incarnations of these data silos is nothing more sophisticated than the Excel spreadsheet. Almost all enterprises have thousands or tens of thousands of spreadsheets containing business-critical data on hundreds of thousands of projects, facilities, operations, and more. These spreadsheets implicitly capture the semantics of the data within. When a human looks at the spreadsheet, he or she knows to associate a particular cell with the meaning of both its row header and column header. But the software doesn't know anything of the sort; in particular, Excel has no way to map these implicit relationships into the proper structures, codings, and identifiers required to produce a WITSML/PRODML document. Similarly, Excel has no way to bring in an upstream WITSML/PRODML document and intelligently incorporate it into an existing spreadsheet or family of spreadsheets.
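To make the implicit row-header and column-header relationship concrete, here is a minimal sketch that turns a small grid of cells into explicit statements. The sheet contents, the ex: prefix, and the COLUMN_TO_PROPERTY table are all assumptions made up for this example; they do not come from any real spreadsheet or ontology.

```python
# Hypothetical mapping from column headers to property names in a domain model.
COLUMN_TO_PROPERTY = {
    "Operator": "ex:operator",
    "Depth (m)": "ex:measuredDepthMeters",
}


def sheet_to_triples(rows):
    """Turn each non-empty cell into a (row subject, column property, value) triple.

    The first row is treated as column headers and the first column as row headers.
    """
    header = rows[0]
    triples = []
    for row in rows[1:]:
        subject = "ex:" + row[0]  # the row header identifies the thing described
        for column_name, value in zip(header[1:], row[1:]):
            if value == "":
                continue  # an empty cell states nothing
            triples.append((subject, COLUMN_TO_PROPERTY[column_name], value))
    return triples


sheet = [
    ["Well", "Operator", "Depth (m)"],
    ["W-1", "Acme", "3200"],
    ["W-2", "Borealis", ""],
]
triples = sheet_to_triples(sheet)
```

Each resulting triple spells out what a human reader infers at a glance: the cell value is the named property of the thing identified by the row header.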
(Note that Excel does provide capabilities to interact with XML documents. However, because this occurs at the level of the XML structure, the Excel user (the industry expert) is still forced to think in terms of the arbitrary identifiers, coding schemes, and rigid structure that are required for interoperability of the XML documents.)
Cambridge Semantics' Anzo for Excel provides a potential solution to this challenge. Anzo for Excel leverages semantic technologies to allow spreadsheet data to be both manually and automatically mapped into a semantic domain model. At the same time, Anzo for Excel works with the Anzo Semantic Application Server to convert between instance data conforming to semantic models (ontology models) and domain-specific XML vocabularies. Together, this means that SMEs can tag spreadsheet data using concepts and terms with which they are familiar, and the spreadsheet will from then on automatically correspond to a syntactically and semantically accurate WITSML/PRODML document. Similarly, incoming WITSML/PRODML documents will be mapped into the proper ontology, and the data within them can then be combined with similar spreadsheet data. Once represented with semantic technologies, the data can also be searched, queried, and visualized with greater precision and flexibility than was possible previously.
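The general idea of converting order-free semantic data into a rigidly ordered XML document can be sketched as follows. This is not Anzo's implementation, and the element names (well, name, operator, measuredDepth) are illustrative placeholders rather than names taken from the WITSML or PRODML schemas; the property names and the required order are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical target layout: child elements must appear in exactly this order.
ELEMENT_ORDER = ["name", "operator", "measuredDepth"]

# Hypothetical mapping from model properties to XML element names.
PROPERTY_TO_ELEMENT = {
    "ex:name": "name",
    "ex:operator": "operator",
    "ex:measuredDepthMeters": "measuredDepth",
}


def triples_to_xml(subject, triples):
    """Serialize the facts about one subject as XML in the required element order."""
    values = {}
    for s, p, o in triples:
        if s == subject and p in PROPERTY_TO_ELEMENT:
            values[PROPERTY_TO_ELEMENT[p]] = o

    root = ET.Element("well")
    # The output order comes from ELEMENT_ORDER, not from the order of the facts.
    for tag in ELEMENT_ORDER:
        if tag in values:
            ET.SubElement(root, tag).text = values[tag]
    return ET.tostring(root, encoding="unicode")


# Facts are listed in an arbitrary order, as an SME might enter them.
facts = [
    ("ex:W-1", "ex:measuredDepthMeters", "3200"),
    ("ex:W-1", "ex:name", "W-1"),
    ("ex:W-1", "ex:operator", "Acme"),
]
xml_text = triples_to_xml("ex:W-1", facts)
```

The point of the design is that the ordering and naming rules live in one mapping layer, so the person supplying the facts never has to think about them.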
This position paper has laid out a general framework for using semantics as a bridge between human domain experts and interoperable wire formats. This framework is particularly applicable to the Oil & Gas industry when considering the WITSML and/or PRODML standards. Cambridge Semantics' Anzo for Excel is one potential implementation of this framework.
The positions herein represent those of Cambridge Semantics, as authored by: