
EPIM ReportingHub


This page describes a use case of constraint checking and data transformation that should be of interest to the WG. This work was undertaken by TopQuadrant and partners including Franz, Inc. for a large Norwegian Oil and Gas project. The resulting EPIM ReportingHub system has been in production since 2012. You can find more technical presentations about this via Google (e.g. https://www.posccaesar.org/svn/pub/SemanticDays/2012/Presentations/May9/10_David_Price.pdf). TopQuadrant has since worked on two equally large and complete applications, EPIM EnvironmentHub and EPIM LogisticsHub, both delivered as SaaS with ongoing support for years to come.

In a nutshell, the system is a server that is used by oil producers to upload daily and monthly drilling reports in a pre-defined XML format. The data gets checked against XSD constraints and is then converted into a canonical RDF representation, where each XML element essentially becomes an instance of an OWL ontology derived from the XSD. Those RDF instances are then validated against basic integrity constraints, for example to verify that the well bores in the document correspond to background data (the NPD Fact Pages). The NPD Fact Pages live in a specific named graph and are based on a different ontology than the incoming data.
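
For illustration, a canonical RDF instance derived from one XML element might look roughly like the following Turtle. This is a sketch only: the instance URI and the class name are hypothetical stand-ins, and only properties that also appear in the queries below are shown.

 # Hypothetical sketch of a canonical instance generated from an XML
 # statusInfo element; the real names come from the XSD-derived ontology.
 ex:statusInfo_42 a ddr:StatusInfo ;
     ddr:dTim "2012-05-09T06:00:00Z"^^xsd:dateTime ;
     ddr:presTestTypeRef "leak off" .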

Here is an example SPIN constraint from that step:

 # [RH-19] Flag well bore names that are not registered in the NPD Fact Pages
 CONSTRUCT {
    _:cv a spin:ConstraintViolation ;
        spin:violationRoot ?this ;
        spin:violationPath ddr:nameWellbore ;
        rdfs:label ?message .
 }
 WHERE {
    ?this ep-lib:nameWellbore ?wellBoreName .
    BIND (rhspin:wellBoreByName(?wellBoreName) AS ?wellBore) .
    FILTER (!bound(?wellBore)) .
    BIND (fn:concat("[RH-19] Fact Pages validation of the XML file failed with the following error: Unregistered well bore name ", ?wellBoreName) AS ?message) .
 }

The query above makes use of SPIN functions such as rhspin:wellBoreByName, which act essentially as stored procedures. When such a SPIN function is called, its body, a nested SELECT query, is executed. This makes the resulting queries much more readable and modular than repeating the same logic again and again.
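
The body of such a function is itself a SELECT query, typically over the NPD Fact Pages named graph, with the function arguments available as ?arg1, ?arg2, etc. As a rough sketch (the graph name and lookup property are assumptions, not the production vocabulary), rhspin:wellBoreByName could have a body along these lines:

 # Possible body for rhspin:wellBoreByName; ?arg1 is the well bore name.
 # The named graph and the lookup property are hypothetical.
 SELECT ?wellBore
 WHERE {
     GRAPH npd:FactPages {
         ?wellBore npd:wellboreName ?arg1 .
     }
 }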

Once the first step of constraint checking passes, SPIN is used to translate from the canonical RDF representation (derived from the XML) into the target ontology used to store the drilling reports in a triple store. spin:rule is used to attach those transformation queries to the source classes. Here is an example rule, again a SPARQL CONSTRUCT. There are hundreds of rules with similar complexity.

 # STEP 116 Transfer presTestType
 CONSTRUCT {
    ?dailyDrillingActivityToStatus ep-core:hasPart _:b0 .
    _:b0 a ?pressureTestType .
 }
 WHERE {
    ?this ep-spin-lib:nameWellbore ?nameWellBore .
    ?this ddr:dTimStart ?dTimStart .
    ?this ddr:statusInfoRef ?statusInfo .
    ?statusInfo ddr:dTim ?dTim .
    ?statusInfo ddr:presTestTypeRef ?presTestType .
    BIND (ep-spin-lib:selectPressureTestType(?presTestType) AS ?pressureTestType) .
    BIND (ep-spin-lib:normalizeString(?nameWellBore) AS ?normalizedWellBoreName) .
    BIND (ep-spin-lib:buildDailyDrillingActivityToStatusURI(?normalizedWellBoreName, ?dTimStart, ?dTim) AS ?dailyDrillingActivityToStatus) .
 }
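
The attachment of such rules is plain RDF in the ontology: the source class points to the CONSTRUCT query via spin:rule, and ?this is then bound to each instance of that class. A minimal Turtle sketch, using sp:text to hold the query text and a hypothetical source class name, might look like the following; SPIN constraints such as the ones above are attached in the same way via spin:constraint.

 # Hypothetical attachment sketch; ddr:DailyDrillingReport stands in for
 # the actual source class from the XSD-derived ontology.
 ddr:DailyDrillingReport
     spin:rule [
         a sp:Construct ;
         sp:text """
             # STEP 116 Transfer presTestType
             CONSTRUCT { ... }
             WHERE { ... }
         """
     ] .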

At this stage there is a new RDF graph for the uploaded report, and this graph is validated against another set of SPIN constraints, such as the following:

 # [RH-11] Check that the uploading company is the operator of the well bore
 CONSTRUCT {
    _:cv a spin:ConstraintViolation ;
        spin:violationRoot ?this ;
        rdfs:label ?message .
 }
 WHERE {
    ?this (ep-report:reportOn/ep-activity:onWellBore)/ep-core:temporalPartOf ?wellBore .
    FILTER (!rhspin:currentUserIsOperatorOfWellBore(?wellBore)) .
    BIND (COALESCE(rhspin:npdName(?wellBore), "Unknown well bore") AS ?wellBoreName) .
    BIND (COALESCE(rhspin:npdName(?licence), "Unknown licence") AS ?licenceName) .
    BIND (rhspin:companyName() AS ?companyName) .
    BIND (fn:concat("[RH-11] Your company (", ?companyName, ") is not the operator of the BAA or licence associated with well bore ", ?wellBoreName) AS ?message) .
 }

The normalizeString function has the following body:

 SELECT ?normalizedStr
 WHERE {
     BIND (spif:regex(?arg1, "\\((.*)\\)", "") AS ?s1) .
     BIND (spif:trim(?s1) AS ?s2) .
     BIND (spif:encodeURL(?s2) AS ?t0a) .
     BIND (spif:regex(?t0a, "%2F", "_") AS ?t0b) .
     BIND (spif:regex(?t0b, "%[0-9A-F][0-9A-F]", "") AS ?t1) .
     BIND (spif:regex(?t1, "\\+", "_") AS ?t2) .
     BIND (spif:regex(?t2, "_+", "_") AS ?t3) .
     BIND (spif:regex(?t3, "\\*", "") AS ?t4) .
     BIND (xsd:string(spif:upperCase(?t4)) AS ?normalizedStr) .
 }
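
To get a feel for the effect, here is an ad hoc invocation sketch (the well bore name is purely illustrative). Assuming spif:regex performs regex-based replacement, a name such as "15/9-F-12 (Volve)" would come out roughly as "15_9-F-12": the parenthesised part is dropped, the "/" is URL-encoded and then turned into "_", and the result is upper-cased.

 # Ad hoc invocation sketch; the input name is illustrative only.
 SELECT ?normalized
 WHERE {
     BIND (ep-spin-lib:normalizeString("15/9-F-12 (Volve)") AS ?normalized) .
 }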

Another function is used to get the local name of a resource, with the following steps:

 SELECT ?localName
 WHERE {
     BIND (xsd:string(?arg1) AS ?uri) .
     BIND (spif:lastIndexOf(?uri, "/") AS ?slash) .
     BIND (spif:lastIndexOf(?uri, "#") AS ?hash) .
     BIND (IF(((!bound(?hash)) || (bound(?slash) && (?slash > ?hash))), ?slash, ?hash) AS ?sep) .
     BIND (fn:substring(?uri, (?sep + 2)) AS ?localName) .
 }

EPIM EnvironmentHub Examples

A follow-up project called EnvironmentHub had similar requirements:

1) When environment data is received from a wellbore, we need to make sure that the wellbore ID used in the data is associated with a particular facility and field, and that all of these are compliant with the NPD (Norwegian Petroleum Directorate) reference facts.

2) Wellbores can be approached by movable facilities (e.g. a drilling platform). When a facility operates at a wellbore, it transmits data about that wellbore. When data containing information about a wellbore's emissions is received from a movable facility, it needs to be checked for correctly "overlapping" dates. A NonOverlappingDate violation means that a date in the file does not overlap with the start/end dates of the period when the movable facility was at the wellbore.

3) As part of the transformation, URIs are generated from literal values present in the files. As URIs are generated they must be checked for validity: uniqueness, adherence to defined patterns, etc. (a sketch of such a pattern check follows the example below).

Example of the overlapping date validation in SPIN:

 # Wellbore Moveable Facility Dates Not Overlapping
 CONSTRUCT {
   _:b0 a spin:ConstraintViolation .
   _:b0 spin:violationRoot ?this .
   _:b0 spin:violationPath ew:wellboreRef .
   _:b0 rdfs:label ?errorLabel .
 }
 WHERE {
   ?this ewmp:wellboreFacilityIdOrName ?facility .
   FILTER (eeh-lib:isFacilityMoveable(?facility) = true) .
   ?this ewmp:wellboreIdOrName ?wellbore .
   FILTER (eeh-lib:isWellboreInnretning(?wellbore) = false) .
   BIND (eeh-lib:getWellboreURI(?wellbore) AS ?wburi) .
   BIND (eeh-lib:getFacilityURI(?facility) AS ?facuri) .
   ?facilityparent ew:wellboreRef ?this .
   ?fieldparent ew:facilityRef ?facilityparent .
   ?structuretype ew:fieldRef ?fieldparent .
   ?reportdata ew:structureTypeRef ?structuretype .
   ?dataset ew:reportDataRef ?reportdata .
   ?dataset ew:year ?year .
   BIND (eeh-lib:getUsingFacilityURIInYear(?facuri, ?wburi, ?year) AS ?usingfacuri) .
   FILTER (!bound(?usingfacuri)) .
   ?this ewmp:wellboreFieldIdOrName ?field .
   ?this ewmp:wellboreStructureType ?st .
   BIND (eeh-lib:setErrorValueByType("", "ErrorType", "NPDWellboreFacilityDate") AS ?str1) .
   BIND (eeh-lib:setErrorValueByType(?str1, "ErrorValue", ?wellbore) AS ?str2) .
   BIND (eeh-lib:setErrorValueByType(?str2, "ErrorValueType", "NonOverlappingDate") AS ?str3) .
   BIND (eeh-lib:setErrorValueByType(?str3, "InField", ?field) AS ?str4) .
   BIND (eeh-lib:setErrorValueByType(?str4, "InFacility", ?facility) AS ?str5) .
   BIND (eeh-lib:setErrorValueByType(?str5, "Structure", ?st) AS ?errorLabel) .
 }
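
For the URI checks mentioned in requirement 3 above, a constraint in the same style can test generated URIs against an expected naming pattern. This is only a sketch: it reuses eeh-lib:getWellboreURI from the example above, but the URI pattern in the regex is an assumption, not the pattern actually enforced in EnvironmentHub.

 # Sketch of a pattern-adherence check on generated wellbore URIs;
 # the expected pattern in the regex is hypothetical.
 CONSTRUCT {
   _:b0 a spin:ConstraintViolation .
   _:b0 spin:violationRoot ?this .
   _:b0 spin:violationPath ewmp:wellboreIdOrName .
   _:b0 rdfs:label ?errorLabel .
 }
 WHERE {
   ?this ewmp:wellboreIdOrName ?wellbore .
   BIND (eeh-lib:getWellboreURI(?wellbore) AS ?wburi) .
   FILTER (bound(?wburi) && !regex(str(?wburi), "/wellbore/[A-Z0-9_]+$")) .
   BIND (fn:concat("Generated wellbore URI does not match the expected pattern: ", str(?wburi)) AS ?errorLabel) .
 }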

Conclusions

I would like to highlight a few points:

  • The ontologies were mostly custom-written for this project. However, they are the basis for a new ISO 15926-12 standard for life cycle data integration using OWL.
  • The SPIN files above and some ontologies are entirely private to the specific application, not part of "The Semantic Web".
  • However, other components and applications use these ontologies internally, for example to generate report documents that aggregate data from other uploaded reports.
  • OWL is used as the syntax for those ontologies, mostly to describe range and cardinality restrictions on properties. OWL is not used for inferencing. The only kind of "inferencing" is the SPIN rules that convert from one ontology to another.
  • The complexity of the constraint checks and transformation rules required something as expressive as SPARQL.
  • In order to maintain those queries, we made heavy use of SPIN functions that encapsulate reusable blocks of SPARQL code.
  • It was natural to associate SPIN constraints and rules with the ontology, i.e. we did not need a parallel structure of "Shapes" that is detached from the existing class structure.
  • Many constraints include string operations, e.g. to concatenate meaningful error messages, but also in the normalizeString function mentioned above.
  • Most classes in the core ontology (derived from the NPD Fact Pages) have a primary key (npd:id) that is used for various purposes throughout the system (e.g. whenever XML files are imported and the NPD Fact Pages are updated).