SPIDERS: A use case for semantic integration of legacy data in network management systems

Submitted by: García Gómez, Sergio ⁽¹⁾, de Castro García, Ricardo ⁽¹⁾, Sánchez Sáez, Nuria ⁽²⁾

⁽¹⁾ Telefónica I+D, Spain

⁽²⁾ IT Deusto, Spain

Introduction

The core business of Telco operators is managed with OSS (Operations Support Systems) and BSS (Business Support Systems). In an incumbent operator, many OSSs and BSSs have grown in a disparate way (in-house developments, commercial off-the-shelf products, etc.), using heterogeneous data models and technologies. These systems can be considered legacy systems.

Usually, these legacy systems need to communicate with each other to exchange information. This communication has the following characteristics:

Each system supports different functionality, and manages and shares large amounts of data with other systems.
Data is propagated by remote invocations, message passing systems, ad-hoc interfaces and manual data loads.

Currently, for this communication to succeed, several problems related to the quality of data must be confronted:

Software problems: These interfaces are costly to maintain and error-prone, which leads to dirty data and missing information.
Business problems: Poor data quality affects the operating costs (OPEX) of the company, the time-to-market of new services, and the quality of service offered to customers.

There exists a common information model for OSS/BSS interoperability, although it is hardly used: SID (Shared Information and Data), defined by the Telemanagement Forum. It is an extensible, object-oriented model, conceived to help define OSS interfaces and to foster interoperability among COTS products. Most legacy systems do not use this model.

Objectives

To mitigate all these problems, the SPIDERS project (Semantic P2p data Interchange with Distributed agEnts among netwoRk management Systems) was conceived, with the objective of creating a federated semantic information layer and the mechanisms to propagate, share and validate data among network management systems.

The specific objectives are:

To model the SID as an ontology and to extend the model to the information needs of each OSS.
To provide the mechanisms to create a federated view of all the SID-based ontologies.
To create a P2P distributed network of OSS-attached agents which are able to:
- translate relational information into semantic information through SPARQL queries.
- insert or update the relational data with information from the SPARQL query results.
To provide the mechanisms to support the needs of real systems: subscriptions to data changes, on-demand queries, massive loads or validation of data integrity.
To keep the system as unintrusive as possible for the existing legacy OSSs.

This approach will enable autonomic communications solutions for network and services management, since it will make it possible to discover information and relationships among the distributed pieces of data, compose new processes, or make decisions based on policies.

Approach

To fulfill these requirements, the SPIDERS agent was conceived: a piece of software attached to every OSS and its database(s) that retrieves and manipulates data by accessing the database, and offers an interface to the OSS. These agents form a P2P network in which every agent can query any other agent; that is, every OSS can send and receive information throughout the network management ecosystem.

To answer queries based on the SID ontology and to extract information from the relational database, the agent embeds the D2R Server software. D2R Server has demonstrated its suitability in complex scenarios with disparate ontologies and database models. Through the SPIDERS user interface, it is possible to design the SPARQL queries to send to a remote OSS and to map the result sequences to the local database schema.
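
The last step, mapping a result sequence onto the local schema, can be sketched as follows. This is only an illustration: the sid: namespace URI, the class and property names, the column names and the sample response are all assumptions, and the response is merely shaped like the W3C SPARQL JSON results format rather than produced by a real agent.

```python
# Hypothetical SPARQL query an agent might send to a remote OSS.
# The namespace URI and the class/property names are invented for illustration.
REMOTE_QUERY = """
PREFIX sid: <http://example.org/sid#>
SELECT ?name ?ip WHERE {
  ?router a sid:Router ;
          sid:name ?name ;
          sid:managementAddress ?ip .
}
"""

# Assumed mapping from SPARQL result variables to local table columns.
VARIABLE_TO_COLUMN = {"name": "HOSTNAME", "ip": "MGMT_IP"}


def bindings_to_rows(results, variable_to_column):
    """Turn SPARQL JSON result bindings into dicts keyed by local column name."""
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for variable, column in variable_to_column.items():
            # A variable may be unbound in a solution; store None in that case.
            cell = binding.get(variable)
            row[column] = cell["value"] if cell is not None else None
        rows.append(row)
    return rows


# Sample response (made up) in the shape of the SPARQL JSON results format.
sample_response = {
    "head": {"vars": ["name", "ip"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "madrid-core-1"},
         "ip": {"type": "literal", "value": "10.0.0.1"}},
        {"name": {"type": "literal", "value": "bilbao-edge-2"}},
    ]},
}

local_rows = bindings_to_rows(sample_response, VARIABLE_TO_COLUMN)
```

Each resulting dictionary is keyed by local column names, so it can be handed to whatever insert or update logic the local OSS requires.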

Requirements

From the short-term and long-term objectives described above, and from the SPIDERS approach, several key issues arise that must be addressed regarding the translation between relational and semantic information. Beyond formal requirements, it must be taken into account that real-life legacy data models are only partially described by the schema, and an important part of the semantics lies in the minds of designers and developers.

SPARQL-based access

To access the data through SPARQL queries based on an ontology, efficiently enough that the delay in the answers and the load on the RDBMS are minimized. There is no need to convert the whole database into RDF, as that would be neither feasible nor useful. The data will remain relational.
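
The idea of answering an ontology-based request directly from relational data, without materializing RDF, can be illustrated with a minimal sketch. It is not how D2R Server works internally; it handles only the simplest case (all instances of one class with some of their properties), and the table, column and ontology names are assumptions.

```python
import sqlite3

# Assumed mapping from ontology terms to relational objects (all names hypothetical).
CLASS_TO_TABLE = {"sid:Router": ("ROUTER", "ROUTER_ID")}
PROPERTY_TO_COLUMN = {"sid:name": "HOSTNAME", "sid:vendor": "VENDOR"}


def rewrite_to_sql(ontology_class, properties):
    """Build one SQL SELECT answering 'all instances of a class with these properties'."""
    table, key = CLASS_TO_TABLE[ontology_class]
    columns = [key] + [PROPERTY_TO_COLUMN[p] for p in properties]
    return f"SELECT {', '.join(columns)} FROM {table} ORDER BY {key}"


def query_as_triples(conn, ontology_class, properties):
    """Run the rewritten SQL and yield (subject, predicate, object) triples on the fly."""
    table, _ = CLASS_TO_TABLE[ontology_class]
    for row in conn.execute(rewrite_to_sql(ontology_class, properties)):
        subject = f"{table.lower()}/{row[0]}"
        for prop, value in zip(properties, row[1:]):
            if value is not None:  # NULL columns produce no triple
                yield (subject, prop, value)


# Small in-memory database standing in for a legacy OSS schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ROUTER (ROUTER_ID INTEGER PRIMARY KEY, HOSTNAME TEXT, VENDOR TEXT)")
conn.execute("INSERT INTO ROUTER VALUES (1, 'madrid-core-1', 'Acme')")
conn.execute("INSERT INTO ROUTER VALUES (2, 'bilbao-edge-2', NULL)")

triples = list(query_as_triples(conn, "sid:Router", ["sid:name", "sid:vendor"]))
```

Only one SQL statement reaches the RDBMS per request, and triples exist only while the answer is being produced, which is the property this requirement asks for.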

Local SPARQL endpoint

To offer a common SPARQL endpoint for multiple distributed data sources and to enable the resolution of queries that join results from all of them. For a complex query in which different data is available in different systems, a local endpoint can work out which OSSs to query and use the results to query other systems.

This requires knowing which system holds data for each class in the ontology, and applying rules and reasoning over the SPARQL queries, so that the data sources can be viewed as a whole.

This reasoning mechanism may also help to relate semantically distant data in the queries. The SID ontology is very large, and pieces of data that are relationally close may be semantically far apart.
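
A minimal sketch of the source-selection step, assuming a registry that records which OSS holds data for which ontology class and a handful of subclass axioms as the only form of reasoning. The system names, class names and axioms are hypothetical.

```python
# Assumed registry: which OSS holds data for which ontology class.
CLASS_REGISTRY = {
    "sid:Router": {"inventory-oss"},
    "sid:Alarm": {"fault-oss"},
    "sid:Customer": {"crm-bss", "billing-bss"},
}

# Assumed subclass axioms (child -> parent), used for a very small amount of reasoning.
SUBCLASS_OF = {"sid:Router": "sid:Resource", "sid:Alarm": "sid:Event"}


def subclasses_of(ontology_class):
    """Return the class itself plus all classes declared (transitively) beneath it."""
    found = {ontology_class}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def systems_for(query_classes):
    """Map each class used in a query to the set of OSSs that must be contacted."""
    plan = {}
    for cls in query_classes:
        targets = set()
        for candidate in subclasses_of(cls):
            targets |= CLASS_REGISTRY.get(candidate, set())
        plan[cls] = targets
    return plan


plan = systems_for(["sid:Resource", "sid:Alarm"])
```

Here a query over sid:Resource is routed to the inventory system even though no system registers that class directly, because the subclass axiom relates it to sid:Router.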

Bidirectional translation

It is required not only to translate relational data into semantic information, but also to insert, update or delete data in the relational schemas from RDF or SPARQL results, based on an ontology-to-relational mapping. Using the same mapping for both directions would be more practical.

This requirement raises many other thorny issues, such as the generation or derivation of primary and foreign keys, data type compatibility, fulfilment of constraints, data formats, etc.
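
As an illustration of a single mapping serving both directions, the sketch below reads a row as ontology properties and writes ontology properties back as an insert or update. It is a simplification under stated assumptions: the table, key and property names are hypothetical, and primary key generation is simply delegated to the database, which is only one possible answer to the key issue.

```python
import sqlite3

# One mapping used in both directions (all names are hypothetical).
TABLE = "ROUTER"
KEY = "ROUTER_ID"
PROPERTIES = {"sid:name": "HOSTNAME", "sid:vendor": "VENDOR"}


def read_as_properties(conn, key_value):
    """Relational -> semantic: return {property: value} for one row, or None."""
    props = list(PROPERTIES)
    cols = ", ".join(PROPERTIES[p] for p in props)
    row = conn.execute(
        f"SELECT {cols} FROM {TABLE} WHERE {KEY} = ?", (key_value,)
    ).fetchone()
    return dict(zip(props, row)) if row else None


def write_from_properties(conn, values, key_value=None):
    """Semantic -> relational: update the row if a key is given, otherwise insert.

    On insert the primary key is generated by the database.
    Returns the key of the affected row.
    """
    cols = [PROPERTIES[p] for p in values]
    params = list(values.values())
    if key_value is None:
        marks = ", ".join("?" for _ in cols)
        cur = conn.execute(
            f"INSERT INTO {TABLE} ({', '.join(cols)}) VALUES ({marks})", params
        )
        return cur.lastrowid
    assignments = ", ".join(f"{c} = ?" for c in cols)
    conn.execute(
        f"UPDATE {TABLE} SET {assignments} WHERE {KEY} = ?", params + [key_value]
    )
    return key_value


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ROUTER (ROUTER_ID INTEGER PRIMARY KEY, HOSTNAME TEXT NOT NULL, VENDOR TEXT)"
)
new_id = write_from_properties(conn, {"sid:name": "madrid-core-1", "sid:vendor": "Acme"})
write_from_properties(conn, {"sid:vendor": "Initech"}, key_value=new_id)
round_trip = read_as_properties(conn, new_id)
```

Constraint fulfilment shows up immediately in such a scheme: writing a router without sid:name violates the NOT NULL constraint on HOSTNAME and is rejected by the database, so the agent has to decide how to report or repair such cases.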

Support for morphologic transformations

If data fusion is required, related data must follow coherent formats (e.g., capitalization). But legacy systems use their own data formats. Wherever this problem is likely to occur, some morphologic specification must be provided for the data properties, and appropriate filters must be defined and applied before or after the data exchange.

This problem also arises when structured data is stored in a single field (a VARCHAR, for instance) while the ontology describes the individual data properties, or, conversely, when one data property must be built from the combination of several individual fields in the database.
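
The three cases just described (format normalization, splitting one field, combining several fields) can be sketched as small filter functions. The 'SITE/RACK/SLOT' layout, the property names and the address format are assumptions chosen for illustration.

```python
def normalize_name(value):
    """Bring a free-text name to one agreed format: trimmed, single spaces, upper case."""
    return " ".join(value.split()).upper()


def split_location(field):
    """Split one composite column into individual data properties.

    Assumes a hypothetical 'SITE/RACK/SLOT' layout stored in a single VARCHAR.
    """
    site, rack, slot = field.split("/")
    return {"sid:site": site, "sid:rack": rack, "sid:slot": int(slot)}


def join_address(street, number, city):
    """Build one data property from several individual columns."""
    return f"{street} {number}, {city}"


# Filters applied to outgoing values, keyed by the data property they belong to.
OUTBOUND_FILTERS = {"sid:name": normalize_name}


def apply_filters(properties, filters):
    """Apply the registered filter to each property value; others pass unchanged."""
    return {p: filters.get(p, lambda v: v)(v) for p, v in properties.items()}


cleaned = apply_filters({"sid:name": "  madrid   core-1 ", "sid:vendor": "Acme"}, OUTBOUND_FILTERS)
```

Keeping the filters in a table keyed by data property means the morphologic specification lives alongside the ontology mapping rather than in each interface.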

Propagation of data changes

Data is dynamic, and sometimes an OSS needs to be aware of data changes in other systems. To tackle this issue, SPIDERS defines the concept of data subscriptions, which are actions associated with triggers in the database. However, a subscription must be defined in terms of the ontology, whereas the trigger applies to relational objects. Therefore, a semantic-to-relational mapping of triggers is also required.
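
A minimal sketch of such a mapping, assuming SQLite as the database and a CHANGE_LOG table that the agent would poll in order to notify subscribers. The table, column and property names are hypothetical, and a production RDBMS would need its own trigger dialect.

```python
import sqlite3

# Assumed ontology-to-relational mapping: property -> (table, column, key column).
PROPERTY_MAP = {"sid:operationalState": ("ROUTER", "OPER_STATE", "ROUTER_ID")}


def trigger_sql(subscription_id, ontology_property):
    """Translate 'notify me when this property changes' into a relational trigger."""
    table, column, key = PROPERTY_MAP[ontology_property]
    return (
        f"CREATE TRIGGER sub_{int(subscription_id)} "
        f"AFTER UPDATE OF {column} ON {table} "
        f"WHEN OLD.{column} IS NOT NEW.{column} "
        f"BEGIN "
        f"INSERT INTO CHANGE_LOG (subscription, subject_key, property, new_value) "
        f"VALUES ({int(subscription_id)}, NEW.{key}, '{ontology_property}', NEW.{column}); "
        f"END"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ROUTER (ROUTER_ID INTEGER PRIMARY KEY, OPER_STATE TEXT)")
conn.execute(
    "CREATE TABLE CHANGE_LOG (subscription INTEGER, subject_key INTEGER, "
    "property TEXT, new_value TEXT)"
)
conn.execute("INSERT INTO ROUTER VALUES (7, 'up')")

# The subscription is expressed with an ontology property; the trigger is relational.
conn.execute(trigger_sql(1, "sid:operationalState"))

conn.execute("UPDATE ROUTER SET OPER_STATE = 'down' WHERE ROUTER_ID = 7")
conn.execute("UPDATE ROUTER SET OPER_STATE = 'down' WHERE ROUTER_ID = 7")  # no actual change

pending = conn.execute("SELECT * FROM CHANGE_LOG").fetchall()
```

The change log records the ontology property rather than the column name, so the notification sent to the subscriber is already expressed in semantic terms.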

Security

Not all data should be available to everybody, and some form of control has to be applied so that only authorized systems can access the data, under certain conditions. For instance, a policy could state that system A may only access data from certain classes in the ontology of system B between 10 pm and 8 am.
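
The example policy can be sketched as a time-window check per requesting system, owning system and ontology class. The policy table, the system names and the class name are assumptions; the window from 22:00 to 08:00 crosses midnight, which the check has to handle explicitly.

```python
from datetime import time

# Assumed policy table: (requesting system, owning system, class) -> allowed window.
POLICIES = {
    ("system-a", "system-b", "sid:Customer"): (time(22, 0), time(8, 0)),
}


def in_window(now, start, end):
    """True if 'now' falls in [start, end), also when the window crosses midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def may_access(requester, owner, ontology_class, now):
    """Deny by default; allow only inside the window of a matching policy."""
    window = POLICIES.get((requester, owner, ontology_class))
    return window is not None and in_window(now, *window)


late_evening = may_access("system-a", "system-b", "sid:Customer", time(23, 30))
midday = may_access("system-a", "system-b", "sid:Customer", time(12, 0))
```

Denying by default means that a class with no policy entry is never exposed, which seems the safer choice when legacy data is shared for the first time.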

References

Barrasa Rodríguez, J. (2007). Modelo para la Definición Automática de correspondencias Semánticas entre Ontologías y Modelos Relacionales. PhD Thesis. Polytechnic University of Madrid.

Bizer, C. et al. (2007). D2RQ V0.5 - Treating Non-RDF Relational Databases as Virtual RDF Graphs, http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rq/spec/

Dobson, S. and Denazis, S. (2006). A Survey of Autonomic Communications. In ACM Transactions on Autonomous and Adaptive Systems, December 2006.

Prud'hommeaux, E. and Seaborne, A. (2007). SPARQL Query Language for RDF, 14 June 2007, http://www.w3.org/TR/rdf-sparql-query/

Telemanagement Forum (2007). NGOSS Release 7.0 SID Solution Suite (GB922 & GB926), http://www.tmforum.org/page32554.aspx