SCADA, Architectural Styles, and the Web

A position paper for the W3C Workshop on Web of Services for Enterprise Computing, by Benjamin Carlyle of Westinghouse Rail Systems Australia.

Introduction

The Web and traditional SCADA technology are built on similar principles and have great affinity. However, the Web does not solve all of the problems that the SCADA world faces. This position paper consists of two main sections: the first describes the SCADA world view as context for readers who are not familiar with the industry; the second consists of a series of "Tier 1" and "Tier 2" positions that contrast with the current Web. Tier 1 positions are those that have a direct and immediate impact on our business. Tier 2 positions are more general in nature and may only impact business in the longer term.

The SCADA World View

Supervisory Control and Data Acquisition (SCADA) is the name for a broad family of technologies across a wide range of industries. It has traditionally been contrasted with Distributed Control Systems (DCS): DCS components operate autonomously, while SCADA systems typically operate under direct human control from a central location.

The SCADA world has evolved so that most systems are now hybrids with traditional DCS, and the meaning of the term has expanded further. When we talk about SCADA in the modern era, we might be talking about any system that acquires and concentrates data on a soft real-time basis for centralised analysis and operation.

SCADA systems or their underlying technologies now underpin most operational functions in the railway industry. SCADA has come to mean "Integration" as traditional vertical functions like train control, passenger information, traction power, and environmental control exchange ever more information. Our customers' demands for more flexible, powerful, and cost-effective control over their infrastructure continue to grow.

Perhaps half of our current software development can be attributed to protocol development to achieve our integration aims. This figure is unacceptable, unworkable, and unnecessary. We tend to see a wide gap between established SCADA protocols and one-off protocols developed completely from scratch. SCADA protocols tend to already follow many of the REST constraints. They have limited sets of methods, identifiers that point to specific pieces of information to be manipulated, and a small set of content types. The one-off protocols tend to need more care before they can be integrated, and often there is no architectural model to be found in the protocol at all.

We used to think of software development to support a protocol as the development of a "driver", or a "Front End Processor (FEP)". However, we have begun to see this consistently as a "protocol converter". SCADA systems are typically distributed, and the function of protocol support is usually to map an externally-defined protocol onto our internal protocols. Mapping from ad hoc protocols to an internally-consistent architectural style turns out to be a major part of this work. We have started to work on "taming" HTTP for use on interfaces where we have sufficient control over protocol design, and we hope to be able to achieve Web-based and REST-based integration more often than not in the future. Our internal protocols already closely resemble HTTP.
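
As a minimal illustration of why this mapping is natural, the following sketch shows how a typical acquisition read and a supervisory command might look once expressed over HTTP. The host name, resource paths, and payloads are invented for this example and do not describe any real interface.

    import http.client

    # Illustrative sketch only: the host, URIs, and payloads are invented to
    # show how SCADA-style operations map naturally onto REST constraints.
    conn = http.client.HTTPConnection("field-unit-12.example")

    # Data acquisition: read the current state of a digital input.
    conn.request("GET", "/inputs/track-circuit-4")
    state = conn.getresponse().read()          # e.g. b"occupied"

    # Supervisory control: command an output by replacing its state.
    conn.request("PUT", "/outputs/signal-17/aspect", body=b"green",
                 headers={"Content-Type": "text/plain"})
    result = conn.getresponse()
    print(result.status)                        # e.g. 200 on success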

The application of REST-based integration has many of the same motivations and goals as the development of the Semantic Web. The goal is primarily to integrate information from various sources. However, it is not integration with a view to query but with a view to performing system functions. For this reason it is important to constrain the vocabularies in use down to a set that in some way relate to system functions.

I would like to close this section with the observation that there seems to be a spectrum between the needs of the Web at large, and the needs of the enterprise. Probably all of my Tier 1 issues could be easily resolved within a single corporate boundary, and continue to interoperate with other parts of the Web. The solutions may also be applicable to other enterprises. In fact, as we contract to various enterprises I can say this with some certainty. However, it seems difficult to get momentum behind proposals that are not immediately applicable to the real Web. I will mention pub/sub in particular, which is quickly dismissed as being unable to cross firewalls easily. However, this is not a problem for the many enterprises that could benefit from a standard mechanism. Once acceptance of a particular technology is established within the firewall, it would seem that crossing the firewall would be a more straightforward proposition. Knowing that the protocol is proven may encourage vendors and firewall operators to make appropriate provisions when use cases for the technology appear on the Web at large.

Tier 1: A HTTP profile for High Availability Cluster clients is required

My first Tier 1 issue is the use of HTTP to communicate with High Availability (HA) clusters. In the SCADA world, we typically operate with no single point of failure anywhere in a critical system. We typically have redundant operator workstations, each with redundant Network Interface Cards (NICs), and so on and so forth, all the way to a HA cluster. There are two basic ways to design the network between them: either create two separate networks for traffic, or interconnect them. One approach yields multiple IP addresses to connect to across the NICs of a particular server; the other yields a single IP. Likewise, it is possible to perform IP takeover so that a single IP is shared between multiple server hosts, or to expose multiple IPs.

In addition to the HA requirement itself, we typically have a constraint on failover time. Any single point of failure must usually be detected in less than five seconds, with a small amount of additional time allocated for the actual recovery. Demands vary: while some customers will be happy with a ten or thirty second total failover time, others will demand a "bumpless" transition. The important thing about this constraint is that it is not simply a matter of a new server being able to accept new requests. Clients of the HA cluster also need to make their transition within the specified bounded time.

HTTP allows for a timeout if a request takes too long, typically around forty seconds. If this value were tuned to the detection time, we could see that our server had failed and attempt to reconnect. However, this would also shrink the window within which valid responses must be returned. It would be preferable to send periodic keepalives down the same TCP/IP connection on which the HTTP request was established. Such keepalives would allow server death detection to be handled independently of any fault that causes the HTTP server to respond slowly or not at all. We are experimenting with configuring TCP/IP keepalives on HTTP connections to achieve HA client behaviour.
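
The following is a minimal sketch of the kind of experiment described above, assuming a Linux-style socket API. The host name, resource path, and timing values are illustrative only; real values would be tuned to the customer's failover budget.

    import socket
    import http.client

    # Illustrative sketch: enable TCP keepalives on the socket underlying an
    # HTTP connection so that server death is detected within a few seconds,
    # independently of how long the HTTP response itself takes.
    conn = http.client.HTTPConnection("scada-cluster.example", timeout=40)
    conn.connect()

    sock = conn.sock   # the underlying socket of the established connection
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific options; other platforms expose different knobs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 1)   # idle time before probing
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)  # interval between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 4)    # failed probes before declaring death

    conn.request("GET", "/points/pump-3/state")
    response = conn.getresponse()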

The first question in such a system is when keepalives should be sent and when they should be disabled. For HTTP the answer is simple: a HA client should send keepalives while a request is outstanding on a connection, and disable them when no requests are outstanding. More generally, keepalives need to be sent whenever a client expects responses on the TCP/IP connection it established. This general case affects the pub/sub model that I will describe in the next section. If pub/sub updates can be delivered down a HA client's TCP/IP connection, the client must send keepalives for the duration of its subscriptions. If the server connects back to the client to deliver notifications, it is the server that must send keepalives. Such a server would only need to do so while notification requests are outstanding, but would need to persist the subscription in a way that leaves the client confident that the subscription will not be lost.

Connection establishment is also an issue in a high availability environment. A HA client must not try to connect to one IP and only move on to the others after a timeout. It should normally connect to all addresses in parallel, then drop all but the first successful connection. This process should also take place when a failover event occurs.
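
A minimal sketch of this parallel connection strategy follows. The cluster addresses are invented for the example, and a production client would also bound the overall attempt by the failover detection time.

    import socket
    from concurrent.futures import ThreadPoolExecutor, as_completed

    # Illustrative sketch: attempt to connect to every advertised cluster
    # address in parallel and keep the first connection that succeeds,
    # rather than trying addresses one at a time and waiting for timeouts.
    ADDRESSES = [("10.0.1.10", 80), ("10.0.2.10", 80)]  # example addresses

    def try_connect(addr):
        return socket.create_connection(addr, timeout=5)

    def connect_to_cluster(addresses):
        with ThreadPoolExecutor(max_workers=len(addresses)) as pool:
            futures = {pool.submit(try_connect, a): a for a in addresses}
            winner = None
            for future in as_completed(futures):
                try:
                    sock = future.result()
                except OSError:
                    continue          # this address is unreachable; try the others
                if winner is None:
                    winner = sock     # first successful connection wins
                else:
                    sock.close()      # drop any later connections
            return winner             # None if every address failed

    sock = connect_to_cluster(ADDRESSES)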

Tier 1: A Publish/Subscribe mechanism for HTTP resources is required

One of the constants in the ever-changing SCADA world is that we perform soft real-time monitoring of real-world state. That means data can change unexpectedly and must be propagated immediately when the change is detected. A field unit will typically test an input every few milliseconds, and on change will want to notify the central system. Loose coupling will often demand that a pub/sub model be used rather than a push to a set of URLs configured in the device.

I have begun drafting a specification that I think will solve most pub/sub problems, with a preliminary name of SENA. It is loosely based on the GENA protocol, but has undergone significant revision to attempt to meet the security constraints of the open Web while also meeting the constraints of a SCADA environment. I would like to continue working on this protocol or a similar protocol, helping it reach a status where it is possible to propose it for general use within enterprise boundaries.

We are extremely sensitive to overload problems in the SCADA world, which leads us to view summarisation as one of the core features of a subscription protocol. We normally view pub/sub as a way to synchronise state between two services, and we view the most recent state as the most valuable. If we have to process a number of older messages before we get to the newest value, latency and operator response time both increase. We are also highly concerned with situations, permanent or temporary, in which state changes occur at a rate beyond what the system can adequately handle. We dismiss with prejudice any proposal that involves infinite or arbitrary buffering at any point in the system. We also expect a subscription model to make effective use of intermediaries, such as web proxies that may participate in the subscription.
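
The following sketch illustrates the summarisation behaviour we expect, rather than any particular protocol: at most one pending notification is held per resource, so buffering is bounded by the number of subscribed resources rather than by their rate of change. The class and resource names are invented for the example.

    import threading
    from collections import OrderedDict

    class CoalescingQueue:
        # Holds at most one pending notification per resource. A new update
        # for a resource that is already pending simply replaces the stale
        # value, so memory use is bounded by the number of subscribed
        # resources, never by the rate at which their state changes.

        def __init__(self):
            self._pending = OrderedDict()        # resource URI -> latest state
            self._ready = threading.Condition()

        def publish(self, uri, state):
            with self._ready:
                self._pending.pop(uri, None)     # discard any superseded value
                self._pending[uri] = state
                self._ready.notify()

        def next_update(self):
            with self._ready:
                while not self._pending:
                    self._ready.wait()
                return self._pending.popitem(last=False)  # oldest resource first

    # Example: a burst of changes to one point delivers only the latest value.
    q = CoalescingQueue()
    q.publish("/points/feeder-7/current", "410A")
    q.publish("/points/feeder-7/current", "415A")
    print(q.next_update())   # ('/points/feeder-7/current', '415A')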

Tier 2: One architectural framework with a spectrum of compatible architectures

I believe that the architectural styles of the Web can be applied to the enterprise. However, local conventions need to be permitted. Special methods, content types, and other mechanisms should all be permitted where required. I anticipate that the boundary between special and general will shift over time, and that the enterprise will act as a proving ground for new features of the wider Web. Once such features are established in the wider Web, I would also expect the tide to flow back into enterprises that are doing the same thing in proprietary ways.

If properly nurtured, I see the enterprise as a nursery for ideas that the Web is less and less able to experiment with itself. I suspect that the bodies that govern the Web should also be involved with ideas that are emerging in the enterprise. These bodies can help those involved with smaller-scale design keep an eye on the bigger picture.

Tier 2: Web Services are too low-level

Web Services are not a good solution space for Web architecture because they attack integration problems at too low a level. It is unlikely that two services independently developed against the WS-* stack will interoperate. That is to say, they will only interoperate if their WSDL files match. HTTP is ironically a higher-level protocol than the protocol that is layered on top of it.

That said, we do not rule out interoperating with such systems if the right WSDL and architectural styles are placed on top of the WS-* stack. We anticipate a "HTTP" WSDL eventually being developed for WS-*, and expect to write a protocol converter back to our internal protocols for systems that implement this WSDL. The sheer weight of expectation behind Web Services suggests that it will be simpler for some organisations to head down this path than down a path based on HTTP directly.

Tier 2: RDF is part of the problem, not the solution

We view RDF as a non-starter in the machine-to-machine communications space, though we see some promise in ad hoc data integration within limited enterprise environments. Large scale integration based on HTTP relies on clear, well-defined, evolvable document types. While RDF allows XML-like document types to be created, it presents something of an either/or dilemma: either use arbitrary vocabulary as part of your document, or limit your vocabulary to that of a defined document type.

In the former case you can embed rich information into the document, but unless the machine on the other side expects this information as part of the standard information exchange, it will not be understood. It also increases document complexity by blowing out the number of namespaces in use. In practice it makes more sense to define a single cohesive document type with a single vocabulary that includes all of the information you want to express. However, in this case you are worse off than if you were to start with XML.

You cannot relate a single cohesive RDF vocabulary to any other without complex model-to-model transforms. In short, it is easier to extract information from a single-vocabulary XML document than from a single-vocabulary RDF document. RDF does not appear to solve any part of the system integration problem as we see it. However, again, it may assist in the storage and management of ad hoc data in some enterprises in place of traditional RDBMS technology.

We view the future of the semantic web as the development of specific XML vocabularies that can be aggregated and subclassed. For example, the atom document type can embed the html document type in an aggregation relationship. This is used for elements such as <title>. The must-ignore semantics of atom also allow subclassing by adding new elements to atom. The subclassing mechanism can be used to produce new versions of the atom specification that interoperate with old implementations. The mechanism can also be used to produce jargonised forms of atom rather than inventing a whole new vocabulary for a particular problem domain.
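
The following sketch illustrates must-ignore processing of a jargonised atom entry. The rail extension namespace and its elements are invented for this example; a standard atom processor simply extracts the vocabulary it understands and skips the rest.

    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    # A jargonised atom entry: the rail:* elements belong to an invented
    # extension namespace used here purely for illustration.
    entry_xml = """
    <entry xmlns="http://www.w3.org/2005/Atom"
           xmlns:rail="http://example.com/ns/rail">
      <title>Platform 3 departure updated</title>
      <updated>2007-02-01T10:00:00Z</updated>
      <rail:service>8042</rail:service>
      <rail:platform>3</rail:platform>
    </entry>
    """

    entry = ET.fromstring(entry_xml)

    # A standard atom processor extracts the vocabulary it understands and,
    # under must-ignore semantics, silently skips the rail:* jargon.
    title = entry.findtext(ATOM + "title")
    updated = entry.findtext(ATOM + "updated")
    print(title, updated)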

We see the development, aggregation, and jargonisation of XML document types as the key mechanisms in the development of the semantic web. The graph-based model used by RDF has not yet demonstrated value in the machine-to-machine data integration space; higher-level abstractions expressed in XML vocabularies, however, are a proven technology set. We anticipate the formation of communities around particular base document types that work on resolving their jargon conflicts and folding their jargon back into the base document types. We suspect this social mechanism for vocabulary development and evolution will continue to be cancelled out in the RDF space by RDF's reliance on URI namespaces for vocabulary and by its overemphasis on the graph model.

Tier 2: MIME types are more effective than URI Namespaces

On the subject of XML, we have some concerns over the current direction of namespaces. The selection of a parser for a document is typically based on its MIME type. Some XML documents will contain sub-documents; however, there is no standard way to specify the MIME type of a sub-document. We view MIME as more fully-featured than arbitrary URIs, particularly due to its explicit subclassing mechanism.

In MIME we can explicitly indicate that a particular document type is based on XML: application/some-type+xml. Importantly, we can continue this explicit sub-typing: application/type2+some-type+xml. We consider this an important mechanism in the evolution of content types, especially when jargonised documents are passed to standard processors. It is reasonable to expect that a standard processor would ignore any jargon and extract the information available to it as part of the standard vocabulary.
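
The following sketch illustrates how a receiver might exploit the explicit sub-typing chain, falling back from the most specific type to the most generic processor it knows. The handler table and type names are illustrative only.

    # Illustrative sketch: walk the '+'-separated sub-typing chain of a MIME
    # type from most specific to most generic until a known processor is found.
    HANDLERS = {
        "application/some-type+xml": lambda doc: "processed as some-type",
        "application/xml":           lambda doc: "processed as plain XML",
    }

    def dispatch(mime_type, document):
        major, _, subtype = mime_type.partition("/")
        parts = subtype.split("+")
        # Try "type2+some-type+xml", then "some-type+xml", then "xml".
        for i in range(len(parts)):
            candidate = major + "/" + "+".join(parts[i:])
            handler = HANDLERS.get(candidate)
            if handler is not None:
                return handler(document)
        raise ValueError("no processor for " + mime_type)

    print(dispatch("application/type2+some-type+xml", "<doc/>"))
    # -> "processed as some-type": the jargonised type falls back to its base.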

While MIME also has its weaknesses, the explicit subclassing mechanism is not available in URI namespaces at all. To use the atom example again, atom has an application/atom+xml MIME type but an XML namespace of <http://www.w3.org/2005/Atom>. We view the former as more useful than the latter in the development of the Semantic Web and in general machine-to-machine integration problems.

Tier 2: Digital Signatures are likely to be useful

We regard the protection of secret data by IP-level or socket-level security measures as being sufficient at this time. Secret data is known and communicated by few components of the architecture, so it is usually not a scalability issue. We do not think that secret data should have a significant impact on Web architecture; however, we do view the ability to digitally sign non-secret data as a likely enabler for future protocol features.

Conclusion

Web technology and architectural style are proven, useful tools for systems integration, but they are incomplete. A scalable, summarising Publish/Subscribe mechanism is an essential addition to the suite of tools, as is a client profile for operating in High Availability environments. These tools must be defined and standardised in order to gain participation wide enough to be useful to the enterprise.

We have concerns about some current trends in Web Architecture. These relate to namespaces in XML, Web Services, and RDF. All of these trends appear to work against the goal of building integrated architectures from multi-vendor components. Our desired outcomes would also appear to be those of the Semantic Web, so we have some hope that these trends will begin to reverse in the future.