DeviceDescriptionEcosystemArchitectureAndDeployment

From Device Description Working Group Wiki

This is the Architecture and Deployment section of the Device Description Ecosystem document. See the DeviceDescriptionEcosystem overview.

Architecture and Deployment

The provision of a Device Description Repository is not without costs. This section notes some architectural models for the repository, the manner in which these architectures are deployed, and the costs involved. This is neither a recommendation for any particular architecture, nor an exhaustive list of the options.

Centralized Repository

This architecture realizes the repository as a single instance deployed in a single place with all access via the same networked connections. Management of the solution is simplified because all operations are applied in one place.

The cost of this single instance is high because of the number of users that are served from one place, requiring superior processing capability, high bandwidth and permanent "up time". This single instance is also a single point of failure and there will be significant costs if the system fails.

Distributed Mirrors of a Master Repository

This architecture is a simple extension of the centralized repository, where updates to the master are replicated to all of the mirrors. As most of the operations will be reads, the updates can be directed to the master, while the reads are handled by the mirrors. This spreads the networking and processor load, and reduces the down-time risk. In the event of a failure of the master, one of the mirrors can be designated as the new master.
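The read/write split and the fail-over described above can be sketched as follows. This is only an illustration under stated assumptions: the class names, the in-memory dictionaries, the synchronous replication, the round-robin choice of mirror, and the sample property screen_width are all hypothetical and are not taken from any existing repository.

```python
import itertools


class Repository:
    """One repository instance holding device descriptions in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # device id -> {property name: value}


class MirroredRepository:
    """Routes updates to the master and spreads reads across the mirrors."""

    def __init__(self, master, mirrors):
        self.master = master
        self.mirrors = list(mirrors)
        self._reset_read_cycle()

    def _reset_read_cycle(self):
        # Round-robin over the mirrors; fall back to the master if none remain.
        self._next_reader = itertools.cycle(list(self.mirrors) or [self.master])

    def update(self, device_id, properties):
        # Updates are applied to the master, then replicated to every mirror.
        # (Assumption: replication is synchronous and always succeeds.)
        self.master.data.setdefault(device_id, {}).update(properties)
        for mirror in self.mirrors:
            mirror.data[device_id] = dict(self.master.data[device_id])

    def read(self, device_id):
        # Reads are served by a mirror, keeping load off the master.
        reader = next(self._next_reader)
        return reader.name, reader.data.get(device_id)

    def fail_over(self):
        # On master failure, designate the first mirror as the new master.
        self.master = self.mirrors.pop(0)
        self._reset_read_cycle()
        return self.master.name


repo = MirroredRepository(
    Repository("master"), [Repository("mirror-a"), Repository("mirror-b")]
)
repo.update("phone-x", {"screen_width": 240})
print(repo.read("phone-x"))  # ('mirror-a', {'screen_width': 240})
print(repo.fail_over())      # mirror-a
```

A real deployment would also have to handle replication lag and the choice of which mirror to promote; the sketch simply promotes the first one.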

The cost of this architecture is greater than a centralized solution (because of the management and replication overheads), and roughly proportional to the number of mirrors. There is less potential cost due to system failure. The mirrors may share the cost burden.

Multiple Independent Repositories

Unlike the central/mirrored architectures, this architecture permits device descriptions to be collected, managed and queried independently. There is no requirement for the same data to be available from all instances, nor is there a requirement for the values stored to be consistent. The instances may use the same APIs, and even the same vocabularies, though it is more likely that each instance will specialize in some aspect of device descriptions and therefore they will employ their own vocabularies.

This architecture is at least as costly as the centralized architecture. The instance with the most useful (and/or least expensive) data will bear the heaviest load. Failure of any individual repository will be serious for any users of that repository because it is unlikely that an alternative repository will have the same data.

This is the situation at the time of writing, except that the existing instances all use their own vocabularies and their own interfaces. In some cases the interface is merely an HTTP request for an XML file; in other cases only the repository data is accessible (as a single file), so the interface must be provided as custom code.
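To illustrate what independent vocabularies mean for a consumer of several repositories, the following sketch maps one common property name onto each repository's own name for it and collects the answers. The repository names, property names and values are invented for the example, and the mapping table is an assumption about how a client might cope; it is not a feature of any existing repository.

```python
def query_all(repositories, vocabularies, device_id, common_property):
    """Ask every independent repository for one property of one device.

    repositories: repository name -> {device id -> {local property name: value}}
    vocabularies: repository name -> {common property name: local property name}
    Returns repository name -> value, omitting repositories that cannot answer.
    """
    results = {}
    for name, descriptions in repositories.items():
        # Each repository may use a different name for the same concept.
        local_name = vocabularies.get(name, {}).get(common_property)
        if local_name is None:
            continue  # this repository's vocabulary has no such property
        description = descriptions.get(device_id)
        if description is not None and local_name in description:
            results[name] = description[local_name]
    return results


# Hypothetical data: two repositories describe the same device differently.
repositories = {
    "repo_a": {"phone-x": {"displayWidth": 240}},
    "repo_b": {"phone-x": {"scr_w": 320}},
}
vocabularies = {
    "repo_a": {"screen_width": "displayWidth"},
    "repo_b": {"screen_width": "scr_w"},
}
print(query_all(repositories, vocabularies, "phone-x", "screen_width"))
# {'repo_a': 240, 'repo_b': 320}
```

The two answers differ, which is permitted in this architecture because nothing requires the stored values to be consistent; deciding which answer to trust is left to the client.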

Distributed Disjoint Delegated Repositories

This architecture is similar to the Multiple Independent Repositories architecture in that each instance has a separate collection of data. However, the instances are managed so that the aggregate data is equivalent to that managed by a Centralized Repository. Furthermore, each instance is capable of redirecting, or proxying, to an alternative instance if it is queried for data that is not in its vocabulary.

The costs of this architecture are similar to those of the Multiple Independent Repositories architecture, though probably slightly higher because of the need to manage the delegation.
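The redirect-or-proxy behavior of a delegated instance can be sketched as below. The class, the peer list, the shape of the returned dictionaries, and the property names are hypothetical; the sketch assumes that an instance's vocabulary is simply the set of property names it holds data for.

```python
class DelegatingRepository:
    """An instance that answers from its own vocabulary or delegates to a peer."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # property name -> {device id -> value}
        self.peers = []   # other instances this one may delegate to

    def owns(self, prop):
        # A property is in this instance's vocabulary if it holds data for it.
        return prop in self.data

    def query(self, device_id, prop, proxy=True):
        if self.owns(prop):
            return {"source": self.name, "value": self.data[prop].get(device_id)}
        for peer in self.peers:
            if peer.owns(prop):
                if proxy:
                    # Proxy: fetch the answer on behalf of the caller.
                    return peer.query(device_id, prop)
                # Redirect: tell the caller which instance to ask instead.
                return {"redirect": peer.name}
        raise LookupError(f"no known instance holds property {prop!r}")


display = DelegatingRepository("display-repo", {"screen_width": {"phone-x": 240}})
browser = DelegatingRepository("browser-repo", {"supports_css": {"phone-x": True}})
display.peers.append(browser)
browser.peers.append(display)

print(display.query("phone-x", "supports_css"))
# {'source': 'browser-repo', 'value': True}
print(display.query("phone-x", "supports_css", proxy=False))
# {'redirect': 'browser-repo'}
```

Proxying hides the delegation from the caller at the price of an extra hop through the first instance, whereas a redirect leaves the second request to the caller.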

Distributed Delegated Repositories with Redundancy

This architecture is similar to the Distributed Disjoint Delegated Repositories architecture except that each instance is permitted to mirror some data from other instances. This reduces the need for redirects or proxying, improves latency, improves resilience and reduces risk due to the failure of an individual instance.

The cost of supporting this architecture is shared by all instances. Mirroring (caching) enables the load to be balanced across instances, reducing the need for extreme processing or networking capabilities, and providing redundancies in the data collections which reduce the likelihood of costs due to instance failures. It is even possible to distribute the management and maintenance costs associated with data collection, mirroring and delegation of requests.
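A minimal sketch of how mirroring reduces delegation and improves resilience follows. It assumes that an instance keeps a copy of every value it has fetched from a peer and never invalidates it; the class, the availability flag, the delegation counter and the property names are hypothetical.

```python
class RedundantRepository:
    """An instance that mirrors values obtained from its peers."""

    def __init__(self, name, own_data):
        self.name = name
        self.own = own_data   # property name -> {device id -> value}
        self.mirrored = {}    # (property name, device id) -> value copied from a peer
        self.peers = []
        self.available = True
        self.delegations = 0  # number of requests passed on to a peer

    def query(self, device_id, prop):
        if prop in self.own:
            return self.own[prop].get(device_id)
        key = (prop, device_id)
        if key in self.mirrored:
            # Served from the local mirror: no redirect or proxying needed.
            return self.mirrored[key]
        for peer in self.peers:
            if peer.available and prop in peer.own:
                self.delegations += 1
                value = peer.own[prop].get(device_id)
                self.mirrored[key] = value  # keep a copy for later requests
                return value
        raise LookupError(f"no available instance holds property {prop!r}")


display = RedundantRepository("display-repo", {"screen_width": {"phone-x": 240}})
browser = RedundantRepository("browser-repo", {"supports_css": {"phone-x": True}})
display.peers.append(browser)

print(display.query("phone-x", "supports_css"))  # True (fetched from the peer)
browser.available = False                        # the owning instance fails
print(display.query("phone-x", "supports_css"))  # True (served from the mirror)
print(display.delegations)                       # 1
```

A production system would need a policy for refreshing or expiring mirrored values so that copies do not drift from the owning instance.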

Hybrids and more

Any of the above architectures can be combined to some degree. For example, several independent repositories can co-exist with a distributed delegated collection. Lessons from the field of federated databases suggest that many alternative models are possible and are likely to be explored by implementers of Device Description Repositories. These are not explored in this document.