Automated Performance Optimization via Machine Learning

As the nature of web content and service requirements changes, ensuring that established and expanding infrastructure can meet capacity and performance objectives has become a major challenge. Typically, human-derived heuristics have been used to regulate the processes essential to rich content transformation and delivery workloads. These static methods, however, cannot scale with the growth of the field and rising expectations of capability. Machine learning approaches provide a set of tools that can generate accurate, flexible and reactive solutions in multiple areas of the industry, from coordinating dynamic network architecture to selecting the placement of workloads on available hardware that gives the best performance.

To date, Intel Labs Europe has successfully used Machine Learning approaches to dynamically identify the hardware metrics relevant to SLA compliance, to predict workload behaviour in heterogeneous systems, and to pack workloads densely and so increase utilisation of available resources. While Machine Learning is often used as a catch-all term, our approach focuses on breaking down the multi-faceted problems facing consumers and applying different Machine Learning solutions as necessary to form an appropriate integrated approach. Current work in resource allocation, for example, uses predictive models and real-time self-adjustment that coordinate to both minimise running costs and improve service performance. This involves multiple Machine Learning approaches working in concert: intelligent placement of workloads; allocation of resources based on current and historical data; automated time-of-day capacity adjustments and power optimisations; and telemetry models, combined with aggregation and feature generation, that allow reaction to unexpected workload behaviour. These approaches can then incorporate continuous learning in real time so that the system remains adaptable as workloads and infrastructure change.
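As an illustration of how predicted demand could drive dense workload packing, the following is a minimal sketch only, not the method used by Intel Labs Europe. It assumes a predictive model has already produced a peak CPU demand per workload, and applies a plain first-fit-decreasing heuristic with a safety headroom. The names Node, pack_workloads and headroom, and all figures, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A host with a single scalar capacity (assumed here to be CPU cores)."""
    name: str
    capacity: float
    used: float = 0.0
    placed: list = field(default_factory=list)


def pack_workloads(predicted_peak, nodes, headroom=0.1):
    """Place workloads with first-fit decreasing.

    predicted_peak maps workload name -> predicted peak demand (assumed to
    come from an upstream model). A fraction `headroom` of every node is
    kept free to absorb prediction error. Returns workloads that did not fit.
    """
    unplaced = []
    # Largest workloads first: this tends to give denser packings.
    ordered = sorted(predicted_peak.items(), key=lambda kv: kv[1], reverse=True)
    for workload, demand in ordered:
        for node in nodes:
            if node.used + demand <= node.capacity * (1 - headroom):
                node.placed.append(workload)
                node.used += demand
                break
        else:
            # No node had room once headroom is respected.
            unplaced.append(workload)
    return unplaced


# Hypothetical example: two 8-core nodes and four predicted demands.
nodes = [Node("node-a", 8.0), Node("node-b", 8.0)]
predicted = {"transcode": 4.0, "cache": 2.0, "analytics": 3.0, "cdn-edge": 1.0}
unplaced = pack_workloads(predicted, nodes)

for node in nodes:
    print(node.name, node.placed, node.used)
print("unplaced:", unplaced)
```

In a real system the single scalar capacity would be replaced by several resource dimensions, and the predictions would be refreshed continuously from telemetry; the heuristic is only meant to show where a prediction enters the placement decision.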

Intel Position Statement for Web5G Workshop

5G promises great advances in network performance and flexibility. Edge Computing, Fog, M2M and IoT present a complex range of service level challenges. The ever more dynamic distribution of web content and service components to the network edge highlights the interplay between allocating resources for these components and managing the network itself. Opportunities exist for web application and content developers or service providers to present tags or metadata that enable a network orchestrator to strike the best balance of resource allocation and configuration in a demanding, complex and constantly changing environment.

Short Abstract

Network Functions and Service Components distributed across an ever-expanding and more dynamic network are increasingly critical to data creation and ingestion, transformation, routing, delivery and presentation. These components must be assigned sufficient resources for effective and predictable execution, but not at the expense of efficiency, flexibility and optimal sharing. Content-oriented services such as video creation, analytics, transformation, caching and delivery not only exhibit sensitivities to placement and resource sharing, but are typically chained in workflows, which places additional constraints on placement.

5G promises huge increases in bandwidth, but an implication of reduced cell size and increased endpoint mobility is a greatly increased volume of handovers. Increased accommodation of NFV, IoT-to-CPS and content transformation workloads presents challenges in infrastructure resource sharing. Thus we see a requirement for much greater precision and responsiveness in the placement of data (content, caches) and of the service components that process and deliver this data.

The evolution from IoT toward Cyber Physical Systems (CPS) in critical applications like transport and industry brings more stringent service requirements, which will make the interplay between distributed web and network applications and the coordination of the network itself not only more critical but potentially much more dynamic.

Combining deep telemetry and analytics with formulations of resource capability at system and sub-system level dramatically increases the precision with which placement decisions can be made for data, service chains and service components, while reducing the time and effort needed to compute optimal or near-optimal service configurations in highly dynamic scenarios. Service and network orchestration could greatly benefit from classification or tagging of web apps and content, in order to optimise flows or slicing, and to determine placement or allocation decisions for the apps and services themselves, given network realities at any point in time.
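To make the tagging idea concrete, the sketch below shows one way an orchestrator might consume such tags. It is an assumption-laden illustration, not a proposed standard: the tag keys (max_latency_ms, cpu), the telemetry fields (latency_ms, free_cpu, utilisation), the site names and the choose_site function are all hypothetical. A site is eligible if current telemetry satisfies the tagged requirements, and the least utilised eligible site is chosen.

```python
def choose_site(tags, sites):
    """Pick a placement site for a tagged app from a telemetry snapshot.

    tags:  requirements declared by the app or content provider
           (hypothetical keys: max_latency_ms, cpu).
    sites: list of dicts describing each site's current state
           (hypothetical keys: name, latency_ms, free_cpu, utilisation).
    Returns the chosen site name, or None if no site qualifies.
    """
    candidates = [
        s for s in sites
        if s["latency_ms"] <= tags["max_latency_ms"]
        and s["free_cpu"] >= tags["cpu"]
    ]
    if not candidates:
        return None
    # Among eligible sites, prefer the one with the most spare capacity.
    return min(candidates, key=lambda s: s["utilisation"])["name"]


# Hypothetical telemetry snapshot for two edge sites and one core site.
sites = [
    {"name": "edge-1", "latency_ms": 5, "free_cpu": 2, "utilisation": 0.8},
    {"name": "edge-2", "latency_ms": 8, "free_cpu": 6, "utilisation": 0.4},
    {"name": "core-1", "latency_ms": 40, "free_cpu": 32, "utilisation": 0.2},
]

video_tags = {"max_latency_ms": 10, "cpu": 4}    # latency-sensitive
batch_tags = {"max_latency_ms": 100, "cpu": 8}   # latency-tolerant

print(choose_site(video_tags, sites))
print(choose_site(batch_tags, sites))
```

Because the snapshot reflects network realities at one point in time, the same tags can yield a different site when the decision is re-evaluated after telemetry changes.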