Autothrottle: Manage Resources for SLO-Targeted Microservices


August 17th, 2023

In this post we walk through the research paper Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices.

Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices

We propose Autothrottle as a bi-level learning-assisted framework. It consists of (1) an application-level lightweight controller that learns to “assist” local heuristic resource control, with the visibility of application workloads, latencies, and the SLO; and (2) per-microservice agile controllers that continuously perform fine-grained CPU scaling, using local metrics and periodic “assistance” from the global level.

The above is an excerpt from the paper under review in this post.
Diagram of the Autothrottle architecture. It decouples the mechanisms of SLO feedback and resource control into application-level and service-level control loops, respectively. Resource controls are performed locally to maintain the performance target derived from the pre-specified application SLO.

Microservices have become a popular architectural paradigm, providing benefits like independent scaling and modularity. However, operating and maintaining microservices makes it harder to manage resources efficiently while delivering a good user experience within cost constraints.

This paper presents a bi-level control approach called Autothrottle that decouples application-level SLO monitoring from service-level resource control. Its most interesting idea is a performance target based on CPU throttling, which bridges the two levels: the application level derives the target from the SLO, and each service-level controller adjusts CPU locally to maintain it. Platform engineers may find several practical takeaways in this paper when managing microservices.
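To make the bi-level structure concrete, here is a minimal sketch of the two loops. It is not the paper's algorithm: the class and function names, the step sizes, and the simple threshold rule at the application level are all assumptions chosen for illustration, and the threshold rule stands in for the learned controller the paper describes.

```python
from dataclasses import dataclass


@dataclass
class ServiceController:
    """Hypothetical per-service loop: nudges a CPU limit toward a throttle target."""

    cpu_limit_millicores: float
    throttle_target: float       # tolerated fraction of throttled periods (0..1)
    step: float = 0.1            # illustrative multiplicative step, not from the paper
    min_limit: float = 100.0     # illustrative floor for the CPU limit

    def update(self, observed_throttle_ratio: float) -> float:
        """Scale the CPU limit up when throttling exceeds the target, else down."""
        if observed_throttle_ratio > self.throttle_target:
            self.cpu_limit_millicores *= 1 + self.step
        else:
            scaled_down = self.cpu_limit_millicores * (1 - self.step)
            self.cpu_limit_millicores = max(self.min_limit, scaled_down)
        return self.cpu_limit_millicores


def adjust_throttle_target(current_target: float, latency_ms: float,
                           slo_ms: float, delta: float = 0.02) -> float:
    """Stand-in heuristic for the application-level controller.

    If the SLO is violated, tolerate less throttling (services get more CPU);
    otherwise tolerate slightly more (services can reclaim CPU).
    """
    if latency_ms > slo_ms:
        return max(0.0, current_target - delta)
    return min(1.0, current_target + delta)


# One round of both loops with made-up measurements.
service = ServiceController(cpu_limit_millicores=1000.0, throttle_target=0.10)
service.update(observed_throttle_ratio=0.30)      # over target: limit grows
service.throttle_target = adjust_throttle_target(
    service.throttle_target, latency_ms=250.0, slo_ms=200.0
)                                                 # SLO violated: target shrinks
```

The point of the split is that the service-level loop only needs local metrics and can run often, while the application-level loop only needs end-to-end latency and can run less frequently.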

Practical Takeaways for Practitioners

  • The bi-level structure aligns well with aggregating metrics at the application level while collecting per-service metrics. This incremental observability is useful.
  • Tracking CPU throttling events can help set alerts and monitor service health. Frequent throttling often indicates that a service's CPU limit is too low for its current load.
  • The online learning algorithm for autoscaling is handy for capacity planning using production traffic data.
  • The rapid feedback control and rollback mechanisms inform techniques for incident response.
  • Load testing microservices while correlating with metrics helps test observability.
  • The techniques could extend beyond CPU to memory, I/O, and network for holistic resource management.
  • The modular design enables gains without full end-to-end traces. Failures in one service are mitigated by others.
  • Integration with Kubernetes operators would be valuable for microservice deployments.
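As a starting point for the throttling takeaway above, the sketch below computes a throttle ratio from two snapshots of a cgroup v2 `cpu.stat` file, which exposes `nr_periods` and `nr_throttled` counters when a CPU limit is set. The sample snapshots, the function names, and the 0.25 alert threshold are assumptions for illustration; in practice the file lives under the container's cgroup directory, whose path depends on your runtime.

```python
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_ratio(prev: dict[str, int], curr: dict[str, int]) -> float:
    """Fraction of enforcement periods that were throttled between two snapshots."""
    periods = curr["nr_periods"] - prev["nr_periods"]
    if periods <= 0:
        return 0.0  # no new periods elapsed, nothing to report
    return (curr["nr_throttled"] - prev["nr_throttled"]) / periods


# Made-up snapshots taken some seconds apart.
PREV_SAMPLE = """usage_usec 5000000
nr_periods 1000
nr_throttled 50
throttled_usec 120000
"""
CURR_SAMPLE = """usage_usec 5900000
nr_periods 1100
nr_throttled 80
throttled_usec 310000
"""

ALERT_THRESHOLD = 0.25  # hypothetical alerting threshold

ratio = throttle_ratio(parse_cpu_stat(PREV_SAMPLE), parse_cpu_stat(CURR_SAMPLE))
should_alert = ratio > ALERT_THRESHOLD
```

Using deltas between snapshots rather than the raw counters matters because the counters are cumulative since the cgroup was created.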


The paper presents a practical bi-level resource management approach for microservices. The key takeaway for platform engineers is the value of incremental observability wins from decoupled monitoring and control. Tracking proxy metrics such as CPU throttling gives early warning of potential problems, and online learning algorithms lend themselves to continuous improvement of autoscaling policies.

Frequently Asked Questions

What are microservices?
Microservices are an architecture style where an application is composed of small, independent services that communicate via APIs. This provides modularity and flexibility compared to monolithic apps.
How do microservices differ from monolithic apps?
Monolithic apps have tightly coupled components bundled together. Microservices break these into decentralized services that can be developed, deployed, and scaled independently.
What is a service mesh?
A service mesh provides networking capabilities like load balancing, encryption, logging, and monitoring for microservices. Popular options include Istio and Linkerd.
What is a container?
Containers package code and dependencies into isolated processes to enable reproducible and portable deployment of applications. Docker is commonly used to build and run containers, and Kubernetes is commonly used to orchestrate them.
What is vertical scaling?
Vertical scaling adjusts the resources (e.g. CPU, memory) allocated to an application instance. This allows fine-grained scaling.
What is horizontal scaling?
Horizontal scaling changes the number of instances of an application. This allows coarse-grained scaling to handle load changes.
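The difference between the two scaling modes can be shown with a little arithmetic. The horizontal rule below is a simplified form of the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler (without its tolerance and stabilization behavior); the vertical rule is an analogous assumption made up for this post, not a documented algorithm.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Horizontal: change the instance count in proportion to metric / target."""
    return math.ceil(current_replicas * current_metric / target_metric)


def desired_cpu_limit(current_limit_millicores: int, current_utilization_pct: float,
                      target_utilization_pct: float) -> int:
    """Vertical (illustrative): resize one instance's CPU limit by the same ratio."""
    return math.ceil(
        current_limit_millicores * current_utilization_pct / target_utilization_pct
    )


# A service at 90% CPU utilization with a 60% target:
replicas = desired_replicas(4, 90, 60)         # 4 instances become 6
cpu_limit = desired_cpu_limit(500, 90, 60)     # 500m per instance becomes 750m
```

Horizontal scaling can only move in whole instances, while the vertical adjustment can be as small as a single millicore, which is why the two are described as coarse-grained and fine-grained.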
What is a latency SLO?
A service level objective (SLO) defines a target level for a metric. A latency SLO sets that target on request latency, for example a 95th percentile latency under 50 ms.
What is tail latency?
Tail latency refers to the latency experienced by the slowest requests, those in the tail of the distribution (e.g. the 95th or 99th percentile). It captures near-worst-case behavior that an average would hide.
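Here is a small sketch of checking a percentile-based latency SLO. It uses the nearest-rank percentile definition, which is one of several common conventions; the function names and the sample latencies are made up for illustration.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms: list[float], pct: float = 95,
              threshold_ms: float = 50.0) -> bool:
    """True if the pct-th percentile latency is under the threshold."""
    return percentile(latencies_ms, pct) < threshold_ms


# 100 made-up requests: most are fast, a handful are very slow.
latencies = [10.0] * 94 + [40.0] + [200.0] * 5

mean_ms = sum(latencies) / len(latencies)   # looks healthy despite the slow requests
p95_ok = meets_slo(latencies, pct=95)       # 95th percentile is still under 50 ms
p99_ok = meets_slo(latencies, pct=99)       # 99th percentile lands on a slow request
```

With this data the mean and the 95th percentile both look fine, while the 99th percentile exposes the slow requests, which is why the choice of percentile in an SLO matters.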
What is online machine learning?
Online ML trains models sequentially on incoming data in real time, as opposed to offline training on fixed datasets. It is commonly used when a model must adapt to changing conditions.
What are multi-armed and contextual bandits?
Bandits are simple reinforcement learning techniques for decision making with limited information. Contextual bandits use additional context to inform actions.
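To illustrate the idea, here is a minimal epsilon-greedy contextual bandit. This is a generic textbook technique, not the learner used in the paper. The contexts (load buckets), the arms (candidate throttle targets), and the reward values are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Keeps a running mean reward per (context, arm) pair."""

    def __init__(self, arms, epsilon: float = 0.1, seed: int = 0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of updates
        self.values = defaultdict(float)  # (context, arm) -> mean reward so far

    def select(self, context):
        """Explore a random arm with probability epsilon, else exploit the best known arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward: float) -> None:
        """Fold one observed reward into the running mean for (context, arm)."""
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical setup: pick a throttle target given a coarse load level.
bandit = EpsilonGreedyContextualBandit(arms=[0.05, 0.10, 0.20], epsilon=0.0)
bandit.update("high_load", 0.05, reward=1.0)    # strict target kept latency in check
bandit.update("high_load", 0.20, reward=-1.0)   # loose target violated the SLO
bandit.update("low_load", 0.20, reward=1.0)     # loose target saved CPU safely
bandit.update("low_load", 0.05, reward=0.2)     # strict target wasted CPU

choice_high = bandit.select("high_load")
choice_low = bandit.select("low_load")
```

The same arm earns different rewards in different contexts, so the bandit learns a separate preference per context. Setting `epsilon` to 0.0 here disables exploration to keep the example deterministic; a real deployment would keep some exploration.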