Unused cloud resources put an unnecessary drain on your computing budget. Unlike legacy on-premises architectures, the cloud does not require you to over-provision compute resources for times of heavy usage.
Autoscaling is one of the value levers that can help unlock cost savings for your Azure workloads by automatically scaling up and down the resources in use to better align capacity to demand. This practice can greatly reduce wasted spend for those dynamic workloads with inherently “peaky” demand.
For periods when an app puts a heavier demand on cloud resources, autoscaling adds resources to handle the load and satisfy service-level agreements for performance and availability. And for those times when the load demand decreases (nights, weekends, holidays), autoscaling can remove idle resources to reduce costs. Autoscaling automatically scales between the minimum and maximum number of instances and will run, add, or remove VMs automatically based on a set of rules.
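To make the rule-based behavior concrete, here is a minimal sketch of one scaling decision. It is not how Azure Monitor autoscale is implemented. The `ScaleRule` type, the 70% and 30% CPU thresholds, and the instance limits are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleRule:
    """Hypothetical rule: change the instance count when average CPU crosses a threshold."""
    threshold: float  # average CPU percentage that triggers the rule
    scale_out: bool   # True fires when CPU is above the threshold, False when below
    change: int       # instances to add (positive) or remove (negative)


def next_instance_count(current, avg_cpu, rules, minimum, maximum):
    """Apply the first rule that fires, then clamp the result to [minimum, maximum]."""
    target = current
    for rule in rules:
        fired = avg_cpu > rule.threshold if rule.scale_out else avg_cpu < rule.threshold
        if fired:
            target = current + rule.change
            break
    return max(minimum, min(maximum, target))


# Assumed rules: add one instance above 70% CPU, remove one below 30% CPU.
RULES = [
    ScaleRule(threshold=70.0, scale_out=True, change=1),
    ScaleRule(threshold=30.0, scale_out=False, change=-1),
]

print(next_instance_count(3, 85.0, RULES, minimum=2, maximum=5))  # 4: scaled out
print(next_instance_count(5, 85.0, RULES, minimum=2, maximum=5))  # 5: held at the maximum
print(next_instance_count(2, 10.0, RULES, minimum=2, maximum=5))  # 2: held at the minimum
print(next_instance_count(3, 50.0, RULES, minimum=2, maximum=5))  # 3: no rule fired
```

The clamp at the end is what keeps the instance count between the configured minimum and maximum, whatever the rules request.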
Autoscaling is near real-time cost optimization. Think of it this way: Rather than build an addition to your house with extra bedrooms that will go unused most of the year, you have an agreement with a nearby hotel. Your guests can check in at any time, even at the last minute, and the hotel automatically charges you only for the days they stay.
Autoscaling not only takes advantage of cloud elasticity, so that you pay for capacity only when you need it, but also reduces the need for an operator to continually monitor system performance and decide when to add or remove resources.
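The savings can be sketched with some simple arithmetic. The hourly demand profile and the price of 10 cents per instance-hour below are invented for illustration and are not Azure prices. The model is also idealized: it assumes capacity tracks demand exactly, with no scaling lag.

```python
# Assumed price per instance-hour, in cents (illustrative only, not an Azure rate).
RATE_CENTS = 10

# Hypothetical number of instances needed in each hour of one day, peaking mid-day.
demand = [2] * 8 + [6] * 4 + [10] * 4 + [6] * 4 + [2] * 4


def fixed_cost(demand, rate):
    """Cost of provisioning for the peak during every hour."""
    return max(demand) * len(demand) * rate


def autoscaled_cost(demand, rate, minimum=1):
    """Cost when capacity follows demand but never drops below the minimum."""
    return sum(max(minimum, d) for d in demand) * rate


print(fixed_cost(demand, RATE_CENTS))       # 2400 cents per day
print(autoscaled_cost(demand, RATE_CENTS))  # 1120 cents per day
```

In this made-up profile, following demand costs less than half of provisioning for the peak. The gap narrows as a workload becomes steadier.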
What services can you autoscale?
Azure provides built-in autoscaling using Azure Monitor autoscale for most compute options, including:
- Azure Virtual Machine Scale Sets: see How to use automatic scaling and virtual machine scale sets.
- Service Fabric: see Scale a Service Fabric cluster in or out using autoscale rules.
- Azure App Service: see Scale instance count manually or automatically.
- Azure Cloud Services has built-in autoscaling at the role level. See How to configure autoscaling for a cloud service in the portal.
Azure Functions differs from the previous compute options because you don't need to configure any autoscale rules. The hosting plan you choose dictates how your function app is scaled:
- With a consumption plan, your function app will scale automatically, and you will only pay for compute resources when your functions are running.
- With a premium plan, your app will automatically scale based on demand using pre-warmed workers that run applications with no delay after being idle.
- With a dedicated plan, you will run your functions within an App Service plan at regular App Service plan rates.
Azure Monitor autoscale provides a common set of autoscaling functionality for virtual machine scale sets, Azure App Service, and Azure Cloud Services. Scaling can be performed on a schedule, or based on a runtime metric, such as CPU or memory usage.
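The difference between schedule-based and metric-based scaling can be sketched as follows. The business-hours window, the capacities, and the CPU thresholds are all assumptions. Taking the larger of the scheduled baseline and the metric-driven target is one possible way to combine the two, not a description of how Azure Monitor resolves them.

```python
from datetime import datetime


def scheduled_capacity(now: datetime) -> int:
    """Schedule-based: a higher baseline during assumed weekday business hours."""
    is_weekday = now.weekday() < 5        # Monday is 0, Sunday is 6
    is_business_hours = 8 <= now.hour < 18
    return 6 if is_weekday and is_business_hours else 2


def metric_capacity(current: int, avg_cpu: float) -> int:
    """Metric-based: react to the observed average CPU percentage."""
    if avg_cpu > 70:
        return current + 1
    if avg_cpu < 30:
        return current - 1
    return current


def desired_capacity(now, current, avg_cpu, minimum=2, maximum=10):
    """Use whichever target is larger, clamped to the allowed range."""
    target = max(scheduled_capacity(now), metric_capacity(current, avg_cpu))
    return max(minimum, min(maximum, target))


monday_morning = datetime(2024, 1, 8, 10, 0)
saturday_morning = datetime(2024, 1, 6, 10, 0)
print(desired_capacity(monday_morning, current=6, avg_cpu=85.0))    # 7
print(desired_capacity(saturday_morning, current=2, avg_cpu=10.0))  # 2
```

A schedule suits load you can predict, such as office hours. A metric rule covers load you cannot predict.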
Use the built-in autoscaling features of the platform if they meet your requirements. If not, carefully consider whether you really need more complex scaling features. Examples of additional requirements may include more granularity of control, different ways to detect trigger events for scaling, scaling across subscriptions, and scaling other types of resources.
Note that application design can affect how an app handles scale as load increases. To review design considerations for scalable applications, including choosing the right data storage and VM size, see Design scalable Azure applications in the Microsoft Azure Well-Architected Framework.
Also know that, in general, it is better to scale out (add instances) than to scale up (move to larger instances). Scaling up usually involves deprovisioning or downtime. So, choose smaller instances when a workload is highly variable and scale out to get the required level of performance.
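To illustrate why smaller instances suit a highly variable workload, the sketch below compares how much capacity sits idle when the same demand is served by instances of different sizes. The demand figures and instance sizes are hypothetical.

```python
import math


def provisioned_vcpus(demand_vcpus, instance_vcpus):
    """Smallest whole number of instances that covers the demand, in vCPUs."""
    instances = math.ceil(demand_vcpus / instance_vcpus)
    return instances * instance_vcpus


def idle_vcpus(demand, instance_vcpus):
    """Total provisioned-but-unused vCPUs across all intervals."""
    return sum(provisioned_vcpus(d, instance_vcpus) - d for d in demand)


# Hypothetical vCPUs needed in five consecutive intervals.
demand = [3, 5, 9, 14, 6]

for size in (2, 8, 16):
    print(size, idle_vcpus(demand, size))
# 2 3
# 8 19
# 16 43
```

Smaller instances let capacity follow demand in finer steps, so less of it is paid for and left unused.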
You can set up autoscale using the Azure portal, PowerShell, the Azure CLI, or the Azure Monitor REST API.
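Whichever tool you use, you end up describing the same things: a target resource, capacity limits, and rules. The sketch below builds such a description as a Python dictionary and prints it as JSON. The field names follow the general shape of an Azure Monitor autoscale setting as best recalled here and should be checked against the current REST API reference before use. The resource ID and region are placeholders, and the thresholds are assumptions.

```python
import json

# Placeholder, not a real resource ID.
TARGET_RESOURCE_ID = "<resource-id-of-your-scale-set>"


def cpu_rule(operator, threshold, direction):
    """Build one rule: when average CPU meets the condition, change the count by 1."""
    return {
        "metricTrigger": {
            "metricName": "Percentage CPU",
            "metricResourceUri": TARGET_RESOURCE_ID,
            "timeGrain": "PT1M",       # sampling interval (ISO 8601 duration)
            "statistic": "Average",
            "timeWindow": "PT5M",      # look-back window
            "timeAggregation": "Average",
            "operator": operator,
            "threshold": threshold,
        },
        "scaleAction": {
            "direction": direction,
            "type": "ChangeCount",
            "value": "1",
            "cooldown": "PT5M",        # wait before the next scale action
        },
    }


autoscale_setting = {
    "location": "<region>",  # placeholder
    "properties": {
        "enabled": True,
        "targetResourceUri": TARGET_RESOURCE_ID,
        "profiles": [
            {
                "name": "default",
                "capacity": {"minimum": "2", "maximum": "10", "default": "2"},
                "rules": [
                    cpu_rule("GreaterThan", 70, "Increase"),  # assumed threshold
                    cpu_rule("LessThan", 30, "Decrease"),     # assumed threshold
                ],
            }
        ],
    },
}

print(json.dumps(autoscale_setting, indent=2))
```

Pairing every scale-out rule with a matching scale-in rule, as done here, lets capacity come back down once the load drops.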
Get started with autoscaling
With autoscaling, you can dynamically scale your apps to meet changing demand or anticipate loads with different schedules and set rules that trigger scaling actions. Regardless of how you set it up, the goal is to maximize the performance of your application and save money by not wasting server resources.