Cmpute.io is a platform which reduces your AWS EC2 spend across three main areas:
- Instance Optimization – usage of Spot, Reserved and On-Demand instances
- Time Optimization – schedule start and stop of instances
- Performance Optimization – analyse application performance and rightsize instances
Time Optimization is the ability to schedule workloads (instances and applications) according to policies, and is most relevant for dev/test workloads. Instance Optimization is the automatic use of Spot Instances at every layer of your application, whether the web/app tier or the API tier, together with optimal utilization of purchased Reserved Instances across multiple accounts.
In this post, I will talk about the third critical area of optimization, which we call Performance Optimization, or Rightsizing, of your application.
One of the reasons enterprises move to the cloud is the flexibility it offers in infrastructure management and the ease with which infrastructure can be changed in an instant. Is your infrastructure over-provisioned? Just do a scale-in. Is it under-provisioned? A simple scale-out should do the trick. But what assumption have you made during the scaling process? That the instances are well utilized and are being used to their maximum. This is only partly true: it holds when you look at one metric at a time.
Why, you ask? You perform a scaling operation when a certain threshold is breached. Let’s take an example: you have configured your scale-out policy to add one instance to the cluster when CPU utilization reaches 60%. This makes sense, as you need more instances to handle traffic spikes. But what about the instance’s memory utilization? How do we ensure that memory is being used efficiently? What if memory usage is always at 5%? Scaling does not help here, because every instance you add brings more idle memory with it. This is where Rightsizing comes to the rescue.
Rightsizing is the process of analyzing your workloads and recommending the right instance type to minimize wastage. During the planning and provisioning of the underlying infrastructure, we tend to make certain assumptions about application performance. Over time, we capture metrics, validate those assumptions and take action where needed. But this is a manual and cumbersome DevOps exercise. Wouldn’t it be nice if it could be automated completely?
Cmpute.io takes rightsizing very seriously and guarantees that the application performance is never degraded when we make recommendations.
When customers move their workloads onto our platform, we start orchestrating their infrastructure on RI and Spot Instances to achieve savings of 75% or more. Additionally, we start monitoring application metrics such as CPUUtilization, memory usage (available via an opt-in during workload migration) and network latency over a two-week window. This data is fed into our proprietary Recommendation Engine.
Our engine first sifts through the data. It plots a time-series graph and calculates the peaks, valleys and averages. It then compares these numbers against built-in breach thresholds. If the instances are being utilized efficiently, then give yourself a pat on the back! You are a DevOps expert and in complete control.
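To make the idea concrete, here is a minimal sketch of that kind of check. It is not our engine's actual code: the threshold values and the function names `summarize` and `classify` are assumptions chosen purely for illustration.

```python
from statistics import mean

# Hypothetical thresholds (percent). The real breach thresholds are not
# published in this post; these values are placeholders.
UNDER_UTILIZED_PEAK = 40.0   # even the peak never gets this high
OVER_UTILIZED_AVG = 80.0     # the average sits above this


def summarize(samples):
    """Return the peak, valley and average of a series of utilization samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return {
        "peak": max(samples),
        "valley": min(samples),
        "average": mean(samples),
    }


def classify(samples):
    """Label one metric as under-utilized, over-utilized or efficient."""
    stats = summarize(samples)
    if stats["peak"] < UNDER_UTILIZED_PEAK:
        return "under-utilized"
    if stats["average"] > OVER_UTILIZED_AVG:
        return "over-utilized"
    return "efficient"


# Made-up samples: CPU looks healthy, memory is barely touched.
cpu = [55.0, 62.0, 71.0, 58.0]
memory = [4.0, 5.0, 6.0, 5.0]
print("cpu:", classify(cpu))
print("memory:", classify(memory))
```

Classifying each metric separately is the point of the exercise: a CPU-only view of this workload would report nothing wrong, while the memory series shows the instance is oversized on that dimension.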
More often than not, our Recommendation Engine sees that application infrastructure is under-utilized and it then proceeds to recommend an appropriate rightsized instance. This is a complex multi-stage process and broadly involves:
* Narrow down to a subset of "suitable" instances spanning families and sizes. This is a big deal: consider an m4.large instance where CPU is maxed out and memory is under-utilized. If we restrict recommendations to the m4 family, we can only do so much. But if we move to a different family, say c4, we can improve both CPU utilization and memory usage, as it is a compute-optimized instance.
* For each of these "suitable" instances, find out if there is Spot capacity available in the Region and Availability Zones where the application is running. Narrow down the list further.
* For each of these instances, categorize the likelihood of the Spot Instance being terminated as Low, Medium or High. Narrow down the list further to include only instances in the Low and Medium risk categories.
* For each of these instances, calculate the Spot price and find out the cheapest one.
* Recommend this instance to the customer!
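The steps above can be sketched as a chain of filters followed by a minimum-price pick. This is only an illustration under stated assumptions: the `Candidate` fields, the `recommend` function, and the prices and risk labels in the sample data are all hypothetical, and the real engine looks at more than vCPUs and memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    instance_type: str
    vcpus: int
    memory_gib: float
    spot_available: bool      # capacity in the workload's Region/AZs
    interruption_risk: str    # "Low", "Medium" or "High"
    spot_price: float         # USD per hour (illustrative numbers only)


def recommend(candidates: List[Candidate],
              needed_vcpus: int,
              needed_memory_gib: float) -> Optional[Candidate]:
    """Apply the filtering steps in order and return the cheapest survivor."""
    # Step 1: "suitable" instances across families and sizes.
    suitable = [c for c in candidates
                if c.vcpus >= needed_vcpus and c.memory_gib >= needed_memory_gib]
    # Step 2: keep only those with Spot capacity where the app runs.
    with_capacity = [c for c in suitable if c.spot_available]
    # Step 3: keep only Low and Medium termination risk.
    acceptable = [c for c in with_capacity
                  if c.interruption_risk in ("Low", "Medium")]
    # Steps 4 and 5: cheapest Spot price wins; None if nothing qualifies.
    return min(acceptable, key=lambda c: c.spot_price, default=None)


# Sample data: prices and risk labels are made up for the example.
pool = [
    Candidate("m4.xlarge", 4, 16.0, True, "Low", 0.060),
    Candidate("r3.large", 2, 15.25, True, "Medium", 0.030),
    Candidate("c4.xlarge", 4, 7.5, True, "Low", 0.040),    # too little memory
    Candidate("r3.xlarge", 4, 30.5, True, "High", 0.020),  # too risky
]
best = recommend(pool, needed_vcpus=2, needed_memory_gib=12.0)
print(best.instance_type if best else "no recommendation")
```

Applying the filters before the price comparison matters: the cheapest entry in the sample pool is excluded on risk, so a price-first approach would have picked an instance likely to be terminated.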
Rightsizing in Action
We eat our own dog food! We run rightsizing on our own infrastructure and act upon its recommendations.
As you can see from the figure above, one of our internal systems is running on m4.xlarge. Our recommendation engine analyzed the application performance metrics and suggested that the most appropriate instance type is r3.large. It also displays the savings achieved when we make the switch: 95 USD per instance-month, or 23% in additional savings.