Spot Instances are great. They are cheap and offer up to 90% savings over On-Demand Instances. By design, they can be taken away at any time. “How would you then run Web / App tier on Spot Instances?” is the million-dollar question that needs an answer.
In this post, I will delve into its details. In the end, I am sure you will appreciate the value that Spot Instances provide and also recognize that they can be used for any kind of workload.
First, a recap – Spot Instances are spare computing capacity available at deeply discounted prices. AWS allows users to bid on unused EC2 capacity in a region at any given point and run those instances for as long as their bid exceeds the current Spot Price. The Spot Price changes periodically based on supply and demand, and all users bids that meet or exceed it gain access to the available Spot Instances.
Simple Architecture Diagram
A simple architecture below includes the following components:
Sample Request Workflow
A request is made to www.example.com through a browser. The browser then contacts Amazon Route 53, highly available and scalable cloud Domain Name System (DNS) web service. It then understands that there is a CNAME record associated it with, of the form example-app-XXXXXXXXXX.ap-southeast-1.elb.amazonaws.com. The request is then forwarded to the ELB which subsequently pushes it to an underlying EC2 instance, say m3.large. That instance processes the request and the response is shown on the user’s browser.
Simple right? Yes, as the implicit assumption is that the instances behind the ELB are On-Demand instances and always available. What if we switch-over to Spot Instances? How would the architecture change?
Complex Architecture Diagram
An advanced architecture below includes, and not limited to, the following components:
A request to www.example.com hits the ELB, which routes it to a Private Subnet in one of the AZs, say us-east-1b and the underlying EC2 instance, say m3.large processes it. If the routing policy is Round Robin, then the next request is forwarded to us-east-1c. If one of the Spot Instances is taken away, then the ASG immediately kicks in and provisions another Spot Instance. Let’s say there was a huge spike in Spot Price for m3.large in us-east-1b. All of our Spot Instances in us-east-1b would be terminated. Every request to www.example.com is now routed to us-east-1c and it might overwhelm the associated instances. How should you address these issues?
Top of my head, I can recall a few:
- Consider us-east-1b and us-east-1c. If an entire AZ goes off-the-grid, say us-east-1c due to a natural calamity, then us-east-1b continues to process requests ensuring application availability.
- Consider m3.large and m4.large in us-east-1b. Even though all m3.large are taken away due to a spike in Spot Prices, m4.large continues to process requests in us-east-1b.
- Set minimum, desired and maximum counts to ensure that if a few Spot Instances are terminated, new ones can be immediately provisioned.
- Associate CloudWatch metrics with ASGs so that we have the required capacity when there is a traffic spike.
- Consider m3.large and m4.large in us-east-1b with their current Spot Price being 0.1 USD. Set m3.large Bid Price as 0.2 USD and m4.large as 0.4 USD. If the Spot Price changes to 0.3 USD for both the instances, due to market volatility, then m3.large will be taken away and m4.large will continue to process requests.
Some of the best practices mentioned above can be offloaded through the usage of Spot Fleet, as described in our earlier post. Recently, Spot Fleet announced support for Auto-Scaling Groups which further eases the use of Spot Instances while guaranteeing compute capacity. This should suffice for a vast majority of applications.
But the real-world is the real deal; things can go drastically wrong, non-availability of any Spot Instance for instance, and can cause application downtime. This is totally unacceptable and Plan B should be in place to ensure business continuity. CMPUTE follows a Hybrid model, wherein a few On-Demand Instances are launched and the remaining majority are Spot Instances. This ensures both application uptime and cost savings are never compromised. The customers are extremely happy. So are we.
Register now for a free 14-day trial of CMPUTE.