is now a part of Cisco. Learn More

About Cisco

Vijay Olety

5 minute read


Spot Instances are great. They are cheap and offer up to 90% savings over On-Demand Instances. By design, they can be taken away at any time. “How would you then run Web / App tier on Spot Instances?” is the million-dollar question that needs an answer.

In this post, I will delve into its details. In the end, I am sure you will appreciate the value that Spot Instances provide and also recognize that they can be used for any kind of workload.

Spot Instances

First, a recap – Spot Instances are spare computing capacity available at deeply discounted prices. AWS allows users to bid on unused EC2 capacity in a region at any given point and run those instances for as long as their bid exceeds the current Spot Price. The Spot Price changes periodically based on supply and demand, and all users bids that meet or exceed it gain access to the available Spot Instances.

Simple Architecture Diagram

A simple architecture below includes the following components:

  • External-facing Amazon Virtual Private Cloud (VPC) containing one subnet within a single Availability Zone (AZ)
  • Auto Scaling Groups (ASG) for the EC2 instances to handle requests
  • An Elastic Load Balancer (ELB) to route requests across these instances
  • Standard Identity and Access Management (IAM) roles and instance policies
  • image1

    Credit: AWS Docs

    Sample Request Workflow

    A request is made to through a browser. The browser then contacts Amazon Route 53, highly available and scalable cloud Domain Name System (DNS) web service. It then understands that there is a CNAME record associated it with, of the form The request is then forwarded to the ELB which subsequently pushes it to an underlying EC2 instance, say m3.large. That instance processes the request and the response is shown on the user’s browser.

    Simple right? Yes, as the implicit assumption is that the instances behind the ELB are On-Demand instances and always available. What if we switch-over to Spot Instances? How would the architecture change?

    Complex Architecture Diagram

    An advanced architecture below includes, and not limited to, the following components:

  • External-facing Amazon Virtual Private Cloud (VPC) spread across multiple Availability Zones (AZs) with separate subnets for different applications
  • Auto Scaling Groups (ASG) for the EC2 instances to handle requests
  • Elastic Load Balancers (ELB) to route requests across these instances
  • Standard Identity and Access Management (IAM) roles and instance policies
  • ELB

    Credit: Imgur
    It is a highly available architecture spread across multiple AZs and subnets. This design is common to both On-Demand and Spot Instances but makes more sense to the latter as there is an added complication of instances being taken away. As you can see, a Production VPC has multiple Private Subnets spread across us-east-1b and us-east-1c.

    A request to hits the ELB, which routes it to a Private Subnet in one of the AZs, say us-east-1b and the underlying EC2 instance, say m3.large processes it. If the routing policy is Round Robin, then the next request is forwarded to us-east-1c. If one of the Spot Instances is taken away, then the ASG immediately kicks in and provisions another Spot Instance. Let’s say there was a huge spike in Spot Price for m3.large in us-east-1b. All of our Spot Instances in us-east-1b would be terminated. Every request to is now routed to us-east-1c and it might overwhelm the associated instances. How should you address these issues?

    Best Practices

    Top of my head, I can recall a few:

  • Spread your architecture across multiple AZs
    • Consider us-east-1b and us-east-1c. If an entire AZ goes off-the-grid, say us-east-1c due to a natural calamity, then us-east-1b continues to process requests ensuring application availability.
  • Always have two or more instance types behind an ELB in each AZ
    • Consider m3.large and m4.large in us-east-1b. Even though all m3.large are taken away due to a spike in Spot Prices, m4.large continues to process requests in us-east-1b.
  • Have ASGs associated with every instance type
    • Set minimum, desired and maximum counts to ensure that if a few Spot Instances are terminated, new ones can be immediately provisioned.
    • Associate CloudWatch metrics with ASGs so that we have the required capacity when there is a traffic spike.
  • Choose a random bidding strategy
    • Consider m3.large and m4.large in us-east-1b with their current Spot Price being 0.1 USD. Set m3.large Bid Price as 0.2 USD and m4.large as 0.4 USD. If the Spot Price changes to 0.3 USD for both the instances, due to market volatility, then m3.large will be taken away and m4.large will continue to process requests.
  • Some of the best practices mentioned above can be offloaded through the usage of Spot Fleet, as described in our earlier post. Recently, Spot Fleet announced support for Auto-Scaling Groups which further eases the use of Spot Instances while guaranteeing compute capacity. This should suffice for a vast majority of applications.

    But the real-world is the real deal; things can go drastically wrong, non-availability of any Spot Instance for instance, and can cause application downtime. This is totally unacceptable and Plan B should be in place to ensure business continuity. CMPUTE follows a Hybrid model, wherein a few On-Demand Instances are launched and the remaining majority are Spot Instances. This ensures  both application uptime and cost savings are never compromised. The customers are extremely happy. So are we.

    Register now for a free 14-day trial of CMPUTE.

    comments powered by Disqus