Content • Nov 30, 2023

How AWS is Using Capacity Blocks to Alleviate the GPU Shortage

by Emily Dunenfeld

Capacity Blocks provide greater availability and cost savings for short-term GPU needs.

The widespread rise of AI is driving a huge increase in GPU demand. Companies rely on GPUs (graphics processing units) for ML (machine learning) tasks such as data processing and complex computations, because GPUs can perform many calculations in parallel. However, the surge in demand has led to long waitlists, sometimes stretching to nearly a year. This scarcity is a significant challenge, especially for groups with limited purchasing power, such as startups and research organizations.

Some companies have turned to creative workarounds. In one notable example, a startup borrowed GPUs through connections at large equipment vendors and contacts at quantitative stock trading firms. It needed only 64 GPUs at a time, in six-hour increments. Its story will be familiar to other companies facing similar challenges, and that is where Amazon EC2 Capacity Blocks come into play, offering another way to navigate the GPU shortage.

Amazon EC2 Capacity Blocks for ML

The newly released EC2 Capacity Blocks for ML offering aims to make GPU instances more accessible. With Capacity Blocks, you can reserve P5 instances by specifying the number of instances (up to 64) and the duration (up to 14 days). P5 instances use NVIDIA H100 Tensor Core GPUs and are colocated in Amazon EC2 UltraClusters. NVIDIA leads the server GPU market, holding 60–70% of the market share.

Instance	GPUs	vCPUs	Instance Memory (TiB)	GPU Memory	Network Bandwidth	GPUDirect RDMA	GPU Peer to Peer	Instance Storage (TB)	EBS Bandwidth (Gbps)
p5.48xlarge	8	192	2	640 GB HBM3	3200 Gbps EFAv2	Yes	900 GB/s NVSwitch	8 x 3.84 NVMe SSD	80

P5 Instance Types

P5 instances can be used in generative AI applications for tasks such as question answering, code generation, video and image generation, and speech recognition. P5 Capacity Blocks are well-suited to training and fine-tuning ML models, prototyping, running experiments, and preparing for surges in demand for ML applications. The startup mentioned earlier, which needed 64 GPUs (eight p5.48xlarge instances) in six-hour increments, is a good example of a company that could benefit, although it would have to reserve at least a full day at a time.
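
Because each p5.48xlarge instance carries 8 GPUs and durations are booked in whole days, turning a workload into a Capacity Block request takes a little arithmetic: 64 GPUs map to 8 instances, and a six-hour run still needs a one-day block. The minimal Python sketch below rounds a GPU and time requirement up to the nearest valid request. The helper name `size_capacity_block` is hypothetical, and the limits it encodes (1 to 64 instances in powers of two, 1 to 14 days) are the ones AWS published at launch.

```python
import math

# Limits for P5 Capacity Blocks as published at launch (Nov 2023).
GPUS_PER_INSTANCE = 8  # a p5.48xlarge has 8 NVIDIA H100 GPUs
ALLOWED_INSTANCE_COUNTS = (1, 2, 4, 8, 16, 32, 64)
MAX_DAYS = 14


def size_capacity_block(gpus_needed: int, hours_needed: float) -> tuple[int, int]:
    """Round a GPU and time requirement up to a valid (instances, days) request.

    Hypothetical helper: GPUs are rounded up to whole p5.48xlarge instances and
    then to the next allowed block size; hours are rounded up to whole days.
    """
    if gpus_needed <= 0 or hours_needed <= 0:
        raise ValueError("GPU count and hours must be positive")
    instances = math.ceil(gpus_needed / GPUS_PER_INSTANCE)
    count = next((c for c in ALLOWED_INSTANCE_COUNTS if c >= instances), None)
    if count is None:
        raise ValueError("request exceeds the 64-instance (512-GPU) maximum")
    days = math.ceil(hours_needed / 24)  # at least 1 because hours_needed > 0
    if days > MAX_DAYS:
        raise ValueError("request exceeds the 14-day maximum")
    return count, days


# The startup example: 64 GPUs for a six-hour run.
print(size_capacity_block(64, 6))  # (8, 1)
```

Rounding up to a power of two means some GPU counts leave capacity idle: 20 GPUs needs 3 instances but must be booked as a block of 4.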

Capacity Block Availability

You may have to be flexible when searching for Capacity Blocks. They are currently limited to P5 instances in the AWS US East (Ohio) Region only, and can be reserved no more than 8 weeks in advance.

To reserve a Capacity Block, go to the “Capacity Reservations” section of the EC2 console, or use the AWS CLI. When searching for available Capacity Blocks, you’ll need to specify:

Number of Instances: 1, 2, 4, 8, 16, 32, or 64 instances.
Duration: 1-14 days in one-day increments.
Date Range: Earliest start and latest end dates.


Capacity Block Reservation Example

Once you enter your criteria, you’ll see a list of available reservations that match them. The more flexible you are on the number of instances, duration, and date range, the more options you’ll get.
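
The search criteria translate naturally into a request object. The following Python sketch validates them against the launch limits (power-of-two instance counts up to 64, 1 to 14 whole days, a start no more than 8 weeks out) and returns a parameter dictionary. The function name `build_offering_search` is hypothetical; the keys mirror the EC2 `DescribeCapacityBlockOfferings` API as we understand it, which takes the duration in hours rather than days, so confirm them against the API reference before use.

```python
from datetime import datetime, timedelta, timezone

# Search limits for P5 Capacity Blocks as published at launch (Nov 2023).
ALLOWED_INSTANCE_COUNTS = (1, 2, 4, 8, 16, 32, 64)
MAX_DURATION_DAYS = 14
MAX_ADVANCE = timedelta(weeks=8)


def build_offering_search(instance_count, duration_days, earliest_start,
                          latest_end, now=None):
    """Validate search criteria and return request parameters for a
    Capacity Block offering search (hypothetical helper).

    The dictionary keys follow our understanding of the EC2
    DescribeCapacityBlockOfferings API; verify them against the API reference.
    """
    now = now or datetime.now(timezone.utc)
    if instance_count not in ALLOWED_INSTANCE_COUNTS:
        raise ValueError(f"instance_count must be one of {ALLOWED_INSTANCE_COUNTS}")
    if not isinstance(duration_days, int) or not 1 <= duration_days <= MAX_DURATION_DAYS:
        raise ValueError("duration must be a whole number of days from 1 to 14")
    if earliest_start - now > MAX_ADVANCE:
        raise ValueError("blocks can be reserved at most 8 weeks in advance")
    if latest_end - earliest_start < timedelta(days=duration_days):
        raise ValueError("date range is shorter than the requested duration")
    return {
        "InstanceType": "p5.48xlarge",
        "InstanceCount": instance_count,
        "CapacityDurationHours": duration_days * 24,  # the API takes hours
        "StartDateRange": earliest_start,
        "EndDateRange": latest_end,
    }


# Example: eight instances for two days, searching a one-week window.
now = datetime(2023, 12, 1, tzinfo=timezone.utc)
params = build_offering_search(8, 2, now + timedelta(days=7),
                               now + timedelta(days=14), now=now)
print(params["CapacityDurationHours"])  # 48
```

With boto3 installed and credentials configured, the dictionary could be passed to `boto3.client("ec2", region_name="us-east-2").describe_capacity_block_offerings(**params)`, since US East (Ohio) is `us-east-2`; the sketch itself makes no network calls.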

Pricing of EC2 Capacity Blocks for ML

The cost of Capacity Blocks is dynamic, depending on supply and demand. Capacity Blocks are a good fit if you don’t need long-term instances, because you pay only for the period you reserve. They’re also more predictable, since you pay for the reservation upfront. In both the console and the CLI, the returned options are listed in ascending order of price.

The pricing page doesn’t list a specific price range. However, Jake Siddall, a technical senior product manager at AWS, discussed pricing in depth with Lorenzo Winfrey during an episode of “Under the Hood with AWS Compute.” Siddall explained that Capacity Block prices vary slightly above or below P5 On-Demand rates, with controls in place to prevent large price spikes.
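
Since you pay the full price upfront, one simple way to compare a Capacity Block with On-Demand is to spread the upfront fee over every reserved instance-hour. The sketch below does that for a list of offerings. The `Offering` fields, the offering IDs, and all dollar figures are hypothetical placeholders rather than real AWS prices, and it assumes all offerings come from the same search (same instance count and duration), so sorting by upfront fee matches the ascending-price order the console shows.

```python
from dataclasses import dataclass


@dataclass
class Offering:
    """Simplified Capacity Block offering; field names are hypothetical."""
    offering_id: str
    instance_count: int
    duration_hours: int
    upfront_fee: float  # total USD paid when the block is reserved


def effective_hourly_rate(offering: Offering) -> float:
    """Spread the upfront fee over every reserved instance-hour."""
    return offering.upfront_fee / (offering.instance_count * offering.duration_hours)


def cheapest_vs_on_demand(offerings, on_demand_rate: float):
    """Return the cheapest offering and its premium (+) or discount (-)
    relative to an On-Demand hourly rate for the same instance type."""
    if not offerings:
        raise ValueError("no offerings to compare")
    # Ascending price order, mirroring how the console and CLI list results.
    ranked = sorted(offerings, key=lambda o: o.upfront_fee)
    best = ranked[0]
    return best, effective_hourly_rate(best) / on_demand_rate - 1


# Placeholder figures only; these are not real AWS prices.
offerings = [
    Offering("cb-example-b", 8, 24, 19_776.0),
    Offering("cb-example-a", 8, 24, 18_816.0),
]
best, premium = cheapest_vs_on_demand(offerings, on_demand_rate=100.0)
print(best.offering_id, f"{premium:+.1%}")  # cb-example-a -2.0%
```

A small positive or negative premium is what Siddall’s description would lead you to expect; a large one would suggest checking a different date range.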

Capacity Blocks also carry an operating system fee, charged while your instances are running. Linux and Ubuntu Pro are billed per second, while Red Hat Enterprise Linux (RHEL), RHEL with HA, and SUSE Linux Enterprise Server (SLES) are billed at a flat hourly rate.

Instance	Linux	Red Hat Enterprise Linux (RHEL)	RHEL with HA	SLES	Ubuntu Pro
p5.48xlarge	$0.000 USD	$0.130 USD	$0.165 USD	$0.125 USD	$0.336 USD

Hourly Operating System Rate for EC2 Capacity Blocks
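
The OS fee comes on top of the Capacity Block price and applies per instance. The sketch below applies the hourly rates from the table to a running time. The helper name `os_fee` is hypothetical, and so is one key assumption: we read “flat hourly rate” as billing each started hour in full for RHEL, RHEL with HA, and SLES. Check the Capacity Blocks pricing page before relying on that rounding.

```python
import math

# Hourly OS rates per p5.48xlarge instance (USD), from the table above.
OS_HOURLY_RATE = {
    "Linux": 0.000,
    "RHEL": 0.130,
    "RHEL with HA": 0.165,
    "SLES": 0.125,
    "Ubuntu Pro": 0.336,
}
# Per the article, these are billed per second; the rest at a flat hourly rate.
PER_SECOND_BILLING = {"Linux", "Ubuntu Pro"}


def os_fee(os_name: str, instance_count: int, seconds_running: int) -> float:
    """Estimate the OS fee (USD) for instances running within a Capacity Block."""
    rate = OS_HOURLY_RATE[os_name]
    if os_name in PER_SECOND_BILLING:
        billed_hours = seconds_running / 3600
    else:
        # ASSUMPTION: each started hour is billed in full for hourly-rate OSes.
        billed_hours = math.ceil(seconds_running / 3600)
    return rate * billed_hours * instance_count


# Eight instances running for 90 minutes (5,400 seconds).
print(round(os_fee("Ubuntu Pro", 8, 5400), 3))  # 4.032
print(round(os_fee("RHEL", 8, 5400), 3))  # 2.08
```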

Conclusion

By offering short-term GPU capacity at prices close to P5 On-Demand rates, Capacity Blocks make GPUs more accessible. They are particularly useful for short-term workloads, letting teams use powerful GPUs without long-term commitments. That accessibility makes it easier for companies to integrate AI into their projects and workflows, fostering adaptability and innovation in their respective fields.
