Save by using Anything Other than a NAT Gateway

NAT gateways are the default way to give private instances outbound internet access in AWS. They are also the most expensive option, with per-GB data processing charges 4.5 times those of the next cheapest option. To reduce costs for customers, AWS introduced a networking primitive called VPC Endpoints. Infrastructure teams with:

  • Data warehouse workloads
  • Web scraping
  • Logging to Datadog

and more can apply VPC Endpoints to reduce their data transfer costs (skip to scenarios).

NAT Gateway

Within the Vantage userbase, NAT Gateways and associated data transfer costs are a top-20 category of cloud spend.

But what good is knowledge without action? The AWS networking review and NAT Gateway migration guide below will help developers use VPC Endpoints to save in their own clouds.

Networking in AWS Overview

AWS Networking Setup

Various networking primitives in a classic arrangement in AWS. Instances sit in public and private subnets, with a NAT gateway in the public subnet allowing instances in the private subnet to talk to machines on the open internet, but not vice versa. Image source.

AWS networking architectures rapidly get more complex. The simplest possible setup on EC2 would involve a virtual private cloud with one EC2 instance in one region and one availability zone. The instance would have a public IP that could be SSH’d into and pinged from the open internet. A developer could write an application, ftp it to the server, and serve requests.

But a setup like this will not work for long. As applications start processing sensitive data or grow larger, it becomes necessary to hide their IP addresses from prying eyes on the internet and scale their connections behind a load balancer. To allow for SSH tunnels, updates from public package repositories, curling data, pulling Docker images, and countless other tasks, developers make use of Network Address Translation (NAT) gateways. These gateways block connections initiated from the internet but allow instances to make outgoing requests to it.

Since NAT gateways are the default and easiest option to configure, many teams set them up once and never touch them again. Sending all traffic through NAT Gateways is a costly mistake. Running data-intensive services, pulling containers, replicating data, and streaming media through NAT Gateways can quickly incur significant costs.

Happily, VPC endpoints are available to handle this traffic at a significantly lower cost, and sometimes for free. The basic principle of reducing data transfer costs is that the farther data has to travel, the more expensive it is. To get the best price, let’s study the pricing structure of NAT Gateways and VPC Endpoints.

NAT Gateway Pricing

NAT gateways are charged per hour and per GB processed. The prices below are for us-east; per-GB charges differ by region. Data transfer charges also apply if the NAT gateway and the EC2 instance are in different availability zones (AZs). To avoid those cross-AZ charges in a multi-AZ environment, we would run one NAT gateway per AZ and pay the hourly charge for each of them.

              per Month   per GB processed
NAT Gateway   $32.85      $0.045
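
To make the two-part pricing concrete, here is a minimal sketch of a monthly cost estimate using the us-east figures from the table. The function name and the example workload (three gateways, one per AZ, processing 10 TB in total) are illustrative assumptions, not figures from a real bill, and cross-AZ data transfer charges are left out.

```python
# Prices from the table above (us-east); other regions differ.
NAT_MONTHLY_USD = 32.85   # fixed charge per gateway per month
NAT_PER_GB_USD = 0.045    # data processing charge per GB

def nat_monthly_cost(gb_processed, gateways=1):
    """Fixed monthly charge for each gateway plus the per-GB processing charge."""
    fixed = gateways * NAT_MONTHLY_USD
    variable = gb_processed * NAT_PER_GB_USD
    return round(fixed + variable, 2)

# Hypothetical workload: one gateway in each of three AZs, 10,000 GB per month.
print(nat_monthly_cost(10_000, gateways=3))
```

Even in this small example the per-GB charge dominates the fixed charge, which is why the rest of this guide focuses on where the data is going.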

Note that a NAT Gateway is different from an Internet Gateway. Internet Gateways allow public instances with public IPs to access the internet and are free to use. NAT Gateways are for private instances and incur a charge.

VPC Endpoint Pricing

VPC Endpoints are a feature of PrivateLink, a networking service for moving data within or between clouds. PrivateLink is charged per VPC endpoint per AZ per hour, plus a charge per GB of data processed that drops in tiers at petabyte volumes. There is also a Gateway Load Balancer endpoint with similar pricing and specialized use cases for fleets of private instances. Prices for both vary by region.

                                 per Month   per GB processed
VPC Endpoint up to 1 PB data     $7.30       $0.01
VPC Endpoint up to 5 PB data                 $0.006
VPC Endpoint > 5 PB data                     $0.004
Gateway Load Balancer Endpoint   $7.30       $0.0035
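
The tiered per-GB price is easier to follow as a calculation. The sketch below assumes the tiers are marginal (the first 1 PB is billed at $0.01, the next 4 PB at $0.006, and everything beyond 5 PB at $0.004) and that 1 PB is 1,000,000 GB. Both are assumptions for illustration, so check the PrivateLink pricing page for your region before relying on the numbers.

```python
PB = 1_000_000  # assumption: 1 PB = 1,000,000 GB

# (tier size in GB, price per GB); None marks the unbounded last tier.
TIERS = [(1 * PB, 0.01), (4 * PB, 0.006), (None, 0.004)]

def endpoint_data_cost(gb):
    """Per-GB processing cost of an interface endpoint under marginal tiers."""
    remaining = gb
    total = 0.0
    for size, price in TIERS:
        chunk = remaining if size is None else min(remaining, size)
        total += chunk * price
        remaining -= chunk
        if remaining <= 0:
            break
    return round(total, 2)

print(endpoint_data_cost(500_000))  # well inside the first tier
print(endpoint_data_cost(2 * PB))   # spills into the second tier
```

Most workloads never leave the first tier, so the $0.01 per GB rate is the one used in the scenarios that follow.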

Not all AWS Services can be connected to using PrivateLink. And, in what may be considered an unexpected billing delight, PrivateLink pricing is included by some AWS Services in their own billing. For example, AWS DataSync and Elastic Inference do not generate any PrivateLink per hour or per GB charges.

Scenarios to Reduce Networking Costs on AWS

At first glance the VPC endpoint is a lot cheaper: about 4.5x cheaper both to run per month and per GB processed. But VPC Endpoints cannot be used in all situations. By looking at some example workloads, we can understand in which situations it would be cost efficient to migrate traffic to endpoints.
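
One way to judge whether an endpoint is worth adding is a break-even volume. The sketch below assumes the NAT gateway stays in place for internet-bound traffic, so the only fixed cost that changes is the endpoint's own monthly charge, and it uses the first-tier prices from the tables above. The function name is hypothetical.

```python
def break_even_gb(az_count=1, endpoint_monthly=7.30,
                  nat_per_gb=0.045, endpoint_per_gb=0.01):
    """Monthly GB above which an interface endpoint is cheaper than the NAT path."""
    fixed = az_count * endpoint_monthly          # endpoints are billed per AZ
    saving_per_gb = nat_per_gb - endpoint_per_gb
    return fixed / saving_per_gb

print(round(break_even_gb(), 1))            # endpoint in a single AZ
print(round(break_even_gb(az_count=2), 1))  # endpoint in two AZs
```

Under these assumptions, a service that moves more than a few hundred GB a month to a single AWS service is a candidate for an endpoint.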

Scenario 1: Webcrawlers with many AWS Services

Let’s say we have a fleet of web crawlers. By their nature, these servers need to access the web to scrape data, update indexes, and get new information. They must access the web from a private subnet, so we have no choice but to use NAT gateways. For this setup we have 100 general purpose m5dn.metal instances crawling 50 TB of web data a month. There’s no helping it: these instances are making requests to websites, and ingesting that data over the NAT gateway costs $2,250 a month in data processing charges.

But our servers are also extracting data from these webpages, storing it in S3, RDS, and ElastiCache, and sending it to Lambda, to the tune of another 40 TB a month, which costs an additional $1,800 through the NAT gateway. Instead, we can transfer this data using VPC endpoints and pay only $0.01 per GB instead of $0.045, a 78% savings versus sending it through the NAT gateway. In addition, keep in mind that transfer to S3 can be made free with Gateway Endpoints, so the actual savings from a fully cost-efficient setup are even greater.
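
The arithmetic for the AWS-bound traffic in this scenario can be sketched as follows, assuming 1 TB = 1,000 GB. The `s3_fraction` parameter is a hypothetical knob: the scenario does not say how much of the 40 TB goes to S3, so the 50% value in the second call is purely illustrative.

```python
NAT_PER_GB = 0.045       # NAT gateway data processing, from the pricing table
ENDPOINT_PER_GB = 0.01   # interface endpoint data processing, first tier
GB_PER_TB = 1_000        # assumption: decimal terabytes

def scenario_costs(tb_to_aws_services, s3_fraction=0.0):
    """Return (NAT cost, endpoint cost, savings %) for AWS-bound traffic.

    s3_fraction is the hypothetical share of traffic going to S3, which a
    gateway endpoint carries with no per-GB charge.
    """
    gb = tb_to_aws_services * GB_PER_TB
    nat = gb * NAT_PER_GB
    endpoints = gb * (1 - s3_fraction) * ENDPOINT_PER_GB
    savings_pct = 100 * (nat - endpoints) / nat
    return round(nat, 2), round(endpoints, 2), round(savings_pct, 1)

print(scenario_costs(40))                   # everything over interface endpoints
print(scenario_costs(40, s3_fraction=0.5))  # half of it to S3 via a gateway endpoint
```

The endpoint's fixed monthly charge is left out here because it is small next to the data charges at this volume.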

It is possible to use a NAT Gateway and a VPC Endpoint in the same subnet. Traffic bound for the endpoint takes priority because its route covers a more specific address range than the NAT gateway’s default route.
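
The "more specific address range" rule is longest-prefix matching, the way a route table with a gateway endpoint route is evaluated. Here is a minimal sketch using Python's standard `ipaddress` module. The route table, the `52.216.0.0/15` range standing in for an S3 prefix list, and the target names are all hypothetical.

```python
import ipaddress

# Hypothetical route table for a private subnet.
ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-gateway",
    "52.216.0.0/15": "s3-gateway-endpoint",  # stand-in for an S3 prefix list
}

def pick_target(dest_ip):
    """Return the target of the matching route with the longest prefix."""
    ip = ipaddress.ip_address(dest_ip)
    matches = []
    for cidr, target in ROUTES.items():
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    # The default route 0.0.0.0/0 always matches, so matches is never empty.
    return max(matches)[1]

print(pick_target("52.216.10.5"))    # inside the stand-in S3 range
print(pick_target("93.184.216.34"))  # arbitrary public address
print(pick_target("10.0.1.20"))      # inside the VPC
```

Interface endpoints work differently: they are reached through DNS names that resolve to private IPs inside the VPC, so the local route already covers them.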

Scenario 2: Data Warehouse with ETL

In this example, the application is a cloud costs company making heavy use of Redshift and ECS. Cloud billing data comes in, is transformed and analyzed, and then offloaded to Redshift for historical access and detailed report generation. Currently the company is using NAT gateways for everything, including pulling down 100 TB of container images from ECR for workers and moving 400 TB of pricing data through NAT gateways that sit between ECS and Redshift.

Routing traffic in this way is resulting in $22,500 in data transfer costs per month. By switching to VPC endpoints for this traffic, the company could save $17,500 on their bill, give or take the cost of running the VPC endpoint, which is per endpoint per AZ, or $29.20 for 4 endpoints for 2 services across 2 AZs.
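
For readers who want to check these figures, here is a small sketch of the same calculation, assuming 1 TB = 1,000 GB and first-tier endpoint pricing (500 TB is below the 1 PB threshold). The function name is hypothetical.

```python
def net_monthly_savings(tb, endpoints, nat_per_gb=0.045,
                        endpoint_per_gb=0.01, endpoint_monthly=7.30):
    """NAT data charges minus endpoint data charges and endpoint fixed fees."""
    gb = tb * 1_000  # assumption: decimal terabytes
    nat_cost = gb * nat_per_gb
    endpoint_cost = gb * endpoint_per_gb + endpoints * endpoint_monthly
    return round(nat_cost - endpoint_cost, 2)

# 100 TB from ECR plus 400 TB to Redshift, over 4 endpoints (2 services x 2 AZs).
print(net_monthly_savings(tb=500, endpoints=4))
```

The fixed endpoint fees barely move the result, which is why the per-GB rate is the number to watch.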

Scenario 3: Datadog log ingestion

Datadog is an omnipresent log ingestion, profiling, and analytics platform. For many services there are significant amounts of logs and analytics produced, well into the hundreds of GBs. As your services are logging to Datadog through Datadog endpoints, they are passing through NAT gateways. 500 GB of log files costs $22.50 through a NAT gateway, but only $5 through a VPC endpoint. Datadog provides VPC endpoints which can be configured through PrivateLink for immediately lower data transfer charges for logging.

Networking in AWS in Detail

Analyzing workloads gives some flavor for where to find savings. Now consider how data moves through AWS networks in detail to understand where the data transfer costs of your own cloud are being generated.

VPCs, Regions, and AZs

Each AWS account has at least one VPC. A VPC lives in a single region and spans the availability zones within it. Regions are geographically distributed, and there is a data transfer cost to move data between them. AZs are physically redundant networks within one region. In many cases it can be free to move data between AZs, but the network must be set up correctly. Moving data within an AZ is always free.

NAT Gateways

The primary function of a NAT gateway is to request data on the open internet, outside of the AWS cloud. Aptitude getting package updates from packages.ubuntu.com is an example of where this would happen. However, many of the requests companies make stay within their own VPC, where a NAT gateway is unnecessary and expensive. You can find your NAT Gateway costs in the Active Resource Inventory in the console.

NAT Gateway

Examine data transfer costs for your NAT Gateways in the console.

Companies can save significant money by not sending private traffic or traffic to AWS services through NAT Gateways. An example would be pulling ECR images over a NAT gateway. In recent years, AWS has made changes to decouple NAT gateways from private traffic, for example by making Internet Gateways optional to use with NAT Gateways.

VPC Endpoints

To increase security at the network level, VPC Endpoints allow your services to talk to AWS services or third-party managed services without sending data over the open internet through a NAT gateway. For example, services can log to Datadog, or SQL clients can query Snowflake, without the requests and responses leaving the AWS network. Routing data like this reduces costs for customers because AWS does not pay peering fees on the open network. Pricing and architecture possibilities with PrivateLink have continued to evolve, including as recently as April 2022, when AWS announced that data transfer between AZs within the same region would be free when using PrivateLink.

There are some subtle cost benefits of using VPC Endpoints. AWS says:

Reducing the traffic flowing through internet gateways would also decrease the volume of network traffic processed by associated perimeter security controls (e.g. network firewalls) in place. As such, moving to private subnets and using private IP addresses provides cost savings that could partially or fully offset the cost of interface endpoints used to access the same service over PrivateLink.

Migrating from NAT Gateways to VPC Endpoints

Does your architecture match one of the cheaper-with-VPC-Endpoints setups above? Do you have data flowing through NAT Gateways that is bound for other services within your own network? If you see significant NAT Gateway costs in the console, you can likely save by migrating to VPC Endpoints. AWS itself recommends this to lower data transfer costs through NAT gateways.

For interface endpoints, which allow for DNS resolution in contrast to gateway endpoints, it is easy and verifiable to adjust the flow of traffic. Engineers or DevOps teams can create these endpoints for the service first, test them, and then connect to them from production workloads. Site reliability engineering (SRE) protip: VPC Endpoints have a scale-up time of a few minutes when there is already significant traffic flowing through the network.
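
As a rough illustration of the first step, the sketch below builds the request parameters for an interface endpoint. The parameter names follow the EC2 `CreateVpcEndpoint` API as exposed by boto3's `create_vpc_endpoint`. The helper function and every ID in the example are placeholders, and the actual API call is left as a comment because it needs AWS credentials and creates a billable resource.

```python
def interface_endpoint_params(region, service, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for an EC2 create_vpc_endpoint request."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # AWS service endpoints are named com.amazonaws.<region>.<service>.
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "SubnetIds": list(subnet_ids),              # one subnet per AZ to cover
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS lets existing clients resolve the service's usual
        # hostname to the endpoint without code changes.
        "PrivateDnsEnabled": True,
    }

# Placeholder IDs; replace with values from your own VPC.
params = interface_endpoint_params(
    region="us-east-1",
    service="ecr.dkr",
    vpc_id="vpc-EXAMPLE",
    subnet_ids=["subnet-EXAMPLE-A", "subnet-EXAMPLE-B"],
    security_group_ids=["sg-EXAMPLE"],
)
print(params["ServiceName"])

# With credentials configured, the request would be sent with:
#   boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**params)
```

Creating the endpoint in a test VPC first, as suggested above, lets you confirm DNS resolution and security group rules before production traffic depends on it.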

Voila! Up to 80% lower data transfer costs going forward.