Amazon CloudWatch is an observability tool that gives you the power to monitor your cloud infrastructure and know when there is a service issue. CloudWatch achieves this by aggregating logs and metrics in a central datastore and providing visualization tools for the data. Observability is commonly defined as having "3 Pillars," or primary components, needed to achieve full observability into a system. Metrics and logs are two of these pillars; the third is traces. The core CloudWatch service offers metrics and logs out of the box. Traces are offered as a separate service called AWS X-Ray, which can be integrated into CloudWatch via ServiceLens. Since AWS X-Ray is billed as a separate service, we won't cover its pricing in this article.
Generally, observability tools are made up of 3 primary architectural components: a data ingest pipeline, a datastore, and a management console. Observability tools tend to price their services based on these 3 components, and CloudWatch is no different. There will be costs for ingesting data into the datastore, costs for retaining data in the datastore, and costs for the visualization/management tools needed to work with the data in order to derive insights.
Part 1 of this series will discuss CloudWatch Metrics. Part 2 of this series will discuss CloudWatch Logs and Dashboards.
CloudWatch Free Tier
CloudWatch offers a free tier which is always applied to your usage before you receive any charges at the paid tier. The free tier has small allowances for each CloudWatch service (Metrics, Logs, Dashboards, etc.). You can view all of the free limits by visiting the CloudWatch pricing page.
CloudWatch Metrics Pricing
CloudWatch prices its Metrics service based on the number of metrics sent to CloudWatch and how frequently the API is called to send or pull a metric. The more metrics you send to CloudWatch, and the higher the frequency at which you call the API, the higher your bill. This is important to keep in mind: the more metrics you track, the easier it is to diagnose specific problems in your system, and the higher the rate at which you send metrics, the more granular and precise you can be when solving service issues. Essentially, CloudWatch is priced on fidelity; the higher the fidelity of the data that is tracked and stored, the higher the cost.
Unlike CloudWatch Logs, CloudWatch Metrics does not incur separate ingest and storage fees. Instead, Metrics combines the fees for ingest and storage into a single fee based on the number of custom metrics tracked. CloudWatch Metrics guards against runaway storage costs by charging an additional cost per API call (which could be considered a data ingest fee) and by enforcing a data retention policy that rolls up high-resolution metrics into less granular metrics over time. See the section on data retention below for full details on how metrics are aggregated over time.
CloudWatch Metrics offers volume discounting as you track more custom metrics. Here is the pricing table for Custom Metrics in US East 2 (Ohio):
| Tier | Cost (per metric/month) | Cost of Fully Utilized Tier (per month) |
|---|---|---|
| First 10,000 metrics | $0.30 | $3,000 |
| Next 240,000 metrics | $0.10 | $24,000 |
| Next 750,000 metrics | $0.05 | $37,500 |
| Over 1,000,000 metrics | $0.02 | – |
Volume discounts are great for customers that spend tens of thousands of dollars a month on CloudWatch. In order to benefit from the best unit price, you'll need to be paying $64,500 per month in CloudWatch Metrics costs. Once your CloudWatch Metrics-specific costs hit that level, you will be able to take advantage of the $0.02 unit cost for all metrics over 1,000,000. This pricing tier is really only meant for large enterprises; smaller businesses aren't likely to benefit from the higher discounting tiers.
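The tiered pricing above can be sketched as a small calculator. This is a minimal sketch using the tier boundaries and US East 2 rates from the table in this article; the function name is our own, not an AWS API.

```python
# Tier boundaries and per-metric monthly rates from the pricing table above
# (US East 2). The final tier applies to all metrics over 1,000,000.
TIERS = [
    (10_000, 0.30),        # first 10,000 metrics
    (240_000, 0.10),       # next 240,000 metrics
    (750_000, 0.05),       # next 750,000 metrics
    (float("inf"), 0.02),  # everything over 1,000,000 metrics
]

def monthly_metric_cost(metric_count: int) -> float:
    """Return the monthly Custom Metrics charge for `metric_count` metrics."""
    cost = 0.0
    remaining = metric_count
    for tier_size, rate in TIERS:
        in_tier = min(remaining, tier_size)
        cost += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return round(cost, 2)

print(monthly_metric_cost(10_000))     # → 3000.0 (fully utilized first tier)
print(monthly_metric_cost(1_000_000))  # → 64500.0 (all three finite tiers fully used)
```

Note that the $64,500 figure for 1,000,000 metrics is simply the sum of the three fully utilized tiers: $3,000 + $24,000 + $37,500.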
Lastly, pricing is dependent on the region where you store your metrics and metrics exist only in the region in which they are created. Due to this, the costs associated with CloudWatch will vary depending on the distributed nature of your AWS service architecture.
What is a metric?
A metric is a specific value over time. For example, if I want to track the memory utilization for a specific EC2 instance this would be counted as one metric. If I have a group of 10 EC2 instances and I want to track the memory utilization for each of these instances, the memory utilization metric per instance would count as a metric for billing purposes; 10 total in this example. Costs for CloudWatch Metrics scale based on the number of unique metrics (e.g. memory utilization, requests per second, etc.) you want to track and the number of resources (count of EC2 instances, etc.) where each unique metric needs to be collected from.
CloudWatch Metrics Free Tier Allowances
CloudWatch Metrics offers 3 different allowances on its free tier.
- Basic Monitoring (see below)
- 10 Custom Metrics (EC2 Detailed Monitoring metrics count toward this quota)
- 1 Million API requests (not applicable to GetMetricData and GetMetricWidgetImage)
Basic monitoring provides users a few core metrics per service to ensure that users are able to monitor a particular AWS service for availability and high-level performance characteristics. Most AWS services (EC2, EBS, RDS, S3, Kinesis, etc.) offer basic monitoring. None of the metrics tracked under basic monitoring are billed to a customer. The resolution of basic monitoring is dependent on the service; some default to 1-minute intervals, others to 5-minute intervals. You can consult the full documentation to understand the specifics of basic monitoring metrics.
Basic Monitoring for EC2
Basic monitoring of an EC2 instance includes CPU load, disk I/O, and network I/O metrics. You probably noticed that one metric normally considered a baseline to track is missing: EC2 does not expose metrics related to memory. You will need to implement a custom metric to track this. By default, Amazon EC2 sends metric data to CloudWatch in 5-minute intervals. If this level of monitoring isn't sufficient for your needs, AWS offers a higher fidelity level of monitoring called detailed monitoring.
Detailed Monitoring for EC2
Detailed monitoring delivers metrics in 1-minute intervals, rather than 5-minute intervals. This provides higher fidelity resolution of metrics so that you will be able to know and act on service issues quicker. There is a pricing implication to this though. If you enable detailed monitoring, you will be charged for all of the metrics that were previously included as part of basic monitoring. Detailed monitoring essentially charges all of the basic monitoring metrics as custom metrics from a pricing perspective.
For example, if you are running 10 EC2 instances with detailed monitoring enabled, your CloudWatch Metrics bill will be $18 per month. This is because, by default, for the most commonly used EC2 instance types, there are 7 built-in metrics tracked. 7 metrics per instance * 10 instances = 70 metrics total, but you get 10 metrics free as part of the CloudWatch free tier, so only 60 metrics will be charged. The cost per metric is $0.30 per month for the first 10,000 metrics. In this scenario we only have 60 metrics, well below 10,000, which means none of the volume pricing discounts will apply. 60 metrics * $0.30 = $18 per month.
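The arithmetic in that example can be written out as a short sketch. The instance count, per-instance metric count, free-tier allowance, and rate all come from the example above:

```python
INSTANCES = 10
METRICS_PER_INSTANCE = 7    # built-in metrics charged under detailed monitoring
FREE_TIER_METRICS = 10      # CloudWatch free tier custom-metric allowance
RATE_PER_METRIC = 0.30      # first-tier price per metric per month (US East 2)

total_metrics = INSTANCES * METRICS_PER_INSTANCE              # 70 metrics
billable_metrics = max(total_metrics - FREE_TIER_METRICS, 0)  # 60 after free tier
monthly_cost = round(billable_metrics * RATE_PER_METRIC, 2)

print(monthly_cost)  # → 18.0
```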
Custom metrics are metered differently than any of the built-in metrics. You are charged for both the number of custom metrics that are tracked and the number of API calls (i.e. PutMetricData API calls) that are made.
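For illustration, here is a sketch of the request body a single PutMetricData call might carry for a custom memory metric. The namespace, metric name, and instance ID below are hypothetical examples; with boto3 you would pass this dict to `client("cloudwatch").put_metric_data(**request)` (the call itself is omitted so the sketch runs without AWS credentials).

```python
import datetime

# Hypothetical PutMetricData request for a custom memory-utilization metric.
request = {
    "Namespace": "MyApp/EC2",  # hypothetical custom namespace
    "MetricData": [
        {
            "MetricName": "MemoryUtilization",
            "Dimensions": [
                # Hypothetical instance ID for illustration only.
                {"Name": "InstanceId", "Value": "i-0123456789abcdef0"}
            ],
            "Timestamp": datetime.datetime.now(datetime.timezone.utc),
            "Value": 71.5,      # percent of memory in use
            "Unit": "Percent",
        }
    ],
}

# Each unique (namespace, metric name, dimension set) combination is billed as
# one custom metric, no matter how many data points you send to it. Each call
# sending data points also counts toward your PutMetricData API charges.
print(request["MetricData"][0]["MetricName"])  # → MemoryUtilization
```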
If you want to be able to monitor memory utilization at a resolution faster than 1 minute then you'll see your API calls scale in a corresponding fashion. For example, if you want to track memory utilization at a 15-second resolution then you will pay 4x the amount in API costs because the API call will be made 4 times per minute. High-resolution metrics like this are only maintained at this level of granularity for a period of time after which they are rolled up into a less granular time period.
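The 4x scaling above is just the ratio of data points sent per month. A minimal sketch, assuming one PutMetricData call per data point (in practice calls can batch multiple data points, which reduces API costs):

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month for illustration

def put_calls_per_month(resolution_seconds: int) -> int:
    """One PutMetricData call per data point at the given resolution."""
    return SECONDS_PER_MONTH // resolution_seconds

standard = put_calls_per_month(60)  # one call per minute
high_res = put_calls_per_month(15)  # one call every 15 seconds

# 4x the call volume means roughly 4x the PutMetricData API cost.
print(high_res // standard)  # → 4
```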
Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics. For example, a metric might be CPU utilization, while a dimension might be CPU core. For multi-core machines this could lead to many CPU metrics being tracked by CloudWatch. Dimensions are part of the unique identifier for a metric; whenever you add a unique name/value pair to one of your metrics, you are creating a new variation of that metric. CloudWatch treats each unique combination of dimensions as a separate metric, even if the metrics have the same metric name. In the example above, if the CPU utilization metric is tracked per core on a 4-core machine, it would create 4 unique metrics and be charged as 4 custom metrics rather than one.
It is important when deciding to track specific dimensions of a metric that the cardinality of the dimension is considered. High-cardinality metrics like IP Address or a unique identifier can cause the number of CloudWatch metrics tracked to explode which will have an equally large impact on your AWS bill.
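To see how quickly cardinality compounds, here is a sketch that counts billed metrics as the product of each dimension's cardinality, since every unique combination of dimension values is a separate custom metric. The dimension values are hypothetical:

```python
from itertools import product

def billed_metric_count(*dimension_values):
    """Unique metrics produced by one metric name across the given dimensions."""
    return len(list(product(*dimension_values)))

# The 4-core CPU utilization example from above: 4 billed metrics.
cores = ["0", "1", "2", "3"]
print(billed_metric_count(cores))  # → 4

# Adding a hypothetical high-cardinality dimension (250 client IPs)
# multiplies the count: 4 cores * 250 IPs = 1,000 billed metrics.
client_ips = [f"10.0.0.{i}" for i in range(250)]
print(billed_metric_count(cores, client_ips))  # → 1000
```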
CloudWatch Metrics can receive metrics with a variety of different periods. As a data point ages, it will be "rolled up", or averaged, into a less granular data point that aggregates a number of surrounding values. If you create a custom metric with a 15-second resolution and want to look at that metric 4 hours into the past, you will only be able to see the metric at a 1-minute resolution. After 3 hours, CloudWatch aggregates your 15-second metric into a 1-minute metric, thereby reducing the resolution of the high-resolution data you pay extra for. Full details on how CloudWatch rolls up metrics are below:
- Data points with a period of less than 60 seconds are available for 3 hours. These data points are high-resolution custom metrics.
- Data points with a period of 60 seconds (1 minute) are available for 15 days.
- Data points with a period of 300 seconds (5 minutes) are available for 63 days.
- Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months).
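The retention schedule above can be expressed as a simple lookup. This is a sketch of the schedule as listed, keyed by the data point's period in seconds:

```python
def retention(period_seconds: int) -> str:
    """Return how long data points at the given period remain available."""
    if period_seconds < 60:
        return "3 hours"   # high-resolution custom metrics
    if period_seconds < 300:
        return "15 days"   # 1-minute data points
    if period_seconds < 3600:
        return "63 days"   # 5-minute data points
    return "455 days"      # 1-hour data points (15 months)

print(retention(15))    # → 3 hours
print(retention(3600))  # → 455 days
```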
CloudWatch Metrics Best Practices
- Remove unneeded custom metrics
- Reduce the resolution of metrics that do not need high resolution
- Be careful with setting dimensions for a metric, high-cardinality metrics can have a large impact on the total number of metrics that are tracked
- Take advantage of free basic monitoring on AWS services that are not high priority
Metrics are a balance between data fidelity and cost. The more data you collect, the more likely you will have the data necessary to pinpoint exactly what went wrong when a service has an issue. Not all service issues are weighted equally; internal services or less important external services can tolerate some level of service disruption without materially impacting your business. Correctly prioritizing the importance of a service will help in determining which services deserve more monitoring costs. Mission-critical services should have the most metrics tracked at the highest metric resolution, while less important services should utilize only the bare minimum of metrics and resolution to keep costs down.
Custom Metrics and EC2 Detailed Monitoring are both premium monitoring tools that should be evaluated using the above framework. Unnecessary or overly detailed metrics will not help a system perform any better but will accrue costs. Consider whether a metric will help diagnose a specific issue before monitoring it; if a metric is already tracked, ask whether it has been useful in previous analyses and should continue to be monitored. If a service is not a high priority, will Basic Monitoring suffice?
The resolution of a metric should also be considered; high-resolution metrics can be vital during an analysis to help find a needle in a haystack, but they carry a cost. If a service is so important that knowing about an issue within seconds matters, utilize a high-resolution metric. If, on the other hand, a 5-minute delay on metric delivery and coarser granularity is acceptable for the service, there is a cost savings in reducing metric resolution.
These are the most important questions to ask while architecting a well observed system that is also cost effective.
CloudWatch Metrics are a large and complex piece of the entire puzzle of managing costs in AWS. The goal of this series is to help demystify that complexity, surface relevant ways to think about CloudWatch costs, and present a list of best practices on how to manage these costs. In the next post in the series, we'll dive deep on the second major component of CloudWatch costs: CloudWatch Logs.
The examples in the post are great for illustrating hypothetical situations to better discuss the topics. If you are looking to see how these concepts apply to your Amazon CloudWatch bill with your data, Vantage has created a Cost Transparency platform to help users better understand their AWS costs and do automated analysis to surface when an organization is not applying CloudWatch cost best practices. Take a look at your data by creating a Vantage account and connecting your AWS account to it. Users typically find 20% cost savings within the first 5 minutes of signing up.