Since the rise of NoSQL, MongoDB has been a popular database management system because of its flexible and scalable handling of unstructured or semi-structured data. As a provider of cost reporting tools, we’ve noticed common implementation inefficiencies in different areas of MongoDB that can be addressed to reduce costs.

These tips are categorized into three key areas: optimizing your chosen pricing tier, managing data transfer costs, and reducing data storage expenses. First, a look at how MongoDB is sold.

Review of MongoDB Pricing Model

MongoDB is available in different forms: Cloud (MongoDB Atlas) and Server (Community or Enterprise). The Enterprise edition caters to businesses with specific needs and requirements. The Community version is free; however, handling and provisioning your infrastructure will incur costs.

MongoDB Atlas includes several additional features, such as automated deployment, scalability, monitoring, and security. These features alleviate concerns related to manual handling and provisioning and the unpredictable costs that come with them. Atlas can be used with AWS, Azure, and Google Cloud. Our focus will be on cost-saving strategies tailored specifically for MongoDB Atlas, although some tips may be applicable to other MongoDB versions.

MongoDB Atlas consists of three pricing tiers: Serverless, Shared, and Dedicated. Costs may vary based on provider and region.

Serverless

Serverless is described as “pay-per-operation pricing,” where operations include reads, writes, storage, backups, data transfer, and more. Serverless is recommended for specific use cases, such as low and variable workloads or workloads with sharp bursts of activity. Read Processing Unit (RPU) costs are tiered, starting at $0.10 per million for the first 50 million per day. Write Processing Units (WPUs) cost $1.00 per million. Storage is $0.25 per GB-month.
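
To make these rates concrete, here is a minimal sketch of a monthly estimate. It assumes a 30-day month, models only the first RPU tier (the higher-tier rates are not listed here), and leaves out backups and data transfer. The function name and the sample workload are hypothetical.

```javascript
// Rates quoted above; actual rates vary by provider and region.
const RPU_PRICE_PER_MILLION = 0.10; // first tier only: up to 50 million RPUs per day
const RPU_FIRST_TIER_DAILY_LIMIT = 50_000_000;
const WPU_PRICE_PER_MILLION = 1.00;
const STORAGE_PRICE_PER_GB_MONTH = 0.25;

/**
 * Estimate a monthly Serverless bill from average daily usage.
 * Backups and data transfer are not included.
 */
function estimateServerlessMonthlyCost({ rpusPerDay, wpusPerDay, storageGb, days = 30 }) {
  if (rpusPerDay > RPU_FIRST_TIER_DAILY_LIMIT) {
    // Higher tiers are priced differently; their rates are not modeled here.
    throw new RangeError("rpusPerDay exceeds the first pricing tier");
  }
  const reads = (rpusPerDay / 1e6) * RPU_PRICE_PER_MILLION * days;
  const writes = (wpusPerDay / 1e6) * WPU_PRICE_PER_MILLION * days;
  const storage = storageGb * STORAGE_PRICE_PER_GB_MONTH;
  return { reads, writes, storage, total: reads + writes + storage };
}

// Hypothetical workload: 30M RPUs and 2M WPUs per day, 10 GB stored.
const serverlessEstimate = estimateServerlessMonthlyCost({
  rpusPerDay: 30_000_000,
  wpusPerDay: 2_000_000,
  storageGb: 10,
});
console.log(serverlessEstimate.total.toFixed(2)); // prints 152.50
```

In this example reads account for about $90 of the total, which is why read volume tends to dominate a Serverless bill.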

Shared

Catering to learning and exploration, Shared provides three cluster tiers, ranging from 512 MB to 5 GB, each featuring shared CPU and RAM. The cost spectrum spans from free (M0, not recommended for production) to $25.

Dedicated

Choose from over 10 cluster tiers. Price, storage, RAM, and CPU vary per cloud provider and region. On AWS, tiers range from 10 GB storage, 2 GB RAM, and 2 vCPUs for $0.08/hr to 4000 GB storage, 768 GB RAM, and 96 vCPUs for $33.26/hr. Note that the hourly cost applies only to the hours your cluster is running. Additional costs apply for backups, data transfer, etc.
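
Because compute is billed per running hour, the monthly figure depends on how long the cluster is up. The sketch below uses the $0.08/hr rate quoted above and assumes a 30-day month. The "working hours only" schedule is a hypothetical development cluster, and other charges such as storage may still apply while a cluster is not running.

```javascript
/**
 * Monthly compute cost of a Dedicated cluster billed per running hour.
 * Backups, data transfer, and extra storage are billed separately.
 */
function dedicatedComputeCost(hourlyRate, hoursRunning) {
  if (hourlyRate < 0 || hoursRunning < 0) {
    throw new RangeError("hourlyRate and hoursRunning must be non-negative");
  }
  return hourlyRate * hoursRunning;
}

const HOURS_IN_30_DAYS = 24 * 30; // 720

// Smallest AWS tier quoted above, running all month.
const alwaysOn = dedicatedComputeCost(0.08, HOURS_IN_30_DAYS);

// Hypothetical development cluster that runs 10 hours on each of 22 working days.
const workHoursOnly = dedicatedComputeCost(0.08, 10 * 22);

console.log(alwaysOn.toFixed(2));      // prints 57.60
console.log(workHoursOnly.toFixed(2)); // prints 17.60
```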

Data Transfer Costs

Data transfer costs apply to Serverless and Dedicated, vary by provider, and are tiered. AWS has the simplest pricing:

Data Transfer Type    Cost per GB
Same region           $0.01
Cross-region          $0.02
Internet              $0.09

MongoDB AWS data transfer costs.
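
As a rough illustration of how much the route matters, the following sketch prices a transfer volume against the AWS rates in the table. The key names (sameRegion, crossRegion, internet) and the 500 GB volume are made up for the example.

```javascript
// AWS data transfer rates from the table above, in USD per GB.
const AWS_TRANSFER_RATE_PER_GB = {
  sameRegion: 0.01,
  crossRegion: 0.02,
  internet: 0.09,
};

/** Sum the transfer charge for a usage breakdown such as { internet: 500 } (values in GB). */
function awsTransferCost(gbByType) {
  let total = 0;
  for (const [type, gb] of Object.entries(gbByType)) {
    if (!Object.prototype.hasOwnProperty.call(AWS_TRANSFER_RATE_PER_GB, type)) {
      throw new Error(`Unknown transfer type: ${type}`);
    }
    total += gb * AWS_TRANSFER_RATE_PER_GB[type];
  }
  return total;
}

// Hypothetical month: the same 500 GB served over the internet vs. within the region.
console.log(awsTransferCost({ internet: 500 }).toFixed(2));   // prints 45.00
console.log(awsTransferCost({ sameRegion: 500 }).toFixed(2)); // prints 5.00
```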

Azure’s cost tiers (from lowest to highest) are as follows for data transfer: between Availability Zones, using in-region VNet peering, to a different region, for an Atlas cluster depending on the geographic location of the source node, and using cross-region VNet peering.

GCP’s cost tiers (from lowest to highest) are as follows for data transfer: between zones in the same region, between USA regions, between continents, between regions in the same continent other than the USA, and to a location outside of a Google Cloud data center (excluding incoming transfers).

Considerations by MongoDB Pricing Model

Serverless

Looking through this thread alone, you’ll see many customers who signed up for Serverless expecting low costs, only to face unexpectedly high bills, sometimes surpassing what they would have paid for Dedicated. Two things to consider: (1) make sure Serverless truly fits your use case, and (2) optimize your queries, which is especially important with Serverless pricing because charges are incurred based on reads.
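
One way to sanity-check the fit is to compute the read volume at which Serverless read charges alone match a Dedicated cluster. This is a back-of-the-envelope sketch using the rates quoted earlier ($0.08/hr for the smallest AWS Dedicated tier, $0.10 per million RPUs in the first tier). It assumes a 30-day month and ignores writes, storage, backups, and data transfer on both sides.

```javascript
/**
 * Daily RPU volume at which Serverless read charges alone equal the
 * monthly compute cost of a Dedicated cluster.
 */
function breakEvenRpusPerDay({ dedicatedHourlyRate, rpuPricePerMillion, days = 30 }) {
  const dedicatedMonthly = dedicatedHourlyRate * 24 * days;
  const millionsOfRpusPerMonth = dedicatedMonthly / rpuPricePerMillion;
  return (millionsOfRpusPerMonth * 1e6) / days;
}

const breakEven = breakEvenRpusPerDay({ dedicatedHourlyRate: 0.08, rpuPricePerMillion: 0.10 });
console.log(Math.round(breakEven)); // prints 19200000
```

Under these assumptions, a workload that consistently consumes more than about 19 million RPUs per day is already paying more for reads than the smallest Dedicated tier costs in compute.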

Unindexed Queries

There are several optimization best practices, and one of the most common causes of staggering costs is unindexed queries. A query that cannot use an index reads the entire collection, significantly increasing operations and associated costs. To put that into perspective, an example from MongoDB highlights how indexed queries led to a 99.8% reduction in RPUs. Use the following command to create an index in your cluster using the Atlas CLI:

atlas clusters indexes create [indexName] [options]
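
To find queries that need an index in the first place, you can inspect the output of explain(). The sketch below checks an explain result for a full collection scan and for a high examined-to-returned ratio. The sample object is hand-written for illustration ("orders" and "customerId" are hypothetical names), and the exact shape of explain output varies by server version.

```javascript
/** Walk an explain() winning plan and report whether any stage is a full collection scan. */
function usesCollectionScan(stage) {
  if (!stage) return false;
  if (stage.stage === "COLLSCAN") return true;
  const children = [stage.inputStage, ...(stage.inputStages ?? [])];
  return children.some((child) => usesCollectionScan(child));
}

/** Documents examined per document returned; values far above 1 suggest a missing index. */
function scanRatio(executionStats) {
  return executionStats.totalDocsExamined / Math.max(executionStats.nReturned, 1);
}

// Hand-written sample shaped like the output of
//   db.orders.find({ customerId: 42 }).explain("executionStats")
// The numbers are illustrative only.
const sampleExplain = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN", direction: "forward" } },
  executionStats: { nReturned: 12, totalDocsExamined: 250000 },
};

console.log(usesCollectionScan(sampleExplain.queryPlanner.winningPlan)); // prints true
console.log(Math.round(scanRatio(sampleExplain.executionStats)));        // prints 20833
```

In this sample, roughly 20,000 documents are read for every document returned. An index on the filtered field, for example db.orders.createIndex({ customerId: 1 }) in mongosh, would be the usual remedy.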

Schema Design

Another key aspect of optimization is schema design. Pitfalls occur especially when migrating from a relational database to MongoDB, where differences in data modeling best practices can impact performance. For example, MongoDB recommends utilizing their document model instead of joins to re-combine shallow documents. Schema design is a nuanced process that would be too intricate to fully capture in a blog, but luckily MongoDB provides a free course explaining best practices for schema design patterns.

Atlas Tools

Lastly, take advantage of built-in monitoring tools that identify areas for optimization and configure alerts for spikes in cost.

Dedicated Instances

The most obvious way to save with Dedicated Instances is by choosing the most cost-effective instance size that still meets the requirements of your workload.

Cluster Auto-Scaling

You can take some of the guesswork out of selecting an instance size by enabling cluster auto-scaling. With cluster auto-scaling, you set the minimum and maximum cluster tiers, and Atlas scales up or down based on CPU and memory utilization. To manage cluster auto-scaling in your settings, go to the Cluster tier section and navigate to Auto-scale options.

Optimization and Sharded Clusters

The optimization efforts covered in the previous section can also provide a cost benefit here. If query performance improves, the need for RAM and CPU may drop, making it feasible to consider a lower cluster tier. In addition, sharded clusters distribute compute and storage resources across multiple nodes. This distribution can optimize resource utilization and potentially allow you to scale down to a less expensive cluster tier. See the MongoDB docs for instructions on how to deploy a sharded cluster.

Low RAM or CPU Workloads

For workloads with either low CPU or RAM, additional configuration can save you money. For low RAM workloads, save money by downscaling your cluster to one with lower RAM, then add more storage. Similarly, for workloads that don’t use a lot of CPU, AWS has low CPU instance types. These instance types have similar RAM and storage to comparable cluster tiers at a lower price.

Save on MongoDB Data Transfer Costs

Data transfer costs apply to both Serverless and Dedicated Instances and often have the most opportunity for substantial savings.

Reduce Calls

One way to reduce calls is by using local caching to store frequently accessed data locally, minimizing the need to make repeated calls to the remote database. Not only can this improve response times, but it can reduce the overall volume of data transfer.
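
A minimal sketch of such a cache is shown below. It keeps values in memory for a fixed time-to-live and only calls the database loader on a miss or after expiry. The class name, the loader, and the injectable clock are all illustrative. A real loader would normally be asynchronous, in which case the cached value is a Promise and evicting failed requests would need extra handling that is left out here.

```javascript
/**
 * Minimal in-memory cache with a time-to-live. `loader` is whatever function
 * actually queries the database; `now` is injectable so expiry can be tested.
 */
class TtlCache {
  constructor(loader, ttlMs, now = () => Date.now()) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value; // served locally, no database call
    const value = this.loader(key); // stored as-is, even if it is a Promise
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Stand-in loader that counts how often the "database" is hit.
let dbCalls = 0;
let fakeTime = 0;
const profileCache = new TtlCache(
  (id) => { dbCalls += 1; return { id, name: "user-" + id }; },
  60_000,
  () => fakeTime
);

profileCache.get(7);
profileCache.get(7);  // within 60 s: served from the cache
fakeTime = 61_000;
profileCache.get(7);  // entry expired: the loader runs again
console.log(dbCalls); // prints 2
```

The trade-off is staleness: a longer time-to-live saves more calls but serves older data, so it suits values that change rarely.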

Reduce Data Retrieval

A MongoDB Reddit community member highlighted some helpful tips to reduce data retrieval. These include the addition of paginated responses to large calls and projecting data on a need-only basis.

Paginating responses can help you save on data transfer costs by fetching smaller portions of data at a time, thereby reducing the overall volume of data transferred over the network. To use pagination in MongoDB you can use the limit() method to “specify the maximum number of documents to return in a single query” and the skip() method to “specify the number of documents to skip before starting to return the data.”
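
As a small sketch, a page number can be translated into skip() and limit() arguments like this. The helper name and the collection in the comment are hypothetical.

```javascript
/** Translate a 1-based page number into the arguments for skip() and limit(). */
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

const { skip, limit } = pageWindow(3, 20);
console.log(skip, limit); // prints 40 20

// In mongosh this would be used as (collection and sort key are hypothetical):
//   db.orders.find({}).sort({ _id: 1 }).skip(skip).limit(limit)
```

One caveat: skip() reduces what is sent over the network, but the server still has to walk past the skipped documents. For deep pages, filtering on an indexed field (for example, _id greater than the last value seen) may be cheaper.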

Projecting data means specifying which fields or attributes of a document should be included or excluded, which reduces the volume and lowers costs. To implement, use the following syntax:

db.collection.find({},
  {
    field1: value1,  // 1 to include the field, 0 to exclude it
    field2: value2,
    ...
  }
)
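
To see why projection matters, the sketch below builds an inclusion projection and applies it to an in-memory document to compare payload sizes. Both helpers are illustrative: applyProjection only mimics the server for flat, top-level inclusion projections, and the sample document is made up.

```javascript
/** Build an inclusion projection such as { name: 1, email: 1, _id: 0 }. */
function includeOnly(fields, { keepId = false } = {}) {
  const projection = {};
  for (const field of fields) projection[field] = 1;
  if (!keepId) projection._id = 0; // _id is returned unless explicitly excluded
  return projection;
}

/** Mimic a flat inclusion projection on an in-memory document (top-level fields only). */
function applyProjection(doc, projection) {
  return Object.fromEntries(
    Object.entries(doc).filter(([key]) =>
      key === "_id" ? projection._id !== 0 : projection[key] === 1
    )
  );
}

const userDoc = { _id: 1, name: "Ada", email: "ada@example.com", bio: "x".repeat(2000) };
const userProjection = includeOnly(["name", "email"]);
const slimDoc = applyProjection(userDoc, userProjection);

console.log(JSON.stringify(userProjection)); // prints {"name":1,"email":1,"_id":0}
console.log(JSON.stringify(userDoc).length > 10 * JSON.stringify(slimDoc).length); // prints true
```

The projection object is what you would pass as the second argument to find(). Here, dropping one large field shrinks the serialized document by more than a factor of ten.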

Use Cheaper Data Transfer Tiers

Data transfer costs are tiered, and though they vary for providers, implementing strategies like using the same or closest region and utilizing private routes can help you save on data transfer costs. One way of ensuring this is being done is by setting your read preference to nearest. Private routes can be used to avoid high internet data transfer costs; however, costs may increase in other areas from your cloud provider.

Implement Network Compression

Finally, MongoDB has built-in network compression that is enabled by default in later versions. To ensure that it is configured, check the net.compression.compressors setting in your configuration file.
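
On the client side, both the nearest read preference mentioned in the previous section and network compression can be requested through connection string options (readPreference and compressors). The helper below is a hypothetical sketch that appends them to a placeholder URI. Which compressors are actually available depends on your driver and server, and some drivers need extra packages for snappy or zstd.

```javascript
/** Append options to a MongoDB connection string, respecting an existing query string. */
function withConnectionOptions(uri, options) {
  const pairs = Object.entries(options).map(
    ([key, value]) => `${key}=${Array.isArray(value) ? value.join(",") : value}`
  );
  if (pairs.length === 0) return uri;
  const separator = uri.includes("?") ? "&" : "?";
  return uri + separator + pairs.join("&");
}

// Placeholder URI: substitute your own cluster host and credentials.
const baseUri = "mongodb+srv://user:pass@cluster0.example.mongodb.net/app";
const tunedUri = withConnectionOptions(baseUri, {
  readPreference: "nearest",
  compressors: ["zstd", "snappy", "zlib"],
});
console.log(tunedUri);
// prints mongodb+srv://user:pass@cluster0.example.mongodb.net/app?readPreference=nearest&compressors=zstd,snappy,zlib
```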

Save on MongoDB Data Storage Costs

Optimizing storage can lower costs, whether by reducing the amount of GB that you are charged for with Serverless, scaling down cluster tiers with Dedicated Instances, or reducing added storage with Dedicated Instances.

Automated Data Tiering

One cost-saving option for automated data tiering is MongoDB Atlas Online Archive (for Dedicated Instances M10 or higher). This automated tiering tool can be configured to automatically archive data to managed object storage, while still allowing you to query your data.

Reclaiming Disk Space

In cases where large amounts of data are archived or deleted, you may find that the disk space is not immediately released. To remedy this, use this command:

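// Run against the database that owns the collection.
db.runCommand(
  {
    compact: "<collection name>"  // the collection name is passed as a string
  }
)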
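
Before running compact, it can help to check how much space is actually reclaimable. The sketch below assumes a stats object shaped like the output of db.<collection>.stats(), with storageSize and freeStorageSize fields in bytes (freeStorageSize may be absent on older server versions). The sample numbers are illustrative.

```javascript
/**
 * Share of a collection's allocated storage that is free for reuse,
 * given an object shaped like the output of db.<collection>.stats().
 */
function reclaimableShare(stats) {
  if (!stats.storageSize) return 0;
  return (stats.freeStorageSize ?? 0) / stats.storageSize;
}

// Illustrative numbers, in bytes.
const sampleStats = { storageSize: 8_000_000_000, freeStorageSize: 6_000_000_000 };
console.log(reclaimableShare(sampleStats)); // prints 0.75
```

A high share, as in this sample, suggests compact is likely to release a meaningful amount of disk. A low share means the effort is probably not worth it.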

Storage Compression

Similar to network compression, MongoDB offers storage compression, enabled by default with WiredTiger (the default MongoDB storage engine). Check the storage.wiredTiger.collectionConfig.blockCompressor setting to ensure it is enabled.

Conclusion

MongoDB cost optimization requires efforts across several categories. Reviewing the MongoDB pricing model serves as a reminder of the significance of choosing the right tier and understanding the associated costs. Serverless, in particular, demands careful consideration and query optimization to avoid unexpected expenses. With Dedicated Instances, save by choosing the correct instance size for your needs and scaling down when possible.

Data transfer costs present a significant opportunity for savings through various strategies, including local caching, paginated responses, and private routes. Storage cost savings are achievable through automated data tiering, disk space reclamation, and storage compression. By understanding how MongoDB is priced, and how to get visibility into your own bill, you can leverage these areas to put your MongoDB costs on a sustainable path.