Future-Proofing Your Tagging Strategy

A company's tagging strategy is both the foundation of every FinOps strategy and the component that is in continual refinement. Tags themselves are fairly simple: key/value pairs that can be applied to a variety of cloud infrastructure resources.

Complexities arise with the questions of when and how to apply these tags. Are tags applied when a resource is created, or at a later point? Can tags be applied at resource creation? Are specific tags mandatory? How is that requirement enforced? Is there a company-wide strategy for tagging, or is each team applying tags independently?

From an accounting perspective, tags are used to allocate costs to different teams or products. But outside of FinOps, tags are also important for functions like managing cloud environments, measuring performance, and maintaining reliability and security.

The limitations of a company's existing tagging strategy usually become glaringly apparent when trying to build a robust FinOps practice. Every untagged resource and every typo adds to the percentage of unallocated spend.

It's then in the best interest of the FinOps Practitioner to develop a new company-wide tagging strategy and orchestrate a rollout.

These are the things you need to consider:

Tag Value Compatibility

When developing a tagging strategy, you want to make your naming scheme as future-proof as possible. This comes down to two main considerations:

Compatibility with all major cloud providers
Tag keys and values that communicate clearly

Currently, the big three cloud providers are AWS, Azure, and GCP. Even if your company is only operating in one of these clouds, you may want to consider tag keys and values that are compatible across all three providers. This gives you the ability to launch into a new cloud without needing to re-think the tagging strategy, and makes life easier when using a multi-cloud cost visibility tool, like Vantage.

Attribute	AWS	Azure	GCP	Constraining Factor
Max Key Length	128	512	63	63
Max Value Length	256	256	63	63
Allowed Characters	`a-z, 0-9, +-=,_:/@`	`a-z, 0-9, _,-`	`a-z, 0-9, _,-`	`a-z, 0-9, _,-`
Case sensitive	Yes	No	Yes, lowercase only	Yes, lowercase only
Reserved Tags	aws:	N/A	N/A	N/A
Max Tags	50	50	64	50

Comparing the tag value requirements of AWS, Azure, and GCP.

Each cloud provider has slightly different requirements when it comes to tags, such as the max length of the tag, allowed characters, case sensitivity, and the max number of tags allowed on a single resource.

When planning your tags, you should design for the most constraining factor. For example, Azure has a max tag value length of 256, while GCP allows only 63. If you keep all your tag values to 63 characters or fewer, even if you're not on GCP, you ensure future compatibility if needed. Plus, 63 characters is still a lot to work with.

More important are special characters. Amazon is the most permissive, allowing a-z, 0-9, +-=,_:/@, while both Azure and GCP only support a-z, 0-9, _,-. As with tag length, you should plan tags around the most restrictive rules, even if you're currently only operating in AWS.
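The "design for the most constraining factor" rule can be checked mechanically. The following is a minimal sketch, assuming the limits from the Constraining Factor column of the table above; the function name `validate_tags` and the choice to report empty values as problems are assumptions made for illustration, not part of any provider's tooling.

```python
import re

# Limits taken from the "Constraining Factor" column of the comparison table.
MAX_KEY_LENGTH = 63
MAX_VALUE_LENGTH = 63
MAX_TAGS_PER_RESOURCE = 50
# Lowercase letters, digits, underscore, and hyphen only. Requiring at least
# one character (so empty strings are reported) is an assumption of this sketch.
ALLOWED_PATTERN = re.compile(r"[a-z0-9_-]+")
RESERVED_PREFIXES = ("aws:",)


def validate_tags(tags):
    """Return a list of problems; an empty list means the tags fit all three clouds."""
    problems = []
    if len(tags) > MAX_TAGS_PER_RESOURCE:
        problems.append(f"{len(tags)} tags exceeds the limit of {MAX_TAGS_PER_RESOURCE}")
    for key, value in tags.items():
        if key.startswith(RESERVED_PREFIXES):
            problems.append(f"key {key!r} uses a reserved prefix")
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key!r} is longer than {MAX_KEY_LENGTH} characters")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value for {key!r} is longer than {MAX_VALUE_LENGTH} characters")
        if not ALLOWED_PATTERN.fullmatch(key):
            problems.append(f"key {key!r} has characters outside a-z, 0-9, _ and -")
        if not ALLOWED_PATTERN.fullmatch(value):
            problems.append(f"value {value!r} has characters outside a-z, 0-9, _ and -")
    return problems


if __name__ == "__main__":
    # Uppercase letters and the space are both rejected under the strictest rules.
    for problem in validate_tags({"Team": "Data Eng"}):
        print(problem)
```

A check like this could run in a pre-commit hook or a CI step, so that incompatible tags are caught before they reach any cloud.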

Secondly, make sure you use tag keys and values that clearly communicate the owner or function of the resource.

If tagging a resource with that resource's owner, you should avoid naming individuals and instead use teams or groups as the tag value. Individuals often move between teams, or leave companies, so using a group's name here greatly increases the value's longevity.

Sometimes companies will use "fun" names to refer to internal functions or microservices. Maybe early employees of the company were fans of a particular TV series or book and all apps are named after characters or locations.

People will often use these fun-names as tag values on resources—after all, it is the actual name of the service.

But if you're re-designing a tagging strategy and are considering using fun-names as tag values, understand the cost of this decision. For one, it can take longer for new employees to onboard, and even for existing employees to understand the scope of the company, depending on how many different services with fun-names you have.

Additionally, consider whether the company may want to move away from these fun-names at any point in the future. If the answer is "maybe," you may want to consider making the change now, because that switch will only get more difficult with time.

What Tags to Use

Individual resources can have numerous tags. AWS and Azure both max out at 50 tags per resource, and GCP bumps that limit up to 64.

Tags are important outside of accounting—engineers use tags to manage resources and measure performance. This means that teams will likely already be using tags in some form or another when it comes time to build out a FinOps practice. The question is: can you build a full FinOps practice with the tags currently in use for cloud management? Or do you need a new set of tags?

You may want to consider a mix of required tags, optional tags, and flexible team-specific tags. This can allow teams to continue using tags as usual, without impacting existing workflows, while adding tags that give the FinOps team greater visibility into the company’s overall spend.

From an accounting perspective, the most important tag is cost center: to which department or group should these costs be attributed? At a bare minimum, this should be a required tag.

After that, there are additional tags that can be extremely useful, which you may want to consider making required, or at least strongly encouraged. These can include:

Team: To what group does this resource belong? There can be multiple teams in a cost center.
Environment: Is this production, staging, sandbox, etc.?
Service: Does this resource perform a named function?

Additionally, there are more tags that can be useful in a FinOps capacity but might not be requirements:

Geography
Research and Development (R&D)
Project

Each company should make decisions around which FinOps-related tags are required, which are strongly suggested, and which are optional. Even with five or six tag keys reserved for FinOps and accounting purposes, that leaves plenty of room for additional team-specific tags that engineers can use for managing their environment.
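The three tiers can be written down as data and checked against a resource's tags. This is a sketch only: the key spellings (`cost-center`, `r-and-d`, and so on) and the split between tiers are illustrative assumptions based on the tags listed above, and each company would substitute its own.

```python
# Hypothetical tiers; the key names are assumptions chosen for this example.
REQUIRED_KEYS = {"cost-center"}
RECOMMENDED_KEYS = {"team", "environment", "service"}
OPTIONAL_KEYS = {"geography", "r-and-d", "project"}


def review_tags(tags):
    """Summarize how a resource's tags line up with the FinOps tag tiers."""
    present = set(tags)
    finops_keys = REQUIRED_KEYS | RECOMMENDED_KEYS | OPTIONAL_KEYS
    return {
        # Missing required keys should block or flag the resource.
        "missing_required": sorted(REQUIRED_KEYS - present),
        # Missing recommended keys are worth a nudge, not a failure.
        "missing_recommended": sorted(RECOMMENDED_KEYS - present),
        # Anything else is a team-specific tag and is left alone.
        "team_specific": sorted(present - finops_keys),
    }


if __name__ == "__main__":
    report = review_tags(
        {"cost-center": "platform", "team": "search", "oncall-rotation": "search-primary"}
    )
    print(report)
```

Keeping team-specific tags in their own bucket, rather than rejecting them, reflects the goal of letting engineers keep their existing workflows.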

Remediation vs. Enforcement

If you’re going to make certain tags required, you also need to decide what it means to enforce that requirement. This decision is particularly important if you’re rolling out a new tagging policy in an environment that’s already using various tags in a non-uniform way.

There are two basic strategies that you can take when enforcing new tag requirements: remediation or enforcement.

Remediation

A remediation strategy involves fixing mistakes and incongruities after the fact. This is often attempted in environments with almost usable tagging. Maybe 60% to 70% of the environment has tags, and those tags are used inconsistently and are full of typos. Remediation involves identifying the untagged resources, getting them tagged, and fixing all the mistakes along the way.

This can be a grueling process since it often involves partnering with so many different teams. Different teams may tag resources in different ways, which means each of these tag omissions or mistakes might be present in code in a variety of different places. For a centralized FinOps Practitioner or team, it would be nearly impossible to hunt down each of these locations and track every change. Therefore, the onus is put on individual engineering teams to both find the mistakes and fix them, which can easily become a months- or quarters-long task that never reaches 100% completion.

Despite these challenges, some level of remediation can still be worthwhile when tackling tagging changes. Don’t try to fix all the issues. Instead, prioritize them in order of cost, so the highest-spending resources get the first fixes.

When deciding how much effort to put into remediation, you need to consider the amount of effort weighed against the value of having this corrected data. Does the company want to put $100 of effort into fixing a miscategorization of $5? Most likely not.
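The two rules above (fix the most expensive resources first, and skip fixes that cost more than they recover) can be sketched as a small sorting step. The `UntaggedResource` fields, the effort estimates, and the `horizon_months` parameter are all hypothetical; in practice the cost figures would come from a billing export and the effort figures from the owning teams.

```python
from dataclasses import dataclass


@dataclass
class UntaggedResource:
    resource_id: str
    monthly_cost: float     # spend that currently lands in "unallocated"
    fix_effort_cost: float  # rough estimate of the engineering cost to fix it


def remediation_queue(resources, horizon_months=12):
    """Return resources worth fixing, highest monthly spend first.

    A resource is kept only if the spend it would correctly allocate over
    the horizon exceeds the estimated cost of fixing its tags.
    """
    worth_fixing = [
        r for r in resources
        if r.monthly_cost * horizon_months > r.fix_effort_cost
    ]
    return sorted(worth_fixing, key=lambda r: r.monthly_cost, reverse=True)


if __name__ == "__main__":
    candidates = [
        UntaggedResource("db-1", monthly_cost=4000.0, fix_effort_cost=500.0),
        UntaggedResource("bucket-7", monthly_cost=5.0, fix_effort_cost=100.0),
        UntaggedResource("cluster-2", monthly_cost=12000.0, fix_effort_cost=2000.0),
    ]
    # With a one-month horizon, the $5 bucket is not worth $100 of effort.
    for r in remediation_queue(candidates, horizon_months=1):
        print(r.resource_id, r.monthly_cost)
```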

Enforcement

Alternatively, there’s enforcement. This is where a company sets a global rule that prevents resources that lack certain tags from even launching. This is a clean way to make sure that all the resources running in an environment have a specific base level of tagging visibility.

The ideal time to add a tag is at resource creation, especially in environments managed by Infrastructure as Code. Create AWS service control policies (SCPs) or an Azure Policy that checks that these tags exist before allowing the infrastructure to deploy.
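As one illustration of enforcement at creation time, the sketch below builds an AWS service control policy document that denies `ec2:RunInstances` when a given tag is missing from the request. It follows the commonly documented pattern of a `Null` condition on `aws:RequestTag/<key>`. The helper name `build_require_tag_scp` and the `cost-center` key are assumptions for this example, and any such policy should be tried in a test organizational unit before being attached broadly.

```python
import json


def build_require_tag_scp(tag_key):
    """Build an SCP document denying EC2 instance launches that omit tag_key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRunInstancesWithoutTag",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # The Null condition is true when the tag is absent from the request.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before being attached anywhere.
    print(json.dumps(build_require_tag_scp("cost-center"), indent=2))
```

This covers only one action on one service. A real rollout would need a statement per resource type that matters for cost allocation.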

If this isn’t possible, the next best time to add a tag is immediately after resource creation, such as with a Lambda function that watches for new infrastructure and applies the appropriate tags. This is also a good solution for resources that do not allow users to apply tags at the time of creation, such as an EBS volume attached to an EC2 instance that was created via a CI/CD pipeline.
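A tag-after-creation Lambda might look roughly like the sketch below. It assumes the function is triggered by an EventBridge rule matching CloudTrail `RunInstances` events, and that the new instance IDs appear under `detail.responseElements.instancesSet.items`; verify that shape against real events in your account. `DEFAULT_TAGS` is a hypothetical placeholder, since the logic for choosing the appropriate tags is specific to each company. The only AWS call used is the EC2 client's `create_tags`.

```python
# Hypothetical placeholder; a real function would look up the right values.
DEFAULT_TAGS = {"cost-center": "unassigned"}


def extract_instance_ids(event):
    """Pull new instance IDs out of an (assumed) CloudTrail RunInstances event."""
    detail = event.get("detail") or {}
    # responseElements can be null, for example when the API call failed.
    response = detail.get("responseElements") or {}
    items = (response.get("instancesSet") or {}).get("items") or []
    return [item["instanceId"] for item in items if "instanceId" in item]


def tag_new_instances(event, ec2_client, tags):
    """Apply tags to every instance named in the event; return the IDs tagged."""
    instance_ids = extract_instance_ids(event)
    if instance_ids:
        ec2_client.create_tags(
            Resources=instance_ids,
            Tags=[{"Key": key, "Value": value} for key, value in tags.items()],
        )
    return instance_ids


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be tested without AWS.
    import boto3

    return tag_new_instances(event, boto3.client("ec2"), DEFAULT_TAGS)
```

Passing the client in as an argument keeps the tagging logic testable with a stand-in object. The same approach could be extended to attached EBS volumes by adding their volume IDs to `Resources`.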

With enforcement, the best place to start is with new resources. With the support of senior leadership, choose a date, communicate the plan clearly, and begin enforcement of tags.

Enforcement can also pair nicely with remediation. Once an enforcement plan is in place, a next step could be to find the highest-priority remediation targets and partner with the owning engineering team to get these resolved.

Getting Engineers to Take Action

Perhaps most crucial to a successful rollout of a new tagging strategy is the support of senior leadership. Priorities always work from the top down, and the backing of one or more executive sponsors will make it much easier to implement change and get engineers to take action.

Additionally, communication is key. Proper and thorough documentation should be created and posted on internal wikis. A series of pre-rollout meetings can also be hosted, where all impacted employees are invited and have a chance to ask questions and review the upcoming changes.

It may feel like you’re repeating a lot of the same information when leading up to the enforcement deadline, but this is the only way to ensure the changes are known by the highest number of employees. Each person will have a different way of working and consuming information, and repeating yourself in several different ways (through meeting invites, wiki pages, emails, and even direct messages) can really help make the rollout a success.

It’s a huge undertaking to roll out an updated tagging strategy, but it can have huge upsides for a company and its level of financial visibility. It’s important to create a naming strategy that’s future-proof and to structure your rollout for the highest chance of success.