Navigating a shady cloud bill – here are seven red flags, and how to avoid them

Following the global shift to the cloud in recent years, the amount of corporate data stored in the cloud has at least doubled since 2015. Yet many companies lack the know-how to use cloud platforms cost-effectively, and the result is spiralling, unnecessary cloud costs.

Some cloud bill ‘red flags’ are easy to spot, like inadequately provisioned resources, misconfigured services or unexpected charges and fees. However, many slip under the radar, taking companies by surprise and coming back to bite them when the bill arrives.

Here are seven all-too-common red flags to look out for if you want to get the most out of the cloud without overspending:

  1. Paying for AWS CloudTrail
  2. Not applying appropriate object lifecycle policies
  3. Being oblivious to third-party API calls
  4. Over-logging logging services
  5. Not cross-checking decisions
  6. Failing to limit regions and instance types with organisational policies 
  7. Excessive API calls to storage buckets

Paying for AWS CloudTrail

Firstly, if you find yourself paying for CloudTrail management events at all, that usually means you have extra trails you need to remove. The first copy of management events delivered in each region is free, and your only trail should be at the AWS Organization level, since Organization trails are automatically applied to all member accounts within an Organization anyway.

Since trails propagate into all member accounts, this will also help you to consistently apply and enforce your event logging strategy across accounts. You should therefore check that the configuration of your Organization trails matches how you would like the trails configured for all accounts within it.
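As a starting point for finding extra trails, the sketch below filters a list of trail descriptions for anything that is not an Organization trail. It assumes the input looks like the `trailList` field returned by CloudTrail's DescribeTrails call (for example via boto3's `describe_trails()`), and only reads the `Name` and `IsOrganizationTrail` fields; the function name and sample data are hypothetical. Treat the output as candidates for review rather than automatic deletion, since a trail may exist for a reason such as data-event logging.

```python
from typing import Any


def find_non_organization_trails(trails: list[dict[str, Any]]) -> list[str]:
    """Return names of trails that are not Organization trails.

    `trails` is assumed to be shaped like the "trailList" of a DescribeTrails
    response. Trails without an "IsOrganizationTrail" flag are treated as
    ordinary (non-Organization) trails.
    """
    return [t["Name"] for t in trails if not t.get("IsOrganizationTrail", False)]


# Hypothetical example input.
sample_trails = [
    {"Name": "org-trail", "IsOrganizationTrail": True},
    {"Name": "legacy-audit-trail", "IsOrganizationTrail": False},
    {"Name": "team-dev-trail"},
]
print(find_non_organization_trails(sample_trails))
```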

Not applying appropriate object lifecycle policies 

If cloud storage costs are steadily increasing over time, you may need to rethink your object lifecycle policies.

The purpose of these policies is to ensure that you’re not overspending by storing data that doesn’t require immediate access, or that has become obsolete. Without them, you will end up with an overwhelming, ever-growing log store, and/or excess snapshots, and your storage costs will increase as a result.

In most cases, objects can be transitioned or expired after thirty to ninety days, so it is worth taking a closer look if you see costs continuously rising.
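To make the thirty-to-ninety-day guideline concrete, here is a minimal sketch of an Amazon S3 lifecycle rule that moves objects under a prefix to the Standard-IA storage class after 30 days and deletes them after 90. It only builds the configuration as a Python dict, in the shape boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the `logs/` prefix, the rule ID scheme and the helper name are assumptions for illustration.

```python
def build_lifecycle_rule(
    prefix: str,
    transition_days: int = 30,
    expiration_days: int = 90,
    storage_class: str = "STANDARD_IA",
) -> dict:
    """Build one S3 lifecycle rule: transition after N days, expire after M days."""
    if expiration_days <= transition_days:
        raise ValueError("expiration_days must be later than transition_days")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},  # only objects under this prefix
        "Status": "Enabled",
        "Transitions": [{"Days": transition_days, "StorageClass": storage_class}],
        "Expiration": {"Days": expiration_days},  # delete after this many days
    }


# Hypothetical configuration for a bucket that stores application logs.
lifecycle_configuration = {"Rules": [build_lifecycle_rule("logs/")]}
```

S3 requires objects to stay at least 30 days in Standard before transitioning to Standard-IA, which is why 30 days is used as the default transition here.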

Being oblivious to third-party API calls 

If you’re not careful, third-party services like New Relic and Datadog can quickly become a significant drain on your cloud budget.

Although these services offer real benefits for observability and more, the API requests they make to pull your usage data aren’t free, and you are the one paying for them. It’s worth looking into how often these calls are made and which metrics are actually being scanned, to see whether that frequency and granularity of data are necessary for what you need.

Luckily, this one is a pretty easy fix: if you’re concerned, simply speak to your third-party service provider about modifying the frequency and metrics being pulled for your projects.
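To see why polling frequency matters, the back-of-the-envelope sketch below estimates monthly request volume and cost for an integration that polls a set of metrics at a fixed interval. It assumes one API request per metric per poll (real integrations often batch several metrics into one request), a 30-day month, and a placeholder price per 1,000 requests; all of these, and the function name, are illustrative assumptions, so check your provider's current pricing before relying on the numbers.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assume a 30-day month for simplicity

# Hypothetical placeholder price in dollars; not a quoted rate.
PRICE_PER_1000_REQUESTS = 0.01


def monthly_polling_estimate(
    metrics: int, interval_seconds: int, price_per_1000: float
) -> tuple[int, float]:
    """Estimate (requests per month, cost per month) for fixed-interval polling.

    Assumes one request per metric per poll.
    """
    polls_per_month = SECONDS_PER_MONTH // interval_seconds
    requests = metrics * polls_per_month
    cost = round(requests / 1000 * price_per_1000, 2)
    return requests, cost


for interval in (60, 300):
    requests, cost = monthly_polling_estimate(500, interval, PRICE_PER_1000_REQUESTS)
    print(f"every {interval:>3}s: {requests:,} requests, about ${cost:,.2f}/month")
```

Under these assumptions, moving 500 metrics from one-minute to five-minute polling cuts the volume fivefold, from 21.6 million to 4.32 million requests a month.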

Over-logging logging services

There’s no way around the fact that logging is essential, especially for monitoring and troubleshooting. But if you’re spending more than 20% of your cloud bill on it, then something is going wrong.
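A quick way to apply the 20% rule of thumb is to group a month's spend by service and compute the logging share. The sketch below assumes you already have per-service costs (for example, exported from a cost and usage report); the service names, amounts and the choice of which services count as "logging" are hypothetical.

```python
LOGGING_SHARE_THRESHOLD = 0.20  # the 20% rule of thumb from this article


def logging_share(costs_by_service: dict[str, float], logging_services: set[str]) -> float:
    """Return the fraction of total spend that goes to the given logging services."""
    total = sum(costs_by_service.values())
    if total == 0:
        return 0.0
    logging_cost = sum(
        cost for service, cost in costs_by_service.items() if service in logging_services
    )
    return logging_cost / total


# Hypothetical monthly spend in dollars, grouped by service.
monthly_costs = {"CloudWatch": 2_500.0, "EC2": 6_000.0, "S3": 1_500.0}
share = logging_share(monthly_costs, {"CloudWatch"})
if share > LOGGING_SHARE_THRESHOLD:
    print(f"Logging is {share:.0%} of spend; worth a closer look")
```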

As with third-party API calls, ask yourself how you’re acting on those logs and whether the frequency you’re logging at is necessary for the given workload. Talk to the teams using these logs, and you’ll be able to work out how to tweak them to lower your bill. For example, if you’re feeding a dashboard with data from logs, you don’t necessarily need per-second updates; an update every five minutes may suffice.
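The dashboard example can be illustrated with a simple downsampling step: instead of storing or shipping every per-second sample, average them into five-minute buckets. This is a minimal sketch assuming samples arrive as `(unix_seconds, value)` pairs; the function name and the synthetic data are hypothetical, and averaging is only one choice (a dashboard tracking peaks might want the maximum instead).

```python
from collections import defaultdict


def downsample(points, bucket_seconds=300):
    """Average (unix_seconds, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) pairs sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# One day of synthetic per-second samples.
per_second = [(t, float(t % 60)) for t in range(86_400)]
five_minute = downsample(per_second)
print(len(per_second), "->", len(five_minute))  # prints: 86400 -> 288
```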

Not cross-checking decisions

There’s no right or wrong decision when it comes to building in the cloud. However, making cloud-related decisions in a vacuum can lead to inefficiencies and increased costs.

To achieve maximum efficiency and optimal costs, one person should not be making decisions alone.

Check with peers, and across teams, that decisions are sound. For example, make sure that cloud infrastructure decisions are aligned with your engineering strategy and vision, or with the relevant RFC/ADR documents.

Although this isn’t a red flag that you’ll be able to spot on your bill or in a cost and usage report, failing to cross-check decisions has the potential to grow into headaches and performance problems down the road.

Failing to limit regions and instance types with organisational policies 

Organisational policies help define how your cloud users can access, use and manage company cloud resources. Without them, you expose your cloud infrastructure to security and spending pitfalls, leaving room for individuals to deploy instances in unused regions or even carry out malicious activities.

That’s why you should enable regional restrictions across your accounts and instance-type restrictions in non-prod environments.
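On AWS, one way to express both restrictions is with service control policies (SCPs) in AWS Organizations. The sketch below builds two deny-style policies as Python dicts and prints them as JSON: one that denies API calls outside an allow-list of regions (using the `aws:RequestedRegion` condition key), and one that denies launching EC2 instances whose type is not allow-listed (using `ec2:InstanceType`). The region and instance-type lists, the helper names and the list of exempted global services are assumptions; the exemption list is illustrative rather than exhaustive, so compare it with AWS's published SCP examples before attaching anything to your organisation.

```python
import json

POLICY_VERSION = "2012-10-17"

# Illustrative, not exhaustive: global services exempted so that the
# region condition below does not block them.
GLOBAL_SERVICE_ACTIONS = [
    "iam:*", "organizations:*", "route53:*", "cloudfront:*", "sts:*", "support:*",
]


def region_restriction_scp(allowed_regions: list[str]) -> dict:
    """Deny (non-global) API calls made outside the allowed regions."""
    return {
        "Version": POLICY_VERSION,
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": allowed_regions}},
            }
        ],
    }


def instance_type_scp(allowed_types: list[str]) -> dict:
    """Deny launching EC2 instances whose type is not on the allow-list."""
    return {
        "Version": POLICY_VERSION,
        "Statement": [
            {
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {"StringNotEquals": {"ec2:InstanceType": allowed_types}},
            }
        ],
    }


# Hypothetical allow-lists: regions for every account, instance types for non-prod.
print(json.dumps(region_restriction_scp(["eu-west-1", "eu-west-2"]), indent=2))
print(json.dumps(instance_type_scp(["t3.micro", "t3.small"]), indent=2))
```

Following the article's advice, the region policy would be attached broadly (for example at the organisation root), while the instance-type policy would be attached only to non-prod organisational units.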

Implementing organisational policies strengthens the safeguarding of your cloud environment and ensures that nobody can spin up resources in unapproved regions or of unapproved instance types, whether maliciously or by accident, which keeps resource utilisation in check.

Excessive API calls to storage buckets

It’s not unusual for applications to make frequent API calls to cloud storage buckets, but this can quickly become problematic. Because storage providers typically charge per request, a high cadence of calls can inflate storage costs, and it can also hurt performance by causing slowdowns, timeouts and, in a worst-case scenario, service outages.

To address this issue, monitor and optimise API calls to storage buckets. Consider implementing caching to reduce the need for repeated calls; this will help you minimise costs and improve performance.
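A minimal version of the caching idea is a time-to-live (TTL) cache in front of whatever function fetches an object from the bucket: repeated reads of the same key within the TTL are served from memory instead of triggering another storage API call. In this sketch the `loader` callable stands in for your real fetch (for example, a wrapper around an S3 GetObject call), and the class name, 60-second default and injectable clock are assumptions for illustration. It does not bound memory or handle concurrency, which a production cache would need.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Cache loader results per key for ttl_seconds to avoid repeated API calls."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # the expensive call, e.g. a storage GET
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh cache hit: no API call
        value = self._loader(key)  # miss or expired: call the backend once
        self._entries[key] = (now, value)
        return value
```

Usage: wrap the fetch once, for example `cache = TTLCache(fetch_object, ttl_seconds=60)` with your own hypothetical `fetch_object`, then call `cache.get("config/app.json")` wherever the application previously read the object directly. The right TTL depends on how stale the data is allowed to be.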

Regularly review your application’s API call patterns and assess whether they can be optimised. Look for opportunities to reduce the frequency of calls or consolidate multiple calls into a single request. This can be achieved through batch processing or leveraging cloud-native features like event-driven architectures or serverless computing.
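As one concrete example of consolidating calls, Amazon S3's DeleteObjects operation accepts up to 1,000 keys in a single request, so deleting 2,500 objects can take 3 requests instead of 2,500. The sketch below only groups keys into request payloads shaped like the `Delete` argument of boto3's `delete_objects`; it makes no API calls itself, and the helper name is hypothetical. Other batch-capable APIs have their own limits, so check the relevant documentation.

```python
S3_DELETE_OBJECTS_MAX_KEYS = 1000  # per-request key limit for S3 DeleteObjects


def batch_delete_payloads(keys: list[str], batch_size: int = S3_DELETE_OBJECTS_MAX_KEYS) -> list[dict]:
    """Group object keys into DeleteObjects-sized payloads (one per request)."""
    if not 1 <= batch_size <= S3_DELETE_OBJECTS_MAX_KEYS:
        raise ValueError("batch_size must be between 1 and 1000")
    return [
        {"Objects": [{"Key": key} for key in keys[i:i + batch_size]], "Quiet": True}
        for i in range(0, len(keys), batch_size)
    ]


# Hypothetical keys for 2,500 expired export files.
keys = [f"exports/file-{n:04d}.csv" for n in range(2_500)]
payloads = batch_delete_payloads(keys)
print(f"{len(keys)} deletions in {len(payloads)} requests")
```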

By reducing excess API calls, you can not only save on cloud costs but also enhance the overall efficiency and reliability of your application.

Keeping an eye out for these red flags and implementing the necessary optimisations can go a long way towards maximising cloud productivity while minimising costs. Remember to regularly review your cloud spend, involve the entire team in identifying optimisation opportunities, and cultivate a cost optimisation culture within your organisation. By staying proactive and vigilant, you can avoid hidden costs and ensure that your cloud bill reflects the true value you receive from your cloud services.


Matan Bordo

Matan Bordo got his start working for a VC fund and has since become a Product Marketing Manager at DoiT. He has contributed to TechFinitive under our Opinions section.
