Cloud costs are spiraling out of control at companies of all sizes. Here’s how to keep your cloud infrastructure costs from handcuffing your business.
I hope you’re hugely successful in your business!
But when you are, you may share the experience many of us have had: watching costs grow out of control. It’s usually the right call to worry about those costs after you have traction (and have scaled your tech).
But I want to talk a bit about architectural decisions you can make up front to minimize some of the growing pains later.
Cloud usage is one of the biggest costs for modern businesses, and it is often made worse when companies get trapped in “Cloud Jail.”
How do you wind up in Cloud Jail?
It’s a story as old as the public clouds. A startup gets a $500K credit to set up its infrastructure in the cloud. “Excellent,” they think, “that’ll last us at least a year, and then either we won’t get traction, or we’ll have enough revenue to cover the costs as they increase.”
It starts out great; they’re probably spending around $20K monthly on infrastructure. But as they grow, so does their cloud bill. It becomes $50K a month, then $100K, then $250K. Suddenly, they’re out of credit, and the bill keeps climbing.
Whether it’s the CFO, exec team, or the board, there’s usually an alarm point where people say, “Wait. What happened?! We were supposed to spend most of our expenses on people, and now you’re pouring it all into the cloud!”
A team is formed, and the company scrambles as fast as possible to get those cloud costs down. They optimize their database-as-a-service use, buy spot or reserved instances, find and kill needless network data transfer, tune their instance types, and track down and delete the unused object and block storage, bringing their costs down to a more manageable $70K per month.
The problem now, though, is that they’re still growing. If they grow to plan, they’ll be blowing past the water line again in no time. To make things worse, the optimizations they made only hold at current levels of cloud usage; as the company grows, its costs will climb to the next level. In the post-2021 world of suddenly not-so-free money, that can threaten the company’s ability to operate efficiently at scale.
A Cloud Jail example
For example, once upon a time (in 2008), a company specializing in video encoding and streaming approached me with a staggering $300,000/month cloud bill that grew every month. The bill was eating into their margins, causing them to lose money as they grew. Together, we moved 500 TB of data and 10 Gbps of streaming traffic from their public cloud provider to their own infrastructure.
The result? Their bill dropped to under $100,000/month, including the salaries of the staff who managed their physical infrastructure and routers. Over the next few years, it grew back to $250,000/month all-in as they scaled 5x; had they stayed entirely in the cloud, the bill would easily have been well over $1,000,000/month.
After this roller-coaster ride, they told me they wished they’d invested time from the start in setting up a hybrid of cloud infrastructure and their own servers, or at least a multi-cloud system.
They’d have avoided the trap that left them struggling to migrate as they scaled. In today’s lingo, they’d fallen into “Cloud Jail,” utterly dependent on their cloud provider and finding it difficult and expensive to break free.
The hooks: the industry leaders are exceptionally skilled at keeping customers hooked with features like user identity, authentication, queuing, email, notifications, and seamless databases. These lightweight services save time, but only as long as you stay on those platforms. Their magic lies in making it nearly impossible for customers to leave, despite escalating costs for storage and bandwidth.
Then, the dreaded call from the board comes, questioning why your gross margin is less than 40% and why you’re spending more on infrastructure than developers. You try to explain that costs should’ve decreased as you grew, but that’s not what’s happening. In today’s competitive VC market, these explanations won’t suffice.
While hybrid and multi-cloud use is on the rise, very few of the companies I see are moving completely away from public cloud infrastructure and services.
How to avoid Cloud Jail
First, go into infrastructure planning with a clear understanding that the cloud isn’t always the cheapest or most efficient option just because it’s “cloud,” especially for always-on infrastructure.
It’s crucial to have a plan in place for when you reach a scale where relying solely on the cloud becomes impractical due to cost or performance reasons. You should know how to run at least some of your own infrastructure and bring on early team members who have experience with such alternatives.
By this, I don’t mean constructing a building and filling it with chillers and racks.
Instead, I mean that at medium scale and beyond, you lease colocation space in existing facilities managed by someone else and invest in your own servers and switching/routing gear. This approach is generally more cost-effective at scale, particularly for the non-bursting, steadily increasing workloads common in many startup infrastructures.
The sooner you consider these options, the better. If possible, start by running multi-cloud and then, after gaining initial traction, establish a small infrastructure connected to your cloud provider(s).
You can spend under $5,000 a month on space, power, and bandwidth by running your own starter infrastructure. Although initial equipment purchases may range from $50,000 to the low hundreds of thousands, these costs, amortized over several years, are relatively low compared with cloud compute, storage, and bandwidth. That means you can afford staff to manage your infrastructure early on.
Operating servers in dedicated infrastructure has also become more straightforward over the years. Most operations teams now treat servers as “cattle, not pets” and can flexibly deploy applications using configuration management systems or containers and container orchestration systems. It’s not that hard for platform teams to run dedicated servers netbooted and/or “kubernetified.”
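To make the amortization concrete, here is a minimal sketch. The $5,000/month and $50,000 figures come from the paragraph above; the 36-month straight-line amortization (a three-year hardware cycle) and the $200,000 equipment figure, standing in for “the low hundreds of thousands,” are assumptions for illustration. Staff costs are deliberately left out.

```python
def monthly_owned_cost(colo_per_month: float, equipment_cost: float,
                       amortization_months: int = 36) -> float:
    """Monthly cost of starter dedicated infrastructure.

    Assumes straight-line amortization of equipment over
    amortization_months (36 is a hypothetical three-year cycle)
    and excludes staff salaries.
    """
    if amortization_months <= 0:
        raise ValueError("amortization_months must be positive")
    return colo_per_month + equipment_cost / amortization_months


# $5,000/month for space, power, and bandwidth; two equipment budgets.
for equipment in (50_000, 200_000):
    cost = monthly_owned_cost(5_000, equipment)
    print(f"equipment ${equipment:,}: ~${cost:,.0f}/month")
# equipment $50,000: ~$6,389/month
# equipment $200,000: ~$10,556/month
```

Even at the higher equipment budget, the monthly figure stays small next to cloud bills in the tens or hundreds of thousands, which is the gap that pays for infrastructure staff.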
Hiring the right staff also makes a world of difference.
A small team of three to five people can manage both cloud and dedicated infrastructure, and this same team can often run a system ten times larger than the one they started with. This scalability is invaluable. As soon as possible, hire an infrastructure team lead with a solid background in running hybrid systems, including both cloud and physical infrastructure. This expert will keep an eye on your growing costs and know when it’s time to make the right changes. Prioritizing this investment early on can make all the difference.
Are you on the path to Cloud Jail?
You may already be careening toward Cloud Jail and wonder whether there’s anything you can do about it. I’m happy to say that it’s not too late. You can still pass “Go” and collect your $200.
I’d recommend that startups keep an eye on these indicators to gauge whether they’re approaching dangerous territory:
Calculate the portion of your bill that comes from “always-on,” steady-state, or consistently growing workloads. When that portion surpasses $100,000/month, you may be approaching the tipping point sooner than you think.
Pay attention to the number of infrastructure services you purchase from your cloud provider(s) beyond basic compute, network, and storage. Consider services like authentication, load balancing, SQL, and NoSQL services. Are there alternative options available? Will the services you’re using now work well over a direct connection to your own infrastructure if and when the time comes, or might they trap you in a single-provider jail?
Be on the lookout for network performance issues that your current provider(s) can’t or won’t address, such as packet loss and subpar throughput to specific geographies or internet providers. If CDNs and SD-WAN acceleration services can’t resolve these problems, that’s a red flag. For many SaaS and web companies, performance becomes the primary reason to run multi-cloud or at least some dedicated infrastructure they can load-balance to for performance.
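The first indicator is simple arithmetic once each line of your bill is tagged as steady-state or not. The sketch below assumes you have already done that tagging yourself (for example, from your provider’s billing export); the LineItem type, the service names, and the dollar amounts are all hypothetical. Only the $100,000/month threshold comes from the text.

```python
from dataclasses import dataclass

# Threshold from the text: steady-state spend above $100,000/month is a warning sign.
STEADY_STATE_ALERT = 100_000


@dataclass
class LineItem:
    """One line of a monthly cloud bill (hypothetical structure)."""
    service: str
    monthly_cost: float
    steady_state: bool  # True for always-on or consistently growing workloads


def steady_state_total(items: list[LineItem]) -> float:
    """Sum the monthly cost of line items tagged as steady-state."""
    return sum(item.monthly_cost for item in items if item.steady_state)


def approaching_tipping_point(items: list[LineItem],
                              threshold: float = STEADY_STATE_ALERT) -> bool:
    """True when steady-state spend exceeds the alert threshold."""
    return steady_state_total(items) > threshold


# Hypothetical bill; the numbers are illustrative only.
bill = [
    LineItem("web and API compute", 70_000, True),
    LineItem("databases and storage", 25_000, True),
    LineItem("streaming egress", 15_000, True),
    LineItem("nightly batch jobs", 30_000, False),
]

steady = steady_state_total(bill)
share = steady / sum(item.monthly_cost for item in bill)
print(f"steady-state: ${steady:,.0f}/month ({share:.0%} of the bill)")
print("approaching tipping point:", approaching_tipping_point(bill))
# steady-state: $110,000/month (79% of the bill)
# approaching tipping point: True
```

The bursty batch work is excluded on purpose: that is the kind of load the cloud handles well, so it says little about whether dedicated infrastructure would pay off.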
Too late. We’re already in Cloud Jail!
Did you come across this article too late? Are you already shackled to your rising cloud infrastructure costs with no easy fix in sight?
Fear not; there’s still hope!
It’s never too late to start, though it can take anywhere from six to 12 months to start running hybrid infrastructure from zero — especially if you’re dealing with petabytes of data to move or a company experiencing rapid revenue growth.
However, I’ve also seen it happen in just two to three months, albeit with a healthy dose of “exigent engineering.” Or, if your footprint is smaller or your need for control is lower, you can skip the private network and colocation and simply start by adding some dedicated servers or “bare metal cloud” to the mix.
I’ve personally witnessed 30 web companies go through this kind of transition, and most of them have three to five core people handling the network and physical server administration. The fantastic news is that, as long as you have the runway, you can dig yourself out when public cloud fees begin to take their toll.
And if you’re spending a lot, unsure if you can achieve significant gross margins with your current cloud usage, and struggling to recruit infrastructure gurus on staff, don’t lose hope.
Feel free to ping me (Avi at Kentik dot io). The networking community is also incredibly open, and people are generally happy to socialize and help.
Attend NANOG, RIPE, APRICOT, or your local network nerding meetup or conference. Make connections and ask questions; you’ll usually find people who can help you analyze and plan your infrastructure.
Remember, you’re not alone!
Now, it’s important to note that I’m not suggesting startups should avoid using the cloud initially — especially considering the credits available when you’re backed by venture capital.
The cloud can be a fantastic, capital-efficient way to launch a business and handle fluctuating workloads. You just need to be aware of the breaking points.
When your steady-state workloads are maxed out, your cloud bill reaches hundreds of thousands per month, and it keeps growing by tens of thousands regularly, you may have already reached the tipping point. Before you hit that milestone, I recommend moving most of the steady-state load to your own infrastructure.
Often, people overlook the inefficiencies when the bill is just $1 million or $2 million a year.
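A rough way to watch for this milestone is to track the monthly bill and its month-over-month growth. In this minimal sketch, the $100,000 bill floor and $10,000 growth floor are hypothetical readings of “hundreds of thousands per month” and “tens of thousands,” and the bill histories are invented. It does not check whether steady-state workloads are maxed out; that still takes a human look at utilization.

```python
def past_tipping_point(monthly_bills: list[float],
                       bill_floor: float = 100_000,
                       growth_floor: float = 10_000) -> bool:
    """Flag a bill that is both large and still growing quickly.

    bill_floor and growth_floor are hypothetical thresholds; tune them
    to your own margins and growth plan.
    """
    if len(monthly_bills) < 2:
        return False  # not enough history to measure growth
    # Month-over-month increases, oldest to newest.
    increases = [later - earlier
                 for earlier, later in zip(monthly_bills, monthly_bills[1:])]
    avg_increase = sum(increases) / len(increases)
    return monthly_bills[-1] >= bill_floor and avg_increase >= growth_floor


print(past_tipping_point([180_000, 195_000, 215_000, 240_000]))  # True
print(past_tipping_point([60_000, 62_000, 63_000]))  # False
```

Averaging the increases rather than looking at a single month keeps one unusual invoice from triggering or masking the signal.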
However, these seemingly small inefficiencies can sneak up on you and transform into an existential threat to your entire company. Your ability to make a profit, secure more funding, or even survive can hang in the balance. It’s at that moment when people wish they had considered the risks of Cloud Jail earlier in their startup journey. Don’t let your cloud infrastructure costs handcuff your business; be proactive and plan wisely to avoid falling into this trap.