Posted by Kevin Kissoon on June 24, 2019

Scaling Cloud Costs Startup Style

Tech startups are known for giving their employees the autonomy to make meaningful, impactful decisions. However, these decisions often come at a cost, and if those costs are not properly monitored, they can quickly spiral out of control. One notable example is cloud computing costs.

As a multi-cloud organization, Rubikloud deploys resources for development, testing, staging, and production in all three major cloud environments: GCP, Azure, and AWS. The complexity of a multi-cloud infrastructure, coupled with employee autonomy and decentralized procurement of cloud resources, allowed employees to spin up resources in real time with very little bureaucratic red tape. This made work appear more efficient, but it was also the perfect recipe for poor cloud cost management. The top three effects of poor cloud cost management include:

  • Runaway Cost: The ability and autonomy the cloud provides by enabling resources to be deployed with the click of a button is a huge advantage, but also makes it very easy to rack up a huge bill.
  • A Sprawl of Unused & Abandoned Resources: Many resources such as unused compute instances or obsolete storage containers are often left abandoned, but still cost the organization a significant amount of money.
  • Sub-Optimal Service Levels: Overspending on unneeded resources limits the purchase of more desired and useful resources, thus causing a decrease in desired application performance.

Operating budgets typically don’t have room for poor cloud cost management. As such, it was imperative that we implement a cloud cost management practice that was both effective and efficient. To that end, we developed a three-tier cloud cost control policy that focuses on:

  • Single Source of Truth & Enhanced Visibility: Using Cloudability, we have integrated our billing and usage data from all cloud vendors into a single source of truth. Using this application as well as a custom-built Slack notification bot (which we will explain in further detail below), we can generate custom reports and dashboards, receive real-time spend and anomaly alerts, organize and optimize our infrastructure, and provide enhanced visibility to the entire organization.
  • Policing & Policy Enforcement: Our infrastructure team polices our cloud resources to ensure that everything deployed is tagged and used in accordance with our internal policy. Because all resources are used and tagged correctly, we can track resource costs and charge them back accordingly.
  • Assigning Budgets & Ownership: Each team and project is assigned a cloud budget. To allow teams to remain semi-autonomous, they can decide how to spend their cloud budget on resources to maximize usage in accordance with their needs. Each team lead is responsible for tracking their spend and usage to ensure they remain on budget.
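To make the tagging and chargeback idea concrete, here is a minimal sketch of how tagged billing records could be rolled up per team and project, with untagged resources set aside for follow-up. It is an illustration only, not our actual tooling: the tag keys (`team`, `project`), the record shape, and the function names are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical tag keys; the real internal tagging policy is not listed in this post.
REQUIRED_TAGS = ("team", "project")


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or empty on a resource."""
    tags = resource.get("tags", {})
    return [key for key in required if not tags.get(key)]


def chargeback(resources, required=REQUIRED_TAGS):
    """Sum cost per combination of required tag values.

    Resources missing any required tag cannot be charged back, so their
    IDs are returned separately for the infrastructure team to follow up on.
    """
    totals = defaultdict(float)
    untagged = []
    for resource in resources:
        if missing_tags(resource, required):
            untagged.append(resource["id"])
            continue
        owner = tuple(resource["tags"][key] for key in required)
        totals[owner] += resource["cost"]
    return dict(totals), untagged


# Made-up sample records, assumed to be exported from a billing report.
sample = [
    {"id": "vm-1", "cost": 12.5, "tags": {"team": "data", "project": "alpha"}},
    {"id": "vm-2", "cost": 7.5, "tags": {"team": "data", "project": "alpha"}},
    {"id": "bucket-9", "cost": 3.0, "tags": {"team": "data"}},
]
totals, untagged = chargeback(sample)
print(totals)
print(untagged)
```

Keeping untagged spend in its own list, rather than dropping it or assigning it to a default owner, is a deliberate choice in this sketch: it gives whoever is policing the tags a concrete to-do list.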

Cloudability provides a significant amount of functionality out of the box. However, to achieve enhanced visibility company-wide while further enforcing budgets and cost ownership, we had to take it one step further and create a Slack notification bot that regularly alerts specific teams and projects about their cloud spend. The details of this custom Slack bot are described below:

  • Within Cloudability, we first set up different views, which filter tagged cloud data to display cost and usage data specific to a particular project.
  • Within Slack, we created specific channels for each project to ensure all necessary project stakeholders receive channel notifications.
  • The custom-built Slack notification bot runs on Kubernetes and uses both Cloudability’s estimate API and Slack’s API to send notifications. In the bot’s configuration, we mapped each project’s Cloudability view to its specific Slack channel. We then set up a cron job to query each project’s view daily for both the previous day’s spend and the estimated monthly total for the project. These values are then sent as a notification to the corresponding Slack channel, as in the example below:

[Figure: Slack Notification Example]
[Figure: Cloud Cost Diagram]
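For readers who want to build something similar, the daily job described above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not our production bot: it posts through a Slack incoming webhook (one of several ways to use Slack's API), the Cloudability lookup is left as a caller-supplied `fetch_costs` function because the exact API request is not shown in this post, and the project fields (`name`, `view_id`, `webhook_url`, `budget`) and the over-budget warning line are hypothetical.

```python
import json
import urllib.request


def format_message(project, yesterday_spend, estimated_month_total, budget=None):
    """Build the notification text for one project."""
    lines = [
        f"*{project}* cloud spend",
        f"Yesterday: ${yesterday_spend:,.2f}",
        f"Estimated month total: ${estimated_month_total:,.2f}",
    ]
    # Assumption: a warning is added when the estimate exceeds the project's budget.
    if budget is not None and estimated_month_total > budget:
        over = estimated_month_total - budget
        lines.append(f":warning: Projected to exceed budget by ${over:,.2f}")
    return "\n".join(lines)


def post_to_slack(webhook_url, text):
    """Send a plain-text message to a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_daily(projects, fetch_costs, send=post_to_slack):
    """Notify each project's channel once; meant to be triggered by a daily cron job.

    fetch_costs(view_id) is a hypothetical callable that returns
    (yesterday_spend, estimated_month_total) for a Cloudability view.
    """
    for project in projects:
        yesterday, estimate = fetch_costs(project["view_id"])
        text = format_message(
            project["name"], yesterday, estimate, project.get("budget")
        )
        send(project["webhook_url"], text)
```

Passing `fetch_costs` and `send` in as parameters keeps the formatting and scheduling logic testable without touching either service; on Kubernetes, a CronJob could invoke `run_daily` once per day with the real implementations.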

Through the implementation of this cloud cost control policy and our custom-built Slack bot, we have seen a significant turnaround in our cloud cost and usage. There are no more monthly surprises in our usage and expenditure, and we are on track to keep optimizing our cloud usage. After all, our software may belong in the clouds, but our costs need to remain on the ground.