Avoiding a Surprise AWS Bill
If you haven't set up billing alerts in your AWS account, I highly recommend you do so. Taking the five minutes to set up an alert likely saved us thousands or possibly even tens of thousands of dollars.
When I first signed up for AWS back in January 2018, I wanted to set a monthly budget of $1000. At the time, I wasn't even thinking about avoiding a surprise bill. The only reason I set up a budget was to know when I should look into cutting costs. I figured any AWS bill below $1000/month wouldn't be worth optimizing, but once we crossed that threshold, I would start digging in and figure out what was going on. Anyways, I went into the AWS billing and cost management dashboard, set up an alert for when the monthly bill went over $950, and completely forgot about it.
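I set our alert up by clicking through the console, but the same kind of alert can be scripted. Here's a rough sketch using the AWS Budgets API through the Go SDK; the account ID, budget name, and email address are placeholders, not our actual setup:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/budgets"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := budgets.New(sess)

	// Create a $1000/month cost budget and email an alert once actual
	// spend crosses 95% of it (i.e. $950).
	_, err := svc.CreateBudget(&budgets.CreateBudgetInput{
		AccountId: aws.String("123456789012"), // placeholder account ID
		Budget: &budgets.Budget{
			BudgetName: aws.String("monthly-cost-budget"),
			BudgetType: aws.String("COST"),
			TimeUnit:   aws.String("MONTHLY"),
			BudgetLimit: &budgets.Spend{
				Amount: aws.String("1000"),
				Unit:   aws.String("USD"),
			},
		},
		NotificationsWithSubscribers: []*budgets.NotificationWithSubscribers{
			{
				Notification: &budgets.Notification{
					NotificationType:   aws.String("ACTUAL"),
					ComparisonOperator: aws.String("GREATER_THAN"),
					Threshold:          aws.Float64(95), // percent of the budget
					ThresholdType:      aws.String("PERCENTAGE"),
				},
				Subscribers: []*budgets.Subscriber{
					{
						SubscriptionType: aws.String("EMAIL"),
						Address:          aws.String("billing-alerts@example.com"), // placeholder
					},
				},
			},
		},
	})
	if err != nil {
		log.Fatalf("creating budget: %v", err)
	}
	log.Println("budget and alert created")
}
```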
On the evening of August 4th 2020, I received the following email:
When I first saw this email, I started panicking a little. It was only the fourth of the month and we had already spent over $1600! My first thought was that someone had somehow gained access to our AWS account and was using it to mine cryptocurrency or something. I quickly logged into my AWS account and looked at which services were costing us so much money:
I immediately saw we had spent almost $1500 on Amazon CloudWatch, Amazon's logging and metrics service. When I broke down the CloudWatch spend by usage type, I saw that the entire spend was made up of USW2-DataProcessing-Bytes, in other words, data sent into CloudWatch. Amazon charges $0.50 per GB sent into CloudWatch.
When I pulled up the most recent CloudWatch logs, I immediately saw what was going on. Earlier in the day, I had been debugging an issue and added a print statement to log every event that came through the Freshpaint backend. That print statement made its way into production and generated multiple terabytes of logs. At $0.50 per GB, that works out to roughly 3 TB of ingested data behind the $1500 increase in our AWS bill. I immediately removed the print statement and stopped sending so much data into CloudWatch.
Note: A $1500 AWS bill is bad, but it could have been a lot worse. As part of going through YC, we received $100k in AWS credits that expire after two years. Most likely, the credits are going to expire before we can use them all, so the total cost of the incident to us was $0.
Preventing Similar Issues Going Forward
After resolving the problem, I started brainstorming ways to prevent similar issues going forward. It seems like there were two different factors that caused the charges:
- A print statement meant for debugging made its way into production.
- While we did have an alert that eventually caught the issue, a better-scoped alert would have caught it much sooner.
To prevent debugging code from making it into production, we should have had a linter check for any fmt.Println statements. Anything actually intended to be logged in production should go through log.Println. That gives a clear distinction between code intended for debugging and code intended for logging, and an automated linter can verify before every deploy that no debugging code ships to production.
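As a sketch of what that check could look like, here's a small Go program that walks the repo and fails if it finds any fmt.Print* call. It only catches calls written literally as fmt.Println and friends; an off-the-shelf linter such as forbidigo (available through golangci-lint) can enforce the same rule more robustly.

```go
// A minimal check for leftover debugging prints. Run it from the repo root
// as a CI step; it exits non-zero if any fmt.Print* call is found.
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	fset := token.NewFileSet()
	found := false

	// Walk every .go file and flag fmt.Print/Println/Printf calls.
	// Production logging should go through log.Println instead.
	filepath.Walk(".", func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".go") {
			return nil
		}
		file, err := parser.ParseFile(fset, path, nil, 0)
		if err != nil {
			return nil
		}
		ast.Inspect(file, func(n ast.Node) bool {
			call, ok := n.(*ast.CallExpr)
			if !ok {
				return true
			}
			sel, ok := call.Fun.(*ast.SelectorExpr)
			if !ok {
				return true
			}
			pkg, ok := sel.X.(*ast.Ident)
			if !ok || pkg.Name != "fmt" || !strings.HasPrefix(sel.Sel.Name, "Print") {
				return true
			}
			fmt.Fprintf(os.Stderr, "%s: debugging call fmt.%s, use log instead\n",
				fset.Position(call.Pos()), sel.Sel.Name)
			found = true
			return true
		})
		return nil
	})

	if found {
		os.Exit(1) // fail the CI step
	}
}
```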
As for the alerting, the only thing that saved us was that we had an alert for our entire AWS account. Instead of just one alert, we should have set up separate alerts for each AWS service. This way, if the bill for one service suddenly blows up, we will immediately know and be able to dig into it.
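For illustration, here's roughly what a per-service alarm could look like using the EstimatedCharges billing metric in CloudWatch via the Go SDK. The threshold and SNS topic ARN are placeholders; billing metrics only exist in us-east-1 and only appear after "Receive Billing Alerts" is enabled in the billing preferences.

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatch"
)

func main() {
	// Billing metrics live only in us-east-1.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := cloudwatch.New(sess)

	// One alarm per service: alert if month-to-date CloudWatch charges alone
	// exceed $200. EstimatedCharges is cumulative for the month, so Maximum
	// over the period tracks the latest value.
	_, err := svc.PutMetricAlarm(&cloudwatch.PutMetricAlarmInput{
		AlarmName:  aws.String("billing-amazoncloudwatch"),
		Namespace:  aws.String("AWS/Billing"),
		MetricName: aws.String("EstimatedCharges"),
		Dimensions: []*cloudwatch.Dimension{
			{Name: aws.String("Currency"), Value: aws.String("USD")},
			{Name: aws.String("ServiceName"), Value: aws.String("AmazonCloudWatch")},
		},
		Statistic:          aws.String("Maximum"),
		Period:             aws.Int64(21600), // 6 hours; billing data updates a few times a day
		EvaluationPeriods:  aws.Int64(1),
		Threshold:          aws.Float64(200), // placeholder per-service budget
		ComparisonOperator: aws.String("GreaterThanThreshold"),
		AlarmActions: []*string{
			aws.String("arn:aws:sns:us-east-1:123456789012:billing-alerts"), // placeholder SNS topic
		},
	})
	if err != nil {
		log.Fatalf("creating billing alarm: %v", err)
	}
}
```

Repeating this for each service you use means a runaway charge shows up as a specific, named alert within hours instead of hiding inside the account-wide total.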