Cloud Cost Management Team Starter Kit
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.
Jesse: Hello, and welcome to AWS Morning Brief: Fridays From the Field. I’m Jesse DeRose.
Amy: I’m Amy Negrette.
Jesse: This is the podcast within a podcast where we talk about all the ways that we have seen AWS used and abused in the wild, with a healthy dose of complaining about AWS for good measure. I feel like it’s just kind of always necessary. There always has to be just that little bit of something extra; it’s the spice that really makes the dish. Today we’re going to be talking about the ‘Cloud Cost Management Team Starter Kit.’ Now, in a previous episode, we talked about the ‘Cloud Cost Management Starter Kit,’ which was a little bit more generalized, and one of the things that we talked about, ultimately, was building a team that is responsible for some of this work, some of this cloud cost management work.
So today, we’re going to take that one step further; we’re going to talk about all of the things that your cloud cost management team should ultimately be responsible for, what it should look like, how you might want to start building that team within your organization. So, I’m going to kick us off. I think one of the first things that is so, so critical for any team that is going to be doing any work is buy-in at the executive leadership level. You need to make sure that engineering leadership, the C-suite leadership has your back in everything that you’re doing. You need to make sure that the work that you’re doing has been signed off at the highest level so that that leadership can help empower you to do your work.
Amy: And we’ve referenced this before, and really, every time we talk about things like what makes a successful project is that as the one executing that project, you probably need the authority and actionable goals in order to do that, and the leadership is going to be the ones to lay that out for you.
Jesse: Absolutely. If you don’t have the backing of leadership, whether it is your boss, whether it is the C-suite, whether it’s a VP suite, you’re not going to get other people to listen to what you have to say; you’re not going to get other people to, broadly speaking, generally speaking, care about the work that you’re trying to do, the work that you’re trying to incentivize and empower other people in the organization to do.
Amy: And that kind of leads us into the next portion of it, where you need to know what the responsibilities are and have that clear delineation so that you understand the things that are expected of you, and what the engineering teams, product teams, and finance teams are expected to do. Everyone has to have a pretty much fenced-in idea of what they’re allowed to do and what they are expected to deliver, just like in any project.
Jesse: Absolutely. It’s so critical for me to understand what I’m responsible for, you to understand what you’re responsible for. I can’t tell you how many times I’ve been in a meeting where somebody will say something generally like, “We should do X,” and then everyone nods and goes, “Oh, yeah, yeah, yeah. We should do X.” And then everybody leaves the meeting and thinks that somebody else is responsible for it, and nobody’s been clearly assigned that work, or nobody knows that work is ultimately their responsibility.
Amy: And if you don’t assign it, people are going to assume that this is going to be a thing that if they have time to, they’ll get to it. And we harp on it enough that whenever work is not prioritized, it is automatically deprioritized. That’s just the way task lists shake out, especially at the end of sprint meetings.
Jesse: Absolutely. And I think that’s one of the other things that’s so important, too, is that it’s not just about assigning the work, but it’s about making sure that everybody who is involved in the conversation, everybody who’s involved in the work agrees on what those boundaries are, agrees on who is responsible for what actions, more specifically speaking from a task responsibility perspective. Because at the end of the day, I want my team, whether that is my individual team or a cross-functional team, to all be bought into who’s responsible for what parts of the project. We all need to be on the same page in terms of, “Yes, this is my responsibility. This part of the work is my responsibility. I will take ownership over this,” so that we can all help each other.
Get that project goal together. One of the other big ideas that is so critical to starting a cloud cost management team is identifying and socializing your business KPI metrics. Now, this is something that some engineering teams already think about day-to-day. They might have ideas of service-level agreements, metrics, maybe service-level objective metrics, but there might be other business metrics that indirectly—or directly—relate to engineering work. It could be number of users using your SaaS platform, it could be number of API requests, it could be the amount of storage that customers are storing on your platform. You want to identify what these metrics are, and start measuring your cloud spend against these metrics.
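As a minimal sketch of what "measuring your cloud spend against these metrics" could look like, the snippet below divides monthly spend by a business KPI to get a unit cost. The function name `unit_cost`, the choice of active users as the KPI, and all of the figures are hypothetical and chosen only for illustration; they are not from the episode.

```python
def unit_cost(monthly_spend: dict, monthly_kpi: dict) -> dict:
    """Return spend per unit of a business metric for each month.

    Months with a missing or zero KPI value are skipped so that the
    division is always defined.
    """
    result = {}
    for month, spend in monthly_spend.items():
        kpi = monthly_kpi.get(month)
        if kpi:  # skips None and 0
            result[month] = spend / kpi
    return result


# Hypothetical figures: total cloud spend in dollars and active users.
spend = {"2021-01": 120000.0, "2021-02": 150000.0}
active_users = {"2021-01": 40000, "2021-02": 60000}

costs = unit_cost(spend, active_users)
for month in sorted(costs):
    print(month, costs[month])
```

With these made-up numbers, total spend goes up from one month to the next while the cost per active user goes down, which is the kind of context a raw bill does not show.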
Amy: And as far as cost optimization projects go, the KPIs may not line up directly against how many servers you’re standing up, or how many users are coming through. They’ll be very indicative because you are spending money per user and per resource, but perhaps your business goals are different. Maybe you’re not looking at trying to save money, but better understand where that money is going.
Jesse: Absolutely. It’s not just about how many instances are running per hour, it’s not just about how many servers are running per hour, or how many users per server. It’s really about understanding what are the core driving indicators of your business? What are the things that ultimately influence and impact how your workloads, and servers, and API functions, and everything, flow and grow and change over time?
Amy: These metrics also can be influenced by things that are not architecturally specific, like savings plans, or the saving you would get through reservations, or some other contractual deal you get from your provider.
Jesse: Yeah, that’s one of the hard things, too, that we always hear from our clients. There is this idea that they think that they are spending a certain amount of money because they’re getting discounts from savings plans, or from reserved instances or from an enterprise discount program, and maybe their usage is a lot higher than that, but because they get these discounts, they think that they’re actually using a lot less than they actually are. And while this is not something we’re talking about specifically or directly in this conversation, it is something to be mindful of because there definitely can be a difference between your usage and your overall spend if your company is investing in things like savings plans, and reserved instances, and discounts through either a private pricing addendum or an enterprise discount program.
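One way to keep usage and spend apart is to track both what the usage would have cost at undiscounted rates and what was actually billed. The sketch below is only an illustration of that arithmetic: the function name `discount_gap`, its parameters, and the dollar amounts are hypothetical, and it assumes you already have both totals from your billing data.

```python
def discount_gap(on_demand_equivalent: float, billed: float) -> dict:
    """Compare usage priced at undiscounted rates with the billed amount.

    on_demand_equivalent: what the usage would cost with no discounts.
    billed: what was actually paid after discounts were applied.
    """
    if on_demand_equivalent <= 0:
        raise ValueError("on_demand_equivalent must be positive")
    savings = on_demand_equivalent - billed
    return {
        "savings": savings,
        "effective_discount": savings / on_demand_equivalent,
    }


# Hypothetical month: usage worth $100,000 undiscounted, billed at $75,000.
gap = discount_gap(on_demand_equivalent=100_000.0, billed=75_000.0)
print(gap)
```

Watching only the billed figure here would understate the underlying usage by a quarter, which is the gap described above.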
Amy: Yeah. Really, the bottom line with that is you want to be aware of what your business’s goals are—and this goes back into buy-in, this goes back to leadership—that having a fully contextualized understanding of what it is that they want to do will help you make the right decisions and define your metrics in a way that basically helps you try to set your goals.
Jesse: Absolutely. And all of this comes together in policies and best practices. All of this can come together in a way where you, your team, your cloud cost management team can put all of these ideas and all these things that everybody is agreeing to, into writing. Make sure that everybody is bought in and then write it down; make it an artifact and say, “Okay, after this meeting, we’ve agreed that the way that we are going to handle cross-availability zone traffic is like this,” or, “The way that we are going to handle scaling is like that,” or, “The best practices that we want for storage is this.” Put all this down in writing.
Make sure that there are best practices being created. There’s a number of clients that I’ve worked with before that have seen multiple different teams using the same service but using it in different ways. And maybe one of them has encryption and compression enabled and they’ve got this really tight turnaround time for their services, and another team doesn’t. They’re using a lot more data transfer because they aren’t focusing on compression, for example. And this is an opportunity for a best practice to get everybody on the same page and say, “Okay. If you’re going to use this particular type of service, you need to have compression enabled, you need to architect your services to focus on talking to other services in the same region, in the same availability zone to ultimately try to cut down on data transfer costs, or on storage costs, or other things.”
Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.
Amy: That brings up a really good point that I noticed when I was actually coding day-to-day: each project and each team is ultimately different because you’re building different things and you’re building them with different people. So, it’s entirely possible that your KPIs may be different between teams. But you’re not going to know that unless you’ve done all the other stuff that we mentioned. And it’s perfectly fine if your KPIs are different between teams, or if your practices have to be modified to better work with what your goals are. And also, best practices, just like everything else in the cloud, can change. If the cloud architecture backend can change once every five minutes, you have to be able to be flexible and say, “These cost management rulings that we made two quarters ago, two years ago, they made sense then. And just like anything else, things evolved, we scaled, and our needs have changed, so we have to review these.”
This is why we also mentioned before maybe reviewing it as part of your cloud cost analysis once a quarter, because things change all the time. AWS changes all the time. That’s not a thing that I’ve complained about tons of times in different places, with a lot of recorded evidence.
Jesse: Amy, why can I see this giant—your eye is just twitching, just this giant throbbing vein on your forehead right now?
Amy: [laugh]. I have to start recording these podcasts with some kind of blood pressure monitor, and we can see, as soon as I say the word, “AWS starts changing stuff,” and just watch that skyrocket.
Jesse: That is the quote. When our audio engineers post this, that is the quote to use to highlight this episode on social media.
Amy: [laugh]. Yes, absolutely.
Jesse: And I think to really bring this back around, all of these ideas, all of these things that we’re talking about aren’t just about saying we’re going to do the one thing and we’re going to do it that way for all of eternity, like Amy said. Things change over time, and that’s fine. That’s perfectly normal. So, your best practices should change over time, too. Maybe one of the things that you write down as part of your best practices is that you’re going to review your best practices, maybe once a quarter, once every six months, every year, maybe once every, whatever time period works best for your team, the way that your workloads work, the way that your team works, the pace at which your team works, make sure that you’re actively reviewing this information because all of us have seen documentation that is written once and is immediately out of date, and nobody ever touches it ever again. And that’s not what these best practices are about.
Amy: If you want to make sure that teams are bought in, show that you care, that you are aware that this stuff that they work on evolves and changes with them. If you want them to care about cloud cost management policies, which is hard enough to say, much less get buy-in for, you want them to know that you’re doing it with an awareness of what they are doing and why they are doing it. You don’t want to go in and say, “We’re making widespread changes. We do not care what happens to your infrastructure because of it.” You want to go, “Because you are running things this way, it has to look like this because the cost is going to look like this.”
Jesse: Absolutely. Having data to back up your decisions really, really helps in every decision you’re making. It shows that you’re making data-driven decisions; there is a why behind what you’re doing. And it helps other people understand what you’re doing as well.
Well, if you’ve got a question you’d like us to answer on air, please go to lastweekinaws.com/QA. If you’ve enjoyed this podcast, please go to lastweekinaws.com/review and give it a five-star review on your podcast platform of choice, whereas if you hated this podcast, please go to lastweekinaws.com/review. Give it a five-star rating on your podcast platform of choice and tell us what your ideal starter kit would include.
Announcer: This has been a HumblePod production. Stay humble.