My team uses AWS CloudFormation to provision our cloud infrastructure using code. Most of the time we can get what we need with the set of resources that AWS provides.
However, sometimes CloudFormation support for a service or a particular feature takes a while to arrive, and we need to fill in the gap ourselves. CloudFormation gives us the ability to fill these gaps by building custom resources that can literally run any logic you need in AWS Lambda, and we’ve used these when we needed to.
Sometimes Lambda isn’t the right answer
Sometimes even Lambda fails us, though, as some resources can potentially take a long time to set up. We don’t particularly want to have a Lambda function sitting idle or potentially timing out halfway through the resource setup.
The great thing about Lambda functions is that you only pay for them when they’re running. This makes them great for places where you need to run a quick task or handle an API request. You lose some of the benefits when your Lambda function looks like this:
    start_something()
    while not_done():
        time.sleep(30)
    finish()
because you’re paying for the Lambda to run while you’re waiting for your operation to complete. Worse, Lambda functions have a maximum lifetime of 15 minutes, so if your process takes longer than 15 minutes, you have to do weird hacks to make it work with a pure Lambda solution.
When we found out that CloudFormation didn’t have support for creating DynamoDB Global Tables, our first thought was “we know how to do this, a Lambda function custom resource can handle it.” However, as we dug into the details and tried it out, we quickly learned that creating the initial replica set or updating the replicas could easily exceed the 15-minute Lambda timeout.
Step Functions to the rescue!
Here’s where AWS Step Functions comes in. Step Functions makes the task of orchestrating processes easier, with built-in support for looping, waiting, and integrating with different functions and services. These features make it a good fit for long-running custom resources like ours.
One of our team members put together this Step Function definition for creating a DynamoDB global table. It starts out by checking the state of the table, waiting until the table is ready for updates, then comparing the set of replicas with the desired set.
You can only add one replica at a time. The step function repeats the UpdateReplicas step until the actual state matches the desired state.
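The one-change-at-a-time loop can be sketched in Python roughly as follows. This is a minimal sketch, not our actual step code: `next_replica_change`, `reconcile`, and the region names are hypothetical, removing extra replicas is an assumption about how updates would work, and a real step would call the DynamoDB API rather than mutate a local set.

```python
from typing import List, Optional, Set, Tuple


def next_replica_change(desired: Set[str], actual: Set[str]) -> Optional[Tuple[str, str]]:
    """Return one ("create" | "delete", region) change, or None when in sync.

    Only a single change is returned per call, because replicas are
    changed one at a time.
    """
    missing = sorted(desired - actual)
    if missing:
        return ("create", missing[0])
    extra = sorted(actual - desired)  # assumption: extra replicas get removed
    if extra:
        return ("delete", extra[0])
    return None


def reconcile(desired: Set[str], actual: Set[str]) -> List[Tuple[str, str]]:
    """Simulate repeating the update step until actual matches desired."""
    actual = set(actual)  # work on a copy; stands in for the real table state
    changes = []
    while (change := next_replica_change(desired, actual)) is not None:
        action, region = change
        if action == "create":
            actual.add(region)
        else:
            actual.discard(region)
        changes.append(change)
    return changes
```

In the state machine, each pass through this loop would be one short Lambda invocation, with a wait in between while the table settles.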
Each step is very small and self-contained, usually only one or two API calls, and all of the waiting is done by Step Functions instead of in the Lambda function. For the budget conscious, using Step Functions this way makes sure you’re not paying for idle time! Best of all, a standard Step Functions workflow can run for up to a year, so we don’t need to worry about the 15-minute timeout any more.
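To make the shape of such a workflow concrete, here is a rough sketch of a state machine definition in Amazon States Language, built as a Python dictionary. It is an illustration under assumptions, not the definition our team wrote: every state name except UpdateReplicas, the Lambda ARNs, the 30-second wait, and the `$.table.status` and `$.diff.in_sync` paths are all hypothetical.

```python
import json

# Hypothetical Lambda ARNs; each function would make one or two API calls.
CHECK_TABLE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-table"
COMPARE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:compare-replicas"
UPDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:update-replicas"

definition = {
    "Comment": "Sketch: reconcile DynamoDB global table replicas",
    "StartAt": "CheckTableState",
    "States": {
        "CheckTableState": {"Type": "Task", "Resource": CHECK_TABLE_ARN,
                            "ResultPath": "$.table", "Next": "IsTableReady"},
        "IsTableReady": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.table.status", "StringEquals": "ACTIVE",
                         "Next": "CompareReplicas"}],
            "Default": "WaitForTable",
        },
        # The waiting happens here, inside Step Functions, not in a Lambda.
        "WaitForTable": {"Type": "Wait", "Seconds": 30, "Next": "CheckTableState"},
        "CompareReplicas": {"Type": "Task", "Resource": COMPARE_ARN,
                            "ResultPath": "$.diff", "Next": "ReplicasMatch"},
        "ReplicasMatch": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.diff.in_sync", "BooleanEquals": True,
                         "Next": "Done"}],
            "Default": "UpdateReplicas",
        },
        # One replica change per pass, then back to waiting for the table.
        "UpdateReplicas": {"Type": "Task", "Resource": UPDATE_ARN,
                           "ResultPath": "$.update", "Next": "WaitForTable"},
        "Done": {"Type": "Succeed"},
    },
}


def dangling_targets(machine):
    """Return transition targets that do not name a defined state."""
    states = machine["States"]
    targets = [machine["StartAt"]]
    for state in states.values():
        targets += [state[k] for k in ("Next", "Default") if k in state]
        targets += [choice["Next"] for choice in state.get("Choices", [])]
    return sorted({t for t in targets if t not in states})


definition_json = json.dumps(definition, indent=2)  # what you would deploy
```

The `dangling_targets` helper is only a quick sanity check that every transition points at a state that exists; it is not a substitute for validating the definition with AWS tooling.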
There’s a small catch…
— Geoff Baskwill (@geoff_baskwill) February 24, 2021
CloudFormation doesn’t support direct integration with Step Functions as a custom resource provider yet. As a workaround, we can use our old Lambda function trick to trigger the Step Function execution, and send the response back to CloudFormation when we get to an end state.
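A minimal sketch of that workaround might look like the following. The function names and the way the state machine ARN is passed in are hypothetical. The `start_execution` call is the Step Functions API as exposed by boto3, and the response fields follow the CloudFormation custom resource response format. Sending an empty Content-Type header is an assumption, intended to keep the upload compatible with the pre-signed response URL.

```python
import json
import urllib.request


def start_provisioning(event, sfn_client, state_machine_arn):
    """Start a state machine execution, handing it the CloudFormation event.

    The event carries the ResponseURL, so a later step can report back.
    """
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(event),
    )


def build_cfn_response(event, status, reason=""):
    """Build the JSON body CloudFormation expects from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        # Assumption: fall back to the logical ID when no physical ID exists yet.
        "PhysicalResourceId": event.get("PhysicalResourceId") or event["LogicalResourceId"],
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": {},
    }


def send_cfn_response(event, status, reason=""):
    """PUT the response to the pre-signed URL from the original event."""
    body = json.dumps(build_cfn_response(event, status, reason)).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body,
        method="PUT",
        headers={"Content-Type": ""},  # assumption: avoid a default form content type
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

In the trigger Lambda, `sfn_client` would be `boto3.client("stepfunctions")`, and the function returns as soon as the execution has started. A final step in the state machine would then call something like `send_cfn_response(event, "SUCCESS")`, or `"FAILED"` with a reason, once the workflow reaches an end state.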
When you love infrastructure-as-code and need a custom resource for something that CloudFormation doesn’t support, Lambda is usually a great solution. When you need a bigger hammer for complex orchestration or operations with lots of idle time, Step Functions can help get you there.
The goal is to retire this particular resource soon, as AWS tells us that they’ll have built-in support in CloudFormation for DynamoDB Global Tables in the very near future.
That said, my team is happy that we were able to deliver an initial implementation with this workaround and provide value to our customers! 🎉
Hopefully this has sparked some ideas for you! What tools do you use to automate your cloud configuration? Join the conversation on the forum!