Pulumi, A Beginner's Experience
Context
As part of a project to (over)engineer a system for Role-Based Access Control used for administrative tasks and back-office operations, I decided to build reproducible and portable infrastructure using infrastructure-as-code (IaC) tools.
I chose Pulumi as the IaC tool over the well-known Terraform by HashiCorp, hoping to gain a deeper understanding of it after first coming into contact with it on a separate project months ago and coming away with a great impression.
Pulumi
How Pulumi works
Before diving further into this article, I would highly recommend having some understanding of Pulumi concepts such as how Pulumi works and how to use Pulumi, both of which are well documented.
In summary, resources are declared in a Pulumi program and provisioned by running pulumi up, which triggers the Pulumi engine to check the current state in the state backend, determine which resources need to be modified, make the necessary API calls to the providers to change the resource state and, finally, update the state backend with the new state.
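To make the flow above concrete, here is a toy model of the comparison step in plain Python. It is not Pulumi's engine or API: the plan function, the two dictionaries standing in for the declared resources and the recorded state, and the resource names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A resource is modelled as name -> properties. Both dictionaries below are
# hypothetical stand-ins: `desired` for what the program declares, `recorded`
# for what the state backend last stored.
Resources = Dict[str, dict]


def plan(desired: Resources, recorded: Resources) -> List[Tuple[str, str]]:
    """Return the (action, resource_name) steps needed to reach the desired state."""
    steps = []
    for name, props in desired.items():
        if name not in recorded:
            steps.append(("create", name))
        elif recorded[name] != props:
            steps.append(("update", name))
        # Unchanged resources need no provider API call.
    for name in recorded:
        if name not in desired:
            steps.append(("delete", name))
    return steps


recorded = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "old-queue": {"fifo": False},
}
desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "log-queue": {"fifo": True},
}

for action, name in plan(desired, recorded):
    print(action, name)
```

In this simplified picture, the unchanged vpc produces no step, which mirrors why a second pulumi up with no code changes typically reports nothing to do.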
Pulumi Backend
To make state storage easier without relying on Pulumi Cloud (mainly for portability reasons), we decided to use S3 as our main Pulumi state backend.
The Design
A simplified view of the planned AWS Architecture Diagram:
The above design uses the following services:
- Route53 to route traffic between a primary region and secondary replica region
- ALB to spread traffic between 2 availability zones within a particular region
- Another internal-facing ALB to spread traffic across replicas of downstream microservices
- ECS to orchestrate the containerized application on top of EC2 compute instances
- RDS as the main database service running Postgres@15
- Lambda for simple log recording tasks
- SQS to enable asynchronous log entries
- Various network configurations with VPCs, Subnets and Security Groups
Our goal is to sufficiently replicate services within the regional construct such that, in the event that the secondary region is promoted to primary, a tertiary region can be spun up to take over as the secondary region while the original primary region remains inactive for maintenance or other purposes.
Design Considerations
In order to establish services that can be easily ported over to multiple regions, we would ideally want to group configurations specific to regional services together and global services separately, following the Micro-Stacks Pulumi project organization pattern.
Here’s what we started with initially:
The original intention was to separate concerns for the primary and secondary regions apart from the global and database services. This was accomplished by using a separate Pulumi project for each of the global, database, primary and secondary infrastructure setups.
We quickly realized that it was silly to separate the primary and secondary regions: they largely contain the same configurations of the same types of services, and it would just be repetitive to have a secondary project on top of the already existing primary when we could have used a different Pulumi stack within a single project, with differently defined parameters like the target AWS Region.
Another thing that was somewhat unnecessary was the separation of database and global, since global was intended to be a shared configuration across AWS regions and the kind of configuration for RDS fits that nature perfectly, even though RDS is technically a regional service.
We now end up with a project structure similar to the following:
The regional project contains 2 stacks that store the main configuration for the primary region in Singapore (./pulumi/regional/Pulumi.sin.yaml) and the secondary region in Hong Kong (./pulumi/regional/Pulumi.hkg.yaml).
There is an additional logical separation of related services, such as ecs, within the regional Pulumi project.
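A minimal sketch of how a single regional program might pick up per-stack parameters. pulumi.get_stack() and pulumi.Config are part of the Pulumi Python SDK; the regional_name helper, the desiredCount key and the naming scheme are assumptions for illustration, not the project's actual code. The SDK import sits inside the function so that the naming helper can be tried without the Pulumi engine.

```python
def regional_name(stack: str, base: str) -> str:
    """Prefix a resource name with the stack name, e.g. 'sin-app-alb'."""
    return f"{stack}-{base}"


def load_stack_settings() -> dict:
    """Read settings from the selected stack's Pulumi.<stack>.yaml.

    Only meaningful when executed by the Pulumi engine (e.g. `pulumi up`).
    """
    import pulumi  # imported lazily; requires the Pulumi SDK

    stack = pulumi.get_stack()  # "sin" or "hkg" in this project
    region = pulumi.Config("aws").require("region")  # aws:region in the stack file
    # "desiredCount" is a hypothetical project-level key; fall back to 2 if unset.
    desired_count = pulumi.Config().get_int("desiredCount") or 2
    return {
        "stack": stack,
        "region": region,
        "alb_name": regional_name(stack, "app-alb"),
        "desired_count": desired_count,
    }


print(regional_name("sin", "app-alb"))
print(regional_name("hkg", "app-alb"))
```

With this shape, the same code serves both regions, and switching between them is a matter of selecting the sin or hkg stack before running pulumi up.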
A Deeper Look
‼️ NOTE: The various code snippets shown below are written by a beginner (unless labelled otherwise) and are meant to be references - use at your own risk!
Structure of Code
In the Global component of code, we define the key resources that are available and used actively across regions, and shouldn’t be modified frequently, such as the Route53 and various network related configurations.
Defining network resources would look something like:
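As a rough illustration only (a sketch, not the project's actual code), a VPC with one subnet per availability zone might be declared like this. aws.ec2.Vpc, aws.ec2.Subnet and pulumi.export come from the Pulumi SDKs; the function names, CIDR ranges and export keys are assumptions. The SDK imports are kept inside create_network so the CIDR helper can be run on its own.

```python
import ipaddress
from typing import List


def subnet_cidrs(vpc_cidr: str, count: int, new_prefix: int = 24) -> List[str]:
    """Carve the first `count` subnets of size /new_prefix out of the VPC range."""
    network = ipaddress.ip_network(vpc_cidr)
    subnets = network.subnets(new_prefix=new_prefix)
    return [str(next(subnets)) for _ in range(count)]


def create_network(name: str, vpc_cidr: str, zones: List[str]):
    """Declare a VPC with one subnet per availability zone.

    Must run inside a Pulumi program; names and CIDRs are illustrative.
    """
    import pulumi
    import pulumi_aws as aws  # requires the pulumi and pulumi-aws packages

    vpc = aws.ec2.Vpc(
        f"{name}-vpc",
        cidr_block=vpc_cidr,
        enable_dns_hostnames=True,
    )
    subnets = [
        aws.ec2.Subnet(
            f"{name}-subnet-{zone}",
            vpc_id=vpc.id,
            cidr_block=cidr,
            availability_zone=zone,
        )
        for zone, cidr in zip(zones, subnet_cidrs(vpc_cidr, len(zones)))
    ]
    # Exported so that other projects can look these values up.
    pulumi.export("vpcId", vpc.id)
    pulumi.export("subnetIds", [subnet.id for subnet in subnets])
    return vpc, subnets


print(subnet_cidrs("10.0.0.0/16", 2))  # ['10.0.0.0/24', '10.0.1.0/24']
```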
Looking back, the code structure could definitely be a lot cleaner, but for a first-time effort built in favor of quick iteration, this was what we ended up with.
The main motivation for adopting such a structure is the logical hierarchy of the provisioned resources, as it becomes clear which subfolder is responsible for which resources on AWS.
Handling Secrets
Sharing secrets is a common workflow, especially among team members. For this particular project, I found the built-in secrets management (see also Managing Secrets with Pulumi | Pulumi Blog) to be more than sufficient for our use case.
For example, to define a shared database secret, all we need to do is define it in the stack configuration as a secret. Running pulumi config set --secret dbPassword verySecurePassword! sets a configuration variable named dbPassword to the plaintext value verySecurePassword! and stores it encrypted.
If we list the configuration for our stack with pulumi config, the plaintext value for dbPassword will not be printed; it is shown as [secret] instead.
Similarly, if the program attempts to print the value of dbPassword to the console - either intentionally or accidentally - Pulumi will mask it out, and running the program shows [secret] in place of the plaintext value.
For further information on handling secrets, see the comprehensive Pulumi documentation on secrets.
A simple example of how this would look in practice:
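A minimal sketch of consuming the dbPassword secret when declaring the database, assuming the pulumi and pulumi-aws packages. Config.require_secret, Output.all and aws.rds.Instance are real SDK calls; the function names, the username, the database name, the instance sizing and the dbUrl export are assumptions for illustration. The SDK imports sit inside create_database so the URL helper can be run without the Pulumi engine.

```python
def connection_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a Postgres connection URL from already-resolved values."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


def create_database(name: str, subnet_group_name: str):
    """Declare a Postgres RDS instance whose password comes from stack config.

    Must run inside a Pulumi program; all literal values are illustrative.
    """
    import pulumi
    import pulumi_aws as aws  # requires the pulumi and pulumi-aws packages

    config = pulumi.Config()
    # require_secret returns an Output that stays marked as secret.
    db_password = config.require_secret("dbPassword")

    db = aws.rds.Instance(
        f"{name}-db",
        engine="postgres",
        engine_version="15",
        instance_class="db.t3.micro",
        allocated_storage=20,
        db_name="rbac",
        username="app",
        password=db_password,
        db_subnet_group_name=subnet_group_name,
        skip_final_snapshot=True,
    )
    # Combining a secret with other outputs yields a secret output, so the
    # exported URL is masked in CLI output as well.
    url = pulumi.Output.all(db.address, db.port, db_password).apply(
        lambda args: connection_url("app", args[2], args[0], args[1], "rbac")
    )
    pulumi.export("dbUrl", url)
    return db


print(connection_url("app", "example", "db.internal", 5432, "rbac"))
```

The plaintext password never appears in the program itself; it only exists as the resolved value inside the apply callback.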
Putting It Together
When running the pulumi up command on a Python stack, Pulumi will attempt to find and execute the entrypoint, which is the __main__.py file. Hence that’s where the necessary resource provisioning should start.
We could also leverage the fact that Python executes a module’s top-level code when the module is imported, and simply import the modules that declare the resources we want to provision.
Referencing External Stacks
Following the Micro-Stacks project structure, we have 2 stacks where one requires references from the other. This could be resolved either by using an external configuration file as a reference or simply by using Pulumi Outputs together with a StackReference.
In essence, this allows the regional stack to read outputs of the global stack, such as the VPC id, in order to provision new resources into it. An example of the implementation:
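A hedged sketch of the lookup from the regional side. pulumi.StackReference and its require_output method are part of the Pulumi Python SDK; the helper names, the prod stack name and the vpcId output key are assumptions. The exact form of the fully qualified stack name depends on the backend in use, so the organization default here is also an assumption to check against the Pulumi documentation for your backend.

```python
def qualified_stack_name(project: str, stack: str, org: str = "organization") -> str:
    """Compose an '<org>/<project>/<stack>' name for a StackReference."""
    return f"{org}/{project}/{stack}"


def vpc_id_from_global(stack: str = "prod"):
    """Return the global project's exported VPC id as a Pulumi Output.

    Must run inside a Pulumi program; "prod" and "vpcId" are hypothetical.
    """
    import pulumi  # imported lazily; requires the Pulumi SDK

    global_stack = pulumi.StackReference(qualified_stack_name("global", stack))
    # require_output fails the deployment if the other stack did not export it.
    return global_stack.require_output("vpcId")


print(qualified_stack_name("global", "prod"))
```

The returned Output can be passed directly as the vpc_id argument of resources declared in the regional project.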
AWS Specific Details
Once we fully understood the core functionalities that Pulumi provides, what’s left is to understand the best practices for provisioning AWS resources, with or without IaC tools.
Resources:
- Security best practices for your VPC - Amazon Virtual Private Cloud
- Best practices - Amazon EC2 - Amazon Elastic Compute Cloud
- Best practices - Amazon ECS - Amazon Elastic Container Service
- Best practices - Networking - Amazon Elastic Container Service
- Best practices - AWS Lambda - AWS Lambda Functions
- Best practices - Amazon RDS - Amazon Relational Database Service
Challenges
Using Pulumi has mostly been pleasant, but it’s not without some hiccups every so often.
Documentation
Disclaimer: Pulumi’s documentation in general is great - a lot better than that of many open-source tools on the internet. However, from the eyes of a novice, some things are not as well described as I would have hoped.
I found that referencing Terraform’s documentation, coupled with the AWS resource-specific guides, was the best combination when provisioning new AWS resources through Pulumi.
Lack of References
Considering that Pulumi is a relatively young tool, researching best practices to replicate was challenging due to the limited availability of resources and references compared to more established tools like Terraform or CloudFormation. When faced with issues foreign to me, it took quite a while searching places like Stack Overflow or public GitHub repositories before finally finding solutions that were written for Terraform yet still work in Pulumi (ironically, this is another great thing about Pulumi: its AWS provider is built on top of the Terraform AWS provider, so resource arguments map closely between the two). Don’t even think about using Pulumi AI; it caused me more problems than doing more in-depth research on the resource or just reading the Pulumi source code.
Pulumi has a relatively active Slack community where questions can be answered quickly, but for a beginner the challenge most of the time lies in forming a good question, which, coupled with a lack of necessary context, can be quite time-consuming.
vs Terraform
The main reason Terraform is used here as a comparison is that it has a lot of features that are attractive to many: it is open-source, has extensive documentation, and supports most major cloud providers as well as lower-level infrastructure systems like Kubernetes.
However, Terraform isn’t without its challenges. The primary drawback lies in needing to learn and master the HashiCorp Configuration Language (HCL) in order to write resource configurations. While suitable for basic tasks, it becomes difficult to modularize code and articulate complex logical constructs such as if statements and for loops, particularly within expansive projects.
The Pulumi team has done a great job listing key areas of comparison between Pulumi and Terraform in their comparison documentation. At the risk of repetition, I’ll list a few areas which impacted our ability to get things out quickly.
Popularity
Given that Terraform has been around for much longer, it definitely has a much larger community than Pulumi, but I would say Pulumi’s community is growing fast. There is also much more activity on Pulumi’s GitHub repository than on Terraform’s, although Pulumi has about half as many stargazers as Terraform.
This popularity affected the number of issues identified and shared among the community, which in turn directly impacted the availability of online resources that we could reference.
Concepts
Pulumi, similar to Terraform, supports importing existing resources so that they can be managed. On top of that, Pulumi can generate code from existing state, whereas Terraform requires it to be written manually.
Pulumi also allows the conversion of Terraform HCL, Kubernetes YAML, and Azure ARM templates into Pulumi programs, which is incredibly useful for teams that are already deep into the other IaC tools and want to convert to Pulumi.
Programming Language
Pulumi, in contrast to Terraform, enables users to leverage familiar general-purpose programming languages through its SDKs, which support multiple languages on different runtimes such as Python, Node.js (JavaScript, TypeScript), .NET (C#, F#, VB), and Java, as well as YAML for configuration. This enables seamless integration with the user’s preferred IDE and allows users to apply familiar code modularization designs and logical primitives directly in their preferred language and environment.
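As a small illustration of those logical primitives, the sketch below builds security group ingress rules with an ordinary loop and conditional. The ingress_rules function and its parameters are hypothetical; the dictionary keys are chosen to match the keyword arguments of aws.ec2.SecurityGroupIngressArgs in pulumi-aws, so each rule could plausibly be passed on as SecurityGroupIngressArgs(**rule).

```python
from typing import Dict, List


def ingress_rules(ports: List[int], allowed_cidrs: List[str], public: bool) -> List[Dict]:
    """Build one TCP ingress rule per port.

    Public-facing groups accept traffic from anywhere; internal ones only
    from the listed CIDR ranges.
    """
    rules = []
    for port in ports:
        cidrs = ["0.0.0.0/0"] if public else list(allowed_cidrs)
        rules.append(
            {
                "protocol": "tcp",
                "from_port": port,
                "to_port": port,
                "cidr_blocks": cidrs,
            }
        )
    return rules


# Internet-facing ALB: HTTP and HTTPS from anywhere.
public_rules = ingress_rules([80, 443], [], public=True)
# Internal ALB: only reachable from inside the (hypothetical) VPC range.
internal_rules = ingress_rules([8080], ["10.0.0.0/16"], public=False)

for rule in public_rules + internal_rules:
    print(rule["from_port"], rule["cidr_blocks"])
```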
Final Thoughts
It’s worth a try.