Choosing an Infrastructure as Code Tool#

Infrastructure as Code tools differ in language, state management, provider ecosystem, and operational model. The choice affects how your team writes, reviews, tests, and maintains infrastructure definitions for years. Switching IaC tools mid-project is possible but expensive – it typically means rewriting all definitions and carefully importing existing resources into the new tool’s state.

Decision Criteria#

Before comparing tools, establish what matters to your organization:

  • Language preference: Does the team want a domain-specific language (HCL) or general-purpose programming languages (Python, Go, TypeScript)?
  • Cloud strategy: Single cloud, multi-cloud, or hybrid with on-premises?
  • State management: Who manages state, where is it stored, and how is it locked?
  • Team expertise: What does the team already know? IaC tool adoption is constrained by the team’s willingness to learn.
  • Ecosystem: How many providers, modules, and community examples exist for the resources you need to manage?
  • Testing: How important are unit testing, integration testing, and policy-as-code for your infrastructure?
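One way to make these criteria explicit is a weighted scoring matrix. The sketch below is only an illustration: the weights, the 1-5 rating scale, and the `tool_a`/`tool_b` ratings are all hypothetical placeholders, and the function names are not taken from any IaC tool.

```python
# Criteria mirror the list above. The weights are hypothetical; set them to
# reflect your own organization's priorities (higher means more important).
WEIGHTS = {
    "language": 2,
    "cloud_strategy": 3,
    "state_management": 2,
    "team_expertise": 3,
    "ecosystem": 3,
    "testing": 1,
}


def weighted_score(ratings: dict[str, int], weights: dict[str, int] = WEIGHTS) -> float:
    """Return the weighted average of 1-5 ratings across all criteria."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_tools(candidates: dict[str, dict[str, int]]) -> list[tuple[str, float]]:
    """Sort candidate tools from highest to lowest weighted score."""
    scored = [(name, round(weighted_score(r), 2)) for name, r in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative ratings only; replace them with your own evaluation.
example = {
    "tool_a": {"language": 3, "cloud_strategy": 5, "state_management": 4,
               "team_expertise": 4, "ecosystem": 5, "testing": 3},
    "tool_b": {"language": 5, "cloud_strategy": 4, "state_management": 4,
               "team_expertise": 3, "ecosystem": 3, "testing": 5},
}

if __name__ == "__main__":
    for name, score in rank_tools(example):
        print(f"{name}: {score}")
```

The numbers matter less than the exercise of agreeing on the weights before anyone argues for a favorite tool.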

Terraform and OpenTofu#

Terraform uses HCL (HashiCorp Configuration Language), a declarative DSL designed for infrastructure definition. It has the largest provider ecosystem of any IaC tool – over 4000 providers covering AWS, Azure, GCP, Kubernetes, GitHub, Datadog, PagerDuty, and almost anything with an API. State is stored in a state file: local by default, with remote backends such as S3, GCS, or Terraform Cloud (since renamed HCP Terraform) for teams.

OpenTofu is an open-source fork of Terraform, created after HashiCorp changed Terraform’s license from MPL to BSL in August 2023. OpenTofu is a Linux Foundation project and maintains compatibility with Terraform’s provider ecosystem and HCL syntax.

Choose Terraform/OpenTofu when:

  • You need multi-cloud support – Terraform’s provider ecosystem is unmatched for managing resources across AWS, Azure, GCP, and SaaS services in a single codebase
  • The team is comfortable learning HCL (or already knows it)
  • You want the broadest community – the most Stack Overflow answers, blog posts, modules, and examples
  • You need mature provider support for niche services (DNS providers, monitoring tools, identity providers)
  • You value a stable, well-understood state management model with remote backends and locking
  • For OpenTofu specifically: you want truly open-source licensing (MPL 2.0) and are concerned about HashiCorp’s BSL restrictions

Limitations: HCL has a learning curve for developers accustomed to general-purpose languages. Complex logic (conditional resource creation, dynamic blocks, for_each with maps) in HCL can become difficult to read and maintain. State management requires careful planning: state file corruption or unnoticed drift can cause real outages. The BSL license for Terraform restricts competitive use, which led to the OpenTofu fork. Module versioning and dependency management are less sophisticated than the package managers of general-purpose languages.

Pulumi#

Pulumi lets you define infrastructure using general-purpose programming languages: Python, Go, TypeScript, C#, and Java. It also accepts YAML for simple programs that need no logic. Instead of learning a DSL, you use the language your team already knows, with full access to loops, conditionals, functions, classes, and the language’s standard library. State is managed by the Pulumi Cloud service (free tier available) or self-hosted backends (S3, Azure Blob, GCS, local filesystem).

Choose Pulumi when:

  • Your team strongly prefers writing infrastructure in Python, TypeScript, Go, or C# rather than learning HCL
  • Infrastructure definitions require complex logic – dynamic resource generation based on configuration, API calls to external systems during planning, or conditional compositions that are awkward in HCL
  • You want IDE support with type checking, autocompletion, and inline documentation for infrastructure resources
  • Testing with standard language frameworks matters – unit tests with pytest, Jest, or Go’s testing package, applied to infrastructure code
  • You are building reusable infrastructure components that benefit from object-oriented or functional design patterns
  • Your team has strong software engineering practices and wants to apply them to infrastructure

Limitations: The provider ecosystem is smaller than Terraform’s. Many Pulumi providers are auto-generated wrappers around Terraform providers (bridged providers), which means they lag behind Terraform provider releases and sometimes have quirks in the translation. State management still requires a backend – the Pulumi Cloud service adds a dependency, and self-hosted backends require the same planning as Terraform remote state. Debugging is harder when the abstraction between your code and the cloud API spans both the Pulumi SDK and a bridged Terraform provider. The community is smaller, so finding examples and solutions for specific problems takes more effort.
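The appeal of loops, conditionals, and ordinary unit tests can be sketched without any IaC SDK. In the example below, `BucketSpec` and `buckets_for` are hypothetical stand-ins, not Pulumi types. A real Pulumi program would construct the SDK's resource classes instead and would need the SDK's own testing support, which is not shown here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BucketSpec:
    """Hypothetical stand-in for a resource definition; not a Pulumi type."""
    name: str
    versioned: bool
    tags: dict[str, str] = field(default_factory=dict)


def buckets_for(environments: list[str], team: str) -> list[BucketSpec]:
    """Generate one bucket definition per environment from plain configuration."""
    specs = []
    for env in environments:
        specs.append(BucketSpec(
            name=f"{team}-{env}-artifacts",
            versioned=(env == "prod"),  # conditional logic in ordinary code
            tags={"team": team, "environment": env},
        ))
    return specs


def test_only_prod_is_versioned() -> None:
    """A pytest-style unit test that runs without touching any cloud API."""
    specs = buckets_for(["dev", "staging", "prod"], team="payments")
    assert [s.name for s in specs] == [
        "payments-dev-artifacts",
        "payments-staging-artifacts",
        "payments-prod-artifacts",
    ]
    assert [s.versioned for s in specs] == [False, False, True]
```

Keeping the generation logic in pure functions like this is what makes it testable; the cloud calls stay at the edge of the program.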

CloudFormation, Bicep, and Deployment Manager#

Cloud-native IaC tools are built and maintained by the cloud providers themselves. AWS CloudFormation uses JSON or YAML templates. Azure Bicep is a DSL that compiles to ARM templates. Google Cloud Deployment Manager uses YAML with Jinja2 or Python templates.

Choose cloud-native IaC when:

  • You are committed to a single cloud provider and lock-in is an acceptable tradeoff
  • You want the tightest integration with the provider’s services – new features often reach CloudFormation/Bicep before third-party tools, although not always
  • Compliance or organizational policy requires using the provider’s native tooling
  • You do not want to manage external state files – CloudFormation and Bicep manage state internally within the cloud provider
  • You need stack-level operations that the provider handles natively (CloudFormation stack policies, drift detection, change sets)
  • Azure specifically: Bicep is a genuinely good DSL with clear syntax, excellent VS Code support, and a better developer experience than ARM templates

Limitations: Vendor lock-in is total – CloudFormation cannot manage Azure resources and Bicep cannot manage AWS resources. If you ever go multi-cloud, you must adopt a second tool. CloudFormation’s YAML/JSON syntax is verbose and error-prone for large templates. Deployment Manager (GCP) is the weakest of the three and Google has been directing users toward Terraform instead. Error messages from CloudFormation are notoriously unhelpful. Rollback behavior during stack updates can leave resources in inconsistent states that require manual intervention.

Crossplane#

Crossplane extends the Kubernetes API to manage cloud infrastructure. You define cloud resources (RDS instances, S3 buckets, VPCs) as Kubernetes Custom Resources, and Crossplane controllers reconcile them with the cloud provider’s API. This makes Kubernetes the universal control plane for both applications and infrastructure.

Choose Crossplane when:

  • Your team is deeply invested in Kubernetes and wants to manage everything (applications and infrastructure) through the Kubernetes API
  • GitOps for infrastructure is a goal – ArgoCD or Flux can manage Crossplane resources the same way they manage application deployments
  • You want continuous reconciliation for infrastructure (like GitOps for applications) – Crossplane controllers detect drift and correct it automatically
  • You are building an internal platform where teams self-service infrastructure through Kubernetes CRDs (Crossplane Compositions let you create opinionated abstractions)
  • You value the Kubernetes ecosystem – RBAC, namespaces, admission webhooks, and audit logging all apply to infrastructure resources

Limitations: Requires a running Kubernetes cluster to manage infrastructure – you have a chicken-and-egg problem for the cluster that runs Crossplane itself. Provider support is less mature than Terraform – AWS, Azure, and GCP providers exist but coverage of individual services and API fields is not as complete. The Kubernetes API model adds complexity for resources that do not naturally fit CRD patterns (complex nested configurations, cross-resource dependencies). The learning curve assumes deep Kubernetes knowledge. Composition (Crossplane’s abstraction mechanism) has its own complexity that teams must learn alongside Kubernetes itself.
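Continuous reconciliation is easier to reason about with a toy model. The sketch below is not Crossplane code: `FakeCloud` is a hypothetical in-memory stand-in for a cloud API, and `reconcile` shows only the general shape of a control loop that compares desired state with observed state and corrects the difference.

```python
def diff(desired: dict, observed: dict) -> dict:
    """Return the desired fields whose observed value differs."""
    return {k: v for k, v in desired.items() if observed.get(k) != v}


class FakeCloud:
    """In-memory stand-in for a cloud provider API (hypothetical)."""

    def __init__(self) -> None:
        self.resources: dict[str, dict] = {}

    def get(self, name: str):
        return self.resources.get(name)  # None when the resource is absent

    def create(self, name: str, spec: dict) -> None:
        self.resources[name] = dict(spec)

    def update(self, name: str, changes: dict) -> None:
        self.resources[name].update(changes)


def reconcile(cloud: FakeCloud, desired_state: dict[str, dict]) -> list[str]:
    """Run one reconciliation pass and return the actions that were taken."""
    actions = []
    for name, spec in desired_state.items():
        observed = cloud.get(name)
        if observed is None:
            cloud.create(name, spec)
            actions.append(f"create {name}")
            continue
        drift = diff(spec, observed)
        if drift:
            cloud.update(name, drift)  # push the resource back to the declared spec
            actions.append(f"update {name}: {sorted(drift)}")
    return actions
```

A controller runs a pass like this repeatedly, so a manual change made outside the declared state is reverted on the next pass. With plan-based tools, the same drift goes unnoticed until someone runs a plan.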

CDK Variants (AWS CDK, CDKTF)#

AWS CDK lets you write CloudFormation in TypeScript, Python, Go, C#, or Java – it synthesizes to CloudFormation templates. CDKTF (CDK for Terraform) follows the same model but synthesizes to Terraform configuration in JSON syntax, which Terraform then plans and applies. These are bridge tools that give you programming language capabilities with established IaC backends.

Choose CDK/CDKTF when:

  • You want programming language expressiveness but with CloudFormation or Terraform as the deployment engine
  • Your team already uses CloudFormation or Terraform and wants to layer programming languages on top without switching the entire stack
  • You value the construct library model – high-level abstractions that encode best practices (AWS CDK’s L2/L3 constructs)
  • CDKTF specifically: you want Terraform’s provider ecosystem with TypeScript or Python as the authoring language

Limitations: Added abstraction layer – debugging requires understanding both the CDK layer and the synthesized output (a CloudFormation template or Terraform JSON). CDK constructs can be opinionated in ways that conflict with your requirements, and overriding them requires understanding the underlying template. CDKTF generates Terraform JSON that is not intended to be human-readable, which makes it harder to trace a plan or state problem back to the code that produced it.
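The synthesis step itself is simple to picture. The sketch below does not use the AWS CDK library: `Stack` and `versioned_bucket` are hypothetical helpers that show how a higher-level construct can expand into a plain CloudFormation template. The template keys and the `AWS::S3::Bucket` versioning property follow CloudFormation's documented format.

```python
import json


class Stack:
    """Hypothetical, minimal stand-in for a CDK-style stack."""

    def __init__(self, description: str) -> None:
        self.description = description
        self._resources: dict[str, dict] = {}

    def add_resource(self, logical_id: str, resource_type: str, properties: dict) -> None:
        if logical_id in self._resources:
            raise ValueError(f"duplicate logical ID: {logical_id}")
        self._resources[logical_id] = {"Type": resource_type, "Properties": properties}

    def synth(self) -> dict:
        """Return the template as a dict that can be serialized to JSON."""
        return {
            "AWSTemplateFormatVersion": "2010-09-09",
            "Description": self.description,
            "Resources": self._resources,
        }


def versioned_bucket(stack: Stack, logical_id: str) -> None:
    """A tiny 'construct': one call that encodes an opinion (versioning on)."""
    stack.add_resource(
        logical_id,
        "AWS::S3::Bucket",
        {"VersioningConfiguration": {"Status": "Enabled"}},
    )


if __name__ == "__main__":
    stack = Stack("Synthesis sketch")
    versioned_bucket(stack, "LogsBucket")
    print(json.dumps(stack.synth(), indent=2))
```

Even in this toy version, a wrong property has to be traced from the template back through the helper that produced it. That is the debugging cost described above, and it grows with every layer of constructs.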

Comparison Table#

| Criteria | Terraform/OpenTofu | Pulumi | CloudFormation/Bicep | Crossplane | CDK/CDKTF |
| --- | --- | --- | --- | --- | --- |
| Language | HCL | Python, Go, TS, C#, Java | YAML/JSON (CF), Bicep DSL | YAML (K8s CRDs) | TS, Python, Go, C#, Java |
| Multi-cloud | Yes (4000+ providers) | Yes (smaller ecosystem) | No (single provider) | Yes (growing) | Depends on backend |
| State management | External (S3, GCS, TF Cloud) | Pulumi Cloud or external | Provider-managed | Kubernetes etcd | Backend-dependent |
| Provider ecosystem | Largest | Large (many bridged) | Single provider, complete | Growing, gaps exist | Backend’s ecosystem |
| Learning curve | Medium (HCL) | Low (known languages) | Low-Medium | High (requires K8s) | Medium |
| Testing support | terraform test, Terratest | Native language testing | CloudFormation Guard, cfn-lint | K8s admission webhooks | Native language testing |
| Drift detection | terraform plan (manual) | pulumi preview (manual) | Built-in (CF), manual (Bicep) | Continuous (controller) | Backend-dependent |
| Community size | Very large | Medium | Large (AWS-specific) | Growing | Medium |
| License | BSL (TF), MPL (OpenTofu) | Apache 2.0 | Proprietary (free to use) | Apache 2.0 | Apache 2.0 (AWS CDK), MPL 2.0 (CDKTF) |

The Pragmatic Reality#

For most teams starting from zero, Terraform or OpenTofu is the pragmatic default. The provider ecosystem is so much larger than the alternatives that practical coverage trumps language preference for most organizations. You can manage AWS, Azure, GCP, Kubernetes, GitHub, Cloudflare, Datadog, and hundreds of other services with a single tool and a single state management model.

Switch to something else when you have a specific reason: Pulumi when HCL is genuinely blocking your team’s productivity and you are confident the providers you need exist. Crossplane when Kubernetes-native management and continuous reconciliation are architectural requirements. CloudFormation/Bicep when you are single-cloud and the organizational mandate exists. CDK when you want programming languages but cannot leave the CloudFormation or Terraform ecosystem.

Common Mistakes#

Choosing Pulumi for the language, then discovering provider gaps. The Pulumi provider for a service may exist but cover only part of the API surface that Terraform’s provider covers. Check the specific resources and properties you need before committing.
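That check can be as simple as a set comparison, as in the sketch below. The resource and property names are made up for illustration, and the `supported` mapping is an assumption: in practice you would have to assemble it from the provider's own documentation or schema.

```python
def coverage_gaps(
    required: dict[str, set[str]],
    supported: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per resource, the required properties the provider lacks."""
    gaps: dict[str, set[str]] = {}
    for resource, props in required.items():
        if resource not in supported:
            gaps[resource] = set(props)  # the whole resource is missing
            continue
        missing = props - supported[resource]
        if missing:
            gaps[resource] = missing
    return gaps


# Hypothetical inputs: what the project needs vs. what a provider exposes.
required = {
    "dns_record": {"name", "ttl", "proxied"},
    "waf_rule": {"expression"},
}
supported = {
    "dns_record": {"name", "ttl"},
}

if __name__ == "__main__":
    for resource, missing in sorted(coverage_gaps(required, supported).items()):
        print(f"{resource}: missing {sorted(missing)}")
```

An empty result is the signal that the provider covers what you actually plan to use, which is a narrower and more useful question than whether a provider exists at all.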

Choosing Crossplane without deep Kubernetes expertise. Crossplane’s value proposition depends on the team being fluent in Kubernetes concepts – CRDs, controllers, reconciliation loops, RBAC. If the team is still learning Kubernetes for application deployment, adding infrastructure management to the same cluster adds too much cognitive load.

Ignoring state management until it is a problem. Every tool except the cloud-native ones (CloudFormation, Bicep) and Crossplane requires explicit state management decisions. Remote state with locking should be configured from day one, not after two engineers corrupt the state file by running apply simultaneously.
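What locking buys you can be shown with a lock file on a local filesystem. This is a conceptual sketch only, not how any particular backend implements locking. `state_lock` and `StateLockedError` are hypothetical names, and real remote backends rely on their storage service's own locking primitives.

```python
import os
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another run already holds the state lock."""


@contextmanager
def state_lock(lock_path: str, owner: str):
    """Hold an exclusive lock for the duration of an apply-like operation."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(lock_path) as existing:
            holder = existing.read()
        raise StateLockedError(f"state is locked by {holder!r}") from None
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(owner)  # record who holds the lock for error messages
    try:
        yield
    finally:
        os.remove(lock_path)  # always release, even if the operation fails


if __name__ == "__main__":
    import tempfile

    lock_path = os.path.join(tempfile.mkdtemp(), "state.lock")
    with state_lock(lock_path, "alice"):
        try:
            with state_lock(lock_path, "bob"):
                pass
        except StateLockedError as err:
            print(f"second apply refused: {err}")
```

The second run fails fast instead of writing over the first run's state, which is the behavior to insist on from whichever backend you choose.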

Over-engineering with CDK abstractions. CDK constructs make it easy to build deep abstraction hierarchies. When something goes wrong, debugging through four layers of constructs to find the generated CloudFormation property that is incorrect is significantly harder than writing the resource definition directly.