# GCP Terraform Patterns
GCP’s Terraform provider (google and google-beta) has patterns distinct from both AWS and Azure. The biggest differences: APIs must be explicitly enabled per project, IAM uses a binding model (not inline policies), and GKE requires secondary IP ranges for VPC-native networking. GCP resources also tend to have longer creation times with more eventual consistency.
## Projects and API Enablement
Before creating any resource in GCP, the corresponding API must be enabled in the project. This is the most common source of first-time failures.
```hcl
variable "project_id" {
  type        = string
  description = "GCP project ID (not the project number)"
}

# Enable required APIs
resource "google_project_service" "apis" {
  for_each = toset([
    "compute.googleapis.com",
    "container.googleapis.com",
    "sqladmin.googleapis.com",
    "servicenetworking.googleapis.com",
    "iam.googleapis.com",
    "cloudresourcemanager.googleapis.com",
  ])

  project            = var.project_id
  service            = each.value
  disable_on_destroy = false # do not disable the API when Terraform destroys this resource
}
```

Gotcha: API enablement is eventually consistent. The API can report as enabled before it is fully ready. Make resources depend on the API enablement with depends_on, and if that is not enough, put a short time_sleep between them:
```hcl
resource "time_sleep" "api_warmup" {
  depends_on      = [google_project_service.apis]
  create_duration = "30s"
}

resource "google_container_cluster" "main" {
  depends_on = [time_sleep.api_warmup]
  # ...
}
```

Gotcha: disable_on_destroy = false is critical. The default is true, so without it terraform destroy disables the API. That can break or delete every resource that depends on the API, including resources managed by other Terraform configurations.
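The snippets in this section also use var.region, which is never declared, time_sleep from the hashicorp/time provider, and kubernetes_service_account from the hashicorp/kubernetes provider. Below is a minimal root-module sketch that declares these. The default region is an illustrative assumption. Pin provider versions to whatever your project actually uses.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    google-beta = {
      source = "hashicorp/google-beta"
    }
    time = {
      source = "hashicorp/time" # provides time_sleep
    }
    kubernetes = {
      source = "hashicorp/kubernetes" # provides kubernetes_service_account
    }
  }
}

variable "region" {
  type        = string
  description = "Region for regional resources (subnet, GKE, Cloud SQL)"
  default     = "us-central1" # illustrative default only
}

# Default project and region for every google resource that omits them
provider "google" {
  project = var.project_id
  region  = var.region
}

provider "google-beta" {
  project = var.project_id
  region  = var.region
}
```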
## IAM Binding Patterns
GCP IAM has three Terraform resource types for project-level grants. Using the wrong one silently overwrites permissions.
```hcl
# google_project_iam_member: ADDITIVE, always safe.
# Adds one member to one role. Does not affect other members of that role.
resource "google_project_iam_member" "gke_logging" {
  project = var.project_id
  role    = "roles/logging.logWriter"
  member  = "serviceAccount:${google_service_account.gke_nodes.email}"
}

# google_project_iam_binding: AUTHORITATIVE for the role.
# Sets the COMPLETE list of members for a role and removes anyone not listed.
# DANGEROUS: can silently remove permissions granted by other Terraform configs or by hand.
resource "google_project_iam_binding" "editors" {
  project = var.project_id
  role    = "roles/editor"
  members = [
    "user:admin@example.com",
    "serviceAccount:ci@project.iam.gserviceaccount.com",
  ]
  # Anyone else who had roles/editor? Gone.
}

# google_project_iam_policy: AUTHORITATIVE for the ENTIRE project.
# Sets ALL IAM bindings for the project and removes everything not listed.
# EXTREMELY DANGEROUS: can lock you out of the project.
# Almost never use this.
```

Rule for agents: Always use google_project_iam_member. Never use google_project_iam_binding unless you are certain you control all members of that role. Never use google_project_iam_policy.
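When several principals need the same role, you can keep the additive behavior by looping google_project_iam_member over the members instead of switching to google_project_iam_binding. This sketch reuses the member strings from the binding example above. The resource name editors_additive is hypothetical.

```hcl
# Additive equivalent of the google_project_iam_binding example:
# one google_project_iam_member per principal, so existing holders of
# roles/editor that are not listed here are left untouched.
resource "google_project_iam_member" "editors_additive" {
  for_each = toset([
    "user:admin@example.com",
    "serviceAccount:ci@project.iam.gserviceaccount.com",
  ])

  project = var.project_id
  role    = "roles/editor"
  member  = each.value
}
```

If you remove an entry from the set, Terraform revokes only that principal's grant. Other holders of the role keep their access.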
## Service Accounts
```hcl
resource "google_service_account" "app" {
  account_id   = "my-app-sa"
  display_name = "My Application Service Account"
  project      = var.project_id
}

# Grant specific permissions
resource "google_project_iam_member" "app_storage" {
  project = var.project_id
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:${google_service_account.app.email}"
}

resource "google_project_iam_member" "app_sql" {
  project = var.project_id
  role    = "roles/cloudsql.client"
  member  = "serviceAccount:${google_service_account.app.email}"
}
```

Gotcha: GCP IAM changes are eventually consistent (typically 60 seconds, can be up to 7 minutes). If a resource fails with PERMISSION_DENIED immediately after a role is granted, the cause may be a propagation delay rather than a missing permission.
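A resource created in the same apply may repeatedly hit that propagation window. One option is to put a time_sleep between the grants and the resource that uses them, the same approach used for API enablement. This sketch assumes the hashicorp/time provider. The name iam_propagation and the 60s duration are illustrative choices, and the consuming resource is left as a placeholder.

```hcl
# Wait after the IAM grants before creating anything that relies on them.
resource "time_sleep" "iam_propagation" {
  depends_on = [
    google_project_iam_member.app_storage,
    google_project_iam_member.app_sql,
  ]
  create_duration = "60s"
}

# Any resource that runs as google_service_account.app and needs those
# roles at creation time should depend on the sleep, for example:
# resource "..." "consumer" {
#   depends_on = [time_sleep.iam_propagation]
# }
```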
## VPC Networking with Secondary Ranges
GKE requires VPC-native networking with secondary IP ranges for pods and services:
```hcl
resource "google_compute_network" "main" {
  name                    = "production-vpc"
  auto_create_subnetworks = false
  project                 = var.project_id
}

resource "google_compute_subnetwork" "gke" {
  name          = "gke-subnet"
  project       = var.project_id
  region        = var.region
  network       = google_compute_network.main.id
  ip_cidr_range = "10.0.0.0/24" # node IPs

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/16" # 65,536 pod IPs
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.2.0.0/20" # 4,096 service IPs
  }

  private_ip_google_access = true # nodes can reach Google APIs without an external IP
}
```

Gotcha: auto_create_subnetworks = false is essential. The default (true) creates a subnet with a /20 CIDR in every region, which is almost never what you want.
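Gotcha: Secondary range sizing matters. GKE does not hand out pod IPs one at a time from the pods range. Each node receives its own block, sized at roughly twice max_pods_per_node and rounded up to a power of two. At the default of 110 pods per node, that block is a /24 (256 addresses). The node count a pods range supports is therefore 2^(24 - prefix length): a /16 (65,536 addresses) supports 256 nodes, not 65,536 / 110 ≈ 600. Lowering max_pods_per_node shrinks the per-node block and raises the node limit.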
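You can keep this arithmetic next to the network definition so it is rechecked whenever the CIDRs change. This sketch uses Terraform's built-in pow() function and an output precondition, which requires Terraform 1.2 or later. The local and output names and the 200-node target are hypothetical. The /24 per-node block assumes the default 110 max pods per node.

```hcl
locals {
  pods_range_prefix = 16 # prefix length of the "pods" secondary range
  node_pod_prefix   = 24 # per-node pod block at 110 max pods per node

  # Each node consumes one /24 from the pods range, so a /16 holds
  # 2^(24 - 16) = 256 node blocks.
  max_nodes_by_pod_range = pow(2, local.node_pod_prefix - local.pods_range_prefix)

  desired_max_nodes = 200 # hypothetical capacity target
}

output "gke_max_nodes_by_pod_range" {
  value = local.max_nodes_by_pod_range

  # Stops the plan with an error if the pods range is too small.
  precondition {
    condition     = local.max_nodes_by_pod_range >= local.desired_max_nodes
    error_message = "The pods secondary range cannot hold desired_max_nodes."
  }
}
```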
Gotcha: private_ip_google_access = true is required for private GKE nodes to reach Artifact Registry (or the legacy Container Registry), Cloud APIs, and other Google services without NAT.
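Private Google Access covers only Google endpoints. Private nodes may also need outbound access to other hosts, for example to pull images from a public registry outside Google. A common companion for that case is Cloud NAT on a Cloud Router. This is a sketch: the resource names are hypothetical, and NAT-ing every subnet range is a simplifying choice.

```hcl
resource "google_compute_router" "nat" {
  name    = "production-router"
  project = var.project_id
  region  = var.region
  network = google_compute_network.main.id
}

resource "google_compute_router_nat" "nat" {
  name    = "production-nat"
  project = var.project_id
  region  = var.region
  router  = google_compute_router.nat.name

  # Let Google allocate the external NAT IPs automatically.
  nat_ip_allocate_option = "AUTO_ONLY"

  # NAT all primary and secondary ranges of every subnet in the region.
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
```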
## GKE Configuration
```hcl
resource "google_container_cluster" "main" {
  name       = "production"
  project    = var.project_id
  location   = var.region # regional cluster (HA across zones)
  network    = google_compute_network.main.id
  subnetwork = google_compute_subnetwork.gke.id

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Remove the default node pool and manage node pools separately
  remove_default_node_pool = true
  initial_node_count       = 1

  # Workload Identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Private cluster
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false # allow kubectl from the internet (true for fully private)
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Release channel for auto-upgrades
  release_channel {
    channel = "REGULAR" # RAPID, REGULAR, or STABLE
  }

  # Network policy enforcement (Calico). The addon must also be enabled,
  # otherwise the network_policy block below has no effect.
  addons_config {
    network_policy_config {
      disabled = false
    }
  }

  network_policy {
    enabled  = true
    provider = "CALICO"
  }

  depends_on = [google_project_service.apis]
}

resource "google_container_node_pool" "main" {
  name     = "production-nodes"
  project  = var.project_id
  location = var.region
  cluster  = google_container_cluster.main.name

  # In a regional cluster these counts are PER ZONE
  # (a region with 3 zones starts with 3 x 3 = 9 nodes).
  initial_node_count = 3

  autoscaling {
    min_node_count = 2  # per zone
    max_node_count = 10 # per zone
  }

  node_config {
    machine_type    = "e2-standard-4"
    service_account = google_service_account.gke_nodes.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]

    workload_metadata_config {
      mode = "GKE_METADATA" # required for Workload Identity
    }

    shielded_instance_config {
      enable_secure_boot = true
    }
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
}
```

Gotcha: remove_default_node_pool = true requires initial_node_count = 1 (or more). The cluster is created with a default pool, which is then deleted immediately. Without initial_node_count, Terraform fails.
Gotcha: master_ipv4_cidr_block must be a /28 that does not overlap any range in the VPC, including the secondary ranges. An overlapping block produces a confusing error about CIDR range conflicts.
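The node pool and the IAM example earlier both reference google_service_account.gke_nodes, which this document never defines. Below is a sketch of a dedicated node service account with limited privileges. The account_id is hypothetical. The roles follow the logging and monitoring roles that GKE hardening guidance commonly lists for node service accounts, so verify the list against current GKE documentation. roles/logging.logWriter is left out because google_project_iam_member.gke_logging already grants it, and two resources managing the same grant can revoke each other's work.

```hcl
resource "google_service_account" "gke_nodes" {
  account_id   = "gke-nodes" # hypothetical ID
  display_name = "GKE node service account"
  project      = var.project_id
}

# Additive grants only, so no other principal's access is overwritten.
resource "google_project_iam_member" "gke_nodes" {
  for_each = toset([
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
    "roles/stackdriver.resourceMetadata.writer",
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.gke_nodes.email}"
}
```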
## GKE Workload Identity
```hcl
# GCP service account for the workload
resource "google_service_account" "workload" {
  account_id   = "my-app-workload"
  display_name = "My App Workload Identity"
  project      = var.project_id
}

# Allow the K8s service account to impersonate the GCP service account
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.workload.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[default/my-app]"
}

# Grant the GCP SA the permissions it needs
resource "google_project_iam_member" "workload_storage" {
  project = var.project_id
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:${google_service_account.workload.email}"
}

# K8s service account annotated with the GCP SA
resource "kubernetes_service_account" "app" {
  metadata {
    name      = "my-app"
    namespace = "default"
    annotations = {
      "iam.gke.io/gcp-service-account" = google_service_account.workload.email
    }
  }
}
```

Gotcha: The member format for the Workload Identity binding is serviceAccount:{project}.svc.id.goog[{namespace}/{sa-name}]. The brackets are literal: they are part of the member string, not formatting.
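kubernetes_service_account only works if a kubernetes provider can reach the cluster. One common pattern authenticates with the credentials Terraform is already using for Google, via the google_client_config data source. This is a sketch. It assumes the machine running Terraform can reach the public control-plane endpoint, which the enable_private_endpoint = false setting above allows.

```hcl
# Access token for the identity Terraform is running as.
data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = "https://${google_container_cluster.main.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.main.master_auth[0].cluster_ca_certificate)
}
```

Configuring a provider from attributes of a cluster created in the same configuration works, but it can be fragile on the first apply or when the cluster is replaced. For that reason, many teams keep in-cluster resources in a separate configuration.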
## Cloud SQL with Private Networking
```hcl
# Reserve an IP range for service networking
resource "google_compute_global_address" "private_ip" {
  name          = "sql-private-ip"
  project       = var.project_id
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.main.id
}

# Create the peering connection
resource "google_service_networking_connection" "private_vpc" {
  network                 = google_compute_network.main.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_ip.name]
  depends_on              = [google_project_service.apis]
}

resource "google_sql_database_instance" "main" {
  name             = "production-postgres"
  project          = var.project_id
  database_version = "POSTGRES_15"
  region           = var.region

  settings {
    tier              = "db-custom-2-8192"
    disk_size         = 50 # GB
    disk_autoresize   = true
    availability_type = "REGIONAL"

    ip_configuration {
      ipv4_enabled    = false # no public IP
      private_network = google_compute_network.main.id
    }

    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
      start_time                     = "03:00"
    }

    maintenance_window {
      day  = 7 # Sunday
      hour = 3
    }
  }

  deletion_protection = true
  depends_on          = [google_service_networking_connection.private_vpc]
}
```

Gotcha: The service networking connection must exist before Cloud SQL can use private IP. The depends_on is mandatory because the instance only references the network, not the connection. Without it, Terraform creates both in parallel and the database creation fails.
Gotcha: Cloud SQL instance names must be unique within the project and cannot be reused for up to 7 days after deletion. If you destroy and recreate, use a different name or wait.
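One way to avoid the name-reuse window is to append a random suffix, so each recreated instance gets a fresh name. This sketch uses the hashicorp/random provider. The db_suffix and sql_instance_name names and the 4-byte length are arbitrary choices.

```hcl
# Stable for the life of this resource; a new value is generated only
# when the random_id itself is destroyed and recreated.
resource "random_id" "db_suffix" {
  byte_length = 4 # 8 lowercase hex characters
}

locals {
  # Use as: name = local.sql_instance_name in google_sql_database_instance.main
  sql_instance_name = "production-postgres-${random_id.db_suffix.hex}"
}
```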
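Gotcha: The top-level deletion_protection = true is enforced by the Terraform provider, not by GCP. It makes terraform destroy fail, but the instance can still be deleted from the console or gcloud. The GCP API-level flag is settings.deletion_protection_enabled. Both are separate from Terraform's lifecycle { prevent_destroy }. For production databases, set the API-level flag as well as the Terraform-side protections.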
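This fragment shows all three protections layered on the production instance. Only the relevant arguments are shown, and the rest of the resource is unchanged from the example above. settings.deletion_protection_enabled requires a google provider version recent enough to support it (it was added during the 4.x series).

```hcl
resource "google_sql_database_instance" "main" {
  # ... name, project, region, database_version as above

  # Terraform-side check: destroy or replacement fails while this is true.
  deletion_protection = true

  settings {
    # ... tier, disk, ip_configuration, backups as above

    # GCP-side flag: also blocks deletion from the console, gcloud, and the API.
    deletion_protection_enabled = true
  }

  # Terraform core: any plan that would destroy this resource is rejected.
  lifecycle {
    prevent_destroy = true
  }
}
```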
## Common GCP Terraform Gotchas
| Gotcha | Symptom | Fix |
|---|---|---|
| API not enabled | googleapi: Error 403: API not enabled | Add google_project_service for the API |
| API propagation delay | PERMISSION_DENIED after enabling API | Add time_sleep or a depends_on chain |
| IAM eventual consistency | Permission denied after granting role | Wait 60 seconds and retry. Not a Terraform issue. |
| iam_binding overwrites | Other permissions silently removed | Use google_project_iam_member, never iam_binding |
| Cloud SQL name reuse | Cannot create instance with recently deleted name | Use unique names or wait 7 days |
| Default network exists | Terraform plan shows unexpected resources | Delete the default network or import it |
| GKE secondary ranges required | Cluster creation fails with IP range error | Define secondary ranges on the subnet |
| Private cluster master CIDR | Overlap error with existing ranges | Use a /28 from unused CIDR space (172.16.0.0/28) |
| Service networking dependency | Cloud SQL fails without private networking | Add depends_on for the service networking connection |
| disable_on_destroy default | API disabled on terraform destroy, cascading deletes | Set disable_on_destroy = false on all google_project_service |
| Labels vs tags | GCP uses labels (key-value) not tags (network tags) | Use labels for metadata, tags for firewall targeting |