Terraform 1.9+ and OpenTofu 1.8+
Terraform 1.9+ and OpenTofu 1.8+ rules for Cursor. Teaches for_each over count, remote backend with locking, version pinning, moved/removed/import blocks, ephemeral resources for secrets, check blocks, tftest framework, OpenTofu state encryption, OIDC federation. Catches 20 regressions including 0.0.0.0/0 ingress, count for stable resources, ignore_changes=all, null_resource overuse, plaintext secrets, unpinned providers.
terraform-reviewer
Reviews Terraform / OpenTofu HCL for 0.0.0.0/0 ingress on management ports, count for stable resources, local backend in shared modules, unpinned providers/engine, ignore_changes=all, null_resource as a hammer, terraform import CLI without import block, renaming without moved block, workspaces-as-environments, secrets in HCL, static AWS keys, wildcard IAM, unmarked sensitive outputs, public DBs/buckets, IMDSv1, unencrypted volumes, unversioned modules. Use after generating or modifying Terraform code.
# Terraform / OpenTofu Reviewer
You are a Terraform 1.9+ / OpenTofu 1.8+ reviewer. Review HCL changes and flag by severity.
## Critical (security risk, data loss, or production breakage)
- `0.0.0.0/0` ingress on SSH (22), RDP (3389), MySQL (3306), Postgres (5432), Redis (6379), Mongo (27017), or any other management/database port.
- `publicly_accessible = true` on `aws_db_instance` or `aws_rds_cluster`.
- `block_public_acls = false` or missing `aws_s3_bucket_public_access_block`.
- `encrypted = false` or omitted on `aws_ebs_volume`, `aws_rds_cluster`, `aws_db_instance`.
- `http_tokens = "optional"` on EC2 metadata options (IMDSv1 enabled).
- Hardcoded secrets in `variable` defaults, `*.tfvars` files, or HCL literals (passwords, API keys, connection strings).
- `actions = ["*"]` or `resources = ["*"]` in IAM policy statements.
- Static `access_key` / `secret_key` declared in provider blocks (use OIDC / env / shared credentials).
- `backend "local"` in shared / production modules.
- Unmarked sensitive outputs (passwords, tokens).
- `lifecycle { ignore_changes = all }`.
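
As a hedged illustration of the static-key item above, a provider block can rely on web identity federation instead of embedded keys. This is a minimal sketch: the role ARN, token path, and session name are placeholders, not values from any real account.

```hcl
provider "aws" {
  region = "eu-central-1"

  # No access_key / secret_key here. An OIDC token is exchanged for
  # short-lived role credentials at run time.
  assume_role_with_web_identity {
    role_arn                = "arn:aws:iam::123456789012:role/terraform-ci" # hypothetical role
    web_identity_token_file = "/var/run/secrets/oidc/token"                 # hypothetical path
    session_name            = "terraform-ci"                                # hypothetical name
  }
}
```

In many CI systems the same effect is achieved purely through environment variables, in which case the provider block needs only `region`.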
## Warning (regression vs modern Terraform / OpenTofu idioms)
- `count` used for a collection that has stable identity (use `for_each` over `toset`).
- Missing `required_version` in `terraform { }` block.
- Missing `required_providers` block or provider versions unpinned.
- Renaming a resource without an accompanying `moved` block.
- `terraform import` CLI usage in scripts/docs; should use `import {}` block.
- `null_resource` + `local-exec` where a real provider resource exists.
- `terraform_remote_state` for cross-module data lookup where a provider data source would work.
- `depends_on` listed redundantly on a resource that already references the dependency.
- `dynamic` block iterating over a single static element.
- Variables without `type` or `description`.
- Constrained variables without `validation` blocks (e.g., `admin_cidrs` accepts any CIDR including `0.0.0.0/0`).
- `terraform.workspace` interpolated into resource names (workspaces used as environments).
- Module `source = "git::..."` without `?ref=v...` pin.
- Wildcard module include like `for_each = fileset(...)` without a clear schema.
- Unguarded RDS instance: `aws_db_instance` without `backup_retention_period`, `deletion_protection = true`, and `final_snapshot_identifier`.
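
The `admin_cidrs` item above can be sketched as a validated variable. The variable name comes from the list; the exact condition is one possible approach, assuming the module should reject both malformed CIDRs and the IPv4/IPv6 wildcards.

```hcl
variable "admin_cidrs" {
  description = "CIDR ranges allowed to reach management ports"
  type        = list(string)

  validation {
    # cidrhost() fails on malformed input, so can() doubles as a syntax check.
    condition = alltrue([
      for c in var.admin_cidrs :
      can(cidrhost(c, 0)) && c != "0.0.0.0/0" && c != "::/0"
    ])
    error_message = "admin_cidrs must be valid CIDR ranges and must not include 0.0.0.0/0 or ::/0."
  }
}
```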
## Suggestion (style / future-proofing)
- `ephemeral` resources (Terraform 1.10+) instead of `data` for secrets.
- `check` blocks for runtime invariants (cost guards, health probes).
- `*.tftest.hcl` test files alongside modules.
- OpenTofu state encryption when using a custom backend.
- `terraform-docs` injection into module README.
- `default_tags` on provider block for org-wide tagging.
- AWS resource `tags` merged with module-managed local.
- Focused `assert` blocks with descriptive `error_message` values in `*.tftest.hcl` tests.
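
To make the `check` block suggestion concrete, here is a minimal sketch of a health probe. It assumes the `hashicorp/http` provider is available and that a load balancer named `aws_lb.web` exists in the configuration; both the resource name and the `/healthz` path are hypothetical.

```hcl
check "web_health" {
  # A data source scoped to the check: a failure produces a warning,
  # it does not block the plan or apply.
  data "http" "probe" {
    url = "https://${aws_lb.web.dns_name}/healthz"
  }

  assert {
    condition     = data.http.probe.status_code == 200
    error_message = "Health probe returned ${data.http.probe.status_code}, expected 200."
  }
}
```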
## Per-file checks
For each `.tf` / `.tfvars` / `.tftest.hcl` file changed:
1. **Top-level config**: `required_version`, `required_providers`, remote backend with locking.
2. **Resources**: `for_each` over `count` for stable identity, `lifecycle.ignore_changes` scoped to specific attributes (never `all`), encryption flags on storage.
3. **Variables**: typed, described, validated. No defaults that contain secrets.
4. **Outputs**: minimal, named, `sensitive = true` on anything secret-bearing.
5. **IAM**: narrow actions and resources, no wildcards.
6. **Networking**: private DBs, IMDSv2 required, ingress scoped to security groups not CIDRs.
7. **Refactoring**: `moved` for renames, `removed` for retirements, `import` blocks for adoption.
8. **Module sources**: pinned to tags or versions.
9. **Tests**: at least plan-tests for modules with validation blocks.
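
For the networking check, "ingress scoped to security groups not CIDRs" might look like the sketch below. The two security groups (`aws_security_group.db` and `aws_security_group.app`) are hypothetical names assumed to exist elsewhere in the configuration.

```hcl
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  description                  = "Postgres from the app tier only"
  security_group_id            = aws_security_group.db.id  # hypothetical DB security group
  referenced_security_group_id = aws_security_group.app.id # hypothetical app security group
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
}
```

Because the rule references a security group rather than a CIDR, it keeps working when app instances change IP addresses.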
## OpenTofu-specific notes
When reviewing OpenTofu code:
- State + plan encryption can be configured at the engine level (Terraform requires backend-side).
- `tofu test` supports `mock_provider`; so does `terraform test` since Terraform 1.7, so mocked tests are portable across both engines.
- `provider for_each` available (1.9+) for multi-region/multi-account from a single provider block.
- Static variables/locals usable in `module.source` and `backend` blocks (1.8+).
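
A rough sketch of the `provider for_each` note, assuming OpenTofu 1.9+ and a local module at `./modules/vpc`. The variable name, alias, and module path are illustrative only, and the module's other inputs are omitted.

```hcl
variable "regions" {
  description = "Regions to deploy into"
  type        = set(string)
  default     = ["eu-central-1", "eu-west-1"]
}

# One provider block fans out into one instance per region.
provider "aws" {
  alias    = "by_region"
  for_each = var.regions
  region   = each.value
}

module "vpc" {
  source   = "./modules/vpc" # hypothetical module path
  for_each = var.regions

  providers = {
    aws = aws.by_region[each.key]
  }
  # ... module inputs
}
```

This configuration is OpenTofu-specific; Terraform rejects `for_each` on a provider block.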
## Output Format
Group findings by severity. For each:
**file:line** - **severity** - what's wrong - how to fix (with one-line code example).
End with: `N critical, N warnings, N suggestions`.
terraform-migrate-secrets
Migrate secrets out of Terraform HCL: ephemeral resources for Terraform 1.10+, encrypted state for OpenTofu, sensitive data sources for older versions. Remove .tfvars secrets, rotate exposed credentials, document the rotation.
# Migrate Secrets Out of HCL
## When to Use
When auditing a Terraform codebase that contains:
- Hardcoded passwords / API keys in `variable "x" { default = "..." }`.
- Committed `*.tfvars` files with real values for secret-bearing variables.
- Secrets visible in `terraform plan` output (sensitive outputs not flagged).
## Instructions
### Step 1: Inventory the leaks
```bash
# Plaintext secrets in HCL
# ([:space:] is used inside the bracket because "\s" there would exclude a literal "s")
grep -rnE '(password|secret|token|api[_-]?key)\s*=\s*"[^$[:space:]{][^"]+"' --include='*.tf' --include='*.tfvars' .
# Common placeholders
grep -rnE 'default\s*=\s*"(hunter2|changeme|password|admin)"' --include='*.tf' .
# Outputs that should be sensitive
grep -rnE 'output\s+"(password|token|secret|key)' --include='*.tf' . -A 3
```
Make a list of every file:line that needs change.
### Step 2: Rotate any exposed credentials first
If a real secret was committed to git, treat it as compromised. Rotate it (rotate the AWS key, change the DB password, regenerate the API token) before continuing. The git history retains the secret forever; rotating is the only mitigation.
```bash
# Optional: rewrite git history to scrub the secret (destructive, requires force-push)
git filter-repo --replace-text replacements.txt
```
Document the rotation in an incident note.
### Step 3: Move secrets to a secrets manager
For AWS:
```hcl
# in your secrets module or via the AWS console
resource "aws_secretsmanager_secret" "db" {
  name        = "${var.env}/db/password"
  description = "RDS password for ${var.env}"
}
resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = jsonencode({ password = random_password.db.result })
  lifecycle {
    ignore_changes = [secret_string] # subsequent rotations don't show as drift
  }
}
resource "random_password" "db" {
  length  = 32
  special = false
}
```
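The password lives in Secrets Manager. Note that `random_password.db.result` and `secret_string` are still recorded in state (flagged sensitive, but not encrypted by that flag), so this pattern depends on an encrypted, access-controlled state backend. To keep the value out of state entirely, generate and write it through ephemeral resources and write-only arguments where your Terraform and provider versions support them.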
### Step 4: Consume the secret
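**Terraform 1.10+ (ephemeral resources; write-only arguments need 1.11+):**
```hcl
ephemeral "aws_secretsmanager_secret_version" "db" {
  secret_id = "${var.env}/db/password"
}
resource "aws_db_instance" "main" {
  # Ephemeral values cannot feed ordinary arguments such as `password`.
  # They must go into a write-only argument (Terraform 1.11+ and an AWS
  # provider release that offers `password_wo`).
  password_wo         = jsondecode(ephemeral.aws_secretsmanager_secret_version.db.secret_string).password
  password_wo_version = 1 # bump to push a rotated password
}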
```
Ephemeral values are never written to state or plan output. This is the cleanest path.
**Terraform 1.6-1.9 / OpenTofu 1.6-1.7 (sensitive data source):**
```hcl
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "${var.env}/db/password"
}
resource "aws_db_instance" "main" {
  password = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string).password
}
```
Plus encrypt the state (S3 backend with `encrypt = true` and KMS CMK, or OpenTofu native state encryption).
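
For the S3 option mentioned above, a backend block might look like the following sketch. The bucket, key, lock table, and KMS key ARN are placeholders; substitute the values for your own account.

```hcl
terraform {
  backend "s3" {
    bucket         = "acme-tfstate"     # hypothetical bucket
    key            = "db/prod.tfstate"  # hypothetical state key
    region         = "eu-central-1"
    dynamodb_table = "tfstate-locks"    # hypothetical lock table
    encrypt        = true               # server-side encryption for the state object
    # Customer-managed key; placeholder ARN.
    kms_key_id = "arn:aws:kms:eu-central-1:123456789012:key/00000000-0000-0000-0000-000000000000"
  }
}
```

Access to the state then depends on both the bucket policy and the KMS key policy, which narrows who can read the secret values stored in it.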
**OpenTofu state + plan encryption (1.7+):**
```hcl
terraform {
  encryption {
    key_provider "aws_kms" "k" {
      kms_key_id = "alias/tofu-state"
      key_spec   = "AES_256" # required by the aws_kms key provider
      region     = "eu-central-1"
    }
    method "aes_gcm" "m" { keys = key_provider.aws_kms.k }
    state { method = method.aes_gcm.m }
    plan { method = method.aes_gcm.m }
  }
}
```
State and plan files are encrypted at rest by OpenTofu itself, even if the backend is not configured to encrypt.
### Step 5: Remove HCL secrets
```bash
# Remove default = "..." from variable blocks
# (manual edit, sed risky with multiline blocks)
# Delete *.tfvars files containing secrets
# (git rm removes the file from both the index and the working tree)
git rm shared.tfvars
# Update .gitignore
echo "*.tfvars" >> .gitignore
echo "!*.example.tfvars" >> .gitignore # template versions OK
```
Keep `*.example.tfvars` with placeholder values as documentation:
```hcl
# shared.example.tfvars
db_secret_arn = "arn:aws:secretsmanager:eu-central-1:123456789012:secret:prod/db/password-XXXXXX"
admin_cidrs = ["10.0.0.0/8"]
```
### Step 6: Mark sensitive outputs
```hcl
# Before
output "db_endpoint" {
  value = aws_db_instance.main.endpoint
}
# After (if endpoint is sensitive in your threat model)
output "db_endpoint" {
  value     = aws_db_instance.main.endpoint
  sensitive = true
}
```
For password outputs - do not output them at all. Consumers read from Secrets Manager directly.
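
One way to follow that advice, sketched under the assumption that the secret is the `aws_secretsmanager_secret.db` resource from Step 3: expose the secret's ARN so consumers know where to read from, and never the value itself. The output name is illustrative.

```hcl
output "db_secret_arn" {
  description = "ARN of the Secrets Manager secret holding the DB password"
  value       = aws_secretsmanager_secret.db.arn
}
```

Consumers then need IAM permission on that ARN to fetch the password, which keeps access auditable outside Terraform.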
### Step 7: Verify
```bash
# No plaintext secrets in tf or tfvars
grep -rnE '(password|secret|token|api[_-]?key)\s*=\s*"[^$[:space:]{][^"]+"' --include='*.tf' --include='*.tfvars' .
# No tfvars committed
find . -name '*.tfvars' -not -path '*/.terraform/*' | grep -v example
# Plan output contains no secrets
terraform plan | grep -iE 'password|secret|api[_-]?key' || echo "plan clean"
```
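The first two commands should print nothing. In the plan check, attribute names followed by `(sensitive value)` are expected; any literal secret value is a failure. If anything remains, fix it before continuing.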
### Step 8: Document for future engineers
Add a section to the project README:
```markdown
## Secrets
This project does not store secrets in HCL. Secrets live in AWS Secrets Manager
under `<env>/<service>/<name>` paths. Terraform reads them via:
- **ephemeral** resources (1.10+) for runtime-only use.
- **aws_secretsmanager_secret_version** data sources for plan-time references (the value is stored in state, so the state must be encrypted).
Rotation is handled by Secrets Manager rotation lambdas; Terraform does not need
to be re-applied when a secret rotates (the `lifecycle.ignore_changes` on
`aws_secretsmanager_secret_version.secret_string` keeps it out of drift).
```
## Anti-patterns to avoid
- Do not leave a committed secret unrotated, even after scrubbing it from git history. Clones and forks keep the old history forever.
- Do not use `data` sources that put the secret value into state without encrypting the state.
- Do not pass secrets as Terraform variables on the command line - they appear in shell history and CI logs.
- Do not use `terraform output -raw db_password` in scripts. Read from Secrets Manager directly.
- Do not assume `sensitive = true` hides the value everywhere. It only suppresses plan/apply output; the value is in state.
terraform-new-module
Scaffold a reusable Terraform / OpenTofu module: versions.tf with required_version + required_providers, variables.tf with typed + validated inputs, outputs.tf with curated outputs, main.tf with for_each + tagging, README + examples/ + tests/.
# Scaffold Terraform Module
## When to Use
When creating a new reusable module (e.g. VPC, RDS, ECS service) that will be consumed by multiple root modules.
## Instructions
1. Decide the module name and primary resource. Use a single-noun name: `vpc`, `rds`, `ecs-service`. Avoid `aws-vpc-network-module`.
2. Create the directory layout:
```
modules/<name>/
├── README.md
├── versions.tf
├── variables.tf
├── main.tf
├── outputs.tf
├── examples/
│   └── simple/
│       ├── main.tf
│       └── README.md
└── tests/
    └── <name>.tftest.hcl
```
3. Write `versions.tf`:
```hcl
terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}
```
4. Write `variables.tf` with `type`, `description`, and `validation` on every input:
```hcl
variable "name" {
  description = "Logical name (used as a tag prefix and identifier)"
  type        = string
  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{1,30}$", var.name))
    error_message = "name must be lowercase + hyphen, 2-31 chars"
  }
}
variable "tags" {
  description = "Tags merged onto every resource the module manages"
  type        = map(string)
  default     = {}
}
```
5. Write `main.tf` with `for_each` over `count`, explicit tagging, no hardcoded values:
```hcl
locals {
  module_tags = merge(
    var.tags,
    { Module = "<name>", ManagedBy = "Terraform" },
  )
}
resource "aws_<thing>" "this" {
  # ... per-resource args
  tags = merge(local.module_tags, { Name = var.name })
}
```
6. Write `outputs.tf` with curated, named outputs:
```hcl
output "id" {
  description = "ID of the created <thing>"
  value       = aws_<thing>.this.id
}
```
Mark anything secret-bearing `sensitive = true`. Better: do not output secrets at all.
7. Add a simple example under `examples/simple/`:
```hcl
# examples/simple/main.tf
provider "aws" { region = "eu-central-1" }
module "thing" {
  source = "../../"
  name   = "demo"
}
```
8. Add a `*.tftest.hcl` file under `tests/`:
```hcl
run "rejects_invalid_name" {
  command = plan
  variables { name = "Invalid Name With Spaces" }
  expect_failures = [var.name]
}
run "plans_cleanly_with_valid_inputs" {
  command = plan
  variables { name = "test" }
  assert {
    condition     = aws_<thing>.this.tags["Name"] == "test"
    error_message = "Name tag did not propagate"
  }
}
```
9. Generate the README with `terraform-docs`:
```bash
terraform-docs markdown table --output-file README.md --output-mode inject .
```
10. Commit and tag a release:
```bash
git tag v0.1.0
git push origin v0.1.0
```
## Anti-patterns to avoid
- Never `count = length(var.things)` for stable collections - use `for_each = toset(var.things)`.
- Never embed `0.0.0.0/0` in module defaults (validate it out at the variable).
- Never declare `provider "aws"` inside the module - the caller configures the provider.
- Never output the raw resource object - curate the public surface.
- Never `~>` pin the engine inside a module - use `>=` and let root modules pin.
- Never accept `variable "extra" { type = any }` as an escape hatch - design real inputs.
terraform-refactor-with-moved
Safely refactor Terraform code (rename a resource, restructure a module, switch from count to for_each) using moved blocks. Avoids destroy+create cycles that would otherwise drop and recreate live infrastructure.
# Refactor with `moved` Blocks
## When to Use
When changing the address of an existing Terraform-managed resource - by renaming, moving between modules, or switching from `count` to `for_each`. Without a `moved` block, Terraform treats the new address as a new resource: it destroys the object at the old address and creates a fresh one at the new address.
## Instructions
### Step 1: Identify the affected addresses
Run `terraform state list` to see the current state addresses:
```
aws_iam_role.application
aws_iam_role_policy.application_s3
module.vpc.aws_subnet.private[0]
module.vpc.aws_subnet.private[1]
```
### Step 2: Write the `moved` block in HCL
For a simple rename:
```hcl
moved {
  from = aws_iam_role.application
  to   = aws_iam_role.app
}
```
For moving between modules:
```hcl
moved {
  from = aws_iam_role.app
  to   = module.iam.aws_iam_role.app
}
```
For switching from `count` to `for_each`:
```hcl
# Before (count-keyed)
resource "aws_subnet" "private" {
  count             = length(var.azs)
  availability_zone = var.azs[count.index]
  ...
}
# After (for_each-keyed)
resource "aws_subnet" "private" {
  for_each          = toset(var.azs)
  availability_zone = each.value
  ...
}
# moved block, one per element
moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.private["eu-central-1a"]
}
moved {
  from = aws_subnet.private[1]
  to   = aws_subnet.private["eu-central-1b"]
}
```
### Step 3: Plan
```bash
terraform plan
```
You should see:
```
# aws_iam_role.application has moved to aws_iam_role.app
```
Not a destroy + create, and the plan summary should report nothing to add or destroy. If the plan still shows destroy + create, the `moved` block does not match - check both addresses carefully, including module paths and index keys.
### Step 4: Apply
```bash
terraform apply
```
The state file is updated. No infrastructure changes.
### Step 5: Clean up
`moved` blocks live for one release cycle. After every consumer environment has applied the migrated configuration:
- Remove the `moved` blocks from HCL.
- Tag a new module version.
Leaving stale `moved` blocks indefinitely is harmless but adds noise.
## Cross-environment migration
If multiple environments use the same module:
1. Tag a new module version that includes both the resource rename and the `moved` block.
2. Bump each environment's `module "x" { source = "...?ref=vN" }` pin to the new version.
3. Plan + apply each environment. The state file gets migrated.
4. After every environment is on the new tag, you can delete the `moved` blocks and tag a follow-up version.
## When you cannot use `moved`
`moved` cannot migrate resources across:
- Different backends (different state files).
- Different cloud providers.
- Different resource types (`aws_instance` -> `aws_ec2_fleet`).
For these, use `removed { from = ... lifecycle { destroy = false } }` to drop the resource from state without destroying it, then re-adopt at the new address with an `import` block.
## Common mistakes
- **Forgetting the `moved` block**: rename without one and Terraform plans a destroy + create. Always plan before applying.
- **Wrong address syntax**: `moved { from = "aws_iam_role.x" to = "aws_iam_role.y" }` (strings) is wrong. The addresses are bare HCL identifiers, not quoted strings.
- **Mixing `moved` with other changes**: a single PR that both moves resources AND changes their arguments is hard to review. Split: PR 1 moves, PR 2 changes args.
- **Deleting `moved` blocks too early**: if you delete the block before all environments have applied, the un-applied environments will destroy + create on next apply.
## `removed` blocks for retiring resources without destroy
When you want to stop managing a resource but keep it alive (e.g., handing ownership to another team):
```hcl
removed {
  from = aws_iam_role.legacy
  lifecycle {
    destroy = false
  }
}
```
Terraform drops the resource from state on the next apply but does not destroy it. Useful when transferring ownership; rare otherwise.
## `import` blocks for adopting existing infrastructure
Pair with `removed` from the source codebase to do a clean handoff:
```hcl
# in the new codebase
import {
  to = aws_iam_role.app
  id = "my-iam-role-id"
}
resource "aws_iam_role" "app" {
  # ... config that matches the existing role
}
```
After apply: remove the `import` block in a follow-up PR.
terraform-validate
Scan a Terraform / OpenTofu codebase for anti-patterns: 0.0.0.0/0 ingress, count for stable resources, local backend in shared modules, unpinned providers, ignore_changes=all, null_resource overuse, plaintext secrets in HCL or tfvars, unmarked sensitive outputs, public DBs, unencrypted volumes, IMDSv1, wildcard IAM, unversioned modules.
# Validate Terraform / OpenTofu Codebase
## When to Use
When auditing AI-generated HCL, reviewing a PR, or preparing a Terraform repo for production handoff. Output is a list of locations and suggested fixes.
## Instructions
Run each grep against the repo root. Each hit is a candidate; review case by case.
### Security: open ingress
```bash
# 0.0.0.0/0 anywhere - audit each match
grep -rn '"0\.0\.0\.0/0"' --include='*.tf' .
# IPv6 wildcard
grep -rn '"::/0"' --include='*.tf' .
```
Pair these with the resource type. `0.0.0.0/0` on port 443 of a public ALB is acceptable; on SSH or a database port it is not.
### Resource identity
```bash
# count for resources that should be for_each
grep -rnE '^\s*count\s*=' --include='*.tf' .
```
Audit each: if it's `count = var.enabled ? 1 : 0` keep it; otherwise consider `for_each`.
### State management
```bash
# Local backend
grep -rn 'backend\s*"local"' --include='*.tf' .
# Find roots without a backend block
for d in live/*/*/*/; do
test -f "$d/backend.tf" || echo "MISSING backend: $d"
done
```
### Drift suppression
```bash
grep -rn 'ignore_changes\s*=\s*all' --include='*.tf' .
```
Almost always a smell. Replace with a specific attribute list.
### Provider / engine pinning
```bash
# Roots without required_version
for f in $(grep -rl 'terraform\s*{' --include='*.tf' .); do
grep -q 'required_version' "$f" || echo "MISSING required_version: $f"
done
# Providers without version pin
grep -rn 'required_providers' --include='*.tf' . -A 10 | grep -v 'version'
```
### Lifecycle / refactoring
```bash
# CLI imports in scripts/docs (should be import {} blocks)
grep -rn 'terraform\s\+import' --include='*.tf' --include='*.sh' --include='*.md' .
# null_resource - audit each
grep -rn 'null_resource' --include='*.tf' .
grep -rn 'local-exec' --include='*.tf' .
# Module sources without ref pin
grep -rnE 'source\s*=\s*"git::[^"]*"' --include='*.tf' . | grep -v '\?ref='
```
### Secrets
Note: secrets embedded inside `user_data` string content (e.g. `user_data = "DB_PASS=hunter2"`) are not caught by the grep below because the attribute name `user_data` does not contain "password" / "secret". Use `checkov` (CKV_AWS_46, hard-coded secrets in EC2 user data) or `trivy config` for user-data scanning.
```bash
# Plaintext secrets
grep -rnE '(password|secret|token|api[_-]?key)\s*=\s*"[^$[:space:]{][^"]+"' --include='*.tf' --include='*.tfvars' .
# Common placeholder values
grep -rnE 'default\s*=\s*"(hunter2|changeme|password|admin123)"' --include='*.tf' .
# Static AWS credentials in HCL
grep -rnE 'access_key\s*=\s*"' --include='*.tf' .
```
### IAM
```bash
# Wildcard action / resource
grep -rnE 'Action"?\s*:\s*"\*"' --include='*.tf' .
grep -rnE 'Resource"?\s*:\s*"\*"' --include='*.tf' .
# actions = ["*"] in HCL policy documents
grep -rnE 'actions\s*=\s*\[\s*"\*"' --include='*.tf' .
```
### Storage / database
```bash
# Public DBs
grep -rn 'publicly_accessible\s*=\s*true' --include='*.tf' .
# Unencrypted volumes
grep -rnE '(storage_encrypted|encrypted)\s*=\s*false' --include='*.tf' .
# IMDSv1 enabled
grep -rn 'http_tokens\s*=\s*"optional"' --include='*.tf' .
# S3 buckets without public access block
for f in $(grep -rl 'aws_s3_bucket"' --include='*.tf' .); do
grep -q 'aws_s3_bucket_public_access_block' "$f" || echo "MISSING public_access_block: $f"
done
```
### Outputs
```bash
# Outputs whose name suggests a secret, missing sensitive flag
grep -rnE 'output\s+"(password|token|secret|key)' --include='*.tf' . -A 3 | grep -v 'sensitive\s*=\s*true'
```
### Variables
`grep -L` does not work with piped stdin, so audit each variable block per file:
```bash
for f in $(grep -rl 'variable[[:space:]]*"' --include='*.tf' .); do
  python3 - "$f" <<'PY'
import re, sys

text = open(sys.argv[1]).read()
# Find each variable header, then walk the braces so nested blocks
# (validation { ... }) stay inside the captured body.
for m in re.finditer(r'variable\s+"([^"]+)"\s*\{', text):
    name, depth, end = m.group(1), 1, m.end()
    while depth and end < len(text):
        depth += {"{": 1, "}": -1}.get(text[end], 0)
        end += 1
    body = text[m.end():end]
    if not re.search(r'\btype\s*=', body):
        print(f"{sys.argv[1]}: variable \"{name}\" missing type")
    if not re.search(r'\bdescription\s*=', body):
        print(f"{sys.argv[1]}: variable \"{name}\" missing description")
PY
done
```
### tfvars hygiene
```bash
# .tfvars files committed (audit each)
find . -name '*.tfvars' -not -path '*/.terraform/*'
# .tfvars not gitignored (-n lists non-matching paths, prefixed with "::")
find . -name '*.tfvars' -not -path '*/.terraform/*' | git check-ignore -v -n --stdin | grep '^::'
```
### Workspaces as environments
```bash
grep -rn 'terraform\.workspace' --include='*.tf' .
```
Audit each: legitimate use is in cheap parallel state (PR previews), not in resource names for production environments.
## Run automated tools alongside
The greps above are quick triage. For production audits also run:
```bash
terraform fmt -check -recursive
terraform validate
tflint
trivy config .
checkov -d .
```
Tflint / Trivy / Checkov ship pre-built rulesets that catch what greps miss.
## Output Format
```
=== Critical ===
live/prod/web/main.tf:34 - 0.0.0.0/0 on port 22 ingress
modules/iam/main.tf:78 - actions = ["*"] in policy document
shared.tfvars:5 - plaintext db_password value committed
=== Warnings ===
modules/vpc/main.tf:18 - count = length(var.azs) for stable collection
live/prod/web/backend.tf - missing required_version
live/staging/web/main.tf:12 - null_resource + local-exec (provider resource available?)
=== Suggestions ===
modules/rds/variables.tf:5 - missing description on variable "backup_retention"
```
Terraform module design: input variable shape, output curation, versioning via git tags or registry, single-resource vs composite modules, README + terraform-docs, examples directory, semver discipline.
# Terraform Module Design
## When a module is the right answer
A module makes sense when:
- The same set of resources is created in 3+ places.
- The resources together represent a meaningful unit (a VPC, an RDS pair, an ECS service).
- Consumers need a stable contract that survives internal refactoring.
A module is *not* the right answer when:
- The unit is "a single resource with a couple of inputs" - just use the resource.
- The unit is "everything for project X" - that is your root module, not a reusable module.
## Module structure
```
modules/vpc/
├── README.md # usage + terraform-docs output
├── versions.tf # required_version + required_providers
├── variables.tf # all inputs, typed, described, validated
├── main.tf # primary resources
├── data.tf # data sources (optional, keep main.tf focused)
├── outputs.tf # curated, minimal outputs
├── examples/
│   ├── simple/
│   │   ├── main.tf
│   │   └── README.md
│   └── multi-az/
│       └── main.tf
└── tests/
    └── vpc.tftest.hcl
```
`examples/` contains complete, runnable configurations using the module. They serve as documentation and as integration tests.
## versions.tf
```hcl
terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}
```
Modules declare a *minimum* engine and provider version, not a `~>` pin - that responsibility belongs to root modules. Use `>=` to widen compatibility for consumers.
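
For contrast, a root module that consumes such a module would carry the tighter pins. The sketch below is illustrative; the specific version numbers are assumptions, not recommendations.

```hcl
# Root module (for example live/prod/web/versions.tf): pins are tight on purpose.
terraform {
  required_version = "~> 1.9" # illustrative: allows 1.9 and later 1.x releases

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.70" # illustrative: allows 5.70 and later 5.x releases
    }
  }
}
```

The effective constraint is the intersection of the root's pin and every module's `>=` floor, so a wide floor in the module never blocks a root from upgrading.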
## Inputs: minimal surface, maximum validation
```hcl
variable "name" {
  description = "Logical name for this VPC (used as a tag prefix)"
  type        = string
  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{1,30}$", var.name))
    error_message = "name must be lowercase, alphanumeric + hyphen, 2-31 chars"
  }
}
variable "cidr_block" {
  description = "VPC CIDR block (RFC 1918 only)"
  type        = string
  validation {
    # 172.16/12 spans 172.16.x.x through 172.31.x.x, so a plain
    # "172.16." prefix match would wrongly reject most of the range.
    condition = anytrue([
      startswith(var.cidr_block, "10."),
      can(regex("^172\\.(1[6-9]|2[0-9]|3[01])\\.", var.cidr_block)),
      startswith(var.cidr_block, "192.168."),
    ])
    error_message = "cidr_block must be RFC 1918 (10/8, 172.16/12, 192.168/16)"
  }
}
variable "availability_zones" {
  description = "AZs to deploy subnets into"
  type        = list(string)
  validation {
    condition     = length(var.availability_zones) >= 2
    error_message = "at least 2 AZs required for high availability"
  }
}
variable "tags" {
  description = "Tags merged with module-managed tags on every resource"
  type        = map(string)
  default     = {}
}
```
Every input has `description`, `type`. Constrained inputs have `validation`. Optional inputs have `default`.
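
When several optional settings belong together, one typed object with `optional()` defaults (Terraform 1.3+) can keep the input surface small. The `flow_logs` variable below is a hypothetical illustration and is not part of the module described here.

```hcl
variable "flow_logs" {
  description = "VPC flow log settings; omit to keep the defaults"
  type = object({
    enabled        = optional(bool, false)
    retention_days = optional(number, 30)
  })
  default = {}

  validation {
    condition     = var.flow_logs.retention_days >= 1
    error_message = "flow_logs.retention_days must be at least 1"
  }
}
```

A caller can then pass `flow_logs = { enabled = true }` and still receive the default retention.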
## Outputs: curated, named, sensitive-flagged
```hcl
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.this.id
}
output "private_subnet_ids" {
  description = "IDs of the private subnets, in AZ order"
  value       = aws_subnet.private[*].id
}
output "vpc_endpoint_s3_id" {
  description = "ID of the S3 VPC endpoint, if enabled"
  value       = try(aws_vpc_endpoint.s3[0].id, null)
}
```
Outputs are the module's public API. Expose the IDs and ARNs callers need, not the underlying resource objects.
## Tagging
```hcl
locals {
  module_tags = merge(
    {
      "Module"      = "vpc"
      "ManagedBy"   = "Terraform"
      "Environment" = var.env
    },
    var.tags,
  )
}
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
  tags       = merge(local.module_tags, { Name = var.name })
}
```
Module-managed tags + caller-supplied tags merged. Caller can override the module's defaults with the same key.
## Versioning
Tag releases with semver:
```bash
git tag v1.2.0
git push origin v1.2.0
```
Consumers pin to a tag:
```hcl
module "vpc" {
  source = "git::ssh://git@github.com/acme/tf-modules.git//vpc?ref=v1.2.0"
  ...
}
```
For registry-hosted modules:
```hcl
module "vpc" {
  source  = "acme/vpc/aws"
  version = "~> 1.2"
}
```
Semver discipline:
- Patch: bug fix, no input/output contract change.
- Minor: new optional input, new output, backwards-compatible behavior change.
- Major: breaking change (removed input, renamed resource, behavior change that requires consumer modification).
Document every breaking change in `CHANGELOG.md` with a migration note.
## README + terraform-docs
```bash
terraform-docs markdown table --output-file README.md --output-mode inject .
```
Auto-injects the Inputs / Outputs tables from `variables.tf` and `outputs.tf` into the README. Hook it into pre-commit so PRs always have current docs:
```yaml
- id: terraform_docs
  args: [--args=--config=.terraform-docs.yml]
```
## Module tests with `*.tftest.hcl`
```hcl
# tests/vpc.tftest.hcl
run "valid_cidr_required" {
  command = plan
  variables {
    name               = "test"
    cidr_block         = "172.32.0.0/16" # not RFC 1918
    availability_zones = ["eu-central-1a", "eu-central-1b"]
  }
  expect_failures = [var.cidr_block]
}
run "creates_vpc_when_inputs_valid" {
  command = plan
  variables {
    name               = "test"
    cidr_block         = "10.0.0.0/16"
    availability_zones = ["eu-central-1a", "eu-central-1b"]
  }
  assert {
    condition     = aws_vpc.this.cidr_block == "10.0.0.0/16"
    error_message = "vpc cidr_block did not match input"
  }
}
```
Run with `terraform test`. Tests live next to the module they exercise. The `examples/` directory complements tests with apply-able configurations.
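
Plan-mode tests normally still need provider credentials. A mocked provider (Terraform 1.7+, also supported by `tofu test`) can remove that requirement; the sketch below assumes the module shape shown in this document, and the file name is hypothetical.

```hcl
# tests/vpc_mocked.tftest.hcl (hypothetical file name)
# Replaces the real AWS provider with generated values, so no credentials are needed.
mock_provider "aws" {}

run "name_tag_propagates" {
  command = plan

  variables {
    name               = "test"
    cidr_block         = "10.0.0.0/16"
    availability_zones = ["eu-central-1a", "eu-central-1b"]
  }

  assert {
    condition     = aws_vpc.this.tags["Name"] == "test"
    error_message = "Name tag did not propagate"
  }
}
```

Mocked runs check only the module's own logic; keep at least one real plan or apply in `examples/` to exercise actual provider behavior.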
## What NOT to expose
- Don't pass raw provider configs through a module. Modules accept inputs, the caller configures the provider.
- Don't expose every resource's id as an output. Curate.
- Don't accept a `name_prefix` and a `name_suffix` and a `name_separator` - just accept `name`.
- Don't accept arbitrary `aws_vpc` arguments via `variable "extra_args" { type = any }`. That defeats the module's purpose.
- Don't bundle multiple unrelated resources into one module to "save a call site". Composition at the root is cleaner than coupling inside the module.
Terraform / OpenTofu anti-pattern detector. Catches 0.0.0.0/0 ingress, count for stable resources, local backend in shared modules, unpinned providers/engine, ignore_changes=all, null_resource as a hammer, terraform_remote_state for everything, lifecycle abuse, renaming without moved, CLI imports, workspaces-as-envs, secrets in HCL, unguarded RDS instances (missing backup/deletion-protection/final-snapshot), wildcard IAM, sensitive outputs without flag.
# Terraform / OpenTofu Anti-Patterns
Reject these in generated HCL. Each entry has a BAD example and the CORRECT replacement.
## 1. `count` for stable collections
```hcl
# BAD - removing buckets[1] renumbers buckets[2..n], destroys + recreates them
resource "aws_s3_bucket" "logs" {
  count  = length(var.names)
  bucket = var.names[count.index]
}
# CORRECT
resource "aws_s3_bucket" "logs" {
  for_each = toset(var.names)
  bucket   = each.value
}
```
Reserve `count` for conditional resources (`count = var.enabled ? 1 : 0`).
## 2. Local state in shared code
```hcl
# BAD - terraform.tfstate on a laptop, no locking
terraform { required_version = ">= 1.5" }
# CORRECT
terraform {
required_version = "~> 1.15"
backend "s3" {
bucket = "acme-tfstate"
key = "vpc/prod.tfstate"
region = "eu-central-1"
dynamodb_table = "tfstate-locks"
encrypt = true
}
}
```
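On newer engines the S3 backend can also lock without a DynamoDB table. A minimal sketch, assuming Terraform 1.10+ (where `use_lockfile` was introduced) and reusing the bucket and key from the example above; check the backend docs for your engine version before dropping the DynamoDB table:
```hcl
terraform {
  backend "s3" {
    bucket       = "acme-tfstate"
    key          = "vpc/prod.tfstate"
    region       = "eu-central-1"
    use_lockfile = true # S3-native lock object stored next to the state
    encrypt      = true
  }
}
```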
## 3. Unpinned providers and engine
```hcl
# BAD - any provider version, any Terraform version
provider "aws" { region = "eu-central-1" }
# CORRECT
terraform {
required_version = "~> 1.15"
required_providers {
aws = { source = "hashicorp/aws", version = "~> 5.70" }
}
}
```
## 4. Renaming a resource without a `moved` block
```hcl
# BAD - rename causes destroy + create
resource "aws_iam_role" "app" { ... } # was "application"
# CORRECT
moved {
from = aws_iam_role.application
to = aws_iam_role.app
}
```
## 5. `terraform import` CLI instead of `import` block
LLMs suggest `terraform import aws_instance.web i-abc`. That mutates state imperatively: the change never appears in code review or in a plan, and with local or per-machine state the next `plan` on another machine shows the resource as "to be created". Always commit an `import` block in code, apply, then remove the block in a follow-up PR.
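A minimal sketch of the declarative form (Terraform 1.5+, OpenTofu 1.6+), reusing the address and the placeholder ID from the command above. The `ami` and `instance_type` values are hypothetical and have to match the real instance:
```hcl
import {
  to = aws_instance.web
  id = "i-abc" # placeholder instance ID from the CLI example
}

resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # hypothetical; copy from the existing instance
  instance_type = "t3.micro"              # hypothetical; copy from the existing instance
}
```
With the block in place, the plan lists the resource as an import rather than a creation, so the adoption is reviewed like any other change. On Terraform, `terraform plan -generate-config-out=generated.tf` can draft the resource block instead of writing it by hand.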
## 6. `lifecycle { ignore_changes = all }` to hide drift
```hcl
# BAD - hides every drift forever
lifecycle { ignore_changes = all }
# CORRECT - ignore only the specific mutable fields you accept drift on
lifecycle {
ignore_changes = [tags["LastScanned"], tags["LastBilled"]]
}
```
`ignore_changes = all` turns Terraform into a write-only system - drift becomes invisible.
## 7. `null_resource` + `local-exec` as a hammer
```hcl
# BAD
resource "null_resource" "deploy" {
provisioner "local-exec" {
command = "kubectl apply -f manifest.yaml"
}
}
# CORRECT - use the right provider resource
resource "kubernetes_manifest" "deploy" {
manifest = yamldecode(file("${path.module}/manifest.yaml"))
}
```
`null_resource` and `terraform_data` (1.4+) are escape hatches. Reach for a real provider resource first.
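When no provider resource exists, the escape hatch can at least state what re-triggers it. A sketch, assuming a hypothetical `scripts/warm-cache.sh` and a hypothetical `var.cache_version`:
```hcl
variable "cache_version" {
  description = "Bump to re-run the warm-up script (hypothetical)"
  type        = string
}

resource "terraform_data" "warm_cache" {
  # A change to this value replaces the resource, which re-runs the provisioner.
  triggers_replace = [var.cache_version]

  provisioner "local-exec" {
    command = "${path.module}/scripts/warm-cache.sh"
  }
}
```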
## 8. `0.0.0.0/0` ingress on SSH / RDP / database ports
```hcl
# BAD
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# CORRECT
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = var.admin_cidrs # validated to exclude 0.0.0.0/0
}
```
Public ingress on management ports is one of the most common vulnerabilities shipped through Terraform. `0.0.0.0/0` is only acceptable for ports 80/443 of a public load balancer.
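The `# validated to exclude 0.0.0.0/0` comment above implies a `validation` block on the variable. A minimal sketch; the second check with `cidrhost` is an assumption added here to reject malformed entries:
```hcl
variable "admin_cidrs" {
  description = "CIDR blocks allowed to reach management ports"
  type        = list(string)

  validation {
    condition     = !contains(var.admin_cidrs, "0.0.0.0/0")
    error_message = "admin_cidrs must not include 0.0.0.0/0."
  }

  validation {
    # cidrhost() errors on anything that is not a valid CIDR; can() turns that into false.
    condition     = alltrue([for c in var.admin_cidrs : can(cidrhost(c, 0))])
    error_message = "Every entry in admin_cidrs must be a valid CIDR block."
  }
}
```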
## 9. Secrets in `variable` defaults or `.tfvars`
```hcl
# BAD - committed to git
variable "db_password" {
default = "hunter2"
}
# Also BAD - terraform.tfvars committed
db_password = "hunter2"
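# CORRECT (Terraform 1.11+) - ephemeral resource feeding a write-only argument
ephemeral "aws_secretsmanager_secret_version" "db" {
secret_id = var.db_secret_arn
}
resource "aws_db_instance" "main" {
password_wo = ephemeral.aws_secretsmanager_secret_version.db.secret_string
password_wo_version = 1 # bump to push a rotated password
}
```
Ephemeral values are only accepted by write-only arguments (Terraform 1.11+); assigning one to plain `password` is rejected. On older Terraform, or an OpenTofu release without ephemeral support: use a `data` source on Secrets Manager + encrypt the state.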
## 10. Workspaces used as environments
```hcl
# BAD
resource "aws_s3_bucket" "logs" {
bucket = "${terraform.workspace}-logs"
}
# All envs share the same backend, the same code path, the same blast radius
```
Use separate directories or separate backend keys per environment. Workspaces are for cheap parallel state (PR previews, ephemeral testing).
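A sketch of the separate-directories layout: each environment directory is its own root module with its own backend key, so no two environments touch the same state. The paths and key names are hypothetical:
```hcl
# live/prod/backend.tf
terraform {
  backend "s3" {
    bucket  = "acme-tfstate"
    key     = "prod/logs.tfstate"
    region  = "eu-central-1"
    encrypt = true
  }
}

# live/staging/backend.tf - a separate root module with its own state
terraform {
  backend "s3" {
    bucket  = "acme-tfstate"
    key     = "staging/logs.tfstate"
    region  = "eu-central-1"
    encrypt = true
  }
}
```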
## 11. `depends_on` overuse
```hcl
# BAD - Terraform already detects this dependency through the reference
resource "aws_iam_role_policy" "p" {
role = aws_iam_role.r.id
depends_on = [aws_iam_role.r] # redundant
}
# CORRECT - implicit dependency through aws_iam_role.r.id reference
```
`depends_on` is only needed for hidden runtime dependencies (e.g., an EC2 instance that needs an IAM policy attached before it boots; the policy attachment is not in the instance's argument list).
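The hidden-dependency case described above could look like this sketch. `aws_iam_role.app`, `aws_iam_instance_profile.app`, `data.aws_iam_policy_document.app_s3`, and `var.ami_id` are hypothetical and assumed to be defined elsewhere:
```hcl
resource "aws_iam_role_policy" "app_s3" {
  name   = "app-s3-read"
  role   = aws_iam_role.app.id
  policy = data.aws_iam_policy_document.app_s3.json
}

resource "aws_instance" "app" {
  ami                  = var.ami_id
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name

  # The instance references only the profile, never the policy. Without this,
  # Terraform may boot the instance before the policy is attached.
  depends_on = [aws_iam_role_policy.app_s3]
}
```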
## 12. `terraform_remote_state` for everything
```hcl
# BAD - tightly couples this module's state layout to another's
data "terraform_remote_state" "vpc" {
backend = "s3"
config = { ... }
}
resource "aws_security_group" "x" {
vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
}
# CORRECT - look up the VPC by tag or name with a provider data source
data "aws_vpc" "main" {
tags = { Environment = var.env }
}
resource "aws_security_group" "x" {
vpc_id = data.aws_vpc.main.id
}
```
Provider data sources query the cloud directly. Remote state ties two modules together at the state-layout level, which is brittle.
## 13. Unmarked sensitive outputs
```hcl
# BAD
output "db_password" {
value = aws_db_instance.main.password
}
# CORRECT
# Do not output secrets at all; if you must:
output "db_password" {
value = aws_db_instance.main.password
sensitive = true
}
```
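Terraform refuses to plan an output that references a value already marked sensitive unless the output sets `sensitive = true`. Secrets that arrive through unmarked variables or locals get no such check: they appear in plan/apply logs and CI run artifacts.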
## 14. Static provider credentials in HCL
```hcl
# BAD
provider "aws" {
access_key = var.aws_key
secret_key = var.aws_secret
}
# CORRECT - OIDC from CI, or shared credentials file locally
provider "aws" {
region = var.region
}
```
## 15. Wildcard IAM action / resource
```hcl
# BAD
statement {
effect = "Allow"
actions = ["*"]
resources = ["*"]
}
# CORRECT - narrow per principle of least privilege
statement {
effect = "Allow"
actions = ["s3:GetObject", "s3:PutObject"]
resources = [aws_s3_bucket.logs.arn, "${aws_s3_bucket.logs.arn}/*"]
}
```
Action or resource wildcards on a resource-based or identity-based policy are flagged by every IAM auditor.
## 16. Variables without `type` / `description`
```hcl
# BAD
variable "name" {}
# CORRECT
variable "name" {
description = "Friendly name for the resource group"
type = string
validation {
condition = length(var.name) <= 64
error_message = "name must be 64 chars or fewer"
}
}
```
`description` is consumed by `terraform-docs` and IDE tooling. `type` catches misuse at plan time. `validation` catches the rest.
## 17. `dynamic` block with one literal iteration
```hcl
# BAD - dynamic block iterating over a single static element
dynamic "ingress" {
for_each = [80]
content {
from_port = ingress.value
to_port = ingress.value
...
}
}
# CORRECT - just use a literal block
ingress {
from_port = 80
to_port = 80
...
}
```
`dynamic` is for genuinely variable-length collections of nested blocks. One literal element is noise.
## 18. Module too granular or too coarse
Module antipatterns:
- **Too granular**: a module that wraps one resource (`aws_s3_bucket`) and exposes 20 inputs - usability suffers, callers might as well use the resource directly.
- **Too coarse**: a single `infrastructure` module that creates VPC + RDS + Lambda + S3 + CloudFront - cannot be reused for any other shape.
- **Leaks raw IDs**: returns `aws_s3_bucket.this.id` instead of a curated set of named outputs.
- **Unversioned**: `source = "git::..."` with no `?ref=v1.2.3` - upstream changes silently break consumers.
## 19. Public database / unencrypted volumes / IMDSv1
```hcl
# BAD
resource "aws_db_instance" "main" {
publicly_accessible = true
}
resource "aws_ebs_volume" "data" {
encrypted = false # or omitted: defaults to false unless account-level EBS encryption by default is enabled
}
resource "aws_instance" "web" {
metadata_options {
http_tokens = "optional" # IMDSv1 enabled
}
}
# CORRECT
resource "aws_db_instance" "main" {
publicly_accessible = false
}
resource "aws_ebs_volume" "data" {
encrypted = true
}
resource "aws_instance" "web" {
metadata_options {
http_tokens = "required" # IMDSv2 only
}
}
```
## 20. No `terraform fmt` / `validate` / `tflint` / policy scan in CI
Every module merged to main should pass:
```bash
terraform fmt -check -recursive
terraform validate
tflint
trivy config . # or checkov, kics
```
Missing this is not strictly an HCL bug but lets every other anti-pattern in this file slip through.
AWS-specific Terraform security: S3 bucket policies, IAM least-privilege, KMS encryption, RDS hardening, EC2 IMDSv2, VPC flow logs, GuardDuty, Security Hub. Catches public buckets, wildcard IAM, IMDSv1, unencrypted volumes, public DBs.
# Terraform on AWS: Security Patterns
## S3 buckets: blocked public access, encrypted, versioned
```hcl
resource "aws_s3_bucket" "logs" {
bucket = "${var.env}-${var.app}-logs"
}
resource "aws_s3_bucket_public_access_block" "logs" {
bucket = aws_s3_bucket.logs.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_server_side_encryption_configuration" "logs" {
bucket = aws_s3_bucket.logs.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.s3.arn
}
bucket_key_enabled = true
}
}
resource "aws_s3_bucket_versioning" "logs" {
bucket = aws_s3_bucket.logs.id
versioning_configuration { status = "Enabled" }
}
```
Every S3 bucket needs the public access block, server-side encryption config, and versioning. Without all three, `trivy config` will fail the build (correctly). (`tfsec` was merged into Trivy; treat them as one tool now.)
## IAM: least privilege, no wildcards
```hcl
# CORRECT - narrow actions, scoped resources
data "aws_iam_policy_document" "s3_read" {
statement {
sid = "ReadLogsBucket"
effect = "Allow"
actions = ["s3:GetObject", "s3:ListBucket"]
resources = [
aws_s3_bucket.logs.arn,
"${aws_s3_bucket.logs.arn}/*",
]
}
}
resource "aws_iam_policy" "s3_read" {
name = "${var.env}-s3-logs-read"
policy = data.aws_iam_policy_document.s3_read.json
}
```
Never use `actions = ["*"]` or `resources = ["*"]`. AWS managed policies such as `AmazonS3FullAccess` are also too broad - prefer purpose-built customer-managed policies.
## RDS: private, encrypted, backed up, deletion-protected
```hcl
resource "aws_db_instance" "main" {
identifier = "${var.env}-${var.app}"
engine = "postgres"
engine_version = "16.4"
instance_class = "db.t4g.small"
allocated_storage = 20
publicly_accessible = false # never public
storage_encrypted = true
kms_key_id = aws_kms_key.rds.arn
backup_retention_period = 7
deletion_protection = true # prevents `terraform destroy` foot-gun
skip_final_snapshot = false
final_snapshot_identifier = "${var.env}-${var.app}-final"
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.db.id]
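password_wo = ephemeral.aws_secretsmanager_secret_version.db.secret_string # write-only (Terraform 1.11+), never stored in state
password_wo_version = 1 # bump to push a rotated password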
username = "appuser"
enabled_cloudwatch_logs_exports = ["postgresql"]
monitoring_interval = 60
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
performance_insights_enabled = true
performance_insights_kms_key_id = aws_kms_key.rds.arn
tags = local.tags
}
```
Security group for the DB allows ingress only from app security groups, never from CIDR ranges.
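A sketch of that rule, assuming the `aws_security_group.app`, `aws_vpc.main`, and `local.tags` definitions from this file and Postgres on its default port:
```hcl
resource "aws_security_group" "db" {
  name        = "${var.env}-db"
  description = "Postgres, ingress from app tier only"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "Postgres from app servers"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id] # SG reference, no cidr_blocks
  }

  tags = local.tags
}
```
No `egress` block is declared, so the group ends up with no outbound rules; the database only answers connections and does not need to initiate any.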
## EC2: IMDSv2 required, encrypted root volume
```hcl
resource "aws_instance" "web" {
ami = data.aws_ami.al2023.id
instance_type = var.instance_type
subnet_id = aws_subnet.private[0].id
metadata_options {
http_tokens = "required" # IMDSv2 only
http_endpoint = "enabled"
http_put_response_hop_limit = 1
instance_metadata_tags = "enabled"
}
root_block_device {
encrypted = true
kms_key_id = aws_kms_key.ebs.arn
volume_size = 20
volume_type = "gp3"
}
vpc_security_group_ids = [aws_security_group.web.id]
iam_instance_profile = aws_iam_instance_profile.web.name
tags = merge(local.tags, { Name = "${var.env}-web" })
}
```
IMDSv1 (`http_tokens = "optional"`) is the SSRF-to-credential-theft attack vector behind the Capital One breach. `required` enforces v2.
## Security groups: deny by default, narrow by intent
```hcl
resource "aws_security_group" "alb" {
name = "${var.env}-alb"
description = "Public ALB"
vpc_id = aws_vpc.main.id
# only port 443 from the internet
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # OK for a public ALB on 443
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
tags = local.tags
}
resource "aws_security_group" "app" {
name = "${var.env}-app"
description = "App servers, ingress from ALB only"
vpc_id = aws_vpc.main.id
ingress {
description = "From ALB"
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
}
```
App tier ingress references the ALB's security group, not a CIDR. Egress is wide because outbound is rarely the threat surface (and AWS API calls need it).
## VPC flow logs, GuardDuty, Security Hub
```hcl
resource "aws_flow_log" "vpc" {
vpc_id = aws_vpc.main.id
iam_role_arn = aws_iam_role.flow_logs.arn
log_destination = aws_cloudwatch_log_group.flow_logs.arn
traffic_type = "ALL"
}
resource "aws_guardduty_detector" "main" {
enable = true
finding_publishing_frequency = "FIFTEEN_MINUTES"
}
resource "aws_securityhub_account" "main" {}
```
Flow logs catch unexpected outbound traffic. GuardDuty catches known-bad patterns. Security Hub aggregates findings. None of them prevent incidents but they make detection possible.
## KMS keys: rotation enabled, scoped policy
```hcl
resource "aws_kms_key" "data" {
description = "${var.env} application data encryption"
enable_key_rotation = true
deletion_window_in_days = 30
policy = data.aws_iam_policy_document.kms_data.json
}
resource "aws_kms_alias" "data" {
name = "alias/${var.env}-data"
target_key_id = aws_kms_key.data.id
}
```
Enable key rotation. The default key policy delegates access to the whole account through IAM - write your own that scopes `kms:Decrypt` to specific roles.
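The `data.aws_iam_policy_document.kms_data` referenced above is not shown; one possible shape is sketched below. Both role variables are hypothetical, and the administration statement is an assumption modelled on the usual key-administrator action list:
```hcl
variable "kms_admin_role_arn" {
  description = "Role allowed to administer the key (hypothetical; must cover the Terraform deploy role)"
  type        = string
}

variable "app_role_arn" {
  description = "Role allowed to decrypt application data (hypothetical)"
  type        = string
}

data "aws_iam_policy_document" "kms_data" {
  statement {
    sid    = "KeyAdministration"
    effect = "Allow"
    principals {
      type        = "AWS"
      identifiers = [var.kms_admin_role_arn]
    }
    actions = [
      "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*", "kms:Put*",
      "kms:Update*", "kms:Revoke*", "kms:Disable*", "kms:Get*", "kms:Delete*",
      "kms:TagResource", "kms:UntagResource",
      "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
    ]
    resources = ["*"] # inside a key policy, "*" refers to this key only
  }

  statement {
    sid    = "AppDecrypt"
    effect = "Allow"
    principals {
      type        = "AWS"
      identifiers = [var.app_role_arn]
    }
    actions   = ["kms:Decrypt", "kms:DescribeKey"]
    resources = ["*"]
  }
}
```
Because this policy omits the account root principal, IAM policies alone cannot grant access to the key; every consumer has to be named here. Make sure the role that runs Terraform is covered by the administration statement, otherwise later policy updates will be refused.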
## Tagging for cost + compliance
```hcl
locals {
tags = {
Environment = var.env
Application = var.app
Owner = var.owner
CostCenter = var.cost_center
ManagedBy = "Terraform"
Compliance = var.compliance # e.g., "SOC2", "PCI"
}
}
```
Apply via `provider.default_tags`:
```hcl
provider "aws" {
region = var.region
default_tags { tags = local.tags }
}
```
Default tags apply to every taggable resource. Per-resource `tags = { Name = "..." }` merges with the defaults.
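For example, a sketch of a resource that adds one tag on top of the provider defaults (the bucket itself is hypothetical):
```hcl
resource "aws_s3_bucket" "assets" {
  bucket = "${var.env}-${var.app}-assets"

  # Only the resource-specific tag is set here; Environment, Owner, etc.
  # arrive through default_tags. The merged result is exposed as tags_all.
  tags = { Name = "${var.env}-assets" }
}
```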
## Policy-as-code in CI
```bash
trivy config .
checkov -d .
```
Pin tool versions in the workflow. These catch the patterns in this rule file when humans miss them. A failing scan should block the PR, not warn.
Core Terraform / OpenTofu rules. Enforces version pinning, remote backend with locking, for_each over count, moved/removed/import blocks for safe refactoring, typed + validated variables, sensitive outputs, OIDC for CI auth. Covers Terraform 1.9+ and OpenTofu 1.8+ delta.
# Terraform / OpenTofu Core Rules
You are writing **Terraform 1.9+** or **OpenTofu 1.8+**. The two languages are largely compatible; differences are flagged explicitly. Follow these rules.
## Pin Terraform/OpenTofu and provider versions
Every root module declares the engine and provider versions. `terraform plan` becomes deterministic only when versions are pinned.
```hcl
terraform {
required_version = "~> 1.15"
required_providers {
aws = { source = "hashicorp/aws", version = "~> 5.70" }
random = { source = "hashicorp/random", version = "~> 3.6" }
}
}
```
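`~> 5.70` accepts 5.71 and 5.72 but not 6.0 (it means `>= 5.70, < 6.0`). Pin the engine in `required_version` and providers in `required_providers`, and commit `.terraform.lock.hcl` so every run resolves the same provider builds.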
## Remote backend with state locking
Local state is fine for personal experiments. Anything shared - team, CI, production - needs a remote backend with locking.
```hcl
# Terraform Cloud, HCP Terraform, AWS, GCP, Azure all supported
terraform {
backend "s3" {
bucket = "acme-tfstate"
key = "prod/vpc.tfstate"
region = "eu-central-1"
dynamodb_table = "tfstate-locks" # state locking
encrypt = true
}
}
```
OpenTofu 1.7+ adds native state encryption on top of any backend:
```hcl
terraform {
encryption {
key_provider "aws_kms" "k" {
kms_key_id = "alias/tofu-state"
key_spec = "AES_256" # required by the aws_kms key provider
region = "eu-central-1"
}
method "aes_gcm" "m" { keys = key_provider.aws_kms.k }
state { method = method.aes_gcm.m }
plan { method = method.aes_gcm.m }
}
}
```
This is OpenTofu-only. Terraform does not support encryption at the engine level; rely on backend-side encryption (S3 SSE, GCS CMEK) or HCP Terraform.
## `for_each` over `count` for stable identity
```hcl
# CORRECT
resource "aws_s3_bucket" "logs" {
for_each = toset(var.bucket_names)
bucket = each.value
}
# WRONG - removing bucket_names[1] renumbers buckets[2..n], destroying and recreating them
resource "aws_s3_bucket" "logs" {
count = length(var.bucket_names)
bucket = var.bucket_names[count.index]
}
```
`for_each` keys resources by name. Adding/removing items only affects the specific resource. `count` keys by index, so any insertion or deletion shifts every subsequent resource into a different state address. Reserve `count` for "this resource is conditional on a single boolean":
```hcl
resource "aws_s3_bucket" "audit" {
count = var.audit_logging ? 1 : 0
bucket = "${var.env}-audit"
}
```
## Refactor with `moved`, retire with `removed`, adopt with `import`
When renaming or restructuring resources, ship a `moved` block in the same PR. Without it, Terraform sees a new resource at the new address and destroys+recreates.
```hcl
moved {
from = aws_iam_role.application
to = aws_iam_role.app
}
```
When deleting a resource from code that you want to keep alive in the cloud (rare, but real: pre-existing infra adopted then handed back to a parent stack), use `removed` (Terraform 1.7+, OpenTofu 1.7+):
```hcl
removed {
from = aws_iam_role.legacy
lifecycle { destroy = false }
}
```
For adopting existing infrastructure into Terraform, use `import` blocks - not the `terraform import` CLI:
```hcl
import {
to = aws_s3_bucket.logs
id = "acme-prod-logs"
}
```
Commit the `import` block in a PR, apply, then delete the block in a follow-up PR. The CLI's `terraform import` mutates state imperatively, outside code review and outside any plan; with local or per-machine state it also leaves the next plan on every other machine showing the resource as "to be created".
## Variables: typed, described, validated
```hcl
variable "instance_type" {
description = "EC2 instance type for the web tier"
type = string
default = "t3.micro"
validation {
condition = can(regex("^t3\\.", var.instance_type))
error_message = "instance_type must be a t3 family size"
}
}
variable "admin_cidrs" {
description = "CIDR blocks allowed to SSH"
type = list(string)
validation {
condition = !contains(var.admin_cidrs, "0.0.0.0/0")
error_message = "admin_cidrs must not include 0.0.0.0/0"
}
}
```
Every variable has a `type` and a `description`. Constrained variables have `validation` blocks. Terraform 1.9+ allows validation conditions to reference other variables, locals, and data sources.
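A sketch of the cross-variable form, using two hypothetical sizing variables:
```hcl
variable "min_size" {
  description = "Minimum number of instances in the group"
  type        = number
  default     = 1
}

variable "max_size" {
  description = "Maximum number of instances in the group"
  type        = number
  default     = 3

  validation {
    # Referencing var.min_size from this block requires Terraform 1.9+.
    condition     = var.max_size >= var.min_size
    error_message = "max_size must be greater than or equal to min_size."
  }
}
```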
## Outputs: minimal, described, sensitive flagged
```hcl
output "vpc_id" {
description = "ID of the created VPC"
value = aws_vpc.main.id
}
output "db_endpoint" {
description = "RDS endpoint hostname"
value = aws_db_instance.main.endpoint
sensitive = false
}
```
If an output carries a secret value (password, token, key), either omit it entirely or mark it `sensitive = true`. Sensitive outputs are redacted in plan/apply logs but are still stored in plaintext in state.
## Secrets via ephemeral resources (Terraform 1.10+)
Secrets in `variable` defaults or `.tfvars` files end up in git and in state. Ephemeral resources (Terraform 1.10+) are read fresh in each phase and never written to state or plan. Managed resources accept ephemeral values only through write-only arguments (Terraform 1.11+):
```hcl
ephemeral "aws_secretsmanager_secret_version" "db" {
secret_id = "prod/db/password"
}
resource "aws_db_instance" "main" {
password_wo = ephemeral.aws_secretsmanager_secret_version.db.secret_string
password_wo_version = 1 # bump to push a rotated password
}
```
For pre-1.11 Terraform, or an OpenTofu release without ephemeral support, use a `data` source on Secrets Manager / SSM Parameter Store and mark every variable and output that carries the value `sensitive`. State will still contain the value - encrypt the state file.
Terraform 1.11+ adds **write-only arguments** on managed resources: arguments that accept ephemeral values but never persist to state. `aws_db_instance.password_wo` completes the secrets story - the value flows through Terraform but never lands in state or plan. Because nothing is stored, Terraform cannot diff the value; bump `password_wo_version` to send a new one. Check provider docs for which arguments are write-only-capable.
## OIDC federation for CI auth
```yaml
# GitHub Actions
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/terraform-deploy
aws-region: eu-central-1
audience: sts.amazonaws.com
```
OIDC tokens are minted per workflow run; the role's trust policy scopes them to the repository and branch. The job needs `permissions: id-token: write` to request the token. No static AWS access keys in repo secrets or environment.
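The repository-and-branch scoping lives in the IAM role's trust policy. A sketch, assuming the GitHub OIDC provider is already registered in the account and a hypothetical `acme/infra` repository deploying from `main`:
```hcl
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflows on the main branch of this repository may assume the role.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:acme/infra:ref:refs/heads/main"] # hypothetical repo
    }
  }
}

resource "aws_iam_role" "terraform_deploy" {
  name               = "terraform-deploy"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```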
## Repo layout: live envs + reusable modules
```
infra/
  modules/
    vpc/                      # reusable, semver-tagged
      main.tf variables.tf outputs.tf versions.tf
    rds/
      ...
  live/
    prod/eu-central-1/vpc/
      main.tf backend.tf terraform.tfvars
    staging/eu-central-1/vpc/
    dev/eu-central-1/vpc/
```
One root module per environment+region. Cross-environment state never shared. Modules versioned via git tag or registry:
```hcl
module "vpc" {
source = "git::ssh://git@github.com/acme/tf-modules.git//vpc?ref=v1.4.0"
# or registry
# source = "acme/vpc/aws"
# version = "~> 1.4"
}
```
`source = "../../../modules/vpc"` is acceptable inside a monorepo but loses semver discipline; pin to tags once a module is consumed across teams.
## `check` blocks for runtime invariants (Terraform 1.5+)
```hcl
check "api_healthy" {
data "http" "endpoint" { url = "https://api.example.com/healthz" }
assert {
condition = data.http.endpoint.status_code == 200
error_message = "api healthz returned non-200"
}
}
check "cost_budget" {
assert {
condition = var.instance_count <= 10
error_message = "instance_count > 10 requires SRE approval"
}
}
```
`check` blocks are evaluated as the last step of every plan and apply; a scoped data source that depends on values not yet known at plan time is deferred to apply. Failing assertions emit warnings but do not block apply - use them for advisory invariants, not gates.
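Where an invariant has to stop the run rather than warn, a variable `validation` (or a `precondition` inside a resource's `lifecycle` block) is the usual tool. A sketch that turns the advisory `cost_budget` check above into a hard gate:
```hcl
variable "instance_count" {
  description = "Number of worker instances"
  type        = number

  validation {
    # Unlike a check block, a failed validation is an error and stops the plan.
    condition     = var.instance_count <= 10
    error_message = "instance_count > 10 requires SRE approval."
  }
}
```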
## Workspaces are NOT environments
`terraform workspace` is for cheap parallel state (PR previews, feature branches sharing the same code). Interpolating `terraform.workspace` into resource names puts every environment behind one backend, one configuration, and one set of credentials, so a mistake in any workspace has cross-environment blast radius - one wrong `workspace select` and the apply lands on prod.
For real environments, use separate directories (`live/prod`, `live/staging`) or separate backend keys. Workspaces are a fine-grained tool, not an environment boundary.
## Pre-commit: fmt, validate, tflint, trivy
```yaml
# .pre-commit-config.yaml
repos:
- repo: https://github.com/antonbabenko/pre-commit-terraform
rev: v1.92.0
hooks:
- id: terraform_fmt
- id: terraform_validate
- id: terraform_tflint
- id: terraform_trivy
- id: terraform_docs
```
These run on every commit and catch formatting drift, syntax errors, security issues, and stale module docs. Run them in CI as well.
## Provider authentication: never static keys in HCL
```hcl
# BAD
provider "aws" {
access_key = var.aws_access_key
secret_key = var.aws_secret_key
region = "eu-central-1"
}
# CORRECT - OIDC federation (CI) or shared credentials file (local)
provider "aws" {
region = var.region
}
```
The provider picks up credentials from `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` env vars, `~/.aws/credentials`, or IAM role assumption (CI). Never declare credentials in HCL.
Terraform / OpenTofu testing: tftest framework with run blocks, mocked providers (Terraform 1.7+, OpenTofu 1.8+), fixture-driven examples directory, terratest for full E2E. Agent-requested - apply when writing or reviewing *.tftest.hcl files.
# Terraform Testing
## `*.tftest.hcl` framework (Terraform 1.6+, OpenTofu 1.6+)
Native test framework. Tests live alongside the module they exercise.
```hcl
# tests/vpc.tftest.hcl
run "rejects_non_rfc1918_cidr" {
command = plan
variables {
name = "test"
cidr_block = "200.0.0.0/16" # public range
availability_zones = ["eu-central-1a", "eu-central-1b"]
}
expect_failures = [var.cidr_block]
}
run "creates_vpc_with_valid_input" {
command = plan
variables {
name = "test"
cidr_block = "10.0.0.0/16"
availability_zones = ["eu-central-1a", "eu-central-1b"]
}
assert {
condition = aws_vpc.this.cidr_block == "10.0.0.0/16"
error_message = "vpc cidr_block did not match input"
}
assert {
condition = length(aws_subnet.private) == 2
error_message = "expected 2 private subnets"
}
}
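# There is no built-in run ID. run.<name>.<output> reads an output of an earlier run;
# here a hypothetical tests/setup helper module exposes a random "suffix" output.
run "setup" {
module {
source = "./tests/setup"
}
}
run "applies_cleanly" {
command = apply
variables {
name = "ci-${run.setup.suffix}"
cidr_block = "10.123.0.0/16"
availability_zones = ["eu-central-1a", "eu-central-1b"]
}
assert {
condition = aws_vpc.this.tags["Name"] == "ci-${run.setup.suffix}"
error_message = "Name tag not propagated"
}
}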
```
Run with `terraform test` (or `tofu test`). Plan-only tests run against the configured provider but never apply; apply-tests create real resources, then destroy them.
## `expect_failures` for input validation
```hcl
run "rejects_zero_zero_admin_cidr" {
command = plan
variables {
admin_cidrs = ["0.0.0.0/0"]
}
expect_failures = [var.admin_cidrs]
}
```
Asserts that the validation block on `var.admin_cidrs` rejects the input. Use to verify your validations are wired correctly.
## Mocked providers (Terraform 1.7+, OpenTofu 1.8+)
Both engines support provider mocking in `terraform test` / `tofu test`:
```hcl
mock_provider "aws" {
mock_resource "aws_s3_bucket" {
defaults = {
bucket = "mocked-bucket"
arn = "arn:aws:s3:::mocked-bucket"
}
}
}
run "creates_bucket_with_mock" {
command = plan
# the unaliased mock above replaces the real aws provider for every run in this file
...
}
```
Plan tests against a mocked provider run without real cloud credentials. Useful for fast CI feedback. Terraform has shipped `mock_provider` since 1.7; OpenTofu added it in 1.8.
Use `alias = "..."` only when you need multiple distinct mock providers (multi-region, multi-account):
```hcl
mock_provider "aws" { alias = "primary" }
mock_provider "aws" { alias = "secondary" }
```
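A run block then selects one of the aliased mocks through its `providers` map. A minimal sketch (the run name is hypothetical):
```hcl
mock_provider "aws" { alias = "primary" }
mock_provider "aws" { alias = "secondary" }

run "plans_against_secondary" {
  command = plan

  # Map the module's default aws provider to one of the aliased mocks.
  providers = {
    aws = aws.secondary
  }
}
```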
## Apply tests in CI
Apply tests cost money and time. Strategies:
- **Per-PR plan-only tests**: free, fast, catch syntax + validation + plan-shape regressions.
- **Nightly apply test against a dedicated CI account**: catches drift in provider behavior and accidentally-broken resource configurations.
- **Per-release apply test**: full module exercise before tagging a version.
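```hcl
run "setup" {
module {
source = "./tests/setup" # hypothetical helper exposing a random "suffix" output
}
}
run "applies_in_ci" {
command = apply
variables {
name = "ci-test-${run.setup.suffix}" # unique per run, prevents collisions
}
}
```
The test framework has no built-in run identifier. `run.<name>.<output>` reads an output from an earlier run block, so a small setup module that generates a random suffix (for example with `random_id`) supplies the uniqueness.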
## `examples/` directory as integration tests
```
modules/vpc/
├── examples/
│ ├── simple/
│ │ ├── main.tf
│ │ └── README.md
│ └── multi-az/
│ ├── main.tf
│ └── README.md
```
Each example is a complete, apply-able config that exercises the module. In CI:
```bash
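set -euo pipefail
for dir in modules/vpc/examples/*/; do
# subshell: the cd does not leak into the next iteration
(
cd "$dir"
terraform init -input=false
terraform plan -input=false
)
done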
```
A failing plan in any example fails the build.
## Terratest for full E2E
For Go-based integration tests against real cloud:
```go
package test
import (
"testing"
"github.com/gruntwork-io/terratest/modules/random"
"github.com/gruntwork-io/terratest/modules/terraform"
"github.com/stretchr/testify/assert"
)
func TestVpcApply(t *testing.T) {
t.Parallel()
opts := &terraform.Options{
TerraformDir: "../examples/simple",
Vars: map[string]interface{}{
// random.UniqueId() returns a short random string so parallel runs do not collide
"name": "terratest-" + random.UniqueId(),
},
}
defer terraform.Destroy(t, opts)
terraform.InitAndApply(t, opts)
vpcId := terraform.Output(t, opts, "vpc_id")
assert.Regexp(t, "^vpc-", vpcId)
}
```
Terratest's strength is post-apply assertions against the real cloud (call the AWS SDK to verify the actual state after apply, not just the Terraform state). Use it for high-value modules; plain `*.tftest.hcl` is faster for simple cases.
## Policy tests with OPA / Conftest
```rego
# policies/terraform.rego
package terraform.security
import rego.v1
deny contains msg if {
resource := input.resource_changes[_]
resource.type == "aws_security_group_rule"
resource.change.after.type == "ingress"
resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
resource.change.after.from_port <= 22
resource.change.after.to_port >= 22
msg := sprintf("SSH ingress from 0.0.0.0/0 on %v", [resource.address])
}
```
```bash
terraform plan -out plan.tfplan
terraform show -json plan.tfplan > plan.json
conftest test plan.json --policy policies/ --namespace terraform.security
```
Pair with `trivy config` / `checkov` (which ship pre-built policy bundles; tfsec is now part of Trivy). Custom OPA policies cover org-specific rules (allowed regions, required tags, naming patterns).
## What to test, what not to test
Test:
- Validation block rejects invalid inputs.
- The right resources are created (count, type).
- Tag merging works correctly.
- Outputs match expected values.
Don't test:
- That AWS itself works ("an `aws_s3_bucket` actually creates an S3 bucket"). That's AWS's job.
- Plan output for resources you have no control over (managed identities, provider-generated IDs).
- Implementation details that change between provider versions (specific computed attributes).
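The tag-merging and output items from the "Test" list can be covered in one plan-only run. A sketch, assuming the module accepts a hypothetical `tags` input and exposes a hypothetical `private_subnet_cidrs` output:
```hcl
# tests/outputs.tftest.hcl
run "tags_and_outputs" {
  command = plan

  variables {
    name               = "test"
    cidr_block         = "10.0.0.0/16"
    availability_zones = ["eu-central-1a", "eu-central-1b"]
    tags               = { Team = "platform" } # hypothetical input
  }

  assert {
    condition     = aws_vpc.this.tags["Team"] == "platform"
    error_message = "caller-supplied tags were not merged onto the VPC"
  }

  assert {
    condition     = aws_vpc.this.tags["Name"] == "test"
    error_message = "Name tag was dropped by the merge"
  }

  assert {
    # Module outputs are addressed as output.<name> inside test assertions.
    condition     = length(output.private_subnet_cidrs) == 2
    error_message = "expected one private subnet CIDR per availability zone"
  }
}
```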