
Terraform Architecture Review: A Complete Guide (2026)

What a Terraform architecture review is, how it differs from IaC linting, and how to run one against the AWS Well-Architected Framework — with a step-by-step process, HCL examples, and a pillar-by-pillar checklist.

May 11, 2026 · 18 min read · #terraform #aws #well-architected #iac #architecture-review

Terraform is now the dominant way to describe AWS infrastructure. But “described in Terraform” and “correctly architected for your workload” are not the same thing. Plenty of infrastructure can be syntactically valid, pass every linter check, and still carry significant risk — because the risk lives in the relationships between resources, in what is absent, and in whether the configuration is appropriate for the workload it serves.

A Terraform architecture review is the process of evaluating infrastructure code against a structured framework — specifically the AWS Well-Architected Framework — to answer whether the architecture is sound for its intended purpose, not just whether individual attributes are correctly set.

This guide explains what that process involves, how it differs from the linting step you already run in CI, and how to execute it yourself — or with tooling — in a structured and repeatable way.

What Is a Terraform Architecture Review?

A Terraform architecture review is a structured evaluation of your infrastructure code against a set of architectural standards — typically the six pillars of the AWS Well-Architected Framework. It treats your Terraform as a system: reading relationships between resources, evaluating cross-service blast radius, and assessing whether the configuration as a whole is appropriate for the workload it describes.

It produces a prioritised list of findings — each one mapped to a WAF control, assigned a severity based on workload context, and accompanied by a remediation recommendation.

Crucially, the review is informed by workload context: the same Terraform configuration can be Low severity in a development environment and Critical in production. A Terraform architecture review begins by capturing this context — service type, environment, SLA, data sensitivity — and uses it to weight every finding accordingly.

Architecture Review vs. IaC Linting

Both processes read your Terraform. Both produce findings. They are not interchangeable.

Linters catch violations. Architecture reviews catch patterns.

Checkov asks: “Is this config valid?”
An architecture review asks: “Is this architecture sound for your workload?”

IaC linters — Checkov, Trivy, tfsec — apply a library of deterministic rules to individual resource attributes. Each rule evaluates one attribute against a known-bad condition. This is fast, reproducible, and free of false negatives for the conditions each rule covers. Checkov currently has over 1,000 rules for Terraform and runs in CI without any context about what the infrastructure does.

The ceiling on linting is structural. A linter cannot evaluate:

  • Cross-resource relationships — whether an IAM role can reach a KMS key that encrypts a specific S3 bucket, and what that access path means for blast radius.
  • Workload context — whether the environment is production or development, whether the SLA requires multi-AZ, whether the data is regulated.
  • Absent resources — a Lambda without a Dead Letter Queue, an RDS instance without cross-region backup, a VPC with no Flow Logs. Linters read what is there; architecture reviews also evaluate what should be there but isn't (see the sketch after this list).
  • Pattern-level risks — overly broad IAM permissions that are not wildcards (and therefore pass CKV_AWS_40) but still violate the principle of least privilege for the specific workload.
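
To make the "absent resources" point concrete, here is a minimal sketch of the kind of finding a review raises that no linter can: VPC Flow Logs that simply do not exist in the configuration. Resource names and the retention period are illustrative, and the sketch assumes an existing VPC plus an IAM role that lets the Flow Logs service write to the log group.

resource "aws_cloudwatch_log_group" "vpc_flow" {
  name              = "/vpc/flow-logs"
  retention_in_days = 90  # illustrative retention period
}

resource "aws_flow_log" "main" {
  vpc_id               = aws_vpc.main.id             # assumes an existing VPC
  traffic_type         = "ALL"
  log_destination_type = "cloud-watch-logs"
  log_destination      = aws_cloudwatch_log_group.vpc_flow.arn
  iam_role_arn         = aws_iam_role.flow_logs.arn  # role trusted by vpc-flow-logs.amazonaws.com
}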

For a detailed walkthrough of three specific Terraform examples that pass all linter checks but fail an architectural review, see What Checkov Catches — and What It Misses.

When Do You Need a Terraform Architecture Review?

Not every change warrants a full review. The moments where it pays off most are:

Before the first production deployment of a new service

Architectural debt is easiest to fix before it is deployed. A review at this stage catches reliability and security patterns that would require disruptive changes to fix later.

Before a major infrastructure change

New VPC, new account structure, new data store, new IAM trust policy — any change that affects the blast radius or reliability boundary of an existing service.

After inheriting infrastructure you did not build

New CTOs, new platform leads, and acquired-company onboarding all involve taking ownership of an environment built under different constraints. A structured review surfaces the risk before you are accountable for it.

Before a compliance audit or investor due diligence

Series A technical due diligence now includes cloud infrastructure review. AWS Well-Architected findings in IAM, encryption, and observability are flagged by DD teams. Getting ahead of this with a review first is faster than remediating under time pressure.

On a recurring schedule for production workloads

AWS recommends re-running a Well-Architected Review after any significant workload change. Quarterly is appropriate for higher-risk workloads (payment processing, healthcare, multi-tenant SaaS); annually is enough for stable, lower-risk services.

The AWS Well-Architected Framework — the Review Standard

The AWS Well-Architected Framework is the structured standard that AWS uses for architecture reviews. It organises cloud architecture quality into six pillars, each containing a set of design principles and specific best-practice controls; unmet high-risk practices surface in a review as High Risk Issues (HRIs).

Using the WAF as the review standard — rather than a custom checklist — means findings map to the same controls that AWS Partner Network consultants, AWS Solutions Architects, and the AWS Well-Architected Tool use. This makes the output meaningful to stakeholders who are familiar with the framework, and ensures the review covers the full scope of architectural risk, not just security.

Security pillar

The Security pillar covers identity and access management, detection, infrastructure protection, data protection, and incident response. For Terraform, the highest-impact controls are around IAM (SEC 2: grant least privilege; SEC 3: no long-lived credentials), network protection (SEC 5: network layers; SEC 6: no unnecessary connectivity), and data protection (SEC 8: encryption at rest and in transit).

The Security pillar is where linting coverage is highest — most Checkov rules map to Security controls. But even here there are gaps: linters cannot evaluate IAM blast radius, principal scope breadth in bucket policies, or whether VPC endpoint policies restrict access appropriately.

Reliability pillar

The Reliability pillar covers service quotas, network topology, workload architecture, change management, and failure management. Key Terraform controls: REL 6 (distribute workloads across multiple AZs), REL 7 (deploy Auto Scaling Groups with appropriate capacity), REL 8 (use load balancers for static stability), REL 9 (back up data at the correct recovery point objective), and REL 10 (use fault isolation boundaries).

The Reliability pillar is where linting coverage is weakest. Checkov has almost no rules that check for multi-AZ configuration, minimum instance counts, or backup retention against SLA requirements. These are the most common source of architectural findings in production Terraform.

Cost Optimization pillar

The Cost Optimization pillar covers cloud financial management, cost-effective resources, demand management, and optimisation over time. Key Terraform controls: COST 1 (implement cloud financial management practices — including tagging), COST 5 (select the correct resource type), and COST 6 (select the correct resource size).

For Terraform, cost findings are almost entirely absent from linters. The most common findings: untagged resources (making cost attribution impossible), oversized EC2 instances, and resources without lifecycle policies for data that could be tiered to cheaper storage.

Operational Excellence pillar

The Operational Excellence pillar covers organisation, prepare, operate, and evolve. For Terraform, the highest-impact controls are OPS 7 (understand workload health — CloudWatch alarms on error rates and latency) and OPS 11 (learn from operational events — Dead Letter Queues, structured logging, SNS alerting).

Observability infrastructure — CloudWatch alarms, log groups with retention periods, SNS topics for alerting — is entirely invisible to IaC linters. These are some of the most valuable findings a Terraform architecture review produces, because silent failures are the hardest incidents to diagnose.

Performance Efficiency and Sustainability

The Performance Efficiency pillar covers selection of the right resource types, reviewing resource configurations over time, and performance monitoring. In Terraform, this surfaces primarily as instance type selection, cache layer configuration, and database instance sizing relative to query patterns.

The Sustainability pillar (added to the WAF in 2021) covers minimising the environmental impact of cloud workloads — right-sizing, Graviton instance selection, and avoiding idle resources. Both pillars are lower-priority than Security, Reliability, and Cost for most first-time reviews, but are worth including in the scope for mature workloads.

What a Terraform Architecture Review Covers

A full review reads all Terraform resources in scope and evaluates them across six domain areas that map to the WAF pillars. Below is what each domain area covers, with a representative HCL example showing the type of finding that only surfaces through architectural review — not linting.

IAM and identity patterns

IAM review covers role trust policies, permission boundaries, managed policy attachments, and inline policies. The architectural concern goes beyond “are there wildcard actions?” — it asks whether the permissions granted are appropriate for what the principal actually needs to do.

The following Lambda execution role passes all Checkov IAM rules: no wildcard actions, no AdministratorAccess managed policy. The architectural finding is that two of the four actions are unnecessary for a read-only function — and their presence increases blast radius if the role is compromised.

iam.tf — ✗ Before

# iam.tf — Lambda reads order data; all Checkov IAM rules pass
# (no wildcard actions, no AdministratorAccess managed policy)
# WAF finding: write and delete permissions not required for a read-only function
resource "aws_iam_role_policy" "order_reader" {
  name = "order-reader-policy"
  role = aws_iam_role.order_lambda.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject",    # not needed — function only reads
        "s3:DeleteObject", # not needed — blast radius if role is compromised
      ]
      Resource = [
        aws_s3_bucket.orders.arn,
        "${aws_s3_bucket.orders.arn}/*",
      ]
    }]
  })
}

All Checkov IAM checks pass — but write and delete permissions are not needed for this read-only function

iam.tf — ✓ After

# iam.tf — permissions scoped to exactly what the function needs (WAF SEC 2)
resource "aws_iam_role_policy" "order_reader" {
  name = "order-reader-policy"
  role = aws_iam_role.order_lambda.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:GetObject"]
        Resource = "${aws_s3_bucket.orders.arn}/orders/*"
      },
      {
        Effect   = "Allow"
        Action   = ["s3:ListBucket"]
        Resource = aws_s3_bucket.orders.arn
        Condition = {
          StringLike = { "s3:prefix" = ["orders/*"] }
        }
      }
    ]
  })
}

Scoped to GetObject and ListBucket only, with an s3:prefix condition to restrict list scope

Other IAM patterns an architectural review evaluates: cross-account trust policies without condition constraints, IAM users with programmatic access and no MFA enforcement, and OIDC trust policies with overly broad audience claims.
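
As an illustration of the first pattern, here is a hedged sketch of a cross-account trust policy constrained with an sts:ExternalId condition — the account ID, external ID, and role name are placeholders, not values from the examples above.

resource "aws_iam_role" "partner_access" {
  name = "partner-access"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = "arn:aws:iam::111122223333:root" }  # placeholder account
      # Without a Condition block, any principal in the trusted account
      # could assume this role — the common architectural finding
      Condition = {
        StringEquals = { "sts:ExternalId" = "partner-external-id" }  # placeholder
      }
    }]
  })
}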

According to the Verizon 2025 Data Breach Investigations Report, 15% of breaches involve cloud misconfiguration, with IAM the most common entry point. For a deeper treatment of the IAM patterns behind major AWS-adjacent incidents, see the upcoming post on IAM breach patterns.

Network topology and security groups

Network review covers VPC structure, subnet segmentation, security group rules, NACLs, VPC endpoints, and routing tables. Linting covers the most obvious violations: 0.0.0.0/0 on port 22 or 3389, and missing VPC Flow Logs. Architectural review goes further.

Common architectural network findings:

  • Security groups using /8 or /16 CIDRs that cover entire internal networks, not specific service CIDRs — passes linters, violates defense-in-depth.
  • RDS instances in a public subnet even when publicly_accessible = false — the route exists even if currently blocked.
  • No VPC endpoint for S3 or DynamoDB — traffic routes through the internet gateway unnecessarily, increasing latency and egress cost (see the sketch after this list).
  • Single NAT Gateway serving multiple AZs — a NAT Gateway AZ failure takes all private instances offline, not just those in its AZ.
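
A minimal sketch of the S3 gateway endpoint fix from the third item — the region, VPC, and route table names here are assumptions:

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id               # assumes an existing VPC
  service_name      = "com.amazonaws.eu-west-1.s3"  # adjust to your region
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]  # private route tables to update
}

Gateway endpoints for S3 and DynamoDB are free, so the finding is almost always a straightforward win: lower latency, lower egress cost, and S3 traffic that never leaves the AWS network.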

Storage configuration and encryption

Storage review covers S3 bucket configuration (encryption, access logging, lifecycle policies, versioning, bucket policies), EBS volume encryption, RDS storage encryption and backup, and Secrets Manager vs. plaintext configuration values.

Checkov covers encryption-at-rest and public access blocks well. Architectural review adds: does the S3 lifecycle policy match the data retention requirements of the workload? Is access logging enabled — a WAF SEC 4 control that most linters do not enforce? Is the KMS key customer-managed (with rotation) or AWS-managed? For regulated data, this distinction matters for compliance.
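
A sketch of the customer-managed key pattern, assuming the orders bucket from the IAM example earlier — the key description and alias choices are illustrative:

resource "aws_kms_key" "data" {
  description         = "CMK for the orders bucket"
  enable_key_rotation = true  # rotation expected for customer-managed keys
}

resource "aws_s3_bucket_server_side_encryption_configuration" "orders" {
  bucket = aws_s3_bucket.orders.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.data.arn
    }
  }
}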

For a complete technical walkthrough of S3 encryption requirements in the Well-Architected Security pillar, see the upcoming post on S3 Encryption in Terraform: What the Well-Architected Framework Actually Requires.

The five most common Terraform misconfigurations that fail a Well-Architected Security review — including S3 and IAM — are covered in detail in The Five Terraform Misconfigurations That Fail an AWS Well-Architected Security Review.

Compute resilience and Auto Scaling

Compute review covers EC2 Auto Scaling Group configuration, ECS service deployment settings, Lambda concurrency and timeout settings, and instance type selection. This is the domain with the largest linting gap: almost no Checkov rules evaluate fault tolerance, AZ distribution, or minimum capacity.

compute.tf — ✗ Before

# compute.tf — ASG pinned to a single AZ with min_size = 1
# No Checkov rule checks min_size, AZ distribution, or health_check_grace_period
# WAF finding: REL 6 — single instance means any failure = full service outage
resource "aws_autoscaling_group" "api" {
  name                = "api-asg"
  min_size            = 1                          # ← no minimum redundancy
  max_size            = 10
  desired_capacity    = 1
  vpc_zone_identifier = [aws_subnet.private_a.id]  # ← single AZ
  target_group_arns   = [aws_lb_target_group.api.arn]
  health_check_type   = "ELB"

  launch_template {
    id      = aws_launch_template.api.id
    version = "$Latest"
  }
}

min_size=1 in a single AZ — any AZ failure or instance failure = full outage. No Checkov rule checks this.

compute.tf — ✓ After

# compute.tf — multi-AZ with minimum two instances (WAF REL 6 and REL 8)
resource "aws_autoscaling_group" "api" {
  name             = "api-asg"
  min_size         = 2  # one instance per AZ minimum
  max_size         = 10
  desired_capacity = 2
  vpc_zone_identifier = [  # spread across three AZs
    aws_subnet.private_a.id,
    aws_subnet.private_b.id,
    aws_subnet.private_c.id,
  ]
  target_group_arns         = [aws_lb_target_group.api.arn]
  health_check_type         = "ELB"
  health_check_grace_period = 300

  launch_template {
    id      = aws_launch_template.api.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences { min_healthy_percentage = 50 }
  }
}

min_size=2 spread across three AZs with rolling instance refresh (WAF REL 6 and REL 8 compliant)

Other compute findings: ECS services with desired_count = 1 on production tasks, Lambda functions with timeout values shorter than their typical execution duration (invocations are killed mid-execution), and EC2 instances using the $Latest launch template version rather than a pinned one (which can cause uncontrolled changes during Auto Scaling events). A sketch of the launch-template fix follows.
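
One common remediation for the $Latest finding — sketched here with the same hypothetical resource names as the ASG example above — is to reference the template's latest_version attribute, which Terraform resolves to a concrete version number at plan time:

resource "aws_autoscaling_group" "api" {
  name                = "api-asg"
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2
  vpc_zone_identifier = [aws_subnet.private_a.id, aws_subnet.private_b.id]

  launch_template {
    id = aws_launch_template.api.id
    # Resolves to a fixed number in the plan, so scale-out events
    # never silently pick up an unreviewed template change
    version = aws_launch_template.api.latest_version
  }
}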

Cost anomaly patterns

Cost review covers resource tagging (necessary for attribution), instance type and size selection, data transfer architecture, and the presence of cost monitoring resources (AWS Budgets, CloudWatch billing alarms).

The most universal cost finding is missing tags. Without consistent tagging, cost cannot be attributed to the service, team, or environment that generated it — making cost optimisation decisions impossible.

main.tf — ✗ Before

# main.tf — resources without cost allocation tags
# No Checkov rule enforces tag presence on all resource types
# WAF finding: COST 1 — cannot attribute spend to service, team, or environment
resource "aws_instance" "worker" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "c5.xlarge"
  # No tags block
}

resource "aws_db_instance" "main" {
  identifier     = "prod-db"
  engine         = "postgres"
  instance_class = "db.r6g.large"
  # No tags block
}

resource "aws_elasticache_cluster" "session" {
  cluster_id      = "session-cache"
  engine          = "redis"
  node_type       = "cache.t4g.medium"
  num_cache_nodes = 1
  # No tags block
}

No cost allocation tags — spend cannot be attributed to service, team, or environment (WAF COST 1)

main.tf — ✓ After

# main.tf — consistent tagging for cost attribution (WAF COST 1)
locals {
  common_tags = {
    Environment = "production"
    Service     = "order-api"
    Team        = "platform"
    CostCenter  = "eng-platform"
    ManagedBy   = "terraform"
  }
}

resource "aws_instance" "worker" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "c5.xlarge"
  tags          = local.common_tags
}

resource "aws_db_instance" "main" {
  identifier     = "prod-db"
  engine         = "postgres"
  instance_class = "db.r6g.large"
  tags           = local.common_tags
}

resource "aws_elasticache_cluster" "session" {
  cluster_id      = "session-cache"
  engine          = "redis"
  node_type       = "cache.t4g.medium"
  num_cache_nodes = 1
  tags            = local.common_tags
}

Consistent tagging via a locals block — applied to all resources from a single definition

Observability and operational readiness

Observability review covers CloudWatch alarms, log group configuration (retention periods and structured format), SNS topic alerting chains, and Lambda Dead Letter Queue configuration. This is the domain where linting coverage is effectively zero: no Checkov rule checks for the presence of CloudWatch alarms on any resource type.

lambda.tf — ✗ Before

# lambda.tf — payment processing function with no CloudWatch alarms
# Checkov has no rule for missing metric alarms on Lambda functions
# WAF finding: OPS 7 — no mechanism to detect or recover from operational events
resource "aws_lambda_function" "payment_processor" {
  function_name = "payment-processor"
  runtime       = "python3.12"
  handler       = "handler.process"
  role          = aws_iam_role.lambda_role.arn
  filename      = data.archive_file.lambda_zip.output_path

  # Error rate, throttle rate, and duration are unmonitored.
  # Silent failures will not page anyone.
}

Payment-critical Lambda with no alarms — errors and throttles are invisible until a customer reports a problem

lambda.tf — ✓ After

# lambda.tf — Lambda with error and throttle alarms (WAF OPS 7)
resource "aws_lambda_function" "payment_processor" {
  function_name = "payment-processor"
  runtime       = "python3.12"
  handler       = "handler.process"
  role          = aws_iam_role.lambda_role.arn
  filename      = data.archive_file.lambda_zip.output_path
}

resource "aws_cloudwatch_metric_alarm" "payment_errors" {
  alarm_name          = "payment-processor-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = 60
  statistic           = "Sum"
  threshold           = 5
  alarm_actions       = [aws_sns_topic.alerts.arn]
  dimensions          = { FunctionName = aws_lambda_function.payment_processor.function_name }
}

resource "aws_cloudwatch_metric_alarm" "payment_throttles" {
  alarm_name          = "payment-processor-throttles"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "Throttles"
  namespace           = "AWS/Lambda"
  period              = 60
  statistic           = "Sum"
  threshold           = 0
  alarm_actions       = [aws_sns_topic.alerts.arn]
  dimensions          = { FunctionName = aws_lambda_function.payment_processor.function_name }
}

Error rate and throttle alarms, each with an SNS action for alerting (WAF OPS 7 compliant)

How to Conduct a Terraform Architecture Review

A repeatable review process has six steps. Following them in order prevents the most common failure mode: conflating linting findings with architectural findings and producing a report that mixes the two without distinguishing severity or remediation effort.

1

Collect your Terraform

Gather all .tf files, module directories, and the relevant tfvars file for the environment being reviewed. If you use Terraform workspaces, document the workspace. The Terraform state file is optional but useful — it reveals actual deployed values versus planned values, which can differ if the state has drifted.

Include everything that describes the target environment: root config, modules called by the root, and shared modules used across services. Exclude files that describe other environments (dev.tfvars is not relevant when reviewing production).

2

Describe your workload context

Before evaluating any Terraform, document the workload context: service type (API, data pipeline, event processor), environment, SLA (what is the recovery time objective? the recovery point objective?), data sensitivity classification (PII, financial, public), and which WAF pillars matter most. A payments service prioritises Security and Reliability. A batch analytics pipeline might prioritise Cost and Performance Efficiency.

This step is the one that separates an architectural review from a linting run. Without workload context, severity cannot be assigned meaningfully — the same configuration can be Low in dev and Critical in prod.
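
One lightweight way to keep this context versioned alongside the code — a convention sketch, not a standard — is a locals block that reviewers (and tooling) can read next to the resources. All values here are illustrative:

locals {
  workload_context = {
    service_type     = "api"
    environment      = "production"
    rto_minutes      = 15     # recovery time objective
    rpo_minutes      = 5      # recovery point objective
    data_sensitivity = "pii"
    priority_pillars = ["security", "reliability"]
  }
}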

3

Run your linter first

Run Checkov, Trivy, or tfsec against the collected Terraform before architectural review begins. Resolve any Critical or High linter findings. An architectural review is most useful when the baseline configuration is already clean — mixing linter findings with architectural findings in the same report makes prioritisation harder.

4

Review against each WAF pillar

Work through the six WAF pillars systematically. For each pillar, evaluate the Terraform against the high-impact controls for that domain (see the checklist below for a starting list). Record each finding: the control it maps to, the current state in Terraform, and whether it is a configuration issue (fixable with an attribute change) or an architectural issue (requires a new resource or design change).

Prioritise Security and Reliability for a first review. Cost, Operational Excellence, Performance Efficiency, and Sustainability are important but rarely carry the immediate blast radius of Security and Reliability findings.

5

Prioritise by severity and blast radius

Assign each finding a severity using workload context. Critical: finding that, if exploited or triggered, could result in data breach, regulatory breach, or complete service unavailability. High: finding that materially increases risk or would result in extended downtime. Medium: finding that should be remediated before the next major release. Low: best-practice gap with no immediate risk.

Blast radius is key: map the lateral movement path. If this IAM role is compromised, which other resources are reachable? If this AZ fails, which services lose availability? Severity is a function of both likelihood and blast radius.

6

Produce a stakeholder report

Document findings in a structured report: executive summary (overall posture, top three findings, recommended immediate actions), findings table (pillar, control, severity, current state, remediation HCL, effort estimate), and an appendix listing the Terraform files reviewed. Distinguish "fix before next deploy" (Critical/High) from "backlog" (Medium/Low). The report should be readable by a non-technical stakeholder without losing the technical detail needed by the engineer implementing the fixes.

Terraform Architecture Review Checklist

The following checklist covers the highest-impact controls across the four primary WAF pillars for most Terraform workloads. It is not exhaustive — the full AWS Well-Architected Framework contains over 300 controls — but it covers the findings that appear most frequently in production Terraform reviews.

Security

  • IMDSv2 enforced on all EC2 instances (http_tokens = "required")
  • No IAM roles with Action: "*" or Resource: "*" wildcards
  • S3 buckets have server-side encryption and public access blocks
  • Security groups do not allow 0.0.0.0/0 on port 22 or 3389
  • No secrets or credentials in Terraform variables or state
  • VPC Flow Logs enabled
  • CloudTrail enabled in all regions
  • KMS key rotation enabled for customer-managed keys

Reliability

  • RDS instances have multi_az = true for production workloads
  • Auto Scaling Groups span at least two AZs with min_size ≥ 2
  • Lambda functions processing async events have dead_letter_config
  • Backup retention configured for all stateful resources
  • Health checks configured on all load balancer target groups
  • RDS deletion_protection = true on production databases
  • No single points of failure in the critical path

Cost Optimization

  • All resources have cost allocation tags (Environment, Service, Team)
  • No unattached EBS volumes or idle Elastic IPs
  • EC2 instance types right-sized for workload requirements
  • S3 lifecycle policies configured for infrequently accessed data
  • Reserved Instances or Savings Plans for steady-state compute
  • CloudWatch billing alarms configured

Operational Excellence

  • CloudWatch alarms on Lambda error rate and throttle count
  • ALB 5xx error rate alarm configured
  • RDS CloudWatch alarms for CPU, connections, and storage
  • Structured logging enabled (CloudWatch Logs, no stdout-only)
  • Runbooks linked from Terraform outputs or documentation
  • Terraform state stored remotely with locking (S3 + DynamoDB) — sketched below
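
For that last item, a minimal sketch of a remote backend with locking — the bucket, key, table, and region are placeholders:

terraform {
  backend "s3" {
    bucket         = "org-terraform-state"                    # placeholder bucket
    key            = "order-api/production/terraform.tfstate" # placeholder key
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"                        # lock table prevents concurrent applies
  }
}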

Common Terraform Architecture Mistakes

These five patterns appear consistently in production Terraform reviews. All five pass standard linting. All five carry real architectural risk.

01

IMDSv1 still enabled on EC2 instances

Security

EC2 instances without an explicit metadata_options block default to IMDSv1, which does not require authentication. The Capital One breach (2019) used SSRF to reach the IMDSv1 endpoint and retrieve IAM credentials from an EC2 role. A Checkov rule (CKV_AWS_79) now exists for this — but only if you are running Checkov with all checks enabled.

See the before/after HCL fix
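A minimal sketch of that fix, reusing the hypothetical worker instance from the cost example:

resource "aws_instance" "worker" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "c5.xlarge"

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"  # enforce IMDSv2 — session tokens only
    http_put_response_hop_limit = 1           # keep token responses on the instance itself
  }
}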
02

IAM roles with overly broad permissions (not necessarily wildcards)

Security

CKV_AWS_40 catches wildcard actions. It does not catch a role that has 12 specific actions where the workload only needs 3. Every extra permission extends the blast radius if the role is compromised. Architectural review maps what the Lambda or EC2 actually calls versus what the role allows.

03

Single-AZ RDS for a production database

Reliability

multi_az = false is the default, and most linter configurations do not flag it. For production databases, single-AZ means an AZ failure or an AWS infrastructure event in that AZ takes the database offline. RDS multi-AZ failover typically completes in 60–120 seconds; restoring from a backup snapshot takes 15–40 minutes. For a service with an SLA, this is a Critical finding.

See the multi-AZ example
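A sketch of the multi-AZ configuration — identifiers reuse the hypothetical names from the cost example above, and storage and backup values are illustrative:

resource "aws_db_instance" "main" {
  identifier                  = "prod-db"
  engine                      = "postgres"
  instance_class              = "db.r6g.large"
  allocated_storage           = 100
  multi_az                    = true  # standby in a second AZ; ~60–120s failover
  deletion_protection         = true
  backup_retention_period     = 14    # days — align with your RPO
  username                    = "app"
  manage_master_user_password = true  # credentials in Secrets Manager, not state
  tags                        = local.common_tags
}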
04

Lambda processing async events without a Dead Letter Queue

Operational Excellence

AWS retries asynchronous Lambda invocations twice on failure by default. After the third attempt, the event is silently discarded. For a function processing payment events, order confirmations, or notifications, this means lost data with no visibility. Checkov has no rule for the absence of dead_letter_config.

See the DLQ example
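A sketch of the DLQ wiring, reusing the hypothetical payment-processor names from the observability example:

resource "aws_sqs_queue" "payment_dlq" {
  name                      = "payment-processor-dlq"
  message_retention_seconds = 1209600  # 14 days to investigate failed events
}

resource "aws_lambda_function" "payment_processor" {
  function_name = "payment-processor"
  runtime       = "python3.12"
  handler       = "handler.process"
  role          = aws_iam_role.lambda_role.arn
  filename      = data.archive_file.lambda_zip.output_path

  dead_letter_config {
    target_arn = aws_sqs_queue.payment_dlq.arn
  }
}

Note that the execution role also needs sqs:SendMessage on the queue ARN for the redirect to succeed.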
05

No CloudWatch alarms on critical-path services

Operational Excellence

Terraform that defines a Lambda function, an ECS service, or an RDS database with no associated CloudWatch alarm resources is running in production without observability. No alarm = no page = engineers discover the problem when customers report it. This is entirely invisible to linters — it is the absence of resources, not the misconfiguration of present ones.

Tools for Terraform Architecture Review

No single tool covers the full scope of a Terraform architecture review. The current landscape splits across two categories: static linters (fast, free, CI-native, attribute-level) and architectural review tools (contextual, WAF-aligned, workload-aware). The right answer for most teams is both layers, not a choice between them.

Terraform security and review tools (2026)

Comparison of Terraform security and review tools
Tool                | Type           | WAF Coverage        | Output     | Price
--------------------|----------------|---------------------|------------|------------------------
Checkov             | Config linter  | Security (partial)  | CLI / JSON | Free (OSS)
Trivy               | Multi-scanner  | Security (partial)  | CLI / JSON | Free (OSS)
tfsec               | Config linter  | Security (partial)  | CLI / JSON | Deprecated → use Trivy
Snyk IaC            | Config linter  | Security (partial)  | Dashboard  | $25+/dev/mo
AWS Trusted Advisor | Runtime checks | Partial (4 pillars) | Console    | $100+/mo support
Terrascan           | Config linter  | Security (partial)  | CLI / JSON | Archived Nov 2025
ArchGuard           | Arch review    | Yes (4 pillars)     | PDF report | $49–$399/mo

Checkov — open source, maintained by Bridgecrew (Prisma Cloud), the most widely used Terraform linter. Over 1,000 checks for Terraform, fast CI integration, SARIF output support for GitHub Advanced Security. The right tool for the attribute-level linting layer. Use it.

Trivy — broad scanner covering container images, Terraform IaC, OS packages, and SBOM. Absorbed the tfsec IaC capability in 2023. Note: in March 2026, a supply chain attack on Trivy’s GitHub Actions distribution affected 75 of 76 published tags — if you use Trivy in CI, pin to a specific verified digest rather than a floating tag.

Terrascan — archived by Tenable in November 2025 and no longer maintained. Teams using Terrascan should migrate to Trivy or Checkov.

AWS Trusted Advisor — runtime checks against your deployed AWS environment. Useful for catching live issues that Terraform does not describe (idle resources, underutilised instances). Not Terraform-native and requires Business or Enterprise support tier for the full check library.

ArchGuard — uploads your Terraform and returns a WAF-aligned PDF report covering Security, Reliability, Cost, and Operational Excellence findings with contextual severity based on your workload. Designed to complement, not replace, your existing linting layer. See a sample report.

Frequently asked questions

What is the difference between a Terraform architecture review and an AWS Well-Architected Review?

An AWS Well-Architected Review (WA Review) is a structured interview process using the AWS Well-Architected Tool — it asks your team questions and records answers. A Terraform architecture review reads your actual infrastructure code and evaluates it against the same Well-Architected pillars. The two are complementary: the WA Review captures intent; the Terraform review verifies implementation.

How is a Terraform architecture review different from running Checkov or Trivy?

Checkov and Trivy apply deterministic rules to individual resource attributes — they check whether a specific attribute is set to a known-good value. A Terraform architecture review evaluates relationships between resources, maps blast radius, considers workload context (is this prod or dev?), and flags patterns that emerge from absent resources. The two approaches are complementary, not competing.

Which Terraform files should I include in an architecture review?

Include all .tf files that describe the target environment: main.tf, variables.tf, outputs.tf, and any module directories. Include the relevant tfvars file for the environment being reviewed — production.tfvars, not dev.tfvars, if you are reviewing production. The Terraform state file is optional but helps when reviewing actual deployed values versus planned values.

Do I need to run Checkov before an architecture review?

Yes — run your linter first and resolve critical findings before an architectural review. An architecture review is most useful when the baseline configuration is already clean. Mixing linter findings with architectural findings in the same report makes it harder to prioritise what to fix first.

How often should I run a Terraform architecture review?

At minimum: before the first production deployment of a new service, before major infrastructure changes (new VPC, new account structure, new data store), and annually for stable services. Higher-risk workloads — payment processing, healthcare data, multi-tenant SaaS — warrant quarterly reviews. AWS recommends re-running a Well-Architected Review after any significant workload change.

Can I review Terraform modules rather than just root configurations?

Yes — and modules are where architectural patterns are most important to review, because a misconfiguration in a shared module propagates to every service that consumes it. Review modules with the same checklist as root configurations. Pay particular attention to module output contracts: what does the module expose and what can a consumer override?

What does a Terraform architecture review report contain?

A well-structured report contains: an executive summary (overall posture, pillar scores, top three findings), a findings section organised by WAF pillar (each finding with severity, current state, WAF control reference, remediation HCL, and effort estimate), and an appendix with the full Terraform inventory reviewed. See the ArchGuard sample report for a concrete example.

AI-Powered Review

Get an AI-Powered Review of Your Terraform

Upload your Terraform and get a Well-Architected report covering Security, Reliability, Cost, and Operational Excellence — with prioritised findings, HCL remediation examples, and a PDF you can share with stakeholders.

Upload your Terraform. Receive a structured findings report in minutes.