When you bring Terraform and Kubernetes together, you create a single, declarative workflow for managing the entire lifecycle of your infrastructure and the applications running on it. This powerful pairing uses Infrastructure as Code (IaC) to automate everything from provisioning a cloud-managed cluster like EKS or GKE to deploying complex workloads, guaranteeing a setup that’s consistent, repeatable, and fully auditable.
Why Pair Terraform and Kubernetes
Pairing Terraform and Kubernetes gives you a unified strategy for managing modern cloud-native systems. At its heart, Terraform is brilliant at provisioning and managing the underlying infrastructure—the virtual machines, networking, and the managed Kubernetes services themselves. Kubernetes, on the other hand, is the master of orchestrating the applications that live on top of that infrastructure.
Using them together solves a classic “chicken-and-egg” problem. You can’t run your applications without a Kubernetes cluster, but building that cluster by hand is slow, tedious, and prone to human error. Terraform neatly bridges this gap by defining the cluster’s entire architecture as code. This means your whole environment, from the VPC all the way to the final application deployment, is version-controlled and fully automated.
The Strategic Advantages of This Combination
Adopting this combined approach brings several major benefits, especially for platform and security teams:
- Unified Workflow: You can manage both the cluster and its applications within the same IaC framework. This cuts down on tool sprawl and simplifies your operational overhead.
- Provider Agnosticism: Terraform’s huge provider ecosystem lets you define your infrastructure once and deploy it across different cloud providers with minimal tweaks. This helps you avoid getting locked into a single vendor. For a deeper dive, check out our guide comparing Terraform to cloud-specific tools.
- Enhanced Security and Compliance: With IaC, every change to your infrastructure leaves an auditable trail. This is a massive win for regulatory compliance, as every single modification is documented and tracked in version control.
This decision tree helps visualise that first big choice: which cloud provider’s Kubernetes service is right for your team?

As the flowchart shows, the best choice often comes down to where your existing infrastructure lives and what your team is already comfortable with, pointing you toward Amazon EKS, Google GKE, or Azure AKS.
Choosing Your Kubernetes Platform for Terraform
Selecting the right managed Kubernetes service is a critical first step. Each platform has its own nuances when it comes to Terraform integration, management overhead, and ideal fit. This table breaks down the key differences to help you decide.
| Platform (EKS, GKE, AKS) | Terraform Provider Maturity | Management Responsibility | Ideal Use Case |
|---|---|---|---|
| Amazon EKS | Highly mature; extensive community modules and official AWS support. | AWS manages the control plane; users manage worker nodes and networking. | Teams heavily invested in the AWS ecosystem seeking deep integration with services like IAM and VPC. |
| Google GKE | Very mature and stable; often the first to support new Kubernetes features. | Google offers flexible management, from control plane only to fully automated "Autopilot" mode. | Organisations prioritising operational simplicity and cutting-edge Kubernetes features. |
| Azure AKS | Mature and well-integrated with the Azure ecosystem. Strong support for Windows containers. | Azure manages the control plane; flexible node pool and networking options for users. | Enterprises standardised on Azure, especially those with mixed Windows/Linux workloads or needing Azure AD integration. |
Ultimately, all three major cloud providers offer robust and well-supported Terraform providers. Your decision will likely hinge more on your team’s existing cloud skills and strategic vendor relationships than on any major technical limitation.
Meeting Modern Compliance Demands
This automated, code-driven approach isn’t just a “nice-to-have”—it’s becoming essential for meeting strict regulations like the EU’s Cyber Resilience Act (CRA). In the European Union, Kubernetes adoption has skyrocketed, with 69% of companies now running it in production. This growth is partly driven by rules like GDPR, which steer companies toward local, compliant cloud providers.
For manufacturers of connected devices, pairing Terraform with Kubernetes is a game-changer. It dramatically simplifies the process of deploying secure firmware updates, a core requirement of the CRA, by providing a repeatable and auditable delivery pipeline.
Practical Example: Provisioning an AKS Cluster with Terraform
Alright, let’s get our hands dirty and spin up our first managed Kubernetes cluster with Terraform. Theory is great, but nothing beats seeing it in action. For this guide, we’ll use Azure Kubernetes Service (AKS) because it’s a common setup you’ll find in the wild. Our goal is simple: understand how Terraform resources fit together to create a fully working cluster.
The process involves telling Terraform how to talk to Azure, defining the essential building blocks like a resource group and the AKS cluster itself, and then safely getting the credentials we need to connect to it. By splitting our code into main.tf, variables.tf, and outputs.tf, we’re setting ourselves up with a clean, reusable project right from the start.

This workflow shows exactly what we’re about to do—use Terraform to instruct Azure to build the cluster, which paves the way for deploying our applications later.
Defining Core Infrastructure in main.tf
The main.tf file is the heart of our configuration. It’s where we declare which provider we’re using and define the actual cloud resources to be created. Here, we’ll define a resource group to neatly contain our assets and then the AKS cluster itself, specifying details like the node count and VM size.
```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "aks_rg" {
  name     = var.resource_group_name
  location = var.location
}

resource "azurerm_kubernetes_cluster" "aks_cluster" {
  name                = var.cluster_name
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = var.dns_prefix

  default_node_pool {
    name       = "default"
    node_count = var.node_count
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}
```
I’ve kept this configuration intentionally simple. It uses a system-assigned identity to keep things straightforward and sets up a default node pool, which is perfect for getting started. In a real-world production setup, you’d almost certainly define more complex networking rules and multiple, specialised node pools for different workloads.
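To give a flavour of what that might look like, here is a sketch of an additional, specialised node pool attached to the cluster above. The pool name, VM size, and labels are illustrative placeholders, not prescriptive values:

```hcl
# Hypothetical second node pool for memory-intensive workloads.
# It attaches to the cluster defined above via the cluster's resource ID.
resource "azurerm_kubernetes_cluster_node_pool" "memory_optimised" {
  name                  = "memopt"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_cluster.id
  vm_size               = "Standard_E4s_v3"
  node_count            = 2

  # Label the nodes so workloads can target this pool with a nodeSelector.
  node_labels = {
    workload = "memory-intensive"
  }
}
```

Because the node pool is a separate resource, you can add, resize, or remove it without touching the cluster definition itself.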
Using Variables for Flexibility in variables.tf
Hardcoding values like names and locations directly into main.tf is a recipe for headaches down the line. That’s what variables.tf is for. It lets us separate these configuration values, making our code much more reusable and easier to manage across different environments like development, staging, and production.
```hcl
variable "resource_group_name" {
  description = "The name of the resource group for the AKS cluster."
  type        = string
  default     = "aks-demo-rg"
}

variable "location" {
  description = "The Azure region where resources will be created."
  type        = string
  default     = "West Europe"
}

variable "cluster_name" {
  description = "The name of the AKS cluster."
  type        = string
  default     = "my-aks-cluster"
}

variable "dns_prefix" {
  description = "The DNS prefix for the AKS cluster."
  type        = string
  default     = "myakscluster"
}

variable "node_count" {
  description = "The number of nodes in the default node pool."
  type        = number
  default     = 2
}
```
Exposing Outputs with outputs.tf
Once Terraform has done its job and the cluster is running, we need a way to actually connect to it. The outputs.tf file is where we define values that get displayed after a successful terraform apply. We’ll use it to export the cluster’s kubeconfig, which holds all the credentials kubectl needs.
Security Tip: The `kube_config_raw` output is marked as `sensitive = true`. This is a critical step; it tells Terraform not to print the raw credentials in the console log, which significantly reduces the risk of them being accidentally exposed in CI/CD logs or shell history.
```hcl
output "kube_config_raw" {
  description = "Raw Kubeconfig to connect to the AKS cluster."
  value       = azurerm_kubernetes_cluster.aks_cluster.kube_config_raw
  sensitive   = true
}
```
With these three files in place, you’re ready to go. Just run terraform init, terraform plan, and terraform apply to create your first Kubernetes cluster entirely through code.
Managing Applications on Kubernetes with Terraform
Once your Kubernetes cluster is up and running, the real work begins: managing the applications and services you deploy on it. This is where combining Terraform and Kubernetes really starts to pay off. You have two main routes for declaring your application’s desired state as code: defining resources natively with the Kubernetes provider or managing pre-packaged applications with the Helm provider.
The choice you make here isn’t just a technical detail; it will shape your team’s entire workflow for application management. One path gives you fine-grained, resource-by-resource control, while the other offers a much faster way to handle complex applications with many moving parts.

Defining Resources Directly with the Kubernetes Provider
The most straightforward way to manage Kubernetes objects is by using Terraform’s official Kubernetes provider. This approach means writing HCL to define standard resources like Deployments, Services, and ConfigMaps, just like you would for your cloud infrastructure. It gives you maximum control.
For instance, if you wanted to deploy a simple Nginx web server, you would define a kubernetes_deployment and then expose it with a kubernetes_service.
```hcl
resource "kubernetes_deployment" "nginx_app" {
  metadata {
    name = "nginx-deployment"
    labels = {
      App = "Nginx"
    }
  }

  spec {
    replicas = 2

    selector {
      match_labels = {
        App = "Nginx"
      }
    }

    template {
      metadata {
        labels = {
          App = "Nginx"
        }
      }

      spec {
        container {
          image = "nginx:1.21.0"
          name  = "nginx"
        }
      }
    }
  }
}

resource "kubernetes_service" "nginx_service" {
  metadata {
    name = "nginx-service"
  }

  spec {
    selector = {
      App = kubernetes_deployment.nginx_app.spec[0].template[0].metadata[0].labels.App
    }

    port {
      port        = 80
      target_port = 80
    }

    type = "LoadBalancer"
  }
}
```
This method is perfect for your own custom-built applications or anytime you need to tweak individual resource properties with absolute precision. To make sure you’re managing these resources reliably, it’s always a good idea to stick to solid Infrastructure as Code best practices.
Using the Helm Provider for Packaged Applications
For more complex, off-the-shelf software like monitoring stacks or databases, defining every single resource by hand quickly becomes a massive headache. This is exactly what the Terraform Helm provider was built for. Helm bundles applications into reusable packages called “charts,” and this provider lets you deploy them straight from your Terraform code.
Let’s take the popular Prometheus monitoring stack as an example—a complex application with many different components. With the Helm provider, deploying it is surprisingly simple.
```hcl
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

resource "helm_release" "prometheus" {
  name             = "prometheus"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "prometheus"
  namespace        = "monitoring"
  create_namespace = true
}
```
That small block of HCL deploys the entire Prometheus chart, including its server, Alertmanager, and various exporters. You don’t have to write dozens of individual resource definitions. It just works.
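You rarely want a chart's defaults untouched, though. The `helm_release` resource lets you override individual chart values inline. As a sketch (the value name below depends on the chart version you install, so treat it as a placeholder and check the chart's own `values.yaml`):

```hcl
resource "helm_release" "prometheus" {
  name             = "prometheus"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "prometheus"
  namespace        = "monitoring"
  create_namespace = true

  # Override a single chart value inline; the value name is chart-specific.
  set {
    name  = "server.retention"
    value = "15d"
  }
}
```

For larger customisations, passing a whole values file via the resource's `values` argument keeps the HCL tidy.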
Adopting these IaC patterns isn’t just about making your life easier; it’s a strategic move. The link between Terraform and Kubernetes is so strong that its Kubernetes provider saw 471 dependent repos by late 2025, a clear sign of deep community adoption and a mature ecosystem.
Which Approach Is Right for You?
So, should you use native Kubernetes resources or Helm charts? It all comes down to what you’re trying to do.
Use the Kubernetes Provider when:
- You’re deploying your own custom applications.
- You need granular control over every single resource attribute.
- You’re managing simple, standalone services.
Use the Helm Provider when:
- You’re deploying complex, third-party software like Prometheus, Grafana, or a database.
- You want to tap into community-maintained application packages.
- Your main goal is to get a complete software stack up and running quickly.
In practice, many teams end up using a hybrid approach. They use the Kubernetes provider for their own bespoke services and the Helm provider for all the ecosystem tools that support them. This gives you the best of both worlds: tight control where you need it and the convenience of packaged software where you don’t.
Integrating Terraform into Your CI/CD Pipeline
Let’s be honest: manual deployments just don’t cut it anymore. If you want the speed and reliability that modern development demands, automating your infrastructure workflows is non-negotiable. This is where a solid CI/CD pipeline becomes the backbone of your Terraform and Kubernetes setup, turning your Infrastructure as Code into a self-service, completely auditable process.
The fundamental idea is simple: treat your infrastructure changes exactly like you treat application code. When a developer opens a pull request with their Terraform changes, it should automatically trigger a terraform plan. This simple step lets the entire team review the exact impact of those changes before they ever get applied, catching potential headaches early on.

A Practical Example with GitHub Actions
So, how does this look in practice? Let’s walk through a common workflow using GitHub Actions. The pipeline I’m about to show you has two main triggers: one for pull requests and another for merges into the main branch. The whole point is to fully automate the plan-and-apply cycle.
This example workflow file, which you’d place at .github/workflows/terraform.yml, shows just how to set this up.
```yaml
name: 'Terraform CI/CD'

on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]

jobs:
  terraform:
    name: 'Terraform'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-west-1

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        if: github.event_name == 'pull_request'
        run: terraform plan -no-color

      - name: Terraform Apply
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve
```
This workflow differentiates its behaviour using conditional steps. When a pull request is opened, it runs terraform plan so the proposed changes appear in the workflow logs for reviewers. (Posting the plan back into the PR as a comment requires an additional step, such as a dedicated commenting action.) Once that PR is approved and merged into the main branch, the workflow runs again, this time executing terraform apply -auto-approve to provision the changes for real.
Managing State and Environments
For any of this automation to work safely, you absolutely cannot store your Terraform state file locally on your laptop. A shared, remote backend is a must.
Here’s what that means in practice:
- Shared State: Using a backend like an S3 bucket (with DynamoDB for state locking) is the standard for a reason. It ensures that every CI/CD run and every developer is working from the same, up-to-date state file, which is crucial for preventing conflicts and overwrites.
- Environment-Specific Configurations: The best way to handle different environments (like dev, staging, and prod) is with separate Terraform workspaces or distinct directory structures. This lets you pass different variable files (`.tfvars`) to each environment, tailoring deployments without duplicating a ton of code. For a deeper dive on project structure, our guide on https://goregulus.com/cra-basics/git-ci-cd/ is a great resource.
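Configuring such a remote backend is a small block of HCL in your root module. The bucket, key, and table names below are placeholders you would replace with your own:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"        # placeholder bucket name
    key            = "aks-demo/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"           # placeholder table; enables state locking
    encrypt        = true
  }
}
```

After adding or changing a backend block, re-run terraform init so Terraform can migrate the state to the new location.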
By automating infrastructure changes through a well-defined pipeline, you're establishing a proper GitOps-style workflow. The `main` branch becomes the undisputed source of truth for your infrastructure's desired state, making your entire system far more transparent and predictable.
If you’re really focused on maximising efficiency and cutting down on operational overhead, it’s worth exploring strategies for building zero-maintenance CI/CD pipelines. And one last critical point: remember to store all your secrets—like cloud provider credentials—securely within your CI/CD system’s secrets manager. Never, ever commit them to your code repository.
Advanced Security and Operational Practices
Getting a cluster up and running is one thing; keeping it stable, secure, and manageable is another challenge entirely. Once you move past the initial deployment, the real-world operational realities of drift, security, and scale come into focus.
This is where your Infrastructure as Code strategy truly proves its worth, separating a quick proof-of-concept from a production-grade system. For platform and security teams, having solid processes to handle these complexities isn’t optional—it’s essential.
One of the most common headaches is state drift. This happens when the actual state of your infrastructure falls out of sync with what’s defined in your Terraform code. It can be triggered by anything from an emergency manual fix during an outage to a change made by another automation tool.
The best way to wrangle drift is by making terraform plan a routine part of your workflow. Running it regularly against your live environment acts as your detection system, immediately flagging any discrepancies between your code and reality. You get a clean, actionable report of what needs to change to bring everything back into line.
Managing RBAC and Security Policies
A huge part of Kubernetes security is controlling who can do what. Role-Based Access Control (RBAC) is the native Kubernetes way to handle permissions, and managing it directly with Terraform is a game-changer. It puts your access policies under version control, making them auditable right alongside your infrastructure.
For instance, if you need to create a simple read-only role for a specific namespace, you can define it cleanly with kubernetes_role and kubernetes_role_binding resources in your HCL.
```hcl
resource "kubernetes_role" "pod-reader" {
  metadata {
    name      = "pod-reader"
    namespace = "production-data"
  }

  rule {
    api_groups = [""]
    resources  = ["pods", "pods/log"]
    verbs      = ["get", "list", "watch"]
  }
}

resource "kubernetes_role_binding" "read-only-binding" {
  metadata {
    name      = "read-only-binding"
    namespace = "production-data"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role.pod-reader.metadata[0].name
  }

  subject {
    kind      = "User"
    name      = "jane.doe@example.com"
    api_group = "rbac.authorization.k8s.io"
  }
}
```
But RBAC doesn’t solve every security puzzle. For more sophisticated policy-as-code enforcement, many teams use tools like Open Policy Agent (OPA) Gatekeeper. You can easily deploy Gatekeeper with the Terraform Helm provider, then use the Kubernetes provider to manage its ConstraintTemplates and Constraints. This lets you enforce cluster-wide rules like “all container images must come from our trusted registry.”
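Deploying Gatekeeper itself follows the same `helm_release` pattern shown earlier. A minimal sketch, assuming the community-published chart repository (verify the repository URL and chart name against the current Gatekeeper documentation before relying on them):

```hcl
# Install OPA Gatekeeper from its community Helm chart (repository URL may change).
resource "helm_release" "gatekeeper" {
  name             = "gatekeeper"
  repository       = "https://open-policy-agent.github.io/gatekeeper/charts"
  chart            = "gatekeeper"
  namespace        = "gatekeeper-system"
  create_namespace = true
}
```

Once the release is in place, the Kubernetes provider can manage the ConstraintTemplate and Constraint objects that encode your actual policies.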
Managing security policies as code is a cornerstone of modern compliance. If you want to dive deeper into how code analysis supports this, our overview of static code analysis is a great place to start.
Scaling with Terraform Modules
As your infrastructure grows, a single, monolithic Terraform configuration quickly becomes a maintenance nightmare. The answer is to start thinking in modules. A Terraform module is essentially a reusable, configurable blueprint—a container for a group of resources that are used together.
You could, for example, build a module that defines a standard application deployment, bundling together:
- A `kubernetes_deployment`
- A `kubernetes_service`
- A `kubernetes_horizontal_pod_autoscaler`
With this module in place, your teams can provision a complete, production-ready application stack with just a few lines of code by passing in variables for the image name, port, and replica count.
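A call to such a module might look like this. The module path, variable names, and values are all hypothetical; they simply illustrate the shape of the interface:

```hcl
# Consume a hypothetical in-repo module that bundles the resources listed above.
module "billing_api" {
  source = "./modules/app-deployment"   # placeholder local module path

  name          = "billing-api"
  image         = "registry.example.com/billing-api:1.4.2"
  port          = 8080
  replica_count = 3
}
```

Every team that uses the module gets the same deployment, service, and autoscaler wiring, with only the inputs varying per application.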
By modularising your code, you drastically reduce duplication and make your infrastructure easier to maintain and scale. It empowers developers to self-serve common infrastructure patterns while ensuring they adhere to organisational best practices.
Getting Your Hands Dirty: Common Questions
When you start wiring up powerful tools like Terraform and Kubernetes, you’re bound to run into some practical questions. Most teams hit the same roadblocks, from wrestling with state files to figuring out the best way to manage Kubernetes-native resources. Let’s walk through some of the most common issues to smooth out your workflow.
One of the first questions I always hear is: “Should I manage raw Kubernetes manifests (YAML) with Terraform?” While you technically can use the kubernetes_manifest resource to dump raw YAML into your configuration, it’s rarely the best approach. You’re much better off using dedicated resources like kubernetes_deployment or the dedicated Helm provider. They give you proper validation and a structured HCL experience, which is far easier to manage and maintain than trying to embed multi-line YAML strings.
Nailing Provider Authentication
A common sticking point, especially in CI/CD pipelines, is authenticating the Kubernetes provider. It needs a valid kubeconfig file to talk to your cluster, but how do you get it there securely?
The best practice here is to have your infrastructure layer—the Terraform code that provisions the cluster itself—output the necessary credentials. An EKS cluster resource, for example, can output the cluster endpoint, CA certificate, and an authentication token. You then feed these values directly into the Kubernetes provider block in a separate configuration.
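For EKS, this pattern typically uses data sources to look up the cluster endpoint, its CA certificate, and a short-lived token, then feeds them straight into the provider block. A sketch, with the cluster name as a placeholder:

```hcl
# Look up the existing cluster and a short-lived auth token (cluster name is a placeholder).
data "aws_eks_cluster" "this" {
  name = "my-eks-cluster"
}

data "aws_eks_cluster_auth" "this" {
  name = "my-eks-cluster"
}

# Configure the Kubernetes provider entirely from dynamic values --
# no kubeconfig file is ever written to disk or committed.
provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}
```

The token from `aws_eks_cluster_auth` is short-lived, so each run authenticates freshly using whatever AWS credentials the pipeline already has.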
The key is to avoid static credential files at all costs. Dynamically generating authentication details and passing them between Terraform configurations or pipeline stages is a far more secure and robust pattern. This completely removes the need to store sensitive kubeconfig files in version control.
Knowing When to Use Terraform vs. Other Tools
Another frequent debate is: “When should I stop using Terraform and switch to a tool like ArgoCD or Flux?” This is a fantastic question, and the answer really comes down to understanding what each tool is built for. They have different strengths, and knowing when to hand off responsibility is crucial.
- Terraform is ideal for: Provisioning the foundational infrastructure. Think of it as building the house. It’s perfect for creating the Kubernetes cluster itself and its core services, like an ingress controller or a monitoring stack. It excels at managing resources with a clear, defined lifecycle.
- GitOps tools (ArgoCD/Flux) are better for: Continuously deploying and managing the applications running inside the cluster. This is the furniture in the house. These tools are designed for high-velocity application updates and constantly observing the cluster’s live state to enforce the desired configuration.
Most mature teams find a sweet spot by using both. Terraform sets up the cluster and its core systems, and a GitOps tool takes over to manage the applications. For more on structuring these kinds of workflows, you can check out our guide on CI/CD with Git.
This separation of concerns creates a clean, logical workflow. Each tool is used for what it does best, giving you a powerful and maintainable way to manage your entire stack from the metal up to the application.
Navigating compliance for your digital products can be just as complex as your infrastructure. Regulus provides a clear, step-by-step roadmap to prepare for the EU’s Cyber Resilience Act, helping you turn regulatory requirements into an actionable plan. Learn more at https://goregulus.com.