Chapter 10: Kubernetes at Scale - Google Kubernetes Engine (GKE)

Kubernetes originated at Google. It grew out of lessons learned from Borg, Google’s internal container orchestrator, and was later donated to the CNCF, where it became the industry standard. Google Kubernetes Engine (GKE) is Google Cloud’s managed Kubernetes service and one of the most mature, automated, and tightly integrated managed offerings available.

1. GKE Architecture: The Foundation

Control Plane: Zonal vs. Regional

  • Zonal Clusters: A single control plane in one zone. If the zone goes down, the control plane is inaccessible (though your nodes keep running). SLA: 99.5%.
  • Regional Clusters (Production Standard): Three control planes distributed across three zones in a region. This ensures your API server is always available, even during a zonal outage or Google-initiated upgrades. SLA: 99.95%.
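
To make the two SLA figures concrete, the sketch below converts each percentage into the control plane downtime it permits per month. The 30-day month and the helper name `allowed_downtime_minutes` are assumptions made for illustration.

```shell
# Convert an availability SLA (in percent) into the downtime it permits
# over a 30-day month (30 * 24 * 60 = 43200 minutes).
allowed_downtime_minutes() {
  awk -v sla="$1" 'BEGIN { printf "%.1f\n", (100 - sla) / 100 * 43200 }'
}

zonal=$(allowed_downtime_minutes 99.5)
regional=$(allowed_downtime_minutes 99.95)

echo "Zonal control plane:    up to ${zonal} minutes/month"
echo "Regional control plane: up to ${regional} minutes/month"
```

The regional SLA allows roughly a tenth of the downtime, which is why regional clusters are the usual choice for production.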

Release Channels

GKE offers three release channels to balance stability and features:
  • Rapid: For early adopters and testing.
  • Regular (Default): A balance of stability and new features.
  • Stable: For mission-critical production workloads.

2. Operation Modes: Autopilot vs. Standard

Choosing between Autopilot and Standard is a choice between Operational Simplicity and Total Control.

2.1 GKE Autopilot (The SRE’s Dream)

In Autopilot mode, Google manages the entire cluster infrastructure, including node provisioning, scaling, and security hardening. Think of it as “Kubernetes without the Kubernetes operations.” AWS’s closest equivalent is EKS with Fargate (serverless pods), while Azure offers AKS with Virtual Nodes.
  • The “Pod-Only” Contract: You define your pods; Google ensures they have a place to run. You never provision, manage, or SSH into the underlying nodes.
  • Security by Default: Enforces the GKE Hardening Guide (e.g., no privileged containers, mandatory NET_RAW removal). This means you cannot bypass security for convenience, which is a feature, not a limitation.
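  • Billing: You are billed per-pod (CPU, RAM, and ephemeral storage requested). You pay for what you request, not for idle space on nodes. Each pod has a minimum request; 250m CPU and 512MiB RAM is the commonly cited floor, but the exact minimums depend on the compute class and GKE version.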
  • Ideal For: Teams that want to focus on code rather than Kubernetes cluster maintenance.
Cost Comparison: In GKE Standard, if your pods use 60% of node resources on average, you are wasting 40% of your node spend. In Autopilot, Google handles bin-packing, so you pay only for requested pod resources. For many workloads, Autopilot is 15-30% cheaper than poorly optimized Standard clusters. However, for teams that are expert at bin-packing, Standard can be cheaper because Autopilot adds a per-pod management premium.
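
The trade-off above can be sketched with simple arithmetic. This is a deliberately simplified model: the $1000 monthly node spend is a made-up figure, and "utilization" here means the fraction of node capacity covered by pod requests.

```shell
# Hypothetical inputs: adjust to your own bill.
node_spend=1000     # monthly spend on Standard nodes, in dollars (made-up figure)
utilization=0.60    # fraction of node resources covered by pod requests

# Spend on capacity that no pod requested.
wasted=$(awk -v s="$node_spend" -v u="$utilization" 'BEGIN { printf "%.0f\n", s * (1 - u) }')

# Under this model, Autopilot breaks even when its price premium over raw
# node capacity equals 1/utilization - 1; below that premium it is cheaper.
breakeven_premium_pct=$(awk -v u="$utilization" 'BEGIN { printf "%.1f\n", (1 / u - 1) * 100 }')

echo "Idle node spend: \$${wasted} of \$${node_spend}"
echo "Autopilot break-even premium: ${breakeven_premium_pct}%"
```

Raising `utilization` shrinks the break-even premium, which is the numeric version of "expert bin-packers can beat Autopilot on Standard."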

2.2 GKE Standard (The Architect’s Choice)

In Standard mode, you manage the node pools (GCE Managed Instance Groups).
  • Full Control: You can customize kernel parameters, use privileged containers, and install custom drivers.
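  • Hardware Flexibility: The right choice when you need full control over GPUs, TPUs, Local SSDs, or Sole-Tenant nodes. (Autopilot offers GPUs too, but in managed form and only in select regions.)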
  • Bin-Packing Efficiency: If you are an expert at optimizing pod density, you can often achieve lower costs than Autopilot by manually managing large node pools.
  • Billing: You pay for the underlying Compute Engine VMs.

2.3 Decision Matrix: Principal’s Guide

Requirement            | Autopilot                  | Standard
Operational Overhead   | Low (Google manages nodes) | High (You manage node pools)
Custom Kernel Modules  | No                         | Yes
GPU / Machine Learning | Yes (Select regions)       | Yes (Full control)
Windows Containers     | No                         | Yes
Privileged Containers  | No (Security risk)         | Yes
Cost Model             | Pay-per-Pod (Predictable)  | Pay-per-Node (Optimization needed)

3. GKE Networking: Andromeda, PSC, and Multi-Cluster

3.1 VPC-Native Clusters (Alias IP)

Modern GKE clusters use VPC-Native networking. This is the foundation for all high-performance Kubernetes networking in GCP.
  • The Alias IP Mechanism: Every pod is assigned an IP from a secondary range in the VPC subnet. Unlike overlay networks (flannel, calico-vxlan), there is no packet encapsulation overhead.
  • Andromeda Integration: The VPC-Native pod IPs are “known” to the underlying Andromeda SDN. The VPC can therefore route packets straight to a pod’s IP, with no overlay tunnel and no per-node custom routes.
  • Connectivity: Pods can reach any other VPC resource (Cloud SQL, VMs) without NAT.
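
Because every pod IP comes out of the subnet's secondary range, the size of that range caps how large the cluster can grow. A minimal sketch, assuming GKE Standard's default of 110 max pods per node (for which each node reserves a /24 block) and a hypothetical /14 pod range:

```shell
pod_range_prefix=14    # hypothetical: the pod secondary range is a /14
per_node_prefix=24     # each node reserves a /24 at the default 110 max pods per node

# Number of /24 blocks that fit in the pod range = maximum node count.
max_nodes=$(( 1 << (per_node_prefix - pod_range_prefix) ))
total_pod_ips=$(( 1 << (32 - pod_range_prefix) ))

echo "Pod range /${pod_range_prefix} holds ${total_pod_ips} addresses"
echo "Maximum nodes before the range is exhausted: ${max_nodes}"
```

Sizing the range before cluster creation avoids running out of pod addresses as the cluster grows.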

3.2 Private Service Connect (PSC) for GKE

PSC allows you to expose GKE services to other VPCs or projects privately, without VPC Peering or VPNs.
  • Service Attachments: You create a Service Attachment in the GKE project.
  • Endpoints: Consuming projects create a PSC Endpoint (an internal IP) that routes traffic directly to your GKE Internal Load Balancer.

3.3 Multi-Cluster Ingress (MCI) and Gateway

For global applications, MCI uses a single Global External HTTP(S) Load Balancer to route traffic to multiple clusters in different regions.
  • ClusterSet: A logical grouping of clusters.
  • MCI Controller: A managed service that synchronizes MultiClusterIngress and MultiClusterService resources across the set.
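
As a rough illustration of the two resources the MCI controller synchronizes, the sketch below writes a MultiClusterService and a MultiClusterIngress manifest to a file. The names `web-mcs` and `web-ingress`, the `app: web` selector, and port 8080 are hypothetical; applying the file assumes a config cluster with Multi Cluster Ingress already enabled.

```shell
# Write both manifests to one file so they can be reviewed before applying.
cat > mci.yaml <<'EOF'
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: web-mcs
  namespace: default
spec:
  template:
    spec:
      selector:
        app: web            # hypothetical pod label
      ports:
      - name: web
        protocol: TCP
        port: 8080
        targetPort: 8080
---
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: web-ingress
  namespace: default
spec:
  template:
    spec:
      backend:
        serviceName: web-mcs   # routes to the MultiClusterService above
        servicePort: 8080
EOF

echo "Wrote mci.yaml; apply it to the config cluster with: kubectl apply -f mci.yaml"
```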

4. Security: The Defense-in-Depth Model

4.1 Workload Identity (The Principal Standard)

The lab at the end of this chapter walks through the setup; here is the architectural why. Without Workload Identity, you would use JSON keys stored as K8s Secrets. These keys are static and unmanaged: they do not expire on their own, and nothing rotates them. Workload Identity issues short-lived tokens instead, which removes the long-lived key that could be stolen.

Think of Workload Identity like a hotel key card system. Instead of giving every guest (pod) a permanent master key (JSON key file) that could be copied and used anywhere, you give them a temporary key card (short-lived token) that only works for their room (specific GCP resources) and expires automatically.

AWS has an equivalent feature called “IAM Roles for Service Accounts” (IRSA) for EKS, and Azure uses “Workload Identity” for AKS as well.

Common Mistake: Many teams migrating to GKE create a single service account JSON key, store it as a Kubernetes Secret, and share it across all pods. This works but is a ticking time bomb. If that key leaks (via a log, a debug dump, or a compromised pod), every workload using it is compromised. Workload Identity eliminates this risk entirely, because there is no key file to steal.

4.2 Binary Authorization

Binary Authorization is a deploy-time security control that ensures only trusted container images are deployed on GKE.
  • Attestations: A “digital signature” created by a CI/CD pipeline (e.g., Cloud Build) after passing security scans.
  • Policy: “Require attestation from ‘Security-Scanner-V1’ for all production deployments.”
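
A policy like the one quoted above could be expressed roughly as follows. This is a sketch, not a complete rollout: the attestor ID `security-scanner-v1` is a hypothetical stand-in for 'Security-Scanner-V1', the attestor is assumed to exist already, and `PROJECT_ID` falls back to a placeholder so the file can be generated for review.

```shell
# PROJECT_ID falls back to a placeholder when it is not set in the environment.
PROJECT_ID="${PROJECT_ID:-my-project}"

# Default rule: block any image that lacks an attestation from the named attestor.
cat > binauthz-policy.yaml <<EOF
globalPolicyEvaluationMode: ENABLE
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
  - projects/${PROJECT_ID}/attestors/security-scanner-v1
EOF

echo "Review binauthz-policy.yaml, then import it with:"
echo "  gcloud container binauthz policy import binauthz-policy.yaml"
```

Setting `globalPolicyEvaluationMode` to `ENABLE` exempts Google-maintained system images, so cluster components are not blocked by the attestation requirement.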

4.3 Policy Controller (Config Management)

Based on the Open Policy Agent (OPA) Gatekeeper, Policy Controller lets you enforce “Guardrails” using declarative policies.
  • Example: “Prevent any pod from running with a privileged security context.”
  • Example: “Require all services to have a ‘team-owner’ label.”
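
The first guardrail above might look like this as a Gatekeeper constraint. It is a sketch that assumes the `K8sPSPPrivilegedContainer` constraint template from the Gatekeeper policy library is installed in the cluster; the constraint name and the excluded namespace are illustrative choices.

```shell
# Constraint that rejects pods whose containers request privileged mode.
cat > deny-privileged.yaml <<'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: deny-privileged-pods      # hypothetical constraint name
spec:
  enforcementAction: deny
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]   # system pods may need privileges
EOF

echo "Wrote deny-privileged.yaml; apply with: kubectl apply -f deny-privileged.yaml"
```

Switching `enforcementAction` to `dryrun` first is a common way to audit existing violations before blocking deployments.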

4.4 Shielded GKE Nodes

GKE nodes use Shielded VMs to provide:
  • Secure Boot: Ensures only verified software is used during the boot process.
  • Measured Boot: Uses a Virtual Trusted Platform Module (vTPM) to verify the integrity of the node.

5. Storage: Persistent Data in Kubernetes

5.1 Compute Engine Persistent Disk (PD) CSI Driver

The default storage for GKE.
  • Standard/SSD PD: Block storage for databases.
  • Balanced PD: Price/performance sweet spot for general workloads.
  • Regional PD: Synchronously replicated across two zones for High Availability.
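
A Regional PD is requested through a StorageClass. The sketch below shows one way to define it with the PD CSI driver; the class name `regional-balanced` is a hypothetical choice.

```shell
# StorageClass that provisions Balanced PDs replicated across two zones.
cat > regional-pd-sc.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-balanced          # hypothetical class name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  replication-type: regional-pd
# Delay provisioning until a pod is scheduled, so the disk's zones
# match where the pod can actually run.
volumeBindingMode: WaitForFirstConsumer
EOF

echo "Wrote regional-pd-sc.yaml; apply with: kubectl apply -f regional-pd-sc.yaml"
```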

5.2 Filestore for GKE

For workloads requiring a Shared File System (NFS), GKE provides the Filestore CSI driver.
  • ReadWriteMany (RWX): Allows multiple pods in different zones to read/write to the same volume.
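
A shared volume is requested with an ordinary PersistentVolumeClaim. This sketch assumes the Filestore CSI driver is enabled on the cluster, which provides the `standard-rwx` StorageClass; the claim name `shared-data` is hypothetical, and 1Ti reflects the minimum size of a basic-tier Filestore instance.

```shell
# PVC that multiple pods can mount read-write at the same time.
cat > filestore-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data                # hypothetical claim name
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: standard-rwx
  resources:
    requests:
      storage: 1Ti
EOF

echo "Wrote filestore-pvc.yaml; apply with: kubectl apply -f filestore-pvc.yaml"
```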

5.3 Backup for GKE

A fully managed service to protect your GKE stateful workloads.
  • What It Backs Up: Both Kubernetes manifests (YAMLs) and the actual data in Persistent Disks.
  • Scenario: Accidental deletion of a namespace or a regional disaster.

6. Advanced Scaling and Cost Optimization

6.1 Node Auto-Provisioning (NAP)

While the Cluster Autoscaler adds nodes from existing pools, NAP can create entirely new node pools on the fly.
  • Logic: If a pod requires a specific T2D machine type or a GPU, and no such pool exists, NAP will create one, run the pod, and delete the pool when finished.
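
The trigger for NAP is simply a pending pod whose requirements no existing pool can satisfy. A sketch of such a pod follows; the pod name, image, and resource figures are hypothetical, and the resource limits in the comment are example values only.

```shell
# NAP must be enabled on the cluster first, for example:
#   gcloud container clusters update prod-cluster --region us-central1 \
#       --enable-autoprovisioning --min-cpu 1 --max-cpu 64 --min-memory 1 --max-memory 256

# Pod that asks for the T2D machine family. If no matching pool exists,
# NAP can create one to schedule it.
cat > t2d-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: t2d-batch                  # hypothetical pod name
spec:
  nodeSelector:
    cloud.google.com/machine-family: t2d
  restartPolicy: Never
  containers:
  - name: worker
    image: busybox:1.36
    command: ["sh", "-c", "echo done"]
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
EOF

echo "Wrote t2d-pod.yaml; apply with: kubectl apply -f t2d-pod.yaml"
```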

6.2 GKE Usage Metering

To solve the “Who is spending what?” problem, usage metering exports granular consumption data (CPU, RAM, Storage, Egress) to BigQuery.
  • Attribution: You can break down costs by Namespace, Label, or Service.

7. Interview Preparation: Architectural Deep Dive

1. Q: What is the primary difference between GKE Autopilot and GKE Standard?
   A: Autopilot is a fully managed mode where Google manages the nodes, scaling, and security. You pay per-pod. Standard gives you full control over node pools (machine types, GPUs). You pay per-node. Standard is required for custom kernels and for full control over specialized hardware.

2. Q: How does VPC-Native networking improve performance over Kubenet?
   A: VPC-Native (Alias IP) assigns VPC IPs directly to pods, so the Andromeda SDN knows every pod IP and routes to it natively. There is no overlay encapsulation (as with VXLAN-based networks) and no per-node custom route (as with routes-based clusters), which reduces latency and removes route-quota limits on cluster size for pod-to-pod and pod-to-external communication.

3. Q: Explain the role of Binary Authorization in a secure CI/CD pipeline.
   A: Binary Authorization ensures that only images that have been signed (attested) by authorized entities (like a vulnerability scanner) can be deployed. It acts as a final gate in the production environment to prevent the execution of untrusted or unverified code.

4. Q: Why use Regional Persistent Disks in GKE?
   A: Regional PDs synchronously replicate data across two zones in a region. If a zone fails, Kubernetes can quickly re-attach the volume to a node in the second zone without data loss, providing a lower RTO for stateful applications like databases.

5. Q: What is the benefit of the GKE Gateway API over Ingress?
   A: The Gateway API is more expressive and role-based. It separates the infrastructure concerns (GatewayClass/Gateway) from the application routing (HTTPRoute), enabling better collaboration between SREs and developers and supporting advanced features like cross-namespace routing and multi-cluster traffic management.
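
The role separation described in the last answer can be sketched with two manifests: a Gateway owned by the platform team and an HTTPRoute owned by an application team in a different namespace. The namespaces `infra` and `store`, the resource names, and the backend `store-svc` are hypothetical; the `v1` API version assumes a recent GKE release (older clusters serve `v1beta1`).

```shell
cat > gateway-demo.yaml <<'EOF'
# Owned by the platform/SRE team: the load balancer itself.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-http
  namespace: infra
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All              # let routes in other namespaces attach
---
# Owned by the application team: routing rules for their service.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store-route
  namespace: store
spec:
  parentRefs:
  - name: external-http
    namespace: infra           # cross-namespace attachment to the shared Gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /store
    backendRefs:
    - name: store-svc
      port: 8080
EOF

echo "Wrote gateway-demo.yaml; apply with: kubectl apply -f gateway-demo.yaml"
```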

Implementation: The “Enterprise Grade” GKE Lab

In this lab, we will build a production-ready GKE Standard cluster using Terraform, including VPC-Native networking, Workload Identity, and a sample application deployment.

Step 1: Terraform Infrastructure

Create a file named main.tf:
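variable "project_id" {
  type        = string
  description = "Project that hosts the cluster; referenced by workload_identity_config below"
}

resource "google_container_cluster" "primary" {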
  name     = "prod-cluster"
  location = "us-central1"   # Regional cluster = 3 control planes across 3 zones = 99.95% SLA

  # Why VPC-Native: Alias IPs give pods real VPC IPs, enabling direct routing via Andromeda SDN.
  # Without this, you get a routes-based cluster, which needs one custom static route per node
  # and does not support features such as container-native load balancing (NEGs).
  networking_mode = "VPC_NATIVE"
  network         = "prod-vpc"      # Assumed name; use the VPC that contains your subnet
  subnetwork      = "prod-subnet"   # Assumed name; must define the two secondary ranges below
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"      # Pre-create this range in your subnet
    services_secondary_range_name = "services"  # Separate range keeps pod and service IPs isolated
  }

  # Why Workload Identity: Eliminates JSON key files for pod-to-GCP authentication.
  # Without this, teams fall back to mounting SA keys as K8s Secrets -- a major security risk.
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Why remove default pool: The default pool uses the overprivileged default compute SA.
  # We create a custom pool below with specific machine types and a dedicated SA.
  remove_default_node_pool = true
  initial_node_count       = 1    # Required even though we remove it (Terraform quirk)
}

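# Assumption: a dedicated, least-privilege node service account already exists.
variable "node_service_account" {
  type        = string
  description = "Email of the service account the nodes run as"
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "app-pool"
  location   = "us-central1"
  cluster    = google_container_cluster.primary.name
  # node_count is PER ZONE in a regional pool: 1 x 3 zones = 3 nodes total.
  # Use autoscaling in production (min 1, max 5 per zone).
  node_count = 1

  node_config {
    preemptible  = false          # Use true for dev/test to save ~70%
    machine_type = "e2-standard-4"  # 4 vCPU, 16GB RAM -- good starting point for most workloads

    # Run nodes as the dedicated SA instead of the default compute SA.
    service_account = var.node_service_account
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    # Why GKE_METADATA: Enables the Workload Identity metadata server on each node.
    # Without this, pods cannot exchange K8s tokens for GCP tokens.
    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    labels = {
      env = "production"   # Used for node affinity and cost allocation
    }
  }
}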

Step 2: Deploying a Secure Workload

Apply the Terraform, then connect to the cluster:
gcloud container clusters get-credentials prod-cluster --region us-central1
Create a Kubernetes Service Account and bind it to a GCP Service Account:
# 0. Set the project ID referenced by the commands below
export PROJECT_ID="$(gcloud config get-value project)"

# 1. Create a dedicated GCP Service Account for this workload
# Why: Each microservice should have its own SA with minimal permissions.
# Common Mistake: Using the default compute SA -- it has Editor access to the entire project.
gcloud iam service-accounts create gke-sa --display-name="GKE App SA"

# 2. Bind the Kubernetes SA to the GCP SA via Workload Identity
# Why: This tells GCP "when a pod running as K8s SA 'my-app-sa' in the 'default' namespace
# requests a GCP token, give it the permissions of 'gke-sa'."
# The [default/my-app-sa] format is [namespace/ksa-name].
gcloud iam service-accounts add-iam-policy-binding gke-sa@$PROJECT_ID.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:$PROJECT_ID.svc.id.goog[default/my-app-sa]"

# 3. Create the K8s SA and annotate it with the GCP SA email
# Why the annotation: This is how the GKE metadata server knows which GCP SA to impersonate.
# Without it, the pod gets no access through gke-sa. It does not fall back to the node's SA,
# because GKE_METADATA mode hides the node's credentials from pods.
kubectl create serviceaccount my-app-sa
kubectl annotate serviceaccount my-app-sa \
    iam.gke.io/gcp-service-account=gke-sa@$PROJECT_ID.iam.gserviceaccount.com

Step 3: Deploying with Helm

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install my-ingress ingress-nginx/ingress-nginx

Pro-Tip: The “Zero-Drip” Upgrade

When upgrading GKE nodes, use Surge Upgrades. GKE creates new nodes on the updated version before draining and deleting old ones, so your pods always have somewhere to move. Combined with multiple replicas and PodDisruptionBudgets, this keeps workloads available throughout the maintenance window.
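
A minimal sketch of configuring surge upgrades on the lab's node pool. The values (one surge node, zero unavailable) are an illustrative starting point rather than a recommendation for every workload, and the script prints the command instead of running it so it can be reviewed first.

```shell
# Pool and cluster names match the lab above (app-pool in prod-cluster).
# max-surge-upgrade 1:       add one extra node at a time during the upgrade.
# max-unavailable-upgrade 0: never take an old node down before its replacement is ready.
surge_cmd="gcloud container node-pools update app-pool --cluster prod-cluster --region us-central1 --max-surge-upgrade 1 --max-unavailable-upgrade 0"

echo "$surge_cmd"
# Run it against a real cluster with: eval "$surge_cmd"
```

A higher surge value speeds up upgrades at the cost of temporarily running (and paying for) more nodes.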