Azure Kubernetes Service (AKS)

AKS is Azure’s managed Kubernetes service. Master AKS to deploy modern, cloud-native applications at scale.

What You’ll Learn

By the end of this chapter, you’ll understand:

What containers are and why they exist
Why Kubernetes is needed for managing containers
How AKS simplifies Kubernetes management
How to deploy and scale applications with containers
When to use containers vs VMs vs serverless

Introduction: What Are Containers? (Start Here if You’re Completely New)

The Problem Containers Solve

Have you ever heard this?

“But it works on my machine!” 😤

This is one of software development’s biggest headaches. Let me explain why:

Before Containers: The “Works on My Machine” Problem

Scenario: You built an amazing web application. Your development laptop:

Node.js version 18.15.0
Python 3.10
PostgreSQL 14
Ubuntu 22.04
16 GB RAM

Your colleague’s laptop:

Node.js version 16.14.0 ← Different!
Python 3.9 ← Different!
PostgreSQL 13 ← Different!
Windows 11 ← Different OS!
8 GB RAM ← Different!

What happens?

Your machine: ✅ App runs perfectly
Colleague's machine: ❌ App crashes with dependency errors
Production server: ❌ App won't even start

The root cause: Your app depends on specific versions of libraries, tools, and the operating system. When any of these change, things break.

Real-World Analogy: Shipping Containers

Think about physical shipping containers:

Before Shipping Containers (1950s):

Pack goods in boxes, crates, barrels
Different ships need different loading methods
Goods get damaged during transfer
Loading/unloading takes weeks
Ship → Train → Truck = repack everything each time

After Shipping Containers:

Standard 20-foot or 40-foot metal boxes
Works on ships, trains, trucks
Contents protected and isolated
Loading/unloading takes hours, not weeks
Ship → Train → Truck = same container, no repacking

Software containers work the same way:

Package your app + all dependencies in one “box”
Runs identically on any computer with Docker
Your laptop → Colleague’s laptop → Production server = same container

What is a Container?

Container = A lightweight, standalone package that includes:

Your application code
Runtime (Node.js, Python, Java, etc.)
Libraries and dependencies
Configuration files
Operating system files (just what your app needs)

Think of it as: A fully-furnished apartment in a box. Everything your app needs to run is inside.

Container Example: Blog Website

Without Containers (Traditional Setup):

# On production server:
1. Install Node.js 18.15.0 manually
2. Install PostgreSQL 14 manually
3. Configure environment variables
4. Clone your code from Git
5. Install dependencies: npm install
6. Start the app: npm start
7. Hope nothing breaks 🤞

Problems:
- Takes 2-3 hours to set up
- Different versions = different bugs
- New server = redo all steps
- Server updates might break your app

With Containers (Modern Setup):

# Dockerfile (defines your container; comments must sit on their own lines)
# Start with Node.js 18.15.0
FROM node:18.15.0
# Working directory
WORKDIR /app
# Copy dependency files
COPY package*.json ./
# Install dependencies
RUN npm install
# Copy app code
COPY . .
# App runs on port 3000
EXPOSE 3000
# Start the app
CMD ["npm", "start"]

# On any server with Docker:
docker build -t myblog:1.0 .          # Build container (1 minute)
docker run -p 3000:3000 myblog:1.0    # Run container (5 seconds)

Benefits:
- Works identically everywhere
- Takes seconds to start
- Isolated from other apps
- Easy to update (new container = new version)

Containers vs Virtual Machines

Visual Comparison:

VIRTUAL MACHINES (Heavy):
┌─────────────────────────────────┐
│  Physical Server                 │
│  ├── Hardware (CPU, RAM, Disk)   │
│  ├── Host OS (Windows/Linux)     │
│  ├── Hypervisor (VMware/Hyper-V) │
│  └── Virtual Machines:           │
│      ├── VM 1:                   │
│      │   ├── Guest OS (2 GB)     │ ← Full operating system
│      │   └── App A                │
│      ├── VM 2:                   │
│      │   ├── Guest OS (2 GB)     │ ← Another full OS
│      │   └── App B                │
│      └── VM 3:                   │
│          ├── Guest OS (2 GB)     │ ← Yet another full OS
│          └── App C                │
└─────────────────────────────────┘
Total: 6 GB just for operating systems!

CONTAINERS (Lightweight):
┌─────────────────────────────────┐
│  Physical Server                 │
│  ├── Hardware (CPU, RAM, Disk)   │
│  ├── Host OS (Linux)              │
│  ├── Docker Engine                │
│  └── Containers:                 │
│      ├── Container 1 (App A)     │ ← Shares host OS
│      ├── Container 2 (App B)     │ ← Shares host OS
│      └── Container 3 (App C)     │ ← Shares host OS
└─────────────────────────────────┘
Total: no guest operating systems, only the app images (roughly 50-500 MB each)!

Feature	Virtual Machine	Container
Size	2-10 GB	50-500 MB
Startup Time	1-5 minutes	1-5 seconds
Resource Usage	High (full OS)	Low (shared OS)
Isolation	Complete (hardware-level)	Process-level
Portability	Moderate	Excellent
Use Case	Different OS needed	Same OS, fast deployment

Example:

VM: You want to run Windows software on a Mac → Use VM
Container: You want to deploy 100 copies of your web app → Use containers

Why Use Containers?

1. Consistency:

Developer laptop: ✅ Works
Colleague's laptop: ✅ Works (same container)
Testing server: ✅ Works (same container)
Production server: ✅ Works (same container)

2. Speed:

VM startup: 2 minutes
Container startup: 2 seconds ← 60x faster!

3. Efficiency:

Physical Server (64 GB RAM):
├── VMs: Can run ~10 VMs (each uses 2+ GB OS)
└── Containers: Can run ~100 containers (share host OS)

4. Portability:

Build once → Run anywhere:
- Your laptop (Windows)
- Colleague's laptop (Mac)
- Azure datacenter (Linux)
- AWS datacenter (Linux)
- Google Cloud (Linux)

5. Isolation:

Container A crashes → Containers B, C, D unaffected
Container B has security bug → Containers A, C, D are isolated from it by default (though all share the host kernel)

Real-World Example: E-Commerce Website

Traditional Deployment (No Containers):

Production Server 1:
- Install Node.js 18
- Install nginx web server
- Install MongoDB 6
- Install Redis 7
- Deploy web app code
- Configure everything manually

Problem: Takes 3-4 hours, prone to human error

New developer joins:
- Spend 2 days setting up local environment
- "It doesn't work on my machine!" ← Wastes days debugging

Container Deployment:

docker-compose.yml (defines all containers):

version: '3'
services:
  web:                           # Web application
    image: mycompany/webapp:2.1
    ports: ["80:3000"]

  database:                      # MongoDB database
    image: mongo:6.0
    volumes: ["db-data:/data/db"]

  cache:                         # Redis cache
    image: redis:7.0

volumes:
  db-data:                       # Named volume used by the database service

# On ANY server (dev, staging, production):
docker-compose up -d

# Result:
- Starts 3 containers in 10 seconds
- Works identically everywhere
- New developer: 5 minutes to run entire stack locally

Common Mistakes Beginners Make

❌ Mistake 1: Thinking containers are just lightweight VMs
✅ Reality: Containers share the host OS kernel; VMs don’t.

❌ Mistake 2: Storing data inside containers
✅ Reality: Containers are ephemeral (temporary). Use volumes for persistent data.

❌ Mistake 3: Running multiple apps in one container
✅ Reality: One container = one process (web server OR database, not both). Think of it like the single responsibility principle in software design: each container does one thing well. If you need a web server and a database, run two containers. This lets them scale independently (you might need 10 web containers but only 1 database container).

❌ Mistake 4: Using containers for everything
✅ Reality: Sometimes VMs are better (need different OS, strong isolation)

When to Use Containers vs VMs

Use Containers When:

✅ You want fast deployment (seconds)
✅ You need to run many copies of the same app
✅ You want consistent environments (dev = production)
✅ Your app runs on Linux

Use VMs When:

✅ You need complete isolation (security, compliance)
✅ You need different operating systems (Windows + Linux on same hardware)
✅ You have legacy apps that can’t be containerized
✅ You need full control over the operating system

What is Kubernetes? (The Next Step After Containers)

The Problem Kubernetes Solves

You’ve learned containers solve the “works on my machine” problem. But…

Scenario: Your blog got popular! 🎉

Month 1:

100 visitors/day
1 container handles it easily
Cost: $10/month

Month 6:

50,000 visitors/day
Need 20 containers to handle traffic
Multiple servers needed

New problems arise:

Which server should run which container?
- Server 1 has 20 GB RAM free, Server 2 has 4 GB free
- Manual placement = nightmare
What if a container crashes?
- Who restarts it? How do you know it crashed?
- Manual monitoring = 24/7 job
How do users reach the right container?
- 20 containers, each with different IP address
- Users need one URL: myblog.com
How to update without downtime?
- Stop all 20 containers = website down
- Update one-by-one manually = takes hours, error-prone
How to handle traffic spikes?
- Black Friday: need 50 containers
- Tuesday at 3am: need 5 containers
- Manual scaling = expensive or too slow

Kubernetes solves ALL these problems automatically.

Real-World Analogy: Shipping Port

Without Kubernetes (Manual Container Management):

You own 20 shipping containers (your app containers)
You have 5 trucks (your servers)

Every day you manually:
- Decide which containers go on which trucks
- Check if containers fell off trucks (crashed)
- Put fallen containers back on trucks (restart)
- Tell customers which truck has their container
- Swap trucks when they're full

Result: Full-time job, errors, slow

With Kubernetes (Automated):

Kubernetes = Smart logistics system

You tell Kubernetes:
- "I need 20 containers of my app running"
- "Each container needs 500 MB RAM"
- "Don't run more than 5 on one truck (server)"

Kubernetes automatically:
✅ Places containers on trucks (servers) optimally
✅ Monitors all containers
✅ Restarts crashed containers instantly
✅ Gives customers one address (myblog.com)
✅ Routes customers to healthy containers
✅ Replaces containers during updates (zero downtime)
✅ Adds/removes containers based on traffic

Result: You focus on your app, Kubernetes handles operations

What is Kubernetes?

Kubernetes (K8s) = An open-source container orchestration platform that automates deployment, scaling, and management of containerized applications.

Think of Kubernetes as: An operating system for your containers across many servers.

Core Features:

Self-Healing: Container crashed? Kubernetes restarts it automatically
Load Balancing: Distributes traffic across containers
Auto-Scaling: More traffic? Kubernetes adds containers. Less traffic? Removes them.
Rolling Updates: Update app without downtime
Service Discovery: Containers find each other automatically
Storage Orchestration: Attach storage to containers automatically

Kubernetes in Simple Terms

Transportation Analogy:

Your App = Passengers that need to travel

Container = Car (carries your app)

Kubernetes = Smart Transportation System:
- Monitors all cars (containers)
- Dispatches new cars when needed
- Removes cars when traffic is light
- Redirects passengers if a car breaks down
- Shows passengers one address (the airport) instead of 20 car locations
- Replaces old cars with new models (updates) while passengers keep arriving

Real Example: Online Shopping Website

Black Friday Sale (No Kubernetes):

Morning: Normal traffic (5 containers)
Later that morning: Traffic increases (need 20 containers)

Your manual actions:
Notice website is slow (user complaints pour in)
Log into Azure portal
Manually create 15 new VMs (20 minutes)
Manually start 15 containers
Manually update load balancer configuration
Total time: 45 minutes of website slowness

Evening: Traffic decreases (back to 5 containers needed)
Manually stop 15 containers
Manually delete 15 VMs (to save money)
Total time: 30 minutes of manual work

Result: Lost sales, frustrated customers, exhausted you

Black Friday Sale (With Kubernetes):

Morning: Normal traffic (5 containers)
Later that morning: Traffic increases

Kubernetes automatically:
Detects high CPU usage (70%+)
Creates 15 new containers in 2 minutes
Distributes traffic across all 20 containers
Total time: 2 minutes, zero human intervention ✅

Evening: Traffic decreases

Kubernetes automatically:
Detects low CPU usage (<30%)
Gracefully stops 15 containers (waits for requests to finish)
Scales back to 5 containers
Total time: 5 minutes, zero human intervention ✅

Result: Happy customers, maximized sales, you sleep peacefully

What is Azure Kubernetes Service (AKS)?

Plain Kubernetes (DIY):

Your responsibilities:
- Install Kubernetes on VMs (complex, 50+ steps)
- Configure networking, storage, security
- Upgrade Kubernetes versions manually
- Monitor control plane (master nodes)
- Pay for control plane VMs
- Fix control plane issues (3am outages)

Time investment: 40-80 hours/month

Azure Kubernetes Service (AKS) (Managed):

Your responsibilities:
- Click "Create AKS Cluster" in Azure portal
- Deploy your containers

Azure's responsibilities:
✅ Installs and configures Kubernetes
✅ Manages control plane (free!)
✅ Handles Kubernetes upgrades (fully automatic if you enable an auto-upgrade channel)
✅ Monitors control plane health 24/7
✅ Fixes control plane issues
✅ Provides enterprise features (security, compliance)

Time investment: 4-8 hours/month
Cost: Control plane is FREE, you only pay for worker nodes (VMs that run your containers)

AKS = Kubernetes without the operational headache

Under the Hood: The AKS Control Plane

In AKS, the cluster is split into two distinct halves:

1. The Control Plane (Managed by Azure)

You never see these VMs, but they are there. They run:

kube-apiserver: The front door. Every kubectl command hits this.
etcd: The source of truth. A distributed database that stores the cluster state.
kube-scheduler: Decides which node should run your pod based on resources.
kube-controller-manager: Watches for deviations (e.g., “I need 3 pods, but only 2 are running”) and fixes them.
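The "I need 3 pods, but only 2 are running" behaviour is a reconciliation loop: compare desired state with actual state, then act on the difference. Here is a minimal sketch of that idea in Python. `ToyCluster` and its pod names are hypothetical stand-ins for illustration; real controllers read and write state through the kube-apiserver rather than an in-memory list.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for cluster state."""
    desired_replicas: int
    running_pods: list[str] = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def reconcile(self) -> list[str]:
        """Compare desired vs. actual state and return the actions taken."""
        actions = []
        # Too few pods: create replacements until the count matches.
        while len(self.running_pods) < self.desired_replicas:
            name = f"web-{next(self._ids)}"
            self.running_pods.append(name)
            actions.append(f"create {name}")
        # Too many pods: remove the surplus.
        while len(self.running_pods) > self.desired_replicas:
            actions.append(f"delete {self.running_pods.pop()}")
        return actions


cluster = ToyCluster(desired_replicas=3)
print(cluster.reconcile())             # creates web-0, web-1, web-2
cluster.running_pods.remove("web-1")   # simulate a crashed pod
print(cluster.reconcile())             # creates one replacement pod
```

Running `reconcile()` again when nothing has changed returns an empty list, which is why such loops can safely run continuously.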

2. The Data Plane (Your Worker Nodes)

These are the VMs in your subscription. They run:

kubelet: The agent that takes orders from the control plane and starts containers.
kube-proxy: Handles networking and load balancing between pods.
Container Runtime: Usually containerd (Docker’s core).

[!IMPORTANT] Pro Insight: The ‘Free’ Control Plane In the Free AKS pricing tier, the control plane costs nothing. However, if you run a large cluster (100+ nodes) or any production workload, you should upgrade to the Standard tier, formerly sold as the Uptime SLA option ($0.10 per cluster per hour, about $73/month). This gives you a financially backed SLA for the API server itself: 99.95% availability for clusters that use availability zones, 99.9% otherwise. Without this tier, the control plane has no financially backed SLA, meaning Microsoft owes you no service credits if the API server is unavailable. For production workloads, this $73/month is cheap insurance: if kubectl commands fail during an incident because the API server is down, you cannot scale, deploy, or debug your cluster.
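To make the numbers concrete, here is a small sketch of the arithmetic behind the $0.10/hour price and the 99.95% figure. The 730-hour month and the 30-day window are assumptions chosen for illustration, not billing rules.

```python
HOURS_PER_MONTH = 730  # assumption: average month (8,760 hours per year / 12)


def monthly_cost(hourly_rate: float) -> float:
    """Convert an hourly price into an approximate monthly price."""
    return round(hourly_rate * HOURS_PER_MONTH, 2)


def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime that still satisfy the availability target."""
    total_minutes = days * 24 * 60
    return round((1 - availability_pct / 100) * total_minutes, 1)


print(monthly_cost(0.10))               # 73.0 dollars per month
print(allowed_downtime_minutes(99.95))  # 21.6 minutes per 30 days
print(allowed_downtime_minutes(99.9))   # 43.2 minutes per 30 days
```

In other words, a 99.95% target still permits roughly 22 minutes of API server downtime in a 30-day month.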

Practical Tip: AKS Node Pool Strategy for Cost Optimization

Many teams overspend on AKS by using a single, oversized node pool. A better approach:

System Node Pool (always on):
- 2-3 nodes, Standard_D2s_v3 (2 vCPU, 8 GB)
- Runs CoreDNS, metrics-server, kube-system pods
- Cost: ~$145/month for 2 nodes

User Node Pool (your workloads):
- Autoscaling: 2-10 nodes, Standard_D4s_v3
- Runs your application pods
- Cost: $290-$1,430/month depending on scale

Spot Node Pool (batch jobs, non-critical):
- Standard_D4s_v3 Spot VMs (up to 90% discount)
- For CI/CD runners, batch processing, dev environments
- Cost: ~$29/month per node (vs $143 on-demand)

This three-pool strategy can save 40-60% compared to a single pool sized for peak traffic.
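A rough sketch of where a saving in that range can come from, using the per-node prices quoted above. The workload shape is an assumption made up for illustration: a peak of 10 user nodes, an average of 4, and one Spot node.

```python
# Per-node monthly prices taken from the figures above (USD).
D2S_V3 = 72.50       # "~$145/month for 2 nodes"
D4S_V3 = 143.00      # "$143 on-demand"
D4S_V3_SPOT = 29.00  # "~$29/month per node"


def single_pool(peak_nodes: int) -> float:
    """One pool sized for peak traffic and left running all month."""
    return peak_nodes * D4S_V3


def three_pools(system_nodes: int, avg_user_nodes: int, spot_nodes: int) -> float:
    """System pool + autoscaled user pool (billed at its average size) + Spot pool."""
    return (system_nodes * D2S_V3
            + avg_user_nodes * D4S_V3
            + spot_nodes * D4S_V3_SPOT)


# Hypothetical workload: peak of 10 nodes, average of 4, one Spot node.
baseline = single_pool(peak_nodes=10)
tiered = three_pools(system_nodes=2, avg_user_nodes=4, spot_nodes=1)
saving_pct = (baseline - tiered) / baseline * 100

print(f"single pool: ${baseline:,.0f}/month")
print(f"three pools: ${tiered:,.0f}/month ({saving_pct:.0f}% saved)")
```

The saving depends almost entirely on how far average load sits below peak; a workload that runs near peak all day would see little benefit.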

Cost Comparison Example

Running 20 Containers for a Web App:

Option 1: Traditional VMs (No Containers):

20 VMs × $50/month = $1,000/month
+ Slow to scale (5-10 minutes to provision new VM)
+ Manual management required

Option 2: Plain Kubernetes (DIY):

3 Control Plane VMs × $50 = $150/month  ← You pay for this
5 Worker VMs × $80 = $400/month         ← You pay for this
Total: $550/month
+ You manage control plane (time cost)
+ Complex setup and maintenance

Option 3: Azure Kubernetes Service (AKS):

Control Plane: $0/month                 ← Azure manages free!
5 Worker VMs × $80 = $400/month         ← You only pay this
Total: $400/month
+ Azure manages control plane
+ Enterprise-grade security and updates
+ Scales in seconds
+ Integrates with Azure services

Winner: AKS saves $150/month + hundreds of hours of management time

1. Why Kubernetes?

Before Kubernetes

Manual container orchestration
No auto-scaling
Complex networking
Manual load balancing
No self-healing

With Kubernetes

Automated orchestration
Auto-scaling (HPA, VPA, Cluster Autoscaler)
Service discovery
Built-in load balancing
Self-healing (restart failed pods)

[!WARNING] Gotcha: System Node Pools Every AKS cluster needs at least one “System Node Pool” to run Kubernetes itself (CoreDNS, Metrics Server). You cannot delete this pool or scale it to 0. It will always cost you money (usually 1-3 VMs).

[!TIP] Jargon Alert: Pod vs Node Node: A Virtual Machine (The house). Pod: One or more containers that are scheduled and run together (The tenant living in the house). A single Node (VM) usually hosts many Pods.

2. AKS Architecture

3. Create AKS Cluster

# Create AKS cluster
az aks create \
  --name aks-prod \
  --resource-group rg-prod \
  --node-count 3 \
  --node-vm-size Standard_D4s_v3 \
  --zones 1 2 3 \
  --enable-managed-identity \
  --network-plugin azure \
  --enable-addons monitoring \
  --generate-ssh-keys

# Get credentials
az aks get-credentials \
  --name aks-prod \
  --resource-group rg-prod

# Verify
kubectl get nodes

4. AKS Networking

kubenet (Basic)

- Pods get IPs from separate address space
- NAT for outbound connectivity
- Simpler, fewer IP addresses needed
- Use for: Dev/test, small clusters

Azure CNI (Advanced)

- Pods get IPs from VNet subnet
- Direct connectivity to VNet resources
- More IP addresses required
- Use for: Production, VNet integration
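"More IP addresses required" matters when you size the subnet. Below is a rough planning sketch, assuming the traditional Azure CNI mode where every pod takes a VNet IP. It also assumes one spare node for upgrades, 30 pods per node, and 5 addresses reserved by Azure in each subnet. Treat these values as assumptions to check against your own cluster configuration.

```python
AZURE_RESERVED_IPS = 5  # assumption: addresses Azure reserves in every subnet


def azure_cni_ips(nodes: int, max_pods_per_node: int = 30) -> int:
    """Every node and every pod needs a VNet IP; keep one spare node for upgrades."""
    nodes_with_surge = nodes + 1
    return nodes_with_surge + nodes_with_surge * max_pods_per_node


def kubenet_ips(nodes: int) -> int:
    """Only nodes take VNet IPs; pods use a separate address space behind NAT."""
    return nodes + 1


def smallest_prefix(ips_needed: int) -> int:
    """Smallest IPv4 subnet prefix length that fits the required addresses."""
    total = ips_needed + AZURE_RESERVED_IPS
    host_bits = (total - 1).bit_length()  # ceil(log2(total))
    return 32 - host_bits


for name, needed in [("Azure CNI", azure_cni_ips(50)), ("kubenet", kubenet_ips(50))]:
    print(f"{name}: {needed} IPs, needs at least a /{smallest_prefix(needed)} subnet")
```

For a 50-node cluster this works out to 1,581 addresses (/21) with Azure CNI versus 51 (/26) with kubenet, which is why subnet sizing has to be decided before the cluster is created.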

5. Deploy Application

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: myregistry.azurecr.io/web:v1
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        ports:
        - containerPort: 80

---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: web

---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx        # Replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
  - hosts:
    - myapp.example.com
    secretName: tls-secret
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml

6. Autoscaling

Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Cluster Autoscaler

# Enable cluster autoscaler
az aks update \
  --name aks-prod \
  --resource-group rg-prod \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10

# Cluster automatically adds/removes nodes based on demand
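How does the Horizontal Pod Autoscaler pick a replica count from a 70% CPU target? The Kubernetes documentation describes the core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). Here is a simplified sketch of that rule using the min/max values from the manifest above. The 10% tolerance is the documented default, but the real controller also applies stabilization windows and handles missing metrics, which this sketch ignores.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale in proportion to current/target utilization."""
    ratio = current_util / target_util
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    # Clamp to the bounds set in the HPA spec.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(3, 140, 70, 3, 10))  # 6: load is double the target
print(desired_replicas(3, 350, 70, 3, 10))  # 10: capped at maxReplicas
print(desired_replicas(6, 20, 70, 3, 10))   # 3: floored at minReplicas
print(desired_replicas(5, 73, 70, 3, 10))   # 5: inside the tolerance band
```

Utilization is measured against each pod's CPU request, so the HPA only works as intended when the Deployment sets `resources.requests`.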

7. Best Practices

Resource Limits

Always set CPU/memory requests and limits to prevent noisy neighbors

Health Checks

Configure liveness and readiness probes for self-healing

Use Namespaces

Separate environments (dev, staging, prod) with namespaces

Security

Use Microsoft Entra Workload ID (the replacement for the deprecated Azure AD pod identity), network policies, and Pod Security Standards

Monitoring

Enable Container Insights for observability

GitOps

Use Flux or ArgoCD for declarative deployments

8. Interview Questions

Beginner Level

Q1: What is the difference between a Pod and a Node?

Answer:

Node: A worker machine (VM) in Kubernetes. It runs pods.
Pod: The smallest deployable unit. Usually contains one container (but can have sidecars).

Analogy: Node = House, Pod = Room, Container = Person in the room.

Q2: Explain the difference between ClusterIP, NodePort, and LoadBalancer

Answer:

ClusterIP: Internal IP only. Not accessible from outside. Default type.
NodePort: Exposes service on a static port on each Node IP.
LoadBalancer: Provisions an external Azure Load Balancer to expose service publicly.

Q3: What is the Master Node (Control Plane) responsible for?

Answer:

Scheduling pods (kube-scheduler)
Detecting and responding to cluster events (kube-controller-manager)
Storing cluster state (etcd)
Exposing the Kubernetes API (kube-apiserver)

Note: In AKS, Azure manages the control plane for you (free).

Intermediate Level

Q4: How does an Ingress Controller differ from a Load Balancer?

Answer:

Load Balancer: Layer 4 (TCP/UDP). One IP per service. Expensive for many services.
Ingress Controller: Layer 7 (HTTP/HTTPS). Single IP for multiple services. Supports path-based routing (/api, /web), SSL termination, and rewriting.

Q5: What happens when a Pod crashes?

Answer:

The Kubelet on the node detects the crash.
Based on restartPolicy (default: Always), it restarts the container.
If the Node itself dies and the pod belongs to a Deployment/ReplicaSet, the ReplicaSet controller creates a replacement Pod and the Scheduler places it on a healthy Node.
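A follow-up worth knowing: restarts are not immediate forever. The Kubernetes documentation describes an exponential back-off for repeatedly crashing containers (10s, 20s, 40s, and so on, capped at five minutes), which is what the `CrashLoopBackOff` status refers to. Here is a minimal sketch of that delay sequence. The function name is hypothetical, and the real kubelet also resets the timer after a container has run cleanly for a while.

```python
from itertools import islice
from typing import Iterator


def restart_delays(base_seconds: int = 10, cap_seconds: int = 300) -> Iterator[int]:
    """Yield the wait before each successive restart: doubling, then capped."""
    delay = base_seconds
    while True:
        yield min(delay, cap_seconds)
        delay *= 2


print(list(islice(restart_delays(), 7)))  # [10, 20, 40, 80, 160, 300, 300]
```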

Advanced Level

Q6: How do you upgrade an AKS cluster with zero downtime?

Answer: AKS handles this via Surge Upgrades:

Add one or more surge nodes running the new version (controlled by the max surge setting).
Cordon an old node (prevent new pods).
Drain the node (evict existing pods so they are rescheduled onto other nodes).
Remove or reimage the drained node so it runs the updated version.
Repeat for all nodes (one by one or in batches).

Requirement: Run multiple replicas and configure PodDisruptionBudgets so that minAvailable replicas stay up while nodes are drained.
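The PodDisruptionBudget arithmetic is worth working through, because a budget that is too strict blocks the drain step entirely. This is a minimal sketch for the integer `minAvailable` form only. The function name is hypothetical and percentage values are left out.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """How many pods may be evicted right now without violating the budget."""
    return max(0, healthy_pods - min_available)


for replicas, min_available in [(3, 2), (3, 3), (1, 1)]:
    budget = allowed_disruptions(replicas, min_available)
    status = "drain can proceed" if budget > 0 else "drain is blocked"
    print(f"replicas={replicas} minAvailable={min_available} -> {budget} evictions, {status}")
```

Setting `minAvailable` equal to the replica count (or running a single replica with `minAvailable: 1`) leaves a budget of zero, so node drains stall until someone intervenes.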

Q7: Explain the Sidecar Pattern

Answer: A helper container running alongside the main application container in the same Pod. Uses:

Logging (sending logs to Splunk/Log Analytics)
Proxying (Service Mesh like Istio/Linkerd)
Config watching (reloading configuration)
Security (TLS termination)

9. Helm: Kubernetes Package Manager

Helm is the package manager for Kubernetes. It simplifies deploying complex applications with reusable charts.

Why Helm?

Without Helm:

Manage 20+ YAML files manually
Copy-paste configurations for dev/staging/prod
Hard to version and rollback deployments

With Helm:

Single command deployment: helm install myapp ./chart
Templated configurations with values
Easy rollbacks: helm rollback myapp 1
Reusable charts from public repositories

Helm Architecture

Helm Chart (Package)
├── Chart.yaml          # Metadata (name, version)
├── values.yaml         # Default configuration
├── templates/          # Kubernetes manifests with templating
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ingress.yaml
└── charts/             # Dependencies (sub-charts)

Creating a Helm Chart

# Create new chart
helm create myapp

# Chart structure created:
myapp/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   └── _helpers.tpl

Chart.yaml (Metadata)

apiVersion: v2
name: myapp
description: A Helm chart for my application
type: application
version: 1.0.0        # Chart version
appVersion: "2.5.1"   # Application version

values.yaml (Configuration)

replicaCount: 3

image:
  repository: myregistry.azurecr.io/myapp
  tag: "2.5.1"
  pullPolicy: IfNotPresent

service:
  type: LoadBalancer
  port: 80

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: myapp.example.com
      paths:
        - path: /
          pathType: Prefix

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

templates/deployment.yaml (Templated Manifest)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "myapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "myapp.selectorLabels" . | nindent 8 }}
    spec:
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - name: http
          containerPort: 80
          protocol: TCP
        resources:
          {{- toYaml .Values.resources | nindent 12 }}

Deploying with Helm

# Install chart
helm install myapp ./myapp

# Install with custom values
helm install myapp ./myapp \
  --set replicaCount=5 \
  --set image.tag=3.0.0

# Install with values file
helm install myapp ./myapp \
  -f values-production.yaml

# Upgrade deployment
helm upgrade myapp ./myapp \
  --set image.tag=3.1.0

# Roll back to revision 1 (omit the revision number to roll back to the previous release)
helm rollback myapp 1

# Uninstall
helm uninstall myapp

Helm Repositories

# Add the legacy "stable" repo (archived and no longer updated; most charts now live in per-project repos)
helm repo add stable https://charts.helm.sh/stable

# Add Bitnami repo (popular charts)
helm repo add bitnami https://charts.bitnami.com/bitnami

# Search for charts
helm search repo nginx

# Install from repository
helm install my-nginx bitnami/nginx

# Update repo index
helm repo update

Multi-Environment Strategy

values-dev.yaml:

replicaCount: 1
image:
  tag: "latest"
ingress:
  hosts:
    - host: myapp-dev.example.com

values-prod.yaml:

replicaCount: 5
image:
  tag: "2.5.1"
ingress:
  hosts:
    - host: myapp.example.com
resources:
  limits:
    cpu: 1000m
    memory: 1Gi

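# Deploy to dev (separate namespace, so the release name can be reused)
helm install myapp ./myapp -f values-dev.yaml --namespace dev --create-namespace

# Deploy to prod
helm install myapp ./myapp -f values-prod.yaml --namespace prod --create-namespace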
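How do the chart's values.yaml, an environment file, and `--set` flags combine? Helm layers them in that order, with later sources winning. Here is a simplified Python model of that layering, using trimmed copies of the values shown above. `merge_values` is a hypothetical helper written for illustration, not Helm's actual implementation.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Recursively merge maps; scalars and lists in `override` replace the base value."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


chart_defaults = {  # trimmed values.yaml
    "replicaCount": 3,
    "image": {"repository": "myregistry.azurecr.io/myapp", "tag": "2.5.1"},
    "resources": {"limits": {"cpu": "500m", "memory": "512Mi"}},
}
values_prod = {  # trimmed values-prod.yaml
    "replicaCount": 5,
    "resources": {"limits": {"cpu": "1000m", "memory": "1Gi"}},
}
set_flags = {"image": {"tag": "3.0.0"}}  # equivalent of --set image.tag=3.0.0

final = merge_values(merge_values(chart_defaults, values_prod), set_flags)
print(final["replicaCount"], final["image"], final["resources"]["limits"])
```

Note that `image.repository` survives because maps merge key by key, whereas a list such as `ingress.hosts` is replaced as a whole, so an environment file must repeat every list entry it wants to keep.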

[!TIP] Best Practice: Chart Versioning

Chart version (version in Chart.yaml): Increment when chart structure changes

App version (appVersion): Tracks the application version being deployed

Use semantic versioning: 1.2.3 (MAJOR.MINOR.PATCH)

[!WARNING] Gotcha: Helm Secrets Never commit secrets to values.yaml! Use:

Azure Key Vault: Inject secrets via CSI driver

Sealed Secrets: Encrypt secrets in Git

helm-secrets plugin: Encrypt values files with SOPS

10. GitOps with ArgoCD

GitOps = Git as the single source of truth for declarative infrastructure and applications.

GitOps Principles

Declarative: Entire system described declaratively (YAML in Git)
Versioned: Git history = deployment history
Automated: Changes in Git automatically deployed
Reconciled: Cluster state continuously reconciled with Git

ArgoCD Architecture

Git Repository (Source of Truth)
    ↓
ArgoCD (Continuous Sync)
    ↓
Kubernetes Cluster (Live State, kept in sync with Git)

Installing ArgoCD on AKS

# Create namespace
kubectl create namespace argocd

# Install ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Expose ArgoCD UI (LoadBalancer)
kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

# Get admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d

# Get external IP
kubectl get svc argocd-server -n argocd

Creating an Application

Git Repository Structure:

my-app-gitops/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
├── overlays/
│   ├── dev/
│   │   └── kustomization.yaml
│   └── prod/
│       └── kustomization.yaml

ArgoCD Application Manifest:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/my-app-gitops
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Delete resources not in Git
      selfHeal: true   # Auto-sync if cluster drifts
    syncOptions:
      - CreateNamespace=true

# Apply ArgoCD application
kubectl apply -f argocd-app.yaml

# Or use ArgoCD CLI
argocd app create myapp-prod \
  --repo https://github.com/myorg/my-app-gitops \
  --path overlays/prod \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace production \
  --sync-policy automated

GitOps Workflow

Developer commits code
    ↓
CI builds Docker image (tag: v1.2.3)
    ↓
CI updates Git repo (image: myapp:v1.2.3)
    ↓
ArgoCD detects change
    ↓
ArgoCD syncs to cluster
    ↓
Deployment updated automatically

[!IMPORTANT] Recommendation: Separate Repos

Application code repo: Source code, Dockerfile

GitOps repo: Kubernetes manifests, Helm charts

CI updates GitOps repo after building image

Sync Strategies

Strategy	Behavior	Use Case
Manual	Requires manual sync	Production (human approval)
Automated	Auto-sync on Git change	Dev/Staging
Auto-Prune	Delete resources not in Git	Clean up old resources
Self-Heal	Revert manual kubectl changes	Enforce Git as source of truth
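The four rows above interact, so a small model may help. This sketch covers two decisions: whether a sync should start at all, and what the sync would do. All names here are hypothetical, and it assumes the behaviour described in the table: automated sync reacts to Git changes, self-heal additionally reacts to drift in the cluster, and prune controls deletion.

```python
def should_auto_sync(git_changed: bool, cluster_drifted: bool,
                     automated: bool, self_heal: bool) -> bool:
    """Decide whether a sync starts without a human pressing the button."""
    if not automated:
        return False  # Manual strategy: always wait for approval
    return git_changed or (cluster_drifted and self_heal)


def plan_sync(desired: dict, live: dict, prune: bool) -> list[str]:
    """List the actions that would make the live state match Git."""
    actions = []
    for name in sorted(desired):
        if name not in live:
            actions.append(f"create {name}")
        elif live[name] != desired[name]:
            actions.append(f"update {name}")
    if prune:
        # Resources that exist in the cluster but no longer in Git.
        for name in sorted(live.keys() - desired.keys()):
            actions.append(f"delete {name}")
    return actions


git_state = {"deployment/web": {"replicas": 3}, "service/web": {"port": 80}}
live_state = {"deployment/web": {"replicas": 5}, "configmap/old": {}}

print(should_auto_sync(False, True, automated=True, self_heal=True))
print(plan_sync(git_state, live_state, prune=True))
print(plan_sync(git_state, live_state, prune=False))
```

With `prune` off, the stale `configmap/old` stays in the cluster, which is the safer default while a team is still gaining confidence in the workflow.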

11. Service Mesh Basics (Istio)

Service Mesh = Infrastructure layer for service-to-service communication with observability, security, and traffic management.

Why Service Mesh?

Without Service Mesh:

Implement retries, timeouts, circuit breakers in every microservice
No visibility into service-to-service traffic
Difficult to enforce mTLS between services

With Service Mesh (Istio):

Traffic Management: Canary deployments, A/B testing, retries
Security: Automatic mTLS between services
Observability: Distributed tracing, metrics, logs

Istio Architecture

Application Pod
├── App Container (your code)
└── Envoy Sidecar (injected by Istio)
    ↓
All traffic flows through Envoy
    ↓
Istio Control Plane (manages Envoy configs)

Installing Istio on AKS

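# Download Istio (pin the version so the directory name below matches)
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.20.0 sh -
cd istio-1.20.0
export PATH="$PWD/bin:$PATH"   # Make istioctl available in this shell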

# Install Istio (the demo profile is intended for evaluation, not production)
istioctl install --set profile=demo -y

# Enable sidecar injection for namespace
kubectl label namespace default istio-injection=enabled

# Verify
kubectl get pods -n istio-system

Traffic Management Example

Canary Deployment (90% v1, 10% v2):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.example.com
  http:
    - match:
        - headers:
            user-type:
              exact: beta-tester
      route:
        - destination:
            host: myapp
            subset: v2
    - route:
        - destination:
            host: myapp
            subset: v1
          weight: 90
        - destination:
            host: myapp
            subset: v2
          weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
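The VirtualService above evaluates its HTTP rules in order: the header match comes first, and only unmatched requests fall through to the 90/10 split. Here is a toy simulation of that decision in Python. `pick_subset` is a hypothetical function for illustration; in a real mesh the Envoy sidecar makes this choice.

```python
import random


def pick_subset(headers: dict, rng: random.Random) -> str:
    """Mimic the two routing rules: header match first, then weighted split."""
    # Rule 1: beta testers always reach v2.
    if headers.get("user-type") == "beta-tester":
        return "v2"
    # Rule 2: everyone else is split 90/10 between v1 and v2.
    return rng.choices(["v1", "v2"], weights=[90, 10])[0]


rng = random.Random(42)  # seeded so repeated runs give the same sample
sample = [pick_subset({}, rng) for _ in range(10_000)]
v2_share = sample.count("v2") / len(sample)

print(pick_subset({"user-type": "beta-tester"}, rng))  # always v2
print(f"share of regular traffic on v2: {v2_share:.1%}")
```

The split is probabilistic per request, so a single user may see both versions unless the application or mesh adds session affinity.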

[!NOTE] Deep Dive: When to Use Service Mesh?

YES: Microservices (10+ services), need mTLS, complex traffic routing

NO: Monolith, simple apps, small teams (adds complexity)

13. AKS Security Deep Dive

Pod Security Standards

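Pod Security Standards, enforced by the built-in Pod Security Admission controller, replace Pod Security Policies (PSPs), which were deprecated and then removed in Kubernetes 1.25.

Three Levels: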

Privileged: Unrestricted (allows known privilege escalations)
Baseline: Minimally restrictive (prevents known privilege escalations)
Restricted: Heavily restricted (hardened, follows pod hardening best practices)

# Enforce restricted policy on namespace
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted

Example: Restricted Pod:

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      readOnlyRootFilesystem: true

Network Policies

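Network Policies = Firewall rules for pods. They are only enforced when the cluster runs a network policy engine (on AKS: Azure Network Policy Manager, Calico, or Cilium). The policy below lets only pods labeled app: web reach the api pods, and only on TCP port 8080.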

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-from-web
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web
      ports:
        - protocol: TCP
          port: 8080

Default Deny All (this also blocks DNS lookups, so pair it with explicit allow rules):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Secrets Management with Azure Key Vault

CSI Driver for Azure Key Vault:

# Install CSI driver
helm repo add csi-secrets-store-provider-azure \
  https://azure.github.io/secrets-store-csi-driver-provider-azure/charts

helm install csi csi-secrets-store-provider-azure/csi-secrets-store-provider-azure

SecretProviderClass:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-sync
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "your-identity-client-id"
    keyvaultName: "mykeyvault"
    objects: |
      array:
        - |
          objectName: database-password
          objectType: secret
    tenantId: "your-tenant-id"

Pod using Key Vault secret:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-secrets
spec:
  containers:
  - name: app
    image: myapp:1.0
    volumeMounts:
    - name: secrets-store
      mountPath: "/mnt/secrets"
      readOnly: true
  volumes:
  - name: secrets-store
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "azure-kv-sync"

14. StatefulSets & Persistent Storage

StatefulSet = For stateful applications (databases, message queues) that need stable network identity and persistent storage.

StatefulSet vs Deployment

Feature	Deployment	StatefulSet
Pod Names	Random (web-7d8f-xyz)	Ordered (web-0, web-1, web-2)
Scaling	Parallel	Sequential (web-0 → web-1 → web-2)
Storage	Shared or ephemeral	Dedicated persistent volume per pod
Network Identity	Random	Stable (web-0.service.namespace.svc)
Use Case	Stateless apps	Databases, Kafka, Redis

StatefulSet Example

apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None  # Headless service
  selector:
    app: mysql
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
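      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD  # the mysql image exits at startup without a root password setting
          valueFrom:
            secretKeyRef:
              name: mysql-secret     # example Secret name; create it beforehand
              key: root-password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql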
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: managed-premium
      resources:
        requests:
          storage: 100Gi

Accessing pods:

# Direct access to specific pod
mysql-0.mysql.default.svc.cluster.local
mysql-1.mysql.default.svc.cluster.local
mysql-2.mysql.default.svc.cluster.local
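
The naming pattern above is predictable enough to generate. The following Python sketch builds the per-pod DNS names from the StatefulSet name, the headless Service name, and the namespace. The function name `statefulset_pod_dns` is made up for this example, and `cluster.local` is assumed as the cluster domain.

```python
def statefulset_pod_dns(
    name: str,
    service: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",  # assumed default cluster domain
) -> list[str]:
    """Build the stable DNS name of every pod in a StatefulSet."""
    # Pods are numbered from 0 and keep their ordinal across restarts.
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    for host in statefulset_pod_dns("mysql", "mysql", "default", 3):
        print(host)
```

An application can use such a list to address a specific replica (for example, always writing to mysql-0) instead of going through a load-balanced Service.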

Azure Disk vs Azure Files

Feature	Azure Disk	Azure Files
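Access Mode	ReadWriteOnce (mounted by a single node)	ReadWriteMany (multiple pods across nodes)
Performance	Higher IOPS	Lower IOPS
Use Case	Databases	Shared storage, logs
Storage Class	`managed-csi`, `managed-csi-premium` (legacy: `default`, `managed-premium`)	`azurefile-csi`, `azurefile-csi-premium` (legacy: `azurefile`, `azurefile-premium`)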

15. KEDA: Event-Driven Autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) = Scale pods based on external metrics (queue length, HTTP requests, database queries).

Installing KEDA

# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

Example: Scale Based on Azure Service Bus Queue

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
spec:
  scaleTargetRef:
    name: order-processor  # Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
  - type: azure-servicebus
    metadata:
      queueName: orders
      namespace: mynamespace
      messageCount: "10"  # Target of 10 messages per replica; KEDA adds pods as the backlog grows
      connectionFromEnv: SERVICEBUS_CONNECTION

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
spec:
  replicas: 1  # KEDA will override this
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      containers:
      - name: processor
        image: myapp/order-processor:1.0
        env:
        - name: SERVICEBUS_CONNECTION
          valueFrom:
            secretKeyRef:
              name: servicebus-secret
              key: connection-string

How it works:

Queue has 50 messages → KEDA scales to 5 pods (50/10)
Queue has 200 messages → KEDA scales to 20 pods (max)
Queue empty → KEDA scales to 1 pod (min)
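
The three cases above follow one rule: divide the backlog by the per-replica target, round up, and clamp to the min/max. Here is a rough Python sketch of that arithmetic. It is an approximation for intuition only: in a real cluster KEDA hands the 1-to-N scaling to the HPA, which adds its own tolerance and stabilization behavior. The function name `keda_desired_replicas` is an assumption for this example.

```python
import math


def keda_desired_replicas(
    queue_length: int,
    target_per_replica: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Approximate the replica count for a queue-length trigger."""
    # One replica per `target_per_replica` messages, rounded up.
    raw = math.ceil(queue_length / target_per_replica)
    # Never go below minReplicaCount or above maxReplicaCount.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    for backlog in (0, 50, 200, 1000):
        print(backlog, keda_desired_replicas(backlog, 10, 1, 20))
```

Note that a backlog of 1,000 messages still yields 20 pods, because maxReplicaCount caps the result.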

Popular KEDA Scalers

Azure Service Bus: Queue/Topic message count
Azure Storage Queue: Queue length
HTTP: Incoming HTTP requests
Prometheus: Custom metrics
Kafka: Consumer lag
Redis: List length
Cron: Time-based scaling

16. Interview Questions

Beginner Level

Q1: What is the difference between a Pod and a Deployment?

Answer:

Pod:

Smallest deployable unit in Kubernetes
One or more containers running together
Ephemeral (dies when node fails)
No self-healing

Deployment:

Manages a set of identical Pods (ReplicaSet)
Ensures desired number of Pods are running
Self-healing (recreates failed Pods)
Supports rolling updates and rollbacks

In production: Always use Deployments, never bare Pods.

Q2: Explain Kubernetes namespaces

Answer:

Namespaces = Virtual clusters within a physical cluster.

Use cases:

Environment separation: dev, staging, prod
Team isolation: team-a, team-b
Resource quotas: Limit CPU/memory per namespace

Default namespaces:

default: Default namespace for resources
kube-system: Kubernetes system components
kube-public: Public resources (readable by all)

Example:

kubectl create namespace production
kubectl get pods -n production

Q3: What is a Service in Kubernetes?

Answer:

Service = Stable network endpoint for a set of Pods.

Problem: Pods have dynamic IPs (change on restart)

Solution: Service provides a stable IP and DNS name

Types:

ClusterIP (default): Reachable only inside the cluster (e.g. 10.0.1.5)
NodePort: Exposes the Service on a fixed port on every node (default range 30000-32767)
LoadBalancer: Provisions an Azure Load Balancer with an external IP

Example:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080

Intermediate Level

Q4: How does Horizontal Pod Autoscaler (HPA) work?

Answer:

HPA = Automatically scales pods based on CPU/memory usage.

How it works:

Metrics Server collects pod metrics every 15 seconds
HPA controller evaluates the metrics every 15 seconds by default
If avg CPU > target (beyond a 10% tolerance), scale up
If avg CPU < target, scale down once the 5-minute stabilization window has passed

Formula:

desiredReplicas = ceil(currentReplicas * (currentMetric / targetMetric))

Example:

Current: 3 pods, avg CPU 80%
Target: 50%
Desired: ceil(3 * (80/50)) = ceil(4.8) = 5 pods

Gotcha: Requires resources.requests to be set!
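
The formula and example above can be checked with a few lines of Python. This is a simplified sketch, not the controller's actual code: it includes the 10% tolerance that the HPA applies by default, but leaves out stabilization windows, min/max replica bounds, and not-yet-ready pods. The function name `hpa_desired_replicas` is chosen for this example.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,  # assumed default tolerance of the HPA controller
) -> int:
    """Apply desiredReplicas = ceil(currentReplicas * current / target)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    print(hpa_desired_replicas(3, 80, 50))  # the worked example above
```

The tolerance explains why small fluctuations around the target (say 52% against a 50% target) do not trigger scaling.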

Q5: Explain the difference between Kubenet and Azure CNI

Answer:

Feature	Kubenet	Azure CNI
Pod IP	From a separate pod CIDR (10.244.x.x)	From the VNet subnet (10.0.1.x)
IP Consumption	Low (NAT used)	High (1 IP per pod)
Performance	Slight overhead (NAT)	Direct routing (faster)
VNet Integration	No	Yes (pods directly in VNet)
Network Policies	Calico required	Native support
Use Case	Small clusters, IP conservation	Enterprise, VNet integration

Recommendation: Use Azure CNI for production (better integration, performance).

Q6: How do you implement zero-downtime deployments in AKS?

Answer:

Strategy: Rolling Update with readiness probes

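apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp            # example name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Max 1 pod down at a time
      maxSurge: 1        # Max 1 extra pod during update
  template:
    metadata:
      labels:
        app: myapp       # must match spec.selector
    spec:
      containers:
      - name: app
        image: myapp:2.0 # example image
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5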

Process:

Create 1 new pod (v2)
Wait for readiness probe to pass
Terminate 1 old pod (v1)
Repeat until all pods are v2

Result: Between 4 and 6 pods exist at any moment, and at least 4 of them are ready. Set maxUnavailable: 0 if capacity must never drop below 5.
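
The pod-count bounds of a rolling update come straight from replicas, maxUnavailable, and maxSurge. Here is a small Python sketch of that arithmetic. It assumes both settings are given as absolute numbers (percentage values are not handled), and the function name `rolling_update_bounds` is made up for this example.

```python
def rolling_update_bounds(
    replicas: int, max_unavailable: int, max_surge: int
) -> tuple[int, int]:
    """Return (minimum ready pods, maximum total pods) during a rollout."""
    min_ready = replicas - max_unavailable  # floor on serving capacity
    max_total = replicas + max_surge        # ceiling on scheduled pods
    return min_ready, max_total


if __name__ == "__main__":
    print(rolling_update_bounds(5, 1, 1))  # settings from the manifest above
    print(rolling_update_bounds(5, 0, 1))  # stricter: never below 5 ready
```

The second call shows the trade-off: with maxUnavailable set to 0 the rollout never loses capacity, but it can only progress as fast as surge pods become ready.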

Advanced Level

Q7: Design a multi-tenant AKS architecture

Answer:

Requirements:

Isolate tenants (security, resources)
Cost allocation per tenant
Prevent noisy neighbor

Architecture:

Option 1: Namespace per Tenant (Soft Isolation)

# Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-acme
  labels:
    tenant: acme

---
# Resource Quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-acme
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    pods: "50"

---
# Network Policy (Deny cross-tenant traffic)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-other-tenants
  namespace: tenant-acme
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant: acme

Option 2: Node Pool per Tenant (Hard Isolation)

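# Create dedicated node pool for tenant
# (myResourceGroup is a placeholder for the cluster's resource group)
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name aks-prod \
  --name acmepool \
  --node-count 3 \
  --node-taints tenant=acme:NoSchedule \
  --labels tenant=acme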

Pod spec with a toleration and node selector (pins the tenant's pods to its node pool):

spec:
  tolerations:
  - key: tenant
    operator: Equal
    value: acme
    effect: NoSchedule
  nodeSelector:
    tenant: acme

Cost Allocation: Use tags/labels + Azure Cost Management.

Q8: How do you troubleshoot a CrashLoopBackOff pod?

Answer:

CrashLoopBackOff = Pod starts, crashes, Kubernetes restarts it, crashes again (loop).

Troubleshooting Steps:

Check pod events:

kubectl describe pod mypod
# Look for: Events section (OOMKilled, ImagePullBackOff, etc.)

Check logs:

kubectl logs mypod
kubectl logs mypod --previous  # Logs from crashed container

Common causes:

OOMKilled: Increase resources.limits.memory
Application error: Fix code, check environment variables
Missing dependencies: Database not ready → Add init container
Liveness probe failing: Adjust probe settings

Debug with ephemeral container (Kubernetes 1.23+):

kubectl debug mypod -it --image=busybox --target=mycontainer

Disable probes temporarily:

# Comment out liveness probe to prevent restarts
# livenessProbe:
#   httpGet:
#     path: /health

Q9: Implement a blue-green deployment strategy in AKS

Answer:

Blue-Green = Run two identical environments (blue=current, green=new), switch traffic instantly.

Implementation with Services:

# Blue Deployment (current)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:1.0

---
# Green Deployment (new)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:2.0

---
# Service (points to blue initially)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Switch to 'green' to cutover
  ports:
  - port: 80
    targetPort: 8080

Cutover Process:

# 1. Deploy green
kubectl apply -f deployment-green.yaml

# 2. Test green internally
kubectl port-forward deployment/myapp-green 8080:8080

# 3. Switch traffic (instant cutover)
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'

# 4. Monitor for issues
# If problems: kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'

# 5. Delete blue after validation
kubectl delete deployment myapp-blue

Pros: Instant rollback, zero downtime

Cons: 2x resources during deployment

Troubleshooting: The AKS Production Triage

When a pod fails in production, don’t panic. Follow this 3-step triage:

1. The “Pods won’t start” Phase

ImagePullBackOff: Kubernetes can’t download your container image.
- The Pro Check: Does the cluster's kubelet identity have the AcrPull role on your Container Registry? Running az aks update with the --attach-acr flag grants it.
CrashLoopBackOff: The container starts but immediately crashes.
- The Pro Check: Run kubectl logs <pod-name> --previous. You need to see the logs from the failed instance, not the new one that just restarted.
Pending: The pod isn’t even trying to start.
- The Pro Check: Run kubectl describe pod <pod-name>. Usually, it’s because you requested 2 GB of RAM but your nodes only have 1 GB available.

2. The “Network Ghost” Phase

Service but no Response: The service is running, but you get a 504 timeout.
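- The Pro Check: Does every key/value pair in your Service's selector appear in the pod template labels of your Deployment? If not, the Service has no endpoints (kubectl get endpoints <service-name> comes back empty) and the Load Balancer is sending traffic into a black hole.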
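
The matching rule behind this check is simple: a Service selects a pod when every key/value pair in the selector is present in the pod's labels (extra pod labels are fine). The Python sketch below illustrates that rule. The function name `selector_matches` is an assumption for this example, and it covers only the equality-based selectors that Services use.

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Return True if a Service selector would select a pod with these labels."""
    # A Service without a selector does not pick up pods automatically.
    if not selector:
        return False
    # Every selector pair must be present in the pod's labels, exactly.
    return all(pod_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    pod = {"app": "web", "version": "v1"}
    print(selector_matches({"app": "web"}, pod))                      # selected
    print(selector_matches({"app": "web", "tier": "frontend"}, pod))  # not selected
    print(selector_matches({"app": "Web"}, pod))                      # case matters
```

A single typo or a difference in letter case is enough to leave the Service with zero endpoints.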

3. The “Node Pressure” Phase

Evicted Pods: Your pods are being killed randomly.
- The Pro Check: Your Node is out of disk space or RAM. Check “Azure Monitor for Containers” to see which app is leaking memory.

[!TIP] Pro Tool: Lens & k9s While kubectl is the standard, Principal Engineers often use Lens (Desktop UI) or k9s (Terminal UI) to visualize cluster health in real-time. These tools make it instantly obvious when a deployment is failing across multiple zones.

17. Key Takeaways

Managed Control Plane

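AKS runs the control plane (API server, etcd) for you. On the Free tier there is no charge for it and you pay only for worker nodes; the Standard tier adds a per-cluster fee in exchange for an uptime SLA.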

Declarative Config

Use YAML manifests to define desired state. Avoid imperative commands (kubectl run) in production.

Autoscaling

Use HPA for pods (CPU/Memory) and Cluster Autoscaler for nodes to handle variable loads efficiently.

Networking Choice

Use Kubenet for simplicity/IP conservation. Use Azure CNI for distinct IPs per pod and direct VNet connectivity.

Security

Integrate Azure AD (now Microsoft Entra ID) for authentication. Use Network Policies to restrict traffic between pods.

Namespace Isolation

Use namespaces to logically separate teams, environments (dev/prod), or applications within a cluster.

Interview Deep-Dive

Your AKS cluster runs 50 microservices. During a deployment, Service A returns 500 errors, cascading to Services B and C. How do you prevent this?

Strong Candidate Answer:

The cascade mechanism: Service B calls A synchronously. When A returns 500s, B retries aggressively, amplifying load on A. B’s response time increases as it waits for A’s timeouts, exhausting B’s connection pool. Service C, calling B, experiences the same cascade. Within 60 seconds, all three services are down.
Prevention 1 — Circuit Breaker: After 5 consecutive failures from A, the circuit opens and B returns a fallback response (cached data, degraded response) instead of waiting. Use Istio destination rules or application-level libraries like Polly.
Prevention 2 — Aggressive timeouts: Set 2-3 second timeouts for internal calls (not 30-second defaults). Configure retries with exponential backoff and jitter, limited to 3 attempts. Prevents retry storms.
Prevention 3 — Bulkhead pattern: Separate connection pools per downstream dependency. If the pool for A is exhausted, calls to other services continue unaffected.
Prevention 4 — Async communication: If B does not need synchronous response from A, switch to Service Bus messaging. B publishes “process payment” and immediately returns. A processes when recovered.
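
To show what "the circuit opens after 5 consecutive failures" means in practice, here is a minimal application-level circuit breaker in Python. Treat it as a sketch under stated assumptions, not a replacement for Istio destination rules or a library like Polly: the class name `CircuitBreaker`, its parameters, and the 30-second reset timeout are all made up for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Open after N consecutive failures; retry the real call after a timeout."""

    def __init__(
        self,
        failure_threshold: int = 5,    # consecutive failures before opening
        reset_timeout: float = 30.0,   # assumed cool-down in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the cool-down has elapsed, let one trial call through (half-open).
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        # While open, skip the downstream call entirely and degrade gracefully.
        if self.is_open:
            return fallback()
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # A success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In Service B, each call to Service A would go through something like `breaker.call(call_service_a, cached_response)`, where both function names are hypothetical. Once the circuit is open, B answers from the fallback immediately instead of tying up its connection pool waiting on A.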

Follow-up: How do you test that circuit breakers work before a real incident?

Chaos engineering with Chaos Mesh or Azure Chaos Studio. Inject 5-second latency on A, return 500 errors for 50% of requests, or kill A’s pods. Verify B’s circuit breaker opens and C remains healthy. Run in staging first, then production during low-traffic windows. The first chaos experiment always reveals misconfigured circuit breakers; discovering that in a test is worth 100x more than during a real incident.

Compare AKS versus Azure Container Apps versus Azure Container Instances. When would you choose each?

Strong Candidate Answer:

ACI: Single container, no orchestration, pay per second. Best for batch jobs, build agents, burst capacity. ~$35/month for 1 vCPU 24/7. Not suitable for production web services.
Container Apps (ACA): Serverless containers on managed Kubernetes (KEDA + Envoy + Dapr). Auto-scales to zero. Built-in Dapr for service-to-service calls. Best for 3-10 engineer teams wanting container benefits without Kubernetes overhead.
AKS: Full Kubernetes control — networking, node pools, admission controllers, service mesh, GPU scheduling. Best for 20+ engineer orgs, complex architectures, multi-cloud portability, or custom operators.
Decision: Start with Container Apps unless you need a specific Kubernetes feature. Migrate to AKS when you outgrow it (custom networking, Windows containers, GPU nodes).

Follow-up: Your team has 15 microservices on AKS with 3 engineers. On-call burden is heavy. Should you migrate to Container Apps?

This is the Container Apps sweet spot. Each engineer manages 5 services plus the Kubernetes platform. Container Apps eliminates node management, cluster upgrades, and ingress controller configuration. Migration is 2-4 weeks: convert K8s manifests to Container Apps YAML (mostly 1:1 mapping), use Dapr for service communication.

You need to choose between Azure CNI and kubenet for AKS networking. The VNet is a /24 with 251 usable addresses. What do you recommend?
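
A useful first step for this question is the IP arithmetic. The Python sketch below estimates how many nodes fit in the subnet when every pod takes a VNet address. It rests on assumptions that are not stated in the question: the traditional (non-overlay) Azure CNI mode, where each node reserves one IP for itself plus one per pod slot; a limit of 30 pods per node; and one node's worth of addresses kept free for the surge node used during upgrades. The function name `max_nodes_azure_cni` is made up for this example.

```python
def max_nodes_azure_cni(
    usable_ips: int,
    max_pods_per_node: int = 30,  # assumed per-node pod limit
    surge_nodes: int = 1,         # assumed headroom for upgrade surge nodes
) -> int:
    """Estimate how many nodes fit when every pod consumes a subnet IP."""
    ips_per_node = 1 + max_pods_per_node  # the node itself plus its pod slots
    return max(0, usable_ips // ips_per_node - surge_nodes)


if __name__ == "__main__":
    nodes = max_nodes_azure_cni(251)
    print(nodes, "nodes,", nodes * 30, "pod slots")
```

Under these assumptions the /24 tops out at a handful of nodes, whereas with kubenet only the nodes themselves consume subnet addresses. That gap is the core of the trade-off the question is probing.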

Next Steps

Continue to Chapter 8

Master Azure Functions and serverless event-driven architecture
