Chapter 11: No Servers to Manage - Cloud Run and Cloud Functions

Serverless in Google Cloud is not just about “functions.” It is a comprehensive ecosystem that allows you to deploy containers, functions, or entire web applications without managing a single virtual machine. The core philosophy is abstraction: you provide the code, and Google provides the scale.

1. Cloud Run: The Future of Serverless

Cloud Run is Google’s premier serverless offering. It is built on Knative, an open-source Kubernetes-based platform, but it abstracts away the Kubernetes complexity entirely. If you are coming from AWS, Cloud Run is closest to AWS App Runner or a simplified ECS Fargate — you give it a container image, and it handles everything else. Unlike AWS Lambda (which runs functions), Cloud Run runs full container images, giving you complete control over your runtime, language, and dependencies.

Concurrency and Cold Starts: The Performance Trade-off

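Cloud Run supports concurrency: unlike traditional FaaS platforms, which dedicate an instance to each in-flight request, a single Cloud Run instance can serve many requests at once.

Efficiency: A single Cloud Run instance can handle up to 1,000 concurrent requests (the default limit is 80).
Cold Start Mitigation: Because one instance stays warm for many users, the “Cold Start” penalty is hit far less frequently than in traditional FaaS (Function as a Service) models.
Cost: With request-based billing, an instance is billed while it is handling at least one request, so overlapping requests share the same billed time. This makes Cloud Run significantly cheaper for high-traffic APIs.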
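To make the concurrency effect concrete, here is a rough sizing sketch based on Little's law (requests in flight = arrival rate × time per request). The load figures are hypothetical, and real autoscaling also reacts to CPU utilization, so treat the result as an estimate rather than what Cloud Run will actually provision.

```python
import math


def instances_needed(requests_per_second: float,
                     avg_latency_seconds: float,
                     concurrency: int) -> int:
    """Estimate the instances required to serve a steady load.

    Little's law: requests in flight = arrival rate x time in system.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight / concurrency))


# Hypothetical load: 500 requests/second at 250 ms each -> 125 requests in flight.
one_per_instance = instances_needed(500, 0.25, concurrency=1)   # one request per instance
shared_instances = instances_needed(500, 0.25, concurrency=80)  # Cloud Run default limit
print(one_per_instance, shared_instances)  # prints: 125 2
```

Fewer instances for the same load means fewer instance startups during a spike, which is where the cold start and cost savings come from.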

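Min Instances: The Warm Pool Strategy

To avoid cold starts on latency-sensitive services, you can configure Min Instances.

Setting: --min-instances=5 keeps at least 5 instances perpetually warm.
Trade-off: You pay for 5 instances continuously, even if they are idle (idle instances are billed at a reduced rate when CPU is allocated only during requests). This is a guaranteed cost for guaranteed low latency. Traffic bursts that exceed the warm capacity still trigger cold starts on the additional instances.
SRE Rule of Thumb: If your service has a 100ms p99 cold start time and serves 1M requests/day with a 1-second SLA, keeping 2-3 min instances can prevent roughly 99% of cold starts while only increasing cost by ~10%. Treat these figures as illustrative and validate them against your own traffic pattern.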

Cloud Run Services vs. Jobs

Services: For request-driven workloads (APIs, Web Frontends). They scale to zero when idle and scale out within seconds as traffic arrives. Scale-to-zero means you pay nothing for compute during off-hours, a huge cost advantage for internal tools, staging environments, or low-traffic APIs.
Jobs: For data processing, database migrations, or scheduled tasks. They run to completion and do not listen for HTTP requests. Think of Jobs as serverless batch tasks: you define the container and the parallelism (up to 10,000 tasks per execution), and Cloud Run handles execution. Pair a Job with Cloud Scheduler to run it on a cron schedule.
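As a sketch of how parallel Job tasks can divide work, the snippet below shards a list by task index. Cloud Run sets the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables for each task of a Job; the round-robin split and the function names are illustrative assumptions, not a prescribed pattern.

```python
import os


def shard_for_task(items, task_index, task_count):
    """Return the slice of items this task should process (round-robin split)."""
    if not 0 <= task_index < task_count:
        raise ValueError("task_index must be in [0, task_count)")
    return items[task_index::task_count]


def main(items):
    # Cloud Run sets these for every Job task; the defaults allow local runs.
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    for item in shard_for_task(items, task_index, task_count):
        print(f"task {task_index} of {task_count} processing {item}")
```

Because every task computes its own shard from the same input list, no coordinator is needed, and a failed task can be retried without affecting the others.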

Cost Example: A Cloud Run service handling 1 million requests/month with an average processing time of 200ms, using 1 vCPU and 512MB RAM, costs approximately $0.00002-$0.00004 per request. Total monthly cost: roughly $20-40. Compare this to running an `e2-small` Compute Engine VM 24/7 at ~$15/month, which seems cheaper but cannot scale and requires you to manage the OS, patches, and availability. For variable traffic, Cloud Run almost always wins on total cost of ownership.
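The arithmetic behind this comparison can be checked with a short sketch. It uses only the per-request range and the VM price quoted in the cost example; it does not model actual Cloud Run list prices, the free tier, or operational overhead.

```python
def monthly_cost(requests_per_month: int, cost_per_request: float) -> float:
    """Monthly bill under pure per-request pricing, rounded to cents."""
    return round(requests_per_month * cost_per_request, 2)


def break_even_requests(vm_monthly_cost: float, cost_per_request: float) -> int:
    """Requests per month at which per-request billing equals the flat VM price."""
    return round(vm_monthly_cost / cost_per_request)


LOW, HIGH = 0.00002, 0.00004  # per-request range quoted in the cost example (USD)
VM_MONTHLY = 15.0             # e2-small estimate quoted in the cost example (USD)

print(monthly_cost(1_000_000, LOW), monthly_cost(1_000_000, HIGH))
print(break_even_requests(VM_MONTHLY, HIGH), break_even_requests(VM_MONTHLY, LOW))
```

By these figures the break-even point sits between 375,000 and 750,000 requests per month. Below that, per-request billing is cheaper outright; above it, the case for Cloud Run rests on scaling and reduced operations work rather than the raw bill.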

1.3 Advanced Cloud Run Patterns

Sidecar Containers (Multi-Container)

Cloud Run now supports Sidecars. You can run multiple containers within a single service.

Use Cases:
- Envoy/Nginx: As a local proxy for auth or caching.
- Logging Agents: Shipping custom logs to third-party tools (Datadog, Splunk).
- Cloud SQL Proxy: Running the proxy as a sidecar for better security and performance.
Constraint: Only one container (the “ingress” container) can listen for HTTP requests on the specified port.

Direct VPC Egress

The traditional “Serverless VPC Access Connector” was a separate VM-based bottleneck. Cloud Run now supports Direct VPC Egress.

Performance: Lower latency and higher throughput (up to 10 Gbps).
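Simplicity: No connector instances to size, scale, or patch, and no dedicated “Connector” subnet; Cloud Run instances take IP addresses directly from a subnet you choose.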
Cost: Eliminates the cost of the Connector VMs.

2. Cloud Functions (2nd Gen) Deep Dive

Cloud Functions 2nd Gen (rebranded as Cloud Run functions in 2024) is a major architectural leap over 1st Gen. It is built on top of Cloud Run and Eventarc.

2.1 The Event-Driven Heart

2nd Gen functions are designed to react to the world via Eventarc.

90+ Event Sources: Including Cloud Storage (File Created), Pub/Sub (Message Published), and BigQuery (Query Finished).
Architecture: Eventarc captures the event -> Wraps it in a CloudEvent JSON -> POSTs it to the Cloud Function’s HTTP endpoint.
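On the receiving side, the function sees an ordinary HTTP POST. The sketch below parses a CloudEvent from raw headers and body using only the standard library, handling both content modes defined by the CloudEvents HTTP binding (attributes in ce-* headers, or a single JSON envelope). The function name and return shape are assumptions for illustration; in practice the Functions Framework or a CloudEvents SDK would do this parsing for you.

```python
import json


def parse_cloudevent(headers: dict, body: bytes) -> dict:
    """Return {'attributes': {...}, 'data': ...} for either HTTP content mode."""
    lowered = {key.lower(): value for key, value in headers.items()}
    content_type = lowered.get("content-type", "")
    if content_type.startswith("application/cloudevents+json"):
        # Structured mode: attributes and data share one JSON envelope.
        envelope = json.loads(body)
        data = envelope.pop("data", None)
        return {"attributes": envelope, "data": data}
    # Binary mode: attributes travel as ce-* headers, the body is the data.
    attributes = {key[3:]: value for key, value in lowered.items()
                  if key.startswith("ce-")}
    data = json.loads(body) if body and "json" in content_type else body
    return {"attributes": attributes, "data": data}
```

A handler would typically branch on attributes["type"] (for example, a Cloud Storage "finalized" event) and then read the fields it needs from data.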

2.2 Security: Secret Manager Integration

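Never hardcode API keys in source code or plain-text environment variables.

Implementation: Store the value in Secret Manager and expose it to the function either as a mounted volume or as an environment variable that references the secret.
Benefit: Rotation needs no code change. A secret mounted as a volume and pinned to the latest version is fetched from Secret Manager when the file is read, so the function picks up a new value without a redeploy. A secret exposed as an environment variable is resolved when an instance starts, so running instances keep the old value until they are replaced.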
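A minimal sketch of the volume-mount approach, assuming the secret is mounted at the hypothetical path /secrets/api-key: reading the file on each use, rather than caching the value at import time, is what lets a rotated secret take effect without a restart.

```python
from pathlib import Path

SECRET_PATH = Path("/secrets/api-key")  # hypothetical mount path chosen at deploy time


def read_secret(path: Path = SECRET_PATH) -> str:
    """Read the secret on each call so a rotated value is seen without a restart."""
    return path.read_text(encoding="utf-8").strip()
```

If reads are frequent, a short-lived cache (a few minutes) is a reasonable compromise between freshness and file access overhead.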

5. Orchestration: Cloud Workflows

When your serverless architecture grows beyond 3-4 services, you shouldn’t use “Chaining” (where A calls B, B calls C). This creates “Distributed Spaghetti.”

5.1 Why Workflows?

Cloud Workflows is a serverless orchestrator that allows you to chain services with:

Retries: Configurable retry policies with exponential backoff if a service is down.
State Management: Storing variables across different steps.
Long-Running: A single workflow can wait for up to 1 year.
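To show what an exponential backoff policy means in practice, the sketch below computes the wait before each retry. The parameter names mirror the fields of a Workflows retry policy (max_retries, initial_delay, max_delay, multiplier), but the function itself is an illustration of the schedule, not the service's implementation.

```python
def backoff_delays(max_retries: int, initial_delay: float,
                   max_delay: float, multiplier: float) -> list:
    """Return the delay (in seconds) applied before each retry attempt."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        delays.append(min(delay, max_delay))  # never wait longer than max_delay
        delay *= multiplier
    return delays


print(backoff_delays(max_retries=5, initial_delay=1, max_delay=10, multiplier=2))
# prints: [1, 2, 4, 8, 10]
```

Capping the delay keeps a long outage from stretching individual waits indefinitely, while the growing intervals avoid hammering a service that is already struggling.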

5.2 Workflow Example (YAML)

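main:
  steps:
    # Step 1: POST the text to the translation service and keep the HTTP response.
    - call_translation_service:
        call: http.post
        args:
          url: https://translate-api-xyz.a.run.app
          body:
            text: "Hello World"
        result: translation_output
    # Step 2: forward the response body from step 1 to the database API.
    - save_to_db:
        call: http.post
        args:
          url: https://db-api-xyz.a.run.app
          body:
            data: ${translation_output.body}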

3. App Engine: The Mature PaaS

App Engine was Google’s first cloud product (2008) and was actually the first “serverless” platform, predating AWS Lambda by six years. While Cloud Run is the modern choice for new projects, App Engine remains a powerful Platform-as-a-Service for traditional web apps. AWS Elastic Beanstalk and Azure App Service are the closest equivalents.

Honest Assessment: For new projects in 2024+, Cloud Run is almost always the better choice. App Engine Standard has some unique advantages (faster cold starts, simpler deployment for supported runtimes), but Cloud Run’s container-based model is more flexible and avoids the “App Engine lock-in” concern where your deployment tooling becomes tightly coupled to App Engine-specific configuration files (app.yaml).

Standard vs. Flexible

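Feature	Standard Environment	Flexible Environment
Scaling	Scales in seconds; can scale to zero	Scales in minutes; cannot scale to zero (at least one instance always runs)
Runtime	Specific supported versions (Python 3.10, Node 18, etc.)	Any (Docker-based)
Infrastructure	Google-managed sandbox	Compute Engine virtual machines
Networking	Reaches your VPC through a Serverless VPC Access connector	Runs inside your VPC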

Traffic Splitting

App Engine makes “Canary Deployments” incredibly easy. You can deploy a new version and split traffic (e.g., 95% to v1, 5% to v2) based on IP address, cookies, or random selection.
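The idea behind splitting by IP address or cookie is that the same client should land on the same version every time. The sketch below shows one way to get that property with a hash; it is an illustrative assumption, not App Engine's actual algorithm, and the version names are placeholders.

```python
import hashlib


def pick_version(client_key: str, canary_percent: int,
                 stable: str = "v1", canary: str = "v2") -> str:
    """Deterministically map a client (for example an IP address) to a version."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return canary if bucket < canary_percent else stable
```

Calling pick_version("203.0.113.7", 5) always returns the same version for that address, and across many clients about 5% are routed to the canary. Random selection, by contrast, is simpler but can bounce one user between versions from request to request.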

4. Performance Tuning: CPU Allocation and Probes

4.1 CPU Allocation: “Always On” vs. “During Requests”

By default, Cloud Run only allocates CPU during request processing.

Startup CPU Boost: Cloud Run can temporarily allocate extra CPU while an instance starts, which reduces cold start latency for runtimes with heavy initialization.
Always-on CPU: You can choose to allocate CPU even when no requests are being processed. This is useful for background tasks or maintaining heavy in-memory caches.

4.2 Startup Probes

If your container takes a long time to initialize (e.g., a heavy Java app), use Startup Probes. Cloud Run will wait until the probe succeeds before sending any traffic to the instance, ensuring users don’t see 503 errors during scale-up.
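The container's side of a startup probe can be as small as the sketch below: the server answers 503 until initialization finishes and 200 afterwards. The /startup path mentioned in the comment, the load_heavy_state placeholder, and the use of the standard-library HTTP server are assumptions for illustration; the probe path and thresholds themselves are configured on the Cloud Run service.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ready = threading.Event()  # set once slow initialization has finished


def probe_status(is_ready: bool) -> int:
    """HTTP status for the probe: 200 when ready, 503 while still warming up."""
    return 200 if is_ready else 503


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every path answers 503 until initialization completes, so a probe
        # on any path (for example a hypothetical /startup) reflects readiness.
        self.send_response(probe_status(ready.is_set()))
        self.end_headers()


def load_heavy_state():
    # Placeholder for slow work such as loading a model or warming a cache.
    ready.set()


def main():
    port = int(os.environ.get("PORT", "8080"))  # Cloud Run passes the port in PORT
    threading.Thread(target=load_heavy_state, daemon=True).start()
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Starting the listener before the heavy work begins lets the probe observe progress, instead of timing out against a port that is not yet open.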

5. Event-Driven Architecture with Eventarc

Eventarc is the glue that connects GCP services. It allows you to build decoupled systems where a change in one service triggers an action in another.

Flow: An event occurs (e.g., a file is uploaded to GCS) → Eventarc captures it → a matching Trigger routes it to a destination (e.g., a Cloud Function or Cloud Run service).
Format: It uses the CloudEvents standard, ensuring your event-driven code is portable.

5. Serverless Security and Connectivity

VPC Connector

Serverless services live outside your VPC by default. To reach a private Cloud SQL instance or a Redis cache in your VPC, you need a path into the VPC: either a Serverless VPC Access Connector or, on Cloud Run, Direct VPC Egress.

The connector creates a bridge between the serverless environment and your VPC, allowing traffic to flow over private internal IPs.

IAM and the “Least Privilege” Service Account

By default, serverless services use the “Default Compute Service Account,” which is far too powerful. Always:

Create a Custom Service Account.
Grant only the necessary roles (e.g., roles/storage.objectViewer).
Assign that SA to the Cloud Run service or Cloud Function.
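One way to check whether a service account is over-privileged is to scan the project's IAM policy for broad roles. The sketch below works on the JSON structure of an IAM policy (a "bindings" list of role and members); the choice of which roles count as too broad and the example account names are assumptions.

```python
BROAD_ROLES = {"roles/owner", "roles/editor"}  # assumed definition of "too powerful"


def broad_roles_for(policy: dict, service_account_email: str) -> list:
    """List the broad roles bound directly to the given service account."""
    member = f"serviceAccount:{service_account_email}"
    return sorted(
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", []) and binding["role"] in BROAD_ROLES
    )
```

Running this against a policy exported as JSON for both the default compute account and your custom account gives a quick before-and-after view of the least-privilege change. It only inspects direct project-level bindings, so roles inherited from folders or granted through groups are out of scope.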

6. Interview Preparation: Architectural Deep Dive

1. Q: How does Cloud Run handle concurrency compared to traditional Functions-as-a-Service (FaaS)?
A: Traditional FaaS (like AWS Lambda or GCF 1st Gen) handles one request per instance. This causes many “cold starts” during traffic spikes. Cloud Run supports concurrency (up to 1,000 requests per instance). This means a single container instance can serve multiple users simultaneously, drastically reducing the number of cold starts and significantly lowering costs for high-traffic services.

2. Q: What is the purpose of the “Serverless VPC Access Connector”?
A: Cloud Run and Cloud Functions live in a Google-managed tenant project outside your VPC. By default, they cannot reach resources with private IPs (like Cloud SQL or Memorystore). The VPC Access Connector creates a bridge (via a small VM subnet) that allows the serverless service to route traffic to your VPC’s internal IP addresses securely, without going over the public internet.

3. Q: When should you use Cloud Run “Jobs” instead of “Services”?
A: Use Services for request-driven workloads that listen for HTTP/gRPC requests (APIs, web apps) and scale to zero. Use Jobs for task-driven workloads that run to completion and do not have an HTTP endpoint. Examples for Jobs include database migrations, scheduled data processing, or batch report generation. Jobs can be triggered manually or via a cron schedule (Cloud Scheduler).

4. Q: Explain the concept of “Min Instances” and the cost-performance trade-off.
A: Setting --min-instances (e.g., to 2) keeps a “warm pool” of instances always running. Benefit: It eliminates cold start latency for the first requests. Trade-off: You are billed for these instances even if they are idle. This is a “pay for performance” model used for latency-sensitive production APIs where a 2-second cold start is unacceptable.

5. Q: How does “Traffic Splitting” in App Engine or Cloud Run facilitate Canary deployments?
A: Traffic splitting allows you to deploy a new version (Revision) of your service without routing 100% of traffic to it. You can specify a split (e.g., 95% to v1, 5% to v2) based on random selection or tags. This allows you to monitor the health and performance of the new version in production with a small subset of real users before completing the rollout, minimizing the “Blast Radius” of potential bugs.

Implementation: The “Serverless Pro” Lab

Deploying a Multi-Region Cloud Run API

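# 0. Set the project used in the image paths and service account below (placeholder value)
PROJECT_ID=my-project

# 1. Deploy the service to two regions for global coverage
# Why a registry location per continent: Pulling images across continents can add 2-5 seconds
# to cold start time. Using the multi-regional Artifact Registry hosts (us-docker vs
# europe-docker) keeps image pulls close to each service.
# Why custom service account: The default compute SA has Editor access -- way too broad.
gcloud run deploy api-us \
    --image=us-docker.pkg.dev/$PROJECT_ID/my-repo/api:v1 \
    --region=us-central1 \
    --service-account=my-api-sa@$PROJECT_ID.iam.gserviceaccount.com

gcloud run deploy api-eu \
    --image=europe-docker.pkg.dev/$PROJECT_ID/my-repo/api:v1 \
    --region=europe-west1 \
    --service-account=my-api-sa@$PROJECT_ID.iam.gserviceaccount.com

# 2. Use a Global HTTP(S) Load Balancer to route to both
# This provides a single global IP that routes users to the nearest Cloud Run region.
# AWS equivalent: You would need CloudFront + two regional ALBs + Route 53 latency routing.
# Cost: The LB itself is ~$18/month + $0.008-$0.012 per GB of data processed. Negligible for
# the global routing benefit you get.
# Each region attaches to the load balancer through a serverless NEG (NEG names are placeholders):
gcloud compute network-endpoint-groups create api-us-neg \
    --region=us-central1 \
    --network-endpoint-type=serverless \
    --cloud-run-service=api-us

gcloud compute network-endpoint-groups create api-eu-neg \
    --region=europe-west1 \
    --network-endpoint-type=serverless \
    --cloud-run-service=api-eu
# Then add both NEGs to one global backend service behind the load balancer's URL map.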

Pro-Tip: Cloud Run Revision Tags

When you deploy a new version of a Cloud Run service, use Revision Tags:

gcloud run deploy --image=... --tag=beta --no-traffic

This allows you to access the new version at a specific URL (beta---my-service-xyz.a.run.app) for testing before you route any production traffic to it. The --no-traffic flag keeps production traffic on the current revision until you shift it explicitly.
