API Gateway
If you have 10 microservices, you don't want the frontend to know 10 different URLs. You need a single entry point. Spring Cloud Gateway is the standard, built on top of Spring WebFlux (Non-blocking I/O).
1. Why an API Gateway?
- Routing: Single domain (api.myapp.com) routes to multiple services.
- Security: Centralized Authentication/Authorization (OAuth2).
- Rate Limiting: Protect your backend from DDoS.
- Monitoring: Log every request entering the system.
2. Setup
- New Spring Boot Project.
- Dependencies: spring-cloud-starter-gateway, spring-cloud-starter-netflix-eureka-client.
- Do not include spring-boot-starter-web (Tomcat). Gateway uses Netty.
3. Configuration (Routing)
You can route requests based on paths, configured in application.yml. For example, a request to localhost:8080/api/v1/users/1 is forwarded to USER-SERVICE/users/1.
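The forwarding example above implies that the first two path segments (`/api/v1`) are dropped before the request reaches USER-SERVICE. Spring Cloud Gateway ships a StripPrefix filter for this kind of rewrite; the plain-Java sketch below only illustrates the transformation itself. The class and method names are hypothetical and are not part of the gateway API.

```java
final class PathRewriteSketch {

    // Drops the first `parts` non-empty segments of a request path.
    // Hypothetical helper for illustration; not the gateway's own code.
    static String stripPrefix(String path, int parts) {
        StringBuilder out = new StringBuilder();
        int skipped = 0;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue;            // the leading "/" yields an empty first element
            }
            if (skipped < parts) {
                skipped++;           // still inside the prefix to remove
                continue;
            }
            out.append('/').append(segment);
        }
        // If every segment was stripped, fall back to the root path.
        return out.length() == 0 ? "/" : out.toString();
    }
}
```

Under these assumptions, `PathRewriteSketch.stripPrefix("/api/v1/users/1", 2)` yields `/users/1`, which matches the forwarded path in the example.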
4. Custom Filters
You can write Global Filters to intercept every request (e.g., logging).
5. Gateway Authentication
Using Spring Security at the gateway level.
6. How it Works (Internal Flow)
Spring Cloud Gateway is built on the Reactor Netty (Non-blocking) server.
7. Rate Limiting (Redis)
Spring Cloud Gateway has a built-in RequestRateLimiter filter that uses Redis and the Token Bucket algorithm.
Dependency: spring-boot-starter-data-redis-reactive.
Config:
Interview Deep-Dive
Why is Spring Cloud Gateway built on WebFlux (Netty) instead of Spring MVC (Tomcat)? What would happen if you accidentally included spring-boot-starter-web in a Gateway project?
Strong Answer:
- A gateway is fundamentally a proxy — it receives a request, routes it to a backend service, and streams the response back. It does almost no computation; it is I/O-bound. The performance bottleneck is how many concurrent connections it can hold open while waiting for backend responses.
- Tomcat (Spring MVC) uses a thread-per-request model. Default: 200 threads. If each backend call takes 500ms, you can handle 400 requests/second before thread exhaustion. To handle 10,000 concurrent connections, you would need 10,000 threads — each consuming ~1MB of stack memory (10GB just for thread stacks).
- Netty (WebFlux) uses an event-loop model with a small number of threads (default: CPU cores). A single thread can manage thousands of connections because it never blocks waiting for I/O. When a backend response arrives, the event loop picks it up and forwards it. With 8 cores, Netty can handle 50,000+ concurrent connections using ~200MB of memory.
- If you include spring-boot-starter-web alongside the gateway starter, Spring Boot detects Tomcat on the classpath and tries to start a servlet-based context. But Spring Cloud Gateway requires a reactive context. You get a startup error or, worse, the gateway starts but behaves unpredictably because WebFlux filters and servlet filters have incompatible lifecycles. The fix: exclude spring-boot-starter-web and only use spring-cloud-starter-gateway (which transitively includes spring-boot-starter-webflux).
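The thread-per-request numbers in the answer above follow from simple arithmetic: throughput is bounded by threads divided by per-request latency, and stack memory grows linearly with the thread count. A minimal sketch of that calculation, with hypothetical class and method names and the ~1MB stack size taken from the answer as an assumption:

```java
final class GatewayCapacitySketch {

    // Upper bound on sustained throughput when every request holds one thread
    // for the full backend latency (threads / latency, per Little's law).
    static double maxRequestsPerSecond(int threads, double backendLatencySeconds) {
        return threads / backendLatencySeconds;
    }

    // Approximate memory reserved for thread stacks, in megabytes.
    // stackMbPerThread is an assumption; the real value depends on JVM settings.
    static long stackMemoryMb(long threads, long stackMbPerThread) {
        return threads * stackMbPerThread;
    }
}
```

With 200 threads and 500ms backend calls, `maxRequestsPerSecond(200, 0.5)` gives 400 requests/second, and `stackMemoryMb(10_000, 1)` gives 10,000 MB, roughly the 10GB quoted for 10,000 threads.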
If a filter blocks an event-loop thread (for example with Thread.sleep(), synchronous JDBC, or .block() on a Mono), you freeze all connections handled by that thread. Reactor's BlockHound tool detects this in tests. For the slow-backend scenario specifically, set timeouts at the gateway level (spring.cloud.gateway.httpclient.response-timeout) and pair with a circuit breaker. The gateway returns 504 Gateway Timeout to the client, freeing the connection for other requests.
How would you implement authentication and authorization at the API Gateway level for a microservices architecture with 20 backend services?
Explain the Token Bucket algorithm used by Spring Cloud Gateway's rate limiter. How does it differ from fixed window rate limiting, and when does each fail?
Strong Answer:
- Token Bucket: imagine a bucket that holds burstCapacity tokens. Tokens are added at replenishRate per second. Each request consumes one token. If the bucket is empty, the request is rejected (429). The bucket never exceeds burstCapacity. This allows short bursts (with a burstCapacity of 20, a user can fire 20 requests at once if the bucket is full) while enforcing a long-term average rate.
- Fixed Window: divide time into fixed intervals (e.g., 1-minute windows). Count requests per window. If the count exceeds the limit, reject. The problem: boundary attacks. A user sends 100 requests at 11:59:59 (within the first window's limit) and 100 more at 12:00:01 (within the second window's limit). In a 2-second span, they sent 200 requests while each window only allows 100.
- Token Bucket avoids the boundary attack because it has no windows; it is continuous. The bucket has at most burstCapacity tokens at any point, so the maximum burst is capped regardless of timing.
- Spring Cloud Gateway's implementation uses Redis with a Lua script, which Redis executes atomically. The script checks and updates token count and timestamp in a single Redis operation, preventing race conditions when multiple gateway instances process requests for the same user simultaneously.
- Failure mode of Token Bucket: if Redis is down, the rate limiter cannot check tokens. The default behavior is to allow the request (fail open), which means no rate limiting during a Redis outage. For critical rate limiting, you need a local fallback.
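To make the comparison above concrete, here is a minimal in-memory sketch of both algorithms. It is an illustration only, not Spring Cloud Gateway's Redis-backed implementation: the class names are hypothetical, state lives in a single JVM, and the caller passes the current time explicitly so the boundary scenario can be replayed deterministically.

```java
final class TokenBucket {
    private final double replenishRate;   // tokens added per second
    private final double burstCapacity;   // maximum tokens the bucket can hold
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double replenishRate, double burstCapacity, long nowMillis) {
        this.replenishRate = replenishRate;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity;       // the bucket starts full
        this.lastRefillMillis = nowMillis;
    }

    // Refills based on elapsed time, then tries to consume one token.
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis > lastRefillMillis) {
            double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
            tokens = Math.min(burstCapacity, tokens + elapsedSeconds * replenishRate);
            lastRefillMillis = nowMillis;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;                   // request allowed
        }
        return false;                      // bucket empty: caller would answer 429
    }
}

final class FixedWindowCounter {
    private final int limit;               // requests allowed per window
    private final long windowMillis;       // window length
    private long currentWindow = -1;
    private int count;

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Resets the counter whenever the request falls into a new window.
    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;
        if (window != currentWindow) {
            currentWindow = window;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

Replaying the boundary scenario (100 requests one second before a minute boundary, 100 more one second after) against `new FixedWindowCounter(100, 60_000)` admits all 200. A `TokenBucket` created full with a rate of 100 tokens per minute and a capacity of 100 admits the first 100 and then only the 3 tokens refilled during the 2-second gap.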
For per-tier rate limits, one option is a KeyResolver that returns a composite key: {userId}:{tier}. Then configure multiple route instances or use a custom RateLimiter implementation that reads tier-specific limits from a configuration source (Redis hash, database, or Spring Cloud Config). The cleaner approach: resolve the user tier in a pre-filter, set it as a request attribute, and write a custom RateLimiter that reads the attribute and applies the corresponding limit. This keeps the routing configuration simple while allowing arbitrary tier logic.
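The composite-key idea can be sketched without any framework code. The example below assumes a hypothetical in-memory tier table (the tier names and numbers are invented for illustration) and shows only the key construction and limit lookup. In a real gateway this logic would sit behind the KeyResolver and a custom RateLimiter, with limits loaded from one of the configuration sources named above.

```java
final class TierLimitsSketch {

    // Per-tier token bucket settings; field names mirror the gateway's terms.
    static final class Limit {
        final int replenishRate;
        final int burstCapacity;

        Limit(int replenishRate, int burstCapacity) {
            this.replenishRate = replenishRate;
            this.burstCapacity = burstCapacity;
        }
    }

    // Fallback for unknown tiers (assumed value, for illustration only).
    static final Limit DEFAULT_LIMIT = new Limit(1, 5);

    // Hypothetical tier table; a real one would come from Redis, a database, or config.
    static final java.util.Map<String, Limit> LIMITS = java.util.Map.of(
            "free", new Limit(5, 10),
            "pro", new Limit(50, 100));

    // Builds the {userId}:{tier} key described in the text.
    static String compositeKey(String userId, String tier) {
        return userId + ":" + tier;
    }

    // Extracts the tier from the composite key and looks up its limit.
    static Limit limitFor(String compositeKey) {
        int separator = compositeKey.lastIndexOf(':');
        String tier = separator < 0 ? "" : compositeKey.substring(separator + 1);
        return LIMITS.getOrDefault(tier, DEFAULT_LIMIT);
    }
}
```

Parsing the tier back out of the key keeps the lookup stateless, but it couples the limiter to the key format; passing the tier as a request attribute, as the text suggests, avoids that coupling.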