API Gateway

If you have 10 microservices, you don’t want the frontend to know 10 different URLs. You need a single entry point. Spring Cloud Gateway is the usual choice in the Spring ecosystem; it is built on top of Spring WebFlux and uses non-blocking I/O.

1. Why an API Gateway?

Routing: Single domain (api.myapp.com) routes to multiple services.
Security: Centralized Authentication/Authorization (OAuth2).
Rate Limiting: Protect your backend from abusive clients and sudden traffic spikes.
Monitoring: Log every request entering the system.

2. Setup

New Spring Boot Project.
Dependencies: spring-cloud-starter-gateway, spring-cloud-starter-netflix-eureka-client.

Note: Do NOT include spring-boot-starter-web (Tomcat). Gateway uses Netty.

3. Configuration (Routing)

You can route requests based on paths. application.yml:

server:
  port: 8080

spring:
  application:
    name: API-GATEWAY
  cloud:
    gateway:
      discovery:
        locator:
          enabled: true # Automatically create routes for services in Eureka
          lower-case-service-id: true
      routes:
        - id: user-service
          uri: lb://USER-SERVICE # lb = Load Balanced
          predicates:
            - Path=/api/v1/users/**
          filters:
            - RewritePath=/api/v1/users/(?<segment>.*), /users/$\{segment}

Now, a request to localhost:8080/api/v1/users/1 is forwarded to USER-SERVICE/users/1.
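The rewrite step can be tried in isolation. The following is a minimal plain-Java sketch, not Gateway's own code: it assumes that the effective replacement is `/users/${segment}` once the YAML escape `$\{segment}` has been resolved, and the class and method names are made up for illustration.

```java
// Plain-Java model of the RewritePath filter from the route above.
class RewritePathDemo {

    // Same regular expression as in the route definition.
    static final String REGEX = "/api/v1/users/(?<segment>.*)";

    // Assumed effective replacement after the YAML escape is removed.
    static final String REPLACEMENT = "/users/${segment}";

    static String rewrite(String path) {
        // String.replaceAll understands named-group references such as ${segment}.
        return path.replaceAll(REGEX, REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/api/v1/users/1"));         // /users/1
        System.out.println(rewrite("/api/v1/users/42/orders")); // /users/42/orders
    }
}
```

Everything after /api/v1/users/ is captured in the named group and appended to /users/, so nested paths are carried over unchanged.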

4. Custom Filters

You can write Global Filters to intercept every request (e.g., logging).

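import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Component
public class LoggingFilter implements GlobalFilter {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // Pre-processing: runs before the request is routed to the backend
        System.out.println("Request Path: " + exchange.getRequest().getPath());

        // Post-processing: runs once the rest of the chain has completed
        return chain.filter(exchange)
            .then(Mono.fromRunnable(() ->
                System.out.println("Response Status: " + exchange.getResponse().getStatusCode())));
    }
}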

5. Gateway Authentication

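You can apply Spring Security at the gateway level. The configuration below needs the spring-boot-starter-oauth2-resource-server dependency and a configured JWT issuer (for example spring.security.oauth2.resourceserver.jwt.issuer-uri) so the gateway knows where to fetch the signing keys.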

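import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.reactive.EnableWebFluxSecurity;
import org.springframework.security.config.web.server.ServerHttpSecurity;
import org.springframework.security.web.server.SecurityWebFilterChain;

@Configuration
@EnableWebFluxSecurity
public class SecurityConfig {
    @Bean
    public SecurityWebFilterChain springSecurityFilterChain(ServerHttpSecurity http) {
        http
            .csrf(ServerHttpSecurity.CsrfSpec::disable)
            .authorizeExchange(exchanges -> exchanges
                .pathMatchers("/public/**").permitAll() // Public endpoints
                .anyExchange().authenticated() // Everything else requires auth
            )
            .oauth2ResourceServer(oauth2 -> oauth2.jwt(Customizer.withDefaults()));

        return http.build();
    }
}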

This verifies the JWT Token before the request even reaches your microservices.

6. How it Works (Internal Flow)

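Spring Cloud Gateway runs on Reactor Netty, a non-blocking server. An incoming request first reaches the Gateway Handler Mapping, which matches it against the route predicates. A matched request is handed to the Gateway Web Handler, which runs it through that route's filter chain: the "pre" logic of each filter runs, the request is proxied to the backend, and the "post" logic runs as the response travels back.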
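The order in which filters run their pre- and post-processing steps (as in the LoggingFilter from section 4) can be modeled with a small synchronous chain. This is a simplified sketch: the real chain is reactive and returns Mono<Void>, and the Filter interface and helper names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, synchronous model of a pre/post filter chain.
class FilterChainDemo {

    interface Filter {
        void apply(List<String> log, Runnable next);
    }

    // Builds a filter that logs before and after delegating to the rest of the chain.
    static Filter named(String name) {
        return (log, next) -> {
            log.add(name + " pre");
            next.run();
            log.add(name + " post");
        };
    }

    // Runs the filters in order; the last step stands in for the backend call.
    static List<String> run(List<Filter> filters) {
        List<String> log = new ArrayList<>();
        invoke(filters, 0, log);
        return log;
    }

    private static void invoke(List<Filter> filters, int index, List<String> log) {
        if (index == filters.size()) {
            log.add("proxy to backend");
            return;
        }
        filters.get(index).apply(log, () -> invoke(filters, index + 1, log));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(named("auth"), named("logging"))));
        // [auth pre, logging pre, proxy to backend, logging post, auth post]
    }
}
```

Filters wrap each other, so the first filter to see the request is the last one to see the response.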

7. Rate Limiting (Redis)

Spring Cloud Gateway has a built-in RequestRateLimiter filter that uses Redis and the Token Bucket algorithm. Dependency: spring-boot-starter-data-redis-reactive. Config:

spring:
  cloud:
    gateway:
      routes:
        - id: user-service
          uri: lb://USER-SERVICE
          predicates:
            - Path=/api/v1/users/** # A route needs at least one predicate
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 10 # Tokens added per second
                redis-rate-limiter.burstCapacity: 20 # Max tokens in bucket
                key-resolver: "#{@userKeyResolver}"

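The key-resolver value is a SpEL expression that refers to a KeyResolver bean named userKeyResolver, which you have to define yourself; it decides which value (for example, the authenticated user's name) gets its own bucket. With a per-user key, no single user can overload your system.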

Interview Deep-Dive

Why is Spring Cloud Gateway built on WebFlux (Netty) instead of Spring MVC (Tomcat)? What would happen if you accidentally included spring-boot-starter-web in a Gateway project?

Strong Answer:

A gateway is fundamentally a proxy — it receives a request, routes it to a backend service, and streams the response back. It does almost no computation; it is I/O-bound. The performance bottleneck is how many concurrent connections it can hold open while waiting for backend responses.
Tomcat (Spring MVC) uses a thread-per-request model. Default: 200 threads. If each backend call takes 500ms, you can handle 400 requests/second before thread exhaustion. To handle 10,000 concurrent connections, you would need 10,000 threads — each consuming ~1MB of stack memory (10GB just for thread stacks).
Netty (WebFlux) uses an event-loop model with a small number of threads (default: CPU cores). A single thread can manage thousands of connections because it never blocks waiting for I/O. When a backend response arrives, the event loop picks it up and forwards it. With 8 cores, Netty can hold tens of thousands of concurrent connections in a fraction of that memory, because an idle connection costs a socket and some buffers rather than a thread stack.
If you include spring-boot-starter-web alongside the gateway starter, Spring MVC ends up on the classpath and Spring Boot defaults to a servlet-based application context. Spring Cloud Gateway requires a reactive context, so in current versions startup fails with an error saying that Spring MVC is incompatible with Spring Cloud Gateway. The fix: remove spring-boot-starter-web and only use spring-cloud-starter-gateway (which transitively includes spring-boot-starter-webflux).
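The thread-per-request numbers in this answer follow from simple arithmetic, sketched below. The helper names are illustrative, and the 1 MB stack size is the rough figure used above rather than a measured value.

```java
// Back-of-the-envelope model of a thread-per-request server.
class ThreadModelMath {

    // Little's Law rearranged: throughput = concurrency / latency.
    static double maxRequestsPerSecond(int threads, double latencySeconds) {
        return threads / latencySeconds;
    }

    // Stack memory reserved if every connection gets its own thread.
    static long stackMegabytes(int threads, int stackMbPerThread) {
        return (long) threads * stackMbPerThread;
    }

    public static void main(String[] args) {
        System.out.println(maxRequestsPerSecond(200, 0.5)); // 400.0
        System.out.println(stackMegabytes(10_000, 1));      // 10000
    }
}
```

With 200 threads and 500 ms per backend call the ceiling is 400 requests per second, and 10,000 threads at about 1 MB each reserve roughly 10 GB for stacks alone.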

Follow-up: How do you handle the situation where a downstream service is slow and the gateway’s event loop threads are all waiting?

Event-loop threads should never block. If a downstream call is slow, the Netty event loop schedules a callback and moves on to handle other connections. The “waiting” is not a thread sitting idle; it is a registered interest in a file descriptor that the OS notifies when data arrives. If you accidentally block an event-loop thread (calling Thread.sleep(), synchronous JDBC, or .block() on a Mono inside a filter), you freeze all connections handled by that thread. Reactor’s BlockHound tool detects this in tests. For the slow-backend scenario specifically, set timeouts at the gateway level (spring.cloud.gateway.httpclient.response-timeout) and pair with a circuit breaker. The gateway returns 504 Gateway Timeout to the client, freeing the connection for other requests.
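The timeout idea can be illustrated with the JDK alone. The sketch below is an analogy, not Gateway code: CompletableFuture stands in for the reactive pipeline, callBackend is a hypothetical helper that simulates a backend, and the status codes are plain integers.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// A slow downstream call is bounded by a timeout and mapped to 504
// without a thread being parked while waiting for the response.
class TimeoutDemo {

    // Simulates a backend that answers 200 after delayMillis.
    static CompletableFuture<Integer> callBackend(long delayMillis) {
        return CompletableFuture.supplyAsync(
                () -> 200,
                CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
    }

    // Returns the backend status, or 504 if no response arrives in time.
    static CompletableFuture<Integer> proxy(long backendDelayMillis, long timeoutMillis) {
        return callBackend(backendDelayMillis)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> 504);
    }

    public static void main(String[] args) {
        // join() blocks only in this demo; a reactive pipeline would not block.
        System.out.println(proxy(50, 1_000).join());  // 200
        System.out.println(proxy(2_000, 100).join()); // 504
    }
}
```

The caller gets an answer after at most the timeout, regardless of how long the simulated backend takes.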

How would you implement authentication and authorization at the API Gateway level for a microservices architecture with 20 backend services?

Strong Answer:

The gateway handles authentication (who are you?) centrally. Configure it as an OAuth2 Resource Server that validates JWTs: verify the signature against the identity provider’s JWKS endpoint, check expiration, extract claims. This is done in a SecurityWebFilterChain with .oauth2ResourceServer(oauth2 -> oauth2.jwt(...)). Every request without a valid JWT is rejected with 401 before reaching any backend service.
For coarse-grained authorization at the gateway: use path-based rules. /admin/** requires ROLE_ADMIN. /public/** permits all. This prevents unauthorized requests from even reaching backend services, reducing their load.
For fine-grained authorization: propagate the validated JWT (or extracted claims as headers) to backend services. The Order Service knows that POST /orders requires ROLE_CUSTOMER and GET /orders/admin/report requires ROLE_ADMIN. Backend services trust the gateway’s token validation and focus on business-level authorization.
Implementation pattern: a GlobalFilter extracts claims from the validated JWT and adds them as headers: X-User-Id, X-User-Roles, X-User-Email. Backend services read these headers instead of re-parsing the JWT. This avoids every service needing the JWKS endpoint and the token parsing overhead.
Security consideration: backend services must ONLY be reachable through the gateway. If someone can bypass the gateway (misconfigured Kubernetes NetworkPolicy, exposed NodePort), the X-User-Id header can be spoofed. Enforce network-level isolation: backend services reject traffic not originating from the gateway’s pod CIDR or service account.
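The header-propagation pattern and the spoofing concern can be sketched in plain Java. This is a minimal illustration under stated assumptions: headers are modeled as a simple map, forwardHeaders is a hypothetical helper rather than a Gateway API, and the user ID and roles are assumed to come from an already validated JWT.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ClaimHeaderDemo {

    // Builds the headers forwarded to a backend: any X-User-* header sent by
    // the client is dropped, then the values from the validated JWT are set.
    static Map<String, String> forwardHeaders(Map<String, String> incoming,
                                              String userId, List<String> roles) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : incoming.entrySet()) {
            if (!e.getKey().toLowerCase(Locale.ROOT).startsWith("x-user-")) {
                out.put(e.getKey(), e.getValue());
            }
        }
        out.put("X-User-Id", userId);
        out.put("X-User-Roles", String.join(",", roles));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> incoming = Map.of(
                "Accept", "application/json",
                "x-user-id", "admin"); // spoofing attempt by the client
        System.out.println(forwardHeaders(incoming, "u-42", List.of("ROLE_CUSTOMER")));
    }
}
```

Stripping client-supplied identity headers at the gateway complements network isolation; it does not replace it, because a caller that bypasses the gateway never passes through this step.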

Follow-up: How do you handle token refresh when a long-running request spans multiple backend calls and the token expires mid-chain?

JWTs should have a lifetime longer than your maximum request chain duration. If your deepest call chain takes 10 seconds, a JWT with a 5-minute expiry is fine. For truly long-running operations (file uploads, batch processing), use a different pattern: the gateway validates the JWT and issues a short-lived internal token (or passes a correlation ID) that backend services trust implicitly. Alternatively, use opaque tokens with introspection for long-running flows: the token can be revoked or refreshed server-side without the client being involved. The key insight: JWTs are great for stateless validation at the gateway edge, but for internal service-to-service communication behind the gateway, you have more flexibility because the network is trusted.

Explain the Token Bucket algorithm used by Spring Cloud Gateway's rate limiter. How does it differ from fixed window rate limiting, and when does each fail?

Strong Answer:

Token Bucket: imagine a bucket that holds burstCapacity tokens. Tokens are added at replenishRate per second. Each request consumes one token. If the bucket is empty, the request is rejected (429). The bucket never exceeds burstCapacity. This allows short bursts (a user fires 20 requests at once if the bucket is full) while enforcing a long-term average rate.
Fixed Window: divide time into fixed intervals (e.g., 1-minute windows). Count requests per window. If the count exceeds the limit, reject. The problem: boundary attacks. A user sends 100 requests at 11:59:59 (within the first window’s limit) and 100 more at 12:00:01 (within the second window’s limit). In a 2-second span, they sent 200 requests while each window only allows 100.
Token Bucket avoids the boundary attack because it does not have windows — it is continuous. The bucket has at most burstCapacity tokens at any point, so the maximum burst is capped regardless of timing.
Spring Cloud Gateway’s implementation uses Redis with a Lua script. Redis executes a script atomically (no MULTI/EXEC transaction is needed), so the token count and timestamp are checked and updated in a single operation, preventing race conditions when multiple gateway instances process requests for the same user simultaneously.
Failure mode of the Redis-backed Token Bucket: if Redis is down, the rate limiter cannot check tokens. The default behavior is to allow the request (fail open), which means no rate limiting during a Redis outage. For critical rate limiting, you need a local fallback.
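The difference between the two algorithms shows up in a small in-memory simulation. This is a sketch for illustration only, not the Redis and Lua implementation the gateway uses: time is passed in explicitly so the result is deterministic, and all class names are hypothetical.

```java
import java.util.function.DoublePredicate;

class RateLimitDemo {

    static class TokenBucket {
        private final double replenishRate; // tokens added per second
        private final double burstCapacity; // maximum tokens in the bucket
        private double tokens;
        private double lastRefillSeconds;

        TokenBucket(double replenishRate, double burstCapacity) {
            this.replenishRate = replenishRate;
            this.burstCapacity = burstCapacity;
            this.tokens = burstCapacity; // start with a full bucket
        }

        boolean tryAcquire(double nowSeconds) {
            double elapsed = Math.max(0, nowSeconds - lastRefillSeconds);
            tokens = Math.min(burstCapacity, tokens + elapsed * replenishRate);
            lastRefillSeconds = nowSeconds;
            if (tokens >= 1) {
                tokens -= 1;
                return true;
            }
            return false; // a gateway would answer 429 here
        }
    }

    static class FixedWindow {
        private final int limit;
        private final long windowSeconds;
        private long currentWindow = -1;
        private int count;

        FixedWindow(int limit, long windowSeconds) {
            this.limit = limit;
            this.windowSeconds = windowSeconds;
        }

        boolean tryAcquire(double nowSeconds) {
            long window = (long) (nowSeconds / windowSeconds);
            if (window != currentWindow) { // a new window resets the counter
                currentWindow = window;
                count = 0;
            }
            if (count < limit) {
                count++;
                return true;
            }
            return false;
        }
    }

    // Sends `requests` calls at the same instant and counts how many pass.
    static int burst(DoublePredicate limiter, double atSeconds, int requests) {
        int allowed = 0;
        for (int i = 0; i < requests; i++) {
            if (limiter.test(atSeconds)) allowed++;
        }
        return allowed;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(10, 20);
        System.out.println(burst(bucket::tryAcquire, 0.0, 25)); // 20
        System.out.println(burst(bucket::tryAcquire, 1.0, 25)); // 10

        // Boundary case: 100 requests just before and just after a window edge.
        FixedWindow window = new FixedWindow(100, 60);
        System.out.println(burst(window::tryAcquire, 59.0, 100)
                + burst(window::tryAcquire, 61.0, 100)); // 200

        TokenBucket sameAverage = new TokenBucket(100 / 60.0, 100);
        System.out.println(burst(sameAverage::tryAcquire, 59.0, 100)
                + burst(sameAverage::tryAcquire, 61.0, 100)); // 103
    }
}
```

With the same average rate of 100 requests per minute, the fixed window lets all 200 requests through across the boundary, while the token bucket admits the first 100 and then only the 3 tokens refilled in the following two seconds.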

Follow-up: How would you implement different rate limits for different user tiers (free: 10 req/s, premium: 100 req/s) in Spring Cloud Gateway?

Implement a custom KeyResolver that returns a composite key: {userId}:{tier}. Then configure multiple route instances or use a custom RateLimiter implementation that reads tier-specific limits from a configuration source (Redis hash, database, or Spring Cloud Config). The cleaner approach: resolve the user tier in a pre-filter, set it as a request attribute, and write a custom RateLimiter that reads the attribute and applies the corresponding limit. This keeps the routing configuration simple while allowing arbitrary tier logic.
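The tier lookup at the heart of that approach might look like the sketch below. It is plain Java with hypothetical names and does not implement Gateway's RateLimiter interface; the replenish rates come from the question, while the burst capacities (twice the rate) are an assumption.

```java
import java.util.Map;

class TierLimitsDemo {

    // Limits for one tier; field names mirror the redis-rate-limiter arguments.
    static final class Limits {
        final int replenishRate;
        final int burstCapacity;

        Limits(int replenishRate, int burstCapacity) {
            this.replenishRate = replenishRate;
            this.burstCapacity = burstCapacity;
        }
    }

    static final Limits FREE = new Limits(10, 20);

    // In practice this table could be loaded from Redis, a database, or a config server.
    static final Map<String, Limits> BY_TIER = Map.of(
            "free", FREE,
            "premium", new Limits(100, 200));

    // Unknown or missing tiers fall back to the most restrictive limits.
    static Limits limitsFor(String tier) {
        return tier == null ? FREE : BY_TIER.getOrDefault(tier, FREE);
    }

    // Composite key so each user gets a separate bucket per tier.
    static String bucketKey(String userId, String tier) {
        return userId + ":" + (tier == null ? "free" : tier);
    }

    public static void main(String[] args) {
        System.out.println(limitsFor("premium").replenishRate); // 100
        System.out.println(bucketKey("u-42", "premium"));       // u-42:premium
    }
}
```

Falling back to the free tier for unknown values keeps a missing or malformed tier claim from granting higher limits.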
