Microservices Foundations

Before writing code, we must understand why we are building microservices.

1. Monolith vs Microservices

The Monolith

A single deployment unit containing all business logic, UI, and data access.
Pros:
  • Simple to develop, test, and deploy (initially).
  • No network latency between calls.
  • Easy to debug (single stack trace).
Cons:
  • Scaling: Can only scale the whole app (X-axis scaling), not specific bottlenecks.
  • Technology Lock-in: Hard to switch languages/frameworks.
  • Complexity: Over time, the “Big Ball of Mud” anti-pattern tends to emerge as module boundaries erode.
  • Reliability: If one module causes an OOM error, the whole system crashes.

Microservices

An architectural style in which an application is built as a collection of loosely coupled, independently deployable services.
Pros:
  • Technological Freedom: Each service can use the best tool for the job.
  • Independent Deployment: Deploy User Service without redeploying Order Service.
  • Fault Isolation: If Payment Service fails, users can still browse products.
  • Scaling: Scale only the popular services independently.
Cons:
  • Distributed Complexity: Network failures, latency, consistency issues (CAP Theorem).
  • Operational Overhead: Need sophisticated monitoring, logging, and deployment pipelines.

2. Domain Driven Design (DDD) Basics

DDD is crucial for identifying service boundaries. A microservice should correspond to a Bounded Context.
  • Ubiquitous Language: Common language shared by developers and domain experts.
  • Bounded Context: A boundary within which a particular domain model is defined and applicable.
    • Example: “User” in the Sales Context might refer to a customer with a credit limit. “User” in the Support Context might refer to the person who opened a ticket. These are different models, and they should live in different microservices.
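
As a rough illustration of the example above, the two contexts can each keep their own model of a user. The class and field names here (`SalesUser`, `SupportUser`, `credit_limit`, `open_ticket_ids`) are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


# Sales context: a "User" is a customer with a credit limit.
@dataclass(frozen=True)
class SalesUser:
    user_id: str
    credit_limit: Decimal

    def can_purchase(self, amount: Decimal) -> bool:
        return amount <= self.credit_limit


# Support context: a "User" is someone who opens tickets.
@dataclass(frozen=True)
class SupportUser:
    user_id: str
    open_ticket_ids: Tuple[str, ...] = ()

    def has_open_tickets(self) -> bool:
        return len(self.open_ticket_ids) > 0
```

The two models share only an identifier. Neither service needs to know about the other's fields, which is what lets each one evolve independently.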

3. Communication Patterns

Services need to talk to each other.

Synchronous (Request/Response)

The client waits for a response.
  • HTTP/REST: Widely supported and easy to debug, typically with human-readable JSON payloads.
  • gRPC: High performance, strictly typed (Protobuf).
Drawback: Coupling. If Service B is down, Service A might fail (Cascading Failure).
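
The coupling can be seen in a toy sketch where the remote call is simulated by a plain function. `call_service_b`, `handle_request_in_service_a`, and `ServiceUnavailable` are hypothetical names, not part of any real client library.

```python
class ServiceUnavailable(Exception):
    """Raised by the hypothetical client when Service B cannot be reached."""


def call_service_b(b_is_up: bool) -> dict:
    # Stand-in for an HTTP or gRPC call; a real client would also set a timeout.
    if not b_is_up:
        raise ServiceUnavailable("Service B is down")
    return {"stock": 3}


def handle_request_in_service_a(b_is_up: bool) -> dict:
    # Service A blocks on B's answer, so B's outage becomes A's outage.
    inventory = call_service_b(b_is_up)
    return {"status": "ok", "stock": inventory["stock"]}
```

When `b_is_up` is false, the exception propagates out of Service A's handler, so the caller of A sees a failure even though A itself is healthy.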

Asynchronous (Event-Driven)

The sender publishes a message and continues without waiting for a reply (fire-and-forget).
  • Message Queues: RabbitMQ, ActiveMQ.
  • Event Streaming: Apache Kafka.
Advantage: Decoupling. If Service B is down, the message stays in the queue until B recovers.
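
The decoupling can be sketched with an in-memory queue standing in for a broker. `InMemoryQueue` is a hypothetical toy class; real brokers such as RabbitMQ or Kafka add persistence, acknowledgements, and delivery guarantees that this sketch omits.

```python
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a broker queue; not durable and not thread-safe."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def publish(self, message: dict) -> None:
        self._messages.append(message)

    def drain(self, handler) -> int:
        """Deliver every waiting message to handler, oldest first."""
        delivered = 0
        while self._messages:
            handler(self._messages.popleft())
            delivered += 1
        return delivered


queue = InMemoryQueue()

# Service A publishes while Service B is down; A is not blocked and does not fail.
queue.publish({"event": "OrderPlaced", "order_id": "o-1"})
queue.publish({"event": "OrderPlaced", "order_id": "o-2"})

# Service B recovers and processes the backlog in arrival order.
processed = []
delivered_count = queue.drain(lambda msg: processed.append(msg["order_id"]))
```

After the drain, `processed` holds both order IDs in the order they were published, and the queue is empty.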

4. The 12-Factor App

A methodology for building cloud-native apps:
  1. Codebase: One codebase tracked in revision control, many deploys.
  2. Dependencies: Explicitly declare and isolate dependencies.
  3. Config: Store config in the environment.
  4. Backing services: Treat backing services as attached resources.
  5. Build, release, run: Strictly separate build and run stages.
  6. Processes: Execute the app as one or more stateless processes.
  7. Port binding: Export services via port binding.
  8. Concurrency: Scale out via the process model.
  9. Disposability: Maximize robustness with fast startup and graceful shutdown.
  10. Dev/prod parity: Keep development, staging, and production as similar as possible.
  11. Logs: Treat logs as event streams.
  12. Admin processes: Run admin/management tasks as one-off processes.
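
Factor 3 (Config) is one of the easiest to show in code. The sketch below reads settings from environment variables; the variable names `DATABASE_URL` and `LOG_LEVEL` and the `Settings` class are illustrative assumptions, not something the methodology prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str


def load_settings(env=os.environ) -> Settings:
    # DATABASE_URL and LOG_LEVEL are illustrative variable names.
    return Settings(
        database_url=env["DATABASE_URL"],        # required: fail fast if missing
        log_level=env.get("LOG_LEVEL", "INFO"),  # optional, with a default
    )
```

Passing the mapping as a parameter keeps the function easy to test, and the same build can run in development, staging, and production with only the environment changed.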

5. CAP Theorem (The Trade-offs)

In any distributed data store, you can only provide two of the following three guarantees:
  1. Consistency (C): Every read receives the most recent write or an error.
  2. Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
  3. Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped/delayed by the network.
Reality: Network partitions (P) will happen, so when one occurs you have to choose between CP (Consistency) and AP (Availability).
  • CP (Banking): If the network breaks, stop accepting updates until it’s fixed. Don’t show a wrong balance.
  • AP (Social Media): If the network breaks, show the old feed. It’s better than showing an error.
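
The CP and AP behaviors above can be contrasted in a minimal single-replica sketch. The `Replica` class, its `mode` flag, and `PartitionError` are hypothetical; a real data store detects partitions through heartbeats or quorum checks rather than a boolean.

```python
class PartitionError(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class Replica:
    def __init__(self, mode: str, value: int) -> None:
        self.mode = mode            # "CP" or "AP"
        self.value = value          # last value this replica has seen
        self.partitioned = False    # True when cut off from its peers

    def read(self) -> int:
        if self.partitioned and self.mode == "CP":
            # Cannot prove the value is current, so refuse rather than risk staleness.
            raise PartitionError("cannot confirm latest value")
        # Healthy network, or AP mode: answer with the local (possibly stale) value.
        return self.value
```

Both replicas answer normally while the network is healthy. Only once `partitioned` is set do they diverge: the CP replica raises, and the AP replica keeps serving what it last saw.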

Interview Deep-Dive

Strong Answer:
  • The answer is Domain-Driven Design, specifically Bounded Contexts. A service boundary should align with a domain boundary — an area where a particular model and ubiquitous language are internally consistent. The classic mistake is splitting by technical layer (a “User Service” that handles user CRUD for every domain) instead of by business capability (a “Billing Service” that owns its own user concept with payment methods and invoices).
  • Concrete example: an e-commerce monolith. Start by mapping the domain. “Product” in the Catalog context means name, description, images, categories. “Product” in the Inventory context means SKU, warehouse location, quantity. “Product” in the Pricing context means base price, discounts, regional pricing rules. These are three different bounded contexts with three different models of “Product.” Each becomes a candidate microservice.
  • The practical test: look at the data. If two modules rarely share writes to the same tables, they are good split candidates. If they constantly JOIN across each other’s tables, forcing a split turns a local query into a distributed join over the network, which can perform far worse than the original monolith query. I would analyze production SQL query logs to find these coupling patterns before making any architectural decision.
  • Another signal: team structure (Conway’s Law). If the billing team and the shipping team are separate organizational units with different release cadences and on-call rotations, they should own separate services. If one team owns both and deploys them together, keeping them in one service (or one deployable) reduces coordination overhead.
Follow-up: What is the “distributed monolith” anti-pattern, and how do you recognize that you have accidentally built one?
A distributed monolith is a system split into multiple services that must be deployed together, share a database, or have synchronous call chains so deep that a failure in any service cascades to all others. You recognize it by three symptoms: (1) you cannot deploy Service A without also deploying Service B, because they share schema or have tightly coupled APIs that change in lockstep; (2) every user-facing request triggers a synchronous chain 5+ services deep, so latency is the sum of all services and availability is the product of all uptimes (five 99.9% services give you 99.5%); (3) teams cannot make decisions independently: a schema change in one service requires coordinated changes in three others. The fix: identify which synchronous calls can become asynchronous events, which shared databases need to be split (accepting eventual consistency), and which services should be merged back together because they were split prematurely.
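
The arithmetic behind symptom (2) can be checked directly. This sketch assumes the services fail independently and are called strictly in sequence; the function names are made up for illustration.

```python
import math


def chain_availability(availabilities):
    """Availability of a sequential chain: the product of each hop's availability."""
    return math.prod(availabilities)


def chain_latency_ms(latencies_ms):
    """Latency of a sequential chain: the sum of each hop's latency."""
    return sum(latencies_ms)


# Five services at 99.9% each, called one after another.
five_hop_availability = chain_availability([0.999] * 5)
```

Here `five_hop_availability` comes out just above 0.995, which matches the 99.5% figure quoted in the answer.
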
Strong Answer:
  • The textbook version: in a distributed system, you can only guarantee two of three properties — Consistency (every read gets the latest write), Availability (every request gets a non-error response), Partition Tolerance (the system works despite network splits). Since partitions are inevitable in any networked system, you choose between CP (consistency over availability) and AP (availability over consistency).
  • Why it is misunderstood: CAP is about behavior DURING a network partition, not during normal operation. When the network is healthy, you can have all three. The real question is: “When a partition occurs, do you refuse some requests (CP) or serve potentially stale data (AP)?” Most systems are not purely one or the other — they make different choices for different operations.
  • The more practical framework is PACELC (by Daniel Abadi): if there is a Partition, choose A or C; Else (normal operation), choose Latency or Consistency. DynamoDB, for example, is PA/EL — during partitions it chooses availability; during normal operation it chooses low latency (eventual consistency by default, but you can opt into strong consistency per query at higher latency).
  • Real-world application: a banking system’s balance check might be CP (refuse to show balance if data might be stale), while the same bank’s transaction history might be AP (show yesterday’s statement even during a partition — it is close enough for most users). The choice is per-operation, not per-system.
Follow-up: How does eventual consistency work in practice in a microservices architecture? Give a concrete example of how you would handle it.
In an e-commerce system: the Order Service saves an order and publishes an OrderPlaced event. The Inventory Service consumes this event and decrements stock. There is a window (milliseconds to seconds) where the order exists but inventory has not been updated; that is the “eventual” in eventual consistency. The risk: during that window, another order might over-sell the last item. Handling strategies: (1) Saga pattern: each service publishes events, and compensating actions undo work if a downstream step fails (Inventory Service publishes OutOfStock, Order Service cancels the order). (2) Reserve-then-commit: Inventory Service reserves stock synchronously (decrement + hold), Order Service commits, Inventory Service confirms. (3) Accept the inconsistency for low-risk items (a book can be backordered) and enforce strict consistency only for high-value items (concert tickets use pessimistic locking).
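
Strategy (1) can be sketched in memory, with direct method calls standing in for the broker. `OrderPlaced` and `OutOfStock` come from the answer above; the `StockReserved` event, the status strings, and the class layout are assumptions made for this illustration.

```python
class InventoryService:
    def __init__(self, stock: dict) -> None:
        self.stock = stock  # sku -> units available

    def on_order_placed(self, event: dict) -> dict:
        sku = event["sku"]
        if self.stock.get(sku, 0) > 0:
            self.stock[sku] -= 1
            return {"type": "StockReserved", "order_id": event["order_id"]}
        return {"type": "OutOfStock", "order_id": event["order_id"]}


class OrderService:
    def __init__(self) -> None:
        self.orders = {}  # order_id -> status

    def place_order(self, order_id: str, sku: str) -> dict:
        # The order is saved before inventory has reacted: the inconsistency window.
        self.orders[order_id] = "PENDING"
        return {"type": "OrderPlaced", "order_id": order_id, "sku": sku}

    def on_inventory_event(self, event: dict) -> None:
        if event["type"] == "StockReserved":
            self.orders[event["order_id"]] = "CONFIRMED"
        elif event["type"] == "OutOfStock":
            # Compensating action: undo the order that could not be fulfilled.
            self.orders[event["order_id"]] = "CANCELLED"


order_service = OrderService()
inventory_service = InventoryService({"book-1": 1})

# Two orders compete for the last unit; the second is compensated.
for order_id in ("o-1", "o-2"):
    placed = order_service.place_order(order_id, "book-1")
    order_service.on_inventory_event(inventory_service.on_order_placed(placed))
```

With one unit in stock, the first order ends up confirmed and the second is cancelled by the compensating step, so the system converges without a distributed transaction.
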
Strong Answer:
  • Synchronous (REST, gRPC): the caller sends a request and blocks (or awaits) until the response arrives. Advantages: simple mental model, immediate feedback (success or failure), easy to debug (one request, one response, one trace). Disadvantages: temporal coupling (both services must be up simultaneously), latency accumulates across call chains, cascading failures (if Service C is slow, Service B’s thread pool fills, Service A times out).
  • Asynchronous (Kafka, RabbitMQ): the caller publishes a message/event and moves on. Advantages: temporal decoupling (consumer can be down, message waits in the queue), better throughput (producer is never blocked), natural load leveling (consumer processes at its own pace). Disadvantages: eventual consistency (caller does not know immediately if the operation succeeded), harder to debug (no single request-response trace), infrastructure complexity (broker management, dead letter queues, message ordering).
  • Choose synchronous when: the caller NEEDS the result to proceed (checking inventory before confirming an order), the operation is fast (sub-100ms), or strong consistency is required (financial debit must be confirmed before credit).
  • Choose asynchronous when: the caller does not need the result immediately (sending a notification email after order placement), the operation is slow or unreliable (PDF generation, third-party API calls), or you need to decouple teams (Order team should not be blocked by Notification team’s deployment schedule).
  • The hybrid approach is most common in production: synchronous for the critical path (check inventory, charge payment) and asynchronous for side effects (send email, update analytics, sync to data warehouse).
Follow-up: Your service currently makes a synchronous REST call to the Inventory Service. You want to make it asynchronous. What changes are needed beyond just adding Kafka?
The biggest change is in your data model and API contract. Currently, POST /orders synchronously checks inventory and returns 200 (confirmed) or 400 (out of stock). With async, you cannot give an immediate answer. Options: (1) Return 202 Accepted with a status URL. The client polls GET /orders/{id}/status until it transitions from PENDING to CONFIRMED or REJECTED. (2) Use WebSockets or SSE to push the confirmation to the client. (3) Accept the order optimistically and compensate later (email “sorry, out of stock” if inventory rejects it). Each approach changes the user experience. You also need idempotency (what if the event is delivered twice), ordering guarantees (Kafka partition by order ID), and a dead letter queue for events that repeatedly fail processing.
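
Option (1) plus the idempotency requirement might look roughly like this. The handlers are plain functions returning a status code and body instead of using a web framework. The `event_id` and `in_stock` fields, the handler names, and the in-memory stores are assumptions for the sketch.

```python
import uuid

order_store = {}          # order_id -> {"sku": ..., "status": ...}
processed_events = set()  # event IDs that have already been applied


def post_orders(sku: str):
    """POST /orders: accept the order without waiting for inventory."""
    order_id = str(uuid.uuid4())
    order_store[order_id] = {"sku": sku, "status": "PENDING"}
    return 202, {"order_id": order_id, "status_url": f"/orders/{order_id}/status"}


def get_order_status(order_id: str):
    """GET /orders/{id}/status: polled by the client until the status settles."""
    if order_id not in order_store:
        return 404, {"error": "unknown order"}
    return 200, {"status": order_store[order_id]["status"]}


def on_inventory_result(event: dict) -> bool:
    """Apply an inventory result once; return False for a duplicate delivery."""
    if event["event_id"] in processed_events:
        return False
    processed_events.add(event["event_id"])
    new_status = "CONFIRMED" if event["in_stock"] else "REJECTED"
    order_store[event["order_id"]]["status"] = new_status
    return True
```

In a real deployment the processed-event set would need to live in durable storage and be updated in the same transaction as the order, otherwise a crash between the two writes reintroduces duplicates.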