Event-Driven Topology Selector
When to Use
You are designing or evaluating an event-driven architecture and need to choose between broker topology (decentralized event chains) and mediator topology (centralized event orchestration). Typical situations:
- Building a new event-driven system and need to decide how events flow
- Evaluating whether existing event workflows need central coordination
- Debugging error handling problems in an async system — events are being lost or workflows get stuck
- Comparing choreography vs orchestration for inter-service communication
- Deciding whether a use case is better served by request-based or event-based processing
- Choosing the right topology for each workflow in a system that mixes simple and complex workflows
Before starting, verify:
- Has the team already decided on event-driven architecture? If not, settle that first: this skill selects the TOPOLOGY within an event-driven architecture, not whether to use one at all.
- Does the system have async processing needs? If everything is synchronous request-reply, event-driven may not be the right style, so consider a request-based model first.
Context & Input Gathering
Input Sufficiency Check
This skill depends on understanding the WORKFLOW characteristics, not just the system description. The same system may need different topologies for different workflows.
Required Context (must have — ask if missing)
- System description and use cases: What does the system do? What events need to be processed?
  - Check prompt for: system purpose, event types, processing steps, workflow descriptions
  - If missing, ask: "What does your system do, and what events or workflows need to be processed asynchronously?"
- Workflow dependencies: Are processing steps independent or do they depend on each other?
  - Check prompt for: step ordering, conditional logic, rollback needs, parallel vs sequential
  - If missing, ask: "When an event occurs, do the processing steps depend on each other (step B needs step A's result), or can they all happen independently in parallel?"
- Error handling requirements: What happens when a processing step fails?
  - Check prompt for: rollback, compensation, retry, notification, data consistency needs
  - If missing, ask: "When a processing step fails (e.g., payment declined), do you need to (a) roll back previous steps, (b) retry automatically, (c) just log and continue, or (d) halt everything until resolved?"
  - WHY this is critical: Error handling is the single biggest differentiator between broker and mediator. Broker topology has no built-in error handling, so failed events are silently lost unless you build custom recovery.
Observable Context (gather from environment)
- Existing messaging infrastructure: What message brokers or event systems are in place?
  - Look for: RabbitMQ, Kafka, ActiveMQ, AWS SQS/SNS configs, event bus implementations
  - Reveals: whether infrastructure already favors one topology
- Current event patterns: Are there existing event handlers or processors?
  - Look for: event handler classes, message consumers, saga implementations
  - Reveals: current topology direction and complexity level
Default Assumptions
- If error handling requirements are still unknown after asking, assume they ARE important (safer to recommend mediator and simplify than to recommend broker and then discover that coordination is needed)
- If workflow complexity unknown, assume moderate complexity (some dependencies between steps)
- If performance requirements unspecified, assume standard (not sub-millisecond)
Sufficiency Threshold
SUFFICIENT: system description + workflow dependencies + error handling needs are known
MUST ASK: error handling requirements are unknown (this drives the entire topology decision)
PROCEED WITH DEFAULTS: workflow dependencies partially known but error handling is clear
Process
Step 1: Determine If Event-Based Model Is Appropriate
ACTION: Evaluate whether the use case is better served by a request-based or event-based processing model.
WHY: Not everything should be event-driven. Request-based models are better when processing is data-driven, deterministic, and needs a direct response. Event-based models are better when processing is reactive, requires high responsiveness, and the system must adapt to situations as they arise. Choosing the wrong model wastes the entire topology analysis.
| Dimension | Request-Based | Event-Based |
|---|---|---|
| Communication style | Synchronous | Asynchronous |
| Data access | Request-reply (ask for data) | Fire-and-forget (react to events) |
| Determinism | High — same request gives same path | Lower — event chains are dynamic |
| Responsiveness | Moderate (bound by slowest step) | High (immediate acknowledgment) |
| Typical use case | "Get me the order history" | "A bid was placed, react to it" |
| Workflow control | Easy (caller controls the flow) | Hard (no single controller in broker) |
| Error handling | Straightforward (caller gets error) | Complex (no caller waiting) |
IF the use case is purely data-retrieval with synchronous needs, recommend request-based model. Stop here. ELSE proceed to Step 2.
Step 2: Map the Workflow Characteristics
ACTION: For each identified workflow/use case, map its characteristics across the 7 comparison dimensions.
WHY: Broker and mediator topologies have opposite strengths. Mapping the workflow against these dimensions prevents gut-feel decisions and reveals which trade-offs matter most for THIS specific system.
For each workflow, evaluate:
| Dimension | Favors Broker | Favors Mediator |
|---|---|---|
| Workflow control | No coordination needed — events flow freely | Steps must execute in specific order with conditions |
| Error handling | Errors are tolerable or self-healing | Failures require rollback, compensation, or retry coordination |
| Recoverability | System can recover organically | Must be able to recover to a known state |
| Restart capability | No need to restart a failed workflow | Must restart workflows from point of failure |
| Scalability need | Maximum throughput is critical | Moderate throughput is acceptable |
| Performance need | Sub-millisecond or very high performance | Standard latency is acceptable |
| Fault tolerance | Individual processor failure is acceptable | Single processor failure must not break the chain |
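One way to keep this mapping explicit is to record, for each workflow, which side every dimension favors and then count the results. The following is a minimal Python sketch; `DIMENSIONS`, `tally`, and the sample mapping are hypothetical names and values, and the equal weighting of dimensions is an assumption, since the table does not say how to weigh them.

```python
from collections import Counter

# The seven comparison dimensions from the table above.
DIMENSIONS = (
    "workflow_control", "error_handling", "recoverability",
    "restart_capability", "scalability", "performance", "fault_tolerance",
)

def tally(mapping: dict) -> Counter:
    """Count how many dimensions favor each topology for one workflow.

    `mapping` maps every dimension name to "broker" or "mediator".
    Equal weighting is an assumption made for illustration only.
    """
    missing = set(DIMENSIONS) - set(mapping)
    if missing:
        raise ValueError(f"unmapped dimensions: {sorted(missing)}")
    return Counter(mapping[d] for d in DIMENSIONS)

# Hypothetical mapping for an order fulfillment workflow.
order_fulfillment = {
    "workflow_control": "mediator",
    "error_handling": "mediator",
    "recoverability": "mediator",
    "restart_capability": "mediator",
    "scalability": "mediator",
    "performance": "mediator",
    "fault_tolerance": "broker",
}

print(tally(order_fulfillment))
```

A tally like this is a prompt for discussion rather than a verdict: a single dimension, typically error handling, can outweigh the rest.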
Step 3: Select the Topology
ACTION: Based on the dimension mapping, recommend broker, mediator, or hybrid topology.
WHY: The choice is fundamentally a trade-off between workflow control and error handling capability (mediator) versus high performance and scalability (broker). Neither is inherently better — it depends entirely on which dimensions the system prioritizes.
Decision logic:
IF workflow steps are independent AND error handling is not critical AND performance/scalability are top priorities:
- Recommend BROKER topology
- Processors are self-contained, events chain through channels
- No central coordinator — maximum decoupling and performance
- Each processor advertises what it did; other processors react
IF workflow steps have dependencies AND error handling/recoverability are important AND workflow must be coordinated:
- Recommend MEDIATOR topology
- Central mediator orchestrates the processing steps
- Mediator knows the workflow, manages state, handles errors
- Processing events are commands (things to do) not events (things that happened)
IF system has BOTH types of workflows:
- Recommend HYBRID topology
- Use mediator for complex workflows requiring coordination
- Use broker for simple, independent event chains
- Route through a simple mediator that classifies events and delegates
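The decision logic above can be sketched as a small function. This is an illustration under assumptions, not a definitive rule engine: the `Workflow` fields and function names are hypothetical, and treating every case that is not a clear broker fit as mediator is an assumption that follows the "assume error handling is important" default.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Workflow:
    name: str
    steps_independent: bool
    error_handling_critical: bool
    performance_top_priority: bool

def topology_for(wf: Workflow) -> str:
    """Apply the broker/mediator rules to a single workflow."""
    if (wf.steps_independent
            and not wf.error_handling_critical
            and wf.performance_top_priority):
        return "broker"
    # Assumption: anything that is not a clear broker case goes to mediator.
    return "mediator"

def topology_for_system(workflows: list) -> str:
    """Recommend hybrid when the workflows disagree on topology."""
    if not workflows:
        raise ValueError("at least one workflow is required")
    choices = {topology_for(wf) for wf in workflows}
    return "hybrid" if len(choices) > 1 else choices.pop()

# Hypothetical workflows for a trading platform.
trade = Workflow("trade execution", True, False, True)
compliance = Workflow("compliance reporting", False, True, False)

print(topology_for_system([trade, compliance]))
```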
Step 4: Determine Mediator Complexity Level (If Mediator Selected)
ACTION: If mediator topology was selected, determine the appropriate mediator implementation complexity.
WHY: Mediators range from simple source-code routers to full BPM engines. Over-engineering the mediator wastes months; under-engineering it creates a bottleneck that can't handle the workflow complexity. Matching mediator complexity to workflow complexity is critical.
| Mediator Type | Use When | Implementation |
|---|---|---|
| Simple mediator | Linear workflows, basic error handling, routing logic | Source code (e.g., Apache Camel, Spring Integration, custom code) |
| Hard mediator (BPEL) | Complex conditional workflows, multiple dynamic paths, structured error handling | BPEL engine (e.g., Apache ODE, Oracle BPEL Process Manager) |
| Complex mediator (BPM) | Long-running transactions, human intervention points, complex state machines | BPM engine (e.g., jBPM, Camunda) |
Classify each event type: Determine if it's simple, hard, or complex. Route through the simple mediator first — it classifies and delegates to the appropriate mediator type. This delegation model handles mixed-complexity events efficiently.
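The delegation model can be sketched as a simple mediator that handles simple events itself and forwards the rest. The event flags (`long_running`, `human_step`, `conditional_paths`) and the classification heuristics are assumptions made for illustration; a real system would derive the classification from its own event catalog.

```python
from typing import Callable

def classify(event: dict) -> str:
    """Classify an event as simple, hard, or complex (assumed heuristics)."""
    if event.get("long_running") or event.get("human_step"):
        return "complex"
    if event.get("conditional_paths"):
        return "hard"
    return "simple"

def simple_mediator(event: dict, delegates: dict) -> str:
    """Entry point: process simple events directly, forward the rest."""
    level = classify(event)
    if level == "simple":
        return f"simple mediator processed {event['type']}"
    handler: Callable[[dict], str] = delegates[level]
    return handler(event)

# Hypothetical stand-ins for the BPEL and BPM engines.
delegates = {
    "hard": lambda e: f"forwarded {e['type']} to BPEL mediator",
    "complex": lambda e: f"forwarded {e['type']} to BPM mediator",
}

print(simple_mediator({"type": "place-order"}, delegates))
print(simple_mediator({"type": "approve-claim", "human_step": True}, delegates))
```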
Step 5: Address Error Handling and Data Loss Prevention
ACTION: Design the error handling strategy based on the selected topology.
WHY: Asynchronous event-driven architectures have THREE points where data loss can occur in the async communication chain. Protecting only one point still leaves the system vulnerable at the other two. Most architects only think about the message queue and forget about the send and acknowledgment links.
The three data loss points:
- Message send (producer to queue): Event is created but never reaches the queue
  - Mitigation: Synchronous send with broker acknowledgment. Use persistent message queues. The producer waits for confirmation that the message was persisted before proceeding.
- Message processing (queue to consumer): Event is dequeued but consumer crashes before processing
  - Mitigation: Client acknowledge mode (not auto-acknowledge). The message stays on the queue until the consumer explicitly acknowledges successful processing. If the consumer crashes, the message is re-delivered.
- Post-processing (consumer to database): Event is processed but the database write fails
  - Mitigation: Use the last participant support pattern: the database commit and the message acknowledgment happen in the same transaction scope. If the DB fails, the message is not acknowledged and will be redelivered.
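The three mitigations can be illustrated with a toy in-memory queue. This is a sketch under stated assumptions: `InMemoryQueue` and `consume_one` are hypothetical and stand in for a real persistent broker and consumer, and the database is a plain list. The point is the ordering: send returns only after the message is stored, and the acknowledgment happens last, after the commit.

```python
class InMemoryQueue:
    """Toy stand-in for a persistent queue with client-acknowledge semantics."""

    def __init__(self):
        self._pending = []     # stored messages not yet delivered
        self._in_flight = {}   # delivered messages awaiting acknowledgment
        self._next_id = 0

    def send(self, body) -> int:
        """Synchronous send: returns only after the message is stored."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending.append((msg_id, body))
        return msg_id  # plays the role of the broker's confirmation

    def receive(self):
        if not self._pending:
            return None
        msg_id, body = self._pending.pop(0)
        self._in_flight[msg_id] = body  # kept until explicitly acknowledged
        return msg_id, body

    def ack(self, msg_id) -> None:
        del self._in_flight[msg_id]

    def redeliver_unacked(self) -> None:
        """Put every unacknowledged message back on the queue."""
        self._pending.extend(sorted(self._in_flight.items()))
        self._in_flight.clear()

def consume_one(queue: InMemoryQueue, db: list, db_available: bool = True) -> bool:
    received = queue.receive()
    if received is None:
        return False
    msg_id, body = received
    if not db_available:
        # Simulated database failure: no ack, so the message is redelivered.
        queue.redeliver_unacked()
        return False
    db.append(body)    # database commit
    queue.ack(msg_id)  # acknowledge last, only after the commit succeeded
    return True

queue, db = InMemoryQueue(), []
queue.send("order-1")
consume_one(queue, db, db_available=False)  # write fails, message survives
consume_one(queue, db)                      # retry succeeds
print(db)
```

Acknowledging only after the commit means a crash between commit and acknowledgment leads to redelivery, so the consumer's write should tolerate duplicates.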
For broker topology error handling:
- Implement the workflow event pattern: a dedicated error-handling event processor monitors for failures and can trigger compensating actions
- Use dead letter queues for events that fail repeatedly — prevents infinite retry loops and allows manual inspection
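The dead letter queue idea can be sketched without any broker at all. In the example below, `drain`, `MAX_ATTEMPTS`, and `index_post` are hypothetical names and the retry limit of three is an assumption; real brokers expose their own dead-lettering configuration, which this sketch does not model.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit

def drain(events, handler, max_attempts=MAX_ATTEMPTS):
    """Process events; park the ones that keep failing in a dead letter list."""
    queue = deque((event, 0) for event in events)
    processed, dead_letters = [], []
    while queue:
        event, attempts = queue.popleft()
        try:
            handler(event)
            processed.append(event)
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                # Give up: keep the event and the reason for manual inspection.
                dead_letters.append((event, str(exc)))
            else:
                queue.append((event, attempts))  # retry later
    return processed, dead_letters

def index_post(event):
    """Hypothetical processor that cannot handle one particular event."""
    if event == "post-2":
        raise ValueError("malformed payload")

processed, dead_letters = drain(["post-1", "post-2", "post-3"], index_post)
print(processed, dead_letters)
```

The retry limit is what prevents the infinite loop: without it, the failing event would cycle through the queue forever.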
For mediator topology error handling:
- The mediator itself manages error state — it knows which step failed and can stop the workflow
- Mediator persists workflow state, enabling restart from point of failure
- Compensating transactions can be orchestrated by the mediator (e.g., reverse payment if shipping fails)
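A minimal sketch of a mediator that tracks completed steps, halts on failure, restarts from the failed step, and runs compensations in reverse order. The `Step` and `Mediator` classes and the order steps are hypothetical, and the workflow state is held in memory here, whereas the text assumes it is persisted.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    name: str
    action: Callable[[dict], None]
    compensate: Callable[[dict], None]

@dataclass
class Mediator:
    steps: list
    completed: list = field(default_factory=list)  # stands in for persisted state

    def run(self, ctx: dict) -> str:
        """Run steps in order, skipping any that already succeeded."""
        for step in self.steps:
            if step.name in self.completed:
                continue  # a restart resumes from the point of failure
            try:
                step.action(ctx)
            except Exception:
                return f"halted at {step.name}"
            self.completed.append(step.name)
        return "completed"

    def roll_back(self, ctx: dict) -> None:
        """Undo completed steps in reverse order (compensating transactions)."""
        for step in reversed(self.steps):
            if step.name in self.completed:
                step.compensate(ctx)
        self.completed.clear()

def reserve(ctx): ctx["reserved"] = True
def release(ctx): ctx["reserved"] = False
def charge(ctx):
    if not ctx["card_ok"]:
        raise RuntimeError("payment declined")
    ctx["charged"] = True
def refund(ctx): ctx["charged"] = False

order_steps = [
    Step("reserve_inventory", reserve, release),
    Step("charge_payment", charge, refund),
]

ctx = {"card_ok": False}
mediator = Mediator(order_steps)
print(mediator.run(ctx))  # stops at the payment step
mediator.roll_back(ctx)   # releases the inventory reservation
```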
Step 6: Produce the Topology Recommendation
ACTION: Compile the complete topology recommendation with rationale.
WHY: The recommendation must be specific enough to implement. A vague "use mediator" without explaining the error handling strategy, data loss prevention, and mediator complexity level leaves the team to figure out the hard parts on their own.
Inputs
- System description and event processing use cases
- Workflow dependencies and ordering requirements
- Error handling and data consistency requirements
- Performance and scalability targets (if known)
Outputs
Event-Driven Topology Recommendation
# Event-Driven Topology Recommendation: {System Name}
## Request-Based vs Event-Based Assessment
**Model selected:** {Request-based / Event-based / Mixed}
**Rationale:** {why this model fits}
## Workflow Analysis
| Workflow | Steps | Dependencies | Error Handling Need | Topology |
|----------|-------|:---:|:---:|:---:|
| {workflow 1} | {step list} | Independent / Dependent | Low / Medium / High | Broker / Mediator |
| {workflow 2} | ... | ... | ... | ... |
## Topology Decision
### Selected: {Broker / Mediator / Hybrid}
**Primary driver:** {the dimension that tipped the decision}
### 7-Dimension Trade-off Assessment
| Dimension | This System's Need | Broker | Mediator | Fit |
|-----------|-------------------|:---:|:---:|:---:|
| Workflow control | {need level} | Low | High | {which fits} |
| Error handling | {need level} | Low | High | {which fits} |
| Recoverability | {need level} | Low | High | {which fits} |
| Restart capability | {need level} | Low | High | {which fits} |
| Scalability | {need level} | High | Moderate | {which fits} |
| Performance | {need level} | High | Moderate | {which fits} |
| Fault tolerance | {need level} | High | Low | {which fits} |
## Mediator Complexity (if applicable)
**Level:** {Simple / Hard (BPEL) / Complex (BPM)}
**Implementation suggestion:** {specific technology recommendation}
**Rationale:** {why this complexity level}
## Error Handling Strategy
**Data loss prevention:**
- Message send: {mitigation}
- Message processing: {mitigation}
- Post-processing: {mitigation}
**Error recovery pattern:** {workflow event pattern / dead letter queue / mediator-managed / combination}
## Architecture Characteristics Impact
- Performance: {stars}/5
- Scalability: {stars}/5
- Fault tolerance: {stars}/5
- Evolutionary: {stars}/5
- Testability: {stars}/5
Key Principles
- The choice is workflow control vs performance: Broker topology maximizes performance, scalability, and decoupling. Mediator topology maximizes workflow control, error handling, and recoverability. Neither is inherently better. The decision hinges on which of these your system values more.
- Events vs commands reveal the topology: In broker topology, processing events describe what HAPPENED (order-created, payment-applied). In mediator topology, processing events are COMMANDS telling processors what to DO (place-order, apply-payment). If your events are naturally commands with expected outcomes, you need a mediator.
- Error handling is the deal-breaker: If a processing step can fail and the failure requires coordinated recovery (rollback, compensation, retry), broker topology cannot handle this without significant custom infrastructure. The mediator exists precisely for this scenario. When in doubt about error handling needs, lean toward mediator.
- Protect all three links in the async chain: Data loss can occur at message send, message processing, and post-processing. Most architects only protect the message queue itself (persistence) but forget about the send confirmation and the consumer acknowledgment. All three must be addressed.
- Hybrid is often the right answer: Real systems rarely have uniformly simple or uniformly complex workflows. A simple mediator that classifies incoming events and delegates simple ones to broker-style processing while routing complex ones through a full mediator gives the best of both worlds.
- Match mediator complexity to workflow complexity: Using a BPM engine for simple routing wastes months of effort. Using source-code routing for complex workflows with human intervention points creates unmaintainable spaghetti. Classify your events (simple/hard/complex) and pick the mediator type accordingly.
Examples
Scenario: Order fulfillment with payment rollback
- Trigger: "We're building an order fulfillment system. When a customer places an order, we need to validate inventory, charge payment, send confirmation email, update warehouse, and notify shipping. If payment fails, we need to rollback the inventory reservation."
- Process: Mapped the workflow; steps have dependencies (payment must succeed before fulfillment). Error handling is critical (payment failure requires inventory rollback). This is a coordinated workflow with compensation requirements. Evaluated 7 dimensions: workflow control = high need, error handling = high need, recoverability = high need. Performance and scalability are standard. All three critical dimensions favor mediator.
- Output: Mediator topology. Simple mediator implementation (source code, e.g., custom orchestrator or Apache Camel). 5-step workflow: (1) create order, (2) process order (email + payment + inventory in parallel), (3) fulfill order, (4) ship order, (5) notify customer. Mediator waits for acknowledgment from parallel step 2 processors before proceeding. If payment fails at step 2, mediator triggers inventory rollback and halts workflow. Data loss prevention: persistent queues with synchronous send, client-acknowledge mode, last-participant-support for DB writes.
Scenario: Social media fan-out with independent processors
- Trigger: "Users post content that needs to: update feeds, notify followers, run content moderation, update search index, and generate analytics. These are all independent."
- Process: Mapped the workflow; all steps are independent (no ordering, no dependencies). Error handling is low priority (if search indexing fails, it can retry independently without affecting other steps). Evaluated 7 dimensions: workflow control = not needed, error handling = low (each processor handles its own), scalability = high (viral posts need fan-out), performance = high (real-time feed updates). All critical dimensions favor broker.
- Output: Broker topology. Post-created initiating event fans out to 5 independent event processors. Each processor publishes its own processing event (feed-updated, followers-notified, etc.) for extensibility. No mediator needed because processors are self-contained. Dead letter queues for each processor to catch persistent failures. Per-processor scaling based on load.
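The fan-out in this scenario can be sketched with a toy publish/subscribe broker. The `Broker` class and `make_processor` helper are hypothetical, dispatch is synchronous for simplicity (a real broker delivers asynchronously), and the topic names other than post-created, feed-updated, and followers-notified are assumed for illustration.

```python
from collections import defaultdict

class Broker:
    """Toy in-process broker: topics map to lists of subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # every published topic, in order

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        self.log.append(topic)
        for handler in list(self._subscribers[topic]):
            handler(payload)

def make_processor(broker, done_topic):
    """Build a self-contained processor that advertises what it did."""
    def processor(payload):
        # The processor's own work would happen here.
        broker.publish(done_topic, payload)
    return processor

broker = Broker()
done_topics = (
    "feed-updated", "followers-notified", "content-moderated",
    "search-indexed", "analytics-generated",
)
for done in done_topics:
    broker.subscribe("post-created", make_processor(broker, done))

broker.publish("post-created", {"post_id": 42})
print(broker.log)
```

Nothing subscribes to the processing events yet; publishing them anyway is what lets a new processor be added later without touching the existing ones.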
Scenario: Mixed workloads (trading platform with compliance)
- Trigger: "Trade events need sub-millisecond processing. We also have compliance reporting that aggregates trades daily with complex rules."
- Process: Identified two distinct workflows. Trade execution: independent, performance-critical, fault-tolerant (classic broker). Compliance reporting: complex rules, conditional paths, must complete all steps, needs audit trail (classic mediator). Recommended hybrid topology.
- Output: Hybrid topology. Trade execution path uses broker topology for maximum performance: trade-executed events fan out to position tracking, risk calculation, and P&L processors independently. Compliance reporting path uses mediator topology: daily compliance mediator orchestrates trade aggregation, rule evaluation, exception flagging, and report generation in sequence. Simple event router at entry point classifies events by type and delegates to the appropriate topology. Trade path uses Kafka for high-throughput; compliance path uses RabbitMQ with a lightweight orchestrator.
References
- For the detailed broker vs mediator comparison table with full trade-off analysis, see references/broker-vs-mediator-comparison.md
License
This skill is licensed under CC-BY-SA-4.0. Source: BookForge — Fundamentals of Software Architecture by Mark Richards, Neal Ford.
Related BookForge Skills
This skill is standalone. Browse more BookForge skills: bookforge-skills