Kafka Topology Design Skill

Design deterministic, replay-safe, cost-efficient Kafka Streams topologies using KSA patterns.

Required Reading

Before responding, load the shared reference:

cat ${SKILL_PATH}/references/KSA.md

This is the authoritative source for all patterns, principles, and constraints.

Workflow

Step 1 — Understand the Problem

Gather from the user (ask if missing):

Source events — what topics trigger this service?
Enrichment needs — what data beyond the event itself?
Outputs — output topics, DB writes, notifications?
Statefulness — does output depend on past events?
Partition key — what entity scopes this? (userId, orderId, etc.)
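
The intake above can be tracked as a small record so that unanswered questions are easy to spot. This is a minimal sketch: the class name `ProblemIntake`, its field names, and the helper `missing_answers` are hypothetical and do not come from KSA.md.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProblemIntake:
    """Hypothetical record of the five Step 1 answers; None means not yet provided."""
    source_events: Optional[str] = None     # topics that trigger the service
    enrichment_needs: Optional[str] = None  # data needed beyond the event itself
    outputs: Optional[str] = None           # output topics, DB writes, notifications
    statefulness: Optional[str] = None      # does output depend on past events?
    partition_key: Optional[str] = None     # entity that scopes processing


def missing_answers(intake: ProblemIntake) -> List[str]:
    """Return the names of the fields the user still has to be asked about."""
    return [f.name for f in fields(intake) if not getattr(intake, f.name)]


intake = ProblemIntake(source_events="orders.created", partition_key="orderId")
print(missing_answers(intake))
# ['enrichment_needs', 'outputs', 'statefulness']
```

Each name returned corresponds to one clarifying question to ask before moving on to Step 2.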

Step 2 — Select Recipes

Map to KSA recipes (KSA.md §4):

Problem involves…	Recipe
Cleaning/validating inbound events	01 — Validation & Normalization
Duplicate events from upstream	02 — Deduplication
Splitting events to different consumers	03 — Routing & Fan-Out
Looking up reference data	04 — Data Enrichment
Reference data + historical computation	05 — Enrichment + Stateful
Counting, rate limiting, windowed metrics	06 — Windowed Aggregation
Entity lifecycle (order, payment, KYC)	07 — Per-Key State Machine
Cross-service coordination with rollback	08 — Saga Orchestrator
Building a read model or search index	09 — CQRS Projection
Bug fix replay or data backfill	10 — Event Replay

Step 3 — Compose the Topology

Arrange recipes left to right:

Source → [Ingress] → [Enrichment] → [Computation] → [Egress] → Sink

Not every stage is needed; include only the stages the problem requires.
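
One way to picture the composition rule is a helper that keeps the fixed stage order and silently drops unused stages. This is only a sketch: the function `compose` is hypothetical, and the assignment of recipes to stages in the example is an assumption made for illustration, not something KSA.md is known to prescribe.

```python
# Stage names and their order are taken from the pipeline line above.
STAGE_ORDER = ["Ingress", "Enrichment", "Computation", "Egress"]


def compose(recipes_by_stage):
    """Render a left-to-right pipeline, skipping stages that have no recipes."""
    unknown = set(recipes_by_stage) - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    parts = ["Source"]
    for stage in STAGE_ORDER:
        for recipe in recipes_by_stage.get(stage, []):
            parts.append(f"[{stage}: {recipe}]")
    parts.append("Sink")
    return " → ".join(parts)


# Assumed placement of recipes 02 and 07; adjust to the actual design.
print(compose({
    "Computation": ["07 Per-Key State Machine"],
    "Ingress": ["02 Deduplication"],
}))
# Source → [Ingress: 02 Deduplication] → [Computation: 07 Per-Key State Machine] → Sink
```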

Step 4 — Draw the Diagram

Produce a Mermaid flowchart LR using the KSA symbol legend (KSA.md §3):

[TopicName] — Kafka topic
[TopicName*] — compacted topic (KTable source)
(Processor) — stateless processor
{{Processor}} — stateful processor
((Join)) — stream–table join
[[Sink]] — side-effect boundary
{Decision?} — conditional branch
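
The legend above maps directly onto Mermaid node shapes, so a diagram can be generated from a plain description of nodes and edges. The sketch below assumes hypothetical node-kind names (`topic`, `compacted`, and so on) and example topic names; labels are quoted so that characters such as `*` and `?` do not interfere with Mermaid parsing.

```python
# kind -> (opening bracket, closing bracket, label suffix), following the legend above.
SHAPES = {
    "topic": ("[", "]", ""),
    "compacted": ("[", "]", "*"),
    "stateless": ("(", ")", ""),
    "stateful": ("{{", "}}", ""),
    "join": ("((", "))", ""),
    "sink": ("[[", "]]", ""),
    "decision": ("{", "}", "?"),
}


def render(nodes, edges):
    """Build Mermaid 'flowchart LR' text from {id: (kind, label)} and (src, dst) pairs."""
    lines = ["flowchart LR"]
    for node_id, (kind, label) in nodes.items():
        open_, close, suffix = SHAPES[kind]
        lines.append(f'    {node_id}{open_}"{label}{suffix}"{close}')
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)


# Hypothetical enrichment topology used only to exercise the renderer.
nodes = {
    "orders": ("topic", "orders"),
    "users": ("compacted", "users"),
    "validate": ("stateless", "Validate"),
    "enrich": ("join", "Enrich"),
    "enriched": ("topic", "orders.enriched"),
}
edges = [("orders", "validate"), ("validate", "enrich"),
         ("users", "enrich"), ("enrich", "enriched")]
print(render(nodes, edges))
```

The printed text can be pasted into any Mermaid renderer to check that each node uses the shape the legend assigns to it.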

Step 5 — Declare Policies

For every topology, explicitly document:

Missing-state policy per join (drop / dead-letter / retry / buffer)
Partition key and why it aligns with all joins
State store retention per stateful processor
Sink idempotency strategy
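
The four declarations above lend themselves to a mechanical completeness check. The following sketch assumes a hypothetical dictionary layout for a topology description (`joins`, `stateful_processors`, and the other keys are illustrative names); only the four missing-state policy values come from the list above.

```python
MISSING_STATE_POLICIES = {"drop", "dead-letter", "retry", "buffer"}


def undeclared_policies(topology):
    """Return a list of human-readable gaps in the Step 5 declarations."""
    problems = []
    for join in topology.get("joins", []):
        if join.get("missing_state_policy") not in MISSING_STATE_POLICIES:
            problems.append(f"join {join['name']}: missing-state policy not declared")
    if not topology.get("partition_key") or not topology.get("partition_key_rationale"):
        problems.append("partition key or its join-alignment rationale not declared")
    for proc in topology.get("stateful_processors", []):
        if not proc.get("retention"):
            problems.append(f"processor {proc['name']}: state store retention not declared")
    if not topology.get("sink_idempotency"):
        problems.append("sink idempotency strategy not declared")
    return problems


# Hypothetical draft with two gaps: no policy on the join, no sink strategy.
draft = {
    "partition_key": "orderId",
    "partition_key_rationale": "orders and payments are both keyed by orderId",
    "joins": [{"name": "order-payment"}],
    "stateful_processors": [{"name": "order-fsm", "retention": "30d"}],
}
for problem in undeclared_policies(draft):
    print(problem)
```

An empty result means every item in the list above has been stated explicitly; it says nothing about whether the chosen policies are appropriate.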

Step 6 — Cost Check

Estimate per KSA.md §7.4:

Factor	Estimate	Red Flag
State store size	value size × key count × retention	> 50 GB/instance
Changelog overhead	store size × replication factor	> 100 GB total
Repartition count	selectKey/repartition calls (or the deprecated through)	> 2 on high-volume
KTable restore time	topic size / throughput	> 10 minutes
Partition count	all internal + output topics	> 500 total

Multiple red flags → recommend alternatives (KSA.md §7.3).
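
The table above can be turned into a quick calculator. This is a back-of-the-envelope sketch under several assumptions that are not stated in the table: "retention" is treated as a multiplier for entries retained per key, state is assumed to spread evenly across instances, GB is decimal (10^9 bytes), and all class and field names are hypothetical.

```python
from dataclasses import dataclass

GB = 10**9  # decimal gigabytes (assumption)


@dataclass
class CostInputs:
    value_bytes: int              # average serialized value size
    keys: int                     # number of distinct keys
    retained_per_key: float       # "retention" multiplier (assumed interpretation)
    instances: int                # application instances sharing the state
    replication_factor: int       # changelog topic replication
    repartitions: int             # repartition steps in the topology
    high_volume: bool             # whether the repartitioned stream is high-volume
    ktable_topic_bytes: int       # size of the topic backing the KTable
    restore_bytes_per_sec: float  # expected restore throughput
    total_partitions: int         # all internal + output topic partitions


def red_flags(c: CostInputs):
    """Apply the red-flag thresholds from the table above."""
    store_total = c.value_bytes * c.keys * c.retained_per_key
    flags = []
    if store_total / c.instances > 50 * GB:
        flags.append("state store size")
    if store_total * c.replication_factor > 100 * GB:
        flags.append("changelog overhead")
    if c.high_volume and c.repartitions > 2:
        flags.append("repartition count")
    if c.ktable_topic_bytes / c.restore_bytes_per_sec > 10 * 60:
        flags.append("KTable restore time")
    if c.total_partitions > 500:
        flags.append("partition count")
    return flags


# 1 KB values × 50M keys = 50 GB of state; × 3 replicas = 150 GB of changelog.
example = CostInputs(1000, 50_000_000, 1, 4, 3, 1, True, 30 * GB, 100e6, 120)
flags = red_flags(example)
print(flags, "-> recommend alternatives" if len(flags) > 1 else "-> proceed")
# ['changelog overhead'] -> proceed
```

In this example only the changelog threshold is exceeded, so the single flag would be noted in the cost estimate without triggering the "multiple red flags" rule.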

Step 7 — Compliance Checklist

Verify against KSA.md §6 before signing off.

Output Format

Always produce:

Summary — one paragraph describing what the service does
Recipes used — numbered list of KSA recipe numbers and names
Topology diagram — Mermaid flowchart LR
State diagram — Mermaid stateDiagram-v2 (if FSM involved)
Policy table — missing-state, retention, idempotency decisions
Cost estimate — back-of-the-envelope numbers for each Step 6 factor
Compliance — checklist pass/fail

Anti-Patterns to Flag

Anti-Pattern	Why It's Wrong	Suggest
HTTP calls inside processor	Breaks replay determinism	KTable enrichment
DB queries inside processor	Same as above	Compacted topic
No declared missing-state policy	Undeclared behavior = design defect	Ask: "what happens when KTable has no entry?"
Partition key mismatch	Join key ≠ partition key	Repartition (flag cost)
Unbounded state stores	No TTL = unbounded growth	Ask about retention
GlobalKTable for large data	Loads ALL data on EVERY instance	Regular KTable with partition-aligned joins
Multiple repartitions on same stream	Each adds a write to and a re-read from an internal topic	Redesign key strategy
Stateful where stateless suffices	Unnecessary state store overhead	Remove state store

Conversation Style

Ask clarifying questions before designing. Problem statements are often incomplete.
When multiple recipes apply, explain trade-offs and let the engineer choose.
Always produce a diagram.
Be explicit about what the topology does NOT handle (scope).
If the problem is better solved without KStreams, say so (KSA.md §7.3).
