m13-domain-error

Domain Error Strategy

Layer 2: Design Choices

Core Question

Who needs to handle this error, and how should they recover?

Before designing error types:

Is this user-facing or internal?
Is recovery possible?
What context is needed for debugging?

Error Categorization

| Error Type | Audience | Recovery | Example |
|------------|----------|----------|---------|
| User-facing | End users | Guide action | `InvalidEmail`, `NotFound` |
| Internal | Developers | Debug info | `DatabaseError`, `ParseError` |
| System | Ops/SRE | Monitor/alert | `ConnectionTimeout`, `RateLimited` |
| Transient | Automation | Retry | `NetworkError`, `ServiceUnavailable` |
| Permanent | Human | Investigate | `ConfigInvalid`, `DataCorrupted` |

Thinking Prompt

Before designing error types:

1. Who sees this error?
   - End user → friendly message, actionable
   - Developer → detailed, debuggable
   - Ops → structured, alertable

2. Can we recover?
   - Transient → retry with backoff
   - Degradable → fallback value
   - Permanent → fail fast, alert

3. What context is needed?
   - Call chain → anyhow::Context
   - Request ID → structured logging
   - Input data → error payload
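
The "Can we recover?" question can be turned into a small decision function. The sketch below is one possible shape, using only the standard library; the names `Recoverability`, `RecoveryAction`, and `plan_recovery`, as well as the 100 ms base delay and 10 s cap, are assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical recoverability classes, mirroring the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recoverability {
    Transient,
    Degradable,
    Permanent,
}

/// What the caller should do next.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RecoveryAction {
    RetryAfter(Duration),
    UseFallback,
    FailFast,
}

/// Picks an action for an error class. `attempt` is zero-based and counts
/// the retries already made.
pub fn plan_recovery(kind: Recoverability, attempt: u32, max_attempts: u32) -> RecoveryAction {
    match kind {
        Recoverability::Transient if attempt < max_attempts => {
            // 100 ms, 200 ms, 400 ms, ... capped at 10 s (assumed values).
            let exp = attempt.min(16);
            let delay_ms = (100u64 << exp).min(10_000);
            RecoveryAction::RetryAfter(Duration::from_millis(delay_ms))
        }
        // Retries exhausted: stop and surface the failure.
        Recoverability::Transient => RecoveryAction::FailFast,
        Recoverability::Degradable => RecoveryAction::UseFallback,
        Recoverability::Permanent => RecoveryAction::FailFast,
    }
}
```

Keeping the decision in one function means retry limits and fallbacks are not scattered across call sites.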

Trace Up ↑

To domain constraints (Layer 3):

```
"How should I handle payment failures?"
    ↑ Ask: What are the business rules for retries?
    ↑ Check: domain-fintech (transaction requirements)
    ↑ Check: SLA (availability requirements)
```

| Question | Trace To | Ask |
|----------|----------|-----|
| Retry policy | domain-* | What's acceptable latency for retry? |
| User experience | domain-* | What message should users see? |
| Compliance | domain-* | What must be logged for audit? |
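
Once the domain layer has answered these three questions, it can help to record the answers as data rather than hard-coding them at call sites. The following is a rough sketch of that idea; `DomainErrorPolicy`, its fields, and every value in `example_payment_policy` are made up for illustration and do not come from any real domain skill.

```rust
use std::time::Duration;

/// Hypothetical record of a domain's answers to the three questions above.
#[derive(Debug, Clone)]
pub struct DomainErrorPolicy {
    /// Retry policy: total extra latency the domain tolerates for retries.
    pub retry_budget: Duration,
    /// User experience: the message end users should see.
    pub user_message: &'static str,
    /// Compliance: fields that must appear in the audit log entry.
    pub audit_fields: &'static [&'static str],
}

impl DomainErrorPolicy {
    /// Number of fixed-delay retries that fit inside the latency budget.
    /// A zero delay is treated as invalid and yields no retries.
    pub fn retries_within_budget(&self, delay: Duration) -> u32 {
        if delay.is_zero() {
            return 0;
        }
        let n = self.retry_budget.as_nanos() / delay.as_nanos();
        n.min(u32::MAX as u128) as u32
    }
}

/// Illustrative values only; real numbers must come from the domain owners.
pub fn example_payment_policy() -> DomainErrorPolicy {
    DomainErrorPolicy {
        retry_budget: Duration::from_secs(2),
        user_message: "Payment could not be completed. You have not been charged.",
        audit_fields: &["transaction_id", "attempt", "error_code"],
    }
}
```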

Trace Down ↓

To implementation (Layer 1):

```
"Need typed errors"
    ↓ m06-error-handling: thiserror for library
    ↓ m04-zero-cost: Error enum design

"Need error context"
    ↓ m06-error-handling: anyhow::Context
    ↓ Logging: tracing with fields

"Need retry logic"
    ↓ m07-concurrency: async retry patterns
    ↓ Crates: tokio-retry, backoff
```

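To make "typed errors" and "error context" concrete without pulling in any crate, the sketch below hand-rolls both with the standard library. It is a simplified stand-in for what thiserror and anyhow::Context provide, not their actual implementation; `ParseError`, `WithContext`, `ResultExt`, and `render_chain` are hypothetical names.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level typed error.
#[derive(Debug)]
pub struct ParseError {
    pub line: usize,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error at line {}", self.line)
    }
}

impl Error for ParseError {}

/// Wraps a lower-level error with a note about what was being attempted.
#[derive(Debug)]
pub struct WithContext {
    context: String,
    source: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let inner: &(dyn Error + 'static) = &*self.source;
        Some(inner)
    }
}

/// Extension trait that adds `.context(...)` to any `Result` with a std error.
pub trait ResultExt<T> {
    fn context(self, msg: &str) -> Result<T, WithContext>;
}

impl<T, E: Error + Send + Sync + 'static> ResultExt<T> for Result<T, E> {
    fn context(self, msg: &str) -> Result<T, WithContext> {
        self.map_err(|e| WithContext {
            context: msg.to_string(),
            source: Box::new(e),
        })
    }
}

/// Joins an error and all of its sources into one line for logging.
pub fn render_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}
```

In real code the crates named above remove most of this boilerplate; the sketch only shows how the call chain ends up attached to the error.
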
Quick Reference

| Recovery Pattern | When | Implementation |
|------------------|------|----------------|
| Retry | Transient failures | exponential backoff |
| Fallback | Degraded mode | cached/default value |
| Circuit Breaker | Cascading failures | failsafe-rs |
| Timeout | Slow operations | `tokio::time::timeout` |
| Bulkhead | Isolation | separate thread pools |
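
Of these patterns, the circuit breaker is the least obvious to picture. The following is a minimal, single-threaded sketch using only the standard library; it is not the failsafe-rs API, and the names `CircuitBreaker` and `BreakerError` along with the consecutive-failure threshold rule are assumptions chosen for illustration.

```rust
use std::time::{Duration, Instant};

/// Opens after `threshold` consecutive failures and rejects calls until
/// `cooldown` has elapsed, then lets one trial call through.
pub struct CircuitBreaker {
    threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

#[derive(Debug, PartialEq)]
pub enum BreakerError<E> {
    /// The breaker is open; the operation was not attempted.
    Open,
    /// The operation ran and failed.
    Inner(E),
}

impl CircuitBreaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker {
            threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    pub fn call<T, E>(
        &mut self,
        op: impl FnOnce() -> Result<T, E>,
    ) -> Result<T, BreakerError<E>> {
        if let Some(opened) = self.opened_at {
            if opened.elapsed() < self.cooldown {
                return Err(BreakerError::Open);
            }
            // Cooldown over: fall through and allow a trial call.
        }
        match op() {
            Ok(value) => {
                // Any success closes the breaker again.
                self.consecutive_failures = 0;
                self.opened_at = None;
                Ok(value)
            }
            Err(e) => {
                self.consecutive_failures = self.consecutive_failures.saturating_add(1);
                if self.consecutive_failures >= self.threshold {
                    self.opened_at = Some(Instant::now());
                }
                Err(BreakerError::Inner(e))
            }
        }
    }
}
```

A production breaker would also need to be shareable across tasks and would usually track failure rates rather than a simple counter.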

Error Hierarchy

```rust
#[derive(thiserror::Error, Debug)]
pub enum AppError {
    // User-facing
    #[error("Invalid input: {0}")]
    Validation(String),

    // Transient (retryable)
    #[error("Service temporarily unavailable")]
    ServiceUnavailable(#[source] reqwest::Error),

    // Internal (log details, show generic)
    #[error("Internal error")]
    Internal(#[source] anyhow::Error),
}

impl AppError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::ServiceUnavailable(_))
    }
}
```
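
The "log details, show generic" comment on the `Internal` variant can be made explicit by giving the error two renderings. The sketch below uses a simplified stand-in for the `AppError` above, with `String` payloads in place of `reqwest::Error` and `anyhow::Error` so that it compiles with the standard library alone. The methods `user_message`, `log_detail`, and `status_code`, and the HTTP status mapping, are assumptions for illustration.

```rust
/// Simplified stand-in for `AppError`: payloads are plain strings here.
#[derive(Debug)]
pub enum AppError {
    Validation(String),
    ServiceUnavailable(String),
    Internal(String),
}

impl AppError {
    /// Text that is safe to show to end users. Internal detail never leaks.
    pub fn user_message(&self) -> String {
        match self {
            AppError::Validation(msg) => format!("Invalid input: {}", msg),
            AppError::ServiceUnavailable(_) => {
                "Service temporarily unavailable. Please try again shortly.".to_string()
            }
            AppError::Internal(_) => "Something went wrong on our side.".to_string(),
        }
    }

    /// Full detail for logs and developers.
    pub fn log_detail(&self) -> String {
        format!("{:?}", self)
    }

    /// One possible HTTP status mapping (an assumption, not a rule).
    pub fn status_code(&self) -> u16 {
        match self {
            AppError::Validation(_) => 400,
            AppError::ServiceUnavailable(_) => 503,
            AppError::Internal(_) => 500,
        }
    }
}
```

At an API boundary, the handler would return `user_message()` to the client and send `log_detail()` to the log with the request ID attached.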

Retry Pattern

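```rust
use std::future::Future;
use std::time::Duration;

use tokio_retry::{strategy::ExponentialBackoff, Retry};

async fn with_retry<F, Fut, T, E>(f: F) -> Result<T, E>
where
    // `impl Trait` is not allowed in an `Fn` return position, so the
    // future type gets its own generic parameter.
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    E: std::fmt::Debug,
{
    // Up to 5 retries after the first attempt, each delay capped at 10 s.
    // tokio-retry raises the base to the attempt number, so the delays
    // are 100 ms, then 10 s from the second retry on.
    let strategy = ExponentialBackoff::from_millis(100)
        .max_delay(Duration::from_secs(10))
        .take(5);

    Retry::spawn(strategy, f).await
}
```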
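
The `with_retry` helper above retries on every error, while the guidance in this document is to retry only transient ones. The following synchronous, std-only sketch shows one way to combine a retry loop with a predicate such as `is_retryable`; `retry_if` and its parameters are hypothetical names. In async code, tokio-retry's `RetryIf` appears to serve the same purpose, though its exact signature should be checked against the crate documentation.

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` until it succeeds, the error is not retryable, or
/// `max_retries` retries have been used. The delay doubles after each
/// retry and is capped at `max_delay`.
pub fn retry_if<T, E>(
    max_retries: u32,
    base_delay: Duration,
    max_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E> {
    let mut delay = base_delay;
    let mut retries = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if retries < max_retries && is_retryable(&e) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2).min(max_delay);
                retries += 1;
            }
            // Permanent error, or retries exhausted: give up immediately.
            Err(e) => return Err(e),
        }
    }
}
```

With the `AppError` type from the previous section, the predicate would simply be `|e| e.is_retryable()`.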

Common Mistakes

| Mistake | Why Wrong | Better |
|---------|-----------|--------|
| Same error for all | No actionability | Categorize by audience |
| Retry everything | Wasted resources | Only transient errors |
| Infinite retry | DoS self | Max attempts + backoff |
| Expose internal errors | Security risk | User-friendly messages |
| No context | Hard to debug | `.context()` everywhere |

Anti-Patterns

| Anti-Pattern | Why Bad | Better |
|--------------|---------|--------|
| String errors | No structure | thiserror types |
| `panic!` for recoverable | Bad UX | `Result` with context |
| Ignore errors | Silent failures | Log or propagate |
| `Box<dyn Error>` everywhere | Lost type info | thiserror |
| Error in happy path | Performance | Early validation |
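
Two of the rows above, "String errors" and "Error in happy path", can be addressed together by validating once at the boundary and returning a typed error. The sketch below is deliberately simplistic (it is not a standards-compliant email check); `Email`, `EmailError`, and the specific rules are assumptions for illustration.

```rust
use std::fmt;

/// Typed validation failures instead of free-form strings.
#[derive(Debug, PartialEq, Eq)]
pub enum EmailError {
    Empty,
    MissingAt,
    MissingPart,
}

impl fmt::Display for EmailError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            EmailError::Empty => "email address is empty",
            EmailError::MissingAt => "email address must contain '@'",
            EmailError::MissingPart => "email address needs text on both sides of '@'",
        };
        f.write_str(msg)
    }
}

impl std::error::Error for EmailError {}

/// A value of this type has already passed validation, so code further
/// down the happy path does not need to re-check it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Email(String);

impl Email {
    pub fn parse(raw: &str) -> Result<Self, EmailError> {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            return Err(EmailError::Empty);
        }
        match trimmed.split_once('@') {
            None => Err(EmailError::MissingAt),
            Some((local, domain)) if local.is_empty() || domain.is_empty() => {
                Err(EmailError::MissingPart)
            }
            Some(_) => Ok(Email(trimmed.to_string())),
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Because each variant is distinct, the caller can map it to a specific user-facing message instead of parsing an error string.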

Related Skills

| When | See |
|------|-----|
| Error handling basics | m06-error-handling |
| Retry implementation | m07-concurrency |
| Domain modeling | m09-domain |
| User-facing APIs | domain-* |
