Domain Error Strategy
Layer 2: Design Choices
Core Question
Who needs to handle this error, and how should they recover?
Before designing error types:

- Is this user-facing or internal?
- Is recovery possible?
- What context is needed for debugging?
Error Categorization
| Error Type | Audience | Recovery | Example |
|---|---|---|---|
| User-facing | End users | Guide action | `InvalidEmail`, `NotFound` |
| Internal | Developers | Debug info | `DatabaseError`, `ParseError` |
| System | Ops/SRE | Monitor/alert | `ConnectionTimeout`, `RateLimited` |
| Transient | Automation | Retry | `NetworkError`, `ServiceUnavailable` |
| Permanent | Human | Investigate | `ConfigInvalid`, `DataCorrupted` |
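One way to make this table enforceable is to give every error variant a category and let the compiler check that none is missed. The sketch below is std-only; `Category` and `DomainError` are hypothetical names, and the variants simply reuse some of the example names from the table.

```rust
/// Error categories from the table above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    UserFacing,
    Internal,
    System,
    Transient,
    Permanent,
}

impl Category {
    /// Who needs to see errors in this category.
    pub fn audience(self) -> &'static str {
        match self {
            Category::UserFacing => "End users",
            Category::Internal => "Developers",
            Category::System => "Ops/SRE",
            Category::Transient => "Automation",
            Category::Permanent => "Human",
        }
    }

    /// The expected recovery action for this category.
    pub fn recovery(self) -> &'static str {
        match self {
            Category::UserFacing => "Guide action",
            Category::Internal => "Debug info",
            Category::System => "Monitor/alert",
            Category::Transient => "Retry",
            Category::Permanent => "Investigate",
        }
    }
}

/// Hypothetical domain error reusing example names from the table.
#[allow(dead_code)]
#[derive(Debug)]
pub enum DomainError {
    InvalidEmail(String),
    NotFound { id: u64 },
    ParseError(String),
    RateLimited,
    ServiceUnavailable,
    DataCorrupted { table: String },
}

impl DomainError {
    /// Exhaustive match: a new variant will not compile until it is categorized.
    pub fn category(&self) -> Category {
        match self {
            DomainError::InvalidEmail(_) | DomainError::NotFound { .. } => Category::UserFacing,
            DomainError::ParseError(_) => Category::Internal,
            DomainError::RateLimited => Category::System,
            DomainError::ServiceUnavailable => Category::Transient,
            DomainError::DataCorrupted { .. } => Category::Permanent,
        }
    }
}
```

Because `category()` is an exhaustive `match` with no wildcard arm, adding a variant forces someone to decide who the error is for before the code builds.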
Thinking Prompt
Before designing error types:

- Who sees this error?
  - End user → friendly message, actionable
  - Developer → detailed, debuggable
  - Ops → structured, alertable
- Can we recover?
  - Transient → retry with backoff
  - Degradable → fallback value
  - Permanent → fail fast, alert
- What context is needed?
  - Call chain → `anyhow::Context`
  - Request ID → structured logging
  - Input data → error payload
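The three answers to "Can we recover?" can be folded into one dispatch function. This is a minimal synchronous, std-only sketch: `Recoverability`, `Outcome`, and `run_with_recovery` are hypothetical names, and the blocking `thread::sleep` stands in for an async timer such as `tokio::time::sleep`.

```rust
use std::fmt::Display;
use std::thread;
use std::time::Duration;

/// How a failure can be handled (hypothetical classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recoverability {
    /// Retry with backoff.
    Transient,
    /// Serve a fallback value.
    Degradable,
    /// Fail fast and alert.
    Permanent,
}

/// Result of running an operation under the recovery policy.
#[derive(Debug, PartialEq)]
pub enum Outcome<T> {
    Success(T),
    Fallback(T),
    Failed(String),
}

/// Runs `op`. Transient errors are retried (at most `max_attempts` attempts
/// in total, doubling the delay each time), degradable errors return the
/// fallback, and permanent errors (or exhausted retries) fail immediately.
pub fn run_with_recovery<T, E: Display>(
    mut op: impl FnMut() -> Result<T, E>,
    classify: impl Fn(&E) -> Recoverability,
    fallback: impl FnOnce() -> T,
    max_attempts: u32,
    base_delay: Duration,
) -> Outcome<T> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Outcome::Success(value),
            Err(err) => match classify(&err) {
                Recoverability::Transient if attempt < max_attempts => {
                    thread::sleep(delay);
                    delay *= 2;
                    attempt += 1;
                }
                Recoverability::Degradable => return Outcome::Fallback(fallback()),
                // Permanent, or transient with no attempts left: fail fast.
                _ => return Outcome::Failed(err.to_string()),
            },
        }
    }
}
```

Production retry loops usually also add jitter to the delay so that many clients do not retry in lockstep; it is left out here to keep the behavior deterministic.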
Trace Up ↑
To domain constraints (Layer 3):
```
"How should I handle payment failures?"
    ↑ Ask: What are the business rules for retries?
    ↑ Check: domain-fintech (transaction requirements)
    ↑ Check: SLA (availability requirements)
```
| Question | Trace To | Ask |
|---|---|---|
| Retry policy | domain-* | What's acceptable latency for retry? |
| User experience | domain-* | What message should users see? |
| Compliance | domain-* | What must be logged for audit? |
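Answers to these questions can be captured as data instead of scattered constants. For example, if the domain allows a payment retry to add at most one second of waiting, a policy type can clip its backoff schedule to that budget. `RetryPolicy` and its fields are hypothetical, and the schedule simply doubles without jitter to stay predictable.

```rust
use std::time::Duration;

/// Hypothetical retry policy whose limits come from domain rules.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    pub max_retries: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
    /// Total waiting time the business rules allow across all retries.
    pub latency_budget: Duration,
}

impl RetryPolicy {
    /// Doubling backoff delays, each capped at `max_delay`, stopping early
    /// once the next delay would exceed `latency_budget`.
    pub fn schedule(&self) -> Vec<Duration> {
        let mut delays = Vec::new();
        let mut total = Duration::ZERO;
        let mut delay = self.base_delay;
        for _ in 0..self.max_retries {
            let next = delay.min(self.max_delay);
            if total + next > self.latency_budget {
                break;
            }
            total += next;
            delays.push(next);
            delay = delay.saturating_mul(2);
        }
        delays
    }
}
```

With a 100 ms base delay and a one-second budget, this yields 100 ms, 200 ms, and 400 ms, then stops because an 800 ms wait would overrun the budget.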
Trace Down ↓
To implementation (Layer 1):
```
"Need typed errors"
    ↓ m06-error-handling: thiserror for library
    ↓ m04-zero-cost: Error enum design

"Need error context"
    ↓ m06-error-handling: anyhow::Context
    ↓ Logging: tracing with fields

"Need retry logic"
    ↓ m07-concurrency: async retry patterns
    ↓ Crates: tokio-retry, backoff
```
Quick Reference
| Recovery Pattern | When | Implementation |
|---|---|---|
| Retry | Transient failures | Exponential backoff |
| Fallback | Degraded mode | Cached/default value |
| Circuit Breaker | Cascading failures | `failsafe-rs` |
| Timeout | Slow operations | `tokio::time::timeout` |
| Bulkhead | Isolation | Separate thread pools |
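To show what the circuit-breaker row means mechanically, here is a minimal single-threaded, std-only sketch of the pattern. It is not the `failsafe-rs` API; `CircuitBreaker`, `State`, and `CallError` are hypothetical names, and a real breaker would also need thread safety and metrics.

```rust
use std::time::{Duration, Instant};

/// Closed lets calls through, Open rejects them, and HalfOpen lets a
/// probe call through once the cooldown has elapsed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Error returned by a guarded call.
#[derive(Debug, PartialEq)]
pub enum CallError<E> {
    /// Rejected without running because the breaker is open.
    Open,
    /// The call ran and failed with this error.
    Inner(E),
}

pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, consecutive_failures: 0, opened_at: None }
    }

    pub fn state(&self) -> State {
        match self.opened_at {
            None => State::Closed,
            Some(t) if t.elapsed() >= self.cooldown => State::HalfOpen,
            Some(_) => State::Open,
        }
    }

    pub fn call<T, E>(&mut self, f: impl FnOnce() -> Result<T, E>) -> Result<T, CallError<E>> {
        if self.state() == State::Open {
            return Err(CallError::Open);
        }
        match f() {
            Ok(value) => {
                // Any success (including a half-open probe) closes the breaker.
                self.consecutive_failures = 0;
                self.opened_at = None;
                Ok(value)
            }
            Err(err) => {
                self.consecutive_failures += 1;
                // Trip at the threshold; a failed half-open probe re-opens
                // immediately and restarts the cooldown.
                if self.opened_at.is_some() || self.consecutive_failures >= self.failure_threshold {
                    self.opened_at = Some(Instant::now());
                }
                Err(CallError::Inner(err))
            }
        }
    }
}
```

While open, callers get `CallError::Open` immediately instead of piling more load onto a failing dependency, which is what stops the failure from cascading.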
Error Hierarchy
```rust
#[derive(thiserror::Error, Debug)]
pub enum AppError {
    // User-facing
    #[error("Invalid input: {0}")]
    Validation(String),

    // Transient (retryable)
    #[error("Service temporarily unavailable")]
    ServiceUnavailable(#[source] reqwest::Error),

    // Internal (log details, show generic)
    #[error("Internal error")]
    Internal(#[source] anyhow::Error),
}

impl AppError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::ServiceUnavailable(_))
    }
}
```
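The "log details, show generic" split relies on `Display` carrying only the safe message while `source()` carries the details. `thiserror` generates both impls from the attributes above; the std-only reduction below writes them by hand so the split is visible. `DbError`, `user_message`, and `log_message` are hypothetical stand-ins, and `DbError` plays the role of the wrapped `anyhow::Error`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level error, standing in for a driver or library error.
#[derive(Debug)]
pub struct DbError(pub String);

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "db: {}", self.0)
    }
}

impl Error for DbError {}

#[derive(Debug)]
pub enum AppError {
    Validation(String),
    Internal(DbError),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Validation(msg) => write!(f, "Invalid input: {msg}"),
            // Generic on purpose: the details live in the source chain.
            AppError::Internal(_) => write!(f, "Internal error"),
        }
    }
}

impl Error for AppError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AppError::Internal(e) => Some(e),
            AppError::Validation(_) => None,
        }
    }
}

/// What the end user sees: only the top-level message.
pub fn user_message(err: &AppError) -> String {
    err.to_string()
}

/// What goes to the logs: the whole `source()` chain joined with ": ".
pub fn log_message(err: &dyn Error) -> String {
    let mut parts = vec![err.to_string()];
    let mut current = err.source();
    while let Some(e) = current {
        parts.push(e.to_string());
        current = e.source();
    }
    parts.join(": ")
}
```

The user only ever sees "Internal error", while the log line keeps the underlying cause for debugging.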
Retry Pattern
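```rust
use std::future::Future;
use std::time::Duration;

use tokio_retry::strategy::{jitter, ExponentialBackoff};
use tokio_retry::RetryIf;

/// Retries `f` with exponential backoff, but only while the error is
/// retryable; permanent errors are returned immediately.
async fn with_retry<F, Fut, T>(f: F) -> Result<T, AppError>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, AppError>>,
{
    // tokio-retry computes base^n * factor ms, so this gives 100ms, 200ms, 400ms, ...
    // (from_millis(100) alone would give 100ms, 10s, 1000s, ...).
    let strategy = ExponentialBackoff::from_millis(2)
        .factor(50)
        .max_delay(Duration::from_secs(10))
        .map(jitter) // randomize delays so clients don't retry in lockstep
        .take(5); // at most 5 retries after the first attempt

    RetryIf::spawn(strategy, f, |e: &AppError| e.is_retryable()).await
}
```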
Common Mistakes
| Mistake | Why Wrong | Better |
|---|---|---|
| One error type for everything | Callers can't tell how to react | Categorize by audience |
| Retry everything | Wastes resources on permanent failures | Retry only transient errors |
| Infinite retry | Self-inflicted DoS | Max attempts + backoff |
| Exposing internal errors | Security risk | User-friendly messages |
| No context | Hard to debug | `.context()` at every layer |
Anti-Patterns
| Anti-Pattern | Why Bad | Better |
|---|---|---|
| String errors | No structure | `thiserror` types |
| `panic!` for recoverable errors | Bad UX | `Result` with context |
| Ignoring errors | Silent failures | Log or propagate |
| `Box<dyn Error>` everywhere | Lost type info | `thiserror` enums |
| Errors on the happy path | Performance cost | Early validation |
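"Early validation" and "typed errors instead of strings" combine naturally: validate once at the boundary, return a typed error, and pass a validated type inward so downstream code has nothing left to check. The std-only sketch below uses hypothetical `Email` and `EmailError` types, and its checks are deliberately simplistic rather than real address validation.

```rust
use std::fmt;

/// Typed validation failures instead of free-form strings.
#[derive(Debug, PartialEq, Eq)]
pub enum EmailError {
    Empty,
    MissingAt,
    EmptyLocalPart,
    EmptyDomain,
}

impl fmt::Display for EmailError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            EmailError::Empty => "email is empty",
            EmailError::MissingAt => "email must contain '@'",
            EmailError::EmptyLocalPart => "email is missing the part before '@'",
            EmailError::EmptyDomain => "email is missing the domain after '@'",
        };
        f.write_str(msg)
    }
}

impl std::error::Error for EmailError {}

/// An address that passed validation; outside this module it can only be
/// constructed through `Email::parse`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Email(String);

impl Email {
    /// Simplistic checks for illustration only, not RFC 5322 validation.
    pub fn parse(raw: &str) -> Result<Self, EmailError> {
        let raw = raw.trim();
        if raw.is_empty() {
            return Err(EmailError::Empty);
        }
        let (local, domain) = raw.split_once('@').ok_or(EmailError::MissingAt)?;
        if local.is_empty() {
            return Err(EmailError::EmptyLocalPart);
        }
        if domain.is_empty() {
            return Err(EmailError::EmptyDomain);
        }
        Ok(Email(raw.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Downstream code takes `&Email`, so it has no format errors to handle.
pub fn welcome_message(to: &Email) -> String {
    format!("Welcome, {}!", to.as_str())
}
```

Each `EmailError` variant maps onto the user-facing `Validation` category, and its `Display` text is safe to show directly.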
Related Skills
| When | See |
|---|---|
| Error handling basics | m06-error-handling |
| Retry implementation | m07-concurrency |
| Domain modeling | m09-domain |
| User-facing APIs | domain-* |