LangChain Observability
Contents
- Overview
- Prerequisites
- Instructions
- Output
- Error Handling
- Examples
- Resources
Overview
Set up comprehensive observability for LangChain applications with LangSmith, OpenTelemetry, Prometheus, and structured logging.
Prerequisites
- LangChain application in staging/production
- LangSmith account (optional but recommended)
- Prometheus/Grafana infrastructure
- OpenTelemetry collector (optional)
Instructions
Step 1: Enable LangSmith Tracing
Set LANGCHAIN_TRACING_V2=true and configure the LANGCHAIN_API_KEY and LANGCHAIN_PROJECT environment variables. Once these are set, chain and LLM runs are traced to LangSmith automatically, with no code changes.
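As a minimal sketch, the same variables can be set from Python before any chain is constructed. The helper name, the key placeholder, and the project name below are illustrative assumptions, not values from this guide.

```python
import os


def enable_langsmith_tracing(api_key: str, project: str) -> None:
    """Set the environment variables that switch on LangSmith tracing."""
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = api_key
    os.environ["LANGCHAIN_PROJECT"] = project


# Placeholder values: load the real key from a secret store, not source code.
enable_langsmith_tracing(
    api_key="<your-langsmith-api-key>",
    project="my-app-staging",  # hypothetical project name
)
```

Setting the variables in the deployment environment (shell profile, container spec, or secret manager) is usually preferable to setting them in code.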
Step 2: Add Prometheus Metrics
Create a PrometheusCallback handler that tracks the langchain_llm_requests_total and langchain_llm_tokens_total counters and the langchain_llm_latency_seconds histogram.
Step 3: Integrate OpenTelemetry
Use OTLPSpanExporter with a custom OpenTelemetryCallback to add spans for chain and LLM operations with parent-child relationships.
Step 4: Configure Structured Logging
Use structlog with a StructuredLoggingCallback to emit JSON logs for all LLM start/end/error events.
Step 5: Set Up Grafana Dashboard
Create panels for request rate, P95 latency, token usage, and error rate using Prometheus queries.
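The four panels could be backed by PromQL queries like those in the sketch below, which also assembles a trimmed dashboard model. It assumes the requests counter carries a status label and the tokens counter a type label; adjust the selectors to your own labels. The output is an illustrative skeleton, not a complete Grafana export.

```python
import json
from typing import Any, Dict

# PromQL per panel. Label names (status, type) are assumptions.
PANEL_QUERIES: Dict[str, str] = {
    "Request rate (req/s)": "sum(rate(langchain_llm_requests_total[5m]))",
    "P95 latency (s)": (
        "histogram_quantile(0.95, "
        "sum by (le) (rate(langchain_llm_latency_seconds_bucket[5m])))"
    ),
    "Token usage (tokens/s)": (
        "sum by (type) (rate(langchain_llm_tokens_total[5m]))"
    ),
    "Error rate": (
        'sum(rate(langchain_llm_requests_total{status="error"}[5m])) '
        "/ sum(rate(langchain_llm_requests_total[5m]))"
    ),
}


def build_dashboard(title: str = "LangChain Observability") -> Dict[str, Any]:
    """Lay the panels out in a two-column grid of time series charts."""
    panels = []
    for index, (panel_title, expr) in enumerate(PANEL_QUERIES.items()):
        panels.append(
            {
                "id": index + 1,
                "title": panel_title,
                "type": "timeseries",
                "gridPos": {
                    "h": 8,
                    "w": 12,
                    "x": 12 * (index % 2),
                    "y": 8 * (index // 2),
                },
                "targets": [{"expr": expr, "refId": "A"}],
            }
        )
    return {"title": title, "panels": panels}


dashboard_json = json.dumps(build_dashboard(), indent=2)
```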
Step 6: Configure Alerting Rules
Define Prometheus alerts for high error rate (>5%), high latency (P95 >5s), and token budget exceeded.
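The three alerts can be sketched as a Prometheus rule group built in Python, as below. The thresholds for error rate and latency come from this step; the alert names, the status label, the for durations, the severity labels, and the daily token budget are hypothetical values to replace with your own.

```python
from typing import Any, Dict

# Hypothetical daily budget; the guide does not specify a value.
TOKEN_BUDGET_PER_DAY = 1_000_000


def build_alert_rules(token_budget: int = TOKEN_BUDGET_PER_DAY) -> Dict[str, Any]:
    """Return a rule group mirroring the Prometheus rule-file layout."""
    error_ratio = (
        'sum(rate(langchain_llm_requests_total{status="error"}[5m])) '
        "/ sum(rate(langchain_llm_requests_total[5m]))"
    )
    p95_latency = (
        "histogram_quantile(0.95, "
        "sum by (le) (rate(langchain_llm_latency_seconds_bucket[5m])))"
    )
    tokens_per_day = "sum(increase(langchain_llm_tokens_total[1d]))"
    rules = [
        {
            "alert": "LangChainHighErrorRate",
            "expr": f"{error_ratio} > 0.05",
            "for": "5m",
            "labels": {"severity": "critical"},
            "annotations": {"summary": "LLM error rate above 5%"},
        },
        {
            "alert": "LangChainHighLatency",
            "expr": f"{p95_latency} > 5",
            "for": "10m",
            "labels": {"severity": "warning"},
            "annotations": {"summary": "LLM P95 latency above 5s"},
        },
        {
            "alert": "LangChainTokenBudgetExceeded",
            "expr": f"{tokens_per_day} > {token_budget}",
            "labels": {"severity": "warning"},
            "annotations": {"summary": "Daily token budget exceeded"},
        },
    ]
    return {"groups": [{"name": "langchain", "rules": rules}]}


alert_rules = build_alert_rules()
```

Prometheus rule files are written in YAML, so the dictionary would be serialized (for example with PyYAML's yaml.safe_dump) before being loaded. The for durations keep a brief spike from firing an alert.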
See detailed implementation for complete callback code, dashboard JSON, and alert rules.
Output
- LangSmith tracing enabled
- Prometheus metrics exported
- OpenTelemetry spans
- Structured logging
- Grafana dashboard and alerts
Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Missing metrics | Callback not attached | Pass the callback to the LLM constructor |
| Trace gaps | Missing context propagation | Check parent span handling |
| Alert storms | Thresholds too sensitive | Tune the alert's for duration and thresholds |
Examples
Basic usage: Enable LangSmith tracing through environment variables and attach the Prometheus callback with its default configuration.
Advanced scenario: For production, combine all four signals (LangSmith traces, Prometheus metrics, OpenTelemetry spans, and structured logs) and tune dashboards and alert thresholds to team-specific requirements.
Resources
- LangSmith Documentation
- OpenTelemetry Python
- Prometheus Python Client
Next Steps
Use langchain-incident-runbook for incident response procedures.