Phoenix - AI Observability Platform
Collaborating skills
- AI Engineering: skill: ai-engineering, for building the LLM applications that Phoenix observes
Open-source AI observability and evaluation platform for LLM applications with tracing, evaluation, datasets, experiments, and real-time monitoring.
When to Use Phoenix
- Debugging LLM applications with detailed traces and span analysis
- Running systematic evaluations on datasets with LLM-as-judge
- Monitoring production LLM systems with real-time insights
- Building experiment pipelines for prompt/model comparison
- Self-hosted observability without vendor lock-in
Key Features
- Tracing: OpenTelemetry-based trace collection for any LLM framework
- Evaluation: LLM-as-judge evaluators for quality assessment
- Datasets: Versioned test sets for regression testing
- Experiments: Compare prompts, models, and configurations
- Open-source: Self-hosted with PostgreSQL or SQLite
Quick Start
Installation
pip install arize-phoenix
# With specific features
pip install "arize-phoenix[embeddings]" # Embedding analysis (quoted so shells such as zsh do not expand the brackets)
pip install arize-phoenix-otel # OpenTelemetry config
pip install arize-phoenix-evals # Evaluation framework
Launch Phoenix Server
import phoenix as px
# Launch in notebook
session = px.launch_app()
# View UI
session.view() # Embedded iframe
print(session.url) # http://localhost:6006
Command-line Server
# Start Phoenix server
phoenix serve
# With PostgreSQL backend
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host/db"
phoenix serve --port 6006
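If the database password contains characters such as `@` or `/`, the connection URL needs them percent-encoded. The following is a minimal sketch using only the standard library; the helper name `build_postgres_url`, the default port 5432, and the sample credentials are assumptions for illustration and are not part of Phoenix.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, db: str, port: int = 5432) -> str:
    """Build a PostgreSQL URL, percent-encoding the credentials and database name."""
    # safe="" also encodes "/" and "@", which would otherwise break URL parsing.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(db, safe='')}"
    )


if __name__ == "__main__":
    # Hypothetical credentials; the result could be exported as PHOENIX_SQL_DATABASE_URL.
    print(build_postgres_url("phoenix", "p@ss/word", "db.internal", "phoenix"))
```

The printed value can then be assigned to PHOENIX_SQL_DATABASE_URL before running `phoenix serve`.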
Basic Tracing
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor
# Configure OpenTelemetry with Phoenix
tracer_provider = register(
    project_name="my-llm-app",
    endpoint="http://localhost:6006/v1/traces"
)
# Instrument OpenAI SDK
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
# All OpenAI calls are now traced
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
Custom Agents with Decorators
For framework-agnostic agentic systems, use @tracer.agent, @tracer.chain, and @tracer.tool decorators:
from phoenix.otel import register
tracer_provider = register(project_name="custom-agent")
# The tracer obtained from the Phoenix tracer provider exposes the decorators
tracer = tracer_provider.get_tracer(__name__)
@tracer.agent
def my_agent(query: str) -> str:
    context = search_tool(query)
    return synthesize_tool(context, query)
@tracer.tool
def search_tool(query: str) -> list:
    # vector_store is a placeholder for your retrieval backend
    return vector_store.search(query)
@tracer.tool
def synthesize_tool(context: list, query: str) -> str:
    # llm is a placeholder for your model client
    return llm.generate(query, context)
For detailed tracing patterns, see tracing-setup.md.
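To make the decorator pattern concrete, here is a simplified stand-in written with the standard library only. It is not Phoenix's implementation: the `traced` decorator, the `RECORDED_SPANS` list, and the span dictionary fields are all hypothetical, and real spans are exported over OpenTelemetry rather than appended to a list.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory stand-in for a span exporter (hypothetical).
RECORDED_SPANS: List[Dict[str, Any]] = []


def traced(kind: str) -> Callable:
    """Return a decorator that records one span-like dict per call."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "OK"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "ERROR"
                raise
            finally:
                # Runs on success and on failure, so every call is recorded.
                RECORDED_SPANS.append(
                    {
                        "name": func.__name__,
                        "kind": kind,
                        "status": status,
                        "duration_s": time.perf_counter() - start,
                    }
                )

        return wrapper

    return decorator


@traced("TOOL")
def search_tool(query: str) -> list:
    return [f"document about {query}"]


@traced("AGENT")
def my_agent(query: str) -> str:
    return f"answer based on {search_tool(query)[0]}"


if __name__ == "__main__":
    my_agent("phoenix")
    # Inner calls finish first, so the tool span is recorded before the agent span.
    print([(s["name"], s["kind"]) for s in RECORDED_SPANS])
```

The span kind passed to the decorator plays the role that the choice between the agent, chain, and tool decorators plays in the real API.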
Storage Backends
Phoenix supports both SQLite and PostgreSQL for persistent storage:
- SQLite: Simple, file-based storage (default, ideal for development)
- PostgreSQL: Production-ready database for scalability and concurrent access
For detailed configuration examples, see storage-backends.md.
Docker Deployment
For containerized deployment, see docker-deployment.md for:
- Docker compose files for both SQLite and PostgreSQL
- Production-ready configuration
- Multi-container setup
Tracing Setup
For comprehensive tracing setup with OpenTelemetry, see tracing-setup.md:
- Framework-agnostic decorators: @tracer.agent, @tracer.chain, @tracer.tool for custom agents
- Manual instrumentation with OpenTelemetry API
- Automatic instrumentation for LLM frameworks
- Distributed tracing for multi-service applications
- Custom span attributes and context propagation
Framework Integrations
Phoenix provides auto-instrumentation for many LLM frameworks. For detailed integration guides, see:
- framework-integrations.md: Complete list of supported frameworks
- DSPy, LangChain, LlamaIndex, Agno, AutoGen, CrewAI, and more
- Provider-specific integrations (OpenAI, Anthropic, Bedrock, etc.)
- Platform integrations (Dify, Flowise, LangFlow)
Core Concepts
Traces and Spans
A trace represents a complete execution flow, while spans are individual operations within that trace.
from phoenix.otel import register
from opentelemetry import trace
# Setup tracing
tracer_provider = register(project_name="my-app")
tracer = trace.get_tracer(__name__)
# Create custom spans
with tracer.start_as_current_span("process_query") as span:
    span.set_attribute("input.value", query)
    # Child spans are automatically nested
    with tracer.start_as_current_span("retrieve_context"):
        context = retriever.search(query)
    with tracer.start_as_current_span("generate_response"):
        response = llm.generate(query, context)
    span.set_attribute("output.value", response)
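The trace and span relationship can be pictured as a tree in which each span points at its parent. The sketch below rebuilds that tree from a flat list of spans; the `SpanRecord` dataclass and `render_trace` helper are hypothetical names used only for illustration and do not correspond to Phoenix or OpenTelemetry types.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpanRecord:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str


def render_trace(spans: List[SpanRecord]) -> List[str]:
    """Return indented span names, depth-first from the root span(s)."""
    children: Dict[Optional[str], List[SpanRecord]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span.name)
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    trace_spans = [
        SpanRecord("a", None, "process_query"),
        SpanRecord("b", "a", "retrieve_context"),
        SpanRecord("c", "a", "generate_response"),
    ]
    print("\n".join(render_trace(trace_spans)))
```

With the three spans shown, the root `process_query` is printed first and its two children are indented beneath it.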
Projects
Projects organize related traces:
import os
os.environ["PHOENIX_PROJECT_NAME"] = "production-chatbot"
# Or set it when registering the tracer provider
from phoenix.otel import register
tracer_provider = register(project_name="experiment-v2")
Evaluation Framework
Built-in Evaluators
from phoenix.evals import (
    OpenAIModel,
    HallucinationEvaluator,
    RelevanceEvaluator,
    ToxicityEvaluator,
)
# Setup model for evaluation
eval_model = OpenAIModel(model="gpt-4o")
# Evaluate hallucination
hallucination_eval = HallucinationEvaluator(eval_model)
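# evaluate() takes a single record (a mapping) and returns a (label, score, explanation) tuple
label, score, explanation = hallucination_eval.evaluate(
    {
        "input": "What is the capital of France?",
        "output": "The capital of France is Paris.",
        "reference": "Paris is the capital of France.",
    },
    provide_explanation=True,
)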
Run Evaluations on Dataset
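from phoenix import Client
from phoenix.evals import HallucinationEvaluator, RelevanceEvaluator, run_evals
from phoenix.trace import SpanEvaluations
client = Client()
# Get spans to evaluate
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'"
)
# Run evaluations (eval_model is the model defined under Built-in Evaluators);
# run_evals returns one DataFrame per evaluator, in the same order
hallucination_df, relevance_df = run_evals(
    dataframe=spans_df,
    evaluators=[
        HallucinationEvaluator(eval_model),
        RelevanceEvaluator(eval_model)
    ],
    provide_explanation=True
)
# Log results back to Phoenix, one named evaluation per DataFrame
client.log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_df),
    SpanEvaluations(eval_name="Relevance", dataframe=relevance_df),
)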
Client API
Query Traces and Spans
from phoenix import Client
client = Client(endpoint="http://localhost:6006")
# Get spans as DataFrame
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'",
    limit=1000
)
# Get specific span
span = client.get_span(span_id="abc123")
# Get trace
trace = client.get_trace(trace_id="xyz789")
Log Feedback
from phoenix import Client
client = Client()
# Log user feedback
client.log_annotation(
    span_id="abc123",
    name="user_rating",
    annotator_kind="HUMAN",
    score=0.8,
    label="helpful",
    metadata={"comment": "Good response"}
)
Environment Variables
| Variable | Description | Default |
|---|---|---|
| PHOENIX_PORT | HTTP server port | 6006 |
| PHOENIX_HOST | Server bind address | 127.0.0.1 |
| PHOENIX_GRPC_PORT | gRPC/OTLP port | 4317 |
| PHOENIX_SQL_DATABASE_URL | Database connection | SQLite temp |
| PHOENIX_WORKING_DIR | Data storage directory | OS temp |
| PHOENIX_ENABLE_AUTH | Enable authentication | false |
| PHOENIX_SECRET | JWT signing secret | Required if auth enabled |
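A client-side script can read the same variables to work out where to send traces. The sketch below applies the defaults listed in the table; the `PhoenixSettings` dataclass and `load_settings` helper are hypothetical, and the way the boolean flag is parsed here (only the string "true" enables auth) is an assumption rather than Phoenix's documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PhoenixSettings:
    host: str
    port: int
    grpc_port: int
    auth_enabled: bool

    @property
    def http_traces_endpoint(self) -> str:
        # OTLP-over-HTTP path used by the tracing examples in this document.
        return f"http://{self.host}:{self.port}/v1/traces"


def load_settings(env: Mapping[str, str] = os.environ) -> PhoenixSettings:
    """Read Phoenix variables, falling back to the defaults from the table."""
    auth = env.get("PHOENIX_ENABLE_AUTH", "false").strip().lower() == "true"
    if auth and not env.get("PHOENIX_SECRET"):
        raise ValueError("PHOENIX_SECRET is required when PHOENIX_ENABLE_AUTH is true")
    return PhoenixSettings(
        host=env.get("PHOENIX_HOST", "127.0.0.1"),
        port=int(env.get("PHOENIX_PORT", "6006")),
        grpc_port=int(env.get("PHOENIX_GRPC_PORT", "4317")),
        auth_enabled=auth,
    )


if __name__ == "__main__":
    print(load_settings().http_traces_endpoint)
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.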
Best Practices
- Use projects: Separate traces by environment (dev/staging/prod)
- Add metadata: Include user IDs, session IDs for debugging
- Evaluate regularly: Run automated evaluations in CI/CD
- Version datasets: Track test set changes over time
- Monitor costs: Track token usage via Phoenix dashboards
- Self-host: Use PostgreSQL for production deployments
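For the "Evaluate regularly" practice, one possible shape for a CI/CD check is a small gate that turns evaluation labels into a pass or fail exit code. This is a sketch under stated assumptions: the `pass_rate` and `gate` helpers, the row format (dictionaries with a "label" key), the sample labels, and the threshold are all hypothetical and would need to be adapted to the output of a real evaluation run.

```python
from typing import Iterable, List, Mapping


def pass_rate(results: Iterable[Mapping[str, str]], passing_label: str) -> float:
    """Fraction of evaluation rows whose label equals passing_label."""
    rows: List[Mapping[str, str]] = list(results)
    if not rows:
        raise ValueError("no evaluation results to score")
    passed = sum(1 for row in rows if row.get("label") == passing_label)
    return passed / len(rows)


def gate(results: Iterable[Mapping[str, str]], passing_label: str, threshold: float) -> int:
    """Return an exit code: 0 if the pass rate meets the threshold, else 1."""
    rate = pass_rate(results, passing_label)
    print(f"pass rate: {rate:.2%} (threshold {threshold:.2%})")
    return 0 if rate >= threshold else 1


if __name__ == "__main__":
    # Hypothetical rows; in practice these would come from an evaluation run.
    sample = [{"label": "factual"}, {"label": "factual"}, {"label": "hallucinated"}]
    print("exit code:", gate(sample, passing_label="factual", threshold=0.9))
```

In a pipeline, the returned code could be handed to `sys.exit` so that a drop below the threshold fails the build.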
Common Issues
Traces Not Appearing
from phoenix.otel import register
# Verify endpoint
tracer_provider = register(
    project_name="my-app",
    endpoint="http://localhost:6006/v1/traces" # Correct endpoint
)
# Force flush
from opentelemetry import trace
trace.get_tracer_provider().force_flush()
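Before digging into instrumentation, it can help to rule out basic connectivity problems. The sketch below checks that an endpoint URL has the expected shape and that something is listening on its host and port. The `check_endpoint` helper is hypothetical and uses only the standard library; a successful TCP connection shows that a server is reachable, not that it is Phoenix or that traces are being accepted.

```python
import socket
from typing import List
from urllib.parse import urlparse


def check_endpoint(endpoint: str, timeout: float = 2.0) -> List[str]:
    """Return a list of human-readable problems found with a collector endpoint."""
    problems: List[str] = []
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parsed.scheme!r}")
    if not parsed.path.endswith("/v1/traces"):
        problems.append("path does not end with /v1/traces")
    host = parsed.hostname
    if host is None:
        problems.append("no host in endpoint")
        return problems
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        # A successful TCP connect only shows that something is listening.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot connect to {host}:{port}: {exc}")
    return problems


if __name__ == "__main__":
    for problem in check_endpoint("http://localhost:6006/v1/traces"):
        print("problem:", problem)
```

An empty result means the URL looks right and the port accepted a connection; any remaining issue is then more likely in the instrumentation or in unflushed spans.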
Database Connection Issues
# Verify PostgreSQL connection
psql $PHOENIX_SQL_DATABASE_URL -c "SELECT 1"
# Check Phoenix logs
phoenix serve --log-level debug
Resources
- Documentation: https://docs.arize.com/phoenix
- Repository: https://github.com/Arize-ai/phoenix
- Docker Hub: https://hub.docker.com/r/arizephoenix/phoenix
- Version: 12.0.0+
- License: Apache 2.0