Phoenix - AI Observability Platform

Collaborating skills

AI Engineering: skill: ai-engineering for building the LLM applications that Phoenix observes

Open-source AI observability and evaluation platform for LLM applications with tracing, evaluation, datasets, experiments, and real-time monitoring.

When to Use Phoenix

Debugging LLM applications with detailed traces and span analysis
Running systematic evaluations on datasets with LLM-as-judge
Monitoring production LLM systems with real-time insights
Building experiment pipelines for prompt/model comparison
Self-hosted observability without vendor lock-in

Key Features

Tracing: OpenTelemetry-based trace collection for any LLM framework
Evaluation: LLM-as-judge evaluators for quality assessment
Datasets: Versioned test sets for regression testing
Experiments: Compare prompts, models, and configurations
Open-source: Self-hosted with PostgreSQL or SQLite

Quick Start

Installation

pip install arize-phoenix
# With specific features
pip install arize-phoenix[embeddings]  # Embedding analysis
pip install arize-phoenix-otel         # OpenTelemetry config
pip install arize-phoenix-evals        # Evaluation framework

Launch Phoenix Server

import phoenix as px
# Launch in notebook
session = px.launch_app()
# View UI
session.view()  # Embedded iframe
print(session.url)  # http://localhost:6006

Command-line Server

# Start Phoenix server
phoenix serve

# With PostgreSQL backend
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host/db"
phoenix serve --port 6006

Basic Tracing

from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Configure OpenTelemetry with Phoenix
tracer_provider = register(
    project_name="my-llm-app",
    endpoint="http://localhost:6006/v1/traces"
)

# Instrument OpenAI SDK
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# All OpenAI calls are now traced
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

Custom Agents with Decorators

For framework-agnostic agentic systems, use @tracer.agent, @tracer.chain, and @tracer.tool decorators:

from openinference.instrumentation import Instrumentor
from phoenix.otel import register

tracer_provider = register(project_name="custom-agent")
instrumentor = Instrumentor(tracer_provider=tracer_provider)

@instrumentor.agent
def my_agent(query: str) -> str:
    context = search_tool(query)
    return synthesize_tool(context, query)

@instrumentor.tool
def search_tool(query: str) -> list:
    return vector_store.search(query)

@instrumentor.tool
def synthesize_tool(context: list, query: str) -> str:
    return llm.generate(query, context)

For detailed tracing patterns, see tracing-setup.md.

Storage Backends

Phoenix supports both SQLite and PostgreSQL for persistent storage:

SQLite: Simple, file-based storage (default, ideal for development)
PostgreSQL: Production-ready database for scalability and concurrent access

For detailed configuration examples, see storage-backends.md.

Docker Deployment

For containerized deployment, see docker-deployment.md for:

Docker compose files for both SQLite and PostgreSQL
Production-ready configuration
Multi-container setup

Tracing Setup

For comprehensive tracing setup with OpenTelemetry, see tracing-setup.md:

Framework-agnostic decorators: @tracer.agent, @tracer.chain, @tracer.tool for custom agents
Manual instrumentation with OpenTelemetry API
Automatic instrumentation for LLM frameworks
Distributed tracing for multi-service applications
Custom span attributes and context propagation

Framework Integrations

Phoenix provides auto-instrumentation for many LLM frameworks. For detailed integration guides, see:

framework-integrations.md: Complete list of supported frameworks
- DSPy, LangChain, LlamaIndex, Agno, AutoGen, CrewAI, and more
- Provider-specific integrations (OpenAI, Anthropic, Bedrock, etc.)
- Platform integrations (Dify, Flowise, LangFlow)

Core Concepts

Traces and Spans

A trace represents a complete execution flow, while spans are individual operations within that trace.

from phoenix.otel import register
from opentelemetry import trace

# Setup tracing
tracer_provider = register(project_name="my-app")
tracer = trace.get_tracer(__name__)

# Create custom spans
with tracer.start_as_current_span("process_query") as span:
    span.set_attribute("input.value", query)
    # Child spans are automatically nested
    with tracer.start_as_current_span("retrieve_context"):
        context = retriever.search(query)
    with tracer.start_as_current_span("generate_response"):
        response = llm.generate(query, context)
    span.set_attribute("output.value", response)

Projects

Projects organize related traces:

import os
os.environ["PHOENIX_PROJECT_NAME"] = "production-chatbot"

# Or per-trace
from phoenix.otel import register
tracer_provider = register(project_name="experiment-v2")

Evaluation Framework

Built-in Evaluators

from phoenix.evals import (
    OpenAIModel,
    HallucinationEvaluator,
    RelevanceEvaluator,
    ToxicityEvaluator,
)

# Setup model for evaluation
eval_model = OpenAIModel(model="gpt-4o")

# Evaluate hallucination
hallucination_eval = HallucinationEvaluator(eval_model)
results = hallucination_eval.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    reference="Paris is the capital of France."
)

Run Evaluations on Dataset

from phoenix import Client
from phoenix.evals import run_evals

client = Client()

# Get spans to evaluate
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'"
)

# Run evaluations
eval_results = run_evals(
    dataframe=spans_df,
    evaluators=[
        HallucinationEvaluator(eval_model),
        RelevanceEvaluator(eval_model)
    ],
    provide_explanation=True
)

# Log results back to Phoenix
client.log_evaluations(eval_results)

Client API

Query Traces and Spans

from phoenix import Client

client = Client(endpoint="http://localhost:6006")

# Get spans as DataFrame
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'",
    limit=1000
)

# Get specific span
span = client.get_span(span_id="abc123")

# Get trace
trace = client.get_trace(trace_id="xyz789")

Log Feedback

from phoenix import Client

client = Client()

# Log user feedback
client.log_annotation(
    span_id="abc123",
    name="user_rating",
    annotator_kind="HUMAN",
    score=0.8,
    label="helpful",
    metadata={"comment": "Good response"}
)

Environment Variables

Variable	Description	Default
`PHOENIX_PORT`	HTTP server port	`6006`
`PHOENIX_HOST`	Server bind address	`127.0.0.1`
`PHOENIX_GRPC_PORT`	gRPC/OTLP port	`4317`
`PHOENIX_SQL_DATABASE_URL`	Database connection	SQLite temp
`PHOENIX_WORKING_DIR`	Data storage directory	OS temp
`PHOENIX_ENABLE_AUTH`	Enable authentication	`false`
`PHOENIX_SECRET`	JWT signing secret	Required if auth enabled

Best Practices

Use projects: Separate traces by environment (dev/staging/prod)
Add metadata: Include user IDs, session IDs for debugging
Evaluate regularly: Run automated evaluations in CI/CD
Version datasets: Track test set changes over time
Monitor costs: Track token usage via Phoenix dashboards
Self-host: Use PostgreSQL for production deployments

Common Issues

Traces Not Appearing

from phoenix.otel import register

# Verify endpoint
tracer_provider = register(
    project_name="my-app",
    endpoint="http://localhost:6006/v1/traces"  # Correct endpoint
)

# Force flush
from opentelemetry import trace
trace.get_tracer_provider().force_flush()

Database Connection Issues

# Verify PostgreSQL connection
psql $PHOENIX_SQL_DATABASE_URL -c "SELECT 1"

# Check Phoenix logs
phoenix serve --log-level debug

Resources

Documentation: https://docs.arize.com/phoenix
Repository: https://github.com/Arize-ai/phoenix
Docker Hub: https://hub.docker.com/r/arizephoenix/phoenix
Version: 12.0.0+
License: Apache 2.0

phoenix-observability

Safety Notice

Copy this and send it to your AI assistant to learn