phoenix-observability

Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, monitoring production AI systems, or setting up observability infrastructure for agentic systems.

PROACTIVE ACTIVATION: Auto-invoke when implementing observability/tracing for LLM agents, setting up evaluation pipelines, or configuring OpenTelemetry instrumentation.

DETECTION: Check for arize-phoenix imports, OpenTelemetry setup, or observability-related code.

USE CASES: Debugging LLM apps, running evaluations, monitoring production systems, setting up tracing infrastructure, instrumenting agent frameworks, tracing custom agents with decorators (@tracer.agent, @tracer.chain, @tracer.tool).

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "phoenix-observability" with this command: npx skills add mguinada/agent-skills/mguinada-agent-skills-phoenix-observability

Phoenix - AI Observability Platform

Collaborating skills

  • AI Engineering: use the ai-engineering skill to build the LLM applications that Phoenix observes

Open-source AI observability and evaluation platform for LLM applications with tracing, evaluation, datasets, experiments, and real-time monitoring.

When to Use Phoenix

  • Debugging LLM applications with detailed traces and span analysis
  • Running systematic evaluations on datasets with LLM-as-judge
  • Monitoring production LLM systems with real-time insights
  • Building experiment pipelines for prompt/model comparison
  • Self-hosted observability without vendor lock-in

Key Features

  • Tracing: OpenTelemetry-based trace collection for any LLM framework
  • Evaluation: LLM-as-judge evaluators for quality assessment
  • Datasets: Versioned test sets for regression testing
  • Experiments: Compare prompts, models, and configurations
  • Open-source: Self-hosted with PostgreSQL or SQLite

Quick Start

Installation

pip install arize-phoenix
# Optional extra and companion packages
pip install "arize-phoenix[embeddings]"  # Embedding analysis (quoted so shells such as zsh do not expand the brackets)
pip install arize-phoenix-otel           # OpenTelemetry config
pip install arize-phoenix-evals          # Evaluation framework

Launch Phoenix Server

import phoenix as px
# Launch in notebook
session = px.launch_app()
# View UI
session.view()  # Embedded iframe
print(session.url)  # http://localhost:6006

Command-line Server

# Start Phoenix server
phoenix serve

# With PostgreSQL backend
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host/db"
phoenix serve --port 6006

Basic Tracing

# Requires: pip install arize-phoenix-otel openinference-instrumentation-openai openai
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

# Configure OpenTelemetry with Phoenix
tracer_provider = register(
    project_name="my-llm-app",
    endpoint="http://localhost:6006/v1/traces"
)

# Instrument OpenAI SDK
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# OpenAI calls made after instrumentation are traced
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

Custom Agents with Decorators

For framework-agnostic agentic systems, use the @tracer.agent, @tracer.chain, and @tracer.tool decorators on a tracer obtained from the registered provider:

from phoenix.otel import register

tracer_provider = register(project_name="custom-agent")
tracer = tracer_provider.get_tracer(__name__)

# vector_store and llm are placeholders for your own retriever and model client

@tracer.tool
def search_tool(query: str) -> list:
    return vector_store.search(query)

@tracer.tool
def synthesize_tool(context: list, query: str) -> str:
    return llm.generate(query, context)

@tracer.agent
def my_agent(query: str) -> str:
    context = search_tool(query)
    return synthesize_tool(context, query)

For detailed tracing patterns, see tracing-setup.md.

Storage Backends

Phoenix supports both SQLite and PostgreSQL for persistent storage:

  • SQLite: Simple, file-based storage (default, ideal for development)
  • PostgreSQL: Production-ready database for scalability and concurrent access

For detailed configuration examples, see storage-backends.md.
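
As a rough illustration of switching between the two backends, the sketch below builds a value for PHOENIX_SQL_DATABASE_URL. The helper name `phoenix_database_url` and its parameters are hypothetical. The `postgresql://user:pass@host/db` shape follows the Command-line Server example, while the `sqlite:///` form is an assumption based on SQLAlchemy-style URLs and should be checked against storage-backends.md.

```python
import os
from typing import Optional
from urllib.parse import quote


def phoenix_database_url(
    backend: str,
    *,
    sqlite_path: str = "phoenix.db",
    user: Optional[str] = None,
    password: Optional[str] = None,
    host: str = "localhost",
    database: str = "phoenix",
) -> str:
    """Return a database URL for the chosen backend (hypothetical helper)."""
    if backend == "sqlite":
        # Assumed SQLAlchemy-style SQLite URL pointing at a local file.
        return f"sqlite:///{sqlite_path}"
    if backend == "postgresql":
        if not user or not password:
            raise ValueError("PostgreSQL needs both user and password")
        # Percent-encode credentials so characters like '@' or ':' stay valid.
        creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
        return f"postgresql://{creds}@{host}/{database}"
    raise ValueError(f"unsupported backend: {backend!r}")


# Export the URL in the environment that will later run `phoenix serve`.
os.environ["PHOENIX_SQL_DATABASE_URL"] = phoenix_database_url(
    "postgresql", user="user", password="pass", host="host", database="db"
)
```

Keeping the URL construction in one place makes it harder to ship a development SQLite path to production by accident.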

Docker Deployment

For containerized deployment, see docker-deployment.md for:

  • Docker compose files for both SQLite and PostgreSQL
  • Production-ready configuration
  • Multi-container setup

Tracing Setup

For comprehensive tracing setup with OpenTelemetry, see tracing-setup.md:

  • Framework-agnostic decorators: @tracer.agent, @tracer.chain, @tracer.tool for custom agents
  • Manual instrumentation with OpenTelemetry API
  • Automatic instrumentation for LLM frameworks
  • Distributed tracing for multi-service applications
  • Custom span attributes and context propagation

Framework Integrations

Phoenix provides auto-instrumentation for many LLM frameworks. For detailed integration guides, see:

  • framework-integrations.md: Complete list of supported frameworks
    • DSPy, LangChain, LlamaIndex, Agno, AutoGen, CrewAI, and more
    • Provider-specific integrations (OpenAI, Anthropic, Bedrock, etc.)
    • Platform integrations (Dify, Flowise, LangFlow)

Core Concepts

Traces and Spans

A trace represents a complete execution flow, while spans are individual operations within that trace.

from phoenix.otel import register
from opentelemetry import trace

# Setup tracing
tracer_provider = register(project_name="my-app")
tracer = trace.get_tracer(__name__)

# Create custom spans
with tracer.start_as_current_span("process_query") as span:
    span.set_attribute("input.value", query)
    # Child spans are automatically nested
    with tracer.start_as_current_span("retrieve_context"):
        context = retriever.search(query)
    with tracer.start_as_current_span("generate_response"):
        response = llm.generate(query, context)
    span.set_attribute("output.value", response)

Projects

Projects organize related traces:

import os
from phoenix.otel import register

# Set the default project for the whole process
os.environ["PHOENIX_PROJECT_NAME"] = "production-chatbot"

# Or set it explicitly when registering the tracer provider
tracer_provider = register(project_name="experiment-v2")
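
One way to keep environments apart is to derive the project name from a deployment setting. This is a minimal sketch, assuming a hypothetical `APP_ENV` variable and a hypothetical `project_name_for` helper; only `PHOENIX_PROJECT_NAME` comes from the text above.

```python
import os

ALLOWED_ENVS = ("dev", "staging", "prod")


def project_name_for(app: str, env: str) -> str:
    """Build a per-environment project name such as 'chatbot-prod'."""
    env = env.strip().lower()
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{app}-{env}"


# APP_ENV is an assumed variable name; fall back to 'dev' when it is unset.
os.environ["PHOENIX_PROJECT_NAME"] = project_name_for(
    "chatbot", os.environ.get("APP_ENV", "dev")
)
```

Rejecting unknown environment names early avoids scattering traces across misspelled projects.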

Evaluation Framework

Built-in Evaluators

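from phoenix.evals import (
    OpenAIModel,
    HallucinationEvaluator,
    RelevanceEvaluator,
    ToxicityEvaluator,
)

# Setup model for evaluation
eval_model = OpenAIModel(model="gpt-4o")

# Evaluate hallucination on a single record.
# evaluate() takes the record as a mapping and returns (label, score, explanation);
# check the signature against your installed phoenix.evals version.
hallucination_eval = HallucinationEvaluator(eval_model)
label, score, explanation = hallucination_eval.evaluate(
    {
        "input": "What is the capital of France?",
        "output": "The capital of France is Paris.",
        "reference": "Paris is the capital of France.",
    },
    provide_explanation=True,
)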
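
Conceptually, an LLM-as-judge evaluator fills a prompt template with the record, asks a model for a label, and maps that label to a score. The sketch below illustrates that flow with a stubbed judge. It is not Phoenix's implementation, and `TEMPLATE`, `SCORES`, `judge_record`, and `fake_judge` are hypothetical names.

```python
from typing import Callable, Dict, Tuple

TEMPLATE = (
    "Question: {input}\n"
    "Reference: {reference}\n"
    "Answer: {output}\n"
    "Is the answer supported by the reference? Reply 'factual' or 'hallucinated'."
)
# Assumed label-to-score mapping for this sketch only.
SCORES = {"factual": 1.0, "hallucinated": 0.0}


def judge_record(record: Dict[str, str], judge: Callable[[str], str]) -> Tuple[str, float]:
    """Format the prompt, ask the judge for a label, and map it to a score."""
    prompt = TEMPLATE.format(**record)
    label = judge(prompt).strip().lower()
    if label not in SCORES:
        raise ValueError(f"unexpected label from judge: {label!r}")
    return label, SCORES[label]


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call: checks whether the answer line mentions Paris."""
    answer = next(line for line in prompt.splitlines() if line.startswith("Answer:"))
    return "factual" if "Paris" in answer else "hallucinated"


record = {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris.",
    "reference": "Paris is the capital of France.",
}
print(judge_record(record, fake_judge))
```

Constraining the judge to a fixed label set is what makes the result easy to aggregate and compare across runs.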

Run Evaluations on Dataset

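from phoenix import Client
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    RelevanceEvaluator,
    run_evals,
)
from phoenix.trace import SpanEvaluations

client = Client()
eval_model = OpenAIModel(model="gpt-4o")

# Get spans to evaluate
spans_df = client.get_spans_dataframe(
    filter_condition="span_kind == 'LLM'",
    project_name="my-app",
)
# The evaluators expect "input", "output", and "reference" columns;
# derive them from the span attribute columns before running.

# Run evaluations (returns one DataFrame per evaluator, in order)
hallucination_df, relevance_df = run_evals(
    dataframe=spans_df,
    evaluators=[
        HallucinationEvaluator(eval_model),
        RelevanceEvaluator(eval_model)
    ],
    provide_explanation=True
)

# Log results back to Phoenix
client.log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_df),
    SpanEvaluations(eval_name="Relevance", dataframe=relevance_df),
)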

Client API

Query Traces and Spans

from phoenix import Client

client = Client(endpoint="http://localhost:6006")

# Get spans as DataFrame
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'",
    limit=1000
)

# Get specific span
span = client.get_span(span_id="abc123")

# Get trace
trace = client.get_trace(trace_id="xyz789")

Log Feedback

# Requires the arize-phoenix-client package
from phoenix.client import Client

client = Client()

# Log user feedback as a span annotation
client.annotations.add_span_annotation(
    span_id="abc123",
    annotation_name="user_rating",
    annotator_kind="HUMAN",
    score=0.8,
    label="helpful",
    metadata={"comment": "Good response"},
)

Environment Variables

Variable                  Description             Default
PHOENIX_PORT              HTTP server port        6006
PHOENIX_HOST              Server bind address     127.0.0.1
PHOENIX_GRPC_PORT         gRPC/OTLP port          4317
PHOENIX_SQL_DATABASE_URL  Database connection     SQLite temp
PHOENIX_WORKING_DIR       Data storage directory  OS temp
PHOENIX_ENABLE_AUTH       Enable authentication   false
PHOENIX_SECRET            JWT signing secret      Required if auth enabled
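
A deployment script can check these variables before starting the server. The sketch below reads them with the defaults listed above and enforces the PHOENIX_SECRET requirement. The `phoenix_settings` function is hypothetical, and how Phoenix itself parses these values (for example, which strings count as true) is an assumption here.

```python
import os
from typing import Mapping, Optional


def phoenix_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect Phoenix settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    # Assumption: only the literal string 'true' (any case) enables auth.
    auth = env.get("PHOENIX_ENABLE_AUTH", "false").strip().lower() == "true"
    if auth and not env.get("PHOENIX_SECRET"):
        raise ValueError("PHOENIX_SECRET is required when PHOENIX_ENABLE_AUTH is true")
    return {
        "host": env.get("PHOENIX_HOST", "127.0.0.1"),
        "port": int(env.get("PHOENIX_PORT", "6006")),
        "grpc_port": int(env.get("PHOENIX_GRPC_PORT", "4317")),
        # None means "let the server pick its default location".
        "database_url": env.get("PHOENIX_SQL_DATABASE_URL"),
        "working_dir": env.get("PHOENIX_WORKING_DIR"),
        "auth_enabled": auth,
    }


print(phoenix_settings({"PHOENIX_PORT": "7007"}))
```

Passing the mapping explicitly keeps the check easy to test without touching the real process environment.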

Best Practices

  1. Use projects: Separate traces by environment (dev/staging/prod)
  2. Add metadata: Include user IDs, session IDs for debugging
  3. Evaluate regularly: Run automated evaluations in CI/CD
  4. Version datasets: Track test set changes over time
  5. Monitor costs: Track token usage via Phoenix dashboards
  6. Self-host: Use PostgreSQL for production deployments
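
For the "evaluate regularly" practice, a CI job can fail the build when too many evaluated outputs receive a bad label. A minimal sketch, assuming evaluation labels have already been collected as plain strings; `label_rate`, `passes_gate`, the "hallucinated" label, and the 5% threshold are illustrative choices, not Phoenix features.

```python
from collections import Counter
from typing import Iterable


def label_rate(labels: Iterable[str], target: str) -> float:
    """Fraction of labels equal to `target` (case-insensitive)."""
    counts = Counter(label.strip().lower() for label in labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no evaluation labels to check")
    return counts[target.lower()] / total


def passes_gate(labels: Iterable[str], target: str = "hallucinated", max_rate: float = 0.05) -> bool:
    """True when the share of `target` labels stays at or below `max_rate`."""
    return label_rate(labels, target) <= max_rate


good_run = ["factual"] * 99 + ["hallucinated"]
bad_run = ["factual"] * 18 + ["hallucinated"] * 2
print(passes_gate(good_run), passes_gate(bad_run))
# In CI, exit with a non-zero status when passes_gate(...) returns False.
```

Raising on an empty label list is deliberate: a run that evaluated nothing should not count as a pass.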

Common Issues

Traces Not Appearing

from phoenix.otel import register

# Verify that the endpoint matches the running server and includes the /v1/traces path
tracer_provider = register(
    project_name="my-app",
    endpoint="http://localhost:6006/v1/traces"
)

# Force flush buffered spans, for example before a short-lived script exits
tracer_provider.force_flush()
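
When traces still do not show up, it can help to rule out a wrong URL or an unreachable server before looking at instrumentation. The following standard-library sketch does both; `parse_trace_endpoint` and `is_reachable` are hypothetical helpers, and the expected `/v1/traces` path is taken from the example above.

```python
import socket
from urllib.parse import urlsplit


def parse_trace_endpoint(url: str) -> dict:
    """Split the endpoint URL and check that it uses the /v1/traces path."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or default_port,
        "path_ok": parts.path == "/v1/traces",
    }


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


info = parse_trace_endpoint("http://localhost:6006/v1/traces")
print(info, "reachable:", is_reachable(info["host"], info["port"]))
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the listener is Phoenix.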

Database Connection Issues

# Verify PostgreSQL connection
psql "$PHOENIX_SQL_DATABASE_URL" -c "SELECT 1"

# Check Phoenix logs
phoenix serve --log-level debug

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
