Phoenix Playwright Test Writing
Write end-to-end tests for Phoenix using Playwright. Tests live in app/tests/ and follow established patterns.
Timeout Policy
- Do not pass timeout arguments in test code under app/tests.
- Tune timing centrally in app/playwright.config.ts (the global timeout, expect.timeout, use.navigationTimeout, and webServer.timeout).
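A minimal sketch of what that central configuration could look like. The option names come from the policy above; the numeric values, the dev-server command, and the URLs are placeholders, not Phoenix's actual settings (the port matches the 6006 dev server mentioned later in this guide):

```typescript
// app/playwright.config.ts (sketch): every number and command below is a placeholder.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  // Per-test timeout for every spec under app/tests
  timeout: 60_000,
  expect: {
    // Default timeout for web-first assertions such as toBeVisible()
    timeout: 10_000,
  },
  use: {
    // Lets specs call page.goto("/login") with relative paths
    baseURL: "http://localhost:6006",
    // Default for navigations such as page.goto() and page.waitForURL()
    navigationTimeout: 30_000,
  },
  webServer: {
    // Hypothetical start command; use whatever the real config runs
    command: "pnpm run dev",
    url: "http://localhost:6006",
    // How long to wait for the server to start responding
    timeout: 120_000,
    reuseExistingServer: !process.env.CI,
  },
});
```

With settings like these in one place, a spec that seems to need a longer wait is a signal to fix its wait condition (or adjust the config for every test), not to pass a timeout inline.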
Quick Start
```typescript
import { expect, test } from "@playwright/test";
import { randomUUID } from "crypto";

test.describe("Feature Name", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto("/login");
    await page.getByLabel("Email").fill("admin@localhost");
    await page.getByLabel("Password").fill("admin123");
    await page.getByRole("button", { name: "Log In", exact: true }).click();
    await page.waitForURL("**/projects");
  });

  test("can do something", async ({ page }) => {
    // Test implementation
  });
});
```
Test Credentials
| User | Email | Password | Role |
| --- | --- | --- | --- |
| Admin | admin@localhost | admin123 | admin |
| Member | member@localhost.com | member123 | member |
| Viewer | viewer@localhost.com | viewer123 | viewer |
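One way to keep these credentials out of individual specs is a small shared module. The file location (for example app/tests/users.ts), the TEST_USERS constant, and credentialsFor() are hypothetical names, not existing Phoenix helpers; the values simply mirror the table above:

```typescript
// Hypothetical shared fixture; values mirror the credentials table.
type Role = "admin" | "member" | "viewer";

interface TestUser {
  email: string;
  password: string;
  role: Role;
}

export const TEST_USERS: Readonly<Record<Role, TestUser>> = {
  admin: { email: "admin@localhost", password: "admin123", role: "admin" },
  member: { email: "member@localhost.com", password: "member123", role: "member" },
  viewer: { email: "viewer@localhost.com", password: "viewer123", role: "viewer" },
};

// Look up credentials by role so specs never hard-code emails or passwords.
export function credentialsFor(role: Role): TestUser {
  return TEST_USERS[role];
}
```

A login step in beforeEach could then fill the Email and Password fields from credentialsFor("viewer") instead of repeating literals in every access-control spec.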
Selector Patterns (Priority Order)
Role selectors (most robust):
```typescript
page.getByRole("button", { name: "Save" });
page.getByRole("link", { name: "Datasets" });
page.getByRole("tab", { name: /Evaluators/i });
page.getByRole("menuitem", { name: "Edit" });
page.getByRole("cell", { name: "my-item" });
page.getByRole("heading", { name: "Title" });
page.getByRole("dialog");
page.getByRole("textbox", { name: "Name" });
page.getByRole("combobox", { name: /mapping/i });
```
Label selectors:
```typescript
page.getByLabel("Email");
page.getByLabel("Dataset Name");
page.getByLabel("Description");
```
Text selectors:
```typescript
page.getByText("No evaluators added");
page.getByPlaceholder("Search...");
```
Test IDs (when available):
```typescript
page.getByTestId("modal");
```
CSS locators (last resort):
```typescript
page.locator('button:has-text("Save")');
```
Common UI Patterns
Dropdown Menus
```typescript
// Click the button to open the dropdown
await page.getByRole("button", { name: "New Dataset" }).click();
// Select the menu item
await page.getByRole("menuitem", { name: "New Dataset" }).click();
```
Nested Menus (Submenus)
```typescript
// Open the menu, hover over the submenu trigger, then click the submenu item
await page.getByRole("button", { name: "Add evaluator" }).click();
await page
  .getByRole("menuitem", { name: "Use LLM evaluator template" })
  .hover();
await page.getByRole("menuitem", { name: /correctness/i }).click();

// IMPORTANT: Always use getByRole("menuitem") for submenu items, not getByText().
// Playwright's auto-waiting handles the submenu appearance timing.
// ❌ BAD - flaky in CI:
// await page.getByText("ExactMatch").first().click();
// ✅ GOOD - reliable:
// await page.getByRole("menuitem", { name: /ExactMatch/i }).click();
```
Dialogs/Modals
```typescript
// Wait for the dialog
await expect(page.getByRole("dialog")).toBeVisible();
// Fill the form in the dialog
await page.getByLabel("Name").fill("test-name");
// Submit
await page.getByRole("button", { name: "Create" }).click();
// Wait for the dialog to close
await expect(page.getByRole("dialog")).not.toBeVisible();
```
Tables with Row Actions
```typescript
// Find the row by cell content
const row = page.getByRole("row").filter({
  has: page.getByRole("cell", { name: "item-name" }),
});
// Click the action button in the row (usually the last button)
await row.getByRole("button").last().click();
// Select the action from the menu
await page.getByRole("menuitem", { name: "Edit" }).click();
```
Tabs
```typescript
await page.getByRole("tab", { name: /Evaluators/i }).click();
await page.waitForURL("**/evaluators");
await expect(page.getByRole("tab", { name: /Evaluators/i })).toHaveAttribute(
  "aria-selected",
  "true",
);
```
Form Inputs in Sections
```typescript
// When multiple textboxes exist, scope to the section
const systemSection = page.locator('button:has-text("System")');
// ".." selects the parent element; two hops reach the section container
const systemTextbox = systemSection
  .locator("..")
  .locator("..")
  .getByRole("textbox");
await systemTextbox.fill("content");
```
Serial Tests (Shared State)
Use test.describe.serial when tests depend on each other:
```typescript
test.describe.serial("Workflow", () => {
  const itemName = `item-${randomUUID()}`;

  test("step 1: create item", async ({ page }) => {
    // Creates itemName
  });

  test("step 2: edit item", async ({ page }) => {
    // Uses itemName from the previous test
  });

  test("step 3: verify edits", async ({ page }) => {
    // Verifies itemName was edited
  });
});
```
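The shared name in a serial block is built once at describe time from randomUUID(). A hypothetical uniqueName() helper (not part of Phoenix or Playwright) keeps that pattern consistent across specs, so every spec prefixes its data the same way:

```typescript
import { randomUUID } from "crypto";

// Hypothetical helper: a collision-free name for data a spec creates,
// so parallel workers and reruns against the same database never clash.
export function uniqueName(prefix: string): string {
  return `${prefix}-${randomUUID()}`;
}
```

Inside a serial block this becomes const itemName = uniqueName("item"); the value is computed once when the describe body runs and is then shared by every step.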
Assertions
```typescript
// Visibility
await expect(element).toBeVisible();
await expect(element).not.toBeVisible();

// Text content
await expect(element).toHaveText("expected");
await expect(element).toContainText("partial");

// Attributes
await expect(element).toHaveAttribute("aria-selected", "true");

// Input values
await expect(input).toHaveValue("expected value");

// URL
await page.waitForURL("**/datasets/**/examples");
```
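page.waitForURL() accepts glob patterns in which "**" matches any characters including "/" and "*" matches any characters except "/". The sketch below is a simplified model of just those two wildcards, written to show why "**/datasets/**/examples" matches a full dataset URL while a pattern without its wildcards, such as "/datasets//examples", never does. It is not Playwright's implementation and ignores the rest of its glob syntax; globToRegExp() is a hypothetical name:

```typescript
// Simplified illustration of URL glob matching; NOT Playwright's implementation.
// Assumption: "**" matches anything (including "/"), "*" matches anything
// except "/", and every other character is literal.
export function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        source += ".*"; // "**": may cross path segments
        i++; // consume the second "*"
      } else {
        source += "[^/]*"; // "*": stays within one segment
      }
    } else {
      // Escape regex metacharacters so they match literally
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  // Anchor both ends: the whole URL must match the pattern
  return new RegExp(`^${source}$`);
}
```

Because the whole URL must match, a pattern that starts with "/" cannot match "http://localhost:6006/..."; a leading "**" absorbs the scheme and host.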
Navigation Patterns
```typescript
// Direct navigation
await page.goto("/datasets");
await page.waitForURL("**/datasets");

// Click navigation
await page.getByRole("link", { name: "Datasets" }).click();
await page.waitForURL("**/datasets");

// Extract an ID from the URL
const url = page.url();
const match = url.match(/datasets\/([^/]+)/);
const datasetId = match ? match[1] : "";

// Navigate with query params
await page.goto(`/playground?datasetId=${datasetId}`);
```
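The ID extraction and query-string navigation above can be wrapped in two small helpers. extractDatasetId() and playgroundPath() are hypothetical names, not existing Phoenix utilities. Parsing the URL first keeps any ?query or #hash out of the ID, returning null (instead of an empty string) makes a missed match easy to spot, and URLSearchParams encodes IDs that contain characters such as "+", "/" or "=":

```typescript
// Hypothetical helpers for the navigation patterns above.

// Pull the dataset ID out of an absolute URL such as
// http://localhost:6006/datasets/<id>/examples (page.url() returns absolute URLs).
export function extractDatasetId(url: string): string | null {
  const match = new URL(url).pathname.match(/\/datasets\/([^/]+)/);
  return match?.[1] ?? null;
}

// Build the playground path with a properly encoded query string.
export function playgroundPath(datasetId: string): string {
  const params = new URLSearchParams({ datasetId });
  return `/playground?${params.toString()}`;
}
```

In a spec this might read: const datasetId = extractDatasetId(page.url()), a null check, then page.goto(playgroundPath(datasetId)).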
Running Tests
Before running Playwright tests, build the app so E2E runs against the latest frontend changes:
```bash
pnpm run build
```
```bash
# Run a specific test file
pnpm exec playwright test tests/server-evaluators.spec.ts --project=chromium

# Run with UI mode
pnpm exec playwright test --ui

# Run a specific test by name
pnpm exec playwright test -g "can create"

# Debug mode
pnpm exec playwright test --debug
```
Avoiding Interactive Report Server
When the HTML reporter is active, Playwright can serve the report after the run finishes (by default, when any test has failed) and wait for Ctrl+C, which can make automated commands hang until they time out. Use these options to avoid this:
```bash
# Use the list reporter (no interactive server)
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=list

# Use the dot reporter for minimal output
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=dot

# Set CI mode to disable interactive features
CI=1 pnpm exec playwright test tests/example.spec.ts --project=chromium
```
Recommended for automation: Always use --reporter=list or CI=1 when running tests programmatically to ensure the command exits cleanly after tests complete.
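When a script or agent drives Playwright, building the argument list in one place makes it hard to forget the non-interactive reporter. playwrightArgs() is a hypothetical helper; the flags it emits (--project, -g, --reporter) are the ones used in the commands above:

```typescript
// Hypothetical helper for scripts that shell out to Playwright.
interface RunOptions {
  file?: string; // e.g. "tests/server-evaluators.spec.ts"
  project?: string; // e.g. "chromium"
  grep?: string; // test title filter, passed as -g
  reporter?: "list" | "dot"; // non-interactive reporters only
}

export function playwrightArgs(opts: RunOptions): string[] {
  const args = ["exec", "playwright", "test"];
  if (opts.file) args.push(opts.file);
  if (opts.project) args.push(`--project=${opts.project}`);
  if (opts.grep) args.push("-g", opts.grep);
  // Default to "list" so the command never waits on an HTML report server
  args.push(`--reporter=${opts.reporter ?? "list"}`);
  return args;
}
```

The result can be passed to Node's child_process.spawnSync("pnpm", args, { stdio: "inherit", cwd: "app" }), optionally with CI set to "1" in the env option.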
Phoenix-Specific Pages
| Page | URL Pattern | Key Elements |
| --- | --- | --- |
| Datasets | /datasets | Table, "New Dataset" button |
| Dataset Detail | /datasets/{id}/examples | Tabs (Experiments, Examples, Evaluators, Versions) |
| Dataset Evaluators | /datasets/{id}/evaluators | "Add evaluator" button, evaluators table |
| Playground | /playground | Prompts section, Experiment section |
| Playground + Dataset | /playground?datasetId={id} | Dataset selector, Evaluators button |
| Prompts | /prompts | "New Prompt" button, prompts table |
| Settings | /settings/general | "Add User" button, users table |
UI Exploration with agent-browser
When selectors are unclear, use agent-browser to explore the Phoenix UI. For detailed agent-browser usage, invoke the /agent-browser skill.
Quick Reference for Phoenix
```bash
# Open a Phoenix page (the dev server runs on port 6006)
agent-browser open "http://localhost:6006/datasets"

# Get an interactive snapshot with element refs
agent-browser snapshot -i

# Click using refs from the snapshot
agent-browser click @e5

# Fill form fields
agent-browser fill @e2 "test value"

# Get element text
agent-browser get text @e1
```
Discovering Selectors Workflow
1. Open the page: agent-browser open "http://localhost:6006/datasets"
2. Get a snapshot: agent-browser snapshot -i
3. Find element refs in the output (e.g., @e1 [button] "New Dataset")
4. Interact: agent-browser click @e1
5. Re-snapshot after navigation or DOM changes: agent-browser snapshot -i
Translating to Playwright
| agent-browser output | Playwright selector |
| --- | --- |
| @e1 [button] "Save" | page.getByRole("button", { name: "Save" }) |
| @e2 [link] "Datasets" | page.getByRole("link", { name: "Datasets" }) |
| @e3 [textbox] "Name" | page.getByRole("textbox", { name: "Name" }) |
| @e4 [menuitem] "Edit" | page.getByRole("menuitem", { name: "Edit" }) |
| @e5 [tab] "Evaluators 0" | page.getByRole("tab", { name: /Evaluators/i }) |
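The mapping in this table is mechanical for simple cases, so it can be sketched as a small converter. toPlaywrightSelector() is hypothetical, and it assumes snapshot lines follow the @ref [role] "name" shape shown in the table; names with dynamic parts such as "Evaluators 0" still need a hand-written regex like /Evaluators/i:

```typescript
// Hypothetical helper: turn an agent-browser snapshot line such as
//   @e1 [button] "Save"
// into the equivalent Playwright locator expression (returned as source text).
// Assumes the `@ref [role] "name"` format shown in the table above.
export function toPlaywrightSelector(snapshotLine: string): string | null {
  const match = snapshotLine.trim().match(/^@e\d+\s+\[([a-z]+)\]\s+"([^"]*)"/);
  if (!match) return null; // not a recognizable snapshot line
  const [, role, name] = match;
  // JSON.stringify produces correctly quoted and escaped string literals
  return `page.getByRole(${JSON.stringify(role)}, { name: ${JSON.stringify(name)} })`;
}
```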
File Naming
- Feature tests: {feature-name}.spec.ts
- Access control: {role}-access.spec.ts
- Rate limiting: {feature}.rate-limit.spec.ts (runs last)
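If a script needs to tell these categories apart (for example, to check that new specs follow the naming rules), the conventions reduce to a few suffix checks. classifySpec() is a hypothetical helper that takes a bare file name, not a path, and assumes role names are lowercase letters:

```typescript
// Hypothetical helper: classify a spec file by the naming conventions above.
type SpecKind = "rate-limit" | "access-control" | "feature";

export function classifySpec(fileName: string): SpecKind | null {
  if (!fileName.endsWith(".spec.ts")) return null; // not a Playwright spec
  if (fileName.endsWith(".rate-limit.spec.ts")) return "rate-limit";
  if (/^[a-z]+-access\.spec\.ts$/.test(fileName)) return "access-control";
  return "feature";
}
```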
Common Gotchas
- Dialog not closing: wait for a deterministic post-action signal (e.g., dialog hidden + success row visible).
- Multiple elements: use .first(), .last(), or .nth(n).
- Dynamic content: use a regex in name: { name: /pattern/i }.
- Flaky waits: prefer waitForURL over waitForTimeout.
- Menu not appearing: wait for the specific menu state or element visibility.
Debugging Flaky Tests
Critical Lessons Learned
Don't assume parallelism is the problem
- Phoenix tests run with 7 parallel workers without issues
- The app handles concurrent logins, database operations, and session management properly
- If tests fail under parallelism, it's usually a test timing issue, not infrastructure
- Playwright's browser context isolation is robust: each test gets its own browser context, so cookies and sessions are isolated between tests and workers
waitForTimeout is almost always wrong
- page.waitForTimeout() is the #1 cause of flakiness in Phoenix tests
- Arbitrary timeouts race against rendering and network speed
- Always replace them with state-based waits:

```typescript
// ❌ BAD - flaky, races against rendering
await page.waitForTimeout(500);
await element.click();

// ✅ GOOD - waits for actual state
// (click() itself also auto-waits for the element to be visible, stable, and enabled)
await element.waitFor({ state: "visible" });
await element.click();
```
Test the actual failure before fixing
- Run tests with parallelism enabled to see what actually fails
- Check error messages; they often point to the real issue
- Don't optimize prematurely (e.g., caching auth state) if it's not the problem
Phoenix test infrastructure is solid
- In-memory SQLite works fine with parallel tests
- No need for per-worker databases
- No need for auth state caching
- Tests use randomUUID() for data isolation, and this works well
Debugging Workflow
When tests are flaky:

1. Run with parallelism multiple times to catch intermittent failures:

```bash
for i in 1 2 3 4 5; do
  pnpm exec playwright test --project=chromium --reporter=dot
done
```

2. Look for waitForTimeout usage and replace it with proper waits:

```bash
grep -r "waitForTimeout" app/tests/
```

3. Check for race conditions in element interactions:
   - Wait for element visibility before interacting
   - Prefer waiting for a specific element or URL over page.waitForLoadState("networkidle"), which is flaky in CI
   - Use waitForURL after navigation actions

4. Verify selectors are stable:
   - Avoid CSS selectors that depend on DOM structure
   - Use role/label selectors that match ARIA attributes
   - Check that selectors still work after UI updates

5. Run with tracing to see what happened. With on-first-retry, a trace is recorded only when a test is retried, so allow at least one retry:

```bash
pnpm exec playwright test --trace on-first-retry --retries=1
```
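The grep step in this workflow can also be turned into a repeatable check. This is a hypothetical script, not part of Phoenix, that uses only Node's built-in fs and path modules and reports file:line for every waitForTimeout( call; findWaitForTimeout() and scanDir() are illustrative names:

```typescript
import { readdirSync, readFileSync } from "fs";
import { join } from "path";

// Return 1-based line numbers of every waitForTimeout( call in a source string.
export function findWaitForTimeout(source: string): number[] {
  return source
    .split("\n")
    .map((line, index) => (line.includes("waitForTimeout(") ? index + 1 : -1))
    .filter((lineNo) => lineNo !== -1);
}

// Recursively scan a directory's .ts files and collect "file:line" hits.
export function scanDir(dir: string): string[] {
  const hits: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      hits.push(...scanDir(fullPath));
    } else if (entry.name.endsWith(".ts")) {
      for (const lineNo of findWaitForTimeout(readFileSync(fullPath, "utf8"))) {
        hits.push(`${fullPath}:${lineNo}`);
      }
    }
  }
  return hits;
}
```

Running console.log(scanDir("app/tests").join("\n")) from the repository root prints locations comparable to the grep output, limited to .ts files; a CI job could fail whenever the list is non-empty.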
Common Flaky Patterns and Fixes
| Flaky Pattern | Root Cause | Fix |
| --- | --- | --- |
| Submenu item not found | Using getByText() instead of getByRole() | Use getByRole("menuitem", { name: /pattern/i }) for submenu items |
| Menu click fails | Menu not fully rendered | await menu.waitFor({ state: "visible" }) before clicking |
| Dialog assertion fails | Dialog animation not complete | Assert a specific completion signal (hidden dialog + next-state element) |
| Navigation timeout | Page still loading | Remove waitForLoadState("networkidle"), which is flaky in CI; wait for a specific element or URL instead |
| Element not found | Dynamic content loading | Wait for element visibility, not an arbitrary timeout |
| Stale element | Re-render between locate and click | Store the locator, not an element handle |
Test Stability Best Practices
Use proper waits:

```typescript
// Wait for an element state: "visible", "hidden", "attached", or "detached"
await element.waitFor({ state: "visible" });

// Wait for a load state: "load", "domcontentloaded", or "networkidle"
// (avoid "networkidle" in CI; prefer waiting for a specific element)
await page.waitForLoadState("domcontentloaded");

// Wait for a URL change
await page.waitForURL("**/expected-path");
```

Use unique test data:

```typescript
const uniqueName = `test-${randomUUID()}`;
```

Prefer role selectors; they're less brittle:

```typescript
page.getByRole("button", { name: "Save" }); // ✅ Good
page.locator("button.save-btn"); // ❌ Brittle
```

Don't fight animations; wait for them:

```typescript
await expect(dialog).not.toBeVisible();
```

Verify URL changes after navigation:

```typescript
await page.waitForURL("**/datasets");
```