dyad:deflake-e2e

Deflake E2E Tests

Identify and fix flaky E2E tests by running them repeatedly and investigating failures.

Arguments

$ARGUMENTS: (Optional) Specific E2E test file(s) to deflake (e.g., main.spec.ts or e2e-tests/main.spec.ts). If not provided, the user is asked whether to deflake the entire test suite.

Instructions

Check if specific tests are provided:

If $ARGUMENTS is empty or not provided, ask the user:

"No specific tests provided. Do you want to deflake the entire E2E test suite? This can take a very long time as each test will be run 10 times."

Wait for user confirmation before proceeding. If they decline, ask them to provide specific test files.

Install dependencies:

npm install

Build the app binary:

npm run build

IMPORTANT: This step is required before running E2E tests, because they run against the built binary. If you make any changes to application code (anything outside of e2e-tests/), you MUST re-run npm run build before running E2E tests again; otherwise you'll be testing the old version.

Run tests repeatedly to detect flakiness:

For each test file, run it 10 times:

PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --repeat-each=10

IMPORTANT: PLAYWRIGHT_RETRIES=0 is required to disable automatic retries. Without it, CI environments (where CI=true) default to 2 retries, so a flaky test can pass on retry and be misclassified as "not flaky."
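The retry behavior described above can be sketched as a small function. This is only an illustration: the name resolveRetries is hypothetical, and it assumes the project's Playwright config reads PLAYWRIGHT_RETRIES first and otherwise falls back to 2 retries when CI=true. The non-CI default of 0 is also an assumption, not something the project confirms.

```typescript
// Hypothetical helper, not the project's actual config code.
type Env = Record<string, string | undefined>;

function resolveRetries(env: Env): number {
  const raw = env.PLAYWRIGHT_RETRIES;
  if (raw !== undefined && raw.trim() !== "") {
    const parsed = Number.parseInt(raw, 10);
    // An explicit, valid override wins, including 0 (retries disabled).
    if (Number.isInteger(parsed) && parsed >= 0) {
      return parsed;
    }
  }
  // Assumed fallback: 2 retries on CI, 0 elsewhere.
  return env.CI === "true" ? 2 : 0;
}
```

With retries left at the CI default, a test that fails once and then passes is reported as passing, which is exactly the signal this workflow needs to see as a failure.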

Notes:

If $ARGUMENTS is provided without the e2e-tests/ prefix, add it
If $ARGUMENTS is provided without the .spec.ts suffix, add it
A test is considered flaky if it fails at least once out of 10 runs
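The notes above amount to a small normalization step plus a flakiness rule. A minimal sketch, assuming each argument is a single file name or path; the helper names (normalizeTestPath, buildRepeatCommand, isFlaky) are hypothetical and exist only for illustration.

```typescript
const REPEAT_COUNT = 10;

// Add the .spec.ts suffix and the e2e-tests/ prefix when they are missing.
function normalizeTestPath(arg: string): string {
  let path = arg.trim();
  if (!path.endsWith(".spec.ts")) {
    path += ".spec.ts";
  }
  if (!path.startsWith("e2e-tests/")) {
    path = "e2e-tests/" + path;
  }
  return path;
}

// Build the repeat-run command shown in the step above.
function buildRepeatCommand(arg: string): string {
  const file = normalizeTestPath(arg);
  return (
    "PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never " +
    `npm run e2e -- ${file} --repeat-each=${REPEAT_COUNT}`
  );
}

// A test is flaky if at least one of its runs failed.
// Each entry is true for a passing run and false for a failing one.
function isFlaky(runResults: boolean[]): boolean {
  return runResults.some((passed) => !passed);
}
```

For example, normalizeTestPath("main") and normalizeTestPath("main.spec.ts") both yield e2e-tests/main.spec.ts, so either form of $ARGUMENTS produces the same command.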

For each flaky test, investigate with debug logs:

Run the failing test with Playwright browser debugging enabled:

DEBUG=pw:browser PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts

Analyze the debug output to understand:

Timing issues (race conditions, elements not ready)
Animation/transition interference
Network timing variability
State leaking between tests
Snapshot comparison differences

Fix the flaky test:

Common fixes following Playwright best practices:

Use await expect(locator).toBeVisible() before interacting with elements
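For network-dependent tests, wait for a specific condition such as await page.waitForResponse() or a web-first assertion; await page.waitForLoadState('networkidle') is an option, but the Playwright docs discourage relying on it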
Use stable selectors (data-testid, role, text) instead of fragile CSS selectors
Add explicit waits for animations: await page.waitForTimeout(300) (use sparingly)
For visual tests, pass tolerance options such as maxDiffPixelRatio to await expect(locator).toHaveScreenshot()
Ensure proper test isolation (clean state before/after tests)

IMPORTANT: Do NOT change any application code. Assume the application code is correct. Only modify test files and snapshot baselines.

Update snapshot baselines if needed:

If the flakiness is due to legitimate visual differences, regenerate the baselines:

PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --update-snapshots

Verify the fix:

Re-run the test 10 times to confirm it's no longer flaky:

PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --repeat-each=10

The test should pass all 10 runs consistently.

Summarize results:

Report to the user:

Which tests were identified as flaky
What was causing the flakiness
What fixes were applied
Verification results (all 10 runs passing)
Any tests that could not be fixed and need further investigation
