# Drupal Intent Testing (agent-browser)

This skill is for “does this do what we meant?” testing — the semi-random, UI-first verification you’d do manually, but with an agent driving a real browser.

It is intentionally not a replacement for Drupal’s PHPUnit/Kernel/FunctionalJavascript tests. It complements them by validating UX, integration glue, and end-to-end intent.
## Why agent-browser?

- Works in CLI agents (Claude Code, Codex, Copilot, Cursor…) because it’s a shell tool.
- Uses the accessibility tree + deterministic element refs, so it doesn’t require a vision model.
- Still captures real rendering via screenshots for evidence.
## Safety note

Exploratory testing can create content, change config, and click destructive buttons.

- Prefer running against local/dev instances.
- Use `--safety read-only` (fuzz mode) unless you explicitly want mutation.

Credentials note: examples use the placeholder `admin` / `admin`. Do not store real credentials in public artifacts; prefer local overrides or env vars.
## Setup

Install agent-browser and download Chromium:

```bash
npm install -g agent-browser
agent-browser install
```

On Linux you may need dependencies:

```bash
agent-browser install --with-deps
```
## Core workflow (agent-driven)

- Navigate: `agent-browser open <url>`
- Get a snapshot: `agent-browser snapshot -i -c --json`
- Interact using refs (`@e1`, `@e2`) or semantic locators (`find label`, `find role`)
- Re-snapshot after page changes
- Capture evidence: `agent-browser screenshot <path>`
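Put together, one pass through this loop might look like the sketch below; the admin route and button name are illustrative assumptions, not part of this skill.

```bash
# Minimal sketch of one workflow pass (route and button name are assumptions).
agent-browser open "https://my.ddev.site/admin/content"
agent-browser wait --load networkidle
agent-browser snapshot -i -c --json          # inspect refs like @e1, @e2
agent-browser find role button click --name "Add content"
agent-browser wait --load networkidle
agent-browser screenshot test_outputs/add-content.png
```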
## Evidence Pack + checkpoints

This skill treats checkpoints as the unit of evidence. A checkpoint captures:

- Interactive snapshot (for driving)
- Screenshot (visual evidence)
- Console + JS errors
- Current URL
- Drupal messages (`role=status` / `role=alert`)
- AI Agents Explorer output (non-interactive text blocks)
- Optional backend probes (`--probe-cmd`)
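In a scenario script (format documented below), a checkpoint with an optional backend probe might look like this sketch; the route and probe name are assumptions:

```
open /admin/reports/status
wait --load networkidle
checkpoint status_page
probe drush recent_log -- ws --count=50 --format=json
```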
AI Explorer extraction stores `pre_texts`, `final_answer`, and `tool_payload`, plus summary metrics:

- `raw_in_final_answer` vs `raw_in_tool_payload`
- optional label checks via `--label-term`
- raw-value regexes via `--raw-value-regex`
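For example, a Compare run that tracks a label term and a raw-value pattern could be invoked like this; the script path, label, and regex are placeholders, and the flags are the ones listed under Mode 2 below:

```bash
python3 scripts/compare_runs.py \
  --url "https://my.ddev.site" \
  --script scripts/test_scenarios/ai_labels.txt \
  --label-term "Hero" \
  --raw-value-regex "hg:[a-z0-9_]+" \
  --output-dir test_outputs
```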
## Intent manifest (autonomous loop)

Use an intent manifest to encode what to test and what counts as PASS/FAIL:
```yaml
issue:
  url: "https://www.drupal.org/node/3551315"
  title: "Enum options should expose UI labels"
environment:
  base_url: "https://canvas-dev.ddev.site:8443"
  login_url: "/user/login"
  admin_user: "admin"
  admin_pass: "admin"
  probe_cwd: "/path/to/project"
adr:
  - "Consumers can reliably map labels to stored values without guessing."
  - "Translation correctness depends on cache context bubbling and serialization boundaries; labels are kept stringable until serialization."
  - "Additional schema warnings are logged in a dedicated channel for troubleshooting malformed enum metadata."
strategy:
  mode: "single" # single | compare
  between_cmd: "ddev snapshot restore intent-baseline"
  retries: 1
timeouts:
  page_load_ms: 120000
  ai_response_ms: 600000
steps:
  - open: "/admin/config/ai/agents/explore?agent_id=canvas_page_builder_agent"
  - action:
      type: run_ai_agent_explorer
      prompt_file: "resources/vienna-hero-labels-mini.txt"
      model: "gpt-4.1"
      completion_texts: ["Final Answer", "Ran"]
      run_buttons: ["Run Agent", "Run"]
      post_completion_timeout_ms: 60000
      post_completion_stable_ms: 1500
      pre_min_count: 1
  - checkpoint: "after_run"
assertions:
  - id: "no_raw_values_in_final_answer"
    type: "text_absent"
    scope: "final_answer"
    patterns: ["hg:", "flex-row"]
    severity: "fail"
guards:
  - id: "no_js_errors"
    type: "no_console_errors"
    severity: "fail"
```
Run it:

```bash
python3 -m scripts.intent.validate_manifest path/to/intent.yaml
python3 -m scripts.intent_test path/to/intent.yaml --output-dir test_outputs
```
## Agent runbook: intent verification loop for Drupal.org issues

This is the default operating procedure for contrib fixes/features. If there is no Drupal.org issue, use the provided intent statement instead.
### 0) Mission statement

Your job is not “make tests pass.” Your job is:

- Understand user intent (from the Drupal.org issue if available)
- Implement the change
- Verify like a human in a tight loop:
  - Doesn’t error
  - Actually does the thing the user wanted
- If you can’t verify confidently, escalate with excellent evidence and a specific question

Default behavior: if the user says “test with drupal-intent-testing” and does not provide a scenario/manifest, you must author the verification artifact yourself from the intent and run it. Do not ask for a script unless you are blocked on missing environment details (base URL, credentials, or success criteria).
### 1) Inputs you start from

You will be given at least:

- Drupal.org issue URL (or node id) if available
- Local dev base URL (DDEV or other), and credentials if needed
- Repo working tree/branch
### 2) Pick the right verification mode (decision tree)

Prefer the strongest behavioral proof available (rendered UI, saved config, persisted output) to minimize upstream maintainer burden; avoid tests that only check token presence if a behavioral proof is possible.

A) UI/behavioral issue (click here, see X, config form, node form, rendering):
- Use Compare mode with a generated scenario script + assertions.

B) AI Agents Explorer output or tool payload shape:
- Use an intent manifest + `python -m scripts.intent_test` + judge.
- Checkpoints auto-extract AI Explorer `<pre>` blocks into `final_answer` and `tool_payload`.

C) Unclear, hard to reproduce, or multiple workflows:
- Do a short guided exploration to learn the UI path, then convert to Compare/Manifest.

D) Light fuzz smoke (optional but recommended):
- After a likely fix, run seeded fuzz in `read-only` unless you have snapshot restore.
### 3) Core loop (bugfix + feature)
#### Step 1 — Pull “intent” from the issue (if available)

If available, use the drupal-issue-queue skill (https://github.com/scottfalconer/drupal-issue-queue) to extract:

- Beneficiary persona: site builder, content editor, admin, dev
- Expected behavior (“success looks like…”)
- Repro steps (exact UI journey)
- Edge cases + regression risks

If you do not have the skill installed, summarize the issue manually and proceed. If there is no Drupal.org issue, use the provided intent statement instead.

Output (required): write a short “Intent statement”:

- “The user is trying to ___”
- “Success means ___”
- “Failure modes to guard against: ___”

If an ADR applies, include its statements in the intent summary and translate each into assertions.
#### Step 2 — Generate a verification artifact

Option A: Scenario script (Compare mode) — preferred for UI issues

Create: `scripts/test_scenarios/issue_<nid>.txt`

- Direct route navigation (Drupal admin is URL-addressable)
- Semantic locators (`find label`, `find role`)
- Checkpoints at key moments
- Assertions that encode success + regression guards (see the sketch below)
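A minimal scenario sketch for a hypothetical node-form issue; the field label, button name, and status text are assumptions about the site under test:

```
open /node/add/article
wait --load networkidle
find label "Title" fill "Intent test article"
find role button click --name "Save"
wait --load networkidle
assert-no-js-errors
assert-no-drupal-alerts
expect --text "has been created"
checkpoint after_save
```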
Option B: Intent manifest — preferred for AI Explorer/tool payload assertions

Create: `.intent/issue_<nid>.yaml`

Minimum required fields:

- `issue.url`, `issue.title`
- `environment.base_url`
- non-empty `steps`

If there is no Drupal.org issue, use `intent_statement` instead of `issue.*`. If credentials are provided, add `environment.admin_user` / `admin_pass` to auto-generate the login flow. If an ADR applies, list it under `adr:` and map each statement to concrete assertions.
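A minimal manifest carrying just the required fields plus one guard might look like this sketch; keys mirror the larger example above, and the route and `<nid>` are placeholders:

```yaml
issue:
  url: "https://www.drupal.org/node/<nid>"
  title: "<one-line issue title>"
environment:
  base_url: "https://my.ddev.site"
  admin_user: "admin"
  admin_pass: "admin"
steps:
  - open: "/admin/config"
  - checkpoint: "after_open"
guards:
  - id: "no_js_errors"
    type: "no_console_errors"
    severity: "fail"
```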
#### Step 3 — Baseline run

Run verification before code changes to prove the problem exists.

Compare mode:

```bash
python3 scripts/compare_runs.py \
  --url "https://YOUR.ddev.site" \
  --script "scripts/test_scenarios/issue_<nid>.txt" \
  --output-dir "test_outputs/issue_<nid>" \
  --output "comparison_report.json" \
  --output-md "comparison_report.md"
```

Manifest compare mode:

- Use `strategy.mode: compare` and `between_cmd` to reset state.

Baseline expectations:

- Bugfix: baseline should FAIL intent assertions or show the problem clearly.
- Feature: baseline should show “not present yet.”
- If you cannot reproduce, escalate.
#### Step 4 — Implement the change

Make code changes. Run built-in tests as normal.

#### Step 5 — Modified run

Repeat the exact same verification artifact.
#### Step 6 — Produce verdict + summary

If using manifest/judge:

- `python -m scripts.intent.validate_manifest` validates required keys.
- `python -m scripts.intent_test` runs and writes `intent_run.json` + verdict.
- `judge_intent.py` produces `ready_to_submit`.

Supported judge assertion types:

- `no_console_errors`
- `no_drupal_messages` (alert/status)
- `url_contains`
- `text_absent` / `text_present` (AI Explorer `final_answer` / `tool_payload`)
- `yaml_path_equals`

If using Compare mode:

- `assert-*` results are recorded.
- Checkpoints provide full evidence packs.
Required summary format:
- Intent statement (1–3 sentences)
- What you ran (commands + artifact paths)
- Baseline outcome
- Modified outcome
- Regressions checked (JS errors / Drupal alerts / etc.)
- Confidence level (high/medium/low) + why
### 4) How to write good verification artifacts (“like a human”)

- Prefer direct routes (`/admin/config`, `/node/add/article`, etc.)
- Prefer semantic locators (`find label`, `find role`)
- Always include guardrails:
  - No JS errors
  - No Drupal alerts (`role=alert`)
  - Expected status message (`role=status`) when saving
- Place checkpoints at decision points:
  - after login
  - after key action
  - after expected result
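A sketch of checkpoint placement at those three decision points; the config route is a placeholder, and the status text assumes Drupal core’s default save message:

```
open /user/login
find label "Username" fill "admin"
find label "Password" fill "admin"
find role button click --name "Log in"
wait --load networkidle
checkpoint after_login
open /admin/config/example
wait --load networkidle
find role button click --name "Save configuration"
wait --load networkidle
checkpoint after_key_action
assert-no-js-errors
assert-no-drupal-alerts
expect --text "The configuration options have been saved."
checkpoint after_expected_result
```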
### 5) Escalation protocol

Escalate when:

- You cannot reproduce baseline
- Success criteria are ambiguous/subjective
- Verification results are flaky
- Environment prevents verification

Template:

```
ESCALATION: intent verification blocked / uncertain
Issue: <url or “no issue provided”>
Intent (my understanding): …
Verification mode: Compare / Manifest / Explore
Commands run: …
Artifacts: test_outputs/...
Baseline result: PASS/FAIL/ERROR + key evidence
Modified result: PASS/FAIL/ERROR + evidence
Why I’m not confident: …
What I need from you: …
```
### 6) Done criteria

Do not mark complete until:

- You attempted to observe the intended outcome in the UI
- You verified both:
  - no errors (JS + Drupal alerts)
  - intended behavior occurs (assertion or strong evidence)
- You preferred the strongest behavioral proof available, increasing timeouts when needed instead of failing fast
- Bugfixes: baseline failure + modified success (or escalated if unreproducible)
- You produced attachable artifacts (screenshots + reports + brief narrative)
### 7) Optional next improvements

- Rely on `assert-*` heavily so judge output stays crisp.
- Add more assertion types as needed.
- Use default probes (e.g., `drush ws`) for root-cause context.
## If you have a Drupal.org issue (source of intent)

When a specific issue is driving the work, the issue discussion is the intent. Use it to derive your test goals, steps, and expectations before you explore.

Recommended flow (uses the drupal-issue-queue skill; skip if you do not have it installed). If installed, this command comes from that skill:

```bash
# Summarize the issue discussion (acceptance criteria, expected/actual, edge cases).
python scripts/dorg.py issue <nid-or-url> --format md --mode summary --comments 20
```

Then:

- Extract intent statements (acceptance criteria, expected behavior, and repro steps).
- Translate each into scenario steps with `expect` assertions.
- Use the scenario as your baseline for Compare or Guided exploration.
Example mapping:

- Issue: “When adding a context label, it should appear in the Canvas AI sidebar.”
- Scenario: add label → `expect --text "New Label"` → screenshot.
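Expanded into a full scenario, that mapping might look like the sketch below; the config route and field label are assumptions about the Canvas AI module’s UI:

```
open /admin/config/canvas-ai
wait --load networkidle
find label "Context label" fill "New Label"
find role button click --name "Save"
wait --load networkidle
assert-no-drupal-alerts
expect --text "New Label"
screenshot context_label.png
checkpoint label_saved
```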
## Modes

### Mode 1 — Watch (guided, human-validating)

Use when you want to visually inspect outcomes (new UI element appears, content looks right).

Suggested pattern:

```bash
agent-browser open "https://my.ddev.site/user/login"
agent-browser snapshot -i -c
agent-browser find label "Username" fill "admin"
agent-browser find label "Password" fill "admin"
agent-browser find role button click --name "Log in"
agent-browser wait --load networkidle
agent-browser open "https://my.ddev.site/admin/config/canvas-ai"
agent-browser wait --load networkidle
agent-browser screenshot test_outputs/canvas-ai-config.png
```
### Mode 2 — Compare (paired A/B, semi-deterministic)

Runs the same script twice and produces a diff report + artifacts (including console/errors captured at each snapshot).

Useful flags:

- `--trace`: capture `trace.zip` per run
- `--probe-cmd`: run backend probes at each checkpoint (repeatable)
- `--probe-cwd`: working directory for probe commands
- `--raw-value-regex` / `--label-term`: AI output metrics

```bash
python3 scripts/compare_runs.py \
  --url "https://my.ddev.site" \
  --script scripts/test_scenarios/canvas_ai_context_label.txt \
  --output-dir test_outputs \
  --output comparison_report.json \
  --output-md comparison_report.md
```

Strongly recommended: reset DB/state between A and B so your script starts from the same world. If you use DDEV, you can snapshot/restore:

```bash
ddev snapshot --name intent-baseline
ddev snapshot restore intent-baseline
```

compare_runs.py supports running an arbitrary “between” shell command and optional evidence probes:

```bash
python3 scripts/compare_runs.py \
  --url "https://my.ddev.site" \
  --script scripts/test_scenarios/canvas_ai_context_label.txt \
  --between-cmd "ddev snapshot restore intent-baseline" \
  --probe-cmd "ddev exec drush ws --count=50 --format=json" \
  --trace \
  --output-dir test_outputs
```
### Mode 3 — Explore

#### 3A — Guided mission (LLM-driven)

Best for requests like: “Build a microsite using Canvas and give feedback.”

Recommended loop (the agent does this; one iteration is sketched after the list):

- Decide a small next step toward the goal (e.g., “create a landing page”, “add a hero component”).
- Run `agent-browser snapshot -i -c --json`.
- Choose element(s) using refs or `find`.
- Execute action(s), then `agent-browser wait --load networkidle` and `agent-browser screenshot ...`.
- Append observations to `test_outputs/mission_log.md`.
- Repeat until the timebox ends.
- Produce a structured summary: what worked, what was confusing, errors seen, suggestions.
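One iteration of that loop might look like this in the shell; the button name, step numbers, and log line are illustrative assumptions:

```bash
agent-browser snapshot -i -c --json > test_outputs/step_03.snapshot.json
agent-browser find role button click --name "Add hero component"
agent-browser wait --load networkidle
agent-browser screenshot test_outputs/step_03.png
echo "- step 3: added hero component; no visible errors" >> test_outputs/mission_log.md
```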
#### 3B — Seeded fuzz/monkey testing (scripted, no LLM required)

Good for: “click around like a distracted user and see what breaks” — for an hour.

```bash
python3 scripts/explore.py \
  --url "https://my.ddev.site" \
  --duration 60m \
  --mode fuzz \
  --seed 1337 \
  --safety dangerous \
  --checkpoint-every 20 \
  --probe-cmd "ddev exec drush ws --count=50 --format=json" \
  --output-dir test_outputs \
  --output exploration_report.md
```

Safety levels:

- `read-only`: no “Save/Submit/Apply” clicks; avoids mutation.
- `dangerous`: anything goes.
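Per the safety note at the top, a shorter read-only smoke run is usually the right default; every flag here appears in the full example above:

```bash
python3 scripts/explore.py \
  --url "https://my.ddev.site" \
  --duration 15m \
  --mode fuzz \
  --seed 1337 \
  --safety read-only \
  --output-dir test_outputs \
  --output exploration_report.md
```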
Optional fuzz flags:

- `--checkpoint-every N`: capture full evidence every N actions
- `--probe-cmd ...`: include backend probes at checkpoints
- `--probe-cwd ...`: working directory for probes (useful for DDEV)
## Scenario script format (for Compare)

The compare runner understands a tiny DSL:

- `open /path` (prefixes `--url`)
- `snapshot <name>` (saves snapshot JSON to artifacts)
- `checkpoint <name>` (full evidence bundle: snapshot + screenshot + console/errors + message extraction)
- `screenshot <file.png>`
- `wait <seconds>` (numeric)
- `expect …` (assert/await; passed to `agent-browser wait`)
- `assert-present --text "..."` / `assert-absent --text "..."` (basic assertions)
- `assert-no-js-errors` / `assert-no-drupal-alerts`
- `assert-url --contains "/node/123"`
- `assert-count --selector ".some-class" --eq 3`
- `extract eval <name> <js>` / `extract text <name> <locator>`
- `probe shell <name> -- <command>` / `probe drush <name> -- <args>`
- Any other line is passed through as a raw `agent-browser …` command

Best practice: prefer semantic locators (stable) over hard-coded refs. `checkpoint` is the preferred evidence capture; `snapshot` remains for legacy scripts.
Example:

```
open /user/login
wait --load networkidle
find label "Username" fill "admin"
find label "Password" fill "admin"
find role button click --name "Log in"
wait --load networkidle
expect --text "Log out"
screenshot 01_logged_in.png
open /admin/config/canvas-ai
wait --load networkidle
checkpoint config_page
screenshot 02_config_page.png
```
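That example exercises navigation and text expectations; the DSL’s structural assertions and probes layer in the same way. A sketch, assuming a hypothetical listing page with exactly three rows using the Views `.views-row` class:

```
open /admin/content
wait --load networkidle
assert-url --contains "/admin/content"
assert-count --selector ".views-row" --eq 3
probe drush recent_log -- ws --count=20 --format=json
checkpoint content_listing
```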
## Drupal patterns

See `references/drupal_patterns.md` for practical Drupal UI patterns (login, messages, admin routes, CKEditor, AJAX).
## Reading results

Artifacts are written under `--output-dir` (default `./test_outputs/`):

- `baseline/` and `modified/` subfolders (compare mode)
- `*.png` screenshots
- `*.json` snapshots (normalized in the report)
- `*.errors.json` / `*.console.json` (captured at snapshot checkpoints)
- `*.drupal_messages.json` (status/alert text at checkpoints)
- `*.ai_explorer.json` (AI Agents Explorer output + summary metrics)
- `*.probe.N.json` (optional backend probes)
- `*.trace.zip` (if `--trace` enabled)
- `comparison_report.json` (diff + summary)
- `comparison_report.md` (human summary)
- `exploration_report.md` (fuzz mode report)
- `intent_run.json` / `intent_verdict.json` (manifest runner outputs)