Runtime Notes
- Ask the user directly when the workflow says to stop for input.
- Treat AGENTS.md, TODO.md, and TODOS.md as the likely sources of repo-local instructions.
- Keep the workflow intent intact, but translate any environment-specific wording to the current toolset.
Ship: Fully Automated Ship Workflow
You are running the ship workflow. This is a non-interactive, fully automated workflow. Do NOT ask for confirmation at any step. The user said ship, which means DO IT. Run straight through and output the PR URL at the end.
Only stop for:
- On the main branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
- MINOR or MAJOR version bump needed (ask — see Step 4)
Never stop for:
- Uncommitted changes (always include them)
- Version bump choice (auto-pick MICRO or PATCH — see Step 4)
- CHANGELOG content (auto-generate from diff)
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
Step 1: Pre-flight
- Check the current branch. If on main, abort: "You're on main. Ship from a feature branch."
- Run git status (never use -uall). Uncommitted changes are always included; no need to ask.
- Run git diff main...HEAD --stat and git log main..HEAD --oneline to understand what's being shipped.
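A minimal sketch of the main-branch guard, assuming a POSIX shell (bash or dash). preflight_branch_ok is a hypothetical helper name; it takes the branch name as an argument so the check itself can run without a repository.

```shell
# Hypothetical helper: succeed for any branch except main, which aborts ship.
preflight_branch_ok() {
  if [ "$1" = "main" ]; then
    echo "You're on main. Ship from a feature branch." >&2
    return 1
  fi
  return 0
}

# Usage sketch inside a repository:
#   preflight_branch_ok "$(git rev-parse --abbrev-ref HEAD)" || exit 1
#   git status
#   git diff main...HEAD --stat
#   git log main..HEAD --oneline
```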
Step 2: Merge origin/main (BEFORE tests)
Fetch and merge origin/main into the feature branch so tests run against the merged state:
git fetch origin main && git merge origin/main --no-edit
If there are merge conflicts: Try to auto-resolve if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, STOP and show them.
If already up to date: Continue silently.
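One way to decide whether every conflict falls in the simple category, assuming the unmerged paths come from git diff --name-only --diff-filter=U and that "simple" means exactly the files named above (VERSION, schema.rb, and CHANGELOG.md, matched by file name). all_conflicts_simple is hypothetical, and it only decides whether to attempt auto-resolution; it does not resolve anything itself.

```shell
# Hypothetical helper: succeed only if every conflicted path is one of the
# files this step treats as simple to auto-resolve.
all_conflicts_simple() {
  local file
  for file in "$@"; do
    case "$(basename "$file")" in
      VERSION | schema.rb | CHANGELOG.md) ;;  # simple: try to auto-resolve
      *) return 1 ;;                          # anything else: STOP and show
    esac
  done
  return 0
}

# Usage sketch after a merge reports conflicts (paths with spaces not handled):
#   conflicts=$(git diff --name-only --diff-filter=U)
#   all_conflicts_simple $conflicts || { echo "$conflicts"; exit 1; }
```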
Step 3: Run tests (on merged code)
Do NOT run RAILS_ENV=test bin/rails db:migrate — bin/test-lane already calls
db:test:prepare internally, which loads the schema into the correct lane database.
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
Run both test suites in parallel:
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
wait
After both complete, read the output files and check pass/fail.
If any test fails: Show the failures and STOP. Do not proceed.
If all pass: Continue silently — just note the counts briefly.
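With a bare wait (no PIDs) the exit status is always 0, and each backgrounded pipeline would report tee's status anyway, which is why this step reads the output files afterwards. If you also want the suites' exit codes, a sketch like the following redirects instead of piping; run_logged and run_ship_tests are hypothetical names, while the commands and log paths are the ones from this step. The trade-off is that output is no longer streamed to the terminal while the suites run.

```shell
# Hypothetical helper: run one suite with stdout and stderr captured in its
# log file. The function returns the suite's own exit status.
run_logged() {
  local log="$1"
  shift
  "$@" >"$log" 2>&1
}

# Hypothetical helper: run both suites concurrently, wait for each by PID,
# and fail if either suite failed. Read the two log files afterwards.
run_ship_tests() {
  local status=0 rails_pid vitest_pid
  run_logged /tmp/ship_tests.txt bin/test-lane &
  rails_pid=$!
  run_logged /tmp/ship_vitest.txt npm run test &
  vitest_pid=$!
  wait "$rails_pid" || status=1
  wait "$vitest_pid" || status=1
  return "$status"
}
```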
Step 3.25: Eval Suites (conditional)
Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
1. Check if the diff touches prompt-related files:
git diff origin/main --name-only
Match against these patterns (from AGENTS.md or nearby repo instructions):
- app/services/*_prompt_builder.rb
- app/services/*_generation_service.rb, *_writer_service.rb, *_designer_service.rb
- app/services/*_evaluator.rb, *_scorer.rb, *_classifier_service.rb, *_analyzer.rb
- app/services/concerns/*voice*.rb, *writing*.rb, *prompt*.rb, *token*.rb
- app/services/chat_tools/*.rb, app/services/x_thread_tools/*.rb
- config/system_prompts/*.txt
- test/evals/**/* (eval infrastructure changes affect all suites)
If no matches: Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
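A sketch of the pattern check as a POSIX case statement, assuming each pattern in the list above shares its line's directory prefix (so *_writer_service.rb is read as app/services/*_writer_service.rb). prompt_related_files is a hypothetical name. In case patterns, * also matches /, so files in nested directories can match too; that only triggers extra evals, which this workflow prefers to a missed regression.

```shell
# Hypothetical helper: read changed paths (one per line) on stdin and print
# the ones matching the prompt-related patterns.
prompt_related_files() {
  local path
  while IFS= read -r path; do
    case "$path" in
      app/services/*_prompt_builder.rb | \
      app/services/*_generation_service.rb | \
      app/services/*_writer_service.rb | \
      app/services/*_designer_service.rb | \
      app/services/*_evaluator.rb | \
      app/services/*_scorer.rb | \
      app/services/*_classifier_service.rb | \
      app/services/*_analyzer.rb | \
      app/services/concerns/*voice*.rb | \
      app/services/concerns/*writing*.rb | \
      app/services/concerns/*prompt*.rb | \
      app/services/concerns/*token*.rb | \
      app/services/chat_tools/*.rb | \
      app/services/x_thread_tools/*.rb | \
      config/system_prompts/*.txt | \
      test/evals/*)
        printf '%s\n' "$path"
        ;;
    esac
  done
}

# Usage sketch:
#   git diff origin/main --name-only | prompt_related_files
```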
2. Identify affected eval suites:
Each eval runner (test/evals/*_eval_runner.rb) declares PROMPT_SOURCE_FILES listing which source files affect it. Grep these to find which suites match the changed files:
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
Map runner → test file: post_generation_eval_runner.rb → post_generation_eval_test.rb.
Special cases:
- Changes to test/evals/judges/*.rb, test/evals/support/*.rb, or test/evals/fixtures/ affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
- Changes to config/system_prompts/*.txt: grep eval runners for the prompt filename to find affected suites.
- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
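A sketch of item 2's grep-and-map, assuming each runner names its source files literally (for example in PROMPT_SOURCE_FILES), so a fixed-string grep for the changed file's basename finds it. affected_eval_tests and its optional directory argument are hypothetical. It does not cover the judges/support/fixtures special case, which still needs the import check described above.

```shell
# Hypothetical helper: read changed paths on stdin and print the eval test
# file for every runner that mentions a changed file's basename.
affected_eval_tests() {
  local runner_dir="${1:-test/evals}" path runner
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    # -F: match the basename literally; -l: print only matching runner files.
    grep -l -F -- "$(basename "$path")" "$runner_dir"/*_eval_runner.rb 2>/dev/null || true
  done | sort -u | while IFS= read -r runner; do
    # post_generation_eval_runner.rb -> post_generation_eval_test.rb
    printf '%s\n' "${runner%_eval_runner.rb}_eval_test.rb"
  done
}
```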
3. Run affected suites at EVAL_JUDGE_TIER=full:
ship is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
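A sketch of the sequential, stop-on-first-failure loop. run_eval_suites is hypothetical, as are the EVAL_RUNNER and EVAL_LOG overrides (useful for a dry run with a stub); by default it calls bin/test-lane with the same flags and environment as the command above. Unlike that single-suite command, it appends to the log so output from every suite that ran stays in /tmp/ship_evals.txt.

```shell
# Hypothetical helper: run each eval test file in order at the full judge
# tier and stop at the first failing suite.
run_eval_suites() {
  local runner="${EVAL_RUNNER:-bin/test-lane}" log="${EVAL_LOG:-/tmp/ship_evals.txt}" suite
  : >"$log"  # start with an empty log for this ship run
  for suite in "$@"; do
    echo "== $suite" >>"$log"
    if ! EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 "$runner" --eval "$suite" >>"$log" 2>&1; then
      echo "Eval suite failed: $suite (see $log)" >&2
      return 1
    fi
  done
}

# Usage sketch:
#   run_eval_suites test/evals/post_generation_eval_test.rb
```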
4. Check results:
- If any eval fails: Show the failures, the cost dashboard, and STOP. Do not proceed.
- If all pass: Note pass counts and cost. Continue to Step 3.5.
5. Save eval output — include eval results and cost dashboard in the PR body (Step 8).
Tier reference (for context — /ship always uses full):
| Tier | When | Speed (cached) | Cost |
|---|---|---|---|
| fast (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
| standard (Sonnet) | Default dev, bin/test-lane --eval | ~17s (4x faster) | ~$0.37/run |
| full (Opus persona) | ship and pre-merge | ~72s (baseline) | ~$1.27/run |
Step 3.5: Pre-Landing Review
Review the diff for structural issues that tests don't catch.
- Read references/review-checklist.md. If the file cannot be read, STOP and report the error.
- Run git diff origin/main to get the full diff (scoped to feature changes against the freshly fetched remote main).
- Apply the review checklist in two passes:
  - Pass 1 (CRITICAL): SQL & Data Safety, LLM Output Trust Boundary
  - Pass 2 (INFORMATIONAL): All remaining categories
- Always output ALL findings, both critical and informational. The user must see every issue found.
- Output a summary header:
  Pre-Landing Review: N issues (X critical, Y informational)
- If CRITICAL issues are found: for EACH critical issue, ask the user directly in a separate message with:
  - The problem (file:line + description)
  - Your recommended fix
  - Options: A) Fix it now (recommended), B) Acknowledge and ship anyway, C) It's a false positive (skip)
  After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"), then STOP and tell the user to run ship again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
- If only non-critical issues are found: output them and continue. They will be included in the PR body at Step 8.
- If no issues are found: output "Pre-Landing Review: No issues found." and continue.
Save the review output — it goes into the PR body in Step 8.
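A small sketch of the two mechanical decisions in this step: the summary header and what happens after the critical-issue questions. review_header and review_next_step are hypothetical helpers, and the action labels they print are made up for illustration; the review itself is still done by reading the diff against the checklist.

```shell
# Hypothetical helper: summary header from the two finding counts.
review_header() {
  local critical="$1" informational="$2" total
  total=$((critical + informational))
  if [ "$total" -eq 0 ]; then
    echo "Pre-Landing Review: No issues found."
  else
    echo "Pre-Landing Review: $total issues ($critical critical, $informational informational)"
  fi
}

# Hypothetical helper: given one answer (A, B, or C) per critical issue,
# print the next action. Any A means fix, commit the fixed files, and stop.
review_next_step() {
  local choice
  for choice in "$@"; do
    if [ "$choice" = "A" ]; then
      echo "fix-commit-and-stop"
      return 0
    fi
  done
  echo "continue-to-step-4"
}
```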
Step 4: Version bump (auto-decide)
- Read the current VERSION file (4-digit format: MAJOR.MINOR.PATCH.MICRO).
- Auto-decide the bump level based on the diff:
  - Count lines changed (git diff origin/main...HEAD --stat | tail -1)
  - MICRO (4th digit): < 50 lines changed, trivial tweaks, typos, config
  - PATCH (3rd digit): 50+ lines changed, bug fixes, small-medium features
  - MINOR (2nd digit): ASK the user; only for major features or significant architectural changes
  - MAJOR (1st digit): ASK the user; only for milestones or breaking changes
- Compute the new version:
  - Bumping a digit resets all digits to its right to 0
  - Example: 0.19.1.0 + PATCH → 0.19.2.0
- Write the new version to the VERSION file.
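A sketch of the arithmetic in this step, assuming the summary line printed by git diff --stat (for example " 3 files changed, 40 insertions(+), 12 deletions(-)") and counting insertions plus deletions as "lines changed". lines_changed, bump_level, and bump_version are hypothetical helpers. bump_level encodes only the 50-line threshold; judging whether a change is a typo fix or a feature, and asking about MINOR or MAJOR, stays with the workflow.

```shell
# Hypothetical helper: insertions + deletions from a --stat summary line.
lines_changed() {
  local ins del
  ins=$(printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\) insertion.*/\1/p')
  del=$(printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\) deletion.*/\1/p')
  echo $(( ${ins:-0} + ${del:-0} ))
}

# Hypothetical helper: MICRO below 50 lines, PATCH at 50 or more.
# MINOR and MAJOR are never auto-picked.
bump_level() {
  if [ "$1" -lt 50 ]; then echo MICRO; else echo PATCH; fi
}

# Hypothetical helper: bump one component of MAJOR.MINOR.PATCH.MICRO and
# reset every component to its right to 0.
bump_version() {
  local version="$1" level="$2" dots major minor patch micro rest
  dots=$(printf '%s' "$version" | tr -cd .)
  if [ "${#dots}" -ne 3 ]; then
    echo "expected MAJOR.MINOR.PATCH.MICRO, got: $version" >&2
    return 1
  fi
  major=${version%%.*}; rest=${version#*.}
  minor=${rest%%.*};    rest=${rest#*.}
  patch=${rest%%.*};    micro=${rest#*.}
  case "$level" in
    MAJOR) major=$((major + 1)); minor=0; patch=0; micro=0 ;;
    MINOR) minor=$((minor + 1)); patch=0; micro=0 ;;
    PATCH) patch=$((patch + 1)); micro=0 ;;
    MICRO) micro=$((micro + 1)) ;;
    *) echo "unknown bump level: $level" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch.$micro"
}

# Usage sketch (repo root):
#   n=$(lines_changed "$(git diff origin/main...HEAD --stat | tail -1)")
#   new=$(bump_version "$(cat VERSION)" "$(bump_level "$n")") && printf '%s\n' "$new" > VERSION
```

Writing the new value through a variable first avoids truncating VERSION before it has been read.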
Step 5: CHANGELOG (auto-generate)
- Read the CHANGELOG.md header to learn its format.
- Auto-generate the entry from ALL commits on the branch (not just recent ones):
  - Use git log origin/main..HEAD --oneline to see every commit being shipped
  - Use git diff origin/main...HEAD to see the full diff against main
  - The CHANGELOG entry must cover ALL changes going into the PR
  - If existing CHANGELOG entries on the branch already cover some commits, replace them with one unified entry for the new version
  - Categorize changes into the applicable sections:
    - ### Added: new features
    - ### Changed: changes to existing functionality
    - ### Fixed: bug fixes
    - ### Removed: removed features
  - Write concise, descriptive bullet points
  - Insert after the file header (line 5), dated today
  - Format: ## [X.Y.Z.W] - YYYY-MM-DD
Do NOT ask the user to describe changes. Infer from the diff and commit history.
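A sketch of the insertion only; the bullet text itself still comes from reading the commit log and diff. insert_changelog_entry is hypothetical, the 5-line header default comes from this step, and the blank-line spacing around the entry is an assumption to adjust to whatever the existing header and entries show.

```shell
# Hypothetical helper: insert "## [VERSION] - YYYY-MM-DD" plus the entry body
# after the first header_lines lines (5 by default) of the changelog.
insert_changelog_entry() {
  local changelog="$1" version="$2" body="$3" header_lines="${4:-5}" tmp
  tmp=$(mktemp) || return 1
  {
    head -n "$header_lines" "$changelog"
    printf '## [%s] - %s\n\n%s\n\n' "$version" "$(date +%Y-%m-%d)" "$body"
    tail -n +"$((header_lines + 1))" "$changelog"
  } >"$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back rather than mv so the changelog keeps its original permissions.
  cat "$tmp" >"$changelog" && rm -f "$tmp"
}

# Usage sketch:
#   insert_changelog_entry CHANGELOG.md "$(cat VERSION)" "$entry_markdown"
```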
Step 6: Commit (bisectable chunks)
Goal: Create small, logical commits that work well with git bisect and help LLMs understand what changed.
- Analyze the diff and group changes into logical commits. Each commit should represent one coherent change: not one file, but one logical unit.
- Commit ordering (earlier commits first):
  - Infrastructure: migrations, config changes, route additions
  - Models & services: new models, services, concerns (with their tests)
  - Controllers & views: controllers, views, JS/React components (with their tests)
  - VERSION + CHANGELOG: always in the final commit
- Rules for splitting:
  - A model and its test file go in the same commit
  - A service and its test file go in the same commit
  - A controller, its views, and its test go in the same commit
  - Migrations are their own commit (or grouped with the model they support)
  - Config/route changes can group with the feature they enable
  - If the total diff is small (< 50 lines across < 4 files), a single commit is fine
- Each commit must be independently valid: no broken imports, no references to code that doesn't exist yet. Order commits so dependencies come first.
- Compose each commit message:
  - First line: <type>: <summary> (type = feat/fix/chore/refactor/docs)
  - Body: brief description of what this commit contains
  - Only the final commit (VERSION + CHANGELOG) gets the version tag and co-author trailer:
git commit -m "$(cat <<'EOF'
chore: bump version and changelog (vX.Y.Z.W)

Co-Authored-By: Codex Opus 4.6 <noreply@anthropic.com>
EOF
)"
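A rough first-pass ordering aid, assuming standard Rails paths (db/, config/, app/models, app/services, app/controllers, app/views, app/javascript, and matching test/ directories). commit_group and group_changed_files are hypothetical; the group numbers follow the ordering list above (1 infrastructure through 4 VERSION + CHANGELOG), and unrecognized paths default to group 3 for manual placement. The real split into logical units still needs judgment, for example a migration may ride along with the model it supports.

```shell
# Hypothetical helper: map a path to its commit group number.
commit_group() {
  case "$1" in
    VERSION | CHANGELOG.md) echo 4 ;;
    db/* | config/*) echo 1 ;;
    app/models/* | app/services/* | test/models/* | test/services/*) echo 2 ;;
    app/controllers/* | app/views/* | app/javascript/* | test/controllers/*) echo 3 ;;
    *) echo 3 ;;
  esac
}

# Hypothetical helper: read changed paths on stdin and print
# "<group> <path>" lines sorted so earlier groups come first.
group_changed_files() {
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    printf '%s %s\n' "$(commit_group "$path")" "$path"
  done | sort -n
}

# Usage sketch:
#   git diff --name-only origin/main | group_changed_files
```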
Step 7: Push
Push to the remote with upstream tracking:
git push -u origin <branch-name>
Step 8: Create PR
Create a pull request using gh:
gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
## Summary
<bullet points from CHANGELOG>
## Pre-Landing Review
<findings from Step 3.5, or "No issues found.">
## Eval Results
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">
## Test plan
- [x] All Rails tests pass (N runs, 0 failures)
- [x] All Vitest tests pass (N tests)
EOF
)"
Output the PR URL — this should be the final output the user sees.
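A sketch of assembling the body from text saved in earlier steps, so the heredoc does not have to be edited by hand. build_pr_body is hypothetical, as is any file you save the review to (the workflow does not name one). Missing or empty review/eval files fall back to the template's default wording, with a semicolon in place of the dash.

```shell
# Hypothetical helper: print the PR body in the section order of the template.
# Args: summary bullets, review file, eval file, Rails run count, Vitest test count.
build_pr_body() {
  local summary="$1" review_file="$2" eval_file="$3" rails_runs="$4" vitest_tests="$5"
  local review="No issues found."
  local evals="No prompt-related files changed; evals skipped."
  if [ -s "$review_file" ]; then review=$(cat "$review_file"); fi
  if [ -s "$eval_file" ]; then evals=$(cat "$eval_file"); fi
  printf '## Summary\n%s\n## Pre-Landing Review\n%s\n' "$summary" "$review"
  printf '## Eval Results\n%s\n## Test plan\n' "$evals"
  printf '%s\n' "- [x] All Rails tests pass ($rails_runs runs, 0 failures)"
  printf '%s\n' "- [x] All Vitest tests pass ($vitest_tests tests)"
}

# Usage sketch (counts come from the Step 3 logs):
#   gh pr create --title "<type>: <summary>" \
#     --body "$(build_pr_body "$summary" "$review_file" /tmp/ship_evals.txt "$rails_runs" "$vitest_tests")"
```

gh pr create prints the new PR's URL on success, which is the final line the user should see.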
Important Rules
- Never skip tests. If tests fail, stop.
- Never skip the pre-landing review. If references/review-checklist.md is unreadable, stop.
- Never force push. Use regular git push only.
- Never ask for confirmation except for MINOR/MAJOR version bumps and CRITICAL review findings (one direct user question per critical issue, with a fix recommendation).
- Always use the 4-digit version format from the VERSION file.
- Date format in CHANGELOG: YYYY-MM-DD
- Split commits for bisectability: each commit = one logical change.
- The goal is: the user says ship, and the next thing they see is the review + PR URL.