gh-autopilot

GH Autopilot

Use this skill to operate a deterministic Copilot review loop on one PR. The user must explicitly choose the starting stage. The skill must begin there and keep looping until a terminal condition is reached.

This skill is stateful and persists artifacts under .context/gh-autopilot/.

Required Start Stage

Require the user to provide one of the following stage values:

1 (create_pr): PR not created yet. Create/select PR first.
2 (monitor_review): PR exists and Copilot is reviewing (or expected soon).
3 (address_comments): Copilot comments already exist and must be addressed now.

If start stage is missing, ask once and wait. Do not guess.

Terminal end conditions for the loop:

completed_no_comments (success)
timeout with reason stage2_max_wait_reached (Stage 2 overall wait budget exhausted)

Timing Contract

Initial wait: 300 seconds.
Poll interval after initial wait: 45 seconds.
Keep polling past the 10-minute mark; a quiet first 10 minutes is not a reason to stop.
Stop each cycle wait at 40 minutes (2400 seconds) and mark cycle timeout.
Stage 2 overall max wait defaults to 12 hours (43200 seconds) unless overridden.
On cycle timeout, immediately retry Stage 2 wait. Do not stop manually while still inside the Stage 2 max-wait budget.
Stop the entire loop when the Copilot summary says generated no comments.
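
The arithmetic behind these defaults can be checked with a small sketch. It assumes polls fire at fixed offsets from the start of a cycle and ignores request latency; the function names are illustrative and are not part of the engine.

```python
def poll_offsets(initial_wait=300, poll_interval=45, cycle_max_wait=2400):
    """Return the seconds-from-cycle-start at which a poll would fire."""
    offsets = []
    t = initial_wait
    while t <= cycle_max_wait:
        offsets.append(t)
        t += poll_interval
    return offsets


def max_full_cycles(stage2_max_wait=43200, cycle_max_wait=2400):
    """Upper bound on full cycle waits that fit in the Stage 2 budget."""
    return stage2_max_wait // cycle_max_wait


offsets = poll_offsets()
print(len(offsets), offsets[0], offsets[-1])  # prints "47 300 2370"
print(max_full_cycles())                      # prints "18"
```

Under these assumptions one cycle allows up to 47 polls, and the default Stage 2 budget covers up to 18 full cycle timeouts before stage2_max_wait_reached.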

Autopilot Persistence Contract

Autopilot is a persistent control loop. Once started, it must keep operating with timed polling and deterministic transitions until one of the following happens:

terminal success: completed_no_comments and drain guard passes
terminal timeout: timeout with reason stage2_max_wait_reached
explicit blocker: auth failure, state corruption, or PR mismatch that cannot be auto-recovered

Idle waiting is not a stop condition. A single cycle timeout is not a stop condition.

GH Command Reference Contract

When this skill uses gh commands, treat gh-cli as the command source of truth for command shape and flags.

Validate auth flows with gh-cli guidance (gh auth status, gh auth login).
Validate PR resolution/edit patterns with gh-cli guidance (gh pr view, gh pr edit --add-reviewer/--remove-reviewer, --json, --jq).
Validate GraphQL/API invocation patterns with gh-cli guidance (gh api graphql ).

Primary Engine

Use scripts/run_autopilot_loop.py as the control-plane entrypoint.

Commands

init: initialize state for one PR.
run-cycle: wait for new Copilot review and export cycle artifacts.
run-stage2-loop: run Stage 2 with automatic cycle-timeout retries until action/terminal/Stage 2 max-wait limit.
finalize-cycle: mark current cycle addressed and re-request Copilot.
status: print current state.
assert-drained: fail if any address-required cycle is still pending.
simulate-fsm: deterministic dry-run of event-driven status transitions.

The engine is event-driven: state transitions are applied from explicit events (for example cycle_timeout, cycle_needs_address, finalize_with_reviewer_request) instead of ad-hoc status rewrites. Every JSON command result includes exactly one canonical resume_command to continue from the current state.

State File

Default: .context/gh-autopilot/state.json

Important fields:

status
cycle
last_processed_review_id
pending_review_id
pr

Event Log

Default: .context/gh-autopilot/events.jsonl

Each line is a normalized JSON event:

schema_version: event schema version.
timestamp: event time in UTC ISO8601.
event_type: normalized snake_case event name.
payload: event payload object.
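
A minimal sketch of writing and reading such a line, assuming the four fields above are the whole record. The schema_version value and the helper names are hypothetical; the real engine may serialize events differently.

```python
import json
from datetime import datetime, timezone

SCHEMA_VERSION = 1  # hypothetical value; the actual schema version is not stated


def format_event(event_type, payload, now=None):
    """Serialize one event as a single JSON line with the four documented fields."""
    now = now or datetime.now(timezone.utc)
    record = {
        "schema_version": SCHEMA_VERSION,
        "timestamp": now.astimezone(timezone.utc).isoformat(),
        "event_type": event_type,
        "payload": payload,
    }
    return json.dumps(record, sort_keys=True)


def read_events(lines):
    """Parse JSONL lines back into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


line = format_event("cycle_timeout", {"cycle": 2})
print(read_events([line])[0]["event_type"])  # prints "cycle_timeout"
```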

Context Workspace Files

Use .context/gh-autopilot/ as the durable workspace for autonomy and recovery.

context.md: single source of truth for next actions, status snapshot, artifacts, and suggested commands. Includes a compact status header: phase=<...> | cycle=<...> | status=<...> | timeout_reason=<...>.

Keep this intentionally simple: one context file, not multiple overlapping notes.
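
When resuming from context.md, the compact status header can be read back with a few lines of parsing. This is a sketch that assumes the header is exactly the pipe-separated key=value form shown above; the sample values (stage2, none) are invented for illustration.

```python
def parse_status_header(line):
    """Split 'key=value | key=value' pairs into a dict of strings."""
    fields = {}
    for part in line.split("|"):
        key, sep, value = part.strip().partition("=")
        if sep:  # ignore fragments that have no '='
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical header line; real values come from the engine.
header = "phase=stage2 | cycle=3 | status=waiting_for_review | timeout_reason=none"
print(parse_status_header(header)["status"])  # prints "waiting_for_review"
```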

Stage Router

Start from the user-selected stage: create_pr, monitor_review, or address_comments.

Routing rules:

Stage 1 (create_pr) always transitions to Stage 2.
Stage 2 (monitor_review) runs the persistent supervisor path (run-stage2-loop).
Stage 2 guard order is strict:
  1. check pending address/triage first
  2. check terminal no-comments + drain guard
  3. check per-cycle-timeout retry state
  4. check Stage 2 max-wait timeout
  5. otherwise run another poll cycle
If Stage 2 returns awaiting_address or awaiting_triage, transition to Stage 3.
Stage 3 (address_comments) finalizes the full batch with finalize-cycle, then transitions back to Stage 2.
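
The strict Stage 2 guard order can be sketched as a first-match decision function. The parameter names and returned action labels are hypothetical. The source does not say how the engine behaves when a cycle timeout and an exhausted Stage 2 budget coincide, so this sketch simply returns the first guard that matches.

```python
def next_stage2_action(status, timeout_reason, drained, elapsed_seconds,
                       stage2_max_wait_seconds=43200):
    """Evaluate the Stage 2 guards in their documented order."""
    # 1. Pending address/triage work always wins.
    if status in ("awaiting_address", "awaiting_triage"):
        return "go_to_stage3"
    # 2. Terminal no-comments counts only if the drain guard passes.
    if status == "completed_no_comments":
        return "stop_success" if drained else "go_to_stage3"
    # 3. A single cycle timeout is a retry, never a stop.
    if status == "timeout" and timeout_reason == "cycle_max_wait_reached":
        return "retry_cycle"
    # 4. Overall Stage 2 wait budget.
    if elapsed_seconds >= stage2_max_wait_seconds:
        return "stop_timeout"
    # 5. Otherwise run another poll cycle.
    return "poll_again"


print(next_stage2_action("waiting_for_review", None, True, 600))  # prints "poll_again"
```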

Event-driven state transitions:

initialized|rerequested --begin_cycle_wait--> waiting_for_review
waiting_for_review --cycle_timeout--> timeout (cycle_max_wait_reached)
timeout (cycle_max_wait_reached) --stage2_retry_after_cycle_timeout--> initialized
waiting_for_review --cycle_no_comments--> completed_no_comments
waiting_for_review --cycle_needs_address--> awaiting_address
waiting_for_review --cycle_needs_triage--> awaiting_triage
awaiting_address --finalize_with_reviewer_request--> rerequested
awaiting_triage --finalize_with_reviewer_request--> rerequested
awaiting_address|awaiting_triage --finalize_without_reviewer_request--> initialized
initialized|waiting_for_review|rerequested --stage2_max_wait_reached--> timeout
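
The transitions above can be written as a lookup table keyed by (state, event). This is a sketch, not the engine's implementation: representing a state as a (status, timeout_reason) tuple is an assumption made here so that the two timeout variants stay distinct, and simulate is a hypothetical helper.

```python
INIT = ("initialized", None)
REREQ = ("rerequested", None)
WAIT = ("waiting_for_review", None)
DONE = ("completed_no_comments", None)
ADDRESS = ("awaiting_address", None)
TRIAGE = ("awaiting_triage", None)
CYCLE_TIMEOUT = ("timeout", "cycle_max_wait_reached")
STAGE2_TIMEOUT = ("timeout", "stage2_max_wait_reached")

# (source states, event, target state), one row per documented transition.
_RULES = [
    ((INIT, REREQ), "begin_cycle_wait", WAIT),
    ((WAIT,), "cycle_timeout", CYCLE_TIMEOUT),
    ((CYCLE_TIMEOUT,), "stage2_retry_after_cycle_timeout", INIT),
    ((WAIT,), "cycle_no_comments", DONE),
    ((WAIT,), "cycle_needs_address", ADDRESS),
    ((WAIT,), "cycle_needs_triage", TRIAGE),
    ((ADDRESS, TRIAGE), "finalize_with_reviewer_request", REREQ),
    ((ADDRESS, TRIAGE), "finalize_without_reviewer_request", INIT),
    ((INIT, WAIT, REREQ), "stage2_max_wait_reached", STAGE2_TIMEOUT),
]

TRANSITIONS = {(src, event): dst for srcs, event, dst in _RULES for src in srcs}


def simulate(start, events):
    """Apply events in order; reject any event not allowed in the current state."""
    state = start
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


final = simulate(INIT, ["begin_cycle_wait", "cycle_needs_address",
                        "finalize_with_reviewer_request"])
print(final[0])  # prints "rerequested"
```

Because terminal states such as completed_no_comments have no outgoing rows, any further event raises instead of silently rewriting the status.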

Stage Details

Stage 1 (create_pr)

User intent: PR has not been created yet.

Actions:

Use gh-cli as reference for all gh command usage in this stage.
Run gh auth status.
Resolve current-branch PR with gh pr view (omit --pr).
If an open PR already exists for the branch, skip PR creation and move to Stage 2.
If no PR exists, run gh-pr-creation to open one.
Initialize state with init (avoid --force unless state is intentionally reset).
Move to Stage 2.

Stage 2 (monitor_review)

User intent: PR exists and we are waiting for Copilot output.

Actions:

Use gh-cli as reference for all gh command usage in this stage.
Run gh auth status.
Resolve PR (current branch or explicit --pr).
Ensure state exists for the PR:
  If missing: run init.
  If state is already awaiting_address or awaiting_triage: move directly to Stage 3.
Run run-stage2-loop with normal timing (300/45/2400) plus Stage 2 max wait (43200 by default).
On Stage 2 entry, run-stage2-loop performs an immediate fetch pass (initial_sleep=0) to capture already-finished Copilot reviews/comments.
run-stage2-loop retries run-cycle automatically after each cycle timeout.
run-cycle exports comments by matching each thread comment to the active Copilot review_id (not by timestamp cutoff).
Interpret result:
  completed_no_comments -> terminal success; stop loop. This includes cycles where no Copilot thread comments were captured for that review round.
  timeout with reason stage2_max_wait_reached -> terminal timeout; stop loop.
  awaiting_address or awaiting_triage -> move to Stage 3.
Before any terminal stop/report in Stage 2, run assert-drained.
  If it exits non-zero, do not stop; continue to Stage 3.
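
Matching comments by review_id rather than by a timestamp cutoff can be sketched as a simple filter. The comment dict shape used here (review_id, created_at, body keys) is an assumption for illustration; the real export may carry different field names.

```python
def comments_for_review(comments, review_id):
    """Keep only comments attached to the given review, oldest first."""
    matched = [c for c in comments if c.get("review_id") == review_id]
    return sorted(matched, key=lambda c: c["created_at"])


# Hypothetical data: a late reply on an older review arrives after the new review.
comments = [
    {"review_id": "R2", "created_at": "2024-05-01T10:05:00Z", "body": "new finding"},
    {"review_id": "R1", "created_at": "2024-05-01T10:06:00Z", "body": "late reply on old review"},
    {"review_id": "R2", "created_at": "2024-05-01T10:04:00Z", "body": "first finding"},
]
print([c["body"] for c in comments_for_review(comments, "R2")])
# prints "['first finding', 'new finding']"
```

A timestamp cutoff would have swept the late R1 reply into the R2 cycle; keying on the review avoids that.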

If Copilot is already reviewing when Stage 2 starts, do not re-request reviewer; continue waiting with run-stage2-loop.

Never stop Stage 2 manually while the command is still within the configured Stage 2 max-wait limit.

Stage 3 (address_comments)

User intent: comments already exist and must be processed now.

Actions:

Ensure fresh cycle artifacts are available:
  If state is already awaiting_address or awaiting_triage, use existing cycle.json.
  Otherwise run run-cycle with --initial-sleep-seconds 0 to capture existing comments immediately.
  If parsed_summary.generated_comments > 0 but counts.copilot_comments_total == 0, artifacts are inconsistent: re-run immediate run-cycle and do not finalize until comments are captured.
Build normalized worker artifacts in shared context:
  run build_review_batch.py to create review-batch.json
Use gh-cli as reference for any gh commands used to resolve/reply on PR threads.
Run Stage 3 worker actions inside this skill:
  process all threads from review-batch.json
  account for every Copilot comment in those threads
  resolve each actionable thread in GitHub
  reply on each non-actionable thread with rationale
  do not leave any thread/comment unreviewed or unaddressed
  push exactly once for the batch
  do not request Copilot review while processing individual threads
  update cycle.json.addressing with complete per-thread and per-comment coverage
Validate cycle.json.addressing before finalizing:
  status=ready_for_finalize
  pushed_once=true
  review_id and cycle match active state
  threads.addressed + threads.rejected_with_rationale equals total thread count
  threads.needs_clarification=0
  thread_responses has exactly one entry per thread
  for each thread_responses entry:
    classification=actionable requires resolved=true
    classification=non-actionable requires rationale_replied=true
  comments.addressed_or_rationalized equals total comment count
  comments.needs_clarification=0
  comment_statuses has exactly one entry per comment with:
    status in {action, no_action}
    cycle equal to the active cycle
    chronological sort by created_at
Run finalize-cycle only when validation passes (re-requests Copilot unless explicitly skipped for recovery).
  run this once per cycle, after the full thread batch is complete
  never run it immediately after addressing a single thread
  never call reviewer add/remove directly during Stage 3; finalize-cycle is the only allowed reviewer request path
Return to Stage 2.
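
The validation checklist for cycle.json.addressing can be sketched as a function that collects every failed check. This is an illustration under assumptions: the nesting of the addressing object is inferred from the dotted names above, the total thread and comment counts are passed in by the caller, and "exactly one entry per thread/comment" is approximated by a length check because the identifier field names are not given. finalize-cycle performs its own validation; this sketch does not replace it.

```python
def validate_addressing(addressing, active_review_id, active_cycle,
                        total_threads, total_comments):
    """Return a list of problems; an empty list means ready to finalize."""
    errors = []

    def check(ok, message):
        if not ok:
            errors.append(message)

    threads = addressing.get("threads", {})
    comments = addressing.get("comments", {})
    responses = addressing.get("thread_responses", [])
    statuses = addressing.get("comment_statuses", [])

    check(addressing.get("status") == "ready_for_finalize", "status is not ready_for_finalize")
    check(addressing.get("pushed_once") is True, "pushed_once is not true")
    check(addressing.get("review_id") == active_review_id, "review_id does not match active state")
    check(addressing.get("cycle") == active_cycle, "cycle does not match active state")

    handled = threads.get("addressed", 0) + threads.get("rejected_with_rationale", 0)
    check(handled == total_threads, "thread counts do not cover every thread")
    check(threads.get("needs_clarification", 0) == 0, "threads still need clarification")
    check(len(responses) == total_threads, "thread_responses count differs from thread count")
    for response in responses:
        kind = response.get("classification")
        if kind == "actionable":
            check(response.get("resolved") is True, "actionable thread not resolved")
        elif kind == "non-actionable":
            check(response.get("rationale_replied") is True, "non-actionable thread lacks rationale reply")
        else:
            check(False, f"unknown classification: {kind!r}")

    check(comments.get("addressed_or_rationalized") == total_comments,
          "comment counts do not cover every comment")
    check(comments.get("needs_clarification", 0) == 0, "comments still need clarification")
    check(len(statuses) == total_comments, "comment_statuses count differs from comment count")
    for entry in statuses:
        check(entry.get("status") in {"action", "no_action"}, "invalid comment status")
        check(entry.get("cycle") == active_cycle, "comment status has wrong cycle")
    created = [entry.get("created_at", "") for entry in statuses]
    check(created == sorted(created), "comment_statuses not sorted by created_at")
    return errors
```

Returning all problems at once, rather than failing on the first, makes it easier to fix the addressing record in one pass before re-running the check.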

Command Templates

Initialize State

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  init \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Use --force with init only when intentionally resetting prior state. If reusing an existing PR branch, do not run --force unless the current state is stale or corrupted.
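
When the engine is driven from another script, the command templates in this section can be assembled as an argument list instead of a shell string. The helper below is hypothetical and only mirrors the flag order shown in the templates; it does not run anything.

```python
def build_engine_command(skill_path, subcommand, pr, repo=".", **timing):
    """Build the argv list for run_autopilot_loop.py; keyword names become --flags."""
    cmd = [
        "python",
        f"{skill_path}/scripts/run_autopilot_loop.py",
        "--repo", repo,
        "--pr", str(pr),
        subcommand,
    ]
    for name, value in timing.items():
        cmd += [f"--{name.replace('_', '-')}", str(value)]
    return cmd


cmd = build_engine_command(
    "<path-to-skill>", "init", "<PR_NUMBER_OR_URL>",
    initial_sleep_seconds=300,
    poll_interval_seconds=45,
    cycle_max_wait_seconds=2400,
)
print(" ".join(cmd[4:]))
```

A list like this can be passed to subprocess.run without shell quoting concerns; the placeholders stay as they are until the caller fills them in.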

Monitor Stage 2 Loop (recommended)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-stage2-loop \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400 \
  --stage2-max-wait-seconds 43200

Use this command for normal Stage 2 operation. It performs an immediate bootstrap fetch first, then automatically retries cycle waits when a cycle-level timeout occurs.

Monitor One Cycle (diagnostic/manual)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-cycle \
  --initial-sleep-seconds 300 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Capture Existing Comments Immediately (Stage 3 bootstrap)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  run-cycle \
  --initial-sleep-seconds 0 \
  --poll-interval-seconds 45 \
  --cycle-max-wait-seconds 2400

Build Stage 3 Worker Batch Artifacts

python "<path-to-skill>/scripts/build_review_batch.py" \
  --cycle ".context/gh-autopilot/cycle.json" \
  --output-dir ".context/gh-autopilot"

Finalize Addressed Cycle

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  finalize-cycle

This command validates cycle.json.addressing coverage first, then:

Moves pending_review_id into last_processed_review_id.
Increments cycle.
Re-requests Copilot via remove/add reviewer sequence.
Records per-comment status (action/no_action) in finalize event payloads.

Use --skip-reviewer-request only for manual recovery paths.
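
The state effects of finalize-cycle can be sketched as a pure function over the state fields listed earlier. Two points are assumptions here: that pending_review_id is cleared after being moved, and that the resulting status follows the finalize_with_reviewer_request / finalize_without_reviewer_request transitions. The reviewer remove/add calls and event logging are left out.

```python
import copy


def apply_finalize(state, skip_reviewer_request=False):
    """Return a new state dict reflecting a finalized cycle; the input is not mutated."""
    new_state = copy.deepcopy(state)
    new_state["last_processed_review_id"] = new_state["pending_review_id"]
    new_state["pending_review_id"] = None  # assumption: "moves" implies clearing
    new_state["cycle"] += 1
    # Assumption: status follows the documented finalize_* events.
    new_state["status"] = "initialized" if skip_reviewer_request else "rerequested"
    return new_state


before = {"status": "awaiting_address", "cycle": 2,
          "last_processed_review_id": "R1", "pending_review_id": "R2"}
after = apply_finalize(before)
print(after["cycle"], after["last_processed_review_id"], after["status"])
# prints "3 R2 rerequested"
```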

Print Current State

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  status

Assert No Pending Address-Required Cycle

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  --repo "." \
  --pr "<PR_NUMBER_OR_URL>" \
  assert-drained

Use this as the final gate before reporting completion/timeout handling results. If state is awaiting_address or awaiting_triage , this command fails and the loop must continue through Stage 3.

Simulate FSM Transitions (deterministic)

python "<path-to-skill>/scripts/run_autopilot_loop.py" \
  simulate-fsm \
  --start-status initialized \
  --event begin_cycle_wait \
  --event cycle_needs_address \
  --event finalize_with_reviewer_request

Artifacts and Exit Codes

Outputs (default .context/gh-autopilot/):

cycle.json
review-batch.json (generated by Stage 3 worker setup)
context.md (updated)

Status meanings:

completed_no_comments: terminal success
timeout: timeout status
  from run-cycle: timeout for that single cycle wait
  from run-stage2-loop: Stage 2 overall timeout (reason=stage2_max_wait_reached)
awaiting_address: actionable Copilot comments captured
awaiting_triage: review exists but needs manual interpretation

Exit codes:

0: terminal success or already-terminal state
3: run-cycle timeout
10: comments/triage action required
11: assert-drained detected unaddressed pending cycle
12: run-stage2-loop exhausted Stage 2 max-wait budget
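
A caller can map these exit codes to the next routing step. The action labels and the "investigate" fallback for undocumented codes are hypothetical; only the code-to-meaning pairs come from the list above.

```python
EXIT_ACTIONS = {
    0: "stop_success",    # terminal success or already-terminal state
    3: "retry_stage2",    # run-cycle timeout: a retry path, not a stop
    10: "go_to_stage3",   # comments/triage action required
    11: "go_to_stage3",   # assert-drained found an unaddressed pending cycle
    12: "stop_timeout",   # Stage 2 max-wait budget exhausted
}


def action_for_exit_code(code):
    """Translate an engine exit code into a routing decision."""
    # Undocumented codes are treated as something to investigate (assumption).
    return EXIT_ACTIONS.get(code, "investigate")


print(action_for_exit_code(10))  # prints "go_to_stage3"
```

Terminal results (0 and 12) are still subject to the assert-drained gate before any stop is reported.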

Loop Contract

After entering via the user-selected stage, keep routing until terminal. Do not stop after a single cycle unless blocked by auth/state errors.

current_stage = user_selected_stage
while true:
    if current_stage == 1:
        run Stage 1
        current_stage = 2
        continue

    if current_stage == 2:
        run persistent Stage 2 supervisor
        if status in {awaiting_address, awaiting_triage}:
            current_stage = 3
            continue
        if status == completed_no_comments:
            if assert-drained != 0:
                current_stage = 3
                continue
            stop
        if status == timeout and reason == stage2_max_wait_reached:
            if assert-drained != 0:
                current_stage = 3
                continue
            stop
        # cycle timeout is internal retry; never stop here
        continue

    if current_stage == 3:
        run Stage 3 worker handoff
        if cycle.addressing.status != ready_for_finalize:
            stop and request clarification
        if cycle.addressing does not cover all review comments:
            stop and request clarification
        current_stage = 2
        continue

Recovery Scenarios

Handle common failure modes explicitly:

Auth failure:
  Run gh auth status.
  If unauthenticated, run gh auth login and retry.
State/PR mismatch:
  If state PR differs from intended PR, re-run init with correct --pr.
  Use --force only when intentionally discarding prior loop state.
Closed/merged PR mid-loop:
  Stop loop.
  Open or select a new active PR.
  Re-initialize state for that PR.
Existing open PR before start:
  Skip gh-pr-creation.
  Initialize directly against that PR.
Copilot already reviewing when loop starts (Stage 2):
  Skip re-request.
  Run run-stage2-loop and allow repeated cycle wait windows to continue.
  Continue normal addressing flow when cycle comments arrive.
Copilot comments already present when loop starts (Stage 3):
  Run immediate capture (--initial-sleep-seconds 0) only if cycle artifacts are missing/stale.
  Build review-batch.json in .context/gh-autopilot/.
  Address comments directly in Stage 3 of this skill.
  Finalize only when cycle.json.addressing reports ready and full comment coverage.
  Resume Stage 2.
Agent interruption or handoff:
  Resume from .context/gh-autopilot/context.md.
  Continue using context.md as the source of next actions and state snapshot.

Safety Rules

Never process a cycle while state is already awaiting_address.
Never finalize a cycle without confirming comments were fully addressed.
Never finalize a cycle if any review thread lacks a resolve/rationale response.
Never finalize when parsed_summary.generated_comments > 0 but captured comment count is 0.
Never manually stop Stage 2 idle waiting before run-stage2-loop exits by configured limits or terminal status.
Never treat a single cycle timeout as terminal; it is always a retry path while Stage 2 max-wait budget remains.
Never claim terminal completion unless assert-drained exits 0.
Keep one push per cycle.
Do not delete .context/gh-autopilot/ artifacts mid-loop.
Keep context.md in sync by using engine commands (init, run-cycle, finalize-cycle) rather than manual edits.
Use the gh-cli skill as the source of truth whenever selecting or changing gh commands in this skill.

Optional Utility Scripts

The following scripts remain available for ad-hoc diagnostics:

scripts/monitor_copilot_review.py
scripts/export_copilot_feedback.py

Prefer run_autopilot_loop.py for normal loop operation.

Related Skills

gh-address-copilot-review

gh-pr-creation