GH Autopilot
Use this skill to operate a deterministic Copilot review loop on one PR. The user must explicitly choose the starting stage. The skill must begin there and keep looping until a terminal condition is reached.
This skill is stateful and persists artifacts under .context/gh-autopilot/ .
Required Start Stage
Require the user to provide one of the following stage values:
-
1 (create_pr ): PR not created yet. Create/select PR first.
-
2 (monitor_review ): PR exists and Copilot is reviewing (or expected soon).
-
3 (address_comments ): Copilot comments already exist and must be addressed now.
If start stage is missing, ask once and wait. Do not guess.
Terminal end conditions for the loop:
-
completed_no_comments (success)
-
timeout with reason stage2_max_wait_reached (Stage 2 overall wait budget exhausted)
Timing Contract
-
Initial wait: 300 seconds.
-
Poll interval after initial wait: 45 seconds.
-
Keep polling after 10 minutes.
-
Stop each cycle wait at 40 minutes (2400 seconds) and mark cycle timeout.
-
Stage 2 overall max wait defaults to 12 hours (43200 seconds) unless overridden.
-
On cycle timeout, immediately retry Stage 2 wait. Do not stop manually while still inside the Stage 2 max-wait budget.
-
Stop entire loop when Copilot summary says generated no comments .
Autopilot Persistence Contract
Autopilot is a persistent control loop. Once started, it must keep operating with timed polling and deterministic transitions until one of the following happens:
-
terminal success: completed_no_comments and drain guard passes
-
terminal timeout: timeout with reason stage2_max_wait_reached
-
explicit blocker: auth failure, state corruption, or PR mismatch that cannot be auto-recovered
Idle waiting is not a stop condition. A single cycle timeout is not a stop condition.
GH Command Reference Contract
When this skill uses gh commands, treat gh-cli as the command source of truth for command shape and flags.
-
Validate auth flows with gh-cli guidance (gh auth status , gh auth login ).
-
Validate PR resolution/edit patterns with gh-cli guidance (gh pr view , gh pr edit --add-reviewer/--remove-reviewer , --json , --jq ).
-
Validate GraphQL/API invocation patterns with gh-cli guidance (gh api graphql ).
Primary Engine
Use scripts/run_autopilot_loop.py as the control-plane entrypoint.
Commands
-
init : initialize state for one PR.
-
run-cycle : wait for new Copilot review and export cycle artifacts.
-
run-stage2-loop : run Stage 2 with automatic cycle-timeout retries until action/terminal/Stage 2 max-wait limit.
-
finalize-cycle : mark current cycle addressed and re-request Copilot.
-
status : print current state.
-
assert-drained : fail if any address-required cycle is still pending.
-
simulate-fsm : deterministic dry-run of event-driven status transitions.
The engine is event-driven: state transitions are applied from explicit events (for example cycle_timeout , cycle_needs_address , finalize_with_reviewer_request ) instead of ad-hoc status rewrites. Every JSON command result includes exactly one canonical resume_command to continue from the current state.
State File
Default: .context/gh-autopilot/state.json
Important fields:
-
status
-
cycle
-
last_processed_review_id
-
pending_review_id
-
pr
Event Log
Default: .context/gh-autopilot/events.jsonl
Each line is a normalized JSON event:
-
schema_version : event schema version.
-
timestamp : event time in UTC ISO8601.
-
event_type : normalized snake_case event name.
-
payload : event payload object.
Context Workspace Files
Use .context/gh-autopilot/ as the durable workspace for autonomy and recovery.
- context.md : single source of truth for next actions, status snapshot, artifacts, and suggested commands. Includes a compact status header: phase=<...> | cycle=<...> | status=<...> | timeout_reason=<...> .
Keep this intentionally simple: one context file, not multiple overlapping notes.
Stage Router
Start from the user-selected stage: create_pr , monitor_review , or address_comments .
Routing rules:
-
Stage 1 (create_pr ) always transitions to Stage 2.
-
Stage 2 (monitor_review ) runs the persistent supervisor path (run-stage2-loop ).
-
Stage 2 guard order is strict:
-
check pending address/triage first
-
check terminal no-comments + drain guard
-
check per-cycle-timeout retry state
-
check Stage 2 max-wait timeout
-
otherwise run another poll cycle
-
If Stage 2 returns awaiting_address or awaiting_triage , transition to Stage 3.
-
Stage 3 (address_comments ) finalizes the full batch with finalize-cycle , then transitions back to Stage 2.
Event-driven state transitions:
initialized|rerequested --begin_cycle_wait--> waiting_for_review waiting_for_review --cycle_timeout--> timeout (cycle_max_wait_reached) timeout (cycle_max_wait_reached) --stage2_retry_after_cycle_timeout--> initialized waiting_for_review --cycle_no_comments--> completed_no_comments waiting_for_review --cycle_needs_address--> awaiting_address waiting_for_review --cycle_needs_triage--> awaiting_triage awaiting_address --finalize_with_reviewer_request--> rerequested awaiting_triage --finalize_with_reviewer_request--> rerequested awaiting_address|awaiting_triage --finalize_without_reviewer_request--> initialized initialized|waiting_for_review|rerequested --stage2_max_wait_reached--> timeout
Stage Details
Stage 1 (create_pr )
User intent: PR has not been created yet.
Actions:
-
Use gh-cli as reference for all gh command usage in this stage.
-
Run gh auth status .
-
Resolve current-branch PR with gh pr view (omit --pr ).
-
If an open PR already exists for the branch, skip PR creation and move to Stage 2.
-
If no PR exists, run gh-pr-creation to open one.
-
Initialize state with init (avoid --force unless state is intentionally reset).
-
Move to Stage 2.
Stage 2 (monitor_review )
User intent: PR exists and we are waiting for Copilot output.
Actions:
-
Use gh-cli as reference for all gh command usage in this stage.
-
Run gh auth status .
-
Resolve PR (current branch or explicit --pr ).
-
Ensure state exists for the PR:
-
If missing: run init .
-
If state already awaiting_address or awaiting_triage : move directly to Stage 3.
-
Run run-stage2-loop with normal timing (300/45/2400 ) plus Stage 2 max wait (43200 by default).
-
On Stage 2 entry, run-stage2-loop performs an immediate fetch pass (initial_sleep=0 ) to capture already-finished Copilot reviews/comments.
-
run-stage2-loop retries run-cycle automatically after each cycle timeout.
-
run-cycle exports comments by matching each thread comment to the active Copilot review_id (not by timestamp cutoff).
-
Interpret result:
-
completed_no_comments -> terminal success; stop loop.
-
includes cycles where no Copilot thread comments were captured for that review round
-
timeout with reason stage2_max_wait_reached -> terminal timeout; stop loop.
-
awaiting_address or awaiting_triage -> move to Stage 3.
-
Before any terminal stop/report in Stage 2, run assert-drained .
-
If it exits non-zero, do not stop; continue to Stage 3.
If Copilot is already reviewing when Stage 2 starts, do not re-request reviewer; continue waiting with run-stage2-loop .
Never stop Stage 2 manually while the command is still within the configured Stage 2 max-wait limit.
Stage 3 (address_comments )
User intent: comments already exist and must be processed now.
Actions:
-
Ensure fresh cycle artifacts are available:
-
If state is already awaiting_address or awaiting_triage , use existing cycle.json .
-
Otherwise run run-cycle with --initial-sleep-seconds 0 to capture existing comments immediately.
-
If parsed_summary.generated_comments > 0 but counts.copilot_comments_total == 0 , artifacts are inconsistent: re-run immediate run-cycle and do not finalize until comments are captured.
-
Build normalized worker artifacts in shared context:
-
run build_review_batch.py to create review-batch.json
-
Use gh-cli as reference for any gh commands used to resolve/reply on PR threads.
-
Run Stage 3 worker actions inside this skill:
-
process all threads from review-batch.json
-
account for every Copilot comment in those threads
-
resolve each actionable thread in GitHub
-
reply on each non-actionable thread with rationale
-
do not leave any thread/comment unreviewed or unaddressed
-
push exactly once for the batch
-
do not request Copilot review while processing individual threads
-
update cycle.json.addressing with complete per-thread and per-comment coverage
-
Validate cycle.json.addressing before finalizing:
-
status=ready_for_finalize
-
pushed_once=true
-
review_id and cycle match active state
-
threads.addressed + threads.rejected_with_rationale equals total thread count
-
threads.needs_clarification=0
-
thread_responses has exactly one entry per thread
-
for each thread_responses entry:
-
classification=actionable requires resolved=true
-
classification=non-actionable requires rationale_replied=true
-
comments.addressed_or_rationalized equals total comment count
-
comments.needs_clarification=0
-
comment_statuses has exactly one entry per comment with:
-
status in {action, no_action}
-
cycle equal to the active cycle
-
chronological sort by created_at
-
Run finalize-cycle only when validation passes (re-requests Copilot unless explicitly skipped for recovery).
-
run this once per cycle, after the full thread batch is complete
-
never run it immediately after addressing a single thread
-
never call reviewer add/remove directly during Stage 3; finalize-cycle is the only allowed reviewer request path
-
Return to Stage 2.
Command Templates
Initialize State
python "<path-to-skill>/scripts/run_autopilot_loop.py"
--repo "."
--pr "<PR_NUMBER_OR_URL>"
init
--initial-sleep-seconds 300
--poll-interval-seconds 45
--cycle-max-wait-seconds 2400
Use --force with init only when intentionally resetting prior state. If reusing an existing PR branch, do not run --force unless the current state is stale or corrupted.
Monitor Stage 2 Loop (recommended)
python "<path-to-skill>/scripts/run_autopilot_loop.py"
--repo "."
--pr "<PR_NUMBER_OR_URL>"
run-stage2-loop
--initial-sleep-seconds 300
--poll-interval-seconds 45
--cycle-max-wait-seconds 2400
--stage2-max-wait-seconds 43200
Use this command for normal Stage 2 operation. It performs an immediate bootstrap fetch first, then automatically retries cycle waits when a cycle-level timeout occurs.
Monitor One Cycle (diagnostic/manual)
python "<path-to-skill>/scripts/run_autopilot_loop.py"
--repo "."
--pr "<PR_NUMBER_OR_URL>"
run-cycle
--initial-sleep-seconds 300
--poll-interval-seconds 45
--cycle-max-wait-seconds 2400
Capture Existing Comments Immediately (Stage 3 bootstrap)
python "<path-to-skill>/scripts/run_autopilot_loop.py"
--repo "."
--pr "<PR_NUMBER_OR_URL>"
run-cycle
--initial-sleep-seconds 0
--poll-interval-seconds 45
--cycle-max-wait-seconds 2400
Build Stage 3 Worker Batch Artifacts
python "<path-to-skill>/scripts/build_review_batch.py"
--cycle ".context/gh-autopilot/cycle.json"
--output-dir ".context/gh-autopilot"
Finalize Addressed Cycle
python "<path-to-skill>/scripts/run_autopilot_loop.py"
--repo "."
--pr "<PR_NUMBER_OR_URL>"
finalize-cycle
This command validates cycle.json.addressing coverage first, then:
-
Moves pending_review_id into last_processed_review_id .
-
Increments cycle .
-
Re-requests Copilot via remove/add reviewer sequence.
-
Records per-comment status (action /no_action ) in finalize event payloads.
Use --skip-reviewer-request only for manual recovery paths.
Print Current State
python "<path-to-skill>/scripts/run_autopilot_loop.py"
--repo "."
--pr "<PR_NUMBER_OR_URL>"
status
Assert No Pending Address-Required Cycle
python "<path-to-skill>/scripts/run_autopilot_loop.py"
--repo "."
--pr "<PR_NUMBER_OR_URL>"
assert-drained
Use this as the final gate before reporting completion/timeout handling results. If state is awaiting_address or awaiting_triage , this command fails and the loop must continue through Stage 3.
Simulate FSM Transitions (deterministic)
python "<path-to-skill>/scripts/run_autopilot_loop.py"
simulate-fsm
--start-status initialized
--event begin_cycle_wait
--event cycle_needs_address
--event finalize_with_reviewer_request
Artifacts and Exit Codes
Outputs (default .context/gh-autopilot/ ):
-
cycle.json
-
review-batch.json (generated by Stage 3 worker setup)
-
context.md (updated)
Status meanings:
-
completed_no_comments : terminal success
-
timeout : timeout status
-
from run-cycle : timeout for that single cycle wait
-
from run-stage2-loop : Stage 2 overall timeout (reason=stage2_max_wait_reached )
-
awaiting_address : actionable Copilot comments captured
-
awaiting_triage : review exists but needs manual interpretation
Exit codes:
-
0 : terminal success or already-terminal state
-
3 : run-cycle timeout
-
10 : comments/triage action required
-
11 : assert-drained detected unaddressed pending cycle
-
12 : run-stage2-loop exhausted Stage 2 max-wait budget
Loop Contract
After entering via the user-selected stage, keep routing until terminal. Do not stop after a single cycle unless blocked by auth/state errors.
current_stage = user_selected_stage while true: if current_stage == 1: run Stage 1 current_stage = 2 continue
if current_stage == 2: run persistent Stage 2 supervisor if status in {awaiting_address, awaiting_triage}: current_stage = 3 continue if status == completed_no_comments: if assert-drained != 0: current_stage = 3 continue stop if status == timeout and reason == stage2_max_wait_reached: if assert-drained != 0: current_stage = 3 continue stop # cycle timeout is internal retry; never stop here continue
if current_stage == 3: run Stage 3 worker handoff if cycle.addressing.status != ready_for_finalize: stop and request clarification if cycle.addressing does not cover all review comments: stop and request clarification current_stage = 2 continue
Recovery Scenarios
Handle common failure modes explicitly:
-
Auth failure:
-
Run gh auth status .
-
If unauthenticated, run gh auth login and retry.
-
State/PR mismatch:
-
If state PR differs from intended PR, re-run init with correct --pr .
-
Use --force only when intentionally discarding prior loop state.
-
Closed/merged PR mid-loop:
-
Stop loop.
-
Open or select a new active PR.
-
Re-initialize state for that PR.
-
Existing open PR before start:
-
Skip gh-pr-creation .
-
Initialize directly against that PR.
-
Copilot already reviewing when loop starts (Stage 2):
-
Skip re-request.
-
Run run-stage2-loop and allow repeated cycle wait windows to continue.
-
Continue normal addressing flow when cycle comments arrive.
-
Copilot comments already present when loop starts (Stage 3):
-
Run immediate capture (--initial-sleep-seconds 0 ) only if cycle artifacts are missing/stale.
-
Build review-batch.json in .context/gh-autopilot/ .
-
Address comments directly in Stage 3 of this skill.
-
Finalize only when cycle.json.addressing reports ready and full comment coverage.
-
Resume Stage 2.
-
Agent interruption or handoff:
-
Resume from .context/gh-autopilot/context.md .
-
Continue using context.md as the source of next actions and state snapshot.
Safety Rules
-
Never process a cycle while state is already awaiting_address .
-
Never finalize a cycle without confirming comments were fully addressed.
-
Never finalize a cycle if any review thread lacks a resolve/rationale response.
-
Never finalize when parsed_summary.generated_comments > 0 but captured comment count is 0 .
-
Never manually stop Stage 2 idle waiting before run-stage2-loop exits by configured limits or terminal status.
-
Never treat a single cycle timeout as terminal; it is always a retry path while Stage 2 max-wait budget remains.
-
Never claim terminal completion unless assert-drained exits 0 .
-
Keep one push per cycle.
-
Do not delete .context/gh-autopilot/ artifacts mid-loop.
-
Keep context.md in sync by using engine commands (init , run-cycle , finalize-cycle ) rather than manual edits.
-
Use gh-cli skill as the source of truth whenever selecting or changing gh commands in this skill.
Optional Utility Scripts
The following scripts remain available for ad-hoc diagnostics:
-
scripts/monitor_copilot_review.py
-
scripts/export_copilot_feedback.py
Prefer run_autopilot_loop.py for normal loop operation.