# Fix CI Skill

Automates CI troubleshooting by fetching GitHub Actions failures, analyzing logs, reproducing issues locally, and creating a fix plan for user approval.

## Execution Workflow
### Step 1: Prerequisites Check

Verify the GitHub CLI is installed and authenticated:

```bash
gh --version && gh auth status
```

If `gh` is not installed:

- Inform user: "GitHub CLI is required. Install with: `brew install gh`"
- Exit gracefully

If not authenticated:

- Inform user: "Please authenticate with: `gh auth login`"
- Exit gracefully
### Step 2: Parse Arguments

Determine the mode based on arguments:

- No arguments (`/fix-ci`): Fetch failures for the current branch only
- With run-id (`/fix-ci <run-id>`): Fetch specific run (bypasses branch scoping)
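The mode selection above can be sketched in a few lines of shell. This is a minimal sketch, assuming the skill receives its argument as `$1` (empty for plain `/fix-ci`) and that run ids are numeric, as GitHub's `databaseId` values are:

```shell
# Decide fetch mode from the first argument.
# Empty -> current-branch mode; numeric -> specific-run mode.
arg="${1:-}"
if [ -z "$arg" ]; then
  MODE="branch"
elif printf '%s' "$arg" | grep -Eq '^[0-9]+$'; then
  MODE="run"
  RUN_ID="$arg"
else
  echo "error: expected a numeric run id, got: $arg" >&2
  exit 1
fi
echo "mode=$MODE"
```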
### Step 3: Fetch Failed Run

Default mode (current branch). Note the branch must be captured on its own line first; writing `BRANCH=$(...) gh run list --branch "$BRANCH"` on one line would expand `$BRANCH` before the assignment takes effect:

```bash
BRANCH=$(git branch --show-current)
gh run list --branch "$BRANCH" --status failure --limit 1 --json databaseId,name,headBranch,workflowName,createdAt
```

Specific run mode:

```bash
gh run view <run-id> --json databaseId,name,headBranch,workflowName,jobs,conclusion
```

If no failures found:

- Report: "No failed runs found for branch `$BRANCH`. CI is green!"
- Optionally show recent successful runs:

  ```bash
  gh run list --branch "$BRANCH" --limit 3 --json databaseId,conclusion,workflowName,createdAt
  ```

- Exit gracefully
### Step 4: Get Failure Details

Once a failed run is identified, gather comprehensive details:

```bash
RUN_ID=<the-run-id>

# Get failed jobs with their steps
gh run view $RUN_ID --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {name, conclusion, steps: [.steps[] | select(.conclusion == "failure")]}'

# Get failed step logs (critical for debugging)
gh run view $RUN_ID --log-failed 2>&1 | head -500

# Get verbose run info
gh run view $RUN_ID --verbose
```

Log handling:

- Truncate logs to 500 lines to avoid context overflow
- Note to user: "Showing first 500 lines of failed logs. Full logs available on GitHub."
### Step 5: Download Artifacts (if available)

Attempt to download any debug artifacts:

```bash
# Try common artifact names - failures are OK (not all runs have artifacts)
gh run download $RUN_ID -n "coverage" -D /tmp/ci-debug/ 2>/dev/null || true
gh run download $RUN_ID -n "test-results" -D /tmp/ci-debug/ 2>/dev/null || true
gh run download $RUN_ID -n "logs" -D /tmp/ci-debug/ 2>/dev/null || true
```

If artifacts downloaded, read them for additional context.
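The per-name attempts above can also be written as a loop that records which artifacts were actually retrieved. In this sketch, `try_download` is a hypothetical stand-in for `gh run download "$RUN_ID" -n "$name" -D /tmp/ci-debug/`, stubbed so the pattern is self-contained and runnable without network access:

```shell
# Stub standing in for the real gh run download call; here it pretends
# only the "test-results" artifact exists on the run.
try_download() {
  case "$1" in
    test-results) return 0 ;;
    *) return 1 ;;
  esac
}

found=""
for name in coverage test-results logs; do
  # Misses are expected: not every run uploads every artifact.
  if try_download "$name" 2>/dev/null; then
    found="$found $name"
  fi
done
echo "Downloaded artifacts:$found"
```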
### Step 6: Analyze Failure Type

Categorize the failure based on log patterns:

| Pattern | Failure Type | Root Cause Area |
|---------|--------------|-----------------|
| `FAIL:`, `--- FAIL`, `FAILED` | Test Failure | Specific test case |
| `ruff check`, `ruff format` | Lint Error | Code style/formatting |
| `ModuleNotFoundError`, `ImportError` | Import Error | Missing dependency |
| `TypeError`, `AttributeError` | Runtime Error | Type mismatch |
| `SyntaxError` | Syntax Error | Invalid code |
| `AssertionError` | Assertion Failure | Test expectation mismatch |
| `TimeoutError`, `timed out` | Timeout | Performance/hang |
| `PermissionError`, `EACCES` | Permission Error | File/resource access |
| `ConnectionError`, `ECONNREFUSED` | Network Error | External service |
Extract key information:

- Failed test name/file (if applicable)
- Error message
- Stack trace location (file:line)
- Environment variables or config issues
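The pattern table above can be sketched as a small classifier. Pattern order is a judgment call (test-runner markers are checked before generic Python exceptions, since a pytest failure line usually contains both):

```shell
# Map a log excerpt to a failure category per the table above.
classify_failure() {
  log="$1"
  case "$log" in
    *"--- FAIL"*|*FAILED*|*"FAIL:"*)    echo "Test Failure" ;;
    *"ruff check"*|*"ruff format"*)     echo "Lint Error" ;;
    *ModuleNotFoundError*|*ImportError*) echo "Import Error" ;;
    *SyntaxError*)                      echo "Syntax Error" ;;
    *AssertionError*)                   echo "Assertion Failure" ;;
    *TimeoutError*|*"timed out"*)       echo "Timeout" ;;
    *PermissionError*|*EACCES*)         echo "Permission Error" ;;
    *ConnectionError*|*ECONNREFUSED*)   echo "Network Error" ;;
    *TypeError*|*AttributeError*)       echo "Runtime Error" ;;
    *)                                  echo "Unknown" ;;
  esac
}
```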
### Step 7: Map to Local Test Commands

Determine the appropriate local command based on the CI job:

| CI Workflow/Job | Local Command |
|-----------------|---------------|
| test-cli | `cd cli && go test ./...` |
| test-python (server) | `cd server && uv run pytest -v` |
| test-python (rag) | `cd rag && uv run pytest -v` |
| test-python (config) | `cd config && uv run pytest -v` |
| test-python (runtime) | `cd runtimes/universal && uv run pytest -v` |
| lint (python) | `uv run ruff check .` |
| lint (go) | `cd cli && golangci-lint run` |
| type-check | `uv run mypy .` |
| build-cli | `nx build cli` |
| build-designer | `cd designer && npm run build` |

For specific test failures, narrow down the command:

- Python: `cd <dir> && uv run pytest -v <test_file>::<test_name>`
- Go: `cd cli && go test -v -run <TestName> ./...`
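The job-to-command mapping above can be expressed as a lookup function. The job names are assumptions about this repo's workflow configuration; unknown jobs fall through to an error so the caller can ask the user:

```shell
# Look up the local repro command for a CI job name (subset of the table above).
local_command_for_job() {
  case "$1" in
    test-cli)               echo "cd cli && go test ./..." ;;
    "test-python (server)") echo "cd server && uv run pytest -v" ;;
    "test-python (rag)")    echo "cd rag && uv run pytest -v" ;;
    "lint (python)")        echo "uv run ruff check ." ;;
    "lint (go)")            echo "cd cli && golangci-lint run" ;;
    type-check)             echo "uv run mypy ." ;;
    build-cli)              echo "nx build cli" ;;
    build-designer)         echo "cd designer && npm run build" ;;
    *)                      echo "unknown job: $1" >&2; return 1 ;;
  esac
}
```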
### Step 8: Reproduce Locally

Run the mapped local command to confirm the failure reproduces:

```bash
# Example for Python test
cd server && uv run pytest -v tests/test_api.py::test_health_check
```

Outcome A - Failure reproduces locally:

- Good! Continue to fix plan
- Report: "Successfully reproduced failure locally"

Outcome B - Failure does NOT reproduce locally:

- Note: "Could not reproduce locally. Possible causes:"
  - Flaky test (timing-dependent)
  - Environment difference (CI has different deps/config)
  - Race condition
- Suggest: "Consider re-running CI with `gh run rerun $RUN_ID`"
- Ask user how to proceed (investigate further or skip)
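One way to distinguish a hard failure from a flaky one is to run the repro command a few times and count failures. A minimal sketch, where `REPRO_CMD` is a hypothetical variable holding the mapped local command (`false` stands in here so the sketch is self-contained):

```shell
REPRO_CMD="false"   # stand-in; in practice this is the mapped test command
runs=3
failures=0
i=1
while [ "$i" -le "$runs" ]; do
  # Run the command in a subshell; any non-zero exit counts as a failure.
  sh -c "$REPRO_CMD" >/dev/null 2>&1 || failures=$((failures + 1))
  i=$((i + 1))
done

if [ "$failures" -eq "$runs" ]; then
  echo "deterministic failure ($failures/$runs runs failed)"
elif [ "$failures" -gt 0 ]; then
  echo "possibly flaky ($failures/$runs runs failed)"
else
  echo "did not reproduce (0/$runs runs failed)"
fi
```

Repeating a slow test suite three times may be impractical; for long-running commands a single run plus the flaky/env-difference checklist above is the better trade-off.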
### Step 9: Analyze Root Cause

Based on the failure type and logs, identify:

- What failed: Specific test, lint rule, or build step
- Why it failed: The actual error condition
- Where to fix: File(s) and line(s) that need changes
- How to fix: Proposed changes

Use available tools to explore:

- Read the failing test file
- Read the code being tested
- Search for related patterns in the codebase
- Check recent changes that might have caused the failure
### Step 10: Enter Plan Mode

Use EnterPlanMode to create a formal fix plan. The plan should include:

```markdown
# CI Fix Plan

## Problem Statement
[Summary of the CI failure from logs]

## Failure Details
- Run ID: <run-id>
- Workflow: <workflow-name>
- Job: <job-name>
- Error Type: <categorized-type>

## Root Cause Analysis
[Explanation of why the failure occurred]

## Affected Files
- path/to/file1.py (line X)
- path/to/file2.py (line Y)

## Proposed Changes

### Change 1: [Brief description]
[Specific edit to make]

### Change 2: [Brief description]
[Specific edit to make]

## Verification Steps
1. Run: `<local-test-command>` - Expected: All tests pass
2. Optional: Run full test suite with `<full-suite-command>`

## Notes
- [Any caveats or considerations]
```
### Step 11: User Approval Gate

Present the plan and wait for explicit user approval:

- User approves: Proceed to execute fixes
- User modifies: Incorporate feedback, update plan
- User rejects: Exit gracefully without changes

**CRITICAL:** Never make code changes without user approval.
### Step 12: Execute Fix (after approval only)

1. Make the proposed code changes using the Edit tool
2. Run local tests to verify the fix: `<local-test-command>`
3. Report results:
   - Success: "Fix verified locally. Tests pass."
   - Failure: "Fix did not resolve the issue. [details]"

**IMPORTANT:** Do NOT auto-commit changes. Leave committing to the user or the /commit-push-pr skill.
## Error Handling

| Scenario | Action |
|----------|--------|
| gh CLI not installed | Direct user to install: `brew install gh` |
| gh not authenticated | Direct user to: `gh auth login` |
| No failures found | Report CI is green, exit gracefully |
| Rate limit exceeded | Suggest waiting or using `gh auth refresh` |
| Run not found | Verify run ID, suggest `gh run list` to find valid IDs |
| Large logs (>500 lines) | Truncate, note full logs on GitHub |
| Local reproduction fails | Note as flaky/env issue, offer re-run option |
| Network errors | Suggest retry, check connection |
## Output Format

On finding a failure:

```
CI Failure Found
Run: #12345 (workflow-name)
Branch: feature-branch
Failed Job: test-python
Error Type: Test Failure

Analyzing logs...
[Summary of failure]

Reproducing locally...
[Result]

Entering plan mode to propose fix...
```

On success (after fix):

```
Fix Applied
- Modified: path/to/file.py
- Verification: Tests pass locally

Next steps:
1. Review the changes
2. Run /commit-push-pr to commit and push
3. CI will re-run automatically on push
```
## Notes for the Agent

- **Always scope to current branch by default** - Users expect /fix-ci to fix their current work, not random failures
- **Truncate logs wisely** - CI logs can be huge; extract the relevant error sections
- **Reproduce before fixing** - Don't propose fixes for issues that can't be reproduced
- **Plan mode is mandatory** - Always use EnterPlanMode before making changes
- **Never auto-commit** - The user controls when changes are committed
- **Be specific in analysis** - Generic advice isn't helpful; identify exact files and lines
- **Handle flaky tests** - If reproduction fails, acknowledge it might be flaky