sync-system-bus

Deploy the system-bus-worker to the joelclaw Kubernetes cluster from a local machine. Use this when syncing changes in packages/system-bus to k8s: the GitHub Actions deploy job targets a self-hosted runner that does not exist, so it cannot complete deploys automatically.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "sync-system-bus" with this command: npx skills add joelhooks/joelclaw/joelhooks-joelclaw-sync-system-bus

Sync System Bus Worker

Deploy system-bus-worker to the local joelclaw k8s cluster (Talos v1.12.4 / k8s v1.35.0).

Important: the deploy job in .github/workflows/system-bus-worker-deploy.yml runs on a self-hosted runner. That runner does not exist, so deploys must be completed locally.

Quick Deploy

The publish script handles everything — build, auth, push, k8s apply, rollout, verification:

cd ~/Code/joelhooks/joelclaw
k8s/publish-system-bus-worker.sh

Optional: pass a tag (defaults to timestamp):

k8s/publish-system-bus-worker.sh a6de1e0
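
The tag default described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name resolve_tag is hypothetical and the timestamp format is an assumption.

```bash
# Hypothetical helper: use the first argument as the tag if given,
# otherwise fall back to a timestamp (format is an assumption).
resolve_tag() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    date +%Y%m%d%H%M%S
  fi
}

# Example: resolve_tag a6de1e0 prints a6de1e0
```

Passing a short git SHA as the tag, as in the example above, makes it easier to trace a running image back to a commit.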

GHCR Auth Order

publish-system-bus-worker.sh now authenticates in this order:

  1. GHCR_TOKEN env var (if provided)
  2. secrets lease ghcr_pat (agent-secrets)
  3. gh auth token fallback

If your gh auth token lacks the read:packages and write:packages scopes, the push fails with a 403. Use ghcr_pat instead.
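
The lookup order above could be expressed roughly like this. It is a sketch under stated assumptions, not the script's real implementation: resolve_ghcr_token is a hypothetical name, and it assumes `secrets lease ghcr_pat` prints the token on stdout.

```bash
# Hypothetical helper mirroring the documented order:
# GHCR_TOKEN env var, then agent-secrets lease, then gh auth token.
resolve_ghcr_token() {
  local token

  # 1. An explicit env var wins.
  if [ -n "${GHCR_TOKEN:-}" ]; then
    printf '%s\n' "$GHCR_TOKEN"
    return 0
  fi

  # 2. agent-secrets lease (assumed to print the token on stdout).
  if command -v secrets >/dev/null 2>&1; then
    token="$(secrets lease ghcr_pat 2>/dev/null || true)"
    if [ -n "$token" ]; then
      printf '%s\n' "$token"
      return 0
    fi
  fi

  # 3. gh CLI fallback; may lack package scopes and cause a 403 on push.
  if command -v gh >/dev/null 2>&1; then
    token="$(gh auth token 2>/dev/null || true)"
    if [ -n "$token" ]; then
      printf '%s\n' "$token"
      return 0
    fi
  fi

  echo "no GHCR credential found" >&2
  return 1
}
```

The output can be piped into docker login ghcr.io with --password-stdin, so the token never appears in the process arguments.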

What the Script Does

  1. Builds ARM64 Docker image (required — Talos/Colima node is aarch64)
  2. Authenticates to GHCR with a temp Docker config (GHCR_TOKEN if set, then the agent-secrets lease ghcr_pat, then gh auth token)
  3. Pushes ghcr.io/joelhooks/system-bus-worker:${TAG} and :latest
  4. Updates the image ref in k8s/system-bus-worker.yaml
  5. kubectl apply the manifest
  6. Waits for rollout (--timeout=180s)
  7. Probes the new pod's health endpoint

Post-Deploy Verification

joelclaw refresh                           # Re-register functions with Inngest
joelclaw functions | grep "<new-function>" # Verify new function appears
joelclaw status                            # Full health check
joelclaw runs --count 3                    # Confirm runs are flowing

Restart Safety (ADR-0156)

The worker is stateless between Inngest steps. Each step is a separate HTTP call; Inngest stores step output server-side. This means k8s rolling restarts are safe — Inngest retries the in-flight step against the new pod.

Critical rule: NEVER set retries: 0 on Inngest functions. With retries: 0, a worker restart during step execution kills the run permanently. With retries ≥ 1, Inngest retries and hits the new pod.

story-pipeline currently sets retries: 2 specifically to survive the ~1s restart window during deploys.
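
One way to catch a retries: 0 before deploying is a plain text search over the function sources. The helper below is a hypothetical sketch: it assumes retries is written as a literal in the files under packages/system-bus/src/inngest/functions, and it will miss values set through variables or spread config.

```bash
# Hypothetical pre-deploy check: print every line that sets retries to a literal 0.
# Exit status follows grep: 0 if a match was found, 1 if none.
find_zero_retries() {
  local dir="${1:-packages/system-bus/src/inngest/functions}"
  # Matches "retries: 0" and "retries:0", but not "retries: 2" or "retries: 10".
  grep -rnE 'retries:[[:space:]]*0([^0-9]|$)' "$dir"
}

# Example use before a deploy:
#   if find_zero_retries; then echo "fix retries before deploying" >&2; fi
```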

What happens during deploy

Step executing on old pod → old pod terminates → step fails (SDK unreachable)
→ Inngest retries after backoff → new pod handles retry → step completes

All previously completed steps are memoized. Only the in-flight step reruns.

Long-running steps (codex implement: 5-10 min)

If a deploy kills a codex step mid-execution, the step reruns from scratch on the new pod (5-10 min wasted but not fatal). For time-critical deploys during active loops, check joelclaw loop status first and deploy between stories.

Manual Steps (if script fails)

Build

cd ~/Code/joelhooks/joelclaw
TAG=$(git rev-parse --short HEAD)
IMAGE="ghcr.io/joelhooks/system-bus-worker:${TAG}"
docker build --platform linux/arm64 -t "$IMAGE" -t ghcr.io/joelhooks/system-bus-worker:latest -f packages/system-bus/Dockerfile .
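
Because an amd64 image only fails once the pod starts (exec format error), it may be worth checking the architecture right after the build. A minimal sketch, assuming the image exists in the local Docker image store; assert_arm64_image is a hypothetical name.

```bash
# Hypothetical helper: fail early if a locally built image is not arm64.
assert_arm64_image() {
  local image="$1" arch
  # docker image inspect exposes the image architecture as .Architecture
  arch="$(docker image inspect --format '{{.Architecture}}' "$image")" || return 1
  if [ "$arch" != "arm64" ]; then
    echo "image $image is $arch, expected arm64" >&2
    return 1
  fi
}

# Example use after the build step:
#   assert_arm64_image "$IMAGE" || exit 1
```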

Push

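# Needs a token with write:packages; if gh auth token lacks it, pipe GHCR_TOKEN or the ghcr_pat lease instead
gh auth token | docker login ghcr.io -u "$(gh api user -q .login)" --password-stdin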
docker push "$IMAGE"
docker push ghcr.io/joelhooks/system-bus-worker:latest

Deploy

kubectl -n joelclaw set image deployment/system-bus-worker system-bus-worker="$IMAGE"
kubectl -n joelclaw rollout status deployment/system-bus-worker --timeout=180s

Verify

joelclaw refresh
joelclaw status

Log

slog write --action deploy --tool system-bus-worker --detail "deployed ${IMAGE}" --reason "sync worker changes"

Talon Rebuild (Adding Secrets / Changing Worker Supervision)

Talon is a Rust binary that supervises the worker process. It leases secrets from agent-secrets and injects them as env vars. When adding new webhook secrets or changing supervision behavior:

# 1. Add secret to agent-secrets
secrets add my_new_secret --value "the-secret-value"

# 2. Update Talon source — add mapping to SECRET_MAPPINGS array
#    File: ~/Code/joelhooks/joelclaw/infra/talon/src/worker.rs
#    ("my_new_secret", "MY_NEW_SECRET_ENV_VAR"),

# 3. Recompile (fast — ~3s incremental)
export PATH="$HOME/.cargo/bin:$PATH"
cd ~/Code/joelhooks/joelclaw/infra/talon
cargo build --release

# 4. Install + re-sign (macOS kills unsigned binaries)
cp target/release/talon ~/.local/bin/talon
codesign -fs - ~/.local/bin/talon

# 5. Restart via launchd
launchctl bootout gui/$(id -u)/com.joel.talon
sleep 1
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.joel.talon.plist
sleep 12

# 6. Verify
curl -s http://localhost:3111/ | jq '.status'
curl -X PUT http://localhost:3111/api/inngest  # Force function sync
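
The fixed sleep 12 in step 5 could be replaced by polling the worker until it answers. This is a sketch, not part of the documented procedure: wait_for_worker is a hypothetical name, and the defaults (15 attempts, 1 second apart) are assumptions chosen to roughly cover the same window.

```bash
# Hypothetical helper: poll the worker URL until it responds or attempts run out.
wait_for_worker() {
  local url="${1:-http://localhost:3111/}" attempts="${2:-15}" delay="${3:-1}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s keeps it quiet.
    if curl -fs "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "worker at $url not reachable after $attempts attempts" >&2
  return 1
}

# Example use after launchctl bootstrap:
#   wait_for_worker && curl -X PUT http://localhost:3111/api/inngest
```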

Current SECRET_MAPPINGS (worker.rs)

| Secret Name | Env Var |
| --- | --- |
| claude_oauth_token | CLAUDE_CODE_OAUTH_TOKEN |
| todoist_client_secret | TODOIST_CLIENT_SECRET |
| todoist_api_token | TODOIST_API_TOKEN |
| front_rules_webhook_secret | FRONT_WEBHOOK_SECRET |
| front_api_token | FRONT_API_TOKEN |
| vercel_webhook_secret | VERCEL_WEBHOOK_SECRET |
| joelclaw_webhook_secret | JOELCLAW_WEBHOOK_SECRET |
| revalidation_secret | REVALIDATION_SECRET |

Talon Key Paths

| What | Path |
| --- | --- |
| Binary | ~/.local/bin/talon |
| Source | ~/Code/joelhooks/joelclaw/infra/talon/src/ |
| LaunchAgent plist | ~/Library/LaunchAgents/com.joel.talon.plist |
| Logs | ~/.local/log/talon.log / talon.err |
| ADR | ~/Vault/docs/decisions/0159-talon-worker-manager.md |

Gotcha: codesign -fs - is required

After cargo build, the binary carries only an ad hoc, linker-signed signature, and macOS may kill it with SIGKILL (signal 9) when launchd starts it. Re-signing with codesign -fs - fixes this.

Common Gotchas

| Problem | Cause | Fix |
| --- | --- | --- |
| exec format error in pod | Built for amd64, not arm64 | Rebuild with --platform linux/arm64 |
| GHCR push fails with 403 Forbidden on blob HEAD | gh auth token missing package scopes | Use ghcr_pat via agent-secrets or export GHCR_TOKEN with package scope |
| docker-credential-desktop error | Docker config has credsStore | Script uses temp config dir; if manual, remove "credsStore": "desktop" |
| Function missing after deploy | Not in index file | Add to both index.host.ts AND index.cluster.ts |
| Function still missing | Stale Inngest registration | joelclaw refresh then check again |
| "Unable to reach SDK URL" | Worker pod not ready | Wait for rollout, then joelclaw refresh |
| Runs stuck after deploy | retries: 0 on the function | Set retries: 2 minimum (ADR-0156) |
| Stale app registrations | Multiple apps registered | Delete old registrations in Inngest dashboard (:8289) |
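
For the "Function missing after deploy" case, a quick text check can confirm that a function is referenced in both index files. The helper below is a hypothetical sketch; it only looks for the name as a fixed string, so a match in a comment would also count.

```bash
# Hypothetical helper: verify a function name appears in both index files.
check_function_indexed() {
  local name="$1" dir="${2:-packages/system-bus/src/inngest/functions}" f missing=0
  for f in index.host.ts index.cluster.ts; do
    # -F: fixed string, -q: quiet; "--" protects names that start with a dash.
    if ! grep -qF -- "$name" "$dir/$f"; then
      echo "$name not found in $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use from the repo root (myNewFunction is a placeholder name):
#   check_function_indexed myNewFunction
```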

Key Paths

| What | Path |
| --- | --- |
| Publish script | k8s/publish-system-bus-worker.sh |
| Dockerfile | packages/system-bus/Dockerfile |
| k8s manifest | k8s/system-bus-worker.yaml |
| Host function index | packages/system-bus/src/inngest/functions/index.host.ts |
| Cluster function index | packages/system-bus/src/inngest/functions/index.cluster.ts |
| Worker entry | packages/system-bus/src/serve.ts |
| GH Actions workflow | .github/workflows/system-bus-worker-deploy.yml |
| ADR-0156 | ~/Vault/docs/decisions/0156-graceful-worker-restart.md |
