deep-qa-ensemble-v1
Benchmark variant of deep-qa that ensembles the severity-judge phase across three heterogeneous providers (Anthropic Sonnet 4.6 + OpenAI GPT-5.4 + Google Gemini 2.5 Pro) instead of using a single Haiku judge. Its purpose is a skill-bench A/B comparison against baseline deep-qa, measuring whether heterogeneous judges calibrate severity better than a single homogeneous judge. Trigger phrases, argument semantics, and the critic phase are identical to deep-qa; ONLY the pass-1 blind and pass-2 informed severity judges differ. The rationalization auditor (Phase 5.6) stays single-Haiku to preserve its independence role.
Repository Source · Needs Review
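The description above does not say how the three judges' severity ratings are combined. One plausible reading is a median vote over an ordinal scale, sketched below. The severity labels, the judge keys, and the `ensemble_severity` helper are all hypothetical and are not taken from deep-qa-ensemble-v1 itself.

```python
from statistics import median_low

# Hypothetical ordinal scale; the skill's real severity labels are not given here.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def ensemble_severity(verdicts: dict[str, str]) -> str:
    """Combine per-judge severity labels by taking the median rank.

    `verdicts` maps a judge name (assumed keys) to one label from SEVERITY_ORDER.
    """
    if not verdicts:
        raise ValueError("at least one judge verdict is required")
    ranks = [SEVERITY_ORDER.index(label) for label in verdicts.values()]
    # median_low always returns one of the observed ranks, so the result
    # stays on the scale even with an even number of judges.
    return SEVERITY_ORDER[median_low(ranks)]


verdicts = {"sonnet": "high", "gpt": "medium", "gemini": "critical"}
print(ensemble_severity(verdicts))  # high
```

With a median, a single outlier judge cannot move the final severity on its own, which is one way heterogeneous judges could improve calibration over a single judge.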
autopilot-temporal
Use when running full-lifecycle autonomous execution from a vague idea to working verified code via a durable Temporal-backed workflow that survives session crashes. Trigger phrases include "autopilot temporal", "sagaflow autopilot", "durable autopilot", "temporal-backed end-to-end build". Idea → battle-tested design → consensus plan → executed code → audited defects → three independent judge verdicts → honest completion report. Iron-law phase gates; no coordinator self-approval. Fire-and-forget for long-running builds.
autopilot
Use when running full-lifecycle autonomous execution from a vague idea to working verified code — idea to battle-tested design to consensus plan to executed code to audited defects to three independent judge verdicts to honest completion report. Trigger phrases include "autopilot", "build me end to end", "full lifecycle", "idea to working code", "auto-run this project", "run this autonomously", "just build it", "go from idea to code", "do everything", "autonomous execution", "end-to-end build", "build this for me", "make it real end to end", "full autonomous build". Iron-law phase gates between every stage; no coordinator self-approval; honest termination labels.
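The "iron-law phase gates" and "no coordinator self-approval" rules shared by the two autopilot skills can be illustrated with a small sketch. This is only a toy model under stated assumptions: the phase names are paraphrased from the descriptions, and `Approval`, `GateError`, and `run_pipeline` are hypothetical names, not part of either skill or of Temporal.

```python
from dataclasses import dataclass
from typing import Callable

# Phase names paraphrased from the skill descriptions (assumed, not official).
PHASES = ["design", "plan", "execute", "defect_audit", "judge_verdicts", "completion_report"]


@dataclass
class Approval:
    approver: str   # who signed off on the phase
    approved: bool  # whether the gate passed


class GateError(RuntimeError):
    """Raised when a gate is approved by the coordinator itself."""


def run_pipeline(coordinator: str, gate: Callable[[str], Approval]) -> list[str]:
    """Run phases in order; return the phases that actually passed their gate."""
    completed: list[str] = []
    for phase in PHASES:
        approval = gate(phase)
        if approval.approver == coordinator:
            raise GateError(f"{phase}: coordinator cannot approve its own work")
        if not approval.approved:
            break  # stop at the failed gate and report only what finished
        completed.append(phase)
    return completed


# An independent judge rejects the "execute" phase, so the run stops after "plan".
result = run_pipeline("coordinator", lambda p: Approval("judge-1", p != "execute"))
print(result)  # ['design', 'plan']
```

Returning only the phases that passed, rather than a blanket success flag, mirrors the "honest completion report" idea: the caller sees exactly where the run stopped.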
deep-qa
Use when reviewing, auditing, QAing, critiquing, or assessing any artifact — a spec, code change, diff, PR, research report, skill, prompt, or document — and you want parallel adversarial critic agents to find defects. Trigger phrases include "review this", "audit this", "QA this", "find issues", "find defects", "critique this", "check this for problems", "what's wrong with this", "evaluate this", "run QA", "review the diff", "review the PR", "review my code", "deep QA", "defect audit", "code review", "assess this". Produces a prioritized defect registry with severity-rated findings via parallel critic agents across artifact-type-aware QA dimensions. Does not fix defects — surfaces them for human triage.
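A "prioritized defect registry with severity-rated findings" could take roughly the following shape. This is a minimal sketch, assuming a three-level severity scale and a flat list of findings merged from parallel critics; the `Defect` fields, the scale, and `prioritize` are hypothetical, not deep-qa's actual schema.

```python
from dataclasses import dataclass

# Hypothetical severity scale; lower rank means higher priority.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Defect:
    dimension: str  # QA dimension the critic was covering (assumed field)
    severity: str   # one of the SEVERITY_RANK keys
    summary: str


def prioritize(findings: list[Defect]) -> list[Defect]:
    """Merge findings from parallel critics into one prioritized registry."""
    # Drop exact duplicates reported by more than one critic, keeping first-seen order.
    unique = list(dict.fromkeys(findings))
    # sorted() is stable, so equal-severity findings keep their original order.
    return sorted(unique, key=lambda d: SEVERITY_RANK[d.severity])


findings = [
    Defect("clarity", "minor", "Ambiguous wording in section 2"),
    Defect("correctness", "critical", "Off-by-one in pagination"),
    Defect("clarity", "minor", "Ambiguous wording in section 2"),
    Defect("security", "major", "Token logged in plain text"),
]
for defect in prioritize(findings):
    print(defect.severity, "-", defect.summary)
```

The registry only orders and deduplicates findings; consistent with the description, fixing them is left to human triage.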