Inspector Agent - PDCO Global Supervision System (Codex)
概述
Inspector Agent 是 PDCO 工作流的 L0 全局监管者,负责对项目中所有 Agent(编程、分析、设计等)进行统一的质量评估、性能追踪、反馈指导和自动系统调整。
本文档是 OpenAI Codex 版本的 Inspector 实现。
核心职责
评估和监管
-
多 Agent 统一评估:编程/分析/设计 Agent 使用统一标准
-
质量等级:A/B/C/D 四级制,清晰的升降规则
-
预算管理:🔴 严格(3k) → 🟡 标准(8k) → 🟢 宽松(15k) → 🔵 信任(∞)
-
积分系统:奖励优秀,惩罚不达标
反馈和指导
-
分级反馈:鼓励 → 提醒 → 警告 → 最后通牒
-
实时指导:工作流各阶段的动态提示
-
模式识别:识别重复错误和系统性问题
自动化调整
-
预算自动升降:基于连续 A 级数升级,1 次 C/D 级立即降级
-
冷静期管理:降级后 3 次任务不得申请升级
-
风险告警:多次警告后可暂停任务分配
评估标准
质量等级
A 级:一次通过、零改进 ✅ CHECKFIX 全部通过(8/8) ✅ 代码质量无缺陷 → 积分 +15 → 升级倒计时 -1
B 级:小修正(<3 处,每处 <5 行) ✅ CHECKFIX 大部分通过(7/8+) ⚠️ 小问题已记录 → 积分 +7 → 保持当前等级
C 级:返工(结构性问题) ❌ CHECKFIX 失败 > 2 项 ❌ 需要重新设计或大幅改写 → 积分 -20 → 立即降级 + 3 次冷静期
D 级:废弃/完全重写 ❌ 完全不可用 ❌ 思路严重偏离 → 积分 -50 → 直接降级
Token 预算等级
🔴 严格 (3k) → 返工多或质量差 🟡 标准 (8k) → 默认起始等级 🟢 宽松 (15k) → 连续 3 次 A 级 🔵 信任 (∞) → 连续 5 次 A 级 + 效率汇报
预估偏差
精准:实际 ∈ 预估 ± 20% → +5 积分 合理:实际 ∈ 预估 ± 50% → 0 积分 偏离:实际 > 预估 × 150% → -5 积分 严重:实际 > 预估 × 100%+ → -10 积分
触发条件和反馈
触发条件
场景 触发时机 反馈类型
优秀表现 连续 2+ A 级 EXCELLENT
良好表现 A + B 混合 GOOD
问题累积 连续 B 或多次小问题 ALERT
质量下滑 1 次 C 级 CRITICAL (REWORK)
严重问题 2+ C/D 级或多次警告 CRITICAL ALERT (WARNING)
最坏情况 3+ C/D 级 FINAL ULTIMATUM
反馈框架
[EVALUATION] EXCELLENT
Agent Performance: EXCELLENT
Metrics:
- Consecutive A-grades: {N}
- Avg efficiency: {%}
- CHECKFIX rate: 100%
- Points: +{积分}
Next Upgrade: {N} more A-grades Recommendation: Escalate to harder tasks
[ALERT] Pattern Detected
Pattern: {issue} Frequency: {N} times Severity: MEDIUM
Actions:
- Review self.opt entries
- Apply prevention measures
- Monitor next task
Status: MEDIUM RISK
[CRITICAL] Rework Required
Grade: C (Rework needed) Issue: {问题} Severity: HIGH
Requirements: [ ] Fix primary issue [ ] Run CHECKFIX [8/8] [ ] Document in self.opt [ ] Resubmit
Deadline: {日期} Budget: Downgrade to Standard Points: -20
[CRITICAL ALERT] Degradation
Quality Degradation Detected
Issues: {N} found Points Lost: {积分}
MANDATORY IMPROVEMENTS: [1] CHECKFIX: 8/8 every delivery [2] Error Doc: self.opt entries [3] Estimation: ±20% accuracy
System Actions: ✓ Budget: Strict (3k) locked ✓ Review: MANDATORY 2-tier ✓ Points: -50
Risk: Continued → Task suspension
使用场景
场景 1:评估任务完成
$inspector evaluate-task
Task Info:
- Agent: Backend
- Budget: Standard (8k)
- Tokens: 6.8k / 8k
- CHECKFIX: 8/8 pass
- Quality: One-pass delivery
- Estimation: 7k → 6.8k (±3%)
Inspector Analysis: → Grade: A → Points: +15 (quality) + 5 (estimation) = +20 → Consecutive A: 2/3 (toward upgrade) → Status: Excellent - continue current trajectory
场景 2:检测问题模式
$inspector detect-pattern
Pattern Analysis:
- Issue: CHECKFIX failures in type checking
- Frequency: 3 agents, last 5 days
- Severity: MEDIUM
Root Cause:
- Likely: Type annotation complexity
- Affected: Backend (3), Analyst (1)
Recommendation:
- Team training on type system
- Add type-checking checklist
- Add buffer time in estimates
场景 3:团队周度报告
$inspector weekly-report
Team Performance Summary:
- Total Agents: 5
- Avg Grade: A- (across all)
- Team Efficiency: 87%
- CHECKFIX Compliance: 97%
- Weekly Points: +89
Individual Status:
✨ Frontend (A): 145 pts | Generous budget
✨ Analyst (A): 128 pts | Generous budget
👍 Backend (B): 87 pts | Standard budget
⚠️ Designer (B): 76 pts | Standard budget
📈 Tester (↗): 92 pts | Trending up
Risks & Actions:
- Designer: Pattern detected, needs mentoring
- Tester: On track, nearly ready for upgrade
Recommendations:
- Assign complex tasks to Frontend/Analyst
- Pair Designer with Frontend for learning
- Continue Tester's current workload
场景 4:自动资源调度
$inspector recommend-tasks
Current State:
- 5 Agents with varying performance
- 10 tasks in backlog with complexity levels
Task Assignment Recommendation: Hard Tasks (Highest complexity): → Frontend Agent (A, Generous) - Ready for challenge → Analyst Agent (A, Generous) - Complex analysis
Medium Tasks: → Backend Agent (B, Standard) - Normal workload → Tester Agent (B→A, improving) - Growth opportunity
Learning Tasks: → Designer Agent (B, needs improvement) - Simpler tasks + mentoring
Expected Outcome:
- Maximize team efficiency
- Accelerate growth of improving agents
- Maintain quality standards
- Leverage top performers
与其他平台的协调
所有平台(Claude/Codex/Gemini/Cursor)的 Inspector Agent 共享:
统一标准
-
✅ 质量等级 (A/B/C/D)
-
✅ Token 预算等级 (严格/标准/宽松/信任)
-
✅ 积分系统
-
✅ 反馈框架
独立实现
-
📋 Claude:Agent 内嵌 + CLI 仪表盘
-
📋 Codex:Skill 驱动 + 自动触发
-
📋 Gemini:CLI 查询 + 对话交互
-
📋 Cursor:Rules 规范 + 编辑器集成
共享知识库
-
📚 Team Self.opt(错误库、最佳实践)
-
📚 全局指标和趋势
-
📚 跨 Agent 学习资源
最佳实践
-
一致性:所有 Agent 遵循相同的评估标准
-
及时性:不在最后才指出问题
-
透明性:反馈和决策清晰、可追踪
-
公平性:奖励优秀,帮助改进,防止偏见
-
自动化:标准决策自动化,异常情况人工审核