
12-Factor Agents Compliance Analysis


Reference: 12-Factor Agents

Input Parameters

Parameter       Required?   Description

codebase_path   Required    Root path of the codebase to analyze
docs_path       Optional    Path to documentation directory (for existing analyses)

Analysis Framework

Factor 1: Natural Language to Tool Calls

Principle: Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.

Search Patterns:

Look for Pydantic schemas

grep -r "class.*BaseModel" --include="*.py"
grep -rE "TaskDAG|TaskResponse|ToolCall" --include="*.py"

Look for JSON schema generation

grep -rE "model_json_schema|json_schema" --include="*.py"

Look for structured output generation

grep -rE "output_type|response_model" --include="*.py"

File Patterns: **/agents/*.py , **/schemas/*.py , **/models/*.py

Compliance Criteria:

Level Criteria

Strong All LLM outputs use Pydantic/dataclass schemas with validators

Partial Some outputs typed, but dict returns or unvalidated strings exist

Weak LLM returns raw strings parsed manually or with regex

Anti-patterns:

  • json.loads(llm_response) without schema validation

  • output.split() or regex parsing of LLM responses

  • dict[str, Any] return types from agents

  • No validation between LLM output and handler execution
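
The parse/validate/execute split can be sketched with stdlib dataclasses (a real codebase would likely use Pydantic's BaseModel with validators, as the search patterns above assume; the ToolCall name and its fields are hypothetical):

```python
from dataclasses import dataclass
import json

@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict

    @classmethod
    def from_llm(cls, raw: str) -> "ToolCall":
        """Parse and validate an LLM response instead of trusting json.loads alone."""
        data = json.loads(raw)
        if not isinstance(data.get("tool"), str):
            raise ValueError("'tool' must be a string")
        if not isinstance(data.get("args"), dict):
            raise ValueError("'args' must be an object")
        return cls(tool=data["tool"], args=data["args"])

call = ToolCall.from_llm('{"tool": "search", "args": {"query": "auth"}}')
```

Anything that fails validation raises before a handler ever runs, which is exactly the gap the anti-patterns above describe.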

Factor 2: Own Your Prompts

Principle: Treat prompts as first-class code you control, version, and iterate on.

Search Patterns:

Look for embedded prompts

grep -rE "SYSTEM_PROMPT|system_prompt" --include="*.py"
grep -r '""".*You are' --include="*.py"

Look for template systems

grep -rE "jinja|Jinja|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"

Look for prompt directories

find . -type d -name "prompts"

File Patterns: **/prompts/** , **/templates/** , **/agents/*.py

Compliance Criteria:

Level Criteria

Strong Prompts in separate files, templated (Jinja2), versioned

Partial Prompts as module constants, some parameterization

Weak Prompts hardcoded inline in functions, f-strings only

Anti-patterns:

  • f"You are a {role}..." inline in agent methods

  • Prompts mixed with business logic

  • No way to iterate on prompts without code changes

  • No prompt versioning or A/B testing capability
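
A minimal sketch of the "Strong" level using only the stdlib string.Template (standing in for Jinja2; the prompt name and fields are hypothetical — in practice the template would live in its own prompts/ file):

```python
from string import Template

# Named, versioned prompt constant; swapping V2 -> V3 requires no logic change.
TRIAGE_PROMPT_V2 = Template(
    "You are a $role. Classify the following issue:\n$issue"
)

def render_triage_prompt(role: str, issue: str) -> str:
    # Parameterized render keeps prompt text out of business logic.
    return TRIAGE_PROMPT_V2.substitute(role=role, issue=issue)

prompt = render_triage_prompt("support engineer", "Login fails with 500")
```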

Factor 3: Own Your Context Window

Principle: Control how history, state, and tool results are formatted for the LLM.

Search Patterns:

Look for context/message management

grep -rE "AgentMessage|ChatMessage|messages" --include="*.py"
grep -rE "context_window|context_compiler" --include="*.py"

Look for custom serialization

grep -rE "to_xml|to_context|serialize" --include="*.py"

Look for token management

grep -rE "token_count|max_tokens|truncate" --include="*.py"

File Patterns: **/context/*.py , **/state/*.py , **/core/*.py

Compliance Criteria:

Level Criteria

Strong Custom context format, token optimization, typed events, compaction

Partial Basic message history with some structure

Weak Raw message accumulation, standard OpenAI format only

Anti-patterns:

  • Unbounded message accumulation

  • Large artifacts embedded inline (diffs, files)

  • No agent-specific context filtering

  • Same context for all agent types
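
Owned context management can be sketched as a compaction function that enforces a budget, keeping the newest messages first (word count is a crude stand-in for real token counting here):

```python
def compact(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages that fit within the budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = len(msg["content"].split())  # stand-in for a tokenizer
        if used + cost > budget:
            break                           # oldest messages get dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = [{"role": "user", "content": "a b c"},
           {"role": "assistant", "content": "d e"},
           {"role": "user", "content": "f"}]
window = compact(history, budget=3)         # drops the oldest message
```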

Factor 4: Tools Are Structured Outputs

Principle: Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.

Search Patterns:

Look for tool/response schemas

grep -r "class.*Response.*BaseModel" --include="*.py"
grep -rE "ToolResult|ToolOutput" --include="*.py"

Look for deterministic handlers

grep -rE "def handle_|def execute_" --include="*.py"

Look for validation layer

grep -rE "model_validate|parse_obj" --include="*.py"

File Patterns: **/tools/*.py , **/handlers/*.py , **/agents/*.py

Compliance Criteria:

Level Criteria

Strong All tool outputs schema-validated, handlers type-safe

Partial Most tools typed, some loose dict returns

Weak Tools return arbitrary dicts, no validation layer

Anti-patterns:

  • Tool handlers that directly execute LLM output

  • eval() or exec() on LLM-generated code

  • No separation between decision (LLM) and execution (code)

  • Magic method dispatch based on string matching
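
The decision/execution separation can be sketched as a dispatch table of deterministic handlers behind a validation gate, rather than eval() on LLM output (the handler names are hypothetical):

```python
def list_files(path: str) -> str:
    return f"listing {path}"

def read_file(path: str) -> str:
    return f"contents of {path}"

# The LLM only picks an action name; code decides what that name means.
HANDLERS = {"list_files": list_files, "read_file": read_file}

def execute(action: str, path: str) -> str:
    if action not in HANDLERS:              # validation layer, no eval()/exec()
        raise ValueError(f"unknown action: {action}")
    return HANDLERS[action](path)

result = execute("read_file", "README.md")
```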

Factor 5: Unify Execution State

Principle: Merge execution state (step, retries) with business state (messages, results).

Search Patterns:

Look for state models

grep -rE "ExecutionState|WorkflowState|Thread" --include="*.py"

Look for dual state systems

grep -rE "checkpoint|MemorySaver" --include="*.py"
grep -rE "sqlite|database|repository" --include="*.py"

Look for state reconstruction

grep -rE "load_state|restore|reconstruct" --include="*.py"

File Patterns: **/state/*.py , **/models/*.py , **/database/*.py

Compliance Criteria:

Level Criteria

Strong Single serializable state object with all execution metadata

Partial State exists but split across systems (memory + DB)

Weak Execution state scattered, requires multiple queries to reconstruct

Anti-patterns:

  • Retry count stored separately from task state

  • Error history in logs but not in state

  • LangGraph checkpoints + separate database storage

  • No unified event thread
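
A unified state can be sketched as one serializable object carrying both execution metadata and business state, so a run is reconstructable from a single record (field names are illustrative):

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class WorkflowState:
    # execution state and business state live in one object
    step: int = 0
    retries: int = 0
    error_history: list = field(default_factory=list)
    messages: list = field(default_factory=list)

    def dump(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, raw: str) -> "WorkflowState":
        return cls(**json.loads(raw))

state = WorkflowState(step=3, retries=1, error_history=["timeout"])
restored = WorkflowState.load(state.dump())   # round-trips losslessly
```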

Factor 6: Launch/Pause/Resume

Principle: Agents support simple APIs for launching, pausing at any point, and resuming.

Search Patterns:

Look for REST endpoints

grep -rE "@router.post|@app.post" --include="*.py"
grep -rE "start_workflow|pause|resume" --include="*.py"

Look for interrupt mechanisms

grep -rE "interrupt_before|interrupt_after" --include="*.py"

Look for webhook handlers

grep -rE "webhook|callback" --include="*.py"

File Patterns: **/routes/*.py , **/api/*.py , **/orchestrator/*.py

Compliance Criteria:

Level Criteria

Strong REST API + webhook resume, pause at any point including mid-tool

Partial Launch/pause/resume exists but only at coarse-grained points

Weak CLI-only launch, no pause/resume capability

Anti-patterns:

  • Blocking input() or confirm() calls

  • No way to resume after process restart

  • Approval only at plan level, not per-tool

  • No webhook-based resume from external systems

Factor 7: Contact Humans with Tools

Principle: Human contact is a tool call with question, options, and urgency.

Search Patterns:

Look for human input mechanisms

grep -rE "typer.confirm|input\(|prompt\(" --include="*.py"
grep -rE "request_human_input|human_contact" --include="*.py"

Look for approval patterns

grep -rE "approval|approve|reject" --include="*.py"

Look for structured question formats

grep -rE "question.*options|HumanInputRequest" --include="*.py"

File Patterns: **/agents/*.py , **/tools/*.py , **/orchestrator/*.py

Compliance Criteria:

Level Criteria

Strong request_human_input tool with question/options/urgency/format

Partial Approval gates exist but hardcoded in graph structure

Weak Blocking CLI prompts, no tool-based human contact

Anti-patterns:

  • typer.confirm() in agent code

  • Human contact hardcoded at specific graph nodes

  • No way for agents to ask clarifying questions

  • Single response format (yes/no only)
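
The "Strong" level can be sketched as a structured request object the orchestrator serializes and pauses on, instead of a blocking input() call (field names follow the question/options/urgency shape above; the class itself is hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class HumanInputRequest:
    # A tool call, not a blocking prompt: the orchestrator persists this,
    # pauses the run, and resumes when a webhook delivers the reply.
    question: str
    options: list = field(default_factory=list)
    urgency: str = "normal"
    response_format: str = "choice"   # richer than yes/no only

req = HumanInputRequest(
    question="Deploy to production?",
    options=["deploy", "abort", "deploy to staging first"],
    urgency="high",
)
```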

Factor 8: Own Your Control Flow

Principle: Custom control flow, not framework defaults. Full control over routing, retries, compaction.

Search Patterns:

Look for routing logic

grep -rE "add_conditional_edges|route_|should_continue" --include="*.py"

Look for custom loops

grep -rE "while True|for .* in range" --include="*.py" | grep -v test

Look for execution mode control

grep -rE "execution_mode|agentic|structured" --include="*.py"

File Patterns: **/orchestrator/*.py , **/graph/*.py , **/core/*.py

Compliance Criteria:

Level Criteria

Strong Custom routing functions, conditional edges, execution mode control

Partial Framework control flow with some customization

Weak Default framework loop with no custom routing

Anti-patterns:

  • Single path through graph with no branching

  • No distinction between tool types (all treated same)

  • Framework-default error handling only

  • No rate limiting or resource management
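
Owned control flow reduces to a routing function the orchestrator calls between steps; this sketch uses hypothetical node names:

```python
def route(state: dict) -> str:
    """Decide the next node from state, instead of a framework-default loop."""
    if state.get("consecutive_errors", 0) >= 3:
        return "escalate_to_human"      # error handling is ours, not the framework's
    if state.get("needs_review"):
        return "reviewer"
    if state.get("plan_complete"):
        return "developer"
    return "architect"                   # default entry path

next_node = route({"plan_complete": True})
```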

Factor 9: Compact Errors into Context

Principle: Errors in context enable self-healing. Track consecutive errors, escalate after threshold.

Search Patterns:

Look for error handling

grep -rE "except.*Exception|error_history|consecutive_errors" --include="*.py"

Look for retry logic

grep -rE "retry|backoff|max_attempts" --include="*.py"

Look for escalation

grep -rE "escalate|human_escalation" --include="*.py"

File Patterns: **/agents/*.py , **/orchestrator/*.py , **/core/*.py

Compliance Criteria:

Level Criteria

Strong Errors in context, retry with threshold, automatic escalation

Partial Errors logged and returned, no automatic retry loop

Weak Errors logged only, not fed back to LLM, task fails immediately

Anti-patterns:

  • logger.error() without adding to context

  • No retry mechanism (fail immediately)

  • No consecutive error tracking

  • No escalation to humans after repeated failures
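
The retry-then-escalate loop can be sketched as follows, with each failure appended to context so the next attempt can self-correct:

```python
def run_with_retries(step, context: list, max_attempts: int = 3) -> str:
    consecutive_errors = 0
    while consecutive_errors < max_attempts:
        try:
            return step(context)
        except Exception as exc:
            consecutive_errors += 1
            # compact the error into context: the self-healing signal
            context.append(f"error {consecutive_errors}: {exc}")
    return "escalated_to_human"          # threshold hit, hand off

attempts = []
def flaky(context):
    attempts.append(1)
    if len(attempts) < 2:
        raise RuntimeError("transient failure")
    return "ok"

context = []
outcome = run_with_retries(flaky, context)   # fails once, then succeeds
```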

Factor 10: Small, Focused Agents

Principle: Each agent has narrow responsibility, 3-10 steps max.

Search Patterns:

Look for agent classes

grep -rE "class.*Agent|class.*Architect|class.*Developer" --include="*.py"

Look for step definitions

grep -rE "steps|tasks" --include="*.py" | head -20

Count methods per agent

grep -rE "async def|def " agents/*.py 2>/dev/null | wc -l

File Patterns: **/agents/*.py

Compliance Criteria:

Level Criteria

Strong 3+ specialized agents, each with single responsibility, step limits

Partial Multiple agents but some have broad scope

Weak Single "god" agent that handles everything

Anti-patterns:

  • Single agent with 20+ tools

  • Agent with unbounded step count

  • Mixed responsibilities (planning + execution + review)

  • No step or time limits on agent execution
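
A hard step budget is a one-line guard in the agent loop; this sketch uses a limit of 10 to match the 3-10 step guidance:

```python
def run_agent(steps, max_steps: int = 10):
    """Execute steps up to a hard budget; report instead of looping unbounded."""
    executed = []
    for i, step in enumerate(steps):
        if i >= max_steps:
            return executed, "step_limit_reached"
        executed.append(step())
    return executed, "done"

results, status = run_agent([lambda: "plan", lambda: "edit", lambda: "test"])
```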

Factor 11: Trigger from Anywhere

Principle: Workflows triggerable from CLI, REST, WebSocket, Slack, webhooks, etc.

Search Patterns:

Look for entry points

grep -rE "@cli.command|@router.post|@app.post" --include="*.py"

Look for WebSocket support

grep -r "WebSocket|websocket" --include="*.py"

Look for external integrations

grep -r "slack|discord|webhook" --include="*.py" -i

File Patterns: **/routes/*.py , **/cli/*.py , **/main.py

Compliance Criteria:

Level Criteria

Strong CLI + REST + WebSocket + webhooks + chat integrations

Partial CLI + REST API available

Weak CLI only, no programmatic access

Anti-patterns:

  • Only an if __name__ == "__main__" entry point

  • No REST API for external systems

  • No event streaming for real-time updates

  • Trigger logic tightly coupled to execution

Factor 12: Stateless Reducer

Principle: Agents as pure functions: (state, input) -> (state, output). No side effects in agent logic.

Search Patterns:

Look for state mutation patterns

grep -rE "\.status = |\.field = " --include="*.py"

Look for immutable updates

grep -rE "model_copy|\.copy\(|with_" --include="*.py"

Look for side effects in agents

grep -rE "write_file|subprocess|requests\." agents/*.py 2>/dev/null

File Patterns: **/agents/*.py , **/nodes/*.py

Compliance Criteria:

Level Criteria

Strong Immutable state updates, side effects isolated to tools/handlers

Partial Mostly immutable, some in-place mutations

Weak State mutated in place, side effects mixed with agent logic

Anti-patterns:

  • state.field = new_value (mutation)

  • File writes inside agent methods

  • HTTP calls inside agent decision logic

  • Shared mutable state between agents
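
The reducer shape can be sketched with a frozen dataclass and dataclasses.replace, which returns a new state instead of mutating the input:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    step: int
    log: tuple = ()          # immutable, so sharing it is safe

def reduce_step(state: State, event: str) -> State:
    # Pure function: (state, input) -> new state; the argument is untouched.
    return replace(state, step=state.step + 1, log=state.log + (event,))

s0 = State(step=0)
s1 = reduce_step(s0, "planned")   # s0 is unchanged; side effects live in handlers
```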

Factor 13: Pre-fetch Context

Principle: Fetch likely-needed data upfront rather than mid-workflow.

Search Patterns:

Look for context pre-fetching

grep -rE "pre_fetch|prefetch|fetch_context" --include="*.py"

Look for RAG/embedding systems

grep -rE "embedding|vector|semantic_search" --include="*.py"

Look for related file discovery

grep -rE "related_tests|similar_|find_relevant" --include="*.py"

File Patterns: **/context/*.py , **/retrieval/*.py , **/rag/*.py

Compliance Criteria:

Level Criteria

Strong Automatic pre-fetch of related tests, files, docs before planning

Partial Manual context passing, design doc support

Weak No pre-fetching, LLM must request all context via tools

Anti-patterns:

  • Architect starts with issue only, no codebase context

  • No semantic search for similar past work

  • Related tests/files discovered only during execution

  • No RAG or document retrieval system

Output Format

Executive Summary Table

Factor                               Status               Notes
1. Natural Language -> Tool Calls    Strong/Partial/Weak  [Key finding]
2. Own Your Prompts                  Strong/Partial/Weak  [Key finding]
...                                  ...                  ...
13. Pre-fetch Context                Strong/Partial/Weak  [Key finding]

Overall: X Strong, Y Partial, Z Weak

Per-Factor Analysis

For each factor, provide:

Current Implementation

  • Evidence with file:line references

  • Code snippets showing patterns

Compliance Level

  • Strong/Partial/Weak with justification

Gaps

  • What's missing vs. 12-Factor ideal

Recommendations

  • Actionable improvements with code examples

Analysis Workflow

Initial Scan

  • Run search patterns for all factors

  • Identify key files for each factor

  • Note any existing compliance documentation

Deep Dive (per factor)

  • Read identified files

  • Evaluate against compliance criteria

  • Document evidence with file paths

Gap Analysis

  • Compare current vs. 12-Factor ideal

  • Identify anti-patterns present

  • Prioritize by impact

Recommendations

  • Provide actionable improvements

  • Include before/after code examples

  • Reference roadmap if exists

Summary

  • Compile executive summary table

  • Highlight strengths and critical gaps

  • Suggest priority order for improvements

Quick Reference: Compliance Scoring

Score Meaning Action

Strong Fully implements principle Maintain, minor optimizations

Partial Some implementation, significant gaps Planned improvements

Weak Minimal or no implementation High priority for roadmap

When to Use This Skill

  • Evaluating new LLM-powered systems

  • Reviewing agent architecture decisions

  • Auditing production agentic applications

  • Planning improvements to existing agents

  • Comparing frameworks or implementations
