workflow-quality-audit

Domain-agnostic quality audit pipeline. Enumerates items, loads standards, captures evidence, evaluates per-item, posts structured verdicts, verifies coverage, and summarizes findings. Works for UI components, code review, SDLC, icon design, animation, or any domain with items to audit against criteria.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "workflow-quality-audit" with this command: npx skills add lev-os/agents/lev-os-agents-workflow-quality-audit

Workflow: Quality Audit

A 7-step pipeline for auditing any collection of items against a set of standards. Domain-agnostic by design — the caller supplies a config block that plugs in the item source, standards document, evidence capture method, evaluation template, and reporting target.

When to Use

  • After completing a body of work (design wave, sprint, migration) that needs QA sign-off
  • Before a release to verify quality across a surface area
  • For structured reviews: code, visual, accessibility, compliance, performance
  • Any time you have N items and need a per-item verdict against defined criteria

Config Schema

Supply a YAML config block when invoking. All fields are optional — the pipeline will prompt for missing required fields.

# --- Required ---
domain: <string>              # Human label (e.g., "kingly-design-system", "api-endpoints")

items:
  source: <enum>              # bd-children | file-glob | manual-list | api
  parent: <string>            # bd parent bead ID (if source=bd-children)
  glob: <string>              # File glob pattern (if source=file-glob)
  list: [<string>, ...]       # Explicit item list (if source=manual-list)
  api: <string>               # API endpoint/command (if source=api)

standards:
  doc: <path>                 # Path to standards/checklist document
  sections: [<string>, ...]   # Specific sections to evaluate (optional, default=all)
  known_gaps: <string>        # Section to skip or flag separately

# --- Optional ---
evidence:
  type: <enum>                # screenshot | test-run | recording | measurement | source-read
  capture: <string>           # Shell command template for capture
  naming: <string>            # Filename template: {id}, {slug}, {date} supported
  storage: <path>             # Output directory for evidence artifacts

evaluation:
  template: <enum>            # component-audit | visual-audit | code-review | generic | custom
  custom_path: <path>         # Path to custom checklist (if template=custom)
  agents: <int>               # Number of parallel evaluator agents (default: 2)
  parallel: <bool>            # Run evaluators in parallel (default: true)
  split_by: <string>          # How to partition work (e.g., "source-vs-visual", "by-epic")

reporting:
  target: <enum>              # bd-comments | markdown-files | stdout
  verdict_taxonomy: [<string>, ...]  # Custom verdicts (default: [PASS, NEEDS_WORK, BLOCKED])
  output_dir: <path>          # Directory for report files (default: .lev/reports/)

summary:
  format: <enum>              # scorecard | heatmap | backlog (default: scorecard)

Pipeline Steps

Step 1: INVENTORY

Enumerate all items to audit.

SourceMethod
bd-childrenbd list --parent {parent} — collects bead IDs + titles
file-globGlob({glob}) — each matching file is an item
manual-listLiteral list from config
apiExecute {api} command, parse JSON/lines output

Output: An ordered list of (id, title/name, metadata) tuples. Persist to .lev/reports/audit-{domain}-{date}/inventory.json.

Step 2: STANDARDS

Load and parse the evaluation criteria.

  1. Read the standards.doc file
  2. If sections is specified, extract only those sections
  3. If known_gaps is specified, flag that section as informational (not scored)
  4. Parse into a checklist of (category, criterion, weight) triples

Output: Structured checklist ready for per-item evaluation.

Step 3: EVIDENCE

Capture per-item evidence using the configured method.

TypeMethod
screenshotExecute {capture} command per item, save to {storage}/{naming}
test-runExecute test suite, capture per-item results
recordingCapture screen/interaction recording per item
measurementRun measurement command, capture metrics
source-readRead source files associated with each item

Evidence is stored at {storage}/{naming} with template variables expanded. If capture fails for an item, mark it EVIDENCE_FAILED and continue.

Output: Evidence manifest mapping each item ID to its artifact paths.

Step 4: EVALUATE

Run the evaluation checklist against each item's evidence.

Team delegation pattern:

  • If agents > 1 and parallel: true, partition items across N evaluator agents
  • Each evaluator receives: item subset, standards checklist, evidence paths
  • Use split_by to control partitioning (round-robin if not specified)

Per-item evaluation produces:

## {item.title}

### {category_1}
- [ ] {criterion}: {PASS|FAIL} — {note}
- [ ] {criterion}: {PASS|FAIL} — {note}

### {category_2}
...

**Verdict:** {verdict} ({pass_count}/{total_count})
**Issues:**
1. {issue description} — severity: {P0-P3}

Built-in templates (see templates/ directory):

  • component-audit — Visual + Code Quality + Accessibility + Standards Compliance
  • visual-audit — Visual Fidelity + Storybook Coverage + Code Structure
  • code-review — Correctness + Style + Performance + Security + Tests
  • generic — User-defined categories (bare scaffold)
  • custom — Load from custom_path

Step 5: REPORT

Post the structured verdict for each item.

TargetMethod
bd-commentsbd comment {item.id} --body "{verdict}" per item
markdown-filesWrite {output_dir}/{item.id}-verdict.md per item
stdoutPrint verdicts to console

Each verdict includes: item ID, title, verdict label, pass/fail counts, issue list.

Step 6: VERIFY

Confirm 100% coverage.

  1. Load inventory from Step 1
  2. Load all verdicts from Step 5
  3. Check: every item has exactly one verdict
  4. Check: no items are EVIDENCE_FAILED without a retry or exception note
  5. Check: verdict taxonomy matches config (no unknown labels)

Output: Coverage report — {covered}/{total} with any gaps listed.

If gaps exist, report them and optionally re-run Steps 3-5 for missing items.

Step 7: SUMMARIZE

Aggregate findings into a top-level summary.

Scorecard format (default):

Domain: {domain}
Date: {date}
Items: {total} | PASS: {n} | NEEDS_WORK: {n} | BLOCKED: {n}
Coverage: {covered}/{total} ({pct}%)
Issues: {total_issues} (P0: {n}, P1: {n}, P2: {n}, P3: {n})
Top systemic patterns:
  1. {pattern} — {count} items affected
  2. {pattern} — {count} items affected

Write full report to {output_dir}/audit-{domain}-{date}.md.

Team Delegation

For audits with >5 items, delegate to parallel evaluators:

RoleResponsibility
Coordinator (you)Steps 1, 2, 6, 7 — inventory, standards, verify, summarize
Evaluator 1..NSteps 3-5 — capture evidence, evaluate, post verdicts

Evaluators receive a self-contained work packet:

  • Their item subset (IDs + titles)
  • The full standards checklist
  • The evidence capture config
  • The report template + target config

Evaluators return a manifest: {items_evaluated, verdicts, issues_found, evidence_paths}.

Domain Examples

UI Components (the original use case)

domain: kingly-design-system
items: { source: bd-children, parent: oclaw-top.2 }
standards: { doc: packages/kingly-ui/DESIGN_STANDARDS.md }
evidence: { type: screenshot, capture: "apptestr screenshot" }
evaluation: { template: component-audit, agents: 2, parallel: true }
reporting: { target: bd-comments, verdict_taxonomy: [PASS, NEEDS_WORK] }

Code Review

domain: api-endpoints
items: { source: file-glob, glob: "src/routes/**/*.ts" }
standards: { doc: docs/API_STANDARDS.md }
evidence: { type: source-read }
evaluation: { template: code-review, agents: 2 }
reporting: { target: markdown-files, output_dir: .lev/reports/api-review/ }

SDLC / Sprint Review

domain: sprint-42
items: { source: bd-children, parent: sprint-42 }
standards: { doc: docs/DEFINITION_OF_DONE.md }
evidence: { type: test-run, capture: "pnpm test --filter {id}" }
evaluation: { template: generic }
reporting: { target: bd-comments, verdict_taxonomy: [SHIP, FIX, HOLD] }

Icon Design

domain: app-icons
items: { source: file-glob, glob: "assets/icons/**/*.svg" }
standards: { doc: docs/BRAND_GUIDE.md, sections: [Icons, Color Palette] }
evidence: { type: measurement, capture: "convert {path} -resize 16x16 /tmp/{slug}-16.png" }
evaluation: { template: visual-audit }
reporting: { target: markdown-files, verdict_taxonomy: [APPROVED, REVISE] }

Validation

The audit is complete when:

  • Inventory persisted with item count logged
  • Standards checklist parsed and loaded
  • Evidence captured for all items (or failures documented)
  • Every item has exactly one verdict
  • Coverage is 100% (or gaps explicitly noted)
  • Summary report written to output directory
  • All P0/P1 issues have corresponding tracking items (if using bd)

Notes

  • For bd-children source, items inherit the parent's epic/project context
  • Evidence capture failures are non-fatal — the item gets EVIDENCE_FAILED and continues
  • The known_gaps standards section is included in reports as informational but not scored
  • Parallel evaluation scales linearly: 2 agents = ~2x throughput for >10 items
  • Custom verdict taxonomies must have at least 2 labels (one positive, one negative)
  • Templates are composable — component-audit includes visual + code + a11y categories

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

security-best-practices

No summary provided by upstream source.

Repository SourceNeeds Review
Security

security-review

No summary provided by upstream source.

Repository SourceNeeds Review
Security

security-threat-model

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

lev-intake

No summary provided by upstream source.

Repository SourceNeeds Review