evaluate

Comprehensive quality evaluation for any AI-generated artifact. Produces its report as a visualization.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "evaluate" with this command: npx skills add careerhackeralex/visualize/careerhackeralex-visualize-evaluate

Evaluate


How It Works

Phase 1: SPEC GENERATION
  • Analyze the artifact type
  • Generate tailored evaluation criteria
  • Define scoring dimensions + weights
  • Set quality gates
    ↓
Phase 2: EVALUATION
  • Run automated checks (when possible)
  • Inspect visually/manually
  • Score each dimension with evidence
  • Identify systemic vs local issues
    ↓
Phase 3: REPORT (via /visualize)
  • Generate a beautiful HTML eval report
  • Scores, charts, screenshots, fix list
  • Radar chart of dimensions
  • Before/after tracking

Phase 1: Spec Generation

For any artifact, generate an evaluation spec in four steps:

  1. Identify Artifact Type
  • HTML Visualization → visual design, interactivity, technical, content, shareability

  • Code/Project → correctness, readability, architecture, test coverage, performance

  • Document/Report → clarity, structure, accuracy, completeness, tone

  • Conversation/Agent → helpfulness, accuracy, tone, efficiency, safety

  • Slide Deck → all visualization dims + narrative flow, persuasion, pacing

  • Dashboard → data accuracy, information density, scannability, actionability

  • Custom → derive dimensions from the skill's SKILL.md and stated goals
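The type-to-focus mapping above can be kept as a plain lookup table. Below is a minimal sketch in JavaScript, the language of the browser audit later in this skill. The key names, `FOCUS_AREAS`, and `focusAreasFor` are hypothetical. Returning `null` stands in for the Custom case, where dimensions are derived from the skill's SKILL.md instead.

```javascript
// Hypothetical lookup table: artifact type -> focus areas listed above.
const FOCUS_AREAS = {
  visualization: ["visual design", "interactivity", "technical", "content", "shareability"],
  code: ["correctness", "readability", "architecture", "test coverage", "performance"],
  document: ["clarity", "structure", "accuracy", "completeness", "tone"],
  conversation: ["helpfulness", "accuracy", "tone", "efficiency", "safety"],
  dashboard: ["data accuracy", "information density", "scannability", "actionability"],
};
// Slide decks reuse every visualization area and add three of their own.
FOCUS_AREAS.slides = [...FOCUS_AREAS.visualization, "narrative flow", "persuasion", "pacing"];

// Returns the focus areas for a known type, or null for a custom artifact,
// whose dimensions must instead be derived from its SKILL.md and stated goals.
function focusAreasFor(type) {
  return Object.prototype.hasOwnProperty.call(FOCUS_AREAS, type) ? FOCUS_AREAS[type] : null;
}

console.log(focusAreasFor("slides").length); // 8
console.log(focusAreasFor("my-custom-skill")); // null
```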

  2. Generate Dimensions

For each artifact type, produce 6-10 evaluation dimensions. Each dimension needs:

  • Name — short, clear label

  • Description — what this dimension measures

  • Weight — percentage (all weights sum to 100%)

  • Scoring anchors — what does a 10, 8, 6, 4 look like?

  • Automated checks — any programmatic tests (if applicable)

  • Deductions — specific issues and their point costs
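One way to make these fields concrete is a plain object per dimension, plus a validator for the rules above: 6-10 dimensions, weights summing to 100, and anchors at 10/8/6/4. This is a sketch under assumptions, not part of the skill. The following are all hypothetical:

  • the object shape
  • the anchor text for 8 and 4
  • the deduction costs
  • the rule that deductions are subtracted from a starting score of 10

```javascript
// Illustrative dimension; anchor text for 8 and 4 and the deduction costs are hypothetical.
const typography = {
  name: "Typography",
  description: "How clearly type establishes hierarchy and readability",
  weight: 15, // percent; all weights in one spec must sum to 100
  anchors: {
    10: "Perfect hierarchy, Inter font, fluid sizing",
    8: "Clear hierarchy with minor inconsistencies",
    6: "All same size, no hierarchy",
    4: "Text is hard to read at some viewports",
  },
  automatedChecks: ["bodyFontOK", "hasInterFont"], // keys from the browser audit
  deductions: [
    { issue: "Body text under 14px", cost: 1 },
    { issue: "No heading hierarchy", cost: 2 },
  ],
};

// Checks the structural rules of a spec; returns a list of problems (empty = valid).
function validateSpec(dimensions) {
  const problems = [];
  if (dimensions.length < 6 || dimensions.length > 10) {
    problems.push(`expected 6-10 dimensions, got ${dimensions.length}`);
  }
  const totalWeight = dimensions.reduce((sum, d) => sum + d.weight, 0);
  if (totalWeight !== 100) problems.push(`weights sum to ${totalWeight}, not 100`);
  for (const d of dimensions) {
    for (const level of [10, 8, 6, 4]) {
      if (!d.anchors || !d.anchors[level]) problems.push(`${d.name}: missing anchor for ${level}`);
    }
  }
  return problems;
}

// Subtracts the cost of every observed issue from a starting score, clamped at 0.
function scoreWithDeductions(dimension, observedIssues, start = 10) {
  const cost = dimension.deductions
    .filter((d) => observedIssues.includes(d.issue))
    .reduce((sum, d) => sum + d.cost, 0);
  return Math.max(0, start - cost);
}

console.log(validateSpec([typography]).length); // 2 (too few dimensions, weights != 100)
console.log(scoreWithDeductions(typography, ["Body text under 14px", "No heading hierarchy"])); // 7
```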

  3. Set Quality Gates

Define gates based on the artifact's purpose:

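| Gate | Criteria | Meaning |
| --- | --- | --- |
| 🚀 EXCEPTIONAL | Overall ≥ 9.5, all ≥ 9 | Best-in-class. Share everywhere. |
| ✅ SHIP | Overall ≥ 9.0, all ≥ 8 | Production-ready. |
| ⚠️ ACCEPTABLE | Overall ≥ 8.0, all ≥ 7 | Usable but not impressive. |
| 🔧 NEEDS WORK | Overall ≥ 7.0, all ≥ 5 | Fix before releasing. |
| ❌ FAIL | Overall < 7.0 or any < 5 | Major rework. |

"Overall" is the overall score; "all" and "any" refer to the individual dimension scores. Gates are checked from top to bottom, and the first one whose criteria are met applies.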
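The gate table can be applied mechanically. Below is a minimal sketch. It assumes:

  • scores are on a 0-10 scale
  • gates are checked from top to bottom, and the first match wins
  • NEEDS WORK covers anything that misses ACCEPTABLE without hitting the FAIL thresholds (overall < 7.0 or any dimension < 5)

`qualityGate` is a hypothetical name.

```javascript
// Returns the first gate whose criteria are met, checking from best to worst.
function qualityGate(overall, dimensionScores) {
  if (dimensionScores.length === 0) throw new Error("need at least one dimension score");
  const lowest = Math.min(...dimensionScores);
  if (overall >= 9.5 && lowest >= 9) return "EXCEPTIONAL";
  if (overall >= 9.0 && lowest >= 8) return "SHIP";
  if (overall >= 8.0 && lowest >= 7) return "ACCEPTABLE";
  if (overall < 7.0 || lowest < 5) return "FAIL";
  return "NEEDS WORK"; // below ACCEPTABLE but clear of the FAIL thresholds
}

console.log(qualityGate(9.2, [8.5, 9, 9.5])); // SHIP
console.log(qualityGate(8.6, [9, 6, 9])); // NEEDS WORK
```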

  4. Output Spec Document

Write the spec to eval-spec-[artifact-name].md for reference and reuse.
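Below is a sketch of producing the spec file. It assumes the artifact name is turned into a lowercase, hyphenated slug and the spec is written as a Markdown table. `slugify`, `renderSpec`, and the table layout are hypothetical choices, not a format the skill prescribes.

```javascript
// Hypothetical helper: lowercase, collapse non-alphanumeric runs to "-", trim edge hyphens.
function slugify(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Renders a spec as Markdown and returns the target filename plus the file contents.
function renderSpec(artifactName, dimensions) {
  const lines = [
    `# Eval spec: ${artifactName}`,
    "",
    "| Dimension | Weight | Description |",
    "| --- | --- | --- |",
  ];
  for (const d of dimensions) lines.push(`| ${d.name} | ${d.weight}% | ${d.description} |`);
  return { filename: `eval-spec-${slugify(artifactName)}.md`, content: lines.join("\n") + "\n" };
}

const spec = renderSpec("Q3 Revenue Dashboard", [
  { name: "Data Accuracy", weight: 20, description: "Numbers match the source" },
]);
console.log(spec.filename); // eval-spec-q3-revenue-dashboard.md
```

In Node, the result could then be saved with `fs.writeFileSync(spec.filename, spec.content)`.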

Phase 2: Evaluation

For HTML Visualizations

Open in browser at 3 viewports (1280×720, 768×1024, 375×667).

Automated audit (run in browser console):

(function () {
  const audit = {};
  // Only inline <style> blocks are inspected; external stylesheets are not fetched.
  const style = [...document.querySelectorAll('style')].map(s => s.textContent).join(' ');
  const html = document.documentElement.outerHTML;

  // Structure (outerHTML omits the doctype, so check document.doctype instead)
  audit.hasDoctype = document.doctype !== null && document.doctype.name.toLowerCase() === 'html';
  audit.hasLangAttr = !!document.documentElement.lang;
  audit.hasCharset = !!document.querySelector('meta[charset]');
  audit.hasViewport = !!document.querySelector('meta[name="viewport"]');
  audit.hasTitle = document.title.length > 0;

  // Menu system
  audit.menuExists = !!document.querySelector('.viz-menu');
  audit.menuHasTheme = /cycleTheme|themeLabel/i.test(html);
  audit.menuHasDownload = /htmlToImage|html-to-image/i.test(html);
  audit.menuHasPrint = /window\.print/i.test(html);

  // Theme system
  audit.hasCSSVars = /--bg\s*:/.test(style);
  audit.hasDarkTheme = /(\.theme-dark|:root)[\s\S]*?--bg/.test(style);
  audit.hasLightTheme = /\.theme-light/.test(style);
  audit.themePersistedToStorage = /localStorage.*theme/i.test(html);

  // Typography
  audit.hasInterFont = /fonts\.googleapis.*Inter|font-family.*Inter/i.test(html);
  audit.hasFontFallback = /-apple-system|system-ui/.test(style);
  audit.bodyFontSize = parseFloat(getComputedStyle(document.body).fontSize);
  audit.bodyFontOK = audit.bodyFontSize >= 14;

  // Layout
  audit.usesFlexOrGrid = /display\s*:\s*(flex|grid)/.test(style);
  audit.hasMaxWidth = /max-width/.test(style);
  audit.hasResponsiveBreakpoints = /@media.*max-width|@media.*min-width|sm:|md:|lg:/.test(style);

  // Print & accessibility
  audit.hasPrintStyles = /@media\s*print/.test(style);
  audit.hasPrintColorAdjust = /print-color-adjust/.test(style);
  audit.hasReducedMotion = /prefers-reduced-motion/.test(style);
  audit.hasAriaLabels = /aria-label/.test(html);
  audit.hasSemanticHTML = /<(header|main|nav|section|article|footer)/.test(html);

  // Animations
  audit.hasKeyframes = /@keyframes/.test(style);
  audit.hasTransitions = /transition\s*:/.test(style);

  // Performance (size of the serialized DOM, which can differ from the file on disk)
  audit.fileSizeKB = Math.round(new Blob([html]).size / 1024);
  audit.fileSizeOK = audit.fileSizeKB < 200;
  audit.noExternalImages = document.querySelectorAll('img[src^="http"]').length === 0;
  audit.htmlToImageLoaded = typeof htmlToImage !== 'undefined';

  // Summary: only boolean entries count as pass/fail checks
  const bools = Object.entries(audit).filter(([, v]) => typeof v === 'boolean');
  const passed = bools.filter(([, v]) => v).length;
  audit._passed = passed;
  audit._total = bools.length;
  audit._percent = Math.round(passed / bools.length * 100);
  audit._failures = bools.filter(([, v]) => !v).map(([k]) => k);

  console.table(audit);
  return audit;
})();
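The audit returns `_failures` and `_percent`. Saving the returned object from each loop (for example as JSON) therefore makes before/after tracking simple. Below is a minimal sketch; `diffAudits` and its output fields are hypothetical.

```javascript
// Compares two returned audit objects (e.g., loop 1 vs loop 2) by their failing checks.
function diffAudits(before, after) {
  const was = new Set(before._failures);
  const now = new Set(after._failures);
  return {
    fixed: [...was].filter((k) => !now.has(k)), // failed before, pass now
    regressed: [...now].filter((k) => !was.has(k)), // passed before, fail now
    stillFailing: [...now].filter((k) => was.has(k)),
    percentDelta: after._percent - before._percent,
  };
}

const loop1 = { _failures: ["hasPrintStyles", "hasReducedMotion", "hasLangAttr"], _percent: 88 };
const loop2 = { _failures: ["hasReducedMotion", "hasTitle"], _percent: 94 };
console.log(diffAudits(loop1, loop2).percentDelta); // 6
```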

Visual scoring — 8 dimensions for visualizations:

| Dimension | Weight | What a 10 looks like | What a 6 looks like |
| --- | --- | --- | --- |
| D1 First Impression | 15% | Apple keynote quality | Generic template feel |
| D2 Typography | 15% | Perfect hierarchy, Inter font, fluid sizing | All same size, no hierarchy |
| D3 Color & Contrast | 10% | Harmonious, WCAG AA, both themes beautiful | Clashing, low contrast |
| D4 Layout & Spacing | 15% | Consistent rhythm, responsive, generous space | Cramped, broken at mobile |
| D5 Content Quality | 15% | Clear message in 5 seconds, zero filler | Confusing, placeholder text |
| D6 Interactivity | 10% | Menu + theme + download + print all flawless | Missing features, broken |
| D7 Technical | 10% | Zero errors, semantic, accessible, print-ready | Console errors, broken layout |
| D8 Shareability | 10% | Would tweet this unprompted | Worse than Canva |
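The overall score for a visualization can be computed as the weighted average of D1-D8, using the weights listed above. Below is a minimal sketch. It assumes a weighted mean rounded to one decimal, since the skill does not spell out the rounding. `VIZ_WEIGHTS` and `overallScore` are hypothetical names.

```javascript
// Weights from the visualization dimensions above, in percent (they sum to 100).
const VIZ_WEIGHTS = { D1: 15, D2: 15, D3: 10, D4: 15, D5: 15, D6: 10, D7: 10, D8: 10 };

// Weighted mean of per-dimension scores (0-10), rounded to one decimal place.
function overallScore(scores, weights = VIZ_WEIGHTS) {
  let total = 0;
  let weightSum = 0;
  for (const [dim, w] of Object.entries(weights)) {
    if (typeof scores[dim] !== "number") throw new Error(`missing score for ${dim}`);
    total += scores[dim] * w;
    weightSum += w;
  }
  return Math.round((total / weightSum) * 10) / 10;
}

const scores = { D1: 9, D2: 9, D3: 8, D4: 9, D5: 9, D6: 8, D7: 9, D8: 8 };
console.log(overallScore(scores)); // 8.7
```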

For Code/Projects

Dimensions: Correctness, Readability, Architecture, Error Handling, Performance, Testing, Documentation, Security

For Documents

Dimensions: Clarity, Structure, Accuracy, Completeness, Tone, Formatting, Actionability, Brevity

For Agent Conversations

Dimensions: Helpfulness, Accuracy, Tone, Efficiency, Safety, Context Awareness, Tool Usage, Follow-through

Phase 3: Visual Report (via /visualize)

After scoring, generate the eval report as a beautiful HTML dashboard using the visualize skill:

Report Structure

  • Hero — artifact name, overall score (big number), quality gate badge

  • Radar Chart — all dimensions plotted on a radar/spider chart (Chart.js)

  • Dimension Cards — each dimension as a card with score, bar, key notes

  • Automated Audit — pass/fail checklist with percentages

  • Screenshots — key views embedded (if HTML artifact)

  • Fix List — prioritized fixes as a kanban-style layout (critical / high / medium / low)

  • Systemic Issues — patterns that affect all outputs (flagged for SKILL.md fixes)

  • History — if re-evaluating, show before/after score comparison chart
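For the radar chart, the report page needs a Chart.js configuration object. This sketch assumes Chart.js v3 or later, which configures the radial axis under `options.scales.r`. `radarConfig` and the canvas id `radar` are hypothetical. Passing the previous loop's scores overlays them on the chart, which also serves the History section.

```javascript
// Builds a Chart.js (v3+) radar config; pass `previous` to overlay an earlier loop.
function radarConfig(dimensions, current, previous) {
  const datasets = [{ label: "Current", data: dimensions.map((d) => current[d]) }];
  if (previous) datasets.unshift({ label: "Previous", data: dimensions.map((d) => previous[d]) });
  return {
    type: "radar",
    data: { labels: dimensions, datasets },
    options: { scales: { r: { min: 0, max: 10, ticks: { stepSize: 2 } } } }, // 0-10 score scale
  };
}

// In the report page (with Chart.js loaded):
// new Chart(document.getElementById("radar"), radarConfig(dims, loop2Scores, loop1Scores));
const cfg = radarConfig(["D1", "D2"], { D1: 9, D2: 8 }, { D1: 7, D2: 6 });
console.log(cfg.data.datasets.map((ds) => ds.label).join(",")); // Previous,Current
```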

Report Filename

eval-report-[artifact-name]-[date].html
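Below is a sketch of building the report filename. It assumes the date is written as an ISO `YYYY-MM-DD` string (UTC) and the artifact name is slugified; the skill does not specify either format. `reportFilename` is a hypothetical name.

```javascript
// Builds eval-report-[artifact-name]-[date].html from a name and a Date.
function reportFilename(artifactName, date = new Date()) {
  const slug = artifactName.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  const isoDate = date.toISOString().slice(0, 10); // UTC calendar date, YYYY-MM-DD
  return `eval-report-${slug}-${isoDate}.html`;
}

console.log(reportFilename("Sales Funnel Viz", new Date(Date.UTC(2025, 0, 15))));
// eval-report-sales-funnel-viz-2025-01-15.html
```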

The report itself must score ≥ 9.0 on the visualize eval criteria.

This is the ultimate dogfood test — our evaluation tool produces evaluations using our visualization tool.

The Improvement Loop

Generate artifact (any skill)
  ↓
/evaluate → Spec + Score + Visual Report
  ↓
Review report → identify fixes
  ↓
Fix (systemic → SKILL.md, local → artifact)
  ↓
/evaluate again → compare scores
  ↓
Ship when gate = SHIP or EXCEPTIONAL

Max 3 loops per artifact. If it can't reach SHIP in 3 loops, the problem is in the skill — update the skill's instructions, not the artifact.
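The loop's stopping rule can be sketched as a small decision function. This is an illustration under assumptions: `nextStep`, `fixTarget`, the step names, and the `systemic` flag on a fix are hypothetical. The rules themselves come from the text above:

  • ship at SHIP or EXCEPTIONAL
  • after 3 loops without reaching SHIP, fix the skill
  • route systemic fixes to SKILL.md

```javascript
const MAX_LOOPS = 3;
const SHIPPABLE = new Set(["SHIP", "EXCEPTIONAL"]);

// Decides what follows evaluation loop `loop` (1-based) that produced `gate`.
function nextStep(gate, loop) {
  if (SHIPPABLE.has(gate)) return "ship";
  if (loop >= MAX_LOOPS) return "update-skill"; // the problem is in the skill's instructions
  return "fix-and-reevaluate";
}

// Routes one fix: systemic issues go to SKILL.md, local ones to the artifact.
function fixTarget(fix) {
  return fix.systemic ? "SKILL.md" : "artifact";
}

console.log(nextStep("ACCEPTABLE", 2)); // fix-and-reevaluate
console.log(nextStep("NEEDS WORK", 3)); // update-skill
```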

Quick Start

Evaluate a visualization

/evaluate path/to/visualization.html

Evaluate with an explicit artifact type

/evaluate path/to/code-project --type code

Re-evaluate after fixes (tracks improvement)

/evaluate path/to/visualization.html --loop 2

Generate specs only (no scoring)

/evaluate --specs-only --type dashboard

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

visualize

No summary provided by upstream source.

Repository Source · Needs Review
General

ai-image-generator

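Asynchronous AI image and video generation skill: calls the AI Artist API to generate images or videos from text prompts and polls automatically until the task completes. ⚠️ Before use, you must set the environment variable AI_ARTIST_TOKEN to your own API key! To get an API key: visit https://staging.kocgo.vip/index, register and log in, then create one. Supported image models: SEEDREAM5_0 (default, high-quality images), NANO_BANANA_2 (lightweight and fast). Supported video models: SEEDANCE_1_5_PRO (text-to-video, supports audio), SORA2 (text-to-video or first/last-frame image-to-video, supports firstImageUrl/lastImageUrl). Trigger scenarios: - The user asks for an image, e.g. "generate a wolf", "draw a cat", "a landscape painting", "draw me...". - The user asks for a video, e.g. "generate a video", "generate with SORA2", "text-to-video", "image-to-video", "generate a video of...". - The user names a model: SEEDREAM5_0, NANO_BANANA_2, SEEDANCE_1_5_PRO, SORA2.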

Archived Source · Recently Updated
General

Taobao ad campaign data analysis

# Ad Campaign Data Analysis Skill

Archived Source · Recently Updated
General

productclank-campaigns

Community-powered growth for builders. Boost amplifies your social posts with authentic community engagement (replies, likes, reposts). Discover finds relevant conversations and generates AI-powered replies at scale. Use Boost when the user has a post URL. Use Discover when the user wants to find and engage in conversations about their product.

Archived Source · Recently Updated