evals-context

Evals Codebase Context

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "evals-context" with this command: npx skills add roocodeinc/roo-code/roocodeinc-roo-code-evals-context


When to Use This Skill

Use this skill when the task involves:

  • Modifying or debugging the evals execution infrastructure

  • Adding new eval exercises or languages

  • Working with the evals web interface (apps/web-evals)

  • Modifying the public evals display page on roocode.com

  • Understanding where evals code lives in this monorepo

When NOT to Use This Skill

Do NOT use this skill when:

  • Working on unrelated parts of the codebase (extension, webview-ui, etc.)

  • The task is purely about the VS Code extension's core functionality

  • Working on the main website pages that don't involve evals

Key Disambiguation: Two "Evals" Locations

This monorepo has two distinct evals-related locations that can cause confusion:

| Component | Path | Purpose |
| --- | --- | --- |
| Evals Execution System | packages/evals/ | Core eval infrastructure: CLI, DB schema, Docker configs |
| Evals Management UI | apps/web-evals/ | Next.js app for creating/monitoring eval runs (localhost:3446) |
| Website Evals Page | apps/web-roo-code/src/app/evals/ | Public roocode.com page displaying eval results |
| External Exercises Repo | Roo-Code-Evals | Actual coding exercises (NOT in this monorepo) |
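When it is unclear which of these locations a change belongs to, the path prefix decides. The helper below is hypothetical and not part of the repo. It encodes the table's Path column, and the label "Other" is an illustrative name for anything outside those prefixes.

```typescript
// Hypothetical helper: map a repo-relative path to an evals component
// using the prefixes from the disambiguation table.
type EvalsComponent =
  | "Evals Execution System"
  | "Evals Management UI"
  | "Website Evals Page"
  | "Other";

const COMPONENT_PREFIXES: ReadonlyArray<[string, EvalsComponent]> = [
  ["packages/evals/", "Evals Execution System"],
  ["apps/web-evals/", "Evals Management UI"],
  ["apps/web-roo-code/src/app/evals/", "Website Evals Page"],
];

function classifyEvalsPath(path: string): EvalsComponent {
  // Normalize Windows separators and a leading "./" so the prefixes match.
  const normalized = path.replace(/\\/g, "/").replace(/^\.\//, "");
  for (const [prefix, component] of COMPONENT_PREFIXES) {
    if (normalized.startsWith(prefix)) return component;
  }
  // Exercises live in the external Roo-Code-Evals repo, so any other
  // monorepo path is treated as outside the evals code.
  return "Other";
}

console.log(classifyEvalsPath("packages/evals/src/cli/runTask.ts")); // Evals Execution System
console.log(classifyEvalsPath("apps/web-roo-code/src/app/evals/plot.tsx")); // Website Evals Page
console.log(classifyEvalsPath("apps/web-roo-code/src/app/page.tsx")); // Other
```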

Directory Structure Reference

packages/evals/ (Core Evals Package)

packages/evals/
├── ARCHITECTURE.md          # Detailed architecture documentation
├── ADDING-EVALS.md          # Guide for adding new exercises/languages
├── README.md                # Setup and running instructions
├── docker-compose.yml       # Container orchestration
├── Dockerfile.runner        # Runner container definition
├── Dockerfile.web           # Web app container
├── drizzle.config.ts        # Database ORM config
├── src/
│   ├── index.ts             # Package exports
│   ├── cli/                 # CLI commands for running evals
│   │   ├── runEvals.ts      # Orchestrates complete eval runs
│   │   ├── runTask.ts       # Executes individual tasks in containers
│   │   ├── runUnitTest.ts   # Validates task completion via tests
│   │   └── redis.ts         # Redis pub/sub integration
│   ├── db/
│   │   ├── schema.ts        # Database schema (runs, tasks)
│   │   ├── queries/         # Database query functions
│   │   └── migrations/      # SQL migrations
│   └── exercises/
│       └── index.ts         # Exercise loading utilities
└── scripts/
    └── setup.sh             # Local macOS setup script

apps/web-evals/ (Evals Management Web App)

apps/web-evals/
├── src/
│   ├── app/
│   │   ├── page.tsx       # Home page (runs list)
│   │   ├── runs/
│   │   │   ├── new/       # Create new eval run
│   │   │   └── [id]/      # View specific run status
│   │   └── api/runs/      # SSE streaming endpoint
│   ├── actions/           # Server actions
│   │   ├── runs.ts        # Run CRUD operations
│   │   ├── tasks.ts       # Task queries
│   │   ├── exercises.ts   # Exercise listing
│   │   └── heartbeat.ts   # Controller health checks
│   ├── hooks/             # React hooks (SSE, models, etc.)
│   └── lib/               # Utilities and schemas
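The api/runs endpoint and the SSE hooks stream run updates to the browser with Server-Sent Events. The sketch below shows the SSE wire format and a Web-standard `Response` of the kind a Next.js route handler can return. The helper names, the event name, and the payload are assumptions, not the app's actual message format.

```typescript
// Frame one Server-Sent Events message: "field: value" lines, then a blank line.
// JSON.stringify output has no raw newlines, so one data line is enough.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Route-handler-style function returning a streaming Response.
// Response, ReadableStream and TextEncoder are Web APIs, global in Node 18+.
function createSseResponse(messages: Array<{ event: string; data: unknown }>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      for (const { event, data } of messages) {
        controller.enqueue(encoder.encode(formatSseEvent(event, data)));
      }
      controller.close(); // a live endpoint would stay open instead
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}

const response = createSseResponse([{ event: "taskFinished", data: { taskId: 1 } }]);
response.text().then((body) => console.log(JSON.stringify(body)));
```

A real endpoint would keep the stream open and enqueue messages as they arrive (for example, relayed from Redis pub/sub). This version closes after a fixed list so it stays self-contained.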

apps/web-roo-code/src/app/evals/ (Public Website Evals Page)

apps/web-roo-code/src/app/evals/
├── page.tsx   # Fetches and displays public eval results
├── evals.tsx  # Main evals display component
├── plot.tsx   # Visualization component
└── types.ts   # EvalRun type (extends packages/evals types)

This page displays eval results on the public roocode.com website. It imports types from @roo-code/evals but does NOT run evals.
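The public page only reads stored results, so its `EvalRun` type can be thought of as the package's run type plus fields derived for display. This sketch is hypothetical. `Run` and `Task` are simplified local stand-ins, because the real types come from @roo-code/evals and may have different fields. The names `passed`, `score`, and `languageScores` are illustrative.

```typescript
// Simplified stand-ins for types exported by @roo-code/evals (fields assumed).
interface Run {
  id: number;
  model: string;
}

interface Task {
  runId: number;
  language: string;
  passed: boolean | null; // assumed: null while the task has not finished
}

// Display type: a Run plus fields derived only from stored results.
interface EvalRun extends Run {
  score: number; // share of finished tasks that passed, from 0 to 1
  languageScores: Record<string, number>;
}

function passRate(tasks: Task[]): number {
  const finished = tasks.filter((t) => t.passed !== null);
  if (finished.length === 0) return 0;
  return finished.filter((t) => t.passed === true).length / finished.length;
}

function toEvalRun(run: Run, tasks: Task[]): EvalRun {
  const mine = tasks.filter((t) => t.runId === run.id);
  const languageScores: Record<string, number> = {};
  for (const language of Array.from(new Set(mine.map((t) => t.language)))) {
    languageScores[language] = passRate(mine.filter((t) => t.language === language));
  }
  return { ...run, score: passRate(mine), languageScores };
}

const run: Run = { id: 1, model: "example-model" };
const tasks: Task[] = [
  { runId: 1, language: "python", passed: true },
  { runId: 1, language: "python", passed: false },
  { runId: 1, language: "go", passed: true },
  { runId: 1, language: "go", passed: null },
];
console.log(toEvalRun(run, tasks));
```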

Architecture Overview

The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments:

┌─────────────────────────────────────────────────────────────┐
│  Web App (apps/web-evals) ────────────────────────────────  │
│       │                                                     │
│       ▼                                                     │
│  PostgreSQL ◄────► Controller Container                     │
│       │                      │                              │
│       ▼                      ▼                              │
│     Redis ◄───► Runner Containers (1-25 parallel)           │
└─────────────────────────────────────────────────────────────┘

Key components:

  • Controller: Orchestrates eval runs, spawns runners, manages task queue (p-queue)

  • Runner: Isolated Docker container with VS Code + Roo Code extension + language runtimes

  • Redis: Pub/sub for real-time events (NOT task queuing)

  • PostgreSQL: Stores runs, tasks, metrics
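The split between task queuing and event publishing can be illustrated in-process. This is a hypothetical sketch, not the repo's code. A small hand-written queue stands in for p-queue, bounding how many tasks run at once the way the controller bounds parallel runners. Node's `EventEmitter` stands in for Redis pub/sub, which only broadcasts progress events. `runTask` here just sleeps and fakes a result, whereas the real system drives a runner container. The event names are illustrative.

```typescript
import { EventEmitter } from "node:events";

// Events a runner might report (names are illustrative).
type TaskEvent =
  | { type: "taskStarted"; taskId: number }
  | { type: "taskFinished"; taskId: number; passed: boolean };

// Stand-in for Redis pub/sub: events are broadcast to every subscriber.
// Nothing is handed out as work here; scheduling is the queue's job.
const bus = new EventEmitter();
function publish(event: TaskEvent): void {
  bus.emit("task-event", event);
}

// Minimal concurrency-limited queue, standing in for p-queue.
function createQueue(concurrency: number) {
  let active = 0;
  const pending: Array<() => void> = [];
  const pump = (): void => {
    while (active < concurrency && pending.length > 0) {
      const start = pending.shift()!;
      active++;
      start();
    }
  };
  function add<T>(job: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      pending.push(() => {
        Promise.resolve()
          .then(job)
          .then(resolve, reject)
          .finally(() => {
            active--; // free the slot, then start the next waiting job
            pump();
          });
      });
      pump();
    });
  }
  return add;
}

// Placeholder for "run one task in a runner container".
async function runTask(taskId: number): Promise<boolean> {
  publish({ type: "taskStarted", taskId });
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  const passed = taskId % 2 === 0; // fake outcome for the sketch
  publish({ type: "taskFinished", taskId, passed });
  return passed;
}

// Controller role: queue every task, never running more than `concurrency` at once.
function runEvals(taskIds: number[], concurrency: number): Promise<boolean[]> {
  const add = createQueue(concurrency);
  return Promise.all(taskIds.map((id) => add(() => runTask(id))));
}

// A subscriber such as the web app would listen for progress like this.
bus.on("task-event", (event: TaskEvent) => console.log(event.type, event.taskId));
runEvals([1, 2, 3, 4, 5], 2).then((results) => console.log(results));
```

Keeping the queue in the controller and using pub/sub only for notifications means a missed event loses a progress update, not a task.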

Common Tasks Quick Reference

Adding a New Eval Exercise

  • Add the exercise to the Roo-Code-Evals repo (external)

  • See packages/evals/ADDING-EVALS.md for the required structure

Modifying Eval CLI Behavior

Edit files in packages/evals/src/cli/:

  • runEvals.ts: Run orchestration

  • runTask.ts: Task execution

  • runUnitTest.ts: Test validation
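runUnitTest.ts is described as validating task completion via tests. A minimal sketch of that idea, under assumptions: run the exercise's test command in the exercise directory and count exit status 0 as a pass. `runExerciseTests`, its parameters, and the two-minute default timeout are hypothetical, not the file's real API.

```typescript
import { spawnSync } from "node:child_process";

interface TestRunResult {
  passed: boolean;
  exitCode: number | null; // null if the process was killed (e.g. timeout) or failed to spawn
  output: string;
}

// Hypothetical helper: run an exercise's test command and map the exit code to pass/fail.
function runExerciseTests(
  command: string,
  args: string[],
  cwd: string,
  timeoutMs = 120_000,
): TestRunResult {
  const result = spawnSync(command, args, { cwd, encoding: "utf8", timeout: timeoutMs });
  return {
    passed: result.status === 0, // anything but a clean exit counts as a failure
    exitCode: result.status,
    output: `${result.stdout ?? ""}${result.stderr ?? ""}`,
  };
}

// Node stands in for a real test runner so the example runs anywhere.
console.log(runExerciseTests(process.execPath, ["-e", "process.exit(0)"], process.cwd()).passed); // true
console.log(runExerciseTests(process.execPath, ["-e", "process.exit(1)"], process.cwd()).passed); // false
```

In a real run the command would be the exercise's language-specific test runner (for example `go test ./...` or `pytest`). Check runUnitTest.ts for the commands the repo actually uses.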

Modifying the Evals Web Interface

Edit files in apps/web-evals/src/:

  • app/runs/new/new-run.tsx: New run form

  • actions/runs.ts: Run server actions
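The new run form and its server action both need to reject bad input. This is a hypothetical validator under stated assumptions. The field names `model`, `concurrency`, and `exercises` are illustrative rather than the form's real schema. The 1-25 concurrency bound is taken from the "Runner Containers (1-25 parallel)" note in the architecture diagram.

```typescript
// Hypothetical shape of a "create run" submission (field names assumed).
interface NewRunInput {
  model: string;
  concurrency: number;
  exercises: string[];
}

type Validation = { ok: true; value: NewRunInput } | { ok: false; errors: string[] };

function validateNewRun(input: NewRunInput): Validation {
  const errors: string[] = [];
  if (input.model.trim() === "") errors.push("model is required");
  // Bound taken from the architecture diagram: 1-25 parallel runners.
  if (!Number.isInteger(input.concurrency) || input.concurrency < 1 || input.concurrency > 25) {
    errors.push("concurrency must be an integer from 1 to 25");
  }
  if (input.exercises.length === 0) errors.push("select at least one exercise");
  return errors.length === 0
    ? { ok: true, value: { ...input, model: input.model.trim() } }
    : { ok: false, errors };
}

console.log(validateNewRun({ model: "", concurrency: 30, exercises: [] }));
```

Running the same check inside the server action, not only in the form, prevents direct calls to the action from bypassing it.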

Modifying the Public Evals Display Page

Edit files in apps/web-roo-code/src/app/evals/:

  • evals.tsx: Display component

  • plot.tsx: Charts

Database Schema Changes

  • Edit packages/evals/src/db/schema.ts

  • Generate migration: cd packages/evals && pnpm drizzle-kit generate

  • Apply migration: pnpm drizzle-kit migrate

Running Evals Locally

From the repo root, run:

pnpm evals

This opens the web UI at http://localhost:3446.

Ports (defaults):

  • PostgreSQL: 5433

  • Redis: 6380

  • Web: 3446
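Before debugging a run, it can help to confirm that those services are listening. The helper below is hypothetical and not part of the repo, and it uses only Node's built-in `net` module. `isPortOpen` and `checkEvalsStack` are illustrative names. A successful connection only shows that something is listening on the port, not that it is the expected service.

```typescript
import { createConnection } from "node:net";

// Default ports from the Ports list; adjust if your setup overrides them.
const EVALS_PORTS = { postgres: 5433, redis: 6380, web: 3446 } as const;
type Service = keyof typeof EVALS_PORTS;

// Resolves true if a TCP connection to host:port succeeds within timeoutMs.
function isPortOpen(port: number, host = "127.0.0.1", timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = createConnection({ port, host });
    const finish = (open: boolean): void => {
      socket.destroy();
      resolve(open);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}

// Probe all three services in parallel and report one boolean per service.
async function checkEvalsStack(host = "127.0.0.1"): Promise<Record<Service, boolean>> {
  const entries = await Promise.all(
    (Object.keys(EVALS_PORTS) as Service[]).map(
      async (service) => [service, await isPortOpen(EVALS_PORTS[service], host)] as const,
    ),
  );
  return Object.fromEntries(entries) as Record<Service, boolean>;
}

checkEvalsStack().then((status) => {
  for (const [service, up] of Object.entries(status)) {
    console.log(`${service}: ${up ? "listening" : "not reachable"}`);
  }
});
```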

Testing

Run the packages/evals tests:

cd packages/evals && npx vitest run

Run the apps/web-evals tests:

cd apps/web-evals && npx vitest run

Key Types/Exports from @roo-code/evals

The package exports are defined in packages/evals/src/index.ts:

  • Database queries: getRuns, getTasks, getTaskMetrics, etc.

  • Schema types: Run, Task, TaskMetrics

  • Used by both apps/web-evals and apps/web-roo-code

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

evals-context | V50.AI