Meaningful Chunker — Code Intelligence API

A graph-based code analysis system that scans any codebase once, builds a semantic graph of all components and their relationships, then answers natural-language queries against that graph. Returns structured, ranked results with explanation — not raw file dumps.

🖥️ No Code Required — Try It in Your Browser

Open meaningful-chunker-production.up.railway.app, get a free key, paste a GitHub URL, and start asking questions. The web UI shows the same results as the API — rendered as human-readable cards with tier labels, relevance scores, change risk, and blast radius.

The API documentation below is for developers integrating the scanner into agents or pipelines.

⚠️ Required First Step

You MUST scan a codebase before using any query endpoints.

If you skip this, all queries will fail with a no_scan_loaded error.

Always start with POST /scan. See the Scanning a Codebase section below.

🔐 Authentication Required

All endpoints except /health and /status require an API key.

Include this header in every request:

x-api-key: YOUR_API_KEY

Without this header, requests return 401 Unauthorized. Set your key as CHUNKER_API_KEY in your agent's environment.

Setup

Two environment variables are required before using this skill:

CHUNKER_API_URL   — Base URL of the hosted API.
                    Example: https://meaningful-chunker-production.up.railway.app
CHUNKER_API_KEY   — Your API key. Get a free key instantly at:
                    https://meaningful-chunker-production.up.railway.app/register/free
                    (100 queries/month, resets each calendar month, no credit card required)
                    Upgrade to Pro (2,000/month) at /upgrade.

Full OpenAPI spec available at: $CHUNKER_API_URL/docs

When to Use This Skill

Use one of the four query endpoints depending on intent:

Intent	Endpoint	Example query
Understand what something does	`/query/architecture`	"What does the authentication module do?"
Trace a bug or failure	`/query/debug`	"Why is the login function failing?"
Assess safety of a change	`/query/refactor`	"Can I safely change the database connector?"
Understand design principles	`/query/philosophy`	"What principle governs error handling here?"

All four endpoints accept the same request body and return the same response shape.

🚀 30-Second Quick Start

Scan a repo: POST /scan with {"repo_url": "https://github.com/owner/repo"}
Wait until: GET /status → "ready": true
Query: POST /query/architecture with {"query": "What does X do?"}

Scanning a Codebase

Scan a GitHub repo (recommended)

curl -s -X POST $CHUNKER_API_URL/scan \
  -H "Content-Type: application/json" \
  -H "x-api-key: $CHUNKER_API_KEY" \
  -d '{"repo_url": "https://github.com/owner/repo"}'

The system clones the repo, builds the graph, and deletes the local clone automatically. Supports any public GitHub, GitLab, or git URL — including /tree/main branch URLs. Repos larger than 300MB are rejected with a clear error.

Scan a local path (self-hosted instances only)

curl -s -X POST $CHUNKER_API_URL/scan \
  -H "Content-Type: application/json" \
  -H "x-api-key: $CHUNKER_API_KEY" \
  -d '{"project_path": "/path/to/project"}'

Check scan progress

curl $CHUNKER_API_URL/status

When "ready": true the graph is built and all query endpoints are available. Scanning typically takes 5–30 seconds depending on repo size.

How to Query

Request format

curl -s -X POST $CHUNKER_API_URL/query/architecture \
  -H "Content-Type: application/json" \
  -H "x-api-key: $CHUNKER_API_KEY" \
  -d '{"query": "What does the payment processor do?"}'

Replace /query/architecture with the appropriate endpoint for your intent.

Optional: session continuity

Pass a session_id to maintain context across related queries. The system remembers what you asked recently and biases results toward the same area of the codebase.

curl -s -X POST $CHUNKER_API_URL/query/debug \
  -H "Content-Type: application/json" \
  -H "x-api-key: $CHUNKER_API_KEY" \
  -d '{"query": "Why is the checkout flow failing?", "session_id": "my-investigation-001"}'

⚡ Quick Read Guide

For fastest results, read the response in this order:

answer_summary — one sentence telling you what the key component is and why it matters. Start here every time.
primary_path — the execution chain (A → B → C). Shows how things connect structurally.
CENTER tier chunks — the exact matches. These ARE the answer.
EPIPHANY tier chunks — critical cross-system connections. Don't skip these.
next_step — actionable follow-up already computed for you.

Everything else (BREAKTHROUGH, RELEVANT) is supporting context. Read it when you need depth, skip it when you don't.

Understanding the Response

Top-level synthesis fields

answer_summary      — One-sentence synthesis. Start here. Tells you what the
                      key component is, its role, and why it matters.

system_explanation  — What the relevant subsystem does as a whole. Multiple
                      components explained together.

primary_path        — The execution chain: "A → B → C". Shows how components
                      connect structurally. Most useful for debugging.

query_profile       — Which intent was detected (architecture/debug/refactor_risk/
                      philosophy). Confirms the system understood your query.

confidence          — "high" / "medium" / "low". Trust signal before reading context.

next_step           — Actionable follow-up. What to look at next.

Debug-specific fields (present on `/query/debug` responses)

root_cause_analysis.execution_timeline    — Call chain with [FAILING_CHUNK] bracketed
root_cause_analysis.root_cause_hypotheses — Ranked suspects with confidence + check instructions
root_cause_analysis.top_suspect           — Single highest-suspicion chunk with specific check

Refactor-specific fields (present on `/query/refactor` responses)

structural_authority.change_risk   — "untouchable" / "sensitive" / "local"
structural_authority.blast_radius  — How many chunks/files break if this changes
top_risks                          — Top 5 most dangerous chunks in the whole codebase

Context tiers

CENTER       (score=100)  — Exact match. This IS what you asked about.
EPIPHANY     (score≥99)   — Critical cross-component connection. Don't skip these.
BREAKTHROUGH (score≥82)   — Direct structural dependency. Important context.
RELEVANT     (score≥70)   — Meaningful but peripheral. Useful for broader understanding.

Each chunk includes:

name — component identifier
type — class / function / method / module_code / etc.
file — source file path
summary — one-line description
why_matched — how the system found this chunk
neighbors — adjacent components in the graph
explanation (CENTER/EPIPHANY only) — roles, primary reason, importance summary

Example: Architecture Query

curl -s -X POST $CHUNKER_API_URL/query/architecture \
  -H "Content-Type: application/json" \
  -H "x-api-key: $CHUNKER_API_KEY" \
  -d '{"query": "What does the UserAuthenticator do?"}'

Read: answer_summary → primary_path → CENTER explanation.roles → BREAKTHROUGH neighbors

Example: Debug Query

curl -s -X POST $CHUNKER_API_URL/query/debug \
  -H "Content-Type: application/json" \
  -H "x-api-key: $CHUNKER_API_KEY" \
  -d '{"query": "Why is the login flow not working?"}'

Read: root_cause_analysis.top_suspect → execution_timeline → root_cause_hypotheses[0]

Example: Refactor Safety Query

curl -s -X POST $CHUNKER_API_URL/query/refactor \
  -H "Content-Type: application/json" \
  -H "x-api-key: $CHUNKER_API_KEY" \
  -d '{"query": "Can I safely change the database connection handler?"}'

Read: structural_authority.change_risk → blast_radius → advice → top_risks

Relevance Tiers — Scoring Reference

Tier	Score	What it means
CENTER	100	Exact match — this IS the answer
EPIPHANY	≥99	Critical cross-system connection
BREAKTHROUGH	≥82	Direct structural dependency
RELEVANT	≥70	Meaningful peripheral context

Results capped at: CENTER×3, EPIPHANY×5, BREAKTHROUGH×6, RELEVANT×5. Max 19 chunks per response.

Checking API Health

These endpoints are always open — no API key required:

curl $CHUNKER_API_URL/health

Returns {"status": "ok"} when the system is up and a project is indexed.

curl $CHUNKER_API_URL/status

Returns current graph stats: chunk count, edge count, cluster count, shortcut count, last scan time.

⚠️ Common Issues

"no_scan_loaded" → You forgot to run /scan
401 Unauthorized → Missing x-api-key header

Notes

Queries are natural language — no special syntax required. "Why is X broken?" works as well as "explain the architecture of X."
The system is language-agnostic: Python, C++, JSON, Markdown, and plain text files are all indexed.
Session memory persists within an API session (same session_id). Cross-session long-term focus is tracked via insight memory — frequently queried components surface higher automatically.
Repo scans maintain memory during the active service lifecycle. The same repo URL will accumulate focus signals across queries until the service restarts.
Large repos (10k+ files) may take up to 2 minutes to scan. Poll /status to check progress.