karpathy-wiki — OpenClaw Implementation v3.0
Based on Andrej Karpathy's LLM Wiki pattern.
Wiki Root
wiki_root: /path/to/your/wiki # configure to your local path
<wiki_root>/
├── raw/
│ ├── sources/ # raw bookmarks/docs (immutable)
│ └── assets/ # images and resources
├── wiki/
│ ├── entities/ # entity pages (people, products, companies, sites, books)
│ ├── concepts/ # concept pages (tech, theory, methodology)
│ ├── comparisons/ # comparison pages
│ ├── synthesis/ # synthesis/overview pages
│ ├── index.md # wiki index (entry point)
│ ├── log.md # operation log (append-only)
│ └── overview.md # global overview
├── purpose.md # wiki goal definition (wiki constitution)
└── schema.md # structure conventions
Core Principles (v3.0)
- sources/ is read-only — LLM only writes wiki/, never modifies raw sources
- wikilink cross-references — [[page-slug]] syntax for page connections
- YAML frontmatter — every page has type/tags/related/sources
- Bidirectional links enforced — every write to related must sync back-link
- Two-phase Ingest — Analysis → Generation
- URL-level traceability — sources contain specific URLs, not just filenames
- Lint-driven — periodic health checks, graph stays clean
- Deep Research — knowledge gaps auto-discovered and filled
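As a minimal sketch of the wikilink principle, the snippet below pulls the linked slugs out of a page body. The function name `extract_wikilinks` is hypothetical, and the pattern assumes slugs follow the lowercase, hyphen-separated convention described under Naming Convention.

```python
import re

# Assumption: slugs are lowercase ASCII letters, digits, and hyphens.
WIKILINK_RE = re.compile(r"\[\[([a-z0-9-]+)\]\]")


def extract_wikilinks(body: str) -> list[str]:
    """Return the distinct slugs linked from a page body, in first-seen order."""
    return list(dict.fromkeys(WIKILINK_RE.findall(body)))
```

A list of outbound slugs like this is the raw material for the back-link and orphan checks described later.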
Page Type Taxonomy (entity vs concept boundary)
| Type | Definition | Examples |
|---|---|---|
| entity | Named, discrete things | people/products/companies/sites/books/tools |
| concept | Abstract ideas/theories/methodologies | indexing principles, microservices, DI |
| comparison | Multi-option comparisons | Vue vs React, MySQL vs PostgreSQL |
| synthesis | Comprehensive overview | tech stack panorama, annual summary |
Boundary Rules:
- If it has a specific name → entity ("pdai.tech", "Effective Java")
- If it's abstract/generic → concept ("MySQL indexing", "dependency injection")
- Avoid having both entity and concept for the same topic
Naming Convention
entity:
blogs/sites: use domain or person name
→ mysql-zhu-shuangyin
→ pdai-tech
→ jon-index-blog
books: use simplified book title
→ effective-java
concept:
use core terms in kebab-case
→ mysql-innodb
→ jwt-json-web-token
→ dependency-injection
comparison:
→ mysql-postgresql-comparison
→ vue-vs-react
synthesis:
→ go-web-dev-overview
→ 2026-learning-roadmap-summary
Rules:
- All lowercase, hyphen-separated
- No mixed Chinese/English
- Unique slugs, no duplicates
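The three rules above can be checked mechanically. The following is a rough sketch, not part of the skill itself: `is_valid_slug` and `find_duplicate_slugs` are hypothetical helper names, and the regular expression is one possible reading of "all lowercase, hyphen-separated".

```python
import re

# Assumption: a slug is one or more lowercase ASCII alphanumeric groups joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """True if the slug is lowercase ASCII kebab-case (rejects uppercase and non-ASCII text)."""
    return SLUG_RE.fullmatch(slug) is not None


def find_duplicate_slugs(slugs: list[str]) -> set[str]:
    """Return every slug that appears more than once."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for slug in slugs:
        if slug in seen:
            dupes.add(slug)
        seen.add(slug)
    return dupes
```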
YAML Frontmatter (Required Fields)
---
type: entity | concept | comparison | synthesis
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [tag1, tag2]
related: [page-slug-1, page-slug-2] # forward reference (back-link auto-added)
sources:
- file: bookmarks_xxx.md
urls:
- https://example.com/article1
- https://example.com/article2
---
sources.urls is mandatory — URL-level traceability is a core principle.
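A small validator can catch missing fields before a page is written. This sketch assumes the frontmatter has already been parsed into a plain dict by a YAML library of your choice; `frontmatter_errors` is a hypothetical name, and the error strings are illustrative.

```python
REQUIRED_FIELDS = ("type", "title", "created", "updated", "tags", "related", "sources")
PAGE_TYPES = {"entity", "concept", "comparison", "synthesis"}


def frontmatter_errors(meta: dict) -> list[str]:
    """Return a list of problems found in parsed frontmatter (empty list means valid)."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in meta]
    if "type" in meta and meta["type"] not in PAGE_TYPES:
        errors.append(f"unknown type: {meta['type']}")
    if "sources" in meta and not meta["sources"]:
        errors.append("sources is empty")
    for i, source in enumerate(meta.get("sources") or []):
        if not source.get("file"):
            errors.append(f"sources[{i}] has no file")
        if not source.get("urls"):
            # URL-level traceability: a file name alone is not enough.
            errors.append(f"sources[{i}] has no urls")
    return errors
```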
Quality Thresholds
Every concept/comparison page must have:
| Requirement | Description |
|---|---|
| One-line definition | frontmatter title or the > blockquote line under the page heading |
| Core principles ≥ 3 | body contains at least 3 substantial points |
| Related pages ≥ 1 | related field is non-empty |
| Source URLs | sources.urls is non-empty |
| Back-links added | every page listed in related links back to this page |
Every entity page must have:
| Requirement | Description |
|---|---|
| One-line description | frontmatter title |
| Key features ≥ 2 | body has substantive descriptions |
| Related pages ≥ 1 | related field non-empty |
| Source URLs | sources.urls is non-empty |
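The per-page thresholds can be approximated in code. The sketch below is an assumption-laden illustration: it takes parsed frontmatter as a dict, counts bullets under the "Core Principles" and "Key Features" headings used in the page templates, and leaves the back-link requirement out because that needs the whole graph. The names `count_bullets` and `quality_failures` are hypothetical.

```python
def count_bullets(body: str, heading: str) -> int:
    """Count '- ' bullet lines that sit under the given '## ' heading."""
    count, in_section = 0, False
    for line in body.splitlines():
        if line.startswith("## "):
            in_section = line[3:].strip() == heading
        elif in_section and line.lstrip().startswith("- "):
            count += 1
    return count


def quality_failures(meta: dict, body: str) -> list[str]:
    """Return the thresholds a single page fails (empty list means it passes)."""
    failures = []
    if not meta.get("related"):
        failures.append("empty related")
    sources = meta.get("sources") or []
    if not sources or not all(s.get("urls") for s in sources):
        failures.append("missing urls source")
    if meta.get("type") in ("concept", "comparison"):
        if count_bullets(body, "Core Principles") < 3:
            failures.append("principles<3")
    elif meta.get("type") == "entity":
        if count_bullets(body, "Key Features") < 2:
            failures.append("features<2")
    return failures
```

Counting bullets is only a proxy for "substantial points"; judging substance is still left to the LLM or a human reviewer.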
Page Templates
Entity Page
---
type: entity
title: Entity Name
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [tags]
related: [page-slug-1, page-slug-2]
sources:
- file: bookmarks_xxx.md
urls:
- https://example.com
---
# Entity Name
> One-line description (used in index.md summary).
## Overview
Main content and background.
## Key Features
- Feature 1
- Feature 2
## Related
- [[page-slug]] — reason (back-link auto-added)
## Sources
- [Article Title](https://example.com) — source description
Concept Page
---
type: concept
title: Concept Name
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [tags]
related: [page-slug-1, page-slug-2]
sources:
- file: bookmarks_xxx.md
urls:
- https://example.com/article1
---
# Concept Name
> One-line definition.
## Core Principles
- Principle 1
- Principle 2
- Principle 3
## Use Cases
- Use case 1
## Related
- [[page-slug]] — reason
## Counter-arguments / Data Gaps
- Known limitations
- Uncovered aspects
## Sources
- [Article Title](https://example.com) — source description
Operations
Ingest (Collection & Digestion)
Phase 1 — Analysis
## Key Entities
Identified entities
## Key Concepts
Identified core concepts
## Main Arguments & Findings
Key arguments and findings
## Connections to Existing Wiki
Relations to existing wiki pages
## Contradictions & Tensions
Conflicts with existing knowledge
## Coverage Gaps
What was mentioned but not covered deeply?
What related topics are missing?
## Recommendations
Which pages to create or update
Phase 2 — Generation
- Create/update target pages (with urls in sources)
- Sync related + back-link (bidirectional link enforcement)
- Verify pages meet quality thresholds
- Update index.md
- Append to log.md
Output format:
---FILE: wiki/concepts/page.md---
[page content with sources.urls]
---END FILE---
---FILE: wiki/entities/backlink-target.md---
[update target page, append back-link]
---END FILE---
---FILE: wiki/index.md---
[append new page entry]
---END FILE---
---FILE: wiki/log.md---
[append ingest log entry]
---END FILE---
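One way to consume this output format is to split it into a path-to-content mapping before touching the disk. The sketch below is illustrative only: `parse_file_blocks` is a hypothetical helper, and the refusal to accept paths outside wiki/ is an assumption derived from the read-only rule for raw sources.

```python
import re

# Header line, then any content (frontmatter '---' lines included), then the end marker.
FILE_BLOCK_RE = re.compile(
    r"^---FILE: (?P<path>[^\n]+?)---\n(?P<content>.*?)^---END FILE---$",
    re.DOTALL | re.MULTILINE,
)


def parse_file_blocks(output: str) -> dict[str, str]:
    """Map each target path to its content, rejecting paths outside wiki/."""
    blocks: dict[str, str] = {}
    for match in FILE_BLOCK_RE.finditer(output):
        path = match["path"].strip()
        if not path.startswith("wiki/") or ".." in path.split("/"):
            raise ValueError(f"refusing to write outside wiki/: {path}")
        blocks[path] = match["content"]
    return blocks
```

Parsing first and writing second means a malformed or out-of-scope block can abort the whole ingest before any file changes.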
Query
- Read wiki/index.md to locate relevant pages
- Read related pages + extract sources.urls
- Use web_fetch to trace and verify original URLs
- Synthesize answer, annotate source confidence
Relink (Automatic Relationship Discovery)
Trigger: batch ingest complete / periodic heartbeat
Process:
1. Scan all wiki/*.md tags and body text
2. Extract core topics from each page
3. Find page pairs sharing tags/topics
4. Analyze relationship strength pairwise
5. Generate recommended link list (candidate)
6. User confirms before writing (back-link sync)
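Steps 3 to 5 of this process can be sketched as a pairwise comparison over tags. The code below is a simplification under stated assumptions: pages are given as a dict of slug to tags and related lists, relationship strength is approximated by the number of shared tags, and `relink_candidates` is a hypothetical name. Topic extraction from body text is left to the LLM.

```python
from itertools import combinations


def relink_candidates(pages: dict[str, dict]) -> list[tuple[str, str, list[str]]]:
    """Return (slug_a, slug_b, shared_tags) for page pairs that share tags but are not fully linked.

    `pages` maps slug -> {"tags": [...], "related": [...]}.
    """
    candidates = []
    for a, b in combinations(sorted(pages), 2):
        shared = sorted(set(pages[a]["tags"]) & set(pages[b]["tags"]))
        linked = b in pages[a]["related"] and a in pages[b]["related"]
        if shared and not linked:
            candidates.append((a, b, shared))
    # Assumption: more shared tags means a stronger relationship, so list those first.
    candidates.sort(key=lambda c: -len(c[2]))
    return candidates
```

The result is only a candidate list; nothing is written until the user confirms.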
Execution steps:
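# 1. Collect tag lines from concept and entity pages (input for shared-tag pairing)
grep -r "^tags:" wiki/concepts/ wiki/entities/
# 2. List orphan pages: empty related list and no inbound wikilink from another page
for f in wiki/*/*.md; do
  slug=$(basename "$f" .md)
  # Matches only a non-empty list, so "related: []" counts as empty
  related=$(grep -E '^related: *\[ *[^] ]' "$f")
  # index.md and log.md are excluded: they reference pages without being content links
  inbound=$(grep -rlF --exclude=index.md --exclude=log.md "[[${slug}]]" wiki/ | grep -vxF "$f")
  [ -z "$related" ] && [ -z "$inbound" ] && echo "$f is orphan"
done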
# 3. LLM generates relink suggestion report
# Format:
# [[page-A]] <--> [[page-B]] reason: shared tag MySQL B+tree
# [[page-C]] --> [[page-D]] reason: C mentions D but not linked
Write rules:
- Update A's related to add B
- Update B's related to add A
- Append to log.md
Lint (Health Check) — Enhanced
Trigger: user request / periodic heartbeat
Scan dimensions (6):
| # | Dimension | Description |
|---|---|---|
| 1 | Orphan pages | No related pages, no inbound links |
| 2 | Dangling references | related references non-existent slugs |
| 3 | One-way links | A→B but B→A missing |
| 4 | Contradiction detection | Same claim described differently across pages |
| 5 | Quality threshold | Page fails minimum quality (no urls/no related/principles<3) |
| 6 | Naming drift | Slug style inconsistent (mixed case/mixed Chinese-English) |
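Dimensions 1 to 3 are purely structural and can be computed without an LLM. The following sketch assumes the related fields have been loaded into a dict of slug to list of slugs; `lint_links` is a hypothetical name, and inbound links are taken from related fields only, not from wikilinks in body text.

```python
def lint_links(pages: dict[str, list[str]]) -> dict[str, list]:
    """Find orphan pages, dangling references, and one-way links.

    `pages` maps each slug to the slugs in its related field.
    """
    inbound: dict[str, set[str]] = {slug: set() for slug in pages}
    dangling, one_way = [], []
    for slug, related in pages.items():
        for target in related:
            if target not in pages:
                dangling.append((slug, target))  # related points at a missing page
                continue
            inbound[target].add(slug)
            if slug not in pages[target]:
                one_way.append((slug, target))  # target does not link back
    orphans = [s for s, rel in pages.items() if not rel and not inbound[s]]
    return {"orphans": orphans, "dangling": dangling, "one_way": one_way}
```

Contradiction detection (dimension 4) still needs the LLM to compare claims across pages.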
Lint report format:
## Lint Report — YYYY-MM-DD
### Orphan Pages (N)
- [[page]] — no related, no inbound
### One-way Links (N)
- [[A]] → [[B]] (B not back-linking A)
### Dangling References (N)
- [[page]] references non-existent [[nonexistent]]
### Quality Failures (N)
- [[page]] — missing urls source
- [[page]] — empty related
### Contradictions (N)
- [[page-A]] says: X is Y
- [[page-B]] says: X is Z
### Naming Issues (N)
- [[page]] — slug has uppercase/mixed Chinese-English
### Recommended Actions
1. [Priority 1]
2. [Priority 2]
Deep Research
Trigger: lint finds Coverage Gaps / user says "research X"
Process:
1. Discover knowledge gap
lint report "missing coverage" items
user says "help me research XXX"
2. Generate search queries
LLM generates 3-5 search queries from gap
3. Multi-source search
Execute web_search for each query
4. Ingest results
Write search results to raw/sources/
Execute ingest to generate new pages
5. relink + lint
complete relationships + health check
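The control flow of steps 2 to 4 might look roughly like the sketch below. Everything here is hypothetical scaffolding: `deep_research` and its three callables stand in for the LLM query generator, the web_search tool, and the ingest operation, and the assumption that a search returns a list of URLs is mine, not the skill's.

```python
from typing import Callable


def deep_research(
    gap: str,
    generate_queries: Callable[[str], list[str]],
    web_search: Callable[[str], list[str]],
    ingest: Callable[[list[str]], list[str]],
) -> list[str]:
    """Run queries for a knowledge gap and hand the de-duplicated result URLs to ingest."""
    queries = generate_queries(gap)
    if not 3 <= len(queries) <= 5:
        raise ValueError("expected 3-5 search queries")
    urls: list[str] = []
    for query in queries:
        for url in web_search(query):
            if url not in urls:  # keep first-seen order, drop repeats across queries
                urls.append(url)
    # The ingest callable is expected to write to raw/sources/ and generate pages.
    return ingest(urls)
```

Relink and lint (step 5) would then run on whatever pages the ingest step reports.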
purpose.md (Wiki Constitution)
Every wiki should have purpose.md defining:
# purpose.md
## Goal
Who is this wiki for? What problem does it solve?
## Core Questions
What core questions must this wiki answer?
## Scope
What domains are covered?
What is explicitly excluded?
## Evolution Direction
Near-term (3 months): fill gaps in which domains?
Mid-term (6 months): what state to achieve?
Long-term (1 year): what is the ideal wiki form?
## Quality Standards
What is the minimum quality threshold?
Source Traceability Chain
User bookmarks (Chrome export)
↓
raw/sources/bookmarks_xxx.md (immutable)
↓ Ingest writes
wiki/xxx.md
sources:
- file: bookmarks_xxx.md
urls:
- https://example.com ← specific URL
↓ Query time
OpenClaw reads wiki → reads sources.urls → web_fetch original URL → verify
Use Cases
- User asks technical question (check wiki first, then search)
- User says "help me digest this link"
- User requests "organize my collected content on XXX"
- User requests "run lint"
- User requests "relink"
- User requests "research X" (Deep Research)
- Periodic heartbeat triggers lint + relink + quality check
Confidence Annotations
| Annotation | Meaning |
|---|---|
| ✅ Verified | wiki content matches original URL source |
| ⚠️ Inferred | wiki content is LLM inference based on source, not direct quote |
| ❌ Disputed | wiki content contradicts source, needs verification |
Bidirectional Link Write Rules (Enforced)
Every time you modify the related field:
When writing A's related to add B:
1. Add B to A's related: [...]
2. Check if B's related already has A
3. If not, add A as back-link
4. If yes, skip
Prohibited:
- ❌ Write A→B only, skip B→A
- ❌ Leave related empty with no links added
- ❌ sources has only file, no urls
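The four write steps above reduce to a small idempotent helper. This is a minimal sketch that assumes the related fields are held in memory as a dict of slug to list; `add_related` is a hypothetical name, and raising on an unknown slug is an added safeguard against dangling references.

```python
def add_related(pages: dict[str, list[str]], a: str, b: str) -> None:
    """Link a -> b and make sure the back-link b -> a exists. Safe to call repeatedly."""
    for slug in (a, b):
        if slug not in pages:
            raise KeyError(f"unknown page: {slug}")
    if b not in pages[a]:
        pages[a].append(b)  # step 1: add B to A's related
    if a not in pages[b]:
        pages[b].append(a)  # steps 2-3: add the back-link only if it is missing
    # step 4: if the back-link already exists, there is nothing to do
```

Routing every related update through one function like this makes the "A→B only" failure mode impossible by construction.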