karpathy-wiki — OpenClaw Implementation v3.0

Based on Andrej Karpathy's LLM Wiki pattern.

Wiki Root

wiki_root: /path/to/your/wiki  # configure to your local path

<wiki_root>/
├── raw/
│   ├── sources/          # raw bookmarks/docs (immutable)
│   └── assets/           # images and resources
├── wiki/
│   ├── entities/        # entity pages (people, products, companies, sites, books)
│   ├── concepts/        # concept pages (tech, theory, methodology)
│   ├── comparisons/     # comparison pages
│   ├── synthesis/       # synthesis/overview pages
│   ├── index.md         # wiki index (entry point)
│   ├── log.md           # operation log (append-only)
│   └── overview.md      # global overview
├── purpose.md           # wiki goal definition (wiki constitution)
└── schema.md            # structure conventions
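
A minimal sketch of scaffolding this layout, assuming the directory and file names shown above. The function name `scaffold_wiki` is hypothetical and not part of OpenClaw; existing files are left untouched.

```python
from pathlib import Path

# Directory and file names are taken from the layout above.
WIKI_DIRS = [
    "raw/sources",
    "raw/assets",
    "wiki/entities",
    "wiki/concepts",
    "wiki/comparisons",
    "wiki/synthesis",
]
WIKI_FILES = [
    "wiki/index.md",
    "wiki/log.md",
    "wiki/overview.md",
    "purpose.md",
    "schema.md",
]


def scaffold_wiki(wiki_root: str) -> Path:
    """Create missing directories and empty files; never overwrite existing ones."""
    root = Path(wiki_root)
    for rel in WIKI_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in WIKI_FILES:
        path = root / rel
        if not path.exists():
            path.touch()
    return root
```

Running it twice is safe: the second call only fills in whatever is still missing.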

Core Principles (v3.0)

sources/ is read-only — LLM only writes wiki/, never modifies raw sources
wikilink cross-references — [[page-slug]] syntax for page connections
YAML frontmatter — every page has type/tags/related/sources
Bidirectional links enforced — every write to related must sync back-link
Two-phase Ingest — Analysis → Generation
URL-level traceability — sources contain specific URLs, not just filenames
Lint-driven — periodic health checks, graph stays clean
Deep Research — knowledge gaps auto-discovered and filled

Page Type Taxonomy (entity vs concept boundary)

| Type | Definition | Examples |
| --- | --- | --- |
| entity | Named, discrete things | people/products/companies/sites/books/tools |
| concept | Abstract ideas/theories/methodologies | indexing principles, microservices, DI |
| comparison | Multi-option comparisons | Vue vs React, MySQL vs PostgreSQL |
| synthesis | Comprehensive overview | tech stack panorama, annual summary |

Boundary Rules:

If it has a specific name → entity ("pdai.tech", "Effective Java")
If it's abstract/generic → concept ("MySQL indexing", "dependency injection")
Avoid having both entity and concept for the same topic

Naming Convention

entity:
  blogs/sites: use domain or person name
    → mysql-zhu-shuangyin
    → pdai-tech
    → jon-index-blog
  books: use simplified book title
    → effective-java

concept:
  use core terms in kebab-case
    → mysql-innodb
    → jwt-json-web-token
    → dependency-injection

comparison:
  → mysql-postgresql-comparison
  → vue-vs-react

synthesis:
  → go-web-dev-overview
  → 2026-learning-roadmap-summary

Rules:

All lowercase, hyphen-separated
No mixed Chinese/English
Unique slugs, no duplicates
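
The three rules can be checked mechanically. The sketch below is one possible reading of them: "all lowercase, hyphen-separated" is interpreted as lowercase ASCII letters and digits joined by single hyphens, which also rules out Chinese characters. The helper names are hypothetical.

```python
import re

# Lowercase ASCII letters/digits in groups joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """True if the slug follows the naming rules above (as interpreted here)."""
    return SLUG_RE.fullmatch(slug) is not None


def find_duplicate_slugs(slugs: list[str]) -> set[str]:
    """Return every slug that appears more than once."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for slug in slugs:
        if slug in seen:
            dupes.add(slug)
        seen.add(slug)
    return dupes
```

For example, `is_valid_slug("effective-java")` passes, while `"Effective-Java"` and `"mysql-索引"` are rejected.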

YAML Frontmatter (Required Fields)

---
type: entity | concept | comparison | synthesis
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [tag1, tag2]
related: [page-slug-1, page-slug-2]  # forward reference (back-link auto-added)
sources:
  - file: bookmarks_xxx.md
    urls:
      - https://example.com/article1
      - https://example.com/article2
---

sources.urls is mandatory — URL-level traceability is a core principle.
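
A minimal validation sketch, assuming the frontmatter has already been parsed into a Python dict (for instance by a YAML library, which is not shown here). `validate_frontmatter` and the error strings are hypothetical.

```python
import re

PAGE_TYPES = {"entity", "concept", "comparison", "synthesis"}
REQUIRED_FIELDS = ("type", "title", "created", "updated", "tags", "related", "sources")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_frontmatter(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the frontmatter passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in meta:
            errors.append(f"missing field: {field}")
    if "type" in meta and meta["type"] not in PAGE_TYPES:
        errors.append(f"invalid type: {meta['type']}")
    for field in ("created", "updated"):
        # str() also handles date objects produced by a YAML parser.
        if field in meta and not DATE_RE.fullmatch(str(meta[field])):
            errors.append(f"{field} is not YYYY-MM-DD")
    for i, source in enumerate(meta.get("sources") or []):
        if not source.get("file"):
            errors.append(f"sources[{i}] has no file")
        if not source.get("urls"):
            errors.append(f"sources[{i}] has no urls")
    return errors
```

A source entry with only `file` and no `urls` is reported, matching the mandatory `sources.urls` rule.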

Quality Thresholds

Every concept/comparison page must have:

| Requirement | Description |
| --- | --- |
| One-line definition | frontmatter title, or the `>` blockquote under the page header |
| Core principles ≥ 3 | body contains at least 3 substantial points |
| Related pages ≥ 1 | related field is non-empty |
| Source URLs | sources.urls is non-empty |
| Back-links added | every page listed in related links back to this page |

Every entity page must have:

| Requirement | Description |
| --- | --- |
| One-line description | frontmatter title |
| Key features ≥ 2 | body has substantive descriptions |
| Related pages ≥ 1 | related field is non-empty |
| Source URLs | sources.urls is non-empty |
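
The thresholds above could be checked per page roughly as follows. This is a sketch under assumptions: the frontmatter is already parsed into a dict, pages use the `## Core Principles` and `## Key Features` headings from the templates, and counting `- ` bullets stands in for "substantial points". The back-link requirement needs the other pages and is not covered here. Function names are hypothetical.

```python
def count_bullets(body: str, heading: str) -> int:
    """Count top-level '- ' bullets under the given '## ' heading."""
    count, in_section = 0, False
    for line in body.splitlines():
        if line.startswith("## "):
            in_section = line[3:].strip() == heading
        elif in_section and line.startswith("- "):
            count += 1
    return count


def check_quality(page_type: str, meta: dict, body: str) -> list[str]:
    """Return the quality thresholds this page fails (empty list = passes)."""
    failures = []
    has_title = bool(meta.get("title"))
    has_quote = any(line.startswith("> ") for line in body.splitlines())
    if page_type == "entity":
        if not has_title:
            failures.append("missing one-line description")
        if count_bullets(body, "Key Features") < 2:
            failures.append("key features < 2")
    elif page_type in ("concept", "comparison"):
        if not (has_title or has_quote):
            failures.append("missing one-line definition")
        if count_bullets(body, "Core Principles") < 3:
            failures.append("core principles < 3")
    if not meta.get("related"):
        failures.append("empty related")
    sources = meta.get("sources") or []
    if not sources or not all(s.get("urls") for s in sources):
        failures.append("missing sources.urls")
    return failures
```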

Page Templates

Entity Page

---
type: entity
title: Entity Name
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [tags]
related: [page-slug-1, page-slug-2]
sources:
  - file: bookmarks_xxx.md
    urls:
      - https://example.com
---

# Entity Name

> One-line description (used in index.md summary).

## Overview
Main content and background.

## Key Features
- Feature 1
- Feature 2

## Related
- [[page-slug]] — reason (back-link auto-added)

## Sources
- [Article Title](https://example.com) — source description

Concept Page

---
type: concept
title: Concept Name
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [tags]
related: [page-slug-1, page-slug-2]
sources:
  - file: bookmarks_xxx.md
    urls:
      - https://example.com/article1
---

# Concept Name

> One-line definition.

## Core Principles
- Principle 1
- Principle 2
- Principle 3

## Use Cases
- Use case 1

## Related
- [[page-slug]] — reason

## Counter-arguments / Data Gaps
- Known limitations
- Uncovered aspects

## Sources
- [Article Title](https://example.com) — source description

Operations

Ingest (Collection & Digestion)

Phase 1 — Analysis

## Key Entities
Identified entities

## Key Concepts
Identified core concepts

## Main Arguments & Findings
Key arguments and findings

## Connections to Existing Wiki
Relations to existing wiki pages

## Contradictions & Tensions
Conflicts with existing knowledge

## Coverage Gaps
What was mentioned but not covered deeply?
What related topics are missing?

## Recommendations
Which pages to create or update

Phase 2 — Generation

Create/update target pages (with urls in sources)
Sync related + back-link (bidirectional link enforcement)
Verify pages meet quality thresholds
Update index.md
Append to log.md

Output format:

---FILE: wiki/concepts/page.md---
[page content with sources.urls]
---END FILE---

---FILE: wiki/entities/backlink-target.md---
[update target page, append back-link]
---END FILE---

---FILE: wiki/index.md---
[append new page entry]
---END FILE---

---FILE: wiki/log.md---
[append ingest log entry]
---END FILE---
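
A sketch of how a consumer might split this output into files before writing them. It assumes the delimiters appear exactly as shown, each on its own line, and it refuses any path outside wiki/ so that raw/ stays read-only. `parse_file_blocks` is a hypothetical helper, not an OpenClaw API.

```python
import re
from pathlib import PurePosixPath

# One block: a '---FILE: path---' line, content, then a '---END FILE---' line.
FILE_BLOCK_RE = re.compile(
    r"^---FILE: (?P<path>[^\n]+?)---\n(?P<content>.*?)^---END FILE---$",
    re.DOTALL | re.MULTILINE,
)


def parse_file_blocks(output: str) -> dict[str, str]:
    """Map each target path to its content, in the order the blocks appear."""
    blocks = {}
    for match in FILE_BLOCK_RE.finditer(output):
        path = PurePosixPath(match.group("path").strip())
        # Only wiki/ may be written; raw/ is immutable.
        if path.is_absolute() or ".." in path.parts or path.parts[:1] != ("wiki",):
            raise ValueError(f"refusing to write outside wiki/: {path}")
        blocks[str(path)] = match.group("content")
    return blocks
```

Page content may itself start with a `---` frontmatter line; the pattern only treats the literal `---END FILE---` line as a terminator.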

Query

Read wiki/index.md to locate relevant pages
Read related pages + extract sources.urls
Use web_fetch to trace and verify original URLs
Synthesize answer, annotate source confidence

Relink (Automatic Relationship Discovery)

Trigger: batch ingest complete / periodic heartbeat

Process:

1. Scan the tags and body text of every page under wiki/ (including subdirectories)
2. Extract core topics from each page
3. Find page pairs sharing tags/topics
4. Analyze relationship strength pairwise
5. Generate recommended link list (candidate)
6. User confirms before writing (back-link sync)

Execution steps:

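# 1. Collect tags from all pages (input for the LLM's shared-tag analysis)
grep -r "^tags:" wiki/concepts/ wiki/entities/

# 2. List orphan pages (empty related and no inbound wikilinks)
shopt -s globstar  # bash 4+: let ** match nested directories
for f in wiki/**/*.md; do
  slug=$(basename "$f" .md)
  related=$(grep -E '^related: *\[ *[^] ]' "$f")  # matches a non-empty related list only
  inbound=$(grep -r "^- \[\[$slug]]" wiki/)       # "- [[slug]]" bullets, as in the templates
  [ -z "$related" ] && [ -z "$inbound" ] && echo "$f is orphan"
done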

# 3. LLM generates relink suggestion report
#    Format:
#    [[page-A]] <--> [[page-B]]  reason: shared tag MySQL B+tree
#    [[page-C]] --> [[page-D]]   reason: C mentions D but not linked
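
Steps 3 to 5 of the process (pairing pages by shared tags and producing a candidate list) could be sketched as below. It assumes tags and related lists have already been read from each page's frontmatter into dicts keyed by slug. Ranking by number of shared tags is an assumption; the original leaves relationship strength to the LLM. `shared_tag_candidates` is a hypothetical name.

```python
from itertools import combinations


def shared_tag_candidates(
    tags_by_slug: dict[str, list[str]],
    related_by_slug: dict[str, list[str]],
) -> list[tuple[str, str, list[str]]]:
    """Return (page_a, page_b, shared_tags) for pairs not yet linked both ways."""
    candidates = []
    for a, b in combinations(sorted(tags_by_slug), 2):
        shared = set(tags_by_slug[a]) & set(tags_by_slug[b])
        linked = b in related_by_slug.get(a, []) and a in related_by_slug.get(b, [])
        if shared and not linked:
            candidates.append((a, b, sorted(shared)))
    # Pairs sharing more tags come first; ties keep alphabetical order.
    candidates.sort(key=lambda c: -len(c[2]))
    return candidates
```

The result is only a candidate list; per step 6, nothing is written until the user confirms.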

Write rules:

Update A's related to add B
Update B's related to add A
Append to log.md

Lint (Health Check) — Enhanced

Trigger: user request / periodic heartbeat

Scan dimensions (6):

| # | Dimension | Description |
| --- | --- | --- |
| 1 | Orphan pages | No related pages, no inbound links |
| 2 | Dangling references | related references non-existent slugs |
| 3 | One-way links | A→B but B→A missing |
| 4 | Contradiction detection | Same claim described differently across pages |
| 5 | Quality threshold | Page fails minimum quality (no urls / no related / principles < 3) |
| 6 | Naming drift | Slug style inconsistent (mixed case / mixed Chinese-English) |
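
Dimensions 1 to 3 are purely structural and can be computed from the related fields alone. A minimal sketch, assuming a dict that maps every page slug to its related list and treating inbound links as appearances in other pages' related fields; `lint_links` is a hypothetical name. Contradiction detection (dimension 4) needs the LLM and is not attempted here.

```python
def lint_links(related_by_slug: dict[str, list[str]]) -> dict[str, list]:
    """Find orphan pages, dangling references, and one-way links."""
    slugs = set(related_by_slug)
    inbound = {slug: 0 for slug in slugs}
    dangling, one_way = [], []
    for src, targets in sorted(related_by_slug.items()):
        for dst in targets:
            if dst not in slugs:
                dangling.append((src, dst))  # related points at a missing page
                continue
            inbound[dst] += 1
            if src not in related_by_slug[dst]:
                one_way.append((src, dst))   # A→B without B→A
    orphans = sorted(s for s in slugs if not related_by_slug[s] and inbound[s] == 0)
    return {"orphans": orphans, "dangling": dangling, "one_way": one_way}
```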

Lint report format:

## Lint Report — YYYY-MM-DD

### Orphan Pages (N)
- [[page]] — no related, no inbound

### One-way Links (N)
- [[A]] → [[B]] (B does not link back to A)

### Dangling References (N)
- [[page]] references non-existent [[nonexistent]]

### Quality Failures (N)
- [[page]] — missing urls source
- [[page]] — empty related

### Contradictions (N)
- [[page-A]] says: X is Y
- [[page-B]] says: X is Z

### Naming Issues (N)
- [[page]] — slug has uppercase/mixed Chinese-English

### Recommended Actions
1. [Priority 1]
2. [Priority 2]

Deep Research

Trigger: lint finds Coverage Gaps / user says "research X"

Process:

1. Discover knowledge gap
   lint report "missing coverage" items
   user says "help me research XXX"

2. Generate search queries
   LLM generates 3-5 search queries from gap

3. Multi-source search
   Execute web_search for each query

4. Ingest results
   Write search results to raw/sources/
   Execute ingest to generate new pages

5. relink + lint
   complete relationships + health check
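
Steps 2 to 4 can be read as a small pipeline. The sketch below only shows the control flow: every callable passed in (`generate_queries`, `web_search`, `save_raw`, `ingest`) is a hypothetical stand-in for the LLM or the corresponding OpenClaw tool, and their signatures are assumptions. Step 5 (relink + lint) is left to the caller.

```python
from typing import Callable, Iterable


def deep_research(
    gap: str,
    generate_queries: Callable[[str], list[str]],
    web_search: Callable[[str], Iterable[str]],
    save_raw: Callable[[str, str], str],
    ingest: Callable[[str], None],
    max_queries: int = 5,
) -> list[str]:
    """Run steps 2-4 for one knowledge gap; return the raw source paths written."""
    queries = generate_queries(gap)[:max_queries]  # step 2: cap at 5 queries
    saved = []
    for query in queries:                          # step 3: search each query
        for result in web_search(query):
            # step 4a: save_raw is expected to write under raw/sources/
            saved.append(save_raw(query, result))
    for path in saved:                             # step 4b: ingest each new source
        ingest(path)
    return saved
```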

purpose.md (Wiki Constitution)

Every wiki should have purpose.md defining:

# purpose.md

## Goal
Who is this wiki for? What problem does it solve?

## Core Questions
What core questions must this wiki answer?

## Scope
What domains are covered?
What is explicitly excluded?

## Evolution Direction
Near-term (3 months): fill gaps in which domains?
Mid-term (6 months): what state to achieve?
Long-term (1 year): what is the ideal wiki form?

## Quality Standards
What is the minimum quality threshold?

Source Traceability Chain

User bookmarks (Chrome export)
  ↓
raw/sources/bookmarks_xxx.md  (immutable)
  ↓  Ingest writes
wiki/xxx.md
  sources:
    - file: bookmarks_xxx.md
      urls:
        - https://example.com  ← specific URL
  ↓  Query time
OpenClaw reads wiki → reads sources.urls → web_fetch original URL → verify
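
The query-time step "reads sources.urls" could look like the sketch below. It is a deliberately small line-based reader that assumes the frontmatter layout used in the templates; it is not a YAML parser, and a real implementation would likely use one. `extract_source_urls` is a hypothetical name.

```python
def extract_source_urls(page_text: str) -> list[str]:
    """Collect every URL listed under a 'urls:' key in the page's frontmatter."""
    lines = page_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter
    urls = []
    urls_indent = None  # indentation of the current 'urls:' key, if inside one
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of frontmatter
            break
        indent = len(line) - len(line.lstrip())
        if stripped == "urls:":
            urls_indent = indent
        elif urls_indent is not None and stripped.startswith("- ") and indent >= urls_indent:
            urls.append(stripped[2:].strip().strip("'\""))
        else:
            urls_indent = None  # left the urls list
    return urls
```

Each returned URL can then be fetched and compared against the wiki content to decide the confidence annotation.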

Use Cases

User asks technical question (check wiki first, then search)
User says "help me digest this link"
User requests "organize my collected content on XXX"
User requests "run lint"
User requests "relink"
User requests "research X" (Deep Research)
Periodic heartbeat triggers lint + relink + quality check

Confidence Annotations

| Annotation | Meaning |
| --- | --- |
| `✅ Verified` | wiki content matches original URL source |
| `⚠️ Inferred` | wiki content is LLM inference based on source, not direct quote |
| `❌ Disputed` | wiki content contradicts source, needs verification |

Bidirectional Link Write Rules (Enforced)

Every time you modify the related field:

When writing A's related to add B:
  1. Add B to A's related: [...]
  2. Check if B's related already has A
  3. If not, add A as back-link
  4. If yes, skip
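
The four steps above amount to an idempotent two-way update. A minimal in-memory sketch, assuming related fields are held in a dict keyed by slug (writing the result back to each page's frontmatter is not shown); `add_related` is a hypothetical name.

```python
def add_related(related_by_slug: dict[str, list[str]], a: str, b: str) -> None:
    """Link a and b in both directions, skipping links that already exist."""
    if a == b:
        raise ValueError("a page cannot be related to itself")
    for src, dst in ((a, b), (b, a)):
        links = related_by_slug.setdefault(src, [])
        if dst not in links:  # step 2/4: skip if the link is already there
            links.append(dst)
```

Calling it twice with the same pair leaves both related lists unchanged the second time.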

Prohibited:

❌ Write A→B only, skip B→A
❌ Leave related empty with no links added
❌ sources has only file, no urls
