llm-wiki

Karpathy's llm-wiki pattern implementation — cumulative knowledge management for AI agents

Install skill "llm-wiki" with this command: npx skills add nemo4110/041-llm-wiki

CLI Reference

Agent Bridge (Recommended for Agents)

Use scripts/agent-bridge.py as the single entry point for all tool-assisted operations:

# Environment check
python scripts/agent-bridge.py check

# Discover relations for a new page
python scripts/agent-bridge.py link --source "NewPage" --mode light

# Execute merge with diff review
python scripts/agent-bridge.py link --source "NewPage" --target "OldPage" --strategy append_related

# Batch global linking for recent pages
python scripts/agent-bridge.py relink --since 2026-04-20 --mode deep

# Health check
python scripts/agent-bridge.py lint

# Status overview
python scripts/agent-bridge.py status

Why Agent Bridge?

  • Single obvious entry point — no guessing whether to use protocol mode or CLI mode
  • Structured Markdown output — human-readable and machine-parseable
  • Execution traceability — detailed logging with file:line references to stderr
  • Auto-detects Python environment (uv venv / conda / system)
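
The environment auto-detection above could work roughly like the following sketch. This is purely illustrative: the function name detect_python and its lookup order are assumptions, not the actual agent-bridge.py implementation.

```python
# Hypothetical sketch of auto-detecting a Python interpreter
# (uv venv / conda / system); not the real agent-bridge.py logic.
import os
import sys
from pathlib import Path

def detect_python(repo_root: str = ".") -> str:
    """Return the interpreter to use, preferring a repo-local .venv."""
    root = Path(repo_root)
    # 1. uv / venv: a .venv created inside the repository
    for candidate in (root / ".venv" / "bin" / "python",           # Linux/macOS
                      root / ".venv" / "Scripts" / "python.exe"):  # Windows
        if candidate.exists():
            return str(candidate)
    # 2. conda: an active environment exposes CONDA_PREFIX
    conda = os.environ.get("CONDA_PREFIX")
    if conda:
        return str(Path(conda) / "bin" / "python")
    # 3. fall back to the interpreter running this script
    return sys.executable

if __name__ == "__main__":
    print(detect_python())
```

The checks are ordered from most to least project-specific, so a repo-local environment always wins over a globally activated one.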

Protocol Mode (Natural Language)

For tasks requiring LLM judgment (content extraction, synthesis, strategy selection):

"Please ingest sources/paper.pdf into wiki"
"Query wiki: What is the difference between Transformer and RNN?"
"Check wiki health"

Legacy CLI Mode (Optional)

Direct library access for scripting or debugging:

# Show wiki status overview
python -m src.llm_wiki status

# Run health check
python -m src.llm_wiki lint

# Show help
python -m src.llm_wiki --help

Note: the ingest and query commands in the legacy CLI provide only auxiliary functions (such as listing pages). Actual content processing requires natural-language interaction with the agent.

LLM-Wiki

Karpathy's llm-wiki pattern implementation — cumulative knowledge management for AI agents.

Core Philosophy: LLM as programmer, Wiki as codebase, User as product manager.

Why SKILL Form?

We chose the SKILL form because it brings these advantages:

  • Zero deployment — No services to run, no databases to configure; works the moment you clone the repository
  • Native integration — Direct command execution via Claude Code, no middleware or protocol translation needed
  • Plain-text data — Pure Markdown files, git-native, with no proprietary formats or vendor lock-in
  • Editor freedom — Use Obsidian, VS Code, or any text editor you prefer
  • Minimal footprint — ~500 lines of core protocol, keeping complexity low

Features

  • Protocol-driven: Works with natural language (no installation required)
  • Pure Markdown: No database, no lock-in, git-native
  • Wiki-style links: [[PageName]] format, Obsidian-compatible
  • Cumulative learning: Every query can create new knowledge
  • Health checks: Orphan pages, dead links, stale content detection
  • Optional CLI: Python scripts for automation and batch operations
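
As a rough illustration of what such a health check might look for, here is a minimal sketch (not the SKILL's actual lint implementation) that flags dead [[links]] and orphan pages in a wiki directory:

```python
# Illustrative health check: find [[wiki links]] pointing at missing
# pages, and pages that nothing links to. Not the SKILL's real code.
import re
from pathlib import Path

LINK_RE = re.compile(r"\[\[([^\]|#]+)")  # matches [[Page]] and [[Page|alias]]

def lint(wiki_dir: str):
    pages = {p.stem: p.read_text(encoding="utf-8")
             for p in Path(wiki_dir).glob("*.md")}
    linked, dead = set(), []
    for name, text in pages.items():
        for target in LINK_RE.findall(text):
            target = target.strip()
            linked.add(target)
            if target not in pages:
                dead.append((name, target))     # link to a missing page
    # index.md is the entry point, so it is exempt from the orphan check
    orphans = [n for n in pages if n not in linked and n != "index"]
    return dead, orphans
```

Because every page is plain Markdown, the whole check reduces to one regex pass over the directory, which is what keeps the zero-dependency promise realistic.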

Quick Start

# 1. Clone
git clone https://github.com/Nemo4110/llm-wiki.git
cd llm-wiki

# 2. Add source material
cp ~/Downloads/paper.pdf sources/

# 3. Tell your agent
"Please ingest sources/paper.pdf into wiki"

Installation

Protocol Mode (Recommended)

No installation needed. Agent reads CLAUDE.md and operates directly.

CLI Mode (Optional)

Using uv (Fastest)

# Create virtual environment and install dependencies
uv venv
uv pip install -r src/requirements.txt --python .venv/bin/python    # Linux/macOS
# Windows: uv pip install -r src/requirements.txt --python .venv/Scripts/python.exe

# Activate environment (Windows)
.venv\Scripts\activate
# Or Linux/macOS
source .venv/bin/activate

Using conda

# Create environment
conda create -n llm-wiki python=3.11

# Activate environment
conda activate llm-wiki

# Install dependencies
pip install -r src/requirements.txt

Using pip

# Create virtual environment
python -m venv .venv

# Activate environment
source .venv/bin/activate  # Linux/macOS
.venv\Scripts\activate     # Windows

# Install dependencies
pip install -r src/requirements.txt

Verify Installation

python -c "from src.llm_wiki.core import WikiManager; print('✓ Installation successful')"

Important Dependency Notes:

Dependency   Version     Purpose          Notes
click        >=8.0.0     CLI framework    -
pyyaml       >=6.0       YAML parsing     -
pymupdf      >=1.25.0    PDF processing   Primary PDF engine, best for CJK

Optional dependencies (for enhanced features):

  • numpy >=1.24.0 — Vector operations for embedding retrieval
  • httpx >=0.27.0 — HTTP client for Ollama/local services
  • openai >=1.0.0 — OpenAI embedding API
  • mcp >=1.0.0 — MCP SDK for remote embedding providers

Fallback PDF dependency:

  • pdfplumber >=0.11.8 — Table extraction fallback (security version required for CVE-2025-64512)
  • pdfminer.six >=20251107 — PDF underlying library fallback

Project Structure

llm-wiki/
├── CLAUDE.md           # ⭐ Core protocol: Agent behavior guidelines
├── AGENTS.md           # Agent implementation guide (CLI usage)
├── SKILL.md            # This file, machine-readable specification
├── log.md              # Timeline log (append-only)
├── sources/            # Raw materials (user-managed + tool-fetched; Agent forbidden from writing LLM-generated content)
│   └── README.md
├── wiki/               # Generated knowledge pages (Agent-managed)
│   ├── index.md        # Entry index
│   └── *.md            # Topic pages
├── assets/             # Templates and configuration
│   ├── page_template.md
│   └── ingest_rules.md
├── src/                # SKILL implementation (optional, for CLI)
│   ├── llm_wiki/
│   └── requirements.txt
├── scripts/            # Auxiliary scripts
├── hooks/              # Platform hooks (optional)
└── examples/           # Example wiki

About sources/: excluded from git by default to avoid repository bloat. The wiki retains only the extracted knowledge; original files are managed separately (cloud storage, Zotero, etc.). See sources/README.md for how to track specific files.

How It Works

Data Flow

+----------+     +--------------------+     +--------------+
| sources/ |---->|   LLM Processing   |---->|    wiki/     |
|  (Raw)   |     | (Extract + Link)   |     | (Structured) |
+----------+     +--------------------+     +--------------+
                          |
                          v
                    +----------+
                    |  log.md  |
                    | (Record) |
                    +----------+

Key Design

  1. CLAUDE.md as Protocol: Defines Agent behavior standards that any user or Agent can follow
  2. Pure Markdown: No database, no lock-in, native git version control
  3. Bidirectional Links: [[PageName]] format, compatible with Obsidian
  4. Cumulative Learning: Each query can generate new wiki pages, knowledge continuously accumulates

Query Mechanism

Current Implementation: Symbolic Navigation + LLM Synthesis (Default)

By default, this SKILL does not require embedding or vector retrieval. Queries are answered in three steps:

User asks question
         |
         v
+-------------------------------+
|  1. Read index.md             |  <-- Human/Agent-maintained category index
|     Locate relevant topics    |
+-------------------------------+
         |
         v
+-------------------------------+
|  2. Read relevant pages       |  <-- Discover associations through [[links]]
|     and their link neighbors  |
+-------------------------------+
         |
         v
+-------------------------------+
|  3. LLM Synthesis             |  <-- Generate answers based on read content
|     Generate with citations   |  Citation format: [[PageName]]
+-------------------------------+
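
The three steps above can be sketched in code. The following is a minimal, assumed implementation of steps 1-2 (the symbolic-navigation part); the function name gather_context and the keyword-matching heuristic are illustrative, not the SKILL's actual logic. Step 3 stays with the LLM.

```python
# Sketch of symbolic navigation: read index.md, pick pages whose names
# appear in the query, then pull in their [[link]] neighbours.
# Hypothetical helper, not the SKILL's real implementation.
import re
from pathlib import Path

LINK_RE = re.compile(r"\[\[([^\]|#]+)")

def gather_context(wiki_dir: str, query: str, max_pages: int = 5):
    wiki = Path(wiki_dir)
    index = (wiki / "index.md").read_text(encoding="utf-8")
    # Step 1: locate candidate pages via the human/agent-maintained index
    terms = {t.strip(".,!?\"'").lower() for t in query.split()}
    queue = [p.strip() for p in LINK_RE.findall(index)
             if p.strip().lower() in terms]
    # Step 2: read each candidate page plus its link neighbours
    seen, context = set(), []
    while queue and len(context) < max_pages:
        name = queue.pop(0)
        page = wiki / f"{name}.md"
        if name in seen or not page.exists():
            continue
        seen.add(name)
        text = page.read_text(encoding="utf-8")
        context.append((name, text))
        queue.extend(t.strip() for t in LINK_RE.findall(text))
    return context  # handed to the LLM for step 3 (synthesis with citations)
```

The max_pages cap bounds how far link traversal spreads, which is what keeps this approach cheap without any vector index.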

Optional Enhancement: after enabling the embedding settings in config.yaml, the CLI command query --semantic adds hybrid search (keyword match + vector search + link traversal) for faster, more accurate retrieval.

Example Flow:

User asks: "What is LoRA?"

  1. Agent reads wiki/index.md, finds [[LoRA]] under "AI/ML" topic
  2. Agent reads wiki/LoRA.md, discovers links to [[Fine-tuning]], [[Adapter]]
  3. Agent synthesizes answer:

    LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method — see [[LoRA]]. Compared to traditional [[Fine-tuning]], it only trains low-rank matrices...

Why is Embedding Optional?

Consideration   Current Solution             Embedding Solution
Dependencies    Zero external dependencies   Requires Embedding API or local model
Cost            No additional fees           Charged per token/request
Privacy         Data not uploaded            Must send content to external service
Accuracy        Precise links, explainable   Approximate similarity, may retrieve irrelevant content
Scale           Suitable for 0-500 pages     Essential at large scale (1000+ pages)

Conclusion: For personal/small team knowledge bases, maintaining index.md and page links is simpler and more effective than introducing Embedding. Embedding is available as an opt-in CLI enhancement when scale demands it.

When to Use Embedding?

Consider enabling the embedding settings in config.yaml when your wiki shows:

  • Page count exceeds ~500 and the manual index becomes hard to maintain
  • Queries are mostly fuzzy and semantic ("that paper about optimization...")
  • You need cross-topic associations (queries semantically related to a page without sharing its keywords)

See ROADMAP.md for details.

Example Workflows

Scenario 1: Researching New Domain

# Day 1: Add materials
sources/
├── 2026-04-10-transformer-paper.pdf
└── 2026-04-10-lora-blog.md
User: Please ingest new materials
Claude: Processed 2 materials
  - Created [[Transformer]] — Attention mechanism architecture
  - Created [[LoRA]] — Low-rank adaptation
  - Created [[Self-Attention]] — Self-attention details
  - Updated wiki/index.md

Scenario 2: Deep Exploration

User: What's the difference between LoRA and full fine-tuning?

Claude (reads wiki/index.md → finds [[LoRA]] → synthesizes answer):

Based on our wiki:

**LoRA** — [[LoRA]]
- Freezes pre-trained weights, only trains low-rank matrices
- Parameter count reduced ~10,000x

**Full Fine-tuning**
- Updates all parameters
- High computational cost

Key differences:
1. Parameters: LoRA only trains 0.1%~1% of parameters [[Parameter Efficiency]]
2. Storage: LoRA only saves adapters, not full model
3. Composition: Multiple LoRAs can be stacked [[LoRA Composition]]

Was this helpful? Should I archive this answer to wiki?

Scenario 3: Knowledge Accumulation

User: Yes, archive this answer

Claude: Created [[LoRA vs Full Fine-tuning]]
- Extracted comparison points from conversation
- Linked to [[LoRA]] and [[Fine-tuning]]
- Added to FAQ section in wiki/index.md

Using with Obsidian

  1. Open wiki/ directory in Obsidian
  2. Enjoy graph view, quick navigation, beautiful rendering
  3. Claude Code handles maintenance, Obsidian handles reading and thinking

Comparison with Alternatives

Solution             Characteristics                                    Best For
This SKILL           Zero dependencies, pure text, Claude Code native   Personal knowledge management, research notes
Sage-Wiki            Full-featured, multimodal, standalone app          Team knowledge base, enterprise deployment
Obsidian + Plugins   Strong visualization, rich community               Existing Obsidian workflow
Notion/Logseq        Collaborative, real-time sync                      Multi-user collaboration, mobile access

Documentation

  • CLAUDE.md — User-facing protocol (read this first)
  • AGENTS.md — Implementation guide for agent developers
  • SKILL.md — This file, machine-readable specification
  • ROADMAP.md — Future plans

Contributing

Issues and PRs welcome!

Current TODO

  • MCP server wrapper (for other Agents)
  • Obsidian plugin (one-click sync)
  • Incremental embedding for faster retrieval
  • Multi-language support

License

MIT — free to use, modify, and distribute.


Inspired by Karpathy's llm-wiki
