CLI Reference
Agent Bridge (Recommended for Agents)
Use scripts/agent-bridge.py as the single entry point for all tool-assisted operations:
# Environment check
python scripts/agent-bridge.py check
# Discover relations for a new page
python scripts/agent-bridge.py link --source "NewPage" --mode light
# Execute merge with diff review
python scripts/agent-bridge.py link --source "NewPage" --target "OldPage" --strategy append_related
# Batch global linking for recent pages
python scripts/agent-bridge.py relink --since 2026-04-20 --mode deep
# Health check
python scripts/agent-bridge.py lint
# Status overview
python scripts/agent-bridge.py status
Why Agent Bridge?
- Single obvious entry point — no guessing whether to use protocol mode or CLI mode
- Structured Markdown output — human-readable and machine-parseable
- Execution traceability — detailed logging with file:line references to stderr
- Auto-detects Python environment (uv venv / conda / system)
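Because the bridge emits structured Markdown, its output is easy to post-process in scripts. A minimal stdlib sketch; the sample `status` output below is an assumed shape for illustration, not the bridge's exact schema:

```python
# Parse "- key: value" bullet lines from a (hypothetical) status report.
import re

SAMPLE_STATUS = """\
## Wiki Status
- pages: 42
- orphans: 3
- dead_links: 1
"""

def parse_status(markdown: str) -> dict[str, int]:
    """Turn '- key: number' bullet lines into a dict of counters."""
    return {
        key: int(value)
        for key, value in re.findall(r"^- (\w+): (\d+)$", markdown, flags=re.M)
    }

print(parse_status(SAMPLE_STATUS))  # -> {'pages': 42, 'orphans': 3, 'dead_links': 1}
```

The same pattern works for any of the bridge subcommands that report counters, since they all share the bullet-list Markdown convention.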
Protocol Mode (Natural Language)
For tasks requiring LLM judgment (content extraction, synthesis, strategy selection):
"Please ingest sources/paper.pdf into wiki"
"Query wiki: What is the difference between Transformer and RNN?"
"Check wiki health"
Legacy CLI Mode (Optional)
Direct library access for scripting or debugging:
# Show wiki status overview
python -m src.llm_wiki status
# Run health check
python -m src.llm_wiki lint
# Show help
python -m src.llm_wiki --help
Note: The legacy CLI's ingest and query commands provide only auxiliary functions (such as listing pages). Actual content processing requires natural language interaction with the agent.
LLM-Wiki
Karpathy's llm-wiki pattern implementation — cumulative knowledge management for AI agents.
Core Philosophy: LLM as programmer, Wiki as codebase, User as product manager.
Why SKILL Form?
We chose the SKILL form because it brings these advantages:
- Zero deployment — No services to run, no databases to configure; works the moment you clone the repository
- Native integration — Direct command execution via Claude Code, no middleware or protocol translation needed
- Plain-text data — Pure Markdown files, git-native, with no proprietary formats or vendor lock-in
- Editor freedom — Use Obsidian, VS Code, or any text editor you prefer
- Minimal footprint — ~500 lines of core protocol, keeping complexity low
Features
- Protocol-driven: Works with natural language (no installation required)
- Pure Markdown: No database, no lock-in, git-native
- Wiki-style links: [[PageName]] format, Obsidian-compatible
- Cumulative learning: Every query can create new knowledge
- Health checks: Orphan pages, dead links, stale content detection
- Optional CLI: Python scripts for automation and batch operations
Quick Start
# 1. Clone
git clone https://github.com/Nemo4110/llm-wiki.git
cd llm-wiki
# 2. Add source material
cp ~/Downloads/paper.pdf sources/
# 3. Tell your agent
"Please ingest sources/paper.pdf into wiki"
Installation
Protocol Mode (Recommended)
No installation needed. Agent reads CLAUDE.md and operates directly.
CLI Mode (Optional)
Using uv (Fastest)
# Create virtual environment and install dependencies
uv venv
uv pip install -r src/requirements.txt
# Activate environment (Windows)
.venv\Scripts\activate
# Or Linux/macOS
source .venv/bin/activate
Using conda
# Create environment
conda create -n llm-wiki python=3.11
# Activate environment
conda activate llm-wiki
# Install dependencies
pip install -r src/requirements.txt
Using pip
# Create virtual environment
python -m venv .venv
# Activate environment
source .venv/bin/activate # Linux/macOS
.venv\Scripts\activate # Windows
# Install dependencies
pip install -r src/requirements.txt
Verify Installation
python -c "from src.llm_wiki.core import WikiManager; print('✓ Installation successful')"
Important Dependency Notes:
| Dependency | Version | Purpose | Notes |
|---|---|---|---|
| click | >=8.0.0 | CLI framework | - |
| pyyaml | >=6.0 | YAML parsing | - |
| pymupdf | >=1.25.0 | PDF processing | Primary PDF engine, best for CJK |
Optional dependencies (for enhanced features):
- numpy >=1.24.0 — Vector operations for embedding retrieval
- httpx >=0.27.0 — HTTP client for Ollama/local services
- openai >=1.0.0 — OpenAI embedding API
- mcp >=1.0.0 — MCP SDK for remote embedding providers
Fallback PDF dependency:
- pdfplumber >=0.11.8 — Table extraction fallback (security version required for CVE-2025-64512)
- pdfminer.six >=20251107 — PDF underlying library fallback
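Since the optional dependencies degrade gracefully, code can probe for them at runtime before enabling embedding features. A stdlib sketch; the descriptions mirror the optional dependency list above:

```python
# Detect which optional embedding-related dependencies are importable.
import importlib.util

OPTIONAL_DEPS = {
    "numpy": "vector operations for embedding retrieval",
    "httpx": "HTTP client for Ollama/local services",
    "openai": "OpenAI embedding API",
    "mcp": "MCP SDK for remote embedding providers",
}

def available_features() -> dict[str, str]:
    """Return the subset of optional dependencies available right now."""
    return {
        name: purpose
        for name, purpose in OPTIONAL_DEPS.items()
        if importlib.util.find_spec(name) is not None
    }

print(available_features())
```

`importlib.util.find_spec` checks importability without actually importing the package, so the probe stays cheap even when heavy libraries like numpy are installed.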
Project Structure
llm-wiki/
├── CLAUDE.md # ⭐ Core protocol: Agent behavior guidelines
├── AGENTS.md # Agent implementation guide (CLI usage)
├── SKILL.md # This file, machine-readable specification
├── log.md # Timeline log (append-only)
├── sources/ # Raw materials (user-managed + tool-fetched; Agent forbidden from writing LLM-generated content)
│ └── README.md
├── wiki/ # Generated knowledge pages (Agent-managed)
│ ├── index.md # Entry index
│ └── *.md # Topic pages
├── assets/ # Templates and configuration
│ ├── page_template.md
│ └── ingest_rules.md
├── src/ # SKILL implementation (optional, for CLI)
│ ├── llm_wiki/
│ └── requirements.txt
├── scripts/ # Auxiliary scripts
├── hooks/ # Platform hooks (optional)
└── examples/ # Example wiki
About sources/: Excluded from git by default to avoid repository bloat. Wiki only retains extracted knowledge; original files are managed separately (cloud storage, Zotero, etc.). See sources/README.md for tracking specific files.
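One way to express that convention is a small ignore rule set. This is a sketch of the pattern described above; the repository's actual .gitignore may differ:

```
# Keep raw materials out of version control, but track the explainer file
sources/*
!sources/README.md
```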
How It Works
Data Flow
+----------+ +--------------------+ +--------------+
| sources/ |---->| LLM Processing |---->| wiki/ |
| (Raw) | | (Extract + Link) | | (Structured) |
+----------+ +--------------------+ +--------------+
|
v
+----------+
| log.md |
| (Record) |
+----------+
Key Design
- CLAUDE.md as Protocol: Defines agent behavior standards that any person or agent can follow
- Pure Markdown: No database, no lock-in, native git version control
- Bidirectional Links: [[PageName]] format, compatible with Obsidian
- Cumulative Learning: Each query can generate new wiki pages; knowledge continuously accumulates
Query Mechanism
Current Implementation: Symbolic Navigation + LLM Synthesis (Default)
By default, this SKILL does not require Embedding/vector retrieval. Queries are completed through:
User asks question
|
v
+-------------------------------+
| 1. Read index.md | <-- Human/Agent-maintained category index
| Locate relevant topics |
+-------------------------------+
|
v
+-------------------------------+
| 2. Read relevant pages | <-- Discover associations through [[links]]
| and their link neighbors |
+-------------------------------+
|
v
+-------------------------------+
| 3. LLM Synthesis | <-- Generate answers based on read content
| Generate with citations | Citation format: [[PageName]]
+-------------------------------+
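Steps 1 and 2 amount to plain link traversal over Markdown files. A self-contained sketch; the tiny in-memory wiki below is illustrative, standing in for reading index.md and page files from disk:

```python
# Symbolic navigation: index lookup, then one hop of [[link]] traversal.
import re

WIKI = {
    "index": "## AI/ML\n- [[LoRA]]\n- [[Transformer]]\n",
    "LoRA": "LoRA freezes weights; see [[Fine-tuning]] and [[Adapter]].",
    "Fine-tuning": "Updates all parameters.",
    "Adapter": "Small bottleneck modules.",
}

LINK = re.compile(r"\[\[([^\]]+)\]\]")

def navigate(query: str, wiki: dict) -> list[str]:
    """Steps 1-2: locate topics via the index, then read link neighbors."""
    hits = [t for t in LINK.findall(wiki["index"]) if t.lower() in query.lower()]
    pages = []
    for title in hits:
        pages.append(title)
        pages.extend(n for n in LINK.findall(wiki.get(title, "")) if n in wiki)
    return pages  # step 3 (LLM synthesis) reads these pages

print(navigate("What is LoRA?", WIKI))  # -> ['LoRA', 'Fine-tuning', 'Adapter']
```

Everything here is deterministic string matching; only step 3, the synthesis of an answer with [[PageName]] citations, requires the LLM.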
Optional Enhancement: After enabling config.yaml embedding settings, CLI query --semantic adds hybrid search (Keyword Match + Vector Search + Link Traversal) for faster, more accurate retrieval.
Example Flow:
User asks: "What is LoRA?"
- Agent reads wiki/index.md, finds [[LoRA]] under "AI/ML" topic
- Agent reads wiki/LoRA.md, discovers links to [[Fine-tuning]] and [[Adapter]]
- Agent synthesizes answer: LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method — see [[LoRA]]. Compared to traditional [[Fine-tuning]], it only trains low-rank matrices...
Why is Embedding Optional?
| Consideration | Current Solution | Embedding Solution |
|---|---|---|
| Dependencies | Zero external dependencies | Requires Embedding API or local model |
| Cost | No additional fees | Charged per token/request |
| Privacy | Data not uploaded | Must send content to external service |
| Accuracy | Precise links, explainable | Approximate similarity, may retrieve irrelevant content |
| Scale | Suitable for 0-500 pages | Essential for large scale (1000+ pages) |
Conclusion: For personal/small team knowledge bases, maintaining index.md and page links is simpler and more effective than introducing Embedding. Embedding is available as an opt-in CLI enhancement when scale demands it.
When to Use Embedding?
Consider enabling config.yaml embedding settings when your wiki shows:
- Page count > 500, manual indexing hard to maintain
- Queries are mostly fuzzy semantic ("that paper about optimization...")
- Need cross-topic associations (user didn't mention keywords, but semantically related)
See ROADMAP.md for details.
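If you do enable it, the embedding section of config.yaml might look roughly like the following. The key names here are illustrative assumptions, not a documented schema; check the real config.yaml and ROADMAP.md:

```yaml
# Hypothetical sketch of embedding settings; verify against the real config.yaml
embedding:
  enabled: true
  provider: openai        # or: ollama, mcp (per the optional dependencies)
  model: text-embedding-3-small
  cache_dir: .embeddings/ # cache vectors locally so re-indexing is incremental
```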
Example Workflows
Scenario 1: Researching New Domain
# Day 1: Add materials
sources/
├── 2026-04-10-transformer-paper.pdf
└── 2026-04-10-lora-blog.md
User: Please ingest new materials
Claude: Processed 2 materials
- Created [[Transformer]] — Attention mechanism architecture
- Created [[LoRA]] — Low-rank adaptation
- Created [[Self-Attention]] — Self-attention details
- Updated wiki/index.md
Scenario 2: Deep Exploration
User: What's the difference between LoRA and full fine-tuning?
Claude (reads wiki/index.md → finds [[LoRA]] → synthesizes answer):
Based on our wiki:
**LoRA** — [[LoRA]]
- Freezes pre-trained weights, only trains low-rank matrices
- Parameter count reduced ~10,000x
**Full Fine-tuning**
- Updates all parameters
- High computational cost
Key differences:
1. Parameters: LoRA only trains 0.1%~1% of parameters [[Parameter Efficiency]]
2. Storage: LoRA only saves adapters, not full model
3. Composition: Multiple LoRAs can be stacked [[LoRA Composition]]
Was this helpful? Should I archive this answer to wiki?
Scenario 3: Knowledge Accumulation
User: Yes, archive this answer
Claude: Created [[LoRA vs Full Fine-tuning]]
- Extracted comparison points from conversation
- Linked to [[LoRA]] and [[Fine-tuning]]
- Added to FAQ section in wiki/index.md
Using with Obsidian
- Open the wiki/ directory in Obsidian
- Enjoy graph view, quick navigation, beautiful rendering
- Claude Code handles maintenance, Obsidian handles reading and thinking
Comparison with Alternatives
| Solution | Characteristics | Best For |
|---|---|---|
| This SKILL | Zero dependencies, pure text, Claude Code native | Personal knowledge management, research notes |
| Sage-Wiki | Full-featured, multimodal, standalone app | Team knowledge base, enterprise deployment |
| Obsidian + Plugins | Strong visualization, rich community | Existing Obsidian workflow |
| Notion/Logseq | Collaborative, real-time sync | Multi-user collaboration, mobile access |
Documentation
- CLAUDE.md — User-facing protocol (read this first)
- AGENTS.md — Implementation guide for agent developers
- SKILL.md — This file, machine-readable specification
- ROADMAP.md — Future plans
Contributing
Issues and PRs welcome!
Current TODO
- MCP server wrapper (for other Agents)
- Obsidian plugin (one-click sync)
- Incremental embedding for faster retrieval
- Multi-language support
License
MIT — free to use, modify, and distribute.
Inspired by Karpathy's llm-wiki