LEANN Skill

LEANN (Learned Embedding ANchor Navigation) - A graph-based selective recomputation system achieving 97% storage reduction for local RAG indexes while maintaining fast retrieval performance.

Core Principle

Use leann for persistent local code indexes with minimal storage overhead.

Instead of storing all embeddings at full precision (3 GB for 1M files), leann stores only strategic anchors and reconstructs others on-demand (128 MB for same dataset - 95.8% reduction).

When to Use LEANN

Primary Use Cases

1. Large Codebase Indexing (10K+ files)

Monorepos with multiple services
Enterprise codebases that exceed vector database free tiers
Projects requiring fast local semantic search without cloud dependencies

2. Memory-Constrained Environments

Development machines with limited RAM
CI/CD pipelines needing index validation
Edge deployments or air-gapped systems

3. Cost-Sensitive RAG Applications

Avoiding Pinecone/Weaviate monthly costs
Self-hosted vector search with minimal infrastructure
Batch processing of large document collections

4. Real-Time Code Navigation

IDE integrations for semantic code search
Developer tools needing instant relevance feedback
Documentation search within editors

When NOT to Use LEANN

Use traditional vector DBs instead when:

Dataset is small (<1K items) - overhead not worth it
Cloud infrastructure is required (multi-tenant, global CDN)
Need advanced features (hybrid search, filtering, multi-tenancy)
Absolute lowest latency required (<5ms) - leann trades latency for storage

Architecture Overview

Two-Stage Query Process

Query → [Stage 1: Anchor Graph Search] → [Stage 2: Lazy Reconstruction] → Results

Stage 1: Fast HNSW/DiskANN traversal (1-5ms)
Stage 2: Reconstruct top candidates (5-25ms)
Total: 10-30ms (HNSW) or 50-200ms (DiskANN)

Storage Model

Naive:  N × D × 4 bytes (full embeddings)
LEANN:  M × D × 4 + N × (D / 8) bytes (anchors + compressed deltas)

Example (100K files, 768 dims):
Naive:  100,000 × 768 × 4     = 307 MB
LEANN:  1,000 × 768 × 4       = 3 MB (anchors)
        100,000 × 96          = 9.6 MB (deltas)
        Total                 ≈ 12.6 MB (95.9% reduction)

Key Components

Anchor Graph: HNSW (in-memory) or DiskANN (disk-based) graph of anchor embeddings
Product Quantization: Compress delta vectors (delta = embedding - anchor)
ZMQ Embedding Server: GPU-accelerated embedding generation
Delta Index: Incremental updates without full rebuild

Quick Start

1. Installation

# Via pip
pip install leann

# Via conda
conda install -c conda-forge leann

# From source
git clone https://github.com/leann-index/leann.git
cd leann && pip install -e .

2. Start Embedding Server

# Docker (recommended - GPU support)
docker run -d \
  --name leann-embeddings \
  --gpus all \
  -p 5555:5555 \
  -e MODEL=sentence-transformers/all-MiniLM-L6-v2 \
  leann/embedding-server:latest

# Python script (alternative)
python -m leann.server \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --port 5555 \
  --device cuda

3. Create Index

# Basic indexing
leann index create \
  --input /path/to/codebase \
  --output ./leann.index \
  --config leann.config.json

# With progress tracking
leann index create \
  --input /path/to/codebase \
  --output ./leann.index \
  --progress \
  --verbose

4. Query Index

# CLI query
leann query \
  --index ./leann.index \
  --query "JWT authentication middleware" \
  --top-k 10

# Watch mode (live updates)
leann serve \
  --index ./leann.index \
  --watch \
  --port 8080

Configuration

Minimal Configuration

{
  "backend": "hnsw",
  "complexity": "medium",
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 300
  },
  "quantization": {
    "subVectors": 96,
    "codebookSize": 256
  },
  "embeddingServer": {
    "endpoint": "tcp://localhost:5555",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "batchSize": 128,
    "timeout": 30000
  }
}

Production Configuration

{
  "backend": "hnsw",
  "complexity": "high",
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 1000,
    "samples": 20000
  },
  "quantization": {
    "subVectors": 96,
    "codebookSize": 256,
    "trainingSamples": 50000
  },
  "embeddingServer": {
    "endpoint": "tcp://localhost:5555",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "batchSize": 256,
    "timeout": 60000
  },
  "incremental": {
    "enabled": true,
    "deltaIndexThreshold": 1000,
    "rebuildSchedule": "0 2 * * *"
  },
  "hnsw": {
    "M": 32,
    "efConstruction": 400,
    "efSearch": 200
  }
}

Configuration Parameters

Backend Selection

backend: "hnsw" (In-Memory)

Use when index fits in RAM
Query latency: 10-30ms
Suitable for: <1M files, <50K anchors

backend: "diskann" (Disk-Based)

Use when index exceeds RAM
Query latency: 50-200ms
Suitable for: 1M+ files, massive monorepos

Complexity Levels

Level	M	efConstruction	efSearch	Build Time	Accuracy
low	12	100	50	Fast	Good
medium	16	200	100	Moderate	Better
high	32	400	200	Slow	Best

Anchor Selection Strategies

1. Random (Fastest)

{
  "type": "random",
  "count": 300
}

Speed: O(N)
Quality: Acceptable for homogeneous codebases
Use when: Prototyping or single-language projects

2. K-Means (Recommended)

{
  "type": "kmeans",
  "clusters": 300,
  "samples": 10000
}

Speed: O(N × k × iterations)
Quality: Excellent coverage
Use when: Production deployments (default)

3. Max-Coverage (Best Quality)

{
  "type": "max-coverage",
  "count": 300,
  "diversityThreshold": 0.2
}

Speed: O(N × M²)
Quality: Best for sparse distributions
Use when: High-value outliers must be findable

Common Workflows

Workflow 1: Initial Indexing

# 1. Discover codebase
find /path/to/codebase -type f \( -name "*.py" -o -name "*.ts" \) | wc -l
# Output: 15,000 files

# 2. Calculate recommended anchors
# Rule: M = √N → √15000 ≈ 122 → recommend 300 for better coverage

# 3. Create config
cat > leann.config.json <<EOF
{
  "backend": "hnsw",
  "complexity": "medium",
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 300,
    "samples": 5000
  },
  "quantization": {
    "subVectors": 96,
    "codebookSize": 256
  },
  "embeddingServer": {
    "endpoint": "tcp://localhost:5555",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "batchSize": 128,
    "timeout": 30000
  }
}
EOF

# 4. Build index
leann index create \
  --config leann.config.json \
  --input /path/to/codebase \
  --output ./leann.index \
  --progress

# 5. Validate
leann index validate ./leann.index

# 6. Test query
leann query \
  --index ./leann.index \
  --query "database connection pooling" \
  --top-k 5

Workflow 2: CI/CD Integration

# .github/workflows/update-index.yml
name: Update LEANN Index

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * 0'  # Weekly full rebuild

jobs:
  update-index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup LEANN
        run: |
          pip install leann
          docker run -d -p 5555:5555 leann/embedding-server:latest

      - name: Incremental Update
        run: |
          leann index update \
            --index ./leann.index \
            --git-diff origin/main HEAD \
            --output ./leann-updated.index

      - name: Validate Index
        run: |
          leann index validate ./leann-updated.index
          if [ $? -ne 0 ]; then
            # Validation failed - trigger full rebuild
            leann index create \
              --config leann.config.json \
              --input . \
              --output ./leann-updated.index
          fi

      - name: Upload Artifact
        uses: actions/upload-artifact@v3
        with:
          name: leann-index
          path: ./leann-updated.index

Workflow 3: IDE Integration

# VS Code extension example
from leann import LEANNIndex

class SemanticSearchProvider:
    def __init__(self, index_path):
        self.index = LEANNIndex.load(index_path)

    def search(self, query, top_k=10):
        results = self.index.query(
            query=query,
            top_k=top_k,
            mode='retrieve',
            filters={'languages': ['python', 'typescript']}
        )
        return [
            {
                'path': item.item.path,
                'score': item.score,
                'preview': self._get_preview(item.item)
            }
            for item in results.items
        ]

Workflow 4: Polyglot Codebase

{
  "backend": "hnsw",
  "complexity": "medium",
  "anchorSelection": {
    "type": "stratified-kmeans",
    "totalClusters": 500,
    "stratification": {
      "enabled": true,
      "groupBy": "language",
      "minPerGroup": 20,
      "allocation": "proportional"
    }
  },
  "filters": {
    "languages": ["python", "typescript", "go", "rust"],
    "excludePaths": ["node_modules/", "__pycache__/", "vendor/", "target/"]
  }
}

Benefit: Ensures each language gets proportional representation in anchors, preventing Python from dominating a Python/TypeScript/Go codebase.

Performance Tuning

Query Latency Optimization

Problem: Queries taking >50ms

Diagnosis:

leann index stats ./leann.index

# Check:
# - Avg candidates evaluated (should be <1000)
# - efSearch parameter (default: 100)

Solutions:

Reduce efSearch (faster, lower recall)

{
  "hnsw": {
    "efSearch": 50
  }
}

Increase anchor count (more memory, better coverage)

leann index rebuild \
  --index ./leann.index \
  --new-anchor-count 1000 \
  --output ./leann-improved.index

Use early termination

leann query \
  --index ./leann.index \
  --query "authentication" \
  --top-k 10 \
  --early-terminate \
  --confidence-threshold 0.9

Storage Optimization

Problem: Index larger than expected

Diagnosis:

leann index stats ./leann.index

# Check:
# - Anchor count (should be ~√N)
# - PQ configuration (96 subvectors × 256 codebook is standard)

Solutions:

Reduce anchor count (lower quality, less storage)

{
  "anchorSelection": {
    "clusters": 100  // Reduced from 300
  }
}

Increase PQ compression (lower quality, less storage)

{
  "quantization": {
    "subVectors": 128,  // More aggressive compression
    "codebookSize": 128
  }
}

Accuracy Improvement

Problem: Poor retrieval relevance

Diagnosis:

leann index validate ./leann.index --verbose

# Check:
# - Coverage score (target: >0.80)
# - Quantization error (target: <0.05)

Solutions:

Increase anchor count

leann index rebuild \
  --index ./leann.index \
  --new-anchor-count 1000 \
  --output ./leann-improved.index

Use k-means selection (better than random)

{
  "anchorSelection": {
    "type": "kmeans",  // Changed from "random"
    "clusters": 500
  }
}

Enable re-ranking with cross-encoder

leann query \
  --index ./leann.index \
  --query "authentication middleware" \
  --mode rerank \
  --top-k 10

Monitoring & Health

Key Metrics

1. Coverage Score (target: >0.80)

leann index validate ./leann.index --metric coverage

Measures how uniformly anchors cover embedding space.

2. Quantization Error (target: <0.05)

leann index validate ./leann.index --metric quantization

Measures average reconstruction error from PQ compression.

3. Utilization Balance (target: >0.70)

leann index validate ./leann.index --metric balance

Checks if anchors are evenly utilized (no overloaded anchors).

Rebuild Triggers

Automatic rebuild when:

Coverage score drops below 0.70
Delta index exceeds threshold (default: 10% of main index)
Query latency degrades >30% from baseline
Manual trigger: leann index rebuild

Health Check Script

#!/bin/bash
# scripts/index-validator.sh

INDEX="./leann.index"

echo "LEANN Index Health Check"
echo "========================"

# Check coverage
COVERAGE=$(leann index validate $INDEX --metric coverage --json | jq -r '.score')
echo "Coverage: $COVERAGE (target: >0.80)"

# Check quantization error
QUANT_ERROR=$(leann index validate $INDEX --metric quantization --json | jq -r '.error')
echo "Quantization Error: $QUANT_ERROR (target: <0.05)"

# Check balance
BALANCE=$(leann index validate $INDEX --metric balance --json | jq -r '.score')
echo "Balance: $BALANCE (target: >0.70)"

# Check delta index size
DELTA_SIZE=$(leann index stats $INDEX --json | jq -r '.deltaIndexSize')
MAIN_SIZE=$(leann index stats $INDEX --json | jq -r '.mainIndexSize')
DELTA_RATIO=$(echo "scale=2; $DELTA_SIZE / $MAIN_SIZE * 100" | bc)
echo "Delta Index: $DELTA_RATIO% of main (rebuild at 10%)"

# Recommend actions
if (( $(echo "$COVERAGE < 0.70" | bc -l) )); then
  echo "⚠️  WARNING: Coverage too low - consider rebuilding with more anchors"
fi

if (( $(echo "$QUANT_ERROR > 0.05" | bc -l) )); then
  echo "⚠️  WARNING: High quantization error - consider reducing PQ compression"
fi

if (( $(echo "$DELTA_RATIO > 10" | bc -l) )); then
  echo "⚠️  WARNING: Delta index large - recommend full rebuild"
fi

Integration Patterns

Pattern 1: Python API

from leann import LEANNIndex, QueryRequest

# Load index
index = LEANNIndex.load('./leann.index')

# Basic query
results = index.query(
    query="database connection pooling",
    top_k=10,
    mode='retrieve'
)

for item in results.items:
    print(f"{item.item.path} (score: {item.score:.2f})")
    print(f"  {item.preview}\n")

# Query with filters
results = index.query(
    query="authentication middleware",
    top_k=5,
    filters={
        'languages': ['typescript'],
        'paths': ['src/middleware/*.ts']
    }
)

# Batch queries
queries = [
    "JWT authentication",
    "database migration",
    "error handling"
]
results_batch = index.batch_query(queries, top_k=5)

Pattern 2: REST API

from fastapi import FastAPI
from leann import LEANNIndex

app = FastAPI()
index = LEANNIndex.load('./leann.index')

@app.post('/search')
async def search(query: str, top_k: int = 10):
    results = index.query(query=query, top_k=top_k)
    return {
        'results': [
            {
                'path': item.item.path,
                'score': item.score,
                'preview': item.preview
            }
            for item in results.items
        ],
        'latency': results.latency
    }

@app.get('/health')
async def health():
    stats = index.get_stats()
    return {
        'status': 'healthy' if stats.healthScore > 70 else 'degraded',
        'healthScore': stats.healthScore,
        'indexSize': stats.storageSize,
        'avgLatency': stats.avgLatency
    }

Pattern 3: CLI Tool

#!/bin/bash
# Wrapper script for common queries

function search() {
  leann query \
    --index ~/.cache/leann/project.index \
    --query "$1" \
    --top-k "${2:-10}" \
    --format json | jq -r '.items[] | "\(.item.path):\(.item.startLine)"'
}

function update-index() {
  leann index update \
    --index ~/.cache/leann/project.index \
    --watch \
    --auto-rebuild
}

# Usage:
# search "database connection" 5
# update-index

Troubleshooting

Issue: Out of Memory During Indexing

Error:

Cannot allocate memory for k-means clustering

Solution: Reduce sampling in anchor selection:

{
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 300,
    "samples": 5000  // Reduced from 10000
  }
}

Issue: Embedding Server Connection Failed

Error:

Connection refused: tcp://localhost:5555

Solution:

# Check if server is running
docker ps | grep leann-embeddings

# If not, start it
docker run -d -p 5555:5555 leann/embedding-server:latest

# Test connection
curl http://localhost:5555/health

Issue: Low Query Accuracy

Error:

Results not relevant to query

Diagnosis:

leann index validate ./leann.index --verbose
# Check coverage score and quantization error

Solution:

# Rebuild with more anchors and k-means selection
leann index rebuild \
  --index ./leann.index \
  --anchor-strategy kmeans \
  --anchor-count 1000 \
  --output ./leann-improved.index

Resources

Codebase Documentation:

Type definitions: @architect/leann-codebase/types/core.ts
Anchor selection principles: @architect/leann-codebase/principles/anchor-selection.md
Graph navigation: @architect/leann-codebase/principles/graph-navigation.md
Indexing workflows: @architect/leann-codebase/templates/indexing-workflow.md

References:

Quick commands: @architect/leann/assets/cheatsheet.md
Indexing patterns: @architect/leann/references/indexing-patterns.md
Configuration guide: @architect/leann/references/configuration.md

External:

GitHub: https://github.com/leann-index/leann
Documentation: https://leann-index.github.io
Paper: "LEANN: Learned Embedding Anchor Navigation"

Summary

Core Value Proposition:

97% storage reduction vs traditional vector databases
10-30ms query latency (local, no network overhead)
Zero cloud costs (self-hosted)
Incremental updates (no nightly rebuilds)

Best For:

Large codebases (10K+ files)
Memory-constrained environments
Cost-sensitive RAG applications
Local-first semantic search

Decision Tree:

Need semantic code search?
├─ Dataset < 1K items → Use simple vector DB (overhead not worth it)
└─ Dataset > 1K items
    ├─ Cloud infrastructure required → Use Pinecone/Weaviate
    └─ Local-first / cost-sensitive → Use LEANN
        ├─ Index < RAM → HNSW backend (10-30ms)
        └─ Index > RAM → DiskANN backend (50-200ms)

Golden Rules:

Start with medium complexity, increase only if needed
Use k-means anchor selection for production
Monitor coverage score - rebuild if <0.70
Enable incremental updates for CI/CD
Use re-ranking for user-facing search

leann

Safety Notice

Copy this and send it to your AI assistant to learn

LEANN Skill

Core Principle

When to Use LEANN

Primary Use Cases

When NOT to Use LEANN

Architecture Overview

Two-Stage Query Process

Storage Model

Key Components

Quick Start

1. Installation

2. Start Embedding Server

3. Create Index

4. Query Index

Configuration

Minimal Configuration

Production Configuration

Configuration Parameters

Backend Selection

Complexity Levels

Anchor Selection Strategies

Common Workflows

Workflow 1: Initial Indexing

Workflow 2: CI/CD Integration

Workflow 3: IDE Integration

Workflow 4: Polyglot Codebase

Performance Tuning

Query Latency Optimization

Storage Optimization

Accuracy Improvement

Monitoring & Health

Key Metrics

Rebuild Triggers

Health Check Script

Integration Patterns

Pattern 1: Python API

Pattern 2: REST API

Pattern 3: CLI Tool

Troubleshooting

Issue: Out of Memory During Indexing

Issue: Embedding Server Connection Failed

Issue: Low Query Accuracy

Resources

Summary

Source Transparency

Related Skills

graph

code refactoring

dspy-code

code-refactor