RAG Retrieval
Comprehensive patterns for building production RAG systems. Each category has individual rule files in rules/, loaded on demand.
Quick Reference
| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Core RAG | 4 | CRITICAL | Basic RAG, citations, hybrid search, context management |
| Embeddings | 3 | HIGH | Model selection, chunking, batch/cache optimization |
| Contextual Retrieval | 3 | HIGH | Context-prepending, hybrid BM25+vector, pipeline |
| HyDE | 3 | HIGH | Vocabulary mismatch, hypothetical document generation |
| Agentic RAG | 4 | HIGH | Self-RAG, CRAG, knowledge graphs, adaptive routing |
| Multimodal RAG | 3 | MEDIUM | Image+text retrieval, PDF chunking, cross-modal search |
| Query Decomposition | 3 | MEDIUM | Multi-concept queries, parallel retrieval, RRF fusion |
| Reranking | 3 | MEDIUM | Cross-encoder, LLM scoring, combined signals |
| PGVector | 4 | HIGH | PostgreSQL hybrid search, HNSW indexes, schema design |
Total: 30 rules across 9 categories
Core RAG
Fundamental patterns for retrieval, generation, and pipeline composition.
| Rule | File | Key Pattern |
|---|---|---|
| Basic RAG | rules/core-basic-rag.md | Retrieve + context + generate with citations |
| Hybrid Search | rules/core-hybrid-search.md | RRF fusion (k=60) for semantic + keyword |
| Context Management | rules/core-context-management.md | Token budgeting + sufficiency check |
| Pipeline Composition | rules/core-pipeline-composition.md | Composable Decompose → HyDE → Retrieve → Rerank |
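The RRF fusion named in the Hybrid Search rule can be sketched as below. This is a minimal illustration, not the contents of rules/core-hybrid-search.md: the function name `rrf_fuse` and the sample document IDs are hypothetical, and only the constant k=60 comes from the table above. Each document scores 1 / (k + rank) in every ranking it appears in, and the scores are summed.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document missing from a ranking contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a semantic search and a keyword search.
semantic = ["a", "b", "c"]
keyword = ["b", "d", "a"]
fused = rrf_fuse([semantic, keyword])
print([doc_id for doc_id, _ in fused])
```

Because RRF uses only ranks, the semantic and keyword scores never need to be on the same scale.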
Embeddings
Embedding models, chunking strategies, and production optimization.
| Rule | File | Key Pattern |
|---|---|---|
| Models & API | rules/embeddings-models.md | Model selection, batch API, similarity |
| Chunking | rules/embeddings-chunking.md | Semantic boundary splitting, 512 token sweet spot |
| Advanced | rules/embeddings-advanced.md | Redis cache, Matryoshka dims, batch processing |
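One way to read "semantic boundary splitting" is to split on paragraph breaks and pack paragraphs into chunks up to a token limit. The sketch below assumes that reading. The name `chunk_by_paragraph` is hypothetical, and whitespace-separated words stand in for tokens, so a real tokenizer would give different counts.

```python
def chunk_by_paragraph(text: str, max_tokens: int = 512) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_tokens (approximate)."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())  # assumption: word count as a rough token proxy
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "a b c\n\nd e\n\nf g h i"
print(chunk_by_paragraph(sample, max_tokens=5))
```

A single paragraph longer than `max_tokens` is kept whole as an oversized chunk here; splitting it further (for example by sentence) is left out of the sketch.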
Contextual Retrieval
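Anthropic's context-prepending technique. Anthropic reports up to 67% fewer retrieval failures when contextual embeddings and contextual BM25 are combined with reranking (49% without reranking).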
| Rule | File | Key Pattern |
|---|---|---|
| Context Prepending | rules/contextual-prepend.md | LLM-generated context + prompt caching |
| Hybrid Search | rules/contextual-hybrid.md | 40% BM25 / 60% vector weight split |
| Complete Pipeline | rules/contextual-pipeline.md | End-to-end indexing + hybrid retrieval |
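The 40% BM25 / 60% vector split could be applied as a weighted sum of normalized scores, as in the sketch below. Only the weights come from the table above. The min-max normalization step and the names `min_max` and `hybrid_scores` are assumptions made for illustration, since raw BM25 and cosine scores are not on comparable scales.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to the 0..1 range; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    bm25: dict[str, float],
    vector: dict[str, float],
    bm25_weight: float = 0.4,
    vector_weight: float = 0.6,
) -> list[tuple[str, float]]:
    """Combine keyword and vector scores with a fixed weight split."""
    b, v = min_max(bm25), min_max(vector)
    combined = {
        # A document missing from one result set scores 0 for that signal.
        doc_id: bm25_weight * b.get(doc_id, 0.0) + vector_weight * v.get(doc_id, 0.0)
        for doc_id in set(b) | set(v)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from the two retrievers.
ranked = hybrid_scores({"a": 10.0, "b": 5.0, "c": 0.0}, {"a": 0.2, "b": 0.8, "d": 0.5})
print([doc_id for doc_id, _ in ranked])
```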
HyDE
Hypothetical Document Embeddings for bridging vocabulary gaps.
| Rule | File | Key Pattern |
|---|---|---|
| Generation | rules/hyde-generation.md | Embed hypothetical doc, not query |
| Per-Concept | rules/hyde-per-concept.md | Parallel HyDE for multi-topic queries |
| Fallback | rules/hyde-fallback.md | 2-3s timeout → direct embedding fallback |
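The timeout-with-fallback pattern in the Fallback rule might look like the sketch below, using `asyncio.wait_for` from the standard library. The function name `text_for_embedding` and the `generate_hypothetical` callable are hypothetical placeholders for whatever LLM call produces the hypothetical document.

```python
import asyncio
from typing import Awaitable, Callable


async def text_for_embedding(
    query: str,
    generate_hypothetical: Callable[[str], Awaitable[str]],
    timeout_s: float = 3.0,
) -> str:
    """Return the hypothetical document to embed, or the raw query on timeout."""
    try:
        # HyDE path: embed the generated document rather than the query.
        return await asyncio.wait_for(generate_hypothetical(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Fallback path: embed the query directly so retrieval is never blocked.
        return query


async def slow_generator(query: str) -> str:
    """Stand-in for a slow LLM call (illustration only)."""
    await asyncio.sleep(0.2)
    return f"A passage answering: {query}"


print(asyncio.run(text_for_embedding("what is HNSW?", slow_generator, timeout_s=0.01)))
```

The caller embeds whatever string comes back, so the retrieval code does not need to know which path was taken.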
Agentic RAG
Self-correcting retrieval with LLM-driven decision making.
| Rule | File | Key Pattern |
|---|---|---|
| Self-RAG | rules/agentic-self-rag.md | Binary document grading for relevance |
| Corrective RAG | rules/agentic-corrective-rag.md | CRAG workflow with web fallback |
| Knowledge Graph | rules/agentic-knowledge-graph.md | KG + vector hybrid for entity-rich domains |
| Adaptive Retrieval | rules/agentic-adaptive-retrieval.md | Query routing to optimal strategy |
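A corrective loop that grades documents, rewrites the query a bounded number of times, and then falls back could be sketched as follows. This is an illustration under stated assumptions, not the CRAG workflow from the rule file: `corrective_retrieve` and its four callables are hypothetical stand-ins for a retriever, a binary relevance grader, a query rewriter, and a web-search fallback.

```python
from typing import Callable


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    fallback: Callable[[str], list[str]],
    max_retries: int = 2,
) -> list[str]:
    """Retrieve, keep only documents graded relevant, retry with a rewritten query."""
    current = query
    for attempt in range(max_retries + 1):
        # Binary grading: each document is either kept or dropped.
        relevant = [doc for doc in retrieve(current) if grade(query, doc)]
        if relevant:
            return relevant
        if attempt < max_retries:
            current = rewrite(current)  # bounded, so the loop cannot run forever
    # Nothing relevant after all retries: hand off to the fallback (e.g. web search).
    return fallback(query)
```

The explicit `max_retries` and the final `fallback` call guard against unbounded rewrite loops and a workflow with no exit path.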
Multimodal RAG
Image + text retrieval with cross-modal search.
| Rule | File | Key Pattern |
|---|---|---|
| Embeddings | rules/multimodal-embeddings.md | CLIP, SigLIP 2, Voyage multimodal-3 |
| Chunking | rules/multimodal-chunking.md | PDF extraction preserving images |
| Pipeline | rules/multimodal-pipeline.md | Dedup + hybrid retrieval + generation |
Query Decomposition
Breaking complex queries into concepts for parallel retrieval.
| Rule | File | Key Pattern |
|---|---|---|
| Detection | rules/query-detection.md | Heuristic indicators (<1ms fast path) |
| Decompose + RRF | rules/query-decompose.md | LLM concept extraction + parallel retrieval |
| HyDE Combo | rules/query-hyde-combo.md | Decompose + HyDE for maximum coverage |
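A heuristic fast path for the Detection rule might be a single regex check that decides whether a query is worth sending to the LLM for decomposition. The indicator words and the minimum length below are assumptions chosen for illustration; the actual indicators in rules/query-detection.md may differ.

```python
import re

# Hypothetical indicator words that often join two or more concepts.
_MULTI_CONCEPT = re.compile(
    r"\b(and|or|vs|versus|compared to|as well as)\b", re.IGNORECASE
)


def looks_multi_concept(query: str, min_words: int = 4) -> bool:
    """Cheap check run before any LLM call; True means 'try decomposition'."""
    if len(query.split()) < min_words:
        return False  # very short queries are treated as single-concept
    return bool(_MULTI_CONCEPT.search(query))


for q in ["How do HNSW and IVFFlat compare?", "What is pgvector?"]:
    print(q, "->", looks_multi_concept(q))
```

A false positive only costs one extra LLM call, while a false negative silently skips decomposition, so a heuristic like this is usually tuned toward recall.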
Reranking
Post-retrieval re-scoring for higher precision.
| Rule | File | Key Pattern |
|---|---|---|
| Cross-Encoder | rules/reranking-cross-encoder.md | ms-marco-MiniLM (~50ms, free) |
| LLM Reranking | rules/reranking-llm.md | Batch scoring + Cohere API |
| Combined | rules/reranking-combined.md | Multi-signal weighted scoring |
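Multi-signal weighted scoring can be sketched as a weighted sum over per-candidate signals, followed by a cut to the top results. The `Candidate` fields, the 0.3 / 0.7 weights, and the assumption that both scores are already scaled to 0..1 are illustrative choices, not values taken from the rule files.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    vector_score: float  # first-stage retrieval similarity, assumed scaled to 0..1
    rerank_score: float  # cross-encoder or LLM relevance, assumed scaled to 0..1


def rerank(
    candidates: list[Candidate],
    top_n: int = 10,
    vector_weight: float = 0.3,  # hypothetical weights
    rerank_weight: float = 0.7,
) -> list[Candidate]:
    """Order candidates by a weighted combination of signals and keep top_n."""

    def combined(c: Candidate) -> float:
        return vector_weight * c.vector_score + rerank_weight * c.rerank_score

    return sorted(candidates, key=combined, reverse=True)[:top_n]


pool = [Candidate("a", 0.9, 0.1), Candidate("b", 0.5, 0.9), Candidate("c", 0.7, 0.5)]
print([c.doc_id for c in rerank(pool, top_n=2)])
```

In practice the pool would be the larger first-stage result set, with `top_n` set to the number of documents passed on to generation.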
PGVector
Production hybrid search with PostgreSQL.
| Rule | File | Key Pattern |
|---|---|---|
| Schema | rules/pgvector-schema.md | HNSW index + pre-computed tsvector |
| Hybrid Search | rules/pgvector-hybrid-search.md | SQLAlchemy RRF with FULL OUTER JOIN |
| Indexing | rules/pgvector-indexing.md | HNSW (17x faster) vs IVFFlat |
| Metadata | rules/pgvector-metadata.md | Filtering, boosting, Redis 8 comparison |
Quick Start Example
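from openai import AsyncOpenAI

client = AsyncOpenAI()  # async client; reads OPENAI_API_KEY from the environment
CHAT_MODEL = "gpt-4o-mini"  # example model name (an assumption); substitute your own

async def rag_query(question: str, vector_db, top_k: int = 5) -> dict:
    """Basic RAG with citations."""
    # vector_db: placeholder for any store with an async search(query, limit=...)
    # that returns objects exposing .text and .metadata
    docs = await vector_db.search(question, limit=top_k)
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))
    response = await client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    answer = response.choices[0].message.content
    return {"answer": answer, "sources": [d.metadata["source"] for d in docs]}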
Key Decisions
| Decision | Recommendation |
|---|---|
| Embedding model | text-embedding-3-small (general), voyage-3 (production) |
| Chunk size | 256-1024 tokens (512 typical) |
| Hybrid weight | 40% BM25 / 60% vector |
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Reranking | Retrieve 50, rerank to 10 |
| Vector index | HNSW (production), IVFFlat (high-volume) |
| HyDE timeout | 2-3 seconds with fallback |
| Query decomposition | Heuristic first, LLM only if multi-concept |
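The context budget row can be enforced by a small packing step before prompt assembly. In the sketch below, `fit_to_budget` is a hypothetical helper, the 4000-token default mirrors the low end of the 4K-8K range, and the default counter uses whitespace-separated words as a rough stand-in for a real tokenizer.

```python
from typing import Callable


def fit_to_budget(
    docs: list[str],
    budget_tokens: int = 4000,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> list[str]:
    """Keep documents in ranked order until the next one would exceed the budget."""
    selected: list[str] = []
    used = 0
    for doc in docs:
        cost = count_tokens(doc)
        if used + cost > budget_tokens:
            break  # stop at the first overflow so ranked order is preserved
        selected.append(doc)
        used += cost
    return selected


print(fit_to_budget(["a b c", "d e", "f g h"], budget_tokens=5))
```

Stopping at the first overflow, rather than skipping ahead to smaller documents, keeps the highest-ranked material and avoids promoting weaker chunks just because they are short.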
Common Mistakes
- No citation tracking (unverifiable answers)
- Context too large (dilutes relevance)
- Single retrieval method (misses keyword matches)
- Not chunking long documents (context gets lost)
- Embedding queries differently than documents
- No fallback path in agentic RAG (workflow hangs)
- Infinite rewrite loops (no retry limit)
- Using wrong similarity metric (cosine vs euclidean)
- Not caching embeddings (recomputing unchanged content)
- Missing image captions in multimodal RAG (limits text search)
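To avoid recomputing embeddings for unchanged content, one option is to key a cache on a hash of the text. The sketch below uses an in-memory dict as a stand-in for Redis or another shared store. The `EmbeddingCache` class and the `embed_fn` callable are hypothetical names for illustration.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Cache embeddings by content hash so unchanged text is embedded once."""

    def __init__(self, embed_fn: Callable[[str], list[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: dict[str, list[float]] = {}  # stand-in for Redis
        self.misses = 0

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)  # only called on a miss
        return self._store[key]


def fake_embed(text: str) -> list[float]:
    """Toy embedding for illustration only."""
    return [float(len(text))]


cache = EmbeddingCache(fake_embed)
cache.get("hello")
cache.get("hello")  # served from the cache
print(cache.misses)
```

If the embedding model changes, the model name should be folded into the cache key as well, since vectors from different models are not interchangeable.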
Evaluations
See test-cases.json for 30 test cases across all categories.
Related Skills
- ork:langgraph - LangGraph workflow patterns (for agentic RAG workflows)
- caching - Cache RAG responses for repeated queries
- ork:golden-dataset - Evaluate retrieval quality
- ork:llm-integration - Local embeddings with nomic-embed-text
- vision-language-models - Image analysis for multimodal RAG
- ork:database-patterns - Schema design for vector search
Capability Details
retrieval-patterns
Keywords: retrieval, context, chunks, relevance, rag
Solves:
- Retrieve relevant context for LLM
- Implement RAG pipeline with citations
- Optimize retrieval quality
hybrid-search
Keywords: hybrid, bm25, vector, fusion, rrf
Solves:
- Combine keyword and semantic search
- Implement reciprocal rank fusion
- Balance precision and recall
embeddings
Keywords: embedding, text to vector, vectorize, chunk, similarity
Solves:
- Convert text to vector embeddings
- Choose embedding models and dimensions
- Implement chunking strategies
contextual-retrieval
Keywords: contextual, anthropic, context-prepend, bm25
Solves:
- Prepend context to chunks for better retrieval
- Reduce retrieval failures by up to 67% (Anthropic's reported figure, with reranking)
- Implement hybrid BM25+vector search
hyde
Keywords: hyde, hypothetical, vocabulary mismatch
Solves:
- Bridge vocabulary gaps in semantic search
- Generate hypothetical documents for embedding
- Handle abstract or conceptual queries
agentic-rag
Keywords: self-rag, crag, corrective, adaptive, grading
Solves:
- Build self-correcting RAG workflows
- Grade document relevance
- Implement web search fallback
multimodal-rag
Keywords: multimodal, image, clip, vision, pdf
Solves:
- Build RAG with images and text
- Cross-modal search (text → image)
- Process PDFs with mixed content
query-decomposition
Keywords: decompose, multi-concept, complex query
Solves:
- Break complex queries into concepts
- Parallel retrieval per concept
- Improve coverage for compound questions
reranking
Keywords: rerank, cross-encoder, precision, scoring
Solves:
- Improve search precision post-retrieval
- Score relevance with cross-encoder or LLM
- Combine multiple scoring signals
pgvector-search
Keywords: pgvector, postgresql, hnsw, tsvector, hybrid
Solves:
- Production hybrid search with PostgreSQL
- HNSW vs IVFFlat index selection
- SQL-based RRF fusion