llamaindex

LlamaIndex - Data Framework for LLM Applications

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install the skill

Install skill "llamaindex" with this command: npx skills add davila7/claude-code-templates/davila7-claude-code-templates-llamaindex

The leading framework for connecting LLMs with your data.

When to use LlamaIndex

Use LlamaIndex when:

  • Building RAG (retrieval-augmented generation) applications

  • Need document question-answering over private data

  • Ingesting data from multiple sources (300+ connectors)

  • Creating knowledge bases for LLMs

  • Building chatbots with enterprise data

  • Need structured data extraction from documents

Metrics:

  • 45,100+ GitHub stars

  • 23,000+ repositories use LlamaIndex

  • 300+ data connectors (LlamaHub)

  • 1,715+ contributors

  • v0.14.7 (stable)

Use alternatives instead:

  • LangChain: More general-purpose, better for agents

  • Haystack: Production search pipelines

  • txtai: Lightweight semantic search

  • Chroma: Just need vector storage

Quick start

Installation

Starter package (recommended)

pip install llama-index

Or minimal core + specific integrations

pip install llama-index-core
pip install llama-index-llms-openai
pip install llama-index-embeddings-openai

5-line RAG example

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

Load documents

documents = SimpleDirectoryReader("data").load_data()

Create index

index = VectorStoreIndex.from_documents(documents)

Query

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

Core concepts

  1. Data connectors - Load documents

from llama_index.core import SimpleDirectoryReader, Document
from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.github import GithubRepositoryReader

Directory of files

documents = SimpleDirectoryReader("./data").load_data()

Web pages

reader = SimpleWebPageReader()
documents = reader.load_data(["https://example.com"])

GitHub repository

reader = GithubRepositoryReader(owner="user", repo="repo")
documents = reader.load_data(branch="main")

Manual document creation

doc = Document(
    text="This is the document content",
    metadata={"source": "manual", "date": "2025-01-01"}
)

  2. Indices - Structure data

from llama_index.core import VectorStoreIndex, ListIndex, TreeIndex

Vector index (most common - semantic search)

vector_index = VectorStoreIndex.from_documents(documents)

List index (sequential scan)

list_index = ListIndex.from_documents(documents)

Tree index (hierarchical summary)

tree_index = TreeIndex.from_documents(documents)

Save index

index.storage_context.persist(persist_dir="./storage")

Load index

from llama_index.core import load_index_from_storage, StorageContext

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

  3. Query engines - Ask questions

Basic query

query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)

Streaming response

query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Explain quantum computing")
for text in response.response_gen:
    print(text, end="", flush=True)

Custom configuration

query_engine = index.as_query_engine(
    similarity_top_k=3,       # Return top 3 chunks
    response_mode="compact",  # Or "tree_summarize", "simple_summarize"
    verbose=True
)

  4. Retrievers - Find relevant chunks

Vector retriever

retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("machine learning")

With filtering

from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

retriever = index.as_retriever(
    similarity_top_k=3,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="category", value="tutorial")])
)

Custom retriever

from llama_index.core.retrievers import BaseRetriever

class CustomRetriever(BaseRetriever):
    def _retrieve(self, query_bundle):
        # Your custom retrieval logic; return a list of NodeWithScore
        return nodes

Agents with tools

Basic agent

from llama_index.core.agent import FunctionAgent
from llama_index.llms.openai import OpenAI

Define tools

def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

Create agent

llm = OpenAI(model="gpt-4o") agent = FunctionAgent.from_tools( tools=[multiply, add], llm=llm, verbose=True )

Use agent

response = agent.chat("What is 25 * 17 + 142?") print(response)

RAG agent (document search + tools)

from llama_index.core.tools import QueryEngineTool

Create index as before

index = VectorStoreIndex.from_documents(documents)

Wrap query engine as tool

query_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="python_docs",
    description="Useful for answering questions about Python programming"
)

Agent with document search + calculator

agent = FunctionAgent.from_tools(
    tools=[query_tool, multiply, add],
    llm=llm
)

Agent decides when to search docs vs calculate

response = agent.chat("According to the docs, what is Python used for?")

Advanced RAG patterns

Chat engine (conversational)

from llama_index.core.chat_engine import CondensePlusContextChatEngine

Chat with memory

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",  # Or "context", "react"
    verbose=True
)

Multi-turn conversation

response1 = chat_engine.chat("What is Python?")
response2 = chat_engine.chat("Can you give examples?")  # Remembers context
response3 = chat_engine.chat("What about web frameworks?")

Metadata filtering

from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

Filter by metadata

filters = MetadataFilters(
    filters=[
        ExactMatchFilter(key="category", value="tutorial"),
        ExactMatchFilter(key="difficulty", value="beginner")
    ]
)

retriever = index.as_retriever(
    similarity_top_k=3,
    filters=filters
)

query_engine = index.as_query_engine(filters=filters)

Structured output

from pydantic import BaseModel
from llama_index.core.output_parsers import PydanticOutputParser

class Summary(BaseModel):
    title: str
    main_points: list[str]
    conclusion: str

Get structured response

output_parser = PydanticOutputParser(output_cls=Summary)
query_engine = index.as_query_engine(output_parser=output_parser)

response = query_engine.query("Summarize the document") summary = response # Pydantic model print(summary.title, summary.main_points)

Data ingestion patterns

Multiple file types

Load all supported formats

documents = SimpleDirectoryReader(
    "./data",
    recursive=True,
    required_exts=[".pdf", ".docx", ".txt", ".md"]
).load_data()

Web scraping

from llama_index.readers.web import BeautifulSoupWebReader

reader = BeautifulSoupWebReader()
documents = reader.load_data(urls=[
    "https://docs.python.org/3/tutorial/",
    "https://docs.python.org/3/library/"
])

Database

from llama_index.readers.database import DatabaseReader

reader = DatabaseReader(
    uri="postgresql://user:pass@localhost/db"
)
documents = reader.load_data(query="SELECT * FROM articles")

API endpoints

import requests
from llama_index.readers.json import JSONReader

# JSONReader reads local files, so fetch the API response to disk first
with open("data.json", "w") as f:
    f.write(requests.get("https://api.example.com/data.json").text)

documents = JSONReader().load_data("data.json")

Vector store integrations

Chroma (local)

from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

Initialize Chroma

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_collection")

Create vector store

vector_store = ChromaVectorStore(chroma_collection=collection)

Use in index

from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Pinecone (cloud)

from llama_index.vector_stores.pinecone import PineconeVectorStore
from pinecone import Pinecone

Initialize Pinecone

pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")

Create vector store

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

FAISS (fast)

from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

Create FAISS index

d = 1536  # Must match your embedding model (1536 = OpenAI text-embedding-ada-002)
faiss_index = faiss.IndexFlatL2(d)

vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Customization

Custom LLM

from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings

Set global LLM

Settings.llm = Anthropic(model="claude-sonnet-4-5-20250929")

Now all queries use Anthropic

query_engine = index.as_query_engine()

Custom embeddings

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Use HuggingFace embeddings

Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

index = VectorStoreIndex.from_documents(documents)

Custom prompt templates

from llama_index.core import PromptTemplate

qa_prompt = PromptTemplate(
    "Context: {context_str}\n"
    "Question: {query_str}\n"
    "Answer the question based only on the context. "
    "If the answer is not in the context, say 'I don't know'.\n"
    "Answer: "
)

query_engine = index.as_query_engine(text_qa_template=qa_prompt)

Multi-modal RAG

Image + text

from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

Load images and documents

documents = SimpleDirectoryReader(
    "./data",
    required_exts=[".jpg", ".png", ".pdf"]
).load_data()

Multi-modal index

index = VectorStoreIndex.from_documents(documents)

Query with multi-modal LLM

multi_modal_llm = OpenAIMultiModal(model="gpt-4o")
query_engine = index.as_query_engine(llm=multi_modal_llm)

response = query_engine.query("What is in the diagram on page 3?")

Evaluation

Response quality

from llama_index.core.evaluation import RelevancyEvaluator, FaithfulnessEvaluator

Evaluate relevance

relevancy = RelevancyEvaluator()
result = relevancy.evaluate_response(
    query="What is Python?",
    response=response
)
print(f"Relevancy: {result.passing}")

Evaluate faithfulness (no hallucination)

faithfulness = FaithfulnessEvaluator()
result = faithfulness.evaluate_response(
    query="What is Python?",
    response=response
)
print(f"Faithfulness: {result.passing}")

Best practices

  • Use vector indices for most cases - Best performance

  • Save indices to disk - Avoid re-indexing

  • Chunk documents properly - 512-1024 tokens is a common sweet spot (see the chunking sketch after this list)

  • Add metadata - Enables filtering and tracking

  • Use streaming - Better UX for long responses

  • Enable verbose during dev - See retrieval process

  • Evaluate responses - Check relevance and faithfulness

  • Use chat engine for conversations - Built-in memory

  • Persist storage - Don't lose your index

  • Monitor costs - Track embedding and LLM token usage (see the token-counting sketch after this list)
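
The chunking and persistence practices above fit in a few lines. A minimal sketch, assuming the default embedding setup; the chunk_size and chunk_overlap values are illustrative starting points, not tuned settings:

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Illustrative values; tune chunk_size/chunk_overlap for your corpus
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Persist so the corpus is not re-embedded on every run
index.storage_context.persist(persist_dir="./storage")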
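
And a hedged sketch of cost monitoring with TokenCountingHandler; the tiktoken encoding chosen here is an assumption and should match the model you actually use:

import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Assumed tokenizer; pick the encoding that matches your LLM
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# ... run indexing and queries, then inspect usage:
print("Embedding tokens:", token_counter.total_embedding_token_count)
print("LLM prompt tokens:", token_counter.prompt_llm_token_count)
print("LLM completion tokens:", token_counter.completion_llm_token_count)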

Common patterns

Document Q&A system

Complete RAG pipeline

documents = SimpleDirectoryReader("docs").load_data() index = VectorStoreIndex.from_documents(documents) index.storage_context.persist(persist_dir="./storage")

Query

query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
    verbose=True
)
response = query_engine.query("What is the main topic?")
print(response)
print(f"Sources: {[node.metadata['file_name'] for node in response.source_nodes]}")

Chatbot with memory

Conversational interface

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    verbose=True
)

Multi-turn chat

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    response = chat_engine.chat(user_input)
    print(f"Bot: {response}")

Performance benchmarks

Operation        | Latency              | Notes
Index 100 docs   | ~10-30s              | One-time, can persist
Query (vector)   | ~0.5-2s              | Retrieval + LLM
Streaming query  | ~0.5s to first token | Better UX
Agent with tools | ~3-8s                | Multiple tool calls

LlamaIndex vs LangChain

Feature         | LlamaIndex        | LangChain
Best for        | RAG, document Q&A | Agents, general LLM apps
Data connectors | 300+ (LlamaHub)   | 100+
RAG focus       | Core feature      | One of many
Learning curve  | Easier for RAG    | Steeper
Customization   | High              | Very high
Documentation   | Excellent         | Good

Use LlamaIndex when:

  • Your primary use case is RAG

  • Need many data connectors

  • Want simpler API for document Q&A

  • Building knowledge retrieval system

Use LangChain when:

  • Building complex agents

  • Need more general-purpose tools

  • Want more flexibility

  • Complex multi-step workflows

References

  • Query Engines Guide - Query modes, customization, streaming

  • Agents Guide - Tool creation, RAG agents, multi-step reasoning

  • Data Connectors Guide - 300+ connectors, custom loaders

