Chroma - Open-Source Embedding Database

The AI-native database for building LLM applications with memory.

When to use Chroma

Use Chroma when you:

Are building RAG (retrieval-augmented generation) applications
Need a local or self-hosted vector database
Want an open-source solution (Apache 2.0)
Are prototyping in notebooks
Need semantic search over documents
Need to store embeddings alongside metadata

Metrics:

24,300+ GitHub stars
1,900+ forks
v1.3.3 (stable, weekly releases)
Apache 2.0 license

Use alternatives instead:

Pinecone: Managed cloud, auto-scaling
FAISS: Pure similarity search, no metadata
Weaviate: Production ML-native database
Qdrant: High performance, Rust-based

Quick start

Installation

Python

pip install chromadb

JavaScript/TypeScript

npm install chromadb @chroma-core/default-embed

Basic usage (Python)

import chromadb

Create client

client = chromadb.Client()

Create collection

collection = client.create_collection(name="my_collection")

Add documents

collection.add(
    documents=["This is document 1", "This is document 2"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"],
)

Query

results = collection.query(
    query_texts=["document about topic"],
    n_results=2,
)

print(results)

Core operations

Create collection

Simple collection

collection = client.create_collection("my_docs")

With custom embedding function

from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small",
)

collection = client.create_collection(
    name="my_docs",
    embedding_function=openai_ef,
)

Get existing collection

collection = client.get_collection("my_docs")

Delete collection

client.delete_collection("my_docs")

Add documents

Add with metadata and explicit IDs

collection.add(
    documents=["Doc 1", "Doc 2", "Doc 3"],
    metadatas=[
        {"source": "web", "category": "tutorial"},
        {"source": "pdf", "page": 5},
        {"source": "api", "timestamp": "2025-01-01"},
    ],
    ids=["id1", "id2", "id3"],
)

Add with custom embeddings

collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],  # vectors elided
    documents=["Doc 1", "Doc 2"],
    ids=["id1", "id2"],
)

Query (similarity search)

Basic query

results = collection.query(
    query_texts=["machine learning tutorial"],
    n_results=5,
)

Query with a metadata filter

results = collection.query(
    query_texts=["Python programming"],
    n_results=3,
    where={"source": "web"},
)

Query with combined metadata filters

results = collection.query(
    query_texts=["advanced topics"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$gte": 3}},
        ]
    },
)

Access results

print(results["documents"])  # Matching documents, one list per query
print(results["metadatas"])  # Metadata for each document
print(results["distances"])  # Distances (lower means more similar)
print(results["ids"])        # Document IDs
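Each field in a query result is a list of lists: one inner list per query text, with hits ordered from nearest to farthest. The sketch below uses a hand-written dictionary in that shape (the values are made up, not real output) and a hypothetical helper, `iter_hits`, to flatten it into one record per hit:

```python
from typing import Any, Iterator


def iter_hits(results: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one record per hit, tagged with the index of its query."""
    for query_index, ids in enumerate(results["ids"]):
        for rank, doc_id in enumerate(ids):
            yield {
                "query_index": query_index,
                "id": doc_id,
                "document": results["documents"][query_index][rank],
                "metadata": results["metadatas"][query_index][rank],
                "distance": results["distances"][query_index][rank],
            }


# Hand-written stand-in for collection.query(...) output with one query text.
sample = {
    "ids": [["id2", "id1"]],
    "documents": [["This is document 2", "This is document 1"]],
    "metadatas": [[{"source": "doc2"}, {"source": "doc1"}]],
    "distances": [[0.31, 0.47]],
}

hits = list(iter_hits(sample))
for hit in hits:
    print(hit["id"], hit["distance"])
```

Flattening this way makes it easier to pass hits to a prompt template or to drop those above a distance threshold.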

Get documents

Get by IDs

docs = collection.get(ids=["id1", "id2"])

Get with filters

docs = collection.get(
    where={"category": "tutorial"},
    limit=10,
)

Get all documents

docs = collection.get()

Update documents

Update document content

collection.update(
    ids=["id1"],
    documents=["Updated content"],
    metadatas=[{"source": "updated"}],
)

Delete documents

Delete by IDs

collection.delete(ids=["id1", "id2"])

Delete with filter

collection.delete(where={"source": "outdated"})

Persistent storage

Persist to disk

client = chromadb.PersistentClient(path="./chroma_db")

collection = client.create_collection("my_docs")
collection.add(documents=["Doc 1"], ids=["id1"])

Data is persisted automatically

Reload later with the same path

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("my_docs")

Embedding functions

Default (Sentence Transformers)

Uses a Sentence Transformers model by default

collection = client.create_collection("my_docs")

Default model: all-MiniLM-L6-v2

OpenAI

from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small",
)

collection = client.create_collection(
    name="openai_docs",
    embedding_function=openai_ef,
)

HuggingFace

huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-key",
    model_name="sentence-transformers/all-mpnet-base-v2",
)

collection = client.create_collection(
    name="hf_docs",
    embedding_function=huggingface_ef,
)

Custom embedding function

from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Your embedding logic goes here; it must build `embeddings`,
        # one vector per document in `input`
        return embeddings

my_ef = MyEmbeddingFunction()
collection = client.create_collection(
    name="custom_docs",
    embedding_function=my_ef,
)
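The custom class above leaves the embedding logic as a placeholder. As a toy stand-in for testing, assuming a hashed bag-of-words is acceptable (it carries no semantic meaning and is not what any real model does), a hypothetical helper `hash_embed` could produce the vectors that the custom function returns:

```python
import hashlib
import math


def hash_embed(texts: list[str], dim: int = 8) -> list[list[float]]:
    """Map each text to a unit-length bag-of-words vector via token hashing."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # sha256 is used only as a stable hash; Python's hash() varies per process.
            digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % dim] += 1.0
        # Normalize to unit length; an all-zero vector is left as zeros.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


vectors = hash_embed(["hello world", "hello world", ""])
print(len(vectors), len(vectors[0]))
```

Because the output is deterministic and needs no network access or model download, a function like this is convenient in unit tests; swap in a real model for anything that depends on semantic similarity.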

Metadata filtering

Exact match

results = collection.query(
    query_texts=["query"],
    where={"category": "tutorial"},
)

Comparison operators

Supported: $gt, $gte, $lt, $lte, $ne

results = collection.query(
    query_texts=["query"],
    where={"page": {"$gt": 10}},
)

Logical operators

Supported: $and, $or

results = collection.query(
    query_texts=["query"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$lte": 3}},
        ]
    },
)

Membership ($in)

Matches when the metadata value equals any item in the list

results = collection.query(
    query_texts=["query"],
    where={"tags": {"$in": ["python", "ml"]}},
)
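To make the filter semantics concrete, here is a small pure-Python sketch of how a `where` clause can be read against a single metadata dictionary. It is an illustration only, not Chroma's implementation: the function name `matches` is hypothetical, and the choice that a missing key never matches is an assumption of this sketch that may differ from Chroma's behavior.

```python
from typing import Any

# Comparison operators applied as op(metadata_value, filter_target).
_COMPARATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the `where` filter (illustrative only)."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, clause) for clause in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, clause) for clause in condition):
                return False
        elif key not in metadata:
            # Assumption of this sketch: a missing key never matches.
            return False
        elif isinstance(condition, dict):
            value = metadata[key]
            if not all(_COMPARATORS[op](value, target) for op, target in condition.items()):
                return False
        elif metadata[key] != condition:
            # A bare value means exact match.
            return False
    return True


doc = {"category": "tutorial", "difficulty": 2, "tags": "python"}
print(matches(doc, {"$and": [{"category": "tutorial"}, {"difficulty": {"$lte": 3}}]}))
print(matches(doc, {"tags": {"$in": ["python", "ml"]}}))
```

Note that `$in` here compares one scalar metadata value against a list of candidates; it does not test whether a list-valued field contains an item.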

LangChain integration

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

Split documents

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(documents)

Create Chroma vector store

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

Query

results = vectorstore.similarity_search("machine learning", k=3)

As retriever

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
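The splitting step matters because each chunk becomes one embedded record in the collection. As a rough illustration of what `chunk_size` controls, here is a simplified fixed-width chunker with optional overlap. It is a sketch under stated assumptions, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers natural separators such as paragraphs and sentences); the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 0) -> list[str]:
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last window is not just leftover overlap.
    stop = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, stop, step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

Smaller chunks tend to give more precise matches but lose surrounding context; overlap is one common way to soften that trade-off.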

LlamaIndex integration

import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

Initialize Chroma

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_collection")

Create vector store

vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Create index

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)

Query

query_engine = index.as_query_engine()
response = query_engine.query("What is machine learning?")

Server mode

Run Chroma server

Terminal: chroma run --path ./chroma_db --port 8000

Connect to server

import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(anonymized_telemetry=False),
)

Use as normal

collection = client.get_or_create_collection("my_docs")

Best practices

Use persistent client - Don't lose data on restart
Add metadata - Enables filtering and tracking
Batch operations - Add multiple docs at once
Choose right embedding model - Balance speed/quality
Use filters - Narrow search space
Unique IDs - Avoid collisions
Regular backups - Copy chroma_db directory
Monitor collection size - Scale up if needed
Test embedding functions - Ensure quality
Use server mode for production - Better for multi-user
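Two of the practices above, batching and unique IDs, can be sketched without touching Chroma at all. The helpers below are hypothetical: `content_id` derives a stable ID from the document text, and `batched` slices a list into fixed-size groups. The batch size of 3 is arbitrary, and the `collection.add` call is shown only as a comment.

```python
import hashlib
from typing import Iterator, Sequence


def content_id(text: str) -> str:
    """Derive a stable 16-character ID from the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


documents = [f"Doc {i}" for i in range(1, 8)]
batches = list(batched(documents, 3))
for batch in batches:
    ids = [content_id(doc) for doc in batch]
    # With a real collection: collection.add(documents=list(batch), ids=ids)
    print(len(batch), ids[0])
```

Content-derived IDs mean identical texts map to the same ID, which helps avoid duplicate records on re-ingestion; if two distinct records can share the same text, include a source path or index in the hashed string.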

Performance

| Operation | Latency | Notes |
|---|---|---|
| Add 100 docs | ~1-3s | With embedding |
| Query (top 10) | ~50-200ms | Depends on collection size |
| Metadata filter | ~10-50ms | Fast with proper indexing |
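The figures above are rough and vary with hardware, embedding model, and collection size, so it is worth measuring on your own setup. A minimal timing sketch follows; the helper name `measure_latency_ms` is hypothetical, and the lambda is a stand-in workload that you would replace with a call such as `collection.query(...)`.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], repeats: int = 20) -> dict[str, float]:
    """Call `fn` `repeats` times and report median and max latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


# Stand-in workload; swap in a real Chroma call to measure it.
stats = measure_latency_ms(lambda: sum(range(10_000)))
print(stats)
```

The median is less sensitive than the mean to one-off slow calls, such as the first query after a model loads.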

Resources

GitHub: https://github.com/chroma-core/chroma ⭐ 24,300+
Docs: https://docs.trychroma.com
Discord: https://discord.gg/MMeYNTmh3x
Version: 1.3.3+
License: Apache 2.0
