
Chroma - Open-Source Embedding Database

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "chroma" with this command: npx skills add davila7/claude-code-templates/davila7-claude-code-templates-chroma


The AI-native database for building LLM applications with memory.

When to use Chroma

Use Chroma when you are:

  • Building RAG (retrieval-augmented generation) applications

  • Looking for a local or self-hosted vector database

  • Looking for an open-source solution (Apache 2.0)

  • Prototyping in notebooks

  • Running semantic search over documents

  • Storing embeddings alongside metadata

Metrics:

  • 24,300+ GitHub stars

  • 1,900+ forks

  • v1.3.3 (stable, weekly releases)

  • Apache 2.0 license

Consider these alternatives instead:

  • Pinecone: Managed cloud, auto-scaling

  • FAISS: Pure similarity search, no metadata

  • Weaviate: Production ML-native database

  • Qdrant: High performance, Rust-based

Quick start

Installation

Python

pip install chromadb

JavaScript/TypeScript

npm install chromadb @chroma-core/default-embed

Basic usage (Python)

import chromadb

# Create client
client = chromadb.Client()

# Create collection
collection = client.create_collection(name="my_collection")

# Add documents
collection.add(
    documents=["This is document 1", "This is document 2"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"],
)

# Query
results = collection.query(
    query_texts=["document about topic"],
    n_results=2,
)

print(results)
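Conceptually, `query` embeds the query text and returns the stored items whose vectors are closest to it. The sketch below is a toy, brute-force stand-in written with plain Python lists; it is not Chroma's implementation (Chroma searches an approximate HNSW index), and the helper names `squared_l2` and `toy_query` are hypothetical. It assumes squared Euclidean distance, Chroma's default `l2` space, and mirrors the nested one-list-per-query shape of Chroma's results.

```python
from typing import Dict, List


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance: lower means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_query(
    store: Dict[str, List[float]],
    query_embeddings: List[List[float]],
    n_results: int = 2,
) -> Dict[str, List[List]]:
    """Hypothetical brute-force nearest-neighbour search over an id -> vector map.

    Returns one inner list per query vector, like Chroma's query results.
    """
    out: Dict[str, List[List]] = {"ids": [], "distances": []}
    for q in query_embeddings:
        # Score every stored vector, then keep the n_results closest ones.
        scored = sorted(
            (squared_l2(q, vec), doc_id) for doc_id, vec in store.items()
        )[:n_results]
        out["ids"].append([doc_id for _, doc_id in scored])
        out["distances"].append([dist for dist, _ in scored])
    return out


store = {"id1": [0.0, 1.0], "id2": [1.0, 0.0], "id3": [0.9, 0.1]}
result = toy_query(store, [[1.0, 0.0]], n_results=2)
print(result["ids"])  # [['id2', 'id3']]
```

A real collection replaces the linear scan with an index, which is why query latency grows much more slowly than collection size.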

Core operations

  1. Create collection

# Simple collection
collection = client.create_collection("my_docs")

# With custom embedding function (a different name, since "my_docs" now exists)
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small",
)
collection = client.create_collection(
    name="my_docs_openai",
    embedding_function=openai_ef,
)

# Get existing collection
collection = client.get_collection("my_docs")

# Delete collection
client.delete_collection("my_docs")

  2. Add documents

# Add with metadata
collection.add(
    documents=["Doc 1", "Doc 2", "Doc 3"],
    metadatas=[
        {"source": "web", "category": "tutorial"},
        {"source": "pdf", "page": 5},
        {"source": "api", "timestamp": "2025-01-01"},
    ],
    ids=["id1", "id2", "id3"],
)

# Add with custom embeddings (every vector must have the collection's dimension)
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    documents=["Doc 1", "Doc 2"],
    ids=["id1", "id2"],
)
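When you supply your own `embeddings`, every vector in a collection must have the same length. A small pre-flight check can catch mismatches before calling `collection.add`. The `check_embeddings` helper below is a hypothetical plain-Python sketch, not part of Chroma.

```python
from typing import Optional, Sequence


def check_embeddings(
    embeddings: Sequence[Sequence[float]],
    ids: Sequence[str],
    expected_dim: Optional[int] = None,
) -> int:
    """Hypothetical helper: validate vectors before collection.add().

    Returns the shared dimension, or raises ValueError on a problem.
    """
    if not embeddings:
        raise ValueError("no embeddings given")
    if len(embeddings) != len(ids):
        raise ValueError(f"{len(embeddings)} embeddings but {len(ids)} ids")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique within a batch")
    dims = {len(vec) for vec in embeddings}
    if len(dims) != 1:
        raise ValueError(f"mixed embedding dimensions: {sorted(dims)}")
    dim = dims.pop()
    if expected_dim is not None and dim != expected_dim:
        raise ValueError(f"expected dimension {expected_dim}, got {dim}")
    return dim


dim = check_embeddings([[0.1, 0.2, 0.3], [0.3, 0.4, 0.5]], ["id1", "id2"])
print(dim)  # 3

error = ""
try:
    check_embeddings([[0.1], [0.1, 0.2]], ["a", "b"])
except ValueError as exc:
    error = str(exc)
print(error)  # mixed embedding dimensions: [1, 2]
```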

  3. Query (similarity search)

# Basic query
results = collection.query(
    query_texts=["machine learning tutorial"],
    n_results=5,
)

# Filter on a single metadata field
results = collection.query(
    query_texts=["Python programming"],
    n_results=3,
    where={"source": "web"},
)

# Combine metadata filters
results = collection.query(
    query_texts=["advanced topics"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$gte": 3}},
        ]
    },
)

# Access results (each field holds one inner list per query text)
print(results["documents"])  # Matching documents
print(results["metadatas"])  # Metadata for each document
print(results["distances"])  # Distances (lower means more similar)
print(results["ids"])        # Document IDs
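Because `query` accepts several query texts at once, each result field is a list with one inner list per query. The hypothetical `rows_for_query` helper below zips those inner lists into one record per match. It only assumes that nested shape, and it runs against a hand-built dictionary so no Chroma instance is needed; fields omitted from the result are treated as missing.

```python
from typing import Any, Dict, List, Optional, Tuple


def rows_for_query(
    results: Dict[str, Optional[List[List[Any]]]], query_index: int = 0
) -> List[Tuple[Any, Any, Any, Any]]:
    """Hypothetical helper: flatten one query's results into
    (id, document, metadata, distance) tuples, preserving their order."""

    def column(key: str) -> List[Any]:
        values = results.get(key)
        # A field that was not returned is missing or None: fill with None.
        if values is None:
            return [None] * len(results["ids"][query_index])
        return values[query_index]

    return list(
        zip(column("ids"), column("documents"), column("metadatas"), column("distances"))
    )


# Hand-built dict with the same nested shape as a two-text query.
results = {
    "ids": [["id1", "id2"], ["id2"]],
    "documents": [
        ["This is document 1", "This is document 2"],
        ["This is document 2"],
    ],
    "metadatas": [[{"source": "doc1"}, {"source": "doc2"}], [{"source": "doc2"}]],
    "distances": [[0.42, 0.87], [0.15]],
}
rows = rows_for_query(results, query_index=0)
for row in rows:
    print(row)
```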

  4. Get documents

# Get by IDs
docs = collection.get(ids=["id1", "id2"])

# Get with filters
docs = collection.get(
    where={"category": "tutorial"},
    limit=10,
)

# Get all documents
docs = collection.get()

  5. Update documents

# Update document content
collection.update(
    ids=["id1"],
    documents=["Updated content"],
    metadatas=[{"source": "updated"}],
)

  6. Delete documents

# Delete by IDs
collection.delete(ids=["id1", "id2"])

# Delete with filter
collection.delete(where={"source": "outdated"})

Persistent storage

# Persist to disk
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.create_collection("my_docs")
collection.add(documents=["Doc 1"], ids=["id1"])
# Data is persisted automatically

# Reload later with the same path
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("my_docs")
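Since a `PersistentClient` keeps its data under `path`, one simple backup approach is to copy that directory. The sketch below uses only the standard library; the `backup_chroma_dir` name and the timestamped destination layout are assumptions, and it assumes nothing is writing to the directory during the copy, because copying a live database can capture an inconsistent state.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_chroma_dir(src: str, backup_root: str) -> Path:
    """Hypothetical helper: copy a persistence directory into a new
    timestamped folder under backup_root. Stop writers first."""
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"no such directory: {src_path}")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    dest = Path(backup_root) / f"{src_path.name}-{stamp}"
    # copytree creates missing parent folders and fails if dest already exists.
    shutil.copytree(src_path, dest)
    return dest


# Demo on a throwaway directory holding a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "chroma_db"
    src.mkdir()
    (src / "placeholder.txt").write_text("demo")
    dest = backup_chroma_dir(str(src), str(Path(tmp) / "backups"))
    dest_name = dest.name
    copied_text = (dest / "placeholder.txt").read_text()
print(dest_name, copied_text)
```

Against a real store the call would be `backup_chroma_dir("./chroma_db", "./backups")`.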

Embedding functions

Default (Sentence Transformers)

# With no embedding_function, Chroma uses its default embedding function:
# the all-MiniLM-L6-v2 sentence-transformers model, run locally via ONNX Runtime
collection = client.create_collection("my_docs")

OpenAI

from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small",
)
collection = client.create_collection(
    name="openai_docs",
    embedding_function=openai_ef,
)

HuggingFace

huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-key",
    model_name="sentence-transformers/all-mpnet-base-v2",
)
collection = client.create_collection(
    name="hf_docs",
    embedding_function=huggingface_ef,
)

Custom embedding function

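from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Your embedding logic goes here: return one vector per document,
        # all of the same length. Placeholder vectors for illustration only.
        embeddings = [[0.0, 0.0, 0.0] for _ in input]
        return embeddings

my_ef = MyEmbeddingFunction()
collection = client.create_collection(
    name="custom_docs",
    embedding_function=my_ef,
)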
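A custom embedding function only has to turn a list of texts into a list of equal-length vectors, one per text. As a dependency-free stand-in for "your embedding logic", the sketch below uses a hypothetical `HashingEmbedder`: a hashed bag-of-words vectorizer (feature hashing) normalized to unit length. It is a toy for tests and offline demos, not a semantic model, and it deliberately does not import Chroma; in practice the same `__call__` body would sit inside the `EmbeddingFunction` subclass above.

```python
import hashlib
import math
import re
from typing import List


class HashingEmbedder:
    """Hypothetical toy embedder: hashed bag-of-words, L2-normalized."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def _embed_one(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"\w+", text.lower()):
            # Stable hash (unlike built-in hash(), not salted per process).
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Same contract as the EmbeddingFunction above: one vector per document.
        return [self._embed_one(doc) for doc in input]


embed = HashingEmbedder(dim=64)
vectors = embed(["Chroma stores embeddings", "chroma STORES embeddings!"])
print(len(vectors), len(vectors[0]))  # 2 64
```

Both example texts reduce to the same lowercase tokens, so they map to identical vectors; that is the main limitation of lexical hashing compared with a learned model.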

Metadata filtering

# Exact match
results = collection.query(
    query_texts=["query"],
    where={"category": "tutorial"},
)

# Comparison operators
results = collection.query(
    query_texts=["query"],
    where={"page": {"$gt": 10}},  # operators: $eq, $ne, $gt, $gte, $lt, $lte
)

# Logical operators
results = collection.query(
    query_texts=["query"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$lte": 3}},
        ]
    },  # also: $or
)

# Match any value in a list ($in; the inverse is $nin)
results = collection.query(
    query_texts=["query"],
    where={"tags": {"$in": ["python", "ml"]}},
)
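To make the operator semantics concrete, the sketch below evaluates a `where` dictionary against a single metadata dict in plain Python. `matches_where` is a hypothetical re-implementation for illustration only: Chroma evaluates filters inside the database, and edge cases such as missing keys, type mismatches, or several top-level keys may behave differently there.

```python
from typing import Any, Dict

_COMPARATORS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Hypothetical evaluator for Chroma-style metadata filters."""
    checks = []
    for key, condition in where.items():
        if key == "$and":
            checks.append(all(matches_where(metadata, c) for c in condition))
        elif key == "$or":
            checks.append(any(matches_where(metadata, c) for c in condition))
        elif key not in metadata:
            # In this sketch a missing field never matches.
            checks.append(False)
        elif isinstance(condition, dict):
            # {"field": {"$op": value}}: apply each operator to the field value.
            checks.append(
                all(_COMPARATORS[op](metadata[key], value)
                    for op, value in condition.items())
            )
        else:
            # {"field": value} is shorthand for {"field": {"$eq": value}}.
            checks.append(metadata[key] == condition)
    # Several top-level keys are ANDed here; Chroma may require an explicit $and.
    return all(checks)


docs = {
    "id1": {"category": "tutorial", "difficulty": 2, "tags": "python"},
    "id2": {"category": "tutorial", "difficulty": 5, "tags": "ml"},
    "id3": {"category": "reference", "page": 12, "tags": "rust"},
}
where = {"$and": [{"category": "tutorial"}, {"difficulty": {"$lte": 3}}]}
matched = [doc_id for doc_id, meta in docs.items() if matches_where(meta, where)]
print(matched)  # ['id1']
```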

LangChain integration

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split documents (`documents` is a list of LangChain Document objects loaded earlier)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(documents)

# Create Chroma vector store
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# Query
results = vectorstore.similarity_search("machine learning", k=3)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
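`RecursiveCharacterTextSplitter` tries a sequence of separators (paragraph breaks, then line breaks, then spaces) so chunks end at natural boundaries. The simplified `chunk_text` sketch below shows only the underlying idea of fixed-size character windows with overlap; it is a hypothetical stand-in, not LangChain's algorithm, and its `chunk_overlap` parameter merely mirrors the name LangChain uses.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Hypothetical fixed-size character chunker with overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap repeats a little text at each boundary so a sentence split between two chunks is still retrievable from either one.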

LlamaIndex integration

from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
import chromadb

# Initialize Chroma
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_collection")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index (`documents` is a list of LlamaIndex Document objects loaded earlier)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is machine learning?")

Server mode

# Run the Chroma server (in a terminal)
chroma run --path ./chroma_db --port 8000

# Connect to the server (Python)
import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(anonymized_telemetry=False),
)

# Use as normal
collection = client.get_or_create_collection("my_docs")

Best practices

  • Use a persistent client - Avoid losing data on restart

  • Add metadata - Enables filtering and tracking

  • Batch operations - Add multiple documents per call

  • Choose the right embedding model - Balance speed against quality

  • Use filters - Narrow the search space

  • Unique IDs - Avoid collisions

  • Regular backups - Copy the chroma_db directory (the PersistentClient path)

  • Monitor collection size - Scale up if needed

  • Test embedding functions - Check retrieval quality on sample queries

  • Use server mode for production - Better suited to multi-user access
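Two of these practices, batching and unique IDs, fit in a few lines of standard-library Python. `content_id` derives a stable ID from a document's text, so re-adding the same text yields the same ID rather than a new duplicate, and `batched_records` splits documents into fixed-size batches shaped like `collection.add(...)` keyword arguments. Both helpers and the default batch size of 100 are assumptions for illustration; Chroma enforces its own maximum batch size, so check the limit for your version, and deduplicate texts first because identical texts in one batch would share an ID.

```python
import hashlib
from typing import Any, Dict, Iterator, List, Optional


def content_id(text: str, namespace: str = "") -> str:
    """Hypothetical helper: stable ID from (namespace, text) via SHA-256."""
    digest = hashlib.sha256(f"{namespace}\x00{text}".encode("utf-8")).hexdigest()
    return digest[:32]


def batched_records(
    documents: List[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
    batch_size: int = 100,
    namespace: str = "",
) -> Iterator[Dict[str, List[Any]]]:
    """Hypothetical helper: yield dicts usable as collection.add(**batch)."""
    if metadatas is not None and len(metadatas) != len(documents):
        raise ValueError("documents and metadatas must have the same length")
    for start in range(0, len(documents), batch_size):
        docs = documents[start:start + batch_size]
        batch: Dict[str, List[Any]] = {
            "ids": [content_id(d, namespace) for d in docs],
            "documents": docs,
        }
        if metadatas is not None:
            batch["metadatas"] = metadatas[start:start + batch_size]
        yield batch


docs = [f"Doc {i}" for i in range(250)]
batches = list(batched_records(docs, batch_size=100))
print([len(b["ids"]) for b in batches])  # [100, 100, 50]
# With a real collection: for b in batches: collection.add(**b)
```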

Performance

Approximate latencies (these vary with hardware, embedding model, and collection size):

Operation          Latency       Notes
Add 100 docs       ~1-3 s        Includes embedding
Query (top 10)     ~50-200 ms    Depends on collection size
Metadata filter    ~10-50 ms     Fast with proper indexing
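Because these figures depend on your environment, it is worth measuring your own. A minimal timing harness built on `time.perf_counter` is sketched below; `median_latency_ms` is a hypothetical helper, and in practice `fn` would wrap a call such as `collection.query(...)`.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], repeats: int = 20, warmup: int = 2) -> float:
    """Hypothetical helper: median wall-clock time of fn() in milliseconds."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    for _ in range(warmup):
        fn()  # untimed warm-up runs (e.g. so a model load is not measured)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


latency = median_latency_ms(lambda: sum(range(10_000)), repeats=5)
print(f"{latency:.3f} ms")
```

Reporting the median rather than the mean keeps one slow outlier (garbage collection, a cold cache) from skewing the result.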


Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
