# FAISS - Efficient Similarity Search

Facebook AI Research's library for billion-scale vector similarity search.
## When to use FAISS

Use FAISS when:

- Need fast similarity search on large vector datasets (millions/billions)
- GPU acceleration required
- Pure vector similarity (no metadata filtering needed)
- High throughput, low latency critical
- Offline/batch processing of embeddings
Metrics:

- 31,700+ GitHub stars
- Developed by Meta/Facebook AI Research
- Handles billions of vectors
- C++ core with Python bindings
Use alternatives instead:

- Chroma/Pinecone: need metadata filtering
- Weaviate: need full database features
- Annoy: simpler, fewer features
## Quick start

### Installation

```shell
# CPU only
pip install faiss-cpu

# GPU support
pip install faiss-gpu
```

Note: the FAISS project primarily supports conda packages (`conda install -c pytorch faiss-cpu`); the pip wheels are community-built and may lag behind.
Basic usage
import faiss import numpy as np
Create sample data (1000 vectors, 128 dimensions)
d = 128 nb = 1000 vectors = np.random.random((nb, d)).astype('float32')
Create index
index = faiss.IndexFlatL2(d) # L2 distance index.add(vectors) # Add vectors
Search
k = 5 # Find 5 nearest neighbors query = np.random.random((1, d)).astype('float32') distances, indices = index.search(query, k)
print(f"Nearest neighbors: {indices}") print(f"Distances: {distances}")
## Index types

### Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if vectors are normalized)
index = faiss.IndexFlatIP(d)
```

Slowest to search, but exact: 100% accuracy.
### IVF (inverted file) - fast approximate

```python
# Create the coarse quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data (learns the cluster centroids)
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = number of clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
```
### HNSW (Hierarchical Navigable Small World) - best quality/speed trade-off

```python
# HNSW index
M = 32  # Number of connections per node
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```
### Product Quantization - memory efficient

```python
# PQ compresses vectors, reducing memory roughly 16-32×
m = 8      # Number of subquantizers (d must be divisible by m)
nbits = 8  # Bits per subquantizer code
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```
## Save and load

```python
# Save index to disk
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using it as before
distances, indices = index.search(query, k)
```
## GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)
```

GPU search is typically 10-100× faster than CPU.
LangChain integration
from langchain_community.vectorstores import FAISS from langchain_openai import OpenAIEmbeddings
Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())
Save
vectorstore.save_local("faiss_index")
Load
vectorstore = FAISS.load_local( "faiss_index", OpenAIEmbeddings(), allow_dangerous_deserialization=True )
Search
results = vectorstore.similarity_search("query", k=5)
LlamaIndex integration
from llama_index.vector_stores.faiss import FaissVectorStore import faiss
Create FAISS index
d = 1536 faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
## Best practices

- Choose the right index type - Flat for <10K vectors, IVF for 10K-1M, HNSW when quality matters
- Normalize for cosine similarity - use IndexFlatIP with L2-normalized vectors
- Use GPU for large datasets - typically 10-100× faster
- Save trained indices - training is expensive
- Tune nprobe/efSearch - balance speed vs. accuracy
- Monitor memory - use PQ for large datasets
- Batch queries - better GPU (and CPU) utilization
Performance
Index Type Build Time Search Time Memory Accuracy
Flat Fast Slow High 100%
IVF Medium Fast Medium 95-99%
HNSW Slow Fastest High 99%
PQ Medium Fast Low 90-95%
## Resources

- GitHub: https://github.com/facebookresearch/faiss ⭐ 31,700+
- License: MIT