# Sentence Transformers - State-of-the-Art Embeddings

Python framework for sentence and text embeddings using transformer models.
## When to use Sentence Transformers

Use when:

- Need high-quality embeddings for RAG
- Semantic similarity and search
- Text clustering and classification
- Multilingual embeddings (100+ languages)
- Running embeddings locally (no API)
- Cost-effective alternative to OpenAI embeddings
Metrics:

- 15,700+ GitHub stars
- 5,000+ pre-trained models
- 100+ languages supported
- Based on PyTorch/Transformers
Use alternatives instead:

- OpenAI Embeddings: need API-based, highest quality
- Instructor: task-specific instructions
- Cohere Embed: managed service
## Quick start

### Installation

```bash
pip install sentence-transformers
```
### Basic usage

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
sentences = [
    "This is an example sentence",
    "Each sentence is converted to a vector",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)

# Cosine similarity
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")
```
## Popular models

### General purpose

```python
# Fast, good quality (384 dim)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Better quality (768 dim)
model = SentenceTransformer('all-mpnet-base-v2')

# Best quality (1024 dim, slower)
model = SentenceTransformer('all-roberta-large-v1')
```

### Multilingual

```python
# 50+ languages
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# 100+ languages
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')
```

### Domain-specific

```python
# Legal domain
model = SentenceTransformer('nlpaueb/legal-bert-base-uncased')

# Scientific papers
model = SentenceTransformer('allenai/specter')

# Code
model = SentenceTransformer('microsoft/codebert-base')
```
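These domain checkpoints are plain Hugging Face models without a sentence-transformers pooling config, so loading them by name makes the library attach mean pooling automatically (and print a warning). If you prefer to make that construction explicit, a minimal sketch:

```python
from sentence_transformers import SentenceTransformer, models

# Transformer backbone + explicit mean pooling over token embeddings
word_embedding_model = models.Transformer('microsoft/codebert-base', max_seq_length=256)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```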
## Semantic search

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Corpus
corpus = [
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Neural networks are powerful",
]

# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Query
query = "What is Python?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find most similar
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
print(hits)
```
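`semantic_search` returns one list of hits per query; each hit is a dict with a `corpus_id` index into the corpus and a similarity `score`. To print readable results:

```python
# hits[0] holds the results for the first (and here only) query
for hit in hits[0]:
    print(f"{corpus[hit['corpus_id']]}  (score: {hit['score']:.4f})")
```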
## Similarity computation

```python
# Cosine similarity
similarity = util.cos_sim(embedding1, embedding2)

# Dot product
similarity = util.dot_score(embedding1, embedding2)

# Pairwise cosine similarity (n x n matrix)
similarities = util.cos_sim(embeddings, embeddings)
```
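If you encode with `normalize_embeddings=True`, the vectors have unit length, so the cheaper dot product gives the same values as cosine similarity. A minimal sketch:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Unit-length vectors: dot product == cosine similarity
embeddings = model.encode(
    ["first sentence", "second sentence"],
    convert_to_tensor=True,
    normalize_embeddings=True,
)
print(util.dot_score(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[1]))  # same value
```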
## Batch encoding

```python
# Efficient batch processing
sentences = ["sentence 1", "sentence 2", ...] * 1000

embeddings = model.encode(
    sentences,
    batch_size=32,
    show_progress_bar=True,
    convert_to_tensor=False,  # or True for PyTorch tensors
)
```
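For very large corpora, the library also ships a multi-process (and multi-GPU) encoding pool. A sketch; on spawn-based platforms, wrap it in an `if __name__ == "__main__":` guard:

```python
# Spread encoding across available devices/processes
pool = model.start_multi_process_pool()
embeddings = model.encode_multi_process(sentences, pool, batch_size=32)
model.stop_multi_process_pool(pool)
```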
Fine-tuning
from sentence_transformers import InputExample, losses from torch.utils.data import DataLoader
Training data
train_examples = [ InputExample(texts=['sentence 1', 'sentence 2'], label=0.8), InputExample(texts=['sentence 3', 'sentence 4'], label=0.3), ]
train_dataloader = DataLoader(train_examples, batch_size=16)
Loss function
train_loss = losses.CosineSimilarityLoss(model)
Train
model.fit( train_objectives=[(train_dataloader, train_loss)], epochs=10, warmup_steps=100 )
Save
model.save('my-finetuned-model')
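The saved directory reloads like any built-in model:

```python
# Reload the fine-tuned model from disk
model = SentenceTransformer('my-finetuned-model')
embeddings = model.encode(["a domain-specific sentence"])
```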
LangChain integration
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings( model_name="sentence-transformers/all-mpnet-base-v2" )
Use with vector stores
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents( documents=docs, embedding=embeddings )
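Once built, the store is queried with the standard retrieval methods (`docs` above is assumed to be a list of LangChain `Document` objects):

```python
# Retrieve the most similar documents for a query
results = vectorstore.similarity_search("What is Python?", k=3)
for doc in results:
    print(doc.page_content)
```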
LlamaIndex integration
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding( model_name="sentence-transformers/all-mpnet-base-v2" )
from llama_index.core import Settings Settings.embed_model = embed_model
Use in index
index = VectorStoreIndex.from_documents(documents)
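Retrieval against the index needs only the embedding model, no LLM; a minimal sketch:

```python
# Pure embedding retrieval (no LLM required)
retriever = index.as_retriever(similarity_top_k=3)
for node in retriever.retrieve("What is Python?"):
    print(node.score, node.node.get_content())
```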
## Model selection guide

| Model | Dimensions | Speed | Quality | Use case |
|-------|------------|-------|---------|----------|
| all-MiniLM-L6-v2 | 384 | Fast | Good | General, prototyping |
| all-mpnet-base-v2 | 768 | Medium | Better | Production RAG |
| all-roberta-large-v1 | 1024 | Slow | Best | High accuracy needed |
| paraphrase-multilingual | 768 | Medium | Good | Multilingual |
## Best practices

- Start with all-MiniLM-L6-v2 - good baseline
- Normalize embeddings - better for cosine similarity (see the sketch after this list)
- Use GPU if available - roughly 10× faster encoding
- Batch encoding - more efficient than one sentence at a time
- Cache embeddings - expensive to recompute
- Fine-tune for domain - improves quality
- Test different models - quality varies by task
- Monitor memory - large models need more RAM
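Several of these combine naturally. A minimal sketch, assuming a CUDA GPU may or may not be present and using an illustrative cache file name:

```python
import numpy as np
import torch
from sentence_transformers import SentenceTransformer

# Pick the GPU when available; fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

sentences = ["sentence 1", "sentence 2"]
cache_path = "embeddings.npy"  # illustrative cache location

try:
    embeddings = np.load(cache_path)  # reuse cached embeddings
except FileNotFoundError:
    embeddings = model.encode(
        sentences,
        batch_size=64,                # batch for throughput
        normalize_embeddings=True,    # unit vectors for cosine/dot search
        show_progress_bar=True,
    )
    np.save(cache_path, embeddings)
```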
## Performance

| Model | Speed (sentences/sec) | Memory | Dimension |
|-------|-----------------------|--------|-----------|
| MiniLM | ~2000 | 120MB | 384 |
| MPNet | ~600 | 420MB | 768 |
| RoBERTa | ~300 | 1.3GB | 1024 |
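Throughput varies widely with hardware, batch size, and sentence length, so it is worth measuring on your own machine; a rough sketch:

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ["This is a benchmark sentence."] * 2000

start = time.perf_counter()
model.encode(sentences, batch_size=64)
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.0f} sentences/sec")
```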
## Resources

- GitHub: https://github.com/UKPLab/sentence-transformers ⭐ 15,700+
- Docs: https://www.sbert.net
- License: Apache 2.0