# Together Embeddings & Reranking

## Overview

Generate vector embeddings for text and rerank documents by relevance.

- Embeddings endpoint: `/v1/embeddings`
- Rerank endpoint: `/v1/rerank`
## Installation

```shell
# Python (recommended)
uv init          # optional, if starting a new project
uv add together

# or with pip
pip install together

# TypeScript / JavaScript
npm install together-ai
```

Set your API key:

```shell
export TOGETHER_API_KEY=<your-api-key>
```
## Embeddings

### Generate Embeddings

```python
from together import Together

client = Together()

response = client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",
    input="What is the meaning of life?",
)
print(response.data[0].embedding[:5])  # First 5 dimensions
```

```typescript
import Together from "together-ai";

const together = new Together();

const response = await together.embeddings.create({
  model: "BAAI/bge-base-en-v1.5",
  input: "What is the meaning of life?",
});
console.log(response.data[0].embedding.slice(0, 5));
```

```shell
curl -X POST "https://api.together.xyz/v1/embeddings" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-base-en-v1.5","input":"What is the meaning of life?"}'
```
### Batch Embeddings

```python
from together import Together

client = Together()

texts = ["First document", "Second document", "Third document"]
response = client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",
    input=texts,
)
for i, item in enumerate(response.data):
    print(f"Text {i}: {len(item.embedding)} dimensions")
```

```typescript
import Together from "together-ai";

const together = new Together();

const response = await together.embeddings.create({
  model: "BAAI/bge-base-en-v1.5",
  input: [
    "First document",
    "Second document",
    "Third document",
  ],
});
for (const item of response.data) {
  console.log(`Index ${item.index}: ${item.embedding.length} dimensions`);
}
```

```shell
curl -X POST "https://api.together.xyz/v1/embeddings" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-base-en-v1.5",
    "input": [
      "First document",
      "Second document",
      "Third document"
    ]
  }'
```
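Embedding APIs cap how many inputs a single request may carry, so large corpora are usually sent in fixed-size slices. The helper below is a minimal sketch of that batching loop; the `batch_size` value and the `fake_embed` stand-in are illustrative assumptions, not Together limits — in real use you would call `client.embeddings.create` inside the loop instead of the stub.

```python
from typing import Callable, List

def embed_in_batches(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[List[float]]:
    """Embed a long list of texts in fixed-size slices, preserving input order."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start : start + batch_size]
        vectors.extend(embed_batch(chunk))  # one API call per slice
    return vectors

def fake_embed(chunk: List[str]) -> List[List[float]]:
    # Stand-in for the real API call; returns one toy 1-D vector per text.
    return [[float(len(text))] for text in chunk]

vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
```

Because `extend` appends each slice's results in order, `vectors[i]` always corresponds to `texts[i]`, matching the `index` field the API returns.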
### Embedding Models

| Model | API String | Dimensions | Max Input |
|---|---|---|---|
| BGE Base EN v1.5 | `BAAI/bge-base-en-v1.5` | 768 | 512 tokens |
| Multilingual E5 Large | `intfloat/multilingual-e5-large-instruct` | 1024 | 514 tokens (recommended) |
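These models return unnormalized float vectors, so comparing two texts is typically done with cosine similarity. Below is a dependency-free sketch; the toy 2-D vectors are illustrative, where real inputs would be the 768- or 1024-dimension `response.data[i].embedding` lists.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```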
## Reranking

Rerank a set of documents by relevance to a query:

```python
from together import Together

client = Together()

response = client.rerank.create(
    model="mixedbread-ai/Mxbai-Rerank-Large-V2",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
        "London is the capital of England.",
        "The Eiffel Tower is in Paris.",
    ],
)
for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")
```

```typescript
import Together from "together-ai";

const together = new Together();

const documents = [
  "Paris is the capital of France.",
  "Berlin is the capital of Germany.",
  "London is the capital of England.",
  "The Eiffel Tower is in Paris.",
];

const response = await together.rerank.create({
  model: "mixedbread-ai/Mxbai-Rerank-Large-V2",
  query: "What is the capital of France?",
  documents,
  top_n: 2,
});
for (const result of response.results) {
  console.log(`Index: ${result.index}, Score: ${result.relevance_score}`);
}
```

```shell
curl -X POST "https://api.together.xyz/v1/rerank" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mixedbread-ai/Mxbai-Rerank-Large-V2",
    "query": "What is the capital of France?",
    "documents": ["Paris is the capital of France.", "Berlin is the capital of Germany."]
  }'
```
### Rerank Parameters

| Parameter | Type | Description |
|---|---|---|
| `model` | string | Rerank model (required) |
| `query` | string | Search query (required) |
| `documents` | string[] or object[] | Documents to rerank (required). Pass objects with named fields for structured documents. |
| `top_n` | int | Return top N results |
| `return_documents` | bool | Include document text in response |
| `rank_fields` | string[] | Fields to use for ranking when documents are JSON objects (e.g., `["title", "text"]`) |
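When documents are objects, `rank_fields` names which fields the model scores. The snippet below only assembles and serializes a request body for `/v1/rerank` (no network call is made); the `title`/`text` field names are illustrative, and the same dict maps directly onto the SDK's keyword arguments.

```python
import json

# Request body combining the structured-document parameters from the table above.
payload = {
    "model": "mixedbread-ai/Mxbai-Rerank-Large-V2",
    "query": "What is the capital of France?",
    "documents": [
        {"title": "France", "text": "Paris is the capital of France."},
        {"title": "Germany", "text": "Berlin is the capital of Germany."},
    ],
    "rank_fields": ["title", "text"],   # score only these fields
    "top_n": 1,                         # keep the single best match
    "return_documents": True,           # echo document text in the response
}

body = json.dumps(payload)  # JSON string ready to POST to /v1/rerank
```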
## RAG Pipeline Pattern

```python
from together import Together

client = Together()

# 1. Generate query embedding
query_embedding = client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",
    input="How does photosynthesis work?",
).data[0].embedding

# 2. Retrieve candidates from vector DB (your code)
candidates = vector_db.search(query_embedding, top_k=20)

# 3. Rerank for precision
reranked = client.rerank.create(
    model="mixedbread-ai/Mxbai-Rerank-Large-V2",
    query="How does photosynthesis work?",
    documents=[c.text for c in candidates],
    top_n=5,
)

# 4. Use top results as context for LLM
context = "\n".join(candidates[r.index].text for r in reranked.results)
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": "How does photosynthesis work?"},
    ],
)
```
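Step 2 of the pipeline delegates retrieval to a vector database. For small corpora you can approximate it with a brute-force cosine scan over stored embeddings; this is a minimal in-memory sketch, with toy 2-D vectors standing in for real embeddings.

```python
import math

def top_k_by_cosine(
    query_vec: list[float],
    corpus: list[list[float]],
    top_k: int = 2,
) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the top_k vectors most similar to the query."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (
            math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        )
    scored = [(i, cos(query_vec, vec)) for i, vec in enumerate(corpus)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

corpus = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
hits = top_k_by_cosine([1.0, 0.0], corpus, top_k=2)
```

A brute-force scan is O(n) per query, which is fine for thousands of documents; beyond that, a vector database with approximate nearest-neighbor indexing serves the same role.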
## Resources

- Model details: See references/models.md
- Runnable script (Python): See scripts/embed_and_rerank.py — embed, compute similarity, and rerank pipeline (v2 SDK)
- Runnable script (TypeScript): See scripts/embed_and_rerank.ts — minimal OpenAPI `x-codeSamples` extraction for embeddings + rerank (TypeScript SDK)
- Official docs: Embeddings Overview
- Official docs: Rerank Overview
- API reference: Embeddings API
- API reference: Rerank API