chromadb-integration-skills

ChromaDB Integration Skills

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "chromadb-integration-skills" with this command: npx skills add kimasplund/claude_cognitive_reasoning/kimasplund-claude-cognitive-reasoning-chromadb-integration-skills

ChromaDB Integration Skills

Purpose: This skill teaches agents how to integrate ChromaDB for semantic search, persistent storage, and pattern matching across ANY domain - research, code, trading, legal, documentation, and more.

Critical Use Case: When agents need to work with large datasets (1000+ items), perform semantic search, maintain persistent knowledge, or learn from historical patterns, ChromaDB eliminates token limits and enables powerful vector-based retrieval.

Used By: All agent types - researchers, developers, traders, legal analysts, documentation writers, QA testers, etc.

When to Use ChromaDB Integration

Use ChromaDB when:

  • Large Datasets: Working with 1000+ items (documents, code files, bugs, trades, contracts, etc.)

  • Semantic Search: Finding items by meaning, not just keywords

  • Persistent Memory: Knowledge needs to survive across sessions, days, months

  • Pattern Matching: Identifying similar historical cases/patterns for decision-making

  • Cross-Session Learning: Building institutional knowledge over time

  • Token Limits: Data too large to fit in context window (100K+ tokens)

  • Aggregation: Combining results from multiple queries/sources

Core ChromaDB Concepts

Collections

Definition: Named vector databases storing documents with embeddings and metadata

Naming Strategy:

  • Domain-based: {domain}{purpose}{identifier}

  • Examples:

  • Research: research_prior_art_blockchain_2024 , research_literature_ml_transformers

  • Code: codebase_api_endpoints , codebase_bug_patterns_auth

  • Trading: backtest_results_sma_strategy , market_conditions_spy_2024

  • Legal: case_law_patent_eligibility , contracts_saas_clauses

  • Documentation: api_docs_v2 , architecture_decisions_2024

Documents

Definition: Text content to be searched semantically

Best Practices:

  • Chunk Size: 200-500 words optimal (too small = context loss, too large = poor granularity)

  • Content Format: Title + summary + key details (e.g., "Patent US10123456 - Blockchain Authentication. Abstract: A method for..." ))

  • Deduplication: Use unique IDs to prevent duplicate storage

Metadata

Definition: Structured data for filtering, not semantic search

Strategy:

{ // Temporal filters "date": "2024-11-14", "year": 2024, "month": 11,

// Categorical filters "type": "bug_report", "category": "authentication", "severity": "high",

// Numeric filters "citations": 42, "price": 150.25, "performance_score": 0.87,

// Source tracking "source": "github_issue", "author": "kim-asplund", "url": "https://..." }

Embeddings

Definition: Vector representations enabling semantic similarity

How It Works:

  • ChromaDB automatically generates embeddings from document text

  • Similar meanings → similar vectors → close in vector space

  • Distance metrics (cosine, euclidean) measure similarity

Universal ChromaDB Workflow

Phase 1: Collection Design

// Step 1: Design collection strategy based on agent type const collectionStrategy = { research_agent: "One collection per research topic/question", code_agent: "Collections by codebase module/feature", trading_agent: "Collections by strategy/timeframe/symbol", legal_agent: "Collections by practice area/jurisdiction", documentation_agent: "Collections by project/version" };

// Step 2: Create collection with descriptive metadata mcp__chroma__create_collection({ collection_name: "{domain}{purpose}{identifier}", embedding_function_name: "default", // Uses sentence transformers metadata: { created_date: "2024-11-14", domain: "research|code|trading|legal|docs", purpose: "Descriptive purpose", total_items: 0, // Will update last_updated: "2024-11-14" } });

Phase 2: Data Ingestion

// Step 1: Batch data collection (minimize API calls) const items = collectAllItems(); // From API, files, database, etc.

// Step 2: Transform to ChromaDB format const documents = items.map(item => formatDocument(item)); const ids = items.map(item => item.id || generateUniqueId()); const metadatas = items.map(item => extractMetadata(item));

// Step 3: Batch insert (ChromaDB handles chunking automatically) mcp__chroma__add_documents({ collection_name: collectionName, documents: documents, ids: ids, metadatas: metadatas });

// Step 4: Update collection metadata mcp__chroma__modify_collection({ collection_name: collectionName, new_metadata: { ...existingMetadata, total_items: items.length, last_updated: new Date().toISOString() } });

Phase 3: Semantic Search

// Step 1: Formulate semantic query (natural language works!) const query = "authentication failures in production environment";

// Step 2: Execute semantic search with filters const results = mcp__chroma__query_documents({ collection_name: collectionName, query_texts: [query], n_results: 20, where: { "$and": [ { "environment": "production" }, { "severity": { "$in": ["high", "critical"] } }, { "date": { "$gte": "2024-01-01" } } ] }, include: ["documents", "metadatas", "distances"] });

// Step 3: Filter by semantic similarity (distance threshold) const highlyRelevant = results.ids[0].filter((id, idx) => results.distances[0][idx] < 0.3 // Adjust threshold based on use case );

// Step 4: Retrieve full details if needed const fullDetails = mcp__chroma__get_documents({ collection_name: collectionName, ids: highlyRelevant, include: ["documents", "metadatas"] });

Phase 4: Pattern Matching

// Cross-collection pattern detection const allCollections = mcp__chroma__list_collections(); const relevantCollections = allCollections.filter(c => c.startsWith(collectionPrefix) );

const patterns = []; for (const collection of relevantCollections) { const matches = mcp__chroma__query_documents({ collection_name: collection, query_texts: [patternQuery], n_results: 10, where: { "outcome": "success" } // Only successful cases });

if (matches.ids[0].length > 0) { patterns.push({ collection: collection, matches: matches, success_rate: calculateSuccessRate(matches) }); } }

// Identify best pattern const bestPattern = patterns.sort((a, b) => b.success_rate - a.success_rate )[0];

Use Case Templates

Template 1: Research Agent - Literature Review

Problem: Store 1000+ research papers, find semantically similar work

// Collection: research_literature_{topic} const papers = fetchPapersFromAPI("machine learning transformers");

mcp__chroma__create_collection({ collection_name: "research_literature_ml_transformers", metadata: { topic: "ML Transformers", papers_count: 0 } });

// Store papers with rich metadata papers.forEach(paper => { mcp__chroma__add_documents({ collection_name: "research_literature_ml_transformers", documents: [${paper.title}. ${paper.abstract}], ids: [paper.doi || paper.id], metadatas: [{ title: paper.title, authors: paper.authors.join(", "), year: paper.year, citations: paper.citation_count, venue: paper.venue, url: paper.url }] }); });

// Semantic search: "Find papers about attention mechanisms for vision" const relevant = mcp__chroma__query_documents({ collection_name: "research_literature_ml_transformers", query_texts: ["attention mechanisms computer vision"], n_results: 20, where: { "year": { "$gte": 2020 }, "citations": { "$gte": 50 } } });

Benefits: No token limits, semantic discovery, citation filtering, persistent library

Template 2: Code Agent - Bug Pattern Recognition

Problem: Store bug reports, identify similar issues, suggest solutions

// Collection: codebase_bug_patterns_{module} const bugs = fetchAllGitHubIssues("is:issue label:bug");

mcp__chroma__create_collection({ collection_name: "codebase_bug_patterns_auth", metadata: { module: "authentication", total_bugs: 0 } });

// Store bugs with solutions bugs.forEach(bug => { mcp__chroma__add_documents({ collection_name: "codebase_bug_patterns_auth", documents: [Bug #${bug.number}: ${bug.title}. ${bug.body}], ids: [bug_${bug.number}], metadatas: [{ number: bug.number, title: bug.title, severity: bug.labels.find(l => l.startsWith("severity:"))?.split(":")[1], status: bug.state, solution: bug.resolution || "No solution yet", created_at: bug.created_at, resolved_at: bug.closed_at, url: bug.html_url }] }); });

// New bug arrives - find similar historical bugs const newBugDescription = "User login fails with 401 error after password reset";

const similarBugs = mcp__chroma__query_documents({ collection_name: "codebase_bug_patterns_auth", query_texts: [newBugDescription], n_results: 10, where: { "status": "closed", "solution": { "$ne": "No solution yet" } } });

// Extract solution from most similar resolved bug const suggestedSolution = similarBugs.metadatas[0][0].solution;

Benefits: Instant bug pattern matching, solution reuse, similar issue detection

Template 3: Trading Agent - Backtest Results Database

Problem: Store 10,000+ backtest results, identify optimal parameter patterns

// Collection: backtest_results_{strategy_name} const backtests = runParameterSweep(strategyCode, parameterRanges);

mcp__chroma__create_collection({ collection_name: "backtest_results_sma_crossover", metadata: { strategy: "SMA Crossover", total_backtests: 0 } });

// Store each backtest with parameters + results backtests.forEach(backtest => { const description = SMA Crossover strategy with fast=${backtest.params.fast_period}, slow=${backtest.params.slow_period}, stop_loss=${backtest.params.stop_loss}. Market conditions: ${backtest.market_regime}, volatility=${backtest.avg_volatility}. ;

mcp__chroma__add_documents({ collection_name: "backtest_results_sma_crossover", documents: [description], ids: [backtest_${backtest.id}], metadatas: [{ fast_period: backtest.params.fast_period, slow_period: backtest.params.slow_period, stop_loss: backtest.params.stop_loss, sharpe_ratio: backtest.sharpe_ratio, max_drawdown: backtest.max_drawdown, win_rate: backtest.win_rate, total_return: backtest.total_return, market_regime: backtest.market_regime, symbol: backtest.symbol, timeframe: backtest.timeframe, start_date: backtest.start_date, end_date: backtest.end_date }] }); });

// Find optimal parameters for current market conditions const currentMarket = analyzeCurrentMarket(); const marketDescription = Market regime: ${currentMarket.regime}, volatility: ${currentMarket.volatility}, trend strength: ${currentMarket.trend_strength};

const optimalBacktests = mcp__chroma__query_documents({ collection_name: "backtest_results_sma_crossover", query_texts: [marketDescription], n_results: 20, where: { "$and": [ { "sharpe_ratio": { "$gte": 1.5 } }, { "max_drawdown": { "$lte": -0.15 } }, { "symbol": currentMarket.symbol } ] } });

// Extract best parameter set const bestParams = optimalBacktests.metadatas[0][0];

Benefits: Parameter optimization, market regime matching, performance pattern discovery

Template 4: Documentation Agent - Style Guide Enforcement

Problem: Store API documentation examples, ensure consistent style

// Collection: api_docs_{project_version} const existingDocs = parseAllApiDocs("./docs/api/");

mcp__chroma__create_collection({ collection_name: "api_docs_v2", metadata: { version: "2.0", total_endpoints: 0 } });

// Store documentation with style metadata existingDocs.forEach(doc => { mcp__chroma__add_documents({ collection_name: "api_docs_v2", documents: [doc.fullContent], ids: [doc.endpoint], metadatas: [{ endpoint: doc.endpoint, method: doc.method, category: doc.category, style_score: doc.styleScore, // Computed during ingestion has_examples: doc.examples.length > 0, has_error_codes: doc.errorCodes.length > 0, last_updated: doc.lastModified }] }); });

// New endpoint documented - find similar endpoints for style consistency const newEndpoint = "POST /api/v2/users/{id}/preferences";

const similarEndpoints = mcp__chroma__query_documents({ collection_name: "api_docs_v2", query_texts: [${newEndpoint} user preferences update], n_results: 5, where: { "$and": [ { "method": "POST" }, { "style_score": { "$gte": 0.9 } }, { "has_examples": true } ] } });

// Use similar endpoint as template const template = similarEndpoints.documents[0][0];

Benefits: Style consistency, template discovery, automated quality checks

Template 5: QA Testing Agent - Test Pattern Library

Problem: Store test cases, identify gaps, suggest new tests

// Collection: test_cases_{module} const existingTests = parseTestFiles("./tests/");

mcp__chroma__create_collection({ collection_name: "test_cases_authentication", metadata: { module: "authentication", total_tests: 0 } });

// Store test cases with coverage metadata existingTests.forEach(test => { mcp__chroma__add_documents({ collection_name: "test_cases_authentication", documents: [${test.description}. Covers: ${test.coveredScenarios.join(", ")}], ids: [test.id], metadatas: [{ test_type: test.type, // "unit", "integration", "e2e" file_path: test.filePath, line_number: test.lineNumber, last_run: test.lastRun, status: test.lastStatus, execution_time_ms: test.executionTime, assertions: test.assertionCount }] }); });

// New feature added - identify missing test coverage const newFeature = "Password reset with 2FA verification";

const existingCoverage = mcp__chroma__query_documents({ collection_name: "test_cases_authentication", query_texts: [newFeature], n_results: 10 });

// If distance > 0.5, probably not covered const isCovered = existingCoverage.distances[0][0] < 0.5;

if (!isCovered) { // Suggest test cases based on similar features const similarFeatures = mcp__chroma__query_documents({ collection_name: "test_cases_authentication", query_texts: ["password reset", "2FA verification"], n_results: 5 });

// Use similar tests as templates const testTemplates = similarFeatures.documents[0]; }

Benefits: Coverage gap detection, test template discovery, pattern-based test generation

Advanced Patterns

Pattern 1: Multi-Collection Aggregation

Use Case: Search across multiple related collections simultaneously

// Example: Search all research topics for a cross-cutting concept const researchCollections = mcp__chroma__list_collections(); const topicCollections = researchCollections.filter(c => c.startsWith("research_literature_") );

const crossTopicResults = []; for (const collection of topicCollections) { const results = mcp__chroma__query_documents({ collection_name: collection, query_texts: ["transfer learning"], n_results: 10 });

crossTopicResults.push({ topic: collection.replace("research_literature_", ""), papers: results }); }

// Aggregate and rank by relevance across topics const allPapers = crossTopicResults.flatMap(r => r.papers.ids[0].map((id, idx) => ({ id: id, topic: r.topic, distance: r.papers.distances[0][idx], metadata: r.papers.metadatas[0][idx] })) );

const rankedPapers = allPapers.sort((a, b) => a.distance - b.distance);

Pattern 2: Hierarchical Collections

Use Case: Parent-child relationship between collections

// Parent: codebase_architecture_decisions // Children: codebase_architecture_decisions_{year}

// Create parent collection with aggregated data mcp__chroma__create_collection({ collection_name: "codebase_architecture_decisions", metadata: { type: "parent", child_collections: [] } });

// Create child collections by year [2022, 2023, 2024].forEach(year => { mcp__chroma__create_collection({ collection_name: codebase_architecture_decisions_${year}, metadata: { type: "child", parent: "codebase_architecture_decisions", year } }); });

// Query strategy: Try child first (faster), fallback to parent const queryYear = 2024; let results = mcp__chroma__query_documents({ collection_name: codebase_architecture_decisions_${queryYear}, query_texts: [query], n_results: 10 });

if (results.ids[0].length < 5) { // Not enough results in child, query parent results = mcp__chroma__query_documents({ collection_name: "codebase_architecture_decisions", query_texts: [query], n_results: 10 }); }

Pattern 3: Temporal Decay

Use Case: Prioritize recent items while keeping historical context

// Store items with temporal metadata mcp__chroma__add_documents({ collection_name: collectionName, documents: documents, ids: ids, metadatas: metadatas.map(m => ({ ...m, timestamp: Date.now(), age_days: 0 // Will be updated })) });

// Query with temporal boost const results = mcp__chroma__query_documents({ collection_name: collectionName, query_texts: [query], n_results: 50 // Get more results for re-ranking });

// Re-rank with temporal decay const now = Date.now(); const rankedResults = results.ids[0].map((id, idx) => { const ageDays = (now - results.metadatas[0][idx].timestamp) / (1000 * 60 * 60 * 24); const decayFactor = Math.exp(-ageDays / 30); // Half-life ~30 days const semanticScore = 1 - results.distances[0][idx]; const combinedScore = semanticScore * 0.7 + decayFactor * 0.3;

return { id: id, semantic_score: semanticScore, decay_factor: decayFactor, combined_score: combinedScore, metadata: results.metadatas[0][idx] }; }).sort((a, b) => b.combined_score - a.combined_score);

Performance Optimization

Batching Strategy

// BAD: One document at a time (slow) for (const item of items) { mcp__chroma__add_documents({ collection_name: collectionName, documents: [item.document], ids: [item.id], metadatas: [item.metadata] }); }

// GOOD: Batch insert (100x faster) const BATCH_SIZE = 100; for (let i = 0; i < items.length; i += BATCH_SIZE) { const batch = items.slice(i, i + BATCH_SIZE); mcp__chroma__add_documents({ collection_name: collectionName, documents: batch.map(item => item.document), ids: batch.map(item => item.id), metadatas: batch.map(item => item.metadata) }); }

Caching Strategy

// Check collection exists before creating const existingCollections = mcp__chroma__list_collections(); if (!existingCollections.includes(collectionName)) { mcp__chroma__create_collection({ collection_name: collectionName }); }

// Check document exists before adding const existing = mcp__chroma__get_documents({ collection_name: collectionName, ids: [documentId] });

if (!existing.ids || existing.ids.length === 0) { // Document doesn't exist, add it mcp__chroma__add_documents({ ... }); } else { // Document exists, update instead mcp__chroma__update_documents({ ... }); }

Query Optimization

// Use metadata filters to reduce search space const results = mcp__chroma__query_documents({ collection_name: collectionName, query_texts: [query], n_results: 20, where: { // Pre-filter with metadata (faster than post-filtering semantic results) "date": { "$gte": "2024-01-01" }, "category": { "$in": ["high_priority", "critical"] } } });

// Only include what you need const minimalResults = mcp__chroma__query_documents({ collection_name: collectionName, query_texts: [query], n_results: 10, include: ["metadatas", "distances"] // Exclude documents if not needed });

Success Criteria

ChromaDB integration is SUCCESSFUL when:

  • ✅ Collections Created: Meaningful naming, appropriate metadata

  • ✅ Data Ingested: Batched efficiently, deduplicated

  • ✅ Semantic Search Works: Returns relevant results (distance < 0.4)

  • ✅ Metadata Filters Applied: Correctly scopes search space

  • ✅ Performance Optimized: Batching, caching, minimal queries

  • ✅ Cross-Collection Queries: When appropriate for use case

  • ✅ Persistent Knowledge: Data survives across sessions

  • ✅ Pattern Matching: Identifies similar historical cases

  • ✅ Token Limits Eliminated: Handles 1000+ items without context overflow

Skill Version: 1.0 Created: 2025-11-14 Purpose: Teach universal ChromaDB integration patterns for all agent types Target Quality: 65/70 Dependencies: ChromaDB MCP (mcp__chroma__*) Universal: Works for research, code, trading, legal, documentation, QA, and all other domains

🔴 Error Handling & Resilience (Priority 1)

Critical for Production: Prevent data loss, handle failures gracefully

Retry with Exponential Backoff

async function retryWithBackoff(operation, maxRetries = 3) { for (let attempt = 0; attempt < maxRetries; attempt++) { try { return await operation(); } catch (error) { if (attempt === maxRetries - 1) throw error; const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s await sleep(delay); } } }

Get-or-Create Pattern

function getOrCreateCollection(collectionName, metadata = {}) { const collections = mcp__chroma__list_collections();

if (collections.includes(collectionName)) { return { created: false, collection_name: collectionName }; }

mcp__chroma__create_collection({ collection_name: collectionName, embedding_function_name: "default", metadata: metadata });

return { created: true, collection_name: collectionName }; }

Document Validation

function validateDocument(document, id, metadata) { if (!document || typeof document !== 'string') { throw new Error(Document must be non-empty string); } if (!id || id.includes(' ')) { throw new Error(ID must be non-empty string without spaces); } if (metadata && typeof metadata !== 'object') { throw new Error(Metadata must be object); } }

See full patterns: Load chromadb-error-handling sub-skill

🧪 Testing Patterns (Priority 1)

Essential for Quality: Ensure ChromaDB integrations work correctly

Unit Tests (Mock ChromaDB)

// Mock ChromaDB for fast unit tests class MockChromaDB { constructor() { this.collections = {}; }

create_collection({ collection_name, metadata }) { this.collections[collection_name] = { documents: [], ids: [], metadatas: [], metadata }; }

query_documents({ collection_name, query_texts, n_results }) { const collection = this.collections[collection_name]; return { ids: [collection.ids.slice(0, n_results)], distances: [collection.ids.slice(0, n_results).map(() => 0.2)] }; } }

Integration Tests (Real ChromaDB)

describe('Semantic Search Integration', () => { test('returns relevant documents', async () => { await chromaClient.add({ collection_name: testCollection, documents: [ 'Machine learning uses neural networks', 'Python is a programming language' ], ids: ['doc1', 'doc2'] });

const results = await chromaClient.query({
  collection_name: testCollection,
  query_texts: ['neural networks deep learning'],
  n_results: 2
});

expect(results.ids[0]).toContain('doc1');
expect(results.distances[0][0]).toBeLessThan(0.4);

}); });

See full patterns: Load chromadb-testing-patterns sub-skill

🔐 Security & Privacy (Priority 2)

Critical for Compliance: Protect PII, sanitize data

PII Redaction

function redactPII(text) { let redacted = text;

// Email redaction redacted = redacted.replace( /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Z|a-z]{2,}\b/g, '[EMAIL_REDACTED]' );

// Phone redaction redacted = redacted.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]');

// SSN redaction redacted = redacted.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]');

return redacted; }

Secret Sanitization

function redactSecrets(text) { let redacted = text;

// GitHub tokens redacted = redacted.replace(/ghp_[a-zA-Z0-9]{36}/g, '[GITHUB_TOKEN]');

// AWS keys redacted = redacted.replace(/AKIA[0-9A-Z]{16}/g, '[AWS_KEY]');

// API keys redacted = redacted.replace( /api[-]?key['"]?\s*[:=]\s*['"]?([a-zA-Z0-9-]{20,})/gi, 'api_key: [REDACTED]' );

return redacted; }

Access Control

const collectionPermissions = { 'research_confidential': ['research_team', 'admin'], 'customer_pii': ['support_team', 'admin'] };

function checkAccess(collectionName, userRole) { const allowedRoles = collectionPermissions[collectionName] || ['admin'];

if (!allowedRoles.includes(userRole)) { throw new Error(Access denied for role '${userRole}'); } }

See full patterns: Load chromadb-security-patterns sub-skill

🗄️ Data Lifecycle Management (Priority 2)

Sustain Production: Version collections, archive old data

Schema Versioning

// Semantic versioning for collections function createVersionedCollection(domain, purpose, version = 'v1') { const collectionName = ${domain}_${purpose}_${version};

mcp__chroma__create_collection({ collection_name: collectionName, metadata: { version: version, created_at: new Date().toISOString(), schema_version: '1.0', retention_days: 730, // 2 years lifecycle_stage: 'active' } });

return collectionName; }

Schema Migration

async function migrateCollectionSchema(oldCollection, newVersion) { const newCollection = ${oldCollection}_v${newVersion};

// Create new collection await mcp__chroma__create_collection({ collection_name: newCollection, metadata: { migrated_from: oldCollection, version: newVersion } });

// Copy all documents with transformed metadata const allDocs = await mcp__chroma__get_documents({ collection_name: oldCollection, limit: 100000 });

const transformedMetadatas = allDocs.metadatas.map(transformMetadata);

await mcp__chroma__add_documents({ collection_name: newCollection, documents: allDocs.documents, ids: allDocs.ids, metadatas: transformedMetadatas });

// Mark old as deprecated await mcp__chroma__modify_collection({ collection_name: oldCollection, new_metadata: { lifecycle_stage: 'deprecated', replacement: newCollection } }); }

Retention Enforcement

async function enforceRetentionPolicies() { const collections = await mcp__chroma__list_collections();

for (const collectionName of collections) { const info = await mcp__chroma__get_collection_info({ collection_name }); const retentionDays = info.metadata.retention_days || 730; const ageDays = calculateAgeDays(info.metadata.created_at);

if (ageDays > retentionDays) {
  await archiveCollection(collectionName);  // Backup first
  await mcp__chroma__delete_collection({ collection_name: collectionName });
}

} }

See full patterns: Load chromadb-lifecycle-management sub-skill

🐛 Debugging & Troubleshooting (Priority 1)

Common Issues

Issue: No results returned

  • Cause: Distance threshold too strict, wrong collection

  • Fix: Increase threshold (0.3 → 0.5), verify collection name

  • Debug: Check results.distances[0] values

Issue: Poor semantic matches

  • Cause: Document chunking too large/small

  • Fix: Optimal chunk size 200-500 words

  • Debug: Review document length, split long documents

Issue: Slow queries

  • Cause: Large collection without metadata filters

  • Fix: Add metadata pre-filters (where clause)

  • Debug: Check collection size, add filters

Distance Threshold Guide

Distance Similarity Use Case

< 0.2 Almost exact Duplicate detection

0.2-0.3 Very similar High precision search

0.3-0.5 Moderately similar Balanced search

0.5-0.7 Weakly similar Broad exploration

0.7 Different topics Not relevant

Antipatterns to Avoid

❌ Storing entire files as single document

  • Loses granularity, poor search relevance

  • ✅ Fix: Chunk into 200-500 word sections

❌ No metadata filters on large collections

  • Slow queries, high latency

  • ✅ Fix: Always filter by date, category, type

❌ Not deduplicating documents

  • Wasted storage, duplicate results

  • ✅ Fix: Check existence before adding

❌ Ignoring connection failures

  • Data loss, silent failures

  • ✅ Fix: Implement retry logic, fallback

📚 Sub-Skill Reference

Load targeted sub-skills for deep dives:

  • chromadb-error-handling: Retry patterns, validation, circuit breakers (~150 lines)

  • chromadb-testing-patterns: Unit/integration tests, mocking, fixtures (~120 lines)

  • chromadb-security-patterns: PII redaction, access control, GDPR compliance (~90 lines)

  • chromadb-lifecycle-management: Versioning, migration, archival, retention (~100 lines)

Usage: Skill({ skill: "chromadb-error-handling" })

Updated Success Criteria

ChromaDB integration is PRODUCTION-READY when:

Core Functionality (Original):

  • ✅ Collections created with meaningful naming

  • ✅ Data ingested efficiently (batching)

  • ✅ Semantic search returns relevant results

  • ✅ Metadata filters applied correctly

  • ✅ Performance optimized

Production Readiness (New):

  • ✅ Error Handling: Retry logic, validation, graceful degradation

  • ✅ Testing: Unit tests (mocked), integration tests (real ChromaDB)

  • ✅ Security: PII redacted, secrets sanitized, access control

  • ✅ Lifecycle: Versioning strategy, retention policies, archival

Quality Score: 65/70 → 85/100 (with all enhancements)

Skill Version: 2.0 Updated: 2025-11-14 Enhancements: Error handling, testing, security, lifecycle management Quality Score: 85/100 (Production-Ready) Dependencies: ChromaDB MCP (mcp__chroma__*) Sub-Skills: 4 modular sub-skills for targeted loading

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

integrated-reasoning

No summary provided by upstream source.

Repository SourceNeeds Review
General

error-handling-skills

No summary provided by upstream source.

Repository SourceNeeds Review
General

document-writing-skills

No summary provided by upstream source.

Repository SourceNeeds Review
General

breadth-of-thought

No summary provided by upstream source.

Repository SourceNeeds Review