Semantic Search Technique
Purpose
Decision framework and execution guide for using semantic search effectively in CodeCompass.
When to Use Semantic Search
✅ Use Semantic Search When
- Concept-based Queries
  - "Find code that handles payment processing"
  - "Where do we validate email addresses?"
  - "Show me error handling patterns"
- Intent-based Queries
  - "How is user authentication implemented?"
  - "What code calculates shipping costs?"
  - "Find business rules for order approval"
- Cross-language/Cross-file
  - Searching across PHP, TypeScript, config files
  - Pattern discovery across multiple modules
  - Finding similar implementations
- Fuzzy/Exploratory
  - User doesn't know exact class/function names
  - Exploring unfamiliar codebase
  - "Code that does something like X"
- Natural Language
  - "Show me all database migrations"
  - "Find controllers that handle file uploads"
  - "Where are API rate limits defined?"
❌ Use Grep/Glob When
- Exact Matches
  - "Find class named PaymentController"
  - "Where is processPayment function defined?"
  - "Find all imports of UserService"
- Syntax Patterns
  - "Find all functions starting with get"
  - "Show me all @Injectable() decorators"
  - "Find TypeScript interfaces"
- Performance Critical
  - Quick lookups in known files
  - Repeated searches in tight loops
  - When you know exact location
- Structural Queries
  - "Find all .ts files in src/modules"
  - "List all test files"
  - "Show directory structure"
Execution Guide
Step 1: Formulate Effective Query
❌ Bad Queries (too vague):
- "payment"
- "code"
- "function"
✅ Good Queries (specific context):
- "business logic for processing customer payments and updating order status"
- "validation rules for user email and password requirements"
- "error handling patterns for database connection failures"
Why: More context = better semantic matching
Formula:
[Action/Purpose] for [Specific Entity] with [Context/Constraints]
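The formula above can be applied mechanically. The following is a minimal sketch, assuming a hypothetical helper named `buildQuery` (not part of CodeCompass) that fills the three slots and rejects queries missing an action or an entity:

```typescript
// Hypothetical helper: assembles a query string from the three formula slots.
interface QueryParts {
  action: string;   // [Action/Purpose]
  entity: string;   // [Specific Entity]
  context?: string; // [Context/Constraints], optional
}

function buildQuery(parts: QueryParts): string {
  const action = parts.action.trim();
  const entity = parts.entity.trim();
  if (action === "" || entity === "") {
    // A bare keyword such as "payment" is exactly the vague query to avoid.
    throw new Error("both an action and an entity are required");
  }
  const context = parts.context?.trim();
  const base = `${action} for ${entity}`;
  return context ? `${base} with ${context}` : base;
}

const query = buildQuery({
  action: "Validation logic",
  entity: "user registration",
  context: "email verification",
});
console.log(query);
```

The resulting string can be passed to `codecompass search:semantic` unchanged.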
Examples:
- "Extract business capabilities from Yii2 controllers"
- "Validation logic for user registration with email verification"
- "Database migration patterns for schema versioning"
Step 2: Verify Indexing
Before searching, make sure the codebase has been indexed:
Check if indexed
curl http://localhost:8081/v1/schema
Should show collections like:
- CodeContext
- AtlasCode
If not indexed:
codecompass batch:index <path-to-codebase>
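The schema check can also be automated. The sketch below assumes the parsed JSON body of `GET /v1/schema` has Weaviate's usual shape (a `classes` array whose entries carry a `class` name); `missingCollections` is a hypothetical helper, not a CodeCompass function:

```typescript
// Only the part of the Weaviate schema response that this check needs.
interface WeaviateSchema {
  classes?: { class: string }[];
}

// Returns the required collections that are absent from the schema.
function missingCollections(schema: WeaviateSchema, required: string[]): string[] {
  const present = new Set((schema.classes ?? []).map((c) => c.class));
  return required.filter((name) => !present.has(name));
}

// Example input: a schema in which only one of the two collections exists.
const exampleSchema: WeaviateSchema = { classes: [{ class: "CodeContext" }] };
const missing = missingCollections(exampleSchema, ["CodeContext", "AtlasCode"]);
if (missing.length > 0) {
  console.log(`Not fully indexed, missing: ${missing.join(", ")}`);
}
```

An empty result suggests both collections exist; it does not prove that every file in the codebase was indexed.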
Step 3: Execute Search
codecompass search:semantic "business logic for payment processing"
Alternative (if using as library):
const results = await searchService.semanticSearch({
  query: "business logic for payment processing",
  limit: 10,
  certainty: 0.7, // Minimum relevance score
});
Step 4: Interpret Results
Check relevance scores:
- >0.8: Highly relevant (very close match)
- 0.7-0.8: Good match (related)
- 0.6-0.7: Moderate match (possibly relevant)
- <0.6: Weak match (may be noise)
Verify context:
- Does the returned code actually match the intent?
- Are the results from the expected modules?
- Are multiple related files found (a good signal), or only isolated, random matches (a sign to refine the query)?
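The score bands above can be turned into a small triage step. This is a sketch only: the `SearchHit` shape and both function names are assumptions, and treating a score of exactly 0.8 as "good" rather than "high" is a choice made here, since the bands do not say which side the boundary falls on.

```typescript
type Band = "high" | "good" | "moderate" | "weak";

// Assumed result shape; the real CodeCompass result type may differ.
interface SearchHit {
  filePath: string;
  score: number;
}

// Maps a relevance score to the bands listed in this guide.
function relevanceBand(score: number): Band {
  if (score > 0.8) return "high";
  if (score >= 0.7) return "good";
  if (score >= 0.6) return "moderate";
  return "weak";
}

// Keeps hits at or above a minimum score, best first.
function keepRelevant(hits: SearchHit[], minScore = 0.7): SearchHit[] {
  return hits
    .filter((hit) => hit.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

const hits: SearchHit[] = [
  { filePath: "src/payment/gateway.ts", score: 0.74 },
  { filePath: "src/util/strings.ts", score: 0.52 },
  { filePath: "src/payment/service.ts", score: 0.86 },
];
for (const hit of keepRelevant(hits)) {
  console.log(`${relevanceBand(hit.score)}\t${hit.filePath}`);
}
```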
Step 5: Refine if Needed
Too many results (>50):
- Add more specific context to query
- Increase certainty threshold
- Add domain constraints ("in authentication module")
Too few results (<3):
- Broaden query (less specific)
- Lower certainty threshold
- Check if area is actually indexed
- Try related terms/synonyms
Wrong results:
- Rephrase query with different terminology
- Add negative constraints
- Try breaking into multiple specific queries
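The threshold adjustments above can be sketched as a function of the result count. The step size (0.05) and the clamp range (0.5 to 0.95) are illustrative assumptions, not CodeCompass defaults, and `suggestRefinement` is a hypothetical name:

```typescript
interface Refinement {
  certainty: number; // threshold to use for the next attempt
  advice: string;    // what to change about the query itself
}

// Rounds to two decimals so repeated adjustments do not drift.
function round2(value: number): number {
  return Math.round(value * 100) / 100;
}

function suggestRefinement(resultCount: number, certainty: number): Refinement {
  const step = 0.05; // assumed step size
  if (resultCount > 50) {
    return {
      certainty: Math.min(0.95, round2(certainty + step)),
      advice: "Add more specific context or a domain constraint.",
    };
  }
  if (resultCount < 3) {
    return {
      certainty: Math.max(0.5, round2(certainty - step)),
      advice: "Broaden the query, try synonyms, and confirm the area is indexed.",
    };
  }
  return { certainty, advice: "Result count is workable; review the matches." };
}

console.log(suggestRefinement(120, 0.7));
```

Wrong results (as opposed to too many or too few) are not a threshold problem, so the sketch leaves rephrasing to the person running the search.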
Behind the Scenes
Architecture
Query Text
    ↓
Ollama Embedding (mxbai-embed-large)
    ↓
1024-dimensional vector
    ↓
Weaviate Vector Search (cosine similarity)
    ↓
Ranked Results
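To make the ranking step of this pipeline concrete, here is a brute-force sketch of cosine-similarity ranking over a few tiny vectors. It only illustrates the metric: Weaviate uses an approximate HNSW index rather than a full scan, real embeddings have 1024 dimensions, and the `IndexedChunk` shape is an assumption.

```typescript
// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: no direction to compare
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed shape of an indexed code chunk with its embedding.
interface IndexedChunk {
  filePath: string;
  vector: number[];
}

// Scores every chunk against the query vector and returns them best first.
function rankBySimilarity(queryVector: number[], chunks: IndexedChunk[]) {
  return chunks
    .map((chunk) => ({
      filePath: chunk.filePath,
      score: cosineSimilarity(queryVector, chunk.vector),
    }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rankBySimilarity([1, 0.1], [
  { filePath: "src/report/export.ts", vector: [0, 1] },
  { filePath: "src/payment/service.ts", vector: [1, 0] },
]);
console.log(ranked.map((r) => r.filePath));
```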
Key Components
From .ai/capabilities.json:
- Module: search, vectorizer, weaviate
- Embedding: Ollama mxbai-embed-large (1024 dimensions)
- Vector DB: Weaviate with HNSW indexing
- Collections: CodeContext, AtlasCode
Configuration (from .env):
EMBEDDING_SERVICE=ollama
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
OLLAMA_URL=http://localhost:11434
CODECOMPASS_WEAVIATE_URL=http://localhost:8081
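When using these settings from code, a small typed loader keeps them in one place. The sketch below is an assumption about how one might read them, not how CodeCompass does it; `loadSearchConfig` is a hypothetical name, and falling back to the values shown above is a choice made for illustration.

```typescript
interface SearchConfig {
  embeddingService: string;
  embeddingModel: string;
  ollamaUrl: string;
  weaviateUrl: string;
}

// Reads the four variables from an environment-like map, with assumed fallbacks.
function loadSearchConfig(env: Record<string, string | undefined>): SearchConfig {
  const config: SearchConfig = {
    embeddingService: env["EMBEDDING_SERVICE"] ?? "ollama",
    embeddingModel: env["OLLAMA_EMBEDDING_MODEL"] ?? "mxbai-embed-large",
    ollamaUrl: env["OLLAMA_URL"] ?? "http://localhost:11434",
    weaviateUrl: env["CODECOMPASS_WEAVIATE_URL"] ?? "http://localhost:8081",
  };
  // Fail early on values that are clearly not HTTP endpoints.
  for (const url of [config.ollamaUrl, config.weaviateUrl]) {
    if (!/^https?:\/\/\S+$/.test(url)) {
      throw new Error(`not an http(s) URL: ${url}`);
    }
  }
  return config;
}

console.log(loadSearchConfig({ CODECOMPASS_WEAVIATE_URL: "http://weaviate:8080" }));
```

In a Node.js process, `process.env` can be passed as the argument.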
Advanced Patterns
Pattern 1: Multi-Query Exploration
For complex questions, break into multiple searches:
Instead of:
"authentication and authorization and session management"
Do:
codecompass search:semantic "user authentication login process"
codecompass search:semantic "authorization and access control"
codecompass search:semantic "session management and tokens"
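Running several targeted queries produces overlapping result sets, so it helps to merge them afterwards. The following is a sketch under the assumption that each hit carries a `filePath` and a `score`; `mergeResults` is a hypothetical helper that keeps the best score per file.

```typescript
// Assumed result shape; adjust to whatever the search actually returns.
interface QueryHit {
  filePath: string;
  score: number;
}

// Merges result batches, keeping the highest-scoring hit per file, best first.
function mergeResults(batches: QueryHit[][]): QueryHit[] {
  const best = new Map<string, QueryHit>();
  for (const batch of batches) {
    for (const hit of batch) {
      const current = best.get(hit.filePath);
      if (current === undefined || hit.score > current.score) {
        best.set(hit.filePath, hit);
      }
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}

const merged = mergeResults([
  [{ filePath: "src/auth/login.ts", score: 0.72 }, { filePath: "src/auth/session.ts", score: 0.9 }],
  [{ filePath: "src/auth/login.ts", score: 0.81 }],
]);
console.log(merged);
```

A file that appears in more than one batch is often the most interesting one, since it touches several facets of the original question.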
Pattern 2: Iterative Refinement
1. Broad search
codecompass search:semantic "payment processing"
2. Review results, identify specific module
3. Narrow search
codecompass search:semantic "payment gateway integration in PaymentController"
4. Pinpoint implementation
codecompass search:semantic "Stripe API call for processing credit cards"
Pattern 3: Cross-Domain Search
Search across different aspects:
Code implementation
codecompass search:semantic "email validation logic"
Tests
codecompass search:semantic "test cases for email validation"
Configuration
codecompass search:semantic "email service configuration"
Common Pitfalls
❌ Pitfall 1: Searching Before Indexing
Symptom: No results or error
Solution: Run codecompass batch:index first
❌ Pitfall 2: Too Vague Queries
Symptom: Returns everything or nothing useful
Solution: Add specific context and intent
❌ Pitfall 3: Expecting Exact Matches
Symptom: "Why didn't it find function processPayment?"
Reason: Semantic search is for concepts, not exact names
Solution: Use grep for exact matches
❌ Pitfall 4: Ignoring Relevance Scores
Symptom: Reading irrelevant results
Solution: Filter by score >0.7, ignore weak matches
❌ Pitfall 5: Single Query for Complex Questions
Symptom: Poor results for multi-faceted questions
Solution: Break into multiple targeted queries
Decision Tree
┌─────────────────────────────────────┐
│ I need to find code that...         │
└─────────────────────────────────────┘
                 ↓
            ┌─────────┐
            │ Know    │  Exact class/function name?
            │ exact   │
            │ name?   │
            └─────────┘
             ↙       ↘
           YES        NO
            ↓          ↓
        Use Grep   ┌─────────┐
                   │ Concept │  Searching by meaning/purpose?
                   │ search? │
                   └─────────┘
                    ↙       ↘
                  YES        NO
                   ↓          ↓
               Semantic   ┌─────────┐
               Search     │ Pattern │  Looking for code pattern?
                          │ match?  │
                          └─────────┘
                           ↙       ↘
                         YES        NO
                          ↓          ↓
                      Use Glob   Use both
                                 (Glob + Semantic)
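The decision tree reads naturally as a chain of early returns. This sketch mirrors its three questions in order; the type and function names are illustrative and not part of CodeCompass.

```typescript
type SearchTool = "grep" | "semantic" | "glob" | "glob+semantic";

// One flag per question in the decision tree.
interface SearchNeed {
  knowsExactName: boolean;     // Exact class/function name?
  searchingByMeaning: boolean; // Searching by meaning/purpose?
  lookingForPattern: boolean;  // Looking for a code pattern?
}

// Questions are checked in the same order as in the tree.
function chooseSearchTool(need: SearchNeed): SearchTool {
  if (need.knowsExactName) return "grep";
  if (need.searchingByMeaning) return "semantic";
  if (need.lookingForPattern) return "glob";
  return "glob+semantic";
}

console.log(
  chooseSearchTool({ knowsExactName: false, searchingByMeaning: true, lookingForPattern: false }),
);
```

Because the exact-name check comes first, a request that names `processPayment` goes to grep even if it is phrased as a concept question.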
Performance Considerations
Speed
- Grep: Milliseconds (fast, synchronous)
- Semantic Search: 100-500ms (embedding + vector search)
Tradeoff: Semantic search is slower but finds conceptually related code
Token Cost (Embeddings)
- Each query → 1 embedding generation
- Ollama local → No API cost
- But consumes local compute
Scaling
- Small codebase (<1K files): Either method fine
- Medium codebase (1K-10K files): Semantic search advantage grows
- Large codebase (>10K files): Semantic search essential
Integration with Other Tools
With Yii2 Analysis
1. Analyze Yii2 project
codecompass analyze:yii2 <path>
2. Index results
codecompass batch:index <path>
3. Explore with semantic search
codecompass search:semantic "Yii2 controller actions for user management"
With Requirements Extraction
1. Extract requirements
codecompass requirements:extract
2. Search extracted requirements
codecompass search:semantic "business rules for order validation"
With Weaviate Direct Query
Alternative: Query Weaviate GraphQL API directly
curl -X POST http://localhost:8081/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ Get { CodeContext(nearText: { concepts: [\"payment processing\"] }, limit: 10) { content filePath } } }"}'
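Hand-escaping a GraphQL query inside JSON inside a shell string is error-prone, so it can be easier to build the request body in code. This is a minimal sketch: `buildNearTextBody` is a hypothetical helper, the `content` and `filePath` fields are taken from the query above, and `_additional { certainty }` is added on the assumption that the relevance score is wanted. Note that `nearText` generally needs a text vectorizer module configured on the collection.

```typescript
// Builds the JSON body for POST /v1/graphql with a nearText search.
function buildNearTextBody(collection: string, concepts: string[], limit = 10): string {
  if (!/^[A-Z][A-Za-z0-9_]*$/.test(collection)) {
    throw new Error(`unexpected collection name: ${collection}`);
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const query = `{
  Get {
    ${collection}(
      nearText: { concepts: ${JSON.stringify(concepts)} }
      limit: ${limit}
    ) {
      content
      filePath
      _additional { certainty }
    }
  }
}`;
  // JSON.stringify escapes the quotes and newlines that break hand-written bodies.
  return JSON.stringify({ query });
}

console.log(buildNearTextBody("CodeContext", ["payment processing"]));
```

The returned string can be sent as the request body with a `Content-Type: application/json` header.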
Related Skills
- 0-discover-capabilities.md: How to discover modules
- analyze-yii2-project.md: Uses semantic search in workflow
Related Modules
From .ai/capabilities.json:
- search: SearchService, IntegratedSearchService
- vectorizer: Ollama embedding generation
- weaviate: Vector database client
- indexing: File indexing pipeline
Remember: Semantic search finds code by meaning, not by name. Choose the right tool for the job.