semantic-search

Semantic Search Technique

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "semantic-search" with this command: npx skills add pearlthoughts/codecompass/pearlthoughts-codecompass-semantic-search

Purpose

Decision framework and execution guide for using semantic search effectively in CodeCompass.

When to Use Semantic Search

✅ Use Semantic Search When

  1. Concept-based Queries
  • "Find code that handles payment processing"

  • "Where do we validate email addresses?"

  • "Show me error handling patterns"

  2. Intent-based Queries
  • "How is user authentication implemented?"

  • "What code calculates shipping costs?"

  • "Find business rules for order approval"

  3. Cross-language/Cross-file
  • Searching across PHP, TypeScript, config files

  • Pattern discovery across multiple modules

  • Finding similar implementations

  4. Fuzzy/Exploratory
  • User doesn't know exact class/function names

  • Exploring an unfamiliar codebase

  • "Code that does something like X"

  5. Natural Language
  • "Show me all database migrations"

  • "Find controllers that handle file uploads"

  • "Where are API rate limits defined?"

❌ Use Grep/Glob When

  1. Exact Matches
  • "Find class named PaymentController"

  • "Where is processPayment function defined?"

  • "Find all imports of UserService"

  2. Syntax Patterns
  • "Find all functions starting with get"

  • "Show me all @Injectable() decorators"

  • "Find TypeScript interfaces"

  3. Performance Critical
  • Quick lookups in known files

  • Repeated searches in tight loops

  • When you know the exact location

  4. Structural Queries
  • "Find all .ts files in src/modules"

  • "List all test files"

  • "Show directory structure"

Execution Guide

Step 1: Formulate Effective Query

❌ Bad Queries (too vague):

  • "payment"

  • "code"

  • "function"

✅ Good Queries (specific context):

  • "business logic for processing customer payments and updating order status"

  • "validation rules for user email and password requirements"

  • "error handling patterns for database connection failures"

Why: More context = better semantic matching

Formula:

[Action/Purpose] for [Specific Entity] with [Context/Constraints]

Examples:

  • "Extract business capabilities from Yii2 controllers"

  • "Validation logic for user registration with email verification"

  • "Database migration patterns for schema versioning"
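
The formula above can be sketched as a small helper that assembles a query from its three parts. This is an illustrative sketch only: `buildQuery` and `QueryParts` are hypothetical names, not part of CodeCompass.

```typescript
// Hypothetical helper: assembles a query string from the formula
// [Action/Purpose] for [Specific Entity] with [Context/Constraints].
interface QueryParts {
  action: string; // e.g. "Validation logic"
  entity: string; // e.g. "user registration"
  context?: string; // e.g. "email verification" (optional)
}

function buildQuery(parts: QueryParts): string {
  const action = parts.action.trim();
  const entity = parts.entity.trim();
  if (action === "" || entity === "") {
    throw new Error("Both an action and an entity are required");
  }
  const context = parts.context?.trim();
  const base = `${action} for ${entity}`;
  // The context clause is appended only when one was supplied.
  return context ? `${base} with ${context}` : base;
}

const registrationQuery = buildQuery({
  action: "Validation logic",
  entity: "user registration",
  context: "email verification",
});
console.log(registrationQuery);
// Validation logic for user registration with email verification
```

The resulting string can be passed to codecompass search:semantic as the query argument.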

Step 2: Verify Indexing

Before searching, ensure codebase is indexed:

Check if indexed

curl http://localhost:8081/v1/schema

Should show collections like:

- CodeContext

- AtlasCode

If not indexed:

codecompass batch:index <path-to-codebase>
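
The schema check can also be automated. The sketch below assumes the JSON returned by /v1/schema has a top-level `classes` array whose entries carry a `class` name, which is how Weaviate reports its schema; `missingCollections` and the sample data are hypothetical.

```typescript
// Minimal shape of the JSON returned by Weaviate's /v1/schema endpoint.
// Only the field needed for this check is modelled.
interface WeaviateSchema {
  classes?: { class: string }[];
}

// Returns the expected collections that are absent from the schema.
function missingCollections(
  schema: WeaviateSchema,
  expected: string[],
): string[] {
  const present = new Set((schema.classes ?? []).map((c) => c.class));
  return expected.filter((name) => !present.has(name));
}

// Illustrative response in which only one collection has been created.
const sampleSchema: WeaviateSchema = {
  classes: [{ class: "CodeContext" }],
};

const notIndexed = missingCollections(sampleSchema, [
  "CodeContext",
  "AtlasCode",
]);
if (notIndexed.length > 0) {
  console.log(`Missing collections: ${notIndexed.join(", ")}`);
  // Missing collections: AtlasCode
}
```

An empty result means both collections exist; it does not prove that every file in the codebase has been indexed.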

Step 3: Execute Search

codecompass search:semantic "business logic for payment processing"

Alternative (if using as library):

const results = await searchService.semanticSearch({
  query: "business logic for payment processing",
  limit: 10,
  certainty: 0.7, // Minimum relevance score
});

Step 4: Interpret Results

Check relevance scores:

  • >0.8: Highly relevant (near-exact match)

  • 0.7-0.8: Good match (related)

  • 0.6-0.7: Moderate match (possibly relevant)

  • <0.6: Weak match (may be noise)
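
These bands can be applied programmatically when post-processing results. The sketch below assumes the top band means scores above 0.8 and that a boundary value belongs to the band it opens (so 0.7 counts as a good match); the `ScoredHit` shape and the file paths are hypothetical.

```typescript
type Relevance = "high" | "good" | "moderate" | "weak";

// Maps a relevance score to the bands listed above.
// Boundary handling (for example, exactly 0.8) is an assumption.
function classifyScore(score: number): Relevance {
  if (score > 0.8) return "high";
  if (score >= 0.7) return "good";
  if (score >= 0.6) return "moderate";
  return "weak";
}

// Hypothetical result shape; the real fields may differ.
interface ScoredHit {
  filePath: string;
  score: number;
}

// Drops hits below the minimum score and sorts the rest, best first.
function keepRelevant(hits: ScoredHit[], minimum = 0.7): ScoredHit[] {
  return hits
    .filter((hit) => hit.score >= minimum)
    .sort((a, b) => b.score - a.score);
}

const sampleHits: ScoredHit[] = [
  { filePath: "src/payments/refund.ts", score: 0.58 },
  { filePath: "src/payments/charge.ts", score: 0.86 },
  { filePath: "src/orders/status.ts", score: 0.73 },
];

for (const hit of keepRelevant(sampleHits)) {
  console.log(`${hit.filePath}: ${classifyScore(hit.score)}`);
}
// src/payments/charge.ts: high
// src/orders/status.ts: good
```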

Verify context:

  • Does the returned code actually match intent?

  • Are results from expected modules?

  • Are multiple related files returned (a good signal), or only isolated, unrelated matches (a sign to refine the query)?

Step 5: Refine if Needed

Too many results (>50):

  • Add more specific context to query

  • Increase certainty threshold

  • Add domain constraints ("in authentication module")

Too few results (<3):

  • Broaden query (less specific)

  • Lower certainty threshold

  • Check if area is actually indexed

  • Try related terms/synonyms

Wrong results:

  • Rephrase query with different terminology

  • Add negative constraints

  • Try breaking into multiple specific queries
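
The count-based rules above can be sketched as a small decision helper. The thresholds (more than 50, fewer than 3) come from this step; the 0.05 step size and all names are assumptions made for illustration.

```typescript
type Refinement = "narrow" | "broaden" | "keep";

// Applies the count thresholds from this step.
function suggestRefinement(resultCount: number): Refinement {
  if (resultCount > 50) return "narrow";
  if (resultCount < 3) return "broaden";
  return "keep";
}

// Nudges the certainty threshold in the suggested direction.
// The step size is an assumption; the result is clamped to the range 0..1
// and rounded to two decimals to avoid floating-point noise.
function adjustCertainty(
  certainty: number,
  refinement: Refinement,
  step = 0.05,
): number {
  const delta =
    refinement === "narrow" ? step : refinement === "broaden" ? -step : 0;
  const next = Math.min(1, Math.max(0, certainty + delta));
  return Math.round(next * 100) / 100;
}

const action = suggestRefinement(120); // far too many results
console.log(action, adjustCertainty(0.7, action)); // narrow 0.75
```

Changing the threshold only addresses result volume. Wrong results still call for rephrasing the query, as described above.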

Behind the Scenes

Architecture

Query Text
    ↓
Ollama Embedding (mxbai-embed-large)
    ↓
1024-dimensional vector
    ↓
Weaviate Vector Search (cosine similarity)
    ↓
Ranked Results
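
To make the ranking step concrete, here is a minimal sketch of cosine similarity, the measure named in the pipeline above. It is a plain reference implementation, not Weaviate's code (Weaviate uses an HNSW index rather than comparing every vector), and the 3-dimensional vectors are toy stand-ins for real 1024-dimensional embeddings.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// 1 means same direction, 0 means orthogonal, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors: the query points the same way as docA and is orthogonal to docB.
const queryVector = [1, 0, 0];
const docA = [2, 0, 0];
const docB = [0, 3, 0];

console.log(cosineSimilarity(queryVector, docA)); // 1
console.log(cosineSimilarity(queryVector, docB)); // 0
```

Because the measure depends only on direction, docA scores 1 even though it is twice as long as the query vector.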

Key Components

From .ai/capabilities.json:

  • Module: search, vectorizer, weaviate

  • Embedding: Ollama mxbai-embed-large (1024 dimensions)

  • Vector DB: Weaviate with HNSW indexing

  • Collections: CodeContext, AtlasCode

Configuration (from .env):

EMBEDDING_SERVICE=ollama
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
OLLAMA_URL=http://localhost:11434
CODECOMPASS_WEAVIATE_URL=http://localhost:8081
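
A script that talks to the same services might read these variables as sketched below. The variable names are the ones listed above; `loadSearchConfig`, the `SearchConfig` shape, and the use of the listed values as fallbacks are assumptions, and CodeCompass itself may resolve its configuration differently.

```typescript
interface SearchConfig {
  embeddingService: string;
  embeddingModel: string;
  ollamaUrl: string;
  weaviateUrl: string;
}

type Env = Record<string, string | undefined>;

// Reads the four variables from an environment map.
// Falling back to the values shown above is an assumption of this sketch.
function loadSearchConfig(env: Env): SearchConfig {
  return {
    embeddingService: env["EMBEDDING_SERVICE"] ?? "ollama",
    embeddingModel: env["OLLAMA_EMBEDDING_MODEL"] ?? "mxbai-embed-large",
    ollamaUrl: env["OLLAMA_URL"] ?? "http://localhost:11434",
    weaviateUrl: env["CODECOMPASS_WEAVIATE_URL"] ?? "http://localhost:8081",
  };
}

// Example with one override; in Node.js, process.env could be passed instead.
const searchConfig = loadSearchConfig({
  OLLAMA_URL: "http://ollama.internal:11434",
});
console.log(searchConfig.ollamaUrl); // http://ollama.internal:11434
console.log(searchConfig.weaviateUrl); // http://localhost:8081
```

Taking the environment as a parameter keeps the function easy to test without touching the real process environment.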

Advanced Patterns

Pattern 1: Multi-Query Exploration

For complex questions, break into multiple searches:

Instead of:

"authentication and authorization and session management"

Do:

codecompass search:semantic "user authentication login process"
codecompass search:semantic "authorization and access control"
codecompass search:semantic "session management and tokens"
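
Running several focused queries usually surfaces some files more than once. One way to combine the result sets is to keep the best score per file, as in the sketch below; the `Hit` shape, `mergeResults`, and the sample paths and scores are hypothetical.

```typescript
// Hypothetical result shape; the real fields may differ.
interface Hit {
  filePath: string;
  score: number;
}

// Merges result sets from several queries, keeping the highest-scoring
// hit for each file and returning the merged list best first.
function mergeResults(resultSets: Hit[][]): Hit[] {
  const best = new Map<string, Hit>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const existing = best.get(hit.filePath);
      if (existing === undefined || hit.score > existing.score) {
        best.set(hit.filePath, hit);
      }
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}

// Illustrative results for two of the three queries above.
const loginHits: Hit[] = [
  { filePath: "src/auth/login.ts", score: 0.84 },
  { filePath: "src/auth/session.ts", score: 0.71 },
];
const sessionHits: Hit[] = [
  { filePath: "src/auth/session.ts", score: 0.88 },
  { filePath: "src/auth/token.ts", score: 0.79 },
];

const mergedHits = mergeResults([loginHits, sessionHits]);
console.log(mergedHits.map((hit) => hit.filePath));
// session.ts comes first (0.88), then login.ts (0.84), then token.ts (0.79)
```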

Pattern 2: Iterative Refinement

1. Broad search

codecompass search:semantic "payment processing"

2. Review results, identify specific module

3. Narrow search

codecompass search:semantic "payment gateway integration in PaymentController"

4. Pinpoint implementation

codecompass search:semantic "Stripe API call for processing credit cards"

Pattern 3: Cross-Domain Search

Search across different aspects:

Code implementation

codecompass search:semantic "email validation logic"

Tests

codecompass search:semantic "test cases for email validation"

Configuration

codecompass search:semantic "email service configuration"

Common Pitfalls

❌ Pitfall 1: Searching Before Indexing

Symptom: No results or error
Solution: Run codecompass batch:index first

❌ Pitfall 2: Too Vague Queries

Symptom: Returns everything or nothing useful
Solution: Add specific context and intent

❌ Pitfall 3: Expecting Exact Matches

Symptom: "Why didn't it find function processPayment?"
Reason: Semantic search is for concepts, not exact names
Solution: Use grep for exact matches

❌ Pitfall 4: Ignoring Relevance Scores

Symptom: Reading irrelevant results
Solution: Filter by score >0.7, ignore weak matches

❌ Pitfall 5: Single Query for Complex Questions

Symptom: Poor results for multi-faceted questions
Solution: Break into multiple targeted queries

Decision Tree

┌─────────────────────────────────────┐
│ I need to find code that...         │
└─────────────────────────────────────┘
              ↓
         ┌─────────┐
         │ Know    │  Exact class/function name?
         │ exact   │
         │ name?   │
         └─────────┘
          ↙       ↘
        YES        NO
         ↓          ↓
     Use Grep   ┌─────────┐
                │ Concept │  Searching by meaning/purpose?
                │ search? │
                └─────────┘
                 ↙       ↘
               YES        NO
                ↓          ↓
            Semantic   ┌─────────┐
            Search     │ Pattern │  Looking for code pattern?
                       │ match?  │
                       └─────────┘
                        ↙       ↘
                      YES        NO
                       ↓          ↓
                   Use Glob   Use both
                              (Glob + Semantic)
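
The decision tree translates directly into a function that asks the three questions in order. This is a sketch for illustration; the type and function names are hypothetical.

```typescript
type Tool = "grep" | "semantic" | "glob" | "glob+semantic";

// The three questions asked by the decision tree, top to bottom.
interface SearchNeed {
  knowsExactName: boolean; // Exact class/function name?
  searchingByMeaning: boolean; // Searching by meaning/purpose?
  lookingForPattern: boolean; // Looking for a code pattern?
}

// Questions are checked in tree order, so an exact name always wins.
function chooseTool(need: SearchNeed): Tool {
  if (need.knowsExactName) return "grep";
  if (need.searchingByMeaning) return "semantic";
  if (need.lookingForPattern) return "glob";
  return "glob+semantic";
}

// "Where do we validate email addresses?" has no known name but a clear intent.
console.log(
  chooseTool({
    knowsExactName: false,
    searchingByMeaning: true,
    lookingForPattern: false,
  }),
); // semantic
```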

Performance Considerations

Speed

  • Grep: Milliseconds (fast, synchronous)

  • Semantic Search: 100-500ms (embedding + vector search)

Tradeoff: Semantic is slower but finds conceptually related code

Token Cost (Embeddings)

  • Each query → 1 embedding generation

  • Ollama local → No API cost

  • But consumes local compute

Scaling

  • Small codebase (<1K files): Either method fine

  • Medium codebase (1K-10K files): Semantic search advantage grows

  • Large codebase (>10K files): Semantic search essential

Integration with Other Tools

With Yii2 Analysis

1. Analyze Yii2 project

codecompass analyze:yii2 <path>

2. Index results

codecompass batch:index <path>

3. Explore with semantic search

codecompass search:semantic "Yii2 controller actions for user management"

With Requirements Extraction

1. Extract requirements

codecompass requirements:extract

2. Search extracted requirements

codecompass search:semantic "business rules for order validation"

With Weaviate Direct Query

Alternative: Query Weaviate GraphQL API directly

curl -X POST http://localhost:8081/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{ "query": "{ Get { CodeContext( nearText: { concepts: [\"payment processing\"] } limit: 10 ) { content filePath } } }" }'
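
Nesting a GraphQL query inside a JSON string is easy to misquote by hand. The sketch below builds the same request body with JSON.stringify so the inner quotes are escaped automatically; `buildNearTextBody` is a hypothetical helper, and the selected fields (content, filePath) are the ones used in the curl example above.

```typescript
// Builds the JSON body for a Weaviate GraphQL nearText query.
// JSON.stringify handles the quoting of both the concept and the query.
function buildNearTextBody(
  collection: string,
  concept: string,
  limit: number,
): string {
  // Weaviate class names start with an uppercase letter; rejecting anything
  // else also keeps arbitrary text out of the query.
  if (!/^[A-Z][A-Za-z0-9_]*$/.test(collection)) {
    throw new Error(`Invalid collection name: ${collection}`);
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const graphql =
    `{ Get { ${collection}( nearText: { concepts: [${JSON.stringify(concept)}] } ` +
    `limit: ${limit} ) { content filePath } } }`;
  return JSON.stringify({ query: graphql });
}

const requestBody = buildNearTextBody("CodeContext", "payment processing", 10);
console.log(requestBody);
```

The returned string can be sent as the body of a POST request to /v1/graphql with a Content-Type of application/json.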

Related Skills

  • 0-discover-capabilities.md: How to discover modules

  • analyze-yii2-project.md: Uses semantic search in workflow

Related Modules

From .ai/capabilities.json:

  • search: SearchService, IntegratedSearchService

  • vectorizer: Ollama embedding generation

  • weaviate: Vector database client

  • indexing: File indexing pipeline

Remember: Semantic search finds code by meaning, not by name. Choose the right tool for the job.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
