Web Research Skill

Combines WebSearch with automatic Qdrant storage to build a searchable knowledge base.

Workflow

Check Qdrant first → qdrant-find for existing knowledge
Search if needed → WebSearch for current information
Store valuable finds → qdrant-store with rich metadata
Return a synthesized answer → combine stored and new knowledge

Step 1: Check Existing Knowledge

Before searching the web, check if the answer already exists:

Tool: qdrant-find
Query: "<user's question or topic>"

If sufficient information exists and its harvested_at timestamp is recent, use it directly.
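The freshness check can be sketched in Python as follows. The skill does not say what counts as "recent", so the 30-day `MAX_AGE` threshold and the `is_fresh` helper are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: the skill does not define what "recent" means.
MAX_AGE = timedelta(days=30)


def is_fresh(
    harvested_at: str,
    now: Optional[datetime] = None,
    max_age: timedelta = MAX_AGE,
) -> bool:
    """Return True if an ISO 8601 harvested_at timestamp is within max_age of now.

    `now`, when given, must be timezone-aware.
    """
    # Replace a trailing "Z" so fromisoformat also works on Python < 3.11.
    harvested = datetime.fromisoformat(harvested_at.replace("Z", "+00:00"))
    if harvested.tzinfo is None:
        # Assumption: timestamps stored without an offset are UTC.
        harvested = harvested.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - harvested <= max_age
```

An entry that fails this check would fall through to Step 2.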

Step 2: Web Search

When stored knowledge is insufficient or stale:

Tool: WebSearch
Query: "<refined search query>"

Step 3: Store Results

After getting valuable results, store with rich metadata:

Tool: qdrant-store
Information: |
# <Topic/Question>

## Key Findings
- Finding 1
- Finding 2

## Details

## Sources
- Title

Metadata:
  # Required fields
  source: "web_search"
  content_type: "text"
  harvested_at: "2025-01-04T10:30:00Z"

  # Search context
  query: "<original search query>"
  urls: ["https://example.com/1", "https://example.com/2"]

  # Classification (for filtering)
  category: "technology"
  subcategory: "databases"
  type: "documentation"

  # Technical context (when applicable)
  language: "python"
  framework: "fastapi"
  version: "0.100+"

  # Quality signals
  confidence: "high"
  freshness: "current"

  # Relationships
  related_topics: ["vector-search", "embeddings", "rag"]
  project: "reflex"
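As a rough illustration, the metadata above could be assembled programmatically. `build_metadata` is a hypothetical helper, not part of qdrant-store; it fills in the required fields and the search context, and passes any other fields through unchanged.

```python
from datetime import datetime, timezone
from typing import Any


def build_metadata(query: str, urls: list[str], **optional: Any) -> dict[str, Any]:
    """Assemble a metadata dict with the required fields plus search context."""
    metadata: dict[str, Any] = {
        "source": "web_search",
        "content_type": "text",
        # ISO 8601 timestamp in UTC, matching the "Z" form used in the template.
        "harvested_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "query": query,
        "urls": list(urls),
    }
    # Optional fields may override the defaults; None values are dropped.
    metadata.update({k: v for k, v in optional.items() if v is not None})
    return metadata
```

For example, `build_metadata("qdrant filtering", ["https://example.com/1"], category="technology", language="python")` yields the required fields plus the two classification fields.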

Rich Metadata Schema

Required Fields

| Field | Type | Description |
|-------|------|-------------|
| source | string | Origin: `web_search`, `api_docs`, `github`, `manual` |
| content_type | string | `text`, `code`, `image`, `video_transcript` |
| harvested_at | string | ISO 8601 timestamp |

Search Context

| Field | Type | Description |
|-------|------|-------------|
| query | string | Original search query |
| urls | array | Source URLs (array for proper filtering) |
| domain | string | Primary domain (e.g., `github.com`) |
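One way to fill `domain` from `urls` is sketched below. `primary_domain` is a hypothetical helper. It returns the most common hostname, so `https://docs.github.com/...` gives `docs.github.com`; reducing that to a registrable domain such as `github.com` would need a public suffix list and is not attempted here.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse


def primary_domain(urls: list[str]) -> Optional[str]:
    """Return the most common hostname among urls, without a leading 'www.'."""
    hosts = []
    for url in urls:
        host = urlparse(url).hostname  # None when the string has no network location
        if host:
            hosts.append(host[4:] if host.startswith("www.") else host)
    if not hosts:
        return None
    # Ties are resolved in favor of the hostname seen first.
    return Counter(hosts).most_common(1)[0][0]
```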

Classification (Enables Filtering)

| Field | Type | Values |
|-------|------|--------|
| category | string | `technology`, `business`, `science`, `design`, `security`, `devops` |
| subcategory | string | More specific: `databases`, `frontend`, `ml`, `networking` |
| type | string | `documentation`, `tutorial`, `troubleshooting`, `reference`, `comparison`, `news` |

Technical Context

| Field | Type | Description |
|-------|------|-------------|
| language | string | Programming language: `python`, `typescript`, `rust`, `go` |
| framework | string | Framework/library: `fastapi`, `react`, `tokio` |
| version | string | Version constraint: `3.12+`, `>=2.0`, `latest` |
| platform | string | `linux`, `macos`, `windows`, `docker`, `kubernetes` |

Quality Signals

| Field | Type | Values |
|-------|------|--------|
| confidence | string | `high`, `medium`, `low` (how reliable the information is) |
| freshness | string | `current`, `recent`, `dated`, `historical` |
| depth | string | `overview`, `detailed`, `comprehensive` |

Relationships

| Field | Type | Description |
|-------|------|-------------|
| related_topics | array | Related concepts for discovery |
| project | string | Associated project name |
| supersedes | string | ID of entry this replaces |
| parent_topic | string | Broader topic this belongs to |

Image References (URL only, no download)

| Field | Type | Description |
|-------|------|-------------|
| image_url | string | URL to the image |
| alt_text | string | Image description |
| image_type | string | `photo`, `diagram`, `screenshot`, `chart`, `icon` |
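A metadata dict can be checked against this schema before storing. The sketch below is hypothetical. It treats the listed values as closed sets, which is an assumption (the schema may be meant as open-ended), and it leaves out `subcategory` and `platform`, whose lists read as examples rather than fixed values.

```python
# Assumption: the value lists in the schema are exhaustive.
ALLOWED_VALUES = {
    "source": {"web_search", "api_docs", "github", "manual"},
    "content_type": {"text", "code", "image", "video_transcript"},
    "category": {"technology", "business", "science", "design", "security", "devops"},
    "type": {"documentation", "tutorial", "troubleshooting", "reference", "comparison", "news"},
    "confidence": {"high", "medium", "low"},
    "freshness": {"current", "recent", "dated", "historical"},
    "depth": {"overview", "detailed", "comprehensive"},
    "image_type": {"photo", "diagram", "screenshot", "chart", "icon"},
}
REQUIRED_FIELDS = ("source", "content_type", "harvested_at")


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the metadata passed."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in metadata
    ]
    for name, allowed in ALLOWED_VALUES.items():
        if name not in metadata:
            continue
        value = metadata[name]
        # Non-string values (e.g. a list) can never match an enumerated string.
        if not (isinstance(value, str) and value in allowed):
            problems.append(f"unexpected value for {name}: {value!r}")
    return problems
```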

Filtering Examples

Find Python documentation:

qdrant-find with filter:
  category: "technology"
  language: "python"
  type: "documentation"

Find recent troubleshooting:

qdrant-find with filter:
  type: "troubleshooting"
  freshness: "current"

Find project-specific knowledge:

qdrant-find with filter:
  project: "reflex"
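If the filters above have to be expressed as a Qdrant filter object, each field becomes one exact-match clause under `must`. The sketch below builds that JSON structure as a plain dict. `build_filter` is a hypothetical helper, and the `metadata.` key prefix is an assumption about how the stored payload is nested; adjust it to match the actual collection.

```python
from typing import Any


def build_filter(conditions: dict[str, Any], key_prefix: str = "metadata.") -> dict[str, Any]:
    """Turn {"field": value} pairs into a Qdrant filter with one exact-match clause each."""
    return {
        "must": [
            # All clauses under "must" have to match (logical AND).
            {"key": f"{key_prefix}{field}", "match": {"value": value}}
            for field, value in conditions.items()
        ]
    }
```

Calling `build_filter({"type": "troubleshooting", "freshness": "current"})` produces two clauses that must both match.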

When to Store

Always store:

Technical documentation findings
API patterns and examples
Error solutions and workarounds
Best practices and recommendations
Tool comparisons and evaluations

Skip storing:

Simple factual lookups (dates, definitions)
Ephemeral information (current weather, stock prices)
Information already in Qdrant with same content
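The last rule, skipping information already stored with the same content, could be approximated with a content fingerprint. This is only a sketch: `content_fingerprint` and `is_duplicate` are hypothetical helpers, and keeping a set of known fingerprints (for instance in a metadata field) is an assumption, not something the skill prescribes.

```python
import hashlib


def content_fingerprint(information: str) -> str:
    """Hash whitespace- and case-normalized text so trivially different copies match."""
    normalized = " ".join(information.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_duplicate(information: str, known_fingerprints: set[str]) -> bool:
    """Return True if the same normalized content has been seen before."""
    return content_fingerprint(information) in known_fingerprints
```

An exact hash only catches identical text after normalization; reworded findings would still need a similarity check through qdrant-find.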

Example: Full Research Flow

User asks: "How do I set up GitHub Actions for Python testing?"

Step 1: Check existing

qdrant-find: "GitHub Actions Python testing setup" → No relevant results

Step 2: Search

WebSearch: "GitHub Actions Python pytest workflow 2025" → Returns results with workflow examples

Step 3: Store

qdrant-store:
Information: |
# GitHub Actions Python Testing Setup

## Key Findings
- Use `actions/setup-python@v5` for Python environment
- Matrix testing across Python versions: 3.9, 3.10, 3.11, 3.12
- pytest with coverage using `pytest-cov`

## Workflow Template
```yaml
name: Python Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e .[test]
      - run: pytest --cov
```

## Sources
- [GitHub Actions Python Guide](https://docs.github.com/en/actions/...)

Metadata:
  source: "web_search"
  content_type: "code"
  harvested_at: "2025-01-04T10:30:00Z"
  query: "GitHub Actions Python pytest workflow 2025"
  urls: ["https://docs.github.com/en/actions/..."]
  domain: "github.com"
  category: "technology"
  subcategory: "ci-cd"
  type: "documentation"
  language: "python"
  framework: "pytest"
  platform: "github-actions"
  confidence: "high"
  freshness: "current"
  depth: "detailed"
  related_topics: ["testing", "ci-cd", "yaml", "github"]
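The control flow of this example can be summarized in a short Python sketch. The `find`, `search`, and `store` callables are hypothetical stand-ins for qdrant-find, WebSearch, and qdrant-store (their real interfaces are not specified here), and the stored metadata is cut down to two fields for brevity.

```python
from typing import Callable, Sequence


def research(
    question: str,
    find: Callable[[str], Sequence[str]],
    search: Callable[[str], Sequence[str]],
    store: Callable[[str, dict], None],
    is_sufficient: Callable[[Sequence[str]], bool] = bool,
) -> list[str]:
    """Check stored knowledge first, then search and store only when needed."""
    # Step 1: check existing knowledge.
    stored = list(find(question))
    if is_sufficient(stored):
        return stored
    # Step 2: search the web.
    found = list(search(question))
    # Step 3: store each new finding (metadata abbreviated in this sketch).
    for item in found:
        store(item, {"source": "web_search", "query": question})
    # Return stored and new knowledge together.
    return stored + found
```

By default any non-empty result from `find` counts as sufficient; a stricter `is_sufficient` could also look at `harvested_at` or `confidence`.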

Integration with Other Skills

research-patterns: Use web-research for external searches
qdrant-patterns: Follows same metadata conventions
knowledge-ingestion-patterns: Compatible chunking approach
github-harvester: Similar metadata schema for GitHub content

