Free Search Aggregator
Reliable, provider-diverse web search for OpenClaw with high uptime + low operator overhead.
Why use this skill
- 12 search providers, 6 requiring no API key at all
- Automatic failover: if one provider fails, the next is tried instantly
- Quota-aware: tracks daily usage, warns at 80%, skips exhausted providers
- Task search mode for multi-angle research queries
- Built-in storage lifecycle (cache / index / report), no workspace clutter
- Self-healing: health-based smart routing automatically promotes reliable providers
- Quality optimization: relevance scoring, fuzzy dedup, domain diversity, re-ranking
- Auto-discovery: probes candidate search engines and SearXNG instances for new sources
- Self-diagnostic: `doctor` and `setup` commands for zero-friction onboarding
Provider Overview
| Provider | Key Required | Free Quota | Index Source | Notes |
|---|---|---|---|---|
| brave | BRAVE_API_KEY | 2000/day | Brave independent | High quality, privacy-friendly |
| exa | EXA_API_KEY | ~33/day (1k/mo) | Neural + web | Semantic search, unique finds |
| tavily | TAVILY_API_KEY | 1000/day | Web (AI-optimized) | Designed for AI agents |
| duckduckgo | None | ~500/day | Bing + own | No key, privacy-focused |
| bing_html | None | ~300/day | Microsoft Bing RSS | No key, stable XML feed |
| mojeek | None (or MOJEEK_API_KEY) | 200/day | Mojeek independent | Non-Google/Bing index |
| serper | SERPER_API_KEY | 2500/day | Google | High quota free tier |
| searchapi | SEARCHAPI_API_KEY | 100/mo | Google / Bing | Multi-engine |
| google_cse | GOOGLE_API_KEY + GOOGLE_CX | 100/day | Google | Official Google API |
| baidu | BAIDU_API_KEY | 200/day | Baidu | Best for Chinese content |
| wikipedia | None | 1000/day | Wikipedia | Factual/encyclopedic queries |
| searxng | None | Unlimited (self-hosted) | Meta (all engines) | Requires own instance |
Total daily quota (all keys configured, monthly quotas averaged per day, self-hosted SearXNG excluded): roughly 7,800+ requests/day
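Summing the table's figures is a quick way to sanity-check the combined budget. The sketch below is illustrative only: `DAILY_QUOTAS` and `total_daily_quota` are hypothetical names, and it assumes monthly quotas (Exa, SearchAPI) are spread evenly over 30 days.

```python
# Free-tier figures copied from the provider table. Monthly quotas are
# converted to a per-day figure by dividing by 30 (an assumption; providers
# may reset on calendar months). Self-hosted SearXNG is excluded (unlimited).
DAILY_QUOTAS = {
    "brave": 2000,
    "exa": 1000 // 30,        # 1k/month -> ~33/day
    "tavily": 1000,
    "duckduckgo": 500,
    "bing_html": 300,
    "mojeek": 200,
    "serper": 2500,
    "searchapi": 100 // 30,   # 100/month -> ~3/day
    "google_cse": 100,
    "baidu": 200,
    "wikipedia": 1000,
}


def total_daily_quota(quotas: dict[str, int]) -> int:
    """Sum the per-provider daily request budgets."""
    return sum(quotas.values())


print(total_daily_quota(DAILY_QUOTAS))  # 7836
```

By the table's own numbers this comes to 7,836 requests/day; the real budget depends on each provider's current free-tier terms.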
Credential model (important)
- No mandatory API key — DuckDuckGo + Bing RSS + Mojeek + Wikipedia work out of the box.
- API-key providers fail gracefully if their key is missing (AuthError → skip: no quota consumed, no added latency):
  - `BRAVE_API_KEY`
  - `EXA_API_KEY`
  - `TAVILY_API_KEY`
  - `SERPER_API_KEY`
  - `SEARCHAPI_API_KEY`
  - `GOOGLE_API_KEY` + `GOOGLE_CX`
  - `BAIDU_API_KEY`
  - `MOJEEK_API_KEY` (optional; without it, Mojeek uses HTML scraping)
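The credential rules above amount to a presence check on environment variables. A minimal sketch, assuming keys are read from the process environment; `REQUIRED_KEYS`, `usable_providers`, and `mojeek_mode` are hypothetical names, not the skill's code.

```python
import os

# Environment variables each key-based provider needs (from the list above).
REQUIRED_KEYS: dict[str, tuple[str, ...]] = {
    "brave": ("BRAVE_API_KEY",),
    "exa": ("EXA_API_KEY",),
    "tavily": ("TAVILY_API_KEY",),
    "serper": ("SERPER_API_KEY",),
    "searchapi": ("SEARCHAPI_API_KEY",),
    "google_cse": ("GOOGLE_API_KEY", "GOOGLE_CX"),
    "baidu": ("BAIDU_API_KEY",),
}
# Providers that work without any credentials.
KEYLESS = ("duckduckgo", "bing_html", "mojeek", "wikipedia")


def usable_providers(env: dict[str, str]) -> list[str]:
    """Providers whose credentials are all present, followed by keyless ones."""
    keyed = [
        name for name, keys in REQUIRED_KEYS.items()
        if all(env.get(k) for k in keys)  # every required variable non-empty
    ]
    return keyed + list(KEYLESS)


def mojeek_mode(env: dict[str, str]) -> str:
    """Mojeek uses its JSON API when a key is set, otherwise HTML scraping."""
    return "json_api" if env.get("MOJEEK_API_KEY") else "html_scrape"


print(usable_providers(dict(os.environ)))
```

Note that `google_cse` only counts as usable when both of its variables are set.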
Core capabilities
1. Search failover
Default provider order:
brave → exa → tavily → duckduckgo → bing_html → mojeek → serper → searchapi → google_cse → baidu → wikipedia
The first provider to return a non-empty result wins; the remaining providers are not queried.
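The failover loop can be sketched as below. `AuthError` mirrors the name used in the credential model; `ProviderError`, `search_with_failover`, and the convention that each provider is a callable returning a list of result dicts are assumptions made for illustration. The returned keys follow the search output contract (`query`, `provider`, `results`, `meta.attempted`).

```python
from typing import Callable, Iterable


class AuthError(Exception):
    """Raised by a provider when its API key is missing or rejected."""


class ProviderError(Exception):
    """Any other provider failure (timeout, HTTP error, parse error)."""


Provider = Callable[[str], list[dict]]


def search_with_failover(
    query: str, providers: Iterable[tuple[str, Provider]]
) -> dict:
    """Try providers in order; return the first non-empty result set."""
    attempted: list[str] = []
    for name, fn in providers:
        attempted.append(name)
        try:
            results = fn(query)
        except (AuthError, ProviderError):
            continue  # skip this provider and try the next one
        if results:
            return {"query": query, "provider": name,
                    "results": results, "meta": {"attempted": attempted}}
    # Every provider failed or came back empty.
    return {"query": query, "provider": None,
            "results": [], "meta": {"attempted": attempted}}
```

A provider that raises is skipped and an empty result list falls through to the next one, so `meta.attempted` records every provider that was tried.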
2. Task-level multi-query search
- Expands one goal into multiple targeted queries
- Aggregates + deduplicates results
- Prefix presets:
  - default: `workers=1`
  - `@dual ...` → `workers=2`
  - `@deep ...` → `workers=3` + deeper query coverage
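One way to read the presets is as a prefix that selects the worker count, with results merged by URL. This is a hypothetical sketch: `parse_preset`, `run_task`, and the caller-supplied `expand` and `search` functions are illustrative, and the extra query coverage of `@deep` is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Worker counts per prefix preset, as listed above.
PRESETS = {"@dual": 2, "@deep": 3}
DEFAULT_WORKERS = 1


def parse_preset(task: str) -> tuple[int, str]:
    """Split an optional @preset prefix off the task string."""
    head, _, rest = task.partition(" ")
    if head in PRESETS:
        return PRESETS[head], rest.strip()
    return DEFAULT_WORKERS, task.strip()


def run_task(task: str, expand: Callable[[str], list[str]],
             search: Callable[[str], list[dict]]) -> dict:
    """Expand a goal into queries, run them in parallel, merge by URL."""
    workers, goal = parse_preset(task)
    queries = expand(goal)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        grouped = list(pool.map(search, queries))  # keeps query order
    seen: set[str] = set()
    merged = []
    for results in grouped:
        for r in results:
            if r["url"] not in seen:  # first occurrence wins
                seen.add(r["url"])
                merged.append(r)
    return {"task": goal, "queries": queries,
            "grouped_results": grouped, "merged_results": merged}
```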
3. Quota intelligence
- Per-provider daily tracking
- Real quota retrieval where supported (Tavily, SearchAPI, Brave via probe)
- Auto concurrency reduction at 80% quota saturation
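A per-day counter is enough to express the 80% warning, the skip-when-exhausted rule, and concurrency reduction. In this sketch, `QuotaTracker` is a hypothetical class, and dropping to a single worker at the threshold is an assumed policy; the skill may reduce concurrency differently.

```python
from collections import Counter
from datetime import date
from typing import Optional


class QuotaTracker:
    """Counts requests per (day, provider) against a daily limit."""

    def __init__(self, limits: dict[str, int], warn_ratio: float = 0.8):
        self.limits = limits
        self.warn_ratio = warn_ratio
        self.used: Counter = Counter()  # missing keys read as 0

    def record(self, provider: str, day: Optional[date] = None) -> None:
        self.used[(day or date.today(), provider)] += 1

    def ratio(self, provider: str, day: Optional[date] = None) -> float:
        return self.used[(day or date.today(), provider)] / self.limits[provider]

    def exhausted(self, provider: str, day: Optional[date] = None) -> bool:
        return self.ratio(provider, day) >= 1.0  # skip this provider

    def near_limit(self, provider: str, day: Optional[date] = None) -> bool:
        return self.ratio(provider, day) >= self.warn_ratio  # warn

    def workers(self, provider: str, requested: int,
                day: Optional[date] = None) -> int:
        # Assumed policy: fall back to one worker once usage hits the threshold.
        return 1 if self.near_limit(provider, day) else requested
```

Keying counts by date means a new day starts from zero without an explicit reset.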
4. Provider health monitoring
- Tracks success rate, latency, and error types per provider over time
- Computes health scores (success 50%, latency 30%, freshness 20%)
- Smart ordering: auto-promotes healthy providers, demotes degraded ones
- View dashboard: `python -m free_search health`
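The weighted health score can be written as a single function. Only the 50/30/20 weights come from the text above; the linear latency penalty with a 5-second budget and the one-hour half-life for freshness are assumptions chosen for illustration, and `health_score`/`smart_order` are hypothetical names.

```python
import math
import time
from typing import Optional

# Weights stated above: success 50%, latency 30%, freshness 20%.
W_SUCCESS, W_LATENCY, W_FRESHNESS = 0.5, 0.3, 0.2


def health_score(successes: int, attempts: int, avg_latency_s: float,
                 last_success_ts: float, now: Optional[float] = None,
                 latency_budget_s: float = 5.0,
                 freshness_half_life_s: float = 3600.0) -> float:
    """Combine success rate, latency and freshness into a 0..1 score."""
    now = time.time() if now is None else now
    success = successes / attempts if attempts else 0.0
    # Assumed linear penalty: 0 s scores 1.0, the budget or slower scores 0.0.
    latency = max(0.0, 1.0 - avg_latency_s / latency_budget_s)
    # Assumed exponential decay since the last successful call.
    age = max(0.0, now - last_success_ts)
    freshness = math.exp(-math.log(2) * age / freshness_half_life_s)
    return W_SUCCESS * success + W_LATENCY * latency + W_FRESHNESS * freshness


def smart_order(scores: dict[str, float]) -> list[str]:
    """Healthiest providers first; ties keep their configured order."""
    return sorted(scores, key=scores.get, reverse=True)
```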
5. Result quality optimization
- Relevance scoring (query-title-snippet token overlap)
- Enhanced dedup: URL + title similarity (Jaccard threshold)
- Domain diversity: limits same-domain results (default max 3)
- Automatic filtering of low-quality results (short titles, missing URLs)
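The quality steps compose naturally into one pass over the results. A minimal sketch: the default of at most 3 results per domain comes from the list above, while the 0.8 Jaccard threshold on title tokens, the 5-character minimum title length, and the helper names are assumptions.

```python
import re
from urllib.parse import urlparse

TOKEN = re.compile(r"\w+")


def tokens(text: str) -> set[str]:
    return set(TOKEN.findall(text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevance(query: str, result: dict) -> float:
    """Fraction of query tokens found in the title or snippet."""
    q = tokens(query)
    doc = tokens(result.get("title", "") + " " + result.get("snippet", ""))
    return len(q & doc) / len(q) if q else 0.0


def refine(query: str, results: list[dict], max_per_domain: int = 3,
           title_threshold: float = 0.8, min_title_len: int = 5) -> list[dict]:
    """Filter, dedupe, cap per-domain results, then re-rank by relevance."""
    kept: list[dict] = []
    per_domain: dict[str, int] = {}
    seen_urls: set[str] = set()
    for r in results:
        url, title = r.get("url", ""), r.get("title", "")
        if not url or len(title) < min_title_len:
            continue  # low quality: missing URL or very short title
        if url in seen_urls:
            continue  # exact URL duplicate
        t = tokens(title)
        if any(jaccard(t, tokens(k["title"])) >= title_threshold for k in kept):
            continue  # near-duplicate title
        domain = urlparse(url).netloc.lower()
        if per_domain.get(domain, 0) >= max_per_domain:
            continue  # domain diversity cap
        per_domain[domain] = per_domain.get(domain, 0) + 1
        seen_urls.add(url)
        kept.append(r)
    return sorted(kept, key=lambda r: relevance(query, r), reverse=True)
```

Filtering and deduplication run before the domain cap, so a duplicate never uses up one of a domain's slots; re-ranking by relevance happens last.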
6. Source auto-discovery
- Probes all configured providers for availability
- Scans candidate search engines (Marginalia, Wiby, public SearXNG instances)
- Validates response format, latency, and result quality
- Generates recommendations for new sources to integrate
- Run: `python -m free_search discover`
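The validation step can be sketched as a pure check over one probe's outcome, which keeps the networking separate. `ProbeResult`, `evaluate_probe`, the 3-second latency limit, the 3-result minimum, and the assumption that candidates return JSON with a top-level `results` list (as SearXNG's JSON output does) are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Outcome of one test query against a candidate engine (hypothetical)."""
    name: str
    status: int
    latency_s: float
    payload: object  # parsed JSON body, or None if parsing failed


def evaluate_probe(p: ProbeResult, max_latency_s: float = 3.0,
                   min_results: int = 3) -> tuple[bool, str]:
    """Check format, latency and result count; return (recommend, reason)."""
    if p.status != 200:
        return False, f"HTTP {p.status}"
    results = p.payload.get("results") if isinstance(p.payload, dict) else None
    if not isinstance(results, list):
        return False, "unexpected response format"
    if p.latency_s > max_latency_s:
        return False, f"too slow ({p.latency_s:.1f}s)"
    usable = [r for r in results if isinstance(r, dict) and r.get("url")]
    if len(usable) < min_results:
        return False, f"only {len(usable)} usable results"
    return True, "ok"
```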
7. Managed persistence
- Cache: `memory/search-cache/YYYY-MM-DD/*.json`
- Index: `memory/search-index/search-index.jsonl`
- Reports: `memory/search-reports/YYYY-MM-DD/*.md`
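The dated layout makes cleanup a matter of comparing folder names with a cutoff date. This sketch assumes `gc --cache-days N` removes cache folders older than N days and leaves anything not named `YYYY-MM-DD` untouched; `cache_dir` and `gc_cache` are hypothetical helpers, not the skill's implementation.

```python
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path


def cache_dir(root: Path, day: date) -> Path:
    """memory/search-cache/YYYY-MM-DD/ for a given day."""
    return root / "memory" / "search-cache" / day.isoformat()


def gc_cache(root: Path, cache_days: int, today: date) -> list[str]:
    """Delete dated cache folders older than `cache_days`; return removed names."""
    base = root / "memory" / "search-cache"
    cutoff = today - timedelta(days=cache_days)
    removed: list[str] = []
    if not base.is_dir():
        return removed
    for d in sorted(base.iterdir()):  # materialized before deleting
        try:
            day = datetime.strptime(d.name, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a dated folder; leave it alone
        if d.is_dir() and day < cutoff:
            shutil.rmtree(d)
            removed.append(d.name)
    return removed
```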
Quick commands
```bash
# Normal search
scripts/search "latest AI agent frameworks 2026" --max-results 5
# Task search (multi-query, parallel)
scripts/search task "@dual Compare Claude vs GPT-4 for code generation" --max-results 5
# Deep research mode
scripts/search task "@deep autonomous vehicle safety 2026" --max-results 8 --max-queries 10
# Quota status
scripts/status
# Real quota from provider APIs
scripts/remaining --real
# Cleanup cache
python3 -m free_search gc --cache-days 14
# Provider health dashboard
python3 -m free_search health
# Discover new search sources
python3 -m free_search discover
# System diagnostics
python3 -m free_search doctor
# Setup status & recommendations
python3 -m free_search setup
```
Provider setup guides
Bing RSS (bing_html) — No key needed
Uses Bing's built-in RSS endpoint (format=rss) — bypasses bot detection. Works out of the box.
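Fetching Bing's RSS output amounts to adding `format=rss` to the search URL and reading standard RSS 2.0 `<item>` elements. A sketch under that assumption; `bing_rss_url` and `parse_rss` are illustrative names, and the HTTP fetch itself is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def bing_rss_url(query: str) -> str:
    """Build a Bing search URL that asks for RSS output (format=rss)."""
    return "https://www.bing.com/search?" + urlencode({"q": query, "format": "rss"})


def parse_rss(xml_text: str) -> list[dict]:
    """Extract title/url/snippet from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "snippet": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]
```

Python's documentation notes that `xml.etree.ElementTree` is not hardened against maliciously constructed XML; `defusedxml` offers a drop-in `fromstring` if that is a concern.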
Mojeek — No key needed (API key optional)
Out-of-the-box HTML scraping. For higher quotas/stability:
- Register at https://www.mojeek.com/services/search/api/
- Set `MOJEEK_API_KEY` → automatically switches to JSON API mode
Wikipedia — No key needed
Multilingual support: change `lang` in `providers.yaml`:
```yaml
wikipedia:
  lang: it  # en | zh | it | de | fr | ja ...
```
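The `lang` setting maps naturally onto the language-specific subdomain of the MediaWiki search API. This is a sketch of how such a request could be built, not necessarily how the skill does it; `wikipedia_search_url` and `wiki_results` are hypothetical names.

```python
from urllib.parse import quote, urlencode


def wikipedia_search_url(query: str, lang: str = "en", limit: int = 5) -> str:
    """MediaWiki search API endpoint for the configured language edition."""
    params = {"action": "query", "list": "search", "srsearch": query,
              "srlimit": limit, "format": "json"}
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def wiki_results(payload: dict, lang: str = "en") -> list[dict]:
    """Map query.search hits to title/url/snippet records."""
    hits = payload.get("query", {}).get("search", [])
    return [
        {"title": h["title"],
         # Article URLs use underscores in place of spaces.
         "url": f"https://{lang}.wikipedia.org/wiki/" + quote(h["title"].replace(" ", "_")),
         "snippet": h.get("snippet", "")}
        for h in hits
    ]
```

MediaWiki returns snippets with HTML highlight markup, which a real implementation would strip before display.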
Exa.ai — API key required
- Register at https://exa.ai/
- Set `EXA_API_KEY`
- Free tier: 1000 searches/month (~33/day)
Google Custom Search — API key + CX required
- Get API key: https://developers.google.com/custom-search/v1/introduction
- Create search engine: https://programmablesearchengine.google.com/
- Set `GOOGLE_API_KEY` and `GOOGLE_CX`
- Free tier: 100 queries/day
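Once both values are set, a request goes to the Custom Search JSON API with `key`, `cx`, and `q` parameters, and hits come back under `items`. A hypothetical sketch: the local `AuthError` stands in for the skill's error type, and `google_cse_url`/`cse_results` are illustrative names.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlencode


class AuthError(Exception):
    """Stand-in for the skill's missing-credential error (assumed name)."""


def google_cse_url(query: str, num: int = 5,
                   env: Mapping[str, str] = os.environ) -> str:
    """Build a Custom Search JSON API URL; needs GOOGLE_API_KEY and GOOGLE_CX."""
    key, cx = env.get("GOOGLE_API_KEY"), env.get("GOOGLE_CX")
    if not key or not cx:
        raise AuthError("GOOGLE_API_KEY and GOOGLE_CX must both be set")
    # The API returns at most 10 results per request.
    params = {"key": key, "cx": cx, "q": query, "num": min(num, 10)}
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)


def cse_results(payload: dict) -> list[dict]:
    """Hits arrive under `items`, each with title, link and snippet."""
    return [{"title": i.get("title", ""), "url": i.get("link", ""),
             "snippet": i.get("snippet", "")}
            for i in payload.get("items", [])]
```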
Baidu Qianfan — API key required
- Register at https://cloud.baidu.com/
- Set `BAIDU_API_KEY`
- Best for Chinese-language content
SearXNG — Self-hosted instance required
Public instances rate-limit server-to-server requests. Use your own:
```bash
docker run -d -p 8080:8080 searxng/searxng
```
Then in `providers.yaml`:
```yaml
searxng:
  endpoint: http://localhost:8080
  enabled: true
```
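With the instance running, results can be requested from its `/search` endpoint with `format=json`. A minimal standard-library sketch; the helper names are hypothetical, and depending on the SearXNG version the JSON format may first need to be enabled under `search.formats` in the instance's `settings.yml`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def searxng_url(endpoint: str, query: str) -> str:
    """/search URL requesting JSON output from a SearXNG instance."""
    return endpoint.rstrip("/") + "/search?" + urlencode({"q": query, "format": "json"})


def normalise(payload: dict) -> list[dict]:
    """SearXNG hits carry title, url and content (the snippet)."""
    return [{"title": r.get("title", ""), "url": r.get("url", ""),
             "snippet": r.get("content", "")} for r in payload.get("results", [])]


def searxng_search(endpoint: str, query: str, timeout: float = 10.0) -> list[dict]:
    """Fetch and normalise results (performs a real HTTP request)."""
    with urlopen(searxng_url(endpoint, query), timeout=timeout) as resp:
        return normalise(json.load(resp))
```

Usage against the local container: `searxng_search("http://localhost:8080", "openclaw")`.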
Post-install self-check
```bash
# 1) Confirm provider load
scripts/status --compact
# 2) Smoke test (uses duckduckgo/bing/mojeek out of the box)
scripts/search "openclaw" --max-results 3 --compact
# 3) Verify storage paths
ls -la /home/openclaw/.openclaw/workspace/memory/search-cache/ | tail -n 5
# 4) Check real quota (optional)
scripts/remaining --real --compact
```
Output contract (stable)
- Search: `query`, `provider`, `results[]`, `meta.attempted`, `meta.quota`
- Task search: `task`, `queries[]`, `grouped_results[]`, `merged_results[]`, `meta`
- Quota: `date`, `providers[]`, `totals`; with `--real`: `real_quota.providers[]`
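Because the contract is stable, a consumer can check the top-level keys before relying on them. A hypothetical sketch (`CONTRACT` and `check_contract` are not part of the skill) that covers only the key names listed above, not their value types.

```python
from typing import Any

# Top-level keys from the output contract.
CONTRACT = {
    "search": {"query", "provider", "results", "meta"},
    "task": {"task", "queries", "grouped_results", "merged_results", "meta"},
    "quota": {"date", "providers", "totals"},
}


def check_contract(kind: str, payload: dict[str, Any]) -> list[str]:
    """Return a list of contract violations (empty means OK)."""
    missing = sorted(CONTRACT[kind] - set(payload))
    problems = [f"missing key: {k}" for k in missing]
    if kind == "search":
        meta = payload.get("meta", {})
        problems += [f"missing key: meta.{k}"
                     for k in ("attempted", "quota") if k not in meta]
    return problems
```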
Operator notes
- Default mode: `workers=1`, conservative for cost control
- Use `@dual`/`@deep` only for research tasks
- SearXNG and YaCy are `enabled: false` by default (self-hosted only)
- `MOJEEK_API_KEY` is optional; the provider gracefully falls back to HTML scraping
- Provider health data is stored in `memory/provider-health/health.jsonl`
- Discovery results are stored in `memory/provider-discovery/discovery.jsonl`
- Run `python -m free_search doctor` after setup to verify everything works
- Run `python -m free_search discover` periodically to find new search sources