
Deep Research

Systematic, evidence-based research that produces cited, source-backed reports. Uses the filesystem as working memory -- scraped content, extracted notes, and source metadata are saved to disk rather than held in context. This keeps the context window lean and makes research resumable, searchable, and reusable.

Architecture

research/{topic-slug}/
  plan.md             # Research plan with sub-questions
  sources-index.md    # URL registry: what's been scraped, metadata
  sources/            # Raw scraped content (one file per source)
    001-source-slug.md
    002-source-slug.md
    ...
  notes/              # Extracted findings per sub-question
    question-1.md
    question-2.md
    ...
  report.md           # Final compiled report

Core principle: Write to disk aggressively, read back selectively. Never hold raw scraped content in context longer than it takes to extract findings. The LLM context should contain only the current working set -- not the entire research corpus.

Research Parameters

  • Breadth (default: 3): Parallel search queries per sub-question. Range: 2-6.

  • Depth (default: 2): Recursive deepening rounds. Range: 1-4.

Estimate: breadth=3, depth=2 -> ~12-20 web searches, ~8-15 pages read, 3-8 minute runtime.

For quick fact-checks: breadth=2, depth=1. For exhaustive reviews: breadth=5, depth=3.

Step-by-Step Workflow

Phase 0: Initialize

Create the research directory:

mkdir -p research/{topic-slug}/sources research/{topic-slug}/notes

Search local filesystem first. Before any web search, check if relevant content already exists:

  • Past research in research/ directories in the working directory

  • Relevant files in the working directory or project

  • Use grep, glob, or semantic search (lss) as appropriate for the task

  • If prior research exists on this topic, read it and build on it -- don't start from scratch

Initialize sources-index.md:

Sources Index

| # | URL | Title | Date | Type | Credibility | File |
|---|-----|-------|------|------|-------------|------|
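Phase 0 can be sketched as a small helper. This is an illustrative sketch, not part of the skill itself: init_research is a hypothetical function name, and the index header simply mirrors the table format above.

```python
from pathlib import Path

# Header row matching the sources-index.md table format described above.
INDEX_HEADER = (
    "Sources Index\n\n"
    "| # | URL | Title | Date | Type | Credibility | File |\n"
    "|---|-----|-------|------|------|-------------|------|\n"
)

def init_research(topic_slug: str, root: str = "research") -> Path:
    """Create the research skeleton: sources/, notes/, and an empty sources index."""
    base = Path(root) / topic_slug
    (base / "sources").mkdir(parents=True, exist_ok=True)
    (base / "notes").mkdir(parents=True, exist_ok=True)
    index = base / "sources-index.md"
    if not index.exists():  # don't clobber a resumed session
        index.write_text(INDEX_HEADER)
    return base

base = init_research("example-topic")
```

The exists-check matters for the "resuming prior research" edge case below: re-initializing must never overwrite an index from a previous session.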

Phase 1: Planning

Analyze the query. Identify:

  • Core research question(s)

  • Required evidence types (empirical data, expert opinion, primary sources, statistics)

  • Potential biases to watch for

Decompose into 2-5 independently researchable sub-questions.

Write plan.md with the sub-questions, search strategy, and scope decisions.

Create a todo list tracking each sub-question.

Phase 2: Search-Read-Extract Loop

For each sub-question, execute this loop:

2a. Generate Search Queries

Generate breadth distinct queries. Vary:

  • Phrasing (synonyms, technical vs. lay terms)

  • Angle (supporting evidence, counter-evidence, meta-analyses)

  • Source type (academic site:scholar.google.com, government site:gov, news, industry)

  • Recency (add year filters for time-sensitive topics)

Batch search with search_depth="advanced":

web-search("query1 ||| query2 ||| query3", search_depth="advanced")

2b. Read and Extract

For each promising search result:

Check sources-index.md first. If the URL is already scraped, skip it. Do not re-scrape pages already processed in this session.

Scrape the page. Batch URLs for efficiency:

scrape-webpage("url1, url2, url3")

Save raw content to disk immediately:

Write scraped content to sources/NNN-slug.md

Include a header with the URL, title, and scrape date. This gets the raw content out of your context.
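The save step can be sketched as follows. save_source is an illustrative name, and the numbering and slug rules are assumptions chosen to match the sources/NNN-slug.md convention above.

```python
import re
from datetime import date
from pathlib import Path

def save_source(base: Path, url: str, title: str, content: str) -> Path:
    """Write raw scraped content to sources/NNN-slug.md with a metadata header."""
    sources = base / "sources"
    sources.mkdir(parents=True, exist_ok=True)
    n = len(list(sources.glob("*.md"))) + 1  # next source number
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:40]
    path = sources / f"{n:03d}-{slug}.md"
    header = f"{title}\n\nURL: {url}\nScraped: {date.today().isoformat()}\n\n---\n\n"
    path.write_text(header + content)
    return path

p = save_source(Path("research/demo-topic"),
                "https://example.com/article", "Article Title",
                "Raw scraped text...")
```

Once the file is written, only the path and source number need to stay in context.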

Extract key findings from the scraped content and append to notes/question-N.md:

  • Key claims and findings (with brief quotes when important)

  • Data points: numbers, statistics, dates

  • The source number (for citation mapping later)

  • Methodology notes if relevant (sample size, study design)

Update sources-index.md with the new source entry:

| 003 | https://example.com/article | Article Title | 2025-06-15 | news | medium | sources/003-article-title.md |

Free your context. After extracting findings and saving to disk, you do not need to retain the raw scraped content. Move on to the next source.

2c. Deepen (Recursive)

After processing results for a sub-question:

  • Read notes/question-N.md to assess coverage.

  • Identify gaps: what remains unanswered? What claims lack corroboration?

  • Generate follow-up questions based on findings.

  • If depth > 0: recurse on the follow-up questions at depth - 1, with breadth reduced to ceil(breadth / 2).

  • If depth == 0: stop and move to the next sub-question.
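The deepening rule can be written as a recursion. queries_for is a hypothetical helper, shown only to make the query budget concrete: each round runs breadth queries, then recurses with depth - 1 and half the breadth (rounded up).

```python
import math

def queries_for(breadth: int, depth: int) -> int:
    """Total search queries spent on one sub-question under the deepening rule."""
    if depth <= 0:
        return 0
    # This round's queries, plus the narrower follow-up rounds.
    return breadth + queries_for(math.ceil(breadth / 2), depth - 1)
```

With breadth=3, depth=2 this gives 5 queries per sub-question, so 3-4 sub-questions land in the ~12-20 search estimate given under Research Parameters.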

Phase 3: Synthesis

After all sub-questions are researched:

  • Read all notes/*.md files (not raw sources -- the notes are already distilled).

  • Cross-reference claims -- flag where sources agree vs. contradict.

  • Resolve contradictions by source quality:

      • Peer-reviewed > government data > industry reports > news > blogs

      • Meta-analyses > individual studies

      • Multiple independent sources > single source

  • Identify consensus vs. uncertainty. Be explicit about what is well-established vs. debated vs. unknown.

  • Check for bias: Did search results skew toward one viewpoint? Are key perspectives missing?
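The quality hierarchy can be encoded as a simple ranking. This is a sketch under one assumption: the type labels below should match whatever credibility/type values you actually record in sources-index.md.

```python
# Lower rank = stronger evidence, per the hierarchy above.
CREDIBILITY_RANK = {
    "peer-reviewed": 0,
    "government": 1,
    "industry": 2,
    "news": 3,
    "blog": 4,
}

def strongest(claims):
    """Given (claim, source_type) pairs, return the claim from the strongest source.
    Unknown source types rank last."""
    return min(claims, key=lambda c: CREDIBILITY_RANK.get(c[1], 99))[0]
```

This only breaks ties by source type; corroboration (multiple independent sources) still outranks any single entry and has to be judged in the notes.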

Phase 4: Report Generation

Compile report.md using findings from notes and metadata from sources-index.md.

[Research Title]

Executive Summary

[2-3 paragraph overview of key findings and conclusions]

Key Findings

[Finding 1 Title]

[Discussion with inline citations [1][2]]

[Finding 2 Title]

[Discussion with inline citations [3][4]]

Analysis

[Cross-cutting patterns, contradictions, consensus areas]

Limitations & Caveats

[What this research couldn't determine, methodological limitations]

Conclusions

[Evidence-based conclusions with confidence levels]

Sources

[1] Author (Date). "Title." Publication. URL
[2] Author (Date). "Title." Publication. URL

Build citations from sources-index.md at report time. During the search phase, you only needed to track {url, title, source_number}. Now format the full bibliography by reading back source metadata from the index and source files as needed.
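Reading the bibliography back out of the index can be sketched as a small parser over the pipe-table rows. The citation format string here is illustrative; only the column order comes from the index table above.

```python
from pathlib import Path

def bibliography(index_path: Path) -> list:
    """Turn sources-index.md table rows into numbered bibliography lines."""
    entries = []
    for line in index_path.read_text().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 7 or not cells[0].isdigit():
            continue  # skip the header and separator rows
        num, url, title, date_, type_, _cred, _file = cells[:7]
        entries.append(f'[{int(num)}] "{title}" ({date_ or "n.d."}). {type_}. {url}')
    return entries

# Hypothetical index file matching the table format used above.
demo = Path("demo-sources-index.md")
demo.write_text(
    "| # | URL | Title | Date | Type | Credibility | File |\n"
    "|---|---|---|---|---|---|---|\n"
    "| 003 | https://example.com/article | Article Title | 2025-06-15 | news | medium | sources/003-article-title.md |\n"
)
```

Because the index is the single registry, the report's Sources section stays consistent with what was actually scraped.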

Phase 5: Finalize

  • Save report.md in the research directory.

  • Report to user: main conclusions (2-3 sentences), number of sources, confidence levels, file path.

Citation Rules

  • Every factual claim must have a citation. No uncited assertions of fact.

  • Use numbered inline citations [1], [2] that map to the Sources section.

  • Never fabricate sources. Every URL in Sources must be a real page you actually visited and read.

  • Quote directly when the exact wording matters. Use > blockquote for direct quotes.

  • Distinguish levels of evidence:

      • "Studies show..." (cite the studies)

      • "According to [Source]..." (single source, acknowledge it)

      • "The evidence suggests..." (multiple corroborating sources)

      • "It remains debated whether..." (conflicting evidence)

  • Date every source. If no date is found, write "(n.d.)".

  • Note source type in the bibliography: academic paper, government report, news article, blog post, etc.

Scientific Rigor Standards

  • Falsifiability: Actively search for counter-evidence. Don't just confirm priors.

  • Replication: A claim supported by multiple independent sources is stronger than one from a single source.

  • Recency awareness: Newer isn't always better. Note when older foundational work is more relevant.

  • Correlation vs. causation: Flag when sources conflate the two.

  • Sample size and methodology: Note these for empirical claims.

  • Conflicts of interest: Note when a source has a stake in its conclusions.

Tool Usage Patterns

Search (always use search_depth="advanced" for deep research)

web-search("topic aspect1 ||| topic aspect2 ||| topic counter-evidence", search_depth="advanced")

Scrape (batch URLs)

scrape-webpage("https://source1.com/article, https://source2.com/study")

Save scraped content to disk

After scraping, immediately write each page to file and extract findings. This is critical -- do not hold raw scraped content in context.

Academic Paper Search (OpenAlex)

Load the openalex-paper-search skill for full API reference. Quick patterns:

Find highly-cited papers on a topic

curl -s "https://api.openalex.org/works?search=topic+keywords&filter=cited_by_count:>50,type:article,has_abstract:true&sort=cited_by_count:desc&per_page=15&select=id,display_name,publication_year,cited_by_count,doi,authorships,abstract_inverted_index&mailto=agent@kortix.ai"

Find recent preprints

curl -s "https://api.openalex.org/works?search=topic&filter=type:preprint,publication_year:2025&sort=publication_date:desc&per_page=10&mailto=agent@kortix.ai"

Find review/survey papers

curl -s "https://api.openalex.org/works?search=topic&filter=type:review,cited_by_count:>20&sort=cited_by_count:desc&per_page=10&mailto=agent@kortix.ai"

Follow citation chains: who cites a seminal paper?

curl -s "https://api.openalex.org/works?filter=cites:WORK_ID&sort=cited_by_count:desc&per_page=10&mailto=agent@kortix.ai"

Save paper metadata to sources-index.md and raw API responses to sources/ for later processing.
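OpenAlex returns abstracts as an inverted index (the abstract_inverted_index field selected in the first query above maps each word to the positions where it occurs). A small decoder, sketched here, reconstructs the plain text before extracting notes.

```python
def decode_abstract(inverted_index: dict) -> str:
    """Rebuild plain abstract text from OpenAlex's abstract_inverted_index
    (a mapping from word to the list of positions where it occurs)."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word  # place each word at each of its positions
    return " ".join(word for _, word in sorted(positions.items()))
```

For example, decode_abstract({"Deep": [0], "research": [1]}) yields "Deep research".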

Web Source Hunting

web-search("topic site:scholar.google.com ||| topic site:arxiv.org ||| topic systematic review OR meta-analysis", search_depth="advanced")

Data Source Hunting

web-search("topic statistics site:data.gov ||| topic data site:worldbank.org ||| topic survey results", search_depth="advanced")

Fact-Checking Pattern

web-search("claim fact check ||| claim evidence ||| claim debunked OR confirmed", search_depth="advanced")

Local FS Search (check before web search)

Search past research

grep -r "keyword" research/

Semantic search if available

lss "research question" -p /workspace --json -k 10

Handling Edge Cases

  • No good results found: State what was searched and that evidence is lacking. This is a valid finding.

  • Highly technical topics: Search for both technical papers and accessible explainers.

  • Rapidly evolving topics: Prioritize recency. Note publication dates prominently.

  • Controversial topics: Present all major viewpoints with evidence. Don't editorialize.

  • Paywalled sources: Extract what's available from abstracts and secondary reporting. Note when full text was not accessible.

  • Resuming prior research: If a research directory already exists, read the existing plan, notes, and sources-index. Continue from where the previous session left off rather than starting over.
