RLM (Recursive Language Model) Skill
This skill enables processing of arbitrarily long documents by treating them as an external environment and recursively calling sub-LLMs over chunks of the content.
Overview
The RLM pattern follows the architecture from the paper:
- Root Agent: Main agent orchestrating the task
- Sub-LLM (llm_query): Subordinate agent for chunk-level analysis
- External Environment: Persistent Python REPL with document state
Requirements
- Python 3.8+
- An agentic harness with code execution and subordinate agent capabilities
File Structure
rlm_skill/
├── SKILL.md # This file
├── rlm-subcall.md # Subordinate agent prompt profile
└── scripts/
└── rlm_repl.py # Persistent Python REPL
Quick Start
- Initialize the skill by loading it into your agent context
- Load a context file using the REPL init command
- Run the RLM workflow - chunk, delegate, synthesize
Detailed Usage
Step 1: Initialize the REPL with your context
Execute via your agent's code execution tool:
python3 /path/to/rlm_skill/scripts/rlm_repl.py init /path/to/your/context.txt
Step 2: Verify status
python3 /path/to/rlm_skill/scripts/rlm_repl.py status
Step 3: Scout the context
python3 /path/to/rlm_skill/scripts/rlm_repl.py exec -c 'print(peek(0, 3000))'
Step 4: Create chunks
python3 /path/to/rlm_skill/scripts/rlm_repl.py exec -c 'print("\\n".join(write_chunks("./rlm_chunks", size=200000, overlap=0)))'
Step 5: Delegate to sub-LLM
Use your agentic harness's subordinate agent capability with the rlm-subcall profile:
Subordinate Agent Query:
Profile: rlm-subcall
Message: Query: <user_question>
Chunk file: ./rlm_chunks/chunk_0000.txt
Extract relevant information from this chunk.
Step 6: Synthesize results
Combine sub-LLM outputs and provide the final answer in the main conversation.
REPL Helper Functions
The rlm_repl.py provides these injected functions:
peek(start=0, end=1000)- View portion of contentgrep(pattern, max_matches=20, window=120, flags=0)- Search with contextchunk_indices(size=200000, overlap=0)- Get chunk boundarieswrite_chunks(out_dir, size=200000, overlap=0, prefix='chunk')- Materialize chunks as filesadd_buffer(text)- Store intermediate results
REPL Commands
init <context_path>- Load context filestatus [--show-vars]- Show current stateexec [-c "code"]- Execute Python code (reads from stdin if no -c)export-buffers <out_path>- Write buffers to filereset- Delete state file
Best Practices
- Don't paste large chunks into main chat - Use REPL and subordinates
- Use grep() to locate relevant sections - Then peek() for details
- Keep chunk outputs structured - Prefer JSON for sub-LLM responses
- Synthesize in main conversation - After collecting evidence
- Clean up state when done - Use
resetto clear old contexts
Example Workflow
User: "Analyze this 500KB log file and find all error messages from service X"
Agent:
1. Init: python rlm_repl.py init /path/to/logs.txt
2. Scout: exec -c 'print(peek(0, 500))'
3. Search: exec -c 'print(grep("service X.*error", max_matches=10))'
4. Chunk: exec -c 'write_chunks("./chunks", size=100000)'
5. For each chunk, call subordinate with rlm-subcall profile
6. Synthesize findings and answer user
Limitations
- State is stored locally via pickle (not distributed)
- Context size limited by available memory
- Sub-LLM calls are synchronous in this implementation
References
- MIT paper: "Recursive Language Models"
- Implementation: brainqub3/claude_code_RLM