Prompt Assemble
Overview
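A standardized, token-safe prompt assembly framework that keeps LLM API calls from failing on context overflow. It implements Two-Phase Context Construction (build a minimal context first, then add memory only when it is needed and fits) and a Memory Safety Valve that drops memory rather than exceed the token budget, so relevant context is maximized without risking overflow.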
Design Goals:
- ✅ Never fail because memory caused a token overflow
- ✅ Memory is always a discardable enhancement, never a hard dependency
- ✅ Token budget decisions are centralized in the prompt assembly layer
When to Use
Use this skill when:
- Building or modifying any agent that constructs prompts
- Implementing memory retrieval systems
- Adding new prompt-related logic to existing agents
- Any scenario where token budget safety is required
Core Workflow
User Input
↓
Minimal Context Build
↓
Need-Memory Decision
↓
Memory Retrieval (Optional)
↓
Memory Summarization
↓
Token Estimation
↓
Safety Valve Decision
↓
Final Prompt → LLM Call
Phase Details
Phase 0: Base Configuration
```python
# Model context windows (as of 2026-02-04)
# - MiniMax-M2.1: 204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o: 128,000 tokens
MAX_TOKENS = 204_000                    # Set to your model's context limit
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)  # Conservative 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3                        # Retrieve at most 3 memories
MEMORY_SUMMARY_MAX = 3                  # Keep at most 3 lines per summarized memory
```
Design Philosophy:
- Leave a 25% buffer for model overhead (chat-template tokens and the response), estimation errors, and sudden input spikes
- It is better to underuse the context window than to overflow it
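Applied to the context windows listed above, the 75% rule works out as follows. `CONTEXT_WINDOWS` and `safety_threshold` are illustrative names for this sketch, not part of `prompt_assemble.py`.

```python
# Hypothetical helper: apply the 75% safety rule to a model's context window.
CONTEXT_WINDOWS = {
    "MiniMax-M2.1": 204_000,
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
}

def safety_threshold(context_window: int, ratio: float = 0.75) -> int:
    """Largest estimated prompt size (in tokens) allowed before memory is dropped."""
    return int(context_window * ratio)

for model, window in CONTEXT_WINDOWS.items():
    print(f"{model}: {safety_threshold(window):,} tokens")
```

This prints 153,000, 150,000, and 96,000 tokens for the three models, so switching to a smaller-context model only requires changing `MAX_TOKENS`.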
Phase 1: Minimal Context
- System prompt
- Recent N messages (N=3, trimmed)
- Current user input
- No memory by default
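A minimal sketch of Phase 1, assuming the context is a list of strings (as the later `base_context.append(...)` call suggests) and interpreting "trimmed" as a per-message character cap. `build_minimal_context` and `MAX_RECENT_CHARS` are hypothetical names chosen for illustration.

```python
from typing import List

RECENT_N = 3              # Phase 1: keep the last 3 messages
MAX_RECENT_CHARS = 2_000  # Hypothetical per-message trim limit

def build_minimal_context(system_prompt: str,
                          history: List[str],
                          user_input: str) -> List[str]:
    """Phase 1: system prompt + trimmed recent messages + current input; no memory."""
    recent = [msg[:MAX_RECENT_CHARS] for msg in history[-RECENT_N:]]
    # Only older dialog is trimmed; the system prompt and user input stay verbatim
    return [system_prompt, *recent, user_input]
```

Trimming only the recent dialog keeps the Phase 5 hard rules intact: the system prompt and the user input are never cut.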
Phase 2: Memory Need Decision
```python
def need_memory(user_input):
    """Return True if the input explicitly refers to earlier conversation."""
    triggers = [
        "previously",
        "earlier we discussed",
        "do you remember",
        "as I mentioned before",
        "continuing from",
        "before we",
        "last time",
        "previously mentioned",
    ]
    text = user_input.lower()
    return any(trigger.lower() in text for trigger in triggers)
```
Phase 3: Memory Retrieval (Optional)
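```python
summarized_memories = []
if need_memory(user_input):
    # memory_search and summarize are supplied by your agent
    memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
    for mem in memories:
        summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
```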
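The document does not define `summarize`. As a placeholder, assuming each memory is a plain string, the sketch below simply keeps the first `max_lines` non-empty lines; a production version would more likely ask an LLM for a short summary.

```python
def summarize(mem: str, max_lines: int = 3) -> str:
    """Hypothetical stand-in for summarize(): keep the first max_lines non-empty lines.

    This only enforces the MEMORY_SUMMARY_MAX line limit; it does not
    condense meaning the way an LLM-based summarizer would.
    """
    lines = [line.strip() for line in mem.splitlines() if line.strip()]
    return "\n".join(lines[:max_lines])
```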
Phase 4: Token Estimation
Calculate estimated tokens for base_context + summarized_memories.
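One simple estimation strategy is the rough "about 4 characters per token" rule of thumb for English text. The sketch below assumes that heuristic and a list-of-strings context; a real tokenizer (for example `tiktoken` for OpenAI models) is more accurate, and the heuristic can underestimate non-English text, which is part of why the 25% buffer exists.

```python
import math
from typing import List

CHARS_PER_TOKEN = 4  # Rough heuristic for English text (assumption, not exact)

def estimate_tokens(parts: List[str]) -> int:
    """Estimate the token count of a list of prompt parts, rounding up."""
    total_chars = sum(len(part) for part in parts)
    return math.ceil(total_chars / CHARS_PER_TOKEN)
```

Usage: `estimated_tokens = estimate_tokens(base_context + summarized_memories)`.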
Phase 5: Safety Valve (Critical)
```python
if estimated_tokens > SAFETY_MARGIN:
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)
```
Hard Rules:
- ❌ Never shorten or downgrade the system prompt
- ❌ Never truncate user input
- ❌ No "lucky splicing" (partially cutting memory text in the hope that the prompt still fits)
- ✅ Only the memory layer is expendable
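The valve and its hard rules can be sketched as one function that treats memory as all-or-nothing. `apply_safety_valve` is a hypothetical name, and the token estimator is passed in so any counting strategy can be used. Unlike the snippet above, this version returns a new list instead of mutating `base_context`.

```python
from typing import Callable, List

NOTICE = "[System Notice] Relevant memory skipped due to token budget."

def apply_safety_valve(base_context: List[str],
                       summarized_memories: List[str],
                       estimate_tokens: Callable[[List[str]], int],
                       safety_margin: int) -> List[str]:
    """Return the prompt parts to send; memory is the only expendable layer."""
    candidate = base_context + summarized_memories
    if estimate_tokens(candidate) > safety_margin:
        # Drop every memory rather than splice partial text; base context is untouched
        return base_context + [NOTICE]
    return candidate
```

Note that the valve can only remove memory: if the base context alone exceeds the budget, it is still returned unchanged, because the hard rules forbid cutting the system prompt or user input.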
Phase 6: Final Assembly
```python
final_prompt = assemble(base_context + summarized_memories)
return final_prompt
```
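`assemble` is not defined in this document. A minimal sketch, assuming the parts are strings and that blank-line separators are acceptable to the target model, joins them in order:

```python
from typing import List

def assemble(parts: List[str]) -> str:
    """Hypothetical assemble(): join non-empty prompt parts with blank lines, in order."""
    return "\n\n".join(part for part in parts if part)
```

Keeping the order as given means the system prompt stays first; any reordering (for example placing memories before the user input) would be a separate design decision.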
Memory Data Standards
Allowed in Long-Term Memory
- ✅ User preferences / identity / long-term goals
- ✅ Confirmed important conclusions
- ✅ System-level settings and rules
Forbidden in Long-Term Memory
- ❌ Raw conversation logs
- ❌ Reasoning traces
- ❌ Temporary discussions
- ❌ Information recoverable from chat history
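These standards can be enforced with a whitelist at write time. The record type and category labels below are hypothetical, chosen to mirror the lists above; anything not explicitly allowed, including unknown categories, is rejected.

```python
from dataclasses import dataclass

# Hypothetical category labels mirroring the "Allowed" list above
ALLOWED_KINDS = {"preference", "identity", "long_term_goal",
                 "confirmed_conclusion", "system_setting"}

@dataclass
class MemoryRecord:
    kind: str
    text: str

def admit(record: MemoryRecord) -> bool:
    """Accept only whitelisted, non-empty records; logs, traces, etc. are rejected."""
    return record.kind in ALLOWED_KINDS and bool(record.text.strip())
```

A whitelist is used instead of a blacklist so that new, unclassified kinds of content default to "not stored".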
Quick Start
Copy scripts/prompt_assemble.py to your agent and use:
```python
from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)
```
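For readers without the script at hand, here is a self-contained sketch of how the phases could fit behind that call. It is not the contents of `scripts/prompt_assemble.py`. It assumes `memory_search_fn(query=..., top_k=...)` returns a list of strings, `get_recent_dialog_fn(n)` returns the last `n` messages already trimmed, and adds hypothetical `system_prompt` and `safety_margin` keyword arguments.

```python
import math
from typing import Callable, List

MAX_TOKENS = 204_000
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)
MEMORY_TOP_K = 3
MEMORY_SUMMARY_MAX = 3
RECENT_N = 3
TRIGGERS = ["previously", "earlier we discussed", "do you remember",
            "as i mentioned before", "continuing from", "before we", "last time"]
NOTICE = "[System Notice] Relevant memory skipped due to token budget."

def _estimate_tokens(parts: List[str]) -> int:
    # Rough ~4 characters/token heuristic (assumption)
    return math.ceil(sum(len(p) for p in parts) / 4)

def build_prompt(user_input: str,
                 memory_search_fn: Callable[..., List[str]],
                 get_recent_dialog_fn: Callable[[int], List[str]],
                 system_prompt: str = "You are a helpful assistant.",
                 safety_margin: int = SAFETY_MARGIN) -> str:
    # Phase 1: minimal context, no memory
    base_context = [system_prompt, *get_recent_dialog_fn(RECENT_N), user_input]
    # Phase 2: fetch memory only when the input refers back to earlier turns
    text = user_input.lower()
    if not any(trigger in text for trigger in TRIGGERS):
        return "\n\n".join(base_context)
    # Phase 3: retrieve, then keep the first lines of each memory as a crude summary
    memories = memory_search_fn(query=user_input, top_k=MEMORY_TOP_K)
    summarized = ["\n".join(m.splitlines()[:MEMORY_SUMMARY_MAX]) for m in memories]
    # Phases 4-5: estimate tokens; drop all memory if over budget
    if _estimate_tokens(base_context + summarized) > safety_margin:
        return "\n\n".join(base_context + [NOTICE])
    # Phase 6: final assembly
    return "\n\n".join(base_context + summarized)
```

Inputs without a memory trigger never touch the memory store, so the common path costs no retrieval at all.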
Resources
scripts/
- prompt_assemble.py: Complete implementation with all phases (PromptAssembler class)
references/
- memory_standards.md: Detailed memory content guidelines
- token_estimation.md: Token counting strategies