Memory Design Patterns
Production-ready memory architecture patterns for AI applications using Mem0. This skill provides comprehensive guidance on designing scalable, performant memory systems with proper isolation, retention strategies, and optimization techniques.
Instructions
Phase 1: Understand Memory Types
Mem0 provides three distinct memory scopes, each serving different purposes:
- User Memory (Persistent Preferences & Profile)
Purpose: Long-term personal preferences, profile data, and user characteristics that persist across all interactions.
Use Cases:
-
User preferences (dietary restrictions, communication style, language preferences)
-
Personal information (location, occupation, family details)
-
Long-term goals and interests
-
Historical context that should persist indefinitely
Implementation:
Add user-level memory
memory.add( "User prefers concise responses without technical jargon" user_id="customer_bob" )
Search user memories
user_context = memory.search( "communication style" user_id="customer_bob" )
Key Characteristics:
-
Persists indefinitely (or until explicitly deleted)
-
Shared across all agents interacting with this user
-
Should contain stable, long-term information
-
Typically 10-50 memories per user
- Agent Memory (Agent-Specific Context)
Purpose: Agent-specific knowledge, behaviors, and learned patterns that apply across all users interacting with this agent.
Use Cases:
-
Agent capabilities and limitations
-
Domain-specific knowledge
-
Learned behaviors and patterns
-
Agent-specific instructions and protocols
Implementation:
Add agent-level memory
memory.add( "When handling refund requests, always check order date first" agent_id="support_agent_v2" )
Search agent memories
agent_context = memory.search( "refund process" agent_id="support_agent_v2" )
Key Characteristics:
-
Shared across all users interacting with this agent
-
Contains agent-specific procedures and knowledge
-
Moderate retention (days to months)
-
Typically 50-200 memories per agent
- Session/Run Memory (Temporary Conversation Context)
Purpose: Ephemeral context specific to a single conversation or task session.
Use Cases:
-
Current conversation topic
-
Temporary task context
-
Session-specific state
-
Short-term working memory
Implementation:
Add session-level memory
memory.add( "Current issue: payment failed with error code 402" run_id="session_12345_20250115" )
Search session memories
session_context = memory.search( "current issue" run_id="session_12345_20250115" )
Key Characteristics:
-
Short-lived (minutes to hours)
-
Isolated to specific conversation or task
-
Should be cleaned up after session ends
-
Typically 5-20 memories per session
Phase 2: Choose Storage Backend (Vector vs Graph)
Vector Memory (Default)
How It Works: Embeddings stored in vector database, semantic similarity search using cosine distance.
Strengths:
-
Fast semantic search
-
Excellent for unstructured data
-
Low setup complexity
-
Works out-of-the-box with Mem0
Weaknesses:
-
Cannot query relationships
-
No explicit entity connections
-
Limited reasoning about connections
Best For:
-
Simple preference storage
-
Document/chunk retrieval
-
Semantic search use cases
-
Quick prototyping
Configuration:
from mem0 import Memory
Default vector-only configuration
memory = Memory()
Graph Memory (Advanced)
How It Works: Entities and relationships stored in graph database (Neo4j/Memgraph), enables relationship traversal and complex queries.
Strengths:
-
Explicit entity relationships
-
Complex query capabilities
-
Relationship reasoning
-
Multi-hop traversal
Weaknesses:
-
Requires graph database setup
-
Higher infrastructure complexity
-
Slower for pure semantic search
-
More storage overhead
Best For:
-
Multi-entity systems
-
Relationship-heavy domains
-
Complex reasoning requirements
-
Enterprise knowledge graphs
Configuration:
from mem0 import Memory from mem0.configs.base import MemoryConfig
config = MemoryConfig( graph_store={ "provider": "neo4j" "config": { "url": "bolt://localhost:7687" "username": "neo4j" "password": "password" } } ) memory = Memory(config)
Decision Matrix:
Use Case Vector Graph
User preferences ✅ Best ⚠️ Overkill
Product recommendations ✅ Best ⚠️ Overkill
Customer support ✅ Good ✅ Better
Knowledge management ⚠️ Limited ✅ Best
Multi-tenant systems ✅ Good ✅ Best
Team collaboration ⚠️ Limited ✅ Best
Phase 3: Design Retention Strategy
Use the retention strategy template:
bash scripts/generate-retention-policy.sh <memory-type> <retention-days>
Retention Guidelines
User Memory:
-
Retention: Indefinite (with user control)
-
Cleanup: User-initiated deletion only
-
Archival: After 1 year of inactivity
-
GDPR: Must support right to deletion
Agent Memory:
-
Retention: 90-180 days typical
-
Cleanup: Automatic based on relevance score
-
Versioning: Keep agent version history
-
Deprecation: Clear old agent memories on major updates
Session Memory:
-
Retention: 1-24 hours
-
Cleanup: Automatic after session end
-
Conversion: Promote important memories to user/agent level
-
Storage: Consider in-memory for very short sessions
Retention Implementation
Run the retention analyzer:
bash scripts/analyze-retention.sh <user_id_or_agent_id>
This script:
-
Analyzes memory age and access patterns
-
Identifies stale memories
-
Suggests cleanup actions
-
Generates retention reports
Phase 4: Implement Multi-Level Memory Pattern
Pattern: Combine all three memory types for comprehensive context.
Template: Use templates/multi-level-memory-pattern.py
Architecture:
Query Processing Flow:
- Retrieve session context (immediate)
- Retrieve user context (preferences)
- Retrieve agent context (capabilities)
- Merge contexts with priority weighting
- Generate response with full context
Priority Weighting:
-
Session: 40% weight (most relevant to current task)
-
User: 35% weight (personalizes response)
-
Agent: 25% weight (ensures consistent behavior)
Implementation:
Retrieve all context levels
session_memories = memory.search(query, run_id=run_id) user_memories = memory.search(query, user_id=user_id) agent_memories = memory.search(query, agent_id=agent_id)
Weighted merge
context = merge_contexts( session=session_memories user=user_memories agent=agent_memories weights={"session": 0.4, "user": 0.35, "agent": 0.25} )
Phase 5: Optimize Performance
Vector Search Optimization
Run the performance analyzer:
bash scripts/analyze-memory-performance.sh <project_name>
Optimization Techniques:
Limit Search Results:
memories = memory.search(query, user_id=user_id, limit=5)
-
Default: 10 results
-
Recommended: 3-5 for chat, 10-20 for RAG
Use Filters to Reduce Search Space:
memories = memory.search( query filters={ "AND": [ {"user_id": "alex"} {"agent_id": "support_agent"} ] } )
Cache Frequently Accessed Memories:
-
Cache user preferences (rarely change)
-
Refresh cache every 5-10 minutes
-
Invalidate on explicit memory updates
Batch Operations:
Add multiple memories in one call
memory.add(messages, user_id=user_id)
Graph Query Optimization
For graph memory:
-
Limit Traversal Depth: Max 2-3 hops
-
Index Key Properties: user_id, agent_id, timestamps
-
Use Relationship Filters: Reduce unnecessary traversals
-
Monitor Query Performance: Track slow queries > 100ms
Phase 6: Implement Cost Optimization
Run the cost analyzer:
bash scripts/analyze-memory-costs.sh <user_id> <date_range>
Cost Optimization Strategies:
Deduplication: Remove similar/redundant memories
bash scripts/deduplicate-memories.sh <user_id>
Archival: Move old memories to cold storage
-
Active: Last 30 days (vector DB)
-
Archive: 30-180 days (compressed JSON)
-
Long-term: > 180 days (S3/cold storage)
Compression: Use shorter embeddings for less critical memories
-
Critical: 1536 dimensions (OpenAI large)
-
Standard: 768 dimensions (OpenAI small)
-
Archival: 384 dimensions (lightweight model)
Smart Pruning: Remove low-value memories
-
Score-based: Keep only high relevance scores
-
Access-based: Remove never-accessed memories
-
Importance-based: User/agent priority tagging
Phase 7: Security and Isolation
Multi-Tenant Isolation
Pattern: Ensure complete data isolation between users/organizations.
Implementation:
Always scope by user_id or org_id
memories = memory.search( query filters={"user_id": current_user_id} )
Validate access before retrieval
if not user_has_access(user_id, requested_user_id): raise PermissionError("Access denied")
Security Checklist:
-
✅ Never allow cross-user memory access
-
✅ Validate all user_id parameters
-
✅ Implement org-level isolation for multi-tenant apps
-
✅ Audit memory access logs
-
✅ Encrypt sensitive memory content
-
✅ Support GDPR right to deletion
Run the security audit:
bash scripts/audit-memory-security.sh
Decision Trees
When to Use Each Memory Type
Use the decision helper:
bash scripts/suggest-memory-type.sh "<use_case_description>"
Quick Reference:
-
User dietary preferences → User Memory
-
Agent's SOP for task X → Agent Memory
-
Current conversation topic → Session Memory
-
Customer support ticket details → Session Memory (promote to User if resolved)
-
System capabilities → Agent Memory
-
User's birthday → User Memory
Vector vs Graph Decision
Use the architecture advisor:
bash scripts/suggest-storage-architecture.sh "<project_description>"
Decision Criteria:
-
Need relationship traversal? → Graph
-
Pure semantic search? → Vector
-
< 10,000 memories total? → Vector
-
Complex entity relationships? → Graph
-
Team/org hierarchies? → Graph
-
Simple preference storage? → Vector
Key Files
Scripts (all functional, not placeholders):
-
scripts/generate-retention-policy.sh
-
Create retention policy configs
-
scripts/analyze-retention.sh
-
Analyze memory age and access patterns
-
scripts/analyze-memory-performance.sh
-
Performance profiling
-
scripts/analyze-memory-costs.sh
-
Cost analysis and optimization suggestions
-
scripts/deduplicate-memories.sh
-
Find and remove duplicate memories
-
scripts/audit-memory-security.sh
-
Security compliance checking
-
scripts/suggest-memory-type.sh
-
Interactive memory type advisor
-
scripts/suggest-storage-architecture.sh
-
Architecture recommendation tool
Templates:
-
templates/multi-level-memory-pattern.py
-
Complete implementation
-
templates/retention-policy.yaml
-
Retention configuration
-
templates/vector-only-config.py
-
Vector memory setup
-
templates/graph-memory-config.py
-
Graph memory setup
-
templates/hybrid-architecture.py
-
Vector + Graph combined
-
templates/cost-optimization-config.yaml
-
Cost optimization settings
Examples:
-
examples/customer-support-memory-architecture.md
-
Full implementation guide
-
examples/multi-agent-collaboration.md
-
Shared memory patterns
-
examples/e-commerce-personalization.md
-
Product recommendation memory
-
examples/healthcare-assistant.md
-
HIPAA-compliant memory architecture
Best Practices
-
Start Simple: Use vector-only with user + session memories
-
Add Complexity as Needed: Only introduce graph when relationships matter
-
Monitor Performance: Track memory retrieval times and costs
-
Implement Retention Early: Don't let memory grow unbounded
-
Test Isolation: Verify cross-user memory access is impossible
-
Document Memory Schema: Track what memories mean and when they're used
-
Version Agent Memories: Clear separation between agent versions
-
Promote Important Memories: Session → User when patterns emerge
-
Use Metadata: Tag memories with categories for better filtering
-
Regular Audits: Monthly review of memory growth and costs
Troubleshooting
Slow Memory Retrieval:
-
Reduce search limit
-
Add more specific filters
-
Check vector index performance
-
Consider caching
High Costs:
-
Run cost analyzer script
-
Implement deduplication
-
Review retention policy
-
Archive old memories
Poor Search Results:
-
Check embedding model quality
-
Verify memory content is descriptive
-
Use hybrid search (keyword + semantic)
-
Add metadata for filtering
Memory Leakage Between Users:
-
Audit security script immediately
-
Review all memory queries for user_id filtering
-
Check RLS policies if using custom backends
-
Implement access logging
Plugin: mem0 Version: 1.0.0 Last Updated: 2025-10-27