# LangChain Architecture
Master the LangChain framework for building sophisticated LLM applications with agents, chains, memory, and tool integration.
## When to Use This Skill

- Building autonomous AI agents with tool access
- Implementing complex multi-step LLM workflows
- Managing conversation memory and state
- Integrating LLMs with external data sources and APIs
- Creating modular, reusable LLM application components
- Implementing document processing pipelines
- Building production-grade LLM applications
## Core Concepts

### Agents

Autonomous systems that use LLMs to decide which actions to take.

Agent Types:

- ReAct: Reasoning and acting in an interleaved manner
- OpenAI Functions: Leverages the function-calling API
- Structured Chat: Handles multi-input tools
- Conversational: Optimized for chat interfaces
- Self-Ask with Search: Decomposes complex queries into sub-questions
### Chains

Sequences of calls to LLMs or other utilities.

Chain Types:

- LLMChain: Basic prompt + LLM combination
- SequentialChain: Multiple chains in sequence
- RouterChain: Routes inputs to specialized chains
- TransformChain: Data transformations between steps (sketched below)
- MapReduceChain: Parallel processing with aggregation
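Most of these appear in the patterns below; TransformChain does not, so here is a minimal sketch. It wraps a plain Python function as a chain step (the `clean_text` function and variable names are illustrative):

```python
from langchain.chains import TransformChain

def clean_text(inputs: dict) -> dict:
    # Deterministic preprocessing: no LLM call involved
    text = inputs["text"]
    return {"clean_text": " ".join(text.split())}

transform_chain = TransformChain(
    input_variables=["text"],
    output_variables=["clean_text"],
    transform=clean_text,
)
```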
### Memory

Systems for maintaining context across interactions.

Memory Types:

- ConversationBufferMemory: Stores all messages
- ConversationSummaryMemory: Summarizes older messages
- ConversationBufferWindowMemory: Keeps the last N messages
- ConversationEntityMemory: Tracks information about entities
- VectorStoreRetrieverMemory: Retrieves relevant history by semantic similarity
### Document Processing

Loading, transforming, and storing documents for retrieval.

Components:

- Document Loaders: Load from various sources
- Text Splitters: Chunk documents intelligently
- Vector Stores: Store and retrieve embeddings
- Retrievers: Fetch relevant documents (configuration sketched below)
- Indexes: Organize documents for efficient access
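Retriever behavior is configured where the vector store is wrapped. A small sketch (assuming `vectorstore` is any LangChain vector store; the `search_type` and `k` values are illustrative, not recommendations):

```python
retriever = vectorstore.as_retriever(
    search_type="mmr",           # maximal marginal relevance for diverse results
    search_kwargs={"k": 4},      # number of documents to return
)
docs = retriever.get_relevant_documents("query")
```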
### Callbacks

Hooks for logging, monitoring, and debugging.

Use Cases:

- Request/response logging
- Token usage tracking
- Latency monitoring
- Error handling
- Custom metrics collection
## Quick Start

```python
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

# Initialize the LLM
llm = OpenAI(temperature=0)

# Load tools
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# Add conversation memory
memory = ConversationBufferMemory(memory_key="chat_history")

# Create the agent
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True,
)

# Run the agent
result = agent.run("What's the weather in SF? Then calculate 25 * 4")
```
## Architecture Patterns

### Pattern 1: RAG with LangChain

```python
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Load and process documents
loader = TextLoader('documents.txt')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create the vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(texts, embeddings)

# Create the retrieval chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)

# Query
result = qa_chain({"query": "What is the main topic?"})
```
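With `return_source_documents=True`, the result dict carries both the answer and the chunks it was grounded in:

```python
print(result["result"])              # the generated answer
for doc in result["source_documents"]:
    print(doc.metadata)              # provenance of each retrieved chunk
```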
### Pattern 2: Custom Agent with Tools

```python
from langchain.agents import AgentType, initialize_agent
from langchain.tools import tool

@tool
def search_database(query: str) -> str:
    """Search the internal database for information."""
    # Your database search logic
    return f"Results for: {query}"

@tool
def send_email(recipient: str, content: str) -> str:
    """Send an email to the specified recipient."""
    # Your email sending logic
    return f"Email sent to {recipient}"

tools = [search_database, send_email]

# send_email takes two inputs, so use a structured-chat agent,
# which supports multi-input tools (see Agent Types above)
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
```
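Invocation is the same as for any other agent (the query below is a hypothetical example):

```python
result = agent.run(
    "Search the database for the Q3 report and email a summary to ops@example.com"
)
```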
### Pattern 3: Multi-Step Chain

```python
from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate

# Step 1: Extract key information
extract_prompt = PromptTemplate(
    input_variables=["text"],
    template="Extract key entities from: {text}\n\nEntities:",
)
extract_chain = LLMChain(llm=llm, prompt=extract_prompt, output_key="entities")

# Step 2: Analyze entities
analyze_prompt = PromptTemplate(
    input_variables=["entities"],
    template="Analyze these entities: {entities}\n\nAnalysis:",
)
analyze_chain = LLMChain(llm=llm, prompt=analyze_prompt, output_key="analysis")

# Step 3: Generate summary
summary_prompt = PromptTemplate(
    input_variables=["entities", "analysis"],
    template="Summarize:\nEntities: {entities}\nAnalysis: {analysis}\n\nSummary:",
)
summary_chain = LLMChain(llm=llm, prompt=summary_prompt, output_key="summary")

# Combine into a sequential chain
overall_chain = SequentialChain(
    chains=[extract_chain, analyze_chain, summary_chain],
    input_variables=["text"],
    output_variables=["entities", "analysis", "summary"],
    verbose=True,
)
```
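Calling the chain with its initial input returns a dict containing every declared output key:

```python
results = overall_chain({"text": "Apple announced a new chip at its Cupertino event."})
print(results["entities"], results["analysis"], results["summary"], sep="\n")
```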
## Memory Management Best Practices

### Choosing the Right Memory Type

```python
from langchain.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationEntityMemory,
    ConversationSummaryMemory,
    VectorStoreRetrieverMemory,
)

# For short conversations (< 10 messages): store everything
memory = ConversationBufferMemory()

# For long conversations: summarize older messages
memory = ConversationSummaryMemory(llm=llm)

# For a sliding window: keep only the last N messages
memory = ConversationBufferWindowMemory(k=5)

# For entity tracking
memory = ConversationEntityMemory(llm=llm)

# For semantic retrieval of relevant history
memory = VectorStoreRetrieverMemory(retriever=retriever)
```
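The last option needs a retriever to exist first. A minimal sketch, assuming Chroma and OpenAI embeddings (any vector store works):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Back the memory with an (initially empty) vector store
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
memory = VectorStoreRetrieverMemory(retriever=retriever)
```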
## Callback System

### Custom Callback Handler

```python
from langchain.callbacks.base import BaseCallbackHandler

class CustomCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print(f"LLM started with prompts: {prompts}")

    def on_llm_end(self, response, **kwargs):
        print(f"LLM ended with response: {response}")

    def on_llm_error(self, error, **kwargs):
        print(f"LLM error: {error}")

    def on_chain_start(self, serialized, inputs, **kwargs):
        print(f"Chain started with inputs: {inputs}")

    def on_agent_action(self, action, **kwargs):
        print(f"Agent taking action: {action}")

# Use the callback on a single run
agent.run("query", callbacks=[CustomCallbackHandler()])
```
## Testing Strategies

```python
from langchain.agents import AgentType, initialize_agent
from langchain.llms.fake import FakeListLLM
from langchain.memory import ConversationBufferMemory

def test_agent_tool_selection():
    # FakeListLLM returns canned responses in order, driving the agent
    # deterministically (a bare Mock fails the LLM type validation
    # inside initialize_agent)
    fake_llm = FakeListLLM(responses=[
        "Action: search_database\nAction Input: test query",
        "Final Answer: found it",
    ])
    agent = initialize_agent(
        [search_database], fake_llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
    )

    result = agent.run("test query")

    # The first canned response selected the search_database tool;
    # the second terminated the loop
    assert result == "found it"

def test_memory_persistence():
    memory = ConversationBufferMemory()
    memory.save_context({"input": "Hi"}, {"output": "Hello!"})

    history = memory.load_memory_variables({})["history"]
    assert "Hi" in history
    assert "Hello!" in history
```
## Performance Optimization

### Caching

```python
import langchain
from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()
```
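For a cache that survives restarts, there is also a SQLite-backed variant (the database path below is an arbitrary example):

```python
from langchain.cache import SQLiteCache

# Persistent cache: identical prompts skip the API call across runs
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
```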
### Batch Processing

```python
# Process multiple documents in parallel
from concurrent.futures import ThreadPoolExecutor

from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter

loader = DirectoryLoader('./docs')
docs = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

def process_doc(doc):
    return text_splitter.split_documents([doc])

with ThreadPoolExecutor(max_workers=4) as executor:
    split_docs = list(executor.map(process_doc, docs))
```
### Streaming Responses

```python
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import OpenAI

# Tokens are printed to stdout as they arrive
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
```
## Resources

- references/agents.md: Deep dive on agent architectures
- references/memory.md: Memory system patterns
- references/chains.md: Chain composition strategies
- references/document-processing.md: Document loading and indexing
- references/callbacks.md: Monitoring and observability
- assets/agent-template.py: Production-ready agent template
- assets/memory-config.yaml: Memory configuration examples
- assets/chain-example.py: Complex chain examples
## Common Pitfalls

- Memory Overflow: letting conversation history grow unbounded instead of summarizing or windowing it
- Tool Selection Errors: vague tool descriptions confuse the agent's tool choice
- Context Window Exceeded: prompts plus accumulated history exceed the LLM's token limit
- No Error Handling: agent failures are not caught and handled (see the executor options sketched after this list)
- Inefficient Retrieval: vector store queries are not tuned (e.g., wrong k, no metadata filtering)
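Several of these can be guarded against directly when constructing the agent. A sketch using legacy `initialize_agent` options (the limits are arbitrary starting points):

```python
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    max_iterations=5,                   # stop runaway tool loops
    max_execution_time=30,              # seconds before the run is cut off
    handle_parsing_errors=True,         # recover from malformed LLM output
    early_stopping_method="generate",   # produce a final answer when stopped
    verbose=True,
)
```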
## Production Checklist

- Implement proper error handling
- Add request/response logging
- Monitor token usage and costs
- Set timeout limits for agent execution
- Implement rate limiting
- Add input validation
- Test with edge cases
- Set up observability (callbacks)
- Implement fallback strategies (see the sketch below)
- Version control prompts and configurations
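For the fallback item, the simplest pattern is an explicit try/except around the agent call with a cheaper degraded path (the wrapper below is a hypothetical sketch, not a LangChain API):

```python
def answer_with_fallback(query: str) -> str:
    try:
        # Primary path: full agent with tools
        return agent.run(query)
    except Exception as exc:
        # Degraded path: plain LLM call, no tools
        print(f"Agent failed ({exc}); falling back to direct LLM call")
        return llm.predict(query)
```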