llm-application-dev

LLM Application Development

Prompt Engineering

Structured Prompts

const systemPrompt = `You are a helpful assistant that answers questions about our product.

RULES:

1. Only answer questions about our product
2. If you don't know, say "I don't know"
3. Keep responses concise (under 100 words)
4. Never make up information

CONTEXT: {context}`;

const userPrompt = `Question: {question}`;
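The prompts above use `{context}` and `{question}` as placeholders, but nothing shows how they get filled. A minimal sketch of one way to do it follows; `fillTemplate` is a hypothetical helper, not part of any SDK, and the template strings are shortened stand-ins for the prompts above.

```typescript
// Hypothetical helper: replaces each {name} placeholder with the matching value from `vars`.
// Throws on a missing value so a half-filled prompt is never sent to the model.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : undefined;
    if (value === undefined) {
      throw new Error(`Missing value for placeholder {${key}}`);
    }
    return value;
  });
}

// Shortened stand-ins for the system and user prompts.
const systemTemplate = 'Answer using only this context.\n\nCONTEXT: {context}';
const userTemplate = 'Question: {question}';

const filledSystem = fillTemplate(systemTemplate, { context: 'Plan A costs $10 per month.' });
const filledUser = fillTemplate(userTemplate, { question: 'How much is Plan A?' });

console.log(filledSystem);
console.log(filledUser); // Question: How much is Plan A?
```

Failing loudly on a missing variable is a design choice of this sketch; silently leaving `{context}` in the prompt tends to produce confusing model output.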

Few-Shot Examples

const prompt = `Classify the sentiment of customer feedback.

Examples:
Input: "Love this product!"
Output: positive

Input: "Worst purchase ever"
Output: negative

Input: "It works fine"
Output: neutral

Input: "${customerFeedback}"
Output:`;
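The few-shot prompt can also be assembled from a list of examples, which makes it easier to add or swap examples, and the model's reply can be checked against the allowed labels before it is used. The sketch below is one possible shape; `FewShotExample`, `buildFewShotPrompt`, and `parseSentiment` are hypothetical names introduced here for illustration.

```typescript
type Sentiment = 'positive' | 'negative' | 'neutral';

interface FewShotExample {
  input: string;
  output: Sentiment;
}

const sentimentExamples: FewShotExample[] = [
  { input: 'Love this product!', output: 'positive' },
  { input: 'Worst purchase ever', output: 'negative' },
  { input: 'It works fine', output: 'neutral' },
];

// Builds the few-shot prompt. JSON.stringify quotes each input and escapes
// any quotes or newlines inside it, so user text cannot break the layout.
function buildFewShotPrompt(examples: FewShotExample[], feedback: string): string {
  const shots = examples
    .map((ex) => `Input: ${JSON.stringify(ex.input)}\nOutput: ${ex.output}`)
    .join('\n\n');
  return (
    'Classify the sentiment of customer feedback.\n\n' +
    `Examples:\n${shots}\n\n` +
    `Input: ${JSON.stringify(feedback)}\nOutput:`
  );
}

// Maps raw model text to a known label, or null if it is not one of them.
function parseSentiment(raw: string): Sentiment | null {
  const cleaned = raw.trim().toLowerCase().replace(/[."']/g, '');
  const labels: Sentiment[] = ['positive', 'negative', 'neutral'];
  return labels.find((label) => label === cleaned) ?? null;
}

const fewShotPrompt = buildFewShotPrompt(sentimentExamples, 'Arrived broken');
console.log(fewShotPrompt);
console.log(parseSentiment(' Positive.\n')); // positive
console.log(parseSentiment('mixed'));        // null
```

Returning `null` for an unexpected label lets the caller decide whether to retry, fall back to a default, or flag the item for review.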

Chain of Thought

const prompt = `Solve this step by step:

Question: ${question}

Let's think through this:

1. First, identify the key information
2. Then, determine the approach
3. Finally, calculate the answer

Step-by-step solution:`;
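A chain-of-thought response mixes reasoning with the result, so the caller usually needs to pull the final answer out. One common approach, which is an assumption here and not part of the prompt above, is to also ask the model to finish with a line of the form `Answer: <value>` and then parse that line. `extractFinalAnswer` is a hypothetical helper.

```typescript
// Assumption: the prompt asked the model to end with a line "Answer: <value>"
// (or "Final answer: <value>"). Scans from the bottom so the last match wins.
function extractFinalAnswer(response: string): string | null {
  const lines = response.split(/\r?\n/).reverse();
  for (const line of lines) {
    const match = /^\s*(?:final\s+)?answer\s*:\s*(.+)$/i.exec(line);
    const value = match ? match[1] : undefined;
    if (value !== undefined) {
      return value.trim();
    }
  }
  return null; // no answer line found; the caller may retry or log the response
}

// Made-up model response for illustration.
const sampleResponse = [
  'First, 12 apples minus 5 eaten leaves 7.',
  'Then, 7 plus 3 bought gives 10.',
  'Answer: 10',
].join('\n');

console.log(extractFinalAnswer(sampleResponse));         // 10
console.log(extractFinalAnswer('I am not sure, sorry')); // null
```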

API Integration

OpenAI Pattern

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

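// Assumes the openai SDK v4 or later, which exports this message type.
type Message = OpenAI.Chat.ChatCompletionMessageParam;

async function chat(messages: Message[]): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages,
    temperature: 0.7,
    max_tokens: 500,
  });

  // content can be null, so fall back to an empty string
  return response.choices[0].message.content ?? '';
}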

Anthropic Pattern

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function chat(prompt: string): Promise<string> {
  const response = await anthropic.messages.create({
    model: 'claude-3-opus-20240229',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  });

  // The response is a list of content blocks; only text blocks carry a `text` field
  return response.content[0].type === 'text' ? response.content[0].text : '';
}

Streaming Responses

async function* streamChat(prompt: string) {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  // Each chunk carries a small delta; skip chunks with no text content
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) yield content;
  }
}
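On the consuming side, a caller typically forwards each chunk to the UI as it arrives and keeps the full text for logging or storage. The sketch below shows one way to do that; `collectStream` is a hypothetical helper, and `fakeStream` is a stand-in for `streamChat` so the example runs without network access.

```typescript
// Stand-in for streamChat so the sketch runs offline.
async function* fakeStream(parts: string[]): AsyncGenerator<string> {
  for (const part of parts) {
    yield part;
  }
}

// Forwards each chunk to onChunk (for example, to update the UI)
// and resolves with the concatenated text once the stream ends.
async function collectStream(
  stream: AsyncIterable<string>,
  onChunk: (chunk: string) => void = () => {},
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    onChunk(chunk);
    full += chunk;
  }
  return full;
}

collectStream(fakeStream(['Hel', 'lo', '!']), (chunk) => {
  console.log(`chunk: ${chunk}`);
}).then((text) => {
  console.log(`full text: ${text}`); // full text: Hello!
});
```

Because `collectStream` accepts any `AsyncIterable<string>`, the generator returned by `streamChat` could be passed in place of `fakeStream`.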

RAG (Retrieval-Augmented Generation)

Basic RAG Pipeline

async function ragQuery(question: string): Promise<string> {
  // 1. Embed the question
  const questionEmbedding = await embedText(question);

  // 2. Search vector database
  const relevantDocs = await vectorDb.search(questionEmbedding, { limit: 5 });

  // 3. Build context
  const context = relevantDocs.map(d => d.content).join('\n\n');

  // 4. Generate answer
  const prompt = `Answer based on this context:\n${context}\n\nQuestion: ${question}`;
  return await chat(prompt);
}
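The search step relies on a vector database, which the pipeline above treats as a black box. To make the idea concrete, here is a minimal in-memory sketch that ranks documents by cosine similarity. It is an illustration of the ranking, not how `vectorDb.search` is implemented; `StoredDoc`, `cosineSimilarity`, and `searchDocs` are hypothetical names, and the 3-dimensional embeddings are made up.

```typescript
interface StoredDoc {
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions differ');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every document against the query and returns the `limit` best matches.
function searchDocs(query: number[], docs: StoredDoc[], limit = 5): StoredDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.doc);
}

// Toy embeddings, made up for illustration.
const toyDocs: StoredDoc[] = [
  { content: 'Refund policy', embedding: [1, 0, 0] },
  { content: 'Shipping times', embedding: [0, 1, 0] },
  { content: 'Return window', embedding: [0.9, 0.1, 0] },
];

const nearest = searchDocs([1, 0, 0], toyDocs, 2);
console.log(nearest.map((d) => d.content)); // [ 'Refund policy', 'Return window' ]
```

A linear scan like this is fine for a few thousand documents; a real vector database uses an index to avoid scoring every row.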

Document Chunking

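// ChunkOptions is not defined in this section; this shape is assumed from the defaults below.
interface ChunkOptions { chunkSize?: number; overlap?: number }

function chunkDocument(text: string, options: ChunkOptions = {}): string[] {
  const { chunkSize = 1000, overlap = 200 } = options;
  if (overlap >= chunkSize) {
    // Otherwise start would never advance and the loop would not terminate
    throw new Error('overlap must be smaller than chunkSize');
  }
  const chunks: string[] = [];

  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // avoid a trailing chunk that only repeats the overlap
    start += chunkSize - overlap;
  }

  return chunks;
}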
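To see how chunk size and overlap interact, it helps to look at the character ranges that sliding-window chunking produces. The sketch below computes only the ranges; `chunkRanges` is a hypothetical helper written for this illustration. It stops as soon as a chunk reaches the end of the text, so no trailing chunk lies entirely inside the previous one.

```typescript
// Returns the [start, end) character range of each chunk.
// Each chunk starts (chunkSize - overlap) characters after the previous one.
function chunkRanges(length: number, chunkSize: number, overlap: number): Array<[number, number]> {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const ranges: Array<[number, number]> = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < length; start += step) {
    const end = Math.min(start + chunkSize, length);
    ranges.push([start, end]);
    if (end === length) break; // this chunk already reaches the end of the text
  }
  return ranges;
}

// A 2500-character document with 1000-character chunks and 200 characters of overlap.
console.log(chunkRanges(2500, 1000, 200));
// [ [ 0, 1000 ], [ 800, 1800 ], [ 1600, 2500 ] ]
```

Each range shares its first 200 characters with the end of the previous one, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.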

Embedding Storage

// Using Supabase with pgvector
async function storeEmbeddings(docs: Document[]) {
  for (const doc of docs) {
    const embedding = await embedText(doc.content);

    // supabase-js reports failures in `error` rather than throwing
    const { error } = await supabase.from('documents').insert({
      content: doc.content,
      metadata: doc.metadata,
      embedding: embedding, // vector column
    });
    if (error) throw error;
  }
}

async function searchSimilar(query: string, limit = 5) {
  const embedding = await embedText(query);

  // match_documents is a SQL function that must already exist in the database
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: embedding,
    match_count: limit,
  });
  if (error) throw error;

  return data;
}

Error Handling

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function safeLLMCall<T>(
  fn: () => Promise<T>,
  options: { retries?: number; fallback?: T } = {}
): Promise<T> {
  const { retries = 3, fallback } = options;

  for (let i = 0; i < retries; i++) {
    try {
      return await fn();
    } catch (error) {
      // On the last attempt, give up: return the fallback if provided, else rethrow
      if (i === retries - 1) {
        if (fallback !== undefined) return fallback;
        throw error;
      }
      if ((error as { status?: number } | null)?.status === 429) {
        // Rate limit - exponential backoff
        await sleep(Math.pow(2, i) * 1000);
      }
    }
  }
  throw new Error('Max retries exceeded');
}
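The exponential backoff used for rate limits doubles the wait on each attempt. The sketch below spells out the resulting schedule and adds two things that are not in the code above and are labeled as additions: an upper cap on the delay, and optional random jitter so that many clients do not retry at the same instant. `backoffDelayMs` and `jitteredDelayMs` are hypothetical helper names.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt.
// Addition in this sketch: the delay is capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// Addition in this sketch: "full jitter", a random delay in [0, backoffDelayMs(attempt)).
// `random` is injectable so the function can be tested deterministically.
function jitteredDelayMs(attempt: number, random: () => number = Math.random): number {
  return Math.floor(random() * backoffDelayMs(attempt));
}

const backoffSchedule = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt));
console.log(backoffSchedule); // [ 1000, 2000, 4000, 8000, 16000, 30000 ]
```

Without a cap, the sixth retry would wait 32 seconds and the tenth over 8 minutes, which is rarely acceptable for an interactive request.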

Best Practices

Token Management: Track usage and set limits
Caching: Cache embeddings and common queries
Evaluation: Test prompts with diverse inputs
Guardrails: Validate outputs before using
Logging: Log prompts and responses for debugging
Cost Control: Use cheaper models for simple tasks
Latency: Stream responses for better UX
Privacy: Don't send PII to external APIs
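
The caching item in the list above can be sketched as a wrapper around whatever embedding function the application uses. This is a minimal in-memory version under stated assumptions: `withEmbeddingCache` and `EmbedFn` are hypothetical names, the cache is keyed by the exact input text, and `fakeEmbed` is a stand-in for a real embedding call so the example runs offline.

```typescript
type EmbedFn = (text: string) => Promise<number[]>;

// Wraps an embedding function with an in-memory cache keyed by the exact input text.
// The promise itself is cached, so concurrent calls for the same text share one request.
function withEmbeddingCache(embed: EmbedFn): EmbedFn {
  const store = new Map<string, Promise<number[]>>();
  return (text: string) => {
    const hit = store.get(text);
    if (hit) return hit;
    const pending = embed(text).catch((error) => {
      store.delete(text); // do not cache failures, so a later call can retry
      throw error;
    });
    store.set(text, pending);
    return pending;
  };
}

// Stand-in embedder that counts how often it is actually called.
let embedCalls = 0;
const fakeEmbed: EmbedFn = async (text) => {
  embedCalls += 1;
  return [text.length];
};

const cachedEmbed = withEmbeddingCache(fakeEmbed);

cachedEmbed('hello')
  .then(() => cachedEmbed('hello'))
  .then(() => console.log(`embed calls: ${embedCalls}`)); // embed calls: 1
```

An unbounded `Map` grows without limit; a long-running service would likely want an eviction policy or an external cache instead.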
