llm-basics

LLM architecture, tokenization, transformers, and inference optimization. Use for understanding and working with language models.

Install skill "llm-basics" with this command: npx skills add pluginagentmarketplace/custom-plugin-ai-engineer/pluginagentmarketplace-custom-plugin-ai-engineer-llm-basics

LLM Basics

Master the fundamentals of Large Language Models.

Quick Start

Using OpenAI API

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain transformers briefly."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Using Hugging Face

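from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Gated model: accept the license on Hugging Face and log in before downloading
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, how are", return_tensors="pt")
with torch.inference_mode():  # no gradients are needed for generation
    outputs = model.generate(**inputs, max_new_tokens=50)
# skip_special_tokens drops markers such as <s> from the decoded text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))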

Core Concepts

Transformer Architecture

Input → Embedding → [N × Transformer Block] → Output

Transformer Block:
┌───────────────────────────┐
│ Multi-Head Self-Attention │
├───────────────────────────┤
│   Layer Normalization     │
├───────────────────────────┤
│   Feed-Forward Network    │
├───────────────────────────┤
│   Layer Normalization     │
└───────────────────────────┘
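
As a rough illustration of the block in the diagram, here is a minimal single-head sketch in plain Python. The function names (`attention_weights`, `layer_norm`, `transformer_block`), the causal mask, the residual additions before each normalization, and the ReLU feed-forward are assumptions made for the example. The diagram does not show them, and real models use multiple heads, learned scale and bias parameters, and often a different normalization placement.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance (no learned scale/bias)."""
    out = []
    for vec in x:
        mean = sum(vec) / len(vec)
        var = sum((v - mean) ** 2 for v in vec) / len(vec)
        out.append([(v - mean) / math.sqrt(var + eps) for v in vec])
    return out

def attention_weights(x, w_q, w_k):
    """Causal scaled dot-product attention weights for one head."""
    q, k = matmul(x, w_q), matmul(x, w_k)
    d_k = len(q[0])
    weights = []
    for i, q_i in enumerate(q):
        # Token i may only attend to positions 0..i (causal mask).
        scores = [sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_k) for j in range(i + 1)]
        weights.append(softmax(scores) + [0.0] * (len(x) - i - 1))
    return weights

def transformer_block(x, w_q, w_k, w_v, w_1, w_2):
    """Attention -> add & norm -> feed-forward -> add & norm, following the diagram."""
    attn = matmul(attention_weights(x, w_q, w_k), matmul(x, w_v))
    x = layer_norm([[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, attn)])
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w_1)]  # ReLU
    ffn = matmul(hidden, w_2)
    return layer_norm([[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, ffn)])
```

Here `x` is a list of token vectors (one row per token) and the `w_*` arguments are weight matrices of compatible shapes. Stacking N such blocks gives the `[N × Transformer Block]` stage of the pipeline.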

Tokenization

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Hello, world!"

# Encode
tokens = tokenizer.encode(text)
print(tokens)  # [15496, 11, 995, 0]

# Decode
decoded = tokenizer.decode(tokens)
print(decoded)  # "Hello, world!"
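
The GPT-2 tokenizer used above is a byte-level BPE tokenizer. To give a feel for how BPE merge rules are learned, here is a toy character-level sketch. The function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) are hypothetical, and the real tokenizer works on bytes with pre-tokenization rules that this sketch leaves out.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words and return the most common one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words
```

Calling `learn_bpe("low low low lower", 2)` returns the learned merge rules together with the corpus re-segmented into the merged symbols. Frequent character sequences collapse into single tokens, which is why common words often map to one token ID.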

Key Parameters

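# Generation parameters (names follow the OpenAI API unless noted)
params = {
    'temperature': 0.7,      # Randomness (0-2); lower is more deterministic
    'max_tokens': 1000,      # Output length limit, in tokens
    'top_p': 0.9,            # Nucleus sampling: keep the smallest token set with cumulative probability >= 0.9
    'top_k': 50,             # Top-k sampling: keep the 50 most likely tokens (Hugging Face and vLLM; not accepted by the OpenAI API)
    'frequency_penalty': 0,  # > 0 penalizes tokens in proportion to how often they have appeared
    'presence_penalty': 0    # > 0 penalizes any token that has already appeared, encouraging new topics
}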
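
To make the effect of `temperature`, `top_k`, and `top_p` concrete, here is a minimal sketch of how they might reshape a next-token distribution. The functions `next_token_distribution` and `sample` are hypothetical, and the order of operations (temperature, then top-k, then top-p) is an assumption. Real libraries differ in details such as tie-breaking and how penalties are applied.

```python
import math
import random

def next_token_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the best token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return {best: 1.0}
    scaled = [logit / temperature for logit in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]          # keep the k most likely tokens
    kept, cumulative = [], 0.0
    for prob, idx in ranked:
        kept.append((prob, idx))
        cumulative += prob
        if cumulative >= top_p:          # smallest set whose mass reaches top_p
            break
    norm = sum(p for p, _ in kept)       # renormalize the surviving tokens
    return {idx: p / norm for p, idx in kept}

def sample(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = next_token_distribution(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

Lowering `temperature` concentrates probability on the top token, while `top_k` and `top_p` remove the unlikely tail before sampling. That is why low temperatures suit factual answers and higher ones suit creative text.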

Model Comparison

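| Model | Parameters | Context | Best For |
| --- | --- | --- | --- |
| GPT-4 | ~1.7T (unconfirmed estimate; not disclosed by OpenAI) | 128K (GPT-4 Turbo) | Complex reasoning |
| GPT-3.5 | 175B (GPT-3 figure; not disclosed for GPT-3.5 Turbo) | 16K | General tasks |
| Claude 3 | Not disclosed | 200K | Long context |
| Llama 2 | 7B-70B | 4K | Open source |
| Mistral 7B | 7B | 32K | Efficient inference |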

Local Inference

With Ollama

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Run a model
ollama run llama2

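# API usage (streams newline-delimited JSON by default; add "stream": false to the body for a single JSON response)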
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
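
By default `/api/generate` streams its answer as newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag. A minimal standard-library sketch of calling the same endpoint from Python might look like the following. The helper names (`build_request`, `collect_stream`, `generate`) are hypothetical, and the code assumes an Ollama server is already running on the default port.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same local endpoint as the curl example

def build_request(model, prompt, stream=True):
    """Build the POST request for /api/generate without sending it."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

def collect_stream(lines):
    """Join the `response` fragments of a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

def generate(model, prompt):
    """Send the prompt to a running Ollama server and return the full text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return collect_stream(resp)  # the HTTP response iterates line by line
```

With a server running, `generate("llama2", "Why is the sky blue?")` would return the concatenated answer. Keeping the parsing in `collect_stream` makes it easy to check without a network connection.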

With vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
sampling = SamplingParams(temperature=0.8, max_tokens=100)

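# generate() takes a list of prompts and returns one RequestOutput per prompt
outputs = llm.generate(["Hello, my name is"], sampling)
for output in outputs:
    print(output.outputs[0].text)  # text of the first completion for this prompt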

Best Practices

  1. Start simple: Use API before local deployment
  2. Mind context: Stay within context window limits
  3. Temperature tuning: Lower for facts, higher for creativity
  4. Token efficiency: Shorter prompts = lower costs
  5. Streaming: Use for better UX in applications
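
For the "Mind context" and "Token efficiency" points, one possible approach is to trim old chat turns so the prompt plus the reserved reply length fits the window. The sketch below is an illustration under stated assumptions. `estimate_tokens` uses a rough four-characters-per-token heuristic rather than a real tokenizer, and `trim_history` is a hypothetical helper, not part of any library.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; use the model's real tokenizer when accuracy matters."""
    return max(1, -(-len(text) // chars_per_token))  # ceiling division

def trim_history(messages, context_window, max_tokens, count=estimate_tokens):
    """Drop the oldest non-system messages until prompt + reply fit the window."""
    budget = context_window - max_tokens  # tokens left for the prompt
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(count(m["content"]) for m in system)
    kept = []
    for message in reversed(rest):        # walk from newest to oldest
        cost = count(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + kept[::-1]            # system messages first, then kept turns in order
```

Passing a tokenizer-backed function as `count` (for example one built on `tokenizer.encode`) would replace the heuristic with exact counts. System messages are always kept and moved to the front, which is a design choice of this sketch.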

Error Handling & Retry

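from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

client = OpenAI()

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content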
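
To show roughly what the decorator is doing, here is a dependency-free sketch of retrying with exponential backoff. It is not tenacity's implementation. `backoff_delays` and `retry_call` are hypothetical names, and the doubling schedule clamped to the 1 to 10 second range is an assumption chosen to mirror the `min=1, max=10` arguments above.

```python
import time

def backoff_delays(attempts, multiplier=1.0, min_wait=1.0, max_wait=10.0):
    """Delays between attempts: multiplier * 2**n, clamped to [min_wait, max_wait]."""
    return [min(max_wait, max(min_wait, multiplier * 2 ** n)) for n in range(attempts - 1)]

def retry_call(func, attempts=3, sleep=time.sleep, **backoff):
    """Call func() until it succeeds or `attempts` calls have failed."""
    delays = backoff_delays(attempts, **backoff)
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the last error
            sleep(delays[attempt])         # wait longer after each failure
```

The `sleep` parameter is injectable so the schedule can be checked without actually waiting. In production code it is usually better to retry only on specific exceptions, such as rate-limit or timeout errors, rather than on every `Exception`.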

Troubleshooting

| Symptom | Cause | Solution |
| --- | --- | --- |
| Rate limit errors | Too many requests | Add exponential backoff |
| Empty response | max_tokens=0 | Check parameter values |
| High latency | Large model | Use smaller model |
| Timeout | Prompt too long | Reduce input size |
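
Several of these symptoms can be caught before a request is sent. The following is a minimal sketch of such a pre-flight check. `validate_params` is a hypothetical helper, and the accepted ranges are assumptions based on the parameter comments earlier in this document, so adjust them to the API in use.

```python
def validate_params(params, context_window=None, prompt_tokens=0):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    max_tokens = params.get("max_tokens")
    if max_tokens is not None and max_tokens <= 0:
        problems.append("max_tokens must be positive, otherwise the response is empty")
    temperature = params.get("temperature", 1.0)
    if not 0 <= temperature <= 2:
        problems.append("temperature should be between 0 and 2")
    top_p = params.get("top_p", 1.0)
    if not 0 < top_p <= 1:
        problems.append("top_p should be in (0, 1]")
    if context_window is not None and max_tokens is not None:
        # The prompt and the reserved reply share one context window.
        if prompt_tokens + max_tokens > context_window:
            problems.append("prompt_tokens + max_tokens exceeds the context window")
    return problems
```

For example, `validate_params({"max_tokens": 0})` reports the empty-response cause from the table, and passing `context_window` with `prompt_tokens` flags prompts that are too long for the model.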

Unit Test Template

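def test_llm_completion():
    # Calls the live API through call_llm_with_retry (defined above), so it needs network access and an API key
    response = call_llm_with_retry("Hello")
    assert response is not None
    assert len(response) > 0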
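
A test like this depends on the network and on billing. One way to keep most tests offline is to pass the client in as an argument and substitute a stub. The sketch below assumes a hypothetical `call_llm(client, prompt)` helper and a hypothetical `make_fake_client` factory. The stub only mimics the `choices[0].message.content` shape used in the Quick Start.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

def call_llm(client, prompt, model="gpt-4"):
    """Hypothetical helper: send one user message and return the reply text."""
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def make_fake_client(reply):
    """Build a stand-in client whose create() returns a canned reply."""
    message = SimpleNamespace(content=reply)
    response = SimpleNamespace(choices=[SimpleNamespace(message=message)])
    client = MagicMock()
    client.chat.completions.create.return_value = response
    return client

def test_llm_completion_offline():
    client = make_fake_client("Hi there!")
    assert call_llm(client, "Hello") == "Hi there!"
    # The helper should forward the prompt as a single user message.
    client.chat.completions.create.assert_called_once_with(
        model="gpt-4", messages=[{"role": "user", "content": "Hello"}]
    )
```

The offline test checks the request that would be sent and how the reply is unpacked. A small number of live tests can then cover the real API separately.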
