Transformers.js - Machine Learning for JavaScript
Transformers.js enables running state-of-the-art machine learning models directly in JavaScript, both in browsers and Node.js environments, with no server required.
When to Use This Skill
Use this skill when you need to:
- Run ML models for text analysis, generation, or translation in JavaScript
- Perform image classification, object detection, or segmentation
- Implement speech recognition or audio processing
- Build multimodal AI applications (text-to-image, image-to-text, etc.)
- Run models client-side in the browser without a backend
Installation
NPM Installation
npm install @huggingface/transformers
Browser Usage (CDN)
<script type="module">
  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';
</script>
Core Concepts
- Pipeline API
The pipeline API is the easiest way to use models. It groups together preprocessing, model inference, and postprocessing:
import { pipeline } from '@huggingface/transformers';

// Create a pipeline for a specific task
const pipe = await pipeline('sentiment-analysis');

// Use the pipeline
const result = await pipe('I love transformers!');
// Output: [{ label: 'POSITIVE', score: 0.999817686 }]

// IMPORTANT: Always dispose when done to free memory
await pipe.dispose();
⚠️ Memory Management: All pipelines must be disposed with pipe.dispose() when finished to prevent memory leaks. See examples in Code Examples for cleanup patterns across different environments.
- Model Selection
You can specify a custom model as the second argument:
const pipe = await pipeline(
  'sentiment-analysis',
  'Xenova/bert-base-multilingual-uncased-sentiment'
);
Finding Models:
Browse available Transformers.js models on Hugging Face Hub:
- All models: https://huggingface.co/models?library=transformers.js&sort=trending
- By task: Add pipeline_tag parameter
  - Text generation: https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending
  - Image classification: https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending
  - Speech recognition: https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending
Tip: Filter by task type, sort by trending/downloads, and check model cards for performance metrics and usage examples.
- Device Selection
Choose where to run the model:
// Run on CPU (default for WASM)
const pipe = await pipeline('sentiment-analysis', 'model-id');

// Run on GPU (WebGPU - experimental)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'webgpu',
});
- Quantization Options
Control model precision vs. performance:
// Use quantized model (faster, smaller)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q4', // Options: 'fp32', 'fp16', 'q8', 'q4'
});
Supported Tasks
Note: All examples below show basic usage.
Natural Language Processing
Text Classification
const classifier = await pipeline('text-classification');
const result = await classifier('This movie was amazing!');
Named Entity Recognition (NER)
const ner = await pipeline('token-classification');
const entities = await ner('My name is John and I live in New York.');
Question Answering
const qa = await pipeline('question-answering');
const answer = await qa({
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.'
});
Text Generation
const generator = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX');
const text = await generator('Once upon a time', {
  max_new_tokens: 100,
  temperature: 0.7
});
For streaming and chat: See Text Generation Guide for:
- Streaming token-by-token output with TextStreamer
- Chat/conversation format with system/user/assistant roles
- Generation parameters (temperature, top_k, top_p)
- Browser and Node.js examples
- React components and API endpoints
Translation
const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');
const output = await translator('Hello, how are you?', {
  src_lang: 'eng_Latn',
  tgt_lang: 'fra_Latn'
});
Summarization
const summarizer = await pipeline('summarization');
// longText is a string holding the text to summarize, defined elsewhere
const summary = await summarizer(longText, {
  max_length: 100,
  min_length: 30
});
Zero-Shot Classification
const classifier = await pipeline('zero-shot-classification');
const result = await classifier('This is a story about sports.', ['politics', 'sports', 'technology']);
Computer Vision
Image Classification
const classifier = await pipeline('image-classification');
const result = await classifier('https://example.com/image.jpg');
// Or with a local file (imageUrl is a path or URL defined elsewhere)
const localResult = await classifier(imageUrl);
Object Detection
const detector = await pipeline('object-detection');
const objects = await detector('https://example.com/image.jpg');
// Returns: [{ label: 'person', score: 0.95, box: { xmin, ymin, xmax, ymax } }, ...]
Image Segmentation
const segmenter = await pipeline('image-segmentation');
const segments = await segmenter('https://example.com/image.jpg');
Depth Estimation
const depthEstimator = await pipeline('depth-estimation');
const depth = await depthEstimator('https://example.com/image.jpg');
Zero-Shot Image Classification
const classifier = await pipeline('zero-shot-image-classification');
const result = await classifier('image.jpg', ['cat', 'dog', 'bird']);
Audio Processing
Automatic Speech Recognition
const transcriber = await pipeline('automatic-speech-recognition');
const result = await transcriber('audio.wav');
// Returns: { text: 'transcribed text here' }
Audio Classification
const classifier = await pipeline('audio-classification');
const result = await classifier('audio.wav');
Text-to-Speech
const synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts');
// speakerEmbeddings must be defined beforehand; it supplies the speaker embedding data for the voice
const audio = await synthesizer('Hello, this is a test.', {
  speaker_embeddings: speakerEmbeddings
});
Multimodal
Image-to-Text (Image Captioning)
const captioner = await pipeline('image-to-text');
const caption = await captioner('image.jpg');
Document Question Answering
const docQA = await pipeline('document-question-answering');
const answer = await docQA('document-image.jpg', 'What is the total amount?');
Zero-Shot Object Detection
const detector = await pipeline('zero-shot-object-detection');
const objects = await detector('image.jpg', ['person', 'car', 'tree']);
Feature Extraction (Embeddings)
const extractor = await pipeline('feature-extraction');
const embeddings = await extractor('This is a sentence to embed.');
// Returns: tensor of shape [1, sequence_length, hidden_size]

// For sentence embeddings (mean pooling)
const sentenceExtractor = await pipeline('feature-extraction', 'onnx-community/all-MiniLM-L6-v2-ONNX');
const sentenceEmbeddings = await sentenceExtractor('Text to embed', {
  pooling: 'mean',
  normalize: true
});
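Sentence embeddings are usually compared with cosine similarity. The sketch below is a plain-JavaScript illustration, not part of Transformers.js; the function names are hypothetical. It works on ordinary number arrays, so it assumes the embedding tensors have already been converted to arrays.

```javascript
// Dot product of two equal-length numeric vectors.
function dot(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have equal length');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
// Returns 0 when either vector has zero length to avoid dividing by zero.
function cosineSimilarity(a, b) {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Usage sketch (assumption: the returned tensor can be converted with tolist()):
// const out = await extractor(['first text', 'second text'], { pooling: 'mean', normalize: true });
// const [a, b] = out.tolist();
// console.log(cosineSimilarity(a, b));
```

When embeddings are produced with normalize: true, each vector already has unit length, so the plain dot product gives the same value as the cosine similarity.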
Finding and Choosing Models
Browsing the Hugging Face Hub
Discover compatible Transformers.js models on Hugging Face Hub:
Base URL (all models):
https://huggingface.co/models?library=transformers.js&sort=trending
Filter by task using the pipeline_tag parameter:
| Task | URL |
| --- | --- |
| Text Generation | https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending |
| Text Classification | https://huggingface.co/models?pipeline_tag=text-classification&library=transformers.js&sort=trending |
| Translation | https://huggingface.co/models?pipeline_tag=translation&library=transformers.js&sort=trending |
| Summarization | https://huggingface.co/models?pipeline_tag=summarization&library=transformers.js&sort=trending |
| Question Answering | https://huggingface.co/models?pipeline_tag=question-answering&library=transformers.js&sort=trending |
| Image Classification | https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending |
| Object Detection | https://huggingface.co/models?pipeline_tag=object-detection&library=transformers.js&sort=trending |
| Image Segmentation | https://huggingface.co/models?pipeline_tag=image-segmentation&library=transformers.js&sort=trending |
| Speech Recognition | https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending |
| Audio Classification | https://huggingface.co/models?pipeline_tag=audio-classification&library=transformers.js&sort=trending |
| Image-to-Text | https://huggingface.co/models?pipeline_tag=image-to-text&library=transformers.js&sort=trending |
| Feature Extraction | https://huggingface.co/models?pipeline_tag=feature-extraction&library=transformers.js&sort=trending |
| Zero-Shot Classification | https://huggingface.co/models?pipeline_tag=zero-shot-classification&library=transformers.js&sort=trending |
Sort options:
- &sort=trending: Most popular recently
- &sort=downloads: Most downloaded overall
- &sort=likes: Most liked by community
- &sort=modified: Recently updated
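The query parameters above (pipeline_tag, library, sort) can be combined programmatically. This is a small sketch using the standard URL API; the helper name hubSearchUrl is hypothetical and not part of Transformers.js or the Hub.

```javascript
// Hypothetical helper: builds a Hub search URL from the query parameters
// described above. `task` is optional; `sort` defaults to 'trending'.
function hubSearchUrl({ task, sort = 'trending' } = {}) {
  const url = new URL('https://huggingface.co/models');
  if (task) url.searchParams.set('pipeline_tag', task);
  url.searchParams.set('library', 'transformers.js');
  url.searchParams.set('sort', sort);
  return url.toString();
}

console.log(hubSearchUrl({ task: 'text-generation' }));
console.log(hubSearchUrl({ task: 'translation', sort: 'downloads' }));
```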
Choosing the Right Model
Consider these factors when selecting a model:
- Model Size
  - Small (< 100MB): Fast, suitable for browsers, limited accuracy
  - Medium (100MB - 500MB): Balanced performance, good for most use cases
  - Large (> 500MB): High accuracy, slower, better for Node.js or powerful devices
- Quantization: Models are often available in different quantization levels:
  - fp32: Full precision (largest, most accurate)
  - fp16: Half precision (smaller, still accurate)
  - q8: 8-bit quantized (much smaller, slight accuracy loss)
  - q4: 4-bit quantized (smallest, noticeable accuracy loss)
- Task Compatibility: Check the model card for:
  - Supported tasks (some models support multiple tasks)
  - Input/output formats
  - Language support (multilingual vs. English-only)
  - License restrictions
- Performance Metrics: Model cards typically show:
  - Accuracy scores
  - Benchmark results
  - Inference speed
  - Memory requirements
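To get a feel for how model size and quantization interact, a back-of-the-envelope estimate multiplies the parameter count by the bytes stored per weight. The sketch below is illustrative only: the helper names are hypothetical, and the bytes-per-weight figures are rough assumptions (real files include metadata, and quantized exports may keep some layers at higher precision), so treat the results as order-of-magnitude guidance and check the actual file sizes in the model repo.

```javascript
// Approximate bytes per weight for each dtype (rough assumption, see above).
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, q8: 1, q4: 0.5 };

// Estimated weight size in MB (1 MB = 1e6 bytes) for a given dtype.
function estimateSizeMB(paramCount, dtype) {
  const bytesPerParam = BYTES_PER_PARAM[dtype];
  if (bytesPerParam === undefined) throw new Error(`Unknown dtype: ${dtype}`);
  return (paramCount * bytesPerParam) / 1e6;
}

// Pick the most precise dtype whose estimate fits the budget, or null if none fits.
function pickDtype(paramCount, budgetMB) {
  for (const dtype of ['fp32', 'fp16', 'q8', 'q4']) {
    if (estimateSizeMB(paramCount, dtype) <= budgetMB) return dtype;
  }
  return null;
}

console.log(estimateSizeMB(270e6, 'q4')); // 135
console.log(pickDtype(270e6, 300));       // 'q8'
```

For a model with about 270M parameters, this estimate gives roughly 1080 MB at fp32 and 135 MB at q4, which is why quantized variants are usually the practical choice in browsers.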
Example: Finding a Text Generation Model
// 1. Visit: https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending

// 2. Browse and select a model (e.g., onnx-community/gemma-3-270m-it-ONNX)

// 3. Check model card for:
//    - Model size: ~270M parameters
//    - Quantization: q4 available
//    - Language: English
//    - Use case: Instruction-following chat

// 4. Use the model:
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline(
  'text-generation',
  'onnx-community/gemma-3-270m-it-ONNX',
  { dtype: 'q4' } // Use quantized version for faster inference
);

const output = await generator('Explain quantum computing in simple terms.', {
  max_new_tokens: 100
});

await generator.dispose();
Tips for Model Selection
- Start Small: Test with a smaller model first, then upgrade if needed
- Check ONNX Support: Ensure the model has ONNX files (look for onnx folder in model repo)
- Read Model Cards: Model cards contain usage examples, limitations, and benchmarks
- Test Locally: Benchmark inference speed and memory usage in your environment
- Community Models: Look for models by Xenova (Transformers.js maintainer) or onnx-community
- Version Pin: Use specific git commits in production for stability:
  const pipe = await pipeline('task', 'model-id', { revision: 'abc123' });
Advanced Configuration
Environment Configuration (env)
The env object provides comprehensive control over Transformers.js execution, caching, and model loading.
Quick Overview:
import { env } from '@huggingface/transformers';

// View version
console.log(env.version); // e.g., '3.8.1'

// Common settings
env.allowRemoteModels = true;    // Load from Hugging Face Hub
env.allowLocalModels = false;    // Load from file system
env.localModelPath = '/models/'; // Local model directory
env.useFSCache = true;           // Cache models on disk (Node.js)
env.useBrowserCache = true;      // Cache models in browser
env.cacheDir = './.cache';       // Cache directory location
Configuration Patterns:
// Development: Fast iteration with remote models
env.allowRemoteModels = true;
env.useFSCache = true;

// Production: Local models only
env.allowRemoteModels = false;
env.allowLocalModels = true;
env.localModelPath = '/app/models/';

// Custom CDN
env.remoteHost = 'https://cdn.example.com/models';

// Disable caching (testing)
env.useFSCache = false;
env.useBrowserCache = false;
For complete documentation on all configuration options, caching strategies, cache management, pre-downloading models, and more, see:
→ Configuration Reference
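One way to keep the configuration patterns above in a single place is a small profile table applied at startup. This is a sketch, not a Transformers.js feature: ENV_PROFILES and applyEnvProfile are hypothetical names, and the only library properties used are the env settings already shown above.

```javascript
// Hypothetical profiles mirroring the configuration patterns above.
const ENV_PROFILES = {
  development: { allowRemoteModels: true, useFSCache: true },
  production: {
    allowRemoteModels: false,
    allowLocalModels: true,
    localModelPath: '/app/models/',
  },
  testing: { useFSCache: false, useBrowserCache: false },
};

// Copies the chosen profile's settings onto the given env object and returns it.
function applyEnvProfile(env, profileName) {
  const profile = ENV_PROFILES[profileName];
  if (!profile) throw new Error(`Unknown profile: ${profileName}`);
  return Object.assign(env, profile);
}

// Usage sketch, assuming `env` was imported from '@huggingface/transformers':
// applyEnvProfile(env, process.env.NODE_ENV === 'production' ? 'production' : 'development');
```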
Working with Tensors
import { AutoTokenizer, AutoModel } from '@huggingface/transformers';

// Load tokenizer and model separately for more control
const tokenizer = await AutoTokenizer.from_pretrained('bert-base-uncased');
const model = await AutoModel.from_pretrained('bert-base-uncased');

// Tokenize input
const inputs = await tokenizer('Hello world!');

// Run model
const outputs = await model(inputs);
Batch Processing
const classifier = await pipeline('sentiment-analysis');

// Process multiple texts
const results = await classifier([
  'I love this!',
  'This is terrible.',
  'It was okay.'
]);
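For large input lists, passing everything in one call can use a lot of memory. A common workaround is to split the inputs into fixed-size batches and process them one after another. The helper below is a sketch with a hypothetical name; it assumes only that the classifier accepts an array of texts and returns one result per input, as in the example above.

```javascript
// Hypothetical helper: runs `classifier` over `texts` in sequential batches
// of `batchSize` and returns the results in the original input order.
async function classifyInBatches(classifier, texts, batchSize = 8) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const results = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const batchResults = await classifier(batch);
    results.push(...batchResults);
  }
  return results;
}

// Usage sketch, assuming `classifier` was created with pipeline('sentiment-analysis'):
// const results = await classifyInBatches(classifier, manyTexts, 16);
```

A smaller batch size lowers peak memory at the cost of more calls; the right value depends on the model and device, so it is worth measuring.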
Browser-Specific Considerations
WebGPU Usage
WebGPU provides GPU acceleration in browsers:
const pipe = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX', {
  device: 'webgpu',
  dtype: 'fp32'
});
Note: WebGPU is experimental. Check browser compatibility and file issues if problems occur.
WASM Performance
Default browser execution uses WASM:
// Optimized for browsers with quantization
const pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q8' // or 'q4' for even smaller size
});
Progress Tracking & Loading Indicators
Models can be large (ranging from a few MB to several GB) and consist of multiple files. Track download progress by passing a callback to the pipeline() function:
import { pipeline } from '@huggingface/transformers';

// Track progress for each file
const fileProgress = {};

function onProgress(info) {
  console.log(`${info.status}: ${info.file}`);
  if (info.status === 'progress') {
    fileProgress[info.file] = info.progress;
    console.log(`${info.file}: ${info.progress.toFixed(1)}%`);
  }
  if (info.status === 'done') {
    console.log(`✓ ${info.file} complete`);
  }
}

// Pass callback to pipeline
const classifier = await pipeline('sentiment-analysis', null, {
  progress_callback: onProgress
});
Progress Info Properties:
interface ProgressInfo {
  status: 'initiate' | 'download' | 'progress' | 'done' | 'ready';
  name: string;      // Model id or path
  file: string;      // File being processed
  progress?: number; // Percentage (0-100, only for 'progress' status)
  loaded?: number;   // Bytes downloaded (only for 'progress' status)
  total?: number;    // Total bytes (only for 'progress' status)
}
For complete examples including browser UIs, React components, CLI progress bars, and retry logic, see:
→ Pipeline Options - Progress Callback
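Because a model consists of several files, a single loading bar needs the per-file events combined into one number. The sketch below sums the loaded and total byte counts from the 'progress' events described above. The name createProgressTracker is hypothetical, and the result only covers files that have started reporting, so the percentage can dip when a new file begins downloading.

```javascript
// Hypothetical helper: aggregates per-file 'progress' events into one percentage.
function createProgressTracker() {
  const files = new Map(); // file name -> { loaded, total } in bytes

  return {
    // Pass this function as progress_callback.
    onProgress(info) {
      if (info.status === 'progress' && info.total) {
        files.set(info.file, { loaded: info.loaded ?? 0, total: info.total });
      }
    },
    // Overall percentage (0-100) across all files seen so far.
    overallPercent() {
      let loaded = 0;
      let total = 0;
      for (const file of files.values()) {
        loaded += file.loaded;
        total += file.total;
      }
      return total === 0 ? 0 : (loaded / total) * 100;
    },
  };
}

// Usage sketch, assuming `pipeline` was imported from '@huggingface/transformers':
// const tracker = createProgressTracker();
// const pipe = await pipeline('sentiment-analysis', null, {
//   progress_callback: tracker.onProgress,
// });
```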
Error Handling
try {
  const pipe = await pipeline('sentiment-analysis', 'model-id');
  const result = await pipe('text to analyze');
} catch (error) {
  if (error.message.includes('fetch')) {
    console.error('Model download failed. Check internet connection.');
  } else if (error.message.includes('ONNX')) {
    console.error('Model execution failed. Check model compatibility.');
  } else {
    console.error('Unknown error:', error);
  }
}
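Download failures are often transient, so it can help to retry model loading a few times before giving up. The following is a generic retry-with-exponential-backoff sketch; loadWithRetry, its options, and the default values are hypothetical and not part of Transformers.js.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls `load` up to `retries + 1` times, doubling the
// wait between attempts, and rethrows the last error if every attempt fails.
async function loadWithRetry(load, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await load();
    } catch (error) {
      lastError = error;
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Usage sketch, assuming `pipeline` was imported from '@huggingface/transformers':
// const pipe = await loadWithRetry(() => pipeline('sentiment-analysis', 'model-id'));
```

Retrying only makes sense for errors that might resolve on their own, such as network failures; a missing or incompatible model will fail on every attempt.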
Performance Tips
- Reuse Pipelines: Create pipeline once, reuse for multiple inferences
- Use Quantization: Start with q8 or q4 for faster inference
- Batch Processing: Process multiple inputs together when possible
- Cache Models: Models are cached automatically (see Caching Reference for details on browser Cache API, Node.js filesystem cache, and custom implementations)
- WebGPU for Large Models: Use WebGPU for models that benefit from GPU acceleration
- Prune Context: For text generation, limit max_new_tokens to avoid memory issues
- Clean Up Resources: Call pipe.dispose() when done to free memory
Memory Management
IMPORTANT: Always call pipe.dispose() when finished to prevent memory leaks.
const pipe = await pipeline('sentiment-analysis');
const result = await pipe('Great product!');
await pipe.dispose(); // ✓ Free memory (100MB - several GB per model)
When to dispose:
- Application shutdown or component unmount
- Before loading a different model
- After batch processing in long-running apps
Models consume significant memory and hold GPU/CPU resources. Disposal is critical for browser memory limits and server stability.
For detailed patterns (React cleanup, servers, browser), see Code Examples
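For one-off work, a try/finally wrapper makes sure disposal happens even when inference throws. This is a minimal sketch that assumes only the dispose() method described above; the helper name withPipeline and its createPipeline factory parameter are hypothetical, not part of Transformers.js.

```javascript
// Hypothetical helper: creates a pipeline, runs `work` with it, and always
// disposes the pipeline afterwards, whether `work` succeeds or throws.
async function withPipeline(createPipeline, work) {
  const pipe = await createPipeline();
  try {
    return await work(pipe);
  } finally {
    await pipe.dispose();
  }
}

// Usage sketch, assuming `pipeline` was imported from '@huggingface/transformers':
// const result = await withPipeline(
//   () => pipeline('sentiment-analysis'),
//   (pipe) => pipe('Great product!'),
// );
```

This pattern suits short-lived tasks. For repeated inference, it is usually better to keep one pipeline alive and dispose it at shutdown, since reloading a model on every call is expensive.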
Troubleshooting
Model Not Found
- Verify model exists on Hugging Face Hub
- Check model name spelling
- Ensure model has ONNX files (look for onnx folder in model repo)
Memory Issues
- Use smaller models or quantized versions (dtype: 'q4')
- Reduce batch size
- Limit sequence length with max_length
WebGPU Errors
- Check browser compatibility (Chrome 113+, Edge 113+)
- Try dtype: 'fp16' if fp32 fails
- Fall back to WASM if WebGPU unavailable
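The WASM fallback can be decided at runtime by feature-detecting WebGPU before creating the pipeline. The sketch below uses the standard navigator.gpu.requestAdapter() browser API; the helper name pickDevice is hypothetical, and it assumes 'wasm' and 'webgpu' are the device values you want to pass to pipeline().

```javascript
// Hypothetical helper: returns 'webgpu' when a WebGPU adapter is available,
// otherwise 'wasm'. `nav` defaults to the global navigator and can be
// injected for testing.
async function pickDevice(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return 'wasm';
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm';
  }
}

// Usage sketch, assuming `pipeline` was imported from '@huggingface/transformers':
// const device = await pickDevice();
// const pipe = await pipeline('sentiment-analysis', 'model-id', { device });
```

Checking for an adapter, not just for navigator.gpu, matters because a browser can expose the API while having no usable GPU.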
Reference Documentation
This Skill
- Pipeline Options - Configure pipeline() with progress_callback, device, dtype, etc.
- Configuration Reference - Global env configuration for caching and model loading
- Caching Reference - Browser Cache API, Node.js filesystem cache, and custom cache implementations
- Text Generation Guide - Streaming, chat format, and generation parameters
- Model Architectures - Supported models and selection tips
- Code Examples - Real-world implementations for different runtimes
Official Transformers.js
- Official docs: https://huggingface.co/docs/transformers.js
- API reference: https://huggingface.co/docs/transformers.js/api/pipelines
- Model hub: https://huggingface.co/models?library=transformers.js
- Examples: https://github.com/huggingface/transformers.js/tree/main/examples
Best Practices
- Always Dispose Pipelines: Call pipe.dispose() when done - critical for preventing memory leaks
- Start with Pipelines: Use the pipeline API unless you need fine-grained control
- Test Locally First: Test models with small inputs before deploying
- Monitor Model Sizes: Be aware of model download sizes for web applications
- Handle Loading States: Show progress indicators for better UX
- Version Pin: Pin specific model versions for production stability
- Error Boundaries: Always wrap pipeline calls in try-catch blocks
- Progressive Enhancement: Provide fallbacks for unsupported browsers
- Reuse Models: Load once, use many times - don't recreate pipelines unnecessarily
- Graceful Shutdown: Dispose models on SIGTERM/SIGINT in servers
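The graceful-shutdown practice can be sketched as a helper that disposes every loaded pipeline and reports failures without stopping at the first one. The name disposeAll is hypothetical, and it relies only on the dispose() method described in this document.

```javascript
// Hypothetical helper: disposes every pipeline in `pipes`, continuing past
// individual failures, and returns the list of errors (empty if all succeeded).
async function disposeAll(pipes) {
  const results = await Promise.allSettled(
    pipes.map((pipe) => Promise.resolve().then(() => pipe.dispose())),
  );
  return results
    .filter((result) => result.status === 'rejected')
    .map((result) => result.reason);
}

// Usage sketch for a Node.js server (`pipes` is your own array of loaded pipelines):
// for (const signal of ['SIGTERM', 'SIGINT']) {
//   process.once(signal, async () => {
//     const errors = await disposeAll(pipes);
//     process.exit(errors.length === 0 ? 0 : 1);
//   });
// }
```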
Quick Reference: Task IDs
| Task | Task ID |
| --- | --- |
| Text classification | text-classification or sentiment-analysis |
| Token classification | token-classification or ner |
| Question answering | question-answering |
| Fill mask | fill-mask |
| Summarization | summarization |
| Translation | translation |
| Text generation | text-generation |
| Text-to-text generation | text2text-generation |
| Zero-shot classification | zero-shot-classification |
| Image classification | image-classification |
| Image segmentation | image-segmentation |
| Object detection | object-detection |
| Depth estimation | depth-estimation |
| Image-to-image | image-to-image |
| Zero-shot image classification | zero-shot-image-classification |
| Zero-shot object detection | zero-shot-object-detection |
| Automatic speech recognition | automatic-speech-recognition |
| Audio classification | audio-classification |
| Text-to-speech | text-to-speech or text-to-audio |
| Image-to-text | image-to-text |
| Document question answering | document-question-answering |
| Feature extraction | feature-extraction |
| Sentence similarity | sentence-similarity |
This skill enables you to integrate state-of-the-art machine learning capabilities directly into JavaScript applications without requiring separate ML servers or Python environments.