Transformers.js - Machine Learning for JavaScript
Transformers.js enables running state-of-the-art machine learning models directly in JavaScript, both in browsers and Node.js environments, with no server required.
When to Use This Skill
Use this skill when you need to:
- Run ML models for text analysis, generation, or translation in JavaScript
- Perform image classification, object detection, or segmentation
- Implement speech recognition or audio processing
- Build multimodal AI applications (text-to-image, image-to-text, etc.)
- Run models client-side in the browser without a backend
Installation
NPM Installation
npm install @huggingface/transformers
Browser Usage (CDN)
<script type="module">
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';
</script>
Core Concepts
1. Pipeline API
The pipeline API is the easiest way to use models. It groups together preprocessing, model inference, and postprocessing:
import { pipeline } from '@huggingface/transformers';
// Create a pipeline for a specific task
const pipe = await pipeline('sentiment-analysis');
// Use the pipeline
const result = await pipe('I love transformers!');
// Output: [{ label: 'POSITIVE', score: 0.999817686 }]
// IMPORTANT: Always dispose when done to free memory
await pipe.dispose();
⚠️ Memory Management: All pipelines must be disposed with pipe.dispose() when finished to prevent memory leaks. See examples in Code Examples for cleanup patterns across different environments.
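One way to make sure disposal is never skipped is to wrap the work in try/finally. The sketch below is not part of the library: `withPipeline` and its `createPipeline` factory argument are hypothetical names, and the helper only assumes that the created object exposes the `dispose()` method described above.

```javascript
// Sketch: run a task with a pipeline and always dispose it afterwards.
// `createPipeline` is a hypothetical factory supplied by the caller, for example
// () => pipeline('sentiment-analysis') using the import shown earlier.
async function withPipeline(createPipeline, task) {
  const pipe = await createPipeline();
  try {
    // Hand the live pipeline to the caller's task and return its result.
    return await task(pipe);
  } finally {
    // Runs on success and on error, so the model memory is always released.
    await pipe.dispose();
  }
}
```

A call could then look like `await withPipeline(() => pipeline('sentiment-analysis'), (pipe) => pipe('I love transformers!'))`, which keeps creation, use, and cleanup in one place.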
2. Model Selection
You can specify a custom model as the second argument:
const pipe = await pipeline(
'sentiment-analysis',
'Xenova/bert-base-multilingual-uncased-sentiment'
);
Finding Models:
Browse available Transformers.js models on Hugging Face Hub:
- All models: https://huggingface.co/models?library=transformers.js&sort=trending
- By task: add the pipeline_tag parameter
  - Text generation: https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending
  - Image classification: https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending
  - Speech recognition: https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending
Tip: Filter by task type, sort by trending/downloads, and check model cards for performance metrics and usage examples.
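The Hub URLs above all follow the same query-string pattern, so they can be generated rather than typed by hand. The sketch below uses only the standard `URL` API; `buildHubUrl` is a hypothetical helper name, and the query parameters are the ones that appear in the links listed here.

```javascript
// Sketch: build a Hub search URL for Transformers.js-compatible models.
// `buildHubUrl` is a hypothetical helper, not part of the library.
function buildHubUrl({ task, sort = 'trending' } = {}) {
  const url = new URL('https://huggingface.co/models');
  // Only add pipeline_tag when a task is requested.
  if (task) url.searchParams.set('pipeline_tag', task);
  url.searchParams.set('library', 'transformers.js');
  url.searchParams.set('sort', sort);
  return url.toString();
}

console.log(buildHubUrl({ task: 'text-generation' }));
// https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending
```

Calling `buildHubUrl()` with no arguments yields the "all models" URL.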
3. Device Selection
Choose where to run the model:
// Run on CPU (the default; uses WASM in the browser)
const cpuPipe = await pipeline('sentiment-analysis', 'model-id');
// Run on GPU (WebGPU - experimental)
const gpuPipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'webgpu',
});
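Because WebGPU is experimental and not available in every runtime, it may help to request it only when the environment exposes it. The following is a minimal sketch, assuming that the presence of `navigator.gpu` is a good enough signal; `deviceOptions` is a hypothetical helper name, and returning an empty object simply leaves the library's default device in place.

```javascript
// Sketch: request WebGPU only when the runtime exposes it.
// `deviceOptions` is a hypothetical helper; `nav` defaults to the global navigator, if any.
function deviceOptions(nav = globalThis.navigator) {
  // navigator.gpu is the WebGPU entry point; it is absent where WebGPU is unsupported.
  const hasWebGPU = Boolean(nav && nav.gpu);
  // Returning {} leaves the library's default device in place.
  return hasWebGPU ? { device: 'webgpu' } : {};
}
```

The result can be passed as the third argument, for example `pipeline('sentiment-analysis', 'model-id', deviceOptions())`. A stricter check could also await `navigator.gpu.requestAdapter()`, since an adapter may be unavailable even when the property exists.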
4. Quantization Options
Control model precision vs. performance:
// Use quantized model (faster, smaller)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
dtype: 'q4', // Options: 'fp32', 'fp16', 'q8', 'q4'
});
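A mistyped dtype string is easy to miss, so a small guard can catch it before any model is downloaded. This sketch validates against only the four values listed in the comment above; the library may accept other values, and `quantizationOptions` is a hypothetical helper name.

```javascript
// The dtype values listed in this section; the library may support others.
const DTYPES = ['fp32', 'fp16', 'q8', 'q4'];

// Hypothetical helper: build pipeline options, rejecting unknown dtype strings early.
function quantizationOptions(dtype) {
  if (!DTYPES.includes(dtype)) {
    throw new RangeError(
      `Unsupported dtype "${dtype}"; expected one of ${DTYPES.join(', ')}`
    );
  }
  return { dtype };
}
```

For example, `quantizationOptions('q4')` returns `{ dtype: 'q4' }`, while `quantizationOptions('q5')` throws before the pipeline is created.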
Supported Tasks
Note: All examples below show basic usage.
Natural Language Processing
Text Classification
const classifier = await pipeline('text-classification');
const result = await classifier('This movie was amazing!');
Named Entity Recognition (NER)
const ner = await pipeline('token-classification');
const entities = await ner('My name is John and I live in New York.');
Question Answering
const qa = await pipeline('question-answering');
// Pass the question first, then the context
const answer = await qa(
  'What is the capital of France?',
  'Paris is the capital and largest city of France.'
);
Text Generation
const generator = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX');
const text = await generator('Once upon a time', {
max_new_tokens: 100,
temperature: 0.7
});
For streaming and chat, see the Text Generation Guide, which covers:
- Streaming token-by-token output with TextStreamer
- Chat/conversation format with system/user/assistant roles
- Generation parameters (temperature, top_k, top_p)
- Browser and Node.js examples
- React components and API endpoints
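As a rough illustration of the chat format mentioned above, the sketch below builds a role-tagged message list and reads the reply back out of a generation result. Both helper names are hypothetical, and the output shape is an assumption: an array whose first element has a `generated_text` field that is either a string (plain prompt) or an array of messages (chat input).

```javascript
// Hypothetical helper: build a chat-style input with system and user roles.
function buildMessages(systemPrompt, userPrompt) {
  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userPrompt },
  ];
}

// Hypothetical helper: pull the generated string out of a text-generation result.
// Assumes the output is [{ generated_text }], where generated_text is either a
// string (plain prompt) or an array of messages (chat input).
function extractGeneratedText(output) {
  const generated = output[0].generated_text;
  if (typeof generated === 'string') return generated;
  // For chat input, the last message is taken to be the assistant's reply.
  return generated[generated.length - 1].content;
}
```

Under these assumptions, the messages array would be passed to the generator in place of the prompt string, and `extractGeneratedText` would be applied to whatever the generator returns.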
Translation
const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');
const output = await translator('Hello, how are you?', {
src_lang: 'eng_Latn',
tgt_lang: 'fra_Latn'
});
Summarization
const summarizer = await pipeline('summarization');
const summary = await summarizer(longText, {
max_length: 100,
min_length: 30
});
Zero-Shot Classification
const classifier = await pipeline('zero-shot-classification');
const result = await classifier('This is a story about sports.', ['politics', 'sports', 'technology']);
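To act on a zero-shot result, the highest-scoring label usually needs to be picked out. The sketch below assumes the result carries parallel `labels` and `scores` arrays; `topLabel` is a hypothetical helper and does not rely on the arrays being sorted.

```javascript
// Hypothetical helper: return the best label from a zero-shot classification result.
// Assumes `result.labels` and `result.scores` are parallel arrays of equal length.
function topLabel(result) {
  let best = 0;
  for (let i = 1; i < result.scores.length; i += 1) {
    if (result.scores[i] > result.scores[best]) best = i;
  }
  return { label: result.labels[best], score: result.scores[best] };
}

// Example with hand-written scores (not real model output):
const example = { labels: ['politics', 'sports', 'technology'], scores: [0.1, 0.8, 0.1] };
console.log(topLabel(example).label); // sports
```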
Computer Vision
Image Classification
const classifier = await pipeline('image-classification');
const result = await classifier('https://example.com/image.jpg');
// Or with a local file (imageUrl is a placeholder for your own file path or URL)
const localResult = await classifier(imageUrl);
Object Detection
const detector = await pipeline('object-detection');
const objects = await detector('https://example.com/image.jpg');
// Returns: [{ label: 'person', score: 0.95, box: { xmin, ymin, xmax, ymax } }, ...]
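Detections in the shape shown above are often post-processed before display, for instance to drop low-confidence boxes and convert corner coordinates into sizes. A minimal sketch follows; `filterDetections` is a hypothetical helper, and the 0.5 default threshold is an arbitrary illustrative value.

```javascript
// Hypothetical helper: keep confident detections and derive box width and height.
// Expects items shaped like { label, score, box: { xmin, ymin, xmax, ymax } }.
function filterDetections(detections, minScore = 0.5) {
  return detections
    .filter((d) => d.score >= minScore)
    .map((d) => ({
      label: d.label,
      score: d.score,
      // Box size from its corner coordinates.
      width: d.box.xmax - d.box.xmin,
      height: d.box.ymax - d.box.ymin,
    }));
}
```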
Image Segmentation
const segmenter = await pipeline('image-segmentation');
const segments = await segmenter('https://example.com/image.jpg');
Depth Estimation
const depthEstimator = await pipeline('depth-estimation');
const depth = await depthEstimator('https://example.com/image.jpg');
Zero-Shot Image Classification
const classifier = await pipeline('zero-shot-image-classification');
const result = await classifier('image.jpg', ['cat', 'dog', 'bird']);
Audio Processing
Automatic Speech Recognition
const transcriber = await pipeline('automatic-speech-recognition');
const result = await transcriber('audio.wav');
// Returns: { text: 'transcribed text here' }
Audio Classification
const classifier = await pipeline('audio-classification');
const result = await classifier('audio.wav');
Text-to-Speech
const synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts');
const audio = await synthesizer('Hello, this is a test.', {
speaker_embeddings: speakerEmbeddings
});
Multimodal
Image-to-Text (Image Captioning)
const captioner = await pipeline('image-to-text');
const caption = await captioner('image.jpg');
Document Question Answering
const docQA = await pipeline('document-question-answering');
const answer = await docQA('document-image.jpg', 'What is the total amount?');
Zero-Shot Object Detection
const detector = await pipeline('zero-shot-object-detection');
const objects = await detector('image.jpg', ['person', 'car', 'tree']);
Feature Extraction (Embeddings)
const extractor = await pipeline('feature-extraction');
const embeddings = await extractor('This is a sentence to embed.');
// Returns: tensor of shape [1, sequence_length, hidden_size]
// For sentence embeddings (mean pooling)
const sentenceExtractor = await pipeline('feature-extraction', 'onnx-community/all-MiniLM-L6-v2-ONNX');
const sentenceEmbeddings = await sentenceExtractor('Text to embed', { pooling: 'mean', normalize: true });
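Sentence embeddings are typically compared with cosine similarity. The sketch below works on plain numeric arrays, so it assumes the embedding tensor has already been converted to an array of numbers; `cosineSimilarity` is a hypothetical helper, not a library export.

```javascript
// Hypothetical helper: cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Divide the dot product by the product of the vector lengths.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

When both embeddings were produced with normalize: true, the two norms are approximately 1 and the dot product alone gives nearly the same value.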
Finding and Choosing Models
Browsing the Hugging Face Hub
Discover compatible Transformers.js models on Hugging Face Hub:
Base URL (all models):
https://huggingface.co/models?library=transformers.js&sort=trending
Filter by task using the pipeline_tag parameter:
| Task | URL |
|---|---|
| Text Generation | https://huggingface.co/models?pipeline_tag=text-gener |