ElevenLabs Patterns
Quick Guide: Use the official `@elevenlabs/elevenlabs-js` package to interact with the ElevenLabs API. Use `client.textToSpeech.convert()` for full audio generation or `client.textToSpeech.stream()` for low-latency streaming. Voice settings (`stability`, `similarityBoost`, `style`, `speed`) control output character. Use `eleven_v3` for best quality, `eleven_flash_v2_5` for lowest latency, or `eleven_multilingual_v2` for stable long-form content. Both methods return `ReadableStream<Uint8Array>`; pipe it to a file or an HTTP response. Use `@elevenlabs/client` for real-time conversational AI agents.
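Voice settings are plain numbers, so it is easy to pass an out-of-range value by accident. Below is a minimal sketch of a guard. It assumes the ranges given in the ElevenLabs docs at the time of writing: 0 to 1 for `stability`, `similarityBoost`, and `style`, and 0.7 to 1.2 for `speed`. Re-check those ranges against the current API reference. `clampVoiceSettings` and `VoiceSettingsInput` are hypothetical names, not SDK exports; only the field names mirror the SDK's camelCase `voiceSettings` object.

```typescript
// Hypothetical helper (not part of @elevenlabs/elevenlabs-js).
// Field names mirror the SDK's camelCase voiceSettings; the ranges below are
// assumptions taken from the ElevenLabs docs at the time of writing.
interface VoiceSettingsInput {
  stability: number;
  similarityBoost: number;
  style: number;
  speed: number;
}

const UNIT_MIN = 0;
const UNIT_MAX = 1;
const SPEED_MIN = 0.7;
const SPEED_MAX = 1.2;

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Returns a copy with every field forced into its assumed valid range
function clampVoiceSettings(settings: VoiceSettingsInput): VoiceSettingsInput {
  return {
    stability: clamp(settings.stability, UNIT_MIN, UNIT_MAX),
    similarityBoost: clamp(settings.similarityBoost, UNIT_MIN, UNIT_MAX),
    style: clamp(settings.style, UNIT_MIN, UNIT_MAX),
    speed: clamp(settings.speed, SPEED_MIN, SPEED_MAX),
  };
}

export { clampVoiceSettings };
export type { VoiceSettingsInput };
```

You can pass the result as `voiceSettings` in a `convert()` or `stream()` request. Clamping silently is a design choice; throwing on out-of-range input is the stricter alternative.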
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST use @elevenlabs/elevenlabs-js for server-side TTS, voice management, and speech-to-speech -- use @elevenlabs/client only for conversational AI agents)
(You MUST never hardcode API keys -- always use environment variables via process.env.ELEVENLABS_API_KEY which the SDK reads automatically)
(You MUST consume the ReadableStream<Uint8Array> returned by convert() and stream() -- unconsumed streams leak resources)
(You MUST choose the correct model for your use case -- eleven_v3 for quality, eleven_flash_v2_5 for speed, eleven_multilingual_v2 for long-form stability)
(You MUST pass voiceId as the first positional argument to all textToSpeech methods -- it is NOT inside the options object)
</critical_requirements>
Auto-detection: ElevenLabs, elevenlabs, ElevenLabsClient, textToSpeech.convert, textToSpeech.stream, eleven_multilingual_v2, eleven_flash_v2_5, eleven_v3, speechToSpeech, voices.search, voice cloning, ELEVENLABS_API_KEY, @elevenlabs/elevenlabs-js, @elevenlabs/client, text-to-speech, TTS, voice synthesis
When to use:
- Generating speech audio from text (narration, audiobooks, announcements)
- Streaming audio in real-time for low-latency playback
- Cloning voices from audio samples (instant or professional voice cloning)
- Converting speech from one voice to another (speech-to-speech)
- Building real-time conversational AI agents with voice interaction
- Controlling pronunciation with SSML or pronunciation dictionaries
- Generating audio with character-level timestamp alignment
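Character-level alignment is more granular than most captioning or word-highlighting UIs need. The sketch below groups characters into word timings. It assumes an alignment object with `characters`, `characterStartTimesSeconds`, and `characterEndTimesSeconds` arrays, which is how the SDK's `convertWithTimestamps()` response is understood to expose alignment; verify this against your SDK version. `toWordTimings`, `CharacterAlignment`, and `WordTiming` are hypothetical names.

```typescript
// Hypothetical helper: groups character-level alignment into word timings.
// CharacterAlignment mirrors the alignment object assumed to be returned by
// textToSpeech.convertWithTimestamps(); check your SDK version.
interface CharacterAlignment {
  characters: string[];
  characterStartTimesSeconds: number[];
  characterEndTimesSeconds: number[];
}

interface WordTiming {
  word: string;
  startSeconds: number;
  endSeconds: number;
}

const WHITESPACE = /\s/;

function toWordTimings(alignment: CharacterAlignment): WordTiming[] {
  const words: WordTiming[] = [];
  let current: WordTiming | null = null;

  for (let index = 0; index < alignment.characters.length; index++) {
    const char = alignment.characters[index];
    const start = alignment.characterStartTimesSeconds[index];
    const end = alignment.characterEndTimesSeconds[index];

    if (WHITESPACE.test(char)) {
      // Whitespace ends the current word and is not emitted itself
      if (current !== null) {
        words.push(current);
        current = null;
      }
    } else if (current === null) {
      // First character of a new word sets its start time
      current = { word: char, startSeconds: start, endSeconds: end };
    } else {
      // Later characters extend the word and push its end time forward
      current.word += char;
      current.endSeconds = end;
    }
  }

  if (current !== null) {
    words.push(current);
  }
  return words;
}

export { toWordTimings };
export type { CharacterAlignment, WordTiming };
```

Whitespace splits words. Punctuation stays attached to the word it follows (for example, `there.`).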
Key patterns covered:
- Client initialization and configuration (retries, timeouts, API key)
- Text-to-speech conversion and streaming (`convert`, `stream`, timestamps)
- Voice settings (`stability`, `similarityBoost`, `style`, `speed`)
- Voice selection and management (`voices.search`, `voices.get`)
- Voice cloning (instant via `voices.ivc.create`)
- Speech-to-speech voice conversion
- WebSocket input streaming for real-time text-to-speech
- Pronunciation dictionaries and SSML
- Conversational AI agents (`@elevenlabs/client`)
- Model selection, output formats, error handling
When NOT to use:
- You need multi-provider voice AI (multiple TTS vendors) -- use a unified abstraction
- You only need browser-side audio playback without generation -- use the Web Audio API
- You need speech-to-text transcription only -- ElevenLabs has this, but it is a separate concern
Examples Index
- Core: Setup, TTS, Streaming & Voice Settings -- Client init, convert, stream, timestamps, voice settings, output formats
- Voices & Cloning -- Voice search, selection, instant voice cloning, speech-to-speech
- WebSocket & Conversational AI -- WebSocket input streaming, conversational AI agents, real-time patterns
- Quick API Reference -- Model IDs, method signatures, output formats, error types, voice settings
<philosophy>
Philosophy
The official `@elevenlabs/elevenlabs-js` SDK wraps the ElevenLabs REST API with full TypeScript types, streaming support, and automatic retries.
Core principles:
- Streams everywhere -- `convert()` and `stream()` return `ReadableStream<Uint8Array>`. Pipe them to files, HTTP responses, or audio players instead of buffering whole files in memory.
- Voice settings are the primary control surface -- `stability`, `similarityBoost`, `style`, and `speed` shape every generation. Learn these four knobs well.
- Model selection drives the quality/latency tradeoff -- `eleven_v3` for best quality, `eleven_flash_v2_5` for ~75ms model latency (network time excluded), `eleven_multilingual_v2` for stable long-form output.
- Two packages for two use cases -- `@elevenlabs/elevenlabs-js` for server-side TTS and voice management, `@elevenlabs/client` for browser-side conversational AI agents.
- Built-in resilience -- The SDK retries on 408, 409, 429, and 5xx errors (2 retries by default) with configurable timeouts.
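One way to keep the model choice in a single place is a lookup keyed by priority. This is a sketch, not an SDK feature. `selectModelId`, `SpeechPriority`, and `MODEL_BY_PRIORITY` are hypothetical names, and the three IDs are the ones this guide recommends.

```typescript
// Hypothetical lookup: maps a use-case priority to the model IDs this guide recommends.
type SpeechPriority = "quality" | "latency" | "long-form";

const MODEL_BY_PRIORITY = {
  quality: "eleven_v3",
  latency: "eleven_flash_v2_5",
  "long-form": "eleven_multilingual_v2",
} as const;

// Literal union of the three model IDs above
type ElevenModelId = (typeof MODEL_BY_PRIORITY)[SpeechPriority];

function selectModelId(priority: SpeechPriority): ElevenModelId {
  return MODEL_BY_PRIORITY[priority];
}

export { MODEL_BY_PRIORITY, selectModelId };
export type { ElevenModelId, SpeechPriority };
```

Usage: `modelId: selectModelId("latency")` in a `stream()` request. The `as const` object keeps the literal model IDs in the return type. A misspelled or missing priority therefore fails at compile time.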
</philosophy>
<patterns>
Core Patterns
Pattern 1: Client Setup
Initialize the ElevenLabs client. It auto-reads ELEVENLABS_API_KEY from the environment.
```typescript
// lib/elevenlabs.ts -- basic setup
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

// Reads ELEVENLABS_API_KEY from the environment
const client = new ElevenLabsClient();

export { client };
```

```typescript
// lib/elevenlabs.ts -- production configuration
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const TIMEOUT_SECONDS = 60;
const MAX_RETRIES = 3;

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
  timeoutInSeconds: TIMEOUT_SECONDS,
  maxRetries: MAX_RETRIES,
});

export { client };
```
Why good: Minimal setup, env var auto-detected, named constants for production settings
```typescript
// BAD: Hardcoded API key
const client = new ElevenLabsClient({
  apiKey: "sk-1234567890abcdef",
});
```
Why bad: Hardcoded API key is a security breach risk, will leak in version control
See: examples/core.md for per-request overrides, error handling
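The SDK reads `ELEVENLABS_API_KEY` on its own, so a startup check is optional. Use a sketch like this one if you want a misconfigured deployment to fail at boot with an error message you control. `requireApiKey` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical startup guard (not part of the SDK): fail fast with a clear
// message when the API key is missing or blank.
const API_KEY_ENV_VAR = "ELEVENLABS_API_KEY";

function requireApiKey(env: Record<string, string | undefined> = process.env): string {
  const apiKey = env[API_KEY_ENV_VAR]?.trim();
  if (!apiKey) {
    throw new Error(`${API_KEY_ENV_VAR} is not set; export it before creating the client`);
  }
  return apiKey;
}

export { requireApiKey };
```

Usage: `new ElevenLabsClient({ apiKey: requireApiKey() })`. Taking the environment as a parameter keeps the check testable without mutating `process.env`.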
Pattern 2: Text-to-Speech (Convert)
Generate complete audio from text. Returns ReadableStream<Uint8Array>.
```typescript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// `client` is the shared instance from lib/elevenlabs.ts (Pattern 1)
const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"; // George
const OUTPUT_PATH = "output.mp3";

const audio = await client.textToSpeech.convert(VOICE_ID, {
  text: "Welcome to the application.",
  modelId: "eleven_multilingual_v2",
  outputFormat: "mp3_44100_128",
});

// Pipe to file; pipeline() resolves when the write finishes and rejects on stream errors
await pipeline(Readable.fromWeb(audio), createWriteStream(OUTPUT_PATH));
```
Why good: voiceId as first arg (required), model and format explicit, stream piped to file without buffering
```typescript
// BAD: voiceId inside options object
const audio = await client.textToSpeech.convert({
  voiceId: VOICE_ID, // WRONG: voiceId is a positional argument
  text: "Hello",
});
```
Why bad: `voiceId` is the first positional argument, not an options field, so TypeScript rejects this call at compile time
See: examples/core.md for timestamps, HTTP response piping
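Sometimes a downstream API needs the whole file at once, such as an object-storage upload that takes a `Buffer`. In that case you have to collect the stream yourself. Below is a minimal sketch that uses the standard Web Streams reader. `streamToBuffer` is a hypothetical helper. It holds the entire audio in memory, so piping remains the better default for long audio.

```typescript
// Hypothetical helper: drains a ReadableStream<Uint8Array> into one Buffer.
// The whole file ends up in memory; prefer piping for long audio.
async function streamToBuffer(stream: ReadableStream<Uint8Array>): Promise<Buffer> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
    }
  } finally {
    // Release the lock even if a read rejects
    reader.releaseLock();
  }
  return Buffer.concat(chunks);
}

export { streamToBuffer };
```

Usage: pass the stream returned by `convert()`, then write or upload the resulting `Buffer`.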
Pattern 3: Text-to-Speech (Stream)
Stream audio for real-time playback with lower latency than convert().
```typescript
const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb";
const LATENCY_OPTIMIZATION = 2; // accepted range 0-4; higher values favor latency over quality

const audioStream = await client.textToSpeech.stream(VOICE_ID, {
  text: "This streams with lower latency for real-time playback.",
  modelId: "eleven_flash_v2_5",
  optimizeStreamingLatency: LATENCY_OPTIMIZATION,
  outputFormat: "mp3_44100_128",
});

// Consume the stream
for await (const chunk of audioStream) {
  process.stdout.write(chunk); // Or pipe to audio player / HTTP response
}
```
Why good: Uses stream() for lower latency, eleven_flash_v2_5 for speed, optimizeStreamingLatency reduces first-byte time
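Consuming the stream also covers stopping early, for example when the listener disconnects or the user presses stop before the audio ends. Cancelling the reader is the standard Web Streams way to tell the source you are done. The sketch below forwards chunks to a sink and calls `reader.cancel()` once an `AbortSignal` fires. `forwardAudio` is a hypothetical helper. Whether cancellation also closes the underlying HTTP connection depends on the `fetch` implementation behind the SDK; that is an assumption, not something this guide verifies.

```typescript
// Hypothetical helper: forwards audio chunks to a sink and cancels the stream
// when the signal aborts, so the source is told to stop producing data.
// Returns the number of bytes forwarded.
async function forwardAudio(
  stream: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
  signal?: AbortSignal,
): Promise<number> {
  const reader = stream.getReader();
  let bytes = 0;
  try {
    while (true) {
      // Abort is checked between chunks, not during a pending read
      if (signal?.aborted) {
        await reader.cancel();
        break;
      }
      const { done, value } = await reader.read();
      if (done) break;
      bytes += value.byteLength;
      onChunk(value);
    }
  } finally {
    reader.releaseLock();
  }
  return bytes;
}

export { forwardAudio };
```

The abort check runs between chunks, so a read that is already pending completes first. Breaking out of a `for await` loop over the stream also cancels it by default, which is the lighter option when you do not need a signal.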
```typescript
// BAD: Stream created but never consumed
const audioStream = await client.textToSpeech.stream(VOICE_ID, {
  text: "This audio is lost",
  modelId: "eleven_flash_v2_5",
});
```