ElevenLabs Patterns

Quick Guide: Use the official @elevenlabs/elevenlabs-js package to interact with the ElevenLabs API. Use client.textToSpeech.convert() for full audio generation or client.textToSpeech.stream() for low-latency streaming. Voice settings (stability, similarityBoost, style) control output character. Use eleven_v3 for best quality, eleven_flash_v2_5 for lowest latency, or eleven_multilingual_v2 for stable long-form content. The SDK returns ReadableStream<Uint8Array> -- pipe to files or HTTP responses. Use @elevenlabs/client for real-time conversational AI agents.

<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)

(You MUST use @elevenlabs/elevenlabs-js for server-side TTS, voice management, and speech-to-speech -- use @elevenlabs/client only for conversational AI agents)

(You MUST never hardcode API keys -- always use environment variables via process.env.ELEVENLABS_API_KEY which the SDK reads automatically)

(You MUST consume the ReadableStream<Uint8Array> returned by convert() and stream() -- unconsumed streams leak resources)

(You MUST choose the correct model for your use case -- eleven_v3 for quality, eleven_flash_v2_5 for speed, eleven_multilingual_v2 for long-form stability)

(You MUST pass voiceId as the first positional argument to all textToSpeech methods -- it is NOT inside the options object)

</critical_requirements>
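The model-selection requirement above can be captured in a small lookup so that call sites name a use case rather than a raw model ID. This is only a sketch: `UseCase`, `MODEL_BY_USE_CASE`, and `pickModel` are hypothetical names introduced for illustration and are not part of the SDK; the model IDs are the ones listed in this guide.

```typescript
// Hypothetical helper for illustration; not exported by @elevenlabs/elevenlabs-js.
type UseCase = "quality" | "low-latency" | "long-form";

// Model IDs as named in this guide.
const MODEL_BY_USE_CASE: Record<UseCase, string> = {
  quality: "eleven_v3",
  "low-latency": "eleven_flash_v2_5",
  "long-form": "eleven_multilingual_v2",
};

// Returns the model ID this guide recommends for the given use case.
function pickModel(useCase: UseCase): string {
  return MODEL_BY_USE_CASE[useCase];
}

const NARRATION_MODEL = pickModel("long-form");
console.log(NARRATION_MODEL);
```

The returned string would then be passed as `modelId` in the options object of a `textToSpeech` call.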

Auto-detection: ElevenLabs, elevenlabs, ElevenLabsClient, textToSpeech.convert, textToSpeech.stream, eleven_multilingual_v2, eleven_flash_v2_5, eleven_v3, speechToSpeech, voices.search, voice cloning, ELEVENLABS_API_KEY, @elevenlabs/elevenlabs-js, @elevenlabs/client, text-to-speech, TTS, voice synthesis

When to use:

Generating speech audio from text (narration, audiobooks, announcements)
Streaming audio in real-time for low-latency playback
Cloning voices from audio samples (instant or professional voice cloning)
Converting speech from one voice to another (speech-to-speech)
Building real-time conversational AI agents with voice interaction
Controlling pronunciation with SSML or pronunciation dictionaries
Generating audio with character-level timestamp alignment

Key patterns covered:

Client initialization and configuration (retries, timeouts, API key)
Text-to-speech conversion and streaming (convert, stream, timestamps)
Voice settings (stability, similarityBoost, style, speed)
Voice selection and management (voices.search, voices.get)
Voice cloning (instant via voices.ivc.create)
Speech-to-speech voice conversion
WebSocket input streaming for real-time text-to-speech
Pronunciation dictionaries and SSML
Conversational AI agents (@elevenlabs/client)
Model selection, output formats, error handling

When NOT to use:

You need multi-provider voice AI (multiple TTS vendors) -- use a unified abstraction
You only need browser-side audio playback without generation -- use the Web Audio API
You need speech-to-text transcription only -- ElevenLabs has this, but it is a separate concern

Examples Index

Core: Setup, TTS, Streaming & Voice Settings -- Client init, convert, stream, timestamps, voice settings, output formats
Voices & Cloning -- Voice search, selection, instant voice cloning, speech-to-speech
WebSocket & Conversational AI -- WebSocket input streaming, conversational AI agents, real-time patterns
Quick API Reference -- Model IDs, method signatures, output formats, error types, voice settings

<philosophy>

Philosophy

The ElevenLabs SDK wraps the ElevenLabs REST API with full TypeScript types, streaming support, and automatic retries.

Core principles:

Streams everywhere -- All audio methods return ReadableStream<Uint8Array>. You pipe them to files, HTTP responses, or audio players. The SDK never buffers entire audio files in memory.
Voice settings are the primary control surface -- stability, similarityBoost, style, and speed shape every generation. Learn these four knobs well.
Model selection drives the quality/latency tradeoff -- eleven_v3 for best quality, eleven_flash_v2_5 for sub-75ms latency, eleven_multilingual_v2 for stable long-form.
Two packages for two use cases -- @elevenlabs/elevenlabs-js for server-side TTS/voice management, @elevenlabs/client for browser-side conversational AI agents.
Built-in resilience -- The SDK retries on 408, 409, 429, and 5xx errors (2 retries by default) with configurable timeouts.
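Of the principles above, the voice-settings knobs are the ones most often passed around in application code, so it can help to normalize them in one place. The sketch below is hedged: `VoiceSettingsSketch`, `clampUnit`, and `normalizeVoiceSettings` are hypothetical names, and the 0 to 1 range for stability, similarityBoost, and style is an assumption to verify against the API reference. The speed value is passed through unchanged because its valid range is not covered here.

```typescript
// Hypothetical shape for illustration; check the SDK's own voice settings type before relying on it.
interface VoiceSettingsSketch {
  stability: number;
  similarityBoost: number;
  style: number;
  speed: number;
}

const MIN_UNIT = 0;
const MAX_UNIT = 1;

// Assumption: stability, similarityBoost, and style are unit-interval values.
function clampUnit(value: number): number {
  return Math.min(MAX_UNIT, Math.max(MIN_UNIT, value));
}

// Clamps the three unit-interval knobs and leaves speed untouched.
function normalizeVoiceSettings(input: VoiceSettingsSketch): VoiceSettingsSketch {
  return {
    stability: clampUnit(input.stability),
    similarityBoost: clampUnit(input.similarityBoost),
    style: clampUnit(input.style),
    speed: input.speed,
  };
}

const narrationSettings = normalizeVoiceSettings({
  stability: 1.4,
  similarityBoost: 0.75,
  style: -0.2,
  speed: 1,
});
console.log(narrationSettings);
```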

</philosophy>

Core Patterns

Pattern 1: Client Setup

Initialize the ElevenLabs client. It auto-reads ELEVENLABS_API_KEY from the environment.

// lib/elevenlabs.ts -- basic setup
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient();
export { client };

// lib/elevenlabs.ts -- production configuration
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const TIMEOUT_SECONDS = 60;
const MAX_RETRIES = 3;

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
  timeoutInSeconds: TIMEOUT_SECONDS,
  maxRetries: MAX_RETRIES,
});

export { client };

Why good: Minimal setup, env var auto-detected, named constants for production settings
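To make the effect of `maxRetries` concrete, the retry policy described under Philosophy (408, 409, 429, and 5xx) can be written as a predicate. This is an illustration of the documented policy, not the SDK's internal code; `isRetryableStatus` and `totalAttempts` are hypothetical names.

```typescript
// Status codes this guide lists as retried by the SDK.
const RETRYABLE_CLIENT_STATUSES: ReadonlySet<number> = new Set([408, 409, 429]);
const SERVER_ERROR_MIN = 500;
const SERVER_ERROR_MAX = 599;

// True when a response status falls in the documented retry set.
function isRetryableStatus(status: number): boolean {
  if (RETRYABLE_CLIENT_STATUSES.has(status)) return true;
  return status >= SERVER_ERROR_MIN && status <= SERVER_ERROR_MAX;
}

// A retry count excludes the first try, so the worst case is maxRetries + 1 requests.
function totalAttempts(maxRetries: number): number {
  return maxRetries + 1;
}

console.log(isRetryableStatus(429), isRetryableStatus(404), totalAttempts(3));
```

With `MAX_RETRIES = 3` as in the production configuration, a persistently failing retryable request would be sent up to four times, which is worth keeping in mind when choosing `timeoutInSeconds`.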

// BAD: Hardcoded API key
const client = new ElevenLabsClient({
  apiKey: "sk-1234567890abcdef",
});

Why bad: A hardcoded API key is a security risk and will leak through version control
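One way to avoid hardcoded keys while still failing fast is to check the environment once at startup. The helper below is a minimal sketch: `requireApiKey` is a hypothetical name, not an SDK function, and it takes the environment as a parameter so it can be exercised without touching real secrets.

```typescript
const API_KEY_ENV_VAR = "ELEVENLABS_API_KEY";

// Hypothetical startup guard; throws if the key is missing or blank.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env[API_KEY_ENV_VAR];
  if (key === undefined || key.trim() === "") {
    throw new Error(`${API_KEY_ENV_VAR} is not set`);
  }
  return key;
}
```

In an application this would typically be called as `requireApiKey(process.env)` before constructing the client, so a missing key surfaces at boot rather than on the first request.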

See: examples/core.md for per-request overrides, error handling

Pattern 2: Text-to-Speech (Convert)

Generate complete audio from text. Returns ReadableStream<Uint8Array>.

import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"; // George

// `client` is the ElevenLabsClient instance from Pattern 1
const audio = await client.textToSpeech.convert(VOICE_ID, {
  text: "Welcome to the application.",
  modelId: "eleven_multilingual_v2",
  outputFormat: "mp3_44100_128",
});

// Pipe to file; pipeline() resolves once the write finishes and propagates stream errors
const readable = Readable.fromWeb(audio);
const fileStream = createWriteStream("output.mp3");
await pipeline(readable, fileStream);

Why good: voiceId as first arg (required), model and format explicit, stream piped to file without buffering

// BAD: voiceId inside options object
const audio = await client.textToSpeech.convert({
  voiceId: VOICE_ID, // WRONG: voiceId is a positional argument
  text: "Hello",
});

Why bad: voiceId is the first positional argument, not an options field -- this fails TypeScript type checking

See: examples/core.md for timestamps, HTTP response piping
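The same pipe-without-buffering idea applies to any Node writable, including an HTTP response. The sketch below uses a stand-in stream instead of a real API call so it runs without credentials; `fakeAudioStream` and `pipeAudio` are hypothetical names, and the stream type is imported from `node:stream/web` on the assumption that the code runs in Node.

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { ReadableStream } from "node:stream/web";

// Stand-in for an SDK result: a web stream that emits two byte chunks and closes.
function fakeAudioStream(): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(new Uint8Array([1, 2, 3]));
      controller.enqueue(new Uint8Array([4, 5]));
      controller.close();
    },
  });
}

// Bridges a web stream to a Node writable; resolves when all bytes are flushed,
// rejects if either side errors.
async function pipeAudio(
  audio: ReadableStream<Uint8Array>,
  destination: Writable,
): Promise<void> {
  const readable = Readable.fromWeb(audio);
  await pipeline(readable, destination);
}
```

In an HTTP handler the `destination` would be the response object, with the `Content-Type` header set to match the chosen output format before piping.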

Pattern 3: Text-to-Speech (Stream)

Stream audio for real-time playback with lower latency than convert().

const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb";
const LATENCY_OPTIMIZATION = 2;

const audioStream = await client.textToSpeech.stream(VOICE_ID, {
  text: "This streams with lower latency for real-time playback.",
  modelId: "eleven_flash_v2_5",
  optimizeStreamingLatency: LATENCY_OPTIMIZATION,
  outputFormat: "mp3_44100_128",
});

// Consume the stream
for await (const chunk of audioStream) {
  process.stdout.write(chunk); // Or pipe to audio player / HTTP response
}

Why good: Uses stream() for lower latency, eleven_flash_v2_5 for speed, optimizeStreamingLatency reduces first-byte time
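When a caller does need the whole clip in memory (for example, a short prompt that is uploaded elsewhere), the stream can be drained into a single byte array. This is a sketch that trades the streaming benefit for convenience; `collectAudio` is a hypothetical helper, and it reads through `getReader()` rather than `for await` only so that it does not depend on async-iterator support in the type definitions.

```typescript
// Hypothetical helper: reads every chunk and returns one contiguous Uint8Array.
async function collectAudio(stream: ReadableStream<Uint8Array>): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let totalBytes = 0;

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
      totalBytes += value.byteLength;
    }
  } finally {
    // Release the lock even if a read fails.
    reader.releaseLock();
  }

  // Copy the chunks into one buffer in arrival order.
  const merged = new Uint8Array(totalBytes);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return merged;
}
```

For long-form audio, piping chunk by chunk as in the example above remains the better default, since this helper holds the full clip in memory.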

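// BAD: Stream created but never consumed
const audioStream = await client.textToSpeech.stream(VOICE_ID, {
  text: "This audio is lost",
  modelId: "eleven_flash_v2_5",
});
// audioStream is never read or piped

Why bad: An unconsumed ReadableStream leaks resources, and the generated audio is never delivered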
