Together AI SDK Patterns

Quick Guide: Use the together-ai npm package to access 200+ open-source models (Llama, Qwen, Mistral, DeepSeek) via Together AI's fast inference API. The SDK mirrors the OpenAI API shape -- client.chat.completions.create() for chat, client.images.generate() for images, client.embeddings.create() for embeddings. Use response_format: { type: "json_schema" } with Zod-generated schemas for structured output. Function calling uses the same tools parameter shape as OpenAI. You can also use the OpenAI SDK directly by pointing baseURL to https://api.together.xyz/v1.
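
For the explicit OpenAI-compatible case mentioned above, the base URL swap might look roughly like the sketch below. The helper name togetherOptionsForOpenAI and the OpenAICompatibleOptions interface are hypothetical and exist only for illustration; the sketch assumes the OpenAI SDK constructor accepts apiKey and baseURL options.

```typescript
// Base URL taken from the guide above.
const TOGETHER_BASE_URL = "https://api.together.xyz/v1";

// Hypothetical type: the two constructor options this sketch relies on.
interface OpenAICompatibleOptions {
  apiKey: string;
  baseURL: string;
}

// Hypothetical helper: reads the key from an env-like object (never hardcoded)
// and returns options that point an OpenAI-style client at Together AI.
function togetherOptionsForOpenAI(
  env: Record<string, string | undefined>,
): OpenAICompatibleOptions {
  const apiKey = env.TOGETHER_API_KEY;
  if (!apiKey) {
    throw new Error("TOGETHER_API_KEY is not set");
  }
  return { apiKey, baseURL: TOGETHER_BASE_URL };
}

// Intended usage, assuming the openai package is installed:
//   import OpenAI from "openai";
//   const client = new OpenAI(togetherOptionsForOpenAI(process.env));
```

Failing fast on a missing key keeps the misconfiguration visible at startup rather than on the first request.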

<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)

(You MUST use the together-ai package (import Together from "together-ai") -- NOT the OpenAI SDK -- unless explicitly building an OpenAI-compatible integration)

(You MUST include the JSON schema in BOTH the response_format parameter AND the system prompt when using structured output -- the model needs both)

(You MUST handle errors using Together.APIError and its subclasses -- never use bare catch blocks without error type checking)

(You MUST never hardcode API keys -- always use environment variables via process.env.TOGETHER_API_KEY)

</critical_requirements>

Auto-detection: Together AI, together-ai, together.ai, TOGETHER_API_KEY, client.chat.completions (together), client.images.generate, client.embeddings.create (together), Llama-3, Qwen3, Mistral, DeepSeek, FLUX, together.images, together.chat, together.embeddings, together.fineTuning, api.together.xyz

When to use:

Running open-source LLMs (Llama, Qwen, Mistral, DeepSeek) via serverless inference
Generating images with FLUX or Stable Diffusion models
Creating embeddings for RAG pipelines with open-source embedding models
Using function calling / tool use with open-source models
Extracting structured JSON output from LLM responses
Fine-tuning open-source models on custom data
Migrating from OpenAI to open-source models with minimal code changes
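
As an illustration of the RAG use case above, retrieval usually comes down to ranking stored embedding vectors against a query vector. The following is a minimal sketch of that ranking step only; EmbeddedDoc, cosineSimilarity, and rankBySimilarity are hypothetical names, and the vectors are assumed to have been produced earlier by an embeddings call.

```typescript
// Hypothetical shape: a document ID paired with its embedding vector.
interface EmbeddedDoc {
  id: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every document against the query and keeps the topK best matches.
function rankBySimilarity(
  query: number[],
  docs: EmbeddedDoc[],
  topK: number,
): Array<{ id: string; score: number }> {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A linear scan like this is adequate for small document sets; larger corpora typically move the same comparison into a vector index.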

Key patterns covered:

Client initialization and configuration (retries, timeouts, logging)
Chat completions with open-source models (Llama, Qwen, Mistral, DeepSeek)
Streaming with stream: true and for await...of
Structured output with response_format: { type: "json_schema" } and Zod
Function calling / tool use with tools parameter
Image generation with FLUX and Stable Diffusion models
Embeddings API with open-source embedding models
Fine-tuning API (file upload, job creation, monitoring)
OpenAI SDK compatibility (base URL swap)
Error handling, retries, timeouts
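
To make the function-calling pattern listed above more concrete, here is a rough sketch of the local half of a tool loop: a tool definition in the OpenAI-style shape, and a dispatcher that turns the model's tool calls into role "tool" messages. The get_weather tool, its handler, and the runToolCalls helper are hypothetical; the ToolCallLike interface is a simplified stand-in for the SDK's own types, which should be preferred in real code.

```typescript
// OpenAI-style tool definition, the shape passed in the `tools` parameter.
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

// Simplified stand-in for one entry of `message.tool_calls`.
// The arguments field arrives as a JSON-encoded string.
interface ToolCallLike {
  id: string;
  function: { name: string; arguments: string };
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical tool used only for illustration.
const weatherTool: ToolDefinition = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

const handlers: Record<string, ToolHandler> = {
  get_weather: (args) => `Sunny in ${String(args.city)}`,
};

// Runs each requested tool and returns the messages to append before the next model call.
// Failures are reported back as tool output so the model can recover.
function runToolCalls(calls: ToolCallLike[], registry: Record<string, ToolHandler>) {
  return calls.map((call) => {
    const handler = registry[call.function.name];
    let content: string;
    if (!handler) {
      content = `Unknown tool: ${call.function.name}`;
    } else {
      try {
        content = handler(JSON.parse(call.function.arguments));
      } catch (err) {
        content = `Tool failed: ${err instanceof Error ? err.message : String(err)}`;
      }
    }
    return { role: "tool" as const, tool_call_id: call.id, content };
  });
}
```

In a multi-step loop, the returned messages would be appended to the conversation along with the assistant message that requested them, and the request repeated until the model stops asking for tools.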

When NOT to use:

You need OpenAI-specific features (Responses API, Batch API, Realtime API) -- use the OpenAI SDK directly
You want framework-specific chat UI hooks -- use a framework-integrated AI SDK
You only use OpenAI models and never plan to use open-source models

Examples Index

Core: Setup & Configuration -- Client init, production config, error handling, OpenAI compatibility
Chat Completions -- Basic chat, multi-turn, model selection, vision
Streaming -- Async iteration, stream cancellation
Tool/Function Calling -- Tool definitions, multi-step tool loops
Structured Output -- JSON mode, Zod schemas, regex mode
Images & Embeddings -- FLUX image generation, embedding models, semantic search
Quick API Reference -- Model IDs, method signatures, error types

<philosophy>

Philosophy

Together AI provides fast serverless inference for open-source models. The TypeScript SDK (together-ai) is auto-generated with Stainless and mirrors the OpenAI API shape, making migration straightforward.

Core principles:

OpenAI-compatible API shape -- Same client.chat.completions.create() pattern, same messages array, same tools parameter. Switching from OpenAI is often just changing the import and model name.
Open-source model access -- Run Llama, Qwen, Mistral, DeepSeek, and 200+ other models without managing infrastructure. Models are identified by their Hugging Face-style IDs (e.g., meta-llama/Llama-3.3-70B-Instruct-Turbo).
Multi-modal support -- Chat completions, image generation (FLUX, Stable Diffusion), embeddings, audio, and video -- all through one SDK.
Structured output via JSON Schema -- Pass a JSON schema in response_format and include it in the system prompt. Use Zod's z.toJSONSchema() to generate the JSON schema from a Zod schema, so the schema and its TypeScript type stay in sync.
Fine-tuning open-source models -- Upload JSONL data, create LoRA or full fine-tuning jobs, and deploy custom models -- all via the API.
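
The structured-output principle above (schema in both response_format and the system prompt) could be sketched as a small request builder. This is an illustration under assumptions, not the SDK's documented shape: buildStructuredRequest is a hypothetical helper, the hand-written contactSchema stands in for the output of z.toJSONSchema(), and the placement of the schema under a schema key inside response_format is an assumption that should be checked against the SDK's types.

```typescript
// Model ID reused from the examples in this document.
const MODEL = "meta-llama/Llama-3.3-70B-Instruct-Turbo";

type JsonSchema = Record<string, unknown>;

// Hand-written stand-in for a schema generated with z.toJSONSchema().
const contactSchema: JsonSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    email: { type: "string" },
  },
  required: ["name", "email"],
};

// Hypothetical helper: puts the same schema in the system prompt and in response_format.
function buildStructuredRequest(schema: JsonSchema, userInput: string) {
  return {
    model: MODEL,
    messages: [
      {
        role: "system" as const,
        content: `Respond only with JSON matching this schema:\n${JSON.stringify(schema)}`,
      },
      { role: "user" as const, content: userInput },
    ],
    // ASSUMPTION: the exact nesting of the schema inside response_format is not
    // confirmed by this document; verify it against the SDK's type definitions.
    response_format: { type: "json_schema" as const, schema },
  };
}
```

Building the request in one place makes it hard for the prompt and the response_format schema to drift apart. The model's reply would still need JSON.parse and validation before use.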

When to use Together AI:

You want to use open-source models with fast serverless inference
You need cost-effective inference (often cheaper than proprietary APIs)
You want to fine-tune open-source models on your data
You need image generation with FLUX models
You want OpenAI API compatibility for easy migration

When NOT to use:

You need OpenAI-specific features (Responses API, Batch API, Realtime API) -- use the OpenAI SDK
You need Anthropic or Google-specific features -- use their respective SDKs
You want a provider-agnostic SDK -- use a unified provider framework

</philosophy>

Core Patterns

Pattern 1: Client Setup

Initialize the Together client. It reads TOGETHER_API_KEY from the environment.

// lib/together.ts -- basic setup
import Together from "together-ai";
const client = new Together();
export { client };

// lib/together.ts -- production configuration
import Together from "together-ai";

const TIMEOUT_MS = 30_000;
const MAX_RETRIES = 3;

const client = new Together({
  apiKey: process.env.TOGETHER_API_KEY,
  timeout: TIMEOUT_MS,
  maxRetries: MAX_RETRIES,
});
export { client };

Why good: Minimal setup, env var auto-detected, named constants for production settings

// BAD: Hardcoded API key
const client = new Together({
  apiKey: "sk-abc123...",
});

Why bad: Hardcoded keys end up in version control, where anyone with access to the repository or its history can read and misuse them

See: examples/core.md for error handling, OpenAI compatibility, per-request overrides

Pattern 2: Chat Completions

Stateless text generation with open-source models.

const completion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Explain TypeScript generics." },
  ],
});
console.log(completion.choices[0].message.content);

Why good: Clear message roles, system message for behavior control, direct content access

// BAD: No system message, no model specified
const res = await client.chat.completions.create({
  messages: [{ role: "user", content: "do something" }],
});

Why bad: The request fails without a model field, and without a system message and a specific prompt the model's behavior is unpredictable

See: examples/chat.md for multi-turn, vision models, model selection guide
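
Because chat completions are stateless, a multi-turn conversation has to resend its history on every call. A minimal sketch of that bookkeeping follows; ChatMessage, appendTurn, and trimHistory are hypothetical names, and the trimming policy (keep system messages plus the most recent turns) is just one possible choice.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Returns a new history with one message appended; the input array is not mutated.
function appendTurn(
  history: readonly ChatMessage[],
  role: Role,
  content: string,
): ChatMessage[] {
  return [...history, { role, content }];
}

// Keeps all system messages plus the most recent maxMessages non-system messages,
// so long conversations stay within the model's context window.
function trimHistory(history: readonly ChatMessage[], maxMessages: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  const recent = maxMessages > 0 ? rest.slice(-maxMessages) : [];
  return [...system, ...recent];
}

// Intended usage with the client from Pattern 1:
//   const completion = await client.chat.completions.create({
//     model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
//     messages: trimHistory(history, 20),
//   });
//   history = appendTurn(history, "assistant", completion.choices[0].message.content ?? "");
```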

Pattern 3: Streaming

Use streaming for user-facing responses.

const stream = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages: [{ role: "user", content: "Explain async/await." }],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Why good: Progressive output for better UX, standard async iterator pattern
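
When the full text is needed after streaming (for logging or for appending to the conversation history), the loop above can be wrapped in a small collector. This is a sketch: ChunkLike is a simplified stand-in for the SDK's chunk type, and collectStream and fakeStream are hypothetical helpers, the latter existing only so the example runs without a network call.

```typescript
// Simplified stand-in for a streamed chat completion chunk.
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Consumes the stream, forwarding each token to onToken and returning the full text.
async function collectStream(
  stream: AsyncIterable<ChunkLike>,
  onToken?: (token: string) => void,
): Promise<string> {
  let fullText = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      fullText += content;
      onToken?.(content);
    }
  }
  return fullText;
}

// Hypothetical fake stream that yields one chunk per string, for local testing only.
async function* fakeStream(parts: string[]): AsyncGenerator<ChunkLike> {
  for (const part of parts) {
    yield { choices: [{ delta: { content: part } }] };
  }
}

// Intended usage with a real stream:
//   const text = await collectStream(stream, (t) => process.stdout.write(t));
```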

// BAD: Not consuming the stream
const stream = await client.chat.completions.create({
