Google Gemini SDK Patterns
Quick Guide: Use the @google/genai package (the unified SDK, NOT the deprecated @google/generative-ai) for all Gemini API interactions. All operations flow through a central GoogleGenAI client with service accessors: ai.models for generation, ai.chats for multi-turn, ai.files for uploads, ai.caches for context caching. Use responseMimeType: "application/json" with responseJsonSchema for structured output. Access response text via response.text (property, not method). Streaming uses generateContentStream, which returns an async iterable -- iterate with for await.
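To make the structured-output part of the guide concrete, here is a minimal sketch of the request shape. It only builds the object that would be passed to ai.models.generateContent() and validates a JSON string such as the one found in response.text; it does not call the API. The StructuredRequest type, the buildRecipeRequest and parseRecipe helpers, and the recipe schema are illustrative assumptions, not SDK exports.

```typescript
// Local stand-in for the request shape described above (not the SDK's own type).
interface StructuredRequest {
  model: string;
  contents: string;
  config: {
    responseMimeType: "application/json";
    responseJsonSchema: Record<string, unknown>;
  };
}

interface Recipe {
  name: string;
  minutes: number;
}

// Hypothetical JSON Schema describing the data we want back.
const RECIPE_SCHEMA: Record<string, unknown> = {
  type: "object",
  properties: {
    name: { type: "string" },
    minutes: { type: "integer" },
  },
  required: ["name", "minutes"],
};

// Structured-output settings live inside `config`, not at the top level.
function buildRecipeRequest(prompt: string): StructuredRequest {
  return {
    model: "gemini-2.5-flash",
    contents: prompt,
    config: {
      responseMimeType: "application/json",
      responseJsonSchema: RECIPE_SCHEMA,
    },
  };
}

// The response text is a JSON string; parse and check it before use.
function parseRecipe(text: string): Recipe {
  const value = JSON.parse(text) as Partial<Recipe>;
  if (typeof value.name !== "string" || typeof value.minutes !== "number") {
    throw new Error("Response did not match the recipe schema");
  }
  return { name: value.name, minutes: value.minutes };
}
```

Checking the parsed value at runtime is a design choice in this sketch: a schema constrains what the model is asked to produce, but the calling code still receives a plain string.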
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
(You MUST use @google/genai (the new unified SDK) -- NOT the deprecated @google/generative-ai package)
(You MUST access response text via response.text (a property) -- NOT response.text() (the old SDK used a method call))
(You MUST pass model as a string parameter in every API call -- there is no getGenerativeModel() step)
(You MUST use config for all generation parameters (temperature, safetySettings, tools, systemInstruction) -- NOT top-level properties)
(You MUST never hardcode API keys -- use environment variables via process.env.GEMINI_API_KEY or GOOGLE_API_KEY)
</critical_requirements>
Auto-detection: Gemini, gemini, GoogleGenAI, @google/genai, ai.models.generateContent, generateContentStream, ai.chats, ai.files, ai.caches, gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-3-flash, gemini-embedding, GEMINI_API_KEY, GOOGLE_API_KEY, FunctionCallingConfigMode, createUserContent, createPartFromUri, responseMimeType, responseJsonSchema
When to use:
- Building applications that call Google Gemini models directly (Gemini 2.x, 2.5, 3.x)
- Processing multimodal input: images, video, audio, PDFs
- Implementing function calling / tool use with custom functions or built-in tools (Google Search, code execution)
- Extracting structured JSON data from LLM responses using response schemas
- Streaming text generation for user-facing output
- Creating embeddings for RAG pipelines or semantic search (text and multimodal)
- Caching large context (documents, code) to reduce cost and latency across multiple requests
- Multi-turn chat sessions with automatic history management
Key patterns covered:
- Client initialization and environment-based configuration
- Text generation with ai.models.generateContent()
- Streaming with ai.models.generateContentStream() and for await
- Multimodal input (inline base64, file upload, URIs)
- Function calling with FunctionDeclaration and manual tool loops
- Structured output with responseMimeType + responseJsonSchema + Zod
- Chat sessions with ai.chats.create() and sendMessage()
- Embeddings with ai.models.embedContent() (text and multimodal)
- Context caching with ai.caches.create()
- Safety settings per-request via config.safetySettings
When NOT to use:
- Multi-provider applications requiring provider switching -- use a unified provider SDK
- React-specific chat UI hooks (useChat) -- use a framework-integrated AI SDK
- When you need features unique to another provider's API -- use that provider's SDK directly
Examples Index
- Core: Setup & Configuration -- Client init, text generation, system instructions, error handling
- Multimodal Input -- Inline images, file upload, video, audio, PDF, createPartFromUri
- Streaming -- generateContentStream, sendMessageStream, abort patterns
- Function Calling / Tools -- FunctionDeclaration, FunctionCallingConfigMode, manual tool loop, built-in tools
- Structured Output -- JSON mode, Zod schemas, responseJsonSchema, enum extraction
- Chat Sessions -- ai.chats.create(), multi-turn, streaming chat, history
- Advanced: Embeddings, Caching & Safety -- Embeddings, context caching, safety settings, token counting
- Quick API Reference -- Model IDs, method signatures, config parameters, safety enums
<philosophy>
Philosophy
The @google/genai SDK is Google's unified client for the Gemini API and Vertex AI. It replaces the deprecated @google/generative-ai package with a cleaner, centralized architecture.
Core principles:
- Centralized client -- A single GoogleGenAI instance provides all API services via ai.models, ai.chats, ai.files, ai.caches. No scattered manager classes.
- Model-per-call -- Pass the model ID string in every API call rather than binding to a model instance. This simplifies multi-model usage.
- Config object pattern -- All generation parameters (temperature, systemInstruction, tools, safetySettings) go inside a config object, keeping the top-level call clean.
- Native multimodal -- Images, video, audio, and PDFs are first-class inputs via inline data or file upload. Gemini models handle all modalities natively.
- Response as property -- Access response.text as a property (not a method). Access response.functionCalls for tool calls.
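The model-per-call and config-object principles can be sketched without touching the network. The snippet below only constructs request objects of the shape described above. The GenerateRequest and GenerationConfig types, the buildRequest helper, and the length threshold are hypothetical names introduced for illustration, not part of the SDK.

```typescript
// Local stand-ins for the call shape: model and contents at the top level,
// every generation parameter inside `config`. These are not the SDK's types.
interface GenerationConfig {
  temperature?: number;
  systemInstruction?: string;
}

interface GenerateRequest {
  model: string;
  contents: string;
  config?: GenerationConfig;
}

const FAST_MODEL = "gemini-2.5-flash";
const DEEP_MODEL = "gemini-2.5-pro";
const LONG_PROMPT_THRESHOLD = 2000; // hypothetical cutoff, in characters

const SHARED_CONFIG: GenerationConfig = {
  systemInstruction: "You are a concise coding tutor.",
  temperature: 0.3,
};

// Model-per-call: the model ID is chosen for each request, so no model
// instance has to be created or rebound when switching models.
function buildRequest(prompt: string): GenerateRequest {
  const model = prompt.length > LONG_PROMPT_THRESHOLD ? DEEP_MODEL : FAST_MODEL;
  return { model, contents: prompt, config: { ...SHARED_CONFIG } };
}
```

Under these assumptions, each object returned by buildRequest would be passed as the single argument to ai.models.generateContent().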
When to use the Gemini SDK directly:
- You primarily use Google Gemini models
- You need multimodal input (images, video, audio, PDF) as a core feature
- You want built-in tools like Google Search and code execution
- You need context caching for large documents
- You want the simplest path to Gemini API features
When NOT to use:
- You need to switch between multiple providers -- use a unified SDK
- You want React-specific chat hooks -- use a framework-integrated AI SDK
- You need features unique to another provider's API -- use that provider's SDK directly
<patterns>
Core Patterns
Pattern 1: Client Setup
Initialize the GoogleGenAI client. It can auto-read GOOGLE_API_KEY from the environment.
// lib/gemini.ts
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
export { ai };
// Alternative (separate snippet): omit apiKey and let the client auto-read GOOGLE_API_KEY from the environment
const ai = new GoogleGenAI({});
Why good: Minimal setup, env var auto-detected, named export
// BAD: Using the old deprecated SDK
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI("hardcoded-key"); // WRONG
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
Why bad: Old deprecated package, hardcoded API key, model binding step no longer needed
See: examples/core.md for Vertex AI setup, environment variables, error handling
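If you prefer to fail fast when no key is configured, a small resolver can run before the client is constructed. This is a sketch under stated assumptions: the resolveApiKey helper and the Env type are hypothetical, and checking GEMINI_API_KEY before GOOGLE_API_KEY is a choice made here, not documented SDK behavior.

```typescript
// Environment passed in explicitly so the helper is easy to test.
type Env = Record<string, string | undefined>;

// Variable names taken from the requirements above; the lookup order is an assumption.
const API_KEY_VARS = ["GEMINI_API_KEY", "GOOGLE_API_KEY"] as const;

// Returns the first non-empty key, or throws a descriptive error.
function resolveApiKey(env: Env): string {
  for (const name of API_KEY_VARS) {
    const value = env[name]?.trim();
    if (value) {
      return value;
    }
  }
  throw new Error(`Missing API key: set ${API_KEY_VARS.join(" or ")}`);
}

// Usage in Node (not executed here):
//   const ai = new GoogleGenAI({ apiKey: resolveApiKey(process.env) });
```

Throwing at startup surfaces a missing key as a clear configuration error rather than a failed request later on.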
Pattern 2: Text Generation
Pass model and contents directly -- no getGenerativeModel() step.
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Explain TypeScript generics briefly.",
  config: {
    systemInstruction: "You are a concise coding tutor.",
    temperature: 0.3,
  },
});
console.log(response.text);
Why good: Model specified per-call, system instruction in config, response.text as property
// BAD: Old SDK patterns that don't work
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
const result = await model.generateContent("Hello");
console.log(result.response.text()); // text() was a method in old SDK
Why bad: getGenerativeModel() doesn't exist in the new SDK, and text is now a property, so calling it as text() fails
See: examples/core.md for system instructions, temperature, thinking config
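Because text is read as a property, it is worth handling the case where it holds no usable value. The sketch below assumes response.text may be undefined when a response carries no text parts (for example, a response that only contains function calls). The TextResponse type and requireText helper are hypothetical names for illustration.

```typescript
// Minimal structural type: only the property this helper reads.
interface TextResponse {
  text?: string;
}

const EMPTY_RESPONSE_MESSAGE = "Model returned no text";

// Returns the response text, or throws if it is missing or blank.
function requireText(response: TextResponse): string {
  const text = response.text; // property access, no parentheses
  if (text === undefined || text.trim() === "") {
    throw new Error(EMPTY_RESPONSE_MESSAGE);
  }
  return text;
}
```

A call such as requireText(response) could then replace direct use of response.text wherever an empty answer should be treated as an error.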
Pattern 3: Streaming
Use generateContentStream and iterate with for await.
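The consumption side of streaming can be illustrated on its own. The sketch here uses a stand-in async generator in place of the real stream, so fakeStream, TextChunk, and collectText are hypothetical names; the only assumption about the real API is that each chunk exposes its text as a property.

```typescript
// Shape of a streamed chunk as assumed here: text may be absent on some chunks.
interface TextChunk {
  text?: string;
}

// Stand-in for the real stream: yields chunks one at a time.
async function* fakeStream(parts: string[]): AsyncGenerator<TextChunk> {
  for (const part of parts) {
    yield { text: part };
  }
}

// Consumes any async iterable of chunks with for await and joins the text.
async function collectText(stream: AsyncIterable<TextChunk>): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    full += chunk.text ?? "";
  }
  return full;
}
```

For user-facing output, each chunk would typically be written to the UI inside the loop rather than accumulated.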
const response = await ai.models.gener