LlamaIndex.TS Patterns
Quick Guide: LlamaIndex.TS is a data framework for building context-aware LLM applications in TypeScript. Use the Settings singleton to configure LLM and embedding models globally. Load documents with SimpleDirectoryReader, chunk with SentenceSplitter, index with VectorStoreIndex.fromDocuments(), and query with index.asQueryEngine(). For agents, use agent() from @llamaindex/workflow with tool() definitions using Zod schemas. The core I/O operations (loading, indexing, querying, chat) are async and return Promises. The llamaindex package re-exports most things, but LLM providers require separate packages like @llamaindex/openai or @llamaindex/ollama.
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
(You MUST configure Settings.llm and Settings.embedModel before any indexing or querying -- the Settings singleton is lazily initialized and defaults to OpenAI, which will fail without an API key)
(You MUST await all LlamaIndex I/O operations -- fromDocuments(), query(), chat(), and loadData() are all async; asQueryEngine() only constructs the engine and is called without await, as in Pattern 2)
(You MUST install provider packages separately -- @llamaindex/openai, @llamaindex/ollama, @llamaindex/anthropic are NOT included in the base llamaindex package)
(You MUST use storageContextFromDefaults({ persistDir }) to persist indexes -- without persistence, indexes are rebuilt from scratch on every restart)
(You MUST never hardcode API keys -- use environment variables and dotenv/config)
</critical_requirements>
Auto-detection: LlamaIndex, llamaindex, VectorStoreIndex, SimpleDirectoryReader, Settings.llm, Settings.embedModel, asQueryEngine, asChatEngine, ContextChatEngine, SentenceSplitter, storageContextFromDefaults, @llamaindex/openai, @llamaindex/ollama, @llamaindex/workflow, FunctionTool, QueryEngineTool, agentStreamEvent
When to use:
- Building RAG (Retrieval-Augmented Generation) applications with custom documents
- Loading, chunking, and indexing documents for LLM consumption
- Creating query engines that answer questions from indexed data
- Building chat interfaces with conversation memory over your data
- Implementing agentic RAG with tool-calling agents that query indexes
- Working with multiple data sources (files, PDFs, markdown, code)
- Persisting vector indexes to avoid re-indexing on every restart
Key patterns covered:
- Settings singleton for LLM and embedding model configuration
- Document loading with SimpleDirectoryReader and custom readers
- VectorStoreIndex creation, persistence, and querying
- Query engines and chat engines
- Agent creation with agent() and tool() using Zod schemas
- Text splitting and chunking strategies
- Streaming responses from query and chat engines
- Storage context and index persistence
When NOT to use:
- Simple one-shot LLM calls without document context -- use the LLM provider SDK directly
- Applications that only need embeddings without indexing -- use the embedding API directly
- Client-side / browser applications -- LlamaIndex.TS is server-side focused (Node.js >= 20)
Examples Index
- Core: Setup, Indexing & Querying -- Settings config, document loading, VectorStoreIndex, query engines, persistence
- Agents & Tools -- FunctionTool, agent(), multi-agent workflows, QueryEngineTool
- Chat & Streaming -- Chat engines, ContextChatEngine, streaming responses
- Ingestion & Splitting -- Text splitters, node parsers, ingestion pipeline, custom readers
- Quick API Reference -- Package map, method signatures, response modes, model providers
<philosophy>
Philosophy
LlamaIndex.TS is a data framework -- its core value proposition is connecting your data to LLMs through indexing, retrieval, and synthesis. It sits between raw LLM APIs and full application frameworks.
Core principles:
- Context engineering -- Inject the right data into the LLM prompt at the right time. This drives RAG, agent memory, extraction, and summarization.
- Modular provider system -- LLM providers, embedding models, vector stores, and readers are separate packages you compose. The base llamaindex package provides the framework; providers are installed separately.
- Settings singleton -- Global configuration for LLM, embedding model, node parser, and other shared resources. Set once, used everywhere. Override locally when needed.
- Async-first design -- Every I/O operation is async. Document loading, indexing, querying, and chat all return Promises.
- Index as the core abstraction -- Documents are loaded, split into nodes, embedded, and stored in an index. Queries retrieve relevant nodes and synthesize responses.
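To make the "index as the core abstraction" principle concrete, the sketch below walks through the same stages (split, embed, store, retrieve) with toy stand-ins. It is not LlamaIndex.TS code and does not reflect how the library implements any of these steps: every name here (ToyNode, toyEmbed, buildToyIndex, retrieve) is hypothetical, and the "embedding" is a plain word count over a tiny fixed vocabulary rather than a real model.

```typescript
// Toy illustration only: none of these names are LlamaIndex.TS APIs.
interface ToyNode {
  text: string;
  embedding: number[];
}

// Hypothetical fixed vocabulary; a real embedding model has no such list.
const VOCABULARY = ["index", "vector", "agent", "tool", "chat", "memory"];

// Split text into sentence-level chunks (stand-in for a real text splitter).
function splitIntoSentences(text: string): string[] {
  const pieces = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return pieces.map((piece) => piece.trim()).filter((piece) => piece.length > 0);
}

// Toy embedding: count how often each vocabulary word occurs in the text.
function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return VOCABULARY.map((term) => words.filter((word) => word === term).length);
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, value) => sum + value * value, 0));
  const normB = Math.sqrt(b.reduce((sum, value) => sum + value * value, 0));
  return normA === 0 || normB === 0 ? 0 : dot / (normA * normB);
}

// "Indexing": split every document into nodes and embed each node.
function buildToyIndex(documents: string[]): ToyNode[] {
  return documents
    .flatMap(splitIntoSentences)
    .map((text) => ({ text, embedding: toyEmbed(text) }));
}

// "Retrieval": rank nodes by similarity to the query and keep the top K.
function retrieve(index: ToyNode[], query: string, topK: number): ToyNode[] {
  const queryEmbedding = toyEmbed(query);
  return [...index]
    .sort(
      (left, right) =>
        cosineSimilarity(right.embedding, queryEmbedding) -
        cosineSimilarity(left.embedding, queryEmbedding),
    )
    .slice(0, topK);
}
```

In a real pipeline the retrieved nodes would then be passed to the LLM for synthesis; that final step is omitted here because it needs a model.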
When to use LlamaIndex.TS:
- You have documents/data that need to be indexed for LLM consumption
- You want structured RAG pipelines with configurable retrieval and synthesis
- You need agentic RAG where agents query multiple indexes with tools
- You want persistence and incremental updates to your index
When NOT to use:
- Simple LLM calls without data context -- use the provider SDK directly
- Browser-only applications -- LlamaIndex.TS requires Node.js >= 20
- You only need embeddings -- use the embedding API directly
<patterns>
Core Patterns
Pattern 1: Settings Configuration
The Settings singleton configures LLM, embedding model, and node parser globally. Set it once at application startup before any indexing or querying.
import { Settings } from "llamaindex";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";
// Configure at app startup -- before any index operations
Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
Why good: Single configuration point, provider packages are explicit imports, model names are visible
// BAD: No Settings configuration, relying on implicit defaults
import { VectorStoreIndex, SimpleDirectoryReader } from "llamaindex";
// This will silently try to use OpenAI with OPENAI_API_KEY from env
// Fails with cryptic error if key is missing
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents);
Why bad: Implicit defaults make failures confusing, no explicit provider, no model selection
See: examples/core.md for local LLM setup with Ollama, Anthropic configuration, and embedding model options
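One way to avoid the cryptic missing-key failure described above is to validate required environment variables at startup, before touching Settings. The helper below is a minimal sketch, not part of LlamaIndex.TS: requireEnv and the Env type are hypothetical names, and the environment is passed in explicitly so the function stays easy to test.

```typescript
// Hypothetical helper, not a LlamaIndex.TS API.
type Env = Record<string, string | undefined>;

// Return the variable's value, or throw a clear error if it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a Node.js application this could be called as requireEnv(process.env, "OPENAI_API_KEY") right after loading dotenv/config and before assigning Settings.llm, so a missing key surfaces as one readable error at startup rather than during the first indexing call.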
Pattern 2: Document Loading and Indexing
Load documents, create a vector index, and query it. This is the canonical RAG pipeline.
import { SimpleDirectoryReader, VectorStoreIndex, Settings } from "llamaindex";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";
Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
// Load all supported files from a directory
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
// Create vector index -- embeds and stores all document chunks
const index = await VectorStoreIndex.fromDocuments(documents);
// Query the index
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: "What is the main topic?" });
console.log(response.message.content);
Why good: Complete pipeline in minimal code, explicit Settings, clear data flow
See: examples/core.md for persistence, custom readers, and advanced indexing options
Pattern 3: Index Persistence
Persist indexes to disk to avoid re-indexing on every restart.
import {
V