LlamaIndex.TS Patterns

Quick Guide: LlamaIndex.TS is a data framework for building context-aware LLM applications in TypeScript. Use the Settings singleton to configure the LLM and embedding model globally. Load documents with SimpleDirectoryReader, chunk them with SentenceSplitter, index them with VectorStoreIndex.fromDocuments(), and query them through index.asQueryEngine(). For agents, use agent() from @llamaindex/workflow with tool() definitions whose parameters are Zod schemas. All I/O operations (loading, indexing, querying, chatting) are async and return Promises; engine factories such as asQueryEngine() return their engine synchronously. The llamaindex package re-exports most core classes, but LLM and embedding providers require separate packages such as @llamaindex/openai or @llamaindex/ollama.
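The chunking step can be pictured with a small, dependency-free toy. It is a sketch under stated assumptions, not SentenceSplitter itself: the hypothetical `splitIntoChunks` counts words rather than tokens and uses a naive punctuation rule for sentence boundaries. It packs whole sentences into chunks of at most `chunkSize` words and repeats up to `chunkOverlap` words of trailing sentences at the start of the next chunk, so context that straddles a boundary can be retrieved from either side.

```typescript
// Toy sentence-aware chunker: a hypothetical helper, NOT LlamaIndex's SentenceSplitter.
// Sizes are counted in words here; the real SentenceSplitter counts tokens.
export interface ChunkOptions {
  chunkSize: number; // maximum words per chunk
  chunkOverlap: number; // maximum words repeated from the previous chunk
}

const wordCount = (text: string): number =>
  text.split(/\s+/).filter((word) => word.length > 0).length;

export function splitIntoChunks(text: string, options: ChunkOptions): string[] {
  const { chunkSize, chunkOverlap } = options;
  // Naive sentence boundaries: a run of text ending in . ! ? (or at end of input).
  const sentences = (text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current: string[] = [];
  let currentWords = 0;

  for (const sentence of sentences) {
    const words = wordCount(sentence);
    if (current.length > 0 && currentWords + words > chunkSize) {
      chunks.push(current.join(" "));
      // Carry trailing sentences forward while they fit the overlap budget
      // and still leave room for the incoming sentence.
      const carried: string[] = [];
      let carriedWords = 0;
      for (const previous of [...current].reverse()) {
        const w = wordCount(previous);
        if (carriedWords + w > chunkOverlap || carriedWords + w + words > chunkSize) break;
        carried.unshift(previous);
        carriedWords += w;
      }
      current = carried;
      currentWords = carriedWords;
    }
    // A single sentence longer than chunkSize is kept whole in its own chunk.
    current.push(sentence);
    currentWords += words;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

For example, `splitIntoChunks(text, { chunkSize: 5, chunkOverlap: 2 })` over four short sentences yields three chunks, the second of which starts with the last sentence of the first.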

<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)

(You MUST configure Settings.llm and Settings.embedModel before any indexing or querying -- the Settings singleton is lazily initialized and defaults to OpenAI, which will fail without an API key)

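(You MUST await all async LlamaIndex operations -- fromDocuments(), loadData(), query(), and chat() return Promises; factory methods such as asQueryEngine() and asChatEngine() are synchronous and return the engine directly)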

(You MUST install provider packages separately -- @llamaindex/openai, @llamaindex/ollama, @llamaindex/anthropic are NOT included in the base llamaindex package)

(You MUST use storageContextFromDefaults({ persistDir }) to persist indexes -- without persistence, indexes are rebuilt from scratch on every restart)

(You MUST never hardcode API keys -- use environment variables and dotenv/config)

</critical_requirements>
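The requirements about API keys and about configuring Settings first can be enforced together with a fail-fast check at startup. This is a minimal sketch under assumptions: `requireEnv` and `REQUIRED_KEYS` are hypothetical names, and only `OPENAI_API_KEY` is listed because the examples in this document use the OpenAI provider; other providers read different variables.

```typescript
// Hypothetical startup guard: fail fast with one clear message instead of letting
// the first indexing or query call fail deep inside a provider SDK.
type Env = Record<string, string | undefined>;

export const REQUIRED_KEYS = ["OPENAI_API_KEY"] as const;

export function requireEnv(env: Env, names: readonly string[]): Record<string, string> {
  // Treat unset and whitespace-only values as missing.
  const missing = names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const values: Record<string, string> = {};
  for (const name of names) values[name] = env[name] as string;
  return values;
}
```

At startup one would `import "dotenv/config"`, call `requireEnv(process.env, REQUIRED_KEYS)`, and only then assign `Settings.llm` and `Settings.embedModel`, so a missing key surfaces as a single readable error rather than inside the first `fromDocuments()` call.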

Auto-detection: LlamaIndex, llamaindex, VectorStoreIndex, SimpleDirectoryReader, Settings.llm, Settings.embedModel, asQueryEngine, asChatEngine, ContextChatEngine, SentenceSplitter, storageContextFromDefaults, @llamaindex/openai, @llamaindex/ollama, @llamaindex/workflow, FunctionTool, QueryEngineTool, agentStreamEvent

When to use:

Building RAG (Retrieval-Augmented Generation) applications with custom documents
Loading, chunking, and indexing documents for LLM consumption
Creating query engines that answer questions from indexed data
Building chat interfaces with conversation memory over your data
Implementing agentic RAG with tool-calling agents that query indexes
Working with multiple data sources (files, PDFs, markdown, code)
Persisting vector indexes to avoid re-indexing on every restart

Key patterns covered:

Settings singleton for LLM and embedding model configuration
Document loading with SimpleDirectoryReader and custom readers
VectorStoreIndex creation, persistence, and querying
Query engines and chat engines
Agent creation with agent() and tool() using Zod schemas
Text splitting and chunking strategies
Streaming responses from query and chat engines
Storage context and index persistence

When NOT to use:

Simple one-shot LLM calls without document context -- use the LLM provider SDK directly
Applications that only need embeddings without indexing -- use the embedding API directly
Client-side / browser applications -- LlamaIndex.TS is server-side focused (Node.js >= 20)

Examples Index

Core: Setup, Indexing & Querying -- Settings config, document loading, VectorStoreIndex, query engines, persistence
Agents & Tools -- FunctionTool, agent(), multi-agent workflows, QueryEngineTool
Chat & Streaming -- Chat engines, ContextChatEngine, streaming responses
Ingestion & Splitting -- Text splitters, node parsers, ingestion pipeline, custom readers
Quick API Reference -- Package map, method signatures, response modes, model providers

<philosophy>

Philosophy

LlamaIndex.TS is a data framework -- its core value proposition is connecting your data to LLMs through indexing, retrieval, and synthesis. It sits between raw LLM APIs and full application frameworks.

Core principles:

Context engineering -- Inject the right data into the LLM prompt at the right time. This drives RAG, agent memory, extraction, and summarization.
Modular provider system -- LLM providers, embedding models, vector stores, and readers are separate packages you compose. The base llamaindex package provides the framework; providers are installed separately.
Settings singleton -- Global configuration for LLM, embedding model, node parser, and other shared resources. Set once, used everywhere. Override locally when needed.
Async-first design -- Every I/O operation is async. Document loading, indexing, querying, and chat all return Promises.
Index as the core abstraction -- Documents are loaded, split into nodes, embedded, and stored in an index. Queries retrieve relevant nodes and synthesize responses.
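The index abstraction in the last principle (split into nodes, embed, store, retrieve, synthesize) can be illustrated end to end with a dependency-free toy. This is a sketch under loud assumptions, not how LlamaIndex works internally: term-frequency vectors stand in for a real embedding model, an in-memory array stands in for the vector store, and the final step only assembles the prompt a response synthesizer would send to the LLM. `ToyIndex`, `embed`, and `buildPrompt` are hypothetical names.

```typescript
// Toy stand-ins for embedModel + vector store + retriever (NOT LlamaIndex internals).
type Vector = Map<string, number>;

interface StoredNode {
  text: string;
  vector: Vector;
}

// "Embedding": a term-frequency vector over lowercase alphanumeric words.
export function embed(text: string): Vector {
  const vector: Vector = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    vector.set(word, (vector.get(word) ?? 0) + 1);
  }
  return vector;
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  for (const [term, weight] of a) dot += weight * (b.get(term) ?? 0);
  const norm = (v: Vector): number =>
    Math.sqrt([...v.values()].reduce((sum, x) => sum + x * x, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

export class ToyIndex {
  private nodes: StoredNode[] = [];

  // Index time: embed each chunk once and store it alongside its vector.
  add(chunks: string[]): void {
    for (const text of chunks) this.nodes.push({ text, vector: embed(text) });
  }

  // Query time: embed only the query, rank stored chunks by similarity, keep the top k.
  retrieve(query: string, topK: number): string[] {
    const queryVector = embed(query);
    return this.nodes
      .map((node) => ({ text: node.text, score: cosine(queryVector, node.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((hit) => hit.text);
  }
}

// Synthesis input: retrieved context is injected into the prompt ("context engineering").
export function buildPrompt(question: string, context: string[]): string {
  const contextBlock = context.map((chunk) => `- ${chunk}`).join("\n");
  return `Answer using only this context:\n${contextBlock}\n\nQuestion: ${question}`;
}
```

Chunks are embedded once at index time and only the query is embedded at query time, which is why keeping the stored vectors around avoids repeating the expensive part on restart.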

When to use LlamaIndex.TS:

You have documents/data that need to be indexed for LLM consumption
You want structured RAG pipelines with configurable retrieval and synthesis
You need agentic RAG where agents query multiple indexes with tools
You want persistence and incremental updates to your index

When NOT to use:

Simple LLM calls without data context -- use the provider SDK directly
Browser-only applications -- LlamaIndex.TS requires Node.js >= 20
You only need embeddings -- use the embedding API directly

</philosophy>

Core Patterns

Pattern 1: Settings Configuration

The Settings singleton configures LLM, embedding model, and node parser globally. Set it once at application startup before any indexing or querying.

```typescript
import { Settings } from "llamaindex";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";

// Configure at app startup -- before any index operations
Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
```

Why good: Single configuration point, provider packages are explicit imports, model names are visible

```typescript
// BAD: No Settings configuration, relying on implicit defaults
import { VectorStoreIndex, SimpleDirectoryReader } from "llamaindex";

// This will silently try to use OpenAI with OPENAI_API_KEY from env
// Fails with cryptic error if key is missing
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents);
```

Why bad: Implicit defaults make failures confusing, no explicit provider, no model selection

See: examples/core.md for local LLM setup with Ollama, Anthropic configuration, and embedding model options
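Why implicit defaults produce confusing failures follows from how a lazily initialised global behaves. The sketch below is a hypothetical `ToySettings` class, not LlamaIndex's Settings implementation: reading `llm` before anything was assigned runs a default factory at the first use site (which may be deep inside an indexing call), and `withLlm` illustrates "set once, override locally" by swapping the value for one async callback and restoring it afterwards. The `Llm` type and the default factory's error message are assumptions.

```typescript
// Hypothetical LLM shape for this sketch.
interface Llm {
  name: string;
}

type Env = Record<string, string | undefined>;

// Default factory mimics "fall back to a hosted provider" and fails without a key.
const defaultLlmFactory = (env: Env): Llm => {
  if (!env["OPENAI_API_KEY"]) {
    throw new Error("No LLM configured and OPENAI_API_KEY is not set");
  }
  return { name: "default-openai" };
};

export class ToySettings {
  private llmValue: Llm | undefined;
  private readonly env: Env;

  constructor(env: Env) {
    this.env = env;
  }

  // Lazy getter: the default is built on first read, far from startup code.
  get llm(): Llm {
    if (this.llmValue === undefined) this.llmValue = defaultLlmFactory(this.env);
    return this.llmValue;
  }

  set llm(value: Llm) {
    this.llmValue = value;
  }

  // Scoped override: use a different LLM inside fn, then restore the previous value.
  async withLlm<T>(llm: Llm, fn: () => Promise<T>): Promise<T> {
    const previous = this.llmValue;
    this.llmValue = llm;
    try {
      return await fn();
    } finally {
      this.llmValue = previous;
    }
  }
}
```

Because this override mutates shared state, overlapping async calls could observe each other's value; a production-grade scoped override would need per-call context rather than a single mutable field.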

Pattern 2: Document Loading and Indexing

Load documents, create a vector index, and query it. This is the canonical RAG pipeline.

```typescript
import { SimpleDirectoryReader, VectorStoreIndex, Settings } from "llamaindex";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";

Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });

// Load all supported files from a directory
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});

// Create vector index -- embeds and stores all document chunks
const index = await VectorStoreIndex.fromDocuments(documents);

// Query the index
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: "What is the main topic?" });
console.log(response.message.content);
```

Why good: Complete pipeline in minimal code, explicit Settings, clear data flow

See: examples/core.md for persistence, custom readers, and advanced indexing options

Pattern 3: Index Persistence

Persist indexes to disk to avoid re-indexing on every restart.
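The build-once, reload-afterwards decision behind index persistence can be sketched independently of the library. Assumptions: the hypothetical `PersistStore` interface and in-memory `MemoryStore` stand in for a persist directory, `buildIndex` stands in for an expensive `VectorStoreIndex.fromDocuments()` call, and JSON stands in for LlamaIndex's real storage format.

```typescript
// Hypothetical storage abstraction standing in for a persist directory.
export interface PersistStore {
  read(key: string): Promise<string | undefined>;
  write(key: string, value: string): Promise<void>;
}

// In-memory implementation so the sketch runs without touching the filesystem.
export class MemoryStore implements PersistStore {
  private readonly files = new Map<string, string>();

  async read(key: string): Promise<string | undefined> {
    return this.files.get(key);
  }

  async write(key: string, value: string): Promise<void> {
    this.files.set(key, value);
  }
}

const INDEX_KEY = "index.json";

// Load the persisted index if present; otherwise build it once and persist it.
export async function loadOrBuild<T>(
  store: PersistStore,
  buildIndex: () => Promise<T>,
): Promise<{ index: T; rebuilt: boolean }> {
  const saved = await store.read(INDEX_KEY);
  if (saved !== undefined) {
    return { index: JSON.parse(saved) as T, rebuilt: false };
  }
  const index = await buildIndex(); // the expensive step: embeds every chunk
  await store.write(INDEX_KEY, JSON.stringify(index));
  return { index, rebuilt: true };
}
```

Calling `loadOrBuild` twice against the same store runs `buildIndex` exactly once; in LlamaIndex itself the persist location is supplied through `storageContextFromDefaults({ persistDir })`.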

import {
  V
