Context Engineering

Context engineering is the discipline of curating and maintaining the optimal set of tokens during LLM inference. Unlike prompt engineering (crafting individual prompts), context engineering focuses on what information enters the context window and when.

Core Principles
Context Management Strategies
System Prompt Design
Tool Design for Context Efficiency
Long-Horizon Task Patterns
Implementation Patterns
Best Practices
References

Core Principles

Context as a Finite Resource

LLMs have limited "attention budgets." As context length increases, models experience context rot: a decreased ability to accurately recall information from the context. The goal is to find the smallest possible set of high-signal tokens that maximizes the likelihood of the desired outcome.

Effective Context = Relevant Information / Total Tokens

Key insight: More context isn't better. The right context is better.
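As a rough illustration of the ratio above, the sketch below compares a front-loaded context with a curated one. The function name and the token counts are made up for the example, and the ratio is a heuristic rather than a measured quantity.

```python
def effective_context(relevant_tokens: int, total_tokens: int) -> float:
    """Fraction of the context window that carries task-relevant information."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= relevant_tokens <= total_tokens:
        raise ValueError("relevant_tokens must be between 0 and total_tokens")
    return relevant_tokens / total_tokens


# Illustrative numbers only: the same 2,000 relevant tokens in two contexts.
front_loaded = effective_context(relevant_tokens=2_000, total_tokens=40_000)
curated = effective_context(relevant_tokens=2_000, total_tokens=5_000)
print(f"{front_loaded:.2f} vs {curated:.2f}")  # 0.05 vs 0.40
```

The relevant information is identical in both cases; only the amount of surrounding noise changes.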

The Context Pollution Problem

Every token added to context has costs:

Increased latency and compute
Diluted attention to important information
Higher risk of hallucination from conflicting data
Reduced model performance on retrieval tasks

Context Management Strategies

1. Context Trimming

Drop older conversation turns, keeping only the last N turns.

Aspect	Details
Mechanism	Sliding window over conversation history
Pros	Deterministic, no added latency, preserves recent context verbatim
Cons	Abrupt loss of long-range context, "amnesia" effect
Best for	Independent tasks, short interactions, predictable workflows

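def trim_context(messages: list[dict], keep_last_n: int = 10) -> list[dict]:
    """Keep all system messages plus the last N non-system messages."""
    system_msgs = [m for m in messages if m["role"] == "system"]
    other_msgs = [m for m in messages if m["role"] != "system"]
    # Guard N <= 0: other_msgs[-0:] would return the whole list, not an empty one
    recent = other_msgs[-keep_last_n:] if keep_last_n > 0 else []
    return system_msgs + recent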

2. Context Summarization

Compress prior messages into structured summaries.

Aspect	Details
Mechanism	LLM generates summary of older context
Pros	Retains long-range memory, smoother UX, scalable
Cons	Summarization bias risk, added latency, potential compounding errors
Best for	Complex multi-step tasks, long-horizon interactions

SUMMARIZATION_PROMPT = """Summarize the conversation so far, preserving:
1. Key decisions made
2. Important context established
3. Current task state and goals
4. Any constraints or preferences expressed

Be concise but complete. Output as structured markdown."""

async def summarize_context(messages: list, model) -> str:
    """Generate a summary of conversation history."""
    conversation_text = format_messages_for_summary(messages)
    response = await model.generate(
        system=SUMMARIZATION_PROMPT,
        user=conversation_text
    )
    return response.content
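The helper `format_messages_for_summary` is not defined in this section. A minimal sketch follows, assuming each message is a dict with `role` and `content` keys (the message schema is an assumption here). It flattens the history into a plain transcript and skips system messages, since they are instructions rather than history.

```python
def format_messages_for_summary(messages: list[dict]) -> str:
    """Flatten chat messages into a plain-text transcript for the summarizer.

    Assumes each message is a dict with "role" and "content" keys.
    """
    lines = []
    for message in messages:
        if message["role"] == "system":
            continue  # Instructions are not part of the history to summarize
        content = str(message.get("content", "")).strip()
        if content:
            lines.append(f"{message['role'].upper()}: {content}")
    return "\n".join(lines)


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Use PostgreSQL, not MySQL."},
    {"role": "assistant", "content": "Understood, PostgreSQL it is."},
]
print(format_messages_for_summary(history))
```

Prefixing each line with the speaker's role lets the summarizer tell user constraints apart from assistant decisions.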

3. Hybrid Approach

Combine trimming and summarization: keep the most recent turns verbatim and fold everything older into a running summary.

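from typing import Callable

SUMMARY_PREFIX = "Previous context:\n"

class HybridContextManager:
    def __init__(
        self,
        summarize: Callable[[list, str], str],  # (old_messages, prior_summary) -> new summary
        keep_recent: int = 5,         # Recent turns to keep verbatim (must be >= 1)
        summary_threshold: int = 20,  # When to trigger summarization
    ):
        self.summarize = summarize
        self.keep_recent = keep_recent
        self.summary_threshold = summary_threshold
        self.running_summary = ""

    def process(self, messages: list) -> list:
        if len(messages) < self.summary_threshold:
            return messages

        # Keep the original system messages out of the summary so the instructions
        # survive, and drop any earlier summary message (it is already in running_summary)
        system_msgs = [
            m for m in messages
            if m["role"] == "system" and not m["content"].startswith(SUMMARY_PREFIX)
        ]
        other_msgs = [m for m in messages if m["role"] != "system"]

        # Fold older messages into the running summary
        old_messages = other_msgs[:-self.keep_recent]
        self.running_summary = self.summarize(old_messages, self.running_summary)

        # Return system prompt + summary + recent messages; continue the conversation
        # from this list so summarized turns are not folded in twice
        return [
            *system_msgs,
            {"role": "system", "content": f"{SUMMARY_PREFIX}{self.running_summary}"},
            *other_msgs[-self.keep_recent:],
        ]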
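The manager above relies on a `summarize` step that takes the older messages and the prior running summary. The sketch below is a placeholder with that call shape, useful for testing the plumbing without a model. It is not an LLM summary: it only appends one truncated bullet per message, and the `max_chars` parameter is an assumption of this example.

```python
def summarize(old_messages: list[dict], running_summary: str = "", max_chars: int = 80) -> str:
    """Stand-in summarizer: one truncated bullet per message, appended to the prior summary.

    Swap in a model call with the same signature for real use.
    """
    bullets = []
    for message in old_messages:
        text = " ".join(str(message.get("content", "")).split())  # Collapse whitespace
        if len(text) > max_chars:
            text = text[: max_chars - 3] + "..."
        bullets.append(f"- {message['role']}: {text}")
    parts = [running_summary] if running_summary else []
    return "\n".join(parts + bullets)


summary = summarize(
    [
        {"role": "user", "content": "Deploy target is staging."},
        {"role": "assistant", "content": "Noted: staging."},
    ],
    running_summary="- user: Project uses Python 3.12.",
)
print(summary)
```

Because the prior summary is passed in and extended, each call only needs the messages that are leaving the verbatim window.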

4. Session Memory

Persist reusable facts, preferences, and task state outside the context window. Load only the relevant slice for the current turn.

Aspect	Details
Mechanism	External store keyed by user, session, task, or resource
Pros	Recovers long-range context without carrying all history
Cons	Requires retrieval, freshness, and deletion policies
Best for	Agents, project work, personalization, long-running workflows

Separate durable memory from ephemeral scratchpads. Durable memory should contain stable facts and explicit decisions, not every intermediate thought.
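A minimal sketch of this idea, using an in-process dictionary as a stand-in for an external store. The class names, the tag-based lookup, and the `limit` parameter are assumptions made for illustration; a real deployment would also need persistence and an expiry policy.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    value: str
    tags: frozenset
    updated_at: float = field(default_factory=time.time)


class SessionMemory:
    """In-process stand-in for an external store, keyed by (scope, key)."""

    def __init__(self) -> None:
        self._entries: dict = {}

    def remember(self, scope: str, key: str, value: str, tags: set) -> None:
        # Overwriting by key keeps a single, current value per fact.
        self._entries[(scope, key)] = MemoryEntry(value, frozenset(tags))

    def forget(self, scope: str, key: str) -> None:
        # Explicit deletion hook for stale or withdrawn facts.
        self._entries.pop((scope, key), None)

    def recall(self, scope: str, tags: set, limit: int = 3) -> list:
        """Return up to `limit` entries in `scope` that share a tag, newest first."""
        matches = [
            (entry.updated_at, f"{key}: {entry.value}")
            for (entry_scope, key), entry in self._entries.items()
            if entry_scope == scope and entry.tags & tags
        ]
        matches.sort(reverse=True)
        return [text for _, text in matches[:limit]]


memory = SessionMemory()
memory.remember("user:42", "db", "PostgreSQL 16", {"database"})
memory.remember("user:42", "style", "prefers tabs", {"formatting"})
memory.remember("user:7", "db", "SQLite", {"database"})
print(memory.recall("user:42", {"database"}))  # Only the matching slice for this user
```

Only the recalled strings are placed in the prompt; everything else stays in the store until a later turn asks for it.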

System Prompt Design

Principles for Context-Efficient Prompts

Clear and direct language: Avoid ambiguity that requires clarification turns
Structured sections: Organize by purpose (role, capabilities, constraints)
Minimal yet comprehensive: Include only what affects behavior
Self-contained instructions: Reduce need for context retrieval

Example Structure

# Role
You are [specific role] that [primary function].

# Capabilities
- [Capability 1 with scope]
- [Capability 2 with scope]

# Constraints
- [Hard constraint]
- [Preference]

# Output Format
[Specific format requirements]
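One way to keep a prompt in this shape minimal is to assemble it from data and drop sections that are empty. The sketch below assumes a hypothetical `build_system_prompt` helper; the section contents are invented for the example.

```python
from typing import Union


def build_system_prompt(sections: dict[str, Union[str, list[str]]]) -> str:
    """Render non-empty sections as '# Heading' blocks, in insertion order."""
    blocks = []
    for heading, body in sections.items():
        if isinstance(body, list):
            body = "\n".join(f"- {item}" for item in body if item.strip())
        body = body.strip()
        if body:  # Skip empty sections so they cost no tokens
            blocks.append(f"# {heading}\n{body}")
    return "\n\n".join(blocks)


prompt = build_system_prompt(
    {
        "Role": "You are a code reviewer for Python pull requests.",
        "Capabilities": ["Read unified diffs", "Suggest minimal fixes"],
        "Constraints": [],  # Empty, so this section is omitted
        "Output Format": "A bulleted list of findings.",
    }
)
print(prompt)
```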

Tool Design for Context Efficiency

Just-in-Time Context Loading

Instead of front-loading all possible context, load information dynamically as needed.

# Anti-pattern: Loading everything upfront
context = load_all_user_data()  # Large, mostly unused
context += load_all_documents()  # Even larger

# Better: Just-in-time retrieval
tools = [
    Tool(
        name="get_user_preference",
        description="Get specific user preference by key",
        # Only fetches what's needed when asked
    ),
    Tool(
        name="search_documents",
        description="Search documents by query",
        # Returns relevant subset
    ),
]
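The snippet above leaves `Tool` and the data sources undefined. A self-contained sketch of the same idea follows, with assumptions stated plainly: `Tool` is a hypothetical dataclass (not a specific framework's class), the dictionaries stand in for a real user store and document index, and `call_tool` is an invented dispatcher.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory data standing in for a real user store and document index.
USER_PREFS = {"theme": "dark", "language": "en", "timezone": "UTC"}
DOCUMENTS = {
    "deploy.md": "Deploy with the release script after tests pass.",
    "style.md": "Use four-space indentation in Python files.",
    "oncall.md": "Page the on-call engineer for production incidents.",
}


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[..., str]


def get_user_preference(key: str) -> str:
    return USER_PREFS.get(key, f"No preference stored for '{key}'.")


def search_documents(query: str, max_results: int = 2) -> str:
    # Naive substring match; a real index would rank results.
    terms = query.lower().split()
    hits = [
        f"{name}: {text}"
        for name, text in DOCUMENTS.items()
        if any(term in text.lower() for term in terms)
    ]
    return "\n".join(hits[:max_results]) or "No documents matched."


TOOLS = {
    tool.name: tool
    for tool in (
        Tool("get_user_preference", "Get specific user preference by key", get_user_preference),
        Tool("search_documents", "Search documents by query", search_documents),
    )
}


def call_tool(name: str, **kwargs) -> str:
    """Dispatch a model-requested tool call; only the result enters the context."""
    return TOOLS[name].func(**kwargs)


print(call_tool("get_user_preference", key="theme"))
print(call_tool("search_documents", query="deploy"))
```

The context receives one preference value and one matching document line instead of the full store and index.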

Tool Design Principles

Self-contained: Each tool returns complete, usable information
Scoped: Tools do one thing well
Descriptive: Names and descriptions guide LLM toward correct usage
Error-robust: Return informative errors that don't pollute context

# Well-designed tool
def search_codebase(query: str, max_results: int = 5) -> str:
    """Search codebase for relevant code snippets.

    Args:
        query: Natural language description of what to find
        max_results: Maximum snippets to return (default 5)

    Returns:
        Formatted code snippets with file paths and line numbers,
        or 'No results found' if nothing matches.
    """
    results = perform_search(query, limit=max_results)
    if not results:
        return "No results found for query."
    return format_results(results)  # Concise, structured output
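The error-robust principle can be sketched as a wrapper that bounds whatever a tool returns. `safe_tool_call` and its `max_chars` limit are hypothetical names chosen for this example; the point is that a failure becomes one short line rather than a full traceback, and oversized output is cut with a visible marker.

```python
def safe_tool_call(func, *args, max_chars: int = 500, **kwargs) -> str:
    """Run a tool and return a bounded string, whether it succeeds or fails."""
    try:
        output = str(func(*args, **kwargs))
    except Exception as exc:  # Report the failure briefly instead of a full traceback
        output = f"Error in {func.__name__}: {type(exc).__name__}: {exc}"
    if len(output) > max_chars:
        omitted = len(output) - max_chars
        output = f"{output[:max_chars]}\n[truncated {omitted} characters]"
    return output


def flaky_lookup(key: str) -> str:
    raise KeyError(key)


print(safe_tool_call(flaky_lookup, "user_id"))
```

The truncation marker tells the model that more output exists, so it can narrow the query instead of assuming the result is complete.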

Long-Horizon Task Patterns

Pattern 1: Compaction

Periodically compress conversation history to reclaim context space.

async def compaction_loop(agent, messages, task, model, token_limit: int, keep_recent: int = 3):
    while not task.complete:
        # Process next step
        response = await agent.run(messages)
        messages.append(response)

        # Compact when approaching limit
        if estimate_tokens(messages) > token_limit * 0.8:
            # Exclude system messages: the system prompt is re-added below
            history = [m for m in messages if m["role"] != "system"]
            summary = await summarize_context(history[:-keep_recent], model)
            messages = [
                {"role": "system", "content": agent.system_prompt},
                {"role": "assistant", "content": f"Summary of progress:\n{summary}"},
                *history[-keep_recent:]  # Keep recent context
            ]

    return messages
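The loop above calls `estimate_tokens`, which is not defined in this section. A rough sketch follows, assuming about four characters per token (a common rule of thumb for English text) and a small fixed overhead per message. Both constants are assumptions; a tokenizer matching the target model should be used where accuracy matters.

```python
import math


def estimate_tokens(
    messages: list[dict],
    chars_per_token: float = 4.0,   # Assumed average; varies by language and tokenizer
    per_message_overhead: int = 4,  # Assumed allowance for role and formatting tokens
) -> int:
    """Rough token estimate: content length / chars_per_token, plus overhead per message."""
    total = 0
    for message in messages:
        content = str(message.get("content", ""))
        total += math.ceil(len(content) / chars_per_token) + per_message_overhead
    return total


messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # 28 characters
    {"role": "user", "content": "a" * 40},
]
print(estimate_tokens(messages))  # (7 + 4) + (10 + 4) = 25
```

Since the estimate is approximate, compacting at 80% of the limit rather than at the limit itself leaves headroom for the error.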

Pattern 2: Structured Note-Taking

Agent maintains external notes, retrieving as needed.
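A self-contained sketch of the note store on its own, separate from any agent loop. `save_note` matches the tool name used in this section; `NoteStore`, `get_note`, and `list_notes` are hypothetical additions for illustration.

```python
class NoteStore:
    """Key-value notes kept outside the context window."""

    def __init__(self) -> None:
        self._notes: dict[str, str] = {}

    def save_note(self, key: str, content: str) -> str:
        self._notes[key] = content
        return f"Saved note '{key}'."  # Short confirmation; the content is not echoed back

    def get_note(self, key: str) -> str:
        return self._notes.get(key, f"No note found for '{key}'.")

    def list_notes(self) -> str:
        # Keys only, so the agent can decide which note is worth loading
        return ", ".join(sorted(self._notes)) or "No notes saved."


store = NoteStore()
print(store.save_note("schema", "users(id, email); orders(id, user_id, total)"))
print(store.save_note("todo", "Add index on orders.user_id"))
print(store.list_notes())
print(store.get_note("schema"))
```

Returning a short confirmation from `save_note` matters: echoing the saved content would put it straight back into the context the note was meant to relieve.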

class NoteTakingAgent:
    def __init__(self):
        self.notes = {}  # Key-value store outside context

    async def run(self, messages):
        tools = [
            Tool("save_note", self.save_note, "Save inform
