
Context Engineering for AI Agents: Lessons from Building Manus

Yichao 'Peak' Ji, builder of the Manus project, shared practical lessons from doing context engineering for AI agents. In a fast-iterating field, betting on in-context learning and careful context management, rather than training models from scratch, enables rapid product development and decouples the product from the underlying model. The post details six core principles for improving agent performance, efficiency, robustness, and adaptability through deliberate context design: optimizing the KV-cache, masking tools instead of removing them, using the file system as external memory, actively steering attention, retaining error information so the agent can learn, and avoiding over-reliance on few-shot patterns.

1. Designing Around KV-Cache

Impact of KV Cache on Model Latency and Cost

The author considers KV-cache hit rate the single most important metric for a production-stage AI agent, as it directly drives both latency and cost. The way agents operate produces a highly skewed input-to-output token ratio: in Manus, the average ratio of input to output tokens is roughly 100:1. By reusing identical prefixes, the KV-cache dramatically reduces time-to-first-token (TTFT) and inference cost.

Cost-Effectiveness Significance

For instance, using Claude Sonnet, the cost of cached input tokens is $0.30/MTok, while uncached ones are as high as $3/MTok, representing a 10x difference.
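
As a quick back-of-envelope check (an illustrative Python sketch, not from the original post; the 90% hit rate is an assumed figure), the blended input cost per step follows directly from the hit rate and the prices above:

```python
# Claude Sonnet input prices quoted above, in $/token.
CACHED_IN = 0.30 / 1_000_000    # cached input tokens
UNCACHED_IN = 3.00 / 1_000_000  # uncached input tokens

def input_cost(total_in_tokens: int, cache_hit_rate: float) -> float:
    """Blended input cost for one agent step at a given KV-cache hit rate."""
    cached = total_in_tokens * cache_hit_rate
    return cached * CACHED_IN + (total_in_tokens - cached) * UNCACHED_IN

# A 100K-token prefix costs ~$0.057 at a 90% hit rate vs. $0.30 uncached.
print(input_cost(100_000, 0.9))  # 0.057
print(input_cost(100_000, 0.0))  # 0.30
```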

Key Practices for Improving KV-Cache Hit Rate

Maintain Prompt Stability

Even a single-token difference can invalidate the cache (e.g., a system prompt that embeds a precise timestamp).

Context Append-Only

Avoid modifying previous actions or observations to prevent cache invalidation.

Deterministic Serialization

Ensure key order stability during JSON object serialization to avoid silently breaking cache.

Explicit Cache Breakpoints

Manually insert cache breakpoints if your model provider or inference framework does not support automatic incremental prefix caching.
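
A minimal sketch of the first three practices in Python (the names `SYSTEM_PROMPT`, `context`, and `append_event` are illustrative assumptions, not Manus's actual code):

```python
import json

# Stable prefix: no timestamps or other per-request values in the system
# prompt, since a single differing token invalidates the cached prefix.
SYSTEM_PROMPT = "You are an agent. Use the provided tools to complete tasks."

def serialize(obj: dict) -> str:
    # sort_keys pins the key order so the serialized form is byte-identical
    # across runs; unstable ordering would silently break the cache.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

# Append-only transcript: new actions and observations only ever extend the
# prefix, so every earlier token remains a cache hit.
context: list[str] = [serialize({"role": "system", "content": SYSTEM_PROMPT})]

def append_event(event: dict) -> None:
    context.append(serialize(event))
```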

KV Cache Impact on Model Latency and Cost

2. Masking Rather Than Removing

Managing Agent Action Space

As agent capabilities grow, the action space becomes increasingly complex and the number of tools explodes, especially once user-defined tools enter the mix; the agent then becomes more likely to choose the wrong action or take an inefficient path. The Manus team experimented with dynamic action spaces but found that, unless absolutely necessary, tools should not be added or removed mid-iteration, for two reasons. First, tool definitions usually sit near the front of the context, so any change invalidates the KV-cache for all subsequent actions and observations. Second, if earlier actions and observations refer to tools no longer defined in the current context, the model becomes confused, often producing schema violations or hallucinated actions.

Manus's Approach: Context-Aware State Machine

Manus manages tool availability with a context-aware state machine: rather than removing tools, it masks token logits during decoding to prevent (or enforce) the selection of certain actions. Most model providers and inference frameworks support some form of response prefill, which makes it possible to constrain the action space without modifying the tool definitions.

Manus Optimizes Model Behavior by Adjusting Tool Availability

Hermes Format: Function Calling Modes

Auto Mode

Model can choose whether or not to call a function.

Required Mode

Model must call a function, but which function is unconstrained.

Specified Mode

Model must call a function from a specific subset.

Manus also designs action names with consistent prefixes (e.g., all browser-related tools start with 'browser_', command-line tools with 'shell_'), making it convenient to force the agent to choose from a specific group of tools in a given state.
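
A hedged sketch of how the combination of response prefill and prefix-grouped tool names might be wired up; the state names, mapping, and prefill strings are illustrative assumptions, not Manus's actual implementation:

```python
# Map each agent state to an allowed tool-name prefix. Because all browser
# tools start with "browser_", one prefix constrains a whole tool family
# without touching the tool definitions (and without breaking the KV-cache).
STATE_TO_PREFIX = {
    "reply_to_user": None,      # auto: calling a tool is optional
    "must_act": "",             # required: any tool, choice unconstrained
    "browse_only": "browser_",  # specified: restrict to the browser group
}

def assistant_prefill(state: str) -> str:
    prefix = STATE_TO_PREFIX[state]
    if prefix is None:
        return ""  # model may answer in plain text or call a tool
    # Prefilling up to (and including) the start of the function name forces
    # a tool call; a non-empty prefix narrows decoding to that tool family.
    return '<tool_call>{"name": "' + prefix

print(assistant_prefill("browse_only"))  # <tool_call>{"name": "browser_
```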

3. Using File System as Context

Limitations of Traditional Context Windows

Modern frontier LLMs offer context windows of 128K tokens or more, yet in real-world agentic scenarios this is often insufficient and can even become a liability. Observations can be huge (e.g., when processing web pages or PDFs) and easily exceed the limit; even when the window technically fits, model performance tends to degrade beyond a certain context length; and long inputs are expensive, since even with prefix caching every token must still be transmitted and prefilled. Many agent systems therefore truncate or compress the context, but overly aggressive compression inevitably loses information.

The "Ultimate" Context: File System

  • Unlimited in size: capacity is effectively unbounded.
  • Persistent by nature: state survives across steps and sessions.
  • Directly manipulable by the agent: it can read and write files autonomously.

Manus Uses File System as External Memory

Manus learns to read and write files on demand, using the file system not just as storage but as structured, externalized memory. Its compression strategies are always recoverable: web page content can be dropped from the context as long as the URL is retained, and a document's content can be omitted as long as its path in the sandbox is preserved. This lets Manus shrink the context without permanently losing information. The author also sees potential for State Space Models (SSMs) in agentic settings: if SSMs could master file-based memory, externalizing long-term state instead of holding it in context, their speed and efficiency might unlock a new class of agents.
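
A minimal sketch of such recoverable compression (the sandbox path and helper name are hypothetical, not Manus's API):

```python
from pathlib import Path

SANDBOX = Path("/tmp/agent_sandbox")  # illustrative sandbox root

def compress_web_observation(obs: dict) -> dict:
    """Spill bulky page content to a file, keeping only the URL and path.

    The compression is recoverable: the agent can re-read the file (or
    re-fetch the URL) later, so nothing is permanently lost.
    """
    SANDBOX.mkdir(parents=True, exist_ok=True)
    path = SANDBOX / f"page_{abs(hash(obs['url']))}.html"
    path.write_text(obs["content"])
    return {"url": obs["url"], "saved_to": str(path), "content": "<omitted>"}

obs = {"url": "https://example.com", "content": "<html>...big page...</html>"}
print(compress_web_observation(obs))
```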

4. Manipulating Attention Through Reiteration

The Role of 'todo.md' in Attention Guidance

When handling complex tasks, Manus tends to create a todo.md file and steers attention by progressively updating it and checking off completed items. This is a deliberate mechanism: a typical Manus task takes around 50 tool calls, a long loop, and because Manus relies on an LLM for decision-making, over long contexts and complex tasks it can drift off-topic or forget early goals (the lost-in-the-middle problem and goal misalignment).

Reiteration for Focus

By constantly rewriting the todo list, Manus recites its objectives into the end of the context, pushing the global plan into the model's recent attention span. Without any architectural change, this uses plain natural language to bias the model's focus toward the task objectives.
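
A minimal sketch of this recitation loop (the `todo` structure and rendering are illustrative assumptions):

```python
# Append-only transcript; the freshest todo rendering always sits at the end.
context: list[str] = []
todo = {"research pricing": True, "draft report": False, "send summary": False}

def recite_todo() -> None:
    # Rewrite the whole list and append it to the *tail* of the context,
    # pulling the global plan back into the model's recent attention span.
    lines = [f"- [{'x' if done else ' '}] {task}" for task, done in todo.items()]
    context.append("Current plan (todo.md):\n" + "\n".join(lines))

todo["draft report"] = True  # check off a completed item...
recite_todo()                # ...then restate the entire plan at the end
print(context[-1])
```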

(Diagram: Create/Update todo.md → Complete/Check Items → Reiterate Context Focus)

Manus Guides Model Attention by Reiteration of Todo List

5. Retaining Error Information

Embracing Failure for Learning

Agents will make mistakes; that is normal, not a bug: LLMs hallucinate, environments return errors, external tools misbehave, and unexpected edge cases always appear. In multi-step tasks, failure is part of the loop, not an exception. The common impulse is to hide errors (clean up traces, retry actions, or reset model state), but this comes at a cost: erasing failures removes the evidence the model could learn from.

Manus's Strategy: Retain Error Traces

  • When the model sees a failed action and the resulting observation or stack trace, it implicitly updates its internal beliefs.
  • This shifts its prior away from similar actions, reducing the chance of repeating the same mistake (see the sketch below).
  • Error recovery is one of the clearest indicators of true agentic behavior, though it remains underrepresented in benchmarks.
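
A minimal sketch of keeping failure evidence in the transcript (the structure and names are illustrative):

```python
import traceback

context: list[dict] = []  # append-only transcript of actions/observations

def run_action(name: str, fn, *args) -> None:
    context.append({"action": name, "args": args})
    try:
        result = fn(*args)
        context.append({"observation": repr(result)})
    except Exception:
        # Keep the stack trace in context instead of cleaning it up and
        # retrying silently: the model sees the failure and shifts its prior.
        context.append({"observation": "ERROR\n" + traceback.format_exc()})

run_action("divide", lambda a, b: a / b, 1, 0)  # fails, but leaves evidence
print(context[-1]["observation"])
```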

Retaining Error Traces Enables Agent to Learn from Historical Experience

6. Avoiding the Few-Shot Trap

The Pitfalls of Over-reliance on Few-Shot Prompting

Few-shot prompting is a common technique for improving LLM output, but it can backfire in agent systems. Language models are excellent imitators: they mimic the behavior patterns in their context. If the context is filled with similar action-observation pairs, the model tends to follow that pattern even when it is no longer optimal. This is dangerous in tasks with repetitive decisions or actions: for instance, when Manus helps review a batch of 20 resumes, the agent often falls into a rhythm, repeating similar actions simply because that is what it sees in the context, which can lead to drift, overgeneralization, or hallucination.

Solution: Increase Diversity

  • Manus introduces small amounts of structured variation into actions and observations.
  • For example, different serialization templates, alternate phrasing, or minor noise in ordering and formatting.
  • This controlled randomness breaks up patterns and nudges the model's attention, preventing the agent from becoming brittle in an overly uniform context (a minimal sketch follows).
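
A minimal sketch of such controlled variation in observation serialization (the templates are illustrative assumptions):

```python
import random

# Several equivalent renderings of the same observation. The content never
# changes; only the serialization varies, which is enough to break the
# mimicry rhythm without losing information.
TEMPLATES = [
    "Reviewed resume {i}: {summary}",
    "Resume #{i}. Notes: {summary}",
    "[{i}] candidate summary: {summary}",
]

def render_observation(i: int, summary: str) -> str:
    return random.choice(TEMPLATES).format(i=i, summary=summary)

for i in range(1, 4):
    print(render_observation(i, "5 yrs backend, strong Python"))
```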

Over-Reliance on Few-Shot Patterns Can Lead to Rigid Agent Behavior