
Context Engineering for AI Agents: Lessons from Building Manus

Yichao 'Peak' Ji, builder of the Manus project, shared practical lessons from doing context engineering for AI agents. In a fast-iterating field, betting on in-context learning and careful context management, rather than training models from scratch, enables rapid product development and decouples the product from the underlying model. The post details six core principles for improving agent performance, efficiency, robustness, and adaptability through deliberate context design: optimizing the KV-cache, masking tools instead of removing them, using the file system as external memory, actively steering attention, retaining error information so the agent can learn, and avoiding over-reliance on few-shot patterns.

1. Designing Around KV-Cache

Impact of KV Cache on Model Latency and Cost

The author considers KV-cache hit rate the single most important metric for a production-stage AI agent, as it directly drives both latency and cost. The way agents operate produces a highly skewed input-to-output token ratio: in Manus, the average ratio of input to output tokens is roughly 100:1. By reusing identical prefixes, the KV-cache dramatically reduces time-to-first-token (TTFT) and inference cost.

Cost-Effectiveness Significance

For instance, using Claude Sonnet, the cost of cached input tokens is $0.30/MTok, while uncached ones are as high as $3/MTok, representing a 10x difference.
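
As a quick back-of-envelope check (an illustrative Python sketch, not from the original post; the 90% hit rate is an assumed figure), the blended input cost per step follows directly from the hit rate and the prices above:

```python
# Claude Sonnet input prices quoted above, in $/token.
CACHED_IN = 0.30 / 1_000_000    # cached input tokens
UNCACHED_IN = 3.00 / 1_000_000  # uncached input tokens

def input_cost(total_in_tokens: int, cache_hit_rate: float) -> float:
    """Blended input cost for one agent step at a given KV-cache hit rate."""
    cached = total_in_tokens * cache_hit_rate
    return cached * CACHED_IN + (total_in_tokens - cached) * UNCACHED_IN

# A 100K-token prefix costs ~$0.057 at a 90% hit rate vs. $0.30 uncached.
print(input_cost(100_000, 0.9))  # 0.057
print(input_cost(100_000, 0.0))  # 0.30
```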

Key Practices for Improving KV-Cache Hit Rate

Maintain Prompt Stability

Even a single-token difference can invalidate the cache (e.g., a system prompt that embeds a precise timestamp).

Context Append-Only

Avoid modifying previous actions or observations to prevent cache invalidation.

Deterministic Serialization

Ensure key order stability during JSON object serialization to avoid silently breaking cache.

Explicit Cache Breakpoints

Manually insert cache breakpoints if your model provider or inference framework does not support automatic incremental prefix caching.
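
A minimal sketch of the first three practices in Python (the names `SYSTEM_PROMPT`, `context`, and `append_event` are illustrative assumptions, not Manus's actual code):

```python
import json

# Stable prefix: no timestamps or other per-request values in the system
# prompt, since a single differing token invalidates the cached prefix.
SYSTEM_PROMPT = "You are an agent. Use the provided tools to complete tasks."

def serialize(obj: dict) -> str:
    # sort_keys pins the key order so the serialized form is byte-identical
    # across runs; unstable ordering would silently break the cache.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

# Append-only transcript: new actions and observations only ever extend the
# prefix, so every earlier token remains a cache hit.
context: list[str] = [serialize({"role": "system", "content": SYSTEM_PROMPT})]

def append_event(event: dict) -> None:
    context.append(serialize(event))
```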

KV Cache Impact on Model Latency and Cost

2. Masking Rather Than Removing

Managing Agent Action Space

As agent capabilities grow, the action space becomes increasingly complex and the number of tools explodes, especially once user-defined tools enter the mix; the agent then becomes more likely to choose the wrong action or take an inefficient path. The Manus team experimented with dynamic action spaces but found that, unless absolutely necessary, tools should not be added or removed mid-iteration, for two reasons. First, tool definitions usually sit near the front of the context, so any change invalidates the KV-cache for all subsequent actions and observations. Second, if earlier actions and observations refer to tools no longer defined in the current context, the model becomes confused, often producing schema violations or hallucinated actions.

Manus's Approach: Context-Aware State Machine

Manus manages tool availability with a context-aware state machine: rather than removing tools, it masks token logits during decoding to prevent (or enforce) the selection of certain actions. Most model providers and inference frameworks support some form of response prefill, which makes it possible to constrain the action space without modifying the tool definitions.

Manus Optimizes Model Behavior by Adjusting Tool Availability

Hermes Format: Function Calling Modes

Auto Mode

Model can choose whether or not to call a function.

Required Mode

Model must call a function, but which function is unconstrained.

Specified Mode

Model must call a function from a specific subset.

Manus also designs action names with consistent prefixes (e.g., all browser-related tools start with 'browser_', command-line tools with 'shell_'), making it convenient to force the agent to choose from a specific group of tools in a given state.
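
A hedged sketch of how the combination of response prefill and prefix-grouped tool names might be wired up; the state names, mapping, and prefill strings are illustrative assumptions, not Manus's actual implementation:

```python
# Map each agent state to an allowed tool-name prefix. Because all browser
# tools start with "browser_", one prefix constrains a whole tool family
# without touching the tool definitions (and without breaking the KV-cache).
STATE_TO_PREFIX = {
    "reply_to_user": None,      # auto: calling a tool is optional
    "must_act": "",             # required: any tool, choice unconstrained
    "browse_only": "browser_",  # specified: restrict to the browser group
}

def assistant_prefill(state: str) -> str:
    prefix = STATE_TO_PREFIX[state]
    if prefix is None:
        return ""  # model may answer in plain text or call a tool
    # Prefilling up to (and including) the start of the function name forces
    # a tool call; a non-empty prefix narrows decoding to that tool family.
    return '<tool_call>{"name": "' + prefix

print(assistant_prefill("browse_only"))  # <tool_call>{"name": "browser_
```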

3. Using File System as Context

Limitations of Traditional Context Windows

Modern frontier LLMs offer context windows of 128K tokens or more, yet in real-world agentic scenarios this is often insufficient and can even become a liability. Observations can be huge (e.g., when processing web pages or PDFs) and easily exceed the limit; even when the window technically fits, model performance tends to degrade beyond a certain context length; and long inputs are expensive, since even with prefix caching every token must still be transmitted and prefilled. Many agent systems therefore truncate or compress the context, but overly aggressive compression inevitably loses information.

The "Ultimate" Context: File System

  • Unlimited in size: capacity is effectively unbounded.
  • Persistent by nature: state survives across steps and sessions.
  • Directly manipulable by the agent: it can read and write files autonomously.

Manus Uses File System as External Memory

Manus learns to read and write files on demand, using the file system not just as storage but as structured, externalized memory. Its compression strategies are always recoverable: web page content can be dropped from the context as long as the URL is retained, and a document's content can be omitted as long as its path in the sandbox is preserved. This lets Manus shrink the context without permanently losing information. The author also sees potential for State Space Models (SSMs) in agentic settings: if SSMs could master file-based memory, externalizing long-term state instead of holding it in context, their speed and efficiency might unlock a new class of agents.
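
A minimal sketch of such recoverable compression (the sandbox path and helper name are hypothetical, not Manus's API):

```python
from pathlib import Path

SANDBOX = Path("/tmp/agent_sandbox")  # illustrative sandbox root

def compress_web_observation(obs: dict) -> dict:
    """Spill bulky page content to a file, keeping only the URL and path.

    The compression is recoverable: the agent can re-read the file (or
    re-fetch the URL) later, so nothing is permanently lost.
    """
    SANDBOX.mkdir(parents=True, exist_ok=True)
    path = SANDBOX / f"page_{abs(hash(obs['url']))}.html"
    path.write_text(obs["content"])
    return {"url": obs["url"], "saved_to": str(path), "content": "<omitted>"}

obs = {"url": "https://example.com", "content": "<html>...big page...</html>"}
print(compress_web_observation(obs))
```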

4. Manipulating Attention Through Reiteration

The Role of 'todo.md' in Attention Guidance

When handling complex tasks, Manus tends to create a todo.md file and steers attention by progressively updating it and checking off completed items. This is a deliberate mechanism: a typical Manus task takes around 50 tool calls, a long loop, and because Manus relies on an LLM for decision-making, over long contexts and complex tasks it can drift off-topic or forget early goals (the lost-in-the-middle problem and goal misalignment).

Reiteration for Focus

By constantly rewriting the todo list, Manus recites its objectives into the end of the context, pushing the global plan into the model's recent attention span. Without any architectural change, this uses plain natural language to bias the model's focus toward the task objectives.
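
A minimal sketch of this recitation loop (the `todo` structure and rendering are illustrative assumptions):

```python
# Append-only transcript; the freshest todo rendering always sits at the end.
context: list[str] = []
todo = {"research pricing": True, "draft report": False, "send summary": False}

def recite_todo() -> None:
    # Rewrite the whole list and append it to the *tail* of the context,
    # pulling the global plan back into the model's recent attention span.
    lines = [f"- [{'x' if done else ' '}] {task}" for task, done in todo.items()]
    context.append("Current plan (todo.md):\n" + "\n".join(lines))

todo["draft report"] = True  # check off a completed item...
recite_todo()                # ...then restate the entire plan at the end
print(context[-1])
```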

(Diagram: Create/Update todo.md → Complete/Check Items → Reiterate Context Focus)

Manus Guides Model Attention by Reiteration of Todo List

5. Retaining Error Information

Embracing Failure for Learning

Agents will make mistakes; that is normal, not a bug: LLMs hallucinate, environments return errors, external tools misbehave, and unexpected edge cases always appear. In multi-step tasks, failure is part of the loop, not an exception. The common impulse is to hide errors (clean up traces, retry actions, or reset model state), but this comes at a cost: erasing failures removes the evidence the model could learn from.

Manus's Strategy: Retain Error Traces

  • When the model sees a failed action and the resulting observation or stack trace, it implicitly updates its internal beliefs.
  • This shifts its prior away from similar actions, reducing the chance of repeating the same mistake (see the sketch below).
  • Error recovery is one of the clearest indicators of true agentic behavior, though it remains underrepresented in benchmarks.
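
A minimal sketch of keeping failure evidence in the transcript (the structure and names are illustrative):

```python
import traceback

context: list[dict] = []  # append-only transcript of actions/observations

def run_action(name: str, fn, *args) -> None:
    context.append({"action": name, "args": args})
    try:
        result = fn(*args)
        context.append({"observation": repr(result)})
    except Exception:
        # Keep the stack trace in context instead of cleaning it up and
        # retrying silently: the model sees the failure and shifts its prior.
        context.append({"observation": "ERROR\n" + traceback.format_exc()})

run_action("divide", lambda a, b: a / b, 1, 0)  # fails, but leaves evidence
print(context[-1]["observation"])
```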

Retaining Error Traces Enables Agent to Learn from Historical Experience

6. Avoiding the Few-Shot Trap

The Pitfalls of Over-reliance on Few-Shot Prompting

Few-shot prompting is a common technique for improving LLM output, but it can backfire in agent systems. Language models are excellent imitators: they mimic the behavior patterns in their context. If the context is filled with similar action-observation pairs, the model tends to follow that pattern even when it is no longer optimal. This is dangerous in tasks with repetitive decisions or actions: for instance, when Manus helps review a batch of 20 resumes, the agent often falls into a rhythm, repeating similar actions simply because that is what it sees in the context, which can lead to drift, overgeneralization, or hallucination.

Solution: Increase Diversity

  • Manus introduces small amounts of structured variation into actions and observations.
  • For example, different serialization templates, alternate phrasing, or minor noise in ordering and formatting.
  • This controlled randomness breaks up patterns and nudges the model's attention, preventing the agent from becoming brittle in an overly uniform context (a minimal sketch follows).
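
A minimal sketch of such controlled variation in observation serialization (the templates are illustrative assumptions):

```python
import random

# Several equivalent renderings of the same observation. The content never
# changes; only the serialization varies, which is enough to break the
# mimicry rhythm without losing information.
TEMPLATES = [
    "Reviewed resume {i}: {summary}",
    "Resume #{i}. Notes: {summary}",
    "[{i}] candidate summary: {summary}",
]

def render_observation(i: int, summary: str) -> str:
    return random.choice(TEMPLATES).format(i=i, summary=summary)

for i in range(1, 4):
    print(render_observation(i, "5 yrs backend, strong Python"))
```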

Over-Reliance on Few-Shot Patterns Can Lead to Rigid Agent Behavior