Scaling Laws and the Road to Human-Level AI
Speaker: Jared Kaplan, Co-founder of Anthropic
Source: YouTube - Scaling and the Road to Human-Level AI
Core Message: AI Progress Driven by Scaling Laws
The key driver of current AI progress is "scaling laws": as compute and data scale up across the pre-training and reinforcement learning stages, AI performance improves predictably. This progress stems from the discovery of a systematic method for improving AI capabilities, not solely from researcher ingenuity.
Kaplan goes on to discuss the key elements still needed to reach artificial general intelligence (AGI), or human-level AI, including relevant organizational knowledge, memory, and more fine-grained oversight. He sees vast potential for AI in multi-modal processing, complex task handling, and efficient collaboration with humans, eventually taking on longer-horizon and broader tasks, potentially even replacing the work of entire organizations or scientific fields.
Jared Kaplan's Background & AI View Shift
Spent most of his career as a theoretical physicist, influenced by his science-fiction-writer mother and driven to understand how the universe works; grew frustrated by the slow pace of progress in physics.
As a student, he found the AI of the day (a field then roughly 50 years old, exemplified by methods like SVMs) unexciting, and was initially very skeptical of AI's importance.
Convinced by friends that AI had become an exciting field, he joined Anthropic, beginning his roughly six years of work in AI.
Now a leading researcher and co-founder at Anthropic, focusing on AI capabilities and scaling.
How Contemporary AI Models Work
Modern AI models such as Claude and ChatGPT are trained in two fundamental stages.
1. Pre-training
The model learns to predict the next word/token across a massive corpus of human-generated data (primarily text, now increasingly multi-modal). This teaches it the underlying correlations and patterns in the data.
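The pre-training objective is simple enough to sketch in a few lines. The following is a generic illustration, not Anthropic's actual training code, assuming a PyTorch-style autoregressive `model` that maps token ids to per-position vocabulary logits:

```python
# Minimal sketch of the pre-training objective: next-token prediction.
# `model` is an assumed autoregressive network; names are illustrative.
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens: torch.Tensor) -> torch.Tensor:
    # tokens: (batch, seq_len) integer token ids from the training corpus
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    # Cross-entropy between the predicted distribution and the actual next token
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```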
2. Reinforcement Learning (RL)
The model is then refined using human feedback (e.g., people choosing the better of two responses) to optimize for helpful, honest, and harmless behavior and to suppress undesirable outputs.
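One common way to turn such pairwise human choices into a training signal is a Bradley-Terry-style reward model. The talk does not specify Anthropic's exact recipe, so the sketch below is a standard illustration with an assumed `reward_model` scorer:

```python
# Hedged sketch of the preference-learning step behind RLHF: train a
# reward model so the response humans chose scores above the rejected one.
# `reward_model` is an assumed scorer returning one scalar per example.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected) -> torch.Tensor:
    r_chosen = reward_model(prompt, chosen)      # (batch,) scalar rewards
    r_rejected = reward_model(prompt, rejected)  # (batch,) scalar rewards
    # -log sigmoid(r_chosen - r_rejected) is minimized when the chosen
    # response is scored well above the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```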
In Summary: AI models first learn to predict the next token, then are refined through reinforcement learning to perform useful tasks.
The Discovery & Importance of Scaling Laws
Pre-training Scaling Law: Predictable Performance Gains
About 5-6 years ago, Kaplan's team discovered precise scaling laws in the pre-training phase: AI model performance improves in a consistent, predictable manner as scale (compute, data, model size) increases.
"Dumbest Questions": Kaplan's physicist mindset led him to ask simple, fundamental questions about data and model size, which revealed astonishingly precise trends, comparable to those found in physics or astronomy.
AI Model Performance vs. Scale (Pre-training)
This chart illustrates the consistent, predictable improvement in AI model performance as compute and data scale increase; the trend holds across many orders of magnitude.
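What makes these trends "precise" is that they are power laws: on log-log axes they become straight lines, so the exponent can be read off with a simple linear fit. A minimal sketch on synthetic data (the exponent 0.05 is illustrative, loosely in the range reported in the scaling-law literature):

```python
# Sketch: a scaling law L(C) = a * C^-alpha is a straight line on
# log-log axes, so fitting log(loss) vs. log(compute) recovers alpha.
# The data below is synthetic, purely to illustrate the procedure.
import numpy as np

compute = np.logspace(0, 8, 9)         # compute spanning 9 orders of magnitude
loss = 6.0 * compute ** -0.05          # hypothetical power law
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"fitted: L(C) = {np.exp(intercept):.2f} * C^{slope:.3f}")
# -> fitted: L(C) = 6.00 * C^-0.050
```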
Reinforcement Learning Scaling Law: Beyond Pre-training 强化学习扩展法则: 超越预训练
Approximately 4 years ago, researchers like Andy Jones extended this discovery to the reinforcement learning phase, using simpler games like Hex to demonstrate similar linear trends between training input and model performance (e.g., ELO score). 大约4年前,像Andy Jones这样的研究员将这一发现扩展到了强化学习阶段,利用Hex等更简单的棋盘游戏,展示了训练投入与模型表现(如ELO评分)之间类似的线性趋势。
Core Argument: AI progress is not solely due to researcher intelligence, but rather a systematic, scalable method of making AI better through increased compute in both pre-training and RL.核心论点: AI的进步并非仅仅源于研究人员的智慧,而是由于找到了通过在预训练和强化学习中扩大计算资源来系统性提升AI能力的简单方法,并且正在持续执行。
AI Model Performance vs. Training (Reinforcement Learning) AI模型性能与训练关系 (强化学习)
This chart demonstrates the continued performance improvement in reinforcement learning with increased training, measured by an ELO-like score. 此图表展示了强化学习阶段随着训练投入的增加,性能(以ELO类评分衡量)的持续提升。
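For intuition, the trend reported in the Hex experiments has playing strength rising roughly linearly in the logarithm of training compute. The sketch below uses an invented slope and offset purely to show the shape of such a relationship:

```python
# Illustrative shape of the RL scaling trend: Elo roughly linear in
# log(training compute). Numbers are made up, not from the Hex paper.
import numpy as np

train_compute = np.logspace(13, 19, 7)                 # hypothetical FLOP budgets
elo = 800.0 + 250.0 * np.log10(train_compute / 1e13)   # assumed linear-in-log trend
for c, e in zip(train_compute, elo):
    print(f"{c:.0e} FLOPs -> Elo {e:.0f}")
```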
AI Capabilities: Flexibility & Task Time
Understanding AI Performance Across Dimensions
Kaplan evaluates AI capabilities along two critical dimensions: flexibility/scope of applicability, and task completion time.
Flexibility / Scope of Applicability (Y-axis)
The AI's ability to adapt to different modalities and contexts. Early systems like AlphaGo were superintelligent but confined to specific domains (e.g., Go). Modern LLMs are steadily progressing at handling the range of modalities humans work with (text, images, and more).
Task Completion Time (X-axis)
The AI's capacity to complete tasks that would take humans a significant amount of time. This capability is growing continuously, with models able to handle tasks of ever-longer duration.
Research from METR suggests the task duration AI models can complete roughly doubles every 7 months. This exponential growth implies that future AI could perform tasks spanning days, weeks, months, or even years, potentially replacing the work of entire organizations or scientific fields.
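As a back-of-envelope projection of that doubling trend (the 1-hour starting horizon below is an assumption for illustration, not a figure from the talk):

```python
# If the task horizon doubles every ~7 months, then
# horizon(t) = horizon_0 * 2^(t / 7), with t in months.
def task_horizon_hours(months_from_now: float,
                       current_horizon_hours: float = 1.0,   # assumed baseline
                       doubling_months: float = 7.0) -> float:
    return current_horizon_hours * 2 ** (months_from_now / doubling_months)

print(task_horizon_hours(12))  # ~3.3 h one year out
print(task_horizon_hours(52))  # ~172 h (roughly a work-month) in ~4.3 years
```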
AI Capability Landscape
This conceptual matrix plots AI capabilities along the two dimensions above, illustrating the progression from narrow, short-task AI toward more flexible models capable of longer tasks.
The Road to Human-Level AI: Missing Elements
To unlock human-level AI in the broad sense, Kaplan identifies three key remaining elements, all relatively straightforward to address.
1. Relevant Organizational Knowledge
AI models need to be trained with contextual knowledge, like an employee with years of experience at a company, organization, or government, rather than starting from a blank slate. This lets them effectively process and apply domain-specific information.
2. Memory
Beyond knowledge, AI needs the ability to track task progress and to build and use relevant memories over extended, time-consuming tasks. Claude 4 is beginning to develop this capability, which will be crucial for long-horizon task execution.
3. Oversight / Fine-grained Supervision
AI models must get better at understanding nuance and solving difficult, ambiguous tasks. This includes developing AI that can generate more detailed reward signals for subjective tasks (e.g., writing good poetry) where right and wrong are less clear-cut.
Complex Tasks: Training AI models to execute increasingly complex tasks, evolving from text to multi-modal and robotic domains, should continue to yield significant gains in the coming years.
How to Prepare for the Future of AI
1. Build Products on the AI Frontier
Experiment with products that don't quite work with current AI models, anticipating that rapid improvements (e.g., Claude 5) will soon make them viable and valuable.
2. Leverage AI to Integrate AI
The pace of AI development is too fast for manual integration. Use AI itself to accelerate the work of integrating AI into products, companies, and science.
3. Identify New Rapid-Integration Areas
Beyond software engineering (where AI integration is already booming), ask "what's next?" Which other sectors offer similarly explosive growth potential for AI integration?
Claude 4's Progress & Future Outlook
Key Improvements in Claude 4
- Coding Capability: Significant gains in agentic coding and search applications. Claude 3.7 Sonnet was exciting, but Claude 4 is even better as an agent.
- Supervision: Better at following instructions and producing higher-quality code.
- Memory: Improved ability to retain and store memories, enabling very complex tasks to continue across multiple context windows. This is the most exciting feature, unlocking longer-horizon tasks.
Scaling Laws & Future Vision
Gradual Progress to AGI
Scaling laws predict a smooth, gradual curve toward human-level AI (AGI), with models like Claude improving steadily with each release. Kaplan envisions Claude evolving into a collaborator that takes on ever-larger portions of work, moving from hour-long tasks to much longer engagements.
12-Month Outlook: If no better models emerge within 12 months, that would signal a problem for continued progress.
The Future of Human-AI Collaboration
Evolving Roles and Performance Thresholds
AI can achieve "absolutely brilliant" things yet still make "stupid mistakes." Unlike humans, whose judgment and generation abilities differ, AI's generation and judgment capabilities are more closely matched.
Humans' Evolving Role:
Humans will increasingly act as "managers" or "sanity checkers" for AI's work, especially on complex tasks. This role is evolving rapidly, from "co-pilot" products (requiring human approval) to the "full workflow replacement" solutions seen in recent YC batches.
Acceptable Performance Levels:
- 70% Accuracy: Sufficient for some tasks.
- 99.9% Accuracy: Required for deployment on critical tasks.
Kaplan suggests building in scenarios where 70-80% accuracy is acceptable, in order to truly push the AI capability frontier.
AI Intelligence: Depth vs. Breadth
Deep Intelligence:
Solving highly specific, difficult problems (e.g., spending 10 years proving Fermat's Last Theorem).
Broad Intelligence:
Integrating vast amounts of information across diverse fields (biology, psychology, history). AI excels here, having absorbed the knowledge of human civilization during pre-training, which lets it draw insights across many specialized domains (e.g., biomedical research).
Kaplan anticipates more applications that exploit AI's breadth, especially in fields requiring extensive knowledge integration. While AI is improving at deep tasks, its ability to synthesize across domains is its unique strength.
New Horizons for AI Applications
Identifying Untapped Potential
Despite his research background, Kaplan sees opportunities in any domain that requires significant skill and primarily involves computer-based work with data.
Finance:
Professionals rely heavily on spreadsheets (Excel), a prime area for AI augmentation.
Legal:
Although likely more regulated and requiring professional certification, it is another area with heavy information interaction.
Integrating AI into Existing Businesses
The "Electricity" Analogy:
When electricity first emerged, the most obvious use, swapping electric motors in for steam engines, was not the most impactful one; the real gains came from completely reshaping how factories operated. Similarly, AI's greatest leverage will come from fundamentally rethinking how it is integrated into every part of the economy.
Physics Background: A Unique Advantage in AI Research
Key Advantages from Physics Training
- Macro-Trend Identification: Physics trains you to look for the biggest picture and the most macroscopic trends, and to make them as precise as possible (e.g., distinguishing exponential, power-law, and quadratic growth).
- The "Holy Grail" of Scaling Laws: The pursuit of a better slope (efficiency) in scaling, meaning that additional compute yields a disproportionately larger advantage over other AI developers (see the worked illustration after this list).
- Applying Approximation Methods: Familiarity with the mature approximation methods physics and mathematics use for very large systems (such as billion-parameter neural networks built from large matrices).
- Asking "Naive Questions": The willingness to ask very basic, "stupid" questions, which can lead to major breakthroughs in a field as young as AI (only 10-15 years old in its current form).
Physics vs. AI Challenges: Interpretability
Interpretability:
Understanding how AI models truly operate is less like physics and more like biology or neuroscience. Unlike the human brain, though, AI systems let you measure everything, providing abundant data for reverse-engineering how they work.
Scaling Laws: Limitations & Challenges
Diagnosing Deviations from the Curve
Kaplan acknowledges that predicting when scaling laws might change is a very hard question. His primary use for scaling laws is diagnosing when AI training is "failing."
When Scaling Laws "Fail":
If a scaling law appears to break down, it is typically because something in the AI training process is "messed up": a flawed neural network architecture, an undiscovered bottleneck in training, or a problem with algorithmic precision.
Based on five years of experience, Kaplan says it would take substantial evidence to convince him that scaling laws have genuinely stopped working at the empirical level, since past apparent failures have consistently turned out to be errors in their own approach.
AI Efficiency & Jevons Paradox
Current Inefficiency & Future Cost Reduction
Today's AI is inefficient because the value of unlocking the most powerful frontier models is extremely high. Companies like Anthropic prioritize unlocking frontier capabilities while also working on efficiency.
Efficiency Gains:
Algorithmic and compute-scaling efficiency improves by 3x to 10x annually. Expect even lower precision (e.g., FP2) in the future for greater inference efficiency.
The AI development landscape is in a "very non-equilibrium" state: rapid improvements keep unlocking new capabilities, so it is unclear whether it will ever reach a stable "equilibrium" in which AI becomes extremely cheap.
The Jevons Paradox in AI
Jevons Paradox:
As a resource becomes more efficient or cheaper, its consumption increases. In AI, as models improve, demand for them rises, potentially offsetting cost reductions.
Kaplan fully agrees: as AI systems become more capable and can perform more work, people will pay more for cutting-edge capabilities. The most powerful models, capable of executing complex tasks end to end, are expected to capture the majority of the value.
Advice for Young AI Practitioners
Master Model Utilization & Integration
Deeply understand how these models work, and develop the skills to use them efficiently and integrate them into a wide range of applications and workflows.
Build & Experiment at the Frontier
Actively build and experiment at the cutting edge of AI capabilities; these frontiers are expanding rapidly and new opportunities keep emerging.
Audience Questions & Jared Kaplan's Insights
Q1: Linear Scaling Gains vs. Exponential Growth in Task Duration
Question: If scaling laws show performance gains that are only linear in exponentially increasing compute, why does task duration show exponential growth?
Kaplan's Response: He doesn't know the exact answer, noting that METR's finding is empirical. He speculates that growth in task duration may hinge on the AI's capacity for self-correction: relatively mild gains in intelligence let a model notice and correct its own errors, which can double the scope of tasks it completes. Even modest smartness gains may therefore unlock tasks of ever-longer duration.
Q2: Verification Signals for RL Tasks & AGI
Question: Increasing task duration requires verification signals (e.g., product deployment for coding). How do other domains get these signals? Will reaching AGI require massive human annotation, or are there better methods?
Kaplan's Response: The "worst case" is having to continually build ever more complex, longer tasks and train on them via RL, which people will do given the investment in AI and the value it creates. A better method is to train one AI model to supervise another, providing detailed feedback on complex, subjective tasks that goes beyond simple right/wrong (e.g., "this part was done well; that part was not"). This AI-driven supervision is already happening to some extent, making training for long-duration tasks more efficient.
Q3: Creation of RL Tasks (Human vs. AI)
Question: Are these RL tasks created using large language models, or still primarily by humans?
Kaplan's Response: It's a hybrid approach. They use AI as much as possible (e.g., having AI generate coding tasks) while also involving humans in task creation. As AI improves they aim to lean on it more, but the frontier of task difficulty keeps rising, so human involvement remains necessary.