<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Breezedeus.com</title>
        <link>https://www.breezedeus.com/</link>
        <description>Benevolent AI creates happiness ❤</description>
        <lastBuildDate>Thu, 12 Feb 2026 07:51:16 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>zh-CN</language>
        <copyright>All rights reserved 2026, Breezedeus</copyright>
        <item>
            <title><![CDATA[Latest Papers on GUI Agents]]></title>
            <link>https://www.breezedeus.com/article/awesome-ui-agents</link>
            <guid>https://www.breezedeus.com/article/awesome-ui-agents</guid>
            <pubDate>Sat, 09 Nov 2024 00:00:00 GMT</pubDate>
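            <description><![CDATA[Anthropic recently released Computer Use for Claude, and Zhipu released AutoGLM for Phone Use. Both rely on UI Agents technology, letting an agent operate a computer or phone the way a person would in order to complete a given task. This article lists the latest papers and resources on UI Agents and is continuously updated…]]></description>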
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-139c0110d3318079a832f109869738f6"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-row notion-block-139c0110d33181fd9a9fdeb9f798a344"><div class="notion-column notion-block-139c0110d331818d9adfd4dede7d3c5e" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.25)"><div class="notion-blank notion-block-139c0110d33181068668de0d19d19e2c"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-139c0110d3318121bc45e429c567c838" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.5416666666666667)"><div class="notion-text notion-block-139c0110d33181609369f06546dbb7b6"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/">Home</a></b><b> | </b><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus">GitHub</a></b><b> | </b><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://twitter.com/breezedeus">Twitter</a></b><b> | </b><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.youtube.com/@breezedeus">YouTube</a></b><b> | </b><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://space.bilibili.com/509307267">Bilibili</a></b></div></div><div class="notion-spacer"></div><div class="notion-column notion-block-139c0110d331816880fbeb7e99e6dcc1" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.2083333333333335)"><div class="notion-blank notion-block-139c0110d3318180a7e7f7fb4c366231"> </div></div><div class="notion-spacer"></div></div><div class="notion-row notion-block-139c0110d3318167aa02e044a71eed92"><div class="notion-column notion-block-139c0110d331813ca223e69b1ef0f334" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><figure 
class="notion-asset-wrapper notion-asset-wrapper-image notion-block-139c0110d331813a8e76fb171db60410"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:384px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2Fdb46f0ea-9d28-4b46-bdf6-bb477e96bd5c%2Fimage.png?table=block&amp;id=139c0110-d331-813a-8e76-fb171db60410&amp;t=139c0110-d331-813a-8e76-fb171db60410&amp;width=384&amp;cache=v2" alt="notion image" loading="lazy" decoding="async"/></div></figure></div><div class="notion-spacer"></div><div class="notion-column notion-block-139c0110d331812ca9ded6139e7f5634" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><div class="notion-text notion-block-139c0110d33181199aa1db35ca0e554c"><b>Contents:</b></div><div class="notion-table-of-contents notion-gray notion-block-139c0110d331814bb087f605b0e4b7a6"><a href="#139c0110d331813b9692f9056a7960fd" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">UI Agents Knowledge Planet</span></a><a href="#139c0110d331802aa42ddd658b6e405a" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">UI Agents Surveys [Updated 2025.03.22]</span></a><a href="#139c0110d33180db9fcdfb72773858e4" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">UI Agents Paper List [Updated 2025.03.22]</span></a><a href="#151c0110d33180388fe1feb3f5c8721e" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Policy Models</span></a><a href="#151c0110d3318044b605dbd983cb9c2c" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" 
style="display:inline-block;margin-left:48px">Training-based Models</span></a><a href="#151c0110d33180988f5bcd9a57f7e13f" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:48px">Training-free Models</span></a><a href="#151c0110d33180bf818bd4f09dbb6a1d" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Enhanced Knowledges</span></a><a href="#151c0110d33180cc8dbcdc2eb0427a7b" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Data Synthesis</span></a><a href="#139c0110d331804687a9e6aa54725c37" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Datasets / Benchmarks</span></a><a href="#151c0110d3318074bc5bdac6be5ace96" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Tools / Environments</span></a><a href="#165c0110d33180c385d3cd8b1f7c8a8d" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Others</span></a><a href="#139c0110d331805895a9d1467418e470" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Other UI Agents Roundups</span></a></div></div><div class="notion-spacer"></div></div><div class="notion-blank notion-block-139c0110d33180f0928ee79b2b8de1a9"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-139c0110d331813b9692f9056a7960fd" data-id="139c0110d331813b9692f9056a7960fd"><span><div id="139c0110d331813b9692f9056a7960fd" class="notion-header-anchor"></div><a class="notion-hash-link" href="#139c0110d331813b9692f9056a7960fd" title="UI Agents Knowledge Planet"><svg viewBox="0 0 16 16" width="16" 
height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">UI Agents Knowledge Planet</span></span></h2><div class="notion-text notion-block-139c0110d33181c5a8e2c61fff2b12ba">UI Agents technology is advancing rapidly. Want to stay at the cutting edge of UI Agents? Every week, our Knowledge Planet (知识星球) community <b>breaks down the latest papers</b> in video form to broaden your technical horizons. Come join us!</div><div class="notion-sync-block notion-block-152c0110d33181f090d9e4997c8eca1f"><div class="notion-row notion-block-152c0110d33181c08dc6d7b8428cdfc2"><div class="notion-column notion-block-152c0110d3318133876cd29e6b65c0c0" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><div class="notion-text notion-block-152c0110d33181198aaee36db2ec5d05">Join Knowledge Planet to get members-only videos every week👇</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-1abc0110d33180bfa5cdc4bad8debfbc"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A5f7a73f8-7130-42b3-8aee-ddff88aad100%3Aimage.png?table=block&amp;id=1abc0110-d331-80bf-a5cd-c4bad8debfbc&amp;t=1abc0110-d331-80bf-a5cd-c4bad8debfbc" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-152c0110d331814191e7c1bfedb83d9a"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-152c0110d33181009259fa3bca4ad6d3" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><div class="notion-text notion-block-152c0110d33181be9604d9b7609c8db2">Scan the QR code to add our WeChat assistant as a friend with the note “agent”; the assistant periodically invites new members to the group👇</div><figure class="notion-asset-wrapper notion-asset-wrapper-image 
notion-block-154c0110d33180528fd6e63e8ad53371"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2Feed2e715-74df-4361-bd15-bc084f0d791f%2Fimage.png?table=block&amp;id=154c0110-d331-8052-8fd6-e63e8ad53371&amp;t=154c0110-d331-8052-8fd6-e63e8ad53371&amp;width=331.9952087402344&amp;cache=v2" alt="notion image" loading="lazy" decoding="async"/></div></figure></div><div class="notion-spacer"></div></div></div><div class="notion-sync-block notion-block-151c0110d33180b3ba16fe7b239b5be6"><div class="notion-text notion-block-151c0110d33180d28b7bec1e1f98963b"><b>Members-only videos currently available on Knowledge Planet:</b></div><ul class="notion-list notion-list-disc notion-block-2b9c0110d33180f3ab7cfd5dbdd069b3"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1fcSEBMEzr">Context Engineering in AI Agents</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-266c0110d331804eac48ca0f99b0ade2"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV13wuozDExH">A Survey of the Latest GUI Agents Techniques (2025)</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-266c0110d3318066b907e494b7b212d4"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1EeTvzaEBw">Latest GUI Agent Techniques: MONDAY - Automatically Building GUI Agent Trajectory Data from Videos</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-1e4c0110d33180cbb9eaf9db68f4ced0"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1MVG1zsEB7">
Latest GUI Agent Techniques: InfiGUI-R1 - Advancing from Reactive Execution to Reasoning-Based Decision-Making</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-1e4c0110d3318097a0b3dc7029483852"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1bmdzYzEty">Latest GUI Agent Techniques: What Can Autonomous Driving and Embodied AI Teach Us?</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d3318065a5b1ec3028aaefdf"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1uyRhY2EFi">Latest GUI Agent Techniques: ATLaS - Improving Training Efficiency and Model Generalization at the Same Time</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-1acc0110d3318034a55ae1fe3e709812"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1gm96YrEQY">GUI Agent Tech Talk: DigiQ/VEM - Using RL to Improve Model Generalization</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d331806580dec9dc260154e7"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1pPFceFE42">UI Agent Tech Talk: UI-TARS - Iteratively Optimizing the Model with Long-Term Memory and Reflective Tuning</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-185c0110d331800f9687c1f6b3d5c83b"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1itwkerEyu">AI Agent Tech Talk: Insight-V - Exploring Long-Chain Visual Reasoning in VLMs</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-17ac0110d331809a9d2dc345f7df87f5"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1hZcGe1ELm">UI Agent Tech Talk: PC-Agent - Improving Model Cognition to Better Complete Complex Tasks</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d331801e9074e847bbeaaefd"><li><span class="notion-blue"><b><a target="_blank" 
rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1aKrTY7EWB">UI Agent Tech Talk: OS-Genesis - Automatically Synthesizing High-Quality, Diverse Training Data</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d33180fdad96fafb9cd8623d"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1u26hY5Eyw">UI Agent Tech Talk: PAE - Continuously Expanding Model Capabilities by Autonomously Exploring New Tasks</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d331804b9542c46d6f309176"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1dgCNYXEfa">UI Agent Tech Talk: Iris - Improving Model Performance with Automatically Constructed Data</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180f79e4efbc20c6e3980"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV17VqfYfEjk">UI Agent Tech Talk: Falcon-UI - Pretraining UI Agent Models on Unsupervised Data</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-15cc0110d3318068b5f7cb3a2a4e3d24"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1erqxYhEBc">UI Agent Tech Talk: Aguvis - A Unified Training Dataset and Framework from HKU &amp; Salesforce</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d3318043887fd27956e15eb6"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1U86FY9E1G">UI Agent Tech Talk: ShowUI - The Best Open-Source UI Agents Model Today; Does It Also Work on Chinese Apps?</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180b98171fcbc2934024c"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1pjBtYnE6C">UI Agent Tech Talk: Can World Models Improve UI Agents?</a></b></span></li></ul><ul 
class="notion-list notion-list-disc notion-block-151c0110d33180d0a0c2c69318bc6ff3"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV14eU7YWEEs">UI Agent Tech Talk: LiMAC from Huawei Noah's Ark Lab</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180eda815f4f6a769ae7e"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1c1mpYtEqG">UI Agent Tech Talk: Auto-Intent from LG AI Research</a></b></span></li></ul><div class="notion-blank notion-block-151c0110d33180e3802cefd4b14a659c"> </div></div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-139c0110d331802aa42ddd658b6e405a" data-id="139c0110d331802aa42ddd658b6e405a"><span><div id="139c0110d331802aa42ddd658b6e405a" class="notion-header-anchor"></div><a class="notion-hash-link" href="#139c0110d331802aa42ddd658b6e405a" title="UI Agents Surveys [Updated 2025.03.22]"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">UI Agents Surveys [Updated 2025.03.22]</span></span></h2><ul class="notion-list notion-list-disc notion-block-139c0110d33180fab917dd28dcc68af8"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/ui-agent">A Technical Survey of UI Agents</a></b></span><span class="notion-blue"><b>, </b></span><span class="notion-blue"><b><a target="_blank" 
rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1CtDWYzE9b">Bilibili</a></b></span><span class="notion-blue"><b>, </b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://youtu.be/YAhXGjV25zU">YouTube</a></b></span><span class="notion-blue"><b>, Breezedeus, 2024.11</b></span></li></ul><ul class="notion-list notion-list-disc notion-block-1bec0110d33180dca6dfd98784780483"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.11069">[2503.11069] API Agents vs. GUI Agents: Divergence and Convergence</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d3318084be70f8a239e7539a"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.16150">[2501.16150] AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-186c0110d331801c9235c0e4de7a8eca"><li><span class="notion-blue">[2501] </span><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://os-agent-survey.github.io/">OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use</a></span><span class="notion-blue">, OPPO</span></li></ul><ul class="notion-list notion-list-disc notion-block-161c0110d331804685dff5ed839ac1fa"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.13501">[2412.13501] GUI Agents: A Survey</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-161c0110d3318025a284e7464a0ef4ae"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.10047">[2412.10047] Large Action Models: From 
Inception to Implementation</a></span><span class="notion-blue">, Microsoft</span></li></ul><ul class="notion-list notion-list-disc notion-block-17ac0110d3318069b528cf1264687206"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.10943">[2411.10943] Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-14fc0110d331801fb1d7cf1d14cfb27a"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.18279">[2411.18279] Large Language Model-Brained GUI Agents: A Survey</a></span><span class="notion-blue">, Microsoft</span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180eaa1a6c98ae41e0082"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.10323">[2411.10323] The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d33180b9a9efc08680ebddbb"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.04890">[2411.04890] GUI Agents with Foundation Models: A Comprehensive Survey</a></span><span class="notion-blue">, Huawei Noah’s Ark Lab</span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d331807aa66ec9455628cb40"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.02006">[2411.02006] Foundations and Recent Trends in Multimodal Mobile Agents: A Survey</a></span></li></ul><div class="notion-blank notion-block-14fc0110d33180509866cb6676aa0011"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-139c0110d33180db9fcdfb72773858e4" 
data-id="139c0110d33180db9fcdfb72773858e4"><span><div id="139c0110d33180db9fcdfb72773858e4" class="notion-header-anchor"></div><a class="notion-hash-link" href="#139c0110d33180db9fcdfb72773858e4" title="UI Agents Paper List [Updated 2025.03.22]"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">UI Agents Paper List [Updated 2025.03.22]</span></span></h2><div class="notion-callout notion-gray_background_co notion-block-139c0110d33180e5869cdcbe2e1c9bd0"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="⛔">⛔</span></div><div class="notion-callout-text"><div class="notion-text notion-block-ab237300da674afabed4a83c20a13d5b">Every week, <b>Knowledge Planet</b> picks one or two papers from this list and explains them in a video. If there are papers you would like covered, join Knowledge Planet and leave a comment.</div></div></div><div class="notion-blank notion-block-151c0110d33180df8949dc358cdca826"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-151c0110d33180388fe1feb3f5c8721e" data-id="151c0110d33180388fe1feb3f5c8721e"><span><div id="151c0110d33180388fe1feb3f5c8721e" class="notion-header-anchor"></div><a class="notion-hash-link" href="#151c0110d33180388fe1feb3f5c8721e" title="Policy Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 
0z"></path></svg></a><span class="notion-h-title">Policy Models</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-151c0110d3318044b605dbd983cb9c2c" data-id="151c0110d3318044b605dbd983cb9c2c"><span><div id="151c0110d3318044b605dbd983cb9c2c" class="notion-header-anchor"></div><a class="notion-hash-link" href="#151c0110d3318044b605dbd983cb9c2c" title="Training-based Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Training-based Models</span></span></h4><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180cd9df4f9f01534ff78"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.02197">[2503.02197] ATLaS: Agent Tuning via Learning Critical Steps</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180a89afec727a08d0373"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.13130">[2502.13130] Magma: A Foundation Model for Multimodal AI Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d3318013802aeeb1be1b5c59"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.02955">[2502.02955] ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-186c0110d33180668a31f5f8c514479c"><li><span class="notion-blue"><a target="_blank" rel="noopener 
noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.12326">[2501.12326] UI-TARS: Pioneering Automated GUI Interaction with Native Agents</a></span><span class="notion-blue">, ByteDance</span></li></ul><ul class="notion-list notion-list-disc notion-block-17ac0110d331805db04bfccebc151b3f"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.04575">[2501.04575] InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-17ac0110d33180d39bf2c1a935b29898"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.17589">[2412.17589] PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d33180db8daae2ff6fdcd15d"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.16256">[2412.16256] Aria-UI: Visual Grounding for GUI Instructions</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d3318092ac6cd6e53a17da8c"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.09362">[2412.09362] Falcon-UI: Understanding GUI Before Following User Instructions</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180279248eae415cd55f5"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.04454">[2412.04454] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180da89dedf80baa4df78"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" 
class="notion-link" href="https://arxiv.org/abs/2412.01268">[2412.01268] Ponder &amp; Press: Advancing Visual GUI Agent towards General Computer Control</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180b9bd67ccd9c4e89acd"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.17465">[2411.17465] ShowUI: One Vision-Language-Action Model for GUI Visual Agent</a></span><span class="notion-blue">, </span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/showlab/ShowUI">GitHub</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d3318082b861c6862491f864"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2402.15506">[2402.15506] AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180fca031e7ace5daef2c"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.13451">[2411.13451] AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180a0af62d8a3af5d91c6"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.00820">[2411.00820] AutoGLM: Autonomous Foundation Agents for GUIs</a></span><span class="notion-blue">, Zhipu</span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d33180f1af9ccf02fcb05160"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.19461">[2410.19461] EDGE: Enhanced Grounded GUI Understanding with Enriched 
Multi-Granularity Synthetic Data</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d3318004b8bbc8ece630b335"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.22916">[2410.22916] Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d33180fe827ffcf0268e1014"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2406.19263">[2406.19263] Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d33180fbb14bfabea3e06097"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2312.08914">[2312.08914] CogAgent: A Visual Language Model for GUI Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-17ac0110d3318079a5d7da6502c02e0e"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2312.15820">[2312.15820] WebVLN: Vision-and-Language Navigation on Websites</a></span></li></ul><div class="notion-blank notion-block-151c0110d33180e08721c195bd507579"> </div><div class="notion-text notion-block-151c0110d3318070a837f6d24578c9f3"><b>Reinforcement Learning</b></div><ul class="notion-list notion-list-disc notion-block-1b3c0110d331801bb4b9faf011b2c89b"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.18906">[2502.18906] VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180a6bd8ffd1638046b0b"><li><span 
class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.15760">[2502.15760] Digi-Q: Learning Q-Value Functions for Training Device-Control Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d3318044ab06ee12e4f06e10"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.12130">[2502.12130] Scaling Autonomous Agents via Automatic Reward Modeling And Planning</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d33180fd8b34ee10455c510a"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.07949">[2502.07949] VSC-RL: Advancing Autonomous Vision-Language Agents with Variational Subgoal-Conditioned Reinforcement Learning</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180c9a703d7cde7698371"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.10742">[2412.10742] WEPO: Web Element Preference Optimization for LLM-based Web Navigation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d331803dbbf1d7b4c3b156f8"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.06313">[2412.06313] Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d331804ab071c43fd6bdd99f"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.03817">[2411.03817] From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning</a></span></li></ul><ul class="notion-list notion-list-disc 
notion-block-151c0110d331807ba1bbc12895ab521f"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.02337">[2411.02337] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning</a></span><span class="notion-blue">, Zhipu</span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d3318050b46dd6a5f2210bf0"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.24218">[2410.24218] Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-16ac0110d33180dfb807cd6c9139e588"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2406.11896">[2406.11896] DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180bf9097de49665d4432"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2406.04151">[2406.04151] AgentGym: Evolving Large Language Model-based Agents across Diverse Environments</a></span></li></ul><div class="notion-blank notion-block-151c0110d331800b93e4e96ce74e654f"> </div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-151c0110d33180988f5bcd9a57f7e13f" data-id="151c0110d33180988f5bcd9a57f7e13f"><span><div id="151c0110d33180988f5bcd9a57f7e13f" class="notion-header-anchor"></div><a class="notion-hash-link" href="#151c0110d33180988f5bcd9a57f7e13f" title="Training-free Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 
00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Training-free Models</span></span></h4><div class="notion-text notion-block-1b3c0110d331803d96c5f453ac30706d"><b>Multi-Agent Framework</b></div><ul class="notion-list notion-list-disc notion-block-1bec0110d33180fcb01bdae10ea23d62"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.09572">[2503.09572] Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks</a></span><span class="notion-blue"> </span><code class="notion-inline-code">long-horizon</code></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180ecb745e65943a86244"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.03459">[2503.03459] Unified Mind Model: Reimagining Autonomous Agents in the LLM Era</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d331801b82c6ffd32af45fa9"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.16796">[2502.16796] MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180d584b4ccebfee3feaf"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.14282">[2502.14282] PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-186c0110d33180008803d32d1e015e4d"><li><span class="notion-blue"><a target="_blank" rel="noopener 
noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.11733">[2501.11733] Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks</a></span></li></ul><div class="notion-blank notion-block-1b3c0110d33180598be5fb4cec047110"> </div><div class="notion-text notion-block-1b3c0110d33180a2a3a1db38927b17e8"><b>Single-Agent Framework</b></div><ul class="notion-list notion-list-disc notion-block-1bec0110d33180b49363fc9f21c77f94"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.15937">[2503.15937] Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1bec0110d33180fb84f1f58363378ffc"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.03196">[2503.03196] SpiritSight Agent: Advanced GUI Agent with One Look</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1bec0110d33180d6b2cecccfad91652a"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.03743">[2503.03743] CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1bec0110d33180b29cd4f2193acbb368"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.13843">[2503.13843] WebNav: An Intelligent Agent for Voice-Controlled Web Navigation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1bec0110d33180a5b2a6dce5570a6abb"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.10689">[2503.10689] Learning to Contextualize Web Pages for Enhanced Decision Making by LLM 
Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1bec0110d33180b989e7ec9dce68a066"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.06580">[2503.06580] Agent models: Internalizing Chain-of-Action Generation into Reasoning models</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180ac88effd69830b73c9"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.02950">[2503.02950] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180c3af4ae6a6c9501a85"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.02268">[2503.02268] AppAgentX: Evolving GUI Agents as Proficient Smartphone Users</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d331807ba4b5ec5ff2128826"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.09215">[2502.09215] Architecture for Simulating Behavior Mode Changes in Norm-Aware Autonomous Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d331807bb3d2e440ea7e49dc"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.08226">[2502.08226] TRISHUL: Towards Region Identification and Screen Hierarchy Understanding for Large VLM based GUI Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d33180619f7bca599b9c0f81"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.07056">[2502.07056] Autonomous Deep Agent</a></span></li></ul><ul 
class="notion-list notion-list-disc notion-block-186c0110d331805ca0dde86740e338f2"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.12485">[2501.12485] R2D2: Remembering, Reflecting and Dynamic Decision Making for Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d33180559ff3f4ab11762386"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.18116">[2412.18116] AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d331802ea050d28dbe2e29e6"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.10840">[2412.10840] Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180c29c25eae29c67dc0b"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.07472">[2412.07472] SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180d1a2ecf3792ccb1cb2"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.15004">[2411.15004] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180929d8ff3da061e51b7"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.06559">[2411.06559] Is Your LLM Secretly a World Model of the Internet? 
Model-Based Planning for Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180c88ce4f5aa6e554add"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2408.06458">[2408.06458] Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d331809b9c58d27184f64e8f"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2411.13591">[2411.13591] Improved GUI Grounding via Iterative Narrowing</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d3318003a58ef61ddd0e0f74"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.19609">[2410.19609] OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d3318021a713d943fc11c19e"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2408.00203">[2408.00203] OmniParser for Pure Vision Based GUI Agent</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d3318083bb26c0cae6832e80"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2407.03913">[2407.03913] MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d331808cbddbd39ab84f1eb0"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2406.06947">[2406.06947] CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with 
Front-End UI Only</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180c8b3f0e8948c08d3e5"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2405.15341">[2405.15341] V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d3318024acf3c5de4ff514a1"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2312.11190">[2312.11190] VisionTasker: Mobile Task Automation Using Vision Based UI Understanding and LLM Task Planning</a></span></li></ul><div class="notion-blank notion-block-151c0110d33180b2a413cc6fdce81d08"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-151c0110d33180bf818bd4f09dbb6a1d" data-id="151c0110d33180bf818bd4f09dbb6a1d"><span><div id="151c0110d33180bf818bd4f09dbb6a1d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#151c0110d33180bf818bd4f09dbb6a1d" title="Enhanced Knowledge"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Enhanced Knowledge</span></span></h3><ul class="notion-list notion-list-disc notion-block-19cc0110d33180c5bac8c101c65701b1"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.11425">[2501.11425] Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training</a></span></li></ul><ul class="notion-list 
notion-list-disc notion-block-151c0110d33180298c0de158ca47b765"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.13232">[2410.13232][WMA] Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d331804ea8aaeb23fcc00dce"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.23555">[2410.23555] From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d33180c890afe5a7db33c57c"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.22552">[2410.22552] Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-186c0110d33180ee86e9c394a1282ac0"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2406.14596">[2406.14596] VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d331804abbbcc0d9db266ddf"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2405.16247">[2405.16247] AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning</a></span></li></ul><div class="notion-blank notion-block-165c0110d331806b8453f7d0783d1e57"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-151c0110d33180cc8dbcdc2eb0427a7b" 
data-id="151c0110d33180cc8dbcdc2eb0427a7b"><span><div id="151c0110d33180cc8dbcdc2eb0427a7b" class="notion-header-anchor"></div><a class="notion-hash-link" href="#151c0110d33180cc8dbcdc2eb0427a7b" title="Data Synthesis"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Data Synthesis</span></span></h3><ul class="notion-list notion-list-disc notion-block-1b3c0110d331804da1fec09758e5af8a"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.11357">[2502.11357] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d33180c1a7d3ce1e2467a892"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.07942">[2502.07942] Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d331800695dcd323ca4a8b36"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.02982">[2502.02982] FedMobileAgent: Training Mobile Agents Using Decentralized Self-Sourced Data from Diverse Users</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-186c0110d3318060a35ffde3a6ae0cc5"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" 
href="https://arxiv.org/abs/2501.13896">[2501.13896] GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-186c0110d33180fe8cafc58e66c8cc7f"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.10893">[2501.10893] Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d33180169c72cf7152dc6bee"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.19723">[2412.19723] OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d3318095a9f1f9a5f44caa7e"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.13194">[2412.13194] Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d331805995bcce529ad63db6"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.10342">[2412.10342] Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d3318073bd95c3fb16d1ca17"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.09605">[2412.09605] AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d3318067a09df7aaebca3c71"><li><span class="notion-blue"><a target="_blank" 
rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.02907">[2410.02907] NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild</a></span></li></ul><div class="notion-blank notion-block-165c0110d33180339ed6c900ed9dacdf"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-139c0110d331804687a9e6aa54725c37" data-id="139c0110d331804687a9e6aa54725c37"><span><div id="139c0110d331804687a9e6aa54725c37" class="notion-header-anchor"></div><a class="notion-hash-link" href="#139c0110d331804687a9e6aa54725c37" title="Datasets / Benchmarks"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Datasets / Benchmarks</span></span></h3><ul class="notion-list notion-list-disc notion-block-1bec0110d33180779338d23bef4d5ff2"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.09780">[2503.09780] AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1bec0110d33180f7ab10dde15c63c27f"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.04957">[2503.04957] SafeArena: Evaluating the Safety of Autonomous Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d331804c8ac6fe864c37fe6f"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.03056">[2503.03056] 
A2Perf: Real-World Autonomous Agents Benchmark</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180d1a6daee0fb9b44c97"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.02403">[2503.02403] AutoEval: A Practical Framework for Autonomous Evaluation of Mobile Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d3318000be9ef4584223f8a4"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.18356">[2502.18356] WebGames: Challenging General-Purpose Web-Browsing AI Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d3318023ab7ff6a3089d10c8"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.15840">[2502.15840] Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents</a></span><span class="notion-blue"> </span><code class="notion-inline-code">long-horizon</code></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180409361e8695afb9c97"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.02863">[2501.02863] Beyond Pass or Fail: Multi-Dimensional Benchmarking of Foundation Models for Goal-based Mobile UI Navigation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180378631e3a6cb5f254e"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2403.11905">[2403.11905] Tur[k]ingBench: A Challenge Benchmark for Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180b785cdcf50e8288866"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" 
href="https://arxiv.org/abs/2410.19100">[2410.19100] VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d331800b94cbe875b828427e"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.13053">[2502.13053] AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d3318090b84dfd2a42d28319"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.08047">[2502.08047] WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d331808d9d60cc106010b8c6"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.16609">[2501.16609] CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d33180c5bdccf2e65870ce3a"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.23252">[2410.23252] Evaluating Cultural and Social Awareness of LLM Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d33180f287f7ea3667ed59a0"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/blog/Ziyang/screenspot-pro">ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d331801f97b3c3944ce0f0f3"><li><span class="notion-blue"><a target="_blank" rel="noopener 
noreferrer" class="notion-link" href="https://arxiv.org/abs/2501.01149">[2501.01149] A3: Android Agent Arena for Mobile GUI Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d33180829a49d00779926c2e"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.18426">[2412.18426] GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180ac816df2fdb3636fef"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.17520">[2410.17520] MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d331805983d0f9303e63affd"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.04531">[2412.04531] MageBench: Bridging Large Multimodal Models to Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d3318061b749d6c261ec5f68"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.05789">[2412.05789] InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d3318077b986d053fa4898fd"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.06703">[2410.06703] ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d331800f82b8e48de39273ea"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" 
class="notion-link" href="https://arxiv.org/abs/2410.24024">[2410.24024] AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents</a></span><span class="notion-blue">, Zhipu</span></li></ul><ul class="notion-list notion-list-disc notion-block-139c0110d33180b79009c1f75b7ccbfd"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2410.15164">[2410.15164] SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180d69805deb405e0029b"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2406.03679">[2406.03679][AndroidControl] On the Effects of Data Scale on UI Control Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d331806b84d4ed3d6048eca7"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2406.14250">[2406.14250] E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180f884bbe7e3190dd4a2"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2405.14573">[2405.14573] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d33180ec8c5ddbb4b7373ad5"><li><span class="notion-blue"><a 
target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2209.08199">[2209.08199] ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots</a></span></li></ul><div class="notion-blank notion-block-165c0110d3318014aecef6764b152737"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-151c0110d3318074bc5bdac6be5ace96" data-id="151c0110d3318074bc5bdac6be5ace96"><span><div id="151c0110d3318074bc5bdac6be5ace96" class="notion-header-anchor"></div><a class="notion-hash-link" href="#151c0110d3318074bc5bdac6be5ace96" title="Tools / Environments"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Tools / Environments</span></span></h3><ul class="notion-list notion-list-disc notion-block-19cc0110d3318034859fe71b930abfb1"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/browser-use/browser-use">browser-use/browser-use: Make websites accessible for AI agents</a></span></li><ul class="notion-list notion-list-disc notion-block-19cc0110d3318034859fe71b930abfb1"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/browser-use/web-ui">browser-use/web-ui: Run AI Agent in your browser.</a></span></li><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/browser-use/macOS-use">browser-use/macOS-use: Make Mac apps accessible for AI agents</a></span></li></ul></ul><ul class="notion-list notion-list-disc 
notion-block-19fc0110d331808ca115d6365204a0fd"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/maitrix-org/llm-reasoners">maitrix-org/llm-reasoners: A library for advanced large language model reasoning</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d33180fda19ac048bc556310"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/web-infra-dev/Midscene">web-infra-dev/midscene: Let AI be your browser operator.</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d3318089b605f8ffe2603e4c"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/microsoft/OmniParser/tree/master/omnitool">OmniParser/omnitool at master · microsoft/OmniParser</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d331805b9a5ffbccc6fbcf14"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/bytedance/UI-TARS-desktop">bytedance/UI-TARS-desktop: A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d3318070aa07c3c6c21c5932"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/hrithikkoduri/WebRover">hrithikkoduri/WebRover: WebRover is an autonomous AI agent designed to interpret user input and execute actions by interacting with web elements to accomplish tasks or answer questions. 
It leverages advanced language models and web automation tools to navigate the web, gather information, and provide structured responses based on the user&#x27;s needs.</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180b3b503c1793c897fe5"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2412.05467">[2412.05467] The BrowserGym Ecosystem for Web Agent Research</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180ccb29bdfb1610e6d21"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2403.17918">[2403.17918] AgentStudio: A Toolkit for Building General Virtual Agents</a></span></li></ul><div class="notion-blank notion-block-1b3c0110d33180e8bb8af0b575e71ae3"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-165c0110d33180c385d3cd8b1f7c8a8d" data-id="165c0110d33180c385d3cd8b1f7c8a8d"><span><div id="165c0110d33180c385d3cd8b1f7c8a8d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#165c0110d33180c385d3cd8b1f7c8a8d" title="Others"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Others</span></span></h3><ul class="notion-list notion-list-disc notion-block-1bec0110d33180519b84e7184e3b75cf"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.09385">[2503.09385] PCLA: A Framework for Testing Autonomous Agents in the CARLA 
Simulator</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1bec0110d33180188248e4f3749372d7"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2503.08464">[2503.08464] An Autonomous RL Agent Methodology for Dynamic Web UI Testing in a BDD Framework</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180f2995fee0e89e0c943"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.20383">[2502.20383] Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d33180779cf0f8b6db66978f"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2502.17903">[2502.17903] Towards Sustainable Web Agents: A Plea for Transparency and Dedicated Metrics for Energy Consumption</a></span></li></ul><div class="notion-blank notion-block-1b3c0110d33180909d86eaac6204fcd1"> </div><div class="notion-blank notion-block-1b3c0110d33180c9ae72c293f205f4f8"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-139c0110d331805895a9d1467418e470" data-id="139c0110d331805895a9d1467418e470"><span><div id="139c0110d331805895a9d1467418e470" class="notion-header-anchor"></div><a class="notion-hash-link" href="#139c0110d331805895a9d1467418e470" title="Other UI Agent Resource Lists"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 
4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Other UI Agent Resource Lists</span></span></h2><ul class="notion-list notion-list-disc notion-block-160c0110d33180359efcf92af521f6ca"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/OSU-NLP-Group/GUI-Agents-Paper-List">https://github.com/OSU-NLP-Group/GUI-Agents-Paper-List</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-14fc0110d33180b49b83e9eb9d99cdd6"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/showlab/Awesome-GUI-Agent">https://github.com/showlab/Awesome-GUI-Agent</a></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d3318005ad82cc1e275e63c2"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/opendilab/awesome-ui-agents/">https://github.com/opendilab/awesome-ui-agents</a></span></li></ul><div class="notion-blank notion-block-14fc0110d33180e78917dbc5bbdfda97"> </div><div class="notion-blank notion-block-139c0110d33180489620fbeec7ea9008"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AI Agent Performance Optimization: Core Strategies and Practical Tips]]></title>
            <link>https://www.breezedeus.com/article/ai-agent-perf-tips</link>
            <guid>https://www.breezedeus.com/article/ai-agent-perf-tips</guid>
            <pubDate>Mon, 13 Oct 2025 00:00:00 GMT</pubDate>
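            <description><![CDATA[This article breaks down the five core dimensions of AI Agent performance optimization and offers a wealth of actionable, practical tips for building smarter, more robust, and more efficient AI Agents!]]></description>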
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-28bc0110d33180a79030fa219c1248a6"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-row notion-block-28bc0110d3318136bd56c45895a8ee1d"><div class="notion-column notion-block-28bc0110d33181c3b96acac8c2dcc8b8" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28bc0110d33180f69cccde51c49363a8"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Adf07f5f5-920b-47a0-a822-2bb8a9b39d47%3Aimage.png?table=block&amp;id=28bc0110-d331-80f6-9ccc-de51c49363a8&amp;t=28bc0110-d331-80f6-9ccc-de51c49363a8" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-28bc0110d33181f5a560fbdf33d006cf"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-28bc0110d33181cf933bd507e9cf724e" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><div class="notion-text notion-block-28bc0110d33181cc99cfdc15c5ea92aa"><b>Contents:</b></div><div class="notion-table-of-contents notion-gray notion-block-28bc0110d331819eb92decf0ce97f903"><a href="#28bc0110d33180d39e52c5b8775ff7ea" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">1. 
Prompt Engineering: Shaping the Agent’s “Mindset”</span></a><a href="#28bc0110d331803682a6c7ed8ee5ba07" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">1.1 System Prompt Optimization</span></a><a href="#28bc0110d33180e6bfe0c6763f157cd1" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">1.2 LLM Search over RAG Search</span></a><a href="#28bc0110d331804e9008c29fb7dfa0a3" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">1.3 Steerability and Behavior Guidance</span></a><a href="#28bc0110d33180e6a1f7f56f0cc46847" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">2. Context Engineering: Fine-Grained Management of the Agent’s “Memory”</span></a><a href="#28bc0110d331806a8371c001b174ef5d" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">2.1 Optimizing the KV-cache Hit Rate</span></a><a href="#28bc0110d331806798bfe8531800ae12" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">2.2 Context Length Management and Externalized Memory</span></a><a href="#28bc0110d33180f6bb9de51d35fbca67" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">2.3 Manipulating Attention</span></a><a href="#28cc0110d33180aebf93cbf5f56d5d41" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">2.4 Context Retrieval and Agentic Search</span></a><a href="#28dc0110d33180e4916ed2210b55aba2" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">2.5 Context Engineering for Long-Horizon Tasks</span></a><a href="#28bc0110d3318001bbf3ca1228802f1c" class="notion-table-of-contents-item"><span 
class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">3. Tool Design and Management: Building an Efficient “Arsenal” for the Agent</span></a><a href="#28bc0110d33180b5aa81c21716cf3861" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">3.1 Tool Design Principles</span></a><a href="#28bc0110d3318075b294e98047eb0fc8" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">3.2 Dynamic Tool Selection and Constraints</span></a><a href="#28bc0110d33180c2bbe6dc544813d9d0" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">3.3 Tool Layering</span></a><a href="#290c0110d3318065be3ac2052079e8e8" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">3.4 Agent Skills: Building Composable, Extensible Expertise</span></a><a href="#28bc0110d33180fead34faf2aede1a43" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">4. Control Loop and Architecture: Building a Stable, Efficient Agent Skeleton</span></a><a href="#28bc0110d33180af8d7bc234c879ad3e" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">4.1 Keep a Single Main Loop</span></a><a href="#28bc0110d33180489991caa7359db918" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">4.2 Use Small Models</span></a><a href="#28bc0110d33180959bfcc73fbbb42785" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">5. 
Evaluation and Adaptation: Continuously Improving Agent Performance</span></a><a href="#28bc0110d3318041b7a5ed6ed0a927d9" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">5.1 Prototyping and Comprehensive Evaluation</span></a><a href="#28bc0110d331807fb2ebf394de6e06ea" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">5.2 Error Recovery and Adaptation</span></a><a href="#28bc0110d3318098bf7fcc88775d7fbc" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">5.3 Diversity and Generalization</span></a><a href="#28bc0110d33180e08941ecc23b1c6106" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Conclusion</span></a><a href="#28bc0110d3318077912cd1d5f6ed7261" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">References</span></a></div><div class="notion-blank notion-block-28bc0110d3318175912fc554c720ce18"> </div></div><div class="notion-spacer"></div></div><div class="notion-text notion-block-28bc0110d331803ba17ae280833f38d4">As AI technology advances rapidly, AI Agents are becoming increasingly capable of handling complex tasks. Realizing their full potential, however, requires deliberate optimization. This article synthesizes several recent articles and distills the optimization techniques that have proven effective for AI Agents, covering prompt engineering, context engineering, tool design, control loop and architecture, and evaluation and adaptation, with the goal of offering guidance for building more efficient and more stable agents.</div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-28bc0110d33180d39e52c5b8775ff7ea" data-id="28bc0110d33180d39e52c5b8775ff7ea"><span><div id="28bc0110d33180d39e52c5b8775ff7ea" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180d39e52c5b8775ff7ea" title="1. 
Prompt Engineering: Shaping the Agent’s “Mindset”"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>1. Prompt Engineering: Shaping the Agent’s “Mindset”</b></span></span></h3><div class="notion-text notion-block-28bc0110d331807d97edd3823767264f">Prompt engineering is a key technique for steering LLM behavior, especially when building complex Agents.</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d331803682a6c7ed8ee5ba07" data-id="28bc0110d331803682a6c7ed8ee5ba07"><span><div id="28bc0110d331803682a6c7ed8ee5ba07" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d331803682a6c7ed8ee5ba07" title="1.1 System Prompt Optimization"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>1.1 System Prompt Optimization</b></span></span></h4><ul class="notion-list notion-list-disc notion-block-28bc0110d33180c0a1e2e4bbfa109186"><li><b>Clear, concise language</b>: The system prompt should be extremely clear and use simple, direct language that presents ideas at the “right altitude”: specific enough to guide behavior effectively, yet flexible enough to give the model strong heuristics [4].</li><ul class="notion-list notion-list-disc notion-block-28bc0110d33180c0a1e2e4bbfa109186"><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28bc0110d331802f9100c55e127e2d41"><div 
style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Aa9095168-f2fc-4a42-a8e1-67b17d9e4cf5%3Aimage.png?table=block&amp;id=28bc0110-d331-802f-9100-c55e127e2d41&amp;t=28bc0110-d331-802f-9100-c55e127e2d41" alt="At one end of the spectrum are brittle, hardcoded if-else prompts; at the other are prompts that are overly general or wrongly assume shared context." loading="lazy" decoding="async"/><figcaption class="notion-asset-caption"><em>At one end of the spectrum are brittle, hardcoded if-else prompts; at the other are prompts that are overly general or wrongly assume shared context.</em></figcaption></div></figure></ul></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d33180b79a5ce2aa7589787c"><li><b>Structured prompts</b>: Organize the prompt into distinct sections (such as <code class="notion-inline-code">&lt;background_information&gt;</code>, <code class="notion-inline-code">&lt;instructions&gt;</code>, <code class="notion-inline-code">## Tool guidance</code>, and <code class="notion-inline-code">## Output description</code>) and delimit them with XML tags or Markdown headings [4].</li><ul class="notion-list notion-list-disc notion-block-28bc0110d33180b79a5ce2aa7589787c"><li><b>State every expectation in as few words as possible</b>: The best practice is to first test a minimal prompt that still covers all expectations on the best available model to see how it performs on the task, then improve performance by adding clear instructions and examples that address the failure modes uncovered in that initial testing.</li></ul></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d3318081a07dc5f3132341bf"><li><b>Detailed heuristics and examples</b>: The prompt should include detailed heuristics, examples, and important reminders, for instance using <code class="notion-inline-code">&lt;good-example&gt;</code> and <code class="notion-inline-code">&lt;bad-example&gt;</code> to clearly separate desirable from undesirable behavior paths [3].</li><ul class="notion-list notion-list-disc notion-block-28bc0110d3318081a07dc5f3132341bf"><li>Teams often stuff prompts with a laundry list of edge cases, trying to spell out every possible rule the LLM should follow for a particular task. This is not recommended. Instead, <b>provide a set of diverse, canonical examples that effectively portray the agent’s expected behavior.</b> For an LLM, examples are the “pictures” worth a thousand words.</li></ul></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d33180558501d4f34b860288"><li><b>Managing user context and preferences</b>: Use <code class="notion-inline-code">claude.md</code> 
or a similar file to pass along context and strict preferences that cannot be inferred from the codebase, such as forcing the LLM to skip certain folders or to use specific libraries [3].</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d33180e6bfe0c6763f157cd1" data-id="28bc0110d33180e6bfe0c6763f157cd1"><span><div id="28bc0110d33180e6bfe0c6763f157cd1" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180e6bfe0c6763f157cd1" title="1.2 LLM Search over RAG Search"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>1.2 LLM Search over RAG Search</b></span></span></h4><div class="notion-text notion-block-28bc0110d3318079ab22f1f4b70c39de">In some scenarios, letting the LLM search directly, using its own understanding of code, can outperform the traditional RAG (Retrieval-Augmented Generation) approach [3]:</div><ul class="notion-list notion-list-disc notion-block-28bc0110d33180d48243c21ea0cc41f9"><li><b>Leverage the LLM’s understanding of code</b>: Claude Code searches the codebase with sophisticated <code class="notion-inline-code">ripgrep</code>, <code class="notion-inline-code">jq</code>, and <code class="notion-inline-code">find</code> commands. Drawing on the LLM’s deep understanding of code, it writes complex regular expressions to locate relevant code blocks and sometimes uses a small model to read entire files. This approach avoids the new (hidden) failure modes that RAG introduces [3].</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d331804e9008c29fb7dfa0a3" data-id="28bc0110d331804e9008c29fb7dfa0a3"><span><div id="28bc0110d331804e9008c29fb7dfa0a3" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d331804e9008c29fb7dfa0a3" title="1.3 Steerability and Behavior Guidance"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 
010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>1.3 Steerability and Behavior Guidance</b></span></span></h4><div class="notion-text notion-block-28bc0110d33180c18f19d2a4e45b04b0">Steering an AI Agent so that its behavior matches expectations is an important part of prompt engineering [3]:</div><ul class="notion-list notion-list-disc notion-block-28bc0110d331805d85b6edb993b0e3d2"><li><b>Explicit tone and style control</b>: Define the Agent’s tone, style, and proactiveness explicitly in the system prompt, with concrete instructions and examples. This helps the Agent behave consistently and appropriately across interactions [3].</li><ul class="notion-list notion-list-disc notion-block-28bc0110d331805d85b6edb993b0e3d2"><li><b>Examples</b>: Avoid unnecessary preamble or postamble unless the user explicitly asks for it; if you cannot help, do not explain why or what it could lead to, since that comes across as preachy; avoid emojis unless the user explicitly requests them [3].</li></ul></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d331804eb08bf074e7f3e60e"><li><b>Emphatic instructions</b>: Use emphatic words such as “IMPORTANT”, “VERY IMPORTANT”, “NEVER”, and “ALWAYS” to steer the model away from specific behaviors or to enforce critical rules. This matters most while models are not yet fully steerable [3].</li><ul class="notion-list notion-list-disc notion-block-28bc0110d331804eb08bf074e7f3e60e"><li><b>Examples</b>: <code class="notion-inline-code">IMPORTANT: DO NOT ADD ***ANY*** COMMENTS unless asked</code>; <code class="notion-inline-code">VERY IMPORTANT: You MUST avoid using search commands like find and grep. 
Instead use Grep, Glob, or Task to search.</code> [3].</li></ul></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d3318065aae6d1c485549f11"><li><b>Write clear decision algorithms</b>: Identify the most important tasks the LLM must perform and write a clear algorithm for each. Role-play as the LLM, walk through examples, make every decision point explicit, and structure the result as a flowchart. This helps the LLM follow instructions and avoids a jumbled list of Do&#x27;s and Don&#x27;ts, which reduces conflicting rules and improves maintainability [3].</li><ul class="notion-list notion-list-disc notion-block-28bc0110d3318065aae6d1c485549f11"><li>In Claude Code’s system prompt, sections such as “Task Management”, “Doing Tasks”, and “Tool Usage Policy” spell out the algorithm to follow and include many heuristics and examples covering various scenarios [3].</li></ul></ul><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-28bc0110d33180e6a1f7f56f0cc46847" data-id="28bc0110d33180e6a1f7f56f0cc46847"><span><div id="28bc0110d33180e6a1f7f56f0cc46847" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180e6a1f7f56f0cc46847" title="2. Context Engineering: Fine-Grained Management of the Agent’s “Memory”"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>2. Context Engineering: Fine-Grained Management of the Agent’s “Memory”</b></span></span></h3><div class="notion-text notion-block-28bc0110d33180648115c442bc75d7af">Context is the foundation on which an AI Agent makes decisions and takes actions, so managing it efficiently is central to optimization. <b>Context engineering</b> goes beyond traditional prompt engineering: it is concerned with curating and maintaining the optimal set of tokens during LLM inference so that the model consistently produces the desired outcome [4].</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d331806a8371c001b174ef5d" data-id="28bc0110d331806a8371c001b174ef5d"><span><div id="28bc0110d331806a8371c001b174ef5d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d331806a8371c001b174ef5d" title="2.1 Optimizing the KV-cache Hit Rate"><svg 
viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>2.1 Optimizing the KV-cache Hit Rate</b></span></span></h4><div class="notion-text notion-block-28bc0110d331806a8b5dc6aa779a6e33">Making effective use of the KV-cache (Key-Value Cache) is critical to LLM inference efficiency. Optimization strategies include:</div><ul class="notion-list notion-list-disc notion-block-28bc0110d33180ff8635e5fed2fa53bd"><li><b>Keep the prompt prefix stable</b>: Because LLMs are autoregressive, a difference in even a single token invalidates the cache from that token onward. Avoid putting volatile content, such as a timestamp accurate to the second, at the beginning of the system prompt [1].</li></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d33180f4b6e3f5228db1e565"><li><b>Make the context append-only</b>: Avoid modifying previous actions or observations, and make serialization deterministic, so that the cache stays valid [1].</li></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d331800d81a4eed8167456f4"><li><b>Mark cache breakpoints explicitly</b>: For models or inference frameworks that do not support automatic incremental prefix caching, insert cache breakpoints manually, and make sure one of them covers the end of the system prompt [1].</li></ul><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28bc0110d33180f4860cc422408eb54d"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A30bcd67e-2ff6-41e2-a01d-6781dfeca32f%3Aimage.png?table=block&amp;id=28bc0110-d331-80f4-860c-c422408eb54d&amp;t=28bc0110-d331-80f4-860c-c422408eb54d" alt="notion image" loading="lazy" decoding="async"/></div></figure><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d331806798bfe8531800ae12" data-id="28bc0110d331806798bfe8531800ae12"><span><div id="28bc0110d331806798bfe8531800ae12" class="notion-header-anchor"></div><a class="notion-hash-link" 
href="#28bc0110d331806798bfe8531800ae12" title="2.2 Context Length Management and Externalized Memory"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>2.2 Context Length Management and Externalized Memory</b></span></span></h4><div class="notion-text notion-block-28bc0110d331801a8020ed3591f5dbe6">Modern LLMs have very large context windows, but overly long contexts can still degrade performance and raise costs. The phenomenon of <b>context rot</b> shows that as context length grows, the model’s ability to accurately recall information from that context declines [4]. Effective context management is therefore essential:</div><ul class="notion-list notion-list-disc notion-block-28bc0110d33180eea381de614109b344"><li><b>The file system as externalized memory</b>: Treat the file system as memory that is unlimited in size and persistent, and let the model read and write files on demand so that it serves as structured, externalized memory [1].</li></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d331806eab9eeb8f5f6e5de5"><li><b>Restorable compression</b>: When shortening the context, use compression that can be reversed, for example keeping only a web page’s URL instead of its full content, or a document’s path instead of its full text, so that the context shrinks without permanently losing information [1].</li></ul><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28bc0110d3318006b5f9c8b29527b646"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A5cf6fb35-7dcf-4114-bc4c-b5337893c1a5%3Aimage.png?table=block&amp;id=28bc0110-d331-8006-b5f9-c8b29527b646&amp;t=28bc0110-d331-8006-b5f9-c8b29527b646" alt="notion image" loading="lazy" decoding="async"/></div></figure><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d33180f6bb9de51d35fbca67" data-id="28bc0110d33180f6bb9de51d35fbca67"><span><div id="28bc0110d33180f6bb9de51d35fbca67" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180f6bb9de51d35fbca67" title="2.3 
Manipulating Attention"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>2.3 Manipulating Attention</b></span></span></h4><div class="notion-text notion-block-28bc0110d33180eeba18ffd8f1abdf7c">Manus continually rewrites its todo list, restating its objectives at the end of the context. This keeps the global plan within the model’s recent attention span, avoids “lost-in-the-middle” problems, and reduces goal misalignment. In effect, the agent uses natural language to bias its own focus toward the task objective, without any special architectural changes.</div><ul class="notion-list notion-list-disc notion-block-28bc0110d331809c9218e69c39c84e62"><li><b>Dynamically rewrite the todo list</b>: Continually updating and rewriting the todo list pushes the global plan into the model’s recent attention span and reduces goal drift [1, 3].</li></ul><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28bc0110d33180c8a91ad9d4919f12dd"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Acdbaee15-e25d-4423-8203-9f030aee5500%3Aimage.png?table=block&amp;id=28bc0110-d331-80c8-a91a-d9d4919f12dd&amp;t=28bc0110-d331-80c8-a91a-d9d4919f12dd" alt="notion image" loading="lazy" decoding="async"/></div></figure><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28cc0110d33180aebf93cbf5f56d5d41" data-id="28cc0110d33180aebf93cbf5f56d5d41"><span><div id="28cc0110d33180aebf93cbf5f56d5d41" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28cc0110d33180aebf93cbf5f56d5d41" title="2.4 Context Retrieval and Agentic Search"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 
00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>2.4 Context Retrieval and Agentic Search</b></span></span></h4><div class="notion-text notion-block-28cc0110d3318035acb5d2bc7565ad94">Agentic search means the Agent retrieves and loads context on its own, rather than having all relevant data pre-processed up front [4]:</div><ul class="notion-list notion-list-disc notion-block-28cc0110d3318077a7ece9c80f557fbe"><li><b>“Just in time” context</b>: The Agent maintains lightweight identifiers (file paths, stored queries, web links, and so on) and <b>uses tools to load data into context dynamically at runtime</b>. Claude Code, for example, uses this approach to perform complex data analysis over large databases: the model can write targeted queries, store the results, and analyze large volumes of data with Bash commands such as <code class="notion-inline-code">head</code> and <code class="notion-inline-code">tail</code>, without ever loading the full data objects into context [4].</li></ul><ul class="notion-list notion-list-disc notion-block-28cc0110d33180c5b9d1d8e9ffe57318"><li><b>Leveraging metadata</b>: The metadata of references (such as file paths) provides a mechanism for efficiently refining behavior. Folder hierarchies, naming conventions, and timestamps in the file system all provide important signals that help the Agent understand how and when to use information [4].</li></ul><ul class="notion-list notion-list-disc notion-block-28cc0110d331805ea1b7f60ac98733e8"><li><b>Progressive disclosure</b>: <b>Let the Agent discover relevant context incrementally through exploration. Each interaction yields context that informs the next decision.</b> The Agent can build understanding layer by layer, keeping only what is necessary in working memory and using note-taking strategies for additional persistence [4].</li></ul><ul class="notion-list notion-list-disc notion-block-28cc0110d3318039afe6d92a0b14a6ff"><li><b>Hybrid strategy</b>: In some cases the most effective Agents use a hybrid strategy, <b>retrieving some data up front for speed and exploring autonomously as needed.</b> Claude Code, for example, places <code class="notion-inline-code">CLAUDE.md</code> files directly into context up front, while primitives such as <code class="notion-inline-code">glob</code> and <code class="notion-inline-code">grep</code> let it navigate its environment and retrieve files just in time [4].</li></ul><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28dc0110d33180e4916ed2210b55aba2" data-id="28dc0110d33180e4916ed2210b55aba2"><span><div id="28dc0110d33180e4916ed2210b55aba2" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28dc0110d33180e4916ed2210b55aba2" title="2.5 
Context Engineering for Long-Horizon Tasks"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>2.5 Context Engineering for Long-Horizon Tasks</b></span></span></h4><div class="notion-text notion-block-28dc0110d331805882f6ce94926ca7ea">For <b>long-horizon tasks</b> such as large codebase migrations or comprehensive research projects, which require tens of minutes or even hours of continuous work, the Agent must stay coherent even when the work exceeds the model’s context window. Simply waiting for larger context windows is unlikely to be a lasting solution, because context windows of every size can suffer from context pollution and information relevance problems. Specialized techniques that address these limitations directly are therefore essential [4].</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28dc0110d33180038011f4395a91eefd"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Ad4af443d-b403-4103-8d87-5637eb25b50a%3Aimage.png?table=block&amp;id=28dc0110-d331-8003-8011-f4395a91eefd&amp;t=28dc0110-d331-8003-8011-f4395a91eefd" alt="Unlike the discrete task of writing a prompt, context engineering is iterative: a curation phase happens every time we decide what to pass to the model." loading="lazy" decoding="async"/><figcaption class="notion-asset-caption"><em>Unlike the discrete task of writing a prompt, context engineering is iterative: a curation phase happens every time we decide what to pass to the model.</em></figcaption></div></figure><div class="notion-blank notion-block-28dc0110d3318006b537ed946bbe2e8d"> </div><div class="notion-text notion-block-28dc0110d33180ab9b74db84aa6238e7"><b>2.5.1 Compaction</b></div><div class="notion-text notion-block-28dc0110d33180e58dcee3f88a89739f">Compaction is the practice of summarizing the contents of a conversation as it nears the context window limit and restarting a new context window with that summary. It is the primary lever for improving long-horizon coherence. The key is to distill the context with high fidelity so that the Agent can continue the task with minimal performance degradation [4].</div><ul class="notion-list notion-list-disc notion-block-28dc0110d33180388b9bc6002e0f17ed"><li><b>Implementation</b>: In Claude Code, 
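this is done by passing the message history to the model, which summarizes and compresses the most critical details. The model preserves architectural decisions, unresolved bugs, and implementation details while <b>discarding redundant tool outputs or messages</b>. The Agent then continues with this compressed context plus the five most recently accessed files, keeping its work continuous despite the context window limit [4].</li></ul><ul class="notion-list notion-list-disc notion-block-28dc0110d331809a9c62ef1b6ce1029b"><li><b>The art of compaction</b>: The hard part of compaction is choosing precisely what to keep and what to discard. Overly aggressive compaction can lose subtle but critical context whose importance only becomes apparent later. When implementing a compaction system, therefore, carefully tune the prompt on complex Agent traces. <b>Start by maximizing recall, making sure the compaction prompt captures every relevant piece of information in the trace, then iterate to improve precision by eliminating superfluous content</b> [4].</li></ul><ul class="notion-list notion-list-disc notion-block-28dc0110d331800aac0dc19bedc1de4b"><li><b>Lightweight compaction</b>: One of the safest, lightest-touch forms of compaction is <b>tool result clearing</b>. Once a tool call sits deep in the message history, the Agent rarely needs to see its raw result again, so clearing these results is an effective way to reduce context usage [4].</li></ul><div class="notion-blank notion-block-28dc0110d331808c99e8ffa7b759db51"> </div><div class="notion-text notion-block-28dc0110d33180eaa50efd839cc03d5c"><b>2.5.2 Structured Note-taking</b></div><div class="notion-text notion-block-28dc0110d3318012a547f64051858709"><b>Structured note-taking</b>, also called <b>agentic memory</b>, is a technique in which the Agent regularly writes notes that are persisted to memory outside the context window. These notes can be pulled back into the context later. This strategy provides persistent memory with minimal overhead [4].</div><ul class="notion-list notion-list-disc notion-block-28dc0110d331808ea459c4587d7fceb8"><li><b>A simple pattern</b>: Whether it is Claude Code <b>creating a todo list</b> or a custom Agent maintaining a <code class="notion-inline-code">NOTES.md</code> file, this simple pattern lets the Agent track progress on complex tasks and retain critical context and dependencies that would otherwise be lost across dozens of tool calls [4].</li></ul><div class="notion-blank notion-block-28dc0110d331807ea544ded82b182a33"> </div><div class="notion-text notion-block-28dc0110d33180959c19d6d59d74c8df"><b>2.5.3 Sub-agent Architectures</b></div><div class="notion-text notion-block-28dc0110d33180369b69d3ec011b361b"><b>Sub-agent architectures</b> offer another way around context limits. <b>Rather than having a single Agent try to maintain state across an entire project, specialized sub-agents handle focused tasks in clean context windows. The main agent coordinates a high-level plan, while sub-agents perform deep technical work or use tools to find relevant information.</b> Each sub-agent may explore extensively, using tens of thousands of tokens or more, but returns only a condensed, distilled summary of its work (often 1,000 to 2,000 tokens) [4].</div><ul class="notion-list notion-list-disc notion-block-28dc0110d33180beabd0cc6622074471"><li><b>Separation of concerns</b>: This approach achieves a clear <b>separation of concerns</b>: the detailed search context stays isolated within the sub-agents, while the lead Agent focuses on synthesizing and analyzing the results. On complex research tasks, this pattern has shown substantial performance gains over single-Agent 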
系统显示出显著的性能提升 [4]。</li></ul><div class="notion-text notion-block-28dc0110d3318094936ff8cc8b8a0293">这三种方法各有侧重：<b>上下文压缩</b>适合需要大量来回对话的任务；<b>结构化笔记</b>在具有明确里程碑的迭代开发中表现出色；而<b>子智能体架构</b>则在并行探索能带来巨大收益的复杂研究和分析中大放异彩 [4]。</div><div class="notion-text notion-block-28bc0110d331800f9edec4a9147956e4">&lt;ins/&gt;</div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-28bc0110d3318001bbf3ca1228802f1c" data-id="28bc0110d3318001bbf3ca1228802f1c"><span><div id="28bc0110d3318001bbf3ca1228802f1c" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d3318001bbf3ca1228802f1c" title="3. 工具设计与管理：为智能体打造高效“武器库”"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>3. 
工具设计与管理：为智能体打造高效“武器库”</b></span></span></h3><div class="notion-text notion-block-28bc0110d33180129995d350301017ef">工具是 AI Agent 与环境交互的关键接口。工具的设计质量直接影响 Agent 的效能 [2]。</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d33180b5aa81c21716cf3861" data-id="28bc0110d33180b5aa81c21716cf3861"><span><div id="28bc0110d33180b5aa81c21716cf3861" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180b5aa81c21716cf3861" title="3.1 工具设计原则"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>3.1 工具设计原则</b></span></span></h4><ul class="notion-list notion-list-disc notion-block-28bc0110d3318040afe7d54ebb35ff0a"><li><b>选择合适的工具</b>：审慎决定哪些工具需要实现，哪些可以省略 [2]。</li><ul class="notion-list notion-list-disc notion-block-28bc0110d3318040afe7d54ebb35ff0a"><li>建议针对特定的高价值工作流程，精心打造一些适用的工具，使其与评估任务相匹配，再以此为基础逐步拓展。就通讯录场景而言，可以考虑实现 <code class="notion-inline-code">search_contacts</code> 或 <code class="notion-inline-code">message_contact</code> 工具，而非 <code class="notion-inline-code">list_contacts</code> 工具。</li><li>工具可以<b>整合各类功能</b>，在内部处理多个可能离散的操作（或 API 调用）。例如，工具可以利用相关元数据丰富回复内容，或者通过一次工具调用完成频繁串联的多步骤任务。</li><li>一些示例：</li><ul class="notion-list notion-list-disc notion-block-28bc0110d33180fb8ad0d384977a1bb4"><li>相较于分别实现 <code class="notion-inline-code">list_users</code>、<code class="notion-inline-code">list_events</code> 和 <code class="notion-inline-code">create_event</code> 工具，不妨考虑实现一个 <code class="notion-inline-code">schedule_event</code> 工具，它既能查找可用时间，又能安排活动。</li><li>与其实现 <code class="notion-inline-code">read_logs</code> 工具，不如实现 <code 
class="notion-inline-code">search_logs</code> 工具，该工具仅返回相关日志行及部分周边上下文。</li><li>不要分别实现 <code class="notion-inline-code">get_customer_by_id</code>、<code class="notion-inline-code">list_transactions</code> 和 <code class="notion-inline-code">list_notes</code> 工具，而是实现一个 <code class="notion-inline-code">get_customer_context</code> 工具，它能一次性整合客户所有近期相关信息。</li></ul><li><b>要确保每个构建的工具都有明确且独特的用途。</b>工具应让智能体像人类在拥有相同底层资源时那样，对任务进行细分并解决，同时减少因中间输出而消耗的上下文。</li><li><b>工具太多或其功能重叠，可能会干扰智能体采用高效策略。</b>因此，认真且有针对性地规划要构建（或不构建）的工具，会带来显著成效。</li></ul></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d33180fea1d5d0120080d99b"><li><b>工具命名空间</b>：为工具定义清晰的功能边界，避免混淆 [2]。</li><ul class="notion-list notion-list-disc notion-block-28bc0110d33180fea1d5d0120080d99b"><li><b>通过设置命名空间（把相关工具归到相同前缀下），有助于在众多工具间明确界限</b>，MCP 客户端有时会默认这么做。比如，按照服务（像 <code class="notion-inline-code">asana_search</code>、<code class="notion-inline-code">jira_search</code>）和资源（例如 <code class="notion-inline-code">asana_projects_search</code>、<code class="notion-inline-code">asana_users_search</code>）对工具进行命名空间划分，能帮助智能体在合适的时机选择合适的工具。</li><li><b>选择基于前缀或后缀的命名空间方式，会对工具使用评估产生不可忽视的影响。</b>而且不同的 LLM 受影响的情况各异，所以建议根据自身评估结果来选择命名方案。</li><li>智能体可能会出现调用工具错误、用错参数调用正确工具、调用工具数量不足，或者错误处理工具回复等问题。实现工具时有针对性地设置它们的名称体现任务的自然区分，这样既能减少加载到智能体上下文中的工具数量和描述信息，还能把智能体计算的工作从上下文转移到工具调用本身，从而降低智能体犯错的整体风险。</li></ul></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d3318017b826c83eea30c817"><li><b>返回有意义的上下文</b>：工具的输出应向 Agent 提供简洁且有意义的上下文信息 [2]。</li><ul class="notion-list notion-list-disc notion-block-28bc0110d3318017b826c83eea30c817"><li><b>在实现工具时，要注意只向智能体返回关键信息。</b>应更注重上下文的相关性，而非追求灵活性，同时避免使用底层技术标识符（比如 <code class="notion-inline-code">uuid</code>、<code class="notion-inline-code">256px_image_url</code>、<code class="notion-inline-code">mime_type</code> 等）。像 <code class="notion-inline-code">name</code>、<code class="notion-inline-code">image_url</code> 和 <code class="notion-inline-code">file_type</code> 
这类字段，更有助于直接引导智能体开展后续行动并做出回复。</li><li>相较于晦涩的标识符，<b>智能体处理自然语言命名、术语或标识符时，往往更加得心应手。</b>仅需将随意的字母数字 <code class="notion-inline-code">UUID</code> 转换为语义更清晰、更易解读的表述（甚至只是采用从 0 开始编号的 <code class="notion-inline-code">ID</code> 方案），就能减少幻觉现象，显著提升 Claude 在检索任务中的精准度。</li><li>在某些情形下，若只是为了触发后续工具调用（例如 <code class="notion-inline-code">search_user (name=’jane’)</code> → <code class="notion-inline-code">send_message (id=12345) </code>），智能体可能既需要与自然语言输出交互，也需要与技术标识符输出交互的灵活性。可以在工具中设置一个简单的 <code class="notion-inline-code">response_format</code> 枚举参数，让智能体能够选择工具返回 “<code class="notion-inline-code">concise</code>” 还是 “<code class="notion-inline-code">detailed</code>” 的回复（如下所示）。</li><ul class="notion-list notion-list-disc notion-block-28bc0110d3318093973ff7b89deae099"></ul><li><b>甚至工具的回复结构，如 XML、JSON 或 Markdown 等，都会对评估性能产生影响</b>，不存在一种适用于所有情况的解决方案。最佳回复结构会因任务和智能体的不同而差异巨大。建议根据自身评估情况，选择最合适的回复结构。</li></ul></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d3318024a3e0f566e62b215a"><li><b>优化工具回复的 token 效率</b>：减少工具回复的 token 数量，以降低成本并提高处理速度 [2]。</li></ul><ul class="notion-list notion-list-disc notion-block-28bc0110d33180ef8bf8de82cf519320"><li><b>Prompt Engineering 工具描述和规范</b>：精心设计工具的描述和规范，使其更易于 Agent 理解和使用 [2]。</li><ul class="notion-list notion-list-disc notion-block-28bc0110d33180ef8bf8de82cf519320"><li>编写工具描述和规格时，不妨设想一下如何给团队新成员介绍该工具。思考那些可能默认提及的上下文信息，比如专业的查询格式、特定术语的定义、底层资源间的关系等，并将它们清晰呈现出来。<b>要通过清楚描述（并借助严格的数据模型加以规范）预期的输入与输出，避免出现模糊不清的情况。</b>尤其要注意，<b>输入参数的命名务必清晰准确</b>，比如别用 <code class="notion-inline-code">user</code> 这样的参数名，改用 <code class="notion-inline-code">user_id</code> 会更好。</li><li><b>利用数据集评估效果</b>，能更确切地衡量提示工程带来的效果。<b>哪怕只是对工具描述做些细微调整，都可能大幅提升性能。</b></li></ul></ul><div class="notion-text notion-block-28bc0110d331802e9548f13e58745fb0">&lt;ins/&gt;</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d3318075b294e98047eb0fc8" data-id="28bc0110d3318075b294e98047eb0fc8"><span><div id="28bc0110d3318075b294e98047eb0fc8" class="notion-header-anchor"></div><a 
class="notion-hash-link" href="#28bc0110d3318075b294e98047eb0fc8" title="3.2 Dynamic Tool Selection and Constraints"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>3.2 Dynamic Tool Selection and Constraints</b></span></span></h4>
<div class="notion-text notion-block-28bc0110d33180f0b0d1d45d2b8bf161"><b>Avoid dynamically adding or removing tools mid-iteration.</b> Tool definitions usually sit near the front of the context, so any change invalidates the KV-cache for everything that follows. In addition, earlier actions and observations that still refer to tools that are no longer defined can confuse the model [1]. Better strategies are:</div>
<ul class="notion-list notion-list-disc notion-block-28bc0110d33180c19d08ef0c6cc569fc"><li><b>Context-aware state machine</b>: Think of the state machine as the agent's "tool-permission switchboard." It predefines a set of states, i.e. task scenarios such as "browsing the web," "processing files," or "generating a report," and gives each state its own list of allowed tools. "Context-aware" means the state machine works out in real time which scenario the agent is in. For example, if it detects that the user wants to "check the weather," it enters an "information retrieval" state and activates that state's tool permissions. Tool availability is then enforced, for example by masking token logits to constrain the action space. The agent's choices are restricted without modifying the tool definitions [1].</li></ul>
<ul class="notion-list notion-list-disc notion-block-28bc0110d331801d976cc20d1cbe44f0"><li><b>Consistent action-name prefixes</b>: Give action names consistent prefixes (for example, all browser tools start with <code class="notion-inline-code">browser_</code>) so that a whole group of tools can be constrained at once in a given state [1].</li></ul>
<figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28bc0110d331807ab7ade6147708b45d"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Abae7191f-bc2f-4cd0-8e17-b6d19d774afb%3Aimage.png?table=block&amp;id=28bc0110-d331-807a-b7ad-e6147708b45d&amp;t=28bc0110-d331-807a-b7ad-e6147708b45d" alt="notion image" loading="lazy" decoding="async"/></div></figure>
<div class="notion-text notion-block-28bc0110d33180e9bf2bce6788e96865">In practice, most model providers and inference frameworks support some form of <b>response prefill</b>, which lets you constrain the action space without modifying the tool definitions. Function calling generally has three modes (shown here with the Hermes format from NousResearch):</div>
<ul class="notion-list notion-list-disc notion-block-28bc0110d33180b5b294e342db440106"><li><b>Auto</b>: The model decides whether to call a function at all. Implemented by prefilling only the reply prefix: <code class="notion-inline-code">&lt;|im_start|&gt;assistant</code></li></ul>
<ul class="notion-list notion-list-disc notion-block-28bc0110d331805aa872c6f83ffdaf11"><li><b>Required</b>: The model must call a function, but which function is unconstrained. Implemented by prefilling up to the tool-call token: <code class="notion-inline-code">&lt;|im_start|&gt;assistant&lt;tool_call&gt;</code></li></ul>
<ul class="notion-list notion-list-disc notion-block-28bc0110d33180d19819d1bf117df042"><li><b>Specified</b>: The model must call a function from a specific subset. Implemented by prefilling up to the beginning of the function name: <code class="notion-inline-code">&lt;|im_start|&gt;assistant&lt;tool_call&gt;{&quot;name&quot;: &quot;browser_</code></li></ul>
<div class="notion-text notion-block-28bc0110d33180d59bbada023abf5ee1">Building on this, action selection can be constrained by masking token logits directly. Manus deliberately gives its action names consistent prefixes: all browser-related tools start with <code class="notion-inline-code">browser_</code>, and all command-line tools start with <code class="notion-inline-code">shell_</code>. This makes it easy to ensure the agent chooses only from a particular group of tools in a given state, without needing stateful logits processors.</div>
<h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d33180c2bbe6dc544813d9d0" data-id="28bc0110d33180c2bbe6dc544813d9d0"><span><div id="28bc0110d33180c2bbe6dc544813d9d0" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180c2bbe6dc544813d9d0" title="3.3 Tool Layering"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>3.3 Tool Layering</b></span></span></h4><div class="notion-text 
notion-block-28bc0110d331803ea489c758dfe649e1">Combining tools at different levels of abstraction makes the agent more flexible and efficient [3]:</div>
<ul class="notion-list notion-list-disc notion-block-28bc0110d331809cbb5dd179e26a2b65"><li><b>Combine low-, mid-, and high-level tools</b>: for example, low-level tools (Bash, Read, Write), mid-level tools (Edit, Grep, Glob), and high-level tools (Task, WebFetch). Frequently used operations can be wrapped as dedicated tools, while general-purpose commands remain available for edge cases [3].</li></ul>
<ul class="notion-list notion-list-disc notion-block-28bc0110d33180138cd7d92bf94af6c7"><li><b>Detailed tool descriptions</b>: Tool descriptions should contain detailed prompts with plenty of examples. The system prompt should explain when to use each tool and how to choose between tools whose functionality overlaps [3].</li></ul>
<h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-290c0110d3318065be3ac2052079e8e8" data-id="290c0110d3318065be3ac2052079e8e8"><span><div id="290c0110d3318065be3ac2052079e8e8" class="notion-header-anchor"></div><a class="notion-hash-link" href="#290c0110d3318065be3ac2052079e8e8" title="3.4 Agent Skills: Building Composable, Extensible Expertise"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>3.4 Agent Skills: Building Composable, Extensible Expertise</b></span></span></h4>
<div class="notion-text notion-block-290c0110d331804eb393d659ad6e1974"><b>Agent Skills</b>, recently released by <b>Anthropic</b>, are a new way to build specialized agents from structured files and folders. They organize instructions, scripts, and resources so that agents can discover and load these capabilities dynamically and perform better on specific tasks [5].</div>
<figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-290c0110d3318085b3f8e26c8902c483"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" 
src="https://www.notion.so/image/attachment%3A26c18195-a6af-410c-b47f-9a1f847e2878%3Aimage.png?table=block&amp;id=290c0110-d331-8085-b3f8-e26c8902c483&amp;t=290c0110-d331-8085-b3f8-e26c8902c483" alt="A Skill is a directory containing a SKILL.md file together with organized folders of instructions, scripts, and resources that give the agent additional capabilities." loading="lazy" decoding="async"/><figcaption class="notion-asset-caption">A <b>Skill</b> is a directory containing a SKILL.md file together with organized folders of instructions, scripts, and resources that give the agent additional capabilities.</figcaption></div></figure>
<figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-290c0110d3318049a58ee89192847576"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Ab94eb812-0dbe-417d-86d7-2a1f022bada2%3Aimage.png?table=block&amp;id=290c0110-d331-8049-a58e-e89192847576&amp;t=290c0110-d331-8049-a58e-e89192847576" alt="notion image" loading="lazy" decoding="async"/></div></figure>
<ul class="notion-list notion-list-disc notion-block-290c0110d331808b82a2f96cb9c31026"><li><b>Core idea</b>: Agent Skills work like an onboarding guide for a new hire. By packaging expertise into composable resources, they turn a general-purpose agent into a specialized one tailored to specific needs [5].</li></ul>
<ul class="notion-list notion-list-disc notion-block-290c0110d33180c5a6e4f47b2b8ec49a"><li><span class="notion-red"><b>Progressive disclosure</b></span>: A Skill is a directory containing a <code class="notion-inline-code">SKILL.md</code> file, which must begin with YAML frontmatter holding metadata such as a name and description. Disclosure then happens in levels:
<ul>
<li><b>Level 1</b>: At startup, the metadata is preloaded into the agent's system prompt.</li>
<li><b>Level 2</b>: If the agent judges the Skill relevant to the current task, it loads the full <code class="notion-inline-code">SKILL.md</code> into context.</li>
<li><b>Deeper levels</b>: More complex Skills can bundle additional files (such as <code class="notion-inline-code">reference.md</code> or <code class="notion-inline-code">forms.md</code>), which the agent navigates to and discovers only as needed [5].</li>
</ul></li><ul class="notion-list notion-list-disc notion-block-290c0110d33180c5a6e4f47b2b8ec49a"><figure class="notion-asset-wrapper notion-asset-wrapper-image 
notion-block-290c0110d331807a8c8ec14568016775"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A84b397b2-9f81-4607-9077-e109957fe2e8%3Aimage.png?table=block&amp;id=290c0110-d331-807a-8c8e-c14568016775&amp;t=290c0110-d331-807a-8c8e-c14568016775" alt="Additional context can be bundled into a Skill through extra files, which Claude then triggers based on the system prompt." loading="lazy" decoding="async"/><figcaption class="notion-asset-caption">Additional context can be bundled into a Skill through extra files, which Claude then triggers based on the system prompt.</figcaption></div></figure><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-290c0110d33180188a40cec9e063b5f9"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A1a47b1f7-4fd0-4678-97e7-92202fc9eeea%3Aimage.png?table=block&amp;id=290c0110-d331-8018-8a40-cec9e063b5f9&amp;t=290c0110-d331-8018-8a40-cec9e063b5f9" alt="notion image" loading="lazy" decoding="async"/></div></figure></ul></ul>
<ul class="notion-list notion-list-disc notion-block-290c0110d3318002a609f1d99fcb0450"><li><b>Code execution</b>: Agent Skills can include code that the agent may run as a tool, depending on the nature of the task. Large language models excel at many tasks, but some operations (such as sorting a list) are better handled by conventional code, which is more efficient and deterministic. For example, a PDF Skill can include a prewritten Python script that reads a PDF and extracts all of its form fields. The agent can run the script without loading either the script or the PDF into context, which keeps the workflow consistent and repeatable [5].</li><ul class="notion-list notion-list-disc notion-block-290c0110d3318002a609f1d99fcb0450"><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-290c0110d331802b8d3bf9ec554bcb1b"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" 
src="https://www.notion.so/image/attachment%3A0006f5d2-f25b-4e76-b5cf-4c79ea8997be%3Aimage.png?table=block&amp;id=290c0110-d331-802b-8d3b-f9ec554bcb1b&amp;t=290c0110-d331-802b-8d3b-f9ec554bcb1b" alt="Skills can also include code, which Claude may choose to execute as a tool depending on the nature of the task." loading="lazy" decoding="async"/><figcaption class="notion-asset-caption">Skills can also include code, which Claude may choose to execute as a tool depending on the nature of the task.</figcaption></div></figure></ul></ul>
<ul class="notion-list notion-list-disc notion-block-290c0110d33180df8839f0fc93946dec"><li><b>Development and evaluation best practices</b>:</li><ul class="notion-list notion-list-disc notion-block-290c0110d33180df8839f0fc93946dec"><li><b>Start with evaluation</b>: Run the agent on representative tasks to identify its capability gaps, then build Skills incrementally to close them [5].</li><li><b>Structure for scale</b>: When a <code class="notion-inline-code">SKILL.md</code> file becomes unwieldy, split its content into separate files. If certain contexts are mutually exclusive or rarely used together, keeping them apart reduces token usage. <b>Code can serve both as an executable tool and as documentation</b>, so make it clear whether the agent should run a script directly or load it into context as a reference [5].</li><li><b>Think from the agent's perspective</b>: Monitor how the agent uses the Skill in real scenarios and iterate based on what you observe. Pay particular attention to the Skill's name and description, since the agent relies on them to decide whether to trigger the Skill [5].</li><li><b>Iterate with Claude</b>: While working through a task with Claude, ask it to capture successful approaches and common mistakes, then turn them into reusable context and code within a Skill. If the agent goes off track when using a Skill, ask it to reflect on what went wrong. This helps reveal the context the agent actually needs [5].</li></ul></ul>
<h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-28bc0110d33180fead34faf2aede1a43" data-id="28bc0110d33180fead34faf2aede1a43"><span><div id="28bc0110d33180fead34faf2aede1a43" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180fead34faf2aede1a43" title="4. Control Loop and Architecture: Building a Stable, Efficient Agent Skeleton"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>4. Control Loop and Architecture: Building a Stable, Efficient Agent Skeleton</b></span></span></h3>
<div class="notion-text notion-block-28bc0110d3318035b2e8ee57757aee16">An agent's control loop and underlying architecture are critical to its stability and efficiency.</div>
<h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d33180af8d7bc234c879ad3e" data-id="28bc0110d33180af8d7bc234c879ad3e"><span><div id="28bc0110d33180af8d7bc234c879ad3e" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180af8d7bc234c879ad3e" title="4.1 Keep One Main Loop"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>4.1 Keep One Main Loop</b></span></span></h4>
<ul class="notion-list notion-list-disc notion-block-28bc0110d3318098aea5c6ca16e6944a"><li><b>Simplify the architecture for debuggability</b>: Prioritize debuggability over complex multi-agent systems. Claude Code, for example, runs a single main thread and periodically uses different kinds of prompts to summarize git history, collapse the message history into a single message, or generate UX elements. For hierarchical tasks, it spawns a sub-agent that cannot itself spawn further sub-agents. The sub-agent's result is added to the main message history as a tool response [3].</li></ul>
<h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d33180489991caa7359db918" data-id="28bc0110d33180489991caa7359db918"><span><div id="28bc0110d33180489991caa7359db918" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180489991caa7359db918" title="4.2 Use Small Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>4.2 Use Small Models</b></span></span></h4>
<ul class="notion-list notion-list-disc notion-block-28bc0110d33180598597e948043f59c2"><li><b>Cost-effectiveness and efficiency</b>: More than 50% of Claude Code's important LLM calls go to small models such as <code class="notion-inline-code">claude-3-5-haiku</code>. These calls cover work like reading large files, parsing web pages, processing git history, and summarizing long conversations. Small models are cheaper, so they can be used liberally, which improves overall efficiency [3].</li></ul>
<h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-28bc0110d33180959bfcc73fbbb42785" data-id="28bc0110d33180959bfcc73fbbb42785"><span><div id="28bc0110d33180959bfcc73fbbb42785" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180959bfcc73fbbb42785" title="5. Evaluation and Adaptation: Continuously Improving Agent Performance"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>5. Evaluation and Adaptation: Continuously Improving Agent Performance</b></span></span></h3>
<div class="notion-text notion-block-28bc0110d33180ba9ce6d0d36ab04960">Continuous improvement of an agent depends on effective evaluation and adaptation mechanisms.</div>
<h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d3318041b7a5ed6ed0a927d9" data-id="28bc0110d3318041b7a5ed6ed0a927d9"><span><div id="28bc0110d3318041b7a5ed6ed0a927d9" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d3318041b7a5ed6ed0a927d9" title="5.1 Prototyping and Comprehensive Evaluation"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>5.1 Prototyping and Comprehensive Evaluation</b></span></span></h4>
<ul class="notion-list notion-list-disc notion-block-28bc0110d3318056bf98d2c73756d7f6"><li><b>Rapid prototyping and local testing</b>: Build tool prototypes quickly and test them locally. When using Claude Code to write tools, give it thorough documentation for the libraries and APIs the tools depend on, such as LLM-friendly <code class="notion-inline-code">llms.txt</code> files [2].</li></ul>
<ul class="notion-list notion-list-disc notion-block-28bc0110d331807cbc99f0b6fad71669"><li><b>Generate evaluation tasks</b>: Create plenty of evaluation tasks grounded in real-world use cases, and avoid overly simple or superficial "sandbox" environments. Strong evaluation tasks may require multiple tool calls, potentially dozens [2].</li></ul>
<ul class="notion-list notion-list-disc notion-block-28bc0110d33180cbb3c8d0bd45c0cad0"><li><b>System prompt guidance</b>: In the evaluation agent's system prompt, instruct the agent to output structured response blocks. It should also output reasoning and feedback blocks before the tool-call and response blocks, which triggers chain-of-thought (CoT) behavior [2].</li></ul>
<ul class="notion-list notion-list-disc notion-block-28bc0110d331807b8149d6d9bd52b7aa"><li><b>Avoid overspecifying or overfitting</b>: Allow the agent several valid paths to solve a task, instead of overspecifying or overfitting to one strategy [2].</li></ul>
<h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d331807fb2ebf394de6e06ea" data-id="28bc0110d331807fb2ebf394de6e06ea"><span><div id="28bc0110d331807fb2ebf394de6e06ea" class="notion-header-anchor"></div><a class="notion-hash-link" 
href="#28bc0110d331807fb2ebf394de6e06ea" title="5.2 Error Recovery and Adaptation"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>5.2 Error Recovery and Adaptation</b></span></span></h4>
<div class="notion-text notion-block-28bc0110d33180168f44ca5b98661f9d">Agents make mistakes. That is not a defect; it is simply reality. Language models hallucinate, environments return errors, external tools misbehave, and unexpected edge cases appear all the time. <b>In multi-step tasks, failure is not the exception but a routine part of the loop.</b></div>
<div class="notion-text notion-block-28bc0110d3318003aee6d476c50e7a40">Experience shows that one of the most effective ways to improve agent behavior is surprisingly simple: <b>leave the failed steps in the context</b>. When the model sees a failed action along with the resulting observation or stack trace, it implicitly updates its internal beliefs. That makes it less likely to choose similar actions later, which lowers the chance of repeating the same mistake. In fact, <b>error recovery</b> is one of the clearest indicators of genuinely agentic behavior.</div>
<ul class="notion-list notion-list-disc notion-block-28bc0110d33180899f27e3ef8f4631a1"><li><b>Don't hide errors</b>: Keep failed actions and their observations (such as stack traces) in the context. The model can then implicitly update its internal beliefs from the errors and is less likely to repeat them. Error recovery is a key indicator of genuinely agentic behavior [1].</li></ul>
<figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28bc0110d33180db990cd0457064e5ae"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A5fb9c930-3da5-442c-b860-337ae5ad1b78%3Aimage.png?table=block&amp;id=28bc0110-d331-80db-990c-d0457064e5ae&amp;t=28bc0110-d331-80db-990c-d0457064e5ae" alt="notion image" loading="lazy" decoding="async"/></div></figure>
<h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-28bc0110d3318098bf7fcc88775d7fbc" data-id="28bc0110d3318098bf7fcc88775d7fbc"><span><div id="28bc0110d3318098bf7fcc88775d7fbc" class="notion-header-anchor"></div><a 
class="notion-hash-link" href="#28bc0110d3318098bf7fcc88775d7fbc" title="5.3 Diversity and Generalization"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>5.3 Diversity and Generalization</b></span></span></h4>
<div class="notion-text notion-block-28bc0110d33180179904e8236e932775">Few-shot prompting is a common way to improve the output of large language models (LLMs), but in agent systems it can backfire in subtle ways.</div>
<div class="notion-text notion-block-28bc0110d3318044adfdde78ee96cfc5">Language models are excellent mimics: they imitate the patterns of behavior they see in the context. If the context contains many similar <b>action-observation</b> pairs, the model will tend to follow that pattern.</div>
<div class="notion-text notion-block-28bc0110d33180d3b843dc85df47e936">This becomes a problem in tasks that involve repeated decisions or actions. For example, when Manus reviews a batch of 20 resumes, the agent often falls into a rhythm. It repeats similar actions simply because it has seen them in the context, which can lead to drift, overgeneralization, or even hallucination.</div>
<div class="notion-text notion-block-28bc0110d331806ca4cfda53880fd12d">The fix is to add diversity. Manus introduces small amounts of structured variation into actions and observations, such as different serialization templates, alternate phrasing, or minor noise in order and formatting. This controlled randomness breaks the pattern and shifts the model's attention. In other words, don't let few-shot examples box you in: the more uniform the context, the more brittle the agent becomes.</div>
<ul class="notion-list notion-list-disc notion-block-28bc0110d33180aca73cc9f7507127c9"><li><b>Avoid over-imitation</b>: With few-shot prompting, guard against over-imitation that traps the model in repetitive patterns. Structured diversity (different serialization templates, alternate phrasing, formatting noise) breaks the pattern and improves the model's generalization and robustness [1].</li></ul>
<figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-28bc0110d33180bf86f8f459441c842b"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A7b1ae6f2-c78d-460e-9603-e6ccd9971d50%3Aimage.png?table=block&amp;id=28bc0110-d331-80bf-86f8-f459441c842b&amp;t=28bc0110-d331-80bf-86f8-f459441c842b" 
alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-28bc0110d33180d8abdbc2b07ae23c56"> </div>
<h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-28bc0110d33180e08941ecc23b1c6106" data-id="28bc0110d33180e08941ecc23b1c6106"><span><div id="28bc0110d33180e08941ecc23b1c6106" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d33180e08941ecc23b1c6106" title="Conclusion"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>Conclusion</b></span></span></h3>
<div class="notion-text notion-block-28bc0110d33180ac949fdc90b780c251">Optimizing an AI agent is a multidimensional, systemic engineering effort. Every stage has a lasting effect on the agent's final performance: well-crafted prompts, careful context management, efficient tool design, a stable control loop, and continuous evaluation and adaptation. Applying these techniques lets us build AI agents that are smarter, more robust, and more adaptable, and that can take on more complex real-world challenges.</div>
<h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-28bc0110d3318077912cd1d5f6ed7261" data-id="28bc0110d3318077912cd1d5f6ed7261"><span><div id="28bc0110d3318077912cd1d5f6ed7261" class="notion-header-anchor"></div><a class="notion-hash-link" href="#28bc0110d3318077912cd1d5f6ed7261" title="References"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>References</b></span></span></h3><ol start="1" class="notion-list notion-list-numbered notion-block-28bc0110d331808a8915f946c400bd2a" style="list-style-type:decimal"><li><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus">Context Engineering for AI Agents: Lessons from Building Manus</a></li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-28bc0110d33180a58565e5f796757b13" style="list-style-type:decimal"><li><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.anthropic.com/engineering/writing-tools-for-agents">Writing effective tools for AI agents—using AI agents \ Anthropic</a></li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-28bc0110d33180029b13e6ddc6ad39ca" style="list-style-type:decimal"><li><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://minusx.ai/blog/decoding-claude-code/">Minusx | What makes Claude Code so damn good (and how to recreate that magic in your agent)!?</a></li></ol><ol start="4" class="notion-list notion-list-numbered notion-block-28bc0110d3318058a001cc86ede9a7a6" style="list-style-type:decimal"><li><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective context engineering for AI agents \ Anthropic</a></li></ol><ol start="5" class="notion-list notion-list-numbered notion-block-290c0110d331806cade5de2baeb343bc" style="list-style-type:decimal"><li><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills">Equipping agents for the real world with Agent Skills \ Anthropic</a></li></ol><div class="notion-blank notion-block-28bc0110d33180388d67d25db3b341a8">
</div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Mobile-Agent-v3: The New Open-Source Leader in GUI Agents]]></title>
            <link>https://www.breezedeus.com/article/ui-agent-mobile-agent-v3</link>
            <guid>https://www.breezedeus.com/article/ui-agent-mobile-agent-v3</guid>
            <pubDate>Sat, 06 Sep 2025 00:00:00 GMT</pubDate>
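            <description><![CDATA[Mobile-Agent-v3 is powered by the multimodal GUI-Owl model, self-evolving data generation, and TRPO reinforcement learning. It outperforms mainstream open-source solutions on cross-platform GUI automation, and in some scenarios it even beats GPT-4o and Claude 3.7.]]></description>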
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-260c0110d33180b782e9e4f1c643d5c3"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-row notion-block-260c0110d33181488bedcac279c3dfe5"><div class="notion-column notion-block-260c0110d331810e9b31e39e6dd9309c" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.25)"><div class="notion-blank notion-block-260c0110d3318176893acbac64230a48"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-260c0110d3318146bf0aff6e0b838903" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.5416666666666667)"><div class="notion-text notion-block-260c0110d331815a90a4ff9f6cd7d2f9"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/">Home</a></b><b> | </b><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus">GitHub</a></b><b> | </b><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://twitter.com/breezedeus">Twitter</a></b><b> | </b><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.youtube.com/@breezedeus">Youtube</a></b><b>  |  </b><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://space.bilibili.com/509307267">Bilibili</a></b></div></div><div class="notion-spacer"></div><div class="notion-column notion-block-260c0110d33181809ac0d5688faabc4a" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.2083333333333335)"><div class="notion-blank notion-block-260c0110d3318141a2bfe8f7013550b1"> </div></div><div class="notion-spacer"></div></div><div class="notion-row notion-block-260c0110d33181eb9645fb45a446f358"><div class="notion-column notion-block-260c0110d33181cdba9eced41a123c33" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><figure 
class="notion-asset-wrapper notion-asset-wrapper-image notion-block-260c0110d331806ea1b9f641061c69f7"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:320px"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A03710368-c102-4cc1-838f-0b6fe1874fd7%3Aaebfbc97-c659-4251-964e-58dc7e3d85e9.png?table=block&amp;id=260c0110-d331-806e-a1b9-f641061c69f7&amp;t=260c0110-d331-806e-a1b9-f641061c69f7" alt="notion image" loading="lazy" decoding="async"/></div></figure></div><div class="notion-spacer"></div><div class="notion-column notion-block-260c0110d331811e8efdf3d8c69345f8" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><div class="notion-text notion-block-260c0110d331813f885ed89e2b7a658e"><b>Contents:</b></div><div class="notion-table-of-contents notion-gray notion-block-260c0110d33181a9a8b6dccef45a5d93"><a href="#260c0110d33181f69911ef4fa2d965f5" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">[2508.15144] Mobile-Agent-v3: Foundamental Agents for GUI Automation, Alibaba</span></a><a href="#260c0110d3318020b517ec0ca4c47f90" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">GUI-Owl: An End-to-End Multimodal GUI Agent</span></a><a href="#260c0110d33180bfa903dc3747209220" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:48px">Innovation 1: Large-Scale Environment Infrastructure and Self-Evolving Trajectory Production</span></a><a href="#260c0110d3318031be99d22fa244032d" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:48px">Innovation 2: Building Diverse Foundational Agent Capabilities</span></a><a href="#260c0110d3318010a7c0c5a91a1a22d4" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" 
style="display:inline-block;margin-left:48px">Innovation 3: Scalable Environment Reinforcement Learning and TRPO</span></a><a href="#260c0110d3318028a893c3afa827cfe1" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Mobile-Agent-v3: A Collaborative Multi-Agent Framework</span></a><a href="#260c0110d33180a4bad1e1b333132d6c" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Outstanding Performance</span></a><a href="#260c0110d33180d08bf2f4430e0ce61b" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:48px">1. End-to-End Model Performance: GUI-Owl Takes the Lead</span></a><a href="#260c0110d33180e5a931fde2c3896b09" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:48px">2. Beating Proprietary Models: The Strength of GUI-Owl-32B</span></a><a href="#260c0110d3318005bfeff75bfa7d1472" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:48px">3. 
Ablation Studies and Key Technical Contributions</span></a><a href="#260c0110d33180fb9757eede35308eb3" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Conclusion and Outlook</span></a><a href="#260c0110d33181a289b1c7f37a4cd686" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">AI Agents Knowledge Planet</span></a></div><div class="notion-blank notion-block-260c0110d331816e8497fb98f8eadbf8"> </div></div><div class="notion-spacer"></div></div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-260c0110d33181f69911ef4fa2d965f5" data-id="260c0110d33181f69911ef4fa2d965f5"><span><div id="260c0110d33181f69911ef4fa2d965f5" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d33181f69911ef4fa2d965f5" title="[2508.15144] Mobile-Agent-v3: Foundamental Agents for GUI Automation, Alibaba"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://arxiv.org/abs/2508.15144">[2508.15144] Mobile-Agent-v3: Foundamental Agents for GUI Automation</a>, Alibaba</span></span></h2><ul class="notion-list notion-list-disc notion-block-260c0110d331812a81d6ee2c411710f2"><li><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/X-PLUG/MobileAgent/tree/main/Mobile-Agent-v3">https://github.com/X-PLUG/MobileAgent/tree/main/Mobile-Agent-v3</a> (models open-sourced)</li></ul><div class="notion-blank notion-block-266c0110d33180458143f142ec254b8c"> </div><div 
class="notion-text notion-block-260c0110d33180ae9370f9236dc16cb4">A research team at Alibaba's Tongyi Lab has released the <b>Mobile-Agent-v3</b> framework and its core model, <b>GUI-Owl</b>. This article examines the techniques and methods behind Mobile-Agent-v3 and what makes it a significant step forward for GUI automation.</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-260c0110d3318020b517ec0ca4c47f90" data-id="260c0110d3318020b517ec0ca4c47f90"><span><div id="260c0110d3318020b517ec0ca4c47f90" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d3318020b517ec0ca4c47f90" title="GUI-Owl: An End-to-End Multimodal GUI Agent"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>GUI-Owl: An End-to-End Multimodal GUI Agent</b></span></span></h3><div class="notion-text notion-block-260c0110d3318002b10aecc38a558ba2">At the core of the Mobile-Agent-v3 framework is <b>GUI-Owl</b>, an end-to-end multimodal agent model designed for GUI automation. It aims to <b>unify UI perception, element grounding, complex reasoning, task planning, and action execution in a single policy network, which in effect makes it a unified </b><span class="notion-red"><b>Agent Model</b></span>. GUI-Owl is fine-tuned from <b>Qwen2.5-VL</b> and post-trained on a large, diverse corpus of GUI interaction data. This lets it interact with graphical user interfaces across operating systems: Android on mobile, and Ubuntu, macOS, and Windows on desktop. As a result, GUI-Owl can carry out multi-turn GUI tasks autonomously. It can also generalize to specific applications such as question answering, image captioning, task planning, and element grounding.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d3318075aa13d5cc8a635ce6"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A19db9d64-3492-4f15-b905-cac180a02a90%3Aimage.png?table=block&amp;id=266c0110-d331-8075-aa13-d5cc8a635ce6&amp;t=266c0110-d331-8075-aa13-d5cc8a635ce6" alt="notion image" 
loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-260c0110d331804d86cfecb681630514">Much like a human, GUI-Owl looks at the current screenshot (its observation of the environment) and reviews its past actions. From these it works out the current state and decides what to do next, emitting actions in Qwen's function-calling format. At each decision step, the model selects the most suitable action from a predefined action space. Notably, GUI-Owl <b>first produces explicit reasoning</b> before executing any actual action. This improves its adaptability and helps it cope with dynamic, complex GUI environments. In addition, <b>to keep the conversation history from growing too long, the model generates a concise "conclusion" that summarizes the key information of the current step and stores it in the history context</b>, keeping long interactions efficient. Finally, the abstract actions GUI-Owl outputs are translated into concrete device commands, giving precise control over the GUI. Examples are ADB commands for Android devices and pyautogui code for desktop environments.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d331805ea958e8a261f416e9"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Afdb8da55-5fb0-4d56-9034-58ed093e9f69%3Aimage.png?table=block&amp;id=266c0110-d331-805e-a958-e8a261f416e9&amp;t=266c0110-d331-805e-a958-e8a261f416e9" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-260c0110d33180c39f2fdb240d35787b">Another important property of GUI-Owl is its flexibility within multi-agent frameworks. It <b>can complete tasks as a standalone agent. It can also serve as a specialized module in the Mobile-Agent-v3 framework, working with other agents on more complex, longer-horizon automation workflows</b>. This modularity makes it possible to build more advanced GUI automation systems.</div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-260c0110d33180bfa903dc3747209220" data-id="260c0110d33180bfa903dc3747209220"><span><div id="260c0110d33180bfa903dc3747209220" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d33180bfa903dc3747209220" title="Innovation 1: Large-Scale Environment Infrastructure and Self-Evolving Trajectory Production"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 
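001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>Innovation 1: Large-Scale Environment Infrastructure and Self-Evolving Trajectory Production</b></span></span></h4><div class="notion-text notion-block-260c0110d33180b3b2a8c297fe53c6f9">Traditional data collection for GUI automation relies on time-consuming, labor-intensive manual annotation, which severely limits the scale and diversity of training data. To remove this bottleneck, the Mobile-Agent-v3 team built a large-scale environment infrastructure. On top of it they built a <b>Self-Evolving GUI Trajectory Production</b> framework, one of the key reasons for Mobile-Agent-v3's strong performance.</div><div class="notion-text notion-block-260c0110d33180dfa95ae17d45fd0bf8">The infrastructure runs in the cloud. A large fleet of cloud phones and cloud computers deployed on Alibaba Cloud provides Android, Ubuntu, macOS, and Windows environments. Researchers can therefore collect GUI interaction data and train models at scale in virtual environments that are highly controllable yet dynamic. This greatly improves experimental throughput and data diversity.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d33180dc812dfcfec91f26ef"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Ace6d4482-8100-46c0-800c-fcaee0ed8671%3Aimage.png?table=block&amp;id=266c0110-d331-80dc-812d-fcfec91f26ef&amp;t=266c0110-d331-80dc-812d-fcfec91f26ef" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-260c0110d33180a0a461f79edc230526">The <b>Self-Evolving GUI Trajectory Production</b> pipeline is the core of this infrastructure. It is an automated loop that continuously generates and refines high-quality interaction data, and it consists of four key stages:</div><ol start="1" class="notion-list notion-list-numbered notion-block-260c0110d33180578709d321658e4dc2" style="list-style-type:decimal"><li><b>High-quality Query Generation</b>: The system mimics real user behavior to automatically generate diverse, challenging task queries. These queries cover a wide range of complex GUI scenarios, such as multi-step tasks inside a single app, tasks that span several apps, and tasks that require non-trivial logical judgment. Carefully designed query templates and generation strategies give the generated data both breadth and depth.</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-260c0110d33180f98bc0c22f8680ba80" style="list-style-type:decimal"><li><b>Model Rollouts</b>: In the virtual environments, GUI-Owl and Mobile-Agent-v3 act on the generated queries and produce interaction trajectories. The process is fully automated: the model attempts each task and records the observation (screenshot) and the executed action at every step. These trajectories capture the model's decisions and outcomes across scenarios, and all downstream data processing builds on them.</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-260c0110d33180ab965ddceff7d57857" style="list-style-type:decimal"><li><b>Rigorous Correctness Judgment</b>: A built-in evaluation mechanism rigorously judges whether each generated trajectory is correct. The judgment goes beyond a binary task-success check. It also assesses the trajectory's efficiency, its plausibility, and whether it matches the expected behavior. <b>Only high-quality interaction data that meets expectations is added to the training set</b>, so invalid or erroneous trajectories are filtered out before they can degrade the model.</li></ol><ol start="4" class="notion-list notion-list-numbered notion-block-260c0110d3318029906de9554a1beb70" style="list-style-type:decimal"><li><b>Query-specific Guidance Generation</b>: This module uses successful trajectories to create guidance that improves model performance, in three steps. <b>(1) Action description</b>: a VLM describes the outcome of each action in a reference trajectory. Its inputs are the screenshots before and after the action plus the action decision. For coordinate-based actions, the authors highlight the interaction point to help the VLM's analysis. <b>(2) Quality control</b>: for model-generated trajectories, the VLM checks each step against the model's stated rationale and filters out suboptimal actions. <b>(3) Guidance synthesis</b>: the action descriptions are concatenated and fed to an LLM. The LLM summarizes the key steps needed to complete the query and produces query-specific guidance. This guidance helps the model understand the task and produce better trajectories in later rollouts. For example, when the model gets stuck at a particular step, extra hints or demonstrations can steer it past the obstacle so that it completes a successful trajectory.</li></ol><div class="notion-text notion-block-260c0110d3318021a87dcad7696a232e">This self-evolving mechanism forms a positive feedback loop: the model generates data, the data trains the model, and the improved model generates higher-quality data. The loop substantially reduces reliance on manual annotation. It also lets Mobile-Agent-v3 keep learning and adapting to new GUI environments and tasks. The paradigm offers a meaningful answer to the long-standing data-scarcity problem in GUI automation.</div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-260c0110d3318031be99d22fa244032d" data-id="260c0110d3318031be99d22fa244032d"><span><div id="260c0110d3318031be99d22fa244032d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d3318031be99d22fa244032d" title="Innovation 2: Building Diverse Foundational Agent Capabilities"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 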
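4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>Innovation 2: Building Diverse Foundational Agent Capabilities</b></span></span></h4><div class="notion-text notion-block-266c0110d331801988d9f32f137e1ee5">GUI-Owl can interact with GUIs on its own as a native agent. It also provides a range of foundational capabilities that downstream applications can call independently or integrate into multi-agent frameworks. To this end, the authors collected and constructed datasets for each capability (e.g., <b>grounding, image captioning, and planning</b>). During training, <b>these datasets are mixed with general instruction data.</b> The authors find that the model <b>gains zero-shot GUI question answering as well as general instruction following on unseen tasks.</b></div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d331803d935af1997de63424"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A9ea0bff2-9804-41d8-be9f-4aea85a0ca63%3Aimage.png?table=block&amp;id=266c0110-d331-803d-935a-f1997de63424&amp;t=266c0110-d331-803d-935a-f1997de63424" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-260c0110d3318003868af79806fe0b5b">To give GUI-Owl stronger generalization and adaptability, the team invested heavily in data construction. They introduced several downstream data pipelines to strengthen the agent's foundational UI capabilities. These capabilities are what let GUI-Owl understand complex GUI environments and carry out fine-grained operations.</div><ol start="1" class="notion-list notion-list-numbered notion-block-260c0110d331802182bfea80981d2b3a" style="list-style-type:decimal"><li><b>UI element grounding pipeline</b>:</li><ol class="notion-list notion-list-numbered notion-block-260c0110d331802182bfea80981d2b3a" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-260c0110d33180cd9ed8e8556c097a98"><li><b>Goal</b>: enable GUI-Owl to precisely identify and locate any UI element on screen. The element may be described by its function (e.g., "the submit button"), its appearance (e.g., "the blue box"), or its layout (e.g., "the icon in the top-left corner").</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d3318043838ef15559d8bc5d"><li><b>Implementation</b>: the model is trained on a dataset of many UI elements together with their positions and attributes. It learns to associate natural-language descriptions with visual elements on screen. This includes precise recognition and bounding-box localization of buttons, text fields, images, links, and other components. The pipeline <b>also supports fine-grained word- and character-level grounding</b>: the model can locate not just a button but the specific text on it. This is critical for tasks that require precise text interaction.</li></ul><ul class="notion-list notion-list-disc 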
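notion-block-260c0110d331800d90c2d21c32fc10ab"><li><b>Why it matters</b>: <b>precise grounding is the foundation of GUI automation</b>. Without it, the agent cannot know which element to act on or understand what each component on the screen means.</li></ul></ol></ol><ol start="2" class="notion-list notion-list-numbered notion-block-260c0110d33180248ef2d496269dd639" style="list-style-type:decimal"><li><b>Task planning pipeline</b>:</li><ol class="notion-list notion-list-numbered notion-block-260c0110d33180248ef2d496269dd639" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-260c0110d33180fbbd73c13b22be738b"><li><b>Goal</b>: enable GUI-Owl to decompose complex, long-horizon tasks into a sequence of executable sub-steps and to understand the logical relationships between them, so that tasks are completed efficiently.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d33180c6a4eecebfd04cc697"><li><b>Implementation</b>: the team distilled procedural knowledge from a large number of successful historical trajectories. They combined it with the reasoning ability of LLMs to build a task-planning dataset. The data maps high-level goals to concrete operation sequences. For example, "book a flight" might be decomposed into "open the booking app", "choose origin and destination", "choose a date", and "choose a flight". This teaches the model effective paths and strategies for completing tasks in different scenarios. It also covers cross-app tasks, such as extracting information from an email and then searching for it in a browser.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d33180eebb3ae458305b76b3"><li><b>Why it matters</b>: <b>planning is key to handling complex tasks</b>. It determines whether the agent can carry out multi-step operations efficiently and accurately, especially when many rounds of interaction and state transitions are involved.</li></ul></ol></ol><ol start="3" class="notion-list notion-list-numbered notion-block-260c0110d331807387aaf80eb35e4659" style="list-style-type:decimal"><li><b>Action semantics pipeline</b>:</li><ol class="notion-list notion-list-numbered notion-block-260c0110d331807387aaf80eb35e4659" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-260c0110d33180dda1bff93a2736d9a0"><li><b>Goal</b>: enable GUI-Owl to understand the interface changes and potential effects of each action it executes, building a deeper understanding of its environment.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d331805794a1cb6002a7e741"><li><b>Implementation</b>: UI observations (screenshots) are captured before and after each action, and from them the model learns the causal relationship between actions and state transitions. For example, after a button is clicked, a dialog may pop up, content may refresh, or the page may navigate elsewhere. The model must recognize that its click caused the change and be able to predict the likely outcomes of different actions. This <b>helps the model build an internal world model of the GUI.</b></li></ul><ul class="notion-list notion-list-disc 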
notion-block-260c0110d33180c9afd2cf316c425618"><li><b>Why it matters</b>: <b>understanding action semantics supports deeper reasoning and reflection</b>. It helps the model avoid ineffective operations and correct itself when errors occur, which makes the agent more robust and adaptable.</li></ul></ol></ol><div class="notion-text notion-block-260c0110d33180249f73d419e7d2f2d6">Beyond these three core capabilities, the team paid particular attention to <b>Reasoning and Reflecting</b>. They generated rich reasoning and reflection data with several synthesis techniques: offline hint-guided rejection sampling, distillation from a multi-agent framework, and iterative online rejection sampling. This supervision lets GUI-Owl reason on its own. It also lets GUI-Owl collaborate with other agents in a multi-agent framework such as Mobile-Agent-v3, adapting its reasoning style to the role it plays there. The result is greater adaptability and robustness in unfamiliar or complex situations, and the ability to tackle more challenging open-ended tasks.</div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-260c0110d3318010a7c0c5a91a1a22d4" data-id="260c0110d3318010a7c0c5a91a1a22d4"><span><div id="260c0110d3318010a7c0c5a91a1a22d4" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d3318010a7c0c5a91a1a22d4" title="Innovation 3: Scalable Environment Reinforcement Learning and TRPO"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>Innovation 3: Scalable Environment Reinforcement Learning and TRPO</b></span></span></h4><div class="notion-text notion-block-260c0110d33180cea56eca8b1c139a1b">To further improve GUI-Owl on real-world GUI automation tasks, the team applied reinforcement learning. They built a highly scalable training framework around a unified multi-task training interface. The interface standardizes interaction for both single-turn reasoning tasks and multi-turn agentic tasks, so the model can learn tasks of varying complexity under one paradigm.</div><div class="notion-text 
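notion-block-260c0110d3318084a79ac9c522ca019f">A key innovation of the framework is <b>decoupling experience generation from policy updates</b>. The model can interact with environments to generate experience (operation trajectories) while policy updates proceed independently. This decoupling gives fine-grained control over policy adherence and makes training more flexible and efficient. Crucially, it enables <b>fully asynchronous training</b>, which greatly speeds up training. It also better aligns the model's decisions with how real users work, improving practicality and generalization.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d33180c2ab73ee6d67165ce1"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A3b79cf3d-e26a-4df1-a937-baae5dc08443%3Aimage.png?table=block&amp;id=266c0110-d331-80c2-ab73-ee6d67165ce1&amp;t=266c0110-d331-80c2-ab73-ee6d67165ce1" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-266c0110d3318098b722faa64f5e6e55"> </div><div class="notion-text notion-block-260c0110d331800b8be5ca0d905540ad">Handling long, variable-length action sequences has long been a challenge in reinforcement learning, especially in online environments. Conventional RL methods also tend to be inefficient with sparse, delayed rewards. To address this, the paper introduces <b>Trajectory-aware Relative Policy Optimization (TRPO)</b>. It should not be confused with the classic Trust Region Policy Optimization, which shares the acronym.</div><div class="notion-text notion-block-266c0110d33180ff9a9afa65c2ae3078">TRPO optimizes a clipped, PPO-style objective at the token level:
<ul>
<li>the per-token terms are normalized by the total number of tokens in the batch;</li>
<li>each token is weighted by the trajectory-level advantage of the trajectory it belongs to;</li>
<li>each term is built from that token's probability ratio under the current policy versus the old policy.</li>
</ul>
This clipped objective stabilizes training while making effective use of the overall trajectory-level reward signal, which suits long-horizon GUI automation tasks.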
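The clipped objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation, and several details are assumptions:
<ul>
<li>The helper names trajectory_advantages and trpo_loss are hypothetical.</li>
<li>The paper says the summed accuracy and format reward is normalized into an advantage. Normalizing it by the batch mean and standard deviation is my assumption; the paper's exact scheme may differ.</li>
<li>The clip range of 0.2 is a common PPO default, not a value taken from the paper.</li>
</ul>
In the sketch, every token inherits its trajectory's advantage, and each per-token term is the usual PPO minimum of the unclipped and clipped ratio terms. The sum is then divided by the total token count of the batch.

```python
import math


def trajectory_advantages(rewards, eps=1e-6):
    """Turn one scalar reward per trajectory into a normalized advantage.

    Assumption (not from the paper): each reward is accuracy + format reward,
    normalized by the mean and standard deviation over the batch.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def trpo_loss(logp_new, logp_old, traj_ids, advantages, clip_eps=0.2):
    """Token-level clipped objective with trajectory-level advantages.

    logp_new, logp_old: log-probability of every generated token in the batch
        under the current and the old policy, flattened across trajectories.
    traj_ids: traj_ids[t] is the index of the trajectory token t came from.
    advantages: one advantage per trajectory.
    Returns the loss to minimize, i.e. the negated objective.
    """
    total = 0.0
    for lp_new, lp_old, k in zip(logp_new, logp_old, traj_ids):
        ratio = math.exp(lp_new - lp_old)          # pi_new(token) / pi_old(token)
        adv = advantages[k]                        # same advantage for every token of trajectory k
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)   # PPO-style pessimistic term
    return -total / len(logp_new)                  # divide by N, the batch token count


rewards = [1.0, 0.0, 1.0, 0.0]                      # hypothetical per-trajectory rewards
adv = trajectory_advantages(rewards)
logp = [-0.1, -0.2, -0.3, -1.0, -0.5]               # hypothetical token log-probs
loss = trpo_loss(logp, logp, [0, 0, 0, 1, 1], adv)  # old == new, so every ratio is 1
print(round(loss, 4))
```

In this example the old and new log-probabilities are identical, so every ratio is 1 and each token simply contributes its trajectory's advantage of about +1 or -1. Three tokens come from a rewarded trajectory and two from an unrewarded one, which gives a mean of about 0.2; the script prints -0.2. Because this sketch divides by the batch-wide token count instead of averaging within each trajectory first, longer trajectories carry proportionally more weight in the update.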
</div><div class="notion-text notion-block-266c0110d331808fba28f085671deec7">TRPO is a new RL algorithm with the following features:</div><ul class="notion-list notion-list-disc notion-block-260c0110d33180c89289f97801b20338"><li><b>Trajectory-level rewards</b>: instead of relying on per-step rewards alone, TRPO uses the reward of the entire trajectory to compute the advantage at every step. Specifically, it sums the trajectory's accuracy and format rewards, normalizes the sum into an advantage estimate, and assigns that same advantage uniformly to every action in the trajectory. This global view helps the model learn how long-term behavior affects the final outcome. That matters in GUI automation, where success usually depends on a whole sequence of correct operations.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d3318089b5cdc0116de14d57"><li><b>Replay buffer</b>: to improve the stability and data efficiency of RL, TRPO keeps a replay buffer of historically <b>successful trajectories</b> and samples from it during training. This helps break correlations between samples and reduces training variance. It also lets the model learn from a broader set of experiences, speeding up convergence and stabilizing the policy.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d3318054b3adda71c111425e"><li><b>Policy optimization objective</b>: given the high resolution of GUI screenshots, each full trajectory is split into single-step data instances for policy updates. The loss is scaled by the total number of steps in the original trajectory to keep optimization balanced. This keeps training effective for long operation sequences with rich visual input.</li></ul><div class="notion-text notion-block-260c0110d33180158be8fe817c2f17c1">With TRPO, GUI-Owl learns and refines its policy more effectively from real interactions, particularly on GUI automation tasks that require long sequences of complex operations. This RL capability helps Mobile-Agent-v3 handle the dynamics and uncertainty of real-world environments and gives a solid technical foundation for building robust GUI agents.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d331804aa49adeb022d12f4e"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Ae99e8cc6-6670-4caa-91ca-384a364acd18%3Aimage.png?table=block&amp;id=266c0110-d331-804a-a49a-deb022d12f4e&amp;t=266c0110-d331-804a-a49a-deb022d12f4e" alt="notion image" loading="lazy" decoding="async"/></div></figure><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-260c0110d3318028a893c3afa827cfe1" data-id="260c0110d3318028a893c3afa827cfe1"><span><div id="260c0110d3318028a893c3afa827cfe1" class="notion-header-anchor"></div><a 
class="notion-hash-link" href="#260c0110d3318028a893c3afa827cfe1" title="Mobile-Agent-v3: A Collaborative Multi-Agent Framework"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>Mobile-Agent-v3: A Collaborative Multi-Agent Framework</b></span></span></h3><div class="notion-text notion-block-260c0110d33180c794d5e96c205c2756">Mobile-Agent-v3 is more than a single GUI-Owl model: it is a carefully designed <b>multi-agent framework</b>. It coordinates several specialized agents to further boost GUI-Owl's performance on more complex, longer-horizon automation workflows. The framework splits a complex task among agents that each have their own responsibility and that collaborate to reach the goal.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d33180068d34fe37dd866c3c"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Ad705b4f3-1234-44ae-8dec-fe30a9efdca6%3Aimage.png?table=block&amp;id=266c0110-d331-8006-8d34-fe37dd866c3c&amp;t=266c0110-d331-8006-8d34-fe37dd866c3c" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-266c0110d33180148eb1ff80cb4b0a88">The Mobile-Agent-v3 framework consists of four core agents:</div><ol start="1" class="notion-list notion-list-numbered notion-block-260c0110d3318026adf7cfe4b1b816c9" style="list-style-type:decimal"><li><b>Manager Agent (M)</b>:</li><ol class="notion-list notion-list-numbered notion-block-260c0110d3318026adf7cfe4b1b816c9" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-260c0110d3318055810ee45d21a9444c"><li><b>Role</b>: strategic planner.</li></ul><ul class="notion-list notion-list-disc 
notion-block-260c0110d331806ca9ece935c7fc8945"><li><b>Responsibilities</b>: <b>decomposes the user's high-level instruction into an ordered list of subgoals.</b> It draws on external knowledge through a retrieval-augmented generation (RAG) module, using sources such as Wikipedia, search engines, and user-provided documents. It also <b>dynamically updates the plan based on execution results and feedback.</b> The Manager Agent can thus adjust later steps as the task progresses or runs into problems, keeping the task on track.</li></ul></ol></ol><ol start="2" class="notion-list notion-list-numbered notion-block-260c0110d33180b2bfffed7f8b8bd011" style="list-style-type:decimal"><li><b>Worker Agent (W)</b>:</li><ol class="notion-list notion-list-numbered notion-block-260c0110d33180b2bfffed7f8b8bd011" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-260c0110d331801ba5cfde5e7d265ff4"><li><b>Role</b>: tactical executor.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d331806fbdbcc13345124813"><li><b>Responsibilities</b>: selects and executes the most relevant actionable subgoal. It bases the choice on the subgoals from the Manager Agent, the current GUI state, historical feedback, and accumulated notes. It outputs an action tuple containing its <b>thought, the concrete action command, and a summary of the current step</b>. The Worker Agent is the agent that interacts directly with the GUI environment, turning GUI-Owl's capabilities into actual GUI operations.</li></ul></ol></ol><ol start="3" class="notion-list notion-list-numbered notion-block-260c0110d331802ba39de39fadd79960" style="list-style-type:decimal"><li><b>Reflector Agent (R)</b>:</li><ol class="notion-list notion-list-numbered notion-block-260c0110d331802ba39de39fadd79960" style="list-style-type:lower-alpha"><ul class="notion-list notion-list-disc notion-block-260c0110d3318009b5d7cea9c991b786"><li><b>Role</b>: self-correction mechanism.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d331808db0d1c33aac403043"><li><b>Responsibilities</b>: evaluates the outcome of each action the Worker Agent executes. It compares the Worker Agent's expected outcome with the actual change in the interface and classifies the result as successful, neutral, or harmful. It then produces detailed causal feedback. Its key contribution is real-time, actionable feedback that helps the system detect and correct errors, improving overall robustness and learning efficiency.</li></ul></ol></ol><ol start="4" class="notion-list notion-list-numbered notion-block-260c0110d33180a6a7a9d0e1c1530582" style="list-style-type:decimal"><li><b>Notetaker Agent (C)</b>:</li><ol class="notion-list notion-list-numbered notion-block-260c0110d33180a6a7a9d0e1c1530582" style="list-style-type:lower-alpha"><ul class="notion-list 
notion-list-disc notion-block-260c0110d33180c1b8cefca7808c790d"><li><b>Role</b>: maintainer of persistent contextual memory.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d331809583fad9ad419c13b4"><li><b>Responsibilities</b>: triggered only when the Reflector Agent judges a step successful or neutral. It extracts key elements from the current screen and stores them as notes, maintaining a persistent contextual memory. The accumulated notes (key UI elements, task progress, and so on) support the Manager Agent's future planning and the Worker Agent's execution. They prevent redundant exploration and the loss of critical information.</li></ul></ol></ol><div class="notion-text notion-block-260c0110d3318038b80eeedced533fc1">The framework runs as a loop, starting from the user instruction:
<ol>
<li>the Manager Agent initializes the plan;</li>
<li>the Worker Agent executes an action;</li>
<li>the Reflector Agent evaluates the result;</li>
<li>the Notetaker Agent updates memory;</li>
<li>the Manager Agent revises the plan based on the feedback.</li>
</ol>
The loop repeats until the task is complete or a predefined stopping condition is met. This collaboration lets Mobile-Agent-v3 handle complex, long-horizon tasks that need continual adaptation and are hard for a single agent. It shows the strong potential of multi-agent systems for GUI automation.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d33180d8ac90d264c08e6341"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Aea55f669-d82a-48ae-94e6-713299ccdcc3%3Aimage.png?table=block&amp;id=266c0110-d331-80d8-ac90-d264c08e6341&amp;t=266c0110-d331-80d8-ac90-d264c08e6341" alt="notion image" loading="lazy" decoding="async"/></div></figure><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-260c0110d33180a4bad1e1b333132d6c" data-id="260c0110d33180a4bad1e1b333132d6c"><span><div id="260c0110d33180a4bad1e1b333132d6c" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d33180a4bad1e1b333132d6c" title="Outstanding Performance"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 
4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>Outstanding Performance</b></span></span></h3><div class="notion-text notion-block-260c0110d331801289a0de58e253553d">The Mobile-Agent-v3 framework and its core model GUI-Owl perform strongly on several mainstream GUI automation benchmarks, demonstrating GUI-Owl's capability as a foundational agent. The benchmarks cover UI element grounding, single-step decision-making, general question answering, and interaction in online environments. The paper's experiments validate both GUI-Owl and Mobile-Agent-v3. They also analyze how much each technical innovation contributes to performance.</div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-260c0110d33180d08bf2f4430e0ce61b" data-id="260c0110d33180d08bf2f4430e0ce61b"><span><div id="260c0110d33180d08bf2f4430e0ce61b" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d33180d08bf2f4430e0ce61b" title="1. End-to-End Model Performance: GUI-Owl Takes the Lead"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>1. 
End-to-End Model Performance: GUI-Owl Takes the Lead</b></span></span></h4><div class="notion-text notion-block-260c0110d33180628bbcd80ff67a21c8">As an end-to-end multimodal GUI agent, GUI-Owl sets a new state of the art (SOTA) among open-source models. The paper evaluates it mainly on two key benchmarks:</div><ul class="notion-list notion-list-disc notion-block-260c0110d331800c9d4bc04c1b932b53"><li><b>AndroidWorld</b>: a benchmark of GUI automation tasks on Android mobile devices. GUI-Owl-7B reaches a <b>66.4%</b> success rate, well ahead of other open-source models of similar size.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d331806c85a7ef2136705769"><li><b>OSWorld</b>: a benchmark of GUI automation tasks on desktop operating systems (e.g., Ubuntu, macOS, Windows). GUI-Owl-7B reaches a <b>29.4%</b> success rate, again showing strong generalization.</li></ul><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d33180a7846adec5dcde23a4"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:528px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Af061b1ac-169d-4144-bbfd-50e153089092%3Aimage.png?table=block&amp;id=266c0110-d331-80a7-846a-dec5dcde23a4&amp;t=266c0110-d331-80a7-846a-dec5dcde23a4" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-260c0110d33180f5a63ff2abe6635c2b">Notably, when GUI-Owl is combined with the Mobile-Agent-v3 framework, performance improves further:</div><ul class="notion-list notion-list-disc notion-block-260c0110d33180e0a7e0e55d847cae40"><li>On <b>AndroidWorld</b>, the success rate rises to <b>73.3%</b>.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d33180ff9978f6b2478a978a"><li>On <b>OSWorld</b>, the success rate rises to <b>37.7%</b>.</li></ul><div class="notion-text notion-block-260c0110d33180fb8598dd0f74a4d598">These gains of 6.9 and 8.3 percentage points show how much the multi-agent framework helps with coordinating and executing complex tasks, letting GUI-Owl realize more of its potential.</div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-260c0110d33180e5a931fde2c3896b09" data-id="260c0110d33180e5a931fde2c3896b09"><span><div id="260c0110d33180e5a931fde2c3896b09" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d33180e5a931fde2c3896b09" 
title="2. Beyond Proprietary Models: The Strength of GUI-Owl-32B"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>2. Beyond Proprietary Models: The Strength of GUI-Owl-32B</b></span></span></h4><div class="notion-text notion-block-260c0110d331802fb2faf7cc39abbf94">Beyond comparisons with open-source models, the paper shows that GUI-Owl-32B, the larger model, even outperforms several strong proprietary models. This is a significant milestone for GUI automation:</div><ul class="notion-list notion-list-disc notion-block-260c0110d331804c9554ca1a25a9c77e"><li><b>MMBench-GUI</b>: a comprehensive benchmark of GUI understanding and interaction. GUI-Owl-32B performs best here, outperforming all compared models, including <b>GPT-4o</b> and <b>Claude 3.7</b>.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d33180f395f5d76704758135"><li><b>AndroidControl</b>: a benchmark focused on Android device control. GUI-Owl-32B also leads on this benchmark, further demonstrating its ability to handle complex mobile GUI operations.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d331802f8bece8e943a7d22e"><li><b>UI element grounding</b>: on dedicated UI element grounding evaluations (such as ScreenSpot V2/Pro, OSWorld-G, and MMBench-GUI L2), GUI-Owl-32B surpasses all open-source models of the same size. It is also highly competitive with proprietary models, thanks to its carefully designed grounding training pipeline.</li></ul><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d33180aa8469ceff80571e74"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A88476b16-0823-4939-bb97-e5458ff081b9%3Aimage.png?table=block&amp;id=266c0110-d331-80aa-8469-ceff80571e74&amp;t=266c0110-d331-80aa-8469-ceff80571e74" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text 
notion-block-260c0110d33180449720e6f2b3b564d1">These results show that GUI-Owl sets a new bar among open-source models and opens up new possibilities for GUI automation as a whole. They demonstrate that, with large-scale data and advanced training methods, open-source models can match or even surpass closed-source commercial models.</div><div class="notion-text notion-block-266c0110d3318064bb7efb951789dd49">&lt;ins/&gt;</div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-260c0110d3318005bfeff75bfa7d1472" data-id="260c0110d3318005bfeff75bfa7d1472"><span><div id="260c0110d3318005bfeff75bfa7d1472" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d3318005bfeff75bfa7d1472" title="3. Ablation Studies and Key Technical Contributions"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>3. 
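Ablation Studies and Key Technical Contributions</b></span></span></h4><div class="notion-text notion-block-260c0110d33180e0abd8dda7b43b7cef">The paper also runs detailed ablation studies to verify the effectiveness of each key technical component of Mobile-Agent-v3:</div><ul class="notion-list notion-list-disc notion-block-260c0110d331800ebb1dff9cc41fb2b3"><li><b>Effectiveness of TRPO</b>: the results show that Trajectory-aware Relative Policy Optimization (TRPO) significantly improves the model's performance in online environments. For example, on the OSWorld-Verified benchmark, TRPO raises the success rate from <b>27.1%</b> to over <b>34.9%</b>, highlighting its strength in handling sparse rewards and long action sequences.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d3318096bf68d13ad7c73684"><li><b>Online filtering, replay buffer, and experience management</b>: the ablations confirm that these mechanisms are essential for stable and efficient training. Online filtering keeps the training data high-quality. The replay buffer reuses past experience and reduces variance during training, so the model learns more stably.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d33180a9972ac2eb611362f9"><li><b>Number of history images and interaction step budget</b>: performance is positively correlated with both the number of history images provided and the interaction step budget. In other words, richer context and longer allowed interaction sequences help the model make more accurate decisions.</li></ul><ul class="notion-list notion-list-disc notion-block-260c0110d33180ffb97ef8fb7346f3c2"><li><b>Reasoning data synthesis</b>: the paper analyzes how its reasoning data synthesis strategies (offline hint-guided rejection sampling, distillation from the multi-agent framework, and iterative online rejection sampling) improve GUI-Owl's reasoning. Together, these methods progressively strengthen GUI-Owl's reasoning and enable it to handle more complex logic and tasks.</li></ul><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d33180ffad17cd32b1525888"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A8827d478-93bc-46aa-9c76-ebbd490c1c95%3Aimage.png?table=block&amp;id=266c0110-d331-80ff-ad17-cd32b1525888&amp;t=266c0110-d331-80ff-ad17-cd32b1525888" alt="notion image" loading="lazy" decoding="async"/></div></figure><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-266c0110d33180c993bedc9c055a6b1e"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" 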
src="https://www.notion.so/image/attachment%3Ac3f40de9-681d-49bf-8079-efcf5d48c6b3%3Aimage.png?table=block&amp;id=266c0110-d331-80c9-93be-dc9c055a6b1e&amp;t=266c0110-d331-80c9-93be-dc9c055a6b1e" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-260c0110d33180ae96d4ded29a40cb4c">These in-depth experiments demonstrate the strong performance of Mobile-Agent-v3 and GUI-Owl. They also offer valuable lessons and directions for future GUI automation research and show how central the proposed methods are to improving agent capabilities.</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-260c0110d33180fb9757eede35308eb3" data-id="260c0110d33180fb9757eede35308eb3"><span><div id="260c0110d33180fb9757eede35308eb3" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d33180fb9757eede35308eb3" title="Conclusion and Outlook"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title"><b>Conclusion and Outlook</b></span></span></h3><div class="notion-text notion-block-260c0110d33180ae8107c4edcffc5f3a">The release of the Mobile-Agent-v3 framework and its core model GUI-Owl brings fresh momentum to GUI automation. The team innovated in three areas: <b>large-scale environment infrastructure</b>, <b>building diverse foundational agent capabilities</b>, and <b>scalable environment reinforcement learning</b>. Through this work, they raised the performance ceiling of GUI agents. More importantly, they laid a solid foundation for general-purpose agents that can truly understand and operate complex GUI environments.</div><div class="notion-text notion-block-260c0110d33180419cf3d5bb5bcd65fc">As an end-to-end multimodal agent, GUI-Owl performs strongly across platforms and tasks and integrates flexibly into multi-agent frameworks. This suggests that GUI automation will no longer be limited to simple repetitive tasks. Instead, it can take on more challenging scenarios that require complex reasoning and planning. In particular, the proposed “self-evolving GUI trajectory production” framework offers a sustainable path to collecting high-quality, large-scale GUI interaction data. It may help relieve the data bottleneck that has long constrained the field.</div><div class="notion-blank notion-block-260c0110d3318199bc6df8567844a256"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-260c0110d33181a289b1c7f37a4cd686" data-id="260c0110d33181a289b1c7f37a4cd686"><span><div 
id="260c0110d33181a289b1c7f37a4cd686" class="notion-header-anchor"></div><a class="notion-hash-link" href="#260c0110d33181a289b1c7f37a4cd686" title="AI Agents Knowledge Planet"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">AI Agents Knowledge Planet</span></span></h2><div class="notion-text notion-block-260c0110d331813aa949d710393c2548">GUI agent technology is evolving rapidly. Want to keep up with the cutting edge of GUI/AI agents? Our Knowledge Planet community <b>introduces the latest agent projects and tools and explains new papers in video form</b> to broaden your technical horizons. Come and join us!</div><div class="notion-sync-block notion-block-260c0110d3318122956bc21e46f27afe"><div class="notion-row notion-block-260c0110d3318190af65dea86b9b9fd5"><div class="notion-column notion-block-260c0110d331813fb001fda49eb7704b" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5)"><div class="notion-text notion-block-260c0110d331813d90b1edfac74c4386">Join the Knowledge Planet to get members-only videos every week 👇</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-260c0110d331810b9744c1a7d8ae37ce"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Ad1adc9e1-7b6a-453a-a9f3-2e2e826a3b09%3Aimage.png?table=block&amp;id=260c0110-d331-810b-9744-c1a7d8ae37ce&amp;t=260c0110-d331-810b-9744-c1a7d8ae37ce" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-260c0110d33181af8931e117a0b1e674"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-260c0110d33181ec84e2e96dcd5d14fb" style="width:calc((100% - (1 * 
min(32px, 4vw))) * 0.5)"><div class="notion-text notion-block-260c0110d33181e185d3d984396a0c0c">Scan the code to add our WeChat assistant as a friend with the note “agent”; the assistant periodically invites new friends to the group 👇</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-260c0110d33181c792dac1caa43a8fe7"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2Feed2e715-74df-4361-bd15-bc084f0d791f%2Fimage.png?table=block&amp;id=260c0110-d331-81c7-92da-c1caa43a8fe7&amp;t=260c0110-d331-81c7-92da-c1caa43a8fe7&amp;width=337.741455078125&amp;cache=v2" alt="notion image" loading="lazy" decoding="async"/></div></figure></div><div class="notion-spacer"></div></div></div><div class="notion-sync-block notion-block-151c0110d33180b3ba16fe7b239b5be6"><div class="notion-text notion-block-151c0110d33180d28b7bec1e1f98963b"><b>Members-only videos currently available include:</b></div><ul class="notion-list notion-list-disc notion-block-2b9c0110d33180f3ab7cfd5dbdd069b3"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1fcSEBMEzr">Context Engineering in AI Agents</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-266c0110d331804eac48ca0f99b0ade2"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV13wuozDExH">A Survey of the Latest GUI Agent Techniques (2025)</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-266c0110d3318066b907e494b7b212d4"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1EeTvzaEBw">Latest GUI Agent Techniques: MONDAY, Automatically Building GUI Agent Trajectory Data from Videos</a></b></span></li></ul><ul class="notion-list notion-list-disc 
notion-block-1e4c0110d33180cbb9eaf9db68f4ced0"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1MVG1zsEB7">Latest GUI Agent Techniques: InfiGUI-R1, Advancing from Reactive Execution to Deliberative Reasoning</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-1e4c0110d3318097a0b3dc7029483852"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1bmdzYzEty">Latest GUI Agent Techniques: What Can Autonomous Driving and Embodied AI Teach Us?</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-1b3c0110d3318065a5b1ec3028aaefdf"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1uyRhY2EFi">Latest GUI Agent Techniques: ATLaS, Improving Both Training Efficiency and Generalization</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-1acc0110d3318034a55ae1fe3e709812"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1gm96YrEQY">GUI Agent Tech Talk: DigiQ/VEM, Using RL to Improve Generalization</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-19cc0110d331806580dec9dc260154e7"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1pPFceFE42">UI Agent Tech Talk: UI-TARS, Iteratively Improving the Model with Long-Term Memory and Reflective Tuning</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-185c0110d331800f9687c1f6b3d5c83b"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1itwkerEyu">AI Agent Tech Talk: Insight-V, Exploring Long-Chain Visual Reasoning in VLMs</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-17ac0110d331809a9d2dc345f7df87f5"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1hZcGe1ELm">UI 
Agent Tech Talk: PC-Agent, Enhancing Model Cognition to Better Complete Complex Tasks</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d331801e9074e847bbeaaefd"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1aKrTY7EWB">UI Agent Tech Talk: OS-Genesis, Automatically Synthesizing High-Quality, Diverse Training Data</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-173c0110d33180fdad96fafb9cd8623d"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1u26hY5Eyw">UI Agent Tech Talk: PAE, Continuously Expanding Model Capabilities by Autonomously Exploring New Tasks</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d331804b9542c46d6f309176"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1dgCNYXEfa">UI Agent Tech Talk: Iris, Improving the Model with Automatically Constructed Data</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-165c0110d33180f79e4efbc20c6e3980"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV17VqfYfEjk">UI Agent Tech Talk: Falcon-UI, Pretraining UI Agent Models on Unsupervised Data</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-15cc0110d3318068b5f7cb3a2a4e3d24"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1erqxYhEBc">UI Agent Tech Talk: Aguvis, a Unified Training Dataset and Framework from HKU &amp; Salesforce</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d3318043887fd27956e15eb6"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1U86FY9E1G">UI Agent Tech Talk: ShowUI, Today's Best Open-Source UI Agent Model; Does It Work on Chinese Apps?</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180b98171fcbc2934024c"><li><span 
class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1pjBtYnE6C">UI Agent Tech Talk: Can World Models Improve UI Agents?</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180d0a0c2c69318bc6ff3"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV14eU7YWEEs">UI Agent Tech Talk: LiMAC from Huawei Noah's Ark Lab</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-151c0110d33180eda815f4f6a769ae7e"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.bilibili.com/video/BV1c1mpYtEqG">UI Agent Tech Talk: Auto-Intent from LG AI Research</a></b></span></li></ul><div class="notion-blank notion-block-151c0110d33180e3802cefd4b14a659c"> </div></div><div class="notion-text notion-block-260c0110d33181e8a837d385b0bf6c38">&lt;ins/&gt;</div><div class="notion-blank notion-block-260c0110d33181fcbe35eecbe2c65cd4"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Pix2Text's New Math Formula Detection and Recognition Models: V1.5]]></title>
            <link>https://www.breezedeus.com/article/pix2text-model-1.5</link>
            <guid>https://www.breezedeus.com/article/pix2text-model-1.5</guid>
            <pubDate>Thu, 24 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Pix2Text (P2T) releases the V1.5 series of its mathematical formula detection (MFD) and mathematical formula recognition (MFR) models, with further improved accuracy.]]></description>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-227c0110d33180038048c26f4f5274dc"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-row notion-block-227c0110d331816fab67dc37e6230222"><div class="notion-column notion-block-227c0110d33181e9a06febfb6688e5ab" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.375)"><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-236c0110d33180379807e0e256bd0112"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A68ca3659-00de-49f5-8e10-de9d3c20ba43%3Ap2t-v1.5.png?table=block&amp;id=236c0110-d331-8037-9807-e0e256bd0112&amp;t=236c0110-d331-8037-9807-e0e256bd0112" alt="notion image" loading="lazy" decoding="async"/></div></figure></div><div class="notion-spacer"></div><div class="notion-column notion-block-227c0110d33181c2b57bef810f7251e8" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.625)"><div class="notion-text notion-block-227c0110d331818d88b7d344acd90bae"><b>Contents:</b></div><div class="notion-table-of-contents notion-gray notion-block-227c0110d3318169bb02cf667ef18ffc"><a href="#236c0110d331804398b2cb244b23e10c" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">MFD V1.5 Models</span></a><a href="#227c0110d331812287d5eb1dd319406e" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">MFR V1.5 Models</span></a><a href="#227c0110d3318176bb1bf66d4cebd4ef" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Local Usage</span></a><a 
href="#236c0110d33180ae91abc54579145dc3" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Installation</span></a><a href="#236c0110d3318068955fcecf595d6cde" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Usage</span></a><a href="#227c0110d33181c78e3cd3412ef0da38" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">P2T Web Version</span></a><a href="#227c0110d33181bb91deca592d917b3c" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Online Demo</span></a><a href="#227c0110d33181178a27f3a6befb39a5" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Purchasing the Paid Models</span></a><a href="#227c0110d331816eb993c24e228c0b89" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Purchase Link</span></a><a href="#227c0110d33181eebe13e3293d94a452" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Instructions</span></a></div><div class="notion-blank notion-block-227c0110d3318164b82aedf65d8de0e3"> </div></div><div class="notion-spacer"></div></div><div class="notion-text notion-block-227c0110d3318153b626d86691f0291d"><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/pix2text">Pix2Text (P2T)</a></b></span><b> recognizes text and mathematical formulas in images and outputs the corresponding text and LaTeX expressions; its goal is to be a free, open-source Python alternative to </b><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mathpix.com/">Mathpix</a></b></span><b>. Pix2Text</b> released its first version about two and a half years ago and has now passed 2,500 GitHub stars 🌟. It is a typical slow-and-steady project.</div><div 
class="notion-text notion-block-227c0110d3318177b279d0b3571336d7">As I've said before, <b>Pix2Text </b>sticks to <b>a small-model, open-source approach: the models must be small enough to run on an ordinary CPU machine, and both the code and the base models are open source</b>. Higher-accuracy paid models are also available for purchase for personal or commercial use. <b>Pix2Text integrates layout analysis and table recognition models, so it can recognize layouts, tables, images, text, math formulas, and other content in an image and combine everything into Markdown output. P2T can also convert an entire PDF file (whose pages may be scanned images or any other format) into Markdown.</b> For details on how it works, see <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-v1.1">Pix2Text V1.1 Released, with PDF-to-Markdown Support</a></b></span>.</div><div class="notion-blank notion-block-227c0110d3318163a217f99dcd7e6a2e"> </div><div class="notion-text notion-block-227c0110d33181e684efcc5c78b9f10d">A year ago I released MFD and MFR models with a new architecture. They have remained the best-performing math formula detection and recognition models of their size. The open-source MFR model has now been downloaded more than <code class="notion-inline-code"><b>600K</b></code> times, which is a big motivation for me personally. I call these models version <code class="notion-inline-code"><b>V1.0</b></code>.</div><div class="notion-text notion-block-236c0110d3318069a271c171da55c884">I call the newly released MFD and MFR models version <code class="notion-inline-code"><b>V1.5</b></code>. Below I describe how the new models differ and how well they perform.</div><div class="notion-callout notion-gray_background_co notion-block-236c0110d3318009b3d0c8788ae79d1c"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text"><div class="notion-text notion-block-236c0110d33180a4a65fda8b73b103f8">Note: the newly released <b>model</b> version is <code class="notion-inline-code"><b>V1.5</b></code>, while the pix2text Python package version is still <code class="notion-inline-code"><b>V1.1.*</b></code>.</div></div></div><div class="notion-text notion-block-227c0110d331818ba2d1d2a7e4b486df">&lt;ins/&gt;</div><div class="notion-blank notion-block-236c0110d33180c1be58fe175b03683d"> </div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-236c0110d331804398b2cb244b23e10c" data-id="236c0110d331804398b2cb244b23e10c"><span><div id="236c0110d331804398b2cb244b23e10c" class="notion-header-anchor"></div><a class="notion-hash-link" 
href="#236c0110d331804398b2cb244b23e10c" title="MFD V1.5 Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">MFD V1.5 Models</span></span></h3><div class="notion-text notion-block-236c0110d3318088a58fd0ceda831c28">The previous MFD V1.0 model is a detector trained on the YOLOv8 architecture, while the new V1.5 models use the newer YOLO11 architecture.</div><div class="notion-text notion-block-236c0110d3318096b25dcbf88c034e76">We also expanded the detector's training data so that, compared with other open-source models, it adapts better to images with non-standard layouts (such as PPT slides and photos taken with a phone).</div><div class="notion-blank notion-block-236c0110d331802e8fccf05dc1c5bce0"> </div><div class="notion-text notion-block-236c0110d3318046a3dced55374e3518">Here is a comparison of the old and new MFD models:</div><table class="notion-simple-table notion-block-236c0110d33180728945cab40b27e81a"><tbody><tr class="notion-simple-table-row notion-simple-table-header-row notion-block-236c0110d33180a28961ccb184df9932"><td class="notion-simple-table-header-cell" style="width:200.1328125px"><div class="notion-simple-table-cell">Model name</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">Architecture</div></td><td class="" style="width:173px"><div class="notion-simple-table-cell">Availability</div></td></tr><tr class="notion-simple-table-row notion-block-236c0110d331802db71fcf24c56a1d52"><td class="notion-simple-table-header-cell" style="width:200.1328125px"><div class="notion-simple-table-cell"><span class="notion-gray">MFD-1.0 (MFD)</span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">yolov8m</div></td><td class="" style="width:173px"><div class="notion-simple-table-cell">Open source</div></td></tr><tr class="notion-simple-table-row 
notion-block-236c0110d331803ca881e9eedb1461f0"><td class="notion-simple-table-header-cell" style="width:200.1328125px"><div class="notion-simple-table-cell">MFD-1.5</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">yolo11m</div></td><td class="" style="width:173px"><div class="notion-simple-table-cell">Open source</div></td></tr><tr class="notion-simple-table-row notion-block-236c0110d33180af9e00e36a136b8d53"><td class="notion-simple-table-header-cell" style="width:200.1328125px"><div class="notion-simple-table-cell">MFD-ADVANCED-1.5</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">yolo11l</div></td><td class="" style="width:173px"><div class="notion-simple-table-cell">Knowledge Planet members only</div></td></tr><tr class="notion-simple-table-row notion-block-236c0110d33180268736ff8407ccb4ed"><td class="notion-simple-table-header-cell" style="width:200.1328125px"><div class="notion-simple-table-cell">MFD-PRO-1.5</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">yolo11x</div></td><td class="" style="width:173px"><div class="notion-simple-table-cell">Available for purchase</div></td></tr></tbody></table><div class="notion-blank notion-block-236c0110d33180f7a1b4fed779ec05a3"> </div><div class="notion-text notion-block-236c0110d33180958e8ae6d219ba1c72">The parameter count of each model:</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-236c0110d33180ad9b14fe5de14fe5e8"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Af2e1f51f-b59b-44a5-b1f8-410a5621ffdb%3Aimage.png?table=block&amp;id=236c0110-d331-80ad-9b14-fe5de14fe5e8&amp;t=236c0110-d331-80ad-9b14-fe5de14fe5e8" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-236c0110d3318025a8e9f047d49fed72"> </div><div class="notion-text 
notion-block-236c0110d3318092b366e7645f2e3e43">Results of each model on the validation set:</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-236c0110d331806cba59d715ef33a6d7"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A88e8ed13-9b3e-4405-8572-16793c645336%3Aimage.png?table=block&amp;id=236c0110-d331-806c-ba59-d715ef33a6d7&amp;t=236c0110-d331-806c-ba59-d715ef33a6d7" alt="notion image" loading="lazy" decoding="async"/></div></figure><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-236c0110d33180cebf89ee89f4373cd4"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A976456bb-6385-4dfc-9f9a-0ca2d0dc03ee%3Aimage.png?table=block&amp;id=236c0110-d331-80ce-bf89-ee89f4373cd4&amp;t=236c0110-d331-80ce-bf89-ee89f4373cd4" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-236c0110d33180bbbccdf7e67026186e"> </div><div class="notion-text notion-block-236c0110d3318012ac8aeaddd283cebe">As shown, the new V1.5 models (YOLO11 series) perform noticeably better than the V1.0 model.</div><div class="notion-blank notion-block-236c0110d33180b18a8fd753f41ca8c5"> </div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-227c0110d331812287d5eb1dd319406e" data-id="227c0110d331812287d5eb1dd319406e"><span><div id="227c0110d331812287d5eb1dd319406e" class="notion-header-anchor"></div><a class="notion-hash-link" href="#227c0110d331812287d5eb1dd319406e" title="MFR V1.5 Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 
1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">MFR V1.5 Models</span></span></h3><div class="notion-text notion-block-236c0110d331807ebc54e29b6f00f6b9">As with the previous MFR models (V1.0), <b>MFR V1.5</b> includes two models: <b>MFR-1.5</b> and <b>MFR-PRO-1.5</b>.</div><div class="notion-blank notion-block-236c0110d33180b2b99acc9cda45c84b"> </div><div class="notion-text notion-block-236c0110d3318052afa9d2ba07616773">MFR V1.5 uses the same architecture as V1.0, but the training process was improved in the following ways:</div><ul class="notion-list notion-list-disc notion-block-236c0110d33180089ebfcf38be3133ed"><li>The V1.0 model recognized formula images containing radicals (such as the one below) only moderately well; V1.5 addresses this by adding dedicated training data for such cases.</li><ul class="notion-list notion-list-disc notion-block-236c0110d33180089ebfcf38be3133ed"><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-236c0110d331800a9490e774b50dbe9a"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:247.9971466064453px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A91f5336a-d88f-4e6e-8cd0-470d6680445f%3AiShot_2025-06-04_11.23.58.png?table=block&amp;id=236c0110-d331-800a-9490-e774b50dbe9a&amp;t=236c0110-d331-800a-9490-e774b50dbe9a" alt="notion image" loading="lazy" decoding="async"/></div></figure></ul></ul><ul class="notion-list notion-list-disc notion-block-236c0110d33180bc93d1f93d226083ec"><li>The V1.0 model could recognize at most 512 tokens; V1.5 raises this limit to 1024 to better handle complex multi-line formulas.</li></ul><ul class="notion-list notion-list-disc notion-block-236c0110d33180fea660fbe233514360"><li>V1.5 adds more annotated images from real-world scenarios, further improving accuracy.</li></ul><div class="notion-blank notion-block-236c0110d33180758c51fa80b18d6960"> </div><div class="notion-text notion-block-236c0110d33180a299abd675016aba77">Below is each model's <b>CER (character error rate; lower is better)</b> on a manually selected test set.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image 
notion-block-227c0110d33180cc8c2bfbf697b0d071"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3A41e8e282-b16d-4dbe-9032-c88e023020b1%3Aimage.png?table=block&amp;id=227c0110-d331-80cc-8c2b-fbf697b0d071&amp;t=227c0110-d331-80cc-8c2b-fbf697b0d071" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-236c0110d33180a0aa2eed537619f032"> </div><div class="notion-text notion-block-236c0110d33180e6b4ddfa7f05ffed59">Because LaTeX is not unique (the same formula can be rendered from different LaTeX expressions), we also evaluated the outputs manually. An output counts as correct (score 1.0) if its rendered image matches the original image, and as incorrect (score 0.0) otherwise. Here are the overall scores of the different models on the test set (higher is better):</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-227c0110d331807f9daaeadb02d91c7a"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/attachment%3Ab9a02237-3e16-4c3c-b3aa-a680f9ae3bcf%3Aimage.png?table=block&amp;id=227c0110-d331-807f-9daa-eadb02d91c7a&amp;t=227c0110-d331-807f-9daa-eadb02d91c7a" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-227c0110d33180e4a091cc8695d7f0d8"> </div><div class="notion-text notion-block-227c0110d3318139b10ecb030afa3440">As the figure above shows, the MFR V1.5 models improve further over the V1.0 models.</div><div class="notion-blank notion-block-227c0110d33181b099e7d3772ac41906"> </div><div class="notion-blank notion-block-236c0110d33180599912da837e0ad57e"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-227c0110d3318176bb1bf66d4cebd4ef" data-id="227c0110d3318176bb1bf66d4cebd4ef"><span><div id="227c0110d3318176bb1bf66d4cebd4ef" class="notion-header-anchor"></div><a class="notion-hash-link" href="#227c0110d3318176bb1bf66d4cebd4ef" title="Local Usage"><svg 
viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Local Usage</span></span></h2><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-236c0110d33180ae91abc54579145dc3" data-id="236c0110d33180ae91abc54579145dc3"><span><div id="236c0110d33180ae91abc54579145dc3" class="notion-header-anchor"></div><a class="notion-hash-link" href="#236c0110d33180ae91abc54579145dc3" title="安装"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Installation</span></span></h3><div class="notion-text notion-block-236c0110d33180b197b6e083e746308e">If an older version of pix2text is already installed, update the related Python packages with the following command:</div><div class="notion-blank notion-block-236c0110d33180169162c86dcc3ffe63"> </div><div class="notion-text notion-block-236c0110d331804b8ffedd7844cf0cd3">If pix2text has not been installed yet, simply install the latest pix2text package with the following command:</div><div class="notion-blank notion-block-236c0110d331806d9a37dd7eb2dec1f9"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-236c0110d3318068955fcecf595d6cde" data-id="236c0110d3318068955fcecf595d6cde"><span><div id="236c0110d3318068955fcecf595d6cde" class="notion-header-anchor"></div><a class="notion-hash-link" href="#236c0110d3318068955fcecf595d6cde" title="使用"><svg 
viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Usage</span></span></h3><div class="notion-text notion-block-236c0110d33180d6a2d9d1469954a023">By default, the new version of pix2text uses the math formula detection model <code class="notion-inline-code"><b>mfd-1.5</b></code> and the math formula recognition model <code class="notion-inline-code"><b>mfr-1.5</b></code>:</div><div class="notion-blank notion-block-236c0110d331804e9d5eebc18f209aa5"> </div><div class="notion-text notion-block-236c0110d33180cb8c38eee8b1a9bdf8">The model files needed for inference are downloaded automatically, by default from the Hugging Face website. If Hugging Face is not reachable from your network, please wait a little longer; the system will automatically switch to a Hugging Face mirror site for the download.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-227c0110d33181c78e3cd3412ef0da38" data-id="227c0110d33181c78e3cd3412ef0da38"><span><div id="227c0110d33181c78e3cd3412ef0da38" class="notion-header-anchor"></div><a class="notion-hash-link" href="#227c0110d33181c78e3cd3412ef0da38" title="P2T 网页版"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">P2T Online Service</span></span></h2><div class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-blue_background notion-block-227c0110d3318109868efbfc419f0f1a" href="https://p2t.breezedeus.com/"><div><div 
class="notion-bookmark-title">Pix2Text (P2T) - Free Mathpix Alternative</div><div class="notion-bookmark-description">Use Pix2Text (P2T) to convert math formulas in images to text. Pix2Text is a free alternative to Mathpix that supports math formula recognition, LaTeX rendering, and export to various formats.</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img src="https://www.notion.so/image/https%3A%2F%2Fp2t.breezedeus.com%2Ffavicon.ico?table=block&amp;id=227c0110-d331-8109-868e-fbfc419f0f1a&amp;t=227c0110-d331-8109-868e-fbfc419f0f1a" alt="Pix2Text (P2T) - Free Mathpix Alternative" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://p2t.breezedeus.com/</div></div></div></a></div><div class="notion-blank notion-block-227c0110d331815fa6e8e06b543ee860"> </div><div class="notion-text notion-block-227c0110d33181df8fa3df34a786a3fb">Everyone can use the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com">P2T Online Service</a></b></span> for free, with up to 10,000 characters per person per day, which should be enough for normal use. <em>Please do not call the API in bulk: server resources are limited, and bulk calls prevent others from using the service.</em></div><div class="notion-blank notion-block-227c0110d33181b9a576eaa4fdfa3cca"> </div><div class="notion-text notion-block-227c0110d331813b81e8c76efc34e739">Because server resources are limited, the online service supports text OCR in only a few languages. To try P2T on other languages, please use the <b>Online Demo</b> below.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-227c0110d33181bb91deca592d917b3c" data-id="227c0110d33181bb91deca592d917b3c"><span><div id="227c0110d33181bb91deca592d917b3c" class="notion-header-anchor"></div><a class="notion-hash-link" href="#227c0110d33181bb91deca592d917b3c" title="在线 Demo"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 
00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Online Demo</span></span></h2><div class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-block-227c0110d331817f8851ddc7d72ef09a" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo"><div><div class="notion-bookmark-title">Pix2Text - a Hugging Face Space by breezedeus</div><div class="notion-bookmark-description">Discover amazing ML apps made by the community</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img src="https://www.notion.so/image/https%3A%2F%2Fhuggingface.co%2Ffavicon.ico?table=block&amp;id=227c0110-d331-817f-8851-ddc7d72ef09a&amp;t=227c0110-d331-817f-8851-ddc7d72ef09a" alt="Pix2Text - a Hugging Face Space by breezedeus" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://huggingface.co/spaces/breezedeus/Pix2Text-Demo</div></div></div><div class="notion-bookmark-image"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fcdn-thumbnails.huggingface.co%2Fsocial-thumbnails%2Fspaces%2Fbreezedeus%2FPix2Text-Demo.png?table=block&amp;id=227c0110-d331-817f-8851-ddc7d72ef09a&amp;t=227c0110-d331-817f-8851-ddc7d72ef09a" alt="Pix2Text - a Hugging Face Space by breezedeus" loading="lazy" decoding="async"/></div></a></div><div class="notion-blank notion-block-227c0110d3318130b225e24f1903a078"> </div><div class="notion-text notion-block-227c0110d33181f1856af19f46392f1a">You can use this <b>Online Demo</b> to try <b>P2T</b> on different languages. However, the Online Demo runs on low-end hardware, so it is relatively slow. For images in <b>Simplified Chinese or English</b>, we recommend the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com">P2T Online Service</a></b></span> instead.</div><div class="notion-callout notion-gray_background_co 
notion-block-227c0110d33181d5a06ed81f655b047b"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">If you cannot access Hugging Face directly, you can use this mirror instead: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf.qhduan.com/spaces/breezedeus/Pix2Text-Demo">https://hf-mirror.com/spaces/breezedeus/Pix2Text-Demo</a></span>.</div></div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-227c0110d33181178a27f3a6befb39a5" data-id="227c0110d33181178a27f3a6befb39a5"><span><div id="227c0110d33181178a27f3a6befb39a5" class="notion-header-anchor"></div><a class="notion-hash-link" href="#227c0110d33181178a27f3a6befb39a5" title="付费版模型购买"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Purchasing Paid Models</span></span></h2><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-227c0110d331816eb993c24e228c0b89" data-id="227c0110d331816eb993c24e228c0b89"><span><div id="227c0110d331816eb993c24e228c0b89" class="notion-header-anchor"></div><a class="notion-hash-link" href="#227c0110d331816eb993c24e228c0b89" title="购买链接"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 
0z"></path></svg></a><span class="notion-h-title">Purchase Links</span></span></h3><div class="notion-text notion-block-227c0110d33181ecb5f6cfd56b9ad3d6">In addition to the free, open-source models <code class="notion-inline-code"><b>MFD-1.5</b></code> and <code class="notion-inline-code"><b>MFR-1.5</b></code>, we also offer several paid V1.5 models. The purchase links for each paid model are listed below. A purchase includes only the ONNX versions of the models, not the PyTorch versions. Models purchased by individuals are for personal use only: commercial use is not permitted, and no invoice can be issued. Enterprise purchases can be invoiced; see the corresponding purchase page for the permitted scope of use.</div><table class="notion-simple-table notion-block-245c0110d33180edadaec7fbb9ae3472"><tbody><tr class="notion-simple-table-row notion-teal notion-simple-table-header-row notion-block-245c0110d3318070a056f6f6f617518f"><td class="notion-simple-table-header-cell" style="width:154.14630126953125px"><div class="notion-simple-table-cell"><b>Model Version</b></div></td><td class="" style="width:114.99857330322266px"><div class="notion-simple-table-cell"><b>Enterprise Purchase</b></div></td><td class="" style="width:145.99573516845703px"><div class="notion-simple-table-cell"><b>Individual Purchase</b></div></td><td class="" style="width:141.99431610107422px"><div class="notion-simple-table-cell"><b>Knowledge Planet Members</b></div></td><td class="" style="width:97.99999237060547px"><div class="notion-simple-table-cell"><b>Free Download</b></div></td></tr><tr class="notion-simple-table-row notion-block-245c0110d33180eea6dfe48482bf8e4c"><td class="notion-simple-table-header-cell" style="width:154.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFD-Advanced-1.5</b></code></div></td><td class="" style="width:114.99857330322266px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:145.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/ef80ff41-b113-4bf0-9516-7f44e49a6bba">Lemon Squeezy</a></b></span></div></td><td class="" style="width:141.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" 
rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">Free</a></b></span></div></td><td class="" style="width:97.99999237060547px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-245c0110d331801b9493fde422c390d6"><td class="notion-simple-table-header-cell" style="width:154.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFD-Pro-1.5</b></code></div></td><td class="" style="width:114.99857330322266px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:145.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=12805387&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">bilibili Mall</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/c8f2360b-cd46-4bf3-89be-6e1a2828137a">Lemon Squeezy</a></b></span></div></td><td class="" style="width:141.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">20% off the individual price</a></b></span></div></td><td class="" style="width:97.99999237060547px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-245c0110d33180869431e9abb20f0bea"><td class="notion-simple-table-header-cell" style="width:154.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFR-Pro-1.5</b></code></div></td><td class="" style="width:114.99857330322266px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:145.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=12805401&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">bilibili Mall</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/ab343594-fe6c-4f92-89c9-6c1682f84ff4">Lemon Squeezy</a></b></span></div></td><td class="" style="width:141.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">20% off the individual price</a></b></span></div></td><td class="" style="width:97.99999237060547px"><div class="notion-simple-table-cell">✖️</div></td></tr></tbody></table><div class="notion-blank notion-block-227c0110d33181cc8da9d84256e0dffa"> </div><div class="notion-sync-block notion-block-227c0110d33181e9876dfc73012970f7"><div class="notion-text notion-block-227c0110d331813384cfe8524080bda9"><b>Pix2Text V1.0+ offers two enterprise editions</b>. The differences in their entitlements are shown in the figure below. <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b> </b></span>is a one-time purchase; models released later must be purchased separately. <b>Enterprise Pro</b> may only be used internally within the company or to provide free services to others (e.g., by educational institutions); it may not be used to provide paid services. <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span><span class="notion-blue"><b> </b></span>includes free access to all new models released within one year of purchase. Besides the Pro models, <b>Enterprise Plus</b> also provides the <b>Plus</b> models, together with PyTorch versions of all models, so companies can fine-tune them on their own data or convert them to other model formats they need (such as CoreML). <b>Enterprise Plus</b> permits companies to offer paid services to others.</div><div class="notion-text notion-block-227c0110d3318199b4d5f45e2156367c">For more details, see the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com">Model Store</a></b></span> (each product's detail page has the full description).</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-227c0110d3318184af76d4f829006c35"><div 
style="position:relative;display:flex;justify-content:center;align-self:center;width:624px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2F1ac9ab05-09eb-4328-9657-c1b9526d5e1e%2FUntitled.jpeg?table=block&amp;id=227c0110-d331-8184-af76-d4f829006c35&amp;t=227c0110-d331-8184-af76-d4f829006c35&amp;width=624&amp;cache=v2" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-227c0110d3318136b6fcd312128e70d3"> </div><div class="notion-text notion-block-227c0110d331815d9104c0f02e3ef414"><b>Purchase links</b>: see the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com">Model Store</a></b></span> (each product's detail page has the full description).</div></div><div class="notion-blank notion-block-227c0110d33181188e43ccb1bc7e4695"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-227c0110d33181eebe13e3293d94a452" data-id="227c0110d33181eebe13e3293d94a452"><span><div id="227c0110d33181eebe13e3293d94a452" class="notion-header-anchor"></div><a class="notion-hash-link" href="#227c0110d33181eebe13e3293d94a452" title="使用说明"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Usage Instructions</span></span></h3><div class="notion-sync-block notion-block-29be5f9e1031465596c9a4184d014756"><div class="notion-text notion-block-417bd2e41a4940ea8bb9bba60ab7408e">First, make sure you can run <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">Pix2Text</a></b></span> successfully with the open-source models; otherwise, the paid models will not run for you either. For detailed installation and usage instructions, see the <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">Pix2Text</a></span> project documentation. If you run into problems, you can comment here or <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">join the group chat</a></b></span> to contact me, but <em><span class="notion-red">please note that helping you get the code running is not part of the services I provide as the Knowledge Planet owner</span></em> (see the <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/zsxq">Knowledge Planet description</a></span>).</div></div><div class="notion-blank notion-block-227c0110d331815092edfb3a1bcc991d"> </div><div class="notion-text notion-block-227c0110d331817ca4a2fa936e41bf27">After purchasing the <b>Enterprise Basic</b> edition from the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com">Model Store</a></b></span>, you can download two archive files for the models: the file named like <code class="notion-inline-code">*-mfd-</code> contains the MFD (math formula detection) model, and the file named like <code class="notion-inline-code">*-mfr-</code> contains the MFR (math formula recognition) model. Extracting the MFD archive yields a folder named <code class="notion-inline-code">*-onnx</code>; the file inside it is the model file, e.g. <code class="notion-inline-code">pix2text-mfd-pro-1.5.onnx</code>. Assume the path of <code class="notion-inline-code">pix2text-mfd-pro-1.5.onnx</code> is <code class="notion-inline-code">abc/def/mfd-pro-1.5-onnx/pix2text-mfd-pro-1.5.onnx</code>. Extracting the MFR archive yields a folder named <code class="notion-inline-code">mfr-pro-1.5-onnx</code>, which contains the model file and its configuration files. Assume the path of the <code class="notion-inline-code">mfr-pro-1.5-onnx</code> folder is <code class="notion-inline-code">abc/def/mfr-pro-1.5-onnx</code>.</div><div class="notion-blank notion-block-227c0110d33181a2a49acaadc25bae23"> </div><div class="notion-text notion-block-227c0110d331810fb07cf57302ce018e">Then pass the arguments as follows when initializing Pix2Text. After initialization, usage is exactly the same as with the open-source models, and the detection and recognition results have the same structure.</div><div class="notion-blank 
notion-block-227c0110d331815b81e5e30bd3c74641"> </div><div class="notion-text notion-block-227c0110d33181a7abf5dbfd605db996">If you purchased the <b>Enterprise Pro subscription</b>, more model files are available for download (currently 5). Besides the PyTorch version of MFR, they include the latest paid <b>CnOCR (text OCR)</b> models (in both ONNX and PyTorch versions), which recognize Chinese and English text better than the free models. The corresponding models can be passed in as follows.</div><div class="notion-callout notion-gray_background_co notion-block-227c0110d3318168b6fdfdef2f4dd4ee"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">Note: the <b>CnOCR</b> text models support only <b>English</b> and <b>Simplified Chinese</b>. To recognize text in other languages, do not use the CnOCR models; simply remove <code class="notion-inline-code">text_config</code> from the code above.</div></div><div class="notion-blank notion-block-227c0110d33181a4a115f9d385930e01"> </div><div class="notion-text notion-block-227c0110d331811aba3ce82967f16765"><b>The new Pix2Text Pro V1.5 models</b> have been deployed to the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com">P2T Online Service</a></b></span>, which you are welcome to use for free. If you have questions, you can comment here or <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/join-group#57c5ca3b4d9746ae8357af4f3316d8dc">join the group chat</a></b></span> to contact me. Thanks!</div><div class="notion-blank notion-block-227c0110d3318185a9a8ce9795e63f5b"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[P2T: Detailed Information]]></title>
            <link>https://www.breezedeus.com/article/pix2text</link>
            <guid>https://www.breezedeus.com/article/pix2text</guid>
            <pubDate>Mon, 26 Feb 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Pix2Text: an Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.]]></description>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-9db2343e2011404cb6e63d810e25c32e"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-row notion-block-adec00ddaace4e34bb673048e95b68ad"></div><div class="notion-row notion-block-5aaff80be992449b9dfd73320629d1c6"><div class="notion-column notion-block-8ec8886fd258450482eff557f4fd1710" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.41666666666666663)"><div class="notion-blank notion-block-0ef0036bfe1047acac95b1c3847594ef"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-1f0c61a1713f4c1aadd930e269c8454d" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.2500000000000001)"><div class="notion-text notion-block-763776e3295246479918dca3733338a7"><b>[English] |</b> <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/pix2text_cn"><b><span class="notion-blue">[中文版]</span></b></a></div></div><div class="notion-spacer"></div><div class="notion-column notion-block-dd0c0cad23aa47b7b8ad6b7243416773" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.3333333333333333)"><div class="notion-blank notion-block-77827d814dc448739802fc0419029835"> </div></div><div class="notion-spacer"></div></div><div class="notion-row notion-block-a10084b9ab9e4b3cb93c83d82dbe4ab7"><div class="notion-column notion-block-173a107dd2cd4bd898b9646e9396da30" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.125)"><div class="notion-blank notion-block-60b878ec095e4e1a8035083eb0613f70"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-e28044a6c6c243b49348644d96e279b6" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.7916666666666666)"><div class="notion-text notion-block-af6205a4f2b741148010a4ea7a6f6531"><span class="notion-blue"><b><a target="_blank" rel="noopener 
noreferrer" class="notion-link" href="https://pix2text.readthedocs.io">📖 Docs</a></b></span> | <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://pix2text.readthedocs.io/zh-cn/stable/install/">🛠️ Install</a></b></span> | <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com"> </a><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com">🖥️</a></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com"> Online Service</a></b></span><span class="notion-blue"><b> </b></span><b>| </b><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo">🛀🏻 Demo</a></b></span><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo"> </a></span>| <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">💬 Contact</a></b></span></div><div class="notion-blank notion-block-c5fe61c5c901463e9370fdc76651492d"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-9457b6a5c53642dba46bba0b53fc5f5e" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.08333333333333348)"><div class="notion-blank notion-block-7f4534162cb34bd49011a3f5d2e2429e"> </div></div><div class="notion-spacer"></div></div><div class="notion-row notion-block-1afb67ae6fdf49edb9acf62c21626060"><div class="notion-column notion-block-a57833d779ee46fea16cec2f36e1eebd" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.5625)"><div class="notion-text notion-block-aa9cbb35f05a46318dd14ce6c0aec1c3"><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" 
class="notion-link" href="https://github.com/breezedeus/Pix2Text">Pix2Text (P2T) </a></b></span><span class="notion-blue"><b> </b></span>aims to be a <b>free and open-source Python</b> alternative to <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mathpix.com/"><b>Mathpix</b></a>, and it can already accomplish <b>Mathpix</b>&#x27;s core functionality. <b>Pix2Text (P2T) can recognize layouts, tables, images, text, mathematical formulas, and integrate all of these contents into Markdown format. P2T can also convert an entire PDF file (which can contain scanned images or any other format) into Markdown format.</b> The text recognition engine of Pix2Text supports <code class="notion-inline-code"><b>80+</b></code><b> languages</b>, including <b>English, Simplified Chinese, Traditional Chinese, Vietnamese</b>, etc.</div></div><div class="notion-spacer"></div><div class="notion-column notion-block-a7134efc814c4521af6fe04aee0b91ce" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.43750000000000006)"><div class="notion-text notion-block-805ee5aae44a4468aa9e81d4f65edad6"><b>Contents:</b></div><div class="notion-table-of-contents notion-gray notion-block-8db905fde7ea473b894328ca3da652e2"><a href="#a22c3fafd0784507970ccc4e4a594fab" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Online Service</span></a><a href="#dea5752daf624ef6b5097c63d16cccc8" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Demo 🤗</span></a><a href="#abd662d2dde843eb9a44f5bb2c901852" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Documentation</span></a><a href="#1ef6e20ee19143ddaca4822480993710" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" 
style="display:inline-block;margin-left:0">Available Models</span></a><a href="#4520c3c6277c45d4b4a6b77c3b6d2e92" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Model Stores</span></a><a href="#7831261a2b4644bb882c054ae7ed9022" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Purchasing the Math Formula Detection (MFD) models</span></a><a href="#4ec758684c7a4c5f85fc01ba46934591" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Purchasing the Math Formula Recognition (MFR) models</span></a><a href="#067fbb588d514adabb059572162a55a3" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Usage Instructions After Purchase</span></a><a href="#33e3624955c64a9690d55eec5203d7b4" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Code Repo</span></a></div><div class="notion-blank notion-block-b3fb97796ba8404297b3c5f3a798edd1"> </div></div><div class="notion-spacer"></div></div><div class="notion-text notion-block-ebed3bc9c64649e0a305bab8c66889ef"><b>Pix2Text (P2T)</b> integrates the following models:</div><ul class="notion-list notion-list-disc notion-block-3828aae465954b27a43b52025724c7d1"><li><b>Layout Analysis Model</b>: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/breezedeus/pix2text-layout">breezedeus/pix2text-layout</a></span> (<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf-mirror.com/breezedeus/pix2text-layout">Mirror</a></span>).</li></ul><ul class="notion-list notion-list-disc 
notion-block-8d16bdb1ad07466496d324325ee5f60e"><li><b>Table Recognition Model</b>: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/breezedeus/pix2text-table-rec">breezedeus/pix2text-table-rec</a></span> (<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf-mirror.com/breezedeus/pix2text-table-rec">Mirror</a></span>).</li></ul><ul class="notion-list notion-list-disc notion-block-dc37644ba2b649339098879c50be1778"><li><b>Text Recognition Engine</b>: Supports <b>80+ languages</b> such as <b>English, Simplified Chinese, Traditional Chinese, Vietnamese</b>, etc. For English and Simplified Chinese recognition, it uses the open-source OCR tool <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/cnocr">CnOCR</a></span>, while for other languages, it uses the open-source OCR tool <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/JaidedAI/EasyOCR">EasyOCR</a></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-1947a9f938944c7eb09922ea421acbb3"><li><b>Mathematical Formula Detection Model (MFD)</b>: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/breezedeus/pix2text-mfd">breezedeus/pix2text-mfd</a></span> (<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf-mirror.com/breezedeus/pix2text-mfd">Mirror</a></span>). 
It is implemented on top of <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/cnstd">CnSTD</a></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-0690d440bacb476ab73183bab44c4b2d"><li><b>Mathematical Formula Recognition Model (MFR)</b>: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/breezedeus/pix2text-mfr">breezedeus/pix2text-mfr</a></span> (<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf-mirror.com/breezedeus/pix2text-mfr">Mirror</a></span>).</li></ul><div class="notion-text notion-block-1355372fe4d24b53b56eb62c63958322">Several of these models were contributed by other open-source authors, whose contributions are greatly appreciated.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-9c005fb97d214ac38534cfa051a4b072"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://pix2text.readthedocs.io/zh-cn/stable/figs/arch-flow.jpg?spaceId=9341931a-53f0-48e1-b026-0f1ad17b457c&amp;t=9c005fb9-7d21-4ac3-8534-cfa051a4b072" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-80e0a3c4fca94d87bf7cee4184153ddd">For detailed explanations, please refer to the <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://pix2text.readthedocs.io/zh-cn/stable/models/">Models</a></span> page of the documentation.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-a22c3fafd0784507970ccc4e4a594fab" data-id="a22c3fafd0784507970ccc4e4a594fab"><span><div id="a22c3fafd0784507970ccc4e4a594fab" class="notion-header-anchor"></div><a class="notion-hash-link" href="#a22c3fafd0784507970ccc4e4a594fab" title="Online Service"><svg viewBox="0 0 16 16" 
width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Online Service</span></span></h2><div class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-blue_background notion-block-14c6b1a7798e49feb4121520ccd947ed" href="https://p2t.breezedeus.com/"><div><div class="notion-bookmark-title">Pix2Text (P2T) - Free Mathpix Alternative</div><div class="notion-bookmark-description">Use Pix2Text (P2T) to convert math formulas in images to text. Pix2Text is a free alternative to Mathpix that supports math formula recognition, LaTeX rendering, and export to various formats.</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img src="https://www.notion.so/image/https%3A%2F%2Fp2t.breezedeus.com%2Ffavicon.ico?table=block&amp;id=14c6b1a7-798e-49fe-b412-1520ccd947ed&amp;t=14c6b1a7-798e-49fe-b412-1520ccd947ed" alt="Pix2Text (P2T) - Free Mathpix Alternative" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://p2t.breezedeus.com/</div></div></div></a></div><div class="notion-blank notion-block-9769b8126e784e41aaa9502d05954ed3"> </div><div class="notion-text notion-block-ef0581bb391b4d3d9dcbed08765fed65">Everyone can use the <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com"><span class="notion-blue"><b>P2T Online Service</b></span></a> for <b>free</b>, with a daily limit of 10,000 characters per account, which should be sufficient for normal use. 
<em>Please refrain from making bulk API calls: machine resources are limited, and heavy usage could prevent others from accessing the service.</em></div><div class="notion-blank notion-block-b7c39dfa591445ac9e87843c28519638"> </div><div class="notion-text notion-block-b0fcf7139dcd4210996d5b5d4758815d">Due to hardware constraints, the Online Service currently supports only <b>Simplified Chinese</b> and <b>English</b>. To try the models in other languages, please use the <b>Online Demo</b> below.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-dea5752daf624ef6b5097c63d16cccc8" data-id="dea5752daf624ef6b5097c63d16cccc8"><span><div id="dea5752daf624ef6b5097c63d16cccc8" class="notion-header-anchor"></div><a class="notion-hash-link" href="#dea5752daf624ef6b5097c63d16cccc8" title="Demo 🤗"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Demo 🤗</span></span></h2><div class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-block-c9f5ff2de977425db562ebdd504e3160" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo"><div><div class="notion-bookmark-title">Pix2Text - a Hugging Face Space by breezedeus</div><div class="notion-bookmark-description">Discover amazing ML apps made by the community</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img src="https://www.notion.so/image/https%3A%2F%2Fhuggingface.co%2Ffavicon.ico?table=block&amp;id=c9f5ff2d-e977-425d-b562-ebdd504e3160&amp;t=c9f5ff2d-e977-425d-b562-ebdd504e3160" alt="Pix2Text - a Hugging Face Space by 
breezedeus" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://huggingface.co/spaces/breezedeus/Pix2Text-Demo</div></div></div><div class="notion-bookmark-image"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fcdn-thumbnails.huggingface.co%2Fsocial-thumbnails%2Fspaces%2Fbreezedeus%2FPix2Text-Demo.png?table=block&amp;id=c9f5ff2d-e977-425d-b562-ebdd504e3160&amp;t=c9f5ff2d-e977-425d-b562-ebdd504e3160" alt="Pix2Text - a Hugging Face Space by breezedeus" loading="lazy" decoding="async"/></div></a></div><div class="notion-blank notion-block-877051836a4c41f9966e46fb8f2230c0"> </div><div class="notion-text notion-block-4fbf4b800cbe4e9f9edea2f6713eb49d">You can also try the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo">Online Demo</a></b></span> to see how <b>P2T</b> performs in various languages. However, the Online Demo runs on lower-spec hardware and may be slower. 
For Simplified Chinese or English images, it is recommended to use the <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com"><span class="notion-blue"><b>P2T Online Service</b></span></a>.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-abd662d2dde843eb9a44f5bb2c901852" data-id="abd662d2dde843eb9a44f5bb2c901852"><span><div id="abd662d2dde843eb9a44f5bb2c901852" class="notion-header-anchor"></div><a class="notion-hash-link" href="#abd662d2dde843eb9a44f5bb2c901852" title="Documentation"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Documentation</span></span></h2><div class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-block-c95160741d494eba912414d7c1e504eb" href="https://pix2text.readthedocs.io/"><div><div class="notion-bookmark-title">Pix2Text</div><div class="notion-bookmark-description">Pix2Text Online Documents</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img src="https://www.notion.so/image/https%3A%2F%2Fpix2text.readthedocs.io%2Fzh%2Flatest%2Ffigs%2Fbreezedeus.ico?table=block&amp;id=c9516074-1d49-4eba-9124-14d7c1e504eb&amp;t=c9516074-1d49-4eba-9124-14d7c1e504eb" alt="Pix2Text" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://pix2text.readthedocs.io/</div></div></div></a></div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-1ef6e20ee19143ddaca4822480993710" data-id="1ef6e20ee19143ddaca4822480993710"><span><div id="1ef6e20ee19143ddaca4822480993710" 
class="notion-header-anchor"></div><a class="notion-hash-link" href="#1ef6e20ee19143ddaca4822480993710" title="Available Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Available Models</span></span></h2><div class="notion-text notion-block-284d7bc95ea0446c867c5f6032a693d3">This section covers two kinds of P2T models: <b>Math Formula Detection (MFD)</b> and <b>Math Formula Recognition (MFR)</b>. For details, see the <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text"><b>project description</b></a></span>. By default, P2T uses free open-source models and downloads them automatically when they are needed. Beyond these free models, I continue to improve the models, and the latest versions must be purchased before they can be downloaded and used. 
If you do not need a local deployment, it is recommended to use the <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com/"><b>P2T Online Service</b></a></span> directly, since it always runs the latest models.</div><div class="notion-blank notion-block-93a26651a26c40a58b904f0823816238"> </div><div class="notion-text notion-block-08a7b55ac5ff43f88474aa4337af25ea">The Online Service currently uses the following models (both are the latest versions):</div><ul class="notion-list notion-list-disc notion-block-ce6da74deb224720a1c3b4c911527cb1"><li><b>MFR-Plus/MFR-Pro-1.5</b></li></ul><ul class="notion-list notion-list-disc notion-block-247c0110d3318076a252fc6d7db825c2"><li><b>MFD-Pro-1.5</b></li></ul><div class="notion-text notion-block-1e7221be671847939006be6c53590e96">The paid models used in the Online Service perform better than the open-source models. If you need to deploy the P2T service yourself, it is advisable to purchase the <b>same models used in the Online Service</b>.</div><div class="notion-blank notion-block-64b266ae63c44a8680497fd9dd9af6eb"> </div><div class="notion-text notion-block-2e32345f676f40f4903692df04162006">To thank our <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/zsxq">Planet Members</a></b></span> for their support, <b>all models (for personal use only) are available to Planet Members at a 20% discount</b>. To purchase, <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group#57c5ca3b4d9746ae8357af4f3316d8dc">add the assistant as a friend</a></b></span>; once payment is arranged, the assistant will send you the model files directly. 
Note: No discounts are offered on the Enterprise editions.</div><div class="notion-blank notion-block-fe505cd6eac3426aa14663020af74492"> </div><div class="notion-text notion-block-f94c237bc6fe4953a33e729e138e1a2b"><b>Things to note before purchasing:</b></div><div class="notion-callout notion-orange_background_co notion-block-f336f6be2e9c4c36a9adf73144ebe60b"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text"><b>For personal use</b>, please refer to the <b>“Individual Purchase”</b> column of the tables; <b>for business or commercial use</b>, please refer to the <b>“Commercial Purchase”</b> column, or <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/join-group"><b>contact the author</b></a> (Email: breezedeus AT gmail.com).</div></div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-4520c3c6277c45d4b4a6b77c3b6d2e92" data-id="4520c3c6277c45d4b4a6b77c3b6d2e92"><span><div id="4520c3c6277c45d4b4a6b77c3b6d2e92" class="notion-header-anchor"></div><a class="notion-hash-link" href="#4520c3c6277c45d4b4a6b77c3b6d2e92" title="Model Stores"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Model Stores</span></span></h3><div class="notion-text notion-block-5a605f29e951459e8d5a8c6f5362f440">Models can be purchased from the following two stores:</div><table class="notion-simple-table notion-block-d17fc6d59db14b81b00a46832f97c6a4"><tbody><tr class="notion-simple-table-row 
notion-simple-table-header-row notion-block-8e8a0f428df1442f906e381e62801a79"><td class="notion-simple-table-header-cell" style="width:133.88494110107422px"><div class="notion-simple-table-cell">Store</div></td><td class="" style="width:549.9999923706055px"><div class="notion-simple-table-cell">Description</div></td></tr><tr class="notion-simple-table-row notion-block-3567bc009b114e9e9d75f00231e81565"><td class="notion-simple-table-header-cell" style="width:133.88494110107422px"><div class="notion-simple-table-cell"><span class="notion-blue"><b>Bilibili Mall</b></span></div></td><td class="" style="width:549.9999923706055px"><div class="notion-simple-table-cell">Only sells models for personal use. Cannot issue invoices.</div></td></tr><tr class="notion-simple-table-row notion-block-74adf31d177846ee9ef8ea164458eea3"><td class="notion-simple-table-header-cell" style="width:133.88494110107422px"><div class="notion-simple-table-cell"><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com">Lemon Squeezy</a></b></span></div></td><td class="" style="width:549.9999923706055px"><div class="notion-simple-table-cell">Sells models for <b>commercial</b> and <b>personal</b> use. 
The platform can issue invoices (US-style invoices).</div></td></tr></tbody></table><div class="notion-text notion-block-47b9c557f20041f2a0bf6af53e308e4c">Purchase details for each model type are given below.</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-7831261a2b4644bb882c054ae7ed9022" data-id="7831261a2b4644bb882c054ae7ed9022"><span><div id="7831261a2b4644bb882c054ae7ed9022" class="notion-header-anchor"></div><a class="notion-hash-link" href="#7831261a2b4644bb882c054ae7ed9022" title="Purchasing the Math Formula Detection (MFD) models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Purchasing the Math Formula Detection (MFD) models</span></span></h3><div class="notion-sync-block notion-block-9ee1d8e4c74f4abb9ab5ecc955bf55f4"><div class="notion-text notion-block-44b5753fba0e4e43a2e8cd88eb8d3fde">The purchase links for the different versions are listed below. It is recommended to try the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo">Online Demo</a></b></span> to verify the model&#x27;s performance before making a purchase. Each version comes with its own license; click the links in the table to view the product details. If you have any questions, please <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">contact the author</a></b></span>. 
Each Enterprise edition includes both the MFD and MFR models, so there is no need to buy them separately.</div></div><table class="notion-simple-table notion-block-3e1a9ef7ce114cd0b49ee7ad830a979b"><tbody><tr class="notion-simple-table-row notion-teal notion-simple-table-header-row notion-block-02e96f62a7fc409eae842e91f8090f87"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell">MFD Model Version</div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">Commercial 
Purchase</div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell"><b>Individual 
Purchase</b></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell"><b>For 
Planet Members</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Free 
Download</b></div></td></tr><tr class="notion-simple-table-row notion-block-247c0110d33180d48c9bc707c544b176"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFD-Advanced-1.5</b></code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/ef80ff41-b113-4bf0-9516-7f44e49a6bba">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">Free</a></b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✔️</div></td></tr><tr class="notion-simple-table-row notion-block-50cc567d33734ecfa6da1af57ff337eb"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFD-Pro-1.5</b></code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b> 
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=12805387&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">Bilibili</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/c8f2360b-cd46-4bf3-89be-6e1a2828137a">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">20% off for personal use from Bilibili</a></b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-0ab2d37fb5f848cd8b4c5648a9201c7c"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code">mfd-advanced</code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️<span class="notion-blue"><b>
 </b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/10953ec3-f903-42fa-996f-3c163b17bef8">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️<b> </b><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/wtH9m">Free</a></b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-7367c62e5a64442e831ac059ae9adeaf"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code">mfd-pro</code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b> 
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=11883911&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">Bilibili</a></b></span><span class="notion-blue"><b> 
 </b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/0f114ee9-6ba3-4ca0-849b-1f7dabf20cbb">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-orange">20% off for personal use from Bilibili</span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr></tbody></table><div class="notion-callout notion-gray_background_co notion-block-bb7c5c858d234924b7a78b5fe771a04e"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">These models are only compatible with <b>Pix2Text ≥ V1.1.4</b>.</div></div><div class="notion-sync-block notion-block-65cbd9e72fd843bfb56df527a5cc842a"><div class="notion-text notion-block-dbc87d8a4b364c8bbb5900fe6d72a0db">For detailed descriptions, see</div><ul class="notion-list notion-list-disc notion-block-247c0110d331800680c2c43d776c4dfc"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/pix2text-model-1.5">Pix2Text New Version of Mathematical Formula Detection and Recognition Models: V1.5</a></b></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-247c0110d33180a4ba06dd3b0a8cf7de"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-mfd-v1.1.1">Pix2Text V1.1.1 New Release, Bringing an Improved Math Formula Detection Model</a></b></span>.</li></ul></div><div class="notion-blank notion-block-0b7f68df74f543d5b9e27c9dbebc570d"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-4ec758684c7a4c5f85fc01ba46934591" data-id="4ec758684c7a4c5f85fc01ba46934591"><span><div id="4ec758684c7a4c5f85fc01ba46934591" 
class="notion-header-anchor"></div><a class="notion-hash-link" href="#4ec758684c7a4c5f85fc01ba46934591" title="Purchasing the Math Formula Recognition (MFR) models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Purchasing the Math Formula Recognition (MFR) models</span></span></h3><div class="notion-sync-block notion-block-9ee1d8e4c74f4abb9ab5ecc955bf55f4"><div class="notion-text notion-block-44b5753fba0e4e43a2e8cd88eb8d3fde">The purchase links for the different versions are listed below. It is recommended to try the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo">Online Demo</a></b></span> to verify the model&#x27;s performance before making a purchase. Each version comes with its own license; click the links in the table to view the product details. If you have any questions, please <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">contact the author</a></b></span>. 
Each Enterprise edition includes both the MFD and MFR models, so there is no need to buy them separately.</div></div><table class="notion-simple-table notion-block-da313f33297a4198a2fc303e39fc2fdd"><tbody><tr class="notion-simple-table-row notion-teal notion-simple-table-header-row notion-block-0834b6b20dcf432e91a1e051dcd99614"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell">MFR Model Version</div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">Commercial 
Purchase</div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell"><b>Individual 
Purchase</b></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell"><b>For 
Planet Members</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Free 
Download</b></div></td></tr><tr class="notion-simple-table-row notion-block-e65eb275b1c542bb8fd0db0489765eff"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFR-Pro-1.5</b></code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b> 
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=12805401&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">Bilibili</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/ab343594-fe6c-4f92-89c9-6c1682f84ff4">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">20% off for personal use from Bilibili</a></b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-7b5d2cb3291845f4b4412bb47c9d3f32"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code">mfr-pro</code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b> </b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=11884166&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">Bilibili</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/ab9171ff-c659-4932-afde-1eaeb680805d">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-orange">20% off for personal use from Bilibili</span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-392ddf8e54bc45678da0fce8b55c3dba"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code">mfr-plus</code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr></tbody></table><div class="notion-callout notion-gray_background_co notion-block-1532b12a190145a1bbd147de64ea1b4a"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">These models are compatible with <b>Pix2Text V1.0, V1.1, and V1.1.1</b>.</div></div><div class="notion-sync-block notion-block-65cbd9e72fd843bfb56df527a5cc842a"><div class="notion-text notion-block-dbc87d8a4b364c8bbb5900fe6d72a0db">For detailed descriptions, see</div><ul class="notion-list notion-list-disc notion-block-247c0110d331800680c2c43d776c4dfc"><li><span class="notion-blue"><b><a 
target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/pix2text-model-1.5">Pix2Text New Version of Mathematical Formula Detection and Recognition Models: V1.5</a></b></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-247c0110d33180a4ba06dd3b0a8cf7de"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-mfd-v1.1.1">Pix2Text V1.1.1 New Release, Bringing an Improved Math Formula Detection Model</a></b></span>.</li></ul></div><div class="notion-blank notion-block-77d0d5af73e2461998a69c9d31007715"> </div><div class="notion-blank notion-block-5413b4cfffca4c758900f955f1342685"> </div><div class="notion-text notion-block-858c5717fad74657b489195f0a46409c"><b>Pix2Text V1.1/V1.0 offers two enterprise editions</b>. The figure below shows how they differ. The <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro Edition</a></b></span> is a one-time purchase, and new models must be purchased separately. The <b>Enterprise Pro Edition</b> may be used only internally within a company or to provide free external services (for example, by educational institutions). It cannot be used to offer paid services. The <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus Edition</a></b></span> includes free access to all new models for one year after purchase. The <b>Enterprise Plus Edition</b> includes both the Pro models and the <b>Plus models</b>. It also provides PyTorch versions of all models, so enterprises can fine-tune the models on their own data or convert them to other formats they need, such as CoreML. 
The <b>Enterprise Plus Edition</b> may also be used to provide paid services.</div><div class="notion-text notion-block-8ed8a349f102472aa0023a833edf9903">For more information, visit the <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/"><b>Model Store</b></a></span>. Each product page lists the specific details.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-cb25044987bc4f568bdc6b9038b872c6"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2F2e65ae00-65ea-4ed1-a6d7-af562daf6d70%2FUntitled.jpeg?table=block&amp;id=cb250449-87bc-4f56-8bdc-6b9038b872c6&amp;t=cb250449-87bc-4f56-8bdc-6b9038b872c6&amp;width=707.9971313476562&amp;cache=v2" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-1d37f5db65f44d19880a1aea10f8fd4d"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-067fbb588d514adabb059572162a55a3" data-id="067fbb588d514adabb059572162a55a3"><span><div id="067fbb588d514adabb059572162a55a3" class="notion-header-anchor"></div><a class="notion-hash-link" href="#067fbb588d514adabb059572162a55a3" title="Usage Instructions After Purchase"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Usage 
Instructions After Purchase</span></span></h3><div class="notion-text notion-block-15bfac2c533b40a992ababd90b157583">After purchasing the <b>Enterprise Pro/Plus Edition</b> through the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/">Model Store</a></b></span>, you can download two model archives. The archive whose name starts with <code class="notion-inline-code">p2t-mfd-</code> contains the MFD (Math Formula Detection) model. The one starting with <code class="notion-inline-code">p2t-mfr-</code> contains the MFR (Math Formula Recognition) model. Unzipping the MFD archive produces a folder named <code class="notion-inline-code">yolov7-model</code> that contains the model file, for example <code class="notion-inline-code">mfd-yolov7-20230613.pt</code>. Suppose the path to this file is <code class="notion-inline-code">abc/def/yolov7-model/mfd-yolov7-20230613.pt</code>. Unzipping the MFR archive produces a folder named <code class="notion-inline-code">mfr-pro-onnx</code>, which contains the model file and its configuration files. 
Assume the path to the <code class="notion-inline-code">mfr-pro-onnx</code> folder is <code class="notion-inline-code">abc/def/mfr-pro-onnx</code>.</div><div class="notion-blank notion-block-8be77cebb54c4b02b78c1d73b8120080"> </div><div class="notion-text notion-block-09ec17d99ef64c8a8b6843344a0f62f0">Usage instructions for each version of Pix2Text are listed below. The latest version is always recommended.</div><ul class="notion-list notion-list-disc notion-block-247c0110d33180108e47d1a7b0c1ff2c"><li>For the V1.5 models, please refer to: <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/pix2text-model-1.5">Pix2Text New Version of Mathematical Formula Detection and Recognition Models: V1.5</a></b></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-cc2fd1c0f8ed493da20ec821e9308b03"><li>If you are using P2T <b>V1.1.1</b>, please refer to: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-mfd-v1.1.1">Pix2Text V1.1.1 New Release, Bringing an Improved Math Formula Detection Model</a></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-11583b1d48f842a2bcde8c5bb4be6d03"><li>If you are using P2T <b>V1.1</b>, please refer to: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-v1.1">Pix2Text V1.1 New Released, now supporting PDF to Markdown conversion</a></span>.</li></ul><details class="notion-toggle notion-block-46ed4a4f4ad348d7a5442e5737e85080"><summary>If you are using P2T <b>V1.0</b>, please refer to: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-v1.0">Pix2Text V1.0 New Release: The Best Open-Source Formula Recognition Model</a></span>.</summary><div><div class="notion-text 
notion-block-3d422db3c7cb44418d2d6e675ad50e79">When initializing Pix2Text, pass the parameters as follows. After initialization, usage is the same as with the open-source models, and the detection and recognition results have the same structure.</div><div class="notion-blank notion-block-bc1399c51eb54cefa5b2b3d2de435c3d"> </div><div class="notion-text notion-block-50aad67543494021bc9ad55512391e11">If you purchase the <b>Enterprise Pro Subscription Edition</b>, you get access to more model files (currently five). These include the PyTorch version of the MFR model and the latest paid <b>CnOCR (text OCR)</b> model (in both ONNX and PyTorch versions), which recognizes English and Simplified Chinese text more accurately. Pass in the corresponding models as follows.</div><div class="notion-callout notion-gray_background_co notion-block-d7ddbe96ea4b4f97b0bdce405fae40b0"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">The <b>CnOCR</b> text model supports only <b>English</b> and <b>Simplified Chinese</b>. If you need to recognize text in other languages, do not use the CnOCR model. 
Simply remove the <code class="notion-inline-code">text_config</code> from the code above.</div></div></div></details><div class="notion-blank notion-block-38bbf6c04f954b4c976121345f298c49"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-33e3624955c64a9690d55eec5203d7b4" data-id="33e3624955c64a9690d55eec5203d7b4"><span><div id="33e3624955c64a9690d55eec5203d7b4" class="notion-header-anchor"></div><a class="notion-hash-link" href="#33e3624955c64a9690d55eec5203d7b4" title="Code Repo"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Code Repo</span></span></h2><ul class="notion-list notion-list-disc notion-block-e254f80eadc5435589e686451fe19639"><li><b>GitHub</b>: <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">https://github.com/breezedeus/Pix2Text</a></li></ul><ul class="notion-list notion-list-disc notion-block-c0f7f764dd9044e2a294575b94de1d1c"><li><b>Gitee</b>: <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://gitee.com/breezedeus/pix2text">https://gitee.com/breezedeus/pix2text</a></li></ul><div class="notion-blank notion-block-23b97e2658964ee8932cd80063373790"> </div><div class="notion-callout notion-orange_background_co notion-block-274629634f4b4acaa3475761390f2b10"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">P2T uses <b><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" 
href="https://www.breezedeus.com/cnocr">CnOCR</a></span></b> or <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/JaidedAI/EasyOCR"><b>EasyOCR</b></a></span> to recognize the text in images. For more information on <b>CnOCR</b>, refer to <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/cnocr"><b>this link</b></a></span>.</div></div><div class="notion-blank notion-block-b64e49cf243d418eb8777bda9a8edf7d"> </div><div class="notion-callout notion-orange_background_co notion-block-161d876aefa148179db79120d62bd4ea"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text"><div class="notion-text notion-block-0571e9677a314a73926d2890909faabd">Make sure you have successfully run <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">Pix2Text</a></b></span> with the open-source models. Otherwise, you may be unable to get the paid models working after downloading them. Detailed installation and usage instructions are in the <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">Pix2Text</a></span> project documentation. If you run into problems, feel free to leave a comment here or <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">join the group chat</a></b></span> to reach me. 
However, <em>please note that</em><em><span class="notion-red"> helping you get the code running is not part of the services provided by the Planet host</span></em> (see <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/zsxq">Planet Description</a></span>).</div></div></div><div class="notion-blank notion-block-247c0110d331805c9deef29a83c05260"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[P2T Details]]></title>
            <link>https://www.breezedeus.com/article/pix2text_cn</link>
            <guid>https://www.breezedeus.com/article/pix2text_cn</guid>
            <pubDate>Mon, 26 Feb 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Pix2Text: a Free Alternative to Mathpix (Pix In, Latex & Text Out). Pix2Text recognizes the text in images and converts mathematical formulas into LaTeX.]]></description>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-c45d6b3a15a9461bb3359c22851e9e13"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-row notion-block-9d8a0ac4fcd44664bfa23d1acc7d356f"></div><div class="notion-row notion-block-af34f19f274c4849b146a66a87ceba14"><div class="notion-column notion-block-6bd51dfb4f3a48b5afd2b7cc86f40618" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.41666666666666663)"><div class="notion-blank notion-block-3aa6fe0bac464b49818408d459d76804"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-3ef3de3759884641bb6e1a35ea406839" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.2500000000000001)"><div class="notion-text notion-block-cbf9c053e7724db28329e8bfcbe8765b"><b>[中文]</b> <b>|</b> <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/pix2text"><b><span class="notion-blue">[English]</span></b></a></div></div><div class="notion-spacer"></div><div class="notion-column notion-block-b577a23b70e84cb08af6a55c7ba3c684" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.3333333333333333)"><div class="notion-blank notion-block-e3708011c4d445d18806212efb8cc9ff"> </div></div><div class="notion-spacer"></div></div><div class="notion-row notion-block-23e642324ec24c43bb0c4a341f44ff4a"><div class="notion-column notion-block-86f25214a1864ae4b8ed84c83fe05119" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.125)"><div class="notion-blank notion-block-52080a9c85604be788946a5f9e213c26"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-437eb181846b4a71bd7e805eb8f0e44f" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.75)"><div class="notion-text notion-block-600b97dcf02c46e8b6908fa3ccefb06d"><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" 
class="notion-link" href="https://pix2text.readthedocs.io">📖 Online Docs</a></b></span> | <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://pix2text.readthedocs.io/zh-cn/stable/install/">🛠️ Installation</a></b></span> | <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com">🖥️ Web Version</a></b></span> | <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo">🛀🏻 Online Demo</a></b></span> | <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">💬 Discussion Group</a></b></span></div><div class="notion-blank notion-block-0fefa6ebe632403b8dee074fad05c07b"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-36fc592435634768874751897189e052" style="width:calc((100% - (2 * min(32px, 4vw))) * 0.1250000000000002)"><div class="notion-blank notion-block-b199b21a53834dc3819dcef6aa482e4d"> </div></div><div class="notion-spacer"></div></div><div class="notion-row notion-block-a5a7994cd7474c19affe4fd0056ee219"><div class="notion-column notion-block-1f3a605fc2f64cbbb7ee4603bf254ba2" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.625)"><div class="notion-text notion-block-dac70786e0f9408fb6a79945c4a95095"><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">Pix2Text (P2T)</a></b></span> aims to be a <b>free, open-source Python</b> alternative to <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mathpix.com/"><b>Mathpix</b></a>, and it can already perform the core functions of <b>Mathpix</b>. <b>Pix2Text (P2T) can recognize layouts, tables, images, text, mathematical formulas, and other content in images, and combine everything into Markdown output. P2T can also convert an entire 
PDF file (whose content may be scanned images or any other format) into Markdown. P2T</b>'s text recognition engine <b>supports</b> <code class="notion-inline-code"><b>80+</b></code><b> languages</b>, such as <b>English, Simplified Chinese, Traditional Chinese, and Vietnamese</b>.</div><div class="notion-blank notion-block-64fab672b1dc4e358811dadbf4054e7d"> </div></div><div class="notion-spacer"></div><div class="notion-column notion-block-adc9382fce8247b8812fb92299e1a827" style="width:calc((100% - (1 * min(32px, 4vw))) * 0.375)"><div class="notion-text notion-block-c829a483785542a985fb39f4b0d25df6"><b>Contents:</b></div><div class="notion-table-of-contents notion-gray notion-block-62c4a6ac07bb4c7eb4ab2e36bc64e4d8"><a href="#740bba4e7d1248549d56a2b3ea6e2fc9" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">P2T Web Version</span></a><a href="#7f3ac7e90394434a8df8ed53ae9612c1" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Online Demo</span></a><a href="#90504b934b5d43a492a3bab81888e86a" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Online Documentation</span></a><a href="#b199ffd08db846408d8300ec39bd55d9" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Paid Models</span></a><a href="#301429b358cd4112aea8e79d655f7b2c" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Model Store</span></a><a href="#2be42698299748e8b79381375176602a" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Purchasing the Math Formula Detection (MFD) Model</span></a><a href="#6d66ed25050446889a4192ff2e55eb56" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Purchasing the Math Formula Recognition (MFR) Model</span></a><a 
href="#46b6138f29ce456391c09e83c485ba30" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:24px">Usage Instructions After Purchase</span></a><a href="#4cabddb92b5441d691ecad470d12da0b" class="notion-table-of-contents-item"><span class="notion-table-of-contents-item-body" style="display:inline-block;margin-left:0">Code Repo</span></a></div><div class="notion-blank notion-block-6cd9918fc1784a33b03a006f71349167"> </div></div><div class="notion-spacer"></div></div><div class="notion-text notion-block-66afe0ccf5cd4d67b23c477729d688ec"><b>Pix2Text</b> currently integrates the following models:</div><ul class="notion-list notion-list-disc notion-block-112fb318035a4fa39b8a03a3ff6c1548"><li><b>Layout analysis model</b>: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/breezedeus/pix2text-layout">breezedeus/pix2text-layout</a></span> (<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf-mirror.com/breezedeus/pix2text-layout">China mirror</a></span>).</li></ul><ul class="notion-list notion-list-disc notion-block-f4fb016333db4fb8a1786eb1c1ae7011"><li><b>Table recognition model</b>: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/breezedeus/pix2text-table-rec">breezedeus/pix2text-table-rec</a></span> (<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf-mirror.com/breezedeus/pix2text-table-rec">China mirror</a></span>).</li></ul><ul class="notion-list notion-list-disc notion-block-878c04dfadf745cc88e378ba79cf5600"><li><b>Text recognition engine</b>: supports <code class="notion-inline-code"><b>80+</b></code><b> languages</b>, such as <b>English, Simplified Chinese, Traditional Chinese, and Vietnamese</b>. <b>English</b> and <b>Simplified Chinese</b> are recognized with the open-source OCR tool <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/cnocr">CnOCR</a></span>. Other languages are recognized with the open-source OCR tool <span class="notion-blue"><a 
target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/JaidedAI/EasyOCR">EasyOCR</a></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-e516eee82098470fa3714c8c61ad907c"><li><b>Math formula detection (MFD) model</b>: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/breezedeus/pix2text-mfd">breezedeus/pix2text-mfd</a></span> (<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf-mirror.com/breezedeus/pix2text-mfd">China mirror</a></span>), implemented with <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/cnstd">CnSTD</a></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-e3689f9105ea40f5b8dc5bd66bca08bf"><li><b>Math formula recognition (MFR) model</b>: <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/breezedeus/pix2text-mfr">breezedeus/pix2text-mfr</a></span> (<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://hf-mirror.com/breezedeus/pix2text-mfr">China mirror</a></span>).</li></ul><div class="notion-text notion-block-5d44c536858e4910bdb0289609d1dea8">Several of these models come from other open-source authors. Many thanks for their contributions.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-b7419361e9c744b5adee3b19b8d2611f"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://pix2text.readthedocs.io/zh-cn/stable/figs/arch-flow.jpg?spaceId=9341931a-53f0-48e1-b026-0f1ad17b457c&amp;t=b7419361-e9c7-44b5-adee-3b19b8d2611f" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-cf3e8daf89194619952cb325493b78de">For details, see <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" 
href="https://pix2text.readthedocs.io/zh-cn/stable/models/">Available Models</a></span>.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-740bba4e7d1248549d56a2b3ea6e2fc9" data-id="740bba4e7d1248549d56a2b3ea6e2fc9"><span><div id="740bba4e7d1248549d56a2b3ea6e2fc9" class="notion-header-anchor"></div><a class="notion-hash-link" href="#740bba4e7d1248549d56a2b3ea6e2fc9" title="P2T Web Version"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">P2T Web Version</span></span></h2><div class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-blue_background notion-block-62a45c77b37f405090161210cfff2d27" href="https://p2t.breezedeus.com/"><div><div class="notion-bookmark-title">Pix2Text (P2T) - Free Mathpix Alternative</div><div class="notion-bookmark-description">Use Pix2Text (P2T) to convert math formulas in images to text. 
Pix2Text is a free alternative to Mathpix that supports math formula recognition, LaTeX rendering, and export to various formats.</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img src="https://www.notion.so/image/https%3A%2F%2Fp2t.breezedeus.com%2Ffavicon.ico?table=block&amp;id=62a45c77-b37f-4050-9016-1210cfff2d27&amp;t=62a45c77-b37f-4050-9016-1210cfff2d27" alt="Pix2Text (P2T) - Free Mathpix Alternative" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://p2t.breezedeus.com/</div></div></div></a></div><div class="notion-blank notion-block-e97a27f75b1245c0b2ad85147ed40f19"> </div><div class="notion-text notion-block-113999c701cd4df1bde42912a598cd20">Anyone can use the <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com"><span class="notion-blue"><b>P2T Web Version</b></span></a> for free. Each person can recognize up to 10,000 characters per day, which should be enough for normal use. <em>Please do not call the API in batches. Machine resources are limited, and batch calls will prevent other people from using the service.</em></div><div class="notion-blank notion-block-bda73cb692fc4df285061d7d5071309c"> </div><div class="notion-text notion-block-27aeb09483af4af5903c4ceeb80e462a">Because machine resources are limited, the web version currently supports only <b>Simplified Chinese and English</b>. To try other languages, please use the <b>Online Demo</b> below.</div><div class="notion-blank notion-block-e4827f55c3024e40a2e005cdbd52947a"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-7f3ac7e90394434a8df8ed53ae9612c1" data-id="7f3ac7e90394434a8df8ed53ae9612c1"><span><div id="7f3ac7e90394434a8df8ed53ae9612c1" class="notion-header-anchor"></div><a class="notion-hash-link" href="#7f3ac7e90394434a8df8ed53ae9612c1" title="Online Demo"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 
0z"></path></svg></a><span class="notion-h-title">Online Demo</span></span></h2><div class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-block-1275eccae0b248df9606bf924626a76d" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo"><div><div class="notion-bookmark-title">Pix2Text - a Hugging Face Space by breezedeus</div><div class="notion-bookmark-description">Discover amazing ML apps made by the community</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img src="https://www.notion.so/image/https%3A%2F%2Fhuggingface.co%2Ffavicon.ico?table=block&amp;id=1275ecca-e0b2-48df-9606-bf924626a76d&amp;t=1275ecca-e0b2-48df-9606-bf924626a76d" alt="Pix2Text - a Hugging Face Space by breezedeus" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://huggingface.co/spaces/breezedeus/Pix2Text-Demo</div></div></div><div class="notion-bookmark-image"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fcdn-thumbnails.huggingface.co%2Fsocial-thumbnails%2Fspaces%2Fbreezedeus%2FPix2Text-Demo.png?table=block&amp;id=1275ecca-e0b2-48df-9606-bf924626a76d&amp;t=1275ecca-e0b2-48df-9606-bf924626a76d" alt="Pix2Text - a Hugging Face Space by breezedeus" loading="lazy" decoding="async"/></div></a></div><div class="notion-blank notion-block-09ad9d49ea1643a98150b4be16e70d99"> </div><div class="notion-text notion-block-38167370282c4ee2a1a9689dd000c536"><b>China mirror</b> (no VPN needed, but it may not stay available long-term):</div><div class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-block-746f1aa7ff9443bfbf955f81fa791fa1" href="https://hf.qhduan.com/spaces/breezedeus/Pix2Text-Demo"><div><div class="notion-bookmark-title">Pix2Text - a Hugging Face Space by breezedeus</div><div class="notion-bookmark-description">Discover amazing ML apps made by the community</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img 
src="https://www.notion.so/image/https%3A%2F%2Fhf.qhduan.com%2Ffavicon.ico?table=block&amp;id=746f1aa7-ff94-43bf-bf95-5f81fa791fa1&amp;t=746f1aa7-ff94-43bf-bf95-5f81fa791fa1" alt="Pix2Text - a Hugging Face Space by breezedeus" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://hf.qhduan.com/spaces/breezedeus/Pix2Text-Demo</div></div></div><div class="notion-bookmark-image"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fcdn-thumbnails.huggingface.co%2Fsocial-thumbnails%2Fspaces%2Fbreezedeus%2FPix2Text-Demo.png?table=block&amp;id=746f1aa7-ff94-43bf-bf95-5f81fa791fa1&amp;t=746f1aa7-ff94-43bf-bf95-5f81fa791fa1" alt="Pix2Text - a Hugging Face Space by breezedeus" loading="lazy" decoding="async"/></div></a></div><div class="notion-blank notion-block-69068ba544324f45984b799cc47ad5c9"> </div><div class="notion-text notion-block-feee889e51304da1a51354b2b72248d2">You can use the <b>Online Demo</b> to try <b>P2T</b> on different languages. However, the Online Demo runs on low-end hardware, so it is slower. For images in Simplified Chinese or English, the <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com"><span class="notion-blue"><b>P2T Web Version</b></span></a> is recommended.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-90504b934b5d43a492a3bab81888e86a" data-id="90504b934b5d43a492a3bab81888e86a"><span><div id="90504b934b5d43a492a3bab81888e86a" class="notion-header-anchor"></div><a class="notion-hash-link" href="#90504b934b5d43a492a3bab81888e86a" title="Online Documentation"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Online Documentation</span></span></h2><div 
class="notion-row"><a target="_blank" rel="noopener noreferrer" class="notion-bookmark notion-block-11f1d0e8e7194a2396d0039b42a031c2" href="https://pix2text.readthedocs.io/"><div><div class="notion-bookmark-title">Pix2Text</div><div class="notion-bookmark-description">Pix2Text Online Documents</div><div class="notion-bookmark-link"><div class="notion-bookmark-link-icon"><img src="https://www.notion.so/image/https%3A%2F%2Fpix2text.readthedocs.io%2Fzh%2Flatest%2Ffigs%2Fbreezedeus.ico?table=block&amp;id=11f1d0e8-e719-4a23-96d0-039b42a031c2&amp;t=11f1d0e8-e719-4a23-96d0-039b42a031c2" alt="Pix2Text" loading="lazy" decoding="async"/></div><div class="notion-bookmark-link-text">https://pix2text.readthedocs.io/</div></div></div></a></div><div class="notion-blank notion-block-bcfa2910611545e49634adb91efd407c"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-b199ffd08db846408d8300ec39bd55d9" data-id="b199ffd08db846408d8300ec39bd55d9"><span><div id="b199ffd08db846408d8300ec39bd55d9" class="notion-header-anchor"></div><a class="notion-hash-link" href="#b199ffd08db846408d8300ec39bd55d9" title="Paid Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Paid Models</span></span></h2><div class="notion-text notion-block-22bae78764984352aed1c8aa9d7e2443"><b>P2T</b> includes two kinds of models: <b>Math Formula Detection (MFD)</b> and <b>Math Formula Recognition (MFR)</b>. For details, see the <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">project documentation</a></span>. By default, P2T uses the free open-source models, which are downloaded automatically on first use. Beyond the free models, I keep improving the models, and the latest ones must be purchased before they can be downloaded and used. If you don't need a local deployment, I recommend using the <a 
target="_blank" rel="noopener noreferrer" class="notion-link" href="https://p2t.breezedeus.com"><span class="notion-blue"><b>P2T网页版</b></span></a>，网页版会一直使用最新的模型。</div><div class="notion-blank notion-block-bd7da03073d04fd6a56caa5087b56a72"> </div><div class="notion-text notion-block-e770f009e7984f7b8dc2f48b477da11d">当前网页版使用了最新的模型：</div><ul class="notion-list notion-list-disc notion-block-bdfa3402a49948edb219b6ecc4a5c408"><li><b>MFR-Plus/MFR-Pro-1.5</b></li></ul><ul class="notion-list notion-list-disc notion-block-d09c080fa8374a96a888cf2fc3ac5437"><li><b>MFD-Pro-1.5</b></li></ul><div class="notion-text notion-block-56847ed301564f058f886cbe6254f151">P2T网页版使用的付费模型效果比开源模型好。如果你需要自己部署P2T服务，建议你购买<b>网页版同款</b>模型。</div><div class="notion-blank notion-block-8a3be69542bf415792adfcd8afab7cdc"> </div><div class="notion-sync-block notion-block-4535763428ac4ee6986309a32dcab01c"><div class="notion-text notion-block-f61cbb08f5c94841addc904467578736">为感谢<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/zsxq"><b>星球会员</b></a></span>的支持，<b><span class="notion-orange">星球会员购买B站所有的个人版模型一律八折。</span></b>通过下面表格中的链接<b>购买并确认收货</b>后，<span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/join-group#57c5ca3b4d9746ae8357af4f3316d8dc">加小助手为好友</a></span>，小助手会把折扣金额返现。注意：企业版不提供折扣。</div></div><div class="notion-blank notion-block-8b78d1a47fd3458187b2f3c9cfdf6c7f"> </div><div class="notion-text notion-block-55a5d29cf1b24c5ea90f3524b1ea202b"><b>购买前注意事项：</b></div><div class="notion-callout notion-orange_background_co notion-block-36d4ce7ebdde43ac83b0e91b819d1cec"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">请确保你用开源的模型跑通了 <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" 
href="https://github.com/breezedeus/Pix2Text">Pix2Text</a></b></span>，否则你下载完付费模型可能跑不起来。详细安装和使用说明看 <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">Pix2Text</a> 项目文档就行。遇到问题可以在这里评论，或者<span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">加入群聊</a></b></span>与我沟通，但<em><span class="notion-red">请注意帮你跑通代码不在作者的服务范围之内</span></em>（参考 <span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/zsxq">星球说明</a></span>）。</div></div><div class="notion-callout notion-orange_background_co notion-block-2c312fc37bd74cc58a884dec7340a306"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">个人使用请参考以下表格中的“<b>个人购买</b>”列；企业购买请参考以下表格中的“<b>企业购买</b>”列，或者 <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">联系作者</a></b></span>。</div></div><div class="notion-blank notion-block-30bddae82afc411d96f206ea9fdbdb2c"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-301429b358cd4112aea8e79d655f7b2c" data-id="301429b358cd4112aea8e79d655f7b2c"><span><div id="301429b358cd4112aea8e79d655f7b2c" class="notion-header-anchor"></div><a class="notion-hash-link" href="#301429b358cd4112aea8e79d655f7b2c" title="模型商店"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span 
class="notion-h-title">Model Stores</span></span></h3><div class="notion-text notion-block-89fe77efdfcc4addbc94ec2cdada7fbc">Models can be bought from the following 2 stores.</div><table class="notion-simple-table notion-block-45a2db1b3892410c8d85f284e416c334"><tbody><tr class="notion-simple-table-row notion-simple-table-header-row notion-block-4d13b1d9992e4ee2a196398b0aaa2e47"><td class="notion-simple-table-header-cell" style="width:133.88494110107422px"><div class="notion-simple-table-cell"><b>Store</b></div></td><td class="" style="width:549.9999923706055px"><div class="notion-simple-table-cell"><b>Notes</b></div></td></tr><tr class="notion-simple-table-row notion-block-e1862be005b445b6b0d5d34c2892bc82"><td class="notion-simple-table-header-cell" style="width:133.88494110107422px"><div class="notion-simple-table-cell"><span class="notion-blue">Bilibili Mall</span></div></td><td class="" style="width:549.9999923706055px"><div class="notion-simple-table-cell">Sells models for <b>personal</b> use only. Cannot issue invoices.</div></td></tr><tr class="notion-simple-table-row notion-block-d4dc2b6994db443d8df7478513eef3b7"><td class="notion-simple-table-header-cell" style="width:133.88494110107422px"><div class="notion-simple-table-cell"><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com">Lemon Squeezy</a></b></span></div></td><td class="" style="width:549.9999923706055px"><div class="notion-simple-table-cell">Sells models for both <b>commercial</b> and <b>personal</b> use. The platform can issue invoices (US-style).</div></td></tr></tbody></table><div class="notion-text notion-block-ccc6c7b772e94246b4f724ebb83934e1">More specific details follow.</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-2be42698299748e8b79381375176602a" data-id="2be42698299748e8b79381375176602a"><span><div id="2be42698299748e8b79381375176602a" class="notion-header-anchor"></div><a class="notion-hash-link" href="#2be42698299748e8b79381375176602a" title="Buying the Mathematical Formula Detection (MFD) Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 
1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Buying the Mathematical Formula Detection (MFD) Models</span></span></h3><div class="notion-sync-block notion-block-3910ef88c9574d45b1a825f3edba7354"><div class="notion-text notion-block-eef1482fb6144f13acb5c5b7b01d02ad">Purchase links for each version are listed below. I recommend checking the results with the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo">online Demo</a></b></span> before buying. Each version has a different license; click the links in the table to view the product details. If you run into problems, <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">contact the author</a></b></span>. The enterprise editions include both the MFD and MFR models, so there is no need to buy them separately.</div></div><div class="notion-sync-block notion-block-f5a0ced7f3314966bfefb172ac749a9c"><table class="notion-simple-table notion-block-ed4f8eeef7b44397ba0722c632a8d430"><tbody><tr class="notion-simple-table-row notion-teal notion-simple-table-header-row notion-block-c1198b55f0b34c0d9ae79a5da09d672c"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><b>Detection Model Version</b></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell"><b>Enterprise Purchase</b></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell"><b>Personal Purchase</b></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell"><b>For Knowledge Planet Members</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Free Download</b></div></td></tr><tr class="notion-simple-table-row notion-block-247c0110d33180bdb087d61c81ac4917"><td class="notion-simple-table-header-cell" 
style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFD-Advanced-1.5</b></code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/ef80ff41-b113-4bf0-9516-7f44e49a6bba">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">Get it free</a></b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✔️</div></td></tr><tr class="notion-simple-table-row notion-block-247c0110d331807f8cc5fd542ff7f173"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFD-Pro-1.5</b></code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=12805387&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">Bilibili Mall</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/c8f2360b-cd46-4bf3-89be-6e1a2828137a">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">20% off personal purchase</a></b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-835a71a44ceb4336810ca6e4348c5731"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code">mfd-advanced</code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/10953ec3-f903-42fa-996f-3c163b17bef8">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/wtH9m">Free</a></b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-0336ca7f50ff443aa100b70ff3fa39c2"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code">mfd-pro</code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b> 
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=11883911&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">Bilibili</a></b></span><span class="notion-blue"><b> 
 </b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/0f114ee9-6ba3-4ca0-849b-1f7dabf20cbb">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-orange"><b>Personal use: 20% off on Bilibili</b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr></tbody></table></div><div class="notion-callout notion-gray_background_co notion-block-87b2c54be9a04f2da98222eeb2dbf42d"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">The models above are compatible only with Pix2Text ≥ V1.1.4.</div></div><div class="notion-sync-block notion-block-797568c06f5a4806ab05e3b87c797f22"><div class="notion-text notion-block-634f5aa774d64762a101272f940d2f95">For usage instructions after purchase, see:</div><ul class="notion-list notion-list-disc notion-block-247c0110d33180989f15f17292cb06ca"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/pix2text-model-1.5">New Pix2Text Mathematical Formula Detection and Recognition Models: V1.5</a></b></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-247c0110d331806f9df5c1fc15c10030"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-mfd-v1.1.1">Pix2Text V1.1.1 Released, with a Better Mathematical Formula Detection Model</a></span>.</li></ul></div><div class="notion-blank notion-block-063a4021a47f44a195bdce7434eeb169"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-6d66ed25050446889a4192ff2e55eb56" data-id="6d66ed25050446889a4192ff2e55eb56"><span><div id="6d66ed25050446889a4192ff2e55eb56" class="notion-header-anchor"></div><a 
class="notion-hash-link" href="#6d66ed25050446889a4192ff2e55eb56" title="Buying the Mathematical Formula Recognition (MFR) Models"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Buying the Mathematical Formula Recognition (MFR) Models</span></span></h3><div class="notion-sync-block notion-block-3910ef88c9574d45b1a825f3edba7354"><div class="notion-text notion-block-eef1482fb6144f13acb5c5b7b01d02ad">Purchase links for each version are listed below. I recommend checking the results with the <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://huggingface.co/spaces/breezedeus/Pix2Text-Demo">online Demo</a></b></span> before buying. Each version has a different license; click the links in the table to view the product details. If you run into problems, <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/join-group">contact the author</a></b></span>. The enterprise editions include both the MFD and MFR models, so there is no need to buy them separately.</div></div><table class="notion-simple-table notion-block-7ca461d814034fbc834df508a67bac75"><tbody><tr class="notion-simple-table-row notion-teal notion-simple-table-header-row notion-block-ea36b0ce01e349caad9ce95a07bd00f5"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><b>Recognition Model Version</b></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell"><b>Enterprise Purchase</b></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell"><b>Personal Purchase</b></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell"><b>For Knowledge Planet Members</b></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell"><b>Free Download</b></div></td></tr><tr 
class="notion-simple-table-row notion-block-247c0110d331800ab0f5f69b415fa5b7"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code"><b>MFR-Pro-1.5</b></code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=12805401&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">Bilibili Mall</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/ab343594-fe6c-4f92-89c9-6c1682f84ff4">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://t.zsxq.com/FEYZRJQ">20% off personal purchase</a></b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-32c56daf04e94e3e9f667dd722486e93"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code">mfr-pro</code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://mall.bilibili.com/neul-next/detailuniversal/detail.html?isMerchant=1&amp;page=detailuniversal_detail&amp;saleType=10&amp;itemsId=11884166&amp;loadingShow=1&amp;noTitleBar=1&amp;msource=merchant_share">Bilibili</a></b></span><span class="notion-blue"><b>
</b></span><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/ab9171ff-c659-4932-afde-1eaeb680805d">Lemon Squeezy</a></b></span></div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✔️ <span class="notion-orange"><b>Personal use: 20% off on Bilibili</b></span></div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr><tr class="notion-simple-table-row notion-block-d83ba186f3a54347a306270574c31de5"><td class="notion-simple-table-header-cell" style="width:129.14630126953125px"><div class="notion-simple-table-cell"><code class="notion-inline-code">mfr-plus</code></div></td><td class="" style="width:145.99431610107422px"><div class="notion-simple-table-cell">✔️ <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span></div></td><td class="" style="width:141.99715423583984px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:123.99573516845703px"><div class="notion-simple-table-cell">✖️</div></td><td class="" style="width:120px"><div class="notion-simple-table-cell">✖️</div></td></tr></tbody></table><div class="notion-callout notion-gray_background_co notion-block-145fb4d65c574ca89db543f8ebcc60d2"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">The models above are compatible with Pix2Text V1.0, V1.1, and V1.1.*.</div></div><div class="notion-sync-block notion-block-797568c06f5a4806ab05e3b87c797f22"><div class="notion-text notion-block-634f5aa774d64762a101272f940d2f95">For usage instructions after purchase, see:</div><ul class="notion-list notion-list-disc notion-block-247c0110d33180989f15f17292cb06ca"><li><span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" 
href="https://www.breezedeus.com/article/pix2text-model-1.5">New Pix2Text Mathematical Formula Detection and Recognition Models: V1.5</a></b></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-247c0110d331806f9df5c1fc15c10030"><li><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-mfd-v1.1.1">Pix2Text V1.1.1 Released, with a Better Mathematical Formula Detection Model</a></span>.</li></ul></div><div class="notion-blank notion-block-87fa12eaf6364e0480de14c88a4670d2"> </div><div class="notion-sync-block notion-block-156d75bdeccb4674b0b104485402bca9"><div class="notion-text notion-block-facf5ea0cd3240c2b71b6d38c916d899"><b>Pix2Text V1.0+ offers two enterprise editions</b>. The figure below shows how their entitlements differ. <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/bf5fecfa-64ca-40a3-89e5-ac019ae39a31">Enterprise Pro</a></b></span> is a one-time purchase; new models released afterwards must be bought again. <b>Enterprise Pro</b> may only be used internally or to offer free services externally (e.g., at educational institutions); it may not be used to offer paid services. <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com/buy/54ecbde4-40e5-47cd-919d-c6ac6201523a">Enterprise Plus</a></b></span> includes free access to all new models released within one year of purchase. Besides the Pro models, <b>Enterprise Plus</b> also provides the <b>Plus</b> models, along with PyTorch versions of all models. Companies can fine-tune these models on their own data or convert them to other formats they need (such as CoreML). <b>Enterprise Plus</b> allows companies to offer paid services externally.</div><div class="notion-text notion-block-c71846b650d64b0a980bb08fdf42ea4f">For more details, see the <b><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com">model store</a></span></b> (each product's detail page has the specifics).</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-04ee1e756dfd43ffadfb10b8b9d0acae"><div 
style="position:relative;display:flex;justify-content:center;align-self:center;width:624px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2F1ac9ab05-09eb-4328-9657-c1b9526d5e1e%2FUntitled.jpeg?table=block&amp;id=04ee1e75-6dfd-43ff-adfb-10b8b9d0acae&amp;t=04ee1e75-6dfd-43ff-adfb-10b8b9d0acae&amp;width=624&amp;cache=v2" alt="Comparison of Enterprise Pro and Enterprise Plus entitlements" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-76f4b1e597c54fda835dd5656d3f4cbe"> </div><div class="notion-text notion-block-faff9c7bf4e4415486c6c2f1bcd0f325"><b>Purchase links</b>: see the <b><span class="notion-blue"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://ocr.lemonsqueezy.com">model store</a></span></b> (each product's detail page has the specifics).</div></div><div class="notion-blank notion-block-30ddb4fd3af647e3b060d32ad67a82b2"> </div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-46b6138f29ce456391c09e83c485ba30" data-id="46b6138f29ce456391c09e83c485ba30"><span><div id="46b6138f29ce456391c09e83c485ba30" class="notion-header-anchor"></div><a class="notion-hash-link" href="#46b6138f29ce456391c09e83c485ba30" title="Usage After Purchase"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Usage After Purchase</span></span></h3><div class="notion-text notion-block-6589e3cdabc04e1ab7525489f7db8d28">Usage instructions for each Pix2Text version are listed below (the latest version is recommended):</div><ul class="notion-list notion-list-disc notion-block-247c0110d33180b385ccd6a33de0744f"><li><span 
class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/pix2text-model-1.5">New Pix2Text Mathematical Formula Detection and Recognition Models: V1.5</a></b></span></li></ul><ul class="notion-list notion-list-disc notion-block-acefd0f932f240b8bd8869c4ba723c12"><li>If you installed <b>Pix2Text V1.1.1</b>, see <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-mfd-v1.1.1">Pix2Text V1.1.1 Released, with a Better Mathematical Formula Detection Model</a></b></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-3987a5fda6e64e5cbc7ce79a4ea7a5de"><li>If you installed <b>Pix2Text V1.1</b>, see <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-v1.1">Pix2Text V1.1 Released, with PDF-to-Markdown Support</a></b></span>.</li></ul><ul class="notion-list notion-list-disc notion-block-c5898a4914114c3aa6320daff69adba7"><li>If you installed <b>Pix2Text V1.0</b>, see <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/p2t-v1.0">Pix2Text V1.0 Released: The Best Open-Source Formula Recognition Model</a></b></span>.</li></ul><div class="notion-blank notion-block-e8aebc233b054d40813e5612d69ee9bc"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-4cabddb92b5441d691ecad470d12da0b" data-id="4cabddb92b5441d691ecad470d12da0b"><span><div id="4cabddb92b5441d691ecad470d12da0b" class="notion-header-anchor"></div><a class="notion-hash-link" href="#4cabddb92b5441d691ecad470d12da0b" title="Code Repositories"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 
4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Code Repositories</span></span></h2><ul class="notion-list notion-list-disc notion-block-6adceb533ebd436487d0a050ea3fc93e"><li><b>GitHub</b>: <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/breezedeus/Pix2Text">https://github.com/breezedeus/Pix2Text</a></li></ul><ul class="notion-list notion-list-disc notion-block-ebd526dfc0574cd395c47cb6b43b1e04"><li><b>Gitee (for users in mainland China)</b>: <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://gitee.com/breezedeus/pix2text">https://gitee.com/breezedeus/pix2text</a></li></ul><div class="notion-blank notion-block-15cca36a11b94cdf85784d1f2da9d149"> </div><div class="notion-callout notion-orange_background_co notion-block-ff2cac9d7eb34eb181d17a794974341f"><div class="notion-page-icon-inline notion-page-icon-span"><span class="notion-page-icon" role="img" aria-label="📌">📌</span></div><div class="notion-callout-text">P2T uses <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/cnocr">CnOCR</a></b></span> or <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://github.com/JaidedAI/EasyOCR">EasyOCR</a></b></span> to recognize the plain-text portions of images. For more on CnOCR, see <span class="notion-blue"><b><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.breezedeus.com/article/cnocr">this page</a></b></span>.</div></div><div class="notion-blank notion-block-ff5499ad39884cad96f29a18859e3bef"> </div></main></div>]]></content:encoded>
        </item>
    </channel>
</rss>