[Deconstruction] Claude's One-Model, Two-Persona Architecture: Engineering Differences Between Anthropic's General and Design System Prompts

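Keywords: Claude Opus 4.7 | Claude Design | System Prompt | Agent Architecture | Prompt Engineering | Multi-Persona
What you will get from this article:
A full analysis of how Anthropic builds two products from one model plus two prompts
A comparison table across 7 engineering dimensions (identity / proactivity / questioning / formatting / variations / verification / copyright)
A checklist for agent persona engineering
Code-level implementation suggestions you can copy directly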

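1. Background: An Architectural Decision Worth Noting

When designing AI for multiple scenarios, most AI product teams face a key choice:

Option A: train several specialized models (high cost, long cycles, but focused results)
Option B: use one base model with different system prompts (low cost, flexible, but it is unclear how far the behaviors can be differentiated)

Judging by the leaked prompts, Anthropic chose Option B. Using the same Claude Opus 4.7 base model and two very different system prompts, it derives two products:

General Claude (the claude.ai chat interface and the API)
Design Claude (Claude Artifacts and the design product line)

Both prompts have been leaked publicly through the CL4R1T4S repository. This article dissects the two files side by side and abstracts, from an engineering perspective, a reproducible method for **using prompts to achieve "persona differentiation"**.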

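Intended readers:

Teams building multi-scenario AI products (customer service plus analytics, code plus design, and so on)
Teams deciding whether to train specialized models for different scenarios
Anyone interested in persona and role design for AI agents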

2. Basic Parameters of the Two Prompts

Dimension	General Claude	Design Claude
File	`Claude-Opus-4.7.txt`	`Claude-Design-Sys-Prompt.txt`
Lines	1408	422
File size	149KB	~18KB
Underlying model	Claude Opus 4.7	Claude Opus 4.7 (same)
Main sections	13 major sections (behavior / copyright / skills / memory / …)	10 major sections (workflow / output guidelines / design process / …)
Preset tools	A few core tools plus `tool_search` lazy loading	30+ fixed tools, all exposed

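Observation: the Design prompt is about 30% of the general prompt by line count (422 vs. 1408 lines) and about 12% by file size (~18KB vs. 149KB), yet it exposes more tools and keeps that tool set fixed. This reflects a difference in design philosophy: the general version aims for breadth and stability, while the Design version aims for depth and specialization.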
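The two tool-exposure strategies in the table can be sketched as follows. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `ToolRegistry`, its methods, and every tool name except `tool_search` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRegistry:
    """Hypothetical registry contrasting fixed exposure with lazy loading."""
    catalog: dict[str, str]                      # tool name -> description
    core: set[str] = field(default_factory=set)  # tools that are always visible

    def exposed_fixed(self) -> list[str]:
        # Design-style: every tool is visible from the first turn.
        return sorted(self.catalog)

    def exposed_lazy(self, loaded: set[str]) -> list[str]:
        # General-style: core tools plus whatever a search has loaded so far.
        return sorted(self.core | loaded)

    def tool_search(self, query: str) -> set[str]:
        # Naive keyword match over descriptions; a real system would rank results.
        q = query.lower()
        return {name for name, desc in self.catalog.items() if q in desc.lower()}


registry = ToolRegistry(
    catalog={
        "tool_search": "find additional tools by keyword",
        "read_file": "read a file from the project",
        "take_screenshot": "capture a screenshot of the preview",
        "send_email": "send an email message",
    },
    core={"tool_search", "read_file"},
)

loaded = registry.tool_search("screenshot")
print(registry.exposed_lazy(loaded))   # core tools plus the one loaded on demand
print(registry.exposed_fixed())        # the whole catalog
```

The trade-off this sketch makes visible: lazy loading keeps the prompt short when the catalog is large, while fixed exposure removes a discovery step when the agent needs most tools on most tasks.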

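3. Side-by-Side Comparison Across 7 Engineering Dimensions

This is the core of the article. For each dimension we give, where applicable: (1) the original text from both prompts, (2) the engineering implication, and (3) a transferable implementation.

Dimension 1: Identity

General version: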

You are Claude, made by Anthropic. You are a helpful, harmless, and honest AI assistant.

Design version:

You are an expert designer working with the user as a manager. You produce design artifacts on behalf of the user using HTML. You operate within a filesystem-based project. ...begin your html file with some assumptions + context + design reasoning, as if you are a junior designer and the user is your manager.

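Engineering implication:

General version: identity = a set of adjectives (helpful, harmless, honest)
Design version: identity = a social role relationship (a designer whose manager is the user)

The second design is considerably stronger. The agent receives not an abstract label but a full social script: what to report, when to ask, how to raise objections, and what the deliverable is.

Transferable implementation: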

# Anti-pattern: identity as an adjective
BAD_IDENTITY = "You are a helpful PM assistant."

# Recommended: identity as a role relationship
GOOD_IDENTITY = """
You are a senior product manager at a Series B SaaS company.

Your reporting relationship:
- Your manager is the CTO (the user)
- You manage 2 junior developers (other agents or subordinates)

Your deliverables:
- Product Requirements Documents (PRDs) that devs can implement without asking back
- Written in markdown with specific acceptance criteria

Your working style:
- Work as a PM reporting to your CTO boss
- Propose, don't decide. Challenge assumptions, but defer final call to the user.
- When requirements are unclear, you MUST ask clarifying questions before producing deliverables.
"""
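One way to keep role-style identities consistent across agents is to render them from structured fields and flag the empty ones. The sketch below is an assumption-laden illustration: `RoleIdentity`, `render`, and `missing_fields` are hypothetical names, not part of either prompt.

```python
from dataclasses import dataclass


@dataclass
class RoleIdentity:
    """Hypothetical container for the role-relationship fields."""
    role: str
    manager: str
    deliverables: list[str]
    working_style: list[str]

    def render(self) -> str:
        # Assemble the identity section line by line.
        lines = [f"You are {self.role}.", f"Your manager is {self.manager}.", ""]
        lines.append("Your deliverables:")
        lines += [f"- {d}" for d in self.deliverables]
        lines += ["", "Your working style:"]
        lines += [f"- {s}" for s in self.working_style]
        return "\n".join(lines)

    def missing_fields(self) -> list[str]:
        # Flags empty fields so an adjective-only identity is caught early.
        return [name for name, value in vars(self).items() if not value]


identity = RoleIdentity(
    role="a senior product manager",
    manager="the CTO (the user)",
    deliverables=["PRDs with acceptance criteria"],
    working_style=["Propose, don't decide."],
)
prompt = identity.render()

weak = RoleIdentity(role="a helpful assistant", manager="", deliverables=[], working_style=[])
print(weak.missing_fields())
```

An identity that passes `missing_fields()` is not automatically a good one, but an identity that fails it is almost certainly still at the "adjective" stage.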

Dimension 2: Proactivity

General version:

Claude does its best to address the person's query, even if ambiguous, before asking for clarification or additional information.

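Even when the question is ambiguous, it attempts an answer first rather than rushing to ask back. This is a reactive, respond-first mode.

Design version: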

If stuck, try listing design assets, ls'ing design systems files -- be proactive! Some designs may need multiple design systems -- get them all! You should also use the starter components to get high-quality things like device frames for free.

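Key phrase: be proactive! (an explicit instruction to explore). When blocked, the agent should list files, browse the available assets, and try the starter components on its own initiative.

Engineering implication: proactivity has to be authorized explicitly. If the prompt does not say so, an LLM tends to default to conservative behavior.

Transferable implementation: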

PROACTIVITY_SECTION = """
## Initiative Level: HIGH

When you encounter uncertainty, DO NOT wait for user input. Instead:

1. **Explore first**: Use list_files, grep, read_file to understand context before asking
2. **Hypothesize second**: Form your best guess and state it explicitly
   ("My assumption: X. Proceeding unless you object.")
3. **Ask last**: Only ask the user when exploration + hypothesis fails to resolve ambiguity

When stuck:
- List relevant directories to see what's available
- Search codebase for similar patterns
- Try the most likely approach with low-risk tools first
- Report findings, don't report blocks
"""
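The explore, hypothesize, ask ordering can also be enforced in orchestration code rather than left to the prompt alone. The following is a minimal sketch; `resolve_uncertainty` and the stub callables are hypothetical and stand in for real tool calls.

```python
from typing import Callable, Optional


def resolve_uncertainty(
    question: str,
    explorers: list[Callable[[str], Optional[str]]],
    hypothesize: Callable[[str], Optional[str]],
) -> dict:
    """Escalate from exploration to hypothesis to asking the user."""
    # 1. Explore first: try each low-risk lookup in order.
    for explore in explorers:
        finding = explore(question)
        if finding is not None:
            return {"action": "proceed", "basis": "explored", "detail": finding}

    # 2. Hypothesize second: state the assumption explicitly.
    guess = hypothesize(question)
    if guess is not None:
        note = f"My assumption: {guess}. Proceeding unless you object."
        return {"action": "proceed", "basis": "assumed", "detail": note}

    # 3. Ask last: only when both earlier steps fail.
    return {"action": "ask_user", "basis": "unresolved", "detail": question}


def search_config(q: str) -> Optional[str]:
    # Stub for a file search; pretends the brand color is documented.
    return "primary color is #0A66C2" if "color" in q else None


def no_guess(q: str) -> Optional[str]:
    # Stub hypothesis step that never finds a safe assumption.
    return None


found = resolve_uncertainty("Which brand color?", [search_config], no_guess)
stuck = resolve_uncertainty("Who is the audience?", [search_config], no_guess)
```

Here `found` proceeds on an explored fact, while `stuck` falls through to the user, which matches the "ask last" rule in the prompt section above.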

Dimension 3: Question Density

General version:

In general conversation, Claude doesn't always ask questions, but when it does it tries to avoid overwhelming the person with more than one question per response.

Design version:

Use the questions_v2 tool when starting something new or the ask is ambiguous — one round of focused questions is usually right. Tips: - Always ask whether they'd like variations, and for which aspects - Always ask whether the user wants divergent visuals or interactions - Ask at least 4 other problem-specific questions - Ask at least 10 questions, maybe more.

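The gap is stark: at most one question per response vs. at least ten questions up front.

Engineering implication: question density should be a function of task ambiguity, not a one-size-fits-all rule.

Transferable implementation: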

def question_policy_by_task(task_type: str) -> dict:
    """Return the questioning policy configured for a task type."""
    policies = {
        "factual_query": {"max_questions": 0, "trigger": "never ask, answer directly"},
        "simple_task": {"max_questions": 1, "trigger": "only if critical info is missing"},
        "code_implementation": {"max_questions": 2, "trigger": "ask about edge cases and error handling"},
        "design_from_scratch": {"max_questions": 10, "trigger": "ALWAYS ask before producing anything"},
        "architecture_design": {"max_questions": 8, "trigger": "ask about scale, constraints, non-functionals"},
    }
    # Raises KeyError for an unknown task type, so a typo fails loudly.
    return policies[task_type]
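The per-task table above is a lookup; the underlying idea, density as a function of ambiguity, can also be written as an actual function. This sketch assumes ambiguity is estimated as the share of required inputs the request leaves unspecified, which is only one possible proxy. `estimate_ambiguity` and `question_budget` are hypothetical names.

```python
import math


def estimate_ambiguity(required: set[str], provided: set[str]) -> float:
    """Fraction of required inputs that the request leaves unspecified."""
    if not required:
        return 0.0
    return len(required - provided) / len(required)


def question_budget(ambiguity: float, ceiling: int = 10) -> int:
    """Map an ambiguity score in [0, 1] to a maximum number of questions."""
    if not 0.0 <= ambiguity <= 1.0:
        raise ValueError("ambiguity must be between 0 and 1")
    # Linear scaling, rounded up so any ambiguity allows at least one question.
    return math.ceil(ambiguity * ceiling)


score = estimate_ambiguity(
    required={"audience", "platform", "brand", "scope"},
    provided={"platform"},
)
budget = question_budget(score)
```

The `ceiling` parameter plays the role of `max_questions` in the table, so the two approaches can be combined: the table sets the cap per task type and the score decides how much of it to use.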

Dimension 4: Formatting Preferences

General version (very strict):

Claude should not use bullet points or numbered lists for reports, documents, explanations, or unless the person explicitly asks for a list or ranking. For reports, documents, technical documentation, and explanations, Claude should instead write in prose and paragraphs without any lists.

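Design version: no formatting constraints anywhere in the prompt.

Engineering implication: different agents produce fundamentally different outputs, so they should be judged by different criteria:

Agent type	Output	Quality criteria
General chat	Text	Readability, concision, no slop
Design	HTML / design mockups	Visual taste, usability, variations
Code	Code	Runs, clean, tested
Analysis	Reports	Logic, data, insight

Do not constrain every agent with one uniform prompt.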
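The table translates directly into per-agent configuration. The sketch below is illustrative only: `QUALITY_CRITERIA` and `quality_section` are hypothetical names, and the criteria strings simply restate the table.

```python
QUALITY_CRITERIA = {
    "general_chat": {"output": "text", "criteria": ["readability", "concision", "no slop"]},
    "design": {"output": "HTML / design mockups", "criteria": ["visual taste", "usability", "variations"]},
    "code": {"output": "code", "criteria": ["runs", "clean", "tested"]},
    "analysis": {"output": "reports", "criteria": ["logic", "data", "insight"]},
}


def quality_section(agent_type: str) -> str:
    """Build the output-quality section for one agent type only."""
    spec = QUALITY_CRITERIA[agent_type]
    bullets = "\n".join(f"- {c}" for c in spec["criteria"])
    return f"## Output: {spec['output']}\nJudge your work by:\n{bullets}"


code_section = quality_section("code")
print(code_section)
```

Because each agent receives only its own section, the design agent never sees prose-formatting rules and the chat agent never sees visual criteria.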

Dimension 5: Variation Strategy

General version: no relevant clause.

Design version:

Give options: try to give 3+ variations across several dimensions. Mix by-the-book designs that match existing patterns with new and novel interactions. Start your variations basic and get more advanced and creative as you go! The goal here is not to give users the perfect option; it's to explore as many atomic variations as possible, so the user can mix and match and find the best ones.

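Core idea: the last sentence is the essence. The goal is not a perfect solution but enough atomic material for the user to assemble their own.

Engineering implication: a creative task is essentially a search problem (finding good solutions in a space of possibilities), not an optimization problem (polishing one solution to perfection).

Transferable implementation: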

# Applies to any creative agent (copywriting, design, code proposals, strategy)
VARIATION_POLICY = """
## Multi-Variant Generation Policy

For any creative or open-ended task, default to producing 3 variants:

Variant A (Safe/Standard):
- Follows established patterns in the codebase/domain
- Low-risk, proven approach
- Reasonable trade-offs

Variant B (One-Dimensional Deviation):
- Keeps 80% of Variant A
- Deviates boldly on ONE specific dimension
  (e.g., different algorithm, different tech, different style)
- Useful for A/B comparison

Variant C (Creative/Experimental):
- Novel approach
- Higher risk, potentially higher reward
- Explore what's possible

Present all 3 with trade-off analysis. Let the user mix and match.
"""
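"Atomic variations" that the user can mix and match correspond to enumerating options per dimension. The sketch below is a hypothetical illustration: the function names and the example dimensions are assumptions, not taken from the Design prompt.

```python
from itertools import product


def enumerate_variants(dimensions: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return every combination of one option per dimension."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


def one_step_deviations(base: dict[str, str], dimensions: dict[str, list[str]]) -> list[dict[str, str]]:
    """Variants that differ from the base on exactly one dimension."""
    out = []
    for name, options in dimensions.items():
        for option in options:
            if option != base[name]:
                out.append({**base, name: option})
    return out


dimensions = {
    "layout": ["grid", "list"],
    "tone": ["neutral", "playful"],
    "density": ["compact", "roomy"],
}
all_variants = enumerate_variants(dimensions)
safe = all_variants[0]                          # first option on every dimension
near_safe = one_step_deviations(safe, dimensions)
```

`one_step_deviations` corresponds to Variant B in the policy above (one bold change at a time), while the full product is the search space the user can pick combinations from.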

Dimension 6: Verification

General version: Claude checks its own output (self-check mode).

Design version:

Do not perform your own verification before calling 'done'; do not proactively grab screenshots to check your work; rely on the verifier to catch issues without cluttering your context. Once 'done' reports clean, call fork_verifier_agent. It spawns a background subagent with its own iframe to do thorough checks (screenshots, layout, JS probing). Silent on pass — only wakes you if something's wrong.

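Engineering implication: the main agent's context is already filled with task history, so "self-checking amounts to letting a student grade their own exam." A better approach is to fork a verifier subagent with an independent context to do the checking.

Architecture sketch: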

┌─────────────────────────────┐
│ Main Design Agent           │
│ (context: task + history)   │
└──────────┬──────────────────┘
           │ call done()
           ↓
┌─────────────────────────────┐
│ fork_verifier_agent         │ ← separate process
│ (context: fresh + artifact) │
│ - take screenshots          │
│ - run JS checks             │
│ - read console logs         │
└──────────┬──────────────────┘
           │ reports back only when issues are found
           ↓
[silent pass] or [issue report]

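Transferable implementation (Python / pseudocode):

async def run_with_independent_verifier(agent, task):
    # Pseudocode: new_agent, VERIFIER_PROMPT, and VERIFICATION_TOOLS are
    # placeholders for whatever your agent framework provides.

    # The main agent does the work
    result = await agent.execute(task)

    # Fork an independent verifier (new LLM session, clean context)
    verifier = new_agent(
        system_prompt=VERIFIER_PROMPT,
        context=[],  # deliberately do NOT pass the main agent's conversation history
        tools=VERIFICATION_TOOLS,
    )

    # Show the verifier only the final artifact
    verdict = await verifier.verify(
        artifact=result.artifact,
        acceptance_criteria=task.criteria,
    )

    if verdict.has_issues:
        # Feed the issues back to the main agent so it can fix them
        return await agent.fix(result, verdict.issues)
    return result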

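This is an essential pattern for enterprise-grade agent systems. A single agent that does both development and QA is very likely to fail in production.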
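A production version of this pattern usually needs a bounded fix loop, so that a fix is itself re-verified and the loop cannot run forever. The following runnable sketch uses stub classes in place of real LLM sessions; `StubWorker`, `StubVerifier`, `Verdict`, and `run_with_retries` are all hypothetical names.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Verdict:
    issues: list[str] = field(default_factory=list)


class StubWorker:
    """Stand-in for the main agent: marks the draft as final when asked to fix."""
    async def execute(self, task: str) -> str:
        return f"{task} [draft]"

    async def fix(self, artifact: str, issues: list[str]) -> str:
        return artifact.replace(" [draft]", " [final]")


class StubVerifier:
    """Stand-in for a verifier that sees only the artifact, never the history."""
    async def verify(self, artifact: str) -> Verdict:
        return Verdict(["still a draft"] if "[draft]" in artifact else [])


async def run_with_retries(worker, make_verifier, task: str, max_retries: int = 2):
    artifact = await worker.execute(task)
    for attempt in range(max_retries + 1):
        # A fresh verifier per round keeps its context clean.
        verdict = await make_verifier().verify(artifact)
        if not verdict.issues:
            return artifact, attempt
        if attempt < max_retries:
            artifact = await worker.fix(artifact, verdict.issues)
    raise RuntimeError(f"unresolved issues after {max_retries} retries: {verdict.issues}")


artifact, rounds = asyncio.run(run_with_retries(StubWorker(), StubVerifier, "landing page"))
print(artifact, rounds)
```

Passing a factory (`make_verifier`) rather than a verifier instance is the design choice that matters here: each round starts from an empty context, so earlier verdicts cannot bias later ones.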

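Dimension 7: Copyright Compliance

General version: the CRITICAL_COPYRIGHT_COMPLIANCE section takes 80 lines: three LIMIT rules plus 7 self-check items.

Design version: barely mentions copyright.

Engineering implication: more constraints are not always better; configure them per scenario. Adding 80 lines of copyright clauses to a design agent wastes context and slows responses, while stripping copyright clauses from a general agent sharply increases legal risk.

Principle: each agent's system prompt should contain only the constraints that are actually relevant to its scenario.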
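The principle can be implemented by keeping constraints in named modules and attaching only the ones registered for a scenario. This is a minimal sketch; the module names, the constraint wording, and `build_system_prompt` are all hypothetical and far shorter than real compliance text.

```python
CONSTRAINT_MODULES = {
    "copyright": "Never reproduce long copyrighted passages verbatim.",
    "pii": "Do not store or repeat personal identifiers.",
    "destructive_ops": "Ask for confirmation before deleting files.",
}

# Which modules each scenario actually needs (illustrative assignment).
SCENARIO_CONSTRAINTS = {
    "general_chat": ["copyright", "pii"],
    "design": ["destructive_ops"],
}


def build_system_prompt(identity: str, scenario: str) -> str:
    """Append only the constraint modules registered for this scenario."""
    selected = SCENARIO_CONSTRAINTS.get(scenario, [])
    rules = [f"- {CONSTRAINT_MODULES[name]}" for name in selected]
    if not rules:
        return identity
    return identity + "\n\n## Constraints\n" + "\n".join(rules)


design_prompt = build_system_prompt("You are a designer.", "design")
chat_prompt = build_system_prompt("You are a general assistant.", "general_chat")
```

Keeping the mapping in data also makes it reviewable: a compliance owner can audit `SCENARIO_CONSTRAINTS` without reading every prompt.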

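4. Summary Table of the 7 Dimensions

Everything at a glance:

#	Dimension	General Claude	Design Claude	Design principle
1	Identity	Abstract labels (helpful/honest)	Social role (designer + manager)	Role relationships > adjectives
2	Proactivity	Reactive	Proactive exploration	Authorize explicitly, otherwise the LLM stays conservative
3	Questions	At most 1	At least 10	Density = function of task ambiguity
4	Formatting	No bullets in reports or explanations	No constraints	Judge by the nature of the output
5	Variations	None	3+ required	Creativity = search, not optimization
6	Verification	Self-check	Delegated to a verifier	Separate doing from reviewing; independent context
7	Copyright	80 lines of hard limits	Barely mentioned	Configure constraints per scenario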

5. Agent Persona Engineering Checklist

Abstracted from the 7-dimension comparison, here is a general checklist. Go through it item by item when designing a new agent:

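## Role definition layer
[ ] Identity is a concrete role, not an adjective
[ ] Reporting line declared (manager is X)
[ ] Subordinates / collaborators declared (works with Y)
[ ] Deliverable type made explicit (produces Z)
[ ] Working style ("as if you are a ...")

## Default behavior layer
[ ] Proactivity level (low / medium / high)
[ ] Question density policy (tiered by task type)
[ ] Default behavior when blocked (explore / assume / wait)
[ ] How ambiguous requests are handled

## Output constraints layer
[ ] Formatting preference (prose / list / depends on scenario)
[ ] Length preference (brief / detailed / auto)
[ ] Number of variants (1 or 3+)
[ ] Whether to summarize after finishing

## Safety and compliance layer
[ ] Scenario-specific hard limits (copyright / PII / destructive operations)
[ ] Pre-generation self-check list (questions to answer before generating)
[ ] Constraints kept lean (no irrelevant constraints copied in)

## Quality assurance layer
[ ] Whether an independent verifier subagent is enabled
[ ] Acceptance criteria for the verifier
[ ] Fix loop on failure
[ ] Maximum number of retries

## Memory and context layer
[ ] Whether there is cross-session memory
[ ] Trigger conditions for memory retrieval (linguistic cues, etc.)
[ ] Context compression strategy

## Tool capability layer
[ ] Core tool list (kept lean)
[ ] Whether the tool_search meta-tool is enabled
[ ] Parallel vs. serial tool-call strategy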

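6. Practical Advice: When to Split Agent Personas

Not every multi-scenario AI needs multiple personas. Here is a decision matrix:

Signals that you should split

✅ The output types differ fundamentally across scenarios (text vs. code vs. design mockups)
✅ The appropriate question density differs drastically across scenarios (customer service vs. product analysis)
✅ The safety constraints are scenario-specific (medical vs. entertainment)
✅ The tool sets differ substantially (>50% non-overlapping)

Signals that you do not need to split

❌ Only the tone differs (formal vs. casual) → control it from the user prompt
❌ Only the topic domain differs (programming vs. writing) → add domain knowledge to the system prompt
❌ Only the preferred output length differs → handle it at the instruction level

Rule of thumb: if you cannot name at least 3 dimensions on which the two scenarios differ fundamentally, do not split; keep a single agent.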
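The rule of thumb and the tool-overlap signal can be turned into a small check. The sketch below makes one interpretive assumption that the article does not spell out: ">50% non-overlapping" is measured as the share of the combined tool set used by only one scenario. The function names and tool names are hypothetical.

```python
def tool_non_overlap(tools_a: set[str], tools_b: set[str]) -> float:
    """Share of the combined tool set that only one scenario uses."""
    union = tools_a | tools_b
    if not union:
        return 0.0
    return len(tools_a ^ tools_b) / len(union)


def should_split(differences: dict[str, bool], threshold: int = 3) -> bool:
    """Apply the rule of thumb: split only with >= threshold fundamental differences."""
    return sum(differences.values()) >= threshold


chat_tools = {"web_search", "read_file"}
design_tools = {"read_file", "take_screenshot", "write_html", "list_assets"}

differences = {
    "output_type": True,          # text vs. HTML mockups
    "question_density": True,     # 1 question vs. 10+
    "safety_constraints": False,  # assumed similar for this example
    "tool_set": tool_non_overlap(chat_tools, design_tools) > 0.5,
}
decision = should_split(differences)
```

With these example inputs four of the five tools are exclusive to one scenario, so the tool-set signal fires and the total reaches the threshold of three.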

7. FAQ

Q1: Are both prompts really for the same Claude Opus 4.7 model? How can that be shown?

The general prompt states it directly:

This iteration of Claude is Claude Opus 4.7 from the Claude 4.7 model family.

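The Design prompt does not explicitly declare a model. However, judging from Anthropic's publicly described product architecture (Artifacts builds on Opus capabilities) and from the way the prompt refers to claude-haiku-4-5, the main agent is most likely an Opus-series model. This is an inference from cross-checking the two sources, not a confirmed fact.

Q2: Why doesn't Anthropic train a dedicated Design model?

Economics. Training a new model takes massive data, a lot of GPU time, and months of work, and the result is useful only in a specific scenario. Writing a prompt takes a few weeks of product work, ships immediately, and can be changed at any time. As long as the quality is sufficient, the ROI of prompt engineering far exceeds that of specialized training. The lesson for application-layer products is clear.

Q3: My product uses GPT-4 or an open-source model. Do these principles apply?

The core principles (role-based identity, explicitly authorized proactivity, tiered question density, separation of doing and reviewing) are model-agnostic and apply to any instruction-following LLM. The exact wording needs tuning to the model's characteristics (for example, GPT-4 is sensitive to concise instructions, and open-source models often need more few-shot examples).

Q4: Are there open-source projects or public prompts that implement these patterns?

A few worth watching:

Devin 2.0 system prompt (CL4R1T4S/DEVIN/): a Planning/Standard/Edit three-state machine
Cursor 2.0 system prompt (CL4R1T4S/CURSOR/): a lean tool set for an IDE coding assistant
Manus prompt (CL4R1T4S/MANUS/): Event Stream plus four decoupled modules
Hermes Agent (GitHub NousResearch/hermes-agent, 84k stars): pluggable ContextEngine, on_pre_compress memory hook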

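8. References

CL4R1T4S repository: https://github.com/elder-plinius/CL4R1T4S
Files analyzed in this article:
- ANTHROPIC/Claude-Opus-4.7.txt (1408 lines, general version)
- ANTHROPIC/Claude-Design-Sys-Prompt.txt (422 lines, Design version)
Prerequisite reading: "Claude Opus 4.7 System Prompt Deep Dive: 5 Prompt Engineering Practices Reverse-Engineered from 1408 Lines of Instructions" (the previous article)
Recommended further reading:
- Constitutional AI: Harmlessness from AI Feedback (Anthropic)
- Reflexion: Language Agents with Verbal Reinforcement Learning
- Anthropic official blog, "Claude's Constitution"

About the author: an AI agent engineering practitioner, currently building the OpenClaw multi-agent collaboration system. The transferable practices in this article were all tested inside the OpenClaw project.

Where the source materials can be checked (author's local copy):

CL4R1T4S-main/ANTHROPIC/Claude-Opus-4.7.txt            # 1408 lines, 149KB
CL4R1T4S-main/ANTHROPIC/Claude-Design-Sys-Prompt.txt   # 422 lines, 18KB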

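If this article helped with your agent design, feel free to like, bookmark, or discuss. Comments sharing pitfalls you have hit in multi-agent persona design are especially welcome.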
