Howardism

Mythos Model

Published: May 6, 2026 · Filed: Entity · Domain: Entities · Tags: Entity, Model, Anthropic · Reading: 8 min · Source: AI-synthesised

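Anthropic's preview-tier frontier model and the first member of the Mythos-class tier (above Opus). It is gated on safety grounds and used internally alongside Opus 4.7. Its descendants, Fable 5 / Mythos 5, launched in June 2026 as the first generally available Mythos-class models.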


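Sources

Summary

Anthropic's preview-tier frontier model. It has been described specifically as "incredibly powerful" and is gated pending safety review (this is the Mythos Preview / red.anthropic.com publication that established the LLM-Driven Vulnerability Research narrative). Inside Anthropic it is used alongside Claude Opus 4.7. As of May 2026 it had not reached GA.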

What is publicly known

  • Mythos Preview showed emergent cybersecurity capabilities: autonomous zero-day discovery and full exploit chains. For a detailed analysis, see LLM-Driven Vulnerability Research, which draws on the Mythos Preview publication.
  • Anthropic's response: Project Glasswing safeguards (called the "first post-Glasswing safeguards" in the 4.7 release).
  • Boris Cherny: "We use a little bit of Mythos to try it and then a lot of Opus 4.7 to dog food it and to write most of our code." Mythos is a preview-tier model, not the workhorse.
  • Cat Wu: "Mythos is an incredibly powerful model. But we do use the models internally and I think this has increased our rate of shipping a little bit but I don't think it explains the bulk of the increase." This confirms internal use but explicitly denies that Mythos explains the shipping cadence.

Role in the Opus 4.8 System Card

The Opus 4.8 System Card (May 2026) makes the role of Mythos Preview unusually concrete: it remains the capability frontier against which generally available models are measured, and it is itself used as a tool in the evaluation:

  • Frontier benchmark: Opus 4.8 "does not advance the capability frontier beyond Mythos Preview". On the AECI index, Mythos scores 158.3 and Opus 4.8 scores 155.5 (4.7 scores 154.1). The Mythos Risk Report sets the upper bound of the RSP argument for 4.8.
  • Investigator model: Mythos Preview is one of the two investigator models driving the Opus 4.8 Automated Behavioral Audit (the other is a helpful-only Opus 4.7).
  • Reviewer of the assessment: in a notable meta move, Mythos was given access to internal Slack discussions and asked to review the near-final alignment section. Its (published) review confirmed the section's candor and pointed out that no eval specifically tests for training-gaming. See Evaluation Awareness & Grader Gaming.
  • Alignment yardstick: Opus 4.8 matches Mythos Preview's alignment profile on most metrics and surpasses it on several honesty metrics (Agentic Honesty & Diligence).

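Capability data points (When AI builds itself)

The Anthropic Institute article (June 2026) attaches concrete numbers to Mythos Preview, treating it as the model driving Anthropic's AI-R&D acceleration (AI Accelerating AI Development):

  • Time horizon: METR assessed that it can work continuously for "at least" 16 hours, "at the upper end of what [METR] can measure without new tasks" (Task Time-Horizon Scaling).
  • Kernel-optimization eval: on the "train a small model faster" task it reached a speedup of about 52× (April 2026), compared with about 3× for Opus 4 a year earlier and a human baseline of about 4×: "from super helpful to superhuman in under a year."
  • Research next-step judgment: at difficult fork-in-the-road decision points, it beat the human choice 64% of the time (Opus 4.5 scored 51% in November 2025).
  • Self-reported uplift: in a March 2026 survey of 130 research-team employees, the median estimate was about 4× output with Mythos Preview relative to using no AI (Anthropic believes the true uplift is somewhat lower).

These are deployment-side numbers (Mythos in internal use), distinct from the System Card's framing above of the gated capability frontier.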
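To put the kernel-optimization figures side by side, the short calculation below divides the quoted speedups by one another. It is only arithmetic on the approximate numbers in the article ("about 52×", "about 3×", "about 4×"); the variable and function names are illustrative and the resulting ratios carry the same uncertainty as the inputs.

```python
# Approximate speedups quoted in the article for the
# "train a small model faster" kernel-optimization task.
OPUS_4_SPEEDUP = 3.0           # Opus 4, roughly a year earlier
HUMAN_BASELINE_SPEEDUP = 4.0   # human baseline
MYTHOS_PREVIEW_SPEEDUP = 52.0  # Mythos Preview, April 2026


def ratio(a: float, b: float) -> float:
    """Return how many times larger speedup `a` is than speedup `b`."""
    return a / b


vs_opus_4 = ratio(MYTHOS_PREVIEW_SPEEDUP, OPUS_4_SPEEDUP)
vs_human = ratio(MYTHOS_PREVIEW_SPEEDUP, HUMAN_BASELINE_SPEEDUP)

print(f"Mythos Preview vs Opus 4:         {vs_opus_4:.1f}x")
print(f"Mythos Preview vs human baseline: {vs_human:.1f}x")
```

On these rounded inputs, Mythos Preview's speedup is roughly 17 times that of Opus 4 and 13 times the human baseline, which is the gap the "super helpful to superhuman" quote is pointing at.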

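Why it remains gated

A combination of the following factors:

  • A gap in cybersecurity capability relative to earlier models (per the Mythos Preview publication)
  • Safety mechanisms still being evaluated and hardened
  • Anthropic's stated mission stance

means that the model is used internally and previewed selectively rather than shipped broadly. A descendant of Mythos is expected to ship publicly later. Boris Cherny: "It will become some version of some descendant of that will become available at some point to everyone."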

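Update: descendants launched (June 2026)

Boris's prediction came true. In June 2026, Anthropic launched Fable 5 and Mythos 5, the first generally available Mythos-class models, which also made "Mythos-class" real as *a named capability tier above the Opus class*. The lineage is now Mythos Preview (April 2026) → Fable 5 / Mythos 5 (June 2026).

  • Fable 5 = a Mythos-class model "made safe for general use" by classifiers that fall back to Opus 4.8 on cyber/bio/distillation queries (Capability-Gated Model Fallback).
  • Mythos 5 = the same underlying model with the safeguards lifted, deployed through Project Glasswing as an upgrade to Mythos Preview ("comparable to, or somewhat stronger than" it, at less than half the price). Existing Glasswing/Mythos-Preview users can upgrade directly.

This pushes the capability frontier beyond Mythos Preview, the line the Opus 4.8 System Card treated as the ceiling, and marks the first time Mythos-class capability has reached the general public. (Both were reportedly suspended shortly after launch; see Claude Fable 5.)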
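The fallback pattern described for Fable 5 (classifiers flag cyber, bio, or distillation queries and the response is routed to Opus 4.8) can be illustrated with a minimal routing sketch. Everything here is an assumption made for illustration: the model identifier strings, the `route` and `toy_classifier` names, and especially the keyword matching, which is a toy stand-in for trained classifiers whose actual design the article does not describe.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical identifiers for illustration; not real API model names.
PRIMARY_MODEL = "mythos-class-primary"
FALLBACK_MODEL = "opus-4.8-fallback"

# Categories the article says trigger the fallback.
GATED_CATEGORIES = ("cyber", "bio", "distillation")

# Toy keyword lists standing in for trained classifiers (assumption).
_KEYWORDS = {
    "cyber": ("exploit", "zero-day", "malware"),
    "bio": ("pathogen", "toxin"),
    "distillation": ("distill",),
}


@dataclass(frozen=True)
class RoutingDecision:
    model: str                # which model would answer the query
    flagged: tuple[str, ...]  # gated categories the classifier matched


def toy_classifier(query: str) -> tuple[str, ...]:
    """Return the gated categories whose keywords appear in the query."""
    text = query.lower()
    return tuple(
        category
        for category in GATED_CATEGORIES
        if any(keyword in text for keyword in _KEYWORDS[category])
    )


def route(
    query: str,
    classifier: Callable[[str], tuple[str, ...]] = toy_classifier,
) -> RoutingDecision:
    """Send flagged queries to the fallback model, all others to the primary."""
    flagged = classifier(query)
    model = FALLBACK_MODEL if flagged else PRIMARY_MODEL
    return RoutingDecision(model=model, flagged=flagged)


benign = route("Summarize this design doc")
gated = route("Write a zero-day exploit chain")
print(benign.model, benign.flagged)
print(gated.model, gated.flagged)
```

The point of the sketch is the shape of the decision: the classifier runs before the model is chosen, and any flagged category is enough to trigger the fallback. Under this reading, Mythos 5 would correspond to the same primary model with the gating step removed.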

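Related links

Open questions

  • Public release timeline: answered. Mythos Preview itself never shipped as GA, but its descendants Fable 5 / Mythos 5 reached general availability in June 2026 (see the update on the descendants above). Both were suspended shortly after launch; whether and when they return is unknown.
  • Capability profile beyond cybersecurity: the Mythos Preview publication focuses on the security narrative; other capability dimensions are not fully documented externally.
  • Internal access control: who at Anthropic actually uses Mythos rather than Opus 4.7 for day-to-day work? Boris implies infrequent, trial-style use; no details are given.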

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.
