Claude Opus 4.7

Published: April 17, 2026 | Filed: Entity | Domain: Entities | Tags: Entity, Claude, Anthropic, LLM Model | Reading: 8 min | Source: AI-synthesised

GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokenizer inflation, new `xhigh` effort, first post-Glasswing safeguards

Summary

Claude Opus 4.7 is Anthropic's general-availability frontier model, released as a direct upgrade from Opus 4.6 at the same price ($5/M input tokens, $25/M output tokens; model ID claude-opus-4-7). It improves on advanced software engineering, literal instruction following, high-resolution vision, and file-system memory, while remaining less broadly capable than the limited-release Claude Mythos Preview. It is the first model to ship Mythos-class cyber safeguards under Project Glasswing.

Details

Capability Deltas vs. Opus 4.6

  • Software engineering on hardest tasks: marketed explicitly for "hand off your hardest coding work." SOTA on Finance Agent, GDPval-AA; improved on SWE-bench Verified/Pro/Multilingual (improvement holds after excluding memorization-flagged problems).
  • Literal instruction following: substantially more literal than earlier models. Anthropic warns that prompts tuned for earlier models "can sometimes now produce unexpected results" because Opus 4.7 no longer skips or loosely interprets parts of a prompt. Retuning prompts is a required migration step, not an optional one.
  • Multimodal: accepts images up to 2,576 px on long edge (~3.75 MP, >3× prior Claude models). Enables dense-screenshot reading (computer-use), complex-diagram extraction, pixel-precise references. Model-level change, not an API parameter.
  • File-system memory: better at using filesystem-backed memory across long multi-session work; needs less up-front context on follow-up tasks.
  • Safety: similar overall profile to 4.6. Better on honesty and prompt-injection resistance; modestly weaker on over-detailed harm-reduction advice for controlled substances. "Largely well-aligned and trustworthy, though not fully ideal." Mythos Preview remains the best-aligned model by Anthropic's evaluations.
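
The long-edge figure in the multimodal bullet can be turned into a small client-side check. The sketch below is illustrative only: it assumes a caller wants to pre-scale oversized images so the long edge fits within 2,576 px, and the function name `fit_long_edge` is hypothetical. The text does not say how larger images are handled, so treat the pre-scaling itself as an assumption rather than a documented requirement.

```python
from typing import Tuple

# Long-edge limit quoted in the text above (2,576 px).
MAX_LONG_EDGE_PX = 2576


def fit_long_edge(width: int, height: int,
                  max_long_edge: int = MAX_LONG_EDGE_PX) -> Tuple[int, int]:
    """Return (width, height) scaled down so the longer side fits the limit.

    Images already within the limit are returned unchanged; the aspect
    ratio is preserved up to integer rounding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 5152×2912 screenshot would be halved to 2576×1456, which is about 3.75 MP and consistent with the megapixel figure quoted above.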

Token-Economics Changes (Migration Hazard)

Two compounding effects increase token consumption:

  1. Updated tokenizer: the same input maps to 1.0–1.35× as many tokens, depending on content type.
  2. More thinking at higher effort levels, particularly on later turns in agentic settings, which produces more output tokens.

Anthropic claims the net effect is favorable on its internal coding eval across effort levels, but explicitly recommends measuring on real traffic. Users can counter the increase via the effort parameter, task budgets, or explicit conciseness prompting. This bears directly on the context-window-as-primary-constraint theme in Claude Code Best Practices; cross-reference the brevity-constraint findings in Scale-Dependent Prompt Sensitivity.
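
To see how the two effects compound, the arithmetic can be sketched with the prices and multiplier range quoted in this article. This is a rough bound, not a measurement: the function names are hypothetical, the tokenizer multiplier is applied to input only (as the text describes it), and `output_growth` is a made-up knob standing in for "thinks more", whose real size the source does not give.

```python
from typing import Tuple

# Prices and multiplier range as quoted in this article.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0
TOKENIZER_MULTIPLIER_RANGE = (1.0, 1.35)


def request_cost_usd(input_tokens: float, output_tokens: float) -> float:
    """Cost of one request at the quoted per-million-token prices."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def projected_cost_bounds(old_input_tokens: float,
                          old_output_tokens: float,
                          output_growth: float = 1.0) -> Tuple[float, float]:
    """Lower and upper cost bounds after migration.

    old_* are token counts measured on the previous model. output_growth is
    a hypothetical factor for extra thinking tokens (1.0 means no change).
    """
    low_mult, high_mult = TOKENIZER_MULTIPLIER_RANGE
    new_output = old_output_tokens * output_growth
    return (request_cost_usd(old_input_tokens * low_mult, new_output),
            request_cost_usd(old_input_tokens * high_mult, new_output))
```

With 1M input tokens and 200K output tokens measured on the old model and no output growth, the bounds come out to roughly $10.00 and $11.75. The spread is entirely on the input side, which is why measuring on real traffic matters.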

Effort Levels

Introduces a new xhigh ("extra high") effort level sitting between high and max. Tradeoff surface: reasoning depth vs. latency/tokens on hard problems.

  • Claude Code default raised to xhigh on all plans.
  • Anthropic recommends starting coding/agentic use at high or xhigh.
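
The ordering described above (high, then xhigh, then max) can be modelled as a small enum, for instance to drive an escalate-on-failure policy in a harness. This is a sketch under assumptions: it covers only the three levels named in this article, the names `Effort`, `next_effort`, and `label` are hypothetical, and it does not show how the effort parameter is actually passed to the API.

```python
from enum import IntEnum
from typing import Optional


class Effort(IntEnum):
    """The three effort levels named in the text, ordered by reasoning depth."""
    HIGH = 1
    XHIGH = 2   # the new level, sitting between high and max
    MAX = 3


# Starting points recommended for coding and agentic use, per the text.
RECOMMENDED_START = (Effort.HIGH, Effort.XHIGH)


def next_effort(level: Effort) -> Optional[Effort]:
    """Return the next deeper level, or None when already at MAX."""
    if level is Effort.MAX:
        return None
    return Effort(level + 1)


def label(level: Effort) -> str:
    """Lower-case name of the level, e.g. 'xhigh'."""
    return level.name.lower()
```

A harness could start at one of the recommended levels and call `next_effort` only when a task fails verification, so the extra latency and tokens are spent on the hard cases.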

Cyber Capabilities and Safeguards

  • Opus 4.7 is the first post-Glasswing model and ships with safeguards "that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses."
  • Cyber capabilities were differentially reduced during training (not merely filtered at inference).
  • Still less capable than Mythos Preview on cyber. The CyberGym comparison was updated: a harness improvement moved the Opus 4.6 baseline from 66.6 to 73.8.
  • Legitimate security researchers (vuln research, pentest, red-teaming) are routed through the new Cyber Verification Program rather than default access.

This directly fulfills the roadmap promise stated in LLM-Driven Vulnerability Research: "Upcoming Claude Opus model will ship with new safeguards developed against Mythos-class outputs."

Accompanying Launches

  • Task budgets (public beta, API): developer-guided token-spend allocation across longer runs — a server-surfaced analogue to the budget lever in Client-Side Agent Optimization's combo space.
  • /ultrareview slash command in Claude Code: dedicated review session that reads changes and flags bugs/design issues. Pro and Max users get three free ultrareviews.
  • Auto mode extended to Max users (previously Team-only research preview).

Availability

  • All Claude products, Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry.
  • API model ID: claude-opus-4-7.
  • Pricing unchanged from Opus 4.6.

Connections

  • Claude Code Best Practices — Opus 4.7 is the runtime most Claude Code work will target; its literal-instruction-following and tokenizer inflation amplify the context-window-as-primary-constraint framing
  • Claude Code Auto Mode — auto mode was already extended to Opus 4.6; Opus 4.7 ships with it extended to Max users
  • LLM-Driven Vulnerability Research — Opus 4.7 operationalizes the "safeguards developed against Mythos-class outputs" commitment from the Mythos Preview disclosure
  • Client-Side Agent Optimization — the improved instruction-following may reduce Opus-as-planner failures documented on 4.6 (open question); task budgets echo AgentOpt's budget lever server-side
  • Scale-Dependent Prompt Sensitivity — literal instruction following might dampen elaboration-driven overthinking, but xhigh-default and "thinks more at higher effort" cut the other way. Needs empirical recheck before assuming brevity findings carry over
  • Agent Harness Engineering — better file-system memory strengthens the case for repo-local versioned artifacts as the agent's primary memory surface
  • Mythos Model — preview-tier successor used internally; Boris Cherny: "we use a little bit of Mythos and a lot of Opus 4.7"
  • Claude Opus 4.8 — direct successor (May 2026); improves on nearly every eval and on most alignment measures; a helpful-only variant of 4.7 serves as an investigator model in 4.8's behavioral audit, and 4.7 grades 4.8's constitution-adherence eval
  • Harness Shrinkage as Models Improve — Opus 4.7 is the model whose spontaneous loop-starting and natural to-do-list use motivate the shrinkage thesis; Cat Wu's pruning discipline runs at every release of this lineage
  • Agent Loop Pattern: /loop becomes natural model behavior at 4.7, per Boris Cherny's report
  • Claude Code — primary product surface targeting this model
  • Model Spec Midtraining (MSM) — Opus 4.6/4.7 used by the May 2026 MSM paper as the data-generation model for synthetic spec documents and AFT data
  • Synthetic Document Finetuning (SDF) — Opus is the workhorse generator for SDF/MSM corpora across Anthropic alignment work
  • TML-Interaction-Small — era-mate (mid-2026 frontier from a different lab); 4.7's xhigh effort tier mirrors the minimal/xhigh tiers of GPT-realtime-2.0 used as a baseline in TML's interaction benchmarks
  • AI-Accelerated Offense — Opus 4.7's post-Glasswing safeguards are the model-side response to the accelerated-offense threat landscape the Zero Trust framework addresses
  • Build for the Next Model — Opus 4.7 is the concrete release that closed Claude Design's unsolved prototype gaps — Dan Carey's retrospective proof of the "build for the next model" bet
  • Claude Design: Anthropic Labs product whose early-prototype capability gaps were fixed by this release rather than by engineering

Open Questions

  • Do Hakim's (2026) brevity-constraint findings on Opus 4.6 replicate on Opus 4.7, or does the literal-instruction-following change the elasticity? Specifically: does <50 words still yield +13.1pp on GSM8K?
  • Does Opus 4.7 still underperform as a planner in HotpotQA-style combo sweeps, or does improved instruction-following close the gap that AgentOpt (Hua et al., 2026) identified?
  • What is the real-world token-inflation multiplier on typical Claude Code sessions (1.0–1.35× is content-dependent — what's the distribution on code-heavy vs. prose-heavy inputs)?
  • How does xhigh compare to max on coding evals? The migration guidance says "start with high or xhigh" — is max ever worth it for coding?
  • What fraction of existing CLAUDE.md / system-prompt hedges become counterproductive under literal instruction following?
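
The token-inflation question above could be approached by grouping paired token counts by content type. The sketch below assumes each sample has already been counted under both tokenizers by some means the article does not specify. The tuple layout and the function name `inflation_by_content_type` are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

# (content_type, tokens_on_old_model, tokens_on_new_model); hypothetical layout.
Sample = Tuple[str, int, int]


def inflation_by_content_type(samples: Iterable[Sample]) -> Dict[str, Dict[str, float]]:
    """Summarise the new/old token ratio per content type.

    Samples with a non-positive old count are skipped, since their ratio
    is undefined.
    """
    ratios: Dict[str, List[float]] = defaultdict(list)
    for content_type, old_tokens, new_tokens in samples:
        if old_tokens <= 0:
            continue
        ratios[content_type].append(new_tokens / old_tokens)
    return {
        ctype: {
            "n": len(vals),
            "min": min(vals),
            "median": median(vals),
            "max": max(vals),
        }
        for ctype, vals in ratios.items()
    }
```

Comparing the per-type medians for code-heavy and prose-heavy sessions would show where in the 1.0–1.35× range a given workload actually sits.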

Connections

  • Jagged Intelligence (Ghosts, Not Animals): Karpathy's "Opus 4.7 will refactor a 100K-line codebase or find zero-days, yet tell me to walk to a car wash 50m away to wash my car" is the canonical jaggedness example at this model's capability level

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.
