H
Howardism
Plate IIAI Engineering中文HOWARDISM

Verification as the New Bottleneck

PublishedMay 23, 2026FiledConceptDomainAI EngineeringTagsAgent EngineeringAI Coding WorkflowAI Native OrgReading5 minSourceAI-synthesised

Fiona Fung: coding is no longer the bottleneck — verification, review, maintenance are; shift-left; TDD loses its tax; PR-cycle-time funnel analysis

Illustration for Verification as the New Bottleneck

Sources#

Summary#

Fiona Fung's central claim from running Claude Code + Cowork engineering: for years, engineering bandwidth was the expensive resource — planning, reviews, and process all existed to protect it. Once agentic coding made coding cheap, the bottleneck moved to verification, review, and maintenance. "On the Claude Code team, coding is really not the slow part anymore." The new scarce resource is confidence that the change is correct — and it gets scarcer precisely because bandwidth (and therefore throughput) exploded.

Why verification is now the constraint#

Three forces converge:

  • Volume. Bandwidth increased so much that "we have to pay even more attention to: is it correct."
  • Blurring roles. More people (designers, managers, PMs) now check in changes, so everyone needs confidence their change is correct.
  • Maintenance cost. Higher throughput means more to maintain — the cost of maintenance becomes a first-class concern, not an afterthought.

This is the org-level mirror of Karpathy's The Verifiability Thesis ("LLMs automate what you can verify") and the demand side of Harness Shrinkage as Models Improve (prompt scaffolding shrinks; mechanical verification stays load-bearing).

TDD loses its tax#

A vivid sign of the shift: TDD used to feel like "eating broccoli" — write the failing test first, verify it fails, then fix. With Claude, Fung found it "so much more fun and pleasurable… it took the tax out of test-driven development." The economics flipped: when writing the test is nearly free, the discipline that grounds verification (a test that provably fails, then passes) is pure upside. (Cf. the tdd / red-green-refactor discipline; the failing-test-first step is the verifier.)

Shift left#

Her recurring phrase: shift left — catch problems closer to the source via automation, not after a customer hits them. "What's better than me running into the bug first? Having automation in place to catch it closer to the source." As throughput rises, the only way verification keeps up is by being automated and early rather than manual and late.

Who reviews — and the human-in-the-loop line#

Before shipping Claude Code's own code-review feature, "how do you keep up with code reviews?" was her most-asked question. The answer: Claude Code review handles style, lint, obvious bugs, and spec-drift (if you check the spec into the codebase, "Claude is very good about verifying against spec drift"). But humans stay in the loop where it matters: legal review, risk tolerance, trust boundaries — "trust but verify, and where humans bring needed expertise." The division of labor: automate the mechanical verification, reserve human judgment for risk and trust-boundary calls. (Cf. Deep Modules for Agents: reviewer in a fresh context.)

Measuring the shift (and a trap)#

Signals she watches: onboarding ramp-up time ↓, PR cycle time ↓, Claude-assisted commits ↑ ("I haven't seen a commit that wasn't Claude-assisted in months"). The trap: don't read end-to-end PR cycle time alone — break it into funnel chunks. If cycle time isn't dropping, it may not be low AI adoption; it could be CI/build systems jamming under the new throughput. And throughput isn't the goal — "find some way to measure whatever you're actually trying to solve," not just velocity.

Connections#

Derived#

Open Questions#

  • Fung's own open question: "How far do you push fully automated reviews?" — where's the speed/safety balance, and how do you keep humans confident without re-introducing the review bottleneck?
  • If CI/build is the hidden jam, does verification infrastructure (test runners, CI capacity) become the actual capex of an AI-native org?

Sources#

§ end
About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 26
  • Agentic Honesty & Diligence

    As models get more capable, failing to surface decision-relevant information shifts from a capability failure to an ali…

  • AI Accelerating AI Development

    The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…

  • AI Brain Fry

    Kropp et al. 2026/03: mental fatigue from excessive AI oversight increases minor errors +11%, major errors +39%; cognit…

  • AI-Driven Formal Proof Search

    LLM generates Lean, compiler verifies every step → eliminates hallucination; DeepMind resolves 9/353 Erdős + 44/492 OEI…

  • AI-Native Product Org Bottlenecks

    AI-native product-org bottleneck is accountable taste at speed: dogfooding trains taste, evals encode it, and accountab…

  • Anthropic

    AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…

  • Building Is Cheap, Arguing Is Expensive

    "In technical debate, code wins": generate three PRs vs whiteboard; prototype over design doc; reduce design docs

  • Cat Wu

    Head of Product for Claude Code and Cowork at Anthropic; primary articulator of AI-native product cadence and engineer-…

  • Claude Code

    Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…

  • Claude Code Auto Mode

    Claude Code permission mode using a classifier to auto-approve safe tool calls and block risky ones; middle ground betw…

  • Code as Source of Truth

    Docs go stale at high coding throughput; check specs/skills into the repo; onboard via Claude; spec-drift verification

  • Deep Modules for Agents

    Ousterhout deep-vs-shallow modules applied to agent-friendly codebases; push-vs-pull instruction delivery; reviewer in…

  • Dogfooding as Product Discipline

    Product sense is built by relentless first-hand use ("ant food"); Mr. Peanut catch; cross-source (Cat Wu vibe-checks, G…

  • Evals as Product Spec

    Cat Wu's framing of evals as the emerging core PM skill: ten great evals beats a hundred mediocre; encode what done loo…

  • Fiona Fung

    Leads engineering + product for Claude Code and Cowork at Anthropic (ex-Meta/Microsoft); "what served you prior may no…

  • Harness Shrinkage as Models Improve

    Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…

  • Human-in-the-Loop Boundaries

    Humans belong at allocation, understanding, design-concept, risk, and accountability boundaries; they slow the system d…

  • AI Engineering & Agent Tooling

    Map of Content for the ai-engineering domain — 36 concepts. Curated entry point; see Home for all domains.

  • Open Questions Backlog

    _96 pages with open questions, as of 2026-06-14._

  • The PRD-Replacement Spectrum at AI-Native Speed

    Four positions (grill-then-PRD → lighter-PRD → build-to-decide → prototype-is-spec) are one spectrum once you decompose…

  • Product Velocity as Moat

    Shipping speed as differentiator + trust signal ("you'll scale with us"); a treadmill that must convert into durable lo…

  • Recursive Self-Improvement

    An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…

  • Research Taste as the Human Bottleneck

    The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an a…

  • The Verifiability Thesis

    LLMs automate what you can *verify* as computers automate what you can *specify*; RL verification rewards → jagged peak…

  • When Does Verification Quality Determine Whether AI Automation Works?

    Verification-quality ladder from Lean/formal proof search through software CI and vulnerability reproduction; autonomy…

  • Vibe Coding vs. Agentic Engineering

    Vibe coding raises the floor (anyone builds); agentic engineering preserves the quality bar while going faster; ">10x a…

Related articles
  • Harness Shrinkage as Models Improve

    Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…

  • Claude Code

    Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…

  • Fiona Fung

    Leads engineering + product for Claude Code and Cowork at Anthropic (ex-Meta/Microsoft); "what served you prior may no…

  • Agent Loop Pattern

    `/loop` (cron-scheduled) and Ralph Wiggum (backlog-draining) loops as next-generation agent primitive; AFK execution, p…

  • Evals as Product Spec

    Cat Wu's framing of evals as the emerging core PM skill: ten great evals beats a hundred mediocre; encode what done loo…