Howardism
Plate II · AI Engineering

Verification as the New Bottleneck

Published May 23, 2026 · Filed Concept · Domain AI Engineering · Tags Agent Engineering, AI Coding Workflow, AI Native Org · Reading 5 min · Source AI-synthesised

Fiona Fung: writing code is no longer the bottleneck (verification, review, and maintenance are); shift-left; TDD loses its tax; funnel analysis of PR cycle time

[Illustration: Verification as the New Bottleneck]

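Sources

Summary

Fiona Fung's core argument, drawn from leading the Claude Code + Cowork engineering team: for years, engineering bandwidth was the expensive resource, and planning, review, and all kinds of process existed to protect it. Once agentic coding made writing code cheap, the bottleneck moved to verification, review, and maintenance. "On the Claude Code team, writing code genuinely isn't the slow part anymore." The new scarce resource is confidence that a change is correct, and it has become scarcer precisely because bandwidth (and with it, throughput) has exploded.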
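The bottleneck shift can be illustrated with a toy model, assuming a strictly serial pipeline whose end-to-end throughput is capped by its slowest stage (a theory-of-constraints simplification that is ours, not Fung's). The stage names and weekly capacities below are hypothetical numbers, not measurements.

```python
"""Toy model: a serial pipeline's throughput is capped by its slowest stage.

Stage names and capacities (changes per week) are hypothetical numbers
chosen only to illustrate the argument; they are not measured data.
"""


def bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the (stage, capacity) pair that limits end-to-end throughput."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


# Before agentic coding: writing code is the scarcest stage.
before = {"write": 10.0, "verify": 15.0, "review": 20.0, "maintain": 30.0}

# After: writing capacity jumps tenfold; downstream stages are unchanged.
after = dict(before, write=100.0)

print(bottleneck(before))  # ('write', 10.0)
print(bottleneck(after))   # ('verify', 15.0)
```

In this toy model a tenfold increase in writing capacity lifts end-to-end throughput only from 10 to 15 changes per week, because verification now sets the ceiling.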

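Why verification is now the constraint

Three forces converge:

  • Volume. Bandwidth has grown so much that "we have to pay a lot more attention to whether it's correct."
  • Blurred role boundaries. More people (designers, managers, PMs) now check in changes, so everyone needs confidence that their own changes are correct.
  • Maintenance cost. Higher throughput means more to maintain; maintenance cost becomes a first-class concern rather than an afterthought.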

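This is the organization-level counterpart of Karpathy's The Verifiability Thesis ("LLMs automate what you can verify"), and the demand side of Harness Shrinkage as Models Improve (prompt scaffolding shrinks; mechanical verification remains load-bearing).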

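TDD loses its tax

One vivid symptom of the shift: TDD used to feel like "eating your broccoli" (write a failing test, confirm that it fails, then fix it). With Claude, Fung finds it "a lot more fun and enjoyable... it takes the tax out of test-driven development." The economics have flipped: when writing tests is nearly free, the discipline that gives verification something to stand on (a test that can be shown to fail first and then pass) is pure upside. (Compare the tdd / red-green-refactor discipline: the write-a-failing-test-first step is the verifier.)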

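Shift left

Her recurring mantra is shift left: use automation to catch problems closer to the source instead of waiting for a customer to hit them. "What's better than me hitting the bug first? Automation that catches it even closer to the source." As throughput rises, the only way verification can keep up is to be automated and early rather than manual and late.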
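One way to picture shift-left is a fail-fast pipeline that runs the cheapest, earliest checks first and stops at the first failure. The sketch below approximates "closer to the source" as "cheaper and earlier in the pipeline"; the stage names, costs, and toy check rules are all hypothetical.

```python
"""Sketch of a shift-left check pipeline: cheap, early checks run first and
the run stops at the first failure, so problems surface near the source.

Stage names, costs, and the check rules are hypothetical placeholders.
"""
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    cost_minutes: float           # assumed cost of running this stage
    check: Callable[[str], bool]  # returns True if the change passes


def run_shift_left(change: str, stages: list[Stage]) -> tuple[bool, str, float]:
    """Run stages from cheapest to most expensive; stop at the first failure.

    Returns (passed, stage_that_decided, minutes_spent).
    """
    spent = 0.0
    for stage in sorted(stages, key=lambda s: s.cost_minutes):
        spent += stage.cost_minutes
        if not stage.check(change):
            return False, stage.name, spent
    return True, "all", spent


stages = [
    Stage("integration tests", 20.0, lambda c: True),
    Stage("lint", 0.5, lambda c: "\t" not in c),         # toy rule: no tab characters
    Stage("unit tests", 3.0, lambda c: "bug" not in c),  # toy stand-in for a real test run
]

print(run_shift_left("def f():\n    return 1\n", stages))    # (True, 'all', 23.5)
print(run_shift_left("def f():\n    return bug\n", stages))  # (False, 'unit tests', 3.5)
```

The failing change is rejected after 3.5 minutes by the unit-test stage and never consumes the 20-minute integration run, which is the kind of saving shift-left aims for as throughput grows.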

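Who reviews, and where the human-in-the-loop boundary lies

Before Claude Code shipped its own code-review feature, "How do you keep up with code review?" was the question she was asked most often. The answer: Claude Code review handles style, lint, obvious bugs, and spec drift (if you check the spec into the codebase, "Claude is really good at verifying against spec drift"). But where it matters, humans stay in the loop: legal review, risk tolerance, trust boundaries. "Trust but verify, and hand it to humans where they bring the necessary expertise." The division of labor: automate mechanical verification, and reserve human judgment for decisions at risk and trust boundaries. (See Deep Modules for Agents: a reviewer in a fresh context.)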
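The division of labor (automate mechanical checks, escalate risk and trust decisions to humans) can be sketched as a simple routing rule. The path prefixes, their mapping to legal review, risk tolerance, and trust boundaries, and the helper names are hypothetical; this is not how Claude Code review is implemented.

```python
"""Sketch: every change gets automated review; changes that cross a risk or
trust boundary are also escalated to a human.

The boundary prefixes and the routing rule are hypothetical; a real team
would define its own boundaries.
"""
from dataclasses import dataclass, field

# Hypothetical mapping from path prefix to the human expertise it requires.
HUMAN_BOUNDARIES = {
    "legal/": "legal review",
    "billing/": "risk tolerance",
    "auth/": "trust boundary",
}

# Mechanical checks delegated to automated review (from the text above).
AUTOMATED_CHECKS = ["style", "lint", "obvious bugs", "spec drift"]


@dataclass
class ReviewPlan:
    automated: list[str]
    human: list[str] = field(default_factory=list)


def plan_review(changed_paths: list[str]) -> ReviewPlan:
    """Collect the distinct human-review reasons triggered by the changed paths."""
    human = sorted({
        reason
        for path in changed_paths
        for prefix, reason in HUMAN_BOUNDARIES.items()
        if path.startswith(prefix)
    })
    return ReviewPlan(automated=list(AUTOMATED_CHECKS), human=human)


print(plan_review(["ui/button.tsx"]).human)                      # []
print(plan_review(["auth/session.ts", "legal/terms.md"]).human)  # ['legal review', 'trust boundary']
```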

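Measuring the shift (and a trap)

The signals she watches: onboarding ramp-up time ↓, PR cycle time ↓, Claude-assisted commits ↑ ("I haven't seen a commit that wasn't Claude-assisted in months"). The trap: don't look only at end-to-end PR cycle time; break it into funnel stages. If cycle time isn't falling, the cause isn't necessarily low AI adoption; it may be that **the CI/build system is choking under the new throughput.** And throughput itself isn't the goal: "find a way to measure what you actually want to solve," not just speed.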
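Breaking PR cycle time into funnel stages can be sketched as computing per-stage durations from milestone timestamps and taking the median of each stage. The milestone names and sample PRs below are hypothetical; real data would come from your code host and CI system.

```python
"""Sketch: split end-to-end PR cycle time into funnel stages so a slow stage
(for example CI/build) is visible instead of hidden in one total.

Milestone names, timestamps, and sample PRs are hypothetical.
"""
from statistics import median

# Ordered funnel milestones; each PR records hours since it was opened.
MILESTONES = ["opened", "first_review", "approved", "ci_green", "merged"]


def stage_durations(pr: dict[str, float]) -> dict[str, float]:
    """Hours spent between each consecutive pair of milestones."""
    return {
        f"{a} -> {b}": pr[b] - pr[a]
        for a, b in zip(MILESTONES, MILESTONES[1:])
    }


def funnel_medians(prs: list[dict[str, float]]) -> dict[str, float]:
    """Median hours per stage across all PRs."""
    per_pr = [stage_durations(pr) for pr in prs]
    return {stage: median(d[stage] for d in per_pr) for stage in per_pr[0]}


prs = [
    {"opened": 0.0, "first_review": 1.0, "approved": 2.0, "ci_green": 9.0, "merged": 9.5},
    {"opened": 0.0, "first_review": 0.5, "approved": 1.5, "ci_green": 7.5, "merged": 8.0},
    {"opened": 0.0, "first_review": 2.0, "approved": 3.0, "ci_green": 11.0, "merged": 11.5},
]

medians = funnel_medians(prs)
slowest = max(medians, key=medians.get)
print(slowest, medians[slowest])  # approved -> ci_green 7.0
```

Here the end-to-end median is 9.5 hours, but most of it sits between approval and a green CI run, which in this made-up data points at CI/build rather than AI adoption.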

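Related links

Derived content

Open questions

  • Fung's own open question: "How far do we push fully automated review?" Where is the right balance between speed and safety, and how do humans keep their confidence without reintroducing the review bottleneck?
  • If CI/build is the hidden choke point, does verification infrastructure (test runners, CI capacity) become an AI-native org's real capital expenditure?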
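The CI question can be framed as back-of-the-envelope capacity arithmetic, assuming CI runs arrive at a steady average rate spread over a 24-hour day (which understates peak load) and each run occupies one runner for a fixed time. Utilization is runs per hour times minutes per run, divided by the runner-minutes available per hour; at or above 1.0 the pool cannot keep up and the queue grows. All numbers and the 0.7 headroom target are hypothetical.

```python
"""Back-of-the-envelope CI capacity check; all numbers are hypothetical.

Utilization = (runs per hour * minutes per run) / (runners * 60).
At or above 1.0, runs arrive faster than the pool can finish them.
"""
import math


def ci_utilization(prs_per_day: float, runs_per_pr: float,
                   minutes_per_run: float, runners: int,
                   hours_per_day: float = 24.0) -> float:
    """Fraction of available runner time that CI demand would occupy."""
    runs_per_hour = prs_per_day * runs_per_pr / hours_per_day
    return runs_per_hour * minutes_per_run / (runners * 60.0)


def runners_needed(prs_per_day: float, runs_per_pr: float,
                   minutes_per_run: float, target_utilization: float = 0.7,
                   hours_per_day: float = 24.0) -> int:
    """Smallest runner count that keeps utilization at or below the target."""
    busy_runners = prs_per_day * runs_per_pr * minutes_per_run / 60.0 / hours_per_day
    return math.ceil(busy_runners / target_utilization)


print(ci_utilization(40, 3, 30, runners=4))   # 0.625 (today, hypothetical)
print(ci_utilization(160, 3, 30, runners=4))  # 2.5 (4x PR volume, same runners)
print(runners_needed(160, 3, 30))             # 15
```

Quadrupling PR volume on the same four runners moves utilization from 0.625 to 2.5 in this example; keeping it at or below the assumed 0.7 target would take 15 runners. That is the kind of cost the open question above is asking about.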


§ end
About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 26
  • Agentic Honesty & Diligence

    As models get more capable, failing to surface decision-relevant information shifts from a capability failure to an ali…

  • AI Accelerating AI Development

    The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…

  • AI Brain Fry

    Kropp et al. 2026/03: mental fatigue from excessive AI oversight increases minor errors +11%, major errors +39%; cognit…

  • AI-Driven Formal Proof Search

    LLM generates Lean, compiler verifies every step → eliminates hallucination; DeepMind resolves 9/353 Erdős + 44/492 OEI…

  • AI-Native Product Org Bottlenecks

    AI-native product-org bottleneck is accountable taste at speed: dogfooding trains taste, evals encode it, and accountab…

  • Anthropic

    AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…

  • Building Is Cheap, Arguing Is Expensive

    "In technical debate, code wins": generate three PRs vs whiteboard; prototype over design doc; reduce design docs

  • Cat Wu

    Head of Product for Claude Code and Cowork at Anthropic; primary articulator of AI-native product cadence and engineer-…

  • Claude Code

    Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…

  • Claude Code Auto Mode

    Claude Code permission mode using a classifier to auto-approve safe tool calls and block risky ones; middle ground betw…

  • Code as Source of Truth

    Docs go stale at high coding throughput; check specs/skills into the repo; onboard via Claude; spec-drift verification

  • Deep Modules for Agents

    Ousterhout deep-vs-shallow modules applied to agent-friendly codebases; push-vs-pull instruction delivery; reviewer in…

  • Dogfooding as Product Discipline

    Product sense is built by relentless first-hand use ("ant food"); Mr. Peanut catch; cross-source (Cat Wu vibe-checks, G…

  • Evals as Product Spec

    Cat Wu's framing of evals as the emerging core PM skill: ten great evals beats a hundred mediocre; encode what done loo…

  • Fiona Fung

    Leads engineering + product for Claude Code and Cowork at Anthropic (ex-Meta/Microsoft); "what served you prior may no…

  • Harness Shrinkage as Models Improve

    Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…

  • Human-in-the-Loop Boundaries

    Humans belong at allocation, understanding, design-concept, risk, and accountability boundaries; they slow the system d…

  • AI Engineering & Agent Tooling

    Map of Content for the ai-engineering domain — 36 concepts. Curated entry point; see Home for all domains.

  • Open Questions Backlog

    _96 pages with open questions, as of 2026-06-14._

  • The PRD-Replacement Spectrum at AI-Native Speed

    Four positions (grill-then-PRD → lighter-PRD → build-to-decide → prototype-is-spec) are one spectrum once you decompose…

  • Product Velocity as Moat

    Shipping speed as differentiator + trust signal ("you'll scale with us"); a treadmill that must convert into durable lo…

  • Recursive Self-Improvement

    An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…

  • Research Taste as the Human Bottleneck

    The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an a…

  • The Verifiability Thesis

    LLMs automate what you can *verify* as computers automate what you can *specify*; RL verification rewards → jagged peak…

  • When Does Verification Quality Determine Whether AI Automation Works?

    Verification-quality ladder from Lean/formal proof search through software CI and vulnerability reproduction; autonomy…

  • Vibe Coding vs. Agentic Engineering

    Vibe coding raises the floor (anyone builds); agentic engineering preserves the quality bar while going faster; ">10x a…

Related articles
  • Harness Shrinkage as Models Improve

    Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…

  • Claude Code

    Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…

  • Fiona Fung

    Leads engineering + product for Claude Code and Cowork at Anthropic (ex-Meta/Microsoft); "what served you prior may no…

  • Agent Loop Pattern

    `/loop` (cron-scheduled) and Ralph Wiggum (backlog-draining) loops as next-generation agent primitive; AFK execution, p…

  • Evals as Product Spec

    Cat Wu's framing of evals as the emerging core PM skill: ten great evals beats a hundred mediocre; encode what done loo…