Summary#
Rich Sutton's 2019 essay "The Bitter Lesson" argues that general methods that leverage computation (search and learning) ultimately outperform methods that build in human knowledge and hand-engineered structure, and that they do so by a wide margin as compute grows. The "bitter" part: this keeps surprising researchers who invested in clever domain structure, because the structure becomes a ceiling, not a foundation.
This page exists because the principle recurs as a load-bearing argument across this wiki — invoked explicitly to justify dissolving harnesses into models.
Where it's invoked here#
- Interaction Models — TML cites "the bitter lesson" directly: hand-crafted interactivity systems (VAD, turn-detection, dialog-management harnesses) "will be outpaced by the advance of general capabilities," therefore "for interactivity to scale with intelligence, it must be part of the model itself." See Turn-Based Interface Bottleneck.
- Encoder-Free Early Fusion — co-train all modality components from scratch in one transformer rather than stitching pretrained encoders/decoders: fewer hand-engineered modular boundaries.
- Time-Aligned Micro-Turns — remove artificial turn boundaries so interaction modes become scalable model behavior rather than per-mode harness code.
- Harness Shrinkage as Models Improve — the same logic applied to coding-agent harnesses: prompt scaffolding compensates for what the model can't yet do, and should shrink as models improve. (Caveat there: mechanical verification — tests, types, linters — is the part that doesn't migrate inward.)
- Agent Harness Engineering — "enforce invariants, not implementations": let the model find the path; the harness only encodes what must be true.
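The last two items above ("enforce invariants, not implementations" and the mechanical-verification caveat) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any of the cited pages: the `Invariant` type, the `run_harness` loop, the three example checks, and `stub_model`, which stands in for a real model call. The point is only the shape: the harness states what must be true of the output and feeds back which checks failed, and it never prescribes how the candidate is produced.

```python
import ast
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Invariant:
    """A named predicate over a candidate output (here: Python source text)."""
    name: str
    check: Callable[[str], bool]


def parses(src: str) -> bool:
    # Mechanical check: the candidate must be syntactically valid Python.
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False


def defines_add(src: str) -> bool:
    # The candidate must define a top-level function named `add`.
    if not parses(src):
        return False
    return any(isinstance(node, ast.FunctionDef) and node.name == "add"
               for node in ast.parse(src).body)


def add_behaves(src: str) -> bool:
    # Behavioural check, playing the role of a test: add(2, 3) must equal 5.
    if not defines_add(src):
        return False
    namespace: dict = {}
    try:
        exec(src, namespace)
        return namespace["add"](2, 3) == 5
    except Exception:
        return False


# Hypothetical invariants: they say what must hold, not how to get there.
INVARIANTS = [
    Invariant("parses", parses),
    Invariant("defines add", defines_add),
    Invariant("add(2, 3) == 5", add_behaves),
]


def violated(candidate: str, invariants: List[Invariant]) -> List[str]:
    """Return the names of the invariants the candidate fails, in order."""
    return [inv.name for inv in invariants if not inv.check(candidate)]


def run_harness(propose: Callable[[List[str]], str],
                invariants: List[Invariant],
                max_attempts: int = 3) -> Optional[str]:
    """Request candidates until one satisfies every invariant, or give up."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        candidate = propose(feedback)  # the model chooses its own path
        feedback = violated(candidate, invariants)
        if not feedback:
            return candidate
    return None


def stub_model(feedback: List[str]) -> str:
    # Stand-in for a model call: the first attempt is wrong, the retry is right.
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


result = run_harness(stub_model, INVARIANTS)
```

In this sketch the only harness code that would need to survive a stronger model is `INVARIANTS`; the retry loop could shrink toward a single call. Running `exec` on model output is acceptable only in a toy like this one. A real harness would need a sandbox, which is one of the security boundaries that stays outside the model.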
The standard caveat#
The bitter lesson is about capabilities and structure migrating into the model, not a claim that "harnesses are useless." Things that legitimately stay outside the model: mechanical verification (the synthesis reached in Harness Shrinkage as Models Improve), organization-specific policy and style, security boundaries, and, per Claude Character as Product, deliberate character/personality work. The open question for every harness component is which side of that line it falls on.
Connections#
- Evolutionary Proof Search — the bespoke evolutionary apparatus is exactly what the bitter lesson predicts gets absorbed
- Interaction Models — the most explicit recent invocation
- Turn-Based Interface Bottleneck — "the less-intelligent harness loses to scaling"
- Harness Shrinkage as Models Improve — the coding-agent version, with the mechanical-verification caveat
- Agent Harness Engineering — invariants-not-implementations as a bitter-lesson-aware design rule
- Encoder-Free Early Fusion / Time-Aligned Micro-Turns — architectural choices justified by it
- Claude Character as Product — a candidate counterexample: character may not migrate inward
- Model Spec Midtraining (MSM) — alignment moving from harness-prompt-injection to model-internalized values is a bitter-lesson move on the alignment axis
- Compute Allocator — names what stays on the human side of the line: the allocation decision and the human-facing scaffolding that supports it don't migrate inward, even as model-facing structure does
- HTML as the New Markdown — "leave room for the model to surprise you" is the prompt-level form of the lesson; the caveat is that human-facing legibility (HTML artifacts) is on the side that does not dissolve into the model
- MCP and Computer Use — Boris Cherny's "to the model, it's just tokens" makes the substrate choice (MCP/API/computer use) a model decision, not a harness decision; bitter-lesson endpoint for tool dispatch
- Agentic Loops Overtake Bespoke Systems — the clearest empirical confirmation in the corpus: DeepMind's simple agentic loop matched its bespoke trained system (AlphaProof + evolutionary search) on open math problems as the LLM improved
- AI R&D Autonomy Evaluation (AECI) — if the bitter lesson runs all the way, scaled general methods eventually improve themselves; AECI is how Anthropic measures whether that threshold is near
- Recursive Self-Improvement — the furthest extrapolation of the principle: "research progress is mostly a function of tools and resources," so perspiration (the 99%) becomes automatable
- AI Accelerating AI Development — the empirical instance: the kernel-optimization loop going 3×→52× is scaled general method beating hand-tuning, measured
- Research Taste as the Human Bottleneck — the open bet on the last holdout: is research taste a true ceiling, or just the next structure the bitter lesson dissolves?
- Build for the Next Model — the product-strategy corollary: since capability migrates inward over releases, prototype "the thing that almost works" and let the next model dissolve the gap rather than engineering around it
- Task Time-Horizon Scaling — rising general-benchmark capability is the curve that keeps shrinking hand-built scaffolding's advantage
- The Verifiability Thesis — Karpathy's account of why scaled RL outruns hand-engineering: labs throw compute at verifiable-reward environments
- Software 3.0 — the neural-net-as-host-process extrapolation is the bitter lesson pushed all the way to the hardware layer
- Andrej Karpathy — frequent invoker of the principle (verifiability, ghosts, Software 3.0 all rest on it)
Sources#
- Interaction Models: A Scalable Approach to Human-AI Collaboration (explicit citation of "the bitter lesson")
Cited by 27
- Agent Harness Engineering
Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical archit…
- Agentic Loops Overtake Bespoke Systems
DeepMind's *basic* Ralph-loop agent matched its bespoke evolutionary+AlphaProof system as the LLM improved; the bitter…
- AI Accelerating AI Development
The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…
- AI R&D Autonomy Evaluation (AECI)
How Anthropic measures whether a model can automate or dramatically accelerate AI research — the capability that drives…
- Opinions on Using AI Tools & the Future of the Software Engineering Role
Debate map of four stances on using AI tools (bullish-insider / pragmatist-practitioner / skeptic-governance / architec…
- Andrej Karpathy
Co-founder OpenAI, ex-Tesla AI, Eureka Labs; coined "vibe coding," Software 1/2/3.0, "ghosts not animals," "agentic eng…
- Build for the Next Model
Prototype the thing that almost works, not the thing that already works: bet that the next concrete model release (not…
- Claude Character as Product
Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the…
- Compute Allocator
The human's evolving role: deciding what's worth spending compute on; ~1% of generated tokens ship, 99% is scaffolding…
- Encoder-Free Early Fusion
Multimodal design with minimal pre-processing instead of large standalone encoders: dMel audio embedding, 40×40-patch h…
- Evolutionary Proof Search
The full-featured agent's mechanism: population DB of proof sketches, Elo via Plackett–Luce/Gibbs, P-UCB selection, LLM…
- The Future of Agent Interfaces
Interface future is layered: native interaction models for human collaboration, MCP/APIs for structured action, app pro…
- Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
- HTML as the New Markdown
Thariq Shihipar's thesis: as models improve, thousand-line markdown plans overwhelm the *human*; HTML artifacts (visual…
- Does the Human-Facing Harness (HTML Artifacts) Hit Its Own Bloat Ceiling?
Yes — HTML raises and reshapes the human-attention ceiling but can't remove it; bloat relocates from document-length to…
- Interaction Models
Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…
- MCP and Computer Use
Anthropic's two complementary connector mechanisms: MCP for structured programmatic access (Salesforce/Drive/Gmail/Slac…
- LLM Architecture, Training & Alignment
Map of Content for the llm-architecture domain — 19 concepts. Curated entry point; see Home for all domains.
- Model Spec Midtraining (MSM)
New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT…
- Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
- Research Taste as the Human Bottleneck
The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an a…
- Software 3.0
Karpathy's taxonomy: 1.0 code, 2.0 weights, 3.0 prompting; LLM as programmable interpreter; MenuGen "shouldn't exist";…
- Task Time-Horizon Scaling
METR's measure of the task length AI can complete reliably on its own, doubling roughly every 4 months (up from every 7…
- Thinking Machines Lab
AI research lab behind interaction models (May 2026); harness-dissolves-into-model thesis; upstreamed streaming-session…
- Time-Aligned Micro-Turns
The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; s…
- Turn-Based Interface Bottleneck
Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out b…
- The Verifiability Thesis
LLMs automate what you can *verify* as computers automate what you can *specify*; RL verification rewards → jagged peak…
