Howardism

Context Window Smart Zone

Published May 6, 2026 · Filed: Concept · Domain: AI Engineering · Tags: LLM Architecture, Agent Engineering, Context Management · Reading: 5 min · Source: AI-synthesised

Smart zone vs dumb zone (Dex Horthy / Matt Pocock): quadratic attention scaling, a ~100K marker independent of advertised context size; clear-and-restart > compaction; status-line token counting as essential discipline



Summary

LLMs do not degrade linearly as context grows. The argument is that they degrade quadratically: self-attention relates every token to every other token, so the number of pairwise relationships scales as O(n²) with token count. Matt Pocock (citing Dex Horthy of HumanLayer) frames this as a smart zone / dumb zone split: the first ~100K tokens of any session are the smart zone, where the model performs well; beyond that the model gets "dumber and dumber" regardless of advertised window size. The practical implication is that context budget is a real, hard resource, and the agent harness is responsible for keeping individual sessions within the smart zone.

The constraint

"Every time you add a token to an LLM, it's kind of like you're adding a team to a football league. The number of matches goes up quadratically."

"It doesn't matter whether you're using 1 million context window or 200K, it's always going to be about [100K]. It starts to just get dumber."

Matt Pocock

The 1M-token context windows shipping in 2026 don't move the smart zone; in Pocock's words, they "just shipped a lot more dumb zone." Long context is useful for retrieval (finding a fact in five copies of War and Peace) but not for reasoning (writing code that depends on all of it).
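The football-league analogy is plain counting. A minimal sketch of that arithmetic, assuming a single round-robin league (each pair of teams meets once) and a full self-attention map with one query-key score per token pair, per head and per layer. It shows only how the pair count grows; it does not model how quality falls off or why the marker sits near 100K.

```python
def league_matches(teams: int) -> int:
    """Matches in a single round-robin league: every pair of teams meets once."""
    return teams * (teams - 1) // 2


def attention_scores(tokens: int) -> int:
    """Query-key scores in one full self-attention map (every token scored against every token)."""
    return tokens * tokens


if __name__ == "__main__":
    # Doubling the league roughly quadruples the fixtures.
    print(league_matches(10), league_matches(20))  # 45 190

    # A 10x larger context means 100x more pairwise scores per attention map.
    smart_zone, advertised = 100_000, 1_000_000
    print(attention_scores(advertised) // attention_scores(smart_zone))  # 100
```

The same 10x jump in window size that looks like "more room" is, in pair count, a 100x larger problem.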

Memento metaphor

Like the protagonist of the film Memento, who cannot form new memories and relies on notes he left himself, each session is a fresh start. There is no memory across sessions; the model resets to the system prompt every time. This is a constraint but also a feature: clearing context restores smart-zone behavior cheaply. Persistent state must live somewhere the next session can read it (the repo, the filesystem, an index-style catalog).
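Because nothing survives a cleared session except what was written down, the hand-off can be made explicit. A minimal sketch, assuming a hypothetical `HANDOFF.md` notes file and a plain markdown checklist format (neither is a Claude Code convention): the ending session records what was done and what comes next, and the next session starts from those notes instead of the old transcript.

```python
import tempfile
from pathlib import Path


def write_handoff(path: Path, done: list[str], next_steps: list[str]) -> None:
    """Persist session state as a small markdown checklist (hypothetical format)."""
    lines = ["# Handoff", "", "## Done"]
    lines += [f"- [x] {item}" for item in done]
    lines += ["", "## Next"]
    lines += [f"- [ ] {item}" for item in next_steps]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_next_steps(path: Path) -> list[str]:
    """Return the unchecked items a fresh session should pick up ([] if no notes exist)."""
    if not path.exists():
        return []
    prefix = "- [ ] "
    return [
        line[len(prefix):]
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.startswith(prefix)
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:  # stand-in for the repository root
        notes = Path(repo) / "HANDOFF.md"  # hypothetical file name
        # End of session 1: record progress before clearing.
        write_handoff(notes, done=["Add parser tests"], next_steps=["Fix date parsing", "Update docs"])
        # Start of session 2: the fresh context begins from the notes.
        print(read_next_steps(notes))  # ['Fix date parsing', 'Update docs']
```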

Compaction is worse than clearing

Claude Code's /compact command summarizes the running session into a smaller history. Pocock prefers /clear:

  • Compacted history accumulates "sediment" — distortions and lossy summaries — that degrades subsequent work
  • Clear-and-restart returns to a known-clean baseline (the system prompt)
  • The cost of clearing (re-establishing whatever context the next task needs) is paid back by working in the smart zone

This is not a consensus view: many developers like compaction because it preserves continuity. The right call depends on whether your task can be resumed cleanly from a written record (then prefer clearing) or needs in-flight conversational context (then compaction wins).
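The rule of thumb above can be written as a small decision function. A minimal sketch: the 100K threshold is Pocock's marker, `resumable_from_record` stands for "a written record is enough to restart", and the function name and return strings are hypothetical, not Claude Code behavior.

```python
SMART_ZONE_TOKENS = 100_000  # Pocock's approximate marker, independent of advertised window size


def next_action(tokens_used: int, resumable_from_record: bool,
                smart_zone: int = SMART_ZONE_TOKENS) -> str:
    """Suggest 'continue', 'clear', or 'compact' for the current session (illustrative only)."""
    if tokens_used < smart_zone:
        return "continue"  # still inside the smart zone
    if resumable_from_record:
        return "clear"  # restart from the system prompt plus the written record
    return "compact"  # clearing would lose in-flight conversational context


if __name__ == "__main__":
    print(next_action(60_000, resumable_from_record=True))    # continue
    print(next_action(120_000, resumable_from_record=True))   # clear
    print(next_action(120_000, resumable_from_record=False))  # compact
```

In practice you would act somewhat before the threshold rather than after crossing it; the comparison is kept strict here only to keep the sketch short.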

Implications for harness design

  1. System prompt budget. Anything always-in-context comes off the smart-zone budget. "I have seen people put 250K tokens [in the system prompt], then you're just going into the dumb zone before you can even do anything." Keep CLAUDE.md / AGENTS.md as a table of contents, not an encyclopedia (see Agent Harness Engineering on AGENTS.md as ToC).
  2. Sub-agents preserve parent context. A sub-agent runs in its own context window; only its summary returns. Pocock's grill-me skill ran a 93.7K-token sub-agent yet his main session still had ~25K tokens unused.
  3. Fragment work into many sessions. Loops (see Agent Loop Pattern) and vertical slices (see Vertical Slice Tracer Bullets) work because each iteration starts fresh in the smart zone.
  4. Reviewer should run in fresh context. If the implementer has already used 80K tokens, asking the same session to review its own work pushes the reviewer past the ~100K marker into the dumb zone. Cleared context = smart-zone reviewer (see Deep Modules for Agents on push-vs-pull and reviewer placement).
  5. Push vs pull instructions. Always-in-context instructions cost smart-zone tokens; pull-on-demand instructions (skills) cost little or nothing until invoked.
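The points above share one piece of arithmetic: everything the parent session holds is charged against the same ~100K budget. A minimal sketch with a hypothetical `ContextBudget` class; the 250K system prompt, the 93.7K sub-agent, and the 80K implementer come from the list, while the 5K AGENTS.md and the 2K summary are made-up sizes. Always-on instructions are charged up front, a skill only when invoked, and a sub-agent only for the summary it returns.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Tracks how much of the smart zone one session has spent (illustrative model only)."""
    smart_zone: int = 100_000
    used: int = 0

    def add_always_on(self, tokens: int) -> None:
        # System prompt, CLAUDE.md / AGENTS.md: paid before any work starts.
        self.used += tokens

    def invoke_skill(self, tokens: int) -> None:
        # Pull-on-demand instructions: paid only when they are loaded.
        self.used += tokens

    def run_subagent(self, subagent_tokens: int, summary_tokens: int) -> None:
        # The sub-agent spends subagent_tokens in its own window;
        # only the summary it returns is added to the parent's context.
        if summary_tokens > subagent_tokens:
            raise ValueError("in this model a summary is never longer than the work it summarizes")
        self.used += summary_tokens

    @property
    def remaining(self) -> int:
        # A negative value means the session is already in the dumb zone.
        return self.smart_zone - self.used


if __name__ == "__main__":
    bloated = ContextBudget()
    bloated.add_always_on(250_000)
    print(bloated.remaining)  # -150000: dumb zone before doing anything

    lean = ContextBudget()
    lean.add_always_on(5_000)  # table-of-contents style AGENTS.md (made-up size)
    lean.run_subagent(subagent_tokens=93_700, summary_tokens=2_000)  # summary size is made up
    print(lean.remaining)  # 93000

    print(ContextBudget(used=80_000).remaining)  # 20000 left if the implementer also reviews
    print(ContextBudget().remaining)  # 100000 for a reviewer in a cleared session
```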

Status-line token counter as an essential tool

Pocock recommends a status-line widget that shows each session's exact running token count; without it, developers cannot tell when they are approaching the dumb zone. He treats this as "absolutely essential information."
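What such a widget displays can be sketched independently of any particular tool. A minimal sketch, assuming the harness already exposes the session's running token count (how to obtain it is tool-specific and not shown) and using a hypothetical `format_status` helper: it renders the exact count, the share of the ~100K smart zone spent, and which zone the session is in.

```python
def format_status(tokens_used: int, smart_zone: int = 100_000) -> str:
    """Render a one-line status such as 'ctx 42,000/100,000 (42%) smart'."""
    percent = 100 * tokens_used // smart_zone  # floor, so 99.9% never displays as 100%
    zone = "smart" if tokens_used < smart_zone else "DUMB"
    return f"ctx {tokens_used:,}/{smart_zone:,} ({percent}%) {zone}"


if __name__ == "__main__":
    for used in (42_000, 97_500, 130_000):
        print(format_status(used))
```

Showing both the raw count and a percentage is one answer to the open question about how harnesses should surface the remaining budget; a richer signal could be layered on top.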

Connections

  • Matt Pocock — popularizer of the smart-zone framing
  • Agent Harness Engineering — system-prompt minimalism and AGENTS.md-as-ToC are restatements of the smart-zone principle
  • Agent Loop Pattern — fragmenting work to stay in smart zone is why loops are powerful
  • Vertical Slice Tracer Bullets — keeping each task small enough to fit in smart zone
  • Design Concept Grilling — the grilling session uses a sub-agent so the parent context stays small
  • Deep Modules for Agents — clearing-before-review is a smart-zone discipline
  • Harness Shrinkage as Models Improve — the smart zone may grow ("the dumb zone has become less dumb lately") but quadratic attention still constrains it
  • AI Brain Fry — human-side analog of the smart zone: oversight has its own degradation curve past capacity, mirroring attention degradation past ~100K tokens
  • Interaction Models — continuous audio/video at 200ms granularity accumulates context fast; TML names long-session context management as an open problem — the same constraint in a new modality
  • HTML as the New Markdown — the human-attention analog: a reader degrades past some volume of undifferentiated markdown the way a model degrades past ~100K tokens; HTML raises the human's effective smart zone by spending tokens on legibility
  • Agentic Technical Debt — founders' persistent-context discipline (CLAUDE.md) competes with smart-zone budget; over-long context files become their own problem

Open questions

  • Does the smart-zone marker scale with model size, or is it bounded by attention architecture? Pocock observes "the dumb zone has become less dumb lately" but pegs it at 100K through 2026.
  • When sparse-attention or memory-augmented architectures ship, does the smart zone become a soft constraint?
  • How should harnesses surface remaining smart-zone budget to the user — token count, percentage, or a richer signal?


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 24
Related articles
  • Harness Shrinkage as Models Improve

    Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…

  • Design Concept Grilling

    Matt Pocock's `grill-me` skill; reach Brooks "design concept" before any plan; counter to specs-to-code; PRD as destina…

  • Agent Loop Pattern

    `/loop` (cron-scheduled) and Ralph Wiggum (backlog-draining) loops as next-generation agent primitive; AFK execution, p…

  • Claude Code Best Practices

    Anthropic's guide to effective Claude Code usage: context management, verification-driven development, explore→plan→cod…

  • Agent Harness Engineering

    Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical archit…