Summary#
Thinking Machines Lab's framing of why current AI interfaces limit collaboration: the turn-based interface is a bandwidth bottleneck between human and model. It is the problem Interaction Models are built to dissolve.
The two-claim argument#
- AI labs over-optimize for autonomy. Labs treat autonomous capability as the model's most important property; as a result, today's models and interfaces "aren't optimized for humans to remain in the loop." But in most real work users can't fully specify requirements upfront and walk away; good results come from a collaborative loop of clarification and feedback.
- Humans get pushed out by the interface, not the work. "Humans increasingly get pushed out not because the work doesn't need them, but because the interface has no room for them." The fix is to let people collaborate with AI the way they collaborate with other people: messaging, talking, listening, seeing, showing, interjecting, with the model doing the same.
The mechanism: a single thread#
Today's models "experience reality in a single thread":
- Until the user finishes typing/speaking, the model waits with no perception of what the user is doing or how.
- Until the model finishes generating, its perception is frozen — no new information arrives until it finishes or is interrupted.
This narrow channel limits how much of a person's knowledge, intent, and judgement can reach the model, and how much of the model's work is legible to the human. Analogy: "trying to resolve a crucial disagreement over email rather than in person."
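A toy illustration of the single-thread constraint, not anything from Thinking Machines Lab: assume each observation is delivered to the model only when the turn it occurs in ends. The function name `delivery_times`, the turn tuples, and the example observations are all hypothetical, chosen only to make the delay visible.

```python
def delivery_times(turns, observations):
    """Compute when a strictly turn-based model would perceive each observation.

    turns:        list of (speaker, start, end) in seconds, non-overlapping.
    observations: list of (t, label) for things the user does at time t.

    Assumption (illustrative): nothing reaches the model until the turn
    containing the observation has finished, whether that turn belongs to
    the user (still typing/speaking) or to the model (still generating).
    """
    delivered = []
    for t, label in observations:
        for _speaker, start, end in turns:
            if start <= t < end:
                # The model perceives the event at the end of the turn.
                delivered.append((label, t, end, end - t))
                break
    return delivered


# Hypothetical session: the user speaks for 4 s, then the model generates for 6 s.
turns = [("user", 0.0, 4.0), ("model", 4.0, 10.0)]
observations = [
    (1.0, "user hesitates mid-sentence"),
    (6.0, "user frowns at the output"),
]

result = delivery_times(turns, observations)
delays = [delay for (_label, _t, _perceived_at, delay) in result]

for label, t, perceived_at, delay in result:
    print(f"{label}: happened at {t}s, perceived at {perceived_at}s (delay {delay}s)")
```

In this sketch the hesitation at 1 s only reaches the model at 4 s, and the frown at 6 s only at 10 s. The bandwidth bottleneck shows up as those delays: information exists in the room but cannot cross the turn boundary.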
Why harnesses don't fix it#
Existing real-time systems bolt interactivity on with a harness — VAD (voice-activity detection), turn-boundary prediction, dialog state machines — components "meaningfully less intelligent than the model itself." That harness precludes whole interaction modes:
- proactive interjection ("interrupt when I say something wrong")
- reaction to visual cues ("tell me when I've written a bug in my code")
- speak-while-listening ("translate Spanish→English live")
- speak-while-watching ("live-commentate this sports game")
The Bitter Lesson predicts that hand-crafted systems like these get outpaced by general capability growth, so the resolution is to make interactivity model-native (see Time-Aligned Micro-Turns).
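To make the harness critique concrete, here is a minimal sketch of the kind of scaffold described above: an energy-threshold VAD feeding a two-state turn-boundary machine. Everything in it is an assumption for illustration (`is_speech`, `run_harness`, the threshold, the three-frame silence timeout); real VADs and turn predictors are more sophisticated, and this is not the design of any particular product.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"          # waiting for the user to start speaking
    USER_SPEAKING = "user_speaking"  # user has the floor; model is not consulted


def is_speech(frame_energy, threshold=0.5):
    """Toy VAD: a frame counts as speech if its energy crosses a fixed threshold."""
    return frame_energy >= threshold


def run_harness(frame_energies, silence_frames_to_end_turn=3, threshold=0.5):
    """Return the frame indices at which the harness hands control to the model.

    The model is invoked only when the state machine decides the user's turn
    has ended, i.e. after `silence_frames_to_end_turn` consecutive silent frames.
    """
    state = State.LISTENING
    silence_run = 0
    model_calls = []
    for i, energy in enumerate(frame_energies):
        speech = is_speech(energy, threshold)
        if state is State.LISTENING:
            if speech:
                state = State.USER_SPEAKING
                silence_run = 0
        elif state is State.USER_SPEAKING:
            if speech:
                silence_run = 0  # a short pause did not end the turn
            else:
                silence_run += 1
                if silence_run >= silence_frames_to_end_turn:
                    model_calls.append(i)  # turn boundary: model gets to act
                    state = State.LISTENING
                    silence_run = 0
    return model_calls


# Hypothetical audio: speech starts at frame 1, a brief pause at frame 3,
# then sustained silence from frame 6 onward.
frames = [0.0, 0.9, 0.8, 0.1, 0.9, 0.7, 0.0, 0.0, 0.0, 0.0]
calls = run_harness(frames)
print(calls)  # [8]
```

The user starts speaking at frame 1, but the model is first consulted at frame 8. A request like "interrupt when I say something wrong" has no place to run in this loop: the decision about when the model may act is made by the threshold and the counter, not by the model.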
Connections#
- Interaction Models — the proposed resolution
- The Bitter Lesson — why the harness-based status quo loses
- Time-Aligned Micro-Turns — the architectural move that removes turn boundaries
- Full-Duplex Interaction — the interaction modes the bottleneck currently blocks
- Harness Shrinkage as Models Improve — the general version of "the less-intelligent harness should dissolve into the model"
- AI Employee Framing / Human-AI Accountability Redesign — the org-side mirror: both warn against treating autonomy as the goal and pushing the human to the margin; this page is the interface-side version of the same critique
- Design Concept Grilling — argues the value is in collaborative iteration; this page argues the interface is what blocks it
- Context Window Smart Zone — orthogonal limitation that also makes "fully autonomous, walk away" brittle
Sources#
