
Design Concept Grilling

Published May 6, 2026 · Filed: Concept · Domain: AI Engineering · Tags: Agent Engineering, Planning, Alignment · Reading time: 7 min · Source: AI-synthesised

Matt Pocock's `grill-me` skill; reach Brooks's "design concept" before any plan; counter to specs-to-code; PRD as destination doc, Kanban as journey doc


Sources

Full Walkthrough: Workflow for AI Coding — Matt Pocock

Summary

Matt Pocock's grill-me skill — a relentless interviewer prompt that walks down decision-tree branches one question at a time, with recommended answers — replaces "ask the agent for a plan" with "reach a shared understanding before any plan exists." The point is alignment, not output. The goal-state is what Frederick P. Brooks calls the design concept in The Design of Design: a shared idea held by all participants in the work. A PRD or a plan is downstream of the design concept; producing one without alignment first guarantees rework.

The skill (verbatim)

"Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the decision tree, resolving dependencies one by one. For each question provide your recommended answer. Ask the questions one at a time…"

That's it. The skill is short on purpose — minimum surface area, maximum behavior change.
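The skill itself is a prompt, not code, but its mechanics (one question at a time, dependencies resolved first, a recommended answer attached to every question) can be modelled in a few lines. The sketch below is a hypothetical illustration, not Pocock's implementation: `Question`, `grill`, and the `ask` callback are invented names, the decision tree is approximated as a dependency graph ordered with Python's standard `graphlib.TopologicalSorter`, and an empty reply is assumed to mean "accept the recommendation".

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class Question:
    """One open decision in a hypothetical plan."""
    key: str
    text: str
    recommended: str
    depends_on: list[str] = field(default_factory=list)


def grill(questions: list[Question], ask: Callable[[str], str]) -> dict[str, str]:
    """Ask one question at a time, dependencies first, each with a recommendation.

    An empty reply accepts the recommended answer, so the user only has to
    type something where they actually disagree.
    """
    by_key = {q.key: q for q in questions}
    # TopologicalSorter yields each question only after everything it depends on.
    graph = {q.key: q.depends_on for q in questions}
    decisions: dict[str, str] = {}
    for key in TopologicalSorter(graph).static_order():
        q = by_key[key]
        reply = ask(f"{q.text} (recommended: {q.recommended})").strip()
        decisions[key] = reply or q.recommended
    return decisions


if __name__ == "__main__":
    plan = [
        Question("sync", "Do drafts sync across devices?", "No, local only", ["storage"]),
        Question("storage", "Where do drafts live?", "SQLite"),
    ]
    scripted = iter(["", "Yes, through the existing API"])  # stands in for the human
    print(grill(plan, lambda prompt: next(scripted)))
    # {'storage': 'SQLite', 'sync': 'Yes, through the existing API'}
```

Because every question carries a default, the human's effort concentrates on the answers they override, which is what the recommendation-with-each-question pattern is for.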

Why grill before plan

Pocock observed that agents in plan mode "really eagerly try to produce a plan": they say "I think I've got enough" and ship a plan that papers over open questions. The plan reads fine, but it is wrong in ways that don't surface until implementation. Forcing the agent to interview first exposes the open questions while they are still cheap to answer.

The recommendation-with-each-question pattern is load-bearing: it lets the user say "yes, agreed" most of the time and debate only where they actually disagree. A questions-only interview, with no recommended answers, spends user attention on obvious calls.

Counter to the "specs to code" movement

Pocock's strongest negative thesis: specs-to-code is vibe coding by another name. Defenders say "write a careful spec, hand to AI, fix the spec when the code is wrong, never look at the code." Pocock has tried it, and in his experience it doesn't work.

Reasons:

The code is the battleground, not the spec
Specs that don't engage with code degrade into wish-lists
The feedback loop runs through an indirect layer (spec ⇄ AI ⇄ code) instead of where the bugs actually live (code ⇄ tests)
Without code engagement, the developer's mental model of the system rots

Grilling follows the opposite discipline: the spec is downstream of alignment, alignment is upstream of any artifact, and the developer keeps a hand in the code throughout.

Outputs of a grilling session

A grilling session can run anywhere from 10 to 100 questions; Pocock has had sessions that lasted an hour. The artifact at the end is the conversation history itself, kept as raw material for the PRD step. Pocock's write-a-PRD skill consumes this history (along with another short interview) to produce the destination document.

He explicitly does not review the PRD afterwards:

"What am I testing at this point? What are the failure modes I'm trying to test for? I know that LLMs are great at summarization. I have reached the same wavelength as the LLM. So all I'm doing is checking the LLM's ability to summarize."

This is only safe because the grilling session did the alignment work. Skip grilling and you must read the PRD.

Two essential documents

After grilling, Pocock generates exactly two documents:

PRD (destination doc): what the finished thing looks like, covering user stories, definition of done, out-of-scope list, implementation decisions, testing decisions, and modules to be modified
Kanban (journey doc): the work cut into vertical slices, each an independently grabbable ticket (see Vertical Slice Tracer Bullets)

He then deletes (or closes) the PRD after implementation completes — see doc rot.
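As a rough illustration of how differently the two documents are shaped, the sketch below models the PRD as a fixed set of fields (taken from the list above) and the Kanban board as tickets with blockers. The class names, the `blocked_by` field, and the `grabbable` helper are assumptions for illustration only; Pocock's skills produce prose documents, not Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class PRD:
    """Destination doc: what "done" looks like. Field names are illustrative."""
    user_stories: list[str]
    definition_of_done: list[str]
    out_of_scope: list[str]
    implementation_decisions: list[str]
    testing_decisions: list[str]
    modules_to_modify: list[str]


@dataclass
class Ticket:
    """Journey doc entry: one vertical slice that cuts through every layer."""
    title: str
    blocked_by: list[str] = field(default_factory=list)


def grabbable(board: list[Ticket], done: set[str]) -> list[str]:
    """Titles of tickets that are not done and whose blockers are all done."""
    return [
        t.title
        for t in board
        if t.title not in done and all(b in done for b in t.blocked_by)
    ]


if __name__ == "__main__":
    board = [
        Ticket("Save a draft end to end"),
        Ticket("List saved drafts", blocked_by=["Save a draft end to end"]),
        Ticket("Export a draft as Markdown"),
    ]
    print(grabbable(board, done=set()))
    # ['Save a draft end to end', 'Export a draft as Markdown']
    print(grabbable(board, done={"Save a draft end to end"}))
    # ['List saved drafts', 'Export a draft as Markdown']
```

In this model a ticket is "independently grabbable" once nothing it depends on is still open, so any title `grabbable` returns can be picked up without coordinating with other work.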

Module map appears in the PRD

The PRD includes "modules to be modified" — concrete identification of which existing modules change and which new ones are introduced. This connects planning to architecture (see Deep Modules for Agents). The point is to keep the codebase shape in mind throughout planning, not as an afterthought during implementation.
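One way to keep the module map honest during implementation is to compare it against what the tickets actually touch. The function below is a hypothetical check, not part of any published skill: `module_drift` is an invented name, module names are plain strings, and each ticket is assumed to be annotated with the set of modules it changes.

```python
def module_drift(
    prd_modules: set[str], tickets: dict[str, set[str]]
) -> tuple[dict[str, set[str]], set[str]]:
    """Compare the PRD's module map against the modules each ticket touches.

    Returns (unplanned, untouched):
      unplanned: ticket title -> modules it touches that the PRD never named
      untouched: modules the PRD names that no ticket touches
    """
    unplanned = {
        title: mods - prd_modules
        for title, mods in tickets.items()
        if mods - prd_modules
    }
    # set().union() with no arguments returns an empty set, so an empty board is fine.
    touched = set().union(*tickets.values())
    untouched = prd_modules - touched
    return unplanned, untouched


if __name__ == "__main__":
    prd = {"drafts/store", "drafts/api", "ui/editor"}
    board = {
        "Save a draft end to end": {"drafts/store", "drafts/api", "ui/editor"},
        "Sync drafts": {"drafts/api", "sync/queue"},
    }
    print(module_drift(prd, board))
    # ({'Sync drafts': {'sync/queue'}}, set())
```

A non-empty `unplanned` entry suggests the codebase shape drifted from what was agreed during planning; an `untouched` module suggests the PRD listed a change that no slice delivers. Either is a cue to revisit the plan.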

When to skip grilling

Grilling is for human-in-the-loop tasks. For a short, well-scoped change ("rename this function across the codebase"), the overhead is wasted. The discipline scales with stakes: the bigger the feature, the fuzzier the brief, and the higher the cost of going the wrong way, the harder you should grill.
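The rule of thumb above can be written down as a small function. Everything about the scale is an assumption made for illustration: `grilling_depth` is a hypothetical name, each factor is self-scored from 0 to 3, and the thresholds between "skip", "light", and "hard" are arbitrary, not Pocock's numbers. What the sketch does capture is the shape of the rule: depth only goes up as size, fuzziness, or the cost of a wrong turn goes up.

```python
def grilling_depth(size: int, fuzziness: int, cost_of_wrong_turn: int) -> str:
    """Map three 0-3 self-assessed scores to a hypothetical grilling depth.

    The scale and thresholds are illustrative assumptions, not Pocock's numbers.
    """
    scores = {"size": size, "fuzziness": fuzziness, "cost_of_wrong_turn": cost_of_wrong_turn}
    for name, value in scores.items():
        if not 0 <= value <= 3:
            raise ValueError(f"{name} must be between 0 and 3, got {value}")
    total = sum(scores.values())
    if total <= 1:
        return "skip"   # short, well-scoped change: just do it
    if total <= 5:
        return "light"  # a few targeted questions on the open branches
    return "hard"       # full grill-me session before any PRD


if __name__ == "__main__":
    print(grilling_depth(1, 0, 0))  # a mechanical rename: prints skip
    print(grilling_depth(3, 3, 2))  # a large feature from a vague brief: prints hard
```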

Connections

Matt Pocock: author of the skill
Vertical Slice Tracer Bullets: the Kanban that follows the PRD
Deep Modules for Agents: the module map in the PRD ties planning to architecture
Agent Loop Pattern: grilling sits at the human-in-the-loop top of the funnel; the loop drains the AFK bottom
Context Window Smart Zone: grilling uses sub-agents to keep the parent context small
Agent Harness Engineering: "enforce invariants" at the planning layer is "reach alignment before any plan"
Claude Code Best Practices: the explore→plan→code workflow has the same shape; grill-me is a more aggressive variant of the "explore" step
Interaction Models: grilling is collaborative real-time iteration; turn-based interfaces are precisely what makes it clunky today, and an interaction model is the substrate that would make grilling-style collaboration feel native
HTML as the New Markdown: brainstorm → let Claude interview you → plan is the grilling shape; Thariq's HTML plan is a richer destination artifact than a markdown PRD, traded against harder versioning
Agentic Technical Debt: grilling produces the design concept that goes into CLAUDE.md, the strongest upstream defense against debt from session-by-session re-derivation
Zero-Friction Scope Creep: a strong design concept reached via grilling resists scope sprawl in a way written PRDs alone often don't
Evals as Product Spec: grilling produces the design concept; evals encode whether it was achieved. Matt's "verification loops" and Cat's "ten great evals" are the same primitive at planning's other end
Building Is Cheap, Arguing Is Expensive: productive tension. Fiona Fung's "generate three PRs and compare" relocates design into built artifacts; the two are reconciled here by treating the prototype as the medium of the design concept, not a replacement for reaching one

Open questions

Can grilling be run AFK against another agent that holds the user's preferences? Pocock's answer in 2026 is "no, this part has to be human-in-the-loop" — but the question is open as agents get better at modeling their principal.
How does grilling change for team work where multiple humans need to align? Pocock's hint: pair-program with the agent in the room, treat it as a third interlocutor.

Derived

The PRD-Replacement Spectrum at AI-Native Speed — the left pole of the spectrum: maximal pre-build alignment, PRD as deleted destination doc
Where Does the Why Live? — the grilling session is an authoring-time home for the why; but the destination PRD that carries it is deleted, so the why is orphaned for future readers


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 26

Agent Harness Engineering
Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical archit…
Agent Loop Pattern
`/loop` (cron-scheduled) and Ralph Wiggum (backlog-draining) loops as next-generation agent primitive; AFK execution, p…
Agentic Technical Debt
Debt that *compounds* (not just accumulates) because each agentic-coding session re-derives architectural decisions wit…
Opinions on Using AI Tools & the Future of the Software Engineering Role
Debate map of four stances on using AI tools (bullish-insider / pragmatist-practitioner / skeptic-governance / architec…
Building Is Cheap, Arguing Is Expensive
"In technical debate, code wins": generate three PRs vs whiteboard; prototype over design doc; reduce design docs
Claude Code Best Practices
Anthropic's guide to effective Claude Code usage: context management, verification-driven development, explore→plan→cod…
Context Window Smart Zone
Smart zone vs dumb zone (Dex Hardy / Matt Pocock): quadratic attention scaling, ~100K marker independent of advertised…
Deep Modules for Agents
Ousterhout deep-vs-shallow modules applied to agent-friendly codebases; push-vs-pull instruction delivery; reviewer in…
Evals as Product Spec
Cat Wu's framing of evals as the emerging core PM skill: ten great evals beats a hundred mediocre; encode what done loo…
HTML as the New Markdown
Thariq Shihipar's thesis: as models improve, thousand-line markdown plans overwhelm the *human*; HTML artifacts (visual…
Human-in-the-Loop Boundaries
Humans belong at allocation, understanding, design-concept, risk, and accountability boundaries; they slow the system d…
Interaction Models
Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…
Learning to Co-Work with AI: A Software Engineer's Field Guide
Field guide for software engineers in the AI era: 6 skill clusters (taste, harness, alignment-first planning, agent-fri…
LLM-as-Compiler Knowledge Base
Karpathy's architecture: LLM incrementally compiles raw docs into a persistent interlinked wiki, replacing RAG with a 4…
Matt Pocock
Independent AI-coding educator; built Sandcastle library; smart-zone/grill-me/tracer-bullets pedagogical framing; "bad…
AI Engineering & Agent Tooling
Map of Content for the ai-engineering domain — 36 concepts. Curated entry point; see Home for all domains.
Model Introspection Feedback
Cat Wu's underrated technique: ask the model why it failed; treat answer as harness-debugging signal not model criticis…
Model Spec Midtraining (MSM)
New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT…
Open Questions Backlog
_96 pages with open questions, as of 2026-06-14._
Outsource Your Thinking, Not Your Understanding
"You can outsource your thinking but not your understanding"; understanding as the non-delegable human bottleneck; know…
The PRD-Replacement Spectrum at AI-Native Speed
Four positions (grill-then-PRD → lighter-PRD → build-to-decide → prototype-is-spec) are one spectrum once you decompose…
Prototype Over PRD
Dan Carey's prototype-replaces-PRD method: record a why-not-what conversation, transcribe it, hand the transcript to Cl…
Turn-Based Interface Bottleneck
Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out b…
Vertical Slice Tracer Bullets
Pragmatic-Programmer tracer-bullet pattern applied to agent task decomposition; vertical slices > horizontal layers; Ka…
Where Does the Why Live?
Rationale (the 'why') is well-homed at authoring time — it's the recorded why-not-what conversation and the grilling se…
Zero-Friction Scope Creep
MVP failure mode when agentic coding removes the cost-based forcing function against scope creep; antidote is written s…

