Howardism · Vol. 03Plate II · No. 02

Product & Org, in order.

Notes8DomainProduct & OrgOpen Qs23Newest7 Jun 2026Oldest6 May 2026

Product cadence, org design, and the AI-native team.

Map of Content for the product-org domain — 8 concepts. Curated entry point; see Home for all domains.

AI Native Product Cadence (hub) — Cat Wu's 6mo→1mo→1day cadence at Anthropic: research-preview branding, mission-as-tiebreaker, evergreen launch room, lighter PRDs, weekly metrics readouts
Compounding Loop Optimization — Dan Carey's discipline of instrumenting and automating every recurring step of the build loop — because when internal tooling is an-afternoon-cheap, each optimization pays back ×(50–100 iterations per project)
Dogfooding as Product Discipline — Product sense is built by relentless first-hand use ("ant food"); Mr. Peanut catch; cross-source (Cat Wu vibe-checks, Glasgow founder-led sales)
Engineer PM Convergence — Generalists across disciplines; product taste as bottleneck skill; Anthropic Claude Code team as case study; "just do things" cultural substrate
Evals as Product Spec — Cat Wu's framing of evals as the emerging core PM skill: ten great evals beats a hundred mediocre; encode what done looks like for ambiguous AI features; companion to introspection (hypothesis) and vibe-check (direction)
Managers as ICs — Every Claude Code manager starts as an IC; flat org; agentic coding collapsed the onboarding cost that pushed managers out of the codebase
Model Introspection Feedback — Cat Wu's underrated technique: ask the model why it failed; treat answer as harness-debugging signal not model criticism; caveats around model self-report fidelity
Prototype Over PRD — Dan Carey's prototype-replaces-PRD method: record a why-not-what conversation, transcribe it, hand the transcript to Claude, ask for a few prototype variations; the prototype is the spec, not a downstream artifact

Open questions 23 open

AI Native Product Cadence
- Does the cadence scale beyond ~100 people? Anthropic itself is bigger (~30-40 PMs alone), but the [[claude-code]] team that visibly drives cadence is small.
- What's the equivalent of research-preview branding for B2B enterprise launches where customers expect stability? Cat doesn't address.
- How much of the cadence is structural (process choices) vs cultural (talent density)? Probably both, ratio unclear.
Compounding Loop Optimization
- The loop assumes the team *is* (close to) the user. How much of the compounding advantage survives when the user is unlike the builder and "talk to users" can't be same-room?
- Where is the line between worthwhile internal tooling and yak-shaving? Carey's "afternoon" bar is the heuristic, but [[cat-wu]] warns that over-customizing setups "becomes distraction."
- Does Claude-as-first-pass-on-all-feedback ever filter out the rare signal that doesn't cluster? Automating triage optimizes the common case; the tail is where surprising bets come from.
Dogfooding as Product Discipline
- Dogfooding works when the team *is* the user (Claude Code) or near it (Cat Wu, Boris). How do you build product sense for users very unlike you — does "talk to customers" fully substitute, as Glasgow/Fung's small-business work suggests?
- Can dogfooding scale, or does it implicitly cap how large an AI-native product org can stay taste-driven before it reverts to dashboards?
Engineer PM Convergence
- Does this scale beyond ~50-person Claude Code-style teams? Boris hedges: "I think this is going to be a question for years."
- What happens to formal PM career ladders in companies where engineers do PM work? Open at Anthropic per Cat.
- Cross-disciplinary generalist is a hiring bar — where does the supply come from? Career changers, or new-grad bias toward AI-native education?
Evals as Product Spec
- How do you write an eval for *taste*-driven features like [[claude-character-as-product|character]]? Amanda's role is canonical for being eval-resistant; Cat names her as someone who *is* good at evals here, but doesn't describe the technique. **Partially answered:** [[wiki/derived/evals-for-taste-and-character]] — the technique is a pipeline (conviction → dogfood-sourced failure modes → MSM-style variant A/B measurement → ~10 interpretable evals); proven on the safety/values core but still tacit on the warm/witty aesthetic surface.
- The 10-vs-100 number is given without justification. Is there a Goldilocks zone, or does it depend on feature surface area? [[client-side-agent-optimization]]'s framing of combos suggests evals also have a combinatorial explosion problem.
- How do evals interact with [[harness-shrinkage-as-models-improve]]? When a harness asset shrinks because the model now handles it natively, the evals built around the old harness may become artifacts rather than guardrails. Does Anthropic retire evals or repurpose them?
- Is there a single non-Anthropic example of a PM-as-eval-writer to cite, or is this currently a Cat-Wu-singular framing? The Matt Pocock workshop reaches the same place from a different vocabulary, but no third source has been ingested yet.
Managers as ICs
- Fung's own open question: "Do you still need separate iOS and Android orgs?" — if engineers flex across platforms via Claude, the traditional platform-split org may dissolve too. How far does flattening go?
- Does manager-as-IC scale past a certain org size, or only work while Claude Code is small and the codebase is Claude-legible?
Model Introspection Feedback
- How reliable are 4.7-class introspective reports? Anthropic's interpretability research suggests partial fidelity but not full. Empirically, Cat reports it's good enough to drive harness fixes — but unclear at what model scale this technique becomes load-bearing.
- Does adversarial introspection ("why did you fail?") yield different signal than neutral ("walk me through your reasoning")? Worth probing.
- Could a meta-agent run introspection automatically against logged failures? Sounds tractable but no public implementation.
Prototype Over PRD
- Where does prototype-over-PRD break down? Carey's domain is a visual design tool where a prototype *is* the product surface; for backend/infra/data work the prototype may not capture the spec (cf. [[ai-native-product-cadence]]'s "full PRD for heavy-infra features").
- If there is no PRD, where does the *rationale* ("why we chose variation B") live for future readers? Same rationale-capture gap flagged in [[building-is-cheap-arguing-is-expensive]].
- The prototype-as-spec must not become the prototype-as-validation trap [[problem-solution-fit-discipline]] warns about: a fast prototype proves the build was solvable, not that the problem is real.