Howardism · Vol. 03 · Plate II · No. 02

Governance & Workforce, in order.

Notes: 11 · Domain: Governance & Workforce · Open Qs: 25 · Newest: 14 Jun 2026 · Oldest: 8 May 2026

Policy, workforce shifts, and the economics of AI labor.

Map of Content for the governance-workforce domain — 11 concepts. Curated entry point; see Home for all domains.

AI Accelerating AI Development — The empirical core of When AI builds itself: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged code Claude-authored, ~8× code/engineer/day vs 2024, a kernel-optimization eval going 3×→52× in a year, an automated researcher recovering 97% of a weak-to-strong gap, and model next-step judgment beating humans 64%
AI Brain Fry — Kropp et al. 2026/03: mental fatigue from excessive AI oversight increases minor errors by 11% and major errors by 39%; the cognitive cost applies under both the tool and employee framings
AI Employee Framing — Kropp et al. (HBR May 2026, n=1,261): framing AI agents as "employees" vs "tools" cuts personal accountability −9pp, increases escalation +44%, reduces error catching −18%, no adoption gain
AI R&D Autonomy Evaluation (AECI) — How Anthropic measures whether a model can automate or dramatically accelerate AI research — the capability that drives recursive self-improvement; tracked via the AECI capability index plus concrete shortcomings vs. human researchers; Opus 4.8 sits below the frontier and is not close to substituting for research staff
Autonomous Scientific Discovery — Mythos-class models now conduct novel science with limited human input — autonomous protein/drug design (~10× faster, matching skilled humans), molecular-biology hypotheses preferred ~80% over Opus-class (one E. coli mechanism independently corroborated), and week-long genomics that beat a Science-published model at 100× smaller; the wet-lab analogue of AI-driven formal proof search, and fresh evidence in the research-taste debate
Capability-Gated Model Fallback — Fable 5's safeguard architecture: classifiers detect cyber / bio-chem / distillation queries and route the response to a less-capable model (Opus 4.8) instead of refusing — 'fallback, not refusal'; >95% of sessions never trigger; conservative tuning, robust to 1,000+ hours of jailbreak testing; a new point on the safeguard spectrum for capabilities past a risk threshold
Frontier Pause Verification — The arms-control problem of a credible, verifiable slowdown or pause of frontier AI: detectability is harder than for other technologies (training runs are easier to conceal than missile silos), so the Anthropic Institute aims to build the verification systems a multilateral pause would require
Human-AI Accountability Redesign — HBR five-pillar prescription: span-of-control redesign, role redesign, performance management reset, decision-rights/escalation/consequences, agentic-unit-not-human-role design
Recursive Self-Improvement (hub) — An AI system autonomously designing and developing its own successor; Anthropic Institute's When AI builds itself argues AI is already accelerating AI development (engineers ship ~8× more code/quarter) and lays out three futures — stalled-but-diffused, compounding-efficiency, and full RSI
Research Taste as the Human Bottleneck — The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an approach is a dead end; the top rung of the autonomy ladder, and the open question of whether taste is 'just another capability' AI fails at then masters
Responsible Scaling Policy Evaluations — Anthropic's RSP gates deployment on pre-release capability evaluations in CBRN, automated AI R&D, and high-stakes misalignment; the Opus 4.8 determination is that it does not advance the frontier beyond Mythos Preview and that catastrophic risk remains low given current mitigations

Open questions · 25 open

AI Accelerating AI Development
- LOC, self-reports, and headroom-dependent multiples all overstate; what *unbiased* throughput metric would Anthropic's promised shift to "direct measurement of AI R&D acceleration and researcher uplift" ([[ai-rd-autonomy-evaluation]]) actually use?
- The W2S result didn't transfer to production-scale models. Is that a temporary scaling artifact or a structural limit on autonomous research?
- The next-step judgment trend (51%→64%) is measured only on weak-human-move slices. What does the curve look like on a representative sample of research decisions?
AI R&D Autonomy Evaluation (AECI)
- "Not close to substituting for senior researchers" is a subjective, internally-sourced judgment. What objective signal would replace it as models approach the threshold?
- AECI is a single scalar fork of an external index; how sensitive is the 155.5 / frontier-not-advanced conclusion to the choice of the n=11 evaluation set?
- The shift to "direct measurement of AI R&D acceleration and researcher uplift" is announced but not yet operationalized in this card — what does that measurement look like?
Autonomous Scientific Discovery
- Every result is Anthropic-reported and example-selected; the genomics "100× smaller beats *Science*" claim is "intend to publish" — what survives external peer review?
- Science's verification gap: the formal-proof loop self-validates; here a wrong-but-confident hypothesis costs a wet-lab cycle to falsify. Does autonomy without a fast verifier *increase* the verification bottleneck rather than relieve it?
- If hypothesis-generation is genuinely at ~80% preference, how much of "research taste" is left as a distinctively human function — and how would you measure the residue?
Capability-Gated Model Fallback
- The >95%/<5% figures are session-level; what's the false-positive rate for *legitimate* security researchers and biologists, whose benign queries are exactly the ones most likely to trip the conservative classifiers?
- Fallback-not-refusal preserves UX but means the *real* general-access model for security/bio-adjacent work is Opus 4.8, not Fable — does that quietly cap Fable's value for whole professional segments until the trusted-access programs open?
- The UK AISI's "progress toward a universal jailbreak" is disclosed but not quantified — and the post-launch **access suspension** (see [[claude-fable-5]]) raises the question of whether a safeguard failure forced it.
- Does swapping to a weaker model on flagged topics create an exploitable oracle (probe which queries trigger fallback to map the classifier's boundary)?
Frontier Pause Verification
- What does an AI-training "verification regime" concretely consist of — compute-accounting, datacenter inspection, hardware attestation, on-chip telemetry? The essay names the problem, not the mechanism.
- Detectability < verifiability: can detection even be made reliable when training runs leave no physical signature and inputs are dual-use?
- Who adjudicates triggers and lifts? No institution currently holds that mandate, and standing one up is itself a decade-scale task.
Recursive Self-Improvement
- Is "research taste" a true ceiling (future 1) or just the next capability to fall (futures 2–3)? The essay frames this as the single load-bearing uncertainty.
- The RSI extrapolation rests on trends staying exponential rather than S-curving — but the essay concedes it cannot rule out an architectural ceiling or a compute/energy supply-chain constraint. Which binds first?
- If misalignment compounds through self-improvement (future 3), is AECI-gated [[responsible-scaling-policy-evals|RSP]] review fast enough to catch it before control is lost?
Research Taste as the Human Bottleneck
- Is research taste a genuine ceiling (an architectural capability scaling can't reach) or the next jagged valley to fill? The essay calls this the decisive unknown.
- If taste is automatable, what — if anything — remains a durable human comparative advantage in AI development?
- How do you measure rubber-stamping? "Humans set direction" can be true on paper while real judgment quietly transfers to the model.
Responsible Scaling Policy Evaluations
- The RSP determination leans heavily on "we use it daily and it doesn't substitute for our researchers." How well does that subjective judgment scale as models approach the threshold?
- The two new general-access risk pathways (other AI developers; major governments) are newly in scope but lightly evaluated — what would a positive finding there even look like?
- How does the RSP brake interact with [[recursive-self-improvement]]: is AECI-based gating fast enough if acceleration compounds, and does single-lab gating even matter without the multilateral [[frontier-pause-verification|pause-verification]] regime?