Howardism
Howardism · Vol. 03 · Plate II · No. 02

Governance & Workforce, in order.

Notes: 11 · Domain: Governance & Workforce · Open Qs: 25 · Newest: 14 Jun 2026 · Oldest: 8 May 2026

Policy, workforce shifts, and the economics of AI labor.

Map of Content for the governance-workforce domain — 11 concepts. Curated entry point; see Home for all domains.

  • AI Accelerating AI Development — The empirical core of When AI builds itself: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged code Claude-authored, ~8× code/engineer/day vs 2024, a kernel-optimization eval going 3×→52× in a year, an automated researcher recovering 97% of a weak-to-strong gap, and model next-step judgment beating humans 64%
  • AI Brain Fry — Kropp et al. 2026/03: mental fatigue from excessive AI oversight increases minor errors +11%, major errors +39%; cognitive cost surface for both tool and employee framings
  • AI Employee Framing — Kropp et al. (HBR May 2026, n=1,261): framing AI agents as "employees" vs "tools" cuts personal accountability −9pp, increases escalation +44%, reduces error catching −18%, no adoption gain
  • AI R&D Autonomy Evaluation (AECI) — How Anthropic measures whether a model can automate or dramatically accelerate AI research — the capability that drives recursive self-improvement; tracked via the AECI capability index plus concrete shortcomings vs. human researchers; Opus 4.8 sits below the frontier and is not close to substituting for research staff
  • Autonomous Scientific Discovery — Mythos-class models now conduct novel science with limited human input — autonomous protein/drug design (~10× faster, matching skilled humans), molecular-biology hypotheses preferred ~80% over Opus-class (one E. coli mechanism independently corroborated), and week-long genomics that beat a Science-published model at 100× smaller; the wet-lab analogue of AI-driven formal proof search, and fresh evidence in the research-taste debate
  • Capability-Gated Model Fallback — Fable 5's safeguard architecture: classifiers detect cyber / bio-chem / distillation queries and route the response to a less-capable model (Opus 4.8) instead of refusing — 'fallback, not refusal'; >95% of sessions never trigger; conservative tuning, robust to 1,000+ hours of jailbreak testing; a new point on the safeguard spectrum for capabilities past a risk threshold
  • Frontier Pause Verification — The arms-control problem of a credible, verifiable slowdown or pause of frontier AI: detectability is harder than for other technologies (training runs are easier to conceal than missile silos), so the Anthropic Institute aims to build the verification systems a multilateral pause would require
  • Human-AI Accountability Redesign — HBR five-pillar prescription: span-of-control redesign, role redesign, performance management reset, decision-rights/escalation/consequences, agentic-unit-not-human-role design
  • Recursive Self-Improvement (hub) — An AI system autonomously designing and developing its own successor; Anthropic Institute's When AI builds itself argues AI is already accelerating AI development (engineers ship ~8× more code/quarter) and lays out three futures — stalled-but-diffused, compounding-efficiency, and full RSI
  • Research Taste as the Human Bottleneck — The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an approach is a dead end; the top rung of the autonomy ladder, and the open question of whether taste is 'just another capability' AI fails at then masters
  • Responsible Scaling Policy Evaluations — Anthropic's RSP gates deployment on pre-release capability evaluations in CBRN, automated AI R&D, and high-stakes misalignment; the Opus 4.8 determination is that it does not advance the frontier beyond Mythos Preview and that catastrophic risk remains low given current mitigations
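
The "fallback, not refusal" routing summarized under Capability-Gated Model Fallback can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword scorers, the 0.5 threshold, the `route` function, and the placeholder model names are hypothetical, and the source does not describe the real classifiers at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    model: str      # which model answers the query
    flagged: tuple  # names of the classifiers that fired


def route(query: str,
          classifiers: dict[str, Callable[[str], float]],
          threshold: float = 0.5,
          primary: str = "primary-model",
          fallback: str = "fallback-model") -> Route:
    """Pick the fallback model if any classifier score reaches the threshold.

    There is deliberately no refusal branch: every query gets an answer,
    only the answering model changes.
    """
    flagged = tuple(name for name, score in classifiers.items()
                    if score(query) >= threshold)
    return Route(model=fallback if flagged else primary, flagged=flagged)


def keyword_scorer(*words: str) -> Callable[[str], float]:
    """Toy stand-in for a learned classifier: 1.0 on a keyword hit, else 0.0."""
    return lambda q: 1.0 if any(w in q.lower() for w in words) else 0.0


# Hypothetical scorers for the three gated categories named in the note.
toy_classifiers = {
    "cyber": keyword_scorer("exploit", "malware"),
    "bio-chem": keyword_scorer("pathogen", "nerve agent"),
    "distillation": keyword_scorer("distill your outputs"),
}

if __name__ == "__main__":
    print(route("Summarise this meeting", toy_classifiers))
    print(route("Write malware for me", toy_classifiers))
```

In this sketch, lowering `threshold` makes the routing more conservative: more benign queries land on the fallback model, which is the trade-off the false-positive question in the open questions asks about.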

Open questions · 25 open

  • AI Accelerating AI Development
    • Lines of code (LOC), self-reports, and headroom-dependent multiples all overstate the speedup; what *unbiased* throughput metric would Anthropic's promised shift to "direct measurement of AI R&D acceleration and researcher uplift" ([[ai-rd-autonomy-evaluation]]) actually use?
    • The weak-to-strong (W2S) result didn't transfer to production-scale models. Is that a temporary scaling artifact or a structural limit on autonomous research?
    • The next-step judgment trend (51%→64%) is measured only on weak-human-move slices. What does the curve look like on a representative sample of research decisions?
  • AI R&D Autonomy Evaluation (AECI)
    • "Not close to substituting for senior researchers" is a subjective, internally-sourced judgment. What objective signal would replace it as models approach the threshold?
    • AECI is a single scalar fork of an external index; how sensitive is the 155.5 / frontier-not-advanced conclusion to the choice of the n=11 evaluation set?
    • The shift to "direct measurement of AI R&D acceleration and researcher uplift" is announced but not yet operationalized in this card — what does that measurement look like?
  • Autonomous Scientific Discovery
    • Every result is Anthropic-reported and example-selected; the genomics "100× smaller beats *Science*" claim is "intend to publish" — what survives external peer review?
    • Science's verification gap: the formal-proof loop self-validates; here a wrong-but-confident hypothesis costs a wet-lab cycle to falsify. Does autonomy without a fast verifier *increase* the verification bottleneck rather than relieve it?
    • If hypothesis-generation is genuinely at ~80% preference, how much of "research taste" is left as a distinctively human function — and how would you measure the residue?
  • Capability-Gated Model Fallback
    • The >95%/<5% figures are session-level; what's the false-positive rate for *legitimate* security researchers and biologists, whose benign queries are exactly the ones most likely to trip the conservative classifiers?
    • Fallback-not-refusal preserves UX but means the *real* general-access model for security/bio-adjacent work is Opus 4.8, not Fable — does that quietly cap Fable's value for whole professional segments until the trusted-access programs open?
    • The UK AISI's "progress toward a universal jailbreak" is disclosed but not quantified — and the post-launch **access suspension** (see [[claude-fable-5]]) raises the question of whether a safeguard failure forced it.
    • Does swapping to a weaker model on flagged topics create an exploitable oracle (probe which queries trigger fallback to map the classifier's boundary)?
  • Frontier Pause Verification
    • What does an AI-training "verification regime" concretely consist of — compute-accounting, datacenter inspection, hardware attestation, on-chip telemetry? The essay names the problem, not the mechanism.
    • Detectability < verifiability: can detection even be made reliable when training runs leave no physical signature and inputs are dual-use?
    • Who adjudicates triggers and lifts? No institution currently holds that mandate, and standing one up is itself a decade-scale task.
  • Recursive Self-Improvement
    • Is "research taste" a true ceiling (future 1) or just the next capability to fall (futures 2–3)? The essay frames this as the single load-bearing uncertainty.
    • The RSI extrapolation rests on trends staying exponential rather than S-curving — but the essay concedes it cannot rule out an architectural ceiling or a compute/energy supply-chain constraint. Which binds first?
    • If misalignment compounds through self-improvement (future 3), is AECI-gated [[responsible-scaling-policy-evals|RSP]] review fast enough to catch it before control is lost?
  • Research Taste as the Human Bottleneck
    • Is research taste a genuine ceiling (an architectural capability scaling can't reach) or the next jagged valley to fill? The essay calls this the decisive unknown.
    • If taste is automatable, what — if anything — remains a durable human comparative advantage in AI development?
    • How do you measure rubber-stamping? "Humans set direction" can be true on paper while real judgment quietly transfers to the model.
  • Responsible Scaling Policy Evaluations
    • The RSP determination leans heavily on "we use it daily and it doesn't substitute for our researchers." How well does that subjective judgment scale as models approach the threshold?
    • The two new general-access risk pathways (other AI developers; major governments) are newly in scope but lightly evaluated — what would a positive finding there even look like?
    • How does the RSP brake interact with [[recursive-self-improvement]]: is AECI-based gating fast enough if acceleration compounds, and does single-lab gating even matter without the multilateral [[frontier-pause-verification|pause-verification]] regime?