Howardism vol. 03 · quiet corner of the web


Chloe Li

Published: May 8, 2026 · Filed: Entity · Domain: Entities · Tags: Entity, Person, Anthropic, Alignment Researcher · Reading: 2 min · Source: AI-synthesised

Lead author of the MSM paper (arXiv 2605.02087); Anthropic Fellows Program; designed all specs and experiments


Sources#

Model Spec Midtraining: Improving How Alignment Training Generalizes

Summary#

Entity. Chloe Li is the lead author of "Model Spec Midtraining: Improving How Alignment Training Generalizes" (arXiv 2605.02087, May 2026) and a member of the Anthropic Fellows Program. She designed the MSM specs, proposed and designed the experiments, produced all results, and wrote the paper.

Contributions#

Per Author Contributions (App. A of the MSM paper):

Led the project
Designed the Model Specs used (cheese-preference specs, Philosophy Spec, Rules/Value-Augmented/Rule-Augmented specs, General Spec)
Proposed and designed all experiments
Produced all results
Wrote the paper

Co-authors: Sara Price (Anthropic; advised the initial phase), and Jon Kutasov and Samuel Marks (joint supervisors; Jon proposed the project, and Sam guided the controlling-generalization framing).

Code release#

Open-sourced the full MSM pipeline, AFT pipeline, Model Specs, and trained models: https://github.com/chloeli-15/model_spec_midtraining

Connections#

Author of: Model Spec Midtraining (MSM) paper
Affiliated with: Anthropic (Fellows Program)
Co-authors: Sara Price, Jon Kutasov, Samuel Marks (Anthropic)
Adjacent work: Synthetic Document Finetuning (SDF) (Wang et al., the technique MSM builds on)
Authored: Model Spec Science (the empirical study of which Model Spec features generalize best; she designed the specs and experiments)

§ end

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 5

Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Entities — People, Orgs, Tools & Projects
Map of Content for all 32 entity pages. See Home for concept domains.
Model Spec Midtraining (MSM)
New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT…
Model Spec Science
Empirical study of which Model Spec features best generalize alignment; value explanations > rules alone, specific > ge…
Synthetic Document Finetuning (SDF)
Wang et al. 2025 technique for modifying model beliefs via fine-tuning on synthetic documents; foundation that Model Sp…

Related articles

Claude's Constitution / Model Spec
Anthropic Model Spec / Constitution by Askell et al.; document specifying Claude's values + hard constraints (SP1–3, GP…
Alignment Fine-Tuning (AFT)
Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec…
Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Claude Character as Product
Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the…
Deliberative Alignment
Guan et al. 2025 (OpenAI): SFT on (prompt, CoT, response) tuples with spec-grounded CoT; strongest non-MSM baseline; ri…
