Summary#
Entity. Lead author of "Model Spec Midtraining: Improving How Alignment Training Generalizes" (arXiv 2605.02087, May 2026). Member of the Anthropic Fellows Program. Designed the Model Specs used for Model Spec Midtraining (MSM), proposed and designed the experiments, produced all results, and wrote the paper.
Contributions#
Per Author Contributions (App. A of the MSM paper):
- Led the project
- Designed the Model Specs used (cheese-preference specs, Philosophy Spec, Rules/Value-Augmented/Rule-Augmented specs, General Spec)
- Proposed and designed all experiments
- Produced all results
- Wrote the paper
Co-authors: Sara Price (Anthropic; advised the initial phase), Jon Kutasov and Samuel Marks (joint supervisors; Jon proposed the project, Sam guided the controlling-generalization framing).
Code release#
Open-sourced the full MSM pipeline, AFT pipeline, Model Specs, and trained models: https://github.com/chloeli-15/model_spec_midtraining
Connections#
- Author of: Model Spec Midtraining (MSM) paper
- Affiliated with: Anthropic (Fellows Program)
- Co-authors: Sara Price, Jon Kutasov, Samuel Marks (Anthropic)
- Adjacent work: Synthetic Document Finetuning (SDF), the Wang et al. technique that MSM builds on
- Authored: Model Spec Science (the empirical study of which Model Spec features generalize best; she designed the specs and experiments)
Sources#
Cited by 5
- Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
- Entities — People, Orgs, Tools & Projects
Map of Content for all 32 entity pages. See Home for concept domains.
- Model Spec Midtraining (MSM)
New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT…
- Model Spec Science
Empirical study of which Model Spec features best generalize alignment; value explanations > rules alone, specific > ge…
- Synthetic Document Finetuning (SDF)
Wang et al. 2025 technique for modifying model beliefs via fine-tuning on synthetic documents; foundation that Model Sp…
Related articles
- Claude's Constitution / Model Spec
Anthropic Model Spec / Constitution by Askell et al.; document specifying Claude's values + hard constraints (SP1–3, GP…
- Alignment Fine-Tuning (AFT)
Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec…
- Claude Character as Product
Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the…
- Deliberative Alignment
Guan et al. 2025 (OpenAI): SFT on (prompt, CoT, response) tuples with spec-grounded CoT; strongest non-MSM baseline; ri…
