Howardism vol. 03 · quiet corner of the web

Chloe Li

Published May 8, 2026 · Filed: Entity · Tags: Entity, Person, Anthropic, Alignment Researcher · Reading: 2 min · Source: AI-synthesised

First author of the MSM paper (arXiv 2605.02087); member of the Anthropic Fellows Program; designed all specs and experiments.

Illustration of Chloe Li

Sources

Model Spec Midtraining: Improving How Alignment Training Generalizes

Summary

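Entity. First author of "Model Spec Midtraining: Improving How Alignment Training Generalizes" (arXiv 2605.02087, May 2026) and a member of the Anthropic Fellows Program. She designed the MSM specs, proposed and designed the experiments, produced all results, and wrote the paper.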

Contributions

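According to the author contributions statement in Appendix A of the MSM paper, she:

Led the overall project
Designed the Model Specs used (cheese-preference specs, Philosophy Spec, Rules/Value-Augmented/Rule-Augmented specs, General Spec)
Proposed and designed all experiments
Produced all results
Wrote the paper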

Co-authors: Sara Price (Anthropic; supervised the initial phase), and Jon Kutasov and Samuel Marks (co-supervisors; Jon proposed the project idea, and Sam guided the controlling-generalization framing).

Code release

She open-sourced the full MSM pipeline, the AFT pipeline, the Model Specs, and the trained models: https://github.com/chloeli-15/model_spec_midtraining

Related links

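Work: Model Spec Midtraining (MSM) paper
Affiliation: Anthropic (Fellows Program)
Co-authors: Sara Price, Jon Kutasov, Samuel Marks (Anthropic)
Related research: Synthetic Document Finetuning (SDF) (Wang et al.; the foundational technique MSM builds on)
Work: Model Spec Science (an empirical study of which Model Spec features generalize best; she designed the specs and experiments)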

§ end

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 5

Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Entities — People, Orgs, Tools & Projects
Map of Content for all 32 entity pages. See Home for concept domains.
Model Spec Midtraining (MSM)
New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT…
Model Spec Science
Empirical study of which Model Spec features best generalize alignment; value explanations > rules alone, specific > ge…
Synthetic Document Finetuning (SDF)
Wang et al. 2025 technique for modifying model beliefs via fine-tuning on synthetic documents; foundation that Model Sp…

Related articles

Claude's Constitution / Model Spec
Anthropic Model Spec / Constitution by Askell et al.; document specifying Claude's values + hard constraints (SP1–3, GP…
Alignment Fine-Tuning (AFT)
Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec…
Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Claude Character as Product
Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the…
Deliberative Alignment
Guan et al. 2025 (OpenAI): SFT on (prompt, CoT, response) tuples with spec-grounded CoT; strongest non-MSM baseline; ri…