cover of episode “Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations” by Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer, Marius Hobbhahn

“Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations” by Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer, Marius Hobbhahn

2025/3/18
logo of podcast LessWrong (Curated & Popular)

LessWrong (Curated & Popular)

AI Chapters
Chapters

Shownotes Transcript

No transcript made for this episode yet, you may request it for free.