A friend of mine recently recommended that I read through articles from the journal International Security
This is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've
Hi all. I've been hanging around the rationalist-sphere for many years now, mostly writing about
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write a
Ultimately, I don’t want to solve complex problems via laborious, complex thinking, if we can help it.
Once upon a time, in ye olden days of strange names and before google maps, seven friends needed to
Summary: In this post, we explore different ways of understanding and measuring malevolence and expl
I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to t
Over the past year and a half, I've had numerous conversations about the risks we describe in Gra
This is a link post. Full version on arXiv | X. Executive summary: AI risk scenarios usually portray
This post should not be taken as a polished recommendation to AI companies and instead should be treated
This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research.
I (and co-authors) recently put out "Alignment Faking in Large Language Models" where we s
Summary and Table of Contents: The goal of this post is to discuss the so-called “sharp left turn”, t
(Many of these ideas developed in conversation with Ryan Greenblatt.) In a shortform, I described some
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or o
This is the abstract and introduction of our new paper, with some discussion of implications for AI
The Cake: Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and e
This post offers an accessible model of the psychology of character-trained LLMs like Claude. Epistemic
This is a link post. This is a blog post reporting some preliminary work from the Anthropic Alignment