LessWrong (Curated & Popular)

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you

Episodes

Total: 438

A friend of mine recently recommended that I read through articles from the journal International Se

This is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I

Hi all, I've been hanging around the rationalist-sphere for many years now, mostly writing about

This is the full text of a post from "The Obsolete Newsletter," a Substack that I write a

Ultimately, I don’t want to solve complex problems via laborious, complex thinking, if we can help i

Once upon a time, in ye olden days of strange names and before google maps, seven friends needed to

Summary: In this post, we explore different ways of understanding and measuring malevolence and expl

I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to t

Over the past year and a half, I've had numerous conversations about the risks we describe in Gra

This post should not be taken as a polished recommendation to AI companies and instead should be tre

This is a personal post and does not necessarily reflect the opinion of other members of Apollo Rese

I (and co-authors) recently put out "Alignment Faking in Large Language Models" where we s

Summary and Table of Contents. The goal of this post is to discuss the so-called “sharp left turn”, t

(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some

“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or o

This is the abstract and introduction of our new paper, with some discussion of implications for AI

The Cake. Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and e

This post offers an accessible model of psychology of character-trained LLMs like Claude. Epistemic

This is a link post. This is a blog post reporting some preliminary work from the Anthropic Alignment