LessWrong (Curated & Popular)

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you

Episodes

Total: 466

\[Linkpost\] “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman

10h ago

This is a link post. Summary: We propose measuring AI performance in terms of the length of tasks AI

“I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?” by shrimpy

19h ago

I have, over the last year, become fairly well-known in a small corner of the internet tangentially

“Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations” by Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer, Marius Hobbhahn

1d ago

10 chapters

Note: this is a research note based on observations from evaluating Claude Sonnet 3.7. We’re sharing

“Levels of Friction” by Zvi

2d ago

12 chapters

Scott Alexander famously warned us to Beware Trivial Inconveniences. When you make a thing easy to do

“Why White-Box Redteaming Makes Me Feel Weird” by Zygi Straznickas

2d ago

There's this popular trope in fiction about a character being mind controlled without losing aw

“Reducing LLM deception at scale with self-other overlap fine-tuning” by Marc Carauleanu, Diogo de Lucena, Gunnar\_Zarncke, Judd Rosenblatt, Mike Vaiana, Cameron Berg

3d ago

7 chapters

This research was conducted at AE Studio and supported by the AI Safety Grants programme administere

“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub

4d ago

10 chapters

We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives

“The Most Forbidden Technique” by Zvi

5d ago

9 chapters

The Most Forbidden Technique is training an AI using interpretability techniques. An AI produces a fi

“Trojan Sky” by Richard\_Ngo

6d ago

You learn the rules as soon as you’re old enough to speak. Don’t talk to jabberjays. You recite them

“OpenAI:” by Daniel Kokotajlo

2025/3/11

Exciting Update: OpenAI has released this blog post and paper which makes me very happy. It's b

“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis

2025/3/9

LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reportin

“So how well is Claude playing Pokémon?” by Julian Bradshaw

2025/3/9

4 chapters

Background: After the release of Claude 3.7 Sonnet,[1] an Anthropic employee started livestreaming C

“Methods for strong human germline engineering” by TsviBT

2025/3/7

Note: an audio narration is not available for this article. Please see the original text. The origi

“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth

2025/3/6

In a recent post, Cole Wyeth makes a bold claim: . . . there is one crucial test (yes this is a crux)

“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis

2025/3/6

3 chapters

This isn't really a "timeline", as such – I don't know the timings – but this is

“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard

2025/3/5

6 chapters

This is a critique of How to Make Superbabies on LessWrong. Disclaimer: I am not a geneticist[1], and

“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout

2025/3/4

This is a link post. Your AI's training data might make it more “evil” and more able to circumve

“Judgements: Merging Prediction & Evidence” by abramdemski

2025/3/1

4 chapters

I recently wrote about complete feedback, an idea which I think is quite important for AI safety. Ho

“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis

2025/2/26

4 chapters

First, let me quote my previous ancient post on the topic: Effective Strategies for Changing Public O

“Power Lies Trembling: a three-book review” by Richard\_Ngo

2025/2/26

4 chapters

In a previous book review I described exclusive nightclubs as the particle colliders of sociology—pl