o3 - wow

2024/12/21

AI Explained Official Podcast


**o3 is one of the biggest developments in AI in 2+ years, not because it beats a particular benchmark, but because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, the benchmarks broken, and what comes next. Plus: the costs OpenAI didn’t want us to know about, Genesis, ARC-AGI 2, Gemini-Thinking, and much more.**

**FrontierMath:** https://epoch.ai/frontiermath

**FrontierMath Paper:** https://arxiv.org/pdf/2411.04872

**Chollet Statement:** https://arcprize.org/blog/oai-o3-pub-breakthrough

**MLC Paper:** https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social

**AlphaCode 2:** https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf

**Human Performance on ARC-AGI:** https://arxiv.org/pdf/2409.01374v1

**Wei Tweet ‘3 months’:** https://x.com/_jasonwei/status/1870184982007644614

**Deliberative Alignment Paper:** https://openai.com/index/deliberative-alignment/

**Brown Safety Tweet:** https://x.com/polynoamial/status/1870196476908834893

**SWE-bench Verified:** https://openai.com/index/introducing-swe-bench-verified/

**Amodei Prediction:** https://x.com/OfirPress/status/1858567863788769518

**David Dohan Tweet ‘16 hours’:** https://x.com/dmdohan/status/1870171404093796638

**OpenAI Personal Writing:** https://openai.com/index/learning-to-reason-with-llms/

**SimpleBench:** https://simple-bench.com/

**John Hallman Tweet:** https://x.com/johnohallman/status/1870233375681945725

Chapters

00:00 - Introduction

01:19 - What is o3?

03:18 - FrontierMath

05:15 - o4, o5

06:03 - GPQA

06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2

08:13 - 1st Caveat

09:03 - Compositionality?

10:16 - SimpleBench?

13:11 - ARC-AGI, Chollet