#202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors

2025/3/9

Last Week in AI

People

Andrey Kurenkov

Jeremie Harris

Topics

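@Andrey Kurenkov : I think GPT-4.5's improvements are not significant, which may indicate that scaling up model size through unsupervised learning alone is approaching a bottleneck. Future progress may need to shift toward models that place more emphasis on reasoning ability. @Jeremie Harris : Improvements in large language models are becoming harder to perceive and call for finer-grained evaluation methods. Pre-training compute alone no longer delivers significant performance gains; it has to be combined with inference compute to improve model capability. Pre-training is like studying and inference is like taking the exam: both need time and effort invested to get a good result.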

Our 202nd episode with a summary and discussion of last week's big AI news!
Recorded on 03/07/2025

Hosted by Andrey Kurenkov and Jeremie Harris.
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.

Join our Discord here! https://discord.gg/nTyezGSKwP

In this episode:

Alibaba released QwQ-32B, its latest reasoning model, on par with leading models like DeepSeek’s R1.
Anthropic raised $3.5 billion in a funding round, valuing the company at $61.5 billion, solidifying its position as a key competitor to OpenAI.
DeepMind introduced BigBench Extra Hard, a more challenging benchmark to evaluate the reasoning capabilities of large language models.
Reinforcement learning pioneers Andrew Barto and Rich Sutton were awarded the prestigious Turing Award for their contributions to the field.

Timestamps + Links:

(00:00:00) Intro / Banter
(00:01:41) Episode Preview
(00:02:50) GPT-4.5 Discussion
(00:14:13) Alibaba’s New QwQ 32B Model is as Good as DeepSeek-R1 ; Outperforms OpenAI’s o1-mini
(00:21:29) With Alexa Plus, Amazon finally reinvents its best product
(00:26:08) Another DeepSeek moment? General AI agent Manus shows ability to handle complex tasks
(00:29:14) Microsoft’s new Dragon Copilot is an AI assistant for healthcare
(00:32:24) Mistral’s new OCR API turns any PDF document into an AI-ready Markdown file
(00:33:19) A.I. Start-Up Anthropic Closes Deal That Values It at $61.5 Billion
(00:35:49) Nvidia-Backed CoreWeave Files for IPO, Shows Growing Revenue
(00:38:05) Waymo and Uber's Austin robotaxi expansion begins today
(00:38:54) UK competition watchdog drops Microsoft-OpenAI probe
(00:41:17) Scale AI announces multimillion-dollar defense deal, a major step in U.S. military automation
(00:44:43) DeepSeek Open Source Week: A Complete Summary
(00:45:25) DeepSeek AI Releases DualPipe: A Bidirectional Pipeline Parallelism Algorithm for Computation-Communication Overlap in V3/R1 Training
(00:53:00) Physical Intelligence open-sources Pi0 robotics foundation model
(00:54:23) BIG-Bench Extra Hard
(00:56:10) Cognitive Behaviors that Enable Self-Improving Reasoners
(01:01:49) The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
(01:05:32) Pioneers of Reinforcement Learning Win the Turing Award
(01:06:56) OpenAI launches $50M grant program to help fund academic research
(01:07:25) The Nuclear-Level Risk of Superintelligent AI
(01:13:34) METR’s GPT-4.5 pre-deployment evaluations
(01:17:16) Chinese buyers are getting Nvidia Blackwell chips despite US export controls