
#202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors

2025/3/9

Last Week in AI

People
Andrey Kurenkov
Jeremie Harris
Topics
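Andrey Kurenkov: I think GPT-4.5's improvements are not significant, which may indicate that scaling up models through unsupervised learning alone is approaching a bottleneck. Future development may need to shift toward models that place more emphasis on reasoning ability.

Jeremie Harris: Improvements in large language models are becoming harder to perceive and call for finer-grained evaluation methods. Pretraining compute alone no longer delivers significant performance gains; it needs to be combined with inference compute to improve model capability. Pretraining is like studying and inference is like taking the exam: both require time and effort to get good results.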


Our 202nd episode with a summary and discussion of last week's big AI news! Recorded on 03/07/2025

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.

Join our Discord here! https://discord.gg/nTyezGSKwP

In this episode:

  • Alibaba released QwQ-32B, its latest reasoning model, on par with leading models like DeepSeek's R1.

  • Anthropic raised $3.5 billion in a funding round, valuing the company at $61.5 billion, solidifying its position as a key competitor to OpenAI.

  • DeepMind introduced BIG-Bench Extra Hard, a more challenging benchmark to evaluate the reasoning capabilities of large language models.

  • Reinforcement learning pioneers Andrew Barto and Rich Sutton were awarded the prestigious Turing Award for their contributions to the field.

Timestamps + Links:


(00:00:00) Intro / Banter

(00:01:41) Episode Preview

(00:02:50) GPT-4.5 Discussion

(00:14:13) Alibaba’s New QwQ 32B Model is as Good as DeepSeek-R1; Outperforms OpenAI’s o1-mini

(00:21:29) With Alexa Plus, Amazon finally reinvents its best product

(00:26:08) Another DeepSeek moment? General AI agent Manus shows ability to handle complex tasks

(00:29:14) Microsoft’s new Dragon Copilot is an AI assistant for healthcare

(00:32:24) Mistral’s new OCR API turns any PDF document into an AI-ready Markdown file

(00:33:19) A.I. Start-Up Anthropic Closes Deal That Values It at $61.5 Billion

(00:35:49) Nvidia-Backed CoreWeave Files for IPO, Shows Growing Revenue

(00:38:05) Waymo and Uber's Austin robotaxi expansion begins today

(00:38:54) UK competition watchdog drops Microsoft-OpenAI probe

(00:41:17) Scale AI announces multimillion-dollar defense deal, a major step in U.S. military automation

(00:44:43) DeepSeek Open Source Week: A Complete Summary

(00:45:25) DeepSeek AI Releases DualPipe: A Bidirectional Pipeline Parallelism Algorithm for Computation-Communication Overlap in V3/R1 Training

(00:53:00) Physical Intelligence open-sources Pi0 robotics foundation model

(00:54:23) BIG-Bench Extra Hard

(00:56:10) Cognitive Behaviors that Enable Self-Improving Reasoners

(01:01:49) The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

(01:05:32) Pioneers of Reinforcement Learning Win the Turing Award

(01:06:56) OpenAI launches $50M grant program to help fund academic research

(01:07:25) The Nuclear-Level Risk of Superintelligent AI

(01:13:34) METR’s GPT-4.5 pre-deployment evaluations

(01:17:16) Chinese buyers are getting Nvidia Blackwell chips despite US export controls