#184 - OpenAI's Voice 2.0 + execs quitting, Llama 3.2, To CoT or not to CoT?

2024/10/2

Last Week in AI

Andrey Kurenkov

Jon Krohn

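Andrey Kurenkov: OpenAI has rolled out Advanced Voice Mode with more voices and a new look; Meta's AI assistant can imitate celebrity voices; Google's Gemini voice mode is now available for free on Android.

Jon Krohn: Voice mode is the future of AI interaction because it is more flexible than text input; OpenAI and Meta differ in their AI voice assistant strategies, with OpenAI taking a closed-source approach while Meta focuses on open-source models; Gemini's real-time web search and its ability to process large files make it a valuable AI tool.

Andrey Kurenkov: Luma and Runway have launched AI video generation APIs, marking the commercialization of AI video generation technology; Microsoft's Copilot Wave 2 integrates AI into the Microsoft 365 apps to improve productivity; Perplexity AI has integrated OpenAI's o1 model for a search feature focused on reasoning.

Jon Krohn: OpenAI's o1 model significantly outperforms GPT-4 on complex math and programming problems; Meta's release of Llama 3.2 matters for the AI ecosystem because it is a large language model that competes with frontier models and can now process images; Alibaba has released Ovis 1.6, a new multimodal large language model.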

Deep Dive

Why is OpenAI rolling out Advanced Voice Mode with more voices and a new look?

OpenAI is enhancing the user experience by adding more voices and a new design, making the AI assistant more versatile and engaging. The update includes custom instructions and five new voices, increasing the total to nine. This feature is primarily available in the US and represents a significant step in the evolution of conversational AI, allowing for more natural and real-time interactions.

Why is Meta paying celebrities millions of dollars for their voices in AI chatbots?

Meta is investing in celebrity voices to make their AI chatbots more appealing and engaging, hoping to increase user interaction with AI features across their platforms like Instagram, WhatsApp, and Facebook. This move is part of their strategy to attract users and demonstrate the capabilities of their AI, particularly in the realm of voice-based interactions.

Why is Groq partnering with Aramco to build a massive data center in Saudi Arabia?

Groq, a chip startup, is partnering with Aramco to build a data center that will start with 19,000 language processing units and could grow to as many as 200,000. The partnership is aimed at providing significant AI infrastructure to Saudi Arabia, a country that maintains neutral relations between the West and other nations. The move lets Groq scale its services and lets Saudi Arabia advance its AI capabilities.

Why did OpenAI execs quit as the company removes control from the non-profit board and hands it to Sam Altman?

Several OpenAI executives, including the CTO, VP of Research, and Chief Research Officer, have quit following a reorganization that shifts control from the non-profit board to Sam Altman. This move towards a for-profit structure and the potential for commercial interests to influence governance may have led to these departures. The company is also facing challenges in maintaining its original mission of benefiting all humanity.

Why is o1, OpenAI's new model, particularly impressive in math and symbolic reasoning?

o1 is designed to handle complex tasks by using a chain-of-thought (CoT) approach, which allows it to break down problems and reason through them step by step. This capability is especially effective for math and symbolic reasoning, where it can achieve up to 50% better performance compared to GPT-4. The model's ability to spend more time on reasoning and planning makes it a significant advancement in these areas.

Why is Microsoft planning to power data centers using the Three Mile Island nuclear plant?

Microsoft has signed a 20-year power purchase agreement to reopen the Three Mile Island nuclear plant, now named Crane Clean Energy Center, to power its data centers. This move is part of their strategy to secure a reliable and sustainable energy source as AI models require significant power to operate. Nuclear energy, despite its past controversies, is seen as a stable and environmentally friendly option compared to fossil fuels.

Why did Governor Newsom sign bills to combat deepfake election content and protect digital likenesses of performers?

Governor Newsom signed several bills to address growing concerns about AI-generated deepfakes and their potential misuse in elections and the entertainment industry. AB2655 requires large platforms to remove or label deceptive election-related content, AB2839 expands the timeframe for prohibiting such content, and AB2355 mandates disclosure of AI-generated content in political ads. Separate bills, AB2602 and AB1836, protect the digital likenesses of performers. These laws aim to prevent misinformation and protect the rights of individuals.

Why is an AI tool like ChartWatch reducing unexpected hospital deaths by 26%?

ChartWatch, an AI early warning system, monitors changes in a patient's medical record and makes hourly predictions about potential deterioration. By alerting doctors and nurses to patients who need immediate intervention, it significantly reduces unexpected deaths. The system uses over 100 inputs, including vital signs and lab results, and has shown promising results in early trials. This technology could lead to more comprehensive health monitoring and timely interventions.

Why is Snapchat introducing an AI video generation tool for creators?

Snapchat is introducing an AI video generation tool to allow creators to generate videos from text prompts, enhancing content creation on the platform. This tool, powered by Snap's own foundational video model, is in beta and available to a small subset of creators. It aims to make video creation more accessible and creative, with plans to expand the feature in the future.

Why is Lionsgate partnering with Runway for AI-assisted film production?

Lionsgate is partnering with Runway to explore the use of AI in film production, particularly in pre-production and post-production stages. They aim to develop AI models that can create backgrounds and special effects, potentially reducing the need for traditional VFX crews and storyboard artists. This move is part of a broader trend of integrating AI into creative industries to streamline processes and enhance productivity.

Shownotes

Our 184th episode with a summary and discussion of last week's big AI news! With host Andrey Kurenkov and guest host Jon Krohn.

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form.

Email us your questions and feedback at [email protected] and/or [email protected]

In this episode:

OpenAI, Meta, and Google are enhancing their AI assistants with advanced voice modes, while Meta released Llama 3.2, an open-source model capable of processing both images and text.

Significant AI infrastructure developments include Groq's partnership with Aramco for a massive data center in Saudi Arabia, and Microsoft's plan to power data centers using a reopened Three Mile Island nuclear plant.

Recent research shows chain-of-thought prompting is most effective for math and symbolic reasoning, while OpenAI's o1 model is being integrated into Perplexity AI's search platform as a new 'Reasoning' focus.

AI is being rapidly integrated into various sectors, with examples including ChartWatch reducing unexpected hospital deaths, Snapchat and YouTube introducing AI video generation tools, and Lionsgate partnering with Runway for AI-assisted film production.

Timestamps + Links:

(00:00:00) Intro / Banter

(00:04:45) Response to listener comments / corrections

Tools & Apps

(00:07:46) OpenAI rolls out Advanced Voice Mode with more voices and a new look

(00:13:32) Meta’s AI can now talk to you in the voices of Awkwafina, John Cena, and Judi Dench

(00:17:11) Gemini’s chatty voice mode is out now for free on Android

(00:21:30) AI video rivalry intensifies as Luma announces Dream Machine API hours after Runway

(00:23:35) Copilot Wave 2 supercharges productivity with AI across all your Microsoft 365 apps

(00:25:56) Perplexity introduces new 'Reasoning' focus powered by OpenAI's o1

Applications & Business

(00:33:47) OpenAI Execs Mass Quit as Company Removes Control From Non-Profit Board and Hands It to Sam Altman

(00:41:46) Sam Altman departs OpenAI’s safety committee

(00:43:04) Chip Startup Groq Backs Saudi AI Ambitions With Aramco Deal

(00:46:29) Grok’s image generator, Black Forest Labs, is raising $100M at a $1B valuation, say sources

(00:48:05) Pudu unveils super semi-humanoid robot with 8-hour battery, 10kg lift power

(00:50:56) Amazon introduces Amelia, an AI assistant for third-party sellers

Projects & Open Source

(00:52:58) Meta Releases Llama 3.2—and Gives Its AI a Voice

(00:56:52) Alibaba Unveils Ovis 1.6 – A New Multimodal Language Model

Research & Advancements

(01:00:35) To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

(01:06:14) LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench

(01:10:15) Norwegian Startup 1X Unveils AI World Model for Robot Training

(01:12:44) AI tool cuts unexpected deaths in hospital by 26%, Canadian study finds

Policy & Safety

(01:15:47) Three Mile Island nuclear plant will reopen to power Microsoft data centers

(01:18:23) Governor Newsom signs bills to combat deepfake election content

(01:20:12) Governor Newsom signs bills to protect digital likeness of performers

(01:22:20) Startup behind “world’s first robot lawyer” to pay $193K for false ads, FTC says

Synthetic Media & Art

(01:24:49) Snap is introducing an AI video generation tool for creators

(01:25:57) YouTube Shorts to integrate Veo, Google’s AI video model

(01:26:56) Lionsgate Signs Deal With AI Company Runway, Hopes That AI Can Eliminate Storyboard Artists and VFX Crews

(01:28:01) Outro