🎄ThursdAI - Dec19 - o1 vs gemini reasoning, VEO vs SORA, and holiday season full of AI surprises

2024/12/20
ThursdAI - The top AI news from the past week

For the full show notes and links visit https://sub.thursdai.news

🔗 Subscribe to our show on Spotify: https://thursdai.news/spotify

🔗 Apple: https://thursdai.news/apple

Ho, ho, holy moly, folks! Alex here, coming to you live from a world where AI updates are dropping faster than Santa down a chimney! 🎅 It's been another absolutely BANANAS week in the AI world, and if you thought last week was wild and that we were due for a break, buckle up, because this one's a freakin' rollercoaster! 🎢

In this episode of ThursdAI, we dive deep into the recent innovations from OpenAI, including their 1-800 ChatGPT phone service and new advancements in voice mode and API functionalities. We discuss the latest updates on o1 model capabilities, including Reasoning Effort settings, and highlight the introduction of WebRTC support by OpenAI. Additionally, we explore the groundbreaking Veo 2 model from Google, the generative physics engine Genesis, and new developments in open source models like Cohere's Command R 7B. We also provide practical insights on using tools like Weights & Biases for evaluating AI models and share tips on leveraging GitHub Gigi. Tune in for a comprehensive overview of the latest in AI technology and innovation.

00:00 Introduction and OpenAI's 12 Days of Releases

00:48 Advanced Voice Mode and Public Reactions

01:57 Celebrating Tech Innovations

02:24 Exciting New Features in AVMs

03:08 TLDR - ThursdAI December 19

12:58 Voice and Audio Innovations

14:29 AI Art, Diffusion, and 3D

16:51 Breaking News: Google Gemini 2.0

23:10 Meta Apollo 7b Revisited

33:44 Google's Sora and Veo2

34:12 Introduction to Veo2 and Sora

34:59 First Impressions of Veo2

35:49 Comparing Veo2 and Sora

37:09 Sora's Unique Features

38:03 Google's MVP Approach

43:07 OpenAI's Latest Releases

44:48 Exploring OpenAI's 1-800 ChatGPT

47:18 OpenAI's Fine-Tuning with DPO

48:15 OpenAI's Mini Dev Day Announcements

49:08 Evaluating OpenAI's O1 Model

54:39 Weights & Biases Evaluation Tool - Weave

01:03:52 ArcAGI and O1 Performance

01:06:47 Introduction and Technical Issues

01:06:51 Efforts on Desktop Apps

01:07:16 ChatGPT Desktop App Features

01:07:25 Working with Apps and Warp Integration

01:08:38 Programming with ChatGPT in IDEs

01:08:44 Discussion on Warp and Other Tools

01:10:37 GitHub GG Project

01:14:47 OpenAI Announcements and WebRTC

01:24:45 Modern BERT and Smaller Models

01:27:37 Genesis: Generative Physics Engine

01:33:12 Closing Remarks and Holiday Wishes

TL;DR - Show notes and Links

  • Open Source LLMs

    • Meta Apollo 7B – LMM w/ SOTA video understanding (Page, HF)

    • Microsoft Phi-4 – 14B SLM (Blog, Paper)

    • Cohere Command R 7B – (Blog)

    • Falcon 3 – series of models (X, HF, web)

    • IBM updates Granite 3.1 + embedding models (HF, Embedding)

  • Big CO LLMs + APIs

    • OpenAI releases new o1 + API access (X)

    • Microsoft makes Copilot Free! (X)

    • Google - Gemini Flash 2 Thinking experimental reasoning model (X, Studio)

  • This Week's Buzz

    • W&B Weave Playground now has Trials (and o1 compatibility) (try it)

    • Alex's evaluation of o1 and Gemini Thinking experimental (X, Colab, Dashboard)

  • Vision & Video

    • Google releases Veo 2 – SOTA text2video model - beating SORA by most vibes (X)

    • HunyuanVideo distilled with FastHunyuan down to 6 steps (HF)

    • Kling 1.6 (X)

  • Voice & Audio

    • OpenAI realtime audio improvements (docs)

    • 11labs new Flash 2.5 model – 75ms generation (X)

    • Nexa OmniAudio – 2.6B – multimodal local LLM (Blog)

    • Moonshine Web – real time speech recognition in the browser (X)

    • Sony MMAudio - open source video 2 audio model (Blog, Demo)

  • AI Art & Diffusion & 3D

    • Genesis – open source generative 3D physics engine (X, Site, Github)

  • Tools

    • CerebrasCoder – extremely fast app creation (Try It)

    • RepoPrompt to chat with o1 Pro – (download)

This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe