OpenAI announced the release of ChatGPT Pro for $200 per month, which includes access to the more capable o1 model. They also released Sora, which allows Pro users 50 generations per month at up to 1080p resolution. Additionally, Canvas was updated and integrated directly into GPT-4o, making it easier to use in custom GPTs.
Canvas is now integrated directly into GPT-4o, eliminating the need to manually select it for tasks. Users can enable Canvas for previously built custom GPTs, allowing side-by-side editing and inline use in advanced prompts, which enhances productivity and flexibility.
A study by DeepHealth and RadNet showed a 21% improvement in breast cancer detection when AI was used in mammogram screenings, demonstrating AI's potential to improve healthcare outcomes.
The Canadian government invested $240 million in Cohere's AI data center as part of a $2 billion sovereign AI compute strategy, aiming to advance national AI infrastructure and economic development.
Google's quantum chip Willow can reduce errors exponentially as it scales up and performed a benchmark calculation in under five minutes that would take today's fastest supercomputers 10 septillion years, potentially revolutionizing computational science, with Google even suggesting the result hints at the existence of the multiverse.
YouTube's auto-dubbing feature converts audio into multiple languages for knowledge-focused content, benefiting creators by automating the process and expanding their reach to global audiences without the need for manual translation.
NAMs optimize how transformers store and retrieve information by selectively pruning memory, similar to human memory processes. This enhances efficiency and performance across various tasks, enabling superior results with less memory and cross-domain mastery without additional training.
Home care robots need to navigate complex environments, handle unexpected pop-ups, and prioritize tasks effectively, similar to human caregivers. They must also gain acceptance and understanding from the public to fulfill the increasing demand for elderly care.
Gemini 2.0 introduces multimodal capabilities, allowing AI agents to process text, images, and audio together. It also features tool use, function calling, and memory, enabling real-time applications like tutoring and spatial computing, which could revolutionize industries like education and robotics.
Stainless' SDKs simplify the development process for platforms like OpenAI, Anthropic, and Meta by providing cohesive software development kits. This reduces the need for developers to navigate extensive documentation, accelerating the creation of AI-powered applications.
https://www.thedailyaishow.com On today's Daily AI Show, Brian, Beth, Karl, and Jyunmi gather to discuss the latest developments in AI, focusing on OpenAI's recent announcements as part of its updated, innovative 12 Days event, AI-driven healthcare innovation, and the Canadian government's major investment in an AI data center. They also cover the evolution of YouTube's auto-dubbing feature and Google's breakthrough in quantum computing. Key points of the discussion:
AI in Healthcare: The discussion covers a significant study from DeepHealth and RadNet showing a 21% improvement in breast cancer detection when AI is used in mammogram screenings, demonstrating AI's potential to improve healthcare outcomes.
Quantum Computing: Google's new quantum chip Willow is highlighted for its potential to transform computing by dramatically reducing errors and performing calculations at speeds today's supercomputers cannot reach, hinting at new possibilities for computational science.
Canadian AI Initiative: Karl shares insights on Canada's national effort to advance AI infrastructure, particularly the government's investment in Cohere's ambitious AI data center project, underscoring AI's strategic role in national economic and technological development.
YouTube Auto-Dubbing: The team explores YouTube's rollout of auto-dubbing for knowledge-focused content, discussing what it means for content accessibility across languages and the potential benefits for AI-focused shows like theirs.
Episode Timeline:
00:00:28 👋 Intro and OpenAI Day Recap
00:01:47 🤔 Pressing News Topics
00:02:11 💻 12 Days of OpenAI: The First Four Announcements
00:06:14 🎨 Canvas in Custom GPTs
00:07:26 ✨ Anticipation for OpenAI's Remaining Days
00:07:53 🏥 AI-Enhanced Mammogram Study
00:10:09 🇨🇦 Canada's AI Data Center Investment
00:14:32 🏞️ Data Centers and Nature
00:17:04 🤔 Data Center Locations and Infrastructure
00:17:41 🌐 YouTube's Auto-Dubbing Feature
00:20:09 🗣️ Auto-Dubbing Demo
00:22:55 🌍 Language Barriers and Global Content
00:24:24 🧠 Evolved Universal Transformer Memory
00:28:44 🤖 Impact of NAMs on Robotics
00:30:14 🏠 Home Care Robots for the Elderly
00:31:49 💻 Google's Quantum Chip Willow
00:34:11 🤔 Quantum Computing and the Multiverse
00:37:14 ✨ Google's Gemini AI Agents
00:42:04 🤖 Practical Applications of Gemini
00:44:23 🖱️ Gemini's Multimodal Capabilities
00:46:01 ⚙️ Stainless SDKs for AI Platforms
00:49:35 💻 Automattic Acquires WPAI, Humane's Cosmos OS
00:54:01 👋 Conclusion and Newsletter Signup
LLMs mess up a lot with, you know, "Jan is seven years old and his brother Alex is twice his age. How old is Alex?" Well, the answer is 14, right? Or: how old is Alex? Alex is seven, because his brother is twice his age. Selective memory knows to keep the right answer and eliminate the wrong one. Because what it's saying is, and we all know this, when you try to hold everything in memory, it confuses the
heck out of the language models. We just talked about this yesterday with the whole idea of the traffic lights. If you put everything in there, it gets awfully confusing to figure out what it should be looking at and what it shouldn't. So the answer isn't everything always in memory. The answer is selective pruning of memory, much more like humans do it. We tend to connect and fire the neurons on things that we need to keep for long-term memory, like how to chew and not stab ourselves in the face with our fork, right, Beth? Yeah.
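(A rough, purely illustrative sketch of that idea, not Sakana's actual implementation: the scoring rule, keep fraction, and function name below are made up just to show what "keep what recent queries actually attended to, prune the rest" could look like in code.)

```python
# Toy illustration only: not Sakana's NAM implementation. The scoring rule
# (mean attention received) and the keep-fraction are invented here to show
# the general idea of pruning a transformer's cached tokens.
import numpy as np

def prune_memory(kv_cache, attention, keep_fraction=0.25):
    """Keep only the cached tokens that recent queries attended to most.

    kv_cache:  list of cached items (one per past token).
    attention: (num_recent_queries, num_cached_tokens) attention weights.
    """
    scores = attention.mean(axis=0)                 # how useful each cached token was
    keep_n = max(1, int(len(kv_cache) * keep_fraction))
    keep_idx = np.argsort(scores)[-keep_n:]         # indices of the most-attended tokens
    keep_idx.sort()                                 # preserve original ordering
    return [kv_cache[i] for i in keep_idx]

# Example: 8 cached tokens, 3 recent queries; prune down to the 2 most useful.
cache = [f"token_{i}" for i in range(8)]
attn = np.random.rand(3, 8)
print(prune_memory(cache, attn, keep_fraction=0.25))
```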
We just talked about it as an inside thing with me and Beth the other day. And then also things that only need to be remembered for a short period of time, you know? So I think it's really, really exciting. And obviously, yes, I'm so glad you brought it up, Beth. My guess was Jimmy was going to do it, but maybe Jimmy didn't...
Just beat me to it, basically. Yeah, I've always been impressed with the Sakana folks. Yes, they're small and quiet and then drop these huge movements in the space. I think this right here, these NAMs, is going to be...
That silent background thing that's just going to enable, you know, huge leaps forward into embodied AI, right? So if your robot no longer has to remember every possible thing that's around it, if it looks at things more like a human does, like this has priority, or this is what's important here, that ability to discern is huge. Because we've talked about it before: for LLMs,
there isn't this curation process, right? And so this is that step towards that curation process. So if we can see this in other applications, or this is the basis for doing that kind of curation, then we're going to see some, like,
sci-fi, futuristic jumps. It's like, oh yeah, this is my humanoid home care robot that knows that these pills are important for me, and not this other frivolous thing. The donut's not important; the pills are important to take every day. Those kinds of things. Which leads me into my next science story,
where Chiba University, we seem to have a Japan streak going on right now. So Chiba University is doing a study on how to empower older adults with home care robots. According to their estimates, by 2030, one in every six individuals will be over 60 years old.
That's a sixth of the entire population. That's well over a billion people, right? And with that, home care is going to increase. The need for home care is going to increase. And we don't have enough people going into home care or elderly care to fill that gap.
So they're thinking robots are going to fill this, that embodied AI is going to fill this role, right? And we've touched upon that here and there. But right now, there isn't that acceptance or information, that understanding and acceptance of what that's going to look like. So their big study aims to find out how they can alleviate that and then create applied solutions for it. So I thought that was interesting news. All right.
I got one because G brought it up, but it's one I would want to bring up anyway. He was saying, did... I'm sorry, this is Justin, I apologize. Justin was saying, anyone catch the news on Google's new quantum chip Willow? And Justin, and everybody else, you might enjoy this. So the first thing I saw pop up on YouTube was one of their engineers or somebody, and I'll call him a kid because he looked young.
Obviously, he's working on quantum computers. But he did this presentation of it, and I watched all six minutes of it, and I was like, I barely understand anything
he just said. And I was like, wow, that wasn't just over my head, it was flying right past me. I was like, okay, what are we trying to say here? So I had to do some other research on it since then. So Justin, I'm glad you brought it up. But this is straight from Google. It says: "Today I'm delighted to announce Willow, our latest quantum chip, which has state-of-the-art performance across a number of metrics, enabling two major achievements." So this is what I really wanted to share, because this is what I was missing. The first is that Willow can reduce errors exponentially as we scale up using more qubits.
This cracks a key challenge in quantum error correction that the field has pursued for almost 30 years. So that's number one. Second, Willow performed a standard benchmark computation in under five minutes that would have taken one of today's fastest supercomputers 10 septillion years, a number that vastly exceeds the age of the universe. Yeah.
So they built a computer that could solve a problem in five minutes that would take today's fastest supercomputers longer than the age of the universe. So just a small improvement in speed there, just a tiny one. So yeah, there you go. Those are the two major things they felt like they cracked. It is a huge announcement. I think it was
sort of announced during this 12 Days of OpenAI stretch, and, you know, that's gotten a lot of the news. But it is a really, really big deal. How it's deployed, who uses it, who this makes the biggest impact for, I'm not quite sure. But very, very interesting technology. As I watched this kid explain it, with the plus and minus signs and the whole thing, I was like, that looks awesome. I don't know what I'm looking at.
Right? And one of the hypotheses about why and how it could do that many calculations is that it was doing calculations in the multiverse, in alternative, in other universes. That was like the last line of the post, too. Yeah. Really? Yeah, it was just like a throwaway. Yeah, so I was watching, was it Nate?
Nate Johnson? Jones. Jones. Sorry, Nate Jones. Sorry about that, Nate. So I was watching, because he was covering this bit of news after Karl dropped it in our back channel. And
Yeah, the throwaway piece of that is that Google believes it confirms the existence of the multiverse, with this one line, just a throwaway line about, oh, we can do this because of the multiverse. I'm just like, wow, just casually dropping that there, Google. You know the thing about timelines? Actually a thing. Like...
I'm surprised he didn't say like, "Who's seen Loki?" -Right. -Exactly. That's what we're talking about. That's what they should name the-- I know they called it Willow, but they should definitely name the first data center, Quantum Data Center Loki, just for us popular IP folks. Or what was that, Back to the Future, where they show the alternative?
Is that back to the future? Oh, how it breaks. Yeah. It breaks, right? That's where I remember that from as a kid. No, Willow is obviously very, very cool. It is a big announcement from Google, and it just goes to show just how big Alphabet is, Google's parent company, and how many different pieces and parts and
fingers they have in different parts of technology that this is just one of them. Not to mention, we just mentioned Google earlier, but we called it YouTube, right? They own YouTube too. And like what they just did there. So it's really, really amazing. And it's fun that it's not a straight up Gemini story. Whether that advanced technology is being used on the backend, it's cool to like see other really, really cool advancements here that Google is coming out with. And this is nothing new. They've been doing this for decades now, but
This is a cool one. It'll be interesting to see where it goes. The Salty One in our comments says this means the end of our passwords, that our passwords will be absolutely useless. Well, listen, Salty One, I've heard that for other reasons too, with passwords
and how they're done and things like that. So yeah, I think there's going to be a big reckoning here very, very soon on how we secure data, because doing it with a 20-digit hexadecimal password is no longer going to be good enough if we can solve
problems in five minutes that would take longer than the age of the universe on a supercomputer. That certainly seems like it could do a lot in the multiverse. So yeah, I have another one, if I can share one more here. Yeah, can I break in? Can I do breaking news? Yeah, do it. Oh, go for it. Yeah. So I think Gemini Flash, Gemini 2.0, just dropped. And I really want to show you this.
They presented these four demos and it's not a slick demo. It feels like a legit demo.
So they have one on building AI agents. If you all want to watch that. Yeah, we do. I love that. I just said it wasn't about Gemini. You're like breaking news. The multiverse, Brian, the multiverse. Logan, I want to say Fitzpatrick, but that's not it.
I think, or maybe that is it. - I wanted to say Logan Paul, but I'm like, no, it's not. - I was like, no, not Logan Paul. - On one of the OpenAI drops, he just left a tiny comment that said, things change in a flash, right? It's like, oh, okay, here we go. - Yeah, this is them building AI agents with it.
Hey, this is a quick demo of multimodal live streaming in Gemini 2.0, and we are casting a tab into AI Studio. Can you see my screen? Yes, I see your screen. It's showing a document with the title "Demo." It also contains bullet points and a Google Meet window with a person in the video. Is there anything in particular you would like to talk about on the screen? Yeah, can you start the demo by just reading the text I highlighted?
Sure, I can do that. The text you highlighted reads, "The Multimodal Live API lets you build real-time multimodal applications powered by Gemini 2.0 Flash. It features tool use like Google search and code execution, function calling, and more." Would you like me to read the following lines too? No, can you actually just tell me what this word means though?
I can do that. Multimodal refers to the ability to process and understand different types of data, like text, images, and audio, all together. Does that help? Yeah, um, I need to demo the interruptions feature next. Can you tell me a really boring, kind of once upon a time sort of story?
Okay, I can do that. Once upon a time, in a small village nestled beside a quiet forest, there was a very unremarkable rock. This rock was gray and it was round and it cracked their- Sorry, I didn't mean to interrupt you. No problem. I understand that interruptions are part of the demo. What should I do now? Um, so we're gonna demo memory. Can you just summarize everything we've seen or heard so far?
I can do that. So far, we've discussed a demo of the Multimodal Live API, which lets you build real-time multimodal apps, and I defined what multimodal means. I was also asked to tell a very boring once-upon-a-time story, but I was interrupted. Is there anything else? No, I think we're done. Do you want to read this ending card, or should I?
I can read it if you would like. The ending card says, start building with Gemini 2.0 at aistudio.google.com. Is that correct? Yeah, that was great. That's it. There's a bunch of other videos there. On the one I'm really curious about is...
is spatial. They had a really interesting piece there on spatial computing, on spatial visuals or whatever. So I want to watch that video. And then, for voice, from what I was reading, I think it's still text-to-speech. So I don't think it's the same as the real-time API, but it did say that it could recognize how you were saying it. So I'm curious to see how that is, because text-to-speech is...
what does that mean? Is it still the old way, where your speech gets transcribed, then that gets processed, and there are three things that have to happen? I'm curious to see what that means. And I think Gemini was at 1206. 1206 has been in the
arena battle there for a bit. And so I'm curious to see what people... The build potential with this is very, very interesting too. I'm reading this blog as you were showing that, from, yeah,
I don't know who they are, but that's just the one I pulled up here. It says real-world applications, so this is sort of helpful: agentic experiences powered by Gemini 2.0 are poised to transform industries. And it says, for education, AI agents can tutor students using dynamic multimodal...
So yeah, really, really interesting. Not surprising at all to see them
you know, announcing this right now during the 12 Days of OpenAI. We've seen this before. You know, maybe Google always had this on their docket. Maybe they're strategically aligning this to come out just as OpenAI is pushing all these new things like Sora and Canvas and things like that. Pull the wind out of some sails. Yeah, I mean, this is nothing new, and we've seen this before. And I think it's smart. I think it's actually good for everybody. And it's just showing how much
progress is going on as we end this year. I mean, we're just getting hit after hit after hit between Runway and Google and Sakana and OpenAI and Meta getting in there and everything else. Grok 3 is coming, right? Grok 3 is everybody's... Wasn't Grok 3
trained using the hundred-thousand-something? That's the one he talked about, yeah. That was the one, right? The last thing I want to point out is the coding assistant they announced. They have Jules, your AI-powered code agent. So
It works within your GitHub workflow, handles bug fixes, time-consuming tasks, whatever it is when you're coding. So that's very interesting in addition to some of these other things that you can do. So I'm curious to see. I definitely want to start seeing if I can integrate some of the custom builds we have with Gemini 2.0 to kind of maybe see some native different multimodal aspects of this. So that would be interesting.
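(For anyone wanting to experiment with that kind of integration, here's a minimal, hedged sketch of Gemini function calling, assuming the google-generativeai Python SDK and its automatic function calling; the model name string and the lookup_reservation helper are illustrative placeholders, not anything from Google's demos.)

```python
# Hedged sketch: assumes the google-generativeai Python SDK. The model name and
# the lookup_reservation helper are illustrative placeholders, not Google's demo code.
import google.generativeai as genai

def lookup_reservation(restaurant: str, party_size: int) -> dict:
    """Hypothetical tool the model can call while working on a task."""
    return {"restaurant": restaurant, "party_size": party_size, "available": True}

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",   # placeholder for whatever 2.0 Flash is exposed as
    tools=[lookup_reservation],          # plain Python functions become callable tools
)

# Automatic function calling lets the SDK run the tool and feed the result back.
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message("Can you check if there's a table for two at Bella's tonight?")
print(reply.text)
```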
So that circling of "multimodal" without saying it, just having the cursor go around it and having a conversation about it, was fascinating, because my understanding is that the way Gemini's multimodal works is it divides things up into screenshots.
And if you just do a screenshot, you're just pointing, right? So you have to be processing the sequence of the screenshots to be able to get that. And that was, wow. It was highlighted though. The word was highlighted. So I could see how that would be pulled into a
OK. Look, I'm not-- and I'm not saying this isn't impressive, right? I mean, you know, it's kind of silly, you know, to be like, well, it's not that impressive. Yeah, of course it's impressive. But just to point out, like, it was highlighted. So depending on how many frames are being captured,
And I would imagine, Jimmy, right, this would be eventually like movies where it's at least 30 frames per second. Yeah, they'll be able to capture, like, as soon as they hit 30, well, as soon as they hit 24, because I think right now the top end's like 16 a second. Right, that's motion. That's, yeah. Yeah.
So if you can hit 24, you'll capture the vast majority of the content that's out there. At 30, you're talking about 90% or more. And then for the stuff from the last few years, that's 60-plus, you've got everything. Pretty much everything on YouTube is going to be 30 or less. And that comes back to selective memory now too, right? The NAMs. Oh, yeah.
And then you go, okay, this starts to make sense. Even if it's capturing 30 frames per second, if it's just as fast at pruning the 24 frames out of that that don't matter anymore and deleting them, because they don't make sense... Well, there, you could watch my screen, and if I had a true agent and we were working together, there are going to be plenty of times where it's capturing no motion on the screen, because there's a process or thinking going on at that time, and it's irrelevant.
If you just capture everything and you have to keep everything in memory, that's going to slow down the processing speed. You're going to have way more data in there than you need to have. So you can see how these different technologies start to play with each other, because yes, we can get to a point where hopefully Gemini and others are capturing what's on your screen at 30 frames per second. But it's what you do with that data, and how you decide what's most relevant, specifically
to the task. Just comparing them to each other and being able to, you know, cherry-pick what's important. Yeah. That's going to be huge. Think about an agent trying to work past a pop-up on a screen. Like, oh, you have an agent that's supposed to go get you a dinner reservation or something. Great,
but it hits a pop-up and it totally gets confused because it's taking up three quarters of the screen. Okay, that's irrelevant to the task. We know that as humans: click the X out of that, move on with what we're doing. It's annoying, get rid of it. This video just took up the whole screen? Minimize it. For an agent, that's not so easy. So it'll be interesting to see how all this starts to play together, because that's what it's going to take for true agentic workflows. They're going to have to get past the things that are blockers. True navigation. Yeah, totally.
Like pop-ups, and then that leads to captchas and things like that. Because as soon as the agent can literally replicate all of your movement and interpret the screen like you would, then it's free to do whatever it really wants at that point. Or whatever you want at that point. Yeah, exactly.
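(To make the "capture lots of frames but only keep the ones that matter" idea above concrete, here's a toy sketch of frame pruning by pixel change; it's purely illustrative, not how Gemini or any shipping agent actually does it.)

```python
# Toy illustration: drop screen-capture frames that barely changed, so an agent
# only has to remember the moments where something actually happened on screen.
import numpy as np

def prune_static_frames(frames, threshold=0.02):
    """frames: list of (H, W) grayscale arrays in [0, 1]. Returns the kept frames."""
    kept = [frames[0]]                               # always keep the first frame
    for frame in frames[1:]:
        change = np.abs(frame - kept[-1]).mean()     # average pixel difference
        if change > threshold:                       # enough motion to matter
            kept.append(frame)
    return kept

# Example: 30 identical frames plus one with a "pop-up" region -> only 2 survive.
frames = [np.zeros((10, 10)) for _ in range(30)]
popup = np.zeros((10, 10))
popup[2:8, 2:8] = 1.0
frames.append(popup)
print(len(prune_static_frames(frames)))  # 2
```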
There's also one more thing they're rolling out today: Gemini Deep Research mode, releasing
to Advanced users. So I think it's their answer to o1, their reasoning mode. I'm not sure, because Sundar Pichai just put it out, like 17 minutes ago. So I think if you really want to try it, you've got to head over to AI Studio. There's a really cool spatial demo that I'm
kind of trying to figure out. But if you look at some of these capabilities... and this is Flash, Gemini 2.0 Flash, right? So like when... The fastest and smallest model, right? Right, and it's like 2.5 cents a million tokens, or 25 cents a million tokens. It's better than Gemini 1.5 Pro. So I think, and Dennis saw this just
posted too. It's like, this is the first of our 2.0 models. So get ready for the next level of models. Which is, you know, what's crazy, because there are heavy, heavy rumors that ChatGPT 4.5 or whatever is coming out during the 12 days. So
this battle is insane within this next week or two. Grok... I was like, Meta, where's Llama? What is it, four? Whatever it is, it's just one-on-one-on-one. Llama drops 3.3. 3.3, but that's the last... They say forthcoming. So now it's like, hey, Anthropic, where is... whatever you have. But...
Well, I know we're up against time, so I just wanted to wrap up with a couple of quick hits. And the main one is Stainless, this company called Stainless. What they're creating, specifically for companies or APIs like OpenAI, Anthropic, and Meta, is essentially a method for developing SDKs, or software development kits,
to develop using OpenAI, Anthropic and Meta and various others. So that's been one of the gaps, right? Here's an API, go for it guys. Here's our basic development documentation. But there isn't...
There hasn't really been a cohesive SDK that will help developers or guide developers through doing a lot of the low-hanging fruit, mundane tasks, the interconnections and things like that.
And so that forces developers to go through a huge amount of documentation, learning everything about the API access and everything like that. But SDKs allow you to sort of shortcut that and get your development going, and then you can review and things like that.
That's one of the benefits of having all the code helpers and those kinds of things. So this company, Stainless, is coming out with their product where they create SDKs for all of those services. On the development side, I think that's pretty big in terms of getting more people and developers building for those platforms, things like that.
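(To illustrate the gap an SDK closes, here's the same chat request made against the raw OpenAI HTTP endpoint and then through the official OpenAI Python SDK; both calls are real, but the comparison is just a sketch of the convenience a generated SDK provides, not a Stainless product itself.)

```python
# Sketch: the same chat request made two ways. The raw call forces you to know
# endpoint paths, headers, and payload shapes; the SDK call hides that plumbing.
import os
import requests
from openai import OpenAI

API_KEY = os.environ["OPENAI_API_KEY"]

# 1) Raw HTTP: you own the URL, auth header, and JSON schema yourself.
raw = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]},
)
print(raw.json()["choices"][0]["message"]["content"])

# 2) Through the SDK: typed methods, retries, and pagination come for free.
client = OpenAI(api_key=API_KEY)
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```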
A couple of other quick ones I think will also be useful for us in the future. The first is Automattic,
which is the parent company and owner of WordPress. They've got WordPress.com, WordPress.org, and a bunch of other things, including Tumblr. But they just purchased a company called WPAI. And WPAI is a company that builds AI tools and products for WordPress.
So that is going to be a huge influx of AI-powered tools and AI-powered building tools for the WordPress environment, which is pretty much the largest GUI-based website-hosting and website-building
software that's out there. Most websites are either on WordPress or one of the other platforms, but WordPress has been around for so long, and there have been so many integrations. All of your best blogs run on WordPress, for the most part. And so with this addition of WPAI to the family, the Automattic family, I've got high hopes that we're going to see
more AI tools just be automatically added to the base WordPress. And that'll be a huge get for creators and web developers and things like that.
And finally, Humane has come out with, or they've announced, their Cosmos OS. If you remember, Humane is supposed to be that AI layer, essentially your orchestrator or conductor AI that will conduct all the other AIs you have in your life.
And their big slick video they came out with essentially says, "Humane is going to live on all of your devices, every single device, and it's going to orchestrate all of the AIs that live on those devices for you. And that will be your interface layer." And that's their big push is to have that AIOS essentially.
All right. So those are my last quick hits. Does anybody have anything else?
No? Okay. So we'll wrap it up there. Thank you, everyone, for joining us. It has been some intense news, great stuff from around the world. Karl's always giving us the latest and greatest, late-breaking. And yeah, again, thank you, everybody, for joining us. Also, remember, we're going to be on again in a couple of hours, because it's day five of the 12 Days of OpenAI. And as always...
Sign up for the newsletter at thedailyaishow.com to always get more in-depth insights and information about what we talk about during the week. And that comes out every Sunday. All right. Thanks, everybody. Bye.
Bye.</raw_text>