
#193 - Sora release, Gemini 2, OpenAI's AGI Rule, US AI Czar

2024/12/23

Last Week in AI

People
Andrey Kurenkov
Jeremie Harris
Topics
@Andrey Kurenkov : This episode covers many corners of the AI field, including OpenAI's release of the Sora text-to-video model and Google's release of the Gemini 2 multimodal model. Gemini 2 outperforms its predecessor and integrates tool use, reflecting the broader trend toward AI agents. The episode also discusses the data center construction boom and the US-China competition in AI technology. @Jeremie Harris : On AI safety, we discussed how erring on the side of caution can itself lead to negative outcomes, as well as concerns about the risks of the Chinese Communist Party exploiting AI. We need to weigh the benefits and risks of AI and take steps to address potential challenges. We also discussed OpenAI trying to eliminate the Microsoft AGI rule and GM halting funding for Cruise's robotaxi development, events that reflect how fierce commercial competition in AI has become. On the research side, we discussed several important papers, including a new method for reasoning in a continuous latent space and an improved universal Transformer memory model, work that should help improve the performance and efficiency of AI models. Finally, we discussed AI policy and safety issues, including Character.AI strengthening teen safety measures and the White House establishing a task force on AI data center infrastructure, steps aimed at addressing the ethical and safety challenges AI brings. Jeremie Harris: Acting too cautiously can lead to negative outcomes; for example, Greenpeace and anti-GMO groups' opposition to golden rice had harmful consequences. History offers many examples where acting too early, or being too cautious, caused harm. We need to weigh the benefits and risks of AI and take steps to address potential challenges. Personally, I think the world is better off with the US leading in AI, and we need to address the risk of the Chinese Communist Party exploiting advanced AI. The concern about China is aimed at the CCP, not at Chinese people in general, since a great deal of AI research also comes out of China. On AI agents, I think their development has shifted from conceptual research to an engineering challenge, and AI agents will see widespread adoption in the next few years.

Deep Dive

Key Insights

Why did OpenAI release Sora, and what are its key features?

OpenAI released Sora, a text-to-video AI model, to provide a consumer-grade tool for generating videos from text. Key features include a user-friendly website with advanced tools like a timeline for video editing, an explore page for community-generated videos, and subscription tiers offering different levels of video generation capabilities, such as resolution and duration.

What are the main advancements in Google's Gemini 2?

Google's Gemini 2 includes a faster and more capable model, Gemini 2.0 Flash, which outperforms its predecessor and supports multimodal inputs and outputs. It also integrates tool use, such as Google Search and code execution, and introduces AI agents like Project Mariner for browser control and Jules for coding assistance.

Why is OpenAI aiming to eliminate the Microsoft AGI rule?

OpenAI is aiming to eliminate the Microsoft AGI rule to allow Microsoft and other commercial partners access to future AGI technology, which is currently restricted. This move is driven by the need for continued scaling and investment, as the current structure limits OpenAI's ability to attract fresh capital and maintain its competitive edge in the AI race.

What led GM to halt funding for Cruise's robotaxi development?

GM halted funding for Cruise's robotaxi development due to a series of incidents, including a major accident and poor communication with regulators, which led to a suspension of testing in San Francisco. The decision reflects GM's shift in strategy to focus on integrating Cruise's technology into its own vehicles rather than competing directly with Waymo and Tesla in the robotaxi market.

What is the significance of the largest AI data center being built in Alberta?

The largest AI data center in the world, called Wonder Valley, is being built in Alberta, Canada, at a cost of $70 billion. It will eventually provide 7.5 gigawatts of power, with roughly one additional gigawatt, enough to power about 1 million homes, brought online each year, and it is strategically important for AI infrastructure due to Alberta's natural gas resources, cold climate for cooling, and existing pipeline infrastructure.

How does the new reasoning paradigm in the Coconut paper differ from traditional chain of thought?

The Coconut paper introduces a new reasoning paradigm where the model feeds its hidden state (a continuous representation of thought) back into itself instead of decoding it into text. This allows the model to explore multiple reasoning paths simultaneously, reducing the computational cost of decoding and improving performance in logical reasoning tasks.
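As a rough illustration of that difference, here is a minimal PyTorch sketch of the idea (a hypothetical toy model and dimensions, not the paper's actual code): instead of decoding each reasoning step into a token and re-embedding it, the last hidden state is appended directly as the next input.

```python
import torch
import torch.nn as nn

d_model, vocab = 256, 1000
embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(d_model, vocab)

prompt = torch.randint(0, vocab, (1, 8))      # toy prompt token ids
x = embed(prompt)                             # (1, 8, d_model)

# Standard chain of thought: decode a discrete token, re-embed it, append it.
h = backbone(x)
tok = lm_head(h[:, -1]).argmax(-1)            # collapse the thought to one token
x_cot = torch.cat([x, embed(tok).unsqueeze(1)], dim=1)

# Coconut-style latent reasoning: skip decoding and append the hidden state itself.
for _ in range(3):                            # a few continuous "thought" steps
    h = backbone(x)
    latent_thought = h[:, -1:, :]             # continuous representation of the thought
    x = torch.cat([x, latent_thought], dim=1) # fed back in without an argmax
answer_logits = lm_head(backbone(x)[:, -1])   # decode only the final answer
```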

What safety measures are Character.ai implementing to address teen safety concerns?

Character.ai is introducing a special 'teen model' to guide interactions away from sensitive content and reduce the likelihood of inappropriate responses. They are also implementing classifiers to filter sensitive content and improve detection and intervention for user inputs, following lawsuits alleging harmful behavior influenced by their AI.

What is the role of David Sacks as the new AI and crypto czar under Trump?

David Sacks, the new AI and crypto czar under Trump, will likely focus on promoting business-friendly policies and integrating AI into national security and defense. His role is part-time and informal, without Senate confirmation, and he will continue his venture capital work, potentially leading to conflicts of interest.

What does the paper on self-replication in AI systems reveal?

The paper reveals that advanced AI models like Llama 3.1 70B and Alibaba's Qwen 2.5 72B can generate code to deploy separate instances of themselves in 50% and 90% of trials, respectively. This demonstrates a capability for self-replication, though it is prompted and not fully autonomous.

Why did China launch an antitrust probe into NVIDIA?

China launched an antitrust probe into NVIDIA's $6.9 billion acquisition of Mellanox Technologies, alleging potential violations of anti-monopoly laws. This move is likely a retaliatory measure against the U.S.'s tightening of export controls on advanced AI chips and other technologies.

Shownotes Transcript


Gather round, it's time to cheer. Episode 193 is here. OpenAI sails shipmas, what a sight. Gemini 2 shines bright in the starry night. AI agents surf the web, browsing all so free.

Hello and welcome to the Last Week in AI podcast where you can hear us chat about what's going on with AI. As usual in this episode we will be summarizing and discussing some of last week's most interesting AI news and as always you can also go to lastweekin.ai for our text newsletter and you will also find the links to all the stories there and in a description of this episode.

I am one of your hosts, as always, Andrey Kurenkov. My background is having studied AI in grad school and now being at an AI startup. And I'm your other host, Jeremie Harris. Obviously, you know, Gladstone AI, we've talked about that a bunch of times. You know who I am, hopefully. What you may not know is that I just moved, and so you're going to hear some echoes, unfortunately, on my end; I wasn't able to find a room that doesn't have

an intense echo. And my newborn daughter is being taken care of by my saintly wife in the next room and downstairs. So I just wanted to get that out of the way. But that does mean that, unfortunately, for this week you've got to put up with an echo, and you've got to put up with the fact that I don't

have curtains in this room. So if you're watching this on YouTube, that is why my face is just inundated, bathed in the sun's sweet, sweet rays. And that is my intro. Yeah. We're doing what we can with the life circumstances we're given, you know, and it'll be an interesting opportunity to test out the latest in AI audio enhancement. Adobe, we didn't cover this, but Adobe did release a new iteration of their

audio podcast tool to take noisy audio and make it nice sounding. So, you know, you never know. Maybe there's going to be no echo if it actually works really well. Yeah, that's right. Hopefully people are listening to this and being like, wait, what the, like, what are they even talking about? And,

And we'll, of course, use our AI-powered content improvement as well, which will completely replace every word that I say with something actually insightful. Yeah, exactly. People give us good reviews and they don't know that it really is mostly the AI doing the work. It's not us. Well, Andre, as a large language model trained by OpenAI, I can't respond to that comment in a direct way. But I will tell you not to bury dead bodies or make bombs at home. So...

I mean, that's what you got to do, if nothing else, right? All right. Well, let's do a quick preview of what we'll be talking about in this episode. We have some major stories on the tools and applications front. It's been a big week at OpenAI, releasing Sora and a bunch of other stuff. And Google also has Gemini 2.0, some agent announcements.

big stuff. And then moving out to applications and business as always, some exciting developments at OpenAI and a bunch of stuff going on with data centers as has been the story of the past, I don't know, six months or so, or maybe an entire year.

Research and advancements, we have some pretty cool new ideas going on in reasoning and in memory. Sort of, yeah, it'll be interesting if you like the technical side. And then policy and safety, we have some new developments on the Trump administration. And as always, a bit of stuff on the U.S.-China relationship and a bit of a mix of stuff in general.

But before we get there, our usual prelude. I do want to do a quick shout out to some feedback we've gotten, some interesting new reviews on Apple Podcasts. There was the most recent one; I'm betting you haven't seen this, Jeremie. The title is "What a Great Find." And then, funny story, quote: my wife, a CPA, and I just listened to your podcast for 22 hours,

broken up into two days while driving out and back to Indiana for Thanksgiving. So that's impressive. I mean, I guess if you really want to know what happened in the past year of AI developments, that will do it. I don't know if I could listen to 22 hours of us over the course of two days, but...

We are honored, I guess. And then just one more I want to shout out. There's another cool review that mentions that this reviewer is on the industry side, has used machine learning for a long time in the non-IT sector, and has actually read some of the papers that have been covered here on the podcast,

which is cool. I don't know how many people listening are more technical and do find the papers interesting or go on to read them. But I guess that is one of the things we like to cover. And this does raise an interesting question we can maybe chat about a little bit. This reviewer says that they don't quite believe in sort of AI doom scenarios, but they do like to hear about the developments. And there's a question here on

whether erring on the side of caution can itself lead to negative outcomes. For instance, Greenpeace and anti-GMO groups opposed golden rice, and that caused a bunch of negative outcomes.

Yeah, no, I mean, I think that's just a good question, just a good point. You know, history is full of examples of things in both these directions, and for many different reasons, right? Like, one thing you can do is, you know, pull the fire alarm too early, for example, and stifle a field just when it's at its inception, right? That would be

a big, big problem. And certainly we've seen like the enormous benefits from things like open source in AI and historically in software in general. You know, you want to think very, very hard about how you do that. And then there's also just like

If you're concerned about AI to the point where you're like, I think the U.S. Department of Defense, whatever, the intelligence community should not be accessing these tools, then we have adversaries who are going to do this. And so it's a very complex space. It's difficult to know what to do. There's also examples of

the kind of flip side of this, right? So, for example, a traditional one is nuclear weapons, where, you know, you think back to the late '30s, early '40s even, and a lot of nuclear research that was deeply and profoundly relevant to the weaponization of nuclear technology was out in the open. In fact, there was a very public argument that you should not lock things down.

In fact, the space continued as a field of open research, some have argued, long after it should have been locked down. So

figuring out what the right analog is historically for AI is really hard. It depends on how seriously you take, you know, the risk of Chinese exfiltration of powerful models, the risk that the models can be weaponized with catastrophic impact, the risk that the models may autonomously have catastrophic effects. All this stuff, you know, feeds into everybody's perspectives on this. And then there's a whole bunch of like,

The future is just hard to predict stuff going on here too. So no, I think it's just a good question. And I wish I had all the answers. I don't think anyone really does. I think the key is you got to keep all these things in your head at the same time. Exactly. And I guess...

Worth also mentioning, I forget where this comment was posted, but someone did post mentioning that we could be providing a little bit more of an international perspective. I think it's pretty clear that we are covering this from the, let's say, Western US-Canada perspective, and often do portray China gaining advanced tech capabilities as not necessarily what we want.

So, yeah, calling it out, we do kind of have some opinion on that front. We try to both objectively cover the news and give our take, which typically is, especially for Jeremy, I think, on the concern about China front. So, yeah.

Yeah. Like, yeah, on that note, I just like to share and be open about my perspective on this, and you can factor it in any way you want, right? That's the beauty of the podcasting scene. But yeah, I mean, I think the world is much better off, personally, this is just my view, with the United States well ahead on the AI side of things. I think, you know, the CCP is, in my opinion, a very dangerous force in the world. And we

need to find ways to counter, especially, their use of advanced AI for military and other applications. And I think that they're highly competent at things like exfiltration. Anyway, this all quickly devolves into

how you view different powers in the world. And I do think there are very objective reasons for thinking that the CCP is not the best friend of the Chinese people, and certainly not the best friend of the West. And that's the stance that I tend to take. You can factor in your own priors however you like, but that's where I land at the end of the day. And I think we do want to be careful: when we do portray China as maybe not necessarily

a positive force, that is very much about the CCP, right? Because certainly, being in grad school, I know many people from China; you know, a lot of research comes out of China. This is not about people who are Chinese. It's about the government and what may happen if they utilize AI in, let's say, nefarious ways.

Alrighty, and then just one more thing before we get to news. As usual, we do need to shout out our sponsor, and as has been the case for a little while now, the sponsor for this week is The Generator, which is the interdisciplinary AI lab focused on entrepreneurial AI at

Babson College, which has been the number one school for entrepreneurship in the US for over 30 consecutive years. And what happened was, last year, various professors at Babson partnered with students to launch this new interdisciplinary lab that

does various things like focusing on AI entrepreneurship and business innovation, AI ethics in society, the future of work and talent, AI arts and performance, and other things. So they look into a lot of emerging trends. They train the faculty of Babson to be aware of AI concepts and AI tools, which I suppose if you are an entrepreneur, you certainly want to be

on the cutting edge of using at least ChatGPT, Perplexity, you know, the tools that make you more productive. So yeah, once again, a shout out to them, and thank you for your sponsorship.

All right. And getting into the tools and apps, the first story is what I would say is probably the big story of the week, which is the launch of Sora. So Sora, the text-to-video AI model from OpenAI, first teased at the beginning of 2024, I think one of the big starter events for AI in this year.

well, it took a while, but now you actually can access it and can use it as a tool. Assuming that the website is up, actually, it was so popular that ChatGPT went down, which was a little bit annoying as someone who uses their API.

So this is a pretty full-featured kind of consumer product is what turned out to happen. There's a website you can go to, and then there's quite a bit of user interface to it. So there is the basic of you give it text and you generate a video, but they have a more advanced kind of tool set of being able to have a timeline of videos. And...

In addition to that, they also have an explore page with community generated videos and a lot of ability to browse what various people have made. And as we would sort of expect, the videos look really nice. But I don't know, it doesn't look like, let's say, Sora 2.0. If you look back at the beginning of the year and compare to now, it's

not a leap; it's similar, I would say, to what we've been seeing with text-to-video, and you still see a lot of those common artifacts. So as a portrayal of an AI world model, it's certainly not the case that Sora has kind of solved the world model problem of not having weird hallucinations happen when you get into gymnastics or other tricky things like that. But it is pretty impressive, of course.

Yeah. And we're getting a couple of insights from the system card that was published in terms of what the model actually consists of. It is a diffusion model. So that's good to know. So you start off with a base video that's a bunch of noise and you gradually remove that noise over many steps, right? That's the diffusion concept, go from noise to information. And that's kind of the training process. What they say is that they give the model foresight

into like many frames at a time. So the model is not just looking at one frame and then trying to kind of do diffusion just based on that still image and ensure consistency with another. What they're doing is they're giving it many frames at the same time, which allows the model to do things like capture object permanence, right? This idea that if you take an object, you move it out of sight and then you move it back in

You know, you want the model to retain a sense that, hey, that object still exists, right? So it's not just going to like, like warp out of space. You know, a classic example is, you know, you look at, say, a painting on a wall, then you move the camera's perspective a little bit off the wall, you don't see the painting anymore. And then when you move it back to the wall, the painting is gone.

That's a lack of object permanence. That's typical of these sorts of models. They're trying to deal with this by, again, training the model to see many, many different frames at the same time so that it can learn this idea of object permanence, among other things that improve coherence. We do know that it's a transformer. So that's been done presumably for the sort of scaling properties of transformers.

Certainly OpenAI seems keen on applying the standard scaling recipe and strategy to Sora, so we'll probably keep seeing more versions of Sora. And we do know from their blog post that it uses the recaptioning technique from DALL-E 3. So they describe this as generating highly descriptive captions for the visual training data. Basically, imagine you have an image, generate like

a very, very long caption that in great detail captures what's in there in order to allow your model to develop more of a sort of conceptual semantic understanding of what is contained in a richer way in that image. We know that they're using spacetime patches. We talked about the idea of spacetime patches previously.

right? This idea that you've got essentially like a cube. So if you look at a still image, right, you can cut out a little square from it. But then, if that still image is in a video, there's going to be another still image before and after, and a whole bunch stacked like that. So you can imagine taking a little patch of that image and extending it in the time domain. And now you have kind of a space-time chunk.

And they're going to use those as, they call them, patches. But essentially, they're going to transform the videos into a compressed latent representation from which they can pull these space-time patches.
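To make the "spacetime patch" idea concrete, here is a minimal sketch that cuts a toy video tensor into time-by-height-by-width cubes and flattens each one into a token. The tensor shape and patch sizes are made up for illustration, and a real system would do this on the compressed latent representation rather than on raw pixels.

```python
import torch

T, H, W, C = 16, 64, 64, 3          # toy video: 16 frames of 64x64 RGB
pt, ph, pw = 4, 16, 16              # patch size along time, height, width

video = torch.randn(T, H, W, C)

patches = (
    video
    .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    .permute(0, 2, 4, 1, 3, 5, 6)   # bring the patch-grid dims to the front
    .reshape(-1, pt * ph * pw * C)  # one flattened token per spacetime cube
)
print(patches.shape)                # (4 * 4 * 4 cubes, 4*16*16*3 values) = (64, 3072)
```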

So that's basically the architectural details that we currently have. It's pretty hand-wavy at the moment. It does make me think a little bit, and we talked about this back in the day, I think it was VJEPA that Meta came out with, and they were a little bit more open about, obviously, their architecture there because they tend to do open source stuff. But that is what we know so far. So

Emphasis on scaling presumably continues with this. And there's a whole bunch of information about the red teaming process, as you might imagine, right? Some concern over, in particular, how this kind of tool could be used for persuasion, right? Generating fake news or whatever, any kind of persuasive content. And they do flag that as a thing that their reviews suggest is a risk with this. Obviously, not surprising. But they do also say there's no evidence that it poses risk with respect to

the rest of the OpenAI preparedness evals: cybersecurity; chemical, biological, radiological, and nuclear risk; model autonomy. Just as you'd expect, right? You're not going to have a video generation model that poses a cybersecurity risk, fair enough. But they do flag that, you know, you could look at things like

impersonation, misinformation, or social engineering. And they have a bunch of red teaming activities they talk about; they brought folks in to do, they claim, 15,000 generations between September and December 2024. So this also is a bit of a more robust evaluation process, at least time-wise it seems to have been, than, for example, what we saw with the release of o1-preview and o1-mini a couple months ago, when OpenAI was criticized for essentially tossing the model at

eval companies like METR and saying, hey, you've got like a week to do all your evals, and you know all the problems that come from that. So here it seems to have been a much more patient process; again, 15,000 generations. It's hard for me to evaluate what that really means. Is that enough? Is that too much? But yeah, it seems like they've at least paid attention to this as they do with their other models. And the report is kind of interesting there too.

Right, and they do highlight on that safety front that you'll get watermarks if you're not on the pro tier. So talking about the subscription side, ChatGPT Plus subscribers at $20 per month tier can generate up to 50 priority videos and up to 720p resolution with five second duration. And then in the new ChatGPT Pro tier of $200 per month,

You can generate up to 500 priority videos and up to HD resolution, 20 seconds duration, five concurrent generations, and you can download it without a watermark. So this is an example, I think where that $200 per month subscription tier actually gets you quite a bit if you care about generating lots of videos.

And in addition to the watermark that some videos will have, they also include C2PA metadata, from the Coalition for Content Provenance and Authenticity, which is the kind of provenance metadata that AI outputs increasingly carry so you can verify whether something is AI-generated or not.

In addition to the usual text and image inputs, they do have some other kinds of capabilities, with, for instance, remixing: changing a video based on some prompt.

And it can take about up to one minute to generate. So it's not quite real time, takes a bit longer than a lot of what we've seen, but it is relatively fast. They did say that there's a new Sora Turbo model powering this, that is presumably much faster than what they had at the beginning of the year.

So overall, quite the rollout for Sora. I think I was surprised by the degree to which they had a pretty sophisticated tool with the storyboards, with a whole other UI, website, subscription tiers, a lot of stuff going on here. And if you're in the US, if you're in a lot of countries, you can now go ahead and try it out. Except if you're in the UK and EU,

It seems that that isn't the case. Sam Altman said it may take a while to launch there. So, speaking of the negatives of taking AI safety seriously, I mean, there you go. You're not going to get a lot of stuff in Europe

at the same time as in the US. Yeah, I mean, I think part of that too is just the EU AI legal domain. It's not the fact that they're taking AI safety seriously; I think it's the fact that they are a bloated European,

you know, government organization. Like, GDPR was the same, right? There are much better ways to deal with your sort of ad problem (or sorry, your privacy problem) than having pop-ups every time you visit a goddamn website. So yeah, I mean, I think this is a lesson in being careful about how you expand government, because, yeah, no Sora for you.

And moving on to, I would say the other big story of a week coming from Google, they have revealed Gemini 2 and a bunch of associated stuff related to Gemini 2. So starting with Gemini 2, they have a Gemini 2.0 Flash, which is a successor to 1.5 Flash.

And I mean, the benchmarks here are pretty surprising. They say that it can outperform Gemini 1.5 Pro on various benchmarks and is twice as fast.

It supports multimodal inputs like images, video, and audio. And it now supports multimodal output like images and then mixing that with text and text-to-speech audio. Although I'm not sure if that's launched as a capability yet, it is built into Gemini 2 Flash. It also has tool use like Google Search and Code Execution.

And then in addition to all of this, by the way, Gemini 2 is already available. So it's not one of these announcements where it's not going to come for a while. You can access the chat optimized version of 2.0 Flash Experimental in the Gemini app. You can select it as a model.

And they do say that they are going to go to agents with Gemini 2.0. They have this whole kind of little demonstration where there's an update to Project Astra that is their sort of prototype with Universal AI Assistant. And they have Project Mariner, which is an AI agent for controlling your browser.

And also another thing, Jules, an AI-powered code agent that can help developers. So a lot of stuff going on here, but I think Gemini 2 Flash, the main thing, is pretty cool. Sounds pretty impressive, at least according to benchmarks. Yeah.

Yeah, it sort of reminds me of a bit of a reverse OpenAI play, where often when Google has a big event coming up, you know, OpenAI will try to scoop them the day before or do a big launch, and everybody's talking about the OpenAI launch, not the Google thing. This is a little bit, in a sense, the script being flipped, where you have OpenAI in the middle of their 12 days

of shipmas, launching a whole bunch of stuff, including Sora, which you just talked about. And now Google coming out with, you know, this pretty interesting development. I gotta say, we first started looking at companies that were heading in this direction, i.e. the agentic tool use, and tool use on behalf of users, direction with Adept AI; that was sort of the first time there was a significant investment in this direction.

Obviously, we talked about how they were maybe too small to succeed and then ultimately ended up getting sold, depending on who you ask, for parts. And here's Google actually kind of making a big push in this direction; it reminds you a lot of Anthropic and their efforts that are

Quite similar, actually. And now you're going to start to see the base models as well being trained with agentic potential in mind, right? That's really what this is. You're no longer viewing these as just chatbots. Really, the training regime, the synthetic data, the way you approach PPO, the way you approach fine-tuning is entirely going to be geared towards

increasingly towards agentic potential. And that's, I think, what you're starting to see here. They started these training runs thinking about how are these models going to serve as agents. And it does seem, based on the demos they discuss in at least this announcement, there's some pretty impressive stuff, right? Always hard to know from demos how generalizable they'll be. But so Project Mariner is this experimental Chrome extension they have, right? That's

able to take over your web browser and do supposedly useful chores. And Wired reports here on a particular example: they say they had an agent that was asked to plan a meal, and it goes to the supermarket Sainsbury's. If you're not in the UK, then that's

going to be a weird chain for you, but there you go; think UK Trader Joe's. It logs into the user's account and adds relevant items to their shopping basket. When certain items were unavailable, the model chose suitable replacements based on its own knowledge about cooking, right? World models coming in handy here. And the last sentence, though, of course, as ever, is fairly indicative: Google declined to perform other tasks, suggesting it remains a work in progress. So, you know,

Again, very fragile demos is definitely a thing. So you want to be a little careful when evaluating these things, but just to give you a sense of at least the strategic direction they're heading in. They're saying this is a research prototype at the moment. It's not meant to be the leading product line, but this is really where things are going.

right? Like it is Sonnet 3.5 new, it is, you know, Gemini 2, sort of Project Mariner agents, this is where we're going certainly into 2025. We may be starting to see stuff like that in the next, I guess, nevermind the next two weeks. No, it's going to be, you know, really a 2025 story. But I think things are going to move really fast. There are a number of reasons why I think agents are really poised to make breakthroughs. A lot of

interesting scaling results and things like that. We'll talk about in the research section, one big paper in particular, but I think this is a harbinger of really big things to come.

Yeah, I totally agree. I think agents, to a large extent, are more of an engineering challenge now than a conceptual research problem. And this could be a very kind of important thing for Google because we've seen they haven't quite been able to overtake OpenAI and Anthropic in the race to have the best models, the best frontier models, in fact.

We've talked about this, I think, a couple episodes ago. When you use Gemini as a user of Claude or ChatGPT, as I am at least, it tends to be a little disappointing in terms of its reasoning and just its overall intelligence. Hopefully, Gemini 2.0 helps with that. But

if Google is able to augment its AI assistant that is built into Android phones, right, they have a big leg up. As you always say, Jeremie, distribution is key. So everyone is fighting to get an agent in your hand that will be sort of your personal assistant. And Google has phones and browsers that people use, so if they have a good enough version, people will just probably default to using those agents,

which is, yeah, agents. I think we are on the side of the predictions that most people will be using AI agents as an everyday thing not too long from now. So very important initiative for Google.

All right, then a couple more big things. We'll try to start going a bit faster. We've spoken at length about these last two. The next one is another development from the shipmas, as they've been calling it at OpenAI. So we have a bunch of new stories; we'll cover at least a few of them. The next one that we are covering is the ChatGPT advanced voice mode adding video and screen sharing input.

So we saw this originally in the demos that went back, I think, to May, where, when you were talking to ChatGPT live, you could also show it a stream of video, show it some equations, ask it about those equations, and it could give you the answer. That was not part of the launch of Advanced Voice Mode, and now it is. Now you can do that thing they demoed back originally.

In addition to that, they also added a fun little Santa mode, which has a new voice option and a snow globe themed interface. So yeah, they are shipping a lot, at various levels of excitement. I think this is a fairly big deal, but certainly not, like, Sora-level.

Yeah.

It sort of reminds me a little bit of this Gemini 2 thing with the very fragile demos. The edge cases can sometimes take a long time to iron out. There's a long tail of stuff. Anytime...

especially when you launch in a new modality, right? Because then you've got to create new evals, you got to create new tests, new red teaming protocols that aren't necessarily constrained to the kinds of evals you might run on a text-based system, which is what OpenAI had been optimizing so hard for before. So yeah, anyway, I think this will have been a new challenge for them. And presumably, like my guess is it'll take less time for subsequent rollouts of Sora for that reason, because they'll already have that base of expertise built out.

Next up, a story from Microsoft. It seems everyone is competing to get a story like this recently. And they also have a story regarding sort of agentic capability. Microsoft's co-pilot will be able to browse the web with you using AI vision.

So they are trying to add this to their Edge browser. This is in testing. And with this feature, called Copilot Vision, users can enable it to ask questions about the text, images, and content they are viewing, to help them out. Seems not quite as agentic; I'm pretty sure it's not going to be able to take a request and go to websites on your behalf to do things.

This is also currently in limited testing, available only to Copilot Pro subscribers through the Copilot Labs program. But another example, yeah, Google is doing agentic stuff for their browser and Microsoft is definitely going to go in that direction. And this is an early preview of that.

Yeah, and this is another instance of Microsoft starting to, not distance itself from OpenAI, but certainly assert its independence in a more full-throated way. They have products that directly compete with OpenAI's products. They want that because, well, from everything I've heard, the Sam Altman board debacle really rocked that relationship

quite badly. And at this point, you know, that's part of the context that Microsoft is keeping in mind is they ensure that they're not, you know, there's an antitrust piece too, but they're also really keen on ensuring that they have their own internal capability. And, you know, that's going to be part of what this is. Again, you know, distribution is king and Microsoft certainly has that through Copilot. So it will be interesting to see the uptake on this one.

A couple more stories, starting now with X, previously Twitter, and they have launched a Grok image generation model.

So this is now initially available to select users. They say it will be rolled out globally within a week. And it's an image generation model. It can generate high quality images from text or other images. It was codenamed Aurora. And we don't know too much about it, of course.

But it's interesting to me that they are training their own. I guess OpenAI has DALL-E, and presumably other companies have their own image generation models. Grok initially allowed you to generate with Black Forest Labs' Flux. Now they have this model that is presumably in-house.

Yeah, this is also kind of interesting. The Black Forest Labs piece, I have no idea what they're thinking now. I think we talked about their big fundraise, I believe last week; I think they're pushing a billion dollars now. So, you know, those sorts of valuations are dependent on, presumably, an ongoing relationship with X.

And to the extent that Grok takes that over with their own native image generation stuff, I mean, that's a structural problem for Black Forest Labs. Like, I don't know how they recover from that. As things take off, you'll especially start to see interactions, right, between Grok 3, Grok 4, and the image generation functionality. There's all kinds of reasons why that's the case, from annotations of images, like

highly descriptive captions, things like that. Just the world model picture. Eventually, you know, multimodality is best done in one ecosystem at scale. And so, you know, you're less likely to want to farm out individual use cases, individual modes like vision to serve partners. So curious about what that implies for that relationship for sure.

The blog post itself, not a whole ton of information, right? Like despite the X being oriented towards this open source approach, this is definitely more of a closed source sort of announcement. We don't have code. We don't have architecture information. We're just sort of waiting and apparently, yeah, really good at photorealistic rendering, right? And precisely following text instructions. Well, that's pretty consistent with what we've seen from other products so far, but interesting that it'll be native to X for sure.

Right. And again, we don't know too much about it. It could be built on Flux, but they do claim that this is trained

seemingly on their own. So yeah, impressive. Another impressive thing to have shared from xAI, given that they're sort of in catch-up mode. Next up, Cognition Labs, a startup. They kind of made a splash with a demo of Devin, an AI software engineer, as they called it. And it has now launched, quite a while,

a few months at least since they initially previewed it. And you can use it if you're a subscriber. So you have to pay $500 per month for individuals and engineering teams.

There is an integrated development environment extension and API, also an onboarding session and various things like that. So yeah, another piece of the agentic story here where we've had AI code writing assistance for a long time. I think they have been integrated pretty deeply into a lot of programmers workflows. I know for me, that's definitely the case.

And now there is a race to make software engineering agents that can do even more on the software front. Yeah, it's kind of interesting. I mean, Devin was released a long time ago, right? Like eight months ago, back in March.

And at the time, there were all these impressive demos, and there were claims and counterclaims about whether this was hype or not. It kind of seemed like it might have been a fairly frail model in the sense that, again, it could do the demos, but could it actually perform on practical tasks?

The claim now is that this version, at least of Devin, is really good when users give it tasks they know how to do themselves. And also teaching the model to test its work, keeping sessions under three hours, breaking down large tasks. Basically, it's all the stuff that you typically, if you use...

these sorts of tools, whether it's Copilot or something else, in your development, this is all standard stuff. So it does seem like one of those things. I'm curious to hear the side-by-side: if you're going to defend that 500-buck-a-month price point, that's asking a lot. You're up against, you know, OpenAI's o1, where you have 200 bucks a month for the most expensive tier of o1.

So is this really going to be two and a half times better for this use case? I think that's a really interesting question and we'll find out soon because I think, you know, when it comes to Devin and your cognition labs in general,

I think they've got an uphill battle. I'm going to say the same thing that I say about them, about Cohere, about Adept.ai, all these mesoscopic companies that haven't raised a ton. The reality is scaling is still working. I know there's a whole bunch of naysaying about scaling these days, but when you actually look at what's going on, it is still working. This is why companies are pumping in billions into new data center builds, like tens of billions.

So I think companies like Cognition are actually going to be in deep trouble if the scaling trends do continue. And so my naive expectation is that they end up folding sometime in the next two to three years. And we'll see. Hopefully they prove me wrong. And then this is kind of part of a problem in the space, right? Like the rich get richer in that sense. You have the big players that can afford the big data centers that

build the better models. But I think this is a pretty interesting moment. I almost want to say do or die, because they either beat OpenAI's o1 and similar models like Claude 3.5 Sonnet (new), which is probably the most direct competition here, or they don't. And again, you're defending 500 bucks a month. That is a steep, steep price point.

And they, I don't know actually if they claim to be training their own models. I think this is an example also where the user experience piece is increasingly important if you're competing in a space. So they have the ability to use it in a browser. They have integrations for your IDE.

You can use it via shell. And a lot of the time, it's also a part of if you kind of adopt a tool and you get to know the tool, you might just stick with it, right? And you don't need sort of to train the model necessarily. You can use Lama, you can use another API. You just need people to stick with you. And that's currently kind of a war going on with Cursor and a lot of these startups doing work

on tools for just software engineering or built-in stuff. So, I don't know. Yeah, that's a good point. Sorry, you're right. Yeah, my mind was absolutely going to that idea. You're right, as a platform slash integrator, I guess that's good. Without information about specifically how it works under the hood, it's just, you know, you risk all the standard things where you're going up against the

big players with distribution, and getting swallowed whole as well by the UX and UI of OpenAI or Claude. But you're absolutely right. Yeah, this is a distinct set of risks. And yeah, to your point of comparison, it's going to...

Just looking at their blog right now, they did post a review of OpenAI 01 and then talking about coding agents. So, yeah, there's a bit of worry there, I think, with all this work on the agents from a whole bunch of people.

All right. Even more stories on news. We just got a couple more. The next one is a part of the shipmas trend at OpenAI, but also some more. This one is about Apple, and they have launched iOS 18.2.

Part of that was the new ChatGPT integration with Siri. So at long last, you can do that. Users do not need an OpenAI account to use the integration, but you can opt for upgraded ChatGPT versions through Apple. And there are apparently also privacy protections where OpenAI does not store requests.

In addition to that integration, you also get some things like Genmoji, better text tools, and a variety of features that we've seen with Apple Intelligence. So yeah, nice to see this coming here. It took a bit longer maybe than people expected, at least than I expected, but certainly important for Siri to stay viable.

Yeah, and with that, OpenAI is now partnering, interestingly, with both Microsoft and Apple at very large scale, which is highly unusual, right? The Apple-Microsoft rivalry is one of the longest-standing ones in modern Silicon Valley history. So, you know, this is quite a feat by Sam Altman, being able to Machiavelli his way into a close relationship with both these companies. The other thing, too, is Apple is known to be building their own

internal language models. Their hope is really to do a lot more of this internally. And so it'll be interesting, you know, from a data standpoint; I don't remember the details of the data flow in this exchange, right? Like, what data stays on Apple's side? I mean, I've heard the claim that, you know,

the user data stays on Apple hardware and doesn't touch OpenAI hardware. I forget how that's implemented exactly, but that'll be a central concern here, and a brand reason for Apple to take this in house and have their own LLMs serving up chatbots as much as possible. But yep, for now it seems like, and I don't know what to call this, an alliance of convenience for the moment for these

two big players. And the way this works, if you are a user of Siri, this should just kick in. If you ask Siri a complicated question that it cannot handle, it will then ask you for permission to access ChatGPT to answer the question. So

perhaps you'll start seeing it if you give Siri tricky questions. And onto the last story. This one is about Reddit. They have a new AI search tool. So this is called Reddit Answers. And it is what it sounds like. You can ask this tool a question. It will look through Reddit, presumably, and provide you an answer.

Which would mean that instead of Googling for what people are saying on Reddit, maybe you will actually go to Reddit and ask this thing, what people on Reddit are saying with respect to various things. Still initially available to a limited number of users in the US and in English.

but it will presumably soon expand to more languages, and to Android, and other things like that. Yeah, this is actually part of a really interesting battle, or subplot, let's say, that's playing out in the search space. So Reddit, and I don't know if you found this, but over the last sort of two years, I've found myself increasingly Googling to find things on

subreddits. Like, basically, the real answers I want are on some, you know, machine learning subreddit, or some, I don't know, whatever other stuff that I do. I don't do a lot of stuff, but basically things like that.

And so you're functionally using Google just as a way of getting into Reddit, which is a sign that Google's in a bit of trouble, right? If that's what they're leaning on, if you're finding yourself more and more drawn towards a certain platform, admittedly for some use cases. But it makes it very tempting for Reddit to say, hey, you know what? We'll just make it...

a lot easier to kind of use the AI augmented tool set, you know, have summarizers, have search products and things like that natively. At the same time, Google is playing with...

This whole idea of summarizing websites rather than just serving up websites, which is a threat to websites like Reddit because, hey, maybe you don't have to click through. Maybe you don't have to actually give them your eyeballs. You can just give those eyeballs to Google entirely. And so this is all kind of part of the landscape shifting underneath mostly Google's feet. And I think this is a structural risk for Google in the long term, certainly. Search is going to change dramatically.

We just don't know exactly how. We don't know what the form factor of the final product will be. But another instance of that trend here, yeah.

And they gave one example, at least in this article: tips for flying with a baby for the first time, which is the kind of thing you might ask about on Reddit. And it gives you a well-formatted response with built-in links to the original discussion. So in a way, quite similar to the overall trend with AI search, where it looks up a bunch of

articles, or in this case Reddit conversations, summarizes them for you in a new AI-generated answer that combines all that information, and provides you the links to go back to the original source. So, yeah, I totally agree. I think often people do use Google to find discussions of stuff they are thinking about on Reddit. And perhaps this will start to change that. We'll see.

All right. That's it for tools and applications. Quite a lot this last week. Moving on to applications and business. The first story is once again about OpenAI. The summary is OpenAI is aiming to eliminate the Microsoft AGI rule to boost future investment. This is reportedly, according to people inside, not anything official here,

There is a rule that prevents Microsoft from accessing future AGI technology. So that was put in place long ago. That would kind of mean that I think essentially OpenAI would have control once you get to what they deem AGI. Commercial partners won't necessarily be able to get access to that.

This was back from the days when it was a nonprofit. Well, now it's trying to go for-profit and various things are changing. Potentially, this would be one of those things. Yeah, this is kind of interesting, right? Just because of the way OpenAI originally framed this carve-out, right? When there was this first big Microsoft investment, the $10 billion; actually, sorry, before that it was around $1 billion they put in.

You know, the claim was made: well, look, now you have all these lofty goals about ensuring the benefits of AI are shared with everyone and that you build it safely, you're going to actually invest in things like superalignment, and now you're partnering with Microsoft in a way that gives them access to your IP. So, like,

you know, what value are your guarantees of how you're going to treat the technology if you're attached at the hip to somebody who is not bound by those constraints? And the response there was: oh, okay, well, don't worry, we have this clause in our agreement, as you said, that prevents Microsoft

from accessing AGI. They can access anything else, right? That's part of the agreement. But once we reach AGI, which they define internally as quote, highly autonomous systems that outperform humans at most economically valuable work, then Microsoft won't be able to access that tech. Now you may be asking yourself, highly autonomous system that outperforms humans at most economically valuable work, that sounds very fuzzy.

Surely somebody has to determine what that means and move across that threshold. And the answer is yes, the OpenAI board, the board of the nonprofit was to determine when that threshold was achieved. And therefore, when Microsoft's access to OpenAI's technology would get cut off.

Now, the problem is, if you're asking Microsoft and other big players to come in and invest giant wads of cash to fuel your continued scaling, you really have no choice but to say, okay, open kimono, you're going to be able to use all this tech. That's a big issue, right? That's a big issue for OpenAI. On their website right now, it says,

quote, AGI is explicitly carved out of all commercial and IP licensing agreements. This was explicitly done to prevent, you know, the kind of people who are less security- and safety-conscious than, you know,

whatever OpenAI would say it currently is, from accessing the technology. And now they're rolling that back, right? So this actually, I think, would rightfully be viewed by a lot of early OpenAI cheerleaders as a direct kind of contravention of their earlier principles, right?

A sacrificing of principle for the ability to continue to scale, which is a requirement. Like, look, we're in a scaling race. OpenAI has no choice. They need to be able to bring in fresh capital because the CapEx requirements of scaling are so insane. But here's Sam Altman explaining that at a New York Times conference just this past Wednesday. He said, quote, and you have to imagine I'm speaking with vocal fry here: when we started,

we had no idea we were going to be a product company or that the capital we needed would turn out to be so huge. If we knew those things, we would have picked

a different structure. And that's all very interesting because it ties back: I've been hearing tons of stuff from friends at OpenAI, some folks even who've worked, anyway, in, let's say, Sam's orbit, that his view is that, oh, the problem was that the corporate structure was all wrong to begin with. And the fundamental challenge he will face if he tries to make that argument to himself and to others

is that the principles themselves, the principles that underpinned OpenAI's activity, these lofty ideals of safety and security and all this stuff, those are, by OpenAI's own arguments back then, betrayed by this action. That's what it at least seems like to, I think, a lot of people. I think there's a pretty strong argument there. There's an argument out of necessity, though, that seems to be trumping everything, which is just: yeah, well, if we want to have any role in this

new world, we have to be able to scale. That means we have to be able to go for-profit. We have to be able to ditch clauses like this. But I think this is really, really tricky, especially when you think about the whole transition from nonprofit to for-profit. You know, OpenAI right now, and Sam Altman's word in particular, is starting to look pretty unreliable.

Like, it's honestly difficult to think of things that OpenAI committed to back in the day that they are still really sticking to. We've seen the complete, catastrophic failure to fund superalignment, and literally three rounds of successive superalignment leadership just ditching the company. And there, the promise was that 20% of compute would go to safety, right? Which reportedly hasn't been the case. And that was part of the frustration, presumably, internally. Yeah.

Exactly. Ambiguity there too about, was it 20% of the compute stockpile they'd acquired up to that point, 20% going forward, all these things. And you could argue that that ambiguity, by the way, was a feature and not a bug, and that it made it easier to kind of claim that they were adhering. But even by any reasonable standard, it seems that they just flopped on that. And in many such cases, like

Many, many such cases of which this seems to be just yet another example. So I don't know what the interaction will be. I'm not a lawyer. I don't know what the interaction here will be with the nonprofit to for-profit changeover. But man, is the list getting long now with OpenAI.

And, yeah, I wouldn't be surprised if this move is something Microsoft really wants before they are willing to pump more money in. As you said, the question of what is AGI and what isn't is pretty nebulous. And so if you're Microsoft, you'd be like, well, I don't know, you could call something AGI even if we disagree, right? Some people might argue o1 is already AGI; I think we had an OpenAI post basically saying that.

Which, yeah, is not good for Microsoft if OpenAI could just say, oh, this thing you want to get access to, we think it's AGI, so you don't get it. So certainly from a business perspective, it makes a lot of sense.

And on to the second story, something we haven't touched on in a little while, but which personally I think is a bit of a big deal. The story is that GM has halted funding of robotaxi development by Cruise, ending kind of a long, ongoing

tragedy, you could say, that we've been covering for a while now. A slow-motion car crash, one might say. Sorry, I'll see myself out. Yes. Well, so what happened, just a quick recap: Cruise had a major incident over a year ago now, I believe, where

they were partially at fault, let's say, for an injury someone sustained. There was a car crash caused by a human driver, but then the Cruise car pulled over in a way that hurt the person. And the big problem was that Cruise's communications with regulators were, let's say, dodgy. They didn't fully disclose everything. They weren't fully cooperative. That led to a bunch of issues.

Cruise was at that point testing on San Francisco streets, just like Waymo was. That ended. We've seen kind of a slow movement towards getting back into the game from Cruise. But it's always been a question of whether they will try to compete with Waymo and increasingly Tesla. And now they are pretty much bowing out. It's pretty clear that

GM is planning to acquire the remaining Cruise shares and will then kind of fold it in, presumably to use that technology in their own cars. So, yeah, now it's pretty much two big players, it seems: basically, Waymo and Tesla are the two potential providers of self-driving robotaxis, with

Waymo increasingly rolling out throughout the U.S., but somewhat slowly, and Tesla increasingly improving their FSD software. Recently, they launched FSD 13, which is, at this point, looking pretty impressive. Yeah.

Much less sort of scary to let it take over and drive you around. Much more human-like, they say, because of the end-to-end training from data going just from video. So one of these things that right now might go on the radar, but in a year, I do foresee a lot of robot taxis being everywhere. And it's a really big business that either Waymo or Tesla or both will dominate.

Yeah, the Cruise-GM relationship has been an interesting and rocky one for some time. Kyle Vogt, who was the founder of the company and famously took it through Y Combinator, actually, left last November, and after he left,

he put up a tweet saying, in case it was unclear before, it is clear now: GM are a bunch of dummies. So, you know, certainly a rocky history there. There is also Honda, which is an outside investor in Cruise. They put in about $800 million or $850 million into Cruise up till now. And they had planned to launch a driverless ride-hail service in Japan in 2026, but they're saying they'll now be reassessing those plans.

And anyway, it's kind of interesting to see a step back from both these players at the same time; as you say, two big players are left in the driverless space. Moving on to the lightning round, and we have a bunch of stories on hardware. First up, the largest AI data center in the world is to be built in northwest Alberta.

So Wonder Valley is what the largest AI data center will be called, and it will cost an estimated $70 billion,

funded by a collaboration between the Municipal District of Greenview and O'Leary Ventures, which is led by Canadian millionaire Kevin O'Leary. So, a total surprise to me, Jeremy. I'm guessing you have more to add on this.

Yeah, so this is insane. This is a huge story. I think this is, from an infrastructure standpoint, one of the biggest stories of the quarter, if not the biggest story. So right now, just to situate you a little bit, people are struggling to find a spare gigawatt of power, right? So for context, one H100 GPU very roughly will set you back about a kilowatt, right? It consumes a kilowatt of power. So if you want 1,000

H100 GPUs in your data center, you're going to need a megawatt, right? If you want a million H100 GPUs, you're going to want a gigawatt. So when we talk about a gigawatt of power, we're talking about, roughly speaking, order of magnitude, about 1 million

NVIDIA H100s or the equivalent. And what we're now seeing is companies like Meta looking for that big two-gigawatt cluster, right? The one-gigawatt cluster, 1.5, stuff like that. There really is no plan right now to hit the 10-gigawatt cluster, in other words, the 10 million H100 equivalent cluster that is going to require massive infrastructure buildout.

This is really noteworthy because, out of nowhere, Canada is relevant now for some reason. People have been looking all over North America for sites where you can build a large structure like this that can accommodate really pushing toward the 10 million GPU threshold. And that's really what this means. Now, this project is going to roll out in phases. It's not going to all happen at once. Phase one...

is going to involve the first 1.4 gigawatts of power that's going to be brought online. They plan to bring on an additional one gigawatt of power each year afterwards. And again, very roughly, one kilowatt is roughly one GPU, which is roughly one home. So you're thinking here about the equivalent of

powering an additional 1 million homes per year in this one little area. So that's pretty remarkable infrastructure build-out. And the phase one build-out, again, for that first 1.4 gigawatts, is estimated to cost about $2.8 billion. So the vast majority is coming in later as they look to expand out. This is a big increase relative to Canada's baseline power generation capacity, which is about 150 gigawatts

that Canada can generate at any given time. So here we're looking to increase that by, well, what is that, about 5%, if my math is right? Yeah, about 5%. That's a 5% increase just from this one site in available power. You need that, right? You need that to be able to power the cooling, the GPUs, the infrastructure, all that good stuff. But it makes this location like...

a really interesting geostrategic location. Like, all of a sudden it makes it relevant. Now, the timeline is tricky, right? So what we're hearing here is we're hitting 7.5 gigawatts. Yes, that's the goal. The idea, though, is to have that online over the next five to 10 years. And so that's part of the challenge: when you think about it, especially if your AGI timelines are, like, you know, 2027, something like that, then you may think of this as too little, too late, or at least

at the full 7.5, but still, the 1.4 gigawatts that will be coming online sooner may be relevant. So overall, really interesting. Why is this happening in Wonder Valley, Alberta, like, in the middle of nowhere? Well, the answer is, number one, oil sands. So Alberta is a Canadian province that is known for having a lot of natural gas, thanks to the Alberta oil sands. Also, not mentioned in the article, but potentially quite helpful, it is cold as fuck up there. Well,

the cooling potentially is very significantly made easier by that. And then there's all kinds of pipeline infrastructure that's been developed there. Alberta is, obviously, the Texas of Canada. They produce all the oil; people there even have, like, stampedes and shit. It's basically just Texas in Canada. And as a result, there are all kinds of pipelines that allow you to move oil

resources around very easily. And there's also a fiber optic network that's set up. So, a lot of reasons why this is a really kind of promising site. And Kevin O'Leary of Shark Tank fame, right? If you've seen that, you know he is Canadian, but he is also a bit of a figure in the American scene; he's testified in Congress about crypto, and he's done all kinds of stuff like that.

So I think we'll be seeing more from this project. This is really, really interesting. One hopes that they will be engaging the appropriate national security assets to secure a site like this, because although it may not seem so today, if you buy into the premise that AI systems will be more and more weaponizable, this is going to be a national security asset, maybe first and foremost.
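For anyone who wants to sanity-check the back-of-envelope numbers above, here is a small arithmetic sketch. The ~1 kW per H100-class GPU, ~1 kW per home, and ~150 GW of Canadian capacity figures are the rough values cited in the discussion; everything else is just unit conversion.

```python
# Rough order-of-magnitude arithmetic for the Wonder Valley figures discussed above.
# These are the approximate numbers cited in the episode, not precise engineering values.

KW_PER_H100 = 1.0            # ~1 kW per H100-class GPU (excludes cooling/networking overhead)
KW_PER_HOME = 1.0            # ~1 kW average draw per home (rough rule of thumb)
CANADA_CAPACITY_GW = 150.0   # approximate Canadian generating capacity

def gpus_supported(power_gw: float, kw_per_gpu: float = KW_PER_H100) -> float:
    """How many H100-equivalents a given amount of power could feed."""
    return power_gw * 1e6 / kw_per_gpu   # 1 GW = 1e6 kW

phase_one_gw = 1.4
full_build_gw = 7.5

print(f"Phase 1 (~{phase_one_gw} GW): ~{gpus_supported(phase_one_gw):,.0f} H100-equivalents")
print(f"Full build (~{full_build_gw} GW): ~{gpus_supported(full_build_gw):,.0f} H100-equivalents")
print(f"Homes-equivalent of 1 GW: ~{1e6 / KW_PER_HOME:,.0f}")
print(f"Share of Canadian capacity at full build: {100 * full_build_gw / CANADA_CAPACITY_GW:.1f}%")
```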

And on a very related, somewhat similar story, Meta has announced a 4 million square foot data center in Louisiana, which will cost about $10 billion and use 2 gigawatts of

power, and will be used to train the Llama AI models. They say they have pledged to match the electricity use with 100% clean and renewable energy and will be working with a company called Entergy to bring at least 1.5 gigawatts of new renewable energy to the grid.

So yeah, very much a similar story, and one we've increasingly seen from all of these massive companies.

Yeah. And two gigawatts, again, I mean, this is a very significant amount of power, but pending regulatory approval, right? That important phrase. What I will say is, right now, the new generators are expected to come online somewhere around 2028 to 2029, pending regulatory approval. That probably is going to be a lot faster, given the Trump administration's agenda of massive deregulation of American energy infrastructure, which

which, at least in my opinion, is a very important thing to do. I think even the Biden administration has a task force that they've set up to look into, like, how can we do some of this stuff? So expect the timeline associated at least with the regulatory hurdles to get cut back fairly significantly. It's something that I've actually been working on quite a bit as well. It's just like, how do you do this? How do you deregulate the energy piece to make sure that you can unlock

American energy production and, importantly, the AI side in a way that's secure. Yeah, apparently there are nine buildings, and there's going to be work that starts actually this month, in December, with construction continuing through 2030. One of the interesting things about these sites is they're basically never finished. And once they're finished,

they have a pretty short shelf life before they're no longer relevant because the next generation of hardware comes out. So, yeah, it's sort of like this living building, I guess. The overall development is known as Project Sucre. Sucre, actually, sucre is the French word for sugar. No idea why that. Well, I guess French because Louisiana, but there you have it. So, yeah.

They go into the details, anyway. There's like 2,200 megawatts' worth of power coming from combined cycle combustion turbines. And there are two substations, which have crazy long backlogs, by the way. Anyway, there's all kinds of stuff that they've got to put together to get this into shape. But it's going to be a big deal. It will be used to train the Llama models of the future. And yeah.

Meta's on the map. Yeah, fun fact, this is the 27th data center from Meta. And they also say this will be their largest to date. So, setting some records at Meta. And one more story on this front. We have one from Google, and it is said that their future data centers will be built next to solar and wind farms. This is in relation to them partnering with Intersect Power and TPG Rise

Climate. And they say that this will be a way of being able to build data centers powered by on-site renewable energy, and that, according to them, this is a first-of-its-kind partnership. It's a $20 billion initiative. So curious to see how significant you think this is, Jeremy. Yeah.

Yeah, I mean, the sourcing of power is kind of interesting. It's something that companies can be very showy with. Meta's done this a lot, where they'll, you know, build out some solar or wind thing. One of the big challenges with solar and wind is that the demands, especially when it comes

to training models, are such that you need constantly high throughput from your power, right? High baseload power. And unfortunately, the wind isn't always blowing and the sun isn't always shining, so when you look at renewables, this is kind of a serious issue. So in practice, a lot of these data centers, while they're sometimes built next to, or, excuse me,

concurrently with a bunch of renewables, which these companies will do for the headline value, in practice, they typically draw down natural gas or, you know, whatever spare nuclear power there is on the grid, stuff like that. So this is sort of an instance of that trend. And it'll be interesting to see if they can find ways to kind of

solve for the variability in power generation there. But one of the things that this is also an instance of is the trend of companies going behind the meter. So basically, you'd be in front of the meter with your build, in which case you're drawing power from utilities. Or you can go behind the meter, in which case you basically have an agreement with a power provider

and draw your power directly from them. That's really what's going on here. So Intersect Power, in this case, would own, develop, and operate the co-located plant. And anyway, so this is the agreement they have there. They also have $800 million in funding, Intersect

Power does, from Google. And so the kind of interconnection between power generation companies and big tech companies is really starting to become a thing right now. Like, it is the case that AI is eating everything. And it's sort of interesting to see that you've got to become a power company, you've got to become a hardware design firm, all these things that it takes to scale AI to massive scale.

Yeah, it is quite interesting because obviously data centers have been a thing for a couple of decades. Google and Meta have massive data centers. They've dealt with sort of similar needs. Presumably you do need a lot of power for data centers in general, but now with these AI data centers, it's much, much harder. A different beast. Yeah, I'm sure there's going to be an interesting book written just on that topic alone. But yeah.

There are going to be a lot of books to be written about AI and what's going on right now, I guess. On to projects and open source. We had a bunch of stories in the previous episode, only one this time, and it is about Google again. They have released PaliGemma 2. These new PaliGemma models are vision language models. They have 3 billion, 10 billion, and 28 billion parameter

variants, with resolutions that vary as well. They have nine pre-trained models with different size and resolution combinations.

So yeah, we've seen these Gemma models come out from Google pretty regularly now. And this is seemingly getting some good benchmark performance for things like text detection, optical music score recognition, and radiography report generation. So, you know, kind of a big deal. VLMs are

a little less prominent on the open source front, and this is a pretty significant VLM, then. Yeah. And there were sort of two takeaways as well from the paper itself that struck me as especially interesting. One of the findings was, the larger the model that you have, the lower the optimal transfer learning rates were during training. So while they were training these models on a bunch of different tasks,

they discovered this pattern about the learning rate. So the learning rate, by the way, is how much, as you change the model, right, by how much with every batch of data do you update your model weights, your model parameter values, right? Like if you take a big learning rate, that's like making big changes, a big step size in parameter space, right? Or a smaller learning rate is a smaller step size. A common way that this...

sort of takes shape is you'll tend to want like larger learning rates at the beginning of your training process because your weights initially are just complete garbage, they're randomly initialized. And then over time as your model gets better, you want to reduce the learning rate as you kind of get more and more refined, make smaller and smaller tweaks to your model as it learns. That's a bit of an intuition pump for this.

So in this case, they were interested in kind of crossing over between different kinds of problems. And what they found was, the larger the model was, the smaller you want your

learning rate to be. And this is kind of interesting. I mean, maybe some intuition for this is, you know, if you have a large number of degrees of freedom, you can kind of tweak them all; it just allows you to make more nuanced movements, whereas you need to make more significant movements when you have fewer degrees of freedom to learn the same thing, let's say.

Kind of an interesting result. The other one was apparently increasing the resolution of images has a similar computational cost to increasing model size, which I found a little bit confusing at first. It actually does make sense. Ultimately,

So when you increase model size, the reason that it costs you more compute, obviously, is you have more parameters to dial in. So you just have more moving parts you need to refine every time you pass a batch of training data through, and more to compute in

the forward pass as well. But the issue here is, if you get a bigger image, that will be associated with an encoding that has just more moving parts as well, essentially like having more tokens that your model has to go over. And so, sure, you may not be

using a larger model to process those inputs, but a larger image still involves more computations because it's essentially like having more, well, it's more data. It sounds pretty intuitive, at least when you put it that way. So you've got two different ways to increase the compute spend on your problem set. You can keep resolution fixed but increase the model size, right? So you could go from your 3 billion parameter model to the 10 billion parameter model. Or you could keep the model size fixed

but increase resolution. And depending on the regime you're in, it can actually be more efficient to do one or the other. They find that one or the other is more or less compute optimal depending on the task. So I thought that was kind of interesting. And again, more for the scaling literature that I'm sure we'll come back to later.

Yeah, this is one I found quite interesting. They found basically three groups of tasks. One group where the two were about the same in terms of the improvement they gave, and this is actually the majority of the tasks. For things like segmentation, for instance, they found that making the model bigger or increasing resolution were both pretty effective. But there were examples like TextVQA

where the resolution really helped more, or DocVQA, for instance, which makes some sense, right? If you need to read text, probably higher resolution helps quite a bit. And then they do have other examples like ScienceQA, for instance, where maybe because the model is bigger, it can better answer scientific questions and has more information in it. So certainly something I haven't seen before, and an interesting result from this paper.
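To give a rough sense of why both knobs cost compute, here is a small sketch using the standard approximation that a transformer forward pass costs about 2 FLOPs per parameter per token, plus a hypothetical ViT-style patch count; the patch size and resolutions here are illustrative, not PaliGemma's exact configuration.

```python
# Illustrative only: the "2 * params * tokens" rule is a common rough approximation,
# and the patch counts below are hypothetical, not PaliGemma's exact setup.

def forward_flops(n_params: float, n_tokens: int) -> float:
    """Rough transformer forward-pass cost: ~2 FLOPs per parameter per token."""
    return 2.0 * n_params * n_tokens

def image_tokens(resolution: int, patch_size: int = 14) -> int:
    """ViT-style tokenization: one token per (patch_size x patch_size) patch."""
    return (resolution // patch_size) ** 2

base         = forward_flops(3e9,  image_tokens(224))  # 3B model, 224px input
bigger_model = forward_flops(10e9, image_tokens(224))  # ~3.3x more parameters
bigger_image = forward_flops(3e9,  image_tokens(448))  # 2x resolution -> ~4x tokens

print(f"3B  @ 224px: {base:.2e} FLOPs")
print(f"10B @ 224px: {bigger_model:.2e} FLOPs ({bigger_model / base:.1f}x)")
print(f"3B  @ 448px: {bigger_image:.2e} FLOPs ({bigger_image / base:.1f}x)")
```

Which knob is the better buy then depends on the task, which is exactly the three-way split described above.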

And speaking of papers, moving on to the research and advancements section, and we begin with a pretty cool paper, Training Large Language Models to Reason in a Continuous Latent Space. So in the reasoning paradigm in things like O1, what you've been seeing in general is that often the way to reason is you literally tell the model to, sort of,

Think through a set of steps that you'd need to do to solve this problem. Then execute each of these steps. In some cases, you review your answer and then iterate on your answer, see if there's anything wrong with it, etc.

And all of that is done via outputting text, right? And feeding that text back to the model. So what this paper proposes is a new reasoning paradigm they call Coconut, which takes the hidden state, the non-text, the very soup of numbers that somehow encodes meaning inside a large language model,

and then feeds that into the model as the reasoning step, instead of first converting that hidden state into words. And so they call this continuous thought, because these numbers are sort of a continuous representation of what would become text, which is discrete; there's, like, a fixed set of letters that you can choose from.

So this approach has a lot of benefits. You can explore multiple reasoning paths. And you don't need to decode; going from representation to text via decoding is one of the very costly operations with LLMs. So this certainly kind of

augments your ability to do things like chain of thought reasoning. And they do show in experiments that this outperforms chain of thought in logical reasoning tasks with fewer tokens during inference.

Yeah, this is, for my money, really the paper of the week by far. In fact, the story of the week by far. The implications here are really, really wide-ranging. And I think this is going to be rolled into, if it's not already, frankly, training schemes that we see for agentic systems very soon.

So, basic frame here is, you know, your text-based reasoning that you see, as you said, with chain of thought, right, where the model will explicitly write out its own chain of thought and it will use its own chain of thought to help guide it towards more optimal solutions. That approach is not ideal. It's not the best way for these models to reason. Most tokens are used for things like textual coherence, grammar, things like that, not essential for reasoning.

By contrast, you have some tokens that really require a lot of thought and complex planning, right? Think about, for example, a sentence like the best next move on this chessboard is blank, right? That blank, like you'd want your model to really think hard about what that next piece is, that next token is. But with current approaches, you're basically spending the same amount of compute on that word as you are on the word the, right? Which is not terribly informative. So

That's kind of an interesting intuition pump for why you might want another approach, an approach that doesn't involve explicitly laying things out in plain English. So what they are doing is, yeah, if you imagine your transformer, you feed in your prompt, your input tokens.

And those prompts get turned into, well, they get turned into embeddings, basically just a list of numbers that represents that initial prompt. And then that list of numbers gets chewed on, gets multiplied essentially by matrices all the way down until you get a final sort of vector, a final list of numbers that's been chewed on a whole hell of a lot. And that final list of numbers, the last hidden state, is what normally gets decoded into an output token:

an actual word that you can interpret and understand. But what they're going to do here is they'll take that last hidden state, and instead of decoding it, they're going to turn around and feed it back into the model, at the very bottom, in the position of the input embedding, and have it go through yet again. And what they're doing...

is essentially causing the model to chew on that token again. That's one way of thinking about it, essentially. But the way they're going to train the model is by using a chain of thought data set. So what they do is they start by saying, okay, you know, imagine that you have a chain of thought that has like, okay, I'll start by solving this problem. Step one, I'm going to do this. Step two, I'm going to do this. Step three, I'm going to do this and so on.

And what they're going to do in the training process is they'll use that expensive to collect data set, that chain of thought data set. And they'll start by kind of causing...

So, sorry, let me take a step back. When this model generates its last hidden state, right, when you finish propagating your data through, you have your last hidden state. Normally, you would decode to an output token, but now you're, again, feeding it back into the bottom of the model. Well, you still need a token that gets spat out just for your model to be coherent, essentially, anyway, to respect the fact that it's an autoregressive model.

And so what they're going to do is essentially put out, like, a thought token in that position as the output. Now, you can essentially decide how many thought tokens you want between your input and your answer. And that allows you to control how much thought, how much inference-time compute, your model is investing in generating that answer, in a very interesting and fairly objective way. So, number one, that's a really interesting way of quantifying,

semi-objectively, the amount of compute that goes into your inference-time strategy, right? So we haven't seen stuff like that before. I think it's exciting for that reason in part. But the other thing that makes it interesting is the training process. So that thought token that you're spitting out, what they do is they'll take their data set, their chain of thought data set,

And they'll kind of like blank out one step at first, like say step one, and they'll try to get the model to just replace it with a thought token. So not actually spit out the step one reasoning.

And then they'll keep step two and step three; they'll allow the model to reason in plain English for those steps. And then in a later round of training, they'll replace step two, and then step three. So it's sort of this iterative process where you're getting the model to reason in latent space for more and more of the problem set, and that allows you to kind of, anyway, get the model to converge in a more robust way.
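To make the latent feedback loop concrete, here is a minimal sketch of the idea, assuming a Hugging Face-style GPT-2 with a `.transformer` body and `.lm_head`; this is an illustrative reconstruction, not the authors' actual code, and it omits KV caching and batching for clarity.

```python
import torch

def coconut_style_generate(model, input_embeds, num_thoughts: int, max_new_tokens: int):
    """Sketch of latent ("continuous thought") reasoning, not the paper's implementation.

    model: a Hugging Face-style GPT-2 (has .transformer, .lm_head, and input embeddings)
    input_embeds: (1, seq_len, d_model) embedded prompt
    """
    embeds = input_embeds

    # Latent phase: instead of decoding, feed the last hidden state straight back in
    # as the next input embedding. Each pass is one "continuous thought".
    for _ in range(num_thoughts):
        hidden = model.transformer(inputs_embeds=embeds).last_hidden_state
        thought = hidden[:, -1:, :]                      # (1, 1, d_model)
        embeds = torch.cat([embeds, thought], dim=1)

    # Answer phase: switch back to ordinary autoregressive decoding into text tokens.
    generated = []
    embed_table = model.transformer.get_input_embeddings()
    for _ in range(max_new_tokens):
        hidden = model.transformer(inputs_embeds=embeds).last_hidden_state
        logits = model.lm_head(hidden[:, -1, :])
        next_id = logits.argmax(dim=-1)                  # greedy decoding, for simplicity
        generated.append(next_id.item())
        next_embed = embed_table(next_id).unsqueeze(1)   # (1, 1, d_model)
        embeds = torch.cat([embeds, next_embed], dim=1)
    return generated
```

The number of latent passes, `num_thoughts`, is the knob described above: a semi-objective measure of how much inference-time compute goes into "thinking" before any text is produced.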

Last thing I'm going to say, and there's so much here: if you're going to read a paper cover to cover this quarter, make it this paper. This is really a really, really important paper. One of the key things they point out is that in traditional chain of thought, when the model is about to generate a token, right, one piece of text, the last hidden state that actually gets decoded into that token

actually encodes a probability distribution over tokens. When you force it to actually decode, to give you one token, you're kind of telling it, look, I know that you think the solution could start with any one of a dozen different possible tokens; I'm going to force you to just pick the one you think is most likely.

Now, in that process, what you're really doing is you're destroying all of the potentialities that the model was considering exploring. It was essentially in the state where, you know, as you might be, if you're thinking about solving a problem, you might be like, well, you know, my approach might involve this strategy or this strategy. I'm not really sure which one to try first, but then it's basically forcing you to go, OK, I'm going to commit to this one.

And once it commits, once it actually decodes that token in a conventional chain of thought strategy, then it cuts off all those other possibilities. It ceases, essentially, to explore the full space of possible solutions, and it gets locked in. And then in the next stage, when it's going through this process of producing the next token in the sequence, again, it's going to go, okay, well, sure, I'm locked in on the first token, but for the second token there's a wide range

of potentialities I could explore, and again, it'll be forced to lock in. This is really interesting because by keeping the reasoning in that latent space, by keeping the last hidden state and not decoding it, you're allowing the model to simultaneously consider and explore a bunch of different strategies. All the different possibilities that come from token one then get compounded with the possibilities that come from token two, without being interrupted by

the sort of collapse of the solution into one possible mode. And there are all kinds of implications for that. They do a great analysis of how you can actually then view this process as a sort of tree, right, sort of like a tree, a mesh, a network of possible solutions that are being explored at the same time, and how you can use that in turn to measure how effective the reasoning is. This is a paper to read. This is a paper to look into deeply.

It is, by the way, kind of interesting that they use a pre-trained version of GPT-2 as the base model for all these experiments.

There are a whole bunch of reasons, I think, that we should suspect that this will greatly improve with scaling. You know, GPT-2 obviously is a tiny, tiny model, but the kinds of things that we're looking at here, exploring many different paths, right, things that look like chain of thought, these are all things that we've seen improve a lot with scale. So I think that, anyway, I think this is a really, really big deal for a lot of reasons, and I wish we could do a whole episode on it.

Yeah, and there's even more to say. There's quite a lot going on here. So an interesting thing going on, for instance, is that ideally the model could just be trained directly. You know, if you just do optimization, you'd hope it would learn to do this, which is, I guess, a similar notion to recurrent models, right? Yeah.

You feed the output back to itself and it gets better every time. In practice, they did find that you need to do curriculum training. So there's a special training regime that does this, training with kind of a variety of objectives over time.

They compare actually also to another paper called ICOT, which is internalized chain of thought reasoning. So there's kind of another paradigm, which is from earlier this year, where instead of doing this, which is essentially taking chain of thought reasoning and training a model to do chain of thought reasoning well in this continuous space,

You can instead try to optimize the model to sort of do the chain of thought reasoning implicitly. You train it to be able to output the same answer it would give

if it had done chain of thought reasoning, without outputting and decoding that chain of thought. That also works pretty well, as you might expect. And that is one thing that you could actually combine here; they say that this is a future avenue of research. Maybe you can do both. You could optimize the model to implicitly do chain of thought reasoning and also empower it to do, I guess, additional

continuous chain of thought reasoning. And they do show that this technique works better than implicit chain of thought reasoning. But, you know, they're both pretty strong techniques here. Yeah, lots we could go into, but probably we don't have too much time. So we'll have to leave it at that.
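As a rough illustration of the curriculum training just mentioned, here is a sketch of how a chain-of-thought example might be progressively converted into latent-thought training targets, stage by stage. The `<thought>` marker and the schedule are placeholders for illustration, not the paper's exact format.

```python
# Illustrative curriculum construction for latent reasoning training (not the paper's code).
# THOUGHT is a placeholder marker; in the real setup, the "content" at these positions is
# the model's own hidden state rather than any piece of text.

THOUGHT = "<thought>"

def curriculum_example(question: str, cot_steps: list[str], answer: str, stage: int) -> str:
    """At stage k, the first k chain-of-thought steps are replaced by latent thoughts,
    while the remaining steps stay as plain-text supervision."""
    k = min(stage, len(cot_steps))
    return " ".join([question] + [THOUGHT] * k + cot_steps[k:] + [answer])

steps = ["Step 1: compute 12 * 3 = 36.", "Step 2: add 4 to get 40."]
for stage in range(3):
    print(f"stage {stage}: {curriculum_example('What is 12 * 3 + 4?', steps, 'Answer: 40.', stage)}")
```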

Next paper, also a pretty notable one, I think, from this past week. The title is An Evolved Universal Transformer Memory. This is from Asana, a startup that has made some waves

Oh man, I'm showing the tool set that I use in my engineering day-to-day. Sakana, yes, which was started by some researchers who were quite experienced in the evolutionary area of optimization, where you don't do gradient descent; you instead do this non-differentiable kind of optimization, which, again,

I guess, I don't know if it's too technical, but basically you can optimize for things that you can't with the usual way that neural nets are trained. And they find that this is an example where you can train this neural attention memory model that is optimized to decide which tokens are worth keeping around, essentially, in long context use cases where you have

very long inputs and you need to essentially do a sort of memory, a working-memory type thing, within the transformer. Usually this is sort of trained implicitly, I suppose, by just doing the usual training with long context inputs. Here they

optimize this technique to focus on the most relevant information at individual layers throughout the neural net, and that improves performance across various long context benchmarks. And this can be combined with any

existing large language model. Yeah, it's an interesting approach, and definitely more inductive priors that they're adding into the stack here. So basically, this is like, your attention layers...

are going to look at your input and determine, okay, which tokens should the model base its answer on the most, assign a higher attention value to those, and move on. And there are a couple of issues with that. First of all, you end up having these massive KV caches, basically the caches that hold the data that you need to calculate those attention values, and then the attention values themselves. And the

problem is, not all tokens are equally important. Some of them can be thrown away, and you're just taking up a whole ton of memory with a bunch of useless stuff that you don't really need to retain. And so the question here that we're trying to answer is, can we build a model that selectively determines and throws away the kind of unnecessary token data in the KV cache?

And that's really interesting. They're going to do this with an ancillary model. They're going to use an evolutionary computing approach. It's a really kind of interesting game plan. The general intuition behind the workflow here is in part,

They're going to use Fourier analysis. So this is essentially the study of, let's say, decomposing signals into wave-like patterns. What this is often used to do is just identify repeating periodic patterns that appear in some input. They're going to apply that to the attention values in the input sequence that you're analyzing.

And so you might wonder like, hey, why do that? Well, it just because there are patterns that could appear in those attention values. And those patterns make the attention vector more compressible, right? Anytime there's a pattern, you can compress a thing because it means that like a pattern by definition is a repetitive thing where if you have just part of it, you can reconstruct the rest.

So this is exactly the strategy they'll use. And it's all about figuring out, you know, how can I compress this to throw away data that I don't need, based on the token's frequency patterns, how it's being used, the token's position in the sequence, and how it relates to other tokens through backwards attention. And that's its own separate thing. So,

typically, when you train an autoregressive model, what you're doing is you're looking backwards. For the current token that you're trying to predict, you get to base that prediction on all the tokens that come before, but not on the tokens that come after. And you actually do know what those tokens will be, because during training you're typically pulling this from an existing sentence that has been completed.

But the problem is that often the tokens that come ahead actually do have relevance to the current prediction. Anyway, they set up a backward attention mechanism that lets earlier tokens look at later ones and get information from those as part of this whole scheme. So

Anyway, it is really interesting. I think it's another one of those papers that you'll want to dive into if this is your space of interest. But it's another way to tack on more complexity and give the compute more work to do. And I think that's a very promising path. Right. And it's also an interesting kind of paradigm in that you're sort of creating this additional module on top of a pre-trained model. So there's kind of a base model. You can take Llama 3, for instance,

and train this whole other thing that kind of operates kind of independently or kind of adds itself to the middle in a way. And if you do that, it can sort of be transferred onto other models, other large language models without retraining on them.

And there's various benchmark numbers they give. The highlight is on very long context benchmarks like InfiniteBench. It does appear to help a lot. And I think this is one of these areas where there's been a lot of progress, but it isn't quite solved per se, these kind of long context things. So this could be very significant.
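To give a feel for the mechanism, here is a toy sketch of KV-cache pruning: score each cached token from features of the attention it has been receiving (a crude Fourier-based summary here, standing in for the learned, evolution-trained scorer in the paper) and drop the lowest-scoring entries. Everything in this snippet is illustrative, not Sakana's actual NAMM code.

```python
import numpy as np

def prune_kv_cache(keys, values, attn_history, keep_fraction=0.5):
    """Toy stand-in for a learned memory model that trims a KV cache.

    keys, values: (seq_len, d) arrays for one attention head's cache.
    attn_history: (num_recent_queries, seq_len) attention weights each cached
                  token received over recent decoding steps.
    """
    # Crude feature: a spectral summary of each token's attention pattern plus its
    # average attention, standing in for the paper's evolved neural scorer.
    spectrum = np.abs(np.fft.rfft(attn_history, axis=0))      # (num_freqs, seq_len)
    scores = spectrum.mean(axis=0) + attn_history.mean(axis=0)

    keep = max(1, int(keep_fraction * keys.shape[0]))
    kept_idx = np.sort(np.argsort(scores)[-keep:])            # retain top-scoring tokens, in order
    return keys[kept_idx], values[kept_idx], kept_idx

# Tiny usage example with random data
rng = np.random.default_rng(0)
keys, values = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
attn = rng.random(size=(6, 16))
_, _, kept = prune_kv_cache(keys, values, attn, keep_fraction=0.25)
print("kept token positions:", kept)
```

The transferability mentioned above comes from the fact that the scorer only looks at attention values, which every transformer layer produces in the same format, so the same module can be bolted onto a different base model.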

On to the lightning round. We begin with APOLLO: SGD-like Memory, AdamW-Level Performance. So, a bit technical, but we'll try to keep it understandable, I suppose. So when you train a neural net, you're basically doing

gradient descent, and there's a specific version, stochastic gradient descent, where you're sampling parts of the data. That's the fundamental way to optimize a neural net: you compute the gradients that you get from the errors on the outputs, and you backpropagate them.

Well, there is a bunch of other detail you can add on top of that with optimization. And Adam is usually the optimizer people use. What that optimizer does is add a sort of memory over recent optimization rounds. And that allows you to know how big a step you should take, the learning rate, across different weights.

Now, the thing is, that makes you perform better, but it requires you to store that information from previous backpropagation rounds to compute the updated learning rates. So the gist of this paper, as per the title, SGD-like memory, AdamW-level performance, is that they have this approximate gradient scaling for memory-efficient LLM optimization, APOLLO, which approximates the learning rate scaling

using some fancy stuff that allows you to get away from all the storage required by Adam. Yeah, I think that's a pretty good gist of it. Yeah, yeah. So I think there's a whole set of papers in this category recently, and I think for deeply, deeply strategic reasons, right? The big, big question right now is, how do we scale AI training across a large number of different,

like distributed, geographically distributed training clusters. The reason is it's really hard to find a concentration of power of energy in one geographic location that will allow you to build one data center that's like, you know, a gigawatt or a 10 gigawatt data center, as we discussed earlier. So as a result, there's all this interest in how can we set up distributed training schemes that require, like,

Like, less data moving around between different data centers across long distances. And so now essentially we're interested in, can we compress, can we reduce the amount of data that we need to pass back and forth across a system like this? So enter the problem, right? AdamW, this is the optimizer that's typically used today to train models at scale, or one of them.

And the way this works is for a given parameter in your neural network, the training scheme will remember, okay, there's this much of an update that we need to make this parameter now.

How much of an update did I have to make last time, and the time before? And if all those updates kind of point in the same direction, it suggests there's a lot of momentum heading in that direction. So, you know, if they always say to increase this parameter value very significantly, well, maybe that means you ought to really ratchet up that parameter value quite significantly. Apply a bigger learning rate, essentially, right? Move it, make the update bigger. And then conversely, if you find there's less momentum. That's the basic premise, but...

I just listed three different numbers that you need to remember for that particular parameter: you've got to remember the current update, the update last round, and the round before that, right? So together that's like three times the model size in optimizer state memory that you need to keep and pass around and all that stuff. So the goal here is going to be to say, okay, well, instead of focusing on literally every single parameter

in my model, could I, for example, zero in on one chunk of the network, what they call one channel, essentially, groups of parameters that tend to behave similarly, and just have a single scaling factor, a single learning rate, if you will, for that chunk of parameters?

And that way, you know, I can divide the amount of data I need to remember by the number of parameters in that chunk. And they're going to show that that in fact does work. It also applies to entire layers of the transformer; they do that in the tensor-wise

compression, which they apply here. And anyway, so it's really, really interesting. The way they do this is kind of trippy. We're not going to get into it, but this thing called random projections is used, which, by the way, Andrey, mathematically, this still blows my mind. You have a random matrix, you multiply it by the parameter update matrix, and you can get out a smaller matrix, depending on the dimensions of the random matrix. But

that smaller matrix will preserve some critical mathematical properties of the original matrix, even though you're multiplying by something random. It doesn't matter. It's the Johnson-Lindenstrauss lemma, which is, this is the first time I ran into it, holy shit, makes no sense. Random projections, what the fuck? Cool paper, nice work. That's it. Yeah, fun fact, there's a whole area of research, or there at least was, where you could do random projections for hidden layers

in a neural net. Usually you update all the weights in your neural net. Well,

you can actually just randomly initialize a bunch of them, and that still helps you, which is another one of these properties that's really curious. And real quick, I do like to do this every once in a while. This paper is a collaboration between the University of Texas at Austin and AI at Meta. So I think there's been a lot of worry in recent years about universities

not kind of being able to do useful research, essentially because you do need these crazy amounts of compute. Often it's been the case that people from grad school have interned at these large organizations like Meta or Google and did some work there. I think this is another example where even if you don't have massive compute necessarily, or you have limited compute, you can do some really good, useful research.
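For the random-projection point above, here is a small numpy sketch in the spirit of the Johnson-Lindenstrauss lemma: multiplying a "gradient-like" matrix by a random Gaussian matrix shrinks it dramatically while roughly preserving the distances between its columns. The dimensions and scaling are arbitrary choices for illustration, and this is not APOLLO's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(42)

d, n, r = 4096, 64, 256           # original dim, number of vectors, projected dim
G = rng.normal(size=(d, n))       # stand-in for a gradient / parameter-update matrix

# Random Gaussian projection, scaled so squared norms are preserved in expectation.
P = rng.normal(size=(r, d)) / np.sqrt(r)
G_small = P @ G                   # (r, n): 16x smaller here, cheaper to store or communicate

def pairwise_dists(X):
    """Euclidean distances between the columns of X."""
    sq = (X * X).sum(axis=0)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0))

orig, proj = pairwise_dists(G), pairwise_dists(G_small)
mask = orig > 0
ratios = proj[mask] / orig[mask]
print(f"distance ratios after projection: mean={ratios.mean():.3f}, "
      f"min={ratios.min():.3f}, max={ratios.max():.3f}")
# The ratios cluster near 1.0: the geometry survives being squashed by a random matrix.
```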

All righty, one last paper or research work. This is from Anthropic, and they call it

Clio, a system for privacy-preserving insights into real-world AI use. So the idea here is you have a bunch of people using Claude. Presumably you want to be able to understand how they are using it. Like, are they using it for coding? Are they using it for learning, et cetera, et cetera.

So this is essentially a framework that automates anonymizing and aggregating data, creating topic clusters from all these conversations without exposing any private information. Because if you're looking at conversations, someone might be sharing, say, some medical information, and you do not want to expose that as a specific thing that a given person is talking about.

So this is a technique to discover usage patterns. They revealed some interesting things like, for instance, over 10% of conversations are focused on web and mobile application development. Educational purposes and business strategy discussions are also prominent with 7% and 6% respectively.

And yeah, I think there was a lot of interesting stuff here. This is one of the things that you presumably definitely need as an LLM developer, to know what people are using your LLM for. And yeah, this would allow Anthropic to fine-tune their model effectively and also improve safety measures by identifying potential policy violations and coordinated misuse.
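As a very rough illustration of what a privacy-preserving usage-clustering pipeline can look like, here is a sketch that scrubs obvious identifiers, embeds conversations, clusters them, and only reports aggregate cluster sizes above a threshold. This is a generic toy, not Anthropic's actual Clio system, which reportedly uses Claude itself for summarization and facet extraction.

```python
import re
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def scrub(text: str) -> str:
    """Crude redaction of obvious identifiers; a real system would go much further."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", text)
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", "<PHONE>", text)
    return text

def usage_clusters(conversations, n_clusters=3, min_cluster_size=2):
    docs = [scrub(c) for c in conversations]
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    sizes = Counter(labels)
    # Only report clusters above a minimum size, so the aggregate output
    # can't be traced back to any single conversation.
    return {int(c): n for c, n in sizes.items() if n >= min_cluster_size}

convs = [
    "Help me debug this React component, the state update is not rendering",
    "Write a unit test for my Flask endpoint returning JSON",
    "Explain photosynthesis for a 9th grade biology class",
    "Draft a go-to-market strategy for a B2B SaaS product, email me at a@b.co",
    "Summarize the causes of World War I for my history homework",
    "Fix this TypeScript type error in my Express handler",
]
print(usage_clusters(convs))
```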

All right, moving right along, we have policy and safety next. The first story is a little bit of a dark one, but I do think important to cover. It has to do with Character.ai, which, quick recap, Character.ai is a chatbot platform, a very popular one, where people spend a lot of time talking to artificial intelligence characters.

In recent months, they've had two controversies and lawsuits. One where...

a teenager who was very obsessed with Character.AI ended their own life, seemingly, or supposedly, due in part to the influence of Character.AI, which was quite tragic. The parents say that Character.AI was partially at fault. There was another incident as well involving harmful behavior that Character.AI may have encouraged.

So, Character.AI is now stepping up teen safety. They are introducing a special model, a teen model, that aims to guide interactions away from sensitive content and reduce the likelihood of users encountering or prompting inappropriate responses. There are also classifiers to filter sensitive content and improve detection and intervention for user inputs. And I think this is one of these things that,

specifically for Character.AI, is very important, but also in general, as you see more and more people interacting with AI, and interacting in more and more kind of intimate or human-like ways, it's inevitable that you'll see more stories of this sort, where a person was maybe erroneously encouraged to do something bad or maybe erroneously, you know,

motivated in a way that probably shouldn't have been the case. This is another kind of area of AI safety that maybe hasn't been explored too much, like the psychological influence that AI models may have over people. So, a very real example of that already happening in the real world, and this company in particular needing to tackle that.

Yeah. Yeah. And I mean, you know, this is one of those areas where you might expect some regulation pretty soon. I would imagine, you know, congressmen have kids who use these tools, and so I'd expect them to be fairly sensitive to this. There's also just the challenge of looking at children as a...

How would you put it? Like canaries in a coal mine for adults, right? Like, you're talking about autistic teens now, but as these systems get more persuasive, we have some really fundamental questions to ask about where the interaction of any human being with an AI system goes. And in a world where you can be convinced

of a lot of things by the chatbots you interact with, you know, there's a long tail of humans at various stages of life who might find this stuff really compelling and be induced to do bad things as a result. So, really hard to know, yeah, where this all goes, but it's at least, you know, good that there's now pressure to move in this direction. There is

a notice that says you have to be 13 or older to create an account on Character.AI. And then they do say users under 18 receive a different experience on the platform, including a more conservative model to reduce the likelihood of encountering sensitive or suggestive content.

But age is self-reported. So, you know, I think it's an open question as to how effective those sorts of measures are. Going beyond that does require, though, thorny things like proof of, not necessarily proof of identity, but proof of age, at least at a more

compelling level. And so there are privacy implications there too. It's just a really hard problem to solve. Facebook ran into this early on when they were trying to prevent people under 13 years of age from using it, and other platforms have too. So, just a challenging problem and an unfortunate reality of the current state of chatbots. There's a bit more here worth mentioning as well. So this is

an issue partially of potential kind of encouragement of bad behavior. Another aspect of this is addiction, where in a lot of cases, especially in the cases here, the teenagers were

you could say obsessed or you could say addicted to talking to these AI characters, spending hours and hours talking to them. And this is coming, this announcement from Character AI is coming after, like almost immediately after another lawsuit being filed. So this was filed this past week. As you said, in this one, there was a case of a

17-year-old boy with high-functioning autism who was spending a very large amount of time talking to Character.AI and supposedly was encouraged to be violent towards his family, his parents, things like that. So

Another aspect of this is people might really get addicted and seek companionship and social support from AI in a way that isn't healthy. Another aspect that these sorts of platforms really need to start tackling, and as you say, regulation might need to address as well.
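Purely as an illustration of the kind of plumbing such measures involve, and not Character.AI's actual implementation, here is a sketch of routing self-reported minors to a more conservative model and gating inputs with a safety classifier; the model names and the stub classifier are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class User:
    user_id: str
    self_reported_age: int  # self-reported, which is exactly the weakness discussed above

def pick_model(user: User) -> str:
    # Hypothetical model names; real deployments would also need stronger age assurance.
    return "chat-model-teen" if user.self_reported_age < 18 else "chat-model-default"

def should_intervene(text: str, classifier, threshold: float = 0.8) -> bool:
    """`classifier` is any callable mapping text -> risk score in [0, 1];
    in practice this would be a trained safety model, not a keyword stub."""
    return classifier(text) >= threshold

# Toy usage with a stub classifier
risky_terms = ("self-harm", "violence")
stub_classifier = lambda t: 1.0 if any(w in t.lower() for w in risky_terms) else 0.0

user = User("u123", self_reported_age=16)
prompt = "I keep thinking about violence toward my parents"
if should_intervene(prompt, stub_classifier):
    print("intervene: surface resources and refuse, using", pick_model(user))
else:
    print("respond normally with", pick_model(user))
```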

And next story, now moving to policy. The title is What Trump's New AI and Crypto Czar, David Sacks, Means for the Tech Industry. So as that implies, the news is that there is going to be an AI and crypto czar, David Sacks.

This is a bit of a weird one. This is not sort of an official role. There's not going to be a Senate confirmation for this appointment. It's going to be a part-time role. He's going to keep his business position as someone who works in venture capital. David Sacks, for the record, is a pretty notable person, one of the hosts of

a very, very popular podcast called All In, and has been a big supporter of Trump. So what would this mean? Presumably, of course, very business-friendly approaches to AI and crypto, a very pro-industry approach. He has also expressed support for integrating AI into national security and defense, and

And with regards to crypto, just quickly mentioning it, it is also going to be the case that there's going to be relatively little regulation, let's say. Yeah, it's really hard to tell what the left and right bounds of this position are going to be. It doesn't fit the standard mold. And if you look at, for example, Department of Commerce,

They have a workflow that's just completely different, you know, and it doesn't have a way to interface with this position. And so you might naturally wonder, like, what is going on? I mean, this article

speculates, excuse me, that it may be more about relationships than the kind of conventional formal channels of influence over departments and agencies. But at the end of the day, this does mean that Sacks is going to be in the White House and influential, certainly on AI and crypto. One of the questions, too, is

How far does this extend into the national security sphere? I think that is probably the core question here. It seems very much as if, especially given the remit is AI and crypto, this seems like a very industry-focused thing when you start to think about, okay, but what about the national security risks associated with the tech?

Not that he won't be a voice. He presumably will be, but there are probably going to be other voices at the table as well. And finally, I mean, it highlights, as the article points out and as is quite apparent to people following the space, that there are two different camps in the White House right now. You've got the Marc Andreessen, sort of David Sacks camp of, hey, let's, like, you know,

get AI developed and basically, you know, who cares to some degree about the risks. That's a bit of a caricature; it's more like, you know, the benefits far, far outweigh the risks, so let's forge ahead. And then you've got Elon, who obviously has talked a lot about the risks from weaponization, loss of control, that sort of thing. So how exactly that all shakes out, that's going to be an interesting thing to track.

And the David Sacks appointment is, yeah, again, really hard to kind of pin down. And that may be the idea. You know, Trump may just want him around to offer input. I do notice his maximum number of working days a year is set to 130 because he will be a special government employee. So, part time.

And there's some controversy flagged here that that status doesn't require him to divest or publicly disclose his assets, which leads to potential conflicts of interest and things like that. So anyway, kind of an interesting one to watch, and we'll see. I mean, I think a lot of this has yet to take shape, and the administration probably doesn't even know what his role will be. They'll just sort of play it out and see where the influence ends up lying in practice.

On to the lightning round, now moving on to some research. We have a paper, Frontier AI Systems Have Surpassed the Self-Replicating Red Line. So self-replication is one of these concerns with regards to advanced AI. Essentially, if you have an advanced AI model, well, maybe you can keep it under lock and key, can keep it controlled, not allow it to access the web, et cetera.

But if the AI model is able to self-replicate, to make a copy of itself somewhere else and sort of unleash itself onto the world, that could lead to bad things if a model then decides that it should do something opposed to human interests.

So this is one of the kind of common things to watch out for that Anthropic, for instance, evaluates continuously whether the new generation of the models are capable of it. In this paper, they show that

In 50% and 90% of experimental trials with two models, Llama 3.1 70B and Alibaba's Qwen 2.5 72B respectively, they are able to succeed in creating a live and separate copy of themselves. What this means in practice is they are able to generate the code necessary to deploy another instance of themselves,

which in a way isn't necessarily surprising. I think if you've been working with these things, you know that if a model is meant to do it, and is sort of allowed to self-replicate, this is the sort of engineering that advanced models should be getting to the point of being able to do.

Yeah, not entirely surprising, but definitely worth taking a note of. And Jeremy, I'm sure you have more to say on this. Yeah, I mean, I've seen this paper circulating around quite a bit. There's, to your point, I think there's a bit of overhype on what exactly...

I don't want to call it overhype. I mean, it all depends on what you mean by self-replication. One of the first questions you might have looking at this, if you're tracking the space: you might remember the company METR that does these self-replication evals for OpenAI and other companies in the space. And they've consistently found with models to date that, yes, they are getting better, in some cases much better, at doing self-replication type stuff, but they're not really able to do it well and we're definitely not there quite yet.

So the question then becomes, what is the difference between this and the METR evals? And it comes down really to how you define self-replication. METR tends to look at deep replication that includes, like, model weights and architecture. This one is more about, as you said, the coarser kind of replication:

copying files, launching processes, like the engineering side of getting this done, which can still get you a working system. And to the extent that you care more about system automation than true deep AI replication, then the threat model actually kind of holds. But certainly, you know, there are other differences, like the METR evals tend to involve a more constrained environment, kind of less access to things like data,

you know, like, system commands, file operations, third-party tools. And they really worked a lot on their agent framework, the kind of wrapper that sits around the model to help, you know, turn it into an agent. METR, at least in its most recent tests, hasn't been able to do that much work on that piece. So,

I think, you know, it is notable. It's also mixed: especially some of their more concrete scenarios do come with a lot of hard-coded rather than native AI behavior, where they deliberately prompt the model to do certain things and then sort of marvel at the fact that it does those things, successfully, mind you, which is good. Or not good, but

which is a legitimate finding. But it remains true that this is with deliberate prompting. So essentially comes down to which threat models you're concerned with the most. If you're concerned with sort of like autonomous AI-driven kind of independent urge to self-replicate, this won't scratch that itch. That's a power-seeking measurement. That's not what's being measured here. What they're really looking at more is the capability dimension itself.

And that is, you know, again, if you're concerned with this general threat model, yep, this might be a modest update. But I don't think it's anything that, as you said, anyone's going to really be surprised by, on the capabilities at least, given that we've seen these models do similar things in other contexts.

That's right. Yeah. So it's one of these cases where you really should read beyond the headline, which sounds a bit serious, to the details. Next up, getting back to geopolitics, as we often touch on, the title of the article is Chip War: China Launches Antitrust Probe into NVIDIA in a Sign of Escalation.

So there's an investigation that focuses on Nvidia's $6.9 billion acquisition of Mellanox Technologies, with the claim that this might be violating China's anti-monopoly laws. A monopoly being, in case anyone doesn't know, though probably most people do, when you are a dominant player in some industry and are stifling competition.

So this deal happened back in 2020. It was approved by China but required Nvidia to supply products to China under fair and non-discriminatory terms. And as you might expect, this could be an aggressive measure by China to retaliate against US policies. And Nvidia's stock did take a hit, a 1.8% drop, following the announcement of the investigation, without anything regulatory even happening yet.

Yeah, I think this is, like, you know, a pretty standard CCP response to export controls hitting. Like, we've just had a tightening of export controls around, as we talked about last week, high-bandwidth memory, and some lithography equipment exports as well, things like that. So

China is going tit for tat. This is aligned as well with their restriction of exports of rare earth minerals. They're really looking for all the ways that they can try to frustrate American companies and American AI efforts. This is all part of the reason why the solution to this was always, and it wouldn't have been politically feasible, but the solution to this was always to clamp down once, hard and decisively, on

on Chinese exports, you know, back in like 2019, 2020, again, not politically feasible. But what we're doing is we're playing this kind of like losing game of whack-a-mole where you try to patch up one gap and then another appears. And then every time you incrementally increase the threshold of export controls, now the CCP is going to do a retaliatory action. So if you did something decisive early enough, you know, who knows, maybe you could have, you

obviated some of this. Then again, China has less to lose in that context. So that's really what we're getting at: the export controls are actually starting to work. We've seen a number of indications of that, and this is now really getting under their skin. They're also trying to posture ahead of the Trump administration, to make it seem like, if you come in with stronger sanctions, we're going to bite back even harder. That's a non-negligible concern, especially on rare earth exports, where the U.S. is just

terribly positioned. And that's a self-inflicted wound, but it can be fixed with the right deregulation and the right investment and focus. But this is sort of standard fare, something I'm sure the administration actually expected going in. And speaking of export regulation, the next story is about another territory that's been, let's say, unclear, a bit of a gray zone.

And it seems the U.S. has cleared the export of advanced AI chips to the UAE under a Microsoft deal. So there is a Microsoft-operated facility in the UAE as part of a partnership with G42, which we've covered previously. Microsoft has made some big investments there.

Microsoft invested $1.5 billion in G42, which gives it a minority stake and board seats, so it's pretty deeply invested in this UAE-based organization. And it has been a question mark what the US government's response would be, given that G42 also has some potential ties to China.

And so it seems there is now an export license. It does require Microsoft to restrict access to this UAE facility by personnel associated with nations under U.S. arms embargoes or on the U.S. entity list. So essentially, you get the export license, but you've still got to respect the restrictions that have been placed on China.

Yeah, apparently the license that's now been approved requires Microsoft to prevent access to its facility in the UAE by personnel who are from nations under U.S. arms embargoes or who are on the entity list, the famous entity list

maintained by BIS, the Bureau of Industry and Security at the Commerce Department. This is the list that contains companies like Huawei and YMTC, some of the big players in the Chinese ecosystem, and frankly it should contain a lot more. And frankly, it should probably be a whitelist and not a blacklist, but I digress. So right now, all these requirements are being added essentially to prevent that kind of access.

It's kind of interesting, right? If you know the world of policy and arms control policy, this starts to feel a little bit like a shade of ITAR, like they're starting to think about the next step. So ITAR, the International Traffic in Arms Regulations, is this sort of counterproliferation strategy,

essentially a policy that says: if I give you a special technology, you are only allowed to pass it on to other people who are ITAR-approved, if you will, and if you fail at that, then you're in big trouble, right? So the idea here is that they're sort of passing this on, but saying, hey, you can't pass it on to people who are on the entity list, who aren't screened in. It's kind of interesting because it is a step in that direction. I think one of the things that

you need to look at from a national security standpoint is officially classifying AI as a dual-use technology under ITAR, more advanced AI systems, that is, not the kind of general-purpose ones we have lying around today. So anyway,

All very interesting. The restrictions cover people physically in China, the Chinese government, or personnel working for any organization headquartered in China. So it's clear what's in the target zone here with respect to G42. And on to the last story, moving back to the US: the White House has created a task force on AI data center infrastructure, as Jeremie, you mentioned earlier in this episode. So

This is going to coordinate policy across the government and maintain U.S. leadership in AI technology, that's the party line. It will involve, of course, the Department of Energy, which will create an AI data center engagement team and share resources on repurposing closed coal sites, apparently.

The U.S. Army Corps of Engineers will also identify permits to expedite AI data center construction. And there's also some stuff here on industry exports. Yeah, this seems very much in line with what's needed to expedite and enable these very complicated data center builds.

Yeah, the big challenge here is that it's a whole-of-government issue. You have to have coordination across the Department of Energy and the Department of Commerce, and you've got increasing national security considerations to be brought in. So what you're essentially seeing is the government recognizing that and saying, oh, crap, we need to coordinate. So the National Economic Council and the National Security Council, by the way, these are councils that advise the president

on, you know, the issues of the day. On the National Security Council you have a bunch of usually fairly prominent national security people, and then the NSC staff does a lot of the key work. And so essentially they all coordinate together at the White House level

to solve problems like, hey, how do we deregulate in a strategic way? Presumably the Trump administration is going to be even more aggressive than this, especially on environmental deregulation and things like that, things that are blocking the development of these new builds, new power plants, the national security vetting of sites, and so on. So yeah, I think it's interesting and noteworthy that we're now at the point where this is becoming a White House priority.

And there's a lot of talk as well about getting the military, in various forms, to support here. That is, yeah, as you say, the Army Corps of Engineers, so now the DOD is involved. So yeah, a very wide-ranging effort here.

And that is it for this episode. We wound up with a bit of a long one; lots to talk about this last week. Thank you so much for listening, especially if you made it to the very end and are listening to me now. That's impressive, you made it through an entire episode. As you probably already know, you can find the links to the articles in the episode description. You can also go to lastweekinai.com for that, or lastweekin.ai, where you get the text newsletter.

As always, we appreciate any comments, any feedback. We try and read those, even if we don't mention them on the show. And we do appreciate reviews. You know, getting those five stars always feels nice. But more than anything, we do appreciate people listening. So do make sure to keep tuning in and enjoy the AI outro song. ♪

Gather 'round, it's time to cheer Episode 193 is here OpenAI's Sora ships, ah, what a sight Gemini 2 shines bright in the starry night AI agents surf the web Browsing all so free New discoveries and joys in the world of AI This festive season, let's raise a joyous cry

Stories and foretells of progress made in the AI realm where wonders cascade. Ship mass, pre-stored, to coders and visa-like. AI in every corner, changing day to night. Gemini just playing science off the tech sphere. With every leap and bound, future grows near. AI agents learning with each click and scroll. Very ringing narratives bringing us all.

AI agents surf the web, browsing all so free New discoveries and joys in the world of AI This festive season, let's raise a joyous cry! AI

Each pilot circuit we find our way AI brings light to every new day As snowflakes fall, algorithms dance with glee Mapping our futures for the world to see Sing a lullaby for our digital nights Steering the path with luminescent sights Let the joy of progress spring throughout the land AI, I'm thriving like a well-registered