cover of episode #198 - DeepSeek R1 & Janus, Qwen1M & 2.5VL, OpenAI Agents

#198 - DeepSeek R1 & Janus, Qwen1M & 2.5VL, OpenAI Agents

2025/2/2

Last Week in AI

People
Andrey Kurenkov
Jeremie Harris
Topics
@Andrey Kurenkov: I think the predictions we made about DeepSeek v3 on an earlier episode were accurate, and the DeepSeek R1 results are not surprising. DeepSeek R1 is a language model competitive with OpenAI's O1, and its strength is reasoning. The model was trained with reinforcement learning and achieved impressive results. The release of DeepSeek R1 triggered sharp swings in US tech stocks, reflecting both the market's worries and its expectations about where AI technology is heading. However, I think the market has misread R1's implications for NVIDIA; it is actually bullish for NVIDIA's hardware ecosystem. DeepSeek R1 is released under the permissive MIT license, which favors commercial and research use. DeepSeek also released Janus Pro, a strong open-source text-to-image model. These releases show that DeepSeek, as a lab, is having a major impact on open-source AI. @Jeremie Harris: DeepSeek V3 is a strong base model, and reinforcement learning optimization alone can bring it to a level comparable to GPT-4. People have misread what DeepSeek R1 means for hardware; it is actually bullish for NVIDIA's hardware ecosystem. Simply rewarding the model for correct answers is enough to meaningfully improve the reasoning ability of large language models, which demonstrates the power of reinforcement learning. Through reinforcement learning, deep learning models can independently discover and exploit inference-time scaling laws, which suggests these laws are an intrinsic property of AI systems. Models will naturally adopt reasoning styles more efficient than human ones; human interpretability is just an extra constraint imposed on the model. DeepSeek R1 is the model for practical use, while R1-Zero shows the future potential of reinforcement learning. DeepSeek showed that performance comparable to OpenAI's O1 can be achieved at much lower cost, which is good news for NVIDIA. DeepSeek's success highlights the importance of compute in AI development and further underscores the need for export controls. DeepSeek's mobile app reached number one on the Google Play Store, showing how much attention its models have received. DeepSeek's success does not change the central role of compute in AI development; compute will remain the key determinant of AI competitiveness.

Transcript

Hello and welcome to the Last Week in AI podcast where you can hear us chat about what's going on with AI. As usual in this episode, we will summarize and discuss some of last week's most interesting AI news. And you can also check out our Last Week in AI newsletter with even more news stories at lastweekin.ai.

I'm one of your hosts, Andrey Kurenkov. My background is having studied AI in grad school, and now I work at a generative AI startup. And I'm your other host, Jeremie Harris. I'm the co-founder of Gladstone AI. It's an AI national security company. And I will say, before we got started,

Andre was a real champ. Just this week has been crazy. And then the week before was crazy. And we didn't cover the week before. So now we're doing two weeks. And I was just like at the last minute, hey, dude, like I have 20 minutes less than I normally would for this. And, you know, sometimes we each go back and forth with different constraints and we're trying to figure it out. But this week, it's me. I apologize. And he was very kind and started pruning off a couple stories, which maybe we'll cover later. But like, there's so much shit. This is such a hard deal. It's going to be a dense episode for sure.

As people might expect, we'll be talking about DeepSeek quite a lot, but there's other stuff on the business front, on the policy front, and I'm sure, Jeremie, part of why you've been crazy busy is that there is a new administration in the United States that is making some moves, you know? Anyway, there's quite a lot going on, so we will just dive deep into it in a bit. I will just say...

Give a high level preview. We're going to start with projects and open source, unlike we do usually. So we're going to start right away with DeepSeek R1, then talk about some Qwen models and some other models. We're going to cover tools and apps, also related to DeepSeek and Qwen actually, and some other stories about Perplexity. Applications and business as usual. There's updates about OpenAI. That seems to be like half of the news that we cover in some cases. And

And Microsoft and DeepSeek. We're going to mostly skip research this week because we are going to dive deep on DeepSeek. And then we're going to have some policy and safety stories related to a new administration and our usual sort of geopolitics in that section. And can I just say also YouTube viewers may note Andre's teeth are looking pretty good. That may sound weird. If this is the first episode of this podcast you're listening to. Yeah.

Then you think I'm a weirdo. If it's not, you know I'm a weirdo. But, but, but congratulations. I guess your surgery went well? Is everything... Yes, yes, I have fully recovered from my unfortunate New Year's event. And I'm glad, uh, thank you for noticing. And speaking of listeners, one last thing before we get to the news. I do want to quickly acknowledge some listener comments and corrections. I noticed a fun bit of feedback actually just posted recently on

Apple Podcasts. We got a three-star review that said that we have consistently bro quality. We are status quo young Silicon Valley bros, always behind the curve, but constantly cheerleading. Case in point, the ironic hardware episode right before DeepSeek R1. So,

An interesting take. Thanks for the feedback. I will say on this point, I went back and re-listened to our episode where we covered DeepSeek v3, which was towards the start of January. And Jeremy, you need to get some credit because I think at the time you called it as a

gigantic, huge deal. We went deep on the technical details of how they were able to train this very efficiently. And all this news about it costing $6 million, et cetera, et cetera, that's not even R1, right? That's going back to DeepSeek v3, which we did cover. So anyway, I'm just going to point that out.

Thank you, Andre. I mean, my goodness. But yeah, no, it's, I will say, we'll talk about this when we get to R1 and R1-Zero and all that jazz. But in a way, I mean, and if you listen to the first podcast that we did on V3, when it came out, you're probably not that surprised by R1 and R1-Zero, right? I mean, the way we were talking about it then, I think it made it clear, this thing had, it was a base model that had all the potential of, you know, GPT-4o to give R1. And, you know,

to the extent that you have a good base model, all you really need is that optimization routine for RL and so on, which is really what popped. So

in some ways, super consequential, in some ways, not too surprising. First of all, status quo, young Silicon Valley bros. I love that. I'm going to have a t-shirt made with that on it. That's awesome. The case in point, so the ironic hardware episode right before DeepSeek R1. So I would actually love to get this reviewer's take on what specifically they think is ironic about it, because I will say, we'll talk about this when the discussion on R1 comes, but

I think there's been a real kind of misreading on the implications of R1 and R1-Zero for hardware. It's actually strongly bullish for, this is not stock advice, for NVIDIA stock. It's strongly bullish for that hardware ecosystem and for scaling in a way that I think a lot of people have kind of missed. So anyway, just to kind of plant that flag there, again, maybe the reviewer's referring to something else, in which case I think it'd be really interesting to hear what it is.

And maybe they disagree, but I think this is- Everyone's talking about hardware now suddenly, right? So anyway, thank you for the review. Always appreciate constructive feedback like this that we'll take into account. And we'll mention just one more thing and we'll dive in.

On the Discord, which has been a lot of fun to see people's discussion questions, we did get a question asked about DeepSeek and its implications for regulation in the U.S., and whether global caution here would now impact the race between the U.S. and China. So we'll get back to that. Also planting a flag: once we get to policy and safety, we'll discuss the implications for geopolitics.

And thank you Discombobulated Penguin for the question. Discombobulated Penguin, thank you. Thank you, Discombobulated Penguin. And with that preview out of the way, let's dive in. And we begin, as I said, with projects and open source. And of course, the first story is DeepSeek R1. And we're going to be diving into the paper, I would say, which is titled DeepSeek R1, Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.

So I'm sure many people already know what DeepSeek R1 is, but let's quickly summarize that. So DeepSeek R1 is basically equivalent or competitive with OpenAI's O1. It is a language model, a chatbot that is optimized for reasoning and that is meant to be able to tackle problems that are challenging and that things like Claude Sonnet, things like GPT-4o are not able to do well on.

So this paper came out just a few weeks after DeepSeek v3. DeepSeek v3 is the base model that they started out with. So O1, for instance, presumably started out from GPT-4o. Similarly, this is a model that is trained on top of DeepSeek v3. It's not trained from scratch.

And this paper has some very interesting details and implications for how this is done. So with things like O1, we don't really know what they did. There was a lot of speculation, but it was not very clear. We've also covered a lot of news stories over the past year or so, looking into various ways to do reasoning with LLMs, various ways to do inference time scaling, and

The very interesting thing with this paper that I think from a more technical perspective is exciting is, as per the title, "Incentivizing Reasoning Capability in LLMs via Reinforcement Learning," the focus of the approach is pretty much on reinforcement learning. So that means that they train the model via giving it rewards. It just produces some outputs. It's kind of like trial and error.

And so it is trained without being told the right answer, you could say. And we've seen this as one possible approach, but that pretty much is the only approach that they are relying on. Although there are multiple steps, at a high level, this really showcases the potential of reinforcement learning. And similar to DeepSeek v3,

It seems to be done with relatively few resources and get really impressive results. Like, you know, people say it's not quite as good as O1, but it's very impressive. You know, obviously it gets really good benchmark numbers. So that's another reason this is exciting. So I'll...

Stop there, I think, and Jeremy, I'll let you jump in with more details. Yeah, yeah. So I think you're exactly right. You know, this idea that reinforcement learning is king, right? So there's reinforcement learning and there's reinforcement learning. So the triumph here is showing that if you just reward the model basically for getting the right answer versus not getting the right answer,

That's kind of enough. There are all these convoluted strategies and we've covered a whole bunch of these as they've been discussed. And, you know, we have DeepMind papers about these. We have kind of blog posts about these. Process reward models as well. Yeah. Exactly. Right. So, so PRMs and ORMs, process reward models and outcome reward models, where, you know, you have essentially, think about, you know, a chain of thought, right? Where your model kind of goes, okay, well, yeah, step one, I'll do this; step two, I'll do that, right? And so on. Process reward models, of course, are these models that get trained, right,

to assess how likely a given step is to be accurate. There are all kinds of ways to do this. Often you'll generate 10 different rollouts starting from a given step, 10 different alternative paths that start from there and see what fraction of those paths lead to the correct answer. And then based on that fraction, you score the presumed accuracy of the initial step that you started your rollouts from. That's one way to do process reward modeling. And you basically use that to train a model that predicts

like how accurate a given step in a reasoning stream is likely to be. Process reward models are really, really finicky, very, very hard to get good data for them and so on. There are outcome reward models that kind of do the same thing with the output. So you basically train a model to just like predict whether an output is likely correct or not. These are not ground truth, right? These are models. It's kind of like having a

a reward model for RLHF or something. They are not the ground truth. So you're training your model to optimize against something that is not the thing you care about in some sense. And that means that the model can exploit, right? It can, it can hack basically the process or the outcome reward models and just kind of, anyway, generate outputs that the model thinks are good, but that aren't actually good.
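To make that concrete, here is a minimal sketch of the rollout-based step scoring described above. The sampler and verifier are hypothetical stand-ins, not any lab's actual implementation; the point is just that the step's score comes from sampled completions and a learned predictor rather than ground truth, which is exactly what a policy can learn to game.

```python
from typing import Callable

def score_step(
    prefix: str,                               # problem plus the reasoning steps so far, ending at the step being scored
    sample_completion: Callable[[str], str],   # hypothetical: samples a full continuation from the model
    is_correct: Callable[[str], bool],         # hypothetical: checks the final answer of a completion
    num_rollouts: int = 10,
) -> float:
    """Estimate how good a reasoning step is by the fraction of rollouts that
    start from it and end at a correct answer (a Monte Carlo proxy, not ground truth)."""
    hits = sum(1 for _ in range(num_rollouts) if is_correct(sample_completion(prefix)))
    return hits / num_rollouts

# The resulting (prefix, score) pairs are then used to train a separate process
# reward model that predicts this score directly. Because that model is learned,
# a policy optimized against it can exploit its mistakes.
```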

That has always been a problem. It's been like, we've been playing whack-a-mole with that issue for the last basically two years or so. What this shows is take a pre-trained model like DeepSeek v3 and just do RL. All you're going to do is tell it, hey, I'm going to give you thinking tags. I'm going to force you to write your

chain of thought between these thinking tags. Okay. But your output is going to come after the closing of the thinking tag. So you've got this kind of defined area as your scratch pad, essentially, that you can do all your thinking in, and then you've got a defined area for your actual output. And that is it. At least for DeepSeek R1-Zero, that is it, right? So you just have your pre-trained model, and then you're doing the straightforward reinforcement learning process with, you know, you either get it right and get a reward or you don't, you get no reward.

Insanely, what ends up happening when you do this, and they're going to use a data set that has quantifiable outputs, like math data sets or coding data sets, where you actually can give an objective assessment of whether or not the output was correct. And that allows them to generate these rewards efficiently. We've seen that basically everywhere in this space.
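A minimal sketch of what a purely rule-based reward like this could look like. The tag names and the naive string-match verifier are illustrative assumptions on our part, not DeepSeek's published code; the key point is that the reward only checks the format and a verifiable final answer, with no learned reward model in the loop.

```python
import re

TEMPLATE_RE = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)

def rule_based_reward(model_output: str, reference_answer: str) -> float:
    """Reward = followed the thinking-tag template AND produced a checkably
    correct final answer. No PRM, no ORM, just rules."""
    match = TEMPLATE_RE.match(model_output.strip())
    if match is None:
        return 0.0                        # format reward: must use the scratchpad template
    _thoughts, final_answer = match.groups()
    # accuracy reward: for math or coding data you can verify programmatically;
    # a normalized string comparison stands in for that verifier here
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0

print(rule_based_reward("<think>2+2 is 4</think> 4", "4"))   # 1.0
print(rule_based_reward("just 4, no thinking tags", "4"))    # 0.0
```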

But what you find when you do that is that the model will naturally learn to reason in a way that kind of looks like chain of thought. It's more diverse than that. And in that sense, it's actually more powerful, right? Because if you think about the chains of thought that we typically force these models to think through, the way we do that is we'll actually take our base model instead of going straight into reinforcement learning. What we'll do is we'll give it some extra training, some just supervised fine-tuning,

on a data set, a curated data set that has a bunch of chains of thought and then outputs. And those are really expensive to produce those data sets, right? Because you've got to get humans in some cases to like, you know, annotate chains of thought, produce chains of thought, solve problems in all kinds of ways and in detail. And then you just train your model to do text autocomplete on those basically. And it learns the pattern of chain of thought. It sort of, it gets forced to think a little bit like that data set. And so you'll see it naturally run through a chain of thought after it's been fine-tuned on that data set.

But what happens here is much more organic than that. All we're rewarding it for is getting the right answer or not. And just based on that reward, the only extra thing we're telling it is put your thoughts between these thinking tags and the thoughts turn out to look a lot like chain of thought naturally, but they're not forced to look like chain of thought. It can do some more variable, more diverse things because you're not forcing it. You're not training it explicitly on human chains of thought. So it will tend to explore more. And one of the other things that you'll tend to see is

When you start this process of reinforcement learning, the first couple rounds of reinforcement learning that you do on the base model, the chain of thought, the length of the chain of thought, or the amount of text between those thinking tags starts off pretty short. And as you spend more time, do more steps of reinforcement learning, you'll find that the model actually tends to fill out those thinking tags even more. The chains of thought essentially get longer.

and the outputs get more accurate. And what this is telling you is that the model essentially is independently rediscovering and exploiting inference time scaling laws itself. They have an amazing figure. It's figure three in the paper, where they show this very kind of linear increase in the average length per response. In other words, essentially the amount of text between those thinking tags and the amount of inference time compute that the model is investing to generate these outputs.

And again, that's not happening because any human hard-coded this idea of like, hey, by the way, the more tokens you spend on your thinking, in other words, the more inference time compute you expend, the better your output will be. No, that's an organic result. That is the model just kind of stumbling into the strategy through pure reinforcement learning. This is a really, really big deal. It's not just a big deal because we don't have to collect now these giant supervised fine-tuning chain of thought data sets, though that is a really, really big deal.
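The metric behind that figure is simple to compute. A rough sketch, assuming you have sampled responses from successive RL checkpoints, with whitespace splitting as a crude stand-in for the real tokenizer:

```python
import re
from typing import Dict, List, Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def average_thinking_length(responses: List[str]) -> float:
    """Average number of (crudely whitespace-split) tokens spent inside the
    thinking tags for a batch of sampled responses."""
    lengths = []
    for response in responses:
        match = THINK_RE.search(response)
        thoughts = match.group(1) if match else ""
        lengths.append(len(thoughts.split()))
    return sum(lengths) / max(len(lengths), 1)

def length_curve(responses_by_step: Dict[int, List[str]]) -> List[Tuple[int, float]]:
    # responses_by_step is a hypothetical dict mapping RL step -> responses sampled
    # at that checkpoint; plotting the result gives a curve like the one described
    return [(step, average_thinking_length(batch))
            for step, batch in sorted(responses_by_step.items())]
```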

But it's also just an indication of how robust inference time scaling laws are. They are a convergent fact of the matter about AI systems. When you train systems with reinforcement learning, they will independently discover this. Last thing I'm going to say about R1-Zero before we move on to just R1, which is a slightly different story.

R1-Zero has trouble sticking to just one language. So remember, all you're training it to do is either get the right answer or not, or rather, all you're rewarding it for is getting the right answer. You're not telling it how to think. You're not giving it chains of thought to train on.

So what you tend to find is that the model actually kind of switches between languages. Sometimes it'll generate text that's not human legible. And they kind of call this out as an issue, as a problem, almost like it's a bug, but it's actually not. The way to think of this is it's a feature. There is a long compound German word that captures 20 English words' worth of meaning.

And there's probably a word in Chinese or a word in French that captures 20 English words worth of meaning. And when you can use exactly that word and compress essentially more thinking into one or a small set of tokens, you ought to do that if you're trying to optimize for the efficiency of your computation.

And that, I think, is the actual correct interpretation of why you see this weird kind of language flipping. It's the model essentially taking advantage like, hey, look, to me, there's no such thing really as English. There's no such thing as like French or whatever. There's no need to be coherent in one language. I use whatever kind of linguistic or reasoning tools in my chain of thought that happen to be most efficient.

And so you end up with exactly this. And the problem, and we called this out a few episodes back when OpenAI's O1 came out and OpenAI said, hey, look, it's reasoning through human interpretable chains of thought. How wonderful for AI safety, right? This means that we can understand what the model is actually thinking and we can intervene if it starts thinking kind of risky, dangerous thoughts.

And at the time we said, well, hold on a minute. That's actually absolutely not going to be the end state of this stuff. There are always more efficient ways to reason than the human legible ways. And that's what we're seeing here. I think very clearly this is sort of the early signs of: when you just let the model reason for just the reward and you don't kind of introduce artificial human bullshit, the model will naturally learn to reason in ways that are less and less human intelligible, because human intelligibility is a tax.

It is a tax that you impose on the model. It's an additional unnecessary inductive prior that you are better to do away with. So anyway, I'll park it there. I'm sorry. Very exciting. It's a very good thing to call out. And it is part of the reason that there is a divide between R1-Zero and R1, which we'll get to next, right? There are some nuances here to cover.

A couple more nuances just with R1-Zero. So R1-Zero, that is the purely reinforcement learning model. That's the first step to get to R1. And they really do just start with a base model. They have a very simple template. So the prompt they give to the model does tell it to think about the reasoning process, to output a thinking process before giving the answer. But that's pretty much the only thing they tell it.

They train it on math problems and coding problems only. And I think this is important to call out with respect to reinforcement learning. I think people often kind of forget that this is a limitation of reinforcement learning in general. If you don't have the ability to get rewards in a programmatic way, which in this case you can, then

reinforcement learning is much more difficult. And for things where you presumably have all sorts of reasoning processes, you have reasoning processes for how to navigate the web, et cetera, it may not be possible to train with reinforcement learning in general. As to the language thing, I do want to mention, this is a bit of a tangent, but I do think it's worth bringing back.

For anyone who's been following AI for a very long time, back in 2017, I think it was, back in the old days when deep learning was all the hype and LLMs were still not a thing, there was a news story about AI inventing its own language. So this was some research from Meta. I think at the time it wasn't Meta yet, maybe.

And they were doing a paper on bartering, a multi-agent system where you had two AI models barter. That's what we call it anyway. And so they did a similar thing, optimizing these two models together. And they found that the models started basically using gibberish, using, you know,

you know, punctuation and so on, not human readable stuff. And just like here, that makes a lot of sense. Because if your reward is only, you know, get the right output, the process to get there, you're not telling the model what to do, it can make up its own weird language along the way. And that's not nefarious, it's not surprising, even if that's just like,

a pretty reasonable outcome of leaving a model unconstrained. It can now do whatever it wants. And so at the time, for that paper, they mentioned explicitly that they added a reward component. They then added a bit to the reward that's like, this should actually read like English. And then it was actually interpretable.
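A minimal sketch of that kind of auxiliary reward term. The weighting and the crude "fraction of plain-ASCII words" check are illustrative assumptions of ours; the R1 paper describes a language-consistency reward, but this is not its actual implementation.

```python
def language_consistency_score(thoughts: str) -> float:
    """Crude proxy for 'stays in one human-legible language': the fraction of
    whitespace-separated tokens made only of plain ASCII characters."""
    tokens = thoughts.split()
    if not tokens:
        return 0.0
    ascii_tokens = sum(1 for t in tokens if all(ord(c) < 128 for c in t))
    return ascii_tokens / len(tokens)

def combined_reward(task_reward: float, thoughts: str, lam: float = 0.1) -> float:
    # total reward = task correctness plus a small bonus for legible, single-language
    # reasoning; that bonus is exactly the interpretability "tax" discussed above
    return task_reward + lam * language_consistency_score(thoughts)

print(combined_reward(1.0, "the derivative of x squared is 2x"))  # 1.1
print(combined_reward(1.0, "导数 of x² ist 2x"))                   # lower bonus: mixed languages
```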

And in a way that's similar to what they did here in this paper. So now we can move on from R1-Zero to R1. R1 is R1-Zero, but with a few more limitations and constraints and kind of design considerations, you could say. So just to quickly cover the process, they begin to train

R1, not R1-Zero, by doing supervised learning. So they get a data set of reasoning traces through a combination of different things. At least as far as I know, they use DeepSeek R1-Zero for some of it. They use some other approaches to get some of it. And then they just train the model to mimic that data set, which is part of what presumably OpenAI is maybe doing, paying human

people to produce data to train on.

Then, after doing that supervised fine-tuning, they do some more reinforcement learning. So they do the same kind of reinforcement learning as R1-Zero on R1 after doing supervised fine-tuning on it to kind of, I guess, bias it in a certain direction that does use human interpretable approaches. Then they go into, in the paper, also distillation and getting to smaller models.

Ultimately, it gets somewhat complicated. I don't know if you could call it complicated, but it's not quite as simple as it might seem. This set of steps is a little unintuitive and,

I would say, you know, may not be optimal, but still, it's pretty interesting that they also do large scale reinforcement learning. When it gets to training R1, they mix it a little bit with supervised fine-tuning, and that gets you the best of both worlds of sort of your LLM-type clarity with the reasoning of R1-Zero.

Yeah, absolutely. And when they do that, right, when they add the supervised fine tuning step to get it to think in human legible chain of thought terms, right?

Yes, the human interpretability absolutely goes up, but the performance drops. It's a slight drop, but there is a performance drop. And so earlier when I was saying there is a tax that you pay for human legibility, human interpretability, they're literally measuring that tax. You either are going to optimize for a model that is a really good reasoner, or you're going to optimize for a model that is human interpretable, but those two things mean different things.

The pressure on companies to make better reasoners is ultimately going to be very, very strong and maybe stronger than the pressure on companies to make human intelligible reasoning systems. And to the extent that that's true, you start to be concerned about things like steganography or things like even just like explicit reasoning of kind of dangerous things.

in reasoning trajectories that are not human legible, because that's where you'd expect these things to go down the line. So I think it's kind of really interesting. One way to think about this, by the way, is that

R1 is the model you actually use. It's maybe more human interpretable and, for right now, for a lot of applications, it's what you want to use. But R1-Zero is the model that shows you the future is reinforcement learning. This is the model that makes the point that RL can scale and that it really works. And the big lesson from all of this

And now this gets back to, this is not investment advice, but looking at the movement, the stock movement of NVIDIA. So there's a lot that was going on there. But when you think about what caused NVIDIA to take off in the first place, it was basically the argument that Rich Sutton first made in the bitter lesson, right? Which is that, well, scale is king. And a lot of people misinterpreted what that meant. The point of the bitter lesson is not that you don't need smart ideas anymore. A lot of people think that, but

But instead, it's that you need to find smart ways to get out of the way of your optimization process. You need to find ways to melt away inductive priors, just let the compute do what the compute does. That actually takes smart ideas. Those are exactly the kinds of ideas that DeepSeek used so well in both v3 and R1-Zero in particular.

And so the actual distillate of this is: DeepSeek showed that you could achieve OpenAI O1-quality performance at, at least at inference time, something like a 30th of the budget.

$5 million for the training or $6 million is what they claim, and their asterisk is that only applies to the compute used during the particular training run that led to the successful output. We talked about this before. It doesn't account for all the experiments they had to run, but still. Okay. So in other words, I can get a lot more intelligence per flop. I can get a lot more intelligence per unit of compute. That is the DeepSeek story. Does that sound like a bearish case for NVIDIA?

To me, it sounds like a bullish case for NVIDIA. Essentially, your GPUs just went up in value 30x at inference time. That's what that says. It says that actually the slope on the scaling curve that you get by applying the lesson that DeepSeek learned in this process is actually way steeper than we thought before. The ROI is even bigger.

And because there's a never-ending demand for intelligence, like that is what the literal economy is based on, it's essentially a huge chunk of it, all this means is people still hold to the same question. It doesn't matter if you're, uh, you know, Anthropic, OpenAI, whoever else, the question you ask yourself is always going to be the same: how much money can I possibly afford to pump into my compute budget?

And then I get whatever intelligence I get out. What this says is you'll actually get 30x more. So if anything, this actually makes a bull case for, hey, why don't we try to, if we possibly can squeeze even more into that budget.

That's what's going to happen. Like mark my words, that is what is going to happen. It's already like, there's like this swath of people who see this as, as pessimistic news. But when you talk to the people in the labs themselves, that is not where they're going. This is very much like, scaling is very much alive. We happen to be in this special moment where we're kind of at this turning point in the paradigm, right? Where we had pre-training, for a really long time, as the dominant paradigm. Now we're having inference time compute take over a little bit more, and RL and all that. And

And that gives new entrants the opportunity to just get ahead. But the fundamental question over the next six months, 12 months is going to return to, yeah, but how much compute can you throw at these same strategies? DeepSeek, if they're not state backed in a very big way by them, they're going to struggle.

already struggling, as their CEO has said, with accessing enough high quality compute resources to push this forward. Export controls are absolutely hitting them. This is another lesson that's been mislearned. Everyone's like, oh, wow, a Chinese company did a really impressive thing. What's the point of export controls? No, no, no. The lesson is compute matters 30 times more than it did yesterday. Export controls are even more critical. That's the real lesson here.

So anyway, there's all kinds of stuff. We'll, we'll, we'll talk about this as well in the policy section because Dario from Anthropic had a blog post that's actually kind of a banger in my opinion. But anyway, all that is to say, and this is to the question of one of the viewers who was asking this on, on the Discord, that's kind of my take on the export control story. This is a really impressive model, by the way. I think a lot of people are trying to cope with, you know, "this is not actually super impressive." It is impressive. It's absolutely impressive. It's

Also absolutely on trend. But the crazy thing is that you have a Chinese company that is on trend in terms of what the frontier ought to be able to do, or reasonably close to it. Maybe not quite on the frontier. So anyway, super impressive model. Just look at the SWE-bench Verified score, you know, 49.2. That's better than OpenAI's O1 model from the 17th of December. That tells you everything you need to know. This is legit. It has huge, huge implications, but they are different from what a lot of, I think, the kind of mainstream narrative is right now.

Right. And I think we'll spend a little more time on the mainstream narrative and the reaction to R1, which was pretty extreme, I would say. And we'll get to there in the business bit. For now, we are focusing more on the technicals. I will say one more thing before we move on as to the technical report of the paper.

One of the things that was very interesting, and I appreciate a lot, this is not done enough, they do have a section on unsuccessful attempts and things that didn't work for them. And they do actually call out process reward models as something that kind of worked, but basically failed.

the compute wasn't worth it. Just doing RL turned out to be better rather than this more complex approach. And they did try Monte Carlo tree search inspired by AlphaGo, AlphaZero and others. And this is another kind of idea that people are excited about doing more of a search process where you do search to get a good outcome rather than just doing RL, which is what seems to be the case here. I think also

There are some details missing in the exact RL setup because there are

various ways to do reinforcement learning. They do, I guess, one of the big pieces here is GRPO, which we didn't even mention. But worth mentioning, they're using group relative policy optimization as the reinforcement learning algorithm, which they came up with back in early 2024. It also demonstrates that that is a very promising algorithm.

That algorithm makes it possible to train more efficiently. We can't get into the details, but it seems to work quite well. So anyway, it's a great paper. It's a very interesting paper if you're tracking this stuff. And R1 certainly is impressive and exciting. And we'll probably get back to it in a bit. But...
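The one piece of GRPO that's easy to state in a few lines is the group-relative advantage: sample a group of responses per prompt and score each one against the others, so you don't need a separate learned value model. A minimal sketch of just that piece, leaving out the clipped policy-gradient and KL terms:

```python
from statistics import mean, pstdev
from typing import List

def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each sampled response to a prompt is scored relative
    to the other responses in its group, rather than against a learned critic."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one prompt, only the first of which was correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))
# The correct answer gets a positive advantage, the others get negative ones, and
# those advantages weight the usual policy-gradient update for their tokens.
```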

Moving on, there's a few more stories we're not going to be able to dive as deep on. So we're going to just start moving quick. First up, next story is again on DeepSeek. And this was fun. Just after R1, very soon after they did announce...

another type of model, a multimodal AI model called Janus Pro, which they claim outperforms other models like it. And the very last thing to mention about DeepSeek R1 that is notable: it is very permissively licensed. I think it's the MIT license, which basically means you can do anything you want with it. That means that you can use it for commercial applications, for

you know, research, obviously, for pretty much anything. There's no restrictions of any kinds, which often are there for other open source releases. So that is another aspect of why this is exciting. This is now one of the frontier models that you can use to build upon. And obviously, that's exciting to many in this space. And now moving on, we have quite a few more stories to cover. So we'll have to go quick.

Next up, we have another story about DeepSeek and another model they released, which is not quite as big of a deal, but still very cool. They have now a model titled Janus Pro, which is a text-to-image model, also released under the MIT license.

Similar to, you know, other text-to-image models, I think it's hard to say exactly. You know, it looks very good. It does reportedly outperform DALL-E 3 and other models like Stable Diffusion XL on benchmarks. They released a 7 billion parameter version of it as well as a 1 billion parameter version. So it is...

There are pretty good open source text-to-image generators out there, so not quite as big of a deal, but pretty impressive that DeepSeek as a lab, really, an R&D project, not a commercial venture, is now releasing multiple models like this into open source and making a big impact, really.

Yeah, and it bears mentioning, right, the same kind of company that tends to be good at making reasoning models also tends to be good at making these kind of more multimodal systems. That's not a coincidence. But anyway, so it'd be interesting to see, like, will we see more multimodal models from DeepSeek in the future, you know, integrating the reasoning with the vision and other modalities? I mean, I certainly would expect that to be incoming.

Right. And one other nuance to mention here, I guess in the description for it, what they highlight is that this unifies multimodal understanding and generation. So the big highlight is the text-to-image part, but they are combining: we have vision language models, which are

image plus text to text. That's image understanding. We also have text-to-image models, image generator models, which are just text to image. These are usually done in different ways, slightly different ways, trained in different ways. So the very interesting thing here is the unification and getting it kind of all to work together. So here again, there are some pretty significant technical insights that are novel and

actually potentially quite impactful. And there's another paper on this, Janus Pro, Unified Multimodal Understanding and Generation with Data and Model Scaling. Can't go into the details, but again, pretty exciting research as well as a model that people can use.

And moving right along, we have another exciting open-weight release which happened just after R1. Not quite as big a deal, but still pretty notable. And this one is about Qwen2.5-1M.

So Qwen is coming from another Chinese organization, I believe funded by Alibaba. They've been working on this Qwen series of models for quite a while. And so they have now released the technical report for this latest iteration, which is focused on long context lengths. So the "-1M" in the name is there

because they are extending it to be able to process 1 million tokens. And so they release a paper with pretty much a focus of how they get to that

optimization of long context scaling. They also release variants of it, 7 billion parameter and 14 billion parameter, and also update their APIs to have access to it. So again, this is one of the, I guess, missing spots in open source models. Often, typically you get lengths of more like 128,000 tokens. So again,

effectively going up to a long context is a pretty significant deal. Yeah, and they use a whole bunch of techniques for this that they document quite well. One of the key ones is progressive length training. We've seen this in previous cases, but they push it to its limit here where you start with like a relatively small context window or effective context window of like around 4,000 tokens in this case.

And they gradually increase it. You go to 32,000, you know, like 64,000-ish. You're basically doubling every time until eventually your model kind of gets to the point where it can accommodate the full context and do really well on things like needle-in-a-haystack evals, which is one of the things they look at. There's also this need to track. So because the attention mechanism doesn't natively care about the word ordering, you have to superimpose essentially a

Anyway, you use a technique to kind of superimpose some sort of sinusoidal type pattern on top of your embeddings so that you can track which words are where. They use adaptive RoPE base frequencies that increase with context length. Basically, it's a way of kind of dynamically adjusting to...

adjusting this kind of word ordering accounting strategy as you grow that context window. The training data mix too is kind of interesting. So for that progressive length pre-training, what they do, sorry, training rather, is 75% of the text that they use is actually the full context at that length. Like 75% of it is the maximum length it can be. And then they have shorter sequences for 25% or so. But anyway, they use all kinds of other techniques that we won't go into in too much detail. We talked about sparse attention in the past. They do use that.
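To make those two ideas a bit more concrete, here's a minimal numpy sketch of a progressive length schedule and a RoPE base frequency that grows with the target context length. The doubling schedule, the scaling rule, and the specific numbers are illustrative assumptions, not Qwen's exact recipe.

```python
import numpy as np

def length_schedule(start: int = 4096, target: int = 1_000_000) -> list:
    """Progressive length training: roughly double the context window each stage
    until you hit the target (illustrative stages, not the paper's exact ones)."""
    stages, n = [], start
    while n < target:
        stages.append(n)
        n *= 2
    stages.append(target)
    return stages

def rope_angles(positions: np.ndarray, head_dim: int, base: float) -> np.ndarray:
    """Standard RoPE rotation angles: a larger base stretches the lowest
    frequencies so relative positions stay distinguishable at long range."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(positions, inv_freq)      # shape: (seq_len, head_dim // 2)

def adaptive_base(context_len: int, base0: float = 10_000.0, ref_len: int = 4096) -> float:
    # one simple way to grow the base with the context window (our assumption,
    # not the paper's formula): scale it in proportion to the target length
    return base0 * max(context_len / ref_len, 1.0)

# Usage for one training stage:
stage_len = length_schedule()[2]              # 16384 with the defaults above
angles = rope_angles(np.arange(stage_len), head_dim=128, base=adaptive_base(stage_len))
```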

There are a lot of ways to do like VRAM optimization on the chip, and all kinds of stuff. So it is, it is really cool. It's another one of these like very engineering heavy kind of open source developments, right? We're, we're starting to see, like, in order to be able to read these papers, you have to understand the hardware and you have to be able to kind of get into the weeds on like what your VRAM is doing and what your SRAM is doing even, and all this jazz. So

And increasingly, I guess you could say that Frontier AI is about the engineering side, or at least the engineering side is totally inseparable, of course, from the architecture and the modeling stuff. So anyway, I find this really interesting and good timing for our hardware episode. How about that?

Yes, exactly. Also, with regards to scaling laws, I think an interesting thing to note is that obviously the general idea of scaling is you make bigger models, you get bigger data, you combine those things, you get better performance. As we see with DeepSeek v3, with R1, with this,

Ultimately, just to do effective scaling is not easy, as you said earlier, right? So it's about figuring out the right mix of ingredients, the optimization process, the hardware, et cetera, the data that enables you to do effective scaling and ultimately work on various problems.

And this is another demonstration of the accumulation of knowledge that exists in this space that just wasn't there two years ago for people to leverage.

And on to the next story, again, a second release from the Qwen team. And this one is Qwen 2.5-VL. So as I mentioned, this is a vision language model. This is focused on things like analyzing text and image, video understanding, and object counting. And similar to OpenAI's Operator model,

and to Anthropic's computer use API, this would power their ability to control website browsing and pretty much use your computer for you in an agentic manner. So this one, I would say, is less, again, of an interesting big deal. This also, as you like to mention,

Jeremie, came with an interesting blog post. The title of the blog post was Qwen2.5 VL, Qwen2.5 VL, Qwen2.5 VL. What's going on over at the Qwen team, man? What's in the water? Someone is very creative. They release blog posts that are not boring.

So yeah, here they have various demonstrations of the model. So clearly, these teams are getting a lot of resources, or at least they are able to make a lot of progress at this point. And this is part of the reason, I suppose, why there's been a very strong reaction to all this stuff.

Yeah. And one of the sort of concrete advances they've had to do here is in the kind of long, like as they put it, ultra long video understanding, just because that's what you need to make an agent that runs on a computer like this.

I will say just like from a national security standpoint, you think about like, so we've talked somewhat about, or actually quite a bit about this idea of the legal picture around pseudo permissive licenses, right? So you have a Chinese company that puts out some model that's really performant. And there's a license term that says, if you have any issues with

Like the use of this model, those issues get litigated in a PRC court, in a Chinese court, right? And this kind of gives you a bit of open source warfare vibes where, you know, it kind of brings you under the umbrella of the CCP. That was kind of an interesting, maybe vaguely academic problem, not a huge, huge deal, but a thorn in the side of the United States.

Here, when we're moving into increasingly these operator type models that actually take control of your computer and do actual shit, like potentially send emails for you or have access to your personal data and the ability to exfiltrate it to servers outside your kind of remit, this starts to become a real frigging issue. You think about open source warfare in the form of planting kind of backdoors and Trojans in these models to have them behave in certain ways that

achieve the objectives of the Chinese Communist Party or whoever developed them. This is actually a very interesting strategy and open sourcing is very, you know, I'm not saying that that's what's going on here. I suspect it's not. But as we get more used to using these kind of like PRC originated models, this is something that we ought to start thinking about is like, who's building these models?

What incentives do they have to bury certain behaviors in ways that are inscrutable to us because we lack the interpretability techniques to look at these systems in detail? I think that's an actually under-discussed dimension of this from a national security standpoint. There's a world where a year from now, we just discover that, oh shit, the latest zero day is actually to use all these deployed agentic models that come from Qwen or DeepSeek or whatever else. So I think that's a really interesting dimension of this, something that is worth tracking anyway.

That's right. And moving right along again, we're going to start moving very quickly, moving on to tools and apps. Going to take a quick break from all the R1 and Qwen stories, going to OpenAI and the

going to another story related to that agentic computer use kind of story. So just recently, OpenAI launched a research preview of Operator, which is exactly that tool that you can use within ChatGPT that will browse the web and do pretty much computer use of the same sort that Anthropic and, in this case, the Qwen team demonstrated.

So within operator.chatgpt.com, if you go there and if you have access, you will only be able to try it out as a US user. If you are at the $200 pro subscription tier, at least for now, you can then use it. And there's going to be a small window that will pop up with a dedicated web browser that the agent will then start using.

And as a user, you can still take control because the operator is using its own thing. It's not controlling your computer. So you can keep doing other stuff.

OpenAI says that Operator has a computer-using agent model. We don't know very much about it aside from that, similar to the Anthropic computer use model. But apparently it's trained to interact with visual websites to be able to click and read text, navigate menus, etc.,

So it's been something that Anthropic launched, I don't know, back in October, many months ago; they had this preview on their API. At the time, it was kind of a big deal. And I think people are still very bullish on agentic AI. So, you know, I think it was overshadowed a bit by R1 and the conversation around that. But I do think this seems quite notable.

It does. And it's, you know, it's not perfect. They're very kind of forward about that. Obviously, they have to be because if you're going to launch a, you know, model and say it's agentic in this way, people are going to use it for real shit. And so they do say currently operator cannot reliably handle many complex or specialized tasks.

such as creating detailed slideshows, managing intricate calendar systems, or interacting with highly customized or non-standard web interfaces, fine. So that's what it can't do. But they are explicitly taking a kind of precautionary approach here. They're requiring supervision for some tasks. So banking transactions and...

Other areas where you have to, for example, put in your credit card information, the user has to step in and actually do that. And OpenAI does say, and this is relevant in that context, operator doesn't collect or screenshot any data. So this is obviously like, you might be nervous about entering your credit card information in a context where you have operator running on your system. Their claim is that they're not collecting that data. So it's sort of interesting this, where do you do this handoff between the human and the AI in context of this open-ended?

I mean, at the end of the day, until we have full on AGI, right? Like we're not going to have a clean answer to that question. It's a little dicier even than in self-driving cars, where at least there you're in a very constrained environment. You know, you're on, you know, you're on a road, you know, it's just like other cars, pedestrians. It's a notoriously complex environment. Don't get me wrong.

But compared to the entirety of the internet, you're going to run into some really wild out of distribution settings there. And the stakes are high there too, right? You could be giving money away, you could be downloading malware, doing all kinds of stuff. And it is an adversarial environment in a way that driving isn't. So I think this will be really interesting to see, like, how robustly can they make these models? How quickly can they improve them? But there are all kinds of partnerships, obviously, as you'd imagine.

with companies like DoorDash, Instacart. So a lot of like YC companies, which is interesting because obviously Sam Altman was the president of Y Combinator for a while. So he's got good relationships with those guys, but also eBay, Priceline, StubHub, Uber, and so on. So

Just, you know, making sure that Operator respects their terms of service is obviously top priority for them and a good sort of initial trial round, shakedown cruise for Operator here. Exactly. And I think, similar to Anthropic's computer use API, this is similar to Project Mariner from

Google, that was announced just in December. No real timeline on when this would be widely available and reliable. My impression is that with all these efforts, this is taking us towards that future where agents will just do stuff on your behalf. But it probably will take a while for us to get there. Just looking at OpenAI only releasing this now, many months after Anthropic,

With multiple limitations, it also refuses to send emails and delete calendar events, which, you know, as an assistant, which presumably you want your agent to be sending emails and deleting calendar events as necessary, right?

So yeah, it's exciting to see more work towards this front. If this whole idea of like, please buy a ticket for me, I don't know why everyone likes this idea of getting AI to book a ticket for you for travel. I don't think that's a good idea, but often that's one mentioned. Clearly, eventually we'll get there and we're making progress towards that future. I'd talk about my views on that more, but I got to catch my 3 a.m. flight to New York City, so...

Good one. So, and moving right along and getting back to DeepSeek and another aspect of the story. So obviously as we nerdy people who cover AI all the time, the paper of R1 was very exciting and very interesting having an

O1-level model, almost an O1-level model, was very unexpected even. But another aspect of the DeepSeek story that I find surprising and interesting is that their app that is on smartphones

has become hugely popular. So the story is that the DeepSeek app had reached the number one position on the Google Play Store. That means that it saw over 1.2 million downloads since mid-January and 1.9 million downloads globally overall. So that is pretty crazy, right? Because obviously we've seen ChatGPT go viral,

We've seen kind of that huge spike in usage. The fact that DeepSeek now pretty much went viral with their own chatbot that is a ChatGPT competitor. It is a way to talk to the v3 model for free in this case.

and people are, I guess, flocking to it is, again, something that's kind of surprising that I would imagine has OpenAI a bit worried. We've seen some reactions of, like, you know, being grouchy about people being excited about it. So clearly, I think that's also one of the reasons that we saw a very strong response to the DeepSeek R1 release.

Yeah, I do think once these new paradigms get hardware saturated, at the end of the day, it's going to devolve into the same thing, right? Who has the bigger pile of GPUs and the energy to run them and the ability to cool them? So I think in this case, China ends up in more or less the same position they were. If they were struggling to compete on the basis of pre-training, they're going to continue to struggle to compete when inference time compute becomes more important. It's just that

We're not yet at the point where these specific techniques, where this paradigm has been scaled on the hardware to the point where we're saturating the full fleet of hardware that we have available. Those optimizations are unfolding as we speak right now. That is part of the design conversation with the next generation of not just NVIDIA hardware, but also the way that data centers are being set up and computers being set up and networking fabric and all those things. Yeah, very rapidly expect people to blow past the DeepSeek, the O1

and R1 levels of performance. And I think you're going to see, unless there is a sustained and concerted effort on the part of the PRC, which well could be, to consolidate compute and do massive training runs that can compete on a compute basis with what we have in the West, you're going to see just the same thing play out again. I would guess that you're going to see a sort of takeoff happening with Western models clearly pulling ahead.

But the gap between open source and closed source is probably going to continue to close. And that's worth tracking. I will say, with this launch, you know, number one on the US Play Store, like the DeepSeek app itself has shit going to Chinese servers. So use it at your own risk. But like, this is again, a form of, it's not open source warfare. It's a little different because this is the app, the kind of the deployed app.

But this is part of the structural advantage that OpenAI has enjoyed and Anthropic have enjoyed, especially OpenAI, because of the brand recognition. As people use that system more, they get more data, which they can use to train their next models. In this case, though, bear in mind, civil-military fusion is a thing in China. And so anything that a Chinese company has, the Chinese military has. And so that's who you're sending your data to. It won't matter for everybody, but for some people, it may. Yeah, you may not want to give it passwords to all your sensitive documents if you're working at Google or whatever, right? Yeah.

And of course, worth mentioning quickly that, again, this is coming from China. So many people have reported that it is, let's say, censored in various ways that the Chinese government unsurprisingly would like it to be censored. Although if you get the open source model, it's pretty easy to circumvent that. As we have covered, you can untrain all sorts of restrictions in these models. The model is aware of the things it's not supposed to say.

So, but yeah, in the app, we can expect that. But then again, also my impression is like, if you want to try something that's free and that is a new thing that you might prefer to chat GPT and you're not worried about sensitive information, like it's actually a good app. And I'm sure that's part of why people are flocking to it. And next story again,

We covered DeepSeek, now we are going back to Qwen. So in addition to that 1 million context length model, Alibaba did also release Qwen Chat v0.2. And that introduced features like web search, video creation, and image generation within that chat interface, adding on to some of the, I guess, simpler things like document analysis and image understanding that were already there.

So web search is coming on the heels of OpenAI not too long ago. Adding web search to ChatGPT is one of the ways for it to get context for answering questions.

I think it's notable that, you know, obviously in China, Qwen is filling out that, I guess, niche, or at least is one of the companies filling out that niche, of having a ChatGPT-type consumer service where you can pay to use a chatbot.

And now I guess one of the nice advantages that you have if you're using Qwen Chat is that they have that 1-million-token model that has a very long context size, which is more akin to Anthropic's Opus or Gemini, which do similarly, I guess, optimize for long context. So

A lot going on clearly with LLMs and AI in the China space in the month of January. And moving right along, coming back to the US, but still talking about DeepSeek. It really is all the rage. The next story is about Perplexity.

Perplexity is that AI powered search interface that's quite popular. And they very quickly launched a US hosted DeepSeek R1. So now, if you use Perplexity, you can choose to use R1 to power the pro search mode. And that used to be

an option with O1 only; now you can choose to use DeepSeek R1 there. So not too much else to say, but interesting to see, so quickly after release, first of all, them integrating that into their product, hosting the model in the US, and giving it as an option for people to use. Yeah, I mean, if I'm Perplexity, I'm really loving this strategically. Not that it's going to last, but

at least for now and maybe in the future, just having these alternatives to OpenAI's O1 model that are credible frontier level capabilities. Perplexity is an aggregator in a sense, an aggregator of capabilities from many different models. They're not building the frontier models themselves. They are outsourcing that to others. And to the extent that you have many different companies building models, you have the space become more commoditized. The value capture ultimately

is a lot easier to do at the aggregation level, if that's the case, or at least it becomes a much more plausible value add. So I think that's kind of a strategic implication of the R1 release and its incorporation here into perplexity.

And just a couple more stories in this section. Next, we go to Apple and a bit of a, you know, fun, oopsie, AI is doing silly things type story, which is nice to mix into all the serious progress stuff. Apple, as some may have seen, had really silly AI generated notifications for news. This is coming after they released iOS

18.3, which has AI enabled by default, and how weird that that isn't something we are like covering as a priority. I think it went pretty under the radar that Apple is rolling out Apple Intelligence more aggressively now than before.

But part of the thing that didn't go on the radar is the silly things it does, like they summarize, I guess, headlines and the news stories in the notifications. And that leads to a lot of very silly, incorrect summarizations saying essentially false things. And it was so bad that Apple, as per the news story, has disabled this feature. And there are also examples of it.

summarizing messages that you get sent from contacts. Similarly, very embarrassing or silly types of things, at least some people are getting. So yeah, Apple, you know, has been quite slow to get to this compared to other companies, you could say strategically, but

this is not a strong indication that Apple Intelligence is going well. And it really reminds me of Gemini, right? When Google rolled out their stuff, you saw similarly very silly things, which indicates that these companies are rushing this stuff. Yeah, absolutely. And I think Apple, sort of like Amazon, they're a couple of companies that are noteworthy for having been late to the game and only now recognizing that, yep,

It does seem that the scaling laws work, dude. Like we're on track for AGI. I don't know what you guys have been doing, but the cost of being behind is just, it's so compounded across verticals, right? Like your hardware stack, your networking, what you do to acquire the power you need to build the data centers. And then the model development teams, like at every layer of the stack, you have to find ways to convince the best people because the best people are...

in this domain. Like, they are absolutely 10x, 100x engineers in terms of the leverage you can get from their work. And so the difference between being in the number one spot and the number two spot is a world of difference. I think that's part of the tax that, anyway, Apple and Amazon are paying, and Amazon at least has had

the wisdom to partner with Anthropic to get help, you know, getting their Inferentia and Trainium chips online. Apple doesn't really have that kind of partnership, and that, I think, is actually to their detriment. If I were at Apple, one of the things I'd be looking to do is figure out how to partner with a true frontier lab and get them to help with the build, because it's clearly not going too well. Right. By the way, I guess it's worth mentioning that

iOS 18.3 had some other updates. There's now visual intelligence where you can point your phone at a thing.

and ask questions about whatever you're taking a photo of, similar to things you can do with ChatGPT. So Apple is rolling out some other features, but I guess this was the highlight that people are aware of, at least as far as I've seen. One more story along those lines that I'm sure not many people have heard of, but it's fun to cover: "French AI Lucie looks très chic"

"but keeps getting answers wrong"; that's from the headline. So France launched an AI chatbot named Lucie, backed by the government, with the aim of promoting European values and apparently countering English-language dominance in AI tools. So this would be, you know... This is such a European project. I'm sorry, it's just so European. Yes. And then,

shortly after this launch, it was suspended for providing incorrect and humorous responses, causing amusement and frustration. There are various examples of very silly things, like it saying that cow's eggs are a healthy food source, things like that.

So, pretty embarrassing, or at least humorous. And as you said, Europe, as part of a larger story, is clearly very behind on competing with both the US and China, and this is not a great reflection

on the power of Europe to develop this kind of tech. Yeah, I feel like Emmanuel Macron, the French president there, kind of knows just enough to spend a lot of money on stuff that's really dumb, but not obviously dumb. It's a dangerous place to be. And anyway, there have been a couple of things like this.

I mean, I guess I've said before on the podcast, just for those tracking at home: I think Mistral, for example, is going to be in a world of hurt. I don't think they can keep up in a world of scaling, and I expect them to fold at some point, or get acquired, or something like that, like a bunch of other labs we've seen that happen to.

But one specific thing that I find kind of funny here, sorry to bash this too much, but it is funny: so Lucie, whose logo is a female face said to be a combination of Marianne, the French republican symbol, and, yes, Scarlett Johansson, the American actress, was widely criticized. Why you would go with Scarlett Johansson after the GPT-4o debacle, I have no idea.

I have genuinely no idea. But this apparently sounded like a really good plan, and they went ahead and did it. And so now this is just another layer of shit in the shit sandwich that is this giant government investment in this chatbot. I don't know, there are so many things going on here. But I'm sure they have a plan. I'm sure they have a plan. Right. And they did receive funds from the broader

national investment plan. The organization behind it, by the way, is Linagora, a French open-source software firm that leads the consortium behind the project. They did say in a statement that the launch was premature. So yeah, we've seen that happen also to Google and also to Apple, clearly. So I guess they're not unique in having this sort of response, but still,

a little silly. And moving right along to applications and business, once again, we've got to come back to DeepSeek and cover the results of and response to the R1 model. So I don't know exactly the timeline of this; that's one interesting thing. Pretty much nobody cared about DeepSeek V3, in the business world at least. Then R1 came out and everyone went crazy and started

panicking, or at least there was clearly a large amount of panic in the US business world. There was a 2% fall in the S&P 500, a 3% fall in the Nasdaq, and a 17% plunge in NVIDIA's stock. I mean, 17%. That's about $600 billion of market cap. So

clearly, we saw a lot of news coverage of the story, a lot of coverage that was not very good, that cited the $6 million figure from the paper in comparison to the billions of dollars spent by OpenAI, which, you know, is obviously misleading. The $6 million figure is about the cost of a single training run, not the infrastructure costs.
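As a rough sanity check on the market-cap math, here is a minimal back-of-envelope sketch; the pre-drop market cap is an assumed approximation on our part, not a figure from the episode or the coverage:

```python
# Back-of-envelope check on the "~$600 billion wiped out" figure.
# The ~$3.5T pre-drop market cap is an assumed approximation, not a number
# quoted in the episode.
nvidia_market_cap = 3.5e12   # approximate NVIDIA market cap before the drop, USD
drop_fraction = 0.17         # the reported one-day decline

wiped_out = nvidia_market_cap * drop_fraction
print(f"Market cap wiped out: ~${wiped_out / 1e9:.0f}B")
# -> roughly $595B, consistent with the ~$600 billion figure cited above
```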

And as to the ramifications for NVIDIA, there it's quite nuanced. It may be the case that

NVIDIA could see lower future profits due to the more efficient training recipes that were covered in the DeepSeek V3 report. But then again, right, the demonstration of that paper is that, with the relatively weak hardware available to the Chinese companies that are restricted from buying the latest generation of chips,

they were still able to train a frontier-level model. So on that side, you could argue, perhaps, that NVIDIA would not be able to sell quite as many of its most expensive

flagship chips. But regardless, yeah, from my perspective, this was a little surprising, perhaps an indication that, it's almost like a wake-up call. There was a blog post last year about the "$600 billion question" of AI, where you've seen a huge amount of investment in infrastructure that isn't leading to profits, except for NVIDIA, I suppose.

And so I think this could also be an indication of a bit of worry that all of this massive investment is perhaps not going to pay off quite so well. So I'm just going to pour cold water, as I've tried to do before, on this whole narrative. To be clear, I'm not an NVIDIA shill; it's just a fact of the matter that they kind of run the space and have massive market capture. But

there's a great report out by SemiAnalysis that goes over this in detail, though it was already obvious to some extent previously. But the actual CapEx, right? The question that you ask yourself when you're deciding whether to buy more NVIDIA chips

isn't what the marginal compute cost of training my model is. It's: how much is my fucking cluster going to cost me? How many NVIDIA chips do I have to buy? The total CapEx, the total server CapEx for this cluster, according to SemiAnalysis, was $1.3 billion, and a lot of that went to acquiring, maintaining, and operating the GPU clusters. This is a massive expense that is orders and orders of magnitude greater than the advertised $6 million training cost.

The $6 million training cost, again, is the compute cost associated with one training run, the one that specifically led to the V3 model. It is not the CapEx cost, which is the main thing you're thinking about when you're deciding whether to buy more NVIDIA chips, and the thing that NVIDIA's revenue is really based on, to a significant degree anyway.
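To make the orders-of-magnitude gap concrete, here is a minimal sketch comparing those two numbers; the GPU-hour and rental-price inputs are the figures reported in the DeepSeek-V3 technical report, and the CapEx figure is the SemiAnalysis estimate mentioned above, so treat all of it as approximate:

```python
# Rough comparison of cluster CapEx vs. the cost of a single training run.
# Inputs are approximations pulled from the discussion above; the GPU-hour
# and $/hour figures are as reported in the DeepSeek-V3 technical report.
gpu_hours = 2.788e6     # reported H800 GPU-hours for the V3 training run
rental_price = 2.0      # assumed rental price in USD per GPU-hour in that report
training_run_cost = gpu_hours * rental_price

server_capex = 1.3e9    # SemiAnalysis estimate of DeepSeek's server CapEx, USD

print(f"Single training run: ~${training_run_cost / 1e6:.1f}M")
print(f"Server CapEx:        ~${server_capex / 1e9:.1f}B")
print(f"CapEx is ~{server_capex / training_run_cost:.0f}x the cost of the run")
# -> ~$5.6M vs ~$1.3B, a gap of well over two orders of magnitude
```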

The other thing to keep in mind is what they advertised, and we talked about it at the time, when V3 first came out. What we're learning now, and this is where the ball was fumbled a little bit, is that I think Scale AI CEO Alex Wang, and maybe even Dario at Davos, gave interviews where they mistakenly said something like there were 50,000 H100s available to DeepSeek. In reality, it's a mix of H800s, H100s, and then the H20s, those China-specific chips that we've talked about a lot in the context of export controls, and which probably should have been export controlled as well, but weren't. That was a chip that NVIDIA specifically designed to skirt the export controls, get right underneath the threshold, and still be able to sell to China. So

Moral of the story is, dude, we got to tighten up export controls even more. They are working because how much worse would it have been if DeepSeek had been able to get their hands on more of this hardware? That's a really key question. So the stock plummets when people, in my opinion, incorrectly interpret the

kind of result here of DeepSeek. But the other kind of complicating factor is that literally the next day, we find out that President Trump has tariffs that he wants to impose on Taiwanese semiconductor exports, up to 100% tariffs, he says, which would actually justify a crash in Nvidia's stock. And so now we're left wondering, did the stock crash because people incorrectly priced in the implications of DeepSeek on day one,

Or was there some sort of leak ahead of time and insider trading based on that leak about the imminent announcement of potential tariffs on TSMC or on Taiwanese imports? That's actually very ambiguous to me right now. I wonder if somebody has done kind of a detailed analysis to kind of parse that out. I don't know how one would.

But I think there's some blurring of the lines there that makes things quite interesting. So the bottom line is, I think the fundamentals here are bullish for NVIDIA, modulo the tariffs, which are actually going to be a big issue for US AI competitors. Right. So I guess we're both on the same page as far as analysis goes: this is a bit of an overreaction, seemingly, and could only kind of be said to make sense

from the broader perspective of the outlook for AGI and building out these massive data centers in general, rather than specifically DeepSeek on its own.

Moving right along, the next story also relates to data centers and it is on the relationship between Microsoft and OpenAI. So there was a post by Microsoft that is updating the details that we know about Microsoft and OpenAI's relationship. So now Microsoft no longer holds exclusive cloud provider status.

for OpenAI, although it does have a right-of-first-refusal agreement, where OpenAI, I guess, would at least have to offer Microsoft the business first.

OpenAI is still committed to using Azure quite a lot, but clearly also trying to loosen up its relationship with Microsoft. This is also in the context of the Stargate project we'll discuss a little bit more, which seems to be primarily for the benefit of OpenAI, with OpenAI getting exclusive license to use the outcome of that project.

OpenAI and Microsoft have a long, storied relationship, quite interesting from a business strategy perspective, and this is the latest update to a very much ongoing situation. Yeah, in a way, maybe not the most shocking. We actually talked about this in the context of the OpenAI-Oracle deal, right? That turns out, in retrospect, to have been part of Project Stargate, this Abilene, Texas

build, the cluster that they're partnering on, right? That was the first time we looked at that and were like, hey, you know, this is really kind of OpenAI going off script with respect to their relationship with Microsoft as we understood it at the time. It seems like that has evolved. As I recall, the argument back then was, hey, Microsoft seems a little sketched out about

following OpenAI as far as they want to go at the rate they want to go in terms of these build outs, the very, very aggressive, right, like $500 billion over, I guess, four year build out pace or five years, no, four years. It is also worth noting, by the way, that

Microsoft has been investing $80 billion per year in new data center builds for AI, which actually, if you look at it over a four-year period, isn't that far off from the $500 billion number. There's a lot going on there, and maybe OpenAI's desire to have exclusive access to that cluster was a big factor too. This is a big deal. The other piece that's been talked about, and Elon tweeted about this, and he was both right and... actually, I mean, I guess he was technically right.

Sam kind of framed this up as like, yeah, it's a $500 billion investment.

Funding secured, to coin a phrase. Elon said, no, you don't have the funding secured, I have it on good authority. He said, I think on X at some point, that SoftBank only has, I don't know, 10 or 15 billion available to spend in liquidity. Anyway, when you add together the amounts, 15 billion from OpenAI and another 15 billion from Oracle or whatever, that order of magnitude just didn't add up to anywhere close to 500 billion. That was absolutely correct.
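A minimal sketch of that back-of-envelope, using the loose figures thrown around here; all of them are approximate and illustrative rather than confirmed commitments:

```python
# Gut-check on the gap between the headline Stargate number and the amounts
# cited as actually available. All figures are rough, as discussed above.
headline = 500e9                  # advertised Stargate figure over four years, USD

cited_commitments = {
    "SoftBank liquidity": 15e9,   # roughly the 10-15 billion range mentioned
    "OpenAI": 15e9,
    "Oracle": 15e9,
}
committed = sum(cited_commitments.values())

print(f"Cited amounts: ~${committed / 1e9:.0f}B of ${headline / 1e9:.0f}B "
      f"({committed / headline:.0%})")
# -> on the order of 10% of the headline number, which was Elon's point
```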

So there's $100 billion that is actually secured, and the hope is to raise the additional $400 billion in time. That extra $400 billion, therefore, is to some degree a marketing play. OpenAI is trying to make this the project, the government-favored project. That's a big element here. I will say, I mean, OpenAI's security is shit. I know this because we've been doing an investigation for the last year on frontier lab security and the implications for national superintelligence projects,

which will be coming out, I guess, two weeks from now, so we'll probably be covering it here. But when you're announcing to the fucking world that you're building a $500 billion cluster that you internally believe to be the superintelligence cluster, you're inviting nation-state attention. So, you know, Sam A has put China on notice that this is going to be a big, fat, juicy facility, and they know exactly what he plans to use it for. So not great from a security standpoint. Not that you can hide these builds, but there are ways to go about this that might have been a little better.

I think that you face this media incentive because you're trying to draw investors as well

But one challenge with this build is who the investors are. So G42 is one of the investors; they're not investing as G42, they're investing through MGX, but it is G42, and that is a UAE fund. There's also Saudi money, which is a dominant contributor to Masayoshi Son's SoftBank. So in a very real sense, the Stargate project is, like, UAE- and Saudi-funded.

And I'd have to take a look, but I would not at all be surprised if the lion's share of funding came from those sources. That is interesting from a national security standpoint. The strings that are attached to that funding would have to be very, very carefully kind of vetted. But that, I think, is a very serious issue. And so...

It speaks to some of the challenges people have had with OpenAI in particular, being reportedly willing to trade off American national security interests, even, according to some reports, in favor of the national security interests of the Russians and the Chinese, with the idea being, "We're going to get these countries to bid against each other to have the AGI project based in their country." This is the sort of thing. And unfortunately, when there are stories like that that are very credible,

That leads to questions about when you then start taking Saudi and UAE money to do these builds. What actually is the thought behind that? I'm not pretending to know. I can't read Sam Altman's mind. But these are things that you want to be considering, especially if you believe yourself to be building a project of this significance. That's right. And next up, another story related to OpenAI and their ongoing journey, you could say,

related to the governance piece of OpenAI: they're updating their board of directors with the addition of Adebayo Ogunlesi, I think is how you say it, a founding partner of Global Infrastructure Partners, which is now part of BlackRock.

He was focused on infrastructure investment and was at Credit Suisse for 23 years. I can't say that I know too much about the implications of this, but it's clearly coming at a time when OpenAI is still making a strong effort to shift towards a for-profit structure, coming on the heels of, you know, just, I guess, slightly over a year ago, when we had the

nonprofit board

stage a coup of sorts. And since then, there's been a gradual transition of power, presumably behind the scenes. So this is happening alongside all of that and probably does have some meaningful implications. Yeah, basically, my read on this is they need a guy who can help bring in massive amounts of, like, Saudi and sovereign wealth fund money into giant projects. And this is a great finance guy, really experienced at doing this kind of thing.

They say that in October he actually launched a joint GIP-BlackRock $30 billion fund, with backing from Microsoft, NVIDIA, and, there it is, Abu Dhabi, to build data centers and adjacent power infrastructure. So this is a guy with experience with those UAE stakeholders, the sort of G42s of the world, and presumably networks that run deep there. So my read is that that's kind of the play here with this appointment.

And one last story, just to cover something kind of normal, I guess, something more along the lines of what we might get on a calmer week. Worth knowing that ElevenLabs, which specializes in AI voice technology, has raised $250 million in a Series C funding round that values them at $3 billion. I believe we covered this as a potential story before, and this is the

confirmation of that funding round, led by ICONIQ Growth and Andreessen Horowitz. It's a name that perhaps isn't as well known, right, as OpenAI or Anthropic, etc., but as a leader in the space of AI voice technology, it's a very significant organization, and

I do think that is clearly reflected in this funding and in the valuation. And that's all to say, let us move on to policy and safety, skipping research entirely just because we don't have time. We begin policy and safety with, once again, Stargate and the

announcement that happened at, probably, the White House, or regardless, happened with Donald Trump present. So there was a lot of fanfare about Stargate, and what was marketed, you could say, as a $500 billion investment in AI infrastructure in the US. And so there was this

presentation where Trump hailed the project and said it would position the U.S. as competitive, this being part of a Make in America initiative. He also mentioned using emergency declarations to facilitate infrastructure development. So it's interesting, and obviously, Jeremy, you would know more about this, whether the U.S. government

can kind of get behind this project and what the implications are for this announcement happening. Kind of weirdly, because Stargate has been ongoing for a while and they seem to be kind of pushing it now in a way that isn't really news, but is being made to seem like a new thing. Yeah.

Yeah, in fairness, like, so that aspect is not so unusual. I think TSMC did something similar late in the Biden administration, where they had a big fab they wanted to announce. And they just said, Oh, we're gonna kind of wait till Trump's in office, and then, you know, give him credit for that.

It's done. It's just part of politics as usual. It must be said, this is an especially Sam Altman-type move, especially given that he's playing catch-up right now in terms of his relationship with the administration, having been, you know, a very public anti-Trump guy for so long. And then he sort of put out some pretty,

I don't know, somewhat cringy tweets. It's sort of like, when you've been tracking his views on the previous Trump administration, to see this 180 is kind of like, oh, that's interesting. You know, it's very clearly, at least to me, it seems very clearly

an attempt to ingratiate himself, which, like, yep, I get it, you're running a business here and there are obvious governance implications, but it's going to be part of the calculus for anybody in that position. In terms of what this actually means for government support, I'm not tracking any government investment in this. In fact, for amounts this high, it would be really difficult for just the president to say, hey, yeah, we're going to fund you, because Congress is responsible for appropriations.

So getting more funds in the door, that just isn't something the presidency can easily do without taking money away from other things. That being said, Trump has been very forward on deregulation of, in particular, environmental regulations and other issues that slow data center build times and power builds.

which is actually really important. Right now, our biggest gap relative to China is our ability to field enough power, through whatever means, to build really big campuses. We have all the hardware we can eat, more or less, but we need that energy infrastructure. A bunch of executive orders came out towards the end of Biden's term which seem to still be live, and that's kind of interesting: Trump has let those be because they did point to deregulation.

But he's making other moves and really kind of bolder moves to deregulate and move things forward, which I think if you're a fan of American competitiveness in this space, that is an important play. You want America, really, no matter what position you have in this field, even if you care about loss of control, which I absolutely do, you want America to be first so that American labs have enough lead time that they can work on their alignment techniques and not be too, too rushed by geostrategic factors. So anyway...

I think this is actually like, it's good that Trump is saying he's behind this. The funding sources, and this is more of a Sam Altman sourcing the funds thing, the funding sources are potentially an issue unless the strings are very carefully looked at. You may need sovereign wealth fund money from outside the country. That may just be a fact of the matter of this stuff, but you definitely want some intense national security scrutiny on the source of those funds and the amount of leverage they have on the project.

And as you said, once again, that $500 billion number is basically just what they hope to raise over the next four years.

Apparently, the $100 billion number is from SoftBank CEO Masayoshi Son, alongside other investors there, including OpenAI. So a huge project and a very ambitious project; we'll see, I suppose, what winds up happening with it. And next, some more news related to Trump taking office,

which I think we couldn't cover last week. So, not surprisingly, I suppose, since we knew this would happen: President Trump

has rescinded the executive order on AI from the Biden administration, the Safe, Secure, and Trustworthy Development and Use of AI order, which was a huge, very, very long executive order that did a lot of stuff. Trump issued another executive order, "Initial Rescissions of Harmful Executive Orders and Actions," that went into effect.

And as you said, there's kind of a mix of things that Trump seems to be doing. So this is more focused on the safety piece on a lot of the things that various agencies were targeted towards doing.

But there are other Biden policies and orders that haven't been targeted in this. Yeah, this is actually, it's pretty interesting. So I think we talked about this when this EO first came out, but the executive order that Trump has just revoked, it's an EO that tried to do a little bit of everything, right? So the Democratic coalition that was behind it was, included people with a variety of interests and concerns, some of them about like hardcore, like NATSEC concerns that are

bipartisan, things around the weaponization of AI, loss of control risk, stuff like this. And then there was a bunch of stuff that was more, you might say, clearly Democrat coded, so stuff around ethics and

and bias in algorithms and stuff like this. Anyway, it was at the time the longest executive order in US history, and I think it may still be. So that was a fun freaking thing to read when it came out. But you can certainly read this as them tearing down the executive order because it had so much extraneous stuff, and the question is, what are they going to replace it with? One of the key things that this EO did that was good was it included a reporting requirement for any models trained with more than 10^26 FLOPs, right?

At the time, no model had been trained at that threshold; now we do have some. So the question is going to be, is that going to be reinstated in some form? What other executive orders are coming? That remains an open question. So I think there are people reading a lot right now into things that are actually quite unclear. But the reasoning behind this is somewhat clear to anybody who's been tracking this; the administration was talking about

how they were going to revoke this for quite some time. And it was clear why as well, that there was just so much fluff in there that was not germane to kind of the core national security issues that President Trump cares about. So that's kind of the, I guess, the angle that they're taking on that. And a lot of this, we have yet to see how it's going to play out.
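Just to put that 10^26 FLOP reporting threshold in perspective, here is a minimal sketch using the common ~6·N·D approximation for dense transformer training compute; the model configurations below are hypothetical illustrations, not claims about any specific real training run:

```python
# Rough training-compute estimate using the common ~6 * params * tokens rule
# of thumb for dense transformers. The configurations below are hypothetical
# illustrations, not descriptions of real models.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

THRESHOLD = 1e26  # reporting threshold in the now-revoked executive order

for name, params, tokens in [
    ("70B params, 15T tokens", 70e9, 15e12),
    ("1T params, 30T tokens", 1e12, 30e12),
]:
    flops = training_flops(params, tokens)
    side = "above" if flops > THRESHOLD else "below"
    print(f"{name}: ~{flops:.1e} FLOPs ({side} the 1e26 threshold)")
# -> ~6.3e+24 (below) and ~1.8e+26 (above): only the very largest recent
#    runs start to cross the reporting line
```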

And now moving back to DeepSeek, as promised, getting into the policy and geopolitics implications. And we're going to get into it through, as you mentioned, Jeremy, the take of Anthropic CEO Dario Amodei.

As Amodei has done previously, he posted a blog post conveying his viewpoint, saying that they do not perceive DeepSeek as adversaries, basically saying this is not necessarily a bad thing, but at the same time emphasizing the importance of export controls. So Amodei

kind of walked a fine line here. He had good words to say about DeepSeek and the research they've done, but at the same time, I suppose, tried to remind people of the fact that

they are based in China and therefore are directly tied to, and have to sort of follow the orders of, the authoritarian government of China. And, at least as people in the West, and again, we want to be very clear here, we do have a bit of a bias, you could say, or a viewpoint that is negative with respect to the Chinese government. Similarly here, Amodei kind of positioned the Chinese government as

not a good thing, and as it still being important to double down on, or at least continue with, export controls. Yeah, he also published a blog post, which is quite good, laying out in a little bit more detail his thinking on what DeepSeek actually means. I think everybody in the space more or less converged; there are more or less two groups of people. There are the people who were looking at DeepSeek V3 and were like, holy shit,

and kind of had already run these calculations in their heads, and then there are the people who are just getting shocked by R1 now that it's out. The media takes have been dominated by the latter. Anyway, we've kind of done this to death, but it's basically aligned with that idea, right? Scale is going to continue to work; the scaling curves are going to continue to dominate. And the question now is how quickly the US

and the West can saturate the compute that they already have. Once that's done, then we'll get a real sense for who's ahead in the space. But ultimately, hardware is king; that hasn't really changed. We just have a second axis on which to scale. And one of the points that Dario makes really effectively, too, is: look, it's been a minute since O1 was trained.

It's been a minute since, like, 3.5 Sonnet was trained. And in that time, you know, you'd pretty much expect, given the rate of algorithmic and hardware improvements, that yeah, you would get a model like this trained on about a billion dollars' worth of infrastructure, where the individual training run costs around $6 million. None of this is really that shocking; in fact, it's slightly behind the curve. The shocking thing here is not that a China-based lab

managed to do this per se. It's that the curves themselves, the improvement curves, are so steep. We are on course, at least many people believe, I believe, for superintelligence here. That's what these curves say. If you take that seriously, then yeah, every incremental breakthrough is going to seem shocking, and even things like DeepSeek, which are maybe a couple of months behind where the frontier is in terms of cost-performance trade-offs, are going to shock people. And when you open-source them and add some marketing that says $6 million,

then yeah, it's going to make an impact. And I think that the main lesson here is expect more of this sort of thing, not necessarily from China as Western scaling kind of starts to drive things, I would expect. But yeah, certainly from frontier labs elsewhere. Right. And this is a good time to circle back to that question we got from Discord and particularly, Jeremy, your take. So clearly there's a bit of tension here. On the one hand, we want to have safety, you know, have...

You, of course, are a big safety hawk and would like people to be aware of the alignment concerns and so on. At the same time, you could say there are race dynamics between the US and China, and DeepSeek demonstrates that. So yeah, what was your reaction here in terms of, I guess, how this relates to alignment and those sorts of things? Like, China obviously has a very impressive national champion in DeepSeek.

And a lot has been made of a meeting between Li Qiang, the Chinese premier, who is the second in command in China, and the founder of DeepSeek. That is interesting. The Bank of China announced a 1 trillion yuan investment in AI infrastructure, which incidentally has been incorrectly framed by Western media as a $137 billion investment.

That's what you get if you just do the kind of currency conversion naively. But the number that actually matters is purchasing power, the PPP number, purchasing power parity. And in PPP terms, that's actually a $240 billion investment.
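To make the currency point concrete, here is a minimal sketch of the two conversions; the market exchange rate and the PPP conversion factor are assumed approximations, not figures from the episode:

```python
# Naive market-exchange-rate conversion vs. a PPP-adjusted conversion.
# Both conversion factors are assumed approximations.
investment_yuan = 1e12    # the announced 1 trillion yuan

market_rate = 7.3         # assumed CNY per USD at market exchange rates
ppp_factor = 4.2          # assumed CNY per international dollar (PPP)

print(f"Naive conversion: ~${investment_yuan / market_rate / 1e9:.0f}B")  # ~$137B
print(f"PPP-adjusted:     ~${investment_yuan / ppp_factor / 1e9:.0f}B")   # ~$238B
```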

So that is more than the total funding actually committed to Project Stargate; it's more than double, in fact. So when you think about how serious the CCP is about this: they're fucking serious. And they now have a national champion in DeepSeek that absolutely has the technical chops to compete if they have enough hardware on hand. And I think it's also worth mentioning that it's not just DeepSeek; we shouldn't overlook Alibaba and Qwen. They are very much competitive on the frontier model front.

No, great call-out. Yeah. And as well, when you think of, like, the Huawei-SMIC axis, and, anyway, there's a whole story about what the hardware picture might look like with their seven-nanometer process and whether that's enough to make enough chips at scale, with good enough yields, to do interesting things. It may well be. But the bottom line is, China is for real here. And this is where, you know, Western national security agencies have a lot of work to do and will have to get more involved.

You know, anyway, this is spiraling into essentially the work that we're going to launch in two weeks. But the bottom line is, I think we have to make some very thoughtful trade-offs and calculations regarding what it means for China to be a live player in this race, and at the same time recognize that, yeah, alignment is unsolved. There are too many people who look at the fact that alignment

and control of superintelligent systems is probably a really big issue. And they almost don't want to acknowledge that because they also recognize that trying to negotiate in good faith with China is not going to happen. The question that we've been wrestling with over the last year as we've done our investigation is,

What happens if you take both those things seriously? What happens if you acknowledge that, yes, China has basically violated every international security treaty that they participated in? They've taken advantage of the treaties that the US and Russia have engaged in on nuclear and show no sign of stopping. At the same time, we don't know how to control superintelligence, and the likely outcome there is not great if we build superintelligence before we can control it. How do you reconcile those two views?

And I think that's kind of at the core of a lot of sort of Pollyanna-ish, unrealistic takes on both sides that don't kind of take the full picture into account. So I'll park it there because I'm...

I will go on way too long on that piece. Time for a policy episode; I've already heard a bit of it just now. And for the last bit, there are a couple of stories related to TSMC; we'll focus on one of them. So there was a story of Trump threatening tariffs, and a response from the Taiwan government to that. And then also,

an interesting story of the Taiwanese government clearing TSMC to make 2-nanometer chips abroad. They have this so-called silicon shield policy restricting that; now they're loosening it, which is related, of course, to TSMC's work in the United States.

Yeah. The way to think about Taiwan in this situation is that they are a person who has taken your baby. They're holding your baby. And then there's another person who's pointing a gun at them. And they're not going to let go of your baby because if they do, then you're going to be like, eh, yeah, I don't really care if Taiwan gets shot. But they're holding onto your baby so that you care about them getting shot. They're like, no, we're building all the semiconductors here. And if China attacks us, then you don't get any semiconductors. It's really, really bad. This is a perfect metaphor. I'm really happy to hear that one.

Yeah. Anyway,

so this has actually been a matter of Taiwanese state policy for a long time: whatever TSMC's leading node is, they're only allowed to build fabs outside of Taiwan that make nodes two generations behind that. And so when you look at the Arizona fabs that are being teed up by TSMC, famously, those are, like, 4-nanometer, and that's because the leading node at TSMC right now is the 2-nanometer node. And so

yeah, that's changing now, and that's a really, really interesting development, right? That is essentially greenlighting the build-out of fabs for, you know, the 2-nanometer, the 1.6-nanometer and so on in the United States, which obviously America would be really, really interested in, because they need to ramp up their capacity to produce these chips really fast.

If something happens like hot war style, China invades Taiwan, I mean, assume to first order that all the TSMC fabs are booby trapped to blow. Basically, no more TSMC for you. And then everything resets to like, okay, well, what are the next leading fabs? And in that context, SMIC is a really interesting player.

actually. I mean, they'll have issues because they can't get lithography machines and other shit, but like they definitely become more important. And so China rises to much closer to parity with the West.

in that situation. So there's a lot of interest in onshoring Taiwanese TSMC fabs and capabilities at those more advanced nodes. So that's kind of what we're seeing here; that's been greenlit, essentially. Makes sense. And with that, we are finished with this very dense, DeepSeek-focused episode. Thank you for listening. As always, you can go to the description for all the links, or go to lastweekin.ai

on the web to get those as well. As always, we appreciate your views, your sharing and subscribing, but more than anything, you listening. And chatting on Discord is great too. So thank you, and be sure to keep tuning in.
