
#447 – Cursor Team: Future of Programming with AI

2024/10/6

Lex Fridman Podcast

Key Insights

Why did the Cursor team decide to fork VS Code instead of writing an extension?

The decision to fork VS Code was driven by the need for more control over the programming environment as AI capabilities improve. Extensions have limitations that would lock in certain features and hinder innovation as models get better.

What are the key features that make Cursor stand out compared to GitHub Copilot?

Cursor offers faster and more intelligent features like next-action prediction and code editing across multiple lines. It also integrates AI more deeply into the editing process, making the experience more organic and efficient.

How does Cursor's tab feature work, and what is its purpose?

The tab feature in Cursor allows the model to predict the next edit or action the programmer will take, making coding faster by eliminating low-entropy actions. It aims to make the editing process feel more like having a fast colleague looking over your shoulder.

What technical challenges did the Cursor team face in making the tab feature work efficiently?

The challenges included training small, efficient models, handling long contexts, and ensuring fast performance. Techniques like speculative decoding and using sparse models were employed to improve speed and accuracy.

How does Cursor handle diffs for code changes across multiple files?

Cursor optimizes diff interfaces for different types of code changes, such as auto-complete suggestions versus larger code blocks. It aims to make the review process easier by highlighting important parts of the diff and graying out less critical changes.

Why is Cursor focusing on local models less compared to cloud-based models?

Cursor focuses on cloud-based models because of the limitations of local models, especially on less powerful machines, and because the larger models they want to run can't fit on a single node. Cloud-based models also allow for more sophisticated features and faster iteration.

What role does synthetic data play in improving AI models for programming?

Synthetic data helps in training models for tasks where real data is scarce or expensive. It can be used to generate training data for bug detection, improve model accuracy, and create verified data for specific tasks like solving math problems or passing coding tests.

How does Cursor envision the future of programming with AI?

Cursor sees the future of programming as a hybrid where humans remain in control, making critical decisions while AI handles the repetitive and predictive tasks. This will allow for faster iteration and more creative control over the codebase.

Chapters

The code editor is where software is built, providing structure and tools for programmers.
  • Code editors are advanced word processors for programmers.
  • They offer visual differentiation, navigation, and error checking.
  • The role of code editors will evolve with AI-assisted coding.

Shownotes Transcript


The following is a conversation with the founding members of the Cursor team: Michael Truell, Sualeh Asif, Arvid Lunnemark, and Aman Sanger. Cursor is a code editor based on VS Code that adds a lot of powerful features for AI-assisted coding. It has captivated the attention and excitement of the programming and AI communities.

So I thought this is an excellent opportunity to dive deep into the role of AI in programming. This is a super technical conversation that is bigger than just one code editor; it is about the future of programming, and in general, the future of human-AI collaboration in designing and engineering complicated and powerful systems. And now a quick few-second mention of each sponsor. Check them out in the description; it's the best way to support this podcast.

We've got Encord for unifying your machine learning stack, MasterClass for learning, Shopify for selling stuff online, NetSuite for your business, and AG1 for your health. Choose wisely, my friends. Also, if you want to get in touch with me for whatever reason, or take a survey, or submit questions for an AMA, all of that would be great.

Go to lexfridman.com/contact. And now, on to the full ad reads.

I try to make these interesting, but if you skip them, please still check out our sponsors. I enjoy their stuff. Maybe you will too. This episode is brought to you by Encord, a platform that provides data-focused AI tooling for data annotation, curation, management, and for model evaluation. One of the things I love about these guys is they have a great blog that, uh, describes things cleanly.

I mean, it's technical, but it's not too technical; it's sufficiently technical to actually describe the ideas, like their blog posts on the state-of-the-art models, like the OpenAI o1 model that was just released. Sometimes they tie it into why this is a part of Encord, why this makes sense, and sometimes not.

And so I love that; I recommend their blog just in general. That said, you know, when they are looking at state-of-the-art models, they are always looking for ways

to integrate them into their platform, which is basically a place to organize your data. And data is everything. This was true before the popularity and the explosion of attention methods and transformers, and it is still very much true now. The non-synthetic, human-generated data is extremely important.

How you generate that data, how you organize that data, how you leverage it, how you train, how you fine-tune, the pre-training, the post-training, all of it, the whole thing. Data is extremely, extremely important.

And so Encord takes data very seriously. Anyway, go try out Encord to create, annotate, and manage your AI data at encord.com/lex. That's encord.com/lex.

This episode is also brought to you by MasterClass, where you can watch over two hundred classes from the best people in the world in their respective disciplines. Carlos Santana on guitar, for example. I love that one.

There's a few guitar ones on there. Great, great, great stuff. Carlos Santana, he has an instrumental, Europa, which I haven't quite tried to play, but it's on my to-do list. It's sort of one of those things, you know for sure this is a thing I will play, because it's too beautiful.

It feels like once you play it, you understand something about the guitar that you didn't before. It's not blues. It's not... I don't know what that is.

It's some kind of dream-like teleportation into a psychedelic world where the tone is warmer than anything else I've ever heard, and still the guitar can cry. I don't know. I love it.

He's a genius. So it's such a gift that they can get a genius like that to teach us his secrets. Get unlimited access to every MasterClass and get an additional fifteen percent off an annual membership at masterclass.com/lexpod. That's masterclass.com/lexpod.

This episode is also brought to you by Shopify, a platform designed for anyone to sell anywhere with a great-looking online store, or a simple-looking online store like the one I put together at lexfridman.com/store. I have a few shirts on there in case you're interested.

And speaking of shirts, I'm reminded of thrift stores, which I very much loved for a long time. I used to love thrift stores. They're a nice place to get stuff like kitchen stuff and, uh, clothing. And the kind of clothing you get at thrift stores is actually pretty interesting, because there are shirts there that are just unlike anything else you would get anywhere else.

So if you're sort of selective and creative-minded, there's a lot of interesting fashion there, and interesting t-shirts, just like hilarious t-shirts, t-shirts that are very far away from the kind of trajectories you have taken in life, or not, but you just haven't thought about it, like a band that you love but you never would have thought to wear a t-shirt of. Anyway, a little bit, I think of Shopify as the internet's thrift store. Uh, of course, you can do super classy, you can do super fancy, or you can do super thrift. All of it is possible.

Sign up for a one-dollar-per-month trial period at shopify.com/lex. That's all lowercase. Go to shopify.com/lex to take your business to the next level today.

This episode is also brought to you by NetSuite, an all-in-one cloud business management system. Sometimes I think that NetSuite is supporting this podcast because they're trolling me. They're saying, hey Lex, aren't you doing a little too much talking?

Maybe you should be building more. I agree with you, NetSuite. I agree with you.

Every time I do an ad read for NetSuite, it is a chance for me to confront my Jungian shadow, some of the demons that emerge from the subconscious, and ask questions that I don't have answers to: questions about one's mortality, that life is short, and that one of the most fulfilling things in life is to have a family and kids, and all of these things that I would very much like to have.

And also the reality that I love programming and I love building, I love creating cool things that, uh, people can use and share, and that would make their life better. All of that. Of course, I also love listening to podcasts. And I kind of think of this podcast as me listening to podcasts, where I can also maybe participate by asking questions.

So, all these things that you love, but you ask the hard question of, like, okay, life is slipping away. It is short, it is really, really short. What do you want to do with the rest of the minutes and the hours that make up your life? Yeah.

So thank you for the existential crisis, NetSuite. I appreciate it. If you are running a business, if you have taken the leap into the unknown and started a company, then you should be using the right tools to manage that company. In fact, over thirty-seven thousand companies have upgraded to NetSuite. Take advantage of NetSuite's flexible financing plan at netsuite.com/

lex. That's netsuite.com/lex. This episode is also brought to you by the delicious, the delicious AG1. AG1 is an all-in-one daily drink to support better health and peak performance. It's basically a super awesome multivitamin.

It makes me feel like I have my life together, even when everything else feels like it is falling apart. At least I have AG1, at least I have that nutritional foundation to my life. So no matter what I'm doing, all the carnivore diet,

all the grueling endurance events, the mental madness of staying up all night, or just the stress of whatever I'm going through, through all of that, AG1 is there. At least I have that.

Also, I sometimes wonder: it used to be called Athletic Greens, and now it's called AG1. I always wonder, is AG2 coming? It's just an interesting branding decision. Like, AG1, to me, an OCD kind of programmer type,

it's like, okay, is this a versioning thing? Is this like AG 0.1 alpha? What's the final release? Anyway, the thing I like to say, and to consume, is AG1. They'll give you a one-month supply of fish oil when you sign up at

drinkag1.com/lex. That's drinkag1.com/lex. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Michael, Sualeh, Arvid, and Aman.

This is awesome. We have Michael, Aman, Sualeh, and Arvid here from the Cursor team. First up, the critical question: what's the point of a code editor?

So the code editor is largely the place where you build software. And today, or for a long time, that's meant the place where you text-edit a formal programming language. And for people who aren't programmers,

the way to think of the code editor is like a really souped-up word processor for programmers, where the reason it's souped up is that code has a lot of structure. And so the quote-unquote word processor, the code editor, can actually do a lot for you that word processors, sort of in the writing space, haven't been able to do for people editing text there.

And so, you know, that's everything from giving you visual differentiation of the actual tokens in the code so you can scan it quickly, to letting you navigate around the codebase, sort of like you're navigating around the internet with hyperlinks, going to definitions of things you're using, to error checking, um, to catch rudimentary bugs. And so traditionally, that's what a code editor has meant. And I think that what a code editor is is going to change a lot over the next ten years, as what it means to build software maybe starts to look

a bit different. I think, I think also a code editor should just be fun.

Yes, that is very important. That is very important. And it's actually sort of an underrated aspect of how we decide what to build. Like, a lot of the things that we build,

then we try them out, we do an experiment, and then we actually throw them out because they're not fun. And so a big part of being fun is being fast a lot of the time. Fast is fun.

Yeah, fast. Yeah, yeah. That should be a t-shirt.

I think one of the things that draws a lot of people to building stuff on computers is this insane iteration speed, where, you know, in other disciplines you might be sort of gated by resources, or by the ability, even the ability, you know, to get a large group together. And coding is this amazing thing where it's you and the computer, and with that alone, you can build really cool

stuff really quickly. So, as we'll talk about, Cursor is this super cool new editor that's a fork of VS Code. It would be interesting to get your explanation of your own journey of editors. How did you... I think all of you were big fans of VS Code with Copilot. How did you arrive at VS Code, and how did that lead to your journey with Cursor?

Yeah, um, so I think a lot of us, well, all of us were originally Vim users. Yeah, no Neovim, just pure Vim in a terminal. And at least for myself, it was around the time that Copilot came out, so 2021, that I really wanted to try it. So I went into VS Code, the only platform, the only code editor in which it was available, and even though I, you know, really enjoyed using Vim, just the experience of Copilot with VS Code was more than good enough to convince me to switch. And so that kind of was the default until we started working on Cursor.

And maybe we should explain what Copilot does. It's a really nice autocomplete. As you start writing a thing, it suggests one or two or three lines of how to complete the thing. And there is a fun experience in that, you know, like when you have a close friendship and your friend completes your sentences. When it's done well, there's an intimate feeling. There's probably a better word than intimate, but there's a cool feeling of like, holy shit, it gets me.

And then there's an unpleasant

feeling when it doesn't get you. And so there's that kind of friction. But I would say for a lot of people, the feeling that it gets me overpowers the feeling that it doesn't.

And I think actually one of the underrated aspects of GitHub Copilot is that even when it's wrong, it's a little bit annoying, but it's not that bad, because you just type another character and then maybe it gets you, or you type another character and then it gets you.

So even when it's wrong, it's not that bad. Yeah, you can iterate and fix it. I mean, the other underrated part of Copilot for me was that it was the first real AI product, the first language model consumer product.

So Copilot was kind of the first killer app for LLMs. Yeah,

like the beta was out in twenty

twenty-one, right? Okay. So what's the origin story of Cursor?

So around 2020, the scaling laws papers came out from OpenAI, and that was a moment where it looked like clear, predictable progress for the field, where even if we didn't have any more ideas, it looked like you could make these models a lot better if you had more compute and more data.

Oh, by the way, we'll probably talk, uh, for three to four hours on the topic of scaling laws. But just to summarize: it's a paper, and a set of papers and a set of ideas, that say bigger might be better for model size and data size in the realm of machine learning.

It's bigger and better, but predictably better.

Okay, that's another topic of conversation.

So around that time, for some of us, there were a lot of conceptual conversations about what's this going to look like, what's the story going to be for all these different knowledge-worker fields, about how they're going to be made better by this technology getting better.

And then I think there were a couple of moments where the theoretical gains predicted in that paper started to feel really concrete, and it started to feel like a moment where you could actually go and not do a PhD if you wanted to work on, to do useful work in, AI. It actually felt like now there was this whole set of systems one could build that were really useful.

And I think the first moment we already talked about a little bit, which was playing with the early beta of Copilot. That was awesome and magical. Um, I think the next big moment where everything kind of clicked together was actually getting early access to GPT-4. So the end of 2022 was when we were tinkering with that model, and the step-up in capabilities felt enormous. And previous to that, we had been working on a couple of different projects.

We had been, um, because of Copilot, because of scaling laws, because of our prior interest in the technology, we had been tinkering around with tools for programmers, but things that are very specific. So, you know, we were building tools for financial professionals who have to work within a Jupyter notebook, or playing around with, can you do static analysis with these models? And then the step-up in GPT-4 felt like, look, that really made concrete the theoretical gains that we had predicted before. It felt like you could build a lot more just immediately at that point in time. And also, if we were being

consistent, it really

felt like this wasn't just going to be a point-solution thing. This was going to be all of programming; it was going to flow through these models, and it felt like that demanded a different type of programming environment, a different type of programming. And so we set off to build that sort of larger vision around then.

There's one moment that I distinctly remember. So my roommate is an IMO gold winner, and there's a competition in the US called the Putnam, which is sort of the IMO for college people, and it's this math competition; he's exceptionally good. So Shengtong and Aman, I remember, it was sort of June of 2022, had this bet on whether, like, by 2024 June or July, you were going to win a gold medal in the IMO

with, like, models. IMO, the International Math

Olympiad. And so Arvid and I, you know, also competed in it, so it was sort of personal. And I remember thinking, man, this is not going to happen. Even though I sort of believed in progress, I thought, you know, IMO gold, Aman was just being delusional. And, to be honest, I mean, I was, to be clear, very wrong. But that was maybe the most prescient

bet in the group. So with the new results from DeepMind, it turned out that

you were correct, technically.

One point away. Aman was very enthusiastic about this stuff back then. Yeah, and before, Aman had this scaling laws t-shirt that he would walk around with, with, like, the charts and, like, the formulas on it.

So you, like, felt the AGI, or you felt the scaling.

Yeah. I distinctly remember there's this one conversation I had with Michael, where before I hadn't thought super deeply and critically about scaling laws.

And he kind of posed the question, why isn't scaling all you need? Or why isn't scaling going to result in massive gains in progress? And I think I went through, like, the stages of grief: there is anger, denial, and then finally, at the end, just thinking about it, acceptance. Um, and I think I've been quite hopeful and optimistic about progress since. I think one thing I'll caveat is I think it also depends on which domains you're going to see progress in.

Like, math is a great domain, especially formal theorem proving, because you get this fantastic signal of actually verifying if the thing was correct. And so this means something like RL can work really, really well. And I think you could have systems that are perhaps very superhuman at math and still not technically have AGI.

Okay. So can we take it all the way to Cursor? And what is Cursor? It's a fork of VS Code.

And VS Code is one of the most popular editors. For a long time, like, everybody fell in love with it. Everybody left Vim; I left Emacs for it, sorry. So it unified, in some fundamental way, the developer community.

And then you look at the space of things, you look at the scaling laws, AI is becoming amazing, and you decided, okay, it's not enough to just write an extension for VS Code, because there's a lot of limitations to that. If AI is going to keep getting better and better and better, we need to really rethink how AI is going to be part of the editing process.

And so you decided to fork VS Code and start to build a lot of the amazing features we'll be able to talk about. What was that decision like? Because there's a lot of extensions, including Copilot, for VS Code that are doing sort of AI stuff. What was the decision like to just fork VS Code?

So the decision to do an editor seemed kind of self-evident to us, for at least what we wanted to do and achieve. Because when we started working on the editor, the idea was: these models are going to get much better, their capabilities are going to improve, and it's going to entirely change

how you build software, both in that you will have big productivity gains but also radically in how the act of building software is going to change a lot. And so you're very limited in the control you have over a code editor if you're a plugin to an existing coding environment. Um, and we didn't want to get locked in by those limitations. We wanted to be able to just build the most useful stuff.

Okay. Well, then the natural question is, you know, VS Code with Copilot is kind of a competitor, so how do you win? Is it basically just the speed and the quality of the features?

Yeah, I mean, I think this is a space that is quite interesting, perhaps quite unique, where if you look at previous tech waves, maybe there was kind of one major thing that happened and it unlocked a new wave of companies. But every single year, every single model capability jump, you get a jump in model capabilities, you now unlock this new wave of features, things that are possible, especially in programming.

And so I think in AI programming, being even just a few months ahead, let alone a year ahead, makes your product much, much, much more useful. I think the Cursor of a year from now will need to make the Cursor of today look obsolete. And I think, you know, Microsoft has done a number of fantastic things, but I don't think they're in a great place to really keep innovating and pushing on this in the way that a startup can.

Just rapidly implementing features and

pushing, yeah, like, and kind of doing the research experimentation necessary to really push the ceiling. I don't know if I

think of it in terms of features as much as I think about it in terms of capabilities for programmers. It's that, like, you know, as the new o1 model came out, and I'm sure there are going to be more models of different types, like longer context and maybe faster,

there's all these crazy ideas that you can try, and hopefully ten percent of the crazy ideas will make it into something kind of cool and useful, and we want people to have that sooner. To rephrase: an underrated fact is we're making it for ourselves. When we started Cursor, you really felt this frustration that, you know, you could see models getting better, uh, but the Copilot experience had not changed.

It was like, man, these guys, the ceiling is getting higher, why are they not making new things? They should be making new things. Like, where's all the alpha features? There

were no alpha features. It was like, uh, I'm sure it was selling well, I'm sure it was a great business, but it didn't feel... I'm one of these people that really wants to try and use new things, and there was just no new thing for a very long

while. Yeah, it's interesting. I don't know how to put that into words, but when you compare Cursor with Copilot, Copilot pretty quickly started to feel stale for some

reason. Yeah, I think one thing that helps us is that we're sort of doing it all in one, where we're developing the UX and the way you interact with the model at the same time as we're developing how we actually make the model give better answers. So, like, how you build up the prompt, or how you find the context, or, for Cursor Tab, how you train the model. Um, so I think that helps us to have all of it, sort of the same people working on the entire experience end to end. Yeah, it's

like the person making the UI and the person training the model sit, like, eighteen feet away. Often it's even the same person. Yeah, often even the same person. So you can create things that are sort of not possible if you're not talking,

if you're not experimenting.

And you're using Cursor to write Cursor?

Of course. Yeah. Well, let's talk about some of these features. Let's talk about the all-knowing, the all-powerful,

praise be to the Tab, you know, autocomplete on steroids, basically. So how does Tab work? What is Tab?

To highlight and summarize at a high level, I'd say that there are two things Cursor is pretty good at right now. There are other things that it does, but two things that it helps programmers with. One is this idea of looking over your shoulder and being like a really fast colleague who can kind of jump ahead of you and type and figure out what you're going to do next. And that was the original idea behind... that was kind of the kernel of the idea behind a good autocomplete: predicting what you're going to do next. But you can make that concept even more ambitious by not just predicting the characters after your cursor, but actually predicting the next entire change you're going to make, the next diff, the next place you're going to jump to. Um, and the second thing Cursor is pretty good at right now is helping you sometimes jump ahead of the AI and tell it what to do, and go from instructions to code. And on both of those, we've done a lot of work on making the editing experience for those things ergonomic, um, and also making those things smart and fast.

One of the things we really wanted was we wanted the model to be able to edit code for us. Uh, that was kind of a wish, and we had multiple attempts at it before we had a sort of good model that could edit code for you.

Then after we had a good model, I think there have been a lot of efforts to, you know, make the inference fast, for, you know, having a good experience. And, uh, we've been starting to incorporate, I mean, Michael sort of mentioned this, the ability to jump to different places. And that jump to different places I think came from a feeling of, you know, once you accept an edit, um, it's like, man, it should be just really obvious where to go next. It's like, I'd made this change;

the model should just know that the next place to go to is, like, eighteen lines down. And, like, if you're a Vim user, you could press 18JJ or whatever, but, like, why? Why am I doing this? The model should just know it. And so the idea was: you just press Tab, it would go eighteen lines down, and then it would show you the next edit, and you would press Tab. So it's just you, as long as you could keep pressing Tab.

And so the internal competition was: how many Tabs can we make someone press? Once you have, like, the idea, uh, more, uh, sort of abstractly, the thing to think about is: how are the edits sort of zero-entropy? So once you've expressed your intent, there are no new bits of information to finish your thought, but you still have to type some characters to, like, make the computer understand what you're actually thinking. Then maybe the model should just sort of read your mind, and all the zero-entropy bits should just be, like, tabbed away. Yeah, there's this interesting

thing where, if you look at language model loss on different domains, um, I believe the bits per byte, which is a kind of character-normalized loss, for code is lower than for language, which means, in general, there are a lot of tokens in code that are super predictable, a lot of characters that are super predictable.

And this is, I think, even magnified when you're not just trying to autocomplete code, but predicting what the user is going to do next in their editing of existing code. And so, you know, the goal of Cursor Tab is: let's eliminate all the low-entropy actions you take inside of the editor. When the intent is effectively determined, let's just jump you forward in time, skip you forward.
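For reference, "bits per byte" is just the model's total negative log-likelihood, expressed in bits, divided by the number of bytes of text, which makes the number comparable across tokenizers. A minimal sketch of that calculation, assuming you already have per-token log probabilities from some model:

```typescript
// Minimal sketch: byte-normalized loss ("bits per byte").
// Assumes `tokenLogProbs` are natural-log probabilities the model assigned
// to each token of `text`; where they come from is up to you.
function bitsPerByte(text: string, tokenLogProbs: number[]): number {
  // Total negative log-likelihood in bits (convert nats -> bits).
  const totalBits = tokenLogProbs.reduce((sum, lp) => sum - lp / Math.log(2), 0);
  // Normalize by the UTF-8 byte length so different tokenizers are comparable.
  const numBytes = new TextEncoder().encode(text).length;
  return totalBits / numBytes;
}

// Lower bits per byte means the text is more predictable to the model;
// the claim above is that code scores lower than natural language.
```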

Well, what's the intuition, and what are the technical details of how to do next-cursor prediction, that jump? That's not so intuitive, I think, to people.

Yeah, I think I can speak to a few of the details on how to make these things work. They're incredibly low-latency, so you need to train small models on this task. In particular, they're incredibly prefill-token-hungry.

What that means is they have these really, really long prompts, where they see a lot of your code, and they're not actually generating that many tokens. And so the perfect fit for that is using a sparse model, meaning an MoE model. Um, so that was kind of one breakthrough,

one breakthrough we made that substantially improved its performance at longer context. The other being, um, a variant of speculative decoding that we kind of built out, called speculative edits. Um, these are two, I think, important pieces of what makes it quite high quality and very fast.

Okay, so MoE, mixture of experts: the input is huge, the output is small. Yeah. Okay, so what else can you say about how to make it fast... does caching

play a role? Caching plays a huge role. Um, because you're dealing with this many input tokens, if on every single keystroke that you're typing in a given line you had to rerun the model on all of those tokens passed in, you're just going to, one, significantly degrade latency, and two, you're going to kill your GPUs with load. So you need to design the actual prompts used for the model such that they're caching-aware. And then, yeah, you need to reuse the KV cache across requests so that you're spending less work, less compute.
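To make the caching point concrete: a KV cache is effectively a prefix cache, so only a shared prompt prefix can be reused, and anything that changes on every keystroke should sit at the end of the prompt. A rough sketch of that idea follows; all names are hypothetical, and this is not Cursor's actual code, just the shape of the idea.

```typescript
// Rough sketch of a caching-aware prompt layout.
interface TabPromptParts {
  stableContext: string;   // retrieved files, imports, etc. - changes rarely
  fileAboveCursor: string; // the file contents up to the current line
  currentLine: string;     // changes on every keystroke
}

// Put the parts that change least often first, so the serving stack can reuse
// the KV cache computed for that shared prefix across consecutive keystrokes.
function buildPrompt(p: TabPromptParts): string {
  return [p.stableContext, p.fileAboveCursor, p.currentLine].join("\n");
}

// A trivial prefix-cache stand-in: a server would key cached KV state by the
// prompt prefix and only run the model on the uncached suffix.
function cachedPrefixLength(prev: string, next: string): number {
  let i = 0;
  while (i < prev.length && i < next.length && prev[i] === next[i]) i++;
  return i; // characters before this point don't need to be recomputed
}
```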

What are the things that Tab is supposed to be able to do, kind of in the near term, just to sort of linger on that? Generate code, like fill empty space, also edit code across multiple lines, yeah, and then jump to different locations inside the same file, yeah, and then, like,

hopefully jump to different files also. So if you make an edit in one file, and maybe you have to go to another file to finish your thought, it should go to the second file

also. And then the full generalization is, like, next-action prediction. Sometimes you need to run a command in the terminal, and it should be able to suggest the command based on the code that you wrote, too. Um, or sometimes it actually suggests something, but it's hard for you to know if it's correct, because you actually need some more information to learn, like you need to know the type to be able to verify that it's correct. And so maybe it should actually take you to a place that's, like, the definition of something and then take you back, so that you have all the requisite knowledge to be able to accept the next completion.

So, also providing the human the knowledge.

Yes, right.

Yeah. Can you integrate, like... I just got to know a guy, the Primeagen, who, I believe, has... you can order coffee via SSH.

Oh yeah, we did that. We did that.

So can the model do that? Like, feed you, and, yeah, provide you with caffeine. Okay, so that's the general

framework. Yes. And the magic moment would be if it just works.

Programming

is this weird discipline where sometimes the next five minutes, not always, but sometimes the next five minutes of what you're going to do is actually predictable from the stuff you've done recently. And so can you get to a world where that next five minutes either happens by you disengaging and it taking you through, or maybe a little bit more: you just seeing the next step of what it's going to do, and you're like, okay, that's good,

that's good, that's good, and you can just tap, tap, tap through these big changes. As we're

talking about this, as you've mentioned, one of the really cool and noticeable things about Cursor is that there's this whole diff interface situation going on. So, like, the model suggests, with the red and the green, here's how we're going to modify the code, and in the chat window you can apply it, and it shows you the diff and you can accept the diff. So maybe can you speak to whatever direction

that's going in? We'll probably have, like, four or five different kinds of diffs. We have optimized the diff for the autocomplete,

so that has a different diff interface than, uh, when you're reviewing larger blocks of code. And then we're trying to optimize, uh, another diff thing for when you're doing multiple different files. Uh, and sort of at a high level, the difference is, for when you're doing autocomplete, it should be really, really fast to read. Uh, actually it should be really fast to read in all situations, but in autocomplete, sort of, you're really, like, your eyes are focused on one area. You can't be in too many... humans can't look in too many

different places, so it works on the side.

So we currently have this box on the side. So we have the current box, and if it tries to delete code in some place and tries to add other code, it tries to show you a box on the

side. You can maybe show it if we pull it up. Um, this is what we're talking about.

Yeah. So that... it was, like, three or four different attempts at trying to make this thing work, where first the attempt was, like, these blue crossed-out lines.

So before it was a box on the side, it used to show you the code to delete by showing you, like, Google Docs style, you would see, like, a line through it, then you would see the new code. That was super distracting. Then we tried many different... you know, there were deletions, there was trying red-highlighting. Then the next, uh, iteration of it, which is sort of funny:

you would hold, on Mac, the Option button. So it would sort of highlight a region of code to show you that there might be something coming. Uh, so maybe in this example, like, the input and the value would all get blue.

And the blue was to highlight that the AI had a suggestion for you. Uh, so instead of directly showing you the thing, it would just hint that the AI had a suggestion, and if you really wanted to see it, you would hold the Option button, and then you would see the new suggestion. And if you released the Option button, then you would see your original code.

So that's, by the way, that's pretty nice, but you have to know

to hold the Option button. Yeah,

I, by the way, am not a Mac user, but I got it. It's a button, I guess, you people have.

It's, again, it's just non-intuitive. I think that's the key thing.

And there's a chance this is also not

the final version of it. I am personally very excited for, um, making a lot of improvements in this area. Like, we often talk about it as the verification problem, where these diffs are great for small edits; for large edits, or when it's multiple files or something, it's, um, actually a little bit prohibitive to review these diffs. And, uh, so there are, like, a couple of different ideas here. Like, one idea that we have is, okay, you know, parts of the diff are important, they have a lot of information, and then parts of the diff, um, are just very low-entropy, they're, like, the same thing over and over again. And then maybe you can highlight the important pieces and gray out the not-so-important pieces. Or maybe you can have a model that looks at the diff and sees, oh, there's a likely bug here, I will, like, mark this with a little red squiggly and say, like, you should probably review this part of the diff. Um, and ideas in that vein I think

are exciting. Yeah, that's a really fascinating space of, like, UX design engineering. So you're basically trying to guide the human programmer through all the things they need to read and nothing more. Yeah, that's the optimal.

Yeah, and you want an intelligent model to do it. Like, currently, diff algorithms are, they're, like, just normal algorithms. There is no intelligence. There's, like, intelligence that went into designing the algorithm, but then there's no... like, you don't care if it's about this thing or that thing, uh, and so you want a model to do this.

So I think the general question is, like, man, these models are going to get much smarter. As the models get much smarter, the changes they will be able to propose are much bigger. So as the changes get bigger and bigger and bigger, the humans have to do more and more and more verification work. It gets harder and harder. Like, you need to help them out. I don't want to spend all my time

reviewing code. Can you say a little more about the diff across multiple files?

Yeah. I mean, so GitHub tries to solve this, right, with code review. When you're doing code review, you're reviewing multiple diffs across multiple files.

But, like Arvid said earlier, I think you can do much better than code review. Code review kind of sucks. Like, you spend a lot of time trying to grok this code that's often quite unfamiliar to you, and it often, like, doesn't even actually catch that many bugs.

And I think you can significantly improve that review experience using language models, for example, using the kinds of tricks that Arvid had described, of maybe pointing you towards the regions that actually matter. Um, I think also, if the code is produced by these language models and it's not produced by someone else... like, the code-review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is the language model, you don't have to care that much about their experience.

You can design the entire thing around the reviewer, such that the reviewer's job is as fun, as easy, as productive as possible. Um, and I think that feels like the issue with just kind of naively trying to make these things look like code review. I think you can be a lot more creative and push the boundary of what's possible.

Just one idea there is, I think ordering matters. Generally, when you review a PR, you have this list of files and you're reviewing them from top to bottom. But actually, you actually want to understand this part first, because it came, like, logically first, and then you want to understand the next part. And you don't want to have to figure that out yourself; you want a model to guide you through the thing.

And is the step of creation going to be more and more natural language, is that the goal, versus actual code?

I think sometimes... I don't think it's going to be the case that all of programming will be natural language. And the reason for that is, you know, if I'm pair programming with Sualeh, and Sualeh is at the computer and the keyboard, and sometimes, if I'm, like, driving, I want to say to Sualeh, hey, like, implement this function, and that works. And then sometimes it's just so annoying to explain to Sualeh what I want him to do, and so I actually take over the keyboard and I show him. I write, like, part of the example, and then it makes sense, and that's the easiest way to communicate. And so I think that's also the case for AI. Like, sometimes the easiest way to communicate with the AI will be to show an example, and then it goes and does the thing everywhere else. Or sometimes, if you're making a website, for example, the easiest way to show the AI what you want is not to tell it what to do but, you know, drag things around or draw things. Um, and yeah, and, like, maybe eventually we will get to, like, brain-machine interfaces or whatever, and it can, like, understand what you're thinking. And so I think natural language will have a place. I think it will definitely not be the way most people program most of the time.

I'm really feeling the AGI with this editor.

It feels like

there's a lot of machine learning going on underneath. Tell me about some of the ML stuff that makes it

all work. Cursor really works via this ensemble of custom models that we've trained alongside, you know, the frontier models that are fantastic at the reasoning-intense things. And so Cursor Tab, for example, is a great example of where you can specialize this model to be even better than frontier models,

if you look at evals on the task we set it at. The other domain, where it's kind of surprising that it requires custom models, but it's kind of necessary and works quite well, is in apply. Um, so I think these models, like the frontier models, are quite good

at sketching out plans for code and generating, like, rough sketches of the change, but actually creating diffs is quite hard, um, for frontier models, for your training models. Um, like, if you try to do this with Sonnet, with o1, any frontier model, it really messes up stupid things like counting line numbers, especially in super, super large files. Um, and so what we've done to alleviate this is we let the model kind of sketch out this rough code block that indicates what the change will be, and we train a model to then apply that change to the file.

And we should say that apply is: the model looks at your code, it gives you a really damn good suggestion of what new things to do, and the seemingly-trivial-for-a-human step of combining the two, you're saying, is not so trivial.

Contrary to popular perception, it is not a deterministic algorithm.

Yeah, I think, like, you see shallow copies of apply, um, elsewhere, and it just breaks, like, most of the time, because you think you can kind of try to do some deterministic matching, and then it fails, you know, at least forty percent of the time, and that just results in a terrible product experience.
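A minimal sketch of that sketch-then-apply idea, purely illustrative: the function names, prompts, and the two model choices below are assumptions, not Cursor's actual pipeline.

```typescript
// Illustrative two-stage edit pipeline: a large "planner" model produces a
// rough code sketch, and a smaller, cheaper "apply" model rewrites the file
// to incorporate it. Names and model choices are hypothetical.
async function editFile(
  file: string,
  instruction: string,
  callModel: (model: string, prompt: string) => Promise<string>,
): Promise<string> {
  // 1. Frontier model: sketch the change without worrying about exact
  //    line numbers or the untouched parts of the file.
  const sketch = await callModel(
    "frontier-model",
    `Sketch the code change for this instruction:\n${instruction}\n\nFile:\n${file}`,
  );

  // 2. Small apply model: merge the sketch into the full file, which is an
  //    easier task than producing the final edit from scratch.
  const newFile = await callModel(
    "apply-model",
    `Rewrite the file, applying this sketched change.\n\nSketch:\n${sketch}\n\nFile:\n${file}`,
  );
  return newFile;
}
```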

Um, I think in general, this regime of you're going to get smarter and smarter models... so one other thing that apply lets you do is it lets you use fewer tokens with the most intelligent models. Uh, this is both expensive in terms of latency, for generating all these tokens, um, and cost. So you can give this very, very rough sketch and then have your smaller models go and implement it, because it's a much easier task to implement this very, very sketched-out code. And I think this regime will continue, where you can use smarter and smarter models to do the planning, and then maybe the implementation details can be handled by the less intelligent ones. Perhaps you'll have, you know, maybe o1, maybe even more capable models, given an even higher-level plan that is kind of recursively applied by Sonnet and then the apply model. Maybe we

should talk about how to make it fast. Yeah, fast is always interesting.

Yeah. How do you make it fast?

Yeah. So one big component of making it fast is speculative edits. So speculative edits are a variant of speculative decoding, and maybe it'd be helpful to briefly describe speculative decoding. Um, with speculative decoding, what you do is you can kind of take advantage of the fact that, you know, most of the time, and I'll add the caveat that it would be when you're memory-bound in language model generation, if you process multiple tokens at once, um, it is faster than generating one token at a time. So this is, like, the same reason why, if you look at tokens per second for prompt tokens versus generated tokens, it's much, much faster for prompt tokens.

Um, so what we do is, instead of using what speculative decoding normally does, which is using a really small model to predict these draft tokens that your larger model will then go in and verify, with code edits we have a very strong prior on what the existing code will look like, and that prior is literally the same exact code. So what you can do is just feed chunks of the original code back into the model, um, and the model will just pretty much agree most of the time that, okay, I'm just going to spit this code back out, and you can process all of those lines in parallel. And you just do this with sufficiently many chunks, and eventually you'll reach a point of disagreement, where the model will now predict text that is different from the ground-truth original code. It'll generate those tokens, and then we kind of will decide, after enough tokens match the original code, to restart speculating on chunks of code. What this actually ends up looking like is just a much faster version of the model rewriting all the code. So we can use the same exact interface that we use for diffs, but it will just stream down a lot faster.
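A rough sketch of that control loop, where a hypothetical Verifier callback stands in for one batched forward pass that checks draft tokens in parallel; this is a simplification of speculative decoding on code, not Cursor's implementation.

```typescript
// Simplified speculative-edits loop. `verify` scores the draft tokens in
// parallel and reports how many it agrees with, plus the token it would
// generate at the first disagreement. All names are illustrative.
interface VerifyResult { accepted: number; nextToken: string | null; }
type Verifier = (context: string[], draft: string[]) => Promise<VerifyResult>;

async function speculativeEdit(
  originalTokens: string[],
  verify: Verifier,
  chunkSize = 16,
): Promise<string[]> {
  const output: string[] = [];
  let i = 0; // position in the original code we are speculating from
  while (i < originalTokens.length) {
    // Use the original code itself as the draft: a very strong prior,
    // since most of the file is usually left unchanged by an edit.
    const draft = originalTokens.slice(i, i + chunkSize);
    const { accepted, nextToken } = await verify(output, draft);
    output.push(...draft.slice(0, accepted)); // accepted tokens, in parallel
    if (accepted < draft.length && nextToken !== null) {
      output.push(nextToken); // the model diverged: take its token instead
      // The real scheme keeps generating until the output re-aligns with the
      // original code, then resumes speculating; that part is elided here.
    }
    i += Math.max(accepted, 1);
  }
  return output;
}
```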

And an advantage is that while it's streaming, you can just start reviewing the code before it's done, so there's no big loading screen. Um, so maybe that is part of the advantage.

So the human can start reading before the thing is done.

I think the interesting riff here is something like, speculation is a fairly common idea nowadays. It's, like, not only in language models. There's obviously speculation in CPUs, there's speculation for databases, there's speculation all over the place.

Let me ask the ridiculous question of which LLM is better at coding: GPT, Claude, who wins in the context of programming? And I'm sure the answer is much more nuanced, because it sounds like every single part of this involves a different model.

Yeah, I think there's no model that Pareto-dominates others, meaning it is better in all categories that we think matter, the categories being speed, um, ability to edit code, ability to process lots of code, long context, a couple of other things, and kind of coding capabilities.

The one that I'd say right now is just kind of the net best is Sonnet. I think this is a consensus opinion. o1 is really interesting, and it's really good at reasoning.

So if you give it really hard, uh, programming-interview-style problems, or LeetCode problems, it can do quite well on them. Um, but it doesn't feel like it kind of understands your rough intent as well as Sonnet does. Like, if you look at a lot of the other frontier models, um, one qualm I have is it feels like they're not necessarily over... I'm not saying they train on benchmarks, but they perform really well on benchmarks relative to kind of everything that's kind of in the middle. So if you try them on all these benchmarks and things that are in the distribution of the benchmarks they're evaluated on, you know, they will do really well. But when you push them a little bit outside of that, Sonnet is, I think, the one that kind of does best at kind of maintaining that same capability; you kind of have the same capability in the benchmark as when you try to instruct it to do anything with coding.

Another ridiculous question is: the difference between the normal programming experience versus what benchmarks represent. Like, where do benchmarks fall short, do you think, when we're evaluating these models?

By the way, that's a really, really hard, like, critically important detail: how different benchmarks are versus, like, real coding. Real coding, it's not interview-style coding. You're doing this... you know, humans are saying, like, half-broken English sometimes, and sometimes you're saying, like, do what I did before.

Sometimes you're saying, uh, you know, go add this thing and then do this other thing for me and then make this UI element, and, you know, it's just like a lot of things are sort of context-dependent. You really want to, like, understand the human and then do what the human wants, as opposed to... maybe the way to put it is, sort of abstractly, the interview problems are very well specified; they lean a lot on a specification, while the human stuff is less specified.

Yeah, I think this benchmarks question is both complicated by what Sualeh just mentioned, and then also, to what Aman was getting into, even if you, like, you know... there's this problem of, like, the skew between what you can actually model in a benchmark versus, uh, real programming. And that can be sometimes hard to encapsulate, because it's like, real programming is very messy, and sometimes things aren't super well specified,

what's correct or what isn't. But then, um, it's also doubly hard because of this public benchmark problem. And that's both because public benchmarks are sometimes kind of hill-climbed on,

but then it's also really, really hard to get the data from the public benchmarks out of the models. And so, for instance, like, one of the most popular agent benchmarks, SWE-bench, um, is really, really contaminated in the training data of, uh, these foundation models. And so if you ask these foundation models to do a SWE-bench problem, but you actually don't give them the context of a codebase,

they can, like, hallucinate the right file paths, they can hallucinate the right function names. Um, and so it's also just the public aspect of these things that is tricky. Yeah,

like, in that case, it could be trained on the literal issues or pull requests themselves. And maybe the labs will start to do a better job, um, or they've already done a good job, at decontaminating those things, but they're not going to omit the actual training data of the repository itself. Like, these are all, like, some of the most popular Python repositories; SymPy is one example. I don't think they're going to handicap their models on SymPy and all these popular repositories in order to get true evaluation scores in these benchmarks.

And I think that, given the dearth in benchmarks, um, there have been, like, a few interesting crutches that places that build systems with these models, or build these models, actually use to get a sense of are they going in the right direction or not. And in a lot of places, uh, people will actually just have humans play with the things and give qualitative feedback on these. Um, like, one or two of the foundation model companies,

they have people where that's a big part of their role. And, you know, internally, we also qualitatively assess these models, and actually lean on that a lot, in addition to, like, private evals that we have.

It's like the vibe.

The vibe, the vibe benchmark, human benchmark. Yeah, you pull in the humans to do a vibe check. Yeah, okay. I mean, that's kind of what I do, like, just reading online forums and Reddit and X. Just, like, I don't know how to properly load in people's opinions, because they'll say things like, I feel like GPT's gotten dumber or something. They'll say, I feel like... and then I sometimes feel like that too, but I wonder if it's the model's problem or mine. Yeah, with Claude,

there's an interesting take I heard, where I think AWS has different chips, um, and I suspect they have slightly different numerics than Nvidia GPUs, and someone speculated that Claude's degraded performance had to do with maybe using the quantized version that existed on AWS Bedrock versus whatever was running on Anthropic's GPUs.

I interview a bunch of people who have conspiracy theories, so I'm glad you spoke to this one.

Well, it's not like a conspiracy theory as much as... they're, like, humans, humans are humans, and there's these details, and, you know, you're doing this crazy amount of flops, and chips are messy, and, man, you can just have bugs. Like, bugs are... it's hard to overstate how hard bugs are to avoid.

What's the role of a good prompt in all of this? We mentioned that benchmarks have really structured, well-formulated prompts. What should a human be doing to maximize success? And what's the importance of what the human... you wrote a blog post on this, you called it prompt design.

Yeah, I think it depends on which model you're using, and all of them are slightly different and they respond differently to different prompts. But, um, I think the original GPT-4, uh, and the original sort of batch of models last year, they were quite sensitive to the prompts, and they also had a very small context window. And so we have all of these pieces of information around the codebase that would maybe be relevant in the prompt.

Like, you have the docs, you have the files that you add, you have the conversation history. And then there's a problem of how you decide what you actually put in the prompt when you have a limited space. And even for today's models, even when you have long context, filling out the entire context window means that it's slower, and it means that sometimes the model actually gets confused, and some models get more confused than others.

And we have this one system internally that we call Preempt, which helps us with that a little bit. Um, and I think it was built for the era before, where we had eight-thousand-token context windows. Uh, and it's a little bit similar to when you're making a website: you, sort of, you want it to work on mobile, you want it to work on a desktop screen, and you have this, uh, dynamic information, which you don't have, for example, if you're making,

like, designing a print magazine: you know exactly where you can put stuff. With a website or with a prompt, you have these inputs, and then you need to format them to always work, even if the input is really big; then you might have to cut something down. Uh, and so the idea was, okay, let's take some inspiration.

What's the best way to design websites? Well, um, the thing that we really like is React and the declarative approach, where you, um, use JSX in JavaScript, uh, and then you declare: this is what I want, and I think this has higher priority, or, like, this has higher z-index than something else. Um, and then you have this rendering engine; in web design it's, like, Chrome, uh, and in our case it's a prompt renderer, which then fits everything onto the page. And so you declaratively decide what you want, and then it figures out how to fit it for you. Um, and so we have found that to be, uh, quite helpful.

And I think the role of it has sort of shifted over time, um, where initially it was to fit to these small context windows. Now it's really useful because, you know, it helps us with splitting up the data that goes into the prompt and the actual rendering of it. And so, um, it's easier to debug, because you can change the rendering of the prompt and then try it on old prompts, because you have the raw data that went into the prompt, and then you can see: did my change actually improve it for, like, this

entire eval set. So do you literally prompt with JSX?

Yes, yes. So it kind of looks like React. There are components, like, we have one component that's a file component, and it takes in, like, the cursor. Like, usually there's one line where the cursor is in your file, and that's probably the most important line, because that's the one you're looking at. And so then you can give priorities, so, like, that line has the highest priority, and then you subtract one for every line that, uh, is farther away, and then eventually, when it's rendered, it figures out how many lines can actually fit, and it centers around that thing.

That's amazing. Yeah, and you can do, like, other fancy things, where if you have lots of code blocks from the entire codebase, you could use, uh, retrieval, um, and things like embedding and re-ranking scores to add priorities for each of these components.
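A toy version of that declarative, priority-based rendering idea; the component, the priority-decay rule, and the renderer below are illustrative, not the actual Preempt system.

```typescript
// Toy priority-based prompt renderer. Names are illustrative.
interface PromptLine { text: string; priority: number; }

// A "file component": the cursor line gets the highest priority, and priority
// decays by one for each line of distance, mirroring the description above.
function fileComponent(lines: string[], cursorLine: number, basePriority = 100): PromptLine[] {
  return lines.map((text, i) => ({
    text,
    priority: basePriority - Math.abs(i - cursorLine),
  }));
}

// The renderer greedily keeps the highest-priority lines that fit the token
// budget, then re-sorts the survivors back into file order. Retrieval or
// re-ranking scores could simply add to a component's priority.
function renderPrompt(lines: PromptLine[], budget: number, tokensPerLine = 10): string {
  const kept = [...lines]
    .map((l, idx) => ({ ...l, idx }))
    .sort((a, b) => b.priority - a.priority)
    .slice(0, Math.floor(budget / tokensPerLine))
    .sort((a, b) => a.idx - b.idx);
  return kept.map((l) => l.text).join("\n");
}
```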

So should humans, when they ask questions, also try to use something like that? Would it be beneficial to write JSX in the prompt, or is the whole idea that you should be loose and messy?

I think our goal is kind of that you should just do whatever is the most natural thing for you, yes, and then our job is to figure out how we actually retrieve the relevant things so that your

thing actually makes sense. Well, this is sort of the discussion I had with Aravind of Perplexity: his whole idea is, like, you should let the person be as lazy as they want. Yes, yes.

But, like, yeah, that's a beautiful thing. But I feel like you're allowed to ask more of programmers, right? So if you say "just do what you want" — I mean, humans are lazy. There's a kind of tension between just being lazy versus providing more — being prompted — almost like the system pressuring you, or inspiring you, to be articulate. Not in terms of the grammar of the sentences, but in terms of the depth of the thoughts that you convey inside the prompts.

I think even as the system gets closer to some level of perfection, often when you ask the model for something, not enough intent is conveyed to know what to do. And there are a few ways to resolve that intent. One is the simple thing of having the model just ask you: "I'm not sure how to do these parts based on your query. Could you clarify that?" I think the other could be, maybe if there are five or six possible generations, given the uncertainty present in your query so far, why don't we just actually show you all of those and let you pick them?

How hard is it for the model to choose to talk back, sort of, versus... It's hard. It's sort of, how does it deal with the uncertainty? Do I choose to ask for more information to reduce the ambiguity?

So, I mean, one of the things we do — it's a recent addition — is try to suggest files that you can add. And while you're typing, one can guess what the uncertainty is and maybe suggest that — maybe you're writing your API, and we can guess, using the edits that you've made previously in the same file, that the client and the server are super useful.

And there's a hard technical problem of how you resolve it across all commits — which files are the most important given your current prompt. And we're still sort of — the initial version is rolled out, and I'm sure we can make it much more accurate. It's very experimental.

But then the idea is, we show you — do you just want to add this file, this file, this file — and also tell the model to edit those files for you? Because maybe if you're making the API, you should also edit the client and the server that are using the API. And for resolving the ambiguity, we kind of have both: there's the phase where you're writing the prompt, before you even hit enter, where maybe we can help resolve some of...

...the uncertainty there. To what degree do you use agentic approaches? How useful are agents?

We think agents are really, really cool. An agent resembles sort of a human — you can kind of feel that you're getting closer to AGI, because it acts as a human would — and it's really, really cool. I think agents are not yet super useful for many things.

I think we're getting close to where they will actually be useful. And so I think there are certain types of tasks where having an agent would be really nice.

For example, I would love to have an agent for a bug we have, where you sometimes can't Command+C and Command+V inside our chat input box. That's a task that's super well specified. I just want to say, in two sentences, "this does not work, please fix it," and then I would love to have an agent that just goes off and does it, and then a day later I come back and I review the thing.

You mean it goes and finds the right file?

Yeah, it finds the right file, it tries to reproduce the bug, it fixes the bug, and then you verify that it's correct. And this could be a process that takes a long time. And so I think I would love to have that. And then I think with a lot of programming — there is often this belief that agents will take over all of programming. I don't think we think that's the case, because a lot of programming, a lot of the value, is in iterating. You don't actually want to specify something upfront, because you don't really know what you want until you've seen an initial version, and then you want to iterate on that, and then you provide more information. And so for a lot of programming, I think you actually want a system that's instant, that gives you an initial version instantly back, and then you can iterate super, super quickly.

What about something like what recently came out with Replit Agent, that also does things like setting up the development environment, installing software packages, configuring everything, configuring the databases, and actually deploying the app? Is that also in the set of things you dream about?

I think that would be really cool. For certain types of programming, it would be really...

...cool. Is that within the scope of Cursor?

Yeah, we aren't actively working on it right now, but it's definitely — we want to make the programmer's life easier and more fun, and some things are just really tedious, and you need to go through a bunch of steps, and you want to delegate that to an agent. And then some things, you can actually have an agent in the background while you're working.

Let's say you have a PR that's both backend and frontend, and you're working on the frontend, and then you can have a background agent that does some work and figures out kind of what you're doing. And then, when you get to the backend part of your PR, you have some initial piece of code that you can iterate on. And so that would also be really cool.

One of the things we already talked about is speed, but I wanted to linger on that some more — the various places and technical details involved in making this thing really fast. Every single aspect of Cursor — most aspects of Cursor — feel really fast. As I mentioned, the apply is probably the slowest thing. And for me, I'm sorry — the pain.

It's a pain. It's a pain that we're feeling, and we're working on fixing it.

Yeah, I mean, it says something that something that takes, I don't know, one second or two seconds feels slow. That actually shows that everything else is just really, really fast. So is there some technical detail about how to make some of these models, how to make the chat fast, how to make the diffs fast? Is there something that just jumps to mind?

Yeah, I mean, so we can go over a lot of the strategies that we use. One interesting thing is cache warming. And so what you can do is, as the user is typing, you're probably going to use some piece of context, and you can know that before the user's done typing.

So, you know, as we discussed before, reusing the KV cache results in lower latency, lower cost across requests. So as the user starts typing, you can immediately warm the cache with, let's say, the current file contents, and then when they press enter, there are very few tokens it actually has to prefill and compute before starting the generation. This will significantly lower the time to first token.
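A rough sketch of that cache-warming flow, assuming a hypothetical inference engine that exposes prefill and generate calls; the API names here are invented for illustration, not Cursor's actual infrastructure.

```python
# Hypothetical sketch of cache warming (illustration only; the engine API is invented).
import hashlib

class PrefixCache:
    def __init__(self, engine):
        self.engine = engine   # some inference engine exposing prefill() / generate()
        self.warm = {}         # prefix hash -> prefilled KV-cache handle

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def warm_up(self, prefix: str) -> None:
        # Called while the user is still typing: prefill the likely context
        # (e.g. the current file contents) so the KV cache is ready before enter.
        key = self._key(prefix)
        if key not in self.warm:
            self.warm[key] = self.engine.prefill(prefix)

    def complete(self, prefix: str, user_query: str) -> str:
        # On enter, only the newly typed query tokens still need a forward pass,
        # which is what cuts the time to first token.
        key = self._key(prefix)
        kv = self.warm.get(key) or self.engine.prefill(prefix)
        return self.engine.generate(kv_cache=kv, new_tokens=user_query)
```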

Can you explain how the KV cache works?

Yeah, so the way transformers work...

I like it.

One of the mechanisms that allows transformers to not just independently look at each token, but see previous tokens, are the keys and values in attention.

And generally, the way attention works is, you have at your current token some query, and then you have all the keys and values of all your previous tokens, which are some kind of representation that the model stores internally of all the previous tokens in the prompt. And by default, when you're doing a chat, the model has to, for every single token, do this forward pass through the entire model. That's a lot of matrix multiplies that happen, and that is really, really slow.

Instead, if you have already done that, and you store the keys and values and you keep that in the GPU — then, let's say I have already computed them for the last n tokens. If I now want to compute the output token, the n+1-th token, I don't need to pass those first n tokens through the entire model, because I already have all those keys and values. And so you just need to do the forward pass through that last token, and then when you're doing attention, you're reusing those keys and values that have been computed, which is the only kind of sequentially dependent part of the transformer.
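Here is a minimal, illustrative single-head version of that idea in NumPy — real models are multi-head, batched, masked, and run on GPUs, but the point is that each new token appends its key and value to the cache once and reuses everything already stored instead of recomputing it.

```python
# Minimal single-head attention with a KV cache (illustrative only).
import numpy as np

d = 64
W_q, W_k, W_v = (np.random.randn(d, d) * 0.02 for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """x_new: hidden state of the single newest token, shape (d,)."""
    q = x_new @ W_q
    k_cache.append(x_new @ W_k)   # this token's key/value is computed exactly once...
    v_cache.append(x_new @ W_v)
    K = np.stack(k_cache)         # ...and all previous keys/values are reused for free
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V            # attention output for the new token

# Each generated token costs one decode_step, not a pass over the whole prompt.
for _ in range(5):
    out = decode_step(np.random.randn(d))
```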

Is there a higher-level caching, like caching of the prompts, that kind of stuff that could help?

Yeah, there are other types of caching you can do. One interesting thing that you can do for Cursor Tab is you can basically predict ahead, as if the user would have accepted the suggestion, and then trigger another request. And so then you've done — it's a mix of speculation and caching, right? You're speculating what would happen if they accepted it, and then you have this value that is cached — this suggestion — and then when they press tab, the next one would be waiting for them immediately. It's a kind of clever heuristic-slash-trick that uses a higher-level cache, and it can make it feel fast despite there not actually being any changes in the model.

And if you can make the KV cache smaller, one of the advantages you get is that maybe you can speculate even more. Maybe you can predict the next ten things that could be useful — like, predict the next ten — and it's possible the user hits one of the ten. It's a much higher chance than the user hitting the exact one that you showed them. Maybe they type another character and hit something else in the cache.
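A toy sketch of that speculation-plus-caching trick; the model interface and the fan-out number are assumptions for illustration, not Cursor's implementation.

```python
# Hypothetical sketch of speculative caching for tab suggestions (illustration only).
# After showing a suggestion, pretend the user accepted it and precompute what the
# next suggestions would be, so an accept feels instant.
class SpeculativeTab:
    def __init__(self, model, fanout: int = 3):
        self.model = model    # model.suggest(buffer) -> suggested edit string
        self.fanout = fanout
        self.cache = {}       # buffer state -> list of precomputed suggestions

    def show_suggestion(self, buffer: str) -> str:
        suggestion = self.cache.get(buffer, [None])[0] or self.model.suggest(buffer)
        # Speculate: assume the user accepts, and warm the cache for the resulting buffer.
        accepted = buffer + suggestion
        self.cache[accepted] = [self.model.suggest(accepted) for _ in range(self.fanout)]
        return suggestion

    def on_accept(self, buffer: str, suggestion: str) -> str:
        # The next suggestion is (usually) already sitting in the cache.
        next_buffer = buffer + suggestion
        precomputed = self.cache.get(next_buffer)
        return precomputed[0] if precomputed else self.model.suggest(next_buffer)
```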

There are all these tricks where — the general phenomenon here is, and I think it's also super useful for RL — maybe a single sample from the model isn't very good, but if you predict ten different things, it turns out that one of the ten being right has a much higher probability. There are these pass@k curves, and part of what RL does is let you exploit this pass@k phenomenon: you make many different predictions, and one way to think about this is that the model sort of knows internally — has some uncertainty over — which of the k things is correct, or which of the k things the human wants. So when we RL our Cursor Tab model, one of the things we're doing is predicting which of the hundred different suggestions the model produces is more amenable to humans — which of them do humans like more than the others? Maybe there's something where the model can predict very far ahead versus just a little bit, or be somewhere in the middle, and then you can give a reward to the things that humans would like more and punish the things that they wouldn't like, and then train the model to output suggestions that humans would like more. You have these RL loops that are very useful to exploit these pass@k curves. Aman, maybe you can go into even more detail.

Yeah, it is a little different than speed, but it technically ties back in, because you can get away with a smaller model if you RL your smaller model and it gets the same performance as the bigger one. So that's — while I was mentioning stuff about KV, about reducing the size of the KV cache — there are other techniques there as well that really help with speed.

So, kind of back in the day, like all the way two years ago, people mainly used multi-head attention, and I think there's been a migration towards more efficient attention schemes like group query or multi-query attention. And this is really helpful, with larger batch sizes, for being able to generate the tokens much faster. The interesting thing here is, this has no effect on the time to first token, the prefill speed. The thing this matters for is generating tokens. And why is that? Because when you're generating tokens, instead of being bottlenecked by doing these super-parallelizable matrix multiplies across all your tokens, you're bottlenecked — for long context, with large batch sizes — by how quickly you can read those cached keys and values. So that's memory bandwidth, and how can we make this faster? We can try to compress the size of these keys and values. Multi-query attention is the most aggressive of these: normally, with multi-head attention, you have some number of attention heads and some number of query heads; multi-query just preserves the query heads and gets rid of all the key-value heads, so there's only one key-value head and all the remaining query heads. With group query, you instead preserve all the query heads, and then your keys and values have fewer heads — but you're not reducing it to just one. But anyway, the whole point here is you're just reducing the size of...

...the KV cache. And then there is MLA.

Yeah, multi-latent. That's a little more complicated, and the way that this works is it kind of turns the entirety of your keys and values across all your heads into this one latent vector that is then expanded at inference time.

But MLA is from this company called DeepSeek. It's quite an interesting algorithm. Maybe the key idea is, in both MQA and in other places, what you're doing is reducing the number of KV heads. The advantage you get from that is there are fewer of them, but maybe the theory is that you actually want a lot of different keys and values — you want each of the keys and values to actually be different. So one way to reduce the size is, you keep one big shared vector for all the keys and values, and then you have smaller vectors for every single token, so that you can store only the smaller thing, as some sort of low-rank reduction. And at the end, when you eventually want to compute the final thing — remember that you're memory bound, which means you still have some compute left that you can use — you can expand the latent vector back out. And this is far more efficient, because you're reducing, for example, by a factor of something like 32, the size of the vectors that you're keeping around.
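A back-of-the-envelope comparison of KV-cache sizes under those head-sharing schemes; every configuration number below is made up purely to show the scaling, and doesn't correspond to any particular model.

```python
# Illustrative KV-cache sizes for multi-head vs. group-query vs. multi-query attention.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per=2):
    # 2x for keys and values; bytes_per=2 assumes fp16/bf16 storage.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per

layers, heads, head_dim, seq, batch = 32, 32, 128, 16_384, 8

mha = kv_cache_bytes(layers, n_kv_heads=heads, head_dim=head_dim, seq_len=seq, batch=batch)
gqa = kv_cache_bytes(layers, n_kv_heads=8,     head_dim=head_dim, seq_len=seq, batch=batch)
mqa = kv_cache_bytes(layers, n_kv_heads=1,     head_dim=head_dim, seq_len=seq, batch=batch)

for name, size in [("multi-head (32 kv heads)", mha),
                   ("group-query (8 kv heads)", gqa),
                   ("multi-query (1 kv head)", mqa)]:
    print(f"{name:<26} {size / 2**30:5.1f} GiB")
# Fewer KV heads -> smaller cache -> longer prompts / bigger batches in the same memory,
# and less memory bandwidth per generated token. MLA goes further by storing a compressed
# latent per token and expanding it at inference time.
```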

Yes, there's perhaps some richness in having a separate set of keys and values and queries that kind of pairwise match up, versus compressing that all into one — and that interaction, at least.

Okay, and all of that is dealing with being memory bound. Ultimately, how does that map to the user experience? Trying to get the...

get yeah the two things that a max you is, you can now make your cash a lot, marsha, because you've was space allocated for the K V. Cash, united cash, a lot more gressly ly and a lot more things. You get more cash hits, which are helpful for reducing the time to first token for the reasons kind of describe earlier. And then the second being, when you start inference with more and more requests and larger and larger batch sizes, you don't see much for slow down in as is generating the tokens, the speed of that.

What it also allows you to do is make...

...your prompt bigger, for certain. Yes, yeah. So the size of your KV cache is the size of all your prompts multiplied by the number of prompts being processed in parallel, so you could increase either of those dimensions — the batch size or the size of your prompts — without degrading the latency of generating tokens.

You wrote a blog post, "Shadow Workspace: Iterating on Code in the Background." So what's going on there?

So, to be clear, we want there to be a lot of stuff happening in the background, and we're experimenting with a lot of things right now. We don't have much of that happening, other than the cache warming, or figuring out the right context that goes into your Command+K prompts, for example.

But the idea is, if you can actually spend computation in the background, then you can help the user at a slightly longer time horizon than just predicting the next few lines that they're gonna make — but actually, in the next ten minutes, what are they gonna make? And by doing it in the background, you can spend more computation doing that.

And so the idea of the shadow workspace that we implemented — and we use it internally for experiments — is that, to actually get an advantage of doing stuff in the background, you want some kind of feedback signal to give back to the model, because otherwise you can get higher performance by just letting the model think for longer — and o1 is a good example of that.

But another way you can improve performance is by letting the model iterate and get feedback. And so one very important piece of feedback, when you're a programmer, is the language server, which is this thing that exists for most different languages — and there's a separate language server per language — and it can tell you, "you're using the wrong type here," and give you an error, or it can allow you to go to definition, and it sort of understands the structure of your code. Language servers are extensions: there's a TypeScript language server developed by the TypeScript people, a Rust language server developed by the Rust people.

And then they all interface over the Language Server Protocol to VS Code, so that VS Code doesn't need to have all of the different languages built in; rather, you can use the existing compiler infrastructure — for linting purposes.

It's for linting. It's for going to definition, and for seeing the right types that you're using.

So it's doing type checking also?

Yes, type checking and going to references. And when you're working in a big project, you kind of need that. If you don't have that, it's really hard to code in a big project.

Can you say again how that's being used inside Cursor — the Language Server Protocol communication thing?

So it's being used in Cursor to show things to the programmer, just like in VS Code. But then the idea is, you want to show that same information to the models — the AI models — and you want to do that in a way that doesn't affect the user, i.e., in the background. And so the idea behind the shadow workspace was: one way we can do this is, we spawn a separate window of Cursor that's hidden. You can set this flag in Electron — it's hidden; there is a window, but you don't actually see it. And inside of this window, the AI agents can modify code however they want — as long as they don't save it, because it's still the same folder — and then they can get feedback from the linters and go to definition and iterate...

...on their code. So, like, literally run everything in the background, as if... right?

Yeah, maybe even run the code.

So that's the...

...eventual version. Okay, that's what you want. And a lot of the blog post is actually about how you make that happen, because it's a little bit tricky. You want it to be on the user's machine, so that it exactly mirrors the user's environment. And then, on Linux, you can do this cool thing where you can actually mirror the file system and have the AI make changes to the files, and it thinks that it's operating on the file level, but actually that's stored in memory, and you can create this kernel-like extension to make it work. On Mac and Windows it's a little bit more difficult, but it's a fun technical problem, so that's why.
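A hypothetical sketch of that background loop — apply edits to in-memory copies of the files, feed the unsaved buffers to the language server, and iterate on the diagnostics; the model and LSP client objects here are assumed interfaces for illustration, not the actual shadow workspace code.

```python
# Hypothetical sketch of a shadow-workspace iteration loop (illustration only).
# The AI edits in-memory copies of files and uses language-server diagnostics as the
# feedback signal, without anything ever being saved to disk.
def refine_in_shadow_workspace(model, lsp_client, files: dict[str, str], task: str,
                               max_iters: int = 5) -> dict[str, str]:
    workspace = dict(files)                         # in-memory copy; disk is never touched
    for _ in range(max_iters):
        edits = model.propose_edits(task, workspace)        # assumed model interface
        for path, new_text in edits.items():
            workspace[path] = new_text
            lsp_client.did_change(path, new_text)           # feed unsaved buffers to the LSP
        diagnostics = [d for path in edits for d in lsp_client.diagnostics(path)]
        if not diagnostics:                         # linter / type-checker is happy: done
            break
        task = f"{task}\nFix these diagnostics: {diagnostics}"
    return workspace
```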

One maybe hacky but interesting idea that I like is holding a lock on saving. So basically, you can have the language model hold the lock on saving to disk, and then, instead of operating on the ground-truth version of the files that are saved to disk, you actually operate on what was the shadow workspace before — these unsaved things that only exist in memory — and you still get linter errors for them, and you can code in them. And then when you try to maybe run code, there's a small warning that there's a lock, and then you kind of take back the lock from the language server if you're trying to do things concurrently, or from the shadow workspace if you're...

...trying to do things concurrently. That's such an exciting future, by the way. It's a bit of a tangent, but allowing a model to change files is scary for people — but it's really cool to be able to just let the agent do a set of tasks, and you come back the next day and kind of observe, like it's a colleague or something like that. Yeah, and I think there...

...are maybe different versions of runnability, where for the simple things — where you're doing things in the span of a few minutes on behalf of the user as they're programming — it makes sense to do something that works locally on their machine. But for the more aggressive things, where you're making larger changes that take longer periods of time, you'll probably want to do this in some sandboxed remote environment. And that's another incredibly tricky problem: how do you exactly reproduce, or mostly reproduce — to the point of it being effectively equivalent for running code — the user's environment, with this remote sandbox?

I'm curious what kind of agents you want for coding. Do you want them to find bugs? Do you want them to implement new features? What agents do you want?

So, by the way, when I think about agents, I don't think just about coding. I think, for this particular podcast, there's video editing, and a lot of — if you look at Adobe, a lot of the code behind it is very poorly documented, but you can interact with Premiere, for example, using code. And basically all of the uploading, everything I do on YouTube, everything...

As you could probably imagine, I do all of that through code, including translation and overdubbing, all of this. So I envision all of those kinds of tasks — automating many of the tasks that don't have to do directly with the editing. So, okay, that's what I was thinking about. In terms of coding, I would be fundamentally thinking about bug finding — many levels of bug finding, and also bug finding like logical bugs, not spelling bugs — like big directions of implementation, that kind of...

...stuff. And bug finding — yeah, I mean...

It's really interesting that these models are so bad at bug finding when just naively prompted to find a bug. They're incredibly poorly calibrated.

Even the smartest models? Exactly.

Yeah, even o1.

How do you explain that? Is there a good intuition?

I think these models are a really strong reflection of the pre-training distribution. And, you know, I do think they generalize as the loss gets lower and lower, but I don't think the loss — and the scale — is quite high enough, the loss low enough, such that they're really fully generalizing in code. The things that we use these frontier models for, that they're quite good at, are really code generation and question answering.

And these things exist in massive quantities in pre-training, with all of the code on GitHub on the scale of many, many trillions of tokens, and questions and answers on things like Stack Overflow and maybe GitHub issues. And so when you try to push into these things that really don't exist very much online — like, for example, the Cursor Tab objective of predicting the next edit given the edits done so far — the brittleness shows. And bug detection is another great example, where there aren't really that many examples of actually detecting real bugs and then proposing fixes, and the models just really struggle at it. But I think it's a question of transferring the model — in the same way that you get this fantastic transfer from pre-trained models...

...just on code in general to the Cursor Tab objective, you'll see a very similar thing with generalized models that are really good at code transferring to bug detection. It just takes a little bit of nudging in that direction.

To be clear, I think they sort of understand code really well. While they're being pre-trained, the representations that are being built up — almost certainly, somewhere in the stream, the model knows that maybe there's something sketchy going on, right?

It senses some sketchiness, but actually eliciting the sketchiness — part of it is that humans are really calibrated on which bugs are really important. It's not just actually saying there's something sketchy. It's: is it sketchy but trivial?

Is it sketchy like you're gonna take the server down? Part of it is maybe the cultural knowledge of why a staff engineer is good: a staff engineer is good because they know that three years ago, someone wrote a really sketchy piece of code that took the server down. As opposed to — maybe this thing is just an experiment, so a few bugs are fine; you're just trying to experiment and get the feel of the thing.

And so, if the model gets really annoying when you're writing an experiment, that's really bad. But if you're writing something for super production — you're writing a database, right, you're writing code in Postgres or Linux or whatever, you're Linus Torvalds — it's sort of unacceptable to have even an edge case. And just having the calibration of how paranoid the user is...

But even then, like, if you're putting in maximum paranoia, it still just doesn't quite get it. Yeah, yeah.

Yeah, I mean, but this is hard for humans too, to understand which line of code is important and which is not. I think one of your principles on a website says, if a line of code can do a lot of damage, one should add a comment that says, "this line of code is dangerous."

And all caps, repeated ten times.

No, you say, for every single line of code inside the function, you have to do that — and that's quite profound. It says something about human beings, because the engineers move on, and even the same person might just forget how a single function can sink the Titanic. You might not intuit that quite clearly by looking at the single piece of code.

Yeah, and I think that one is also partially for today's AI models, where if you actually write "dangerous, dangerous, dangerous" on every single line, the models will pay more attention to that and will be more likely to find bugs in that region.

That's actually just straight up a really good practice of labeling code — of how much damage it can do.

Yeah, I mean, some people think it's ugly. Actually...

...I actually think it is — in fact, I actually think it's one of the things I learned from Arvid. You know, aesthetically I don't like it, but I think there's certainly something where it's useful for the models, and humans just forget a lot, and it's really easy to make a small mistake and cause — you know, just bring down the server. Like, of course we test a lot and whatever, but there are always these things you have to be very careful about.

Yeah, like with just normal docstrings, I think people will often skim them when making a change and think, "oh, I know how to do this," and you kind of really need to point it out to them so that it doesn't slip...

...through. Yeah, you have to be reminded that you can do a lot of damage. We don't really think about that. You think about, okay, how do I figure out how this works so I can improve it? You don't think about the other direction...

Until we have formal verification for everything — then you can do whatever you want, and you know for certain that you have not introduced a bug if the proof passes.

But concretely, what do you think that future would look like?

I think people will just not write tests anymore, and the model will suggest — you write a function, the model will suggest a spec, and you review the spec, and in the meantime, a smart reasoning model computes a proof that the implementation follows the spec. And I think that happens for most functions.

Do you think this gets at a little bit of what you were talking about earlier, with the difficulty of specifying intent for what you want with software? Where sometimes it might be hard because the intent is really hard to specify, it's also then going to be really...

...hard to prove that it's actually matching whatever...

...your intent is? You think that the spec is hard to generate? Yeah, or just, for a given spec — I think there is a question of, can you actually do the formal verification? Like, is that possible? I think there's more to dig into there. But then also...

...even if you have this spec—

If you have the spec, how do you—

Is the spec written in natural language, or is it formal?

The spec would be formal.

But how easily can you do that? I think then you care about things that are not going to be easily well specified in a spec language.

I see, I see.

Yeah, maybe that would be an argument against "formal verification is all you need."

The worry is that it's replacing something like unit tests, sure.

Sure, yeah, yeah. I think you can probably also evolve the spec languages to capture some of the things that they don't really capture right now. I don't know — I think it's very exciting.

And you're speaking not just about single functions — you're speaking about entire codebases.

I think entire codebases are harder, but that is what I would love to have, and I think it should be possible. Because there's a lot of work recently where you can prove — formally verify — down to the hardware.

So you formally verify the C code, and then you formally verify through the GCC compiler, and then through the Verilog down to the hardware. And that's an incredibly big system, but it actually works. And I think big codebases are sort of similar, in that they're multi-layered systems, and if you can decompose it and formally verify each part, then I think it should be possible. But I think the specification problem is a real problem.

But how do you handle side effects, or how do you handle, I guess, external dependencies, like calling the Stripe...

...API? Maybe Stripe would write a spec, right?

But you can't do this for everything. Like, can you do this for everything you use? How do you do it if there's a language model involved — like maybe people will use language models as primitives in the programs...

...that they write, right? And there's a dependence on it. And how do you now include that?

I think you might be able to prove that still.

Prove what, about language models?

I think it feels possible that you could actually prove that a language model is aligned, for example, or that you could prove that it actually gives the right answer. That's the dream.

That is — I mean, yes, if it's possible. That's your "I have a dream" speech. That would certainly help with making sure your code doesn't have bugs, and making sure AI doesn't destroy all of human civilization — so the full spectrum of AI safety to just bug finding. So you said the models struggle with bug finding. What's the hope?

My hope, initially — and I can let Michael chime in too — is that it should, you know, first help with the stupid bugs. It should very quickly catch the stupid bugs, like off-by-one errors. Like, sometimes you write something in a comment and do it the other way.

It's very common. Like, I do this: I write "less than" in a comment and maybe write "greater than" in the code, and the model is like, "this looks sketchy — are you sure you want to do that?" But eventually it should be able to catch harder bugs...

...too. Yeah, and I think it's also important to note that having good bug-finding models feels necessary to get to the highest reaches of having AI do more and more programming for you. If AI is building more and more of the system for you, you need to not just trust but also verify. And without that, some of the problems that we talked about before with programming with these models will just become untenable. So it's not just for humans — like, you write a bug, I write a bug, find the bug for me — but it's also being able to verify the AI's code and check it. That's really important.

Yeah, and then how do you actually do this? We've had a lot of contentious dinner discussions of how you actually train a bug-finding model, but one very popular idea is that it's potentially easier to introduce a bug than to actually find the bug. So you can train a model to introduce bugs in existing code, and then you can train a reverse bug-finding model that can find bugs using this synthetic data. That's one example, but yeah, there are lots of ideas.
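A toy sketch of that synthetic-data idea — inject bugs into known-good code and keep (buggy code, fix) pairs as training examples. A real pipeline would presumably use a model to propose much subtler bugs; the trivial operator swaps below are only there to show the data shape.

```python
# Hypothetical sketch of synthetic bug-injection data generation (illustration only).
import json
import random

MUTATIONS = [("<", "<="), ("==", "!="), ("+ 1", "- 1"), ("and", "or")]

def inject_bug(snippet: str) -> tuple[str, str]:
    """Return (buggy_snippet, description). A model could propose subtler, more
    realistic bugs; here we just swap one operator to show the idea."""
    old, new = random.choice(MUTATIONS)
    if old not in snippet:
        return snippet, "no bug injected"
    return snippet.replace(old, new, 1), f"swapped '{old}' for '{new}'"

def build_dataset(good_snippets: list[str]) -> list[dict]:
    examples = []
    for code in good_snippets:
        buggy, desc = inject_bug(code)
        if buggy != code:
            # Train the reverse model: given buggy code, recover the bug and the fix.
            examples.append({"input": buggy, "bug": desc, "fix": code})
    return examples

print(json.dumps(build_dataset(
    ["if x < len(items) and ready:\n    total = total + 1"]), indent=2))
```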

You can also do a bunch of work, not even at the model level, of taking the biggest models and then maybe giving them access to a lot of information that's not just the code. It's kind of a hard problem to stare at a file and be like, "where's the bug?" — that's hard for humans often, right? And so often you have to run the code, and being able to see things like traces and step through it in a debugger...

There's a whole other direction where it kind of tends toward that. And it could also be that there are two different product form factors here. It could be that you have a really specialized model that's quite fast, that's kind of running in the background and trying to spot bugs. And it might be that sometimes — sort of to Arvid's earlier example about some nefarious input-box bug — sometimes you know there's a bug. You're not just checking hypothesis-free; you're like, this is a problem, I really want to solve it, and you zap that with tons and tons and tons of compute, and you're willing to put in, like, fifty dollars to solve that bug, or something even more.

Have you thought about integrating money into this whole thing? Like, I would probably pay a large amount of money if it found a bug, or even generated code that I really appreciated. I had a moment a few days ago when I started using Cursor where it generated...

...perfect — perfect three functions for interacting with the YouTube API to update captions and do localization in different languages. The API documentation is not very good, and the code across...

Like, if I googled it for a while, I couldn't find exactly what I needed — there's a lot of confusing information — and Cursor generated it perfectly. And I just sat back, I read the code, and I was like, this is correct. I tested it; it's correct.

I was like, I want to tip — a button that says, "here's five dollars." One, that's really good just to support the company, support what the interface is. And the other is that it probably sends a strong signal, like "good job," right? It's a much stronger signal than just accepting the code — you actually send a strong "good job." And for bug finding, obviously there are a lot of people that would pay a huge amount of money for a bug, like a bug bounty thing, right? Do you guys think about that?

Yeah, it's a controversial idea inside the company. I think it sort of depends on how much you believe in humanity, almost. You know, I think it would be really cool if you spend nothing to try to find a bug — and if it doesn't find a bug, you spend zero dollars — and then if it does find a bug and you click accept, then it also shows, in parentheses, one dollar, so you'd spend one dollar to accept the bug.

And then of course there's a worry like, okay, we spent a lot of computation, and maybe people will just copy-paste — I think that's a worry. And then there's also the worry that introducing money into the product makes it — like, it doesn't feel as fun anymore. You have to think about money, and all you want to think about is the code. And so maybe it actually makes more sense to separate it out: you pay some fee every month, and then you get all of these things for free.

But there could be a tipping component, which is not like...

Yes, but it still has that dollar symbol. I think it's fine, but I also see the point where maybe you don't want to introduce it. Yeah.

I was gonna say, the moment that feels like people would do this is when they share it — when they have this fantastic example, they just share it with their friends.

There is also a potential world where there's a technical solution to this honor-system problem too, where, if we can get to a place where we understand the output of the system more — I mean, the stuff we were talking about with error checking with the LSP, and then also running the code — if you could get to a place where you could actually somehow verify, "oh, I have fixed the bug," maybe then the bounty system doesn't need to rely on the honor system.

How much interaction is there between the terminal and the code? How much information is gained from running the code in the terminal? Can you do a loop where it runs the code and suggests how to change the code if the code gives an error at runtime? Right now, are they completely separate worlds? I know you can do Ctrl+K inside the terminal to help you...

...write the code. You can use terminal context as well, inside of Command+K — kind of everything. We don't have the looping part yet, though we expect something like this could make a lot of sense. There's a question of whether it happens in the foreground, too, or if it happens in the background, like what we've been discussing.

Sure. The background's pretty cool — like running the code in different ways. Plus, there's a database side to this, which is, how do you protect it from modifying the database? But, okay...

I mean, there are certainly cool solutions there. There's this new API that is being developed — it's not in AWS, but, you know — I think it's in PlanetScale.

I don't know if PlanetScale was the first one to add it — the ability to sort of add branches to a database, which is: if you're working on a feature and you want to test against the prod database, but you don't actually want to test against the prod database, you could add a branch to the database. And the way they do that is to add a branch to the write-ahead log. And there's obviously a lot of technical complexity in doing it correctly. I guess database companies need new things to do — they have good databases now. And I think Turbopuffer, which is one of the databases we use, is going to add maybe branching to the write-ahead log. And so maybe the AI agents will use branching — they'll test against some branch — and it's sort of going to be a requirement for the database to support branching or something.

It would be really interesting if you could branch a file system...

...right? Yeah, I feel like everything needs branching. Like...

Yeah, yeah. It's like the multiverse, right?

Right.

Like, if you branch on everything, that's a lot.

There are obviously these super clever algorithms to make sure that you don't actually use a lot of space or CPU. Okay...

...this is a good place to ask about infrastructure. So you guys mostly use AWS. What are some interesting details? What are some interesting challenges? Why did you choose AWS? Why is AWS still winning? Hashtag.

AWS is just really, really good. It is really good. Whenever you use an AWS product, you just know that it's going to work. It might be absolute hell to go through the steps to set it up.

Yeah, why is the interface so horrible?

It's just so good. It doesn't need to—

It's the nature of winning. I think it's exactly that — it's just the nature of them winning. Yeah, but...

...AWS you can always trust — it will always work. And if there is a problem, it's probably your problem. Okay, is there...

...some interesting challenge for you guys — a pretty new startup — in scaling to so many people?

Yeah, I think it has been an interesting journey. Adding each extra zero to the requests per second, you run into all of these issues — the general components you're using for caching and databases run into issues as you make things bigger and bigger. And we're now at the scale where we get, like, int overflows on our tables and things like that. And then there have also been some custom systems that we've built — for instance, our retrieval system for computing a semantic index of your codebase and answering questions about a codebase — that have continually, I feel like, been one of the trickier things to scale.

I have a few friends who are super, super senior engineers, and one of their sort of lines is that it's very hard to predict where systems will break when you scale them. You can try to predict in advance, but there's always something weird that's gonna happen when you add that extra zero. You thought you thought through everything, but you didn't actually think through everything.

But I think, for that particular system — for concrete details, what we do is: obviously, we chunk up all of your code, and then we send up the chunks for embedding, and we embed the code, and then we store the embeddings in a database — but we don't actually store any of the code. And then there are reasons around making sure that we don't introduce client bugs, because we're very, very paranoid about client bugs. We store much of the bookkeeping on the server, and everything is sort of encrypted.

So one of the technical challenges is always making sure that the local index — the local codebase state — is the same as the state that is on the server. And the way, technically, we ended up doing that is: for every single file, you can keep a hash, and then for every folder, you can keep a hash, which is the hash of all of its children.

And you can recursively do that up to the top. And why do something so complicated? Well, one thing you could do is keep a hash for every file.

Then, every minute, you could try to download the hashes that are on the server, figure out which files don't exist on the server — maybe you just created a new file, maybe you just deleted a file, maybe you checked out a new branch — and try to reconcile the state between the client and the server. But that introduces absolutely ginormous network overhead, both on the client side — I mean, nobody really wants us to hammer their WiFi all the time...

...if you're using Cursor — but also, it would introduce enormous overhead on the database. It would be reading this tens-of-terabytes database, approaching something like 20 terabytes every second. That's kind of crazy. You definitely don't want to do that.

So what do you do? You just try to reconcile the single hash, which is at the root of the project. And then, if something mismatches, you go and find where the things disagree.

You look at the children and see if the hashes match; if the hashes don't match, go look at their children, and so on. But you only do that in the scenario where things don't match. And for most people, most of the time, the hashes match.
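A small sketch of that Merkle-style reconciliation; the tree representation below is an assumption for illustration, not Cursor's actual code, but it shows why only mismatched subtrees ever get visited.

```python
# Illustrative Merkle-tree reconciliation: each file has a hash; each folder's hash is
# the hash of its children's hashes. Comparing roots is cheap, and you only descend
# into subtrees whose hashes disagree.
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_tree(node) -> dict:
    """node is either {'file': bytes} or {'dir': {name: node, ...}}; returns a hash tree."""
    if "file" in node:
        return {"hash": h(node["file"])}
    children = {name: build_tree(child) for name, child in sorted(node["dir"].items())}
    combined = "".join(f"{name}:{child['hash']}" for name, child in children.items())
    return {"hash": h(combined.encode()), "children": children}

def diff_paths(local, remote, path="") -> list[str]:
    """Return paths whose contents differ; matching subtrees are never visited."""
    if local["hash"] == remote["hash"]:
        return []
    if "children" not in local or "children" not in remote:
        return [path or "/"]
    out = []
    for name in set(local["children"]) | set(remote["children"]):
        l, r = local["children"].get(name), remote["children"].get(name)
        if l is None or r is None:
            out.append(f"{path}/{name}")   # file or folder added/deleted on one side
        else:
            out.extend(diff_paths(l, r, f"{path}/{name}"))
    return out
```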

Yeah. I mean, it's cool to see that you have to think through all these problems.

And, I mean, the reason it's gotten hard is just the number of people using it, and, you know, some of our customers have really, really large codebases — to the point where we originally rolled it out on our own codebase, which is big, but it's just not the size of some company that's been around for twenty years and has an enormous number of files. And you want to scale that across programmers. There are all these details where building the simple thing is easy, but scaling it to a lot of people, a lot of companies, is obviously a difficult problem — which is, you know, somewhat independent of, actually — so that's part of the scaling. Our current solution is also, you know, coming up with new ideas that obviously we're working on, but then scaling all of that over the last few weeks and months. Yeah, and there...

...are a lot of clever things, additional things, that go into this indexing system. For example, the bottleneck in terms of cost is not storing things in the vector database — it's actually embedding the code. And you don't want to re-embed the codebase for every single person in a company that is using the same exact code, except for maybe they're on a different branch with a few different files, or they've made a few local changes.

And so, because again embedding is the bottleneck, you can do this one clever trick and not have to worry about the complexity of dealing with branches and the other databases: you just have a cache on the actual vectors, keyed by the hash of a given chunk. And so this means that when the nth person at the company goes and embeds their codebase, it's really, really fast. And you do all this without actually storing any code on our servers at all — no code data is stored. We just store the vectors in the vector database and the vector cache.
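A minimal sketch of that hash-keyed embedding cache; the embedding function, vector database, and cache objects are assumed interfaces, not Cursor's actual indexing service.

```python
# Hypothetical sketch of a chunk-hash embedding cache (illustration only): the expensive
# step is embedding, so vectors are cached by content hash and shared across everyone
# indexing the same code — while the server never stores the code itself.
import hashlib

class EmbeddingIndexer:
    def __init__(self, embed_fn, vector_db, cache):
        self.embed_fn = embed_fn    # chunk text -> vector (the expensive call)
        self.vector_db = vector_db  # stores (chunk_hash, vector) pairs, never raw code
        self.cache = cache          # shared key-value store: chunk_hash -> vector

    def index_chunk(self, chunk_text: str) -> None:
        chunk_hash = hashlib.sha256(chunk_text.encode()).hexdigest()
        vector = self.cache.get(chunk_hash)
        if vector is None:
            vector = self.embed_fn(chunk_text)     # only the first person pays for this
            self.cache[chunk_hash] = vector
        self.vector_db.upsert(chunk_hash, vector)  # the code text itself is discarded
```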

What's the biggest gain, at this time, that you get from indexing the codebase? Just out of curiosity, what benefit do users get? It seems like, longer term, there will be more and more benefit, but in the short term, just asking questions of the codebase — what's the usefulness of that?

I think the most obvious one is just: you want to find out where something is happening in your large codebase, and you have a fuzzy memory of, "okay, I want to find the place where we do X," but you don't exactly know what to search for in a normal text search. So you ask the chat — you hit Command+Enter to ask with the codebase chat — and very often it finds the right place that you were thinking of.

I think, like you mentioned, in the future this is only going to get more and more powerful, where we're working a lot on improving the quality of our retrieval. And I think the ceiling for that is really much higher...

...than people give it credit for. One question that's good to ask here: have you considered — and why haven't you done — much of this stuff locally? Everything we just discussed seems exceptionally difficult to do in the cloud: you have to think about all these things with the caching and the large codebases, with a large number of programmers using the same codebase — you have to figure all of that out. A lot of software does this heavy computational stuff locally. So have you considered doing, sort of, embeddings locally?

Yeah, we've thought about it, and I think it would be cool to do it locally. I think it's just really hard. And one thing to keep in mind is that some of our users use the latest MacBook Pro, but most of our users — more than 80 percent of our users — are on Windows machines, and many of them are not very powerful.

And so local models really only work on the latest computers, and it's also a big overhead to build that in. And so even if we would like to do that, it's currently not something that we are able to focus on. I think there are some people that do that, and I think that's great, but especially as models get bigger and bigger, and you want to do fancier things with bigger models, it becomes even harder to do it locally.

Yeah, and it's not a problem of just weaker computers. It's just that, for example, if you're some big company, you have a big-company codebase, and it's just really hard to process a big-company codebase, even on the beefiest MacBook Pro. So it's not even a matter of whether you're just a student or something; I think if you're the best programmer at a big company, you're still gonna have a horrible experience...

...if you do everything locally. I mean, you could do it and scrape by, but, again, it wouldn't be fun anymore.

Yeah, like even approximate nearest neighbors on this massive codebase is gonna just eat up your memory and your CPU. And that's just that. Let's talk about the modeling side too, where, as Arvid said, there are these massive headwinds against local models, where one — things seem to be moving towards MoEs, and one benefit is maybe they're more memory-bandwidth-bound, which plays in favor of local versus using GPUs, or using NVIDIA GPUs.

But the downside is, these models are just bigger in total, and they're gonna need to fit often not even on a single node, but on multiple nodes. There's no way that's gonna fit inside of even really good MacBooks. And I think, especially for coding, it's not so much a question of "does it clear some bar of the model being good enough to do these things, and then we're satisfied" — which may be the case for other problems, and maybe where local models shine — but people are always gonna want the best, the most intelligent, the most capable things, and that's going to be really, really hard to run for almost all people locally.

Don't you want the most capable model? Like, don't you want Sonnet?

And also o1.

Or would you be okay with an inferior one? I mean, I'm one of those people, but there are some people that like to do stuff locally — really, there's a whole obsessed open-source movement that kind of resists, and it's good that it exists, actually, because you want to resist the power centers that are growing. And there...

...is actually an alternative to local models that I am particularly fond of. I think it's still very much in the research stage, but you could imagine doing homomorphic encryption for language model inference. So you encrypt your input on your local machine, then you send that up, and then the server can use lots of computation.

They can run models that you cannot run locally on this encrypted data, but they cannot see what the data is, and then they send back the answer, and you decrypt the answer, and only you can see the answer. So I think that's still very much research, and all of it is about trying to make the overhead lower, because right now the overhead is really big. But if you can make that happen, I think it would be really, really cool, and I think it would be really impactful, because one thing that's actually kind of worrisome is that, as these models get better and better, they are going to become more and more economically useful.

And so more and more of the world's information and data will flow through one or two centralized actors. And then there are worries about — you know, there can be traditional hacker attempts, but it also creates this kind of scary position where, if all of the world's information is flowing through one node in plaintext, you can have surveillance in very bad ways. And sometimes that will happen — initially it will be for good reasons, like people will want to protect against bad actors using AI models in bad ways, and then you will add in some surveillance code, and then someone else will come in, and you're on a slippery slope, and then you start doing bad things with a lot of the world's data. So I am very hopeful that we can solve homomorphic encryption for privacy-preserving...

...machine learning. But I would say that's the challenge we have with all software these days. There are so many features that can be provided from the cloud, and all of us increasingly rely on it, and it makes our lives awesome. But there are downsides, and that's why you rely on really good security to protect from basic attacks. But there's also only a small set of companies that are controlling that data, you know, and they obviously have leverage, and they could be infiltrated in all kinds of ways. That's the world we live in.

Yeah, I mean, the thing I'm actually quite worried about is — Anthropic has this responsible scaling policy, and so we're on the low ASLs, which is the Anthropic security level or whatever, of the models. But as we get to, quote-unquote, ASL-3, ASL-4, whatever models, which are very powerful — for mostly reasonable security reasons, you would want to monitor all the prompts. I think that's reasonable and understandable, where everyone is coming from. But man, it'd be really horrible if all the world's information is monitored that heavily — it's way too centralized. It's a really fine line you're walking, where on the one side you don't want the models to go rogue, and on the other side — man, humans — I don't know if I trust all the world's information to pass through, like, three model providers.

Why do you think it's different than cloud providers?

Because I think a lot of this data would never have gone to the cloud providers in the first place. You want to give more data to the AI models — you want to give personal data that you would never have put online in the first place — to these companies, or to these models. And it also centralizes control, where right now, for cloud, you can often use your own encryption keys, and AWS can't really do much. But here it's just centralized...

...actors that see...

...the exact plaintext of everything. On the topic...

...of context, that's actually been a friction for me. When I'm writing code in Python, there's a bunch of stuff imported — you could probably intuit the kind of stuff I would like to include in the context. How hard is it to automatically figure out the context? It's tricky.

I think we can do a lot better at computing the context automatically in the future. One thing that's important to note is that there are trade-offs with including automatic context. The more context you include for these models, first of all, the slower they are, and the more expensive those requests are, which means you can then do fewer model calls and do less fancy stuff in the background.

Also, a lot of these models get confused if you have a lot of information in the prompt. So the bar for accuracy and for relevance of the context you include should be quite high. But we already do some automatic context in some places within the product. It's definitely something we want to get a lot better at, and I think there are a lot of cool ideas to try there, both on learning better retrieval systems — like better embedding models, better rerankers —

and I think there are also cool academic ideas — stuff we've tried out internally, but also stuff the field is grappling with writ large — about whether you can get language models to a place where you can actually just have the model itself understand a new corpus of information. The most popular, talked-about version of this is: can you make the context windows infinite? Then, if you make the context window infinite, can you make the model actually pay attention to the infinite context? And then, after you can make it pay attention to the infinite context, can you make it somewhat feasible to actually do it?

Can you then do caching for that infinite context, so you don't have to recompute it all the time? But there are other cool ideas being tried that are a little bit more analogous to fine-tuning —

actually learning this information in the weights of the model. And it might be that you actually get a qualitatively different type of understanding if you do it more at the weight level than if you do it at the in-context-learning level. I think the jury's still a little bit out on how this is all going to work in the end, but in the interim, as a company, we are really excited about better retrieval systems and picking the parts of the codebase that are most relevant to what you're doing.

We could do that a lot better. One interesting proof of concept for learning this knowledge directly in the weights is with VS Code. So we're in a VS Code fork, and VS Code — the code is all public. So these models, in pre-training, have seen all the code. They've probably also seen questions and answers about it, and then they've been fine-tuned and RLHF'd to be able to answer questions about code in general.

So when you ask a question about VS Code, sometimes it hallucinates, but sometimes it actually does a pretty good job of answering the question. And I think this is just, it happens to be okay. But what if you could actually specifically train or post-train a model such that it really was built to understand this codebase? It's an open research question, one that we're quite interested in.

And then there's also uncertainty of, do you want the model to be the thing that end-to-end is doing everything, i.e., it's doing the retrieval in its internals and then kind of answering the question, creating the code, or do you want to separate the retrieval from the frontier model? Where maybe you'll get some really capable models that are much better than the best open-source ones in a handful of months, and then you'll want to separately train a really good open-source model to be the retriever, to be the thing that feeds in the context to these larger models.

Can you speak more to post-training a model to understand the codebase? What do you mean by that? Is this a synthetic data direction?

Yeah, I mean,

there are many possible ways you could try doing it, and there's certainly no shortage of ideas. It's just a question of going in and trying all of them and being empirical about which one works best. One very naive thing is to try to replicate what's done with VS Code and these frontier models.

So let's continue pretraining, some kind of continued pretraining that includes general code data but also throws in a lot of the data of some particular repository that you care about. And then in post-training, let's just start with instruction fine-tuning. You have a normal instruction fine-tuning dataset about code, but you throw in a lot of questions about code in that repository. So you could either get ground-truth ones, which might be difficult, or you could do what you kind of hinted at or suggested: using synthetic data, i.e., kind of having the model ask questions about various pieces of code. So you take the pieces of the code, then prompt the model or have a model propose a question for that piece of code, and then add those as instruction fine-tuning data points. And then in theory, this might unlock the model's ability to answer questions about that codebase.
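A minimal sketch of that synthetic-data recipe; the `llm` call and the prompt wording are placeholders, not a real API, and a real pipeline would filter and deduplicate the generated pairs.

```python
def build_synthetic_qa(code_chunks, llm):
    """For each piece of repository code, have a model propose a question,
    then answer it with the code in context; the pairs become
    instruction fine-tuning data points."""
    dataset = []
    for chunk in code_chunks:
        question = llm(
            "Propose one question a developer might ask about this code:\n" + chunk
        )
        answer = llm(f"Code:\n{chunk}\n\nQuestion: {question}\nAnswer:")
        dataset.append({"instruction": question, "context": chunk, "response": answer})
    return dataset
```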

Let me ask you about OpenAI o1. What do you think is the role of that kind of test-time compute system in programming?

I think test-time compute is really, really interesting. So there's been the pretraining regime, which, as you scale up the amount of data and the size of your model, gets you better and better performance, both on loss and then on downstream benchmarks and just general performance when we use it for coding or other tasks.

We're starting to hit a bit of a data wall, meaning it's going to be hard to continue scaling up this regime. And so scaling up test-time compute is an interesting way of now increasing the number of inference-time FLOPs that we use, but still, as you increase the number of FLOPs used at inference time, getting corresponding improvements in the performance of these models. Traditionally, we just had to literally train a bigger model that always uses that many more FLOPs.

But now we could perhaps use the same size model and run it for longer to be able to get an answer at the quality of a much larger model. And so the really interesting thing I like about this is, there are some problems that perhaps require hundred-trillion-parameter-model intelligence trained on a hundred trillion tokens, but that's maybe one percent, maybe 0.1 percent, of all queries. So are you going to spend all of this effort, all this compute, training a model that costs that much and then run it so infrequently? It feels completely wasteful when instead you train the model that's capable of doing the 99.9 percent of queries, and then you have a way of, at inference time, running it longer for those few people who really, really want max intelligence.
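As a rough back-of-the-envelope for this tradeoff (using the standard approximation that a forward pass of a dense model with N parameters costs about 2N FLOPs per generated token; the 10x factors are illustrative, not from the conversation):

```latex
\text{Inference FLOPs} \;\approx\; 2\,N\,T_{\text{generated}}
\qquad\Rightarrow\qquad
\frac{2\,(N/10)\,(10\,T)}{2\,N\,T} \;=\; 1
```

That is, a model a tenth the size can "think" for ten times as many tokens at roughly the same inference cost, which is exactly the knob test-time compute turns.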

How do you figure out which problem requires what level of intelligence? Is it possible to dynamically figure out when to use GPT-4, when to use a small model, and when you need o1?

I mean, yeah, that's an open research problem, certainly. I don't think anyone's actually cracked this model-routing problem quite well. We'd like to. We have kind of initial implementations of this for something like Cursor Tab.

But at the level of going between something like 4o or Sonnet and o1, it's a bit trickier. There's also the question of, what level of intelligence do you need to determine if the thing is too hard for the 4-level model? Maybe you need the o1-level model. It's really unclear.
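Purely as an illustration of what model routing could look like; the model names, thresholds, and the idea of using a cheap model to score difficulty are all hypothetical, and the irony noted above applies: the router itself has to be smart enough to recognize when a problem is too hard for the mid-tier model.

```python
def route_request(prompt: str, cheap_llm, models: dict) -> str:
    """Ask a cheap model to rate difficulty, then pick which model answers."""
    rating = cheap_llm(
        "Rate the difficulty of this programming request from 1 (trivial) "
        f"to 5 (needs long multi-step reasoning). Reply with one digit.\n\n{prompt}"
    )
    difficulty = int(rating.strip()[0])
    if difficulty <= 2:
        return models["small"](prompt)      # fast, autocomplete-class model
    elif difficulty <= 4:
        return models["frontier"](prompt)   # e.g. a 4o / Sonnet-class model
    return models["reasoner"](prompt)       # e.g. an o1-class model run longer
```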

But you mentioned, so there's the pretraining process, there's post-training, and there's test-time compute. Is that a fair sort of separation? Where are the big gains?

Well, it's weird, because with test-time compute there's a whole training strategy needed to get test-time compute to work. And the other really weird thing about this is, outside of the big labs, and maybe even just OpenAI, no one really knows how it works. There have been some really interesting papers that show hints of what they might be doing.

And so perhaps they're doing something with tree search using process reward models. But yeah, I think the issue is we don't quite know exactly what it looks like, so it would be hard to comment on where it fits in. I would put it in post-training, but maybe the compute spent for getting test-time compute to work for a model is going to dwarf pretraining eventually.

So we don't even know if o1 is using just chain-of-thought RL. We don't know how they're using any of these things.

We don't know anything.

It's fun to speculate. If you were to build a competing model, what would you do?

Yeah. So one thing to do would be, I think you probably need to train a process reward model. So maybe we can get into reward models, and outcome reward models versus process reward models. Outcome reward models are the kind of traditional reward models that people train for language modeling, and it's just looking at the final thing.

So if you're doing some math problem, let's look at the final thing you've done, everything, and let's assign a grade of how likely we think, what's the reward for this outcome. Process reward models instead try to grade the chain of thought. And OpenAI had some preliminary paper on this, I think last summer, where they used human labelers to get this pretty large, several-hundred-thousand-example dataset of grading chains of thought. Ultimately, it feels like I haven't seen anything interesting in the ways people use process reward models outside of just using them as a means of affecting how we choose between a bunch of samples. So what people do in all these papers is sample a bunch of outputs from the language model and then use the process reward models to grade all those generations, alongside maybe some other heuristics.

And then use that to choose the best answer. The really interesting thing that people think might work, and that people want to work, is tree search with these process reward models. Because if you really can grade every single step of the chain of thought, then you can branch out and explore multiple paths of the chain of thought, and then use these process reward models to evaluate how good the branch you're taking is.
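A minimal sketch of the "sample a bunch and grade with a process reward model" pattern just described; `llm_sample` and `prm_score` are placeholders, and a real system would grade each chain-of-thought step during generation rather than only finished samples.

```python
def best_of_n(prompt: str, llm_sample, prm_score, n: int = 16) -> str:
    """Sample n candidate chains of thought and keep the one the
    process reward model grades highest."""
    best, best_score = None, float("-inf")
    for _ in range(n):
        candidate = llm_sample(prompt)
        step_grades = prm_score(candidate)          # one grade per reasoning step
        score = sum(step_grades) / max(len(step_grades), 1)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Tree search is the more ambitious version: instead of only scoring finished samples, you branch after each step, score the partial chains, and expand the promising branches.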

Yeah, when the quality of the branch is somehow strongly correlated with the quality of the outcome at the very end, so you have a good model of knowing which branch to take, not just in the short term but in the long term.

Yeah, and the interesting work that I think has been done, or the interesting work that has been open-sourced and that people talk about, is how to train the process reward models in maybe a more automated way. I could be wrong here, I could not be mentioning some papers. But I haven't seen anything that seems to work really well for using the process reward models creatively to do tree search in code.

This is kind of an AI safety question, maybe a bit of a philosophical question. So OpenAI says that they're hiding the chain of thought from the user, and they've said that was a difficult decision to make. Instead of showing the chain of thought, they're asking the model to summarize the chain of thought.

They're also, in the background, going to monitor the chain of thought to make sure the model is not trying to manipulate the user, which is a fascinating possibility. But anyway, what do you think about hiding the chain of thought?

One consideration for OpenAI, and this is completely speculative, could be that they want to make it hard for people to distill these capabilities out of their model. It might actually be easier, if you had access to that hidden chain of thought, to replicate the technology, because that's pretty important data, like seeing

the steps of it. You could train on that.

And there was sort of a mirror situation with this with some of the large language model providers, and also this is speculation, but some of these APIs used to offer easy access to log probabilities for all the tokens they're generating, and also log probabilities over the prompt tokens. And then some of these APIs took those away. And again, complete speculation,

but one of the thoughts is that the reason those were taken away is, if you have access to log probabilities, similar to this hidden chain of thought, that can give you even more information to try to distill these capabilities out of the APIs, out of these biggest models, and into models you control. As an aside on the previous discussion about us integrating o1: I think that we're still learning how to use this model. So we made o1 available in Cursor because when we got the model, we were really interested in trying it out. I think a lot of programmers are really interested in trying it out.

But o1 is not part of the default Cursor experience in any way yet, and we still haven't found a way to integrate it into the editor in a way that we reach for every hour, maybe even every day. And so I think the jury's still out on how to use the model.

And we haven't seen examples of people releasing things where it seems really clear, like, oh, that's now the use case. The obvious one to turn to is, maybe this can make it easier for you to have these background things running, right, to have these models in loops, to have these models be agentic. But we're still discovering.

To be clear, we have ideas. We just need to try and get something incredibly useful before we put it out.

But it has these significant limitations. Even barring capabilities, it does not stream, and that means it's really, really painful to use for things where you want to supervise the output, and instead you're just waiting for the wall of text to show up. Also, it does feel like the early innings of test-time compute and search, where it's very much a v0, and there are so many things that don't feel quite right. And I suspect, in parallel to people increasing the amount of pretraining data and the size of the models in pretraining and finding tricks there, you'll now have this other thread of getting search to work better and better.

So let me ask you about Strawberry, tomorrow's o1s. It looks like GitHub Copilot might be integrating o1 in some kind of way, and I think some of the comments are saying, does this mean Cursor is done? I think I saw one comment saying

that it's time to shut down Cursor.

Time to shut down Cursor. So, is it time to shut down Cursor?

I think this space is a little bit different from past software spaces over the 2010s, in that I think the ceiling here is really, really, really incredibly high. And so I think that the best product in three to four years will just be so much more useful than the best product today. And you can wax poetic about moats and brand and, you know, this is our advantage.

But I think, in the end, if you stop innovating on the product, you will lose. And that's also great for startups, that's great for people trying to enter this market, because it means you have an opportunity to win against people who have lots of users already by just building something better. And so I think over the next few years, it's just about building the best product, building the best system, and that both comes down to the modeling engine side of things and it also comes down to the editing experience.

Yeah, I think most of the additional value from Cursor versus everything else out there is not just integrating the new model fast, like o1. It comes from all of the kind of depth that goes into these custom models that you don't realize are working for you in kind of every facet of the product, as well as the really thoughtful UX with every single feature.

Alright, from that profound answer, let's descend back down to the technical. You mentioned

the taxonomy of synthetic data.

Oh, yeah.

Can you please explain?

Yeah, I think there are three main kinds of synthetic data. So first, what is synthetic data? There's normal data, like non-synthetic data, which is just data that's naturally created,

i.e., usually it'll be from humans having done things. So from some human process you get this data. Synthetic data: the first kind would be distillation. So having a language model output tokens or probability distributions over tokens, and then you can train some less capable model on this. This approach is not going to get you a model more capable, net, than the original one that produced the tokens.

What it is really useful for is, if there's some capability you want to elicit from some really expensive, high-latency model, you can then distill that down into some smaller, task-specific model. The second kind is when one direction of the problem is easier than the reverse. And so a great example of this is bug detection, like we mentioned earlier, where it's a lot easier to introduce reasonable-looking bugs than it is to actually detect them.

And this is probably the case for humans too. And so what you can do is you get a model that's not trained on that much data, that's not that smart, to introduce bugs in code, and then you can use that synthetic data to train a model that can be really good at detecting bugs.
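A minimal sketch of that asymmetry: it is easy to synthesize labeled training pairs by mutating correct code, and the labels come for free. The mutation rules here are toy placeholders; as described above, a real pipeline might instead use a weaker model to write more plausible-looking bugs.

```python
import random

MUTATIONS = [
    ("<=", "<"),    # off-by-one style boundary bug
    ("==", "!="),   # inverted comparison
    ("+ 1", ""),    # dropped increment
]

def inject_bug(correct_code: str) -> tuple[str, str]:
    """Return (code, label) by applying one random mutation when possible.
    Pairs of (correct, 'ok') and (buggy, 'bug') become training data
    for a bug-detection model."""
    old, new = random.choice(MUTATIONS)
    if old in correct_code:
        return correct_code.replace(old, new, 1), "bug"
    return correct_code, "ok"   # nothing to mutate; keep as a clean example
```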

The last category, I think, is the main one that it feels like the big labs are doing for synthetic data, which is producing text with language models that can then be verified easily. An extreme example of this is, if you have a verification system that can detect if language is Shakespeare-level, and then you have a bunch of monkeys typing on typewriters, you can eventually get enough training data to train a Shakespeare-level language model. And this is very much the case for math, where verification is actually really, really easy for formal languages. Then what you can do is have an okay model generate a ton of rollouts, choose the ones that you know have actually proved the ground-truth statements, and train further on those.

There are similar things you can do for code with LeetCode-like problems, where if you have some set of tests that you know correspond to the problem, meaning if something passes these tests it has actually solved the problem, you could do the same thing: verify that it passes the tests and then train the model on the outputs that passed. I think it's going to be a little tricky getting this to work in all domains, or just in general. Having the perfect verifier feels really, really hard to do for just open-ended, miscellaneous tasks you give the model, or more long-horizon tasks, even in coding.
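A minimal sketch of that verify-then-train loop for the LeetCode-style case. The `llm_sample` call and the assumption that each candidate defines a `solve()` function are placeholders; the key point is that the test suite, not a model, decides what goes into the training set.

```python
def collect_verified_solutions(problem: str, tests, llm_sample, attempts: int = 32):
    """Sample candidate programs and keep only the ones the test suite accepts."""
    verified = []
    for _ in range(attempts):
        candidate = llm_sample(problem)
        try:
            namespace: dict = {}
            exec(candidate, namespace)          # assumes the candidate defines solve()
            if all(namespace["solve"](x) == y for x, y in tests):
                verified.append({"problem": problem, "solution": candidate})
        except Exception:
            continue                            # failed to run: discard, don't train on it
    return verified
```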

That's because you're not as optimistic as Arvid. But yeah, so that third category requires having a verifier.

Yeah, verification,

it feels like it's best when you know for a fact that it's correct. And then it wouldn't be using a language model to verify; it would be using tests or formal systems, or running the thing, too.

Or doing the human form of verification, where you just do manual quality control.

Yeah.

But like the language model version of that, where it's building a different version, where it's running the thing and actually understanding the output.

Yeah, no, that's somewhere in between.

Yeah, I think that's the category that is most likely to result in massive gains.

What about RL with feedback? RLHF versus RLAIF, what's the role of that in getting better performance out of the models?

Yeah, so RLHF is when the reward model you use is trained from labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for the kind of task you care about. RLAIF is interesting,

because you're kind of depending on the constraint that verification is actually a decent bit easier than generation. Because it feels like, okay, what are you doing? You're using this language model to look at the language model's outputs and then improve the language model. But no, it actually may work, if the language model has a much easier time verifying some solution than it does generating it; then you actually could perhaps get this kind of recursion. But I don't think it's going to look exactly like that. The other thing we kind of do is a little bit of a mix of RLAIF and RLHF, where usually the model is actually quite correct, and this is the case with Cursor Tab, at picking between two possible generations which is the better one. And then it just needs a little bit of human nudging, with only on the order of 50 to 100 examples, to kind of align that prior the model has with exactly what you want. That looks different than how you would normally train these reward models, on tons of examples.
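A minimal sketch of the kind of pairwise-preference objective being alluded to; this is the standard Bradley–Terry-style loss, and the tiny dataset size and the reward-model architecture are assumptions, not a description of Cursor's setup.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """Push the reward of the human-preferred generation above the other one.
    With a model whose prior is already mostly right, on the order of
    50-100 such pairs may be enough to nudge it."""
    r_chosen = reward_model(chosen)       # scalar reward per example
    r_rejected = reward_model(rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```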

What's your intuition when you compare generation and verification, or generation and ranking? Is ranking way easier than generation?

My intuition would just say, yeah, it should be. This is kind of going back to, if you believe P does not equal NP, then there's this massive class of problems that are much, much easier to verify, given a proof, than to actually prove.

I wonder if the same thing will prove P not equal to NP, or P equal to NP.

That would be, that would be really cool.

Whoever proves it... a Fields Medal by AI? Who gets the credit? Now there's another open philosophical question.

I'm actually, I'm actually surprisingly curious what a good bet for when an AI will get the Fields Medal would be.

I don't know what Aman's bet here is.

Oh, sorry, Nobel Prize or Fields Medal first? Fields Medal, I think, comes first.

Well, you would say that, of course.

But it's also like an isolated system, you can verify.

No, sure.

Like, I don't even know if I need to...

I feel like the path to get to IMO was a little bit more clear, because it already could get a few IMO problems, and there was a bunch of low-hanging fruit, given the literature at the time, of what tactics people could take. I think I'm much less versed in the space of theorem proving now, and I have less intuition about how close we are to solving these really, really hard open problems.

So you think it'll be Fields Medal first? It won't be in physics or...

Oh, 100 percent. I think that's probably more likely. It's probably much more likely that it'll get that. Yeah, yeah.

Well, I think it puts, I don't know, like BSD, which is the Birch–Swinnerton-Dyer conjecture, or the Riemann hypothesis, or any of these hard, hard math problems, they're just actually really hard. It's sort of unclear what the path to get even a solution looks like. We don't even know what a path looks like, let alone...

And it's like an isolated system, and you can actually have a good reward system, and it feels like it's easier to train for that.

I think we might get the Fields Medal before AGI.

I mean, I'd be very happy. I'd be very happy. But I don't know. I think 2028, 2030 for the Fields Medal.

Alright.

It feels like forever from now, given how fast things have been going. Speaking of how fast things have been going, let's talk about scaling laws. So for people who don't know, maybe it's good to talk about this whole idea of scaling laws. What are they, where do you think we stand, and where do you think things are

going?

I think it was interesting. The original scaling laws paper by OpenAI was slightly wrong, because I think there were some issues they had with learning rate schedules, and then Chinchilla showed a more correct version. And then from then, people have again deviated from doing the compute-optimal thing.

People are starting now to optimize more so for making the thing work really well given an inference budget. And I think there are a lot more dimensions to these curves than what we originally used, which was just compute, number of parameters, and data. Inference compute is the obvious one. I think context length is another obvious one.

If you care, let's say, about the two things of inference compute and context window, maybe the thing you want to train is some kind of SSM, because they're much, much cheaper and faster at super, super long context. And even if, maybe, it has 10x worse scaling properties during training, meaning you spend 10x more compute to train the thing to get the same level of capability, it's worth it, because you care most about that inference budget for really long context windows. So it'll be interesting to see how people play with all these dimensions.
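For reference, the "more correct version" mentioned above is the Chinchilla-style parametric fit, roughly as follows (N is parameters, D is training tokens, C is training compute; the fitted constants are omitted):

```latex
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad C \approx 6\,N\,D,
\qquad N^{*} \propto C^{a},\; D^{*} \propto C^{b},\; a \approx b \approx 0.5
```

Under a fixed training budget, parameters and tokens scale roughly in proportion, about 20 tokens per parameter in the Chinchilla fit; the discussion above is precisely about relaxing that assumption when inference cost and context length matter more than training cost.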

So yeah, you speak to the multiple dimensions. Obviously, the original conception was just looking at the variables of the size of the model, as measured by parameters, and the size of the data, as measured by the number of tokens, and looking at the ratio of the two.

And it's kind of a compelling notion that there is a number, or at least a minimum, and it seems like one was emerging. Do you still believe that there is a kind of "bigger is better"?

I mean, I think bigger is certainly better for just raw performance, raw intelligence.

I think the path that people might take is, I'm particularly bullish on distillation. And, yeah, how many knobs can you turn? If we spend a ton, ton of money on training, like, get the most capable cheap model, right? Really, really caring as much as you can about inference. Because the naive version of caring as much as you can about inference-time compute is what people have already done with the Llama models: just overtraining the shit out of 7B models on way, way, way more tokens than is Chinchilla-optimal.

But if you really care about it, maybe the thing to do is what Gemma did, which is, let's not just train on tokens, let's literally train on minimizing the KL divergence with the distribution of Gemma 27B, right? Knowledge distillation there. And you're spending the compute of literally training this 27-billion-parameter model on all these tokens just to get out this smaller model.

And the distillation gives you just a faster model?

Yeah, smaller means faster. Distillation, in theory, is, I think, getting out more signal from the data that you're training on. It's perhaps another way of getting over, not completely over, but partially helping with the data wall, where you only have so much data to train on. Let's train this really, really big model on all these tokens, and we'll distill it into a smaller one, and maybe we can get more signal per token for this much smaller model than we would have if we trained it ourselves.
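A minimal sketch of the distillation objective being described, in PyTorch. The temperature, and the common practice of mixing this with a standard cross-entropy term, are assumptions about a typical setup rather than a description of Gemma's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Soft targets: the teacher's full next-token distribution,
    # not just the sampled token.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over positions in the batch.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * (temperature ** 2)

# Hypothetical usage: `teacher` is the large frozen model (e.g. a 27B model),
# `student` is the small model being trained on the same token stream.
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# student_logits = student(input_ids).logits
# loss = distillation_loss(student_logits, teacher_logits)
# loss.backward()
```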

So if I gave you $10 trillion, how would you spend it? I mean, you can buy an island or whatever, but how would you allocate it in terms of improving the big model versus maybe paying for human feedback, the HF in the RLHF?

Yeah. I think there are a lot of these secrets and details about training these large models that I just don't know and that are only privy to the large labs. And the issue is, I would waste a lot of that money if I even attempted this, because I wouldn't know those things. Suspending a lot of disbelief and assuming you had the know-how to operate, or if you're saying you have to operate with the limited information

you have now? No, no, no, let's say you swoop in and you get all the information, all the little heuristics, all the little parameters, all the parameters that define how the thing is trained. If we look at how to invest money for the next five years in terms of maximizing what you call raw intelligence.

I mean, isn't the answer really simple? You just try to get as much compute as possible. At the end of the day, all you need to do is buy the GPUs, and then the researchers can find, they can sort of, you can tune whether you want a big model or a small model.

Well, this gets into the question of whether you're really limited by compute and money, or whether you're limited by these other things.

I'm more privy to Arvid's, Arvid's belief that we're sort of idea-limited, but there's always that...

but if you have a lot of compute, you can run a lot of experiments.

So you would run a lot of experiments versus using that compute to train a gigantic model?

I would. But I do believe that we are limited in terms of the ideas that we have.

I think, yeah, because even with all this compute and all the data you could collect in the world, I think you really are ultimately limited by not even ideas, but just really good engineering. Even with all the capital in the world, would you really be able to assemble... there aren't that many people in the world who really can make the difference here. And there's so much work that goes into research that is just pure, really, really hard engineering work. As a very hand-wavy

example, if you look at the original Transformer paper, how much work was it to join together a lot of these really interesting concepts embedded in the literature, versus then going in and writing all the code, like maybe the CUDA kernels, maybe whatever else. I don't know if it ran on GPUs or TPUs originally, but getting it to actually saturate the GPU performance, right? Getting Noam Shazeer to go in and do all of this code, and he's probably one of the best engineers in the world. Or maybe going a step further, like the next generation of models, having these things, like getting model parallelism to work, and scaling it

on thousands of, or maybe tens of thousands of, V100s, which I think GPT-3 may have been. There's so much engineering effort that has to go into all of these things to make them work. If you really brought that cost down to, maybe not zero, but just made it 10x easier, made it super easy for someone with really fantastic ideas to immediately get to the version of the new architecture they dreamed up, that is getting 50, 40 percent utilization on the GPUs, I think that would just speed up research by a ton.

I mean, I think if you see a clear path to improvement, you should always take the low-hanging fruit first. I think probably OpenAI and all the other labs did the right thing to pick off the low-hanging fruit, where the low-hanging fruit is, you could scale up to GPT-4.25 scale and just keep scaling, and things keep getting better.

And there's no point in experimenting with new ideas when everything is working, and you just sort of bang on it and try to get as much out of it as possible. And then maybe when you really need new ideas... I think if you're spending $10 trillion, you'd probably want to spend some of it actually reevaluating your ideas, because probably you're a little bit idea-limited at that point.

I think all of us believe new ideas are probably needed to get all the way there, to AGI. And all of us also probably believe there exist ways of testing out those ideas at smaller scales and being fairly confident that they'll play out. It's just quite difficult for the labs, in their current position, to dedicate their very limited research and engineering talent to exploring all these other ideas when there's this core thing that will probably improve performance for some decent amount of time.

Yeah, but also these big labs like winning, so they're just going wild.

okay.

So, big question, looking out into the future: you're now at the center of the programming world. How do you think programming, the nature of programming, changes in the next few months, in the next year, in the next two years, the next five years, ten years?

I think we're really excited about a future where the programmer is in the driver's seat for a long time. You've heard us talk about this a little bit, but one that emphasizes speed and agency for the programmer, and control: the ability to modify anything you want to modify, the ability to iterate really fast on what you're building. And this is a little different, I think, from where some people are jumping to in this space.

I think one idea that's captivated people is, can you talk to your computer? Can you have it build software for you, as if you're talking to an engineering department or an engineer over Slack? Can it just be this sort of isolated text box? And part of the reason we're not excited about that is some of the stuff we've talked about with latency, but a big piece of the reason we're not excited about it is because it comes with giving up a lot of control.

It's much harder to be really specific when you're talking in a text box. And if you're necessarily just going to communicate with a thing like you would communicate with an engineering department, you're actually abdicating tons and tons of really important decisions to this bot. And this kind of gets at, fundamentally, what engineering is. I think that some people who are a little bit more removed from engineering might think of it as: the spec is completely written out, and then the engineers just come and implement it, and it's just about making the thing happen in code, making the thing exist.

But I think a lot of the best engineering, the engineering we enjoy, involves tons of tiny micro-decisions about what exactly you're building, and about really hard tradeoffs between speed and cost and all the other things involved in a system. And as long as humans are actually the ones designing the software and the ones specifying what they want to be built, and it's not just companies run by AIs, we think you'll really want the human in the driver's seat, dictating these decisions. And so the jury's still out on what that looks like. I think one weird idea for what that could look like is, you can control the level of abstraction you view a codebase at, and you can point at specific parts of a codebase. Maybe you digest a codebase by looking at it in the form of pseudocode, and you can actually edit that pseudocode too, and then have changes get made down at the formal programming level. You keep the ability to edit any piece of logic in your software, you keep the flow-of-text editing component of programming, you keep the control. You can even go down into the code, you can go to higher levels of abstraction, while also getting these big productivity gains.

It would be nice if you can go up and down the abstraction stack.

And there

are a lot of details to figure out there; that's sort of a fuzzy idea. Time will tell if it actually works. But these principles of control and speed and the human in the driver's seat, we think, are really important. We think, for some things, like Arvid mentioned before, for some styles of programming, you can kind of hand it off, like if you have a bug that's really well specified. But that's not most of programming, and that's also not most of the programming we think a lot of people value.

What about the fundamental skill of programming? There are a lot of people, like young people right now, who are kind of scared, because they love programming, but they're scared about, will I be able to have a future if I pursue this career path? Do you think the very skill of programming will change fundamentally?

I actually think this is a really, really exciting time to be building software. We remember what programming was like in, you know, 2013, 2014, whatever it was, and there was so much more cruft and boilerplate and looking up something really gnarly. And that has definitely not gone to zero, but programming is way more fun than back then.

We're really getting down to the delight, the concentration, and all the things that really draw people to programming, like, for instance, this element of being able to build things really fast, and speed, and also individual control. All those are just being turned up a ton. And so I think it's just going to be a really, really fun time for people who build software. I think the skills will probably change too. I think that people's taste and creative ideas will be magnified, and it will be less about, maybe a little bit less about, boilerplate text editing, maybe even a little bit less about carefulness, which I think is really important today if you're a programmer.

It could be a lot more fun. Yeah, I agree.

I'm very excited to be able to change... just, one thing that happened recently was, we wanted to do a relatively big migration to our codebase. We were using AsyncLocalStorage in Node.js, which is known to be not very performant, and we wanted to migrate to a context object. And this is a big migration that affects the entire codebase. And Sualeh and

I spent, I don't know, five days working through this, even with today's AI tools. And I am really excited for a future where I can just show a couple of examples, and then the AI applies that to all of the locations, and then it highlights, oh, this is a new example, like, what should I do? And then I show exactly what to do there.

And then that can be done in like ten minutes, and then you can iterate much, much faster. Then you don't have to think as much up front and stand at the blackboard and think, exactly how are we going to do this, because the cost is so high.

You can just try something first, and you realize, oh, this is not actually exactly what I want, and then you can change it instantly again after. And so, yeah, I think being a programmer in the future is going to be a lot of fun.

Yeah, I really like that point. It feels like a lot of the time with programming, there are two ways you can go about it. One is, you think really hard, carefully, up front about the best possible way to do it, and then you spend your limited time of engineering to actually implement it. But I much prefer just getting in the code and taking a crack at it, seeing how it lays out, and then iterating really quickly on that. That feels more fun.

Yeah, just being able to generate the boilerplate is great, so you can just focus on the nuanced, difficult design decisions. Migration, I feel like this is a cool one. It seems like large models are able to basically translate from one programming language to another, or translate, like migrate, in the general sense of what migrate means. But that's in the current moment.

So I mean, the fear has to do with, okay, as these models get better and better, then you're making fewer and fewer creative decisions, and it's going to move to a place where you're operating in the design space of natural language, where natural language is the main programming language. And I guess I could ask that by way of advice: if somebody is interested in programming now, what do you think they should learn? Like, you guys started out in, say, Java, and...

And, I forget, there was some PHP. Objective-C. Objective-C, there you go. I mean, in the end, we all know JavaScript is going to win,

and not TypeScript. It's just going to be vanilla JavaScript eating the world, and maybe a little PHP. And, I mean, it also brings up the question of... I think Don Knuth has this idea that some percent of the population is geeks, and there's a particular kind of psychology and mind required for programming. And it feels like more and more, that expands the kind of person that is able to do great programming.

I think different people do programming for different reasons. But I think the true, maybe the best programmers, are the ones that really love, just absolutely love, programming.

For example, there are folks on our team who literally, when they get back from work, they go and boot up Cursor and then start coding on their side projects for the entire night, and they stay up till 3 a.m. doing that. And when they're sad, they say, "I just really need to code." And I think there's that level of programmer, where this obsession and love of programming, I think, makes really the best programmers. And I think these types of people will really get into the details of how things work.

I guess the question I'm asking is, that exact programmer, think about that person when the super tab, the super awesome, praise-be-to-the-tab, succeeds. Do they just keep pressing tab?

They'll press tab more than anybody else, yeah. And it's also not just

pressing tab. Just pressing tab, that's the easy way to say it, the catchphrase, you know. But what you're actually doing when you're pressing tab is you're injecting intent all the time while you're doing it. Sometimes you're rejecting it, sometimes you're typing a few more characters, and that's the way you're shaping the thing that's being created. And I think programming will change a lot to just, what is it that you want to make?

It's sort of higher-bandwidth communication to the computer. It just becomes higher and higher bandwidth, as opposed to just typing, which is much lower bandwidth than communicating intent.

I mean, this goes to your manifesto, titled "Engineering Genius": "We are an applied research lab building extraordinarily productive human-AI systems." So, speaking to this hybrid element: "To start, we're building the engineer of the future, a human-AI programmer that's an order of magnitude more effective than any one engineer."

"This hybrid engineer will have effortless control over their codebase and no low-entropy keystrokes. They will iterate at the speed of their judgment, even in the most complex systems. Using a combination of AI and human ingenuity, they will outsmart and out-engineer the best pure-AI systems."

"We are a group of researchers and engineers. We build software and models to invent at the edge of what's useful and what's possible. Our work has already improved the lives of hundreds of thousands of programmers." And on the way to that, it will at least make programming more fun. So thank you for talking today.

Thank you. Thank you.

Thanks for listening to this conversation with Michael, Sualeh, Arvid, and Aman. To support this podcast, please check out our sponsors in the description. And now, let me leave you with a random, funny, and perhaps profound programming quote I saw on Reddit: "Nothing is as permanent as a temporary solution that works." Thank you for listening, and hope to see you next time.