EP83: Self Driving Computers, plus SearchGPT, & Github Copilot with Sonnet

2024/10/31

Chapters

The episode explores the implications and potential of self-driving computers, discussing their capabilities, security risks, and the future of AI-based computing.

AI-based computing as the ultimate interface
Security risks and challenges with AI controlling computers
The need for improved skills and tools for AI to handle novel situations
Potential for AI to automate repetitive tasks and improve productivity

So, Chris, this week. Well, I guess that was last week, we talked about this idea of computer use that Anthropic released and the impact that it had had on us. And there was certainly a lot of discussion during the week about how people might use these upcoming self-driving computers, as I'm now calling them.

Yeah, it had a particularly powerful impact on me just because I'd speculated several times that this is really the future of AI-based computing, simply because it is the ultimate interface, in terms of, if it can use a computer, it can do everything that a remote worker currently can, which is most things. Yeah.

I think, like, I go through phases of having these crisis moments with AI stuff where, you know, the mind wanders and you think, well, once the computer can operate itself and the context improves, like the amount of information that it can take in, then yeah, it can pretty much accomplish any task. And of course, this week we've seen OpenAI release a search feature, and Google, which we'll get to a little bit later, releasing a grounded search capability as well. So as all these tools become available to the AI models themselves, and then they can use computers as well, there's very little that they can't do that a knowledge worker would do.

I think, having used it a fair bit now, it's definitely one of those cases where the reality doesn't quite match the vision, in the sense that it's got problems. It's not perfect, right? But I think we on this podcast really focus on the vision side of it because we know inevitably the technology catches up. Like, if you can have an AI model look at the screen and locate an icon, like, for example, in our example last week, locate the coffee category on Uber Eats.

If it can do that, then inevitably, as the technology improves and gets faster and more accurate, it'll be able to do anything on the screen, right? And so I think, looking at that vision, the possibilities are absolutely enormous. Yeah.

And I guess that in moments like this, people bring up a lot of the problems with that, which I think many people are well aware of. Some of them, for example, security, like we talked about last week on the show with our workspace computer concept for Sim Theory. One of the things we're doing is allowing people to have this personalized virtual computer.

And what that enables is for you to set the computer up for specific tasks. Like, for example, you might load up Microsoft Excel into that workspace computer, logged into your 365 account, so that the AI has access to your computer and can go do things in the background for you. But there's obviously some security risk in that, like the AI having control of a computer that has some authentication.

That's true. I mean, there's always risks with everything you do, but lots of people use computers in virtualized environments. Like, it's an entire industry. The security is industry standard. Like, that part I'm not worried about.

So yes, giving an AI agency to act as you is a new concept, and maybe one that will have inherent risks with that. But I also think that that is where the real power comes from with this new technology, because we've had browser automation for a long time. We've had, uh, you know, software, what is that called, infrastructure as code, where you can spin up a server and install programs.

And all that stuff for ages. Like, those parts of it aren't new. What is new is the AI being able to handle novel situations when using a desktop computer environment, so it can be given an abstract or concrete goal and it can then figure out what to do and then do it. But most importantly, it can react when things go wrong.

One of the things I said to you in all my testing that I've done in the last week is, I noticed so often I would set the Sim Theory workspace computer off on a task and look at what it's doing. Like, the amount of times it opened Microsoft Teams just for no reason. I'm like, it's almost like propaganda level at this point. But it did all this just nonsense stuff, like these things that are just so stupid.

And then suddenly I check back in a little later and the task has been accomplished. And so I think that's what's really getting me excited about it. It's like, okay, it's like a slow AI, I think, is what I should say. It's slow, but it gets there. And I think that's what's really exciting about it.

So do you know what I think it's best likened to, and I'm sure anyone that's ever owned a robot vacuum cleaner, or one of these, like, two-in-ones where it can mop and vacuum, will totally relate. And I think it's the best analogy right now.

It's like one of them where if you painstakingly watch it do its task, like go around, say, your kitchen and vacuum and try to mop, like mine does, it takes like probably ten minutes to find the kitchen. And I'm like, you stupid thing. You do this literally every day, and you still can't figure it out.

But it eventually gets there and it eventually does the job. Not very well, though. Yeah.

I know what you mean, and you see it missing an obvious win and it can't get there. And I think this is really the insight that I've had with it during the week. And I spoke to you about this several times, and my insight is this: you need to equip it with skills, and the skills can't just be move mouse, click and drag, uh, you know, double click, which are its basic skills, or use key presses and things like that.

That's what Anthropic has done with their model. They've optimized it to do those kinds of actions based on what it sees. And one of the things we've done to try to make it faster is batch the tool use. So it'll be like, all right, I look at the screen, and it's sort of like a rehearsal, like an Olympic diver or something.

It's like, I'm gonna do a double back and a half pike, whatever, um, and then no splash. And so what it's doing is, I'm gonna click that scroll button, then I'm gonna drag it for 0.5 seconds, then I'm going to move the mouse to the right and double click something. And so then it has a crack. It does those things based on you giving it the ability to actuate it. And then it takes another screenshot and takes a look and goes, how did I do? And that's when it often makes a mistake.

It's like, okay, I stuffed that up. I'm now going to move the mouse to this point and click it, and stuff like that. Now, that's obviously incredibly painstaking. But what you and I discussed is, what if you take the skills beyond that level?
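The batching idea described above (look at the screen once, plan several actions, run them all, then take another screenshot and check) can be sketched roughly as follows. This is a minimal illustration and not the hosts' implementation: `Action`, `run_batch`, `agent_loop`, and the `plan`, `actuate`, `screenshot`, and `goal_reached` callbacks are all hypothetical names standing in for the model call and the real desktop environment.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Action:
    """One primitive input event (hypothetical shape, for illustration only)."""
    kind: str   # e.g. "click", "drag", "move", "double_click"
    args: dict  # e.g. {"x": 10, "y": 20} or {"seconds": 0.5}


def run_batch(actions: List[Action], actuate: Callable[[Action], None]) -> int:
    """Execute every planned action in order, with no screenshot in between."""
    for action in actions:
        actuate(action)
    return len(actions)


def agent_loop(
    plan: Callable[[Any], List[Action]],   # stands in for the model call
    actuate: Callable[[Action], None],     # stands in for real mouse/keyboard input
    screenshot: Callable[[], Any],         # stands in for a screen capture
    goal_reached: Callable[[Any], bool],   # stands in for the "how did I do?" check
    max_rounds: int = 5,
) -> int:
    """Look, plan a batch, act, look again. Returns the number of rounds used."""
    rounds = 0
    shot = screenshot()
    while rounds < max_rounds and not goal_reached(shot):
        run_batch(plan(shot), actuate)
        shot = screenshot()  # one screenshot per batch, not one per action
        rounds += 1
    return rounds
```

The trade-off is the one mentioned in the conversation: fewer screenshots per task means less waiting, but a mistake early in a batch is only noticed after the whole batch has run.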

It's like, what if you allow it to do things like, I want to move the mouse in a curve shape to, say, draw a dachshund, which is what I've been trying to do during the week. Open Paint, draw a dachshund. I don't know why I picked that one. Um, the curves. And so I was like, in order for it to be able to do that, it needs to be able to draw a curve.

It's not going to be able to just iterate, just mashing through my credit doing one pixel at a time or something like that. It's going to need sort of macro-based tools and things like that. And then we were like, well, there's also common things like opening a program, opening a new tab in a web browser, maximizing things, you know, things that you do on a regular basis on a computer are things that it could be taught to do in one motion, right? So it isn't this idea that it has to learn from scratch for every single task it ever does. And I think that's where the application layer, such as Sim Theory, is going to really add something to this technique, because it's going to be needed in order to make it actually usable for real tasks.
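As a rough illustration of the macro-skill idea, the sketch below expands a single "draw a curve" request into the primitive mouse events the model would otherwise have to issue one at a time. It is only a sketch under assumptions: the quadratic Bezier curve is one possible choice of curve (the episode does not say which is used), and `bezier_points`, `draw_curve`, `MACROS`, and the action tuples are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_points(p0: Point, p1: Point, p2: Point, steps: int = 20) -> List[Point]:
    """Sample a quadratic Bezier curve from p0 to p2 using control point p1."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
        points.append((x, y))
    return points


def draw_curve(p0: Point, p1: Point, p2: Point, steps: int = 20) -> List[tuple]:
    """Macro skill: turn one high-level request into primitive mouse actions."""
    points = bezier_points(p0, p1, p2, steps)
    actions = [("mouse_move", points[0]), ("mouse_down", points[0])]
    actions += [("mouse_move", p) for p in points[1:]]  # drag along the curve
    actions.append(("mouse_up", points[-1]))
    return actions


# Hypothetical registry: the model names a macro, the application layer expands it.
MACROS = {"draw_curve": draw_curve}
```

Other one-motion skills mentioned in the conversation, such as opening a program or a new browser tab, could be registered in the same way.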

Yeah, there was a discussion in our, um, This Day in AI community this week that I chimed in on, which is this idea, like, is counting pixels the best way to operate a computer? Like, the idea that maybe there's a need for, like, an AI OS or AI-specific applications or websites where, you know, it's just easier for the AI to use them. And I sort of made the comparison to self-driving, where it's like, no, we're not out there making the existing road infrastructure any easier for self-driving cars. Like, they just have to operate like a human, and that's why the problem is so hard to solve. And, you know, I think these skills that we're teaching it are like the early days of when Tesla was developing full self-driving, where it had a lot of lines of C code where it was like, literally, if this then that. Like, if dachshund runs in front of vehicle, don't hit it.

It's like the John Carmack style of programming. Like, if you looked at the original Quake source code, like the QuakeC source code, when I first looked at that, I was just blown away by just how simple and literal the code was. It's like there were literal if statements for different things that might happen in the game.

And it worked brilliantly. And I think that, to some degree, that's what's needed in programming, in making something that's useful. And I think everyone has this idea:

wouldn't the AI be great if it just had this universal API? And I can, like, call my Gmail, like, send an email, I can call my Gmail to archive, do this.

And there's a perfectly formatted, easy-to-use API call for everything I could ever want it to do. But we know that that's not possible.

It was like when they talked about the semantic web, like, everyone will put special tags that will denote, this is a book's title, this is a thing, this is a link, this is a whatever. And that'll be amazing. But the reality is that doesn't happen. And I feel like this idea of a universal API needs to go beyond the idea that humans will create it. It needs to be like, our universal API is, I'm just a human and I've got a brain and I can click and I can do the same stuff and use a regular user interface. And I think that's where it's like, okay, yeah, you can sort of be reductive and say it's just counting pixels. But the truth is, what it's doing is using an interface that allows it the best possibility of getting any task done.

Yeah, and I kind of think even in the shorter term, developing skills like your dream of drawing a dog in Paint with these curve moves, in the short term that kind of gets it to a better level where it can just do more things and it starts to become useful. But the reality is, I think the model just has to get to the point where it can just handle these things. I don't think there is going to be a bunch of people, like, there might be in the short term, that are going to develop this middleware, but I think, just like we've seen with Tesla, eventually they're just going to go end to end and figure out literally

how to drive. Possibly, yeah. I think in the short term, giving it a lot of tools at its disposal is going to be needed. But you're right. I think at some point, the speed's a big factor. Like, if it can get a lot faster, it can just iterate more. And so at some point, making all the mistakes doesn't really matter so much. But, um, yeah, I think in the short term, it's going to be about empowering it to do the best possible job to actually do something useful, because I think a lot of the examples we're seeing right now on X are very contrived. They're either cherry-picked examples or they're just flukes where it just happens to do that really well. And I think what we need to do is keep iterating and looking at the use cases and asking, is it able to do this without a lot of guidance? Because I think the thing is, if there's a lot of guidance, then you could have done it yourself or you could have automated it. It very much needs to be the whole "do this, please", which is my sort of lazy AI man's command, where I literally will just paste a block of crap from a terminal or something and just go "fix, please", and then Claude Sonnet or GPT-4 will just know what I want and solve it. And I think we need to get to that level with the computer use, where it's like, you just give vague, you know, notions of what you would like done and it's like, I'm on it, I will get to work immediately and

gets the job done. Yeah, I think it has to get to that point where it's like a junior employee, and then it will be widely beneficial. For example, I had to fill in an insurance claim a couple of days ago, and I was chatting to the AI about how to frame it correctly to basically maximize the claim, because I didn't want to stuff it up, right? And I had it in that context.

That meant its context was filled with the right stuff, like, you know, after having gone back and forth. Um, and then it's a case of having to go to the website and fill in the claim, right? Now, that is something where, with these forms, like, they don't want you to fill in insurance claims, right? Like, they do not want you to claim. It's like the worst form experience on earth, and it feels like that's on purpose, literally.

And so what I would love to be able to do is say to the AI agent, hey, okay, cool. Now go fill in the claim on my behalf. And then it just loads up the computer, goes to the site, fills it in and then notifies me to maybe check it when it's done. I think that that's a point where I would be like, okay, that was useful, because filling in that form was painstaking, and knowing what to put in each field, I was still going back and forth with it, to the point, they use all this weird lingo.

So I shared my screen with that tab in the browser and was asking it, like, what should I fill in for this? What should I check for this? And it was able to tell me, which is super useful.

But I think the next step is just, go do it for me. Like, I don't care. Like, "do this, please". Yeah.

And I think the other thing you've got to think about is, the web browser is an interesting one, because you could always do, like, a Chrome plugin and other browser automation stuff along with the AI to help do that kind of thing, right? Like, that is technically possible. But think about these scenarios where people are using, like, legacy systems. Like, people in banking, for example, they're all using, like, Lotus Notes.

And I think last time I chatted about Lotus Notes, someone was like, really excited by it. This is for you. Um, but they're using these legacy banking systems written in COBOL and stuff, and they're doing it through either VNC connections or, like, virtual machines. And those interfaces, to automate them or to build an API for them or whatever you want to do to sort of integrate AI into that area, that's going to be so incredibly difficult, because the people who can actually build an API for it are so few. It might not even be possible, or it would have been done. Whereas if you bring the universal AI computer use tool thing to the fore, the workspace computer concept to the fore, it can operate the computer, use the virtual machine connection or the VNC connection, and actually provide a way to automate those tasks. Like, this interface, this universal API, which is a human user interface, is going to allow the AI to be applied in so many more areas that are currently living in this weird legacy world.

Yeah, this is why I wanted to sort of just hone in on this topic in this episode, because to me, when it works, so when it gets to a certain level of capability, it's the first time I think our world will be totally changed. Because once it's able to operate and do things and have a large context, and the hallucinations are reduced to a point where it's just checking in with you to make key decisions, then I don't know. Like, what does that mean for certain jobs? Like, certain jobs do start to become very irrelevant once it can do these things.

Yeah, there's definitely a lot of sort of data entry operator, basic jobs where you're just moving one thing from somewhere to somewhere else, doing some basic operation on it, like a lot of the things you see on Airtasker or Fiverr and things like that. Obviously, there's some creative elements in there, but you're seeing the AI being able to do that as well.

Obviously, within this context, it being able to then go off and use AI within AI to get answers for itself, or generate content, or use specialized models is going to be another factor for it. But I mean, we're not that crazy far away from the idea of it producing a bunch of synthetic data, training a new model, invoking that model, and then using it, and you can make all those tools available. I think, like you say, the issue is going to be around hallucination, because as someone who consistently works with the models on code, the amount of times when I'm, say, working on the configuration of a program I've never seen before, and I'll tell it my problem, I'll be like, oh, I can't get this option disabled in the menu.

I'd really like to disable this option. It will be like, oh, just use the disable-menu setting, here it is. And you go do it, you look at the docs, like, no such option exists. It's just made that up. Sounds great.

Gets my hopes up. I'm like, oh, how amazing, the devs have thought of everything. But it's just the AI making it up. And I feel like you're definitely going to run into that territory with the computer use thing, where it simply is trying to do things that it can't do and thinking that is the right way.

Yeah, yeah. I think it definitely needs some sort of, like, you know, and we'll get to this a little bit later, grounding of truth, in the sense that it needs to be validating and checking key decisions and things like that. But I guess just thinking forward, if you assume these are solvable problems, which, I mean, you have to believe they are, right, and it's going to trend this way.

The question is, like, how long does that take? Is that five years, and we have a self-driving computer that can literally do all the tasks on a computer a human can do? Is that ten years? Is that fifteen years? Thirty? Are we way off? Are people going to listen to this recording and be like, man, like, you... We're on our death beds. And I think it's tomorrow.

I mean, I'm willing to put myself out there and say that I don't know about fully, like, everything a human can do. But I would say that if this is the first iteration, the possibilities for this are absolutely enormous. Because you think about it, at some point, the application layer can simulate the other inputs to a computer as well, like audio, like video.

You know, we could have an AI avatar streaming in audio and video through the various virtual input devices. You've got mouse and keyboard operation, and it can do them all at once. Like joysticks, like, what other accessories are there? Trackpads, trackballs.

Like, it can just emulate every device that's ever been available, and it can give input for them all. Like, you know, I could really imagine it playing Flight Sim. That's going to happen at some point. And the other thing we are missing here is that local models running on GPUs are getting a lot better. And people in the LocalLLaMA community and other areas are iterating on custom models. Like, imagine a smaller custom computer-operating model running on the GPU itself on the local computer with no latency, or less latency, um, than hitting the cloud. Suddenly you're going to get a much more rapid experience than taking a screenshot, uploading it to Anthropic with all the other recent ones, blah blah, and waiting for that process. You go from, say, one or two seconds of lag, up to twenty seconds of lag, to like a hundred milliseconds or two hundred milliseconds, which may be possible, and then you suddenly reach that order of magnitude increase in the ability of this thing to run. Yeah, that's why I think locking yourself into one model is a mistake, because I think that there's going to be other paradigms for running this kind of thing that just work better. And that is running it on the machine itself.

Yeah, I think there's two time horizons for me, or two, like, milestones I want to see achieved. The first is basically being able to spin up virtual employees or agents for yourself. And I think this is the sort of early steps we need to get there, right, like, where it's useful.

It can go and do things in the background. I don't think you're going to watch it. I think it's just going to be, you have a to-do list in the morning and you're like, hey computer, take these four or five tasks of mine. Like, you go do these things, check back in when you're done, just like you would do in a workplace.

Um, you know, if you have people working with you or you work on a team, like, you take these tasks, I'll take this. Let's all reconvene later on, right? And that's where it's so useful that you're finding yourself using it day to day to just get your job done. And it vastly improves productivity and maybe eliminates some lower-level jobs as part of that, or it just makes people way more productive.

So, you know, it doesn't actually lead to any sort of real job loss. So I think that's the first one. And I think people will need to iterate and work with that, just like that workspace concept we talk about all the time. But then I think it's going to get to another point where it's just vastly

superior. Yeah, where you're choosing to use it over consulting someone first up on a novel task that you can't do yourself. I think to me, level one is, okay, I know this process like the back of my hand because it's my job and I can give it very good instructions. Because the other thing that we haven't really iterated on yet, even us, is prompting.

And I feel like the prompts that Anthropic gave in their examples were quite basic. I really feel like there's scope to vastly, vastly improve that, including multi-shot learning where you're like, okay, here is a task I do.

Here's what that looks like before it starts. Here's what that looks like in the middle, like a half-filled insurance form. And here is what it looks like when successful, happy, I have my money at the end kind of thing. And that kind of stuff can vastly improve it. So I think that there's a lot of scope to improve here, um, that we'll see.
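The multi-shot prompting idea described above (show the model what the task looks like before, in the middle, and on success) might be assembled along these lines. This is a text-only sketch under assumptions: in practice the stages would more likely be screenshots, and `build_multishot_prompt` and its wording are hypothetical, not taken from Anthropic's examples or from Sim Theory.

```python
from typing import List, Tuple


def build_multishot_prompt(task: str, stages: List[Tuple[str, str]]) -> str:
    """Build a prompt that pairs a task with labelled reference states.

    stages is an ordered list of (label, description) pairs, for example
    the state before starting, a half-finished state, and the success state.
    """
    lines = [f"Task: {task}", "Reference states:"]
    for index, (label, description) in enumerate(stages, start=1):
        lines.append(f"{index}. {label}: {description}")
    lines.append("Compare each new screenshot against these states to judge progress.")
    return "\n".join(lines)


example_prompt = build_multishot_prompt(
    "Fill in the insurance claim form",
    [
        ("before", "the empty claim form"),
        ("middle", "a half-filled claim form"),
        ("success", "the claim submitted confirmation page"),
    ],
)
```

Giving the model an explicit success state also gives it something concrete to check against after each batch of actions.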

But that's going to be in the concept, I think, of our 10x, 100x super user who's like, okay, normally I get other people to do these tasks. Now I can get them done myself. Like you say, I give it a checklist in the morning.

It goes off, solves my problems. That gives me time to go to other problems. Then level two is going to be, okay, I would like to do this task, but I don't have any idea how to do it. Ask the AI, it can do it, and you're pretty confident it can get out there and solve your problem. Like, that's the exciting level, and I do think that's coming.

So let's talk about some real use cases that I think in the short term might be interesting. I have a friend that goes into the homes of people that are disabled and figures out, like, how to fit out the homes so that they can stay in the home, right? I think there's a name for that job.

Not sure what that is, but he goes in, he basically takes notes and then he fills in this report for the government, and that's pretty much his job, to get funding for them to then go and implement the stuff. And he hates that part of his job. And it's incredibly frustrating.

The whole job?

Yeah, he hates the whole job. But the point I'm making is, you can imagine in the pretty near term, right, having a workspace computer set up, having all the correct logins to the platforms required, and then he comes back to the agent, or dictates to the agent with voice, and says, okay, here's my notes from that particular house call or whatever I did. And it's like, you know, cool, Jeff. His name is not Jeff.

I'm going to use the name Jeff. Cool, Jeff, I'm going to go off now and fill that in and submit it. Do you want to check it before it's done? Here it is. Done.

Bang. Like, that would be pretty game-changing in the short term. And I think that with the computer use and having an actual computer environment, that's close. Whereas before, if you think about the old technique right now, it's like, okay, you go into your preferred AI app, you ask it for chunks of paragraphs, you're cutting and pasting into, like, say, a Word template or a website or whatever. It's just doing that next step for you. Like, it's

just pulling it. Yeah, it's like, I think about it in a sort of chatbot, uh, support role, like as we saw in the early gen AI shit, where everybody released a chatbot on their website. But what you spoke to me about at the time we were talking about it was the idea that you could give the agent agency, like, it could process refunds up to a certain amount, it could, uh, you know, answer stock inquiries. It could do whatever.

Now you could enable that for way more businesses by simply logging it in and pointing it at your various tools and having it have the agency to go off and actually do that stuff. And so that could be done even in scenarios where someone doesn't have the budget or technical expertise to empower the thing to actually do stuff. Um, and those kinds of things, I think, will become really common, where it's like, okay, this is a simple and repetitive task. But, like, the amounts are different, the timing's different, the actual product they're talking about is different, or whatever it is.

But the AI is smart enough to be able to accommodate the particular situation and know when to escalate and things like that. And I think that's the idea, is that you don't have to wait for some AI startup in every niche industry in order to be able to leverage the models. And I think that's the real thing, because obviously it would be better if everyone just built an API for everything and a dedicated LLM solution for every problem. That's not going to happen. This interface allows those situations, those niche situations, where there is no budget to use AI for it, to use AI for it.
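The "refunds up to a certain amount, and know when to escalate" idea can be sketched as a simple guard that sits between the agent and the action. This is a hypothetical illustration: `decide_refund`, the 50.0 limit, and the three outcome strings are assumptions made for the example, not details from the episode.

```python
def decide_refund(amount: float, limit: float = 50.0) -> str:
    """Decide what the agent may do with a refund request.

    Returns "approve" when the amount is within the agent's authority,
    "escalate" when a human should take over, and "reject" for
    nonsensical amounts. The default limit is an arbitrary example value.
    """
    if amount <= 0:
        return "reject"
    if amount <= limit:
        return "approve"
    return "escalate"
```

Keeping the limit outside the model, in plain code, means the agent's authority does not depend on the model following its prompt.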

So how do we get there? That's my next question, like, what is left? I mean, we have literally been working on this every night for two weeks, or a week.

I don't know, time goes so quick, but, you know, like, we've become obsessed with the idea of, like, what do you actually need to get it to be able to do this? Like, first-round tasks, where people are getting actual value out of this. And that is to feel

life-changing. I think my natural instinct, my natural answer, is iteration. You've got to actually find real use cases, wait for them to fail, work out why it failed, and then give it the tools and capabilities to overcome that problem in the future. So far, everything I've run into has been fixable.

So, for example, clicking the wrong icon or something like that, or trying to open Firefox when it isn't installed. Or, uh, what's another good example of something where it's failing? Like, oh, it couldn't backspace correctly, or it was trying key combinations that weren't translated correctly to the machine in order to operate. And so there is, like, this sort of basic early learning stuff that just needs a little bit of special casing, a little bit of John Carmack style programming where it's like, if this exact thing happens, do this. And my thinking is improving the tools in that way.

And like, for us, we have the platform of Sim Theory. We're going to let hundreds of people use this, get feedback and iterate on that. We're using it ourselves intensely.

And I think what will happen is we will build up this tool that has the ability to, like, I don't know the right word, but it's sort of like, I think empowerment is the right word, like, we need to empower the AI to overcome all of the little issues that it hits and all of the little quirks it has because of deficiencies. So you mentioned to me a great one. We were setting a desktop background for the Sim Theory workspace computer.

And I did some futuristic Fifth Element style spacescape thing. And you're like, uh, yeah, that's going to confuse the shit out of the AI on the screenshots. Let's do something really, um, let's do something really simple and basic, high contrast. And I'm like, that makes sense.

And I think this is the thing: we need to facilitate it and give it the best opportunity of success. And part of that is going to be, like, actually including in the prompt, sometimes in the past when you faced this situation, you did this, but that was wrong. You should have done this. And just have a whole bunch of rules like that that you're feeding into it so it can overcome those mistakes first time and get further down the line with everything.
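One way to picture the "in the past you did this, but that was wrong" rules is as a small store of corrections that gets matched against the task and appended to the prompt. The sketch below is an assumption-heavy illustration: `Correction`, `relevant_corrections`, `augment_prompt`, and the simple keyword matching are hypothetical, and a real system might select rules quite differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Correction:
    """A lesson learned from an earlier failed run (hypothetical structure)."""
    situation: str  # keyword describing when the rule applies
    wrong: str      # what the agent did last time
    right: str      # what it should do instead


def relevant_corrections(task: str, corrections: List[Correction]) -> List[Correction]:
    """Pick corrections whose situation keyword appears in the task text."""
    task_lower = task.lower()
    return [c for c in corrections if c.situation.lower() in task_lower]


def augment_prompt(task: str, corrections: List[Correction]) -> str:
    """Append matching lessons to the task so the model sees them up front."""
    hits = relevant_corrections(task, corrections)
    if not hits:
        return task
    notes = [
        f"- When facing '{c.situation}', you previously {c.wrong}; instead {c.right}."
        for c in hits
    ]
    return task + "\n\nLessons from earlier runs:\n" + "\n".join(notes)
```

Only matching rules are included, which keeps the prompt short as the list of corrections grows.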

And I think also that that also means less iteration, or less cost, more speed. You're more inclined to try it for more things. Like, partly it's like, can we get traction for the workspace computer concept, so more time is spent on it? Like, look at us. We are literally speculating, saying we think this is the future.

Should we spend time on it? Is this something worthwhile spending time on? Or is this just the fad of the week? And to me, getting uptake and getting people actually using it for real stuff is key to its development, because it could fail if not enough people are actually interested enough to be using it and therefore running into the scenarios and therefore giving the feedback

and improving it. It just feels like every other thing we've seen in AI. Like, you know, something like Open Interpreter had this computer control we demoed on the show, like, forty-five shows ago or something like that, right? And at the time, we just sort of dismissed it as, this is so far into the future, like, we're not even close, there's all these other problems to solve first, and it's not that useful.

That was our sort of, you know, hot take at the time, right or wrong. And then I think what we're seeing from these labs is they sort of go away and, you know, really work on these concepts. Like, they take some of the best ideas from the open source community, or the ideas of just people that, um, you know, are kind of hacking on this stuff, and then they make it accessible via API. They basically distribute it to make it slightly more useful.

And, you know, I thought we were seeing that this week as well with search. Like, originally, if you think about it, Perplexity came out and said, we're going to build a search engine on top of AI, right? And everyone else has seen this start to succeed and said, well, we need to add search as well. Um, and before that, I think in LangChain there were tools very early on to add Google search to the toolkit of your AI bot, like, you know, not that long after the GPT-4 launch.

So to me it's just a case of, like, how many more iterations away are we? Like, do we see an OpenAI model come out soon where they've taken this concept even further and then it gets to a point where everyone's eyes open? Or is there going to be an in-between phase where we're building tools and trying to make this stuff better? Or does it just rely on that core model?

And I think based on what you're saying, I think the implication here is these guys don't know. And I think that's what we see in the media a lot, is this idea that OpenAI is, like, all-knowing about the future and they're just drip-feeding us poor peasants with stuff as they see fit, um, and that they've got it all figured out.

But I actually think if you look at their product releases and what actually comes out, and I even include Anthropic in this to some degree, like you say, what they're reacting to is, one, everybody's mind awakening to the possibilities as they use the technology, and then exploring some of those concepts at the model level. And the other one is simply copying the best ideas from the community and realizing, hey, this is the way to do it. We're going to move in that direction. Because almost everything the major AI model companies have done are things we discussed many months earlier.

I'm not saying we had anything to do with that. I'm just saying that the sort of zeitgeist of people thinking about it and trying it is leading to these new ideas. And then they have the budget and ability to go explore those ideas and implement

them in concrete ways. Yeah, and I mean, that's a good pivot to SearchGPT, or, sorry, ChatGPT search as it's now called. So OpenAI obviously today introduced ChatGPT search, where you can click on the search button in ChatGPT and then it sort of switches that over to search mode.

And I think what is really interesting about that, back to your point, is we had Perplexity come out, you know, quite some time ago, right, and say they're going to take on Google with search using AI, and I think that they're doing a pretty good job of that, right? But then OpenAI had obviously seen this success and said, OK, but we're going to do search now too. And it does seem like, you know, when the iPhone first came out and Apple opened the App Store, and then eventually they slowly just took the best apps and put them in iOS, or cloned them. It kind of feels a little bit like that. But it does make sense for ChatGPT to have search, and it's taken quite a while to do it.

And I think a few things on this. The first is all of those deals they cut with publishers were around this product, and figuring out how they essentially don't get a ton of lawsuits going after them from these publishers. And that's the problem Perplexity's facing right now. They are getting a ton of lawsuits from people saying, you know, they're stealing information or misrepresenting it and things like that. So

ChatGPT search is available. I did put it to the test a little bit and had a play with that. My first search was, trying to be as controversial as possible, "trump polls". So this is like the total, like, plus command in here. So I put in "trump polls" and I got back, you know, a summary of the polls as a series of links.

It cites New York Magazine. It seems to really like New York Magazine. I got this source quite a few times.

Never really heard of New York Magazine. But I guess they've got a deal with them, so they are putting them out there. So you get your sources.

You can click into more sources as well and see, like, a series of what you would call blue links on Google search, which goes through the search results and has the links as well. Now, I did also do the same search in Perplexity. And to be fair to Perplexity here, I'm not even authenticated or logged in here.

I just used the sort of vanilla free search out of the gate. So it has the sources up top. It chose similar sources, really. It features New York Times, BBC and then this Projects FiveThirtyEight.

What was interesting about that is it actually presents a chart of the poll movement over time. So it has the general election polling for Trump, you know, looking into that data, which is pretty cool. And then it goes through national polling, battleground states, electoral college outlook.

So it's pretty comprehensive. Like, if you were doing actual thorough research and you didn't just want a quick answer to the question that you asked, then I think Perplexity would be pretty superior here to ChatGPT still. But it's a lot of reading still, and I'm just not sure it's the best use of AI search. I would just like, you know, a one-line summary. I don't really want to read an essay on it.

I also put it into Simtheory, just to compare our search feature as well. And it returned here the latest polling results regarding Kamala Harris and Donald Trump. It went through the three key battleground states, and then finally it finished on the national average. So I actually thought, if I was trying to objectively compare answers, I'd probably give Perplexity the most comprehensive. And, objectively,

Simtheory was the best, and that is totally

unbiased. I would actually give Simtheory second, if I'm being honest, for this particular search, because it just summarized what I wanted to know, or what you would expect to know. Whereas with ChatGPT, it really expects me to read, like, a lot of words. I can't estimate how many words this would be, but like three hundred words or something.

It's one of the reasons I use the o1 model so seldom, because it's like, I don't need to know all this crap, just give me the bit that I want. Don't give me a ten-page explanation about why. And, like, I know that's sort of part of it, it almost has to write the explanation to give a better answer, if you know what I mean. It just doesn't know when to shut up sometimes.

Yeah, I think about, obviously they're going to bring the ChatGPT voice capability to this search, which will be a really cool experience, but it's like, I don't want this much information being read back to me. I kind of think it should be tuned where it gives us a brief, like a small answer, and then you can be like, OK, expand. Like, you unleash the hounds on this thing. Give me the full, full answer.

So the second query I tried was "best sushi restaurants near me". Now, all apps seem to fail at this in a way, because they don't have my location. I guess they are just basing it off IP, and my IP address is routing through servers in Sydney. So it thinks I'm in Sydney, and specifically downtown Sydney.

Downtown, yeah. So: "Sydney boasts a vibrant sushi scene with numerous exceptional results. Here are some top-rated sushi establishments you might consider." Pretty cool interactive map in the UI. You can scroll through it nicely. Having visited a lot of these restaurants, I think the ordering is pretty good as well.

I was quite impressed with that. I'm not sure where they piped that data from. I'm assuming Bing, everyone's favorite search engine. And then below that, you can click into the sources and you get this side panel with all the different links that it kind of used to bring up those results.

So pretty interesting. I then compared Google to it, and I don't know, I'd still kind of trust Google in this case. The Google Maps product is just, I think, vastly superior. It's got all the reviews. It just still seemed better to me. But when you do that poll search with Google, that's when these kinds of searches start to feel outdated. It just doesn't give you an answer. It just gives you, like, top stories and then just a sea of blue links. And it's the first time, comparing those searches, I'm like, this is just so out of date at this point. I don't want to do that anymore, and I probably won't in the future, you know.

To get me the most, like, applied knowledge, in a way. Like, a lot of the searches I do are how to do something, and it's like, well, if you're going to tell me how, you might as well be specific: how do I do it in this exact context? And that's something a Google search will never be able to do. Yeah, so I don't know.

Like, they've also got a Chrome plugin out, so you can make this SearchGPT, or whatever they are calling it, the default search engine. Can you see yourself going full-on AI search at this point? Or are there still times you're using Google?

I mean, to be honest, I only really now go to a search engine if it is like a restaurant or something, like, real-world-ish, or I'm desperate and I can't solve my problem using Simtheory with whatever model I've currently selected. Like, it's very rare that I use Google for anything other than finding a sporting field or something, you know, concrete

and real. Yeah, I kind of use it to find websites, like, if I don't know the URL, I just literally put it in the address bar, just because

the address bar, like, automatically goes to Google if you don't type the URL, right? Like, accidental searches are probably my number one thing. And I don't think I would use it like a search engine for serious research anymore, because I just fundamentally don't trust the results. I think that it's political, it's biased and it's corrupt.

Yeah, I mean, all that stuff, yeah. Sweeping all that aside, I don't know. Like, the thing with me is I've tried Perplexity for search, like those other searches, and I find when I search, as you said, it's generally for localized information, right? Like things near me, the weather, stuff like that, right. And I think right now, Google just has so much better knowledge of that world and those search queries that are just so common to me that I still prefer it for those things. So my brain naturally will just go back to it. But definitely for learning things or figuring things out, with code or something like that, where you might have searched Google before, that is just completely and utterly gone for me and fully replaced.

The barrier, I think, is going to be them getting people to install it on either their phone or their browser. Because really, day to day, if you do decide, I'm going to search a search engine for something, you're just going to type into whatever box is available, right? Like, you're not going to go log into Perplexity.

Shit, what's my password? Where did I sign up, with Google? Did I sign up with whatever? I go to log in. OK, now I've got the search box. Now search. Like, no one's going to do that, I think, on a regular basis, unless they're excited by the novelty of AI or think they're getting vastly superior results because of it and are willing to take the time to do it. Like, I think the people who are going to win en masse will continue to be the large tech companies, because they control the platforms, like Apple, Google, maybe Microsoft in terms of desktop.

It got me thinking, does Google need, at this point, an AI search subscription? Like, if you were them, or like, if I was them, I would just go and clone this version of search, right, and call it, like, Google Search Plus, and just charge for it.

Yeah, I think what they do need to do is piss off, sorry about the swear, with all this passive AI, because I think the passive AI stuff is annoying and it breeds mistrust and frustration. So for example, now, like on Facebook and I think Instagram, at least Facebook, every time there are comments, it's like, "Commenters thought... Commenters generally thought the dog was very cute, and this invoked the feeling of family together."

It's like, come on. I can figure out for myself what people generally feel about it. And also, if I'm browsing Facebook, I'm not exactly pressed for time, where I desperately need to know what everyone thought of, like, Halle Berry's latest summer swimsuit catalogue, you know? Like, I don't need that consensus opinion written by an AI proactively. And it's the same when Google did it with Gemini. It's like, I don't really need your... if I want to use AI, I'm going to use AI. I don't want you to do it for me

in this sort of condescending way.

I think that's what they want to say, is, like, "Oh, we've got a billion uses of AI a day." Yeah, because you didn't

give them a choice. Yeah, I think that's why NotebookLM worked, because they just had a team of people going, oh, how do we do the best RAG for people, where they just start with files, do some RAG, and then, you know, the podcast trend, which is kind of interesting, but I think was also, you know, people got excited about that. It's not like something where you're just constantly out producing podcasts. Please stone, please stone. I only consume

information in pod form, and I listen to it on four times speed, so I get through it pretty quick. But I'm getting, like, thirty-six hours of content a day in.

That's what I'm... So, but I do think that's why NotebookLM worked, right, for them, because it was, like, organically built. They didn't have the problem of having to, like, take an existing product that people are already familiar with, then jam something into it. It was like, you can start from scratch. So it feels like the best play here would be to create, like, Google Search 2. And, you know, just say to people, if you want ads, we're going to put ads in, and your AI is going to sort of give you whatever take on the information. If you don't want ads, pay, and then we just give you a clean search experience, completely integrated into everything Google that you're already using.

And I think it's kind of getting to that point where they're going to have to do this at some point, because as more consumers find out that you don't have to, like, trawl through a sea of blue links, which are mostly ads, and you can just get answers and a great voice experience for search, like, it's over. Slowly but surely it will die. It will take years and years and years for that, right? Yeah. But the thing, like,

they still have the platform, and I think you're right, they will have to have an offering at some point, but they've got a little bit of time up their sleeve. The other interesting thing I found about SearchGPT is the fact that you need to specifically enable it. And this is something we've talked about before, where in Simtheory v1,

we did this sort of auto thing where the AI would sort of figure out, oh, he said I'd like to see an image of this, or, you know, I'd like to find out about the Trump polling or whatever it is, and it would decide, OK, I'm going to do a search to find out about that, similar to the early tool use cases. But what we found was that, A, it often led to it doing things you didn't want, like, I don't want you doing a search every time I ask a question. And also, I don't want the lag of it having to make the decision before it proceeds with getting me an answer quickly. So it's interesting that they also chose to make you specifically press a button to make it a search, rather than using some sort of routing method to decide this is a search or this isn't.

I think it's just so bad, because otherwise the AI will lean too heavily into search, and it's always doing a search and it's slower. You're getting a slow response. It's putting random sources into the answer.

And you're like, I didn't really want this. Like, I just want the AI's version of it, of, like, a pretty established way of doing something. Whereas, yeah, I think when you invoke search and you tell it, no, I want to search, it's just a much better experience, and they've come to that conclusion clearly as well. And I think this is why it's so hard to get function calling going, and why we haven't seen function calling with, say, the ChatGPT voice. Because it's hard. The function calling is hard to do right, and it's hard to figure out which function to actually call, like, what is the user actually asking

for. And that is really what the desktop, the computer tool thing is, because it's a whole series of chained tool calling. But I think that in that context, that actually makes sense. If the AI is left to its own devices, it absolutely should be told the full arsenal of tools that it has at its disposal and decide when to use them. But when it's the user, I think the user should decide. And I think that's what we're seeing in this update.
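As a rough illustration of the two approaches discussed above (an explicit search button versus automatic routing), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Plan` type, the function names and the keyword list are hypothetical, and the keyword match only stands in for whatever classifier or extra model call a real router would use. It is not how ChatGPT or Simtheory actually implement this.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    use_search: bool        # whether the web search tool runs for this turn
    decided_by: str         # "user" or "router"
    extra_model_calls: int  # calls spent deciding before any answer starts


# Hypothetical keyword list standing in for a real classifier or model call.
SEARCH_HINTS = ("latest", "today", "near me", "polls", "weather")


def route_by_toggle(query: str, search_button_on: bool) -> Plan:
    """Explicit mode: the user's button press is the whole decision."""
    return Plan(use_search=search_button_on, decided_by="user", extra_model_calls=0)


def route_automatically(query: str) -> Plan:
    """Auto mode: something has to classify the query first, which costs a step."""
    wants_search = any(hint in query.lower() for hint in SEARCH_HINTS)
    return Plan(use_search=wants_search, decided_by="router", extra_model_calls=1)


if __name__ == "__main__":
    print(route_by_toggle("trump polls", search_button_on=False))
    print(route_automatically("trump polls"))
```

The sketch makes the trade-off visible: the toggle never searches unless asked and adds no decision step, while the router can trigger a search the user did not want and always pays for the classification before the answer begins.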

Right. Let's move on. So earlier this week, Microsoft, GitHub. Microsoft obviously owns GitHub, which is the repository of pretty much all the code in the world. They struck a deal with OpenAI's rivals, I know, with

Rivals.

Google, so Google's Gemini model, and also Anthropic's Sonnet model, to use in the Copilot product. And so for those that are unfamiliar, the GitHub Copilot product helps developers write code or do things like autocomplete. It also has an inbuilt chat where you can add some files and be like, how do I do this? It writes a lot of code. Like, Google came out this week in their earnings and said, I think, a quarter of the code now is written by the AI models internally. I struggle to actually believe that, because Gemini

is just that... And it's probably because it just produces verbose output, lines and lines, that doesn't really do very much.

Yeah, but this kind of took me by surprise at least, because you've obviously got that really strong partnership between Microsoft and OpenAI, like, they are in bed together. But what happened, at least with developers, you know, over the last, I guess, three months or four months since Sonnet came out, is developers have just realized it's better at code, it's better at code completion, it's better to sort of work back and forth with on most coding tasks. And the latest tune is even, you know, far superior to the previous.

So what was happening is you had startups like Cursor, which is another IDE that people were starting to use, which was just defaulting to Claude Sonnet and then giving people tools around writing code with that particular model. Now, I think what's super interesting about this is obviously Microsoft with Copilot has realized that people are just going to go to Cursor, because developers are willing to pick up new tools quickly. Like, they obviously prefer the Claude Sonnet model. Maybe they saw a drop in Copilot usage, or they didn't like what they saw. They've gone and partnered with Google and Anthropic, and also to provide a Gemini model, which I think honestly is just to be able to say to the OpenAI guys, we're partners with everyone. Cheating with

multiple people.

Yeah, with multiple, it's fine. But I think what's really happened here, and I mean, this is such an obvious conclusion, is Anthropic's model's better. It's better. Honestly, I'm putting it as better than, like, o1-preview, o1-mini. It's just better for this particular use case.

And so I think they've just had to admit defeat on it and say, look, guys, they've built a better model. This is a huge business for us now. We're just going to go with the best model.

Yeah, they can't risk it. And I think that's the thing. People, when they're doing real work, they will use the best thing. It's not going to be about loyalty or being a fan of something. You're going to do what works for you. Yeah,

and this is why, like, honestly, with ChatGPT, as a power user of AI, it's like they can release as many features like search and voice and all this stuff, and it's like, at the end of the day, I just want to pick between different models and always have the latest and greatest model, or switch between models. I don't really want model lock-in. And I guess that as they build these ecosystems around their models, that's kind of what's happening, similar to Apple's ecosystem. Like, you're getting locked into a model vendor that may not necessarily be the best level of intelligence for the task that you want to do. Yeah.

Exactly. I mean, I use Copilot a lot, but I basically use that as advanced autocomplete. Like, I sort of start typing what I know needs to come next and it does it. If I actually want to generate real code for something significant, I'll go to Simtheory and use one of the models there, usually Claude Sonnet, sometimes others depending on the problem. But I don't think I've ever used the Copilot chat feature, where you actually talk to the model directly in there.

Yeah, I mean, I did, to be fair, try Cursor for a little while, and there are some great features in that product. I just found myself, though, wanting some sort of separation between the AI outputs I was getting, because I like sort of tuning an agent and then getting it into a scenario to answer the question right. And then I'm always checking the code after anyway and sometimes rewriting it.

So I found that, like, it just doing big insertions or replaces, even though it shows me what it's changing, I don't know, it just didn't work for me, and maybe I need to put more effort into it. But I definitely still prefer, like, standalone. Like, this is where I'm conversing with my co-worker, and this is where I'm doing my work. And maybe that's just

me. Yeah, and I think that's the other point. I understand that when you use Copilot, it understands the context of the surrounding code, which is useful, because it can know variable names, it can know things like that.

What I find the difference is with using it and, say, Simtheory, when I'm working with an agent, and I'm not isolating this to Simtheory, I'm just saying when you're working with a sort of chat-style agent, an ongoing conversation, is it gets to know the problem you're solving, the things you've already tried, the previous generations. It really gets the problem better.

And I find deep into a session like that, I treasure that chat session, because I'm like, we are so close to solving this difficult problem here. And then I can be lazy with my prompts and go, oh, OK, now I need a method to do X, which is just some small extension.

Like, to give you an example with the workspace computers, I want a command that's going to list out the status of the current workspace computers, and it knows exactly what I mean by that and just prints out a copy-paste method I can just chuck straight in. Yeah, valuable. And you just can't do that with Copilot, like, you

can't say, hey, write a method to do this. I mean, it might write a couple of lines of it for you, but it just won't do it right now. And I think maybe it will with Sonnet, and that might be an advancement. But you just still wonder if the actual interface is right for that kind of problem.

Yeah, I'm not sure. I'd be interested in the comments from people that do write code, which I know a lot of our listeners do, if you do prefer working in the IDE, like, having everything in there, the chat and everything, or if you're still going to ChatGPT or whatever to work in that scenario. Because I don't know if I'm just biased, or if, because I started doing this so long ago, I formed a habit around it. Like, this is just how I now work with AI when coding. So I'm just used to it, and then going in and trying any of these other ways of doing it, I don't know if that makes sense. Even with Copilot, then I'm like, ah, I'm going to hit a limit, or, you know, I don't really know how it's going to handle all the context. But it does make me think we really need to get our context trimming and prioritization concept out into the world at some point soon, because I really think, the more we talk about this, there is a sense that just, yeah, controlling the context is, I think, going to be the next, at least short-term, innovation that helps get more out of the models.

Yeah, exactly. It's all about that. We know that the models are capable of so much. And it's really a matter of getting it the right information at the right time and doing it in a way that's effortless, because it's very time-consuming to get all the pieces in place. And that's why I think when I talk about a session that builds up like that, you've built its context over time, and that's why you wind up with something so valuable at the end of it. And I think any work that helps facilitate people getting into that state, either with less technical expertise or less time and effort, will be valuable.
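The hosts do not describe how their context trimming and prioritization concept works, so the following Python sketch is only one guess at the general idea: score each piece of context, keep the highest-priority pieces that fit a token budget, and send them in their original order. The `Item` type, the priority scores and the word-count token estimate are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    priority: int  # hypothetical score: higher means more important to keep


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def trim_context(items: list[Item], budget: int) -> list[Item]:
    """Keep the highest-priority items that fit the budget, in original order."""
    # Rank by priority (high first); among equals, prefer the more recent item.
    ranked = sorted(enumerate(items), key=lambda pair: (-pair[1].priority, -pair[0]))
    kept: set[int] = set()
    used = 0
    for index, item in ranked:
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            kept.add(index)
            used += cost
    # Re-emit in conversation order so the model still sees a coherent history.
    return [item for index, item in enumerate(items) if index in kept]


if __name__ == "__main__":
    history = [
        Item("old small talk about the weather", 0),
        Item("the bug is in the parser", 2),
        Item("we already tried restarting", 1),
        Item("latest error: unexpected token", 2),
    ]
    for item in trim_context(history, budget=10):
        print(item.text)
```

A greedy pass like this is simple and predictable, which suits a sketch. A real system would probably use the model's own tokenizer and a smarter scoring step, but the shape of the problem (rank, fit to budget, preserve order) stays the same.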

They also announced GitHub Spark, which is a product that's very similar to Claude's Artifacts, so you can essentially go in and get it to create you, like, a mini app. I think that the difference here is that it's not just open chat like that. You've got tabs, like, for the theme of it, and you can say I want rounded corners,

or I don't. So it just makes the work a lot easier. And it also has, like, a little database behind it. So for example, one of the things I've got up on the screen now is someone's built an app for karaoke, and people can say if they are attending or not, and it's like a custom page to go do that.

It's pretty cool. Like, you could imagine, as these get easier to write, like with Halloween in my neighbourhood last night, you could have a little applet where people can list if they're getting involved or not, right? And you've got this little app that everyone can share around. So I can see this being pretty valuable.

With GitHub, I'm not sure what their reasoning is behind building these little applets and putting them out there. But it's a pretty good feature. It takes care of everything for you, so, like, hosting, database backend and stuff like that.

But that's impressive. Yeah, that stuff is what is needed to take it from just being a novelty to being actually useful.

Yes. And then you've got, like, this cool version control UI on the left-hand side where you can go through it. I feel like for people that are semi-technical and just want to go build little apps that are useful to them,

it's a great concept and will be very useful. I can't really imagine this becoming, like, a mainstream thing where everyone's building little random apps, but, you know, as these things get more advanced, maybe GitHub Spark is something in the future. You'll really spark ideas from... that sounds like a paid promotion at this point.

Spark your ideas with GitHub Copilot, or whatever.

What do you make of this? Like, as we get towards the end of the year here, and kind of wrapping up this episode, to me one of the most exciting things this year has been obviously the Claude Sonnet model. I think that was really, just the tune of it, it's such a great model. I think it's kind of been interesting around o1-mini and o1-preview.

I'm curious, when we get the final o1, which I get the impression will be able to do multimodality and generate images and do a whole bunch of stuff really well, if that'll change things. Are we still preferring, like, a Sonnet tune of the model? Or is there some sort of step change function coming? But it does now seem like everyone is just building features. And if we were really close to AGI, like these guys spout all the time, why are they releasing a search, or a Perplexity clone, in the back half of the year? Like, it doesn't add up in my opinion. Yeah.

I've sort of got two ideas. One is that there are two major factions in the company. One is the sort of model building, and the other one is like, we've got to compete with the big techs, and therefore releasing search is good.

Like, I think my opinion is that the most likely is there are two parts of the company. One is the commercialization part, which is what we're seeing most updates out of. The other one is the guys iterating on models, because I just think about what the iteration cycle of building a new model is like: you screw it up,

it's like, OK, we just spent three million dollars and ten weeks and it failed because I forgot to put a semicolon on line seventy-one, man. You know, I could be, like, OK, sure, it's probably a higher level of technical difficulty than I can possibly understand. But I would imagine it's something like that, and you just get through the cycle and you try it and you're like, you know, this isn't very good, we need to go again, and it's just time-consuming and expensive.

So it could be the better models are coming. It's just longer than I expect. I mean, you know, we experience that ourselves.

You try something, realize... like the workspace computer, for example. The first way I tried to do that was a fail and I had to redo it. And I think that they're probably experiencing a similar thing.

But I do agree with you that I don't think this is the case that OpenAI's just sitting on all this gold and they just trickle it out as and when they choose. I think they're working as hard as everyone else trying to make the most of this boom. All right.

So if you are interested in using the workspace computer, we said we would try to get it out this week. Turns out it's a little bit harder than we thought. So we are aiming for early next week. So stay tuned on that.

Yeah, I think it's worth mentioning, I'm pretty confident about that now, because we're using it. It's really just a matter of making sure we can make it available to everyone in a way that's going to work immediately for you.

Yes. So if you are interested in that, you can sign up at simtheory.ai and you will get an email announcement when that is released. Any final thoughts on the week that was, Chris?

My final thoughts are workspaces. Been working hard on that, really. And, you know, I made the comment to you this morning that I'm actually excited to get it live just so I can play with it a bit. You know, like, I feel like I'm out there building it, but I'm not actually

getting to use it so much. Yeah, it's very good content

for next week. And I think I promised someone on the Discord that I would try to get it to format my hard drive. So I promise, once it's ready, I'm going to run, like, a local version on my actual PC and give it administrator privileges and just see how much damage it can do. Maybe I'll record it, but I might record it with, like, a phone camera in case it actually starts to destroy it. Or maybe I should actually give it access to my email and encourage it to send

High risk, that would be better. I think we give it access to your email and, yeah, get it to try and do something that would be upsetting.

Yeah, yeah. OK, all right, I'll come up with something. You know, I'll give it a good crack.

All right. Thanks for joining us again. We will see you next week.

This Day in AI Podcast

EP83: Self Driving Computers, plus SearchGPT, & Github Copilot with Sonnet 01:03:30