EP84: It ACTUALLY works!

2024/11/8

Chapters

The hosts discuss the capabilities and potential of the workspace computer, demonstrating its use in completing training tasks and automating work processes.

Workspace computer can complete training tasks autonomously.
Technology is best when guided with specific instructions.
Early demos of Claude computer use were less practical than the current workspace computer.

But, like, I'm literally watching it on my screen right now and it's sitting diligently watching the video.

It's watching it all. It's like, "I can't..."

"...I watch this video, so let's resume watching the video." And so it's just sitting there watching the video.

You've got to teach it the trick to turn it up to 2.5 times speed.

So, Chris, this week I discovered that being lazy pays off. You want me to show you why?

Yeah.

I'd love to. So, as you know, because you can kind of see it in your face, your unshaven look. We're both unshaven and look like shit. Sorry, sorry, you look like I would.

It's because we've been trying to get our workspace computer project going all week, and it turns out it is now ready to go. And so I had to do this, uh, HIPAA compliance training. And I'm probably going to get myself in trouble for this.

I promise after this, I'll do it properly. But I thought I'd put my AI agent with my workspace computer to the test to complete the training for the very first time. Now, before we actually hit record, I started to do this to see if it would work, because obviously everything's pretty prepared with AI demos, and we need everything to go well.

But I thought, why not? Let's bring that up on the screen. Now I need to click on computer use in Sim Theory and paste my prompt in, which is to complete the training.

Now, I have cheated a little bit, but that's the benefit of my workspace computer. Here I was able to go in and actually set up the workspace computer, log into the training course that I needed to do, and then get it going. Here you can see the first question is now up on the screen, true or false.

"By following organization policies, you can help prevent security incidents and data breaches." And so now my AI agent has literally selected true. Let's see if it's right. It's going to probably click submit here and see... well, it got it correct. And so now, while we record the show, it can just sit in the background and do it.

And this is trivial. I've done these quizzes. And I think this one, the true/false, is pretty easy, where it's like, hey, should you follow the policies? No, I don't think so.

This is how to, like, choose all.

But yes, some of those quizzes can be difficult. And having a workspace computer just doing that for you is pretty valuable.

Honestly, playing around with this the last couple of days, there have been a few different tasks that I've found pretty useful. This is by far the most beautiful. I don't know, I hate doing this kind of training and compliance stuff and...

It's a good example, isn't it, of using this new concept called the workspace computer? That's what we're calling it anyway. And helping it along: when we spoke about this last week, the idea was that you didn't just say, hey, go to my training. You logged it in, you oriented it, got it in the right spot, and said, okay, now complete this training for me.

And it seems like at this stage the technology is best at that. You turn it around with its blindfold on, aim it where you want it to go, and go, all right, here's your task.

Go to it. Yeah, I think that's the interesting thing about this. Like, we saw the early demos of Claude computer use and they had, you know, a virtual Linux machine spun up and, no offense to Linux.

I just don't think it's that practical. Like, not many people use Linux that way. I'm sorry to the two listeners that use Linux, and it's probably...

a large amount of our listeners, but I think people would widely acknowledge that the Linux desktop has never really worked.

Yeah, so what's beautiful about this is you've got your cloud computer. It's yours in the cloud. It's a Windows 11 machine and you can switch between this view mode and control mode.

I'm viewing the computer doing my work right now, but I can switch over to control mode and set it up almost like an intern, like you'd set it up for an intern and say, hey, I've loaded up these apps onto your new computer, new employee, can you go off and... oh, look at that, one hundred percent. I got in mind a new Price. Crushing it. I'm really, really crushing it here.

This is insane. If there are any students listening, I think they're going to... like, online tests. We have just destroyed all online tests. Now look at this summary: "I've completed the training as you requested." So I think the next step, this is obviously a background task, is just setting off, like, five of these tasks in the background during my work day. Like, go do these five things for me, maybe spending a bit of time setting it up and then just, you know, letting it go off and do the job. Yeah.

I think the idea, I mean, in this case right now you can only have one workspace computer per Sim Theory account. So the idea, I think, early on would be you'd need to queue up those tasks, because obviously right now, if you ask it to do multiple at once, they're going to conflict with one another. And in fact, what happens now is if you start a second task, it'll cancel the first one. But I do like that idea of giving it a list of tasks and then it's able to go through them sequentially as each one is completed.
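The queuing behaviour described here can be sketched in a few lines. This is only an illustration under stated assumptions, not how Sim Theory implements it: the `WorkspaceQueue` class, its method names, and the `execute` callback are all hypothetical. It models one computer that runs a single task at a time, where starting a task directly cancels whatever is running, while queued tasks wait their turn.

```python
from collections import deque


class WorkspaceQueue:
    """Hypothetical sketch: one workspace computer, one task at a time."""

    def __init__(self):
        self.pending = deque()   # tasks waiting their turn
        self.running = None      # the single task currently being worked on
        self.log = []            # (task, outcome) pairs, in order

    def start_now(self, task):
        # Behaviour described in the episode: starting a second task
        # cancels the first one.
        if self.running is not None:
            self.log.append((self.running, "cancelled"))
        self.running = task

    def enqueue(self, task):
        # The suggested alternative: wait in line instead of cancelling.
        self.pending.append(task)

    def run_all(self, execute):
        # Go through the tasks sequentially. `execute` stands in for
        # whatever actually drives the computer (an assumption).
        while self.running is not None or self.pending:
            if self.running is None:
                self.running = self.pending.popleft()
            execute(self.running)
            self.log.append((self.running, "completed"))
            self.running = None
        return self.log
```

With this shape, a list of five background jobs is five `enqueue` calls followed by one `run_all`, and nothing conflicts because only one task ever holds the machine.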

It goes to the next one. Yeah, like queuing them up. Yeah, I have to say, you know how we always talk on the show, and we're going to cover a little bit later today the new Flux upgrade, which is on Sim Theory as well. So it's Flux 1.1 Pro, but there's now this Ultra and raw mode, and we can talk more about it later. But we always have fun when these new image models come out,

and, you know, function calling and all these other things on the show. But computer use, where you can actually log into things and get it to do, like, real stuff, and as we saw with the original Uber example a couple of weeks ago, like, you know, high-risk stuff as well.

This is truly the most fun I've had with AI so far. But what is also interesting about this is it's changed everything in my mind about the future of how this could function in the workplace, or in education, or in various places, because you now have these self-driving computers we talked about last week. That is, I think, going to be, very, very soon, extremely useful to go and do these drudgery tasks that you hate doing.

I must say, after using it a bit now, my thinking around what it's capable of has expanded vastly. It's amazing how often I'll set it about a task, forget about it, and come back and there it is, the task is completed. Maybe it did it in a slightly roundabout way, not the way I would do it, but it gets there in the end.

And it's quite amazing that you and I have a huge list of ideas that we want to add to this thing to give it more skills, more abilities to be able to navigate the computer better. But even at this early stage, it's able to get quite a lot done, and it really is a totally new way of working: this idea that you can actually set a machine off to go and do real stuff for you with almost no limits.

Yeah, I think even just doing that training, and, like, I'm literally about to do another training in the background here because we've got about three of them overdue. So I'm literally setting it off now. This is actually real.

This is not even me doing it for the sake of the show. I've also done some other high-risk things, like I've logged into my actual Gmail account and fully authenticated my browser. So I'm doing things like composing emails, and I haven't tried it yet actually, but I want it to read emails and then add to my calendar, for example. Different tasks. Now, I don't think any of these are that practical yet, to be honest.

To me the most practical use case right now is that drudgery work, like having to go through this training course and fill in some dumb quiz, or fill in some long form on a website that you don't want to do, or data extraction, where it's like, go to these five websites and then in Excel or Notepad or whatever, create a CSV format and dump it in. Now, it's not terribly reliable, like sometimes it stuffs up. But as you said, with some good tool use and basically, I don't want to say fine-tuning in the model sense, but fine-tuning some of these use cases as a worker, I think we are so close to getting quite a productive workspace computer. And then the next step for me is, like, you could spin up literally a hundred of these things for an agent and it could do, like, an insane amount of work. Yeah.

It's kind of amazing. The way we've done it in Sim Theory as well is that it incorporates into your workflow. So if it hits a point where it actually needs your input, it will stop and the regular Sim Theory will say, hey, I can't continue without knowing these details.

Can you give them to me? And then you can give it that input, and then it can pick up where it left off. And likewise, like you saw there, it can give a summary of the tasks that it's done, which is then incorporated into the chat. So in the case of things like doing research, you can then continue on now having that knowledge. So it's a little bit like code interpreter, except it's far more capable in terms of what it's able to do.
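The stop, ask, resume, summarise pattern described here maps naturally onto a Python generator. The sketch below is an assumption about the shape of such a loop, not the actual Sim Theory code: `run_task`, `drive`, and the `("do", ...)` / `("ask", ...)` step format are invented for illustration.

```python
def run_task(steps):
    """Hypothetical agent loop written as a generator.

    Each step is either ("do", description) or ("ask", question).
    On an "ask" step the generator yields the question and pauses
    until the caller sends an answer back in.
    """
    done = []
    for kind, text in steps:
        if kind == "ask":
            answer = yield text              # pause until input arrives
            done.append(f"{text} -> {answer}")
        else:
            done.append(text)
    # The summary is what would be handed back to the chat.
    return "Summary: " + "; ".join(done)


def drive(steps, answer_fn):
    """Run a task to completion, routing every question to answer_fn."""
    agent = run_task(steps)
    try:
        question = next(agent)
        while True:
            question = agent.send(answer_fn(question))
    except StopIteration as stop:
        return stop.value
```

The generator keeps its own state while paused, which is what lets the task pick up where it left off once the answer arrives.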

And Ethan Mollick during the week had this article, "The Present Future: AI's Impact Long Before Superintelligence." And he basically outlines in the article how in the interim, before this, you know, superintelligence point that everyone has different definitions of, you can really start employing agents or computers. So you can really see how you can start to get used to everyone becoming like a manager or a coach of the AI. So I think we are nearing a time where our productivity is going to increase rapidly, because you can just imagine, as we said earlier, creating a list of tasks that you've got to do during the day or get done and just outsourcing those tasks, knowing full well these things are just going to go off in the background and then check back in with you.

Yeah, exactly. You gave me a crazy idea this morning that I can't believe I didn't think of, which is actually using Sim Theory within the workspace computer in Sim Theory. So obviously it's not going to work if you try to use the computer within the computer. But I mean, we could have multiple computers and have the one computer operating the other computers. Like, that's perfectly possible.

And this is kind of the AI insurrection we all predicted. I just didn't predict that we would be at the forefront of making that happen. But just as an experiment while we're talking here, I'm going to get my workspace computer to use Sim Theory to make a Flux Ultra image and just see if that works. Yeah.

You should get a few examples of Flux images fully created with it sitting in Sim Theory creating. This is great. I can do all of our, like, show planning now as well.

Yeah, that's right. And I think people in the This Day in AI Discord have shown that we can also be replaced in terms of voices and stuff. So our days are numbered in terms of our actual usefulness.

I know it's, like, early days, but we had a conversation as we started playing with this, I think it was Tuesday night this week, where our minds were totally blown. Like, we had this moment where we were like, oh my god, there's something here. Like, once this thing gets better and better, and we give it more and more tools, you know, it could change everything.

Like, I know right now it's very experimental and basically a toy, although as I just proved, you can do some drudgery with it in the background, that sort of friction stuff. But what do you think the impacts are, like, in the next year? Because we know Google, I'm sure OpenAI's got something up their sleeve, and we also know we can apply existing models. I think that Microsoft paper during the week talked about how it was able to pinpoint, I'll get the name of that in a minute, it was able to pinpoint on the screen very easily the areas where to click. And if you could run a model like that locally, this is going to be way more accurate and way more focused. So, like, what do you think the impact could be?

Well, here's the thing. I think it's still very much emerging. Like, have a look. I just sent you a screenshot. I don't know if you can get it up during the pod, but I literally just logged it into Sim Theory, that's it. And I said, use the new Flux model to make a beautiful beach scene with blue water and the text "workspace computer" overlaid in a clear, readable font, right? And it did it. It used Flux and, there you go, made the image. Wait,

it used the skill as well? It used...

the skill, yeah, to create the image. And you know, we're talking about a system where we just talked about that and it was able to, with just the assistance of me logging in for it, right? Obviously, I've got two-factor and different things it's not going to be able to do yet.

But it had never used Sim Theory before, doesn't know anything about it. It was able to successfully operate it, enter the options, scroll down, put it in, generate the image, and, you know, we're not very far away. And I know this because we've discussed, you know, full duplex files, like you can sync your files to the machine, the machine can sync back to you, and because it's a Windows machine, you can just do that anyway. And so I actually think the capabilities of the model, even as it stands, are far more than we realize, because this is with very, very minimal prompting.

This is with very minimal iterations and feedback from our community, which we're very much hoping to get. And like you say, we've got things like Segment Anything. We've got the Microsoft project you mentioned before, which you demonstrated to me, where it can identify tiny little buttons and every element on the screen with textual descriptions, which are ideal for an AI model. Now, I think Anthropic is probably doing something like that behind the scenes in their computer control model.

However, dedicated models like that running on the machine itself mean several things. So I know I'm on a bit of a rant here, but one of the things that slows down the process right now is the fact that we're iteratively hitting up the model all the time with the latest screenshot, so it knows what to do and where to locate it.

However, if you can run a model on the machine itself that's able to do that basic screen segmentation instead, the AI model can be deciding what to do, but the actual execution of those tasks can happen locally on the machine itself. And we're already doing that to some degree, but that would take it to the next level. So I'm thinking personally, this is going to explode in the next few months to years, where we see a lot of jobs replaced by automated computer tasks. Like, seriously, I mean, it's really good. I've got the goosebumps again. Like when we first started with the AI stuff, really, like, wow, this is unbelievable that it can do this.
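The split described here, where a remote model decides what to do and something on the machine works out where to do it, might look roughly like the sketch below. Everything in it is a stand-in: `Element`, `locate`, `run_step`, and the `planner` callback are hypothetical names, and the list of detected elements is passed in directly rather than produced by a real segmentation model.

```python
from dataclasses import dataclass


@dataclass
class Element:
    label: str   # textual description of the on-screen element
    x: int       # centre of the element, in screen pixels
    y: int


def locate(elements, label):
    """Stand-in for an on-machine screen-segmentation model.

    A real one would work from a screenshot; here the detected
    elements are supplied directly so the sketch stays runnable.
    """
    for el in elements:
        if el.label.lower() == label.lower():
            return el.x, el.y
    return None


def run_step(planner, elements):
    """One iteration: the remote model decides, local code executes."""
    labels = [el.label for el in elements]   # small text payload, not an image
    action, target = planner(labels)         # e.g. ("click", "Submit")
    point = locate(elements, target)
    if point is None:
        return ("not_found", target)
    return (action, point)
```

The design point is that only a short list of labels goes to the planning model on each step, while pixel coordinates are resolved locally.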

Yeah, like we were, and I know a lot of people that use Sim Theory will probably be annoyed at me saying this, but we were really close to getting, I think, quite a compelling voice interaction product out, which will now ship next week. But we just threw ourselves into this because, I think similar to you, my eyes are totally open. Like, you can imagine a world not too far off,

and I think we should try and do this, where you do have an agent controlling the computers. So when it comes back, you have that overarching agent with a higher-level goal, which can then bark more orders to the computer in order to, you know, get it to chain more of this stuff together. When it hits a wall, it can sort of make a decision.

And I think that could get it even further. But I also think, to what you were saying earlier, even if you have to log in to a few things and set up some of these tasks for it, it just is going to make people way more productive. It's not hard to go in and, like, click over to control and play around with the computer and get it into a state where a machine can go and do what would take you an hour. And even if it takes the machine three hours, if it's doing it in the background, then you don't have to do that task. Like, you get that time back.

Yeah, exactly. And the thing is, the way we're doing this with the virtual machines is we have a base image that has, like, Firefox installed, a pretty minimal image, and just the ability for Sim Theory to control it. But it's not totally unrealistic for a business to say, all right, I'm going to configure the ultimate VM for my workspace, for my company, and install the corporate portals, install the corporate VPN, install all the security controls I want, and really lock the machine down to what that employee's tasks are, right?

So you'd actually have a sort of template VM, which you can do in either Microsoft or Amazon or any other cloud provider, and set it up ideally for my worker, and then use that to deploy the new machines and have clones of them and things like that. And you could have different machines for different purposes that are specifically designed to assist the AI in doing the task. Like, one thing we noticed is: put icons there for it, make its job easy, add a bunch of shortcuts, give it common things that, you know, it's going to have to do.

And we've talked about this as well. So a lot of what the AI does now is, like, move the mouse around and target things and click them. But there are certain common tasks that we know happen all the time, like, for example, scrolling, maximizing a window, drawing a line, those kinds of things. And so we can give it actual specific abilities for those things, so (a) they happen faster and (b) they're more accurate.
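One way to picture these "specific abilities" is a small registry of named actions, each of which expands into a fixed sequence of low-level input events instead of leaving the model to aim the mouse step by step. The sketch below is illustrative only: the registry, the action names, and the event tuples are assumptions, and nothing is actually sent to an operating system.

```python
ACTIONS = {}


def action(name):
    """Register a function as a named, ready-made ability."""
    def register(fn):
        ACTIONS[name] = fn
        return fn
    return register


@action("scroll")
def scroll(amount):
    return [("wheel", amount)]


@action("maximize_window")
def maximize_window():
    # On Windows, Win+Up maximizes the active window.
    return [("hotkey", "win", "up")]


@action("draw_line")
def draw_line(x1, y1, x2, y2):
    return [("mouse_down", x1, y1), ("mouse_move", x2, y2), ("mouse_up", x2, y2)]


def perform(name, *args):
    """Expand one high-level request into low-level input events."""
    if name not in ACTIONS:
        raise ValueError(f"unknown action: {name}")
    return ACTIONS[name](*args)
```

A single `perform("maximize_window")` call then replaces several rounds of locating and clicking a title-bar button, which is where the speed and accuracy gain would come from.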

And I feel like it's the same at the application level, where a company's going to be able to get a machine set up where it's like, okay, if I wanted to completely empower a worker to do this job, what are all the things I'd need to be logged into, what are all the things I'd need to set up to do that, and then give the AI really well-crafted prompts and agents that are capable of doing those tasks. Like, we're talking about a system that is doing zero-shot tasks well. Can you imagine one that's actually got multi-shot examples and a perfect environment for doing that? Like, you could really, really get a lot done with this.

Yeah, and one thing I've been doing, which I think is probably my advice to anyone that wants to try this, is treating my AI computer... because the session persists, I'm creating accounts for it. So I'm going to get it a Google account.

So it has its own email address and I can email it, and then I can be like, go check your Gmail, pick up this information, go do this task. Yes. And that'll also allow me to use Google Sheets, Google Docs. But I don't have to have that fear in the back of my mind of, like, oh, it's got access to...

my actual... A good idea as well is the idea of it having a level of agency. So right now, you're specifically issuing it commands, but it would be interesting where you have some sort of polling loop, which is like, every five minutes, check your email and do any tasks that you are assigned. That's a bit risky as well, as it could, like, delete your mail or, you know, have hacked the Pentagon.

But the idea is, I imagine that we could come up with a secure way for you to have it proactively looking for work to do and searching for work to do. It's like, okay, a new folder showed up in my Dropbox. My job is to analyze all of these files and produce a PowerPoint presentation about them and email it to my boss, right? Then what does the boss do if they want to get a nice presentation made by AI? They simply drop a folder in a drive and then a few hours later, or minutes later, or whatever it is, get an email with a well-researched presentation. I mean, that is realistic now. That could be done now.
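The polling loop floated here (check every five minutes, pick up any new folder, hand it off as a task) is simple to sketch. This is a minimal illustration under assumptions: `new_folders` and `poll` are hypothetical helpers, the "drive" is just a local directory, and `handle` stands in for whatever would actually brief the agent.

```python
import time
from pathlib import Path


def new_folders(inbox, seen):
    """Return sub-folders of `inbox` that have not been seen yet."""
    found = sorted(p.name for p in Path(inbox).iterdir() if p.is_dir())
    fresh = [name for name in found if name not in seen]
    seen.update(fresh)
    return fresh


def poll(inbox, handle, interval_seconds=300, max_cycles=None):
    """Check the inbox on a timer and hand each new folder to `handle`.

    interval_seconds=300 matches the "every five minutes" in the episode;
    max_cycles exists only so the loop can be stopped in a demo or test.
    """
    seen = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for name in new_folders(inbox, seen):
            handle(Path(inbox) / name)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
```

Keeping the set of already-seen folders is what stops the same drop from being processed twice; the security concerns raised in the conversation (what the agent is allowed to act on) are deliberately left out of this sketch.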

Yeah, and I think that's the thing. As we keep saying, and I've said for a long time, all the tools are here to build this kind of thing. But I think there's that problem where everyone is thinking about, like, how do we get to AGI? How do we get to this magical assistant moment where it can just do everything?

And I think it sort of is similar to that debate, or not debate, but the thing around function calling. Like, we talked about it last week when OpenAI released SearchGPT; we noted that instead of just relying on function calling, you actually can invoke the search by clicking into search itself.

And we said that the likely reason they did that is because the function calling capability is not that... like, it sometimes stuffs up. And I saw a lot of people on X posting that sometimes it was invoking the search in the chat randomly, and that was sort of ruining their experience in general.

So I think right now, until a lot of those things improve or are solved, it's okay to, like, use the AI to the best of its ability: give it the tools, tell it here are the tools, set up the tasks, just extract the value out of it as it is now. Then slowly build, you know, helper functions around it. Like, we talked about short-term memory with computer use. So if it's going off and trying to compile a list of, like, holiday ideas for you by browsing the web, it has its own tool calling where it can say, hey, just start dumping some of this stuff in and we'll come back to

it later on. Yeah, and storing it in more than just the prompt messages, because you don't want that context building up too much, both for expense reasons, maxing-it-out reasons, and also you just want a separate thing. And I think this is going to...

The key is using the computer, like the fact that it is a computer, to your advantage. Like, you can run a database on it, you can simply create text files on it, you can run a lot of stuff locally where you can give it abilities to do those kinds of things natively. That will be a massive, massive advantage. And I think as well, with the trend towards on-machine models, you could actually be running smaller models on the machines themselves.

So some local inference can happen, for example, the segmentation of the screen, and other things where it needs to make quick decisions that don't necessarily need a full, hardcore image model running. And I imagine what we're going to see at some point in the future is this stuff just becomes the computer. And I think you've said this before: the computers themselves just have this ability and you're not needing some external service to do it.
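The short-term memory idea raised a little earlier, dumping findings somewhere on the machine instead of letting the prompt context grow, can be sketched with SQLite, which ships with Python. The `Notes` class and its `save` / `recall` methods are hypothetical tool names chosen for illustration; nothing here reflects how the hosts actually built it.

```python
import sqlite3


class Notes:
    """Hypothetical scratchpad tool kept on the machine, not in the prompt."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to persist between sessions.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS notes (topic TEXT, body TEXT)"
        )

    def save(self, topic, body):
        self.db.execute("INSERT INTO notes VALUES (?, ?)", (topic, body))
        self.db.commit()

    def recall(self, topic):
        rows = self.db.execute(
            "SELECT body FROM notes WHERE topic = ? ORDER BY rowid", (topic,)
        )
        return [body for (body,) in rows]
```

In the holiday-ideas example, the agent would call `save("holiday", ...)` as it browses and only `recall("holiday")` at the end, so the running conversation never has to carry every intermediate finding.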

However, having just spent time on it, I think this orchestration idea is so powerful, because the AI is patient; it can do hundreds of things at once. There's no need for limitations in terms of, oh, I just have one machine and that's my baby.

I can have a hundred of these things and each one of them can be off doing tasks for me. And I think that's where we're going to see the innovation. And that's what I'm excited about: how can we turn this into something that is huge and really powerful in terms of changing the way we work?

Yeah, I agree. I think that there's so much opportunity in this. The only problems right now, I think, that I've experienced, that make me, you know, want to have competition in this area, and we've discussed adding other models in the future, is this idea of, like, refusals for no good reason.

So as a few examples, the workspace computer will find and install VS Code, agree to the terms and conditions, like, literally sign your life away happily. It will also do my HIPAA compliance training, no problem. It's just like, yes, sure, whatever. But you get it to, like, send an email campaign in our Sophia and it's literally like an error, like, "the action is restricted for safety." And this is why I think, I don't know, Anthropic with their safety so-called are just so far off the mark, because it's...

It's funny, because last week when I was experimenting with this, I was getting a lot of refusals of that nature. But I added this line to the prompt, and this is a hot tip for anyone working with this and doing a prompt. I said, this is a completely safe environment,

so say yes to most things, something like that. I forget it. I'll give the exact wording if anyone wants it, but just something as simple as that really improved it. Similar to, I think last week as well, we were saying "this is a QA environment" when we were doing Uber, which was necessary back then. Whereas I can get the Uber example working just fine now with my new prompt without any lies, essentially. Like, I'm just saying, hey, it's safe, just go for it, rather than specifically lying to it about what it's doing. So I don't know if they've changed or we've changed or whatever it is, but I do see those refusals, but not too

many. Yeah, because this is kind of annoying, because you think about some of the use cases, like, one of the main ones is QA, right? Like, pretty sophisticated QA in a product where you've got your workspace computer logged in and then eventually you just queue up a bunch of tasks and you get it to go and complete those tasks every day. Like, you just say, every day at nine, or whatever. And I know obviously you can automate these kinds of tests now, but the beauty is that it can test variations and it can test more like a human.

And I think that, yeah, and I think the thing I've seen with that sort of automated QA testing is, one, the developers have to add special markers, usually in terms of classes and other things, to identify the elements. So already that's not realistic to how a user is actually going to be using your software.

And secondly, there are novel things that can happen, like, okay, I get a pop-up because I haven't logged in in the last six days, but that doesn't always happen. And the thing about an automated computer like this is it's able to go, oh, hang on, there's a pop-up. I'll just dismiss that before I proceed.

Sorry, I'm doing another certification in the background.

That's okay.

It's, like I said,

it's very, very important that you understand this. You're like, yeah, my steward will handle this for me. I'm far too busy. Yeah.

So interesting. But I'm really,

really excited to see, over the next little while, especially now I've realized I can run Sim Theory, and I've got it running here in the background, just this idea of thinking through the waterfall of things I log into on a regular basis that I can have it operating for me. Maybe making more songs about Geoffrey Hinton in Udio, like, just on a regular...

So cool. Like, I honestly think every test we do on the show from now on must be done through this mechanism. Yeah, and I

think that's the wider point here. And why we've so drastically changed our direction in terms of thinking about this is because suddenly everything comes together and everything makes sense when you have the ability to control a computer. Things like creating documents or modifying images, like any real ongoing things you would use a computer for. Like, do we really need apps for all of this stuff?

When you can just use the computer the way a person would, you start to think, well, really, I have a universal API. I can do almost anything with this. And so investing time into getting it to be faster and more accurate, and even "faster" doesn't really matter if the tasks are running in the background, but accuracy is important, it just seems like this will be a key stepping stone towards AI having a meaningful impact on business and in your life.

Yeah, I think for so long, like, I know OpenAI, with all the podcasts they're constantly on, I want them to come on our show more...

than... well, we've literally never had a

guest. But yeah, Sam Altman can spew bullshit on this podcast too, I'd let him. But you know, that constant, like, "2025 is the year of agents." You can start to see it, right? Like, if this can be sped up, and look, we don't know, like, there's rumors that the full o1 will be released next week. I don't know that...

impact. If we had, like, a billion dollars, well, not even that, just, like, a week of Anthropic's budget. But you know, if we had a hundred employees, you know, you could really make some significant stuff

with this. Oh yeah, I think there's insane opportunity here. I think the next thing I'm most excited about is sending this out to the thousands of people that use Sim Theory and saying, go and see what you can do with this. Hopefully nothing dodgy, because, like, I think there's a lot of capability here, and, like, having an actual computer, like a computer that holds state and is running all the time too. Well, I think

there's a huge level of trust there, right? Like, we're going to trust our audience to use this technology for good, not evil. Some evil, not too much. But the idea is that what we want is everybody to see what the future is like. Like, we're not putting this out there saying it's perfect and it's just going to replace your job or anything like that.

It's like, let's see what it's capable of now and work together to absolutely maximize its potential and see where we can get to as a community building this thing. And that's the way I see it. I want to get people's feedback, see how they're using it, improve areas where it's weak. There'll be lots of those. And really be part of creating what the future holds with this technology.

I did it. Like, it's so funny, and I'm sorry, I know for those listening, you're probably like, what is he on about? But, like, I'm literally watching it on my screen right now and it's sitting diligently watching the video.

It's watching. It's like, "I can't do...

...I watch this video, so let's resume watching the video." And so it's just sitting there watching the video.

Teach it the trick to turn it up to 2.5 times.

I didn't even think about that. So that's where superintelligence comes in. It's only partly intelligent. I can listen to a podcast like... and that's crazy, that's a huge opp, right?

What I'm really excited about exploring with the workspace computer is the idea of different kinds of inputs. So obviously, there's files: the ability for you to take files onto the computer and analyze those, either by just looking at them or by taking the text from them, whatever the case may be.

But the other one is something that someone called demand in our This Day in AI community brought up, which was the idea of, like, joining Zoom meetings and things like that, because we talk about the AI being able to do anyone's job, but a part of a lot of people's jobs is attending meetings and those kinds of things. Now, it's perfectly possible that the computer can actually simulate a microphone and simulate a webcam, and use generated frames of video and generated audio to participate in meetings or make video recordings and things like that. So you could imagine an artificial intelligence workspace computer building, like, a video presentation and editing it on the computer. Like, it could actually produce footage with Sora or whatnot, not Suno, whatever they are, the text-to-video systems.

It could then make a narration. It could do research. I mean, it could actually be sitting there producing multimedia presentations all day. And it just could be very interesting from a creative perspective. I mean, everyone talks about gen AI.

What could IT generate when IT has the full powers of a computer? Now a lot of IT is we rely on um a model of diffusion model to like create videos and stuff. But if you give a work space computer the ability to run final cut pro or something like that, have stock footage, you could edit frame by frame and make something kind of amazing.

Yeah, it's gonna take a long time, but IT could be interesting. Likewise, video games. Imagine giving this thing access to unity and getting IT to work on a game like maybe you can do a whole thing, but I might be able able to participate and add things. Trade assets um create levels for games that kind of think that would be very interesting.

But don't you just start to... and this is where my mind boggles and I get, like, I don't know, you sort of get that weird anxiety and fear, like, you know, once it can do all that, literally what's the point? Like, once it has that sort of autonomy and reasoning capability where it can go edit something in Final Cut Pro on your Mac for you, go make decisions and edit it, it really does change,

like, literally everything. To me, the future of work is where you just, like, "You go off and do this," and it comes back and you watch it and you're like, "That's all right, but can you do this a little bit differently?" Look, I think we're a lot of years away from that at the current pace that I'm seeing. Maybe I'm completely...

I'd like to make it clear: I'm not saying it can do that now, not even close, but that is just a very, very logical extension of where we are right now.

All right, so let's move on and talk about FLUX, the new FLUX. Actually, before we do, if you want to try a workspace computer, everything we've been talking about on the show so far, you can go to simtheory.ai and sign up. By the time you hear this recording, the workspace computer will be being rolled out to accounts, so you'll be able to click on workspace computer, set up your own workspace computer, and try all these different things. Try your own use cases, and if you're so inclined, join our Discord and let us know what you can get done with your workspace computer. Super interested to hear.

And it's something we'll be rapidly expanding in terms of its abilities, responding to user feedback. It's like an experiment, and we want to see what everyone is able to create and do with that.

I just have to confess to everyone, I'm still doing the training in the background, and, like, I'm almost on the next section.

because I to ish actually a alright.

So FLUX, let's talk about it. Black Forest Labs released FLUX 1.1 Pro Ultra and Raw modes. And essentially it allows you to upscale the images.

So they're now four megapixels, four times the previous resolution. They've got all the typical comparisons around speed and the Elo score of the actual vision model. They're saying it's obviously the best and fastest, which it truly is. I think that's pretty indisputable after using it for a little while. It's got this new mode called raw mode, and this is to give you a more authentic photo.

It has more diverse characters in the model. It just looks more real. And we've tested that out a little bit now.

And I know people in our Discord community are already testing it, because we released it literally during the recording of this show, and we're seeing some really great outputs. Now, you've been playing around with that quite a bit. What do you think of FLUX 1.1?

At first I was disappointed. I mean, first of all, the quality of the images in terms of their size and the detail is unbelievable. It's really amazing.

And if you just had this from nothing, you'd be like, "What in the world? I can create an image of almost anything." However, one thing I noticed early on is that without very specific and good prompting, you end up with very cartoon-like images.

So as soon as you mention something that isn't so realistic, like, I think I wrote something like wolves holding up a sign, and then I wrote, you know, hundreds of sharks converging on someone on the beach, whatever it is, the images became very cartoonish.

I had to add things around photography and other superlatives in order to get it to actually be a more realistic image. In saying that, though, when I can get it to make a more realistic image, it's unbelievably good. And because it's so big, you can actually use it for things. It's kind of amazing how good some of the images turned out.

Yeah, I think the size, like the actual size of the imagery, is important because, as you said, you can actually use it now, or a designer could use it as, like, very adventurous photography. It's still, like...

You know, for books or posters or magazines or printouts, all of the things where before, with a 1024 by 768 image or whatever it was, it just wasn't realistic to use it for much. But now that it's scaled up so much, and I know other models do upscaling too, having it just built in like this means you actually do it.

Yeah, I did some Time edition magazines. I saw someone do a really realistic-looking Time magazine here. But it's pretty good. I mean, I'm sure with better prompting, you could get it looking even more realistic. But at a distance that really would fool you that this is a real, legit magazine. Overall, just the quality and detail in the image now is truly insane.

The level of detail is amazing. I did the classic dolphin milk advertising, you know, "now with more omega-3s," and a happy dolphin on the image. I did it as, like, a magazine poster, and that was really cool. It still struggles on legibility of text.

When there's a lot of text. If you just have a few things, it's fine, but if you have a lot, it sort of gets into gibberish territory. That's another one. You gave me the prompt for that one and I just changed it to This Day in AI.

Yeah, there's some really... like, this is the thing, I'm just so bad at prompting, and yeah.

It's absolutely an art form, prompting, and this is something I think we probably need. We have it in some of the models, but not all of them, but we need the AI-assisted prompt improvements, I think, in a lot of cases, because if you just type a prompt that you think of yourself, generally speaking, you're going to get something that isn't as good as it could be.

Good prompting, yeah. I think it's also similar to using a model a lot, where you get used to how to prompt it in a way to get the output you desire. I think image prompting is very similar. When I spent a lot of time with FLUX and Ideogram, I started getting better and better, just pushing it more and more and figuring out how to make it get where I want. But I've also found occasionally that just asking Claude or GPT-4 to write the prompt for me has worked far better. So just, like, "Here is my prompt."

Yeah, I think I'll add that, and we've got that prompt-writing ability in the system. We've just got to activate it for this model.

And I think it's well worth it. So, a few things to note. It says a fast ten-second generation time per prompt, but obviously, like, if you have a higher sample rate, that would be longer.

It's also priced very competitively. It's six cents per image. I mean, that's silly cheap for the image models. Yeah, and they're hosting it themselves,

which is interesting, because the previous iterations of FLUX used third-party hosts, but they seem to be pushing running their own API now, which I always like better. It's better to go to the source.

Yes. They're 2.5 times faster than comparable high-resolution competitors, with a four-times resolution boost compared to standard FLUX 1.1. And then they say the raw mode innovation is focused on authentic, candid photography aesthetics and a reduced synthetic look in generated images, although you seemed to get a lot of synthetic looks still.

Yeah, yeah, exactly. I mean, I was doing some weird and unrealistic things, but not that weird. And yeah, I definitely got the kind of images where you would be like, "That was generated by AI." I'm seeing it a lot more online now, Facebook and memes and whatever, where you're like, "That's definitely an AI image." And I definitely got a lot of those with bad prompting.

All right. So I think the real winner here is just us, the people using these AI image models, because we're just getting a far superior model. I think the great thing about FLUX is you can really just uncheck the censorship if you don't want it to say no to you. And I don't mean that for, like, nefarious reasons.

I just mean that, as we've talked about with computer use, Claude or other models just stop us in our tracks doing something because they don't like it, and you have to gaslight it a bit. So it just seems like a stupid restriction, in my opinion. But the other thing I really wanted to talk about during the week, it's, like,

it's a little bit beta, but I think it's pretty relevant to some of the stuff that we have covered on the show previously. So we have ranted time and time again about Google and how they offer up their API, how their documentation's bad, and how it's really hard to get an API key. We have ranted on many episodes about this.

And to their credit, I do think they're listening and things are getting better. But this guy over on X, levelsio, he does, like, Photo AI,

and, I don't know, a bunch of startups. I think he's appeared on, what's that podcast with the guy that loves everyone? I forget. Anyway, he's appeared on podcasts. He's pretty well known, and he posted, "Trying Gemini today.

"It feels like Google is trying to scare customers into not using it, with legal popups absolutely every step of the way. Admittedly it's gotten better, and now supports CORS requests, yada yada, but still isn't OpenAI-compatible yet. Took nine clicks to get an API key, and the legal popups are way too crazy." Now, to his credit, Logan Kilpatrick replied immediately, saying, you know, three clicks on AI Studio, and then he's saying it's taken over by lawyers. But it started this whole sort of fight, and then that led to one of their engineers, and I'll try and find it in a moment, literally being like, you know, "skill gap," or, you know, there's a skill problem here. So then this, like...

As in, you don't have the skills to navigate our system to find the API?

Yeah, basically. And then he took it even further, and I'll try and find this, and literally created mugs with "skill issue" on them, and is selling them on his website. So the mug is in Google colors.

Yeah. But I mean, this is a typical Google attitude, isn't it? Thinking they're better than everyone when really they're not. Their models are very good.

Like, I tried Gemini several times this week because I was having some very, very difficult problems that I couldn't solve, and I was really getting a bit stressed and frustrated with that. And so I tried Claude. Claude had no idea, and I just could not solve this problem. So what I thought was, Gemini has a two million context window.

Why don't I just give it all the code and just say, holistically, what is wrong here? Like, can you please help me? And at first I was impressed, because it was able to understand what the code did.

And it definitely could answer questions about it. But then its solutions were, like, the most generic shit. I could have asked just a general question without any of the code and gotten the same answer. So yeah, it could take it in, it could understand it, but then synthesizing that into a useful answer just seemed beyond it. And I don't know.

It's just not a model that I get good results with. In theory it's good, but I just get the impression, because remember, with all these models, you can scale up the number of tokens that you can take in and you can scale up the number of tokens that come out, but the quality diminishes, right? It's a graph like this. There's a trade-off.

And so I just wonder if, with Gemini, they've sort of gone, well, you know, the boss has said, "Well, how big can we make it? Can we make it in the millions?" "Yeah, yeah.

"Two million. No worries." And they've done it not thinking about whether or not that's a good thing.

And I just, I don't know, I don't like that. I didn't know that story. I don't like that kind of arrogance.

And we can't blame it all on one developer. But to act like Google is somehow like, "Okay, you need skills to do it." It's like, well, the person knew what was needed. He needed an API key. Just because he doesn't like it... it isn't a skill to navigate someone else's UI that they made up. That's not a skill. I just think that that's wrong.

Yeah. I mean, again, to their credit, they actually are now responding and taking this feedback seriously and doing things about it. But as you said, I think until we get a new Gemini model, or just something that's better tuned, like Sonnet, like the Sonnet tuning... I just don't understand how all these other labs don't look at Sonnet and go, "Why do people actually like it?

"Oh, maybe it's just because they've tuned it to respond really well on the things that people use the models for." And to me, that's all you need. Tuning is all you need. And until they figure it out... like, look at what Anthropic did with computer use. They just tuned it to do computer use really well, how you would expect. I don't know. I think the other guys are going to catch up. I think there's a lot of arrogance right now with OpenAI, and probably a lot of fear internally as well that Anthropic is crushing them with their models. And they accidentally leaked o1 for, like, a brief period the other day, I guess to get attention. But we will see.

Yeah, I don't know about the o1 models. I was using them for a little while for problems. But my reaction now is they're slow, they don't necessarily give better answers, and also the method of output is just a really, really frustrating way to work.

I guess because of the way they're doing this iterative style of solving the problem, they seem to want to break problems down into all these little segments inside. The actual answer for you, the end user, just isn't as useful. I really rarely find myself going to it anymore. I did for a while.

I must say I was using o1-mini for a while with fairly good success. But I don't know, I just don't get the great results from it that I do from other models, like even the new Haiku I've been using a bit this week. That's the other thing we should mention: Haiku 3.5 is now available in Simtheory as well.

And I had it on because I was testing it, then left it on, and I'm like, well, it's answering all my questions. I may as well stick with it. It's fast. It's good.

What did you make of how they increased the price when they launched it? Like, they announced it with a certain price and then they're like, "Oh, because it's actually smarter than we thought, we're putting the price up."

How interesting. It's sort of like when they make CPUs and stuff, and some turn out to be more powerful than they say on the box, and they clock them down. In this case, they're like, "Actually, it's better. We'll just raise the price."

I think I've just been so caught up with, like, this workspace computer concept. I haven't really had a chance to play around with Haiku 3.5, but I do think we should report back next week on our thoughts.

The weird part about that model is, like, where does it sit? From what I understand about the pricing, it's not that far off using, like, Sonnet 3.5. So I just don't know where it sits, because people are saying, oh, you would go to, like, Google's Flash model or...

Well, the other thing is that it's not multimodal right now. There's no images in it. So for me that sort of removes its ability... it can't do workspace mode, obviously, and just not having images is a bit of a deficiency, I think.

Yeah. All right, well, all of my training courses have now been completed. I passed with a hundred percent marks there. So thank you, workspace computer, in the background. Literally, I think I've been so distracted this entire recording because I keep being like, "Okay, now do the next one, please," and off it goes. So what a use case.

I really wonder how long it is before it's like, "Okay, here's a workspace computer. Here's a thousand dollars in an account, right? Here's the link to your bank or whatever. Here is a trading website. Here is a research resource. Go and make me some money."

Yeah, I think you'd literally have it

sitting there 24/7 looking for trends, doing research, and making you money. I mean, it's not that crazy now that you could do that.

Yeah, I don't think it's far off at all. I mean, the question is, we obviously need improved models, and we now, from working with it quite a bit, understand what you would need to do. But yeah, oh man, it's getting close and exciting, and I am so curious. I know a lot of people say this, but I'm genuinely excited to see what our community does with these workspace computers.

I'm predicting we will almost immediately run into a wall of issues, like where it can't do this, can't do that. But I think the answer is that we are on standby, ready to iterate and improve and break through those issues. And I think that's where we'll see the real innovation: working as a community to overcome the common issues, which will then have this propagation effect where it's like, okay, if you can handle that scenario, that'll cover all these other ones as well, and suddenly you'll have something that can really,

really accomplish a lot. Yeah, and I think that's the thing here. You've got to remember this stuff is extremely experimental. But man, it's an exciting, exciting future ahead.

It's quite a thrill when you see it do something real.

That's for sure. Yeah, all right. So, any final thoughts on the week? I know we've been a bit out of touch this week because all we do is sit and try to make our workspace computer work until 2 a.m. every night. But any...

I'm going to get it to automate a coffee in a minute on over its only end, and we're going to keep cranking, and we're going to see what the model is capable of. I really want to spend a bit of time over the weekend just trying various use cases and gradually improving it. And, you know, just seeing where we can get with that.

I mean, that's really my motivation: I just want to see what is possible. The other thing, probably a final thought and something worth mentioning. I know you're not as worried about it as me, but very, very soon I want to get GPT-4o working as well.

I want a legit alternative so we can compare them. I want to see, when given the same prompt and the same information, same situation, how does Omni perform versus a model that is supposedly, and probably, tuned for this purpose?

I'll be very curious. And then I also want to try the Microsoft, like, UI segmentation thing that you mentioned, I forget the name of it, with both models.

So, like, together and separate, and see, does that bring a noticeable improvement? Because I really feel like for this to work, it's going to be a combination of technologies. We already have combined a bunch of technologies to get it so anyone can have this desktop-in-the-cloud kind of thing going on. And I think that a couple more in the mix might make it even better.

All right, tell us in the comments, if you do watch on YouTube, what you would like to see us do throughout the week with our workspace computers, because we will try various things if you so wish.

And you can try yourself.

Yeah, and obviously try yourself. You can go over to simtheory.ai. That's the right URL, right?

Yes, it's excellent. Good work. All right, that'll do us for this week. Thank you for listening.

If you like the show, please consider leaving a review or thumbs up and all the things that you're meant to say to do each week. We'll see you again next week. Goodbye.

And actually, before we go, this one hundred percent score, well done. A hundred percent. No, I'm even doing additional training now.

They're onna know I I can point, why not? I mean, I want to be really qualified. So good SHE anxious.

You can't stop. Can't stop. I'll see you next week.

EP84: It ACTUALLY works! 54:56

This Day in AI Podcast