
Model Plateaus and Enterprise AI Adoption with Cohere's Aidan Gomez

2024/11/21

No Priors: Artificial Intelligence | Technology | Startups

Chapters

Aidan Gomez discusses his journey from growing up in Canada to co-authoring the influential paper 'Attention is All You Need' during his internship at Google Brain.
  • Gomez was influenced by his environment at the University of Toronto, where Geoffrey Hinton taught.
  • He managed to secure an internship at Google Brain, which led to his involvement in the transformer paper.

Shownotes Transcript

Hi listeners, and welcome to No Priors. Today we're hanging out with Aidan Gomez, co-founder and CEO of Cohere, a company valued at more than five billion in 2024, which provides AI-powered language models and solutions for businesses. He founded Cohere in 2019, but before that, during his time as an intern at Google Brain, he was a co-author on the landmark 2017 paper "Attention Is All You Need." Thanks for coming on today.

Yeah, thank you.

Happy to be here.

Maybe we can start just a little bit with the personal background. How do you go from growing up in the woods in Canada to, you know, working on the most important technical paper in the world?

A lot of luck and chance. I happened to go to school at the place where Geoff Hinton taught, and obviously Geoff recently won the Nobel Prize. He's kind of attributed with being the godfather of deep learning.

At U of T, the school where I went, he was a legend, and pretty much everyone studying computer science there wanted to get into AI. So in some sense I feel like I was raised into AI. As soon as I stepped out of high school, I was steeped in an environment that really saw the future and wanted to build it. And then from there it was a bunch of happy accidents. I somehow managed to get an internship with Lukasz Kaiser at Google Brain, and I found out at the end of that internship that I wasn't supposed to have gotten it; it was supposed to have been for PhD students. They were throwing a goodbye party for me, the intern, and Lukasz was like, okay, so Aidan, you're going back, how many years have you got left in your PhD? And I said, I'm going back into third year of undergrad, and he was like, we don't do undergrad interns. So I think it was a bunch of really lucky mistakes that led me to that team, working on really interesting, important

things at Google.

What convinced you that you should start Cohere?

Yes, I bounced around. When I was working with Lukasz and Noam and the transformer guys, I was in Mountain View. And then I went back to U of T. I started working with Hinton and my co-founder Nick at the Google Brain lab in Toronto, and then I started my PhD and I went to England, and I was working with Jakob Uszkoreit, another transformer paper author,

in Berlin. And we had him on. Nice.

Yeah, yeah, okay. Fan of the pod. Good, good. So I was working with Jakob in Berlin, and then I was also collaborating remotely with Jeff Dean and Sanjay on Pathways, which was, you know, a bigger-than-a-supercomputer training program.

The idea was wiring together supercomputers to create a new, larger unit of compute to train models on. At that stage, GPT-2 had just come out, and it was pretty clear from the trajectory of the technology that we were on a very interesting path, and that these models, which were ostensibly models of the internet, models of the web, were going to yield some pretty interesting things.

So I called up Nick, I called up Ivan, my co-founders, and I said, you know, maybe we should figure out how to build these things. I think they're going to be

useful.

For anyone who doesn't know yet, can you just describe at a high level what Cohere's mission is, and then what the models and products are?

The mission, the way we want to create value in the world, is by enabling other organizations to adopt this technology and make their workforce more productive or transform their products and the services that they offer. So we're very focused on the enterprise. We're not going to build a ChatGPT competitor. What we want to build is a platform and a series of products to enable enterprises to adopt this technology and make it valuable.

And in terms of your north star for how you organize the team and invest, you obviously come from a research background yourself. How much do you think Cohere's success is dependent on core models versus the other platform and go-to-market investments you make?

It's all of the above. The models are the foundation, and if you're building on a foundation that doesn't meet the customer's needs, then there is no hope. So the models are crucial, and they're like the heart of the company.

But in the enterprise world, things like customer support, reliability, and security count for a lot. And so we've heavily invested on both sides. We're not just a modeling organization, we're a modeling and go-to-market organization, and increasingly product is becoming a priority for Cohere, so figuring out ways to shorten time to value for our customers.

Over the past eighteen months, since the enterprise world sort of woke up to the technology, we've watched folks build with our models, seeing what they're trying to accomplish and the common mistakes that they make. That's been helpful, and sometimes frustrating, right, watching the same mistake again and again. But we think there's a huge opportunity to help enterprises avoid those mistakes and implement things right the first time. That's really where we're pushing towards.

Yeah, can we make that a little bit more real? What is the mistake that frustrates you most, and how can product meet that?

Yeah, well, I think all language models are quite sensitive to prompts, to the way that you present data. They all have their own individual quirks; the way that you talk to one might not work for another. And so when you're building a system like a RAG system, where there is an external database, it really matters how you present the retrieved results to the model.

It matters how data is actually stored in those databases; the formatting counts. And these small details are often lost on people. They overestimate the models, they think that they're like humans, and that has led to a lot of repeat failures.

People try to implement one of these systems, they don't know about the idiosyncratic elements of implementing one properly, and then it fails. And so in 2023 there were a lot of these POCs,

a lot of people trying to get familiar with the technology, wrap their heads around it. And a lot of those POCs failed because of unfamiliarity, because of these common errors that we've seen. So moving forward, we have two approaches.

One is making the models more robust, so the model should be robust to a lot of different ways that you present data. And the second piece is being more structured about the product that we expose to the user. So instead of just handing over a model and saying, you know, prompt it, good luck, we're actually putting more structure around it, creating APIs that more rigorously define how you're supposed to use the model. These sorts of pieces, I think, just reduce the chances of failure and make these systems much more usable for the user.
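To make that concrete, here is a minimal sketch of the kind of structure he is describing for a RAG call: retrieved documents handed to the model in one consistent, rigid format. The `retrieve` and `chat` helpers below are hypothetical stand-ins for a vector-store lookup and a chat-completion call, not any particular vendor's SDK.

```python
# Minimal RAG prompt-assembly sketch. `retrieve` and `chat` are hypothetical
# stand-ins for a vector-store lookup and a chat-completion call.
def answer_with_rag(question, retrieve, chat, top_k=4):
    docs = retrieve(question, top_k=top_k)  # e.g. [{"title": ..., "snippet": ...}, ...]
    # Format every document the same way, so the model always sees one consistent layout.
    context = "\n\n".join(
        f"[doc {i}] {d['title']}\n{d['snippet']}" for i, d in enumerate(docs)
    )
    prompt = (
        "Answer the question using only the documents below, "
        "and cite documents by their [doc N] tag.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return chat(prompt)
```

The point of wrapping this in an API rather than leaving the prompt to each user is exactly the robustness he describes: the formatting decisions are made once, in one place.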

What are people trying to do? Can you give us a flavor of some of the biggest use cases you see in the enterprise? It's super broad.

It spans pretty much every vertical. I mean, the common things are like Q&A, so speaking to a corpus of documents. For instance, if you're a manufacturing company, you might want to build a Q&A bot for your engineers or your workers on the assembly line, and plug in all the manuals of the different tools and the diagnostic manuals for common errors and parts, and then let the user chat with that instead of having to open up a thousand-page book and try to find what they need. Similarly, Q&A bots for the average enterprise worker.

So plugging in your IT FAQ, your HR docs, all the things about your company, and having a centralized chat interface onto the knowledge of your organization so that people can get their questions answered. Those are some of the common ones. Beyond that, there are more specific functions that we power.

A good example might be a healthcare company. They have these longitudinal health records of patients, which consist of every interaction that patient has with the healthcare system, from visits to a pharmacy to the different labs or tests that they're getting to doctor's visits, and they can span decades. So it's a huge record of someone's medical history. And typically what happens is the patient will call in, and they'll ring up the receptionist and be like, my knee hurts.

I need an appointment. And the doctor then needs to kind of comb through the past few entries: has this come up before? And maybe they miss something that was two years ago, because they only have fifteen minutes before an appointment. But what we can do is feed the entire history in alongside the reason they're coming in, so it's contextually relevant, right, what they said they're coming in for, and surface a briefing for the doctor. This tends to be, one, dramatically faster for the doctor to review, but it also often catches things that a doctor couldn't possibly review before every patient meeting.

They're not going through twenty years of medical history; that is just not possible. But the model can do that.

We can do that in under a second. So those are the sorts of functions that we're seeing: summarization, Q&A bots. A lot of these you might think of as mundane, but the impact is immense.

We see tons of startups working on problems such as, let's say, enterprise search overall, or specialized applications like technical support for a particular vertical, or even looking at health records and reasoning against them and retrieving from them. How do you think about the end state, or, there is no end state, but what a stable equilibrium looks like for how enterprises consume from, let's say, specialist AI-powered application providers versus custom applications built in-house with AI platforms and model APIs?

I think it's going to be a hybrid. You can imagine it like a pyramid. At the bottom of that pyramid is the stuff every organization needs, like a copilot, a generalist chatbot in the hands of every single employee to answer their questions. And then as you head up the pyramid, it gets more specific to the company itself or the specific domain, product, or service that they operate or offer.

And as you push up that pyramid, it's much less likely you're going to find an off-the-shelf solution to address it, and so you're going to have to build it yourself. What we've pushed organizations to do is have a strategy that encompasses that full pyramid. Yes, you need the generalist standard stuff. Maybe there are some industry-specific tools you can go and buy. But then, if you are building, don't build the things that you could buy; instead, focus on the stuff that no one is going to sell to you and that gives you a unique competitive advantage.

So we worked with this insurance company, and they insure large industrial development projects. It turns out, I knew nothing about the space, but what they do is: there's an RFP put out by a mine or something, whatever the project is, for insurance, and they have actuaries jump on the RFP and do tons of research about, you know, the land that it's on, the potential risks, and so on. And it's essentially a race; whoever responds first usually gets it.

And so it's a time-based thing: how quickly can your actuaries put forward a good, researched proposal? And what we built with them was like a research assistant system.

So we plugged in all the sources of knowledge that these actuaries go to to do their research, via RAG, and we gave them a chatbot, and it dramatically sped up their ability to respond to RFPs. And it grew their business, because they were just winning many more of them.

And so it's tough, because we build horizontal technology, and it's kind of like a CPU. I don't know all the applications of an LLM, right? It's so broad. And really, the deep insight, the competitive-advantage thing that puts you ahead, is listening to the customer and letting them tell you what would put them ahead. So a lot of what we've been doing is just being a thought partner, helping brainstorm these projects and ideas that are strategic to them.

I'd wager that, you know, this company is winning because the vast majority of their competitors haven't been able to move as quickly to adopting and building up a research assistant product like this. What is the biggest barrier you see to enterprise adoption generally?

I think the big one is trust. So security is a big one, in particular in regulated industries like finance and healthcare. Data is often not in a cloud, or if it is in a cloud, it can't leave their VPC. And so it's very locked down, very sensitive.

And so that's a unique differentiator of Cohere: the fact that we haven't locked ourselves into one ecosystem, and we're flexible to deploy on-prem if you want us, in VPC, outside of VPC, literally whatever the customer wants. We're able to touch more data, even the most sensitive data, and provide something that's more useful. So I would say security and privacy is probably the biggest one.

Beyond that, there's knowledge, right, the knowledge of how to build these systems. They're new, it's unfamiliar to folks; you know, the people with the most experience have a few years of experience. So that's the other major piece, and that bit, I think, is honestly just a time game. Eventually developers will become more familiar with building with this technology, but I think it's going to be another two or three years before it really permeates.

Do you think there's a traditional

hype cycle for enterprise technologies here, a problem for most technologies but in particular enterprise? You know, there's this trough of disillusionment concept, where people get very excited about something and it ends up being harder to apply or more expensive than they thought. Do we see

that in AI?

I'm sure we see some of it, for sure. But I think, honestly, the core technology is still improving at a steady clip, and new applications are getting unlocked every few months. So I don't think we're in that trough of disillusionment yet. It feels like we're super early. It feels like we're really, really early.

And if you look at the market, this technology just unlocks an entire new set of things that you could build. You just fundamentally could not build them before, and now you can. And so there's a resurfacing of technology, products, and systems that's underway.

Even if we didn't train a single new language model, like, okay, all the data centers blow up, we can't improve the models, we only have what we have today, there's a half decade of work to go integrate this, to build all these things: to build, you know, the insurance RFP response bot, to build the healthcare record summarizer. There's a half decade of just resurfacing to go do. There is a lot of work ahead of us.

I think we're kind of past that point. There was a question of, oh, is there too much hype, is this technology actually going to be useful? But it's in the hands of hundreds of millions of people.

Now it's in production, and there's very clear value. The project now is putting it to work and delivering it to the world.

On this question of integration into the real world, some piece of it is of course interfaces and change management and figuring out how users can understand the model outputs, guardrails, all of that. But specifically when we think about the model and specialization, is there some framework you offer customers, or that you use internally, around what version of it they should invest in? So we have pre-training, post-training, fine-tuning, retrieval, and, in a traditional sense, prompting, especially as we get longer context. How do you tell customers to make sense of how to specialize?

It really depends on the application. There's some stuff, for instance, we partnered with Fujitsu, which is like the largest SI in Japan, to build a Japanese language model. There's just no way you can do that without intervening on pre-training.

You can't really fine-tune or post-train Japanese into a model effectively, and so you have to start from scratch. On the other side, there are more narrow things, like if you want to change the tone of the model, or you want to, I don't know, change how it formats certain things.

I think you can just do fine-tuning; you can take the end state. And so there is this gradient. What we usually recommend to customers is start from the cheapest, easiest thing, which is fine-tuning, and then work backwards.

So start with fine-tuning, then go back into post-training, right, like SFT and RLHF, if you need to. And you know, it's kind of a journey, right? As you're talking about a production system and the constraints are getting higher and higher, you potentially will need to touch pre-training, hopefully not all of pre-training. Hopefully it's like ten percent of pre-training at the very end, or maybe twenty percent of pre-training. But yeah, that's usually how we think about it: this journey from the simplest, cheapest thing to the most sophisticated but most performant.
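As a rough illustration of the cheapest rung on that ladder, here is a minimal supervised fine-tuning sketch using the Hugging Face Trainer. The model name and the enterprise_examples.jsonl file are placeholders chosen for illustration; this is not Cohere's pipeline, just what "start with fine-tuning" can look like under those assumptions.

```python
# Minimal SFT sketch: adapt a small causal LM to a handful of enterprise examples.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "gpt2"  # placeholder open model, purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical JSONL file of {"text": "<prompt + desired completion>"} rows.
dataset = load_dataset("json", data_files="enterprise_examples.jsonl")["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False makes the collator build next-token labels for a causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Only if this rung fails to meet the requirement would you move back toward post-training and, eventually, continued pre-training, which is the escalation he describes next.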

Moving along the gradient from the cheapest thing makes sense to me. The idea that any enterprise customer will invest in pre-training is, I think, a bit more controversial. I believe some of the lab leaders would say, like, nobody should be touching this, and it doesn't make any sense for people, from the scale of compute and data effort required, and just the talent required, to do pre-training in any sort of competitive way. How would you react to that?

I think if you're a big enterprise and you're sitting on a ton of data, like hundreds of billions of tokens of data, pre-training is a real lever that you're able to pull. I think for most SMBs, and certainly startups, it makes no sense; you should not be pre-training a model.

But if you're a large enterprise, I think it should be a serious consideration. The question is how much pre-training. It's not like you need to start from scratch and do a, you know, fifty-million-dollar training run.

You can do a fraction; you can do a five-million-dollar training run. That's what we've seen succeed, these sorts of continued pre-training efforts. Yeah, that's one of the offerings that we have.

But of course, we don't jump straight into that. You don't need to spend massively if you don't want to. And usually the enterprise buying psychology, the technology adoption cycle, is quite slow.

So you have time to move back into it. I would say it's totally at the customer's direction. But to the folks who say that no one should be pre-training,

no one outside of, let's say, AGI labs should be pre-training.

That's empirically wrong.

Maybe that's a good jumping-off point into talking a little bit more about what's going on in the technical landscape and also what that means for Cohere. What is the bar you set internally for Cohere? You said the models are the foundation, and I believe you've also said there is no market for last year's models. How do you square that with the expense, the capital expense, of competition and the rise of open-source models?

Well, I think you have to spend. There is some minimum threshold that you need to be spending in order to build a model that's useful. But things get cheaper: the compute to train the model gets cheaper, and the sourcing of the data, well, in some directions it gets cheaper and in others not. With synthetic data, it's gotten dramatically cheaper.

But with expert data, it's getting harder and harder and more expensive. And so what we've seen is, today you can build a model that is as good as GPT-4 in all the things that enterprises might care about for ten million dollars, twenty million dollars, just orders of magnitude less than what was spent to develop that model. And so if you're willing to wait six months or a year to build the technology, you can build it at a fraction of what those frontier labs paid to develop it.

And that's been a key part of Cohere's strategy: we don't need to build that thing first. What we'll do is figure out how to do it dramatically cheaper, and we'll focus on the parts of it that matter to our customers. So we'll focus on the capabilities that our customers really depend on.

Now, at the same time, we still have to spend. Relative to a regular startup, we have to pay for a supercomputer, and those things cost hundreds of millions of dollars a year. So it is capital-hungry, but it's not capital-inefficient. It's very clear that we'll be able to build a very profitable business off of what we're building. So that's the strategy: don't lead, don't burn, you know, three, five, seven billion dollars a year to be at the front. Be six months behind and offer something to market, to enterprises, that actually fits their needs at a price point that

makes sense.

Why spend on the supercomputer and the training yourself at all, if there are increasingly good open-source options?

Well, not really. So for Llama, yeah, you get the base model at the end, when it's cooled down and it has zero gradient. You get the post-trained model at the end, when it's cooled down and has zero gradient.

Taking those models and trying to fine-tune them is just not as effective as building it yourself, and you have many fewer levers to pull than if you actually have access to the data and can change the data that goes into that process. And so we feel that by being vertically integrated and by building these models ourselves, we just have dramatically more leverage to offer our customers.

Maybe if we go to projection, we'll hit on a few things that you've mentioned as well. Where are we in scaling laws? How much capability improvement do you expect over the next few years?

We're pretty far along, I would say; we're starting to enter into a sort of flat part of the curve. And we're certainly past the point where, if you just interact with a model, you can know how smart it is. The vibe checks are losing utility. And so instead, what you need to do is get experts to measure within very specific domains, like physics, math, chemistry, biology.

You need to get experts to actually assess the quality of these models, because the average person can't tell the difference at this stage between generations. Yes, there's still much more to go, but those gains are going to be felt in very specialized areas and have impacts on more research-oriented domains. I think for enterprises, and the general sorts of tasks that they want to automate or tools that they want to build, the technology is already good enough, or close enough, that a little bit of customization will get them there.

So that's sort of the stage that we're at. There is a new unlock in terms of the category of problems that you can solve, and that's reasoning. Online reasoning is something that has been missing from these models; they previously didn't have an internal monologue.

They didn't really think to themselves. You would just ask them a question and then expect them to immediately answer that question. They couldn't reason through it. They couldn't fail, right? Make a mistake, catch that mistake, fix it, and try again.

So now we have reasoning models coming online. Of course, OpenAI was the first to put it into production, but Cohere has been working on it for about a year now. This category of technology, I think, is really interesting. There's a new set of problems that you can go solve, and it also changes the economics.

So before, if I had a customer come to me and say, Aidan, I want your model to be better at X, or I want a smarter model, I would say, okay, you know, give us six to twelve months. We need to go spin up a new training run, train it for longer, train a bigger model. That was the only lever we had to pull to improve the performance of our product.

Now there is a second lever, which is you can charge the customer more. You can say, okay, let's spend twice as many, you know, tokens, or let's spend twice as much time at inference time, and you'll get a smarter model. So there's a much nicer product experience:

okay, you want a smarter model? You can have it today.

You just need to pay this. And so they have that option; they don't need to wait six months. And similarly, for model builders, I don't need to go double the size of my supercomputer to hit a requested intelligence threshold. I can just double the amount of inference-time compute that my customers pay for. So I think that's a really interesting structural change in how we can go to market and what products we can offer to the customer.
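A simple way to picture that second lever is self-consistency style sampling: spend more inference-time tokens by drawing several reasoning traces and keeping the majority answer. A sketch, assuming a hypothetical generate() function that samples one completion:

```python
from collections import Counter

def answer_with_more_compute(question, generate, n_samples=8):
    """Trade inference-time compute for quality: sample several reasoning
    traces and majority-vote the final answers (self-consistency)."""
    prompt = (f"{question}\nThink step by step, then give the final answer "
              "on its own line as 'Answer: <value>'.")
    finals = []
    for _ in range(n_samples):  # more samples = more tokens = often better answers
        completion = generate(prompt, temperature=0.8)
        if "Answer:" in completion:
            finals.append(completion.rsplit("Answer:", 1)[-1].strip())
    return Counter(finals).most_common(1)[0][0] if finals else None
```

Doubling n_samples roughly doubles the token spend, which is the consumption-based dial he is describing, as opposed to waiting for a bigger training run.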

I agree. I think it's perhaps undervalued in the ecosystem right now how much more appealing it should be to all types of customers that you can move from a capex model of improvement to a consumption model of improvement, right? And you know, these are apples-and-oranges things, but I think you'll see people invest a lot more in solving problems when they don't have to pony up for a training run and have this delay, as you described.

Yeah, it hasn't been clocked; people haven't really priced in the impact of inference-time compute delivering intelligence. There are loads of consequences, even at the chip layer, right, like what sort of chips you wanna build, what you should prioritize for data center construction. If we have a new avenue, which is inference-time compute, that doesn't require this densely interconnected supercomputer, it's fine to have nodes; you can do a lot more locally, and less distributed. I think it has loads of impact up and down this chain, and it's a new paradigm of what these models can do and how they do it.

And you are dancing around this, but your average person doesn't spend that much time thinking about what reasoning is, right? Do you have any intuition you can offer people for what types of problems this

allows us to tackle better?

Yeah, I think any sort of multi-step problem. There are some multi-step problems you can just memorize, which is what we've been asking models to do so far. Like solving a polynomial, right? Really, that should be approached multi-step, the way that humans solve it.

We don't just get given a polynomial and then, boom. There are a few that maybe we've memorized, right? But by and large, you have to work through those problems, break them down, solve the smaller parts, and then compose the overall solution.

And that's what we've been lacking. We've really lacked it, and we've had stuff like chain of thought, which has enabled that, but it's sort of like a retrofit.

It's sort of like we trained these models to just memorize and parrot things back, and we found a nice little hack to elicit the behavior that mimics reasoning. I think what's coming now is from scratch: the next generation of models being built and delivered will have that reasoning capability burned in from the start.

And it's not surprising that it wasn't there to begin with, because we've been training these models off of the internet, and the internet is like a set of documents which are the output of a reasoning process, with the reasoning all hidden. Like, a human wrote an article and, you know, spent weeks thinking about this thing, deleting stuff, and so on, but then posted the final product, and that's what you get to see.

Everything else is implicit, hidden, unobservable. And so it makes a lot of sense why the first generation of language models lacked this inner monologue. But now what we're doing, with human data and with synthetic data, is explicitly collecting people's internal thoughts. We're asking them to verbalize it, we're transcribing that, and we're going to train the model on that part of the problem-solving process.

And so I'm really excited for that. I think right now it's extremely inefficient and it's quite brittle, similar to the early versions of language models. But over the next two or three years, it's going to become incredibly robust and unlock a whole new set

of problems.

What is the basic driver of the slowdown, you know, reaching the flat part of the curve that you describe with scaling? Is it the cost of increasingly expert data and collecting, as you said, the hidden reasoning traces, which is harder and more expensive than just taking the data on the internet? Is it the difficulty of having evals for increasingly complex problems? Is it just the overall cost of compute? Why do you think that flattening is happening?

When someone's making an oil painting, they do a base coat and just cover the whole canvas, and then they sort of paint in the shapes of, you know, the mountains and the trees, and as you get to more and more detail, you're bringing out very fine brushstrokes.

There are a lot more of them that you need to make. Before, you could just take a big wedge and throw paint across the canvas and accomplish the thing that you wanted to accomplish. But as you start to get more and more targeted, more and more detailed in what you're trying to accomplish, it requires a much finer instrument. And that's what we've seen with language models.

We were able to do a lot of the common, simple, easy tasks quite quickly. But as we've approached much more specific, sensitive domains like science and math, that's where we've started to see resistance to improvement. And in some places, we've gotten around that by using synthetic data, like in code and math.

These are places where the answer is very verifiable. You know when you're right or you're wrong, and so you can generate tons of synthetic data and just verify whether it's correct or not. You know it's correct? Okay, let's train on it. In other areas that require testing and knowledge in the real world, like in biology, like in chemistry, there's a bigger bottleneck to creating that sort of data.
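A toy version of that generate-and-verify loop, with a hypothetical generate() sampler and arithmetic as the verifiable task; real pipelines use things like unit tests for code or symbolic checks for math:

```python
import random

def make_problem():
    # Toy verifiable task: two-digit multiplication with a known ground truth.
    a, b = random.randint(10, 99), random.randint(10, 99)
    return f"What is {a} * {b}?", a * b

def build_verified_dataset(n_attempts, generate):
    kept = []
    for _ in range(n_attempts):
        question, truth = make_problem()
        completion = generate(
            f"{question}\nThink step by step, then end with 'Answer: <number>'."
        )
        answer = completion.rsplit("Answer:", 1)[-1].strip()
        # Keep only samples whose final answer matches the verifiable ground truth.
        if answer == str(truth):
            kept.append({"prompt": question, "completion": completion})
    return kept
```

The bottleneck he points to is exactly the verifier: in biology or chemistry there is no cheap check to replace that one-line comparison, so the data has to come from experts instead.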

You have to go to experts who know the field, who have experienced it for decades, and basically distill their knowledge. But eventually you run out of experts and you run out of that data, and you're at the frontier of what humans know about X, Y, or Z. There's just increasing friction to fill in these much finer details of the portrait. I think that's a fundamental problem.

I don't think there are any shortcuts around that. You know, at some stage we're going to have to give these models the ability to run their own experiments, to fill in areas of their knowledge they're curious about, but I think that's quite a ways away. And it's going to be tough to scale; it will take many, many years to do.

We will do it, we're going to get there, one hundred percent. But for the stuff that I care about today with Cohere, I think there are many applications that this technology is ready for production for. And so the primary focus is getting it into production and ensuring that our economy adopts the technology and integrates it as quickly as possible, gets that productivity uplift. And so while that technical question is super interesting, about,

you know, why is progress slowing down, I think it should be kind of obvious, right? The models are getting so good that they're running into the thresholds of human knowledge, which is really where they're getting their capabilities from.

You are so grounded in, you know, getting the capabilities we have into production, and they will continue to progress even if the curve is flattening. I think I know the answer, but how much do you, or does Cohere, think about AGI and takeoff, and does that matter to you?

AGI means a lot of things to a lot of different people. I believe in us building generally intelligent machines, completely; it's like, of course we're going to do that. But whether it's been completed...

How soon?

We're already there. It's not binary, it's not discrete, it's continuous. And we're, like, well on our way; we're pretty far down that road.

There are some definitions elsewhere in the industry that, even if you have this continuous function, you can put a breakpoint at, like, if there's intelligence that replaces an educated adult professional in any digital role. Your view is there's no really important breakpoint that's happening?

That sort of objective checklist thing, like when you have checked all these boxes, then you've got it: I think you can always find a counterexample, like, oh well, it hasn't actually beaten this one human over here who's doing this random niche thing. So I think it's pretty continuous, and we're quite far along. But the AGI that I really don't subscribe to is a superintelligence takeoff, self-improvement just leading to the Terminator that exterminates us all, or

Or creates abundance. Unclear, yeah.

No, I think we will be the ones to create abundance. We don't need to wait for this god to emerge and do it for us.

Let's go do it with the tech that we're building, you know. We don't need to depend on that; we can go do it ourselves.

We will build AGI, if what you mean is very useful, generally capable technology that can do a lot of the stuff that humans can do and flex into a lot of different domains. If what you mean is, you know, are we going to build God? No.

What do you think is the driver of that difference of opinion?

I don't know. I think maybe I'm a little bit more in the weeds of the practical frustrations of the technology: where it breaks, where it's slow, where we start to see things plateau or slow down. And perhaps others are more optimistic. Maybe they see a curve increasing and they just think it goes on forever, like it will just continue arbitrarily, which I disagree with. I think there are friction points; there is genuinely friction that enters in. Even if, in theory, a neural net is a universal function approximator and can learn anything you want it to approximate, you would need to build a neural net the size of the universe. So there are some fundamental barriers to reaching the limits that people extrapolate out to, which I think will bound the practically realizable forms of this technology.

Are there domains where you just believe LLMs, as we have them today, are not a good fit for prediction? An example might be, are we going to get to physics simulation from sequence models?

I mean, probably, yeah. Physics is just a series of states and transition probabilities, so I think it's probably quite well modeled by sequence modeling. But are there areas where it's poorly suited? I'm sure that there are better models for certain things, more efficient models. If you zoom into a specific domain, you can take advantage of structure in that domain to carve off some of the unnecessary generalities of the transformer, of this category of architectures, and get a more efficient model. That's definitely true.

And it doesn't sound like you think it's, at its core, a representation issue?

There's just irreducible uncertainty in the world. There are things that you genuinely cannot know, and building a better model will not help you know some genuinely random or unobservable thing. So those things we will never be able to model effectively until we learn how to observe them. But you know, I think the transformer, and this category of model, can do much more than people give it credit for; it's a very general architecture.

Many, many things can be phrased as a sequence, and these models are just sequence models. And so if you can phrase it as a sequence, a transformer can do a fairly good job at picking up any regularity in it. But I am certain that there are examples, which I'm just not able to think of right now, where sequence modeling is super inefficient. You can do it with sequences, you can phrase a graph as a sequence, but it's just the wrong model, and you would pay dramatically less compute if you approached it from

a different angle.

Okay, one last question for you. You noted earlier that scaling compute at inference time is something people have noticed, but it's not really priced in how big of a change this is. Is there anything else you think is not priced in by the market right now, that Cohere thinks about, that you think about?

Yeah, I think there is this idea of the commoditization of models. I don't really think that's true. I don't think that models are actually getting commoditized.

I think what you see is price dumping. So you see people giving it away for free, giving it out at a loss, giving it out at no margin. And so people see the prices coming down, and they assume prices coming down means commoditization.

I think in reality, the state of the world is that there's a total technological refactor going on right now. It will last the next ten to fifteen years, and it's kind of like we have to repave every road on the planet, and there are like four or five companies that know how to make concrete.

Okay? And maybe today some of them give their concrete away for free, but over time there's a very small number of parties that know how to do this thing, a huge job in front of us, and pressure to drive growth, to show a return on investment.

It's an unstable present state to be operating at a loss or giving away very expensive technology for free. So the growth pressures of the market will push things in a certain direction. And, you know, we saw a price hike of 4x just a couple of weeks ago.

And this has been super fun. Thank you so much for doing this with us.

Yeah, my pleasure. It was super fun. Great seeing you.

Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces, and follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.