How NotebookLM Was Made

2024/10/25

Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0


Alessio

Raiza Martin

Swyx

Usama Bin Shafqat

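Raiza Martin: NotebookLM's success was unexpected, and the most important thing has been listening to user feedback and iterating quickly. The team works closely with DeepMind, focuses on user experience, and collects real-time feedback through its Discord community. The product design emphasizes simplicity, avoids too many customization options, and improves by observing how users such as Steven Johnson actually use it. Future plans include supporting more languages and dialects and considering an API. Usama Bin Shafqat: NotebookLM's audio model simulates human conversation, sets up different AI personas to make the dialogue more engaging and add tension, and unfolds information gradually. The team did a great deal of internal listening and feedback collection, which eventually became a formal evaluation process. Humor is a major challenge for AI, and NotebookLM elicits it indirectly through other means. Swyx: NotebookLM's success lies in its distinctive way of using AI: it treats the AI as virtual characters with their own viewpoints and ways of expressing themselves, rather than as a mere tool. Alessio: NotebookLM's success lies in its personalization and entertainment value; it can generate content the user cares about and present it as audio. Steven Johnson: As a NotebookLM user, he emphasized the product's role in helping people think and analyze information.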

Deep Dive

Hey everyone, we're here today as guests on Latent Space.

It's great to be here. I'm a long-time listener and fan. They've had some great guests on this show before.

Yeah, what an honor to have us, the hosts of another podcast, join as guests. A huge thank you to Swyx and Alessio for the invite.

Thanks for having us on the show.

Yeah, really, it seems like they brought us here to talk a little bit about our show, our podcast.

Yeah, I mean, we've had lots of listeners ourselves, listeners of Deep Dive.

Oh yeah, we've made a ton of audio overviews since we launched, and we're learning a lot.

Probably... a little bit, at least. The short version is we'll keep learning and getting better for you.

We're glad you're along for the ride.

So keep listening.

Keep listening and stay curious. We promise to keep diving deep and bringing you even better options in the future.

Stay curious.

Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.

Hey, and today we're back in the studio with our special guests, Raiza Martin and Usama. I forgot to get your last name. Shafqat?

Yes.

Okay, welcome.

Hello. Thank you for having us.

Thanks for having us.

So AI podcasters meet human podcasters, always fun. Congrats on the success of NotebookLM. I mean, how does it feel?

It's been a lot of fun. A lot of it, honestly, was unexpected. But my favorite part is really listening to the audio overviews that people have been making.

Maybe we should do a little bit of intros and tell the story. You know, what is your path into the sort of Google AI org? Or maybe, actually, I don't even know what org you guys are in.

I can start. My name is Raiza. I lead the NotebookLM team inside of Google Labs. So specifically, that's the org that we're in. It's called Google Labs. It's only about two years old, and our whole mandate is really to build AI products. We work super closely with DeepMind. Our entire thing is just, like, try a bunch of things and see what's landing with users.

And the background that I have is, really, I worked in payments before this, and I worked in ads right before that, and then startups. I tell people that every time I changed orgs, I actually almost quit Google. Specifically, in between ads and payments, I was like, I can't do this. This is super hard. It's not for me. I'm a very zero-to-one person. But then I was like, okay, I'll try, I'll interview with other teams. And when I interviewed in payments, I was like, oh, these people are really cool. I don't know if I'm a super good fit with this space, but I'll try it because the people are cool. And then I really enjoyed that. I worked on zero-to-one features inside of payments and had a lot of fun.

But then the time came again where I was like, oh, I don't know, it's time to leave, it's time to start my own thing. But then I interviewed inside of Google Labs, and I was like, oh, darn, they got me again. Now I've been here for two years, and I'm happy that I stayed, because especially with the recent success of NotebookLM, I'm like, dang, we did it. I actually got to do it.

So that was very cool.

Kind of similar, honestly. I was at a big team at Google. We do sort of the data center supply chain planning stuff. Google has, like, the largest sort of footprint, obviously, so there's a lot of management stuff to do there. But then there was this thing called Area 120 at Google, which does not exist anymore. I sort of wanted to do more zero-to-one building and landed a role there, where we were trying to build a creator commerce platform called Kaya. It launched briefly a couple of years ago, but then Area 120 sort of transitioned and morphed into Labs. And over the last few years, the focus has gotten a lot clearer: we are trying to build new AI products and do it in the wild and sort of co-create and all of that. So we've just been trying a bunch of different things, and this one really landed, which has felt pretty phenomenal.

Really, really landed. Let's talk about the brief history of NotebookLM. You had a tweet, which is very helpful for doing research. May 2023, during Google I/O, you announced Project Tailwind. So today is October 2024. So you joined in October 2022?

Actually, I used to lead AI Test Kitchen. And this was actually, I think, not I/O 2023. I/O 2022 is when we launched AI Test Kitchen, or announced it. I don't know if you remember it.

That's how you would, like, have the basic prototype for general access to people?

Yeah, yeah, yeah. And I remember I was like, wow, this is crazy. We're going to launch an LLM into the wild. And that was the first project that I was working on at Google. But at the same time, my manager at the time, Josh, was like, hey, I want you to really think about what real products we would build that are not just demos of the technology. That was in October of 2022. I was sitting next to an engineer who was working on a project called Talk to Small Corpus. His name was Adam. And the idea of Talk to Small Corpus is basically using an LLM to talk to your data. At the time, I was like, wait, there are some really practical things that you can build here.

And just a little bit of background: I was an adult learner. I went to college while I was working a full-time job. And the first thing I thought was, this would have really helped me with my studying, right? If I could just talk to a textbook, especially when I was tired after work, that would have been huge. We took a lot of the Talk to Small Corpus prototypes, and I showed them to a lot of college students, particularly adult learners. They were like, yes, I get it, right? I didn't even have to explain it to them. And we just continued to iterate on the prototype from there, to the point where we actually got a slot as part of the I/O demo in '23.

And the corpus, was it a textbook? Oh my god.

Yeah. It's funny, actually. When he explained the project to me, he was like, "Talk to Small Corpus." I was like, "Talk to a small corpse?"

Yeah.

This is not AI. Yeah, yeah. And it really was just a way for us to describe the amount of data that we thought it could be good for.

Yeah, but even then, you're still doing RAG stuff, because, you know, the context length back then was probably like 2K.

Yeah, it was basically RAG. That was essentially what it was. And I remember we were building the prototypes, and at the same time, I think the rest of the world was too. We were seeing all of these chat-with-PDF tools come up, and I was like, come on, we've got to go. We have to push this out into the world. I think if there is anything, I wish we would have launched sooner, because I wanted to learn faster. But I think we netted out pretty well.
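As an editorial aside, the "basically RAG" idea can be sketched as: split the sources into chunks, rank the chunks against the question, and pass only what fits a small context budget to the model. This is a minimal sketch under stated assumptions, not the team's implementation: the bag-of-words cosine scoring, the word-based budget, and every function name here are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], word_budget: int) -> list[str]:
    """Return the best-matching chunks that fit inside `word_budget` words."""
    query = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(query, Counter(tokenize(c))),
        reverse=True,
    )
    picked, used = [], 0
    for chunk in ranked:
        n = len(chunk.split())
        if used + n > word_budget:
            continue  # skip chunks that would overflow the budget
        picked.append(chunk)
        used += n
    return picked


def build_prompt(question: str, chunks: list[str], word_budget: int = 1500) -> str:
    """Assemble a prompt from the retrieved excerpts plus the question."""
    context = "\n---\n".join(retrieve(question, chunks, word_budget))
    return f"Answer using only these excerpts:\n{context}\n\nQuestion: {question}"
```

A real system would more likely rank with learned embeddings and count tokens rather than words; the word counts above only stand in for a small context window like the one mentioned in the conversation.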

Was the initial product just text-to-speech? Or were you also doing, like, synthesizing of the content, refining it? Or were you just helping people read it?

Before we did the I/O announcement in '23, we'd already done a lot of studies. And one of the first things that I realized was that the first thing anybody ever typed was "summarize the thing," right? Summarize the document. And it was half a test and half just, like, oh, I know the content, I want to see how well it does this. So it was part of the first thing that we launched. It was called Project Tailwind back then. It was just Q&A, so you could chat with the doc just through text, and it would automatically generate a summary as well. I'm not sure if we had it back then, I think we did, but it would also generate the key topics in your document, and it could support up to, like, ten documents. So it wasn't just a single doc.

And that went well, I guess. And then what was the discussion from there to where we are today? Is there any intermediate step of the product that people missed between then and launch?

Sure. It was interesting, because every step of the way, I think we hit some pretty critical milestones. From the initial demo, there was so much excitement, like, wow, what is this thing that Google is launching? So we capitalized on that.

We built the waitlist. That's actually when we also launched the Discord server, which has been huge for us, because one of the things that I really wanted to do was to be able to launch features and get feedback ASAP. The moment somebody tries it, I want to hear what they think right now, and I want to ask follow-up questions. And the Discord has just been so great for that. But then we basically took the feedback from I/O and continued to refine the product. So we added more features. We added the ability to save notes and write notes. We generate follow-up questions.

So there was a bunch of stuff in the product that shows a lot of that research, but it was really the rolling out of things. We removed the waitlist and rolled out to all of the United States. We rolled out to over 200 countries and territories. We started supporting more languages, both in the UI and in the actual source stuff. In terms of milestones, there was an explosion of users in Japan. This was super interesting in terms of just being unexpected. People would write to us and be like, this is amazing, I have to read all of these rules in English, but I can chat in Japanese. And I was like, oh, wow, that's true, right? With LLMs, you kind of get this naturally: it translates the content for you, and you can ask in your preferred mode. And I think that's not just a language thing, too. I do this test with Wealth of Nations all the time, because it's a pretty complicated text to read.

The Adam Smith classic. It's like 400 pages.

Yeah. But I like this test because I ask in, like, normie, you know, plain speak, and then it summarizes really well for me. It sort of adapts to my tone.

Very capitalist.

Very on brand.

I just checked in on the NotebookLM Discord. 65,000 people.

Yeah, crazy.

Just for one project within Google. It's not, like, Labs. It's just NotebookLM.

Just NotebookLM.

What do you learn from the community?

I think that the Discord is really great for hearing about a couple of things. One, when things are going wrong. Honestly, it's the fastest way that we've been able to find out if the servers are down, or there's just an influx of people being like, "It says system unable to answer. Anybody else getting this?" And I'm like, all right, let's go. And it actually catches it a lot faster than, like, our own monitoring does. So thank you to everybody. Please keep reporting it.

I think the second thing is really the use cases. When we put it out there, I was like, hey, I have a hunch of how people will use it. But to actually hear about, you know, not just the context of the use of NotebookLM, but what is this person's life like? Why do they care about using this tool? Especially people who actually have trouble using it, but they keep pushing. That's just so critical to understand: what was so motivating? What was your problem that was so worth solving? So that's the second thing.

The third thing is also just hearing when we have wins and when we don't have wins, because there's actually a lot of functionality where I'm like, hmm, I don't know if that landed super well, or if that was actually super critical. As part of having this sort of small project, right, I want to be able to unlaunch things too. So it's not just about rolling things out and testing them and being like, wow, now we have 99 features. Hopefully we get to a place where there's just a really strong core feature set, and the things that are not as great, we just unlaunch.

What have you unlaunched? I have to ask.

I'm in the process of unlaunching some stuff. But for example, we had this idea that you could highlight the text in your source passage and then you could transform it. And nobody was really using it, and it was a very complicated piece of our architecture, and it's very hard to continue supporting it in the context of new features. So we were like, okay, let's do a 50-50 sunset of this thing and see if anybody complains. And so far, nobody has.

Is there a feature flagging paradigm inside of your architecture that lets you feature flag these things easily?

Yes. And actually...

What is that called? I love feature flagging.

You mean in terms of just being able to expose...

Yeah, as a PM, this is your number one tool, right?

Yeah, yeah.

Let's try this out, right? If it works, roll it out. If it doesn't, roll it back, you know?

Yeah. I mean, we just run Mendel experiments for the most part.
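As an editorial aside, a "50-50 sunset" behind a flag is commonly done with deterministic percentage bucketing, so the same user always lands in the same arm and a rollback is just a change to the percentage. The sketch below shows that generic pattern only. It does not describe Google's internal experiment tooling, and the flag name `highlight_transform` and both function names are hypothetical.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    """True if this user falls inside the rollout percentage for the flag."""
    return bucket(user_id, flag) < rollout_percent


# Hypothetical usage: keep the feature for about half of users during a sunset.
# Setting the percentage to 0 removes it for everyone; 100 restores it.
show_feature = is_enabled("user-123", "highlight_transform", rollout_percent=50)
```

Hashing the flag name together with the user id keeps bucket assignments independent across flags, so the same users are not always the ones who lose features first.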

And actually, I don't know if you saw, but on Twitter, somebody was able to get around our flags, and they enabled all the experiments. They were like, check out what the NotebookLM team is cooking. I was like, oh! And I was at lunch with the rest of the team, and I was eating, and I was like, guys, guys... And they were like, oh, no. I was like, okay, just finish eating, and then let's go figure out what to do.

Yeah. I think a post-mortem would be fun, but I don't think we need to do it on the pod. Can we just talk about what's behind the magic? I think everybody's question is about what models power it. I know you might not be able to share everything, but can you just give people the very basics? How do you take the data and put it in the model? What text model do you use? What's the text-to-speech? How do you jump between the two?

Sure.

Yeah. I was going to say, Usama, he manually does all the...

Right. Thank you very much.

Both of the voices at once.

Good, good. Yeah.

So for a bit of background, we were building this thing sort of outside NotebookLM to begin with. Just the idea is content transformation, right? We can do different modalities. Everyone knows that, everyone's been poking at it. But how do you make it really useful? One of the ways we thought was, okay, you know, maybe people learn better when they're hearing things. But TTS exists, and you can narrate whatever is on screen, but you won't absorb it the same way. So that's where we sort of started out into the realm of, maybe we try, you know, a two-people-having-a-conversation kind of format. We didn't actually start out thinking this would live in NotebookLM, right? We built this demo out independently and tried out a few different sorts of sources. The main idea was: go from some sort of sources and transform them into a listenable, engaging audio format.

And then through that process, we unlocked a bunch more learnings. For example, in a sense, you're not prompting the model as much, because the information density is getting unrolled by the model prompting itself, in a sense. Because there are two speakers, and they're both technically AI personas, right, that have different angles of looking at things, and they'll have a discussion about it. And we realized that's kind of what was making it riveting, in a sense. You care about what comes next, even if you've read the material already. People say they get new insights on their own journals or books or whatever, anything that they've written themselves.

So, yeah, from a modeling perspective, like Raiza said earlier, we work with the DeepMind audio folks pretty closely, so they're always cooking up new techniques to get better, more human-like audio. And then Gemini 1.5 is really, really good at absorbing long context. So we sort of generally put those things together in a way that we could reliably produce the audio.

I would add, there's something really nuanced, I think, about the evolution of the utility of text-to-speech, where it's just reading an actual text response. I've done this several times. I do it all the time with reading my text messages, or sometimes I'm trying to read a really dense paper, but I'm trying to do actual work, so I'll have it read out the screen. There is something really robotic about it that is not engaging, and it's really hard to consume content in that way. It's never been really effective, particularly for me, where I'm like, hey, it's actually just fine for short stuff, like texting, but even that is not that great.

So I think the frontier of experimentation here was really thinking about how there is a transform that needs to happen in between. Here's my resume, right? Or here's a hundred-page slide deck or something. There is a transform that needs to happen that is inherently editorial. And I think this is where that two-person persona dialogue model, where they have takes on the material that you've presented, really brings the content to life in a way that's not robotic.

I think that's where the magic is: you don't actually know what's going to happen when you press generate, for better or for worse. To the extent that people are now like, no, I actually want it to be more predictable, I want to be able to tell them. But I think that initial wow was because you didn't know, right? When you upload your resume, what is it about to say about you? And I think I've seen enough of these where I'm like, oh, it gave you good vibes, right? You knew it was going to say something really cool. As we start to shape this product, I think we want to try to preserve as much of that wow as we can. Because with exposing all the knobs and the dials, we've been thinking about this a lot. It's like, hey, is that the actual thing? Is that the thing that people really want?

Have you found differences in having one model just generate the conversation and then using text-to-speech to kind of fake two people? Or are you actually using two different system prompts to have a conversation step by step? I'm always curious if persona system prompts make a big difference. Or do you just put in one prompt and then let it run?

I guess, generally, we use a lot of inference, as you can tell with the spinning thing that takes a while. So there's definitely a bunch of different things happening under the hood. We've tried both approaches, and they have their drawbacks and benefits.

I think that idea of questioning, like the two different personas, persists throughout whatever approach we try. There's a bit of imperfection in there. We had to really lean into the fact that to build something that's engaging, it needs to be somewhat human, and it needs to be just not a chatbot. That was sort of what we needed to diverge from. Most chatbots will just narrate the same kind of answer, given the same sources, for the most part, which is ridiculous. So there's experimentation there under the hood with the model to make sure that it's spitting out different takes and different personalities. The two sort of prompting each other is a good analogy, I guess.
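As an editorial aside, the second option in the question (two persona system prompts taking turns) can be sketched as a simple loop in which each speaker sees the transcript so far. This is only an illustration of that option: the guests say they tried both approaches and do not describe their pipeline. The `Generate` signature, the `Persona` class, and `stub_generate` are hypothetical, and the stub stands in for a real model call so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (system_prompt, transcript_so_far) -> next line of dialogue.
Generate = Callable[[str, list[str]], str]


@dataclass
class Persona:
    name: str
    system_prompt: str


def run_dialogue(personas: list[Persona], generate: Generate, turns: int) -> list[str]:
    """Alternate speakers; each turn is generated from the transcript so far."""
    transcript: list[str] = []
    for i in range(turns):
        speaker = personas[i % len(personas)]
        line = generate(speaker.system_prompt, transcript)
        transcript.append(f"{speaker.name}: {line}")
    return transcript


def stub_generate(system_prompt: str, transcript: list[str]) -> str:
    """Stand-in for a model call; echoes the turn number and persona prompt."""
    return f"(turn {len(transcript) + 1}, guided by: {system_prompt})"


hosts = [
    Persona("Host A", "Ask curious questions about the sources."),
    Persona("Host B", "Explain the sources with concrete examples."),
]
script = run_dialogue(hosts, stub_generate, turns=4)
```

The single-prompt alternative would replace the loop with one call that returns the whole script, trading step-by-step control for fewer model calls.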

Yeah. I think Steven Johnson, I think he's on your team. I don't know what his role is. He seems like chief dreamer, writer.

Yeah, I mean, I can comment on Steven. Steven joined, actually, in the very early days, I think before it was even a fully funded project. And I remember when he joined, I was like, Steven Johnson is going to be on my team? You know, for folks who don't know him, Steven is a New York Times bestselling author of, like, 14 books. He has a PBS show. He's incredibly smart, just a true sort of celebrity by himself. And then he joined Google, and he was like, I want to come here, and I want to build the thing that I've always dreamed of, which is a tool to help me think. I was like, a what? A tool to help you think? What do you need help with? You seem to be doing great on your own.

And, you know, he would describe this to me, and I would watch his flow. And aside from providing a lot of inspiration, to be honest, when I watched Steven work, I was like, oh, nobody works like this. This is what makes him special. He is such a dedicated researcher and journalist, and he's so thorough, he's so smart. And then I had this realization of, maybe Steven is the product. Maybe the work is to take Steven's expertise and bring it to everyday people who could really benefit from this. Just watching him work, I was like, oh, I could definitely use a mini Steven doing work for me. That would make me a better PM. And then I thought very quickly about the adjacent roles that could use this sort of research and analysis tool. So aside from being, you know, chief dreamer, Steven also represents a super workflow, and I think for all of us, if we had access to a tool like it, it would just inherently make us better.

Did you make him express his thoughts while he worked? Or did you just silently watch him? Or how does this work?

Oh, no, now you're making me... But yes, I did just silently watch him.

Yes, this is a part of the PM toolkit, right? User interviews and all that.

Yeah. I mean, I did interview him, but I noticed that if I interviewed him, it was different than if I just watched him. And I did the same thing with students all the time. I followed a lot of students around. I watched them study. I would ask them, like, oh, how do you feel now? Or why did you do that? What made you do that, actually? Or why are you upset about this thing? Why are you cranky about this particular topic? And it was a very similar thing for Steven, especially because he was in the middle of writing a book, and he would describe, like, oh, you know, here's how I research things, and here's how I keep my notes, and here's how I do it.

And really, he was doing this sort of self-questioning, right? Now we talk about chain of reasoning or thought, or reflection. And I was like, oh, he's the OG. I watched him do it in real time. I was like, that's o1 right there. And to be able to bring some of that expertise in a way that was maybe costly inference-wise, but to really have that ability inside of a tool that was, for starters, free inside of NotebookLM, it was good to learn whether or not people really did find use out of it.

So did he just commit to using NotebookLM for everything? Or did you just model his existing workflow?

Both, right? In the beginning, there was no product for him to use, and so he just kept describing the thing that he wanted. And then eventually we started building the thing, and then I would start watching him use it. One of the things that I love about Steven is he uses the product in ways where it kind of does it, but doesn't quite. He's always using it at the absolute max limit of this thing. But the way that he describes it is so full of promise, where he's like, I can see it going here. And all I have to do is sort of meet him there and sort of pressure-test whether or not, you know, everyday people want it. And we just have to build it.

I would say OpenAI has a pretty similar person, Andrew Mason, I think his name is. It's very similar, just from the writing world and using it as a tool for thought to shape ChatGPT. I don't think that people who use AI tools to their limit are common.

I'm looking at my NotebookLM now. I've got two sources. You have a little source limit thing, and my bar is over here, and it stretches. Did he fill it up? And he has, like, a higher limit?

Oh, yeah. I don't think Steven even has a limit.

And he has notes, Google Drive, PDFs, MP3, whatever.

Yes. And one of my favorite demos, he just did this recently, is he has actual PDFs of handwritten Marie Curie notes.

I see. So you're doing image recognition as well.

Yes, it does support it today. So if you have a PDF that's purely images, it will recognize it. But his demo is just super powerful. He's like, okay, here are Marie Curie's notes, and here's how I'm using it to analyze them, and I'm using it for this thing that I'm writing. And that's really compelling. The everyday person doesn't think of these applications. And I think even when I listen to Steven's demo, I see the gap. I see how Steven got there, but I don't see how I could without him. And so there's a lot of work still for us to build, like, hey, how do I bring that magic down to zero work? Because I look at all the steps that he had to take in order to do it, and I'm like, oh, god, okay, that's product work for us, right? That's just onboarding.

And so from an engineering perspective, people come to you and say, hey, I need to use these handwritten notes from Marie Curie from hundreds of years ago. How do you think about adding support for better sources? And then maybe any fun stories in supporting more esoteric types of inputs?

So I think about the product in three ways, right? There's the sources, the source input. There's the capabilities of what you can do with those sources. And then there's the third space, which is how do you output it into the world? How do you put that back out there?

There's a lot of really basic sources that we don't support still, right? The handwritten notes stuff is one, but even basic things like DOCX or PowerPoint, right? These are the things where everyday people are like, hey, my professor actually gave me everything in DOCX. Can you support that? And then just basic stuff, like images in PDFs combined with text. There's just a really long roadmap for sources that I think we just have to work on. So that's a big piece of it.

On the output side, and I think this is one of the most interesting things that we learned really early on, sure, there's the Q&A analysis stuff, which is like, hey, when did this thing launch? Okay, you found it in the slide deck, here's the answer. But most of the time, the reason why people ask those questions is because they are trying to make something new. And actually, when some of those early features leaked, a lot of the features we're experimenting with are the output types. So you can imagine that people care a lot about the sources that they're putting into NotebookLM because they are trying to create something new. So I think equally as important as the source inputs are the outputs that we're helping people to create.

And really, you know, shortly on the roadmap, we're thinking about how we help people use NotebookLM to distribute knowledge. That's one of the most compelling use cases: shared notebooks as a way to share knowledge. How do we help people take sources and then one-click new documents out of them? And I think that's something where people think, oh yeah, of course, right? One-click a document. But what does it mean to do it right? To do it in your style, in your brand, to follow your guidelines, stuff like that. So I think there's a lot of work on both sides of that equation.

Any comments on the engineering side of things?

So, yeah, I was mostly working on building the text-to-audio, which kind of lives as a separate engineering pipeline, almost, that we then put into NotebookLM. But I think there are probably tons of NotebookLM engineering war stories on dealing with sources. I don't work too closely with the engineers directly, but I think a lot of it does come down to Gemini's native understanding of images, which is really good with the latest generation.

Yeah. I think on the engineering and modeling side, we are a really good example of a team that's put a product out there, and we're getting a lot of feedback from the users, and we return the data to the modeling team, right? To the extent that we say, hey, actually, you know what people are uploading but we can't really support super well? Text plus image. Especially to the extent that NotebookLM can handle up to 50 sources, 500,000 words each, you're not going to be able to jam all of that into the context window. So how do we do multimodal embeddings with that? There are really a lot of things that we have to solve that are almost there, but not quite there yet.
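As an editorial aside, the arithmetic behind "you're not going to be able to jam all of that into the context window" can be worked through directly. The source limits (50 sources, 500,000 words each) come from the conversation. The tokens-per-word ratio and the context window size below are assumptions chosen only for illustration, not figures stated by the guests.

```python
import math

MAX_SOURCES = 50                    # from the conversation
MAX_WORDS_PER_SOURCE = 500_000      # from the conversation
TOKENS_PER_WORD = 1.3               # assumption: rough average for English text
CONTEXT_WINDOW_TOKENS = 1_000_000   # assumption: illustrative long-context window


def notebook_tokens(sources: int, words_per_source: int, tokens_per_word: float) -> int:
    """Estimate the token count of a notebook filled to its source limits."""
    return round(sources * words_per_source * tokens_per_word)


total_words = MAX_SOURCES * MAX_WORDS_PER_SOURCE  # 25,000,000 words
total_tokens = notebook_tokens(MAX_SOURCES, MAX_WORDS_PER_SOURCE, TOKENS_PER_WORD)
windows_needed = math.ceil(total_tokens / CONTEXT_WINDOW_TOKENS)

print(f"{total_words:,} words is roughly {total_tokens:,} tokens")
print(f"about {windows_needed} context windows of {CONTEXT_WINDOW_TOKENS:,} tokens")
```

Under these assumptions a full notebook is dozens of times larger than a single window, which is why some form of retrieval or embedding step comes up as the next question in the conversation.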

And then turning it into audio. I think one of the best things is it has so many of the human... Does that happen in the text generation that then becomes audio? Or is that a part of the audio model that transforms the text?

The audio model tries to mimic certain human intonations and sort of natural breathing and pauses and laughter and things like that. But, yeah, in generating the text, we also have to sort of give signals on where those things maybe would make sense.

And on the input side, instead of having a transcript versus having the audio, can you take some of the emotions out of it too? For example, when we do the recaps of our podcast, we can either give an audio of the pod or we can give a transcription, but the transcription doesn't have some of the, you know, voice kind of things.

Yeah.

Do you reconstruct that when people upload audio, or how does that work?

So when you upload audio today, we just transcribe it. So it is quite lossy in the sense that we don't transcribe the emotion from that as a source. But when you do upload a text file and it has a lot of that annotation, I think that there is some ability for it to be reused in the audio output. But I think it will still contextualize it in the Deep Dive format. So I think that's something that's particularly important: hey, today we only have one format. It's Deep Dive. It's meant to be a pretty general overview, and it is pretty peppy. It's just very upbeat.

Yeah, yeah. Even if you had a sad topic, I think they would find a way to be like, silver lining, though. Yeah, we're having a good chat.

Yeah. One of the many, many, many ways that Deep Dive went viral is people saying, if you want to feel good about yourself, just drop in your LinkedIn. Any other favorite use cases that you saw from people discovering things on social media?

So many funny ones. And I love the funny ones. I think, because I always relieved when I watch them. I like that was funny and not scary is great.

There was another one that was interesting, which was a startup founder putting their landing page and being like our athletes test whether not like the value prop is coming through. And I was like, no, that's right. That's right. yeah. And then I saw a couple of other people following following up on that too.

Yeah, I put my about page in there and, like, yeah, if there are things that I'm not comfortable with, I should remove them. It picks it up.

Right? I think the personal hype machine was a pretty viral one. I think people uploaded their dreams, and some people keep sort of dream journals, and it would sort of comment on those. And it was therapeutic.

I didn't see those. Those are good. Yeah, I hear from Googlers all the time, especially because we launched it internally first, and I think we launched it during the, you know, the Q3 check-in cycle. So all Googlers have to write notes about, like, hey, what did you do in Q3? And what Googlers were doing is they would write whatever they accomplished in Q3, and then they would create an audio overview.

And these people I didn't know would just ping me and be like, wow, I feel really good going into a meeting with my manager. It's like, good, good, good, good. You really did that, right?

Another cool one is just, like, any Wikipedia article: you drop it in and it's just suddenly the best sort of summary overview.

Someone did that: he now has a Spotify channel called Histories of Mysteries, which is basically, he just looks up interesting stuff from Wikipedia and makes audio overviews out of it.

Yeah, he became a podcaster overnight. Yeah.

I'm here for it. I fully support him. I'm racking up the listens for him.

Honestly, it's useful even without the audio. I feel like the audio does add an element to it, but I always want paired audio and text. It's just amazing to see what people are organically discovering. I feel like it's because you laid the groundwork with NotebookLM, and then you came in and added the sort of TTS portion and made it so good, so human, which is weird. Like, this is this engineering process of humans. Oh, one thing I wanted to ask: do you have evals?

Yeah. Yes.

What?

Potatoes for chefs? What does that mean, really?

Oh, sorry, sorry. We were joking about this. We were doing this a couple of weeks ago, we were doing the side-by-sides. But Usama sent me the file and it was literally called "Potatoes for Chefs". And I was like, you know, my job is really serious, but you have to laugh a little bit, like the title of the file is "Potatoes for Chefs".

Like a training document for chefs?

It was just a side-by-side for two different kinds of audio transcripts.

The question is really, as you iterate, the typical engineering advice is you establish some kind of test, you have an automated benchmark. You're at, like, thirty percent. You want to get it up to ninety.

Yeah.

What does that look like for making something sound human and interesting in voice?

We have this sort of formal eval process as well. But I think for this particular project, we maybe took a slightly different route to begin with. There were a lot of within-the-team listening sessions. I think the bar that we tried to get to before even starting formal evals with raters and everything was much higher than I think other projects would set. Because, as you said, the traditional advice, right, is get that ASAP: what are you looking to improve on, whatever benchmark it is. So there was a lot of just critical listening, and I think a lot of making sure that those improvements actually could go into the model.

And that we were happy with that human element of it. And then eventually we had to, obviously, distill those down into an eval set. But still, the team is just a very, very avid user of the product at all stages.

I think you just have to be really opinionated. I think that sometimes, if you are, your intuition is just sharper and you can move a lot faster on the product. Because if you hold that bar high, if you think about the iterative cycle, it's like, we could take six months to ship this thing to get it to mid, where we were, or we could just listen to this and be like, yeah, that's not it.

And I don't need a rater to tell me that. That's my preference, right? And collectively, if I have two other people listen to it, they'll probably agree. And it's just kind of this step of, just keep improving it to the point where you're like, okay, now I think this is really impressive. Then do evals and then validate that.

Was the sound model done and frozen before you started doing all this? Or are you also saying, hey, we need to improve the sound model as well?

Yeah, we were making improvements on the audio and just generating the transcript as well. I think another weird thing here was that we needed it to be entertaining, and that's much harder to quantify than some of the other benchmarks that you can make for, like, you know, SWE-bench, or getting better at this math.

Do you just have raters rate it one to five, or just up and down, for the formal rater evals?

We have sort of a Likert scale and a bunch of different dimensions there, but we had to sort of break down what makes it entertaining into a bunch of different factors. But I think the team stage of that was more critical. It was like, we need to make sure that what is making it fun and engaging, we dial that up as far as it goes, while we're making other changes that are necessary. Like, obviously they shouldn't make stuff up or be insensitive.

Hallucinations and other safety things. Safety stuff.

Yeah, exactly. So with all of that, and also just following sort of a coherent narrative and structure is really important. But with all of this, we really had to make sure that the central tenet of being entertaining and engaging and something you actually want to listen to just doesn't go away, which takes a lot of just active listening time, because you're closest to the prompts, the model, and everything.

I think sometimes the difficulty is because we're dealing with non-deterministic models. Sometimes you just get a bad roll of the dice, and it's always in the distribution that you could get something bad, basically.

How many do you, like, do ten runs at a time? How do you get rid of the non-determinism?

Yeah, that's bad luck. Yeah, yeah.
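As a rough illustration of the rater-eval idea discussed here, the sketch below averages Likert-style scores per dimension across several generations and also reports the worst run, since with a non-deterministic model the mean can hide a bad roll. The dimension names, the 1 to 5 scale, and the "below 3 is bad" threshold are assumptions for illustration; the conversation only says there is a scale with "a bunch of different dimensions".

```python
from statistics import mean

# Hypothetical dimensions; not the team's actual rubric.
DIMENSIONS = ("entertaining", "grounded", "coherent")


def summarize_runs(ratings):
    """Summarize rater scores.

    ratings: list of dicts, one per generated episode,
    each mapping a dimension name to a 1..5 score.
    """
    summary = {}
    for dim in DIMENSIONS:
        scores = [r[dim] for r in ratings]
        summary[dim] = {
            "mean": mean(scores),
            "worst": min(scores),
            # Share of runs scoring below 3: the "bad roll of the dice" rate.
            "bad_rate": sum(s < 3 for s in scores) / len(scores),
        }
    return summary


runs = [
    {"entertaining": 5, "grounded": 4, "coherent": 5},
    {"entertaining": 4, "grounded": 5, "coherent": 4},
    {"entertaining": 2, "grounded": 4, "coherent": 3},
    {"entertaining": 5, "grounded": 3, "coherent": 4},
]

for dim, stats in summarize_runs(runs).items():
    print(dim, stats)
```

Tracking the worst case and the bad-run rate next to the mean is one way to make the "it's always in the distribution that you could get something bad" concern visible in numbers.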

I mean, there still will be bad audio overviews. There's a bunch of them that happen. Do you mean for the raters?

For raters, right? Like, what if that one person just got a really bad rating? You actually had a great prompt, actually a great model, great weights, whatever. And you just had a bad roll of the dice.

And that's okay, right? I actually think, the way that these are constructed, if you think about the different types of controls that the user has, right, like what can the user do today to affect it?

I have tried to prompt engineer by changing the title. Yeah.

Changing the title, people have found out. The title of the notebook, people have found out. You can add show notes, right? You can get them to think the show has changed. Changing the language of the output. Those are less well-tested, because we focused on this one aspect. So it did change the way that we sort of think about quality as well, right?

So it's like, quality is on the dimensions of entertainment, of course, consistency, groundedness, but in general, does it follow the structure of the deep dive? And I think when we talk about non-determinism, it's like, well, as long as it follows the structure of the deep dive, it sort of inherently meets all those other qualities. And so it makes it a little bit easier for us to ship something with confidence, to the extent that, like, I know it's going to make a deep dive.

It's going to make a good deep dive. Whether or not the person likes it, I don't know. But as we expand to new formats, as we open up controls, I think that's where it gets really much harder. Even with the show notes, right? People don't know what they're going to get when they do that. And we see that already, where it's like, this is going to be a lot harder to validate in terms of quality, where now we'll get a greater distribution. Whereas I don't think we really got a very wide distribution, because of that pre-process that Usama was talking about, and also because of the way that we were constraining what we were measuring for. Literally just, like, is it a deep dive?
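To make the "is it a deep dive?" idea concrete, here is a minimal sketch of a structural check on a generated script. Everything about it is an assumption for illustration: the script representation (a list of speaker and text pairs) and the criteria (exactly two speakers, no one holding the floor too long, no empty turns) are hypothetical, not the team's actual definition of the format.

```python
def follows_dialogue_structure(turns, max_streak=2, min_turns=4):
    """Return True if a script looks like a two-host back-and-forth.

    turns: list of (speaker, text) pairs.
    The criteria below are illustrative assumptions only.
    """
    if len(turns) < min_turns:
        return False

    # Exactly two distinct voices.
    speakers = {speaker for speaker, _ in turns}
    if len(speakers) != 2:
        return False

    # No speaker takes more than max_streak consecutive turns.
    streak = 1
    for (prev, _), (cur, _) in zip(turns, turns[1:]):
        streak = streak + 1 if cur == prev else 1
        if streak > max_streak:
            return False

    # Every turn actually says something.
    return all(text.strip() for _, text in turns)


script = [
    ("A", "Welcome to your deep dive for today."),
    ("B", "Get ready for a fun one."),
    ("A", "So what is this source about?"),
    ("B", "It is dense, and it is baffling."),
]
print(follows_dialogue_structure(script))
```

A check like this only constrains the shape of the output. Whether the result is entertaining still needs the listening sessions and rater evals described in the conversation.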

And you determine what a deep dive is?

Yes. Everything needs a PM.

Yeah, this is very similar to something I've been thinking about for AI products in general. There is always a chief tastemaker, and for NotebookLM it seems like a combination of you and...

Well, Usama is like the head chef, right, of Deep Dive, I think. Potatoes for chefs. And I say this because I think even though we are already a very opinionated team, and Steven for sure is very opinionated, I think of the audio generations, Usama was the most opinionated. And we all would say, like, hey, I remember one of the first ones he sent me, I was like, I feel like they should introduce themselves. I feel like they should say a title. But then we would catch things, like maybe they shouldn't say their names.

Yeah, that was a good catch, to not give their names.

So stuff like that is just, we all injected a little bit of, hey, here's my take on how a podcast should be, right? And I think if you're a person who regularly listens to podcasts, there's probably some collective preference there that's generic enough that you can standardize it into the deep dive format. But yeah, it's the new formats where I think, like, oh, that's the next test.

Yeah, I've tried to make a clone, by the way. Of course, everyone did. Yes, everyone did. Like, oh, this is so easy.

Obviously our models are not as good as yours, but I tried to inject a consistent character backstory, like age, identity, where they work, where they went to school, what their hobbies are. Then the models just try to bring it in too much. I don't know if you've tried this. Yes. So then I'm like, okay, how do I define the personality so that it doesn't keep coming up every single time?

Yeah, I mean, we have a really, really good character designer on our team.

Like a D&D person?

Just to say, like, we had to be opinionated about the format, we had to be opinionated about who are those two people talking. Okay. Right? And to the extent that you can design the format, you should be able to design the people as well.

Yeah, I would love, you know, like when you play Baldur's Gate, you roll, like, seventeen charisma and, like, what race they are. I don't know.

I recently, actually, I was just talking about character select screens.

Yeah.

I love that. And I was like, maybe there's something to be learned there, because people have fallen in love with the deep dive as a format, as a technology, but also as just those two personas. Now when you hear a deep dive and you've heard them, you're like, I know those two, right? And it's so funny when people are trying to find out their names. It's a worthy task, it's a worthy goal. I know what you're doing. But the next step here is to sort of introduce, like, is this what people want? Do people want to sort of edit the personas? Or do they just want more of them?

I'm sure you are getting a lot of opinions that conflict with each other. Before we move on, I have to ask, because we're kind of on this topic: how do you make audio engaging? Because it's useful not just for Deep Dive but also for us as podcasters. What does engaging mean? If you could break it down for us, that'd be great.

I mean, I can try. Like, I don't claim to be an expert at all.

So I'll give you some. Yes: variation in tone, right, and speed. You know, there's this sort of writing advice where this sentence is five words, this sentence is three. That kind of advice, where you vary things, you have excitement, you have laughter, all that stuff. But I'd be curious how else you break it down.

So there's the basics, like obviously structure that can't be meandering, right? There needs to be sort of an ultimate goal that the voices are trying to get to, human or artificial. I think one thing we find often is if there's just too much agreement between people, that's not fun to listen to. So there needs to be some sort of tension and build-up, you know, withholding information, for example.

As you listen to a story unfold, you're going to learn more and more about it. In audio, that maybe becomes even more important, because you actually don't have the ability to just skim to the end of something. You're driving or something, you're going to be hooked. And that's how a lot of podcasts work. Maybe not interviews necessarily, but a lot of true crime, a lot of entertainment in general. There's just a gradual unrolling of information. And that also sort of goes back to the content transformation aspect of it. Maybe you are going from the Wikipedia article of, like, one of the Histories of Mysteries episodes, maybe. The Wikipedia article is going to state out the information very differently. It's like, "here's what happened" would probably be in the very first paragraph.

And one approach we could have taken is, maybe a person's just narrating that thing, and maybe that would work for a certain audience. I guess that's how I would picture a standard history lesson to unfold. But because we're trying to put it in this two-person dialogue format, we inject the fact that you don't give everything at first, and then you set up differing opinions of the same topic.

Like, maybe you seize on a topic and go deeper into it and then try to bring yourself back out of it and go back to the main narrative. So that's mostly from the setting-up-the-script perspective. And then the audio, as I was saying earlier, it's trying to be as close to human speech as possible. I think that's what we've found successful so far.

Yeah, like with interjections, right? Like, when you listen to two people talk, there's a lot of, like, yeah, yeah, right. And there's a lot of that questioning, like, oh yeah, really?

What do you think? I noticed that. That's great.

Totally. Like, exactly.

My question is, do you pull in speech experts to do this? Or did you just come up with it yourselves? You can be like, okay, talk to a bunch of fiction writers to make things engaging, or comedy writers or whatever. Stand-up comedy, they have to make something engaging.

Yeah, but audio as well. Like, there are professional fields of study where people do this for a living. But us as AI engineers are just making this up as we go.

I mean, it's a great idea, but you definitely didn't.

Yeah, my guess is you didn't. Yeah. There's a certain appeal to authority that people have. They're like, oh, you can't do this because you don't have any experience making engaging audio. But that's what you literally did.

I was literally chatting with someone at Google earlier today about how some people think that you need a linguistics person in the room for making a good chatbot. But that's not actually true, because this person went to school for linguistics, and according to him, and he's an engineer now, most of his classmates were not actually good at language. They knew how to analyze language and sort of the mathematical patterns and rhythms in language, but that doesn't necessarily mean they were going to be eloquent while speaking or writing. So I think, yeah, we haven't invested in specialists in the audio format yet, but maybe that would...

I think it's super interesting. We really think there is a very human question, like what makes something interesting. And there is a very deep question of, like, what is it, right? What is the quality that we are all looking for? Does somebody have to be funny?

Does something have to be entertaining? Does something have to be straight to the point? And I think when you try to distill that, this is the interesting thing, I think, about our experiment, about this particular launch: first, we only launched one format, and so we sort of had to squeeze everything we believed about what an interesting thing is into one package.

And as a result of that, I think we learned, it's like, hey, interacting with a chatbot is sort of novel at first, but it's not interesting, right? It's like humans are what makes interacting with chatbots interesting.

It's like, haha, I'm going to try to trick you. It's like, that's interesting, spell strawberry. This is the fun that people have with it. But that's not the LLM being interesting. That's you just kind of giving it your own flavor.

But it's like, what does it mean to sort of flip it on its head and say, no, you be interesting now, right? You give the chatbot the opportunity to do it. And this is not a chatbot per se. It is just the audio, and it's the texture, I think, that really brings it to life.

And it's the things that we've described here, which was like, okay, now I have to lead you down a path of information about this commercialization deck. How do you do that? To be able to successfully do it, I do think that you need experts. I think we'll engage with experts down the road, but I think it will have to be in the context of, well, what's the next thing we're building, right?

It's like, what am I trying to change here? What do I fundamentally believe needs to be improved? And I think there is still a lot more studying that we have to do in terms of what people are actually using this for. And we're just in such early days. Like, it hasn't even been a month.

Yeah. I think one other element to that is the fact that you're bringing your own sources to it. Like, it's your stuff. You know this somewhat well, or you care to know about this. So that, I think, changed the equation on its head as well. It's your sources and someone's telling you about them. So you care about how that dynamic is, but you just care for it to be good enough to be entertaining, because ultimately they are talking about your mortgage deed or whatever.

So it's interesting just from the topic itself, even taking out all the agreements and the hiding and the slow reveal.

I mean, there's a baseline, maybe. Like, if it was too drab, like if someone was just reading it off, you know, that's like the absolute worst.

But do you prompt for humor? That's a tough one, right?

I think it's more of a generic way to bring humor out if possible. I think humor is actually one of the hardest things.

But I don't know if that is human.

Yeah but did you see the chicken one? Okay.

If you haven't heard it, here. Oh, there is a video on Threads. I think this is by Marko Wong, and it's a PDF.

Welcome to your deep dive for today. Oh yeah, get ready for a fun one, because we are diving into chicken, chicken, chicken, chicken, chicken.

You got that right. By Doug Zongker. And yes, you heard that title correctly.

Listener, today you're going to need our help.

And I can totally see why. Absolutely. It's dense, it's baffling, and it's packed with more chicken than a KFC buffet.

That's so funny. So it's stuff like that, that is truly delightful, truly surprising. But isn't humor contextual also?

Like, super contextual. We're not really prompting for humor, but we're prompting for maybe a lot of other things that are bringing out that humor.

I think the thing about AI-generated content, if we look at YouTube, like we do videos on YouTube, and it's like, you know, a lot of people screaming in the thumbnails to get clicks. There's kind of a meta of what you need to do to get clicks.

But I think in your product, there's no actual creator on the other side investing the time, so you can actually generate a type of content that is maybe not universally appealing. You know? Yeah, exactly. I think that's the most interesting thing. Like, well, is there a way to, like, take Mr. Beast, right? Mr. Beast optimizes videos to reach the biggest audience and the most clicks. But what if every video could be kind of regenerated to be closer to your taste, you know, when you watch it?

That's kind of the promise of AI that I think we are just touching on, which is, I think every time I've gotten information from somebody, they have delivered it to me in their preferred method, right? If somebody gives me a PDF, it's a PDF. If somebody gives me a hundred-slide deck, that is the format in which I'm going to read it.

But I think we are now living in the era where transformations are really possible, which is, look, I don't want to read a hundred-slide deck, but I'll listen to a sixteen-minute audio overview on the drive home. And that, I think, is really novel, and that is paving the way in a way that maybe we wanted but didn't expect. I also think you're listening to a lot of content that normally wouldn't have had content made about it. I watched this TikTok where this woman uploaded her diary from 2004. For sure, right? Like, nobody was going to make a podcast about a diary. And I felt like, this seems kind of embarrassing.

Creepy, because she was doing this live listen of, like, oh, here's a podcast of my diary. And it's entertaining, right? Now we're sort of all listening to it together. But the connection is personal. It was her interacting with her information in a totally different way. And I think that's where, like, oh, that's a super interesting space, right? Where it's like, I am creating content for myself in a way that suits the way that I want to consume it.

Or people compare retirement plan options. No one's going to give you that content for your personal financial situation. And even when we started out the experiment, a lot of the goal was to go for really obscure content and see how well we could transform that. If you look at the Mountain View city council meeting notes, you're never going to read them, but if it was a three-minute summary, that would be interesting.

If you have one system, one prompt that just covers everything you throw at it, maybe. It's just really interesting. You know, I'm trying to figure out what you nailed compared to others. And I think that the way that you treat the AI is a little bit different than a lot of the builders I talk to. So I don't know what it is. You said, I wish I had a transcript in front of me. But it's something like, people treat AI like a tool for thought, but usually it's kind of doing their bidding. And you know, what you're really doing is loading up these two virtual agents.

I don't know, you've never said the word agents, I put that in your mouth, but two virtual humans or AIs, and letting them form their own opinion and letting them kind of just live and embody it a little bit. Is that accurate?

I think that is as close to accurate as possible. I mean, in general, I try to be careful about saying, like, oh, you know, letting these personas live. But I think to your earlier question of what makes it interesting, that's what it takes to make it interesting, right? And I think to do it well is a worthy challenge. I also think that it's interesting because they're interested, right?

Like, is this interesting to compare?

Yeah, is it interesting to compare two retirement plans? No. But to listen to these two talk about it, oh my god, you would think it was the best thing ever invented, right? It's like, get this, deep dive into 401(k) through Chase versus, I don't know, whatever.

They do a lot of "get this".

I know, I know. I dream about it. I'm sorry.

I have a few more questions on just the engineering around this, and obviously some of this is just me creatively asking how this works. How do you make the decision between when to trust the AI overlord to decide for you? In other words, take it that there's a product as it is today, and you want to improve it in some way. Do you engineer it into the system, like write code to make sure it happens? Or do you just stick it in the prompt and hope that the LLM does it for you?

Do you mean specifically about...

Just the general product. I think this is the one thing that people are struggling with. There's compound AI people and then there's big AI people. So compound AI people will be like Databricks: have lots of little models, chain them together to make an output. It's deterministic, you control every single piece, and you produce what you produce. The OpenAI people are totally the opposite.

Like, write one giant prompt and let the model figure it out. Yeah. And obviously the answer for most people is going to be a spectrum in between those two, like big model, small model. When do you decide that?
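The two ends of that spectrum can be sketched side by side. This is a toy illustration only: `Model` is a hypothetical "prompt in, text out" callable, `fake_model` is a stand-in that echoes the first line of its prompt so the sketch runs offline, and the prompts and the `Host A:` rule are invented for the example. None of it reflects how NotebookLM is actually built.

```python
from typing import Callable

# Hypothetical interface: prompt in, text out.
Model = Callable[[str], str]


def single_prompt(model: Model, source: str) -> str:
    """'Big AI' style: one large prompt, the model decides everything."""
    return model(f"Write an engaging two-host dialogue about:\n{source}")


def chained(model: Model, source: str) -> str:
    """'Compound AI' style: small steps wired together in code."""
    outline = model(f"List the key points of:\n{source}")
    draft = model(f"Turn these points into a two-host dialogue:\n{outline}")
    # A rule enforced in code rather than hoped for in the prompt.
    if "Host A:" not in draft:
        draft = "Host A: " + draft
    return draft


def fake_model(prompt: str) -> str:
    """Stand-in model: echoes the first line of the prompt."""
    return "echo: " + prompt.splitlines()[0]


print(single_prompt(fake_model, "city council notes"))
print(chained(fake_model, "city council notes"))
```

The chained version costs more calls and more code, but each step can be inspected and constrained. The single-prompt version is simpler and leans on the model's judgment, which is the trade-off the question is getting at.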

I think that depends on the task. It also depends on, well, it depends on the task, but ultimately it depends on what your desired outcome is. Like, what am I engineering for here? And I think there are several potential outputs, and there are sort of general categories.

Am I trying to delight somebody? Am I trying to just meet whatever the person is trying to do? Am I trying to sort of simplify a workflow? At what layer am I implementing this? Am I trying to implement this as part of the stack to reduce friction, particularly for engineers or something? Or am I trying to engineer it so that I deliver a super high quality thing? I think that the question of which of those two, I think you're right, it is a spectrum.

But I think fundamentally it comes down to, it's a craft. It's still a craft as much as it is a science. And I think the reality is you have to have a really strong POV about what you want to get out of it to be able to make that decision. Because I think if you don't have that strong POV, you're going to get lost in sort of the detail of capability. And capability is sort of the last thing that matters, because it's like, models will catch up, right? Models will be able to do whatever in the next five years.

It's going to be insane. So I think this is a race to value, and it's really having a strong opinion about what that looks like today and how far you're going to be able to push it. Sorry, I think maybe that was very philosophical.

But we get there, and I think that hits a lot of the points it needs to make.

I tweeted today, or I X-posted, whatever, that we were going to interview you and asked what we should ask you. So I got a list of feature requests, mostly. It's funny.

Nobody actually had any specific questions about how the product was built. They just want to know when you're releasing some feature. So I know you cannot talk about all of these things, but I think maybe it will give people an idea of where the product is going.

So I think the most common question, I think five people asked, is: are you going to build an API? And, you know, do you see this product as still being kind of a full product, where I log in and do everything there? Or do you want it to be a piece of infrastructure that people build on?

I mean, I think why not both? I think we work at a place where you could have both. I think that end-user products, like products that touch the hands of users, have a lot of value. For me personally, we learn a lot about what people are trying to do and what's actually useful and what people are ready for.

And so we're going to keep investing in that. I think at the same time, right, clearly there are a lot of developers that are interested in using the same technology to build their own thing. We're going to look into that. How soon that's going to be ready, I can't really comment, but these are the things that, like, hey, we heard it, we're trying to figure it out, and I think there's room for both.

Is there a world in which this becomes the default Gemini interface, because it's technically a different org?

It's such a good question. I think every time someone asks, it's like, hey, I just ask the Gemini folks what they think.

Multilingual support. I know people kind of hack this a little bit together. Any ideas for full support? But also I'm mostly interested in dialects.

In Italy we have Italian, obviously, but we have a lot of local dialects. Like, if you go to Rome, people don't really speak Italian, they speak the local dialect. Do you think there's a path to which these models, especially the speech, can learn very niche dialects? Can people contribute? I'm curious if you see this as a possibility.

So I guess high level, we're definitely working on adding more languages. That's top priority. We're going to start small. But theoretically, we should be able to cover most languages pretty soon.

What a ridiculous statement, by the way. That's crazy.

I like the "soon", the "pretty soon" part. A few years ago, a small team of, I don't know, ten people saying that we will support the top hundred, two hundred languages is absurd. But right, you can do it. Yes.

You can do it. And I think the speech team, we are a small team, but the speech team is another team, and the modeling team, these folks are just absolutely brilliant at what they do.

And I think when we've talked to them and we've said, hey, how about more languages? How about more voices? How about dialects, right? This is something that they are game to do. And that's the program for them.

The speech team supports a bunch of other efforts across Google. Like, Gemini Live, for example, is also the models built by the same sort of DeepMind speech team. But the thing about dialects is really interesting. In some of our sort of earliest testing with trying out other languages, we actually noticed that sometimes it wouldn't stick to a certain dialect, especially for, I think, French.

We noticed that when we presented it to a Canadian speaker, it would sometimes go from a Canadian person speaking French to a French person speaking French or an American person speaking French, which is not what we wanted. So there's a lot more sort of speech quality work that we need to do there to make sure that it works reliably, at least in sort of the standard dialect that we want. But that does show that there's potential to sort of do the thing that you're talking about, of fixing a dialect that you want, maybe contributing your own voice, or you pick from one of the options. There's a lot more headroom there.

Yeah, because we have movies, like we have Roman movies that are in different dialects, but there's that, you know. So I always say, well, I'm sure the Italian is so strong in the model that when you're trying to pull it away from that, you kind of need a lot.

But that's all sort of the wonderful DeepMind speech team.

Yeah. Anyway, if you need Italian, he's got you.

I got you. Specifically, English.

I got you. Managing system prompts: people want a lot of that.

I see you. Yes-ish. Definitely looking into it, just because everybody's asked for that forever. So we're working on that. I think for the feature itself, we are trying to figure out the best way to do it. So we'll launch something sooner rather than later. So we will probably stage it. And I think, just to be fully transparent, we'll probably launch something that's more of a fast follow than a full-blown feature first, just because, like, I see so many people put in the fake show notes. It's like, hey, I'll help you out, we'll just put in a text box.

Yeah, yeah. And I think a lot of people are like, this is almost perfect, but I just need that extra ten, twenty percent.

Yes.

Yeah. I noticed that you say no a lot, I think, or you try to ship one thing, and that's different about you than maybe other PMs or other eng teams that try to, they're like, oh, here are all the knobs, just take all my knobs. Yeah. Top-p, top-k, it doesn't matter. Just turn the knobs, you figure it out, right? Whereas for you, you actually just make one product, as opposed to, like, the ten you could possibly have done.

Yes, I think about this a lot.

I think it requires a lot of discipline. Because I thought about the knobs. I saw on Twitter, on X, that people want the knobs. Like, great. I started mocking it up, making the text boxes, designing the little fiddles, right? And then I looked at it and I was kind of sad. I was like, this is not cool, this is not fun.

This is not magical. It is sort of exactly what you would expect knobs to be. Then, you know, I mean, how much can you design a knob? I thought about it.

I was like, but the thing that people really liked was that there weren't any; they just pushed a button and it was cool. And so, how do we bring more of that, right? That still gives the user the optionality that they want.

And so this is where you have to have a strong POV, I think. You have to really boil it down: what did I learn in the months since I launched this thing that people really want, and how can I give it to them while preserving that delightful, fun experience? And I think that's actually really hard, and I'm not going to come up with that by myself.

That's something that our team thinks about every day. We all have different ideas, and we're experimenting with how to get the most out of the insight and also ship it quickly. So we will see; we'll find out soon if people like it or

not. I think the other interesting thing about AI development now, going back to all the craft and human taste that went into building it, is that the knobs are not as easy to add as simply saying, I'm going to add a parameter to this and it's going to make that happen. You kind of have to redo the quality process for everything. The prioritization is also different, in that it goes

back to this: it's a lot easier to do an eval for the deep dive format than if, okay, now I'm going to let you inject these random things, right? Okay, how am I going to measure quality? Either I say I don't care, because you just input whatever, or I say, actually, wait, I want to help you get the best output ever. What is it

going to take? The knob actually needs to work reliably.

Yeah, yeah. Very important. Two more

things we wanted to talk about. I guess now people equate NotebookLM to a podcast generator, but, you know, there is a whole product suite there.

How should people think about that? And also the future of the product as far as monetization, too: is the voice thing going to be core to it, or is that just going to be one output modality, and you're still looking to build a broader kind of interface with data and documents?

I mean, that's such a good question. I think the answer is I'm waiting to get more data, because we are still in the period where everyone's really excited about it and everyone's trying it. I think I'm getting a lot of positive feedback on the audio. We have some early signal that says it's a really good hook, but people stay for the other features, so that's really good too. I was making a joke yesterday: it would be really nice, you know, if it was just the audio, because then I could simplify the train, right?

I wouldn't have to think about all the other functionality. But I think the reality is that the framework we were talking about earlier, the one we had laid out, which is that you bring your own sources, there's something you do in the middle, and then there's an output, is a really extensible one, and it's a really interesting one. Particularly when we think about what a big business looks like, especially when we think about commercialization, audio is just one such modality. But the editor itself, the space in which you are able to do these things, that's the business, right? Maybe the audio by itself, not so much, but in this big package, oh, I can see that. I can see that being a really big business.

Any thoughts on some of the alternative ways to interact with data and documents? Think Claude Artifacts, ChatGPT Canvas, you know. How do you see where NotebookLM starts and where Gemini starts? You have so many amazing teams and products at Google that I'm sure you sometimes have to figure that out.

Yeah, well, I love Artifacts. I played a little bit with Canvas, and I got a little dizzy using it. There's something, you know; I like the idea of it fundamentally, but something about the UX was like, oh, this is more disorienting than Artifacts.

And I couldn't figure out what it was, and I didn't spend a lot of time thinking about it, but I love that, right? The thing where you're working with, you know, an LLM, an agent, a chat, or whatever to create something new. And there's the chat space, there's the output space.

I love that. And the thing that I think I feel a little angsty about is that we've been talking about this for like a year. Of course I'm going to say that, but for a year now, I've had these mocks.

I was just like, I want to push the button, but we prioritized other things. We were like, okay, what can we really win at? And we prioritized audio, for example, instead of that.

But when people were like, oh, what is this magic draft thing? It's like, one hundred percent, right? It's stuff like that that we want to try to build into NotebookLM too.

And I'd made this comment on Twitter as well, where I was like, now I don't actually know if that is the right thing. Are people really getting utility out of this? I mean, from the launches, it seems like people are really getting it. But I think now, if we were to ship it, I'd have to rethink it. I'd have to deliver a differentiating

value by comparison. Because you've demonstrated the ability to fast follow, you don't have to innovate every single time.

I know, I know. I think for me, it's just that the bar is high to ship. And when I say that, I think it's conceptually about the value that you deliver to the user. I mean, you'll see in NotebookLM there are a lot of corners that I personally cut, where our UX designer is always like, I can't believe you let us ship with these ugly scroll bars, and I'm like, no one notices, I promise. He's like, no, everyone screenshots this thing. But, kidding aside, I think it's true that we do want to be able to fast follow, but we want to make sure that things also land really well. So the utility has to be there.

Code, especially on our podcast, has a special place. Is code in NotebookLM interesting to you? I don't see a "connect my GitHub to this"

thing. Yeah, yeah. I think code is a big one. Code is a big one. I think we have been really focused, especially when we had a much smaller team, on, let's just push an end-to-end journey together.

Let's prove that we can do that, because once you lay the groundwork of sources, do something in the chat, output, once you have that, you just scale up from there, right? Now it's just a matter of scaling the inputs, scaling the outputs, scaling the capabilities of the chat. So I think we're going to get there. And now I also feel like I have a much better view of where the investment is required, whereas previously I was like, hey, let's flesh out the story first before we put more engineers on this thing, because that's just going to slow us down.

For what it's worth, the model still understands code. So I've seen at least one or two people just download their GitHub repo, put it in there, and get an audio overview of

your code, like,

these are all the files and how they're connected together, because the model still understands code.

Even if you haven't... I think, on the creepy side of things, I did watch a student, with her permission, of course. I watched her do her homework in NotebookLM, and I didn't tell her what kind of homework to bring.

But she brought her computer science homework, and I was like, oh. She uploaded it and she said, here is my homework, read it. It was just the instructions, and NotebookLM was like, okay, I've read it. And the student was like, okay, here's my code so far, and she copy-pasted it from the editor, and she was like, check my homework. NotebookLM was like, well, number one is wrong. And I thought that was really interesting.

It didn't tell her what was wrong, it just said it's wrong. And she was like, okay, don't tell me the answer, but walk me through how you'd think about this. And what was interesting for me was that she didn't ask for the answer. I asked her, why did you do that? And she said, well, I actually want to learn it, because I'm going to have to take a quiz on this at some point. I was like, that's a really good point. And it was interesting because NotebookLM, while the formatting wasn't perfect, did say, hey, have you thought about using, you know, maybe an integer instead of this? So that was

Are you adding a real-time chat on the output? Like, you know, there's the deep dive show, and then there's the listeners who call in and say, hey... Yeah, we're actively, that's

one of the things we're actively prioritizing. Actually, one of the interesting things is now we're like, why would anyone want to do that? Kind of going back to having a strong POV about the experience, it's like, what is better?

What is fundamentally better about doing that, that's not just being able to Q&A your notebook? How is that different from a conversation? Is it just the fact that there is a show and you want to tweak the show? Is it because you want to participate? So I think there's a lot there that we can continue to unpack. But yes, that's coming.

It's because I formed a parasocial relationship. Yes, I just

want to be part of your life.

Totally. Yeah, but it is obviously because OpenAI has just launched a real-time chat. It's a very hot topic. I would say it's one of the toughest AI engineering disciplines out there, because even their API doesn't do interruptions that well, to be honest. And yeah, so real-time

chat is tough. I love that thing. I love it, yes.

Okay. So we have a couple of ways to end: either a call to action, or laying out one principle of AI PM-ing or engineering that you think about a lot. Is there anything that comes to mind?

I feel like that's a test. Of course, I'm going to say go to NotebookLM on Google, try it out, join the Discord, and tell us what you think.

Yeah, especially as you have a technical audience. What do you want from a technical engineering audience?

I mean, I think it's interesting, because the technical and engineering audience typically will just say, hey, where is the API? And I think we addressed that. But what I would really be interested to discover is: is this useful to you? Why is it useful? What did you do, right? Is it useful tomorrow?

How about next week? The most useful thing for me is, if you do stop using it, or if you do keep using it, tell me why. Because I think contextualizing it within your life, your background, your motivations, that is what really helps me build really cool things.

And then one piece of advice for AI PMs.

Okay, if I had to pick one, it's just: always be building. Build things yourself. I think for PMs it's such a critical skill. And just take time to pop your head up and see what else is new out there.

On the weekends I try to have a lot of discipline: I only use ChatGPT and Claude on the weekend. I try to use the APIs. Occasionally, I'll try to build something on GCP over the weekend, because I don't do that normally at work. It's just the rigor of trying to be a builder yourself, and even just testing. You could have an idea of how a product should work, and maybe your engineers are building it, but what was your proof of concept? What gave you conviction

that that was the right thing? And I feel like,

consistently, the most magical moments in AI building come about for me when I'm really, really, really close to the edge of the model's capability, and sometimes it's farther than you think it is. I think while building this product, and some of the other experiments, there were phases where it was easy to think that you'd approached it. But sometimes at that point, what you really need is to show your thing to someone, and they'll come up with creative ways to improve it. We're all sort of learning, I think. So unless you're hitting that bound of, this is what Gemini 1.5 can do, probably the magic moment is somewhere there, in that sort of limit.

So push the edge of the capability.

Yes.

It's funny. We had Nicholas Carlini from DeepMind on the podcast, and he was like, if the model is always successful, you're not trying hard enough. Give it hard things. So, yeah. My

problem is that sometimes I'm not smart enough to... Right.

I think I hear that a lot; people are always like, I don't know how to use it. And it's hard. I remember the first time I used Google Search, I was like, what do I type? My dad was like, anything. And I was like, anything? I've got nothing in my brain, Dad. What do you mean? And I think for product builders there is a lot of that: have a strong opinion about what the user is supposed to do,

and help them do it. A principle for AI engineers, or just one piece of advice that you have for others?

I guess, in addition to pushing the bounds, to do that often means you're not going to get it right on the first go. So don't be afraid to just batch multiple models together. I guess I'm basically describing an agent. But more thinking time equals better results, consistently. And that has held true for probably every single time that I've tried to build something.

Well, at some point we will talk about the sort of longer-inference paradigm. It seems like DeepMind is rumored to be coming out with something; you can't comment, of course. Yeah. Well, thank you so much. You know, you've created... I actually said, I think you saw this, that NotebookLM was kind of like the ChatGPT moment.

That was so crazy. I saw that and I was like, what? ChatGPT was huge for me. And I think, you know, when you said that, and other people have said that, I was like, is it? Yes, people

weren't really cognizant of NotebookLM before, and Audio Overviews in NotebookLM unlocked a use case for people in a way that, I would go so far as to say, Claude Projects never did. And I don't know.

I think a lot of it is competent PM-ing and engineering, but also, you know, it's interesting how a lot of these projects are always like low-key research previews. For you, you're a separate org, but you build products and UI innovation on top of also working with research to improve the model. That was a success that wasn't planned to be this whole big thing. You know, your TPUs were on fire, right? Oh my god.

That was so funny. I didn't know people would really catch on to the Elmo fire, but it was just one of those things where I was like, you know, we had to ask for more TPUs. Yeah, many times. And, you know, it was a little bit of a subtweet of, like, hey, reminder, give us more TPUs.

I just think that when people try to make big launches, they flop. And when they're not trying, and they're just trying to build a good thing, then they succeed. It's this fundamentally really weird magic that I haven't really encapsulated yet, but you've done it.

Thank you. Thank you. You know, I think we will just keep going in the same way, just keep trying,

keep trying to make it better. Yes, I hope so. All right.

Thank you. Thank you. Thanks for having us. Thanks.

How NotebookLM Was Made 01:13:57
