
27. English podcast: Talking with a founding father of deep learning about the brain, models, and human love

2023/6/2

张小珺Jùn|商业访谈录

People
Terrence Sejnowski
Topics
Drawing on his own education and research journey, Sejnowski traces the development of deep learning, in particular the significance of the Boltzmann machine he developed with Geoffrey Hinton. In his view, the Boltzmann machine proved that multilayer perceptrons could learn, and laid a foundation for deep learning. He also discusses the relationship between deep learning and connectionism, and argues that the fundamental difference from symbolism lies in how each understands the world's complexity: by scaling up computation, deep learning can absorb the world's enormous complexity and crack problems that early AI approaches could not. Sejnowski recalls his confrontation with Marvin Minsky, pointing to the limitations of Minsky's symbolic approach and its negative influence on the neural network field. He argues that large language models represent a new paradigm in deep learning, whose success stems from self-supervised learning and scaled-up computation, and whose emergence also challenges Chomsky's theory of language learning. Looking ahead, Sejnowski expects large language models to gain more brain-like functional modules, such as long-term memory and emotion. He also discusses AI's potential risks and regulation: like any other technology, AI needs to be regulated, through continuous testing and improvement, and he predicts every company will build its own special-purpose model to avoid data leakage and dependence on cloud services. Finally, Sejnowski expresses optimism about AI's future, predicting it will transform fields such as education, while stressing caution and regulation: humanity is entering a new era in which everything will change. As host, Zhang Xiaojun guides Sejnowski through these topics, raising questions such as whether large language models are intelligent, what risks AI poses, and where it is headed.


Chapters
This episode interviews Terrence Sejnowski, one of the founding figures of deep learning, about his views on deep learning, the relationship between the brain and AI, and large language models. Sejnowski recalls developing the Boltzmann machine with Geoffrey Hinton, and the challenges and opportunities the field faced in its early days.
  • Sejnowski is one of the founders of deep learning and a member of all four U.S. national academies
  • He co-developed the Boltzmann machine with Geoffrey Hinton
  • Challenges and opportunities in the early development of deep learning

Transcript

Hello everyone, welcome to 张小珺商业访谈录. I'm Xiaojun. This is an interview show that sketches the business, culture, and core knowledge of our time. Today's episode is in English. The guest is Terrence Sejnowski, a father of AI and, alongside Geoffrey Hinton, who just resigned from Google, another founding pioneer of deep learning. Thirty-eight years ago the two of them together developed the Boltzmann machine, a neural network that laid the foundation for today's deep learning. Sejnowski is a member of all four U.S. national academies, one of only three living scientists with that distinction. He also has a very well-known MOOC you can check out, called "Learning How to Learn." If you're interested in the principles of deep learning and the relationship between the brain and AI, this episode will be a good reference for you. Terry also has some novel judgments about current large-model progress; for example, that giving AI emotions will be easier than giving it language, and that in the future every company will build its own model.

Hi, Dr. Terrence Sejnowski.

Hello, Xiaojun... uh.

You can call me Bena. That is my English name.

Oh, really? Where did you get that name? Why did you pick it?

I just think it is an interesting name. And it is quite...

Lovely, yeah. It reminds me of some kind of Latin derivation, maybe Mexico. Well.

When did you first decide to become a scientist, and when did you specifically choose to study neural networks, learning, and artificial intelligence?

Okay. Well, you know, I think I've always had an interest, and also a deep curiosity, about the brain, even when I was younger. Now, when I went to school for college, I majored in physics, because I thought that would be the most challenging of all the sciences in terms of the power of the theory, and I thought that's what the brain needed: the brain needed a theoretical foundation. And of course it's very, very complex, so I thought physics, the tools of physics, would be useful for me in getting that kind of training. Then I went to graduate school in physics; I got my PhD in physics.

So I was very well down the road before I realized that if you really want to understand the brain, you have to understand something about neuroscience, right, the biological basis of the brain. And that's when I switched to neuroscience as a postdoc. I did a postdoc in the Neurobiology Department at Harvard Medical School, and there I realized that I still needed additional training in computation. In other words, if you try to understand the brain, it's not just a physics problem or a biology problem, because the brain does something that other objects in physics don't do, which is that it learns and it can think. And that is where computation comes in.

So I found my way all the way back to computation, in a field in neuroscience called computational neuroscience. And I recently received a very prestigious prize, the Gruber Prize in Neuroscience. So the whole field is really growing very, very rapidly. And it's related to the topic we're going to be discussing today, because, as it turns out, there are many, many parallels, and there is a very valuable discussion you can have comparing the similarities and differences between these large language models on the one hand and the human brain on the other.

So in your long research career, one shining moment was co-developing the Boltzmann machine with Geoffrey Hinton. Did you realize at that time that this machine could make such a significant impact on the history of artificial intelligence?

Well, you know, that was one of the most exciting times in my life, and working with Jeff was a really exciting opportunity for me, because Geoffrey has very deep intuitions about computation, and we complemented each other in terms of our backgrounds. His background was psychology and artificial intelligence. Mine was physics and neuroscience.

We meshed perfectly; we filled in each other's backgrounds, both in terms of the discoveries we made, but also personally. I think we've maintained a long-lasting friendship our entire lives. But you know, the Boltzmann machine, as you point out, really was seminal for the following reason: it was the proof that Minsky and Papert were wrong. Minsky and Papert, in their famous book on the perceptron, made some very nice proofs about the limitations of the perceptron. But they then offered the opinion, at the end, that no one would be able to generalize this:

a learning rule for perceptrons to multilayer perceptrons with many layers. And so what Jeff and I did was, we found that if you extend the architecture to make the units probabilistic rather than deterministic, zero or one with a certain probability, so that they would fluctuate, then it works. The Boltzmann machine was an existence proof that Minsky and Papert were wrong; here's an example, right? And what was beautiful about it: I think we both feel it was the most beautiful learning algorithm, and the architecture very elegant.

And I really was inspired by physics, which was my background. I loved it because we had a kind of thermodynamic proof for the learning algorithm. But what also makes it different from backprop is that it was all local.

You didn't need to backpropagate the error. All you needed to do was compute the correlations between the inputs and outputs under two different conditions: one when the inputs were present, and the other when they were taken away. We called that the sleep phase. So you compute the correlations, and you subtract them.

And that was very effective for small networks. But it required that you come to equilibrium, and you have to compute the average correlations. So it took much more computing than the backprop algorithm, and it really wasn't very efficient.
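To make the learning rule he is describing concrete, here is a minimal sketch of a Boltzmann machine update: stochastic binary units, a wake phase with the visible units clamped to data, a sleep phase running free, and a weight change given by the difference of the two correlations. The function names and toy dimensions are illustrative assumptions, not anything from the interview.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_states(W, b, n_units, steps=50, clamped=None):
    """Gibbs-sample binary unit states; optionally clamp visible units to data."""
    s = rng.integers(0, 2, n_units).astype(float)
    if clamped is not None:
        idx, vals = clamped
        s[idx] = vals
    for _ in range(steps):              # equilibration: the expensive part
        for i in range(n_units):
            if clamped is not None and i in clamped[0]:
                continue                # clamped units stay fixed to the data
            # probabilistic unit: on with probability sigmoid(net input)
            p_on = 1.0 / (1.0 + np.exp(-(W[i] @ s + b[i])))
            s[i] = float(rng.random() < p_on)
    return s

def boltzmann_update(W, b, data, lr=0.01, n_samples=20):
    """One step of the wake/sleep rule: a purely local correlation difference."""
    n = len(b)
    vis = np.arange(data.shape[1])          # the first units are visible
    wake = np.zeros((n, n))                 # <s_i s_j> with inputs present
    sleep = np.zeros((n, n))                # <s_i s_j> with inputs taken away
    for k in range(n_samples):
        s = sample_states(W, b, n, clamped=(vis, data[k % len(data)]))
        wake += np.outer(s, s)
        s = sample_states(W, b, n)          # free-running "sleep" phase
        sleep += np.outer(s, s)
    dW = lr * (wake - sleep) / n_samples    # subtract the average correlations
    np.fill_diagonal(dW, 0.0)               # no self-connections
    return W + dW                           # (bias update omitted for brevity)
```

The update needs only the difference of two correlation matrices measured at equilibrium, which is the locality he contrasts with backprop; the nested Gibbs-sampling loop is also why it needed so much more computing.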

Compared to early neural networks like the perceptron, what aspects of the Boltzmann machine made deep learning better? And what are its limitations?

Well, okay. The beauty of the Boltzmann machine was, first of all, that it admitted many hidden layers, right? You can have hidden layers just like you can for deep learning. So there was already a deep learning network way back in the eighties.

We didn't call it that, but you could make it deep. The other thing was that the learning algorithm could be used both for supervised learning, which of course is how most of the work with backprop has been done, but also for unsupervised learning problems. It wasn't just learning a mapping between categories; it was also able to develop an internal probability distribution of the inputs in a very high-dimensional space.

So it had those strengths. The limitations are, like I said, that it required a lot of computing; it was very slow. And also, when you have many layers, as you added layers it took longer and longer to equilibrate, with all the inputs going up and coming down; the whole network basically has to become a single coordinated state.

In physics this is called coherence. When you're near a critical point, like a phase transition, say between water and steam, something very special happens at that point: the whole system becomes coherent. And that's what we found: the Boltzmann machine has to become globally coherent. Jeff put a lot of effort into showing that you could build it up layer by layer. So it's still a very viable algorithm; it's just that it needs more computing.

During those years, in fact, very few people believed that you and that deep learning were viable for AI; they thought it was just kind of a joke by some self-proclaimed enthusiasts. When Professor Hinton approached you with the idea of collaboration, what did he say to you, and what did you respond? Why did you choose to believe him?

Well, we both attended a small workshop, here actually in San Diego, I think in '79. And back then, how do I describe it?

We weren't even noticed; nobody even cared. In other words, there was a very small number of people in the whole world, internationally. There were probably a dozen people at that meeting.

We were just researchers doing something that was different from everybody else, and nobody noticed us. We were very happy working together, because we all had the same intuition. And the intuition is that these are very difficult problems that have to be solved, in vision, speech recognition, language, and that the only existence proof that the problems could be solved was that nature had solved them.

Right?

Yes.

And so our view was, you know, why not look under the hood? Let's see what nature has done for us, and let's try to reverse-engineer the brain. Now, when you do that, you don't want to duplicate the technology, because it's much more advanced than ours.

Yeah, in terms of the energy use and the scale, even today's networks don't come close to a fraction of the brain. But what you can take away from the brain are general principles.

And so that's what we tried to extract, and then to create our own artificial versions of the brain. And by far the most important principle that was completely missing from artificial intelligence at the time was that you could learn the weights. You could learn to solve problems by giving it examples.

And that really is a very important way that the brain adapts to the world. The brain can learn language; you can learn to play sports; you can learn physics; you can learn social skills. In other words, these are things that are not programmed into you, as if someone were writing a computer program.

The part of the brain that is innate is the architecture and the mechanisms for synaptic plasticity. These are biological mechanisms that give the brain connections, when you're born, that are close to what you're going to have as an adult, and what you do is refine them with learning. So those are the principles: a massively large scale, lots of connections between the units, and learning. And that's what we focused on.

And back in the eighties, although the same learning algorithms we used back then are being used today, what's happened since then is that the scale has been unbelievably increased because of Moore's law, in terms of the number of units and the number of parameters, now up to a trillion. But that is still small compared to the brain. The brain has about, I think, ten to the fourteenth, ten to the fifteenth synapses. So the cortex still has about a thousand times more connections, more parameters.

How did the human brain inspire your research? During this process, you once said this was the most exciting moment of your life, and that you were confident you had figured out how the brain works. So what new discoveries did you make in this regard?

Oh, okay. So first of all, I don't want to give you the impression that we understand how the brain works; it's still a big mystery. That's why I'm in neuroscience: because it is a challenge, and brains are much more powerful than any neural network model that anyone has created.

Yes. Do you have any new discoveries during this process?

Well, okay. First of all, in my book I have a whole chapter which shows that the architecture of convolutional networks, which made one of the first big breakthroughs, in 2012, at the NIPS meeting that was held in Lake Tahoe, where Jeff and his students showed that you could reduce the error on the ImageNet data set by twenty percent, which was like leaping into the future by twenty years... if you look at it, the architecture is very similar to the architecture of the primate visual system, in terms of the way the signals go through the different layers: the convolutional architecture for the visual inputs coming in, preprocessing, and then a lot of other mechanisms like normalization and pooling.

Those are all present in visual cortex, which has about twelve layers in terms of the number of areas that process the information in sequence. So that was a case where convolutional neural networks were inspired by the visual architecture. And now what's happening is that a lot of the discoveries being made with transformers, for example for natural language processing, and with many other architectures such as recurrent networks, discoveries that emerged from analyzing these networks and trying to understand how they work, have in turn provided tools and techniques and ways to analyze neural data.

So making vastly great progress and with collaboration with across A I and neuroscience, then we were in the last century which was very, very slow, painful IT was very complex and IT was very complicated. Uh, just a record from one neuron was is extremely difficult. But now we have tools and techniques to record from hundreds of thousands of news si multi eusden and is is giving us A A much more uh uh global picture of of how all the different neurons are cordially together. So so that's that's what we're exciting right now is that there's a cross talk between the engineers and the uh neuroscientists, which is very, very exciting because it's it's accelerating our understanding of of both how brain works and also how to improve ai.
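As a rough illustration of the convolution, normalization, and pooling stack he compares to the primate visual system, here is a toy convolutional network in PyTorch. The layer counts and sizes are arbitrary placeholders, not a model of cortex.

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Repeated convolution -> normalization -> nonlinearity -> pooling stages,
    loosely echoing the feedforward sweep through successive visual areas."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),  # local receptive fields
            nn.BatchNorm2d(32),                          # normalization
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling over a neighborhood
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, n_classes)  # assumes 32x32 inputs

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = TinyConvNet()(torch.randn(1, 3, 32, 32))  # e.g. one CIFAR-sized image
```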

What did the opponents, those AI establishment figures like Marvin Minsky, think? And what's the biggest difference in the underlying understanding of the world between people like you, who believe in connectionism, in deep learning and neural networks, and the other camp, the symbolists? What was the difference in the underlying understanding?

Yeah, okay. So in the twentieth century, computers were so puny that all they could do really efficiently was logic. So AI was based on writing down logical rules that incorporated symbols and manipulated symbols. People wrote programs that built up these rules and tried to solve difficult problems.

And I think, in retrospect, the mistake, well, not a mistake, but the problem, was the fact that nobody really appreciated how difficult these problems are that nature solved. Vision is extraordinarily complex and difficult. With the problems that the brain solves, because it's so efficient, it feels as if it's easy, right? I mean, you look out and you see objects.

What could be difficult about that, right? Okay, so here's a true story about the first DARPA grant. You know, DARPA is the big Defense Advanced Research Projects Agency;

it's the military's research wing. The first grant Marvin Minsky got at the AI lab at MIT was to build a robot that could play ping-pong. They got the grant, but then they realized they had not asked for any money to write the vision program. And so they assigned it to a graduate student as a summer project.

Yes.

It seemed like it would be easy. Okay, you know the big AI meeting at Dartmouth; that was in '56, and in 2006, fifty years later, there was an anniversary meeting.

And I met Marvin Minsky, and I asked him, is this a true story? Because I had heard it, and I didn't know; it seemed apocryphal. I mean, somebody must have made that story up. And he said, well, you've got your facts wrong. We did not assign it to a graduate student. We assigned it to undergraduates.

I remember this story. I read it in your book.

Okay, well, it's a true story. And I think what this shows, in retrospect, is that they did the best they could with the computers they had at the time. And what they didn't appreciate is that as the problem gets bigger and bigger, if you try to solve it by writing a computer program, the program gets bigger and bigger, and that's very labor-intensive; writing programs is extremely costly.

I mean, in terms of paying programmers, especially if it gets to be millions of lines of code; it doesn't scale. Okay, so that was the problem with AI in the twentieth century: it did not scale. And if you need to get up to a scale that solves the difficult problems of how the brain thinks and how language works, they would never have gotten there. Even if you had given them billions of dollars to write computer programs millions of lines long, they would still not have solved the problem. It would have been clunky. But they didn't know that.

And nobody knew. In fact, back then we had these tiny little networks with one hidden layer, and we could prove, in principle, that we could solve problems the perceptron couldn't solve. But what we didn't know was how it would scale up: what would happen if you had ten or a hundred layers, right? We didn't know, because we couldn't simulate it; too complicated, too computationally intense.

So now we know. It turns out that you wait thirty years and the computers get millions of times faster, and now we can begin to solve real problems, right? That's something we didn't know back then. But it was a situation where people thought, well, maybe it's a dead end, because it couldn't solve difficult problems in the eighties and nineties. But they couldn't solve the difficult problems either. So we didn't care.

We were just happy to keep going and see how far we could get. And the other thing that was really missing from their conceptual framework back then was an appreciation of the complexity of the world. The world is very high-dimensional.

And by that I mean an incredible amount of information. Just take vision, for example. You have a camera with a megapixel, right? Well, your retina has a hundred megapixels. The amount of information coming in is like a firehose; an unbelievable amount of information is coming in all the time.

And you can't reduce it. If you reduce the dimensionality, you lose information. You can compress it; that's what a symbol is, right? The whole appeal of symbols was that you could compress something as complex as a word for an object into a symbol.

Like "cup": you write this little symbol, and it stands for all cups, not just this cup, but all cups. Okay? Isn't that very powerful?

The problem is that if you have to recognize an image of a cup, that doesn't help you, because cups come in all shapes and sizes, and you can see them in different positions, from the top, from the bottom. It's a high-dimensional problem. The visual world is like that.

And it wasn't until we were able to get networks up to a size, now with trillions of these parameters, that we could begin to absorb the complexity of the world into the network, in a way that lets you recognize objects, recognize speech, and now natural language. And not only does it recognize, it can generate.

Now, this is really exciting and very interesting, because, remember, I mentioned phase transitions: you go from one state to another, like going from a liquid to steam, or an even better example, iron, which is unmagnetized at high temperature, but as you lower the temperature it becomes a magnet. That's a phase transition.

Well, it turns out there are phase transitions in networks as they get bigger and bigger. Up until some point you can't make much progress, say with object recognition in images; the performance is very poor.

But suddenly, once you get to a certain size, it starts getting better and better as it gets bigger. So there's a minimum amount of complexity that you need to solve a problem. And as they get bigger, bigger is better.

And to solve language problems, there was another phase transition that occurred, with another, bigger network. And what we're discovering is that as you get bigger and bigger networks, they can do more and more complex things.

They behave more and more intelligently. And that was something, again, that no one expected: that there was a minimum model complexity that you needed. And that's really why we've made these great advances; we've been able to scale up the computation.

Now what people are doing is building special-purpose hardware to scale it up even further. So it's going to keep going. This is just the beginning. And here's an analogy. I'm not sure if you remember this from the book, about the Wright brothers, who were the first to achieve manned flight.

Back then, there was an argument people would make: if you want to build an airplane, you're not going to learn it from watching birds flapping their wings, right? Ha ha. Well, there's a biography by David McCullough, a beautiful biography of the Wright brothers.

And here's what happened. They spent a lot of time watching birds, not when they're flapping their wings, but when they're able to glide. Have you ever seen birds soaring along a cliff? Here we have a cliff, with the seagulls; it's a three-hundred-foot cliff, and the wind comes up.

And they can go for an hour without flapping their wings. And not only can they do it, but so can humans, because there are paragliders, people with these parachutes, and also hang gliders. They can go back and forth for many hours, back and forth, back and forth. And that's because wings can hold you up.

And that's what the Wright brothers were interested in. So they built a wind tunnel, and they designed a wing that could hold them up. And what about feathers, right? What are you going to learn about flying by looking at how feathers are made, what the material is, and so forth? Everybody would say at that time, you know, that's silly.

We've got metal, which is much better than the feather, right? It's different, and so forth; we can make it very big.

So why bother with feathers? Well, all the attempts to build airplanes out of metal failed miserably, because metal is very heavy, right? You need an engine that is very strong.

Okay, so here's how they responded. Well, the feather is very stiff; it has a very stiff quill that goes up the middle. So let's build a wing made out of wooden spars, right? But only a few.

We're not going to build it all out of wood, because we're going to put canvas on it, right, just like the feathers. Feathers have a high surface area but are very light. So we will put canvas between the wooden spars, and that way we'll have a big surface area, very light, just like the bird's wing, right? So what they did was take the principles, not the details, the principles.

Okay. And at Kitty Hawk, you know, they didn't get far off the ground. They got like ten to twenty feet off the ground, and they didn't go very far.

Maybe half a mile or something. But it was proof of principle that this was going to work. And over a period of time the wings evolved and the engines evolved, and it took a long time.

And by the way, the most difficult problem they had to solve was control: how do you control the direction you're going in, right? That's a very, very important problem, which was only solved a little bit later, by putting in wing flaps that let you go up and down. And again, if you look at birds, that's what birds do.

They twist their wings when they want to turn, right? So nature is an infinite source of ideas about how to solve very complex problems. And you have to be a good observer.

You have to look and understand, see through the details, see what the underlying ideas are, what the underlying concepts are, the principles that nature used. And that's really, I think, what Jeff and others at this time were trying to do: to see if this new architecture, this massively parallel one, works. Okay, now let's step back and talk about computers. Up until recently, the only game in town was the von Neumann architecture, where you have a processor, you have a memory, and you have a bunch of instructions that you program.

And that is incredibly powerful, because you can solve a lot of serial problems that way, right? You can do arithmetic very quickly, you can keep track of a huge amount of data, and you can sort and search. And there's no doubt we've made a huge amount of progress, because these computers have allowed us to simulate other architectures. Parallel architectures are very, very difficult to organize if you use programs.

Supercomputers now are massively parallel, with hundreds of thousands of cores, right? And it's very difficult to coordinate all these cores. I just went and visited a supercomputer center in Texas, and they said that what makes it difficult is that the speed of light is actually not infinite, right? Light goes about a foot in a nanosecond, and a nanosecond is the clock tick for a gigahertz clock.

And so the wires between the cores turn out to be critically important, the time delays. Nature had to deal with the same thing. There are time delays between neurons, and nature has solved that problem. Nature figured out how to solve that problem.

That's the problem I'm working on right now: how did nature solve it? And can we take that into account when we build our massively parallel architectures, so they can continue to scale up and up? So there's more to learn from nature.

I also read another very interesting story, from a book on the history of the deep learning revolution. In this book there is a story that you asked Marvin Minsky, "Are you the devil?" Did you really say that?

That was the meeting I was telling you about, the fiftieth-anniversary meeting. You know, he was a very smart guy. But it turns out that even smart people make mistakes.

Of course.

Yeah, even the best of us. And you know, I don't blame him; at the time, he made his decision to go in one direction. But unfortunately, a lot of people in neural networks felt held back, because of his book, but also just because of his influence. He was very powerful in terms of funding; you know, he had been the head of the AI lab at MIT, the founder of the AI lab at MIT.

And also, all of his students got the great jobs, at Stanford, at Carnegie Mellon, and so on. So he built the whole field around his vision of AI, right? So look, I was at this meeting, and it was pretty clear to me that the purpose of the meeting was: let's look back at our successes and think about the future, right? And it was pretty clear to me that every single person there who was making progress was making progress not because of the old-fashioned writing of programs, but because they were taking advantage of large data sets, both in vision and in language. Just for example, parsing sentences: Eugene Charniak, who was a student of Marvin Minsky, you know, he said, I couldn't parse anything worth anything by using symbol processing.

But as soon as I got hold of a big corpus of parsed sentences, and actually I got my students to do this for me, I was able to look at the statistics of words, the common words that appeared in pairs and in triplets and so on. And by just taking that, and seeing how new sentences are formed from similar combinations, we were able to solve that problem. And, my god, what he was telling you is that he was able to absorb the complexity of the statistics of the world.

Before that, there was no progress. And at the end, Minsky got up and said: shame on you, students; you're failures, because you're working on applications; you're not working on general artificial intelligence.

I was in the audience. I wasn't one of the students. I actually talked about reinforcement learning, and how temporal difference learning was able to become a champion-level backgammon player.

That was Gerry Tesauro, a colleague of mine. Minsky dismissed that too, right? Even though it was creative, and it actually worked.

And now, you know, you've heard the story about Go and DeepMind, right? That was basically the same architecture, with more units and more hidden layers, and the same algorithm, temporal difference learning. What we didn't know back then was that it would scale up to solve even more complex problems, like Go.
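For reference, here is a minimal sketch of tabular TD(0), the temporal-difference update at the heart of the backgammon work he mentions and, later, of value learning in systems like AlphaGo. The five-state random-walk environment is a standard textbook toy, not anything from the interview.

```python
import random

# TD(0) on a toy random walk: nudge V(s) toward the bootstrapped
# target r + gamma * V(s'), the core idea of temporal difference learning.
N_STATES, GAMMA, ALPHA = 5, 1.0, 0.1
V = [0.0] * (N_STATES + 2)               # states 0 and N_STATES+1 are terminal

for episode in range(5000):
    s = (N_STATES + 1) // 2              # start in the middle
    while s not in (0, N_STATES + 1):
        s_next = s + random.choice((-1, 1))
        r = 1.0 if s_next == N_STATES + 1 else 0.0      # reward only at the right end
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])  # TD error update
        s = s_next

print([round(v, 2) for v in V[1:-1]])    # approaches 1/6, 2/6, ..., 5/6
```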

But he was, I thought, very closed-minded. And I felt sorry for his students. I felt that this guy, and I'm willing to say he's a pioneer,

was holding us back. All of the students wanted to go forward, and working on these applications is a good way of doing it, right? Because that's how you come to grips with the complexity of these problems. In any case, I was pretty pissed off, I have to say.

And so at the final banquet, at the end, each of the people, the twelve people there who had been at the first meeting, including Minsky, gave a brief speech about what they thought about the meeting, and that was good. And then there were questions from the audience. So I put my hand up.

I asked: Dr. Minsky, there are some people in the neural network community who think that you are the devil, because you held back progress for decades. Are you the devil?

Oh.

In public I said this. I have to say, I'm not usually like that. I'm a pretty easygoing guy, and I don't often confront someone like that. But I really felt that he had to be called out.

Yes. Were you very angry?

Actually, at that point, I wasn't angry so much as... well, I was, I suppose.

I think I was angry not because of what he said, but because of the way he said it to his students. You don't treat your students that way. Your students are, you know, your family. It just violated my sense of being human, of helping people rather than hurting them, right? And he was abusing his students. I don't like that.

Okay. In any case, whatever the reason, I asked him this question, I used "the devil," and it was clear that I had pressed a button. Suddenly he started spouting all kinds of things, right? He just went off, saying, you don't understand complexity.

You don't understand what computation is really about. He went on and on. And finally I stopped him. I said, I asked you a yes-or-no question.

Are you, or are you not, the devil? A binary question. Yes or no. He kind of sputtered and sputtered, and then he stopped, and he said: yes, I am the devil.

You know, I had put him in a corner, and I have to say it was unfair of me. But the reality is, he was the devil. I think the audience was shocked by this confrontation.

And a couple of people came up to me afterwards and thanked me. They said, you know, that really was something everybody was thinking, and it was really sad how he behaved. But in any case, that's all history. And again, in retrospect, I actually think that it wasn't him; it was that the whole field had congealed around a paradigm that wasn't working.

The whole field had these, as you said, booms and busts. And what that reflects is the fact that they were making progress; sometimes they made a little bit of progress that looked very promising, and there would be a boom, and then there would be a bust when you realized that it actually didn't solve all the problems. By the way, this is true of all areas of science and engineering; it's all boom and bust. There isn't any exception.

You make progress in these great periods where suddenly you have a new theory or a new paradigm, and then you see how far you can take it; and when you reach the limit, well, now you have to wait for the next breakthrough, right? It's natural. Every area of science goes through the same process over and over and over again. There's nothing wrong with that; that's just the way things are.

So today, with the emergence of ChatGPT, has its evolution exceeded your most optimistic expectations? Has the emergence of large-scale models brought a new paradigm of deep learning?

Yes, absolutely, in a number of crucial ways, and you put your finger on it: most other neural networks were not generative.

They were simply fit for classification. One exception, again, is generative adversarial networks, which could do very interesting things in terms of being able to generate: if you gave one a bunch of faces, it could generate new faces, right? So that was an example of a generative network, but that was actually two networks. One was generating, and one was selecting; it was trying to detect whether an image was real or generated.

So it's like a battle between two different networks, and they get better and better at generating and at detecting. Okay. Now, the real breakthrough with these new generative models was, in my mind, right here.

There are actually a lot of really interesting technical details that were important for the success, but I think by far the most important was the fact that they use self-supervision. So instead of having labeled data, which is what you do with objects, right, you label the picture, that's supervised learning, and that is very laborious, because you have to get humans to do it, to get the ground truth.

But with self-supervision, you can just use the data itself. It's actually a form of unsupervised learning, because nothing is labeled. The beauty is that all you have to do is train it up to predict the next word in the sentence. And so you can give it all the sentences from every place.

And that's more training data. The training data is basically almost infinite, right? Suddenly that's no longer a constraint.
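A minimal sketch of that self-supervised objective: the raw text supplies its own labels, because the target at each position is simply the next token. The tiny corpus and the small LSTM are stand-ins for illustration; GPT-style models use transformers at vastly larger scale.

```python
import torch
import torch.nn as nn

# Self-supervision from raw text: shift the sequence by one position,
# so each token's "label" is just the token that follows it.
text = "the cat sat on the mat".split()
vocab = {w: i for i, w in enumerate(sorted(set(text)))}
ids = torch.tensor([vocab[w] for w in text])
inputs, targets = ids[:-1], ids[1:]        # no human annotation needed

emb = nn.Embedding(len(vocab), 16)
rnn = nn.LSTM(16, 32, batch_first=True)
head = nn.Linear(32, len(vocab))
opt = torch.optim.Adam(
    [*emb.parameters(), *rnn.parameters(), *head.parameters()], lr=0.1)

for step in range(200):
    hidden, _ = rnn(emb(inputs.unsqueeze(0)))   # (1, seq_len, 32)
    logits = head(hidden).squeeze(0)            # scores over the vocabulary
    loss = nn.functional.cross_entropy(logits, targets)
    opt.zero_grad(); loss.backward(); opt.step()
```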

Before, you had to have more data as the network got bigger, that was for sure, and that limited the size: a small data set meant a small network. But now there is no limit.

People can just keep getting bigger and bigger; we'll see how far that goes. But that really, I think, shifted everything, because things began to emerge,

unexpected things. I never expected, nobody expected, the capabilities that these large language models have. And the thing that surprised me was the fact that they could speak English, and other languages too, but the English that they speak is perfect.

Yes?

I mean, they don't make grammatical mistakes, unlike most humans. I make grammatical mistakes when I write, and when I talk I make all kinds of grammatical mistakes. We all do, right? Because we're not perfect. But how the hell did it get to be so good at grammar? Nobody really knows.

Well, it had access to lots of words and lots of sentences, but those weren't all perfect either. I don't know. But in any case, it's a demonstration, and also a counterexample to Noam Chomsky, who claimed that the only way you're going to be able to come up with a machine that can use grammar is by using his theory, right? A universal grammar. And that never worked. There's a whole field, computational linguistics, that tried and tried and tried. It doesn't work.

His theory doesn't work. And then he said that learning would never work either. Again, when he said something like that, it was not because he could prove it. He said, it's impossible, I can't believe it, I can't imagine it. That's literally it.

"I cannot imagine something as crude as learning being able to learn something as complex as language." That was his intuition, and he was completely wrong. You can be extremely smart and still stumble when it comes to things you don't know anything about, right? He didn't know anything about learning.

Why did he become an expert on learning, when he says things he can't prove? He just says, well, I'm the smartest person in the world, and I can't imagine it. And everybody said, well, he's pretty smart; maybe he's right. And actually, he, even more than Minsky, held back not just AI but cognitive psychology and linguistics; whole fields that were based on his thinking stagnated for decades and decades. The fact is, it's only now that linguistics is being resuscitated.

Because now we understand how these large language models work, and it's giving us new insights that we couldn't have before. So we're making progress, right? Linguistics is basically being revived after all these decades. It's really exciting to see that happen.

So why is ChatGPT so clever? Why is it so smart?

Oh, well, there's a big controversy about that. A very big one. And you know, intellectuals love to argue with each other. It's a big topic of debate right now, and it is raging; there are differences of opinion across the whole spectrum.

There are some people who say these large language models don't understand what you're saying, that they don't understand the way we understand, that they're not intelligent. And then they throw slurs; they say, oh, these are stochastic parrots. The fact is, parrots are extremely smart. So comparing it to a parrot is actually pretty high praise.

But what they're revealing... And there are other people who think, oh my god, it's smarter than I am, because it knows so much; it has this knowledge base that I don't have. And other people think that they are intelligent in the sense that they can think like humans, that they have minds like humans.

So those are the two extremes, and there's everything in between. This is something that happens very rarely: something suddenly appears out of nowhere.

We were completely at a loss. It's as if an alien appeared from somewhere outside the Earth and started talking to us in English. That's exactly what's happened right now.

And the one thing we can be absolutely sure of is that it's not human. It's not human; it's alien. But what is it?

We've created something that looks like it has some of the characteristics of being intelligent, and it certainly knows a lot, but it has problems. First of all, it makes things up. They call that hallucination, right?

Sometimes it gives you things that look plausible but are just made up; they're not true. So that's one problem. Another problem is that because it has been exposed to so many different views, including those of a lot of people you don't agree with, it sometimes says things that offend you, right? Well, that's true of humans too.

I know people who say things that offend me too, right? Now, maybe it is mirroring us. I have a paper where I proposed the mirror hypothesis: that it mirrors us; it's like looking in a mirror. For humans, when they have an interview with ChatGPT that is more than just asking a question,

when they get involved in a deeper way, like, for example, Kevin Roose at The New York Times, who had this two-hour chat with Bing's GPT-powered chatbot and was completely shaken by what happened. The interaction was emotional for him, right? That's because it was mirroring

him. It really was, in a way, reflecting his own needs, his own thinking, what was on his mind at the time. But here now is an insight. We are writing a new book about these new language models, and I've already written a long paper that I'm expanding into a book for MIT

Press. It'll be out later this year.

And I think I understood something that no one else did. You can't blame GPT-3: it didn't have parents.

It didn't have anybody to help it go through this process. By the way, the way this happens with us is that there's a part of your brain that does what's called reinforcement learning, which is actually below the cortex; it's called the basal ganglia. This is the part of the brain that learns sequences of actions to attain goals, and it's the part that needs feedback from the world about what is good and what is bad.

And that reinforcement learning system was a core part of AlphaGo. AlphaGo had two parts. It had a deep learning network, which does pattern recognition for the board, for the positions.

And it had a reinforcement learning engine that assigned value to all the positions, right? So it had these two parts; you need both. These large language models don't have a value function. And that's one of the things that's missing.

In fact, that's part of the beauty: we can look at the brain and ask, how does the brain get past these problems? Well, it has this basal ganglia, a huge thing sitting there. And by the way, it's also important for learning how to do things by practice, like playing the violin or a sport.

Think of newborn babies. For a long time they put things in their mouths and they hit things, but eventually they get up and walk, and they can grab things and run around and do things. Okay, but to get good at a sport requires specialized practice, right?

You have to play many, many games. And the more you play, the better you get. That is reinforcement learning.

You use your basal ganglia to learn sequences of actions that make you good at hitting the ball, the sensorimotor part. So that part of the brain, which is absolutely essential to being human, is missing from these large language models.

That reminds me that a lot of people now are very scared, because they think maybe we have created a monster. How do you view Hinton's recent decision to resign from Google? He even expressed regret about his life's work.

Well, I know Jeff extremely well. We've talked a lot about this, and I think it's really important that we think about the worst case.

In other words, when a technology is suddenly discovered or created, it can be used for good and for bad. And there will be people on both sides: one person will use it for something good in society; another will use it for something bad, right?

And so the question is, what's the worst case? How bad is it, right? Is it the case that if a bad person uses it, they can do real damage to our civilization? It's really important for us to imagine that.

If we don't, we're going to be in trouble. And the way you prevent the worst case is, first of all, by understanding what's possible. And we're not there; we don't really know where it's heading. Nobody knows.

And so we have to be cautious, right? And I think Jeff is being cautious, saying, let's wait a second here; where is this going, and what's going to happen? And this is now my view.

My view is that you have to regulate it, just like everything is regulated. Every aspect of life, every technology is regulated, right? For example, you buy food in the supermarket. Well, it turns out there are bad people out there who could poison you.

And they did, in the past, right? So how do you ensure that you're not going to get poisoned by some food you buy? We have the Food and Drug Administration, the FDA, which tests food to make sure that it's not bad for you, right? And the regulations have to evolve, just like the food evolves. You have to always, constantly be testing, testing, testing.

And that's true of everything. Airplanes: the FAA tests airplanes to make sure they don't have failures, right? Like Boeing had recently. And so this is just another technology.

I don't think there's anything that's going to be different in terms of how it evolves. It's going to evolve just like all the other technologies. Like the Wright brothers: they went from this little airplane that flew at about fifty miles an hour to jet engines. But that took one hundred years. It took one hundred years.

And I think the same thing is going to happen here; it may take a hundred years. A lot of people now are thinking really hard about this. I'm the president of the foundation that runs NeurIPS, you know, the annual meeting that's held every year in December.

This meeting has grown from just a few hundred people. I've been running it now for the last thirty years, since the nineties, and it's been growing by leaps and bounds every year. It's amazing. At the last NeurIPS, I think we had something like sixteen thousand people in person and three thousand online.

So, like nineteen thousand, hybrid. This meeting has not just grown; it has the energy, the youth, all of the ideas. People are doing really exciting things. And it's not just machine learning people.

There are physicists, biologists, engineers, climate people. They all want to use the tools, so they come to the meeting to find out what the latest, greatest tools are. And we have workshops where they can get together.

It's really exciting. You have no idea: the energy, the ebullience of the young people coming up. I think this is amazing. Now, the reason I bring this up is that this community is very, very aware of the problems, of the shortcomings: the problems with fairness, the problems with reliability, and potentially existential threats.

And we bring in experts on ethics, and we bring in experts on things like deepfakes and the impact they can have on society, or on the impact GPT-3 is going to have on education. It's the one place where you get the opinions of all the people: Jeff on one hand, and on the other, people who want to build things and make money. All the big companies are there; they have booths.

They're hiring; everybody is recruiting. I'm not so sure about this year, because they've been firing a lot of people. But I understand from my friends at Google that the AI people are not being fired, because they're central to the core competence, right? Every company is trying to incorporate AI into every one of its services, like search. These are really important issues.

And like I said, ultimately they have to be regulated. The problem is, how do you do that without stifling research? In other words, I forget who it was, but there's a group of people who wanted a moratorium, saying we should cap the size of the network,

so that no network is bigger than a certain size, right, like GPT-4 with a trillion weights. That's not the way to do it. That's ridiculous. What you should be capping is capability, not size. It's like saying nobody should be taller than six feet.

Or saying, make sure that nobody can jump higher than anyone else, so we're going to keep people from growing. That's ridiculous.

You have to make rules that are sensible, that allow growth but control it, with real testing along the way to see whether something new is emerging. In fact, one of the problems is that we don't know what it's capable of, right? These are not things that we put in; they are things that just emerge, like the ability to program computers, or the ability to write poems.

These are things that we didn't put in. We just gave it a bunch of data, a bunch of text corpora. So we have a job cut out for us.

We have to be able to test and to approve; we should have some approval process before we let these loose on society. And that's happening. I'm not worried. I'm glad that Jeff is worried, because somebody should be, and he's very smart, so he'll figure out if there's something for us to worry about. But I really think that we're at the very beginning. We are the Wright brothers of AI, literally: we just got off the ground, and we don't know how to control the airplane. We need to figure out control. And that's really what I think is going to happen over the next decade.

Here are some technical questions. Currently, deep learning models require a large amount of data to achieve good performance. How do you think we can reduce the dependence on large data sets to achieve more efficient learning?

So, I read one of your articles, "Stay or Leave: Chinese Tech Workers Caught in Silicon Valley's Big Tech Layoffs." Very well written. In it you point out that these large language models are gargantuan because there is so much data out there. But there are a lot of people out there now who are building smaller models that are focused on smaller data sets.

And so there are going to be, maybe, small language models; the point is that there will be a lot of special-purpose models out there. Each company will have its own special-purpose model for its own data set, so that it doesn't have to go to the cloud and doesn't need anybody else listening in on it. A lot of companies now, actually,

ban GPT inside the company, because of course these companies, Microsoft and Google, are going to take all of that in; they're going to figure out what people in the company are doing on the basis of the questions they're asking, right? You don't want to give away your secrets, right?

So these models may not be small, but the point is that right now it's very expensive to build one: tens of millions of dollars, and months of compute time on big, massive clusters of computers. But in the future, computers are going to get cheaper and cheaper and cheaper. So people will be able to build their own models.

Within the next ten years, every company will be building its own models. That's a prediction; I'm going to put that in my book. Okay, I should be going, because my wife is expecting me. But if you have one more question, your best question,

I can answer it. Okay: does ChatGPT really have emotions?

Emotions. Well, it has no emotions. In other words, it has read all kinds of novels where people have emotions, so it can simulate emotions.

It's as if it has emotions; it understands emotions. And when you're interacting with it, it can create emotions in you, right? That's what I mean by the mirror hypothesis: it will pick up your emotions.

If you're angry, it will pick that up, and it will reflect that back to you. So I think the answer is, it doesn't have the emotions internally. Ah, that's my wife. That's my wife.

Okay, I'm late. My wife gave me five minutes; she was expecting me back right now. But in any case, that having been said: we understand a lot about emotions in our brain.

And so, just like fixing GPT's ability to learn sequences and so forth by putting in the basal ganglia, we can put in emotions. We know a lot about emotions, and it will be easier to put in emotions than it was to put in language. And by the way, there are a lot of things in the brain that these models are missing; long-term memory, for one.

The fact that you will remember our discussion tomorrow: GPT-3 does not; GPT-4 does not remember from one day to the next, right? We know what part of the brain is responsible for this; it's called the hippocampus.

So why not just simulate the hippocampus, and we'll get long-term memory? And there are dozens of parts of the brain that are missing from these large language models that will almost certainly be added over the next decade, as you add more of these parts of the brain. There are better than a hundred parts of the brain that evolved for all these different subcortical functions, and all we have now is the cortical part. It's really a stripped-down human.
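A toy sketch of that idea: bolt an external episodic store onto a stateless model, write each exchange into it, and retrieve the most similar entries back into the prompt. The hash-based embedding and class names here are stand-ins invented for illustration; real systems use learned embeddings and vector databases.

```python
import numpy as np

def embed(text, dim=64):
    """Stand-in embedding: each word maps to a fixed pseudo-random vector."""
    v = np.zeros(dim)
    for w in text.lower().split():
        v += np.random.default_rng(abs(hash(w)) % 2**32).standard_normal(dim)
    n = np.linalg.norm(v)
    return v / n if n else v

class EpisodicMemory:
    """Toy 'hippocampus': store past exchanges, recall the most similar ones."""
    def __init__(self):
        self.texts, self.vecs = [], []

    def write(self, text):
        self.texts.append(text)
        self.vecs.append(embed(text))

    def recall(self, query, k=2):
        sims = np.array([v @ embed(query) for v in self.vecs])
        return [self.texts[i] for i in sims.argsort()[::-1][:k]]

memory = EpisodicMemory()
memory.write("User prefers answers with concrete examples.")
memory.write("Yesterday we discussed the Boltzmann machine.")
# Before answering today, retrieved entries are prepended to the prompt,
# giving a stateless model a crude form of day-to-day memory.
print(memory.recall("what did we talk about yesterday?"))
```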

All we have is the very high-level part, but none of the low-level stuff, like the sensorimotor stuff, right? It doesn't have any senses; it doesn't have any motor output. But that can be done. I mean, we have robots; we will give it the body, we will give it the cameras, right? And by the way, it's all being done.

All of this is going on. I have friends in these big companies who are doing it. It's going to happen. And so it's just a matter of more effort and more time.

Like I say, we're at the Wright brothers stage, at the very beginning, the very beginning of a whole new era. This is like the industrial revolution: the engine enhances physical power, right? One farmer can plow a field with an engine where it used to take a hundred farmers. That's physical power.

And we now have the ability to enhance cognitive power. One lawyer using these large language models can do the work of ten. So that's the future. And one thing is for sure: the most important things that will come out of this are ones that we can't even imagine, that we don't even know about yet. And you mentioned education; I think it's going to revolutionize education, there's no doubt. What children really need is one-on-one time with adults. And if we can get a really good tutor, a large language model that is good at tutoring children, it could serve not only as a teacher but also as a mentor, helping the child navigate, and helping the parents teach the kid what is good.

And then, what are the things deep learning will never be able to do?

You know, that's the question that is completely unanswerable. In other words, nobody has a proof that something can't be done. And the reason is that it's still evolving.

Even if it can't do something now, that doesn't mean it can't do it in the next generation. Like I said, there's a scaling phenomenon here: with every increase in scale come new capabilities.

So right now, if anybody tells you that it will never do general artificial intelligence, well, just wait for tomorrow. It's a moving target.

And this has been the problem all along in AI: whenever a problem is solved, people say, oh, that used to be AI, but now that's just pattern recognition; that's not real intelligence, right? At some point you get to the point where, come on: it will add all these capabilities, all these parts of the brain, so that it will, in some sense, have what we call general artificial intelligence. Not yet, but there's no rule or law that will prevent it.

Okay, my last question: is humanity just a transitional stage in the evolution of intelligence?

You know, the singularity; people talk about this, but again, it's too early. That is one scenario. But the future is always more interesting than anybody imagines.

I never could have imagined the impact the internet would have on the world. And I can't imagine, nobody can imagine, the impact that these large language models will have on the world. It's too early. And I think we have to be very careful.

I'm not saying we should just go barreling ahead. No, we've got to be very careful. We've got to regulate.

And if we don't do it ourselves, the government will do it for us. But it's exciting. This is amazing.

You know, we're entering a whole new era in human history. This is it, right? We're at the threshold; this is the doorstep.

We are going through the door, and nothing will ever be the same again. Never. Everything will be transformed, in your lifetime.

Okay, that's it for this episode. If you like my show, you're welcome to subscribe to 张小珺商业访谈录 on Apple Podcasts, Tencent News, 小宇宙, Ximalaya, and QQ Music. If there are other guests you'd like me to invite, content you'd like to hear, or any topics you'd like to discuss, feel free to leave a comment. See you next time. Bye-bye!