
AI and the Future of Math, with DeepMind’s AlphaProof Team

2024/11/14

No Priors: Artificial Intelligence | Technology | Startups

People
Laurence Sartrant
Rishi Mehta
Thomas Hubert
Topics
The AlphaProof team members explain how AlphaProof works, its strengths and limitations, its applications to mathematical reasoning, and future directions. AlphaProof builds on AlphaZero's reinforcement learning algorithm and solves math problems by turning the proof process into a search over an action space. Its strength lies in solving complex problems, with particularly good performance in algebra and number theory; it is currently relatively weak in combinatorics and geometry, and it lacks theory-building ability. The team also discusses AlphaProof's scalability, its application areas, and ways of collaborating with human mathematicians. They believe AlphaProof can be applied to code verification and scientific research, and can foster collaboration among mathematicians. They also share some surprising discoveries from AlphaProof's solutions to hard IMO problems, such as its unusual function constructions.

The team explores the motivations for mathematical research, including the pursuit of truth and the development of artificial general intelligence (AGI). In their view, solving pure math problems helps explore the mysteries of the universe, and improvements in mathematical reasoning also advance AGI. On the application side, they are excited about code verification and about transferring AlphaProof's techniques to other domains. They also discuss the importance of human evaluation in domains without clear ground truth (such as humor), and how to combine human input with reinforcement learning.

The team offers suggestions for using AlphaProof to improve mathematical research and education. They recommend that mathematicians learn the Lean language early, as it is becoming increasingly important in research and teaching. They also note that as AI capability in mathematics grows, humans will need to focus more on asking meaningful questions.


Shownotes Transcript


Hi listeners, and welcome to No Priors. Today we have Thomas Hubert, Rishi Mehta, and Laurence Sartrant from DeepMind's AlphaProof team. AlphaProof is a new AI system that can find and verify mathematical proofs, building on DeepMind's earlier successes in chess and Go to tackle one of AI's greatest challenges, mathematical reasoning.

In today's episode, we'll explore how AlphaProof works, its implications for math and AI, more about test-time RL, and what this reveals about machine learning's capability to reason rigorously. Really happy to have you guys. Welcome. Thank you for having us. Maybe you can start by just talking a little bit about your backgrounds and how you came to be working on AlphaProof together. I'm Rishi. I was one of the tech leads on AlphaProof. I've been working in computer science and machine learning for a while.

I'm a chess player and I came across the AlphaZero paper and saw some of the chess games that that agent produced and I found it really inspiring and I thought like, this is the kind of thing I need to work on. Like coming up with something beautiful and superhuman and almost alien felt magical. I came over to DeepMind and the AlphaZero team, which Thomas is leading,

was working on math, and that's how I got into math. So yeah, I started working in industry in my early career.

I worked on anomaly detection in computer networks, I worked on ad targeting, and then switched to AI research. There, a constant interest of mine has been systems that can spend more compute to either tackle harder problems or to think more, and math seemed to be a perfect fit for that. Yeah. On my side, I,

I was actually a Go player. So instead of doing programming since the age of 10, I was actually playing Go. And I played a lot of Go during my youth. And then at some point, it was also my dad's dream to build a computer Go program.

And so I was kind of figuring out like, what do I need to know to be able to build a computer go program? And then I realized that maybe it was being built at that time. And so that's how I discovered DeepMind and how I discovered AGI. And that's how I joined the company and have been kind of involved with AlphaGo and AlphaZero, MuZero, this line of work.

And recently we had worked on AlphaCode and AlphaTensor. So, you know, that's from before, but we already knew that transformers were changing a little bit how things were done. And so, you know, we found that in math you could get this perfect verifiability, and with AlphaCode we realized we could generate a lot of good code. And so it was very natural at that time to think about

the potential there was for mathematics. Can you contrast... so maybe just as context, for any listener who wasn't a super cool mathlete like me: the IMO is the sort of oldest,

most prestigious math competition for young people. There's a set of six problems, and they feel impossibly hard. And AlphaProof had this really amazing result of solving four of the six problems this year. Can you talk a little bit about math, and the IMO in particular, as a problem relative to game playing and other search problems like chess?

I think, you know, first, a big difference is that in board games you play against someone, and that's a lot of fun in board games. For instance, when we did the AlphaGo and AlphaZero algorithms, we could really lean on self-play: you could always play against someone who is exactly at your strength. And that proved to be a powerful idea. When you're trying to learn to do math, in some sense, you don't really have an opponent.

You just have to think about it. And math is a bit special in the sense that it's almost like a purely cognitive kind of thing where you can just, you know, the only thing you can do is to think more. You know, maybe you can't really go into the real world and run an experiment, but

I guess mathematicians say that sometimes it's a good thing to take a nap and let your unconscious self kind of think about the problem. And that's a good way to come up with new ideas. But it's a lot about thinking. And so, you know, like when you're confronted with a really hard problem, there's a whole question about how do you go and try to solve it. Maybe this is a good time to ask you to describe for our listeners, most of whom are technical or somewhere in the tech field, but also a broader business audience. Like, you know...

How does AlphaProof work, overall, architecturally? Yes, sure. So AlphaProof is based on this thing called AlphaZero, so maybe let me start there. AlphaZero is this program we developed to play basically perfect-information board games. The way it works: it's based on a reinforcement learning algorithm, and you can think of AlphaZero as maybe three components. One, a neural network.

Two, kind of large-scale reinforcement learning, basically learning from trial and error. And three, it also has this planning and search component: given the current situation, it tries to search for the best answer.

And it turned out that, you know, maybe we weren't imagining it when we were doing chess, but if you can handle infinite action spaces, then instead of looking for a chess move you can look for, for instance, a line of a proof.

And so that's what we tried to do when we started AlphaProof. Basically, our action space is to generate lines of proofs. And maybe very importantly, we used formal language to do that. So basically, another way to say that is we use kind of code to write math. And it's become quite popular recently.
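As a concrete illustration of "using code to write math", here is a tiny, hypothetical Lean 4 example of my own (not something produced by AlphaProof; the theorem name is invented):

```lean
-- A small theorem written as code: commutativity of addition on naturals.
-- If this file compiles, Lean's kernel has certified the proof, which is
-- exactly the kind of machine-checkable correct/incorrect signal a search
-- procedure can optimize against.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A proof search only needs this binary compile/fail signal to know whether a candidate proof is valid.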

And the advantage of that is that once kind of the proof is complete, then the machine would give you a signal back to say, yes, your proof is correct or not. And so we could search for kind of correct proofs. Once we find a correct proof, we can learn from it.

and get a better neural network, and go into this self-improving loop of tackling harder and harder problems and learning more and more about math. So at a very high level, that's how it works and how the ideas behind AlphaZero are adapted to math. One of the things that was an interesting tidbit from your announcement: there's a series of problems, and it solves one problem within minutes but takes three days to solve other problems.

Is there a way you can characterize the search space for math overall or what AlphaProof is better at in terms of domains than others, like types of reasoning within math? The search space in math is quite large compared to something like chess or other board games. So, you know...

Some people might think of writing math proofs as like picking from a bag of known tricks at each step. But in fact, like there are many proofs where like you've got to come up with some non-trivial constructions, like you've got to invent a function out of thin air, or you've got to like come up with like some way to manipulate, even if you're just rewriting an expression.

you can rewrite it in infinite ways and there's only like a few ways you could rewrite it to actually make progress on the proof. Sometimes thinking about some novel problem requires like decades of theory building and approaching it from an entirely new perspective to arrive at the angle that helps you solve it. And so in that sense, you know, the reason why there are like a lot of

math problems that are very simple to state but have stood the test of time, in that they've been unsolved for centuries even, is that the search space is not easy to navigate in most cases.

I think what AlphaProof is good at, amongst the IMO categories: IMO problems come in four categories, algebra, number theory, combinatorics, and geometry. The two that it's strongest at are algebra and number theory. It's relatively weaker at combinatorics, although it can do quite well at some IMO combinatorics problems. And we didn't apply it to geometry at this year's contest. And

One of the ways in which it navigates this massive search space is via an idea that we came up with, which we call test time RL. So this is an idea where, like--

let's say you're confronted with a problem that you don't know how to solve, and you can do some search with what you know right now, and you're not able to find a proof to it. What the agent then does is it constructs many variations of the problem in the vicinity of that problem and attempts to solve all of them. And if it manages to solve any of them, it learns from that experience and sort of comes closer and closer to solving the original problem. And you can think of this as like...

when confronted with a new scientific problem, many of the priors that you've developed from other problems may not directly apply to it. And so you've got to do very bespoke experiments on this problem itself to learn something about it, and then hill-climb your way towards the solution. And so this process kind of mimics that. And so when you talk about the problems that are solved after three days,

These are the problems that went through this loop of: we can't solve it with the base network, so we propose many, many variations, we try all of them, we get slightly better, slightly better, slightly better, and then we hill-climb our way to the final solution after days of inference-time compute. What are the limitations of, or approaches to, scaling, you know, increasing the set of problems that AlphaProof can solve? And I guess there's two dimensions to that.

To your point, there's a set of areas where it already does quite well, algebra and other areas. And then there's a set of areas where you may need new forms of training data or other things like certain aspects of geometry or potentially other areas of mathematics. I'm a little bit curious how you think about what is necessary to increase performance dramatically from here. And obviously, what you've done is incredibly impressive. I'm just sort of curious how you think about what are the obstacles to future growth?

Maybe the main thing that AlphaProof doesn't do is theory building. It doesn't invent theories. AlphaProof, equipped only with number theory, wouldn't be able to come up with the complex analysis necessary to derive results in number theory. At the moment, the main thing we do is generate variants and get better by solving those, which enables us to tackle the original problems of interest. Theory building is a component that would be required to go further. Maybe another dimension:

Problems in combinatorics can be stated in a somewhat obfuscated manner. Just as a reminder, combinatorics problems are all about counting things: for instance, if you have 40 socks in a drawer, how many pairs do you have? They can be stated in obfuscated ways, and how to translate them into the formal system is a major difficulty; the result can be a bit unwieldy. How best to tackle this is still an open question. I think there is a bit of a link to theory building because, in some sense,

Part of the difficulty comes from the fact that the tools you need to express those things don't actually exist currently in the library that we're using. So, for instance, if you had like some kind of things that expressed, you know, what is an agent, what is an environment, what is a strategy, then, for instance, a lot of the combinatorics problems would be much easier to state.

But because that doesn't exist at the moment, in some sense the auto-formalization would need to come up with all these things beforehand, before actually formalizing the problem. And so maybe that's linked to the fact that, at some point, you will need to be able to develop new definitions, new mathematical objects, come up with their properties and their proofs, et cetera. So I guess... I think it was 1900 when David Hilbert posed his famous 23 problems.

They kind of defined a lot of the big areas that at the time he felt were important to mathematics and the sort of unsolved things that were important or topical.

Is there an equivalent set of problems for mathematics and AI today? Or is there some Turing test equivalent, I guess, as a generic question? It's an excellent question. Those problems totally defined mathematics in the 20th century, and maybe the Millennium Problems, the seven Millennium Prize Problems, of which one has been solved so far, are another attempt at defining

you know, what could be one trajectory of mathematics for the 21st century. What are our chances of solving that? That's a really hard question, because, you know, our brain kind of thinks linearly, but we know from our experience working on AI that we should actually try to think exponentially:

things might change pretty quickly. At the same time, you know, I think we have no idea how hard maybe something like the Riemann hypothesis is. Is it, you know, is it two orders of magnitude, three orders? Maybe it is like 10 orders of magnitude away from what we can do, right?

because we don't have an existence proof, it's much harder to have an estimate of how hard this actually is. But I imagine that solving that problem will involve creating brand-new math, brand-new theories, things like this. So if we want to take AlphaProof all the way there, I think that's a capability we need:

either it emerges, which is possible, because to prove harder problems you start to need to introduce new kinds of mathematical objects to decompose the problems, and we see that already happening for the IMO at a small scale, so it could emerge from just trying to solve problems; or potentially we have to explicitly think more about what it means to build theories and how we can encourage models

to do these kinds of things. Yeah, just for our listener friends who don't spend as much time on unsolved math problems, to contextualize a little: the Riemann hypothesis essentially predicts that the prime numbers follow a specific pattern,

and it has important implications. Maybe we can go from there to thinking a little bit about when you were beginning to work on math as a search space: there are lots of interesting hypotheses for why to work on math at all. One is the field itself, right? So I'd be curious if any of you have problems you want to work on. Another is the premise that math,

this sort of advancement in reasoning will allow us, it will transfer to other domains, be they, you know, science or what we think of as more like language-based, like less verifiable non-code, non-rules-based domains as well. Where do you all want to take this? I'm not really a mathematician, so I didn't have like a problem I absolutely wanted to solve, but it's been my favorite subject when I was a student.

And I think, yeah, you're absolutely right. There are at least two main reasons why you would want to potentially spend a lot of time on math. One is that, I guess, it's been described as the language of the universe. It's been extremely powerful both to describe and predict the natural world, and of course to shape it. We see math being at the core of all the technology we're using now. So having a good understanding of math is probably

very important to understand our current world. And then, of course, you could, as you alluded to, make an argument that, you know, kind of

solving math, which requires reasoning, generalization, abstraction, all these things we think about when we talk about something like a cognitive AGI, is a path towards AGI and could really help in its development. So at least for me, these are maybe the two main reasons why math

is a particularly interesting topic to try to solve in a general way. Even though math is, let's say, still a constrained domain, it has unbounded complexity and actually represents a lot of things, so it might be worth making a really focused effort on it because of all the potential implications. Maybe I can say something about my own situation.

I've always been interested in the question of making systems that get better by thinking more. I looked at this in machine translation, in text diffusion, and in navigating agents that discover their environment and build a model of the world; they do better by thinking more.

Math, and in particular proving statements, which is the core of AlphaProof, seemed to be one of the last big challenge domains where AI still had a long way to go, and where

one of the key ingredients would be figuring out a way of thinking more. That can be searching more, that can be test-time RL, and maybe the next step of thinking more will be thinking so much that the agent comes up with its own theories. So math as a testbed for thinking more was my motivation. One of the big reasons to pursue AGI at all is to uncover the secrets of the universe, the deep questions of:

Why do things work the way they do? And why are we here? And, you know, why am I conscious? And what's going on? And I guess like math, like solving these pure math problems, it feels like one domain where we're already...

Like, there's a large number of people who are already just doing this. They see math as a search for truth. People are pursuing these problems often not because they have any applications, but because they want to know: what actually is the answer? Why does this thing work this way? Something like solving the Riemann hypothesis feels like it has the flavor of a pure search for truth, which is quite appealing. I guess related to that, there's a long history in mathematics of people trying

or in science in general or science and mathematics of doing things for their own sake or for the pursuit of knowledge for its own sake. And Sarah mentioned as an example number theory and its applications in cryptography.

Zero knowledge proofs are popping up in different ways in cryptocurrencies. Group theory and algebra have popped up in quantum mechanics over time and was sort of developed beforehand and then applied and developed further there. Are there specific areas that you're most excited about from an applications area for some of the work that you're doing? Or is it mainly doing it for the love of the theoretical part of it?

It's a good question. Maybe the answer depends on each one of us. One thing that I'd be motivated about is to learn from mathematicians what they are interested in and what they find interesting in the current world of mathematics. For instance, I've been reading about this thing called the Langlands program, which is trying to connect

different areas of math. And it was described to me a little bit as, you know, just like in physics where you're looking for a unified theory of physics, it's like looking for a unified theory of mathematics: maybe we have number theory here and geometry there, but maybe there is something behind them that is more unifying. So I personally like those abstract ideas, even though maybe I don't understand them very well.

But yeah, I'd be very motivated to just see what mathematicians care about, and they might disagree between themselves as well. I think they would disagree, right? If you look at the historical examples: when Bombelli starts working on imaginary numbers, he gets completely ridiculed by, I don't know, the more important mathematicians of the time. And like...

Many years later, we get alternating current and the ability to describe electricity. I think it's really interesting as a question. If you have the machine, the machine is able to develop its own theorems. Where do you point it for interest or usefulness eventually? It's a really big question. One domain I'm particularly excited about is code verification.

That is, at the moment, when we write software, we write the code and we write tests, and when the tests pass, we are reasonably happy. Every once in a while, bugs slip past the tests, or even security issues.
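As a toy illustration of the contrast drawn here between testing and proving, here is a hypothetical Lean 4 sketch of my own (not an example from the episode; `double` and the theorem name are invented):

```lean
-- A property stated in code: `double` always returns an even number.
def double (n : Nat) : Nat := n + n

-- Instead of running tests on a few sample inputs, we prove the property
-- for every input; `omega` discharges the arithmetic goal n + n = 2 * n.
theorem double_is_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, by unfold double; omega⟩
```

Once such a proof compiles, no input can ever violate the property, which is the guarantee tests alone cannot give.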

It would be much better if we could express in code the properties that the algorithm is supposed to satisfy, and prove in code that the algorithm does satisfy those properties. This is already done for very critical domains like avionics and cryptography, where it's very important, but it has to be done by humans. I think the software industry would be in a much better place

if verification were much more common, and if we removed the bottleneck, which is humans writing those proofs. If humans could be assisted by specialized tools that handle the minutiae of these proofs, I think we'd have made a major step forward. I think another application area I'm excited about is the transfer of this technology to many other domains. There are two kinds of transfer. One is just the transfer of the mathematical reasoning skill

that this agent acquires via this kind of training to like many other domains. And, you know, as we see, like math is critical for engineering and science and,

and you name it. But also, some of the tech we developed here, scaling RL and figuring out how to spend a lot of inference-time compute, feels like it's quite generally applicable to many other problems. I'm going to ask a question that is more conjecture. We've been talking about math and code and

domains which are formalizable and verifiable. How much do you think this applies in the language domain? Can an AI make something funnier? It's easier to tell if something is funny than to write a good stand-up set, right? And so I think it's an instructive example. But do you have any intuition for how to

take some of these learnings around the ability to verify correctness or quality within the RL approach? I guess, as you said, there are domains where there is a ground truth and domains where there isn't. And when there isn't, well, "funny", for instance, is probably

a fuzzy human concept, and maybe there are aliens out there who would have a different sense of humor. So for those kinds of problems, I think the only way you can get your grounding is through humans: the humans are your grounding.

For other domains, maybe the real world is some sort of grounding and then you have to go to the real world to get your grounding. But basically, the RL would allow you to say, "Where does my reward come from?" I guess sometimes it comes from the real world, sometimes you can perfectly ground it, and sometimes the reward comes from the human, and I think that's fine.

There are some techniques we developed in the sense that, of course, when you can machine-check things, you can run at a scale that is quite different from having to ask humans, even though there are quite a lot of us humans. So there's still probably quite a lot of scaling possible, especially for something like humor, where everyone should have a say about what is funny or not.

So definitely, for instance, the RL framework, I think is still a great framework to think about those questions.

And part of what makes RL work should also transfer. And maybe the parts where we rely on perfect verification wouldn't transfer to that particular question. Where is there room for, I suppose, human rating or input, in terms of the explicit description or labeling of their reasoning, in domains that are more verifiable? Like if you have...

unlimited access to Terry Tao to do labeling for you, is that useful? Or how should we think of that versus just doing more search and working on better formalization? The way AlphaProof currently operates is that it discovers its own proofs, and when they are valid, it learns from them and develops its own style,

which has been commented upon as looking quite alien. So with unlimited access to Terry Tao, we could rate correct proofs by some notion of niceness as perceived by this human.

And we could start optimizing amongst the set of valid proofs towards proofs that look nice for purposes of interpretability, for purposes of teaching. So there's definitely still room in this space of valid proofs for proofs that humans might prefer. That's actually like a pretty damning statement to some degree, right? Because it's just like...

preference versus perhaps, you know, capability advancement? What do I care if the proof looks alien, if we have new knowledge? Interpretability seems useful in this case, but... Yes, and I think that's not the only angle, for sure. With more supervised data, we could avoid the exploration problem: we could translate all the proofs that are known to man, and

And that certainly would make the agent much better. That would save us years of compute, for sure. So I'm not saying that interpretability is the only way we could use that kind of expert data, for sure. But it could provide us a signal that we couldn't get otherwise. Presumably, with time, compute, theory building, we might be able to...

maybe rediscover proofs that are already known? I think it's an interesting question because it highlights the complementarity of specialist human data and RL data. And I think it's especially prominent with LLMs, where LLMs have these very strong human priors because they're pre-trained on a lot of human data. But then when we do RL with them, they have an opportunity to take these human priors, build on them, and develop their own

styles of doing things, like AlphaProof has done. And I think one thing we've seen in this project is that often a small amount of precious human data can be really useful to seed the agent's behavior and get it from a complete zero state to somewhere much higher, where it can interact with the environment much more efficiently. And then beyond that, it can take it from there and perhaps match or exceed the human, but with its own style.

That feels to me like it's probably going to be the way forward with RL for LLMs in general: the specialist humans are going to serve to get the LLM from a bunch of weights that knows how to do nothing to something that is surprisingly strong, and then the RL is going to take you from there to something that's superhuman. I think it was supposed to be Poincaré or somebody who was the last mathematician who knew all of mathematics, or at least understood big portions of it.

And suddenly you have an AI or a system that could potentially encapsulate all of mathematics in a single program. And it could really be used as something that could help check a proof. It could help sort of push a mathematician forward in their own research. You know, often when somebody proves something now, if they're in a more side field, there aren't that many people who can actually verify or check the proof that they've done. How many mathematicians have access right now to AlphaProof? Or how do you think about engaging with the mathematics community about day-to-day usage of this software?

pretty amazing set of advancements. So at the moment, mathematicians don't have access to AlphaProof. And to be honest with you, at the moment we can't rival someone like Terry Tao at all. What we've demonstrated is that we can learn general mathematics almost from scratch and arrive at kind of an impressive high-school level.

And then we need to grow our knowledge base so that we become useful. But I don't know if you've seen Terry Tao's recent interviews over the last year. He's been saying that one thing that is interesting in math is that collaborations have been relatively small: a lot of papers are one- or two-author papers, five authors at most. And that's because it's been very hard to collaborate with more mathematicians, since you need to check

what they are doing. And so it's very time consuming. But if you instead relied on a formal system to check everyone else's work, then you could do a little bit like in astronomy where you could have an amateur kind of living in the middle of maybe nowhere and you've never met.

And then you wouldn't have to trust them: you could trust, in some sense, the machine to check the work. And if the machine says it's a correct proof, then it's a correct proof. And then you can start to work with many, many more people. That's cool, because you can start to think about making collaborations between many humans,

but also potentially AIs, right? And so that's one way we are thinking about it as well: could we potentially contribute to those projects? And at the moment, the way we think about it is, it would be great if we could be like a graduate student, and the mathematicians could give us the questions. I think that's maybe related to theory building: at the moment,

we're not really good at asking the right questions. And there's definitely kind of, we would like to rely on the whole mathematical community to kind of ask the questions that they care about and see if we can help even a little bit answering those questions. I was also a Go player growing up. And so, you know, seeing

the Lee Sedol match and, as you described, very alien moves in Go was incredibly interesting to me too. Is there something that has felt alien or surprising in AlphaProof so far that you can talk about? Yeah, sure. So let me show you part of the proof that AlphaProof came up with for Problem 6 at the IMO. I think it'll be much easier to understand if I flash it on the screen.

So this is problem 6. The IMO typically has six problems, and problems 3 and 6 are the hardest ones. So this was supposed to be one of the hardest problems of this year's IMO. But it actually turned out to be one of the hardest problems ever, because only five out of the 500-odd contestants managed to solve it.

So, you know, it is really hard. We can go over very quickly what this question is saying. I won't go through the whole proof, because it's probably incomprehensible in the time that we have, but I'll point out one cool construction that it came up with. So the question starts with the definition of a function being aquaesulian. Aquaesulian is not really a mathematical term; it's just a tongue-in-cheek reference to the fact that

Bath, where this contest was being held, used to be called Aquae Sulis by the Romans. But anyway, they've given us two equations that characterize an aquaesulian function. And we've got to show that there exists some integer c, such that for any aquaesulian function f,

there are at most c different rational numbers of the form f(r) + f(-r). This is a bit complicated to understand, but essentially this expression is the sort of even part of the function. What it's trying to say is that you could take any aquaesulian function, work out what the values of this expression are over many rational numbers, and you might get infinitely many values, or you might get five values, or three values.
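Since the two defining equations are only shown on screen in the episode, here is the statement as posed at the 2024 IMO, for readers following along in the transcript:

```latex
% IMO 2024, Problem 6 (the statement shown on screen).
A function $f\colon\mathbb{Q}\to\mathbb{Q}$ is called \emph{aquaesulian} if,
for every $x, y \in \mathbb{Q}$,
\[
  f(x + f(y)) = f(x) + y \quad \text{or} \quad f(f(x) + y) = x + f(y).
\]
Show that there exists an integer $c$ such that for any aquaesulian function
$f$ there are at most $c$ different rational numbers of the form
$f(r) + f(-r)$, and find the smallest possible value of $c$.
```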

So it's asking you to prove that there is some bound c beyond which you cannot have more unique values, and also asking you to find what this bound c is. Okay, so that's the question: what is c? Often, when you're trying to answer a question like this, one strategy you can use is to show an upper bound for c and a lower bound for c, show that those two meet, and that gives you c.

So I'm going to pick up the proof from the point where AlphaProof had already proved that c is less than or equal to 2. That's itself quite an interesting proof, and you can look at it on our blog post if you're interested. But at this point in the proof, AlphaProof has to decide: is c 1 or 2?

And the way it's going to decide that is by finding... So, it's trivial to show that c can be 1. What's hard to show is: is there any aquaesulian function f such that c is 2? And you can pause and try to find functions f that satisfy these properties, and see if you can find one that gives two different values of this f(r) + f(-r)

expression. But one thing that's interesting is that Tim Gowers, who was one of our judges and is also a Fields Medalist, tried this question for a couple of hours and couldn't find a construction for a function with this property. Which is, of course, not to say that AlphaProof is better than Tim Gowers at math, but it just highlights how hard this part of the problem is.

And AlphaProof came up with this construction, which is the function f(x) = -x + 2⌈x⌉, this funky-looking function that I've plotted over here. And interestingly, if you plug in r = -1 and r = 1/2, you get two different values. And so you can tell that c must be equal to 2.
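As a quick sanity check (a Python sketch, not AlphaProof's actual Lean proof), one can verify with exact rational arithmetic that f(x) = -x + 2⌈x⌉ satisfies the aquaesulian condition on a grid of sample rationals, and that f(r) + f(-r) takes exactly the two values 0 and 2:

```python
from fractions import Fraction
from math import ceil


def f(x: Fraction) -> Fraction:
    """AlphaProof's construction: f(x) = -x + 2*ceil(x)."""
    return -x + 2 * ceil(x)


def aquaesulian_at(x: Fraction, y: Fraction) -> bool:
    # The defining condition: for every x, y, at least one of
    #   f(x + f(y)) == f(x) + y   or   f(f(x) + y) == x + f(y)
    # must hold.
    return f(x + f(y)) == f(x) + y or f(f(x) + y) == x + f(y)


# A grid of sample rationals: n/d for small n and d.
samples = [Fraction(n, d) for n in range(-6, 7) for d in (1, 2, 3)]

# f satisfies the condition at every sampled pair.
assert all(aquaesulian_at(x, y) for x in samples for y in samples)

# f(r) + f(-r) takes exactly two values over the samples.
values = {f(r) + f(-r) for r in samples}
print(sorted(values))  # [Fraction(0, 1), Fraction(2, 1)]
```

The two values come from the identity f(r) + f(-r) = 2(⌈r⌉ - ⌊r⌋), which is 0 when r is an integer and 2 otherwise; plugging in r = -1 gives 0 and r = 1/2 gives 2, as described above.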

So in my view, this is maybe the hardest problem that AlphaProof has ever solved. And it was a really cool, funky construction that relies on this ceiling function. Very cool. Can we start with advice for people who are working on math today and investing time in it, given AlphaProof and related models? Formal math is going to be an increasingly important thing going forward, not just because AI is going to be a very, very strong tool within formal math,

but also because it's emerging as a way of collaboration between mathematicians, and many prominent mathematicians are adopting it. It's still a small minority of the mathematical community that operates in Lean, but it's a growing minority, and AI will only accelerate that. I'm not a mathematician, so who am I to advise mathematicians? But my advice from the outside would be: learn Lean as early as you can. I realize as well it's a great tool for education, because

Most of the proofs are written at the level of abstraction of the person who writes the proof.

And so, for instance, if that person thinks a step is trivial, then you won't get any explanation of it. But if you have that proof in a formal language, then you can zoom in at the level of detail that is adapted to you. So that's one thing: you can learn much more at your own level. And the second thing is you can actually do self-play with yourself.

In the sense that maybe you're trying to learn a very abstract part of math. The usual difficulty is that math is hard to explain because it's quite abstract, and maybe you can't tell whether what you're doing is correct or not. If you're doing it on paper, the next best thing you can do is try to find someone, maybe your teacher, and that might take you a lot of time just to get some feedback.

But if you are using a formal language, then you would know exactly what's left to be proved, whether you've done the thing correctly or whether you've fooled yourself a little bit. So even for educational purposes, I think it might be very interesting. It's not really advice, it's more speculation.
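To make that feedback loop concrete, here is a toy (hypothetical) Lean 4 illustration: leaving a `sorry` in a proof makes Lean report exactly the goal that remains, so you always know what's left to prove.

```lean
-- A toy example: after the rewrite, Lean displays the exact remaining goal.
theorem toy (n : Nat) : n + 0 = 0 + n := by
  rw [Nat.add_zero]  -- goal is now: ⊢ n = 0 + n
  sorry              -- Lean flags this and reports what's left: ⊢ n = 0 + n
```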

Somebody said that there are two types of mathematicians: the ones who build theories and the ones who prove theorems. I understand that in order to prove theorems, you sometimes have to build whole theories, as was done for Fermat's Last Theorem. But it seems that there is more room for human creativity, human taste, and human skill in

building theories than in the proving part. One thing that I struggle with here is that it is increasingly true, across a bunch of domains where the hill is being climbed with AI, that taste

seems to matter increasingly much. It could be Terry Tao's taste in problems, because if you're working on arithmetic progressions in primes, it actually leads to advancements in methods and tools. But one, it's challenging, because telling somebody to go develop good taste is

a more nuanced directive than "make sure you learn Lean," right? But two, it's also pretty interesting and parallel to the work you guys have been doing, in that to some degree, if you look at where you're applying computation, it's like, oh, well, why don't you work abstractly on some problems around this area, because it will improve your ability to solve the

problem, right? And so maybe, as you said, we'll all do a little bit more test-time RL. I don't know if we'll all develop taste, even if that looks useful. I saw this somewhere: as machines get better at finding the answers, we're going to have to get better at finding the questions. And these systems don't have their own

notion of what questions are interesting, or, given a large question, how to break it down into smaller questions that might be interesting. And that's probably true not just in math but in many domains. Even as a software engineer, I find that

more and more, my job is not figuring out the small low-level details, but making good high-level decisions that tools can help with. And that's probably going to be true of more and more domains. Thanks so much for doing this. It's great to meet all of you, and I appreciate you doing this.

Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.