
833: The 10 Reasons AI Projects Fail, with Dr. Martin Goodson

2024/11/5

Super Data Science: ML & AI Podcast with Jon Krohn

Chapters

Martin Goodson discusses the evolution of technology at Evolution AI, from traditional OCR to deep learning and transformer-based approaches, emphasizing the need for high accuracy in financial document processing.
  • Evolution AI initially used convolutional neural networks and deep learning, and is now transitioning to transformer-based models.
  • The company focuses on spatial reasoning to achieve human-like accuracy in data extraction from financial documents.
  • Traditional OCR is insufficient for complex documents like invoices, requiring advanced machine learning algorithms.

Shownotes Transcript


This is episode eight hundred and thirty-three with Dr. Martin Goodson, CEO of Evolution AI. Today's episode is brought to you by epic LinkedIn Learning instructor Keith McCormick.

Welcome to the Super Data Science podcast, the most listened-to podcast in the data science industry. Each week, we bring you fun and inspiring people and ideas, exploring the cutting edge of machine learning, AI, and related technologies that are transforming our world for the better.

I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple.

Welcome back to the Super Data Science podcast. I think you're really going to enjoy the conversation today with my guest, Dr. Martin Goodson. Martin is CEO and Chief Scientist at Evolution AI, a firm that uses generative AI to extract information from millions of documents a day for their clients.

He's also founder and organizer of the London Machine Learning Meetup, which, with over fifteen thousand members, is the largest community of AI and ML experts in Europe. He previously led data science at startups that applied ML to billions of data points daily. And he was a statistical geneticist at the University of Oxford, where we shared a small office together.

Today's episode should be of interest to anyone even vaguely interested in data science, machine learning, or AI. In today's episode, Martin details the ten reasons why data science projects fail and how to avoid these common pitfalls. He provides his insights on building AI startups that serve large enterprises and on the importance of open-source AI development. Are you ready for this fun episode? Let's go.

Martin Goodson, welcome to the Super Data Science podcast.

Where are you calling in from?

From Oxford. Very nice, that is where you and I know each other from. We used to share a little office. It was a small office, but it was nice. And if I remember correctly, a lot of the time we were in there, we had four desks but only three of us in there.

So it felt pretty luxurious.

You should name the third

occupant of that office.

Maybe this will prompt him to reach out. He's probably listening; he could be, he could be. Surely one

of the geneticists

who cannot resist listening to the Super Data Science podcast. I mean, he does sit at the intersection of data analysis, I guess. So it's great to have you on the show.

Actually, and only Martin would know this, but years ago I asked Martin to be on the show, when I first took over as host of this podcast. And you said that you remember the conversation well.

The way I remember it... I remember it very well. I said, "I don't have anything to say," and

that's it, yeah.

That's it. Now our

listeners are going to get to experience that for an hour. That's the plan for the show. Anyway, it's great to be connected to you.

We had a lot of laughs before we even started recording, which is really nice. You are one of those people that, you know, I really miss being around all the time. You're brilliant. You always have a lot of powerful things to say. I also really miss the way you would sit in meetings.

You would often sit on a desk or a chair, cross-legged, and with a very upright posture; you looked kind of like an all-knowing Buddha in the corner of the room. It adds to your gravitas. It's probably

things like that that

led to you becoming CEO and Chief Data Scientist at Evolution AI, which, according to our research, is a generative-AI-powered data extraction platform specializing in financial documents. That's the kind of thing that wasn't obvious to me when I was just looking at your LinkedIn profile, but according to our researcher, Serg Masís, you specialize particularly in financial documents, and you surpass traditional OCR (optical character recognition) capabilities. Do you want to tell us about the advances that you and your team have made in computer vision and NLP that allow your AI models to achieve human-like accuracy in data extraction?

Yeah, sure. So, you know, what we do, the data extraction thing: customers have some kind of documentation. Typically it might be a commercial lender, like a bank, someone who lends money to businesses, normally SMEs.

They've got loads of documentation that they need to read, extract some data from, and then run the data through their risk models, credit risk models, credit decision engines, whatever, to decide whether they want to lend money to an organization. So it's to another business, normally.

That kind of documentation might be a financial statement, it might be a bank statement, it might be an invoice. Maybe they are doing some kind of asset financing and they need to look at the invoice.

Traditionally, all of that stuff has been done manually. People have literally copied and pasted from documents or just typed stuff into Excel, which is obviously a huge waste of time. So we've been automating this process for lots of banks and other financial institutions since about two thousand fifteen.

So yeah, in terms of the difference from traditional OCR: OCR has been around since the seventies. Optical character recognition just extracts characters.

It's really, really good at extracting characters. If you've got something that looks like a cheque, where you know exactly where the cheque number is on every cheque, and cheques all look the same, it works

great. But most documents don't look like a cheque. If you look at an invoice, there's so much variation; the invoice number could be anywhere. A human would just read the document, figure out where the invoice number was, and then use that information. So what's hard is to try and design machine learning algorithms that can go through that

similar kind of process. Now, I know from other research that we've done that, like many people working in technology, you prefer not to use patents, because then you'd actually have to disclose your secrets on how you're doing things, and it would be so difficult to know if somebody had read your patent and was implementing your technology behind the scenes. So I totally understand that.

And given that, there are probably limitations on what you want to tell us on air. But if you've been doing this for nearly a decade, I can't help but imagine that the approaches you're using have presumably evolved a lot at Evolution AI. Is there anything that you're able to tell us about behind-the-scenes capabilities, the technologies you're using, and how that's

changed over time? Yeah, I can speak pretty freely about that, actually. So I think there have been three main phases. There's the kind of traditional OCR, traditional machine learning phase for this stuff, where people were using things like hidden Markov models to understand text, and maybe neural networks as well, like classical neural networks.

Then there's the deep learning world, and that's kind of where we entered the field. I went to what was called NIPS, now called NeurIPS, the conference, in about two thousand and eleven. Maybe we were still working together then.

And I went to a workshop, a deep learning workshop. There were like thirty people in this workshop. I don't remember exactly what the thing was about, but I learned all about convolutional neural networks and stuff. Now, obviously, NeurIPS is massive, like ten times bigger than it was in those days. I had a NeurIPS paper in twenty ten.

I don't know if you remember this. It was with a lab at the University of Edinburgh, and this guy, he was the first author on the paper, and he did the work, yeah.

And he went to NeurIPS in twenty ten. It ended up being that only the top ten percent of papers were selected for the proceedings, and we managed to clear that threshold. And I still have never been.

Actually, just today I bought my ticket to go this year for the first time, finally, which is one of those huge gaps where I'm like, what the hell have I been thinking? And it's exactly because of the things you're describing.

Like you in two thousand eleven: you'd hear somebody talking about convolutional neural networks that you'd never heard of before, that kind of information. So why did I not go all these years? What was I thinking? Instead, I went to the Complex Trait Consortium in Chicago.

So useful. Yeah, useful knowledge. But now you can go. I'm sure you'll have a great time.

Although it's a very different thing now to what it was when I went. Maybe it's more amazing now than it was then, I think, but I can't say.

So I did the thing, and I came back with these kinds of ideas starting to percolate. And I worked at a different startup, which tried to extract information from, like, tax documents; we were doing like an alternative tax thing. But that startup completely failed, because we just couldn't extract the data accurately

with traditional OCR. And then I realized that convolutional neural networks would be a good technology for this thing. So that's why we started the company. I started it with a friend.

And yeah, so to your question about how the technology has changed: originally it was all about convolutional neural networks and deep learning, classical deep learning, LSTMs and stuff like that. And then gradually everything moved over to transformer-based ways of working. I think basically everything is like a generative model now.

Yes, that is not surprising at all. And so it sounds like there is a lot more happening there, because of maybe the OCR starting point, and also the way that you've mentioned online that you're looking at the way a document is structured and kind of knowing which parts of the document are more important. It sounds like you're doing a lot more than a typical kind of transformer approach today.

You might use something like the pdftotext utility in Unix to convert a PDF into a text file, and then you could just pass that text file to a transformer architecture, using an off-the-shelf OpenAI API, or Cohere, whatever, to be doing any kind of document processing in that way. And so it sounds like you're doing something more sophisticated, where maybe the entry point is pixels as opposed to characters.
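To make that concrete, here is a minimal sketch of the naive pipeline Jon describes, assuming Poppler's pdftotext CLI is installed and the openai Python package is configured; the model name and prompt are illustrative, not Evolution AI's setup:

```python
# Naive document extraction: flatten the PDF to plain text, then ask an LLM.
# All layout and spatial information is discarded in the first step.
import subprocess
from openai import OpenAI

def extract_invoice_number(pdf_path: str) -> str:
    # "-" sends pdftotext's output to stdout instead of a file
    text = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    ).stdout

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Extract the invoice number from this document:\n\n{text}",
        }],
    )
    return response.choices[0].message.content
```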

Yeah, I mean, the approach you just said actually works really well. It's going to give you some really good results, but probably not good enough to use in some automated process where you're trying to lend, you know, tens of millions or hundreds of millions of dollars or pounds to a company. You need something that's not going to have so many errors.

You don't want the page number in the bottom right corner of the page being inserted, adding a

zero onto a figure. Yeah, exactly. You need something with a bit higher accuracy. So this is really about spatial reasoning.

You know, when we read a document, when a human reads a document, you're not just transcribing all of the characters and then figuring out what that means. You're using the characters in context, and you use spatial relationships to try and parse the document.

Nice, very cool. So that gives a sense in terms of a single read of a document. But you guys process over a million pages of documents a day. What's that like? Tell us a bit about the infrastructure and the challenges of scaling to that kind of high volume while trying to retain accuracy. And maybe, if you can, tell us about some of the specific technology that you use in order to be able to do that.

Yeah, I mean, I don't think we do anything especially clever in terms of the engineering. But in terms of scaling up, you know, the most important

thing is that, typically,

in this industry, if you want really accurate data, you're going to use some automated method and then you're going to employ some human beings to check that data. That really is still the kind of standard operating procedure when you're trying to extract data that needs to be high quality. The problem is when you've got... some of our projects are huge; like you said, there might be two hundred thousand pages a day

for one project. You can't really employ people to check that data; it's impossible. So either you use no humans at all, in which case you need a model which is really, really accurate, or you've got a very, very good, well-calibrated confidence score that you can use to allocate your meagre human resources to only those cases where you think there's a suspicious prediction that needs to be checked by a human. You obviously need the vast majority of the documents, the pages, to go through automatically.
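For illustration, here is a minimal sketch of that second option, confidence-based routing. The threshold, field names, and data structure are assumptions for the example, not Evolution AI's actual system:

```python
# Route each extracted field either to auto-acceptance or to a human-review
# queue, based on a (well-calibrated) model confidence score.
from dataclasses import dataclass

@dataclass
class Extraction:
    field: str         # e.g. "invoice_number"
    value: str
    confidence: float  # calibrated probability that the value is correct

REVIEW_THRESHOLD = 0.98  # tuned so reviewers only see suspicious predictions

def route(extractions: list[Extraction]) -> tuple[list[Extraction], list[Extraction]]:
    """Split predictions into auto-accepted and human-review queues."""
    auto, review = [], []
    for e in extractions:
        (auto if e.confidence >= REVIEW_THRESHOLD else review).append(e)
    return auto, review
```

The whole scheme hinges on calibration: if a confidence of 0.98 does not actually correspond to being right about 98 percent of the time, the review queue ends up either flooded or dangerously empty.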

Okay, yeah, those are definitely the two options you've got. So can you tell us which one you do? Or, well...

We do both. I mean, we've got loads of different projects. So in some projects, we just have really very accurate models; we've invested a lot in training data and algorithm development, and we just don't really have any human component in them at all. But in others, we just don't have algorithms that are accurate enough, frankly; it's hard sometimes. So we do apply humans to check data.

Now that we're in, you know, a transformer world and using generative AI models: something that people gripe about a lot, although a lot less than they did a year or two ago, is hallucination. And according to our research, at Evolution AI you implement specific strategies for mitigating and managing the risks of hallucinations in your AI outputs.

Yeah, I think it's definitely a huge problem. One thing we've seen is when you're trying to extract information from a financial statement, like an income statement, a profit and loss statement: the problem is that you're going to extract some information, and the LLM is just going to make up a line item. It's going to completely make up this line item.

You know, it will look at a balance sheet and it's going to say current assets is x millions, and that number is just going to be completely made up. The problem is that that number is going to make sense in context. Typically it's going to be some other numbers added together, so it's going to make sense, but it really matters whether it has actually been reported. These are kind of official documents; you can't just make things up. If it hasn't been reported, you can't really say that you've extracted this information.

So it is a real problem, and we can maybe get into the limitations of LLMs in this world, but also why they are powerful in this world for what we do. But I think the main thing is that you really do need some domain-specific knowledge at the moment to sort this out. Obviously, nobody has really solved this issue of hallucination, and we haven't either. But we have some stuff specifically to do with, for example, financial statements: some tests that we can do, and some other technology that we've built on top of LLMs to, for instance, give you a good sense of confidence about whether the result was accurate or whether it's just been hallucinated.
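Here is a minimal sketch of the kind of domain-specific consistency test Martin alludes to. The specific accounting identity, field names, and tolerance are illustrative assumptions, not Evolution AI's actual checks:

```python
# Flag extractions where a hallucinated line item breaks a known accounting
# identity: a fabricated "current assets" figure may look plausible in
# isolation, but it should still equal the sum of its reported components.

def balance_sheet_consistent(extracted: dict[str, float], tol: float = 0.01) -> bool:
    components = ["cash", "receivables", "inventory"]
    if not all(k in extracted for k in components + ["current_assets"]):
        return False  # missing fields: route to human review, don't auto-accept
    implied = sum(extracted[k] for k in components)
    return abs(implied - extracted["current_assets"]) <= tol * max(abs(implied), 1.0)

# A made-up total that doesn't tie out gets flagged for review:
doc = {"cash": 1.2e6, "receivables": 0.8e6, "inventory": 0.5e6,
       "current_assets": 3.1e6}
print(balance_sheet_consistent(doc))  # False: 2.5M implied vs 3.1M extracted
```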

Keith McCormick, the data scientist, LinkedIn Learning author, and many-time guest on this podcast, most recently in episode number eight hundred and twenty-eight, will be sharing his Executive Guide to Human-in-the-Loop Machine Learning and Data Annotation course this week. In this course, Keith presents a high-level introduction to what human-in-the-loop ML is, which will be intriguing even for consumers of AI products.

He also instructs viewers on why data professionals need to understand this topic for their own AI projects, even if they delegate data annotation to external companies. You can access the new course by following the hashtag

#SDSKeith on LinkedIn. Keith will share a link today, on the episode's release, to allow you to watch the full new course for free. Thank you, Keith.

It sounds like that's a key part of your proprietary tech stack: these models that are assigning confidence and allowing you, depending on the project, to bring in, as you say, some meagre human resources to double-check things.

And so getting that calibrated well is critical, because obviously... you're like, okay, we promised the client that we would get through two hundred thousand documents today, and our model found that two thousand documents have something suspicious about them. You know, it's only one percent, but obviously that kind of thing ends up being a huge deal. Yeah, I think

it's kind of interesting to think about what the role of the startup is here. You know, when ChatGPT came out, everybody kind of panicked: there's no role for startups anymore, AI startups can't do anything, you just need to use an LLM to do everything.

And I think we're starting to see that actually there is a lot of domain-specific knowledge that is really valuable, and that's what the startups can use. For sure. I'm a big proponent of there being a lot of

opportunity in startups at the application layer, like you're saying, with some domain-specific expertise building a verticalized solution. The biggest players are going to be occupied for some time with just building out the core capabilities, the LLMs that are underneath that you can be leveraging. And it's great for startups that companies like Meta are open-sourcing such huge, powerful models that we can fine-tune very cost-effectively with approaches like LoRA (low-rank adaptation, for our listeners who aren't familiar with the term), which allows us to add a very small number, like a single-digit percentage, of model weights into a huge LLM and fine-tune it to our specific use case, your specific application, your client's specific needs.
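For those curious what that looks like in code, here is a minimal LoRA sketch using Hugging Face's transformers and peft libraries. The base model, rank, and target modules are illustrative assumptions:

```python
# Wrap a pretrained causal LM with small low-rank adapter matrices; only the
# adapters are trained, leaving the original weights frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

config = LoraConfig(
    r=8,                                  # low-rank dimension of the adapters
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # inject into attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# Typically well under one percent of total parameters are trainable,
# which is what makes this kind of fine-tuning so cheap.
model.print_trainable_parameters()
```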

Yeah, it's a really exciting time to be building application-layer startups. Your company, Evolution AI, is used by major companies across a range of sectors: Time Incorporated, Staples, the New York Times. Is there anything that you're able to tell us around how you tailor your solutions to meet demands across diverse industries?

So I think that list of names, which maybe your researcher found, is basically a list of the different companies that have used various technologies that I've built, at other companies too, not just this company specifically. We are mainly

financial services.

Deutsche Bank, NatWest, smaller banks as well, many banks. And so the question is, how do we work across such... I mean, you know, really, I don't think we

do work across all of them. The better question then is how do you tailor... I mean, even if you're talking to NatWest or Deutsche Bank, they have different needs depending on the different projects, the different scales that they're working at. I don't know if there's

anything interesting to say there, except that we don't really work so much with those larger banks anyway. We like to work with smaller banks and fintechs, because otherwise it's such a hugely painful process. You're absolutely right: how can you work with the smaller companies and the really big companies? I don't think you can really do both. They operate on a different timescale; everything's on a different scale. And so we've gradually moved from the bigger banks to the smaller ones, which actually we love.

I really appreciate you saying that on air. I think that might be the kind of thing that feels like a really honest insight that will be really valuable to a lot of our listeners. I'm sure we'll be making that into a YouTube Short.

Hopefully I'm not just

losing all of those big-bank customers because of this. Nice. Speaking of people walking out the door: you've previously touched on the balance between being a dictator and a democrat in leadership.

You've said that on air before. In high-stakes environments like an AI startup, where, you know, one big Deutsche Bank client walks out the door and all of a sudden it's all hands on deck, trying to figure out how we're going to make revenue come together, to avoid some kind of fiscal cliff that's coming up... it can be really intense. So how do you cultivate a culture that encourages innovation and creativity while still maintaining decisive leadership to meet market demands?

Yeah, that's a good question. I definitely did say that. In fact, I asked Andrew Ng that question when I interviewed him (I'm sure your own listeners know about him), and he gave me a really great answer that I wasn't expecting, which is that there just is no real answer to this question of whether you should be a dictator or a democrat. You really need to think and be introspective about what your experience allows you to be confident about.

Like, if you have really spent many years thinking about something, then you really should be confident, and you really do need to say, "I think we should go this way, and please come with me." I think he said something like, "Just take a chance on this with me." And I've thought this myself. It's really important to be really, really self-aware. The field is moving so quickly that there's so much stuff you basically don't know much about. There are a few things that you do know about. I think I'm quite strong in certain things: core scientific method, data analysis, experimental design, design of experiments, quite classic stuff.

I remember when we were working in the same office: you were a researcher, you were a postdoc, you were doing research, but you undertook... you audited it, or maybe you must not have even audited it, you'll remember better than me, but you did a graduate-level mathematics course at Oxford while you were working full-time as a researcher. And if I remember correctly,

You got the top grade.

Second top. You know, actually, I had a few neurons fire that were like, "second top," but I didn't want to, like, yeah, get that wrong. Yeah,

rather be wrong that way, yeah. So, yeah. I did this statistics course; it was like applied statistics.

But I didn't officially do it. I just sat the exams and stuff, but nothing official. Yeah, that was fun.

So I learned a lot doing that. I really enjoyed it. So I feel like I'm pretty strong with that kind of basic stuff.

I'm not really super strong on anything else but something like that. But then I do think that a lot of machine learning research does come down to understanding data, understanding statistics, understanding the way that data can, like, screw you over and confuse you. I've been confused so much by data, probably more than many, many people. It's just really stayed with me, to my core.

I remember you were the first person who ever told me that I should learn Python, that that was a good idea.

I was like... I was doing

everything in R at the time. You know: what are you doing? Everything's going to Python. And yeah, you were the first person that made me think, I guess I should learn Python.

In terms of statistics, do you think there's a big gap amongst a lot of machine learning practitioners who don't study statistics? I think there's a lot of value there. I mean, I have a lot of confirmation bias on this, because I was a statistician before getting into machine learning, and so I like to always think that there's a lot of value in that. But I think in terms of understanding your data, and especially cleaning up your data, it can be hugely valuable.

I mean, there's a lot of stuff that's in statistics, that you learn in statistics, that is just useless now and is a waste of time. It's quite a traditional field, you know; it hasn't moved very quickly, and people are still being taught stuff that is definitely not really that useful. There should be much more emphasis on computational methods, I think. Having said that, I mean, there's a lot of stuff you can just safely ignore in statistics,

like, yeah, p-values,

like the delta method, like quite a lot of the mathematically intensive stuff, which you can just learn when needed, if ever. But you do need the approach and the kind of attitude and the scepticism about data, and really understanding, like, bias at your core. Selection bias and all of that stuff: statistics has got really, really important lessons to teach about that.

That's really, really critical. Most of the other stuff, you don't really need the techniques. Do you really need any...? Like, you need to know regression.

You can do a lot with regression. I don't think you need much more. That's my statement.

Maybe. I think that something that for me has come in handy, actually, interestingly, is for things like evaluating not statistical bias but, like, unwanted bias, you know, seeing whether your models are behaving in a biased way. So, for example, at our company, at Nebula, we're doing a lot of things in human resources.

And so, for example, we would be ranking people for a specific job description. You know, we have a database of one hundred and eighty million people in the U.S.,

and somebody puts in a job description, and we rank every one of them for that role.

We want to be sure that it's not ranking men better than women, for example. And so we have a test dataset that we can test our algorithm on. And obviously, there's going to be some variation.

The mean score for men is going to be somewhat different from the mean score for women; you're never going to get exactly the same number. And so statistics has been useful for me there, to be able to say, okay, but there is no statistically significant difference between these two groups. Yeah.
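As a minimal sketch of what that check can look like (with simulated scores; the real test set, group sizes, and significance threshold are assumptions):

```python
# Test whether mean model scores differ between two groups by more than
# statistical noise would explain.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores_men = rng.normal(loc=0.70, scale=0.1, size=500)    # simulated scores
scores_women = rng.normal(loc=0.70, scale=0.1, size=500)  # simulated scores

# Welch's t-test: does not assume equal variances between the groups
t_stat, p_value = stats.ttest_ind(scores_men, scores_women, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Statistically significant gap between groups: investigate for bias.")
else:
    print("The observed difference is consistent with statistical noise.")
```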

Absolutely. I think the concept of statistical noise is just a really deep concept, and it needs to be really, really understood. And some of these basic concepts in statistics are just not that widely understood by people who come from a computing background.

And also stuff to do with data visualization. Like, literally today we were showing someone how to make a normalized histogram. It's quite typical that I'm teaching someone how to make a scatter plot. People who are really, really good, who have a great background, a computer science background, they don't really know how to do the straightforward things with data. And I'm just like: make a good scatter plot, or make a good histogram. Whatever, it's kind of boring, that stuff, but it's also really, really important.
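For completeness, a minimal sketch of those two "boring but important" basics with matplotlib, on simulated data:

```python
# A normalized histogram and a scatter plot: the straightforward things.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# density=True normalizes the histogram so the bar areas integrate to one
ax1.hist(x, bins=30, density=True, edgecolor="black")
ax1.set(title="Normalized histogram", xlabel="x", ylabel="Density")

ax2.scatter(x, y, s=8, alpha=0.4)
ax2.set(title="Scatter plot", xlabel="x", ylabel="y")

plt.tight_layout()
plt.show()
```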

Yeah, it is also the kind of thing that LLMs are getting really good at. You know, Google Gemini built into your Colab notebook can do a lot of that: "Can you make a scatter plot?" But you've got to be able to do it yourself, especially on big, important decisions like you're talking about, situations where tens of millions of dollars or hundreds of millions of dollars are involved in the transaction. You want to be sure. You don't just want to press the Gemini magic button, see some results, and believe them. You want to be able to dig into it and know it yourself.

Sign off on it yourself. Someone, a friend of mine, he works at a big consultancy, like one of the top kind of strategy consultancies; I won't say who they are.

They're using LLMs to analyze financial statements, but they're basically doing what you just said. They're just pressing the button, which analyzes the statement, summarizes it, makes all of these financial decisions, like that. It is a magic button; it just does one thing, in one big step. I think you need to split these things up into smaller processes and have a look at the results of each step.

I bet they're charging a lot more than you are, too. Yeah, these consultancies. So, in addition to your work at Evolution AI, you're also widely known for being the founder and organizer of what is now a nearly thirteen-year-old meetup in London, the London Machine Learning Meetup. And I don't know exactly how you prove something like this,

but in the same way that I guess I can say Super Data Science is the most listened-to podcast in data science, because I'm aware of the other data science podcasts and I know how many downloads they have, and it's smaller than ours, in the same way, I guess, you're well aware of all of the AI and ML communities in Europe, and so you can confidently say that, with fifteen thousand members, the London Machine Learning Meetup is the largest community of AI and ML experts in Europe.

So I can confidently say it because nobody

has raised their hand and said,

"you're

wrong, there's a bigger group."

I should say that I didn't actually found the meetup. It was founded by somebody else, and he gave it over to me; he handed it over to me. But I think I've been running it for maybe ten years or something.

Well, we'll take that; that experience is enough for us. I'll continue on with the same kind of line of inquiry. So, you know, you've been working in data science for longer than it's been called data science. You've witnessed the evolution of AI from ad hoc experiments to strategic, continuously integrated workflows, from niche to mainstream, and from exciting novelty to, at times, overhyped technology.

And one of the things that I was really excited about in having you on the show, even four years ago when you thought you had nothing to say, was that you'd written an article that reached number one on Hacker News, called "Why Data Projects Fail." We're obviously going to have a link to that in the show notes. One of my favorite things about it is that it's short:

it takes a few minutes to read, and it's funny, and it's spot on. And, you know, if you're looking at getting started on a new data project... I probably don't remember all ten of them verbatim, but there are items from it, from reading it when you wrote it eight years ago, that come back to me when I'm thinking about getting started on a machine learning project. Very simple things, like: how are the data structured now? Has anybody been doing any kind of modeling with this yet? And if they haven't, you need to add months onto the timeline for the project in order to be able to confidently say to the client what you're

going to be able to do. Yeah, there were some hard-won lessons in the article, mainly just based on me screwing up lots of times previously and trying to learn the lessons

from that. Let's quickly go through them; they're so useful. So, number one is the data aren't ready. That's kind of the first one that I just mentioned: you've got to look at the data before committing to a project. And I think that, eight years later, as AI has become so powerful through transformer architectures, there is this kind of expectation that more and more magic is possible. And so executives in a company think, oh, you know, my competitor is doing this. But maybe the competitor has been tracking data, logging data, that's useful to this task,

and the company that you're considering doing this consulting work for hasn't been logging those data. You can't magically create an AI capability without some kind of underlying data. And number two is that somebody heard data is the new oil, which today is more like people saying data is the new electricity, or things like that. It's such a weird thing to me, because resources like oil and electricity are finite... well, data are finite too, but data you can copy easily. And so in some ways it's kind of like the inverse of an asset like oil, where part of what makes data valuable is that you can copy them.

And I think, if you've got loads of oil, you can sell it. It has a price, a well-defined price; it is valuable and you can sell it. But you try to sell data, and it's just not like that. There's always some guy or girl whose job it is to sort out the data, and maybe their life is going to be a misery, because the data is a mess and has so many problems that it has literally negative value and isn't worth anything to anyone. But it's easy at the top of the shop, you know, in the exec team, to think that we've got all of this data and it's valuable. That idea was particularly prevalent at the time when I wrote the article; maybe it's not so much now.

Then number three was that your data scientists are about to quit, and this was because of issues like access, where... it's wild to me how many big organizations don't allow their data scientists or software developers to have root access to a machine, to be able to install libraries, Python libraries. It's wild. And so I think that was kind of the

main point I was making at the time.

I regret to report I've seen it first-hand recently. Yeah, it definitely happens. Actually, something that ties in perfectly to what we were just talking about with respect to statistics is your number four, which is that you don't have a data scientist leading the project. So, you know, you mention specific things like selection bias, measurement bias, Simpson's paradox, statistical significance, things that you thought were important eight years ago. And, you know, the term data scientist is used to describe so many different kinds of roles.

You really have to dig into a job description or a project description to understand what's really required in a role. But for me, something that I kind of use as a litmus test of whether a role is really data science or not is whether there are going to be predictive models built. So not just analytics, not just analyzing things, but building a model that could go into production, that will be making predictions on data that the model hasn't seen before.

That, to me, is for sure a data scientist. And it doesn't need to be a machine learning model; it could be a regression model, you know, a statistical regression model. But you can end up... it would be easy, if you were creating a data science project

where you want to be building a predictive model, and you have somebody who doesn't have experience creating predictive models and putting those into production, and you could end up with what another wildly popular post, probably from around the same time, called the high-interest credit card of technical debt.

Yeah, I remember the article. I don't remember the specifics. But when I wrote the thing that you're talking about, that point, I was really talking about leadership. And I feel like that is still a huge problem. You get these teams where the leader of the data science team has never really worked with data, has never spent real time with data. They have some other background, like a great background in something else, but they just don't really get it. They just don't really

get it. They're from one of the big three strategy consulting firms; that happens a lot. Yeah, you know, very highly paid, very well educated, but they're just pressing the magic button.

But yeah, exactly. And you get a lot of money wasted, and, you know, the projects all fail. Nobody wants to admit that they failed, because it's embarrassing, and it happens all the time.

Your number five is the inverse of four, which is that you shouldn't have hired data scientists at all. So again, it's kind of a misunderstanding of the project, where you hired data scientists, but in fact you need data engineers or you need BI analysts. And, yeah, I guess it's easy...

Well, it's easy when you're an executive and time is short. You know, you have this exciting project, you think there's an opportunity, and you're like, oh, this is AI, we need a data scientist. But very often you don't. And I think that's something that I've talked about on air before: yes, there is a lot of demand for data scientists out there, in terms of, you know, that being a job that people aspire to have.

I'm sure quite a few listeners out there listening to the Super Data Science podcast are interested in getting jobs as data scientists, and yes, there is a future for you in the space. But a career like being a data engineer or a software developer... I mean, there are orders of magnitude more vacancies per interested applicant in those kinds of areas. When we have people on the show and we say, "Are you doing any hiring?", invariably people are hiring software engineers, data engineers. They are not always hiring data scientists.

Yeah, I think that's true. I think in the early days of the industry, everyone thought, "I need data scientists," and then they started to realize they don't need as many. Number

six is actually basically what I just said, which is that your boss read a blog

post about machine learning. You know, I've never said this before, but actually, you know, I was partly inspired to write this piece as revenge, because my boss at the time

was a complete...

He used to come to work and just come out with some rubbish about ensemble models or something. And I thought, you know how I'm going to get revenge? I'll get something onto Hacker News that specifically calls out this behavior, which is dysfunctional. So he was quite embarrassed in the company at the time, I think.

Oh, wow.

Well done.

No.

I mean, this has just happened to me many times, where... yeah, there's a kind of personality in business where they think somehow, you know, they can get the gist of an idea.

One of my favorite ones is supervised and unsupervised machine learning, where somebody... you know, it sounds like there's an intuitive concept there, where a supervised machine learning model needs somebody, needs like a human in the loop, and then with unsupervised, you don't. That's what it sounds like. And so I've had that conversation a number of times, where, yeah, an executive has somehow come across a blog post where the term unsupervised learning was there.

And they're like, ah, that's what we need. We need no more humans in the loop. Even though... I mean, just to be clear on this, this is what unsupervised learning actually is: it's where you have an algorithm, where you're training a machine learning model, and you don't have labels.

So supervised learning is a paradigm of machine learning where you have, say, a whole bunch of images. The classic example is to say that half of them are dogs, half of them are cats, and they're labeled as such. And the supervised machine learning model learns how to classify dogs from cats. In the unsupervised learning paradigm,

you don't have those labels. You don't know whether the images are of dogs or cats, but there are still machine learning algorithms out there that can recognize patterns in the data, kind of categorize, figure out how to sort things into buckets. And so you're not necessarily going to end up with dogs and cats; you might end up being able to distinguish dark images from light images. I'm kind of getting into a very vague example.
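A minimal sketch of the contrast, using scikit-learn on simulated feature vectors (standing in for images; the labels and features are made up for illustration):

```python
# Supervised vs. unsupervised learning on the same data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))   # 200 "images" as 16-dimensional feature vectors
y = (X[:, 0] > 0).astype(int)    # provided labels, e.g. 0 = cat, 1 = dog

# Supervised: learn the mapping from features to the provided labels
clf = LogisticRegression().fit(X, y)

# Unsupervised: no labels at all; just sort the data into two buckets.
# The buckets may or may not correspond to dog vs. cat; they could just as
# easily end up separating dark images from light images.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```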

But no, I think the critical point, what I was trying to say in this article and in many other articles, is that data science is a really technical discipline. And it is a discipline, and you need to be a specialist to really perform well.

And just reading a blog post and kind of spouting off about supervised, unsupervised, whatever, just because you've kind of heard the word before, is just not going to get you anywhere. In no other technical discipline would people do that. Why is this different? It's a new, young discipline, and we need to treat it like a real discipline, like a real scientific discipline.

I just remembered another one, with reinforcement learning. When reinforcement learning started to become a buzzword, around the time of AlphaGo, I was having lunch with someone that I really respect, actually, who's a startup founder, brilliant guy, has been very successful. But he'd done the exact same thing that we just described with unsupervised. He'd heard about this reinforcement learning, and, I guess, read a blog post or watched the AlphaGo movie or something. And so he was talking about how, within his platform, he wants to get reinforcement learning involved, so that when humans use the platform and add data, that will "reinforce" for the machine learning algorithm what it needs to learn.

Yeah, and, you know, some data scientist whose job it is to execute his vision is not having

a fun time. Exactly, that's exactly right. Number seven... I can't find a way to tie it thematically very easily to number six. Actually, we can. So if number six was that your boss read a blog post about machine learning, number seven is that your models are too complex. And that's exactly the situation where it could happen: your boss says we're going to do reinforcement learning and we're going to use a large language model, when you don't need that at all, when you just need a simple statistical logistic regression, and that'll get you all the way.
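As a minimal sketch of the kind of simple, fully inspectable baseline being advocated here (on simulated data; the features and split are illustrative):

```python
# A plain logistic regression baseline you can understand end to end
# before reaching for anything more complex.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 5))  # five interpretable features
true_w = np.array([1.5, -2.0, 0.5, 0.0, 0.0])
y = (X @ true_w + rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("coefficients:", model.coef_)  # every weight is directly inspectable
```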

Yeah, I'm quite a simple person. I like to do things in a really simple way, if you can. I definitely like to start projects off with very, very simple methods that everyone understands really well, just to figure out what's going on, what's going on with the data.

Just have a really simple idea of what's happening that you can really understand in a deeper way. I've heard of many, many projects where somebody came in and just wanted to use some really advanced technique that they'd just heard about. You know, it's natural to have some excitement, but if you use this complex thing you don't really understand, it's really difficult to interpret the results. And I've known so many stories where, six months down the line: oh, we used the wrong column in the input dataset, because we didn't understand the data, and it was really hard to debug.

Seriously, like six months of work just completely wasted because people used an overly complex method that they could not really understand. I'm just such an advocate for simplicity wherever possible.

And it also allows you to be prototyping a lot more quickly, potentially saving a lot of resources. Lots of reasons to start simple, for sure. Reason number eight that data projects fail, from your blog post, is that your results are not reproducible. And you specifically cite the kinds of tools like Git, code review, automated testing, data pipeline orchestration, which... yeah, I mean, today, luckily, we have more tools that make that easy. Notebooks, right? Yeah, notebooks. I do love them, but yeah...

Such a guilty pleasure.

Yeah, exactly.

Eager to learn about large language models and generative AI but don't know where to start? Check out my comprehensive two-hour training, which is available in its entirety on YouTube.

That means not only is it totally free, but it's ad-free as well. It's a pure educational resource. In the training, we introduce deep learning transformer architectures

and how these enable the extraordinary capabilities of state-of-the-art LLMs. And it isn't just theory.

My hands-on demos, which feature the Hugging Face and PyTorch Lightning Python libraries, guide you through the entire lifecycle of LLM development, from training to real-world deployment.

Check out my "Generative AI with Large Language Models: Hands-On Training" today on YouTube. We've got a link for you in the show notes.

During the pandemic, it

got to a point where my whole data science team was working out of notebooks, and I was the only one, as the chief data scientist, who saw that it was such a pain for everyone else in the organization. I was the boss. Yeah. You know, this is still a problem.

I don't think the industry, the field, has really figured this out. I definitely don't have, like, a perfect solution to this. But definitely, on a daily basis, I still encourage people in my team: you know, we're confused by these results,

we can't really look at the code, we can't really reproduce these results. You know, we haven't really been rigorous enough at this research.

We need to just separate things out, get everything down in Git or whatever, so that we can start again properly. So it's still an ongoing thing. I don't think we've figured it out. I don't think there's a good solution yet.

Yeah, there's no perfect answer. It's one of those things where, you know, having a leader, at least an engineering leader, is helpful in figuring this out. There is, I think, a Goldilocks sweet spot on a lot of these things, where if you go too far, you have too many processes.

You know, you can kill the project with overhead. So finding that sweet spot depends on expertise. Number nine is that an R&D lab is alien to your company culture. This is so huge.

I remember, actually, after reading this, I implemented things like an internal talks club. So, you know, R&D is a high-risk activity. If you don't have the things you mention in here, lab meetings, talks, publishing papers, it's difficult to kind of have that R&D culture. And, you know, if your business is looking for a tangible, near-certain return on investment on every data science project, it's going to be hard to do anything interesting.

Yeah, I think companies really need to take a hard look at themselves and really think about whether they want to do research or not. Because at my last company, they said they wanted to, but when the reality came up and confronted them, I think they realized: we don't really want to be doing this. We certainly don't want someone saying to me, "I don't know how long it's going to take to do this, and I don't even know whether it's going to work, ever."

Something that I have been doing over the years that has, I think, been helpful to management is that I'll typically break up the amount of time that my team is spending by level of risk and say, you know, we're going to spend thirty percent of our time on these relatively high-risk projects, where if one works out, it's going to be a game changer for us and our positioning relative to our competitors; and then another third on kind of medium-risk projects; and then another third on projects where I'll have something at the end of the quarter to show management,

even if the riskier bets don't pan out. If we go

through the final one here, number ten from you, it's people designing data products without seeing live data. You describe this as like doing taxidermy without looking at live animals. And this is huge.

And this is one that I think data scientists themselves... I think with a lot of these ones on the list, data scientists are all like, "haha, number ten, that's the one." But this is one that data scientists often get wrong. They have collected data, or they've scraped data, and they anticipate what the user or what the production use case is going to be like, but they don't know for sure. And it ends up being that, due to drift, or due to the real-world use case being quite different from what you trained your model on, it gives vastly different results

in production. Yeah, I mean, I personally made this mistake really badly. I did a project that must have taken about six months, and I just completely screwed it up. I had this vision for how things were going to look in this data product. And I spent a lot of time with customers, really trying to understand their use case, trying to understand their workflow.

I made lots of wireframes and stuff, and I tried to be really good about the whole product design thing, but I just didn't really use enough real data. And when I used real data, I realized everything that I thought was just a load of rubbish. It was not worthwhile, and the project was a complete failure. So that was really painful; that was me screwing up massively.

Yeah, I've done it too, and it's an easy mistake to make. I asked online, knowing that you were going to be coming on the episode; I posted that you'd be coming on, I posted a link to this article on LinkedIn and on Twitter, and I asked my audience if they had any thoughts on why their own AI projects fail. And we've got one here I want to share, from Peter Anderson in Denmark. Peter gives us a hark back to episode number seven hundred and eighty-one with Sol Rashidi, where Sol talks about... actually, that's another episode basically on why AI projects fail.

She's written a whole book about it, and her main point, which Peter Anderson is also recalling, is that projects most often fail for one thing that they have in common, which is the lack of a proper business reason. So it's kind of this thing where you have a hammer and you go around looking for nails, as opposed to having some problem that you know is actually a business problem, where there's a chance of there being ROI if you succeed. Instead... I think we have seen this a lot in recent years with generative AI, haven't we, where every application is trying to build generative AI into it. And I bet most of those projects are a complete waste of time and resources.

I... I have to be careful what I'm saying here, but I would say that I know of many projects, real projects, where people have made the decision to use AI first of all; then they made the decision, you know, of what kind of method to use as a second step; started working on this method as a third step; and then, as a fourth step, somebody says, "Do we actually need this?"

Yes.

Very, very common. And it's a problem, a huge problem; it's always been a problem. I'm not saying it's changed. It's been a problem for ten years at least.

Yeah, that is one of the big ones, for sure. Is there anything else that you think you would add? You know, now, almost a decade after you originally published this wildly popular post, is there anything else that you think you'd add in, or something that's changed a lot

over the past decade? Well, I think the problem has just got worse because of the hype. As the hype has increased, the problem has got worse. People want to use AI for everything, and they have over-inflated expectations of what's really possible.

I've been saying this a lot, actually: the thing is that LLMs are really good at creating plausible output, right? That's what they're really good at. That's what they're designed to do. So they're really good at fooling people.

So we get many people who try some stuff out with, you know, some commercial LLM, and everything looks great, the output looks great. But they don't really do any evaluation, and they don't think they need to, because it just looks perfect. Everything's fine. You know, it's like if a human came to you:

you asked them some accountancy question, and they gave you this answer, which was, like, perfectly formed English, a five-minute exposition on this accountancy concept. You'd be like, this person is really intelligent, and they're obviously an expert in accountancy. But that world has just changed.

Something can tell you something which is very fluent and very long-winded, you know, quite eloquent actually, but it's wrong. And you need to do all of that boring, quantitative analysis and evaluation to figure out whether it's actually useful, whether it's going to do the job for you. So these problems are even more prevalent now, I think, than they used to be. So I don't think there are any new things I've seen, but definitely most of the list of things are still real issues.

Very nice. Well said. Yeah, thanks again for that blog post, which, as I said, in my own mind is something that I've referred back to so many times when thinking about getting going on a project. Switching gears here to something you've done a lot more recently: last year you gave a talk to the European Commission in which you advocated for publicly funded, open-source AI for humanity at large, and you drew lessons in that talk from the Human Genome Project. You know, you want to ensure that the benefits of AI are spread widely across the global population and not just concentrated in the hands of big tech. Talk about that talk.

I mean, how does that even come about? Do they send a letter? How does the European Commission find you?

I think the European Commission commissioned Nature, the Nature Publishing Group, to organize this event and convene some people to give some talks. So there was like a panel, like three of us, and they selected me. I don't know why they selected me, actually; good question. But I thought, okay, you know, I'm going to advocate for this, because I think it's really important.

Because at that time, it wasn't really clear what was going on. It really seemed like we were going to have a handful of commercial players with LLMs who were going to take over everything, and there was nothing going to compete with them. And that has not really panned out.

There have obviously been huge, really successful open-source projects which are catching up with, if not exceeding, some of the commercial offerings, which has been amazing. But at that time, there was kind of a huge worry: like, what's going to happen to the workforce, unemployment, employment considerations. With so much power concentrated in the tech players, the public, you know, publicly funded initiatives, needed to get in there.

And the reason why I spoke about the Human Genome Project is that I had a lot of people saying, well, what can public funding actually achieve? You know, you can't do anything; the public sector is never going to be able to compete with the commercial world. And that's just so ahistorical, because the Human Genome Project did compete with the commercial world, and beat the commercial world.

You know, this is well documented. Celera, the venture, tried to patent the human genome, and lots of public-spirited scientists basically said, no, we're not going to allow that to happen. And they worked together on this huge project. They got public funding, or sort of public funding.

It wasn't all public funding, but they managed to release the genome, the human genome, and put it in the public domain. And because they put it in the public domain, Craig Venter could not patent those genes, and they beat him, and they stopped that happening. And who knows what kind of world we would be living in if he had managed to patent those genes. So, you know, I really wanted to inspire people to say: the public sector can achieve stuff; scientists and researchers in academia can do stuff together. Let's stop thinking of everything so gloomily, like we should just give up on our research, everything's going to be owned by OpenAI and Google. I don't believe it. And I think that's been borne out, because you've had lots of really cool open-source projects that have been built.

Models trained using eight A100s or whatever, that some academic lab has done — LLaVA, I think, was trained on eight A100s — and it's really, really interesting. And that despondency was totally unwarranted, I think. But as I said, in terms of impact that I had on the European Commission, I think I had zero impact.

But you might have helped nudge them to get some funding.

Maybe, possibly, yeah. But I think, you know, it would take a lot for them to do that.

OK, so the UAE did do that: they released Falcon, which they funded — the United Arab Emirates.

They did fund Falcon, so, you know, this kind of stuff did happen.

It's just that the EU didn't, and the UK didn't — well, there's Mistral, I guess, but definitely the UK didn't. And it wasn't just about building an open source model. It was about building capabilities and skills. I really thought that if, in particular, the UK had done this, then we would be building really good skills. Instead of that, we've kind of spent money on stuff that I didn't think was very useful.

And, you know, there's a whole topic in itself, which is the UK government and how it really didn't take advice from the right people — I think you should take advice from technical people — and so it ended up wasting money. I think that money would have been a lot better spent if we had invested in open source. Open source, among other things, would have been great for building a network of skills, and that sort of stuff.

And you mentioned the UK there. In previous podcasts and articles, you've lamented that the UK used to lead in science and technology, and now, on AI at least, it's taken a back seat. It seems like, you know, there's Mistral, the UAE, the US, China — there are lots of LLMs in those countries that are pushing the state of the art. We don't typically see the UK or British universities with LLMs at the top of AI leaderboards. What kinds of strategic moves do you think the UK should consider to regain leadership in AI, which has historically been a strength for the country?

Well, the big thing that we did that we shouldn't have is completely ignore language models and neural networks. We completely ignored them. We've got this big flagship research institute in the UK called the Alan Turing Institute, and they completely ignored all of this research. They just didn't do anything on language models — not even BERT-era language models. They just didn't do anything on this. You know, I've written about this before: they were publishing blog posts on non-fungible tokens or whatever — stuff that was just completely unrelated to the mainstream of AI.

Research on cryptocurrency?

Yeah — and not even using AI to, like, predict something about cryptocurrency; that was like their biggest blog post of 2020, I think. They carried on with all that stuff. Then there was this group called the AI Council that the government convened, and it was just like all of these things: the UK is just great at having lots of people with letters after their names getting together and writing these long reports. But, you know, they just invite the wrong people.

They didn't invite any startup CTOs, the data scientists, the actual practitioners who are doing the work and who understand this stuff. They didn't do any of that. They didn't invite any of those people to any of these bodies. They invited, like I said, people with letters after their names. It's like they had the ingredients of a really lovely garden party, but they didn't have the ingredients of a hardcore engineering or expert group. That's the problem that the UK has to fix. Sorry to be so negative.

But the positive thing, I think, is really just: listen and talk to the practitioners. I shouldn't need to say this. Talk to the people who are actually doing AI and who understand the field. There are many people in the UK with great talent — there is amazing talent in the UK, a lot — and there's a great community as well, a great tech community. The government just needs to go and talk to them, and they can tell them what to do. At the time I wrote that article that you mentioned, I really thought the right thing to do was to work on open source LLMs. I'm not sure now whether that's right, because some other groups have come up and done that, and the timing maybe isn't so good anymore. So, you know, like I said, I don't think I have a good answer to the question.

Something related to your leadership in the UK is that you were elected chair — and I think you no longer hold this role, but for several years, until recently, you were the elected chair of the Data Science and AI Section of the Royal Statistical Society. That sounds pretty fancy. You were talking about garden parties there — the Royal Statistical Society, is that like, you know, do you have little sandwiches with the Queen and talk statistics there?

No — but it's great there. I'd say it's really old. I think Florence Nightingale was the first president of the RSS. Is that right? I may have just made that up — I don't know if that's true. You know, it's a great institution, and they really wanted me to help them have a presence in the AI world, the data science world. They quite rightly felt like statistics was being sidelined in this world, and they wanted to be part of it. So yeah, I was the chair of that section of the organization. I'm no longer the chair — my successor is the chair right now, and she's doing a great job.

So, you know, I've taken time away from that. But what we wanted to be was the voice of the practitioner, because no one was the voice of the practitioner — and that's still the case: no one is really putting forward the views of data scientists, the actual people doing the work. In any other industry, you're going to have some industry body who represents, you know, the fishermen: if the government wants to take some policy decision that affects fishing communities, they're going to go and talk to some groups who represent the fishermen; they're going to go to the communities. That never happens in data science — they just don't go and talk to the communities who are involved. So we wanted to be that voice: to represent the Society's membership and make sure we represented them accurately. And that's still going on — I'm not doing it anymore, but I'm still happy about it.

Nice. Martin, I did a fact check here. You know, I believe this episode is actually going to be released on the day of the US presidential election. One of those candidates has been refusing interviews that involve fact-checking, so luckily you did agree to come onto one of these fact-checking interviews. And what we've got here is that it doesn't look like Florence Nightingale was the —

first president of the —

Royal Statistical Society. I thought — well, it doesn't look like it. No, she was something. But the first president of the Royal Statistical Society was Henry Petty-FitzMaurice, the third Marquess of Lansdowne.

That's going back in time. OK — so she was something; we'll find out.

We didn't even take a break in filming to find that out —

yeah, and our listeners only had to hear a small amount of your typing.

Nice. Well, very cool — nice to get that kind of historical context. And I guess — I mean, was it a cool experience, being in that role at the RSS?

It was great — wonderful. I met very many interesting people and, you know, it gave me a voice. I mean, why did I get to give advice to the European Commission? It's probably something to do with being part of that learned society, because it gives you credibility. So it was a great platform.

So you talked a lot in your last answer about actual practitioners providing guidance on AI. My last question for you is related to the public perception of AI, which seems to be influenced a lot by high-profile tech personalities.

For example, at the time of recording, I have been watching Bill Gates's Netflix special called What's Next?, in which at least the first episode is all about AI. And I have been laughing to myself, because — with apologies to Bill Gates, who is a very impressive individual and quite learned — it became quite obvious that, at least at the time of the show being filmed, which looks like it was about a year ago, in 2023, based on the kinds of ChatGPT-related things they're talking about, Bill Gates does not have an understanding of AI that I expect the vast majority of listeners to this podcast have. And that was a really interesting experience for me, because I would think that he is the kind of person who would understand these things well.

In the first episode, the funniest thing for me so far is that Bill Gates has this yellow notepad, and he has written words like "train" in a box, and then, like, "reinforce" in another box or something. As he's explaining a bit about the same kinds of things that we talk about — supervised learning, reinforcement learning — the people who are filming it made the directorial decision to zoom in on footage of his notebook, like it's something important: "ooh, Bill Gates's notebook — it's great."

It was not — it was just, like, the word "train" in a box. And so my expectation of Bill Gates's 2023 knowledge of AI was much less than I would have anticipated from him. And you have a quote from another podcast that you did where you said, "It's true, I don't really know anyone in the field of AI who thinks of Elon Musk as an expert on AI" — and, you know, that's the kind of person I would expect to be more on the mark than Bill Gates, I guess, in their understanding of AI. But yeah, we have this problem where the public perception is being influenced by these kinds of high-profile tech personalities. It doesn't seem like the people who really know what's going on — the godfathers of AI, like Geoff Hinton — share the same kind of reach as these other kinds of people who the public seems to think of: "oh, Elon Musk, Bill Gates —"

"— these are AI experts." Yeah. Um — what's the question?

I didn't really ask one, did I? Well, I guess I'd just like your thoughts.

Hmm — uh, I don't know. Let me think about this.

Yes, yes, of course.

So, you mentioned before that I organize this machine learning meetup. We have lots of academics come on and give talks, and some of them give really great talks — really amazing, actually; I really love them, absolutely amazing. But I have to say that sometimes we get academics on to give talks and they are really, like, overhyping stuff.

And it's very easy, if you're outside the field — like, it's become something people are talking about — to read some papers and become a self-proclaimed expert quite easily, by reading stuff on arXiv, you know, or really just newspapers and stuff. The problem — one of the problems; there are many problems, but I'm just going to highlight this one — is that the academics are publishing stuff and they're overclaiming. The titles of the papers are just wild.

They just don't have the evidence to claim what they're claiming. I won't mention who, but we do have people who come to the meetup and give a talk, and they just make stuff up — very overheated claims — and once you really put them under scrutiny, it just falls apart, because they don't have the evidence. Both you and I — we met in a world-class research institute in genetics.

It was world class. So we learned first-hand what it means to be really rigorous and what the scientific method is, you know, at the highest level. I'm not saying I was at the highest level, but we were definitely around people who were working at the highest level, and we took on board a lot of lessons then. And I just feel like — I actually sometimes get quite annoyed with some of our speakers.

I have to say, we had someone recently who came and gave a talk and said, "You know, I'm not going to talk about any of the technical stuff here, because I don't think you're going to be interested; we don't have time to talk about the technical stuff." Come on — you know, you're at a technical meetup.

You should be open to scrutiny. And I think we all need to do better in terms of raising the bar of the scientific culture within machine learning. I think that if we did, we would do much better — that would go some way to solving the problem that you're talking about.

You know, back in the old days, working in genetics, people used to write papers, and then your university would have a PR department who would make up these massively overclaimed headlines that would go into the newspapers. But now people skip the PR team and just do it themselves — the academics, directly. They've just cut the PR people out of the job, and I don't think that's a positive. So I guess — yeah, what should we do about that? We should stop doing that.

Nicely said. You talking about a high level of rigor in science reminds me of something that happened — you probably don't know this. It might have happened while I was still doing my PhD and you'd moved on; I can't remember what the timeline was exactly. But Jonathan Flint, who ran the lab that both of us were in — and he's in episode 547 of the show, if people want to hear more on genetics and its intersection with machine learning; it's great. He is an outstanding leader in the space.

He's probably the world's foremost psychiatric geneticist, and he was a big believer in following the data — following the scientific method — to the extent that there's something he made me do that was extremely embarrassing. He must have known that something like this would happen, but he made me go through it anyway. Actually, I now remember it was before I even started the PhD: I did a master's project with him, and this was the result that came out of my master's project.

I had looked at the relationship between — so, at the time that I was doing my master's research, there had been a paper that made a lot of headlines, which had shown, looking at satellite imagery, that animals in fields, like cows, tended to be aligned somehow magnetically — like they were statistically more likely to be facing magnetic north or something; I don't remember the details. So Jonathan had me look into behaviors related to fluctuations of the moon.

It was something really bizarre, and there was some statistically significant effect — some biochemical measures in mice turned out to be related to the moon cycle. And, you know, that couldn't be related to light, because laboratory mice don't see daylight: they're in the dark, in some interior room with no windows. So it would have to have something to do with magnetism or something like that. I can't remember exactly what the chemical was, so let's just say it was calcium. Calcium turned out to be statistically significantly related to fluctuations in the moon cycle.

And so Jonathan made me go to the Radcliffe Hospital — it's a huge hospital — and meet with researchers who had similar kinds of biochemical data on humans, and say to them, "I found this relationship between calcium and the moon; I would like to have some human data, please." He really did make me do that, and they took one meeting with me and then refused to answer any of my subsequent emails.

But it was a great experience, though, yes?

So I don't know what he was teaching me there — something like, I guess, you know, it's about following the data: even if it's unlikely and seems like probably just a spurious correlation, if you could show it in a completely unrelated dataset — I don't know, it's something interesting. And a publisher's PR team would have loved it.

Yeah, I think — I guess you're right. Like, you have to follow the data. You mustn't be dismissive: even something you believe to be completely crazy, if the data says it, it's worth investigating again. I mean, you don't need to give it much weight, but it might be worth investigating.
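For readers who want to see why that skepticism is warranted, here is a hypothetical sketch — not Jonathan's actual analysis, and not the real mouse data — showing how easily "significant" correlations like the calcium–moon result appear by chance when many variables are tested, and why replication in an independent dataset is the right next step:

```python
# Hypothetical demonstration: test enough unrelated "biomarkers" against
# the moon cycle and, at p < 0.05, roughly 5% will look significant by
# chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
moon_phase = rng.uniform(0, 1, size=60)      # 60 daily moon-phase readings

false_hits = 0
for _ in range(1000):                        # 1,000 noise-only biomarkers
    biomarker = rng.normal(size=60)          # no true relationship to moon
    r, p = stats.pearsonr(moon_phase, biomarker)
    if p < 0.05:
        false_hits += 1

print(f"{false_hits} of 1000 pure-noise variables reach p < 0.05")
# Expect roughly 50 false positives — which is why asking for a completely
# unrelated human dataset, as Jonathan did, is the honest follow-up.
```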

I guess the fear is that, you know, this person, this physician, doesn't really know me, so they think, "this guy must be a crackpot" — but really I'm like, "I don't believe this; I don't want to believe this; please provide me some more data to prove that this isn't true." Well, Martin, it has been awesome having you on the show. I really enjoyed reconnecting here on air. Before I let you go, I always ask my guests for a book recommendation. Do you have anything for us?

I have two recommendations, actually. These are books that I really love, and they're not really about AI, although you might say they're kind of related. So one is Vision and Brain.

Whenever anyone asks me this question, I always mention this book, which I absolutely adore: it's called Vision and Brain, by James Stone. It's just about the human vision system — such a lovely book about how our brain processes visual information. It gives you technical detail, but it explains it in such a way that you don't need to be a specialist to understand it.

For me, it really opened my eyes to so many different things. I really love that book. And the second book is

The Evolution of Cognition, by David Balls, which also opened my mind to so many different things. It's not really just about humans: it talks about human cognition, but it starts from single-celled organisms, so it's really about biological cognition. I'm interested in AI, but I was so interested in animal cognition — it's just really amazing. I really recommend it.

That is super cool. I'm also fascinated by this — I mean, technically, my PhD is in neuroscience.

How our brains allow us to perceive the world, to think all the thoughts we think, and do the actions we do, is pretty wild. So yes, that's a great recommendation — I may have to check that one out myself, because I hadn't heard of it. Martin, what's the best place for people to follow you after this episode? If my memory serves, you aren't a prolific social media user these days.

LinkedIn, mainly, I think.

I just looked into it years ago — maybe around the time that I asked you to be a guest, four years ago.

Yeah, I didn't use it very much, but now I do. I think LinkedIn got a lot better, and you can actually have some really interesting conversations about AI there now. It didn't used to be like that back then.

back yeah really yeah on muss done a great job of making IT the defect or platform yeah yeah .

That's what happened with me, yes, exactly.

All right. Thanks, Martin, for taking the time — really appreciated it. It's exciting, all the things you have going on over at Evolution AI. And even just seeing the background — for people watching the YouTube version, they get to see Martin in this beautiful, what's called the garden room, of his home, where you can see garden on opposite sides of the room, which is a really beautiful thing. It makes me miss Oxford, which, when it's sunny — which is rare, but you're experiencing it —

right now seems like the most wonderful place on earth. So thanks so much — it was good fun.

Nice. To recap: in today's episode, Dr. Martin Goodson filled us in on ten reasons why data science projects fail, including issues like lack of data readiness, overly complex models, and poor reproducibility.

He talked about how there is a need for more publicly funded, open source AI development to ensure benefits are widely distributed, how practitioners and technical experts need more of a voice in AI policy discussions, and how rigorous scientific methods and healthy skepticism are crucial as AI capabilities advance rapidly. And he provided his insights on how interdisciplinary knowledge spanning fields like stats, computer science, and biology is invaluable for AI development. All right.

As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Martin's social media profiles, as well as my own, at superdatascience.com/833. And next week, if you'd like to connect in real life, I will be in New York on November 12th, conducting interviews at the ScaleUp:AI conference, which is run by the iconic VC firm Insight Partners. It's a slickly run conference for anyone keen to learn and network on the topic of scaling up AI startups.

One of the people I'll be interviewing will be none other than Andrew Ng, one of the most widely known data science leaders — very much looking forward to that. Cool that Martin also interviewed Andrew in the past; I've got a link to that interview in the show notes for you. And right after the conference in New York, I will be flying overnight to Portugal to give a keynote and host a half day of talks at Web Summit.

That runs from November 11th to 14th in Lisbon, Portugal, with over 70,000 people in attendance — one of the biggest tech conferences in the world. It would be cool to see you there. All right. Thanks, of course, to everyone on the Super Data Science podcast team: our podcast manager, Ivana Zibert; our media editor, Mario Pombo; our operations manager, Natalie Ziajski; our researcher, Serg Masís; and our writers, Dr. Zara Karschay and Silvia Ogweng.

And yes, of course, as always, last but not least, our founder, Kirill Eremenko. Thanks to all of them for producing another fun episode for us today. For enabling that super team to create this super podcast for you, we're so grateful to our sponsors. You can support the show by checking out our sponsors' links, which are in the show notes. And if you would like to sponsor an episode, you can get the details on how by heading to jonkrohn.com/podcast.

Otherwise, you can help us out by sharing this episode with people who would like it, by reviewing it on your favorite podcasting app or on YouTube, and by subscribing, of course, if you're not already a subscriber — that goes without saying, come on. But yeah, most importantly, I just hope you keep on listening. I'm so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rockin' it out there, and I'm looking forward to enjoying another round of the Super Data Science podcast with you very soon.