
#267 No More NoSQL? How AI is Changing the Database with Sahir Azam, Chief Product Officer at MongoDB

2024/12/5

DataFramed

People
Richie
Sahir Azam
Topics
Richie: This episode explores how database technology is changing, in particular AI's impact on databases and how to choose the right one.
Sahir Azam: MongoDB has evolved from a NoSQL database into a general-purpose developer database aimed at improving development efficiency. When traditional relational databases were designed, hardware was expensive and developer productivity was not a primary concern. Modern databases like MongoDB, built in an era of cheap hardware and distributed computing, put more emphasis on developer productivity. MongoDB's document model better matches how modern developers think, which makes development more efficient. MongoDB also integrates search, vector, and time series capabilities, simplifying database architecture and lowering operational costs.
Sahir Azam: The spread of cloud data platforms has made enterprises favor managed services, concentrating their resources on application development rather than managing data infrastructure. Simpler databases are also easier for enterprises to adopt, manage, and train staff on.
Sahir Azam: Generative AI will drive more sophisticated software development, increasing demand for databases, and it can process unstructured data, unlocking more value from data. It is already being applied in areas such as car diagnostics and streamlining drug-approval workflows.
Sahir Azam: To take full advantage of generative AI, enterprises need to clean and organize their data and migrate from legacy systems to modern ones. Generative AI can itself lower the cost and risk of those legacy migrations.
Sahir Azam: Modern technology is changing the skills software developers need, for example machine learning and AI skills. Demand for data engineers will also grow to meet the need for processing and organizing data. Data teams and development teams are working ever more closely together.
Sahir Azam: Product design and user experience remain key to software development. Generative AI is just a tool and cannot fully replace them.
Sahir Azam: He is optimistic about the future of data and AI, believing AI will raise overall productivity and create new jobs.
Richie: This episode centered on database technology, especially AI's impact on databases. We discussed the evolution of database technology, from early relational databases to today's diverse range of database types, and how AI is changing databases and the way software is developed.
Sahir Azam: As a representative modern database, MongoDB's core advantage is developer productivity. Its document model fits modern software development patterns, and by consolidating multiple database capabilities it simplifies development workflows and reduces operational costs.
Sahir Azam: The rise of cloud data platforms has changed how databases are consumed. Enterprises prefer managed services, which lets developers focus on building applications rather than managing database infrastructure.
Sahir Azam: Generative AI brings databases both new opportunities and challenges. On one hand, it will push software complexity and increase demand for databases; on the other, it can process unstructured data and unlock more value from it.
Sahir Azam: Use cases for generative AI include car diagnostics and streamlining drug-approval workflows, demonstrating its enormous potential to improve efficiency and reduce costs.
Sahir Azam: Enterprises must get their data in order and migrate it from legacy to modern systems to take full advantage of generative AI.
Sahir Azam: Modern database technology and generative AI are changing the skills software developers and data teams need. Developers need more AI and machine learning skills, while data teams need to focus more on processing and organizing data.
Sahir Azam: Collaboration between data teams and development teams is growing ever closer, which requires stronger communication and coordination within organizations.
Sahir Azam: Although generative AI is advancing rapidly, product design and user experience remain key to software development and require specialized skills and experience.
Sahir Azam: Confident about the future of the data and AI field, believing AI will raise overall productivity and create new jobs.

Deep Dive

Key Insights

Why is NoSQL no longer a dominant term in database discussions?

MongoDB now positions itself as a modern general-purpose database capable of handling a broader range of use cases than traditional NoSQL databases, which are often seen as limited to specific functionalities.

What are the main types of databases developers need to know about today?

The main types include relational databases, document databases, search databases, time series databases, and vector databases, each tailored to specific application needs like IoT, AI, and operational data.

Why has developer productivity become a key focus in database design?

With cheap hardware and distributed computing no longer being the primary constraints, the focus has shifted to making developers more efficient, allowing them to build high-quality software faster and create better user experiences.

How does MongoDB enhance developer productivity?

MongoDB uses a document model that aligns with object-oriented programming, making it easier for developers to reason about data. It also integrates features like search and vector capabilities natively, reducing the need for multiple databases and simplifying development.

What challenges do developers face when integrating multiple databases?

Developers often need to manage complex sprawl with separate databases for search, time series, and AI, leading to data duplication and operational overhead. MongoDB aims to simplify this by integrating these functionalities into a single platform.

How is generative AI impacting the database landscape?

Generative AI is driving the need for more sophisticated software and data infrastructure. It also enables the use of unstructured data like audio and video in real-time applications, which traditional databases couldn't handle effectively.

What are some practical use cases of generative AI in enterprises?

One example is a European automaker using AI to diagnose car issues based on audio patterns, reducing diagnostic time from hours to minutes. Another is a pharmaceutical company using AI to auto-generate clinical study reports in minutes instead of weeks.

Why is modernizing legacy systems important for leveraging generative AI?

Legacy systems are often too rigid and outdated to integrate with modern AI applications. Modernizing these systems allows organizations to unlock the value of their data for AI-driven insights and applications.

How is MongoDB using AI to assist in modernizing legacy systems?

MongoDB is leveraging AI tools to make the process of migrating and modernizing legacy applications less risky and more cost-effective, helping organizations transition to modern data platforms.

How is the role of developers changing with the rise of AI and modern databases?

Developers are increasingly expected to have AI and machine learning skills, as these technologies become integral to software development. This shift is moving AI capabilities from centralized teams to being embedded in every development team.

Chapters
This chapter explores the evolution of databases, focusing on the shift from hardware constraints to developer productivity as a primary concern. It discusses the changing landscape of databases beyond NoSQL and introduces the concept of a modern, general-purpose database.
  • Hardware constraints are no longer the primary concern in database design.
  • The focus has shifted to developer productivity and efficiency.
  • Modern databases need to support a variety of use cases and integrate seamlessly into development workflows.

Transcript


The constraint of hardware being the expensive component is no longer the case. It's really more, how do you architect systems properly and how do you make developers able to be efficient and productive, so that you can, you know, ship software at high quality and build those compelling experiences. Welcome to DataFramed. This is Richie. For a long time, when you wanted to choose a database, it was pretty simple, since you just had a few relational databases to choose from.

Then along came NoSQL and things got a bit more complicated. And now there's a database for every possible use case. Today I'm continuing my quest to try and figure out what a sensible tech stack looks like and get another opinion on how to decide what database to use.

Inevitably, one of the big drivers for change in database technology is AI. So I want to explore what's changing and what the implications are for data professionals and developers. Our guest is Sahir Azam, the Chief Product Officer at MongoDB. Under Sahir's leadership, MongoDB has been transforming from a NoSQL database into a more general developer database. So he's spent a lot of time thinking about developers' needs.

He also serves on the boards of Temporal and Observe, a cloud data observability startup. Sahir joined MongoDB from Sumo Logic, where he managed platform pricing, packaging, and technology partnerships. Before Sumo Logic, he launched VMware's first organically developed SaaS management product and grew their management tools business to $1 billion plus in revenue.

Earlier in his career, Sahir also held technical and sales-focused roles at Dynamic Ops, BMC Software, and BladeLogic. All right, let's learn about databases. Hi, Sahir. Welcome to the show. Hey, Richie. Thanks for having me.

Cool. So with MongoDB, I rather strongly associated it with the NoSQL movement, but I noticed that's gone from the branding. So does that mean NoSQL is not a thing anymore? You know, I think many people in our industry, our ecosystem, still think of us in the NoSQL or non-relational database category. I think from our perspective,

We consider ourselves quite a bit different from some of the other technologies in that space. And so when we describe our technology, we tend to use a broader framing, a more modern general-purpose database, because we think it applies in our customers' environments to a much broader set of use cases than what the typical developer or technologist may think a NoSQL database is capable of.

Okay, so perhaps we can get into more depth on what MongoDB is in a moment. But it seems like for a long time you had SQL databases and NoSQL databases, and now there are lots of different types of databases for pretty much every use case. Can you just give me a quick overview of what are the main types of database people need to know about?

Yeah, I would step back and I think the reason why you're seeing a variety of different database types is fundamentally driven by the fact that software and the experiences that developers are creating on behalf of their customers, whether that be an internal piece of software or something that powers all of our other business or personal lives, it's just more and more advanced and getting more complicated, more sophisticated in the types of experiences that people are building.

The traditional relational SQL database model was invented 40 years ago at a time where the majority of software was back office software, maybe for some accountants or for some back office bookkeeping in an organization. And it was really optimized for not only those types of applications, which are very different than the average application today, but also in a world where they were optimizing back then, hardware and storage were really expensive. Developer time and productivity was sort of an afterthought.

But MongoDB and a lot of these new modern technologies were started in a world in which we have the benefits of cheap hardware and the price is constantly driving, distributed horizontal compute and all of that. So that's not really the constraint anymore. And yet everyone is really thinking about how to make their developers more productive so they can spend more of their time building those delightful experiences I mentioned.

The whole technology landscape and software landscape has changed drastically in those last few decades. And I think various database technologies have evolved to serve those in a much more efficient way. That's interesting that the infrastructure problems, I guess you're saying, have basically been solved. So no matter how much data you've got, there's a database that can handle it. And so you have to focus on developer productivity a lot of the time. So can you talk me through, what do you do to make developers more productive when they're working with databases?

I'm not trying to say by any definition that cost is not a concern for customers, or that scalability isn't something a lot of technologies still have to build and continue to increase. But in general, the constraint of hardware being the expensive component is no longer the case. It's really more, how do you architect systems properly, and how do you make developers able to be efficient and productive, so that you can ship software at high quality and build those compelling experiences?

I think a lot of what you're asking goes back to the founding of MongoDB. Our founders were the CTO and developers at DoubleClick, at the time one of the largest high-performance pieces of software in the world. It ran that dual-sided ad network, eventually got acquired by Google, and powers the majority of Google's revenue, I think, through to this day. They really faced two types of problems, both of which I would characterize as fundamentally about scale.

One, you know, as their application and their platform became larger and larger, serving more ads at a faster and faster performance requirement, it became quite expensive to throw hardware at a traditional relational kind of monolithic architecture with some of the typical technologies then. So that was a problem of cost scaling and performance scaling.

But then there was the more subtle point of scale, which is really around productivity. When you have a small development team, you can build on any stack, leverage any database, and still be relatively quick. But what they observed is that as they went from dozens of developers to hundreds of developers to thousands of developers, they didn't get a similar return of productivity back from those teams. They felt like their teams were getting slower as they scaled.

One of the reasons, certainly not the only one, but one that was significant, was the rigidity of the traditional database model at the time: the heavyweight relational schema, the change management required to alter it. Every change was a big program in its own right. That was one of the factors that led them to feel that developer productivity was going to be really important.

And so when they released MongoDB, they solved for the scale and cost by leveraging distributed systems architecture. MongoDB is a distributed database. You get things like high availability, failover, the ability to scale endlessly, effectively, horizontally, which makes things much more cost efficient.

But then they chose a different data model, rather than relational, to build the database itself. And that's what we call the document model, which in their eyes, and certainly in what we've seen in the 15 years since the founding of the company, is a much more natural way for developers, especially those building in object-oriented programming languages, to reason about their data, and therefore makes them more efficient and more scalable as they build more and more capability over time.

That reminds me of the classic business book The Mythical Man-Month, where you scale the number of people and your productivity doesn't scale linearly with it. So it's interesting how you've got to think about architecting your processes and your talent as well as just the technology.

Absolutely. And I think the idea of a database built with the modern developer's mindset and needs first was a new thing. And that certainly led to the extreme popularity of MongoDB

once it was open sourced. But not only MongoDB: you mentioned there were other kinds of NoSQL or non-relational databases, all with different flavors and caveats. That, I think, showed there was demand in the developer ecosystem for a different way of working with operational data. You mentioned this idea of the document model, so a document database rather than a relational database. What's the difference, and why would you want a document model?

Sure, yeah. In a relational database, if you're modeling, for example, a customer or a product, it's typical that that is represented in not a single row and table, but in a handful of rows and tables that you have to manage the relationships between.

and have a rigid kind of schema around, which makes it harder to make changes quickly over time. And it creates a cognitive burden because when an application developer is reasoning about the business objects in their code, whether that be a customer, a product, whatever it might be,

They don't want to shred it into a bunch of rows and tables and then have to reconstruct it every time they want to access that data pattern. The more elegant way is to persist the objects that are retrieved and persisted by the application all at once. And because hardware is cheaper, it's more cost-effective to store all that information together. We could start a database company in the 2000s and 2010s, versus 40 years ago, with that constraint removed. So that's the linkage to the point I was

bringing up earlier. And that makes it just a lot faster for developers to cognitively reason about what they're building, because the database just feels like it's integrated into their core development workflow and the business objects that they're coding against, as opposed to having to think about the business objects and the application code, and then remember how that's modeled in a schema, write a different language, aka SQL, to interact with that data, and then have to meld those two things together.
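To make the contrast concrete, here is a small sketch in plain Python of the same "customer" modeled both ways. The field names and values are invented for illustration; a real document database such as MongoDB would store the nested form directly, while a relational schema shreds it across tables and rebuilds it with joins on every read.

```python
# The object as one self-contained document, the way a document database stores it.
customer_document = {
    "_id": 1001,
    "name": "Acme Corp",
    "contacts": [  # one-to-many data nested in place
        {"name": "Ada", "email": "ada@acme.example"},
        {"name": "Grace", "email": "grace@acme.example"},
    ],
    "billing": {"plan": "enterprise", "currency": "USD"},
}

# The relational equivalent: the same object shredded across three tables,
# linked by foreign keys and reassembled at read time.
customers_table = [{"id": 1001, "name": "Acme Corp"}]
contacts_table = [
    {"customer_id": 1001, "name": "Ada", "email": "ada@acme.example"},
    {"customer_id": 1001, "name": "Grace", "email": "grace@acme.example"},
]
billing_table = [{"customer_id": 1001, "plan": "enterprise", "currency": "USD"}]

def reassemble(customer_id):
    """Reconstruct the object a relational application must rebuild on every read."""
    base = next(c for c in customers_table if c["id"] == customer_id)
    return {
        "_id": base["id"],
        "name": base["name"],
        "contacts": [
            {"name": c["name"], "email": c["email"]}
            for c in contacts_table
            if c["customer_id"] == customer_id
        ],
        "billing": {
            k: v
            for b in billing_table
            if b["customer_id"] == customer_id
            for k, v in b.items()
            if k != "customer_id"
        },
    }

# The join-and-reassemble dance yields exactly what the document already holds.
assert reassemble(1001) == customer_document
```

The point of the sketch is the cognitive gap: the document mirrors the business object in application code one-for-one, while the relational layout forces the developer to carry the mapping between the two in their head.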

Okay, that certainly seems to make sense. So if you're writing object-oriented code, as I guess most application developers do, you want something in the database to look very similar to object-oriented code rather than have a... Yeah, the access patterns and just the business objects that you're managing in code. And

Certainly, that fundamental insight and building a database with that mindset is what really distinguishes MongoDB. The richness of handling all those objects with all the sophistication we have is what distinguishes MongoDB versus

NoSQL or non-relational players, but also, obviously, the corpus of relational databases, whether they be newer offerings or the more traditional ones we hear about. Okay. So having this document model seems one way in which you're helping developers to be more productive. Are there any other ways that you think about making developers more productive and having a database for developers?

Yeah, absolutely. I think there's two macro points I'd add beyond the core data model itself. One that's closely linked to it is if you're building an application in 2024, it has all sorts of different requirements. It's very rare that you're going to only have a SQL relational database powering that application.

we typically see that you also need a separate search database. If you're doing an IoT system, you need a separate time series database to be performant for that type of information. Or these days, with AI-driven applications, you see a vector database get introduced for certain use cases.

So what we find is that, combined with the simplicity that the cloud providers have made it easy for any developer to spin up a new database, has actually led to a lot of complex sprawl in the average application's database architecture. Because it's all these different things creating duplication and serving these narrow needs, as opposed to in the past, it was largely one database could serve your entire application's needs.

We spent a lot of time at MongoDB saying, okay, we have this really rich data model, the document model,

but we're still seeing all these other pockets of things having to be bolted on around a traditional relational database. How can we simplify that? It takes its form in two ways. One, how do we make sure our database can be an effective and superior alternative to relational databases for the use cases, systems of record in particular, that relational databases are suited for? So we're not just another bolt-on; we're actually a fundamental replacement. That led to years of R&D adding

schema governance, enterprise security, transaction guarantees, all the things that people typically associate with the relational camp but not with the NoSQL camp. And we brought that forward into MongoDB so that we could serve the breadth of use cases that relational databases have served over the decades.

Then we said, "Okay, what are the other types of areas where we see most commonly developers have to bolt on some more niche solution?" We saw search as a very critical one. Four and a half, five years ago, we integrated search into the document model, into the database, so it feels like it's just native and not a separate system. We manage all that synchronization, so you don't have to worry about it operationally. For AI, we added vector capabilities natively into the document model in a really elegant way.
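As a rough illustration of what "vector capabilities natively in the database" looks like in practice, here is the general shape of a vector query expressed as an aggregation stage, in the style of MongoDB Atlas Vector Search's `$vectorSearch` stage. The index name, field path, and query vector below are placeholders, and the exact options should be checked against the Atlas documentation rather than taken from this sketch.

```python
# A query embedding would normally come from an embedding model (audio, text, etc.).
query_embedding = [0.12, -0.07, 0.33, 0.91]

# Rough shape of an Atlas-style vector search aggregation stage.
# All names here are hypothetical placeholders.
vector_stage = {
    "$vectorSearch": {
        "index": "my_vector_index",   # hypothetical index name
        "path": "embedding",          # document field holding the stored vectors
        "queryVector": query_embedding,
        "numCandidates": 100,         # breadth of the approximate search
        "limit": 5,                   # top-k results to return
    }
}

# In a real application this stage would be passed to the driver, e.g.:
#   results = collection.aggregate([vector_stage])
# Here we only check that the stage is well-formed.
assert set(vector_stage["$vectorSearch"]) == {
    "index", "path", "queryVector", "numCandidates", "limit"
}
```

The appeal of this design is that the vector query is just another pipeline stage against the same documents, rather than a round trip to a separate vector database that has to be kept in sync.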

Time series data came from our customers saying, hey, I don't want to have to stand up a separate time series database. I use MongoDB for my core transactional data. Can you make it much more performant for time series so I don't need to bring in another tool? And so we've simplified that.

We call this the idea of a developer data platform, but it's really just about looking from the modern application's needs backwards and saying, how do we deliver that in a way that's elegant and seamless for a developer? So they spend less of their time bolting together four or five different technologies, and avoid the operational cost and burden of managing that over time.

That's such a common problem. We talk about this a lot on the show: the idea of data silos, where your data is just stuck in different places. And certainly if you've got four or five different databases, then by definition you've got data silos. I'm curious, though: a lot of these databases, like time series databases, tend to be quite specialized. How do you match their performance if you're trying to do everything in the same database? Yeah, and I want to be clear, I don't think the days of an organization having one database as a standard for everything are ever going to happen. There's a lot of preference that developers express in choosing the right technology for the job, and

Our goal is not to be the one database to rule them all, but to be general purpose enough that for 70 to 80% of the common operational use cases, especially in large organizations, developers shouldn't have to go reach for a highly specialized solution.

Now, there will be certain edge cases or use cases that are so deep in time series or so deep on graph traversals and graph capabilities that a general-purpose platform like MongoDB may not be the right choice. But we think that the average of what they get used for is actually not that complicated.

And so that just comes down to us optimizing the core database engine, the storage, the indexing for these different use cases. And we pick the ones that we think are best suited for the document model that we think are pervasive enough that it warrants investment from us to go solve that problem. And that's how we come to those decisions.

But by all means, it's an effort in performance tuning, optimization, indexing to make sure that we can capture at least 50, 60, 70 plus percent of the workloads that ever need those features, but in a performant way, in a much more simple, integrated fashion than having to always reach for the specialist solutions.

Okay, so just cover the main use cases all at once. That seems sensible. I'm curious whether changes in database platforms, or data platforms more generally, have changed how software development is done and how data teams work as well. What's the impact on those other teams? I mean, one big shift over the last decade certainly is just the level of comfort that developers, and especially large enterprises, now have with cloud data platforms.

Even a decade ago, sure, fast-moving, less risk-averse organizations were like, okay, I don't have the time to manage my database infrastructure, I'm just going to go use a service from AWS or whomever else. Now it's almost the default in many organizations. Yes, there are workloads on-premises, some of which will always stay there. But when a workload is migrated or built in the cloud, the entire industry has almost leaned into the fact that managed services, cloud services, are the way to consume this, because it's a much more effective experience and your dollars are better spent building applications than managing data infrastructure. So I think that big shift started initially with the hyperscalers, but then look at MongoDB or Snowflake or Databricks and the other parts of the database market. There are now a handful of really

mission-critical, at-scale cloud data platforms that people trust. And I don't think that was the case a decade ago. So I think that's a big step change.

And then this idea of simplification, of not needing five niche databases that you have to bolt together, connect the dots, and deal with data duplication, frankly makes it easier for organizations to adopt. A lot of our larger customers, the large enterprises, are saying, you know, I want two or three standard database offerings, I don't want 25 niche technologies. So how do we govern that? How do we get skills reinforcement?

A lot of times, it's not whether the tech can do something, it's can they train all of their developers on a set of skills that then are repeatable and reusable across many jobs to be done, so to speak, in the organization. These are the real concerns customers face day in, day out in the real world. And the idea of having

A vendor they can trust that can solve a lot of problems and can develop this sort of skills inertia in their organization is definitely a benefit of kind of the simplification of these things into a unified and elegant way to build.

Okay, I definitely like the idea of the simple tech stack. It makes it easier on procurement, easier on governance, and easier on upskilling as well. Okay. Exactly. So I don't think we could get away without talking about generative AI. Since it's working its way into everything, I'm sure it must be changing databases in some way. Can you talk me through what those changes are? At a macro level, the thing that's exciting about generative AI is, I think, that there's going to be more

sophisticated software being created in the world, right? Because one of the most powerful use cases for generative AI is code assistance, or now we're talking about agents that can do more sophisticated software development tasks. And I think that's only going to increase as the tools get better and the models improve with each generation. I think the idea of creating software is going to become easier and easier.

Whether that's a 10x developer now being a 100x developer, or, for a simpler application, somebody less technical being able to generate some software themselves. Both of those trends are happening already. And even though I'd argue it's still early days, it's a very clear line to see how that just gets better and better. And so more applications means there's going to be more data necessary to serve those applications. So we absolutely are excited by this as the next big

change in the software industry around application development. That's the macro framing. I think in terms of data infrastructure or data itself, one of the things that's interesting is that the majority of the information we interact with in the mobile and web applications we use in our personal or business lives is structured or semi-structured data.

Yet 70% plus of the world's information, I think I read from one of the analysts, is truly unstructured: audio, free-form text, video content, and the like. That type of information, apart from maybe the most basic uses, has never really operationally powered applications in the way that generative AI and these models will allow it to.

Because now you can start to use GenAI models to run similarity search and infer meaning from all of this unstructured data, making it usable for real-time applications in a way that just wasn't really possible. So that unlocks a whole corpus of value and knowledge of human information

that this software can now leverage and build on top of, which really wasn't possible except for displaying images or streaming video, those types of use cases. At a macro level, I think that's a very powerful concept. I was looking at a presentation from,

actually, I think, one of our investors. And I still think it's early days. Just like when the first versions of the iPhone came out and the apps were pretty simplistic, you know, the flashlight app and things of that nature, the idea of what's possible in applications with generative AI is still in its infancy. As models get faster and more accurate and the cost comes down, we're going to see new business models and experiences be created

that we just can't even conceptualize today, just as we couldn't necessarily foresee an Uber or an Airbnb or the other business models that came up in the mobile era. I think

More powerful generative AI will create new types of software and application interfaces that we haven't even thought of yet. So I think it's early days, but I'm really excited because I think it's that kind of primitive that any type of application in some way, shape, or form can benefit from over the long term. And I think it'll create experiences that we just haven't seen before.

Yeah, definitely exciting times. And so it seems like there's a mix there. So some of it's about helping developers be more productive. So you mentioned the idea of AI code assistance, and some of it's about things that are going to impact the end users directly. So for example, you mentioned like, well, I guess all those new applications that are going to be created.

You mentioned the idea that all these unstructured data types, so text and images and audio and things like that, are now data in their own right, and you can do cool stuff with them using AI. Can you make it a bit more concrete and just talk me through some of these use cases? I'll give you two examples. These happen to both be large established enterprises. And certainly there are plenty of startups in the AI ecosystem, many of whom

build on MongoDB that are doing awesome stuff, building net new businesses with generative AI in the heart of the application. We're actually seeing a lot of experimentation and now even production stuff happening in traditional enterprises as well. Two of the ones that stuck out, one, we work with a

very large European automaker. And one of the problems they went after with audio models in particular was car diagnosis. So what they did is they created vector representations of the common sounds that their models of cars make when they have certain issues.

And, you know, anyone who's driven a car knows sometimes you just know something's wrong. You hear a certain rattle of various sorts. Well, they were able to basically catalog those and turn them, using a generative AI model, into vector representations. So now when a car shows up in one of their shops, they can record the issue that particular car is having

and use it to do diagnosis really fast against a known set of issues. It's a similarity search, using an AI model built off these audio files. That massively shrinks down the amount of hours and time spent diagnosing a particular problem, especially when you extrapolate it across the thousands of different dealer or third-party sites that they have globally.
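The core of that diagnosis step can be sketched in a few lines: each known fault has an embedding of its characteristic sound, and an incoming recording's embedding is matched to the nearest one by cosine similarity. The vectors and fault names below are made up for illustration; in the real system the embeddings would come from an audio model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy catalog of known faults and the embeddings of their characteristic sounds.
known_faults = {
    "worn brake pad": [0.9, 0.1, 0.0],
    "loose heat shield": [0.1, 0.9, 0.2],
    "failing wheel bearing": [0.0, 0.2, 0.9],
}

def diagnose(recording_embedding):
    """Return the cataloged fault whose sound embedding is most similar."""
    return max(
        known_faults,
        key=lambda fault: cosine_similarity(known_faults[fault], recording_embedding),
    )

# A new recording's embedding lands closest to the "loose heat shield" vector.
print(diagnose([0.05, 0.85, 0.25]))  # -> loose heat shield
```

A production system would run this as an approximate nearest-neighbor query over millions of stored vectors rather than a linear scan, but the matching principle is the same.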

If you can shrink something that takes a skilled technician hours into something a less skilled technician can do in minutes, just by doing the similarity search match, that's millions, even billions, of dollars of savings to that organization. That's the diagnosis part. The next part is then, okay, how do you actually fix the problem? Right now, for most of these issues, you find your diagnostic code and then you go through a manual for the steps and parts required to fix it. Well, they also just

layered a chatbot on top of all their repair manuals. So now the technician can just say, okay, this audio diagnosed this problem, what are the three steps? And it gives a nice summary without having to trawl through the PDFs and the physical manuals to get that answer, closing that loop. A very pragmatic, applicable use case that I think we can all intuitively understand. Another: we worked with, actually I was just on a panel with, the team that built this at Novo Nordisk, a pharma company.

They have, like any pharmaceutical company, a pretty heavy paper process they have to follow to submit new drugs for approval. It's called a clinical study report. And that typically takes...

A lot of people, a lot of time to write and review manually. They ended up training a model on all their submissions, both draft submissions and the ones they submitted to the various regulatory bodies. And now they have a model that auto-generates based on the raw input data from the clinical studies, the first draft of that CSR, which used to take weeks. And now it takes 15 minutes to generate a reasonable quality first draft.

Now, it's not like they're submitting that directly. They still need to review it and pass it around. Obviously, the stakes are high for these types of things, so they still have a lot of manual review in the process. But it's shaved weeks of work down to minutes to get a much higher quality initial draft, compared to having that done manually and training people to get up to speed on a particular domain.

So these are two kind of relatively recent use cases where we were very lucky to be part of ideation and build out and proving out these proof of concepts. But I'm sure we're going to see even more and more over the coming year or two. Some very cool examples there. I'd say the first one about the car making noises. I drive a 20-year-old car, so it has a lot of squeaks and funny noises. So yeah, that sounded incredibly useful. But also the other business use cases. Yeah, like just from simple like working with documents to...

more sophisticated examples. It feels like there are lots of opportunities just to make use of this new technology. What's interesting about Gen AI is, you know, it's non-deterministic and, like its name says, it helps you generate knowledge and information in a way that just wasn't really possible before with classic software. And so the types of applications, I think, will really go after services industries that were very human-intensive and that software really wasn't ever a great fit for. So I think it's less about

disrupting existing software, which I think some of that will happen, and more about bringing software to industries that it just was never really a great fit for before. So I think a lot of businesses now think, well, how do we make use of all these other data types and file types that we've got lying around?

Does that require a change to your fundamental data infrastructure to take advantage of this? What do you need to do? Yeah, there's a lot of project work happening in the industry and certainly the consulting world is benefiting from some of this right now in terms of getting your data house in order.

So how do you properly catalog, tag, and label data so it can be organized and leveraged? So I think that's one area of work. I think for most organizations, especially large ones, the data is really siloed, as you mentioned earlier. It's hard to organize or understand. So there's a whole lot of work I think organizations are undergoing to try to get their hands around this, because they recognize it's going to be valuable if they're going to benefit from AI in the future. The other is just that a lot of the systems of record that house this

critical information for a business are locked up in 20, 30-year-old legacy systems. In the past, it was sort of like, okay, we can deal with just the maintenance of this because it's too expensive to actually move off these old database systems. But now, I think it's pretty clear...

no one's going to add generative AI capabilities to a 30-year-old application on a 30-year-old database. And if that data is the information that's going to differentiate your business by powering AI models, integrating into foundational models, then there's like a pull now to get it into a much more modern posture architecturally for the application, or obviously from a database perspective to a more modern technology standard. Obviously, MongoDB's

a beneficiary of that, but we're also leveraging AI actually as a tool to make the process of migrating these old applications and modernizing them to something new

much less risky and more cost-effective than ever before. It's kind of a circular thing here. It's interesting. Yeah, I can certainly imagine how if you've got 30-year-old data on some legacy system, then it's going to be very difficult to hook that up to some sort of generative AI application. But interestingly, this is one of the issues where DataFramed guests seem to be

completely divided. So some people are very much like, you must store all your data for all time, whatever it is, everything needs to be archived. And others say, well, just keep the data that you care about. So is that 30-year-old data really going to make a difference to your generative AI applications?

It may not even be the age of the data. When I'm referencing this, it's not necessarily that the data itself is 30 years old. Now, maybe there's some value in that data. What models are great at is finding relationships between information that humans wouldn't necessarily be able to understand, whether it's classical machine learning or even training generative AI models. So I suspect there's some value in some of those corpuses of data that we don't even understand yet, but the models will be able to infer.

But that's not even what I'm referring to. A lot of these applications are serving the business and customers today. They just happen to be 30-year-old software systems. So now how is this 30-year-old software system, this old application, this old database, suddenly going to be something that's flexible and performant enough and scalable enough to handle a generative AI application built on that system of record, that old set of data?

even if the information is only one week old, you know, because it was generated by the app last week. And I think that's what we're seeing as more commonly the driver. It's just an old stack. People recognize that you're not going to use old technology to solve a new problem, which is generative AI applications. And so they need to get all that information in these applications modernized to a new stack, and then go from there on applying and adding AI capabilities. Yeah, that certainly makes sense. And I was trying to think of some examples of 30-year-old software just for...

context. Oh, Windows 95 is going to be turning 30 next year. That's desktop software, but think about the server side. A bank had to build a custom fraud detection system 15 years ago, or in insurance, you have a quoting system. We're working on modernizing, I think, a 20-year-old quoting system for insurance quotes at a large insurer right now. You know, and

It was built by developers who are no longer at the organization. Nobody understands the code base. There isn't necessarily testing. But now suddenly this system houses critical data that they know will be powerful with Gen AI. They'd use it to fine-tune or, you know, augment a foundational model. They're not going to do that with their 25-year-old stack. So now they're modernizing that whole application, moving its data forward so that they can get it into a posture where they can move

forward for the next decade off the backs of that. Okay, yeah, I can certainly see how once you get into software that's embedded in machinery or software that's like in some mainframe, some large organization, it does have to last a lot longer. I guess this is probably a trillion dollar question, but how do you go about making sure that you don't end up with that

obsolete technology stack that is 30 years old, and then you're stuck with stuff that doesn't work. Yeah, I think every organization, if they could snap their fingers, if it was no cost and no risk, they would, of course, modernize things. But the reality is that for many applications, it's just too expensive and it's not worth it versus other investments they could be making. And that's how these applications age. And there's a little bit of, if it ain't broke, don't fix it, sometimes as well.

But now, because of generative AI, because of increasing push to move to the cloud, in some cases, the cost pressure or regulatory pressure now to get off of these old technologies, there are enough reasons to force the issue. And ironically, Gen AI is a tool to make those modernizations happen at lower cost and lower risk in its own right.

And so it's kind of like a moment where I think organizations are much more willing and or forced to modernize systems that they probably would have left alone otherwise. And so I think AI is a key driver of that that we're seeing in our customer base.

Then for us as a technology provider ourselves, I think it's really around making sure we continue to keep an innovative culture. We were really quick to market around adding AI capabilities to our platform, to our developer tools, our vector capabilities, etc., but also a whole lot of integrations. There's an emerging whole ecosystem of technologies that people are using, the model providers themselves, the inference platforms, a new set of developer libraries, eval tools that

we've had to integrate with to make sure that MongoDB works as well for the most modern Gen AI development as it did for cloud-native or mobile and web development, which is kind of where we got our roots. And I think that's about execution. If we were not able to move fast, not stay close to those emerging ecosystems and integrate well and really learn how customers are using those,

supporting them on the journey, then sure, I think any company that can't cross that chasm or make that leap, whatever analogy you want to use, has the potential of being stuck in a bad position. I do like that it works in both directions. So generative AI is increasing the demand for more modern data stacks, and it's giving you the tools to transition to that modern data stack as well. Yeah, absolutely. I'd also like to talk a bit about roles. So how is...

this sort of modern database technology, the modern data developer platform, how is that changing both data roles and software development roles? Yeah, absolutely. I think in general, over the last 20 years or so, database decisions have gotten more democratized. Meaning it used to be a CIO signs a large contract with one of your favorite large enterprise software companies, I'm sure you can name, and they would kind of mandate that this would have to be the standard for

many applications, if not all applications, except by exception. That would be kind of the traditional '90s, early-2000s way of buying software. But I think because developers are so...

in terms of there's more demand for software than there is supply, so to speak. And we'll see how Gen AI changes that. But it's made the developer preference for tools that they love working with, that make them productive and easy to experiment with, really be much more powerful. In fact, MongoDB in many organizations starts because some

developers downloaded our open source version or signed up for our cloud service because they like MongoDB versus a more traditional option that's available, they have success with it. It starts to grow organically in those organizations. And eventually we get a more strategic sort of relationship with those customers. We have executive level buy-in and the bottoms up developer adoption. But that's a big shift just generally in terms of

the preference of technology stacks, at least for operational databases, moving more to being a user preference as opposed to a buyer preference. And so I think that's really powerful. We've been the beneficiary of it. We love that. We've invested a whole lot of work in community and DevRel, and even our strategy with open source, to try to drive that and be really strong on that bottoms-up adoption. I think the other thing that's happening, though, more related to AI and machine learning generally, is

For a long time in many organizations, data science and machine learning was a small team in a centralized fashion that was kind of like a service center for when you needed a model to solve a problem for some business user or maybe there's some software use case. They build the machine learning model and then it gets thrown over the wall and maybe integrated or implemented by a core software development team.

One of the things that I think Gen AI is changing is that there's not typically a centralized Gen AI development group. There may be some standards teams or a team that has best practices, but we're seeing this really shift left to be a standard part of almost every software development team. Even today, the developers we see coming out of college or university have AI and ML skills that they're learning as part of their core CS curriculum.

So this idea that AI is moving from a highly specialized, centralized skill set and role in the organization to something that's just more brass tacks, where you're going to have either a level of basic AI skills in every development team, or every development team will have an AI or ML developer embedded as part of that core group, across every part of the enterprise, is definitely a trend that we see happening.

That's kind of cool that you have these sort of bottom-up approaches where people just adopt a tool because they like it and then that grows rather than just being enforced on a whole organization. So we've mentioned data teams, we've mentioned developer teams quite a lot. I'm wondering whether these are coming together. Certainly when you've got applications involving data, it seems like the two roles are overlapping a bit more than they used to.

Yeah, and whether that overlap means that there are organizational ways for those data teams to work more closely day-to-day with their development counterparts, that's certainly happening in some organizations. Or the idea of leveraging data to power new modern application experiences with AI is just a core part of the fundamental developer skill set of the future. I think that over time, we'll see more of the latter, but I think organizationally, those things are being pulled together more so.

every day, because you need to prep your data to fine-tune an AI model or to create a RAG workflow to integrate it with a public foundational model. You're going to need the data team and the governance team to be part of that. But then you need developers to obviously code that up and integrate it in, and product managers and designers to think about the actual experience for the end user of the application. And so that forces that integration, or tight collaboration.
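For readers who haven't seen one, the RAG workflow mentioned here can be sketched roughly in Python. As a simplification, this toy retriever ranks chunks by word overlap, where a real system would query a vector index over embeddings, and the sample policy documents are invented for illustration:

```python
import string

def tokenize(text):
    """Lowercase, strip punctuation, and split into a set of words."""
    return set(text.lower().translate(str.maketrans("", "", string.punctuation)).split())

def retrieve(query, chunks, k=2):
    """Rank stored chunks by naive word overlap with the query.
    A production system would use embedding similarity instead."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    """Assemble the augmented prompt that would be sent to a foundation model."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Policy quotes are valid for 30 days after issue.",
    "Claims must be filed within 90 days of the incident.",
    "Our office is closed on public holidays.",
]
print(build_prompt("How many days is a quote valid for?", docs))
```

The generate step, sending this prompt to a foundation model, is omitted here; the retrieve-augment-generate structure of the workflow is the point.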

Okay, yeah, lots of teams mentioned there. It does seem like creating applications that involve data or AI really is a team sport, with a lot of the organization involved. Yeah, I'm a skeptic of this idea that all you're going to need is models and suddenly, you know, you don't need great product craft, to use a term loosely, meaning understanding the core pain and needs of your end users, designing delightful software,

whether that's an audio interface or whether it's a classic visual interface that we think of today. There's a lot of craft that goes into that. And then Gen AI and these models are a tool to solve those problems. But I don't think that goes away. The ratios might be different. This tool is very powerful, so it may change user experiences in fundamental ways.

But I think people underestimate how much true product craft is still there. And that requires different skill sets. And even the most successful, flashy Gen AI startups that we all can read about or many of whom we spend time with at MongoDB, the amount of craft that goes into creating a great product, even if it's based on Gen AI fundamentals, is very high. That's what still drives their success.

Absolutely. Yeah. It does seem like there's quite a big difference between something that's just okay and thrown together and something that's been very carefully designed. If I see another raw chatbot, I mean, come on, how much, like I'm not saying there's zero value in that, but like, it's not enough. You need a real crafted experience and you need to learn how to tie models together to get the best outcome for the use case you're trying to build. And there's a lot of complexity in that problem.

Do you think that the changes in technology, so basically more modern databases, generative AI, things like that, do you think they're changing the skills that data people need? I think it's changing the skills that the average developer needs. I'm not certain it's changing the core needs of the data teams because I still think there's a lot of data engineering that still needs to happen to prep all this information and get it organized in a way that it's useful for these workflows with gen AI, I think.

Python is getting more important, not less. And that's always been the lingua franca for advanced data teams in a lot of ways. There's a whole plethora of new tools and technologies that are always emerging. I was just talking to a startup this morning that serves data engineers and ML engineers, for example. So I'm not saying that technology isn't going to change, but I feel like it's more that the skill set in core development needs to develop, versus the other side of it.

That's interesting that you think it's affecting developers more than data teams. I think the average application developer will need to become more sophisticated with leveraging unstructured data, AI models, machine learning generally.

Yes, they will lean on these centralized data teams, obviously, to help. But for organizations to move really fast, they're going to need to develop that skill more pervasively in the organization. Okay, so developers need machine learning skills, some AI skills. And then, I guess, on the data side, it's like, well, yeah, data engineering seems to be becoming increasingly important.

I think they're going to be more under pressure to serve those needs and be that specialization of the experts in managing all of that data and making it useful to those teams. So have you seen any success stories where organizations have just leaned in on these new technologies to build something cool and they've had a success?

Yeah, I mean, there are dozens of examples. I gave you a couple large company examples, but every week we pretty maniacally look at all the new user signups of our AI products. And there are, frankly, startups in every industry vertical in almost every geography of the world, whether it's Southeast Asia, Africa, the Bay Area, London, all parts of Europe. So I think

There is a lot of innovation happening. Now, we're in one of those hype cycle kind of moments where there's a whole new wave of a platform technology. I'm sure many of those ideas will fail. But like all things, from those will blossom amazing new companies that, if we're sitting here five or ten years from now, I think will be the next big tech brands that we think of. We saw that with the shift to cloud and mobile. I think this is going to be a similar, if not bigger, shift in terms of new types of...

organizations created. I think we're just lucky at MongoDB to get kind of a view of that across a global scale because we see 50,000 developers every week trying our products, playing with things, and we do our best to make sure that we're trying to understand the innovation that's happening, not just in the large organizations, but globally with new ecosystems that are sprouting up.

I love that it really is global, and it's not just these big organizations that are getting in on it, it's everyone. All right, to wrap up then, what are you most excited about in the world of data and AI? You know, certainly there's a lot of concern around how AI will change various job roles in our industry, or more broadly in the world economy. And I certainly think that presents the risk of being disruptive, and it really

will change things fundamentally. But I also believe at the same time that, at an aggregate level, we're going to be more productive overall as a species, so to speak, and it will ultimately lift people up as this productivity and intelligence gets smarter and more capable. And I think humans overall always migrate to the higher-order problems that we can uniquely solve. So I'm more in the optimist camp than the pessimist camp, but with eyes wide open that there will be some

dislocation and disruption along the way. But I think, you know, it's early days of an exciting time. This is a podcast aimed at technologists, and I think it's an exciting time for all of us, no matter what level of experience, to learn something new, to really lean into all this change and leverage it as a way to drive personal growth as well.

Absolutely. Lots of very cool things coming in the pipeline over the next few years, I hope. So yeah. Yeah. Exciting times. I agree. If I could predict exactly what the next new fancy thing was, then, you know, I'd probably be out there investing in it somewhere. But I think there are more unknowns than knowns. But I do know the unknowns are probably going to be really amazing in the long term. Nice. All right. Thank you so much for your time, Sahir. Thank you, Richie. I really appreciate it.