
#480: Ahoy, Narwhals are bridging the data science APIs

2024/10/9

Talk Python To Me

People
Marco Gorelli
Michael Kennedy
Topics
Michael Kennedy: The data science world has many DataFrame libraries whose APIs are similar but not identical, which makes writing programs compatible with more than one of them a challenge. The Narwhals library aims to solve this API incompatibility, making it easier to write code that works across several DataFrame libraries. Using Narwhals can simplify library development and reduce maintenance work even if you only support Pandas. The effort of integrating a library with Narwhals depends on how complex that library's DataFrame operations are. Narwhals has few dependencies and a small footprint, making it well suited to deployment in resource-constrained environments. For Pandas indexes, Narwhals supports both index alignment and ignoring the index, to meet different users' needs. Narwhals is also suitable for building applications that don't depend on a specific DataFrame library.

Marco Gorelli: He introduced his background: he works at Quansight Labs, was originally a Pandas maintainer, later shifted toward Polars and other projects, and developed the Narwhals library. Narwhals is a compatibility layer for bridging different DataFrame libraries and does no computation itself. It aims to help library maintainers support multiple DataFrame libraries as inputs at a low maintenance cost. In Pandas v2, PyArrow is optional; it may become a required dependency in a future version. Narwhals supports cuDF, the CUDA DataFrame library that accelerates Pandas on GPUs. Modin tries to be a drop-in replacement for Pandas, and Narwhals supports it well. Narwhals offers partial support for Dask, focused on preserving performance. Pandas and Polars differ in how they compute and in API design: Pandas is built on NumPy, while Polars is built in Rust, with different design philosophies and APIs. Polars is stricter than Pandas and interacts with DataFrames mainly through expressions; its lazy execution and query optimizer make it faster than Pandas in some cases, and faster than cuDF for certain complex operations. Polars' Rust plugin system makes it easy to extend. Mapping the Polars API onto Pandas is simpler than the reverse. Narwhals differs from Ibis in target audience: Ibis is a full DataFrame library, while Narwhals is a tool designed for library builders. Narwhals has 100% test coverage, which has helped uncover bugs in other libraries. Static typing improves code readability and maintainability. Narwhals adopts a perfect-backward-compatibility policy, similar to Rust's edition mechanism, guaranteed through the narwhals.stable.v1 module. Narwhals is pure Python, though its WASM compatibility has some limits. It provides the narwhals.from_native function for converting between DataFrame libraries with very low overhead, uses type hints and protocols to handle the different libraries, offers the nw.narwhalify decorator to simplify conversions, and supports lazy execution, preserving the laziness of the input. Writing fully DataFrame-agnostic code is easier for new libraries; existing ones must also weigh backward compatibility. Narwhals offers different levels of library support, including full support and interchange support. Its low overhead comes from a lean, community-driven design and from strictly applying the definition of an expression. The roadmap prioritizes supporting more libraries and improving the documentation and tutorials. Although Narwhals' download numbers are large, that is mainly because Altair uses it as a dependency. Community contributors actively improve and extend the library. Narwhals may support DuckDB in the future; using DuckDB through the Polars API may be more convenient than using DuckDB directly.

Deep Dive

Key Insights

Why did Marco Gorelli create Narwhals?

Marco created Narwhals to solve the problem of writing libraries that can work with multiple DataFrame frameworks. He was frustrated with the lack of support for Polars in libraries, often resulting in conversions to Pandas or PyArrow, even for simple operations.

What is the primary goal of Narwhals?

The primary goal of Narwhals is to provide a compatibility layer for tool builders, allowing them to support multiple DataFrame libraries with minimal overhead and maintenance.

How does Narwhals handle different DataFrame libraries?

Narwhals acts as a wrapper around different DataFrame APIs, focusing on the Polars API. It allows libraries to accept various DataFrame inputs without needing to handle the differences in APIs themselves.
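The wrapper idea can be sketched in plain Python. This is a toy illustration with invented class names, not how Narwhals is actually implemented:

```python
# Toy sketch of a compatibility layer: one thin wrapper class exposes a
# single API and dispatches to whichever "native" object the user passed in.
# (Illustrative only -- real Narwhals wraps pandas, Polars, cuDF, etc.)

class FakePandasFrame:          # stand-in for a pandas-style object
    def __init__(self, data):
        self.data = data        # dict of column name -> list of values
    def __getitem__(self, cols):
        return FakePandasFrame({c: self.data[c] for c in cols})

class FakePolarsFrame:          # stand-in for a Polars-style object
    def __init__(self, data):
        self.data = data
    def select(self, *cols):
        return FakePolarsFrame({c: self.data[c] for c in cols})

class CompatFrame:
    """Uniform wrapper: `select` works regardless of the native backend."""
    def __init__(self, native):
        self.native = native
    def select(self, *cols):
        if isinstance(self.native, FakePolarsFrame):
            return CompatFrame(self.native.select(*cols))
        return CompatFrame(self.native[list(cols)])
    def to_native(self):
        return self.native

df1 = CompatFrame(FakePandasFrame({"a": [1, 2], "b": [3, 4]})).select("a")
df2 = CompatFrame(FakePolarsFrame({"a": [1, 2], "b": [3, 4]})).select("a")
print(df1.to_native().data)  # {'a': [1, 2]}
print(df2.to_native().data)  # {'a': [1, 2]}
```

In real Narwhals the entry point is `nw.from_native(...)`, which returns a wrapper exposing a Polars-like API, and `to_native()` hands back the user's original library's object.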

What are some of the DataFrame libraries supported by Narwhals?

Narwhals currently supports Pandas, Polars, cuDF (GPU-accelerated Pandas), Modin, and has partial support for Dask. It also has interchange-level support for Ibis and DuckDB.

How does Narwhals compare to Ibis?

Narwhals and Ibis both aim to bridge different DataFrame libraries, but their target audiences differ. Ibis is a full-blown DataFrame library, while Narwhals is designed for tool builders who need to support multiple DataFrame libraries with minimal effort.

What is the difference between eager and lazy execution in DataFrame libraries?

Eager execution, like in Pandas, performs operations immediately, while lazy execution, like in Polars, defers operations until the results are needed. Lazy execution allows for query optimization and can lead to more efficient processing, especially for complex operations.
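The contrast can be sketched in plain Python. This is a toy illustration, not a real DataFrame engine, and all the names are invented:

```python
# Eager vs lazy, in miniature. Eager: every step materializes a result
# immediately. Lazy: record the steps, run them once on .collect().

data = list(range(10))

# Eager: each step produces an intermediate list right away.
step1 = [x * 2 for x in data]
eager_result = [x for x in step1 if x > 10]

class LazyPipeline:
    def __init__(self, source):
        self.source = source
        self.steps = []          # deferred operations, nothing runs yet
    def map(self, fn):
        self.steps.append(("map", fn))
        return self
    def filter(self, pred):
        self.steps.append(("filter", pred))
        return self
    def collect(self):
        out = self.source
        for kind, fn in self.steps:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

lazy_result = (LazyPipeline(data)
               .map(lambda x: x * 2)
               .filter(lambda x: x > 10)
               .collect())
print(eager_result == lazy_result)  # True
```

Because the lazy version sees the whole pipeline before running anything, a real engine like Polars can reorder and fuse the steps before executing them.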

How does Narwhals handle the differences between Pandas and Polars?

Narwhals adopts the Polars API, which is more strict and optimized for lazy execution. It avoids some of the complexities of the Pandas API, such as index alignment and multi-index handling, making it easier to map Polars operations to other libraries.

What is the role of DuckDB in the Narwhals ecosystem?

DuckDB is a powerful embedded analytics database that Narwhals could potentially support in the future. While it currently has interchange-level support, adding full support for DuckDB could be beneficial for analytical operations that don't depend on row order.

How does Narwhals ensure low overhead when working with different DataFrame libraries?

Narwhals ensures low overhead by strictly applying the definition of an expression as a function from a DataFrame to a series. It avoids unnecessary data conversions and index operations, keeping the overhead minimal even for Pandas.

What is the roadmap for Narwhals?

The roadmap for Narwhals includes helping libraries like Formulaic and Shiny/Plotly integrate with Narwhals. There is also interest in expanding support for DuckDB and improving documentation and tutorials.

Transcript


If you work in data science, you definitely know about DataFrame libraries. Pandas is certainly the most popular, but there are others such as cuDF, Modin, Polars, Dask, and more. They're all similar, but definitely not the same APIs, and Polars is quite different. But here's the problem: if you want to write a library that serves users of more than one of these DataFrame frameworks, how do you do that?

Or if you want to leave open the possibility of changing your DataFrame library after the app is built, you've got the same problem. Well, that's what Narwhals solves. We have Marco Gorelli on the show to tell us all about Narwhals. This is Talk Python to Me, episode 480, recorded September 10th, 2024. Are you ready for your host? You're listening to Michael Kennedy on Talk Python to Me. Live from Portland, Oregon, and this segment was made with Python.

Welcome to Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Mastodon, where I'm at mkennedy, and follow the podcast using at TalkPython, both accounts over at fosstodon.org. And keep up with the show and listen to over nine years of episodes at TalkPython.fm.

If you want to be part of our live episodes, you can find the live streams over on YouTube. Subscribe to our YouTube channel over at TalkPython.fm slash YouTube and get notified about upcoming shows. This episode is brought to you by WorkOS. If you're building a B2B SaaS app, at some point your customers will start asking for enterprise features like SAML authentication, SCIM provisioning, audit logs, and fine-grained authorization.

WorkOS helps ship enterprise features on day one without slowing down your core product development. Find out more at talkpython.fm slash workOS.

Marco, welcome to TalkPython to me. Hey, thanks for having me. Hey, it's fantastic to have you here. We talked a little bit on the socials and other places, but, you know, nice to talk to you in person and about some of your projects. Yeah, nice to finally do it. I've been listening to your shows for years, so it's a pleasure to be here. Yeah, that's really cool. It's awesome when people who are listeners for a long time get to come on the show. I love it.

So we're going to talk about narwhals and data science, data frame libraries, and basically coming up with a way to write consistent code against all these different libraries, which I think is an awesome goal, which is why I'm having you on the show, of course. Before we get to all that, as you know, let's hear a little bit about yourself. Sure. So yeah, my name is Marco. I work at a company called Quonsight Labs, which supports

several open source projects and also offers some training and consulting services. I live in Cardiff in Wales and have been at Quansight for about two years now. Originally hired as a Pandas maintainer, but then shifted considerably towards some other projects such as Polars. And then in February of this year, I tried releasing this Narwhals library as a bit of an experiment. It's growing a bit faster than expected and

Yeah, it's a really interesting project. Quansight is quite the place. You know, I didn't really know about y'all before having people on the show from there, but here's my experience. I've reached out to, say, some of the Jupyter folks, whatever, let's have some of the Jupyter people on. Three or four people from Quansight show up and then...

Oh, let's talk about this other project. Another person from Quansight two weeks later, and then you're from Quansight. And none of those connections were like, let me try to find people from Quansight. I think you all are having a pretty big impact in the data science space. That's cool. It is a bit amusing in the internal Slack channel. If you ask a question, does anyone know how to do this? Someone will reply, oh yeah, let me ping this person who's a maintainer of that library. Exactly. I think we know how it works. Let's ask them. Yeah.

Yeah, exactly. Yeah, it's a big world, but also a small world in interesting ways. Yeah. How did you get into programming in the first place? I think the first experience with programming I had was at university. So I studied maths. I think like you as well. Yeah, yeah, yeah. That sounds really... Yeah, keep going. So far, you're telling my story. Yeah, sure. Although my initial encounter with it, I didn't particularly enjoy it, because it

was just having to solve some problems in MATLAB. I did find it kind of satisfying that if you gave it instructions, it did exactly that. But I wouldn't say that I felt naturally talented or anything. I then really took to programming, though, after I started a maths PhD and dropped out because it wasn't really going anywhere.

And once I went deeper into programming, then I realized, okay, actually, I do have some affinity for this topic. I do quite enjoy it. Yeah. I think that's often the case. What was your PhD focus before you dropped out? It was meant to be, well, some applied mathematics, like stochastic partial differential equations. But, you know, in academia, you publish or perish. And I wasn't publishing and didn't really see that changing. So...

I had to make a bit of a pivot. I imagine you made a pretty good choice, just guessing. I mean, I love math, but the options are just so much broader outside of

academia. In hindsight, yeah, I kind of wish that somebody at the time had told me that I could have still had a really interesting and rewarding career outside of academia. And I shouldn't have stressed myself out so much about trying to find a PhD or about having to complete it when I had already started it. The secret, I think, about being a good programmer is it's kind of like grad school anyway. You're constantly studying and learning. You feel like you've

figured something out. It's like, well, that's changed. Now on to the next thing. You're kind of like, well, we figured out pandas. Now we've got Polars. Okay, well, we're going to start over and figure out how to use that well, right? That is true. You need to do a lot of learning, a lot of self-directed learning in particular. It's really stimulating, I must say. It is. It's great if you want that.

If you want to just work nine to five, you don't need to stress about it. Look, I think there's actually options there. We'll get to Narwhals in a second, but I think there are options there. I think if you want to do COBOL, Fortran, some of these older programming languages, so much of the world depends on them, but nobody wants to do them. Yeah.

you could totally own that space and make really good money if you didn't want to learn anything new. But where's the fun in that, right? Yeah. Yeah. All right. We'll never get rid of our legacy systems, but you know, this stuff does power the world. They're there for a reason, right? It's the "that works, I would really like it if you don't touch it, please" kind of code. Yeah. But that's not the way it is with Narwhals.

Let's start with an overview of what Narwhals is and why you created it. And then I want to talk a bit about some of the data science libraries before we get too much deeper. What is Narwhals? Narwhal is a cool whale as far as I know. It's like the unicorn of the sea, basically. What is this library? Yeah.

So it's intended as a compatibility layer between different DataFrame libraries. So Narwhals does not do any computation itself. It's more just a wrapper around different DataFrame APIs. And I like the Polars API, so I figured that I should keep it fairly close to the Polars API and in particular to Polars expressions. As to why Narwhals, so I was just getting frustrated with the fact that there's...

Let's say about a year ago, there were relatively few libraries that supported Polars. And if libraries did support Polars, it was often just done by converting to Pandas or converting to PyArrow. Yet a lot of these libraries, they weren't doing anything that complicated with data frames. A lot of DataFrame-consuming libraries, they don't really want to do that much. They want to select columns. They want to select rows. Maybe they want to do some aggregations. Like they're not doing stuff that's completely wild.

And so trying to design some minimal compatibility layer, I think, is a lot easier than trying to make a full-blown DataFrame API that end users are meant to use. So the idea with Narwhals is this is a tool for tool builders. If library maintainers want to support different DataFrame libraries as inputs with minimal overhead and with minimal maintenance required on their side, this is the problem we're trying to solve. It's a great problem to be solved because...

Maybe you want to have a library that works with an abstract concept of a data frame. But usually I would imagine you have to start out and say, are we going to go and support Polars? Are we going to support Pandas? And the APIs are different, not just the APIs, but the behaviors, for example, the

lazy execution of Polars versus the eager execution of Pandas. And so being able to just write your library so it handles either is probably a big hassle, right? Because you kind of have to have almost two versions of each step, right? Yeah, exactly. Well, I actually heard from a maintainer recently who was saying that he was interested in using Narwhals even while only

having Pandas as a dependency, because the Pandas API changes a bit between versions, and he was getting a bit tired of Pandas API changes and was like, okay, well, if we can just defer all of the version checks and API differences to an abstraction layer, that might even simplify our life, even if we're just interested in supporting Pandas. Yeah, that's cool. Yeah.

Yeah, actually, that's an interesting idea. It's just like, we'll have a compatibility layer just in case. And Pandas went from one to two on major version recently, which is a big deal, and switched to PyArrow and...

all that, right? Yeah, so version 2, it was sometime last year, I think, 2023. So yeah, with PyArrow, I think there were some misconceptions around that. So as we're live on air, let's take the chance to address some PyArrow misconceptions. PyArrow in Pandas currently is optional and

it'll probably stay optional for quite a while. So there is some talk about, in version 3, using PyArrow strings instead of the classical NumPy object strings by default, if people have PyArrow installed. It's not totally decided, it's not totally set in stone, whether PyArrow will be a required dependency. Maybe Pandas version 4 will have it as a required dependency and it'll be the default everywhere. But that's a few years away. Yeah, maybe. Maybe.

Maybe we won't get Python 4 as well. You never know. You know, it's interesting. I think the data science space, more than many other areas, has this ability to run Python in more places, right? For example, there's Pyodide, there's JupyterLite. There's a lot of more constrained environments that it might go in. And I don't know what the story is with PyArrow and Wasm and all these different things. You still get benefits there.

But there's a lot to consider. Yeah, totally. And I think that's one reason why some library maintainers are really drawn to a lightweight compatibility layer like Narwhals. With Narwhals, you don't need any dependencies. You need Narwhals, but that's just a bunch of Python files. Like if you wanted to, you could even just vendor Narwhals. Like it's not that big of a deal, but there's no extra dependencies required. Like Pandas users don't need Polars installed.

And Polars users don't need Pandas installed. So if you're trying to deploy to a constrained environment where package size is limited, if a library has Narwhals as a required dependency, as opposed to any big DataFrame library, then the user can just bring their own data frame, and like this, we're really minimizing the number of installation and dependency-hell issues that people might run into. I think you've covered dependency hell on the show a few times before. Yeah, indeed. I

I think one thing that's interesting for people out there listening, we'll talk about the different libraries that it works with right now. But if you have a library out there and you're listening, there's not too much work to make it Narwhals compatible, the narwhalification of a library, to let it do this interchange, right? So I think I can interpret your question in a couple of ways. So I'll just play them back and let's just see. Let's do it. One is if you're a library that consumes data frames.

So we have some examples there on the readme of who's adopted Narwhals. So like Altair is the most recent, probably the most famous one. I think that's where I heard of it. Some news about Altair and Narwhals together is actually how I heard of Narwhals.

Okay, yeah. So yeah, and how complicated that is really depends on how complicated the data frame operations this library is doing. In the case of Altair, they weren't doing anything that was that crazy. They needed to inspect the data types, select some columns, convert datetimes to strings, get the unique categories out of categoricals. It wasn't that bad. So I think...

Within a few weeks, we were able to do it. Same story with scikit-lego. There's some other libraries that have reached out that have shown interest, where it's going to be a bit of a heavier lift, but it's generally not as bad as I thought it was going to be when I started the project.

The other way that I think I might have interpreted your question is how difficult is it for a new DataFrame library to become Narwhals compatible? And there's a couple of ways that they can go about doing that. The preferred way is if they either write to us or open a pull request, adding their library as a backend in Narwhals. However,

We love open source, but I don't consider myself an open source absolutist. I understand that not everything can be open sourced. And so if somebody has a closed-source solution, we do have an extensibility mechanism within Narwhals such that somebody just needs to implement some dunder methods. And then if they pass the data frame into a library that's been narwhalified, then Narwhals will know how to glue things together, and they'll be able to still support this closed-source solution without it needing to go out into the open. Right.
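The glue mechanism Marco describes can be sketched with plain duck typing. Note that the method name `__compat_dataframe__` below is invented for illustration; it is not Narwhals' actual hook (see the Narwhals docs for the real dunder names):

```python
# Toy sketch of duck-typed extensibility: the glue code only checks for an
# agreed-upon dunder method, so a closed-source object can opt in without
# its code ever being published.

class ClosedSourceFrame:
    """Imagine this lives in a proprietary package."""
    def __init__(self, data):
        self._data = data
    def __compat_dataframe__(self):
        # Return whatever minimal interface the compatibility layer expects.
        return {"columns": list(self._data), "data": self._data}

def narwhalified_tool(df_like):
    """A 'narwhalified' consumer: accepts anything implementing the hook."""
    if not hasattr(df_like, "__compat_dataframe__"):
        raise TypeError("object does not support the compat protocol")
    frame = df_like.__compat_dataframe__()
    return frame["columns"]

cols = narwhalified_tool(ClosedSourceFrame({"a": [1], "b": [2]}))
print(cols)  # ['a', 'b']
```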

It's kind of something like inheriting from a class and implementing some functions, and then it knows, right? Yeah, exactly. Yeah, yeah, cool. So right now it has full API support for cuDF, C-U-D-F. I guess that's

the CUDA DataFrame library? Yeah, I'm not totally sure how we're supposed to pronounce it. I call it cuDF. Yeah, that came out of the RAPIDS team at NVIDIA. It's like an accelerated version of Pandas on GPU. Yeah, that's been quite a fun one. Nice. Yeah, that's pretty wild.

The API is quite similar to Pandas, but it's not exactly the same. So we have to do a bit of working around. Right, right. Because graphics cards are not just regular memory and regular programs. They're weird, right? Yeah, that's part of it. So there's some parts of the Pandas API which they intentionally don't support and...

there's another part of it, which is just that the Pandas API is so extensive that it's a question of resources. Like it's pretty difficult to reimplement 100% of the Pandas API, but Modin does attempt to do that. Modin does bill itself as a drop-in replacement for Pandas. In practice, I think they do have a section in their docs where they do

mention some gotchas, some slight differences, but that's the idea. They've kind of got their own intermediate representation, and they've got their algebra, which they've published a paper about, which they then map onto the Pandas API. A pretty interesting project, and it was a lot easier to support because

the way they mimic the Pandas API is a lot closer. But it's been interesting. Like with Narwhals, we did find a couple of minor bugs in Modin just by running our test suite through the different libraries, which we then reported to them and they fixed very quickly. That's pretty awesome. Yeah, yeah. That's super awesome. So Modin lets you use Ray, Dask, or unidist. One of which I know, two of which I've heard of.

I was going to ask about things like Dask and others, which are sort of themselves extensions of Pandas. But if you support Modin, you're kind of, through one more layer, supporting Dask. Oh, but it's better. We don't have this on the readme yet, but we do have a level of support for Dask. I've not quite put it on the readme yet because we're still kind of...

defining exactly where the boundaries are, but it's going to be some kind of partial, lazy-only layer of support. And it's actually quite a nice way to run Dask. Like when you're running Dask, there are some things which do trigger compute for you. There are some things which may trigger index repartitioning, I think that's what it's called. And in Narwhals, we've just been extremely careful that if you're able to stick to the Narwhals API, then what you're doing is going to be performant.

Awesome. Yeah, that's super cool. So one thing I think worth maybe pointing out here is you talked about Pandas, Pandas one, Pandas two, and it being an extensive API. I mentioned the eager versus lazy computation, but these two libraries are maybe some of the most popular ones, and they're pretty different in their philosophy. So could you just quickly compare and contrast Polars versus Pandas? Yeah.

indirectly Dask and so on. Yeah, sure. So, well, Pandas started a lot earlier, I think in 2008, maybe first released in 2009, and originally really written heavily around NumPy, and

you can see this in the classical Pandas NumPy data types. So the support for missing values is fairly inconsistent across types. So you brought up PyArrow before, so with the PyArrow data types, then we do get consistent missing value handling in Pandas, but for the classical NumPy ones, we don't. Polars started a lot later.

It didn't have a lot of backwards compatibility concerns to have to worry about, so it could make a lot of good decisions up front. It's generally a lot stricter than Pandas. And in particular, there's a lot of strictness around

the kinds of ways it lets you interact with its objects. So in Pandas, the way we interact with data frames is we typically extract series as one-dimensional objects. We then manipulate those series. Maybe we put them back into the original data frame, but we're doing everything one step at a time.

In Polars, the primary way that we interact with data frames is with what you've got there on the screen. pl.col('a'), pl.col('b'), these are called expressions. And an expression, my mental model for it is just a function. It's a function from a data frame to a series. Almost like a generator or something, huh? Yeah, kind of, yeah. So, although I think...

When you say generator, like in Python, a generator, at some point you can consume it. Like you can call next on the generator and it produces a value, but an expression doesn't produce a value. It's like if you've got lambda x: x times two. It doesn't produce a value until you give it an input. And similarly, an expression like pl.col('a', 'b'), by itself, it doesn't do anything. The interpretation is: given some data frame df, I'll return you the columns a and b. So it only produces those columns once you give it some input data frame.
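That mental model can be written out literally in a few lines of Python. This is a toy sketch: the `col` and `double` helpers are invented, and real Polars expressions also carry metadata that the optimizer uses:

```python
# Marco's mental model, taken literally: an expression is a function from a
# DataFrame to a series. Here a "DataFrame" is just a dict of lists.

def col(name):
    """Build an expression: nothing runs until a frame is supplied."""
    return lambda df: df[name]

def double(expr):
    """Compose a new expression on top of an existing one."""
    return lambda df: [x * 2 for x in expr(df)]

expr = double(col("a"))          # no data touched yet -- it's just a function
df = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(expr(df))  # [2, 4, 6]
```

Because `expr` is only a function, an engine holding a whole collection of such expressions can inspect, deduplicate, and reorder them before ever touching data, which is exactly the opening lazy execution needs.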

And functions, just by their very definition,

are lazy, kind of, like you don't need to evaluate them straight away. And so Polars can take a look at all of the things you want to do. It can recognize some optimization patterns. It can recognize that maybe between some of your expressions there are some parts that are repeated, and so instead of having to recompute the same thing multiple times, it can just compute it once and then reuse that between the different expressions. And yeah, that's one of the big capabilities

of Polars is that it has kind of a query engine optimizer in there, whereas Pandas, because it's not lazy, just does one thing, then the next, then the next. But maybe if you switch the order, like first filter and then compute versus compute and then filter, you might get a way better

outcome, right? That's a massive one, yeah. So when I was doing some benchmarking, we brought up cuDF earlier, so that's the GPU-accelerated version of Pandas. And that is super fast if you're just doing single operations one at a time in a given order. However, there are some cases

where maybe you're having to join together multiple data frames, and then you're only selecting certain rows. At that point, it's actually faster to just do it on a CPU using a lazy library like Polars, because Polars can do the query optimization. It can figure out that it needs to do the filter and only keep certain rows before doing five gigantic joins, whereas cuDF, it's super fast on GPU, but it is all eagerly executed.

It did way more work, but it did it really fast, so it was about the same in the end. Yeah, but now in Polars, there's going to be GPU support. Oh, is there? Okay. And it's going to be query-optimized GPU support. I don't know if the world is ready for this level of speed. Yeah, that's going to be interesting.
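The filter-before-join win Marco describes can be illustrated with a toy example in plain Python. The data and helpers are invented, and this is not a real query optimizer; it just shows that both orders give the same answer while one touches far fewer rows:

```python
# Join-then-filter vs filter-then-join: same result, very different work.

left = [{"id": i, "x": i} for i in range(1000)]
right = [{"id": i, "y": i} for i in range(1000)]
keep = lambda row: row["id"] < 10   # the filter a query planner could push down

def join(a, b):
    """Inner join on 'id' using a hash index over the right side."""
    index = {row["id"]: row for row in b}
    return [dict(ra, **index[ra["id"]]) for ra in a if ra["id"] in index]

# Naive (eager) order: join all 1000x1000-keyed rows, then filter.
naive = [r for r in join(left, right) if keep(r)]

# Optimized order: filter both sides down to 10 rows each, then join.
optimized = join([r for r in left if keep(r)],
                 [r for r in right if keep(r)])

print(naive == optimized)  # True -- but the second path joined 10 rows, not 1000
```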

This portion of Talk Python is brought to you by WorkOS. If you're building a B2B SaaS app, at some point your customers will start asking for enterprise features like SAML authentication, SCIM provisioning, audit logs, and fine-grained authorization. That's where WorkOS comes in, with easy-to-use APIs that'll help you ship enterprise features on day one without slowing down your core product development.

Today, some of the fastest growing startups in the world are powered by WorkOS, including ones you probably know like Perplexity, Vercel, and Webflow. WorkOS also provides a generous free tier of up to 1 million monthly active users for AuthKit, making it the perfect authentication layer for growing companies. It comes standard with useful features like RBAC, MFA, and bot protection.

I guess another difference, and it's not massive, in some ways it matters, some ways it doesn't, is that

Pandas is based on C extensions, right? I'm guessing, if I remember right. And then Polars is Rust, and they even took the .rs extension for their domain, which is really embracing it. But not that it really matters what your native layer is if you're not working in that, right? Like most Python people don't work in C or Rust, but it's still interesting.

Yeah, it is interesting, but also it can be useful for users to know this, because Polars has a really nice plugin system. So you can extend Polars with your own little expressions, which you can write in Rust. And the amount of Rust that you need to do this is really quite minimal. Like if you try to write these Polars plugins as if you were writing Python, and then just use some LLM or something to guide you, I think...

Realistically, most data scientists can solve 98% of their inefficient DataFrame usage by using Polars plugins. So having a nice, safe language that you can do this in really makes a difference. I'm going to write it in Python, and then I'm going to ask some LLM. Right now I'm using LM Studio and, I think, Llama 3. Anyway, ask it, say, okay, write this in Rust for me. Write it as a Polars plugin. Here we go. All right.

Yeah, exactly. It's crazy, this new world we live in. Yeah, yeah, totally. I mean, like, the amount of Rust knowledge you need to take care of some of the complicated parts in Polars is really advanced. You really need to study for that, and an LLM isn't going to solve it for you. But the amount of Rust that you need to just make a plugin to solve some inefficient function, I think that's doable. Right, yeah, exactly. It's very different to say, we're going to just do this loop in this function call here versus there, rather than...

I'm going to write a whole library in Rust or C or whatever. So, yeah, so there's a pretty different API between these. And in Narwhals, it looks like you've adopted the Rust API, right? A subset of it. Is that right? The Polars one. Yes, exactly. So I kind of figured. Yeah, that's what I mean. Yeah, the Polars one.

Yeah, yeah. We could have just chosen the Pandas API. But to be honest, I found that trying to translate the Pandas API to Polars was fairly painful. Like Pandas has a bunch of extra things like the index, multi-index, and it does index alignment on all the operations. I just found it

not a particularly pleasant experience to try to map this onto Polars. However, when I tried to do the reverse, translating the Polars API to Pandas, it kind of just worked without that much effort. And I was like, oh, wow, this is magic. Okay, let's just take this a bit further, publish it on GitHub. Maybe somebody would find a use case for it. I don't know. Yeah, that's great.
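Why the Polars-to-Pandas direction is the easy one can be sketched with a toy translation layer. The dict-of-lists "frame" and both helpers are invented for illustration; the Pandas equivalents appear only as comments:

```python
# Each Polars-style verb has a direct pandas equivalent, so a wrapper going
# in that direction is a thin translation table. The reverse direction also
# has to reproduce indexes, multi-indexes, and index alignment.

def select(df, *cols):
    # Polars: df.select("a", "b")   -> pandas: df[["a", "b"]]
    return {c: df[c] for c in cols}

def filter_rows(df, pred):
    # Polars: df.filter(expr)       -> pandas: df[boolean_mask]
    n = len(next(iter(df.values())))
    rows = [{c: df[c][i] for c in df} for i in range(n)]
    keep = [i for i, row in enumerate(rows) if pred(row)]
    return {c: [df[c][i] for i in keep] for c in df}

df = {"a": [1, 2, 3], "b": [10, 20, 30]}
out = filter_rows(select(df, "a", "b"), lambda row: row["a"] > 1)
print(out)  # {'a': [2, 3], 'b': [20, 30]}
```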

Out in the audience, ZigZagJack asks, how is Narwhals different from Ibis? All right. The number one most common question. Love this. Is it? Okay, great. Yeah. Maybe we should provide a bit of context for the listeners on what Ibis is. So Ibis, yes, you can see there on the screen, they describe themselves as the portable DataFrame library. So Ibis is really aiming...

to be a DataFrame library, just like Pandas, just like Polars. But it's got this API, which can then dispatch to different backends. The default one is DuckDB, which is a really powerful embedded analytics database. I think you covered it on the show. In fact, I think I might have first heard about DuckDB on Python Bytes. So listeners, if you want to stay up to date, subscribe to Python Bytes.

Thank you. Yeah, one of the shows I almost never miss. So yeah, I think the primary difference between Narwhals and Ibis is the target audience. So with Ibis, they're really trying to be this full-blown DataFrame library that people can use to do their analyses. Whereas with Narwhals, I'm openly saying to end users, like if you're an end user, if you're a data scientist, if you're an ML engineer, if you're a data analyst, don't use Narwhals. Like,

It's a tool for tool builders. Like, learn Polars, learn DuckDB, learn whatever the best tool is for your particular task, learn it well, master that, and do your analysis. On the other hand, if you're a tool builder and you just need to do some simple operations with data frames, and you want to enable your users to use your tool regardless of which library they're starting with, then Narwhals can provide a nice bridge between them. Interesting.

Is there any interoperability between Ibis and Narwhals? We do have some level of support for Ibis. And at the moment, this is just interchange-level support, in the sense that if you pass an Ibis data frame, then you can inspect the schema, not do much else. But for the Altair use case, that's all they needed. Like they just wanted to inspect the schema, make some decisions, etc.

on how to encode some different columns. And then depending on how long your data frame is, they might convert to PyArrow and dispatch to a different library called VegaFusion, or they might just do everything within Altair. But we found that even just having this relatively minimal level of support for Ibis, Vaex, DuckDB, and anything else that implements the data frame interchange protocol was enough to already solve some problems for users of these libraries. Yeah.

Okay, very interesting. Let's see, we'll hit a few more of the highlights here. 100% test coverage. You already mentioned that you found some bugs in, which library was it? I think it's helped to uncover... Modin, yeah, yeah, that's right. I think all of them. I think it's helped to uncover some rough edge cases in all of the libraries that we have some support for. You write a library and you're going to say, I'm going to try to behave like you do, and write some tests around that.

And then when you find the differences, you're like, wait a minute, right? Yeah, exactly. Also really love to see the let your IDE help you thanks to static typing. We'll definitely have to dive into that in a bit as well. That looks awesome. Cheers. Yeah, huge fan of static typing. You know, it's a bit of a controversial topic in some Python circles. Some people say that it's not really what Python is meant for and that it doesn't help you prevent bugs and all of that. And I can see where these people are coming from. But yeah.

When I've got a statically typed library and my IDE is just always popping up with helpful suggestions and doc strings and all of that, then that's when I really appreciate it. Exactly. Forget the bugs. If I don't have to go to the documentation because I hit dot and it's immediately obvious what I'm supposed to do, that's already a win, right? And typing gives you that. Plus it gives you checking. Plus it gives you lots of other things. I think it's great. And especially with your focus on tool builders, right?

Tool builders can build better tools using your typing, but it's not forced upon any of their users, because it's optional. The only libraries that I can think of that really force typing on their users are Pydantic

and FastAPI, and a couple of these, like Typer, that have behavior driven by the types you put in. But if you're using such a library, you're choosing that as a feature, not a bug, right? Yeah, exactly. Yeah. So, awesome. And then finally, sticking with the focus on tool builders: perfect backwards compatibility policy. What does this mean? Okay.

This is a bit of an ambitious thing. So when I was learning Rust, I read about Rust editions. So the idea is that when you start a Rust project, you specify the edition of Rust that you want to use. And even as Rust gets updated, if you write some project using the 2015 edition of Rust, then it should keep working essentially forever. So they keep this edition around forever.

And if they have to make backwards incompatible changes, there are new editions, like the 2018 and 2021 editions. So this is kind of what we're trying to do. Like, the idea was, well, we're kind of mimicking the Polars API, right?

I think there was a bracket I opened earlier which I might not have closed, which was that the third choice we had was to make an entirely new API. But I thought, well, better to do something that people are somewhat familiar with. Yeah, I think that's a great choice. Yeah, because when you go and write the code, half of the people already know Polars. And so they just keep doing that. You don't have to go, well, here's a third thing you have to learn, right? Yeah, I'd like to think that half of people know Polars now. Unfortunately, I think we might not quite be there yet, but it is growing. Yeah.

Yeah. Yeah. Yeah. I think we'll get there. So yeah, it's okay. We're kind of mimicking a subset of the Polars API and we're just sticking to the fundamentals. So that part should be relatively stable. But at some point, presumably, Polars is going to make a backwards incompatible change. And at that point, what do we do in Narwhals? What do we do about the top-level Narwhals API, and

coordinating changes between different libraries? It's going to get tricky. And the last thing that I want to do is see people put upper bound constraints on the Narwhals library. I think upper bound constraints on something like this should never really be necessary.

So we've tried to replicate what Rust does with its editions. The idea is that we've got a stable v1 API. We will have a stable v2 API at some point, if we need to make backwards incompatible changes. But if you write your code using the v1 stable Narwhals API, then even as new Narwhals versions

come out, even as the main narwhals namespace changes, even as we might introduce v2, then your code should, in theory, keep working. Like, v1 should stay supported indefinitely. This is the intention. Yeah, you said see the stable API for how to opt in. So how do you... I'm just curious what the mechanism is. So, for example, import narwhals.stable.v1 as nw, which gives you the stable Narwhals support. Yeah, exactly. So instead of...

I got you. That's cool. Yeah, instead of import narwhals as nw, you'll do import narwhals.stable.v1 as nw. And yeah, I'd encourage people, when they're just trying it out, prototyping, to use import narwhals as nw. If you want to make a release and future-proof yourself, then switch over to the stable.v1. This is a little similar to the api.talkpython.fm slash v1 slash whatever.

Versus, you know, where people encode a different version in their API endpoints, basically. Yeah, yeah. In import statements. I like it. I like it a lot. It's great. Yeah, let's see how this goes. Yeah, exactly. No, it's good. So just back to typing real quick. Pamphile Roy out there says, look...

a lot of open source maintainers complain about typing because if you want to make it really correct, it's painful to add. That can be true. And, you know, that last 1% can be some insane type statement, but it's so helpful for end users. Yeah. Sure. Yeah. You mentioned earlier that everyone seems to be at Quansight. Do you know where I met Pamphile? At Quansight? He was an ex-colleague of mine at Quansight. Yes. Amazing. Yeah.

See, it continues to happen. But yeah, I think that totally sums it up for me as well. You know, it's really great to be using libraries that give you those options. We do have the .pyi files and we have typeshed and all of that, where people can kind of put typing on things from the outside that didn't want to support it. But if it's built in and

part of the project, it's just better, you know? Yeah. You have it from day one. It works well. I mean, trying to add types to a library that started without types, like pandas, is very painful, to be honest. I bet it is. I bet it is. Yeah. Really cool. All right, let's go and talk through... I guess a quick shout out: just last year I had Ritchie Vink, the creator of Polars, on Talk Python. If people want to check that out, they can certainly, uh,

have a listen to that. I also just recently had Wes McKinney, creator of Pandas on and I'll link to those shows if people want to like dive into those. But let's talk a little bit through your documentation. It tells a really good story. I like what you've put down here as you know, it's not just here's your API and stuff, but it walks you through. So we talked about why obviously install pip install. It's pure Python with a pure Python wheel, right?

Yeah, exactly. Shouldn't be any issues with installation. Is it WASM compatible? Do you know? Could I use it on PyScript, Pyodide? I don't know. Are there any restrictions there? There are some restrictions. For example, I don't think you can do threading. I don't think you can use some of the common, well, you don't have any dependencies, but some of the common third-party HTTP clients, because it has to go through the browser's AJAX layer. There are some, but not terribly many, restrictions.

I'd imagine then that we would only be limited by whichever data frame people are passing in. Yeah, yeah, awesome. Okay, that's super nice. And maybe let's just talk through a quick example here. Keep in mind that most people can't see any of the code, but let's still give them a sense of what it looks like to write code that is interoperable with all these different data frame libraries using Narwhals. So maybe give us just an example. Sure.

Sure. So the idea is, what we can see on the screen is just a very simple example of a data frame agnostic function. We've got a function called my_function, and this is something that users could maybe just use. Maybe it's something your library exposes, but the user doesn't need to know about Narwhals. The Narwhals part only happens once you get inside the function.

So the user passes in some data frame. We then call narwhals.from_native on that data frame object. We do some operation, and then we return some native object back to the user. Now, narwhals.from_native is a practically free operation. It's not doing any data conversion. It's just instantiating some Narwhals class that's backed by your original data frame. Right, right.
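That "practically free" wrapping can be illustrated with a stdlib-only sketch. Nothing here is Narwhals' real code; the class and function names are made up for illustration. The point is that wrapping only stores a reference, so no data is copied or converted:

```python
# Illustrative sketch (not Narwhals' actual implementation): a thin
# wrapper that is "practically free" because it only stores a reference
# to the native object. No data is copied or converted.

class WrappedFrame:
    """Minimal stand-in for a Narwhals-style wrapper class."""

    def __init__(self, native):
        self._native = native  # just a reference, O(1)

    def to_native(self):
        return self._native  # hand back the very same object


def from_native(df):
    return WrappedFrame(df)


# Any object can play the role of a "native" dataframe here.
native = {"a": [1, 2, 3]}
wrapped = from_native(native)

# Round-tripping returns the identical object: no copy was made.
assert wrapped.to_native() is native
```

The same shape explains why the overhead stays low regardless of how large the underlying data is.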

And I imagine if it's a Polars data frame that gets passed in, it's probably a more direct pass-through to the API than if you're doing operations on a pandas data frame, right? Is there a difference in runtime depending on the backend? The overhead is really low, even for the pandas case. In fact, sometimes things get a little bit faster because of how careful we've been about avoiding index operations and unnecessary copies. To be honest, some of this will be alleviated in pandas version 3, when copy-on-write becomes the default. Oh, that's interesting. Yeah. Yeah.

In terms of the mapping on the implementation side, it's a bit easier to do the Polars backend. But even then, we do need to do some version checks. Like, in Polars 0.20.4, they renamed with_row_count to with_row_index, I think. And so yeah, even there we do need some if-then statements. But at the end of the day, what the library does is a few extra function calls, a few checks on versions. It's not really doing that much. Yeah, like you might experience an extra millisecond compared to running something natively, at most.

Usually you're using a data frame because you have some amount of data, even hundreds of rows. Most of the computation is still going to end up there, rather than in the if-it's-this-version-call-that checks, right? That's not a lot of overhead, relatively speaking. I agree, yeah. So, yeah, we see an example here of a data frame agnostic function which just calculates some descriptive statistics from an input data frame using the expressions API, which we talked about earlier. And here's something that I quite like about MkDocs.

So you see where it says, let's try it out. We've got these different tabs, and you can click on, like, Polars, pandas, Polars lazy. And then you can see in each case what it looks like from the user's point of view, and you can compare the outputs. So from the user's point of view, they're just passing their object to func. What they're not seeing is that under the hood, func is using Narwhals. But from their perspective, they put pandas in, they get pandas out. They put Polars in, they get Polars out.

That's awesome. So we talked about the typing, and in this one we have a df typed as a FrameT. Is that some sort of generic, and does it have restrictions on it? What is this FrameT? I didn't dive into the source and check it out before.

Sure, yeah, it's a TypeVar. So it's just the idea that you start with a data frame of some kind and you get back a data frame of the same kind. Start with Polars, get back Polars; start with pandas, get back pandas; and so on. And yeah, this version of the function is using the decorator nw.narwhalify. Narwhalify, it's a fantastic verb. So...

Yeah. So there are two ways in which you can implement your function. You can do it the explicit way, which is in the quick start in the docs, where you write your function that takes some native frame, and then you convert that to the Narwhals one. You say from_native, then you do your work, and then you

depending, you could convert it back, or in this case, it returns a list of strings in that example. Or you can skip the first and the last step and just put this decorator on it, and it'll convert from native on the way in and back to native on the way out, right? Yeah, exactly. So if you're really strict about type annotations, then using from_native and to_native gives you a little bit of extra information. But...

I think now it looks a little bit neater. Yeah, that's true. So for example, in the first one, you could say that this is actually a pandas data frame, because you're writing the code or something like that. I don't know. What is this IntoFrame? This is the type on the first example. Yeah, by IntoFrame, we mean something that can be converted into a Narwhals data frame or lazy frame. How do you implement that in the type system? Is that a protocol, or what is this?

Yeah, we've got a protocol. So I just found some methods that these libraries have in common. Exactly, yeah. You can find them.

That's what I was thinking. Yeah. Okay. Yeah. But if it has enough of the functions of pandas or Polars, you're like, all right, this is probably good. All right. And you can say it's one of these. That's pretty cool. Yeah, exactly. I mean, if any of this is confusing to listeners, we do have a page in the documentation that's all about typing. So people can read through that at their own leisure. Yeah, for sure. All right. Let's see. I do like the MkDocs setup where you can have these different examples.
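The two ideas in this exchange, a structural protocol for "frame-like enough" objects and a narwhalify-style wrap-and-unwrap decorator, can be sketched in plain Python. Everything below is illustrative: the names FrameLike, Wrapper, and narwhalify are assumptions standing in for Narwhals internals, and the only method checked is the interchange-protocol hook.

```python
# Illustrative sketch: structural typing plus a wrap/unwrap decorator.
import functools
from typing import Protocol, runtime_checkable


@runtime_checkable
class FrameLike(Protocol):
    def __dataframe__(self): ...  # e.g. the interchange protocol hook


class Wrapper:
    """Thin wrapper standing in for a Narwhals DataFrame."""
    def __init__(self, native):
        self.native = native


def narwhalify(func):
    @functools.wraps(func)
    def inner(df, *args, **kwargs):
        # Structural, not nominal, check: any object with the right
        # methods passes, regardless of which library it came from.
        if not isinstance(df, FrameLike):
            raise TypeError("expected a frame-like object")
        result = func(Wrapper(df), *args, **kwargs)
        # Unwrap on the way out, so the caller gets their native type back.
        return result.native if isinstance(result, Wrapper) else result
    return inner


class FakeFrame:
    def __dataframe__(self):
        return self


@narwhalify
def identity(df):
    return df  # returns the wrapper; the decorator unwraps it


frame = FakeFrame()
assert identity(frame) is frame  # native in, native out
```

A `runtime_checkable` Protocol only verifies that the methods exist, which matches the "found some methods these libraries have in common" description above.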

One thing I noticed is you've got the Polars eager evaluation and you've got the Polars lazy evaluation. And when you have Polars lazy, this function decorated with the narwhalify decorator itself returns something that is lazy, and you've got to call collect on it, right? So it kind of preserves the laziness, I guess. Is that right? Yes, exactly. This was something that was quite important to me. Like,

it should not be something that only works well with eager execution. I want to have some level of support such that lazy in can mean lazy out. Yeah, eager in, eager out; lazy in, lazy out. Okay. Exactly. Yeah, so the way you do that in Polars is you create a lazy frame versus...

data frame, right? But then you've got to call collect on it, kind of like awaiting something in async code, which is cool. Yeah, or don't call collect, or just wait until you really need to call collect. Right, or pass it on to the next one, and on to the next. Yeah, exactly. Exactly. So one of the things that you talk about here is the pandas index, which is one of the key differences between Polars and pandas. And you've classified pandas people into two categories.

Those who love the index, and those who try to get rid of it and ignore it. Yeah, exactly. So if James Powell is listening, I think we can put him in the first category. I think, realistically, most pandas users that I've seen call reset_index(drop=True) every other line of code. They just find that the index gets in the way more than it helps them, most of the time.

And with Narwhals, we're trying to accommodate both. So we don't do automatic index alignment, so this isn't something that you have to worry about. But if you are really bothered about index alignment, say due to backwards compatibility concerns, then we do have some functions which allow you to do that, which are no-ops for other libraries. There's an example in scikit-lego where they were relying on pandas index alignment. So we've got a function here,

narwhals.maybe_align_index. So for pandas-like libraries, the index will do its thing. And for other libraries, the data will just be passed through. You said there pandas-like. And pandas-like is actually a type in your type system, right? Did I see that? Yeah, yeah. So we've got an is-pandas-like data frame function to tell you.

So by pandas-like, we mean pandas, cuDF, Modin: the libraries that have an index and follow those kinds of rules. Yeah, yeah, that's really cool. Yeah, because at the end of the day, the idea of writing completely data frame agnostic code is a lot easier for new libraries than for existing libraries that have backwards compatibility concerns. And we recognize that

it might not be completely achievable. I think in all of the use cases where we've seen narwhals adopted, they're doing most of it in a data frame agnostic way, but they do have some parts of it where they're saying, okay, if this is a Pandas data frame, we've got some Pandas specific logic and otherwise let's go down the data frame agnostic route.
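That escape hatch, pandas-specific logic where it's needed and a no-op everywhere else, often comes down to a duck-typing check. Here is a toy sketch under that assumption; it is not Narwhals' actual implementation, and the function name is made up:

```python
# Sketch of the "no-op for other libraries" idea: do the pandas-specific
# thing only when the object quacks like pandas, otherwise pass the
# data straight through untouched.

def maybe_reset_index(df):
    if hasattr(df, "reset_index"):          # pandas-like?
        return df.reset_index(drop=True)    # index does its thing
    return df                               # everything else: no-op


class PandasLike:
    """Stand-in for a pandas-style object with an index."""
    def reset_index(self, drop=False):
        return ("reset", drop)


assert maybe_reset_index(PandasLike()) == ("reset", True)
assert maybe_reset_index([1, 2, 3]) == [1, 2, 3]  # passed through untouched
```

The same pattern lets a library keep one data frame agnostic code path with a small pandas-only branch, as described above.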

Yeah, you also have here levels of support. You have full and interchange. I think we talked about that a little bit, so maybe just point people here. But this is: if you want to be like cuDF or Modin, you can fully integrate. Or if you just want to have

enough of an implementation that they can kind of work together, right? You can do this data frame interchange protocol. Yeah, exactly. Or just write to us, and we'd be happy to accommodate you without you having to go through the data frame interchange protocol. Oh yeah, very nice. Okay. You mentioned the overhead before, but you do have a picture. Pictures are always fun. And...

In the picture, you've got little different operations, different times for each of the operations. And there's a quite small overhead for pandas versus pandas with narwhals. Yeah, exactly. Like in some of them, you can see it becoming a little bit faster. In some of them, you can see it becoming a little bit slower. And

And these are queries that I think are the size that you can expect most data scientists to be working with a lot of the time. You've got queries that take between a couple of seconds to 30 seconds. And it's pretty hard to distinguish reliably between the blue and red dots. Sometimes one's higher, sometimes the other one's higher. There's a bit of statistical variance just between running the same benchmark multiple times. But overall, yeah, we were pretty happy with these results. Yeah, that's great. So...

How well have we covered how it works? We talked about the API, but I don't know if we've talked about the implementation of how you actually... why is it basically almost the same speed? Why is it not slower? Are you using underwater unicorn magic? Is that what it is? That's the secret, yes. Underwater unicorn magic. First, I should just say why we wrote this how-it-works page. It's because, really, I want this to be a community-driven project, and

this is one of those cases where open source is more of a social game than a technical one. I'm not saying that's always the case; there are many problems that are purely technical. But Narwhals is a social game in the end. Like, what we're doing isn't that complicated, but if we want it to work, then it needs to be accessible to the community. People do need to be able to trust us and understand it.

That typically does not happen if it's a one-person project. So it was really important to me that different people would be able to contribute to it, and that it all be as simple and clear as possible. So I made this page trying to explain how it works. It's not quite as clear and extensive as I'd like it to be, but a few contributors did say that it really helped them.

So in terms of how we get this low overhead: we're just defining an expression as being a function from a data frame to a sequence of series, and then we're just repeatedly and strictly applying that definition. So there's nothing too fancy going on. In the end, it's just evaluating lambda functions in Python, going down the stack trace, like...

It's pretty fast. Yeah, that's really cool. Yeah, so people can check this out if they want to know. I think this might be where I saw the pandas-like expression. Ah, right, yeah. Pandas-like. Yeah, pandas-like is this class that encompasses pandas, Modin, cuDF: the ones that kind of follow the pandas API. Mm-hmm.
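The definition Marco gives, that an expression is a function from a data frame to a sequence of series, can be applied literally in a few lines of plain Python. This is only a conceptual sketch: `col` and `evaluate` are made-up names, and plain dicts and lists stand in for data frames and series.

```python
# An expression is just: dataframe -> sequence of "series".
# Evaluating expressions means strictly applying that definition.

def col(name):
    # expression selecting one column; returns a list of one "series"
    return lambda df: [df[name]]

def evaluate(df, *exprs):
    # strictly apply each expression and flatten the results
    return [series for expr in exprs for series in expr(df)]

df = {"a": [1, 2, 3], "b": [4, 5, 6]}
assert evaluate(df, col("a"), col("b")) == [[1, 2, 3], [4, 5, 6]]
```

There is no query optimizer or fancy machinery here, which mirrors the point above: the low overhead comes from the definition being simple and applied directly.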

Yeah, close enough for what you need to do. Yeah, exactly. All right. Well, I saw a question out there in the audience somewhere from Francesco, which was basically asking about the roadmap. Like, where are you? Where are you going? Yeah, I should probably introduce Francesco. He's one of the most active contributors to the project. So thanks, Francesco, for...

helping to make it a success. He was actually also the first person to adopt Narwhals in one of his libraries. Yeah, I spoke to him about it at a conference and he was like, I've got this tiny little time-based CV library. Let's try Narwhalifying it as an experiment.

Sure, we did that then. Not scikit-learn, sorry, not scikit-lego: it was this kind of experimental building-blocks-for-scikit-learn-pipelines library that he maintains. And then we've just been taking it from there. So in terms of roadmap, my...

top priority is helping out libraries that have shown interest in Narwhals. So at the moment, Formulaic opened a draft pull request in which they were trying out Narwhals, and they tagged me about some things they were missing. So I'd like to see if I can take that to completion. I think I've got most of it working, but I've just been a bit busy with conferences recently. So maybe next month I'll be able to get something ready for review and show that to them. That would be pretty cool.

Summer is passing, the conferences are ending, it's going to get dark and cold. Perfect time to program. Yeah, we'll get back to the situation that I was in when I started Narwhals, which was that it was a rainy... not Friday, sorry, it was a rainy February weekend in Wales, the rainiest part of the UK. Yeah, that's exactly the same in Oregon here. So it's a good time to get stuff done. Yeah.

Yeah, exactly. So yeah. And then I've been speaking to people from Shiny and Plotly about potentially looking into Narwhals. Like,

there's no contract set in stone or anything. These people may well change their minds if it doesn't work for them. But my idea is: okay, they've shown interest, let's go head first into seeing whether we can help them and whether they'd be able to use Narwhals. If it doesn't work out, we'll just have strengthened the Narwhals API and learned some new things. If it does work, then great, exciting. Okay.

So that's my top priority. And it's been really pleasing to see the contributor community develop around Narwhals. I really thought it would be a one-person project for a long time, but so many people have been contributing really high quality pull requests. It's really been... yeah, you can see what they do. Okay, maybe a couple of the names here are GitHub bots, like this pre-commit CI bot, and...

Yeah. Maybe the real count isn't quite as high as it shows. Maybe 40 or 30, but still, that's a lot. And while we're talking numbers, on the homepage I also want to point out that 10 million downloads a month is a lot of downloads. Yeah.

Yeah, that's maybe slightly misleading, because they pretty much just come from the fact that it's now a required dependency of Altair, and Altair gets millions of downloads. Yeah, exactly. But that's the place of some libraries. Like Werkzeug or ItsDangerous: I don't think many people go, oh, let me go get this HTTP library. They just go, I'm going to use Flask, right? But it's still a really important building block of the community here,

even if people don't seek it out as a top-level thing they use, right? Sure. Cheers. Thanks. Yeah. In fact, if we do our job well, then most people should never know about Narwhals. Exactly. It should just work. Yeah, exactly. They just look in their pip list, like, what is this whale thing in here? Yeah, exactly. Yeah. So yeah, it's been really encouraging, really pleasing to see this contributor community emerge around the project. And yeah,

I think a lot of the contributors are really interested in adding extra methods and adding extra backends and things. So I'm trying to leave a lot of that to the community. Like with Dask: I just got the rough building blocks together, and then it was just so nice, so many really high quality contributions coming in that brought the Dask support

pretty much to completion. We should see now if we're able to execute all of the TPC-H queries with the Dask backend. We might actually be there, or be pretty close to getting there. Nice. What does TPC-H stand for? Well,

I don't remember what it stands for, but it's a set of database queries that they were originally written for testing out different databases. So it's a bunch of SQL queries. But I'm not sure if it was Kaggle that popularized the idea of translating these SQL queries to data frame-like APIs and then running different data frames on them to see who wins the speed test.

But we just figured they do a bunch of things, like joins, concatenations, filtering, comparisons with dates, string operations. And we're like, okay, if the Narwhals API is able to do all of this, then maybe it's extensive enough to be useful. Right. Yeah, yeah. That's super cool. It sounds a little bit like the TIOBE index, plus other stuff maybe, but for databases. I'm not familiar with that. It's like a language ranking

type of thing. And, you know, one aspect is maybe ranking the databases. But yeah, no, this is very cool. Okay. Got it. Yeah. I mean, in the end, we're not trying to be fast in Narwhals, but we just want to make sure that there's no extra overhead compared to running things natively. As long as you're not much slower than the stuff that you're operating with, like,

that's all you should ask for. You can't make it go faster in the extreme. Like, you did talk about some optimizations, but you can't fundamentally change what's happening. Yeah, we could do some optimizations on the Narwhals side, but to be honest, I'm not sure I want to. And part of the reason is because I want this to be a pretty simple project that's easy to maintain. Yeah, sure. And that really keeps the overhead low. Yeah.

And extra docs and tutorials coming. That's fun. Yeah. Looking for contributors who maybe want to write some tutorials or docs. I would love this. Yeah. I mean, it drives me crazy when I see so many projects where people have put so much effort into making a really good product, but then the documentation is really scant. Like, if you don't

prioritize writing good docs, nobody's going to use your product. So I was really grateful to my company. They had four interns come on who really helped out with making the docs look amazing. That's cool. Like, if you look at the API reference, I think every single function now has got

like a docstring with an example at the bottom. I think it's the API reference. Yeah, if you search for any function in here, in the search box at the top, some, I don't know, series-dot-something. Yeah, see, for any of these, we've got an example of, okay, here's how you could write a data frame agnostic function which uses this method.

And let's show that if you pass pandas or Polars, you get the same result. And if there are some slight differences that we just cannot get around, like in the way that they handle missing values, then we've got a very clear note about it in the docs. Yeah, that's great. Maybe someday, support for DuckDB.

I would like that. I don't think we have much of a choice about whether or not we support DuckDB. Like, DuckDB is really on fire now. It really is. Yeah, I think it might be a question of either we have some level of support for DuckDB, or somebody else is going to make something like Narwhals that supports DuckDB, and then we become extinct. Yeah.

But besides, to be honest, DuckDB is amazing. I just find it a bit painful to write SQL strings. And so if I could use DuckDB, but with the Polars API that I prefer and am more familiar with, then...

Yeah, I 100% agree. It looks super nice, but if you look, it has a SQL example, and then the Python example is just SQL in triple quotes. Yeah, exactly. Here's the SQL embedded in Python, you know what I mean? So you're kind of writing SQL no matter what. Yeah, and then the error messages that you get sometimes are like, oh, there's a parse error near this keyword, and you're like, what on earth is going on? Yeah.

And then you're like, oh yeah, I forgot, I've got an extra comma at the end of my select or something. I don't know. Yeah. So this kind of thing. Yeah. So DuckDB is a little bit like SQLite, but for analytics rather than transactional workloads, maybe. I'm not sure if that's a...

I think it's primarily aimed at analysts, yeah, analytical kinds of things. Yeah, data scientists and people. What we are going to struggle with is that in DuckDB, there are no guarantees about row order for operations. So,

But on the plus side, when I look at what Altair is doing with data frames, when I look at some of the other libraries that have shown interest in Narwhals, they're often just doing very simple things. They're not doing things that depend on row order. So we could just initially support DuckDB for the operations that don't require row order. So for something like a cumulative sum, maybe initially we just don't support that for DuckDB. Like, in the end, if you want to do advanced SQL, just use DuckDB directly, like...

As I said earlier, I don't recommend that end users use Narwhals directly, but even just having some common operations, ones that aren't row-order dependent, I'd like to think that this is already enough to solve some real problems for real people. Yeah, I know you said it's mostly for library builders, but if you were building an app and you were not committed to your DataFrame library, or you really wanted to leave open the possibility of choosing a different DataFrame library, sort of using Narwhals to

isolate that a little bit might be nice, right? Yeah. So, yeah. If anyone tries to do this, I'd love to hear your story. I did hear from somebody in our community call, by the way, we've got a community call every two weeks if anyone wants to come and chat with us. I did hear from somebody that at work has got some teams that are primarily using pandas, some

teams that are primarily using Polars, and he just wanted to build some common logic that both teams could use, and he was using Narwhals for that. So I think there are some use cases beyond just library maintainers. Yeah, absolutely. Maybe you're building just an internal library,

and it needs to work with some code you've written in pandas, but maybe you want to try your new project in Polars and still use that library, right? That would be a cool use case as well. Just so you don't lock yourself in. Yeah, I'm pretty sure you've brought up on the show before that XKCD about the spacebar overheating.

I can't remember which number that one is, but in the end, with a lot of open source projects, you put it out with some intention of how it's meant to be used. Yes. But then people find their own way of using it. I believe it was spacebar heating. Workflow, this is it here. Yes. Love this one. Yeah, it looks like something out of a changelog with feedback, or something that says: changes in version 10.7, the CPU no longer overheats when you hold down the spacebar. Comment. Yeah.

LongtimeUser4 writes: this update broke my workflow. My control key is hard to reach, so I hold the spacebar instead, and I configured Emacs to interpret a rapid temperature rise as control. That's horrifying. Look, my setup works for me. Just add an option to re-enable spacebar heating. Ha ha.

I've seen it so many times, but it still makes me laugh each time. It's incredible. It's incredible. All right, Marco. Well, congrats on the cool library. Congrats on the traction. Final call to action: maybe people want to start using Narwhals. What do you tell them? Yeah, give it a go, and please join our Discord and/or our community calls. We're very friendly and open, and would love to hear from you and see what we can do to address whatever limitations you might come up against.

This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. This episode is brought to you by WorkOS. If you're building a B2B SaaS app, at some point your customers will start asking for enterprise features like SAML authentication, SCIM provisioning, audit logs, and fine-grained authorization.

WorkOS helps ship enterprise features on day one without slowing down your core product development. Find out more at talkpython.fm slash workOS. Want to level up your Python? We have one of the largest catalogs of Python video courses over at TalkPython. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm.

Be sure to subscribe to the show, open your favorite podcast app, and search for Python. We should be right at the top. You can also find the iTunes feed at slash iTunes, the Google Play feed at slash Play, and the direct RSS feed at slash RSS on TalkPython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at TalkPython.fm slash YouTube.

This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.