#22 Why testing data pipelines can be so challenging - and how to tackle it

2024/9/6

Plumbers of Data Science

In this episode of the Plumbers of Data Science podcast, I’m looking at why testing can be so challenging for data engineers. The inspiration came from one of my recent coaching sessions, where the question of test-driven development (TDD) came up during a Q&A. It stuck with me, so I thought it deserved a deeper look.

I’ll explain the key benefits of TDD, like improved code quality and easier refactoring, and why, despite these advantages, it’s not always widely adopted, especially in fast-paced environments where time constraints dominate. We’ll also talk about the specific challenges data engineers face with TDD, such as handling large, unpredictable datasets, integrating with external systems, and adapting to data that keeps changing.