Summary
The majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being a discrete and isolated task. Unfortunately, the real world is rarely static, leading to concept drift and model failures. River is a framework for building streaming machine learning projects that can constantly adapt to new information. In this episode Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River.
Announcements
Interview
Introduction
How did you get involved in machine learning?
Can you describe what River is and the story behind it?
What is "online" machine learning?
What are the practical differences with batch ML?
Why is batch learning so predominant?
What are the cases where someone would want/need to use online or streaming ML?
The prevailing pattern for batch ML model lifecycles is to train, deploy, monitor, repeat. What does the ongoing maintenance for a streaming ML model look like?
Concept drift is typically due to a discrepancy between the data used to train a model and the actual data being observed. How does the use of online learning affect the incidence of drift?
Can you describe how the River framework is implemented?
How have the design and goals of the project changed since you started working on it?
How do the internal representations of the model differ from batch learning to allow for incremental updates to the model state?
In the documentation you note the use of Python dictionaries for state management and the flexibility offered by that choice. What are the benefits and potential pitfalls of that decision?
Can you describe the process of using River to design, implement, and validate a streaming ML model?
What are the operational requirements for deploying and serving the model once it has been developed?
What are some of the challenges that users of River might run into if they are coming from a batch learning background?
What are the most interesting, innovative, or unexpected ways that you have seen River used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on River?
When is River the wrong choice?
What do you have planned for the future of River?
Contact Info
Parting Question
Closing Announcements
Links
The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0