Summary
Machine learning is a data-hungry activity, and the quality of the resulting model depends heavily on the quality of the inputs it receives. Generating sufficient quantities of high-quality labeled data is an expensive and time-consuming process. To reduce that time and cost, Alex Ratner and his team at Snorkel AI have built a system for powering data-centric machine learning development. In this episode he explains how the Snorkel platform allows domain experts to create labeling functions that translate their expertise into reusable logic, dramatically reducing the time needed to build training data sets and driving down the total cost.
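For listeners unfamiliar with the programmatic labeling approach discussed in this episode, here is a minimal sketch of how labeling functions work in the open source Snorkel library. The spam/ham task, example messages, and heuristics are hypothetical illustrations, not material from the episode.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

# Label values: ABSTAIN lets a labeling function pass on examples it can't judge.
SPAM, HAM, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_contains_link(x):
    # Heuristic encoded by a domain expert: messages with URLs are often spam.
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    # Heuristic: very short messages are usually legitimate.
    return HAM if len(x.text.split()) <= 4 else ABSTAIN

# A tiny unlabeled training set (hypothetical example data).
df_train = pd.DataFrame({
    "text": [
        "Win a prize now at http://example.com",
        "See you at lunch",
        "Click http://spam.example to claim your reward",
        "Thanks for the update",
    ]
})

# Apply every labeling function to every example, producing a label matrix.
applier = PandasLFApplier(lfs=[lf_contains_link, lf_short_message])
L_train = applier.apply(df=df_train)

# The label model combines the noisy, overlapping votes into probabilistic labels.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=500, seed=123)
probs = label_model.predict_proba(L=L_train)
print(probs)
```

The probabilistic labels produced by the label model can then be used to train a downstream model, while the labeling functions themselves remain reusable, inspectable pieces of logic that can be iterated on as the data or requirements change.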
Announcements
Interview
Introduction
How did you get involved in machine learning?
Can you describe what Snorkel AI is and the story behind it?
What are the problems that you are focused on solving?
Which pieces of the ML lifecycle are you focused on?
How did your experience building the open source Snorkel project and working with the community inform your product direction for Snorkel AI?
How has the underlying Snorkel project evolved over the past 4 years?
What are the deciding factors that an organization or ML team needs to consider when evaluating existing labeling strategies against the programmatic approach that you provide?
What are the features that Snorkel provides over and above managing code execution across the source data set?
Can you describe what you have built at Snorkel AI and how it is implemented?
What are some of the notable developments of the ML ecosystem that had a meaningful impact on your overall product vision/viability?
Can you describe the workflow for an individual or team who is using Snorkel for generating their training data set?
How does Snorkel integrate with the experimentation process to track how changes to labeling logic correlate with the performance of the resulting model?
What are some of the complexities involved in designing and testing the labeling logic?
How do you handle complex data formats such as audio, video, images, etc. that might require their own ML models to generate labels? (e.g. object detection for bounding boxes)
With the increased scale and quality of labeled data that Snorkel AI offers, how does that impact the viability of autoML toolchains for generating useful models?
How are you managing the governance and feature boundaries between the open source Snorkel project and the business that you have built around it?
What are the most interesting, innovative, or unexpected ways that you have seen Snorkel AI used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Snorkel AI?
When is Snorkel AI the wrong choice?
What do you have planned for the future of Snorkel AI?
Contact Info
Parting Question
Closing Announcements
Links
SHAP
The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0