Scaling Machine Learning with Spark • Adi Polak & Holden Karau

2023/6/30
GOTO - The Brightest Minds in Tech

Shownotes

This interview was recorded for the GOTO Book Club: gotopia.tech/bookclub

Adi Polak - VP of Developer Experience at Treeverse & Contributing to lakeFS OSS
Holden Karau - Co-Author of "Kubeflow for Machine Learning" & many more books & Open Source Engineer at Netflix

DESCRIPTION

Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals, allowing data and ML practitioners to collaborate and understand each other better.

Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.

You will:
• Explore machine learning, including distributed computing concepts and terminology
• Manage the ML lifecycle with MLflow
• Ingest data and perform basic preprocessing with Spark
• Explore feature engineering, and use Spark to extract features
• Train a model with MLlib and build a pipeline to reproduce it
• Build a data system to combine the power of Spark with deep learning
• Get a step-by-step example of working with distributed TensorFlow
• Use PyTorch to scale machine learning and its internal architecture

* Book description: © O'Reilly

The interview is based on the book "Scaling Machine Learning with Spark".

RECOMMENDED BOOKS

Adi Polak • Machine Learning with Apache Spark
Holden Karau, Trevor Grant, Boris Lublinsky, Richard Liu & Ilan Filonenko • Kubeflow for Machine Learning
Holden Karau • Distributed Computing 4 Kids
Holden Karau • Scaling Python with Dask
Holden Karau & Boris Lublinsky • Scaling Python with Ray
Holden Karau & Rachel Warren • High Performance Spark
Holden Karau, Konwinski, Wendell & Zaharia • Learning Spark
Holden Karau & Krishna Sankar • Fast Data Processing with Spark 2nd Edition