How do you find the most interesting or suspicious points within your data? What libraries and techniques can you use to detect these anomalies with Python? This week on the show, we speak with author Brett Kennedy about his book “Outlier Detection in Python.”
Brett describes initially getting involved with detecting outliers in financial data. He discusses various applications and techniques in security, manufacturing, quality assurance, and fraud. We also dig into the concept of explainable AI and the differences between supervised and unsupervised learning.
This episode is sponsored by APILayer.
Course Spotlight: Using k-Nearest Neighbors (kNN) in Python
In this video course, you’ll learn all about the k-nearest neighbors (kNN) algorithm in Python, including how to implement kNN from scratch. Once you understand how kNN works, you’ll use scikit-learn to facilitate your coding process.
Topics:
00:00:00 – Introduction
00:01:56 – Describing the book
00:03:22 – How did you get involved in outlier detection?
00:06:50 – Initially looking at the data to spot errors
00:08:22 – Amount of fraud and financial errors
00:09:50 – Understanding the nature of the outliers
00:12:15 – Industries that would be interested in detection
00:18:21 – Sponsor: APILayer.com
00:19:15 – Who is the intended audience for the book?
00:22:16 – Differences between supervised and unsupervised learning
00:25:48 – Autonomous vehicles detecting anomalous imagery
00:29:08 – What is explainable AI?
00:36:21 – Video Course Spotlight
00:37:43 – Detecting an outlier across multiple columns
00:44:32 – Detection of LLM and bot activity
00:49:49 – Proving you are a human checkbox
00:52:25 – What are Python libraries for outlier detection?
00:53:57 – Creating synthetic data to work through examples
00:57:10 – Tools developed and described in the book
01:01:29 – How to find the book
01:02:27 – What are you excited about in the world of Python?
01:04:55 – What do you want to learn next?
01:05:52 – How can people follow your work online?
01:06:16 – Thanks and goodbye
Show Links:
Episode #169: Improving Classification Models With XGBoost – The Real Python Podcast
pyod: A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)
scikit-learn: machine learning in Python — scikit-learn 1.5.0 documentation
DataConsistencyChecker: A Python tool to examine datasets for consistency
Support the podcast & join our community of Pythonistas