Cloud computing and ubiquitous virtualization have changed the way our applications are built and deployed. This new environment requires a new way of tracking and addressing the security of our systems. ThreatStack is a platform that collects the data your servers generate, monitors it for anomalous behavior that would indicate a breach, and notifies you in near real time. In this episode, ThreatStack’s director of operations, Pete Cheslock, and senior infrastructure security engineer, Patrick Cable, discuss the data infrastructure that supports their platform, how they capture and process the data from client systems, and how that information can be used to keep your systems safe from attackers.
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
Your host is Tobias Macey, and today I’m interviewing Pete Cheslock and Pat Cable about the data infrastructure and security controls at ThreatStack.
Introduction
How did you get involved in the area of data management?
Why don’t you start by explaining what ThreatStack does?
What was lacking in the existing options (services and self-hosted/open source) that ThreatStack solves for?
Can you describe the type(s) of data that you collect and how it is structured?
What is the high-level data infrastructure that you use for ingesting, storing, and analyzing your customer data?
How do you ensure a consistent format of the information that you receive?
How do you ensure that the various pieces of your platform are deployed using the proper configurations and operating as intended?
How much configuration do you provide to the end user in terms of the captured data, such as sampling rate or additional context?
I understand that your original architecture used RabbitMQ as your ingest mechanism, which you then migrated to Kafka. What was your initial motivation for that change?
How much of a benefit has that been in terms of overall complexity and cost (both time and infrastructure)?
How do you ensure the security and provenance of the data that you collect as it traverses your infrastructure?
What are some of the most common vulnerabilities that you detect in your clients’ infrastructure?
For someone who wants to start using ThreatStack, what does the setup process look like?
What have you found to be the most challenging aspects of building and managing the data processes in your environment?
What are some of the projects that you have planned to improve the capacity or capabilities of your infrastructure?
Pete Cheslock
@petecheslock on Twitter
petecheslock on GitHub
Patrick Cable
@patcable on Twitter
patcable on GitHub
ThreatStack
@threatstack on Twitter
threatstack on GitHub
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA