Public safety Kubernetes

2024/11/29
Ship It! Cloud, SRE, Platform Engineering

People
Marc Boorshtein
Topics
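Marc Boorshtein: I am responsible for the identity system used by first responders in Washington, D.C. The system connects government agencies at different levels and solves the problem of communicating across jurisdictions. It was originally built on Active Directory and an LDAP virtual directory, evolved from virtual appliances to Kubernetes, and adopted highly automated processes, including Azure DevOps for continuous integration and continuous deployment and Prometheus for monitoring. During the COVID-19 pandemic the system migrated to the Azure cloud and went on to adopt a GitOps strategy to achieve high availability and fault tolerance. The main challenges along the way were the complexity of government procurement, certificate management, networking issues, balancing compliance with security, and the constant change of the Azure platform.

Justin Garrison & Autumn Nash: The two hosts talk with Marc Boorshtein about his experience running Kubernetes in public safety and dig into the technical details, challenges, and best practices. They focus on observability, alerting, collaboration between development and operations teams, and the importance of feedback loops.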

Deep Dive

Key Insights

Why did Marc Boorshtein's team initially struggle with implementing Docker containers in 2015?

The implementation was a cluster, and they decided not to touch Kubernetes until they could use a managed service.

What was the primary reason for the move to Kubernetes in 2021?

The team needed to move to Azure, and once on Azure, they felt ready to adopt Kubernetes as a managed service.

How did the team automate their monitoring and infrastructure updates?

They used Azure DevOps to automate builds, deployments, and monitoring, with a Rube Goldberg-like system that included webhooks and automated testing.

Why did the team switch to GitOps?

They needed to maintain configuration manifests between two different clusters for redundancy, and GitOps allowed them to manage this more efficiently.
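
The summary does not say how the manifests are laid out, so the following is only a minimal sketch of the general idea: one shared manifest kept in Git, small per-cluster overlays, and a check that reports where a live cluster has drifted from the desired state. The names `BASE`, `OVERLAYS`, `cluster-a`, `cluster-b`, and the image reference are all hypothetical and are not taken from the episode.

```python
from copy import deepcopy


def deep_merge(base, overlay):
    """Return a new dict where overlay values win and nested dicts merge."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical shared manifest, stored once in the Git repository.
BASE = {
    "kind": "Deployment",
    "metadata": {"name": "identity-portal"},
    "spec": {"replicas": 2, "image": "example.invalid/portal:1.0"},
}

# Hypothetical per-cluster differences; everything else comes from BASE.
OVERLAYS = {
    "cluster-a": {"metadata": {"labels": {"cluster": "a"}}},
    "cluster-b": {"metadata": {"labels": {"cluster": "b"}}, "spec": {"replicas": 3}},
}


def render(cluster):
    """Build the desired manifest for one cluster from the shared base."""
    return deep_merge(BASE, OVERLAYS[cluster])


def drift(desired, live, path=""):
    """List dotted paths where the live state differs from the desired state."""
    diffs = []
    for key in sorted(set(desired) | set(live)):
        here = f"{path}.{key}" if path else key
        want, have = desired.get(key), live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            diffs.extend(drift(want, have, here))
        elif want != have:
            diffs.append(here)
    return diffs
```

In a real GitOps setup a controller such as Argo CD performs the rendering and reconciliation; the point of the sketch is only that both clusters derive from one source of truth, so a change is made once and differences between clusters stay explicit.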

What challenges did the team face when integrating with Azure AD for SSO?

Azure AD required going through the public internet for authentication, which conflicted with security policies, and they had to create an email forwarding service to handle external users.

How did the team handle email forwarding for external users in Azure?

They ran an email forwarding service in a container within their Kubernetes cluster, forwarding emails through AWS SES to avoid being blocked by email providers.

What is the next major project for the team's infrastructure?

They plan to convert OpenUnison configurations to use CRDs dynamically, move to the External Secrets Operator, and revamp their user interface using React and Material Design.

Why did the team choose Argo for GitOps over Flux?

Marc Boorshtein prefers Argo because of its GUI features, which he finds more user-friendly for enterprise use.

What was the impact of COVID on the team's identity infrastructure?

COVID accelerated the need for identity infrastructure as work-from-home became prevalent, highlighting the importance of SSO and cloud-based identity solutions.

How did the team handle security vulnerabilities in Azure?

They were mostly insulated from Azure-specific vulnerabilities because they didn't use the services that were affected, but they did face challenges with log4j due to their Java-based systems.

Shownotes

Marc Boorshtein from Tremolo Security joins Justin & Autumn to talk all about running Kubernetes in the public sector.

Changelog++ members save 8 minutes on this episode because they made the ads disappear. Join today!

Sponsors:

  • System Initiative – The future of DevOps automation (is here!). System Initiative is an intuitive, powerful, collaborative replacement for Infrastructure as Code (IaC). The free tier is awesome (no credit card required) and you can get started in 3 clicks.

  • Retool – The low-code platform for developers to build internal tools. Developers at some of the best teams out there trust Retool as the platform to build their internal tools: Brex, Coinbase, Plaid, Doordash, LegalGenius, Amazon, Allbirds, Peloton, and many more. Try it free at retool.com/changelog

  • Timescale – Purpose-built performance for AI. Build RAG, search, and AI agents on the cloud with PostgreSQL and purpose-built extensions for AI: pgvector, pgvectorscale, and pgai.
