
CI/CDagger

2024/12/6

Ship It! Cloud, SRE, Platform Engineering

People
Autumn Nash
Dave Rosenthal
Gerhard Lazu
Justin Garrison
Kurt Mackey
Topics
Dave Rosenthal: Sentry is working toward full application health monitoring, consolidating logs, metrics, and error monitoring into a single platform where all the data is tied together by a trace ID, making it easier for developers to debug and analyze. This improves system debuggability, helps developers analyze and resolve issues, and enables detection of more kinds of errors.

Deep Dive

Key Insights

Why did Gerhard Lazu start the Ship It podcast?

Gerhard started Ship It as an extension of the work he was doing on Changelog, focusing on infrastructure and the process of taking applications to production. The podcast was a way to share the insights and experiences gained over years of working in this space.

What is Dagger, and why might someone want to use it?

Dagger is a tool that allows teams to replace YAML and scripts in CI/CD pipelines with code they are familiar with, such as Python, Go, or TypeScript. It helps scale automation for teams and makes it easier to share and reuse modules, reducing the need for templated YAML or Jenkins files.
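To make this concrete, here is a minimal sketch of a Dagger module function (assuming the Go SDK and the scaffolding generated by `dagger init --sdk=go`, which provides the global `dag` client; the type name, image tag, and function are illustrative, not taken from the episode):

```go
package main

import (
	"context"

	"dagger/hello/internal/dagger" // import path generated by the scaffolding; illustrative
)

type Hello struct{}

// Test runs the unit tests in a pinned container. The base image is the
// only context Dagger assumes, so the function behaves the same anywhere.
func (m *Hello) Test(ctx context.Context, source *dagger.Directory) (string, error) {
	return dag.Container().
		From("golang:1.23").
		WithDirectory("/src", source).
		WithWorkdir("/src").
		WithExec([]string{"go", "test", "./..."}).
		Stdout(ctx)
}
```

Invoked as `dagger call test --source=.`, the same function runs identically on a laptop or in CI.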

What are the key features of Dagger that make it stand out?

Dagger's key features include the ability to write automation in familiar programming languages, shareable modules, OpenTelemetry integration for tracing, and a shell for interactive discovery of automation. It also ensures consistent execution across local and CI environments.

Why is documentation crucial before implementing automation?

Documentation provides a blueprint for understanding inefficiencies and ensures that automation is built on a solid foundation. It helps teams avoid mistakes and provides a reference for future changes, making it essential before any automation is implemented.

What is the future vision for Dagger?

The vision for Dagger is to become the standard for automation, similar to how containers revolutionized application packaging. It aims to make automation more consumable, shareable, and documented, reducing the need for repetitive tasks like downloading and validating packages.

What challenges does Dagger face at scale?

At scale, Dagger faces challenges such as managing distributed caching, ensuring reliable pipeline execution, and optimizing costs. Running thousands of CPUs for testing can be expensive, and the tool must balance performance with cost efficiency.

How does Dagger compare to traditional CI/CD tools like Jenkins?

Dagger allows application teams to own their CI/CD pipelines by writing automation in their preferred language, unlike traditional tools where DevOps teams often drop in Jenkins files. This shifts the responsibility to the application teams, making CI/CD more aligned with their workflows.

What is the significance of Dagger's modules and how are they used?

Dagger modules are shareable packages of automation code that can be written in various languages. They allow teams to reuse and combine functions, making it easier to build pipelines without reinventing the wheel. Modules are self-documenting and can be discovered and used by anyone.

What role does OpenTelemetry play in Dagger?

OpenTelemetry in Dagger captures detailed traces of every operation within a pipeline run, providing insights into execution times, caching efficiency, and step performance. This data is visualized in Dagger Cloud, helping teams optimize their automation workflows.

Why is the portability of Dagger important for modern development?

Dagger's portability allows teams to run the same automation locally and in CI environments, making it easier to test and debug pipelines. It also enables teams to choose where to run their pipelines, whether on-prem or in the cloud, based on cost and performance needs.

Chapters
Dave Rosenthal, CTO of Sentry, discusses the future of application health monitoring, emphasizing the integration of various telemetry data sources using a trace ID to provide a comprehensive debugging experience for developers.
  • Integration of various telemetry data sources (logs, metrics, errors) using a trace ID.
  • Enhanced debuggability through a richer, interconnected data model.
  • Ability to analyze and slice and dice data based on various parameters (e.g., operating system).

Transcript



This is Ship It with Justin Garrison and Autumn Nash. If you like this show, you will love The Changelog. It's software news on Mondays, deep technical interviews on Wednesdays, and on Fridays, an awesome talk show for your weekend enjoyment. Find it by searching for The Changelog wherever you get your podcasts. Ship It is brought to you by Fly.io. Launch your app in five minutes or less. Learn how at Fly.io.

What's up friends. I'm here with Dave Rosenthal, CTO of Sentry. So Dave, when I look at Sentry, I see you driving towards full application health: error monitoring where things began, session replay, being able to replay a view of the interface a user had going on when they experienced an issue, with full tracing, full data, the advancements you're making with tracing and profiling, cron monitoring, code coverage, user feedback,

and just tons of integrations. Give me a glimpse into the inevitable future. What are you driving towards? Yeah, one of the things that we're seeing is that in the past, people had separate systems where they had like logs on servers, written files. They were maybe sending some metrics to Datadog or something like that or some other system. They were monitoring for errors with some product, maybe it was Sentry. But more and more what we see is people want all of these sources of telemetry logically tied together somehow.

And that's really what we're pursuing at Sentry now. We have this concept of a trace ID, which is kind of a key that ties together all of the pieces of data that are associated with the user action. So if a user loads a web page, we want to tie together all the server requests that happened, any errors that happened, any metrics that were collected. And what that allows on the back end

is that you don't just have to look at, like, three different graphs and sort of line them up in time and, you know, try to draw your own conclusions. You can actually analyze and slice and dice the data and say, hey, what did this metric look like for people with this operating system versus people with that operating system, and actually get into those details. So this kind of idea of tying all of the telemetry data together using this concept of a trace ID, or basically some key, I think,

is a big win for developers trying to diagnose and debug real-world systems, and something where we're kind of charting the path for everybody. Okay. Let's see you get there. Let's see you get there tomorrow. Yeah, perfectly. How will systems be different? How will teams be different as a result?

Yeah, I mean, I guess again, I'll just keep saying it maybe, but I think it kind of goes back to this debuggability experience. When you are digging into an issue, you know, having a sort of a richer data model, where your logs are structured, they're sort of this hierarchical structure with spans. And not only is it just the spans that are structured, they're tied to errors, they're tied to other things. So when you have the data model that's kind of interconnected, it

opens up all different kinds of analysis that were just kind of either very manual before, kind of guessing that maybe this log was, you know, happened at the same time as this other thing, or were just impossible. We get excited not only about the new kinds of issues that we can detect with that interconnected data model, but also just for every issue that we do detect, how easy it is to get to the bottom of.

I love it. Okay, so they mean it when they say code breaks. Fix it faster with Sentry. More than 100,000 growing teams use Sentry to find problems fast, and you can too. Learn more at Sentry.io. That's S-E-N-T-R-Y.io. And use our code CHANGELOG. Get $100 off the team plan. That's almost four months free for you to try out Sentry. Once again, Sentry.io.

Hello and welcome to Ship It, the podcast all about everything after Git Push. I'm your host, Justin Garrison, and with me as always is Autumn Nash. How's it going, Autumn? I'm very happy to be here. Slightly caffeinated, like almost there. Getting there. Getting there. A little more coffee.

Autumn, are we telling people about your new job? Yeah, we're good to go now. I mean, the world knows at this point. That's right. You announced it on LinkedIn. We're good. So what's your new job? So I am the product manager for...

Azure Linux. So the Azure Linux distribution at Microsoft. Just started this week. Congratulations. Well, actually, the security product manager. I'm sure there's a lot of PMs on Azure Linux. Yeah, so I am the one that works on the security vision for Azure.

So anyone listening to the podcast, if you have problems with security on Azure, you know what? Rude, Justin. Rude. Don't lie. The DMs are going to be from you making different accounts. On my bot army. Yeah.

It's going to be Justin's bot army being like, can you fix this for me? I've had a bug. For any longtime listeners of the show, you'll know our guest. How's it going, Gerhard? It's going really well. I'm very happy to be back. This feels very cozy. I'm so excited to meet you. I feel like I'm geeking out. Likewise. I was a longtime listener of the show. I thought it was great. Can you bring us back? Why did you start Ship It?

It started with all the work that we were doing on Changelog with Adam and Jared. I mean, there was a lot of...

infra work, and setting everything up, and going through all the motions that you normally do when you take an application to production. And we'd been doing that for, I don't know how many years before Ship It started, but it's been years in the making. And there were blog posts before that. And one day we realized, actually, there's so much here that we could start a podcast, start a new show, if you'd be up for it.

And the rest is history. And you carried that on for, I think it was 90 episodes, which was awesome. And then also, going full circle from you stopping at the 90th episode, we have some news to share with everyone listening to Ship It: this podcast on the Changelog is going to stop at the end of the year. So at the end of December 2024,

Don't know when you're listening to this now, but we're stopping the podcast. Again for you, Gerhard. First time for me and Autumn to stop it. Well, that was all news to me as well. When we scheduled this conversation, we didn't know about that. And I'm glad that I was able to come back one more time before the original Ship It, in this form, will be put on pause. I always like to say it's on pause. Maybe indefinitely.

Most likely indefinitely. But you're right. It's like history repeating itself. Yeah. And for anyone listening, sorry about the news breaking to you. This is a decision for Changelog as the network. They're stripping down. They're not going to do a lot of the extra podcasts they were doing. I think Go Time and JS Party, they want to focus on the main Changelog podcasts.

And that makes total sense to me. I think we're up to seven right now. I came in when we restarted Ship It. We were just like, let's just see what happens. And Autumn and I have been doing this for almost a full year. And they wanted to trim it back. And that makes total sense. Autumn and I are planning on continuing on with some form of this podcast, at least for a little while. We still have

a bunch of amazing people to interview about all these different topics, where we're just like, you know what? This already has some momentum. We appreciate everyone that's listening and talking to us, giving us feedback and telling us what they like about the show. So we want to continue that. We think there is space for this

in the podcast universe. And it's a passion that, you know, Autumn and I and a lot of people share, about infrastructure and technology and just, you know, the responsibility of running software

in general. And it's awesome meeting all the people that maintain and run software and infrastructure. And the variety. The variety of people, everything from 3D printer software, like the Octoprint stuff, to stuff in space. And it's been awesome just learning all of the things that are different and the challenges in each space, but also all the things that are the same. Yeah, that are very much the same in the most hilarious of ways.

We should also see if listeners want to send us some ideas for the name of our new podcast, because that would be neat. That's going to be unhinged. Yeah. But that's when you get the best stuff.

So yeah, so this episode, I think we have three more episodes after this one to finish out the year. And hopefully on that last episode, we will have some more formal announcement about where you can find us, where this is going forward. Jared and Adam have been great and they are encouraging us to continue and allowing us to keep doing this. So they might keep some sort of redirect up for people that are listening to this later than the end of 2024. But yeah, we want to be able to

keep that going for some people and make it as seamless as possible. But also like you're probably going to have to add a new feed in your podcast listener. I'm really excited, though. I feel like this just allows for like a new evolution of Ship It. Yeah. I mean, you had 40 episodes, right? Nine months. More than 40, actually, at this point, close to 50. How would you summarize that in a few words? All the episodes that you've done so far,

in this format. I think, kind of going back to what Justin said, it's amazing to see, like, you can be running a satellite in space or you can be running pipelines and platform teams. And there's so much that is different, but so much of it is the same. So much of the new technology that we've built to make infrastructure easier is also just reminiscent of the past, you know, and it just makes it

I don't know. It's like all the different ways that you can solve this awesome big puzzle in it. I think sometimes tech gets really weird, you know, and this podcast has made me remember why I love what we do and kept me loving it even in like the last year, you know?

I think for me, some of my favorite episodes were the throwbacks, right? Like talking to Rich and Mandy and people that were like, this is what it was like to run the AOL chat rooms. I'm like, that was awesome, right? It was just like, it was basically the same thing we're doing now, just with tools that everyone's like, oh, you shouldn't use those anymore. I'm like, that ran the internet for years and years and years. Like we can't just throw out all the old stuff that was super functional because we don't like it anymore. And so those were really cool to me. Also, I just think that like, it's wild, like,

The amount of people that we've met and they were just like, we were doing this cool thing and I found it and I started doing it. And then it leads to this job and this whole career. Learning how you run Linux and different things in space is just wild to me and how you have to make sure that...

It can be updated and just all the thought that goes into it. But the people that we've met are almost cooler than the technology. Absolutely. I mean, yeah, the people in their journey into it have been really fun to learn from. In almost every case, it was like someone just, well, I just stepped up and learned a thing. That's what I'm saying. But how many jobs can you make this type of impact on the world, this type of money and this type of community? Yeah.

And just because you thought something was cool and you nerded out about it, like that is just, that is the essence of what makes us still want to do this at the end of the day, you know? Yeah. I think my favorite ideas start with: this can't be done. Yeah. This is too crazy. This is like, no way, this is never going to work. And going through the cycles to either realize, indeed, this will never work the way I thought it would.

But the learnings and the relationships that you make along the way, those are the ones that will take you wherever you're going next. So it's all little steps, some missteps, and usually the missteps are the ones that teach you the most. That would be one of my takeaways, I think, from Ship It and from all the work that you do in this industry: learning from mistakes.

So powerful. So true. Because I think it's like, I used to be really scared of making mistakes and wanted to be perfect at everything, which was my toxic engineer trait. And I think at a certain point, when you've done so much stuff in production and just worked in this industry for so long, you're no longer scared of making mistakes. You just kind of almost have to

find joy in the ambiguity, you know, and in doing hard things, because you have no other choice. But you have to be given the freedom to do that, right? Like, the number one contributing factor to good, performant teams is psychological safety, right? Like being able to say, I don't know, or, I made a mistake, and everyone's like, great. What did we learn from that? Where are we going forward? And that it's okay to have that freedom to make the mistakes.

There's a lot of privilege in that for some of us. As, like, a white dude in tech, I've been given the benefit of the doubt more than I should have been throughout my career, to be able to say, actually, I don't know that, or, I messed up, sorry, I'll fix it next time. Which I know a lot of people don't get. But also a lot of companies don't give that, because they're just like, we hire senior people and senior people know what they're doing. Right. Like, no.

Like, the senior people don't know what they're doing either. They've just taken down production before. Right. It's just like the only real difference is the junior people are terrified to take down production and the senior people are like, oh no, this is gonna be all right. But that's why I think this podcast is important. And that's why I'm proud of the last 50 episodes that we did, because I feel like there's a lot of podcasts that are big on tech and very technically deep, but

I appreciate the way that we talk about making mistakes and the way that we talk about like the people aspect and how you have to have that safe environment. And we can talk about diversity and all these different things because like, I think people really think that like diversity or safe places or like all these things are like an added bonus to technology, but you can't make good tech without thinking about the people without thinking about how to make a better environment. So like,

Whatever we can do to use whatever privilege we have to influence and to make things better and to help people know that they can get started. And also just to talk to people that are really good technically, but come from all these different backgrounds. Look at all the people that we've had on the show. So I just think it's cool to be able to use the privilege that we have to...

try to make it better and make other people feel seen, and to also show that you can be different and still be very technically deep, you know? Well, I guess we'll just transition right into it. Gerhard, what have you been doing since you kind of left Ship It, since you left Changelog? What have you been working on? What software have you been responsible for? So, um,

It feels like I never really left Changelog because of, first of all, the Kaizen episodes, and all of the infrastructure improvements that we are still driving; they're still very much present. Taking all the things that we did over the years to the place where it is now, and continuing the journey, which has been

a long-term, very satisfying journey. That's the way I would put it. And I'm very happy that that is continuing. And we figured out a way to make that work with Adam and Jared. So that is a personally very satisfying thing and also a professionally very satisfying thing.

After Ship It, and the reason why for me, even back then when I was on the last episode, 90, Embracing Change, it was about priorities. Like, I had to reshuffle a bunch of priorities, basically. And I had to give more time to my main job, which at the time was Dagger, and it still is Dagger even to this day.

And I think that was one of the big changes that happened between me starting Ship It and then having to part ways at that point. 2021 was a very interesting year, and not because of COVID, though obviously that did make it interesting for everybody. For me personally, I was transitioning into a startup again. I went from a large enterprise, which at the time was VMware, to a startup.

At VMware, I had been working on RabbitMQ for, I think, six or seven years. And I went through different types of teams until eventually ending up on the core RabbitMQ team. So you get all the Erlang, you get all the Make, and there's a story there, because it connects to Dagger.

And you just get to see a lot of really important systems, distributed systems, distributed systems problems. And you realize how important the kernel is, even when you're not using containers.

So little differences between the different kernels can have a huge impact on how something like the Erlang VM behaves. And these are really important applications, like things for banks, financial institutions, GPS trackers. And you may be thinking food deliveries, but there are also some other GPS trackers where it's really important they work correctly. Tills, payment systems, it's all over the place. At some point, and we didn't realize this, in some cars the doors wouldn't open.

And RabbitMQ was in that stack. Like I was not expecting that. Like, you know, you would honestly not expect that. It is wild like to see where tech ends up and how it ends up being used. A lot of the time it gets used wrong. So having those conversations and going through those cycles when you have big teams and big budgets and big enterprises is fun, but also it's a certain type of game. So after playing that game for like six, seven years, something like that, I said, you know what? It's time to go back to the startup world.

Because I did start on that journey before getting to VMware. We were a small startup. We were CloudCredo. We were consultants for Cloud Foundry at the time, and BOSH, for those that remember BOSH, maybe if you listened as well.

And Chef wasn't working for those systems. So a team of 20 something people, then we became Pivotal, as in we were acquired by Pivotal. But in my mind, we took over Pivotal in some way because of that craziness, the crazy spirit that we had. And that worked really well. So being part of Pivotal was great and pair programming and extreme programming. That was at the core of it.

And then Pivotal eventually got acquired by VMware. So those transitions from 20 to 2,000 to 40,000 were huge jumps and huge changes. So I rode all of that and I said, you know what? It's time to go back to the startup world. And that's where Dagger enters. So Dagger was interesting because I was fascinated by Docker and I was working with Docker and using Docker, but I hadn't helped build Docker. So Dagger was the moment where I could try that.

And I took it. And three years later, here I am. For anyone that doesn't know what Dagger is, describe what you are doing. What does Dagger actually make as a product, as a startup that's like, hey, we're going to change the world for this thing? What is that? So Dagger is what happens when you get tired of all the YAML. When you get tired of all the YAML in your pipelines, especially your CI/CD pipelines, or when you get tired of your Jenkinsfile, or when you get tired of your scripts,

You want something that scales with teams and with ideas that can... It's really hard to capture them in YAML. If you are finding yourself starting to template YAML for GitHub Actions or CircleCI or any CI/CD system, you know you need Dagger. The other option is to go towards Bazel and to go into that world. But for anyone that knows that world and experienced that world, knows that is a very heavyweight, enterprisey world.

So Dagger takes all the scripts and all the YAML and it allows you to capture that in code. So what that means is that imagine writing your automation. It can be a make file. It can be your GitHub Actions YAML. It can be your CircleCI config, your Jenkins file, all those things you can take and you can put them in the code that you are familiar with.

whether it's Python, whether it's Go, whether it's TypeScript, some languages which are more legacy but still very much present, like PHP or Elixir, and some newer ones, like Rust. Any of these languages you can use to write your automation, and you can package it in something called modules. You can distribute these modules as you would any package, you can assemble them just in time, and you can combine them with other modules.
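As a sketch of what packaging and combining modules can look like in code (again assuming the Go SDK module scaffolding; the `Hello` type and the `golang` dependency with its `Build` function are hypothetical names, used here only because `dagger install` generates typed bindings for installed modules on the global `dag` client):

```go
// Inside a module created with `dagger init --sdk=go`. After running
// `dagger install <ref-of-some-go-module>`, the generated bindings expose
// that module on `dag`, so composing modules is just calling functions.
type Hello struct{}

// Build delegates compilation to the installed (hypothetical) "golang"
// module and returns the built artifacts as a directory.
func (m *Hello) Build(source *dagger.Directory) *dagger.Directory {
	return dag.Golang().Build(source)
}
```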

So what that means is that now you're writing automation, and integrating with CI/CD is just a matter of calling the right function from the right module, and making sure that that function gets wired with everything else that it needs. So for example, you have your tests, right, and build. And you say, okay, but you could do this with Makefiles, or you could use a Justfile, or you could use anything like that. And that is true, you can.

But what ends up happening with that is that there will be assumptions about the context in which that automation runs. Dagger, the only assumption which it makes is that there will be an engine, which is a container runtime. And in that container runtime, you always have to specify, hey, which container image do you want to run? Which one do you want to start with? So all these functions, they always have a context. And the context is the same regardless where you run.

which means that if you want to run this locally, it will run exactly the same as it runs on any CI platform anywhere in the world. You can run Dagger in Jenkins if you want. It is an option. I've been doing containers and Makefiles for a long time, right? Like, that assumption that I can run make

in Jenkins or on my local machine and it runs the same, I don't need Dagger for that, right? Correct. I've been doing it, right? Like, yeah, a lot of stuff just works like, oh, just execute the container. Here's some arguments for it, you know, variables to the Makefile. That's fine. So that side of it doesn't really change how I've been doing things. Although I know it does change how a lot of people are doing it, because Bash is prevalent. The second thing here, though, I think is interesting. To me, Dagger has always

been almost the dagger, if you will, the blade for the DevOps team. The fact that how DevOps used to work for me at large enterprises was...

application teams would go write a bunch of code, and the DevOps team would come in and, like, drop a Jenkinsfile in, right? Here's your PR for your Jenkinsfile. Now this is all going to work magically. We had a whole team at Disney Plus that was like a lib-Jenkins team. They wrote libraries. I didn't do it. I hate Jenkinsfiles. But they wrote libraries for this Groovy script so that everyone, in their Jenkinsfile, would import that team's lib-Jenkins, and then it would do a bunch of stuff for them by default.

But that still required some other team, doing some other thing in some other language, external to the application team. And my sense with Dagger is that Dagger requires that the application team now owns CI/CD, like they are the ones. Because the familiarity of code doesn't matter if you're not the one writing the code, right? Like, if you're some external person, you're going to have your own

PEP 8 formatting for Python and your own modules and formatting, so it's not going to jive well with the application team. So in this case, Dagger makes the most sense when the team writing the code is also the one doing the CI/CD. Is that right? Yes, that is a valid take for sure. We see different teams and different companies use Dagger in different ways. At this point, we've seen every which way, and they're all valid.

The point is that it forces the different perspectives to come together as code. So forget like a Makefile or Jenkins file or a script or anything, just we will be writing the code that our company is most familiar with, whether you're a DevOps person

whether you're someone in the community that wrote a module, it doesn't really matter. The point is we will all be looking at the same code, we understand the same code, we can contribute to the same code and all the automation ends up being code that we can run locally from wherever it is. So just to give you an example of how powerful this is, let's say that you will take the Dagger repository as is today.

The Dagger repository has a module for the entire repository which encapsulates all the things that can happen in that repository. So without you knowing anything about how it runs or what's needed, you can start discovering what is possible in this module. For example, build me the docs.

but also serve the docs. You have one command that will build, lint, serve the docs on your local machine, the Dagger docs, without you knowing anything or having to install anything apart from the Dagger CLI.
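(The discovery flow described here maps onto two CLI entry points that do exist: `dagger functions`, which lists a module's functions with their documentation, and `dagger call`, which invokes one. The exact function names in Dagger's own repository module are the project's own, so treat any names shown elsewhere in this page as illustrative.)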

In the same way, you could build yourself a Dagger CLI if you wanted to, once you discover what the command is. And it's all self-documenting, it's all there. So it provides a very nice way of consuming things without you knowing much about what this piece of software is. It's almost like an API to code.

but an API to consuming code, to consuming resources. I don't care whether it's Python, whether it's PHP. What I want is the artifact, or what I want is the docs, or what I want is the auto-completion, whatever the case may be. So how do you encapsulate that in a way that others can understand and consume it in an easy way? Well, but there's a separation there of understanding something and using something without needing to understand it, right? Because again, I've written plenty of

things that were just like, you just run make docs and docs are there for you, right? Like, make docs dev. And they don't need to know what's behind the Makefile, and it's running containers still. It's doing that stuff. The thing that I think is really interesting here is the fact that those modules are shareable, and the modules are

something really powerful that Terraform did for us, right? Like Terraform modules were powerful because you're just like, you don't have to know behind the scenes. And granted, at some point you might need to escape the module. You might need to override the module. You might need to go build your own module, but you can get started with something that has some opinions on how we think you should be doing this. And in the dagger sense and in the Terraform sense, like most of the time, those things are just going to work for you without needing to care. Right. And so it's like, I can,

dagger init or whatever, I can start off with the module, like, oh, this is the thing I wanted. This looks right. I'm going to go with the defaults. And if I need to change it, I can. So essentially, can you get rid of the Makefile and all of that with Dagger? And say you were building a startup or application, and you

you didn't have the experience that Justin has, right? And you just needed to figure out how to make your CI/CD and, like, your whole DevOps realm work. And you don't have that experience. Does this now enable you to skip all of those extra files in different ways, and just have your whole team learn how to do your infrastructure in that way, using Dagger? Yes. If you would go to the modules that Justin wrote. So, first of all, it would require Justin to take the time

to write the modules and to share them so that others can discover them. It's just a matter of basically putting them up on his GitHub repository. The convention is Daggerverse, so many people have the daggerverse repo, which is a collection of the different modules that people use and wrote. So at this point, there's, I think, five or six implementations of the Go module,

which does all things around Go applications: testing, building, linting, all those things. So you're right, in the sense that you wouldn't need to figure out how to write that automation; it's just a matter of using it. You have a Go app? Great, this is how you provide the source, and this is what options are available to you. And you can try running it locally, and it works. Great, so how do I run this in CI? The same commands. The same commands that you'd run locally, you'd put in CI, and it would work exactly the same.
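(As a hedged illustration of that claim, not a quote from the episode: if `dagger call test --source=.` is the local command, a CI job, say a GitHub Actions step, would install the Dagger CLI and run that exact same line. The pipeline YAML shrinks to a single invocation rather than encoding any build logic itself.)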

No more figuring out of YAML, no more figuring out a lot of things like caching, for example. I mean, we haven't even unpacked this aspect of Dagger: how it caches, how it, for example, sends OpenTelemetry traces for every single operation which happens inside of it. There's so much here. On the surface, it looks like it's a replacement for your scripts, a way of embedding this knowledge and sharing it with others. But other tools have done this before.

So how is this special? It's all the other things which we haven't gotten into, which makes it a very comprehensive way of putting automation in code, sharing it with others and letting others reuse it rather than everyone having to write the same thing in their own specific way. I think the code piece for me is always kind of that barrier for a lot of people because

saying you get to write all your automation in the code you're familiar with is a lockout for a lot of people, because a lot of people aren't comfortable with code. All the people writing YAML are expert YAML engineers. They know how the GitLab YAML is going to do something different on something that has, you know, a list versus an array, whatever it is, right? They're like, I know this thing. But once you ask me to go write some Go code,

I'm not as familiar with that. And I feel a little bit out of place if I'm the one that is like an external person going in to do and maintain and do this thing for another team. That's an application team, right? Because like a lot of companies I've worked at, the app teams were always this like high level, like you don't mess with them. Like they're the ones making the money and all you other people are just overhead. And you're just like, oh, you write the YAML, everyone else writes code, right? And like that I think is a barrier for a lot of people.

I think it's also perspective, though, because think about it. When you go to school for computer science or you first get into computers, you're writing Python or Java, right? And nobody tells you about DevOps. Nobody tells you about scripting unless you somehow...

just stumble on that. So I think for people that maybe started in systems or started with scripting, yeah, but it's a huge barrier for people who just started writing the high level code. And then all of a sudden they're thrown into a production environment. Well, now they have to manage infrastructure. Like,

I think you have more people coming from the writing code because when you think about it, when you hear boot camps or school or you go to these different ways that people are being educated to get into the tech industry, everybody tells you about the code and nobody tells you about the scripting and the version control and the...

DevOps and CI/CD, right? So like, I think this is a great tool for those people and making it more accessible to get into DevOps and CI/CD and to be able to maintain your infrastructure because

Everybody talks about making this really cool app, but nobody talks about maintaining that really cool app and releasing it. That's why we're here. That's what this podcast is for. Yeah, exactly. And I think that's a great point where it's a barrier both ways. There's a person that's only ever written web apps with Spring Boot. They're like, I don't know what this Docker thing is. Don't...

you know, I'm not going to go into the weeds of something that's not my thing. It was like, actually it's all your thing. And same thing for the system. People are like, I don't know how the JVM is going to do that thing. Like, well, you better learn. Cause that's also your thing. Cause I felt like when they're like, they'll be like, okay, you have to learn front end, but then you have to learn the backend and then you have to learn how to fix, how to make that connect to a database. But nobody's ever talking about how to like,

keep all of this working together and maintaining it. And yeah, you know, so it's like, I felt like all the whole time during school and going through training at AWS and going through different bootcamps and everything, like I was like, why are we always like, we talk about the main, like the big main products, but nobody talks about all the secrets you need to make all of it work together, you know? One analogy that I think works really well when you're trying to approach Dagger and you're trying just like to figure out like, where does this thing fit in?

Imagine that what you're building is a software factory, right? Your startup, your company, your team, whatever it is, you are a software factory. You are delivering value to your users in the form of software or through software. Great.

Where's the thing that helps you maintain your factory? Where's the thing that helps you do all the things that you normally, I mean, there's not that much value in figuring out, for example, how to run your tests, right? Or how to cache them properly or how to maybe lint. That's another one. How to package. I mean, how many hours do you want to spend figuring out how to package a JVM container or Java container? Seriously.

I mean, sure, there's like recipes that you can copy and AI is very helpful. But what if some people that were really passionate about this and had the time, they encoded this in Java, let's say using Java, and then just get to use it? Wouldn't that be nice? But not just that, but when you have AI do it, it's very hard to get the context. But if somebody else on your team or somebody else that works at the same company as you, that you can ping and be like, hey, I want to reuse this, but I don't understand this part, you know?

Yeah, and if you give people Makefiles or Jenkins files or YAML, what do they say? I mean, when was the last time you received like a YAML package and you said, wow, this is cool. I'm going to use this. This is amazing. When did someone get excited about Jenkins file? Rarely. Not just that, but it never works the same for everybody. Exactly. Everyone's like, use this and it's so easy. And you're like, is it though? Yeah.

I mean, the same thing can be said for a lot of code, though, right? Like, so many npm modules and Python things I've tried where I'm like, this is broken, right? Like, what are the expectations for this thing to work? And I think that encapsulating that in a holistic container that can do some of that has been the solve for a lot of those things. But once I'm like, I need to go write some software around this thing, but I can't even run this thing... that is a problem. Yeah.

And when it comes to introspecting, it's, okay, so great. So I have this command which I run. Great, I'll figure out how to run it. Let's see, it's in the Makefile. So what is actually happening in this thing when it executes? How can I visualize the execution of the different steps? How do I know when a step runs or doesn't run? I mean, I used to love Make. I still have Makefiles around, which I still use to this day,

and they're great. Justfiles, same thing. Only recently, on the Kaizen, we talked about Justfiles and how Just makes more sense in the context of Changelog. That's great. You know, if you're happy with your Makefile, if you're happy with your Justfile, any automation that you have that works for you, keep it. It's a good thing. It's an asset. It's not a liability. But when that thing stops working, when you get frustrated, when you get all the issues with YAML and all the things that, you know, you've been maybe toiling away at for years and years,

When you will consider something better, have a look and see if that makes sense. And then discover the web UI, discover the traces, discover all the things which are available and which are getting better. Discover the shell. I haven't even talked about the shell. That's, by the way, that's like a hidden experimental feature that is coming in a future Dagger release.

That's been shipped silently for a while. But if you are interested, you can join the community calls and you can find out more. But enough about Dagger. We can talk about infrastructure if you want. I want to lean into all the other things you're talking about, right? Because, like, I have plenty of Makefiles. I've never loved Make. I've always thought it was arcane and hard to learn, in a very gatekeepy way, right? I was like, this is like,

someone learned it once 18 years ago and they're like, yeah, I wrote that make file. I have no idea what it does. And it's always been really hard to get back into it. You're like, this is all obscure old docs that aren't relevant anymore.

But it works, right? Like if it's the thing, if it keeps working, cool. It's probably a you problem if it doesn't work. It works somewhere. But you talk about like Dagger is a drop-in replacement for some of those things. But what are the other things like on the edges there where like, actually, what are we missing out on by not doing this, by not having a newer tool to be able to do that? You've mentioned OpenTelemetry. You've mentioned the shell. You've mentioned modules. Like those are all pieces, but...

It's hard to understand, like, why do I need those things? Or what can't I do today? Okay. So modules, as a category, it's a way to package the code that you wrote and share it with others.

Think about them like atomic pieces of code that go well together. For example, the Go module would encapsulate all the code for working with Go apps. There's something for Node.js, there's something for Helm, there's something even for Kubernetes. If you want to figure out how to run K3s inside of Dagger, that is possible. Many things like this are in the Daggerverse. Daggerverse.dev is the place to go to check what modules are available, what can I pick and choose. So those are the modules.

The open telemetry is how we capture what happens inside of a Dagger call when basically Dagger runs. And we are sending all that information to Dagger Cloud using these traces so that we can visualize what happens in your run and think of like the network graph.

So in the browser, if you were to open the network graph and you would see how long resources take to load, the spans, the traces, it's exactly the same concept. So that gives you a very deep insight into what happens in your automation. And you can see which are the steps which take a long time or which are the steps which, for example, don't cache well. All that information would be conveyed, in this case, in Dagger Cloud.

So the way to do that, you just basically connect your CLI to your Dagger Cloud account. You have to create one; it's basically free for individuals. For teams, it is a paid plan. But as an individual, you can try it to see what this looks like. And, you know, you can maybe bring your team members over your shoulder to look, or share your screen to see what it looks like. Everyone can do this. So that gives you an insight into and appreciation of all the things that happen in your automation.

And then the shell, the third thing, is a way to put yourself in a context where you're trying to discover what automation is available and how to stitch the different functions together. It's exactly how we'd use pipes, how you'd basically get functionality from different parts and try experimenting with it to see what makes sense. So what is the right arrangement for this, for example, pipeline?

that caches well, that works well, that the expensive steps happen first. And you can do that in the Dagger shell. So it's a way to interactively discover and work with your automation. And the perspective is the functions that are declared in Dagger. So that is the starting point.

I think that's really important because we don't have enough ops availability and actual insight into our automation. Automation is a great tool and it makes it where you can scale and take a lot of the human error out. But if you don't have a deep understanding of your automation, it makes it really hard to maintain and to scale it and just to use it in general. When it breaks, you're kind of out of luck. All these things put together, it's about...

an experience which is a bit more visual. It's an experience which is a little bit more curated, right? It's almost like, what would you do if you had to do automation properly? What would you do if you had to not reinvent, but I would say rethink how make files and how your Jenkins files and how your scripts should work in a container-first world. And containers are important because it's that immutable thing, right?

Because those are like having something immutable, having something that caches, having an actual layer is able to speed things up in a way that's difficult to do otherwise. Makefile is local, right? How do you distribute a makefile resolution? It has a DAG, but how do you distribute the DAG?

And that's something which today, for example, in Dagger, we're getting very close to that being possible. What do you mean by close? Like what's missing there? So how do you have a cache? Like we tried a couple of iterations and we know how this fails. How do you have a remote cache that you can safely store operations in at scale? So imagine every single step that runs, right? It has some inputs, it does a function, and then it has some outputs.

If you're able to cache those outputs, put them somewhere like a CDN or an object storage (either way, the object storage is what we have and what we had), then when a pipeline runs again, or the same call runs again, it doesn't matter where you call it from. As long as the engine is connected to this object store, it can retrieve the steps, right? It can retrieve the layers; it doesn't have to recompute them. Sometimes it's a lot more efficient to pull down these layers rather than recompute the operation.
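The input-hash-to-cached-output scheme being described can be sketched in a few lines of Go. This illustrates the general content-addressed caching idea only, not Dagger's actual implementation; the types and function names are invented for the example:

```go
package cache

import (
	"crypto/sha256"
	"encoding/hex"
)

// ObjectStore abstracts shared storage such as an S3-style bucket or a
// CDN-fronted object store that every engine can reach.
type ObjectStore interface {
	Get(key string) ([]byte, bool)
	Put(key string, val []byte)
}

// cacheKey derives a content-addressed key: the operation name plus a
// digest of its inputs. Any change to the source (an input) changes the
// key, which is exactly the invalidation behavior discussed above.
func cacheKey(op string, inputs ...[]byte) string {
	h := sha256.New()
	h.Write([]byte(op))
	for _, in := range inputs {
		h.Write(in)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// runCached returns a previously stored output when one exists, and
// recomputes and uploads it otherwise. The hard parts mentioned in the
// episode (races between writers, pruning terabytes, deciding when a
// pull is slower than recomputing) all hide behind this small surface.
func runCached(store ObjectStore, op string, in []byte, compute func([]byte) []byte) []byte {
	key := cacheKey(op, in)
	if out, ok := store.Get(key); ok {
		return out
	}
	out := compute(in)
	store.Put(key, out)
	return out
}
```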

How do you do that safely, at scale, in a way that is easy to use and easy to operate? That is the hard part. Caches should always be invisible, right? Like, the easiest cache is the one you don't know you're using. That's the local one. Locally, it works well. When you go distributed, that's when problems start. That's when you have race conditions. That's when you have pruning, for example. That's when you have all sorts of things that you have to deal with. When you're dealing with many terabytes, like hundreds of terabytes of this data, it becomes a hard problem.

And sometimes, depending on network conditions, it can be cheaper to recompute it locally. How much of that can you rely on, like, what Docker does for caching, right? Because I can have my Buildx cache somewhere, which is basically the same thing of, I get this layer that has a SHA and I can say, oh, I'm going to get the same SHA, just give me the data, right? Like, how much of that

is using what's under the hood, by relying on containers, versus building something new? So that makes sense when the inputs don't change that frequently. In this case, an input is source code, and source code churns a lot. So how can you still have a good cache hit ratio when your input is something that changes with every commit?

So how do you sequence your source code? How do you compartmentalize it so that you know which functions depend on which source code so that you get invalidations working properly and you don't bust the cache too often? And that is a hard problem.

What's up, nerds? I'm here with Kurt Mackey, co-founder and CEO of Fly. You know we love Fly. So, Kurt, I want to talk to you about the magic of the cloud. You have thoughts on this, right? Right. I think it's valuable to understand the magic behind a cloud because you can build better features for users, basically, if you understand that. You can do a lot of stuff, particularly now that people are doing LLM stuff, but you can do a lot of stuff if you get that and can be creative with it. So when you say clouds aren't magic because...

You're building a public cloud for developers and you go on to explain exactly how it works. What does that mean to you? In some ways, it means these all came from somewhere like there was a simpler time before clouds where we'd get a server at Rackshack and we'd SSH or Telnet into it even and put files somewhere and run the web servers ourselves to serve them up to users.

Clouds are not magic on top of that. They're just more complicated ways of doing those same things in a way that meets the needs of a lot of people instead of just one. One of the things I think that people miss out on, and a lot of this is actually because AWS and GCP have created such big black box abstractions.

Like Lambda is really black boxy. You can't like pick apart Lambda and see how it works from the outside. You have to sort of just use what's there. But the reality is like Lambda is not all that complicated. It's just a modern way to launch little VMs and serve some requests from them and let them like kind of pause and resume and free up like physical compute time.

The interesting thing about understanding how clouds work is it lets you build kind of features for your users you never would expect it. And our canonical version of this for us is that like when we looked at how we wanted to isolate user code, we decided to just expose this machines concept, which is a much lower level abstraction of Lambda that you could use to build Lambda on top of. And what machines are is just these VMs that

are designed to start really fast, are designed to stop and then restart really fast, are designed to suspend sort of like your laptop does when it closes and resume really fast when you tell them to. And what we found is that giving people those primitives actually, there's like new apps being built that couldn't be built before.

specifically because we went so low level and made such a minimal abstraction on top of generally like Linux kernel features. A lot of our platform is actually just exposing a nice UX around Linux kernel features, which I think is kind of interesting, but like you still need to understand what they're doing to get the most use out of them.

Very cool. Okay, so experience the magic of Fly and get told the secrets of Fly because that's what they want you to do. They want to share all the secrets behind the magic of the Fly cloud, the cloud for productive developers, the cloud for developers who ship. Learn more and get started for free at fly.io. Again, fly.io.

I was digging around in Daggerverse while you were talking about things. You have a shell SDK?

Is that like a thing? Yeah, so the Dagger shell, which I'm talking about... yes, the shell SDK used to be a thing. I mean, okay, so let me just unpack a little bit. One of the qualities of Dagger is that it exposes a GraphQL API that all SDKs talk to.

So the engine itself, which is where the work happens, I can think of it like the server, and the way to interact with it is via this GraphQL API. The SDKs, all they do, they are GraphQL clients that expose all the operations and all the resources from the GraphQL API

in a language-specific way. There's Java, by the way; there's a Java SDK. As with the shell in this case: if you're able to model the interactions with the GraphQL API through shell functions, it would work. One of the things that Dagger shipped a while ago was the ability to mix and match the SDKs, right? So I can use a module that's written in Go, but my app's in Python, and I can write my function on top of it in Python.
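To ground the "SDKs are just GraphQL clients" point, here is a minimal standalone sketch using the published Go SDK (`dagger.io/dagger`). Each chained call below builds up the same kind of GraphQL query that any other SDK, or a raw query against the API, would send; the image tag is illustrative:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"dagger.io/dagger"
)

func main() {
	ctx := context.Background()

	// Connect starts or attaches to a Dagger engine and returns a client
	// that is, underneath, a typed GraphQL client.
	client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Run one command in a pinned container and read its stdout.
	out, err := client.Container().
		From("alpine:3.20").
		WithExec([]string{"echo", "hello from the engine"}).
		Stdout(ctx)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```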

And that seems like an outcome of everything talking to the GraphQL API, right? - That's correct. - Everything's just like, okay. It's like, we all just talk to an API. I don't care how you got here. If you're doing curl commands, you can get to this GraphQL, do something, and then the next step is on you to write that in whatever language you want. - That's also awesome because it makes it accessible for multiple teams working at the same startup or enterprise that are writing in multiple languages. Because it gets to the point where enterprises want to just start making

everybody use two languages, and you're like, dude, this language does not work for the things that you want it to do. Like, I mean, it can, but it's not the right, you know... it just gets to the point where you're using a hammer for every project. And they want to optimize shareability, right? And they're just like, hey, by making everyone write Kotlin, then everything's going to go smoother. Don't lie. It's usually TypeScript. We all know. It's TypeScript.

But it goes to the two languages, right? Like everything front-end, that and maybe some back-end Kotlin or something like that. And it's just like, okay, everyone writes these languages now because we have to be able to share this stuff. And we have this DevOps team over in the corner and they're out there writing YAML. And they're like, well, I'm just going to plug in this YAML thing. It's like the team's like, no, I don't want that. But if the Kotlin team is writing a Dagger module and the TypeScript team wants to also use that module, they can because you just said our lowest common denominator is the APIs.

If you can call the API, I really don't care what language you call it in. We're going to give you the same functions. And every language has a way to call a WebSocket or some way to call an API somewhere that's external and say, here's data, give me back something. You're enabling the dev team to do it in the language they're more comfortable with, right? And you're allowing them to fit it into...

whatever they've already built, which means that you're not trying to completely... This is what I'm looking for. There's so many times where they're just like, go restructure all this code to fit this one API, and then you break six things. It doesn't work right. Any way that you can make your life easier without having to do a huge restructure is...

Enabling and forcing are kind of the same thing, right? Because not all dev teams want this. They're like, actually, I just liked it when the external team was responsible for the thing. And when it broke, I just sent them an email and I went to lunch, right? Like, that was how a lot of devs liked it. True, but in this time, when we have fewer resources and we're trying to do more with less, I don't know if people are going to always have a whole other team. You know what I mean? Like, think about the restructuring of enterprises, and just the fact that you have to work lean with a startup, right?

You may not have the option to have that whole other team. There's a reason why cloud services and managed products and SaaS products, people pay so much money for them: because it got rid of your DBAs. It got rid of parts of your ops teams, and you could have a smaller ops team. Because at the end of the day, people want us to do all the things with the least amount of resources.

I mean, and I would also say like, if you went to RDS and got rid of your DBAs, you made a mistake. Like that's not, it's not necessarily the thing you thought it was going to be. I also think that these things evolve, right? So like, instead of needing a DBA, now a lot of people need data architects, right? So like maybe,

Maybe you don't need somebody running around a data center and doing the DBA in the traditional sense. But just because you got rid of a DBA, you need a data architect to now tell you how to do your access patterns and how to optimize your database. But one is easier to contract and one you need every day. So they're all trade-offs, right? It's all what works for your business and what works for your use case.

But I think the problem that we have is that people don't know enough about the differences, and we just tell them it's this new shiny thing. So they're not enabled with the right information to make those decisions. They both have value, but what has the most value for what you're trying to do? You know? A lot of that also, at least in my experience, has been people that were

going after promotions. You know, like, hey, guess what? I'm going to rewrite all of our makefiles in Dagger, and it's going to have impact on the business. And someone else over here is like, but why didn't you learn why we had the makefiles in the first place? Because that's hard, right? That side of the business is a lot harder and also never gets you a promotion. "I understand make now" has never been on the promo doc for any engineer.

I would not recommend that, by the way. If you have a makefile, what I would say is: don't rewrite it in Dagger. Try running it in Dagger first. That would be a much smarter first step.
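A minimal sketch of what "run it in Dagger first" could look like with the Go SDK. The base image, the apk packages, and the `build` target are assumptions about a hypothetical makefile, not a prescription:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"dagger.io/dagger"
)

func main() {
	ctx := context.Background()
	client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Wrap the existing makefile instead of rewriting it: same targets,
	// but now they run in a container, so they behave the same locally and in CI.
	out, err := client.Container().
		From("alpine:3.20").                                          // assumed base image
		WithExec([]string{"apk", "add", "--no-cache", "make", "go"}). // whatever your targets need
		WithDirectory("/src", client.Host().Directory(".")).
		WithWorkdir("/src").
		WithExec([]string{"make", "build"}). // assumes your makefile has a `build` target
		Stdout(ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Once that runs the same on a laptop and in CI, you can decide target by target whether a rewrite is even worth it.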

And always focus on documentation first. Ship It episode 44. Can you say that louder? Because I think people think that automation means we no longer have to document things, and it hurts my entire soul so badly. Automation is not documentation. I'm going to emphasize this in a couple of ways. First of all, Ship It episode 44 was a very important moment in my career,

when we sat down with Kelsey and went through all the things that are important for understanding how complex systems work. Documentation is the first step, the thing you should do before you touch, before you even think about automation. Because when you document, you realize all the inefficiencies, and you realize that if you were to change your automation at any point,

coming back to what you were mentioning, Justin, about rewriting the makefile in Dagger: what you really want is the blueprint for the makefile. And the blueprint is the document that you don't have. I was one of those people. I remember Daniil Fedotov, I'll never forget him. We were on the RabbitMQ team together,

and he was asking me for the documentation. I said, "Hey, Daniil, you don't need documentation. I wrote this beautiful Makefile. It does everything for you. It's self-documenting. It tells you what the targets are. What do you mean, where's the documentation? This is the documentation." And it took me many years to understand

how wrong I was in that moment. And I put it right. When I joined Dagger, one of the first things I did was make sure that the releasing process is documented. How we release Dagger started with a document, and that document has been updated almost every other week for the last three years.

And that is the blueprint which we use for all the automation at Dagger. That makes my heart so happy. RELEASING.md. So I've learned my lesson, and I hope that you will too. Dear listener: write the documentation first. Keep it up to date. Keep refining it. Keep working on it. Keep sharing it with the team. It's never done.

And then add your automation. Did you guys hear him? Replay that. Yes. Because it blows my mind. So you know how you were saying people want to rewrite everything for a promotion? Or people are like, we've been doing it in Bash the same way for 20 years and we're never going to change anything, and there are so many new ways to do it, but let's just do it this way forever. And you're like...

Not that, like, I do think if it's really simple, Bash sometimes is just the way to go, right? But never thinking about how you can onboard new people and share knowledge and make it easier for everybody...

You're just doing the same thing forever. And comments aren't docs either, right? That's the thing with a lot of developers: well, it has a bunch of comments. No, no, no. You pull those out somewhere else. You make them searchable, for someone who isn't in the code to be able to find. I've had people tell me not to put comments in, though, because the script is self-explanatory. They'll say the automation is self-explanatory if you write it cleanly enough. And I'm like, no, it's not.

I know now. When you're new and you've never seen this before, it's not self-explanatory. Every job I've worked at, the docs I've written have lasted longer than the code I've written. If you want to have a lasting impact, the docs are the thing that's going to last. But we don't incentivize docs. We don't reward them. Yeah, that's what I'm saying. And it doesn't make sense, because what is the number one rule in school, or wherever you learn to code? What do they always say?

Write it down, plan it out, and then you start coding. But I don't know where we missed that with automation. We were just like, oh, that's only for code, not for scripting or anything else that's important, like release processes. Dude, release processes and scripting and doing the infrastructure usually take longer than writing the code that you want to release. So why would you not do your due diligence? Well, let's be honest, the meetings about writing the code take longer. Yes.

But that's what just kills me, when they're like, engineers just need to be able to write really good code and be technical. And I'm like, that is such a small part of being a good engineer. A lot of people that are afraid of AI being able to write code are only looking at a small part of what engineers do. Those are the parts that we incentivize and reward: be this 10x developer who can write and build things super fast, but...

But think about it. Those are the people that job-hop every two years, and somebody else has to go fix the 10x developer stuff that got duct-taped together. Nobody knows what it does. Nobody's documented it. And you're ruining someone else's on-call life. Autumn, you can just drop names if you want. I was a little triggered for a minute there.

I'm so tired of tech bros. So where do you think this goes, Gerhard? What is the end goal here for something like Dagger? I feel like Dagger is productized make, right? It's like if I erased everything and said, what if this was a really good product,

and we gave people a good experience and good tools and good UI around this thing that has always been kind of opaque and weird, that only one or two people on the team understand even remotely? Let's make it easier for everyone to understand. Let's make it something that's maintainable, shareable, scalable.

What's the end goal there? What's the point where you say, oh, if we get here, we've made it? You just made me think of the difference between VMs and the cloud. Dagger is, like, the makefile for making things. Well, remember what containers did for applications. That is the moment that I envisage for all the scripts that we write, that moment for all the automation that you write:

the container moment for all those things, where we agree to put these things in containers first, in a way that's immutable, in a way that's content-addressable. And if we do that, if this grows to a certain point, we are no longer writing automation. The automation knows how to consume the resources that others have built

without us having to go and figure out how to plug them together. For example, how many times have you gone to download a package, a tarball, unzipped it, checked the SHA-256 to make sure it's okay, read the changelog, read what changed, figured out what's been deprecated? All of that, times a thousand.

In your career, you will have done this a thousand times at least. You're giving me PTSD flashbacks. Is there a better way that I can consume these things? I can run it. I can validate that my software works. Whatever I'm trying to do, it combines well with all these things, in a way that is just friendlier.
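For anyone who has mercifully forgotten, that ritual looks something like the sketch below; the URL and checksum are placeholders, not a real release:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
)

// verify downloads a tarball and checks it against a published SHA-256:
// the manual step that precedes the changelog reading and the deprecation hunt.
func verify(url, wantHex string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	h := sha256.New()
	if _, err := io.Copy(h, resp.Body); err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

func main() {
	// Placeholder values; substitute a real release URL and its checksum.
	fmt.Println(verify("https://example.com/tool-1.2.3.tar.gz", "deadbeef..."))
}
```

Content-addressable automation makes this whole dance the engine's job rather than yours.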

You get a whole new experience of consuming software, of building, of distributing, of doing anything that you have to do with your source code before it gets out in front of people. Encapsulating all that knowledge in a way that is also documented, right? Because, coming back to the documentation: maybe you don't document things, but maybe you're willing to document your arguments. Maybe you're willing to document your function,

and explain how this function should be used and how it can be interacted with. And your function is a very small piece the world now gets to use, because you're passionate about Java and you know how to build Java apps and package them as containers as well. Can you just go on a speaking tour about documentation and automation? Because I feel like you could fix the world. One doc at a time. One doc at a time, yes. Do you feel like there are any mistakes we're going to make again?

I feel like, as much as containers have done well, there are a lot of things containers haven't done well for application packaging and sharing. Do you feel like there's something in there where maybe this,

in general, not Dagger specifically, but maybe this process of writing code, doing a bunch of stuff to it, and then spitting it out, packaging it, and sending it somewhere else, maybe that's the problem? I think the problem is that the ambition is too big and we don't get to capture it in practice well enough. If we disconnect too much from reality, if we talk about these hypotheticals without having something real to back them up,

everything will just not go where it needs to go, because there's too much air in the balloon and it will pop. And the balloon is just air. That's it. That is my fear. I think with every technology you tend to get that.

Also, you start getting competition: players that maybe dominate the market but maybe aren't the right solution for the problem. But they're the 800-pound gorilla in the ring, and everyone gravitates towards them because they have a monopoly on the market. Everyone talks about that. And then you get some huge acquisitions, and then everything goes sideways.

So I think there is a real risk of not keeping it rooted in reality, going too far out in hypotheticals. And...

Maybe not dealing enough with the little paper cuts, because there are a lot of paper cuts, and we try addressing them. So many paper cuts. Yeah. Now I'm talking about Dagger specifically, but you're right, even in general, in the software industry there's so much stuff that is broken. And I think you need some of that, because it is a form of technical debt, right? You need to innovate at the same time. But if it's too broken, that's a problem.

So one of the challenges that we have, or something which I've taken upon myself (if you want to read more about it, you can go to Dagger issue 8184, and guess how I know it off the top of my head? Because I've been on it for months now), is how Dagger uses Dagger at scale. What does that mean? The scale that I'm talking about: when our pipeline runs, we are spinning up

maybe 10 to 15 large instances on EC2 that sum up to about 500 CPUs to run all the pipelines for Dagger. If you have five or six pull requests happening at the same time, you have thousands of CPUs being spun up to test various things in Dagger. That costs real money. That is a hard problem. The less reliable your pipeline is and the longer it takes to run, the higher the pain,

the higher the waste. And that is a hard problem. Every single team will have this at some point if they're successful. We are getting there. We are looking at our AWS bill and we're thinking, wow, this is expensive, really, really expensive. What can we do? There are certain ways in which we use Dagger where we need to bust the cache, where we need to do certain things, at which point the question comes up: are containers the right way of doing this?

What is the overhead of using overlayfs? What is the overhead on the disks? What is the overhead on the network of pulling all these bits down? What is the overhead of having to recompute the same thing when you don't have a distributed cache? So there are some hard problems there. They're fun to solve. And this can either be a great success or the biggest lesson of my life, and I will accept both with equal joy in my heart.

I think you make a point there, too. I did a calculation for my local developer desktop, which is an AMD Threadripper, versus something comparable at Amazon,

and the machine that I built out of parts cost around 600-something dollars, and to get something comparable on AWS was 101 times more expensive. Not 10 times. And this is consumer grade, not ECC RAM, it doesn't have all that stuff; I don't need that for some of these things. I'm doing local development and testing. I don't need ECC. I don't need all this stuff. And when you have something as portable as Dagger, it makes a lot more sense

to say, hey, maybe a couple of these PCs, without all of the features we don't actually need, could save us a lot of money. And even if they're not fully used, because people say, oh, it's wasted, right? You're not using it all the time. It doesn't actually matter when it's a hundred times cheaper.
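A back-of-the-envelope version of that trade-off, sketched with illustrative numbers (the prices and lifetime are assumptions, not Justin's actual figures):

```go
package main

import "fmt"

func main() {
	// Illustrative assumptions:
	const (
		localCost   = 600.0          // one-time cost of the consumer-grade box, USD
		cloudHourly = 2.0            // assumed rate for a comparable cloud instance, USD/hour
		lifetime    = 3 * 365 * 24.0 // assumed three-year useful life, in hours
	)

	// Utilization at which the local machine has paid for itself.
	breakEven := localCost / (cloudHourly * lifetime)
	fmt.Printf("break-even utilization: %.1f%%\n", breakEven*100) // prints ~1.1%
}
```

At those assumed rates, using the box about one percent of the time already pays for it, which is the math behind the next line.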

I could use it one one-hundredth of the time and still save money on it. That math works out. So you start to feel like, oh, maybe a pile of build machines sometimes does make sense. I think that's fascinating. Sometimes it does make sense, but sometimes you also have to remember that

if you're using it, and you're not just playing around with it, and you need it to be reliable, you have to replace it, you have to maintain it, and you have to have people with the knowledge to do it. I mean, you have Terraform code and all the other stuff in the cloud, too; all that stuff needs to be maintained. So it's not free either way. They're not free either way, but

I think that we went so hard on the cloud. We went all in, and it was the answer for everything. But now our attitude is automatically that we don't need the cloud and everything can be done on-prem. And neither of those is correct. There is a use case and there is a time for both, and you have to figure out what works best and what is most cost-efficient. And sometimes one works for a while, and then you grow too fast and you need to do the other thing.

But I do think that it's a disservice to pretend like one size fits all and we can do everything on-prem. We can't. No, yeah, absolutely. If that was the case, we'd still be running servers in our grandma's basement. You know what I mean? I still am. It's your basement now. But...

But you know what I mean? What you use for fun and what you use to maintain, I don't know, a huge enterprise are two completely different things. And it's funny, because there are so many companies talking about how they run on-prem right now, but they're essentially building their own cloud that someone else is maintaining. It's a little bit cheaper, but they're not really, truly running on-prem.

They're sending their hardware to someone else to run. So it's like another iteration of the cloud, almost. You know what I mean? It's not all AWS, and it's not all computers in my garage. There's this mix: oh, there are colos, I can rent some things temporarily, I can get bare-metal places. That's what I'm saying. It's almost an iteration of the cloud and hardware and on-prem. And we're in this weird place where, because everybody is

trying to save money and figure out how to do things differently, people are almost scrambling. It's going to be interesting to see what actually saves people money and time and is good for their use case, or whether they end up down a whole other rabbit hole. But I think

it's a disservice to pretend they are all the same thing and that you're not essentially using a different type of cloud. You're just using a private cloud. If we remove the word cloud, it's just VMs. That's what I'm saying. They're all VMs. And so my point was that Dagger is a new tool here, because the downside of having

on-prem build machines, which I've had in plenty of jobs, is always the maintenance cost: oh, someone tweaked something and now I don't know what broke, right? But Dagger's encapsulation of these jobs says, hey, actually, this should run anywhere, as long as the engine exists.

And now we can encapsulate that holistically and not rely as much on the base OS or its version. And even so, as we said at the start of the conversation, the kernel matters in a lot of these cases, too. So there are some things that aren't generically possible across the board, and there are still some maintenance costs there. But for the most part, it

should run more portably than my makefile that is calling bash scripts. That side of it is like, oh, now I have the flexibility to decide: where do I want to run this? How should I run this? Where is cheap compute available? I could do spot instances to lower the price, or if I have some extra computers in the closet, I could just plug them in and use those, too. There are options there because of the portability that you're

building into the tool, and I don't lose out on the benefits I might get from something like, hey, I'm running a VM in AWS, I need logs from it, oh, cool, those are just in CloudWatch Logs. In this case, no, we're all talking to an API. You get the tracing data

by default by using Dagger Cloud; you get the logs, you get the stuff out of it. That's why I think there's so much gravity around GitHub Actions: GitHub Actions gives you a bunch of these defaults, but it's not portable. It's not something that is easy to move around when you decide, I don't want to run this here anymore. I think the startups that become the next big tech companies are the ones that are really going to benefit from this weird sunken place of tech that we're in, where everyone's trying to

make as much money with the least amount of what they have. The startups that allow you to lift and shift, to be portable, and to

monitor all your different clouds, all your different instances, everything you're running, whatever gives you that portability and that observability into all the different things. Because everybody wants to do hybrid cloud, with all the different regulations coming, or the fact that one cloud, or one way of doing it, or on-premise is now cheaper, and they want to lift off of this thing and go to that thing. All the things that are going to make software more portable and easier to switch to

are going to be what really expands and hits right now. And that's where I think Dagger fits perfectly:

We could say we can run this anywhere. The portability of... I can't write code just anywhere; I need a developer machine, something that has some tooling locally. I've tried all the web-based development tools, and I've never stuck with any of them, because they always had some limitation for me or they ended up being really expensive. And I'm like, actually, I already paid for this computer. I could use the computer and pay myself the maintenance cost of making sure this thing runs.

And on the other end, well, we have these containers, we can package them; we have this Kubernetes thing, which can run everywhere. At that end we have the portability, too. But in the middle, between writing the code and deploying the code somewhere, in that build pipeline, the CI/CD stuff was always super sticky.

It was super hard to say, I can shift this somewhere. I can't lift and shift Jenkins. I can, but it's a lot of work; it's not actually just an endpoint, and you have to reproduce the same thing in different environments. And I think that's what's really cool about what you mentioned at the beginning: what containers did for applications, Dagger could do for that CI/CD middle pipeline. This is portable, and we have some options here.

Also, when you're scaling release infrastructure, what used to work can definitely get to the point where you have too many pipelines and you've grown too much, and then having to lift and shift that is the most painful thing

in the whole world. I mean, that's not a problem every company has. Amazon had that problem; that was definitely an Amazon thing, with a lot of pipelines to make things safe to roll out and all that. But not everyone requires that amount. Also, when you're building for different architectures, being able to lift and shift to different things matters, because what might work on Mac might not work on Windows, you know what I mean? So being able to see if this really does work

natively on that architecture is really important. Yeah. One thing that we talked about previously, which I thought was a really important point, was around what makes a good developer, or, to go further, what makes a good engineer, a good software engineer. It is not the documentation, even though that plays a very important role. It is not the code that they write, even though that has to be present.

It is not how they present and how they talk, but again, that has to be there: how they share ideas, how they show up in the world and do whatever they have to do.

Those are well-rounded individuals. And that well-roundedness, I can see how it translates. Whether it's the cloud, whether it's on-prem, whatever the next thing is, you need to be well-rounded in all these things. And you should be using all of them, because if you don't use it, you lose it. Very simple, very true, and it will never change. So you should be using the cloud, but you should be using the cloud the way the cloud was meant to be used.

And you, Justin, made a very good point. How you were using the cloud at Disney is possibly the best example of using the cloud correctly. Right? Fan out, stand up all these instances, and then tear them down. That's how it's supposed to be used. It's capacity on demand, massive capacity. You would not want to buy that, but when you need it, it's there. And it's a great commodity.

It is 100 times more expensive because that's what this commodity costs you. Where else can you turn up and say, hey, spin up 100 beefy instances? That wouldn't take months to source otherwise; it would take maybe years. And by the way, no, I don't want that AMD CPU, I want that Intel CPU. Oh, ARM? I think I'm going to have some ARM. And by the way, I'm going to have, I don't know,

10,000 ARM CPUs. Where can you get that? It's the convenience of it. Now, if you know what you need, as a business that is successful, as a startup that's successful, then for the things that you know you definitely need, your baseline operating budget in terms of infrastructure, have that in a place that makes sense. Have that cheap, or, I should say, cost-efficient. Do what makes sense. Hybrid cloud. What about Wasm?

What about the Wasm runtimes? How do they change how we see containers and how we work with containers? Is that coming? Well, I can tell you that React did not work for Dagger Cloud. React, the DOM, generating all the things that we had to generate, was just breaking down. The tech was not up to scratch. So what did we do? We looked at Wasm for v3.

That's the v3 cloud. We went through three rewrites to get it to a point where it's as performant as it should be. And there are so many dragons there. Don't think that this is the holy place where you pick it up and everything is awesome. Oh, no. The cycle starts again. But that's it. That's the beauty of it. Start those cycles. Keep going through them. Keep learning. Keep iterating. Eventually, you will be able to consider yourself well-rounded.

And others will look at you and say, wow, I wish he responded to my pull requests. I wish he left some comments and did some reviews because I like his reviews. I wish he blogged more. Or I wish he went and gave some talks because she's an excellent presenter.

So look for well-rounded, not for 10x. I think that's a great place to end the episode. And Gerhard, again, thank you so much for starting this podcast, for putting it out there as a place where people can learn, get access to someone they may not otherwise be able to approach and have a conversation with, and learn from their insights: all the cycles and failures they've had in the past, the things they've learned, and the things they want to share about that.

That's what we are trying to continue. Not just having cool people on with exciting conversations, which we do like, but really being able to help people be a little more well-rounded: actually, that database is important; actually, the CI/CD is important; and all of the things that go around the code, and the business responsible for it,

matter. How do we help them be more responsible, and help the people that run them understand them? So thank you so much. And I truly think people don't talk about that part enough.

That's the information that people need. Because we don't reward it, right? But it's so true, though. You can't have any of these cool things without the database, without the infrastructure, without all of these things. So it brings me nerdy joy to expose people to things that they may not have been exposed to, and to have people who do that every day tell them

the caveats, what they tried, what they hated, where they failed, and what they succeeded at, you know? Where should people find you online if they want to reach out and continue the conversation? Gerhard.io. That's a good place. And there's this new space, which now has a domain: makeitwork.fm and makeitwork.tv.

The way I think of it is movies for nerds. Come to Bluesky so we can bug you. I am there, but I have to start using it, because there are too many social places and so many things to do. That's, I think, the one area in which I'm not as well-rounded as I could be:

being more present on social media. I think Bluesky is becoming where everybody seems to be going. I'm really hoping that it sticks, because I can't manage all these social media platforms. Yeah, there's a lot. I'm hoping Bluesky succeeds, if only for the fact

that it will hopefully encourage people to make their own websites. I think the true mission and purpose of Bluesky is to make the internet more democratized, so people understand: I should be able to own and run this data wherever I want, publish my own content, and own a domain. That stuff is super important to me. And it's not just another centralization of,

hey, we're the new search engine, we're the new social media thing, everyone come here. No, actually, how everyone interacts together and builds on top of what other people are doing, in a democratized way, I think is very critical for the internet to succeed long-term. I like that it has the ease of other social media, but it does, like you said, force you to...

But not just force you; you get to keep what you're investing in, your data, your content. You can't see it, but I'm running a PDS on my Raspberry Pi back here behind me. So I have a user that's on Bluesky that's literally served from back here. I already made a video about it. It'll be posted soon; I'm waiting for Bluesky to cool down. Come get your nephews and teach them how to run stuff on a Raspberry Pi, because there are too many things to do, I don't have time to figure this out, and they keep bugging me. Come get them.

We will have someone from Bluesky infrastructure on the podcast soon. They're dealing with all sorts of scaling, a million people every day for the past few days. So I'm waiting to post my video, because I was talking to them, like, hey, would this be a problem for you if everyone started doing this? And they're like, yeah, actually, please wait, just wait a week and we'll be fine. But yeah, we're going to have them on the show soon, because I'm fascinated to learn about their scaling journey over the last couple of weeks and months.

They've done a great job. And they've also done a great job on-prem. Yeah. Question for you, Justin: would the PDS work in Kubernetes?

Yeah. I mean, it's just a web server with a SQLite database. That's all it is. It's like a Git tree in SQLite. If you can publish that somewhere, that's one of the things I think is really cool about Bluesky's self-owned sort of federation versus Mastodon, where the Mastodon scale-up story was a lot more difficult: you need Postgres, you have a Ruby on Rails app, you need a cache.

It's not that I hate Mastodon; I just can't commit to that type of overhead for social media. I have so many things to do. And the PDS, they actually deploy it in a container. If you have a Debian machine, it's already packaged up in a container. You just want some way to have reliable storage underneath, so that your SQLite database is always available and backed up.

But otherwise, yeah, it's fascinating. I've already run a few of them. I did a livestream a little while ago, but I'm still toying with it and figuring out: how does this thing work? How would someone want to own this going forward? Because more things are being built on this sort of protocol, and on

personal storage versus the app-tier scraper sort of thing, where the app tier is the search engine and I have a website, and how do those interact? Speaking of that, I need to go fix my website. Why can't you just fix my GitHub Pages? Because you obviously enjoy messing with DNS more than I do. I love DNS. I hate it so much. I hate it so much.

Thank you, everyone, for listening. Please reach out to us online if you have suggestions for a show title that we can continue with going forward. We will hopefully have something for you in a couple of weeks on where you can find us next year, in 2025. And we will talk to you all again soon. I feel like Gerhard has to come back for our new podcast. Oh yeah, I'd be happy to. Good luck. It was so nice meeting you. You too. I think you've done an excellent job

with these episodes. I think it was a very nice transition, and it's great that you were able to carry the torch and continue the spirit of Ship It. I think people appreciated that, and I'm looking forward to what you'll do next. I'm really excited. I feel like this can be the new iteration of Ship It. It's a little scary, though. Thank you. See you next time.

Thanks for listening to Ship It with Justin Garrison and Autumn Nash. If you haven't checked out our Changelog newsletter, do yourself a favor and head to changelog.com/news. There you'll find 29 reasons, yes, 29 reasons, why you should subscribe. I'll tell you reason number 17: you might actually start looking forward to Mondays.

Sounds like somebody's got a case of the Mondays. 28 more reasons are waiting for you at changelog.com/news. Thanks again to our partners at Fly.io. Over 3 million apps have launched on Fly, and you can too, in five minutes or less. Learn more at Fly.io.

And thanks, of course, to our beat freak in residence. We couldn't bump the best beats in the biz without Breakmaster Cylinder. That's all for now, but come back next week, when we continue discussing everything that happens after git push.