DOP 283: OpenTelemetry Meets Mobile

2024/10/2

DevOps Paradox

People

Austin Emmons

Darren Pope

Viktor Farcic

Topics

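Darren Pope discussed how OpenTelemetry transmits data in mobile applications, noting that it uses a push model, and emphasized the challenge of handling data from millions of devices. He recalled the limitations of early mobile analytics platforms and welcomed OpenTelemetry's standardization. Viktor Farcic focused on how OpenTelemetry collects and processes data in mobile applications and on the challenges of handling data from millions of devices. He raised questions about data storage, upload timing, and retry mechanisms, and discussed Apple's privacy restrictions. Austin Emmons described in detail how Embrace uses the standard OpenTelemetry format to collect mobile application data, explained why Embrace chose OpenTelemetry, and covered how it handles constraints specific to mobile, such as the application sandbox and system limits, as well as network conditions and retries. He also discussed combining automatic and manual instrumentation, integrating with various backend systems, and why metrics were left for last. He shared his experience using swizzling for automatic instrumentation on iOS and handling system events such as low power mode and low memory warnings. He also discussed the mobile release process and how canary deployments can be done. Finally, he explained the importance of actively participating in the OpenTelemetry SIGs, as well as Embrace's business and open source strategy. Darren Pope focused on the challenges of transmitting and interpreting data from mobile applications with OpenTelemetry, and welcomed OpenTelemetry's standardization and cross-platform reach. He raised questions about data volume and how data is processed, and offered new ways of thinking about mobile monitoring. Viktor Farcic focused on what makes mobile monitoring distinctive and how it differs from backend monitoring. He raised questions about data collection, processing, and storage, as well as the application release process, and said he looks forward to where OpenTelemetry on mobile is headed. Austin Emmons explained in detail how Embrace uses OpenTelemetry to build a mobile observability platform and how it addresses the challenges of mobile monitoring. He described the features, architecture, and usage of the Embrace SDK, and how it integrates with various backend systems. He also shared his experience participating in the OpenTelemetry SIGs and his outlook on the future of mobile monitoring.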

Deep Dive

Chapters

This chapter explores the challenges and opportunities of applying OpenTelemetry to mobile application monitoring. It highlights the shift from server-centric observability to the complexities of monitoring millions of diverse mobile devices.

OpenTelemetry extends beyond traditional server observability.
Challenges include data interpretation from millions of devices and determining optimal data upload times.

Transcript

You have those existing pillars of logs, traces, and metrics, but then it becomes, okay, when is the right time to push this up? And then how do we interpret this data when it's coming from hundreds of thousands or millions of devices and not just a small subset of servers that are sitting on some server rack? This is DevOps Paradox, episode number 283, OpenTelemetry Meets Mobile.

Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darren and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make us sound like we know what we're doing.

Occasionally, we invite guests who do know something, but we do not do that often since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. Yes, it's Darren reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darren Pope and Viktor Farcic. Viktor, we've sung the praises of OpenTelemetry now for, what, maybe a year, year and a half? Yeah.

Yeah. I mean, it put everybody in line, finally. It's not so much praise. It's not about being better or worse. It's more like being a standard. But we spent all that time saying, okay, this is great for the Datadogs and the New Relics and the Dynatraces. I'm trying not to forget any of them as I'm listing them out. I think it's the other way around.

You think so? And this is my theory. I cannot prove this, right? Datadog already had its agents and formats and stuff like that, right? They gained nothing from OpenTelemetry. If anything, they could have lost because they already have their own investment IP before it, right?

Who gained more? Their end users. Oh, okay, so now I don't need to instrument an application in a certain way for a certain observability tool and then wonder why I can never switch. But one of those platforms we haven't talked about at all is mobile. And on today's show, we have Austin Emmons from Embrace. Austin, how are you doing? I'm doing great. How are you guys doing?

Very well, thank you. Who would have thought that you could actually use OpenTelemetry with mobile? It never crossed my mind. I'm so confused. I cannot even imagine how that would work on mobile. My head can wrap itself around, you know, oh, those are my servers, right? I know where things are running. I know how to get stuff from the things. Mobile, that's a mystery.

It's definitely been a long time coming. I think we've seen the progress on the back end. And now is the time that somebody has to make an attempt. And the community is really making the attempt, which is great. We're seeing that thrust in the last six months. There's just been growth in the client-side space. And naturally, when you talk about client side, then the mobile space as well.

How does that work? I'm guessing now, and correct me if I'm really not informed in that area. It must be a push type, right? Because you don't know where to pull things from. And you probably couldn't even if you would. It's pushing. So the main logic is really that export. The data model stays the same with OpenTelemetry. And you have those existing pillars of logs, traces, and metrics.

But then it becomes, okay, when is the right time to push this up? And then how do we interpret this data when it's coming from hundreds of thousands or millions of devices and not just a small subset of servers that are sitting on some server rack? Millions of devices. Okay, I was already getting my head just around the mobile, but I hadn't actually factored in the millions of devices. Back when I was doing mobile development back in 2007, 2008, there were analytics platforms.

And we would implement whatever the library was. And again, going back to what Viktor was saying earlier, Datadog had their way of doing things. New Relic had their way of doing things. In mobile, there were the analytics platforms that had their way of doing things. But now with OpenTelemetry, it should be straightforward. I mean, with the Embrace project, it is an open source project. Are you going just after the standard metrics, logs, and traces? What are you doing?

That's exactly it. We had proprietary formats for the data we collected beforehand. And we realized, well, every time we add a new type of telemetry, some new instrumentation, we were starting from ground zero and saying, okay, how is this coming across the wire? What is the envelope that this should go in? And we optimized that as best as possible. But then we realized, well, let's just hold on a sec.

We can just throw this into a span and measure that duration of time and start capturing these, grouping them together and batching them up as we send them. Or if it's a single point in time, maybe we decide, okay, a log fits better for whatever this is and we'll write it out as a log. And then when we want, send that log up when the device is ready. And that is a little more difficult than just the phrase there, when the device is ready. It's like finding the right time to...

stay out of the way of the application that you're monitoring, but also provide feedback as quickly as possible to those application developers that want to look at sessions almost as they're ongoing. Are there some limitations that are out of your control? I'm inventing now, does Apple allow, iPhone allow you to send data or you need to go through some hoops or how does that part work? Because you don't control the hardware, right?

Absolutely. And that's the biggest thing is that you do not control the hardware. So the limitations that Apple provides, there's just sometimes the application itself, we are an SDK. And so if you're an observability provider, you're likely an SDK that sits within that application sandbox on the iOS platform and you are...

Part of that application, which means if the app is in the background and nothing is happening for a long enough time, the system might just say, okay, you're done. We're just going to kill your app for a sec or sleep it, suspend it, and maybe bring you back the next time the user is coming in. Well, obviously bring you back the next time the user opens the app, but you might come back into...

a fresh state, or you might come back into some dirty state, depending on how far into your sleep you've gotten, I guess. And so there are definitely constraints that you need to work around. There's just a lot of recovery. It's okay, where in this process am I? And do I have data that I haven't yet uploaded that I need to export and get up to the OTLP collector? And when I do that, making sure that

that happens as robustly as possible. So if I need to retry anything, if any of those uploads failed for any reasons, there's absolutely... And then just the physical constraints on the network, where you live or where you are, is this a Wi-Fi connection? Do you have to batch these smaller than you might when you're developing? Because your development environment is as clean as it comes sitting on Wi-Fi with...

maybe you're on a VPN or something and everything's a lot more stable than what's out there in the real world. Victor went down that point of trying to figure out, okay, what happens when, what happens if somebody does a force kill? It feels like to me, I haven't gone through the code. It feels like you're probably writing to a SQLite database and then doing some sort of transaction log off of that, which then you can determine resends or deletes. Does that sound about right?

Yeah, pretty much. The SQLite database and then...

When we're ready, so we determine that a user session is important because this is a client-side application. And so we will batch things into what we call a user session. And we've determined that to be when the application is foregrounded or backgrounded. And we have foreground sessions and background sessions. And so at the end of that session, we will kind of take that raw data that's stored in SQLite as a span record and push it into kind of more of a

blobby payload record, and then we can cache that payload record for it to be ready to be uploaded. And that's kind of the queue to offload from the device. So if I'm an iOS or Android developer today, and it looks like you also support React Native, Flutter, and Unity. Wow. Quite a big span there. I'm assuming it's just another library to me.
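The storage flow described above (raw span records batched into a payload at the end of a session, cached locally, and retried until the upload succeeds) can be sketched roughly as follows. This is a minimal Python illustration using only the standard library, not Embrace's implementation: the `UploadQueue` class, the table layout, and the `max_attempts` cutoff are all assumptions made for the example.

```python
import json
import sqlite3


class UploadQueue:
    """Hypothetical on-device queue: payloads stay in SQLite until uploaded."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS payloads ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "body TEXT NOT NULL, "
            "attempts INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def enqueue(self, spans):
        # At the end of a session, bundle the raw span records into one payload.
        self.db.execute(
            "INSERT INTO payloads (body) VALUES (?)", (json.dumps(spans),)
        )
        self.db.commit()

    def pending(self):
        # Number of payloads still waiting to leave the device.
        return self.db.execute("SELECT COUNT(*) FROM payloads").fetchone()[0]

    def flush(self, send, max_attempts=5):
        """Try to upload every cached payload; return how many succeeded.

        `send` is any callable that takes the decoded payload and returns
        True on success. A failed payload stays queued for the next flush
        until it has failed `max_attempts` times, then it is dropped
        (an assumed policy, chosen to keep the example bounded).
        """
        uploaded = 0
        rows = self.db.execute(
            "SELECT id, body, attempts FROM payloads ORDER BY id"
        ).fetchall()
        for row_id, body, attempts in rows:
            try:
                ok = bool(send(json.loads(body)))
            except OSError:
                # Network-style failures count as a failed attempt.
                ok = False
            if ok:
                self.db.execute("DELETE FROM payloads WHERE id = ?", (row_id,))
                uploaded += 1
            elif attempts + 1 >= max_attempts:
                self.db.execute("DELETE FROM payloads WHERE id = ?", (row_id,))
            else:
                self.db.execute(
                    "UPDATE payloads SET attempts = attempts + 1 WHERE id = ?",
                    (row_id,),
                )
        self.db.commit()
        return uploaded
```

Because the queue lives in a database file rather than in memory, a payload that was never uploaded before the app was suspended or killed is still there on the next launch, which is the recovery property the conversation is getting at.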

And I just have to decide what data I want to send. Well, not necessarily send. What data I want to, I'm going to air quote, log. And then how I've called that SDK determines how it's going to actually look once it gets shipped back, whether it's a log, a metric, or a trace, right?

Correct. So hopefully it is just another dependency and you're familiar with pulling in a new package with the Swift package manager or pulling in a dynamic framework if you want to build it from source yourself. And then from there, it's just interacting, you know, enabling certain features, maybe customizing what you want to capture.

and what instrumentation you would like to be automatic. You might want certain parts of your application to be instrumented manually, and that's calling the actual log message or start and stop a span because you know your application logic better than we would be able to interpret. Then you configure an exporter. What's great about

OpenTelemetry is that OTLP is a standard protocol, and there's so much infrastructure already out there that supports it. And so if you have a Grafana instance running, you plug it in. If you have a Dynatrace instance running, you know, you just plug in the authorization for that collector and quote unquote, it just works. But that's the best part of

being in this ecosystem is the data can go wherever you'd like it to go. And if that's right next to your backend data and you want this data to intermingle with your backend data, feel free. If you want to keep them separate and you want to look at them in isolation because that's just how your team is structured, then that's also up to you. So it's very flexible in terms of where you can export this data.

Where does the data go? I mean, it sounds like a silly question, but does it go directly to the destination like Datadog or Prometheus or whatever people are using? Or it goes to the backend together with other requests and then from there further?

So in our SDK, you configure the exporter on device. And so it goes straight from the device to wherever you've configured that exporter. If you put a gateway in front of something to manage that load, then you could. Can I set up multiple exporters? Let's say I wanted to send my, just because the UI is at the other vendors, or let's say, and thank you for saying Grafana earlier, because I'd forgotten them.

So let's say I wanted to send my metrics to Grafana. I want to send my logs to Datadog and I want to send my traces to Dynatrace. Could I do that? Yeah, absolutely. So right now, our SDK just handles logs and traces from the device. We're starting to explore metrics and hopefully we'll get there. And I can discuss why we saved that for last.

The exporter is configured for each pillar. So log exporter, trace exporter. And my favorite part is that's a protocol that accepts, you know, it's one method that you implement. And so there's one baked into the underlying OpenTelemetry SDK just called the, I think it's the multi-span exporter, and it does the fan out for you. And so if you want your traces to go to multiple locations, you can configure that.
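The fan-out idea described here, one exporter that implements the same single-method interface and forwards each batch to several destinations, can be sketched as below. This is a plain-Python illustration rather than the OpenTelemetry SDK's actual API: `SpanExporter`, `InMemoryExporter`, `FailingExporter`, and `FanOutExporter` are hypothetical names invented for the example.

```python
from typing import Iterable, List, Protocol, Sequence


class SpanExporter(Protocol):
    """The single-method interface every exporter implements (hypothetical)."""

    def export(self, spans: Sequence[dict]) -> bool: ...


class InMemoryExporter:
    """Stand-in for a real backend; it just remembers what it was sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[dict] = []

    def export(self, spans: Sequence[dict]) -> bool:
        self.received.extend(spans)
        return True


class FailingExporter:
    """Stand-in for a backend that is currently unreachable."""

    def export(self, spans: Sequence[dict]) -> bool:
        return False


class FanOutExporter:
    """Forwards each batch to every wrapped exporter."""

    def __init__(self, exporters: Iterable[SpanExporter]) -> None:
        self.exporters = list(exporters)

    def export(self, spans: Sequence[dict]) -> bool:
        # Build the full result list first so one failing destination
        # does not stop the batch from reaching the others.
        results = [exporter.export(spans) for exporter in self.exporters]
        return all(results)
```

Since `FanOutExporter` satisfies the same interface as the exporters it wraps, the rest of the pipeline does not need to know whether it is talking to one destination or several.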

When I work with something similar in backend, you combine application metrics or traces or whatever you're doing, hopefully all of those, with hardware, right? And then all of that combines. Can you access hardware as well, or it's purely application data only? Do you know kind of, do you get, oh, this is iPhone, or this is Samsung? Oh, yes. Yeah.

So Apple has done a really good job with its push for privacy to really...

hunker down on what you can access, but they have done a very good job at saying, we don't want you to be able to fingerprint this user behind the application. And so getting access to the device model, things like that, that is easy and straightforward and still acceptable, but getting access to a piece of information like how much disk space is remaining is

That was determined by Apple to be kind of a vector to add to this ability to fingerprint the user. And so if you take how much disk space is remaining, which is an integer value, but a very precise value with the IP address maybe, or one or two of these other characteristics, then you can start painting the user into this picture and start really identifying, oh, we can see them travel across these apps. And that's the behavior that Apple is really trying to cut down on.

on because it's not really the application's place to monitor where they go. Or even us as an observability vendor, we really do not want to do that. So there's nothing that we would capture location or the system already has those very explicit permissions, which is great. But a lot of those now Apple has this official privacy manifest that

It looks like a nutrition label when you go to the app store. And it's really helpful to say, here's how my data is being treated. And this data for application performance is kind of the bucket that we fall into or any observability vendor would fall into. And you have to explicitly declare what calls you're using if you are accessing any of that hardware information. How willing are developers to instrument their applications?

I'm asking that because in backend, we have that constant struggle that, okay, there are ops people and I'm generalizing now, right? And they observe everything and stuff like that. And they have all their agents on their servers running. And they hope that one day, maybe if all the stars align, developers will also instrument applications, but that often does not happen. What's your experience in mobile, especially since actually that's,

There is nothing beyond instrumentation. You cannot run agents, right?

It is frustrating at times, for sure, to get people to understand what instrumenting is and why it's useful. But that's just part of the process of educating the community about, okay, these developers... A lot of developers that I've talked with and interfaced with, they're using some other event provider, maybe Mixpanel or any one of these vendors to...

follow the user flow and see success. This user signed in, this user reached the checkout flow, and it's very quick application level instrumentation that they do. So they're familiar with that. And now it's

translating what they're used to from these vendors into concepts that are OpenTelemetry concepts. Okay, mark this as a log. Mark this as a span here. You want to measure the performance of downloading this image and deserializing it. Let's monitor that and wrap that with a span. So that's definitely become very difficult. I'm very jealous when I look at our own backend's metrics when it's just you can trace from...

the endpoint that gets hit all the way down into the database and what the database is doing. I think with time that will come and we're hoping to expose some of these APIs to allow third-party vendors to provide instrumentation into their libraries using the OpenTelemetry API. But right now it's still early days. And so there's still a lot of manual instrumentation. You stole my next question. I was about to ask how common are

auto-instrumentations, or I'm not sure what it's called, right? If I work with Go, often in backend, hey, I mean, the best results are when I add the instrumentation myself, but hey, I can use this web server that comes with instrumentation done. Does that exist, or how common is it in mobile?

So iOS and the history there is Objective-C was the main language of choice. The Objective-C runtime has this beautiful and harrowing thing called swizzling, which is where you're changing out a pointer to a method as the application is running. And it can be incredibly dangerous, but it's honestly not too difficult once you understand what's happening.
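As a loose analogy for readers who have not seen swizzling, the same idea of swapping a method's implementation at runtime can be shown in Python by replacing a method on a class. This is not Objective-C swizzling and not how any particular SDK does it; `HttpClient`, `install_fetch_instrumentation`, and `recorded_spans` are hypothetical names used only to illustrate the shape of the technique.

```python
import time


class HttpClient:
    """Stand-in for a framework class the instrumentation does not own."""

    def fetch(self, url):
        return f"response from {url}"


# Collected measurements; a real SDK would hand these to its span pipeline.
recorded_spans = []


def install_fetch_instrumentation(cls):
    """Swap cls.fetch for a timed wrapper and return the original method."""
    original = cls.fetch  # keep a reference so the real behavior still runs

    def instrumented_fetch(self, url):
        start = time.monotonic()
        try:
            # Call through to the original implementation unchanged.
            return original(self, url)
        finally:
            # Record the duration whether the call returned or raised.
            recorded_spans.append(
                {
                    "name": "http.fetch",
                    "url": url,
                    "duration_s": time.monotonic() - start,
                }
            )

    cls.fetch = instrumented_fetch
    return original
```

Callers keep using `HttpClient().fetch(...)` exactly as before and get the same return value, which is what makes the technique attractive for auto-instrumentation. It is also what makes it risky: every caller in the process is affected, and restoring the original method is the caller's responsibility.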

We can use swizzling to hook into methods to do some auto-instrumentation. And so out of the box, a lot of observability vendors will provide auto-instrumentation, but mostly around Apple's frameworks. And so Apple does a really good job at providing URLSession or core networking to make a network request. That's probably the biggest...

thing that an observability vendor would instrument is when the application makes network requests. And so swizzling URLSession, or URLSessionTask for each individual request, is very common. It's how we can hook in and provide some auto-instrumentation. And when you're not swizzling, the system also provides some good notifications just to say, hey, I'm

On low power mode, you can query for that from the device. And low power mode on these devices, a lot of times that means that the system is throttling the CPUs and it's going to be using the efficiency cores on the CPU instead of the best ones. Or low memory warnings. Hey, the system is seeing a low memory event. Your application should try to eject some data.

or some caches that it has. So it just fires a notification and we can tap into that and just at least allow somebody to see, oh, a low memory event was fired here. And then, you know, maybe three steps away, the application received a SIG kill and the system had killed your app.

And that likely means that your app was causing some of that memory growth. And then the system decided, okay, let's terminate. This is not okay. And so there's a couple of different ways to do it. But yeah, there's a limited set of auto-instrumentation that most people will provide. And then we're looking to always add more, but then allow the API to be extended so that even if you just want to configure...

instrumentation in your application in a way that isn't inline in your code, hopefully you would be able to do that by swizzling your own code or doing something that fits into whatever design patterns you're using in your logic. What do people do when they discover that there is something wrong with the release

in mobile? I mean, this is, I guess, more of a generic question than OTel itself. You know, if it would be servers, okay, we messed up, this is leaking like crazy, let's apply a new release. I guess that's not how mobile works, right? Hopefully you see it. Hopefully you understand, and hopefully you see it before your users do. The app store reviews, I don't think, are a good thing to observe because that's a little too late.

If you see it, then you start prioritizing, okay, how widespread is this issue? Is it 100% of my users? Is it 3% of my users that have been around for six months or greater? Whatever it might be, hopefully you have the data to really understand how big that outage is. And then it's fixing it, diagnosing it, however long that takes. Hopefully, you can put out a hotfix and the release process at your company is

tried and true. And you can do that within hopefully a day or two. And then unfortunately, you might have to submit a release to Apple and there's the App Store review process. Android has an automated review process that takes a bit of time where you're waiting on them to verify and certify the app ready for the app store. And that process has come down in time. They've done a really good job at making that as short as possible, but it's,

It's harrowing to just wait and just, okay, is this going to be a day? Is this going to be...

two minutes. Let me just refresh this. And I don't know if I should go get a cup of coffee or if I need to sit here second by second and watch this thing. Once it does get approved, it'll probably be ready to be automatically submitted to the store. It then has to roll out through the app store and those users need to hopefully update. A lot of times the system will auto update these apps and that's becoming more common. But

But in the early days, the user had to manually go in and say, I need an update to this app. And if they weren't affected by it, or if they didn't realize they were affected by it, it would take them maybe a week or so to really go into the app store and update and realize that there's an update available. And so, yeah, it's a process. Again, I am super jealous of my server compatriots that can just say deploy and it's fixed because it's

We release things to the wild. Once it's out there, it takes on a life of its own and you have to be ready. It's like watching your kid go off to school. Not that I'm there yet, but they can do whatever they want or it's going to act on however it wants and you just have to hope that you've set it up for success. I'm guessing then probably if I would look for some stats that because of that whole process that is in big part out of your control, probably the release frequency of mobile applications must be much

lower than backend, right? Because I don't have to worry that much with backend, right? Because yeah, I should have bug-free, zero vulnerabilities releases, but hey, if I mess it up, I can be up and running in two minutes. And from what I understood, in case of mobile, you don't have that luxury, right? No, absolutely not. The CI/CD process is not non-existent. It's really come a long way, and GitHub Actions is available and great.

There are many vendors that now let you build with macOS runners. And so that helps a lot when you push: maybe a new release candidate is built and ready to go and maybe even submitted to TestFlight so that your test users have access to it as immediately as possible. But there's still a process. And for us as SDK developers, that process is extended a little bit because we put out a release and then we have to wait. Hopefully we haven't caused any issues, but it can occur.

And we have to wait for the applications that use us to take on the new source code and then go through the release process. And so it is a long pipeline for some of these fixes to be finished and completely behind you. Making sure that you understand

upfront what the limits of this outage are, how widespread it is, how big this bug is, is very important. And so that's probably why I got into observability. I've always loved being able to solve problems. Software, I think, is a really good way to solve problems. Now, observing when problems occur gives me kind of a head start at solving those problems. So you've

Rollout is in part in the hands of Apple, Google, whomever, right? Is there some way that you can do... And I'm completely off topic now, but actually it is still related to observability. Is there some way to do some form of canary deployments and say, I'm going to roll out this to 5% of Apple users, iPhones...

Is it possible? I mean, I guess if it's possible, that would be possible because Apple allows it, right? Yes. Yeah, I believe so. I know on Android, it's much more common. And their Google Play workflow, I was familiar with it when I was in a past life working as an Android application developer. And that was very useful. I forget if at the time Apple had it.

But I believe since they purchased TestFlight now five or six years ago, I believe they've introduced that pipeline to allow these canary deployments. Because to me, that sounds like an amazing, especially since the release time is much longer, like amazing usage of OTel and telemetry and whatnot, right? That, hey, I'm getting my information and I can make a decision to continue rolling out, right? Yeah, absolutely. You were saying earlier,

You've got logs and traces today, and you saved metrics for third. He said, we can talk about that later. I think now is the later. Sure. Why is metrics last? It seems like to me, I could see logs always being first. That's pretty straightforward. No, in OTel, it's last. Logs are last. Oh, I know. But in normal life, not in OTel. Yeah. In normal life, you get logs first, probably metrics, and then traces, because traces are just hard. Well, yeah,

pre-OTel. Why did you do metrics last? For us, it's mostly that it's really hard to pin that metric to something that's comparable and actionable. The use case that I like, that we really want to get into and start using, is just memory usage. And Xcode, as you're developing, when you're connected to a local debugger, gives you how much application memory is being used. If you connect Instruments, then you can break it way down into every single allocation.

But that is a little heavy hitting for something that's deployed to the wild. But then...

What is difficult is if I'm on my device that's an iPhone 15 and I'm using 20 megabytes of application memory, if I switch over to a different device in a different session, that device might be running just as well in terms of the timing that we're seeing, but it might be using 30 megabytes of application memory for some reason. And so it's really hard to make that comparative and pin it down to...

something that the application developer can act on, but it is very useful information to have. And so we're working on figuring out how best can we display this information in a way that is centered around the user session and what the user is doing within the app. And also it makes sense. I can see, oh, when the user enters this view, that's when this application memory is spiking and I can do something.

That's the biggest thing that we've held off on and it's in active development. And so we're really excited to get it out there and into people's hands, but that's just why it came last. And mostly, there are three things, and you do have to pick an order. For us, a log was a point in time. Okay, that seems like the base case, the simplest item. A trace was now two points in time, a start and an end, and then the tree structure, but that's not too difficult.

But that seemed like a good second case. And then now metrics is, you know, over the course of time at an interval, we'll take certain measurements and we'll figure out what that means. So that's simply why we put it third. It sounds like a very complicated thing to come to some conclusion from the perspective of the person observing, right? Because if I go back to the server example, right?

It's okay if my application is using more memory because there is more available memory so I can let it fly, right? But then when things start burning in a server, then I don't want it and stuff like that. And I guess in mobile, you really don't know what else is happening in mobile, right? Yeah, exactly.

Am I using too little memory because other applications are running and the system is throttling it or because some other reason and so on and so forth?

Yeah, exactly. It's what other stuff is happening on the system that could contribute just to the throttling that the system might do to a CPU or to the device. And then what memory events am I getting is another indicator. Okay. There might be a memory issue going on. And so, yeah, we're really excited to start getting into that work. And it's definitely been requested a bunch because I think it'll be very useful if done correctly. And now we just have to figure out how can we do this correctly? Yeah.

I was just taking a look at the iOS SDK and it looks like you're fairly active in releasing it. I didn't take a look at the others. For your developers that are consuming your SDK. So if I was still writing iOS, if I was using Embrace, I see this coming out every five days, 10 days, whatever the case may be. What do you see as Embrace? How quickly do your developers actually pick up those new versions?

It's pretty varied, I'd say. We have some early adopters that are building from source and getting back to us in GitHub issues or their own pull requests, which are amazing. And then we have some large organizations that move as large organizations do. And it's a slow, steady, day-by-day improvement process, which is completely valid when you have...

an organization of that size and your app is that important to that many users. So it's very varied, and we're very excited about the 6.X series. The Embrace Apple SDK that you're looking at is now open source, uses modernized Swift design patterns, and is a completely new project for us based on OpenTelemetry. We have

Two main dependencies, the OpenTelemetry SDK, and then another one that I'll shout out is GRDB, which is a really good Swift wrapper around SQLite and how we do our storage or local storage. It's a steady improvement, and we hope that people take it on and try it out. So if you're out there listening, definitely check it out and tell us what's wrong. Tell us what's right. We love the feedback.

If somebody was to start with it today, what would you recommend? Source or just grab a library? Definitely source. I think SPM, Swift Package Manager, is Apple's, or the Swift language's, new package management system. And that has superseded some of the old package managers that were available like CocoaPods or Carthage. It's very easy and integrated into Xcode to get started and check it out. And you can step through the source. You can see exactly how it works under the hood.

And you can provide really good feedback. Or if you'd like, you can fork it and make changes. If you'd like to keep those changes for yourself, feel free. I would love if you would help us out. I'm very much against that. Yeah. Or just take a look at how we've implemented certain things. And maybe when you're working on some of this auto instrumentation that we haven't gotten to yet, maybe you'd like to see how we have done some of the auto instrumentation that we do have and then model some

the classes that you write to match ours or to fit nicely with ours. But we're hoping that if you understand what an OpenTelemetry tracer is, you can get started with tracing and create instrumentation that traces something. And then if you're familiar with OpenTelemetry logs, you can get started with the logger and create instrumentation that writes out logs. What are you seeing people using for backends?

Are they using their own? Are they using services? Do you even know? I don't know. So out of the box, you can connect to our backend and there's a free trial stage and all that. I think for us, when we test, we've used Grafana just because familiarity, but the OTLP exporter...

that we use in some of our examples in the documentation, that comes from the underlying OpenTelemetry SDK. And so that's not something that we've written. That is just, here's straight from the horse's mouth, the OTLP exporter. And that helps if you want to export anywhere that supports that OTLP protocol. When you pull the latest OTL library for you to consume, how many times has that blown up on you?

We haven't done it yet, but it is planned. We do have an upgrade to do. And I work on the OpenTelemetry SIG as well, so I've made some contributions to the Swift package myself, and members of our team have as well. And so we will upgrade soon. But mostly, because it's standard and because we're participating in the SIG, we see what the changes have been. And I can tell you, we haven't upgraded because there's a database migration that we're going to have to do. And we're just holding off to make sure we choose the right version to upgrade to, so that we don't have multiple database migrations to do when that happens. That again is another thing that gets tricky when you're out in the wild. This isn't a database that you control. It's out on somebody's device.

How important is it that you're active in the OTel SIG?

Oh, it's massive. Just for me, in my understanding of OpenTelemetry, just to have an avenue to ask questions. I sit on the client-side SIG every Tuesday and the Swift SIG every Thursday, and we discuss different things. The client-side SIG covers Android, iOS, and browser JavaScript stuff, and the Swift SIG is just for the OpenTelemetry Swift project itself. And it's great because the contributions made and the proposals for semantic conventions that are made at the client-side SIG, a lot of those are applicable to mobile, and a lot of them Embrace has made. My coworker Hanson is...

Yeah.

we have to follow the semantic conventions. Otherwise, it kind of breaks the whole idea. The purpose is that a network span looks like a network span. And if I send that to Grafana or some other vendor, it gets ingested and is readable as a network span. And so the ability to kind of limit vendor lock-in is also one of our big goals. And we can only do that if we participate in the community.

So what I'm hearing is, if you weren't participating in the community, if you were going off and just doing your own thing, we would still have yet another old-school Datadog scenario where it's, here's our library, go use our library. That's the only way you can use it, right?

Yeah, I guess I have never been a Datadog user.

Well, fill in the blank. It could be anybody, right? What I'm saying is, if you had a completely proprietary example and you hadn't gone down the path of OTel and really gotten into the OTel SIGs to really see what the community is doing, would Embrace even be a thing at this point?

I don't know. Right. I mean, it would be tough. Yeah. Five years ago, the answer is yes. Right. But today, not really. Yeah. No, the community is building around mobile and it's getting stronger. Just within the last six months that we've attended these meetings, six to eight months, we've seen growth in them and more and more contributions and more and more involvement, which is just amazing to see.

Even if we started at OpenTelemetry and we didn't participate, you just naturally would kind of walk your own way and you diverge naturally. Staying in these meetings and keeping up with what the community has decided these semantic conventions are, that means that we can limit that divergence and really stay as together as possible.

What's interesting to me is you're focused fully on mobile. How many people are in the SIGs when you show up on Tuesdays and Thursdays?

The client-side SIG is roughly eight to 12 any given week. And there are some core contributors that are there every week and run the meeting. And then people kind of come and go depending on their schedules. The Swift SIG, I think there's four or five of us that are there every week, and that's a much more consistent four or five. We've had a couple of people pop in and just ask a question. Or there's a Slack community for CNCF, and OpenTelemetry has channels in there, so the OpenTelemetry Swift channel is available to ask questions.

And a lot of times the conversation might start in an issue or pull request or Slack, and then we'll ask somebody, oh, can you join this meeting to talk with us in person about this, just because, you know, face-to-face time is just unmatched, and we don't have this game of telephone. It's a handful here, a handful there. But if anybody out there is listening, please join. Even if you join and just listen in, it's pretty valuable. I find it valuable.

And if you have any contributions or any ideas that arise during the meetings, it's very easy to just unmute and start talking. And it's a very friendly environment. Do not be worried about not feeling like you belong there or something. Everybody is super friendly and super nice.

So Austin works for Embrace. We've talked about it, embrace.io. Obviously, Embrace puts out SDKs that can be used. But what else does Embrace do?

Embrace is a mobile app observability platform based on OpenTelemetry. And so our SDKs are open source. From them, you can export to any OTel-compatible tool. We try to integrate as seamlessly as possible with these vendors so that your data can be wherever you want it to be. If you're a DevOps person, an SRE monitoring servers, and you have a mobile team and that is kind of a black box to you, you don't understand what's happening in the mobile space, you could use a tool like ours to start observing some of that application performance and start monitoring how well it's doing out in the wild.

We've got a Slack community that just started a couple of weeks ago where people can start talking about OpenTelemetry and what we do and how we interact with it. Or you can join the CNCF Slack community and talk to me there and reach out there to any of the SIGs. Each SIG has its own Slack channel, which is a really great opportunity to start the conversation. That's mostly what Embrace does. Yeah.

And when Austin says the SDKs are open source, it's legit open source, as in Apache 2.0 licensed open source. So all of Austin's contact information is going to be down in the episode description. Again, Embrace is at embrace.io. Austin, thanks for being with us today.

Thanks for having me.

We hope this episode was helpful to you.

If you want to discuss it or ask a question, please reach out to us. Our contact information and a link to the Slack workspace are at devopsparadox.com slash contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at devopsparadox.com to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.

♪
