Microsoft's SSH telemetry sends information about every SSH connection in real time, including client versions and cipher suites. While it doesn't track who connected, the instantaneous nature of the data collection feels invasive, similar to how law enforcement collects call metadata. This level of detail, especially for server-side connections, is seen as excessive and unnecessary for legitimate purposes like retiring outdated ciphers.
Microsoft's SSH telemetry collects the version of SSH used, remote protocol error lists, peer versions, supported ciphers, compression mechanisms, message authentication codes, and proposed host keys. This includes detailed metadata about the connection setup, which is more information than Microsoft would reasonably need for operational purposes.
Let's Encrypt is introducing six-day certificates to enhance security by reducing the validity period of SSL certificates. This move aligns with industry trends, such as Apple and Google pushing for shorter certificate lifespans to prevent misuse of expired domains or stolen keys. Automation makes frequent renewals manageable, and shorter cycles ensure compromised certificates are invalidated faster.
Let's Encrypt must scale its infrastructure to handle a significant increase in certificate issuance, potentially up to 100 million certificates per day in the future. Additionally, shorter validity periods require better monitoring and faster response times to renewal failures, as there is less buffer time to address issues before certificates expire.
Setting up SPF and DMARC records keeps spammers from successfully spoofing the domain for phishing or spam, protecting the domain's reputation. Without these records, spam claiming to come from the domain can lead to blacklisting, affecting future email delivery or even causing firewalls to block the domain entirely. Proper DMARC configuration also allows domain owners to receive reports on fraudulent email activity.
Sluggish performance during file transfers on a Synology NAS is often due to network saturation, especially when using NFS over a 1 Gbps connection. If the drives are CMR and not SMR, the bottleneck is likely the network or a single-threaded NFS configuration. Running iostat or checking local performance via SSH can help identify whether the issue is with the drives or the network setup.
NFS performance can be improved by enabling server-side file copying, which avoids the need to transfer data over the network. On file systems like ZFS or Btrfs, features like reflinks or block reference trees allow files to be cloned locally, reducing network load. Additionally, ensuring the network is not saturated and using multi-threaded NFS configurations can help.
Two and a half admins, episode 227. I'm Joe. I'm Jim. And I'm Alan. And here we are again. And before we get started, you've got another plug for us, Alan. Winter 2024 Roundup, Storage and Network Diagnostics. Yeah, this is a collection of articles and stories from Klara about storage and networking and diagnosing different problems and just how to work around them. So we have interesting cases we solved on storage and interesting constructs for networking to be able to do things like
Inside one computer, how do you simulate a bunch of conflicting networks in a way that they can all work together? Right, well, link in the show notes as usual. Someone messaged you on Mastodon recently, Jim, with a link to a video from DEF CON 32 about SSH on Windows sending telemetry to Microsoft. Yeah, it turns out that every time you shell into a Windows server running the native Microsoft-made SSH server,
It actually sends information about the connection off to the mothership, lets Microsoft know when somebody connected and what client version they connected with. And it caught my attention because, you know, I'm usually and fairly famously not the first one to complain about telemetry, but...
This seems a little aggressive to me. I can see some value in knowing what client software and what cipher suites get selected and knowing when you can retire support for older things, especially on the server side. Like on the client side, there's not as much risk. But on the server side, if you know that, you know, less than 0.1% of incoming connections try to use these old ciphers, then maybe you can finally get rid of them. And I know that's...
always a thing. It's unclear for the people making open source SSH: are people actually done using this? Although they've taken the stronger stance of, we don't care if you're done using it or not, it's not secure, so we're not supporting it anymore, so, you know, upgrade your shit. But I can see why they would want it, and the telemetry doesn't actually track who connected and so on, but it does still seem to be a bit of an overstep. I don't like the instantaneous nature of it. It doesn't feel like there's really an attempt
to go above and beyond to obscure the original and potentially private information. It seems like Microsoft is just like, well, we can get away with getting this, so we'll take this.
It would feel a little different to me if it was a case of, oh, you know, once a month or so, Microsoft will phone home and say, these are the ciphers that we saw in use over the month or something. But just the idea that like every time somebody connects in real time, Microsoft gets an update, letting it know that that connection happened and a fair amount of metadata about the connection. It kind of reminds me of, you know, the way that modern police work tends to handle telecommunications, which is,
They don't necessarily even care what you're actually saying on the phone. The first thing that they want is just the subpoena of the records. They want to know what calls happened and when they happened and the metadata about that call.
And it just, it feels like that is essentially what Microsoft is getting here. Is this not pretty standard practice for Microsoft software though? A lot of telemetry is certainly standard practice for Microsoft, but there's some interesting differences here. One of the issues is that although we know Microsoft takes a ton of telemetry, we don't always know what that telemetry is because when we're talking about things that are purely internal to Windows,
It's kind of hard to isolate like what actually triggers an outbound connection to deliver some telemetry somewhere. We don't have any source code to examine and we don't necessarily know exactly what's happening when. So all we get is this kind of uneasy feeling. We know there's a lot of telemetry and we don't know all the details about it.
But now you bring something like SSH in and it becomes a little easier to see what's going on. You know, if you're an interested network administrator, you don't exactly have to be a genius to figure out every time somebody connects to the SSH server, this many packets go winging off to a Microsoft owned IP address.
Looking at the snippet of source code shown in the video on slide 52, there's a function called send SSH version telemetry. And it basically sends, quote, our version of SSH, the remote protocol error list, and the peer's version of SSH. So that would just be the version strings that go back and forth. But really, to Jim's point, the fact that this gets sent instantaneously rather than kind of aggregated and batched means that
Somebody knows when that happens. So even if you're, you know, SSH-ing on an odd port and it's all encrypted so nobody can tell what's going on, just the fact that whenever we see a packet come in on this port and the connection gets set up, then we always see a packet out to Microsoft. So we know it must be SSH. Helpfully, Microsoft publishes on their PowerShell GitHub account the copy of OpenSSH-Portable that they've mangled for Windows and they've
mostly kept their code separate under a contrib/win32 directory. And looking at the actual ssh telemetry.c file, you can see in more detail the telemetry they're sending. And based on that, it would appear this applies to both the client and the server on Windows. So if we're just using the built-in SSH client on Windows, Microsoft is telling themselves about the version
of SSH that every machine you connect to is using. But it also looks like they send quite a bit of detail about the connection setup, including what ciphers you offered, what ciphers the server offered, which compression mechanisms were available, and which MACs, message authentication codes, were possibly being used.
So not just what ones were selected, but what each end supported as well, which seems quite excessive. Also, very specifically proposed host keys on both client and server end, which is...
a unique fingerprint that says for absolute certain, if you know who held those keys, that like, you know, this is who was connecting right then. Well, the host keys would be who they were connecting to. But yeah, you know, when you normally SSH to a new host and you get that warning that, hey, I've not seen this pub key before. If Microsoft is getting every one of those, then yes, they would probably be able to, just from the fingerprints collected by things like Shodan, know which server you're actually SSH-ing to if it was anything remotely popular and internet accessible.
I'm not sure actually in the protocol if that is going to actually be the host key or just the different algorithms for the host keys. But either way, it's excessive. But if it's the actual public key of the host key, that is kind of egregious rather than just excessive. Well, it's certainly a variable named client proposed host key and server proposed host keys.
Right, so the client doesn't actually send a host key. They just propose which host key algorithms they understand. You'd have to run SSH with a lot of dash Vs and actually watch the exchange back and forth to see what normally gets sent there. But it's a lot more information than Microsoft could really ever legitimately need.
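As a rough illustration of the version strings being discussed, here is a minimal sketch: the SSH identification banners are exchanged in cleartext before key exchange, so they can be observed with nothing but a socket. The hostname below is a placeholder, and this only shows the banner step; seeing the offered ciphers, MACs, and host key algorithms still takes ssh -vv or a packet capture, as described above.

```python
# Minimal sketch: observe the cleartext SSH version-string exchange.
# "example.com" is a placeholder host; this only shows the banner exchange,
# not the KEXINIT lists of ciphers, MACs, and host key algorithms.
import socket

HOST, PORT = "example.com", 22

with socket.create_connection((HOST, PORT), timeout=10) as sock:
    # The server sends its identification string first (RFC 4253, section 4.2).
    server_banner = sock.recv(256).split(b"\r\n")[0].decode(errors="replace")
    # The client then sends its own identification string.
    sock.sendall(b"SSH-2.0-ExampleClient_0.1\r\n")

print("server version string:", server_banner)  # e.g. "SSH-2.0-OpenSSH_9.6"
```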
So if you really want to use SSH on Windows, you should probably get the real OpenSSH from upstream OpenSSH and run that on Windows or run it under Windows Subsystem for Linux, where you're going to get the official Ubuntu package of SSH. And it's not going to have all the Microsoft telemetry in it or just use a different open source client like PuTTY or something. But...
I'd probably avoid the built-in SSH client. You know, it was nice to finally have that as a convenience feature, but it turns out Microsoft spoils everything. If you do want to use that specific client, but without the telemetry, you can fetch the project from GitHub yourself and build it and use your own binaries. At least as long as you trust the project to be telling you the truth when it says that
Any binaries built from the code on GitHub will have the telemetry disabled. It's just the ones built for Windows that have the telemetry manually enabled. Well, I'm guessing that's so that I don't build a version of it and just flood Microsoft's telemetry with fake nonsense until they decide it's not useful anymore. Well, there's nothing stopping you from turning on the flag that enables the telemetry when you build your copy, if that's what you want to do. And my copy just says, everybody's running OpenSSH version 5,000.
Surely version 4.20.69. Coming soon to Let's Encrypt will be certificates that last six days, as opposed to the 90 days, which is the standard right now. Yeah, this is interesting. We talked a couple of weeks or months ago about Apple and, I think, Google wanting to get it to where, for all certificates, the browsers wouldn't accept any certificate longer than 45 days, to force
certificates to have to keep getting reissued to make sure that nobody's managing to have an SSL certificate for an expired domain or just to stop bad actors much more quickly. And if you've implemented the automation for Let's Encrypt and are doing this every 90 days, there's really no difference between doing it every 90 days and doing it every four days. You know, they're talking about having certificates that are good for about six days. So if you renew it at about two thirds of its life, like you normally would now, then you're
down to just a couple of days. But with that automation, it's not really costing you anything extra to do that because you've already built the automation. Whereas with the Apple and Google one we were talking about, if you're already slightly suffering from the fact that they shrunk the validity period to just over one year, going down to 45 days takes it from this annoying thing we do once a year to something we have to do basically every month. And it would
continue to push you to properly automate it.
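As a rough illustration of that renew-at-about-two-thirds-of-lifetime arithmetic, here is a minimal sketch; the fraction and lifetimes are just illustrative, not anything Let's Encrypt mandates.

```python
# Minimal sketch of the "renew at roughly two-thirds of lifetime" rule of
# thumb discussed above, applied to 90-day and 6-day certificates.
from datetime import timedelta

def renewal_age(lifetime: timedelta, fraction: float = 2 / 3) -> timedelta:
    """Certificate age at which the automated renewal should fire."""
    return lifetime * fraction

for days in (90, 6):
    lifetime = timedelta(days=days)
    renew_at = renewal_age(lifetime)
    buffer = lifetime - renew_at
    print(f"{days:>2}-day cert: renew after ~{renew_at.days} days, "
          f"leaving ~{buffer.days} days of buffer if renewal fails")
```

With a 90-day certificate that works out to renewing around day 60 with about 30 days of slack; with a six-day certificate it is roughly day 4, leaving just a couple of days.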
And so Let's Encrypt getting down to that short seems interesting to me. And it sounds like a kind of logical evolution. But in their post here, they're talking about some of the challenges that this will pose, considering that if they have to start issuing 20 times as many certificates as they do now, how do they scale their infrastructure for that? And saying, you know, at the current point, it's not inconceivable that by a decade from now, they'd have to be issuing 100 million certificates per day. And it's like, that seems like a really big number. But...
Thinking about it, 10 years ago, we wouldn't have thought that 5 million certificates a day was possible either. I agree with you. The automation makes it not seem like that big of a deal. To the degree that it does seem like a challenge, it is kind of nice now feeling like, well, if something goes wrong with the renewal process, it's going to go wrong soon enough that, you know, I'll have like a month to get it fixed before it actually bites me.
Whereas with a six-day cycle, you know, if you're normally renewing four days into a six-day cycle, now it's like, well, you better be on it if something's gone wrong. Like you need to detect that issue very quickly and you need to respond to it promptly or it will bite you in the butt a lot quicker. Yeah.
And I'm not saying that that's something that you can't deal with. I'm not saying that it's an unbearable burden, but it is something to be aware of. Especially if your certificate validity period is six days, you probably want to renew every two days or so in order to make sure that you have that bit of buffer time. So if it doesn't work, you have time for A, it to automatically try again and maybe work, and that you still have some time to respond if that also doesn't work. It means you've really got to up your monitoring game, essentially.
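Upping the monitoring game here mostly means comparing a certificate's remaining lifetime against a threshold. A minimal sketch of that kind of check follows, with a placeholder hostname and threshold.

```python
# Minimal sketch of an expiry check in the spirit of a Nagios plugin: alert
# when a certificate has less renewal headroom than expected. The host and
# threshold below are placeholders.
import socket
import ssl
import time

HOST, PORT = "example.com", 443
WARN_DAYS = 30  # sensible for 90-day certs; a 6-day cert wants something like 2

ctx = ssl.create_default_context()
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()

expires = ssl.cert_time_to_seconds(cert["notAfter"])
days_left = (expires - time.time()) / 86400
if days_left < WARN_DAYS:
    print(f"WARNING: {HOST} certificate expires in {days_left:.1f} days")
else:
    print(f"OK: {HOST} certificate has {days_left:.1f} days left")
```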
Well, in my case, it would be just changing the threshold in my monitoring, right? My Nagios is already monitoring all my important domains and tells me when the certificate's getting anywhere near its expiration. And so if it's a Let's Encrypt certificate, as soon as it's down to less than 30 days, it's like, hey, that's not supposed to happen.
I think it's 30 or between 30 and 45 days, it's supposed to auto renew with another newer one and extend the lifetime out. And if it's not doing that, then that's an alert and that gets addressed. And luckily I haven't had one of those in ages. But to Joe's point, the vast majority of people are not tracking their SSL cert expiration dates that way. I'm generally doing the same thing. Honestly, I could do a better job of it than I've been doing because again, you know, even with Let's Encrypt, like you have a little bit of time to figure it out, but
But as these cycles get shorter and shorter, I think it's going to get more and more important to be like, no, that needs to go to the forefront of it. And also, we may be looking at some indications that just like the kind of monitoring that Alan and I talk about that most people like, well, that sounds nice, but, you know, I have time for that. We may be getting towards a period where like that's just kind of the entry level technology.
to get into doing things professionally at all. It is worth saying that in this post, they say, our longstanding offering won't fundamentally change next year, but we are going to introduce a new offering that's a big shift from anything we've done before, short-lived
certificates with a lifetime of six days. So it sounds like at least initially the standard 90 day ones will still be available. Yeah, at least initially. But who knows if the six day ones go well, then that may well become the standard and the 90 day ones may get phased out. Well, in particular,
It seems to make sense for them to have this technology ready, specifically with the CA/Browser Forum talking about shortening the maximum certificate lifetime so that browsers will refuse any certificate that has a lifetime of more than 45 days. It really makes sense for Let's Encrypt to have this up and running, where people can already be deploying it and making sure it works before it becomes basically mandatory. Otherwise, browsers won't accept the certificates. So I think it's a really smart move and will benefit
both the implementation of shorter-lifetime certificates for everyone and especially the community of us professionals that have to manage this stuff all the time, who get to have experience with it before it becomes mandatory. I feel like this is a good opportunity for monitoring-as-a-service companies to do quite well. Quite possibly. And I know Let's Encrypt tried to do a little bit of something with email notifications early on. I don't know if they still do that. But yeah, I definitely think that
building on Jim's point that if this becomes the standard you have to do to be able to run a domain and so on, people will either set up that infrastructure themselves or turn to companies to do that. Or even, you know, the interesting other one we've seen is people that
aren't actually controlling their own Let's Encrypt at all. You know, if you use a lot of popular, like, WordPress hosting platforms and so on, because they can put the file in the right place, in the .well-known directory, they do the Let's Encrypt for you and you don't even know that you're using Let's Encrypt. Your web hosting provider takes care of it. Yeah, it's just free SSL is what they call it. Yeah, whereas in the past that involved them actually having to pay money for certificates. And now they're like, well, we could just abuse Let's Encrypt. Yeah, now it's just a cron job. It's free real estate.
But ultimately, we're in favor of this then. Yeah. When it comes down to it, the shorter security certificate renewal cycle is a security win.
At least as long as you trust the distribution mechanism more than you trust the general ecosystem of servers, many of which get abandoned and compromised and have keys stolen from them. Because that's the reality of it: the world is full of compromised servers that just get, like, abandoned and left alone, and nobody ever does anything about the fact they've been compromised, and they get hijacked and
used for different things or the keys can be stolen from them. So the less time that those keys will remain valid for some use other than what was originally intended, the better off everybody is. Yeah, although the crontab will probably still run. Yeah.
renew the certificate, so maybe it won't help that much. But it's definitely interesting to see, and I think it will be good in the end. I don't know if it will cause fewer problems. We talked, again months ago, about when Microsoft accidentally let some certificates expire and it caused a whole bunch of things to not work anymore. You know,
If that can happen anytime somebody doesn't look at something for six days straight, then that might increase the chances of that. But at the same time, to our point, if it requires that level of automation and it's going to break that quickly, that's going to be a much better forcing function for people having better automation and more monitoring on it to catch it before that happens. So if it expires every six days, you're probably going to keep a closer eye on the expiration than when it expires every...
one to five years and nobody remembers about it. Alan, it sounds like you're supporting this policy specifically as an example of tough love for Microsoft. Not just for Microsoft, for everybody who lets their damn SSL certificates expire.
Which in the past has included myself. It seems like this is a slight cranking of the convenience and security dial, slightly away from convenience, slightly towards security. Yeah, and I think that in this case, that's a good thing.
Okay, this episode is sponsored by ServerMania. Go to servermania.com slash 25A to get 15% off dedicated servers recurring for life. ServerMania is a Canadian company with over a decade of experience building high-performance infrastructure hosting platforms for businesses globally. ServerMania has up to 20 gigabits per second network speeds in eight locations worldwide for optimized global reach.
They have flexible custom server configurations tailored to unique needs, as well as a personal account manager offering free consultations. With 24-7 live chat support and support ticket response times under 15 minutes, you can always keep your systems running smoothly.
Alan's been a happy ServerMania customer for over seven years, so support the show and join him at a hosting provider that truly delivers. Get 15% off dedicated servers recurring for life at servermania.com slash 25A and use code 25ADMINS. That's servermania.com slash 25A and code 25ADMINS.
I saw an interesting post from Jerry Lerman on Mastodon, which was an important reminder if you own a domain name and don't use it for sending email. Essentially, the message is that if you own a domain and you don't plan on using it to send email yourself, you should create SPF and DMARC records in the DNS that make it clear to anybody who does receive mail claiming to be from your domain that, well, it's not legitimate.
And the reason that you do this is twofold. One, because you're a good Internet citizen and you want to cut down on the amount of spam that reaches people's inboxes. And that's a good way to do it. And two, because that way, that spam that has your domain name on it won't be tarnishing the reputation of your domain.
If a lot of this spam gets through and there aren't SPF and DMARC records making it clear that that spam is not legitimate and was not sent by the actual owner of the domain, then what can happen is that trashes the reputation of the domain, which can not only cause you problems with delivering email down the line, should you choose to attach email services.
It can also cause issues that will cause firewalls to block you outright. There are a lot of corporate firewalls out there that actually use email spam block lists as just a one-hit kill, no-you-can't-go-to-that-A-record type of firewalling. So you don't want to run afoul of these kinds of lists, and setting SPF and DMARC records that say, hey, any email that you get claiming to be from my domain, it's not from me, that's not from me.
That's what keeps you from taking that reputational hit. Yeah, so basically setting an SPF record that says any email is junk and a DKIM that's saying anything not signed coming from this domain is junk. But what they point out that's interesting is if you actually set up DMARC properly, you can get reports about all the email that does get dropped as spam pretending to come from your domain. If you're just curious, who's trying to pretend to be you?
Actually, if you set up DMARC properly, you have to get those reports because if you don't actually include an email address that does accept email for those reports to go to, you have not properly configured DMARC.
which can be a mild annoyance. You know, I set up DMARC on my own domain and I get reports from Google and Microsoft every day. And about once or twice a week, I'll get a report from some random provider I've never heard of. And they're never really anything that you want to sit down and read, but you get them. You mentioned corporate firewalls. Presumably it's bad for SEO as well. Yes, absolutely. I mean, just in general, you don't want to be associated with spamming operations.
An awful lot of Internet connectivity and communication in general these days is very adversarial in nature. You know, you have filtering mechanisms that are explicitly designed to keep out the attacks and, you know, let in the good stuff. So anything that's associating you with, you know, people who are on the attacker side of things, it's going to make your life difficult in many aspects.
and somewhat unpredictable ways: as your name spreads farther and your reputation gets poorer and poorer, you discover there are more and more different ways that people will find themselves blocked off from accessing your services.
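For reference, the lockdown being described usually amounts to an SPF record of "v=spf1 -all" and a DMARC record at _dmarc.yourdomain with p=reject plus an rua address for the aggregate reports. Here is a minimal sketch of checking a domain for those records, assuming the third-party dnspython package and a placeholder domain.

```python
# Minimal sketch: verify a parked domain publishes "no mail" SPF/DMARC records.
# Assumes the third-party dnspython package (pip install dnspython).
import dns.resolver

DOMAIN = "example.com"  # placeholder parked domain

# The records discussed above: SPF that authorizes no senders, and a DMARC
# policy of reject with an aggregate-report (rua) mailbox.
EXPECTED_SPF = "v=spf1 -all"
EXPECTED_DMARC_PREFIX = "v=DMARC1; p=reject"

def txt_strings(name: str) -> list[str]:
    try:
        answers = dns.resolver.resolve(name, "TXT")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return []
    return [b"".join(r.strings).decode() for r in answers]

spf = [s for s in txt_strings(DOMAIN) if s.startswith("v=spf1")]
dmarc = [s for s in txt_strings(f"_dmarc.{DOMAIN}") if s.startswith("v=DMARC1")]

print("SPF locked down:", spf == [EXPECTED_SPF])
print("DMARC reject policy:", any(s.startswith(EXPECTED_DMARC_PREFIX) for s in dmarc))
```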
Let's do some free consulting then. But first, just a quick thank you to everyone who supports us with PayPal and Patreon. We really do appreciate that. If you want to join those people, you can go to 2.5admins.com slash support. And remember that for various amounts on Patreon, you can get an advert-free RSS feed of either just this show or all the shows in the Late Night Linux family. And if you want to send in your questions for Jim and Alan or your feedback, you can email show at 2.5admins.com.
Jonas writes, I'd love to hear your thoughts on my NAS situation. My problem is that I find it slow during transfers. I have the folders mounted on my computer using NFS, and I typically copy or move a movie from one folder to another. Sometimes this is on the storage pool, sometimes it is not, but in both cases it feels extremely sluggish.
When the files are being written, it is almost impossible to browse to a new directory until the transfer is complete. The folder simply doesn't load. I'm connected to the NAS via a switch at 1 Gbps. The maximum write speed I get is 115 MB/s, which maxes out the network, but not the drives. Is it my setup, or am I crazy to expect a responsive system during this?
The NAS is a Synology DS920 Plus with four bays. I think the easiest way to figure out what's the bottleneck here is, while such a copy is going on and the drives are busy with the writes, to SSH into the Synology and see if you can browse around the directories and run ls, and whether that's stalling or not. It could be that the NFS connection is single threaded, so nothing else can happen until the writes finish, and that's what's going on there, and
looking at getting more threads for your NFS setup could solve it. But if you do see sluggishness locally on the NAS while doing the copy, then the problem is much more likely in Synology's use of Btrfs or just your drives. One thing to look at there is running iostat and seeing if the drives are actually constantly busy or if they have...
some idle time that you could be using to do the browsing, but it's not happening. In addition to that, I think it's worth pointing out that the user themselves acknowledges they are saturating the network.
And they've already said that if they shell in and move the files locally, they don't see the slowdown. It's only when they're doing it, you know, via the NFS mount. So what's happening is you're doing a brute force copy over the network. When this happens, you're saturating the network, in particular if there's any Wi-Fi in this link. Now you're talking about 115 megabytes a second. So probably the NAS itself is not on Wi-Fi, but like if you're accessing this from a laptop over Wi-Fi and you're already saturating the network, then
You can have some real latency issues just issuing additional commands because you're already saturating the network. And with Ethernet, you've got bidirectional, you're full duplex. So you can upload at the same time you're downloading at full saturation. But if you're going over Wi-Fi, Wi-Fi is half duplex, not full. And if you're saturating it, then everything is going to suck until you stop.
Now, Jonas did specifically say that the drives are CMR, so we're not dealing with SMR drives here. Yeah, because I can see how that could be a quick suspicion when you hear drives that take a really long time to do things while you're writing to them. Because SMR drives will do that when they have to garbage collect and move a bunch of 256-megabyte chunks of something to another SMR zone. But no, these are CMR drives.
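For anyone who wants to script the drive-busyness check Alan suggested earlier, a rough Linux-only sketch follows; /proc/diskstats exposes the milliseconds each device has spent with I/O in flight, so sampling it twice gives something like iostat's %util. The device name is a placeholder.

```python
# Minimal sketch of an iostat-style %util check by sampling /proc/diskstats.
# The 13th whitespace-separated field is milliseconds spent doing I/O.
# Linux-only; "sda" is a placeholder device name.
import time

DEVICE = "sda"
INTERVAL = 5  # seconds

def io_ticks_ms(device: str) -> int:
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[12])  # time spent doing I/Os (ms)
    raise ValueError(f"device {device} not found")

before = io_ticks_ms(DEVICE)
time.sleep(INTERVAL)
after = io_ticks_ms(DEVICE)

util = 100 * (after - before) / (INTERVAL * 1000)
print(f"{DEVICE}: ~{util:.0f}% busy over the last {INTERVAL}s")
```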
One thing that occurs to me is that he was specifically talking about, you know, moving files, which is going to have to happen as a brute force copy operation both ways over the network. Do you know of any way to convince NFS to do server side copies instead? Not on Linux off the top of my head, but.
On FreeBSD, NFS does respect the copy_file_range call, which, if you're using ZFS, or the same thing on Linux if you're using Btrfs, would be able to use the file cloning mechanism. So BRT in ZFS, or I guess just reflinks in Btrfs, to be able to basically clone the file instead of copying it and have that all happen on the server side, so that
nobody has to read the file to make a copy of it. And then when you delete the old one as part of the move, you just decrement the reflink count by one, or the block reference tree counter by one.
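As a sketch of the mechanism being described, Python exposes the same syscall as os.copy_file_range(), which lets the kernel, and where supported the filesystem's reflink or block-cloning machinery or NFS 4.2 server-side copy, move the data without dragging it through the client. The paths are placeholders, and whether any network traffic is actually avoided depends on the OS, kernel, filesystem, and NFS version in play.

```python
# Minimal sketch: copy via copy_file_range(2) so the kernel/filesystem can
# short-circuit the data movement (reflinks on Btrfs, block cloning on ZFS,
# server-side copy on NFS 4.2) where supported. Paths are placeholders.
import os

def clone_copy(src_path: str, dst_path: str) -> None:
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        while remaining > 0:
            copied = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
            if copied == 0:  # source ended early; nothing more to copy
                break
            remaining -= copied

clone_copy("/mnt/nas/movies/film.mkv", "/mnt/nas/archive/film.mkv")
```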
Klara also implemented this for Samba. So Samba on Linux and FreeBSD will now do this automatically on ZFS. And I think it already did it on Btrfs, where if you are doing a copy or a move, the server can see what you're doing and optimize it and use the file cloning feature in both of these file systems.
All right, now before we move on, there is one last thing I just noticed that I want to bring up. And I feel like this is a mistake that Alan and I both made. We should have caught it and didn't, because we're kind of used to larger systems and thinking at a larger scale.
We already heard that the maximum write speed that they get is 115 megabytes per second. And for either myself or Alan, that sounds like immediately it's like, oh, well, yeah, you've saturated the network and we kind of stopped thinking about it. That's a network problem, not a storage problem. However, the actual topology here is just a pair of Rust drives, which means 115 megabytes per second write speed is getting real close to the absolute maximum those drives can actually accomplish as well.
With a really easy workload, a small Western Digital CMR drive might hit about 150 megabytes per second on a local write operation with no other bottleneck. Now, the thing about that is if you come that close to saturating a Rust drive with a very heavy write stream, I don't care what you do. Yes, anything else is going to have awful latency as that drive is trying to manage the storm of data that it's got to move around.
So yeah, if you are actually at that very moment performing writes at 115 megabytes per second on a simple two-rust drive mirror, you should absolutely expect it to be very pokey to browsing while that's going on. Worse, it looks like it's one six-terabyte drive and one four-terabyte drive, and the pool has 3.5 terabytes on it and is 75% full, meaning it's probably not a mirror. It is some kind of
Btrfs skunk RAID. So it's probably using both drives in tandem and limiting the IOPS and the performance even worse. So, exactly Jim's point, that might be the most those drives are physically capable of, and they're going to have really high latency when you try to read while they're busy writing.
That's actually going to be a flat Btrfs volume on top of an MD RAID 1 from the look of it, being a Synology machine. And I think that when the user said it was 3.5 terabytes, what they're saying is they see 3.5 terabytes usable, which is slightly smaller than the 3.6 terabytes of the smaller drive of the pair.
So it probably genuinely is 75% full. That's fine. Like you don't have much more headroom before you start hitting real problems. By the time you hit 80%, you should be looking at some way to increase your capacity or you will start hitting very significantly increased fragmentation issues on Rust. But you're not there yet. That's not the problem. But again, like I said, if all you've got is the performance of one drive, which is all
a RAID 1 mirror is going to give you when you're doing writes, and you're coming that close to saturating the drive in throughput, then yes, your small block operations, like trying to list the contents of a directory, they are going to be incredibly laggy until that massive write finishes. Right, well, we better get out of here then. Remember, show at 2.5admins.com if you want to send any questions or feedback.
You can find me at joerrest.com slash mastodon. You can find me at mercenariesysadmin.com. And I'm at Alan Jude. We'll see you next week.