Microsoft's SSH telemetry sends information about every SSH connection in real time, including client versions and cipher suites. While it doesn't track who connected, the instantaneous nature of the data collection feels invasive, similar to how law enforcement collects call metadata. This level of detail, especially for server-side connections, is seen as excessive and unnecessary for legitimate purposes like retiring outdated ciphers.
Microsoft's SSH telemetry collects the version of SSH used, remote protocol error lists, peer versions, supported ciphers, compression mechanisms, message authentication codes, and proposed host keys. This includes detailed metadata about the connection setup, which is more information than Microsoft would reasonably need for operational purposes.
Let's Encrypt is introducing six-day certificates to enhance security by reducing the validity period of SSL certificates. This move aligns with industry trends, such as Apple and Google pushing for shorter certificate lifespans to prevent misuse of expired domains or stolen keys. Automation makes frequent renewals manageable, and shorter cycles ensure compromised certificates are invalidated faster.
Let's Encrypt must scale its infrastructure to handle a significant increase in certificate issuance, potentially up to 100 million certificates per day in the future. Additionally, shorter validity periods require better monitoring and faster response times to renewal failures, as there is less buffer time to address issues before certificates expire.
Setting up SPF and DMARC records keeps spammers from successfully spoofing the domain for phishing or spam, protecting the domain's reputation. Without these records, spam claiming to come from the domain can lead to blacklisting, affecting future email delivery or even causing firewalls to block the domain entirely. Proper DMARC configuration also allows domain owners to receive reports on fraudulent email activity.
Sluggish performance during file transfers on a Synology NAS is often due to network saturation, especially when using NFS over a 1 Gbps connection. If the drives are CMR and not SMR, the bottleneck is likely the network or a single-threaded NFS configuration. Running iostat or checking local performance via SSH can help identify whether the issue is with the drives or the network setup.
NFS performance can be improved by enabling server-side file copying, which avoids the need to transfer data over the network. On file systems like ZFS or Btrfs, features like reflinks or block reference trees allow files to be cloned locally, reducing network load. Additionally, ensuring the network is not saturated and using multi-threaded NFS configurations can help.
Two and a half admins, episode 227. I'm Joe. I'm Jim. And I'm Alan. And here we are again. And before we get started, you've got another plug for us, Alan. Winter 2024 Roundup, Storage and Network Diagnostics. Yeah, this is a collection of articles and stories from Klara about storage and networking and diagnosing different problems and just how to work around them. So we have interesting cases we solved on storage and interesting constructs for networking to be able to do things like
Inside one computer, how do you simulate a bunch of conflicting networks in a way that they can all work together? Right, well, link in the show notes as usual. Someone messaged you on Mastodon recently, Jim, with a link to a video from DEF CON 32 about SSH on Windows sending telemetry to Microsoft. Yeah, it turns out that every time you shell into a Windows server running the native Microsoft-made SSH server,
It actually sends information about the connection off to the mothership, lets Microsoft know when somebody connected and what client version they connected with. And it caught my attention because, you know, I'm usually and fairly famously not the first one to complain about telemetry, but...
This seems a little aggressive to me. I can see some value in knowing what client software and what cipher suites get selected and knowing when you can retire support for older things, especially on the server side. Like on the client side, there's not as much risk. But on the server side, if you know that, you know, less than 0.1% of incoming connections try to use these old ciphers, then maybe you can finally get rid of them. And I know that's...
always a thing. It's unclear for the people making open source SSH: are people actually done using this? Although they've taken the stronger stance of, we don't care if you're done using it or not, it's not secure, so we're not supporting it anymore, so, you know, upgrade your shit. But I can see why they would want it, and the telemetry doesn't actually track who connected and so on, but it does still seem to be a bit of an overstep. I don't like the instantaneous nature of it. It doesn't feel like there's really an attempt
to go above and beyond to obscure the original and potentially private information. It seems like Microsoft is just like, well, we can get away with getting this, so we'll take this.
It would feel a little different to me if it was a case of, oh, you know, once a month or so, Microsoft will phone home and say, these are the ciphers that we saw in use over the month or something. But just the idea that like every time somebody connects in real time, Microsoft gets an update, letting it know that that connection happened and a fair amount of metadata about the connection. It kind of reminds me of, you know, the way that modern police work tends to handle telecommunications, which is,
They don't necessarily even care what you're actually saying on the phone. The first thing that they want is just the subpoena of the records. They want to know what calls happened and when they happened and the metadata about that call.
And it just, it feels like that is essentially what Microsoft is getting here. Is this not pretty standard practice for Microsoft software though? A lot of telemetry is certainly standard practice for Microsoft, but there's some interesting differences here. One of the issues is that although we know Microsoft takes a ton of telemetry, we don't always know what that telemetry is because when we're talking about things that are purely internal to Windows,
It's kind of hard to isolate like what actually triggers an outbound connection to deliver some telemetry somewhere. We don't have any source code to examine and we don't necessarily know exactly what's happening when. So all we get is this kind of uneasy feeling. We know there's a lot of telemetry and we don't know all the details about it.
But now you bring something like SSH in and it becomes a little easier to see what's going on. You know, if you're an interested network administrator, you don't exactly have to be a genius to figure out every time somebody connects to the SSH server, this many packets go winging off to a Microsoft owned IP address.
Looking at the snippet of source code shown in the video on slide 52, there's a function called send SSH version telemetry. And it basically sends, quote, our version of SSH, the remote protocol error list, and the peer's version of SSH. So that would just be the version strings that go back and forth. But really, to Jim's point, the fact that this gets sent instantaneously rather than kind of aggregated and batched means that
Somebody knows when that happens. So even if you're, you know, SSH-ing on an odd port and it's all encrypted so nobody can tell what's going on, just the fact that whenever we see a packet come in on this port and the connection gets set up, then we always see a packet out to Microsoft. So we know it must be SSH. Helpfully, Microsoft publishes on their PowerShell GitHub account the copy of OpenSSH-Portable that they've mangled for Windows and they've
mostly kept their code separate under a contrib/win32 directory. And looking at the actual ssh telemetry.c file, you can see in more detail the telemetry they're sending. And based on that, it would appear this applies to both the client and the server on Windows. So if we're just using the built-in SSH client on Windows, Microsoft is telling themselves about the version
of SSH that every machine you connect to is using. But it also looks like they send quite a bit of detail about the connection setup, including what ciphers you offered, what ciphers the server offered, which compression mechanisms were available, and which MACs, message authentication codes, were possibly being used.
So not just what ones were selected, but what each end supported as well, which seems quite excessive. Also, very specifically proposed host keys on both client and server end, which is...
a unique fingerprint that says for absolute certain, if you know who held those keys, that like, you know, this is who was connecting right then. Well, the host keys would be who they were connecting to. But yeah, you know, when you normally SSH to a new host and you get that warning that, hey, I've not seen this pub key before. If Microsoft is getting every one of those, then yes, they would probably be able to, just from the fingerprints collected by things like Shodan, know which server you're actually SSH-ing to if it was anything remotely popular and internet accessible.
I'm not sure actually in the protocol if that is going to actually be the host key or just the different algorithms for the host keys. But either way, it's excessive. But if it's the actual public key of the host key, that is kind of egregious rather than just excessive. Well, it's certainly a variable named client proposed host key and server proposed host keys.
Right, so the client doesn't actually send a host key. They just propose which host key algorithms they understand. You'd have to run SSH with a lot of dash Vs and actually watch the exchange back and forth to see what normally gets sent there. But it's a lot more information than Microsoft could really ever legitimately need.
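As a rough illustration of the version strings being discussed, here is a minimal sketch: the SSH identification banners are exchanged in cleartext before key exchange, so they can be observed with nothing but a socket. The hostname below is a placeholder, and this only shows the banner step; seeing the offered ciphers, MACs, and host key algorithms still takes ssh -vv or a packet capture, as described above.

```python
# Minimal sketch: observe the cleartext SSH version-string exchange.
# "example.com" is a placeholder host; this only shows the banner exchange,
# not the KEXINIT lists of ciphers, MACs, and host key algorithms.
import socket

HOST, PORT = "example.com", 22

with socket.create_connection((HOST, PORT), timeout=10) as sock:
    # The server sends its identification string first (RFC 4253, section 4.2).
    server_banner = sock.recv(256).split(b"\r\n")[0].decode(errors="replace")
    # The client then sends its own identification string.
    sock.sendall(b"SSH-2.0-ExampleClient_0.1\r\n")

print("server version string:", server_banner)  # e.g. "SSH-2.0-OpenSSH_9.6"
```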
So if you really want to use SSH on Windows, you should probably get the real OpenSSH from upstream OpenSSH and run that on Windows or run it under Windows Subsystem for Linux, where you're going to get the official Ubuntu package of SSH. And it's not going to have all the Microsoft telemetry in it or just use a different open source client like PuTTY or something. But...
I'd probably avoid the built-in SSH client. You know, it was nice to finally have that as a convenience feature, but it turns out Microsoft spoils everything. If you do want to use that specific client, but without the telemetry, you can fetch the project from GitHub yourself and build it and use your own binaries. At least as long as you trust the project to be telling you the truth when it says that
Any binaries built from the code on GitHub will have the telemetry disabled. It's just the ones built for Windows that have the telemetry manually enabled. Well, I'm guessing that's so that I don't build a version of it and just flood Microsoft's telemetry with fake nonsense until they decide it's not useful anymore. Well, there's nothing stopping you from turning on the flag that enables the telemetry when you build your copy, if that's what you want to do. And my copy just says, everybody's running OpenSSH version 5,000.
Surely version 4.20.69. Coming soon to Let's Encrypt will be certificates that last six days, as opposed to the 90 days, which is the standard right now. Yeah, this is interesting. We talked a couple of weeks or months ago about Apple and, I think, Google wanting to get it to where, for all certificates, the browsers wouldn't accept any certificate longer than 45 days, to force
certificates to have to keep getting reissued to make sure that nobody's managing to have an SSL certificate for an expired domain or just to stop bad actors much more quickly. And if you've implemented the automation for Let's Encrypt and are doing this every 90 days, there's really no difference between doing it every 90 days and doing it every four days. You know, they're talking about having certificates that are good for about six days. So if you renew it at about two thirds of its life, like you normally would now, then you're
down to just a couple of days. But with that automation, it's not really costing you anything extra to do that because you've already built the automation. Whereas with the Apple and Google one we were talking about, if you're already slightly suffering from the fact that they shrunk the validity period to just over one year, going down to 45 days takes it from this annoying thing we do once a year to something we have to do basically every month. And it would
continue to push you to properly automate it.
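As a rough illustration of that renew-at-about-two-thirds-of-lifetime arithmetic, here is a minimal sketch; the fraction and lifetimes are just illustrative, not anything Let's Encrypt mandates.

```python
# Minimal sketch of the "renew at roughly two-thirds of lifetime" rule of
# thumb discussed above, applied to 90-day and 6-day certificates.
from datetime import timedelta

def renewal_age(lifetime: timedelta, fraction: float = 2 / 3) -> timedelta:
    """Certificate age at which the automated renewal should fire."""
    return lifetime * fraction

for days in (90, 6):
    lifetime = timedelta(days=days)
    renew_at = renewal_age(lifetime)
    buffer = lifetime - renew_at
    print(f"{days:>2}-day cert: renew after ~{renew_at.days} days, "
          f"leaving ~{buffer.days} days of buffer if renewal fails")
```

With a 90-day certificate that works out to renewing around day 60 with about 30 days of slack; with a six-day certificate it is roughly day 4, leaving just a couple of days.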
And so Let's Encrypt getting down to that short seems interesting to me. And it sounds like a kind of logical evolution. But in their post here, they're talking about some of the challenges that this will pose, considering that if they have to start issuing 20 times as many certificates as they do now, how do they scale their infrastructure for that? And saying, you know, at the current point, it's not inconceivable that by a decade from now, they'd have to be issuing 100 million certificates per day. And it's like, that seems like a really big number. But...
Thinking about it, 10 years ago, we wouldn't have thought that 5 million certificates a day was possible either. I agree with you. The automation makes it not seem like that big of a deal. To the degree that it does seem like a challenge, it is kind of nice now feeling like, well, if something goes wrong with the renewal process, it's going to go wrong soon enough that, you know, I'll have like a month to get it fixed before it actually bites me.
Whereas with a six-day cycle, you know, if you're normally renewing four days into a six-day cycle, now it's like, well, you better be on it if something's gone wrong. Like you need to detect that issue very quickly and you need to respond to it promptly or it will bite you in the butt a lot quicker. Yeah.
And I'm not saying that that's something that you can't deal with. I'm not saying that it's an unbearable burden, but it is something to be aware of. Especially if your certificate validity period is six days, you probably want to renew every two days or so in order to make sure that you have that bit of buffer time. So if it doesn't work, you have time for A, it to automatically try again and maybe work, and that you still have some time to respond if that also doesn't work. It means you've really got to up your monitoring game, essentially.
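Upping the monitoring game here mostly means comparing a certificate's remaining lifetime against a threshold. A minimal sketch of that kind of check follows, with a placeholder hostname and threshold.

```python
# Minimal sketch of an expiry check in the spirit of a Nagios plugin: alert
# when a certificate has less renewal headroom than expected. The host and
# threshold below are placeholders.
import socket
import ssl
import time

HOST, PORT = "example.com", 443
WARN_DAYS = 30  # sensible for 90-day certs; a 6-day cert wants something like 2

ctx = ssl.create_default_context()
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()

expires = ssl.cert_time_to_seconds(cert["notAfter"])
days_left = (expires - time.time()) / 86400
if days_left < WARN_DAYS:
    print(f"WARNING: {HOST} certificate expires in {days_left:.1f} days")
else:
    print(f"OK: {HOST} certificate has {days_left:.1f} days left")
```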
Well, in my case, it would be just changing the threshold in my monitoring, right? My Nagios is already monitoring all my important domains and tells me when the certificate's getting anywhere near its expiration. And so if it's a Let's Encrypt certificate, as soon as it's down to less than 30 days, it's like, hey, that's not supposed to happen.
I think it's 30 or between 30 and 45 days, it's supposed to auto renew with another newer one and extend the lifetime out. And if it's not doing that, then that's an alert and that gets addressed. And luckily I haven't had one of those in ages. But to Joe's point, the vast majority of people are not tracking their SSL cert expiration dates that way. I'm generally doing the same thing. Honestly, I could do a better job of it than I've been doing because again, you know, even with Let's Encrypt, like you have a little bit of time to figure it out, but
But as these cycles get shorter and shorter, I think it's going to get more and more important to be like, no, that needs to go to the forefront of it. And also, we may be looking at some indications that just like the kind of monitoring that Alan and I talk about that most people like, well, that sounds nice, but, you know, I have time for that. We may be getting towards a period where like that's just kind of the entry level technology.
to get into doing things professionally at all. It is worth saying that in this post, they say, our longstanding offering won't fundamentally change next year, but we are going to introduce a new offering that's a big shift from anything we've done before, short-lived
certificates with a lifetime of six days. So it sounds like at least initially the standard 90 day ones will still be available. Yeah, at least initially. But who knows if the six day ones go well, then that may well become the standard and the 90 day ones may get phased out. Well, in particular,
It seems to make sense for them to have this technology ready, specifically with the CA/Browser Forum talking about shortening the maximum certificate lifetime so that browsers will refuse any certificate that has a lifetime of more than 45 days. It really makes sense for Let's Encrypt to have this up and running, where people can already be deploying it and making sure it works before it becomes basically mandatory. Otherwise, browsers won't accept the certificates. So I think it's a really smart move and will benefit
both the implementation of shorter-lifetime certificates for everyone and especially the community of us professionals that have to manage this stuff all the time, who get to have experience with it before it becomes mandatory. I feel like this is a good opportunity for monitoring-as-a-service companies to do quite well. Quite possibly. And I know Let's Encrypt tried to do a little bit of something with email notifications early on. I don't know if they still do that. But yeah, I definitely think that
building on Jim's point that if this becomes the standard you have to do to be able to run a domain and so on, people will either set up that infrastructure themselves or turn to companies to do that. Or even, you know, the interesting other one we've seen is people that
aren't actually controlling their own Let's Encrypt at all. You know, if you use a lot of popular, like, WordPress hosting platforms and so on, because they can put the file in the right place, in the .well-known directory, they do the Let's Encrypt for you and you don't even know that you're using Let's Encrypt. Your web hosting provider takes care of it. Yeah, it's just free SSL is what they call it. Yeah, whereas in the past that involved them actually having to pay money for certificates. And now they're like, well, we could just abuse Let's Encrypt. Yeah, now it's just a cron job. It's free real estate.
But ultimately, we're in favor of this then. Yeah. When it comes down to it, the shorter security certificate renewal cycle is a security win.
At least as long as you trust the distribution mechanism more than you trust the general ecosystem of servers, many of which get abandoned and compromised and have keys stolen from them. Because that's the reality of it: the world is full of compromised servers that just get, like, abandoned and left alone, and nobody ever does anything about the fact they've been compromised, and they get hijacked and
used for different things or the keys can be stolen from them. So the less time that those keys will remain valid for some use other than what was originally intended, the better off everybody is. Yeah, although the crontab will probably still run. Yeah.
renew the certificate, so maybe it won't help that much. But it's definitely interesting to see, and I think it will be good in the end. I don't know if it will cause fewer problems. We talked, again months ago, about when Microsoft accidentally let some certificates expire and it caused a whole bunch of things to not work anymore. You know,
If that can happen anytime somebody doesn't look at something for six days straight, then that might increase the chances of that. But at the same time, to our point, if it requires that level of automation and it's going to break that quickly, that's going to be a much better forcing function for people having better automation and more monitoring on it to catch it before that happens. So if it expires every six days, you're probably going to keep a closer eye on the expiration than when it expires every...
one to five years and nobody remembers about it. Alan, it sounds like you're supporting this policy specifically as an example of tough love for Microsoft. Not just for Microsoft, for everybody who lets their damn SSL certificates expire.
Which in the past has included myself. It seems like this is a slight cranking of the convenience and security dial, slightly away from convenience, slightly towards security. Yeah, and I think that in this case, that's a good thing.
Okay, this episode is sponsored by ServerMania. Go to servermania.com slash 25A to get 15% off dedicated servers recurring for life. ServerMania is a Canadian company with over a decade of experience building high-performance infrastructure hosting platforms for businesses globally. ServerMania has up to 20 gigabits per second network speeds in eight locations worldwide for optimized global reach.
They have flexible custom server configurations tailored to unique needs, as well as a personal account manager offering free consultations. With 24-7 live chat support and support ticket response times under 15 minutes, you can always keep your systems running smoothly.
Alan's been a happy ServerMania customer for over seven years, so support the show and join him at a hosting provider that truly delivers. Get 15% off dedicated servers recurring for life at servermania.com slash 25A and use code 25ADMINS. That's servermania.com slash 25A and code 25ADMINS.
I saw an interesting post from Jerry Lerman on Mastodon, which was an important reminder if you own a domain name and don't use it for sending email. Essentially, the message is that if you own a domain and you don't plan on using it to send email yourself, you should create SPF and DMARC records in the DNS that make it clear to anybody who does receive mail claiming to be from your domain that, well, it's not legitimate.
And the reason that you do this is twofold. One, because you're a good Internet citizen and you want to cut down on the amount of spam that reaches people's inboxes. And that's a good way to do it. And two, because that way, that spam that has your domain name on it won't be tarnishing the reputation of your domain.
If a lot of this spam gets through and there aren't SPF and DMARC records making it clear that that spam is not legitimate and was not sent by the actual owner of the domain, then what can happen is that trashes the reputation of the domain, which can not only cause you problems with delivering email down the line, should you choose to attach email services.
It can also cause issues that will cause firewalls to block you outright. There are a lot of corporate firewalls out there that actually use email spam block lists as just a one-hit kill, no-you-can't-go-to-that-A-record type of firewalling. So you don't want to run afoul of these kinds of lists, and setting SPF and DMARC records that say, hey, any email that you get claiming to be from my domain, it's not from me, that's not from me.
That's what keeps you from taking that reputational hit. Yeah, so basically setting an SPF record that says any email is junk and a DKIM that's saying anything not signed coming from this domain is junk. But what they point out that's interesting is if you actually set up DMARC properly, you can get reports about all the email that does get dropped as spam pretending to come from your domain. If you're just curious, who's trying to pretend to be you?
Actually, if you set up DMARC properly, you have to get those reports because if you don't actually include an email address that does accept email for those reports to go to, you have not properly configured DMARC.
which can be a mild annoyance. You know, I set up DMARC on my own domain and I get reports from Google and Microsoft every day. And about once or twice a week, I'll get a report from some random provider I've never heard of. And they're never really anything that you want to sit down and read, but you get them. You mentioned corporate firewalls. Presumably it's bad for SEO as well. Yes, absolutely. I mean, just in general, you don't want to be associated with spamming operations.
An awful lot of Internet connectivity and communication in general these days is very adversarial in nature. You know, you have filtering mechanisms that are explicitly designed to keep out the attacks and, you know, let in the good stuff. So anything that's associating you with, you know, people who are on the attacker side of things, it's going to make your life difficult in many aspects.
and somewhat unpredictable ways: as your name spreads farther and your reputation gets poorer and poorer, you discover there are more and more different ways that people will find themselves blocked off from accessing your services.
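For reference, the lockdown being described usually amounts to an SPF record of "v=spf1 -all" and a DMARC record at _dmarc.yourdomain with p=reject plus an rua address for the aggregate reports. Here is a minimal sketch of checking a domain for those records, assuming the third-party dnspython package and a placeholder domain.

```python
# Minimal sketch: verify a parked domain publishes "no mail" SPF/DMARC records.
# Assumes the third-party dnspython package (pip install dnspython).
import dns.resolver

DOMAIN = "example.com"  # placeholder parked domain

# The records discussed above: SPF that authorizes no senders, and a DMARC
# policy of reject with an aggregate-report (rua) mailbox.
EXPECTED_SPF = "v=spf1 -all"
EXPECTED_DMARC_PREFIX = "v=DMARC1; p=reject"

def txt_strings(name: str) -> list[str]:
    try:
        answers = dns.resolver.resolve(name, "TXT")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return []
    return [b"".join(r.strings).decode() for r in answers]

spf = [s for s in txt_strings(DOMAIN) if s.startswith("v=spf1")]
dmarc = [s for s in txt_strings(f"_dmarc.{DOMAIN}") if s.startswith("v=DMARC1")]

print("SPF locked down:", spf == [EXPECTED_SPF])
print("DMARC reject policy:", any(s.startswith(EXPECTED_DMARC_PREFIX) for s in dmarc))
```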
Let's do some free consulting then. But first, just a quick thank you to everyone who supports us with PayPal and Patreon. We really do appreciate that. If you want to join those people, you can go to 2.5admins.com slash support. And remember that for various amounts on Patreon, you can get an advert-free RSS feed of either just this show or all the shows in the Late Night Linux family. And if you want to send in your questions for Jim and Alan or your feedback, you can email show at 2.5admins.com.
Jonas writes, I'd love to hear your thoughts on my NAS situation. My problem is that I find it slow during transfers. I have the folders mounted on my computer using NFS, and I typically copy or move a movie from one folder to another. Sometimes this is on the storage pool, sometimes it is not, but in both cases it feels extremely sluggish.
When the files are being written, it is almost impossible to browse to a new directory until the transfer is complete. The folder simply doesn't load. I'm connected to the NAS via a switch at 1 Gbps. The maximum write speed I get is 115 MB/s, which maxes out the network, but not the drives. Is it my setup, or am I crazy to expect a responsive system during this?
The NAS is a Synology DS920 Plus with four bays. I think the easiest way to figure out what's the bottleneck here is, while such a copy is going on and the drives are busy with the writes, to SSH into the Synology and see if you can browse around the directories and run ls, and whether that's stalling or not. It could be that the NFS connection is single threaded, so nothing else can happen until the writes finish, and that's what's going on there, and
looking at getting more threads for your NFS setup could solve it. But if you do see sluggishness locally on the NAS while doing the copy, then the problem is much more likely in Synology's use of Btrfs or just your drives. One thing to look at there is running iostat and seeing if the drives are actually constantly busy or if they have...
some idle time that you could be using to do the browsing, but it's not happening. In addition to that, I think it's worth pointing out that the user themselves acknowledges they are saturating the network.
And they've already said that if they shell in and move the files locally, they don't see the slowdown. It's only when they're doing it, you know, via the NFS mount. So what's happening is you're doing a brute force copy over the network. When this happens, you're saturating the network, in particular if there's any Wi-Fi in this link. Now you're talking about 115 megabytes a second. So probably the NAS itself is not on Wi-Fi, but like if you're accessing this from a laptop over Wi-Fi and you're already saturating the network, then
You can have some real latency issues just issuing additional commands because you're already saturating the network. And with Ethernet, you've got bidirectional, you're full duplex. So you can upload at the same time you're downloading at full saturation. But if you're going over Wi-Fi, Wi-Fi is half duplex, not full. And if you're saturating it, then everything is going to suck until you stop.
Now, Jonas did specifically say that the drives are CMR, so we're not dealing with SMR drives here. Yeah, because I can see how that could be a quick suspicion when you hear drives that take a really long time to do things while you're writing to them. Because SMR drives will do that when they have to garbage collect and move a bunch of 256-megabyte chunks of something to another SMR zone. But no, these are CMR drives.
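For anyone who wants to script the drive-busyness check Alan suggested earlier, a rough Linux-only sketch follows; /proc/diskstats exposes the milliseconds each device has spent with I/O in flight, so sampling it twice gives something like iostat's %util. The device name is a placeholder.

```python
# Minimal sketch of an iostat-style %util check by sampling /proc/diskstats.
# The 13th whitespace-separated field is milliseconds spent doing I/O.
# Linux-only; "sda" is a placeholder device name.
import time

DEVICE = "sda"
INTERVAL = 5  # seconds

def io_ticks_ms(device: str) -> int:
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[12])  # time spent doing I/Os (ms)
    raise ValueError(f"device {device} not found")

before = io_ticks_ms(DEVICE)
time.sleep(INTERVAL)
after = io_ticks_ms(DEVICE)

util = 100 * (after - before) / (INTERVAL * 1000)
print(f"{DEVICE}: ~{util:.0f}% busy over the last {INTERVAL}s")
```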
One thing that occurs to me is that he was specifically talking about, you know, moving files, which is going to have to happen as a brute force copy operation both ways over the network. Do you know of any way to convince NFS to do server side copies instead? Not on Linux off the top of my head, but.
On FreeBSD, NFS does respect the copy_file_range call, which, if you're using ZFS, or the same thing on Linux if you're using Btrfs, would be able to use the file cloning mechanism. So BRT in ZFS, or I guess just reflinks in Btrfs, to be able to basically clone the file instead of copying it and have that all happen on the server side, so that
nobody has to read the file to make a copy of it. And then when you delete the old one as part of the move, you just decrement the reflink count by one, or the block reference tree counter by one.
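As a sketch of the mechanism being described, Python exposes the same syscall as os.copy_file_range(), which lets the kernel, and where supported the filesystem's reflink or block-cloning machinery or NFS 4.2 server-side copy, move the data without dragging it through the client. The paths are placeholders, and whether any network traffic is actually avoided depends on the OS, kernel, filesystem, and NFS version in play.

```python
# Minimal sketch: copy via copy_file_range(2) so the kernel/filesystem can
# short-circuit the data movement (reflinks on Btrfs, block cloning on ZFS,
# server-side copy on NFS 4.2) where supported. Paths are placeholders.
import os

def clone_copy(src_path: str, dst_path: str) -> None:
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        while remaining > 0:
            copied = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
            if copied == 0:  # source ended early; nothing more to copy
                break
            remaining -= copied

clone_copy("/mnt/nas/movies/film.mkv", "/mnt/nas/archive/film.mkv")
```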
Klara also implemented this for Samba. So Samba on Linux and FreeBSD will now do this automatically on ZFS. And I think it already did it on Btrfs, where if you are doing a copy or a move, the server can see what you're doing and optimize it and use the file cloning feature in both of these file systems.
All right, now before we move on, there is one last thing I just noticed that I want to bring up. And I feel like this is a mistake that Alan and I both made. We should have caught it and didn't, because we're kind of used to larger systems and thinking at a larger scale.
We already heard that the maximum write speed that they get is 115 megabytes per second. And for either myself or Alan, that sounds like immediately it's like, oh, well, yeah, you've saturated the network and we kind of stopped thinking about it. That's a network problem, not a storage problem. However, the actual topology here is just a pair of Rust drives, which means 115 megabytes per second write speed is getting real close to the absolute maximum those drives can actually accomplish as well.
With a really easy workload, a small Western Digital CMR drive might hit about 150 megabytes per second on a local write operation with no other bottleneck. Now, the thing about that is if you come that close to saturating a Rust drive with a very heavy write stream, I don't care what you do. Yes, anything else is going to have awful latency as that drive is trying to manage the storm of data that it's got to move around.
So yeah, if you are actually at that very moment performing writes at 115 megabytes per second on a simple two-rust drive mirror, you should absolutely expect it to be very pokey to browsing while that's going on. Worse, it looks like it's one six-terabyte drive and one four-terabyte drive, and the pool has 3.5 terabytes on it and is 75% full, meaning it's probably not a mirror. It is some kind of
Btrfs skunk RAID. So it's probably using both drives in tandem and limiting the IOPS and the performance even worse. So, exactly Jim's point, that might be the most those drives are physically capable of, and they're going to have really high latency when you try to read while they're busy writing.
That's actually going to be a flat Btrfs volume on top of an MD RAID 1 from the look of it, being a Synology machine. And I think that when the user said it was 3.5 terabytes, what they're saying is they see 3.5 terabytes usable, which is slightly smaller than the 3.6 terabytes of the smaller drive of the pair.
So it probably genuinely is 75% full. That's fine. Like you don't have much more headroom before you start hitting real problems. By the time you hit 80%, you should be looking at some way to increase your capacity or you will start hitting very significantly increased fragmentation issues on Rust. But you're not there yet. That's not the problem. But again, like I said, if all you've got is the performance of one drive, which is all
a RAID 1 mirror is going to give you when you're doing writes, and you're coming that close to saturating the drive in throughput, then yes, your small block operations, like trying to list the contents of a directory, they are going to be incredibly laggy until that massive write finishes. Right, well, we better get out of here then. Remember, show at 2.5admins.com if you want to send any questions or feedback.
You can find me at joerrest.com slash mastodon. You can find me at mercenariesysadmin.com. And I'm at Alan Jude. We'll see you next week.