No doubt, the hottest topic in I.T. at the start of 2018 continues to be the CPU security risks that have come to light as 2017 came to a close.
Known as “Spectre” and “Meltdown”, these flaws have already had an exhaustive amount written about them, covering how certain CPU design choices can lead to data within the computer being accessed by processes or other code that shouldn’t have access to it, potentially leaving the system open to attack by malicious code running on the computer.
For instance, one of the more concerning attack vectors in this scenario is a server hosting multiple customers on one system. In a world where it is common to run many virtual machines in a hosting environment in order to keep customers separate and secure, allowing this type of code to reach data across those boundaries opens up the possibility of transaction details, passwords and other customer records being exposed, which has obviously raised a large amount of concern among security professionals and end consumers alike.
Off the back of this have emerged the patches and updates required to solve the issue, and along with those are some rather alarming headline figures regarding performance levels potentially taking a hit, with claims of anywhere up to 30% overhead being eaten away by certain types of workload.
As there are many great resources already explaining this, including this one here that can help outline what is going on, I’m not going to delve too much into the background of the issues, but will rather focus on the results of the updates being applied.
We’re going to look at both the Microsoft patch at a software level and test the BIOS update released to support it. There are two issues here, Meltdown and Spectre, and there happen to be two variants of Spectre, one of which can be handled at the software level, with the other requiring a microcode update applied via a BIOS update.
Microsoft has, of course, released its own advisory notes, which are certainly worth a review too and are available here. At this time it is advised that Meltdown and both Spectre variants can affect Intel CPUs and some ARM-compatible mobile chips, whereas AMD is only affected by the Spectre variants, with AMD themselves having just issued an updated advisory at the time of writing, which can be found here. This is also largely an OS-agnostic issue, with Microsoft, Apple, Linux and even mobile OSes all having the potential to be affected, and over the last few weeks all have been rapidly deploying updates and patches to their users.
At this point, I’m just going to quote a portion taken from the Microsoft link above verbatim, as it outlines the performance concerns we’re going to look at today. Keep in mind that in the text below “variant 1 & 2” are both referring to the Spectre issues, whereas Meltdown is referred to as simply “variant 3”.
One of the questions for all these fixes is the impact they could have on the performance of both PCs and servers. It is important to note that many of the benchmarks published so far do not include both OS and silicon updates. We’re performing our own sets of benchmarks and will publish them when complete, but I also want to note that we are simultaneously working on further refining our work to tune performance. In general, our experience is that Variant 1 and Variant 3 mitigations have minimal performance impact, while Variant 2 remediation, including OS and microcode, has a performance impact.
Here is the summary of what we have found so far:
- With Windows 10 on newer silicon (2016-era PCs with Skylake, Kabylake or newer CPU), benchmarks show single-digit slowdowns, but we don’t expect most users to notice a change because these percentages are reflected in milliseconds.
- With Windows 10 on older silicon (2015-era PCs with Haswell or older CPU), some benchmarks show more significant slowdowns, and we expect that some users will notice a decrease in system performance.
- With Windows 8 and Windows 7 on older silicon (2015-era PCs with Haswell or older CPU), we expect most users to notice a decrease in system performance.
- Windows Server on any silicon, especially in any IO-intensive application, shows a more significant performance impact when you enable the mitigations to isolate untrusted code within a Windows Server instance. This is why you want to be careful to evaluate the risk of untrusted code for each Windows Server instance, and balance the security versus performance tradeoff for your environment.
For context, on newer CPUs such as on Skylake and beyond, Intel has refined the instructions used to disable branch speculation to be more specific to indirect branches, reducing the overall performance penalty of the Spectre mitigation. Older versions of Windows have a larger performance impact because Windows 7 and Windows 8 have more user-kernel transitions because of legacy design decisions, such as all font rendering taking place in the kernel. We will publish data on benchmark performance in the weeks ahead.
The testing outlined here today is based on current hardware and Windows 10. Specifically, the board is an Asus Z370 Prime A, running on a Samsung PM961 M.2 drive, with a secondary small PNY SSD attached. The CPU is an i5 8600 and there is 16GB of memory in the system.
Software-wise, Windows updates were completed right up to the 01/01/18 point. The patch from Microsoft to address this was released on 03/01/18 and is named “KB4056892”. I start the testing with the 605 BIOS from late 2017 and move through to the 606 BIOS, designed to deliver the microcode update specified by Intel.
Early reports have suggested a hit to the drive subsystem, so I’m going to test this at each stage, and of course I’ll be monitoring the CPU performance as each step is applied. Also keep in mind that, as outlined in the Microsoft advisory above, different generations of hardware and solutions from different suppliers will be affected differently, but as Intel is suggested as being the hardest hit by these problems, it makes sense to examine a current generation to start with.
Going into this, I was hopeful that we wouldn’t see a whole load of processing power lost, simply because the already public explanations of how the flaw could affect the system didn’t read as though it should majorly impact the way an audio system handles itself.
Largely it’s played out as expected. When you’re working away within your sequencer, the ASIO driver is there doing its best to keep itself as a priority, and generally, if the system is tuned to work well for music, there shouldn’t be a million programs in the background being affected by this and causing the update to steal processing time. So, given we’re not running the sort of server-related workloads that I would expect to cause an upset here, I was fairly confident that the impact wouldn’t be as bad as some suggestions had made out, and largely on the processing side it plays out like that.
However, prior to starting the testing, it was reported that storage subsystems were taking a hit due to these patches, and that of course demanded that we take a look at it along the way too. Starting with the worst news first, those previous reports are very much on the ball. I had two drives connected and below we see the first set of results taken from a Samsung M.2 SM961 model drive.
To help give you a little more background on what’s being tested here, each test should be as follows:
- Seq Q32T1 – sequential read/write with a queue depth of 32 on a single thread
- 4K Q32T1 – random 4K read/write with a queue depth of 32 on a single thread
- Seq – sequential read/write with a single queue and thread
- 4K – random 4K read/write with a single queue and thread.
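To make the difference between the two access patterns concrete, here is a minimal Python sketch that times a sequential pass and a random 4K pass over a scratch file using a single queue and thread. It is not the benchmark tool used for the results in this article, and because the operating system's file cache will usually serve these reads, the timings only illustrate the pattern rather than measure the drive. The function names, the 8 MiB file size and the 1 MiB sequential block size are all assumptions made for the illustration.

```python
import os
import random
import tempfile
import time

RANDOM_BLOCK = 4096           # 4 KiB, the block size of the "4K" tests
SEQ_BLOCK = 1024 * 1024       # assumed 1 MiB block for the sequential pass


def time_reads(path, offsets, size):
    """Read `size` bytes at each offset and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            f.read(size)
    return time.perf_counter() - start


def compare_patterns(file_size=8 * 1024 * 1024, seed=0):
    """Time a sequential pass and a random 4K pass over a scratch file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(file_size))
        path = tmp.name
    try:
        # Sequential: large blocks read in ascending order.
        seq_offsets = range(0, file_size, SEQ_BLOCK)
        # Random 4K: the same bytes, but in small blocks in shuffled order.
        rand_offsets = list(range(0, file_size, RANDOM_BLOCK))
        random.Random(seed).shuffle(rand_offsets)
        return {
            "sequential_s": time_reads(path, seq_offsets, SEQ_BLOCK),
            "random_4k_s": time_reads(path, rand_offsets, RANDOM_BLOCK),
        }
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, seconds in compare_patterns().items():
        print(f"{name}: {seconds:.4f}")
```

The random pass issues far more read calls for the same amount of data, and each call crosses from user code into the kernel, which is the kind of transition the mitigations are reported to make more expensive.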
There is no doubt a performance hit here on the smaller 4K transfers, and it is amplified under the heavier queued workload of the 4K Q32T1 test. On the flip side, the sequential handling seems to escape relatively unscathed and in some instances even improves to some degree, so there is some trade-off here depending on the workload being handled.
The gains did confuse me at first, and whilst first sifting through the data I started to wonder whether, given we were running off the OS drive, other services had perhaps skewed the results slightly. Thankfully, I also had a project SSD hooked up to the system, so we can compare a second data point against the first.
The 4K results still show a decrease and the sequential results once again hold fairly steady with a few read gains, so it looks like some rebalancing of the performance levels has taken place here too, whether intentional or not.
The DAWBench testing, on the other hand, ends up with a more positive result, at least with the DSP test. This time around I’ve pulled out the newer SGA-based DSP test, as well as the Kontakt-based DAWBench VI test, and both were run within Reaper.
The result of the DSP test, which concentrates on loading up the CPU, shows no difference that doesn’t fall within the margin of error and variance. It should also be noted that the CPU was running at 99% load when it topped out, so we don’t appear to be losing any overhead in that regard.
With the Kontakt based DAWBench VI test, we’re seeing anything between 5% and 8% overhead reduction depending on the ASIO buffer, with the tightest 64 buffer suffering after each update whereas the looser settings coped better with the software update before taking a small hit when we get up to the 256 buffer.
Ultimately the concern here is how will it impact you in real terms?
The minor loss of overhead in the second testing set came from a Kontakt-heavy project, and the outcome of the drive tests would suggest that anyone with a sample library that relies heavily on disk streaming may wish to be careful here with any projects that were already on the edge prior to the update being applied.
I also timed that project being loaded across all 3 states of the update process as I went with the baseline time frame to open the project being 20 seconds. After the software update, we didn’t see a change in this time span, with the project still taking 20 seconds to open. However, the BIOS update once applied along with the OS update added 2 seconds to this giving us roughly a 10% increase in the project load time.
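As a quick check of that arithmetic, here is a small Python sketch. The helper name `percent_change` is my own for illustration, and the two timings are the 20 second baseline and the 22 seconds seen after the BIOS update.

```python
def percent_change(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


baseline_s = 20.0  # project load time before any updates
patched_s = 22.0   # project load time with the OS patch and BIOS update applied

print(f"{percent_change(baseline_s, patched_s):.0f}% longer")  # prints "10% longer"
```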
So at this time, whilst any performance loss is certainly not welcome, we’re thankfully not seeing quite the huge skew in the performance stakes that has been touted, and we are certainly well short of the 30% figure that was initially being suggested for the CPU hit.
There have been suggestions by Microsoft that older generations might be more severely affected, and from the description of how it affects servers I suspect that we may well see that 30% figure and even higher under certain workloads in server environments, but I expect that it’ll be centered more around the database and virtual machine server segments than the creative workstation user base.
Outside of our small corner of the world, TechSpot has been running a series of tests since the news broke, and it’s interesting to see that other software outside of the audio workstation environment seems to be behaving largely the same for a lot of end users, as do the storage setups that they have tested. If you’d like to read through those you can do so here.
The issue was discovered back over the course of 2017 but largely kept under wraps so that it couldn’t be exploited at the time. However, the existence of the problem leaked before the NDA was lifted, and it feels like a few of the solutions that have been pushed out in the days since may have been a little rushed in order to stop anyone more unethical capitalizing upon it.
As such, I would expect performance to bounce around over the next few months as they test, tweak and release new drivers, firmware and BIOS solutions. The concern right now for firms is ensuring that systems around the world are secure and I would expect there to be a period of optimization to follow once they have removed the risk of malware or worse impacting the end user.
Thankfully, it’s possible to remove the patch after you apply it, so in a worst-case scenario you can still revert and block it should it have a more adverse effect upon your system, although that will then leave you open to possible attacks. Of course, leaving the machine offline will protect you, but that can be a hard thing to do in a modern studio where software maintenance and remote collaboration are both almost daily requirements for many users.
How you choose to proceed will no doubt be system and situation specific, and I suspect that as updates appear the best practice for your system may change over the coming months. Certainly, the best advice I can offer here is to keep your eye on how this develops, make the choices that keep you secure without hampering your workflow, and review the situation going forward to see if further optimizations can help restore performance to pre-patch levels as a resolution for the problem is worked upon by both the hardware and software providers.