Tag Archives: threadripper

2020 Q1 – CPUs in the Studio overview

So, here we are, back with a look over the array of CPUs that arrived around Xmas 2019 and the start of this year. Rather late to the party this time, I fully admit, but it’s been an interesting journey just to get here.

Around the start of last November, Intel announced that the next iteration of its X299 refresh was imminent. The timing stood out as potentially taking the wind out of AMD’s own launch of the much-anticipated 3950X, although what was notable at the time was a lack of chips from either firm being immediately available. AMD managed to start rolling out the 3950X properly a few weeks later, and into our hands over the Xmas period, whilst Intel also got a few chips out over January. Since then, however, availability has remained erratic, due at first to Chinese New Year and subsequently to the rise of the Coronavirus impacting the supply chain of hardware coming out of the Asia region.

Since then AMD has gone on to release further new Threadripper solutions over the last few weeks, which continue to create a buzz in the market at large. Intel, on the other hand, remains dogged by supply chain issues, with the 10900X only appearing whilst I was writing this up and the 10940X still sitting on the list of chips to acquire and test in the near future.

So, availability aside, why was there no testing done late 2019?

Well, we tried, is the answer, but it wasn’t to be. Stock scarcity aside, I did manage to test most of the chips I could find, only to watch it all fall over when the initial 10980XE sample arrived. This was the first time I’d put an 18-core chip through the Reaper-based DAWBench VI test, and it became clear in testing that once we passed 32 threads in this build, some internal dependencies caused by the way Kontakt was being mapped were limiting its ability to balance across more than 16 cores/32 threads successfully.

This issue was escalated back to Vin at DAWBench, who then spent much of his Xmas break rebuilding the testing suite and making it more suitable for high core count CPUs going forward. After a closed beta throughout much of January this year, we’ve finally got a release candidate that can be run over the chips as we start again from the ground up.

This means the hybrid tests that I ran last time, in order to solve the problem of running out of test overhead, are retired in favour of two new builds developed by Vin himself. The SGA 1566 test is pretty close to the version I ran last time, remaining largely the same as the older public build but with the plugin’s audio quality switched to “high performance” in order to place more load on the CPU.

As noted, the new DAWBench VI build has changed to ensure smooth balancing on extreme core counts and is now based around a larger number of single instances. Rather than running multiple instruments per Kontakt instance, it now runs a one-instrument-per-instance layout with multi-processing disabled within Kontakt, allowing the sequencer to spread the load more easily. The new test has some interesting restrictions, such as pre-loading close to 30GB of data when all the instances are live, which makes it a bit more unwieldy in some situations. However, having tested it successfully at 64 threads, it does seem to allow Reaper to manage the load balance to the best of its ability, making me confident that this should remain scalable for quite a while longer going forward.

This isn’t going to be like a normal run down, in that I’m not going in depth with specific chips this time. This is more of a catch-up or state of play post to allow us to see the market as it stands. All prior benchmarks are invalid as comparisons, as these are brand-new tests built to validate the new hardware.

So, from the top, I should mention what I haven’t covered and why, as I’m sure a lot of people are already asking questions about those models.

First off, the big omission is the new Threadripper chips; depending on the model, I saw two different problems that tripped me up here.

In testing, the 64-thread 3970X refused to run cleanly on the DB VI test at a 64 buffer, where it simply crackled constantly with little to no load applied. It ran better at a 128 buffer, but the score still placed it behind a number of far weaker chips, so it didn’t look to handle itself well. At a 256 buffer and upwards it seemed to slowly creep towards the sort of performance levels I would hope to see, pretty much repeating the issues we saw in previous generations with the low latency performance hole.

So, we thought there might be a few usable scenarios to be had, but then we ran the DSP test and hit another snag. The projects in the SGA DSP test would always overload at around 60%–65% load, and I didn’t see a way around this in the time I spent with it. I tried different memory types and speeds, but no matter what, it seemed to cap out there.

I also took a look at the mighty 3990X, although I was beaten here as well, for a slightly different reason. There have been reports in the wider press about software having trouble addressing all the cores, due to underlying Windows interactions, and we saw pretty much that here. With Reaper, we could only address 64 threads out of the 128, and it behaved much like the 3970X in the testing above. I had wondered if the Windows Pro Workstation build, with its support for improved addressing across more cores, might help, but the feedback so far is that it’s currently ineffective, and comments from Microsoft and AMD would appear to bear this out, so it remains one to keep an eye on in the future.

I have managed to take a look at a number of the Intel 10 series enthusiast range chips, although what we see here is a platform that looks to be approaching the end of its current life cycle. Having attempted to change process node a few times now, the strain of extracting ever more performance out of the current platform is beginning to show. This is the first time in a long time where I’ve had to drop the all-core turbo clock lock setting from the Intel enthusiast range, as the overhead simply doesn’t seem to be there at this time to allow for it.

I’ve also retested the common Intel mid-range selection, including the ever popular 9900K. These were amongst the last chips that I took a proper look at, and throughout 2019 they remained popular options, still offering a strong solution where the utmost compatibility is concerned.

The excitement, however, is with AMD’s mid-range in the shape of Ryzen, most notably the previously mentioned 3950X launch. One of the last articles I published last year looked into the difference that overclocking your RAM could make to the system, with improvements to performance becoming ever more obvious as you increased the RAM clock and tightened the timings.

If you look at the officially rated RAM recommendations that accompany the current Ryzen chips, they still outline 3200MHz kits as recommended, and whilst those work fine, they still result in the performance hole that we’ve seen in previous generations of hardware. Even with the 3950X that remains the recommendation, and whilst I’m sure it’s fine for gaming or other less time-sensitive workloads, for a high-performance audio system you want to be cranking those RAM clocks for the best results, it seems.

It’s been discussed quite a lot with this generation that the internal data bus is running at around 3733MHz or thereabouts. Having tried it with 3733MHz RAM kits, it does indeed work great, but packs of RAM at that speed are both fairly rare and pretty expensive in comparison with more common speeds.

The 3200MHz “AMD Optimized” RAM packs that are in circulation do help, reducing the performance hit from the 15%–20% we’ve seen previously down to around 8%–15%. By going up to the 3600MHz AMD optimized kits, we can pretty much remove the performance hole that we saw with previous generations.

So the question comes up: is more simply more? Apparently not, as going over 3733MHz in a lot of testing appears to have the opposite effect to the one desired. It switches the controller to another divider setting, which allows the performance hole to creep in again, meaning that just adding a 4000MHz kit isn’t being advised as a solution.
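The divider behaviour described above can be sketched roughly as follows. The ratios are my reading of the commonly reported Zen 2 behaviour (fabric clock 1:1 with the memory clock up to 3733MT/s, falling back to a 1:2 divider above that), so treat the exact numbers as illustrative:

```python
def effective_fclk(ram_mts):
    """Rough Infinity Fabric clock (MHz) for a given RAM speed (MT/s).

    Zen 2 runs the fabric 1:1 with the memory clock (half the transfer
    rate) up to roughly 3733MT/s; past that, the controller falls back
    to a 1:2 divider, which is where the latency hole creeps back in.
    """
    memclk = ram_mts / 2          # DDR: memory clock is half the MT/s rate
    if ram_mts <= 3733:
        return memclk             # 1:1 ratio, fabric keeps pace
    return memclk / 2             # 1:2 divider kicks in above 3733MT/s

# A 3600MT/s kit keeps the fabric at 1800MHz, whilst a 4000MT/s kit
# actually drops it to 1000MHz - slower than the cheaper kit.
print(effective_fclk(3600))
print(effective_fclk(4000))
```

Which is why simply buying the fastest kit on the shelf works against you here: past the divider switch, the pricier RAM delivers a slower fabric.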

If you don’t have pre-optimized RAM packs already, fear not: with a bit of tweaking you can possibly clock your RAM up to be more efficient. There is an excellent AMD RAM Calculator tool that I was playing with early in testing, and for optimizing your RAM for audio you want to be looking at getting better than 90ns. I was hitting 74ns through manual tweaking of a non-AMD-optimized kit, and 80ns through using a kit with a predefined profile and simply selecting that in the BIOS. Regular packs running at stock settings were offering in the region of 110ns and above, which isn’t great for our requirements. Weighed against the few hours of work overclocking and testing, I would suggest that anyone who wants the best from their system pays the little extra for the pre-approved packs and has the hassle taken out of it.
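To put those figures side by side, here is the comparison being made, using the nanosecond numbers measured above and the 90ns rule of thumb as the audio target:

```python
AUDIO_TARGET_NS = 90  # rule-of-thumb ceiling for low latency audio work

# Measured memory latencies from the testing above
kits = {
    "manually tweaked (non-optimized kit)": 74,
    "AMD-optimized kit, profile selected in BIOS": 80,
    "regular kit at stock settings": 110,
}

for name, latency_ns in kits.items():
    verdict = "OK for audio" if latency_ns < AUDIO_TARGET_NS else "too slow"
    print(f"{name}: {latency_ns}ns -> {verdict}")
```

The stock kit is the only one that misses the target, which is the whole argument for either tweaking manually or paying the premium for a pre-optimized pack.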

So, testing this time is carried out with that in mind and a few additional notes before I lay out the charts.

All tests were done using the current Reaper 6.0.4 build.
All testing was done with an RME Babyface.
X299 systems are running 2933MHz RAM, the optimum for the current 10 series.
Z390 systems are running 2666MHz RAM, again the optimum for the platform.
The AMD systems were all tested with 3600MHz RAM packs with pre-optimized timings, as discussed above.
Windows 10 is running the 1909 build, the latest at the time of writing.

Regarding all-core boosting, as already noted, the X299 platform is simply running too hot for this to be viable without some much heavier duty cooling, due in large part to the 260W+ power draw that it requires when clocked up. Whilst my standard cooler can handle it, by the time you’ve ramped up the cooling to that level it’s not exactly ideal for an audio recording setup.

AMD has suggested not overclocking the chips themselves this generation, rather letting the system manage it for you. Indeed, so far in practice I’ve seen that any attempt to carry out a manual overclock tends to restrict the RAM clocking we can then carry out. Given that optimizing the RAM gives us far more benefit than adding a few hundred MHz to the CPU itself, we’ve chosen to go that route instead. Our Z390 setups, by comparison, remain active with an all-core turbo setup on each chip and officially rated RAM as the optimum setup.

With that said about the AMD overclocking, I’ve been particularly impressed by AMD’s own turbo handling in this round of testing, with the cores boosting fairly consistently to all-core levels when dealing with busy multi-threaded workloads and then pretty much just sitting there, which in most cases with the Ryzen 3000 series appears to be about 4.2GHz.

With that out of the way, what exactly are we looking at results wise this time?

DAWBench DSP SGA1566 (2020 Build).

DAWBench DSP – SGA 1566 Test

DAWBench VI Kontakt Test (2020 Build).

DAWBench VI Test – Q1 2020

So, as expected with the memory lag cleared up, we see the performance restored at the lowest latency settings and it makes a sizable difference.

The DSP test, which essentially loads up each core to see where they top out, was always the stronger test for AMD, and the additional gains put them squarely ahead now at each price point.

Of the chips tested, the 9600K and 9700K sit at the bottom of the results, and given the competition, that’s not overly surprising. It’s interesting to note that the 3700X and 3800X deliver fairly similar results: with about 300MHz separating the chips on paper, we saw both turbo to roughly the same level, meaning there wasn’t a lot between them in real terms. In this instance the 3700X looks to offer the better value of the two, and indeed of pretty much the whole sub-£500 segment.

Moving up to the more expensive “enthusiast” segment, this is where AMD’s value lead is most apparent at this time. The Intel chips that we’ve seen look to have made the standard 10%–15% generational gains on the previous models, but the launch of the 3950X really pushed the results in AMD’s favour.

Of course, the performance hole we saw previously was always more apparent in the VI test and again the difference here is clear.

The 10980XE holds its crown in the VI test, but not by a huge margin, whilst falling behind in the DSP test, meaning that if the chips were more equally priced this would be far too tough to call. However, even with Intel’s price adjustments over recent months, the 10980XE sits around the £1250 price point at the time of writing. Looking at the chart, and specifically at the 3950X in comparison, sitting at less than £700, it becomes clear that whilst the 10980XE is a perfectly fine chip in isolation, it really needs to be £300 cheaper than it currently is to make sense in a value comparison with the rest of the market.

In fact, that’s the take-home right now. Intel’s price cuts should have made them more competitive, and to be fair they briefly did, but AMD’s continued monitoring of the market and aggressive counter-pricing has left Intel with very little to shout about at this time.

For those old enough to remember the days of the P4 chips and FX64, that was the last time AMD secured a solid tech jump and dominant market share, and right now we find ourselves in a period very much reminiscent of it. For Intel to come back, it’s going to require that long-awaited node change just to keep up with AMD. Going forward, I suspect they ideally need a new platform in order to reposition themselves fully, akin to what we saw when the “i” series launched, and until then I can’t see this highly competitive era easing up.

Looking forward is always interesting, although at this point I’m not sure anyone can predict how the year ahead will pan out. With much of the first and even second quarter of this year likely to see the entire component industry hit with delays in production and shipping, we might well be seeing shortages across the board for the more popular lines for some time to come. There has certainly been short availability of the 10980XE so far, and this is unlikely to clear up anytime soon. The knock-on effect has seen the 3950X go through waves of short supply too, as AMD continues to try to meet the high demand for its flagship Ryzen chip, although that’s an enviable problem to have, I’m sure.

It also means that, at the time of writing, this may prove to be quite possibly the best time to buy in quite a while, if you can source the parts. With possible supply issues in mind, it could be a while before it all stabilises again, and prices may start to drift if available supplies struggle to meet demand in the short term.

Intel has 10nm mobile chips out there and more on the way, but it’s all a bit subdued on the desktop range. The already-known-about i9-10900K, the 10-core successor to the 9900K, is the obvious model of interest. The predicted gains of up to 20% via IPC, additional cores and increased cache are attractive, but looking at the charts, I’m not sure they’ll elevate it much above the 3900X, a chip that’s already out, established and in easy supply.

Also notably, AMD’s 4000 series was announced at this year’s CES and looks to be arriving later in the year if it remains on schedule. Crucially, this is another overhaul for AMD, and they seem well-placed to keep making gains from their platform too, which will undoubtedly apply further pressure to Intel.

The earlier concerns about a limited number of incompatibilities have so far not grown, and we’re aware that AMD has been working hard to help smooth these out. With the platform not so much “on the map” at this point as stomping all over it in size 12 boots, we would hope that developers are now sitting up, paying attention and baking in full compatibility from the ground upwards on future product releases.

We’ve already seen chip prices tumble since the start of this CPU performance war and no doubt it’ll continue for quite possibly a few more years yet.

As we’ve said before, the consumer of course remains the big winner in all this. At this point we’re watching the software firms attempting to play optimization catch-up: now that more consumers have cheaper and easier access to some absolutely stupendous core counts, it should hopefully lead to more developers taking advantage of this jump in available processing power.

Exciting times, and it’s going to be interesting to see how the rest of the year ahead pans out.

ThreadRippers 2990WX & 2950X on the bench: Just a little bit of history repeating?

I’m the first to admit that I’m a little late to the table with this write-up. The original 2990WX sample arrived whilst I was on leave and was quickly placed into a video rig and sent out for review, meaning I’ve had to locate another one at a later date. Along with that, I’m honestly a little overwhelmed by how much interest this £1700 workstation-grade CPU has generated with the public in recent weeks, as I really didn’t expect this level of attention for a chip at this sort of price point.

I’ve also approached this with a little trepidation due to earlier testing. As someone noted over on the GS forum, the 2990WX might not prove all that interesting for audio, due to the design layout of the cores and the limitations we’ve seen previously with memory addressing inside studio-based systems. They were certainly right there, as the first generation failed to blow me away, and I still have a number of reservations about the underlying design of this technology that could potentially be amplified by this new release. During the initial testing of the 2990WX this time around, the 1950X replacement also arrived with us in the shape of the 2950X, and given some of the 2990WX results, I thought throwing it into the mix might prove a handy comparison.

Why bring all this up at all? Well, because everything I discussed back then is still completely relevant. In fact, I’m going to go as far as to suggest that anyone who doesn’t understand what I’m referring to at this point should head over to last year’s 1950X coverage and bring themselves up to speed before venturing any further.

Back again? Up to speed?

Then I shall begin.

The 2990WX is the new flagship within the AMD consumer range and features a 32 core / 64 thread design. It has a base clock of 3GHz with a max twin core turbo of 4.2GHz and an advised power draw of 250W TDP. 

I won’t split hairs. It’s a beast… something I’m sure most people reading this are well aware of given the past week or so’s publicity. 

In fact, for offline rendering, I could close the article right there. If you’re a video editor on this page and don’t happen to care about audio (hello… you might be lost, but welcome regardless) then you should feel secure in picking up one of these right now if you have the resources and the need for more power in your workshop.

But as was proven with the release of the 1950X, the requirements for a smooth-running audio PC are, for a lot of users, largely pinned on how great it is for real-time rendering, which is a whole different ballgame.

In the 1950X article I linked up top, I went into a great deal of detail regarding where the performance holes existed. I found that low latency response was sluggish, resulting in a loss of performance overhead that left it in a less than ideal place for audio-orientated systems. I had a theory that NUMA load optimization for offline workloads was leaving the whole setup poorly suited to real-time workloads like ASIO-based audio handling.

In the weeks following that article, we saw AMD release BIOS updates and application tweaks to try and resolve the NUMA addressing latency I had discussed in the original article, largely to no avail as far as the average audio user was concerned. In AMD’s defence, they were optimizing it further for tasks that didn’t include the sort of demands that real-time audio places upon it, so whilst I understand the improvements were successful in the markets they were designed to help, few of those happened to be audio-centric.  

At the time it was just a theory, but my conclusion was largely that if this is as integral to the design as I thought it might be, then it would take a whole architecture redesign to reduce the latency to levels that would keep us rather demanding pro users happy.

The 2990WX we see here today is not that architecture change: where the 1950X has 2 dies in one chip, the 2990WX runs a 4-die configuration, which has the potential to amplify any previous design choices. If I was right about hard NUMA being the root of the lag in the first generation, then on paper we can expect this to only get worse this time around, due to the extra data paths and the potential extra distance the internal data routing might have to cope with.

The 2950X, by comparison, is an update to the older 1950X and maintains 2 functional dies, with tweaks to the chip’s performance. Given the similar architecture, I would expect it to perform similarly to the older chip, whilst making gains from the process refinements and tweaks enacted within this newer model. I’ll note that all-core overclocking is improved this time around, and a stable 4GHz was quick and easy to achieve.

OK, so let’s run through the standard benchmarking and see what’s going on.

2990WX CPU-Z Report
2950X CPU-Z

As normal, I’ve locked off an all-core turbo on both of the chips. As with a lot of these higher core count chips, I’ve not managed to hit a stable all-core max turbo clock, which would have been 4.2GHz, instead settling for 3.8GHz on the 2990WX and 4GHz on the 2950X, both of which perform fine with air cooling.

I’ve spoken to our video team about this, and they managed to hit a stable 4.1GHz on the 2990WX using a Corsair H100, so it looks like you can eke out a bit more performance if noise is less of a consideration in your environment.

If you’re not aware from previous coverage why I do this: when you’re running a turbo with a large spread between the maximum and minimum clock speeds, the problem with real-time audio is that when one core falls over, they all fall over. So, whilst you might have 2 cores running at 4.2GHz, the moment one of the cores still running at 3.2GHz fails to keep up, the whole lot comes tumbling down with it. Locking the cores off gives you a smoother operating experience overall, and I’m always keen to find a stable level of performance like this when doing this sort of testing.
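The reasoning above can be sketched as a toy deadline check: each ASIO buffer of audio has to be processed before the next one arrives, and with work spread across the cores, it’s the slowest core that decides whether the whole buffer makes it. The per-core processing times below are invented purely for illustration:

```python
def buffer_deadline_ms(buffer_samples, sample_rate=48000):
    """Time available to process one ASIO buffer before the next arrives."""
    return buffer_samples / sample_rate * 1000

def buffer_survives(core_times_ms, buffer_samples, sample_rate=48000):
    """A real-time buffer only completes if EVERY core finishes in time;
    one late core means an audible dropout, however fast the others are."""
    return max(core_times_ms) <= buffer_deadline_ms(buffer_samples, sample_rate)

# ~1.33ms of headroom at a 64 buffer, 48kHz
fast_and_slow = [0.9, 0.9, 0.9, 1.4]   # three boosted cores, one laggard
locked_cores  = [1.1, 1.1, 1.1, 1.1]   # all cores locked to one clock

print(buffer_survives(fast_and_slow, 64))   # the 1.4ms core misses the deadline
print(buffer_survives(locked_cores, 64))    # lower peak clock, but no dropout
```

Which is the whole case for a locked all-core clock: a slightly lower peak that every core can hold beats a higher peak that only some cores reach.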

2990WX CPU-Z Benchmark
2950X CPU-Z Benchmark

I don’t always remember to run this benchmark, although this time I’ve made the effort, as Geekbench doesn’t appear to support this many cores at this point. Handily enough, I did at least run it over the 1950X last time, which returned results of 428 on the single-core and 9209 on the multi-core.

Given that the 2990WX looks to be pulling twice the performance and physically has twice the number of cores, it looks to all be scaling rather well at this point. The 2950X, on the other hand, sees around a 10%-15% gain on the single and multi-core scores over the previous generation.

Moving onwards and the first test result here is the SGA DAWBench DSP test. 

DAWBench SGA1566 Test

This initial test is very promising, as was the older 1950X testing. Raw performance-wise, we’re talking buckets of it; I really can’t stress that enough, with both chips performing well in what is essentially a very CPU-centric test.

At the lowest buffer, however, we see it being exceeded by the older chip, so what is going on there? Well, we’re seeing a repeat of the pattern exhibited by the 1950X, where performance takes a hit at tighter buffers, and at the very tightest buffer setting there appears to be some additional inefficiency caused by the additional dies, although this does resolve itself when we move up a buffer setting.

Last time we scaled up from 70% load being accessible at a 64 buffer; this time, I imagine due to the extra dies in use, we see the audio corrupting at around the 65% load level at the lowest setting, with the usable load then scaling up by 10% each time we double the buffer.

ASIO Buffer    CPU Load
64             65%
128            75%
256            85%
512            95%

As a note, when I pulled that 512 buffer result this time around, it returned 529 instances.
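The scaling pattern in the table is regular enough to express directly. This simply encodes the measured numbers above (65% at a 64 buffer, plus 10 points per buffer doubling) and shouldn't be read as predicting anything beyond the 512 buffer actually tested:

```python
from math import log2

def usable_load_pct(asio_buffer, base_buffer=64, base_load=65, step=10):
    """Measured usable CPU load on the 2990WX: 65% at a 64 buffer,
    gaining roughly 10 points each time the buffer size doubles."""
    doublings = int(log2(asio_buffer / base_buffer))
    return base_load + step * doublings

for buf in (64, 128, 256, 512):
    print(buf, usable_load_pct(buf))   # matches the table above
```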

The 2950X, by comparison, returned load handling of around 85% at a 64 buffer, rising to 95% at a 256. That’s an improvement on our first look at the original 1950X chip, although I’ll note I also saw this improved handling when I retested the 1950X a few months ago using the newer SGA1566 test that has replaced the classic DSP test, so this might be down to the change in benchmarks over the last year, or to the BIOS-level changes made since the original generation’s launch.

So far, so reasonable. A lot of users, even those with the most demanding latency requirements, can get away with a 128 buffer on the better audio interfaces, and the performance levels seen at a 128 buffer, at least in this test, are easily the highest single-chip results that I’ve seen so far.

In fact, knowing we’re losing 40% of the overhead on the 2990WX is really frustrating when you understand the sort of performance that we could be seeing otherwise. But even with that in mind, if you wanted to go all out and grab the most powerful option that you can, wouldn’t this still make sense?

Well, that test is pure CPU performance, and in the 1950X testing the irregularities really started to manifest themselves in the DAWBench Kontakt test, where performance depends equally on the memory addressing side of things.

Normally I would insert a chart here to show how that testing panned out.

But I can’t.

It started off pretty well. I fired it up with a 64 buffer and started adding load to the project. I made it up to around 70% CPU load on the first attempt before the whole project collapsed and started to overload. I slackened it off by muting tracks and took it back down to around 35% load, where it stabilised, but from that point onwards I couldn’t take it above 35% without it overloading, not until I restarted the project.

I then tried again at each buffer setting up to 512 and it repeated the pattern each time.

I proceeded to talk this one through with Vin, the creator of the various DAWBench suites, and a number of other ideas were kicked about, some of which I’ve dived further into.

One line of thought concerned the fact that I was still using Cubase, and the last 8.5 build specifically, precisely because C9 has a load balancing problem with high core count CPUs that is currently being worked on. The older C8.5 build is noted as not manifesting the same issue, due to a difference in the engine, and during testing Windows itself was showing fairly balanced loads mapped across all of the cores whilst I was watching the performance meter. Even so, historically, exceeding 32 cores has always been questionable inside many of the DAW clients.

So, to counter this concern, I went and ran the same tests under Reaper and saw much the same result. I could push projects to maybe 65%-70% and then it would distort the audio as the chip overloaded and this wouldn’t resolve itself until the sequencer was closed and reloaded.

So what is going on there? If I was to speculate: the NUMA memory addressing is designed to allocate each physical core’s nearest RAM channel first, and not to use the other RAM channels until a core’s local channel is full.

With that in mind, I suspect the outcome here is that it maintains the optimal handling up until that 70% level, and then, once it figures out that the local RAM channel is overloaded, it starts allocating data on the fly as it sees fit. The reallocation of that data to one of the other 3 dies means it gets buffered and then allocated to a secondary memory location, resulting in additional latency when the data is recalled in a later buffer cycle, and in audio being lost when the buffer cycle completes before the recall does.
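That theory can be sketched as a toy "local first" allocator: memory requests go to the die-local channel until it fills, then spill to a remote die at a latency cost. The capacities and latency figures here are invented purely to illustrate the behaviour, not measured values:

```python
def allocate(requests_gb, local_capacity_gb, local_ns=80, remote_ns=130):
    """Hard-NUMA style 'local first' placement: fill the nearest RAM
    channel, then spill the remainder to a remote die's channel at a
    latency penalty. Returns (gb_local, gb_remote, worst_latency_ns)."""
    local = min(requests_gb, local_capacity_gb)
    remote = requests_gb - local
    worst = remote_ns if remote > 0 else local_ns
    return local, remote, worst

# Under the tipping point everything fits locally, so the worst-case
# recall latency stays low and predictable...
print(allocate(20, 32))
# ...past it, data spills to a remote die, and any recall that takes
# longer than the buffer cycle shows up as lost audio.
print(allocate(40, 32))
```

The point of the sketch is that the failure mode isn't gradual: latency is fine right up until the local channel fills, then jumps, which lines up with the project running cleanly to ~70% and then collapsing.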

In short, we’re seeing the same outcome as the first generation 1950X but amplified by the additional resources that now need to be managed.

This way of working is the whole point of hard NUMA addressing, and indeed is the optimal design for most workstation workloads where multiple chips (or die clusters, in this case) need to be managed. It’s a superb way of optimizing many workloads, from database servers through to offline render farms, but for anything requiring super-tight real-time memory allocation it remains a poor way of doing things.

As I’ve said previously, this is nothing new for anyone who deals with multi-CPU workstations, where NUMA management has been a topic of interest to designers for decades now. There has always been a performance hit for dealing with multiple CPUs in our type of workflow, and it’s largely why I’ve always shied away from multi-chip Xeon-based systems, as they too exhibit this to a certain extent.

Much like the first generation 1950X with its 2 dies, we see similar memory addressing latency when we use 2 separate Xeons, and this has always been the case. I would never use 4 of those together in a cluster for this sort of work, simply due to that latency, so the overall outcome with 4 dies being used in this fashion isn’t all that surprising.

I also tried retesting with SMT turned off, so it could only access the 32 physical cores, in order to rule out a multi-threading problem. The CPU usage didn’t quite double at each buffer, instead settling around the 70% total usage mark, but the total number of usable tracks remained the same, and once again going over this led to the audio collapsing quite rapidly.

So, much like the first generation, the handling of VST instruments, especially memory-heavy ones, looks like it may not be the best sort of workload for this arrangement. That remains a shame, especially as one of the other great concerns from last time, heat, has been addressed to quite some degree. Running the 2990WX even with an overclock didn’t really see it get much above 70 degrees, and that was on air. Given that the advised TDP here is 250W at stock, rising quickly when overclocked, even to the point of doubling the power draw, the temperatures for a core count this huge are rather impressive. There is a lot here for Intel to pay attention to in regards to thermals, and the news that the forthcoming i9s are finally going to be soldered again makes a whole load of sense given what we’ve seen from the AMD solutions. If anything, it’s just a shame it took the competition pulling this out of the hat before they took notice of all their own customers’ requests for it to be brought back over recent years.

Still, that’s the great thing about a competitive marketplace, and very much what we like to see. Going forward, I don’t really see these performance quirks changing within the Threadripper range, in much the same way that I never expect them to change within the Xeon ecosystem. Both chip ranges are designed for certain tasks and optimized in certain ways, which ultimately makes them largely unsuitable for low latency audio work, no matter how much they excel in other segments.

There is some argument here for users who may not require ultra-tight real-time performance. It's been brought to my attention in the past that users like mastering guys could have a lot of scope for using the performance available here, and if they are doing video production work too, well, that only strengthens the argument.

On paper that all makes sense, and although I haven't tested along those lines specifically, the results seem to indicate that even the trickiest of loads for these CPUs stabilise at a 512 buffer and above, with 80%+ of the CPU being accessed even in the worst case scenario. I have to wonder how it would stand up in mixed media scenarios, although I would hope that in any situation where you render offline you should be able to leverage the maximum overhead from these chips.

I suspect the other upshot of this testing might be a need to revisit the total CPU core count that each DAW package can access these days. The last time I did a group test was about half a decade ago and certainly all the packages look to have upped their game since then. Even so, I doubt anyone working on a sequencer engine even three years ago would have envisioned a core count such as the one offered by the 2950X here, let alone the monstrous core count found in the 2990WX.

AMD’s Zen core IPC gains this generation as we’ve already seen with Ryzen refresh earlier in the year were around the 12% mark and it looks to have translated faithfully into Threadripper series with the 2950X  model. One of AMD’s big shouting points at launch was regarding just how scalable the Zen package was simply by upping the die count and that’s clear by the raw performance offered by the 2990WX, they really have proven just how effective this platform can be when dealing with workloads it’s designed for.

One day I just hope they manage to find a way of making it applicable to the more demanding of us studio users too.

First look at the AMD Threadripper 1920X & 1950X

Another month and another chip round up, with new chips still coming thick and fast and hitting the shelves at an almost unprecedented rate.

AMD’s Ryzen range arrived with us towards the end of Q1 this year and its impact upon the wider market sent shockwaves through computer industry for the first time for in well over the decade for AMD.

Although well received at launch, the Ryzen platform did have the sort of early teething problems that you would expect from any first generation implementation of a new chipset range. Its strength was that it was great for any software that could effectively leverage the processing performance on offer across the multitude of cores being made available. Whilst perfect for a great many tasks across any number of market segments, the platform also had its inherent weaknesses, which would crop up in various scenarios, with real-time audio being one field where its design limitations were apparent.

Getting to the core of the problem.

The one bit of well-meaning advice that drives system builders up the wall is the "clocks over cores" wisdom that has been offered up by DAW software firms since what feels like the dawn of time. It's a double edged sword in that it tries to simplify a complicated issue without ever explaining why, or in what situations, it truly matters.

To give a bit of crucial background information as to why this might be, we need to start from the understanding that your DAW software is pretty lousy at parallelization.

That’s it, the dirty secret. The one thing computers are good at are breaking down complex chains of data for quick and easy processing except in this instance not so much.

Audio works with real-time buffers. Your ASIO driver's 64/128/256 buffer settings are nothing more than chunks of time: data entering the system is captured and held in a buffer until it is full, before being passed over to the CPU to do its magic and get the work done.

If the workload is processed before the next buffer is full, then life is great and everything is working as intended. If, however, the buffer becomes full prior to the previous batch of information being dealt with, then data is lost, and this translates to your ears as clicks and pops in the audio.
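To put some rough numbers on that (my own back-of-envelope figures, assuming a 44.1kHz sample rate rather than anything from the benchmarks themselves), here's a quick sketch of the processing deadline each common buffer setting implies:

```python
# Rough sketch: the real-time deadline implied by common ASIO buffer sizes.
# Assumes a 44.1kHz sample rate; the numbers are illustrative, not measured.

SAMPLE_RATE = 44_100  # samples per second

def buffer_deadline_ms(buffer_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Time available to process one buffer before the next one is full."""
    return buffer_samples / sample_rate * 1000.0

for size in (64, 128, 256, 512):
    print(f"{size:>4} samples -> {buffer_deadline_ms(size):.2f} ms to finish processing")
```

At a 64 sample buffer the whole plugin workload has under 1.5ms to complete, which is why small buffers are so much harder on the CPU than large ones.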

Now with a single core system, this is straightforward. Say you're working with one track of audio to process with some effects. The whole track would be sent to the CPU, the CPU processes the chain and spits out some audio for you to hear.

So far so easy.

Now say you have two or three tracks of audio and one core. These tracks will be processed on the available core one at a time, and assuming all the tracks in the pile are processed prior to the buffer reset, we're still good. A faster core means more of these chains can be processed within the allocated buffer time, so in this example more speed certainly means more processing being done.

So now we consider systems with two or more cores. The channel chains are passed to the cores as they become available, and once more the whole channel chain is processed on a single core.

Why?

Because to split a channel over more than one core would require us to divide up the workload and then recombine it all again post processing, which for real-time audio would leave other components in the chain waiting for the data to be shuttled back and forth between the cores. All this lag means we'd lose processing cycles as that data is ferried about, so we'd continue to lose more performance with each and every added core, something I will often refer to as processing overhead.
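As a loose illustration of the idea (my own simplified model, not how any particular DAW actually implements its engine), the scheduling described above can be sketched like this, with each whole channel chain landing on whichever core is currently least loaded:

```python
# Minimal sketch of whole-chain scheduling: each channel chain is handed out
# intact to one core, never split mid-chain across cores.
import heapq

def schedule(chain_costs, num_cores):
    """Greedy assignment: each chain lands on the currently least-loaded core.
    Returns the per-core total load. Costs are arbitrary 'ms of DSP work'."""
    cores = [(0.0, i) for i in range(num_cores)]  # (load, core_id) pairs
    heapq.heapify(cores)
    loads = [0.0] * num_cores
    for cost in chain_costs:
        load, core = heapq.heappop(cores)   # pick the least-loaded core
        loads[core] = load + cost           # the whole chain runs there
        heapq.heappush(cores, (loads[core], core))
    return loads

# Six channel chains of varying weight spread across 4 cores.
print(schedule([3.0, 2.0, 2.0, 1.0, 1.0, 1.0], 4))
```

Note that no single chain's cost ever shrinks by adding cores: if one chain alone exceeds the buffer deadline, no number of extra cores will save it.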

Clock watching

Now the upshot of this is that lower clocked chips can often be less efficient than higher clocked chips, especially with newer, more demanding software.

So, just for an admittedly extreme example, say that you have the two following chips.

CPU 1 has 12 cores running at 2GHz

CPU 2 has 4 cores running at 4GHz

The maths looks simple, 2 × 12 beats 4 × 4 on paper, but in this situation it comes down to software and processing chain complexity. If you have a particularly demanding plugin chain that is capable of overloading one of those 2GHz CPU cores, then the resulting glitching will proceed to ruin the output from the other 11 cores.

In this situation, the more overhead you have to play with on each core, the less chance there is that an overly demanding plugin is going to be able to sink the lot.
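A hedged sketch of that trade-off, with made-up workload numbers purely for illustration: the aggregate GHz figure is irrelevant if any one chain can't fit on a single core within the deadline.

```python
# Illustration of the 'clocks over cores' trap: each chain must fit on ONE
# core within the buffer deadline, regardless of total aggregate throughput.

def can_run(chain_costs, core_ghz, deadline_ms, num_cores):
    """Chain cost is in 'GHz-ms' of work, so one chain takes cost / core_ghz
    milliseconds on one core. Both the worst chain and the total must fit."""
    per_chain_ok = all(c / core_ghz <= deadline_ms for c in chain_costs)
    total_ok = sum(c / core_ghz for c in chain_costs) <= deadline_ms * num_cores
    return per_chain_ok and total_ok

chains = [10.0, 2.0, 2.0]             # one heavy chain, two light ones
print(can_run(chains, 2.0, 3.0, 12))  # 12 x 2GHz: heavy chain needs 5ms > 3ms
print(can_run(chains, 4.0, 3.0, 4))   # 4 x 4GHz: heavy chain needs 2.5ms, fits
```

The 12 core chip has far more total throughput, yet fails, because the single heavy chain overloads one slow core and the glitching spoils everything else.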

This is also one of the reasons we tend to steer clear of single server CPUs with high core counts and low clock speeds, and is largely what the general advice is referring to.

On the other hand, when we talk about 4 core CPUs at 4GHz vs 8 core CPUs at 3.5GHz, the difference in clock speeds isn't going to be enough to cause problems with even the busiest of chains, and once that is the case, more cores on a single chip become the more attractive proposition as far as getting the best performance is concerned.

Seeing Double

So with that dealt with, we'll quickly cover the other problematic issue with server chips, which is the data exchange process between memory banks.

Dual chip systems are capable of offering the ultimate levels of performance, this much is true, but we have to remember that returns on your investment diminish quickly as we move up through the models.

Not only do we have the concerns outlined above about cores and clocks, but when you move to dealing with more than one CPU you have to start considering the "NUMA" (non-uniform memory access) overheads caused by using multiple processors.

CPU’s can exchange data between themselves via high-speed connections and in AMD’s case, this is done via an extension to the Infinity Fabric design that allows the quick exchange of data between the cores both on and off the chip(s). The memory holds data until it’s needed and in order to ensure the best performance from a CPU they try and store the data held in memory on the physical RAM stick nearest to the physical core.  By keeping the distance between them as short as possible, they ensure the least amount of lag in information being requested and with it being received.

This is fine when dealing with one CPU: in the event that a bank of RAM is full, moving and rebalancing the data across other memory banks isn't going to add too much lag to the data being retrieved. However, when you add a second CPU to the setup and an additional set of memory banks, you suddenly find yourself trying to manage the data being sent and called between the chips as well as the memory banks attached to each. In this instance, when a RAM bank is full the data might end up being bounced to free space on a bank connected to the other CPU in the system, meaning it may have to travel that much further across the board when being accessed.

As we discussed in the previous section, any wait for data can cause inefficiencies where the CPU has to sit idle until it arrives. All this happens in microseconds, but if it ends up happening hundreds of thousands of times every second, our ASIO meter ends up looking like it's overloading due to lagged data being dropped everywhere, whilst our CPU performance meter may look like it's only half used at the same time.
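To put some very rough, purely illustrative numbers on that (the latencies, work figures and access counts below are my own assumptions, not measurements from any of the systems tested):

```python
# Back-of-envelope sketch of how tiny per-access NUMA penalties can eat the
# real-time buffer deadline. All figures are assumed, for illustration only.

def processing_time_us(base_work_us, mem_accesses, access_penalty_ns):
    """Time to process one buffer: the DSP work plus the accumulated cost of
    memory fetches, each adding a fixed latency penalty."""
    return base_work_us + mem_accesses * access_penalty_ns / 1000.0

deadline_us = 2900   # roughly a 128 sample buffer at 44.1kHz
work_us = 2400       # assumed raw DSP work per buffer
accesses = 20_000    # assumed sample-library fetches per buffer

local = processing_time_us(work_us, accesses, 20)    # ~20ns local DRAM hit
remote = processing_time_us(work_us, accesses, 120)  # ~120ns cross-socket hop

print(f"local : {local:.0f} us vs {deadline_us} us -> {'OK' if local <= deadline_us else 'MISS'}")
print(f"remote: {remote:.0f} us vs {deadline_us} us -> {'OK' if remote <= deadline_us else 'MISS'}")
```

The same workload that comfortably makes the deadline with local memory misses it once every fetch has to hop to the far CPU's banks, which is exactly the "half-used CPU but overloading ASIO meter" picture described above.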

This means that we do tend to expect an overhead when dealing with dual chip systems. Exactly how much depends entirely on what's being run on each channel and how much data is being exchanged internally between those chips, but the take home is that we expect to pay a lot more for server grade solutions that can match the high-end enthusiast class chips we see in the consumer market, at least in situations where real-time workloads are crucial, like dealing with ASIO based audio. It's a completely different scenario when you deal with a task like offline rendering for video, where the processor and RAM are system managed, working on their own time and to their own rules; server grade CPU options make a lot of sense there and are very, very efficient.

To server and protect

So why all the server background when we're looking at desktop chips today? Indeed, Threadripper has been positioned as AMD's answer to Intel's enthusiast range of chips, and largely a direct response to the i7 and i9 7800X, 7820X and 7900X chips that launched just last month, with AMD's Epyc server grade chips still sat in waiting.

An early de-lidding of the Threadripper series chips quickly showed us that the basis of the new chips is two Zen CPUs connected together. The "Infinity Fabric" core interconnect design makes it easy for AMD to add more cores and expand these chips up through the range; indeed their server solution EPYC is based on the same Zen building blocks at its heart as both Ryzen and Threadripper, with just more cores piled in there.

Knowing this before testing gave me certain expectations going in that I wanted to examine. The first was Ryzen's previously inefficient core handling when dealing with low latency workloads, where we established in the earlier coverage that the efficiency of the processor at lower buffer settings would suffer.

This I suspected was an example of data transference lag between cores, and at the time of that last look we weren't certain how constant it might prove to be across the range. Without more experience of the platform we didn't know if this was something inherent to the design or if it might perhaps be solved in a later update. As we've seen since launch, and having checked other CPUs in testing, this performance scaling seems to be a constant across all the chips we've seen so far and something that can certainly be consistently replicated.

Given that it’s a known constant to us now in how it behaves, we’re happy that isn’t further hidden under-laying concerns here. If the CPU performs as you require at the buffer setting that you need it to handle then that is more than good enough for most end users. The fact that it balances out around the 192 buffer level on Ryzen where we see 95% of the CPU power being leveraged means that for  plenty of users who didn’t have the same concerns with low latency performance such as those mastering guys who work at higher buffer settings, meant that for some people this could still be good fit in the studio.

However, knowing about this constant performance response at certain buffer settings made me wonder if it would carry across to Threadripper. The announcement that this was going to be two CPUs connected together on one chip then raised my concerns that it would experience the same sort of problems we see with Xeon server chips, as we'd take a further performance hit through NUMA overheads.

So with all that in mind, on with the benchmarks…

On your marks

I took a look at the two Threadripper CPUs available to us at launch.

The flagship 1950X features 16 cores and a total of 32 threads, with a base clock of 3.4GHz and a potential turbo of 4GHz.

CPUz AMD 1950X
CPUz details for the 1950X

CPUz AMD 1950X benchmark
CPUz benchmark for the 1950X

Along with that I also took a look at the 1920X, a 12 core with 24 threads, which has a base clock speed of 3.5GHz and an advised potential turbo clock of 4GHz.

CPUz AMD 1920X
CPUz AMD 1920X benchmark

First impressions weren’t too dissimilar to when we looked at the Intel i9 launch last month. These chips have a reported 180W TDP at stock settings placing them above the i9 7900X with its purported 140W TDP.

Also, much like the i9's we've seen previously, it fast became apparent that as soon as you start placing these chips under stressful loads you can expect that power usage to scale up quickly, something you need to keep in mind with either platform, where the real-term power draw can rapidly increase when a machine is being pushed heavily.

History shows us that every time a CPU war starts, the first casualty is often your system temperatures, as the easiest way to increase a CPU's performance quickly is to simply ramp the clock speeds, although this often dumps an exponential amount of heat into the system as a result. We've seen a lot of discussion in recent years about the "improve and refine" product cycles with CPUs, where new tech in the shape of a die shrink is introduced and then refined over the next generation or two, as temperatures and power usage are reduced again before the whole cycle starts over.

What this means is that with the first generation of any CPU we don't always expect a huge overclock out of it, and that is certainly the case here. Once again for contrast, the 1950X, much like the i9 7900X, is running hot enough at stock clock settings that even with a great cooler it struggles to reach the limit of its advised potential overclock.

Running with a Corsair H110i cooler, the chip would only hold a stable clock around the 3.7GHz level without any problems. The board itself ships with a default 4GHz setting which, when tried, would reset the system whilst running the relatively lightweight Geekbench test routine. I tried to set up a working overclock around that level, but the P-states would quickly throttle me back once it went above 3.8GHz, leaving me to fall back to the 3.7GHz point. This is technically an overclock over the base clock, but doesn't meet the suggested turbo max of 4GHz, so the take home is that you should make sure you invest in great cooling when working with one of these chips.

Geekout

Speaking of Geekbench, it's time to break that one out.

Geekbench 4 1950X stock Geekbench 4 1920X stock

I must admit to having expected more from the multi-core score, especially on the 1950X, even to the point of double checking the results a number of times. I did take a look at the published results on launch day and saw that my own scores were pretty much in line with the other results there at the time. Even now, a few days later, they still appear to be within 10% of the best published results for the chip, which says to me that some people look to have got a bit of an overclock going on with their new setups, but we're certainly not going to be seeing anything extreme anytime soon.

Geekbench 4 Threadripper
Click to expand Geekbench Results

When comparing the Geekbench results to other scores from recent chip coverage, it's all largely as we'd expect with the single core scores. There's a welcome improvement over the Ryzen 1700X; they've clearly done some fine tuning to the tech under the hood, as the single core score has seen gains of around 10% even whilst running at a slightly slower per core clock.

One thing I will note at this point is that I was running with 3200MHz memory this time around. There were reports after the Ryzen launch that running with higher clocked memory could help improve the performance of the CPUs in some scenarios, and it's possible that the single core jump we're seeing might prove to be down as much to the increase in memory clocks as anything else. A number of people have asked me if this impacts audio performance at all, and I've done some testing with the production run 1800Xs and 1700Xs in the months since, but haven't seen any benefit to raising the memory clock speeds for real-time audio handling.

We suspected this would be the outcome as we headed into testing, as memory for audio has been faster than it needs to be for a long time now, although admittedly it was great to revisit it once more and make sure. As long as the system RAM is fast enough to deal with that ASIO buffer, raising the memory clock speed isn't going to improve the audio handling in any measurable fashion.
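A quick bit of back-of-envelope arithmetic (my own, with assumed session figures) shows why that's the case: even a huge session streams a tiny fraction of what modern RAM can deliver.

```python
# Sanity check on why RAM clock speed rarely limits audio: the raw streaming
# bandwidth of even a big session is tiny. Figures assumed for illustration.

def session_bandwidth_mb_s(tracks, sample_rate=44_100, bytes_per_sample=4):
    """Raw streaming bandwidth for N mono tracks of 32-bit float audio."""
    return tracks * sample_rate * bytes_per_sample / 1e6

print(f"256-track session: {session_bandwidth_mb_s(256):.1f} MB/s")
# A single DDR4-3200 channel is good for roughly 25,600 MB/s in theory,
# so even memory a few speed grades slower is nowhere near the bottleneck;
# latency and scheduling, not bandwidth, are where audio gets hurt.
```

This is of course a simplification, since sample libraries add their own read patterns on top, but it illustrates why raising memory clocks didn't move the needle in the audio tests.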

The multicore results show the new AMDs slotted in between the current and last generation Intel top end models. Whilst the AMDs have made solid performance gains over earlier generations, it has still been widely reported that their IPC (instructions per clock cycle) scores remain behind the sort of results returned by the Intel chips.

Going back to our earlier discussion about how much code you can action on any given CPU core within an ASIO buffer cycle, the key to this is IPC capability. The quicker the code can be actioned, the more efficiently your audio gets processed and the more you can do overall. This is perhaps the biggest source of confusion when people quote "clocks over cores", as rarely are any two CPUs comparable on clock speeds alone, and a chip with better IPC performance can often outperform CPUs with higher quoted clock frequencies but a lower IPC score.
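As a hedged illustration with made-up figures (the IPC and clock numbers below are arbitrary, not taken from any real chip):

```python
# Effective per-core throughput is roughly IPC x clock, which is why a
# 'slower' chip with better IPC can out-run a higher-clocked one.

def effective_throughput(ipc, clock_ghz):
    """Billions of instructions retired per second on one core (idealised)."""
    return ipc * clock_ghz

chip_a = effective_throughput(1.6, 4.3)  # assumed: higher clock, lower IPC
chip_b = effective_throughput(2.0, 3.8)  # assumed: lower clock, better IPC
print(f"A: {chip_a:.2f} GIPS, B: {chip_b:.2f} GIPS")  # B wins despite the clock deficit
```

Which is precisely why comparing headline GHz between two different architectures tells you very little on its own.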

….And GO!

So lengthy explanations aside, we get to the crux of it all.

Much like the Ryzen tests before it, the Threadrippers hold up well in the older DawBench DSP testing run.

DawBench DSP Threadripper
Click To Expand

Both of the chips show gains over the Intel flagship i9 7900X, and given this test uses a single plugin with stacked instances of it and a few channels of audio, what we end up measuring here is raw processor performance: simply stack them high and let it get on with it.

There is no disputing that there is a sizeable slice of performance to be had here. Much like our previous coverage, however, it starts to show up some performance irregularities when you examine other scenarios, such as the more complex Kontakt based test DawBench VI.

DawBench VI Threadripper
Click To Expand

The earlier scaling at low buffer settings is still apparent this time around, although it looks to have been compounded by the hard NUMA addressing in place due to the multi-chip-in-one-package design in use. It once more scales upwards as the buffer is slackened off, but even at the 512 buffer setting I tested, it could only achieve 90% CPU use under load.

That, to be fair to it, is very much what I would expect from any server CPU based system. In fact, on its own the memory addressing here seems pretty capable when compared to some of the other options I've seen over the years; it's just a shame that the other performance response amplifies the symptoms when the system is stressed.

AMD, to their credit, are perfectly aware of the pitfalls of trying to market what is essentially a server CPU setup to an enthusiast market. Their Windows overclocking tool has various options to set up and optimize how it deals with NUMA and memory addressing, as you can see below.

AMD Control Panel
Click To Enlarge

I did have a fiddle around with some of the settings here, and the creator's mode did give me some marginal gains over the other options, thanks to it appearing to arrange the memory in a well organized and easy to address logical group. Ultimately though, the performance dips we're seeing are down to a physical addressing issue, in that data has to be moved from X to Y in a given time frame, and no amount of software magic will be able to resolve that for us, I suspect.

Conclusion

I think this one is pretty straightforward if you need to be running below a 256 ASIO buffer, although there are certainly some arguments for mastering guys who don't need that sort of response.

Much like the Intel i9's before it, however, there is a strong suggestion that you really do need to consider your cooling carefully here. The normal low noise, high-end air coolers that I tend to favour for testing were largely overwhelmed once I placed these on the bench, and once the heat started to climb, the water cooler I was using had both fans screaming.

Older readers with long memories might have a clear recollection of the CPU wars that gave us the P4s, Prescotts, Athlon FXs and 64s. We saw both of these firms in a CPU arms race that only really ended when the i7s arrived with the X58 chipset. Over the years this took place we saw ever-rising clock speeds, a rapid release schedule of CPUs and constant gains, although at the cost of heat and ultimately noise levels. In the years since, we've had refinement and a vast reduction in heat and noise, but little in the way of performance advancement, at least over the last five or six generations.

We finally have some really great choices from both firms, and depending on your exact needs and the price points you're working at, there could be arguments in each direction. Personally, I wouldn't consider server class chips to be the ultimate solution in the studio from either firm currently, not unless you're prepared to spend the sort of money that the tag "ultimate" tends to reflect, in which case you really won't get anything better.

In this instance, if you're doing a load of multimedia work alongside mastering for audio, this platform could fit your requirements well, but for writing and editing music I'd be looking towards one of the better value solutions unless this happens to fit your niche.

To see our custom PC selection @ Scan

To see our fixed series range @ Scan