Threadripper 2990WX & 2950X on the bench: Just a little bit of history repeating?

I’m the first to admit that I’m a little late to the table with this write-up. The original 2990WX sample arrived whilst I was on leave and was quickly placed into a video rig and sent out for review, meaning I’ve had to locate another one at a later date. Along with that, I’m honestly a little overwhelmed by how much interest this £1700 workstation-grade CPU has generated with the public in recent weeks, as I really didn’t expect this level of attention for a chip at this sort of price point.

I’ve also approached this with a little trepidation due to earlier testing. As someone noted over on the GS forum, the 2990WX might not prove all that interesting for audio due to the design layout of the cores and the limitations we’ve seen previously with memory addressing inside of studio-based systems. They certainly had a point, as the first generation failed to blow me away, and I still have a number of reservations about the underlying design of this technology that could potentially be amplified by this new release. During the initial testing of the 2990WX this time around, the 1950X replacement also arrived with us in the shape of the 2950X, and given some of the results of the 2990WX I thought throwing it into the mix might prove a handy comparison.

Why bring all this up at all? Well, because everything I discussed back then is still completely relevant. In fact, I’m going to go as far as to suggest that anyone who doesn’t understand what I’m referring to at this point should head over to last year’s 1950X coverage and bring themselves up to speed before venturing any further.

Back again? Up to speed?

Then I shall begin.

The 2990WX is the new flagship within the AMD consumer range and features a 32 core / 64 thread design. It has a base clock of 3GHz with a maximum twin-core turbo of 4.2GHz and an advised power draw of 250W TDP.

I won’t split hairs. It’s a beast… something I’m sure most people reading this are well aware of given the past week or so’s publicity. 

In fact, for offline rendering, I could close the article right there. If you’re a video editor on this page and don’t happen to care about audio (hello… you might be lost, but welcome regardless) then you should feel secure in picking up one of these right now if you have the resources and the need for more power in your workshop.

But as was proven with the release of the 1950X, the requirements for a smooth running audio PC for a lot of users are largely pinned on how great it is for real-time rendering, which is a whole different ballgame.

In the 1950X article I linked up top, I went into a great deal of detail regarding where the performance holes existed. I found that low latency response was sluggish, resulting in a loss of performance overhead that left it in a less than ideal place for audio-orientated systems. I had a theory that NUMA load optimization aimed at offline workloads was leaving the whole setup in a poor position for real-time workloads like ASIO-based audio handling.

In the weeks following that article, we saw AMD release BIOS updates and application tweaks to try and resolve the NUMA addressing latency I had discussed in the original article, largely to no avail as far as the average audio user was concerned. In AMD’s defence, they were optimizing it further for tasks that didn’t include the sort of demands that real-time audio places upon it, so whilst I understand the improvements were successful in the markets they were designed to help, few of those happened to be audio-centric.  

At the time it was just a theory, but my conclusion was largely that if this behaviour is as integral to the design as I thought it might be, then it would take a whole architecture redesign to reduce the latency to levels that would keep us rather demanding pro users happy.

The 2990WX we see here today is not the architecture change we would require for that to happen. Where the 1950X has 2 dies in one chip, the 2990WX is now running a 4-die configuration, which has the potential to amplify any previous design choices. If I was right about hard NUMA being the root of the lag in the first generation, then on paper we can expect this to only get worse this time around, due to the extra data paths and the extra distance the internal data routing might have to cope with.

The 2950X, by comparison, is an update to the older 1950X and maintains 2 functional dies, with tweaks to the chip’s performance. Given the similar architecture, I would expect this to perform similarly to the older chip, while making gains from the process refinements and tweaks enacted within this newer model. I’ll note that all-core overclocking is improved this time around, and a stable 4GHz was quick and easy to achieve.

OK, so let’s run through the standard benchmarking and see what’s going on.

2990WX CPU-Z Report
2950X CPU-Z Report

As normal, I’ve locked both chips off at an all-core turbo. As with a lot of these higher core count chips, I’ve not managed to hit a stable all-core max turbo clock, which would have been 4.2GHz, instead settling for 3.8GHz on the 2990WX and 4GHz on the 2950X, both of which perform fine with air cooling.

I’ve spoken to our video team about this and they managed to hit a stable 4.1GHz on the 2990WX using a Corsair H100, so it looks like you can eke out a bit more performance if noise is less of a consideration in your environment.

If you’re not aware from previous coverage why I do this: when running a turbo with a large spread between the maximum and minimum clock speeds, the problem for real-time audio is that when one core falls over, they all fall over. So, whilst you might have 2 cores running at 4.2GHz, the moment one of the cores still running at 3.2GHz fails to keep up, the whole lot will come tumbling down with it. Locking the cores off gives you a smoother operating experience overall, and I’m always keen to find a stable level of performance like this when doing this sort of testing.

2990WX CPU-Z Benchmark
2950X CPU-Z Benchmark

I don’t always remember to run this benchmark, although this time I’ve made the effort as Geekbench doesn’t appear to support this many cores at this point. Handily enough, I did at least run this over the 1950X last time, which returned results of 428 on the single core and 9209 on the multi-core.

Given that the 2990WX looks to be pulling twice the performance and physically has twice the number of cores, it looks to all be scaling rather well at this point. The 2950X, on the other hand, sees around a 10%-15% gain on the single and multi-core scores over the previous generation.

Moving onwards, the first test result here is the SGA DAWBench DSP test.

DAWBench SGA1156 Test – Click To Enlarge

This initial test is very promising, as was the older 1950X testing. Raw performance is on offer by the bucketload, I really can’t stress that enough, with both chips performing well in what is essentially a very CPU-centric test.

At the lowest buffer we see it being exceeded by the older chip, so what is going on there? Well, we’re seeing a repeat of the pattern exhibited by the 1950X, where there is an impact on performance at tighter buffers, and it does appear that at the very tightest buffer setting we’re seeing some additional inefficiency caused by the additional dies, although this resolves itself when we move up a buffer setting.

Last time we scaled up from 70% load being accessible at a 64 buffer; this time, I imagine due to the extra dies being used, we see the lowest setting start to break up around the 65% load level, then scaling up by 10% every time we double the buffer.

ASIO Buffer    CPU Load
64             65%
128            75%
256            85%
512            95%

As a note, when I pulled that 512 buffer result this time around, it returned 529 instances.

The 2950X, by comparison, returned load handling around the 85% mark on a 64 buffer, rising to 95% at a 256. That’s an improvement over our first look at the original 1950X chip, although I’ll note I was also seeing this improved handling when I retested the 1950X a few months ago using the newer SGA1156 test that has replaced the classic DSP test, so this might be down to the change in benchmarks over the last year, or it could also be down to the BIOS-level changes they’ve made since the original generation launch.

So far, so reasonable. A lot of users, even those with the most demanding of latency requirements, can get away with a 128 buffer on the better audio interfaces, and the performance levels seen at a 128 buffer, at least in this test, are easily the highest single-chip results that I’ve seen so far.

In fact, knowing we’re losing 40% of the overhead on the 2990WX is really frustrating when you understand the sort of performance that we could be seeing otherwise. But even with that in mind, if you wanted to go all out and grab the most powerful option that you can, then wouldn’t this still make sense?

Well, that test is pure CPU performance, and in the 1950X testing the irregularities really started to manifest themselves in the DAWBench Kontakt test, which leans equally on the memory addressing side of things.

Normally I would insert a chart here to show how that testing panned out.

But I can’t.

It started off pretty well. I fired it up with a 64 buffer and started adding load to the project. I made it up to around 70% CPU load on the first attempt before the whole project collapsed on me and started to overload. I slackened it off by muting the tracks and took it back down to around 35% load where it stabilised, but from this point onwards I couldn’t take it above 35% without it overloading, not until I restarted the project. 

I then tried again at each buffer setting up to 512 and it repeated the pattern each time.

I proceeded to talk this one through with Vin, the creator of the various DAWBench suites, and a number of other ideas were kicked about, some of which I’ve dived further into.

One line of thought concerned my still using Cubase, and specifically the last 8.5 build, precisely because C9 has a load-balancing problem with high core count CPUs that is currently being worked upon. The older C8.5 build is noted as not having the same issue manifest due to a difference in the engine, and during testing this time Windows itself was showing fairly balanced loads mapped across all of the cores whilst I watched the performance meter. Even so, historically, exceeding 32 cores has always been questionable inside many of the DAW clients.

So, to counter this concern, I went and ran the same tests under Reaper and saw much the same result. I could push projects to maybe 65%-70%, and then it would distort the audio as the chip overloaded, and this wouldn’t resolve itself until the sequencer was closed and reloaded.

So what is going on there? If I was to speculate: the NUMA memory addressing is designed to allocate the nearest RAM channel to its nearest physical core and not to use other RAM channels until a core’s local channel is full.

Knowing that, I suspect the outcome here is that it maintains the optimal handling up until that 70% level, and then once it figures out that the local RAM channel is overloaded, it starts allocating data on the fly as it sees fit. Reallocating that data to one of the other 3 dies would result in it being buffered and then allocated to the secondary memory location, adding latency when the data is recalled in a later buffer cycle, and resulting in audio being lost when the buffer cycle completes before the data can be recalled.
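To make that theory a little more concrete, here’s a toy Python model of my own (the latencies and the capacity figure are invented for illustration, not measured Infinity Fabric numbers): allocation stays local until the die’s own channel fills, at which point the average access time starts to climb.

```python
# Toy model of the local-first allocation theorised above: a die's own RAM
# channel is used until it is full, after which data spills over to a
# remote die. Latencies and capacity are invented, illustrative figures.

LOCAL_NS = 90        # assumed latency for a local-channel access (ns)
REMOTE_NS = 180      # assumed latency for a cross-die access (ns)
LOCAL_CAPACITY = 70  # arbitrary units before the local channel is full

def avg_access_ns(load_units):
    """Average access latency once load spills past the local channel."""
    local = min(load_units, LOCAL_CAPACITY)
    remote = max(0, load_units - LOCAL_CAPACITY)
    return (local * LOCAL_NS + remote * REMOTE_NS) / load_units

for load in (40, 70, 80, 100):
    print(f"load {load:3d} units -> average access {avg_access_ns(load):6.1f} ns")

# Up to the spill point every access stays local; past it the average climbs,
# and data recalled from the remote die starts arriving a buffer cycle late.
```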

In short, we’re seeing the same outcome as the first generation 1950X but amplified by the additional resources that now need to be managed.

This way of working is the whole point of hard NUMA addressing and indeed is the optimal design for most workstation workloads where multiple chips (or die clusters in this case) need to be managed. It’s a superb way for dealing with optimization for many workloads from database servers through to off-line render farms, but for anything requiring super-tight real-time memory allocation handling it remains a poor way of doing things.

As I’ve said previously, this is nothing new for anyone who deals with multi-CPU workstations, where NUMA management has been a topic of interest to designers for decades now. There has always been a performance hit for dealing with multiple CPUs in our type of workflow, and it’s largely why I’ve always shied away from multiple-chip Xeon-based systems, as they too exhibit this to a certain extent.

Much like the first generation 1950X with its 2 dies, we see similar memory addressing latency when we use 2 separate Xeons, and this has always been the case. I would never use 4 of those together in a cluster for this sort of work simply due to that latency, so the overall outcome with 4 dies being used in this fashion isn’t all that surprising.

I also tried retesting with SMT turned off, so it could only access the 32 physical cores, in order to rule out a multi-threading problem. The CPU usage didn’t quite double at each buffer, instead settling around the 70% total usage mark, but the total number of usable tracks remained the same, and once again going over this led to the audio collapsing quite rapidly.

So, much like the first generation, the handling of VST instruments, and especially those which are memory heavy, looks like it may not be the best sort of workload for this arrangement. This ultimately remains a shame, especially as one of the other great concerns from last time, namely heat, has been addressed by quite some degree. Running the 2990WX even with an overclock didn’t really see it get much above 70 degrees, and that was on air. Given that the advised TDP here is 250W at stock, rising quickly when overclocked even to the point of doubling the power draw, the temperatures for a core count this huge are rather impressive. I think there is a lot for Intel to pay attention to here in regards to thermals, and the news that the forthcoming i9s are finally going to be soldered again makes a whole load of sense given what we’ve seen here with the AMD solutions. If anything, it’s just a shame it took the competition pulling this out of the hat before they took notice of all the requests from their own customers to bring it back over recent years.

Still, that’s the great thing about a competitive marketplace and very much what we like to see. Going forward I don’t really see these performance quirks changing within the Threadripper range, much the same way that I never expect them to change within the Xeon ecosystem. Both chip ranges are designed for certain tasks and optimized in certain ways, which ultimately makes them largely unsuitable for low latency audio work, no matter how much they excel in other segments.

There is some argument here for users who may not require ultra-tight real-time performance. It’s been brought to my attention in the past that users like the mastering guys could have a lot of scope for using the performance available here, and if they are doing video production work too, well, that only strengthens the argument.

On paper that all makes sense, and although I haven’t tested along those lines specifically, the results seem to indicate that even the trickiest of loads for these CPUs stabilise at 512 and above, with 80%+ of the CPU being accessible even in the worst case scenario. I have to wonder how it would stand up in mixed media scenarios, although I would hope that in any situation where you render offline you should be able to leverage the maximum overhead from these chips.

I suspect the other upshot of this testing might be one of revisiting the total CPU core count that each DAW package can access these days. The last time I did a group test was about half a decade ago and certainly all the packages look to have upped their game since then. Even so, I doubt anyone working on a sequencer engine even 3 years ago would have envisioned a core count such as the one offered by the 2950X here, let alone the monstrous core count found in the 2990WX.

AMD’s Zen core IPC gains this generation, as we’ve already seen with the Ryzen refresh earlier in the year, were around the 12% mark, and that looks to have translated faithfully into the Threadripper series with the 2950X model. One of AMD’s big shouting points at launch was just how scalable the Zen package is simply by upping the die count, and that’s clear from the raw performance offered by the 2990WX; they really have proven just how effective this platform can be when dealing with the workloads it’s designed for.

One day I just hope they manage to find a way of making it applicable to the more demanding of us studio users too.

First look at the AMD Threadripper 1920X & 1950X

Another month and another chip round-up, with them still coming thick and fast and hitting the shelves at an almost unprecedented rate.

AMD’s Ryzen range arrived with us towards the end of Q1 this year, and its impact upon the wider market sent shockwaves through the computer industry, something AMD hadn’t managed in well over a decade.

Although well received at launch, the Ryzen platform did have the sort of early teething problems that you would expect from any first generation implementation of a new chipset range. Its strength was that it was great for any software that could effectively leverage the processing performance on offer across the multitude of cores being made available. The platform, whilst perfect for a great many tasks across any number of market segments, did also have its inherent weaknesses, which would crop up in various scenarios; one field where its design limitations became apparent was real-time audio.

Getting to the core of the problem.

The one bit of well-meaning advice that drives system builders up the wall is the “clocks over cores” wisdom that has been offered up by DAW software firms since what feels like the dawn of time. It’s a double-edged sword in that it tries to simplify a complicated issue without ever explaining why, or in what situations, it truly matters.

To give a bit of crucial background information as to why this might be, we need to start from the point of view that your DAW software is pretty lousy at parallelization.

That’s it, the dirty secret. The one thing computers are good at is breaking down complex chains of data for quick and easy processing, except in this instance, not so much.

Audio works with real-time buffers. Your ASIO drivers have those 64/128/256 buffer settings, which are nothing more than chunks of time in which data entering the system is captured and held in a buffer until it is full, before being passed over to the CPU to do its magic and get the work done.

If the workload is processed before the next buffer is full then life is great and everything is working as intended. If however the buffer becomes full prior to the previous batch of information being dealt with, then data is lost and this translates to your ears as clicks and pops in the audio.
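To put some rough numbers on those buffer settings (my own worked figures, assuming a common 44.1kHz sample rate), the deadline the CPU has to meet is simply the buffer size divided by the sample rate:

```python
# Time available to process each buffer before the next one lands.
SAMPLE_RATE = 44_100  # Hz, a common project sample rate

for frames in (64, 128, 256, 512):
    deadline_ms = frames / SAMPLE_RATE * 1000
    print(f"{frames:4d}-sample buffer -> {deadline_ms:5.2f} ms to finish the work")

# A 64-sample buffer leaves roughly 1.45 ms; miss that window and the lost
# data is what you hear as the clicks and pops described above.
```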

Now with a single core system, this is straightforward. Say you’re working with 1 track of audio to process with some effects. The whole track would be sent to the CPU, the CPU processes the chain and spits out some audio for you to hear.

So far so easy.

Now say you have 2 or 3 tracks of audio and 1 core. These tracks will be processed on the available core one at a time, and assuming all the tracks in the pile are processed prior to the buffer reset, we’re still good. In this instance, by having a faster core to work on, more of these chains can be processed within the buffer time that has been allocated, so more speed certainly means more processing being done in this example.

So now we consider systems with 2 or more cores. The channel chains are passed to the cores as they become available, and once more each whole channel chain is processed on a single core.

Why?

Because to split the channels over more than one core would require us to divide up the workload and then recombine it all again post-processing, which for real-time audio would leave other components in the chain waiting for the data to be shuttled back and forth between the cores. All this lag means we’d lose processing cycles as that data is ferried about, meaning we’d continue to lose more performance with each and every added core, something I will often refer to as processing overhead.
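As a minimal sketch of that scheduling behaviour (a deliberate Python simplification of my own, not any DAW’s actual engine): chains are handed out whole, and the buffer only survives if every core clears its pile in time. We’ll feed the clocks-versus-cores example below through it in a moment.

```python
# Minimal sketch of per-chain scheduling: each channel chain runs whole on
# one core, so the buffer deadline is only met if the busiest core finishes
# its pile of chains in time.

def buffer_survives(chain_costs_ms, n_cores, core_speed_ghz, deadline_ms):
    """Greedily hand each whole chain to the least-loaded core, then check
    that the busiest core still beats the ASIO buffer deadline."""
    loads = [0.0] * n_cores
    for cost in chain_costs_ms:
        # Chain costs are quoted at a notional 1 GHz; faster clocks finish sooner.
        loads[loads.index(min(loads))] += cost / core_speed_ghz
    return max(loads) <= deadline_ms
```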

Clock watching

Now the upshot of this is that lower clocked chips can often be less efficient than higher clocked chips, especially with newer, more demanding software.

So, just for an admittedly extreme example, say that you have the two following chips.

CPU 1 has 12 cores running at 2GHz

CPU 2 has 4 cores running at 4GHz

The maths looks simple, 2 x 12 beats 4 x 4 on paper, but in this situation it comes down to software and processing chain complexity. If you have a particularly demanding plugin chain that is capable of overloading one of those 2GHz CPU cores, then the resulting glitching will proceed to ruin the output from the other 11 cores.

In this situation, the more overhead you have to play with on each core, the less chance there is that an overly demanding plugin is going to be able to sink the lot.
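Feeding that admittedly extreme example through the sketch above (the chain costs are invented figures, quoted at a notional 1GHz):

```python
# Eight light chains plus one demanding chain, against a ~1.45 ms
# (64-sample) deadline, using buffer_survives() from the earlier sketch.
chains = [0.5] * 8 + [4.0]
print(buffer_survives(chains, n_cores=12, core_speed_ghz=2.0, deadline_ms=1.45))  # False
print(buffer_survives(chains, n_cores=4,  core_speed_ghz=4.0, deadline_ms=1.45))  # True

# Eleven of the twelve 2 GHz cores sit almost idle, yet the one heavy chain
# misses the deadline and takes the whole buffer down with it.
```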

This is also one of the reasons we tend to steer clear of single server CPUs with high core counts and low clock speeds, and is largely what the general advice is referring to.

On the other hand, when we talk about 4-core CPUs at 4GHz vs 8-core CPUs at 3.5GHz, the difference between them in clock speeds isn’t going to be enough to cause problems with even the busiest of chains, and once that is the case, more cores on a single chip become the more attractive proposition as far as getting the best performance is concerned.

Seeing Double

So with that covered, we’ll quickly run through the other problematic issue with working with server chips, which is the data exchange process between memory banks.

Dual chip systems are capable of offering the ultimate levels of performance, this much is true, but we have to remember that the returns on your investment diminish quickly as we move through the models.

Not only do we have the concerns outlined above about cores and clocks, but when you move to dealing with more than one CPU you have to start considering the “NUMA” (non-uniform memory access) overheads caused by using multiple processors.

CPUs can exchange data between themselves via high-speed connections, and in AMD’s case this is done via an extension to the Infinity Fabric design that allows the quick exchange of data between the cores both on and off the chip(s). The memory holds data until it’s needed, and in order to ensure the best performance from a CPU, the system tries to store data on the physical RAM stick nearest to the physical core. By keeping the distance between them as short as possible, it ensures the least amount of lag between information being requested and it being received.

This is fine when dealing with 1 CPU, and in the event that a bank of RAM is full, moving and rebalancing the data across other memory banks isn’t going to add too much lag to the data being retrieved. However, when you add a second CPU to the setup and an additional set of memory banks, you suddenly find yourself trying to manage the data being sent and called between the chips as well as the memory banks attached. In this instance, when a RAM bank is full it might end up bouncing the data to free space on a bank connected to the other CPU in the system, meaning the data may have to travel that much further across the board when being accessed.

As we discussed in the previous section, any wait for data to be called can cause inefficiencies where the CPU has to wait for the data to arrive. All this happens in microseconds, but if it ends up happening hundreds of thousands of times every second, our ASIO meter ends up looking like it’s overloading due to lagged data being dropped everywhere, whilst our CPU performance meter may look like it’s only being half used at the same time.

This means that we do tend to expect there to be an overhead when dealing with dual chip systems. Exactly how much depends entirely on what’s being run on each channel and how much data is being exchanged internally between those chips, but the take-home is that we expect to have to pay a lot more for server grade solutions to match the high-end enthusiast class chips that we see in the consumer market, at least in situations where real-time workloads are crucial, like dealing with ASIO-based audio. It’s a completely different scenario when you deal with another task like offline rendering for video, where the processor and RAM are being system managed on their own time and working to their own rules; server grade CPU options make a lot of sense there and are very, very efficient.
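One toy way to see why the same hardware can be superb for offline rendering yet struggle in real time (my own illustration, with invented per-buffer timings): offline work only cares about the average speed, while real-time audio fails on the single worst buffer.

```python
# Toy contrast between offline and real-time workloads. Offline rendering
# only cares how long the whole job takes; real-time audio glitches if any
# single buffer misses its deadline. All timings are invented.
import random

random.seed(1)
DEADLINE_MS = 2.9  # roughly a 128-sample buffer at 44.1 kHz
# Mostly quick buffers, with the occasional one stalled by a slow cross-chip
# memory fetch (a 1-in-20 chance here, purely illustrative).
buffer_times = [random.choice([1.0] * 19 + [4.0]) for _ in range(1000)]

average = sum(buffer_times) / len(buffer_times)
print(f"average buffer {average:.2f} ms -> well under the deadline, offline is fine")
print(f"worst buffer   {max(buffer_times):.2f} ms -> over the deadline, real-time audio glitches")
```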

To server and protect

So why all the server background when we’re looking at desktop chips today? Indeed, Threadripper has been positioned as AMD’s answer to Intel’s enthusiast range of chips, and largely a direct response to the i7 and i9 7800X, 7820X and 7900X chips that launched just last month, with AMD’s EPYC server grade chips still sat in waiting.

An early de-lidding of the Threadripper series chips quickly showed us that the basis of the new chips is two Zen CPUs connected together. The “Infinity Fabric” core interconnect design makes it easy for them to add more cores and expand these chips up through the range; indeed their server solution EPYC is based on the same “Zen” building blocks at its heart as both Ryzen and Threadripper, just with more cores piled in there.

Knowing this before testing gave me certain expectations going in that I wanted to examine. The first was Ryzen’s previously inefficient core handling when dealing with low latency workloads, where we established in the earlier coverage that the efficiency of the processor at lower buffer settings would suffer.

This I suspected was an example of data transference lag between cores, and at the time of that last look we weren’t certain how constant this might prove to be across the range. Without more experience of the platform, we didn’t know if this was something inherent to the design or if perhaps it might be solved in a later update. As we’ve seen since its launch, and having checked over other CPUs in testing, this performance scaling seems to be a constant across all the chips we’ve seen so far and something that can certainly be consistently replicated.

Given that it’s a known constant to us now in how it behaves, we’re happy that there aren’t further hidden underlying concerns here. If the CPU performs as you require at the buffer setting that you need it to handle, then that is more than good enough for most end users. The fact that it balances out around the 192 buffer level on Ryzen, where we see 95% of the CPU power being leveraged, means that for plenty of users who don’t have the same concerns with low latency performance, such as those mastering guys who work at higher buffer settings, this could still be a good fit in the studio.

However, knowing about this constant performance response at certain buffer settings made me wonder if it would carry across to Threadripper. The announcement that this was going to be 2 CPUs connected together on one chip then raised my concerns that it was going to experience the same sort of problems that we see with Xeon server chips, as we’d take a further performance hit through NUMA overheads.

So with all that in mind, on with the benchmarks…

On your marks

I took a look at the two Threadripper CPUs available to us at launch.

The flagship 1950X features 16 cores and a total of 32 threads, with a base clock of 3.4GHz and a potential turbo of 4GHz.

CPUz details for the 1950X
CPUz benchmark for the 1950X

Along with that, I also took a look at the 1920X, a 12 core / 24 thread chip with a base clock speed of 3.5GHz and an advised potential turbo clock of 4GHz.

CPUz AMD 1920X
CPUz AMD 1920X benchmark

First impressions weren’t too dissimilar to when we looked at the Intel i9 launch last month. These chips have a reported 180W TDP at stock settings, placing them above the i9 7900X with its purported 140W TDP.

Also, much like the i9s we’ve seen previously, it fast became apparent that as soon as you start placing these chips under stressful loads, you can expect that power usage to scale up quickly, which is something you need to keep in mind with either platform, where the real-world power usage can rapidly increase when a machine is being pushed heavily.

History shows us that every time a CPU war starts, the first casualty is often your system temperatures, as the easiest way to increase a CPU’s performance quickly is to simply ramp the clock speeds, although this will often also cause a disproportionate amount of heat to be dumped into the system. We’ve seen a lot of discussion in recent years about the “improve and refine” product cycles with CPUs, where a new tech in the shape of a die shrink is introduced and then refined over the next generation or two as temperatures and power usage are reduced, before the whole cycle starts again.

What this means is that with the first generation of any CPU we don’t always expect a huge overclock out of it, and this is certainly the case here. Once again for contrast, the 1950X, much like the i9 7900X, is running hot enough at stock clock settings that even with a great cooler it struggles to reach the limit of its advised potential overclock.

Running with a Corsair H110i cooler, the chip would only hold a stable clock around the 3.7GHz level without any problems. The board itself ships with a default 4GHz setting which, when tried, would reset the system whilst running the relatively lightweight Geekbench test routine. I tried to set up a working overclock around that level, but the P-states would quickly throttle me back once it went above 3.8GHz, leaving me to fall back to the 3.7GHz point. This is technically an overclock from the base clock but doesn’t meet the suggested turbo max of 4GHz, so the take-home is that you should make sure you invest in great cooling when working with one of these chips.

Geekout

Speaking of Geekbench, it’s time to break that one out.

Geekbench 4 1950X stock
Geekbench 4 1920X stock

I must admit to having expected more from the multi-core score, especially on the 1950X, even to the point of double-checking the results a number of times. I did take a look at the published results on launch day and saw that my own scores were pretty much in line with the other results there at the time. Even now, a few days later, they still appear to be within 10% of the best published results for the chip, which says to me that some people look to have got a bit of an overclock going on with their new setups, but we’re certainly not going to be seeing anything extreme anytime soon.

Geekbench 4 Threadripper
Click to expand Geekbench results

When comparing the Geekbench results to other scores from recent chip coverage, it’s all largely as we’d expect with the single core scores. A welcome improvement over the Ryzen 1700Xs; they’ve clearly done some fine tuning to the tech under the hood, as the single core score has seen gains of around 10% even whilst running at a slightly slower per-core clock.

One thing I will note at this point is that I was running with 3200MHz memory this time around. There were reports after the Ryzen launch that running with higher clocked memory could help improve the performance of the CPUs in some scenarios, and it’s possible that the single core jump we’re seeing might prove to be down as much to the increase in memory clocks as anything else. A number of people have asked me if this impacts audio performance at all, and I’ve done some testing with the production run 1800Xs and 1700Xs in the months since, but haven’t seen any benefits to raising the memory clock speeds for real-time audio handling.

We did suspect this would be the outcome as we headed into testing, as memory for audio has been faster than it needs to be for a long time now, although admittedly it was great to revisit it once more and make sure. As long as the system RAM is fast enough to deal with that ASIO buffer, then raising the memory clock speed isn’t going to improve the audio handling in a measurable fashion.

The multicore results show the new AMDs slotting in between the current and last generation Intel top end models. Whilst the AMDs have made solid performance gains over earlier generations, it has still been widely reported that their IPC (instructions per clock cycle) scores remain behind the sort of results returned by the Intel chips.

Going back to our earlier discussion about how much code you can action on any given CPU core within an ASIO buffer cycle, the key to this is the IPC capability. The quicker the code can be actioned, the more efficiently your audio gets processed and so the more you can do overall. This is perhaps the biggest source of confusion when people quote “clocks over cores”, as rarely are any two CPUs comparable on clock speeds alone, and a chip with better IPC performance can often outperform other CPUs with higher quoted clock frequencies but a lower IPC score.
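As a rough worked example (the IPC figures here are invented purely for illustration, not measured values for any of the chips discussed):

```python
# Effective per-core throughput is clock speed multiplied by IPC.
chips = {
    "Chip A: 4.0 GHz, IPC 1.0": 4.0 * 1.0,
    "Chip B: 3.5 GHz, IPC 1.3": 3.5 * 1.3,
}
for name, giga_instructions in chips.items():
    print(f"{name} -> {giga_instructions:.2f} billion instructions/sec per core")

# Chip B loses on the headline clock speed yet actions around 14% more code
# within each ASIO buffer cycle.
```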

…And GO!

So lengthy explanations aside, we get to the crux of it all.

Much like the Ryzen tests before it, the Threadrippers hold up well in the older DawBench DSP testing run.

DawBench DSP Threadripper
Click To Expand

Both of the chips show gains over the Intel flagship i9 7900X, and given this test uses a single plugin with stacked instances and a few channels of audio, what we end up measuring here is raw processor performance, simply stacking them high and letting it get on with it.

There is no disputing that there is a sizable slice of performance to be had here. Much like our previous coverage, however, it starts to show up some performance irregularities when you examine other scenarios, such as the more complex Kontakt-based test, DawBench VI.

DawBench VI Threadripper
Click To Expand

The earlier scaling at low buffer settings is still apparent this time around, although it looks to have been compounded by the hard NUMA addressing that is in place due to the multiple-dies-in-one-chip design in use. It once more scales upwards as the buffer is slackened off, but even at the 512 buffer setting that I tested, it could only achieve 90% CPU usage under load.

That, to be fair, is very much what I would expect from any server-CPU-based system. In fact, just on its own, the memory addressing here seems pretty capable when compared to some of the other options I’ve seen over the years; it’s just a shame that the other performance response amplifies the symptoms when the system is stressed.

AMD, to their credit, are perfectly aware of the pitfalls of trying to market what is essentially a server CPU setup to an enthusiast market. Their Windows overclocking tool has various options to set up some control over and optimize how it deals with NUMA and memory addressing, as you can see below.

AMD Control Panel
Click To Enlarge

I did have a fiddle around with some of the settings here, and the creator’s mode did give me some marginal gains over the other options, thanks to it appearing to arrange the memory in a well organized and easy to address logical group. Ultimately though, the performance dips we’re seeing are down to a physical addressing issue, in that data has to be moved from X to Y in a given time frame, and I suspect no amount of software magic will be able to resolve this for us.

Conclusion

I think this one is pretty straightforward if you need to be running at below a 256 ASIO buffer, although there are certainly some arguments for the mastering guys who don’t need that sort of response.

Much like with the Intel i9s before it, however, there is a strong suggestion that you really do need to consider your cooling carefully here. The normal low noise high-end air coolers that I tend to favour for testing were largely overwhelmed once I placed these on the bench, and once the heat started to climb, the water cooler I was using had both fans screaming.

Older readers with long memories might have a clear recollection of the CPU wars that gave us the P4s, Prescotts, Athlon FXs and 64s. We saw both of these firms in a CPU arms race that only really ended when the i7s arrived with the X58 chipset. Over the years this took place we saw ever-rising clock speeds, a rapid release schedule of CPUs and constant gains, although at the cost of heat and ultimately noise levels. In the years since we’ve had refinement and a vast reduction in heat and noise, but little in the way of performance advancement, at least over the last 5 or 6 generations.

We finally have some really great choices from both firms, and depending on your exact needs and the price points you’re working at, there could be arguments in each direction. Personally, I wouldn’t consider server class chips to be the ultimate solution in the studio from either firm currently, not unless you’re prepared to spend the sort of money that the tag “ultimate” tends to reflect, in which case you really won’t get anything better.

In this instance, if you’re doing a load of multimedia work alongside mastering for audio, this platform could fit your requirements well, but for writing and editing music I’d be looking towards one of the better value solutions, unless this happens to fit your niche.

To see our custom PC selection @ Scan

To see our fixed series range @ Scan