If Nvidia’s next-gen GPUs can somehow live up to the hype, they’ll make the RTX 3090 look slow

As we draw closer to the launch of Nvidia’s next generation of graphics cards, expected in Q3 and possibly as soon as August, it’s inevitable that hype begins to build. The usual leakers fire off tweets every other day proclaiming some titbit: a performance estimate, a feature, or a characteristic. Sometimes they’re vague or cryptic, and other times quite specific. Regardless, a trend is clearly emerging. Nvidia’s next-gen flagship consumer GPU, the tentatively named RTX 4090, is rumoured to be an absolute monster. If it ends up being twice as fast as an RTX 3090 (hardly a slouch!) then Nvidia will have pulled off an intergenerational performance uplift that it hasn’t managed in the many years I’ve been covering GPUs.

It’s difficult to put a precise figure on historical gen-on-gen performance increases, though a good example was the jump in performance Nvidia achieved when it introduced the GTX 10-series. The GTX 980 to GTX 1080 performance uplift was above 50% in many cases, and sometimes a lot higher. But it wasn’t 100%. So, what’s going on? Are we to believe that an RTX 4090 will be twice as fast as a 3090? Has Nvidia found something truly revolutionary? I wish I knew. The simple answer is that it’s too early to tell.

There are three significant reasons why a 100% gain is possible. They are: process node, shader count and power budget. Let’s begin with the process node. Ampere GPUs are manufactured on Samsung’s 8nm node. Ada Lovelace is to be manufactured on TSMC’s 5nm node (or the Nvidia-optimised N4 node). That doesn’t mean its transistors are half the size; there’s a lot more to it than that. A node name is more of an umbrella term. There’s gate length, pitch, density and a healthy dose of marketing thrown in to obfuscate what ‘size’ a node really is. Still, smaller is generally better, and Nvidia will gain a lot from the move from Samsung 8nm to TSMC 5nm.

Next up is shader count. The RTX 3090 Ti with its fully unlocked GA102 GPU packs in 10,752 so-called CUDA cores, or shader cores. Rumours point towards the next-gen AD102 GPU containing 18,432 cores, info that comes from the infamous cyberattack Nvidia suffered back in late February. That’s an increase of just over 70% right there. Add to that the increase in Level 2 cache size and, like-for-like, AD102 will gain a big chunk of shader performance over GA102 just there.
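The shader count arithmetic is easy to check for yourself. Here is a minimal Python sketch using the two core counts quoted above. Keep in mind the AD102 figure is a rumour rather than a confirmed spec, and the function name is just something I made up for illustration.

```python
# Core counts as quoted in the article.
# The AD102 figure is a rumour, not a confirmed specification.
GA102_CORES = 10_752  # fully unlocked GA102 (RTX 3090 Ti)
AD102_CORES = 18_432  # rumoured full AD102


def percent_increase(old: int, new: int) -> float:
    """Return the increase from old to new as a percentage of old."""
    return (new - old) / old * 100


if __name__ == "__main__":
    uplift = percent_increase(GA102_CORES, AD102_CORES)
    print(f"Shader count increase: {uplift:.1f}%")  # prints 71.4%
```

That only tells you how many more shaders there are, not how much faster games will run. More cores rarely translate into a matching frame rate increase.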

Then there’s the power budget. All of those cores need to be fed, which means there would be an expected increase in power to keep 70% more shaders clocked at the same level as those of the RTX 3090 (and Ti). Nvidia will gain some efficiency from moving to the smaller node, but if the rumours of a large jump in power consumption are true, then Nvidia might not be sticking with 3090-like clocks, and could clock a lot higher. Are 2.5GHz boost clocks out of the question? I wouldn’t bet against it.

So, we have the efficiency gains from moving to a smaller node, a huge increase in shader count (and L2 cache size) and probably clock speed increases. If you combine them all with the other expected architectural improvements, suddenly a 100% performance increase isn’t out of the question.
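To get a feel for how shader count and clock speed stack up, here is a back-of-the-envelope sketch in Python. It makes two loud assumptions: that performance scales perfectly linearly with both cores and clocks (real games never do), and that the baseline is the RTX 3090 Ti's rated 1.86GHz boost clock, a figure I've plugged in myself. The 2.5GHz figure is pure speculation, and the function names are hypothetical.

```python
def ideal_speedup(core_ratio: float, clock_ratio: float) -> float:
    """Best-case speedup if performance scaled linearly with cores and clocks."""
    return core_ratio * clock_ratio


def clock_ratio_needed(target_speedup: float, core_ratio: float) -> float:
    """Clock ratio required to hit target_speedup under the same linear assumption."""
    return target_speedup / core_ratio


CORE_RATIO = 18_432 / 10_752  # rumoured AD102 cores vs fully unlocked GA102
BASELINE_BOOST_GHZ = 1.86     # assumption: RTX 3090 Ti rated boost clock
RUMOURED_BOOST_GHZ = 2.5      # speculative figure, not a confirmed spec

if __name__ == "__main__":
    needed = clock_ratio_needed(2.0, CORE_RATIO)
    print(f"Clock uplift needed for 2x: {(needed - 1) * 100:.1f}%")  # prints 16.7%

    best_case = ideal_speedup(CORE_RATIO, RUMOURED_BOOST_GHZ / BASELINE_BOOST_GHZ)
    print(f"Best-case speedup at 2.5GHz: {best_case:.2f}x")
```

Treat the result as a ceiling rather than a prediction. Memory bandwidth, power limits and game engine bottlenecks will all eat into it.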

Nvidia will surely optimise its RT and Tensor cores to deliver improved ray tracing and DLSS performance and features. Is RT performance the basis of the 100% performance increase rumours? It’s possible. As good as ray tracing looks on screen, it’s not at the point where it can be universally implemented without a big performance hit. Expect improvements on that front. Nvidia isn’t likely to back off from hyping ray tracing as the frontier of gaming technology, even though raster performance will remain vital for years to come.

I’m left wondering if memory bandwidth won’t be an issue though. A 384-bit bus with 21Gbps GDDR6X would provide just about 1TB/s of bandwidth. That’s the same config as seen on the RTX 3090 Ti. Is a 512-bit bus feasible? AMD did it back in 2007 with the HD 2900 XT so it’s certainly not impossible. Perhaps we’ll see a GDDR7 4090 Ti in a year or so? Don’t bet against it. How about HBM3? That’s unlikely though.
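The bandwidth figure comes from a simple formula: bus width in bits, multiplied by the per-pin data rate, divided by eight to convert bits to bytes. A minimal Python sketch follows. The 512-bit case is a hypothetical configuration for comparison, not a rumoured spec, and the function name is my own.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory bus in bits
    data_rate_gbps: per-pin data rate in gigabits per second
    """
    return bus_width_bits * data_rate_gbps / 8  # 8 bits per byte


if __name__ == "__main__":
    # RTX 3090 Ti style config: 384-bit bus with 21Gbps GDDR6X
    print(memory_bandwidth_gbs(384, 21))  # prints 1008.0

    # Hypothetical 512-bit bus at the same data rate
    print(memory_bandwidth_gbs(512, 21))  # prints 1344.0
```

So a wider bus at the same memory speed would add a third more bandwidth, which is why the 512-bit question is worth asking.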

Let’s not forget that I’m talking about the RTX 4090 vs 3090 (Ti). These kinds of cards grab the headlines but are actually not interesting to a lot of gamers who think the idea of US$2,000 graphics cards is utterly ludicrous. What might really impress me is how an RTX 4060 or 4070 class card will perform relative to something like a 3080. If a 4060 can match a 3080 at 200W or so and come with an attractive price, it will raise the roof. Shut up and take my money!

It’s still early days. It’s likely that we’re still months away from a proper reveal, and even then it’s only going to be the high-end cards. There’s conflicting info out there though. The moral of the story is that a healthy pinch of your favourite salt is needed. The quest for clicks makes it difficult to separate truth from fiction, and rumour from total BS.

You can be sure that next generation GPUs are going to be fast. But how fast? Let’s wait and see just how fast that next gen fast really is. I’m excited, even if a doubling of performance is a bit too much to hope for. I’ve been surprised before though, and I’d love to be surprised again.
