Yes, you are reading this right. NVIDIA has finally released its long-awaited next-generation video card, built on a new architecture rather than a revision of the previous one. The GeForce GTX 480 is the high-end GF100 card, making it the most powerful video card NVIDIA currently offers. The delays stacked up a little, but I can finally say NVIDIA has a DirectX 11 card out to compete with AMD's offering. Now, was the card worth the wait?
At their press conference a few months ago, NVIDIA laid out the goals behind their next GPU architecture, which I will reiterate here. There were four points in the presentation: Geometric Realism, Unrivaled Image Quality, Revolutionary GPU Compute for Gaming, and Highest Performance GPU Ever Built. The first two and the last one should, in my opinion, always be the targets for a brand new architecture. Using the GPU's computing power for tasks beyond drawing graphics is a more recent development, so it is a natural addition to the list. The GF100 is the result of addressing those four points. NVIDIA could have bolted tessellation hardware, a core component of DirectX 11, onto their existing architecture, but bottlenecks there would have gotten in the way of producing a powerful and fast video card. Below is the brand new design of the GF100.
As you can see from the diagram, there are four GPCs, or graphics processing clusters. Each cluster contains several SMs, or streaming multiprocessors. Within each SM are 32 CUDA cores, 48/16KB of shared memory, 16/48KB of L1 cache (a 64KB pool split one way or the other), ISA improvements, 4 texture units, and 1 polymorph engine. Compared to GT200, their last architecture, each SM has 4X the CUDA cores and 3X the shared memory, plus an L1 cache that GT200 lacked entirely. But, this was a few months ago. The GeForce GTX 480 won't ship with 512 cores as initially thought, but with 480. Word around the Net is that yields were low for full 512-core chips, so that may be why the initial high end cards have 480 cores. The good news is that this gives NVIDIA a little room to grow, and they can release a higher end part with 512 cores in the future. Also, according to the presentation a few days ago, the GTX 480 has about as many transistors as four, yes four, Intel Core i7 CPUs. We're talking over 3 billion transistors, so there's a ton of them in this video card alone.
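To make the core counts concrete, here is a quick back-of-the-envelope sketch of the arithmetic. The 32 cores per SM, four GPCs, and the 512 and 480 core totals come from the figures above; the even split of SMs across GPCs is my own assumption for illustration, and the names are made up for this example.

```python
# Figures quoted in the article.
GPCS = 4
CUDA_CORES_PER_SM = 32
FULL_CHIP_CORES = 512
GTX_480_CORES = 480

# The two shared memory / L1 cache splits quoted in the article, in KB.
SHARED_L1_SPLITS_KB = [(48, 16), (16, 48)]


def sm_count(cuda_cores, cores_per_sm=CUDA_CORES_PER_SM):
    """Return how many SMs are needed to supply the given number of CUDA cores."""
    if cuda_cores % cores_per_sm != 0:
        raise ValueError("core count is not a whole number of SMs")
    return cuda_cores // cores_per_sm


full_sms = sm_count(FULL_CHIP_CORES)
gtx_480_sms = sm_count(GTX_480_CORES)

# Assumption (not stated in the article): SMs are split evenly across the GPCs.
sms_per_gpc = full_sms // GPCS

# How many SMs the shipping card gives up compared to the full design.
disabled_sms = full_sms - gtx_480_sms

# Either split adds up to the same per-SM memory pool.
pool_sizes_kb = {shared + l1 for shared, l1 in SHARED_L1_SPLITS_KB}

print(f"Full GF100: {full_sms} SMs ({sms_per_gpc} per GPC)")
print(f"GTX 480: {gtx_480_sms} SMs, {disabled_sms} SM disabled")
print(f"Shared memory + L1 pool per SM: {pool_sizes_kb} KB")
```

In other words, going from 512 to 480 cores amounts to switching off a single SM, which fits the idea that NVIDIA is leaving itself room for a fully enabled part later.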
In the middle of it all is a big L2 cache block. The L2 cache size is tied to the number of memory controllers: with six memory controllers there is a total of 768KB of L2 cache, and taking away one memory controller removes 128KB. The L2 cache allows for efficient communication between SMs.
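The relationship between memory controllers and L2 cache is simple enough to sketch in a few lines. The 128KB per controller and the six-controller total come from the figures above; the function name and the five-controller case are only illustrations, not a description of any specific card.

```python
# Each memory controller brings 128KB of L2 cache with it (figure from the article).
L2_KB_PER_CONTROLLER = 128


def l2_cache_kb(memory_controllers):
    """Return the total L2 cache in KB for a given number of memory controllers."""
    if memory_controllers < 0:
        raise ValueError("memory controller count cannot be negative")
    return memory_controllers * L2_KB_PER_CONTROLLER


full_l2_kb = l2_cache_kb(6)      # the full six-controller configuration
cut_down_l2_kb = l2_cache_kb(5)  # hypothetical part with one controller removed

print(f"6 controllers: {full_l2_kb}KB of L2")
print(f"5 controllers: {cut_down_l2_kb}KB of L2")
```

So any cut-down part that loses a memory controller loses a slice of L2 along with the memory bandwidth.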
So you have these separate units working together to render scenes. Because many of the items in a scene can be rendered in parallel, the architecture is set up to work through these tasks efficiently. There's a lot of work, though, in ensuring that everything comes out in the right order at the end, and NVIDIA has spent a lot of time perfecting that.
Coming with the release of the GeForce GTX 480 and 470 is a new CSAA mode, 32x CSAA, which uses 8 color samples and 24 coverage samples. When used, it improves on the previous method with better accuracy and quality. One example shown compared a railing rendered with the old CSAA against the new one. With the old method some of the railings were missing, while 32x CSAA showed a more accurate picture with the railings present. It didn't look that nice up close, but from a distance the new technique gives a more accurate representation.