
NVIDIA's GF100

by: John
The next graphics architecture from NVIDIA has been a long time coming, and they’ve had a few delays along the way. On January 15th, NVIDIA held a press meeting detailing the GF100, the current code name for the consumer version of Fermi. While no specifics on speed or pricing were made available, it’s one step closer to the card actually being released.

When GF100 is finally released, it will carry a GeForce name. Given NVIDIA’s recent re-branding of its line to make it a little less confusing, the name should make clear that this is the fastest set of cards available at release.

So what were the factors in creating their next GPU architecture? There were four points outlined in the presentation: Geometric Realism, Unrivaled Image Quality, Revolutionary GPU Compute for Gaming, and Highest Performance GPU Ever Built. The first two and the last one should always be, in my opinion, the targets for a brand new architecture. Using the GPU’s computing power for work beyond graphics is a recent development, so it is a natural addition to the goals. GF100 is made to deliver all four at the highest level possible.

NVIDIA could have bolted the parts needed for tessellation, a core component of DirectX 11, onto their current architecture, but the bottlenecks there would’ve gotten in the way of producing a powerful and fast video card. Below is the brand new design of the GF100.


As you can see from the diagram, there are four GPCs, or graphics processing clusters. Each cluster contains four SMs, or streaming multiprocessors. Within each SM are 32 CUDA cores, 64KB of on-chip memory that can be split as 48KB of shared memory and 16KB of L1 cache or as 16KB of shared memory and 48KB of L1 cache, ISA improvements, 4 texture units, and 1 PolyMorph engine. Compared to an SM in GT200, their last architecture, that is 4X the CUDA cores and 3X the shared memory, plus an L1 cache, which GT200 doesn’t have at all.
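The per-SM comparison can be checked with a little arithmetic. This is only a sketch: the GF100 figures come from the presentation, while the GT200 figures (8 cores and 16KB of shared memory per SM) and all of the names below are my own assumptions for illustration.

```python
# Per-SM figures. The GF100 numbers are the ones quoted in the presentation;
# the GT200 numbers are assumptions added for this comparison.
GF100_SM = {"cuda_cores": 32, "shared_kb_max": 48}
GT200_SM = {"cuda_cores": 8, "shared_kb_max": 16}


def ratio(field):
    """How many times larger the GF100 value is than the GT200 value."""
    return GF100_SM[field] / GT200_SM[field]


def sm_memory_configs():
    """The two (shared memory KB, L1 cache KB) splits named for a GF100 SM."""
    return [(48, 16), (16, 48)]


core_ratio = ratio("cuda_cores")       # the "4X the CUDA cores" claim
shared_ratio = ratio("shared_kb_max")  # the "3X the shared memory" claim
```

Both splits add up to the same 64KB, which is why the two numbers always move in opposite directions.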

In the middle of it all is a big L2 cache block. L2 cache size is determined by the number of memory controllers: with six memory controllers there is a total of 768KB of L2 cache, and taking away one memory controller removes 128KB. The L2 cache allows for efficient communication between SMs.
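To make the relationship concrete, here is a minimal sketch of how L2 size scales with the controller count, using only the 128KB-per-controller figure quoted above. The function and constant names are hypothetical, not anything from NVIDIA.

```python
# Each memory controller carries 128KB of L2 cache (figure from the presentation).
L2_KB_PER_CONTROLLER = 128


def l2_cache_kb(memory_controllers):
    """Total L2 cache in KB for a given number of memory controllers."""
    if memory_controllers < 0:
        raise ValueError("controller count cannot be negative")
    return memory_controllers * L2_KB_PER_CONTROLLER


full_chip_kb = l2_cache_kb(6)     # the full six-controller configuration
cut_down_kb = l2_cache_kb(5)      # one controller removed
```

So a hypothetical part with five controllers would end up with 640KB of L2 rather than 768KB.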

So you have these separate units working together to render scenes. Because many of the items can be rendered in parallel, the architecture is designed to spread these tasks across the units efficiently. It takes a lot of work, though, to ensure that everything still comes out in the right order in the end, and NVIDIA has spent a lot of time perfecting that.

Coming with the release of GF100 is a new CSAA (coverage sampling antialiasing) mode called 32x CSAA, which consists of 8 color samples and 24 coverage samples. When used, it should improve on the previous method with better accuracy and quality. One example shown was a railing rendered with the old CSAA against the new one. With the old method, some railings were missing, but 32x CSAA showed a more accurate picture with the railings appearing. It didn’t look that nice up close, but from a distance the new technique gives a more accurate representation.

What NVIDIA is trying to accomplish with Geometric Realism is to make rounded objects look less triangulated. For example, in the picture supplied by NVIDIA for Far Cry 2, you can see angles on the holster of the gun and the shoulder of the character. Yes, it’s supposed to be a curved surface, but everything is made up of triangles and you need a lot of little triangles to make a curved shape. GF100 is looking to solve this problem, and part of the solution, as I mentioned earlier, is tessellation.


With tessellation, one stores a rough shape of the object, which is also what the program animates, while the hardware adds in more triangles to make the object look smoother. The hardware subdivides the triangles of the rough shape to produce a much smoother one, and developers control how much they want each object tessellated. Tessellation can also add detail and definition, such as turning a rather flat stone road into a very bumpy one by adding more triangles, all while keeping it looking smooth. To do this without bogging down the machine, DirectX 11 supports hardware accelerated tessellation, and GF100 will be NVIDIA’s first architecture to support DX11 and, in turn, hardware tessellation.
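The idea of starting from a rough shape and subdividing it can be sketched in a few lines. This is a CPU-side illustration only, assuming simple midpoint subdivision onto a unit sphere; it is not the DirectX 11 tessellation pipeline or anything GF100 actually runs, and every name in it is hypothetical.

```python
import math


def midpoint(a, b):
    """Point halfway between two 3D vertices."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def normalize(v):
    """Push a vertex out onto the unit sphere (the 'smooth' target shape here)."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def subdivide(triangle):
    """Split one triangle into four by connecting its edge midpoints."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(triangles, level, project=None):
    """Subdivide every triangle `level` times, optionally reshaping new vertices."""
    for _ in range(level):
        triangles = [t for tri in triangles for t in subdivide(tri)]
        if project is not None:
            triangles = [tuple(project(v) for v in tri) for tri in triangles]
    return triangles


# Rough shape: a single flat triangle spanning one octant of a sphere.
coarse = [((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))]

# `level` plays the role of the developer-controlled tessellation amount.
smooth = tessellate(coarse, level=3, project=normalize)
```

Each level multiplies the triangle count by four, so the stored shape stays tiny while the rendered one gets rounder. Swapping `normalize` for a function that nudges vertices by a height value would give the bumpy-road kind of detail instead.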

DirectCompute should also be a big thing for gaming in the near future. Right now, you can experience some of the benefits first hand with CUDA enabled applications. By taking advantage of the cores on the GPU, some operations can be performed quicker than on the CPU alone. Developers will be able to use the GPU for far more than just outputting graphics. Metro 2033 was one example given of a game using DirectCompute. The picture shown was of a scene where the items up close were clear while those farther away were blurred. Using DirectCompute, it was faster to accomplish this effect than with standard rendering modes.

So the short of it is that GF100 is coming and it’s still on track for availability in Q1. Boards have been shipped to developers, so content creators have them on hand to test their games with. The cards were even on the show floor at CES. The presentation made a lot of comparisons with the ATI Radeon HD 5870, which leads me to believe that the cards are going to be in the $400 price range. Again, no price was given; that’s just my assumption based on the materials I saw. This is a big step for NVIDIA, as they’re finally offering the brand new architecture that many enthusiasts have been calling for. NVIDIA may be late to the DX11 game, but they’re looking to make a big, big splash with GF100. Let’s just hope the card comes out soon.
