Creative Labs 3D Blaster GeForce3 Ti500
Author : Wayne Date : 24th September 2001

3DVelocity would like to thank Creative Labs and especially Rosie Tickner of ProdigyPR for their help and courtesy in providing this graphics card for review.

 

Technology Overview :

We're a little too late to the table with this review to launch into a full-on technology overview, so we'll simply take a quick look at the key aspects and what they offer you as an end user.

Lightspeed Memory Architecture :

To see a rich and detailed rendering on screen is a rewarding experience, but for many the processes involved in this seemingly simple task are vastly underestimated. Scene geometry levels have multiplied at an almost alarming rate with polygon counts up from perhaps a couple of thousand per frame to 100,000 plus. This may vastly improve the accuracy of the final rendered scene, but it also causes major problems for the processor having to move such vast amounts of data from place to place so that the various stages in the rendering can be accomplished.

To help overcome the geometry bandwidth problem, NVIDIA include hardware support for Higher Order Surfaces (or HOS). HOS allows developers to define a surface using curves rather than the traditional way, which involves building it from perhaps many thousands of polygons. These curves, known as splines, use control points which define the direction, severity and gradient of the curve.
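To give a feel for why a handful of control points can stand in for a great many polygons, here is a minimal Python sketch. It assumes a cubic Bezier curve, which is just one common kind of spline, and it says nothing about how the GeForce3 hardware actually evaluates surfaces. The function names and the control point values are made up for illustration.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at t (0..1) by repeated linear interpolation."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        # Blend each neighbouring pair of points; the list shrinks by one each pass.
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def tessellate(control_points, segments):
    """Sample the curve at segments + 1 evenly spaced parameter values."""
    return [de_casteljau(control_points, i / segments) for i in range(segments + 1)]


# Hypothetical 2D control points: four values describe the whole curve.
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

coarse = tessellate(control, 4)   # 5 vertices, a rough approximation
fine = tessellate(control, 64)    # 65 vertices, a much smoother curve

print(len(coarse), len(fine))
```

The same four control points produce either the coarse or the fine version, so only the compact description needs to be stored and moved around, and the level of detail can be chosen later.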

The other major bottleneck in rendering highly detailed scenes is what's known as pixel bandwidth. We tend to forget that the creation of a scene isn't just about polygons; the GPU must also calculate certain values on a pixel-by-pixel level. Let's take a fairly average scene.


Color Read + Z-Read + Texture Read + Color Write + Z-Write
4 bytes + 4 bytes + 4 bytes + 4 bytes + 4 bytes = 20 bytes

Here we can see that 20 bytes of data make up the final processing of a single pixel. Sounds fairly harmless at this point, doesn't it? Now let's factor in overdraw. Because certain parts of a scene end up hidden behind others in the final image, each pixel will be rendered more than once, depending on the depth complexity of the scene. Using an average depth complexity of 2.5, we must multiply this figure by 2.5. Now let's imagine we are running at a screen resolution of 1024x768, which means our screen is made up of a grid of pixels that is 1024 pixels wide and 768 pixels high. Hold on tight, here's the maths :

1024 pixels x 768 pixels x 2.5 (depth complexity) x 20 bytes per individual pixel = 39,321,600 bytes (39.3Mbytes) for every frame rendered. If we're running at say 60 frames per second this translates to roughly 2.4GB/sec of data.
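If you'd like to try your own resolution, depth complexity or framerate, the sum above is easy to reproduce. The short Python sketch below uses the same figures as the text; the function names are our own and it assumes, as the example does, five 4-byte accesses per pixel.

```python
# Color read, Z read, texture read, color write, Z write: 4 bytes each.
BYTES_PER_PIXEL = 4 * 5


def frame_bytes(width, height, depth_complexity, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes moved to render one frame, allowing for overdraw."""
    return width * height * depth_complexity * bytes_per_pixel


def bandwidth_bytes_per_sec(width, height, depth_complexity, fps):
    """Bytes moved per second at a given framerate."""
    return frame_bytes(width, height, depth_complexity) * fps


per_frame = frame_bytes(1024, 768, 2.5)
per_second = bandwidth_bytes_per_sec(1024, 768, 2.5, 60)

print(f"{per_frame:,.0f} bytes per frame")
print(f"{per_second / 1e9:.2f} GB/sec at 60 fps")
```

Swapping in 1600x1200 or a depth complexity of 4 shows just how quickly the figure climbs.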

Increase the depth complexity, framerate or resolution and you can imagine how this figure can rapidly become unmanageable using traditional technology. Three of the primary technologies used to counter this problem are a crossbar-based memory controller to improve the efficiency of access to the frame buffer, lossless Z-buffer compression, and Z-Occlusion Culling to reduce the drawn depth complexity. Let's take these one at a time :

Crossbar Memory Controller

By far the busiest "lane" in a modern graphics processor is that between the frame buffer and the GPU (Graphics Processing Unit). Contrary to what many people think, the frame buffer isn't just used to store the final rendered image prior to display; it is actually your graphics card's "storage" area for information such as color, depth values, textures and geometry. In traditional architectures, a single "lane" is used which can access 128 bits of data at a time. Using modern DDR technology this becomes 256 bits of data, as two transfers occur for every clock cycle. The problem with this approach is that only a single packet of data can be transferred at any one time, which is great if that particular packet happens to be 256 bits large, but what about when the packet is only 64 bits? Well, 64 bits of data get ferried over and the remaining capacity (192 bits) is completely wasted, an efficiency of only 25%.

The idea of NVIDIA's crossbar memory architecture is that rather than have a single "lane" along which data can travel, four "lanes" are used, each of which is able to carry 64 bits of data at a time. This way, if say four 64 bit data chunks need to be transferred, it can be done in a single transfer, while the traditional approach would mean four separate transfers. By the same rule, if the data chunk happens to be a full 256 bits, it is split up and carried across as normal, a kind of "best of both worlds" approach. Although I've made the whole process sound simple to the point that you're probably wondering why everybody doesn't do it, the reality is that this is quite a complex engineering task, with four memory controllers each of which has to communicate with all of the others and also the GPU. As you might expect, under ideal conditions NVIDIA's crossbar memory architecture is up to four times more efficient than traditional approaches.
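The "up to four times" figure can be checked with a very simple model. The Python sketch below is an idealised illustration only: it assumes every request can be packed perfectly into free lanes and ignores addressing, memory banks and timing, none of which we have details on. The function names are hypothetical.

```python
import math


def bus_efficiency(request_bits, lane_bits):
    """Fraction of the bits transferred that actually carry requested data."""
    lanes_used = math.ceil(request_bits / lane_bits)
    return request_bits / (lanes_used * lane_bits)


def single_lane_transfers(requests, bus_bits=256):
    """Traditional controller: one request at a time on one wide lane."""
    return sum(math.ceil(r / bus_bits) for r in requests)


def crossbar_transfers(requests, lane_bits=64, lanes=4):
    """Idealised crossbar: requests are split into 64-bit slots and
    packed four to a transfer."""
    slots = sum(math.ceil(r / lane_bits) for r in requests)
    return math.ceil(slots / lanes)


small = [64, 64, 64, 64]   # four 64-bit chunks
large = [256]              # one full-width chunk

print(bus_efficiency(64, 256))        # one 64-bit chunk on a 256-bit lane
print(bus_efficiency(64, 64))         # the same chunk on a 64-bit lane
print(single_lane_transfers(small), crossbar_transfers(small))
print(single_lane_transfers(large), crossbar_transfers(large))
```

In this model the four small chunks take four transfers on the single wide lane and one on the crossbar, while the full 256-bit chunk takes one transfer either way, which is the "best of both worlds" behaviour described above.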

 
