Creative Labs 3D Blaster GeForce3 Ti500
Author : Wayne
Date : 24th September 2001

3DVelocity would like to
thank Creative
Labs and especially Rosie Tickner of ProdigyPR for their
help and courtesy in providing this graphics card for review.
Technology Overview :
We're a little too late to the table with this review
to launch into a full-on technology overview, so we'll simply
take a quick look at the key aspects and what they offer you
as an end user.
Lightspeed Memory Architecture :
To see a rich and detailed rendering on screen
is a rewarding experience, but for many the processes involved
in this seemingly simple task are vastly underestimated. Scene
geometry levels have multiplied at an almost alarming rate with
polygon counts up from perhaps a couple of thousand per frame
to 100,000 plus. This may vastly improve the accuracy of the
final rendered scene, but it also causes major problems for
the processor having to move such vast amounts of data from
place to place so that the various stages in the rendering can
be accomplished.
To help overcome the geometry bandwidth problem,
NVIDIA include hardware support for Higher Order Surfaces (or
HOS). HOS allows developers to define a surface property using
curves rather than the traditional way which involves creating
it from perhaps many thousands of polygons. These curves, known
as splines, use control points which define the direction,
severity and gradient of the curve.
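To give a feel for how a few control points can stand in for a long list of vertices, here is a minimal sketch. It assumes a cubic Bezier curve, which is only one kind of spline and is our choice for illustration; the function name `cubic_bezier` and the sample control points are hypothetical, and this is not a description of how the GeForce3 hardware itself evaluates surfaces.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (0.0 to 1.0).

    p0..p3 are control points given as tuples of coordinates.
    """
    u = 1.0 - t
    return tuple(
        u**3 * a + 3 * u**2 * t * b + 3 * u * t**2 * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


# Four control points describe the whole curve (illustrative values).
control_points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

# The curve can be sampled as finely or as coarsely as required.
steps = 8
curve = [cubic_bezier(*control_points, i / steps) for i in range(steps + 1)]

for point in curve:
    print(f"({point[0]:.3f}, {point[1]:.3f})")
```

Only the four control points need to be stored and moved around. Raising `steps` gives a smoother curve without adding anything to the source data, which is the bandwidth saving the paragraph above is getting at.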
The other major bottleneck in rendering highly
detailed scenes is what's known as pixel bandwidth. We tend
to forget that the creation of a scene isn't just about polygons,
the GPU must calculate certain values on a pixel by pixel level.
Let's take a fairly average scene.
Color Read + Z-Read + Texture Read + Color Write + Z-Write
4 bytes + 4 bytes + 4 bytes + 4 bytes + 4 bytes = 20 bytes
Here we can see that 20 bytes of data make up
the final processing of a single pixel. Sounds fairly harmless
at this point, doesn't it? Now let's factor in overdraw. This
means that because certain parts of a scene will not be visible
in the final image, each pixel will be rendered more than once
depending on the final depth complexity of the scene. Using
an average depth complexity of 2.5, this means we must multiply
this figure by 2.5. Now let's imagine we are running at a screen
resolution of 1024x768. This means our screen is made up of
a grid of pixels that is 1024 pixels wide and 768 pixels high.
Hold on tight, here's the maths :
1024 pixels x 768 pixels x 2.5 (depth complexity)
x 20 bytes per individual pixel = 39,321,600 bytes (39.3Mbytes)
for every frame rendered. If we're running at say 60 frames
per second this translates to roughly 2.4GB/sec of data.
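For anyone who wants to try other resolutions or framerates, the same sums can be sketched in a few lines of Python. The function name `pixel_bandwidth` and its parameters are our own invention for illustration; the figures are the ones used above.

```python
def pixel_bandwidth(width, height, depth_complexity, bytes_per_pixel, fps):
    """Return (bytes per frame, bytes per second) for the simple model above."""
    bytes_per_frame = width * height * depth_complexity * bytes_per_pixel
    bytes_per_second = bytes_per_frame * fps
    return bytes_per_frame, bytes_per_second


# Color read + Z read + texture read + color write + Z write, 4 bytes each.
BYTES_PER_PIXEL = 4 * 5

per_frame, per_second = pixel_bandwidth(1024, 768, 2.5, BYTES_PER_PIXEL, 60)

print(f"{per_frame:,.0f} bytes per frame")   # 39,321,600 bytes per frame
print(f"{per_second / 1e9:.2f} GB/sec")      # 2.36 GB/sec
```

The exact result is about 2.36GB/sec, which rounds to the 2.4GB/sec quoted. Bump the resolution to 1600x1200 or the depth complexity to 4 and the number climbs quickly.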
Increase the depth complexity, framerate or resolution
and you can imagine how this figure can rapidly become unmanageable
using traditional technology. Three of the primary technologies
used to counter this problem are a crossbar-based memory controller
to improve the efficiency of access to the frame buffer, lossless
Z-buffer compression, and Z-Occlusion Culling to reduce the
drawn depth complexity. Let's take these one at a time :
Crossbar Memory Controller
By far the busiest "lane" in a modern
graphics processor is that between the frame buffer and the
GPU (Graphics Processing Unit). Contrary to what many people
think, the frame buffer isn't just used to store the final rendered
image prior to display, it is actually your graphics card's
"storage" area for information such as color, depth
values, textures and geometry. In traditional architectures,
a single "lane" is used which can access 128 bits
of data at a time. Using modern DDR technology this becomes
256 bits of data as two transfers occur for every clock cycle.
The problem with this approach is that only single packets of
data can be transferred at any one time which is great if that
particular packet of data happens to be 256 bits large, but
what about when the packet of data is only 64 bits? Well, what happens
is that the rest of the 256 bit transfer is wasted. 64 bits
of data get ferried over and the remaining capacity (192 bits)
goes unused, an efficiency of only 25%.
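The 25% figure is simply useful bits divided by bits moved. A minimal sketch of that sum, assuming each request gets its own 256 bit transfer (or several, if it is larger); the function name `transfer_efficiency` is hypothetical:

```python
import math


def transfer_efficiency(payload_bits, bus_bits=256):
    """Fraction of the bus capacity that carries useful data."""
    # A request always occupies a whole number of full-width transfers.
    transfers = math.ceil(payload_bits / bus_bits)
    return payload_bits / (transfers * bus_bits)


for payload in (64, 128, 256):
    print(f"{payload:3d} bits -> {transfer_efficiency(payload):.0%} efficient")
```

A 64 bit packet comes out at 25%, a 128 bit packet at 50%, and only a full 256 bit packet uses the whole transfer.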
The idea of NVIDIA's crossbar memory architecture
is that rather than have a single "lane" along which
data can travel, four "lanes" are used, each of which
is able to carry 64 bits of data at a time. This way, if say
four 64 bit data chunks need to be transferred, it can be done
in a single transfer while the traditional approach would mean
four separate transfers. By the same rule, if the data chunk
happens to be a full 256 bits, it is split up and carried across
as normal, a kind of "best of both worlds" approach.
Although I've made the whole process sound simple to the point
that you're probably wondering why everybody doesn't do it,
the reality is that this is quite a complex engineering task
with four memory controllers each of which has to communicate
with all of the others and also the GPU. As you might expect,
under ideal conditions NVIDIA's crossbar memory architecture
is up to four times more efficient than traditional approaches.
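To put a number on "up to four times", here is a rough sketch that counts transfers under both schemes. It is an idealised model of our own, not NVIDIA's actual scheduling logic: it assumes any four 64 bit chunks can always travel together, ignoring real-world details such as which memory bank each chunk lives in. The function names are hypothetical.

```python
import math

BUS_BITS = 256   # width of the traditional single "lane"
LANE_BITS = 64   # width of each crossbar "lane"
LANES = 4        # number of crossbar "lanes"


def single_lane_transfers(requests):
    """Each request needs its own full-width transfer (or several)."""
    return sum(math.ceil(bits / BUS_BITS) for bits in requests)


def crossbar_transfers(requests):
    """Requests are split into 64 bit chunks, four of which move at once."""
    chunks = sum(math.ceil(bits / LANE_BITS) for bits in requests)
    return math.ceil(chunks / LANES)


small = [64, 64, 64, 64]   # four small packets
large = [256]              # one full-width packet

print(single_lane_transfers(small), crossbar_transfers(small))  # 4 1
print(single_lane_transfers(large), crossbar_transfers(large))  # 1 1
```

Four small packets take four transfers the traditional way and one with the crossbar, which is where the best-case fourfold gain comes from. A full 256 bit packet takes one transfer either way.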
