A lightning-fast GPU is absolutely pointless
without some lightning-fast memory to partner it. DDR has served
the industry well for some time now, but it has had its day and
it's time for something new. That something new is DDR2! Although
we have been told that the GeForceFX will ship with 1000MHz
DDR2 running at 2.0ns, we'll no doubt see variants of the NV3X
ship with 2.2ns 900MHz memory and presumably, at some point, with
1.8ns 1100MHz DDR2 as well.
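The link between the "ns" rating and the MHz figures is simple arithmetic, and a quick sketch makes it clear. The function name below is my own invention, and I'm assuming the usual convention that the ns rating is the clock cycle time and that DDR memory transfers data twice per clock.

```python
def ddr_rates(cycle_time_ns):
    """Return (clock in MHz, effective data rate in MHz) for a DDR part.

    Assumes the ns rating is the clock cycle time and that data is
    transferred on both clock edges (two transfers per clock).
    """
    clock_mhz = 1000.0 / cycle_time_ns
    return clock_mhz, 2 * clock_mhz


for ns in (2.2, 2.0, 1.8):
    clock, rate = ddr_rates(ns)
    print(f"{ns}ns -> {clock:.0f}MHz clock, {rate:.0f}MHz effective")
```

The 2.0ns part works out at exactly 500MHz and 1000MHz effective. The 2.2ns and 1.8ns parts come out a little above 900MHz and 1100MHz respectively, so the quoted speeds are presumably rounded down.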
Contrary to what many people were speculating,
NVIDIA have decided to stay with a 128-bit memory interface for
the time being. Clearly they feel the bandwidth afforded them
by DDR2, along with the bandwidth-saving features they've incorporated,
gives them enough headroom for at least this current generation
of products.
I'm assuming the memory actually used will be
Samsung's K4N26323AE-GC20 (and I could equally be totally wrong),
but if it is, the specs are as follows.
Features
2.5V ± 0.1V power supply for device operation
1.8V ± 0.1V power supply for I/O interface
On-Die Termination for all inputs except CKE,ZQ
Output Driver Strength adjustment by EMRS
SSTL_18 compatible inputs/outputs
4 banks operation
MRS cycle with address key programs
- CAS latency : 5, 6, 7 (clock)
- Burst length : 4 only
- Burst type : sequential only
Additive latency (AL): 0,1(clock)
Read latency(RL) : CL+AL
Write latency(WL) : AL+1
Differential Data Strobes for Data-in, Data-out:
- 4 DQS and /DQS (one differential strobe per byte)
- Single Data Strobes by EMRS
Edge aligned data & data strobe output
Center aligned data & data strobe input
DM for write masking only
Auto & Self refresh
32ms refresh period (4K cycle)
(16ms is under consideration)
144 Ball FBGA
Maximum clock frequency up to 500MHz
Maximum data rate up to 1Gbps/pin
DLL for Address, CMD and outputs
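To put the latency lines of that spec into real time, here's a small sketch that applies the two formulas from the list (RL = CL + AL, WL = AL + 1) at the 500MHz maximum clock. The function and its dictionary keys are my own naming, and it only covers the CAS and additive latency values listed above.

```python
def latencies(cas_latency, additive_latency, clock_mhz=500.0):
    """Read/write latency in clocks and nanoseconds, per the listed formulas."""
    if cas_latency not in (5, 6, 7):
        raise ValueError("spec lists CAS latency 5, 6 or 7 only")
    if additive_latency not in (0, 1):
        raise ValueError("spec lists additive latency 0 or 1 only")
    cycle_ns = 1000.0 / clock_mhz
    read_clocks = cas_latency + additive_latency   # RL = CL + AL
    write_clocks = additive_latency + 1            # WL = AL + 1
    return {
        "RL_clocks": read_clocks,
        "WL_clocks": write_clocks,
        "RL_ns": read_clocks * cycle_ns,
        "WL_ns": write_clocks * cycle_ns,
    }


print(latencies(5, 0))
```

At 500MHz each clock lasts 2ns, so the fastest listed setting (CL 5, AL 0) gives a read latency of 5 clocks, or 10ns.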
The figure quoted to us for memory bandwidth was
62GB/sec, which is an impressive figure, but we must remember
this is an effective bandwidth rather than the actual physical
bandwidth. I'm currently waiting for confirmation of the physical
figure. As with DDR, we're looking at a conventional 144-pin
FBGA package, so physically it's not going to look any different.
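In the meantime, a rough peak figure can be worked out from numbers already on this page: the 128-bit interface and the 1Gbps/pin maximum data rate from the Samsung spec. This is my own back-of-the-envelope arithmetic, not a figure confirmed by NVIDIA, and it assumes the Samsung part really is the memory used. The function name is mine.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak physical bandwidth: bits per transfer x transfers per second / 8."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


physical = peak_bandwidth_gb_per_s(128, 1.0)   # 128-bit bus, 1Gbps per pin
effective = 62.0                               # figure quoted to us
print(f"physical peak: {physical}GB/sec")
print(f"implied effective/physical ratio: {effective / physical:.2f}")
```

On those assumptions the physical peak is 16GB/sec, which would mean the quoted effective figure leans on the bandwidth-saving features for a gain of a little under four times.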
The memory architecture is being billed as "3rd
generation Lightspeed Memory Architecture". This name may well
change, as NVIDIA tells us the design is significantly different from
previous incarnations of their Lightspeed Memory Architecture, but no
precise details on these differences were forthcoming during
our conference call or evident in the technical papers. The
GeForceFX is currently the world's only DDR2 production GPU
and its >1GHz data rate is also a world first.
DirectX 9 and GeForceFX
Shader Functions
Pixel shader programmability: The DX8
pixel shader specifications could appropriately be termed configurable
rather than programmable because the DX8 spec lacked the general
programmability associated with a full instruction set and flexible
programming structure. The GeForce FX GPUs and DX9 allow much
longer shader programs, and give developers a greatly expanded
pixel shading instruction set.
Vertex shader programmability: The DX8 vertex
shader specification gave developers very little control over
program flow. DX8 vertex shader programs are executed linearly,
with no early termination for performance optimization. DX9
and the GeForce FX GPUs support conditional branching for greatly
improved program flow control.
Unified pixel shading specifications: The addition
of Pixel Shader 1.4 to the DX8 specification created an alternate
pixel shader structure and programming structure that followed
a completely different implementation philosophy. Many developers
were forced to re-write shader programs for different hardware,
rather than writing strictly to the API. The DX9 specification
eliminates this problem with the new unified Pixel Shader 2.0
definition.
The combination of DX9 and GeForce FX GPUs narrows the gap between
state-of-the-art PC graphics and state-of-the-art rendered
movies such as the Disney/Pixar Monsters, Inc. and Columbia
Pictures Final Fantasy: The Spirits Within productions. The DX9
API is a critical enabling technology for this powerful new
combination of hardware and software.
The DX9 specification includes three major new features:
Pixel Shader 2.0: DX9 exposes true programmability
of the pixel shading engine. This makes procedural shading on
a GPU possible for the first time.
Vertex Shader 2.0: DX9 dramatically enhances the
power of the previous DirectX vertex shader by increasing the
length and flexibility of vertex programs.
High-precision, floating-point color: DX9 breaks
the mathematical precision barrier that has limited PC graphics
in the past. Precision, and therefore visual quality, is increased
with 128-bit floating-point color per pixel.
GeForceFX Pixel Shader 2.0+
As you might have guessed from the "+"
in the name, NVIDIA's implementation of the GeForceFX's pixel
shader goes beyond what's strictly required by DirectX 9. Provided
programmers are prepared to take advantage of what's on offer,
NVIDIA have really opened the floodgates on some potentially
awesome effects. As well as offering an incredible 1024 texture
and 1024 colour instructions and 16 textures per pixel,
NVIDIA have massively increased the limits for shader instructions,
constants and registers (see below). Forget about things like
swizzling and conditional write masks; these are functions directly
linked to the source registers and aren't really something you
need to be concerned with unless you're actually programming
the shader.
Scientific Instructions :
Before DirectX 9 and GeForceFX, pixel shaders
were limited to a very restricted array of functions and were
considered to be configurable rather than truly programmable.
The Pixel Shader 1.4 specification of DirectX 8 was primarily
limited to texture-related operations such as fetching and
blending. Pixel Shader 2.0 introduces several new functions
to the fray, including maths instructions that allow it to evaluate
arbitrary mathematical functions. One example is the alien below, where
the plain blue version on the left can be transformed to a stylized
pink/brown version simply by making the colour and shade a function
of the angle at which the light hits it. With a little imagination
from the programmer, simple maths procedures can be made to create
some fairly impressive effects.
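To give a feel for what "colour as a function of the light angle" means, here's a minimal sketch in Python rather than real shader code. It is not NVIDIA's alien shader: the two colours, the function names and the simple linear ramp are all assumptions of mine, chosen purely for illustration.

```python
import math


def normalize(vector):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in vector))
    return tuple(c / length for c in vector)


def angle_shade(normal, light_dir,
                lit_color=(1.0, 0.6, 0.7),      # hypothetical pink
                dark_color=(0.4, 0.25, 0.15)):  # hypothetical brown
    """Pick a colour from the angle between surface normal and light.

    The cosine of that angle (the dot product of the unit vectors) is
    clamped to [0, 1] and used to blend between the two colours.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(d + (c - d) * n_dot_l for c, d in zip(lit_color, dark_color))


print(angle_shade((0, 0, 1), (0, 0, 1)))   # light hits the surface head-on
print(angle_shade((0, 0, 1), (1, 0, 0)))   # light grazes the surface
```

A surface facing the light comes out fully pink and one side-on to the light comes out fully brown, with everything in between blended. A real pixel shader would run the same sort of maths once for every pixel.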
Flow Control :
Another key element in the Pixel Shader 2.0 armoury
is the introduction of more efficient and thus more productive
flow control handling. For the first time in Direct3D, DirectX
9 introduces functions such as branching and looping to the
pixel shader specifications. This allows code to be routed and
reused like never before and also simplifies the creation of
effects, as they can be grouped into categories and called upon
whenever they're required. This drastically cuts down on development
time, as large quantities of the code can be shared between numerous
effects and can be debugged in a more efficient and generalized
manner. Programmers can now develop a generalized wood-grain
shader, a generalized vegetation shader, a generalized eyeball
shader, etc.
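As a rough idea of why branching and looping make a shader "generalized", here's a Python sketch of a single routine that copes with any number of lights and an optional highlight. Real shaders are written in shader assembly or a high-level shading language, not Python, and the function, its parameters and the highlight exponent are all made up for this example.

```python
def shade_pixel(base_color, lights, use_specular):
    """One routine for many cases, thanks to a loop and two branches.

    lights is a list of (intensity, n_dot_l) pairs, where n_dot_l is the
    cosine of the angle between the surface normal and that light.
    """
    total = 0.0
    for intensity, n_dot_l in lights:      # loop: any number of lights
        if n_dot_l <= 0.0:                 # branch: light is behind the surface
            continue
        total += intensity * n_dot_l
        if use_specular:                   # branch: optional highlight
            total += intensity * n_dot_l ** 16
    return tuple(min(1.0, c * total) for c in base_color)


print(shade_pixel((1.0, 0.5, 0.25), [(0.5, 1.0), (1.0, -0.5)], False))
```

Without flow control you would need a separate shader for each light count and each highlight setting; with it, one piece of code can be written and debugged once.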
Procedural Shading :
The improved flexibility of the Pixel Shader 2.0
arbitrary maths functions greatly enhances what can be achieved.
If we take procedural textures as an example, programmers
can now develop more complex 2D and 3D patterns on demand without
the need for a texture map. If we liken this to a cake, the
old method involved trudging down to the shops, picking up your
cake and trudging home with it again. This is obviously labour
intensive and not particularly efficient. The procedural approach
is that rather than going to the shop for the cake you'd simply
slip a recipe into your oven and the cake would be created based
on the instructions contained in the recipe. This approach saves
on resources all round and totally streamlines the whole procedure.
What Pixel Shader 2.0 does is allow far more lines to be included
in the cake recipe, and thus more elaborate cakes can be conjured
up.
If we look at the image below, the amazing thing
is that no texture maps at all were used in its creation. Every
surface, from the wood-grained table top to the rough surface
finish on the vase, was computed mathematically. Another benefit
of this technique
comes in games where perhaps a gunshot would blast away the
surface of a textured object like a brick wall or marble pillar.
One way of achieving this is by using 3D textures which not
only define the external appearance but also texture the internal
area of the object. This method is generally not very efficient,
as large 3D textures are being shipped over the internal bus
despite the fact they may never actually be seen, depending on
how good a shot you are of course! With procedural textures,
the material exposed after the outer surface has been blasted
away would simply have a new texture created for it
mathematically, a far more efficient way of handling
it.
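Here's the "recipe" idea as a minimal Python sketch: a solid wood-grain pattern that is calculated from a 3D position rather than looked up in a texture map. The formula, the function name and the constants are my own illustrative choices and are not taken from NVIDIA's demo.

```python
import math


def wood_grain(x, y, z, rings_per_unit=8.0):
    """Brightness in [0, 1] of a solid wood pattern at any 3D point.

    Growth rings are concentric around the z axis, with a slight wobble
    along z so the rings are not perfect cylinders.
    """
    radius = math.hypot(x, y) + 0.05 * math.sin(4.0 * z)
    ring_position = (radius * rings_per_unit) % 1.0   # 0..1 within one ring
    return 0.5 + 0.5 * math.cos(2.0 * math.pi * ring_position)


print(wood_grain(0.30, 0.10, 0.00))   # a point on the outer surface
print(wood_grain(0.05, 0.02, 0.40))   # a point deep inside the object
```

Because the pattern is defined for every point in space, a point inside the object gets a sensible value just as easily as one on the surface, and nothing is stored or transferred until it is actually needed.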
GeForceFX Vertex Shader 2.0+
The GeForceFX vertex shader enhancements closely
mirror those of the pixel shader. Vertex Shader 2.0 introduces
true programmability to the DX9 vertex shader specification.
Two key changes, compared to the DX8 specification, are longer
vertex programs and flow control using conditional branching
and subroutine commands. Vertex Shader 2.0 has also extended
the instruction set with new math functions.
With this new flexibility programmers are free
to write more complex routines for sophisticated and realistic
effects with less system and development time overhead.
DX9 vertex programs also simplify the animation of
complex geometries such as water.
While the vertex shader is calculating the deformations
required to animate the water's surface the pixel shader is
busy calculating things like refraction on a per pixel basis.
By accurately showing the way light either shines through or
is reflected off the water's surface the sense of realism is
greatly enhanced.
Matrix Palette Skinning :
To see what the GeForceFX's advanced vertex
shader 2.0+ can do, let's take Matrix Palette Skinning as an example.
DX8 / NV2x / R200 / RV250
- 4 shaders (1 bone, 2 bone, 3 bone, 4 bone)
- Segment model into groups of polys depending on 1, 2, 3 or 4 bones
- Draw separately
DX9 / Radeon 9700
- 1 shader (1-4 bones)
- Branching is per-object
- Still have to segment model into 1-4 bone groups
- Draw separately
GeForce FX
- 1 shader, branching is per-vertex
- No need to segment model
- Loop is done conditionally on a per-vertex level
Because the GeForceFX is able to perform branching
on a per-vertex level rather than per-object, a model skinned with
up to four bones per vertex can be drawn in a single operation, with
no need to draw each of the bone groups independently.
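The per-vertex loop can be sketched in a few lines of Python. This is only an illustration of the idea and not actual vertex shader code: the 3x4 matrix layout, the function names and the tiny two-bone palette are assumptions of mine.

```python
def transform(matrix, point):
    """Apply a 3x4 bone matrix (rotation plus translation) to a 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def skin_vertex(position, bone_indices, bone_weights, palette):
    """Blend one vertex by however many bones (1 to 4) influence it."""
    blended = [0.0, 0.0, 0.0]
    # The loop count is decided per vertex, so vertices with different
    # bone counts can share one shader and one draw call.
    for index, weight in zip(bone_indices, bone_weights):
        moved = transform(palette[index], position)
        for axis in range(3):
            blended[axis] += weight * moved[axis]
    return tuple(blended)


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shift_x = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]   # move 2 units along x
palette = [identity, shift_x]

print(skin_vertex((1.0, 0.0, 0.0), [1], [1.0], palette))           # 1-bone vertex
print(skin_vertex((1.0, 0.0, 0.0), [0, 1], [0.5, 0.5], palette))   # 2-bone vertex
```

Both vertices go through exactly the same routine. On hardware that can only branch per-object, the one-bone and two-bone vertices would have to be split into separate groups and drawn separately.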
In case you're wondering who the good looking
character above is, his name is Ogre and he features in another
of NVIDIA's demos. This "Dancing Ogre" is a real-time
rendition of a movie originally created by Spellcraft Studios
titled "Yeah! the movie". The fidelity of the characters
in the scene (polygon counts, materials, and complexity of motion)
is nearly indistinguishable from that of the original off-line
rendered composition.
Check out Spellcraft
Studios, who provided the original material used to produce this
demo.
You may also remember our friend The Wolfman from
the GeForce4 launch. Despite being a major step forward, the
GeForce4 combined with DirectX 8 still had to render the fur
as individual layers of geometry, using eight passes for each
individual pixel. Thanks to DirectX 9, the GeForceFX is now able
to perform the same operations in a single pass, leading to huge
performance gains with no loss in quality!
Time Machine is one of the demos NVIDIA will be
using to demonstrate the power of their pixel shaders. Here's
NVIDIA's explanation of what we'll see :
Gaze through a portal at the passage of time
as this 1950's era pickup truck shows the ill effects of decades
of neglect. Moving from its pristine condition in 1950 to an
old rust bucket of today, the power of the programmable GeForce
FX pixel engine blends a variety of material surface effects
into a single shader program. The combination of procedural
and high-resolution texture data creates a series of realistic
surface treatments on the paint, chrome, wood, and interior
components of this old workhorse.
Key Feature:
Time-based Shaders - Each of the aging materials
has a single pixel shader associated with it. These shaders
use a variety of texture map inputs (color, bump, specular,
reflection, surface reflectivity, and reveal maps) to produce
a seamless transition of surface material effects over time.
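NVIDIA didn't give us the actual shader code, but the general idea of a time-based shader driven by a reveal map can be sketched as follows. Everything here is an assumption on my part: the function name, the way the reveal value is compared with the age, and the softness of the transition are purely illustrative.

```python
def weathered_color(new_color, old_color, reveal, age, softness=0.1):
    """Blend from the new to the aged material as time passes.

    reveal is this pixel's value from a reveal map, in [0, 1]: pixels
    with low values start to age first. age runs from 0 (pristine)
    to 1 (fully aged).
    """
    blend = (age - reveal) / softness
    blend = max(0.0, min(1.0, blend))    # clamp to the range [0, 1]
    return tuple(n + (o - n) * blend for n, o in zip(new_color, old_color))


paint = (0.8, 0.1, 0.1)   # hypothetical fresh red paint
rust = (0.4, 0.2, 0.1)    # hypothetical rust colour

print(weathered_color(paint, rust, reveal=0.5, age=0.0))   # still pristine
print(weathered_color(paint, rust, reveal=0.5, age=1.0))   # fully rusted
```

Because each pixel has its own reveal value, the rust would creep across the surface gradually as the age increases instead of switching on everywhere at once.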