|
Why Cache Matters
I don't want to turn this into an overly technical tour;
there are plenty of sites that already do a damned
fine job of that kind of thing. But as the only real
difference between Barton and Thoroughbred is the
amount of L2 cache, it's important you know at least
the basic principles.
In the good old days, when an 8MHz CPU was considered
the norm, there were no concerns about keeping the
CPU waiting for data. CPUs in those days generally
took so long to get anything calculated that in some
situations the data was kept waiting by the CPU rather
than the other way around.
One of the prices we've had to pay for ever-increasing
processor speeds is that memory performance, though
many times greater than it was, hasn't kept pace. As
a result there's a massive potential bottleneck as
your lightning-fast CPU waits for data to be fetched
from your relatively lethargic system memory. The
solution to this is cache.
Cache is a type of memory that's significantly faster than
your system memory. It's used to store data that
it's hoped your CPU will need next, so it can continue its
calculations without having to fish around in system
memory for it. The problem with cache memory is that
it's difficult and expensive to implement.
In times gone by, level 2 cache was often slower
and was mounted either on the motherboard, on a separate
daughter card or sometimes on the same PCB as the
core. AMD, by contrast, build their level 2 cache into the
processor core itself. This not only allows them to strictly
control the flow of data, it also means they can operate
it at the same speed as the processor (synchronously).
Actually, cache is a fairly generic term that refers to all
kinds of temporary data storage designed to speed
up or streamline the flow of data. Technically speaking,
your hard disk is a form of cache, and if we ignore
the optical drives it's just about the slowest, most
primitive form of mechanical cache in your PC. In
fact, because it's so slow, it usually has cache of
its own to help its performance reach desirable levels.
If we look at the diagram below we can see how cache
is actually arranged in layers, and the closer we
get to the CPU the faster the cache is. After your
HDD we have your system memory, followed by your level
2 cache, then level 1 cache and finally your CPU registers.
CPU registers are tiny areas of storage that
form part of the CPU itself and are used to store
the data that's being used for the current calculation.
This data is controlled by the compiler. The level
1 cache is the next fastest, followed by the level 2 cache,
both of which AMD run at full speed and have incorporated
right onto the CPU core.

So how does cache speed up operations? Let's think of
your CPU as a construction worker (a builder) who's
building a house, and the cache as his tool bag. To
avoid having to keep popping out to his van he loads
his tool bag with tools based on what he thinks he's
likely to need for the job. If he gets his tool selection
wrong he has to stop what he's doing and nip out to
the van to get the right one. In cache terms this
is called a "miss": the required
data wasn't available when requested, and this clearly
slows down the progress of the job. If, however, the
tool he needs is right there in his tool bag then
this is recorded as a "hit": the data was
available when requested, and this allows him to continue
working almost uninterrupted.
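To put some numbers on hits and misses, here's a minimal sketch of the tool bag in Python. It's only an illustration of the idea: the job list is made up, and the rule for deciding which tool gets thrown out of a full bag (drop the one used least recently) is my own assumption rather than anything specific to AMD's hardware.

```python
from collections import OrderedDict


def count_hits(requests, bag_size):
    """Replay tool requests against a bag that holds at most bag_size tools."""
    bag = OrderedDict()  # keys are tools, ordered oldest-used first
    hits = misses = 0
    for tool in requests:
        if tool in bag:
            hits += 1
            bag.move_to_end(tool)        # this tool is now the most recently used
        else:
            misses += 1                  # a trip out to the van
            bag[tool] = True
            if len(bag) > bag_size:
                bag.popitem(last=False)  # assumed policy: drop the least recently used
    return hits, misses


# Hypothetical job list, purely for illustration.
requests = ["hammer", "saw", "hammer", "drill", "hammer", "saw", "level", "hammer"]
hits, misses = count_hits(requests, bag_size=2)
print(f"hits={hits} misses={misses} hit rate={hits / len(requests):.0%}")
```

Every miss in that loop is a walk to the van, so the hit rate is a rough measure of how much of the day our builder actually spends building.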
So how do we try to keep our builder busy and stop him
having to keep getting tools from the van? Well, there
are two ways to achieve this. Firstly, hire an intelligent
guy who's more likely to guess correctly when he selects
the tools in the first place, or secondly, buy him
a bigger tool bag so he can carry more tools with
him. The former is known as hardware data prefetch,
a mechanism which intelligently anticipates the data (tools)
the processor (builder) will need based on
the instruction stream being executed (the job that needs
doing). The second solution is where Barton comes
in. AMD have now doubled the size of our builder's
tool bag, or should I say they've doubled the amount
of level 2 cache available to the CPU, from a previous
256 KB to a now rather generous 512 KB, which combined
with the 128 KB of level 1 cache makes a full 640 KB of
on-die, full-speed cache and one very efficient builder!
In this scenario we could perhaps think of level 1
cache as being our man's tool belt, so the very commonly
used tools would be here in the belt. So you see how
we have several "layers" of options available
to try to make the job more efficient. If the required
tool isn't on our builder's belt then he'll have to
stoop and check his tool bag, and if it's not there
the only option left is to pop out to the van to get
it. So it is with cache. If the data isn't in L1,
the CPU checks L2. If it's not there either then system memory
must be accessed. If you're wondering where the hard
drive fits into our building example then that, I'm
afraid, is a trip to the local builder's merchants!
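The belt, bag and van routine can be sketched in a few lines of Python. Treat it as a rough model under stated assumptions, not a description of how the real silicon works: the class and function names are hypothetical, the job list is invented, each level throws out its least recently used tool when full, and a tool fetched from further away is simply copied into the nearer levels.

```python
from collections import OrderedDict


class Level:
    """One layer of storage (belt or bag) holding a limited number of tools."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.items = OrderedDict()  # ordered oldest-used first

    def lookup(self, tool):
        if tool in self.items:
            self.items.move_to_end(tool)  # mark as most recently used
            return True
        return False

    def store(self, tool):
        self.items[tool] = True
        self.items.move_to_end(tool)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # assumed policy: evict least recently used


def serve(requests, belt_size, bag_size):
    """Count where each request is satisfied: belt (L1), bag (L2) or van (memory)."""
    belt = Level("belt (L1)", belt_size)
    bag = Level("bag (L2)", bag_size)
    served = {belt.name: 0, bag.name: 0, "van (memory)": 0}
    for tool in requests:
        if belt.lookup(tool):
            served[belt.name] += 1
        elif bag.lookup(tool):
            served[bag.name] += 1
            belt.store(tool)  # keep it on the belt for next time
        else:
            served["van (memory)"] += 1
            bag.store(tool)   # simplification: copy into both levels
            belt.store(tool)
    return served


# Hypothetical job list, purely for illustration.
job = ["hammer", "hammer", "saw", "drill", "hammer",
       "saw", "level", "drill", "hammer", "hammer"]
print("small bag:", serve(job, belt_size=1, bag_size=2))
print("double bag:", serve(job, belt_size=1, bag_size=4))
```

With this particular made-up job list, doubling the bag from two tools to four cuts the trips to the van from eight to four. Real workloads won't divide up so neatly, but the principle is the same one Barton is banking on.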
An interesting fact, and one worth pondering, is that a
512 KB level 2 cache, caching 64 MB of utilized main
system memory, can supply 90% to 95% of the data requested
by the CPU directly!
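To get a feel for what a 90% to 95% hit rate is worth, here's a back-of-an-envelope calculation in Python. The two latency figures are placeholders I've picked purely for illustration, not measurements of any real cache or memory, and the model is deliberately crude: a hit costs the cache time and a miss costs the memory time.

```python
def average_access_ns(hit_rate, cache_ns, memory_ns):
    """Weighted average access time: hits pay cache_ns, misses pay memory_ns."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


# Assumed, illustrative latencies only; not measured figures.
CACHE_NS = 10.0
MEMORY_NS = 100.0

for hit_rate in (0.0, 0.90, 0.95):
    avg = average_access_ns(hit_rate, CACHE_NS, MEMORY_NS)
    print(f"hit rate {hit_rate:.0%}: about {avg:.1f} ns per access")
```

Under those assumed figures the average access drops from 100 ns with no cache to around 19 ns at a 90% hit rate and 14.5 ns at 95%, which is why even a few extra percentage points of hits are worth chasing.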
Let's take a look at how this impacts on performance.
|