AMD Athlon 64 : AMD64 technology - Under the skin
The architecture of the Athlon 64 is 'special' thanks to its built-in memory controller. Usually the memory controller sits in a separate chip on the motherboard, called the Northbridge. Making it part of the CPU increases performance by reducing latency and allowing higher-bandwidth connections. Instead of data having to travel from the RAM to the Northbridge and then on to the processor, putting the memory controller on the CPU means the second leg of that journey is almost non-existent. It also means that motherboards made for the Athlon 64 will not feature a traditional Northbridge chip.


Another advantage is bandwidth. Athlon 64s are set to support dual channel DDR 400 but, looking to the future, it is physically easier to implement faster buses when the connections are short. For overclockers wanting greater memory bandwidth, overclocking the processor also overclocks the memory controller which, AMD say, results in better memory bandwidth.
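To put a rough number on "dual channel DDR 400", here is a back-of-the-envelope sketch. The figures it uses are assumptions that are not stated in this article: DDR 400 performing 400 million transfers per second, and each channel being 64 bits (8 bytes) wide. The function name is purely illustrative.

```python
# Assumed figures (not taken from the article):
# DDR 400 performs 400 million transfers per second per channel,
# and each channel moves 8 bytes (64 bits) per transfer.
TRANSFERS_PER_SECOND = 400_000_000
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gb_s(channels: int) -> float:
    """Theoretical peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    return channels * TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER / 1e9


single_channel = peak_bandwidth_gb_s(1)
dual_channel = peak_bandwidth_gb_s(2)

print(f"Single channel: {single_channel:.1f} GB/s")
print(f"Dual channel:   {dual_channel:.1f} GB/s")
```

These are theoretical peaks only; real-world throughput will be lower.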

Depending on the depth at which you wish to look at AMD64 architecture, it has many extensions over previous AMD architectures. Essentially it's an Athlon XP, but AMD have added more pipeline stages and a new memory management unit. A couple of months ago we described what pipelining means, but to save you the finger time, we've included the gist.

The aim of pipelining is to reduce the total time taken by the processor by working on several tasks in parallel. The irony is that the length of each individual task doesn't shorten, so if a single instruction takes 5ms on a non-pipelined processor, it will still take 5ms on a processor whose pipeline has 2, 5 or 50 million stages. So how does it speed things up?

It's probably best explained by a real world analogy. A popular one that is used in many courses is doing laundry (probably because lecturers know that students love doing it). Let's say you are lucky enough to have a washing machine and a dryer. Your tasks are :
1) Put the dirties into the washing machine.
2) When the washing machine has finished cleaning your clothes you load them into the dryer.
3) When the dryer is finished shrinking your clothes into a crisp, you put them in a pile (ready to put in storage).
4) Get someone to put your clothes into storage.
The non-pipelined way of doing that (for this example we assume there is more than one load of laundry, and that all stages take a fixed amount of time, X) would be to wait until you finish stage 4, then go back to stage 1. The pipelined way to do this would be to load up the washing machine with dirty clothes as soon as you take the cleaned load out. So whilst the dryer is doing its thing, the washing machine is cleaning your second load of clothes. In this little example pipelined laundry is up to 4 times faster than non-pipelined laundry.
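The laundry arithmetic can be sketched in a few lines. This is only an illustration of the analogy above, using the same assumption that every stage takes the same time X; the function names are made up for the example.

```python
def non_pipelined_time(loads: int, stages: int, stage_time: float) -> float:
    """Each load goes through every stage before the next load starts."""
    return loads * stages * stage_time


def pipelined_time(loads: int, stages: int, stage_time: float) -> float:
    """The first load takes `stages` steps; each later load finishes one step after."""
    if loads == 0:
        return 0.0
    return (stages + loads - 1) * stage_time


def speedup(loads: int, stages: int = 4, stage_time: float = 1.0) -> float:
    """How many times faster the pipelined approach is for a given number of loads."""
    return non_pipelined_time(loads, stages, stage_time) / pipelined_time(
        loads, stages, stage_time
    )


for loads in (1, 4, 100):
    print(f"{loads:>3} loads: {speedup(loads):.2f}x faster")
```

With a single load there is no gain at all, and the speedup only approaches 4 as the number of loads grows, which is why the figure is "up to" 4 times.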

The essence of pipelining

So coming back to AMD64 architecture: whereas the Athlon XP architecture had a 10-stage integer pipeline, AMD64 has 12. For floating point it has 17 stages. Broadly speaking, the greater the number of stages, the less work each stage has to do and the higher the clock speed the design can reach, which is what makes it "faster".

AMD64 extends the x86 architecture with a 'long mode active' (LMA) flag. It is a single bit, where 0 signifies "legacy mode" and 1 signifies "long mode". When in legacy mode, the processor runs like any standard 32-bit processor currently on the market. Operating systems such as Windows XP and older games like Half Life run without a problem. You can think of it as a top-of-the-range Athlon XP processor.

Upon LMA activation (the bit being set to 1) we have the first of two criteria met for 64-bit operation. Whether LMA is set is determined by whether the operating system is 64-bit compatible. At present only Linux 2.4 or above supports the AMD64 architecture, so should you run Windows XP, which you can, LMA would be set to 0 and you would have a 32-bit processor.

Within long mode there are two sub-modes, which allow for compatibility with 16 and 32-bit applications. Compatibility mode runs when the application is not 64-bit compatible, but you are running a 64-bit operating system. So for example, if you were to run a 64-bit version of Windows and play Quake 3 Arena, then you would be in 32-bit compatibility mode.

Compatibility mode isn't the fully featured 32-bit processor that we see in legacy mode. x86 code is only supported in "protected mode", with no support for "real mode" or virtual 8086 (VM86) mode.

Protected mode first appeared with Intel's 80286 processor, allowing programmers to have greater control over their code and how the CPU processed it. However, protected mode was not backwards compatible with existing DOS programs, and the first mainstream OS to use it, Microsoft Windows/286, helped it to gain a footing. Intel still realized something had to be done, as older DOS programs weren't going to disappear overnight and protected mode had to coexist with these applications to be successful. Virtual 8086 mode, which arrived with the later 80386, was created to solve that problem. You may think that because you don't run DOS programs on your PC it shouldn't matter whether this archaic technology is implemented or not, but drivers and BIOSes still utilize this technology.

Real mode and virtual 8086 mode are supported only when the Athlon 64 is running in legacy mode. This could mean that if you have old add-in cards whose drivers still utilize virtual 8086 mode, you may have to run a dual boot system in order to use them. BIOSes of motherboards won't be a problem, but older devices such as SCSI cards may require flashing.

The second sub-mode is 64-bit mode, and this only occurs when both the operating system and the application support 64-bit operation. Surprisingly, the default operand size is 32-bit in this mode; however, developers can change this to 16 or 64-bit as per their requirements. One further mode, allowing 64-bit processing with 64-bit operand sizes by default, is at present reserved.

The inclusion of two master modes, long and legacy, ensures that present day operating systems can run on the Athlon 64, while the sub-modes within long mode allow the user to run 32-bit applications even when a 64-bit operating system is installed. All of these modes need to be present in order to provide compatibility with current 32-bit operating systems and software.
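The mode selection described above boils down to two yes/no questions. The sketch below is a simplified model of that decision as the article describes it, not of the real hardware mechanism; the function and parameter names are hypothetical.

```python
def operating_mode(os_is_64bit: bool, app_is_64bit: bool) -> str:
    """Return the mode the processor would be in, per the article's description."""
    # The LMA bit follows the operating system: 1 for a 64-bit OS, 0 otherwise.
    lma = 1 if os_is_64bit else 0

    if lma == 0:
        # A 32-bit OS keeps the processor in legacy mode, so the application
        # type does not change the outcome here.
        return "legacy mode"
    if app_is_64bit:
        return "long mode (64-bit mode)"
    # 64-bit OS running a 16 or 32-bit application.
    return "long mode (compatibility mode)"


print(operating_mode(os_is_64bit=False, app_is_64bit=False))
print(operating_mode(os_is_64bit=True, app_is_64bit=False))
print(operating_mode(os_is_64bit=True, app_is_64bit=True))
```

Running Windows XP corresponds to the first call, and the Quake 3 Arena example on a 64-bit operating system corresponds to the second.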

Features of 64-bit mode

  • Support for up to 64-bit memory addressing.
  • Register extensions.
  • Eight new general purpose registers.
  • Eight new 128-bit SSE registers.
  • 64-bit instruction pointer.

When running in 64-bit mode, the Athlon 64 enables register extensions, which give the programmer increased storage space on the actual processor. Registers are what's known as primary data stores. Located physically on the processor, they provide the fastest data store present in the computer. In 32-bit processors each register is 32 bits wide, and when the Athlon 64 runs in legacy or compatibility mode they remain at this size. In 64-bit mode these registers widen to 64 bits, allowing the storage of larger data such as 64-bit memory addresses. In addition, eight new general purpose registers double the total available to 16, and all will be 64 bits wide.


The addressing logic is similar to that seen on 32-bit processors.
  • 16 bits are addressed as AX.
  • 32 bits are addressed as EAX.
  • 64 bits are addressed as RAX.

All sizes start from bit 0 (i.e. 16 bits runs from bit 0 to bit 15). If a 32-bit value is put into a 64-bit register, it will be zero-extended to fit the 64-bit register.
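The naming scheme above can be modelled with a little bit masking. This is a toy model in Python rather than real assembly, and the class and method names are invented for the example; it shows only the two behaviours described above, namely that every view starts at bit 0 and that a 32-bit write is zero-extended.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


class Register64:
    """Toy model of one 64-bit general purpose register (hypothetical class)."""

    def __init__(self, value: int = 0) -> None:
        self.rax = value & MASK64  # the full 64-bit register

    @property
    def eax(self) -> int:
        """The low 32 bits (bit 0 to bit 31)."""
        return self.rax & MASK32

    @property
    def ax(self) -> int:
        """The low 16 bits (bit 0 to bit 15)."""
        return self.rax & MASK16

    def write_eax(self, value: int) -> None:
        """Store a 32-bit value; the upper 32 bits become zero."""
        self.rax = value & MASK32


reg = Register64(0x1122_3344_5566_7788)
print(hex(reg.rax), hex(reg.eax), hex(reg.ax))

reg.write_eax(0xDEAD_BEEF)
print(f"{reg.rax:#018x}")  # upper half is now all zeros
```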

So what do more registers mean? For developers they allow more information to be stored closer to the processor, and for users this should mean programs run faster and are more resource efficient.

The instruction pointer is a register which holds the memory address of the next instruction to be executed. Think of it as a "what's going to happen next" store. When running in 64-bit mode, it also becomes 64 bits wide to support the larger memory addresses.

The biggest advantage of having 64-bit CPUs is the ability to address more than 4GB of RAM. For large servers this is extremely important, as swapping data to and from hard drives is painfully slow. The Athlon 64 will initially support 40-bit physical addresses and 48-bit virtual memory addresses. The difference between the two is that the memory management unit within the processor gets handed the virtual address and translates that into a physical address in memory. If you feel miffed by not having "full" 64-bit virtual addressing, then the fact that this implementation still covers 1 terabyte of physical memory and 256 terabytes of virtual memory is a sobering thought. AMD say that it restricted itself to this amount in order to keep the number of pins down and its pricing structure in place.
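For a sense of scale, the address widths mentioned above can be turned into sizes with simple powers of two. This is plain arithmetic rather than anything specific to AMD's implementation, and it assumes binary units (1 GB as 2**30 bytes, 1 TB as 2**40 bytes).

```python
GIGABYTE = 2**30  # binary units assumed throughout
TERABYTE = 2**40


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 2**address_bits


print("32-bit:", addressable_bytes(32) // GIGABYTE, "GB")
print("40-bit physical:", addressable_bytes(40) // TERABYTE, "TB")
print("48-bit virtual:", addressable_bytes(48) // TERABYTE, "TB")
print("full 64-bit:", addressable_bytes(64) // TERABYTE, "TB")
```

The last line shows how far a "full" 64-bit address would reach, which helps explain why 40 and 48 bits were considered plenty to start with.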

To summarize AMD64 and the way it handles 32 and 64-bit operation, we can split it up into three modes. The two master modes, legacy and long, are selected by the operating system that is running. A 32-bit operating system will result in the processor running in legacy mode, in which case it can be looked upon as a very fast Athlon XP processor. When running a 64-bit operating system, the Athlon 64 will be operating in long mode. In order to maintain compatibility with 32-bit applications, long mode has two sub-modes, compatibility and 64-bit. Compatibility mode jumps into action when you run any 16 or 32-bit application. 64-bit mode only occurs when you run applications that are 64-bit. When this mode is active, the processor has register extensions enabled.