i845 - P4 For The Masses?
Author : Wayne Date : 18th October 2001

3DVelocity would like to thank Intel Corp and especially Mathias Raeck and Graham Palmer for their help and courtesy in providing this motherboard for review.

 

Architecture details -

As this isn't strictly a Pentium 4 review, but rather a look at the i845 chipset, I'm not going to bog you down too much in the intricacies of the Pentium 4's architecture, but to understand the importance of this chipset we need to understand some basics.

Features: Benefits
Processor Core Speeds Up to 2 GHz: Maximum performance for a wide range of emerging Internet, PC and workstation applications
Intel® NetBurst™ Microarchitecture, including:
400-MHz System Bus: High bandwidth between the processor and the rest of the system improves throughput and performance
256-KB L2 Advanced Transfer Cache: Enhances performance by providing fast access to heavily used data and instructions
Hyper-Pipelined Technology: Extended pipeline stages significantly increase overall throughput
Streaming SIMD Extensions 2: 144 new instructions accelerate operation across a broad range of demanding applications
Rapid Execution Engine: Arithmetic Logic Units run at twice the core frequency, speeding execution in this performance critical area
128-Bit Floating Point Port: Floating Point performance boost provides enhanced 3D visualization and scientific calculation
SIMD 128-bit Integer: Accelerates video, speech, encryption and imaging/photo processing
Execution Trace Cache: Greatly improves instruction cache efficiency, maximizing performance on frequently used sections of software code
Advanced Dynamic Execution: Improved branch prediction enhances performance for all 32-bit applications by optimizing instruction sequences

Intel christened its new architecture for the P4 "NetBurst", and though I've not read an official reason for this name being chosen, it certainly has nothing to do with the Internet. At least, it doesn't seem to be optimised for the Internet other than for video streaming. It was suggested that the name was chosen to reflect the way the 'Net was seen by many as new and trendy, but I don't think any of us see the 'Net as new and trendy in a particularly high tech way. Perhaps it refers to the P4's ability to transfer or "burst" data at speed through its own micro network. Either way, the name isn't really what matters, it's the technology behind it we want to know about.

Bandwidth -

The Pentium 4 works on a "quad pumped" system bus. That is, although the bus is clocked at 100MHz, it transfers data four times per clock cycle, giving an effective rate of 400MHz. The upshot of this is that the P4 has a full 3.2GB/Second of bus bandwidth, totally eclipsing the Athlon's maximum of 2.1GB/Second. This is only half the story however, as it's pointless having a bus that can move 3.2GB/S if your memory starts choking at less than that figure. Controversial though it has been, Intel's decision to launch the P4 on the i850 chipset with dual channel Rambus was actually the only way to go, as only Rambus (RDRAM) has anywhere near enough bandwidth to go the distance. In fact, rated at 3.2GB/S, the dual channel Rambus configuration is the perfect match for the P4's appetite. What turned people against the idea of RDRAM, prior to its very public legal battles, was that it needed to be fitted in pairs, and in identically matched pairs at that. Given the price of RDRAM when the P4 launched in November 2000, it was never likely to be easy for Intel to sell the idea of Rambus to a value conscious market.
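For anyone who wants to check the sums, the bandwidth figures above can be roughly reproduced with a few lines of Python. This is only a sketch: the 64-bit bus widths, the 133MHz double pumped Athlon bus and the 16-bit PC800 RDRAM channel are assumptions brought in from outside this article, and the function name is made up for illustration.

```python
# Peak (theoretical) bandwidth = clock rate x transfers per clock x bytes per transfer.
# ASSUMPTIONS (not stated in the article): both CPU buses are 64 bits wide,
# the Athlon bus is 133MHz double pumped, and each PC800 RDRAM channel is
# 16 bits wide running at 400MHz double pumped.

def peak_bandwidth_gb_s(clock_mhz, transfers_per_clock, bus_width_bits, channels=1):
    """Return peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer
    return bytes_per_second * channels / 1e9

p4_bus = peak_bandwidth_gb_s(100, 4, 64)                   # quad pumped 100MHz bus
athlon_bus = peak_bandwidth_gb_s(133.33, 2, 64)            # double pumped 133MHz bus
dual_rdram = peak_bandwidth_gb_s(400, 2, 16, channels=2)   # two RDRAM channels

print(f"P4 system bus:      {p4_bus:.1f} GB/s")
print(f"Athlon system bus:  {athlon_bus:.1f} GB/s")
print(f"Dual channel RDRAM: {dual_rdram:.1f} GB/s")
```

These are peak numbers only. Sustained throughput in real applications will be lower.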

L3 Cache -

It's claimed Intel had originally planned to strap an L3 cache, measured in megabytes, to the P4, but clearly that never happened, presumably because it would have meant a move back to a cartridge design, not to mention the cost.

L2 Cache -

Sticking with its PIII naming convention, the P4's L2 cache, or "Advanced Transfer Cache", remains at 256k. However, additional enhancements were introduced to help power through the data, including 128 byte cache lines and a 256 bit data bus to the core. The bandwidth offered is awesome: 48GB/Second for the 1.5GHz core.
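The 48GB/Second figure drops straight out of the bus width and the core clock. Here is a minimal sketch of the sum, assuming the 256 bit bus moves one full-width transfer every core clock (the article doesn't spell that out, and the function name is just for illustration).

```python
# L2 bandwidth = bus width in bytes x core clock.
# ASSUMPTION: one full-width transfer per core clock cycle.

def l2_bandwidth_gb_s(core_clock_ghz, bus_width_bits=256):
    """Return peak L2-to-core bandwidth in GB/s for a given core clock."""
    bytes_per_transfer = bus_width_bits / 8   # 256 bits = 32 bytes
    return bytes_per_transfer * core_clock_ghz

for clock_ghz in (1.5, 2.0):
    print(f"{clock_ghz}GHz core: {l2_bandwidth_gb_s(clock_ghz):.0f} GB/s")
```

Because the bus is tied to the core clock, the figure scales with every speed bump.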

L1 Cache -

I don't think it's any great secret that the original plans for the P4 had to be "stripped down" in order to meet its die size limitations. Several ambitious features were either dropped or rethought, and one of those was the L1 cache. Originally planned to comprise a 16k data cache and a 12000 instruction execution trace cache (instruction cache), the final silicon had a mere 8k data cache, though so far as I'm aware the 12000 instruction execution trace cache remained intact. Why the fancy name? Well, in a nutshell the execution trace cache is very closely tied in with the core's decoders and handles only micro-ops. These are chunks of code that, unlike x86 instructions, need no prior decoding. With a latency of only 2 clocks, an excellent branch prediction unit and a clever compression algorithm in place, this seemingly tiny amount of cache does a fine job of keeping the rather long pipeline fed.

Hyper Pipeline -

To enable it to push processor speeds beyond the 1GHz ceiling encountered using its older P6 architecture, Intel raised the pipeline stages from 10 in the PIII to 20 in the P4. Occasionally however, data in that pipeline will need to be flushed and the more stages there are, the more data gets flushed (up to 126 instructions in fact) and the longer it takes to refill. Use of the execution trace cache aims to keep these occurrences to a minimum, but when they do occur the delays can be significant, in processor terms at least.
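To get a feel for why a longer pipeline makes a flush hurt more, here is a rough back-of-envelope model. It is a sketch only: the branch frequency and misprediction rate are made-up illustrative numbers, not measurements, and it assumes the flush penalty in clocks simply equals the number of pipeline stages, which real processors don't follow exactly.

```python
# Simple textbook-style model of average clocks per instruction (CPI):
#   CPI = base CPI + (fraction of branches x misprediction rate x flush penalty)
# ASSUMPTIONS: the workload numbers below are hypothetical, and the flush
# penalty is taken to be equal to the pipeline depth in stages.

def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Return average clocks per instruction including misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

BRANCH_FRACTION = 0.20   # hypothetical: one instruction in five is a branch
MISPREDICT_RATE = 0.05   # hypothetical: 5% of branches are guessed wrong

for name, stages in (("PIII", 10), ("P4", 20)):
    cpi = cycles_per_instruction(1.0, BRANCH_FRACTION, MISPREDICT_RATE, stages)
    print(f"{name} ({stages} stages): {cpi:.2f} clocks per instruction")
```

With the same prediction accuracy, doubling the stages doubles the time lost to flushes, which is why the P4 leans so heavily on good branch prediction and the trace cache.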

SSE2 -

SSE2 adds 144 new instructions to the original SSE set. Some will claim that SSE2 is nothing more than SSE should have originally been, but regardless of that its power and flexibility are now huge. This is probably just as well because one of the other parts of the P4 architecture to hit the operating theatre floor when it underwent its fat reduction operation was one of the two floating point units. This is why we often see such average FPU performance when running code that can't compensate for this deficit by using SSE2.

The Rapid Execution Engine -

At the heart of the rapid execution engine lie two double pumped ALUs and two double pumped AGUs. These operate at twice the core frequency but are only able to cope with simple micro-ops. More complex instructions need to be channeled through the single slow ALU, and this actually accounts for the vast majority of data handled.
