
Nehalem-EP workstation Part 3: The First Benchmarks

Archivebot


In the first part of this review, we looked at the hardware, and we continued with the BIOS and Vista boot in the second part. We open the third part with a picture of how Vista welcomes you once you have this rig:

We cannot wait for Microsoft to finally release a Windows 7 Release Candidate. While Vista is working like a charm on this setup [Nova, should we state only on this setup – Ed.], it is no secret that Windows 7, with its built-in file system optimizations for SSDs, would make this setup fly.

In the meantime, pictures like this are the reason why you should leave the Welcome screen on. No matter how your day is going, just looking at this overview makes me smile. An ideal graphics card for this system would be an nVidia Quadro FX 5800 or the upcoming FirePro 9000 series dual-GPU card.

Now, here are the first sets of benchmarks I ran, just in time for the official launch. Cutting to the chase: the first round of benchmarks, focusing mostly on SiSoft Sandra 2009 SP3 beta, CineBench R10 and Linpack 10, used default system settings with SMT (aka Hyper-Threading) enabled. Turbo and NUMA were enabled in all tests. We did encounter various issues, which we will address in this article, but all of them can be resolved by disabling Hyper-Threading.

Synthetic (does not) lie: 8-core Nehalem beats out 16-core AMD system

Oh my… are we seeing 138 GFLOPS of double-precision performance from a two-socket system? nVidia now needs a new GPU to beat the raw numbers achieved by Nehalem-EP, while the ATI Radeon 4870 is still safe with its 250 GFLOPS. This is an ego boost for Intel engineers par excellence.

SiSoft Sandra is useful for its highly graphical results display in multiple modes (we use two of them here) and its convenient built-in competitive comparisons, which are usually up to date. In this round, we covered Sandra's CPU tests, Multimedia, Memory Bandwidth and Memory Latency (both the random and linear latency benchmarks).

Memory Latency test – Hyper-Threading enabled and 79ns latency

Memory Latency test – Hyper-Threading enabled and 79ns latency

Memory Latency test – Hyper-Threading disabled and 78ns latency

With HT disabled, memory latency drops to an impressive 78ns. Bear in mind that this is a combined 384-bit memory controller (two sockets × three 64-bit DDR3 channels), the same width as the GeForce 8800 GTX's memory interface.
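As a rough sanity check, the sketch below derives the combined interface widths from channel counts and converts the measured latencies into core clock cycles. It assumes three 64-bit channels per Nehalem-EP socket, two 64-bit channels per Shanghai Opteron socket, and the W5580's 3.2 GHz base clock (Turbo would raise the cycle count slightly); the helper names are ours, purely for illustration.

```python
CHANNEL_WIDTH_BITS = 64   # width of one DDR2/DDR3 memory channel
BASE_CLOCK_GHZ = 3.2      # Xeon W5580 base clock; Turbo is not modelled


def combined_width_bits(sockets: int, channels_per_socket: int, channel_bits: int) -> int:
    """Total memory interface width across all sockets, in bits."""
    return sockets * channels_per_socket * channel_bits


def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds into core clock cycles (ns * cycles per ns)."""
    return latency_ns * clock_ghz


# Dual Nehalem-EP: 2 sockets x 3 channels; quad Shanghai: 4 sockets x 2 channels (assumed).
intel_width = combined_width_bits(2, 3, CHANNEL_WIDTH_BITS)
amd_width = combined_width_bits(4, 2, CHANNEL_WIDTH_BITS)
print(f"Dual W5580: {intel_width}-bit, quad Opteron: {amd_width}-bit")

for label, ns in (("HT enabled", 79.0), ("HT disabled", 78.0)):
    cycles = latency_in_cycles(ns, BASE_CLOCK_GHZ)
    print(f"{label}: {ns:.0f} ns is roughly {cycles:.0f} core cycles")
```

In other words, one nanosecond at this clock is worth a bit over three core cycles, so the HT on/off difference amounts to only a handful of cycles out of roughly 250.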

Note: we ran the random latency test several times, first on CPU0 and then on CPU1, where it shaves off a nanosecond. Interesting… we will thoroughly check various benchmarks and work closely with the developers to see whether these issues are software- or hardware-based. Also note that our unit seems just a tiny bit slower than the “other” dual W5580 system that SiSoft seemingly tested. It could be a combination of a slightly slower base clock, memory speed and some other BIOS or hardware factors that we’ll investigate in due course.

Memory bandwidth… theory states 51.02 GB/s with DDR3-1066 memory, while reality delivers a still impressive 36.86 GB/s

Memory results jump to Warp 7 compared with the weak scores of Nehalem’s predecessor.
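The theoretical figure above follows from transfer rate × 8 bytes per 64-bit channel × number of channels. A minimal sketch, assuming six DDR3 channels in total (two sockets × three) and decimal gigabytes; note that 51.02 GB/s works out to an effective rate of about 1063 MT/s, slightly under the nominal 1066.67 MT/s, which would be consistent with the slightly slower base clock mentioned earlier.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(mt_per_s: float, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a given transfer rate in MT/s."""
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


channels = 2 * 3        # two sockets x three channels (assumed layout)
nominal = peak_bandwidth_gbs(1066.67, channels)
quoted_peak = 51.02     # theoretical figure reported by Sandra
measured = 36.86        # bandwidth Sandra actually measured

# Work backwards from the quoted peak to the transfer rate it implies.
implied_mts = quoted_peak * 1e9 / (BYTES_PER_TRANSFER * channels) / 1e6

print(f"Nominal DDR3-1066 peak: {nominal:.2f} GB/s")
print(f"Transfer rate implied by {quoted_peak} GB/s: {implied_mts:.0f} MT/s")
print(f"Measured efficiency: {measured / quoted_peak:.1%}")
```

By this arithmetic, the measured 36.86 GB/s is a little over 72% of the theoretical figure.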

Sandra’s MultiMedia score shows that AMD still has some life left – 16 AMD cores yield almost 425 MPixel/s

AMD’s pricey 16-core monster answers Intel’s threat: with its combined 512-bit memory controller, it posts a fantastic 423 MPix/s.

Either way, you can see that in most cases the W5580 blows away the competition, with Intel’s own dual Harpertown Xeon X5492 being the closest match. In a few situations, the HT-enabled dual W5580 beats the quad-socket Shanghai Opterons from AMD, as you can see. And even with all twelve DIMM slots populated with dual-rank registered modules, neither memory bandwidth nor latency suffers, which makes a good base for very fast streaming data processing.

Linpack meets Hyper-Threading…

16 logical CPUs, 8 threads, HT enabled, and we get 56.99 GFLOPS

Linpack forgets Hyper-Threading

8 CPUs, 8 threads, HT disabled, and we get 85.25 GFLOPS. Not a small difference…
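To put the two Linpack runs in context, here is a back-of-the-envelope peak estimate. It assumes each Nehalem core retires 4 double-precision FLOPs per cycle (one 2-wide SSE add plus one 2-wide SSE multiply) and uses the W5580's 3.2 GHz base clock, ignoring Turbo; these are our assumptions, not figures taken from the screenshots.

```python
SOCKETS = 2
CORES_PER_SOCKET = 4
BASE_CLOCK_GHZ = 3.2     # W5580 base clock; Turbo ignored
DP_FLOPS_PER_CYCLE = 4   # assumed: one 2-wide SSE add + one 2-wide SSE multiply per cycle


def peak_gflops(sockets: int, cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical double-precision peak in GFLOPS."""
    return sockets * cores * clock_ghz * flops_per_cycle


peak = peak_gflops(SOCKETS, CORES_PER_SOCKET, BASE_CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
runs = {"HT enabled": 56.99, "HT disabled": 85.25}  # 8-thread Linpack results above

print(f"Theoretical peak: {peak:.1f} GFLOPS")
for label, gflops in runs.items():
    print(f"{label}: {gflops:.2f} GFLOPS, {gflops / peak:.0%} of peak")
print(f"HT off vs. HT on: {runs['HT disabled'] / runs['HT enabled']:.2f}x")
```

Under these assumptions the HT-disabled run lands at roughly 83% of peak, a healthy figure for Linpack, while the HT-enabled run reaches only about 56%.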

If you want the best, there you go

Best shot with a large data array – more improvements to come using DDR3-1333 or DDR3-1600 RAM

An interesting Linpack run with SMT enabled: I manually set the total thread count to eight, as we wanted to see how Windows would schedule the threads. As suspected, it seems to have piled them all up on one CPU instead of following the “physical cores first, then logical CPUs once all physical cores are occupied” approach. The result was, well… not exactly encouraging. So please, no SMT/HT for Linpack-like heavy math workloads.
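The “physical cores first” policy described above can be sketched as a simple placement function. The topology numbering is an assumption for illustration: the two Hyper-Threading siblings of each core are taken to be adjacent logical CPUs (2k and 2k+1) and socket 0’s cores come first, which should be verified on a given system. The function names are hypothetical, and this is not how the Windows scheduler is actually implemented.

```python
from typing import List


def logical_cpu(socket: int, core: int, smt: int, cores_per_socket: int, smt_per_core: int) -> int:
    """Map (socket, core, SMT sibling) to a logical CPU number under the assumed layout."""
    return (socket * cores_per_socket + core) * smt_per_core + smt


def physical_first(n_threads: int, sockets: int, cores_per_socket: int, smt_per_core: int) -> List[int]:
    """Give each thread its own physical core (alternating sockets) before using HT siblings."""
    order = []
    for smt in range(smt_per_core):
        for core in range(cores_per_socket):
            for socket in range(sockets):
                order.append(logical_cpu(socket, core, smt, cores_per_socket, smt_per_core))
    return order[:n_threads]


def packed(n_threads: int) -> List[int]:
    """Naive placement: fill logical CPUs 0, 1, 2, ... in order, doubling up on cores."""
    return list(range(n_threads))


def distinct_cores(cpus: List[int], smt_per_core: int) -> int:
    """Count how many physical cores a set of logical CPUs actually covers."""
    return len({cpu // smt_per_core for cpu in cpus})


spread = physical_first(8, sockets=2, cores_per_socket=4, smt_per_core=2)
naive = packed(8)
print("physical first:", spread, "->", distinct_cores(spread, 2), "physical cores")
print("packed:        ", naive, "->", distinct_cores(naive, 2), "physical cores")
```

Under the assumed numbering, the packed placement squeezes all eight threads onto the four cores of socket 0, sharing each core between two threads, while the physical-first placement gives every thread a core of its own across both sockets.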

Original Author: Nebojsa Novakovic

