SGI’s Record for Memory Bandwidth

Silicon Graphics announced that an SGI Altix 4700 system has achieved a sustained memory bandwidth of 4.35 terabytes (TB) per second on the STREAM Triad benchmark.

The system is powered by 1,024 Intel Itanium 2 processors and runs a single copy of SUSE Linux Enterprise Server 10 from Novell with SGI ProPack 5 for Linux. The configuration, which includes 4TB of system memory, is the largest single system image (SSI) attainable under Linux.

The world record was posted last week on the STREAM Triad Top 20 page. The results were achieved and validated June 1 on an SGI Altix 4700 system now installed at the Leibniz Computing Centre Munich (LRZ).

The Top 20 list also shows SGI Altix outperforming systems from NEC, HP, IBM, Cray and Sun.

High-performance computing (HPC) codes require a balance between the processor and the memory subsystem to maintain a constant flow of data. STREAM is a highly regarded benchmark that measures the sustainable memory bandwidth of a computing system, that is, the rate at which data can flow between memory and the processors.
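As an illustration of what the benchmark counts, the Triad kernel computes a[i] = b[i] + scalar * c[i] over large arrays and charges three 8-byte words of memory traffic per element (two reads and one write). The following is a minimal sketch in Python, not the official STREAM code; the function names, array size, and repeat count are assumptions chosen for illustration, and a Python loop is limited by the interpreter rather than by memory, so the reported rate shows only how the figure is derived.

```python
import time
from array import array


def stream_triad(a, b, c, scalar):
    """Triad kernel: a[i] = b[i] + scalar * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth(n=200_000, scalar=3.0, repeats=3):
    """Return (best bytes per second, result array) for the Triad kernel.

    n, scalar and repeats are illustrative defaults, not STREAM's own.
    """
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    # Two arrays read and one written: 3 words of itemsize bytes per element.
    bytes_moved = 3 * a.itemsize * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        stream_triad(a, b, c, scalar)
        best = min(best, time.perf_counter() - start)
    return bytes_moved / best, a


if __name__ == "__main__":
    rate, _ = triad_bandwidth()
    # Decimal megabytes (10^6 bytes) per second.
    print(f"Triad rate: {rate / 1e6:.1f} MB/s (interpreter-bound, illustrative only)")
```

A production measurement would use compiled code and arrays far larger than the processor caches, so that the timing reflects main memory rather than cache.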

“Today’s systems face ever-increasing memory bandwidth requirements, particularly as processors and system architectures grow more powerful,” said Bill Mannel, director of systems marketing, SGI. “The ability to meet these demands is critical to a system’s ability to shorten time to results for HPC users. By leveraging our unique shared-memory architecture, SUSE Linux Enterprise, and Intel Itanium 2 processors and compilers, SGI demonstrates once again that Altix is the ideal platform to extract insight and innovation from even the largest computing problems.”

SGI’s latest STREAM Triad result is more than four times the previous record of 1 TB/sec, which was also attained on an SGI Altix system, but one powered by 512 processors.
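The two records can be compared per processor with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and takes the quoted figures at face value; since the earlier record is given only as a rounded 1 TB/sec, the results are approximate. The helper name is hypothetical.

```python
TB = 1e12  # assumption: decimal terabyte
GB = 1e9   # assumption: decimal gigabyte


def per_processor_bandwidth(total_bytes_per_s, processors):
    """Aggregate bandwidth divided evenly across processors."""
    return total_bytes_per_s / processors


# Figures quoted in the article.
new_total, new_cpus = 4.35 * TB, 1024
old_total, old_cpus = 1.0 * TB, 512

speedup = new_total / old_total
new_per_cpu = per_processor_bandwidth(new_total, new_cpus)
old_per_cpu = per_processor_bandwidth(old_total, old_cpus)

print(f"Aggregate ratio: {speedup:.2f}x")
print(f"New record: {new_per_cpu / GB:.2f} GB/s per processor")
print(f"Old record: {old_per_cpu / GB:.2f} GB/s per processor")
```

On these numbers, the new system delivers roughly 4.25 GB/sec per processor against roughly 1.95 GB/sec for the earlier one, so the gain comes from higher per-processor bandwidth as well as from doubling the processor count.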

SGI scored both world records using the OpenMP, or shared-memory, version of the STREAM Triad benchmark. Many vendors tout results achieved using the MPI version of STREAM Triad, which puts severe limitations on memory access that can hamper the performance of a significant set of HPC applications. In contrast, SGI’s NUMAflex shared-memory architecture allows all the processors in the computer to share the same memory, which results in faster processing and easier programming.
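The shared-memory programming model can be sketched as follows: several workers update disjoint slices of the same arrays, with no copying or message passing between them. This is an illustrative Python sketch, not SGI's benchmark code; the function names and worker count are assumptions, and because of CPython's global interpreter lock the threads do not run the loop in parallel, so it shows only the programming model, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def triad_slice(a, b, c, scalar, start, stop):
    """Update a[start:stop] in place; a, b and c are shared by all workers."""
    for i in range(start, stop):
        a[i] = b[i] + scalar * c[i]


def shared_memory_triad(b, c, scalar, workers=4):
    """Run Triad with each worker writing its own slice of one shared array."""
    n = len(b)
    a = [0.0] * n
    # Ceiling division so every element is covered; at least 1 so range() is valid.
    chunk = max(1, (n + workers - 1) // workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(triad_slice, a, b, c, scalar, lo, min(lo + chunk, n))
            for lo in range(0, n, chunk)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return a


if __name__ == "__main__":
    print(shared_memory_triad([2.0] * 8, [1.0] * 8, 3.0))
```

In a message-passing version, each process would instead hold its own private portion of the arrays and would reach data held by another process only through explicit messages.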