Cray unleashes 100 petaflop XC30 supercomputer with up to a million Intel Xeon cores
By Timothy Prickett Morgan for The Register.com
Hot on the heels of the delivery of the 20-plus petaflops “Titan” CPU-GPU hybrid supercomputer to Oak Ridge National Laboratory last week, Cray has launched what is unquestionably a much better machine, the long-awaited “Cascade” system developed in conjunction with the US Defense Advanced Research Projects Agency and sporting the new “Aries” interconnect.
The Aries interconnect is so important to hyperscale and parallel computing that Intel shelled out $140m back in April to get control of the people who created the Aries and predecessor “Gemini” interconnects, the chip designs themselves, and the 34 patents associated with them.
Cray retained exclusive rights to the use of Gemini and Aries, so you are not going to be able to buy an Aries chip at Newegg and build your own XC30 supercomputer. (Sorry.) Further down the road, Cray and Intel are working on a common supercomputer design called “Shasta,” which may or may not use an interconnect similar to the fifth generation “Pisces” interconnect that Cray was kicking around as an idea two years ago.
What we do know is that Intel will be footing most of the bill for whatever the Shasta interconnect is, which suits Cray fine, apparently.
Now that Cascade is launched, Cray is willing to talk about a few things that were not disclosed about the project. Barry Bolding, who is currently vice president of storage and data management at Cray, worked at Cray Research two decades ago, then left to work for IBM for a while. Bolding came back to Cray when Peter Ungaro, who used to run the HPC biz for Big Blue, asked him to return to Cray and, specifically, to handle Cascade.
“This project was my baby for a while, and it was a very tough time on us,” Bolding tells El Reg. But as the Gaffer says in Lord of the Rings, “All’s well as ends better.”
First, and this was a bit surprising, DARPA does not actually get its own Cascade machine for all of the money that it spent on Cascade development, but rather has access to a machine installed elsewhere for a number of months. Then if DARPA thinks the machine passes muster, the branches of the US military can decide on their own whether to buy a machine or not.
Cray gets to monetize all of DARPA’s investments, as it did with prior systems funded by the government. It’s good work, if you have the nerves of steel to keep your wits in the low-margin, high-stakes supercomputer racket.
In phase one of DARPA’s High Productivity Computing Systems program in 2003, Cray originally received $43.1m to begin work on the Cascade line of machines, which sought to converge various machines based on x86, vector, FPGA, and MTA multithreaded processors into a single platform. (GPU accelerators were not yet on the scene.)
In phase two of the HPCS effort, Cray received a $250m award in 2006 to work further on Cascade and also to create its Chapel parallel programming language, which is available now and open source.
IBM got $244m to work on its PERCS system, which was similar to but not the same as the ill-fated “Blue Waters” Power7-based 20 petaflopper that Big Blue pulled the plug on at the University of Illinois last year, leaving Cray wide open to win a $188m deal with an XK7 Opteron-Tesla hybrid machine.
Anyway, in January 2010, DARPA scaled back the Cray Cascade funding by $60m, and neither DARPA nor Cray ever explained why.