Category Archives: HPC

Advances in Computing

I have been watching advances in computing power for some time, since university really. I did research in parallel algorithms and architectures in my first position at the university and later applied that work in practice on Wall Street. In those days super-expensive machines like the Intel Hypercube, the Paragon, and many other architectures were the backbone of the HPC community.

HPC (High Performance Computing) roughly breaks down into 4 categories:

  1. Big iron supercomputers (MIMD generally)
  2. Distributed computing (these days advertised as Cloud Computing)
  3. The emerging SIMD GPU based solutions
  4. Quantum Computing (not really here yet for the mainstream)

In the machine learning and optimisation world there are massive problems, some of which are not practically computable on von Neumann architectures because their runtime would be astronomical. An (absurd) example of such a problem would be to simulate a large number of monkeys typing on typewriters, stopping when one produces the works of Shakespeare. The number of monkeys required to produce such a work, on average, is astronomical. This seems like an absurd problem, but it is comparable to the GP / GA (genetic programming / genetic algorithm) approach.
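To put a rough number on "astronomical", here is a back-of-the-envelope sketch in Python. It assumes a hypothetical 27-key typewriter (26 letters plus space), uniformly random keystrokes, and a target far shorter than the complete works. The function name is made up for illustration.

```python
import math

def log10_expected_attempts(target: str, alphabet_size: int = 27) -> float:
    """log10 of the expected number of independent attempts needed to type
    `target` exactly, when each keystroke is uniform over `alphabet_size` keys.

    One attempt succeeds with probability alphabet_size ** -len(target),
    so the expected number of attempts is alphabet_size ** len(target).
    """
    return len(target) * math.log10(alphabet_size)

phrase = "to be or not to be"
exponent = log10_expected_attempts(phrase)
print(f"{len(phrase)} characters: about 10^{exponent:.1f} attempts")
```

Even this 18-character phrase needs more than 10^25 attempts on average, and the exponent grows linearly with the length of the text.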

Then of course there are numerous problems with high dimensionality and/or with polynomial order complexity.

Supercomputing on the Cheap
The FASTRA team at the University of Antwerp has put together an inexpensive multi-teraflop machine with 7 gaming cards.  Check out their video.

Unfortunately the “easy” part of this sort of solution is the hardware. The problem is the (often) great expense of redeveloping one’s models in a SIMD framework so that they can be applied to the GPU architecture. Although there is now standardization on the low-level C variant used to program GPUs, there are significant differences between GPU models, so even if you manage to write a correct SIMD program, you may have to rearrange it for a specific GPU implementation. (I guess this is not all that different from my experiences with big-iron parallel architectures of the past.)

One could have a team devoted to parallelization, tuning, and retuning / reworking for the new GPUs that come out periodically. Very time consuming!

For my work, the problems that would map well are particle filters and Monte Carlo based models, both of which have obvious fine-grained parallel operations.
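As a rough illustration of where that fine-grained parallelism sits, here is a minimal sketch of one bootstrap particle filter step in plain Python. The 1-D random-walk model, the Gaussian noise levels, and the function name are all assumptions chosen for illustration, not a model from this post. The propagate and weight stages touch each particle independently, which is the part that would map onto SIMD hardware.

```python
import math
import random

def particle_filter_step(particles, observation, rng,
                         process_sigma=1.0, obs_sigma=1.0):
    """One bootstrap particle filter step for a toy 1-D random-walk model."""
    # Propagate: each particle moves independently (a data-parallel map).
    moved = [x + rng.gauss(0.0, process_sigma) for x in particles]

    # Weight: each likelihood depends only on its own particle (another map).
    weights = [math.exp(-0.5 * ((observation - x) / obs_sigma) ** 2)
               for x in moved]

    # Normalise: a sum reduction, the first step needing all particles.
    total = sum(weights)
    weights = [w / total for w in weights]

    # Resample: draws particles in proportion to the global weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, weights

rng = random.Random(42)
particles = [rng.gauss(0.0, 5.0) for _ in range(1000)]
particles, weights = particle_filter_step(particles, observation=2.0, rng=rng)
estimate = sum(particles) / len(particles)
print(f"posterior mean estimate: {estimate:.2f}")
```

The list comprehensions stand in for what would be one thread per particle on a GPU. The normalisation and resampling stages are where the particles have to communicate.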

Quantum Computing
The other notable announcement this week was Google’s use of quantum computing to solve pattern recognition problems. I have not done the legwork to fully understand the algorithms in quantum computing, but broadly it seems to be a matter of framing one’s problem statistically as a path integration problem (i.e., an expectation), where quantum computing allows the paths to be explored simultaneously.


Filed under HPC, machine-learning

Future For Commercial HPC

As I noted in an earlier post I have High Performance Computing requirements. Basically if you can give me thousands of processors, I can use them. The problem with HPC today is that it is one or more of the following (depending on where you are):

  • academic and only open on a limited basis to researchers based on their proposals
  • internal
  • available but not cost effective (8 cores @ $7000 / compute year at Amazon)

This flies in the face of what we know:

  • there are many thousands of under-utilized or unused computers available
  • the true cost of computing power + ancillary costs (power, people) can be scaled to a much lower figure
  • organizations should want to monetize this underutilized capacity

Why do I care about this? Well, I could use cheap computing power today, but also I used to be a parallel algorithm researcher back in the day, so have been waiting for this for a long time.

The solution is to give compute resource providers a means to auction their unused resources for blocks of time, immediate or future. HPC users who want to evaluate a massively parallel problem can collect a forward dated/timed group of nodes for execution, either finding a group within their cost range or waiting for lower cost nodes to become available.

How would this be accomplished?

  1. Exchanges are set up for geographical areas where providers can offer gflop-hr futures and consumers can buy computing futures or alternatively sell their unused futures.
  2. Contract requires standardized power metrics (SPECfprate2006 for instance)
  3. Contract requires standardized non-CPU resource (min memory, disk)
  4. Standard means of code and data delivery (binary form, encryption, etc)
  5. Safe VM in which to run code
  6. Checkpointing to allow for a computation to be moved (optional)
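To make the list above a little more concrete, here is a minimal sketch of how a consumer might collect a group of nodes from offers posted on such an exchange. Everything in it is hypothetical: the `Offer` fields, the `select_offers` function, and the greedy cheapest-first rule are assumptions for illustration, not a description of any existing exchange or of the design I have in mind.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Offer:
    provider: str
    nodes: int                  # nodes offered for the block
    specfp_rate: float          # standardized power metric per node
    memory_gb: int              # standardized non-CPU resource
    start_hour: int             # block start, in hours from now
    hours: int                  # block length
    price_per_node_hour: float

def select_offers(offers, nodes_needed, min_memory_gb, max_price,
                  start_hour, hours):
    """Greedily take the cheapest eligible offers until enough nodes are
    collected. Returns [(offer, nodes_taken), ...], or [] if demand
    cannot be met within the constraints."""
    eligible = [o for o in offers
                if o.memory_gb >= min_memory_gb
                and o.price_per_node_hour <= max_price
                and o.start_hour <= start_hour
                and o.start_hour + o.hours >= start_hour + hours]
    # Rank by price per unit of benchmarked performance, not per raw node.
    eligible.sort(key=lambda o: o.price_per_node_hour / o.specfp_rate)
    picked, remaining = [], nodes_needed
    for offer in eligible:
        if remaining == 0:
            break
        take = min(offer.nodes, remaining)
        picked.append((offer, take))
        remaining -= take
    return picked if remaining == 0 else []

offers = [
    Offer("host-a", 100, 70.0, 8, 0, 24, 0.05),
    Offer("host-b", 500, 80.0, 12, 0, 48, 0.04),
    Offer("host-c", 2000, 66.0, 4, 0, 48, 0.02),  # too little memory
]
picked = select_offers(offers, nodes_needed=550, min_memory_gb=8,
                       max_price=0.10, start_hour=2, hours=6)
for offer, taken in picked:
    print(offer.provider, taken)
```

A real exchange would also need order matching on the provider side, settlement, and the delivery and sandboxing pieces listed above. This only shows the buyer's selection step.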

Research into auction-based scheduling and resource allocation began in the early 90s, perhaps earlier. The first paper I saw in this regard was in 1991. There are now hundreds of papers on this and a few academic experiments. There should be a big market for this amongst web hosting companies, etc.

Amazon and Google, although likely to be very efficient with resource utilization, are likely to have peak periods and slack periods like everyone else. The strategy would be to price resources lower during slack periods to attract “greedy” computations looking for cheap power.

I have specific ideas about how this would be implemented. Contact me if you are interested.

Filed under grid, HPC, performance

High Performance Computing on the Cheap

I have a couple of trading strategies in research that require extremely compute intensive calibrations that can run for many days or weeks on a multi-cpu box. Fortunately the problem lends itself to massive parallelism.

I am starting my own trading operation, so it is especially important to determine how to maximize my gflops / $. Some preliminaries:

  • my calibration is not amenable to SIMD (therefore GPUs are not going to help much)
  • I need to have a minimum of 8 GB memory available
  • my problem performance is best characterized by the SPECfprate benchmark

I started by investigating grid solutions. Imagine if I could use a couple of thousand boxes on one of the grids for a few hours. How much would that cost?

Commercial Grids
So I investigated Amazon EC2 and the Google App Engine. Of the two only Amazon looked to have higher performance servers available. Going through the cost math for both Amazon and Google revealed that neither of these platforms is priced in a reasonable way for HPC.

Amazon charges $0.80 per compute hour, which is $580 / month or $7000 / compute year, on one of their “extra-large high cpu” boxes. This configuration of box is a 2007 spec Opteron or Xeon. This would imply a dual Xeon X5300 family 8 core with a SPECfprate of 66, at best. $7000 per compute year is much too dear; certainly there are cheaper options.

Hosting Services
It turns out that there are some inexpensive hosting services that can provide SPECfprate ~70 machines for around $150 / month. That works out to $1800 / year. Not bad, but can we do better?

Just How Expensive Is One of these “High Spec” boxes?
The high-end MacPro 8 core X5570 based box is the least expensive high-end Xeon based server. It does not, however, offer the most performance per dollar if your computation can be distributed. The X5500 family performs at 140-180 SPECfprates, at a cost of > $2000 just for the 2 CPUs.

There is a new kid on the block, the Core i7 family. The Core i7 920, priced at $230 generates ~80 SPECfprates and can be overclocked to around 100. A barebones compute box can be built for around $550. I could build 2 of these and surpass the performance of a dual cpu X5500 system, saving $2000 (given that the least expensive such X5500 system is ~$3000).

Cost Comparison Summary
Here is a comparison of cost / 100 SPECfprate compute year, for the various alternatives. We will assume 150 watts of power consumption per CPU at $0.10 / kWh, in addition to system costs.

  1. Amazon EC2
    $10,600 / year. 100/66 perf x 0.80 / hr x 365 x 24
  2. Hosting Service
    $2,570 / year. 100/70 perf x $150 x 12
  3. MacPro 2009 8 core dual X5570
    $1070 / year. 100 / 180 perf x $3299 / 2 + $160 power
  4. Core i7 920 Custom Build
    $430 / year. 100 / 80 perf x $550 / 2 + $88 power
  5. Core i7 920 Custom Build Overclocked
    $375 / year. 100 / 100 perf x $550 / 2 + $100 power
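The arithmetic in this list can be re-run with a short Python sketch. It evaluates each formula exactly as written above, taking the power figures as quoted. Treating the "/ 2" as the purchase price spread over two years is my reading of the formulas, and the helper name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24

def cost_per_100_specfp(perf, yearly_cost, power=0.0):
    """Yearly cost scaled to 100 SPECfprate, plus the quoted power figure."""
    return 100.0 / perf * yearly_cost + power

options = {
    "Amazon EC2": cost_per_100_specfp(66, 0.80 * HOURS_PER_YEAR),
    "Hosting service": cost_per_100_specfp(70, 150 * 12),
    # Hardware price divided by 2: assumed to be two-year amortization.
    "MacPro dual X5570": cost_per_100_specfp(180, 3299 / 2, power=160),
    "Core i7 920": cost_per_100_specfp(80, 550 / 2, power=88),
    "Core i7 920 overclocked": cost_per_100_specfp(100, 550 / 2, power=100),
}

# Print the alternatives from cheapest to most expensive.
for name, cost in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} ${cost:,.0f} / year")
```

Changing the amortization period or the electricity price is then a one-line edit, which makes it easy to see how sensitive the ranking is to those assumptions.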


The Core i7 920 build is the clear winner. One can build 5-6 of these for the cost of a single X5570 based system. I will build a cluster of these.

Filed under HPC, performance