[picture:OPTI-NUM header graphic]


November 2008


The Need for Parallel Computing

In this part of the newsletter, we discuss the main types of parallel computing problems and introduce the levels of support that Parallel Computing Toolbox provides for addressing them.

 

Why is parallel computing important to me?

More and more researchers, engineers and scientists are finding a need for parallel and distributed computing, for some or all of the following reasons:

  • They have computing tasks that they would like to run faster. This is called a task bottleneck.
  • They are working with datasets that are too big for the memory of their machines. This is called a data bottleneck.
  • They want to utilise the extra computing power they have available on their desktop, in their department, or on a dedicated computing resource (such as a computer cluster).

Task-parallel problems

Task-parallel problems, such as parameter sweeps and Monte Carlo simulations, take a long time to run because there are so many different combinations, and each needs to be executed separately.  When such a problem is run serially, the total run time grows linearly with the number of combinations. To address this, the user could distribute the task over multiple machines, with each machine running a smaller part of the problem. However, users then need to know the size and shape of the problem in great detail in order to optimise the computation on each machine and then gather the results efficiently. For this reason, most users have shied away from parallelising their code: the added complication diverts their attention from the problem itself to keeping track of which computational node executed which computation.  Consequently, many users have resorted to running far fewer simulations and much coarser parameter sweeps.  When this is not possible, many researchers have felt they had no option but to wait for their problem to be solved serially.
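
As a rough illustration of what makes a problem task-parallel, the sketch below runs a small parameter sweep serially in MATLAB. The parameter names, their ranges, and the one-line formula standing in for an expensive simulation are all assumptions made for this example; they do not come from a real application.

```matlab
% Hypothetical parameter sweep: evaluate a model at every point of a grid.
dampingValues   = linspace(0.1, 2.0, 20);    % assumed parameter range
stiffnessValues = linspace(1.0, 10.0, 30);   % assumed parameter range
[D, K] = meshgrid(dampingValues, stiffnessValues);
nRuns  = numel(D);                           % 600 independent runs
peak   = zeros(size(D));

for k = 1:nRuns
    % Each iteration uses only D(k) and K(k) and writes only peak(k),
    % so no iteration depends on the result of another.
    peak(k) = 1 / sqrt((K(k) - 1)^2 + D(k)^2);   % stand-in for a costly simulation
end
```

Because the iterations are independent, doubling the grid resolution in both parameters quadruples the number of runs and, when run serially, roughly quadruples the run time. That same independence is what allows the runs to be shared out among several machines.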

Data-parallel problems

Some problems are just too big for one computer.  As your dataset grows, your hardware will eventually be unable to support all the operations you need to perform.  While it is possible to have applications work with “virtual” memory, this requires a high level of expertise from users. Users have instead resorted to using reduced datasets and to using less accurate, but less resource-hungry, operations on their data.  This results in a loss of precision and increased algorithm development time, and it requires careful data reduction techniques.
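
A quick back-of-the-envelope calculation shows how soon a dataset can outgrow one machine. The matrix size and the number of machines below are assumed figures chosen only for illustration.

```matlab
n          = 50000;                    % assumed matrix dimension (n-by-n)
bytesTotal = n^2 * 8;                  % a double-precision value occupies 8 bytes
gibTotal   = bytesTotal / 2^30;        % about 18.6 GiB for a single matrix
numWorkers = 16;                       % hypothetical number of machines
gibPerWorker = gibTotal / numWorkers;  % about 1.2 GiB if split evenly

fprintf('Whole matrix: %.1f GiB, per worker: %.2f GiB\n', gibTotal, gibPerWorker);
```

A single matrix of this size is more than many desktops can hold in memory, and most operations need working copies as well, whereas an even split across the machines leaves each with a manageable share.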

Solving task-parallel and data-parallel problems with MATLAB

Parallel Computing Toolbox lets you solve computationally intensive and data-intensive problems using MATLAB and Simulink on multi-core and multi-processor computers.  You can use the toolbox to execute applications on a multi-core or multi-processor desktop. Without changing the code, you can run the same application on a computer cluster (using MATLAB Distributed Computing Server™). With the release of Parallel Computing Toolbox 4.0, parallel MATLAB applications can be distributed as executables or shared libraries (built using MATLAB Compiler™) that can access MATLAB Distributed Computing Server.

MATLAB’s parallel computing solutions provide multiple levels of “user involvement” with task and/or data parallelisation. 

  • No code changes. Certain products, such as Optimization Toolbox, Genetic Algorithm and Direct Search Toolbox, and SystemTest, will take advantage of your parallel computing setup seamlessly without any code changes. For an example of this, see the section in this newsletter on “Built-in Support for HPC”.
  • Minimal code changes. Using built-in MATLAB constructs introduced with Parallel Computing Toolbox, you can change your serial for-loops to parallel for-loops simply by changing the word ‘for’ to ‘parfor’ wherever appropriate.  Large arrays can be spread out across your cluster by adding the single keyword ‘codistributed’ to the matrix creation function.  The new spmd (‘Single Program, Multiple Data’) keyword is all you need to run commands simultaneously on your workers. These constructs and shortcuts allow the average MATLAB and Simulink user to write scalable code quickly and efficiently, allowing for a seamless transition from initial algorithm development to large-scale technical computing. For an example of using these constructs, see “Parallel Computing Language Constructs”.
  • Low-level control. For the more advanced programmer, Parallel Computing Toolbox supports functions based on MPI (‘Message Passing Interface’), such as labSend, labReceive, and labBroadcast, which give users low-level control over the parallelisation of their code.
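
The “minimal code changes” level described above might look like the following sketch. It assumes that Parallel Computing Toolbox is installed and that a pool of workers is already open (the command for opening one depends on your release). The Monte Carlo estimate of pi and the split of 100 work items are hypothetical examples, and labindex and numlabs are used under the names current for the release discussed here, so check the documentation for your own release.

```matlab
% Assumes a pool of workers is already open.
nTrials = 1e5;
hits    = zeros(1, nTrials);

% Monte Carlo estimate of pi. The loop body is what a serial for-loop
% would contain; only the keyword has changed from 'for' to 'parfor'.
parfor k = 1:nTrials
    p = rand(1, 2);                        % random point in the unit square
    hits(k) = (p(1)^2 + p(2)^2 <= 1);      % 1 if inside the quarter circle
end
piEstimate = 4 * sum(hits) / nTrials;

% spmd: every worker runs the same block on its own share of the work.
spmd
    myShare = labindex:numlabs:100;        % hypothetical split of 100 work items
    partial = sum(myShare);                % each worker sums only its share
end

% Outside the spmd block, 'partial' holds one value per worker.
total = 0;
for w = 1:length(partial)
    total = total + partial{w};
end
% total is 5050 however many workers took part.
```

The parfor loop is valid because each iteration writes only to its own element of hits, so MATLAB can hand out the iterations to the workers in any order. The spmd block illustrates the other style, in which each worker uses its own index to decide which part of the problem it handles.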
