previous | contents | next

MULTI-MICROPROCESSORS: AN OVERVIEW AND WORKING EXAMPLE 483

on a rectangular grid of size M X N, where only the values at the outer edges of the grid are given.

A finite difference method [Baudet, 1976] that transforms the problem into a set of linear equations Ax = b is used. Here, x is an MN vector of all the points in the grid, A is an MN X MN sparse matrix, and b is an MN vector derived from the boundary conditions. This set of linear equations is derived from the new approximate values of the points (in each iteration) by averaging the values of the four adjacent neighbors of each point. The solution of this PDE is required in many application areas (e.g., in electromagnetic fields, hydrodynamics). Other PDE problems can be similarly solved using this method.

The computation is initially decomposed into P processes, where P is equal to the number of processors available. Each process (and processor) iterates on a fixed subset of MN/P components out of the total MN components. One processor, the "master" processor, initializes and starts the other "slave" processors, and prints the results when all have finished. Note that the master participates in the computation just like the slave processors.

Sorting

This problem concerns the decomposition of the well-known QUICKSORT algorithm [Singleton, 1969] into asynchronous parallel processes. The median for each sort pass was chosen as the median of the first, middle, and last elements in the sublist. During a sorting pass, a processor partitions its list of elements into two sublists: elements larger than the median of the original set and elements smaller than the median. The processor then pushes the address and size of the smaller of the two subsets onto a stack shared by all the processors. Making the smaller subset available to the other processors tends to put more work onto the shared stack in order to keep as many processors as possible busy. The processor proceeds to further partition the remaining (larger) subset. When the remaining subset cannot be partitioned further, the processor selects the next available subset from the shared stack.

Simple assumptions about the algorithm give a theoretical sorting time of:

cN [(K - M)/P + 2 (1 - (l/2)M)]

where N is the number of elements tc sort, K is Log2 N, c is constant, P is the number of processors, and M is Log2 P.

When the number of processors is much smaller than the number of items to be sorted, almost linear speedup can be achieved. The performance degrades considerably when the number of processors is large and asymptotically approaches a speed of T = c Log N/2. See Stone [1971] for a description of sorting methods that speed up as N/Log N for large numbers of processors.

Integer Programming - The Set Partitioning Problem

The particular integer programming considered here is one of the most practical and applicable methods. It is used, for example, in airline crew scheduling [Bales and Padberg, 1976].

The set-partitioning problem is to solve:

min {c.x Ax = 0, xj = 0 or 1 for 0 < j <N}

where A is an M X N binary matrix, c is an N vector, and c = (1 . . . 1) M vector.

This problem typically is solved by performing an N-ary tree search on a large relatively sparse binary matrix. As an example of this method, consider the airline crew scheduling problem. The rows of the A matrix correspond to a set of flight legs from city A to city B, in time T to be covered during a specified period, and the columns of A correspond to a possible sequence of tours of flight legs done by one crew; c is the vector of the associated cost of each tour. A possible solution includes a set of

previous | contents | next