previous | contents | next

MULTI-MICROPROCESSORS: AN OVERVIEW AND WORKING EXAMPLE 481

Map Bus is about 550K references/second. Assuming that the measured benchmarks represent typical situations, and that a 90 percent hit ratio to local memory can be achieved, we see that a Map Bus and K.map can support a cluster of about 60 CMs. The band width of the Map Bus is an important limiting factor that constrains the number of CMs in a cluster, so that there is a need to consider multicluster configurations independent of reliability or availability considerations.

3. All processors access their local memory for the code, stack, and local variables, and use the K.map only for mapping to shared global data. This is the case already considered, and for up to eight processors, negligible contention is experienced (Figure 18).

Figure 21. Integer programming speedup.

From additional measurements, we estimate the intercluster saturation rate to be about 287K references/second, with the source K.map being the bottleneck component in the system.

Figure 21 shows another interesting measurement on Cm*. Here, a number of different cases of the Integer Programming program are shown as a plot of execution speedup versus the number of available processors. Most of the time, almost linear speedups were observed. This is a consequence not of a breakthrough in algorithmic design, but rather of the fact that the time to find the optimal solution in a search tree is dependent on the order in which the tree is searched. In other words, some search orders allow quicker, more radical prunes of the tree than other search orders. Therefore, the chance will always exist that one of the parallel paths initiated will fortuitously find a good solution and allow early pruning of the search tree.

Fundamentally, the multiprocessor cannot expect speedups greater than linear in the number of available processors. If, for example, the speedup of the Integer Programming problem was observed to increase as the square of the number of processors, then a new program could be written for a uniprocessor that, in effect, emulated the operation of a set of parallel processors by round-robin sharing of the uniprocessor among the parallel processes. In special instances, parallel processes may allow the elimination of some overhead, but linear speedup in the number of available processors is the ideal situation.

Performance of Multiple Cluster Configurations

The results of Figure 18 imply that many more than ten CMs could be managed in a single cluster before the Map Bus becomes a performance bottleneck. However, since we are interested in the potential of the Cm* structure for much larger systems, we also examined the

previous | contents | next