operation. The final stage, realized in all large computers, is where a multiply exists as a primitive operation. The latter brings us full circle to where the multiply itself is realized in a hardwired form.

We have now created a substantial number of designs, all ostensibly to do the same task. Our analysis is not complete without an attempt to compare the designs in some uniform way. Throughout we have been concerned primarily with operational performance, measured in terms of operation time or operation rate, and total system hardware cost, measured in terms of the costs presented in this book, which are representative of relative technological cost. These are not the only objectives of concern in a design -- recall the list of objectives in Chapter 1. We will return to this issue at the end of the section; for now let us work with the kinds of data we have made available in our analyses.

In Figure 31 we select most of the systems and plot them horizontally on effective operation time (the reciprocal of the operation rate) and vertically on total hardware cost. In discussing each of the separate systems, we did not always provide the explicit cost figure. These can be computed from each of the figures using Figure 17 from Chapter 2. Earlier we distinguished operation time and operation rate as two separate measures. We take operation rate as the underlying measure in the figure, on the assumption that parallel organizations will be used in situations where the rate can be exploited. We express the operation rate as an effective operation time, however, since a designer's intuition is built up in terms of microseconds per operation rather than millions of operations per second. RT-level systems are generally serial, so that one adds operation times for a heterogeneous collection of operations. There is essentially no use for a number like 750,000 multiplications/second, since it never occurs that a system does nothing but multiplications for any period of time.
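The conversion between the two measures is just a reciprocal. A minimal sketch (the 750,000 multiplications/second figure is from the text; the helper name is our own):

```python
def effective_op_time_us(ops_per_second):
    """Effective operation time in microseconds: the reciprocal of the rate."""
    return 1e6 / ops_per_second

# A rate of 750,000 multiplications/second reads more intuitively as
# about 1.33 microseconds per multiplication.
print(round(effective_op_time_us(750_000), 2))  # → 1.33
```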

The most striking fact about Figure 31 is that the operation times vary over a factor of 500, from 1 to 500 microseconds, while the costs vary over a factor of 8, from 50 to 400. The total ranges, however, are less important than detecting the effects of the various types of alternatives that are generally available in designing an RT-level system. We can get some insight into the effect of parallelism and facility sharing, unwinding control loops, varying the algorithm, hardwired versus software control, and generality versus specificity. Where quantitative estimates of these effects can be made from the figure, they are summarized in the table of Figure 32.
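The spread of each measure is simply the ratio of its extremes. A sketch using hypothetical (time, cost) points to stand in for the plotted systems (only the endpoint values, 1 and 500 microseconds and 50 and 400 cost units, come from the text; the intermediate points are invented):

```python
# Hypothetical (effective time in microseconds, hardware cost) points.
# Only the extremes are taken from the text; the rest are illustrative.
designs = [(1.0, 400), (12.0, 210), (80.0, 95), (500.0, 50)]

times = [t for t, _ in designs]
costs = [c for _, c in designs]

# Spread of each measure: ratio of the largest to the smallest value.
time_spread = max(times) / min(times)
cost_spread = max(costs) / min(costs)
print(time_spread, cost_spread)  # → 500.0 8.0
```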

We take parallelism to include both replication of hardware to achieve concurrency factors greater than 1 and facility sharing to achieve concurrency factors less than 1.

The simplest effect of parallelism, facility sharing, is independent of algorithm and implementation. Some of the implementations are given as pairs in Figure 31, showing both the 1 and 2 DMgpa arrangements. With shared resources, extra time is required to carry out the sharing, consisting of shuffling data to and from the shared resource. In the two implementations for the eight-step algorithm that use the straight-line control, only two active registers are required. Since a single DMgpa provides these, only one DM module is required. Sharing in this form costs time: the operation rate drops by a factor of about .5 to .6.
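The time cost of sharing can be sketched with a simple model in which data shuffling to and from the shared DMgpa adds a fixed overhead per operation. The .5 to .6 factor is from the text; the particular times and the model itself are our assumptions:

```python
# Hypothetical sketch: sharing one DMgpa between two uses adds data-shuffling
# transfers per operation. Times are in microseconds and are illustrative
# assumptions, not figures taken from the designs.
base_time = 10.0     # time per operation with dedicated (unshared) resources
shuffle_time = 8.0   # extra moves to and from the shared DMgpa per operation

shared_time = base_time + shuffle_time

# Rate of the shared design relative to the unshared one.
relative_rate = base_time / shared_time
print(round(relative_rate, 2))  # → 0.56, within the .5 to .6 range cited
```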

Facility sharing also occurs in the Crtm, since it has only a single register (not
