stream) detects both that the two streams have completed and that the C-stream found C=0, and hence that the computation should terminate. This two-Bus parallel implementation is approximately twice as fast as the one-Bus serial version, but requires extra hardware: an additional Kbus and two K(parallel-merge)'s.
Fig. 14. RTM diagram of 8-bit multiplier, concurrent implementation, test at end of loop.
One can attempt to avoid some of the extra cost by making minor adjustments in the control structure. For instance, one of the K(parallel-merge)'s can be eliminated if loop control is divorced from synchronization. This can be done by placing the test before the loop is entered, as shown in Figure 15. Unfortunately, this structure requires one additional control step in the P-MPD control stream, which slows the system slightly.
The type of parallelism exhibited here is really the general case: functionally diverse computations that neither depend on each other for data nor affect data used by each other (for example, by updating it in midstream) are done by independent machinery. Synchronization, when it is finally required, is forced by means of parallel-merges. This is often called concurrency, although the term parallelism is often used as well; terminology is not yet standardized. Our example again illustrates a rather general rule: there is always a tradeoff between speed and hardware. The increase in speed (in the two-Bus system) costs hardware; the attempt to avoid some of that cost gives back some of the speed gain.
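As a rough software analogy to the concurrent structure described above, the following sketch runs two control streams that touch disjoint registers and meet at a parallel-merge once per loop iteration, with the C=0 test made at the end of the loop. It is an illustration under stated assumptions, not a transcription of the RTM diagrams: the shift-and-add algorithm, the register name MPR, and the function name multiply_concurrent are hypothetical, and Python's threading.Barrier merely stands in for a K(parallel-merge). Python threads also give no real speedup here; only the control structure is modeled.

```python
import threading


def multiply_concurrent(mpr, mpd, bits=8):
    """Shift-and-add multiply using two control streams and one merge point.

    Assumed register roles (MPR is a hypothetical name):
    P = product, MPD = multiplicand, MPR = multiplier, C = loop counter.
    """
    regs = {"P": 0, "MPD": mpd, "MPR": mpr, "C": bits, "done": False}

    def test_at_end_of_loop():
        # Runs exactly once per merge, after both streams have arrived
        # and before either is released.
        regs["done"] = regs["C"] == 0

    # Stand-in for the parallel-merge: waits for both streams.
    merge = threading.Barrier(2, action=test_at_end_of_loop)

    def p_mpd_stream():
        # Touches only P, MPD and MPR; never reads or writes C.
        while True:
            if regs["MPR"] & 1:
                regs["P"] += regs["MPD"]
            regs["MPD"] <<= 1
            regs["MPR"] >>= 1
            merge.wait()
            if regs["done"]:
                return

    def c_stream():
        # Touches only the counter C.
        while True:
            regs["C"] -= 1
            merge.wait()
            if regs["done"]:
                return

    streams = [threading.Thread(target=p_mpd_stream),
               threading.Thread(target=c_stream)]
    for s in streams:
        s.start()
    for s in streams:
        s.join()
    return regs["P"]


if __name__ == "__main__":
    print(multiply_concurrent(13, 11))
```

The two streams never share a register, so no locking is needed between merges; the only coordination is the merge itself, which is also where the termination test is made. Moving that test out of the merge and ahead of the loop would correspond to the cheaper, slightly slower variant discussed in the text.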