actual facts on which a design choice must be based. But the alternatives almost always exist to be considered, since they reflect pervasive qualitative characteristics of RT systems: the occurrence of an operation more than once in a larger system; the prevalence of array data; and the existence of serial computation.
Having set out to study the design of RTM systems to multiply, it may seem odd that we concentrated first on the use of multiply as a subsystem within a larger algorithm, rather than simply taking multiply as an autonomous computation and considering alternatives that arise for its internal structure. Our reason for so doing is to counteract the natural tendency to consider multiply as a new unit, just as if a DM(multiply) had been defined as a basic RTM module. This latter is a powerful notion and in programming it is almost universally the right thing to do. In RT design it is also appropriate to a degree. Every designer should have a personal library of such schemes, designed, checked out and documented. However, as the present section shows, the larger system should always be taken into account in determining the way the subsystem is to be implemented. In the examples given above the internal structure was substantially modified in response to whether to implement the algorithm with one or another degree of parallelism. Thus, the level at which the unit should exist is that of the conceptual RTM flow chart, such as Figure 9. In fact,-one could even write simply K(...) for Kmacro(...), indicating that no decision had been made on how to realize the computational function.
With this fundamental point made, we can henceforth restrict our concern to the function of multiplication considered in isolation.
VARIATIONS IN IMPLEMENTING THE BASIC ALGORITHM'
In this section we ask what alternatives for implementation regularly occur if we take a given algorithm, such as that given by Figure 3 for multiplication, and consider the calculation of an output from a single input. The basic option is given by a direct one-to-one mapping of the algorithm into an RTM flowchart; it was already presented in Figure 4. What other alternatives exist?
GENERAL PARALLELISM: CONCURRENCY
Given that a certain set of operations has to be performed, speed comes from doing some of them at the same time -- in parallel. What prevents simply doing everything at once is that some operations depend for their inputs on the outputs of other operations. Such operations are forced into a sequential relationship. Thus the opportunity for parallelism arises from operations that do not depend on each other. Examination of Figure 3 shows that the calculations on C are independent of the calculations on P and MPD, providing only that the
two sets of calculations remain locked together as a whole (only. one decrement of C per multiplication step). Thus, these two sequences can be separated into two parallel streams.
Figure 14 shows the implementation, with separate data-memory parts for shifting and adding, and for counting. The parallelism begins at the output of the K(serial merge) following Ke(C<-8), control branching into two paths. Control must be synchronized again at the end of the sequences. This is done by a K(paralled-merge). Actually, two Kpm's are required. The left hand one (under the P-MPD control stream) detects both that the two streams have been completed, and that the C-stream found that C was not 0, hence that another iteration of the loop is required. The Kpm at the right (under the C control