previous | contents | next

bit stages, rather than the 1-bit stages of the original algorithm. Since the desired goal is to handle 8-bit numbers, the use of 4-bit multiplication implies that the table look-up is into a 2^4 x 2^4 = 2^8 memory. RTM read-only memories are not currently available in the smaller size, but 1256 word read- write memory is available at 50, thus cutting the table cost a little bit.

Figure 20 shows the basic algorithm using an example: 1010 * 0111. We use 2-bit components, so that each 4-bit number is made up of two 2-bit components, analogously to the 8-bit number being made up of two 4-bit components. However, there is no scaling up of the stages in the computation, since the components simply increase in size. Thus, three stages of computation are required, with shifts for each stage corresponding to the size of the component (2 in the example, 4 in the actual case). Figure 21 gives the RTM system for the 8-bit case with 4-bit components. Unlike the basic algorithm, there is no requirement for a test on the multiplier, since all cases are handled in a uniform way through the component multiplier. The problem with this implementation is that a significant amount of field shifting is required to manipulate the partial products. This raises the time and-cost both to a factor of 2 times the unwound loop case of Fig. 17. Normally when this scheme is used, all the shifting is done with wires and combinational circuitry. Here we have used the transfer fields of the M(transfer)'s, and Tin's specially wired to the Boolean outputs of the M(transfer)'s.

This scheme with 4-bit components might appear to be like doing arithmetic in 4-bit digits -- i.e., serial hexadecimal arithmetic -- since the multiplication is done hexadigit by hexadigit. However, the additions are done in parallel in binary. If this, were not the case, the addition in each stage would have been a multistage process, making the scheme still more expensive.

Though we can only be illustrative, let us examine one other alternative scheme for multiplication. This algorithm, known in some parts as the Russian Peasant's Algorithm (Knuth, 1971) and hereafter abbreviated simply as the RP algorithm, works on two positive integers. It involves doubling the multiplicand and halving the multiplier, while accumulating only the multiplicands of odd multipliers to get the total product. Figure 22 provides an example of multiplying 67 by 12. The first row is obtained by placing the original multiplier (67) in the Multiplier column and the multiplicand (12) in the Multiplier-odd column because 67 is odd. The second row is obtained by dividing the multiplier by 2 and putting in the truncated value (33); the multiplier is doubled (2*12 = 24) and placed in the odd side again since 33 is odd. The process is continued until the multiplier becomes 1 or 0. Then the product (804) is obtained by adding up all the entries in the Multiplier-odd column. The figure also gives the scheme run with 12 as the multiplier and 67 as the multiplicand.

The algorithm has a peculiar flavor when expressed in decimal notion, but it does not take much insight to see that it is well adapted to RTM implementation, and in fact works with the underlying binary representation of the two numbers. Note, this is similar to the algorithm of Figure 1. The number of iterations in the algorithm is dependent on the number of 1-bits in the multiplier. It is likely that the numbers are small. Thus, assuming that the logarithms of the numbers are distributed uniformly, only an average of four iterations is required. In general the smaller of the two inputs will have fewer 1-bits (though not always), so the number of iterations can be reduced somewhat by taking the smaller number as the multiplier. The algorithm automatically generates its termination condition

136

previous | contents | next