The basic cause of the high performance can be traced to a combination of three things. First, DMgpa is an expensive module, so its replacement can support a substantial amount of control. Second, the number of iterations was only eight, so the total size of the unwound control was small. Third, since the number of steps in the iteration loop was fixed, the test of termination was required only because the loop was introduced. Normally, each stage in the unwound control would not only have to evoke the basic processing step, but to test for termination. This increases the cost of the control step considerably. Removing these last two conditions would increase the cost of the control beyond the the saving from a DMgpa, returning the situation to a trade-off between speed and cost.

Throughout the chapter we have taken as given the algorithm for doing multiplication. But clearly there are many ways to perform multiplication. From an investigation of the properties of one algorithm essentially nothing can be known about the others. It might seem simple to define the essential information processes involved in an operation as basic as multiplication, where the function is easily defined mathematically (by axioms), free of any particular way of computing it. But no such results are yet available in computer science, though the topic currently is highly active. Indeed, 'recently new algorithms have been found for the multiplication of two matrices which take less time than anyone would have predicted earlier.

The upshot is that we cannot offer any systematic view for finding-alternative algorithms or knowing that an algorithm in hand is as good as can be expected. The best we can do is illustrate issues. This is not as crippling as it might sound. Design at the RT level is very much the implementation of algorithms defined externally to the design and not, like programming, very much the creation of new fundamental algorithms. The reasons for this lies in the relative simplicity of the algorithms realized at the RT level compared to those realized routinely by programming (which can have thousands of instructions).

In general an algorithm can be realized in RTM's with any basic set of data operations on a set of data structures capable of holding the required information. At one level, the data structure and data operations are completely fixed, given by the RTM system. But, in fact for any realistic information processing task there is a design issue of how to represent the information of the task in terms of the basic bit vectors of the RT level. Though occasionally the choice of data representation and algorithm are separate design decisions, usually they are tightly coupled. Given the algorithm, one chooses just that data representation that makes the processing efficient. To do otherwise invariably generates needless operations of extraction and encoding, just to deal with the inappropriate data representation.

Conversely, if the data representation is selected first, then it determines a class of algorithms as the natural candidates to do a task. Thus, the data representation can be used as means of generating possible alternative algorithms. Applying this to multiplication leads to searching for the different possible representations of numbers. It is clear that various familiar encodings of numbers, such as binary coded decimal, will not provide interesting alternatives in comparison with the basic binary coding already used. They exist

132