previous | contents | next

468 Part 2½ Regions of Computer Space
Section 6 ½ Fault-Tolerant Systems


data bits and address. This overlapping is arranged to cope with particular memory circuit failures which may affect more than one bit of a word.

The 1A call stores contain both translation data backed up on the file stores and call-related transient data which are difficult to regenerate. The roving spare concept is expanded for the call stores to include sufficient spares to provide full duplication of transient data. If a fault occurs in a store that contains translation data, one of the duplicated stores containing transient call data is preempted and loaded with the necessary translation data from the duplicated copy in the file store. A parity check is done in the same manner as in the program store, using- two check bits.

The combination of writable program store and file store provides a very effective and flexible system architecture for administrating and implementing a wide variety of features which are difficult to obtain in the No. 1 ESS. However, this architecture also complicates the process of fault recognition and recovery. Reconfiguration into a working system under trouble conditions is an extensive task, depending on the severity of the fault. (For example, it is possible for the processor to lose its sanity or ability to make proper decisions.) An autonomous hardware processor configuration (PC) circuit is provided in each CC to assist in assembling a working system. The PC circuit consists of various timers which ensure that the operational, fault recovery, and configuration programs are successfully executed. If these programs are not executed, the PC circuit controls the CC-to-program memory configuration, reloading program memory from file store when required, and isolating various subsystems from the CC until a working system is obtained.


No. 3A Processor

The No. 3A Processor was designed to control the small No. 3 ESS [Irland and Stagg, 1974], which can handle from 500 to 5000 lines. One of the major concerns in the design of this ESS was the cost of its processor. The low cost and high speed of integrated logic circuitry made it possible to design a cost-effective processor that performed better than its discrete component predecessor, the No. 2 Processor. The No. 3A project was started in early 1971. The first system cut into commercial service in late 1975.

Because the number of components in the No. 3A Processor is considerably less than in the No. 1A Processor, all subsystems are fully duplicated, including the main store. The CC, the store bus, and the store are treated as a single switchable entity rather than individual switchable units as in the No. 1A Processor. The system structure is similar to the No. 2 ESS. Experience gained in the design and operation of the No. 2 provided valuable input for the No. 3 Processor design.

The 3A's design makes one major departure from previous ESS processor designs: it operates in the nonmatched mode of duplex operation. The primary purpose of matching is to detect errors. A mismatch, however, does not indicate where (which one of the processors) the fault has occurred. A diagnostic fault-location program must be run to localize the trouble so that the defective unit can be taken off-line. For this reason, the No. 3A Processor was designed to be self-checking, with detection circuitry incorporated as an integral part of the processor. Faults occurring during normal operation are discovered quickly by detecting hardware. This eliminates the need to run the standby system in the synchronous and match mode of operation, or the need to run the fault recognition program to identify the defective unit when a mismatch occurs.

The synchronous and match mode arrangement of the No. 1 Processor and the No. 2 ESS provides excellent detection and coverage of faults. However, there are many instances (e.g., periodic diagnostics, administration changes, recent change up dates, etc.) when the system is not run in the normal match mode. Consequently, during these periods, the system is vulnerable to faults which may go undetected. The rapid advances in integrated circuit technology make possible the implementation of self- checking circuits in a cost-effective manner. This eliminates the need for the synchronous and match mode of operation. Self- checking design is covered in more detail in Toy [1978].

Another new feature in ESS processor design is the application of microprogram technique in the No. 3A [Storey, 1976]. This technique provides a regular procedure of implementing the control logic. Standard error detection is made part of the hardware to achieve a high degree of checkability. Sequential logic, which is difficult to check, is easily implemented as a sequence of microprogram steps. Microprogramming offers many attractive features: it is simple, flexible, easy to maintain, and easy to expand.

The No. 3A Processor paralleled the design of the No. 1A Processor in its use of an electrically alterable (writable) memory. However, great strides in semiconductor memory technology after the No. 1A became operational permitted the use of semiconductor memory in the 3A rather than core memory.

The 3A's call store and program store are consolidated into a single store system. This reduces cost by eliminating buses, drivers, registers, and controls. A single store system no longer allows concurrent access of call store and program store. However, this disadvantage is more than compensated for by the much faster semiconductor memory. Its access time is 1 m s (the earlier PMT stores had an access time of 6 m s).

Normal operation requires the on-line processor to run and process calls while the standby processor is in the halt state, with its memory updated for each write operation. For the read operation, only the on-line memory is read, except when a parity error occurs during a memory read. This results in a micropro-

previous | contents | next