previous | contents | next

Chapter 28 ½ Fault-Tolerant Design of Local ESS Processors 461
 

be assigned as the on-line working system, while the other unit serves as standby backup. The mean-time-to-failure (MTTF), a measure of reliability, is given by the following expression [Smith, 1972]:

where m is the repair rate (reciprocal of the repair time), and l is the failure rate,

The failure rate (l ) of one unit is the summation of failure rates of all components within the unit. For medium and small ESS processors, Fig. 2 shows a system structure containing several functional units which are treated as a single entity, with l still sufficiently small to meet the reliability requirement. The single-unit duplex configuration has the merit of being very simple in terms of the number of switching blocks in the system. This configuration simplifies not only the recovery program but also the hardware interconnection. It does this by eliminating the additional access required to make each duplicated block capable of switching independently into the on-line system configuration.

In the large No. 1 ESS, which contains many components, the MTTF becomes too low to meet the reliability requirement. In order to increase the value of the MTTF, either the number of components (failure rate) or the repair time must be reduced. Alternatively, the single-unit duplex configuration can be partitioned into a multiunit duplex configuration as shown in Fig. 3. In this arrangement, each subunit contains a smaller number of components and is able to be switched into a working system. The system will fail only if a fault occurs in the redundant subunit while the original is being repaired. Since each subunit contains fewer components, the probability of two simultaneous faults occurring in a duplicated pair of subunits is reduced. The MTTF of the multiunit duplex configuration can be computed by taking into consideration the conditional probability of a subunit failing during the repair time of the original subunit.

An example of a multiunit duplex configuration is shown in Fig. 3. A working system is configured with a fault-free CCx-CSx-CSBx-PSx-PSBx-PUBx arrangement, where x is either Subunit 0 or Subunit 1. This means there are 26, or 64 possible combinations of system configurations. The MTTF is given by the following expression:

The factor r is at a maximum when the failure rate (li) for each subunit is the same. In this case

lCC = lCS = lCSB = lPS = lPSB = lPUB = lI or where s number of subunits, s = 6, and r = s At best, the MTTF is improved by a factor corresponding to the number of partitioned subunits. This improvement is not fully realized since equipment must be added to provide additional access and to select subunits. The partitioning of the subsystem into subunits as shown in Fig. 3 results in subunits of different sizes, Again, the failure rate for each individual subunit will not be the same; hence, the r-factor will be smaller than 6. Because of the relatively large number of components used in implementing

previous | contents | next