
446 Part 2 | Regions of Computer Space Section 6 | Fault-Tolerant Systems

"process-pairs." One I/O process is designated as primary, the other as backup. All file modification messages are delivered to the primary I/O process. The primary sends a message with checkpoint information to the backup so that the backup can take over if the primary's processor or access path to the I/O device fails. Files can also be duplicated on physically distinct devices controlled by an I/O process-pair on physically distinct processors. In that case, all file modification messages are delivered to both I/O processes. Thus, in the case of physical failure or isolation of the primary, the backup file is up to date and available.
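The primary/backup arrangement can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class names `IOProcess` and `IOProcessPair`, the `alive` flag, and the choice to use the modification itself as the checkpoint message are all hypothetical and are not taken from the Tandem system.

```python
from dataclasses import dataclass, field


@dataclass
class IOProcess:
    """One member of a hypothetical I/O process-pair."""
    name: str
    file_state: dict = field(default_factory=dict)  # block number -> data
    alive: bool = True

    def apply(self, block, data):
        self.file_state[block] = data


class IOProcessPair:
    """Routes file modification messages to the primary and checkpoints the backup."""

    def __init__(self):
        self.primary = IOProcess("primary")
        self.backup = IOProcess("backup")

    def write(self, block, data):
        """Deliver one modification; return the name of the process that served it."""
        if not self.primary.alive:
            # Takeover: the backup assumes the primary role.
            self.primary, self.backup = self.backup, self.primary
        self.primary.apply(block, data)
        if self.backup.alive:
            # Checkpoint message (assumed here to be the modification itself).
            self.backup.apply(block, data)
        return self.primary.name


pair = IOProcessPair()
pair.write(0, "alpha")
pair.primary.alive = False          # simulate a processor or access-path failure
served_by = pair.write(1, "beta")   # the former backup now serves the request
```

Because the backup received a checkpoint for block 0 before the failure, it holds both blocks after the takeover and no modification is lost.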

User applications can also use the process-pair mechanism. Consider a NonStop application program A. Program A starts a backup process A1 in another processor. There are also duplicate file images, one designated primary and the other backup. Program A periodically (at user-specified points) sends checkpoint information to A1. A1 is the same program as A, but it knows that it is a backup program. A1 reads the checkpoint messages to update its data area, file status, and program counter. A1 loads and executes if the system reports that A's processor is down (i.e., if an error message is sent from A's operating system image or if A's processor fails to respond to a periodic "I'm alive" message). All file activity by A is performed on both the primary and backup file copies. When A1 starts to execute from the last checkpoint, it may attempt to repeat I/O operations that A had already completed successfully. The system file handler recognizes this situation and sends A1 a successful-completion message for each such operation. After taking over, A1 periodically asks the operating system whether a backup process exists. Since one no longer does, A1 can request the creation and initialization of a new copy of both the process and the file structure. More information on the operating system and the programming of NonStop applications can be found in Bartlett [1977].
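The checkpoint-and-takeover sequence for an application process-pair can be sketched as follows. The text does not say how the file handler recognizes a repeated operation; this sketch assumes a per-operation sequence number. The names `FileHandler`, `run`, `checkpoint_every`, and `fail_at` are hypothetical and serve only as illustration.

```python
class FileHandler:
    """Hypothetical file handler that remembers completed operations by sequence number."""

    def __init__(self):
        self.primary_copy = []
        self.backup_copy = []
        self.completed = set()

    def write(self, seq, record):
        if seq in self.completed:
            # Repeated request after a takeover: report success without redoing it.
            return "already completed"
        # All file activity is performed on both file copies.
        self.primary_copy.append(record)
        self.backup_copy.append(record)
        self.completed.add(seq)
        return "completed"


def run(records, handler, start, checkpoint_every, checkpoint, fail_at=None):
    """Process records from index `start`; return (finished, replies)."""
    replies = []
    for i in range(start, len(records)):
        if fail_at is not None and i == fail_at:
            return False, replies          # simulated processor failure
        replies.append(handler.write(i, records[i]))
        if (i + 1) % checkpoint_every == 0:
            checkpoint["next"] = i + 1     # checkpoint message sent to the backup
    return True, replies


records = ["r0", "r1", "r2", "r3", "r4"]
handler = FileHandler()
checkpoint = {"next": 0}                   # A1's copy of A's progress

# A checkpoints every 2 records and fails just before record 3.
ok_a, replies_a = run(records, handler, 0, 2, checkpoint, fail_at=3)

# A1 resumes from the last checkpoint (record 2), which A had already written.
ok_a1, replies_a1 = run(records, handler, checkpoint["next"], 2, checkpoint)
```

In this run A1 reissues the write for record 2, which A completed after its last checkpoint. The handler answers with a success message and leaves both file copies unchanged, so each record appears exactly once.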
 
 
