Design of fault tolerant multiprocessor systems
Abstract
This thesis aims to design a fault-tolerant multiprocessor system suitable for space or military applications. The INMOS T800 transputer is adopted as the building block for multiprocessing because of its link-based architecture. The transputer is inherently more fault-tolerant than any standard bus-based architecture.
The multiprocessor network is designed around the "Triple Modular Redundancy" philosophy, with an autonomous decentralized loop configuration for networking. The fault-tolerant properties of the network are developed progressively through the thesis. The concurrent programming language OCCAM-2 is used for real-time programming. The programs and examples presented in this thesis were implemented in OCCAM using the Transputer Development System (TDS).
The sources of different kinds of faults are discussed, and various hardware and software schemes for achieving fault tolerance, and hence overall system reliability, are outlined. Fault tolerance is generally achieved through redundancy of hardware and/or software (spatial redundancy), of time (temporal redundancy), of information (value redundancy), or a combination thereof.
Spatial redundancy means that the individual hardware and/or software elements that can be faulty are replicated. There are three types of spatial redundancy: static, dynamic, and hybrid.
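Before the spatial schemes are described in more detail, the two non-spatial forms named above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the OCCAM-2 used in the thesis, and every name in it (`parity_bit`, `encode`, `check`, `run_twice`) is a hypothetical helper introduced for illustration, not something taken from the thesis. Information redundancy is shown as a single even-parity bit, and temporal redundancy as re-executing a task and comparing the two results.

```python
def parity_bit(word):
    """Even-parity bit of a non-negative integer: 1 if it has an odd number of 1 bits."""
    return bin(word).count("1") % 2


def encode(word):
    """Information redundancy: store one extra bit alongside the data word."""
    return word, parity_bit(word)


def check(word, parity):
    """Return True if the stored parity still matches the word.

    A single flipped bit is detected; an even number of flipped bits is not.
    """
    return parity_bit(word) == parity


def run_twice(task, *args):
    """Temporal redundancy: execute the same task twice and compare the results.

    A mismatch suggests a transient fault during one of the two runs.
    This only makes sense for deterministic tasks (an assumption of the sketch).
    """
    first = task(*args)
    second = task(*args)
    if first != second:
        raise RuntimeError("results disagree: transient fault suspected")
    return first
```

Both helpers only detect a fault; what the system does next (retry, switch to a spare, or signal an error) is a separate design decision.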
Static redundancy works by "masking" faults: the same task is executed on each of the replicated elements, so that agreement among the fault-free replicas overrides a faulty output. Dynamic redundancy involves two phases. In the first phase, the faulty elements of the system are identified. In the second phase, the system recovers by switching the operation of the faulty units over to redundant backup elements, provided that such elements are available and fault-free. The success of this method relies on choosing a suitable number of spares, an effective technique for detecting faults, and a viable switching operation.
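The two spatial schemes described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the OCCAM-2 implementation from the thesis: replicas are modelled as plain functions, outputs are assumed to be hashable values, and fault detection in the dynamic case is delegated to a caller-supplied acceptance test. The names `majority_vote`, `tmr`, and `run_with_spares` are hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas.

    Raises RuntimeError when no value has more than half of the votes.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


def tmr(replicas, x):
    """Static redundancy: run the same task on every replica and vote.

    With three replicas, one faulty output is masked by the other two.
    """
    return majority_vote([replica(x) for replica in replicas])


def run_with_spares(units, x, is_valid):
    """Dynamic redundancy: try the active unit, then each spare in order.

    Phase 1 (detection): is_valid(x, result) is an assumed acceptance test.
    Phase 2 (recovery): on failure, operation switches to the next unit.
    Returns (index of the unit that succeeded, its result).
    """
    for index, unit in enumerate(units):
        result = unit(x)
        if is_valid(x, result):
            return index, result
    raise RuntimeError("all units, including spares, failed")
```

For example, `tmr([good, bad, good], 4)` returns the output shared by the two `good` replicas regardless of what `bad` produces, whereas `run_with_spares` succeeds only if the acceptance test catches the fault and a working spare remains, which reflects the three conditions listed in the text.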