Fault-Tolerant Routing and Deadlock Recovery for Direct Multicomputer Networks

Autor:	Shih-Chang Wang, 汪世昌
Rok vydání:	2000
Druh dokumentu:	學位論文 ; thesis
Popis:	88 In direct multicomputer networks, tasks are executed by a set of intercommunicating nodes or processors. The communication is usually carried out by means of passing messages from one node to another over the interconnection network. Since direct networks utilize the locality of the message references more efficiently, most of the existing systems use direct networks such as k-ary n-cube or n-dimensional mesh. The performance of the interprocessor communication scheme depends largely on the network dimension, the switching technique, and the routing algorithm. Most contemporary systems have used two or three dimensions. Store and forward, virtual cut-though, and wormhole routing are the main switching techniques used for inter-processor communication. Due to the lower latency and smaller buffer requirements, wormhole routing is preferred and is widely used in recent multicomputers. Multicast is the delivery of the same message from one source node to an arbitrary number of destination nodes. Recently, wormhole routing has become one of the most popular switching techniques in new generation multicomputers. When designing communication algorithms for wormhole routed network topologies, the necessity to be deadlock-free due to the exclusive use of channels must be considered in the first place. Unfortunately, fault-tolerance properties of the routing networks would hurt under such restrictions. Previous researches have focused on fault-tolerant one-to-one routing algorithms for n-dimensional meshes. However, little research has been done on fault-tolerant one-to-many(multicast) routing algorithms due to the difficulty to achieving deadlock-free routing on faulty networks. We will develop such an algorithm for faulty hypercubes. Our approach is not based on adding physical or virtual channels to the network topology. Instead, we integrate several techniques such as partitioning of nodes, partitioning of channels, node label assignments, and path-like multicast to achieve fault tolerance. Both theoretical analysis and simulation are performed to demonstrate the effectiveness of the proposed algorithm. Routing is the delivery of a message from one source node to another destination node. Previous researches have focused on designing deadlock-free routing algorithms for an n-cube based on the Hamilton-path labeling. However, the labeling method widely employed in the literature is called the Gray-Code labeling. The Gray-Code labeling is a fixed approach and provides only one way to achieve routing. In contrast to this traditional way, a distributed Hamilton-path labeling algorithm is proposed in this thesis to provide up to ways of labeling. In addition, the proposed Hamilton-path labeling process can be completed for an n-cube in O(n) parallel steps. Finally, a heuristic algorithm to obtain an efficient way of labeling for our approach according to different system traffic distributions is also given. Therefore, the system performance can be improved significantly with such heuristic approach. In order to avoid deadlocks, prevention-based routing algorithms impose certain routing restrictions which lead to high hardware complexity or low adaptability. If deadlock occurrences are extremely rare, recovery-based routing algorithms become more attractive due to lower hardware complexity and better routing adaptivity. A simple architecture that each router is provided with an additional special flit buffer was developed to achieve deadlock recovery in the literature. Disha_SEQ and Disha_CON are two deadlock recovery schemes based on such architecture to accomplish sequential recovery and concurrent recovery, respectively. In this thesis, we propose a simple recovery scheme for a 2-dimensioal mesh with the same router architecture and eliminates the drawbacks in Disha_SEQ or Disha_CON such as hardwired token, finding the Hamilton cycle, Hamilton-path labeling for each node, and non-minimal path routing. Moreover, the simulation results show that the proposed scheme also outperforms Disha_SEQ and Disha_CON under various load distributions.
Databáze:	Networked Digital Library of Theses & Dissertations
Externí odkaz:	http://ndltd.ncl.edu.tw/handle/85615399275271762958 Zobrazit plný text záznamu