Description: |
Rapid technological advances in recent years have made powerful yet inexpensive commodity PCs a reality. New interconnect technologies that deliver very low latency and very high bandwidth are also becoming available. These developments have led to the trend toward cluster computing, which combines the computational power of commodity PCs with the communication performance of high-speed interconnects to provide cost-effective solutions for computationally intensive applications, especially grand challenge applications such as weather forecasting, air flow analysis, protein searching, and ocean simulation. InfiniBand was recently proposed as the next-generation interconnect for I/O and inter-process communication. Because it is an open standard and offers high performance, InfiniBand is becoming increasingly popular as an interconnect for building clusters. However, since it was not designed specifically for high performance computing, there is a semantic gap between its functionality and that required by high performance computing software such as the Message Passing Interface (MPI). In this dissertation, we take on this challenge and address research issues in designing efficient and scalable communication subsystems to bridge this gap. We focus on how to take advantage of the novel features offered by InfiniBand to design different components of the communication subsystem, including protocol design, flow control, buffer management, communication progress, connection management, collective communication, and multirail network support. Our research has already made notable contributions in the areas of cluster computing and InfiniBand. A large part of it has been integrated into our MVAPICH software, a high performance and scalable MPI implementation over InfiniBand. MVAPICH is currently used by more than 120 organizations worldwide to build InfiniBand clusters, including both research testbeds and production systems.
Some of the fastest supercomputers in the world, including the third-ranked Virginia Tech Apple G5 cluster, are currently powered by MVAPICH. The research in this dissertation will also have an impact on the design of communication subsystems for systems other than high performance computing and for other high-speed interconnects. |