XCPU: a new, 9p-based, process management system for clusters and grids

Author: Ron Minnich, Andrey Mirtchovski
Year of publication: 2006
Subject:
Source: CLUSTER
DOI: 10.1109/clustr.2006.311843
Description: Xcpu is a new process management system that is equally at home on clusters and grids. Xcpu provides a process execution service visible to client nodes as a 9p server. It can be presented to users as a file system if that functionality is desired. The xcpu service builds on our earlier work with the bproc system. Xcpu differs from traditional remote execution services in several key ways, one of the most important being its use of a push rather than a pull model: the binaries are pushed to the nodes by the job starter, rather than pulled from a remote file system such as NFS. Bproc used a proprietary protocol, a process migration model, and a set of kernel modifications to achieve its goals. In contrast, xcpu uses a well-understood protocol, namely 9p; uses a non-migration model for moving the process to the remote node; and runs on entirely standard kernels, initially on Plan 9 and Linux, with MacOS and others in development. In this paper, we describe our clustering model, how bproc implements it, and how xcpu implements a similar, but not identical, model. We describe in some detail the structure of the various xcpu components. Finally, we close with a discussion of xcpu performance, as measured on several clusters at LANL, including the 1024-node Pink cluster and the 256-node Blue Steel InfiniBand cluster.
Database: OpenAIRE