A Part-Time Parallel Processing Environment for Statistical Computation
Keywords: parallel processing, statistical computing, MPI, R
Abstract: A serial computer executes only one instruction at a time, whereas a parallel computer can perform multiple operations simultaneously. Parallel processing makes it possible to solve problems too large for a single machine and to solve many standard problems more quickly. Parallel computing is widely used in science and technology, but until recently it was available only to those with access to a supercomputer. Projects such as Beowulf at CESDIS have shown that ordinary personal computers can be networked to form a parallel computer whose capabilities rival those of a supercomputer. However, most statisticians do not have access to a dedicated computer cluster such as a Beowulf system. As an alternative, we demonstrate a model in which idle computers join the user's machine in an ad hoc cluster, enabling part-time parallel processing. We illustrate this model with the statistical package R and show that components of such a package can be written with parallelization in mind. From the user's point of view, a single call to parallel(on) or parallel(off) is all that is needed to enable or disable parallel computation. We address issues of granularity in parallel programming as they apply to functions in R, with the goal of making these capabilities available to both developers and practitioners.