Tim Hesterberg, MathSoft, timh@statsci.com
James Schimert, MathSoft

A Model-Based Approach to Handling Missing Values in S-PLUS using EM, Iterative Simulation, and Multiple Imputation

Keywords: EM, missing data, multiple imputation tilting

Abstract: Missing data cripple most routines in statistical packages, which typically expect a complete rectangular data set. The common practice is to create a rectangular data set artificially, usually by (1) throwing away cases with missing values or (2) ad hoc imputation, that is, estimating and filling in the missing data. These and other ad hoc approaches lose information and may give biased results.

More principled approaches require statistical methodology and computational machinery that can be expensive to implement. Most practicing statisticians do not have time for this.

This talk discusses S-PLUS software that supports model-based missing data methods: EM methods (Little and Rubin 1987) and data augmentation (Schafer 1997). The latter produces proper multiple imputations, which may be used to obtain standard error estimates not readily produced by EM and to create completed data sets for use in algorithms that do not allow missing data.
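To make the EM idea concrete, the following is a minimal sketch in Python (not the S-PLUS software described in this talk) for a bivariate normal model in which x is fully observed and some y values are missing. The function name `em_bivariate_normal`, its arguments, and the use of `None` to mark missing values are illustrative assumptions. The E-step replaces the sufficient statistics involving a missing y by their conditional expectations given x; the M-step recomputes the maximum likelihood estimates from those expected statistics.

```python
def em_bivariate_normal(x, y, n_iter=200):
    """EM sketch for a bivariate normal; x complete, y uses None for missing.

    Hypothetical helper for illustration only. Returns ML-style estimates
    (divisor n): mu_x, mu_y, s_xx, s_xy, s_yy.
    """
    n = len(x)
    observed_y = [yi for yi in y if yi is not None]

    # x is complete, so its mean and variance need no iteration.
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n

    # Starting values from the available y values; covariance starts at 0.
    mu_y = sum(observed_y) / len(observed_y)
    s_yy = sum((yi - mu_y) ** 2 for yi in observed_y) / len(observed_y)
    s_xy = 0.0

    for _ in range(n_iter):
        # E-step: expected y, y^2 and x*y for each case, given x.
        beta = s_xy / s_xx                # slope of y on x
        resid_var = s_yy - beta * s_xy    # conditional variance of y given x
        sum_y = sum_yy = sum_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = mu_y + beta * (xi - mu_x)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = yi, yi * yi
            sum_y += ey
            sum_yy += eyy
            sum_xy += xi * ey

        # M-step: update parameters from the expected sufficient statistics.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y

    return mu_x, mu_y, s_xx, s_xy, s_yy


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, None, None]
mu_x, mu_y, s_xx, s_xy, s_yy = em_bivariate_normal(x, y)
```

For this monotone missingness pattern the EM estimates can be checked against a closed form: the fitted slope s_xy / s_xx equals the complete-case regression slope, and mu_y equals the complete-case mean of y adjusted by that slope times the difference between the overall and complete-case means of x. EM yields point estimates only, which is why multiple imputation is attractive for standard errors.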

This software may be applied to handle a wide variety of missing data problems. Calculations across multiple imputations are carried out in a largely user-transparent way using symbolic programming.
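As an illustration of what a calculation across multiple imputations involves, the sketch below applies the standard combining rules for multiple imputation (Rubin 1987) to a scalar estimate. It is written in Python and is not the S-PLUS interface described above; the function name `pool_rubin` and its arguments are assumptions made for the example. Each completed data set is analyzed separately, and the resulting estimates and squared standard errors are then pooled.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results for a scalar quantity (hypothetical helper).

    estimates: point estimate from each completed data set
    variances: squared standard error from each completed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)       # pooled point estimate
    within = mean(variances)      # average within-imputation variance
    between = variance(estimates) # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5


# Three imputations with made-up numbers, purely to show the arithmetic.
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the extra uncertainty due to the missing data; a single imputation has no such term and therefore understates the standard error.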