David Patterson, University of Montana, davep@selway.umt.edu
Brian Steele, University of Montana

Ideal Bootstrap Estimation of Expected Error Rate for k-Nearest Neighbor Classifiers

Keywords: classification, bootstrap

Abstract: Efron and Tibshirani (1997) proposed the leave-one-out bootstrap and its variants for estimating the expected error rate of classifiers. The leave-one-out bootstrap estimates the error rate by using each bootstrap sample to classify only those observations that are not in that bootstrap sample. In this paper, we present analytic formulae for the ideal leave-one-out bootstrap estimate of expected error rate for k-nearest neighbor (k-NN) classifiers. We also propose a new weighted k-NN classifier based on these calculations. The resampling-weighted k-NN classifier replaces the conventional k-NN posterior probabilities with their expectations under resampling and assigns an object to the group with the largest resampling-expected posterior probability. The weights are obtained from the same calculations that yield the ideal leave-one-out bootstrap estimate of error rate. A simulation study shows that the resampling-weighted classifier compares favorably with unweighted and distance-weighted k-NN classifiers.
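To make the leave-one-out bootstrap concrete, the following is a minimal Monte Carlo sketch for an unweighted k-NN classifier. It is not the analytic (ideal) formula from this paper, which is not reproduced here. Instead it approximates the ideal estimate by drawing a finite number B of bootstrap samples, and the ideal estimate corresponds to the limit B → ∞. The function names `knn_predict` and `loo_bootstrap_error`, the Euclidean distance, the convention that duplicated bootstrap points count once per copy among the k neighbors, and the lowest-label tie-breaking rule are all illustrative assumptions.

```python
import numpy as np


def knn_predict(train_X, train_y, x, k, n_classes):
    """Unweighted k-NN majority vote (hypothetical helper).

    Assumptions: Euclidean distance; a point that appears several times in a
    bootstrap sample counts once per copy; vote ties go to the lowest label.
    """
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d, kind="stable")[:k]
    votes = np.bincount(train_y[nearest], minlength=n_classes)
    return int(np.argmax(votes))


def loo_bootstrap_error(X, y, k=3, B=200, seed=0):
    """Monte Carlo leave-one-out bootstrap error estimate (hypothetical helper).

    Each bootstrap sample classifies only the observations it does not
    contain; each observation's error rate is averaged over those samples,
    and the per-observation rates are then averaged.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_classes = int(y.max()) + 1
    err_sum = np.zeros(n)  # misclassifications of observation i when left out
    count = np.zeros(n)    # number of bootstrap samples that left out i
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # draw n indices with replacement
        left_out = np.setdiff1d(np.arange(n), idx)  # observations not in this sample
        for i in left_out:
            pred = knn_predict(X[idx], y[idx], X[i], k, n_classes)
            err_sum[i] += float(pred != y[i])
            count[i] += 1
    # Observations never left out (rare for moderate B) are skipped.
    used = count > 0
    return float(np.mean(err_sum[used] / count[used]))
```

On average an observation is excluded from about 36.8% of bootstrap samples (the limit of (1 - 1/n)^n is e^{-1}), so B should be large enough that every observation is left out several times. The Monte Carlo error shrinks as B grows, which is what an analytic ideal estimate avoids entirely.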