Using k-Nearest Neighbor Classification in the Leaves of a Tree
Keywords: nearest neighbor; classification trees; composite classifiers
Abstract: In a tree classifier, the feature space is divided into regions ("leaves"). A new observation is then classified by finding the leaf into which it would have fallen; the class which makes up a plurality in that leaf becomes the estimate for that observation. When one class is much more common than the others it can happen that that class makes up a plurality in every leaf. In this case the classification tree performs no better than the naive classifier. In this paper, weconstruct a composite classifier by combining classification trees and k-nearest-neighbor (k-NN). In our scheme we divide the feature space up by a classification tree, and then classify test set items using the k-NN rule just among those training items in the same leaf as the test item. This avoids some of the problems associated with k-NN and reduces somewhat the associated computational load. More importantly it produces a classification rule that performs better than either trees or the usual k-NN in a number of well-known data sets.