Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/27728
| Title: | An Investigation of Missing Data Methods for Classification Trees |
| Authors: | Ding, Yufeng; Simonoff, Jeffrey S. |
| Keywords: | C4.5; Classification tree; Separate class; CART; rpart |
| Issue Date: | 13-Oct-2008 |
| Series/Report no.: | SOR-2008-1 |
| Abstract: | There are many different missing data methods used by classification
tree algorithms, but few studies have been done comparing their
appropriateness and performance. This paper provides both analytic and
Monte Carlo evidence regarding the effectiveness of six popular missing
data methods for classification trees. We show that in the context of
classification trees, the relationship between the missingness and the
dependent variable, rather than the standard missingness classification
approach of Little and Rubin (2002) (missing completely at random
(MCAR), missing at random (MAR) and not missing at random (NMAR)), is
the most helpful criterion to distinguish different missing data
methods. We make recommendations as to the best method to use in various
situations. The paper concludes with a discussion of a real data set
related to predicting bankruptcy of a firm. |
| URI: | http://hdl.handle.net/2451/27728 |
| Appears in Collections: | IOMS: Statistics Working Papers |