Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/27813
| Title: | Handling Missing Values when Applying Classification Models |
| Authors: | Saar-Tsechansky, Maytal; Provost, Foster |
| Keywords: | missing data; classification; classification trees; imputation |
| Issue Date: | Jul-2007 |
| Publisher: | Journal of Machine Learning Research |
| Citation: | Journal of Machine Learning Research 8(July):1625-1657 |
| Series/Report no.: | CeDER-PP-2007-06 |
| Abstract: | Much work has studied the effect of different treatments of missing
values on model induction, but little work has analyzed treatments for
the common case of missing values at prediction time. This paper first
compares several different methods—predictive value imputation,
the distribution-based imputation used by C4.5, and using reduced
models—for applying classification trees to instances with missing
values (and also shows evidence that the results generalize to bagged
trees and to logistic regression). The results show that for the two
most popular treatments, each is preferable under different conditions.
Strikingly, the reduced-models approach, seldom mentioned or used,
consistently outperforms the other two methods, sometimes by a large
margin. The lack of attention to reduced modeling may be due in part to
its (perceived) expense in terms of computation or storage. Therefore,
we then introduce and evaluate alternative, hybrid approaches that allow
users to balance between more accurate but computationally expensive
reduced modeling and the other, less accurate but less computationally
expensive treatments. The results show that the hybrid methods can scale
gracefully to the amount of investment in computation/storage, and that
they outperform imputation even for small investments. |
| URI: | http://hdl.handle.net/2451/27813 |
| Appears in Collections: | CeDER Published Papers |
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.