Full metadata record
DC Field | Value | Language
--- | --- | ---
dc.contributor.author | Saar-Tsechansky, Maytal | -
dc.contributor.author | Provost, Foster | -
dc.date.accessioned | 2008-12-03T17:56:26Z | -
dc.date.available | 2008-12-03T17:56:26Z | -
dc.date.issued | 2007-07 | -
dc.identifier.citation | Journal of Machine Learning Research 8(July):1625-1657 | en
dc.identifier.uri | http://hdl.handle.net/2451/27813 | -
dc.description.abstract | Much work has studied the effect of different treatments of missing values on model induction, but little work has analyzed treatments for the common case of missing values at prediction time. This paper first compares several different methods for applying classification trees to instances with missing values: predictive value imputation, the distribution-based imputation used by C4.5, and using reduced models. It also shows evidence that the results generalize to bagged trees and to logistic regression. The results show that for the two most popular treatments, each is preferable under different conditions. Strikingly, the reduced-models approach, seldom mentioned or used, consistently outperforms the other two methods, sometimes by a large margin. The lack of attention to reduced modeling may be due in part to its (perceived) expense in terms of computation or storage. Therefore, we then introduce and evaluate alternative, hybrid approaches that allow users to balance between more accurate but computationally expensive reduced modeling and the other, less accurate but less computationally expensive treatments. The results show that the hybrid methods can scale gracefully to the amount of investment in computation/storage, and that they outperform imputation even for small investments. | en
dc.description.sponsorship | NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research | en
dc.format.extent | 389647 bytes | -
dc.format.mimetype | application/pdf | -
dc.language.iso | en_US | en
dc.publisher | Journal of Machine Learning Research | en
dc.relation.ispartofseries | CeDER-PP-2007-06 | en
dc.subject | missing data | en
dc.subject | classification | en
dc.subject | classification trees | en
dc.subject | imputation | en
dc.title | Handling Missing Values when Applying Classification Models | en
dc.type | Article | en
Appears in Collections: CeDER Published Papers

Files in This Item:
File | Description | Size | Format
--- | --- | --- | ---
CPP-06-07.pdf | | 380.51 kB | Adobe PDF


Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.