An Investigation of Missing Data Methods for Classification Trees

Ding, Yufeng; Simonoff, Jeffrey S.

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ding, Yufeng	-
dc.contributor.author	Simonoff, Jeffrey S.	-
dc.date.accessioned	2008-05-25T13:58:55Z	-
dc.date.available	2008-05-25T13:58:55Z	-
dc.date.issued	2006-12-03	-
dc.identifier.uri	http://hdl.handle.net/2451/26305	-
dc.description.abstract	There are many different missing data methods used by classification tree algorithms, but few studies have been done comparing their appropriateness and performance. This paper provides both analytic and Monte Carlo evidence regarding the effectiveness of six popular missing data methods for classification trees. We show that in the context of classification trees, the relationship between the missingness and the dependent variable, rather than the standard missingness classification approach of Little and Rubin (2002) (missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR)), is the most helpful criterion to distinguish different missing data methods. We make recommendations as to the best method to use in various situations. The paper concludes with discussion of a real data set related to predicting bankruptcy of a firm.	en
dc.language	English	EN
dc.language.iso	en_US	en
dc.publisher	Stern School of Business, New York University	en
dc.relation.ispartofseries	SOR-2006-3	en
dc.subject	C4.5	en
dc.subject	CART	en
dc.subject	Classification tree	en
dc.subject	Separate class	en
dc.title	An Investigation of Missing Data Methods for Classification Trees	en
dc.type	Working Paper	en
dc.description.series	Statistics Working Papers Series	EN
Appears in Collections:	IOMS: Statistics Working Papers

Files in This Item:

File	Description	Size	Format
06-03.pdf		371.85 kB	Adobe PDF
