Faculty Digital Archive

Please use this identifier to cite or link to this item: http://hdl.handle.net/2451/26305

Title: An Investigation of Missing Data Methods for Classification Trees
Authors: Ding, Yufeng; Simonoff, Jeffrey S.
Keywords: C4.5; CART; Classification tree; Separate class
Issue Date: 3-Dec-2006
Publisher: Stern School of Business, New York University
Series/Report no.: SOR-2006-3
Abstract: There are many different missing data methods used by classification tree algorithms, but few studies have been done comparing their appropriateness and performance. This paper provides both analytic and Monte Carlo evidence regarding the effectiveness of six popular missing data methods for classification trees. We show that in the context of classification trees, the relationship between the missingness and the dependent variable, rather than the standard missingness classification approach of Little and Rubin (2002) (missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR)), is the most helpful criterion to distinguish different missing data methods. We make recommendations as to the best method to use in various situations. The paper concludes with discussion of a real data set related to predicting bankruptcy of a firm.
URI: http://hdl.handle.net/2451/26305
Appears in Collections: IOMS: Statistics Working Papers

Files in This Item:

File        Size        Format
06-03.pdf   371.85 kB   Adobe PDF

All items in Faculty Digital Archive are protected by copyright, with all rights reserved.

The contents of this archive are either in the public domain or subject to copyright. Please consult NYU's "Handbook for Use of Copyrighted Materials" (http://library.nyu.edu/copyright/copyright.html) for information on using material within the Faculty Digital Archive.