Faculty Digital Archive

Please use this identifier to cite or link to this item: http://hdl.handle.net/2451/27770

Title: Tree Induction vs. Logistic Regression: A Learning-Curve Analysis
Authors: Perlich, Claudia; Provost, Foster; Simonoff, Jeffrey
Keywords: decision trees; learning curves; logistic regression; ROC analysis; tree induction
Issue Date: 1-Jun-2003
Publisher: Journal of Machine Learning Research
Citation: 4 (2003) pp. 211-255
Series/Report no.: CeDER-PP-2003-05
Abstract: Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several things. (1) Contrary to some prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (that is, the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of the separability of signal from noise.
URI: http://hdl.handle.net/2451/27770
Appears in Collections: CeDER Published Papers

Files in This Item:

File: CPP-05-03.pdf
Size: 302.66 kB
Format: Adobe PDF

Items in Faculty Digital Archive are protected by copyright, with all rights reserved, unless otherwise indicated.

The contents of the FDA may be subject to copyright, be offered under a Creative Commons license, or be in the public domain.
Please check items for rights statements. For information about NYU’s copyright policy, see http://www.nyu.edu/footer/copyright-and-fair-use.html 