Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/14161
| Title: | Tree Induction vs. Logistic Regression: A Learning-Curve Analysis |
| Authors: | Perlich, Claudia; Provost, Foster; Simonoff, Jeffrey S. |
| Issue Date: | 11-Dec-2001 |
| Publisher: | Stern School of Business, New York University |
| Series/Report no.: | IS-01-02 |
| Abstract: | Tree induction and logistic regression are two standard, off-the-shelf
methods for building classification models. We present a large-scale
experimental comparison of logistic regression and tree induction,
assessing both classification accuracy and the quality of rankings based
on class-membership probabilities. We use a learning-curve analysis to
examine how these measures depend on the size of the training set. The
study yields several notable results. (1) Contrary to prior
observations, logistic regression does not generally outperform tree
induction. (2) More specifically, and not surprisingly, logistic
regression is better for smaller training sets and tree induction for
larger ones. Importantly, this reversal often occurs within a single
domain as the training set grows (i.e., the learning curves cross), so
conclusions about induction-algorithm superiority on a given domain must
be based on an analysis of the learning curves. (3) Contrary to
conventional wisdom, tree induction is effective at producing
probability-based rankings, although, for a given training-set size,
apparently less so than at making classifications. (4) Finally, the
domains on which each method is ultimately preferable can be
characterized surprisingly well by a simple measure of signal-to-noise
ratio. |
| URI: | http://hdl.handle.net/2451/14161 |
| Appears in Collections: | IOMS: Information Systems Working Papers |
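The abstract evaluates rankings built from class-membership probabilities. The record does not name the ranking metric. One common choice, assumed here, is the area under the ROC curve (AUC): the fraction of (positive, negative) example pairs in which the positive example receives the higher score. This is a minimal Python sketch under that assumption. The function name `auc` and the 0/1 label coding are illustrative and not taken from the paper. Tied scores count as half a win. This tie handling matters for trees, because every example that lands in the same leaf receives the same probability estimate.

```python
from itertools import product


def auc(labels, scores):
    """Estimate the area under the ROC curve (hypothetical helper).

    labels: sequence of 0/1 class labels (1 = positive class, an assumed coding)
    scores: estimated probabilities of membership in the positive class

    Returns the fraction of (positive, negative) pairs in which the positive
    example scores higher, counting tied scores as half a win. This equals the
    Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    # Compare every positive score against every negative score: O(n_pos * n_neg).
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # ties, e.g. two examples from the same tree leaf
    return wins / (len(pos) * len(neg))
```

For example, `auc([1, 1, 0, 0], [0.75, 0.5, 0.5, 0.25])` returns 0.875, because one positive/negative pair is tied at 0.5. The pairwise loop is chosen for clarity. For large samples, a sort-based rank computation gives the same value in O(n log n).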
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.