|
Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/14165
|
| Title: | Active Sampling for Class Probability Estimation and Ranking |
| Authors: | Saar-Tsechansky, Maytal; Provost, Foster |
| Keywords: | active learning; class probability estimation; cost-sensitive learning |
| Issue Date: | 2001 |
| Publisher: | Stern School of Business, New York University |
| Series/Report no.: | IS-01-03 |
| Abstract: | In many cost-sensitive environments class probability estimates are used
by decision makers to evaluate the expected utility from a set of
alternatives. Supervised learning can be used to build class probability
estimates; however, it often is very costly to obtain training data with
class labels. Active sampling acquires data incrementally, at each phase
identifying especially useful additional data for labeling, and can be
used to economize on examples needed for learning. We outline the
critical features for an active sampling approach and present an active
sampling method for estimating class probabilities and ranking.
BOOTSTRAP-LV identifies particularly informative new data for learning
based on the variance in probability estimates, and by accounting for a
particular data item's informative value for the rest of the input
space. We show empirically that the method reduces the number of data
items that must be obtained and labeled, across a wide variety of
domains. We investigate the contribution of the components of the
algorithm and show that each provides valuable information to help
identify informative examples. We also compare BOOTSTRAP-LV with
UNCERTAINTY SAMPLING, an existing active sampling method designed to
maximize classification accuracy. The results show that BOOTSTRAP-LV
uses fewer examples to exhibit a certain class probability estimation
accuracy and provide insights on the behavior of the algorithms.
Finally, to further our understanding of the contributions made by the
elements of BOOTSTRAP-LV, we experiment with a new active sampling
algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and
show that it is significantly more competitive with BOOTSTRAP-LV
than UNCERTAINTY SAMPLING is. The analysis suggests more general
implications for improving existing active sampling algorithms for classification. |
| URI: | http://hdl.handle.net/2451/14165 |
| Appears in Collections: | IOMS: Information Systems Working Papers
|
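The abstract describes selecting examples for labeling based on the variance in class probability estimates. The following is a minimal Python sketch of that general idea only, not the authors' BOOTSTRAP-LV implementation. Everything concrete in it is an assumption made for illustration: the function names (`fit_bin_estimator`, `estimate_variances`, `select_for_labeling`), the one-dimensional binned frequency estimator, the number of bootstrap models, and the choice to sample pool items with probability proportional to variance. It does not attempt to model how an item's informative value for the rest of the input space is accounted for.

```python
import random
from statistics import pvariance


def fit_bin_estimator(sample, n_bins):
    """Laplace-smoothed estimate of P(class = 1) per bin of a 1-D feature in [0, 1)."""
    pos = [0] * n_bins
    tot = [0] * n_bins
    for x, y in sample:
        b = min(int(x * n_bins), n_bins - 1)
        pos[b] += y
        tot[b] += 1
    return [(pos[b] + 1) / (tot[b] + 2) for b in range(n_bins)]


def estimate_variances(labeled, pool, n_models=10, n_bins=5, rng=random):
    """Variance of each pool item's probability estimate across bootstrap models."""
    models = []
    for _ in range(n_models):
        # Bootstrap: resample the labeled set with replacement (assumed procedure).
        resample = [rng.choice(labeled) for _ in labeled]
        models.append(fit_bin_estimator(resample, n_bins))
    variances = []
    for x in pool:
        b = min(int(x * n_bins), n_bins - 1)
        variances.append(pvariance([m[b] for m in models]))
    return variances


def select_for_labeling(labeled, pool, batch_size, rng=random):
    """Draw pool indices without replacement, weighted by estimate variance."""
    weights = estimate_variances(labeled, pool, rng=rng)
    candidates = list(range(len(pool)))
    chosen = []
    for _ in range(min(batch_size, len(pool))):
        # A small floor keeps every weight positive so rng.choices never fails.
        w = [weights[i] + 1e-12 for i in candidates]
        pick = rng.choices(candidates, weights=w, k=1)[0]
        candidates.remove(pick)
        chosen.append(pick)
    return chosen


# Hypothetical usage on synthetic data: 30 labeled items, 100 unlabeled pool items.
rng = random.Random(0)
labeled = [(rng.random(), rng.randint(0, 1)) for _ in range(30)]
pool = [rng.random() for _ in range(100)]
batch = select_for_labeling(labeled, pool, batch_size=5, rng=rng)
print(batch)
```

In an active sampling loop, the selected pool items would be labeled, moved into `labeled`, and the selection repeated. Sampling in proportion to variance, rather than always taking the top-variance items, is one plausible design choice and is assumed here rather than taken from the record.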
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.
|