|
Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/27800
|
| Title: | Active Sampling for Class Probability Estimation and Ranking |
| Authors: | Provost, Foster; Saar-Tsechansky, Maytal |
| Keywords: | active learning; cost-sensitive learning; class probability estimation; ranking; supervised learning; decision trees; uncertainty sampling; selective sampling |
| Issue Date: | 2004 |
| Publisher: | Machine Learning |
| Citation: | Machine Learning, 54, 153–178, 2004 |
| Series/Report no.: | CeDER-PP-2004-05 |
| Abstract: | In many cost-sensitive environments class probability
estimates are used by decision makers to evaluate the expected utility
from a set of alternatives. Supervised learning can be used to build
class probability estimates; however, it often is very costly to obtain
training data with class labels. Active learning acquires data
incrementally, at each phase identifying especially useful additional
data for labeling, and can be used to economize on examples needed for
learning. We outline the critical features of an active learner and
present a sampling-based active learning method for estimating class
probabilities and class-based rankings. BOOTSTRAP-LV identifies
particularly informative new data for learning based on the variance in
probability estimates, and uses weighted sampling to account for a
potential example’s informative value for the rest of the input
space. We show empirically that the method reduces the number of data
items that must be obtained and labeled, across a wide variety of
domains. We investigate the contribution of the components of the
algorithm and show that each provides valuable information to help
identify informative examples. We also compare BOOTSTRAP-LV with
UNCERTAINTY SAMPLING, an existing active learning method designed to
maximize classification accuracy. The results show that BOOTSTRAP-LV
uses fewer examples to reach a given level of estimation accuracy and
provide insights into the behavior of the algorithms. Finally, we experiment with
another new active sampling algorithm drawing from both UNCERTAINTY
SAMPLING and BOOTSTRAP-LV and show that it is significantly more
competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The
analysis suggests more general implications for improving existing
active sampling algorithms for classification. |
| URI: | http://hdl.handle.net/2451/27800 |
| Appears in Collections: | CeDER Published Papers
|
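The selection idea summarized in the abstract (score each unlabeled example by the variance of its class probability estimates, then choose examples by weighted sampling) can be illustrated with a small sketch. This is not the authors' implementation. It assumes one-dimensional features, uses a k-nearest-neighbor frequency estimate as a stand-in base learner, and takes the raw variance across bootstrap models as the sampling weight. All function names and parameters are hypothetical.

```python
import random
from statistics import pvariance


def knn_probability(train, x, n_neighbors=3):
    """Estimate P(class = 1 | x) as the share of positives among the nearest labeled points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    return sum(label for _, label in nearest) / len(nearest)


def variance_scores(labeled, pool, n_models=10, n_neighbors=3, rng=None):
    """For each pool example, the variance of its probability estimate across bootstrap models."""
    rng = rng or random.Random(0)
    # Each "model" is the stand-in estimator applied to one bootstrap resample of the labeled set.
    resamples = [rng.choices(labeled, k=len(labeled)) for _ in range(n_models)]
    return [
        pvariance([knn_probability(sample, x, n_neighbors) for sample in resamples])
        for x in pool
    ]


def weighted_sample(pool, scores, batch_size, rng=None):
    """Draw distinct pool examples with probability proportional to their score."""
    rng = rng or random.Random(0)
    remaining = list(pool)
    # Small floor (an assumption of this sketch) keeps zero-variance examples drawable.
    weights = [s + 1e-9 for s in scores]
    chosen = []
    for _ in range(min(batch_size, len(remaining))):
        i = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(i))
        weights.pop(i)
    return chosen


# Toy data: (feature, label) pairs and an unlabeled pool of feature values.
labeled = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
pool = [0.05, 0.5, 0.95]

scores = variance_scores(labeled, pool)
batch = weighted_sample(pool, scores, batch_size=2)
print(scores, batch)
```

Sampling in proportion to the score, rather than always taking the top-scoring examples, is the design choice the abstract attributes to the method: it lets the selected batch reflect the rest of the input space instead of concentrating on a few high-variance points.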
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.
|