Faculty Digital Archive


Please use this identifier to cite or link to this item: http://hdl.handle.net/2451/14165

Title: Active Sampling for Class Probability Estimation and Ranking
Authors: Saar-Tsechansky, Maytal
Provost, Foster
Keywords: active learning
class probability estimation
cost-sensitive learning
Issue Date: 2001
Publisher: Stern School of Business, New York University
Series/Report no.: IS-01-03
Abstract: In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active sampling acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features for an active sampling approach and present an active sampling method for estimating class probabilities and ranking. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and by accounting for a particular data item's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active sampling method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to reach a given class probability estimation accuracy, and they provide insights into the behavior of the algorithms. Finally, to further our understanding of the contributions made by the elements of BOOTSTRAP-LV, we experiment with a new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
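The variance-based selection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's BOOTSTRAP-LV procedure: the one-dimensional nearest-neighbor estimator, the function names (`knn_probability`, `bootstrap_variances`, `sample_batch`), and the choice to draw examples with probability proportional to variance are all assumptions made here for illustration only.

```python
import random
from statistics import pvariance

def knn_probability(train, x, k=5):
    """Estimate P(y=1 | x) as the positive fraction among the k nearest labeled points.
    (Hypothetical stand-in estimator; the abstract does not specify the learner.)"""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)

def bootstrap_variances(labeled, pool, n_models=10, rng=None):
    """Variance of each pool item's probability estimate across bootstrap resamples."""
    rng = rng or random.Random(0)
    resamples = [rng.choices(labeled, k=len(labeled)) for _ in range(n_models)]
    return [pvariance([knn_probability(s, x) for s in resamples]) for x in pool]

def sample_batch(pool, variances, batch_size, rng=None):
    """Draw distinct pool indices with probability proportional to variance.
    (Weighted sampling is an assumption of this sketch.)"""
    rng = rng or random.Random(0)
    weights = [v + 1e-9 for v in variances]  # small floor keeps every item reachable
    candidates = list(range(len(pool)))
    chosen = []
    while candidates and len(chosen) < batch_size:
        position = rng.choices(
            range(len(candidates)),
            weights=[weights[c] for c in candidates],
        )[0]
        chosen.append(candidates.pop(position))
    return chosen

# Toy data: (feature, label) pairs and an unlabeled pool of feature values.
labeled = [(0.0, 0), (0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
pool = [0.05, 0.5, 0.95]
variances = bootstrap_variances(labeled, pool)
batch = sample_batch(pool, variances, batch_size=2)
```

Sampling in proportion to variance, rather than always taking the highest-variance items, is one simple way to keep the selected batch spread across the input space; whether this matches the paper's weighting is not stated in the abstract.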
URI: http://hdl.handle.net/2451/14165
Appears in Collections: IOMS: Information Systems Working Papers

Files in This Item:

File: IS-01-03.pdf
Size: 5.29 MB
Format: Adobe PDF

Items in Faculty Digital Archive are protected by copyright, with all rights reserved, unless otherwise indicated.

 

The contents of the FDA may be subject to copyright, be offered under a Creative Commons license, or be in the public domain.
Please check items for rights statements. For information about NYU’s copyright policy, see http://www.nyu.edu/footer/copyright-and-fair-use.html 