Active Sampling for Class Probability Estimation and Ranking

Provost, Foster; Saar-Tsechansky, Maytal

Full metadata record

DC Field	Value	Language
dc.contributor.author	Provost, Foster	-
dc.contributor.author	Saar-Tsechansky, Maytal	-
dc.date.accessioned	2008-12-02T17:58:40Z	-
dc.date.available	2008-12-02T17:58:40Z	-
dc.date.issued	2004	-
dc.identifier.citation	Machine Learning, 54, 153–178, 2004	en
dc.identifier.uri	http://hdl.handle.net/2451/27800	-
dc.description.abstract	Abstract. In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example’s informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to reach a given estimation accuracy, and they provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.	en
dc.description.sponsorship	NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research	en
dc.format.extent	207955 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	en
dc.publisher	Machine Learning	en
dc.relation.ispartofseries	CeDER-PP-2004-05	en
dc.subject	active learning	en
dc.subject	cost-sensitive learning	en
dc.subject	class probability	en
dc.subject	estimation	en
dc.subject	ranking	en
dc.subject	supervised learning	en
dc.subject	decision trees	en
dc.subject	uncertainty sampling	en
dc.subject	selective sampling	en
dc.title	Active Sampling for Class Probability Estimation and Ranking	en
dc.type	Article	en
Appears in Collections:	CeDER Published Papers
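
The abstract describes selecting examples by the variance of their probability estimates and then drawing them by weighted sampling. The following is a minimal sketch of that general idea only, not the authors' BOOTSTRAP-LV implementation. The one-dimensional data, the nearest-neighbour estimator `knn_probability`, and the names `estimate_variances` and `select_for_labeling` are all hypothetical and chosen for illustration; the abstract does not specify them.

```python
import random
import statistics


def knn_probability(labeled, x, k=3):
    """Hypothetical stand-in estimator (an assumption, not from the paper):
    fraction of positive labels among the k labeled points nearest to x."""
    nearest = sorted(labeled, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def estimate_variances(labeled, candidates, n_bootstrap=20, rng=None):
    """Variance of the class probability estimate for each unlabeled
    candidate, measured across models built from bootstrap resamples."""
    rng = rng or random.Random(0)
    # Draw each bootstrap resample once and reuse it for every candidate.
    resamples = [
        [rng.choice(labeled) for _ in labeled] for _ in range(n_bootstrap)
    ]
    return [
        statistics.pvariance([knn_probability(r, x) for r in resamples])
        for x in candidates
    ]


def select_for_labeling(candidates, variances, batch_size, rng=None):
    """Draw a batch without replacement, with each candidate's chance of
    being picked proportional to its estimated variance."""
    rng = rng or random.Random(0)
    # A small floor keeps the weights from summing to zero.
    weights = [v + 1e-9 for v in variances]
    pool = list(range(len(candidates)))
    chosen = []
    for _ in range(min(batch_size, len(pool))):
        position = rng.choices(
            range(len(pool)), weights=[weights[i] for i in pool], k=1
        )[0]
        chosen.append(candidates[pool.pop(position)])
    return chosen


# Toy usage: (feature, label) pairs plus unlabeled feature values.
labeled = [(0.1, 0), (0.2, 0), (0.3, 0), (0.45, 0),
           (0.55, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
candidates = [0.05, 0.5, 0.95, 0.4]
variances = estimate_variances(labeled, candidates)
batch = select_for_labeling(candidates, variances, batch_size=2)
```

Sampling in proportion to variance, rather than always taking the highest-variance candidates, is one way to avoid concentrating every label request in a single region of the input space; the paper itself should be consulted for the weighting the authors actually use.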

Files in This Item:

File	Description	Size	Format
CPP-05-04.pdf		203.08 kB	Adobe PDF
