|
Please use this identifier to cite or link to this item:
http://hdl.handle.net/2451/25882
|
| Title: | Get Another Label? Improving Data Quality and Data Mining Using
Multiple, Noisy Labelers |
| Authors: | Sheng, Victor; Provost, Foster; Ipeirotis, Panagiotis G. |
| Issue Date: | 6-Mar-2008 |
| Series/Report no.: | CeDER-08-01 |
| Abstract: | This paper addresses the repeated acquisition of labels for data items
when the labeling is imperfect. We examine the improvement (or lack
thereof) in data quality via repeated labeling, and focus especially on
the improvement of training labels for supervised induction. With the
outsourcing of small tasks becoming easier, for example via Rent-A-Coder
or Amazon's Mechanical Turk, it often is possible to obtain
less-than-expert labeling at low cost. With low-cost labeling, preparing
the unlabeled part of the data can become considerably more expensive
than labeling. We present repeated-labeling strategies of increasing
complexity, and show several main results. (i) Repeated-labeling can
improve label quality and model quality, but not always. (ii) When
labels are noisy, repeated labeling can be preferable to single labeling
even in the traditional setting where labels are not particularly cheap.
(iii) As soon as processing the unlabeled data is not free,
even the simple strategy of labeling everything multiple times can give
considerable advantage. (iv) Repeatedly labeling a carefully chosen set
of points is generally preferable, and we present a robust technique
that combines different notions of uncertainty to select data points for
which quality should be improved. The bottom line: the results show
clearly that when labeling is not perfect, selective acquisition of
multiple labels is a strategy that data miners should have in their
repertoire; for certain label-quality/cost regimes, the benefit is substantial. |
| URI: | http://hdl.handle.net/2451/25882 |
| Appears in Collections: | CeDER Working Papers; IOMS: Information Systems Working Papers
|
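Result (i) in the abstract, that repeated labeling can improve label quality "but not always", can be illustrated with a small calculation. The sketch below is not taken from the record itself. It assumes binary labels, independent labelers who are each correct with the same probability p, and majority voting over an odd number of labels. The function name majority_vote_quality is hypothetical.

```python
from math import comb


def majority_vote_quality(p: float, n_labels: int) -> float:
    """Probability that a majority vote over n_labels noisy labels is correct.

    Assumptions (illustrative, not from the source record):
    - binary labels,
    - labelers are independent,
    - each labeler is correct with the same probability p,
    - n_labels is odd, so there are no ties.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n_labels < 1 or n_labels % 2 == 0:
        raise ValueError("n_labels must be a positive odd integer")

    # The majority is correct when more than half of the labels are correct.
    needed = n_labels // 2 + 1
    return sum(
        comb(n_labels, k) * p**k * (1.0 - p) ** (n_labels - k)
        for k in range(needed, n_labels + 1)
    )


if __name__ == "__main__":
    # Compare a single label with majority votes over 3, 5, and 11 labels.
    for p in (0.4, 0.5, 0.7, 0.9):
        row = ", ".join(
            f"n={n}: {majority_vote_quality(p, n):.3f}" for n in (1, 3, 5, 11)
        )
        print(f"p={p}: {row}")
```

Under these assumptions, a labeler accuracy of 0.7 gives a majority-vote quality of 0.784 with three labels, an accuracy of exactly 0.5 stays at 0.5 however many labels are collected, and an accuracy below 0.5 makes the voted label worse than a single label. This is one simple way to read the "but not always" caveat.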
All items in Faculty Digital Archive are protected by copyright, with all rights reserved.
|