Full metadata record
dc.contributor.author: Ebbers, Janek
dc.contributor.author: Häb-Umbach, Reinhold
dc.date.accessioned: 2019-10-24T01:50:14Z
dc.date.available: 2019-10-24T01:50:14Z
dc.date.issued: 2019-10
dc.identifier.citation: J. Ebbers & R. Häb-Umbach, "Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 64–68, New York University, NY, USA, Oct. 2019
dc.identifier.uri: http://hdl.handle.net/2451/60729
dc.description.abstract: In this paper we present our audio tagging system for the DCASE 2019 Challenge Task 2. We propose a model consisting of a convolutional front end using log-mel energies as input features, a recurrent neural network sequence encoder, and a fully connected classifier network outputting an activity probability for each of the 80 considered event classes. Because the recurrent neural network encodes a whole sequence into a single vector, our model is able to process sequences of varying lengths. The model is trained with only a small amount of manually labeled training data and a larger amount of automatically labeled web data, which therefore suffers from label noise. To train the model efficiently with the provided data, we use various data augmentation techniques to prevent overfitting and improve generalization. Our best submitted system achieves a label-weighted label-ranking average precision (lwlrap) of 75.5% on the private test set, an absolute improvement of 21.7% over the baseline. This system achieved second place in the team ranking of the DCASE 2019 Challenge Task 2 and fifth place in the Kaggle competition "Freesound Audio Tagging 2019", which had more than 400 participants. After the challenge ended, we further improved performance to 76.5% lwlrap, setting a new state of the art on this dataset.
dc.rights: Copyright The Authors, 2019
dc.title: Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision
dc.type: Article
dc.identifier.DOI: https://doi.org/10.33682/57xx-t679
dc.description.firstPage: 64
dc.description.lastPage: 68
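
Note on the evaluation metric: the abstract above reports results as label-weighted label-ranking average precision (lwlrap), the official metric of DCASE 2019 Task 2 and the "Freesound Audio Tagging 2019" Kaggle competition. The following is a minimal NumPy sketch of how lwlrap can be computed from a binary ground-truth matrix and a score matrix; the function name, the (clips × classes) array layout, and the toy example are illustrative choices, not code from the paper.

import numpy as np

def lwlrap(truth, scores):
    """Label-weighted label-ranking average precision (lwlrap).

    truth:  (n_clips, n_classes) binary matrix of ground-truth labels.
    scores: (n_clips, n_classes) matrix of predicted class scores.
    Returns a float in [0, 1]; higher is better.
    """
    n_clips, n_classes = truth.shape
    precisions = np.zeros((n_clips, n_classes))
    for i in range(n_clips):
        pos = np.flatnonzero(truth[i] > 0)
        if pos.size == 0:
            continue  # clips without any label do not contribute
        # Rank of every class for this clip (1 = highest score).
        order = np.argsort(-scores[i])
        rank = np.empty(n_classes, dtype=int)
        rank[order] = np.arange(1, n_classes + 1)
        # Number of true labels seen while walking down the ranked list.
        hits = np.cumsum(truth[i][order] > 0)
        # Precision at the rank of each true label of this clip.
        precisions[i, pos] = hits[rank[pos] - 1] / rank[pos]
    labels_per_class = truth.sum(axis=0)
    # Per-class average precision over that class's positive (clip, label) pairs.
    per_class_lrap = precisions.sum(axis=0) / np.maximum(labels_per_class, 1)
    # Weight each class by its share of all positive labels.
    class_weights = labels_per_class / max(labels_per_class.sum(), 1)
    return float(np.sum(per_class_lrap * class_weights))

# Toy example with two clips and three classes (illustrative values only).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_score = np.array([[0.2, 0.9, 0.3],
                    [0.1, 0.8, 0.4]])
print(f"lwlrap = {lwlrap(y_true, y_score):.3f}")  # -> lwlrap = 0.722

Each positive (clip, label) pair contributes the precision at the rank of that label's score; the per-class averages are then weighted by each class's share of all positive labels, so frequent classes count proportionally more than in a plain macro average.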
Appears in Collections: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)

Files in This Item:
File: DCASE2019Workshop_Ebbers_54.pdf
Size: 467.36 kB
Format: Adobe PDF

