Shuffling and Mixing Data Augmentation for Environmental Sound Classification

Inoue, Tadanobu; Vinayavekhin, Phongtharin; Wang, Shiqiang; Wood, David; Munawar, Asim; Ko, Bong Jun; Greco, Nancy; Tachibana, Ryuki

DOI: https://doi.org/10.33682/wgyb-bt40

Full metadata record

DC Field	Value	Language
dc.contributor.author	Inoue, Tadanobu
dc.contributor.author	Vinayavekhin, Phongtharin
dc.contributor.author	Wang, Shiqiang
dc.contributor.author	Wood, David
dc.contributor.author	Munawar, Asim
dc.contributor.author	Ko, Bong Jun
dc.contributor.author	Greco, Nancy
dc.contributor.author	Tachibana, Ryuki
dc.date.accessioned	2019-10-24T01:50:17Z	-
dc.date.available	2019-10-24T01:50:17Z	-
dc.date.issued	2019-10
dc.identifier.citation	T. Inoue, P. Vinayavekhin, S. Wang, D. Wood, A. Munawar, B. Ko, N. Greco & R. Tachibana, "Shuffling and Mixing Data Augmentation for Environmental Sound Classification", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 109–113, New York University, NY, USA, Oct. 2019	en
dc.identifier.uri	http://hdl.handle.net/2451/60739	-
dc.description.abstract	Smart speakers have recently been adopted and widely used in consumer homes, largely as a communication interface between humans and machines. In addition, these speakers can be used to monitor sounds other than the human voice, for example to watch over elderly people living alone and to send a notification if there are changes in their usual activities that may affect their health. In this paper, we focus on sound classification using machine learning, which usually requires a large amount of training data to achieve good accuracy. Our main contribution is a data augmentation technique that generates new sounds by shuffling and mixing two existing sounds of the same class in the dataset. This technique creates new variations in both the temporal sequence and the density of the sound events. We show on DCASE 2018 Task 5 that the proposed data augmentation method, combined with our proposed convolutional neural network (CNN), achieves an average macro-averaged F1 score of 89.95% over the 4 folds of the development dataset. This is a significant improvement over the baseline result of 84.50%. We also verify that the proposed data augmentation technique can improve classification performance on the Urban Sound 8K dataset.	en
dc.rights	Copyright The Authors, 2019	en
dc.title	Shuffling and Mixing Data Augmentation for Environmental Sound Classification	en
dc.type	Article	en
dc.identifier.DOI	https://doi.org/10.33682/wgyb-bt40
dc.description.firstPage	109
dc.description.lastPage	113
Appears in Collections:	Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
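The abstract describes the augmentation only at a high level. Below is a minimal Python sketch of the idea under stated assumptions; it is not the authors' implementation. It assumes each waveform is a plain list of samples, both clips have the same length, each clip is cut into equal-length segments whose order is randomly permuted, and the two shuffled clips are then mixed with a fixed weight. The segment count (n_segments), the mixing weight (weight), and all function names are hypothetical choices for illustration; the paper may segment, shuffle, or weight the sounds differently.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def split_segments(clip: Sequence[float], n_segments: int) -> List[List[float]]:
    """Cut a clip into n_segments contiguous pieces of (nearly) equal length."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    length = len(clip)
    # Integer boundaries; the last boundary is always len(clip), so no sample is lost.
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    return [list(clip[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def shuffle_clip(clip: Sequence[float], n_segments: int, rng: random.Random) -> List[float]:
    """Randomly permute the order of a clip's segments (varies the temporal sequence)."""
    segments = split_segments(clip, n_segments)
    rng.shuffle(segments)
    return [sample for segment in segments for sample in segment]


def shuffle_and_mix(
    clip_a: Sequence[float],
    clip_b: Sequence[float],
    n_segments: int = 4,  # hypothetical default, not taken from the paper
    weight: float = 0.5,  # hypothetical equal-weight mix, not taken from the paper
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Shuffle two same-class clips independently, then mix them sample by sample."""
    if len(clip_a) != len(clip_b):
        raise ValueError("this sketch assumes both clips have the same length")
    rng = rng if rng is not None else random.Random()
    a = shuffle_clip(clip_a, n_segments, rng)
    b = shuffle_clip(clip_b, n_segments, rng)
    # Overlaying two clips combines their events, raising the apparent event density.
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def augment_dataset(
    dataset: Sequence[Tuple[Sequence[float], str]],
    n_new: int,
    rng: random.Random,
) -> List[Tuple[List[float], str]]:
    """Create n_new clips, each mixed from two clips that share a class label."""
    by_class: Dict[str, List[Sequence[float]]] = {}
    for clip, label in dataset:
        by_class.setdefault(label, []).append(clip)
    # Only classes with at least two clips can supply a same-class pair.
    eligible = [label for label, clips in by_class.items() if len(clips) >= 2]
    if not eligible:
        raise ValueError("need at least one class with two or more clips")
    augmented = []
    for _ in range(n_new):
        label = rng.choice(eligible)
        clip_a, clip_b = rng.sample(by_class[label], 2)
        # The generated clip inherits the label shared by its two source clips.
        augmented.append((shuffle_and_mix(clip_a, clip_b, rng=rng), label))
    return augmented
```

In this sketch, permuting segment order is what changes the temporal sequence of events, while mixing two clips of the same class changes how densely events occur; because both sources share a label, the new clip can be added to the training set under that label without relabeling.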

Files in This Item:

File	Size	Format
DCASE2019Workshop_Inoue_20.pdf	759.78 kB	Adobe PDF
