Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Shi, Ziqiang | |
dc.contributor.author | Liu, Liu | |
dc.contributor.author | Lin, Huibin | |
dc.contributor.author | Liu, Rujie | |
dc.contributor.author | Shi, Anyan | |
dc.date.accessioned | 2019-10-24T01:50:23Z | - |
dc.date.available | 2019-10-24T01:50:23Z | - |
dc.date.issued | 2019-10 | |
dc.identifier.citation | Z. Shi, L. Liu, H. Lin, R. Liu & A. Shi, "HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 224–228, New York University, NY, USA, Oct. 2019 | en |
dc.identifier.uri | http://hdl.handle.net/2451/60764 | - |
dc.description.abstract | In this paper, we present a method called HODGEPODGE for large-scale detection of sound events using the weakly labeled, synthetic, and unlabeled data provided in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge Task 4: Sound event detection in domestic environments. To perform this task, we adopt a convolutional recurrent neural network (CRNN) as our backbone network. To deal with the small amount of tagged data and the large amount of unlabeled in-domain data, we focus primarily on how to apply semi-supervised learning methods efficiently to make full use of the limited data. Three semi-supervised learning principles are used in our system: 1) consistency regularization applied to data augmentations; 2) a MixUp regularizer requiring that the prediction for an interpolation of two inputs be close to the interpolation of the predictions for each individual input; 3) MixUp regularization applied to interpolations between data augmentations. We also tried an ensemble of models trained with different semi-supervised learning principles. Our proposed approach significantly improves on the baseline, achieving an event-based F-measure of 42.0% compared to the baseline's 25.8% on the official evaluation dataset. Our submission ranked third among 18 teams in Task 4. | en
dc.rights | Distributed under the terms of the Creative Commons Attribution 4.0 International (CC-BY) license. | en |
dc.title | HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods | en |
dc.type | Article | en |
dc.identifier.DOI | https://doi.org/10.33682/9kcj-bq06 | |
dc.description.firstPage | 224 | |
dc.description.lastPage | 228 | |
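The abstract above describes a MixUp regularizer that pushes the model's prediction for an interpolation of two inputs toward the interpolation of its predictions for each input. Below is a minimal sketch of that principle only, not the authors' implementation; the PyTorch framing, the sigmoid outputs (for multi-label sound event tags), the Beta(0.2, 0.2) sampling, and the MSE penalty are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the MixUp consistency principle:
# f(lam*x1 + (1-lam)*x2) should stay close to lam*f(x1) + (1-lam)*f(x2).
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mixup_consistency_loss(model, x1, x2, alpha=0.2):
    """Consistency penalty between the prediction on an interpolated input
    and the interpolation of the predictions on the two original inputs."""
    lam = Beta(alpha, alpha).sample().item()            # interpolation coefficient (alpha is an assumption)
    x_mix = lam * x1 + (1.0 - lam) * x2                 # interpolated input batch
    with torch.no_grad():                                # target: interpolation of individual predictions
        p_target = lam * torch.sigmoid(model(x1)) + (1.0 - lam) * torch.sigmoid(model(x2))
    p_mix = torch.sigmoid(model(x_mix))                  # prediction on the interpolated input
    return F.mse_loss(p_mix, p_target)                   # MSE is one common choice of consistency cost
```

In a typical training loop, a term of this kind would be added, often with a ramp-up weight, to the supervised loss computed on the weakly labeled and synthetic subsets.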
Appears in Collections: | Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019) |
Files in This Item:
File | Size | Format |
---|---|---|
DCASE2019Workshop_Shi_15.pdf | 557.1 kB | Adobe PDF |