Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization

Pratik, Pranay; Jee, Wen Jie; Nagisetty, Srikanth; Mars, Rohith; Lim, Chongsoon

doi:https://doi.org/10.33682/gbfk-re38

Full metadata record

DC Field	Value	Language
dc.contributor.author	Pratik, Pranay
dc.contributor.author	Jee, Wen Jie
dc.contributor.author	Nagisetty, Srikanth
dc.contributor.author	Mars, Rohith
dc.contributor.author	Lim, Chongsoon
dc.date.accessioned	2019-10-24T01:50:22Z	-
dc.date.available	2019-10-24T01:50:22Z	-
dc.date.issued	2019-10
dc.identifier.citation	P. Pratik, W. Jee, S. Nagisetty, R. Mars & C. Lim, "Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 199–203, New York University, NY, USA, Oct. 2019	en
dc.identifier.uri	http://hdl.handle.net/2451/60759	-
dc.description.abstract	In this paper, we present the details of our solution for the IEEE DCASE 2019 Task 3: Sound Event Localization and Detection (SELD) challenge. Given multi-channel audio as input, the goal is to predict all instances of the sound labels and their directions-of-arrival (DOAs) in the form of azimuth and elevation angles. Our solution is based on a Convolutional-Recurrent Neural Network (CRNN) architecture. In the CNN module of the proposed architecture, we introduce rectangular kernels in the pooling layers to minimize the loss of information along the temporal dimension within the CNN module, which in turn improves the performance of the RNN module. Mixup data augmentation is applied in an attempt to train the network for greater generalization. The performance of the proposed architecture was evaluated with individual metrics for the sound event detection (SED) and localization tasks. Our team's solution was ranked 5th in the DCASE 2019 Task 3 challenge, with an F-score of 93.7% and an error rate of 0.12 for the SED task, and a DOA error of 4.2° and a frame recall of 91.8% for the localization task, both on the evaluation set. These results show a significant performance improvement over the baseline system for both SED and localization.	en
dc.rights	Copyright The Authors, 2019	en
dc.title	Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization	en
dc.type	Article	en
dc.identifier.DOI	https://doi.org/10.33682/gbfk-re38
dc.description.firstPage	199
dc.description.lastPage	203
Appears in Collections:	Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
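
Illustrative note on mixup: the abstract names mixup as the data augmentation used but gives no implementation details. The following is a minimal sketch of the general mixup technique, not the authors' code. It blends two training examples and their label vectors with a single weight drawn from a Beta(alpha, alpha) distribution. The function name mixup_pair, the value alpha=0.2, and the toy feature and label vectors are assumptions made for illustration; how the authors applied mixup to their features and DOA targets is not stated in this record.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their label vectors with one Beta-sampled weight.

    alpha=0.2 is an assumed value, not taken from the paper.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    # The same weight is applied to the inputs and to the targets.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Hypothetical toy data: 3-dimensional features, 2-class multi-label targets.
rng = random.Random(0)
x_mix, y_mix, lam = mixup_pair(
    [1.0, 0.0, 2.0], [1.0, 0.0],
    [0.0, 4.0, 2.0], [0.0, 1.0],
    rng=rng,
)
print(lam, x_mix, y_mix)
```

Because the targets are blended with the same weight as the inputs, the mixed label vector becomes a soft target rather than a hard 0/1 vector, which is the property usually credited with the regularizing effect of mixup.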

Files in This Item:

File	Size	Format
DCASE2019Workshop_Pratik_72.pdf	685.51 kB	Adobe PDF
