Full metadata record
DC Field | Value | Language
dc.contributor.author | Pratik, Pranay |
dc.contributor.author | Jee, Wen Jie |
dc.contributor.author | Nagisetty, Srikanth |
dc.contributor.author | Mars, Rohith |
dc.contributor.author | Lim, Chongsoon |
dc.date.accessioned | 2019-10-24T01:50:22Z | -
dc.date.available | 2019-10-24T01:50:22Z | -
dc.date.issued | 2019-10 |
dc.identifier.citation | P. Pratik, W. Jee, S. Nagisetty, R. Mars & C. Lim, "Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 199–203, New York University, NY, USA, Oct. 2019 | en
dc.identifier.uri | http://hdl.handle.net/2451/60759 | -
dc.description.abstract | In this paper, we present the details of our solution for the IEEE DCASE 2019 Task 3: Sound Event Localization and Detection (SELD) challenge. Given multi-channel audio as input, the goal is to predict all instances of the sound labels and their directions-of-arrival (DOAs) in the form of azimuth and elevation angles. Our solution is based on a Convolutional-Recurrent Neural Network (CRNN) architecture. In the CNN module of the proposed architecture, we introduce rectangular kernels in the pooling layers to minimize information loss along the temporal dimension, which in turn improves the performance of the RNN module. Mixup data augmentation is applied in an attempt to train the network for greater generalization. The performance of the proposed architecture was evaluated with separate metrics for the sound event detection (SED) and localization tasks. Our team’s solution was ranked 5th in the DCASE 2019 Task 3 challenge, with an F-score of 93.7% and an error rate of 0.12 for the SED task, and a DOA error of 4.2° and a frame recall of 91.8% for the localization task, both on the evaluation set. These results show a significant performance improvement over the baseline system for both SED and localization. | en
dc.rights | Copyright The Authors, 2019 | en
dc.title | Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization | en
dc.type | Article | en
dc.identifier.DOI | https://doi.org/10.33682/gbfk-re38 |
dc.description.firstPage | 199 |
dc.description.lastPage | 203 |
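
The abstract mentions mixup data augmentation without describing its configuration. As a rough illustration only, the general mixup idea (blending pairs of training examples and their targets with a random weight) can be sketched in NumPy as below. The function name `mixup_batch`, the value `alpha=0.2`, the use of a single mixing weight per batch, and the array shapes are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np


def mixup_batch(features, targets, alpha=0.2, rng=None):
    """Blend each example with a randomly chosen partner from the same batch.

    Hypothetical helper: `alpha` and the batch layout are illustrative
    assumptions, not settings reported in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing weight for the whole batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Random pairing of examples within the batch.
    perm = rng.permutation(len(features))
    mixed_features = lam * features + (1.0 - lam) * features[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_features, mixed_targets, lam


# Usage with made-up shapes: 4 clips, 3 time frames, 2 feature bins,
# and one-hot labels over 4 hypothetical sound classes.
x = np.arange(24, dtype=float).reshape(4, 3, 2)
y = np.eye(4)
mixed_x, mixed_y, lam = mixup_batch(x, y, rng=np.random.default_rng(0))
```

Because the same weight is applied to inputs and targets, each mixed label row remains a convex combination of two one-hot labels. How direction-of-arrival targets should be blended is a separate design question that this sketch does not address.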
Appears in Collections: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)

Files in This Item:
File | Size | Format
DCASE2019Workshop_Pratik_72.pdf | 685.51 kB | Adobe PDF


Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.