Title: Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds

Authors: Chytas, Sotirios Panagiotis; Potamianos, Gerasimos
Date Issued: Oct-2019
Citation: S. Chytas & G. Potamianos, "Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 50–54, New York University, NY, USA, Oct. 2019
Abstract: This paper details our approach to Task 3 of the DCASE’19 Challenge, namely sound event localization and detection (SELD). Our system is based on multi-channel convolutional neural networks (CNNs), combined with data augmentation and ensembling. Specifically, it follows a hierarchical approach that first determines adaptive thresholds for the multi-label sound event detection (SED) problem, based on a CNN operating on spectrograms over long-duration windows. It then exploits the derived thresholds in an ensemble of CNNs operating on raw waveforms over shorter-duration sliding windows to provide event segmentation and labeling. Finally, it employs event localization CNNs to yield direction-of-arrival (DOA) source estimates of the detected sound events. The system is developed and evaluated on the microphone-array set of Task 3. Compared to the baseline of the Challenge organizers, on the development set it achieves relative improvements of 12% in SED error, 2% in F-score, 36% in DOA error, and 3% in the combined SELD metric, but trails significantly in frame-recall, whereas on the evaluation set it achieves relative improvements of 3% in SED, 51% in DOA, and 4% in SELD errors. Overall, though, the system lags significantly behind the best Task 3 submission, achieving a combined SELD error of 0.2033 against 0.044 for the latter.
First Page: 50
Last Page: 54
DOI: https://doi.org/10.33682/c6q0-wv87
Type: Article
Appears in Collections: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)

Files in This Item:
File                              Size        Format
DCASE2019Workshop_Chytas_24.pdf   779.62 kB   Adobe PDF


Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.