Full metadata record
DC Field | Value | Language
dc.contributor.author | Chytas, Sotirios Panagiotis |
dc.contributor.author | Potamianos, Gerasimos |
dc.date.accessioned | 2019-10-24T01:50:13Z | -
dc.date.available | 2019-10-24T01:50:13Z | -
dc.date.issued | 2019-10 |
dc.identifier.citation | S. Chytas & G. Potamianos, "Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 50–54, New York University, NY, USA, Oct. 2019 | en
dc.identifier.uri | http://hdl.handle.net/2451/60726 | -
dc.description.abstract | This paper details our approach to Task 3 of the DCASE’19 Challenge, namely sound event localization and detection (SELD). Our system is based on multi-channel convolutional neural networks (CNNs), combined with data augmentation and ensembling. Specifically, it follows a hierarchical approach that first determines adaptive thresholds for the multi-label sound event detection (SED) problem, based on a CNN operating on spectrograms over long duration windows. It then exploits the derived thresholds in an ensemble of CNNs operating on raw waveforms over shorter-duration sliding windows to provide event segmentation and labeling. Finally, it employs event localization CNNs to yield direction-of-arrival (DOA) source estimates of the detected sound events. The system is developed and evaluated on the microphone-array set of Task 3. Compared to the baseline of the Challenge organizers, on the development set it achieves relative improvements of 12% in SED error, 2% in F-score, 36% in DOA error, and 3% in the combined SELD metric, but trails significantly in frame-recall, whereas on the evaluation set it achieves relative improvements of 3% in SED, 51% in DOA, and 4% in SELD errors. Overall though, the system lags significantly behind the best Task 3 submission, achieving a combined SELD error of 0.2033 against 0.044 of the latter. | en
dc.rights | Distributed under the terms of the Creative Commons Attribution 4.0 International (CC-BY) license. | en
dc.title | Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds | en
dc.type | Article | en
dc.identifier.DOI | https://doi.org/10.33682/c6q0-wv87 |
dc.description.firstPage | 50 |
dc.description.lastPage | 54 |
Appears in Collections: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)

Files in This Item:
File | Size | Format
DCASE2019Workshop_Chytas_24.pdf | 779.62 kB | Adobe PDF