Hierarchical Sound Event Classification
|Citation:||E. Nichols, D. Tompkins & J. Fan, "Hierarchical Sound Event Classification", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 248–252, New York University, NY, USA, Oct. 2019|
|Abstract:||Task 5 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge is "urban sound tagging". Given a set of known sound categories and sub-categories, the goal is to build a multi-label audio classification model that predicts whether each sound category is present or absent in an audio recording. We developed a model composed of a preprocessing layer that converts audio to a log-mel spectrogram, a VGG-inspired Convolutional Neural Network (CNN) that generates an embedding for the spectrogram, a pre-trained VGGish network that generates a separate audio embedding, and finally a series of fully-connected layers that converts these two (concatenated) embeddings into a multi-label classification. This model directly outputs both "fine" and "coarse" labels, treating the task as a 37-way multi-label classification problem. One version of this network (CNN+VGGish1) performed better on the coarse labels; another (CNN+VGGish2) performed better on fine-label Micro AUPRC. A separate family of CNN models was also trained to exploit the hierarchical nature of the labels (Hierarchical1, Hierarchical2, and Hierarchical3). The hierarchical models perform better on Micro AUPRC for fine-level classification.|
|Appears in Collections:||Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)|
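The abstract's classification head can be sketched minimally: the CNN embedding and the VGGish embedding are concatenated and mapped to 37 independent sigmoid outputs, one per fine or coarse label. The embedding dimensions, the single linear layer, and the random weights below are illustrative assumptions, not the paper's trained network (which uses several fully-connected layers); VGGish is known to emit 128-dimensional embeddings, but the CNN branch's size is assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

CNN_DIM = 128     # size of the VGG-inspired CNN embedding (assumed for illustration)
VGGISH_DIM = 128  # VGGish emits 128-dimensional audio embeddings
N_LABELS = 37     # 37-way multi-label output covering both fine and coarse labels

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify(cnn_emb, vggish_emb, W, b):
    """Concatenate the two embeddings, then map to 37 per-label probabilities."""
    z = np.concatenate([cnn_emb, vggish_emb])  # shape (CNN_DIM + VGGISH_DIM,)
    return sigmoid(W @ z + b)                  # independent sigmoids -> multi-label

# Toy stand-ins for trained weights and for the two network outputs.
W = rng.standard_normal((N_LABELS, CNN_DIM + VGGISH_DIM)) * 0.01
b = np.zeros(N_LABELS)
probs = classify(rng.standard_normal(CNN_DIM),
                 rng.standard_normal(VGGISH_DIM), W, b)
present = probs > 0.5  # each of the 37 labels is predicted independently
```

Because each label gets its own sigmoid rather than a shared softmax, any number of the 37 categories can be marked present at once, which is what multi-label tagging requires.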
Files in This Item:
|DCASE2019Workshop_Tompkins_64.pdf||523.61 kB||Adobe PDF||View/Open|