Deep Multi-view Features from Raw Audio for Acoustic Scene Classification

Singh, Arshdeep; Rajan, Padmanabhan; Bhavsar, Arnav

doi:https://doi.org/10.33682/05gk-pd08

Full metadata record

DC Field	Value	Language
dc.contributor.author	Singh, Arshdeep
dc.contributor.author	Rajan, Padmanabhan
dc.contributor.author	Bhavsar, Arnav
dc.date.accessioned	2019-10-24T01:50:23Z	-
dc.date.available	2019-10-24T01:50:23Z	-
dc.date.issued	2019-10
dc.identifier.citation	A. Singh, P. Rajan & A. Bhavsar, "Deep Multi-view Features from Raw Audio for Acoustic Scene Classification", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 229–233, New York University, NY, USA, Oct. 2019	en
dc.identifier.uri	http://hdl.handle.net/2451/60765	-
dc.description.abstract	In this paper, we propose a feature representation framework that captures features at different levels of abstraction for audio scene classification. A pre-trained deep convolutional neural network, SoundNet, is used to extract features from various intermediate layers for a given audio file. We assume that the features obtained from different intermediate layers provide different types of abstraction and exhibit complementary information. Thus, combining the intermediate features of various layers can improve classification performance in discriminating audio scenes. To obtain the representations, we discard redundant filters in the intermediate layers using an analysis-of-variance-based redundancy removal framework. This reduces dimensionality and computational complexity. Next, shift-invariant, fixed-length compressed representations across layers are obtained by aggregating the responses of the important filters only. The compressed representations are then stacked together to obtain a supervector. Finally, we perform classification using multi-layer perceptron and support vector machine models. We comprehensively validate the above assumption on two public datasets: Making Sense of Sounds and the DCASE 2019 open-set acoustic scene classification dataset.	en
dc.rights	Distributed under the terms of the Creative Commons Attribution 4.0 International (CC-BY) license.	en
dc.title	Deep Multi-view Features from Raw Audio for Acoustic Scene Classification	en
dc.type	Article	en
dc.identifier.DOI	https://doi.org/10.33682/05gk-pd08
dc.description.firstPage	229
dc.description.lastPage	233
Appears in Collections:	Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
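
The pipeline summarized in the abstract (per-layer filter selection, aggregation over time, stacking into a supervector) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the analysis of variance is applied or which aggregation is used, so the sketch assumes a one-way ANOVA F-score computed on time-averaged filter responses and keeps a fixed number of filters per layer. All function names, the `keep` parameter, and the synthetic arrays standing in for SoundNet layer outputs are hypothetical.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F statistic for each column of X, shape (n_clips, n_filters)."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += len(Xc) * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(X) - len(classes)
    # Small constant guards against division by zero for constant filters.
    return (ss_between / df_between) / (ss_within / df_within + 1e-12)

def select_filters(X, y, keep):
    """Sorted indices of the `keep` filters with the highest F score (assumed criterion)."""
    order = np.argsort(anova_f_scores(X, y))[::-1]
    return np.sort(order[:keep])

def build_supervectors(layers, y, keep):
    """Stack pooled responses of the selected filters from every layer.

    layers: list of arrays, each of shape (n_clips, n_filters, n_frames).
    Returns an array of shape (n_clips, keep * len(layers)).
    """
    parts = []
    for maps in layers:
        # Averaging over time gives a fixed-length, shift-invariant vector per clip.
        pooled = maps.mean(axis=2)
        idx = select_filters(pooled, y, keep)
        parts.append(pooled[:, idx])
    return np.hstack(parts)

# Synthetic stand-ins for two intermediate layers (not real SoundNet outputs).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 10)            # 30 clips, 3 classes
layer_a = rng.normal(size=(30, 16, 50))    # 16 filters, 50 frames
layer_b = rng.normal(size=(30, 32, 20))    # 32 filters, 20 frames
layer_a[:, 0, :] += y[:, None] * 5.0       # make filter 0 class-dependent

supervectors = build_supervectors([layer_a, layer_b], y, keep=8)
print(supervectors.shape)                  # (30, 16)
```

The resulting supervectors could then be passed to any standard classifier, such as a multi-layer perceptron or a support vector machine as named in the abstract. In practice the filter selection would be fitted on training data only and reused for test clips.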

Files in This Item:

File	Size	Format
DCASE2019Workshop_Singh_32.pdf	1.6 MB	Adobe PDF
