The Impact of Missing Labels and Overlapping Sound Events on Multi-label Multi-instance Learning for Sound Event Classification

Meire, Maarten; Karsmakers, Peter; Vuegen, Lode

DOI: https://doi.org/10.33682/y8xs-0463

Full metadata record

DC Field	Value	Language
dc.contributor.author	Meire, Maarten
dc.contributor.author	Karsmakers, Peter
dc.contributor.author	Vuegen, Lode
dc.date.accessioned	2019-10-24T01:50:20Z	-
dc.date.available	2019-10-24T01:50:20Z	-
dc.date.issued	2019-10
dc.identifier.citation	M. Meire, P. Karsmakers & L. Vuegen, "The Impact of Missing Labels and Overlapping Sound Events on Multi-label Multi-instance Learning for Sound Event Classification", Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pages 159–163, New York University, NY, USA, Oct. 2019	en
dc.identifier.uri	http://hdl.handle.net/2451/60750	-
dc.description.abstract	Automated analysis of complex scenes of everyday sounds might help us navigate within the enormous amount of data and help us make better decisions based on the sounds around us. For this purpose, classification models are required that translate raw audio to meaningful event labels. The specific task that this paper targets is that of learning sound event classifier models from a set of example sound segments that contain multiple potentially overlapping sound events and that are labeled with multiple weak sound event class names. This involves a combination of both multi-label and multi-instance learning. This paper investigates two state-of-the-art methodologies that allow this type of learning, LRM-NMD and CNN. Besides comparing the accuracy in terms of correct sound event classifications, the robustness to missing labels and to overlap of the sound events in the sound segments is also evaluated. For small training set sizes, LRM-NMD clearly outperforms CNN with an accuracy that is 40 to 50% higher. LRM-NMD suffers only slightly from overlapping sound events during training, while CNN suffers a substantial drop in classification accuracy, on the order of 10 to 20%, when sound events have a 100% overlap. Both methods show good robustness to missing labels. No matter how many labels are missing in a single segment (that contains multiple sound events), CNN converges to 97% accuracy when enough training data is available. LRM-NMD, on the other hand, shows a slight performance drop when the number of missing labels increases.	en
dc.rights	Distributed under the terms of the Creative Commons Attribution 4.0 International (CC-BY) license.	en
dc.title	The Impact of Missing Labels and Overlapping Sound Events on Multi-label Multi-instance Learning for Sound Event Classification	en
dc.type	Article	en
dc.identifier.DOI	https://doi.org/10.33682/y8xs-0463
dc.description.firstPage	159
dc.description.lastPage	163
Appears in Collections:	Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)

Files in This Item:

File	Size	Format
DCASE2019Workshop_Meire_22.pdf	510.79 kB	Adobe PDF
