Full metadata record
dc.contributor.author: Ikawa, Shota
dc.contributor.author: Kashino, Kunio
dc.date.accessioned: 2019-10-24T01:50:16Z
dc.date.available: 2019-10-24T01:50:16Z
dc.date.issued: 2019-10
dc.identifier.citation: S. Ikawa and K. Kashino, "Neural Audio Captioning Based on Conditional Sequence-to-Sequence Model," Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pp. 99–103, New York University, NY, USA, Oct. 2019.
dc.identifier.uri: http://hdl.handle.net/2451/60737
dc.description.abstract: We propose an audio captioning system that describes non-speech audio signals in the form of natural language. Unlike existing systems, this system can generate a sentence describing sounds, rather than an object label or onomatopoeia. This allows the description to include more information, such as how the sound is heard and how its tone or volume changes over time, and it allows unknown sounds to be accommodated. A major problem in realizing this capability is that the validity of a description depends not only on the sound itself but also on the situation or context. To address this problem, a conditional sequence-to-sequence model is proposed. In this model, a parameter called "specificity" is introduced as a condition to control the amount of information contained in the output text and generate an appropriate description. Experiments show that the proposed model works effectively.
dc.rights: Copyright The Authors, 2019
dc.title: Neural Audio Captioning Based on Conditional Sequence-to-Sequence Model
dc.type: Article
dc.identifier.DOI: https://doi.org/10.33682/7bay-bj41
dc.description.firstPage: 99
dc.description.lastPage: 103
Appears in Collections: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
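
Note: the abstract above describes a conditional sequence-to-sequence captioner in which a scalar "specificity" value conditions the decoder, controlling how much information the generated sentence carries. The record does not include the authors' implementation; the sketch below is only a minimal illustration of that idea, assuming PyTorch, a bidirectional LSTM encoder over log-mel frames, and a GRU decoder that receives the specificity value at every step (class and parameter names are hypothetical, not taken from the paper).

    import torch
    import torch.nn as nn

    class ConditionalCaptioner(nn.Module):
        """Hypothetical sketch of a specificity-conditioned seq2seq captioner."""
        def __init__(self, n_mels=64, hidden=256, vocab_size=5000, emb=128):
            super().__init__()
            # Bidirectional LSTM encoder summarizes the audio feature sequence.
            self.encoder = nn.LSTM(n_mels, hidden, batch_first=True, bidirectional=True)
            self.bridge = nn.Linear(2 * hidden, hidden)
            # Decoder input at each step = word embedding + the scalar specificity condition.
            self.embed = nn.Embedding(vocab_size, emb)
            self.decoder = nn.GRU(emb + 1, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, mels, captions, specificity):
            # mels: (B, T, n_mels); captions: (B, L) token ids; specificity: (B,) per clip
            _, (h, _) = self.encoder(mels)
            # Combine the forward/backward final states into the decoder's initial state.
            h0 = torch.tanh(self.bridge(torch.cat([h[-2], h[-1]], dim=-1))).unsqueeze(0)
            emb = self.embed(captions)                                    # (B, L, emb)
            cond = specificity.view(-1, 1, 1).expand(-1, emb.size(1), 1)  # repeat per step
            dec_out, _ = self.decoder(torch.cat([emb, cond], dim=-1), h0)
            return self.out(dec_out)                                      # (B, L, vocab) logits

At inference time one would decode greedily (or with beam search) while varying the specificity input; following the abstract, the intent is that lower values yield terser captions and higher values more detailed ones.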

Files in This Item:
DCASE2019Workshop_Ikawa_82.pdf (769.12 kB, Adobe PDF)


Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.