Background sound classification in speech audio segments

Singh J.; Joshi R.

doi:https://doi.org/10.1109/SPED.2019.8906597

dc.contributor.author	Singh J.; Joshi R.
dc.date.accessioned	2025-05-24T09:39:47Z
dc.description.abstract	Background sound classification is the task of identifying secondary sound sources in the surrounding environment. Real-time speech is always accompanied by a context, and this context can help enhance the behavior of a variety of applications. Traditionally, audio classification tasks have focused mainly on speech because of its wide applicability. Recent work has explored environmental scene classification using acoustic features, aided by the availability of datasets such as UrbanSound, ESC-50, and AudioSet. Previous work has mostly focused on classifying independently occurring acoustic events. In this work, we explore the classification of background sound in audio recordings that contain human speech. We prepare a new dataset, YBSS-200, from YouTube videos, where each sample contains a distinct background sound and an accompanying foreground human voice. We present a transfer learning approach based on a convolutional neural network, using a VGG-like network to classify the context in such acoustic signals. Specific data augmentation techniques were used to improve the classification results. © 2019 IEEE.
dc.identifier.doi	https://doi.org/10.1109/SPED.2019.8906597
dc.identifier.uri	http://172.23.0.11:4000/handle/123456789/18492
dc.relation.ispartofseries	2019 10th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2019
dc.title	Background sound classification in speech audio segments
