Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

Background sound classification in speech audio segments

dc.contributor.author: Singh J.; Joshi R.
dc.date.accessioned: 2025-05-24T09:39:47Z
dc.description.abstract: Background sound classification is the task of identifying secondary sound sources in the surrounding environment. Real-time speech is always accompanied by a context, and this context can help improve the behavior of a variety of applications. Traditionally, audio classification tasks have mainly focused on speech due to its wide applicability. Recent works have explored environmental scene classification using acoustic features. The availability of datasets such as UrbanSound, ESC50, and AUDIOSET has further aided the process. Previous works have mostly focused on the classification of independently occurring acoustic events. In this work, we explore the classification of background sound in audio recordings containing human speech. We prepare a new dataset, YBSS-200, from YouTube videos, in which each sample contains a distinct background sound and an accompanying foreground human voice. We present a convolutional neural network based transfer learning approach using a VGG-like network for classifying the context in such acoustic signals. Specific data augmentation techniques were used to improve the classification results. © 2019 IEEE.
dc.identifier.doi: https://doi.org/10.1109/SPED.2019.8906597
dc.identifier.uri: http://172.23.0.11:4000/handle/123456789/18492
dc.relation.ispartofseries: 2019 10th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2019
dc.title: Background sound classification in speech audio segments
