Developing Hindi Stammering Corpus: Framework and Insights

Dwivedi S.; Ghosh S.; Dwivedi S.

https://doi.org/10.1007/s42979-021-00891-3

Abstract

Artificially intelligent voice-based virtual assistants are making their way into households because they are easy to access. However, because such tools are trained on general-purpose speech data, speakers with atypical speech traits have difficulty using them. Stammering is one such speech condition, in which speakers experience fluency-related problems during oral communication. The difficulties that people who stammer encounter with these tools defeat the very purpose of a virtual assistant. Hence, voice-based technologies that accommodate disfluent speech are the need of the hour. Because machine learning (ML)-based solutions rely on sizable datasets for training, corpus creation is the first step towards developing an effective ML solution. This paper describes the annotation framework we propose and apply to the Hindi Stammering Speech corpus. We also present initial insights based on linguistic features computed from the annotated speech corpus. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021.