Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

Building a text retrieval system for the Sanskrit language: Exploring indexing, stemming, and searching issues

dc.contributor.author: Sahu S.S.; Pal S.
dc.date.accessioned: 2025-05-23T11:17:21Z
dc.description.abstract: Stemming is an important pre-processing step in text analysis domains such as text mining, text summarization and information retrieval (IR). In this study, we build a Sanskrit text collection and explore different indexing, stemming and searching strategies for Sanskrit. We also propose two stemmers, a ‘light’ one and an ‘aggressive’ one, and evaluate their effectiveness in the text analysis task. The performance of the stemmers is evaluated in two ways: a direct evaluation and an indirect, IR-based evaluation. In the direct evaluation, we found that the stemmers are effective. In the indirect evaluation, we apply different retrieval models, including BM25, TF–IDF, Divergence from Randomness (DFR)-based models and language models. The proposed stemmers are compared with the GRAS stemmer, language-independent indexing (trunc-n) and a no-stemming approach. Among the stemming methods, the aggressive stemmers provide the best performance. The Hiemstra language model outperforms the other retrieval models we experimented with. Statistical analysis shows that the proposed stemming approaches produce significantly better results than the no-stemming approach. © 2023 Elsevier Ltd
dc.identifier.doi: https://doi.org/10.1016/j.csl.2023.101518
dc.identifier.uri: http://172.23.0.11:4000/handle/123456789/7295
dc.relation.ispartofseries: Computer Speech and Language
dc.title: Building a text retrieval system for the Sanskrit language: Exploring indexing, stemming, and searching issues