Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

A corpus-based decompounding in Sanskrit

dc.contributor.author: Sahu S.S.; Mamgain P.
dc.date.accessioned: 2025-05-24T09:40:16Z
dc.description.abstract: Unlike in English, compound words in highly inflected Indian languages such as Bengali, Marathi, and Sanskrit are not multi-word expressions but are formed by combining two or more simple words without any orthographic separation. Compound words with unmarked word boundaries pose a problem for many computational tasks. Splitting compound words improves performance in Machine Translation and Information Retrieval by reducing out-of-vocabulary words. So far, a number of decompounding techniques have been applied to European languages such as German, Dutch, and the Scandinavian languages. In this work, we apply a corpus-based decompounding technique to Sanskrit and improve splitting accuracy by applying various ranking methods. We evaluate the performance of the different ranking methods against a gold standard in terms of Precision, Recall, and F-measure. Copyright © 2019 for this paper by its authors.
dc.identifier.doi: DOI not available
dc.identifier.uri: http://172.23.0.11:4000/handle/123456789/19007
dc.relation.ispartofseries: CEUR Workshop Proceedings
dc.title: A corpus-based decompounding in Sanskrit