A corpus-based decompounding in Sanskrit

dc.contributor.author	Sahu S.S.; Mamgain P.
dc.date.accessioned	2025-05-24T09:40:16Z
dc.description.abstract	Unlike in English, compound words in highly inflected Indian languages such as Bengali, Marathi, and Sanskrit are not multi-word expressions; they are formed by combining two or more simple words without any orthographic separation. Such unmarked word boundaries create problems for many computational tasks. Splitting compound words improves performance in Machine Translation and Information Retrieval by reducing the number of out-of-vocabulary words in the dictionary. So far, a number of decompounding techniques have been applied to European languages such as German, Dutch, and the Scandinavian languages. In this work, we apply a corpus-based decompounding technique to Sanskrit and improve splitting accuracy by applying various ranking methods. We evaluate the performance of the different ranking methods against a gold standard in terms of Precision, Recall, and F-measure. Copyright © 2019 for this paper by its authors.
dc.identifier.doi	DOI not available
dc.identifier.uri	http://172.23.0.11:4000/handle/123456789/19007
dc.relation.ispartofseries	CEUR Workshop Proceedings
dc.title	A corpus-based decompounding in Sanskrit
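The abstract describes the approach only at a high level and does not specify the splitting or ranking procedure. The following is a minimal sketch of one well-known corpus-based scheme, the frequency-based splitting of Koehn and Knight (2003) originally used for German. It is a stand-in, not the authors' method. Assumptions: compounding is purely concatenative (real Sanskrit compounds often undergo sandhi at part boundaries, which this ignores); `vocab` is a hypothetical toy table of corpus frequencies; candidates are ranked by the geometric mean of part frequencies; and Precision, Recall, and F-measure are computed over split points. The function names are illustrative only.

```python
from math import prod
from typing import Callable, Dict, List, Set, Tuple

Vocab = Dict[str, int]

def candidate_splits(word: str, vocab: Vocab, min_len: int = 3) -> List[List[str]]:
    """All segmentations of `word` into corpus words of length >= min_len.
    Assumes concatenative compounding (no sandhi at boundaries)."""
    results: List[List[str]] = []
    if word in vocab:
        results.append([word])
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head in vocab:
            for tail in candidate_splits(word[i:], vocab, min_len):
                results.append([head] + tail)
    return results

def geometric_mean_score(parts: List[str], vocab: Vocab) -> float:
    """Rank a split by the geometric mean of its parts' corpus frequencies."""
    return prod(vocab.get(p, 0) for p in parts) ** (1 / len(parts))

def decompound(word: str, vocab: Vocab,
               score: Callable[[List[str], Vocab], float] = geometric_mean_score,
               min_len: int = 3) -> List[str]:
    candidates = candidate_splits(word, vocab, min_len)
    if [word] not in candidates:
        candidates.append([word])  # the unsplit word is always a candidate
    return max(candidates, key=lambda parts: score(parts, vocab))

def boundaries(parts: List[str]) -> Set[int]:
    """Character offsets of the split points, e.g. ['rāja', 'putra'] -> {4}."""
    cuts, pos = set(), 0
    for p in parts[:-1]:
        pos += len(p)
        cuts.add(pos)
    return cuts

def evaluate(pred: Dict[str, List[str]], gold: Dict[str, List[str]]) -> Tuple[float, float, float]:
    """Split-point Precision, Recall, and F-measure against a gold standard."""
    tp = fp = fn = 0
    for word, gold_parts in gold.items():
        p, g = boundaries(pred[word]), boundaries(gold_parts)
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy frequencies (IAST transliteration), for illustration only.
vocab = {"rāja": 120, "putra": 80, "rājaputra": 6, "deva": 90, "datta": 30}
gold = {"rājaputra": ["rāja", "putra"], "devadatta": ["deva", "datta"], "putra": ["putra"]}
pred = {w: decompound(w, vocab) for w in gold}
print(pred["rājaputra"], evaluate(pred, gold))
```

Because `decompound` takes the ranking function as a parameter, alternative ranking methods can be compared against the same gold standard by passing a different `score` callable and rerunning `evaluate`. The recursive enumeration is exponential in the worst case; memoizing `candidate_splits` would be a natural refinement for real corpora.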
