A corpus-based decompounding in Sanskrit
| dc.contributor.author | Sahu S.S.; Mamgain P. | |
| dc.date.accessioned | 2025-05-24T09:40:16Z | |
| dc.description.abstract | Unlike in English, compound words in highly inflected Indian languages such as Bengali, Marathi, and Sanskrit are not multi-word expressions but are created by combining two or more simple words without any orthographic separation. A compound word with unmarked word boundaries creates a problem for many computational tasks. Splitting compound words improves performance in Machine Translation and Information Retrieval by reducing the number of out-of-vocabulary words. So far, a number of decompounding techniques have been applied to European languages such as German, Dutch, and the Scandinavian languages. In this work, we apply a corpus-based decompounding technique to Sanskrit and improve splitting accuracy by applying various ranking methods. We evaluate the performance of the different ranking methods against a gold standard in terms of Precision, Recall, and F-measure. Copyright © 2019 for this paper by its authors. | |
| dc.identifier.doi | DOI not available | |
| dc.identifier.uri | http://172.23.0.11:4000/handle/123456789/19007 | |
| dc.relation.ispartofseries | CEUR Workshop Proceedings | |
| dc.title | A corpus-based decompounding in Sanskrit |
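
The abstract does not spell out the splitting procedure or the ranking methods, so the following is only a minimal sketch of what corpus-based decompounding with one possible ranking could look like. It assumes a frequency-based approach in which every candidate part must occur in the corpus and candidates are ranked by the geometric mean of their part frequencies, a ranking commonly used for European languages; it is not claimed to be the authors' method. The function names, the `min_len` parameter, and the toy counts are hypothetical, and the sketch ignores sandhi (sound changes at part boundaries), which a real Sanskrit splitter would have to handle. The evaluation function uses one simple word-level definition of Precision, Recall, and F-measure, which is also an assumption.

```python
from math import prod


def candidate_splits(word, counts, min_len=2):
    """Return every segmentation of `word` whose parts all occur in `counts`."""
    if not word:
        return [[]]
    results = []
    for i in range(min_len, len(word) + 1):
        head = word[:i]
        if head in counts:
            for tail in candidate_splits(word[i:], counts, min_len):
                results.append([head] + tail)
    return results


def geometric_mean_score(parts, counts):
    """Geometric mean of the corpus frequencies of the parts (assumed ranking)."""
    return prod(counts.get(p, 0) for p in parts) ** (1.0 / len(parts))


def best_split(word, counts, min_len=2):
    """Pick the highest-scoring candidate; fall back to the unsplit word."""
    candidates = candidate_splits(word, counts, min_len) or [[word]]
    return max(candidates, key=lambda parts: geometric_mean_score(parts, counts))


def evaluate(predicted, gold):
    """Word-level Precision, Recall, and F-measure over split words.

    A predicted split counts as correct only if it matches the gold parts exactly.
    """
    pred_split = {w for w, parts in predicted.items() if len(parts) > 1}
    gold_split = {w for w, parts in gold.items() if len(parts) > 1}
    correct = {w for w in pred_split & gold_split if predicted[w] == gold[w]}
    precision = len(correct) / len(pred_split) if pred_split else 0.0
    recall = len(correct) / len(gold_split) if gold_split else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy, made-up corpus counts over transliterated forms (illustration only).
toy_counts = {"raja": 50, "putra": 40, "rajaputra": 2}
print(best_split("rajaputra", toy_counts))  # ['raja', 'putra']
```

Here the unsplit reading `rajaputra` scores 2, while `raja` + `putra` scores about 44.7, so the split wins. Other rankings (for example, preferring fewer parts or the most frequent single part) can be swapped in by changing only `geometric_mean_score`.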