Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

Word transduction for addressing the OOV problem in machine translation for similar resource-scarce languages

Abstract

Similar languages have a large number of cognate words, which can be exploited to deal with the problem of Out-Of-Vocabulary (OOV) words. This problem is especially severe for resource-scarce languages. We propose a method for 'word transduction' to address this problem. We take advantage of the fact that, although it is difficult to prepare a sentence-aligned parallel corpus for such languages, it is much easier to prepare a 'parallel' list of word pairs that are cognates and have similar pronunciations. We can try to learn the pronunciations (or orthographic representations) of OOV words from such a parallel list. This can be done using phrase-based machine translation (PBMT). We show that, for a small amount of data, a model based on weighted rewrite rules for phoneme chunks outperforms a PBMT-based approach. An additional point we make is that word transduction can also be used to borrow words from another similar language and adapt them to the phonology of the target language. © 2017 Association for Computational Linguistics.
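The abstract does not spell out how weighted rewrite rules for phoneme chunks are applied to a word. As a rough illustration only, here is a minimal sketch that assumes a hand-written table mapping source chunks (plain letters stand in for phonemes) to target chunks with probability-like weights, and picks the highest-scoring segmentation and rewrite of a word by dynamic programming. The rule table, the `transduce` function, its `max_chunk` and `copy_weight` parameters, and the toy rules are all hypothetical. In the paper the rules would be learned from the parallel cognate list, and this is not the authors' model.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical rule table: source chunk -> candidate (target chunk, weight) pairs.
# Weights are treated as probabilities in (0, 1]; higher means more preferred.
Rules = Dict[str, List[Tuple[str, float]]]


def transduce(word: str, rules: Rules, max_chunk: int = 3,
              copy_weight: float = 0.1) -> Tuple[str, float]:
    """Rewrite `word` chunk by chunk; return the best output and its log-score.

    Dynamic programming over segmentations: best[j] holds the highest
    log-score (and output string) for rewriting the prefix word[:j].
    """
    n = len(word)
    best: List[Tuple[float, str]] = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for i in range(n):
        score_i, out_i = best[i]
        if score_i == -math.inf:
            continue  # prefix word[:i] could not be rewritten
        for j in range(i + 1, min(n, i + max_chunk) + 1):
            chunk = word[i:j]
            candidates = rules.get(chunk, [])
            if not candidates and j == i + 1:
                # Unknown single symbol: copy it unchanged, with a penalty weight.
                candidates = [(chunk, copy_weight)]
            for target, weight in candidates:
                # Log space turns the product of rule weights into a sum.
                score = score_i + math.log(weight)
                if score > best[j][0]:
                    best[j] = (score, out_i + target)
    score, output = best[n]
    return output, score


# Toy, made-up rules (not real correspondences between any two languages).
toy_rules: Rules = {
    "sh": [("s", 0.9)],
    "s": [("s", 0.8)],
    "h": [("h", 0.8)],
    "a": [("a", 1.0)],
    "v": [("b", 0.7), ("v", 0.3)],
}

if __name__ == "__main__":
    print(transduce("shava", toy_rules)[0])  # saba
```

Here the two-letter chunk "sh" beats rewriting "s" and "h" separately (0.9 versus 0.8 × 0.8), and "v" takes its higher-weighted target "b". Because every single symbol has at least a copy option, every input receives some output; a model trained on a cognate list would presumably replace the toy table with estimated weights.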
