Performance evaluation of English to Sanskrit machine translation system
Abstract
The performance evaluation of machine translation (MT) has proven to be a very difficult problem. Automatic MT evaluation methods have become very popular, viz. bilingual evaluation understudy (BLEU), unigram precision, unigram recall, F-measure and the METEOR score. BLEU is a metric based on n-gram co-occurrence; precision, recall and F-measure are based on unigram matches; while the METEOR score is based on explicit word-to-word matches between the translation and a reference translation (a human judgement). In this paper, we evaluate our English to Sanskrit MT (EST) system with these MT evaluation methods and propose weighted BLEU, weighted unigram precision, weighted unigram recall, weighted F-measure and weighted METEOR scores, which are based on assigning weights to the different parts-of-speech (POS) in the translation and the reference translation. The proposed weighted variants improve the scores of these MT evaluation methods. The performance results of our EST system, with and without the weighting in the above MT evaluation methods, are presented in tabular form. Copyright © 2012 Inderscience Enterprises Ltd.
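As background to the metrics named in the abstract, unigram precision, recall and F-measure between a candidate translation and a reference can be sketched as below. This is a minimal illustration of the standard unigram-match definitions, not the authors' implementation, and it omits the POS-based weighting the paper proposes (in the weighted variants, each match would be scaled by a weight assigned to its part-of-speech).

```python
from collections import Counter

def unigram_scores(candidate, reference):
    """Unigram precision, recall and F-measure between a candidate
    translation and a single reference translation."""
    cand = candidate.split()
    ref = reference.split()
    # Clipped unigram matches: each reference token can be matched at most
    # as many times as it occurs in the reference.
    matches = sum((Counter(cand) & Counter(ref)).values())
    precision = matches / len(cand) if cand else 0.0
    recall = matches / len(ref) if ref else 0.0
    # Balanced F-measure (harmonic mean of precision and recall).
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return precision, recall, f_measure

p, r, f = unigram_scores("the cat sat on the mat",
                         "the cat is on the mat")
# 5 of 6 candidate unigrams match the reference, so p = r = f = 5/6.
```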