Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

Hierarchical self attention based sequential labelling model for Bhojpuri, Maithili and Magahi languages

dc.contributor.author: Mundotiya, Rajesh Kumar
dc.contributor.author: Mishra, Swasti
dc.contributor.author: Singh, Anil Kumar
dc.date.accessioned: 2023-04-18T07:50:09Z
dc.date.available: 2023-04-18T07:50:09Z
dc.date.issued: 2022-10
dc.description: This paper was submitted by the author from IIT (BHU), Varanasi [en_US]
dc.description.abstract: Sequential labelling plays a vital role in numerous Natural Language Processing (NLP) applications, such as Machine Translation and Information Extraction. One such task is Part-of-Speech (POS) tagging, which assigns a sequence of grammatical categories to the words of a given sentence; another is Chunking, which groups the words into ‘chunks’, or what can be called minimal phrases. Bhojpuri, Maithili and Magahi are low-resource languages of the Indo-Aryan family, widely spoken in central north-eastern India. Creating an annotated corpus for POS tagging and Chunking, and then building initial automatic tools for these problems, is the first attempt towards building language technology tools for these languages. The annotated corpus is used to develop POS Taggers and Chunkers based on various machine learning algorithms (TnT, CRF, MEMM and Structured SVM) and the state-of-the-art LSTM-CNN-CRF model. Their results are then compared with those of two newly proposed deep learning-based models, Self-Attention Hierarchical Bi-LSTM CRF (SAHBiLC) and a fine-tuned version of it, Fine-SAHBiLC. The SAHBiLC and Fine-SAHBiLC models outperform the other models on Bhojpuri (accuracy of 0.86 for POS tagging and 0.94 for Chunking), Maithili (0.86 and 0.95, respectively) and Magahi (0.86 for POS tagging). [en_US]
dc.identifier.issn: 1319-1578
dc.identifier.uri: https://idr-sdlib.iitbhu.ac.in/handle/123456789/2081
dc.language.iso: en [en_US]
dc.publisher: King Saud bin Abdulaziz University [en_US]
dc.relation.ispartofseries: Journal of King Saud University - Computer and Information Sciences; Volume 34, Issue 10, Pages 8739-8749
dc.subject: Chunking [en_US]
dc.subject: Datasets [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Neural network [en_US]
dc.subject: POS tagging [en_US]
dc.subject: Transfer learning [en_US]
dc.title: Hierarchical self attention based sequential labelling model for Bhojpuri, Maithili and Magahi languages [en_US]
dc.type: Article [en_US]

Files

Original bundle

Name: Hierarchical self attention based sequential labelling model.pdf
Size: 2.82 MB
Format: Adobe Portable Document Format
Description: Article - Gold Open Access

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission