Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging

Mundotiya, Rajesh Kumar; Mehta, Arpit; Baruah, Rupjyoti; Singh, Anil Kumar

dc.contributor.author	Mundotiya, Rajesh Kumar
dc.contributor.author	Mehta, Arpit
dc.contributor.author	Baruah, Rupjyoti
dc.contributor.author	Singh, Anil Kumar
dc.date.accessioned	2023-04-19T05:21:34Z
dc.date.available	2023-04-19T05:21:34Z
dc.date.issued	2022-10
dc.description	This paper was submitted by an author from IIT (BHU), Varanasi, India	en_US
dc.description.abstract	Part-of-Speech (POS) tagging is a fundamental sequence labeling problem in Natural Language Processing. Recent deep learning sequential models combine forward and backward word information for POS tagging. Information from the words surrounding the current word plays a vital role in capturing non-continuous relationships. We propose Monotonic chunk-wise attention with CNN-GRU-Softmax (MCCGS), a deep learning architecture that takes this essential information into account. The architecture has three core components: an Input Encoder (IE), which encodes word- and character-level information; a Contextual Encoder (CE), which assigns weightage to adjacent words; and a Disambiguator (D), which resolves intra-label dependencies. Moreover, different morphological features have been integrated into the core components of the MCCGS architecture, giving the variants MCCGS-IE, MCCGS-CE and MCCGS-D. The MCCGS architecture is validated on 21 languages from the Universal Dependency (UD) treebank. The state-of-the-art models Type constraints, Retrofitting, Distant Supervision from Disparate Sources and Position-aware Self Attention, along with MCCGS and its variants MCCGS-IE, MCCGS-CE and MCCGS-D, obtain mean accuracies of 83.65%, 81.29%, 84.10%, 90.18%, 90.40%, 91.40%, 90.90% and 92.30%, respectively. The proposed architecture provides state-of-the-art accuracy on low-resource languages: Marathi (93.58%), Tamil (87.50%), Telugu (96.69%) and Sanskrit (97.28%) from the UD treebank, and Hindi (95.64%) and Urdu (87.47%) from the Hindi-Urdu multi-representational treebank.	en_US
dc.description.sponsorship	Science and Engineering Research Board, IIT (BHU), Varanasi, India	en_US
dc.identifier.issn	13191578
dc.identifier.uri	https://idr-sdlib.iitbhu.ac.in/handle/123456789/2102
dc.language.iso	en_US	en_US
dc.publisher	King Saud bin Abdulaziz University	en_US
dc.relation.ispartofseries	Journal of King Saud University - Computer and Information Sciences;Volume 34, Issue 9, Pages 7324 - 7334
dc.subject	Morphological features	en_US
dc.subject	Part of Speech tagging	en_US
dc.subject	Convolutional neural network	en_US
dc.subject	Attention mechanism	en_US
dc.title	Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging	en_US
dc.type	Article	en_US

Files

Original bundle

Name:: 1-s2.0-S1319157821002263-main.pdf
Size:: 1.26 MB
Format:: Adobe Portable Document Format
Description:: Article - Gold Open Access

License bundle

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed upon to submission
Description:

Collections

Department of Computer Science and Engineering