Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging

dc.contributor.author: Mundotiya, Rajesh Kumar
dc.contributor.author: Mehta, Arpit
dc.contributor.author: Baruah, Rupjyoti
dc.contributor.author: Singh, Anil Kumar
dc.date.accessioned: 2023-04-19T05:21:34Z
dc.date.available: 2023-04-19T05:21:34Z
dc.date.issued: 2022-10
dc.description [en_US]: This paper was submitted by an author affiliated with IIT (BHU), Varanasi, India.
dc.description.abstract [en_US]: Part-of-Speech (POS) tagging is a fundamental sequence labeling problem in Natural Language Processing. Recent deep learning sequential models combine forward and backward word information for POS tagging. Information from the words surrounding the current word plays a vital role in capturing non-continuous relationships. We propose Monotonic chunk-wise attention with CNN-GRU-Softmax (MCCGS), a deep learning architecture that captures this essential information. Its core components are an Input Encoder (IE), which encodes each word at the word and character levels; a Contextual Encoder (CE), which assigns weights to adjacent words; and a Disambiguator (D), which resolves intra-label dependencies. Moreover, different morphological features are integrated into each core component, yielding the variants MCCGS-IE, MCCGS-CE and MCCGS-D. The MCCGS architecture is evaluated on 21 languages from the Universal Dependencies (UD) treebanks. The state-of-the-art baselines (Type Constraints, Retrofitting, Distant Supervision from Disparate Sources, and Position-aware Self Attention), MCCGS, and its variants MCCGS-IE, MCCGS-CE and MCCGS-D obtain mean accuracies of 83.65%, 81.29%, 84.10%, 90.18%, 90.40%, 91.40%, 90.90% and 92.30%, respectively. The proposed architecture achieves state-of-the-art accuracy on low-resource languages such as Marathi (93.58%), Tamil (87.50%), Telugu (96.69%) and Sanskrit (97.28%) from the UD treebanks, and Hindi (95.64%) and Urdu (87.47%) from the Hindi-Urdu multi-representational treebank.
dc.description.sponsorship [en_US]: Science and Engineering Research Board, IIT (BHU), Varanasi, India
dc.identifier.issn: 1319-1578
dc.identifier.uri: https://idr-sdlib.iitbhu.ac.in/handle/123456789/2102
dc.language.iso [en_US]: en_US
dc.publisher [en_US]: King Saud University
dc.relation.ispartofseries: Journal of King Saud University - Computer and Information Sciences; Volume 34, Issue 9, Pages 7324-7334
dc.subject [en_US]: Morphological features
dc.subject [en_US]: Part of Speech tagging
dc.subject [en_US]: Convolutional neural network
dc.subject [en_US]: Attention mechanism
dc.title [en_US]: Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging
dc.type [en_US]: Article
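
The abstract says only that the Contextual Encoder (CE) uses monotonic chunk-wise attention to assign weights to adjacent words. The following is a minimal Python sketch of that general technique. It assumes the test-time (hard) form of monotonic chunkwise attention (MoChA) as described by Chiu and Raffel and is not the MCCGS authors' implementation. How the energies are computed from word encodings, the chunk width, and all function names and toy values (`mocha_hard_step`, `context_vector`, `chunk_width`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (energy -> selection probability)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of energies."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mocha_hard_step(
    monotonic_energy: Sequence[float],
    chunk_energy: Sequence[float],
    start: int,
    chunk_width: int,
) -> Tuple[Optional[int], List[float]]:
    """Hypothetical helper: one test-time step of monotonic chunkwise attention.

    Returns the selected memory index (or None) and attention weights
    over the whole memory.
    """
    n = len(monotonic_energy)
    for j in range(start, n):
        # Monotonic part: scan rightward from `start` and stop at the first
        # entry whose selection probability reaches the 0.5 threshold.
        if sigmoid(monotonic_energy[j]) >= 0.5:
            # Chunkwise part: soft attention over at most `chunk_width`
            # entries ending at the selected index j.
            lo = max(0, j - chunk_width + 1)
            weights = [0.0] * n
            weights[lo : j + 1] = softmax(chunk_energy[lo : j + 1])
            return j, weights
    # Nothing selected: every weight is zero (empty context).
    return None, [0.0] * n


def context_vector(
    weights: Sequence[float], values: Sequence[Sequence[float]]
) -> List[float]:
    """Weighted sum of memory vectors (e.g. encoded neighbouring words)."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    # Toy memory: five 2-dimensional word encodings (hypothetical values).
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
    mono = [-2.0, -1.0, 0.5, 3.0, -1.0]  # hypothetical monotonic energies
    chunk = [0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical chunk energies
    j, w = mocha_hard_step(mono, chunk, start=0, chunk_width=2)
    print(j, w, context_vector(w, values))
    # prints: 2 [0.0, 0.5, 0.5, 0.0, 0.0] [0.5, 1.0]
```

Because each step resumes scanning from the previously selected index, attention can only stay put or move rightward, which is what makes it monotonic. The chunk then spreads weight over a few adjacent words rather than a single one. During training, MoChA replaces the hard scan with its expected (soft) value so that the model stays differentiable; that part is not sketched here.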

Files

Original bundle

Name: 1-s2.0-S1319157821002263-main.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format
Description: Article - Gold Open Access

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission