Repository logo
Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

Advancing Language Identification in Code-Mixed Tulu Texts: Harnessing Deep Learning Techniques

Loading...
Thumbnail Image

Date

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

This study focuses on the task of word-level language identification in code-mixed Tulu-English texts, which is crucial for addressing the linguistic diversity observed on social media platforms. The CoLI-Tunglish shared task served as a platform for multiple teams to tackle this challenge, aiming to enhance our understanding of and capabilities in handling code-mixed language data. To tackle this task, we employed a methodology that leveraged Multilingual BERT (mBERT) for word embedding and a Bi-LSTM model for sequence representation. Our system achieved a Precision score of 0.74, indicating accurate language label predictions. However, our Recall score of 0.571 suggests the need for improvement, particularly in capturing all language labels, especially in multilingual contexts. The resulting F1 score, a balanced measure of our system’s performance, stood at 0.602, indicating a reasonable overall performance. Ultimately, our work contributes to advancing language understanding in multilingual digital communication. © 2023 Copyright for this paper by its authors.

Description

Keywords

Citation

Collections

Endorsement

Review

Supplemented By

Referenced By