Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

Classification of Hindi Compound Nouns Using Machine Learning

dc.contributor.author: Dwivedi V.; Ghosh S.
dc.date.accessioned: 2025-05-23T11:24:20Z
dc.description.abstract: This work is a preliminary attempt towards the automatic interpretation of two-word, domain-independent Hindi noun compounds using machine learning techniques. It is part of ongoing work on the interpretation of Hindi compound nouns and, to the best of our knowledge, the first of its kind for Hindi. We collected a dataset of 1500 noun compounds from the multi-word expression list provided on the website of the Centre for Indian Language Technology, IIT Bombay. A relation set for annotating these compounds was prepared, drawing on previous work on English as well as on the patterns of occurrence of the compounds across the entire list. Features were extracted for the relations found in this dataset. We calculated inter-annotator agreement using the kappa coefficient to check the homogeneity of our relation set. We used Support Vector Machine (SVM) and Random Forest models to classify these relations, with 900 compounds for training and 600 for testing, and obtained accuracies of 35.3% and 45.6%, respectively. The most frequent relations in the dataset were Purpose, Modifier, and Topic, and we again used SVM and Random Forest models to identify these three relations. The results show that negative predictions are more accurate than positive predictions. Future work will apply more machine learning techniques and incorporate multi-label classification algorithms to achieve more accurate predictions. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021.
dc.identifier.doi: https://doi.org/10.1007/s42979-021-00895-z
dc.identifier.uri: http://172.23.0.11:4000/handle/123456789/9969
dc.relation.ispartofseries: SN Computer Science
dc.title: Classification of Hindi Compound Nouns Using Machine Learning
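The abstract reports inter-annotator agreement via a kappa coefficient and, for the three most frequent relations, compares the accuracy of positive and negative predictions. The sketch below shows how these two quantities are commonly computed. It assumes two annotators (hence Cohen's kappa rather than Fleiss' kappa) and reads "accuracy of positive/negative predictions" as positive predictive value (precision) versus negative predictive value when one relation is treated as positive against all others. The function names `cohen_kappa` and `predictive_values` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items (assumed setup)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def predictive_values(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable
) -> Tuple[float, float]:
    """(PPV, NPV) for one relation treated as positive vs. all other relations."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Precision of positive and of negative predictions; 0.0 if none were made.
    ppv = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    annotator_1 = ["Purpose", "Purpose", "Topic", "Modifier", "Topic", "Purpose"]
    annotator_2 = ["Purpose", "Topic", "Topic", "Modifier", "Topic", "Modifier"]
    print(round(cohen_kappa(annotator_1, annotator_2), 2))  # 0.52

    gold = ["Purpose", "Topic", "Purpose", "Modifier", "Topic", "Topic"]
    predicted = ["Purpose", "Purpose", "Topic", "Modifier", "Topic", "Topic"]
    print(predictive_values(gold, predicted, "Purpose"))  # (0.5, 0.75)
```

For these toy inputs the script prints 0.52 and (0.5, 0.75). A kappa near 0 means agreement no better than chance; if more than two annotators were involved, Fleiss' kappa would be the usual choice instead.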
