A language identification method applied to Twitter data
| dc.contributor.author | Singh A.K.; Goyal P. | |
| dc.date.accessioned | 2025-05-24T09:20:44Z | |
| dc.description.abstract | This paper presents the results of experiments on using a simple algorithm, aided by a few heuristics, for language identification on Twitter data. The experiments were part of a shared task focused on this problem. The core algorithm is an n-gram based distance metric algorithm that has previously been shown to work very well on normal (non-Twitter) text. The distance metric used is symmetric cross entropy. | |
| dc.identifier.doi | DOI not available | |
| dc.identifier.uri | http://172.23.0.11:4000/handle/123456789/14362 | |
| dc.relation.ispartofseries | CEUR Workshop Proceedings | |
| dc.title | A language identification method applied to Twitter data | |
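
The abstract describes the core method as an n-gram based distance metric using symmetric cross entropy. The record does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: character n-grams of length 1 to 3, the symmetric measure taken as H(p, q) + H(q, p), and a small floor probability for unseen n-grams. The function names, the toy training sentences, and the `floor` parameter are all hypothetical, and the heuristics mentioned in the abstract are not modelled.

```python
import math
from collections import Counter


def char_ngrams(text, n_max=3):
    """Count character n-grams of length 1..n_max (n_max=3 is an assumption)."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def to_probs(counts):
    """Turn raw n-gram counts into relative frequencies."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def symmetric_cross_entropy(p, q, floor=1e-6):
    """Return H(p, q) + H(q, p).

    Unseen n-grams receive the probability `floor`; this smoothing choice
    is an assumption of the sketch, not taken from the paper.
    """
    h_pq = -sum(pv * math.log(q.get(g, floor)) for g, pv in p.items())
    h_qp = -sum(qv * math.log(p.get(g, floor)) for g, qv in q.items())
    return h_pq + h_qp


def identify(text, models):
    """Pick the language whose n-gram model is closest to the text."""
    probs = to_probs(char_ngrams(text.lower()))
    return min(models, key=lambda lang: symmetric_cross_entropy(probs, models[lang]))


# Toy training data, purely illustrative.
training = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato",
}
models = {lang: to_probs(char_ngrams(text)) for lang, text in training.items()}

print(identify("the dog and the fox", models))
print(identify("el perro y el gato", models))
```

A real system would train each language model on a much larger corpus; the lowest distance then selects the predicted language, as in `identify` above.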