A language identification method applied to Twitter data

dc.contributor.author	Singh A.K.; Goyal P.
dc.date.accessioned	2025-05-24T09:20:44Z
dc.description.abstract	This paper presents the results of experiments on using a simple algorithm, aided by a few heuristics, for language identification on Twitter data. These experiments were part of a shared task focused on this problem. The core algorithm is an n-gram based distance metric algorithm that has previously been shown to work very well on normal text. The distance metric used is symmetric cross entropy.
dc.identifier.doi	DOI not available
dc.identifier.uri	http://172.23.0.11:4000/handle/123456789/14362
dc.relation.ispartofseries	CEUR Workshop Proceedings
dc.title	A language identification method applied to Twitter data
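
The abstract names the core technique (n-gram profiles compared with symmetric cross entropy) but this record gives no further detail. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes character n-grams of length 1 to 3, defines symmetric cross entropy as H(p, q) + H(q, p), and uses a fixed floor probability for unseen n-grams. The function names, the `floor` value, and the n-gram range are all hypothetical choices made for illustration; the heuristics mentioned in the abstract are not modelled.

```python
import math
from collections import Counter


def ngram_profile(text, n_max=3):
    """Relative frequencies of character n-grams of length 1..n_max (assumed range)."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def cross_entropy(p, q, floor=1e-6):
    """H(p, q) = -sum_x p(x) * log q(x).

    N-grams missing from q get a small floor probability (an assumed
    smoothing choice) so that the logarithm stays defined.
    """
    return -sum(px * math.log(q.get(gram, floor)) for gram, px in p.items())


def symmetric_cross_entropy(p, q):
    """Symmetric distance between two profiles: H(p, q) + H(q, p)."""
    return cross_entropy(p, q) + cross_entropy(q, p)


def identify(text, profiles):
    """Return the label whose training profile is closest to the text's profile."""
    target = ngram_profile(text)
    return min(
        profiles,
        key=lambda label: symmetric_cross_entropy(target, profiles[label]),
    )
```

In use, one profile would be built per language from training text, for example `profiles = {"en": ngram_profile(english_text), "hi": ngram_profile(hindi_text)}`, and each tweet passed to `identify(tweet, profiles)`. Short, noisy tweets produce sparse profiles, which is presumably where additional heuristics would help.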
