Defense response of search engine websites to non cooperating crawlers

Dev Chandna R.; Chaubey P.; Gupta S.C.

doi:https://doi.org/10.1109/WICT.2012.6409078

dc.contributor.author	Dev Chandna R.; Chaubey P.; Gupta S.C.
dc.date.accessioned	2025-05-24T09:15:08Z
dc.description.abstract	Web crawlers that do not cooperate with robots.txt are unwanted by websites because they can have serious negative impacts in terms of denial of service, privacy, and cost. Defense mechanisms such as the Automated Content Access Protocol, CAPTCHAs, web crawler traps, and real-time bot detection have been proposed to protect websites from unwanted crawler access. However, the extent to which these mechanisms are applied in practice against such crawlers is not clearly known. In this paper, we present an investigation that gives insight into the defense mechanisms websites use against robots.txt non-cooperating web crawlers. The investigation is limited to the search engine class of websites. MBot, a self-developed non-cooperating web crawler, is the primary tool used for the investigation. We find that search engine websites do have defense mechanisms to prevent non-cooperating crawlers from accessing them. However, some of the investigated websites showed no defense of any kind against MBot's access. The observed defense mechanisms were also found to be robust to changes in basic network and application parameters such as proxy, port number, user agent, and IP address. © 2012 IEEE.
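The abstract hinges on whether a crawler honors robots.txt, and it lists web crawler traps among the proposed defenses. As an illustrative sketch only (this is not MBot and not the authors' method), Python's standard `urllib.robotparser` shows the check a cooperating crawler performs before each request, and a hypothetical trap path shows one way a site could flag clients that skip that check. The robots.txt content, the `/trap/` path, the `(ip, user_agent, path)` log format, and all function names are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site. /trap/ is an assumed
# crawler-trap path: forbidden by robots.txt and never linked for humans.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /trap/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from text, with no network access."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def cooperating_fetch_allowed(rp: RobotFileParser, user_agent: str, url: str) -> bool:
    """The check a robots.txt-cooperating crawler makes before every request.

    A non-cooperating crawler simply skips this call.
    """
    return rp.can_fetch(user_agent, url)


def flag_trap_visitors(access_log, rp: RobotFileParser, trap_prefix: str = "/trap/") -> set:
    """Return client IPs that requested the (hypothetical) trap path.

    access_log is an iterable of (ip, user_agent, path) tuples (assumed format).
    A crawler that honors robots.txt never reaches a disallowed trap, so a hit
    is treated as evidence of a non-cooperating client.
    """
    flagged = set()
    for ip, user_agent, path in access_log:
        # Confirm the path really is disallowed for this agent before flagging.
        if path.startswith(trap_prefix) and not rp.can_fetch(user_agent, path):
            flagged.add(ip)
    return flagged
```

For example, with `rp = make_parser(ROBOTS_TXT)`, `cooperating_fetch_allowed(rp, "AnyBot", "https://example.com/about")` is `True`, while the same call for `https://example.com/search?q=x` is `False`. A trap only catches crawlers that ignore robots.txt; it says nothing about other evasion such as rotating the proxy, port number, user agent, or IP address, which is why the abstract also examines robustness to those parameters.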
dc.identifier.doi	https://doi.org/10.1109/WICT.2012.6409078
dc.identifier.uri	http://172.23.0.11:4000/handle/123456789/13539
dc.relation.ispartofseries	Proceedings of the 2012 World Congress on Information and Communication Technologies, WICT 2012
dc.title	Defense response of search engine websites to non cooperating crawlers
