Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

Defense response of search engine websites to non cooperating crawlers

dc.contributor.author: Dev Chandna R.; Chaubey P.; Gupta S.C.
dc.date.accessioned: 2025-05-24T09:15:08Z
dc.description.abstract: Web crawlers that do not comply with robots.txt are unwanted by any website, as they can cause serious harm in terms of denial of service, privacy, and cost. Defense mechanisms such as the Automated Content Access Protocol, CAPTCHA, web crawler traps, and real-time bot detection have been proposed to protect websites from unwanted crawler access. However, the extent to which these mechanisms are applied in practice against such crawlers is not clearly known. In this paper we present an investigation into the defense mechanisms that websites use against web crawlers that ignore robots.txt. The investigation is limited to the search engine class of websites. MBot, a self-developed non-cooperating web crawler, is the primary tool used for the investigation. We find that search engine websites do have defense mechanisms to prevent access by non-cooperating crawlers. However, some of the investigated websites showed no defense of any kind against MBot's access. Robustness of the observed defense mechanisms to basic network and application parameters such as proxy, port number, user agent, and IP address is also observed. © 2012 IEEE.
dc.identifier.doi: https://doi.org/10.1109/WICT.2012.6409078
dc.identifier.uri: http://172.23.0.11:4000/handle/123456789/13539
dc.relation.ispartofseries: Proceedings of the 2012 World Congress on Information and Communication Technologies, WICT 2012
dc.title: Defense response of search engine websites to non cooperating crawlers
