Unwanted crawler indexing my site

Obsessing as I do over my log files, I noticed a crawler indexing every page on my blog. Nothing unusual there, except that it was a crawler I had never seen before and it was working through my site rapidly. Its user-agent string contained a URL, so off I went to find out more. Here’s what they say:

LiteFinder Network Crawler is a research project started by a group of Indian candidates from the cities of Bangalore, Patna and Jaipur. The project serves as a testing ground for information search technologies and programs, developed by a group of young scientists.

A quick look at their main site shows that their “information search technologies and programs” amount to a portal page made up entirely of search links for online casinos, porn sites, prescription drugs and the like. In fact, it is exactly the sort of stuff that every attempt to spam this blog tries to link to.

Continuing to read through their info page I discovered this nugget:

Can I learn the IP addresses, which LiteFinder Network Crawler comes from?
Unfortunately, You can’t since it is against the rules of our company.

And this one takes the biscuit:

Does LiteFinder Network Crawler accept the directives from robots.txt file?
LiteFinder Network Crawler can recognize the directives from robots.txt files only partially, which is the result of the scantiness of our resources. Full support of robots.txt will be launched soon.

They can go and take a running jump, because I have just made it one of my rules not to allow them access to my site. So, in the interests of impartiality, I’m going to give you the information they won’t.

Their .com and .net domains are registered through a proxy registrar. The .com resolves to 216.195.33.107 and the .net to 64.28.181.194, both in co-location facilities in the U.S.

The crawler came from 216.40.220.18, which is again in a co-location facility in the U.S., most likely the same one that hosts the .com. So much for the scantiness of their resources. Their crawler also uses 216.40.222.50, 75.125.18.178 and 76.53.249.34. All of these IP addresses appear to be assigned to ev1servers.net (theplanet.com), with the exception of 64.28.181.194, which is assigned to cernel.net, and 76.53.249.34, which belongs to a broadband user in the U.S.

It might be in one’s best interests to block those IP addresses. If I come across any more, I will post them here.

Updated IPs as I see them:

208.101.44.3 – Softlayer.com
216.40.222.98 – Theplanet.com

74.86.209.74 – courtesy of Jason (see comments)
67.19.250.26 – courtesy of Jason (see comments)
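For anyone who wants to act on the advice to block these addresses, here is a minimal sketch of an .htaccess fragment. It assumes an Apache server with mod_rewrite enabled and .htaccess overrides allowed, and it uses the same mod_rewrite mechanism as the user-agent rule in the comments, matching on the client address (%{REMOTE_ADDR}) instead. The list covers only the crawler addresses named in this post, not the .com and .net hosting addresses, and would need extending as new ones turn up.

```apache
# Sketch only: return 403 Forbidden to the LiteFinder crawler addresses listed in this post
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^216\.40\.220\.18$ [OR]
RewriteCond %{REMOTE_ADDR} ^216\.40\.222\.50$ [OR]
RewriteCond %{REMOTE_ADDR} ^216\.40\.222\.98$ [OR]
RewriteCond %{REMOTE_ADDR} ^75\.125\.18\.178$ [OR]
RewriteCond %{REMOTE_ADDR} ^76\.53\.249\.34$ [OR]
RewriteCond %{REMOTE_ADDR} ^208\.101\.44\.3$ [OR]
RewriteCond %{REMOTE_ADDR} ^74\.86\.209\.74$ [OR]
# The last condition has no [OR], which closes the chain
RewriteCond %{REMOTE_ADDR} ^67\.19\.250\.26$
RewriteRule .* - [F]
```

Bear in mind that addresses get reassigned, and 76.53.249.34 in particular is a broadband connection, so a list like this can eventually catch innocent visitors. Blocking on the user agent avoids that problem for as long as the crawler keeps identifying itself.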

Nov 5th, 2007 | Posted in Blog, Networking, Scams, Spam, Technical, Web
  1. Jason Beam
    Nov 7th, 2007 at 23:52 | #1

    We get these little buggers, too. Two more IPs:

    74.86.209.74
    67.19.250.26

  2. Nov 8th, 2007 at 01:56 | #2

    Jason,

    You are a star.

    Thank you for that, much appreciated.

  3. Dec 5th, 2007 at 00:08 | #3

    We’ve been fighting them for the past two weeks or so. Here are the IPs we have:
    209.62.109.178
    74.86.209.74
    74.53.249.34
    74.53.243.242
    74.53.243.226
    74.53.244.18
    87.118.118.111
    216.40.222.98

    While they have such “scanty” resources, they seem to have a lot of servers, and I almost wonder whether they are zombies doing the dirty work. While they go to a lot of trouble hiding their IPs, their user agent never changes. It’s trivial to construct a mod_rewrite rule to block them no matter what IP they use:

    RewriteEngine On
    # Return 403 Forbidden to any request whose User-Agent contains "LiteFinder"
    RewriteCond %{HTTP_USER_AGENT} LiteFinder
    RewriteRule .* - [F]

    As a matter of principle, I emailed LiteFinder and got no response. I also emailed the abuse department at their ISP, theplanet.com, again with no response. I work for a company that has a paid legal team; they will soon be getting letters “convincing” them to stop.

    Dirtbags.

  4. Dec 5th, 2007 at 00:56 | #4

    Hi Justin,

    Thank you for those IPs. I hope you have some success against them. And thank you too for the .htaccess entry; it is very much appreciated.

    My theory is that they specialise in indexing blogs and then trying to spam those blogs that meet whatever criteria they are looking for. They seem to focus exclusively on blogs.

    I’ve tried to make contact with their ISP too but to no avail. Still I hope that you have better luck than I did. Thanks again for the info.

  5. Andreas
    Dec 11th, 2007 at 08:24 | #5

    Hello,

    74.86.209.74 is a harvester: it collects email addresses from web sites.

    It visited us at “http://www.abx.de/error.php” on 6 November, and we showed it the email address ruouenigg.esiaduoesp@65784563.abx.de.

    Today, 11 December, we received the first spam sent to this email address.

    Andreas.

  6. Dec 11th, 2007 at 11:09 | #6

    Hi Andreas,

    Thanks for the suggestion. I will keep an eye out for that address.
