Sunday, 21 June 2009

Blocking Bots

In previous posts relating to my LAMP server (see tags) I've described how I use PHP to write iptables rules so that I can keep various ports closed when I'm not actually using them. I also use it to block IP addresses from which bad bots are operating. This technique is discussed and documented on various other sites, so I'm not going to give full details here. In a nutshell, though, my scheme involves:

1. Using robots.txt to designate a directory into which bots should not go.

2. Inserting hidden links in various documents that point to a file in the above directory. Humans will not see these links (because they are invisible), and well-behaved robots (which pay attention to robots.txt) will not follow them either. Thus the only things that will follow those links are bad bots.

3. My hidden links point to a PHP script which writes a new iptables rule, thereby blocking the bot. It also makes an entry in a database recording the time and the request that caused the bot to be blocked.

4. Bots will often operate using addresses from pools that are used legitimately at other times, so it's important to release the blocked addresses after a suitable period. I have a cron job to run a script that checks the database and releases anything that has been blocked for longer than a specified period.
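The block-and-release bookkeeping in steps 3 and 4 can be sketched as follows. This is a Python illustration, not the PHP script described above. The table layout, the one-week `BLOCK_SECONDS` value and the choice of `iptables -I INPUT -s <ip> -j DROP` as the blocking rule are all assumptions. The functions only return the iptables commands. Actually running them needs root privileges, which this sketch does not handle.

```python
import sqlite3
import time

# Hypothetical release period; the post only says "a suitable period".
BLOCK_SECONDS = 7 * 24 * 60 * 60


def open_db(path=":memory:"):
    """Open (or create) a hypothetical table of blocked addresses."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blocked ("
        " ip TEXT PRIMARY KEY, request TEXT, blocked_at REAL)"
    )
    return db


def block_command(ip):
    # Assumed rule: drop all inbound traffic from this address.
    return ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]


def unblock_command(ip):
    # Deletes the matching rule added by block_command.
    return ["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"]


def record_block(db, ip, request, now=None):
    """Record a trapped address; return the iptables command to run, or None."""
    now = time.time() if now is None else now
    if db.execute("SELECT 1 FROM blocked WHERE ip = ?", (ip,)).fetchone():
        return None  # already blocked: don't add a duplicate rule
    db.execute(
        "INSERT INTO blocked (ip, request, blocked_at) VALUES (?, ?, ?)",
        (ip, request, now),
    )
    db.commit()
    return block_command(ip)


def release_expired(db, now=None, max_age=BLOCK_SECONDS):
    """Cron-job side: forget expired entries and return the unblock commands."""
    now = time.time() if now is None else now
    cutoff = now - max_age
    rows = db.execute(
        "SELECT ip FROM blocked WHERE blocked_at < ?", (cutoff,)
    ).fetchall()
    db.execute("DELETE FROM blocked WHERE blocked_at < ?", (cutoff,))
    db.commit()
    return [unblock_command(ip) for (ip,) in rows]
```

Refusing to record an address twice keeps the database and the firewall in step. Without that check, a second hit would add a duplicate iptables rule that the release job would never remove.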

This has been working well for some time. More recently, however, I've been seeing something else in my server reports that I wanted to deal with: attempts to access URLs such as these:

//Admin//scripts/setup.php
//MyAdmin//scripts/setup.php
//admin//scripts/setup.php
//phpMyAdmin//scripts/setup.php

There's usually a great long list of them trying lots of variations. Something else I've seen quite a bit of is this kind of thing:

/index.php?gzip=0&file=/etc/passwd

Again, they usually attempt the same or similar things with any .php file they can find.

Now, provided that all of your other security is in place, attempts such as these shouldn't be a problem. However, they are clearly attempts to break into the server. As such, whatever is generating them is making a nuisance of itself and should, in my humble opinion, be told to **** off at the first available opportunity. So I've added a few more lines to httpd.conf:

RewriteRule scripts/setup.php /var/www/cgi-bin/ip_blocker.php
RewriteCond %{QUERY_STRING} /etc/passwd
RewriteRule ^(.+) /var/www/cgi-bin/ip_blocker.php

The first line looks for URLs that contain "scripts/setup.php" and internally rewrites them to my blocking script. Obviously, if anything on your server uses that as part of a legitimate request, you need to modify the rule. Nothing on mine does, so I can use it. The next two lines rewrite any request whose query string contains the text "/etc/passwd" to the same script.
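To check which requests the rules above would catch, here is a rough Python model of their matching logic, run against the URLs quoted earlier. It is an approximation, not mod_rewrite itself. It ignores details such as URL decoding. The dot in "setup.php" is escaped here so that it only matches a literal "." (an unescaped "." in a mod_rewrite pattern matches any character). `would_block` is a hypothetical helper name.

```python
import re

# Approximations of the httpd.conf patterns (dot escaped to match a literal ".").
SETUP_PATTERN = re.compile(r"scripts/setup\.php")
PASSWD_PATTERN = re.compile(r"/etc/passwd")


def would_block(path, query=""):
    """Return True if the request would be sent to the blocking script."""
    # First rule: "scripts/setup.php" anywhere in the URL path.
    if SETUP_PATTERN.search(path):
        return True
    # Second and third lines: RewriteCond tests the query string, then
    # RewriteRule ^(.+) matches any non-empty path.
    return bool(PASSWD_PATTERN.search(query)) and len(path) > 0


for url in [
    "//Admin//scripts/setup.php",
    "/index.php?gzip=0&file=/etc/passwd",
    "/index.php?page=1",
]:
    path, _, query = url.partition("?")
    print(url, would_block(path, query))
```

The first two URLs are caught and the third, an ordinary request, is left alone.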


Because I wanted these rules to apply to all of the virtual sites on my server, I've put them in the main server configuration and told each of the virtual sites to inherit them using:

RewriteOptions inherit

Note, however, that although these rules appear before the virtual server directives in httpd.conf, the virtual server's own rules are processed first and the inherited rules are applied after them. It is therefore important that none of the virtual server rules that could match these requests end with [F], because a match there halts the processing of rewrite rules before these ones are run.
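The ordering problem can be illustrated with a deliberately simplified Python model. Virtual server rules run first, inherited rules are appended after them, and a rule flagged [F] stops all further processing. The `Rule` class, the `evaluate` function and the virtual server rule used here are hypothetical. The model ignores RewriteCond, back-references and the other flags.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str          # regex tested against the current URL path
    target: str = ""      # substitution (unused when forbid is True)
    forbid: bool = False  # models the [F] flag


def evaluate(path, vhost_rules, inherited_rules):
    """Simplified model: virtual server rules first, inherited rules after."""
    for rule in vhost_rules + inherited_rules:
        if re.search(rule.pattern, path):
            if rule.forbid:
                # [F] implies [L]: no later rule is evaluated.
                return "403 Forbidden"
            # Without [L], processing continues with the rewritten path.
            path = rule.target
    return path


BLOCKER = "/var/www/cgi-bin/ip_blocker.php"
server_rules = [Rule(r"scripts/setup\.php", BLOCKER)]
vhost_rules = [Rule(r"setup\.php$", forbid=True)]  # hypothetical vhost rule

print(evaluate("//admin//scripts/setup.php", [], server_rules))
print(evaluate("//admin//scripts/setup.php", vhost_rules, server_rules))
```

With no virtual server rules, the request reaches the blocking script. With the [F] rule in place, the bot simply gets a 403 and its address is never blocked.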

Enjoy, unless you're a bot. ;-)