My website is being DoS'ed by Google's web spiders. Google is welcome to index my site, but sometimes it queries a tag cloud on my site faster than my web server can produce the results, making the server run out of resources.
How can I limit access to my webserver in such a way that normal visitors are not affected?
robots.txt is not an option because it would block the whole site from being indexed.
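To illustrate why: a robots.txt strong enough to stop these requests ends up being a blanket block, which is exactly what I don't want (a minimal sketch of what it would look like):

```
# Would stop the overload, but also removes the entire site from Google's index
User-agent: Googlebot
Disallow: /
```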
iptables -m recent is tricky, because some pages contain many images or other data files, and 'recent' triggers on those requests too (typically when my RSS aggregator loads images and feeds).
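For reference, the rule set I tried looked roughly like this (port and thresholds are examples, not my exact values):

```
# Drop a source that opens more than 20 new connections to port 80 in 10 seconds
iptables -A INPUT -p tcp --dport 80 -m conntrack --ctstate NEW \
    -m recent --name web --update --seconds 10 --hitcount 20 -j DROP
# Record every new connection in the 'web' list
iptables -A INPUT -p tcp --dport 80 -m conntrack --ctstate NEW \
    -m recent --name web --set
```

The problem is that every new TCP connection counts, so a browser fetching a page full of images trips the threshold just like the crawler does.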
iptables -m limit has the same disadvantage, and on top of that I wasn't able to make it selective per source IP address.
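The -m limit variant I tried looked roughly like this (values are illustrative); note that the limit counter is shared across all sources rather than tracked per IP, so one aggressive client pushes everyone else into the DROP rule:

```
# Accept new connections to port 80 at an average of 5/second (burst of 20)...
iptables -A INPUT -p tcp --dport 80 -m conntrack --ctstate NEW \
    -m limit --limit 5/second --limit-burst 20 -j ACCEPT
# ...and drop whatever exceeds that rate, regardless of which IP sent it
iptables -A INPUT -p tcp --dport 80 -m conntrack --ctstate NEW -j DROP
```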
How can I limit visitors who cause my server load to rise too high?
I am running apache2 on Ubuntu server in a VirtualBox VM.