As for your original question: Apache on Debian is not built against libwrap, so it will not consult hosts.deny. [The previous answer already mentions it — the reality is that TCP Wrappers is not the epitome of security it was in the 90s, especially when it comes to blacklisting.] To use it, you would have to run Apache not as a daemon but from (x)inetd, which would slow it down considerably.
You can block/allow access at the Apache level, so you do not need TCP Wrappers for Apache [nor iptables, for that matter]. You have not mentioned ssh, but I never leave ssh servers open to the outside directly. Keep reading, though.
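As a minimal sketch of blocking at the Apache level, mod_authz_core can deny a netblock directly (Apache 2.4 syntax; the path and the address range are placeholders — the range is from the RFC 5737 documentation space):

```apache
<Directory "/var/www/html">
    <RequireAll>
        # Allow everyone by default...
        Require all granted
        # ...except this hypothetical abusive netblock
        Require not ip 203.0.113.0/24
    </RequireAll>
</Directory>
```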
We run a virtual host with 300+ domains and have had similar problems with Taobao, Baidu, and often even Google spiders. Baidu spiders in particular can be quite aggressive and persistent.
As you have already figured out, they have farms of servers, and even if you block one IP address, they will reappear shortly from others.
It is not at all practical to maintain lists of IP addresses/netblocks by hand.
What works rather well for us is mod_security permanently blocking known user-agent strings, while mod_evasive temporarily blocks IPs that behave abusively.
This setup, besides slowing down search-engine spiders, has the added advantage of throttling zombies trying to guess passwords on CMSes.
The relevant part of our modsecurity.conf:
SecRule REQUEST_HEADERS:User-Agent "Yandex" phase:1,deny,nolog,id:'6972'
SecRule REQUEST_HEADERS:User-Agent "ichiro" phase:1,deny,nolog,id:'6973'
SecRule REQUEST_HEADERS:User-Agent "Baiduspider" phase:1,deny,nolog,id:'6974'
SecRule REQUEST_HEADERS:User-Agent "Baiduspider/.*" phase:1,deny,nolog,id:'6975'
SecRule REQUEST_HEADERS:User-Agent "Baiduspider-video" phase:1,deny,nolog,id:'6976'
SecRule REQUEST_HEADERS:User-Agent "Baiduspider-image" phase:1,deny,nolog,id:'6977'
SecRule REQUEST_HEADERS:User-Agent "sogou spider" phase:1,deny,nolog,id:'6978'
SecRule REQUEST_HEADERS:User-Agent "YoudaoBot" phase:1,deny,nolog,id:'6979'
SecRule REQUEST_HEADERS:User-Agent "bingbot(at)microsoft.com" phase:1,deny,nolog,id:'6980'
SecRule REQUEST_HEADERS:User-Agent "msnbot(at)microsoft.com" phase:1,deny,nolog,id:'6981'
SecRule REQUEST_HEADERS:User-Agent "BLEXBot/1.0" phase:1,deny,nolog,id:'6982'
SecRule REQUEST_HEADERS:User-Agent "Bot.*" phase:1,deny,nolog,id:'6984'
SecRule REQUEST_HEADERS:User-Agent "AhrefsBot.*" phase:1,deny,nolog,id:'6985'
And our mod-evasive.conf:
DOSHashTableSize 2048
DOSPageCount 10
DOSSiteCount 300
DOSPageInterval 2.0
DOSSiteInterval 1.0
DOSBlockingPeriod 600.0
DOSLogDir /var/log/apache2/evasive
DOSWhitelist 127.0.0.1
DOSWhitelist 1xx.xxx.xxx.xx
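One practical note: make sure the directory named in DOSLogDir exists and is writable by the Apache user, or mod_evasive cannot record its blocks (www-data is the Debian default user; adjust for your distribution):

```shell
mkdir -p /var/log/apache2/evasive
chown www-data:www-data /var/log/apache2/evasive
```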
I almost forgot a very real possibility: if you do not do business with China, or are just running a home server, you can simply block the whole country. The level of attacks and malware coming from there has led many professionals to do exactly that.
http://www.cyberciti.biz/faq/block-entier-country-using-iptables/
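A sketch of how that is usually done, using ipset so iptables only needs a single rule (the set name is arbitrary, and cn.zone is assumed to be a downloaded list of CIDR blocks for the country, one per line — adjust paths and chain to your setup):

```shell
# Create a hash:net set and load the country's netblocks into it
ipset create geoblock hash:net
while read -r net; do
    ipset add geoblock "$net"
done < /etc/ipset/cn.zone

# Drop all inbound traffic whose source matches the set
iptables -I INPUT -m set --match-set geoblock src -j DROP
```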
One footnote to this rather lengthy answer: people at work often suggest using robots.txt for this kind of problem. The point is that robots.txt is merely a suggestion to remote programs. Rogue actors certainly ignore it, and even legitimate crawlers are not guaranteed to honor it nowadays; in our tests, for instance, Baidu does not seem to honor it. (Using robots.txt is tantamount to asking a gangster to please tickle you instead of punching you.)
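For completeness, this is what politely asking Baidu to stay away looks like in robots.txt — a well-behaved crawler will obey it, but nothing enforces it:

```
User-agent: Baiduspider
Disallow: /
```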