I want to test how my site would behave when being spidered. However, I want to exclude all URLs containing the word "page". I tried:
$ wget -r -R "*page*" --spider --no-check-certificate -w 1 http://mysite.com/
The -R flag is supposed to reject any URL matching the pattern, i.e. any URL containing the word "page". Except that it doesn't seem to work:
Spider mode enabled. Check if remote file exists.
--2014-06-10 12:34:56-- http://mysite.com/?sort=post&page=87729
Reusing existing connection to [mysite.com]:80.
HTTP request sent, awaiting response... 200 OK
How do I exclude such URLs from being spidered?
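For reference, the wget manual describes -A/-R as matching against file name suffixes and patterns, which may be why the query string is not being considered. One possibility I'm looking at, assuming a wget new enough to support it (the manual lists --reject-regex from version 1.14), is matching against the whole URL instead:

```shell
# Hypothetical alternative: --reject-regex is applied to the complete URL,
# including the query string, unlike -R which matches only the file name.
wget -r --spider --no-check-certificate -w 1 \
     --reject-regex "page" http://mysite.com/
```

I have not confirmed whether this behaves differently in --spider mode, so corrections are welcome.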