I want to generate a CHM/... e-book by wgetting with a subset condition: download a subset of data recursively in the website that is within HTML class .container for a CHM book. Pseudocode
wget recursively all links of chapters
# TODO returns only index.html wget --random-wait -r -p -nd -e robots=off -A".html" \ -U mozilla https://wwwnc.cdc.gov/travel/yellowbook/2018/table-of-contentsContents in the current main page in
.containerof Fig. 1 and contents in the daughter pages of links.create CHM e-book and/or other format
Fig. 1 Inspection of CDC Yellow Book .container
Output: just index.html
Expected output: e-book CHM and/or other format
Wget Proposals
TimS
wget -w5 --random-wait -r -nd -e robots=off -A".html" -U mozilla https://wwwnc.cdc.gov/travel/yellowbook/2018/table-of-contentsOutput: same as with the first code.
With Rejection List
wget -w5 --random-wait -r -nd -e robots=off -A".html" \ -U mozilla -R css https://wwwnc.cdc.gov/travel/yellowbook/2018/table-of-contentsOutput: same as without rejection lists.
Another variant
wget -w5 --random-wait -r -nd -e robots=off -A".html" \ -U mozilla https://wwwnc.cdc.gov/travel/yellowbook/2018/table-of-contentsOutput: similar as before.
The tool www.html2pdf.it gives
Cannot get http://wwwnc.cdc.gov/travel/yellowbook/2016/table-of-contents: http status code 404
OS: Debian 8.7
