I've found only puf (Parallel URL fetcher) but I couldn't get it to read urls from a file; something like
puf < urls.txt
does not work either.
The operating system installed on the server is Ubuntu.
Using GNU Parallel,
$ parallel -j ${jobs} wget < urls.txt
or xargs from GNU Findutils,
$ xargs -n 1 -P ${jobs} wget < urls.txt
where ${jobs} is the maximum number of wget processes you want to allow to run concurrently (setting -n to 1 gives one wget invocation per line of urls.txt). Without -j/-P, parallel will run as many jobs at a time as there are CPU cores (which doesn't necessarily make sense for wget, which is bound by network IO), and xargs will run one at a time.
One nice feature that parallel has over xargs is keeping the output of the concurrently-running jobs separated, but if you don't care about that, xargs is more likely to be pre-installed.
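As a quick sanity check of the xargs pattern, you can substitute echo for wget so it runs without any network access (the three URLs below are made-up placeholders):

```shell
# Toy run of the same pattern: echo stands in for wget,
# and the example URLs are placeholders.
printf '%s\n' http://example.org/a http://example.org/b http://example.org/c |
    xargs -n 1 -P 2 echo fetching
```

Each input line becomes one `echo fetching <url>` invocation, up to 2 running at a time; with real wget the shape of the command is identical.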
aria2 does this.
http://sourceforge.net/apps/trac/aria2/wiki/UsageExample#Downloadfileslistedinafileconcurrently
Example, reading URLs from a file with up to 5 downloads in parallel (-i is aria2c's --input-file option, -j is --max-concurrent-downloads): aria2c -i urls.txt -j 5
Part of GNU Parallel's man page contains an example of a parallel recursive wget.
https://www.gnu.org/software/parallel/man.html#example-breadth-first-parallel-web-crawler-mirrorer
HTML is downloaded twice: Once for extracting links and once for downloading to disk. Other content is only downloaded once.
If you do not need the recursiveness, ephemient's answer seems the obvious choice.
You can implement that using Python and the pycurl library. The pycurl library has the "multi" interface, which implements its own event loop and enables multiple simultaneous connections.
However, the interface is rather C-like and therefore a bit cumbersome compared to other, more "Pythonic", code.
I wrote a wrapper for it that builds a more complete browser-like client on top of it. You can use that as an example. See the pycopia.WWW.client module. The HTTPConnectionManager wraps the multi interface.
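For reference, here is a minimal sketch of driving the multi interface directly (assuming pycurl is installed; fetch_all is a hypothetical helper name, not part of pycurl or pycopia):

```python
# Sketch: fetch several URLs concurrently via pycurl's CurlMulti.
from io import BytesIO

import pycurl


def fetch_all(urls):
    """Fetch all URLs concurrently; return {url: body_bytes}."""
    multi = pycurl.CurlMulti()
    handles = []
    for url in urls:
        c = pycurl.Curl()
        buf = BytesIO()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEDATA, buf)  # libcurl writes the body here
        multi.add_handle(c)
        handles.append((c, buf, url))

    # Drive the event loop until every transfer completes.
    num_active = len(handles)
    while num_active:
        while True:
            ret, num_active = multi.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM:
                break
        if num_active:
            multi.select(1.0)  # block until some socket is ready

    results = {}
    for c, buf, url in handles:
        results[url] = buf.getvalue()
        multi.remove_handle(c)
        c.close()
    return results
```

This is the C-like part the answer mentions: you hand-roll the perform/select loop yourself instead of getting a high-level API.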
This works, and with proper adjustments won't DoS the local or remote hosts:
(bandwidth=5000 jobs=8; \
parallel \
--round \
-P $jobs \
--nice +5 \
--delay 2 \
--pipepart \
--cat \
-a urls.txt \
wget \
--limit-rate=$((bandwidth/jobs))k \
-w 1 \
-nv \
-i {} \
)
You can also try:
#!/bin/bash
cat urls.txt | xargs -n 1 -P 2 wget -q
or in a loop with wget's -b (background) option, which backgrounds every download at once:
#!/bin/bash
while read -r url; do
wget "${url}" -b
done < urls.txt
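Note that -b puts no cap on concurrency: every wget starts immediately. A sketch of capping it with plain bash job control; here sleep stands in for wget so it runs offline, the printf stands in for urls.txt, and the cap of 3 is arbitrary (wait -n needs bash 4.3+):

```shell
#!/bin/bash
max=3                                  # arbitrary concurrency cap
while read -r url; do
    # real use: wget -q "$url" &
    sleep 0.1 &                        # stand-in for the download
    while [ "$(jobs -rp | wc -l)" -ge "$max" ]; do
        wait -n                        # bash 4.3+: wait for any one job to finish
    done
done < <(printf 'http://example.org/%s\n' a b c d e)   # stand-in for urls.txt
wait                                   # wait for the stragglers
```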