
I'm trying to download a directory using a recursive wget command:

wget -m -nH --cut-dirs=5 https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/   

This works for some of the files, but it also emits a flurry of 403 Forbidden errors such as:

--2023-06-13 08:43:51--  https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl
Reusing existing connection to data.darts.isas.jaxa.jp:443.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-13 08:43:51 ERROR 403: Forbidden.

However, if I try to download these same files individually, it works:

wget -m -nH --cut-dirs=5 https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl

--2023-06-13 09:06:44--  https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl
Resolving data.darts.isas.jaxa.jp (data.darts.isas.jaxa.jp)... 133.74.198.108
Connecting to data.darts.isas.jaxa.jp (data.darts.isas.jaxa.jp)|133.74.198.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1382 (1.3K)
Saving to: ‘ck/SEL_M_200710_S_V03.lbl’

ck/SEL_M_200710_S_V03.lb 100%[================================>]   1.35K  --.-KB/s    in 0s      

2023-06-13 09:06:44 (18.3 MB/s) - ‘ck/SEL_M_200710_S_V03.lbl’ saved [1382/1382]

FINISHED --2023-06-13 09:06:44--
Total wall clock time: 0.7s
Downloaded: 1 files, 1.3K in 0s (18.3 MB/s)

I have tried:

  • -e robots=off
  • --user-agent=Mozilla/5.0
  • --trust-server-names
  • Looking at the request headers through Chrome developer tools for a single file. There is no cookie and no referer that I can identify:
GET /pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200711_D_V03.BC HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: data.darts.isas.jaxa.jp
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
sec-ch-ua-mobile: ?0
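For reference, the captured browser request can be approximated with wget's `--header` option, which makes the comparison between the browser and wget reproducible (a debugging sketch; the header values are copied from the Chrome capture above, and `Accept-Encoding`/`Connection`/`Host` are omitted because wget manages those itself):

```shell
# Sketch: replay the browser's request headers with wget for a single file,
# to check whether any header accounts for the 403s. Values are taken
# verbatim from the Chrome capture above.
wget --user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36' \
     --header='Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7' \
     --header='Accept-Language: en-US,en;q=0.9' \
     --header='Sec-Fetch-Dest: document' \
     --header='Sec-Fetch-Mode: navigate' \
     --header='Sec-Fetch-Site: none' \
     --header='Sec-Fetch-User: ?1' \
     --header='Upgrade-Insecure-Requests: 1' \
     https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200711_D_V03.BC
```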

By the way, these URLs are from the Data ARchives and Transmission System (DARTS), which archives high-level data products obtained by the space science missions of JAXA (Japan Aerospace Exploration Agency). The archive is meant for public download of these data products and should not have any authentication requirements.

Comments

  • Something similar was discussed at https://unix.stackexchange.com/questions/410493/stop-wget-reusing-existing-connection, and the recommendation was to use `--no-http-keep-alive`. – berndbausch Jun 14 '23 at 02:50
  • @berndbausch `--no-http-keep-alive` is the solution. If you want to write an answer, I'll accept it. Thank you. – 2cents Jun 14 '23 at 13:58
  • 1
    No need for collecting internet points. I just reused another Stackexchange entry. Glad you found the solution. – berndbausch Jun 15 '23 at 01:49
  • Does this answer your question? [Stop wget reusing existing connection?](https://unix.stackexchange.com/questions/410493/stop-wget-reusing-existing-connection) – AdminBee Jun 20 '23 at 15:17
  • 1
    @AdminBee Yes, I already flagged it to be closed as a duplicate. – 2cents Jun 21 '23 at 16:02
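Per the comments above, the fix is to disable HTTP keep-alive so that wget opens a fresh TCP connection for every request; the server apparently rejects requests on reused connections with 403. The original command with that one flag added would be (same mirror options as in the question):

```shell
# Same recursive mirror as before, but --no-http-keep-alive forces a new
# connection per request instead of reusing the existing one, which avoids
# the 403 Forbidden responses seen on "Reusing existing connection" requests.
wget -m -nH --cut-dirs=5 --no-http-keep-alive \
     https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/
```

This is slower than a keep-alive mirror, since every file pays the TCP and TLS handshake cost, but it completes without the 403 errors.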
