Downloading files using wget

Question

I am trying to download files from this website.

The URL is: http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file

When I use this command:

wget http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file

I get only index.html?acc=GSE48191 which is some kind of binary format.

How can I download the files from this HTTP site?

Qeole · Answer 1 · 2014-07-22T17:17:08.710

26

I think your ? gets interpreted by the shell (correction by vinc17: more likely, it is the & that gets interpreted).

Just try with single quotes around your URL:

wget 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
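To see why the quotes matter, here is a small sketch that needs no network access. It uses a stand-in function (show_arg, a hypothetical name) instead of wget and prints the argument the command actually receives, assuming a POSIX-style shell such as sh or bash:

```shell
#!/bin/sh
# Stand-in for wget: print the single argument the command receives.
show_arg() { printf '%s\n' "$1"; }

# Quoted: the whole URL, including '&format=file', arrives as one argument.
quoted=$(show_arg 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file')

# Unquoted: eval parses the string the way a typed command line would be parsed.
# The shell splits at '&', runs show_arg in the background with the truncated
# URL, and then treats 'format=file' as a separate variable assignment.
unquoted=$(eval "show_arg http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file"; wait)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

The unquoted line should show the URL cut off just before &format=file, which matches the index.html?acc=GSE48191 file name from the question.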

Note that the file you are requesting is a .tar file, but the above command will save it as index.html?acc=GSE48191&format=file. To name it correctly, you can either rename it afterwards:

mv 'index.html?acc=GSE48191&format=file' GSE48191.tar

Or you can give the name as an option to wget:

wget -O GSE48191.tar 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

The above command will save the downloaded file as GSE48191.tar directly.

edited Jul 22 '14 at 17:17

answered Jul 22 '14 at 16:46

Qeole

It gets downloaded but it is not even a directory. If you look at the link http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48191 , you can see there are multiple .gz files. I still can't access them?? – user3138373 Jul 22 '14 at 16:57
I suppose that the OP uses a shell that ignores `?` as a wildcard since nothing matches. The main problem is `&`: this will run the part that precedes (thus with an incomplete URL) in the background. But the solution is the same: to quote the URL. – vinc17 Jul 22 '14 at 17:07
Thanks to you terdon and vinc for edit/corrections. @user3138373: I can't find your .gz files on provided links, could you please tell again what URL you use to see/access them? – Qeole Jul 22 '14 at 17:10
1

@user3138373 the file you download is an archive (`.tar` file) that contains the .gz files. Once you have downloaded it, run `tar xvf GSE48191.tar` to expand the archive and access the files. – terdon Jul 22 '14 at 17:25
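
As a rough illustration of working with such an archive, the sketch below builds a throwaway .tar locally so it can be tried offline, then lists, extracts, and decompresses its contents. The names demo_RAW.tar and sample1.txt.gz are hypothetical and are not the real GEO file names:

```shell
#!/bin/sh
# Build a small stand-in archive so the commands can be tried offline.
# demo_RAW.tar and sample1.txt.gz are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'example data\n' > sample1.txt
gzip sample1.txt                      # produces sample1.txt.gz
tar cf demo_RAW.tar sample1.txt.gz
rm sample1.txt.gz

# List the members without extracting anything.
listing=$(tar tf demo_RAW.tar)
echo "$listing"

# Extract the archive, then decompress one member to standard output.
tar xf demo_RAW.tar
content=$(gzip -dc sample1.txt.gz)
echo "$content"
```

Listing with tar tf first is a cheap way to confirm that the download really is an archive before extracting it.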

score 3 · Answer 2 · edited Jul 22 '14 at 22:07

Another way that may work is this command:

wget -O nameOfTar.tar "http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file"

The -O option specifies the name of the file the download is saved to.

Of course, your initial problem occurred because the "&" was being interpreted by the shell; surrounding the URL with double quotes fixes the issue.

edited Jul 22 '14 at 22:07

answered Jul 22 '14 at 17:02

ryekayo

2

The `-O` **option** is used to specify the name of the file in which the downloaded data is saved. It has no effect on the downloaded data itself (maybe that's what you meant, but I found it unclear). – Qeole Jul 22 '14 at 17:16
Yes sorry, I will make my correction – ryekayo Jul 22 '14 at 17:17
I'm not sure why this got downvoted. – ryekayo Jul 22 '14 at 17:51
3

I did not downvote, but that's probably because your solution does not fix the problem: `&` is interpreted by the shell, so the download of the `.tar` file will fail. – Qeole Jul 22 '14 at 17:54

score 1 · Answer 3 · answered Jan 14 '21 at 13:30

None of these answers worked for me.

However, you can find GSE* folders within the NCBI ftp page:

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/

You can then copy the link address of the file you want from that directory and run a plain wget:

wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar
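
If you need this for several series, the URL can be assembled from the accession. This is only a sketch: it assumes the directory name is always the accession with its last three digits replaced by "nnn" (as in GSE48191 and GSE48nnn above) and that a <accession>_RAW.tar file exists, which may not hold for every series. The function name geo_raw_url is made up for illustration:

```shell
#!/bin/sh
# Build the supplementary-file URL for a GEO series accession.
# Assumption: the parent directory is the accession with its last three
# characters replaced by "nnn", e.g. GSE48191 -> GSE48nnn.
geo_raw_url() {
    acc=$1
    prefix=${acc%???}        # strip the last three characters: GSE48191 -> GSE48
    printf 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/%snnn/%s/suppl/%s_RAW.tar\n' \
        "$prefix" "$acc" "$acc"
}

url=$(geo_raw_url GSE48191)
echo "$url"
# To download, pass the quoted result to wget:  wget "$url"
```

Accessions with fewer than four digits are not handled by this sketch.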

Samman Bikram Thapa · Answer 4 · 2015-07-22T04:43:02.643

0

wget -O "name-you-want-to-save-as.format" http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file

That should save the file you want to the current directory.

edited Jul 22 '15 at 04:43

answered Jul 19 '15 at 17:39

Samman Bikram Thapa

`wget: missing URL` is what `wget` replies to that, because you are missing the argument to `-O`. Also, I think this probably doesn't solve the OP's problem anyway. – Celada Jul 19 '15 at 18:00
Because the URL contains `&`, this answer doesn't work unless you add `""` or `''` around the URL. – Aaron Franke Jan 08 '18 at 02:33

score 0 · Answer 5 · answered Nov 06 '18 at 04:43

The output of curl -G http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191 is:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191">here</a>.</p>
</body></html>

So you need to do

wget https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191

Notice the "s" after http. I tried it myself and it worked just fine.
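
For reference, curl does not follow redirects unless it is given -L, which is why it prints the 301 page instead of the file. As a small offline sketch, the new location can be pulled out of a response body like the one above with sed. The pattern assumes a single double-quoted href attribute, and the variable names are illustrative:

```shell
#!/bin/sh
# One line of the 301 response body shown above, stored for an offline demo.
body='<p>The document has moved <a href="https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191">here</a>.</p>'

# Extract the href value; assumes exactly one double-quoted href on the line.
target=$(printf '%s\n' "$body" | sed -n 's/.*href="\([^"]*\)".*/\1/p')
echo "$target"
# The quoted result can then be passed on, e.g.:  wget "$target"
```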

score 0 · Answer 6 · answered Mar 11 '21 at 04:11

It helps to start from the page the link comes from, which is: https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191

On that page, the clickable link is: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar

So run wget with that link: wget https://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar
