16

I am trying to download files from this website.

The URL is: http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file

When I use this command:

wget http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file 

I only get index.html?acc=GSE48191, which is in some kind of binary format.

How can I download the files from this HTTP site?

Aaron Franke
user3138373

6 Answers

26

I think your ? gets interpreted by the shell (correction by vinc17: more likely, it's the & that gets interpreted).

Just put single quotes around your URL:

wget 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
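To see why the quotes matter, here is a minimal sketch that makes no network request. It assumes a POSIX-style shell such as bash or dash, and show_args is a hypothetical helper defined only for this illustration. It prints what a command actually receives with and without quotes:

```shell
# Hypothetical helper: print each argument it receives, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Quoted: the whole URL arrives as a single argument.
quoted=$(show_args 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file')
echo "quoted:   $quoted"

# Unquoted: '&' ends the command and runs it in the background, so the
# command only sees the text before '&'. The rest, format=file, is parsed
# as a separate variable assignment. 'wait' lets the background job finish.
unquoted=$(show_args http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file; wait)
echo "unquoted: $unquoted"
```

The unquoted run only sees the URL up to acc=GSE48191, which matches the index.html?acc=GSE48191 file name reported in the question.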

Note that the file you are requesting is a .tar file but the above command will save it as index.html?acc=GSE48191&format=file. To have it correctly named, you can either rename it to .tar:

mv 'index.html?acc=GSE48191&format=file' GSE48191.tar

Or you can give the name as an option to wget:

wget -O GSE48191.tar 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

The above command will save the downloaded file as GSE48191.tar directly.

Qeole
  • It gets downloaded but it is not even a directory. If you look at the link http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48191 , you can see there are multiple .gz files. I still can't access them?? – user3138373 Jul 22 '14 at 16:57
  • I suppose that the OP uses a shell that ignores `?` as a wildcard since nothing matches. The main problem is `&`: this will run the part that precedes (thus with an incomplete URL) in the background. But the solution is the same: to quote the URL. – vinc17 Jul 22 '14 at 17:07
  • Thanks to you terdon and vinc for edit/corrections. @user3138373: I can't find your .gz files on provided links, could you please tell again what URL you use to see/access them? – Qeole Jul 22 '14 at 17:10
  • 1
    @user3138373 the file you download is an archive (`.tar` file) that contains the .gz files. Once you have downloaded it, run `tar xvf GSE48191.tar` to expand the archive and access the files. – terdon Jul 22 '14 at 17:25
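
As a rough illustration of that last comment, the sketch below builds a small stand-in archive locally, so it can be tried without downloading anything. The names demo.tar, sample1.txt and sample2.txt are hypothetical placeholders, not the real contents of the GEO archive. It then lists and extracts the .gz members:

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a stand-in archive holding two gzip-compressed files (placeholder names).
printf 'sample one\n' > sample1.txt
printf 'sample two\n' > sample2.txt
gzip sample1.txt sample2.txt            # produces sample1.txt.gz and sample2.txt.gz
tar cf demo.tar sample1.txt.gz sample2.txt.gz
rm sample1.txt.gz sample2.txt.gz

# List the members of the archive without extracting them.
listing=$(tar tf demo.tar)
echo "$listing"

# Extract the members into the current directory, then decompress them.
tar xf demo.tar
gunzip sample1.txt.gz sample2.txt.gz
cat sample1.txt
```

With the real download, the same tar tf and tar xf steps would be run on the .tar file saved by wget.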
3

Another way that might work is this command:

wget -O nameOfTar.tar "http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file"

The -O option specifies the name of the file the download is saved to.

Of course, your initial problem is that the "&" was being interpreted by the shell. Surrounding the URL with double quotes fixes the issue.

ryekayo
  • 2
    `-O` **option** is used to specify the name of the file in which downloaded data is saved. It has no effect on the downloaded data itself (maybe that's what you meant, but I found it unclear). – Qeole Jul 22 '14 at 17:16
  • Yes sorry, I will make my correction – ryekayo Jul 22 '14 at 17:17
  • I'm not sure why this got downvoted. – ryekayo Jul 22 '14 at 17:51
  • 3
    I did not downvote, but that's probably because your solution does not fix the problem: `&` is interpreted by the shell, and the download of the `.tar` file will fail. – Qeole Jul 22 '14 at 17:54
1

None of these answers worked for me.

However, you can find the GSE* folders on the NCBI FTP site:

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/

You can then copy the link address of the file listed there and do a simple wget:

wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar
icedcoffee
0

wget -O "name-you-want-to-save-as.format" http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file

That should get you the file you want to download to the current directory you are in.

  • `wget: missing URL` is what `wget` replies to that, because you are missing the argument to `-O`. Also, I think this probably doesn't solve the OP's problem anyway. – Celada Jul 19 '15 at 18:00
  • Because the URL contains `&`, this answer doesn't work unless you add `""` or `''` around the URL. – Aaron Franke Jan 08 '18 at 02:33
0

From $ curl -G http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191">here</a>.</p>
</body></html>

So you need to do

wget https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191

Notice the "s" after http. I tried it myself and it worked just fine.

The Letter M
0

It would help to give the page the link comes from, which is: https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191

On that page, the clickable link is: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar

So use wget with that link: wget https://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar

Jason Swartz