119

I'd like to download and extract an archive under a given directory. Here is how I've been doing it so far:

wget http://downloads.mysql.com/source/dbt2-0.37.50.3.tar.gz
tar zxf dbt2-0.37.50.3.tar.gz
mv dbt2-0.37.50.3 dbt2

I'd like instead to download and extract the archive on the fly, without having the tar.gz written to the disk. I think this is possible by piping the output of wget to tar, and giving tar a target, but in practice I don't know how to put the pieces together.

Gilles 'SO- stop being evil'
BenMorel

6 Answers

157

You can do it by telling wget to output its payload to stdout (with the flag -O-) and suppress its own output (with the flag -q):

wget -qO- your_link_here | tar xvz

To specify a target directory:

wget -qO- your_link_here | tar xvz -C /target/directory

If you happen to have GNU tar, you can also rename the output dir:

wget -qO- your_link_here | tar --transform 's/^dbt2-0.37.50.3/dbt2/' -xvz
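
The pieces above can be combined into a small shell function. This is only a sketch under a few assumptions: the name fetch_into and the FETCH variable are made up for illustration, and --strip-components=1 is swapped in for --transform, so the archive's top-level directory is dropped and its contents land directly in the target directory.

```shell
# Hypothetical helper: stream an archive from a URL into a directory,
# dropping the archive's top-level directory (dbt2-0.37.50.3/ -> dbt2/).
# FETCH is an illustrative variable; it defaults to the wget call used above.
fetch_into() {
    url=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Only the extracted files are written to disk, never the tar.gz itself.
    ${FETCH:-wget -qO-} "$url" | tar -xzf - -C "$dest" --strip-components=1
}
```

For the archive in the question, this would be called as: fetch_into http://downloads.mysql.com/source/dbt2-0.37.50.3.tar.gz dbt2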
Stephen Kitt
Joseph R.
40

Another option is to use curl which writes to stdout by default:

curl -s -L https://example.com/archive.tar.gz | tar xvzf - -C /tmp
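
One thing the curl pipeline does not show is error handling: with -s alone, curl stays silent on failure, and on an HTTP error it passes the server's error page on to tar. A hedged sketch of a wrapper follows, assuming bash (for pipefail); the function name curl_extract is hypothetical, while -f, -S and -L are standard curl options.

```shell
# Hypothetical wrapper around the curl pipeline above.
#   -f   exit non-zero on HTTP errors (e.g. 404) instead of piping the error page to tar
#   -sS  no progress meter, but still print error messages
#   -L   follow redirects
# The body runs in a subshell, so pipefail does not leak into the caller.
curl_extract() (
    set -o pipefail   # pipeline fails if curl fails, not only if tar fails
    mkdir -p "$2" &&
    curl -fsSL "$1" | tar -xzf - -C "$2"
)
```

It would be used as: curl_extract https://example.com/archive.tar.gz /tmp/archive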
Paweł Prażak
Zlemini
  • I like your option more than others but `curl -s some_url | tar xvz - -C /tmp` – FiftiN Mar 18 '19 at 17:44
  • as [FiftiN](https://unix.stackexchange.com/users/342396/fiftin) suggested -> e.g. to view a filtered list of files inside repository one could use: `$ curl -L https://api.github.com/repos/repo_owner/repo_name/tarball | tar tvfz - -C /tmp --wildcards *.py` – Alex Glukhovtsev Apr 24 '19 at 09:45
  • Better curl with "-L" to follow redirects – rfmoz Mar 27 '20 at 15:37
  • works by default on a Mac too – ElFik May 11 '21 at 22:03
13

This oneliner does the trick:

tar xvzf - -C /tmp/ < <(wget -q -O - http://foo.com/myfile.tar.gz)

Short explanation: the command inside the parentheses downloads the archive (-q tells wget to run quietly, -O - makes it write the download to stdout).

Bash's process substitution operator <( ) runs that command alongside tar and connects its output to a pipe, which is exposed under a file name (a /dev/fd entry, or a named pipe on systems without /dev/fd). The < redirection operator then feeds the contents of that file to tar's standard input.
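
To make the mechanism concrete, here is a small sketch that needs no download. It calls bash explicitly because <( ) is not part of POSIX sh; the variable names are only illustrative.

```shell
# Show the file name that the substitution expands to
# (typically something like /dev/fd/63 on Linux):
bash -c 'echo <(true)'

# Redirecting that file into stdin with < behaves like an ordinary pipe:
via_procsub=$(bash -c 'tr a-z A-Z < <(echo hello)')
via_pipe=$(echo hello | tr a-z A-Z)
echo "$via_procsub $via_pipe"
```

Both forms hand the same bytes to the consuming command, which is why the plain wget ... | tar ... pipeline is usually the simpler choice.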

Daniel Serodio
ItsMe
2

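A process substitution solution that passes the pipe to tar as a file name instead of stdin. Mind the order of tar's flags: -f comes last so that the substituted file name is its argument: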

tar -xvz -C /tmp/ -f <(wget -q -O - https://github.com/user/repo/release/download/v/v.tar.gz)
2

A one-liner that handles redirects and extracts tar.bz2 files. Use xz instead of xj to extract gzip files.

curl -L https://downloads.getmonero.org/cli/linux64 | tar xj
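
Since the decompression flag has to match the archive type when tar reads from a pipe (GNU tar generally cannot detect it there), the choice can be sketched as a lookup on the file name. The function name tar_flag_for is hypothetical, and the suffix list is an assumption that covers only the common cases.

```shell
# Hypothetical helper: pick tar's decompression flag from the file name.
tar_flag_for() {
    case $1 in
        *.tar.gz|*.tgz)   echo z ;;    # gzip
        *.tar.bz2|*.tbz2) echo j ;;    # bzip2
        *.tar.xz|*.txz)   echo J ;;    # xz
        *)                return 1 ;;  # unknown suffix: caller must decide
    esac
}
```

It could be used as: flag=$(tar_flag_for "$url") && curl -L "$url" | tar "x$flag". A URL without a file extension, like the one above, still needs the flag chosen by hand.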
Elijah
0

The extraction part should take its input from STDIN. With the f flag, that means naming - as the archive explicitly: tar -xzvf - -C <output_dir>

Example:


# This does not work: -f takes the next word (-C) as the archive name.
# tar complains:
# tar (child): -C: Cannot open: No such file or directory
wget -qO - https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3-scala2.13.tgz | tar -xzvf -C /opt/spark --strip-components 1


# This should work (/opt/spark must already exist).
wget -qO - https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3-scala2.13.tgz | tar -xzvf - -C /opt/spark --strip-components 1
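
The difference between the two commands can be reproduced without a network connection. The sketch below builds a throwaway archive and uses cat in place of wget; all paths and names in it are made up for the demonstration.

```shell
# Build a tiny archive in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/src/pkg-1.0" "$work/out"
echo hello > "$work/src/pkg-1.0/README"
tar -czf "$work/pkg.tgz" -C "$work/src" pkg-1.0

# Wrong: -f takes the next word, -C, as the archive name, so tar fails.
if cat "$work/pkg.tgz" | tar -xzf -C "$work/out" --strip-components=1 2>/dev/null; then
    bad=ok
else
    bad=failed
fi
echo "without '-': $bad"

# Right: "-f -" names stdin as the archive, so -C is parsed as an option.
cat "$work/pkg.tgz" | tar -xzf - -C "$work/out" --strip-components=1
ls "$work/out"    # README sits directly in out/, without the pkg-1.0/ prefix
```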


Sairam Krish
  • How would one go about this using the wget -N flag? So only do this if the downloaded file has changed? I would imagine I would need to save the existing file for that? – fred Feb 13 '23 at 21:43