View a file in a tar archive without extracting it

Question

I want to view the contents of a file inside a tar archive without extracting it. Scenario: I have a.tar, and inside it there is a file called ./x/y.txt. I want to view the contents of y.txt without actually extracting a.tar.

Er, to view it, you have to extract it. I guess what you mean is "without writing it to a file"? — Toby Speight, Jul 14 '15 at 17:22
Consider also using a FUSE file system to mount the tar archive as a file system. It makes the files in the tar look just like ordinary files. — ctrl-alt-delor, Aug 31 '23 at 18:08

score 39 · Answer 1 · edited Jan 23 '20 at 18:03


It's probably a GNU-specific option, but you can use -O (or --to-stdout) to extract files to standard output:

$ tar -axf file.tgz foo/bar -O
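As a self-contained sketch of the same idea: the temporary directory and the sample text below are made up for illustration, and the archive layout follows the question. -O is placed before the member name, which, going by the comments under this answer, should also keep BSD tar happy.

```shell
# Throwaway archive matching the question's layout (sample content is made up).
workdir=$(mktemp -d)
mkdir -p "$workdir/x"
printf 'hello from y.txt\n' > "$workdir/x/y.txt"
tar -cf "$workdir/a.tar" -C "$workdir" ./x/y.txt

# -x: extract, -O: write member data to stdout instead of to disk, -f: archive.
# Nothing is created in the current directory.
content=$(tar -xOf "$workdir/a.tar" ./x/y.txt)
printf '%s\n' "$content"
```

The member name has to be spelled the way it is stored in the archive (here with the leading ./); `tar -tf` on the archive shows the stored names.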

answered Jun 09 '15 at 15:05 by fredtantini · edited Jan 23 '20 at 18:03 by Community

Ah, it works, but I did not manage to print the output on separate lines, e.g. `tar -axf file.tar.gz --wildcards --no-anchored '*read_this_file*' -O` when many files match `*read_this_file*`: everything gets printed on the same line. In the `man` page I found `--to-command`; passing `--to-command="echo '' && cat"` is a bit of black magic, but it works :D – GabLeRoux Aug 08 '16 at 03:57

Just this is needed in answer: `$ tar -axf file.tgz foo/bar -O` – user1742529 Jan 23 '20 at 16:28

Unfortunately the order of arguments matters and, for BSD tar (e.g. on macOS), this answer doesn’t work; instead, `-O` needs to *come before* the entry name, e.g.: `tar xf file.tar.gz -O foo/bar`. – Konrad Rudolph Sep 13 '21 at 10:14
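For the several-matching-files case raised in the comments above, here is a rough sketch of the `--to-command` route. It assumes GNU tar (the option and the TAR_FILENAME variable it exports are GNU features); the file names and contents are invented for the example.

```shell
# Two sample members without trailing newlines, so plain -O would run them together.
workdir=$(mktemp -d)
mkdir -p "$workdir/x"
printf 'one' > "$workdir/x/read_this_file_1"
printf 'two' > "$workdir/x/read_this_file_2"
tar -cf "$workdir/a.tar" -C "$workdir" ./x/read_this_file_1 ./x/read_this_file_2

# GNU tar only: --to-command runs the command once per extracted member,
# feeds the member's data to its stdin and exports TAR_FILENAME to it.
out=$(tar -xf "$workdir/a.tar" --wildcards --no-anchored \
    --to-command='printf "== %s ==\n" "$TAR_FILENAME"; cat; echo' \
    '*read_this_file*')
printf '%s\n' "$out"
```

Each member then gets a small header line and ends with its own newline, instead of everything landing on one line.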

score 26 · Answer 2 · edited Aug 31 '23 at 17:42


This prints the contents of ./x/y.txt from a.tar to standard output:

tar xfO a.tar ./x/y.txt

Note: it's a capital "o", not zero

answered Jun 09 '15 at 17:07 by Toni · edited Aug 31 '23 at 17:42 by Mathieu Rollet

hint: it's a capital "o", not zero. – Hubert Grzeskowiak Dec 06 '18 at 08:52
This works in FreeBSD. – annahri Nov 30 '20 at 03:50

score 5 · Answer 3 · edited Jun 10 '15 at 08:26

This is as simple as:

less  a.tar:./x/y.txt

This magic trick works if you have lesspipe installed and the environment variable LESSOPEN is set to | /usr/bin/lesspipe.sh %s, which should already be the case if lesspipe is installed correctly.

answered Jun 09 '15 at 21:15 by solsTiCe · edited Jun 10 '15 at 08:26

That's an awesome script - but there is more than one. As I understand it, [this `lesspipe.sh`](http://www-zeuthen.desy.de/~friebel/unix/lesspipe.html) should probably be preferred. – mikeserv Jun 09 '15 at 21:49
Will that work on compressed tarballs? – terdon Jun 09 '15 at 21:56
It should. But I just found that it does not work on Ubuntu. Go figure. They have broken or removed the feature. You can still view the archive listing with less, but not file contents :-( – solsTiCe Jun 09 '15 at 21:58

score 4 · Answer 4 · edited Apr 13 '17 at 12:37

Oh, but this is a question about the contents of a file within a tar file. And actually, in some cases this isn't so hard. The thing is, a tar file is just a blocked out stream file - each file within the archive is found after the one before it, and each file gets a metadata header based on a specified format.

Based on that format, I once wrote shitar - a few lines of dd and shell script which could tar up a stream of block devices on the fly. Based on the same format, more recently I wrote these few lines of code:

tar --no-recursion -c ./      |
{ printf \\0; tr -s \\0; }    |
cut -d '' -f-2,13             |
tr '\0\n' '\n\t'

... for picking apart a tar file on the fly and performing inline transformations on its component text files. There, the cut fields select fields 1, 2 and 13 of each NUL-delimited line of input. Such things are easy when the tar file contains only text files, because tar's record delimiters (as might occur once every 512 bytes) can just be squeezed down to a single NUL per run and stripped off - without requiring you to count the occurrences as you go.

tar's header format looks like this:

field    offset   len
name     0        100
mode     100      8
uid      108      8
gid      116      8
size     124      12
mtime    136      12
chksum   148      8
typeflag 156      1
linkname 157      100
magic    257      6
version  263      2
uname    265      32
gname    297      32
devmajor 329      8
devminor 337      8
prefix   345      155
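To make the table concrete, here is a small sketch that pulls a few of those fields out of the first header with dd. The `field` helper, the temporary paths and the sample content are my own illustration; `--format=ustar` is passed so the layout matches the table (this assumes a tar that accepts that option, as GNU tar and bsdtar do).

```shell
# Sample single-member archive; --format=ustar pins the header layout.
workdir=$(mktemp -d)
mkdir -p "$workdir/x"
printf 'hello\n' > "$workdir/x/y.txt"
archive="$workdir/a.tar"
tar --format=ustar -cf "$archive" -C "$workdir" ./x/y.txt

# field OFFSET LEN: print one field of the first header, NUL padding removed.
field() {
    dd if="$archive" bs=1 skip="$1" count="$2" 2>/dev/null | tr -d '\000'
}

name=$(field 0 100)
typeflag=$(field 156 1)
magic=$(field 257 6)
# Numeric fields are octal text; the leading 0 makes the shell parse them as octal.
size=$(( 0$(field 124 12 | tr -d ' ') ))

printf 'name=%s typeflag=%s magic=%s size=%d\n' "$name" "$typeflag" "$magic" "$size"
```

Note that the numeric fields (mode, uid, gid, size, mtime, chksum) are stored as octal digits in ASCII, not as binary integers, which is why the size is converted before use.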

Understand that there is a steep slope between the relative ease of handling simple tar operations and the vastly more complicated aspects of the archive format. While simple things - like packing a small group of homogeneously typed files together or even splitting out an archive containing only members whose types you can predict - can be easily done with a few shell pipes, reliably handling arbitrary archive members is no trifling matter.

It is especially difficult when those members might contain arbitrary binary data - which would certainly preclude any reliable application of tr -s - and this difficulty only compounds when files of various types other than regular and/or charsets other than your native one are used and/or the original archive was created by an implementation with format application idiosyncrasies you are unprepared to handle. And this is only touching on the basic, standardized aspects of the tar archive type - add in extended headers and format extensions and sparse files and compression and... well, good luck with those.

Back to basics, though: the standard record size for a tar archive is 20 blocks, or 10240 bytes. Given an archive blocked on the standard record size and containing only standard file types and standard ustar headers, you can skip from member header to member header by doing reads according to the size header field until you find a member matching the one you seek. Once there, read size bytes starting at the end of your target's member header. And that's your file.

Skipping over the headers isn't terribly easy, though. Different types either will or won't have actual data blocks appended that correspond to size. For example, directories and links will contain no such data block, only a header description, and so you must be prepared to verify the current header's filetype before ascertaining exactly whether you should apply its size field to your skip formula or not.

Also, the record size matters: depending on whether or not the archive members' sizes sync up well with the standard 10240-byte record size, there may or may not be an additional 0-block appended to each. And the record size can be declared at archive creation time, so it may not even be 20 blocks at all, though, by spec, it must always be blocked on 512-byte units:

ustar
- The tar interchange format; see the EXTENDED DESCRIPTION section. The default blocksize for this format for character special archive files shall be 10240. Implementations shall support all blocksize values less than or equal to 32256 that are multiples of 512.

So if you were working with a tar file which might contain files which might contain arbitrary binary data you would have to skip through the file algorithmically, and according to filetype. The spec says:

The size field is the size of the file in octets.
- If the typeflag field is set to specify a file to be of type 1 (a link) or 2 (a symbolic link), the size field shall be specified as zero.
- If the typeflag field is set to specify a file of type 5 (directory), the size field shall be interpreted as described under the definition of that record type.
- No data logical records are stored for types 1, 2, or 5.
- If the typeflag field is set to 3 (character special file), 4 (block special file), or 6 (FIFO), the meaning of the size field is unspecified by this volume of POSIX.1-2008, and no data logical records shall be stored on the medium.
- Additionally, for type 6, the size field shall be ignored when reading.
If the typeflag field is set to any other value, the number of logical records written following the header shall be ( (size+ 511 ) / 512 ), ignoring any fraction in the result of the division.

...and, of course, considering also the individual size of each header - which is an additional block per member. So you might skip through read by read from header to header until you land on one matching the header for which you seek, at which time you would then need to check whether the current record merely describes a link to your file or to the actual file. This is especially relevant because when the same file is added to an archive multiple times many tars will only include link headers because the actual file's data can already be found elsewhere within the archive.
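The walk described above can be sketched in plain sh with dd. This is only an illustration under narrow assumptions: a ustar archive, member names that fit in the 100-byte name field (empty prefix), no extended headers, and a target that is a regular file rather than a link. The `field` helper and the sample files are made up.

```shell
# Sample archive: a directory entry plus two regular files.
workdir=$(mktemp -d)
mkdir -p "$workdir/x"
printf 'first file\n' > "$workdir/x/a.txt"
printf 'hello from y.txt\n' > "$workdir/x/y.txt"
archive="$workdir/a.tar"
tar --format=ustar -cf "$archive" -C "$workdir" ./x

# field BLOCK OFFSET LEN: one field of the header stored at 512-byte block BLOCK.
field() {
    dd if="$archive" bs=1 skip=$(( $1 * 512 + $2 )) count="$3" 2>/dev/null | tr -d '\000'
}

target=./x/y.txt
block=0
content=
while :; do
    name=$(field "$block" 0 100)
    [ -n "$name" ] || break                # zero-filled block: end of archive
    typeflag=$(field "$block" 156 1)
    size=$(( 0$(field "$block" 124 12 | tr -d ' ') ))
    case $typeflag in
        [1-6]) data_blocks=0 ;;            # links, devices, directories, FIFOs: header only
        *)     data_blocks=$(( (size + 511) / 512 )) ;;
    esac
    if [ "$name" = "$target" ] && [ "$data_blocks" -gt 0 ]; then
        # Data starts in the block right after the header; trim the padding.
        content=$(dd if="$archive" bs=512 skip=$(( block + 1 )) count="$data_blocks" 2>/dev/null |
            head -c "$size")
        break
    fi
    block=$(( block + 1 + data_blocks ))   # one header block plus the data blocks
done
printf '%s\n' "$content"
```

The skip step is the spec's formula applied literally: one block for the header, then (size + 511) / 512 data blocks for anything that is not one of the header-only types.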

Having verified that, you'll need to apply your calculations to the chksum field and verify that the file you think you have is actually the file you want after all. tar's chksum is fairly simple, though:

chksum
- The chksum field shall be the ISO/IEC 646:1991 standard IRV representation of the octal value of the simple sum of all octets in the header logical record. Each octet in the header shall be treated as an unsigned value. These values shall be added to an unsigned integer, initialized to zero, the precision of which is not less than 17 bits. When calculating the checksum, the chksum field is treated as if it were all <space> characters.
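A quick way to convince yourself of that rule is to recompute the sum for the first header of a sample archive. This is a sketch using od and awk; the temporary paths and sample content are invented, and only the first header is checked.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/x"
printf 'hello\n' > "$workdir/x/y.txt"
archive="$workdir/a.tar"
tar --format=ustar -cf "$archive" -C "$workdir" ./x/y.txt

# Stored value: octal text in the 8 bytes at offset 148 of the header.
stored=$(( 0$(dd if="$archive" bs=1 skip=148 count=8 2>/dev/null | tr -d '\000 ') ))

# Computed value: sum the 512 header octets as unsigned decimals,
# counting each of the 8 chksum bytes (offsets 148-155) as a space (32).
computed=$(dd if="$archive" bs=512 count=1 2>/dev/null |
    od -An -v -tu1 |
    awk 'BEGIN { n = 0; sum = 0 }
         { for (i = 1; i <= NF; i++) { sum += ((n >= 148 && n < 156) ? 32 : $i); n++ } }
         END { print sum }')

if [ "$computed" -eq "$stored" ]; then
    echo "header checksum OK: $stored"
fi
```

Because it is a plain sum over the header only, the checksum says nothing about the member's data blocks; it only guards the header itself.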

Of course, you wouldn't actually have to do any of that, because tar can already do that - that's what it does - and so you should probably just use it to search the archive and extract the file for you. In doing so it won't do anything very differently than you would do if you knew what you were about, except that it will probably do it better and faster because that's its job. And anyway, why should you?

score 1 · Answer 5 · answered Jun 09 '15 at 15:08


You can use this line

tar -axf a.tar -O

answered Jun 09 '15 at 15:08 by tachomi

This will show any file there is in the tar, not just `y.txt` and it is not clear from the OP's question that that is the only file in the tar. – Anthon Jun 09 '15 at 15:33
