I want to view the content of a tarred file without extracting it
Scenario: I have a.tar, and inside it there is a file called ./x/y.txt. I want to view the content of y.txt without actually extracting a.tar.
- If you use Emacs, you can simply open the tarball in it. – Qudit Jun 09 '15 at 21:31
- Er, to view it, you have to extract it. I guess what you mean is "without writing it to a file"? – Toby Speight Jul 14 '15 at 17:22
- Consider also: using a FUSE file-system to mount the tar as a file-system. It makes the files in the tar look just like ordinary files. – ctrl-alt-delor Aug 31 '23 at 18:08
5 Answers
It's probably a GNU-specific option, but you could use -O or --to-stdout to extract files to standard output:
$ tar -axf file.tgz foo/bar -O
- Ah works, but I did not manage to print output on new lines, e.g. `tar -axf file.tar.gz --wildcards --no-anchored '*read_this_file*' --O` when many files match `*read_this_file*`. Everything gets printed on the same line. From the `man`, I found `--to-command`, so passing `--to-command="echo '' && cat"` is a bit of black magic but it works :D – GabLeRoux Aug 08 '16 at 03:57
- Unfortunately the order of arguments matters and, for BSD tar (e.g. on macOS), this answer doesn’t work; instead, `-O` needs to *come before* the entry name, e.g.: `tar xf file.tar.gz -O foo/bar`. – Konrad Rudolph Sep 13 '21 at 10:14
This prints contents of ./x/y.txt from a.tar to STDOUT.
tar xfO a.tar ./x/y.txt
Note: it's a capital "o", not zero
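For anyone who wants to try this safely first, here is a minimal sketch that builds a throwaway archive shaped like the one in the question and prints the member to standard output. The temporary directory, the file contents and the `content` variable are made up for illustration.

```shell
# Build a throwaway archive that mirrors the scenario in the question.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/x"
printf 'hello from y.txt\n' > "$workdir/src/x/y.txt"
tar -cf "$workdir/a.tar" -C "$workdir/src" ./x/y.txt

# -O sends the member to stdout instead of writing it to disk.
# It is placed before the member name, which BSD tar requires.
content=$(tar -xOf "$workdir/a.tar" ./x/y.txt)
printf '%s\n' "$content"

# Remove the scratch directory; nothing was extracted anywhere else.
rm -rf "$workdir"
```

The member name has to match the name stored in the archive, so if `tar -tf a.tar` lists `./x/y.txt`, ask for exactly that.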
This is as simple as
less a.tar:./x/y.txt
This magic trick works if you have lesspipe installed and if the environment variable LESSOPEN is set to | /usr/bin/lesspipe.sh %s, which is what you should expect if lesspipe is installed correctly.
- That's an awesome script - but there is more than one. As I understand it, [this `lesspipe.sh`](http://www-zeuthen.desy.de/~friebel/unix/lesspipe.html) should probably be preferred. – mikeserv Jun 09 '15 at 21:49
- It should. But I just found it does not work in Ubuntu. Go figure. They have broken or removed the feature. You can still view the archive listing with less, but not file content :-( – solsTiCe Jun 09 '15 at 21:58
Oh, but this is a question about the contents of a file within a tar file. And actually, in some cases this isn't so hard. The thing is, a tar file is just a blocked out stream file - each file within the archive is found after the one before it, and each file gets a metadata header based on a specified format.
Based on that format, I once wrote shitar - which was a few lines of dd and shell scripts which could tar up a stream of block devices on the fly. Based on same, more recently I wrote these few lines of code:
tar --no-recursion -c ./ |
{ printf \\0; tr -s \\0; } |
cut -d '' -f-2,13 |
tr '\0\n' '\n\t'
... for picking apart a tar file on the fly and performing inline transformations on its component text files. There the cut fields point to fields 1,2,13 of a NUL delimited line of input. Such things are easy when the tar file contains only text files because tar's record delimiters (as might occur once every 512 bytes) can just be squeezed down to a single NUL per and stripped off - without requiring you to count the occurrences as you do.
tar's header format looks like this:
field     offset  len
name           0  100
mode         100    8
uid          108    8
gid          116    8
size         124   12
mtime        136   12
chksum       148    8
typeflag     156    1
linkname     157  100
magic        257    6
version      263    2
uname        265   32
gname        297   32
devmajor     329    8
devminor     337    8
prefix       345  155
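To make the table concrete, here is a rough sketch that pulls three of those fields out of the first header of an archive with dd. The `field` helper and the sample file are hypothetical. It assumes GNU tar wrote the archive, so numeric fields are zero-padded octal text.

```shell
# Scratch archive with a single regular file, so block 0 is its header.
workdir=$(mktemp -d)
mkdir -p "$workdir/x"
printf 'hello\n' > "$workdir/x/y.txt"
tar -cf "$workdir/a.tar" -C "$workdir" x/y.txt

# field ARCHIVE OFFSET LEN: print LEN bytes starting at OFFSET,
# with the NUL padding removed.
field() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\000'
}

name=$(field "$workdir/a.tar" 0 100)         # name:     offset 0,   len 100
size_octal=$(field "$workdir/a.tar" 124 12)  # size:     offset 124, len 12
typeflag=$(field "$workdir/a.tar" 156 1)     # typeflag: offset 156, len 1

# The size is stored as octal digits; a leading 0 makes the shell
# read it as octal and convert it to decimal.
size=$((0$size_octal))

printf 'name=%s size=%s typeflag=%s\n' "$name" "$size" "$typeflag"
rm -rf "$workdir"
```

Reading one byte at a time with `bs=1` is slow, but it keeps the offsets identical to the ones in the table.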
Understand that there is a steep slope between the relative ease of handling simple tar operations and the vastly more complicated aspects of the archive format. While simple things - like packing a small group of homogeneously typed files together or even splitting out an archive containing only members whose types you can predict - can be easily done with a few shell pipes, reliably handling arbitrary archive members is no trifling matter.
It is especially difficult when those members might contain arbitrary binary data - which would certainly preclude any reliable application of tr -s - and this difficulty only compounds when files of various types other than regular and/or charsets other than your native one are used and/or the original archive was created by an implementation with format application idiosyncrasies you are unprepared to handle. And this is only touching on the basic, standardized aspects of the tar archive type - add in extended headers and format extensions and sparse files and compression and... well, good luck with those.
Back to basics, though, the standard record-size for a tar archive is 20 blocks - or 10240 bytes. Given an archive blocked on the standard record-size and containing only standard file types and standard ustar headers, though, you should skip from member-header to member-header by doing reads according to the size header field until you find a member matching the one for which you seek. Once there, read in size bytes from the offset beginning at the tail of your target's member header. And that's your file.
Skipping over the headers isn't terribly easy, though. Different types either will or won't have actual data blocks appended that correspond to size. For example, directories and links will contain no such data block, only a header description, and so you must be prepared to verify the current header's filetype before ascertaining exactly whether you should apply its size field to your skip formula or not.
Also, the record-size factors - depending on whether or not the archive-members' sizes sync up well with the 10240 standard record-size there may or may not be an additional 0-block appended to each. And the record-size can be declared at archive creation time - and so it may not even be 20 blocks at all, though, by spec, it must always be blocked on 512-byte units:
ustar
  The tar interchange format; see the EXTENDED DESCRIPTION section. The default blocksize for this format for character special archive files shall be 10240. Implementations shall support all blocksize values less than or equal to 32256 that are multiples of 512.
So if you were working with a tar file which might contain files which might contain arbitrary binary data you would have to skip through the file algorithmically, and according to filetype. The spec says:
The size field is the size of the file in octets.
- If the typeflag field is set to specify a file to be of type 1 (a link) or 2 (a symbolic link), the size field shall be specified as zero.
- If the typeflag field is set to specify a file of type 5 (directory), the size field shall be interpreted as described under the definition of that record type. No data logical records are stored for types 1, 2, or 5.
- If the typeflag field is set to 3 (character special file), 4 (block special file), or 6 (FIFO), the meaning of the size field is unspecified by this volume of POSIX.1-2008, and no data logical records shall be stored on the medium. Additionally, for type 6, the size field shall be ignored when reading.
- If the typeflag field is set to any other value, the number of logical records written following the header shall be ( (size + 511 ) / 512 ), ignoring any fraction in the result of the division.
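As an illustration only, the skipping rules quoted above can be sketched in a few lines of shell. The `hdr` and `find_member` helpers are hypothetical names. The sketch assumes an uncompressed archive with short names, zero-padded octal size fields and no extended headers, which is what GNU tar writes for small files like these.

```shell
# hdr ARCHIVE BLOCK OFFSET LEN: read one header field, NULs removed.
hdr() {
    dd if="$1" bs=1 skip=$(( $2 * 512 + $3 )) count="$4" 2>/dev/null |
        tr -d '\000'
}

# find_member ARCHIVE NAME: print the data of the first regular file NAME.
find_member() {
    archive=$1 want=$2 block=0
    while :; do
        name=$(hdr "$archive" "$block" 0 100)
        [ -n "$name" ] || return 1      # zero block or EOF: not found
        size=$(( 0$(hdr "$archive" "$block" 124 12) ))
        type=$(hdr "$archive" "$block" 156 1)
        case $type in
            [1-6]) nblocks=0 ;;                          # no data records
            *)     nblocks=$(( (size + 511) / 512 )) ;;  # rounded up
        esac
        case $type in
            0|'')   # regular file; an empty typeflag is the old-style NUL
                if [ "$name" = "$want" ]; then
                    dd if="$archive" bs=1 skip=$(( (block + 1) * 512 )) \
                        count="$size" 2>/dev/null
                    return 0
                fi ;;
        esac
        block=$(( block + 1 + nblocks ))  # one header block plus its data
    done
}

# Scratch archive: a directory, a 600-byte file (2 blocks) and the target.
workdir=$(mktemp -d)
mkdir -p "$workdir/x"
dd if=/dev/zero bs=600 count=1 2>/dev/null | tr '\000' a > "$workdir/x/big.txt"
printf 'hello from y.txt\n' > "$workdir/x/y.txt"
tar -cf "$workdir/a.tar" -C "$workdir" x

found=$(find_member "$workdir/a.tar" x/y.txt)
printf '%s\n' "$found"
rm -rf "$workdir"
```

It skips link and directory headers without reading any data, as the spec text requires, and it does not follow a hard-link header back to the member that holds the data.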
...and, of course, considering also the individual size of each header - which is an additional block per member. So you might skip through read by read from header to header until you land on one matching the header for which you seek, at which time you would then need to check whether the current record merely describes a link to your file or to the actual file. This is especially relevant because when the same file is added to an archive multiple times many tars will only include link headers because the actual file's data can already be found elsewhere within the archive.
Having verified that, you'll need to apply your calculations to the chksum field and verify that the file you think you have is actually the file you want after all. tar's chksum is fairly simple, though:
chksum
  The chksum field shall be the ISO/IEC 646:1991 standard IRV representation of the octal value of the simple sum of all octets in the header logical record. Each octet in the header shall be treated as an unsigned value. These values shall be added to an unsigned integer, initialized to zero, the precision of which is not less than 17 bits. When calculating the checksum, the chksum field is treated as if it were all <space> characters.
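Here is one way that sum could be checked from the shell for the first header of an archive. It is only a sketch: `header_sum` is a made-up helper and the sample file is arbitrary. It splices eight spaces over the chksum field (offset 148, length 8) before adding up the bytes, then compares the result with the stored value.

```shell
# header_sum ARCHIVE: octal sum of the first 512-byte header, with the
# chksum field counted as eight spaces.
header_sum() {
    {
        dd if="$1" bs=1 count=148 2>/dev/null            # bytes 0..147
        printf '        '                                 # 8 spaces
        dd if="$1" bs=1 skip=156 count=356 2>/dev/null   # bytes 156..511
    } | od -An -v -tu1 |
        awk '{ for (i = 1; i <= NF; i++) s += $i } END { printf "%o\n", s }'
}

workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/y.txt"
tar -cf "$workdir/a.tar" -C "$workdir" y.txt

# The stored field is octal digits padded with NUL and space characters.
stored=$(dd if="$workdir/a.tar" bs=1 skip=148 count=8 2>/dev/null |
         tr -d '\000 ')
computed=$(header_sum "$workdir/a.tar")

if [ "$((0$stored))" -eq "$((0$computed))" ]; then
    echo "checksum ok"
else
    echo "checksum mismatch"
fi
rm -rf "$workdir"
```

The `-v` flag matters here: without it od collapses repeated lines of identical bytes, and a header is mostly NUL padding.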
Of course, you wouldn't actually have to do any of that, because tar can already do that - that's what it does - and so you should probably just use it to search the archive and extract the file for you. In doing so it won't do anything very differently than you would do if you knew what you were about, except that it will probably do it better and faster because that's its job. And anyway, why should you?
You can use this line
tar -axf a.tar -O
- This will show any file there is in the tar, not just `y.txt`, and it is not clear from the OP's question that that is the only file in the tar. – Anthon Jun 09 '15 at 15:33