
This lists all files in two backups, sorted by size:

tar tvf backup1.tar.bz2 |sort -k3 -n >backup1_files.txt
tar tvf backup2.tar.bz2 |sort -k3 -n >backup2_files.txt

I'd like to list all files present in backup2.tar.bz2 but not present in backup1.tar.bz2, sorted by size.

How to do this?


NB:

  • Doing a diff of these .txt files won't work because the modification dates of some files won't be the same. Thus this question is not a duplicate of Is there a tool to get the lines in one file that are not in another?.

  • Removing v would remove the modification dates, but also the filesizes, so it's not an option because it would be impossible to sort them by size.

Basj
  • Possible duplicate of [Is there a tool to get the lines in one file that are not in another?](https://unix.stackexchange.com/questions/28158/is-there-a-tool-to-get-the-lines-in-one-file-that-are-not-in-another) – muru Nov 01 '19 at 07:11
  • The modification times are probably being printed due to the `v` option. You don't need that here, just `t` should get you the filenames – muru Nov 01 '19 at 07:12
  • @muru Without `v`, I don't have the filesize, and then I cannot sort by size ; for this reason it seems not to be a duplicate here. – Basj Nov 01 '19 at 07:33
  • Once you have the files, you can use those with `tv` to get the size, so I don't see any problem. – muru Nov 01 '19 at 07:39
  • Once I have the files I only have a text with a list of files, so it's not something in `tar` anymore, so how could I pass `tv`? It might be easy for you but it's not obvious for me ;) maybe could you post an answer? Thank you in advance. – Basj Nov 01 '19 at 07:48
  • `xargs < list-of-files tar tvf some-file.tar.bz2` – muru Nov 01 '19 at 07:56

2 Answers


If you have AWK, you can use a one-liner like this:

awk '{if (NR==FNR) { arr[$6]=1 } else { if (! arr[$6]) { print } } }' backup1_files.txt backup2_files.txt

This builds an AWK array of the file names in backup 1 (field 6 is the file name in GNU tar's verbose listing) and then checks whether each file name in backup 2 is present in that array. If not, the line is printed. Since backup2_files.txt is already sorted by size, the output keeps that order.
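As a rough illustration of this two-file idiom, the sketch below runs it on two tiny hand-made listings. The file names, sizes and dates are invented, and the six-column layout assumes GNU tar's `tvf` output. The older listing is read first, so the lines printed are those whose name appears only in the newer one.

```shell
# Hypothetical sample listings in GNU tar "tvf" layout (6 columns):
# permissions, owner/group, size, date, time, name.
old_listing='-rw-r--r-- user/user 120 2019-10-01 10:00 a.txt
-rw-r--r-- user/user 300 2019-10-01 10:00 b.txt'

new_listing='-rw-r--r-- user/user 120 2019-11-01 07:00 a.txt
-rw-r--r-- user/user 900 2019-11-01 07:00 d.txt
-rw-r--r-- user/user 50 2019-11-01 07:00 c.txt'

workdir=$(mktemp -d)
printf '%s\n' "$old_listing" > "$workdir/old.txt"
printf '%s\n' "$new_listing" > "$workdir/new.txt"

# First file: remember every name (field 6).
# Second file: print only the lines whose name was not remembered.
# The result is then sorted numerically on the size column (field 3).
only_in_new=$(awk 'NR == FNR { seen[$6] = 1; next } !($6 in seen)' \
    "$workdir/old.txt" "$workdir/new.txt" | sort -k3,3n)

printf '%s\n' "$only_in_new"
rm -rf "$workdir"
```

Here a.txt is dropped even though its date differs between the two listings, because only the name is compared.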

EDIT: Here's an improved version that's more robust to files with whitespace in the name and doesn't need any temporary files:

 awk '{ key=""; for (i = 6; i <= NF; i++) { key=key " " $i }; if (NR == FNR) { arr[key]=1 } else { if (! arr[key]) { print } } }' <(tar tvf backup1.tar.bz2 |sort -k3 -n) <(tar tvf backup2.tar.bz2 |sort -k3 -n)

You can write the awk code into a file like intersect.awk and re-use it like:

awk -f intersect.awk <(tar tvf backup1.tar.bz2 |sort -k3 -n) <(tar tvf backup2.tar.bz2 |sort -k3 -n)
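For reference, intersect.awk could look something like the sketch below. This is one possible layout, not a definitive one: the sample listings are invented, GNU tar's six-column `tvf` layout is assumed, and the script is written out with a here-document only to keep the example self-contained.

```shell
workdir=$(mktemp -d)

# One possible content for intersect.awk (a sketch, not the only way to write it).
cat > "$workdir/intersect.awk" <<'EOF'
{
    # Rebuild the file name from field 6 onward so names with spaces stay whole.
    key = $6
    for (i = 7; i <= NF; i++) key = key " " $i
}
NR == FNR { seen[key] = 1; next }   # first input: remember every name
!(key in seen)                      # second input: print lines with unseen names
EOF

# Hypothetical listings whose file names all contain a space.
printf '%s\n' \
    '-rw-r--r-- user/user 10 2019-10-01 10:00 my notes.txt' \
    '-rw-r--r-- user/user 20 2019-10-01 10:00 my report.txt' > "$workdir/old.txt"
printf '%s\n' \
    '-rw-r--r-- user/user 10 2019-11-01 07:00 my notes.txt' \
    '-rw-r--r-- user/user 70 2019-11-01 07:00 my photo.jpg' \
    '-rw-r--r-- user/user 5 2019-11-01 07:00 my todo.txt' > "$workdir/new.txt"

new_with_spaces=$(awk -f "$workdir/intersect.awk" \
    "$workdir/old.txt" "$workdir/new.txt" | sort -k3,3n)

printf '%s\n' "$new_with_spaces"
rm -rf "$workdir"
```

Comparing on `$6` alone would treat every name here as "my" and print nothing. Note that joining the fields with a single space collapses runs of whitespace, so two names that differ only in the amount of whitespace would still be treated as equal.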
  • Thank you for your answer. Would there be a direct solution without using these temporary .txt files, by piping `tar ...` directly into `awk`? – Basj Nov 01 '19 at 09:02
  • You can use a bash/zsh process substitution: `awk '{if (NR==FNR) { arr[$6]=1 } else { if (! arr[$6]) { print } } }' <(tar tvf backup2.tar.bz2) <(tar tvf backup1.tar.bz2)` One thing about this though, be careful with spaces in file names, by default awk splits on whitespace, so some matches might be wrong. – Bastian Schiffthaler Nov 01 '19 at 09:06

The proposed methods from other answers do not work since tar will print:

name123 symbolic link to namexyz

if there are symlinks in the archive and similar messages for hardlinks.

So the only way to deal with that is to use star:

star -t -tpath < archive.tar.bz2 > somename

Do this for all archives, sort the output and then use the well-known methods to compare the resulting files.

The option -tpath tells star to print only the filename, one per line.

star is part of the schilytools.

BTW: If a filename contains a newline character, this method will confuse the comparing tools.
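One of those well-known comparison methods is `comm`. The sketch below is a minimal illustration under stated assumptions: the two name lists are invented stand-ins for the per-archive listings, with one path per line, and the variable and file names are hypothetical.

```shell
# Hypothetical name lists, one path per line, standing in for names-only listings.
workdir=$(mktemp -d)
printf '%s\n' 'docs/a.txt' 'docs/b.txt' 'src/main.c' > "$workdir/names1"
printf '%s\n' 'src/main.c' 'docs/a.txt' 'img/logo.png' 'src/util.c' > "$workdir/names2"

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort "$workdir/names1" > "$workdir/names1.sorted"
LC_ALL=C sort "$workdir/names2" > "$workdir/names2.sorted"

# -1 drops lines unique to the first file, -3 drops lines common to both,
# which leaves only the names unique to the second file.
only_in_second=$(LC_ALL=C comm -13 "$workdir/names1.sorted" "$workdir/names2.sorted")

printf '%s\n' "$only_in_second"
rm -rf "$workdir"
```

The resulting names carry no sizes, so they would still have to be fed back to a verbose listing of the second archive, as the `xargs` comment under the question suggests, before sorting by size.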

schily