
I have followed the answers to "Tar produces different files each time" and "Why does the PIGZ produce a different md5sum". My goal is to generate a deterministic compressed file in tar.gz format (one with the same md5sum hash every time) as long as the contents of the compressed files have not changed.

$ cd /home/user/folder_to_zip
$ echo "helloworld" > data.txt && md5sum data.txt
d73b04b0e696b0945283defa3eee4538  data.txt
$ echo "life" > data1.txt
$ md5sum data1.txt
5b019d44541d974a2cbf3299c8279c10  data1.txt
$ ls -lA -tr -h
total 8.0K
-rw-rw-r-- 1 alper alper alper 11 2021-01-24 17:20 data.txt
-rw-rw-r-- 1 alper alper alper  5 2021-01-24 17:31 data1.txt
$ find . -print0 | LC_ALL=C sort -z | \
  PIGZ=-n tar -Ipigz --mode=a+rwX --owner=0 --group=0 --absolute-names --no-recursion --null -T - -cvf ../file.tar.gz && md5sum ../file.tar.gz
./
./data.txt
./data1.txt
b117eaf020ba2d85cca84a169e4df750  ../file.tar.gz

I repeated the same operation, keeping the contents of the files exactly the same:

$ echo "helloworld" > data.txt && md5sum data.txt
d73b04b0e696b0945283defa3eee4538  data.txt
$ echo "life" > data1.txt && md5sum data1.txt
5b019d44541d974a2cbf3299c8279c10  data1.txt
$ ls -lA -tr -h
total 8.0K
-rw-rw-r-- 1 alper alper alper  5 2021-01-24 17:35 data1.txt
-rw-rw-r-- 1 alper alper alper 11 2021-01-24 17:35 data.txt
$ find . -print0 | LC_ALL=C sort -z | \
  PIGZ=-n tar -Ipigz --mode=a+rwX --owner=0 --group=0 --absolute-names --no-recursion --null -T - -cvf ../file.tar.gz && md5sum ../file.tar.gz
./
./data.txt
./data1.txt
5218abe2ece732c74314d36b7e712c88  ../file.tar.gz

In the end, the md5sum hash differs between the two compressed files, even though the data they compress is exactly the same.

What is the main reason for this, and is it possible to fix it?

muru
alper
  • The timestamps are differing. – muru Jan 24 '21 at 14:55
  • Yes, adding `--mtime='1970-01-01'` fixed the problem. I used the answer on the linked question, which was also missing `--mtime='1970-01-01'`. Do you think I still need the other parameters, like `--mode=a+rwX --owner=0 --group=0 --absolute-names --no-recursion --null -T - -cvf`? – alper Jan 24 '21 at 14:57
  • You probably don't need `--absolute-names`, since you're not using absolute paths here, but `--no-recursion --null -T -` are needed since you're relying on `find | sort` for the filenames. Whether resetting the mode and ownership is necessary I can't say without knowing how your final environment will differ from wherever the original tar file will be created. – muru Jan 24 '21 at 15:40
  • `find ... | tar` pipelines and the non-standard `--null` option come from a time when modular features could only be combined via pipes. We have had shared libraries for 33 years, and for 16 years we have had `libfind`, which can be combined with any command. Since 2005, `star` has supported the `-find` option, which makes the `find(1)` CLI available to the right of `-find`. You may like to have a look at: http://schilytools.sourceforge.net/man/man1/star.1.html – schily Jan 28 '21 at 14:26
  • As an alternative to `--no-recursion --null -T -`, which options could I use with `star`? This is actually a new question: `How to create a deterministic tar.gz using star?` @schily – alper Jan 28 '21 at 17:13
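
To make the fix from the comments concrete, here is a minimal sketch of the pipeline with the modification time pinned, assuming GNU tar, GNU find/sort, and either pigz or gzip on the PATH. The `make_archive` helper, the temporary directory, and the fallback to gzip are illustrative assumptions, not part of the original commands. The timestamp is written as `@0` (the epoch) instead of `'1970-01-01'` so that it should not depend on the local time zone, and `--absolute-names` is left out as suggested above.

```shell
#!/bin/sh
# Sketch: archive the same file contents twice, with different file
# timestamps in between, and compare the checksums of the two archives.
set -eu

# Hypothetical helper: make_archive SRC_DIR OUT_FILE
make_archive() {
    src=$1
    out=$2
    # Use pigz as in the question; fall back to gzip if it is not installed.
    gz=$(command -v pigz || command -v gzip)
    (
        cd "$src"
        # Sorted, NUL-separated file list; --mtime pins every timestamp.
        find . -print0 | LC_ALL=C sort -z |
            tar --mtime='@0' --mode=a+rwX --owner=0 --group=0 \
                --no-recursion --null -T - -cf -
    ) | "$gz" -n > "$out"   # -n: do not store name/timestamp in the gzip header
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/folder_to_zip"

echo "helloworld" > "$work/folder_to_zip/data.txt"
echo "life" > "$work/folder_to_zip/data1.txt"
make_archive "$work/folder_to_zip" "$work/first.tar.gz"

# Same contents, but different modification times on files and directory.
touch -t 202101241735 "$work/folder_to_zip/data.txt" \
    "$work/folder_to_zip/data1.txt" "$work/folder_to_zip"
make_archive "$work/folder_to_zip" "$work/second.tar.gz"

sum1=$(md5sum < "$work/first.tar.gz" | cut -d' ' -f1)
sum2=$(md5sum < "$work/second.tar.gz" | cut -d' ' -f1)
echo "first:  $sum1"
echo "second: $sum2"
```

If the two printed hashes match, the remaining flags are doing their job. Dropping `--mtime='@0'` from the helper should make them differ again, which is the behaviour described in the question.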

0 Answers