I have followed the answers to "Tar produces different files each time" and "Why does the PIGZ produce a different md5sum". My goal is to generate a deterministic compressed file in tar.gz format (one that always has the same md5sum hash) as long as the contents of the files being compressed do not change.
$ cd /home/user/folder_to_zip
$ echo "helloworld" > data.txt && md5sum data.txt
d73b04b0e696b0945283defa3eee4538 data.txt
$ echo "life" > data1.txt
$ md5sum data1.txt
5b019d44541d974a2cbf3299c8279c10 data1.txt
$ ls -lA -tr -h
total 8.0K
-rw-rw-r-- 1 alper alper alper 11 2021-01-24 17:20 data.txt
-rw-rw-r-- 1 alper alper alper 5 2021-01-24 17:31 data1.txt
$ find . -print0 | LC_ALL=C sort -z | \
PIGZ=-n tar -Ipigz --mode=a+rwX --owner=0 --group=0 --absolute-names --no-recursion --null -T - -cvf ../file.tar.gz && md5sum ../file.tar.gz
./
./data.txt
./data1.txt
b117eaf020ba2d85cca84a169e4df750 ../file.tar.gz
I then repeated the same operation, keeping the contents of data.txt and data1.txt exactly the same:
$ echo "helloworld" > data.txt && md5sum data.txt
d73b04b0e696b0945283defa3eee4538 data.txt
$ echo "life" > data1.txt && md5sum data1.txt
5b019d44541d974a2cbf3299c8279c10 data1.txt
$ ls -lA -tr -h
total 8.0K
-rw-rw-r-- 1 alper alper alper 5 2021-01-24 17:35 data1.txt
-rw-rw-r-- 1 alper alper alper 11 2021-01-24 17:35 data.txt
$ find . -print0 | LC_ALL=C sort -z | \
PIGZ=-n tar -Ipigz --mode=a+rwX --owner=0 --group=0 --absolute-names --no-recursion --null -T - -cvf ../file.tar.gz && md5sum ../file.tar.gz
./
./data.txt
./data1.txt
5218abe2ece732c74314d36b7e712c88 ../file.tar.gz
In the end, the md5sum hashes of the two compressed files differ, even though the data being compressed is exactly the same.
What is the main reason for this, and is it possible to fix it?
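
One plausible cause, stated as an assumption rather than a confirmed diagnosis: a tar header records each entry's modification time, and the two `ls` listings above show different timestamps (17:20 and 17:31 in the first run, 17:35 in the second). The sketch below pins the timestamp with GNU tar's `--mtime` option so that only names and contents influence the archive. The function name `make_tarball` is hypothetical, and `gzip -n` stands in for `pigz -n` purely to keep the example portable.

```shell
#!/bin/sh
# Hypothetical helper: build a tar.gz whose bytes depend only on the
# file names and file contents under a directory, not on timestamps.
# Assumes GNU tar, GNU find and GNU sort.
make_tarball() {
    src=$1   # directory to archive
    out=$2   # path of the resulting .tar.gz (must be outside $src)
    (
        cd "$src" || exit 1
        # Sort names bytewise so the entry order does not depend on the
        # file system or the locale.
        find . -print0 | LC_ALL=C sort -z |
            tar --mtime=@0 \
                --mode=a+rwX --owner=0 --group=0 --numeric-owner \
                --no-recursion --null -T - -cf -
    ) | gzip -n > "$out"   # -n: do not store a name or timestamp in the gzip header
}

# Usage:
#   make_tarball /home/user/folder_to_zip /home/user/file.tar.gz
#   md5sum /home/user/file.tar.gz
```

`--mtime=@0` sets every entry's modification time to the Unix epoch; any other fixed date would work equally well. If the hashes still differ after this change, comparing the output of `tar -tvf` for the two archives should show which metadata field is still varying.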