#!/bin/bash

cd /path-to-directory
md5=$(find . -type f -exec md5sum {} \; | sort -k 2 | md5sum)

zenity --info \
--title= "Calculated checksum" \
--text= "$md5"

Calculating a recursive checksum for a directory takes a while. The bash script doesn't wait until the calculation is finished; it just moves on to the next command, which is the dialog box that displays the calculated checksum. So the dialog box shows a wrong checksum.

Is there an option to tell the script to wait until the checksum calculation is finished? Furthermore, is there a way to pipe the progress of the checksum calculation to some kind of progress bar, like zenity's, for example?

terdon
stewie
  • Please review that code. It's invalid syntax as written. – Stéphane Chazelas Oct 25 '22 at 13:22
  • Thanks. I corrected it. – stewie Oct 25 '22 at 13:29
  • What makes you think it doesn't wait? Are you sure that's the problem or is it just that you see no text in the zenity box and guess it is because the md5sum hasn't finished? – terdon Oct 25 '22 at 13:38
  • Are you sure a space is allowed after the `=` in the zenity command? It is probably still invalid code. – doneal24 Oct 25 '22 at 13:41
  • It doesn't look like there'd be anything there that would run in the background. The command substitution in the assignment has to finish for there to be a value to assign, and the shell doesn't even support asynchronous assignments, so if that assignment was sent to the background (with `var=$(foo) &`), the assigned value would only be set in the background shell... – ilkkachu Oct 25 '22 at 13:42
  • @doneal24 nah, it's valid (as in it runs, and no errors are printed), but it doesn't work: there is no text or title in the zenity popup. – terdon Oct 25 '22 at 13:47
  • @terdon Possibly a different level of valid. The bash code is valid, but it's possibly calling zenity in a manner invalid for the application. I see that both you and Stéphane removed the spaces in your solutions. – doneal24 Oct 25 '22 at 13:52
  • It is just a mistake I made in this post. In the original bash script it's correct, but it still displays a wrong checksum. – stewie Oct 25 '22 at 13:56
  • Please add all this to your question so we don't waste your time or ours with wrong data. You now say it shows a wrong checksum? How is it wrong? How are you testing? If you don't explain what you are doing we won't be able to help you. – terdon Oct 25 '22 at 14:07

2 Answers


As written, it would be waited for. For a pulsating progress bar:

#! /bin/sh -
export LC_ALL=C
cd /path/to/dir || exit
{
  md5=$(
    find . -type f -print0 |
      sort -z |
      xargs -r0 md5sum |
      md5sum
  )
  exec >&-
  zenity --info \
         --title="Checksum" \
         --text="$md5"
} | zenity --progress \
           --auto-close \
           --auto-kill \
           --pulsate \
           --title="${0##*/}" \
           --text="Computing checksum"

For an actual progress bar, you'd need to know the number of files to process in advance.

With zsh:

#! /bin/zsh -
export LC_ALL=C
autoload zargs
cd /path/to/dir || exit
{
  files=(.//**/*(ND.))
} > >(
  zenity --progress \
         --auto-close \
         --auto-kill \
         --pulsate \
         --title=$0:t \
         --text="Finding files"
)
md5=(
  $(
   zargs $files -- md5sum \
      > >(
        awk -v total=$#files '/\/\// {print ++n * 100 / total}' | {
          zenity --progress \
            --auto-close \
            --title=$0:t \
            --text="Computing checksum" || kill -s PIPE $$
        }) \
      | md5sum
  )
)
zenity --info \
       --title=$0:t \
       --text="MD5 sum: $md5[1]"
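Without zsh, the same "count first, then report percentages" idea can be sketched in bash. This is a hypothetical helper, not part of the answer above; it assumes GNU find, sort and xargs (for -print0, -z and -r0) and mktemp. It relies on zenity --progress reading a bare number per line as the percentage and treating lines starting with # as label text. The ".//." starting point plays the role of the .// prefix in the zsh version, so "//" occurs once per path and the count survives newlines in names. Because the listed names carry that prefix, the final checksum is consistent between runs but not equal to the one printed by the other snippets.

```shell
#!/bin/bash
# dir_md5_progress DIR (hypothetical helper): print an integer percentage
# per hashed file on stdout, then the overall checksum on a last line
# prefixed with "# " so zenity --progress shows it as its text.
dir_md5_progress() {
  local dir=$1 total list
  # Count files first. Names cannot contain "/", so with the ".//." start
  # point "//" appears exactly once per path, even if a name has a newline.
  total=$(cd -- "$dir" && find .//. -type f | grep -c //)
  list=$(mktemp) || return
  # Hash in byte order, keep a copy of the md5sum list, and turn each
  # output record into a percentage of the total.
  (cd -- "$dir" && find .//. -type f -print0 | LC_ALL=C sort -z | xargs -r0 md5sum) |
    tee -- "$list" |
    awk -v total="$total" '/\/\// { print int(++n * 100 / total) }'
  # One checksum over the whole list (names plus per-file hashes).
  printf '# %s\n' "$(md5sum < "$list" | cut -d ' ' -f1)"
  rm -f -- "$list"
}
```

It could then be used as dir_md5_progress /path/to/dir | zenity --progress --title=Checksum. Leaving out --auto-close should keep the dialog open after 100% so the final "# " line remains visible as its text. Unlike the zsh version, this sketch does not abort the computation when the dialog is cancelled.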

Note that outside of the C locale, on GNU systems at least, filename order is not deterministic: some characters sort the same, and filenames are not guaranteed to be made of valid text in the locale's encoding. Hence the LC_ALL=C above.

The C locale order is also very simple (based on byte value) and consistent from system to system and version to version.

Beware that this means error messages, if any, will be displayed in English instead of the user's language (but then again, the "Computing checksum", "Finding files", etc. messages are not localised either, so it's just as well).
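The byte-value ordering that LC_ALL=C gives is easy to observe. In this minimal sketch, c_sort is a made-up name used only for illustration. Uppercase ASCII letters (0x41 to 0x5A) sort before lowercase ones (0x61 to 0x7A), whereas many UTF-8 locales would typically interleave the two cases instead.

```shell
#!/bin/bash
# c_sort (illustrative name): sort lines by raw byte value, identically
# on any system and sort version.
c_sort() {
  LC_ALL=C sort
}

printf '%s\n' b B a A | c_sort   # prints A, B, a, b, one per line
```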

Some other improvements over your approach:

  • Using -exec md5sum {} + or -print0 | xargs -r0 md5sum (or the zargs equivalent) minimises the number of md5sum invocations, as each invocation is passed several files. -exec md5sum {} \; means running one md5sum per file, which is very inefficient.
  • We sort the list of files before passing them to md5sum. Doing sort -k2 generally doesn't work, as file names can contain newline characters. In general, it's wrong to process file paths line by line. You'll notice we use a .// prefix in the zsh approach so that awk can count files reliably. Some md5sum implementations also have a -z option for NUL-delimited records.
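Both points can be checked with a small sketch. The helper names are hypothetical, and sh -c 'echo run' stands in for md5sum so that each run prints one line that can be counted. The second pair of helpers shows that a single file name containing a newline is counted twice when paths are processed line by line, but once with NUL delimiters (GNU tr is assumed for the '\000' escape).

```shell
#!/bin/bash
# count_runs DIR TERMINATOR (hypothetical): how many times find starts the
# command for the regular files under DIR, with ";" (one run per file)
# or "+" (files batched into as few runs as possible).
count_runs() {
  find "$1" -type f -exec sh -c 'echo run' sh {} "$2" | wc -l
}

# Line-based count: a name containing a newline is counted twice.
count_lines() {
  find "$1" -type f | wc -l
}

# NUL-based count: keep only the NUL terminators and count them.
count_nul() {
  find "$1" -type f -print0 | tr -dc '\000' | wc -c
}
```

With three files, one of them named "new<newline>line", count_runs gives 3 with ";" and 1 with "+", while count_lines reports 4 and count_nul reports 3.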
Stéphane Chazelas

That code will wait for the find and md5sum commands to finish. That's just the normal behavior, unless you have a & to send the commands to the background.
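A minimal sketch of that behaviour, which also illustrates ilkkachu's comment on the question. The function names are made up for illustration. A command substitution in an assignment only completes once the inner commands have exited. Adding & moves the whole assignment into a background subshell, so the value never reaches the calling shell.

```shell
#!/bin/bash
# demo_wait (illustrative): the assignment blocks for about a second, so
# the printf afterwards always sees the complete value.
demo_wait() {
  local out
  out=$(sleep 1; echo finished)
  printf '%s\n' "$out"
}

# demo_background (illustrative): with "&" the assignment happens in a
# background subshell; this shell's variable keeps its old value.
demo_background() {
  local out=unset
  out=$(echo finished) &
  wait
  printf '%s\n' "$out"
}
```

Calling demo_wait prints finished after a short pause, while demo_background prints unset.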

However, your zenity command is malformed: you can't have a space after the =. So I am guessing that you are seeing an empty zenity window and that's why you think it isn't waiting. Try again, but remove the spaces:

#!/bin/bash

cd /path-to-directory
md5=$(find . -type f -exec md5sum {} \; | sort -k 2 | md5sum)

zenity --info \
--title="Calculated checksum" \
--text="$md5"
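To see why the space matters, one can print the arguments exactly as the shell passes them (show_args is a throwaway helper for illustration). With the space, the option arrives as an empty --title= and the text becomes a separate argument that is no longer attached to the option.

```shell
#!/bin/bash
# show_args (illustrative): print each received argument in brackets,
# one per line.
show_args() {
  printf '[%s]\n' "$@"
}

show_args --title= "Calculated checksum"   # [--title=] then [Calculated checksum]
show_args --title="Calculated checksum"    # [--title=Calculated checksum]
```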

You can also avoid the need to cd and make it a bit more concise if you do:

#!/bin/bash

zenity --info \
--title="Calculated checksum" \
--text="$(find /path-to-directory -type f -exec md5sum {} \; | sort -k 2 | md5sum)"

If you need this to always return the same result, no matter what the parent path of the directory is, you can use this to remove the file names from the output of the find ... md5sum command before passing to the second md5sum:

#!/bin/bash

zenity --info \
--title="Calculated checksum" \
--text="$(find /path-to-directory -type f -exec md5sum {} \; | cut -d ' ' -f1 | sort | md5sum)"
terdon
  • This is just a mistake I made in this forum because I didn't copy the original script; it is on a different machine. However, if I run the script like this, a zenity dialog pops up immediately and shows a checksum that isn't the correct one. I compared it to the output of running the "find . -type ... " code in a terminal. – stewie Oct 25 '22 at 13:55
  • @stewie that isn't really possible. We need to see the _exact_ commands you are running to be able to help though. – terdon Oct 25 '22 at 14:12
  • Remember that the command is piped to a second md5sum. So your version runs on "[MD5 hash] /path-to-directory/path/to/dirname" while the original does "[MD5 hash] path/to/dirname". Try it and see. You get different results with and without the `cd`. The test as @stewie described would give different results. Also, the correct comparison would be to `find /path-to-directory -type f -exec md5sum {} \; | sort -k 2 | md5sum` or `find . -type f -exec md5sum {} \; | sort -k 2 | md5sum`. Doing the `cd` gives consistent results regardless of /path-to-directory. – mdfst13 Oct 26 '22 at 00:12
  • You are right, I tried both versions on /run/media/"$USER"/directory and got different results. So if I want the correct checksum for the directory on my mounted device, I need to use the command `find /run/media/"$USER"/directory -type f -exec md5sum {} \; | sort -k 2 | md5sum`, right? – stewie Oct 27 '22 at 09:50
  • @stewie it depends on what you mean by "correct". The md5sum you are calculating is including the path. If you then want to compare this in order to check that the same files are present (which was never mentioned in the question), then you do indeed need to ignore the paths. So either cd into the directory first, or remove the paths from the output of -exec md5sum {} like this: `find foo/ -type f -exec md5sum {} \; | cut -d ' ' -f1`. See updated answer. – terdon Oct 27 '22 at 10:31
  • A "correct" checksum would be one single checksum for all files of a device. For example, I've mounted a USB device that is listed at `/run/media/user/USB`, and the checksum shall be calculated for `/USB`. I am not very familiar with Linux, so I am not even sure if this is the correct way to approach my problem. – stewie Oct 27 '22 at 11:09
  • @stewie but based on the file names and content, or on the content only or on the file names only? If a file has the same content but a different name, should that count as a match? If it has the same name but different contents? I think you might want to ask a new question, explaining what the final objective is and then we can help more. – terdon Oct 27 '22 at 11:21
  • File names and file content. Yeah I might need to make a new question. – stewie Oct 27 '22 at 11:24
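For the objective stated in these last comments (one checksum over all file names and contents, the same wherever the device is mounted), a possible sketch combines the cd from mdfst13's comment with the NUL-safe, C-locale pipeline from the other answer. dir_md5 is a hypothetical name, and GNU find, sort and xargs are assumed. The function body runs in a subshell, so the cd does not change the caller's directory.

```shell
#!/bin/bash
# dir_md5 DIR (hypothetical): one checksum over the relative names and the
# contents of all regular files under DIR. Names are hashed relative to
# DIR, so /run/media/$USER/USB and a copy elsewhere give the same result.
dir_md5() (
  cd -- "$1" || exit
  # NUL-delimited byte-order sort: reproducible and newline-safe; xargs
  # batches files into few md5sum runs while keeping the sorted order.
  find . -type f -print0 | LC_ALL=C sort -z | xargs -r0 md5sum |
    md5sum | cut -d ' ' -f1
)
```

Comparing dir_md5 /run/media/"$USER"/USB with dir_md5 on a copy should then flag renamed or modified files. Note that only regular files are covered: empty directories, symlinks, permissions and timestamps do not affect the result.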