36

I have the md5sum of a file, but I don't know where the file is on my system. Is there an easy option of find to identify a file based on its md5? Or do I need to write a small script?

I'm working on AIX 6 without the GNU tools.

slm
Kiwy
  • Wouldn't narrowing the search to files of the same size and then computing the md5 be faster? – RJ- Mar 11 '14 at 20:49
  • @RJ- Yes, maybe, but in this case it also allows me to check that the file is the correct one and has been transferred correctly. – Kiwy Mar 12 '14 at 08:28
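
The size-first idea from the comment above can be sketched as follows. This is only a sketch under assumptions: the size of the wanted file in bytes is known, md5sum is available (on a system without it, substitute whatever MD5 tool is installed), and the function name find_by_size_then_md5 is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: find_by_size_then_md5 DIR SIZE_IN_BYTES MD5
# Only files whose size matches exactly are hashed, so far fewer
# files have to be read than with a plain "hash everything" search.
find_by_size_then_md5() {
    dir=$1 size=$2 md5=$3
    # -size Nc selects files of exactly N bytes (POSIX find)
    find "$dir" -type f -size "${size}c" -exec md5sum {} + 2>/dev/null |
        grep "^$md5"
}
```

For example, `find_by_size_then_md5 /tmp 1048576 304a5fa2727ff9e6e101696a16cb0fc5` would hash only the 1 MiB files under /tmp. The MD5 comparison still runs on every size match, so the final check of the file contents is kept.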

4 Answers

40

Using find:

find /tmp/ -type f -exec md5sum {} + | grep '^file_md5sum_to_match'

If you are searching from /, you can exclude /proc and /sys, as in the find commands in the test results below.

I also ran some tests: find took more time but less CPU and RAM, whereas the Ruby script took less time but more CPU and RAM.

Test Result

Find

[root@dc1 ~]# time find / -type f -not -path "/proc/*" -not -path "/sys/*" -exec md5sum {} + | grep '^304a5fa2727ff9e6e101696a16cb0fc5'
304a5fa2727ff9e6e101696a16cb0fc5  /tmp/file1


real    6m20.113s
user    0m5.469s
sys     0m24.964s

Find with -prune

[root@dc1 ~]# time find / \( -path /proc -o -path /sys \) -prune -o -type f -exec md5sum {} + | grep '^304a5fa2727ff9e6e101696a16cb0fc5'
304a5fa2727ff9e6e101696a16cb0fc5  /tmp/file1

real    6m45.539s
user    0m5.758s
sys     0m25.107s

Ruby Script

[root@dc1 ~]# time ruby findm.rb
File Found at: /tmp/file1

real    1m3.065s
user    0m2.231s
sys     0m20.706s
Rahul Patil
12

Script Solution

#!/usr/bin/ruby -w

require 'find'
require 'digest/md5'

file_md5sum_to_match = [ '304a5fa2727ff9e6e101696a16cb0fc5',
                         '0ce6742445e7f4eae3d32b35159af982' ]

Find.find('/') do |f|
  next if /(^\.|^\/proc|^\/sys)/.match(f) # skip
  next unless File.file?(f)
  begin
    md5sum = Digest::MD5.hexdigest(File.read(f))
  rescue
    puts "Error reading #{f} --- MD5 hash not computed."
    next # nothing to compare for this file
  end
  if file_md5sum_to_match.include?(md5sum)
    puts "File Found at: #{f}"
    file_md5sum_to_match.delete(md5sum)
  end
  exit if file_md5sum_to_match.empty? # stop once every hash has been found
end

Bash script solution that searches the most likely directories first, which is usually faster

#!/bin/bash
[[ -z $1 ]] && read -r -p "Enter MD5SUM to search file: " md5 || md5=$1

# Directories where the file is most likely to be, searched first
check_in=( /home /opt /tmp /etc /var /usr )
echo "Please wait... searching for file"

# Pass 1: likely locations, one at a time; stop at the first match
for d in "${check_in[@]}"
do
        find "$d" -type f -exec md5sum {} + | grep "^${md5}" && exit
done

# Pass 2: the rest of the filesystem, pruning /proc, /sys and
# the directories that were already searched in pass 1
prune=( -path /proc -o -path /sys )
for d in "${check_in[@]}"
do
        prune+=( -o -path "$d" )
done
find / \( "${prune[@]}" \) -prune -o -type f -exec md5sum {} + | grep "^${md5}"

Test Result

[root@dc1 /]# time bash find.sh 304a5fa2727ff9e6e101696a16cb0fc5
Please wait... searching for file
304a5fa2727ff9e6e101696a16cb0fc5  /var/log/file1

real    0m21.067s
user    0m1.947s
sys     0m2.594s
Rahul Patil
  • Which would you recommend? – Kiwy Mar 11 '14 at 10:21
  • @Kiwy I'm not recommending either one; this was just for practice. – Rahul Patil Mar 11 '14 at 10:22
  • @Kiwy Have a look at the test results and let me know. Please also do some testing on your side and show us the results; it would be great to see results on AIX. :D – Rahul Patil Mar 11 '14 at 10:53
  • My main issue with your script is that it needs Ruby, which is not installed on my system, and I'm not an admin. But I will run some tests tonight if I find some time. – Kiwy Mar 11 '14 at 10:54
  • It seems faster than find in the end ^^. Maybe you could put the md5sum computation in threads so that you compute 5 md5sums at the same time; that could also save a bit of time. – Kiwy Mar 11 '14 at 11:01
  • @Kiwy I've updated the code; now you can search for as many MD5 hashes as you want. – Rahul Patil Mar 11 '14 at 11:21
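
The idea of hashing several files at the same time, raised in the comments above, can be approximated without threads by letting xargs run several md5sum processes in parallel. This is a rough sketch, not a tested benchmark: it assumes a find that supports -print0 and an xargs that supports -0 and -P (GNU and BSD versions do; these options are not POSIX, so it probably does not apply to a stock AIX), and the function name find_md5_parallel is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: find_md5_parallel DIR MD5 [JOBS]
# Runs up to JOBS (default 5) md5sum workers at once. Each worker
# prints only matching lines, so output from parallel workers
# cannot get interleaved in the middle of a line.
find_md5_parallel() {
    dir=$1 md5=$2 jobs=${3:-5}
    find "$dir" -type f -print0 2>/dev/null |
        xargs -0 -n 32 -P "$jobs" sh -c '
            want=$1; shift
            [ $# -gt 0 ] || exit 0      # no files in this batch
            md5sum "$@" 2>/dev/null | grep "^$want"
            exit 0                      # a batch without a match is not an error
        ' sh "$md5"
}
```

For example, `find_md5_parallel /var 304a5fa2727ff9e6e101696a16cb0fc5 5` prints every file under /var with that hash. Whether this is actually faster depends on whether the search is limited by CPU or by disk reads.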
7

If you decide to install GNU find anyway (you indicated interest in one of your comments), you can try something like:

find / -type f \( -exec checkmd5 {} YOURMD5SUM \; -o -quit \) 

and have checkmd5 compare the md5sum of the file it gets as its first argument with the second argument, print the file name if they match, and exit with 1 in that case (0 otherwise). Because -exec is then false on a match, find goes on to evaluate -quit and stops as soon as the file is found.

checkmd5 (not tested):

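#!/bin/bash

# Usage: checkmd5 FILE MD5SUM
md=$(md5sum "$1" | cut -d' ' -f1)

# On a match, print the name and exit 1 so that find's -quit branch runs
if [ "$md" = "$2" ] ; then
  echo "$1"
  exit 1
fi
exit 0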
Anthon
1

For people running macOS and stumbling on this page: you have to use md5 instead of md5sum or checkmd5, i.e.:

find . -type f -exec md5 {} + | grep  'file_md5sum_to_match'

Caveat: don't put ^ before file_md5sum_to_match, otherwise it will never match anything, since md5 prints the filename before the MD5 sum.

Pigeo