
In Ubuntu, is there any way to find duplicate folders in a directory (i.e., folders with the same content)? There are already some command-line tools for finding duplicate files (such as fdupes), but I want to find duplicate folders instead. That is, find folders that match in terms of the contents of the files they contain (though the filenames and other metadata might differ).

nealmcb
Anderson Green
  • I might start by generating a list of all folders in a directory (sorted by length), and then check each pair of folders with the same length. – Anderson Green Dec 12 '12 at 21:49
  • Define "duplicate". Must the files inside match merely in file content? File name? Inode number? File size? – Chris Down Dec 12 '12 at 21:50
  • @ChrisDown The question has been updated. – Anderson Green Dec 12 '12 at 21:51
  • I don't see any clarification of what it means to be a duplicate directory... – Chris Down Dec 12 '12 at 21:52
  • @ChrisDown I said "folders in a directory with the same content" - does this need further clarification? Two "duplicate folders" would contain the same files and folders in the same order. – Anderson Green Dec 12 '12 at 21:53
  • 3
    Yes. Directories are really just files, so your statement is ambiguous. To have the "same content" in reality would mean that the directories both contain the same inode references. It is unclear whether you mean that, or whether you mean that the *files inside* should have the same content, and if so, whether there are other stipulations (mtime, filename, etc). – Chris Down Dec 12 '12 at 21:56
  • 3
    @ChrisDown I mean that the files inside should have the same content. – Anderson Green Dec 12 '12 at 21:57
  • So, to be clear, all other metadata other than the file content is irrelevant? – Chris Down Dec 12 '12 at 21:58
  • @ChrisDown Yes, all metadata other than the file content would be irrelevant. – Anderson Green Dec 12 '12 at 21:59
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/6711/discussion-between-anderson-green-and-chris-down) – Anderson Green Dec 12 '12 at 22:02

1 Answer

#!/bin/bash
shopt -s dotglob

for file in "$1"/*; do [[ -f "$file" ]] && d1+=( "$(md5sum < "$file")" ); done
for file in "$2"/*; do [[ -f "$file" ]] && d2+=( "$(md5sum < "$file")" ); done 

[[ "$(sort <<< "${d1[*]}")" == "$(sort <<< "${d2[*]}")" ]] && echo "Same" || echo "Different"

You can see it in action here:

$ mkdir 1 2
$ ./comparedirs 1 2
Same
$ cat > 1/1 <<< foo
$ cat > 2/1 <<< foo
$ ./comparedirs 1 2
Same
$ cat > 2/1 <<< bar
$ ./comparedirs 1 2
Different
Chris Down
  • Since this script is untested, I'm eager to see whether it works the way it's supposed to work. – Anderson Green Dec 12 '12 at 22:08
  • 1
    @AndersonGreen Check the updated answer, tested it. – Chris Down Dec 12 '12 at 22:13
  • Nice! There should also be a test with `cat > 1/2 <<< bar` and `cat > 2/3 <<< bar` to show multiple files and differing metadata (== "Same") – nealmcb Jun 19 '14 at 03:59
  • @ChrisDown: is the sort in the last step needed? – harish.venkat Jun 19 '14 at 05:49
  • Elegant script; the only minor bug is that it returns Same when either or both directories do not exist. Should be easy to fix for someone better at scripting than I am. – cosine Sep 11 '15 at 10:26
  • @AndersonGreen why do you accept this answer? You asked to _find all folders in a directory with the same content_, not just compare two given folders. – Pablo A Sep 10 '17 at 21:58