
I have a pair of disks, D1 and D2.

I want to determine if all files in D2 have a corresponding copy somewhere in D1.

D1 contains approximately 4000 times as many files as D2.

`fdupes -r D1 D2` searches for all duplicates anywhere within D1 or D2, which requires a tremendous amount of computation across all the files in D1.

Is there a way to direct `fdupes` to search only for duplicates of files in D2 that exist in D1?
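For context, here is the kind of workaround I could fall back on if `fdupes` has no such option: hash D1 once, then test each D2 file's hash for membership. This is only a sketch; it assumes GNU `md5sum` and filenames without embedded newlines, and (like `fdupes`' first stage) it treats a matching MD5 as evidence of a copy.

```shell
# Workaround sketch, not an fdupes feature: build D1's hash list once,
# then report every D2 file whose content hash is absent from it.
missing_from_d1() {
    # $1 = the large tree (D1), $2 = the small tree (D2)
    # One pass over D1: collect its content hashes, sorted and deduplicated.
    find "$1" -type f -exec md5sum {} + | awk '{print $1}' | sort -u > /tmp/d1.md5
    # For each D2 file, report it if its hash is missing from D1's list.
    find "$2" -type f -exec md5sum {} + | while read -r sum path; do
        grep -qxF "$sum" /tmp/d1.md5 || printf 'no copy in D1: %s\n' "$path"
    done
}
```

This avoids the byte-by-byte comparisons among D1-internal duplicates, at the cost of hashing every D1 file even when its size matches nothing in D2.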

jtb
  • I don't know if it will require as much additional computation as you believe. A proper algorithm should make a single pass over all the files, adding each to a hash table of some kind. Have you tried just filtering the final report for the dupes you're interested in? – Wildcard Sep 10 '19 at 05:46
  • @Wildcard It depends. `man 1 fdupes` says it works `by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison`. If there are many duplicates within D1 only, the "byte-by-byte comparison" will be performed many times unnecessarily. – Kamil Maciorowski Sep 10 '19 at 07:22
  • 1
    If there is no good answer for `fdupes`, consider `jdupes`. It should perform better, even if you had to compare all the files and filter the output. And if there is a good answer for `fdupes` then you may consider `jdupes` anyway. – Kamil Maciorowski Sep 10 '19 at 07:30
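The comments' suggestion to filter the final report can be sketched as a post-processing step. `fdupes` prints each set of duplicates as a run of paths separated by a blank line, so an awk program can keep only D2 paths whose set also contains a D1 path. The `^D1/` and `^D2/` patterns are an assumption that `fdupes` was invoked with exactly those relative directory names.

```shell
# Post-filter sketch for fdupes' grouped report: print each D2 path that
# shares a duplicate set with at least one D1 path.
filter='
  /^$/ { flush(); next }          # a blank line ends a duplicate set
  { group[n++] = $0 }             # collect paths of the current set
  END { flush() }                 # handle a final set with no trailing blank
  function flush(   i, hasD1) {
    for (i = 0; i < n; i++) if (group[i] ~ /^D1\//) hasD1 = 1
    if (hasD1) for (i = 0; i < n; i++) if (group[i] ~ /^D2\//) print group[i]
    n = 0
  }'

# Against the real trees, the pipeline would be:
#   fdupes -r D1 D2 | awk "$filter"
```

This does not reduce the comparison work `fdupes` performs; it only trims the report afterward.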

0 Answers