I'm trying to use fslint to find duplicates, but it takes forever hashing entire multi-gigabyte files. According to this website, I can compare by the following features:
feature summary
compare by file size
compare by hardlinks
compare by md5 (first 4k of a file)
compare by md5 (entire file)
compare by sha1 (entire file)
but I don't see these options in the GUI or the man pages. Is there something I'm missing here?
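For reference, the staged comparison that site describes can be sketched in a few lines of Python. This is a toy illustration of the idea, not fslint's actual code: group candidates by size first, refine the surviving groups by an MD5 of the first 4 KiB, and only full-hash files that still look alike.

```python
import hashlib
import os
from collections import defaultdict

PARTIAL_SIZE = 4096  # the "first 4k" stage from the feature list

def md5_prefix(path, n=PARTIAL_SIZE):
    # hash only the first n bytes -- cheap filter for large files
    with open(path, "rb") as f:
        return hashlib.md5(f.read(n)).hexdigest()

def md5_full(path):
    # hash the whole file incrementally, 64 KiB at a time
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    # stage 1: only files of equal size can be duplicates
    groups = defaultdict(list)
    for p in paths:
        groups[os.path.getsize(p)].append(p)
    # stages 2 and 3: refine surviving groups by partial, then full, hash
    for key_fn in (md5_prefix, md5_full):
        refined = defaultdict(list)
        for key, group in groups.items():
            if len(group) < 2:
                continue  # unique at this stage; cannot be a duplicate
            for p in group:
                # chain the previous key so unrelated groups never merge
                refined[(key, key_fn(p))].append(p)
        groups = refined
    return [g for g in groups.values() if len(g) > 1]
```

Each stage only runs on files that survived the previous one, which is why the size check alone eliminates most of the hashing work.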
Edit: I'm using jdupes instead with the command line:
jdupes -r -T -T --exclude=size-:300m --nohidden
To get this to work, I had to clone the git repository and build from source. (The packaged version is woefully out of date.)
I also had to edit the source code to change every:
#define PARTIAL_HASH_SIZE 4096
to
#define PARTIAL_HASH_SIZE 1048576
and then it actually matched my files correctly. I don't know why it was coded this way, but matching only the first 4096 bytes isn't nearly enough: with -T -T, a partial-hash match is treated as a full match, so files that agree only in their first 4 KiB get reported as duplicates. (A command-line option to set this size would be useful.)
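The failure mode is easy to reproduce. Two files that share their first 4096 bytes but diverge afterwards have identical partial hashes, so any tool that stops at the partial hash will call them duplicates. A toy demonstration (plain Python, not jdupes code):

```python
import hashlib
import os
import tempfile

PARTIAL_HASH_SIZE = 4096  # matches the #define above

def md5(data):
    return hashlib.md5(data).hexdigest()

d = tempfile.mkdtemp()
prefix = os.urandom(PARTIAL_HASH_SIZE)  # shared 4 KiB prefix

# two same-size files that differ only after the first 4 KiB
with open(os.path.join(d, "a"), "wb") as f:
    f.write(prefix + b"ends one way")
with open(os.path.join(d, "b"), "wb") as f:
    f.write(prefix + b"ends another")

a = open(os.path.join(d, "a"), "rb").read()
b = open(os.path.join(d, "b"), "rb").read()

# the partial hashes collide, but the files are not duplicates
assert md5(a[:PARTIAL_HASH_SIZE]) == md5(b[:PARTIAL_HASH_SIZE])
assert md5(a) != md5(b)
```

Files like this (same header, different payload) are common in practice, e.g. media files or database dumps sharing boilerplate headers, which is presumably why the larger partial-hash size was needed here.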