I have a directory with several duplicate files, created by a program. The duplicates have the same name (except for a number), but not all files with the same name are duplicates.

What's a simple command to delete the duplicates (ideally a single line limited to GNU coreutils, unlike the question about scripts)?

Example filenames: `parra1998.pdf`, `parra1998(1).pdf`, `parra1998(2).pdf`

Nemo
  • "_`but not all files with the same name are duplicates`_": you cannot have two files with the same name, so how do you want to detect whether `parra1998.pdf` is a duplicate of `parra1998(1).pdf` or not? Based on their contents? If so, your question is a duplicate of [How to find and delete duplicate files within the same directory?](https://unix.stackexchange.com/q/367749/72456) – αғsнιη Apr 29 '18 at 10:32
  • @αғsнιη "same name (except for a number)" – Nemo Apr 29 '18 at 11:56
  • @dsstorefile1 No, this question asks for a simple command while that question is more generic (answers include entire bash scripts, GUI programs etc.) – Nemo Apr 29 '18 at 11:57
  • @dsstorefile1 sure, one can have different opinions on the answers. Yet, that question didn't *ask* for the same thing. – Nemo Apr 29 '18 at 12:05
  • 1
    Indeed, I can't parse `The duplicates have the same name (except for a number), but not all files with the same name are duplicates` -- how do we know if a numbered suffix file is a duplicate of the base name? – Jeff Schaller Apr 29 '18 at 13:21

1 Answer


A quick and dirty solution is to hash the files, find the hashes which appear more than once, and delete the matching files whose names are numbered.

For instance:

```shell
sha1sum * > files.sha1sum
cut -f1 -d' ' files.sha1sum | sort | uniq -c \
  | grep -v ' 1 ' \
  | sed --regexp-extended 's/^ +[0-9]+ //' \
  | xargs -n1 -I§ grep § files.sha1sum \
  | sed --regexp-extended 's/^[^ ]+ +//' \
  | grep '(' \
  | xargs -n1 -I§ rm "§"
```

This hashes every file, keeps only the hashes that occur more than once, maps each of those hashes back to its filenames, and removes the numbered copies (the names containing `(`).
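The same idea can be sketched in a single `awk` pass that avoids the intermediate file. The demo directory and its contents below are invented from the question's example; like the pipeline above, it assumes the un-numbered original exists, and it additionally assumes filenames contain no newlines and GNU `xargs`:

```shell
# Demo in a scratch directory (sample files invented for illustration).
dir=$(mktemp -d); cd "$dir"
printf 'AAA' > 'parra1998.pdf'
printf 'AAA' > 'parra1998(1).pdf'   # true duplicate of parra1998.pdf
printf 'BBB' > 'parra1998(2).pdf'   # same name pattern, different content

# Group files by SHA-1 (hash = chars 1-40, filename starts at char 43),
# print only the groups that occur more than once, then delete the
# numbered "(N)" copies among them.
sha1sum -- * \
  | awk '{hash = substr($0, 1, 40); file = substr($0, 43)
          count[hash]++; files[hash] = files[hash] file "\n"}
     END {for (h in count) if (count[h] > 1) printf "%s", files[h]}' \
  | grep '(' \
  | xargs -r -d '\n' rm --

ls    # parra1998.pdf and parra1998(2).pdf survive
```

`xargs -d '\n'` keeps names with spaces intact, and `-r` skips the `rm` entirely when no duplicates are found.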

Nemo
  • The line is a bit long and convoluted, but it relies on commands which I use almost daily, so it's easier to remember and adapt. Depending on your habits, using `[:blank:]` etc. in patterns may be easier. – Nemo May 02 '18 at 07:07
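As a small illustration of the `[:blank:]` suggestion (the sample input line is invented here), stripping the leading count from `uniq -c`-style output with a POSIX character class instead of literal spaces:

```shell
# [[:blank:]] matches spaces and tabs; the sample line mimics
# `uniq -c` output ("count hash") and is invented for illustration.
printf '      2 3f786850e387550fdab836ed7e6dc881de23001b\n' \
  | sed --regexp-extended 's/^[[:blank:]]+[0-9]+[[:blank:]]//'
# → 3f786850e387550fdab836ed7e6dc881de23001b
```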