Questions tagged [duplicate]

This tag is ambiguous and not meaningful on its own. Please use a more precise tag instead.

87 questions
29 votes, 1 answer

Open source duplicate image finder for Linux?

Is there a free and open source duplicate image finder for Linux-based systems? Finding exact duplicates (based on content, not file name) is sufficient for me, but the ability to find similar images would certainly be great, too.
hpy
  • 4,517
  • 8
  • 53
  • 73
22 votes, 6 answers

How to remove duplicate files using bash

I have a folder with duplicate (by md5sum (md5 on a Mac)) files, and I want to have a cron job scheduled to remove any found. However, I'm stuck on how to do this. What I have so far: md5 -r * | sort Which outputs something like…
warren
  • 1,778
  • 3
  • 21
  • 38
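As a rough illustration of the approach the question is heading toward, the sketch below only lists the files whose checksum repeats; it deletes nothing. The function name `list_duplicates` is hypothetical, and the sketch assumes GNU `md5sum`, whose output is a 32-character hash, two separator characters, then the file name. The question uses `md5 -r` on a Mac, whose layout may differ, so the column offset would need checking there. File names containing newlines or backslashes are not handled.

```shell
# Hypothetical helper: print every regular file in directory $1 whose
# MD5 checksum matches that of a file sorted earlier in the listing.
list_duplicates() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and unmatched globs
        md5sum "$f"
    done |
        sort |                       # identical hashes become adjacent
        awk 'seen[$1]++ { print substr($0, 35) }'   # name starts at column 35
}
```

Reviewing the printed list by hand before passing it to `rm` is the safer route, especially from a cron job.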
16 votes, 12 answers

Remove all duplicate words from a string using a shell script

I have a string like "aaa,aaa,aaa,bbb,bbb,ccc,bbb,ccc". I want to remove the duplicate words from the string so that the output is "aaa,bbb,ccc". I tried this code: $ echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs It…
Urvashi
  • 333
  • 1
  • 2
  • 9
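One possible way to get the output described in the question, shown here only as a sketch: split the string on commas, keep the first occurrence of each word, and join the result again. The function name `dedupe_csv` is made up for this example. Unlike `sort -u`, the `awk` filter keeps the words in their original order.

```shell
# Hypothetical helper: remove repeated words from a comma-separated
# string, keeping the first occurrence of each word.
dedupe_csv() {
    printf '%s\n' "$1" |
        tr ',' '\n' |          # one word per line
        awk '!seen[$0]++' |    # pass a line through only the first time it appears
        paste -sd, -           # join the lines back together with commas
}
```

Calling `dedupe_csv "aaa,aaa,aaa,bbb,bbb,ccc,bbb,ccc"` prints `aaa,bbb,ccc`.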
14 votes, 6 answers

Use basename to parse a list of paths held in a file

I'm running Mac OSX and trying to use the command line to find the number of files I have with the same name. I tried to use the following command: find ~ -type f -name "*" -print | basename | sort | uniq -d > duplicate_files It doesn't work! When…
JohnB
  • 153
  • 1
  • 1
  • 6
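`basename` takes its operand as a command-line argument and does not read paths from standard input, which is probably why the pipeline in the question fails. A minimal sketch of one alternative is to strip the directory part with `awk` instead. The function name `duplicate_names` is hypothetical, and the sketch assumes no file name contains a newline.

```shell
# Hypothetical helper: print each file name that occurs more than once
# anywhere under directory $1.
duplicate_names() {
    find "$1" -type f |
        awk -F/ '{ print $NF }' |   # keep only the last path component
        sort |
        uniq -d                     # print one copy of each repeated name
}
```

Something like `duplicate_names ~ > duplicate_files` would then produce the list the question is after, and appending `| wc -l` would count the names.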
11 votes, 8 answers

Find duplicate PDF files by content

Some journals generate a different PDF for each download. APS, for example, stores the time and the IP address in the PDF. Or there is a paper version with hyperlinks and one with text references. How is it possible to find duplicate downloads of…
Jonas Stein
  • 3,898
  • 4
  • 34
  • 55
10 votes, 3 answers

Remove duplicate mp3 with different name, size, and hash

I have a massive music library (all mp3), but some of the music is almost the same, except that it is maybe one or two seconds longer, about 97% the same as another song, or at another bitrate. Is there a way to find these duplicates? As mentioned they don't have…
Hans Groeffen
  • 103
  • 1
  • 1
  • 6
9 votes, 4 answers

Search and Delete duplicate files with different names

I have a large music collection stored on my hard drive; and browsing through it, I found that I have a lot of duplicate files in some album directories. Usually the duplicates exist alongside the original in the same directory. Usually the format…
Cestarian
  • 1,991
  • 5
  • 26
  • 45
9 votes, 5 answers

Remove duplicate lines from a file that contains a timestamp

This question/answer has some good solutions for deleting identical lines in a file, but won't work in my case since the otherwise duplicate lines have a timestamp. Is it possible to tell awk to ignore the first 26 characters of a line in…
a coder
  • 3,184
  • 9
  • 42
  • 63
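On the `awk` part of the question: one way this could work, sketched under the assumption that the timestamp occupies exactly the first 26 characters of every line, is to key the usual `seen` array on the rest of the line rather than on the whole line. The function name `dedupe_ignoring_prefix` and its optional width argument are invented for this example.

```shell
# Hypothetical helper: read lines on stdin and print a line only if its
# text after the first N characters (default 26) has not been seen before.
dedupe_ignoring_prefix() {
    awk -v skip="${1:-26}" '!seen[substr($0, skip + 1)]++'
}
```

Usage would look like `dedupe_ignoring_prefix < logfile`. The first line of each group is the one kept, so the earliest timestamp survives.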
8 votes, 6 answers

How to delete all duplicate hardlinks to a file?

I've got a directory tree created by rsnapshot, which contains multiple snapshots of the same directory structure with all identical files replaced by hardlinks. I would like to delete all those hardlink duplicates and keep only a single copy of…
n.st
  • 7,918
  • 4
  • 35
  • 53
7 votes, 6 answers

How to find and delete duplicate files within the same directory?

I want to find duplicate files, within a directory, and then delete all but one, to reclaim space. How do I achieve this using a shell script? For example: pwd folder Files in it are: log.bkp log extract.bkp extract I need to compare log.bkp with…
Su_scriptingbee
  • 189
  • 2
  • 3
  • 11
7 votes, 0 answers

Space efficiency of Btrfs reflinks vs hardlinks

A webpage referenced by the Wikipedia article on Btrfs claims that (emphasis mine) [Btrfs] reflinks have the same use as hardlinks, but are more space efficient. I thought that the opposite was true—that hardlinks are more space-efficient because…
Vincent Yu
  • 496
  • 4
  • 11
7 votes, 3 answers

Check if all files from a folder are also in another folder

I basically have a directory a with lots of images. Now I want to check whether all of these images are also in directory b. The catch is that many of the images in b aren't directly in b but in subdirectories. Also, I don't want to depend on filenames, but file…
Uroc327
  • 245
  • 1
  • 4
  • 9
6 votes, 3 answers

Shell script to search files for identical text entries

I need to write a script that takes a directory with several text files (anywhere from a few up to ~1000). All files contain an identifier on a given line (always the same line). The script should identify which files have an identifier that is NOT UNIQUE, i.e. duplicated in other…
5 votes, 3 answers

Find and list duplicate directories

I have a directory that has a number of sub-directories and would like to find any duplicates. The folder structure looks something like this: └── Top_Dir └── Level_1_Dir ├── standard_cat │ ├── files.txt ├──…
dino
  • 51
  • 1
  • 3
5 votes, 2 answers

Keeping first instance of duplicates

I have a file with multiple columns and have identified lines where specific column values (cols 3-6) have been duplicated using a bash script. Example input: A B C D E F G 1 2 T TACA A 3 2 Q 3 4 I R 8 2 Q 9 3 A C 9 3 P 8 3 I R 8 2 Q I can display…
Bob
  • 349
  • 3
  • 10
  • 20