Delete duplicate entries in a text file

Question

I created a txt file using two requests, one LDAP and one SQL. Results of the two requests are stored in the same txt file.

The txt file looks like this:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

Because a user can be in both databases, I need to delete duplicate entries using bash.
How can I do it?

score 5 · Accepted Answer · answered Jun 11 '15 at 08:26

If you don't mind your file ending up sorted, sort it and filter it; either

sort -u file

if your sort supports it, or

sort file | uniq

if not, and you'll get on standard output the sorted list of unique email addresses.

If you want to keep the addresses in the original order, use awk:

awk '!(count[$0]++)' file
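
To illustrate why the awk one-liner keeps the first occurrence of each line, here is a minimal sketch. The sample addresses and the temporary directory are assumptions made up for the demonstration; they are not from the question.

```shell
# Hypothetical sample data; these addresses are placeholders.
tmpdir=$(mktemp -d)
printf '%s\n' carol@example.com alice@example.com carol@example.com \
    bob@example.com alice@example.com > "$tmpdir/file"

# count[$0]++ yields how many times this exact line was seen before,
# then increments the counter. The negation is true only for the first
# sighting, and a true pattern with no action prints the line.
deduped=$(awk '!(count[$0]++)' "$tmpdir/file")
printf '%s\n' "$deduped"

# For comparison: sort -u drops the same duplicates but reorders the lines.
# LC_ALL=C is used here only to make the ordering predictable.
sorted=$(LC_ALL=C sort -u "$tmpdir/file")
printf '%s\n' "$sorted"

rm -rf "$tmpdir"
```

The awk version holds every distinct line in memory, which is unlikely to matter for a list of email addresses but could for very large inputs.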

Stephen Kitt

`sort -u` doesn't report unique lines; it reports the first of each group of lines that sort the same in the current locale. – cuonglm Jun 11 '15 at 08:43
@cuonglm Indeed, but is there a case where two different email addresses would have the same collation? – Stephen Kitt Jun 11 '15 at 08:51
@StephenKitt: `①@example.com` and `②@example.com` in `en_US.utf8` locale. – cuonglm Jun 11 '15 at 09:18
@cuonglm: `LC_ALL=en_US.UTF-8; (echo ①@example.com; echo ②@example.com) | sort | uniq` also merges both lines, so only the `awk` solution is viable in that case. – Stephen Kitt Jun 11 '15 at 18:23
@StephenKitt: It seems that you are using GNU uniq, it's not POSIX compliant in this case, you must use `uniq -i`. – cuonglm Jun 12 '15 at 01:09
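
One way to sidestep the collation issue discussed in the comments above, if a sorted result is acceptable, is to run sort in the C locale, where lines are compared byte by byte. The following is a sketch under that assumption; the input reuses the two addresses from the comments, with one repeated to show that real duplicates are still removed.

```shell
# Two distinct lines that, per the comments above, may collate as equal
# in some UTF-8 locales, plus one genuine duplicate.
input=$(printf '%s\n' '①@example.com' '②@example.com' '①@example.com')

# In the C locale, sort compares raw bytes, so only byte-identical
# lines are treated as duplicates.
unique=$(printf '%s\n' "$input" | LC_ALL=C sort -u)
printf '%s\n' "$unique"
```

The resulting order is byte order rather than the order a user of the locale might expect, which is the trade-off of this approach.
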
Is it possible to delete the entries directly in the file and save it in place, without creating a new temporary file, using `awk`? – CodyChan Jan 08 '16 at 03:18
@CodyChan not directly, `awk` can't manipulate files in that way. You can combine `awk` with `sponge` (from [`moreutils`](http://joeyh.name/code/moreutils/)) to re-write the input file. – Stephen Kitt Jan 08 '16 at 08:23
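
A sketch of the `sponge` suggestion follows. Redirecting awk's output straight back to its input with `> file` would not work, because the shell truncates the file before awk reads it. The file name, sample addresses, and the fallback branch for systems without `moreutils` are assumptions added for illustration.

```shell
# Hypothetical sample file; the addresses are placeholders.
workdir=$(mktemp -d)
file="$workdir/emails.txt"
printf '%s\n' bob@example.com alice@example.com bob@example.com > "$file"

if command -v sponge >/dev/null 2>&1; then
    # sponge reads all of its input before opening the output file,
    # so reading from and writing to the same file is safe here.
    awk '!(count[$0]++)' "$file" | sponge "$file"
else
    # Fallback when moreutils is not installed: write to a temporary file
    # in the same directory, then replace the original only if awk succeeded.
    tmp=$(mktemp "$workdir/emails.XXXXXX")
    awk '!(count[$0]++)' "$file" > "$tmp" && mv "$tmp" "$file"
fi

result=$(cat "$file")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Note that the fallback replaces the file with a new one, so the original's permissions and ownership may not carry over.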
