I created a text file from the results of two queries, one LDAP and one SQL. The results of both queries are stored in the same file.

The file looks like this:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

Because a user can appear in both databases, I need to remove the duplicate entries using bash.
How can I do that?

taliezin
nicolasfo

1 Answer

If you don't mind your file ending up sorted, sort it and filter it; either

sort -u file

if your sort supports it, or

sort file | uniq

if not. Either command writes the sorted list of unique email addresses to standard output.
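As a rough illustration of the sorting approach, here is a minimal sketch. The file name `emails.txt` and the `example.com` addresses are placeholders I made up, not the data from the question. Setting `LC_ALL=C` is my own addition: it makes `sort` compare raw bytes, so only byte-identical lines are merged, which sidesteps the locale issue raised in the comments.

```shell
# Hypothetical sample data; the addresses are placeholders.
workdir=$(mktemp -d)
printf '%s\n' \
    carol@example.com \
    alice@example.com \
    bob@example.com \
    alice@example.com \
    carol@example.com > "$workdir/emails.txt"

# LC_ALL=C forces byte-wise comparison, so only identical lines collapse.
sorted_unique=$(LC_ALL=C sort -u "$workdir/emails.txt")
printf '%s\n' "$sorted_unique"
# prints:
# alice@example.com
# bob@example.com
# carol@example.com
```

The original file is left untouched; redirect the output to a different file if you want to keep the result.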

If you want to keep the addresses in the original order, use awk:

awk '!(count[$0]++)' file
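
To see why the `awk` one-liner works: `count[$0]++` evaluates to the number of times the current line has been seen before (0 for the first occurrence) and then increments it, so `!(count[$0]++)` is true only the first time a line appears, and `awk` prints a line whenever the pattern is true. A small sketch, again with made-up placeholder addresses and a hypothetical file name:

```shell
# Hypothetical sample data; the addresses are placeholders.
workdir=$(mktemp -d)
printf '%s\n' \
    carol@example.com \
    alice@example.com \
    bob@example.com \
    alice@example.com \
    carol@example.com > "$workdir/emails.txt"

# Print each line only on its first occurrence, keeping the input order.
deduped=$(awk '!(count[$0]++)' "$workdir/emails.txt")
printf '%s\n' "$deduped"
# prints:
# carol@example.com
# alice@example.com
# bob@example.com
```

Because `awk` compares array keys as exact strings, this does not depend on the locale's collation rules.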
Stephen Kitt
  • `sort -u` doesn't report the unique lines, but the first of each set of lines that sort the same in the current locale. – cuonglm Jun 11 '15 at 08:43
  • @cuonglm Indeed, but is there a case where two different email addresses would have the same collation? – Stephen Kitt Jun 11 '15 at 08:51
  • @StephenKitt: `①@example.com` and `②@example.com` in `en_US.utf8` locale. – cuonglm Jun 11 '15 at 09:18
  • @cuonglm: `LC_ALL=en_US.UTF-8; (echo ①@example.com; echo ②@example.com) | sort | uniq` also merges both lines, so only the `awk` solution is viable in that case. – Stephen Kitt Jun 11 '15 at 18:23
  • @StephenKitt: It seems that you are using GNU uniq, it's not POSIX compliant in this case, you must use `uniq -i`. – cuonglm Jun 12 '15 at 01:09
  • Is it possible to delete the entries directly in the file and save the result back to it, without creating a new temporary file, using `awk`? – CodyChan Jan 08 '16 at 03:18
  • @CodyChan Not directly; `awk` can't manipulate files in that way. You can combine `awk` with `sponge` (from [`moreutils`](http://joeyh.name/code/moreutils/)) to rewrite the input file. – Stephen Kitt Jan 08 '16 at 08:23
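
The `sponge` suggestion in the last comment could look roughly like the sketch below. `sponge` reads all of its input before opening the output file, which is what makes it safe to write back to the file `awk` is still reading. The fallback branch is my own assumption for systems without `moreutils`; it does use a temporary file, so it is not quite what the comment asked for. The file name and addresses are placeholders.

```shell
# Hypothetical sample data; the addresses are placeholders.
workdir=$(mktemp -d)
file="$workdir/emails.txt"
printf '%s\n' \
    carol@example.com \
    alice@example.com \
    bob@example.com \
    alice@example.com \
    carol@example.com > "$file"

if command -v sponge >/dev/null 2>&1; then
    # sponge soaks up all input first, then overwrites the file.
    awk '!(count[$0]++)' "$file" | sponge "$file"
else
    # Fallback (assumption): write to a temporary file, then copy the
    # contents back so the original file keeps its permissions.
    tmp=$(mktemp) && awk '!(count[$0]++)' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
fi

cat "$file"
```

Do not use a plain `awk ... file > file`: the shell truncates the file before `awk` gets to read it.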