I have a file with the content below:

902461360       81636718        32863608        0       employee    permenant
902492248       81415224        32775337        0       employee    temporary
902495059       81686374        32881482        0       employee    permenant
902495059       81686374        32881482        0       employee    vendor
902504989       81675052        32877123        0       employee    vendor
902532086       81691300        32884527        0       employee    vendor
902723910       81690082        32882735        0       employee    permenant
902723910       81690082        32882735        0       employee    vendor

The first three values may repeat in other lines. I want to keep one instance of each combination and remove the other duplicates.

The output should look like this:

902461360       81636718        32863608        0       employee    permenant
902492248       81415224        32775337        0       employee    temporary
902495059       81686374        32881482        0       employee    permenant
902504989       81675052        32877123        0       employee    vendor
902532086       81691300        32884527        0       employee    vendor
902723910       81690082        32882735        0       employee    permenant

Archemar
sravani
  • 5
    Please replace the images of the data with the actual data (as text), so that people are able to test their solutions. [Don't post images of text](https://unix.meta.stackexchange.com/questions/4086) – Kusalananda Sep 02 '20 at 11:54
  • Welcome to the site. How do you define "might be repeating"? Do you want to remove a line if _the exact combination of value1, value2 and value3_ has already occurred, or if any of value1 _or_ value2 _or_ value3 has already occurred in a previous line? – AdminBee Sep 09 '20 at 10:18

1 Answer

I would try

awk '!a[$1 $2 $3]++ { print ;}' file

where

  • !a[$1 $2 $3]++ evaluates to true the first time those values are found, so only the first line with a given combination is printed.

See How does awk '!a[$0]++' work? for more details.
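As a rough illustration of the idiom, here is a small sketch run on three lines taken from the question's data. The variable names `input` and `out` are only for this example, and the fields are separated by single spaces for brevity (awk's default field splitting treats runs of spaces or tabs the same way).

```shell
# Sample input: the first three fields of lines 1 and 2 are identical.
input='902495059 81686374 32881482 0 employee permenant
902495059 81686374 32881482 0 employee vendor
902504989 81675052 32877123 0 employee vendor'

# a[key]++ yields the old count (0 on first sight) and then increments it,
# so !a[key]++ is true only the first time a key appears.
# A pattern without an action prints the matching line by default.
out=$(printf '%s\n' "$input" | awk '!a[$1 $2 $3]++')

printf '%s\n' "$out"
```

This should keep the first and third lines and drop the second, whose first three fields have already been seen.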

Archemar
  • 2
    In the general case, it would be safer to use `a[$1,$2,$3]` (with commas) as that inserts the value of `SUBSEP` between the values that make up the key instead of just concatenating them. A set of `1`, `23`, `4` would otherwise be indistinguishable from the set `12`, `3`, `4`. Also, `{ print; }` is not actually needed. – Kusalananda Sep 02 '20 at 12:27
  • That absolutely solved my issue. Thanks a ton. – sravani Sep 02 '20 at 13:05
  • 1
    @sravani Consider [accepting](https://unix.stackexchange.com/help/someone-answers) a post that solves your problem. – Quasímodo Sep 02 '20 at 13:10
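
The `SUBSEP` point raised in the comments can be checked with a small sketch. The two input lines below are made up for illustration (they are not from the question's data); their first three fields differ but concatenate to the same string, `1234`.

```shell
# Two lines whose first three fields differ but concatenate to "1234".
input='1 23 4 first
12 3 4 second'

# Plain concatenation: both keys become "1234", so the second line is dropped.
concat=$(printf '%s\n' "$input" | awk '!a[$1 $2 $3]++')

# Comma-separated subscripts: awk joins them with SUBSEP, so the keys differ
# and both lines are kept.
subsep=$(printf '%s\n' "$input" | awk '!a[$1,$2,$3]++')

printf 'concatenated key:\n%s\n' "$concat"
printf 'SUBSEP key:\n%s\n' "$subsep"
```

With the question's data, where the three IDs have fixed widths, both forms should behave the same; the comma form only matters when field lengths can vary.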