
I have the following CSV. I'm trying to remove the "20" from 2017 so that the dates are formatted like 3717 or 31817. The position of the "20" varies because some dates have single-digit days. Since the year is always 4 digits, how can I remove the "20" from the second column, counting from right to left?

12  322017   EODTRANSACTION J    87.75   
12  3232017  EODTRANSACTION J    155  
45  3302017  EODTRANSACTION J    270 

Expected output:

12  3217    EODTRANSACTION J    87.75   
12  32317   EODTRANSACTION J    155  
45  33017   EODTRANSACTION J    270
Jeff Schaller
sortinousn

2 Answers

awk approach:

awk '{match($2, /^([0-9]+)[0-9]{2}([0-9]{2})$/, a); $2=a[1]a[2]}1' file

The output:

12 3217 EODTRANSACTION J 87.75
12 32317 EODTRANSACTION J 155
45 33017 EODTRANSACTION J 270

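match($2, /^([0-9]+)[0-9]{2}([0-9]{2})$/, a) captures every digit of the second field except the 3rd and 4th digits from the end; $2 is then rebuilt from the two captured groups, a[1] and a[2]. Note that the three-argument form of match() is a GNU awk (gawk) extension.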

RomanPerekhrest
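
The question's idea of counting from the right can also be written without gawk extensions. A minimal sketch, assuming the date is always the second whitespace-separated field and ends in a four-digit year; strip_century is a hypothetical helper name. It keeps everything except the 3rd and 4th characters from the end of $2, using only length() and substr(), which any POSIX awk provides:

```shell
# strip_century (hypothetical name): for each whitespace-separated line on stdin,
# drop the 3rd and 4th characters from the end of field 2, i.e. the "20" of a
# four-digit year. Only POSIX awk functions (length, substr) are used.
strip_century() {
  awk '{
    n = length($2)
    # shortest possible date is M + D + YYYY = 6 digits; shorter fields are left alone
    if (n >= 6)
      $2 = substr($2, 1, n - 4) substr($2, n - 1)   # assigning $2 rejoins fields with OFS (one space)
    print
  }'
}

printf '%s\n' \
  '12  322017   EODTRANSACTION J    87.75' \
  '12  3232017  EODTRANSACTION J    155' \
  '45  3302017  EODTRANSACTION J    270' | strip_century
```

On the sample data this prints the same three lines as the match() version above, again with the fields rejoined by single spaces.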

Replace the second field with the result of substituting the first "20" in it with the empty string, then print the resulting line (gensub() is also specific to GNU awk):

awk '{$2=gensub("20", "", 1, $2); print;}' input > output
Jeff Schaller
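
Both awk commands rebuild the whole line with single spaces once $2 is assigned, which is why their output loses the original column spacing. If that spacing matters, a sed substitution can delete just the two digits and copy everything else through. A sketch, assuming the first field contains no spaces and starts at the beginning of the line, and that the date is the run of digits right after it; strip_century_sed is a hypothetical name, and -E (extended regular expressions) is accepted by both GNU sed and BSD/macOS sed:

```shell
# strip_century_sed (hypothetical name): delete the "20" that sits just before the
# last two digits of the second field. Everything else on the line, including
# the original runs of spaces, is copied through unchanged.
strip_century_sed() {
  # \1 = first field, its separator, and the month/day digits
  # \2 = the two-digit year; \3 = the whitespace (or end of line) after the field
  sed -E 's/^([^[:space:]]+[[:space:]]+[0-9]+)20([0-9]{2})([[:space:]]|$)/\1\2\3/'
}

printf '%s\n' \
  '12  322017   EODTRANSACTION J    87.75' \
  '12  3232017  EODTRANSACTION J    155' \
  '45  3302017  EODTRANSACTION J    270' | strip_century_sed
```

Because [0-9]+ is greedy, the match settles on the last "20" that is followed by exactly two digits and the end of the field, so a day of 20 (as in 3202017) is left alone and the result is 32017.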