awk, cut characters out of substring

Question

I have the following whitespace-separated file. I'm trying to remove the "20" from 2017 so the dates are formatted like 3717 or 31817. The positions vary because some dates have single-digit days. Since the year is always 4 digits, how can I remove the "20" from the second column, counting from right to left?

12  322017   EODTRANSACTION J    87.75   
12  3232017  EODTRANSACTION J    155  
45  3302017  EODTRANSACTION J    270

Expected Output

12  3217    EODTRANSACTION J    87.75   
12  32317   EODTRANSACTION J    155  
45  33017   EODTRANSACTION J    270

Does the 2nd column always end with 2017? Is no other year possible in the data you have? — Sundeep, Apr 11 '17 at 16:08
Try awk along with http://unix.stackexchange.com/questions/163481/a-command-to-print-only-last-3-characters-of-a-string — Spike, Apr 11 '17 at 16:32
Why do people insist on using stupid date formats? What is wrong with `20170412`? — Michael Vehrs, Apr 12 '17 at 07:35

score 1 · Accepted Answer · answered Apr 11 '17 at 17:25

awk approach:

awk '{match($2, /^([0-9]+)[0-9]{2}([0-9]{2})$/, a); $2=a[1]a[2]}1' file

The output:

12 3217 EODTRANSACTION J 87.75
12 32317 EODTRANSACTION J 155
45 33017 EODTRANSACTION J 270

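match($2, /^([0-9]+)[0-9]{2}([0-9]{2})$/, a) captures every digit of the second field except the 3rd- and 4th-from-last (the "20" of the year) into a[1] and a[2]; $2 is then rebuilt from those two groups. Note that the three-argument form of match() is a GNU awk (gawk) extension, and that assigning to $2 makes awk rebuild the line with single spaces, which is why the columns are no longer padded as in the expected output.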
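If GNU awk is not available, here is a minimal sketch of the same right-to-left idea in POSIX awk, assuming the year is always the last four digits of the second field. It uses substr() and length() instead of the three-argument match(); the function name drop_century is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# drop_century: hypothetical helper name, for illustration only.
# Reads whitespace-separated lines on stdin and removes the 3rd- and
# 4th-from-last characters of field 2 (the "20" of a 4-digit year).
# Uses only POSIX awk features (length, substr, ~).
drop_century() {
    awk '{
        n = length($2)
        # Only rewrite an all-digit field longer than the 4-digit year;
        # any other line is printed unchanged.
        if (n > 4 && $2 ~ /^[0-9]+$/)
            $2 = substr($2, 1, n - 4) substr($2, n - 1)
        print
    }'
}

printf '%s\n' \
    '12  322017   EODTRANSACTION J    87.75' \
    '12  3232017  EODTRANSACTION J    155' \
    '45  3302017  EODTRANSACTION J    270' | drop_century
```

Because $2 is reassigned, awk rebuilds each modified line with single spaces, so the output matches the gawk output above (12 3217 EODTRANSACTION J 87.75, and so on).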

score 0 · Answer 2 · answered Apr 11 '17 at 17:04


Replace the second field with the result of substituting the empty string for the first "20" in that field, then print the resulting line (gensub() requires GNU awk):

awk '{$2=gensub("20", "", 1, $2); print;}' input > output

Jeff Schaller
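Since gensub() is a GNU awk extension, here is a minimal sketch of the same first-occurrence substitution in POSIX awk. It uses sub(), which replaces only the first match and modifies its target in place. Like the gensub() version, it removes the first "20" in the field, not specifically the year's. With these non-zero-padded M/D prefixes, the only "20" that can precede the year is a day of 20, and removing that one gives the same string (for example, a hypothetical March 20 entry 3202017 becomes 32017 either way). The function name strip_first_20 is hypothetical.

```shell
#!/bin/sh
# strip_first_20: hypothetical helper name, for illustration only.
strip_first_20() {
    # sub() is POSIX awk: it replaces the first match of /20/ in $2 in place.
    # Modifying $2 rebuilds $0, so runs of spaces collapse to single spaces (OFS).
    awk '{ sub(/20/, "", $2); print }'
}

printf '%s\n' \
    '12  322017   EODTRANSACTION J    87.75' \
    '12  3232017  EODTRANSACTION J    155' \
    '45  3302017  EODTRANSACTION J    270' | strip_first_20
```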