I'm trying to extract a few fields from each entry of a VCF file. Specifically, I want the first and second fields plus the number after END=. Here is one entry from the file:
1 234529926 AC=1;AF=0.00019968;AFR_AF=0.0008;AMR_AF=0;AN=5008;CIEND=0,500;CIPOS=-500,0;CS=DEL_union;EAS_AF=0;END=234549706;EUR_AF=0;MC=YL_CN_ACB_337;NS=2504;SAS_AF=0;SVTYPE=DEL 0|0
I tried the following to get the result I want:
sed 's|\([\d\s]*\)AC=.*;END=\([0-9]*\).*|\1\2|'
Results in:
1 234529926 234549706
Replacing [0-9] with \d should give the same result, but it doesn't:
sed 's|\([\d\s]*\)AC=.*;END=\(\d*\).*|\1\2|'
Gives:
1 234529926
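To make the comparison easy to reproduce, here is a minimal sketch that runs both substitutions on the sample entry. The variable names (entry, with_range, with_backslash_d) are only placeholders for this sketch, and I'm assuming GNU sed; other sed implementations may treat \d differently.

```shell
# Sample entry from above, stored in a variable (the name "entry" is just for this sketch).
entry='1 234529926 AC=1;AF=0.00019968;AFR_AF=0.0008;AMR_AF=0;AN=5008;CIEND=0,500;CIPOS=-500,0;CS=DEL_union;EAS_AF=0;END=234549706;EUR_AF=0;MC=YL_CN_ACB_337;NS=2504;SAS_AF=0;SVTYPE=DEL 0|0'

# Variant 1: [0-9] in the second capture group.
with_range=$(printf '%s\n' "$entry" | sed 's|\([\d\s]*\)AC=.*;END=\([0-9]*\).*|\1\2|')

# Variant 2: \d in the second capture group.
with_backslash_d=$(printf '%s\n' "$entry" | sed 's|\([\d\s]*\)AC=.*;END=\(\d*\).*|\1\2|')

# Square brackets around each result make any trailing whitespace visible.
printf '[%s]\n' "$with_range"
printf '[%s]\n' "$with_backslash_d"
```

Printing the results inside brackets is just a convenience for spotting whether the second variant leaves anything (such as a trailing space) where the END value should be.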
This doesn't make sense to me: the \([\d\s]*\) group at the beginning appears to work just fine, so it can't be that sed doesn't understand \d. Why does the second command behave differently?