During my workflow I have created this file:

AAGGAGGGAGCTGCATGGAACCTGTGGATATATACACACAAGGTTAACCTCTGTCCTGTAAA  8  
GGAGTTCAGATGTGTGCTCTTCCGATCTGGAGGTCTCTGCTGGGGCCACCCTGTCCTCTCAG  30     
GAGAGAGGAAAGGAAGCGATTGCAGAACTTTCCACAAGGCTTTAGATTCCCCTGTCACAGAG  15  
GGAGGAGAAAGAATCAACTTTATAGCATCAGCCCCTTGTTTATTTTAAGTTCAGGGTTTAAG  13  
GGGAGAACATTTCCCTCCTTGTCCTCTCCTATCTCACTTACTACATTCCCACTGGTCACTGT  7  
GGGACATTTGTGATTACATGGTTGCAGTATTCTTTTTGTTCTTAGTCAGACTGTATAATTGG  4  

From each string in the first column, I would like to extract the first N characters, where N is the number in the second column: the first 8 characters of the first row, the first 30 characters of the second row, and so on.

For the first two rows, for example, the output would be:

AAGGAGGG  
GGAGTTCAGATGTGTGCTCTTCCGATCTGG

Any idea would be really appreciated.

don_crissti
fusion.slope

2 Answers

With awk:

awk '{ $0 = substr($1, 1, $2) } 1' file.txt
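
A note on the start index, as a minimal sketch rather than part of the original answer: awk's `substr(s, m, n)` numbers characters from 1, so a start of 0 falls outside the string and implementations differ in how they treat it (some then return one character fewer than requested). Starting at 1 avoids the question. The helper name `first_n` and the two-row sample are made up for illustration.

```shell
# first_n: hypothetical helper; reads "SEQUENCE COUNT" rows on stdin
# and prints the first COUNT characters of each SEQUENCE.
first_n() {
    # substr() counts from 1, so start at 1 to get exactly $2 characters.
    awk '{ print substr($1, 1, $2) }'
}

# Made-up sample rows in the same layout as the question's file.
printf '%s\n' 'AAGGAGGGAGCT  8  ' 'GGAGTTCAGATG  3' | first_n
```

With the sample rows above this prints `AAGGAGGG` and `GGA`.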

With GNU sed:

sed -r 's/.* ([0-9]+).*/s!^(.{\1}).*!\\1!/' file.txt | \
    cat -n | \
    sed -r -f - file.txt

(This needs GNU sed, because it can read a script file from stdin with `-f -`.)
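
The sed pipeline is easier to follow once the generated script is visible. The sketch below, which assumes GNU sed and uses a made-up two-row sample, prints the intermediate script before applying it. The function name `gen_script` is hypothetical.

```shell
# Made-up sample rows in the same layout as the question's file.
sample=$(mktemp)
printf '%s\n' 'AAGGAGGGAGCT  8  ' 'GGAGTTCAGATG  3' > "$sample"

# gen_script: hypothetical helper wrapping the first two pipeline stages.
# Each row becomes one substitution command, and cat -n prefixes it with
# its line number, which sed then reads as a line address.
gen_script() {
    sed -r 's/.* ([0-9]+).*/s!^(.{\1}).*!\\1!/' "$1" | cat -n
}

# Show the generated script: one addressed s command per input row.
gen_script "$sample"

# Apply the generated script to the same file (GNU sed: -f - reads stdin).
gen_script "$sample" | sed -r -f - "$sample"

rm -f "$sample"
```

For the first sample row the generated command is `s!^(.{8}).*!\1!` at address 1, so only line 1 is cut to its first 8 characters. Each later row gets its own command and its own count.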

With perl:

perl -lpe 's/.*?([ACTG]+)\s+(\d+).*/ substr($1, 0, $2)/e' file.txt

Another way with perl:

perl -lape '$_ = substr($F[0], 0, $F[1])' file.txt
Satō Katsura

Without sed:

while read -r d n;do echo ${d:0:$n};done < file.txt 
Ipor Sircer
    Downvoted for [using `echo`](http://unix.stackexchange.com/q/65803/135943) of [unquoted variables](http://unix.stackexchange.com/q/131766/135943) in a [shell loop to process text](http://unix.stackexchange.com/q/169716/135943). – Wildcard Nov 05 '16 at 02:36
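
One way to address the first two points of that comment, as a hedged sketch and not the answerer's code: quote the expansion and use `printf` instead of `echo`. This variant also replaces the `${d:0:$n}` expansion with a `printf` precision so that it runs in a plain POSIX shell. The function name `cut_rows` and the sample rows are made up, and the precision may count bytes rather than characters in some shells, which makes no difference for ASCII sequences like these.

```shell
# cut_rows: hypothetical helper; reads "SEQUENCE COUNT" rows on stdin.
cut_rows() {
    while read -r d n; do
        # Skip rows whose count is empty or not a plain non-negative integer.
        case $n in
            ''|*[!0-9]*) continue ;;
        esac
        # The precision .N makes %s print at most N characters of "$d".
        printf "%.${n}s\n" "$d"
    done
}

# Made-up sample rows in the same layout as the question's file.
printf '%s\n' 'AAGGAGGGAGCT  8  ' 'GGAGTTCAGATG  3' | cut_rows
```

This is still a shell loop over text, so the third objection stands. For large files the awk or perl one-liners in the other answer are the better fit.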