3

I need to extract the lines between specified occurrences of the same search pattern.

For example, I want to get the lines between the 1st and 2nd occurrence, or the lines between the 3rd and 4th occurrence, of the search pattern. The number of lines between the patterns may vary; if there are no lines between two occurrences, the output should be blank.

Example:

Line 1
Line 2
Line 3
Pattern
Line 5
Line 6
Line 7
Pattern
Line 8
Line 9
Pattern
Line 11
Line 12
Pattern
Line 13
Pattern
Pattern

Expected output, lines between the 1st and 2nd occurrence

Line 5
Line 6
Line 7

Lines between 3rd and 4th occurrence

Line 11
Line 12
Rui F Ribeiro
Vasanta Koli
  • Please help with Single line command if possible – Vasanta Koli May 18 '17 at 06:45
  • https://unix.stackexchange.com/search?q=awk+lines+between+pattern – Sparhawk May 18 '17 at 06:52
  • This does not include selection of pattern in between. – Vasanta Koli May 18 '17 at 07:01
  • [Third hit](https://unix.stackexchange.com/questions/346203/grep-the-lines-between-the-occurrence-of-the-same-pattern). Although I'm not sure what your expected output is exactly. You have two code blocks as output. Do they represent separate files? – Sparhawk May 18 '17 at 07:02
  • Yes Output would be in different files and the input would be a single file. – Vasanta Koli May 18 '17 at 07:03
  • I just realised it's slightly different to the answer I linked. I've posted some modified code. – Sparhawk May 18 '17 at 07:11
  • Hi, I saw the post, but the challenge is that it creates multiple files, whereas I need the specific lines as output so that I can perform further text processing on them before printing them to any file – Vasanta Koli May 18 '17 at 07:15
  • Well, you'll have to be more specific about what "further" text processing is. How would you do that precisely? Can you put that into the awk code? Can you write to temporary files, then iterate across those? – Sparhawk May 18 '17 at 07:16

4 Answers

4

Based on this answer,

awk '/Pattern/{n+=1}; n % 2 == 1 && ! /Pattern/ {print > "output"((n-1)/2)}' input_file 

Explanation

  • /Pattern/{n+=1}: when you match Pattern, increment n by 1.
  • n % 2 == 1 && ! /Pattern/: only do the following when n is odd, i.e. after every other pattern. Also, ignore the lines with Pattern on them.
  • {print > "output"((n-1)/2)}: if the above holds, then print that line into a file named outputx, where x is (n-1)/2, i.e. output0, output1, output2
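As a rough illustration, here is the same idea run against the sample input from the question. This is only a sketch: the scratch directory and the file name input_file are assumptions, and the redirection target is wrapped in parentheses, which the awk documentation recommends because not every awk parses an unparenthesised concatenation after > the same way.

```shell
#!/bin/sh
# Scratch directory so the generated output files are easy to inspect and remove.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question.
printf '%s\n' 'Line 1' 'Line 2' 'Line 3' Pattern 'Line 5' 'Line 6' 'Line 7' Pattern \
    'Line 8' 'Line 9' Pattern 'Line 11' 'Line 12' Pattern 'Line 13' Pattern Pattern > input_file

# n is odd between the 1st/2nd, 3rd/4th, ... Pattern lines, so (n-1)/2 is 0, 1, ...
# The file name expression is parenthesised so every awk concatenates it the same way.
awk '/Pattern/ {n += 1}
     n % 2 == 1 && !/Pattern/ {print > ("output" ((n - 1) / 2))}' input_file

# Show which files were created and what they contain.
for f in output*; do
    printf '== %s ==\n' "$f"
    cat "$f"
done
```

With this input, output0 should hold Line 5 to Line 7 and output1 should hold Line 11 and Line 12. No file is created for the 5th/6th pair, because a file only comes into existence when a line is printed to it.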
Sparhawk
2

Alternative AWK approach

$ awk -v start=3 '/Pattern/{n++;next};n==start;n==start+1{exit}' input.txt
Line 11
Line 12

$ awk -v start=2 '/Pattern/{n++;next};n==start;n==start+1{exit}' input.txt                                                      
Line 8
Line 9

Explanation

The way this works is fairly straightforward:

  • Using the -v flag we define the variable start. The counter n is incremented every time a line matches the pattern, after which we skip to the next line (that's the /Pattern/{n++;next} part of the code).
  • In awk, a condition with no action prints the current line when it is true, hence n==start can be viewed the same way as n==start{print}.
  • The final code block, n==start+1{exit}, checks whether we have reached the next pattern. Say we wanted to print the lines between the 4th and 5th pattern occurrence: this means that when n==4+1 the code exits.
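Since the comments ask for output that can be processed further before it is written anywhere, the command can be wrapped in a small shell function and its output captured. The sketch below is one possible way to do that, not part of the original command: the function name lines_between, its argument order, and the tr step standing in for "further processing" are all assumptions. It also uses n > start rather than n == start+1 for the exit test, a small variation that behaves the same for the printed lines.

```shell
#!/bin/sh
# lines_between PATTERN N FILE   (hypothetical helper name)
# Prints the lines between the Nth and (N+1)th line matching PATTERN.
lines_between() {
    awk -v pat="$1" -v start="$2" '
        $0 ~ pat   { n++; next }   # count pattern lines and never print them
        n == start                 # no action given, so the line is printed
        n >  start { exit }        # past the requested block: stop reading
    ' "$3"
}

# Sample input from the question.
input=$(mktemp)
printf '%s\n' 'Line 1' 'Line 2' 'Line 3' Pattern 'Line 5' 'Line 6' 'Line 7' Pattern \
    'Line 8' 'Line 9' Pattern 'Line 11' 'Line 12' Pattern 'Line 13' Pattern Pattern > "$input"

# Capture the block in a variable, then post-process it before writing anything.
block=$(lines_between Pattern 3 "$input")
upper=$(printf '%s\n' "$block" | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"

# Adjacent Pattern lines (the 5th and 6th here) give an empty result.
empty=$(lines_between Pattern 5 "$input")
[ -z "$empty" ] && echo "nothing between the 5th and 6th occurrence"
```

The pattern is passed with -v and matched with $0 ~ pat, so it is treated as an extended regular expression rather than a fixed string.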

If we were doing code golf, we could make this even shorter by renaming the start variable to something like s, which shortens the code like so:

awk -v s=3  '/Pattern/{n++;next};n==s;n==s+1{exit}'

Assumptions:

  • GNU awk
  • we're reading between consecutive patterns, i.e. between match n and n+1

Generalizing the approach

What if we wanted to print the lines between pattern 2 and pattern 4? Using a few of the tricks from the previous example, we can do that as well, like so:

$ awk -v start=2 -v finish=4 '/Pattern/{n++;next};n==finish{exit};n>=start' input.txt                                           
Line 8
Line 9
Line 11
Line 12

Notice that here we define another variable, finish, to know where to stop. This way n==finish will stop printing the lines. Notice also that n==finish{exit} comes before n>=start, which avoids printing the line on which we're supposed to exit.
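The comments mention that each block should end up in its own file. One way to combine that with the start/finish command is a plain shell loop that picks the file name outside awk. This is a sketch under assumptions: the between_START_FINISH.txt naming scheme and the scratch directory are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
input="$workdir/input.txt"

# Sample input from the question.
printf '%s\n' 'Line 1' 'Line 2' 'Line 3' Pattern 'Line 5' 'Line 6' 'Line 7' Pattern \
    'Line 8' 'Line 9' Pattern 'Line 11' 'Line 12' Pattern 'Line 13' Pattern Pattern > "$input"

# For each requested start, write the lines up to the next occurrence
# into a file named after the pair, e.g. between_3_4.txt.
for start in 1 3 5; do
    finish=$((start + 1))
    awk -v start="$start" -v finish="$finish" \
        '/Pattern/{n++;next};n==finish{exit};n>=start' "$input" \
        > "$workdir/between_${start}_${finish}.txt"
done

# Line counts per file; an empty block still produces an (empty) file.
wc -l "$workdir"/between_*.txt
```

Because the shell creates the file on redirection, the 5th/6th pair yields an empty file, which matches the requirement that the output be blank when nothing lies between two occurrences.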

Sergiy Kolodyazhnyy
0

With sed:

sed -n '/Pattern/!d;:a
n;//! {w file1.txt
ba
};:b
n;//! bb
:c
n;//q;w file2.txt
bc
' file

With POSIX sed you have to write three loops like this, one for each of the two matches and one for the lines in between, as you can't generate filenames from within the script.
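To make the three loops easier to follow, here is an annotated restatement of the same script with one command per line, run against the sample input from the question. It is meant as a reading aid rather than a different solution. The scratch directory and the file name input.txt are assumptions, and GNU sed is assumed as the implementation.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question.
printf '%s\n' 'Line 1' 'Line 2' 'Line 3' Pattern 'Line 5' 'Line 6' 'Line 7' Pattern \
    'Line 8' 'Line 9' Pattern 'Line 11' 'Line 12' Pattern 'Line 13' Pattern Pattern > input.txt

sed -n '
# Discard everything before the first Pattern line.
/Pattern/!d
# Loop a: copy lines to file1.txt until the 2nd Pattern line shows up.
:a
n
//!{
w file1.txt
ba
}
# Loop b: skip the lines between the 2nd and 3rd Pattern line.
:b
n
//!bb
# Loop c: copy lines to file2.txt and quit at the 4th Pattern line.
:c
n
//q
w file2.txt
bc
' input.txt

cat file1.txt
cat file2.txt
```

The empty regular expression // reuses the last regular expression that was applied, which is /Pattern/ throughout. Each additional range would need its own pair of loops and its own hard-coded file name.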

Philippos
0
start=3; # these can only be positive integers
 stop=4; # stop > start

perl -lne "// or print if /Pattern/ && ++\$a == $start ... // && ++\$a == $stop" data.in

The Perl solution uses the range operator ..., whose two operands act like a flip-flop: as long as the first operand is false, ... returns false. As soon as the first operand becomes true, ... returns true, and it goes false again only after the second operand becomes true. The subtlety is that operand 1 is not evaluated once the range has become true, and operand 2 is not evaluated while the range is still false.

sed -nE "
   /Pattern/!d
   x
      s/\$/./
      /^[.]{$start}\$/!{x;d;}
   x

   :loop
      # read the next line; Pattern lines are counted, never printed
      n
      /Pattern/{
         x
            s/\$/./
            /^.{$stop}\$/q
         x
         bloop
      }
      p
   bloop
" data.in

The sed solution uses the hold space to keep a count of the number of times the pattern has been seen. We keep rejecting lines as long as fewer than $start patterns have been seen. As soon as the $start-th pattern arrives, we go into a loop which keeps reading the next line and printing it, all the while checking whether the $stop-th pattern has been seen. Once it has, we quit immediately.