0

I have an external command, say check_this, which prints YES or NO for each line of a file piped to it:

cat myfile | check_this

YES
NO
YES
YES
...

Now I want to get all the lines of myfile for which the result is YES. Is there a way to do this? Currently I save the output of check_this to a temporary file, then combine it with myfile using paste + grep, which is cumbersome and not robust.

user40129

4 Answers

2

I'd use awk:

<myfile check_this | awk '
  !check_processed {if ($1 == "YES") yes[FNR]; next}
  FNR in yes' - check_processed=1 myfile

awk first records, in the yes hash table, the line numbers of check_this's output whose first field is YES. It then prints the lines of myfile whose numbers are in that table.
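To make the two passes concrete, here is a small self-contained sketch. The check_this function below is only a hypothetical stand-in (it answers YES for lines containing the letter "a"), and the sample file contents are made up for illustration; the awk program itself is the one shown above.

```shell
#!/bin/sh
# Hypothetical stand-in for the real check_this: YES if the line contains "a".
check_this() {
  awk '{ print (/a/ ? "YES" : "NO") }'
}

# Made-up sample input.
workdir=$(mktemp -d)
myfile=$workdir/myfile
printf '%s\n' apple kiwi banana plum > "$myfile"

# Pass 1 reads stdin ("-"): remember the line numbers answered with YES.
# check_processed=1 is assigned only after stdin is exhausted.
# Pass 2 reads myfile: print the lines whose numbers were remembered.
selected=$(
  check_this < "$myfile" | awk '
    !check_processed { if ($1 == "YES") yes[FNR]; next }
    FNR in yes' - check_processed=1 "$myfile"
)

printf '%s\n' "$selected"
rm -rf "$workdir"
```

With this stand-in, only apple and banana are printed, since kiwi and plum get a NO.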

Stéphane Chazelas
0

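A variant of @StéphaneChazelas' perfectly good awk-based solution, less compact but perhaps easier to read because it does not rely on a variable assigned on the command line (check_processed in his notation), would be the following. Note that <(...) is process substitution, which is available in bash, ksh and zsh but not in plain POSIX sh: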

$ awk 'FNR == NR {if ($1 == "YES") yes[FNR];next} 
       FNR != NR && FNR in yes'   <(check_this <myfile) myfile

Note: @RakeshSharma remarks that the simultaneous use of next (1st line) and of the test FNR != NR (2nd line) is a redundancy. Users of that pattern can remove one or the other with no change in output, as in:

$ awk 'FNR == NR {if ($1 == "YES") yes[FNR];next} 
       FNR in yes'   <(check_this <myfile) myfile
Cbhihe
  • The `FNR != NR` is redundant and can be removed. Or, remove the `next` from the previous line. – Rakesh Sharma Aug 08 '20 at 15:20
  • @RakeshSharma: You are 100% right. You could even see the simultaneous use of `next` and the test `FNR != NR` as an anti-pattern here. It was just meant as a quick illustration of `awk`'s versatility, geared toward people not fully conversant with the idiom, or not comfortable with using external variables declared for the subshell on same cmd line (see StéphaneChazelas' answer)... I will nevertheless edit the answer with a small comment mentioning you for good measure. Good catch. – Cbhihe Aug 08 '20 at 15:46
  • Note that the `FNR == NR` approaches in general don't work properly when the first file is empty which is why I prefer the `!flag`/`flag=1` approach. In this case though, it wouldn't be a problem as if `myfile` is empty, the output of `check_this` would also be empty. See [Bypass a nawk snippet if the input file is empty](//unix.stackexchange.com/q/237105) – Stéphane Chazelas Aug 09 '20 at 08:55
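The empty-first-file caveat from the last comment can be reproduced with a short sketch. The file names and contents here are made up, and the awk programs are reduced to the bare idiom (skip the first file, print the second) rather than the full solutions above.

```shell
#!/bin/sh
workdir=$(mktemp -d)
: > "$workdir/empty"                    # first file: no lines at all
printf '%s\n' one two > "$workdir/data" # second file: two lines

# FNR == NR idiom: the empty file contributes no records, so NR never gets
# ahead of FNR and every line of the second file is taken for a first-file line.
nr_idiom=$(awk 'FNR == NR { next } { print }' "$workdir/empty" "$workdir/data")

# Flag idiom: second=1 is assigned between the two files regardless of their
# sizes, so the second file is still recognized as such.
flag_idiom=$(awk '!second { next } { print }' "$workdir/empty" second=1 "$workdir/data")

printf 'FNR == NR idiom printed: [%s]\n' "$nr_idiom"
printf 'flag idiom printed:      [%s]\n' "$flag_idiom"
rm -rf "$workdir"
```

The first command prints nothing, while the second prints both lines of the second file.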
0

We can make use of the GNU version of the dc utility to basically implement a grep -f functionality.

dc -e "
$(< myfile check_this | sed -e 's/NO/0/;s/YES/1/' | tac)
[q]sq [p]sp [?z0=qr1=psxz0<?]s?
l?x
" < <(< myfile sed -e 's/.*/[&]/')
  • As a first step, the output of check_this is converted to booleans (YES => 1, NO => 0) and pushed onto the stack. Then the next line of the input file is read and pushed onto the stack, and it is printed if the second stack element is 1.

  • Then we clear the top two stack elements and repeat until EOF.

Rakesh Sharma
0

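GNU awk (gawk) + paste:

$ < myfile check_this \
   | paste myfile - \
   | gawk '/YES$/ && NF--'

Perl:

$ < myfile check_this \
    | perl -lne '
      # While myfile is still pending in @ARGV, we are reading stdin,
      # i.e. the check_this output: remember the YES line numbers.
      @ARGV && do {
        /YES/ && $h{$.}++;
        eof && close(ARGV);   # reset $. before myfile is read
        next;
      };
      # Now reading myfile: print the lines whose numbers were remembered.
      print if $h{$.};
  ' - myfile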

GNU sed with extended regex mode ON:

$ < myfile check_this |
    sed -nE '
        1{:a;H;n;/^(YES|NO)$/ba;}
        G;/\n\nYES/P
        s/.*\n\n(YES|NO)/\n/;h
    ' - myfile

Store the check_this output in the hold space, and for every line of myfile determine whether the leading value in the hold space is YES; if so, print the myfile line. Then clip the leading two elements from the pattern space and store (NOT "restore", mind you) the pattern space back into the hold space.
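One caveat about the gawk+paste pipeline at the top of this answer: assigning to NF makes awk rebuild the record using OFS, so runs of spaces or tabs inside a line come out as single spaces. A possible whitespace-preserving variant, sketched here with a hypothetical check_this stand-in and made-up sample lines, strips the pasted tab-plus-YES suffix with sub() instead:

```shell
#!/bin/sh
# Hypothetical stand-in for the real check_this: YES if the line contains "a".
check_this() {
  awk '{ print (/a/ ? "YES" : "NO") }'
}

# Made-up sample input; the first two lines contain two consecutive spaces.
workdir=$(mktemp -d)
myfile=$workdir/myfile
printf '%s\n' 'a  b' 'c  d' 'x a' > "$myfile"

# paste appends "<TAB>YES" or "<TAB>NO" to every line of myfile.
# sub() returns 1 only when it removed a trailing "<TAB>YES", so only those
# lines are printed, with the rest of the line left untouched.
kept=$(check_this < "$myfile" | paste "$myfile" - | awk 'sub(/\tYES$/, "")')

printf '%s\n' "$kept"
rm -rf "$workdir"
```

Here the double space in the first line survives, which would not be the case with the NF-- approach.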

Rakesh Sharma