I am writing a bash script (just learning bash) to extract some lines from a file based on two patterns. The first pattern is just a sentence ending in a colon. The second pattern is a * repeated N (in this case 58) times.

An example file:

lines I don not want
lines I don not want
lines I don not want

A sentence here:
********************************************************
lines I want
lines I want
lines I want
**********************************************************

lines I don not want
lines I don not want
lines I don not want

Desired output:

A sentence here:
********************************************************
lines I want
lines I want
lines I want
**********************************************************

I can get the script to work if I explicitly type out A sentence here and \* 58 times within the call to awk, but for cleanliness and readability I would prefer to do something like the following:

pat1="A sentence here"
pat2=`printf -- '\*%.s' {1..58} ; echo`
pat2=${pat2//\\/\\\\}
awk -v pat1="${pat1}" -v pat2="${pat2}" '/{pat1}/ {p=1}; p; /{pat2}/ {p=0}' $1

Here $1, the first positional parameter, is the input file. The above code returns nothing. I initially tried it without the substitution on pat2, but got the warning:

awk: warning: escape sequence `\*' treated as plain `*'

I will have to run this command thousands of times and would ideally like a solution that is both clean and efficient. I'm not tied to using awk at all.

Edit:

I just noticed that even when I manually type the patterns into awk, I still receive the warning message. I am likely not passing the variables to awk correctly.

dayne
  • This works: `awk '/:$/,/^\*{58}$/'`. But I'm cheating. :) – Satō Katsura Jul 20 '16 at 15:45
  • @steeldriver I don't understand the `~` syntax in the answers you referenced, but I just tried `awk -v pat1="${pat1}" -v pat2="${pat2}" '$0 ~ pat1 {p=1}; p; $0 ~ pat2 {p=0}' $1` with no luck. – dayne Jul 20 '16 at 15:53
  • Sorry it appears I misunderstood your question - please ignore. Although FWIW it *does* appear to produce your desired output, for me. – steeldriver Jul 20 '16 at 15:54
  • See also [Pass shell variable as a /pattern/ to awk](http://unix.stackexchange.com/a/120806) – Stéphane Chazelas Jul 20 '16 at 16:12
  • It is not a duplicate. The problem here is with `-v var=value` doing some backslash processing. While the other question is about doing a regexp match with an awk variable. It just so happens that one of the answers there addresses the specific issue in this question. – Stéphane Chazelas Jul 20 '16 at 21:06

1 Answer

Several options here:

  • pat1, pat2 treated as regexps:

    pat1="A sentence here"
    pat2='\*{58}'
    export pat1 pat2
    awk '$0 ~ ENVIRON["pat1"], $0 ~ ENVIRON["pat2"]'
    

    Note that mawk and versions of gawk prior to 4.0.0 do not support the {} extended regular expression operator. For old versions of gawk, you can set the POSIXLY_CORRECT environment variable so that it recognises the operator.

    This uses awk's start-condition, end-condition [{action}] range form, but you could do the same with your p flag approach.

  • pat1, pat2 treated as fixed strings:

    pat1="A sentence here"
    pat2=$(printf '*%.0s' {1..58})
    export pat1 pat2
    awk 'index($0, ENVIRON["pat1"]), index($0, ENVIRON["pat2"])'
    

    Here, index() searches for the needle (the variable content) anywhere in the haystack (the current record (line)), but you could also do a simple full-line comparison:

    awk '"" $0 == ENVIRON["pat1"], "" $0 == ENVIRON["pat2"]'
    

    (the "" is to force a string comparison even in cases where both $0 and ENVIRON["patx"] are numerical).
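
As a rough end-to-end check of the fixed-string option, here is a minimal sketch run against a throwaway file. The file contents, the 56-asterisk header row, and the names `sample`, `stars56`, `stars58`, `range_out` and `flag_out` are assumptions made for illustration. The asterisks are built with `printf` and `tr` rather than brace expansion, so the snippet should also run under a plain POSIX `sh`.

```shell
# Throwaway sample file; the contents are illustrative, not the asker's real data.
sample=$(mktemp)
stars56=$(printf '%56s' '' | tr ' ' '*')   # shorter header row (length assumed)
stars58=$(printf '%58s' '' | tr ' ' '*')   # the 58-asterisk terminator
printf '%s\n' 'skip me' 'A sentence here:' "$stars56" \
    'lines I want' "$stars58" 'skip me too' > "$sample"

pat1='A sentence here'
pat2=$stars58
export pat1 pat2

# Range form: print from the line containing pat1 through the line containing pat2.
range_out=$(awk 'index($0, ENVIRON["pat1"]), index($0, ENVIRON["pat2"])' "$sample")

# Flag form: the same logic written with the p flag from the question.
flag_out=$(awk 'index($0, ENVIRON["pat1"]) {p=1}; p; index($0, ENVIRON["pat2"]) {p=0}' "$sample")

printf '%s\n' "$range_out"
rm -f "$sample"
```

Both forms should print the same four lines, from the sentence through the 58-asterisk row. The 56-asterisk row does not end the range because `index()` needs all 58 characters of the needle to be present.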

Avoid using -v to pass data that may contain backslash characters: awk performs C escape sequence processing (\n, \b, \\...) on such values, so you would need to escape the backslashes first. (With GNU awk 4.2 or above, values that start with @/ and end in / are also a problem.) The same goes for variables passed as awk '...code...' awkvar="$shellvar". Use ENVIRON or ARGV instead.
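
To see the difference in isolation, here is a small sketch that passes the same string both ways and compares the lengths awk reports. The value `a\\b` and the names `val`, `via_v` and `via_env` are made up for the demonstration.

```shell
val='a\\b'    # four characters: a, backslash, backslash, b

# -v runs the value through escape processing, so \\ collapses to a single backslash.
via_v=$(awk -v s="$val" 'BEGIN { print length(s) }')

# ENVIRON hands the string to awk unchanged.
via_env=$(s="$val" awk 'BEGIN { print length(ENVIRON["s"]) }')

printf 'via -v: %s characters, via ENVIRON: %s characters\n' "$via_v" "$via_env"
```

The -v copy should come out one character shorter than the ENVIRON copy, which is the same mangling that broke the `\*` pattern in the question.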

See this answer to a related question for further details.

Stéphane Chazelas
  • The first approach did not give the desired output (unless I screwed something up), but the second approach worked great! – dayne Jul 20 '16 at 17:33
  • @dayne, is your `awk` `mawk` or an older version of `gawk`? Can you tell which implementation and version? – Stéphane Chazelas Jul 20 '16 at 18:03
  • Looks like I am using GNU Awk 3.1.7 -- based on `awk -Wversion 2>/dev/null || awk --version`. It could certainly just be a version issue. – dayne Jul 20 '16 at 18:20
  • @dayne, yes, like I said, in old versions of `gawk`, you had to set gawk in POSIX mode or use `--re-interval`. Best is to run `awk` as `POSIXLY_CORRECT=1 awk '....'` so it works whether `awk` is `gawk` or any other `awk`. That was changed in 4.0.0 released in 2011. – Stéphane Chazelas Jul 20 '16 at 20:58