
So, a while ago I saw this snippet for extracting text between two "markers":

# Usage: extract file "opening marker" "closing marker"
    while IFS=$'\n' read -r line; do
        [[ "$extract" && "$line" != "$3" ]] &&
            printf '%s\n' "$line"

        [[ "$line" == "$2" ]] && extract=1
        [[ "$line" == "$3" ]] && extract=
    done < "$1"

(Here I just took the liberty of removing it from the function and putting it in a file called extract.) Now, it works fine on "most" pairs of markers, but I noticed it doesn't always work:

Following the original snippet's example, with a marker made of a repeated character (I use "#" instead of "`" because of formatting errors on SO):

###sh
test
###

works when running extract file '###sh' '###', but if we use the following markers:

###
test
###

and run extract file '###' '###', then it doesn't work.

Yet I can see that the condition in the script evaluates correctly (with set -x, the extract variable is set to 1).

What's wrong here?

PS: By "it doesn't work", I mean that it doesn't print anything.

The output for the two examples above shouldn't contain the markers (just the text extracted between the two markers)...

I prefer a bash/shell solution if possible.

Nordine Lotfi
  • 1
    Alternative: `sed -n "/$2/,/$3/p" "$1"`, although a bit more work is required if the markers contain quotes or slashes. – berndbausch May 08 '21 at 23:25
  • I prefer if the solution is in bash/shell if possible :) (or at least to know why the above doesn't work as expected). I appreciate this alternative though, thanks @berndbausch – Nordine Lotfi May 08 '21 at 23:26
  • 2
    It fails when the markers are the same because if either `"$line" == "$2"` or `"$line" == "$3"` is true, then the other is necessarily true also? – steeldriver May 08 '21 at 23:32
  • 1
    It can't work if the two markers are identical. `extract` is set to 1, then to the empty string. My `sed` solution is not much better, I am afraid. – berndbausch May 08 '21 at 23:32
  • Any idea how to make this work with "identical" markers? Maybe make it loop through the whole file _until_ it finds the closing marker? @steeldriver – Nordine Lotfi May 08 '21 at 23:34
  • I see, that explains it then, thanks :) I do appreciate you posting your own sed solution even if it's not bash/shell centric... @berndbausch – Nordine Lotfi May 08 '21 at 23:35
  • Actually, my `sed` works (I had not expected this) but also prints the markers (which should have been expected). – berndbausch May 08 '21 at 23:36
  • no, actually it might be an error on my part, but I don't _actually_ want the markers in the output...will edit my post to include this @berndbausch – Nordine Lotfi May 08 '21 at 23:37
  • In the test clauses, also test for the value of `extract`: `[[ -z "$extract" && "$line" == "$2" ]]` or so. – berndbausch May 08 '21 at 23:37
  • 1
    Always paste your script into `https://shellcheck.net`, a syntax checker, or install `shellcheck` locally. Make using `shellcheck` part of your development process. – waltinator May 08 '21 at 23:46
  • Thanks, yeah, I usually do, but I didn't here because it was from that specific repo, where I usually find "working" bash/shell code and projects :) Guess I should have used it here too... @waltinator – Nordine Lotfi May 08 '21 at 23:47
  • I see, but where exactly? After the "printf" command or before? @berndbausch – Nordine Lotfi May 08 '21 at 23:48
  • At the same location where the tests are made now, i.e. after the print. – berndbausch May 09 '21 at 00:18
  • Use sed or awk. or anything but a shell `while read` loop. See [Why is using a shell loop to process text considered bad practice?](https://unix.stackexchange.com/q/169716/7696) – cas May 09 '21 at 05:45
  • Already aware of this link, but I appreciate you telling me :) I only need this for a couple of small files anyway (and it's mainly for myself), so I think it's fine... @cas – Nordine Lotfi May 09 '21 at 05:51
  • Small tasks are good practice for sed and awk, so you know how to use them when you need to for larger tasks. – cas May 09 '21 at 05:55
  • yeah, for sure :D @cas – Nordine Lotfi May 09 '21 at 06:09

2 Answers


As others stated in the comments on your question, your script does not work because when the start condition [[ "$line" == "$2" ]] is met, extract is set to 1, but the end condition [[ "$line" == "$3" ]] on the next line of the script is also met for that same input line, which resets extract to the empty string.
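To make this concrete, here is a small sketch that replays the two flag updates from the original loop on a single input line. The function name trace_flag and its parameter names are made up for this illustration; they are not part of the original snippet.

```bash
#!/usr/bin/env bash
# trace_flag LINE OPEN CLOSE  (hypothetical helper, for illustration only)
# Runs the same two tests as the original loop, in the same order,
# and shows the value of the flag after each one.
trace_flag() {
    local line=$1 open=$2 close=$3 extract=
    [[ "$line" == "$open" ]] && extract=1
    printf 'after start test: extract=%s\n' "${extract:-<empty>}"
    [[ "$line" == "$close" ]] && extract=
    printf 'after end test:   extract=%s\n' "${extract:-<empty>}"
}

# Identical markers: the flag is set, then cleared again on the same line.
trace_flag '###' '###' '###'

# Different markers: the flag survives the opening line.
trace_flag '###sh' '###sh' '###'
```

With identical markers the flag is already empty again by the time the next input line is read, which is why nothing is ever printed.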

Here is your script fixed:

# Usage: extract file "opening marker" "closing marker"
while IFS=$'\n' read -r line; do
    if [ "$extract" ]; then
        if [[ "$line" == "$3" ]]; then
            extract=
        else
            printf '%s\n' "$line"
        fi
    elif [[ "$line" == "$2" ]]; then
        extract=1
    fi
done < "$1"
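As a quick check, the fixed loop can be wrapped back into a function and run against the two sample inputs from the question. This is only a sketch: the temporary file and the sample contents are assumptions made for the demonstration.

```bash
#!/usr/bin/env bash
# Usage: extract file "opening marker" "closing marker"
# Same logic as the fixed loop, wrapped in a function with local variables.
extract() {
    local line extract=
    while IFS=$'\n' read -r line; do
        if [ "$extract" ]; then
            if [[ "$line" == "$3" ]]; then
                extract=                # closing marker: stop extracting
            else
                printf '%s\n' "$line"   # inside a block: print the line
            fi
        elif [[ "$line" == "$2" ]]; then
            extract=1                   # opening marker: start extracting
        fi
    done < "$1"
}

sample=$(mktemp)    # throwaway file, used only for this demonstration

# Identical markers
printf '%s\n' 'before' '###' 'test' '###' 'after' > "$sample"
extract "$sample" '###' '###'

# Different markers
printf '%s\n' '###sh' 'test' '###' > "$sample"
extract "$sample" '###sh' '###'

rm -f "$sample"
```

Both calls should print only the line test, without the markers. Because the end test now runs only while extract is set, the opening line can no longer cancel itself.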

And, in case you need this, at @Freddy's suggestion, here is a slightly modified version that requires that the end marker be present for the text to be printed:

# Usage: extract file "opening marker" "closing marker"
while IFS=$'\n' read -r line; do
    if [ "$extract" ]; then
        if [[ "$line" == "$3" ]]; then
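            # Skip printf for an empty block; without arguments it would print a blank line
            (( ${#lines[@]} )) && printf '%s\n' "${lines[@]}"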
            lines=() extract=
        else
            lines+=( "$line" )
        fi
    elif [[ "$line" == "$2" ]]; then
        extract=1
    fi
done < "$1"

(Lines are accumulated in the lines array and are only printed once the end marker is reached.)
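The difference shows up when the closing marker is missing. The sketch below wraps the buffering loop in a function (the name extract_closed is hypothetical) and skips the printf when a block is empty; the sample files are made up for the demonstration.

```bash
#!/usr/bin/env bash
# extract_closed file "opening marker" "closing marker"  (hypothetical name)
# Buffers the lines of a block and prints them only once the closing
# marker has been seen.
extract_closed() {
    local line extract= lines=()
    while IFS=$'\n' read -r line; do
        if [ "$extract" ]; then
            if [[ "$line" == "$3" ]]; then
                # Print the buffered block, unless it is empty
                (( ${#lines[@]} )) && printf '%s\n' "${lines[@]}"
                lines=() extract=
            else
                lines+=( "$line" )
            fi
        elif [[ "$line" == "$2" ]]; then
            extract=1
        fi
    done < "$1"
}

sample=$(mktemp)    # throwaway file, used only for this demonstration

# No closing marker: nothing is printed
printf '%s\n' '###' 'test' > "$sample"
extract_closed "$sample" '###' '###'

# Complete block: the buffered line is printed
printf '%s\n' '###' 'test' '###' > "$sample"
extract_closed "$sample" '###' '###'

rm -f "$sample"
```

The cost of this variant is that a whole block is held in memory until its closing marker arrives, which should not matter for small files.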

xhienne

Add toggling logic to the extract variable whenever $2 is seen. Thanks to xhienne for pointing it out!

[[ $line == "$2" ]] && case $extract in '') extract=1;; *) extract=;; esac

Then remove the $3 test that resets the extract variable.
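One possible reading of this, as a complete loop, is sketched below. The function name extract_toggle and the restructuring into if/elif are assumptions, not this answer's exact code, and the third argument is deliberately ignored.

```bash
#!/usr/bin/env bash
# extract_toggle file "marker"  (hypothetical name)
# Every line equal to the marker flips the flag; other lines are printed
# while the flag is set. Only suitable when both markers are identical.
extract_toggle() {
    local line extract=
    while IFS=$'\n' read -r line; do
        if [[ "$line" == "$2" ]]; then
            # Marker line: flip the flag, never print the marker itself
            case $extract in '') extract=1 ;; *) extract= ;; esac
        elif [[ "$extract" ]]; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}

sample=$(mktemp)    # throwaway file, used only for this demonstration
printf '%s\n' 'a' '###' 'test' '###' 'b' '###' 'more' '###' > "$sample"
extract_toggle "$sample" '###'    # prints the lines test and more
rm -f "$sample"
```

As noted in the comments on this answer, toggling like this no longer handles a pair of different markers, since a closing marker that differs from $2 is never recognized.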

HTH.

guest_7
  • 2
    If the two markers are the same, when will you hit the `[[ "$line" == "$3" ]]` line, then? – xhienne May 09 '21 at 00:13
  • Yes that is correct, it will never enter there. Toggling extract is the way to go in that case. – guest_7 May 09 '21 at 00:16
  • Usually the OP uses the same script with different markers. Now he asks us to fix it when the markers are the same. By toggling like you are doing, the script is now broken when the markers are different. – xhienne May 09 '21 at 00:28
  • There's another flaw (not yours). The input only needs the start marker to print the text. – Freddy May 09 '21 at 00:29