50

I have this code:

file="JetConst_reco_allconst_4j2t.png"
if [[ $file == *_gen_* ]];
then
    echo "True"
else
    echo "False"
fi

I test if file contains "gen". The output is "False". Nice!

The problem is when I substitute "gen" with a variable testseq:

file="JetConst_reco_allconst_4j2t.png"
testseq="gen"
if [[ $file == *_$testseq_* ]];
then
    echo "True"
else
    echo "False"
fi

Now the output is "True". How can that be, and how do I fix it?

Viesturs
  • Possible duplicate of [Shell test to find a pattern in a string](https://unix.stackexchange.com/questions/192887/shell-test-to-find-a-pattern-in-a-string) –  Mar 27 '19 at 13:05

4 Answers

38

Use the =~ operator to make regular expression comparisons:

#!/bin/bash
file="JetConst_reco_allconst_4j2t.png"
testseq="gen"
if [[ $file =~ $testseq ]];
then
    echo "True"
else
    echo "False"
fi

This checks whether $file contains a match for $testseq.

user@host:~$ ./string.sh
False

If I change testseq="Const":

user@host:~$ ./string.sh
True

Be careful what you put in $testseq, though. Unquoted on the right-hand side of =~, its value is interpreted as a regular expression, so a value such as [0-9] matches any digit in $file rather than the literal text [0-9].
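
To make the difference concrete, here is a minimal sketch, assuming bash 3.2 or later, where quoting the right-hand side of =~ forces a literal comparison. The function names matches_regex and contains_literal are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Hypothetical helpers, named here only for illustration.

# Treat $2 as a regular expression (unquoted right-hand side).
matches_regex() {
    [[ $1 =~ $2 ]]
}

# Treat $2 as a literal string (quoted right-hand side).
contains_literal() {
    [[ $1 =~ "$2" ]]
}

file="JetConst_reco_allconst_4j2t.png"

# The file name contains digits, so the regex [0-9] matches.
if matches_regex "$file" "[0-9]"; then echo "regex: True"; else echo "regex: False"; fi

# The literal text [0-9] does not occur in the file name.
if contains_literal "$file" "[0-9]"; then echo "literal: True"; else echo "literal: False"; fi
```

With the file name from the question this should print "regex: True" followed by "literal: False".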

Zebiano
34

You need to delimit the $testseq variable in one of the following ways:

  • $file == *_"$testseq"_* (here the value of $testseq is treated as a fixed string)

  • $file == *_${testseq}_* (here the value of $testseq is treated as a pattern).

Otherwise, the _ immediately after the variable's name is taken as part of the name (it is a valid character in a variable name).
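
The two forms only differ when the value contains glob characters. A small sketch of that difference, assuming bash and using a made-up file name that contains _gen_ and a made-up value g?n for testseq:

```shell
#!/bin/bash
file="JetConst_gen_allconst_4j2t.png"   # hypothetical file name containing _gen_
testseq="g?n"                            # hypothetical value with the glob character ?

# Quoted: ? is matched literally, so _g?n_ is not found in the file name.
if [[ $file == *_"$testseq"_* ]]; then quoted=True; else quoted=False; fi

# Braces only: ? stays a wildcard, so g?n matches gen.
if [[ $file == *_${testseq}_* ]]; then braced=True; else braced=False; fi

echo "quoted: $quoted"
echo "braced: $braced"
```

This should print "quoted: False" and "braced: True". For a plain value such as gen, both forms behave the same.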

Kusalananda
RomanPerekhrest
23
file="JetConst_reco_allconst_4j2t.png"
testseq="gen"

case "$file" in
    *_"$testseq"_*) echo 'True'  ;;
    *)              echo 'False'
esac

Using case ... esac is one of the simplest ways to perform a pattern match in a portable way. It works like a "switch" statement in other languages (bash, zsh, and ksh93 also allow fall-through, in various incompatible ways). The patterns used are the standard file name globbing patterns.

The issue you are having is due to the fact that _ is a valid character in a variable name. The shell will thus see *_$testseq_* as "*_ followed by the value of the variable $testseq_ and an *". The variable $testseq_ is undefined, so it will be expanded to an empty string, and you end up with *_*, which obviously matches the $file value that you have. You may expect to get True as long as the filename in $file contains at least one underscore.
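
One way to see this for yourself is to print the pattern the shell actually builds, and to check how the same expansion behaves under set -u (nounset), which turns the use of an unset variable into an error. This is only a diagnostic sketch; the variable names pattern and status are introduced here for illustration.

```shell
#!/bin/bash
testseq="gen"

# Without braces or quotes, the shell reads the name as "testseq_",
# which is unset, so only the surrounding text is left.
pattern="*_$testseq_*"
echo "$pattern"

# With nounset enabled, the same expansion is an error instead of an
# empty string. It runs in a subshell so the failure does not stop
# this script.
if ( set -u; : "$testseq_" ) 2>/dev/null; then
    status="expanded silently"
else
    status="rejected as unbound"
fi
echo "$status"
```

The first line of output should be the bare pattern *_*, and the second should be "rejected as unbound".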

To properly delimit the name of the variable, use "..." around the expansion: *_"$testseq"_*. This uses the value of the variable as a string. If you want to use the value of the variable as a pattern, use *_${testseq}_* instead.

Another quick fix is to include the underscores in the value of $testseq:

testseq="_gen_"

and then just use *"$testseq"* as the pattern (for a string comparison).
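
Since case tries its patterns in order, the same construct extends naturally to several tags. The following is a sketch only: the function name classify, the second file name, and the choice of tags are assumptions made for illustration, not part of the original question.

```shell
# Hypothetical helper: report which tag, if any, appears between underscores.
classify() {
    case "$1" in
        *_gen_*)  echo 'gen'   ;;
        *_reco_*) echo 'reco'  ;;
        *)        echo 'other' ;;
    esac
}

classify "JetConst_reco_allconst_4j2t.png"
classify "JetConst_gen_allconst_4j2t.png"
classify "notes.txt"
```

This should print reco, gen, and other, one per line. It uses only standard case syntax, so it should not depend on bash.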

Kusalananda
  • So the shell will be looking for a variable $testseq_ and not find it and substitute it with an empty string. – Viesturs Jun 13 '17 at 14:39
  • @Viesturs That is the heart of the issue, yes. – Kusalananda Jun 13 '17 at 14:39
  • For a substring search it should be `*"$testseq"*` for `case` like for `[[...]]` (except for zsh unless you enable globsubst) – Stéphane Chazelas Mar 28 '19 at 11:05
  • Simpler than `[ "${str##*substr*}" ] || echo True` ? –  Nov 28 '19 at 20:18
  • @Isaac In terms of reading and understanding what's happening, yes. It's also easy to extend one test with more test cases without getting an "if-then-elif-then-elif" spaghetti. Although testing a single string the way you show (whether a string disappears in a substitution) _is_ shorter. – Kusalananda Nov 28 '19 at 20:54
6

For a portable way to test whether a string contains a substring, use:

file="JetConst_reco_allconst_4j2t.png";       testseq="gen"

[ "${file##*$testseq*}" ] || echo True Substring is present

Or "${file##*"$testseq"*}" to avoid interpreting glob characters in testseq.
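
The quoted form can be wrapped in a small function. This is a sketch under stated assumptions: the name contains is hypothetical, and the leading -n test is an addition not in the one-liner above, there because an empty string would otherwise also expand to nothing and be reported as a match.

```shell
# Hypothetical helper: succeed if $2 occurs literally in $1.
# The -n test is an added guard for an empty $1, which would
# otherwise be reported as containing every substring.
contains() {
    [ -n "$1" ] && [ -z "${1##*"$2"*}" ]
}

file="JetConst_reco_allconst_4j2t.png"

if contains "$file" "gen";  then echo "gen: True";  else echo "gen: False";  fi
if contains "$file" "reco"; then echo "reco: True"; else echo "reco: False"; fi
# Because $2 is quoted inside the expansion, * is compared literally.
if contains "$file" "*";    then echo "*: True";    else echo "*: False";    fi
```

For the file name from the question this should report False for gen, True for reco, and False for the literal *.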

  • You'd need something like [ "${file##*$testseq*}" != "$file" ] because in dash that is Remove Largest Prefix Pattern. – Noel Grandin Feb 26 '20 at 10:24
  • No, @NoelGrandin there is no change on most shells (including dash), the **Largest Prefix Pattern** will be the whole string if the variable subpattern (`$testseq`) value is contained inside the `$file` value. Try: `dash -c 'file="JetConst_reco_allconst_4j2t.png"; testseq="reco"; echo "=${file##*"$testseq"*}="'` to confirm that dash will remove the whole string. –  Feb 26 '20 at 18:53