How do I find whether a column of a CSV contains another using mlr's DSL?
In other words, I have a CSV:
a,b
test and,test and more
and I want to find out whether 'test and' (column a) is contained in 'test and more' (column b).
Note: I have edited my answer to incorporate the great comment from @Kusalananda.
If you have
a,b
test*and,test*and more
lorem,ipsum
whether,Finding whether a string
you can run
mlr --csv put 'if($b != ssub($b,$a,"")){$test=1}else{$test=0}' input.csv
to get
| a | b | test |
|---|---|---|
| test*and | test*and more | 1 |
| lorem | ipsum | 0 |
| whether | Finding whether a string | 1 |
I'm using the ssub function to remove the string in a from b. ssub does a plain string replacement: no regexing, so no characters are special (which is why the * in the first row is matched literally).
The condition $b != ssub($b,$a,"") works like this: if b is no longer equal to itself after the replacement, then a is contained in b.
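To see why the comparison works, here is a minimal sketch of the same "replace, then compare" logic outside Miller, in Python. It is only an illustration, not how Miller implements ssub: `contains_via_replace` and `add_test_column` are hypothetical helper names, and Python's `str.replace` with a count of 1 stands in for ssub as a literal, non-regex replacement.

```python
import csv
import io


def contains_via_replace(a, b):
    """Return True if a occurs literally in b.

    Mirrors the check $b != ssub($b, $a, ""): remove one literal
    occurrence of a from b and see whether b changed.
    (str.replace is an assumed stand-in for Miller's ssub.)
    """
    return b != b.replace(a, "", 1)


def add_test_column(csv_text):
    """Read CSV text with columns a and b, add a 0/1 'test' column."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["test"] = 1 if contains_via_replace(row["a"], row["b"]) else 0
        rows.append(row)
    return rows


# Same sample data as the input.csv above.
SAMPLE = (
    "a,b\n"
    "test*and,test*and more\n"
    "lorem,ipsum\n"
    "whether,Finding whether a string\n"
)

if __name__ == "__main__":
    for row in add_test_column(SAMPLE):
        print(row["a"], row["b"], row["test"], sep=" | ")
```

On the sample data this gives test values of 1, 0 and 1, the same as the Miller output above. Because the replacement is literal, the `*` in `test*and` needs no escaping.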
If you simply want to filter, you can run
mlr --csv filter '$b != ssub($b,$a,"")' input.csv
to get
| a | b |
|---|---|
| test*and | test*and more |
| whether | Finding whether a string |
Thank you @Kusalananda