
I was trying to use an awk command to verify whether a particular column does not match a regex (basically I am validating that a column in a file has a uniform format; if not, I need to throw an error).

format=$2
col_pos=$1

val= `awk -F "|" -v m="$format" -v n="$col_pos" '$n ~ "^"m"$"{print $1}' sample_file.txt`

if [[ $val != "" ]]; then
   echo " column value is having unexpected format"
fi

sh sample.sh  [a-z]{8}@gmail.com 3

The awk command is throwing an error. Can anybody help me correct it?

Input file:

fileid|filename|contactemail
1|file1.txt|[email protected]
2|file2.txt|[email protected]
3|file3.txt|xyz  --------> invalid column value, as it doesn't satisfy the @gmail.com format

Here is a sample run (expected to catch an error, as xyz is not a valid email address):

$ sh sample.sh 3 [a-z]@gmail.com
$ sh -x sample.sh 3 [a-z]@gmail.com
+ format='[a-z]@gmail.com'
+ col_pos=3
++ awk -F '~' -v 'm=[a-z]@gmail.com' -v n=3 '$n ~ "^"m"$"{print $1}' sample_file.txt
+ val=
+ [[ '' != '' ]]
  • **Which** error is `awk` throwing? – tink May 11 '21 at 22:49
  • And what does the content of `filename` look like? Please edit the question with the extra information ... – tink May 11 '21 at 22:53
  • The obvious (awk) errors are (1) `=~` should be just `~` and (2) `^` and `$` in the computed regex need to be string constants i.e. `$n ~ "^" m "$"`. There are additional issues at the shell level. – steeldriver May 11 '21 at 23:00
  • Thank you @steeldriver, I edited the program and at least it runs now, but the logic issue is still there. – daturm girl May 11 '21 at 23:22
  • @daturmgirl you're not actually assigning the awk output to the variable `val`, owing to the space after the `=` sign. Really you should not be using "backticks" at all (they are deprecated), use `$(...)` instead, so `val=$(awk ...)`. Also your actual script appears to still use the wrong field separator (`-F '~'` rather than `-F '|'` to match your sample data). – steeldriver May 12 '21 at 00:17
  • ... see [Spaces in variable assignments in shell scripts](https://unix.stackexchange.com/questions/258727/spaces-in-variable-assignments-in-shell-scripts) for explanation – steeldriver May 12 '21 at 00:19

2 Answers


There are a few issues here.

  • Added a #!/bin/sh shebang to your script. If you make it executable with chmod +x sample.sh, you may call it as ./sample.sh ...
  • Fixed the field separator to '|'
  • Replaced the deprecated backtick command substitution `...` with $(...) and removed the space character in the variable assignment
  • Added NR>1 to skip the first (header) line of the input file
  • To find the non-matching email addresses, negate the regex match: !~
  • The double bracket [[...]] test is not a valid sh construct and was changed to [...] in combination with the -n test operator, which is true if the following string is non-empty.

I also added $val to the echo output to be able to see where the error occurred and printed $n instead of $1. Change that back as needed. The output goes to stderr (>&2) and the script exits with non-zero exit status to indicate a failure.
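To illustrate why the space removed from the variable assignment matters: with "val= command", the shell runs the command with val set to an empty string only in that command's environment, so nothing is stored in the shell variable. A minimal sketch, with echo standing in for the awk command (an illustration only, not part of the script):

```shell
#!/bin/sh
# Demo of "val= command" versus "val=$(command)"; echo stands in for awk.
unset val

# With a space after '=', "val=" is only a temporary (empty) assignment in the
# environment of the echo command; the shell variable val is not set afterwards.
val= echo "printed by echo, not stored in val"
echo "after val= echo: val is ${val-<unset>}"

# Without the space, $(...) captures the command's standard output in val.
val=$(echo "stored in val")
echo "after val=\$(echo ...): val is $val"
```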

Modified script:

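#!/bin/sh

# $1 = column number, $2 = regex that the whole column value must match.
# Skip the header (NR>1) and print every value that does NOT match the anchored regex.
val=$( awk -F'|' -v n="$1" -v m="$2" 'NR>1 && $n !~ "^" m "$"{ print $n }' sample_file.txt )

# Any output means at least one invalid value: report it on stderr and fail.
if [ -n "$val" ]; then
    echo "column value is having unexpected format: $val" >&2
    exit 1
fi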

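Your regexes don't match the email addresses when the full field is anchored with ^ and $: [a-z] matches exactly one character, so ^[a-z]@gmail.com$ only accepts a one-letter name before the @, and [a-z]{8} requires exactly eight letters. Using '[a-z][email protected]' would work, for example. Make sure to quote at least the regex argument to prevent the shell from interpreting it.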
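To see the effect of the anchors, here is a small sketch that runs the same anchored !~ test over a few hypothetical values (the real addresses from the question are not available). non_matching is a hypothetical helper name. The dot is written as [.] because awk processes backslash escapes in -v assignments, so a \. passed that way may be turned into a plain . (any character) before the regex sees it.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of stdin that do NOT fully match the
# ERE in $1. The regex is anchored by concatenating the strings "^" and "$".
non_matching() {
    awk -v m="$1" '$0 !~ "^" m "$"'
}

# Hypothetical sample values; the addresses in the question were lost.
samples='alice@gmail.com
b@gmail.com
xyz'

# [a-z] matches exactly one letter, so only b@gmail.com passes here.
echo "rejected by [a-z]@gmail.com:"
printf '%s\n' "$samples" | non_matching '[a-z]@gmail.com'

# [a-z]+ allows one or more letters, so only xyz is rejected.
echo "rejected by [a-z]+@gmail[.]com:"
printf '%s\n' "$samples" | non_matching '[a-z]+@gmail[.]com'
```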

Sample run:

$ ./sample.sh 3 '[a-z][email protected]'
column value is having unexpected format: xyz
$ ./sample.sh 3 'xyz'
column value is having unexpected format: [email protected]
[email protected]
Freddy

Building on @Freddy's excellent answer, you can have awk log the errors found in the input file to STDERR and then have the shell redirect STDERR to a log file with 2> (you can write directly to the error log file from awk if you want to, but it's more flexible to use the shell to redirect STDERR).

awk -F'|' -v n="$1" -v m="$2" '
    FNR>1 && $n !~ "^" m "$" {
      print NR ":" $0 > "/dev/stderr"
    }' input.txt 2> error.log
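The special file name "/dev/stderr" is not required by POSIX awk; gawk treats it specially and many systems also provide it as a device file. If portability matters, one alternative is to pipe the messages into the command "cat 1>&2", which writes to awk's own stderr. A minimal sketch with hypothetical input rows and a hypothetical pattern, read from stdin instead of input.txt:

```shell
#!/bin/sh
# Sketch: send rejected rows to stderr through the command "cat 1>&2"
# instead of the special file "/dev/stderr"; the shell then redirects
# awk's stderr to error.log with 2>. Rows and pattern are hypothetical.
printf '%s\n' 'fileid|filename|contactemail' \
    '1|file1.txt|alice@gmail.com' \
    '3|file3.txt|xyz' |
awk -F'|' -v n=3 -v m='[a-z]+@gmail[.]com' '
    FNR>1 && $n !~ "^" m "$" {
        print FNR ":" $0 | "cat 1>&2"
    }' 2> error.log

cat error.log   # 3:3|file3.txt|xyz
```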

You can also make it return a count of errors on STDOUT, to be captured in the $val shell variable:

#!/bin/sh

val=$(awk -F'|' -v n="$1" -v m="$2" '
        FNR>1 && $n !~ "^" m "$" {
          printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
          count++
        }
        # count+0 prints 0 rather than an empty string when no rows fail
        END {print count+0}' sample_file.txt 2> errors.log
     )

if [ "$val" != 0 ]; then
    echo "$val errors found in input:"
    cat errors.log
    exit 1
fi
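One detail worth checking in the count version: an awk variable that is never incremented is uninitialized, and print outputs it as an empty string, not 0. If no row fails and the END block prints count as-is, val is empty, [ "$val" != 0 ] is true, and the script reports " errors found". A minimal sketch of the difference, using a pattern that never matches:

```shell
#!/bin/sh
# An awk variable that was never incremented is uninitialized: printed as-is
# it yields an empty string, but adding 0 forces a numeric value.
raw=$(printf 'a\nb\n' | awk '/nomatch/ {count++} END {print count}')
fixed=$(printf 'a\nb\n' | awk '/nomatch/ {count++} END {print count+0}')

echo "raw='$raw' fixed='$fixed'"   # raw='' fixed='0'

# With a guaranteed number, an integer comparison is safe.
if [ "$fixed" -ne 0 ]; then
    echo "$fixed errors found"
else
    echo "no errors found"
fi
```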

For example:

$ ./sample.sh 3 xyz
2 errors found in input:
sample_file.txt:2:1|file1.txt|[email protected]
sample_file.txt:3:2|file2.txt|[email protected]

Note: awk will use - for FILENAME if the input comes from STDIN, so the error log would look something like:

-:4:3|file3.txt|xyz
cas
  • @freddy and cas, thank you for the excellent help. Let me try this. I would like to accept both answers as right, but I can select only one. – daturm girl May 12 '21 at 13:06
  • @daturmgirl on the SE sites, best practice is to upvote and accept the one that best answers your question and upvote any other answers you like or find useful. Pick Freddy's answer, obviously - mine didn't actually answer your question, just extended Freddy's answer with extra stuff. See [What should I do when someone answers my question?](https://unix.stackexchange.com/help/someone-answers) – cas May 12 '21 at 13:10
  • Thanks @cas, I did it. I am fairly new to this site and to Unix; thanks for the help. – daturm girl May 12 '21 at 13:16
  • @freddy I was trying to test-run your code and it looks like I am facing a small issue. Can you please help? I am getting all the records instead of only the unmatched records. The test run and code are pasted in the next comments. – daturm girl May 12 '21 at 13:36
  • $ sh -x poc_col_val_email.sh 3 '[a-z][email protected]' ++ awk '-F|' -v n=3 -v 'm=[a-z][email protected]' 'NR>1 && $n !~ "^" m "$"{ print $n }' /test/data/infa_shared/dev/SrcFiles/datawarehouse/poc_anjali/sample_file.txt + val='[email protected] [email protected] xyz' + '[' -n '[email protected] [email protected] xyz' ']' + echo 'column value is having unexpected format: [email protected] [email protected] xyz' column value is having unexpected format: [email protected] [email protected] xyz + exit 1 – daturm girl May 12 '21 at 13:37
  • #!/bin/sh val=$( awk -F'|' -v n="$1" -v m="$2" 'NR>1 && $n !~ "^" m "$"{ print $n }' /test/data/infa_shared/dev/SrcFiles/datawarehouse/poc_anjali/sample_file.txt ) if [ -n "$val" ]; then echo "column value is having unexpected format: $val" >&2 exit 1 fi – daturm girl May 12 '21 at 13:38
  • cat sample_file* file_id|filename|contactemail 1|file1.txt|[email protected] 2|file2.txt|[email protected] 3|file3.txt|xyz – daturm girl May 12 '21 at 13:39
    @daturmgirl did your files come from a windows machine? with CR/LF line-endings instead of just LF (aka \n or newline)? run `file sample_file.txt`, if it mentions CRLF then you need to convert to unix format text files. Use `dos2unix`. If you don't have that, you can do it with: `perl -p -i -e 's/\r\n/\n/' sample_file.txt` – cas May 12 '21 at 14:29
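This also explains why every record was reported: with CRLF line endings the carriage return stays at the end of the last field, so "^" m "$" can never match it. Besides converting the file, a possible workaround (an alternative, not what was used in this thread) is to strip a trailing \r inside awk before testing. A minimal sketch with a hypothetical sample line and pattern:

```shell
#!/bin/sh
# Hypothetical CRLF line: the third field ends in a carriage return.
line=$(printf '1|file1.txt|alice@gmail.com\r')
pat='[a-z]+@gmail[.]com'

# As-is, the trailing \r defeats the "$" anchor and a valid row is rejected.
before=$(printf '%s\n' "$line" |
    awk -F'|' -v m="$pat" '$3 !~ "^" m "$" { print "bad row " NR }')

# Strip a trailing \r first; sub() on $0 makes awk re-split the fields.
after=$(printf '%s\n' "$line" |
    awk -F'|' -v m="$pat" '{ sub(/\r$/, "") } $3 !~ "^" m "$" { print "bad row " NR }')

echo "before: '$before'"   # before: 'bad row 1'
echo "after:  '$after'"    # after:  ''
```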
  • Thank you @Cas, it worked with your suggestion. Yes, I did edit the input file in WinSCP. Now everything looks good, thank you. – daturm girl May 12 '21 at 14:35