
I have a script equijoin2:

#! /bin/bash

# default args
delim="," # CSV by default
outer=""
outerfile=""
# Parse flagged arguments:
while getopts "o:td:" flag
do
  case $flag in
    d) delim=$OPTARG;;
    t) delim="\t";;
    o) outer="-a $OPTARG";;
    ?) exit;;
  esac
done
# Delete the flagged arguments:
shift $(($OPTIND -1))
# two input files
f1="$1"
f2="$2"
# cols from the input files
col1="$3"
col2="$4"


join "$outer" -t "$delim" -1 "$col1" -2 "$col2" <(sort "$f1") <(sort "$f2")

and two files

$ cat file1
c c1
b b1
$ cat file2
a a2
c c2
b b2

Why does the last command fail? Thanks.

$ equijoin2 -o 2  -d " " file1 file2 1 1
a a2
b b1 b2
c c1 c2
$ equijoin2 -o 1  -d " " file1 file2 1 1
b b1 b2
c c1 c2
$ equijoin2   -d " " file1 file2 1 1
join: extra operand '/dev/fd/62'
Tim

1 Answer

"$outer" is a quoted scalar variable, so it always expands to exactly one argument. If it is empty or unset, join still receives one empty argument (and when you call your script with -o 2, join receives the single argument "-a 2" instead of the two arguments -a and 2).

Your join is probably GNU join, which accepts options after non-option arguments. When empty, "$outer" doesn't start with -, so it is a non-option argument and is treated as a file name. join then sees three file names (the empty string plus the two process substitutions) and complains about the third, which it doesn't expect.
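To make the argument count visible, here is a minimal sketch. The helper count_args is a hypothetical name introduced only for this illustration; it reports how many arguments it received.

```bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { printf '%s\n' "$#"; }

outer=""
n_quoted_empty=$(count_args "$outer")    # quoted and empty: still one (empty) argument
n_unquoted_empty=$(count_args $outer)    # unquoted and empty: no argument at all

outer="-a 2"
n_quoted_set=$(count_args "$outer")      # quoted: the single argument '-a 2'

echo "$n_quoted_empty $n_unquoted_empty $n_quoted_set"   # prints: 1 0 1
```

The first count is the case that produces the "extra operand" error: join gets an empty string as an additional file name.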

If you want a variable with a variable number of arguments, use an array:

outer=()
...
(o)
   outer=(-a "$OPTARG");;

...
join "${outer[@]}"

Though here you could also do:

outer=
...
(o)
   outer="-a$OPTARG";;
...
join ${outer:+"$outer"} ... <(sort < "$f1") <(sort < "$f2")

Or:

unset -v outer
...
(o)
   outer="$OPTARG";;
...
join ${outer+-a "$outer"} ...

(that one doesn't work in zsh except in sh/ksh emulation).
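Putting the array variant together, the script could look roughly like the sketch below. It is wrapped in a function named equijoin (a name chosen here for illustration, not from the question) so it can be tried without installing a script, and it assumes bash plus a join and sort on the PATH. The temporary files mirror file1 and file2 from the question.

```bash
#!/bin/bash
# Sketch only: equijoin2 rewritten with an array for the optional -a flag.
equijoin() {
  local delim="," flag OPTIND=1 OPTARG
  local -a outer=()                      # empty array expands to zero arguments
  while getopts "o:td:" flag; do
    case $flag in
      d) delim=$OPTARG ;;
      t) delim=$'\t' ;;                  # literal TAB; '\t' would be two characters
      o) outer=(-a "$OPTARG") ;;         # two separate arguments: -a and the file number
      *) return 1 ;;
    esac
  done
  shift "$((OPTIND - 1))"
  join "${outer[@]}" -t "$delim" -1 "$3" -2 "$4" <(sort < "$1") <(sort < "$2")
}

# Reproduce the question's input files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'c c1' 'b b1' > "$tmp/file1"
printf '%s\n' 'a a2' 'c c2' 'b b2' > "$tmp/file2"

inner=$(equijoin -d ' ' "$tmp/file1" "$tmp/file2" 1 1)        # no -o: used to fail
outer2=$(equijoin -o 2 -d ' ' "$tmp/file1" "$tmp/file2" 1 1)  # keep unpaired lines of file 2
printf '%s\n' "$inner" "$outer2"
rm -rf -- "$tmp"
```

With no -o option the array is empty, so join receives no stray argument and the plain inner join works.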

Some other notes:

  • join -t '\t' doesn't work, because join receives the two characters \ and t. You'd need delim=$'\t' to store a literal TAB in $delim.
  • Remember to use -- when passing arbitrary arguments to commands (or use redirections where possible). So sort -- "$f1" or better sort < "$f1" instead of sort "$f1".
  • Arithmetic expansions are also subject to split+glob, so they should be quoted too: shift "$((OPTIND - 1))". It's not a problem here, as you're using bash, which doesn't inherit $IFS from the environment, and you're not modifying IFS earlier in the script, but it's still good practice.
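The first note is easy to check by looking at string lengths. A small sketch, assuming bash (the variable names are made up for the illustration):

```bash
two_chars='\t'               # backslash followed by t
tab=$'\t'                    # bash ANSI-C quoting: one TAB character
via_printf=$(printf '\t')    # portable way to get the same TAB

echo "${#two_chars} ${#tab} ${#via_printf}"   # prints: 2 1 1

same=no
[ "$tab" = "$via_printf" ] && same=yes
echo "$same"                                  # prints: yes
```

Only the one-character forms are usable as the argument to join -t.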
Stéphane Chazelas
  • Thanks. For `sort -t '\t'`, (1) does it also apply to `join -t`? (2) The coreutils manual doesn't mention that, or I missed it. The manual says "To specify ASCII NUL as the field separator, use the two-character string ‘\0’, e.g., ‘sort -t '\0'’." Is `-t '\0'` an exception? – Tim Jul 24 '18 at 20:41
  • @Tim, my bad, I meant `join -t`, not `sort -t`. `join -t '\0'` is a GNU extension. Generally, other implementations of text utilities can't cope with NUL bytes as that's not text. NUL is the one byte that can't be passed as argument to an _executed_ command, so it has to be represented by some form of encoding. – Stéphane Chazelas Jul 24 '18 at 20:46
  • @Tim, that's not `bash`, that's GNU join which chooses to understand `\0` as the NUL byte. `bash`'s `$'\0'` actually expands to the empty string, not a NUL byte. `zsh`'s `$'\0'` expands to a NUL byte but only works for builtins or functions. A NUL byte can't be passed as argument to a command that is _executed_ because the list of arguments passed to the `execve()` system call is a list of NUL-delimited strings. – Stéphane Chazelas Jul 24 '18 at 21:08
  • Sorry, I deleted the comment, but please keep your reply. The reason I asked whether it is bash's ANSI-C quoting is this: "sort won’t accept ‘\t’, since it treats it as a multi-byte character. The solution is to place a $ before it. The dollar sign tells bash to use ANSI-C quoting" https://robfelty.com/2008/07/14/sort-using-tab-as-field-separator-in-bash Is it wrong? – Tim Jul 24 '18 at 21:08
  • "`join -t '\0'` is a GNU extension". Do you mean the GNU extension allows just for `join -t '\0'` or also for other such as `join -t '\t'`? – Tim Jul 24 '18 at 21:09
  • `bash` expands `$'\t'` to a TAB character. `'\t'` passes a string of two characters, `\` and `t`. The point is that you need to pass the character as-is to `join -t`. You can also do `join -t '<TAB>'` (with a literal TAB character typed between the quotes) or `join -t "$(printf '\t')"`, but obviously that can't be done for the NUL character, as a NUL character can never be passed in an argument; that's a limitation of the `execve()` system call. – Stéphane Chazelas Jul 24 '18 at 21:17
  • Another question: "when you call your script with `-o 2`, that's one `-a 2` argument" to `join`. Why does that single `-a 2` argument still work for `join` the same way as the two arguments `-a` and `2`? See the last three examples in my post. – Tim Jul 24 '18 at 21:34
  • @Tim, because `join '-a 2'` is like `join -a ' 2'` and when `join` parses `' 2'` to extract the number, it skips and ignores the leading spaces. – Stéphane Chazelas Jul 24 '18 at 21:38
  • Thanks. Which one provides Ctrl-V Tab as a way of typing a tab: bash's readline, the terminal emulator, or the X window system? More at https://unix.stackexchange.com/q/458242/674 – Tim Jul 25 '18 at 16:21
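Two points from the comments can be illustrated with a short sketch, assuming bash. It uses bash's getopts builtin as a stand-in for join's own option parser (which is not shown in this thread and is only assumed to split a bundled option the same way). The function name parse_demo is hypothetical.

```bash
# parse_demo is a hypothetical helper: it parses one -a option and shows its value.
parse_demo() {
  local flag OPTIND=1 OPTARG
  getopts 'a:' flag "$@"
  printf '[%s][%s]\n' "$flag" "$OPTARG"
}

one_arg=$(parse_demo '-a 2')   # single argument: option a with value ' 2'
two_args=$(parse_demo -a 2)    # two arguments: option a with value '2'
echo "$one_arg"                # prints: [a][ 2]
echo "$two_args"               # prints: [a][2]

# bash cannot store a NUL byte in a variable: $'\0' is the empty string.
nul=$'\0'
nul_len=${#nul}
echo "$nul_len"                # prints: 0
```

The single-argument form yields a value with a leading space, which is the ' 2' that join is said to accept because it skips leading blanks when reading the number.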