I stumbled upon the following bash behaviour, which I find rather unexpected.

# The following works
$ declare bar=Hello                               # Line 1
$ declare -p bar                                  # Line 2
declare -- bar="Hello"
$ foo=bar                                         # Line 3
$ declare ${foo}=Bye                              # Line 4
$ declare -p bar                                  # Line 5
declare -- bar="Bye"
# The following fails, though
$ declare -a array=( A B C )                      # Line 6
$ declare -p array                                # Line 7
declare -a array=([0]="A" [1]="B" [2]="C")
$ foo=array                                       # Line 8
$ declare -a ${foo}=([0]="A" [1]="XXX" [2]="C")   # Line 9
bash: syntax error near unexpected token `('
# Quoting the assignment fixes the problem
$ declare -a "${foo}=(A YYY C)"                   # Line 10
$ declare -p array                                # Line 11
declare -a array=([0]="A" [1]="YYY" [2]="C")

Since shell expansion

  1. Brace expansion
  2. Performed left to right:
    • Tilde expansion
    • Parameter and variable expansion
    • Arithmetic expansion
    • Process substitution
    • Command substitution
  3. Word splitting
  4. Filename expansion

is performed on the command line after it has been split into tokens (followed by quote removal) but before the final command is executed, I would not have expected line 9 to fail.

What is the rationale that makes bash reject line 9? Or, put differently, what am I missing in the way bash processes line 9 that makes it fail while line 10 succeeds?

In any case, quoting is not always going to work straightforwardly: it requires extra care when the array elements are strings containing, e.g., spaces.

Axel Krypton
  • http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y#n5302 "A word is an assignment if it appears at the beginning of a simple command, or after another assignment word. This is context-dependent, so it cannot be handled in the grammar." So the parser can't handle `(` like it would if it appeared elsewhere, and leaves it to the next level, which may barf on something that isn't a *name*. – muru Mar 16 '20 at 10:09
  • Observation: use `set -x` and see the difference between `declare -a array=( B C )`, `declare -a "array=( B C )"`, `declare -a "array=(" B C ")"` and similar variants. – Kamil Maciorowski Mar 16 '20 at 10:17
  • @KamilMaciorowski Yes, this is what I intended with the last sentence in the question. @muru The fact that information is in a comment in the `bash` implementation sounds like it was worth asking this question :) ...I now get that the assignment is not recognised as such since it is not at the beginning of a simple command, but this is somehow expected, isn't it? I mean, what we call *assignment* is indeed an argument of `declare` or is it wrong to think in this way? Still I do not get the difference between line 4 and line 9. – Axel Krypton Mar 16 '20 at 10:44

1 Answer

tl;dr; I think it's just a syntax quirk, and you shouldn't presume some grand design behind it.

Bash uses a bison/yacc-generated parser but, as with many other languages (C, perl, etc.), it's not a "clean" parser: it also keeps some state outside of, and parallel to, the grammar in the parser_state variable.

One flag kept in that state variable is PST_ASSIGNOK. It is set when a builtin that was parsed as a WORD token has ASSIGNMENT_BUILTIN in its flags.

Such "assignment builtins" are local, typeset, declare, alias, export and readonly.

The PST_ASSIGNOK will direct the parser to consider parentheses as part of a WORD token when used after an assignment at the right of such a builtin. But it will NOT change the rules which determine whether the current token is actually an assignment: Since ${foo}=(...) is not an acceptable assignment, it will not be parsed as a single word, and the parentheses will trigger a syntax error just like in echo foo(bar).
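The four cases can be checked side by side with a small script. This is only a sketch: the helper name try is made up for illustration, and each command line is handed to a fresh bash -c so that a syntax error does not abort the script itself.

```shell
# try CMD: let a fresh bash parse and run CMD, then report whether it succeeded.
# ("try" is a hypothetical helper, not part of bash.)
try() {
    if bash -c "$1" 2>/dev/null; then
        printf 'ok    %s\n' "$1"
    else
        printf 'FAIL  %s\n' "$1"
    fi
}

try 'foo=bar; declare ${foo}=Bye'              # no parentheses: a plain WORD
try 'foo=array; declare -a array=(A B C)'      # literal name: a valid assignment word
try 'foo=array; declare -a ${foo}=(A B C)'     # not an assignment word, so "(" is a syntax error
try 'foo=array; declare -a "${foo}=(A B C)"'   # quoted: the parentheses are ordinary text
```

Only the third line should be reported as FAIL; the other three get past the parser.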

After a command line was parsed, it will be expanded, and as part of the expansions, any compound assignment (WORD which was marked with W_COMPASSIGN) like var=(1 2) will be performed and replaced with var, which will then be passed as an argument to a builtin like declare. But if declare, after all the expansions, gets an argument of the form var=(...), it will parse and expand it itself again.
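A minimal sketch of that second path, where the parser only sees a quoted word and it is declare itself that parses the compound assignment (the names v and q are arbitrary):

```shell
unset -v q                 # start from a clean slate
v=q
declare -a "$v=(r t)"      # after expansion, declare receives a single argument: q=(r t)
declare -p q               # prints: declare -a q=([0]="r" [1]="t")
```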

So, var=foo; declare "$var=(1 2 3)" may be similar to declare foo='(1 2 3)'. Or to declare foo=(1 2 3), depending on whether the variable was already defined as an array:

$ declare 'foo=(1 2 3)'; typeset -p foo
declare -- foo="(1 2 3)"
$ declare foo=(1); typeset -p foo
declare -a foo=([0]="1")
$ declare 'foo=(1 2 3)'; typeset -p foo
declare -a foo=([0]="1" [1]="2" [2]="3")

I don't think it's a good idea to rely on this corner case:

$ declare 'bar=(1 ( )'; typeset -p bar
declare -- bar="(1 ( )"
$ declare bar=(1); typeset -p bar
declare -a bar=([0]="1")
$ declare 'bar=(1 ( )'; typeset -p bar
bash: syntax error near unexpected token `('
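One way to avoid having declare reparse a string at all is a nameref. This is a different technique from the quoting shown in the question, and the sketch assumes bash 4.3 or later; the name ref is arbitrary. The compound assignment is then parsed by the regular shell grammar, so elements containing spaces only need ordinary quoting.

```shell
# Assumption: bash >= 4.3 (namerefs). "ref" is a hypothetical variable name.
foo=array
declare -a array=(A B C)

declare -n ref=$foo        # ref becomes an alias for the variable named by $foo
ref=(A "Y Y" C)            # ordinary compound assignment, handled by the parser
unset -n ref               # remove the alias only; the target array is kept

declare -p array
```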
  • It still does not look completely logical to me, and your `tl;dr;` at the beginning is probably the take-home message. I naively thought that the parser was building up the *assignment argument* of `declare` and later, somehow, this assignment was carried out. This way of thinking seems not to work, though. I am not sure I'm getting the point of your snippets. If you add `-a` to the first `declare`, you get the output of the third one, which makes sense to me, since you'd explicitly say that `foo` or `bar` are arrays from the beginning. In line 4 vs. line 9 in my question I thought I was being consistent. – Axel Krypton Mar 16 '20 at 14:34
  • I've tried to expand a bit: basically, a command line is first parsed by the yacc parser, and then, AFTER the line was parsed, the expansions are performed as part of _executing the command line_; a word like `foo=(1 2)` will expand to `foo` while assigning it `(1 2)` to the `foo` array. `local`, `declare`, etc will then mark the variable appropriately. But those builtins will also parse an argument like `foo=(1 2)` _themselves_, if that's what they got after all the expansions (eg. as a result of `v=q; declare -a "$v=(r t)"`) –  Mar 16 '20 at 15:34
  • The point of my snippets is to point out related quirks of the declare, local, typeset, etc builtins, which are quite unexpected IMHO. –  Mar 16 '20 at 15:35
  • I was not aware that **any assignment will be expanded to its variable name which will be passed as an argument to the builtin; if it was quoted, it's the builtin itself which will reparse and expand it**. This helps a lot. Still, I think the order in which the operations take place is not completely clear. I mean, I see now that `v=q; declare -a "$v=(r t)"` postpones the assignment to the `declare` builtin, but why should this line fail without quotes? I am totally with you on the last comment. – Axel Krypton Mar 16 '20 at 15:54
  • `declare -a $v=(r t)` fails much earlier, in the parser. Basically the `token_is_assignment()` -> `assignment()` function does not accept `$v=` as a possible assignment, the test at [line 5593](http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y#n5173) will not match, the `(` will not be considered as part of a `WORD`, and the line will fail the same way as `foo (bar)`. –  Mar 16 '20 at 15:59
  • Will `v=q; declare $v=Hello` (without quotes) pass such a check instead? If yes, I am puzzled because `$v=` is not a valid bash assignment; if not, then I am puzzled because this is somehow accepted by the shell... I think we are almost there and I really appreciate your explanations. I tried to read the C code of the parser, but it is not that easy to follow... – Axel Krypton Mar 16 '20 at 16:25
  • All this thing only applies to compound assignments, ie arrays. In the case of `v=q; declare $v=Hello`, there are no `(` or `)` metacharacters to trip on (even `echo $v=Hello` will work!), and declare will be passed a `q=Hello` argument which it will parse itself. –  Mar 16 '20 at 16:29
  • Thanks for digging out all the details. I think I have now a better overview. And, yes, it is definitely quirky! `:)` – Axel Krypton Mar 16 '20 at 17:31
  • When you write *The `PST_ASSIGNOK` will direct the parser to consider parentheses as part of `WORD`s when used after an assignment at the right of such a builtin. But it will NOT change the rules which determine whether the current token is actually an assignment*, where/how does the parser accept `$v=Hello` if the rules to determine whether the current token is actually an assignment are still the same? – Axel Krypton Mar 16 '20 at 17:44
  • The parser does not accept `$v=Hello` as an assignment, it just tokenizes it as a word (exactly as `$v=Hello`, the `$v` will be expanded _later_). I've already pointed you to [line 5173](http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y#n5173) of `parse.y`: there the parser checks if the following character is a `=`, if it's part of a valid assignment, if the state is in assignok mode, etc and then PEEKS at the next char to see if it's an opening parenthesis (`if MBTEST(peek_char == '(')`). If any of those conditions are not met, nothing special happens. –  Mar 16 '20 at 17:53
  • I think I have read all the comments and the parser code *too many times*. Just to be sure: is the point then that `(` and `)` are used to split the full line into tokens, so that `$v=(...` causes `$v=` to be seen as a token which is in turn identified as not a valid assignment, whereas in `$v=Hello` there is no `(` and this is seen as a whole token, which is then valid? [By the way, I was reading at line 5593 since you wrote that in your previous comment, sorry.] – Axel Krypton Mar 16 '20 at 18:17
  • Yes, I had written 5593, sorry, but the link was to 5173. Yes, `$v=Hello` will turn into a `WORD` token, but `$v=(3)` will turn into four: `WORD`, `(`, `WORD` and `)`, which will be rejected by the yacc parser. –  Mar 16 '20 at 18:33