
I want to find the duplicates in an array and their indices using bash.
For example, I have this array:

arr=("a" "b" "c" "a" "c")

In this case, "a" is a duplicate at indices 0 and 3, and "c" is a duplicate at indices 2 and 4.

I am currently using two nested loops, but that is too slow, especially for large arrays.
Is there a better, more efficient way of doing this in Bash?
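For comparison, here is a hypothetical sketch of the kind of nested-loop approach described above. The question does not show the actual code, so the helper name find_dup_pairs is made up. Each element is compared with every later one, which takes O(n²) comparisons and explains why it slows down on large arrays.

```shell
#!/bin/bash
# Hypothetical sketch of a nested-loop approach; the question does not show
# the actual code. Every pair (i, j) with i < j is compared, so the number of
# comparisons grows quadratically with the array length.
arr=("a" "b" "c" "a" "c")

find_dup_pairs() {   # hypothetical helper name
    local -a a=("$@")
    local i j
    for ((i = 0; i < ${#a[@]}; i++)); do
        for ((j = i + 1; j < ${#a[@]}; j++)); do
            # Quoted right-hand side: compare literally, not as a glob pattern.
            if [[ "${a[i]}" == "${a[j]}" ]]; then
                printf '%s at indices %d and %d\n' "${a[i]}" "$i" "$j"
            fi
        done
    done
}

find_dup_pairs "${arr[@]}"
```

A value that occurs k times is reported once per pair of its positions, so grouping the indices by value in a single pass is also more convenient.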

Thank you!

ilkkachu
lilek3
  • I once posed the same question, but for associative arrays. Does this solve your issue? [Inverting an associative array](https://unix.stackexchange.com/q/506891) – Kusalananda Jul 28 '21 at 14:48

2 Answers


Using awk, feeding array elements as input:

$ printf '%s\n' "${arr[@]}" |
  awk '{ elmnt[$0]= ($0 in elmnt? elmnt[$0] FS:"") NR-1 }
  END{ for (e in elmnt) print e, elmnt[e] }'
a 0 3
b 1
c 2 4
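The command above prints every element, including ones that occur only once (b 1). Here is a minimal sketch that keeps only the real duplicates. It assumes no element contains a newline, because each element becomes one input line. The name dup_indices is a hypothetical helper, and the index list is built from a separate occurrence count rather than an "in" test.

```shell
#!/bin/bash
arr=("a" "b" "c" "a" "c")

# Hypothetical helper: print each element that occurs more than once,
# followed by its 0-based indices. Assumes no element contains a newline.
# n[] counts occurrences; idx[] accumulates the space-separated indices.
dup_indices() {
    printf '%s\n' "$@" |
      awk '{ n[$0]++; idx[$0] = (n[$0] > 1 ? idx[$0] FS : "") (NR - 1) }
           END { for (e in idx) if (n[e] > 1) print e, idx[e] }'
}

dup_indices "${arr[@]}"
```

for (e in idx) visits keys in an unspecified order, so pipe the output through sort if a stable order matters.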

For the new requirement (saving each result into a shell variable):

$ printf '%s\n' "${arr[@]}" |
  awk -v q="'" '{ elmnt[$0]= ($0 in elmnt? elmnt[$0] FS:"") NR-1 }
  END{ for (e in elmnt) print e, q elmnt[e] q }' OFS='='
a='0 3'
b='1'
c='2 4'

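Save the above command's output to a file, then source it with . varfile (varfile is just a filename). Every line is then run as an assignment, so all the values become shell variables in the current shell. If the variables should also be exported to child processes, run set -a before sourcing.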
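Here is a minimal sketch of the save-and-source step. It assumes every array element is a valid shell variable name that does not clash with existing variables (a, b, c here) and uses mktemp for the temporary file. The awk program is a variant of the one above that counts occurrences before appending each index.

```shell
#!/bin/bash
arr=("a" "b" "c" "a" "c")

# Temporary file for the generated assignments (mktemp is an assumption here;
# any writable path works).
varfile=$(mktemp)

# Write lines like a='0 3' to $varfile: OFS joins name and value with "=",
# and q wraps the index list in single quotes.
printf '%s\n' "${arr[@]}" |
  awk -v q="'" -v OFS='=' '{ n[$0]++; idx[$0] = (n[$0] > 1 ? idx[$0] FS : "") (NR - 1) }
  END { for (e in idx) print e, q idx[e] q }' > "$varfile"

# Source the file: each line runs as an assignment in the current shell.
. "$varfile"
rm -f "$varfile"

echo "a=$a b=$b c=$c"
```

Sourcing runs the file as shell code, so this is only safe when the array contents are trusted.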

αғsнιη
  • Thank you for the solution! Is there a way to save each result in a bash variable? – lilek3 Jul 28 '21 at 15:22
  • @lilek3 see the second part of the answer, maybe that can help you? see also [What should I do when someone answers my question?](https://unix.stackexchange.com/help/someone-answers) – αғsнιη Jul 28 '21 at 15:45

You could use an associative array to check if the value has already been seen, without resorting to a linear scan every time:

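#!/bin/bash
arr=("a" "b" "c" "a" "c")
# Associative array (Bash 4+) used as a set of the values seen so far.
declare -A values=()
for v in "${arr[@]}"; do
    # The "x" prefix keeps the key non-empty even if an element is "".
    # ${values[key]+set} expands to "set" only if that key already exists.
    if [ "${values["x$v"]+set}" = set ]; then
        echo "value '$v' is duplicate"
        break   # stop at the first duplicate found
    fi
    values["x$v"]=1
done
unset values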

How fast that is compared to just dumping the values for awk to process probably depends on the problem size. Shells aren't fast, and Bash in particular is slow.
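The loop above stops at the first duplicate and does not record positions. Here is a sketch that extends the same associative-array idea to collect the indices of every duplicate in a single pass. It assumes Bash 4 or later, and idx is just an illustrative name.

```shell
#!/bin/bash
# Requires Bash 4+ for associative arrays (declare -A).
arr=("a" "b" "c" "a" "c")

# idx maps each value to a space-separated list of the indices where it occurs.
# As in the code above, keys get an "x" prefix so an empty element still
# yields a non-empty key.
declare -A idx=()
for i in "${!arr[@]}"; do
    k="x${arr[i]}"
    # Append this index, inserting a space only if the list is already non-empty.
    idx[$k]+="${idx[$k]:+ }$i"
done

# A value is a duplicate if its index list holds more than one entry,
# i.e. contains a space. Key order from "${!idx[@]}" is unspecified.
for k in "${!idx[@]}"; do
    if [[ ${idx[$k]} == *" "* ]]; then
        printf '%s: %s\n' "${k#x}" "${idx[$k]}"
    fi
done
```

This makes one pass over the array with a hash lookup per element, instead of the pairwise comparisons of nested loops.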

ilkkachu