check if a string has a character more than once

Question

I would like to check if a string contains a letter (not a specific letter, really any letter) more than once.

For example:

user:

test.sh this list

script:

if [ "$1" has some letter more than once ]
then
  do something
fi

andcoz · Answer 1 · 2015-12-13T13:42:19.740

5

You can use grep.

The regexp \(.\).*\1 matches any single character, followed by anything, followed by that same first character.

grep returns success (exit status 0) if at least one line matches the regex.

if echo "$1" | grep -q '\(.\).*\1'; then
  echo "match"
fi

Note that \(.\) matches any character, not just any letter, so you may have to restrict the regex to your own definition of "really any letter". You can use something like \([[:alnum:]]\).*\1, \([[:alpha:]]\).*\1 or \([a-df-z1245]\).*\1.
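As a minimal sketch of the letter-only variant, the test can be wrapped in a function. The name has_repeated_letter is hypothetical, and printf is used here in place of echo on the assumption that the input may be arbitrary data:

```shell
# Hypothetical helper: succeeds if any letter occurs more than once
# on a single line of $1.
has_repeated_letter() {
  # printf passes the string through unchanged, whatever it contains.
  printf '%s\n' "$1" | grep -q '\([[:alpha:]]\).*\1'
}

if has_repeated_letter "this list"; then
  echo "match"
fi
```

Repeated digits, spaces or punctuation do not count as a match here, because the group only captures [[:alpha:]] characters.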

edited Dec 13 '15 at 13:42

answered Dec 13 '15 at 13:35

Note that it will only work if those characters are on the same line (won't work if `$1` is `$'a\na'` for instance), won't work for the newline characters. Depending on the implementation of `echo`, it won't work with strings like `-nene` or strings containing backslashes. [You should avoid `echo` for arbitrary data](http://unix.stackexchange.com/q/65803) – Stéphane Chazelas Dec 13 '15 at 14:05
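One way around the line-based and echo limitations is to do the check in the shell itself. This is only a sketch: has_repeat is a hypothetical name, and whether ? matches a character or a single byte depends on the shell and the locale.

```shell
# Hypothetical helper: succeeds if any character (newline included)
# occurs more than once in $1. Uses only POSIX parameter expansion.
has_repeat() {
  s=$1
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $rest in
      *"$c"*) return 0 ;;   # first character appears again later
    esac
    s=$rest
  done
  return 1
}

string=$(printf 'a\nb\na')
if has_repeat "$string"; then
  echo "match"
fi
```

Because no external command reads the string, values such as -nene or strings containing backslashes are handled like any other input.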

Stéphane Chazelas · Answer 2 · 2015-12-13T14:28:29.497

c=$(expr " $string" : " .*\(.\).*\1") &&
  printf '"%s" has "%s" (at least) more than once\n' "$string" "${c:-<newline>}"

To get a report of duplicate bytes, on a GNU system, you could do:

$ string=$'This is a string\nwith «multi-byte» «characters»\n'
$ printf %s "$string" | od -An -vtc -w1 | LC_ALL=C sort | LC_ALL=C uniq -dc
      5
      3    a
      2    c
      2    e
      3    h
      5    i
      3    r
      4    s
      5    t
      2   \n
      2  253
      2  273
      4  302

Bytes outside the ASCII range are shown as their octal value; control characters are shown either as their octal value or as their C-style backslash escape (such as \n).

To get a report of duplicate characters:

$ printf %s "$string" | recode ..dump | sort | uniq -dc
      2 000A   LF    line feed (lf)
      5 0020   SP    space
      3 0061   a     latin small letter a
      2 0063   c     latin small letter c
      2 0065   e     latin small letter e
      3 0068   h     latin small letter h
      5 0069   i     latin small letter i
      3 0072   r     latin small letter r
      4 0073   s     latin small letter s
      5 0074   t     latin small letter t
      2 00AB   <<    left-pointing double angle quotation mark
      2 00BB   >>    right-pointing double angle quotation mark

Note however that recode doesn't know about all Unicode characters (especially not the recent ones).

score 1 · Answer 3 · edited Dec 13 '15 at 14:24

You could use fold to print the string one character per line, then sort and uniq -c to count each character, and awk to print only those that appear more than once:

$ string="foobar"
$ fold -w 1 <<< "$string" | sort | uniq -c | awk '$1>1'
      2 o

Or, if your shell doesn't support here strings:

printf '%s\n' "$string" | fold -w 1 | sort | uniq -c | awk '$1>1'

Then, you could test whether the command above returns an empty string or not:

$ string="foobar"
$ [ -n "$(fold -w 1 <<<"$string" | sort | uniq -c | awk '$1>1')" ] && echo repeated
repeated

You could then easily extend it to print the repeated character and the number of times it was repeated:

$ rep="$(fold -w 1 <<<"$string" | sort | uniq -c | awk '$1>1')"
$ [ -n "$rep" ] && printf -- "%s\n" "$rep"
      2 o
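To tie this back to the if statement in the question, the pipeline could be wrapped in a function. This is a sketch under the same limitations as the pipeline itself, and repeated_chars is a hypothetical name:

```shell
# Hypothetical helper: print "count char" for every character of $1
# that occurs more than once (prints nothing if there is none).
repeated_chars() {
  printf '%s\n' "$1" | fold -w 1 | sort | uniq -c | awk '$1 > 1'
}

string="foobar"
if [ -n "$(repeated_chars "$string")" ]; then
  echo "repeated"
else
  echo "no repeated character"
fi
```

The printf form is used so that the function does not rely on here strings.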

Note that it doesn't work for newline characters. With GNU fold, it doesn't work for multi-byte characters. — Stéphane Chazelas, Dec 13 '15 at 14:25
