I would like to check if a string contains a letter (not a specific letter, really any letter) more than once.

for example:

user:

test.sh this list

script:

if [ "$1" has some letter more than once ]
then
  do something
fi
Rui F Ribeiro
user147266

3 Answers

You can use grep.

The regexp \(.\).*\1 matches any single character, followed by anything, followed by that same character again.

grep returns success if at least one line of its input matches the regex.

if echo "$1" | grep -q '\(.\).*\1'; then
  echo "match"
fi

Note that \(.\) matches any character, not just letters, so you may have to restrict the regex to your own definition of "really any letter". You can use something like \([[:alnum:]]\).*\1, \([[:alpha:]]\).*\1 or \([a-df-z1245]\).*\1.
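As a minimal sketch of the letters-only variant, the test can be wrapped in a small function. The name has_repeated_letter is made up for this illustration, and printf is used instead of echo so that arguments such as -n or strings containing backslashes reach grep unchanged. The check is still per line and case-sensitive.

```shell
#!/bin/sh
# has_repeated_letter: hypothetical helper, not part of the original answer.
# Succeeds if some alphabetic character occurs at least twice on one line of $1.
has_repeated_letter() {
  printf '%s\n' "$1" | grep -q '\([[:alpha:]]\).*\1'
}

if has_repeated_letter "this list"; then
  echo "match"
fi
```

With [[:alpha:]], a string such as a1b1 does not match, because the only repeated character there is a digit.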

andcoz
  • Note that it will only work if those characters are on the same line (won't work if `$1` is `$'a\na'` for instance), won't work for the newline characters. Depending on the implementation of `echo`, it won't work with strings like `-nene` or strings containing backslashes. [You should avoid `echo` for arbitrary data](http://unix.stackexchange.com/q/65803) – Stéphane Chazelas Dec 13 '15 at 14:05
c=$(expr " $string" : " .*\(.\).*\1") &&
  printf '"%s" has "%s" (at least) more than once\n' "$string" "${c:-<newline>}"

To get a report of duplicate bytes, on a GNU system, you could do:

$ string=$'This is a string\nwith «multi-byte» «characters»\n'
$ printf %s "$string" | od -An -vtc -w1 | LC_ALL=C sort | LC_ALL=C uniq -dc
      5
      3    a
      2    c
      2    e
      3    h
      5    i
      3    r
      4    s
      5    t
      2   \n
      2  253
      2  273
      4  302

Bytes outside the ASCII range are shown as their octal value; control characters are shown either as their octal value or as their C-style backslash escape (such as \n).

To get a report of duplicate characters:

$ printf %s "$string" | recode ..dump | sort | uniq -dc
      2 000A   LF    line feed (lf)
      5 0020   SP    space
      3 0061   a     latin small letter a
      2 0063   c     latin small letter c
      2 0065   e     latin small letter e
      3 0068   h     latin small letter h
      5 0069   i     latin small letter i
      3 0072   r     latin small letter r
      4 0073   s     latin small letter s
      5 0074   t     latin small letter t
      2 00AB   <<    left-pointing double angle quotation mark
      2 00BB   >>    right-pointing double angle quotation mark

Note however that recode doesn't know about all Unicode characters (especially not the recent ones).

Stéphane Chazelas
You could use fold to print the string one character per line, sort so that identical characters end up on adjacent lines, then uniq -c to count them and awk to print only those that appeared more than once:

$ string="foobar"
$ fold -w 1 <<< "$string" | sort | uniq -c | awk '$1>1'
      2 o

Or, if your shell doesn't support here strings:

printf '%s\n' "$string" | fold -w 1 | sort | uniq -c | awk '$1>1'

Then, you could test whether the command above produces any output:

$ string="foobar"
$ [ -n "$(fold -w 1 <<<"$string" | sort | uniq -c | awk '$1>1')" ] && echo repeated
repeated

You could then easily extend it to print the repeated character and the number of times it was repeated:

$ rep="$(fold -w 1 <<<"$string" | sort | uniq -c | awk '$1>1')"
$ [ -n "$rep" ] && printf -- "%s\n" "$rep"
    2 o
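Putting those pieces together, a script along the lines of the question's test.sh might look like the sketch below. The function name repeated_chars is invented for this example, and the printf form is used so it does not depend on here strings. Whether fold -w 1 splits on characters or on bytes depends on the fold implementation and locale, so treat multi-byte input as untested here.

```shell
#!/bin/sh
# repeated_chars: hypothetical helper that prints "count char" for every
# character occurring more than once in its first argument.
repeated_chars() {
  printf '%s\n' "$1" | fold -w 1 | sort | uniq -c | awk '$1 > 1'
}

# Fall back to the answer's sample string when no argument is given.
rep=$(repeated_chars "${1-foobar}")
if [ -n "$rep" ]; then
  printf '%s\n' "$rep"
fi
```

An empty result means no character was repeated, so the if body is where the question's "do something" would go.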
terdon