I would like to check if a string contains a letter (not a specific letter, really any letter) more than once.
For example:
user:
test.sh this list
script:
if [ "$1" has some letter more than once ]
then
do something
fi
You can use grep.
The regexp \(.\).*\1 matches any single character, followed by anything, followed by the same first character.
grep returns success if at least one line of its input matches the regex.
if printf '%s\n' "$1" | grep -q '\(.\).*\1'; then
    echo "match"
fi
Note that \(.\) matches any character, not just letters, so you may need to restrict the regex to your own definition of "really any letter". You can use something like \([[:alnum:]]\).*\1, \([[:alpha:]]\).*\1 or \([a-df-z1245]\).*\1.
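As a minimal sketch of the restricted form, assuming that only alphabetic characters should count as letters, the test can be wrapped in a small function. The name has_repeated_letter is hypothetical and only used for illustration. Because grep works line by line, a letter repeated across two different lines of a multi-line string would not be detected.

```shell
# Hypothetical helper: succeeds if some alphabetic character occurs
# at least twice on the same line of the first argument.
has_repeated_letter() {
    # printf is used instead of echo so that arguments such as "-n"
    # or strings containing backslashes are passed through unchanged.
    printf '%s\n' "$1" | grep -q '\([[:alpha:]]\).*\1'
}

if has_repeated_letter "this list"; then
    echo "match"
fi
```

Here "this list" matches because t, s and i each occur twice, whereas a string such as "abc 1 1" would not, since only the digit and the space repeat.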
With expr, which also prints the repeated character it captured:
c=$(expr " $string" : " .*\(.\).*\1") &&
printf '"%s" has "%s" (at least) more than once\n' "$string" "${c:-<newline>}"
To get a report of duplicate bytes, on a GNU system, you could do:
$ string=$'This is a string\nwith «multi-byte» «characters»\n'
$ printf %s "$string" | od -An -vtc -w1 | LC_ALL=C sort | LC_ALL=C uniq -dc
5
3 a
2 c
2 e
3 h
5 i
3 r
4 s
5 t
2 \n
2 253
2 273
4 302
Bytes outside the ASCII range are shown as their octal value; control characters are shown as their octal value or as a C-style backslash escape (such as \n).
To get a report of duplicate characters:
$ printf %s "$string" | recode ..dump | sort | uniq -dc
2 000A LF line feed (lf)
5 0020 SP space
3 0061 a latin small letter a
2 0063 c latin small letter c
2 0065 e latin small letter e
3 0068 h latin small letter h
5 0069 i latin small letter i
3 0072 r latin small letter r
4 0073 s latin small letter s
5 0074 t latin small letter t
2 00AB << left-pointing double angle quotation mark
2 00BB >> right-pointing double angle quotation mark
Note however that recode doesn't know about all Unicode characters (especially not the recent ones).
You could use fold to print the string one character per line, then sort and uniq -c to count the occurrences, and awk to print only the characters that appear more than once:
$ string="foobar"
$ fold -w 1 <<< "$string" | sort | uniq -c | awk '$1>1'
2 o
Or, if your shell doesn't support here strings:
printf '%s\n' "$string" | fold -w 1 | sort | uniq -c | awk '$1>1'
Then you could test whether the command above produces any output:
$ string="foobar"
$ [ -n "$(fold -w 1 <<<"$string" | sort | uniq -c | awk '$1>1')" ] && echo repeated
repeated
You could then easily extend it to print the repeated character and the number of times it was repeated:
$ rep="$(fold -w 1 <<<"$string" | sort | uniq -c | awk '$1>1')"
$ [ -n "$rep" ] && printf -- "%s\n" "$rep"
2 o
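Putting the pieces together in the if/then form asked for in the question, here is a sketch that avoids here strings so it should also run in a plain POSIX sh. The function name repeated_chars is hypothetical, and it inherits the behaviour of the pipeline above, so any repeated character is reported, not only letters.

```shell
# Hypothetical helper: prints every character of the first argument that
# occurs more than once, prefixed by its count as formatted by uniq -c.
repeated_chars() {
    printf '%s\n' "$1" | fold -w 1 | sort | uniq -c | awk '$1 > 1'
}

rep=$(repeated_chars "foobar")
if [ -n "$rep" ]; then
    # Only reached when at least one character is repeated.
    printf 'repeated:\n%s\n' "$rep"
fi
```

In a script, "foobar" would be replaced by "$1" to check the first command-line argument.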