2

I created an environment variable:

WD=`pwd`

How can I check if it contains spaces or non-English letters?

Mat
myWallJSON
  • Can you clarify how you define English letters? (i.e., are digits acceptable, punctuation, any ASCII, ...) Your question has triggered several differing interpretations. – jlliagre Nov 24 '11 at 22:28

5 Answers

3

I presume that by “non-English letters” you mean letters other than the 26 unadorned letters of the Latin alphabet. Then, strictly speaking, here's a test that meets your requirements:

if tmp=${WD//[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]/};
   [[ $tmp = *[[:alpha:]]* || $tmp = *" "* ]]; then
  # $WD contains a space or a letter other than A-Z and a-z
  echo "unusual characters in $WD"
fi

That is, strip the English letters and see if there are any letters or spaces left.

I suspect that you're in fact trying to avoid all non-ASCII characters and all whitespace, including the ones that aren't letters such as ¿ or £ or ٣. You can do that by matching the characters that are not in the range ! through ~ (i.e. anything other than the printable ASCII characters excluding the space):

if (LC_ALL=C; [[ $WD = *[^!-~]* ]]) then …

Note that ranges like !-~ or A-Z don't always do what you'd expect when you have LC_COLLATE set. Hence we set LC_ALL to a known value (LC_ALL trumps all locale settings).
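
As a rough sketch of how that test could be packaged for reuse, the function below wraps it; the name has_unusual_char and the sample paths are made up for illustration and are not part of the answer above. The subshell is what keeps the LC_ALL assignment from changing the locale of the rest of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name invented for this sketch): succeeds when its
# argument contains any character outside the ASCII range ! through ~.
has_unusual_char() {
  # The parentheses start a subshell, so LC_ALL=C does not leak out.
  (LC_ALL=C; [[ $1 = *[^!-~]* ]])
}

# Illustrative sample paths, not taken from the question.
for sample in "/home/user" "/home/my dir" "/home/café"; do
  if has_unusual_char "$sample"; then
    printf 'unusual: %s\n' "$sample"
  else
    printf 'plain:   %s\n' "$sample"
  fi
done
```

With WD set as in the question, the call would be has_unusual_char "$WD".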

If you're checking for “unusual” characters in file names (why else exclude even spaces, which are allowed on most modern platforms?), it might make sense to use a more restrictive list that doesn't allow any nonportable characters. POSIX's portable filename character set contains only ASCII letters, digits and -._.

if (LC_ALL=C; [[ $WD = *[^-._0-9A-Za-z]* ]]) then …
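
Since $WD holds a whole path, the / separators would trip the portable-name test as written. One possible way around that, sketched here under the assumption that every component should be held to the same character set, is to drop the slashes first; is_portable_path is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when every component of the path uses
# only ASCII letters, digits, and the characters - . _
is_portable_path() {
  local names=${1//\//}   # remove the / separators before testing
  # Invert the "contains a forbidden character" test; the subshell
  # keeps LC_ALL=C local to this check.
  ! (LC_ALL=C; [[ $names = *[^-._0-9A-Za-z]* ]])
}

# Illustrative usage with made-up paths.
is_portable_path "/usr/local/lib-2.0" && echo "portable"
is_portable_path "/home/my dir" || echo "not portable"
```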
Gilles 'SO- stop being evil'
  • That first test would be a lot simpler as `[[ -n "${WD//[a-zA-Z ]}" ]] && echo "I have special characters"` – phemmer Nov 23 '11 at 03:13
  • @Patrick. The first test is like that otherwise it would be subject to possible problems. Gilles' link, and further links on that page, explain it. The bottom line is that a **range** is not necessarily what **you** think the *range* is. The English `a` and `z` are just two chars to the computer, and the **range** between them is not necessarily *an immutable contiguous 26 letter alphabet*. Yes, they are contiguous as an ASCII or UNICODE range, but in a regex range statement, the **range** is based on the collating sequence – Peter.O Nov 23 '11 at 07:07
  • @Patrick That wouldn't work with many `LC_COLLATE` settings. You can kill `LC_COLLATE` while retaining `LC_CTYPE`, taking care of `LC_ALL` and `LANGUAGE`, but it's a lot more complicated than just listing the exact set of characters you want. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:27
1

Regular expressions and grep are what you are looking for.

We match any character that is not an English letter, a digit, or / (because / is part of every path).

if [[ -n "$( pwd | grep -o -P "([^a-zA-Z0-9\/])*" )" ]]; then 
    echo "error"
fi
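
A possible variant of the same idea, shown only as a sketch: grep's exit status already says whether anything matched, so -q can replace the output capture, and LC_ALL=C pins the meaning of the ranges. The function name has_bad_char is invented here. Because grep works line by line, a newline inside the path would not be flagged.

```shell
# Hypothetical helper: succeeds when the argument contains a character
# other than an ASCII letter, a digit, or /.
has_bad_char() {
  # -q suppresses output; the exit status is 0 only if a match exists.
  printf '%s\n' "$1" | LC_ALL=C grep -q '[^a-zA-Z0-9/]'
}

# Illustrative usage with a made-up path.
if has_bad_char "/home/my dir"; then
  echo "error"
fi
```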

sed can be used here too.

You can replace all allowed characters in ${WD} with '' and see whether anything is left. If the resulting string has non-zero length, ${WD} is not valid.

So, if we expect only /, digits and English letters:

if [[ -n "$( pwd | sed -r -e 's/([a-zA-Z0-9\/])*//g' )" ]]; then 
    echo "error"
fi
0

tr is slightly simpler than grep or sed in that case:

if [[ -n "$(printf '%s\n' "$WD" | tr -d '[:alnum:]/')" ]]; then
  echo "gotcha"
fi
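
The same deletion can also report which characters are the problem. The sketch below assumes that only ASCII letters, digits and / count as acceptable, and sets LC_ALL=C so that the tr ranges cover exactly those; offending_chars is a made-up name.

```shell
# Hypothetical helper: prints whatever is left of the argument after
# deleting ASCII letters, digits, and /.
offending_chars() {
  printf '%s' "$1" | LC_ALL=C tr -d 'A-Za-z0-9/'
}

# Illustrative usage with a made-up path.
bad=$(offending_chars "/tmp/a+b_c")
if [ -n "$bad" ]; then
  printf 'gotcha: %s\n' "$bad"
fi
```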
jlliagre
0

It seems the pattern [a-z] is case-insensitive (at least in my locale), so it is as simple as:

[ -z "${PWD//[a-z\/]}" ] || echo "Bad chars in path: ${PWD//[a-z\/]}"
Lenik
  • This won't work in most `LC_COLLATE` settings (see [my answer](http://unix.stackexchange.com/q/25162/885) for explanation). Plus all punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:33
-1

Bash can do its own pattern matching.

if [[ ${WD} = *[^[:alnum:]/]* ]]; then
  echo 'Baaaad.'
fi
ephemient