How to find out if PWD contains spaces or non-English letters?

Question

I created an environment variable:

WD=`pwd`

How can I check if it contains spaces or non-English letters?

Can you clarify how you define English letters (i.e., are digits, punctuation, or any ASCII character acceptable?), as your question has triggered various differing interpretations. — jlliagre, Nov 24 '11 at 22:28

score 3 · Accepted Answer · edited Apr 13 '17 at 12:36

I presume that by “non-English letters” you mean letters other than the 26 unadorned letters of the Latin alphabet. Then, strictly speaking, here's a test that meets your requirements:

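if tmp=${WD//[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]/};
   [[ $tmp = *[[:alpha:]\ ]* ]]; then
  # $WD contains a space or a letter other than A-Z and a-z
  # (the space is escaped so the bracket expression stays a single word)
  echo "WD contains a space or a non-English letter"
fi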

That is, strip the English letters and see if there are any letters or spaces left.

I suspect that you're in fact trying to avoid all non-ASCII characters and all whitespace, including the ones that aren't letters such as ¿ or £ or ٣. You can do that by matching the characters that are not ! through ~ (i.e. the ASCII characters other than whitespace):

if (LC_ALL=C; [[ $WD = *[^!-~]* ]]) then …

Note that ranges like !-~ or A-Z don't always do what you'd expect when you have LC_COLLATE set. Hence we set LC_ALL to a known value (LC_ALL trumps all locale settings).
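
As a minimal sketch, the same check can be wrapped in a function. The name `has_space_or_non_ascii` is hypothetical, and the test is written with `case` rather than `[[ … ]]` so that it also runs in plain sh; the subshell body keeps the `LC_ALL=C` assignment from leaking into the caller.

```sh
# Hypothetical helper (the name is an assumption, not from the answer).
# Succeeds if $1 contains any character outside ASCII '!' through '~',
# i.e. whitespace, control characters or non-ASCII bytes.
# The ( ... ) body runs in a subshell, so LC_ALL=C stays local to it.
has_space_or_non_ascii() (
  LC_ALL=C
  # [!...] negates the set; the set itself is the range '!' to '~'.
  # Note: typed at an interactive bash prompt, "!!" would trigger
  # history expansion, so use this from a script.
  case $1 in
    *[!!-~]*) true ;;
    *) false ;;
  esac
)
```

It could then be used as `if has_space_or_non_ascii "$WD"; then echo "unusual path"; fi`.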

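If you're checking for “unusual” characters in files (why else exclude even spaces, which are allowed on most modern platforms), it might make sense to have a more restricted list that doesn't allow any nonportable characters. The POSIX portable filename character set consists only of ASCII letters, digits and the characters -, . and _. Since $WD holds a full path, the test below also accepts /, the path separator; without it, every absolute path would be flagged.

if (LC_ALL=C; [[ $WD = *[^-._/0-9A-Za-z]* ]]) then …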
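
A sketch of the portable-characters check as a reusable function follows. The name `is_portable_path` is hypothetical, and it assumes that `/` should be accepted as the path separator in addition to the portable filename characters. It uses `case` so that it runs in plain sh as well as bash.

```sh
# Hypothetical helper (the name is an assumption, not from the answer).
# Succeeds if every character of $1 is an ASCII letter, a digit,
# one of '.', '_', '-', or the '/' path separator.
is_portable_path() (
  LC_ALL=C   # make the A-Z, a-z and 0-9 ranges mean plain ASCII
  case $1 in
    # '-' is placed last in the set so it is taken literally
    *[!._/0-9A-Za-z-]*) false ;;
    *) true ;;
  esac
)
```

For example, `is_portable_path "$WD" || echo "WD has nonportable characters"`.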

That first test would be a lot simpler as `[[ -n "${WD//[a-zA-Z ]}" ]] && echo "I have special characters"` — phemmer, Nov 23 '11 at 03:13
@Patrick. The first test is like that otherwise it would be subject to possible problems. Gilles' link, and further links on that page, explain it. The bottom line is that a **range** is not necessarily what **you** think the *range* is. The English `a` and `z` are just two chars to the computer, and the **range** between them is not necessarily *an immutable contiguous 26 letter alphabet*. Yes, they are contiguous as an ASCII or UNICODE range, but in a regex range statement, the **range** is based on the collating sequence — Peter.O, Nov 23 '11 at 07:07
@Patrick That wouldn't work with many `LC_COLLATE` settings. You can kill `LC_COLLATE` while retaining `LC_CTYPE`, taking care of `LC_ALL` and `LANGUAGE`, but it's a lot more complicated than just listing the exact set of characters you want. — Gilles 'SO- stop being evil', Nov 23 '11 at 08:27

score 1 · Answer 2 · answered Nov 22 '11 at 19:50

Regular expressions and grep are what you are looking for.

We match any character that is not an English letter, a digit or / (which is part of every path).

if [[ -n "$( pwd | grep -o -P "([^a-zA-Z0-9\/])*" )" ]]; then 
    echo "error"
fi

sed can be used here too.

It can replace all allowed characters in ${WD} with '' so that you can check whether anything is left. If the resulting string has non-zero length, ${WD} is not valid.

So, if we expect only /, digits and English letters:

if [[ -n "$( pwd | sed -r -e 's/([a-zA-Z0-9\/])*//g' )" ]]; then 
    echo "error"
fi

ДМИТРИЙ МАЛИКОВ

You'll probably want to allow `.` in the path too. – Kevin Nov 23 '11 at 04:49
There is nothing about it in the question. – ДМИТРИЙ МАЛИКОВ Nov 23 '11 at 07:29
This won't work in most `LC_COLLATE` settings (see [my answer](http://unix.stackexchange.com/q/25162/885) for explanation). Plus all punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:32

score 0 · Answer 3 · answered Nov 22 '11 at 21:56

tr is slightly simpler than grep or sed in that case:

if [[ -n "$(printf '%s' "$WD" | tr -d '[:alnum:]/')" ]]; then
  echo "gotcha"
fi

jlliagre

This won't work in most character sets, where `[:alnum:]` contains non-English letters. Plus all digits, punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:32
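
One possible way to address the character-set part of that comment, as a sketch only: run `tr` in the C locale, where `[:alnum:]` covers just the ASCII letters and digits. The function name `leftover_chars` is hypothetical, and the sketch keeps this answer's choice of allowed characters (ASCII alphanumerics and `/`), which is narrower than what the question may have intended.

```sh
# Hypothetical helper (the name is an assumption, not from the answer).
# Prints whatever remains of $1 after deleting ASCII letters, digits and '/'.
# LC_ALL=C applies to tr only, so [:alnum:] means ASCII alphanumerics.
leftover_chars() {
  printf '%s' "$1" | LC_ALL=C tr -d '[:alnum:]/'
}
```

A non-empty result means the path contains something else, for example `[ -n "$(leftover_chars "$WD")" ] && echo "gotcha"`.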

score 0 · Answer 4 · answered Nov 23 '11 at 03:03

It seems the pattern [a-z] is case-insensitive, so the check is as simple as:

[ -z "${PWD//[a-z\/]}" ] || echo "Bad chars in path: ${PWD//[a-z\/]}"

Lenik

This won't work in most `LC_COLLATE` settings (see [my answer](http://unix.stackexchange.com/q/25162/885) for explanation). Plus all punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:33

score -1 · Answer 5 · answered Nov 22 '11 at 22:34

Bash can do its own pattern matching.

if [[ ${WD} = *[^[:alnum:]/]* ]]; then
  echo 'Baaaad.'
fi

ephemient

This won't work in most character sets, where `[:alnum:]` contains non-English letters. Plus all digits, punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:34
