4

So, I have a simple nested loop where the outer for loops over all the files in a directory and the inner for loops over all the characters of each filename.

#!/bin/bash

if [ $# -lt 1 ]
then
        echo "Please provide an argument"
        exit
fi

for file in `ls $1`
do
        for ch in $file
        do
                echo $ch
        done
done

The script above doesn't work. The inner loop doesn't iterate over the characters of the filename; instead it runs once with the entire filename.

UPDATE:

Based on @ilkkachu's answer I was able to come up with the following script, and it works as expected. But I was curious: can we not use the for...in loop to iterate over strings?

#!/bin/bash

if [ $# -lt 1 ]
then
        echo "Please provide an argument"
        exit
fi

for file in `ls $1`; do
        for ((i=0; i<${#file}; i++)); do
                printf "%q\n" "${file:i:1}"
        done
done
  • Never use `ls` to gather a list of files for scripted usage! https://unix.stackexchange.com/questions/128985/why-not-parse-ls-and-what-to-do-instead – Marcus Müller Jul 04 '21 at 15:57
  • @MarcusMüller I am new to this, could you please explain why this is bad and what should I do instead? – Som Shekhar Mukherjee Jul 04 '21 at 15:58
  • I added a link to my comment that explains that. – Marcus Müller Jul 04 '21 at 15:58
  • I would change "character" to "byte" in the title, since "character" isn't well defined without an encoding, and has multiple definitions if you're dealing with Unicode. – l0b0 Jul 05 '21 at 00:53
  • To add to Marcus Müller's comment, if you are entering the filename as an argument, why do you use ls anyway? – Wastrel Jul 05 '21 at 16:35
  • @MarcusMüller The answers to that question have some great stuff. The question is exceedingly long and confusing (I think the OP was editing it to argue with the answers?) – Ben Jul 06 '21 at 01:19
  • @Wastrel I don't wish to loop over the directory provided as an argument, but I want to loop over all the files inside that directory. – Som Shekhar Mukherjee Jul 06 '21 at 06:44
  • @SomShekharMukherjee again, don't use `ls` for that; that's bad. `for file in "$1"/*` just works. – Marcus Müller Jul 06 '21 at 11:16
  • If you have a new question, please ask it separately. And seriously, don't use `ls`. It isn't needed here and it just makes your script less likely to work. ilkkachu gave you a version without `ls`, so use that! For more details on why parsing `ls` is bad, see: https://mywiki.wooledge.org/ParsingLs – terdon Jul 06 '21 at 12:56
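
The advice in the comments above can be sketched as follows: a glob in place of `ls`, combined with the index loop from the question's update. This is only a minimal sketch, not a definitive implementation. The function name list_chars_in_dir is hypothetical, and the directory is assumed to be passed as its first argument.

```shell
#!/bin/bash
# list_chars_in_dir is a hypothetical helper name, used for illustration only.
list_chars_in_dir() {
    local dir=$1 path name i
    for path in "$dir"/*; do
        # If the directory is empty the pattern stays unexpanded; skip it.
        [ -e "$path" ] || continue
        name=${path##*/}                     # strip the leading directory part
        for ((i = 0; i < ${#name}; i++)); do
            printf '%q\n' "${name:i:1}"      # one character per line
        done
    done
}
```

Calling list_chars_in_dir "$1" at the end of the script would then process the directory given on the command line, and filenames containing spaces are handled without any special care.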

6 Answers

9

Since you're using Bash:

#!/bin/bash
word=foobar
for ((i=0; i < ${#word}; i++)); do
   printf "char: %q\n" "${word:i:1}" 
done

${var:p:k} gives k characters of var starting at position p; ${#var} is the length of the contents of var. printf %q prints the output in an unambiguous format, so e.g. a newline shows as $'\n'.
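
As a small illustration of the expansions just described, here is a sketch using the same word as above. The variable names len, first3, rest_of_word, next and quoted are made up for this example. It also relies on the fact that omitting the length in ${var:p} yields everything from position p onwards.

```shell
#!/bin/bash
word=foobar
i=2

len=${#word}               # length of the string
first3=${word:0:3}         # 3 characters starting at position 0
rest_of_word=${word:3}     # length omitted: everything from position 3 on
next=${word:i+1:1}         # the offset is arithmetic, so plain i works

printf '%s\n' "$len" "$first3" "$rest_of_word" "$next"
# prints 6, foo, bar and b, one per line

quoted=$(printf '%q' $'\n')   # a lone newline, rendered unambiguously
printf '%s\n' "$quoted"       # prints $'\n'
```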

ilkkachu
  • +1. Your `i` is missing the `$` in the word expansion – DanieleGrassini Jul 04 '21 at 16:17
  • @roaima Oh! Amazing! Good to know – DanieleGrassini Jul 04 '21 at 16:20
  • @DanieleGrassini, the inside of a `for (( ))` and the index and offset in `${var:p:k}` are arithmetic contexts, and plain variable names without the `$` work there. Or you could do stuff like `${word:i+1:1}`. The `${#word}` needs the `${}` though, no way around that. – ilkkachu Jul 04 '21 at 16:20
  • @DanieleGrassini more details at https://github.com/koalaman/shellcheck/wiki/SC2004 – glenn jackman Jul 04 '21 at 17:17
  • +1, This is a really succinct and beginner-friendly answer, and I could achieve what I was looking for. But I have a question: can we not use the `for..in` loop to iterate over strings? – Som Shekhar Mukherjee Jul 06 '21 at 06:39
  • @SomShekharMukherjee, no, you'd need to be able to split the string to individual elements somehow, and the shell can't do splitting between each and every character. `for x in $var` would word-split `$var` on the characters specified in `IFS`, but an empty `IFS` means no splitting, not splitting everywhere (unlike in Perl, where `split "", "abcd"` would split into characters). Also, since there are no real types, it's hard to tell a single-element list from a single word. Unlike in Python where `for i in "abcd":` is different from `for i in ["abcd"]:`. – ilkkachu Jul 06 '21 at 12:11
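
The word-splitting behaviour described in the last comment can be checked with a short sketch. count_words is a hypothetical helper, not part of any answer here: it counts how many words an unquoted expansion produces for a given IFS. The sample strings contain no glob characters, so pathname expansion does not interfere.

```shell
#!/bin/bash
# count_words is a hypothetical helper: it reports how many words the
# unquoted expansion of $2 produces when IFS is set to $1.
count_words() {
    local IFS=$1 n=0 x
    for x in $2; do          # deliberately unquoted, so it is word-split
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}

count_words $' \t\n' "a:b:c d"   # default IFS: 2 words ("a:b:c" and "d")
count_words ":" "a:b:c d"        # split on colons: 3 words ("a", "b", "c d")
count_words "" "abcd"            # empty IFS: 1 word, no per-character split
```

Whatever the IFS, the loop only ever sees delimiter-separated words, which is why an index loop or a read loop is needed to get at individual characters.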
8

When the strings become larger than a few hundred characters (yes, unlikely for filenames), using a for-loop over the string length and extracting the character at index i becomes very slow.

This answer uses advanced bash techniques:

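string="foobar"   # example input (hypothetical; use your own string here)
chars=()          # start with an empty array
while IFS= read -r -d "" -n 1 char; do
    # do something with char, like adding it to an array
    chars+=( "$char" )
done < <(printf '%s' "$string")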

# inspect the array
declare -p chars

That uses a Process Substitution to redirect the string into the while-read loop. I'm using printf to avoid adding a newline onto the end of the string. The main advantage of using a process substitution instead of printf ... | while read ... is the loop executes in the current shell, not a subshell.
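
The subshell difference can be seen in a small sketch like the one below. It assumes bash with default options (in particular, lastpipe not enabled); the array names piped and kept are made up for the illustration.

```shell
#!/bin/bash
string="abc"

# Pipeline: the loop runs in a subshell, so its array is lost afterwards.
piped=()
printf '%s' "$string" | while IFS= read -r -d "" -n 1 char; do
    piped+=( "$char" )
done

# Process substitution: the loop runs in the current shell.
kept=()
while IFS= read -r -d "" -n 1 char; do
    kept+=( "$char" )
done < <(printf '%s' "$string")

printf 'piped=%d kept=%d\n' "${#piped[@]}" "${#kept[@]}"
# prints: piped=0 kept=3
```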

I once got curious about the magnitude of the slowness and benchmarked it.

glenn jackman
  • heh, funny, I wonder if it copies the whole string around with `${var:p:k}`. Ksh isn't any better there. – ilkkachu Jul 04 '21 at 19:25
  • That `${#string}` is surprisingly slow makes me think bash is walking a linked list. But I haven't looked into the code at all. – glenn jackman Jul 05 '21 at 13:45
  • not just `${#var}`, but the substring expansion too. For a 9999 char string, I got about 1.5 s for `for ((i=0; i < ${#word}; i++)); do : ${word:i:1} ; done`, 1.2 s with `${#word}` replaced with a constant var, and 0.08 s with the substring expansion removed in addition. – ilkkachu Jul 05 '21 at 15:44
5

For completeness, even though the question is tagged bash, an alternative that uses POSIX shell features only:

#!/bin/sh
for fname
do
  while [ "${#fname}" -ge 1 ]
  do
    rest=${fname#?} char=${fname%"$rest"} fname=$rest
    printf '%s\n' "$char"       # Do something with the current character
  done
done

What the inner loop does:

  • set rest to the value of fname minus its first character;

  • assign the single character obtained by removing rest from the end of fname to char;

  • set fname to the value of rest and repeat until all characters are processed.

Note the quotes in ${fname%"$rest"}, needed to prevent $rest's expansion from being used as a pattern.
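
To see why those quotes matter, consider a name containing a pattern character. The following is a minimal sketch with a made-up value; the variable names quoted and unquoted are only for illustration.

```shell
#!/bin/sh
fname='a*b'                  # hypothetical name containing a glob character
rest=${fname#?}              # "*b": the name minus its first character

quoted=${fname%"$rest"}      # removes the literal suffix "*b", leaving "a"
unquoted=${fname%$rest}      # *b is a pattern here; the shortest matching
                             # suffix is "b", so "a*" is left

printf '%s\n' "$quoted" "$unquoted"
# prints a and a*, one per line
```

Without the quotes, the loop would extract two characters instead of one for this name.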

As an aside, for file in `ls $1` should be avoided. The most obvious reason is that it breaks if a file name contains any character that happens to be in IFS. More on this at Bash Pitfall no. 1, including what you should do instead.

fra-san
4
#!/bin/sh

for name do
    printf 'name="%s"\n' "$name"
    
    printf '%s\n' "$name" | fold -w 1 |
    while IFS= read -r character; do
        printf 'character="%s"\n' "$character"
    done
done

The outer loop here just loops over the arguments given to the script. Each argument is printed as is, and then passed through fold -w 1, which creates a stream of single characters separated by newline characters. This stream is then read by the inner loop, which prints each character in turn.

Testing:

$ sh script *
name="script"
character="s"
character="c"
character="r"
character="i"
character="p"
character="t"
$ sh script /*bin*
name="/bin"
character="/"
character="b"
character="i"
character="n"
name="/sbin"
character="/"
character="s"
character="b"
character="i"
character="n"

If you replace the printf that feeds the full pathname into fold with basename "$name", the inner loop sees only the filename portion of each pathname:

$ sh script /sbin/l*
name="/sbin/ldattach"
character="l"
character="d"
character="a"
character="t"
character="t"
character="a"
character="c"
character="h"
name="/sbin/ldconfig"
character="l"
character="d"
character="c"
character="o"
character="n"
character="f"
character="i"
character="g"
Kusalananda
1

Use the bash string slicing operator:

s="string"
for c in $(seq 0 $((${#s}-1)));  do echo "${s:c:1}"; done

s
t
r
i
n
g

Applied to your script, this could be:

#!/bin/bash
for f in *; do
    echo "Char in $f:"
    for i in $(seq 0 $((${#f}-1))); do
        echo "${f:i:1}"
    done
done
DanieleGrassini
0
  1. Loop over each filename in the current directory ("*"; this can easily be customised for specific needs, e.g. ~/*.jpg) and assign the current filename to the string variable ff. "$ff" can also be thought of as an array of single characters, with ${ff:10:1} being the single character at index 10. Note: if we wanted 2 characters starting at index 10, we would write ${ff:10:2}.
  2. echo "--> $ff" simply shows the current filename in the loop.
  3. Loop over the characters of ${ff} and echo each one, from the first (bash string indices start at 0) to the last (${#ff} gives the length of the string). This is done by incrementing c by 1 (c++) from an initial value of 0 and stopping when it reaches ${#ff}.
  4. echo an empty line before moving on to the next filename.
for ff in *; do 
  echo -e "--> $ff" ; 
  for ((c=0;c<${#ff};c++)); do echo -e "${ff:$c:1}"; done ; 
  echo -e ""; 
done

Steps 2) and 4) (“echo” statements) are obviously for cosmetic purposes only and may be skipped.

docgyneco69
  • Apart from the outer loop, wouldn't that be the same as the [accepted answer](https://unix.stackexchange.com/a/656965/377345)? – AdminBee Jul 06 '21 at 10:49
  • I guess you are not wrong, my bad! Bar the aforementioned "outer loop" - as well as my irrepressible bias for the humble, if basic (sic!) **"echo"** command, vs. its more sophisticated, if less readable **"printf"** cousin. – docgyneco69 Jul 15 '21 at 21:36