Bash script: split word on each letter

Question

How can I split a word into its letters, with each letter on a separate line?

For example, given "StackOver" I would like to see

S
t
a
c
k
O
v
e
r

I'm new to bash so I have no clue where to start.

jimmij · Accepted Answer · 2016-01-05T22:07:41.967

35

I would use grep:

$ grep -o . <<<"StackOver"
S
t
a
c
k
O
v
e
r

or sed:

$ sed 's/./&\n/g' <<<"StackOver"
S
t
a
c
k
O
v
e
r

And if the empty line at the end is an issue:

sed 's/\B/&\n/g' <<<"StackOver"

All of this assumes GNU tools (GNU grep and GNU sed), as on GNU/Linux.
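
As the comments note, `grep -o`, `\n` in a `sed` replacement, and `<<<` are extensions. A minimal portable sketch, assuming only a POSIX `sh` and `sed`, might look like the following. The function name `split_chars` is made up for illustration. It also avoids the trailing empty line and keeps spaces, which the `\B` variant does not.

```bash
#!/bin/sh
# split_chars is a hypothetical helper name, not part of the original answer.
split_chars() {
    # First s command: append a newline after every character. A backslash
    # followed by a literal newline is the portable way to write a newline
    # in a replacement.
    # Second s command: drop the newline that now ends the pattern space.
    printf '%s\n' "$1" | sed -e 's/./&\
/g' -e 's/\n$//'
}

split_chars "Stack Over"
```

Called with `"Stack Over"`, this should print ten lines, one of them holding only the space.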

edited Jan 05 '16 at 22:07

answered Jan 04 '16 at 23:49

jimmij

grep -o . <<< ¿¿¿ .. `-o` searches for the PATTERN provided, right? And what does it do here in your command? – Sijaan Hallak Jan 05 '16 at 00:06
@SijaanHallak `grep` searches for a pattern, and in this example it searches for every character `.` and prints each one on a separate line. See also the `sed` solution. – jimmij Jan 05 '16 at 00:08
Thanks! So this "." dot means every character. Can you please give me a link where I can read about things such as this dot? Or what are these things called? – Sijaan Hallak Jan 05 '16 at 00:15
Note that both `-o` and `\n` are GNU extensions. `<<<` is a zsh extension (also available in recent versions of ksh93 and the GNU shell (bash)). – Stéphane Chazelas Jan 05 '16 at 00:20
@SijaanHallak The best manual is already on your computer: just run `man grep` and look for the chapter "REGULAR EXPRESSIONS" (if that is what you are interested in). – jimmij Jan 05 '16 at 00:27
The second solution would produce a newline after the last character... – Avinash Raj Jan 05 '16 at 09:32
1

@jimmij I can't find any help on what `<<<` really does! Any help? – Sijaan Hallak Jan 05 '16 at 10:50
4

@SijaanHallak This is a so-called `Here string`, roughly equivalent to `echo foo | ...`, just with less typing. See http://www.tldp.org/LDP/abs/html/x17837.html – jimmij Jan 05 '16 at 11:02
@jimmij The second solution here seems to have a problem: it prints a newline at the end! I changed it to `sed -e 's/./\n&/g' <<< "$1"`, but this prints a newline at the beginning. Any suggestion on how to overcome this? – Sijaan Hallak Jan 05 '16 at 17:34
1

@SijaanHallak change `.` to `\B` (doesn't match on word boundary). – jimmij Jan 05 '16 at 17:40
@jimmij `\B` will not work: with "Stack Over", the "O" is printed next to the letter "k" on the same line, and only then comes the `\n`. – Sijaan Hallak Jan 05 '16 at 17:55
I ended up using this! It works perfectly: ` a=`sed 's/./&\n/g' <<<"$1"` ` `echo "$a" | sed 's/\b/&/' ` – Sijaan Hallak Jan 05 '16 at 18:31
1

@SijaanHallak - you can drop the second `sed` like: `sed -et -e's/./\n&/g;//D'` – mikeserv Jan 06 '16 at 06:30
cool solution – Sharuzzaman Ahmat Raslan Dec 20 '19 at 13:16

Stéphane Chazelas · Answer 2 · 2018-08-31T06:34:23.317

20

You may want to break on grapheme clusters instead of characters if the intent is to print text vertically. For instance, with an e followed by a combining acute accent:

With grapheme clusters (e with its acute accent would be one grapheme cluster):
```
$ perl -CLAS -le 'for (@ARGV) {print for /\X/g}' $'Ste\u301phane'
S
t
é
p
h
a
n
e
```
(or grep -Po '\X' with GNU grep built with PCRE support)

With characters (here with GNU grep):

$ printf '%s\n' $'Ste\u301phane' | grep -o .
S
t
e

p
h
a
n
e

fold is meant to break on characters, but GNU fold doesn't support multi-byte characters, so it breaks on bytes instead:
```
$ printf '%s\n' $'Ste\u301phane' | fold -w 1
S
t
e
�
�
p
h
a
n
e
```

On StackOver which only consists of ASCII characters (so one byte per character, one character per grapheme cluster), all three would give the same result.
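
That last claim can be checked for two of the three commands with a short sketch. It assumes a `grep` that supports `-o` (GNU or BSD), and the variable names `by_char` and `by_byte` are used purely for illustration.

```bash
#!/bin/sh
word=StackOver

# Character-based split (grep -o is an extension, not POSIX).
by_char=$(printf '%s\n' "$word" | grep -o .)
# Byte-based split: -b counts bytes, -w 1 sets the width to one.
by_byte=$(printf '%s\n' "$word" | fold -b -w 1)

if [ "$by_char" = "$by_byte" ]; then
    echo "same output for ASCII input"
else
    echo "outputs differ"
fi
```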

edited Aug 31 '18 at 06:34

answered Jan 05 '16 at 00:07

Stéphane Chazelas

I'm surprised `grep -Po` doesn't do what one would expect (like `grep -P` does). – jimmij Jan 05 '16 at 00:19
@jimmij, what do you mean? `grep -Po .` finds characters (and a combining acute accent following a newline character is invalid), and `grep -Po '\X'` finds grapheme clusters for me. You may need a recent version of grep and/or PCRE for it to work properly (or try `grep -Po '(*UTF8)\X'`) – Stéphane Chazelas Jan 05 '16 at 00:23
3

@SijaanHallak These might be helpful: http://www.joelonsoftware.com/articles/Unicode.html, http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/ – jpmc26 Jan 05 '16 at 21:55
Are you claiming that `$'e\u301'` is equivalent/equal to `é` ? – Jun 15 '21 at 17:20
@Isaac, no, I'm not claiming any such thing though there are some definitions of "equivalent" for which that would be true. – Stéphane Chazelas Jun 15 '21 at 17:22
Your description seems to imply that because Perl is able to join characters and accents together (much like a text editor joins them to select a specific glyph), other software should be able to as well. But no: not all programs are text editors, nor do all utilities understand the **complex** (especially in Hangul) set of rules for joining individual Unicode code points (https://www.unicode.org/reports/tr29/ and search for Devanagari kshi). So no, neither grep, sed nor fold understands any of this (yet). – Jun 16 '21 at 03:26

score 7 · Answer 3 · answered Jan 05 '16 at 01:42

If you have perl6 on your box:

$ perl6 -e 'for @*ARGS -> $w { .say for $w.comb }' 'cường'
c
ư
ờ
n
g

This works regardless of your locale.

answered Jan 05 '16 at 01:42

cuonglm

score 6 · Answer 4 · edited Jan 05 '16 at 22:11

With many awk versions

awk -F '' -v OFS='\n' '{$1=$1};1' <<<'StackOver'
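
An empty field separator is not handled the same way by every awk, hence "many awk versions". A sketch that should work in any POSIX awk, at the cost of an explicit loop, walks the line with `substr`. The function name `split_chars_awk` is an assumption for illustration.

```bash
#!/bin/sh
# split_chars_awk is a hypothetical helper name used only for this sketch.
split_chars_awk() {
    printf '%s\n' "$1" | awk '{
        n = length($0)
        for (i = 1; i <= n; i++)
            print substr($0, i, 1)   # one character per output line
    }'
}

split_chars_awk "StackOver"
```

Whether `substr` counts characters or bytes for non-ASCII text depends on the awk implementation and the locale.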

edited Jan 05 '16 at 22:11

Stéphane Chazelas

answered Jan 05 '16 at 04:16

iruvar

Great! But on my version of nAWK ("One True AWK") that doesn't work. However this does the trick: `awk -v FS='' -v OFS='\n' '{$1=$1};1'` _(wondering if that's more portable since `-F ''` might yield the ERE: `//`)_ – eruve Feb 04 '19 at 06:48

joeytwiddle · Answer 5 · 2016-01-06T10:19:13.083

You can use the fold(1) command. It is more efficient than grep and sed.

$ time grep -o . <bigfile >/dev/null

real    0m3.868s
user    0m3.784s
sys     0m0.056s
$ time fold -b1 <bigfile >/dev/null

real    0m0.555s
user    0m0.528s
sys     0m0.016s
$

One significant difference is that fold will reproduce empty lines in the output:

$ grep -o . <(printf "A\nB\n\nC\n\n\nD\n")
A
B
C
D
$ fold -b1 <(printf "A\nB\n\nC\n\n\nD\n")
A
B

C


D
$
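
If the empty lines are not wanted, they can be filtered out after `fold`. A minimal sketch, using the portable spelling `-b -w 1` and a `sed` expression that deletes empty lines:

```bash
#!/bin/sh
# Split into single bytes with fold, then delete the empty lines it kept.
printf 'A\nB\n\nC\n\n\nD\n' | fold -b -w 1 | sed '/^$/d'
```

This trades some of the speed advantage for output that matches what `grep -o .` prints on the same input.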

score 5 · Answer 6 · edited Jan 05 '16 at 14:37

echo StackOver | sed -e 's/./&\n/g'
S
t
a
c
k
O
v
e
r

edited Jan 05 '16 at 14:37

mikeserv

answered Jan 05 '16 at 04:11

henderson

This won't help as it prints a new line at the end – Sijaan Hallak Jan 05 '16 at 10:56

score 4 · Answer 7 · edited Jan 05 '16 at 07:27

You can handle multibyte characters like:

<input \
dd cbs=1 obs=2 conv=unblock |
sed -e:c -e '/^.*$/!N;s/\n//;tc'

This can be pretty handy when you're working with live input, because there's no buffering there and a character is printed as soon as it is whole.

edited Jan 05 '16 at 07:27

cuonglm

answered Jan 05 '16 at 01:12

mikeserv

1

NP, should we add a note about the locale? – cuonglm Jan 05 '16 at 09:35
Does not work for combining characters the way Stéphane Chazelas's answer does, but with proper normalization this should not matter. – Kijewski Jan 05 '16 at 13:06
@Kay - it works for combining characters if you *want* it to - that's what `sed` scripts are for. I'm not likely to write one right about now - I'm pretty sleepy. It's really useful, though, when reading a terminal. – mikeserv Jan 05 '16 at 14:30
@cuonglm - if you like. it should just work for the locale, given a sane libc, though. – mikeserv Jan 05 '16 at 14:33
Note that `dd` will break multibyte characters, so the output will not be text anymore so the behaviour of sed will be unspecified as per POSIX. – Stéphane Chazelas Jan 05 '16 at 22:09
@StéphaneChazelas - do you have a link to reference that statement? a NUL can't occur in a multibyte character, and a dot can only match a whole character which is not NUL, and it has worked with every `sed` i've tried. how could it not work? – mikeserv Jan 06 '16 at 02:06
oh wait - you mean because input isn't a text file. possibly, but sed is spec'd to handle conditions which exceed/break text file specs, too, such as 4k pattern spaces scripts which is well beyond line max. its also spec'd to evaluate chars bytewise w/ `l` - even when a single char is multiple bytes. i think the text file restriction for sed is probably based on the NUL prohibition - many seds replace `delimiter` in their scripts w/ NULs, and ive never managed to seek past a NUL in pattern space with heirloom sed except with D and G. – mikeserv Jan 06 '16 at 02:50

score 4 · Answer 8 · edited Jan 05 '16 at 07:24

The following is generic:

$ awk -F '' \
   'BEGIN { RS = ""; OFS = "\n"} {for (i=1;i<=NF;i++) $i = $i; print }' file_name

edited Jan 05 '16 at 07:24

slm

answered Jan 05 '16 at 06:56

user150073

score 4 · Answer 9 · edited Jan 06 '16 at 10:12

Also Python 2 can be used from the command line:

python <<< "for x in 'StackOver':
   print x"

or:

echo "for x in 'StackOver':
    print x" | python

or (as commented by 1_CR) with Python 3:

python3 -c "print(*'StackOver',sep='\n')"

edited Jan 06 '16 at 10:12

terdon

answered Jan 05 '16 at 11:57

agold

score 4 · Answer 10 · edited Jan 05 '16 at 22:34

Since you specifically asked for an answer in bash, here's a way to do it in pure bash:

while read -rn1; do echo "$REPLY" ; done <<< "StackOver"

Note that this will also catch the newline at the end of the here string. If you want to avoid that but still iterate over the characters with a bash loop, use printf, which adds no trailing newline.

printf StackOver | while read -rn1; do echo "$REPLY" ; done
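
The same loop can be wrapped in a function that takes the word as an argument. This is a minimal sketch assuming bash (for `read -n` and process substitution); the names `split_chars` and `ch` are illustrative and not from the answer.

```bash
#!/usr/bin/env bash
# split_chars is a hypothetical helper name used only for this sketch.
split_chars() {
    local ch
    # IFS= keeps a space or tab from being stripped when it is the character
    # read; -r keeps backslashes literal; -n1 reads one character at a time.
    while IFS= read -r -n1 ch; do
        printf '%s\n' "$ch"
    done < <(printf '%s' "$1")   # printf '%s' adds no trailing newline
}

split_chars "Stack Over"
```

Feeding the loop through process substitution rather than a pipe keeps it in the current shell, so variables set inside the loop would still be visible afterwards.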

edited Jan 05 '16 at 22:34

Stéphane Chazelas

answered Jan 05 '16 at 22:16

wyrm

score 3 · Answer 11 · answered Jan 05 '16 at 09:31

You can also use word boundaries:

$ perl -pe 's/(?<=.)(\B|\b)(?=.)/\n/g' <<< "StackOver"
S
t
a
c
k
O
v
e
r

answered Jan 05 '16 at 09:31

Avinash Raj

score 3 · Answer 12 · 2021-06-15T16:39:17.523

In bash:

This works with any text and uses only bash built-ins (no external utility is called), so it should be fast on short strings.

str="StackOvér áàéèëêếe"

[[ $str =~ ${str//?/(.)} ]]           # Use a regex to split.
printf '%s\n' "${BASH_REMATCH[@]:1}"  # Print all characters.

Output:

S
t
a
c
k
O
v
é
r
 
á
à
é
è
ë
ê
ế
e

edited Jun 15 '21 at 16:39

answered Nov 23 '16 at 21:37

Yunus · Answer 13 · 2017-05-20T21:49:51.013

s=stackoverflow;

$ time echo $s | fold -w1
s
t
a
c
k
o
v
e
r

real    0m0.014s
user    0m0.000s
sys     0m0.004s

Update: here is the hacky, fastest, pure-bash way:

$ time eval eval printf \'%s\\\\n\' \\\${s:\{0..$((${#s}-1))}:1}
s
t
a
c
k
o
v
e
r

real    0m0.001s
user    0m0.000s
sys     0m0.000s

For more awesomeness:

function foldh () 
{ 
    if (($#)); then
        local s="$@";
        eval eval printf \'%s\\\\n\' \\\"\\\${s:\{0..$((${#s}-1))}:1}\\\";
    else
        while read s; do
            eval eval printf \'%s\\\\n\' \\\"\\\${s:\{0..$((${#s}-1))}:1}\\\";
        done;
    fi
}
function foldv () 
{ 
    if (($#)); then
        local s="$@";
        eval eval echo \\\"\\\${s:\{0..$((${#s}-1))}:1}\\\";
    else
        while read s; do
            eval eval echo \\\"\\\${s:\{0..$((${#s}-1))}:1}\\\";
        done;
    fi
}
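
The same substring expansion can be used without `eval`, which sidesteps the risk of the shell re-interpreting the input. A minimal sketch, assuming bash; the function name `split_chars` is illustrative and this is not the author's code:

```bash
#!/usr/bin/env bash
# split_chars is a hypothetical helper name used only for this sketch.
split_chars() {
    local s=$1 i
    # ${s:i:1} expands to the single character at offset i; no eval needed.
    for (( i = 0; i < ${#s}; i++ )); do
        printf '%s\n' "${s:i:1}"
    done
}

split_chars "stackoverflow"
```

It is likely slower than the brace-expansion trick on long strings, but the input is never evaluated as code.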

Will this ever give different results from [`fold -b1`](https://unix.stackexchange.com/a/253562/90751)? – JigglyNaga Jul 25 '16 at 12:10
Since each byte has a width of 1, the result will be the same! – Yunus Jul 25 '16 at 12:30
So how is this not a duplicate of [the earlier answer](http://unix.stackexchange.com/a/253562/90751)? – JigglyNaga Jul 25 '16 at 13:17
Because it shows the same command with a different argument, and that is nice to know. – Yunus Jul 25 '16 at 13:45
An `eval` can be a big risk, and a double `eval` is even riskier, especially with arbitrary input from `$s`. Just saying! – Jun 15 '21 at 17:05

score 1 · Answer 14 · edited Aug 31 '18 at 11:30

read -a var <<< $(echo "$yourWordhere" | grep -o "." | tr '\n' ' ')

This will split your word and store its characters in the array var.
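
Because `read -a` splits on whitespace, a space inside the word is lost with the command above. A sketch that keeps every character as its own element, assuming bash 4 or later for `mapfile` and a `grep` with `-o`; the names `word` and `chars` are illustrative:

```bash
#!/usr/bin/env bash
word="StackOver"   # example input

# mapfile stores each line of grep's output as one array element;
# -t drops the trailing newline from each element.
mapfile -t chars < <(printf '%s\n' "$word" | grep -o .)

declare -p chars
printf 'number of characters: %d\n' "${#chars[@]}"
```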

edited Aug 31 '18 at 11:30

αғsнιη

answered Aug 31 '18 at 03:43

Chinmay Katil

score 1 · Answer 15 · edited Feb 13 '19 at 03:16

for x in $(echo "$yourWordhere" | grep -o '.')
do
    echo "$x"    # replace with the operation to perform on each character $x of your word
done
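
An unquoted `$(...)` in a `for` loop undergoes word splitting and filename expansion, so a space is dropped and a `*` may turn into file names. A sketch that reads the characters line by line instead, assuming a `grep` with `-o`; the variable names `word` and `ch` are illustrative:

```bash
#!/bin/sh
word='a*b c'   # example input containing a glob character and a space

printf '%s\n' "$word" | grep -o . |
while IFS= read -r ch; do
    # Replace this line with the operation to perform on each character.
    printf '[%s]\n' "$ch"
done
```

Because the loop is the last stage of a pipeline, it usually runs in a subshell, so variables assigned inside it are not visible after `done`.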

edited Feb 13 '19 at 03:16

phuclv

answered Sep 12 '18 at 07:28

Chinmay Katil

score 1 · Answer 16 · answered May 01 '23 at 12:43

On bash 4.2 and up (I tested 4.2.46 and 5.1), with extglob enabled you can use an empty "zero or one" match:

# shopt -s extglob
# V="StackOverflow"
# echo -e ${V//?()/\\n}
S
t
a
c
k
O
v
e
r
f
l
o
w

It also works to split your string into an array:

# A=( ${V//?()/ } )
# declare -p A

declare -a A='([0]="S" [1]="t" [2]="a" [3]="c" [4]="k" [5]="O" [6]="v" [7]="e" [8]="r" [9]="f" [10]="l" [11]="o" [12]="w")'

score 0 · Answer 17 · answered Jun 15 '21 at 21:57

Using Raku (formerly known as Perl_6)

~$ echo "StackOvér áàéèëêếe" | raku -ne '.chars.put;'
18
~$ echo "StackOvér áàéèëêếe" | raku -ne '.put for .comb;'
S
t
a
c
k
O
v
é
r

á
à
é
è
ë
ê
ế
e

https://raku.org/

answered Jun 15 '21 at 21:57

jubilatious1
