24

How can I split a word's letters, with each letter in a separate line?

For example, given "StackOver" I would like to see

S
t
a
c
k
O
v
e
r

I'm new to bash so I have no clue where to start.

Jeff Schaller
Sijaan Hallak

17 Answers

35

I would use grep:

$ grep -o . <<<"StackOver"
S
t
a
c
k
O
v
e
r

or sed:

$ sed 's/./&\n/g' <<<"StackOver"
S
t
a
c
k
O
v
e
r

And if the empty line at the end is an issue:

sed 's/\B/&\n/g' <<<"StackOver"

All of this assumes GNU tools, as found on GNU/Linux.
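
Where GNU grep, GNU sed, or here strings are not available, the same split can be sketched with POSIX parameter expansion alone. The function name `split_chars` is hypothetical, and whether multi-byte characters stay whole depends on the shell and the locale, so treat this as a starting point, not a definitive solution:

```shell
# Hypothetical helper: print each character of $1 on its own line,
# using only POSIX parameter expansion (no grep, sed, or <<<).
# Note: s and rest are global, since POSIX sh has no "local".
split_chars() {
    s=$1
    while [ -n "$s" ]; do
        rest=${s#?}                    # everything after the first character
        printf '%s\n' "${s%"$rest"}"   # the first character alone
        s=$rest
    done
}

split_chars "StackOver"
```

Because the loop prints exactly one line per character, there is no extra empty line at the end to clean up.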

jimmij
  • `grep -o . <<<` ??? `-o` searches for the PATTERN provided, right? What does it do here in your command? – Sijaan Hallak Jan 05 '16 at 00:06
  • @SijaanHallak `grep` searches for a pattern, and in this example it searches for every character (`.`) and prints each one on a separate line. See also the `sed` solution. – jimmij Jan 05 '16 at 00:08
  • Thanks! So this "." dot means every character. Can you please give me a link where I can read about things such as this dot? Or what are these things called? – Sijaan Hallak Jan 05 '16 at 00:15
  • Note that both `-o` and `\n` are a GNU extension. `<<<` is a zsh extension (also available in recent versions of ksh93 and the GNU shell (bash)). – Stéphane Chazelas Jan 05 '16 at 00:20
  • @SijaanHallak The best manual is already on your computer: just run `man grep` and look for the section "REGULAR EXPRESSIONS" (if that is what you are interested in). – jimmij Jan 05 '16 at 00:27
  • Second answer would produce a new line after last... – Avinash Raj Jan 05 '16 at 09:32
  • 1
    @jimmij I can't find any help on what <<< really does! Any help? – Sijaan Hallak Jan 05 '16 at 10:50
  • 4
    @SijaanHallak This is a so-called here string, roughly equivalent to `echo foo | ...`, just with less typing. See http://www.tldp.org/LDP/abs/html/x17837.html – jimmij Jan 05 '16 at 11:02
  • @jimmij The second solution here seems to have a problem: it prints a newline at the end! I changed it to `sed -e 's/./\n&/g' <<< "$1"`, but this prints a newline at the beginning. Any suggestion on how to overcome this? – Sijaan Hallak Jan 05 '16 at 17:34
  • 1
    @SijaanHallak change `.` to `\B` (doesn't match on word boundary). – jimmij Jan 05 '16 at 17:40
  • @jimmij \B will not work as it prints "Stack Over" -> the "O" will be printed near the letter "k" at the same line and then it does `\n` – Sijaan Hallak Jan 05 '16 at 17:55
  • I ended up using this! It works perfectly: ` a=`sed 's/./&\n/g' <<<"$1"` ` `echo "$a" | sed 's/\b/&/' ` – Sijaan Hallak Jan 05 '16 at 18:31
  • 1
    @SijaanHallak - you can drop the second `sed` like: `sed -et -e's/./\n&/g;//D'` – mikeserv Jan 06 '16 at 06:30
20

You may want to break on grapheme clusters instead of characters if the intent is to print text vertically. For instance, with an e followed by a combining acute accent:

  • With grapheme clusters (e with its acute accent would be one grapheme cluster):

    $ perl -CLAS -le 'for (@ARGV) {print for /\X/g}' $'Ste\u301phane'
    S
    t
    é
    p
    h
    a
    n
    e
    

    (or grep -Po '\X' with GNU grep built with PCRE support)

  • With characters (here with GNU grep):

    $ printf '%s\n' $'Ste\u301phane' | grep -o .
    S
    t
    e
    
    p
    h
    a
    n
    e
    
  • fold is meant to break on characters, but GNU fold doesn't support multi-byte characters, so it breaks on bytes instead:

    $ printf '%s\n' $'Ste\u301phane' | fold -w 1
    S
    t
    e
    �
    �
    p
    h
    a
    n
    e
    

On StackOver which only consists of ASCII characters (so one byte per character, one character per grapheme cluster), all three would give the same result.
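
To see why the three tools disagree, it can help to count the same string in different units. The sketch below is only an illustration: the helper names `count_bytes` and `count_chars` are hypothetical, and the character count assumes a UTF-8 locale.

```shell
# "e" followed by U+0301 (combining acute accent), spelled out as its
# UTF-8 bytes in octal (\314\201) so the snippet does not rely on $'\u301'.
s=$(printf 'Ste\314\201phane')

# Hypothetical helpers; $(( ... )) strips the padding some wc versions print.
count_bytes() { echo "$(( $(printf '%s' "$1" | wc -c) ))"; }
count_chars() { echo "$(( $(printf '%s' "$1" | wc -m) ))"; }   # locale-dependent

count_bytes "$s"   # 10 bytes: 8 ASCII letters plus 2 bytes for U+0301
count_chars "$s"   # 9 in a UTF-8 locale; a grapheme-aware tool would see 8
```

A byte-oriented tool yields one line per byte, a character-oriented tool one line per character, and only a grapheme-aware tool keeps the accent attached to its base letter.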

Stéphane Chazelas
  • I'm surprised `grep -Po` doesn't do what one would expect (like `grep -P` does). – jimmij Jan 05 '16 at 00:19
  • @jimmij, what do you mean? `grep -Po .` finds characters (and a combining acute accent following a newline character is invalid), and `grep -Po '\X'` finds grapheme clusters for me. You may need a recent version of grep and/or PCRE for it to work properly (or try `grep -Po '(*UTF8)\X'`) – Stéphane Chazelas Jan 05 '16 at 00:23
  • 3
    @SijaanHallak These might be helpful: http://www.joelonsoftware.com/articles/Unicode.html, http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/ – jpmc26 Jan 05 '16 at 21:55
  • Are you claiming that `$'e\u301'` is equivalent/equal to `é` ? –  Jun 15 '21 at 17:20
  • @Isaac, no, I'm not claiming any such thing though there are some definitions of "equivalent" for which that would be true. – Stéphane Chazelas Jun 15 '21 at 17:22
  • Your description seems to imply that because Perl is able to join characters and accents together (much like a text editor joins them to select a specific glyph), other software should be able to as well. But no, not all programs are text editors, nor do all utilities understand the **complex** (especially in Hangul) set of rules for joining individual Unicode code points (https://www.unicode.org/reports/tr29/ and search for Devanagari kshi). So, no, neither grep, sed nor fold understands any of this (yet). –  Jun 16 '21 at 03:26
7

If you have perl6 on your box:

$ perl6 -e 'for @*ARGS -> $w { .say for $w.comb }' 'cường'       
c
ư
ờ
n
g

This works regardless of your locale.

cuonglm
6

With many awk versions:

awk -F '' -v OFS='\n' '{$1=$1};1' <<<'StackOver'
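
Splitting on an empty field separator is an extension that not every awk honours. As a hedged alternative, the sketch below walks the line with `substr()`, which POSIX awk provides. The function name `split_awk` is hypothetical, and whether multi-byte characters stay whole depends on the awk implementation and locale:

```shell
# Hypothetical helper: split with substr() instead of relying on an
# empty field separator.
split_awk() {
    printf '%s\n' "$1" |
        awk '{ n = length($0); for (i = 1; i <= n; i++) print substr($0, i, 1) }'
}

split_awk "StackOver"
```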
Stéphane Chazelas
iruvar
  • Great! But on my version of nAWK ("One True AWK") that doesn't work. However this does the trick: `awk -v FS='' -v OFS='\n' '{$1=$1};1'` _(wondering if that's more portable since `-F ''` might yield the ERE: `//`)_ – eruve Feb 04 '19 at 06:48
6

You can use the fold(1) command. It is more efficient than grep and sed.

$ time grep -o . <bigfile >/dev/null

real    0m3.868s
user    0m3.784s
sys     0m0.056s
$ time fold -b1 <bigfile >/dev/null

real    0m0.555s
user    0m0.528s
sys     0m0.016s
$

One significant difference is that fold will reproduce empty lines in the output:

$ grep -o . <(printf "A\nB\n\nC\n\n\nD\n")
A
B
C
D
$ fold -b1 <(printf "A\nB\n\nC\n\n\nD\n")
A
B

C


D
$ 
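
If the empty lines are unwanted, one possible way to get grep-like output while still using fold is to filter them out afterwards. This is a sketch only: the name `fold_chars` is hypothetical, and it uses `fold -w 1` (column width, which equals bytes for ASCII input) in place of `-b1`:

```shell
# Hypothetical helper: one character per line via fold, dropping empty
# lines so the result matches what grep -o . prints.
fold_chars() { fold -w 1 | sed '/^$/d'; }

printf 'A\nB\n\nC\n\n\nD\n' | fold_chars
```

The extra `sed` process costs a little of the speed advantage, so this is only worth it when the blank lines actually matter.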
joeytwiddle
5
echo StackOver | sed -e 's/./&\n/g'
S
t
a
c
k
O
v
e
r
mikeserv
henderson
4

You can handle multibyte characters like:

<input \
dd cbs=1 obs=2 conv=unblock |
sed -e:c -e '/^.*$/!N;s/\n//;tc'

This can be pretty handy when you're working with live input, because there's no buffering and a character is printed as soon as it is whole.

cuonglm
mikeserv
  • 1
    NP, should we add a note about the locale? – cuonglm Jan 05 '16 at 09:35
  • Does not work for combining characters like Stéphane Chazelas answer, but with proper normalization this should not matter. – Kijewski Jan 05 '16 at 13:06
  • @Kay - it works for combining characters if you *want* it to - that's what `sed` scripts are for. I'm not likely to write one right now - I'm pretty sleepy. It's really useful, though, when reading a terminal. – mikeserv Jan 05 '16 at 14:30
  • @cuonglm - if you like. it should just work for the locale, given a sane libc, though. – mikeserv Jan 05 '16 at 14:33
  • Note that `dd` will break multibyte characters, so the output will not be text anymore so the behaviour of sed will be unspecified as per POSIX. – Stéphane Chazelas Jan 05 '16 at 22:09
  • @StéphaneChazelas - do you have a link to reference that statement? a NUL can't occur in a multibyte character, and a dot can only match a whole character which is not NUL, and it has worked with every `sed` i've tried. how could it not work? – mikeserv Jan 06 '16 at 02:06
  • oh wait - you mean because input isn't a text file. possibly, but sed is spec'd to handle conditions which exceed/break text file specs, too, such as 4k pattern spaces scripts which is well beyond line max. its also spec'd to evaluate chars bytewise w/ `l` - even when a single char is multiple bytes. i think the text file restriction for sed is probably based on the NUL prohibition - many seds replace `delimiter` in their scripts w/ NULs, and ive never managed to seek past a NUL in pattern space with heirloom sed except with D and G. – mikeserv Jan 06 '16 at 02:50
4

The below will be generic:

$ awk -F '' \
   'BEGIN { RS = ""; OFS = "\n"} {for (i=1;i<=NF;i++) $i = $i; print }' <file_name>
slm
user150073
4

Also Python 2 can be used from the command line:

python <<< "for x in 'StackOver':
   print x"

or:

echo "for x in 'StackOver':
    print x" | python

or (as commented by 1_CR) with Python 3:

python3 -c "print(*'StackOver',sep='\n')"
terdon
agold
4

Since you specifically asked for an answer in bash, here's a way to do it in pure bash:

while read -rn1; do echo "$REPLY" ; done <<< "StackOver"

Note that this will catch the newline at the end of the here string. If you want to avoid that, but still iterate over the characters with a bash loop, use printf to avoid the newline.

printf StackOver | while read -rn1; do echo "$REPLY" ; done
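
Another pure-bash option, sketched here as an alternative and not taken from the answer itself, is to index into the string with substring expansion, which sidesteps the trailing-newline question. The function name `split_loop` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: index into the string with ${var:offset:length}
# (a bash/ksh/zsh feature, not POSIX sh).
split_loop() {
    local s=$1 i
    for (( i = 0; i < ${#s}; i++ )); do
        printf '%s\n' "${s:i:1}"
    done
}

split_loop "StackOver"
```

Since nothing is read from a pipe or here string, whitespace inside the word is printed like any other character.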
Stéphane Chazelas
wyrm
3

You can also use word boundaries:

$ perl -pe 's/(?<=.)(\B|\b)(?=.)/\n/g' <<< "StackOver"
S
t
a
c
k
O
v
e
r
Avinash Raj
3

In bash:

This works with any text and uses only bash internals (no external utility is called), so it should be fast on short strings.

str="StackOvér áàéèëêếe"

[[ $str =~ ${str//?/(.)} ]]           # Use a regex to split.
printf '%s\n' "${BASH_REMATCH[@]:1}"  # Print all characters.

Output:

S
t
a
c
k
O
v
é
r
 
á
à
é
è
ë
ê
ế
e
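
To make the trick less opaque, the sketch below prints the regex that the substitution builds for a short string and then wraps the same idea in a function. The names `split_regex` and `pattern` are hypothetical and only serve the illustration:

```shell
#!/usr/bin/env bash
str="abc"
pattern=${str//?/(.)}        # every character becomes one capture group
printf '%s\n' "$pattern"     # prints (.)(.)(.)

# Hypothetical helper wrapping the same idea for non-empty strings.
# BASH_REMATCH[0] holds the whole match, so the characters start at index 1.
split_regex() {
    local str=$1 pattern
    pattern=${str//?/(.)}
    [[ $str =~ $pattern ]] && printf '%s\n' "${BASH_REMATCH[@]:1}"
}

split_regex "StackOver"
```

The regex grows with the input (one group per character), which is one reason this approach is best kept to short strings.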
1
s=stackover;

$ time echo $s | fold -w1
s
t
a
c
k
o
v
e
r

real    0m0.014s
user    0m0.000s
sys     0m0.004s

Update: here is the hacky, fastest, pure-bash way:

$ time eval eval printf \'%s\\\\n\' \\\${s:\{0..$((${#s}-1))}:1}
s
t
a
c
k
o
v
e
r

real    0m0.001s
user    0m0.000s
sys     0m0.000s

For more awesomeness, wrapped in functions:

function foldh () 
{ 
    if (($#)); then
        local s="$@";
        eval eval printf \'%s\\\\n\' \\\"\\\${s:\{0..$((${#s}-1))}:1}\\\";
    else
        while read s; do
            eval eval printf \'%s\\\\n\' \\\"\\\${s:\{0..$((${#s}-1))}:1}\\\";
        done;
    fi
}
function foldv () 
{ 
    if (($#)); then
        local s="$@";
        eval eval echo \\\"\\\${s:\{0..$((${#s}-1))}:1}\\\";
    else
        while read s; do
            eval eval echo \\\"\\\${s:\{0..$((${#s}-1))}:1}\\\";
        done;
    fi
}
Yunus
1
read -a var <<< $(echo "$yourWordhere" | grep -o "." | tr '\n' ' ')

This will split your word and store it in the array var.
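
Because `read -a` splits on whitespace, space characters in the word would not end up as array elements. A possible variant, sketched here under the assumption of bash 4 or later, reads one element per line with `mapfile`. The variable names `word` and `chars` are hypothetical:

```shell
#!/usr/bin/env bash
word="StackOver"

# mapfile (bash 4+) stores one array element per input line, so each
# character arrives intact, including spaces.
mapfile -t chars < <(printf '%s\n' "$word" | grep -o .)

printf '%s\n' "${#chars[@]}"   # number of elements
printf '%s\n' "${chars[0]}"    # first character
```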

αғsнιη
1
for x in $(echo "$yourWordhere" | grep -o '.')
do
    # Replace the echo with the operation to perform on each character $x of your word.
    echo "$x"
done
phuclv
1

On bash 4.2 and up (I tested 4.2.46 and 5.1), with extglob enabled you can use an empty "zero or one" match:

# shopt -s extglob
# V="StackOverflow"
# echo -e ${V//?()/\\n}
S
t
a
c
k
O
v
e
r
f
l
o
w

It also works to split your string into an array:

# A=( ${V//?()/ } )
# declare -p A

declare -a A='([0]="S" [1]="t" [2]="a" [3]="c" [4]="k" [5]="O" [6]="v" [7]="e" [8]="r" [9]="f" [10]="l" [11]="o" [12]="w")'
0

Using Raku (formerly known as Perl 6):

~$ echo "StackOvér áàéèëêếe" | raku -ne '.chars.put;'
18
~$ echo "StackOvér áàéèëêếe" | raku -ne '.put for .comb;'
S
t
a
c
k
O
v
é
r

á
à
é
è
ë
ê
ế
e

https://raku.org/

jubilatious1