
I would like to split a string into substrings, separated by some separator (which is also a string itself).

How can I do that

  • using bash only? (for minimalism, and my main interest)

  • or, if allowing some text-processing program? (for convenience when the program is available)

Thanks.

Simple examples:

  • split 1--123--23 by -- into 1, 123 and 23.
  • split 1?*123 by ?* into 1 and 123
Tim
  • Could you show an example of your input and expected output? – choroba Jul 14 '17 at 20:30
  • With a fixed number of substrings, or variable? – Jeff Schaller Jul 14 '17 at 20:32
  • No. The number of substrings in the result depends on the original string and the separator. @JeffSchaller – Tim Jul 14 '17 at 20:33
  • It wasn't a true/false question; I was asking what type of input you're asking about. – Jeff Schaller Jul 14 '17 at 20:35
  • You're better off using the external programs (`sed`, `awk`) that are *precisely* designed for such text processing. But that's for practicality and getting things done. If this is a question of academic interest as I suspect it is, of course, that wouldn't apply. – Wildcard Jul 14 '17 at 23:24
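Since sed is mentioned above, here is a minimal sketch of the sed route (not taken from any of the answers): every separator is replaced with a newline. The backslash followed by a real newline in the replacement is the portable POSIX form (GNU sed also accepts `\n`). sed patterns are basic regular expressions, so regex-special characters in the separator have to be escaped by hand, as the `?*` example shows:

```shell
#!/bin/sh

# Replace every "--" with a newline.
printf '%s\n' '1--123--23' | sed 's/--/\
/g'

# In a basic regular expression "?" is an ordinary character but "*" is
# a repetition operator, so the separator "?*" is written as "?\*".
printf '%s\n' '1?*123' | sed 's/?\*/\
/g'
```

The price of the convenience is that the separator is a regex, not a literal string.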

6 Answers


Pure bash solution, using IFS and read. Note that the strings shouldn't contain $'\2' (or whatever else you use for IFS). Unfortunately $'\0' doesn't work, since bash strings cannot hold NUL bytes, but e.g. $'\666' does:

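#!/bin/bash

split_by () {
    local string=$1
    local separator=$2
    local tmp substr
    local -a arr

    # Replace every (possibly multi-character) separator with the single
    # character $'\2', because IFS splits on individual characters.
    tmp=${string//"$separator"/$'\2'}
    IFS=$'\2' read -r -a arr <<< "$tmp"
    for substr in "${arr[@]}" ; do
        echo "<$substr>"
    done
    echo
}


split_by '1--123--23' '--'
split_by '1?*123' '?*'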
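A variant of the same idea (a sketch, not part of the original answer; the name split_mapfile is made up): replacing each separator with a newline and reading the result with mapfile (bash 4 or later) removes the need for a reserved character such as $'\2', as long as the string itself contains no newlines. Unlike read -a with a non-whitespace IFS, which drops a trailing empty piece, it keeps every empty piece:

```shell
#!/bin/bash

# split_mapfile SEPARATOR STRING  (hypothetical helper name)
# Prints each piece of STRING as <piece>, one per line.
split_mapfile () {
    local separator=$1 string=$2 tmp
    local -a arr

    # Each (possibly multi-character) separator becomes a newline...
    tmp=${string//"$separator"/$'\n'}
    # ...and mapfile -t stores one line per array element.
    mapfile -t arr <<< "$tmp"
    printf '<%s>\n' "${arr[@]}"
}

split_mapfile '--' '1--123--23'
split_mapfile '?*' '1?*123'
split_mapfile '--' 'a----b--'   # empty pieces, including the last, are kept
```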

Or use Perl:

perl -E 'say for split quotemeta shift, shift' -- "$separator" "$string"
choroba
  • +1. Personally, I'd have the separator as the first arg, same as Perl. That also allows looping over all remaining args and splitting them too: `separator="$1" ; shift ; for string in "$@" ; do ... ; done` – cas Jul 17 '17 at 03:23
  • FYI, I wrote a join function for bash because I missed it from Perl. https://unix.stackexchange.com/a/299362/7696 – cas Jul 17 '17 at 03:26
  • Thanks. Is the purpose of `tmp=${string//"$separator"/$'\2'}` to replace a separator `$separator`, which might be more than one character long, with a separator `\2` that is just one character long? If yes, why isn't it written as `tmp=${string/"$separator"/'\2'}` instead? What do `//` and `/$` mean? – Tim Aug 01 '17 at 13:30
  • @Tim: exactly. That's how IFS works. – choroba Aug 01 '17 at 13:34
  • @Tim: `'\2'` are 2 characters, `$'\2'` is one. `//` is global replace, see Parameter Expansion in `man bash`. – choroba Aug 01 '17 at 13:35
  • Thanks. What does `$` before`'\2'` mean? – Tim Aug 01 '17 at 13:41
  • @Tim: It interprets `\2` as the character with ASCII code 2. See "Quoting" in `man bash`. – choroba Aug 01 '17 at 14:11
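A small sketch illustrating the two points from the comments above (the variable names are arbitrary): ${var/pat/rep} replaces only the first match while ${var//pat/rep} replaces every match, and '\2' is a two-character string while $'\2' is a single character:

```shell
#!/bin/bash

s='1--123--23'

# One slash: replace the first match only. Two slashes: replace all.
printf '%s\n' "${s/--/:}"     # 1:123--23
printf '%s\n' "${s//--/:}"    # 1:123:23

# '\2' is a backslash followed by "2"; $'\2' is the byte with octal value 2.
a='\2'
b=$'\2'
printf '%s %s\n' "${#a}" "${#b}"   # 2 1
```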

Pure POSIX shell:

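string="1--123--23"
del="--"

# Quoting "$del" inside the patterns makes it match literally.
while test "${string#*"$del"}" != "$string" ; do
  printf '%s\n' "${string%%"$del"*}"   # piece before the first delimiter
  string="${string#*"$del"}"           # drop that piece and the delimiter
done
printf '%s\n' "$string"                 # last piece

Because $del is quoted inside the patterns, * or ? in the delimiter are matched literally. With an unquoted $del they would act as wildcards and would have to be escaped (del='\*').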
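POSIX sh has no arrays, so if the pieces are needed later rather than just printed, one option is to collect them in the positional parameters. A sketch of that under the same loop, using the question's ?* delimiter (here $del is quoted inside the patterns, which makes it match literally, so no escaping is needed):

```shell
#!/bin/sh

string='1?*123'
del='?*'

# Collect the pieces in "$@" instead of printing them.
set --                                   # start with an empty list
while [ "${string#*"$del"}" != "$string" ]; do
  set -- "$@" "${string%%"$del"*}"       # append the piece before the delimiter
  string=${string#*"$del"}               # drop that piece and the delimiter
done
set -- "$@" "$string"                    # append the last piece

printf '%s pieces\n' "$#"
printf '<%s>\n' "$@"
```

Note that this overwrites the script's (or function's) own positional parameters.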

Philippos

Simply with awk:

str="1--123--23"
awk -F'--' '{ for(i=1;i<=NF;i++) print $i }' <<< "$str"

The output:

1
123
23
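One caveat: when -F is longer than one character, awk treats it as an extended regular expression, so the question's second example (?*) would need escaping, e.g. -F'[?][*]'. A sketch that avoids regexes altogether (split_awk and the SEP variable are illustrative names): the separator is passed through the environment and read via ENVIRON, which, unlike -v, does not process backslash escapes, and index()/substr() then split on it as a plain string:

```shell
#!/bin/sh

# split_awk SEPARATOR STRING  (hypothetical helper name)
split_awk () {
  printf '%s\n' "$2" | SEP=$1 awk '
    {
      sep = ENVIRON["SEP"]
      s = $0
      if (sep != "") {
        while ((i = index(s, sep)) > 0) {
          print substr(s, 1, i - 1)        # piece before the separator
          s = substr(s, i + length(sep))   # text after the separator
        }
      }
      print s                              # last piece
    }'
}

split_awk '--' '1--123--23'
split_awk '?*' '1?*123'
```

Each input line is split separately, so a STRING containing newlines is handled line by line.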

Another short Python solution:

splitter.py script:

import sys
print('\n'.join(sys.argv[2].split(sys.argv[1])))

Argument order:

  • sys.argv[0] - script name (i.e. splitter.py)

  • sys.argv[1] - substring separator

  • sys.argv[2] - input string

Usage:

python splitter.py "?*" "1?*123"
1
123

python splitter.py "--" "1--23--123"
1
23
123
RomanPerekhrest
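#! /bin/bash
#  (GPL3+) Alberto Salvia Novella (es20490446e)

# Usage: script STRING SEPARATOR POSITION
# Prints the piece at the zero-based POSITION.
substring () {
    local string=${1}
    local separator=${2}
    local position=${3}
    local tmp
    local -a pieces

    # Turn the (possibly multi-character) separator into $'\2' for IFS.
    tmp=${string//"${separator}"/$'\2'}
    IFS=$'\2' read -r -a pieces <<< "${tmp}"
    printf '%s\n' "${pieces[${position}]}"
}


substring "$@"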

Similar to the above, but say you just want to get the URI, for example:

URL="http://something.com/backup/v/photos/path/to/"
URI="./$(echo "$URL" | awk -F'[.]com/' '{print $2}')"
echo "$URI"
Mike Q
  • There already is an `awk` solution that even better addresses the problem. – Philippos Sep 20 '17 at 14:33
  • These other ones didn't really work for me, so I just wanted to share something I was trying to do. Is this not useful? – Mike Q Sep 20 '17 at 17:54
  • The question was to cut a string into an unknown number of parts. Yours only extracts the last piece of the string and could be replaced by a simple `URI="${URL#*.com}"` – Philippos Sep 21 '17 at 05:51
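Along the lines of the last comment, a sketch that produces the same ./backup/v/photos/path/to/ result as the awk command using parameter expansion alone. In a shell pattern "." is an ordinary character (only *, ? and [ are special), whereas in awk's -F'.com/' it is a regex wildcard. If ".com/" does not occur in the URL, the expansion leaves the URL unchanged:

```shell
#!/bin/sh

URL="http://something.com/backup/v/photos/path/to/"

# Strip the shortest prefix that ends in ".com/", then prepend "./".
URI="./${URL#*.com/}"
printf '%s\n' "$URI"    # ./backup/v/photos/path/to/
```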

Using Raku (formerly known as Perl 6)

Briefly, split removes the separator, which can be defined using a regex pattern:

~$ echo "string1_sep_string2" | raku -ne '.split(/ _sep_ /).put;'

Output:

string1 string2

You can also join after split-ting:

~$ echo "string1_sep_string2" | raku -ne '.split(/ _sep_ /).join(",").put;'

Output:

string1,string2

To look at the resulting elements one per line, iterate over the output with for. Calling .raku (or the older .perl) on each element before output prints it double-quoted, showing that Raku treats the elements as strings:

~$ echo "string1_sep_string2" | raku -ne '.raku.put for .split(/ _sep_ /);'

Output:

"string1"
"string2"

Note: you'll occasionally run into issues where empty strings remain after split-ting. These can be removed by adding :skip-empty to the split call, as in .split(/ _sep_ /, :skip-empty). See the documentation linked below.

https://docs.raku.org/routine/split
https://raku.org

jubilatious1