Add thousands separator in a number

Question

In Python, I can use

 re.sub(r"(?<=.)(?=(?:...)+$)", ",", stroke )

to split a number into groups of three digits, e.g.:

 echo 123456789 | python -c 'import sys;import re; print re.sub(r"(?<=.)(?=(?:...)+$)", ",",  sys.stdin.read());'
 123,456,789

How can I do the same with bash/awk?

FWIW, Python has had built-in support for formatting numbers with the comma separator since 2010 (versions 2.7 and 3.1), so you don't need to do the regex trick. Example: `print('{:,}'.format(123456789)) #=> 123,456,789` — Mark Reed, Nov 14 '21 at 15:24

score 71 · Answer 1 · answered Feb 06 '14 at 22:40

bash's printf supports pretty much everything you can do in the printf C function

type printf           # => printf is a shell builtin
printf "%'d" 123456   # => 123,456

printf from coreutils will do the same

/usr/bin/printf "%'d" 1234567   # => 1,234,567

answered Feb 06 '14 at 22:40

Mikel


This is now supported in `zsh` too, updated post [here](http://unix.stackexchange.com/a/140320). – don_crissti Dec 13 '15 at 19:55
I'm on bash 4.1.2 and it doesn't support... :( – msb Jan 31 '17 at 19:19
@msb It seems to depend on your system's `vsnprintf`. On a GNU/Linux system, glibc appears to have supported it since at least 1995. – Mikel Feb 01 '17 at 01:50
Note printf uses the thousands [separator for your current locale](https://www.cyberciti.biz/faq/unix-linux-bash-number-formatting-in-with-thousand-separator/), which might be a comma, dot, or nothing at all. You can `export LC_NUMERIC="en_US"` if you want to force commas. – medmunds Mar 27 '17 at 18:31
Get list of supported locale's with `locale -a`. I had to use `en_US.utf8` – eludom May 11 '18 at 09:48
@eludom and beware `LC_ALL`, since `LC_ALL=C` overrides `LC_NUMERIC=en_US.utf8`. – RonJohn Apr 20 '21 at 13:57
Watch out for leading zeroes. They will cause `printf` to treat the number as an octal value (e.g. 01 = 1, 011=9, 0111=73) and return the equivalent decimal. Put another way, this code: `num="0123456"; result=$(printf "%'d" "$num"); echo "$result"` will return a result of 42,798 and not the expected result of 123,456. – MrPotatoHead May 03 '21 at 20:09
Also if you want to pipe the number to `printf` then you can use xargs: `echo 12346789 | xargs printf "%'d"` – Cory Klein Aug 02 '22 at 23:01
doesn't seem to work with Alpine linux/busybox: `sh: %'d: invalid format` – xref Apr 25 '23 at 19:30
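
The leading-zero pitfall noted in the comments above can be sidestepped by normalizing the input before it reaches `printf`. A minimal sketch, assuming a POSIX shell; the function name `strip_leading_zeros` is made up for illustration and is not part of any answer here:

```shell
# Hypothetical helper: drop leading zeros so printf does not parse the
# value as octal. Keeps a single "0" when the input is all zeros.
strip_leading_zeros() {
    n=$1
    # ${n#0} removes one leading "0" if present; loop while one was removed.
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    printf '%s\n' "$n"
}

strip_leading_zeros 0123456   # prints 123456
```

The cleaned value can then be handed to whichever formatter works on your system, for example `printf "%'d\n" "$(strip_leading_zeros 0123456)"` in bash, subject to the locale caveats mentioned above.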

score 47 · Accepted Answer · edited Feb 06 '14 at 22:06

With sed:

$ echo "123456789" | sed 's/\([[:digit:]]\{3\}\)\([[:digit:]]\{3\}\)\([[:digit:]]\{3\}\)/\1,\2,\3/g'
123,456,789

(Note that this only works for exactly 9 digits!)

or this with sed:

$ echo "123456789" | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
123,456,789

With printf:

$ LC_NUMERIC=en_US printf "%'.f\n" 123456789
123,456,789

edited Feb 06 '14 at 22:06

Gilles 'SO- stop being evil'


answered Feb 06 '14 at 07:20

slm


I'm also trying with awk, but it adds a comma at the end: `echo 123456789 | awk '$0=gensub(/(...)/,"\\1,","g")'` – Rahul Patil Feb 06 '14 at 07:56
Now I get it, but it seems complex: `echo 123456789 | awk '$0=gensub(/(...)/,"\\1,","g"){sub(",$",""); print}'` – Rahul Patil Feb 06 '14 at 08:07
That first `sed` only works if the number is exactly 9 digits. The `printf` doesn't work on zsh. Thus the second `sed` answer is probably the best. – phemmer Feb 06 '14 at 13:51
@RahulPatil That only works properly if the number of digits is a multiple of 3. Try with "12345678" and you'll see what I mean. – phemmer Feb 06 '14 at 13:52
@Johan - yeah, just confirming that that doesn't work for me either. – slm Nov 27 '14 at 13:15
You can do `echo 123456789 | awk '{printf ("%'\''d\n", $0)}'` (which evidently doesn't always work on Linux!?, but works fine on AIX and Solaris) – Johan Nov 28 '14 at 09:36
@DepressedDaniel - it looks like you're hitting the byte limit that a float can handle. – slm Feb 23 '17 at 03:46

score 18 · Answer 3 · answered Jan 21 '18 at 16:10

You can use numfmt:

$ numfmt --grouping 123456789
123,456,789

Or:

$ numfmt --g 123456789
123,456,789

Note that numfmt is not a POSIX utility, it is part of GNU coreutils.

answered Jan 21 '18 at 16:10

Zombo


A neat thing about long options is that they can be abbreviated (as long as the abbreviation is unique). For example, `ls --color` can be written `ls --col` (but not `ls --co`, because there is also a `--context` option). It turns out that `--grouping` is `numfmt`’s only option that begins with **`g`**, so `--g` is a unique abbreviation. – G-Man Says 'Reinstate Monica' Feb 27 '20 at 08:37
Just don't use abbreviated forms in scripts, because some day, the one you used might stop working. – Ville Laurikari Sep 13 '20 at 15:20
This does not work for me in coreutils 8.30. (I wonder if `LC_ALL=C` overrides `LC_NUMERIC=en_US.utf8`?) – RonJohn Apr 20 '21 at 07:50
And that was the problem... – RonJohn Apr 20 '21 at 07:51

drl · Answer 4 · 2018-06-03T10:48:46.947

7

cat <<'EOF' |
13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084096
EOF
perl -wpe '1 while s/(\d+)(\d\d\d)/$1,$2/;'

produces:

13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096

This works by splitting the string of digits into two groups: the right-hand group with three digits and the left-hand group with whatever remains (at least one digit). The match is then replaced by the two groups, separated by a comma. This repeats until the substitution fails. The options "-wpe" enable warnings (-w), wrap the statement in a loop with an automatic print (-p), and take the next argument as the perl "program" (-e); see the command perldoc perlrun for details.
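
The same right-to-left peeling can be sketched without Perl, using only POSIX parameter expansion, so it also copes with arbitrarily long digit strings. This is an illustration rather than the method used above; the function name `group_digits` is hypothetical, and it assumes the argument is a bare string of digits:

```shell
# Hypothetical helper: repeatedly move the last three digits, prefixed
# with a comma, from the ungrouped part to the front of the result.
group_digits() {
    rest=$1     # digits not yet grouped
    out=        # grouped tail built so far, e.g. ",567"
    while [ "${#rest}" -gt 3 ]; do
        front=${rest%???}          # everything except the last three characters
        last3=${rest#"$front"}     # the last three characters
        out=",$last3$out"
        rest=$front
    done
    printf '%s\n' "$rest$out"
}

group_digits 1234567   # prints 1,234,567
```

Because the digits are handled purely as text, no numeric conversion (and so no integer or float limit) is involved.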

Best wishes ... cheers, drl

edited Jun 03 '18 at 10:48

answered Oct 16 '17 at 20:16

drl


Thanks to anonymous for the feedback. Even a downvote can be useful, but only if explained -- please comment on what you saw that was wrong. Thanks ... cheers – drl May 21 '18 at 15:21
I think the downvote here is because you did not explain what the command does. The OP asked for a `BASH`/`AWK` alternative so he may not have used `PERL` before. In any case, best to explain what the command does - especially so for one-liners. – AnthonyK Jun 03 '18 at 02:42
@AnthonyK -- thanks for probable explanation. I added comments to briefly explain how it works. I think alternative solutions are often useful, but your point about possibly not having used perl is noted ... cheers – drl Jun 03 '18 at 10:53
I tried the sed and python suggestions on this page. The perl script was the only one that worked for a whole file. The file was filled with text and numbers. – Mark Aug 24 '18 at 14:23

score 5 · Answer 5 · edited Apr 03 '19 at 13:29

awk and bash have good built-in solutions, based on printf, as described in the other answers. But first, sed.

For sed, we need to do it "manually". The general rule is that if you have four consecutive digits, followed by a non-digit (or end-of-line) then a comma should be inserted between the first and second digit.

For example,

echo 12345678 | sed -re 's/([0-9])([0-9]{3})($|[^0-9])/\1,\2\3/'

will print

12345,678

We obviously need to then keep repeating the process, in order to keep adding enough commas.

sed -re ' :restart ; s/([0-9])([0-9]{3})($|[^0-9])/\1,\2\3/ ; t restart '

In sed, the t command specifies a label that will be jumped to if the last s/// command was successful. I therefore define a label with :restart, in order that it jumps back.

Here is a bash demo (on ideone) that works with any number of digits:

function thousands {
    sed -re ' :restart ; s/([0-9])([0-9]{3})($|[^0-9])/\1,\2\3/ ; t restart '
}                                                 
echo 12 | thousands
echo 1234 | thousands
echo 123456 | thousands
echo 1234567 | thousands
echo 123456789 | thousands
echo 1234567890 | thousands
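
The function above appears to rely on GNU sed behaviour (the `-r` option and further commands on the same line as a label). A more portable sketch, assuming only POSIX basic regular expressions and one command per `-e`; the name `thousands_portable` is hypothetical, and, like the original, it treats any run of four or more digits as a number, including digits after a decimal point:

```shell
# Hypothetical portable variant of the loop: BRE has no alternation,
# so the "end of line" and "followed by a non-digit" cases are split
# into two substitutions.
thousands_portable() {
    sed -e ':restart' \
        -e 's/\([0-9]\)\([0-9]\{3\}\)$/\1,\2/' \
        -e 's/\([0-9]\)\([0-9]\{3\}\)\([^0-9]\)/\1,\2\3/' \
        -e 't restart'
}

echo 1234567890 | thousands_portable   # prints 1,234,567,890
```

The `t restart` branch is taken if either substitution succeeded during the current pass, so the loop ends once no run of four digits remains.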

This is like an add-on for bash lovers. I was struggling with the locale stuff and thankfully found this gem. Very practical: just paste this one line of code somewhere in the shell and use it whenever needed. By the way, it's hard to understand what the code does; the only thing I know is that I can replace the "," with "." – CuriousNewbie, Mar 16 '22 at 16:17

score 3 · Answer 6 · edited Apr 03 '19 at 13:53

With some awk implementations:

echo "123456789" | awk '{ printf("%'"'"'d\n",$1); }'  

123,456,789

"%'"'"'d\n" is: "%(single quote)(double quote)(single quote)(double quote)(single quote)d\n"

That will use the configured thousand separator for your locale (typically , in English locales, space in French, . in Spanish/German...). Same as returned by locale thousands_sep

edited Apr 03 '19 at 13:53

Stéphane Chazelas


answered Oct 16 '16 at 17:02

Ben


A maybe cleaner way is: `awk '{printf("%\047d\n",$1); }'`; awk will translate the octal 047 into a single quote before interpreting the printf string, sparing you this weird string of quotes and double quotes. You can force a specific locale (*if* it is installed...): `LC_NUMERIC=en_US awk ....` (or fr_FR if you want spaces, and if it is installed) – Olivier Dulac Sep 05 '20 at 15:45

Anthony Geoghegan · Answer 7 · 2019-04-03T10:37:40.483

A common use case for me is to modify the output of a command pipeline so that decimal numbers are printed with thousand separators. Rather than writing a function or script, I prefer to use a technique that I can customise on the fly for any output from a Unix pipeline.

I have found printf (provided by Awk) to be the most flexible and memorable way to accomplish this. The apostrophe/single quote character is specified by POSIX as a modifier to format decimal numbers and has the advantage that it’s locale-aware so it’s not restricted to using comma characters.

When running Awk commands from a Unix shell, there can be difficulties entering a single-quote character inside a string delimited by single-quotes (to avoid shell expansion of positional variables, e.g., $1). In this case, I find the most readable and reliable way to enter the single-quote character is to enter it as an octal escape sequence (beginning with \0).

Example:

printf "first 1000\nsecond 10000000\n" |
  awk '{printf "%9s: %11\047d\n", $1, $2}'

  first:       1,000
 second:  10,000,000

Simulated output of a pipeline showing which directories are using the most disk space:

printf "7654321 /home/export\n110384 /home/incoming\n" |
  awk '{printf "%22s: %9\047d\n", $2, $1}'

  /home/export: 7,654,321
/home/incoming:   110,384
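
Where the awk in use does not honor the `'` flag (comments on other answers report this for some Linux setups), the grouping can be done by hand inside awk. A sketch under that assumption; `report_sizes` and `group` are names invented here, the separator is hard-coded to a comma rather than taken from the locale, and the first field is assumed to be a non-negative integer:

```shell
# Hypothetical pipeline filter: prints "<path>: <size with commas>".
report_sizes() {
    awk '
    # Build the grouped string right to left, three digits at a time.
    function group(n,    out) {
        n = n ""                      # force string context
        out = ""
        while (length(n) > 3) {
            out = "," substr(n, length(n) - 2) out
            n = substr(n, 1, length(n) - 3)
        }
        return n out
    }
    { printf "%14s: %9s\n", $2, group($1) }'
}

printf '7654321 /home/export\n110384 /home/incoming\n' | report_sizes
```

The field widths are passed as `%s` widths because the grouped value is a string by the time it reaches `printf`.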

Other solutions are listed in How to escape a single quote inside awk.

Note: as warned against in Print a Single Quote, it’s recommended to avoid the use of hexadecimal escape sequences as they do not work reliably across different systems.

Of all the awk-based answers listed on here, this one is most certainly the most graceful (IMHO). One doesn't need to hack in a quote with other quotes like in other solutions. — TSJNachos117, Apr 03 '19 at 06:57
Thanks @TSJNachos117 The hardest part is remembering that the octal encoding for the apostrophe character is `\047`. — Anthony Geoghegan, Apr 24 '19 at 12:24

Stéphane Chazelas · Answer 8 · 2019-04-03T14:12:10.977

2

A bash/awk (as requested) solution that works regardless of the length of the number, uses , regardless of the locale's thousands_sep setting, handles numbers wherever they appear in the input, and avoids adding the thousands separator after the decimal point, as in 1.12345:

echo not number 123456789012345678901234567890 1234.56789 |
  awk '{while (match($0, /(^|[^.0123456789])[0123456789]{4,}/))
        $0 = substr($0, 1, RSTART+RLENGTH-4) "," substr($0, RSTART+RLENGTH-3)
        print}'

Gives:

not number 123,456,789,012,345,678,901,234,567,890 1,234.56789

With awk implementations like mawk that don't support the interval regex operators, change the regexp to /(^|[^.0123456789])[0123456789][0123456789][0123456789][0123456789]+/

edited Apr 03 '19 at 14:12

answered Apr 03 '19 at 13:59

Stéphane Chazelas


This is great for using with any shell output, thanks. – kilves76 Dec 24 '19 at 06:54
Wow, finally found a correct thousands separator with `awk` in the shell; I tried many other ways, but they somehow behaved differently with some numbers. Thanks, sir. – Saeed Aug 26 '23 at 20:14

score 1 · Answer 9 · answered Jan 09 '17 at 18:15

$ echo 1232323 | awk '{printf(fmt,$1)}' fmt="%'6.3f\n"
12,32,323.000

answered Jan 09 '17 at 18:15

Akshay Hegde


Michael Benedict · Answer 10 · 2017-10-16T19:37:16.710

For BIG numbers, I was unable to make the above solutions work. For example, let's get a really big number:

$ echo 2^512 |bc -l|tr -d -c [0-9]
13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084096

Note I need the tr to remove backslash newline output from bc. This number is too big to treat as a float or fixed bit number in awk, and I don't even want to build a regexp large enough to account for all the digits in sed. Rather, I can reverse it and put commas between groups of three digits, then unreverse it:

echo 2^512 |bc -l|tr -d -c [0-9] |rev |sed -e 's/\([0-9][0-9][0-9]\)/\1,/g' |rev
13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096

Good answer. However, I've never encountered a problem using large numbers with Awk. I tried your example on a number of Red Hat and Debian-based distributions but in all cases, Awk had no problem with the large number. I thought some more about it and it occurred to me that all the systems I had experimented on were 64-bit (even a very old VM running unsupported RHEL 5). It wasn’t until I tested an old lap-top running a 32-bit OS that I was able to replicate your issue: `awk: run time error: improper conversion(number 1) in printf("%'d`. — Anthony Geoghegan, Jan 11 '19 at 21:49

score 1 · Answer 11 · edited Jun 03 '18 at 08:08

a="13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084096"

echo "$a" | rev | sed "s#[[:digit:]]\{3\}#&,#g" | rev

13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096

edited Jun 03 '18 at 08:08

Stéphane Chazelas


answered Jun 03 '18 at 06:53

user2796674


That adds a spurious leading comma if the number of digits in the number is a multiple of 3. – Stéphane Chazelas Jun 03 '18 at 08:09
@StéphaneChazelas: You could take the output of that last rev command, and pipe it to `sed 's/^,//g'`. – TSJNachos117 Apr 03 '19 at 06:51
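
Following up on the two comments above, the spurious comma can also be removed while the string is still reversed, where it sits at the end of the line. A sketch, assuming `rev` is available (it is not a POSIX utility) and the input is a bare string of digits; `group_rev` is a name chosen here for illustration:

```shell
# Hypothetical helper: reverse, add a comma after every three digits,
# drop the comma left dangling at the end, then reverse back.
group_rev() {
    printf '%s\n' "$1" |
        rev |
        sed -e 's/[0-9]\{3\}/&,/g' -e 's/,$//' |
        rev
}

group_rev 123456   # prints 123,456
```

Stripping the comma before the final `rev` gives the same result as removing a leading comma afterwards.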

erik · Answer 12 · 2021-06-24T11:30:50.040

I also wanted to have the part after the decimal separator correctly separated/spaced, therefore I wrote this sed-script which uses some shell variables to adjust to regional and personal preferences. It also takes into account different conventions for the number of digits grouped together:

#DECIMALSEP='\.' # usa                                                                                                               
DECIMALSEP=','   # europe
    
#THOUSSEP=','  # usa
#THOUSSEP='\.' # europe
#THOUSSEP='_'  # underscore
#THOUSSEP=' '  # space
THOUSSEP=' '   # thinspace
    
# group before decimal separator
#GROUPBEFDS=4   # china
GROUPBEFDS=3    # europe and usa
    
# group after decimal separator
#GROUPAFTDS=5   # used by many publications 
GROUPAFTDS=3
    
    
function digitgrouping {
# FIXME: This is a workaround: BEGINNING has to be marked (and after                                                                
# alteration removed) for the first number to be spaced correctly (1234
# should be 1 234, and that only works if something is in front of that
# number).
sed -e 's%^%BEGINNING&%' \
  -e '
  s%\([0-9'"$DECIMALSEP"']\+\)'"$THOUSSEP"'%\1__HIDETHOUSSEP__%g
  :restartA ; s%\([0-9]\)\([0-9]\{'"$GROUPBEFDS"'\}\)\(['"$DECIMALSEP$THOUSSEP"']\)%\1'"$THOUSSEP"'\2\3% ; t restartA
  :restartB ; s%\('"$DECIMALSEP"'\([0-9]\{'"$GROUPAFTDS"'\}\'"$THOUSSEP"'\)*\)\([0-9]\{'"$GROUPAFTDS"'\}\)\([0-9]\)%\1\3'"$THOUSSEP"'\4% ; t restartB
  :restartC ; s%\([^'"$DECIMALSEP"'][0-9]\+\)\([0-9]\{'"$GROUPBEFDS"'\}\)\($\|[^0-9]\)%\1'"$THOUSSEP"'\2\3% ; t restartC
  s%__HIDETHOUSSEP__%\'"$THOUSSEP"'%g' \
  -e 's%^BEGINNING%%'

}

This will fail in the UK. Why not just use the Locale? There are no separators after the decimal point. — Jeremy Boden, Jun 23 '21 at 18:49
The thousand separator is different in Spain (`.`), France (space) and Ireland (`,`) which are all part of the EU. — Stéphane Chazelas, Jun 24 '21 at 11:49
See also `locale thousands_sep` and `locale -k LC_NUMERIC` (on GNU systems at least) — Stéphane Chazelas, Jun 24 '21 at 11:50
Note that it assumes GNU `sed`. POSIXly and with several `sed` implementations, you can't have other commands after branching ones (`:`, `t`, `b`...). `\+` is also a GNU extension. — Stéphane Chazelas, Jun 24 '21 at 11:54
Beware `[0-9\.]` also matches on backslash (and potentially many characters or collating elements beside 0123456789 that sort between 0 and 9) — Stéphane Chazelas, Jun 24 '21 at 11:55

score 0 · Answer 13 · edited Jun 23 '21 at 17:18

The following uses a space as the thousands separator, which is the practice where I live. Modifying it to use a comma should be easy.

echo "1000066955"|sed -rn "s/([[:digit:]])([[:digit:]]{3})$/\1 \2/;T end;:loop s/([[:digit:]])([[:digit:]]{3})[[:space:]]/\1 \2 /;t loop;:end p;"

edited Jun 23 '21 at 17:18

Jeff Schaller


answered Jun 23 '21 at 15:51

P V Mathew


Stock Exchange · Answer 14 · 2021-08-18T06:44:21.217

= Number grouping formatting using Perl RegEx =

[
|*| Source: https://unix.stackexchange.com/a/656655
|*| Last update: CE 2021-08-18 06:44 UTC ]


Number grouping formatting (e.g. turning "1000000" into "1,000,000"; approximation of `numfmt --grouping`) using Perl RegEx:
(Unix Shell)
[
    PERLIO=':raw:utf8' exec '/usr/bin/perl' -p \
    -e 'BEGIN { $^H |= 0x02800000; $^H{reflags_charset} = 4; $/ = undef(); }' \
    -e '

    sub f {
    $x1 = $1;
    $x2 = $2;
#
# [
    if ( length( $x1 ) > 3 ) {
    pos( $x1 ) = length( $x1 ) % 3;
    $x1 =~ s/\G.{3}/ ( pos( $x1 ) != 0 ? "," : "" ).${&}; /gse;
    };
# ]
#
# Would work but inefficient:
# [
#   $x1 =~ s/(?<=\d)(?=(\d+))/ ( length( $1 ) % 3 != 0 ? "" : "," ); /ge;
# ]
# ,
# [
#   $x1 =~ s/(?<=\d)(?=(?:\d{3})+(?!\d))/,/g;
# ]
#
    "${x1}${x2}";
    };

    s/(?<![\w#&)*,.\/:;=-\@\[-\]`{-}])([0-9]+)(\.[0-9]+)?(?![\w#\$&(*\-\/<=\@\[-\]`{-}]|\.[^\W0-9])/ f(); /geu;

    ' \
    "$@";
]
[ Explanation Needed ]


Test case:
(Console Log (Unix) )
[
> \
    { nf <<\EOF
0.000000
10.000000
100.000000
1000.000000
10000.000000
100000.000000
1000000.000000
10000000.000000
100000000.000000
1000000000.000000
10000000000.000000
100000000000.000000
1000000000000.000000
EOF
    } | nf; # Verified idempotence.

0.000000
10.000000
100.000000
1,000.000000
10,000.000000
100,000.000000
1,000,000.000000
10,000,000.000000
100,000,000.000000
1,000,000,000.000000
10,000,000,000.000000
100,000,000,000.000000
1,000,000,000,000.000000
]
[ Alternatively: Try the full text of this message. ]




See also:
|*| "perlrun" - how to execute the Perl interpreter # "-i''[extension]''": https://perldoc.perl.org/perlrun#-i%5Bextension%5D

Welcome to the site, and thank you for your contribution. Please use the proper formatting tools for your post, as currently it is very difficult to separate explanations from actual code. — AdminBee, Jul 02 '21 at 07:34
[@AdminBee](https://unix.stackexchange.com/users/377345): https://pastebin.com/wDmNQJpD — Stock Exchange, Jul 02 '21 at 13:06
