I'm currently trying to modify an echoed string of alphanumeric characters by piping it to sed. The part I want to modify is a block of digits, which I would like to make 8 characters long by prepending the appropriate number of zeros. For example:

Input:

loremipsumdolorsit2367amet

Desired output:

loremipsumdolorsit00002367amet

Is it possible to do this with sed? The blocks of digits I'm dealing with are shorter than 8 characters. I'm not sure if this post is useful.

Edit: there is only one block of numbers in each string.

Rui F Ribeiro
sourisse

5 Answers

This assumes that there is some non-numeric text after the number:

echo "loremipsumdolorsit2367amet" \
| sed -r -e 's/[0-9]/0000000&/' -e 's/0*([0-9]{8}[^0-9])/\1/'

Result: loremipsumdolorsit00002367amet

This doesn't assume that:

... | sed -r -e 's/[0-9]/0000000&/' -e 's/0*([0-9]{8}([^0-9]|$))/\1/'

These use ERE (extended regular expression) patterns in sed. Many sed implementations use the -r flag to enable them; some use the -E flag; some accept both. When this was written, the ability to use ERE patterns in sed was not part of the POSIX standard, though there had been talk of adding it; POSIX.1-2024 has since standardized the -E flag. I haven't encountered a modern sed implementation that can't use ERE patterns in one way or another.
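
Because the flag differs between implementations, a script meant to run on several systems can probe for it once. A minimal sketch, assuming a POSIX shell (`sed_ere` is a hypothetical helper name, not a standard command): it checks that `x+` really behaves as an ERE under each flag, rather than only that the flag is accepted, and then runs the end-of-line-safe command from above.

```shell
# Hypothetical helper: define sed_ere as "sed -E" or "sed -r", whichever works here.
# The probe checks ERE behaviour: under ERE, 'x+' matches "xx", so the output is "y".
if [ "$(echo xx | sed -E 's/x+/y/' 2>/dev/null)" = y ]; then
  sed_ere() { sed -E "$@"; }
elif [ "$(echo xx | sed -r 's/x+/y/' 2>/dev/null)" = y ]; then
  sed_ere() { sed -r "$@"; }
else
  echo "this sed supports neither -E nor -r" >&2
  exit 1
fi

# The end-of-line-safe command from above, run with whichever flag was detected.
echo loremipsumdolorsit2367amet |
  sed_ere -e 's/[0-9]/0000000&/' -e 's/0*([0-9]{8}([^0-9]|$))/\1/'
```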

In some cases there are things you can do using ERE patterns that you can't do using the default sed patterns (BREs); see the comments to this answer. In the present case, though, ERE patterns are mostly a convenience. As sch says in a comment, you can omit the -r (or -E) flag and just replace (...) with \(...\) and {8} with \{8\}. That translates the first version directly; the second version uses | alternation, which POSIX BREs lack, so it has to be split into separate s commands (or replaced by the \{8,\} approach from sch's answer). My original answer made use of +, which is also not part of BRE patterns; but in fact it wasn't necessary to the solution.
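
A minimal sketch of the BRE route, assuming any POSIX sed (`pad8_bre` is a hypothetical function name). The first two commands are direct translations of the first version; because POSIX BREs have no `|`, the `([^0-9]|$)` alternation becomes a third `s` command that handles a number at the end of the line.

```shell
# Hypothetical helper: pad the (single) block of digits on each line to 8 digits,
# using only POSIX basic regular expressions, so no -r or -E flag is needed.
pad8_bre() {
  # 1st command: put seven zeros in front of the first digit.
  # 2nd command: keep exactly 8 digits when the number is followed by other text.
  # 3rd command: keep exactly 8 digits when the number ends the line.
  sed -e 's/[0-9]/0000000&/' \
      -e 's/0*\([0-9]\{8\}[^0-9]\)/\1/' \
      -e 's/0*\([0-9]\{8\}\)$/\1/'
}

echo loremipsumdolorsit2367amet | pad8_bre   # loremipsumdolorsit00002367amet
echo loremipsumdolorsit2367 | pad8_bre       # loremipsumdolorsit00002367
```

As with the ERE version, this assumes the number has at most 8 digits, which the question guarantees.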

dubiousjim
  • Hmm, when I try this, I get the error `sed: illegal option -- r` – sourisse Oct 16 '12 at 03:25
  • Different seds use different flags for extended patterns. Some use `-r`, others use `-E`. Some accept both. – dubiousjim Oct 16 '12 at 03:26
  • Ah, great. Switching it to `-E` worked for me. What kinds of things determine which type of sed one has? – sourisse Oct 16 '12 at 03:31
  • GNU sed accepts `-r`, as does BusyBox sed; FreeBSD sed accepts both; it looks like Mac OS X sed accepts only `-E`. – dubiousjim Oct 16 '12 at 03:34
  • Neither -r nor -E is standard. The standard sed only accepts BREs (basic regular expressions). Just replace `+` with `\{1,\}`, `(...)` with `\(...\)`, and `{...}` with `\{...\}`, and it will be standard and you won't have to worry. (<<< is not standard either) – Stéphane Chazelas Oct 16 '12 at 07:08
  • Also, it returns 0000000123456789 for `echo 123456789 | sed -r -e 's/[0-9]+/0000000&/' -e 's/([^0-9]*)0*([0-9]{8}[^0-9]+)/\1\2/' ` – Stéphane Chazelas Oct 16 '12 at 07:13
  • @sch, thanks for prodding me, I do value portability and had just been lazy. I revised the answer. (After doing so, I saw you had submitted your own answer, which I've upvoted.) – dubiousjim Oct 16 '12 at 15:07

With any sed:

sed 's/[0-9]\{1,\}/0000000&/g;s/0*\([0-9]\{8,\}\)/\1/g'
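
To see what the two `g` substitutions do, here is a minimal sketch wrapping the command above in a function (`pad8` is a hypothetical name). The first substitution prefixes every run of digits with seven zeros; the second strips leading zeros from each run only while at least 8 digits remain, so runs that already had 8 or more digits come out unchanged.

```shell
# Hypothetical wrapper around the portable command above.
pad8() {
  # s/[0-9]\{1,\}/0000000&/g : prefix every run of digits with seven zeros
  # s/0*\([0-9]\{8,\}\)/\1/g : drop the extra zeros, keeping at least 8 digits
  sed 's/[0-9]\{1,\}/0000000&/g;s/0*\([0-9]\{8,\}\)/\1/g'
}

printf '%s\n' loremipsumdolorsit2367amet a1b22c333 x123456789y | pad8
# loremipsumdolorsit00002367amet
# a00000001b00000022c00000333
# x123456789y
```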
Stéphane Chazelas
  • Your `0*\([0-9]\{8,\}\)` is another good solution that doesn't require non-numeric text after the numbers. It's more elegant than what I proposed in my edited answer, though my proposal also works, so long as the original numbers never exceed 8 digits, which the question does stipulate. But I do like the strategy you used. – dubiousjim Oct 16 '12 at 15:14
  • Change the `[0-9]`s to `[0-9a-fA-F]` for hex. I just ran into this problem; nice solution, btw. – MarcusJ Apr 16 '18 at 01:24

That's possible with Perl:

perl -e 'my $s = "loremipsumdolorsit2367amet"; $s =~ s/([0-9]+)/sprintf("%08d", $1)/e; print "$s\n"'

Output:

loremipsumdolorsit00002367amet

To process all input lines:

perl -pe 's/([0-9]+)/sprintf("%08d", $1)/e'
Gilles 'SO- stop being evil'
daisy
  • This is interesting, thank you. Is it possible for this command to accept the output of a pipe instead of requiring the string to be entered as a variable? I have a lot of such strings, so entering them all is too time-consuming... – sourisse Oct 16 '12 at 03:27
  • @sourisse Gilles has updated my answer – daisy Oct 16 '12 at 14:27
  • The only thing I got working (with a function call in the replacement). Thanks! – Raphael Jun 03 '13 at 20:09

Also not sed, but here's a slightly convoluted (GNU) awk alternative:

echo loremipsumdolorsit2367amet | awk '{gsub( /[0-9]+/, sprintf( "%08d", gensub(/[^0-9]/, "","g"))); print}' 
loremipsumdolorsit00002367amet
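
`gensub()` is a GNU awk extension. A minimal sketch of an alternative for any POSIX awk (`pad8_awk` is a hypothetical name) uses `match()`, which sets `RSTART` and `RLENGTH` to the position and length of the first run of digits, and rebuilds the line around the padded run:

```shell
# Hypothetical helper: pad the first run of digits on each line to 8 characters
# using only POSIX awk features (match, substr, length), no gensub().
pad8_awk() {
  awk '{
    if (match($0, /[0-9]+/)) {
      num = substr($0, RSTART, RLENGTH)        # the digits themselves
      while (length(num) < 8) num = "0" num    # prepend zeros as a string
      $0 = substr($0, 1, RSTART - 1) num substr($0, RSTART + RLENGTH)
    }
    print
  }'
}

echo loremipsumdolorsit2367amet | pad8_awk   # loremipsumdolorsit00002367amet
```

Building the zeros as a string, rather than with `sprintf("%08d", ...)`, avoids converting the digits to a number, so very long digit runs are passed through unchanged.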
Stéphane Chazelas
tink

Using loops in sed:

echo loremipsumdolorsit2367amet | sed -r -e  :a -e 's/([^0-9])([0-9]{1,7}[^0-9])/\10\2/;ta'
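
A minimal sketch of the same loop in POSIX BRE syntax, so no `-r` flag is needed (`pad8_loop` is a hypothetical name), with each part of the script commented. Like the original, it only pads a number that has a non-digit on both sides.

```shell
# Hypothetical helper: the label/branch loop above, rewritten with POSIX BRE syntax.
pad8_loop() {
  # :a -> define label "a"
  # s  -> insert one 0 after the non-digit preceding a run of 1-7 digits
  #       (\10 is \1 followed by a literal 0; sed only has \1 to \9)
  # ta -> if that substitution succeeded, jump back to label "a"
  # The loop stops once the run has 8 digits, since [0-9]\{1,7\}[^0-9] no longer matches.
  sed -e :a -e 's/\([^0-9]\)\([0-9]\{1,7\}[^0-9]\)/\10\2/;ta'
}

echo loremipsumdolorsit2367amet | pad8_loop   # loremipsumdolorsit00002367amet
```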

Guru