19

Input:

201103 1 /mnt/hdd/PUB/SOMETHING
201102 7 /mnt/hdd/PUB/SOMETH ING
201103 11 /mnt/hdd/PUB/SO METHING
201104 3 /mnt/hdd/PUB/SOMET HING
201106 1 /mnt/hdd/PUB/SOMETHI NG

Desired output:

201103 01 /mnt/hdd/PUB/SOMETHING
201102 07 /mnt/hdd/PUB/SOMETH ING
201103 11 /mnt/hdd/PUB/SO METHING
201104 03 /mnt/hdd/PUB/SOMET HING
201106 01 /mnt/hdd/PUB/SOMETHI NG

How can I add a 0 if there is only a single digit, e.g. 1 in the "day" part? I need this date format: YYYYMM DD.

Gilles 'SO- stop being evil'
LanceBaynes

5 Answers

18

Another solution: awk '{$2 = sprintf("%02d", $2); print}'
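
A minimal sketch of how this behaves on the question's sample lines; the function name `pad_day` is only illustrative. One thing worth knowing: assigning to `$2` makes awk rebuild the line with single spaces between fields, so a path containing runs of several spaces would have them collapsed.

```shell
# pad_day (hypothetical name): zero-pad the second whitespace-separated
# field of every line read from stdin.
pad_day() {
    awk '{ $2 = sprintf("%02d", $2); print }'
}

printf '%s\n' \
    '201103 1 /mnt/hdd/PUB/SOMETHING' \
    '201103 11 /mnt/hdd/PUB/SO METHING' |
    pad_day
# prints:
# 201103 01 /mnt/hdd/PUB/SOMETHING
# 201103 11 /mnt/hdd/PUB/SO METHING
```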

glenn jackman
15
$ sed 's/\<[0-9]\>/0&/' ./infile
201103 01 /mnt/hdd/PUB/SOMETHING
201102 07 /mnt/hdd/PUB/SOMETH ING
201103 11 /mnt/hdd/PUB/SO METHING
201104 03 /mnt/hdd/PUB/SOMET HING
201106 01 /mnt/hdd/PUB/SOMETHI NG
SiegeX
  • Can you explain how this works? This is the first time I'm looking at the `\<[0-9]\>` construct which I think is the one responsible for matching the single digits but not sure what this construct is called. Thanks. – sasuke Mar 12 '11 at 13:16
  • `\<` matches the start of a "word", `[0-9]` matches a single digit from 0 to 9, and `\>` matches the end of a "word". A word here is a run of letters, digits and underscores, so whitespace, punctuation marks, and the start or end of the line all act as delimiters. – Peter.O Mar 12 '11 at 13:46
  • You can also do this without capturing parentheses: `&` in the replacement string stands for the whole matched text, as in `sed 's/\<[0-9]\>/0&/'` – glenn jackman Mar 12 '11 at 14:33
  • Oh, I wasn't aware that `\<\>` is a word boundary in shell regex syntax. Come to think of it, even `sed 's/\b[0-9]\b/0&/'` also works. Thank you both. :) – sasuke Mar 12 '11 at 15:18
  • @sasuke: `\<` and `\>` are a feature of the regex implementation (a GNU extension), not of the shell as such. Depending on the version and options you use, `sed` can work with either *basic* or *extended* regex, and GNU `sed` accepts `\<` and `\>` in both. – Peter.O Mar 12 '11 at 17:19
  • @fred.bear thanks for taking the reins on the explanations – SiegeX Mar 12 '11 at 19:19
  • @glenn good call on `&`. I'm all for saving keystrokes – SiegeX Mar 12 '11 at 19:20
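
To make the word-boundary discussion in these comments concrete, here is a small sketch, assuming GNU sed (BSD/macOS sed does not understand `\<` and `\>`). The function name `pad_single_digit` is made up for illustration. Note the last case: without the `g` flag only the first match on a line is replaced, so when the day already has two digits, a lone digit later in the path gets padded instead.

```shell
# pad_single_digit (hypothetical name): prefix the first stand-alone
# digit on each line with 0, using GNU word-boundary anchors.
pad_single_digit() {
    sed 's/\<[0-9]\>/0&/'
}

printf '%s\n' 'a 1 b' 'a 11 b' 'a,1,b' 'a_1 b' '201103 11 /mnt/a 5 b' |
    pad_single_digit
# prints:
# a 01 b
# a 11 b
# a,01,b
# a_1 b
# 201103 11 /mnt/a 05 b
```

The `a_1 b` line is left alone because the underscore counts as a word character, so `1` is not at the start of a word there.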
2

Here is a (non-sed) way using bash with extended regex.
This method leaves room for more complex processing of individual lines (i.e. more than just regex substitutions).

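while IFS= read -r line ; do
    # Group 1: everything up to and including the space before a lone digit.
    # Group 2: that digit, the space after it, and the rest of the line.
    if [[ "$line" =~ ^(.+\ )([0-9]\ .+)$ ]]
    then echo "${BASH_REMATCH[1]}0${BASH_REMATCH[2]}"
    else echo "$line"
    fi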
done <<EOF
201103 1 /mnt/hdd/PUB/SOMETHING
201102 7 /mnt/hdd/PUB/SOMETH ING
201103 11 /mnt/hdd/PUB/SO METHING
201104 3 /mnt/hdd/PUB/SOMET HING
201106 1 /mnt/hdd/PUB/SOMETHI NG
EOF

output:

201103 01 /mnt/hdd/PUB/SOMETHING
201102 07 /mnt/hdd/PUB/SOMETH ING
201103 11 /mnt/hdd/PUB/SO METHING
201104 03 /mnt/hdd/PUB/SOMET HING
201106 01 /mnt/hdd/PUB/SOMETHI NG
Peter.O
1

I would do something like this:

sed -E 's/ ([0-9]) / 0\1 /' ./input

This matches a lone digit surrounded by spaces, captures just the digit with the group in ' ([0-9]) ', then puts it back with a leading 0 and the surrounding spaces: ' 0\1 '.

The -E option enables extended regular expressions on OS X (so you don't have to use "\" so often); -r does the same thing on the Linux systems I've tested.
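
As a sketch of what the option changes, the two functions below are the same substitution written in extended and in basic syntax; the function names are invented for this example. The third, `pad_anchored`, is an assumed variant that is not part of the answer: it ties the match to six leading digits so a lone digit inside the path is left alone.

```shell
# Extended syntax (-E): groups use bare parentheses.
pad_ere() { sed -E 's/ ([0-9]) / 0\1 /'; }

# Basic syntax (no option): the same group needs backslashes.
pad_bre() { sed 's/ \([0-9]\) / 0\1 /'; }

# Assumed variant: only pad a lone digit that directly follows YYYYMM.
pad_anchored() { sed -E 's/^([0-9]{6}) ([0-9]) /\1 0\2 /'; }

printf '%s\n' '201104 3 /mnt/hdd/PUB/SOMET HING' | pad_ere
printf '%s\n' '201104 3 /mnt/hdd/PUB/SOMET HING' | pad_bre
printf '%s\n' '201103 11 /mnt/a 5 b' | pad_ere
printf '%s\n' '201103 11 /mnt/a 5 b' | pad_anchored
# prints:
# 201104 03 /mnt/hdd/PUB/SOMET HING
# 201104 03 /mnt/hdd/PUB/SOMET HING
# 201103 11 /mnt/a 05 b
# 201103 11 /mnt/a 5 b
```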

Eric
-1
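# -r keeps backslashes in the path literal; quoting "$b" prevents
# word splitting and globbing on the day field.
while read -r a b c
do
new_format=$(printf "%02d" "$b")
echo "$a $new_format $c"
done </tmp/input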
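
One caveat with this loop: `printf "%02d"` parses the field as a number, so a day that already carries a leading zero, such as 08, is rejected as an invalid octal number in bash. A string-based sketch that sidesteps this is shown below; the function and variable names are illustrative, and it assumes the same three-field layout as the question.

```shell
# pad_day_loop (hypothetical name): pad the second field by pattern
# matching on the string instead of converting it to a number.
pad_day_loop() {
    while read -r month day path; do
        case $day in
            [0-9]) day=0$day ;;   # exactly one digit: prepend a zero
        esac
        printf '%s %s %s\n' "$month" "$day" "$path"
    done
}

printf '%s\n' '201102 7 /mnt/hdd/PUB/SOMETH ING' '201103 08 /mnt/x' |
    pad_day_loop
# prints:
# 201102 07 /mnt/hdd/PUB/SOMETH ING
# 201103 08 /mnt/x
```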
jesse_b