3

I have a text file with rows, and columns within those rows. I want to, essentially, replicate the Excel "fill down" function. In other words, if there is a blank "cell" on a line, it will look to the line above it and fill down the value in the corresponding field above it. Example, with "^" being used as column separator:

London^Paris^Moscow^Berlin
^^Melbourne^New York^Washington
^^^Sydney^Singapore^New Delhi
^^New York^Washington
Kuala Lumpur^Bangkok^Hong Kong
^^^^Johannesburg^Sydney^Singapore^New Delhi
Amsterdam
^^Bucharest

... becomes (filled down items in CAPS for clarity):

London^Paris^Moscow^Berlin
LONDON^PARIS^Melbourne^New York^Washington
LONDON^PARIS^MMELBOURNE^Sydney^Singapore^New Delhi
LONDON^PARIS^New York^Washington
Kuala Lumpur^Bangkok^Hong Kong
KUALA LUMPUR^BANGKOK^HONG KONG^^Johannesburg^Sydney^Singapore^New Delhi
Amsterdam
AMSTERDAM^^Bucharest

So essentially the function is to identify any blank field, and, where there is one, pull the value down from the corresponding field above it. Important to be able to specify the field/column separators. Any ideas ?

Gilles 'SO- stop being evil'
  • 807,993
  • 194
  • 1,674
  • 2,175
Simonmdr
  • 141
  • 1
  • 1
  • 6
  • You can do this with Awk; if I have some time later I'll write an answer. You'll probably have more than one by then. ;) – Wildcard Nov 01 '16 at 01:50
  • The 3rd and 8th lines of output confuse me; are they correct? – Jeff Schaller Nov 01 '16 at 02:46
  • Jeff Schaller - yes I believe so. Take the 3rd line: there are THREE empty fields at the beginning of the line (if zero "^"s at beginning of line then no empty fields, if one "^" at beginning of line, then 1 empty field, if 2 at beginning then 2, and if 3 at beginning then 3). So there the three items pulled in from line above are "LONDON^PARIS^MELBOURNE". 8th line: two empty fields at beginning. Pulls in 1st field "Amsterdam" from above, but 2nd field stays empty because there is nothing to pull down. – Simonmdr Nov 01 '16 at 02:50

1 Answers1

2
awk -F'^' -v OFS='^' \
   '{
        for (i = 1; i <= NF; i++) {
                if ($i == "") $i = save[i]
                else          save[i] = $i
        }
        for (i = NF+1; i <= 99; i++) save[i] = ""
        print
    }'

This uses an array called save to save the values from one line to the next (filling down).  Everything in awk is implicitly initialized to null, so the save array will be blank until data are assigned to it.  (Therefore, any blank fields in the first row will be left blank, because there’s no prior data to copy down into them.)

For each line,

  • For each field that exists on this line (for (i = 1; i <= NF; i++)),
    • if it’s blank, fill it with the saved (copied) value for this column,
    • otherwise, save the current value for possible use on subsequent lines.
  • Then erase the save array for all columns that aren’t present on the current line.  For example, when we get to row 4 (which has only four fields), we “forget” that columns 5 and 6 of row 3 contained “Singapore” and “New Delhi”.  And when we get to row 7 (which has only one field), we “forget” that column 2 of row 5 contained “Bangkok”, so it is not available to be entered into row 8 (where column 2 is blank).

    Replace the 99 with the largest number of fields you expect to see in one row (ever).

  • Umm, not sure where the input filename and output filename go here. – Simonmdr Nov 02 '16 at 04:54
  • You can treat the entire, multi-line command the same as you would a simple command like `cat` or `grep`: specify the input filename at the end (after the `}'`).  If you want to redirect the output, then put `> ` *`output_filename`* after that. – G-Man Says 'Reinstate Monica' Nov 02 '16 at 05:47
  • OK so I've put the following in a bat file, and even with the "pause" at the end the DOS Command window flashes up for an instant, and disappears without generating output. Because it flashes up and then away, can't see the error message: cd c:\0temp\ gawk -F'^' -v OFS='^' \ '{ for (i = 1; i <= NF; i++) { if ($i == "") $i = save[i] else save[i] = $i } for (i = NF+1; i <= 99; i++) save[i] = "" print }' input.txt > out.txt pause – Simonmdr Nov 02 '16 at 11:48
  • Tried to edit previous comment and can't, some sort of 5 minute rule (for real ?!). I do not understand the comment editing functionality on this site. Thought two spaces at end of line meant a line break ? I'm about to give up – Simonmdr Nov 02 '16 at 11:56
  • Wait, what?  bat file?  "pause"?  DOS Command window?  Are you doing this on a Unix/Linux (or Unix-like) system, or not? – G-Man Says 'Reinstate Monica' Nov 02 '16 at 20:19
  • My mistake I think. I'm using Cygwin under Windows ... Should have made that clear – Simonmdr Nov 02 '16 at 20:43
  • OK; that shouldn't make a difference. I developed my answer on Cygwin (and I didn't use bat files, "pause", or DOS Command windows). I suggest that you create a shell script. Begin with a line that says `#!/bin/sh`, add the multi-line `awk` command, and add `"$@"` at the end (after the `}'`). – G-Man Says 'Reinstate Monica' Nov 03 '16 at 05:37