1

I've got a csv mappings.csv shown below. I'd like to extract a block of records into separate files relating to each record that has a non-empty first field. The files are shown after mappings.csv.

$ cat mappings.csv
TEST1,,,a,a,a,a
,,,b,b,b,b
,,,c,c,c,c
TEST2,,,aa,aa,aa,aa
,,,bb,bb,bb,bb
,,,cc,cc,cc,cc
,,,dd,dd,dd,dd
TEST3,,,aaa,aaa,aaa,aaa
,,,bbb,bbb,bbb,bbb

Output files based on mappings.csv is below:

$ cat TEST1.csv
TEST1,,,a,a,a,a
,,,b,b,b,b
,,,c,c,c,c
$ cat TEST2.csv
TEST2,,,aa,aa,aa,aa
,,,bb,bb,bb,bb
,,,cc,cc,cc,cc
,,,dd,dd,dd,dd
$ cat TEST3.csv
TEST3,,,aaa,aaa,aaa,aaa
,,,bbb,bbb,bbb,bbb

I'm able to use awk to print lines that have a non-empty first field, but can't figure out how to expand and print subsequent records up to the next non-empty first field:

$ awk -F',' '$1' mappings.csv
TEST1,,,a,a,a,a
TEST2,,,aa,aa,aa,aa
TEST3,,,aaa,aaa,aaa,aaa

The other aspect of this problem is separating out the results into separate files. One thing I could do is use the matching record numbers to print out lines between the two. Something kind of like this:

$ awk -F',' '$1 {print NR}' mappings.csv
1
4
8
Ed Morton
  • 28,789
  • 5
  • 20
  • 47
EarthIsHome
  • 225
  • 2
  • 8
  • 1
    I'm sure this has been asked + answered before - you can just do something like `awk -F, '$1 != "" {close(f); f = $1 ".csv"} {print > f}' mappings.csv` – steeldriver Aug 02 '21 at 17:02
  • @steeldriver Wow, this works great! What is going in the second action: `{print > f}`? In the first action, a nonempty first field matches, it closes the file and assigns the file name to `f`?. – EarthIsHome Aug 02 '21 at 17:11
  • 1
    OK this seems sufficiently different from previous questions - at any rate, I have added an answer below – steeldriver Aug 02 '21 at 17:34

1 Answers1

4

Although very similar question have been asked and answered before, e.g.

I can't find an exact duplicate where the file names should be taken only from non-empty values of a specified column. So given:

$ cat mappings.csv 
TEST1,,,a,a,a,a
,,,b,b,b,b
,,,c,c,c,c
TEST2,,,aa,aa,aa,aa
,,,bb,bb,bb,bb
,,,cc,cc,cc,cc
,,,dd,dd,dd,dd
TEST3,,,aaa,aaa,aaa,aaa
,,,bbb,bbb,bbb,bbb

then

awk -F, '$1 != "" {close(f); f = $1 ".csv"} {print > f}' mappings.csv

results in

$ head TEST*
==> TEST1.csv <==
TEST1,,,a,a,a,a
,,,b,b,b,b
,,,c,c,c,c

==> TEST2.csv <==
TEST2,,,aa,aa,aa,aa
,,,bb,bb,bb,bb
,,,cc,cc,cc,cc
,,,dd,dd,dd,dd

==> TEST3.csv <==
TEST3,,,aaa,aaa,aaa,aaa
,,,bbb,bbb,bbb,bbb

The first action closes the file named f (if one is open), then constructs a new value for f by concatenating the (non-empty) value of the first field $1 with suffix .csv. The second action prints records to a file whose name is the (current) value of variable f. Note that it will error out if f is empty, which will be the case if there are any lines before the first non-empty value of $1.

Some awk implementations may take care of closing files for you in which case you don't need the explicit close(f).

steeldriver
  • 78,509
  • 12
  • 109
  • 152