Extract lines into separate files named based on the string in each non-empty first column

Question

I've got a csv mappings.csv shown below. I'd like to extract a block of records into separate files relating to each record that has a non-empty first field. The files are shown after mappings.csv.

$ cat mappings.csv
TEST1,,,a,a,a,a
,,,b,b,b,b
,,,c,c,c,c
TEST2,,,aa,aa,aa,aa
,,,bb,bb,bb,bb
,,,cc,cc,cc,cc
,,,dd,dd,dd,dd
TEST3,,,aaa,aaa,aaa,aaa
,,,bbb,bbb,bbb,bbb

Output files based on mappings.csv is below:

$ cat TEST1.csv
TEST1,,,a,a,a,a
,,,b,b,b,b
,,,c,c,c,c

$ cat TEST2.csv
TEST2,,,aa,aa,aa,aa
,,,bb,bb,bb,bb
,,,cc,cc,cc,cc
,,,dd,dd,dd,dd

$ cat TEST3.csv
TEST3,,,aaa,aaa,aaa,aaa
,,,bbb,bbb,bbb,bbb

I'm able to use awk to print lines that have a non-empty first field, but can't figure out how to expand and print subsequent records up to the next non-empty first field:

$ awk -F',' '$1' mappings.csv
TEST1,,,a,a,a,a
TEST2,,,aa,aa,aa,aa
TEST3,,,aaa,aaa,aaa,aaa

The other aspect of this problem is separating out the results into separate files. One thing I could do is use the matching record numbers to print out lines between the two. Something kind of like this:

$ awk -F',' '$1 {print NR}' mappings.csv
1
4
8

I'm sure this has been asked + answered before - you can just do something like `awk -F, '$1 != "" {close(f); f = $1 ".csv"} {print > f}' mappings.csv` — steeldriver, Aug 02 '21 at 17:02
@steeldriver Wow, this works great! What is going in the second action: `{print > f}`? In the first action, a nonempty first field matches, it closes the file and assigns the file name to `f`?. — EarthIsHome, Aug 02 '21 at 17:11
OK this seems sufficiently different from previous questions - at any rate, I have added an answer below — steeldriver, Aug 02 '21 at 17:34

steeldriver · Accepted Answer · 2021-08-02T20:06:03.180

Although very similar question have been asked and answered before, e.g.

I can't find an exact duplicate where the file names should be taken only from non-empty values of a specified column. So given:

$ cat mappings.csv 
TEST1,,,a,a,a,a
,,,b,b,b,b
,,,c,c,c,c
TEST2,,,aa,aa,aa,aa
,,,bb,bb,bb,bb
,,,cc,cc,cc,cc
,,,dd,dd,dd,dd
TEST3,,,aaa,aaa,aaa,aaa
,,,bbb,bbb,bbb,bbb

then

awk -F, '$1 != "" {close(f); f = $1 ".csv"} {print > f}' mappings.csv

results in

$ head TEST*
==> TEST1.csv <==
TEST1,,,a,a,a,a
,,,b,b,b,b
,,,c,c,c,c

==> TEST2.csv <==
TEST2,,,aa,aa,aa,aa
,,,bb,bb,bb,bb
,,,cc,cc,cc,cc
,,,dd,dd,dd,dd

==> TEST3.csv <==
TEST3,,,aaa,aaa,aaa,aaa
,,,bbb,bbb,bbb,bbb

The first action closes the file named f (if one is open), then constructs a new value for f by concatenating the (non-empty) value of the first field $1 with suffix .csv. The second action prints records to a file whose name is the (current) value of variable f. Note that it will error out if f is empty, which will be the case if there are any lines before the first non-empty value of $1.

Some awk implementations may take care of closing files for you in which case you don't need the explicit close(f).

Extract lines into separate files named based on the string in each non-empty first column

1 Answers1