I have a CSV in which one column contains ~50 comma-separated values, and I want to split those values into separate columns. The header is on line 1. This should be really simple, and I've tried many approaches with awk and mlr, but I haven't been able to adapt anything I've seen to split a single column into many columns using a comma as the delimiter.

My process:

  1. I used mlr to combine hundreds of CSVs into one CSV:

    mlr --icsv cat *.csv > filename.txt
    mlr --ocsv unsparsify filename.txt > filename.csv
    
  2. Now I have a CSV with one column; in that column are ~50 comma-separated values that I want to explode into many columns.

Kusalananda
  • 5
    Please add a sample input file. – aborruso Aug 09 '22 at 05:26
  • 2
    In a CSV (comma-separated-value) file, the values are delimited by commas. It's unclear what you want to do to the data if the fields are _already_ delimited by commas. Seeing an explicit example would be good. I suppose that we may _assume_ that you are talking about commas in a _quoted_ field, but you never actually say this and you show no examples of the data. – Kusalananda Aug 09 '22 at 07:11
  • 1
    Are there other columns in addition to this one with 50 comma-separated values in it? Please add sample input and expected output. Obviously use something like 3 instead of 50 nested columns. – Ed Morton Aug 09 '22 at 14:43
  • What about this answer: https://unix.stackexchange.com/a/712982/195582 ? – aborruso Aug 09 '22 at 20:05
  • Did you solve it? – aborruso Aug 14 '22 at 07:11

1 Answer

You should always include sample input and expected output.

As I understand it, your input looks like this: a CSV in which one column itself contains comma-separated values (in this example, the a field):

a b c
1,2,3 aa aa
4,7,9 ff ff

The raw CSV is this:

a,b,c
"1,2,3",aa,aa
"4,7,9",ff,ff

Using Miller and its nest verb, you can run

mlr --csv nest --explode --values --across-fields -f a --nested-fs "," input_01.csv > output.csv

to get

a_1 a_2 a_3 b c
1 2 3 aa aa
4 7 9 ff ff

The raw output is

a_1,a_2,a_3,b,c
1,2,3,aa,aa
4,7,9,ff,ff
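
If Miller is not available, the same split can be sketched with Python's standard csv module. This is only an illustration under the assumptions above (the nested values live in a quoted field named a and are separated by plain commas); the function names explode_column and explode_csv are made up for this example and are not part of Miller or of any library.

```python
import csv
import io


def explode_column(rows, field, sep=","):
    """Split `field` in each row into field_1, field_2, ... in place of the original column."""
    out = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key == field:
                # Number the pieces from 1, mirroring the a_1, a_2, ... names shown above.
                for i, part in enumerate(value.split(sep), start=1):
                    new_row[f"{field}_{i}"] = part
            else:
                new_row[key] = value
        out.append(new_row)
    return out


def explode_csv(text, field, sep=","):
    """Read CSV text, explode one column, and return the result as CSV text."""
    rows = explode_column(list(csv.DictReader(io.StringIO(text))), field, sep)
    # Collect column names in first-seen order; rows with fewer pieces get empty cells.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Same sample as the raw CSV above.
sample = 'a,b,c\n"1,2,3",aa,aa\n"4,7,9",ff,ff\n'
result = explode_csv(sample, "a")
print(result, end="")
```

The csv module handles the quoting, so the commas inside the quoted a field are not mistaken for column separators. If rows contain different numbers of nested values, this sketch appends the extra columns at the end of the header rather than next to a_1, a_2, ..., which may or may not be what you want.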
aborruso