Since csvcut (from csvkit) does not take more than a single file at a time, I need to write a script to process multiple files using it.

The first parameter should be the delimiter, the second parameter should be the header of the column to extract, and remaining arguments are the filenames.

If the file names are missing, the script should read from standard input.

It should be invoked something like this:

csvcut ';' Measure calories.csv

I'm not really familiar with csvkit. Can anyone help?

Kusalananda
amV
    Please [edit] your question and i) give us a few lines of your input CSV file and ii) the output you expect to see from those lines. – terdon Aug 12 '19 at 07:52

1 Answer

The following assumes that all CSV files you'd like to process have the same number and order of columns.

#!/bin/sh

delim=$1
cols=$2

if [ -z "$delim" ] || [ -z "$cols" ]; then
    echo 'missing delimiter and/or columns' >&2
    exit 1
fi

shift 2

csvstack --delimiter "$delim" "$@" |
csvcut --columns "$cols"

This script would take two or more arguments. The first one would be the delimiter character, the second the name or number of the columns to extract (a comma-delimited list can be used). The rest of the arguments are used as filenames to process.

If only two arguments are given, standard input will be used as data to process.
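To see why no special case is needed for standard input, here is a small sketch of the argument handling on its own. The function name describe_args is made up for illustration and is not part of the script or of csvkit. After shift 2, "$@" holds only the filenames, and it expands to nothing at all when none were given.

```sh
#!/bin/sh
# Illustration only: describe_args is a hypothetical helper that reports
# how the positional parameters are split up. It does not touch any files.
describe_args() {
    delim=$1
    cols=$2
    shift 2    # drop delimiter and columns; "$@" now holds only filenames

    if [ "$#" -eq 0 ]; then
        # No filenames left: "$@" expands to nothing, so a command given
        # "$@" receives no file operands at all.
        printf 'delimiter=%s columns=%s input=stdin\n' "$delim" "$cols"
    else
        printf 'delimiter=%s columns=%s files=%s\n' "$delim" "$cols" "$#"
    fi
}

describe_args ';' 'a,c'
describe_args ';' 'a,c' file1.csv file2.csv
```

The first call reports input=stdin and the second reports files=2.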

The csvstack command is used to create a single CSV data stream of the given files, and csvcut is used to extract the wanted columns. Note that the delimiter changes to a comma in the output from csvstack from whatever it was in the input. If you are extracting multiple columns, and want a particular delimiter, pass the result through csvformat and specify the delimiter with -D (--out-delimiter).
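As a sketch of that last suggestion, the pipeline could be extended with csvformat so that the output keeps the input delimiter. This wraps the same commands in a function; the name extract_columns is my own choice for illustration, and reusing the input delimiter for the output is an assumption about what you want.

```sh
#!/bin/sh
# Sketch only: extract_columns is a hypothetical name, not part of csvkit.
# Usage: extract_columns DELIM COLUMNS [FILE...]
extract_columns() {
    delim=$1
    cols=$2

    if [ -z "$delim" ] || [ -z "$cols" ]; then
        echo 'missing delimiter and/or columns' >&2
        return 1
    fi

    shift 2

    # csvstack emits comma-separated data, so csvcut needs no delimiter
    # option; csvformat -D then converts the commas back to "$delim".
    csvstack --delimiter "$delim" "$@" |
    csvcut --columns "$cols" |
    csvformat -D "$delim"
}
```

With the two files from the example run below, extract_columns ';' 'a,c' file1.csv file2.csv should print a;c, 1;3 and 4;6 on separate lines instead of the comma-delimited version.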

Example run:

$ cat file1.csv
a;b;c
1;2;3
$ cat file2.csv
a;b;c
4;5;6
$ sh script.sh ';' 'a,c' file*
a,c
1,3
4,6
Kusalananda