2

I need to convert CSV to TSV in bash. I found this solution and it works well, but not for all datasets as I will show below.

For example, for a.txt:

a,"test, part2 ""the start""",b

sed format it badly:

[ nir ]$ cat a.txt | sed -E 's/("([^"]*)")?,/\2\t/g' 
a    "test    Op. 15 ""the start"    b
#^ tab....^ tab..................^ tab

Issues here: missing ,, extra tab, extra quotes.

Just to reference even python code format it badly:

[ nir ]$ cat a.txt | csv2tsv.py
a    "test, part2 ""the start"""    b
#^ tab..........................^ tab

Issues here: extra quotes.

csv2tsv.py is: csv.writer(sys.stdout, dialect='excel-tab').writerows(csv.reader(sys.stdin))

Where the true convert should looks like:

a    test, part2 "the start"    b
#^ tab......................^ tab

Would love to get some feedback how to solve this in bash. I went over many solution in the internet but non managed to handle those quotes on quotes inside quotes :)

muru
  • 69,900
  • 13
  • 192
  • 292
Nir
  • 1,265
  • 7
  • 23
  • 34
  • 1
    Did you try `tr ',' '\t' < a.txt` ? – Valentin Bajrami Sep 05 '21 at 21:24
  • 3
    Does this answer your question? [Converting CSV to TSV](https://unix.stackexchange.com/questions/359832/converting-csv-to-tsv) – roaima Sep 05 '21 at 22:18
  • In my question I actually link to that solution, and not it is not. The multiple quotes breaks it. as for the `tr` it will not work as it does not consider the quotes – Nir Sep 06 '21 at 05:11
  • 1
    At least one of the answers in [the suggested duplicate](https://unix.stackexchange.com/q/359832/100397) will work for your example data. Don't just try the accepted answer – roaima Sep 06 '21 at 07:52
  • You've provided one line of test code. Could you give us a little more to work with? – jubilatious1 Sep 06 '21 at 21:59

3 Answers3

4

bash 5.1 comes with a loadable CSV module

BASH_LOADABLES_PATH=${BASH/\/bin\//\/lib\/}
enable -f csv csv
csv -a fields "$line"
new_line=$(IFS=$'\t'; echo "${fields[*]}")
declare -p line fields new_line

outputs

declare -- line="a,\"test, part2 \"\"the start\"\"\",b"
declare -a fields=([0]="a" [1]="test, part2 \"the start\"" [2]="b")
declare -- new_line="a  test, part2 \"the start\"   b"
#.....................^ tab......................^ tab

This is not valid is there's a field containing a tab.


In a pipeline:

IFS=$'\t'
cat file |
while IFS= read -r line; do
    csv -a fields "$line"
    echo "${fields[*]}"
done |
tail

Although this is more idiomatic bash

IFS=$'\t'
while IFS= read -r line; do
    csv -a fields "$line"
    echo "${fields[*]}"
done < file | tail
glenn jackman
  • 84,176
  • 15
  • 116
  • 168
  • Can you please show how to use it as part of pipe? for example `cat a.txt | | tail` – Nir Sep 06 '21 at 05:19
  • Note that few systems ship those loadable bash builtins by default as they're more like example code (and not much tested). That `BASH_LOADABLES_PATH=${BASH/\/bin\//\/lib\/}` will likely not work if bash is found in `/bin` (which it is most of the time). – Stéphane Chazelas Sep 06 '21 at 07:46
  • That csv builtin was added in 5.1. It was not there in 5.0 – Stéphane Chazelas Sep 06 '21 at 07:50
4

With mlr:

mlr -N --icsv --otsvlite cat < file.csv > file.tsv

Or:

mlr -N --c2t --quote-none cat < file.csv > file.tsv

But note that if a csv field contains a tab character, it will end up not escaped in the output and therefore introducing an extra field.

With GNU sed, you could do the same with:

sed -E '
  # append next line as long as there is not an even number
  # of "s, to handle fields with newline. You can omit this line
  # if the fields are guaranteed not to contain newlines:
  :1; /^([^"]*"[^"]*")*[^"]*$/! {N;b1}

  s/$/,/
  s/(([^,"]*)|"((""|[^"])*)"),/\2\3\t/g
  s/\t$//
  s/""/"/g' < file.csv > file.tsv

That assumes the input is valid text in the current locale. Start sed as LC_ALL=C sed... to disable localisation and treat the input as if it was binary input to avoid decoding issues (and likely speed things up if speed is a concern)

Stéphane Chazelas
  • 522,931
  • 91
  • 1,010
  • 1,501
0

Or use csvformat from the csvkit -- this tool takes care of quoting any field containing the delimiter: I added a line to the input file that has tabs in it.

$ cat a.txt
a,"test, part2 ""the start""",b
c,d,e   with    tabs

$ csvformat -D $'\t' a.txt
a   "test, part2 ""the start""" b
c   d   "e  with    tabs"
glenn jackman
  • 84,176
  • 15
  • 116
  • 168
  • The `"` should not be in the result – Nir Sep 09 '21 at 09:20
  • 1
    The second line needs quotes to contain the inner delimiters. For the first line, [the standard](https://datatracker.ietf.org/doc/html/rfc4180) says "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields." But the standard is merely advisory: if you don't want to follow it, then don't, but common CSV tools (like csvkit) may not meet your requirements – glenn jackman Sep 09 '21 at 13:14