
To determine the maximum length of each column in a comma-separated CSV file, I hacked together a bash script. When I ran it on a Linux system it produced the correct output, but I need it to run on OS X, and it relies on the GNU version of wc, which supports the -L (--max-line-length) option.

The version of wc on OS X does not support that option, and I'm looking for an alternative.
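As a workaround, the maximum line length itself can be computed with POSIX awk, which ships with OS X out of the box. A minimal sketch (the sample file and its path are just for illustration):

```shell
# Build a small sample file (illustrative path).
printf '123\neeeee\n23\n' > /tmp/maxlen_demo.txt

# Portable equivalent of GNU `wc -L`: track the longest line seen.
awk '{ if (length($0) > max) max = length($0) } END { print max }' /tmp/maxlen_demo.txt
# Prints 5 (the length of "eeeee")
```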

My script (which may not be that good - it reflects my poor scripting skills, I guess):

#!/bin/bash

# For each column in the CSV, print the length of its longest value.
cols=$(head -1 "$1" | awk -F, '{print NF}')
for ((i = 1; i <= cols; i++)); do
    printf "Column%d: " "$i"
    cut -d, -f "$i" "$1" | wc -L
done

Which prints:

Column1: 6
Column2: 7
Column3: 4
Column4: 4
Column5: 3

For my test-file:

123,eeeee,2323,tyty,3
154523,eegfeee,23,yty,343

I know that installing the GNU coreutils through Homebrew might be a solution, but that's not a path I want to take, as I'm sure this can be solved without modifying the system.


3 Answers


Why not use awk?

I don't have a Mac to test on, but length() is a pretty standard awk function, so this should work.

The awk script (test.awk):

{
    for (i = 1; i <= NF; i++) {
        l = length($i)
        if (l > linesize[i]) linesize[i] = l
    }
}
END {
    for (l in linesize) printf "Column%d: %d\n", l, linesize[l]
}

Then run:

mybox$ awk -F, -f test.awk a.txt
Column4: 4
Column5: 3
Column1: 6
Column2: 7
Column3: 4
  • This looks awesome. Actually I needed the answer to be able to answer a question on Stack Overflow myself. I don't want to take credit for your work though, so if you want some rep on SO you should post this answer there to the question http://stackoverflow.com/questions/25653846/finding-the-max-number-of-characters-in-a-csv-file-column and I won't add it to my answer there. – jpw Sep 04 '14 at 08:35
  • Credit should go to the one who deserves it :) The reason I didn't use awk all the way is that I simply don't know it :p – jpw Sep 04 '14 at 08:42

Similar to Archemar's, but reduced:

awk -F, '{ for (i = 1; i <= NF; i++) l[i] = ((x = length($i)) > l[i] ? x : l[i]) }
         END { for (i in l) print "Column"i":", l[i] }' file

Column4: 4
Column5: 3
Column1: 6
Column2: 7
Column3: 4

Also, to maintain the order:

awk -F, '{ for (i = 1; i <= NF; i++) l[i] = ((x = length($i)) > l[i] ? x : l[i]) }
         END { for (i = 1; i <= NF; i++) print "Column"i":", l[i] }' file

Column1: 6
Column2: 7
Column3: 4
Column4: 4
Column5: 3
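One caveat: in the END block, NF holds the field count of the last record read, so if the final line has fewer columns than earlier ones, the trailing columns are silently dropped. A sketch that tracks the maximum field count explicitly instead (the sample file and its path are just for illustration):

```shell
# Sample data from the question (illustrative path).
printf '123,eeeee,2323,tyty,3\n154523,eegfeee,23,yty,343\n' > /tmp/cols_demo.csv

# Track the widest value per column and the largest field count seen.
awk -F, '{ if (NF > n) n = NF
           for (i = 1; i <= NF; i++) l[i] = ((x = length($i)) > l[i] ? x : l[i]) }
         END { for (i = 1; i <= n; i++) print "Column"i":", l[i] }' /tmp/cols_demo.csv
# Prints Column1: 6 through Column5: 3, in order
```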

A Perl solution:

$ perl -F, -anle 'map {$h{$_} = length($F[$_]) if length($F[$_]) > $h{$_}} 0..$#F;
    END { print "Column @{[$_+1]}: $h{$_}" for sort {$a <=> $b} keys %h }' file
Column 1: 6
Column 2: 7
Column 3: 4
Column 4: 4
Column 5: 3
  • +1 I've already accepted an answer and can't really tell which solution is better, but thanks for the effort. – jpw Sep 04 '14 at 09:10