
I'm analyzing some packetfilter logs and wanted to make a nice table of some output, which normally works fine when I use column -t. I can't use a tab as my output field separator (OFS) in this case, because column then splits the multi-word string fields on whitespace and ruins the table view.

My original data consists of rows like this:

2018:01:24-09:31:21 asl ulogd[24090]: id="2103" severity="info" sys="SecureNet" sub="ips" name="SYN flood detected" action="SYN flood" fwrule="50018" initf="eth0" srcmac="12:34:56:78:90:ab" dstmac="cd:ef:01:23:45:67" srcip="192.168.1.123" dstip="151.101.65.69" proto="6" length="52" tos="0x00" prec="0x00" ttl="128" srcport="59761" dstport="80" tcpflags="SYN"

I'm getting the data into a comma-delimited (CSV) format using:

grep -EHr "192\.168\.1\.123" | 
cut -d':' -f2- | 
awk -F '"' 'BEGIN{
    OFS=","; 
    print "name","action","srcip","srcport","dstip","dstport","protocol","tcpflags"
}
{
    print $10,$12,$22,$36,$24,$38,$26,$(NF-1)
}'

This works fine and produces this kind of output (IP addresses all changed, I don't really have an internal host flooding this site):

name,action,srcip,srcport,dstip,dstport,protocol,tcpflags
SYN flood detected,SYN flood,192.168.1.123,59761,151.101.65.69,80,6,SYN
SYN flood detected,SYN flood,192.168.1.123,59764,151.101.65.69,80,6,SYN
SYN flood detected,SYN flood,192.168.1.123,59769,151.101.65.69,80,6,SYN
SYN flood detected,SYN flood,192.168.1.123,59771,151.101.65.69,80,6,SYN
SYN flood detected,SYN flood,192.168.1.123,59772,151.101.65.69,80,6,SYN
SYN flood detected,SYN flood,192.168.1.123,59890,151.101.65.69,80,6,SYN
SYN flood detected,SYN flood,192.168.1.123,60002,151.101.65.69,80,6,SYN
SYN flood detected,SYN flood,192.168.1.123,60005,151.101.65.69,80,6,SYN
SYN flood detected,SYN flood,192.168.1.123,60006,151.101.65.69,80,6,SYN

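(As an aside: the hard-coded field numbers in the awk above ($10, $12, $22, ...) silently shift if a log line has extra or missing key="value" pairs. A sketch of a positional-independent alternative, which collects the pairs into an array and looks fields up by key name; "logfile" is a placeholder filename, and the rest is plain POSIX awk:)

```shell
# Sketch: collect key="value" pairs into an array, then print by key name,
# so lines with a different set of keys don't shift the columns.
# "logfile" is a placeholder for the actual log file.
awk -v OFS=',' '
BEGIN { print "name,action,srcip,srcport,dstip,dstport,protocol,tcpflags" }
{
    split("", kv)                      # clear the array for each line
    line = $0
    while (match(line, /[a-z0-9]+="[^"]*"/)) {
        pair = substr(line, RSTART, RLENGTH)
        eq   = index(pair, "=")
        key  = substr(pair, 1, eq - 1)
        kv[key] = substr(pair, eq + 2, length(pair) - eq - 2)
        line = substr(line, RSTART + RLENGTH)
    }
    print kv["name"], kv["action"], kv["srcip"], kv["srcport"],
          kv["dstip"], kv["dstport"], kv["proto"], kv["tcpflags"]
}' logfile
```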
For some reason, whenever I use column to display the table output (-t), it adds a newline after the first column where no newline exists in the original data. For example:

$ cat mydata.csv | column -s ',' -t
name
                                action     srcip           srcport  dstip          dstport  protocol  tcpflags
SYN flood detected
                                SYN flood  192.168.1.123   59761    151.101.65.69  80       6         SYN
SYN flood detected
                                SYN flood  192.168.1.123   59764    151.101.65.69  80       6         SYN
SYN flood detected
                                SYN flood  192.168.1.123   59769    151.101.65.69  80       6         SYN

The expected output would be as follows:

name                 action     srcip           srcport  dstip          dstport  protocol  tcpflags
SYN flood detected   SYN flood  192.168.1.123   59761    151.101.65.69  80       6         SYN
SYN flood detected   SYN flood  192.168.1.123   59764    151.101.65.69  80       6         SYN
SYN flood detected   SYN flood  192.168.1.123   59769    151.101.65.69  80       6         SYN

Adding -x to column makes no difference either, nor does specifying the number of columns with -c (I have plenty of screen width in the terminal). Why is it doing that when there is no newline in the original data?

I really don't think it is a character in my data, because it is also happening with the header row, which I created myself in my awk BEGIN block.

Dan
  • copying your example output and running `cat mydata.csv | column -s ',' -t` on it produces the expected output. centos 7 bash, fwiw. perhaps the output has a non-printing character in it that you aren't seeing. – WEBjuju Jan 24 '18 at 23:56
    @WEBjuju I thought that, too, but then it should definitely not be happening in my header field which I create in awk, since I created that myself and know it has no weird characters. This appears to be something else. – Dan Jan 25 '18 at 00:06
  • cool - can you attach the csv file to the question? i'll try again... – WEBjuju Jan 25 '18 at 00:10
  • @WEBjuju not sure I can attach a file to a question, and it would take a lot of effort as this is not my actual data (i.e. I did find/replace on IP addresses with bogus data). – Dan Jan 25 '18 at 00:11
    What I can do is post the original data so you can see if my awk is the problem – Dan Jan 25 '18 at 00:12
  • @WEBjuju original data sample posted – Dan Jan 25 '18 at 00:15
  • i had to post the output as a solution, sorry about that. – WEBjuju Jan 25 '18 at 00:17
  • I just can’t imagine it’s anything other than a standard C-style NUL-terminated string as it’s a Linux application. But I’ve been wrong before.... – Dan Jan 25 '18 at 00:22
  • Do you get the same behaviour with _only_ the header? – Kusalananda Jan 25 '18 at 08:02
  • @Kusalananda no, I cannot duplicate the issue with only the header using the following: `awk 'BEGIN{OFS=","; print "name","action","srcip","srcport","dstip","dstport","protocol","tcpflags"}' | column -s ',' -t` -- this prints fine on one line. Very odd. – Dan Jan 25 '18 at 15:50
  • @Kusalananda in fact, when I export the data to a CSV file, it still does it, but if I only export a few rows, I don't have that issue. I have been one by one just exporting one row into a new file, and each time it renders fine. I only have issues trying to see *all* of my data. Arg.... – Dan Jan 25 '18 at 16:45
  • @Dan Then it sounds like it's an issue with some of your data rather than with any component of the shell code that you've posted. – Kusalananda Jan 25 '18 at 16:48
  • @Kusalananda what I'm confused about is why it wouldn't then print everything correctly *up until* the problematic data. – Dan Jan 25 '18 at 16:52

1 Answer


I can reproduce your issue if I insert a row into the CSV file whose first comma-separated value is a very long string:

name                                                                                            
                   action     srcip          srcport  dstip          dstport  protocol  tcpflags
SYN flood detected                                                                              
                   SYN flood  192.168.1.123  59761    151.101.65.69  80       6         SYN     
SYN flood detected                                                                              
                   SYN flood  192.168.1.123  59764    151.101.65.69  80       6         SYN     
SYN flood detected                                                                              
                   SYN flood  192.168.1.123  59769    151.101.65.69  80       6         SYN     
SYN flood detected                                                                              
                   SYN flood  192.168.1.123  59771    151.101.65.69  80       6         SYN     
SYN flood detected                                                                              
                   SYN flood  192.168.1.123  59772    151.101.65.69  80       6         SYN     
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxx  SYN flood  192.168.1.123  59890    151.101.65.69  80       6         SYN     
SYN flood detected                                                                              
                   SYN flood  192.168.1.123  60002    151.101.65.69  80       6         SYN     
SYN flood detected                                                                              
                   SYN flood  192.168.1.123  60005    151.101.65.69  80       6         SYN     
SYN flood detected                                                                              
                   SYN flood  192.168.1.123  60006    151.101.65.69  80       6         SYN     

Note that there is no newline between the name and action columns in the actual output; the line is simply so long that the terminal wraps it, giving the illusion of a newline followed by indentation.

This means that you should look in your data for an entry with a very long name value.
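A quick way to hunt for that entry is to flag any row whose first field exceeds a threshold. This is a sketch, not part of the original answer; "mydata.csv" is the file from the question, and the threshold of 40 is arbitrary:

```shell
# Print the line number and first field of any row whose first
# comma-separated value is longer than 40 characters.
awk -F',' 'length($1) > 40 { print NR ": " $1 }' mydata.csv
```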

Kusalananda
  • This was the problem. I did not know this was the behavior of `column` - I wish there was a way to truncate a column when using `-t`. Thanks – Dan Jan 25 '18 at 17:49
  • But this makes sense as it is just padding the column based on the longest value, which in this case exceeds the screen width. Thanks! – Dan Jan 25 '18 at 18:47
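Since column -t has no built-in truncation, one hypothetical workaround along the lines Dan wished for is to clip the long field in awk before piping to column (the width of 30 and the "..." marker are arbitrary choices):

```shell
# Clip the first field to 30 characters (27 + "...") before tabulating,
# so one oversized value can't force the whole table past the screen width.
awk -F',' -v OFS=',' '{ if (length($1) > 30) $1 = substr($1, 1, 27) "..." } 1' mydata.csv |
column -s ',' -t
```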