19

I have a .CSV file with the below format:

"column 1","column 2","column 3","column 4","column 5","column 6","column 7","column 8","column 9","column 10
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD","433","70%","07/15/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
"46476","15467534544","lengthy string, with commas, multiple: colans","string with or, without commas","string 2","CAND","388","70%","09/21/2013",""

5th column of the file has different strings. I need to filter out the file based on the 5th column value. Lets say, I need a new file from the current file which has records only with the value "string 1" in its fifth field.

For this I tried the below command,

awk -F"," ' { if toupper($5) == "STRING 1") PRINT }' file1.csv > file2.csv

but it was throwing me an error as following:

awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error
awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error

I then used the following which gives me an odd output.

awk -F"," '$5="string 1" {print}' file1.csv > file2.csv

Output:

"column 1" "column 2" "column 3" "column 4" string 1 "column 6" "column 7" "column 8" "column 9" "column 10
"12310" "42324564756" "a simple string with a comma" string 1 without commas" "string 1" "USD" "12" "70%" "08/01/2013" ""
"23455" "12312255564" "string with string 1 commas" "string with or without commas" "string 2" "USD" "433" "70%" "07/15/2013" ""
"23525" "74535243123" "string with commas string 1 "string with or without commas" "string 1" "CAND" "744" "70%" "05/06/2013" ""
"46476" "15467534544" "lengthy string with commas string 1 "string with or without commas" "string 2" "CAND" "388" "70%" "09/21/2013" ""

P.S: I used the toupper command to be on the safe side, as I am not sure if the string will be in lower or higher case. I need to know what is wrong with my code and if the space in the string matters while searching for a pattern using AWK.

Rui F Ribeiro
  • 55,929
  • 26
  • 146
  • 227
Dhruuv
  • 527
  • 3
  • 8
  • 16

3 Answers3

21
awk -F '","'  'BEGIN {OFS=","} { if (toupper($5) == "STRING 1")  print }' file1.csv > file2.csv 

Output

"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

I think This is What you want.

limovala
  • 734
  • 3
  • 9
  • 22
  • The output is exactly how I needed it to be. I haven't thought of making `'","'` as delimiter, otherwise it would have solved my problem... great solution... – Dhruuv Oct 22 '13 at 13:34
  • @Dhruuv making `'","'` the delimiter is what most answers to your [previous question](http://unix.stackexchange.com/q/96349/22222) suggested :). – terdon Oct 22 '13 at 14:59
  • @terdon : yes, I know, but that did not get into my mind when I was having the issue. Frankly, I thought it might be something with the command or something other than the delimiters that was causing the problem... :) Hence did not give it a try... :( – Dhruuv Oct 22 '13 at 17:17
  • @terdon, @limovala: A quick question... how do I keep the header of the file as is and check the same above condition for the rest of the records? I am trying to do something like this, **`awk -F '","' 'BEGIN { if(NR==1){print} else (toupper($5) == "HYPERION"); print }' file1.csv > file2.csv`** but it does not return anything.. Please let me know what am I doing wrong here... – Dhruuv Nov 01 '13 at 02:41
  • 2
    @Dhruuv not sure about the details since i can't tell what you're trying to do but your else condition is almost certainly wrong. Are you trying to print ony if $5 is HYPERION? If so, try `else{if(toupper($5)=="HYPERION"){print}}`. Not at my computer at the moment so I might have the syntax wrong but you can't give a condition to an else statement. – terdon Nov 01 '13 at 03:52
  • 1
    `awk -F '","' 'BEGIN {OFS=","} { if (NR==1) {print} else{if (toupper($5) == "STRING 1") print} }' file1` – limovala Nov 01 '13 at 09:51
  • +1 @terdon yes, I wanted to print the first record(header of the file) as is, and the rest of the records which have the string(I was applying the same code to a different file which I need to filter with string `"Hyperion"`) as mentioned in my code... sorry for the confusion created... +1 @limovala. code is perfect and I now know why my code was not returning anything... :) – Dhruuv Nov 01 '13 at 13:12
3

The problem with CSV is that there is no standard. If you need to deal with CSV-formatted data often, you might want to look into a more robust method rather than just using "," as your field separator. In this case, Perl's Text::CSV CPAN modules are exceptionally well-suited to the job:

$ perl -mText::CSV_XS -WlanE '
    BEGIN {our $csv = Text::CSV_XS->new;} 
    $csv->parse($_); 
    my @fields = $csv->fields(); 
    print if $fields[4] =~ /string 1/i;
' file1.csv
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
-1
awk 'BEGIN {FS = "," }'  '{ (if toupper($5)  == "STRING 1") print; }'  file1.csv > file2.csv
PersianGulf
  • 10,728
  • 8
  • 51
  • 78
  • Sorry to say but, your solution doesnt return any records from the file... I think just adding the delimiter as `'","'` will do... thanks... :) – Dhruuv Oct 22 '13 at 13:36
  • @Mohsen -1 because 1) you need to escape the "or they are not understood as parts of the file delimiter. See the answers to the OP's other question and 2) you are separating the BEGIN block from the rest of the command which completely breaks it. Try `awk 'BEGIN {FS = "," }' '{print $0}'`, you will see it produces no output. In the future, please test your answers to see if they actually work before posting them. – terdon Oct 22 '13 at 20:32