Extract data from a file and place in different files based on one column value

Question

We will generate a csv file with below values

yp1234,577,1,3
yp5678,577,3,5
yp9012,132,8,9

I need to extract data and create files based on second column. If it's 577 then the whole line has to be extracted and placed in a separate file. I mean I need a file having lines with second column as 577 alone and another file with second column as 132 alone

I tried using IF but didn't work

Actually posting the code that doesn't work is always a good idea. — goldilocks, Feb 07 '14 at 16:38

score 35 · Answer 1 · edited Jun 10 '19 at 15:10

35

Use awk:

awk -F, '{ print > $2 ".csv" }' file.csv

This will create the two files 577.csv and 132.csv in your current directory.

The command above assumes that you can only have 132 or 577 as the second field. It will create one file name for each of the values found in the second field of the entire file.csv.

If there are other values apart from the two you are interested in, and you want to ignore those lines, do this instead:

awk -F, '$2 == "577" || $2 == "132" { print > $2 ".csv" }' file.csv

edited Jun 10 '19 at 15:10

Kusalananda

320,670
36
633
936

answered Feb 07 '14 at 16:49

terdon

234,489
66
447
667

2

There are buggy `awk` implementations that can't use `print > $2 ".cvs"`. On those, you would have to first compute the filename, then do the `print`: `fname = $2 ".cvs"; print > fname`. – Kusalananda Jun 10 '19 at 15:14
@Kusalananda those awk implementations that fail given `print > $2 ".cvs"` aren't buggy. The meaning of `print > $2 ".cvs"` (or any other unparenthesized expression on the right side of output or input redirection) is undefined by POSIX so failing with an error is just as valid as trying to figure out what the user intended when they wrote that code. The POSIX-defined, portable way to write that code is `print > ($2 ".cvs")`. – Ed Morton Aug 02 '21 at 20:11
@EdMorton Yeah, that would work too. – Kusalananda Aug 02 '21 at 20:16
@Kusalananda Storing the file name in a variable as you suggested is fine too (and more useful if/when you also need to close() the output files), my only real point is that it's not a bug to fail given `print > $2 ".cvs"`. – Ed Morton Aug 02 '21 at 20:17
1

@EdMorton Ah, I see what I wrote back in 2019 now. Good, thanks for the comment! – Kusalananda Aug 02 '21 at 20:18
FWIW I'd do this with `awk -F, '$2 ~ /^(577|132)$/ { print > ($2 ".csv") }' file.csv`. – Ed Morton Aug 02 '21 at 20:22

X Tian · Answer 2 · 2014-02-07T18:19:12.437

3

To extract all 577 to stdout

grep -e '^.*,577,.*,.*$' youfile.csv >result_extract_557.csv

-- edit 1 Corrected, based on @terdon's comment below to avoid false matches when at least 3 commas on line with 577.

grep -e '^[:alnum:]*,577,[:digit:]*,[:digit:]*$' youfile.csv >result_extract_557.csv

But I think his/her awk solution is more comprehensive.

edited Feb 07 '14 at 18:19

answered Feb 07 '14 at 16:43

X Tian

10,413
2
33
48

1

That will match even if 577 is on another field, not the second or if it is a part of a field. For example `foo577bar` or `yp9012,132,8,577`. – terdon Feb 07 '14 at 16:50
I thought my commas would make it field position dependent ? – X Tian Feb 07 '14 at 16:58
Sorry, I gave bad examples but the `.*` can also match commas so you don't know which field you're matching. Could be the second, could also be the 45th. My second complaint was wrong, you're right that the commas protect from matching `foo577bar`. – terdon Feb 07 '14 at 17:01
what to do if | character is used instead of , . – user3116123 Feb 11 '14 at 15:41
receiving below error grep: illegal option -- e Usage: grep -hblcnsviw pattern file . . . – user3116123 Feb 11 '14 at 15:44
@user3116123 grep needs to handle regular expressions, do you have egrep ? – X Tian Feb 11 '14 at 16:52

score 3 · Answer 3 · answered Feb 07 '14 at 17:25

I like terdon's awk solution, but for the sake of completeness, here is a suggestion using only bash

while IFS=, read -r a1 a2 a3 a4; do 
    echo "$a1,$a2,$a3,$a4" >> "$a2".csv
done < file.csv

It will produce files 577.csv and 132.csv in the current directory.

score 2 · Answer 4 · answered Jan 30 '17 at 19:02

Using csvkit:

$ csvgrep -c 2 -m 577 data.csv >output.csv

The -c 2 makes cvsgrep consider the second column, and with -m 577 we ask it to match the string 577 in that column.

The following will be written to output.csv:

yp1234,577,1,3
yp5678,577,3,5

To match a number of strings and write the output to a file for each string:

for pattern in 577 132; do
  csvgrep -c 2 -m "$pattern" data.csv >"output-$pattern.csv"
done

This will create the two files output-132.csv and output-577.csv.

Extract data from a file and place in different files based on one column value

4 Answers4

Linked

Related