compare two csv files and fetch matching data into a new csv file

Question

I have two csv files with similar content/data.

file1.csv

ADIS
BAP3
Mercury_System
nxh-2003
DR_FeatureUP_PT

file2.csv

ADIS,projects.adis
EcoSystems,projects.ecosystems
em1xxxsw,projects.em1xxxsw
BAP3,projects.bap3
Dirana4,projects.dirana4
Mercury_System,projects.mercury_system
nxh-2003,projects.nxh-2003
DocStore,projects.docstore
DR_FeatureUP_PT,projects.dr_featureup_pt

Desired output.csv

ADIS,projects.adis
BAP3,projects.bap3
Mercury_System,projects.mercury_system
nxh-2003,projects.nxh-2003
DR_FeatureUP_PT,projects.dr_featureup_pt

Please let me know the command used for this type of query.

What kind of query are you trying to do? Right now we just have 2 input files from you. — Eric Renouf, Nov 18 '16 at 13:15
see also: https://unix.stackexchange.com/questions/134829/compare-two-columns-of-different-files-and-print-if-it-matches — Sundeep, Nov 18 '16 at 13:22
@EricRenouf i have given two input files, so a query/command/sed/awk/grep anything should comapre these two inout files and get a data which is similar like the one i gave in output — Siddharth Sahoo, Nov 18 '16 at 13:30
Yes, but how should the output be generated from those two files? Explain in words. Do you just want the 1,2,4,5,and 7th lines from file? Are you looking for `grep -Ff file1 file2`? What if file2 contains a `ADISBADIS`, should that match `ADIS`? You need to tell us what you need instead of expecting us to guess. — terdon, Nov 21 '16 at 14:55
file1.csv contains 118 rows , file2.csv contains 2500 rows with 2 columns My requirement is the code/script should match both the csv files and fetch the matched row in a new file. I even tried grep -Ff file1 file2 but no use, there will be only single string with the same in both files. I am running GNU/Linux — Siddharth Sahoo, Nov 21 '16 at 15:02

Kusalananda · Answer 1 · 2022-11-23T11:16:47.250

Using Miller (mlr) to join the two CSV data sets on the named field ADIS:

$ mlr --csv join -j ADIS -f file1.csv file2.csv
ADIS,projects.adis
BAP3,projects.bap3
Mercury_System,projects.mercury_system
nxh-2003,projects.nxh-2003
DR_FeatureUP_PT,projects.dr_featureup_pt

If ADIS is not actually a field name and the two CSV files are header-less, then use the following instead (reads the input as header-less CSV and uses the field number instead of the field name in the join operation):

$ mlr --csv -N join -j 1 -f file1.csv file2.csv
ADIS,projects.adis
BAP3,projects.bap3
Mercury_System,projects.mercury_system
nxh-2003,projects.nxh-2003
DR_FeatureUP_PT,projects.dr_featureup_pt

The output of this second command looks the same as that of the first command. However, the first line in each file in no longer interpreted as a header line but as a record of data fields.

This would work regardless of whether the input data files are DOS or Unix text files.

αғsнιη · Answer 2 · 2016-11-18T13:28:06.587

0

Using grep;

grep -Ff file1.csv file2.csv > outfile.csv

using awk

awk -F, 'NR==FNR{seen[$0]++;next} ($1 in seen)' file1.csv file2.csv > outfile.csv

edited Nov 18 '16 at 13:28

answered Nov 18 '16 at 13:23

αғsнιη

40,939
15
71
114

score 0 · Answer 3 · answered Nov 21 '16 at 21:27

You can solve this with a simple script to execute in the shell. Try with this:

while read i; do grep $i ./file2.csv >> output.csv; done < file1.csv

It will create a new file called output.csv with the result that you are expecting for. You need to run it on the folder where you have stored both files file1.csv and file2.csv.

score 0 · Answer 4 · answered Nov 21 '16 at 21:36

0

Using Bash process substitution, you can sort the files and then "join" them.

join -t, <(sort file1.csv) <(sort file2.csv) > output.csv

answered Nov 21 '16 at 21:36

Wildcard

35,316
26
130
258

compare two csv files and fetch matching data into a new csv file

4 Answers4