Select certain column of each file, paste to a new file

Question

I have 20 tab-delimited files with the same number of rows. I want to select the 4th column of each file and paste these columns together into a new file. In the end, the new file will have 20 columns, each coming from one of the 20 files.

How can I do this with Unix/Linux command(s)?

Input: 20 files in this same format. I want the 4th column, denoted here as A1 for file 1:

chr1    1734966 1735009 A1       0       0       0       0       0       1       0
chr1    2074087 2083457 A1       0       1       0       0       0       0       0
chr1    2788495 2788535 A1       0       0       0       0       0       0       0
chr1    2821745 2822495 A1       0       0       0       0       0       1       0
chr1    2821939 2822679 A1       1       0       0       0       0       0       0
...

Output file, with 20 columns, each column coming from one of the 20 files' 4th column:

A1       A2       A3       ...       A20
A1       A2       A3       ...       A20
A1       A2       A3       ...       A20
A1       A2       A3       ...       A20
A1       A2       A3       ...       A20
...

cut is the command that extracts columns from a file, and paste is another command that pastes columns side by side. Check: man cut, man paste — Vineeth Chowdhary, Sep 30 '14 at 12:49
Please [edit] your question and give us an example of your input files and your desired output. How are columns defined? Spaces? Commas? Tabs? Something else? — terdon, Sep 30 '14 at 12:54
I changed your question to make it more direct, as others (and maybe you) might want to know **how** to do what you are asking, not just whether people exist who are able to solve such a problem. — Anthon, Sep 30 '14 at 15:24
Thanks for the comments. I edited my question. Hope it is clear now. — Jun Cheng, Oct 01 '14 at 08:01
@JunCheng `paste <(cut -f 4 1.txt) <(cut -f 4 2.txt) .... <(cut -f 4 20.txt)`. That works because `cut` by default cuts on TAB delimited fields. If the question gets reopened I will post this as an answer as well. — Anthon, Oct 01 '14 at 08:31
@Anthon, thanks a lot. Is there any way to avoid spelling out `<(cut -f 4 1.txt) <(cut -f 4 2.txt) .... <(cut -f 4 20.txt)`, in case there are 100+ files or an uncertain number of files? — Jun Cheng, Oct 01 '14 at 13:05
@JunCheng You can paste the first two files into `out.txt`, then incrementally paste `out.txt` and each following file into an `out2.txt`, move that `out2.txt` to `out.txt`, and do the next. But by then I personally would make a Python script that builds a list for each row, appends to it, and dumps the result when all files are parsed. I don't think you can parametrize `<(cut ...)` — Anthon, Oct 01 '14 at 13:13
It is kind of unfortunate it takes so long to get the five reopen votes (only one more to go) — Anthon, Oct 01 '14 at 13:56

Anthon · Accepted Answer · 2016-07-08T18:48:46.607

With paste and process substitution under bash you can do:

paste <(cut -f 4 1.txt) <(cut -f 4 2.txt) .... <(cut -f 4 20.txt)

With a Python script and any number of files (python scriptname.py column_nr file1 file2 ... filen):

#! /usr/bin/env python

# invoke with the column number to extract as the first parameter, followed
# by the filenames. The files should all have the same number of rows

import sys

col = int(sys.argv[1])
res = {}  # maps line number -> list of extracted fields, one per file

for file_name in sys.argv[2:]:
    with open(file_name) as f:
        for line_nr, line in enumerate(f):
            # strip only the newline, so empty leading/trailing fields survive
            res.setdefault(line_nr, []).append(line.rstrip('\n').split('\t')[col-1])

for line_nr in sorted(res):
    print('\t'.join(res[line_nr]))
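
If the files are large, a streaming variant of the script above may be preferable. The following is only a sketch, assuming Python 3 (where `zip` is lazy) and that all files can be open at the same time; the function name `paste_column` and its `out` parameter are made up for illustration. Note that `zip` stops at the shortest file, so it relies on the files having the same number of rows, as stated in the question.

```python
import sys


def paste_column(col, file_names, out=sys.stdout):
    """Write column `col` (1-based) of every file side by side, tab separated.

    Hypothetical helper for illustration; not part of any library.
    """
    handles = [open(name) for name in file_names]
    try:
        # zip() advances all files in step, so only one row per file
        # is held in memory at any time
        for rows in zip(*handles):
            fields = [row.rstrip('\n').split('\t')[col - 1] for row in rows]
            out.write('\t'.join(fields) + '\n')
    finally:
        for handle in handles:
            handle.close()
```

Calling `paste_column(4, ['1.txt', '2.txt', '3.txt'])` would print the 4th column of each listed file side by side; a shell glob or `sys.argv[2:]` can supply the file list when the number of files is not known in advance.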

Ruthger Righart · Answer 2 · 2014-09-30T14:58:18.627

The following script does this using awk. For convenience I have added a row-number column as the first column of the output; r is the number of rows in your files, and c is the number of columns you'd like to paste (one per file).

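# requires bash (uses an array to keep the files in numeric order)
directory=/your-directory
r=4     # number of rows in each file
c=20    # number of files, i.e. columns to paste

# one row number per line; overwrite so that re-running does not append
seq 1 "$r" > "$directory/rownumber.txt"

files=()
for n in $(seq 1 "$c"); do
    awk -F'\t' '{ print $4 }' "$directory/file-$n.txt" > "$directory/output-$n.txt"
    files+=("$directory/output-$n.txt")
done

paste "$directory/rownumber.txt" "${files[@]}" > "$directory/newfile.txt"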
