All possible permutations of words in different files in pairs

Question

I have multiple files, say file1, file2, etc. Each file has one word per line, like this:

file1 file2 file3
one   four  six
two   five
three

What I want is to combine them, in pairs, into a new file4 that contains every possible permutation (without repetition), like:

onetwo
onethree
onefour
onefive
...
twothree
...
onefour
...
fourone
...

How can I do this using Linux commands?

No, I am trying to attack a hash with John the Ripper and I need to combine different files accordingly. — mpla_mpla, May 30 '16 at 14:32
The files sizes are relevant. If you replace "file*" with the actual file names, what does `wc file* | tail -n1` output? — agc, May 30 '16 at 14:42
The description says _combination_, but the "want" list also includes a _permutation_, namely: "fourone". At present, the question is unclear. See [combinations and permutations](https://www.mathsisfun.com/combinatorics/combinations-permutations.html). — agc, May 30 '16 at 15:43
See also: [Command line tool to “cat” pairwise expansion of all rows in a file](http://unix.stackexchange.com/q/169625) — don_crissti, May 31 '16 at 13:09
Now we have enough data. Based on the `wc` output, we're not dealing with huge files, so execution speed and array size limits won't matter much in _this_ instance. Assuming the sample output above is correct, it's a _permutation without repetition_, with "n!/(n-r)!" items. — agc, May 31 '16 at 15:54
@don_crissti, in my answer I used the shell's `set` command to hold an array of items; `set` is command-line based and is limited to a hair less than `getconf ARG_MAX` bytes (on my system, that's about 2 MB). Since the OP's data is only 20K (i.e. 1% of 2M), `set` is good enough. — agc, May 31 '16 at 16:36
@agc - I saw your answer but max no. of args is one thing and _array size limit_ is another thing. — don_crissti, May 31 '16 at 16:38
@don_crissti, thanks for the distinction, perhaps _array buffer size limit_ might have been a better description. — agc, May 31 '16 at 16:51
Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/40551/discussion-between-agc-and-don-crissti). — agc, May 31 '16 at 17:03
"Command line tool to “cat” pairwise expansion of all rows in a file" is for _permutations with repetition_; (n^r) items. — agc, May 31 '16 at 17:12
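As a quick check of the three counts discussed in these comments, for the six sample words taken in pairs (n = 6, r = 2), Python's standard library can compute them directly. This is only a sanity check of the formulas, not part of any answer, and it assumes Python 3.8 or later (where `math.comb` and `math.perm` were added):

```python
import math

n, r = 6, 2  # six words in the sample files, taken in pairs

combinations = math.comb(n, r)  # n!/(r!(n-r)!): "onetwo" but not "twoone"
permutations = math.perm(n, r)  # n!/(n-r)!: both orders, no word paired with itself
with_repetition = n ** r        # n^r: every ordered pair, including "oneone"

print(combinations, permutations, with_repetition)  # 15 30 36
```

These numbers match the 15 lines of the combination output and the 30 lines of the permutation outputs in the answers.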

glenn jackman · Answer 1 · 2016-05-31T13:00:59.193

Ruby is a nice, concise language for this kind of thing:

ruby -e '
  words = ARGV.collect {|fname| File.readlines(fname)}.flatten.map(&:chomp)
  words.combination(2).each {|pair| puts pair.join("")}
' file[123] > file4

onetwo
onethree
onefour
onefive
onesix
twothree
twofour
twofive
twosix
threefour
threefive
threesix
fourfive
foursix
fivesix

You're quite right: `combination` produces "onetwo" but misses "twoone". Good thing there's also `permutation`:

ruby -e '
  words = ARGV.collect {|fname| File.readlines(fname)}.flatten.map(&:chomp)
  words.permutation(2).each {|pair| puts pair.join("")}
' file{1,2,3}

onetwo
onethree
onefour
onefive
onesix
twoone
twothree
twofour
twofive
twosix
threeone
threetwo
threefour
threefive
threesix
fourone
fourtwo
fourthree
fourfive
foursix
fiveone
fivetwo
fivethree
fivefour
fivesix
sixone
sixtwo
sixthree
sixfour
sixfive
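The relationship between Ruby's `combination` and `permutation` can be sketched with Python's `itertools`, used here only as an illustration: every pair from `permutations(words, 2)` is a pair from `combinations(words, 2)` in one of its two orders, so with distinct words the permutation list is exactly twice as long and lists the pairs in the same order as the Ruby output above.

```python
from itertools import combinations, permutations

words = ["one", "two", "three", "four", "five", "six"]  # the sample file contents

combos = ["".join(p) for p in combinations(words, 2)]
perms = ["".join(p) for p in permutations(words, 2)]

# Each combination (a, b) together with its reversal (b, a) covers every permutation.
both_orders = {a + b for a, b in combinations(words, 2)} | {
    b + a for a, b in combinations(words, 2)
}

print(len(combos), len(perms))    # 15 30
print(both_orders == set(perms))  # True
```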

It misses **twoone**, **threeone**, etc. – mpla_mpla, May 31 '16 at 12:13

agc · Answer 2 · 2016-05-31T16:21:58.520


Assuming the total size of the input files is smaller than `getconf ARG_MAX` (i.e. the maximum command-line length), this should work:

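set -- $( cat file[123] )     # split all the words into positional parameters
for f in "$@" ; do
    for g in "$@" ; do
        # skip pairs made of two identical words
        [ "$f" != "$g" ] && printf '%s%s\n' "$f" "$g"
    done
done > file4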

cat file4 outputs:

onetwo
onethree
onefour
onefive
onesix
twoone
twothree
twofour
twofive
twosix
threeone
threetwo
threefour
threefive
threesix
fourone
fourtwo
fourthree
fourfive
foursix
fiveone
fivetwo
fivethree
fivefour
fivesix
sixone
sixtwo
sixthree
sixfour
sixfive

(As per OP clarification, the above is a revision for permutations without repetition. See previous draft for combinations without repetition.)
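One detail of the shell loop in this answer: `[ "$f" != "$g" ]` compares the words themselves, so if the same word appeared in two of the files, the pairs joining those two copies would be dropped too. If "without repetition" should instead mean "never pair a line with itself", the test has to compare positions rather than values, which is also how Python's `itertools.permutations` treats its input (by position, not by value). A minimal Python sketch, where the function name `ordered_pairs` and the duplicated input are hypothetical:

```python
def ordered_pairs(words):
    """Yield first + second for every ordered pair of positions i != j."""
    for i, first in enumerate(words):
        for j, second in enumerate(words):
            if i != j:  # skip only the same line, not an equal word on another line
                yield first + second


# Hypothetical input: "two" appears twice, e.g. once in file1 and once in file2.
words = ["one", "two", "two"]
print(list(ordered_pairs(words)))
# ['onetwo', 'onetwo', 'twoone', 'twotwo', 'twoone', 'twotwo']
```

On the same three words, the shell loop's string comparison would print only four lines (`onetwo` twice and `twoone` twice), since both `twotwo` pairs are skipped.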

edited May 31 '16 at 16:21

answered May 30 '16 at 14:15


It misses **twoone**, **threeone**, etc. – mpla_mpla, May 31 '16 at 12:14
Fixed for permutations without repetition. – agc May 31 '16 at 16:23

iruvar · Answer 3 · 2016-06-01T12:03:38.223


A Python solution:

import fileinput
from itertools import permutations

# Read the lines of all three files as a single stream.
with fileinput.input(['file1', 'file2', 'file3']) as f:
    # Every ordered pair of two different lines, newlines stripped before joining.
    for x, y in permutations(f, 2):
        print('{}{}'.format(x.rstrip('\n'), y.rstrip('\n')))

onetwo
onethree
onefour
onefive
onesix
twoone
twothree
twofour
twofive
twosix
threeone
threetwo
threefour
threefive
threesix
fourone
fourtwo
fourthree
fourfive
foursix
fiveone
fivetwo
fivethree
fivefour
fivesix
sixone
sixtwo
sixthree
sixfour
sixfive
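The snippet above prints to standard output, while the question asks for a new file4. A variant that writes the file directly is sketched below; the function name `write_pairs` and the choice to skip empty lines are assumptions, not part of the original answer. Note that `itertools.permutations` materializes its whole input before producing pairs, so all words are held in memory, which is fine for inputs of the ~20K size mentioned in the comments.

```python
from itertools import permutations
from pathlib import Path


def write_pairs(paths, out_path):
    """Write every ordered pair of two different lines from `paths` to `out_path`."""
    words = []
    for path in paths:
        # One word per line: drop the line endings and skip empty lines.
        words.extend(w for w in Path(path).read_text().splitlines() if w)
    with open(out_path, "w") as out:
        for x, y in permutations(words, 2):
            out.write(x + y + "\n")
```

For the question's files, this would be called as `write_pairs(["file1", "file2", "file3"], "file4")`.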

edited Jun 01 '16 at 12:03

answered May 31 '16 at 16:41


@iruvar this is much faster than the bash solution, similar to @agc's, that I was using. – badner Jul 24 '17 at 15:29
@badner - nice - and the speed doesn't surprise me at all given that `python` file I/O and `itertools` are implemented in the C layer – iruvar Jul 24 '17 at 17:20

score 0 · Accepted Answer · edited Jul 03 '16 at 05:57

Use this:

perl -lne 'push @a, $_; END { for $x (@a) { for $y (@a) { print $x . $y } } }' FILE1 FILE2 FILE3

Output:

oneone
onetwo
onethree
onefour
onefive
onesix
oneseven
twoone
twotwo
twothree
twofour
twofive
twosix
twoseven
threeone
threetwo
threethree
threefour
threefive
threesix
threeseven
fourone
fourtwo
fourthree
fourfour
fourfive
foursix
fourseven
fiveone
fivetwo
fivethree
fivefour
fivefive
fivesix
fiveseven
sixone
sixtwo
sixthree
sixfour
sixfive
sixsix
sixseven
sevenone
seventwo
seventhree
sevenfour
sevenfive
sevensix
sevenseven
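The one-liner above pairs each word with every word, itself included (`oneone`, `twotwo`, ...), which is permutation with repetition (n^r pairs); the output shown also contains a seventh word, `seven`, so it has 7^2 = 49 lines. For comparison, a Python sketch of the two variants on a shortened sample, using `itertools.product` for the Perl behaviour and `itertools.permutations` for the pairs the question asks for:

```python
from itertools import permutations, product

words = ["one", "two", "three"]  # a shortened sample

# Every ordered pair, including a word with itself (what the Perl loop prints).
with_repetition = ["".join(p) for p in product(words, repeat=2)]

# Ordered pairs of two different lines (what the question asks for).
without_repetition = ["".join(p) for p in permutations(words, 2)]

print(with_repetition)
print(without_repetition)
```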

score 0 · Answer 5 · answered Nov 29 '16 at 22:02

TXR Lisp:

Warm-up: first, just get the data structure:

$ txr -p '(comb (get-lines (open-files *args*)) 2)' file1 file2 file3
(("one" "two") ("one" "three") ("one" "four") ("one" "five") ("one" "six")
 ("two" "three") ("two" "four") ("two" "five") ("two" "six") ("three" "four")
 ("three" "five") ("three" "six") ("four" "five") ("four" "six")
 ("five" "six"))

Now it's just a matter of getting the right output format. If we catenate each pair into one string and then print with tprint (implicitly, via the -t option), we are there.

First, the catenation via mapping through cat-str:

$ txr -p '[mapcar cat-str (comb (get-lines (open-files *args*)) 2)]' file1 file2 file3
("onetwo" "onethree" "onefour" "onefive" "onesix" "twothree" "twofour"
 "twofive" "twosix" "threefour" "threefive" "threesix" "fourfive"
 "foursix" "fivesix")

OK, we have the right data. Now we just use the tprint function (-t) instead of prinl (-p):

$ txr -t '[mapcar cat-str (comb (get-lines (open-files *args*)) 2)]' file1 file2 file3
onetwo
onethree
onefour
onefive
onesix
twothree
twofour
twofive
twosix
threefour
threefive
threesix
fourfive
foursix
fivesix

Finally, on rereading the question, we need permutations rather than combinations, so we use perm instead of comb:

$ txr -t '[mapcar cat-str (perm (get-lines (open-files *args*)) 2)]' file1 file2 file3
onetwo
onethree
onefour
onefive
onesix
twoone
twothree
twofour
twofive
twosix
threeone
threetwo
threefour
threefive
threesix
fourone
fourtwo
fourthree
fourfive
foursix
fiveone
fivetwo
fivethree
fivefour
fivesix
sixone
sixtwo
sixthree
sixfour
sixfive
