Pairwise combinations of filenames

Question

If I have n files in a directory, for example:

a
b
c

How do I get pairwise combinations of these files (non-directional) to pass to a function?

The expected output is

a-b
a-c
b-c

so that it can be passed to a function like

fn -file1 a -file2 b
fn -file1 a -file2 c
...

This is what I am trying out now.

for i in *.txt
 do
  for j in *.txt
   do
    if [ "$i" != "$j" ]
     then
      echo "Pairs $i and $j"
     fi
   done
 done

Output

Pairs a.txt and b.txt
Pairs a.txt and c.txt
Pairs b.txt and a.txt
Pairs b.txt and c.txt
Pairs c.txt and a.txt
Pairs c.txt and b.txt

I still have duplicates (a-b is the same as b-a), and I am thinking perhaps there is a better way to do this.

https://unix.stackexchange.com/q/11343/117549 – Jeff Schaller Dec 23 '18 at 19:38

score 10 · Accepted Answer · answered Dec 23 '18 at 19:56

Put the file names in an array and run through it manually with two loops.

You get each pairing only once if you visit only the index pairs with j > i, where i and j are the indexes used in the outer and the inner loop, respectively.

$ touch a b c d
$ f=(*)
$ for ((i = 0; i < ${#f[@]}; i++)); do 
      for ((j = i + 1; j < ${#f[@]}; j++)); do 
          echo "${f[i]} - ${f[j]}"; 
      done;
  done 
a - b
a - c
a - d
b - c
b - d
c - d

ilkkachu

Note that it is better to use `printf` rather than `echo`: https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo – cryptarch Dec 23 '18 at 20:06
@cryptarch, to be in line with the question, the content of the loop should be a call to `fn`, instead of `echo` or `printf`. `echo` works fine as an example here, though. – ilkkachu Dec 23 '18 at 21:24
Sure, it's not broken, you already got my +1 ;) – cryptarch Dec 23 '18 at 21:45
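
Following the two comments above, here is a minimal sketch of the accepted answer's loop with the body replaced by a call to `fn` and with `printf` used for output. The `fn` defined here is only a hypothetical stand-in that prints its arguments, and `each_pair` is a name invented for this illustration; neither comes from the original answer.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real command from the question;
# it only prints the arguments it receives.
fn() {
    printf 'fn %s\n' "$*"
}

# Call fn once for every unordered pair of the given names.
each_pair() {
    local f=("$@")      # copy the arguments into an array
    local i j
    for ((i = 0; i < ${#f[@]}; i++)); do
        # Starting j at i + 1 keeps j > i, so each pair is visited once.
        for ((j = i + 1; j < ${#f[@]}; j++)); do
            fn -file1 "${f[i]}" -file2 "${f[j]}"
        done
    done
}

# In a real directory this would be: each_pair *.txt
each_pair a.txt b.txt c.txt
```

Note that when a glob such as `*.txt` matches nothing, bash passes the pattern through literally unless `shopt -s nullglob` is set; with fewer than two names the inner loop never runs, so `fn` is not called.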

score 5 · Answer 2 · answered Dec 23 '18 at 20:05

You're very close in your script, but you want to remove duplicates; i.e., a-b is considered a duplicate of b-a.

We can use an inequality to handle this: only display the pair if the first filename sorts before the second. This ensures each pair is printed only once.

for i in *.txt
do
  for j in *.txt
  do
    if [ "$i" \< "$j" ]
    then
     echo "Pairs $i and $j"
    fi
  done
done

This gives the output

Pairs a.txt and b.txt
Pairs a.txt and c.txt
Pairs b.txt and c.txt

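This isn't the most efficient approach (it tests all n^2 ordered pairs, including each file against itself, and discards more than half of them), but it may be good enough for your needs.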

Stephen Harris

This will take more than twice as long as https://unix.stackexchange.com/a/490657/305714 because you are checking each pair twice rather than restricting the loop to avoid redundancy – cryptarch Dec 23 '18 at 20:08
Yes, but without knowing the cost of `fn` it's hard to know if this overhead is significant or not. Taking 0.2s instead of 0.1s doesn't mean anything if every call to `fn` takes 1 second. Sometimes the naive algorithms are just fine ;-) In this case I just fixed the original code, rather than providing a more optimised alternative, because I considered it a better "teaching" solution. – Stephen Harris Dec 23 '18 at 20:16
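
To put rough numbers on the overhead discussed in these comments, the sketch below counts how often the inner-loop body runs in each approach. The function names are invented for this illustration, and the counts say nothing about the cost of `fn` itself.

```bash
#!/usr/bin/env bash
# Count inner-loop iterations when looping over every (i, j) and
# filtering afterwards, as in the answer above.
count_all_ordered() {
    local n=$1 i j count=0
    for ((i = 0; i < n; i++)); do
        for ((j = 0; j < n; j++)); do
            count=$((count + 1))
        done
    done
    echo "$count"
}

# Count inner-loop iterations when the inner loop starts at i + 1,
# as in the accepted answer.
count_unordered() {
    local n=$1 i j count=0
    for ((i = 0; i < n; i++)); do
        for ((j = i + 1; j < n; j++)); do
            count=$((count + 1))
        done
    done
    echo "$count"
}

count_all_ordered 4    # prints 16, i.e. n^2
count_unordered 4      # prints 6, i.e. n(n-1)/2
```

The ratio n^2 / (n(n-1)/2) = 2n/(n-1) is always a little above 2, which matches the "more than twice" remark; whether that matters depends on how expensive each call to `fn` is.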

score 1 · Answer 3 · answered Dec 23 '18 at 19:36

Using a join trick, for filenames without whitespace:

Sample list of files:

$ ls *.json | head -4
1.json
2.json
comp.json
conf.json

$ join -j9999 -o1.1,2.1 <(ls *.json | head -4) <(ls *.json | head -4) | awk '$1 != $2'
1.json 2.json
1.json comp.json
1.json conf.json
2.json 1.json
2.json comp.json
2.json conf.json
comp.json 1.json
comp.json 2.json
comp.json conf.json
conf.json 1.json
conf.json 2.json
conf.json comp.json

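The -j option selects the field to join on. Field 9999 does not exist in either input, so the join key is empty on every line, every line matches every other line, and the result is a Cartesian product. The awk filter then drops the pairs in which both names are equal.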
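
The output above still lists each pair in both orders (1.json 2.json and 2.json 1.json). A possible variant, sketched here under the assumptions that `join` behaves like GNU join and that the names contain no whitespace, keeps only the ordering in which the first name sorts before the second. The function name `unordered_pairs` is invented for this illustration.

```bash
#!/usr/bin/env bash
# Print each unordered pair of the given names once, one pair per line.
# Assumes a join that behaves like GNU join and names without whitespace.
unordered_pairs() {
    join -j9999 -o1.1,2.1 <(printf '%s\n' "$@") <(printf '%s\n' "$@") |
        awk '($1 "") < ($2 "")'   # appending "" forces a string comparison
}

# Stand-in for: unordered_pairs *.json
unordered_pairs 1.json 2.json comp.json conf.json
```

For four names this prints six lines instead of twelve. The string comparison is forced so that names that look like numbers are not compared numerically by awk.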

score 0 · Answer 4 · answered Dec 25 '18 at 21:43

for i in *.txt ; do
  for j in *.txt ; do
    if [ "$i" '<' "$j" ] ; then
      echo "Pairs $i and $j"
    fi
  done
done

Ole Tange

score 0 · Answer 5 · answered Aug 18 '23 at 11:46

You could use Perl's Algorithm::Combinatorics module to avoid having to devise the algorithm yourself.

perl -MAlgorithm::Combinatorics=combinations -e '
  if ((@files = <*.txt>) >= 2) {
    for (combinations(\@files, 2)) {
      system "cmd", "-file1", $_->[0], "-file2", $_->[1];
    }
  } else {
    die "Not enough txt files in the current working directory\n";
  }'

See perldoc Algorithm::Combinatorics for details and other things that module can do.
