7

If I have n files in a directory, for example:

a
b
c

How do I get pairwise combinations of these files (non-directional) to pass to a function?

The expected output is

a-b
a-c
b-c

so that it can be passed to a function like

fn -file1 a -file2 b
fn -file1 a -file2 c
...

This is what I am trying out now.

for i in *.txt
 do
  for j in *.txt
   do
    if [ "$i" != "$j" ]
     then
      echo "Pairs $i and $j"
     fi
   done
 done

Output

Pairs a.txt and b.txt
Pairs a.txt and c.txt
Pairs b.txt and a.txt
Pairs b.txt and c.txt
Pairs c.txt and a.txt
Pairs c.txt and b.txt

I still have duplicates (a-b is the same as b-a), and I suspect there is a better way to do this.

mindlessgreen

5 Answers

10

Put the file names in an array and run through it manually with two loops.

You get each pairing only once if j > i, where i and j are the indexes used in the outer and the inner loop, respectively. Starting the inner loop at j = i + 1 guarantees this.

$ touch a b c d
$ f=(*)
$ for ((i = 0; i < ${#f[@]}; i++)); do 
      for ((j = i + 1; j < ${#f[@]}; j++)); do 
          echo "${f[i]} - ${f[j]}"; 
      done;
  done 
a - b
a - c
a - d
b - c
b - d
c - d
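
To tie this back to the question, the `echo` can be swapped for the real command. The following is only a minimal sketch, assuming bash: `for_each_pair` is a hypothetical helper name, and `fn` with its `-file1`/`-file2` options is taken from the question and stubbed here so the example runs on its own.

```shell
#!/usr/bin/env bash
# Stub standing in for the question's `fn` (an assumption; replace with the real command).
fn() { printf 'fn %s\n' "$*"; }

# for_each_pair CMD NAME...: run CMD -file1 X -file2 Y once per unordered pair.
# (Hypothetical helper, not part of the original answer.)
for_each_pair() {
  local cmd=$1; shift
  local -a f=("$@")
  local i j
  for ((i = 0; i < ${#f[@]}; i++)); do
    # Starting at i + 1 skips self-pairs and the reversed duplicates.
    for ((j = i + 1; j < ${#f[@]}; j++)); do
      "$cmd" -file1 "${f[i]}" -file2 "${f[j]}"
    done
  done
}

for_each_pair fn a b c
```

With real files, the call would be something like `for_each_pair fn *.txt`; quoting the array elements keeps names with spaces intact.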
ilkkachu
  • Note that it is better to use `printf` rather than `echo`: https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo – cryptarch Dec 23 '18 at 20:06
  • @cryptarch, to be in line with the question, the content of the loop should be a call to `fn`, instead of `echo` or `printf`. `echo` works fine as an example here, though. – ilkkachu Dec 23 '18 at 21:24
  • Sure, it's not broken, you already got my +1 ;) – cryptarch Dec 23 '18 at 21:45
5

Your script is very close, but you want to remove duplicates; i.e., a-b is considered a duplicate of b-a.

We can use an inequality to handle this: only print a pair if the first file name sorts before the second. This ensures that each pair is printed exactly once.

for i in *.txt
do
  for j in *.txt
  do
    if [ "$i" \< "$j" ]
    then
     echo "Pairs $i and $j"
    fi
  done
done

This gives the output

Pairs a.txt and b.txt
Pairs a.txt and c.txt
Pairs b.txt and c.txt

This isn't the most efficient approach (it tests all n^2 ordered pairs and keeps fewer than half of them), but it may be good enough for your needs.
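
The same filter can also be written with bash's `[[ ... ]]`, where `<` needs no escaping. This is a minimal sketch, assuming bash rather than plain sh; `print_pairs` is a hypothetical name, and it takes the names as arguments so it can be tried without creating any files.

```shell
#!/usr/bin/env bash
# print_pairs NAME...: print each unordered pair of distinct names once.
# (Hypothetical helper, not part of the original answer.)
print_pairs() {
  local i j
  for i in "$@"; do
    for j in "$@"; do
      # Inside [[ ]], < is a string comparison and is not parsed as a redirection.
      if [[ $i < $j ]]; then
        echo "Pairs $i and $j"
      fi
    done
  done
}

print_pairs a.txt b.txt c.txt
```

With real files this would be called as `print_pairs *.txt`. Note that `[[ ... ]]` compares strings using the current locale's collation order.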

Stephen Harris
  • This will take more than twice as long as https://unix.stackexchange.com/a/490657/305714 because you are checking each pair twice rather than restricting the loop to avoid redundancy – cryptarch Dec 23 '18 at 20:08
  • Yes, but without knowing the cost of `fn` it's hard to know if this overhead is significant or not. Taking 0.2s instead of 0.1s doesn't mean anything if every call to `fn` takes 1 second. Sometimes the naive algorithms are just fine ;-) In this case I just fixed the original code, rather than providing a more optimised alternative, because I considered it a better "teaching" solution. – Stephen Harris Dec 23 '18 at 20:16
1

Using a join trick, for filenames without whitespace:

Sample list of files:

$ ls *.json | head -4
1.json
2.json
comp.json
conf.json

$ join -j9999 -o1.1,2.1 <(ls *.json | head -4) <(ls *.json | head -4) | awk '$1 != $2'
1.json 2.json
1.json comp.json
1.json conf.json
2.json 1.json
2.json comp.json
2.json conf.json
comp.json 1.json
comp.json 2.json
comp.json conf.json
conf.json 1.json
conf.json 2.json
conf.json comp.json

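  • The -j option selects the field to join on. Field 9999 does not exist in either input, so every line gets the same (empty) join key and join pairs every line of the first input with every line of the second, i.e. it produces the Cartesian product.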
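
The output above still lists each pair in both orders. As a sketch of one way to keep only one order, the awk filter can compare the two fields as strings instead of testing for inequality. This assumes bash (for process substitution) and GNU join, as in the command above; `unordered_pairs_join` is a hypothetical name, and it reads names from standard input rather than from `ls` so it can be tried anywhere.

```shell
#!/usr/bin/env bash
# unordered_pairs_join: read one name per line on stdin and print each
# unordered pair once. (Hypothetical helper; assumes names without whitespace.)
unordered_pairs_join() {
  local list
  list=$(cat)
  # The nonexistent join field 9999 gives every line the same empty key,
  # so join emits the Cartesian product of the two inputs.
  join -j9999 -o1.1,2.1 <(printf '%s\n' "$list") <(printf '%s\n' "$list") |
    # Appending "" forces awk to compare the fields as strings, not numbers.
    awk '($1 "") < ($2 "")'
}

printf '%s\n' 1.json 2.json comp.json conf.json | unordered_pairs_join
```

Forcing a string comparison matters for names that look numeric (for example `2` and `10`), which awk would otherwise compare as numbers.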
RomanPerekhrest
0
for i in *.txt ; do
  for j in *.txt ; do
    if [ "$i" '<' "$j" ] ; then
      echo "Pairs $i and $j"
    fi
  done
done
Ole Tange
0

You could use Perl's Algorithm::Combinatorics module to avoid having to devise the algorithm yourself.

perl -MAlgorithm::Combinatorics=combinations -e '
  if ((@files = <*.txt>) >= 2) {
    for (combinations(\@files, 2)) {
      system "cmd", "-file1", $_->[0], "-file2", $_->[1];
    }
  } else {
    die "Not enough txt files in the current working directory\n";
  }'

See perldoc Algorithm::Combinatorics for details and other things that module can do.

Stéphane Chazelas