How to do anti-join or inverse join in bash

Question

I want to perform what some data analysis software calls an anti-join: remove from one list those lines that match lines in another list. Here is some toy data and the expected output:

$ echo -e "a\nb\nc\nd" > list1
$ echo -e "c\nd\ne\nf" > list2
$ antijoincommand list1 list2
a
b

Does this answer your question? [Is there a tool to get the lines in one file that are not in another?](https://unix.stackexchange.com/questions/28158/is-there-a-tool-to-get-the-lines-in-one-file-that-are-not-in-another) — muru, May 25 '20 at 07:16
@Muru, yes, that post provides the solutions presented in Terdon's answer. However, when I was searching for "bash anti-join" (the terminology I associate with this kind of process), I didn't find anything useful. My OP (which others have edited) stated that my explicit purpose in asking this question was to associate the term "anti-join" with the solutions, so that searching this term yields these solutions. Thanks. — Josh, May 25 '20 at 15:08

terdon · Accepted Answer · 2020-05-24T13:44:47.660

9

I wouldn't use join for this because join requires input to be sorted, which is an unnecessary complication for such a simple job. You could instead use grep:

$ grep -vxFf list2 list1
a
b
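To illustrate why each of those flags matters, here is a minimal sketch. The files `names` and `exclude` are hypothetical and not part of the question; they are chosen so that substring and regex matching give different answers from exact matching.

```shell
# Hypothetical data: "a.c" contains a regex metacharacter,
# and "a" is a substring of every line in names.
printf '%s\n' ab abc a.c > names
printf '%s\n' a a.c > exclude

# -f FILE: read patterns from FILE, -F: treat them as fixed strings,
# -x: match whole lines only, -v: print the lines that do NOT match.
strict=$(grep -vxFf exclude names)

# Without -F the pattern "a.c" is a regex, so it also removes "abc".
regex=$(grep -vxf exclude names)

# Without -x the pattern "a" matches anywhere in a line, so every line is removed.
# grep exits non-zero when it prints nothing, hence the "|| true".
loose=$(grep -vFf exclude names || true)

printf 'exact match (-vxFf):\n%s\n' "$strict"
printf 'regex match (-vxf):\n%s\n' "$regex"
printf 'substring match (-vFf):\n%s\n' "$loose"
```

Only the `-vxFf` form behaves like an anti-join on whole lines; the other two silently drop more than intended.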

Or awk:

$ awk 'NR==FNR{++a[$0]} !a[$0]' list2 list1
a
b

If the files are already sorted, an alternative to join -v 1 would be comm -23:

$ comm -23 list1 list2
a
b
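If the files are not sorted yet, they can be sorted first. A minimal sketch, assuming hypothetical unsorted files `unsorted1` and `unsorted2` that hold the same toy data in shuffled order:

```shell
# Hypothetical unsorted variants of the toy data.
printf '%s\n' d b a c > unsorted1
printf '%s\n' f c e d > unsorted2

# comm expects both inputs to be sorted, so sort each file first.
sort unsorted1 > sorted1
sort unsorted2 > sorted2

# -2 suppresses lines unique to the second file,
# -3 suppresses lines common to both, leaving lines unique to the first.
result=$(comm -23 sorted1 sorted2)
printf '%s\n' "$result"
```

In bash the temporary files can be avoided with process substitution: `comm -23 <(sort unsorted1) <(sort unsorted2)`. Note that the output comes back in sorted order, not in the original order of the first file.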

edited May 24 '20 at 13:44

answered May 24 '20 at 13:39

terdon

Avoiding `sort` with `grep` is great for the toy data I provided. Thanks! In the real world, my file1 often has multiple columns of data, one of which is being used for the join. A modified version of your `awk` code would address this use case. – Josh May 24 '20 at 13:46
@Josh yes, just replace `$0` with `$N`, where `N` is the number of the field you are joining on. – terdon May 24 '20 at 13:47
This works even if the column numbers in file1 and file2 are different, e.g. awk 'NR==FNR{++a[$2]} !a[$5]' list2 list1. It is quite usual for the tag file to be in a different format from the main data. – Paul_Pedant May 24 '20 at 14:14
upvoted for the `comm -23` command – user2297550 Jan 30 '22 at 08:17
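The multi-column case discussed in the comments above can be sketched as follows. The file names `data.tsv` and `keys.txt`, the tab separator, and the choice of column 2 as the key are all assumptions made for illustration. This variant uses `next` and an `in` test instead of counting, which is a common way to write the same idea.

```shell
# Hypothetical data: data.tsv is keyed on column 2, keys.txt has one key per line.
printf 'x1\tc\t10\nx2\ta\t20\nx3\td\t30\nx4\tb\t40\n' > data.tsv
printf '%s\n' c d e f > keys.txt

# While reading the first file (NR==FNR), record each key and skip to the next line.
# While reading the second file, print rows whose column 2 was never recorded.
result=$(awk -F'\t' 'NR==FNR { seen[$1]; next } !($2 in seen)' keys.txt data.tsv)
printf '%s\n' "$result"
```

The rows keep their original order and no sorting is needed. One caveat: if the first file is empty, `NR==FNR` stays true while the second file is read, so nothing is printed.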

score 3 · Answer 2 · edited Jul 12 '23 at 22:04

One way to do this with the join utility is:

$ join -v 1 list1 list2
a
b

From the manpage:

-a FILENUM

: also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2

-v FILENUM

: like -a FILENUM, but suppress joined output lines
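For multi-column data, join can also be pointed at a specific field with -1 and -2. A minimal sketch, assuming hypothetical files `data.txt` (space-separated, keyed on field 2) and `keys.txt`; both inputs have to be sorted on the join field first:

```shell
# Use one collation for sort and join so they agree on the ordering.
export LC_ALL=C

# Hypothetical data: data.txt is keyed on field 2, keys.txt has one key per line.
printf 'x1 c\nx2 a\nx3 d\nx4 b\n' > data.txt
printf '%s\n' c d e f > keys.txt

# join requires both inputs to be sorted on their join fields.
sort -k2,2 data.txt > data.sorted
sort keys.txt > keys.sorted

# -1 2: join on field 2 of the first file, -2 1: on field 1 of the second,
# -v 1: print only the unpairable lines from the first file.
result=$(join -1 2 -2 1 -v 1 data.sorted keys.sorted)
printf '%s\n' "$result"
```

This prints the rows keyed `a` and `b`. join writes the join field first, so the columns come out reordered relative to the input.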

edited Jul 12 '23 at 22:04

Geremia

answered May 24 '20 at 13:28

Josh

jubilatious1 · Answer 3 · 2023-07-12T23:06:33.747

Using Raku (formerly known as Perl_6)

Raku has Set object types, and you can read individual files to create Sets from lines:

~$ raku -e 'my $a = Set.new: "list1".IO.lines; 
            my $b = Set.new: "list2".IO.lines; 
            say "list1 = ", $a;
            say "list2 = ", $b;'
list1 = Set(a b c d)
list2 = Set(c d e f)

You can compute an asymmetric Set difference with either the ASCII infix (-) or the Unicode infix ∖:

~$ raku -e 'my $a = Set.new: "list1".IO.lines; 
            my $b = Set.new: "list2".IO.lines; 
            say $a (-) $b;'
Set(a b)

~$ raku -e 'my $a = Set.new: "list1".IO.lines; 
            my $b = Set.new: "list2".IO.lines; 
            say $b (-) $a;'
Set(e f)

OTOH, sometimes you need a symmetric Set difference, and Raku has you covered. Use either the ASCII infix (^) or the Unicode infix ⊖:

~$ raku -e 'my $a = Set.new: "list1".IO.lines; 
            my $b = Set.new: "list2".IO.lines; 
            say $a (^) $b;'
Set(a b e f)

Finally, you can get linewise output by changing the final line to .keys.put for …. Sets are unordered, so the order of the printed lines can vary between runs.
The final symmetric Set difference example below uses the Unicode infix ⊖ operator:

~$ raku -e 'my $a = Set.new: "list1".IO.lines;
            my $b = Set.new: "list2".IO.lines;
            .keys.put for $a ⊖ $b;'
f
e
a
b
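For comparison with the shell tools used in the other answers, a symmetric difference can also be sketched with comm -3. This is not part of the Raku approach above. It assumes both files are already sorted (as the toy data is) and that no line contains a tab character:

```shell
# Toy data from the question (already sorted).
printf '%s\n' a b c d > list1
printf '%s\n' c d e f > list2

# -3 suppresses lines common to both files. Lines unique to the second
# file are printed with a leading tab, which tr strips here.
result=$(comm -3 list1 list2 | tr -d '\t')
printf '%s\n' "$result"
```

Because `tr -d '\t'` deletes every tab, this only suits single-column data like the toy lists.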

https://docs.raku.org/type/Set
https://docs.raku.org/language/setbagmix#Operators_with_set_semantics
https://raku.org
