How to check whether file1 is a prefix of file2?

Question

I have two files, 124665 and 124858 bytes in size, and want to check whether file1 is a prefix of file2.

score 12 · Answer 1 · edited Jun 07 '14 at 19:23

If your system has the cmp command from GNU diffutils, then one option is

cmp -n 124665 file1 file2

to compare at most the first 124665 bytes of the two files and report whether they differ, or, more generally:

cmp -n "$(wc -c < file1)" file1 file2
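
For use in a script, here is a minimal sketch wrapping the same idea in a shell function. The name `is_prefix` is hypothetical, and it still assumes GNU `cmp` for `-n`. The `-s` flag silences `cmp` so that only its exit status matters.

```shell
# is_prefix FILE1 FILE2: succeed if FILE1 is a prefix of FILE2.
# Hypothetical helper; relies on GNU cmp for the -n option.
is_prefix() {
  # Byte count of FILE1; tr strips the padding some wc implementations add.
  size=$(wc -c < "$1" | tr -d ' ')
  # Compare at most $size bytes; -s suppresses output so only the exit
  # status is used. If FILE2 is shorter than FILE1, cmp reaches EOF on
  # FILE2 and reports a difference.
  cmp -s -n "$size" -- "$1" "$2"
}
```

Usage: `if is_prefix file1 file2; then echo "file1 is a prefix of file2"; fi`. The exit status is cmp's: 0 if file1 is a prefix, 1 if it is not, and 2 on trouble such as an unreadable file.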

edited Jun 07 '14 at 19:23

Stéphane Chazelas

answered Jun 07 '14 at 16:37

steeldriver

@StephaneChazelas I'm second-guessing myself here, but would it have been better to suggest `$(stat -c %s file1)` for the size in bytes? Does `wc` actually open and process the whole file to get the byte count? – steeldriver Jun 07 '14 at 19:51

No, most `wc` implementations optimise that case with an `fstat()` (and/or an `lseek(SEEK_END)`), so it is as efficient as it gets. On the other hand, `stat -c` is GNU-specific. – Stéphane Chazelas Jun 07 '14 at 19:52

Although if you're going to require the GNU-specific `cmp`, you might reasonably assume GNU-specific `stat`. – Barmar Jun 11 '14 at 19:04

score 11 · Accepted Answer · edited Jun 07 '14 at 19:21

Supposing you have the size of file1 in the variable FILE1_SZ and your head implementation supports the (non-standard) -c option:

if head -c "$FILE1_SZ" file2 | cmp -s - file1; then
    echo "file1 is a prefix of file2"
else
    echo "file1 is not a prefix of file2"
fi
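
The snippet above leaves FILE1_SZ to the reader. A minimal sketch filling it in from `wc -c`, wrapped in a hypothetical function `is_prefix_head`; it still assumes a `head` that supports the non-standard `-c` option. The `cmp` side uses only `-s` and the `-` operand (standard input), both of which are in POSIX, so no GNU-specific `cmp` is needed.

```shell
# is_prefix_head FILE1 FILE2: succeed if FILE1 is a prefix of FILE2.
# Hypothetical helper; assumes head supports the non-standard -c option.
is_prefix_head() {
  # Size of FILE1 in bytes; tr removes any padding added by wc.
  FILE1_SZ=$(wc -c < "$1" | tr -d ' ')
  # Take the first FILE1_SZ bytes of FILE2 and compare them silently
  # with FILE1; "-" tells cmp to read the piped data from standard input.
  # If FILE2 is shorter, cmp hits EOF on stdin and reports a difference.
  head -c "$FILE1_SZ" -- "$2" | cmp -s - "$1"
}
```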

edited Jun 07 '14 at 19:21

Stéphane Chazelas

answered Jun 07 '14 at 15:50

Joseph R.

@StéphaneChazelas Can you please explain why `cmp` would be better than `diff` here? – Joseph R. Jun 07 '14 at 19:40

Because `cmp` does a simple byte-by-byte comparison and returns as soon as it finds a difference, while `diff` is a text utility that uses a complex algorithm to show you all the differences between the two files, which you don't care about here. – Stéphane Chazelas Jun 07 '14 at 20:02

score 3 · Answer 3 · edited Jun 08 '14 at 04:18

GNU cmp can solve the problem in an easier way:

cmp file1 file2

There are four possible outputs (barring some sort of error).

- No output: the files are identical.
- `cmp: EOF on file1`: file1 is a prefix of file2.
- `cmp: EOF on file2`: file2 is a prefix of file1.
- `file1 file2 differ: byte NNN, line MMM`: neither is a prefix of the other.

Unfortunately, this is a little awkward to use in a script, since the exit status does not distinguish these cases: cmp exits with status 1 whenever the files differ, including both EOF cases. Moreover, the `EOF on file1` message goes to stderr, while the `file1 file2 differ` message goes to stdout.

I presume that other versions of cmp do something similar, but I have not checked.
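
To get all four cases in a script without parsing cmp's messages (whose exact wording may vary between cmp versions), one alternative, not the approach in this answer, is to compare the file sizes and then run `cmp -s -n` on the common length. A minimal sketch, assuming GNU cmp for `-n`; the function name `prefix_relation` and its output strings are hypothetical.

```shell
# prefix_relation FILE1 FILE2: print how the two files relate.
# Hypothetical helper; assumes GNU cmp (for -n). Output labels refer to
# argument order: "first" is FILE1, "second" is FILE2.
prefix_relation() {
  # Byte sizes of both files; tr strips any padding added by wc.
  s1=$(wc -c < "$1" | tr -d ' ')
  s2=$(wc -c < "$2" | tr -d ' ')
  # Bail out if either size is missing or non-numeric (e.g. unreadable file).
  for s in "$s1" "$s2"; do
    case $s in
      '' | *[!0-9]*) echo "error"; return 2 ;;
    esac
  done
  # One file can only be a prefix of the other if their common
  # length, min(s1, s2), compares equal byte for byte.
  if [ "$s1" -lt "$s2" ]; then n=$s1; else n=$s2; fi
  cmp -s -n "$n" -- "$1" "$2"
  case $? in
    0) ;;                               # common part matches
    1) echo "neither"; return 1 ;;      # differ within the common part
    *) echo "error"; return 2 ;;        # cmp itself failed
  esac
  if [ "$s1" -eq "$s2" ]; then
    echo "identical"
  elif [ "$s1" -lt "$s2" ]; then
    echo "first is a prefix of second"
  else
    echo "second is a prefix of first"
  fi
}
```

The sizes and the comparison are taken at different moments, so this sketch assumes the files do not change in between.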

edited Jun 08 '14 at 04:18

answered Jun 07 '14 at 18:11

Nate Eldredge

`cmp` is not a GNU-only command, nor did it originate there; it was already in the first version of Unix in the early 70s. The `-n` option is GNU-specific, though. – Stéphane Chazelas Jun 07 '14 at 19:26
You could do `cmp file1 file2 2>&1 | grep 'EOF on file1'` – David Z Jun 08 '14 at 01:39
@StéphaneChazelas: That is true. I didn't mean to imply that `cmp` was unique to GNU, just that GNU `cmp` was the only version I tried. I added a sentence to clarify. – Nate Eldredge Jun 08 '14 at 04:19
@DavidZ: Yes, you could, but it gets a little less robust. Imagine that you are trying to do this with two files supplied by the user, and one of them is named `file1` and the other is named `file12`. (Or worse yet, what if the second file is named `EOF on file1`?) Solving this robustly using `cmp` is probably much more trouble than writing the obvious 5-line program in C... – Nate Eldredge Jun 08 '14 at 04:23
There may be contexts where a C program isn't practical, though. And it's not that hard to make it fairly robust, because the output of `cmp` is so tightly constrained. Using the `-x` option on `grep` to match the entire line will take care of all but the most exotic cases (e.g. newlines in the filename). – David Z Jun 08 '14 at 04:29
