
I read that using the sha1sum command will run the SHA-1 algorithm and give you a "unique" result, but how can that be?

`sha1sum` gives you a 40-character hash, like this:

e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e

This is 40 characters long, and has hexadecimal characters, so the hash can have at most 16^40 combinations.

This should mean that, for any two random files, there is a small chance that their hash sum would be exactly the same, right?
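For instance, a quick sanity check (using Python's `hashlib` here purely for illustration) confirms the digest length and the size of the output space:

```python
import hashlib

# SHA-1 always produces a 160-bit digest, printed as 40 hex characters.
digest = hashlib.sha1(b"example data").hexdigest()
print(len(digest))       # 40

# 40 hex characters encode 16**40 values, which is exactly 2**160.
print(16**40 == 2**160)  # True
```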

iamAguest

  • See also a couple of related questions on crypto.SE: [Is it fair to assume that SHA1 collisions won't occur on a set of <100k strings](https://crypto.stackexchange.com/q/2583/34456), [How secure is SHA1? What are the chances of a real exploit?](https://crypto.stackexchange.com/q/48289/34456), [Why is SHA-1 considered broken?](https://crypto.stackexchange.com/q/3690/34456) – ilkkachu Aug 21 '18 at 09:29
  • This does not seem to be related to Unix, but to mathematics. – Kusalananda Aug 21 '18 at 09:34
  • I'm voting to close this question as off-topic because, even if it mentions the `sha1sum` command, it has nothing to do with Unix&Linux; it's a generic question about hash functions. – dr_ Aug 21 '18 at 12:50

1 Answer


A SHA-1 hash, like any hash, isn’t unique; in fact collisions have been found (and more importantly, crafted).

The desirable property of a hash function is that it's difficult to construct two files which produce the same hash, not that the hashes produced are unique (as you mention, a SHA-1 sum has 160 bits, so there are only 2^160 different hashes); "difficult" meaning that you can't find a collision faster than brute-force search. If you pick one file, another randomly-chosen file has a one-in-2^160 chance of having the same hash. SHA-1 has been considered insecure since 2005; you'll often see the odds given as one-in-2^80, thanks to the birthday attack (those are the odds of finding two colliding files in a large haystack, not of finding a colliding file for a specific target).
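A rough way to see where both figures come from is the standard birthday-bound approximation, P(collision) ≈ 1 − exp(−k(k−1)/2^(n+1)) for k random digests of n bits. This is a back-of-the-envelope sketch, not an exact computation:

```python
import math

def collision_probability(k, bits=160):
    """Approximate chance that at least two of k random n-bit digests
    collide, using the birthday bound 1 - exp(-k*(k-1) / 2**(bits+1))."""
    return -math.expm1(-k * (k - 1) / 2.0 ** (bits + 1))

# Two random files: the chance is about 2**-160, i.e. effectively zero.
print(collision_probability(2))
# Around 2**80 digests, a collision becomes likely (roughly 39 %).
print(collision_probability(2 ** 80))
```

Note that this models an ideal hash with uniformly random outputs; the real-world attacks on SHA-1 do much better than these odds, which is exactly why it's considered broken.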

Stephen Kitt
  • Two files have a 2^-160 chance of colliding, but having 2^80 files to test against each other gives about a 50 % chance of a collision, which is why the bit-strength of a hash is usually divided by two to get the strength against collisions. (It's not a question of having a collision between the two files Stephen has, but having one within ~all files, ever.) – ilkkachu Aug 21 '18 at 09:18
  • 2^80 is about 10^24, which is about the mass of the Earth in kilograms, just to give the number some scale. Random collisions are astronomically improbable, it's the brokenness of the hash you need to care about. – ilkkachu Aug 21 '18 at 09:23
  • Ah yes, thanks @ilkkachu, it’s really about the average number of operations required to find a collision (statistically), correct? So the basic odds are 1:2^160, but you don’t need that many operations to find a collision because the start point isn’t fixed. As in, if you pick one file, then another, there’s a 1:2^160 chance they’ll collide, but that’s not the same as finding a collision in a 2^80-sized haystack. – Stephen Kitt Aug 21 '18 at 09:30
  • @StephenKitt, yea, exactly. Collisions are the case where you don't care what the colliding files are, so the [birthday attack](https://en.wikipedia.org/wiki/Birthday_attack) applies, hence 2^-80. Preimage resistance is different, that's where you're trying to find another file that gives the same hash as some single particular file, so the odds are 2^-160 for each file. – ilkkachu Aug 21 '18 at 09:39