
Are there any general solutions to check if a file is corrupt or not? For example, whether a video file is bad, or a compressed file is corrupt, etc.

RegDwight
LanceBaynes

3 Answers


If you know at some point in time the file is good, you can make a checksum of it and use it to compare later to make sure it's still whole. This is useful before transferring files between mediums or across networks.
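
As a minimal sketch of that idea, assuming GNU coreutils' `sha256sum` is available: the helper names `record_checksum` and `verify_checksum` are hypothetical, and the `.sha256` sidecar file is just one possible convention.

```shell
#!/bin/bash
# Sketch only: record_checksum and verify_checksum are hypothetical helpers.

record_checksum() {
    # Run while the file is known to be good.
    # Writes "<digest>  <path>" to a sidecar file next to the original.
    sha256sum -- "$1" > "$1.sha256"
}

verify_checksum() {
    # Recomputes the digest and compares it with the recorded one.
    # Returns 0 if the file is unchanged, non-zero otherwise.
    sha256sum --check --quiet -- "$1.sha256"
}
```

The sidecar file stores the path exactly as it was given, so a relative path only verifies when `verify_checksum` is run from the same directory as `record_checksum`.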

If you don't know of a good state of the file, then no, there is no universal way of checking for corruption. Only the specific file format in each case determines what is or is not corrupt data.

Caleb

No, there aren't any general solutions. The only way to check if a file is corrupt is to try and read it; only software which knows how to read that particular format can do that.

What you could do is use `file` to identify the type of the file, and then use the type to choose an appropriate program to check the file. You could write a script like this:

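#!/bin/bash -eu

FILENAME=$1

# Use the first comma-separated field of file's description as the type.
FILETYPE="$(file -b -- "$FILENAME" | head -1 | cut -d , -f 1)"
case "$FILETYPE" in
    "gzip compressed data") CHECKER=(gunzip -t) ;;
    # many, many more lines here
    *) echo "Unknown type: $FILETYPE" >&2; exit 1 ;;
esac

# Run the chosen checker; its exit status says whether the file is intact.
"${CHECKER[@]}" -- "$FILENAME"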

But you'd have a lot of work to do to fill out the case statement.
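
To give a feel for how the table might grow, here is a hedged sketch with a few more branches. The function names `checker_for_type` and `check_file` are hypothetical, and the type strings are assumptions about what `file -b` prints on a typical Linux system; they can differ between versions of `file`.

```shell
#!/bin/bash
# Sketch only: checker_for_type and check_file are hypothetical helpers.

# Maps the first field of `file -b` output to a command that tests integrity.
# The type strings are assumed, not guaranteed; check them on your system.
checker_for_type() {
    case "$1" in
        "gzip compressed data")  echo "gzip -t" ;;
        "bzip2 compressed data") echo "bzip2 -t" ;;
        "XZ compressed data")    echo "xz -t" ;;
        "Zip archive data")      echo "unzip -tq" ;;
        *) return 1 ;;
    esac
}

check_file() {
    local ftype checker
    ftype="$(file -b -- "$1" | head -1 | cut -d , -f 1)"
    checker="$(checker_for_type "$ftype")" || {
        echo "Unknown type: $ftype" >&2
        return 2
    }
    # Word splitting is intentional here: the table holds a command plus a flag.
    $checker "$1"
}
```

Keeping the lookup in its own function means the table can be extended or tested without touching any real files.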

It's possible that someone has already written such a script (or program), but I don't know of any.

Tom Anderson
  • *"only software which knows how to read that particular format can do that"* is a false assumption. There are a lot of programs that do not care about the type of file you give them (think, for example, of `grep`, `cat`, `tar`...). Your solution is therefore very bloated. – rozcietrzewiacz Aug 15 '11 at 12:03
  • By "read", I meant "interpret"; I should have been clearer. You can't use `cat`, or any other program which treats a file purely as an unstructured stream of bytes, to check for corruption. I don't believe my solution is bloated. – Tom Anderson Aug 15 '11 at 13:14
  • You can, as [Caleb suggested](http://unix.stackexchange.com/questions/15157/how-to-check-if-a-file-is-corrupt-or-not/15158#15158), treat each file as binary data and store checksums for later verification. This is universal, simple and relatively fast. – rozcietrzewiacz Aug 15 '11 at 13:21
  • But I see now that your approach has the benefit that you can perform the verification even on files which you haven't seen or accessed before. This is definitely a plus; you might point it out in your answer. – rozcietrzewiacz Aug 15 '11 at 13:25
  • Can this code detect corrupted video files as well? – alper Aug 13 '22 at 18:24
  • @alper If you only have video, it's even easier: [`ffmpeg -v error`](https://superuser.com/a/100290/86299) will check any video for errors, without the need for a type-detecting driver script like mine. – Tom Anderson Aug 15 '22 at 14:59

If you happen to use ZFS, either you can read the file, in which case it is guaranteed not to be corrupted, or you get a read error, in which case it is.

Edit: After the wise comments, here is a clarification of my answer:

ZFS can detect and protect against silent data corruption, e.g. http://www.zdnet.com/blog/storage/data-corruption-is-worse-than-you-know/191 Of course, if the file is already corrupted at the time it is initially written, there is nothing the file system can do.

To protect against corruption that happens during the transmission of a file, the usual general-purpose technique is to compare md5sum or similar hashes.
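
For the transmission case, a small sketch of comparing a received file against a digest published by the sender, assuming GNU `md5sum`. The helper name `matches_md5` is hypothetical.

```shell
#!/bin/bash
# Sketch only: matches_md5 is a hypothetical helper.
# Usage: matches_md5 FILE EXPECTED_HEX_DIGEST
matches_md5() {
    # md5sum --check reads "<digest>  <name>" lines; with no file argument
    # it takes them from standard input. --status suppresses all output,
    # so only the exit status reports the result.
    printf '%s  %s\n' "$2" "$1" | md5sum --check --status
}
```

This only shows that the received bytes match what the sender hashed; it says nothing about whether the sender's copy was good in the first place.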

jlliagre
  • wow, what a feature :O – LanceBaynes Jun 17 '11 at 10:23
  • So if you download a video from the web that's corrupt? ZFS does nothing to help you there - it just verifies that the corrupt file doesn't get changed. ZFS is fantastic, but it's not a solution to checking for corrupt files. – Tom Anderson Jun 17 '11 at 10:24
  • Unfortunately this is just a file system integrity check, not an actual understanding of files and whether they are corrupt. The most common usage I suspect @Lance is after is being able to decide whether an incoming file, downloaded or otherwise transferred, is valid or not. ZFS can't magically decide if a file is good or not, only promise that whatever you give it is saved and returned in one piece locally. – Caleb Jun 17 '11 at 10:51
  • As the question is tagged /data-recovery and /filesystems, I assumed it was about silent data corruption, not about files already broken in the first place. Answer edited to clarify that point. – jlliagre Jun 17 '11 at 11:41
  • @jiliagre: I retagged this question with that tag (possibly wrongly) about an hour after your answer. When you answered it, it was simply tagged "linux". – Caleb Jun 17 '11 at 12:02
  • Indeed. @LanceBaynes It would help if you clarify what the question is really about ... – jlliagre Jun 17 '11 at 13:55