The first thing I noticed when I switched from Windows to Linux was that Linux has no strict naming convention and no obligatory file name extensions such as .bmp, .jpg or .exe. Therefore I cannot tell a file's format from its name alone.

If all JPEG files on my file system had the .jpg extension, I could simply find all JPEG files by:

find / -type f -name "*.jpg"

But if that is not the case, I have no idea how to find all JPEG files.

Abdul Al Hazred

1 Answer

If you want to search a directory and all of its subdirectories:

find /home/place/to/crawl -type f -exec file --mime-type {}  \; | awk '{if ($NF == "image/jpeg") print $0 }'

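What does it do?

  • find selects every entry of type regular file (-type f) under the starting directory
  • For each one it runs file --mime-type, which inspects the file's content and prints its MIME type, e.g. image/jpeg
  • awk prints only the lines whose last field is image/jpeg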

Edit: Added @Franklin's tip to use file with -i, which prints the standard MIME type string instead of a free-text description. This reduces false positives from matching the word JPEG.

Edit2: Added @don_crissti's tip. The command now filters on just the last column with awk and prints the whole line if it matches image/jpeg. Changed the file switch to --mime-type to suppress the charset information.
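
Because the pipeline above matches on the text that find and file print together, a path that itself contains ": image/jpeg" or a newline can confuse the filter. A minimal sketch of one way around that, assuming a POSIX sh and a file command that supports -b (brief) and --mime-type, is to test each file's type separately and print only the path. The function name list_jpegs is made up for illustration.

```sh
# list_jpegs DIR: print the path of every regular file under DIR
# whose content is identified as image/jpeg, one path per line.
# (list_jpegs is a hypothetical helper name, not part of any tool.)
list_jpegs() {
    find "$1" -type f -exec sh -c '
        for f do
            # -b omits the "name: " prefix, so only the MIME type is compared
            if [ "$(file -b --mime-type -- "$f")" = "image/jpeg" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

Calling list_jpegs /home/place/to/crawl would then print one matching path per line. Changing the printf format to "%s\0" should make the output safe to pass to xargs -0 even when names contain newlines.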

  • Very helpful info about that "file header reader". I read the headers of a jpg, gif and png file with the file command and all had the word "image" in them. Does this mean that if I replaced "...| grep JPEG" with "...| grep image", all images regardless of format would be found? – Abdul Al Hazred Mar 25 '15 at 20:11
  • 1
    @AbdulAlHazred: Yes, it means that. `grep` is a tool that filters lines of text that contain a certain substring. If you `grep JPEG` some text, you'll get only the lines containing "JPEG". If you `grep image` some text, you'll get only the lines containing "image". – mgarciaisaia Mar 25 '15 at 20:13
  • 1
    Not all image formats. BMP, for example, is an exception, and you will find something like: `PC bitmap, Windows 3.x format, 3264 x 2448 x 24`. You will get almost all image format headers this way, but you will have to deal with the black sheep, as the .bmp format has shown ;) –  Mar 25 '15 at 20:14
  • 1
    Fixed. Grepping for `JPEG image data` should do the trick –  Mar 25 '15 at 20:23
  • 3
    Using `file -i` prints the MIME type, like `image/jpeg`. That's easier and more reliable to grep since MIME types are guaranteed never to change. Also, it's easy to list all or any image formats. Example to list all JPEG images: `find /home/dir/example -type f -exec file -i {} \; | grep ': image/jpeg\>'` – Franklin Piat Mar 25 '15 at 20:52
  • @FranklinPiat - that's better, indeed, but you're still left with a couple of problems either way because you're grepping `find+file` output: `grep` will not give you the expected output for filenames containing newlines; you'll also have to consider the fact that paths could be e.g. `/dir/: image/jpeg/etc`; assuming no such filenames, you still have to parse `grep` output again to get _just_ the filenames. – don_crissti Mar 26 '15 at 13:15
  • Filenames with newlines are not common practice (not justifying my mistake), and they would be hell to manage. People with common sense will not do this kind of bizarre thing. However, I agree about the grep issue, and I modified my answer because I agree with your point about directories. Take a look at it :) –  Mar 26 '15 at 15:40
  • nwilder, OK but your output still doesn't list _just_ the filenames. I'm not trying to be an ***, I'm just pointing out the weak link in your answer - which is parsing the output of `find`+`file` (I expect other people here to do the same when they spot possible problems in my answers). And no, managing filenames with newlines (or other funky chars) isn't hell but you'll have to take a different approach. – don_crissti Mar 26 '15 at 16:36
  • 1
    I strongly recommend using `-exec command {} +` instead of `-exec command {} \;` so that `file` is called once for many files (instead of once for every file). It will save a lot of time. You may want to combine this with `file`'s `--no-pad` option. The completed command would look something like this: `find -type f -exec file --no-pad --mime-type {} + | awk '$NF == "image/jpeg" {$NF=""; sub(": $", ""); print}'`. This prints only the filenames, without any extraneous output, per the request of @don_crissti – Six Sep 22 '16 at 04:11
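
To answer the earlier question about finding images of every format, one option is to match on the "image/" prefix of the MIME type rather than on words in the human-readable description. The sketch below assumes a file command that supports --no-pad and --mime-type. The name list_images is hypothetical, and the output is still "path: type" lines, so the caveats about unusual filenames raised in this thread still apply.

```sh
# list_images DIR: print "path: mime/type" for every regular file under DIR
# whose detected MIME type starts with "image/".
# (list_images is a hypothetical helper name used only for this example.)
list_images() {
    find "$1" -type f -exec file --no-pad --mime-type {} + |
        awk '$NF ~ /^image\//'   # keep lines whose last field begins with image/
}
```

Since the match is on the MIME type, a format whose description lacks the word "image" should still be listed, as long as file reports it under an image/ type.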