6

I have a list of files and I need to find all the image files in that list.

For example, if my list contained the following:

pidgin.tar.gz
photo01.jpg
picture01
screenshot.gif
invoice.pdf

Then I would like to select only:

photo01.jpg
picture01
screenshot.gif

Notes:

  • The method must not depend on file extensions
  • Obscure image formats for Photoshop and Gimp can be ignored. (If feh can't show it, it's not an image.)
Stefan

5 Answers

7

The following command lists the lines in list_file that contain the name of an image file:

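<list_file xargs -d \\n file --mime-type | sed -n 's!: *image/[^ :]*$!!p'
  • file --mime-type FOO looks at the first few bytes of FOO to determine its format and prints a line like FOO: image/jpeg. (The shorter -i option also shows the MIME type, but recent versions of file append "; charset=binary" to it, which the sed pattern would not match. Both options need the file implementation found on Linux.)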
  • xargs -d \\n reads a list of files (one per line) from standard input and applies the subsequent command to it. (This requires GNU xargs as found on Linux; on other systems, leave out -d \\n, but then the file list can't contain \'" or whitespace).
  • The sed command filters out the : image/FOO suffix so as to just display the file names. It ignores lines that don't correspond to image files.
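
To see the filtering step in isolation, here is a minimal Python sketch of what the sed expression does. It assumes every input line looks like NAME: image/TYPE with nothing after the MIME type; the function name image_names and the sample lines are made up for illustration.

```python
import re

# Same pattern as the sed script: a colon, optional spaces, then image/TYPE
# running to the end of the line, where TYPE contains no space or colon.
IMAGE_SUFFIX = re.compile(r": *image/[^ :]*$")


def image_names(lines):
    """Return the names from `file`-style output lines that describe images."""
    names = []
    for line in lines:
        line = line.rstrip("\n")
        stripped, count = IMAGE_SUFFIX.subn("", line)
        if count:  # like sed -n with the p flag: keep only substituted lines
            names.append(stripped)
    return names


if __name__ == "__main__":
    sample = [
        "pidgin.tar.gz: application/gzip",
        "photo01.jpg: image/jpeg",
        "picture01: image/png",
        "odd: image/name: text/plain",  # a name containing ": image/" is not matched
    ]
    for name in image_names(sample):
        print(name)
```

Because the pattern is anchored at the end of the line and TYPE may not contain a colon, only the text after the last colon decides whether a line is kept.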
Gilles 'SO- stop being evil'
  • I've thought about it, but what if the filename contains ' image/'? It is a valid filename. Possibly better: `for f in files; do file -ib "$f" | grep -q '^image/' && echo "$f"; done` – Maciej Piechotka Sep 12 '10 at 22:26
  • @Maciej: The sed script only matches lines where the text after the *last* colon is `image/FOO` (`FOO` is not allowed to contain a `:`). So it's not a problem if the file names contain `image/`. – Gilles 'SO- stop being evil' Sep 12 '10 at 23:12
  • `find . -type f | xargs -L1 file --mime-type |sed -n 's#: *image/[^ :]*$##p'` to get image files in a dir – adrianlzt Nov 27 '15 at 09:55
2
file -ib image | awk '/^image\// {print}'

If file detects an image, it should print a line like:

image/jpeg; charset=binary

It works on magic numbers, so it does not depend on extensions.
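
As a rough illustration of what "works on magic numbers" means, the following Python sketch checks the first bytes of a file against a handful of well-known image signatures. It is a simplification, not how file is implemented: the table covers only JPEG, PNG and GIF, and the name sniff_image_type is hypothetical.

```python
# Leading byte signatures ("magic numbers") for a few common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_image_type(path):
    """Return a MIME type if the file starts with a known signature, else None."""
    with open(path, "rb") as handle:
        header = handle.read(8)  # the longest signature above is 8 bytes
    for magic_bytes, mime in SIGNATURES.items():
        if header.startswith(magic_bytes):
            return mime
    return None
```

The file name never enters the decision, which is why a file such as picture01 with no extension can still be recognised.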

Maciej Piechotka
  • awk is overkill. Use grep instead: `| grep 'image'`. Also, different versions of `file` (eg on different types of Unix) may not return a MIME type, so `image/` is incorrect, and the filename comes first so `^` is also inappropriate. – Neil Mayhew Sep 12 '10 at 21:35
  • As for awk: yes, as if the overhead really mattered compared to starting a new process ;) You can use grep if you like. As for the MIME type: I used `-i`, which asks `file` to print the MIME type; I assume other versions will return an error [I don't think there is an ultra-portable way]. As for the filename: note the `-b` flag, which disables printing the file name (you haven't checked the command I posted, have you?). – Maciej Piechotka Sep 12 '10 at 21:52
  • Oops, yes, you are right. I forgot to use the -ib when I tested it. However, using -b loses the file name, so how do you know which files matched? – Neil Mayhew Sep 12 '10 at 23:09
  • @Neil: Given that most versions of file don't produce anything close to parsable output (for example they might print `Netboot image` or `4 images/screen`), what do you propose that's better than installing a `file` that can print mime types? – Gilles 'SO- stop being evil' Sep 12 '10 at 23:17
  • @Gilles: I don't really have an alternative solution. I was simply unaware of `-i` and had failed to notice that @Maciej had used it. I see that `file` supports `-i` on Mac OS, so probably it's supported on the BSDs, too. – Neil Mayhew Sep 13 '10 at 00:03
  • @Neil: simply put, if `$FILE` is the filename, then `file -ib "$FILE" | awk '/^image\// {found=1} END {exit !found}' && echo "$FILE"` prints the filename iff it's an image. – Maciej Piechotka Sep 13 '10 at 15:04
1

In addition to the file command, you can also use ImageMagick. The following will show the type of all files in the current directory:

find . -maxdepth 1 -type f -print0 | xargs -0 identify

The identify command will print out something like this for various file types:

text.txt[8] TXT 612x792 612x792+0+0 16-bit DirectClass 694B 0.320u 0:00.330
php.jpg[31] JPEG 1280x1024 1280x1024+0+0 8-bit DirectClass 195KB 0.000u 0:00.000

Animated GIF files will print more information (this is a 21-frame GIF):

adhd.gif[0] GIF 211x200 211x200+0+0 8-bit PseudoClass 256c 233KB 0.000u 0:00.029
adhd.gif[1] GIF 168x130 211x200+22+22 8-bit PseudoClass 256c 233KB 0.000u 0:00.029
adhd.gif[2] GIF 168x130 211x200+22+22 8-bit PseudoClass 256c 233KB 0.000u 0:00.029
...
adhd.gif[18] GIF 168x130 211x200+22+22 8-bit PseudoClass 256c 233KB 0.000u 0:00.000
adhd.gif[19] GIF 168x130 211x200+22+22 8-bit PseudoClass 256c 233KB 0.000u 0:00.000
adhd.gif[20] GIF 168x130 211x200+22+22 8-bit PseudoClass 256c 233KB 0.000u 0:00.000

You can then use awk or similar tools to decide what to do with them.
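
As one possible way to post-process that output, here is a small Python sketch that pulls the file name and format out of each line and keeps only names whose format is on an allow-list. The line layout is assumed from the sample output above, and both the allow-list and the name image_files are illustrative choices rather than anything ImageMagick defines.

```python
import re

# Assumed line layout, based on the sample output above:
#   NAME[FRAME] FORMAT WIDTHxHEIGHT ...
LINE = re.compile(r"^(?P<name>.+?)(?:\[\d+\])? (?P<fmt>[A-Z0-9]+) \d+x\d+ ")

# Hypothetical allow-list; extend it with whatever formats you care about.
IMAGE_FORMATS = {"JPEG", "PNG", "GIF", "BMP", "TIFF"}


def image_files(identify_lines):
    """Return unique file names, in input order, whose format is allowed."""
    seen = []
    for line in identify_lines:
        match = LINE.match(line)
        if not match or match.group("fmt") not in IMAGE_FORMATS:
            continue
        name = match.group("name")
        if name not in seen:  # animated GIFs produce one line per frame
            seen.append(name)
    return seen
```

An allow-list is used because identify reports a format for text files as well (TXT in the sample), so "identify printed something" is not enough to call a file an image.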

greyfade
1

If you have Python and python-magic, you can use them. For example:

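#!/usr/bin/env python
# Walk a directory tree and report every file that libmagic describes as an image.
import os
import sys

import magic

path = sys.argv[1]
mime = magic.open(magic.MAGIC_NONE)  # MAGIC_NONE yields a textual description
mime.load()
for root, dirs, files in os.walk(path):
    for name in files:
        filename = os.path.join(root, name)
        filetype = mime.file(filename)
        # file() can return None on error, so guard before the substring test
        if filetype and "image" in filetype:
            print("File: %s is %s" % (filename, filetype))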
user1606
0

Perhaps there's something I'm missing, but this seems to work for me:

file -i * | grep "image/" | cut -d: -f1
Mat