
I am writing a menu based bash script, one of the menu options is to send an email with a text file attachment. I am having trouble with checking if my file is a text file. Here is what I have:

fileExists=10
until [ $fileExists -eq 9 ]
do
  echo "Please enter the name of the file you want to attach: "
  read attachment
  isFile=$(file $attachment | cut -d\ -f2)
  if [[ $isFile = "ASCII" ]]
    then
      fileExists=0
    else
      echo "$attachment is not a text file, please use a different file"
  fi
done

I keep getting the error cut: delimiter must be a single character.

Rui F Ribeiro
Powea
  • Put an extra space after `-d\ `. – Michael Homer Jun 11 '15 at 09:34
  • Depending on the `file` version you have available you should consider using some options like `--brief` (which doesn't output the filename so you will have less of a problem with filenames that contain spaces) or `--mime` which returns the MIME type (e.g. `text/plain`) instead of a textual description of the file type. – Dubu Jun 11 '15 at 10:00
  • Just a note on the off-topic closure - This question would still help a lot of future readers like me. I was looking for an if statement to check if a file contained text, and this one helped me perfectly. – thepiercingarrow Mar 16 '16 at 23:58

5 Answers

  1. From the fact that it says file $attachment rather than file "$attachment", I guess your script cannot handle filenames that contain spaces.  But, be advised that filenames can contain spaces, and well-written scripts can handle them.  Note, then:

    $ file "foo bar"
    foo bar:  ASCII text
    
    $ file "foo bar" | cut -d' ' -f2
    bar:
    

    One popular and highly recommended approach is to null-terminate the filenames:

    $ file -0 "foo bar" | cut -d $'\0' -f2
    :  ASCII text
    
  2. The file command makes educated guesses about what type of file a file is.  Guesses, naturally, are sometimes wrong.  For example, file will sometimes look at an ordinary text file and guess that it is a shell script, C program, or something else.  So you don't want to check whether the output from file is ASCII text; you want to see whether it says that the file is a text file.  If you look at the man page for file, you will see that it more-or-less promises to include the word text in its output if the file is a text file, but this might be in a context like shell commands text.  So, it may be better to check whether the output from file contains the word text:

    isFile=$(file -0 "$attachment" | cut -d $'\0' -f2)
    case "$isFile" in
       (*text*)
          echo "$attachment is a text file"
          ;;
       (*)
          echo "$attachment is not a text file, please use a different file"
          ;;
    esac
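Pulling the points of this answer together, the question's prompt loop might look roughly like the sketch below. It assumes bash and a `file` that supports `-b`/`--brief` (raised in the comments on the question), which leaves the filename out of the output so `cut` is not needed at all; the function name `prompt_for_text_file` is hypothetical. Matching `*text*` is still only a heuristic, for the reasons given in the comments that follow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep prompting until the user names a regular file
# whose `file -b` description contains "text". Prompts and errors go to
# stderr; the accepted filename is printed on stdout.
prompt_for_text_file() {
  local attachment description
  while :; do
    printf '%s\n' "Please enter the name of the file you want to attach: " >&2
    # -r keeps backslashes; IFS= keeps leading/trailing blanks in the name.
    IFS= read -r attachment || return 1    # give up at end of input
    if [ -f "$attachment" ]; then
      # -b (--brief) leaves the filename out of the output, and "--"
      # stops a name starting with "-" from being read as an option.
      description=$(file -b -- "$attachment")
    else
      description="not a regular file"
    fi
    case $description in
      (*text*)
        printf '%s\n' "$attachment"
        return 0
        ;;
      (*)
        printf '%s\n' "$attachment is not a text file, please use a different file" >&2
        ;;
    esac
  done
}
```

Used as `attachment=$(prompt_for_text_file) || exit 1`, the prompts still reach the terminal via stderr while the accepted name is captured from stdout. The `[ -f ... ]` check is there because, for a missing file, `file` may print an error message that repeats the name, and a name such as `context` would then match `*text*`.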
    
  • You cannot rely on substrings in the output of `file` as for many formats `file` extracts and displays strings from the file (look for `%s` in the magic sources) which may include `text` – Stéphane Chazelas Jun 12 '15 at 22:36
  • @StéphaneChazelas: Perhaps you should take the wisdom that you have sprinkled over this page (`-b`, `< ` *`filename`*, `--mime`) and add it to your answer, rather than scribbling it on leaves and letting them blow into everybody else’s yards.  :-)  ⁠ – G-Man Says 'Reinstate Monica' Jun 16 '15 at 23:52
case $(file -b --mime-type - < "$attachment") in
  (text/*)
     printf '%s\n' "$attachment is probably text according to file"
     case $(file -b --mime-encoding - < "$attachment") in
       (us-ascii) echo "and probably in ASCII encoding"
     esac
esac
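The same check can be wrapped in a function that returns an exit status, so it drops straight into an `if` or `until` test. This is a minimal sketch; the name `is_text_file` is hypothetical, and it assumes a `file` that supports `-b` and `--mime-type`, as the code above already does.

```bash
#!/usr/bin/env bash
# Hypothetical predicate built on the check above: succeed (status 0) when
# `file` reports a text/* MIME type for the file's contents, fail otherwise.
is_text_file() {
  # Reading the file on stdin ("file -" plus "<") keeps the name out of the
  # output and also works for a file literally called "-". If the file
  # cannot be opened, the shell prints an error, the substitution is empty
  # and the text/* pattern does not match.
  case $(file -b --mime-type - < "$1") in
    (text/*) return 0 ;;
    (*)      return 1 ;;
  esac
}
```

For example, `if is_text_file "$attachment"; then ...; fi` replaces the `cut`-and-compare step in the question.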
Stéphane Chazelas

I would circumvent the escaping and do:

... | cut -d' ' -f2 

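that way it is clear that you need a space between the delimiter character (specified by the three-character sequence ' ') and the following option. With -d\  -f2 it is easy to miss that you need two spaces after the backslash: one escaped space that becomes the delimiter, and one unescaped space that separates it from -f2.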
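To illustrate, the three spellings below all give `cut` the same single-space delimiter and print the same field. A small sketch; the sample line is illustrative, shaped like typical `file` output.

```bash
#!/usr/bin/env bash
# An illustrative line in the shape of `file` output.
line='report.txt: ASCII text'

# Each form gives cut a one-character (space) delimiter and a separate -f2.
a=$(printf '%s\n' "$line" | cut -d' ' -f2)    # quoted space attached to -d
b=$(printf '%s\n' "$line" | cut -d ' ' -f2)   # quoted space as its own argument
c=$(printf '%s\n' "$line" | cut -d\  -f2)     # escaped space: TWO spaces before -f2

printf '%s %s %s\n' "$a" "$b" "$c"   # prints: ASCII ASCII ASCII
```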

Anthon

The problem occurs in cut -d\ -f2. Change it to cut -d\  -f2 (note the two spaces after the backslash).

To cut, the arguments look like this:

# bash: args(){ for i; do printf '%q \\\n' "$i"; done; }
# args cut -d\ -f2
cut \
-d\ -f2 \

And here is the problem. In your shell, \ turned the space into a literal space instead of a separator between arguments, and because you didn't add an extra space, the whole -d\ -f2 part became a single argument. You should add one extra space so that -d\ and -f2 appear as two arguments.

To avoid confusion, many people use quotes like -d' ' instead.
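A quick way to see the difference is to print each argument the shell produces. A sketch; `show_args` is a hypothetical helper in the spirit of the `args` function above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every argument it receives in brackets,
# one per line, so embedded spaces become visible.
show_args() { printf '[%s]\n' "$@"; }

show_args -d\ -f2    # one space after "\":  [-d -f2]           (one argument)
show_args -d\  -f2   # two spaces after "\": [-d ] then [-f2]  (two arguments)
show_args -d' ' -f2  # quoted space:          [-d ] then [-f2]  (two arguments)
```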

P.S.: Instead of using cut and requiring the description to start with ASCII, I'd rather use

if file "$attachment" | grep -q 'text$'; then
    : # file thinks it is text
else
    : # file doesn't think it's text
fi
Mingye Wang

Another option is to not use cut and to match a regex against the full output of file:

#...
isFile=$(file "$attachment")
if [[ "$isFile" =~ ^[^:]*:\ ASCII ]]
#...
kos
  • The elephant in the room is that the output from `file` is of the form “(filename) (colon) (whitespace) (filetype)” — i.e., it includes the filename.  Therefore, if there is a binary file called `VACASCIING.JPG`, the output from `file` will be `VACASCIING.JPG: JPEG image`, and your code will call it a text file because the *filename* matches `ASCII`.  Note that the second part of [my answer](http://unix.stackexchange.com/a/208972/80216), which is functionally comparable to yours,   … (Cont’d) – G-Man Says 'Reinstate Monica' Jun 12 '15 at 18:57
  • (Cont’d) …  would not have needed the `cut` command but for this.  (The OP is using `cut` to extract the first word of the filetype, which he tests for equality.  I’m doing a pattern match, like you, so I could have just used the entire output, but I use `cut` to get the filetype only, without the filename.)  Also, [Stéphane Chazelas's answer](http://unix.stackexchange.com/a/208980/80216) uses `file < "$attachment"`, and, while not explained, I presume that that trick is also intended to get output without the filename. – G-Man Says 'Reinstate Monica' Jun 12 '15 at 18:58
  • @G-Man Indeed that was a pretty big elephant. Thanks for your comment. I've updated making the regex way more consistent. – kos Jun 12 '15 at 20:42
  • @G-Man, no `-b` avoids printing the filename. Using `<` is to avoid problems with a file called `-`. – Stéphane Chazelas Jun 12 '15 at 20:44
  • @kos, now you're moving the problem to files with `:` in their name. – Stéphane Chazelas Jun 12 '15 at 20:46
  • @StéphaneChazelas You're right. I'm coming from Windows and I always forget about this. Either I'll work out a solution to make this work consistently or I'll delete this answer. – kos Jun 12 '15 at 20:52
  • @StéphaneChazelas: Wow; I keep on learning things from you.  Thanks².  But I guess any filename *beginning* with `-` would be a problem.  And it looks like you could have used `file -- "$attachment"`, although it’s not mentioned in the man page. – G-Man Says 'Reinstate Monica' Jun 12 '15 at 21:56
  • @G-Man, I used `file - < "$attachment"`. `file -` tries to identify the content of stdin, so it's similar to `file -sL -- "$attachment"` except that it also works for a file called `-` (`--` doesn't help with that). – Stéphane Chazelas Jun 12 '15 at 22:23
  • @G-Man - I think the simple solution to the filename thing is simply to cancel by comparison. It can't be that much of a hurdle if we already have all of the pertinent values in named vars at our disposal. Just: `printf 'file reports type:\t%s\n' "${var#*"$attachment"*:}"` – mikeserv Jun 13 '15 at 09:03
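The last comment suggests removing the already-known filename from `file`'s output instead of cutting on a delimiter. Below is a minimal sketch of that idea in bash, slightly stricter than the comment's expansion because it removes the name only from the very start. The function name `file_type_without_name` is hypothetical, and the prefix removal assumes `file` echoes the name exactly as given, which holds for printable characters such as spaces and colons but not necessarily for unprintable ones, which `file` escapes unless `-r` is used.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print file's description of "$1" without the
# leading "<name>:" part, by removing the name we already know.
file_type_without_name() {
  local out type
  out=$(file -- "$1")
  type=${out#"$1":}                       # drop "<name>:" from the start only
  type=${type#"${type%%[![:space:]]*}"}   # trim the padding after the colon
  printf '%s\n' "$type"
}
```

A test such as `[[ $(file_type_without_name "$attachment") =~ ^ASCII ]]` then only sees the description, so a name like `VACASCIING: ASCII.JPG` can no longer fool it.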