
I have a number of Jupyter notebooks (.ipynb), which are text files. All of them contain some LaTeX markup, but when I run file on them, I get different types:

$ file nb_*          
nb_1.ipynb:      ASCII text
nb_2.ipynb:      ASCII text
nb_3.ipynb:      ASCII text, with very long lines
nb_4.ipynb:      LaTeX document, ASCII text, with very long lines
nb_5.ipynb:      text, with very long lines

How does file distinguish these? I would like all files to have the same type.


(Why should the files have the same type? I am uploading them to an online system for sharing. The system classifies them somehow and treats them differently, with no possibility for me to change this. I suspect the platform uses file or maybe libmagic internally and would like to work around this.)

JigglyNaga
cheersmate

2 Answers


File type recognition is driven by so-called magic patterns. The magic file for TeX-family source code lists a number of macro names that cause a file to be classified as LaTeX. Each match is assigned a strength, e.g. 15 for \begin and 18 for \chapter. This weighting makes the heuristic more robust against false positives, such as Plain TeX or ConTeXt documents that happen to define their own macros with those names.
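
To see which notebooks contain one of the macro names mentioned above, a plain grep is enough. This is only a minimal sketch: it checks for \begin and \chapter, the two names cited here, whereas the real magic file lists more patterns, and the helper name has_latex_macro is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains one of the macro names
# cited above. The real magic file lists more patterns than these two.
has_latex_macro() {
    # -F treats the patterns as fixed strings, so the backslash is literal.
    grep -q -F -e '\begin' -e '\chapter' -- "$1"
}

# Report every file given on the command line that contains such a macro.
for f in "$@"; do
    if has_latex_macro "$f"; then
        printf '%s: contains a LaTeX-looking macro\n' "$f"
    fi
done
```

Saved under a name of your choice and called with nb_* as arguments, it lists the candidates. Inside a notebook the backslash is stored escaped as \\begin, which still contains the literal string \begin, so the fixed-string search matches it as well.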

phg
    `grep -i latex /usr/share/misc/magic` gets the specific patterns. Your magic file location may vary, of course. @cheersmate, you may be able to find a particular pattern or two in the "LaTeX" doc. Similarly, look for non-ASCII characters/bytes in the non-ASCII file (if it makes a difference). – mpez0 Feb 14 '20 at 14:32

I found one string which seems to make file classify a file as LaTeX:

$ cat text
a
b
$ cat latex
a
\begin
b
$ file text latex
text:  ASCII text
latex: LaTeX document, ASCII text

So at least I can force all files to have the same type by adding a LaTeX environment (and thus a \begin) to the files currently classified as plain text.
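
To check whether all files end up with the same type afterwards, the output of file can be reduced to its distinct type descriptions. A minimal sketch, assuming the "name: description" output format shown above and file names without a colon; the helper name distinct_types is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "name: description" lines, as printed by file,
# strip the name and the padding, and print each distinct description once.
distinct_types() {
    sed 's/^[^:]*:[[:space:]]*//' | sort -u
}

# Usage (not run here):
#   file nb_* | distinct_types
# A single line of output means all files are classified the same way.
```

For the five notebooks in the question this would print four different descriptions, since only nb_1 and nb_2 share one.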

cheersmate