
I am confused by the output of the more program under Linux. When I create a file that contains only a single letter (special letters like ä work fine), more does not print the file's content but instead reports that the file is not a text file. As soon as the file contains more than one letter, more reports no error and simply prints the file's content.

> rm file
> echo 'h' > file
> more file
 
******** file: Not a text file ********
 
> cat file
h

Is this a bug in my version of more, or are there specific requirements for text files that my one-letter example does not fulfill?

more version: more from util-linux 2.36.2.

Further details

Content of the file: The output of od is as follows:

> od -x file
0000000 0a68
0000002

As far as I can tell, the problem is not caused by echo behaving differently from what I expected. When I use printf as follows, the problem persists and the output of od is the same, so the files produced via echo and printf should be identical:


> printf 'h\n' > file2
> more file2
 
******** file2: Not a text file ********

> od -x file2
0000000 0a68
0000002

Version of file utility

> file --version
file-5.40
magic file from /usr/share/file/misc/magic
seccomp support included

System information: My system is Arch Linux, and echo is the shell built-in command, which I used via bash and zsh.

Bug report: Thanks to your feedback, I was able to report this as a bug at the correct bug tracker: https://bugs.astron.com/view.php?id=256

mutableVoid
  • What is the output of `hd file` or `od -x file`? – Eduardo Trápani Apr 02 '21 at 17:57
  • `od -x file` outputs `0000000 0a68 0000002` – mutableVoid Apr 02 '21 at 17:59
  • I have (long story) several one- and two-byte files on which "more" is called by a script, and I have never experienced this behaviour. I am now on util-linux 2.34 on Ubuntu 20.04, but this script was deployed on Ubuntu 14 (if not maybe even 12). – LSerni Apr 02 '21 at 18:00
  • @LSerni I also have a script on a different system, which I have now moved to the system where I experienced this issue. Because `more` outputs the warning instead of the file's content, my script broke – mutableVoid Apr 02 '21 at 18:02
  • Please remember to always mention your operating system and environment. Your `echo` isn't behaving like the common bash builtin, so giving details of your system is essential to better understand the issue. – terdon Apr 02 '21 at 18:04
  • One last check, because in the source code `more` has two ways of finding out whether the file is binary. What does `file file2` report? – Eduardo Trápani Apr 02 '21 at 18:19
  • @EduardoTrápani `file` reports `file: data`, so that might be the error! Is this a bug in the component which determines the file type or is this a misconfiguration of the system on my end? – mutableVoid Apr 02 '21 at 18:23
  • @mutableVoid, could be this: https://bugs.astron.com/view.php?id=180 Not sure though, couldn't test. You could try with `aa\n`, and `ab\n` etc. Though hmm, `ä\n` would still be only two code points. Oh, the UTF-8 detection looks to be separate. – ilkkachu Apr 02 '21 at 18:45
  • @ilkkachu Yes, I think you're right that it might be related to the fix for this, as the fix seems to be new in the version in which I first encountered this issue (and I checked that the scenario described in the bug report works with my version of `file`). Printing the same character twice also causes `file` to report `binary`! – mutableVoid Apr 02 '21 at 19:00

1 Answer


The error is not in more but in libmagic, which is also used by file.

For example, libmagic reports that an empty file is binary, and more has specific code to deal with that.

I see at least two solutions: you can rebuild more from source without libmagic support, or you can downgrade libmagic.

By the way, you should report it.
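
The fix for libmagic bug 0000180 is described in its tracker as "requiring at least 3 distinct character values", which may explain why very short files are affected. As a rough way to see where a given file stands, the following sketch counts the distinct byte values in a file. The helper name `distinct_bytes` is made up for illustration, and the assumption that libmagic counts raw byte values (including the trailing newline) is mine and not confirmed by the bug report; this is not libmagic's actual code.

```shell
#!/bin/sh
# Hypothetical helper: print the number of distinct byte values in a file.
# Assumption: "distinct character values" means distinct raw bytes,
# newline included. This is NOT taken from libmagic's source.
distinct_bytes() {
    # od prints every byte as two hex digits; put one byte per line,
    # drop empty lines, then count the unique values.
    od -An -v -tx1 "$1" | tr -s ' ' '\n' | grep -v '^$' | sort -u | wc -l
}

tmpdir=$(mktemp -d)

printf 'h\n'  > "$tmpdir/one"    # bytes 68 0a
printf 'hh\n' > "$tmpdir/same"   # bytes 68 68 0a
printf 'ab\n' > "$tmpdir/two"    # bytes 61 62 0a

for f in one same two; do
    printf '%s: %s distinct byte values\n' "$f" "$(distinct_bytes "$tmpdir/$f")"
done

rm -rf "$tmpdir"
```

Under that assumption, `h\n` and `hh\n` both have only two distinct byte values, which matches the two cases reported as failing in the question and its comments. Running `file` on each of the three files on the affected system would show whether the count is really what decides the result.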

Eduardo Trápani
  • Could be due to this issue: https://bugs.astron.com/view.php?id=180 "0000180: A file filled with 0xFF gets reported to be ISO-8859" -> "Fixed by requiring at least 3 distinct character values." – ilkkachu Apr 02 '21 at 18:43