Strange case: Text file that exists and doesn't exist

Question

I'm completely puzzled by a problem with a single plain text file on my Fedora 12 system. I used maker, a well-known bioinformatics tool, to produce lots of plain text files, and one of them seems to be "inaccessible".

Specifically, my file named Clon1918K_PCC1.gff is listed when I use ls, ls -a, ls -li, etc., but when I try to access it by name with cat, vim, cp, ls, etc., I always get the same error: Clon1918K_PCC1.gff: No such file or directory.

However, when I copy all the files with cp *.gff or cp *, this file is copied as well.

I was also able to open it in nautilus without any problem. In one of two attempts, copying the content to another file with the same name made the problem disappear. Interestingly, in that case the strange file was not overwritten: two files with exactly the same name appeared, one accessible and the other inaccessible. I looked for hidden characters, but everything seems OK.

Does anyone have any idea about this strange case? Thanks!

Alexios · Answer 1 · 2012-06-15T13:28:37.947

You can't have two files with the same name in the same directory. Filenames are by definition unique keys.

What you have is almost certainly a special character. I know you checked for them, but how exactly? You could say something like ls *gff | hexdump -C to find where the special characters are. Any byte with the high bit set (that is, hexadecimal values between 80 and FF) will be an indication of something gone wrong. Anything below 20 (decimal 32) is also a special character. Another hint is the presence of dots . in the right, text column of hexdump -C.
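The same byte-level check can be sketched in Python, which reads directory entries as raw bytes so no decoding gets in the way. This is only an illustration: the function names `suspicious_bytes` and `scan_directory` are made up for this example, and treating 0x7F (DEL) as suspicious is an extra assumption beyond the ranges mentioned above.

```python
import os


def suspicious_bytes(name):
    """Return (offset, byte) pairs for bytes outside printable US ASCII.

    Flags control bytes (< 0x20), bytes with the high bit set (>= 0x80),
    and, as an extra assumption of this sketch, DEL (0x7F).
    """
    return [(i, b) for i, b in enumerate(name) if b < 0x20 or b > 0x7E]


def scan_directory(path="."):
    """Map each raw (bytes) filename in `path` to its suspicious bytes."""
    report = {}
    # Passing a bytes path makes os.listdir return undecoded bytes names.
    for name in os.listdir(os.fsencode(path)):
        hits = suspicious_bytes(name)
        if hits:
            report[name] = hits
    return report


if __name__ == "__main__":
    for name, hits in scan_directory(".").items():
        print(repr(name), [(offset, hex(byte)) for offset, byte in hits])
```

Note that an ordinary space (0x20) is not flagged here, so a leading or trailing space only shows up in the `repr()` of a name that is reported for another reason.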

There are numerous Unicode characters that look like US ASCII characters. Even in US ASCII, 1 and l can often look similar. Then you have the Cyrillic capital letter Es (U+0421, which looks exactly like a C), the Greek Lunate Sigma (U+03F9, also exactly like a C), Cyrillic/Greek lower case ‘o’, etc. And those are just the visible ones. There are quite a few invisible Unicode characters that could be in there.
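One way to tell look-alikes apart is to print the official Unicode name of every character in a name. The sketch below uses Python's standard `unicodedata` module; the helper name `describe` and the placeholder text for unnamed characters are choices made for this example.

```python
import unicodedata


def describe(name):
    """Return (code point, Unicode name) for every character in `name`."""
    rows = []
    for ch in name:
        # Control characters have no Unicode name, so supply a placeholder.
        label = unicodedata.name(ch, "<no name: control or unassigned>")
        rows.append(("U+%04X" % ord(ch), label))
    return rows


if __name__ == "__main__":
    # A Latin C followed by the two look-alikes mentioned above.
    for code, label in describe("C\u0421\u03f9"):
        print(code, label)
```

Feeding it a name taken from `os.listdir()` rather than one typed by hand is what makes the check meaningful, since typing the name would produce plain ASCII.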

Explanation: why does the high bit signify something gone wrong? The filename ‘Clon1918K_PCC1.gff’ appears to be 100% 7-bit US ASCII. Putting it through hexdump -C produces this:

00000000  43 6c 6f 6e 31 39 31 38  4b 5f 50 43 43 31 2e 67  |Clon1918K_PCC1.g|
00000010  66 66                                             |ff|

All of these byte values are below 0x80 (8th bit clear) because they are all 7-bit US ASCII codepoints. Unicode codepoints U+0000 to U+007F represent the traditional 7-bit US ASCII characters, and by definition UTF-8 encodes them as themselves (i.e. hex ASCII character 41, Unicode U+0041, is encoded as the single byte 41). Codepoints U+0080 and above represent other characters and are encoded as two to four bytes, each of which has the eighth bit set (the original UTF-8 design allowed sequences of up to six bytes; on Linux, try man utf8 for a lot of information on how this is done). The presence of a non-ASCII character can therefore easily be detected without having to decode the stream. For example, say I replace the third character in the filename, ‘o’ (ASCII 6f, U+006F), with the Unicode character ‘U+03BF GREEK SMALL LETTER OMICRON’, which looks like this: ‘ο’. hexdump -C then produces this:

00000000  43 6c ce bf 6e 31 39 31  38 4b 5f 50 43 43 31 2e  |Cl..n1918K_PCC1.|
00000010  67 66 66                                          |gff|

The third character is now encoded as the UTF-8 sequence ce bf, each byte of which has its 8th bit set. And this is your sign of trouble in this case. Also, note how hexdump, which only decodes 7-bit ASCII, fails to decode the single UTF-8 character and shows two unprintable characters (..) instead.
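The two dumps can be reproduced with a few lines of Python, as a rough sketch; `utf8_hex` and `has_high_bit` are names invented for this illustration.

```python
def utf8_hex(text):
    """Return the UTF-8 encoding of `text` as space-separated hex bytes."""
    return " ".join("%02x" % b for b in text.encode("utf-8"))


def has_high_bit(data):
    """True if any byte in `data` has its 8th bit set (value >= 0x80)."""
    return any(b & 0x80 for b in data)


plain = "Clon1918K_PCC1.gff"
lookalike = "Cl\u03bfn1918K_PCC1.gff"  # third character is U+03BF, not 'o'

print(utf8_hex(plain))
# 43 6c 6f 6e 31 39 31 38 4b 5f 50 43 43 31 2e 67 66 66
print(utf8_hex(lookalike))
# 43 6c ce bf 6e 31 39 31 38 4b 5f 50 43 43 31 2e 67 66 66
print(has_high_bit(plain.encode("utf-8")), has_high_bit(lookalike.encode("utf-8")))
# False True
```

The two strings render almost identically, yet one is 18 bytes long and the other 19, which is why a name typed on the keyboard does not match the entry on disk.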

In this particular case, with a filename ostensibly comprised of 100% US ASCII characters, anything over U+007F would be an indication of something gone wrong with the file's naming, no? I'll update my answer to make this a bit clearer. — Alexios, Jun 15 '12 at 13:15

score 2 · Answer 2 · answered Jun 15 '12 at 11:37

Try to rename the file with nautilus, but type the desired name (do not copy and paste). That should certainly remove any special characters. It might even be a space before or after the filename that is invisible to you but visible to the OS and programs. I usually use mc to cope with really weird filenames.

akostadinov

I like to use carefully constructed wildcard patterns with `?`; failing that, I fall back to either a tiny Python/Perl/AWK one-liner or a file manager. – Alexios Jun 15 '12 at 13:30

score 1 · Answer 3 · answered Jun 15 '12 at 16:36

Have you considered the presence of a rootkit? Once upon a time, I had access to a Solaris machine that had a rootkit installed. Files named '*01' were not visible with ls *01 or ls -altr, but did show up with an echo *01. The installation of the rootkit had changed ls (and a number of other executables) so that certain files and processes did not appear under the usual circumstances. Your description sounds a lot like the rootkit I encountered.

This sounds like almost definitely high chars in the filename, since a bunch of commercial software doesn't handle a POSIX filesystem properly. But voting up because rootkits _do_ sometimes have screwed-up logic which manifests in weird ways due to how they try to hide things, so it's also something to check. :) – dannysauer Aug 14 '18 at 22:23

score 1 · Answer 4 · answered Jun 17 '12 at 01:47

It's likely that there is a “strange” character in the file name: perhaps a space, or a control character, or a non-ASCII character that looks like an ASCII character. Since the file is matched by *.gff, any special character would be before the .gff extension.

Run LC_ALL=C ls -l --quoting-style=c *.gff to see a non-ambiguous representation of the file name.

Run mv -i *.gff Clon1918K_PCC1.gff to rename the file to a known name.
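When several .gff files sit in the same directory, a wildcard rename needs a pattern that matches only the odd file. A cautious Python sketch of the same idea follows; the function name `rename_by_pattern` and its refuse-to-overwrite behaviour (loosely mirroring mv -i) are assumptions of this example, not part of any existing tool.

```python
import fnmatch
import os


def rename_by_pattern(directory, pattern, wanted):
    """Rename the single entry matching `pattern` (other than `wanted`) to `wanted`.

    Raises ValueError unless exactly one candidate matches, and
    FileExistsError if `wanted` already exists, so nothing is overwritten.
    Returns the old name as reported by the filesystem.
    """
    matches = [
        name
        for name in os.listdir(directory)
        if fnmatch.fnmatchcase(name, pattern) and name != wanted
    ]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one match, found %d: %r" % (len(matches), matches)
        )
    target = os.path.join(directory, wanted)
    if os.path.lexists(target):
        raise FileExistsError("%r already exists; refusing to overwrite" % wanted)
    # The old name comes straight from os.listdir, so it never has to be typed.
    os.rename(os.path.join(directory, matches[0]), target)
    return matches[0]
```

A call such as `rename_by_pattern(".", "Cl?n1918K_PCC1*.gff", "Clon1918K_PCC1.gff")` would cover both a look-alike third character and a stray character before the extension; inspecting the returned old name with `repr()` shows what was actually wrong.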

score 1 · Answer 5 · answered Aug 14 '18 at 22:29

In case someone stumbles across this and reads the other answers... You could jump through a lot of hoops or gamble with wildcards like some of the answers say, or just use ls -b (long form --escape; I remember it as "binary"), which prints C-style escapes for nongraphic characters.

Tab completion in the shell should automatically quote the character, but you can either use something which isn't the shell (like Nautilus) or use the shell-escape quoting style with ls to generate a convenient pre-quoted string for other commands. I used this weird file example in another longer answer elsewhere, but it's relevant here as well:

sauer@lightning:/tmp/test> ls
a??file
sauer@lightning:/tmp/test> ls --quoting-style=shell-escape
'a'$'\t\033''file'
sauer@lightning:/tmp/test> mv -v 'a'$'\t\033''file' regular_filename
renamed 'a'$'\t\033''file' -> 'regular_filename'
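For systems whose ls lacks --quoting-style, a comparable pre-quoted string can be built by hand. The sketch below emits bash-style ANSI-C quoting ($'...') with octal escapes; `ansi_c_quote` is a name made up for this example, and its output is equivalent to, but not formatted the same as, what ls prints above.

```python
def ansi_c_quote(name):
    """Quote a raw (bytes) filename as a bash-style $'...' string.

    Printable ASCII is kept as-is, backslash and single quote are
    backslash-escaped, and every other byte becomes a 3-digit octal escape.
    """
    parts = []
    for b in name:
        ch = chr(b)
        if ch in "\\'":
            parts.append("\\" + ch)
        elif 0x20 <= b <= 0x7E:
            parts.append(ch)
        else:
            parts.append("\\%03o" % b)
    return "$'" + "".join(parts) + "'"


if __name__ == "__main__":
    print(ansi_c_quote(b"a\t\x1bfile"))
    # $'a\011\033file'
```

Combined with `os.listdir(os.fsencode("."))`, this yields strings that can be pasted into a bash command line. The $'...' syntax is assumed to be available in the target shell, which holds for bash, zsh and ksh.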

score 0 · Answer 6 · edited Sep 03 '14 at 10:39

Try using

find . -iname Clon1918K_PCC1.gff

The file may be in a subdirectory rather than in the current directory.

edited Sep 03 '14 at 10:39

Raphael Ahrens

answered Sep 03 '14 at 10:08

user82870

score 0 · Answer 7 · answered Jun 15 '12 at 11:01

It is not possible in Linux to have two files with the same name in the same directory.

Try opening the parent directory in vim, then navigate to the "strange" file and see whether you can access it.

SHW
