
I like being able to name files and directories with an underscore prefix if it's something I want to keep separate from other files and directories at the same level. On Windows and Mac, for example, prefixing a file with an underscore sorts it to the top, in front of files starting with an alphanumeric character.

My googling has turned up that the reason ls doesn't do this on Linux has to do with LC_COLLATE and my current locale (en_US). That's fine, though I really don't understand why en_US doesn't sort as expected.

Based on the ICU Collate demonstration site, setting the locale to en_US_POSIX certainly appears to give the sort order I'm looking for (you have to edit the sample data and add some underscores to test it out). But I don't see how to apply this in my Linux shell.

Ideally, I'd like to be able to set up something in my bash config so that ls always sorts underscores first. How would I go about doing this?

Tom Auger
  • I can't reproduce using ICU Collate with defaults or with en_US_POSIX.txt via "Fetch rules for locale". Can you explain the settings you used? – Mikel Jun 01 '12 at 17:58
  • Similar question http://askubuntu.com/questions/47702/tell-ls-to-sort-by-regular-ascii-codes-not-intelligently – Mikel Jun 01 '12 at 19:16
  • @Mikel using the link I supplied above, add some underscores to the test data and then submit to see the results of the sort. – Tom Auger Jun 02 '12 at 13:36
  • That's exactly what I did, and strings beginning with underscores get sorted in the middle rather than the beginning, as if the underscores were not there. – Mikel Jun 02 '12 at 15:54
  • @Mikel wow, you're right. I had to go in and change the "sort order" to "Phonebook" sort order before it started putting underscores first. – Tom Auger Jun 02 '12 at 19:42
  • A related question, that deals in actually changing the collation order definition, is https://unix.stackexchange.com/questions/421908/ . – JdeBP Sep 26 '18 at 08:30

4 Answers


If you don't mind uppercase and lowercase names being listed separately, set your locale to C, which orders characters by their numeric code. `_` falls between uppercase and lowercase.

```
$ LC_COLLATE=C ls
BAR  FOO  _score  _under  hello  world
$ LC_COLLATE=en_US ls
BAR  FOO  hello  _score  _under  world
```
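To see why `_` lands between the two groups, here is a small POSIX `sh` sketch (not part of the original answer; the helper name `char_code` is hypothetical) that prints the ASCII codes involved and then sorts the same example names with `sort` in the C locale:

```shell
#!/bin/sh
# char_code (hypothetical helper): print the numeric code of the first
# character of $1. POSIX printf treats an argument that starts with a quote
# as "the numeric value of the character that follows".
char_code() {
  printf '%d\n' "'$1"
}

# In ASCII, uppercase letters come first, then "_", then lowercase letters.
for c in A Z _ a z; do
  printf '%s %s\n' "$c" "$(char_code "$c")"
done
# A 65
# Z 90
# _ 95
# a 97
# z 122

# Byte-order sort of the example names, matching the LC_COLLATE=C listing.
printf '%s\n' hello BAR _under world FOO _score | LC_ALL=C sort
```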

The locale settings LC_MESSAGES (language of error messages), LC_CTYPE (character sets) and LC_TIME (date and time format) are very useful. LC_COLLATE and LC_NUMERIC are usually more trouble than they're worth; I don't recommend setting them. Proper lexicographic sorting is more complicated than LC_COLLATE is supposed to specify, and setting it can cause all sorts of weird behaviors when you use character ranges in regular expressions. LC_NUMERIC is mostly cosmetic, except when something goes horribly wrong because some program produced a number with a decimal separator other than `.`.
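Since the question asks for something to put in the bash config, one way to follow this advice (C collation for `ls` only, LC_COLLATE left alone everywhere else) is a small function in `~/.bashrc`. This is a minimal sketch, not part of the original answer, and it assumes `LC_ALL` is unset: `LC_ALL` takes precedence over `LC_COLLATE`, so if it is set you would need `LC_ALL=C` instead.

```shell
# ~/.bashrc: sort `ls` output by byte value (C collation) while leaving
# LC_COLLATE unchanged for every other program.
# Assumption: LC_ALL is not set (it would override LC_COLLATE).
ls() {
  # `command` skips this function and runs the real ls binary
  LC_COLLATE=C command ls "$@"
}
```

If your `.bashrc` already has an `alias ls=...` line (many distributions ship one), define the function before that line; otherwise bash expands the alias inside `ls() {` and reports a syntax error. With the alias defined afterwards, its options are still passed through to the function.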

Gilles 'SO- stop being evil'
  • +1 Very interesting. So using this form, you're temporarily setting the environment variable LC_COLLATE just for that one instance of ls? Is that right? – Tom Auger Jun 02 '12 at 13:40
  • Any way to make the underscores appear BEFORE the upper case letters? – Tom Auger Jun 02 '12 at 13:42
  • @TomAuger Yes, `VAR=value cmd` sets `VAR` to `value` only in the environment of `cmd` and doesn't touch the value (or absence of value) in the shell where you run it. To make the underscore appear before uppercase, you would need to define your own locale settings. This is possible, but awkward to use, because at least under Linux, the standard library only looks for locale definitions in `/usr/lib/locale`; there's no `~/.locale` or environment variable where you could put your `en_tom` setting. – Gilles 'SO- stop being evil' Jun 02 '12 at 13:49
  • @TomAuger If this is only about the `ls` command, go with [Mikel's suggestion](http://unix.stackexchange.com/a/39830). – Gilles 'SO- stop being evil' Jun 02 '12 at 13:50
  • What does `C` mean? – Iulian Onofrei Oct 22 '21 at 12:48
  • @IulianOnofrei It comes from being the default locale in the [C programming language](https://en.wikipedia.org/wiki/C_(programming_language)). Its effect is to classify and sort characters in a straightforward way based on their encoding, and to use default English conventions for messages, numbers and time. – Gilles 'SO- stop being evil' Oct 22 '21 at 15:38
  • Changing `LC_COLLATE` is the better approach. But if for some reason it doesn't work (like in my case) you can use `$ LC_ALL=C ls`, and then [put it in an alias](https://unix.stackexchange.com/questions/347178/bash-aliases-vs-alias-command) for ease of use. There are a bunch of locale environment variables; you can see them with `$ locale` – zazke Jul 09 '22 at 20:31
  • This answer still doesn't work if I have both upper and lower case files! – user643011 Sep 03 '23 at 03:30
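Gilles' comment above explains that a `VAR=value cmd` prefix only affects the command it precedes. A quick POSIX `sh` sketch of that behavior (it unsets `LC_COLLATE` first only so that the result is predictable):

```shell
#!/bin/sh
# A "VAR=value cmd" prefix puts VAR into cmd's environment only.
unset LC_COLLATE                                # start with it unset in this shell
LC_COLLATE=C sh -c 'echo "child: $LC_COLLATE"'  # child: C
echo "parent: ${LC_COLLATE-unset}"              # parent: unset
```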

If you can't get ls to sort the way you want, try shell expansion.

You can use file name patterns to run ls with a list of files that the shell already sorted, bypassing the method that ls uses.

```
ls -lf _* [!_]*
```

Assuming you have the files

```
_a a _b b _c c
```

this is like running

```
ls -lf _a _b _c a b c
```

Explanation:

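`_*` is a shell pattern matching any file name beginning with an underscore, expanded in sorted order (using your locale's collation).

`[!_]*` matches any file name not beginning with an underscore, also expanded in sorted order.

`-f` tells `ls` not to sort, because the shell already did. (With GNU `ls`, a `-f` that comes after `-l` turns the long format off again, so write `ls -fl` rather than `ls -lf` if you want the long listing.)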

More information: bash filename expansion

If the current directory contains subdirectories, add `-d` so that `ls` lists the directories themselves instead of their contents:

```
ls -lfd _* [!_]*
```
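For repeated use, the same idea can be wrapped in a shell function. This is a minimal sketch in POSIX `sh` syntax (so it also runs in bash); the name `ls_underscore_first` is hypothetical, and unlike the one-liner above it skips a pattern that matches nothing instead of passing it to `ls` as a literal name:

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer above): list names that start
# with "_" first, then all other names, letting the shell's globbing sort them.
# The body is a subshell ( ... ) so the loop variable does not leak.
ls_underscore_first() (
  set --                                   # collect matches in "$@"
  for f in _* [!_]*; do
    # A pattern that matches nothing is left as literal text; skip it.
    [ -e "$f" ] || [ -L "$f" ] || continue
    set -- "$@" "$f"
  done
  # Nothing matched: a bare `ls` would list "." in its own order, so stop.
  [ "$#" -gt 0 ] || exit 0
  # -f: print arguments in the order given; -d: don't list directory contents.
  ls -fd -- "$@"
)
```

With the example files `_a a _b b _c c`, running `ls_underscore_first` prints `_a`, `_b`, `_c`, `a`, `b`, `c`. As the comments below note, zsh treats `!` and unmatched patterns differently, so this sketch targets bash and POSIX `sh`.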
Renan
Mikel
  • Not what I was hoping for, but heck, if it works.... alias it.... – Tom Auger Jun 02 '12 at 13:37
  • Wait, though, if I understand this, it will put all the underscore stuff first, BUT it won't do any kind of alphabetical sorting at all? That's just trading one problem for another. I should revise my original question to specify that I still want alpha sorting; I just want underscores to appear first, before anything else - like a basic DOS / Windows / OSX sort? – Tom Auger Jun 02 '12 at 13:41
  • @TomAuger This still does alphabetical sorting. It puts all file names starting with underscores first (in lexicographic order as determined by your locale). then all other file names (again in lexicographic order). – Gilles 'SO- stop being evil' Jun 02 '12 at 13:52
  • By the way, DOS/Windows/OSX don't really put underscores before anything else: they sort case-insensitively with the underscore put before letters, but some other punctuation characters go before or after the underscore. Using `_` to make files appear first is an OS-specific hack; and the unix version of this hack is to start the file name with a capital letter: the default unix convention is to use only lowercase letters in file names. – Gilles 'SO- stop being evil' Jun 02 '12 at 13:53
  • Or zeros; e.g. `00README`. – mattdm Jun 02 '12 at 14:56
  • @Gilles I think where I'm confused then is the -f flag which, as you say, is required to make ls not sort the output. So what's doing the alphabetical sorting? Or am I misunderstanding the -f flag? – Tom Auger Jun 02 '12 at 19:45
  • @Gilles +1 for the unix best practice of using caps on important files to make them ls first. At the end of the day, if that's the convention, it's probably best that I simply adopt that, rather than attempt to force unix to behave the way other OSes do so I can use conventions that were developed for Mac or Windows. Thanks for the great tip. – Tom Auger Jun 02 '12 at 19:46
  • @TomAuger `-f` tells `ls` not to do its own sorting, so it displays its arguments in the order they are passed. The result of each shell wildcard expansion `_*` and `[!_]*` is a lexicographically sorted list. – Gilles 'SO- stop being evil' Jun 02 '12 at 19:51
  • @Gilles - I see where my misunderstanding came from - the _directory listing_ is still sorted; it's the _arguments_ that don't get sorted with the `-f` flag, am I correct? – Tom Auger Jun 04 '12 at 14:59
  • @TomAuger The arguments to `ls` are sorted (in two groups: the ones starting with `_`, then the others) when they are generated by the shell. Run `echo ls -lf _* [!_]*` to see what happens. The `-f` flag tells `ls` not to do any sorting. – Gilles 'SO- stop being evil' Jun 04 '12 at 18:02
  • If there are directories in the current directory you will want to run the command like this to avoid ls listing files in the directories: ls -lfd _* [!_]* – spkane Feb 07 '13 at 19:45
  • This doesn't work in zsh... It seems to execute, `ls -lfd _* [`, and gives the error, `zsh: event not found: _]`. – Jack_Hu May 19 '21 at 18:05
  • UPDATE: This is because `!` has a special meaning in zsh. There's a simple fix though, and that's to exchange `!` for `^`, which equates to the same thing... So for ZSH, use: `ls -lfd _* [^_]*`, or, IMO a better output is from, `ls -dUl -- _* [^_]*`. :) – Jack_Hu May 19 '21 at 18:25
  • `ls -fld _* [!_]*` shows a long list format with underscore prefixed files first. – user643011 Sep 03 '23 at 03:34

Unfortunately Linux uses glibc for its locale info, not ICU, so there is no way to directly apply this to Linux without expending a lot of effort either retrofitting ICU into glibc or supplementing the locale info in glibc.

Ignacio Vazquez-Abrams

Adding the -f switch (no sorting) made it show that way for me.

man ls

```
[root@dusknoir ~/java/test]# ls -fl
total 0
-rw-r--r--  1 root  wheel  0 Jun  1 13:27 _1
-rw-r--r--  1 root  wheel  0 Jun  1 13:27 _2
-rw-r--r--  1 root  wheel  0 Jun  1 13:27 _3
-rw-r--r--  1 root  wheel  0 Jun  1 13:27 1
-rw-r--r--  1 root  wheel  0 Jun  1 13:27 2
-rw-r--r--  1 root  wheel  0 Jun  1 13:27 3
```
Tim