
What is the least expensive way to find the oldest file in a directory, including all directories underneath? Assume the directory is backed by a SAN and under heavy load.

There is concern that `ls` could take locks and cause system degradation under heavy load.

Edit: `find` performs very well in a simple test case: finding the oldest file among 400 GB of files on an SSD took 1/20 of a second. But that was on a MacBook Pro laptop under no load, so it's a bit of an apples-to-oranges comparison.

And, as an aside, what is the best way to find the implementations (underlying algorithms) of such commands?

  • possible duplicate of [How to list files that were changed in a certain range of time?](http://unix.stackexchange.com/questions/29245/how-to-list-files-that-were-changed-in-a-certain-range-of-time) – jasonwryan Jul 17 '13 at 22:12
  • @jasonwryan No, knowing what files were modified in a given time range doesn't help to find the oldest file. – Gilles 'SO- stop being evil' Jul 17 '13 at 23:54
  • `ls` doesn't scan the file contents. It reads the directories and `stat`s the files, which is necessary to find the oldest files anyway. But `ls` won't really help you because going from any `ls` output to finding the oldest files would be very difficult. – Gilles 'SO- stop being evil' Jul 17 '13 at 23:55
  • I am surprised nobody mentioned using an event-driven model to accomplish this, i.e. building something that uses inotify. – Robert Christian Jul 19 '13 at 17:40

2 Answers


With zsh:

oldest=(**/*(.DOm[1]))

For the oldest regular file: `.` selects regular files, `D` includes dot files, and `Om` orders by modification time with the oldest first, so `[1]` picks the oldest. (zsh's time resolution here is one second.)

With GNU tools:

(export LC_ALL=C
 find . -type f -printf '%T@\t%p\0' |  # mtime (seconds since epoch), tab, path, NUL-terminated
   sort -zg |                          # numeric sort of the NUL-delimited records, oldest first
   tr '\0\n' '\n\0' |                  # swap NUL and newline so line-oriented head/cut work
   head -n 1 |                         # keep the first (oldest) record
   cut -f2- |                          # drop the timestamp field
   tr '\0' '\n')                       # restore any newlines embedded in the file name
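For a quick sanity check, here is a self-contained demo of that pipeline on a throwaway directory tree (the tree and file names are made up for illustration; assumes GNU `find`, `sort`, and `touch`):

```shell
# Build a scratch tree with known modification times.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
touch -d '2001-01-01' "$tmp/sub/old.txt"   # the oldest file
touch -d '2010-01-01' "$tmp/new.txt"

# Run the pipeline from the answer against the scratch tree.
oldest=$(cd "$tmp" &&
  find . -type f -printf '%T@\t%p\0' |
    LC_ALL=C sort -zg | tr '\0\n' '\n\0' | head -n 1 |
    cut -f2- | tr '\0' '\n')

echo "$oldest"    # → ./sub/old.txt
rm -rf "$tmp"
```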
Stéphane Chazelas
  • +1 Perhaps add something about backgrounding this and `nice`ing it as the user seems worried about it locking and affecting performance under heavy load. – Joseph R. Jul 17 '13 at 22:05
  • @JosephR. `nice`ing? Useless, this is IO-bound. `ionice`, maybe. – Gilles 'SO- stop being evil' Jul 17 '13 at 23:54
  • @Gilles Please correct me if I'm wrong. Wouldn't both `nice` and `ionice` be relevant here? The `find` would take up CPU and can therefore benefit from `nice` and the `rm` would need lots of I/O and would therefore benefit from `ionice`. – Joseph R. Jul 18 '13 at 10:00
  • @JosephR. `find` is IO-bound just like `rm`. The CPU time needed to format the data is negligible compared to the `stat` calls. – Gilles 'SO- stop being evil' Jul 18 '13 at 10:03

To minimize the number of external processes, you may be able to optimize by running a custom script instead of piping `find` through several other tools. The directory traversal and the `stat()` of each file cannot be avoided, but only the oldest file seen so far needs to be kept in memory.

Here is an attempt in Perl:

find2perl -eval 'BEGIN { our ($filename, $oldest); }
    my @s=stat(_); if (! defined $::oldest || $s[9] < $::oldest) {
        $::oldest=$s[9]; $::filename = $File::Find::name }
    END { print "$::filename\n" }' | perl

In my tests on a moderately large directory (129,019 nodes), this is actually about 50% slower than @StephaneChazelas' "GNU tools" version, but you may find that it works better in some scenarios, especially for very large directories.
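The same idea (walk the tree once, remembering only the oldest entry) can also be sketched in Python; `oldest_file` is a hypothetical helper name for this example, not anything from the answers above:

```python
import os
import stat

def oldest_file(root):
    """Walk root and return (path, mtime) of the oldest regular file,
    or None if no regular files are found."""
    oldest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)       # lstat: don't follow symlinks
            except OSError:
                continue                  # file vanished mid-walk; skip it
            if not stat.S_ISREG(st.st_mode):
                continue                  # regular files only, like find -type f
            if oldest is None or st.st_mtime < oldest[1]:
                oldest = (path, st.st_mtime)
    return oldest
```

Like the Perl version, this keeps O(1) state instead of sorting every record, at the cost of interpreter startup and per-file Python overhead.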

tripleee
  • If you prefer Python, http://stackoverflow.com/questions/7541863/python-equivalent-of-find2perl has some hints. – tripleee Jul 18 '13 at 07:19