27

Programs that do a full disk scan and visit every file on the system usually take a very long time to run. Why does updatedb run so fast in comparison?

Jeff Schaller
hugomg
  • A `find -xdev` scan is very fast; try this: https://unix.stackexchange.com/a/379827/208590 – Rick Jun 04 '22 at 16:19

2 Answers

24

The answer depends on the version of locate you’re using, but there’s a fair chance it’s mlocate, whose updatedb runs quickly by avoiding doing full disk scans:

mlocate is a locate/updatedb implementation. The 'm' stands for "merging": updatedb reuses the existing database to avoid rereading most of the file system, which makes updatedb faster and does not trash the system caches as much.

(The database stores each directory’s timestamp, ctime or mtime, whichever is newer.)
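To make the merging idea concrete, here is a minimal Python sketch. It is not mlocate's actual code or database format: the in-memory dictionary and the names `dir_stamp` and `merge_scan` are hypothetical, chosen only for illustration. It assumes that a directory whose newer-of-ctime/mtime stamp matches the stored one still has the same entries, so its listing can be reused instead of re-read. Subdirectories are still visited, because a change deeper in the tree does not alter the timestamps of ancestor directories.

```python
import os

def dir_stamp(path):
    """Newer of a directory's ctime and mtime, in nanoseconds."""
    st = os.stat(path, follow_symlinks=False)
    return max(st.st_ctime_ns, st.st_mtime_ns)

def merge_scan(root, old_db):
    """Rebuild the database, re-reading only directories whose stamp changed.

    old_db and the returned new_db map a directory path to a tuple
    (stamp, sorted entry names, sorted subdirectory names).
    Also returns the list of directories that had to be re-read.
    """
    new_db = {}
    reread = []
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            stamp = dir_stamp(path)
        except OSError:
            continue  # directory vanished or is unreadable
        cached = old_db.get(path)
        if cached is not None and cached[0] == stamp:
            # Unchanged stamp: reuse the stored listing, no readdir needed.
            names, subdirs = cached[1], cached[2]
        else:
            names, subdirs = [], []
            try:
                with os.scandir(path) as it:
                    for entry in it:
                        names.append(entry.name)
                        if entry.is_dir(follow_symlinks=False):
                            subdirs.append(entry.name)
            except OSError:
                continue
            names.sort()
            subdirs.sort()
            reread.append(path)
        new_db[path] = (stamp, names, subdirs)
        # Still descend: each subdirectory needs its own stamp check.
        stack.extend(os.path.join(path, d) for d in subdirs)
    return new_db, reread
```

On the first run `old_db` is empty and every directory is read. On later runs the cost is roughly one `stat` per directory plus a full read of only the directories that changed.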

Like most implementations of updatedb, mlocate’s will also skip file systems and paths which it is configured to ignore. By default there are none in mlocate’s case, but distributions typically provide a basic updatedb.conf which ignores networked file systems, virtual file systems etc. (see Debian’s configuration file for example; this is standard practice in Debian, so GNU’s updatedb is configured similarly).

Stephen Kitt
  • Good question and answer; I did not even know there were "differential" scans. – Rui F Ribeiro Jan 02 '19 at 16:25
  • 1
    Thanks! I had never noticed that modifying a file also changes the ctime and mtime of all its parent directories. – hugomg Jan 02 '19 at 17:21
  • 4
    @hugomg I don't think it actually does. It should only change the `mtime` of its immediate parent. – Kusalananda Jan 02 '19 at 17:33
  • So if I understand it correctly, `mlocate` looks at a directory's `ctime` and `mtime`, which implies it only cares whether the list of directory entries is still the same (no files removed or added), and doesn't care about the files themselves. Is that correct? – Sergiy Kolodyazhnyy Jan 03 '19 at 01:12
  • @Sergiy: Of course. `locate` isn't `grep -R`. It does not read file content. – Kevin Jan 03 '19 at 01:21
  • @Kevin I'm well aware of the difference. I'm just interested in `mlocate` behavior and how it would handle certain corner cases, since I've my own similar project for a database of files. For instance, `vim` replaces original file with new, thus changing inode number and modifying directory even if filename is the same. – Sergiy Kolodyazhnyy Jan 03 '19 at 01:41
9

In addition to checking modification times, mlocate also ignores certain subtrees of the file system that have lots of uninteresting or potentially duplicate files, as specified in /etc/updatedb.conf (and described in man updatedb.conf):

  • Bind mounts (PRUNE_BIND_MOUNTS)
  • Some kinds of file systems (PRUNEFS: 9p, afs, bdev, etc.)
  • VCS repository databases (PRUNENAMES: .git, .hg, etc.)
  • Some explicitly listed directories (PRUNEPATHS: /media, /tmp, /var/spool/cups, etc.)
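
The name-based and path-based exclusions above can be sketched in Python as follows. This is only an illustration, not how updatedb is implemented: `walk_pruned` is a hypothetical helper, the two constants borrow the names of the PRUNENAMES and PRUNEPATHS settings in updatedb.conf with sample values taken from the list, and the file-system-type and bind-mount checks are left out because they need mount-table information.

```python
import os

# Sample values only, taken from the examples in the list above.
PRUNENAMES = {".git", ".hg"}
PRUNEPATHS = {"/media", "/tmp", "/var/spool/cups"}

def walk_pruned(root, prune_names=PRUNENAMES, prune_paths=PRUNEPATHS):
    """Yield every path under root, skipping pruned directories entirely."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] tells os.walk not to descend into
        # the directories that were filtered out.
        dirnames[:] = [
            d for d in dirnames
            if d not in prune_names
            and os.path.join(dirpath, d) not in prune_paths
        ]
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)
```

Pruning a directory removes its whole subtree from the walk, which is why excluding a few large trees (VCS object stores, spool directories) can noticeably shorten a scan.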
hugomg
  • This isn’t the case by default though, so the base behaviour depends on the distribution being used. (Other `updatedb` implementations also support configured exclusions.) – Stephen Kitt Jan 03 '19 at 08:17
  • Indeed. I was describing the defaults for Fedora. – hugomg Jan 03 '19 at 15:21