
I am running k3s on a cluster of 3x Raspberry Pi 4. I keep running into problems when the nodes exhibit DiskPressure and pods are Evicted - however, I'm at a loss as to what is taking up the space on the 15G SD cards I'm using. I've tried all the obvious candidates - /var/log files, journalctl --vacuum-size, docker system prune -af --volumes - but I'm never able to get usage of the root filesystem much below 80%:

$ df -h | head
Filesystem                  Size  Used Avail Use% Mounted on
/dev/root                    15G   11G  3.0G  79% /
devtmpfs                    3.7G     0  3.7G   0% /dev
tmpfs                       3.9G     0  3.9G   0% /dev/shm
tmpfs                       1.6G  7.2M  1.6G   1% /run
tmpfs                       5.0M  4.0K  5.0M   1% /run/lock
/dev/mmcblk0p1              253M   32M  221M  13% /boot
...(other mounted filesystems, like external hard drives and NFS mounts)

I've been using du --max-depth 1 -xh . 2>/dev/null to try to track down large objects, but that's hit a dead end - especially since df and du are not intended to give matching results:

$ du --max-depth 1 / -xh 2>/dev/null
8.0K    /mnt
2.1G    /usr
4.0K    /media
4.0K    /opt
16K     /lost+found
6.0M    /etc
146M    /home
4.0K    /root
1.3G    /var
4.0K    /srv
40K     /tmp
3.5G    /

When du tells me that only 3.5G is being used, but df reports that 11G is used, what alternative tools can I use to find junk to delete (or - junk which is evidence of malfunctioning programs)?

Google is not particularly helpful here - most answers centre around du or ls (which gives a similar view to du), or using find to find large files (moderately helpful, but not useful if I have a proliferation of small files), and even [ncdu](https://unix.stackexchange.com/a/125451/30828) agrees with du that only ~3.5G is in use. As per this guide, I tried to find any files that have been deleted (and so, are "seen" by df but not by du), but came up (nearly) empty:

$ sudo lsof -w | grep -i 'deleted'
systemd-j    155                              root   27u      REG              179,2    33554432      37340 /var/log/journal/539cc463fa774d11a5642e3744db7544/user-1000@f197a92838804bf28f92299ece25a807-000000000005daa8-0005f1ce57c1e95c.journal (deleted)
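
The deleted-file check above can also be done without lsof by reading /proc directly. This is only a sketch, assuming a Linux /proc filesystem and GNU stat; the function name `deleted_open_files` is made up for illustration. It prints the size in bytes and the path of every open file descriptor whose target has been unlinked.

```shell
# Hypothetical helper: list files that are deleted but still held open.
# Assumes Linux /proc and GNU stat; run from a root shell to see every process.
deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # The link target ends in " (deleted)" once the file has been unlinked.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)')
                # stat -L follows the /proc link to the still-open inode.
                size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
                printf '%s\t%s\n' "$size" "$target"
                ;;
        esac
    done | sort -u | sort -n
}
```

Entries under /dev/shm or starting with /memfd: live in memory rather than on the root filesystem, so they do not explain a gap between df and du.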
scubbo

1 Answer


You are running du --max-depth 1 / -xh 2>/dev/null as an ordinary user. As such, there will be plenty of directories which it cannot traverse due to permission restrictions, and because stderr is redirected to /dev/null, the resulting "Permission denied" errors are hidden. You must run this command as root.
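
As a rough sketch of what that could look like (the function name `top_level_usage` is invented here, and GNU du and sort are assumed), the same du invocation can be wrapped so that the output is sorted by size and errors are left visible rather than discarded:

```shell
# Hypothetical helper: per-directory usage on one filesystem, largest last.
# -x keeps du on a single filesystem, so /proc, NFS mounts and external
# drives are skipped. stderr is deliberately NOT redirected, so any
# "Permission denied" messages stay visible.
top_level_usage() {
    du -x -h --max-depth=1 -- "${1:-/}" | sort -h
}
```

Shell functions are not visible to sudo, so this would need to be defined and called from a root shell (for example after `sudo -i`), as in `top_level_usage /`.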

Bib
  • D'oh - that seems so obvious in hindsight, thank you! Unfortunately `sudo du ...` is hanging, and `sudo strace -u root sudo -k du ...` hangs on a line that reads `ppoll([{fd=-1}, {fd=3, events=POLLIN}], 2, NULL, NULL, 8`, which I'm trying to interpret - a negative file-descriptor _seems_ impossible/error-flavoured! Regardless, this is a perfect answer to the original question, and I'll accept it as such. Thanks! – scubbo Jan 11 '23 at 02:58
  • @scubbo There are a few directories which you should not run du against. The output should tell you what has completed and which directory is causing the hang. I tend to run du -s dir1 dir2 dir3 and so on. You really do not want to run it against `/proc`, for instance. Divide and conquer... – Bib Jan 11 '23 at 10:50
  • @Bib What's the problem with running it on /proc? I've never seen any. The worst that can happen is that it prints a few lines about files that disappeared during the run. Also, there is no predetermined order, so you *can't* know what comes next even if you see what has completed. – Nikita Kipriyanov Jan 13 '23 at 05:46
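
One way to make the "divide and conquer" suggestion concrete is a loop that runs du -s on one directory at a time and announces each directory before it starts. This is a sketch rather than anything from the thread: the name `du_each` is invented, and GNU du is assumed. Because the loop fixes the order itself, the last "scanning" line shows which directory a hang belongs to.

```shell
# Hypothetical helper: summarise each directory in turn, announcing it first
# so that a hang can be attributed to a specific directory.
du_each() {
    for dir in "$@"; do
        # Skip arguments that are not directories.
        [ -d "$dir" ] || continue
        # Progress goes to stderr so stdout holds only the du summaries.
        printf 'scanning %s\n' "$dir" >&2
        # -s: one summary line; -x: stay on this directory's filesystem.
        du -s -x -h -- "$dir"
    done
}
```

From a root shell this might be called as `du_each /var/*` or `du_each /usr /var /home`, then repeated on whichever directory turns out to be large or slow.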