After SSHing into a VM, I cannot run basically any command: `ps aux`, `cd /proc; ls`, `top`, `dmesg`, and `systemctl status` all hang, so I don't even know how to spot the problem.

Linux frdev07 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64

Some hints, though: it only happens on weekends, and `systemctl status` hangs somewhere inside [email protected]. I cannot use `psql` to connect to the database either; it hangs too. It looks to me like some sort of maintenance process run by PostgreSQL, but that's no excuse for making the VM unusable.

How could I find out the precise issue?

dgan
  • Connections that hang when TCP has to transport a large burst of output, be it an SSH or a database connection. This will be https://unix.stackexchange.com/q/412192/5132 and https://unix.stackexchange.com/q/4261/5132 again, I expect. – JdeBP Mar 16 '19 at 22:01
  • Does this answer your question? [Why would SSH freeze for minutes at a time when other traffic is unaffected?](https://unix.stackexchange.com/questions/412192/why-would-ssh-freeze-for-minutes-at-a-time-when-other-traffic-is-unaffected) – roaima Feb 15 '21 at 20:09

1 Answer

If the entire OS seems to hang, it might be a disk I/O problem of some kind, potentially an indication of a physical disk failure.
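
One way to test the disk I/O theory is to look for processes stuck in uninterruptible sleep (state `D`), which is where a task sits while it waits on a stalled device. The sketch below is only an illustration. It assumes Python 3 is installed on the VM and that `/proc/<pid>/stat` can still be read, which may not be true on a badly wedged system. The helper names `parse_stat` and `blocked_processes` are made up for this example.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' rather than on spaces.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the file was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

A list that stays full of the same PIDs across several runs would be consistent with stalled I/O, though it does not say which device or why.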

Check the logs, mainly /var/log/messages and/or /var/log/kern.log. You might also try the dmesg command: it is a very simple command that just prints the kernel message buffer from RAM, so it has a chance of working even when something is causing more complicated commands to hang.
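
If the logs are readable at all, a quick filter for storage-related kernel messages can save some scrolling. This is a minimal sketch under stated assumptions: the substring list is an illustrative guess at common messages, not an exhaustive catalogue, and `suspicious_lines` is a hypothetical helper name.

```python
import sys

# Substrings that commonly appear in kernel messages about storage trouble
# or blocked tasks. This list is an illustrative assumption, not exhaustive.
SUSPECT_PATTERNS = (
    "I/O error",
    "blocked for more than",
    "EXT4-fs error",
    "Out of memory",
)


def suspicious_lines(lines, patterns=SUSPECT_PATTERNS):
    """Return the lines that contain any of the given substrings."""
    return [line.rstrip("\n") for line in lines
            if any(pattern in line for pattern in patterns)]


if __name__ == "__main__":
    # Usage: python3 scan_log.py /var/log/kern.log
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kern.log"
    # errors="replace" keeps the scan going if the log holds stray bytes.
    with open(path, errors="replace") as log:
        for hit in suspicious_lines(log):
            print(hit)
```

The same function also accepts the output of dmesg split into lines, since it only needs an iterable of strings.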

If even dmesg fails, try accessing the VM console: in case of serious errors, the system might emit error messages to the console even if nobody is logged in there.

The kernel version number 4.9.0-6-amd64 tells me the system is not quite up to date with patches: the current kernel for Debian 9.x would be 4.9.0-8-amd64. If the system is accessible from the internet, it might be slow because it has been detected as vulnerable and is under some sort of attack.

telcoM
  • Thank you for the reply, but I cannot read those files right now... `tail /var/log/messages -n 13` shows nothing interesting, but `tail /var/log/messages -n 14` hangs too. Same thing for `dmesg` and `kern.log` – dgan Mar 16 '19 at 21:46
  • If this is a big VMware ESXi set-up, check with the VMware administrator. It might be that the physical host has another VM that is very busy and prioritized above your VM, and it's taking up all the free processor and/or I/O capacity the host has. If the host has been overcommitted (i.e. it has fewer total system resources than the VMs on it expect in total, on the assumption that not all VMs will be busy at the same time), such a situation could cause "hanging" and severe slowness in other VMs on that host. – telcoM Mar 16 '19 at 22:10
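
From inside the guest, a rough first check for either stalled I/O or host contention is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the `iowait` and `steal` counters. The sketch below assumes Python 3 and a readable `/proc/stat`; the function names and the five-second interval are arbitrary choices for illustration. Whether the hypervisor reports steal time to the guest varies, so a steal figure of zero does not rule out an overcommitted host, and the administrator's host-side numbers remain the better source.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Map the first eight counters of a 'cpu ...' line to their names."""
    values = [int(value) for value in line.split()[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def shares(before, after):
    """Percentage of elapsed CPU time spent in each state between samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu(path="/proc/stat"):
    """Read and parse the first (aggregate) line of /proc/stat."""
    with open(path) as handle:
        return parse_cpu_line(handle.readline())


if __name__ == "__main__":
    first = read_cpu()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    result = shares(first, read_cpu())
    print("iowait %.1f%%  steal %.1f%%" % (result["iowait"], result["steal"]))
```

A persistently high iowait share points toward storage, while a high steal share points toward the host being busy with other VMs.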