
I have a Debian Linux machine in a distant location. When I switch it on using WOL, it starts all right and works for some 15 minutes, then becomes unreachable on the network. I can still log in from the console when I walk there, which is inconvenient.

Everything is fine when I switch it off, restart it and log in from the console. Then it stays on indefinitely.

I am aware of this answer, which seems to be relevant. But when I issue

arp -s 158.227.90.30 00:15:17:41:00:40

as recommended, I get:

SIOCSARP: Invalid argument

An excerpt of the last lines of syslog follows, which I cannot quite interpret. I do not know what else to try.

Apr 20 09:31:17 B012526 vmunix: [ 1231.510751] PM: suspend entry (s2idle)
Apr 20 09:31:17 B012526 vmunix: [ 1231.510754] PM: Syncing filesystems ... done.
Apr 20 09:45:22 B012526 vmunix: [ 1231.739150] Freezing user space processes ... (elapsed 0.001 seconds) done.
Apr 20 09:45:22 B012526 vmunix: [ 1231.740905] OOM killer disabled.
Apr 20 09:45:22 B012526 vmunix: [ 1231.740906] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Apr 20 09:45:22 B012526 vmunix: [ 1231.742181] Suspending console(s) (use no_console_suspend to debug)
Apr 20 09:45:22 B012526 vmunix: [ 1231.764857] sd 3:0:4:0: [sde] Synchronizing SCSI cache
Apr 20 09:45:22 B012526 vmunix: [ 1231.784851] sd 3:0:3:0: [sdd] Synchronizing SCSI cache
Apr 20 09:45:22 B012526 vmunix: [ 1231.800826] sd 3:0:2:0: [sdc] Synchronizing SCSI cache
Apr 20 09:45:22 B012526 vmunix: [ 1231.816826] sd 3:0:1:0: [sdb] Synchronizing SCSI cache
Apr 20 09:45:22 B012526 vmunix: [ 1231.832827] sd 3:0:0:0: [sda] Synchronizing SCSI cache
Apr 20 09:45:22 B012526 vmunix: [ 1231.837227] serial 00:06: disabled
Apr 20 09:45:22 B012526 vmunix: [ 1231.837273] serial 00:05: disabled
Apr 20 09:45:22 B012526 vmunix: [ 1231.837354] e1000e: EEE TX LPI TIMER: 00000000
Apr 20 09:45:22 B012526 vmunix: [ 1231.837374] e1000e: EEE TX LPI TIMER: 00000000
Apr 20 09:45:22 B012526 vmunix: [ 1231.837396] mptbase: ioc0: pci-suspend: pdev=0x00000000521aadd5, slot=0000:04:00.0, Entering operating state [D3]
Apr 20 09:45:22 B012526 vmunix: [ 1232.180822] radeon 0000:0d:0c.0: Refused to change power state, currently in D0
Apr 20 07:45:22 B012526 rtkit-daemon[1095]: The canary thread is apparently starving. Taking action.
Apr 20 09:45:22 B012526 vmunix: [ 1232.473794] usb usb4: root hub lost power or was reset
Apr 20 09:45:22 B012526 vmunix: [ 1232.473825] lpc_ich 0000:00:1f.0: rerouting interrupts for [8086:2670]
Apr 20 09:45:22 B012526 vmunix: [ 1232.473885] mptbase: ioc0: pci-resume: pdev=0x00000000521aadd5, slot=0000:04:00.0, Previous operating state [D0]
Apr 20 09:45:22 B012526 vmunix: [ 1232.473938] usb usb2: root hub lost power or was reset
Apr 20 09:45:22 B012526 vmunix: [ 1232.474164] mptbase: ioc0: pci-resume: ioc-state=0x1,doorbell=0x10000000
Apr 20 09:45:22 B012526 vmunix: [ 1232.474273] serial 00:05: activated
Apr 20 09:45:22 B012526 vmunix: [ 1232.474285] usb usb3: root hub lost power or was reset
Apr 20 09:45:22 B012526 vmunix: [ 1232.474293] usb usb5: root hub lost power or was reset
Apr 20 09:45:22 B012526 vmunix: [ 1232.475156] ata1.00: unexpected _GTF length (8)
Apr 20 09:45:22 B012526 vmunix: [ 1232.475189] serial 00:06: activated
Apr 20 09:45:22 B012526 vmunix: [ 1232.482418] e1000e 0000:05:00.0 enp5s0f0: MAC Wakeup cause - Magic Packet
Apr 20 09:45:22 B012526 vmunix: [ 1232.852724] [drm] PCI GART of 512M enabled (table at 0x0000000034900000).
Apr 20 09:45:22 B012526 vmunix: [ 1232.852728] radeon 0000:0d:0c.0: WB disabled
Apr 20 09:45:22 B012526 vmunix: [ 1232.852732] radeon 0000:0d:0c.0: fence driver on ring 0 use gpu addr 0x0000000090000000 and cpu addr 0x00000000349b3527
Apr 20 07:45:22 B012526 rtkit-daemon[1095]: Demoting known real-time threads.
Apr 20 09:45:22 B012526 vmunix: [ 1232.852832] [drm] radeon: ring at 0x0000000090001000
Apr 20 09:45:22 B012526 vmunix: [ 1232.852891] [drm] ring test succeeded in 1 usecs
Apr 20 09:45:22 B012526 vmunix: [ 1232.852944] [drm] ib test succeeded in 0 usecs
Apr 20 09:45:22 B012526 vmunix: [ 1232.892920] usb 2-1: reset low-speed USB device number 2 using uhci_hcd
Apr 20 09:45:22 B012526 vmunix: [ 1235.644819] mptbase: ioc0: Sending mpt_do_ioc_recovery
Apr 20 09:45:22 B012526 vmunix: [ 1235.644821] mptbase: ioc0: Initiating bringup
Apr 20 09:45:22 B012526 vmunix: [ 1235.668248] e1000e: enp5s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Apr 20 09:45:22 B012526 vmunix: [ 1236.484810] ioc0: LSISAS1064E B2: Capabilities={Initiator}
Apr 20 09:45:22 B012526 vmunix: [ 1246.398765] mptbase: ioc0: pci-resume: success
Apr 20 09:45:22 B012526 vmunix: [ 1246.399467] OOM killer enabled.
Apr 20 09:45:22 B012526 vmunix: [ 1246.399468] Restarting tasks ... done.
Apr 20 09:45:22 B012526 vmunix: [ 1246.401054] PM: suspend exit
Apr 20 07:45:22 B012526 rtkit-daemon[1095]: Successfully demoted thread 1094 of process 1094 (n/a).
Apr 20 07:45:22 B012526 rtkit-daemon[1095]: Demoted 1 threads.
Apr 20 09:45:31 B012526 vmunix: [ 1255.730216] perf: interrupt took too long (3143 > 3138), lowering kernel.perf_event_max_sample_rate to 63500
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.A11ySettings.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Datetime.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Housekeeping.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Mouse.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.PrintNotifications.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.ScreensaverProxy.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Sharing.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Smartcard.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Rfkill.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Wacom.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Keyboard.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.MediaKeys.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Clipboard.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Power.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.Sound.desktop' killed by signal 15
Apr 20 09:45:36 B012526 gnome-session-binary[1029]: WARNING: Application 'org.gnome.SettingsDaemon.XSettings.desktop' killed by signal 15
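
One way to make sense of an excerpt like the one above is to pull out the power-management markers and any genuine out-of-memory kills. The following is a minimal sketch, not a definitive tool: the function name `scan_log` is hypothetical, and the OOM patterns assume the usual kernel wording ("invoked oom-killer", "Out of memory: Kill..."), which may vary between kernel versions.

```python
import re

# Matches e.g. "[ 1231.510751] PM: suspend entry (s2idle)" as seen in the excerpt.
SUSPEND_ENTRY = re.compile(r"\[\s*(\d+\.\d+)\] PM: suspend entry \((\w+)\)")
# Assumed wording of real OOM kills; "OOM killer disabled/enabled" does not match.
OOM_KILL = re.compile(r"invoked oom-killer|Out of memory: Kill")


def scan_log(lines):
    """Collect suspend entries (uptime in seconds, sleep state) and count OOM kills."""
    report = {"suspends": [], "oom_kills": 0}
    for line in lines:
        match = SUSPEND_ENTRY.search(line)
        if match:
            report["suspends"].append((float(match.group(1)), match.group(2)))
        elif OOM_KILL.search(line):
            report["oom_kills"] += 1
    return report
```

Run over the excerpt above, this reports a single suspend entry (`s2idle`) at about 1231 s of uptime, roughly 20 minutes after boot, and no OOM kills.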
F. Tusell

1 Answer

Apr 20 09:45:22 B012526 vmunix: [ 1246.399467] OOM killer enabled.

Oh no :(

The OOM killer is the part of the kernel that wakes up and starts killing processes when the system is running critically low on memory. I'm guessing the OOM killer decided to kill the SSH daemon, which would explain why you're getting kicked.

Start it back up and monitor memory usage until you figure out what's eating all the memory, then report back. dmesg (likely) isn't helpful enough on its own.
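
A minimal sketch of such monitoring, assuming a Linux `/proc/meminfo` with a `MemAvailable` field; the function names, the 60-second interval and the sample count are illustrative choices, not anything prescribed above.

```python
import time


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def log_memory(samples=30, interval=60, path="/proc/meminfo"):
    """Print a timestamped MemAvailable reading once per interval (hypothetical defaults)."""
    for _ in range(samples):
        with open(path) as fh:
            info = parse_meminfo(fh.read())
        print(time.strftime("%H:%M:%S"), "MemAvailable:", info.get("MemAvailable"), "kB", flush=True)
        time.sleep(interval)
```

Calling `log_memory()` with its output redirected to a file on disk would leave a record that can be read from the console after the machine becomes unreachable.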

Njinx
  • That doesn't look like the OOM killer was actually invoked. If I understand correctly, this is a very early-stage log and the OOM killer is temporarily disabled during startup. There are some comments about it just below the log line [in the source](https://github.com/torvalds/linux/blob/master/mm/oom_kill.c#L768). Maybe I misread. – Philip Couling Apr 26 '22 at 00:19
  • @Njinx, it is not just the ssh daemon that is killed; the machine also does not respond to ping or HTTP queries. This happens some 15-20 minutes after wake-on-lan. Thank you very much for looking into this, anyway. – F. Tusell Apr 26 '22 at 07:37
  • @Njinx, I guess the crux of the matter is why OOM is triggered when the machine is started by wake-on-lan and not when I start it ordinarily, by pressing the switch. – F. Tusell Apr 26 '22 at 07:42
  • @PhilipCouling Hmm, you might be right. I was looking at the bottom of the logs and saw that a bunch of processes were being killed and assumed it was OOM, but signal 15 is SIGTERM and I doubt OOM killer would send a SIGTERM. – Njinx May 04 '22 at 21:20
  • @F.Tusell Could you post your wake-on-lan configuration? – Njinx May 04 '22 at 21:21
  • @Njinx, sorry for the delay in answering. End of course, hectic times here. I do not remember having configured anything, other than enabling WOL (with ethtool). Is there anything else that needs to be done? – F. Tusell May 08 '22 at 11:04
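
Since the only wake-on-lan configuration mentioned in the thread is the ethtool setting, a quick way to report it is to read the `Wake-on` lines from `ethtool <interface>`. This is a sketch under assumptions: the helper names are hypothetical, the interface name `enp5s0f0` is taken from the syslog excerpt, and `ethtool` may need root privileges to show the `Wake-on` fields.

```python
import subprocess


def parse_wol(ethtool_output):
    """Extract the 'Supports Wake-on' and 'Wake-on' flags from ethtool's output."""
    result = {}
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Supports Wake-on", "Wake-on"):
            result[key] = value.strip()
    return result


def read_wol(iface):
    """Run `ethtool <iface>` and return its wake-on-lan flags."""
    completed = subprocess.run(
        ["ethtool", iface], capture_output=True, text=True, check=True
    )
    return parse_wol(completed.stdout)
```

For example, `read_wol("enp5s0f0")` returning `{'Supports Wake-on': 'pumbg', 'Wake-on': 'g'}` would mean wake on Magic Packet is enabled, while a `Wake-on` value of `d` would mean it is disabled.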