6

On an Orange Pi Zero running a Raspbian server, the watchdog is very easy to use: just run the command echo 1 > /dev/watchdog as root. Once this command has been executed, the system reboots after a timeout unless the command is repeated, so I need to keep repeating it at a regular interval to keep the system running. This can be done by having root's cron run the following script at boot:

#!/bin/bash
while true; do
    echo 1 > /dev/watchdog
    sleep 5
done

This script works fine on the Orange Pi Zero... However, on my desktop computer running Ubuntu 18.04 the command echo 1 > /dev/watchdog doesn't work at all. Is it possible to activate the watchdog on any device running Linux?

Rafael Muynarsk

4 Answers

6

There are two types of watchdog: hardware and software. On the Orange Pi the SoC provides a hardware watchdog. Once initialised, it needs to be pinged every so often; otherwise it performs a board reset.

However, not many desktops have hardware watchdogs, so the kernel provides a software version. In that case the kernel itself keeps track of the timeout and forces a reboot when it expires. This isn't as good as a hardware watchdog, because if the kernel itself breaks then nothing will trigger the reset. But it works.

The software watchdog can be initialised by loading the softdog module:

% modprobe softdog
% dmesg | tail -1
[  120.573945] softdog: Software Watchdog Timer: 0.08 initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)

We can see this has a 60 second timeout by default.

If I then do

% echo > /dev/watchdog
% dmesg | tail -1
[  154.514976] watchdog: watchdog0: watchdog did not stop!

We can see the watchdog is now armed: the device was closed without being told to stop, so the timer keeps running.

I then leave the machine idle for a minute and on the console I see:

[  214.624112] softdog: Initiating system reboot

and the OS reboots.
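To confirm which watchdog driver is registered and what timeout it uses, the sysfs entries can be read instead of dmesg. This is only a sketch: the function name show_watchdogs is made up for illustration, and it assumes the kernel exposes the identity and timeout attributes under /sys/class/watchdog, which depends on how the kernel was built.

```shell
#!/bin/bash
# Print driver identity and timeout (in seconds) for every registered watchdog.
# The sysfs root is a parameter so the function can be pointed at a test tree.
show_watchdogs() {
    local root="${1:-/sys/class/watchdog}"
    local dev identity timeout found=0
    for dev in "$root"/watchdog*; do
        [ -d "$dev" ] || continue          # the glob did not match anything
        found=1
        identity="unknown"
        timeout="unknown"
        # These files are an assumption: they exist only if the kernel exposes them.
        [ -r "$dev/identity" ] && identity=$(cat "$dev/identity")
        [ -r "$dev/timeout" ] && timeout=$(cat "$dev/timeout")
        echo "${dev##*/}: identity=$identity timeout=$timeout"
    done
    [ "$found" -eq 1 ] || echo "no watchdog device registered"
}
```

Calling show_watchdogs with no argument reads the real sysfs tree; after loading softdog it should list one entry with a timeout of 60.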

Stephen Harris
  • 1
    Many desktops, laptops and servers include hardware watchdogs, often in their Super-I/O chips, or in their PCH; see the various “wdt” modules in the kernel. – Stephen Kitt Dec 31 '18 at 20:39
  • 1
    @StephenHarris Can I assume that when the command `echo > /dev/watchdog` doesn't work at first it's because there's no hardware watchdog for that device? Or there's still the possibility that I need to activate the hardware watchdog before using it? – Rafael Muynarsk Dec 31 '18 at 20:45
  • @StephenKitt It's possible all my machines are too old... I haven't done a hardware refresh in over a decade! – Stephen Harris Dec 31 '18 at 20:46
  • 2
    If `/dev/watchdog` doesn't exist (or isn't a character device) and you know your hardware has a watchdog, then you may need to `modprobe` the relevant hardware driver... Modern distros will try to load this automatically, but you may have an edge case. None of my machines have a `/dev/watchdog` entry :-( – Stephen Harris Dec 31 '18 at 20:48
  • 4
    “Many”, not “all” ;-). My last two personal PCs (2003, 2013) have supported `iTCO_wdt`, and all my work PCs for the last decade or so have supported one driver or another, but it isn’t loaded automatically; as you mention, in many cases the relevant module needs to be loaded manually, and sometimes the firmware setup has to be configured appropriately too (and sometimes a jumper must be moved on the main board). – Stephen Kitt Dec 31 '18 at 21:33
  • @StephenKitt What do you mean exactly when you say "firmware setup"? – Rafael Muynarsk Dec 31 '18 at 21:37
  • 1
    @Rafael the BIOS or UEFI setup on a PC. – Stephen Kitt Dec 31 '18 at 21:38
  • @StephenKitt, do you know how to check if my PC has hardware watchdog? – Micheal XIV Mar 25 '20 at 08:47
  • @MichealXIV you can try loading the various watchdog modules, or look at your motherboard’s documentation. – Stephen Kitt Mar 25 '20 at 09:00
  • @StephenKitt: It seems that my PC does not have a hardware watchdog. So I need to try to use softdog as your comment. But it is not useful when kernel is hung. Do you have any idea for this case? – Micheal XIV Mar 25 '20 at 09:34
  • @MichaelXIV unfortunately that’s the main scenario where you need a hardware watchdog... – Stephen Kitt Mar 25 '20 at 10:22
  • @MichealXIV you could use a USB watchdog device and connect it to the reset pins of your motherboard. – vadipp Jan 05 '22 at 15:33
4

On a modern Linux operating system that uses systemd you can configure systemd to interact with the hardware watchdog on your behalf, rather than doing it yourself or using a separate user-space daemon.

You can do that by setting a (positive) RuntimeWatchdogSec value in the systemd configuration file, /etc/systemd/system.conf.
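As a sketch of that setting, the function below writes it into a drop-in file rather than editing /etc/systemd/system.conf directly. The drop-in directory /etc/systemd/system.conf.d is a standard systemd location, but the function name, the file name 10-watchdog.conf and the 30s default are assumptions chosen for illustration.

```shell
#!/bin/bash
# Write a systemd manager drop-in that enables the runtime watchdog.
# Arguments: target directory, timeout value. Both defaults are illustrative.
write_watchdog_dropin() {
    local dir="${1:-/etc/systemd/system.conf.d}"
    local timeout="${2:-30s}"
    mkdir -p "$dir" || return 1
    # RuntimeWatchdogSec belongs to the [Manager] section.
    printf '[Manager]\nRuntimeWatchdogSec=%s\n' "$timeout" > "$dir/10-watchdog.conf"
}
```

Run as root with no arguments, this would create /etc/systemd/system.conf.d/10-watchdog.conf. The new value should take effect after a reboot, or possibly after systemctl daemon-reexec.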

Raedwald
1

The I/O redirection closes the watchdog file handle after writing the 1. Depending on how the watchdog device is configured, closing the file handle can also disable the watchdog.

Try

exec 3>/dev/watchdog
echo 1 >&3

This will keep the watchdog device open in the current shell, so the timer will not be stopped.
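The same idea can be combined with the loop from the question. The sketch below holds the device open on fd 3 for the whole loop, using a block redirection instead of exec so the descriptor does not leak into the caller. The function name and its parameters are hypothetical. It also writes the character V before closing, which many drivers treat as a request to stop the timer; whether that is honoured depends on the driver and on the nowayout setting.

```shell
#!/bin/bash
# Keep a watchdog device open and write to it at a fixed interval.
# Arguments: device path, interval in seconds, number of writes (0 = forever).
feed_watchdog() {
    local dev="${1:-/dev/watchdog}"
    local interval="${2:-5}"
    local count="${3:-0}"
    local i=0
    {
        while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
            echo 1 >&3                 # any write resets the timer
            i=$((i + 1))
            sleep "$interval"
        done
        # Assumption: the driver honours 'V' as a stop request before close.
        printf 'V' >&3
    } 3>"$dev"                         # fd 3 stays open for the whole block
}
```

With the defaults (feed_watchdog run as root) the loop never ends, so the final write is only reached when a count is given.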

Most people run a dedicated watchdog daemon rather than using cron; this daemon runs a list of checks before resetting the timer, so the machine also reboots if tests fail. This could be used to verify that a database service actually processes queries, while regular service monitoring would only verify that the process is running.
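The check-before-reset behaviour can be sketched in a few lines of shell. This is not how the watchdog daemon is implemented, only an illustration of the principle; feed_if_healthy and check_database are hypothetical names, and the caller is assumed to have opened the watchdog device on fd 3 beforehand.

```shell
#!/bin/bash
# Write one keepalive to fd 3, but only if the given check command succeeds.
# The caller is expected to have opened the watchdog device on fd 3 already.
feed_if_healthy() {
    if "$@"; then
        echo 1 >&3
    else
        echo "health check failed: $*" >&2
        return 1
    fi
}

# Example use (check_database is a hypothetical command, not a real tool):
#   exec 3>/dev/watchdog
#   while feed_if_healthy check_database; do sleep 5; done
#   # The loop has ended: fd 3 stays open but unwritten, so the timer can expire.
```

Leaving the descriptor open after the loop matters, since closing it could disable the watchdog on some configurations.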

Simon Richter
1

It depends on the hardware. With a modern enough Linux kernel and an Intel CPU, you should be able to do the following if you run Ubuntu or some other Debian variant:

  1. sudo apt install watchdog

  2. sudo nano -w /etc/default/watchdog and set the correct module, such as watchdog_module="iTCO_wdt" (the correct driver name depends on your hardware, but this should be good enough for Intel CPUs manufactured during the last 10 years). When the watchdog service is started, it loads this kernel module, which makes the /dev/watchdog device appear in the system.

  3. sudo nano -w /etc/watchdog.conf and uncomment the line watchdog-device = /dev/watchdog, or just add that line to the file. The end result should match this:

     $ grep -vE '^(#|$)' /etc/watchdog.conf
     watchdog-device = /dev/watchdog
     realtime        = yes
     priority        = 1
    
  4. sudo systemctl enable --now watchdog

All the possible watchdog driver modules can be listed with the command

ls "/lib/modules/$(uname -r)/kernel/drivers/watchdog"

If you don't have a clue which one to use, you can test them one by one. For example, to test the driver sp5100_tco.ko, run sudo modprobe sp5100_tco and then sudo wd_identify to see whether your hardware is supported by that driver. If it isn't, remove the driver with sudo modprobe -r sp5100_tco and retry with another. Note that wd_identify cannot be used while the watchdog process is connected to the hardware, so you cannot use it after enabling the watchdog service.
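To make the one-by-one testing less tedious, the module names can be derived from the driver files. A minimal sketch, assuming the files end in .ko with an optional compression suffix such as .ko.xz; the function name list_watchdog_modules is made up for illustration.

```shell
#!/bin/bash
# Print the module name for every driver file in a watchdog driver directory.
# The directory is a parameter; the default is the path used with ls above.
list_watchdog_modules() {
    local dir="${1:-/lib/modules/$(uname -r)/kernel/drivers/watchdog}"
    local file name
    for file in "$dir"/*.ko*; do
        [ -e "$file" ] || continue     # directory missing or empty
        name="${file##*/}"             # drop the directory part
        echo "${name%%.ko*}"           # drop .ko and any compression suffix
    done
}

# Example use, one candidate at a time (needs root; try with care):
#   for m in $(list_watchdog_modules); do
#       sudo modprobe "$m" && sudo wd_identify
#       sudo modprobe -r "$m"
#   done
```

Loading every driver blindly is an assumption about what is safe on your machine, so it may be wiser to try only the modules that plausibly match your chipset.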

To test the watchdog hardware, you can cause an artificial failure by opening the device and never writing anything to it. For example, before enabling the watchdog service in the last step, you can run sudo cat /dev/watchdog and the system will reset automatically in about 60 seconds. This works because the driver starts the watchdog timer when the file is opened, and the only way to reset the timer is to write something to the device. When you run cat on the device file, the file is opened, the cat process stalls trying to read it, and the hardware reset happens when the timer expires (which should be 60 seconds by default). Closing the file may stop the timer instead of causing a reboot, unless the driver or kernel has been configured with the non-default nowayout behaviour, which keeps the timer running once the device has been opened. It's a good idea to save all work and sync the filesystem before attempting this!

For details about the kernel watchdog driver, see the official kernel documentation.

Mikko Rantalainen
  • You can also use `kill -STOP $pid_of_the_watchdog_process` to test if your system correctly reboots if the actual watchdog process ever hangs. – Mikko Rantalainen Nov 30 '22 at 20:32