
I'm running a headless, networkless Raspberry Pi Zero to control some hardware**, and occasionally the time/date gets corrupted, and maybe other things too. While it's not a perfect fix, rebooting the machine automatically would be an improvement over the status quo. Normally cron would be the obvious choice here, but I'm fairly sure that cron is not robust to the date getting corrupted. By corrupted I mean it will randomly go back in time four and a half years. I suspect something is defective in the RTC chip I've added.

So what can I do that's relatively robust to intermittent clock errors? Ideally the computer would be rebooted every 30 days. Having a buggy clock is obviously going to make that target harder, but precision is not that important, as long as it's not rebooting every week, or never.

I could roll my own solution by having a cron script that adds a ! to a file somewhere at midnight every night, and then rebooting when wc reports more than 30 characters, or something like that, but it seems like there must be a standard solution.

** in an extreme case of overkill, this fully functional unix computer that probably equals a mid-90s SGI O2 is operating a "smart" cat door that locks or unlocks depending on light levels outside.

Al Ro
  • If it always goes back in time, you could just save the current date as epoch time in a dummy file and regularly check whether the current date is later than that (if so, update the file) or whether you need to reboot. – FelixJN Apr 21 '23 at 20:39
  • there's probably a little clock jitter due to the RTC, so I'd probably want a threshold like more than 30 days. Is there a succinct way to test if the time is +/- 30 days from the date on a file? – Al Ro Apr 21 '23 at 20:43
  • Well ... Epoch time is in seconds, so check for the number to be within `+/- 30*24*3600`. – FelixJN Apr 21 '23 at 20:44
  • Just a snide remark: the Raspberry Pi Zero benchmarks at a Dhrystone of ca. 1240 MIPS; the fastest SGI O2 I could find benchmarked online (sporting the late R12000 model at 300 MHz) gets roughly 651 MIPS. So, your Raspberry Pi is about two times as fast as what you thought was "equal". – Marcus Müller Apr 21 '23 at 20:53
  • Thanks for the (much needed) fact check, but here's my snide comeback: most people probably bought the O2 for the GPU, not the CPU, so there's a decent chance the O2 can push more polygons per second. – Al Ro Apr 21 '23 at 22:47
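
The file-based check discussed in the comments above could look roughly like this in Python. It is only a sketch: the function name `check_stamp`, the stamp file location, and the return values are hypothetical, and the ±30-day tolerance is the one mentioned in the comments.

```python
import os
import tempfile
import time

THIRTY_DAYS = 30 * 24 * 3600  # tolerance in seconds


def check_stamp(path, now, tolerance=THIRTY_DAYS):
    """Compare `now` (epoch seconds) with the epoch value stored in `path`.

    Returns "ok" and refreshes the file when `now` is within +/- tolerance
    of the stored value, or when no usable value is stored yet. Otherwise
    returns "suspect" and leaves the file untouched.
    """
    try:
        with open(path) as f:
            stored = float(f.read().strip())
    except (FileNotFoundError, ValueError):
        stored = None
    if stored is not None and abs(now - stored) > tolerance:
        return "suspect"
    with open(path, "w") as f:
        f.write(str(now))
    return "ok"


# Demonstration with a throwaway file (hypothetical location).
stamp = os.path.join(tempfile.mkdtemp(), "last_seen")
now = time.time()
print(check_stamp(stamp, now))                          # first run: ok
print(check_stamp(stamp, now + 3600))                   # an hour later: ok
print(check_stamp(stamp, now - 4.5 * 365 * 24 * 3600))  # 4.5 years back: suspect
```

This assumes the check runs much more often than every 30 days, so that a healthy clock never drifts outside the tolerance between two checks.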

1 Answer

How can you detect large clock jumps?

Linux maintains at least two internal system clocks:

  • The "real time" clock (`CLOCK_REALTIME`, the system wall clock, not the hardware RTC), commonly kept in sync with an NTP server
  • The "monotonic" clock (`CLOCK_MONOTONIC`), which never goes backwards and just ticks forwards

A jump back of approximately 4 years is very likely to be caused by something resetting the real time clock. The monotonic clock should never jump backwards; that's the whole point of it.

So if you monitor the monotonic clock compared to the system clock, you can discover times when the system clock has suddenly jumped back a very long way (few clocks should jump back more than a month).

If you have Python, you can get the difference in seconds trivially:

python3 << EOF
import time
print(time.time() - time.monotonic())
EOF

This gives a number (in seconds) that's meaningless in its own right, but any large change implies a system clock jump. A change of -2592000 (30 days in seconds) or lower indicates a backward jump of at least 30 days.
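
A minimal sketch of that comparison, assuming the offset is sampled more than once and the previous sample is kept around. The names `wall_minus_monotonic`, `clock_jumped_back`, and `THRESHOLD_SECONDS` are illustrative, not part of any library.

```python
import time

# 30 days in seconds, the threshold used in the text above.
THRESHOLD_SECONDS = 30 * 24 * 3600  # 2592000


def wall_minus_monotonic():
    """Offset between the wall clock and the monotonic clock, in seconds."""
    return time.time() - time.monotonic()


def clock_jumped_back(previous_offset, current_offset,
                      threshold=THRESHOLD_SECONDS):
    """Return True if the wall clock moved back by at least `threshold`
    seconds relative to the monotonic clock between the two samples."""
    return (current_offset - previous_offset) <= -threshold


baseline = wall_minus_monotonic()
# Simulate a wall clock that has gone back about 4.5 years.
simulated = baseline - 4.5 * 365 * 24 * 3600
print(clock_jumped_back(baseline, simulated))  # True
print(clock_jumped_back(baseline, baseline))   # False
```

The monotonic clock's reference point is undefined and in practice changes across reboots, so offsets are only comparable within a single uptime.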

Commands like `sleep` should be unaffected by clock jumps like this, so you should be able to run this check periodically.
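
One way to run the check periodically is a small loop around `time.sleep`. This is only a sketch: `watch_clock`, its parameters, and the `on_jump` callback are hypothetical names, and what to do on a jump (rebooting, logging, resetting the clock) is left to the caller. The clocks are passed in as parameters so the loop can be exercised without waiting.

```python
import time


def watch_clock(on_jump, interval=3600.0, threshold=30 * 24 * 3600,
                wall=time.time, mono=time.monotonic, sleep=time.sleep,
                max_checks=None):
    """Poll the wall-minus-monotonic offset and call on_jump(change) when
    the wall clock has moved back by at least `threshold` seconds.

    Returns the number of jumps seen. max_checks=None loops forever.
    """
    previous = wall() - mono()
    jumps = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval)
        current = wall() - mono()
        change = current - previous
        if change <= -threshold:
            jumps += 1
            on_jump(change)
        # Track the latest offset so each jump is reported only once.
        previous = current
        checks += 1
    return jumps


# Dry run with fake clocks: the wall clock falls back 4 years at the 2nd check.
FOUR_YEARS = 4 * 365 * 24 * 3600
fake_wall = iter([1000.0, 1010.0, 1020.0 - FOUR_YEARS, 1030.0 - FOUR_YEARS])
fake_mono = iter([0.0, 10.0, 20.0, 30.0])
seen = []
count = watch_clock(seen.append, interval=0,
                    wall=lambda: next(fake_wall),
                    mono=lambda: next(fake_mono),
                    sleep=lambda seconds: None,
                    max_checks=3)
print(count, seen)
```

In real use the defaults would be kept and `on_jump` would trigger whatever recovery is wanted; the dry run only shows that a single backward jump is reported once.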

Diagnosing the real problem

I went down a very deep rabbit hole with a near identical sounding issue.

It's a month of my professional life I won't be getting back. So I'll highlight this answer. It may help you fix the root cause.

It's quite unlikely your system clock is spontaneously corrupting itself.

The system clock is maintained by software not hardware. The RTC, if your device has one, is normally just read on startup to recover after reboot. Unless you have a quirky ntpd configuration, you can safely rule out the RTC.

Typically the only thing spontaneously changing your system clock is either an NTP daemon or an SNTP daemon. My money would be on an SNTP daemon, because NTP daemons check multiple servers and so are pretty fault tolerant.

Some home routers do something very dirty. They act as an NTP or SNTP server, but when the router reboots without an RTC, the router software just uses a fixed date/time (e.g. a software patch build date?). Despite the date/time clearly being wrong, their NTP server carries on issuing the wrong date, and only issues the right date once the router itself has updated via NTP or SNTP.

In the case of the BT Home Hub I dealt with a few years back, the router reconfigured every device it could using DHCP. It even declared itself a "Stratum 1" NTP server, meaning it claimed to be directly connected to a time source (atomic clock).

Try rebooting your router a few times to see if this triggers your IoT device's time to jump.

Philip Couling
  • Hmmm I just re-read the "networkless". Maybe not the same root cause as mine. I'll leave that section in just in case it helps future readers. – Philip Couling Apr 21 '23 at 22:50
  • That's a great theory but this device has no network hardware (pi zero has no wifi or Ethernet built in). I only ever connect to it over serial console. It does have a plug-in RTC because the pi line has no RTC built in. – Al Ro Apr 21 '23 at 23:47
  • @AlRo Yeah I realised. Check the system logs anyway. As per [this answer](https://unix.stackexchange.com/a/549487/20140), the RTC should not be consulted, except during startup. It is possible to configure your NTP daemon to read one (if one is installed) but that's more complex than it sounds. I don't think it's ever done by default. – Philip Couling Apr 21 '23 at 23:57