On several production platforms we have observed symptoms suggesting that the time-of-day clock periodically jumps forward or backward. The jumps are typically about 1 second, usually cancel out (a jump forward followed very shortly by a jump backward), and occur about 50 times per day. They are most noticeable during peak application usage and during periods of heavy disk I/O, such as daily backups. These jumps are affecting our soft real-time application.
The systems are Oracle Netra X4250 and Netra X4270 servers running SLES 11 SP2 with the 3.0.58-0.6.6-default kernel.
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
We have disabled NTP, but that has had no effect on the jumps. Are there tools that measure time-of-day clock drift? How can we avoid this?
These are production platforms, and we cannot recreate the issue in our labs, so my ability to experiment is limited. If left to my own devices, I'll write a tool to measure drift, and perhaps experiment with an HPET clocksource.
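For reference, the kind of measurement tool I have in mind could look roughly like the sketch below. It samples the wall clock (`date`) alongside the kernel's uptime counter (`/proc/uptime`) and reports whenever the difference between the two changes by more than a threshold. The function names, `THRESHOLD`, and `INTERVAL` are my own placeholders, and it assumes GNU `date` and `sleep`. One limitation to keep in mind: both readings are ultimately driven by the same clocksource, so this would only catch steps applied to the wall clock, not a glitch in the TSC that shifts both readings together.

```shell
#!/bin/bash
# Sketch: flag steps of the wall clock relative to the kernel uptime counter.
# THRESHOLD and INTERVAL are illustrative values, not measured ones.
THRESHOLD=${THRESHOLD:-0.5}   # change in offset (seconds) that counts as a jump
INTERVAL=${INTERVAL:-0.1}     # seconds between samples (GNU sleep accepts fractions)

# Print one "wallclock uptime" pair.
# date +%s.%N requires GNU date; /proc/uptime has 10 ms resolution.
sample() {
    echo "$(date +%s.%N) $(cut -d' ' -f1 /proc/uptime)"
}

# Read "wallclock uptime" pairs on stdin. Print a line whenever the offset
# (wallclock - uptime) differs from the previous sample's offset by more
# than the threshold given as $1.
detect_jumps() {
    awk -v thr="$1" '
        { off = $1 - $2 }
        NR > 1 {
            d = off - prev
            if (d > thr || d < -thr) {
                printf "jump of %+.3f s at wallclock %.3f\n", d, $1
                fflush()
            }
        }
        { prev = off }
    '
}

# Sample until interrupted and report jumps as they are seen.
watch_clock() {
    while true; do
        sample
        sleep "$INTERVAL"
    done | detect_jumps "$THRESHOLD"
}
```

Sourcing the file and calling `watch_clock` would run it until interrupted. Under heavy load the delay between the two reads in `sample` adds noise to the offset, so the threshold probably needs to stay well above normal scheduling latency to avoid false positives.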