I just checked my dmesg because my server has started crashing now and then. There I found the following line:

perf interrupt took too long (2528 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

It appears a couple of times.
I remember perf being a performance analysis tool, and I don't remember installing it. So I checked:

~$ dpkg -l *perf*
dpkg-query: no packages found matching *perf*

My questions:

  • Is this a sign of an oncoming storm? This line appears a few times, and then there are stack dumps starting with "rcu_sched detected stalls".
  • Where do these come from?
Jeff Schaller
Martin B.

3 Answers

This message comes from the Linux kernel. More precisely, it comes from the perf_duration_warn function in linux/kernel/events/core.c:

static void perf_duration_warn(struct irq_work *w)
{
    printk_ratelimited(KERN_INFO
        "perf: interrupt took too long (%lld > %lld), lowering "
        "kernel.perf_event_max_sample_rate to %d\n",
        __report_avg, __report_allowed,
        sysctl_perf_event_sample_rate);
}
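The two numbers in the message can be reproduced with simple arithmetic. A minimal sketch, assuming the per-sample time budget is the sampling period (1e9 ns divided by kernel.perf_event_max_sample_rate) scaled by kernel.perf_cpu_time_max_percent, with assumed defaults of 100000 samples/s and 25 %; the helper name perf_allowed_ns is made up for illustration:

```shell
# Hypothetical helper: compute the per-sample time budget (in ns) that the
# "(avg > allowed)" part of the warning compares against.
# Assumption: allowed = (1e9 / max_sample_rate) * cpu_time_max_percent / 100.
perf_allowed_ns() {
    rate=$1     # kernel.perf_event_max_sample_rate, samples per second
    percent=$2  # kernel.perf_cpu_time_max_percent
    period_ns=$((1000000000 / rate))
    echo $((period_ns * percent / 100))
}

# Assumed defaults: 100000 samples/s and 25 % give 2500 ns, the limit in the log line.
perf_allowed_ns 100000 25

# The live values can be read on the affected machine:
#   cat /proc/sys/kernel/perf_event_max_sample_rate /proc/sys/kernel/perf_cpu_time_max_percent
```

Under those assumptions, an average handler time of 2528 ns just exceeds the 2500 ns budget, which is why the kernel lowers the sample rate.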

I don't know precisely what you mean by:

Is this a sign of an oncoming storm?

but I suspect a problem with one of your devices.

P.S.: If you read carefully, you will see that the message in the code is "perf: interrupt took too long", but yours is "perf interrupt took too long". The colon was added in kernel version 4.6.
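Because of that colon, a log search should accept both spellings. A minimal sketch for checking whether the perf warnings show up shortly before the rcu_sched stall reports; the function name is hypothetical, and the RCU pattern assumes the stall lines contain "rcu_sched" followed somewhere by "detected stall":

```shell
# Hypothetical filter: keep perf throttling warnings (with or without the
# colon) and rcu_sched stall reports, in their original log order.
perf_and_rcu_events() {
    grep -E 'perf:? interrupt took too long|rcu_sched.*detected stall'
}

# Example on literal input (no kernel log needed); prints the first and third line:
printf '%s\n' \
    'perf interrupt took too long (2528 > 2500), lowering kernel.perf_event_max_sample_rate to 50000' \
    'eth0: link up' \
    'INFO: rcu_sched detected stalls on CPUs/tasks:' |
    perf_and_rcu_events

# On the affected server (reading the ring buffer may require root):
#   dmesg -T | perf_and_rcu_events
```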

nickcrabtree
Ortomala Lokni
  • I mean: does the CPU stall that happens later announce itself by prolonging the perf interrupt duration? – Martin B. May 02 '17 at 19:50
  • Difficult to say. Try to investigate by booting the system in [rescue mode](https://www.debian.org/releases/stable/amd64/ch08s07.html.en). – Ortomala Lokni May 02 '17 at 20:00

I've had a similar message for some time now on my desktop system. It shows up after one or sometimes several cores stall in uninterruptible disk I/O (state D in ps) for minutes or longer. I suspect a race condition in I/O scheduling that leads to a deadlock, but I don't know how to debug this. Switching the affected disk from the CFQ scheduler to the deadline scheduler seems to help:

# echo deadline > /sys/block/sdX/queue/scheduler 

I have observed short pauses in scheduling with that, but the second queue of the deadline scheduler seems to mitigate the long stall.
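To confirm which scheduler is actually active, the scheduler file can be read back: the kernel lists the available schedulers and puts the active one in brackets. A minimal sketch that extracts the bracketed name; active_scheduler is a hypothetical helper, and sdX is the same placeholder as in the echo command:

```shell
# Hypothetical helper: read the contents of /sys/block/<dev>/queue/scheduler
# on stdin and print the active scheduler, i.e. the name inside [brackets].
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example with a literal string (no block device needed); prints: cfq
echo 'noop deadline [cfq]' | active_scheduler

# On a real system:
#   active_scheduler < /sys/block/sdX/queue/scheduler
```

On newer multi-queue kernels the list looks different (for example `[mq-deadline] kyber bfq none`), but the bracket convention is the same.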

If somebody could shed some more light on this, I'd also appreciate it.

Edit

I don't know whether the rcu_sched errors/warnings are related, but it's quite possible. I don't get them, possibly because my kernel is configured differently.

When one core is stalled, this is what I see with ps:

$ ps axu | grep ' D'
dirk      4720 13.0  5.1 1615772 842444 pts/3  Dl+  07:27  24:54 iceweasel -P default

for the process that was doing the I/O. D means "uninterruptible sleep (usually I/O)" according to man ps.
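Since `grep ' D'` can also match text elsewhere on the line, a slightly stricter sketch tests only the STAT column. It assumes procps ps, whose -o option accepts pid, stat, wchan and args; the wchan column shows the kernel function a process is sleeping in, which may help with debugging. d_state_filter is a hypothetical name, and the sample lines below are made up:

```shell
# Hypothetical filter: keep the header plus processes whose STAT starts with D
# (uninterruptible sleep). Fields are whitespace-separated, so $2 is STAT.
d_state_filter() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example on made-up input shaped like "ps -eo pid,stat,wchan:32,args" output;
# prints the header and the iceweasel line:
printf '%s\n' \
    '  PID STAT WCHAN        COMMAND' \
    ' 4720 Dl+  io_schedule  iceweasel -P default' \
    '    1 Ss   ep_poll      /sbin/init' |
    d_state_filter

# On a live system:
#   ps -eo pid,stat,wchan:32,args | d_state_filter
```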

dirkt
  • I recently had some other problems where support told me to change my `queue/scheduler` to `noop`. Could this be related? – Martin B. May 03 '17 at 07:17
  • Maybe, depends on what the other problems were, what you told support, and what support said exactly. Links? – dirkt May 03 '17 at 08:02
  • I just asked whether the stalling detected by `rcu_sched` was a problem with my system or with the node. Support replied by sending me this link: https://www.netcup-wiki.de/wiki/KVM_Tuning ; I only made the temporary scheduler change. What do you mean by "D in ps"? – Martin B. May 03 '17 at 08:38
  • Can you tell me how to make this configuration persist across reboots, please? – Setop Oct 18 '17 at 10:06
  • @Setop Use whatever `sysctl` facility your distribution uses, e.g. `/etc/sysctl.d/`. Though I have since found out that while the deadline scheduler helps, there are still hangups. Upgrading to a newer kernel didn't change anything. Did you run into the same problem? – dirkt Oct 18 '17 at 15:03
  • @dirkt, same here. It helps a bit, but I still get kernel freezes :( – Setop Oct 18 '17 at 21:10
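On the persistence question: /sys/block/*/queue/scheduler is a sysfs attribute, while sysctl and `/etc/sysctl.d/` only cover keys under /proc/sys, so a udev rule is a common way to reapply the setting at boot. A minimal sketch, assuming SATA/SCSI disks named sd[a-z]; the rule file name and the write_rule helper are illustrative choices, and on multi-queue kernels the scheduler is called mq-deadline instead of deadline:

```shell
# Illustrative udev rule: whenever a disk named sd[a-z] is added or changes,
# set its I/O scheduler to deadline (use mq-deadline on multi-queue kernels).
RULE='ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"'

# Hypothetical helper: write the rule to the file given as $1.
write_rule() {
    printf '%s\n' "$RULE" > "$1"
}

# As root, install the rule and apply it without rebooting:
#   write_rule /etc/udev/rules.d/60-io-scheduler.rules
#   udevadm control --reload-rules
#   udevadm trigger --subsystem-match=block
```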

You can regularly induce this error if you encrypt the swap space.

Regularly.

dm_crypt is the culprit.

There is no loss of information, though.
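To check whether a system is in that situation, i.e. whether swap sits on a dm-crypt mapping, lsblk can list device types and mount points; swap areas show up as [SWAP]. A minimal sketch, assuming util-linux lsblk with raw, headerless output; encrypted_swap is a hypothetical name and the sample device names are made up:

```shell
# Hypothetical filter: read "NAME TYPE MOUNTPOINT" lines and print the names
# of dm-crypt devices that are in use as swap.
encrypted_swap() {
    awk '$2 == "crypt" && $3 == "[SWAP]" { print $1 }'
}

# Example on made-up input; prints: cryptswap
printf '%s\n' 'sda disk ' 'sda2 part ' 'cryptswap crypt [SWAP]' 'cryptroot crypt /' |
    encrypted_swap

# On a live system:
#   lsblk -rno NAME,TYPE,MOUNTPOINT | encrypted_swap
```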

John Greene