
I have a system that becomes unresponsive for anywhere from a few seconds to a couple of minutes. The only messages I see in the logs look like this:

Sep 16 18:07:33 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:07:50 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:07:58 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:08:08 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:08:17 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:08:57 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:09:04 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:09:11 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:09:25 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:09:58 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:10:05 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:10:12 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:10:24 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:10:31 server kernel: igb 0000:01:00.3: exceed max 2 second
Sep 16 18:10:38 server kernel: igb 0000:01:00.3: exceed max 2 second

I'm not sure where to start troubleshooting this. Could these messages be related to the system becoming unresponsive?

MountainX
  • Start by copy-pasting the error message into Google. The first hit is https://sourceforge.net/p/e1000/bugs/574/?page=1 – Ipor Sircer Sep 16 '18 at 23:35
  • @IporSircer https://meta.stackexchange.com/a/8726/227714 and besides, I already googled and did not understand what I was reading, and I still don't. I came here for an answer, not to be redirected back to where I already was. – MountainX Sep 16 '18 at 23:45
  • 1
    It is a known bug and its status is WONTFIX. Live with it or replace your hardware. – Ipor Sircer Sep 16 '18 at 23:50
  • @IporSircer thank you. I have had this same hardware for 2 years and the problem just started today. When you say "replace your hardware" does that imply the make and model are not compatible or that this specific piece of hardware is failing? – MountainX Sep 17 '18 at 00:11
  • `igb` is the driver for the Ethernet adapter, so to test whether this is related, disable Ethernet and/or switch to a different connection (e.g. WLAN) and see if the problem persists. I also have the problem that my system becomes unresponsive; for me it seems related to the disk caches (processes end up in D state for too long), so dropping caches (`echo 3 > /proc/sys/vm/drop_caches` as root) helps. No idea if it will help in your case. – dirkt Sep 17 '18 at 11:17

3 Answers


I've been seeing this problem (together with even more "NIC Link is Down" and "NIC Link is Up" messages) since upgrading from Devuan Beowulf (kernel 4.19) to Chimaera (kernel 5.10). The NIC is `00:14.0 Ethernet controller: Intel Corporation Ethernet Connection I354 (rev 03)` on a Supermicro A1SRi-2558F board.
The network interface where it mostly happened is connected to a FRITZ!Box 6660 Cable router running FRITZ!OS 07.29 (the Devuan machine with the Intel NIC acts as a second router/firewall behind the provider-controlled FritzBox).

The issue usually happened under load, such as when running a speed test, but also (less frequently) under lighter load, such as video conferences.

What appears to fix both issues ("exceed max 2 second" and the link going down for a few seconds) is disabling EEE (Energy-Efficient Ethernet, a power-saving feature) on the NIC:
ethtool --set-eee eth1 eee off
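A small sketch of the same fix, wrapped so you can see the EEE state before and after the change. `--show-eee` and `--set-eee ... eee off` are standard ethtool options; the `disable_eee` function and the `ETHTOOL` override (handy for a dry run with `echo`) are hypothetical helpers, and `eth1` is just the interface name used in the answer.

```shell
#!/bin/sh
# Hypothetical helper: turn off Energy-Efficient Ethernet (EEE) on one
# interface and show the EEE state before and after. Run as root.
# The ethtool setting is not persistent; it is lost on reboot.
# ETHTOOL may be set to another command (e.g. echo) for a dry run.
disable_eee() {
    iface=${1:?usage: disable_eee IFACE}
    tool=${ETHTOOL:-ethtool}
    "$tool" --show-eee "$iface"         # current EEE state
    "$tool" --set-eee "$iface" eee off  # disable EEE on this NIC
    "$tool" --show-eee "$iface"         # confirm it now reports disabled
}
```

For example, source the file as root and run `disable_eee eth1`. How to reapply it at every boot depends on how your distribution configures interfaces, so that part is left out of this sketch.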

In case this answer is too late for the original poster, I hope it's at least helpful for others who find this via a search engine (only to find a comment telling them to google the issue). I haven't found this particular solution anywhere else.


In my case it was a faulty network cable. Also check that the cable is seated firmly in the network socket. After I replaced the cable, the problem was resolved.
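Not part of the answer itself, but one way to gather evidence before swapping cables is to read the kernel's per-interface counters in sysfs. A `carrier_changes` count that keeps rising, or a growing `rx_crc_errors`, is consistent with a bad or loosely seated cable, though not proof of one. The `link_health` function and the `SYSFS_NET` override are hypothetical, and not every driver exposes every counter.

```shell
#!/bin/sh
# Hypothetical helper: print link counters that often climb when a cable is
# bad or loosely seated. Reads per-interface files under /sys/class/net;
# SYSFS_NET can point elsewhere, which is only useful for testing.
link_health() {
    iface=${1:?usage: link_health IFACE}
    dir=${SYSFS_NET:-/sys/class/net}/$iface
    for f in carrier_changes statistics/rx_crc_errors statistics/rx_errors; do
        # Skip counters this kernel/driver does not expose.
        if [ -r "$dir/$f" ]; then
            printf '%s %s\n' "${f#statistics/}" "$(cat "$dir/$f")"
        fi
    done
}
```

Run `link_health eth1` a few times, ideally around one of the freezes, and compare the numbers.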

c97

Are you running a network bridge?

This morning, I ran into this issue after a regular Ubuntu Server 20.04 LTS package upgrade to linux-image-5.4.0-139-generic. The server has four network interfaces that are bridged together for the local network.

The following message caught my eye when running this debug command:

$ dmesg |grep 'igb\|bridge\|br0'

[ 30.295463] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.

This means that the br_netfilter kernel module is no longer loaded by default. According to the message, it is needed for arp/ip/ip6tables filtering of bridged traffic, and my bridge did not work without it. To check whether the module is loaded, run the command below; if it is, you should see output similar to this:

$ lsmod |grep br_netfilter
br_netfilter           28672  0
bridge                176128  1 br_netfilter

If it is not loaded, add br_netfilter to the list of kernel modules loaded at boot, which on my system is /etc/modules. For other distributions or /etc layouts, see here. After a reboot, the network bridge should be up and running again.
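A sketch of the check-and-add steps, assuming a Debian/Ubuntu-style /etc/modules with one module name per line. The function names are hypothetical; `modprobe` and /proc/modules (the data `lsmod` prints) are standard.

```shell
#!/bin/sh
# Hypothetical helpers: load br_netfilter now if needed, and list it in
# /etc/modules for the next boot. Run ensure_br_netfilter as root.

# True if the named module is loaded; each /proc/modules line starts with
# the module name followed by a space.
module_loaded() {
    grep -q "^$1 " /proc/modules
}

# Append MODULE ($1) to FILE ($2) unless FILE already has that exact line.
add_module_to_list() {
    grep -qxF "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

ensure_br_netfilter() {
    module_loaded br_netfilter || modprobe br_netfilter
    add_module_to_list br_netfilter /etc/modules
}
```

Loading the module with `modprobe` takes effect right away, which may spare you the reboot; the /etc/modules entry covers subsequent boots, and running the function twice does not duplicate the entry.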

Serge Stroobandt