Why does the kernel panic when panic_on_warn==0?

Question

My OS hit a kernel panic (it looks like it triggered another kernel to take a dump, kdump?):

[   124.674715] core: Uncorrected hardware memory error in user-access at xxxxxxx
[   124.684140] BUG: scheduling while atomic: einj_mem_uc/5151/0xxxxxxxxx
[   124.684310] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
r = 0xxxxxxxxxxx[   124.691839] Memory failure: 0x25eae3: Killing einj_mem_uc:6161 due to hardware memory corruption
[   124.700827] {1}[Hardware Error]: event severity: recoverable
[   124.700828] {1}[Hardware Error]:  Error 0, type: recoverable
00 paddr = xxxxx[   124.700829] {1}[Hardware Error]:  fru_text: Card01, ChnE, DIMM0
[   124.700830] {1}[Hardware Error]:   section_type: memory error
[   124.700835] {1}[Hardware Error]:   error_status: 0x0000000000000400
[   124.712309] Memory failure: 0x25eae3: recovery action for dirty LRU page: Recovered
[   124.718713] {1}[Hardware Error]:   physical_address: 0x000000015ace3400
[   124.718715] {1}[Hardware Error]:   node: 0 card: 4 module: 0 rank: 0 bank: 21 device: 0 row: 10455 column: 1408 
[   124.718716] {1}[Hardware Error]:   error_type: 4, single-symbol chipkill ECC
[   124.718718] {1}[Hardware Error]:   DIMM location: _Node0_Channel4_Dimm0 CPU0_E0 
[   124.791089] Memory failure: 0x25eae3: already hardware poisoned
3 116
400
[    0.000000] Linux version 4.18.0-348.el8.x86_64

I checked the source code:

https://elixir.bootlin.com/linux/v4.18/source/kernel/sched/core.c#L3287

The OS should only panic there when panic_on_warn == 1, but I checked my system:

sudo sysctl -a | grep -i panic_on
...
kernel.panic_on_warn = 0
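
The same check can be widened to every panic-related knob, since panic_on_warn is only one of several settings that can turn an event into a panic. Below is a minimal sketch, assuming the knobs are exposed as files under /proc/sys/kernel (the usual procfs layout on Linux). The function name read_panic_knobs is hypothetical and used only for illustration.

```python
from pathlib import Path


def read_panic_knobs(sysctl_dir="/proc/sys/kernel"):
    """Return {name: value} for every file in sysctl_dir whose name contains 'panic'.

    Assumption: each knob is a small text file, as procfs normally exposes them.
    """
    knobs = {}
    for path in sorted(Path(sysctl_dir).glob("*panic*")):
        try:
            knobs[path.name] = path.read_text().strip()
        except OSError:
            # Unreadable entries (permissions, directories) are skipped.
            continue
    return knobs


if __name__ == "__main__":
    for name, value in read_panic_knobs().items():
        print(f"kernel.{name} = {value}")
```

Which knobs appear depends on the kernel version and architecture, so the output on a 4.18 system may differ from that of a newer kernel.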

I'm curious about what is printed between the BUG… message and the start of the reboot. **You should have "scheduling while atomic"** and then the stack dump; according to the code, you should get the stack dump. And since you have panic_on_warn=0, the system could well panic for some other reason while dumping the stack, and **not** because of the scheduling bug. – MC68020, Aug 26 '22 at 16:58
I'd indeed bet on some double fault when dumping the stack. So please provide the missing lines that you represented as "...". – MC68020, Aug 26 '22 at 17:21
And BTW, DO take care when reading code on Linux's GitHub. You are referring to **current**, from which… 4.18 is actually **very far**. (Not a real problem here, since the scheduler debug code did not change much.) – MC68020, Aug 26 '22 at 17:25

MC68020 · Accepted Answer · 2022-08-27T10:16:44.660


OK then, just to confirm my comments above, thanks to the supplemental information you provided:

The kernel does not panic because of BUG: scheduling while atomic (which, as intended with kernel.panic_on_warn = 0, is not a valid reason for panicking) but, more obviously, because of repeated hardware memory failures detected by the MCE interrupt handler, which are possibly the source of some fatal problem in that handler.
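
The repeated memory failure reports can be picked out of the posted log mechanically. Below is a minimal sketch, assuming the log is available as plain text (for example, saved dmesg output) and that the lines follow the "Memory failure: PFN: message" shape seen in the question. The function name memory_failures_by_pfn is hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "Memory failure: 0x25eae3: already hardware poisoned".
MEMORY_FAILURE = re.compile(r"Memory failure: (0x[0-9a-fA-F]+): (.*)")


def memory_failures_by_pfn(log_text):
    """Group 'Memory failure' messages by the page frame number they name."""
    events = defaultdict(list)
    # finditer over the whole text tolerates console output that is
    # interleaved in front of the kernel timestamp on the same line.
    for match in MEMORY_FAILURE.finditer(log_text):
        pfn, message = match.groups()
        events[pfn].append(message.strip())
    return dict(events)


if __name__ == "__main__":
    # Three lines copied from the log in the question.
    sample = (
        "r = 0xxxxxxxxxxx[   124.691839] Memory failure: 0x25eae3: "
        "Killing einj_mem_uc:6161 due to hardware memory corruption\n"
        "[   124.712309] Memory failure: 0x25eae3: "
        "recovery action for dirty LRU page: Recovered\n"
        "[   124.791089] Memory failure: 0x25eae3: already hardware poisoned\n"
    )
    for pfn, messages in memory_failures_by_pfn(sample).items():
        print(f"{pfn}: {len(messages)} event(s)")
        for message in messages:
            print(f"  - {message}")
```

On the lines from the question, all three reports name the same page, 0x25eae3: the kill, the "Recovered" action, and then "already hardware poisoned". This only summarizes what the log shows; it does not by itself identify what finally caused the panic.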

edited Aug 27 '22 at 10:16

answered Aug 27 '22 at 00:35

MC68020


It is a single injection, and the error is recoverable. The same test on a newer kernel didn't crash. Please refer to: https://unix.stackexchange.com/questions/714922/inject-uncorrected-error-then-system-reboot – Mark K Aug 27 '22 at 01:45
@GreenTea: ACK! I'll switch to this new question. As far as this very question is concerned (kernel panic when panic_on_warn==0), my point remains that **"scheduling while atomic" did not trigger the kernel panic** (you'd have had the stack dump); the panic came from some problem in handling the MCE interrupt related to the detection of hardware memory errors. – MC68020 Aug 27 '22 at 10:11
I didn't see any related log indicating why it rebooted/panicked; only the "scheduling while atomic" line differs from the PASS (no crash) log. – Mark K Aug 27 '22 at 10:20
@GreenTea: So that is a sign that it is a real panic deep in kernel code, something like a double fault happening before the stack is dumped. I'll switch to your other thread, in which you give more info (SIGBUS). – MC68020 Aug 27 '22 at 10:44
