My OS hit a kernel panic (it looks like it then triggered a second kernel to capture a dump, kdump?). Console output:
[ 124.674715] core: Uncorrected hardware memory error in user-access at xxxxxxx
[ 124.684140] BUG: scheduling while atomic: einj_mem_uc/5151/0xxxxxxxxx
[ 124.684310] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
r = 0xxxxxxxxxxx
[ 124.691839] Memory failure: 0x25eae3: Killing einj_mem_uc:6161 due to hardware memory corruption
[ 124.700827] {1}[Hardware Error]: event severity: recoverable
[ 124.700828] {1}[Hardware Error]: Error 0, type: recoverable
00 paddr = xxxxx
[ 124.700829] {1}[Hardware Error]: fru_text: Card01, ChnE, DIMM0
[ 124.700830] {1}[Hardware Error]: section_type: memory error
[ 124.700835] {1}[Hardware Error]: error_status: 0x0000000000000400
[ 124.712309] Memory failure: 0x25eae3: recovery action for dirty LRU page: Recovered
[ 124.718713] {1}[Hardware Error]: physical_address: 0x000000015ace3400
[ 124.718715] {1}[Hardware Error]: node: 0 card: 4 module: 0 rank: 0 bank: 21 device: 0 row: 10455 column: 1408
[ 124.718716] {1}[Hardware Error]: error_type: 4, single-symbol chipkill ECC
[ 124.718718] {1}[Hardware Error]: DIMM location: _Node0_Channel4_Dimm0 CPU0_E0
[ 124.791089] Memory failure: 0x25eae3: already hardware poisoned
3 116
400
[ 0.000000] Linux version 4.18.0-348.el8.x86_64
I checked the source code:
https://elixir.bootlin.com/linux/v4.18/source/kernel/sched/core.c#L3287
As far as I can tell from that code, after printing "BUG: scheduling while atomic" the kernel should panic only when panic_on_warn == 1, but on my system it is 0:
sudo sysctl -a | grep -i panic_on
...
kernel.panic_on_warn = 0
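
To rule out other settings, here is a minimal sketch of how every enabled panic-related knob could be listed, not only the panic_on_* ones matched by the grep above. The helper name enabled_panic_knobs is my own (hypothetical), and it assumes the "key = value" layout that sysctl -a prints:

```shell
# Print the names of panic-related sysctl knobs whose value is not 0.
# Input: "key = value" lines on stdin, in the format printed by `sysctl -a`.
# The function name is hypothetical; it is not part of any standard tool.
enabled_panic_knobs() {
    # Split each line on " = ", keep keys containing "panic" with a non-zero value.
    awk -F' = ' '/panic/ && $2 != 0 { print $1 }'
}

# Possible use on a live system (root is needed to read every key):
#   sudo sysctl -a 2>/dev/null | enabled_panic_knobs
```

If this prints nothing, no sysctl with "panic" in its name is enabled. That would only narrow things down; it would not by itself explain the panic.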