This may actually be a hardware failure. I was getting this on my PC while gaming on my AMD ATI Radeon HD 8670 GPU on Arch Linux with the 6.3.1-zen1-1-zen kernel. This is an HP EliteDesk (705 G5 SFF) fyi. I tried dropping the kernel to the last LTS and the one before it (5.10 iirc) but still got the crash after a few minutes of playing a game.
I happen to have a Dell homeserver running the same OS and kernel (Arch with zen) which has an AMD ATI Radeon HD 8570 GPU. Essentially it's the same card but with a bit less onboard GDDR5 iirc.
Well, I swapped the cards out (8570 now in the HP mb and 8670 in the Dell) and haven't had any trouble with the 8570 while playing games.
So... with all the same hardware/software/firmware/drivers, the 8570 worked where the 8670 didn't. All I did was switch cards; no reinstalling of drivers or anything. I should also note that games used to work fine on the 8670, so I think it just kicked the bucket one day.
So I know hardware failures are rare, but if this ain't one, I don't know what is. Sorry to be potentially breaking bad news. I don't use the homeserver for games, so making this switch has been fine for me.
Here's one of my dmesg logs from the 8670 crashing on the HP:
...
[32776.529276] radeon 0000:0b:00.0: ring 0 stalled for more than 28224msec
[32776.529282] radeon 0000:0b:00.0: GPU lockup (current fence id 0x0000000000108667 last fence id 0x00000000001086ba on ring 0)
[32776.673264] radeon 0000:0b:00.0: ring 3 stalled for more than 28228msec
[32776.673268] radeon 0000:0b:00.0: GPU lockup (current fence id 0x00000000000380db last fence id 0x0000000000038154 on ring 3)
[32777.033251] radeon 0000:0b:00.0: ring 0 stalled for more than 28728msec
[32777.033259] radeon 0000:0b:00.0: GPU lockup (current fence id 0x0000000000108667 last fence id 0x00000000001086bb on ring 0)
[32777.177236] radeon 0000:0b:00.0: ring 3 stalled for more than 28732msec
[32777.177240] radeon 0000:0b:00.0: GPU lockup (current fence id 0x00000000000380db last fence id 0x0000000000038156 on ring 3)
[32777.537217] radeon 0000:0b:00.0: ring 0 stalled for more than 29232msec
[32777.537221] radeon 0000:0b:00.0: GPU lockup (current fence id 0x0000000000108667 last fence id 0x00000000001086bc on ring 0)
[32777.681206] radeon 0000:0b:00.0: ring 3 stalled for more than 29236msec
[32777.681209] radeon 0000:0b:00.0: GPU lockup (current fence id 0x00000000000380db last fence id 0x0000000000038159 on ring 3)
[32778.041191] radeon 0000:0b:00.0: ring 0 stalled for more than 29736msec
[32778.041194] radeon 0000:0b:00.0: GPU lockup (current fence id 0x0000000000108667 last fence id 0x00000000001086bd on ring 0)
[32778.185183] radeon 0000:0b:00.0: ring 3 stalled for more than 29740msec
[32778.185186] radeon 0000:0b:00.0: GPU lockup (current fence id 0x00000000000380db last fence id 0x000000000003815a on ring 3)
[32779.776047] BUG: unable to handle page fault for address: ffffbdd0c13e9ffc
[32779.776052] #PF: supervisor read access in kernel mode
[32779.776054] #PF: error_code(0x0000) - not-present page
[32779.776055] PGD 100000067 P4D 100000067 PUD 0
[32779.776058] Oops: 0000 [#1] PREEMPT SMP NOPTI
[32779.776061] CPU: 8 PID: 157222 Comm: openmw Tainted: G S 6.1.12-zen1-1-zen #1 f86a89fe584efe7bcf920c69db3728bed4671799
[32779.776064] Hardware name: HP HP EliteDesk 705 G5 SFF/8618, BIOS R09 Ver. 02.02.02 11/15/2019
[32779.776065] RIP: 0010:radeon_ring_backup+0xc2/0x160 [radeon]
[32779.776196] Code: 49 c1 e6 02 4c 89 f7 e8 9c cc ab f5 49 89 45 00 48 89 c2 48 85 c0 74 5f 48 8b 4b 10 41 8d 47 01 45 89 ff 23 43 5c 4a 8d 34 b9 <8b> 36 89 32 41 83 fc 01 74 29 ba 04 00 00 00 eb 04 48 8b 4b 10 8d
[32779.776197] RSP: 0018:ffffbdcccfc5bbd8 EFLAGS: 00010246
[32779.776199] RAX: 0000000000000000 RBX: ffff9460e434d620 RCX: ffffbdccc13ea000
[32779.776201] RDX: ffff9465dbd00000 RSI: ffffbdd0c13e9ffc RDI: 00000000000392d7
[32779.776202] RBP: ffff9460e434d600 R08: 00000000000392d0 R09: 0000000000000006
[32779.776203] R10: fffff6a4d96f4000 R11: 000000000000577f R12: 000000000003dd71
[32779.776204] R13: ffffbdcccfc5bc50 R14: 00000000000f75c4 R15: 00000000ffffffff
[32779.776205] FS: 00007fbd98eb96c0(0000) GS:ffff94677ec00000(0000) knlGS:0000000000000000
[32779.776207] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32779.776208] CR2: ffffbdd0c13e9ffc CR3: 0000000490706000 CR4: 0000000000350ee0
[32779.776210] Call Trace:
[32779.776212] <TASK>
[32779.776213] radeon_gpu_reset+0xf7/0x2f0 [radeon de372908aa1ea62ea129bf192d817412c67e128b]
[32779.776243] radeon_gem_wait_idle_ioctl+0xb8/0x100 [radeon de372908aa1ea62ea129bf192d817412c67e128b]
[32779.776273] ? radeon_gem_busy_ioctl+0xb0/0xb0 [radeon de372908aa1ea62ea129bf192d817412c67e128b]
[32779.776302] drm_ioctl_kernel+0xcd/0x170
[32779.776306] drm_ioctl+0x1eb/0x450
[32779.776308] ? radeon_gem_busy_ioctl+0xb0/0xb0 [radeon de372908aa1ea62ea129bf192d817412c67e128b]
[32779.776337] radeon_drm_ioctl+0x4d/0x80 [radeon de372908aa1ea62ea129bf192d817412c67e128b]
[32779.776364] __x64_sys_ioctl+0x94/0xd0
[32779.776369] do_syscall_64+0x5f/0x90
[32779.776373] ? do_syscall_64+0x6b/0x90
[32779.776375] ? syscall_exit_to_user_mode+0x2c/0x1d0
[32779.776378] ? syscall_exit_to_user_mode+0x2c/0x1d0
[32779.776380] ? do_syscall_64+0x6b/0x90
[32779.776382] ? syscall_exit_to_user_mode+0x2c/0x1d0
[32779.776384] ? do_syscall_64+0x6b/0x90
[32779.776385] ? do_syscall_64+0x6b/0x90
[32779.776387] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[32779.776390] RIP: 0033:0x7fbdb591553f
[32779.776418] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[32779.776420] RSP: 002b:00007fbd98eb80f0 EFLAGS: 00200246 ORIG_RAX: 0000000000000010
[32779.776422] RAX: ffffffffffffffda RBX: 00007fbd7d74eb80 RCX: 00007fbdb591553f
[32779.776423] RDX: 00007fbd98eb8190 RSI: 0000000040086464 RDI: 0000000000000010
[32779.776425] RBP: 00007fbd98eb8190 R08: 0000000000000000 R09: ffffffffffffffff
[32779.776426] R10: 0000000000000000 R11: 0000000000200246 R12: 0000000040086464
[32779.776427] R13: 0000000000000010 R14: 000055d27885abd0 R15: 000055d278a375d8
[32779.776429] </TASK>
[32779.776430] Modules linked in: rfcomm xt_nat veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink br_netfilter bridge stp llc rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm nfsv4 ib_cm dns_resolver ib_core nfs fscache wireguard netfs curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel overlay cmac algif_hash algif_skcipher af_alg bnep isofs cdrom amdgpu gpu_sched drm_buddy squashfs vfat fat iwlmvm mac80211 snd_hda_codec_conexant snd_hda_codec_generic libarc4 ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr radeon snd_hda_intel intel_rapl_common btusb edac_mce_amd btrtl snd_intel_dspcfg btbcm snd_intel_sdw_acpi drm_ttm_helper kvm_amd snd_hda_codec btintel iwlwifi snd_hda_core hp_wmi btmtk ttm snd_hwdep sparse_keymap kvm platform_profile wmi_bmof sp5100_tco bluetooth snd_pcm irqbypass r8169 ucsi_acpi drm_display_helper video cfg80211 psmouse rapl typec_ucsi pcspkr snd_timer realtek k10temp i2c_piix4 ecdh_generic cec
[32779.776479] ipmi_devintf typec snd mdio_devres soundcore ipmi_msghandler ip6t_REJECT rfkill libphy roles nf_reject_ipv6 joydev wmi mousedev gpio_amdpt xt_hl gpio_generic acpi_cpufreq ip6_tables ip6t_rt mac_hid ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_multiport nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink nfsd auth_rpcgss nfs_acl lockd grace sg crypto_user sunrpc loop fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee usbhid uas usb_storage dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel serio_raw polyval_clmulni atkbd polyval_generic gf128mul libps2 ghash_clmulni_intel vivaldi_fmap sha512_ssse3 nvme aesni_intel crypto_simd nvme_core ccp cryptd xhci_pci i8042 xhci_pci_renesas nvme_common serio
[32779.776522] CR2: ffffbdd0c13e9ffc
[32779.776523] ---[ end trace 0000000000000000 ]---
[32779.776524] RIP: 0010:radeon_ring_backup+0xc2/0x160 [radeon]
[32779.776554] Code: 49 c1 e6 02 4c 89 f7 e8 9c cc ab f5 49 89 45 00 48 89 c2 48 85 c0 74 5f 48 8b 4b 10 41 8d 47 01 45 89 ff 23 43 5c 4a 8d 34 b9 <8b> 36 89 32 41 83 fc 01 74 29 ba 04 00 00 00 eb 04 48 8b 4b 10 8d
[32779.776555] RSP: 0018:ffffbdcccfc5bbd8 EFLAGS: 00010246
[32779.776557] RAX: 0000000000000000 RBX: ffff9460e434d620 RCX: ffffbdccc13ea000
[32779.776558] RDX: ffff9465dbd00000 RSI: ffffbdd0c13e9ffc RDI: 00000000000392d7
[32779.776559] RBP: ffff9460e434d600 R08: 00000000000392d0 R09: 0000000000000006
[32779.776560] R10: fffff6a4d96f4000 R11: 000000000000577f R12: 000000000003dd71
[32779.776561] R13: ffffbdcccfc5bc50 R14: 00000000000f75c4 R15: 00000000ffffffff
[32779.776562] FS: 00007fbd98eb96c0(0000) GS:ffff94677ec00000(0000) knlGS:0000000000000000
[32779.776563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32779.776565] CR2: ffffbdd0c13e9ffc CR3: 0000000490706000 CR4: 0000000000350ee0
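
If anyone wants to check their own dmesg for the same pattern, here's a rough Python sketch. It isn't part of the radeon driver or any existing tool; the names summarize_lockups and is_stuck are made up for illustration, and the sample lines are copied from the log above. It pulls out the "ring N stalled" and "GPU lockup" lines and checks, per ring, whether the current fence id stays put while the last fence id keeps climbing. As far as I understand it, that means work is still being queued but the GPU has stopped completing it.

```python
import re

# Patterns for the two radeon messages seen in the log above.
STALL = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
LOCKUP = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-f]+) "
    r"last fence id (0x[0-9a-f]+) on ring (\d+)\)"
)

# A few lines copied from the dmesg output above.
SAMPLE = """\
[32776.529276] radeon 0000:0b:00.0: ring 0 stalled for more than 28224msec
[32776.529282] radeon 0000:0b:00.0: GPU lockup (current fence id 0x0000000000108667 last fence id 0x00000000001086ba on ring 0)
[32776.673264] radeon 0000:0b:00.0: ring 3 stalled for more than 28228msec
[32776.673268] radeon 0000:0b:00.0: GPU lockup (current fence id 0x00000000000380db last fence id 0x0000000000038154 on ring 3)
[32777.033251] radeon 0000:0b:00.0: ring 0 stalled for more than 28728msec
[32777.033259] radeon 0000:0b:00.0: GPU lockup (current fence id 0x0000000000108667 last fence id 0x00000000001086bb on ring 0)
"""


def summarize_lockups(log_text):
    """Group stall and lockup messages by ring number (hypothetical helper)."""
    rings = {}

    def entry_for(ring):
        return rings.setdefault(
            ring, {"max_stall_ms": 0, "current": set(), "last": set()}
        )

    for line in log_text.splitlines():
        stall = STALL.search(line)
        if stall:
            entry = entry_for(int(stall.group(1)))
            entry["max_stall_ms"] = max(entry["max_stall_ms"], int(stall.group(2)))
            continue
        lockup = LOCKUP.search(line)
        if lockup:
            entry = entry_for(int(lockup.group(3)))
            entry["current"].add(int(lockup.group(1), 16))
            entry["last"].add(int(lockup.group(2), 16))
    return rings


def is_stuck(entry):
    """True if the current fence never moved while the last fence advanced."""
    return len(entry["current"]) == 1 and len(entry["last"]) > 1


if __name__ == "__main__":
    for ring, entry in sorted(summarize_lockups(SAMPLE).items()):
        print(
            f"ring {ring}: longest stall {entry['max_stall_ms']} ms, "
            f"stuck fence: {is_stuck(entry)}"
        )
```

To run it on a real log, save the output of dmesg to a file and pass the file's contents to summarize_lockups instead of SAMPLE. A stuck fence on its own doesn't prove the card is dead (a driver bug can look the same), which is why swapping cards was what convinced me.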