We use a custom-built opsi.org bootimage to automatically install Windows on customers' client computers. The userland of this bootimage is based on an upstream bootimage, with some modifications of our own, and the kernel is taken from Ubuntu.
Since we upgraded the kernel to Linux 4.8.0-42.45 from Ubuntu yakkety, customers have been reporting that the installation stops because lshw segfaults:
[7] [Apr 27 23:29:08] Expecting compressed data from server (JSONRPC.py|660)
[5] [Apr 27 23:29:08] Running hardware inventory (setup.py|140)
[7] [Apr 27 23:29:08] Command 'lshw' found at: '/usr/bin/lshw' (Posix.py|640)
[6] [Apr 27 23:29:08] Executing: /usr/bin/lshw -xml 2>/dev/null (Posix.py|660)
[6] [Apr 27 23:29:08] Using encoding 'UTF-8' (Posix.py|691)
[7] [Apr 27 23:29:08] Exit code: 139 (Posix.py|748)
[2] [Apr 27 23:29:09] Traceback: (Logger.py|742)
[2] [Apr 27 23:29:09] line 1390 in '<module>' in file '/usr/local/bin/master.py' (Logger.py|742)
[2] [Apr 27 23:29:09] line 141 in '<module>' in file '/tmp/setup.py' (Logger.py|742)
[2] [Apr 27 23:29:09] line 2482 in 'auditHardware' in file '/usr/lib/pymodules/python2.6/OPSI/System/Posix.py' (Logger.py|742)
[2] [Apr 27 23:29:09] line 2526 in 'hardwareInventory' in file '/usr/lib/pymodules/python2.6/OPSI/System/Posix.py' (Logger.py|742)
[2] [Apr 27 23:29:09] line 755 in 'execute' in file '/usr/lib/pymodules/python2.6/OPSI/System/Posix.py' (Logger.py|742)
[2] [Apr 27 23:29:09] ==>>> Command '/usr/bin/lshw -xml 2>/dev/null' failed (139):
(master.py|1438)
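For reference, the exit code 139 in the log is what a shell reports when a child process is killed by a signal: 128 plus the signal number, here 11 (SIGSEGV). A minimal sketch of that decoding, assuming the code comes from a shell-style wrapper as the `2>/dev/null` redirection suggests; `describe_exit_code` is a hypothetical helper and not part of opsi:

```python
import signal

def describe_exit_code(code):
    """Interpret a shell-style exit code.

    Shells report 128 + N when the command was killed by signal N.
    """
    if code > 128:
        signum = code - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform
            return "killed by unknown signal %d" % signum
    return "exited with status %d" % code

print(describe_exit_code(139))
```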
At the same time, the following error is logged to dmesg:
[ 69.852348] usercopy: kernel memory exposure attempt detected from c0080000 (dma-kmalloc-512) (4096 bytes)
[ 69.852365] ------------[ cut here ]------------
[ 69.852367] kernel BUG at /build/linux-7qXOmc/linux-4.8.0/mm/usercopy.c:75!
[ 69.852370] invalid opcode: 0000 [#1] SMP
[ 69.852371] Modules linked in: arc4 md4 nls_utf8 cifs fscache joydev rtsx_usb_ms memstick snd_hda_intel rtsx_usb_sdmmc snd_hda_codec snd_hda_core acer_wmi snd_hwdep rtsx_usb r8169 fjes video sparse_keymap snd_pcm mii mei_txe wmi input_leds snd_timer mac_hid snd mei lpc_ich ahci libahci intel_smartconnect soundcore
[ 69.852399] CPU: 0 PID: 1528 Comm: lshw Not tainted 4.8.0-42-generic #45-Ubuntu
[ 69.852400] Hardware name: Acer Extensa 2508/Extensa 2508, BIOS V1.10 12/15/2014
[ 69.852402] task: f6d16f00 task.stack: f6cc6000
[ 69.852405] EIP: 0060:[<dd1f7543>] EFLAGS: 00010282 CPU: 0
[ 69.852411] EIP is at __check_object_size+0x123/0x12c
[ 69.852413] EAX: 0000005e EBX: c0080000 ECX: 00000247 EDX: 00000247
[ 69.852414] ESI: 00001000 EDI: dda5944f EBP: f6cc7ee0 ESP: f6cc7eb8
[ 69.852416] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 69.852418] CR0: 80050033 CR2: bf910000 CR3: 36883e40 CR4: 001006f0
[ 69.852419] Stack:
[ 69.852420] dda5f86c dda628ce dda97569 c0080000 f1402080 00001000 c0081000 c0080000
[ 69.852427] 00090000 00001000 f6cc7f1c dd5074d6 00000000 00001000 bf900678 00010000
[ 69.852434] 00080000 00000000 00090000 00000000 00090000 00000000 dd507430 f6cc7f60
[ 69.852440] Call Trace:
[ 69.852447] [<dd5074d6>] read_mem+0xa6/0x1f0
[ 69.852451] [<dd507430>] ? write_mem+0x1f0/0x1f0
[ 69.852454] [<dd1fb15f>] __vfs_read+0x1f/0x50
[ 69.852457] [<dd1fb85f>] vfs_read+0x7f/0x140
[ 69.852461] [<dd80a0a0>] ? down_write+0x10/0x40
[ 69.852465] [<dd1fc9e9>] SyS_read+0x49/0xb0
[ 69.852469] [<dd0037cd>] do_fast_syscall_32+0x8d/0x140
[ 69.852472] [<dd80c07a>] sysenter_past_esp+0x47/0x75
[ 69.852473] Code: 89 74 24 14 0f 44 ca ba ce 28 a6 dd 89 44 24 10 0f 44 d7 89 5c 24 0c 89 4c 24 08 89 54 24 04 c7 04 24 6c f8 a5 dd e8 15 91 f8 ff <0f> 0b b8 97 28 a6 dd eb b9 55 89 e5 57 56 53 83 ec 1c 3e 8d 74
[ 69.852516] EIP: [<dd1f7543>] __check_object_size+0x123/0x12c SS:ESP 0068:f6cc7eb8
[ 69.852523] ---[ end trace 5b12719d45b0befe ]---
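The first line of that trace can be decoded a little further. The sketch below parses the usercopy message and converts the kernel virtual address to a physical one. It assumes the default 3G/1G split on 32-bit x86, where low memory is mapped at 0xC0000000; that is an assumption about this kernel build, not something the log states. `parse_usercopy` is a hypothetical helper.

```python
import re

# Assumption: default 3G/1G address split on 32-bit x86
PAGE_OFFSET = 0xC0000000

USERCOPY_RE = re.compile(
    r"usercopy: kernel memory exposure attempt detected from "
    r"([0-9a-f]+) \((.+?)\) \((\d+) bytes\)"
)

def parse_usercopy(line):
    """Extract address, slab cache name and length from a usercopy message."""
    m = USERCOPY_RE.search(line)
    if m is None:
        return None
    virt = int(m.group(1), 16)
    return {
        "virt": virt,
        "phys": virt - PAGE_OFFSET,  # only meaningful for lowmem addresses
        "cache": m.group(2),
        "length": int(m.group(3)),
    }

line = ("[ 69.852348] usercopy: kernel memory exposure attempt detected "
        "from c0080000 (dma-kmalloc-512) (4096 bytes)")
info = parse_usercopy(line)
print(hex(info["phys"]), info["cache"], info["length"])
```

Under that assumption the address corresponds to physical 0x80000. Since `read_mem` in the call trace is the read handler for /dev/mem, this would be consistent with lshw reading low physical memory through /dev/mem when the check fired, with a 4096-byte copy starting inside a 512-byte slab object.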
I assume there's a bug in lshw or in Linux, or something fishy with the hardware. The problem affects a lot of machines though, so I'm ruling out defective hardware. The issue only seems to occur on Linux >= 4.8; at least Linux 4.4 is not affected. This is probably because usercopy hardening was introduced in Linux 4.8.
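To tell affected kernels apart in a script, something along these lines might do. It is only a sketch: the helper names are made up, the 4.8 threshold comes from the observation above, and the second function assumes a kernel config file is available as text, which may not be the case in a stripped-down bootimage.

```python
import re

def kernel_version(release):
    """Return (major, minor) from a release string such as '4.8.0-42-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))

def may_have_usercopy_hardening(release):
    """True for kernels new enough (>= 4.8) to contain the usercopy checks."""
    return kernel_version(release) >= (4, 8)

def hardened_usercopy_enabled(config_text):
    """True if the kernel config text enables CONFIG_HARDENED_USERCOPY."""
    return any(
        line.strip() == "CONFIG_HARDENED_USERCOPY=y"
        for line in config_text.splitlines()
    )

for release in ("4.4.0-21-generic", "4.8.0-42-generic", "4.10.0-20-generic"):
    print(release, may_have_usercopy_hardening(release))
```

On a regular Ubuntu installation the config text can usually be read from /boot/config-$(uname -r).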
The problem doesn't affect all machines (for example, it works fine in my VirtualBox VM; the affected machine that I'm currently testing with is an Acer Extensa 2508 notebook). We do use a version of lshw that's probably pretty ancient:
root@testnb:~# uname -a
Linux testnb 4.8.0-42-generic #45-Ubuntu SMP Wed Mar 8 20:05:25 UTC 2017 i686 GNU/Linux
root@testnb:~# lshw -version
B.02.14
root@testnb:~# lshw
Segmentation fault
I suspected that the old version might be the cause, so I statically compiled lshw 02.17-1.1 from Debian jessie, but that doesn't work either:
root@testnb:~# uname -a
Linux testnb 4.8.0-42-generic #45-Ubuntu SMP Wed Mar 8 20:05:25 UTC 2017 i686 GNU/Linux
root@testnb:~# ./lshw-02.17-static -version
B.02.17
root@testnb:~# ./lshw-02.17-static
Segmentation fault
I tried a Linux 4.8 package from Ubuntu yakkety that's a bit more recent:
root@testnb:~# uname -a
Linux testnb 4.8.0-49-generic #52-Ubuntu SMP Thu Apr 20 09:39:42 UTC 2017 i686 GNU/Linux
root@testnb:~# lshw
Segmentation fault
root@testnb:~# ./lshw-02.17-static
Segmentation fault
Linux 4.10 from Ubuntu zesty:
root@testnb:~# uname -a
Linux testnb 4.10.0-20-generic #22-Ubuntu SMP Thu Apr 20 09:22:16 UTC 2017 i686 GNU/Linux
root@testnb:~# lshw
Segmentation fault
root@testnb:~# ./lshw-02.17-static
Segmentation fault
I'm at a loss on what to do now. Any ideas?
EDIT: I've compiled a list of affected computers from our logs:
martin@dogmeat ~/pssh/lshw-segfault-bootimage/output % cat *.out | sed 's/.*DMI: //' | sort | uniq
Acer Extensa 2508/Extensa 2508, BIOS V1.09 10/24/2014 (Posix.py|741)
Acer Extensa 2508/Extensa 2508, BIOS V1.10 12/15/2014 (Posix.py|741)
Dell Inc. Latitude D630 /0KU184, BIOS A17 01/04/2010 (Posix.py|741)
Dell Inc. Latitude E5500 /0DW634, BIOS A15 11/05/2009 (Posix.py|741)
Dell Inc. Vostro 1015 /047MWF, BIOS A03 09/01/2010 (Posix.py|741)
FUJITSU ESPRIMO P910/D3162-A1, BIOS V4.6.5.3 R1.19.0 for D3162-A1x 12/17/2012 (Posix.py|741)
FUJITSU ESPRIMO P910/D3162-A1, BIOS V4.6.5.3 R1.22.0 for D3162-A1x 10/15/2013 (Posix.py|741)
Hewlett-Packard HP Compaq 6730b (GW687AV)/30DD, BIOS 68PDD Ver. F.10 07/31/2009 (Posix.py|741)
Hewlett-Packard HP Compaq 8510p /30C5, BIOS 68MVD Ver. F.0F 02/05/2008 (Posix.py|741)
Hewlett-Packard HP EliteBook 2540p/7008, BIOS 68CSU Ver. F.24 09/12/2013 (Posix.py|741)
Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.42 05/20/2013 (Posix.py|741)
Hewlett-Packard HP ProBook 4720s/1411, BIOS 68AZZ Ver. F.0B 09/16/2010 (Posix.py|741)
IBM 1860W25/1860W25, BIOS 70ET40WW (1.04 ) 06/02/2005 (Posix.py|741)
IBM 1860WR7/1860WR7, BIOS 70ET66WW (1.26 ) 05/18/2006 (Posix.py|741)
LENOVO 80ES/Lenovo B50-30, BIOS 9CCN21WW(V1.06) 04/09/2014 (Posix.py|741)
Quanta TW8/SW8/DW8/TW8/SW8/DW8, BIOS A3B92 10/07/2008 (Posix.py|741)
To Be Filled By O.E.M. To Be Filled By O.E.M./ALiveNF6G-GLAN, BIOS P1.70 03/06/2009 (Posix.py|741)
TOSHIBA Satellite Pro R50-B/Satellite Pro R50-B, BIOS Version 1.40 09/25/2014 (Posix.py|741)
TOSHIBA TECRA M10/Portable PC, BIOS Version 3.00 09/08/2009 (Posix.py|741)
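For anyone who wants to post-process such a list, here is a rough Python equivalent of the sed/sort/uniq pipeline that also strips the trailing `(Posix.py|741)` marker. The sample log lines are made up for illustration; only the part after `DMI: ` mirrors the real output. Unlike the sed command, this sketch skips lines that contain no `DMI: ` at all.

```python
import re

# Trailing "(Something.py|123)" marker appended by the logger
SOURCE_MARKER = re.compile(r"\s*\(\S+\.py\|\d+\)\s*$")

def unique_dmi(lines):
    """Return the sorted, de-duplicated DMI strings found in log lines."""
    seen = set()
    for line in lines:
        if "DMI: " not in line:
            continue
        # sed's greedy '.*DMI: ' cuts at the last occurrence, so do the same
        dmi = line.rsplit("DMI: ", 1)[1]
        seen.add(SOURCE_MARKER.sub("", dmi).strip())
    return sorted(seen)

# Hypothetical log lines; the prefix format is an assumption
sample = [
    "[6] [Apr 27 23:29:08] DMI: Acer Extensa 2508/Extensa 2508, BIOS V1.10 12/15/2014 (Posix.py|741)",
    "[6] [Apr 28 10:02:11] DMI: Acer Extensa 2508/Extensa 2508, BIOS V1.10 12/15/2014 (Posix.py|741)",
    "[6] [Apr 28 11:40:53] DMI: TOSHIBA TECRA M10/Portable PC, BIOS Version 3.00 09/08/2009 (Posix.py|741)",
    "[5] [Apr 28 11:40:54] Running hardware inventory (setup.py|140)",
]
for entry in unique_dmi(sample):
    print(entry)
```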
EDIT2: tried lshw from Debian stretch, and the most recent version from upstream:
root@testnb:~# uname -a
Linux testnb 4.8.0-42-generic #45-Ubuntu SMP Wed Mar 8 20:05:25 UTC 2017 i686 GNU/Linux
root@testnb:~# ./lshw-02.18-static
Segmentation fault
root@testnb:~# ./lshw-static-b1eab6372d
Segmentation fault
EDIT3: I've now tested this with a clean Linux distribution (an Ubuntu 17.04 live CD) in a VirtualBox VM, and I can confirm that this issue is reproducible there - but only with a 32 bit lshw:
1st attempt with a 64 bit Live CD - the built-in lshw works:
root@ubuntu:~# uname -a
Linux ubuntu 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:04:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~# dpkg -l | grep lshw
ii lshw 02.18-0.1ubuntu3 amd64 information about hardware configuration
root@ubuntu:~# lshw | wc -l
231
One of my static 32 bit lshw builds does not:
root@ubuntu:~# ./lshw-02.18-static
Segmentation fault
2nd attempt with a 32 bit Live CD - now even the built-in lshw doesn't work:
root@ubuntu:~# uname -a
Linux ubuntu 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:03:14 UTC 2017 i686 i686 i686 GNU/Linux
root@ubuntu:~# dpkg -l | grep lshw
ii lshw 02.18-0.1ubuntu3 i386 information about hardware configuration
root@ubuntu:~# lshw
Segmentation fault
The segmentation fault does not occur when I run lshw without root permissions:
ubuntu@ubuntu:~$ lshw | wc -l
WARNING: you should run this program as super-user.
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.
168
We tried VirtualBox on two different machines (one with an ASUSTeK H170-PRO/USB 3.1 mainboard, and one with an ASUSTeK P8H77-M mainboard) and with several different VM types (Microsoft Windows -> Windows 7 (32-bit), Microsoft Windows -> Windows 10 (64-bit), Linux -> Ubuntu (32-bit)), and the problem is always reproducible.
EDIT4: for some reason, I'm now having trouble reproducing the issue with our bootimage in VirtualBox. Maybe it depends on the VM configuration? It's still definitely reproducible on the Acer Extensa 2508 machine though. As the issue only seems to affect 32 bit builds of lshw, we now work around it by using a 64 bit bootimage with a static 64 bit build of lshw on machines that support 64 bit.
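One way to decide which machines can take the 64 bit bootimage is to look for the `lm` (long mode) CPU flag in /proc/cpuinfo. This is only a sketch of the idea, not how our bootimage selection actually works; the function names and the returned labels are hypothetical.

```python
def supports_64bit(cpuinfo_text):
    """True if the first 'flags' line in /proc/cpuinfo text lists 'lm'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "lm" in value.split()
    return False

def pick_bootimage(cpuinfo_text):
    """Return a hypothetical label for the bootimage variant to use."""
    return "64bit" if supports_64bit(cpuinfo_text) else "32bit"

# Shortened, made-up cpuinfo excerpts for illustration
sample_64 = "processor\t: 0\nflags\t\t: fpu vme de pse tsc msr pae lm constant_tsc\n"
sample_32 = "processor\t: 0\nflags\t\t: fpu vme de pse tsc msr pae\n"
print(pick_bootimage(sample_64), pick_bootimage(sample_32))
```

On a real machine the text would come from reading /proc/cpuinfo.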
The upstream bug report: http://www.ezix.org/project/ticket/750