We can see on our RHEL 7.6 server (kernel version 3.10.0-957.el7.x86_64) that the following processes are in D state (they run as the hdfs user).
Note: the D state code means that the process is in uninterruptible sleep (usually waiting on I/O).
ps -eo s,user,cmd | grep '^[RD]'
D hdfs du -sk /grid/sdj/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
D hdfs du -sk /grid/sdm/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
R root ps -eo s,user,cmd
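To see what the D-state processes are actually blocked on, the same listing can be extended with the PID and the kernel wait channel (wchan). This is only a sketch: filter_rd is a helper name made up for illustration, and the wchan column may show "-" or an address if the kernel does not expose symbol names.

```shell
# Hypothetical helper: keep the header line plus tasks whose state is R or D.
filter_rd() {
    awk 'NR == 1 || $1 ~ /^[RD]/'
}

# state, owner, PID, kernel function the task sleeps in (32 chars wide), command
ps -eo state,user,pid,wchan:32,cmd | filter_rd
```

If the du processes show an ext4 or block-layer function in the wchan column, that would suggest they are waiting on disk I/O rather than burning CPU.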
Notes: the disks sdj and sdm are 3 TB each, and "du -sk" also runs on the other disks, such as sdd, sdf, etc.
All the disks use the ext4 file system.
We suspect that the high CPU load average is caused by the "du -sk" processes that run on the disks.
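One thing worth checking for this suspicion: on Linux the load average counts tasks in uninterruptible sleep (D) as well as runnable ones (R), so D-state processes raise the load figure even when they use almost no CPU. A minimal sketch for counting tasks per state next to the current load values (count_states is a made-up helper name):

```shell
# Hypothetical helper: read one state letter per line, print "STATE COUNT" pairs.
count_states() {
    awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# "state=" prints the state column without a header line.
ps -eo state= | count_states

# 1, 5 and 15 minute load averages, for comparison with the R and D counts.
cat /proc/loadavg
```

Note that ps -e lists processes, while the kernel counts individual threads, so the two numbers will not match exactly.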
So I am wondering what we can do about this behavior.
One option might be to disable the "du -sk" check in HDFS, but I have no clue how to do that.
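On the first option, a possible starting point, assuming this Hadoop version drives the DataNode "du -sk" refresh from the fs.du.interval property (default 600000 ms in core-default.xml), is to check what interval is currently configured. The ms_to_min helper is made up for illustration, and whether raising the interval is acceptable for this cluster is an open question.

```shell
# Hypothetical helper: convert a millisecond interval to whole minutes.
ms_to_min() {
    echo $(( $1 / 60000 ))
}

# Query the effective value only if the hdfs client is on PATH.
if command -v hdfs >/dev/null 2>&1; then
    interval_ms=$(hdfs getconf -confKey fs.du.interval) &&
        echo "fs.du.interval = ${interval_ms} ms (~$(ms_to_min "$interval_ms") min)"
fi
```

With the default value, each data directory would be scanned roughly every 10 minutes, which on 3 TB disks could mean the scans run almost back to back.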
A second option is to work out what actually causes the D state.
I am not sure, but maybe upgrading the kernel would help avoid the D state? Or something else, such as disabling hyper-threading (the "Thread(s) per core" setting)?
More details:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
The CPU load average is around 42-45 (15-minute average).
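To put that number in context, the load can be divided by the number of logical CPUs (48 here according to lscpu). A small sketch, where load_per_cpu is a made-up helper name:

```shell
# Hypothetical helper: load average divided by the logical CPU count.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Third field of /proc/loadavg is the 15-minute average; nproc gives the CPU count.
load_per_cpu "$(cut -d' ' -f3 /proc/loadavg)" "$(nproc)"

# With the figures from this post: load_per_cpu 45 48 prints 0.94
```

A value below 1 per logical CPU, combined with many tasks in D state, would point more toward I/O wait than toward CPU saturation.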
Reference :
https://www.golinuxhub.com/2018/05/how-to-disable-or-enable-hyper/