
We can see on our RHEL 7.6 server (kernel 3.10.0-957.el7.x86_64) that the following processes are in D state (they run as the HDFS user):

Note: D state means that the process is in uninterruptible sleep.

ps -eo s,user,cmd | grep '^[RD]'
D hdfs     du -sk /grid/sdj/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
D hdfs     du -sk /grid/sdm/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
R root     ps -eo s,user,cmd
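
A single snapshot only shows the state at one instant. Adding the `wchan` column (the kernel function a sleeping task is waiting in) helps show whether the `du` processes are blocked on filesystem or disk I/O. A minimal sketch; the helper name `filter_dstate` is hypothetical, while `wchan:32` is ordinary procps `ps` column syntax (the number is the column width).

```shell
#!/usr/bin/env bash
# Keep only lines whose first column (the one-letter process state) starts with D.
# Expected input: output of  ps -eo state,pid,user,wchan:32,args
# The header line starts with "S", so it is dropped as well.
filter_dstate() {
  awk '$1 ~ /^D/'
}

# Example, run on the DataNode host:
#   ps -eo state,pid,user,wchan:32,args | filter_dstate
```

Running it a few times in a row shows whether the same `du` PIDs stay in D state or only pass through it briefly.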

Note: the disks sdj and sdm are 3 TB each. "du -sk" also runs on other disks such as sdd, sdf, etc., and all the disks use the ext4 filesystem.

We suspect that the high CPU load average is caused by the "du -sk" commands running on these disks.
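
One way to test this suspicion is to check whether the disks that `du` is walking are actually busy. The sketch below reads the "milliseconds spent doing I/O" counter (field 13 of `/proc/diskstats`) twice and turns the difference into a busy percentage, roughly what `iostat -x` reports as `%util`. The function name and the 5-second default are assumptions for illustration.

```shell
#!/usr/bin/env bash
# disk_busy_pct DEVICE [SECONDS]
# Percentage of the sampling window during which DEVICE had I/O in flight,
# computed from field 13 of /proc/diskstats (milliseconds spent doing I/O).
disk_busy_pct() {
  local dev=$1 secs=${2:-5} t1 t2
  if ! [ "$secs" -gt 0 ] 2>/dev/null; then
    echo "interval must be a positive integer" >&2
    return 1
  fi
  t1=$(awk -v d="$dev" '$3 == d { print $13 }' /proc/diskstats 2>/dev/null)
  if [ -z "$t1" ]; then
    echo "device $dev not found in /proc/diskstats" >&2
    return 1
  fi
  sleep "$secs"
  t2=$(awk -v d="$dev" '$3 == d { print $13 }' /proc/diskstats)
  # delta_ms / (secs * 1000 ms) * 100  ==  delta_ms / (secs * 10)
  awk -v a="$t1" -v b="$t2" -v s="$secs" 'BEGIN { printf "%.1f\n", (b - a) / (s * 10) }'
}

# Example: compare a disk that du is scanning with the others
#   disk_busy_pct sdj 5
#   disk_busy_pct sdd 5
```

A disk near 100% while `du` runs on it, combined with `du` in D state, would point at disk I/O rather than CPU as the source of the load.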

So I am wondering what we can do about this behavior.

One option might be to disable the "du -sk" check in HDFS, but I have no idea how to do that.
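
As far as I know, in stock Apache Hadoop the DataNode's periodic `du -sk` of each block-pool directory is scheduled by the `fs.du.interval` property (milliseconds; `core-default.xml` ships 600000, i.e. 10 minutes). Newer releases (2.8 and later, if I recall correctly) also have `fs.getspaceused.classname`, which can point at `org.apache.hadoop.fs.DFCachingGetSpaceUsed` to use a `df`-based figure instead of walking the tree; that class assumes the whole mount is used by HDFS. Whether these keys exist in your distribution's Hadoop version is an assumption to verify against its documentation. The sketch only reads the effective values with `hdfs getconf -confKey`; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the effective values of the settings that control the DataNode's
# periodic disk-usage scan. Needs the hdfs client on PATH; a missing key or a
# missing client is reported instead of aborting.
show_du_settings() {
  local key val
  for key in fs.du.interval fs.getspaceused.classname; do
    if ! val=$(hdfs getconf -confKey "$key" 2>/dev/null) || [ -z "$val" ]; then
      val='<unset or not available>'
    fi
    printf '%s = %s\n' "$key" "$val"
  done
}

# Example:
#   show_du_settings
```

Raising `fs.du.interval` in `core-site.xml` on the DataNodes (followed by a restart) would make the scans less frequent; it would not make an individual scan any cheaper.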

A second option is to find out what actually causes the D state.
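
To see what a D-state task is waiting on, the kernel exposes its wait channel in `/proc/PID/wchan` and, to root, its kernel stack in `/proc/PID/stack` (both present on RHEL 7 kernels, as far as I know). A minimal sketch that walks `/proc` and dumps both for every task currently in D state; a stack made of ext4 and block-layer functions would suggest ordinary disk I/O rather than a kernel bug. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# For every process currently in D state, print its PID, command name and
# wait channel, followed by its kernel stack when readable (root only).
dstate_stacks() {
  local status pid state comm
  for status in /proc/[0-9]*/status; do
    pid=${status#/proc/}
    pid=${pid%/status}
    # The process may exit between the glob and the read; skip it if so.
    state=$(awk '/^State:/ { print $2; exit }' "$status" 2>/dev/null) || continue
    [ "$state" = "D" ] || continue
    comm=$(cat "/proc/$pid/comm" 2>/dev/null)
    printf '== pid %s (%s) wchan=%s\n' "$pid" "$comm" "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    # Kernel stack is only readable by root; skip silently otherwise.
    cat "/proc/$pid/stack" 2>/dev/null || true
  done
}

# Example (as root, while du is running):
#   dstate_stacks
```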

I'm not sure, but would upgrading the kernel help avoid the D state? Or something else, such as disabling the second CPU thread per core (hyper-threading)?

More details:

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6

The CPU load average is around 42-45 (15-minute average).
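
For scale: with 48 logical CPUs, a 15-minute load average of 42-45 works out to roughly 0.9 per logical CPU. The helpers below (hypothetical names) do that division using `/proc/loadavg`, whose third field is the 15-minute average, and `nproc`. Treating a per-CPU value near 1 as "full" is only a rule of thumb, and on Linux the figure also counts tasks in D state, so it is not a pure CPU measure.

```shell
#!/usr/bin/env bash
# load_per_cpu LOAD CPUS -> LOAD divided by CPUS, two decimals; fails if CPUS <= 0
load_per_cpu() {
  awk -v l="$1" -v n="$2" 'BEGIN { if (n <= 0) exit 1; printf "%.2f\n", l / n }'
}

# 15-minute load average (third field of /proc/loadavg) per logical CPU on this host
current_load_per_cpu() {
  load_per_cpu "$(awk '{ print $3 }' /proc/loadavg)" "$(nproc)"
}

# Example:
#   load_per_cpu 45 48      # 0.94
#   current_load_per_cpu
```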

References:

https://community.cloudera.com/t5/Support-Questions/Does-hadoop-run-dfs-du-automatically-when-a-new-job-starts/td-p/231297

https://community.cloudera.com/t5/Support-Questions/Can-hdfs-dfsadmin-and-hdfs-dsfs-du-be-taxing-on-my-cluster/m-p/182402

https://community.pivotal.io/s/article/Dealing-with-Processes-in-State-D---Uninterruptible-Sleep-Usually-IO?language=en_US

https://www.golinuxhub.com/2018/05/how-to-disable-or-enable-hyper/

yael

1 Answer


The load average is not purely a CPU load measure; it was introduced as a generic metric so that users on shared machines could quickly see how "busy" the machine is. That's why a process causing lots of disk activity is counted the same way as a process that uses a CPU.
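
On Linux the load average is an exponentially damped average of the number of tasks that are either runnable (R) or in uninterruptible sleep (D), counted per thread. A rough instantaneous view of those two groups can be taken from `ps`; the helper below (hypothetical name) simply counts the states it reads on stdin. The `ps` process itself will usually appear as one R task.

```shell
#!/usr/bin/env bash
# Count runnable (R) and uninterruptible (D) entries in a list of ps states on stdin.
# Only the first character of each line is used, so both "D" and "Ds" count as D.
count_rd() {
  awk '{ s = substr($1, 1, 1) }
       s == "R" { r++ }
       s == "D" { d++ }
       END { printf "R=%d D=%d\n", r, d }'
}

# Per-thread snapshot, since the load average counts threads, not processes:
#   ps -eLo state= | count_rd
```

If D dominates the count while the load is high, the load is coming from tasks waiting on I/O rather than from tasks competing for CPU time.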

So, this is not a metric you want to use for tuning.

Processes in D state happen when a filesystem is badly programmed; this was a constant source of annoyance with NFS in the 1990s. From a performance standpoint, there is no difference between a filesystem that has no provision to clean up after a signal and one that does.

The D state exists solely for filesystems that lack proper cleanup mechanisms and must follow the normal request flow even if the program they are acting on behalf of has been interrupted or terminated.

Simon Richter
  • So it is in fact the HDFS user that runs "du -sk". What can we do about that? Or, as I understand from your post, is there no solution for this problem? – yael Nov 28 '21 at 14:56
  • @yael do you expect those `du` commands to ever finish? Is that the issue? Load average is an interesting metric but doesn’t mean anything if you are using it to measure something else. Any CPU thread that sits in the run queue waiting for the scheduler (in this case, for I/O) will increase the load average, so what you are describing is expected. – jsbillings Nov 28 '21 at 16:54
  • @jsbillings, the du does finish after some time: sometimes it takes 30 seconds, sometimes longer, like 1-2 minutes (depending on the disk size, of course). – yael Nov 28 '21 at 18:25