
We are using HDP 2.6.5, and the HDFS block replication factor is 3.

We are trying to understand the minimum disk requirements for DataNodes in a production cluster, given that the block replication factor is 3.

What should be the minimum number of disks per DataNode machine?

yael

1 Answer


A replication factor of 3 just means you need 3 times the amount of storage you want to hold. As for disks, it is wise to have one disk for the OS and software and the remaining disks for data. Since you do not give the size of the data you want to store or the I/O operations per second you need, it is hard to give a more detailed answer. But you can consider having 3 disks (if DAS): one for the OS and applications and two (mirrored) for data.
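To make the "storage * 3" arithmetic concrete, here is a rough sizing sketch in Python. The function names, the `headroom` parameter (free space reserved for temporary data and growth), and the example figures are illustrative assumptions, not HDP defaults or values from the question.

```python
import math

def raw_capacity_tb(logical_tb, replication=3, headroom=0.25):
    """Raw cluster capacity (TB) needed to hold logical_tb of data.

    headroom is the fraction of raw space kept free; it is an
    assumed planning margin, not an HDFS setting.
    """
    return logical_tb * replication / (1 - headroom)

def data_disks_per_node(logical_tb, nodes, disk_tb, replication=3, headroom=0.25):
    """Data disks needed on each DataNode, excluding the OS disk."""
    raw = raw_capacity_tb(logical_tb, replication, headroom)
    return math.ceil(raw / nodes / disk_tb)

# Hypothetical example: 100 TB of data, 10 DataNodes, 4 TB disks.
print(raw_capacity_tb(100))             # 400.0 TB raw
print(data_disks_per_node(100, 10, 4))  # 10 data disks per node
```

With `headroom=0.0` the first function reduces to plain "data size times replication factor", so the disk count per node follows from capacity and I/O needs rather than from the replication factor itself.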

Romeo Ninov
  • Do you see any risks in using only two disks for HDFS on each DataNode (while the replication factor is 3)? – yael Jan 19 '20 at 21:11
  • @yael, replication is between nodes, not disks. So IMHO it is (almost) the same whether you have one disk or 10. And you are free to set up RAID1 or RAID5 if you need better data protection. – Romeo Ninov Jan 19 '20 at 21:33