Is there any way to determine what may have happened in the following scenario? We have a RAID5 array that should originally have been built with 5 drives.

As it stands, this is some of the state that I'm seeing:

/dev/md/dcp-data:
           Version : 1.2
     Creation Time : Fri Aug 16 14:15:40 2019
        Raid Level : raid5
        Array Size : 23441679360 (22355.73 GiB 24004.28 GB)
     Used Dev Size : 7813893120 (7451.91 GiB 8001.43 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Jan 21 13:00:36 2020
             State : active, degraded, recovering
    Active Devices : 3
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 1

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

    Rebuild Status : 16% complete

              Name : localhost:dcp-data
              UUID : 0bd03b0a:59e1665c:d393f6fe:a032dac6
            Events : 165561

    Number   Major   Minor   RaidDevice State
       0       8      113        0      active sync   /dev/sdh1
       1       8      129        1      active sync   /dev/sdi1
       5       8      160        2      spare rebuilding   /dev/sdk
       4       8      177        3      active sync   /dev/sdl1

[root@cinesend ~]# lsblk
NAME                                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdc                                             8:32   0 465.8G  0 disk
└─sdc1                                          8:33   0 465.8G  0 part  /mnt/drive-51a7a5af
sdd                                             8:48   0 465.8G  0 disk
└─sdd1                                          8:49   0 465.8G  0 part  /mnt/drive-299a7133
sde                                             8:64   0 223.6G  0 disk
├─sde1                                          8:65   0   200M  0 part  /boot/efi
├─sde2                                          8:66   0   200M  0 part  /boot
└─sde3                                          8:67   0 223.2G  0 part
  └─luks-cf912397-326e-42eb-a729-bce4de6bff14 253:0    0 223.2G  0 crypt /
sdh                                             8:112  0   7.3T  0 disk
└─sdh1                                          8:113  0   7.3T  0 part
  └─md127                                       9:127  0  21.9T  0 raid5 /mnt/library
sdi                                             8:128  0   7.3T  0 disk
└─sdi1                                          8:129  0   7.3T  0 part
  └─md127                                       9:127  0  21.9T  0 raid5 /mnt/library
sdj                                             8:144  0   7.3T  0 disk
sdk                                             8:160  0   7.3T  0 disk
└─md127                                         9:127  0  21.9T  0 raid5 /mnt/library
sdl                                             8:176  0   7.3T  0 disk
└─sdl1                                          8:177  0   7.3T  0 part
  └─md127                                       9:127  0  21.9T  0 raid5 /mnt/library

There is also one relevant mail message:

Date: Tue, 21 Jan 2020 07:59:30 -0800 (PST)

This is an automatically generated mail message from mdadm
running on cinesend

A DegradedArray event had been detected on md device /dev/md127.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid6] [raid5] [raid4]
md127 : active (auto-read-only) raid5 sdh1[0] sdl1[4] sdi1[1] sdk[5](S)
      23441679360 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      bitmap: 0/59 pages [0KB], 65536KB chunk

What happened here? Originally, /dev/sdj should have been part of the RAID5 array, but lsblk now shows it with no partitions and it is not a member at all. Meanwhile, /dev/sdk (added as a whole disk, unlike the other members, which are partitions) is rebuilding back into the array as a spare...

But RAID5 shouldn't be able to sustain two drive failures, should it? And the metadata now reports only 4 Raid Devices rather than the 5 we started with.

What are some possible scenarios for what happened here?
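
For what it's worth, this is roughly the inspection I was planning to do next to reconstruct the history; a sketch only, assuming the devices still carry readable md superblocks and that logs from around the failure are still retained (the /var/log/messages path and the date filter are guesses for this host):

[root@cinesend ~]# mdadm --examine /dev/sdh1 /dev/sdi1 /dev/sdl1 /dev/sdk
[root@cinesend ~]# mdadm --examine /dev/sdj
[root@cinesend ~]# journalctl --since "2020-01-20" | grep -iE 'md127|raid5'
[root@cinesend ~]# grep -i md127 /var/log/messages

The --examine output for each member should show its event counter, update time and "Device Role", which ought to reveal which slot dropped out and when; running it against /dev/sdj (no partitions at all, per lsblk) should tell whether it still holds any md metadata from the original array.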

  • RAID6 is capable of coping with 2 concurrent disk failures; RAID5 is not. – Vlastimil Burián Jan 21 '20 at 21:18
  • Right, so I'm a bit confused about what may have happened here - we originally built the array with 5 drives as a RAID5. And yet now there's one "spare" and 3 active... Shouldn't it have failed completely? – Rail24 Jan 21 '20 at 21:57
  • I'm unsure; I've never felt desperate enough to build RAID**5**, simply because it's unreliable. If you're building with only 4 disks, do yourself a favor next time and make it a RAID**6** ... [my implementation procedure](https://unix.stackexchange.com/a/320330/126755). – Vlastimil Burián Jan 22 '20 at 05:15
