
On my server, I had an SSD as the boot drive and 11 6 TB HDDs in a RAID6 array as additional storage. After running into some issues with the motherboard, I replaced it with one that has only 4 SATA ports, so I reduced the RAID6 array from 11 drives to 4. With less than 6 TB of actual data stored on the array, everything should fit in the reduced space.

I believe I used the instructions on the following pages to shrink the array. Since it was quite a while ago, I don't actually remember if these were the pages or instructions used, nor do I remember many of the fine details:

On the 7 unused drives, I believe I zeroed the superblocks with `sudo mdadm --zero-superblock`.
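For reference, my understanding of the order the guides intended (shrink the filesystem first, then the array) is sketched below using this array's numbers. I have not re-run any of this, so the commands are commented out and purely illustrative:

```shell
# Intended shrink order (sketch only, not re-run): ext4 must be shrunk
# BEFORE the array. Final capacity = 2 data disks x 5860390400 KiB each.
target_kib=$((2 * 5860390400))
echo "$target_kib"    # 11720780800 KiB = 10.92 TiB
# sudo e2fsck -f /dev/md127
# sudo resize2fs /dev/md127 "${target_kib}K"                # 1. shrink ext4
# sudo mdadm --grow /dev/md127 --array-size="$target_kib"   # 2. shrink array
# sudo mdadm --grow /dev/md127 --raid-devices=4 --backup-file=/root/md127.backup
```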

The array assembled from the 4 drives I want to use will not mount. I do not believe I used any partitions on the array.

sudo mount /dev/md127 /mnt/md127
mount: /mnt/md127: wrong fs type, bad option, bad superblock on /dev/md127, missing codepage or helper program, or other error.

From /var/log/syslog:

kernel: [ 1894.040670] EXT4-fs (md127): bad geometry: block count 13185878400 exceeds size of device (2930195200 blocks)

Since 13185878400 / 2930195200 = 4.5 = 9/2, I assume the filesystem was never shrunk, or something similar went wrong. RAID6 dedicates 2 drives' worth of space to parity, so going from 11 drives (9 data + 2 parity) to 4 drives (2 data + 2 parity) would explain why the filesystem's block count exceeds the size of the device by exactly a factor of 9/2 = 4.5.
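A quick check of that arithmetic (pure calculation on the numbers above; nothing touches the disks):

```shell
# RAID6 spends two disks' worth of space on parity, so data disks = n - 2.
fs_blocks=13185878400    # ext4 size per the superblock (4 KiB blocks)
dev_blocks=2930195200    # actual size of /dev/md127 (4 KiB blocks)
# 9 data disks before vs 2 after: fs_blocks/dev_blocks should equal 9/2,
# i.e. fs_blocks * 2 == dev_blocks * 9.
echo $(( fs_blocks * 2 )) $(( dev_blocks * 9 ))   # both 26371756800
```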

Other information from the devices:

sudo mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Wed Nov 24 22:28:38 2021
        Raid Level : raid6
        Array Size : 11720780800 (10.92 TiB 12.00 TB)
     Used Dev Size : 5860390400 (5.46 TiB 6.00 TB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun Apr  9 04:57:29 2023
             State : clean 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : nao0:0  (local to host nao0)
              UUID : ffff85d2:b7936b45:f19fc1ba:29c7b438
            Events : 199564

    Number   Major   Minor   RaidDevice State
       9       8       16        0      active sync   /dev/sdb
       1       8       48        1      active sync   /dev/sdd
       2       8       32        2      active sync   /dev/sdc
      10       8        0        3      active sync   /dev/sda
sudo mdadm --examine /dev/sd[a-d]
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ffff85d2:b7936b45:f19fc1ba:29c7b438
           Name : nao0:0  (local to host nao0)
  Creation Time : Wed Nov 24 22:28:38 2021
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 11720780976 sectors (5.46 TiB 6.00 TB)
     Array Size : 11720780800 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 11720780800 sectors (5.46 TiB 6.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : 07f76b7f:f4818c5a:3f0d761d:b2d0ba79

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Apr  9 04:57:29 2023
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : 914741c4 - correct
         Events : 199564

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ffff85d2:b7936b45:f19fc1ba:29c7b438
           Name : nao0:0  (local to host nao0)
  Creation Time : Wed Nov 24 22:28:38 2021
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 11720780976 sectors (5.46 TiB 6.00 TB)
     Array Size : 11720780800 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 11720780800 sectors (5.46 TiB 6.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : 3b51a0c9:b9f4f844:68d267ed:03892b0d

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Apr  9 04:57:29 2023
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : 294a8c37 - correct
         Events : 199564

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ffff85d2:b7936b45:f19fc1ba:29c7b438
           Name : nao0:0  (local to host nao0)
  Creation Time : Wed Nov 24 22:28:38 2021
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 11720780976 sectors (5.46 TiB 6.00 TB)
     Array Size : 11720780800 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 11720780800 sectors (5.46 TiB 6.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : 0fcca5ee:605740dc:1726070d:0cef3b39

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Apr  9 04:57:29 2023
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : 31472363 - correct
         Events : 199564

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : ffff85d2:b7936b45:f19fc1ba:29c7b438
           Name : nao0:0  (local to host nao0)
  Creation Time : Wed Nov 24 22:28:38 2021
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 11720780976 sectors (5.46 TiB 6.00 TB)
     Array Size : 11720780800 KiB (10.92 TiB 12.00 TB)
  Used Dev Size : 11720780800 sectors (5.46 TiB 6.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : e1912abb:ba98a568:8effaa66:c1440bd8

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Apr  9 04:57:29 2023
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : 82a459ba - correct
         Events : 199564

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

After looking online, I tried fsck, e2fsck, and resize2fs to resolve the issue. I made no progress this way, and I may have made the problem worse by letting these tools change data on the disks.

With resize2fs,

sudo resize2fs /dev/md127
resize2fs 1.46.5 (30-Dec-2021)
Please run 'e2fsck -f /dev/md127' first.

Since I could not get resize2fs to do anything, I ran e2fsck, which hit error after error. As there were thousands of errors, I quit before the program could finish.

sudo e2fsck -f /dev/md127
e2fsck 1.46.5 (30-Dec-2021)
The filesystem size (according to the superblock) is 13185878400 blocks
The physical size of the device is 2930195200 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>? no
Pass 1: Checking inodes, blocks, and sizes
Error reading block 3401580576 (Invalid argument) while getting next inode from scan.  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 3401580577 (Invalid argument) while getting next inode from scan.  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 3401580578 (Invalid argument) while getting next inode from scan.  Ignore error<y>? yes
Force rewrite<y>? yes
[... the same pair of prompts repeated for thousands of further blocks; I aborted here ...]

My hypothesis is that there is an inconsistency between the size recorded in the filesystem and the actual size of the array. I do not believe I had any partitions on the RAID, nor any LVM volumes.

sudo fdisk -l
...

Disk /dev/sda: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: WDC WD60EZAZ-00S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdb: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: WDC WD60EZAZ-00S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdc: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: WDC WD60EZAZ-00S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdd: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: WDC WD60EZAZ-00S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/md127: 10.92 TiB, 12002079539200 bytes, 23441561600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
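Cross-checking those numbers (pure arithmetic on the mdadm, fdisk, and e2fsck figures above), the array and fdisk agree with each other; only the ext4 superblock still reflects the old 11-drive geometry:

```shell
used_dev_kib=5860390400       # mdadm: Used Dev Size (KiB)
array_kib=11720780800         # mdadm: Array Size (KiB)
md127_bytes=12002079539200    # fdisk: /dev/md127 size in bytes
# 4-drive RAID6 -> 2 data disks; mdadm and fdisk are consistent (prints 1):
echo $(( 2 * used_dev_kib == array_kib ))
echo $(( array_kib * 1024 == md127_bytes ))
# ...but the ext4 superblock still claims the old 9-data-disk capacity (prints 1):
echo $(( 13185878400 * 4096 == 9 * used_dev_kib * 1024 ))
```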

The data on the 4 drives currently in use may or may not have been altered by fsck/e2fsck, but the data should also still be present on the other 7 unused drives with the zeroed superblocks. It does not matter to me which drives I recover the data from, so working solutions to recover from any grouping of the drives would be highly appreciated!

If any additional information is needed, I would be more than happy to provide it.

jameszp
  • Your first link says `1. Reduce the filesystem size`. It seems like this wasn't done at all, and trying to fsck in that state only makes things worse. Since the RAID has already been reshaped, the 7 drives won't help you much. Their data, if untouched, now represents an 11-drive raid6 with 4 drives missing. raid6 can only survive 2 missing drives. So their parity chunks are useless and only data chunks remain, so it's all in bits and pieces. – frostschutz Apr 10 '23 at 00:14
  • So anything that isn't on your 4 disk array won't be fully recoverable / if at all. Thus you have an extreme case of "block count exceeds size of device" where the missing section can mostly be considered lost. Your best bet is that the filesystem didn't store anything important beyond that point. It's possible but there's no guarantee for anything since technically filesystems are allowed to store their data anywhere. Shrinking filesystems usually also involves relocating data. – frostschutz Apr 10 '23 at 00:47

1 Answer


Your ext4 filesystem is (much) larger than your block device (54TB filesystem on a 12TB block device). e2fsck and resize2fs can be quite uncooperative in this situation. Filesystems hate it when huge chunks are missing.
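Those figures follow directly from the e2fsck block counts in the question (4 KiB ext4 blocks):

```shell
# "54 TB filesystem on a 12 TB device", derived from the e2fsck output.
fs_bytes=$(( 13185878400 * 4096 ))    # 54009357926400 bytes, ~54 TB
dev_bytes=$(( 2930195200 * 4096 ))    # 12002079539200 bytes, ~12 TB
echo "$fs_bytes $dev_bytes"
```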

For a quick data recovery, you can try your luck with debugfs in catastrophic mode:

# debugfs -c /dev/md127
debugfs 1.47.0 (5-Feb-2023)
debugfs: ls -l
| (this should list some files)
| (damaged files usually show with 0 bytes and 1-Jan-1970 timestamp)
debugfs: rdump / /some/recovery/dir/

This should copy out the files (use an unrelated HDD as the recovery destination), but some files might produce errors such as `Attempt to read block from filesystem resulted in short read` or similar.


In order to actually fix the filesystem, it's usually best to restore the original device size and go from there. Shrinking a block device is sometimes reversible; in your case, it is not.

You could grow the RAID back to 11 devices, but even with the correct drive order, that would not bring back any of the missing data, and it would even overwrite whatever might be left on the leftover disks: mdadm shifts the data offsets in every grow operation, so the layout would come out all wrong.

So anything beyond the cutoff point is lost.

Furthermore, it would take ages to reshape all this data (again), and the result would be no better than simply tacking on some virtual drive capacity (all zeroes, using loop devices and dm-linear, LVM thin volumes, or similar).


At best you could reverse it partially, by re-creating your original 11-drive RAID6 with 4 drives missing (using mdadm --create on copy-on-write overlays, treating the missing drives as fully zeroed out).

But at most this would give you disconnected chunks of data with many gaps in between, since 4 missing drives is beyond what RAID6 can recover from. It's even more complicated because you no longer have the metadata: you would need the original data offset (which has already been changed on your current RAID) as well as the original drive order.

If you could manage to do it, you could stitch your current RAID (0-12TB) and restored raid (12TB-54TB) together with dm-linear (all on top of copy-on-write overlays) and see what can be found.
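The sector arithmetic behind that stitching would look roughly like this (a sketch only: `/dev/mapper/restored` is a hypothetical name for the re-created 11-drive array, which does not exist yet):

```shell
# Sector math for stitching current + restored RAID with dm-linear.
fs_sectors=$(( 13185878400 * 4096 / 512 ))   # full 54 TB filesystem in sectors
cur_sectors=23441561600                      # current /dev/md127 (from fdisk)
missing=$(( fs_sectors - cur_sectors ))
echo "$missing"                              # 82045465600 sectors past the cutoff
# dmsetup table format: "start length linear <device> <offset>" (illustration):
# dmsetup create stitched <<EOF
# 0 $cur_sectors linear /dev/md127 0
# $cur_sectors $missing linear /dev/mapper/restored $cur_sectors
# EOF
```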

But this process is complicated and the probability of success is low. Of the data stored beyond the 12TB kept by your shrink operation, some files smaller than a chunk/stripe could have survived, while larger files will all be damaged.

frostschutz
  • Thanks for the valuable information. `debugfs` only showed a portion of the directories on the array, so I conclude that recovering the full filesystem is impossible given the current state of the drives. After you posted your comments and answer, I was very fortunately able to find an older backup of the array on a separate device, so I will be able to restore from that. I appreciate the knowledge and help! – jameszp Apr 10 '23 at 23:09