Here is my setup: no partition, just LUKS1 and then XFS inside that.
To confirm, the disk was set up like so: disk is installed, luksFormat, luksOpen, mkfs.xfs, mount, start using.
First of all, what block size should I use for badblocks? Going by this question, it seems to be the physical sector size:
How can I choose optimal values for block size and blocks number for badblocks?
lsblk -o NAME,PHY-SEC /dev/sdb
result: 4096
I ran badblocks on the physical device, i.e.:
badblocks -b 4096 /dev/sdb -sv
and it found three bad blocks.
Now I need to try and untangle the combination of LUKS and XFS to determine which files might be corrupted.
What size is the LUKS1 header?
There is conflicting information here.
cryptsetup luksDump /dev/sdb
The payload offset is shown as 4096, but in units of what block size?
Maybe I can get the size of the XFS file system in blocks:
xfs_info /dev/mapper/my_data:
blocks=976754134
(under the "data" row)
badblocks tells me the total number of blocks:
badblocks -b 4096 /dev/sdb -sv
Checking blocks 0 to 976754645
so 976754645 + 1 = 976754646 blocks on the disk total, if that's correct.
976754646 (badblocks) - 976754134 (xfs_info) = 512 blocks, which would seem to be the space at the beginning of the disk used for the LUKS1 header.
However from here:
https://www.smartmontools.org/wiki/BadBlockHowto
It says the LUKS header is 16 MB? (Also, wouldn't it be MiB?) I know the LUKS2 header is 16 MiB, but I'm unsure about the LUKS1 header. From my test above it seems to be 2 MiB, but then how could something like cryptsetup convert work if there isn't 16 MiB of free space at the beginning of the disk? The existence of cryptsetup convert, taken together with the smartmontools article, seems to imply that the payload offset is counted in 4096-byte blocks.
But I can't figure out how that would fit on the disk, so I'm proceeding on the assumption that 4096 (payload offset) * 512-byte block size = 2 MiB LUKS1 header offset.
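The two estimates above can be cross-checked with a little shell arithmetic. This is only a sketch: it assumes the payload offset reported by luksDump is counted in 512-byte sectors (which is exactly the point in doubt), and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here touches the disk.

# Header size in bytes, ASSUMING the payload offset is in 512-byte sectors.
header_bytes() {
    payload_offset=$1
    echo $(( payload_offset * 512 ))
}

# The same size expressed in blocks of the size passed to badblocks -b.
header_blocks() {
    payload_offset=$1
    block_size=$2
    echo $(( payload_offset * 512 / block_size ))
}

# Gap between the disk's block count and the XFS data block count.
block_gap() {
    last_disk_block=$1   # upper bound from "Checking blocks 0 to N"
    xfs_blocks=$2        # blocks= value from the xfs_info "data" row
    echo $(( last_disk_block + 1 - xfs_blocks ))
}

header_bytes 4096                # 2097152 bytes, i.e. 2 MiB
header_blocks 4096 4096          # 512
block_gap 976754645 976754134    # 512
```

Both routes coming out at 512 blocks is at least consistent with a 2 MiB offset, though it does not by itself prove the unit.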
2 MiB offset / 4096 block size = the first 512 blocks on the disk are for the LUKS1 header. Thus, it seems like I need to subtract 512 from the output of badblocks to translate disk blocks into filesystem blocks.
For example, if badblocks tells me block #512 is bad, that would be block #0 in the XFS filesystem, etc.
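The subtraction described above could be scripted roughly like this. It is a sketch under the same assumption of a 512-block (2 MiB) header with 4096-byte blocks; the function name is hypothetical. The result is a linear block offset into the decrypted device, and whether xfs_db interprets its block argument the same way is not something this sketch checks.

```shell
#!/bin/sh
# Hypothetical helper: map a badblocks block number on the raw disk to a
# block number inside the LUKS payload, ASSUMING a 512-block header.
disk_to_payload_block() {
    disk_block=$1
    hdr=${2:-512}    # header length in blocks; 512 is the assumed default
    if [ "$disk_block" -lt "$hdr" ]; then
        # A bad block here would be in the header, not in any file.
        echo "block $disk_block lies inside the LUKS header" >&2
        return 1
    fi
    echo $(( disk_block - hdr ))
}

disk_to_payload_block 512    # 0
```

A bad block that falls below the header length is reported on stderr rather than translated, since it cannot belong to the filesystem.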
I now have the XFS block numbers for each bad block.
Now for each bad block, I do the following:
xfs_db -c 'blockget -b <xfs block number>' /dev/mapper/my_data
And it will tell me the inode number. But one odd thing here: I noticed the output looks like this:
xfs_db -c 'blockget -b 21512351' /dev/mapper/my_data
setting block 3/3847384 to data
setting inode to 7589729572 for block 3/3847384
inode 7589729572 block 21512351 at offset 34534
What exactly does that block 3/3847384 mean?
I log these inode numbers and remount the filesystem.
Next, on the filesystem I do the following:
find /mnt/my_data -inum <inode>
and it gives me the files affected by these bad blocks.
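With several inode numbers to look up, the find step can be wrapped in a small loop. This is just a convenience sketch; the function name is made up, and it relies only on the -xdev and -inum options of find.

```shell
#!/bin/sh
# Hypothetical helper: print every path under a mount point that has one
# of the given inode numbers.
files_for_inodes() {
    root=$1
    shift
    for ino in "$@"; do
        # -xdev keeps find on this one filesystem, where the inode
        # number is meaningful.
        find "$root" -xdev -inum "$ino" -print
    done
}

# Example, using the mount point and an inode number from above:
# files_for_inodes /mnt/my_data 7589729572
```

Each inode costs one full walk of the tree, which is slow on a large filesystem but keeps the sketch simple.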
Next, to test I do the following:
dd if=/path/to/file of=/dev/null iflag=direct status=progress
However, this doesn't give me any errors for the files I found. The last rsync I did off the bad drive showed an entirely different file having I/O errors, which I can reproduce with dd as shown above.
So... where are my errors? Is the LUKS header the wrong size? Is there some sort of additional offset I need to subtract?