
I'm doing data recovery right now from a disk I've extracted from an old NAS.

It looks like mkfs.ext3 froze on `Writing superblocks and filesystem accounting information:`, since I've been waiting more than an hour for `done` to appear.

The disk is a 2TB SATA drive connected over USB 3.0. Is it normal for it to take this long? Is it safe to terminate the program now?

Samuele Pilleri

1 Answer


Update: lsusb and dmesg confirmed that the drive has dropped off the USB bus, so the mkfs has hung. kill -9 on it may stop it and allow the mdraid array to be stopped, or a reboot may be required. If you have to reboot, beware that the system may not shut down cleanly, so it's best to sync and unmount (or remount read-only) any other writable filesystems first, as you may have to hit reset.

Depending on the filesystem and options, mkfs can take a long time, and ext3 is one of the slow ones. It is safe to terminate, but of course you'll have to run mkfs again. If it was actually making progress, that means waiting again, since it will start over from the beginning.

ext4 is much faster to mkfs, especially with lazy_itable_init (which is the default). If possible, switch.

Remember that with an ext2/3/4 filesystem, a fixed fraction of the disk is reserved for inode tables. Without lazy_itable_init, they are all written during mkfs. That's a lot of data to write (approximately 1.6% of the disk with default settings), and it is spread out over the entire disk, no less.

That also gives another way to reduce the time: write fewer inodes. But of course if you go too low, you'll run out.
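To put rough numbers on this, here is a small sketch of the arithmetic. It assumes 256-byte inodes and the default ratio of one inode per 16384 bytes (the usual mke2fs.conf defaults; check your own), and `inode_table_bytes` is a hypothetical helper name. The real mke2fs rounds inode counts per block group, so treat the results as estimates.

```shell
# Estimate the size of the inode tables for a given disk size and
# bytes-per-inode ratio. Assumes 256-byte inodes (an assumption; see
# inode_size in your mke2fs.conf).
inode_table_bytes() {
    disk_bytes=$1
    bytes_per_inode=$2
    inode_size=256
    echo $(( disk_bytes / bytes_per_inode * inode_size ))
}

disk=2000000000000   # nominal 2 TB drive

# Default ratio: 256 / 16384 = 1.5625% of the disk.
echo "default ratio: $(( $(inode_table_bytes "$disk" 16384) / 1000000 )) MB"

# One inode per 1 MiB, which is what "mkfs.ext3 -i 1048576" would request.
echo "-i 1048576:    $(( $(inode_table_bytes "$disk" 1048576) / 1000000 )) MB"
```

On a 2 TB disk the default works out to about 31 GB of inode tables, while one inode per MiB needs under 0.5 GB. The cost is a cap of roughly 1.9 million files, which may or may not be enough for your data.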

If you want to check if it's actually making progress, confirm if I/O is happening. Some disks have an indicator light, or you can often tell (with magnetic disks) by holding your ear close and listening.

Alternatively, if you have iostat available, iostat -kx 10 will first print I/O statistics accumulated since boot, and then print a new report every 10 seconds covering the previous 10 seconds. Look at the number of writes being done and at the disk utilization.
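If iostat isn't installed, the same counters can be read directly from /proc/diskstats. This is a minimal sketch, assuming Linux, where column 3 of that file is the device name and column 10 is the number of sectors written. The function names `sectors_written` and `watch_writes` are made up for illustration.

```shell
# Print the sectors-written counter for a device.
# $1 = device name (e.g. sdb), $2 = stats file (defaults to /proc/diskstats).
sectors_written() {
    awk -v dev="$1" '$3 == dev { print $10 }' "${2:-/proc/diskstats}"
}

# Sample the counter twice and report the difference.
# $1 = device name, $2 = interval in seconds (defaults to 10).
watch_writes() {
    before=$(sectors_written "$1")
    if [ -z "$before" ]; then
        echo "$1 is not listed in /proc/diskstats" >&2
        return 1
    fi
    sleep "${2:-10}"
    after=$(sectors_written "$1")
    if [ -z "$after" ]; then
        echo "$1 disappeared from /proc/diskstats" >&2
        return 1
    fi
    echo "$1: $(( after - before )) sectors written in ${2:-10}s"
}

# Example: watch_writes sdb 10
```

A difference of zero over several intervals suggests mkfs is not writing anything. A device that is not listed at all suggests the kernel no longer sees the drive.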

derobert
  • thanks for your answer. Unfortunately, I cannot switch to ext4 atm. I've checked with iostat and it seems it's doing no IO at all on /dev/md0 (actual drive, /dev/sdb, is not even listed, I'm referring to md0 because two partitions are in RAID). – Samuele Pilleri May 11 '17 at 16:09
  • @SamuelePilleri mdX devices aren't "real" enough to report I/O stats, well at least not utilization. Should give reads and writes, though. – derobert May 11 '17 at 16:11
  • block device (/dev/sdb) isn't even listed in iostat; and the disk seems idle. Here's a snippet of the script I'm using, might tell something more: `mdadm --create /dev/md0 --verbose --metadata=0.9 --raid-devices=2 --level=raid1 --run /dev/sdb1 missing` `mdadm --wait /dev/md0` `sync` `sleep 2` `mkfs.ext3 -c -b 4096 /dev/md0` `sync` Also, a trace with sysdig says nothing is happening on /dev/sdb, might be a good time to kill or do I just need to be more patient? Thank you for your help. – Samuele Pilleri May 11 '17 at 16:16
  • @SamuelePilleri I think you missed my comment on the question—check `cat /proc/mdstat` and `dmesg`. Probably best to edit the mdstat into the question. – derobert May 11 '17 at 16:18
  • Yep, I missed that. `/proc/mdstat` says md0 is raid1 on sdb1 and that there are no unused devices. But, great intuition, `dmesg` reports several times that `task mkfs.ext3 blocked for more than 120 seconds`. Kernel bug? Seems odd. I'm on Ubuntu 17.04 with Linux 4.10.0-20-generic. – Samuele Pilleri May 11 '17 at 16:29
  • @SamuelePilleri Blocked for more than 120 seconds often means a failing drive (or other hardware), but yeah, it could also be a kernel bug. Probably also check lsusb: if the drive isn't there, it's fallen off the USB bus (and thus your mkfs is dead, and worse, possibly unkillable). – derobert May 11 '17 at 16:33
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/58591/discussion-between-samuele-pilleri-and-derobert). – Samuele Pilleri May 11 '17 at 16:37