
I inherited an Azure VM (Ubuntu 20.04) which has a 7-disk VG fully occupied by a RAID5 LV formatted as ext4.

I need to take backups and was hoping to use Azure Backup to snapshot the Azure Disks comprising the VG.

Azure Disk snapshots are not point-in-time consistent so I need to freeze the storage whilst the backup runs, both for filesystem integrity and LVM metadata reasons. My workload will tolerate this; I am trying to figure out the best method of making the raw disk blocks temporarily immutable.

fsfreeze - I tested freezing the filesystem, taking snapshots, unfreezing, then switching to the snapshots.

In my limited testing this works OK and I don't see anything alarming from LVM when the 'restored' disks are swapped back in, but I can only run so many tests; if there is a 1% edge case where my disk metadata ends up inconsistent, I may never find it.

I'm apprehensive that I'm locking activity at such a high layer: no filesystem ops will occur whilst the FIFREEZE ioctl is active, but does this stop LVM from doing any kind of lower-level operation, e.g. metadata updates or RAID-related activity?

I then tried dmsetup suspend /dev/mapper/my-lvol and this feels like a better solution.

Test setup:

fsfreeze

  1. echo 3 > /proc/sys/vm/drop_caches
  2. sync ; sync (old habits die hard :)
  3. fsfreeze -f /export
  4. dd if=/dev/mapper/my-lvol of=/dev/null status=progress

The dd runs to completion. I accept this is valid because I'm not accessing via the frozen filesystem, but it makes me wonder whether LVM could still be doing things at a low level whilst I'm assuming my Azure Disks are unchanging.

dmsetup suspend

  1. echo 3 > /proc/sys/vm/drop_caches
  2. sync ; sync
  3. dmsetup suspend /dev/mapper/my-lvol
  4. dd if=/dev/mapper/my-lvol of=/dev/null status=progress

The dd blocks as long as the suspend is in place. I can still dd the rmeta and rimage devices, but I sort of expected that.

With the dmsetup option I get a hung task warning for jbd2 in syslog. The stack trace shows it is trying to commit a journal transaction (jbd2_journal_commit_transaction()), which both reassures me that the LV really is locked and concerns me that I'm snapshotting the filesystem in an inconsistent state: it might need to replay the journal should we ever roll back to the snapshots. Our RPO will permit some rollback, but ideally I'd like to design a solution which removes this risk.

Options I've discarded

  1. File-based backups: possible, but the setup & management seemed more complicated than freezing for snapshots - at least to begin with!
  2. Temporarily snapshotting the LV and backing up from that. The VG is full and I'd really prefer not to add more disk/resize VG/etc.

Questions

I would really appreciate any input here. As you can tell, I'm at the edge of (and possibly beyond) my understanding of Linux filesystems/block IO.

  1. Overall, does freezing/suspending seem like a workable solution to get point-in-time consistent snapshots?
  2. Am I still not looking deeply enough - just because jbd2 is unable to write a transaction, could LVM or DM still be doing metadata updates at a lower level?

Thanks, tim

  • It's a very interesting problem. You say you don't want to add physical volumes to your system – so you want to store the snapshot outside, right? How / where do you intend to back up *to*? – Marcus Müller Mar 21 '23 at 16:35
  • Hi Marcus. I intend to use the Azure API to take Snapshots of the AzDisks backing each PV. These will reside in a resource group. That part was (sadly) the easy bit to solve :) – Tim Matthews Mar 21 '23 at 17:38
  • Strictly speaking, block level immutability can not be guaranteed by either fsfreeze or dmsetup suspend since LVM tools might work beneath the DM layer (LVM performs its own changes to the device mapper setup and updates metadata). It should still work fine as long as no RAID rebuild/reshape/pvmove/… is going on. There may be possible side effects to suspending. I usually leave the RAID to the host so the VM is a simple single disk affair where snapshots just work. Your discarded options sound very nice, perhaps reconsider them? – frostschutz Mar 22 '23 at 07:28
  • Hi frostschutz. Thank you for your reply - I had a feeling that my freeze/suspend might be at too-high a level in the device stack to be sure of freezing the raw disk, so I appreciate the confirmation. I think I'm going to have to go back to my discarded options - I need a reliable method even if it's more work, but I will continue to poke at this in my spare time so I can understand the situation. I did some additional testing which I'll add in a further comment in case it's useful to anybody who has this problem in the future. – Tim Matthews Mar 22 '23 at 14:14
  • I ran two jobs to simulate disk activity: `dd` from random offsets over my test files, `touch` random test files. As these ran I `dmsetup suspend /dev/mapper/my-lvol` and wrote some `sh` to loop `shasum` over the rmeta* & rimage* devices in /dev/mapper. Obviously there was no userspace LV activity but I wanted to identify anything LVM/DM/kernelspace that might touch the disks. Over a 24h window my hashes were consistent, suggesting that if LVM/DM do operate on suspended disks it's very infrequent. I will continue my test over a longer period and report back any useful data. – Tim Matthews Mar 22 '23 at 14:25
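    A minimal version of that hash loop, for anyone who wants to repeat the test - the `*rmeta*`/`*rimage*` glob assumes LVM's default naming for the RAID sub-LVs under /dev/mapper:

    ```shell
    #!/bin/sh
    # Periodically hash the raw RAID sub-devices and log the results, to
    # spot any block-level change while the LV is suspended. The glob
    # assumes LVM's default rmeta/rimage naming under /dev/mapper.
    LOG=/var/tmp/lv-hashes.log

    while true; do
        for dev in /dev/mapper/*rmeta* /dev/mapper/*rimage*; do
            printf '%s %s %s\n' "$(date -Is)" "$dev" \
                "$(sha1sum "$dev" | cut -d' ' -f1)"
        done >> "$LOG"
        sleep 300
    done
    ```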
  • @TimMatthews yeah it should be fine if you can rule out edge cases. so nobody uses pvmove, lvcreate, it's not degraded/rebuilding/reshaping etc. no problem. with edge cases it's not enough to fsfreeze. take lvcreate for example, it creates a new DM device out of nothing, so what you froze or suspended before doesn't matter. it will update metadata on all your drives. if you took a snapshot of one disk after the other at that precise moment the disks would be inconsistent to each other. originally VM snapshots were meant to work whenever w/o knowing what the VM is currently doing internally. – frostschutz Mar 22 '23 at 15:36
