100% CPU utilisation and hang after virsh migrate

Question

I've been experimenting with ZFS + DRBD + live migration (I want to understand it well enough to write my own automation scripts before I start playing with ganeti again, and then openstack cinder). I have the ZFS + DRBD (in dual-primary mode) working well for shared storage.

However, live migration is only partially working.

I have two hosts with identical libvirt and DRBD configurations, identical dedicated "volumes" pools for VM ZVOLs (both are 2x1TB mirror pools, re-using some disks from my old backup pool), and identical configurations for the VM (named "dtest").

"indra" is an AMD FX-8150 with 16GB RAM on an ASUS Sabertooth 990FX m/b
- cpu flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
"surya" is an AMD Phenom II X4 940 with 8GB RAM on an ASUS M3A79-T DELUXE m/b
- cpu flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid eagerfpu pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save

Both are running debian sid, with exactly the same versions of packages (incl. libvirt* 2.0.0-1:amd64 and qemu-system-x86 1:2.6+dfsg-3), and with the same liquorix kernel:

Linux indra 4.6-2.dmz.2-liquorix-amd64 #1 ZEN SMP PREEMPT Debian 4.6-3 (2016-06-19) x86_64 GNU/Linux
Linux surya 4.6-2.dmz.2-liquorix-amd64 #1 ZEN SMP PREEMPT Debian 4.6-3 (2016-06-19) x86_64 GNU/Linux

The VM itself is running debian sid, with a stock debian 4.6.0-1 kernel:

Linux dtest 4.6.0-1-amd64 #1 SMP Debian 4.6.3-1 (2016-07-04) x86_64 GNU/Linux

I can start the VM on either host, and it works perfectly.

I can migrate a VM from surya to indra with no problems whatsoever. When I try to migrate the VM from indra to surya, the migration appears to complete successfully, but the VM hangs with 100% CPU usage (for the single core allocated to it).

It makes no difference whether the VM was started on indra and then migrated to surya (where it hangs) or if it was started on surya, migrated to indra (OK so far) and then migrated back to surya (hangs).

The only thing I can do with the VM when it hangs is virsh destroy (force-shutdown) or virsh reset (force-reboot).

I've tried disabling kvm_steal_time with:

 <qemu:commandline>
   <qemu:arg value='-cpu'/>
   <qemu:arg value='qemu64,-kvm_steal_time'/>
 </qemu:commandline>

but that doesn't solve the problem.

Nothing gets logged on or from the VM itself. The only indication I get of any problem is the following message in /var/log/libvirt/qemu/dtest.log on surya.

2016-07-18T12:56:55.766929Z qemu-system-x86_64: warning: TSC frequency mismatch between VM and host, and TSC scaling unavailable

This would be due to the tsc_scale cpu feature - present on the 8150 CPU (indra), missing on the x4 940 (surya).
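A quick way to confirm which flags one host has and the other lacks is to diff the two flag lists. This is only a minimal sketch: the function names (`parse_flags`, `read_local_flags`, `missing_on_destination`) are made up for illustration, and it assumes the flags come from the `flags` line of `/proc/cpuinfo` on each host.

```python
from typing import List, Set


def parse_flags(flags_line: str) -> Set[str]:
    """Turn a space-separated CPU flags string into a set of flag names."""
    return set(flags_line.split())


def read_local_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Return the flags of the first CPU listed in /proc/cpuinfo (Linux only)."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                # The line looks like "flags\t\t: fpu vme de ..."
                return parse_flags(line.split(":", 1)[1])
    return set()


def missing_on_destination(source: Set[str], destination: Set[str]) -> List[str]:
    """Flags present on the source host but absent on the destination host."""
    return sorted(source - destination)


if __name__ == "__main__":
    # Shortened, illustrative flag lists; paste the real ones from each host.
    indra = parse_flags("fpu svm npt lbrv tsc_scale")
    surya = parse_flags("fpu svm npt lbrv")
    print(missing_on_destination(indra, surya))
```

Run against the full flag lists above, the difference includes `tsc_scale` along with a number of other flags that only the FX-8150 has, so the diff narrows down suspects rather than proving which flag is responsible.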

Does anyone know what the problem is, or how to fix it? Any suggestions for debugging?

Is it even fixable, or is it a CPU bug in the several-generations-old Phenom II x4 940?

score 5 · Accepted Answer · answered Jul 19 '16 at 06:16

I found a solution.

As I suspected, the cause of the problem was the lack of tsc_scale in the feature flags of surya's CPU.

It turns out that you can migrate a VM from a host without tsc_scale to a host with it, but a VM running on a host with tsc_scale can ONLY be migrated to another host with it.

Time to submit a bug report.

I created another ZFS ZVOL-based DRBD, this time between surya and another machine on my network, my main server ganesh.

ganesh is an AMD Phenom II 1090T with 32GB RAM on an ASUS Sabertooth 990FX m/b
- CPU Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr cpb hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter

I can migrate a VM back and forth between surya and ganesh with no problems, and I can migrate a VM from surya or ganesh to indra. But I can't migrate a VM from indra to either surya or ganesh.
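The behaviour I saw can be written down as a simple rule, which might be handy as a pre-flight check in a migration script. This is a sketch of my observations only, not anything libvirt or QEMU enforces: the `HAS_TSC_SCALE` table and the function name are made up for illustration, and the table is filled in from the flag lists posted here.

```python
# Whether each host's CPU advertises tsc_scale, taken from the flag lists above.
HAS_TSC_SCALE = {
    "indra": True,    # FX-8150
    "surya": False,   # Phenom II X4 940
    "ganesh": False,  # Phenom II 1090T
}


def migration_expected_ok(src: str, dst: str, has_tsc_scale=HAS_TSC_SCALE) -> bool:
    """Empirical rule: only 'with tsc_scale' -> 'without tsc_scale' hangs."""
    return not (has_tsc_scale[src] and not has_tsc_scale[dst])


if __name__ == "__main__":
    hosts = sorted(HAS_TSC_SCALE)
    for src in hosts:
        for dst in hosts:
            if src != dst:
                verdict = "ok" if migration_expected_ok(src, dst) else "expect hang"
                print(f"{src} -> {dst}: {verdict}")
```

A script could refuse to call virsh migrate when this returns False, rather than leaving a hung VM on the destination.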

I can live with this for now. ganesh is due to be upgraded when the new AMD Zen CPUs are released, and surya will get ganesh's current motherboard and RAM. I'll buy a new FX-6300 or FX-8320 for it at the same time, so all machines will have tsc_scale.

I have another machine (kali) on the network with an FX-8320 CPU (which also has the tsc_scale feature). I was already planning to add this to the ZVOL+DRBD+live-migration experiments as soon as I upgrade the main zpool on ganesh (from 4x1TB RAIDZ to 2x4TB mirrored) and free up some more old disks, so I'll be able to migrate VMs back-and-forth between indra and kali, or between surya and ganesh.

The next phase in my VM experimentation plan is to write scripts to completely automate the process of setting up a VM to use DRBD on ZVOL and migrate VMs between host machines.

When I've got that working well, I'll scrap it and start working with ganeti, which already does what I'm planning to write (but more completely and better).

And finally, when I've tired of that, I'll switch to openstack and use cinder for the volume management. I'm tempted to skip ganeti and go straight to openstack, but ganeti is such cool technology that I want to play with it for a while. I haven't used it for years.

cas, thank you for posting this! Here in 2022 I am using ganeti and my VM quietly hangs when I migrate it to a certain class of hardware host and bingo: the destination hosts lack `tsc_scaling`. Wow! — dannyman, Nov 15 '22 at 23:00
And this may have been addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1839095. — dannyman, Nov 15 '22 at 23:38
@dannyman that seems to be a different problem, not being able to migrate a VM between two machines **without** `tsc_scale`. I had no problem doing that (or with migrating from a non-tsc_scale machine to a tsc_scale machine), my problem was not being able to migrate a VM from a machine **with** tsc_scale to one **without** it. — cas, Nov 15 '22 at 23:55
My issue is migrating from `tsc_scale` to NOT `tsc_scale` ... it worked in the old days, but I think the vintage of qemu I am running on the current cluster has a bug that has hopefully since been fixed. (I can just stick to `tsc_scale` hardware for this application.) — dannyman, Nov 16 '22 at 18:32
