On Fri, Sep 18, 2015 at 05:22:19PM -0400, Perry Myers wrote:
>> an Undercloud on one VM, single overcloud controller on another VM,
>> single compute node on another VM (using nested virt, or just plain
>> emulation)
>
> I try to stay away from nested KVM. Every now and then I or someone
> will come along and try it, report it works for a bit, but ends up
> in various kernel crashes/panics if the environment stays up for too
> long.
If these crashes are reproducible (even if they occur just once or
twice), please take the time to file bugs. Upstream KVM maintainers
said (at one of the previous KVM Forums) that not many nested KVM bugs
get reported. Without more testing and diligent bug reporting, nested
KVM won't magically become stable. And, IME, upstream has usually been
responsive to bug reports. Test matrix explosion (with so many
combinations of baremetal, guest hypervisor, and nested guest) is a
challenge, so any effort towards consistent testing and bug reporting
is a helpful contribution to making nested virt stable.
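
(When filing such bugs, a rough sketch of details worth capturing --
what's needed varies per bug, but versions from each layer plus any
Kernel tracebacks are a good start; package names below assume
Fedora/EL:

    $ uname -r      # on baremetal, the guest hypervisor, and the nested guest
    $ rpm -q qemu-kvm libvirt-daemon    # relevant userspace versions
    $ dmesg | grep -iE 'kvm|bug|warn'   # tracebacks around the crash/panic
)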
>> For some (again need to look at specific personas), _emulated_
>> Instances might be good enough. (i.e. no nested KVM and instead using
>> qemu emulation on the virtual Compute Node)
> True.
>
> It's not fast, but it is enough to show end to end usage of the
> system, in a fairly minimal hardware footprint.
>
> As for the stability of nested KVM... Kashyap, any thoughts on this?
Some comments:
- Speaking from my testing on Fedora (often with Kernel & Virt
  components from Rawhide) and Intel hardware, nested KVM is now much
  more stable (I haven't seen a crash or panic in the last year) than
  it was two years ago.
- Using EL 7.x (or better, current Fedora stable Kernels), nested KVM
  should work relatively pain-free for Linux-on-Linux use cases. (But
  if you do see any issues, please report them.)

  EL7.1 (and above) also has nested Extended Page Tables (EPT)
  support, which makes nVMX (Intel-based nested virt) more
  performant[1][2]. Ensure this Kernel module parameter is enabled:

    $ cat /sys/module/kvm_intel/parameters/ept
    Y

  (A quick way to check that nested support itself is turned on is
  sketched just after this list.)
- AMD-based nested virtualization support is slightly better, upstream
  reports (refer to slide 29 here[3]). But there are consistent
  bug-fixing and automated testing efforts upstream to make nVMX
  support better too, and upstream frequently discusses new features.
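
As referred to in the list above, a quick sanity check that nested
support is enabled at all (shown for Intel; on AMD the module is
kvm_amd, and the parameter prints 1/0 on some Kernels):

    $ cat /sys/module/kvm_intel/parameters/nested
    Y

    # If it reports 'N', enable it persistently and reload the module:
    $ echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm-nested.conf
    $ sudo rmmod kvm_intel && sudo modprobe kvm_intel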
[1] A talk about it by Gleb Natapov, KVM maintainer, in 2013 --
    http://www.linux-kvm.org/images/8/8c/Kvm-forum-2013-nested-ept.pdf
[2] Slide 18 for a nested EPT test --
    http://events.linuxfoundation.org/sites/events/files/slides/nested-virt-k...
[3] Talk by Jan Kiszka, Bandan Das, et al. at the 2014 KVM Forum, with
    details on performance evaluation --
    http://www.linux-kvm.org/images/3/33/02x03-NestedVirtualization.pdf
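
Also, on the point above about _emulated_ instances: a minimal sketch,
assuming the virtual Compute Node uses Nova's libvirt driver -- plain
QEMU emulation (i.e. no nested KVM) can be selected in nova.conf:

    [libvirt]
    virt_type = qemu

It'll be slow, as noted, but it avoids the nested KVM stability
question entirely.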
> We did try with it enabled in an OpenStack cloud itself during a
> hackfest event, and had planned on giving each participant a 32 GB
> vm (spawned by nova), that had KVM support. They would then use that
> to do an rdo-manager HA deployment in virt. It hummed along quite
> nicely initially, but started experiencing hard lockups before long.
[. . .]
> You can do a) with 12GB max (I do on my laptop) since not all of the
> memory is in use, and you can give the compute node much less than
> 4GB even. KSM also helps.
> Ah, good point about KSM. Last time I ran it (admittedly more than 1
> yr ago) all it did was suck massive amounts of CPU cycles from me,

Yeah, the above was the case when it was relatively new (about
2009-2010).

> but maybe it's gotten a bit more efficient since then? :)
Yep, FWIW, I enable it (Kernel Samepage Merging) in my homogeneous
(all Linux) test environment, and don't see any noticeable spikes in
CPU consumption (sorry, no real benchmark details). I think if you
have many identical guests, it's worth enabling, as it increases
memory density.
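
(If you want to try it, a minimal sketch for Fedora/EL, where the ksm
and ksmtuned services ship with the KVM packaging:

    $ sudo systemctl start ksm ksmtuned
    $ cat /sys/kernel/mm/ksm/run            # 1 => KSM is actively merging
    $ cat /sys/kernel/mm/ksm/pages_sharing  # rough measure of pages deduplicated
)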
[. . .]
--
/kashyap