On Tue, Jul 01, 2014 at 04:57:02PM +1200, Steven Ellis wrote:
So I'm having issues nesting RDO on my T440s laptop (Intel(R) Core(TM)
i7-4600U CPU @ 2.10GHz), and I'm hoping someone on the list can help
My Physical Host (L0) is Fedora 19 running 3.14.4-100.fc19.x86_64 with
nesting turned on
If you can, I'd strongly suggest using the latest F20 Kernels (for L0 &
L1), as nested KVM issues are frequently fixed upstream, and those fixes
are available in Fedora Rawhide.
The thing with nested virtualization is the explosion of the test matrix
(different Kernels + distributions on L0, L1, L2) :-(
I'm running F20 (L0) -> F20 (L1) -> F20 (L2), with current Fedora
Rawhide Kernels (and "-cpu host" on for L1 & L2) and I don't see this
issue.
My OpenStack Host is RHEL 6.5 or RHEL 7 (L1)
My Guest is Cirros (L2)
[. . .]
The issue appears to be running with "-cpu host" with this nesting
combination.
Now if I run the qemu command directly on RHEL7 (L1) I get this error
KVM: entry failed, hardware error 0x7
Under RHEL 6.5 (L1) it is similar but not identical
kvm: unhandled exit 7
In both cases on my Fedora physical host (L0) I see
nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported
IIRC, that's because your CPU just doesn't support VMCS shadowing
(unless you're using Intel Haswell or above). I think the below command
returns 'N' on your CPU:
$ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
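If it helps, here is a small sketch for checking both the 'nested' and 'enable_shadow_vmcs' parameters on L0 in one go. The function name kvm_intel_param and its optional directory argument are my own invention for illustration; the directory defaults to the real sysfs location and prints "unknown" when the parameter cannot be read (e.g. the kvm_intel module is not loaded):

```shell
# Print a kvm_intel module parameter, or "unknown" if it cannot be read.
# The second argument (parameter directory) is hypothetical convenience:
# it defaults to the real sysfs path and exists only so the function can
# be pointed at a scratch directory.
kvm_intel_param() {
    param_name=$1
    param_dir=${2:-/sys/module/kvm_intel/parameters}
    if [ -r "$param_dir/$param_name" ]; then
        cat "$param_dir/$param_name"
    else
        echo unknown
    fi
}

echo "nested:             $(kvm_intel_param nested)"
echo "enable_shadow_vmcs: $(kvm_intel_param enable_shadow_vmcs)"
```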
There does appear to be a Red Hat bugzilla for RHEL7 relating to this
but not for RHEL6 -
https://bugzilla.redhat.com/show_bug.cgi?id=1038427
I recall that bug. Marcelo's suggestion to not use host-passthrough
(-cpu host) for L2 is reasonable for now, I guess. From my testing I
haven't seen any significant performance benefits from host-passthrough
at both levels. Instead, I try to expose just the 'vmx' extension (more
on it below).
I can reproduce this issue using both RHEL 6.5 and RHEL 7 as my
OpenStack Host (L1). Has anyone else hit this issue?
Next I tried a workaround of editing the /etc/nova/nova.conf file and
forcing the CPU type for my guests under OpenStack
#cpu_mode=none
cpu_mode=custom
# Set to a named libvirt CPU model (see names listed in
# /usr/share/libvirt/cpu_map.xml). Only has effect if
# cpu_mode="custom" and virt_type="kvm|qemu" (string value)
# Deprecated group;name - DEFAULT;libvirt_cpu_model
#cpu_model=<None>
cpu_model=Conroe
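One thing that might be worth ruling out: the "Deprecated group;name - DEFAULT;libvirt_cpu_model" comment suggests these options are expected in a config group other than [DEFAULT] (I believe [libvirt] in recent Nova, but treat that as an assumption). A rough sketch to show which group the active cpu_mode / cpu_model lines actually sit in; show_cpu_settings is a made-up helper name:

```shell
# Print "[group]: setting" for every uncommented cpu_mode / cpu_model line
# in the given config file. show_cpu_settings is a hypothetical helper.
show_cpu_settings() {
    awk '
        /^[ \t]*\[/ { section = $0; next }    # remember the current [group]
        /^[ \t]*cpu_(mode|model)[ \t]*=/ { print section ": " $0 }
    ' "$1"
}

if [ -r /etc/nova/nova.conf ]; then
    show_cpu_settings /etc/nova/nova.conf
fi
```

If the lines show up under a group that Nova does not read these options from, that alone could explain why the setting appears to be ignored.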
To see if it's working (only for testing), you can enforce the CPU model
in your CirrOS guest XML and see whether the guest starts with `virsh
start instance-foo`
Problem is qemu is still run with "-cpu host,+kvmclock"
So am I hitting a secondary bug with nova-compute or is there another
way to force OpenStack to select a particular CPU subset for Nova?
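To double-check what the guest was really started with, something along these lines pulls the "-cpu" argument out of the running qemu command line. This is only a sketch: cpu_arg is a hypothetical helper, and "instance-foo" is a placeholder for whatever your guest is called:

```shell
# Print the argument that follows "-cpu" in a command line read from stdin.
# cpu_arg is a hypothetical helper name.
cpu_arg() {
    tr ' ' '\n' | awk 'prev == "-cpu" { print; exit } { prev = $0 }'
}

# Assumes the guest's qemu process has "instance-foo" on its command line;
# the [i] in the pattern keeps grep from matching itself.
ps -eo args | grep '[i]nstance-foo' | cpu_arg
```

After restarting nova-compute and booting a fresh instance, the output should reflect the configured model rather than "host,+kvmclock" if the nova.conf change was picked up.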
Can you try to edit your L1 guest XML, and ensure you just expose the
'vmx' extension which is necessary for exposing KVM (/dev/kvm character
device) inside your L1:
    <cpu match='exact'>
      <model>SandyBridge</model>
      <feature policy='require' name='vmx'/>
    </cpu>
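Once L1 is booted with that CPU definition, a quick check from inside L1 that 'vmx' and /dev/kvm actually made it through could look like the sketch below. has_vmx is a made-up helper; its optional file argument defaults to /proc/cpuinfo:

```shell
# Succeed if the given cpuinfo file lists the "vmx" flag as a whole word.
# has_vmx is a hypothetical helper; the file argument defaults to
# /proc/cpuinfo and is a parameter only so it can be tried on a sample file.
has_vmx() {
    grep -qw vmx "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_vmx; then echo "vmx exposed"; else echo "vmx missing"; fi
if [ -c /dev/kvm ]; then echo "/dev/kvm present"; else echo "/dev/kvm missing"; fi
```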
Alternatively, you can also try exposing the CPU element values from the
below command on your L0 & L1 and see if you can reproduce the errors:
$ virsh capabilities | virsh cpu-baseline /dev/stdin
--
/kashyap