I gave it a few more hours. In the Heat resource list nothing has failed;
I see only CREATE_COMPLETE and some INIT_COMPLETE.
nova list
| 61aaed37-4993-4165-93a7-3c9bf6b10a21 | overcloud-controller-0  | ACTIVE | -        | Running | ctlplane=192.0.2.8 |
| 7f9f4f52-3ee6-42d9-9275-ff88582dd6e7 | overcloud-novacompute-0 | BUILD  | spawning | NOSTATE | ctlplane=192.0.2.9 |
nova show 7f9f4f52-3ee6-42d9-9275-ff88582dd6e7
+--------------------------------------+--------------------------------------+
| Property                             | Value                                |
+--------------------------------------+--------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                               |
| OS-EXT-AZ:availability_zone          | nova                                 |
| OS-EXT-SRV-ATTR:host                 | instack.localdomain                  |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | 4626bf90-7f95-4bd7-8bee-5f5b0a0981c6 |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000002                    |
| OS-EXT-STS:power_state               | 0                                    |
| OS-EXT-STS:task_state                | spawning                             |
| OS-EXT-STS:vm_state                  | building                             |
Checking the nova log, this is what I see:
nova-compute.log:{"nodes": [{"target_power_state": null,
"links": [{"href": "http://192.0.2.1:6385/v1/nodes/4626bf90-7f95-4bd7-8bee-5f5b0a0981c6", "rel": "self"},
{"href": "http://192.0.2.1:6385/nodes/4626bf90-7f95-4bd7-8bee-5f5b0a0981c6", "rel": "bookmark"}],
"extra": {},
"last_error": "*Failed to change power state to 'power on'. Error: Failed to execute command via SSH*: LC_ALL=C /usr/bin/virsh --connect qemu:///system start baremetalbrbm_1.",
"updated_at": "2015-10-12T14:36:08+00:00",
"maintenance_reason": null,
"provision_state": "deploying",
"clean_step": {},
"uuid": "4626bf90-7f95-4bd7-8bee-5f5b0a0981c6",
"console_enabled": false,
"target_provision_state": "active",
"provision_updated_at": "2015-10-12T14:35:18+00:00",
"power_state": "power off",
"inspection_started_at": null,
"inspection_finished_at": null,
"maintenance": false,
"driver": "pxe_ssh",
"reservation": null,
"properties": {"memory_mb": "4096", "cpu_arch": "x86_64", "local_gb": "40", "cpus": "1", "capabilities": "boot_option:local"},
"instance_uuid": "7f9f4f52-3ee6-42d9-9275-ff88582dd6e7",
"name": null,
"driver_info": {"ssh_username": "root", "deploy_kernel": "94cc528d-d91f-4ca7-876e-2d8cbec66f1b", "deploy_ramdisk": "057d3b42-002a-4c24-bb3f-2032b8086108", "ssh_key_contents": "-----BEGIN( I removed key..)END RSA PRIVATE KEY-----", "ssh_virt_type": "virsh", "ssh_address": "192.168.122.1"},
"created_at": "2015-10-12T14:26:30+00:00",
"ports": [{"href": "http://192.0.2.1:6385/v1/nodes/4626bf90-7f95-4bd7-8bee-5f5b0a0981c6/ports", "rel": "self"},
{"href": "http://192.0.2.1:6385/nodes/4626bf90-7f95-4bd7-8bee-5f5b0a0981c6/ports", "rel": "bookmark"}],
"driver_internal_info": {"clean_steps": null, "root_uuid_or_disk_id": "9ff90423-9d18-4dd1-ae96-a4466b52d9d9", "is_whole_disk_image": false},
"instance_info": {"ramdisk": "82639516-289d-4603-bf0e-8131fa75ec46", "kernel": "665ffcb0-2afe-4e04-8910-45b92826e328", "root_gb": "40", "display_name": "overcloud-novacompute-0", "image_source": "d99f460e-c6d9-4803-99e4-51347413f348", "capabilities": "{\"boot_option\": \"local\"}", "memory_mb": "4096", "vcpus": "1", "deploy_key": "BI0FRWDTD4VGHII9JK2BYDDFR8WB1WUG", "local_gb": "40",
"configdrive":
"H4sICGDEG1YC/3RtcHpwcWlpZQDt3WuT29iZ2HH02Bl7Fe/G5UxSqS3vLtyesaSl2CR4p1zyhk2Ct+ateScdVxcIgiR4A5sAr95xxa/iVOUz7EfJx8m7rXyE5IDslro1mpbGox15Zv6/lrpJ4AAHN/LBwXMIShIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADhJpvx+5UQq5EqNtvzldGs+MIfewJeNv53f/7n354F6xT/3v/TjH0v/chz0L5+8Gv2f3V+n0s+Pz34u/dj982PJfvSTvxFVfXQ7vfyBlRfGvOZo+kQuWWtNVgJn/jO/d6kHzvrGWlHOjGn0TDfmjmXL30kZtZSrlXPFREaVxQM5Hon4fdl0TU7nCmqtU6urRTlZVRP1clV+knwqK/F4UFbPOuVGKZNKFNTbgVFvwO+PyPmzipqo1solX/6slszmCuKozBzKuKPdMlE5ma
Any ideas on how to resolve a compute node stuck in spawning? Its state
hasn't changed for a few hours now.
Tzach
On Mon, Oct 12, 2015 at 11:25 PM, Dan Sneddon <dsneddon(a)redhat.com> wrote:
On 10/12/2015 08:10 AM, Tzach Shefi wrote:
> Hi,
>
> Server running CentOS 7.1; the VM running the undercloud got up to the
> overcloud deploy stage.
> It looks like it's stuck, with nothing advancing for a while.
> Any ideas on what to check?
>
> [stack@instack ~]$ openstack overcloud deploy --templates
> Deploying templates in the directory
> /usr/share/openstack-tripleo-heat-templates
> [91665.696658] device vnet2 entered promiscuous mode
> [91665.781346] device vnet3 entered promiscuous mode
> [91675.260324] kvm [71183]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> [91675.291232] kvm [71200]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> [91767.799404] kvm: zapping shadow pages for mmio generation wraparound
> [91767.880480] kvm: zapping shadow pages for mmio generation wraparound
> [91768.957761] device vnet2 left promiscuous mode
> [91769.799446] device vnet3 left promiscuous mode
> [91771.223273] device vnet3 entered promiscuous mode
> [91771.232996] device vnet2 entered promiscuous mode
> [91773.733967] kvm [72245]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> [91801.270510] device vnet2 left promiscuous mode
>
>
> Thanks
> Tzach
>
>
> _______________________________________________
> Rdo-list mailing list
> Rdo-list(a)redhat.com
> https://www.redhat.com/mailman/listinfo/rdo-list
>
> To unsubscribe: rdo-list-unsubscribe(a)redhat.com
>
You're going to need a more complete command line than "openstack
overcloud deploy --templates". For instance, if you are using VMs for
your overcloud nodes, you will need to include "--libvirt-type qemu".
There are probably a couple of other parameters that you will need.
You can watch the deployment using this command, which will show you
the progress:
watch "heat resource-list -n 5 | grep -v COMPLETE"
You can also explore which resources have failed:
heat resource-list [-n 5] | grep FAILED
And then look more closely at the failed resources:
heat resource-show overcloud <resource>
There are some more complete troubleshooting instructions here:
http://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubles...
--
Dan Sneddon | Principal OpenStack Engineer
dsneddon(a)redhat.com | redhat.com/openstack
650.254.4025 | dsneddon:irc @dxs:twitter
--
*Tzach Shefi*
Quality Engineer, Redhat OSP
+972-54-4701080
The deployment looks like it is stuck to me. The problem, though,
appears to be an inability to set the power state on one of the VM
nodes through libvirt.
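To see at a glance which node is reporting the error, the relevant fields can be pulled out of the node JSON that shows up in nova-compute.log. This is only a minimal Python sketch: it assumes the JSON object has already been cut out of the log line into a string, the sample is a trimmed copy of the values quoted in this thread, and summarize_nodes is a hypothetical helper name.

```python
import json


def summarize_nodes(payload):
    """Return (uuid, provision_state, power_state, last_error) per node."""
    nodes = json.loads(payload).get("nodes", [])
    return [
        (n.get("uuid"), n.get("provision_state"),
         n.get("power_state"), n.get("last_error"))
        for n in nodes
    ]


# Trimmed sample built from the fields shown in the log excerpt (assumption:
# the real payload carries many more keys, which this helper simply ignores).
sample = json.dumps({"nodes": [{
    "uuid": "4626bf90-7f95-4bd7-8bee-5f5b0a0981c6",
    "provision_state": "deploying",
    "power_state": "power off",
    "last_error": "Failed to change power state to 'power on'. "
                  "Error: Failed to execute command via SSH",
}]})

for uuid, provision, power, error in summarize_nodes(sample):
    print(uuid, provision, power, error, sep=" | ")
```

A node that stays in "deploying" with power state "power off" and a non-empty last_error is the one worth looking at first.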
For virtual nodes, the SSH driver connects from the Undercloud VM to the
VM host system over SSH and issues libvirt commands to start and stop VMs.
That process failed when setting the power state of one of your nodes, and
it doesn't look like the deployment is recovering from that error.
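To reproduce that step by hand, you could run a harmless virsh command over SSH from the Undercloud VM and see whether it gets through. The sketch below only builds the command line. The user, address, and virsh prefix are copied from the node's driver_info and last_error in the log. virsh_over_ssh is a hypothetical helper, and it assumes the key the driver uses is also available to your ssh client.

```python
import shlex


def virsh_over_ssh(user, host, virsh_args):
    """Build an ssh argv that runs virsh on the virt host."""
    remote = "LC_ALL=C /usr/bin/virsh --connect qemu:///system " + " ".join(
        shlex.quote(arg) for arg in virsh_args
    )
    return ["ssh", "{}@{}".format(user, host), remote]


# "list --all" only lists the defined VMs; it does not change any power state.
argv = virsh_over_ssh("root", "192.168.122.1", ["list", "--all"])
print(" ".join(shlex.quote(part) for part in argv))
```

If the printed command fails when pasted into a shell on the Undercloud VM, the problem is on the SSH or libvirt side rather than in the deployment itself.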
I'm not quite sure why that is happening, but I can think of a few
possible reasons:
* SSH daemon not running on the virt host
* The virt host was not able to respond to the request (perhaps it was
overloaded)
* Firewall blocking SSH connections from the Instack VM to the virt host?
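A quick way to narrow down the first and last of these is to check from the Instack VM whether the SSH port on the virt host accepts connections at all. A minimal Python sketch, assuming the default SSH port 22 (tcp_port_open is a hypothetical helper name):

```python
import socket


def tcp_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example, run from the Instack VM (address taken from ssh_address in the log):
#   print(tcp_port_open("192.168.122.1"))
```

A True result only shows that something is listening and that no firewall drops the connection. It says nothing about whether the key is accepted or whether the host is too loaded to run virsh.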
One tip for the next deployment: You can set the timeout. That way, if
it does get hung up you don't have to wait 4 hours for it to fail.
Conservatively, you could set --timeout 90 to set the timeout to 90
minutes. A 2-node deployment will definitely either deploy or fail in
that amount of time (probably much less, but I wouldn't want you to cut
off a deployment that might be successful if given a little more time).
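Put together with the earlier suggestion for VM-based nodes, the deploy command might look like the sketch below. It uses only the options mentioned in this thread, and your environment may well need others. deploy_command is a hypothetical helper name.

```python
def deploy_command(timeout_minutes=90, libvirt_type="qemu"):
    """Build the overcloud deploy argv with a timeout in minutes."""
    return [
        "openstack", "overcloud", "deploy", "--templates",
        "--libvirt-type", libvirt_type,
        "--timeout", str(timeout_minutes),
    ]


print(" ".join(deploy_command()))
# To actually start the deployment from the undercloud, something like:
#   import subprocess; subprocess.check_call(deploy_command())
```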
--
Dan Sneddon | Principal OpenStack Engineer
dsneddon(a)redhat.com |