Hi,
I have seen the same issues when deploying on HP blades. I had chosen to
deploy on a subset of blades to save time whilst testing, and the error
was caused by a rogue blade.
Previous attempts on a different set of blades in the same chassis had
left one or more blades powered on, which presented duplicate IP
addresses in the blade cluster and interfered with my new deployment.
Basically, check that all of your nodes are in the correct state, i.e.
look in the iLO, cross-reference with Ironic and check the power state.
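For example, something along these lines on the undercloud (compare the
Power State column against what each blade's iLO reports):

  ironic node-list
  ironic node-show <node-uuid> | grep power_state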
HTH
Charles
On 14/10/2015 12:40, Udi Kalifon wrote:
My overcloud deployment also hangs for 4 hours and then fails. This is
what I got on the 1st run:
[stack@instack ~]$ openstack overcloud deploy --templates
Deploying templates in the directory
/usr/share/openstack-tripleo-heat-templates
ERROR: Authentication failed. Please try again with option
--include-password or export HEAT_INCLUDE_PASSWORD=1
Authentication required
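Presumably the workaround for that particular error is just what the
message says, i.e. something like this before re-running (I haven't
verified it changes the outcome):

  export HEAT_INCLUDE_PASSWORD=1
  openstack overcloud deploy --templates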
I am assuming the authentication error is due to the expiration of the
token after 4 hours, and not because I forgot the rc file. I tried to
run the deployment again and it failed after another 4 hours with a
different error:
[stack@instack ~]$ openstack overcloud deploy --templates
Deploying templates in the directory
/usr/share/openstack-tripleo-heat-templates
Stack failed with status: resources.Controller: resources[0]:
ResourceInError: resources.Controller: Went to status ERROR due to
"Message: Exceeded maximum number of retries. Exceeded max scheduling
attempts 3 for instance 9eedda9e-f381-47d4-a883-0fe40db0eb5e. Last
exception: [u'Traceback (most recent call last):\n', u' File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1,
Code: 500"
Heat Stack update failed.
The failed resources are:
heat resource-list -n 5 overcloud |egrep -v COMPLETE
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+--------------------------------------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time        | stack_name                                       |
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+--------------------------------------------------+
| Compute       | aee2604f-2580-44c9-bc38-45046970fd63 | OS::Heat::ResourceGroup | UPDATE_FAILED   | 2015-10-14T06:32:34 | overcloud                                        |
| 0             | 2199c1c6-60ca-42a4-927c-8bf0fb8763b7 | OS::TripleO::Compute    | UPDATE_FAILED   | 2015-10-14T06:32:36 | overcloud-Compute-dq426vplp2nu                   |
| Controller    | 2ae19a5f-f88c-4d8b-98ec-952657b70cd6 | OS::Heat::ResourceGroup | UPDATE_FAILED   | 2015-10-14T06:32:36 | overcloud                                        |
| 0             | 2fc3ed0c-da5c-45e4-a255-4b4a8ef58dd7 | OS::TripleO::Controller | UPDATE_FAILED   | 2015-10-14T06:32:38 | overcloud-Controller-ktbqsolaqm4u                |
| NovaCompute   | 7938bbe0-ab97-499f-8859-15f903e7c09b | OS::Nova::Server        | CREATE_FAILED   | 2015-10-14T06:32:55 | overcloud-Compute-dq426vplp2nu-0-4acm6pstctor    |
| Controller    | c1cd6b72-ec0d-4c13-b21c-10d0f6c45788 | OS::Nova::Server        | CREATE_FAILED   | 2015-10-14T06:32:58 | overcloud-Controller-ktbqsolaqm4u-0-d76rtersrtyt |
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+--------------------------------------------------+
I was unable to run resource-show or deployment-show on the failed
resources; it kept complaining that those resources were not found.
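Maybe I need to address the nested stacks directly, using the stack_name
column from the table above, e.g. (guessing at the syntax):

  heat resource-show overcloud-Compute-dq426vplp2nu-0-4acm6pstctor NovaCompute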
Thanks,
Udi.
On Wed, Oct 14, 2015 at 11:16 AM, Tzach Shefi <tshefi@redhat.com> wrote:
Hi Sasha/Dan,
Yep, that's the bug I opened yesterday about this.
sshd and firewall rules look OK, having tested the following:
I can ssh into the virt host from my laptop as the root user, covering
the 10.X.X.X net.
I can also ssh from the instack VM to the virt host, covering the
192.168.122.X net.
Unless I should check ssh with another user; if so, which one?
I doubt an ssh user/firewall issue caused the problem, as the controller
was installed successfully and it uses the same ssh virt power-on
procedure.
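For reference, the checks I ran were roughly the following (the virt
host address on the 10.X.X.X net is redacted; 192.168.122.1 is the
ssh_address ironic uses in my setup):

  ssh root@<virt-host>       # from my laptop, over the 10.X.X.X net
  ssh root@192.168.122.1     # from the instack VM, over the 192.168.122.X net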
The deployment is still up and stuck; if anyone wants to take a look,
contact me in private for access details.
I will review/use the virt console, virt journal and timeout tips on the
next deployment.
Thanks
Tzach
On Wed, Oct 14, 2015 at 5:07 AM, Sasha Chuzhoy <sasha@redhat.com> wrote:
I hit the same (or a similar) issue on my BM environment, though I
managed to complete the 1+1 deployment on VMs successfully.
I see it's reported already:
https://bugzilla.redhat.com/show_bug.cgi?id=1271289
Ran a deployment with: openstack overcloud deploy --templates --timeout 90 --compute-scale 3 --control-scale 1
The deployment fails, and I see that "all minus one" overcloud
nodes are still in BUILD status.
[stack@undercloud ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| b15f499e-79ed-46b2-b990-878dbe6310b1 | overcloud-controller-0  | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.23 |
| 4877d14a-e34e-406b-8005-dad3d79f5bab | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
| 0fd1a7ed-367e-448e-8602-8564bf087e92 | overcloud-novacompute-1 | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.21 |
| 51630a7d-c140-47b9-a071-1f2fdb45f4b4 | overcloud-novacompute-2 | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.22 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
Will try to investigate further tomorrow.
Best regards,
Sasha Chuzhoy.
----- Original Message -----
> From: "Tzach Shefi" <tshefi@redhat.com>
> To: "Dan Sneddon" <dsneddon@redhat.com>
> Cc: rdo-list@redhat.com
> Sent: Tuesday, October 13, 2015 6:01:48 AM
> Subject: Re: [Rdo-list] Overcloud deploy stuck for a long time
>
> So I gave it a few more hours; in the heat resource list nothing has
> failed, only create_complete and some init_complete.
>
> Nova list:
> | 61aaed37-4993-4165-93a7-3c9bf6b10a21 | overcloud-controller-0  | ACTIVE | -        | Running | ctlplane=192.0.2.8 |
> | 7f9f4f52-3ee6-42d9-9275-ff88582dd6e7 | overcloud-novacompute-0 | BUILD  | spawning | NOSTATE | ctlplane=192.0.2.9 |
>
>
> nova show 7f9f4f52-3ee6-42d9-9275-ff88582dd6e7
> +--------------------------------------+----------------------------------------------------------+
> | Property | Value |
> +--------------------------------------+----------------------------------------------------------+
> | OS-DCF:diskConfig | MANUAL |
> | OS-EXT-AZ:availability_zone | nova |
> | OS-EXT-SRV-ATTR:host | instack.localdomain |
> | OS-EXT-SRV-ATTR:hypervisor_hostname | 4626bf90-7f95-4bd7-8bee-5f5b0a0981c6 |
> | OS-EXT-SRV-ATTR:instance_name | instance-00000002 |
> | OS-EXT-STS:power_state | 0 |
> | OS-EXT-STS:task_state | spawning |
> | OS-EXT-STS:vm_state | building |
>
> Checking the nova log, this is what I see:
>
> nova-compute.log:{"nodes": [{"target_power_state": null,
> "links": [{"href": "http://192.0.2.1:6385/v1/nodes/4626bf90-7f95-4bd7-8bee-5f5b0a0981c6", "rel": "self"},
> {"href": "http://192.0.2.1:6385/nodes/4626bf90-7f95-4bd7-8bee-5f5b0a0981c6", "rel": "bookmark"}],
> "extra": {},
> "last_error": " Failed to change power state to 'power on'. Error: Failed to execute command via SSH : LC_ALL=C /usr/bin/virsh --connect qemu:///system start baremetalbrbm_1.",
> "updated_at": "2015-10-12T14:36:08+00:00", "maintenance_reason": null,
> "provision_state": "deploying", "clean_step": {},
> "uuid": "4626bf90-7f95-4bd7-8bee-5f5b0a0981c6", "console_enabled": false,
> "target_provision_state": "active",
> "provision_updated_at": "2015-10-12T14:35:18+00:00",
> "power_state": "power off",
> "inspection_started_at": null, "inspection_finished_at": null,
> "maintenance": false, "driver": "pxe_ssh", "reservation": null,
> "properties": {"memory_mb": "4096", "cpu_arch": "x86_64", "local_gb": "40", "cpus": "1", "capabilities": "boot_option:local"},
> "instance_uuid": "7f9f4f52-3ee6-42d9-9275-ff88582dd6e7", "name": null,
> "driver_info": {"ssh_username": "root",
> "deploy_kernel": "94cc528d-d91f-4ca7-876e-2d8cbec66f1b",
> "deploy_ramdisk": "057d3b42-002a-4c24-bb3f-2032b8086108",
> "ssh_key_contents": "-----BEGIN( I removed key..)END RSA PRIVATE KEY-----",
> "ssh_virt_type": "virsh", "ssh_address": "192.168.122.1"},
> "created_at": "2015-10-12T14:26:30+00:00",
> "ports": [{"href": "http://192.0.2.1:6385/v1/nodes/4626bf90-7f95-4bd7-8bee-5f5b0a0981c6/ports", "rel": "self"},
> {"href": "http://192.0.2.1:6385/nodes/4626bf90-7f95-4bd7-8bee-5f5b0a0981c6/ports", "rel": "bookmark"}],
> "driver_internal_info": {"clean_steps": null, "root_uuid_or_disk_id": "9ff90423-9d18-4dd1-ae96-a4466b52d9d9", "is_whole_disk_image": false},
> "instance_info": {"ramdisk": "82639516-289d-4603-bf0e-8131fa75ec46",
> "kernel": "665ffcb0-2afe-4e04-8910-45b92826e328", "root_gb": "40",
> "display_name": "overcloud-novacompute-0",
> "image_source": "d99f460e-c6d9-4803-99e4-51347413f348",
> "capabilities": "{\"boot_option\": \"local\"}", "memory_mb": "4096",
> "vcpus": "1", "deploy_key": "BI0FRWDTD4VGHII9JK2BYDDFR8WB1WUG",
> "local_gb": "40", "configdrive":
"H4sICGDEG1YC/3RtcHpwcWlpZQDt3WuT29iZ2HH02Bl7Fe/G5UxSqS3vLtyesaSl2CR4p1zyhk2Ct+ateScdVxcIgiR4A5sAr95xxa/iVOUz7EfJx8m7rXyE5IDslro1mpbGox15Zv6/lrpJ4AAHN/LBwXMIShIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADhJpvx+5UQq5EqNtvzldGs+MIfewJeNv53f/7n354F6xT/3v/TjH0v/chz0L5+8Gv2f3V+n0s+Pz34u/dj982PJfvSTvxFVfXQ7vfyBlRfGvOZo+kQuWWtNVgJn/jO/d6kHzvrGWlHOjGn0TDfmjmXL30kZtZSrlXPFREaVxQM5Hon4fdl0TU7nCmqtU6urRTlZVRP1clV+knwqK/F4UFbPOuVGKZNKFNTbgVFvwO+PyPmzipqo1solX/6slszmCuKozBzKuKPdMlE5ma
>
>
> Any ideas on how to resolve a stuck spawning compute node? It hasn't
> changed for a few hours now.
>
> Tzach
>
>
> On Mon, Oct 12, 2015 at 11:25 PM, Dan Sneddon <dsneddon@redhat.com> wrote:
>
>
>
> On 10/12/2015 08:10 AM, Tzach Shefi wrote:
> > Hi,
> >
> > Server running CentOS 7.1; the VM running the undercloud got up to
> > the overcloud deploy stage.
> > It looks like it's stuck; nothing has advanced for a while.
> > Ideas on what to check?
> >
> > [stack@instack ~]$ openstack overcloud deploy --templates
> > Deploying templates in the directory
> > /usr/share/openstack-tripleo-heat-templates
> > [91665.696658] device vnet2 entered promiscuous mode
> > [91665.781346] device vnet3 entered promiscuous mode
> > [91675.260324] kvm [71183]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> > [91675.291232] kvm [71200]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> > [91767.799404] kvm: zapping shadow pages for mmio generation wraparound
> > [91767.880480] kvm: zapping shadow pages for mmio generation wraparound
> > [91768.957761] device vnet2 left promiscuous mode
> > [91769.799446] device vnet3 left promiscuous mode
> > [91771.223273] device vnet3 entered promiscuous mode
> > [91771.232996] device vnet2 entered promiscuous mode
> > [91773.733967] kvm [72245]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> > [91801.270510] device vnet2 left promiscuous mode
> >
> >
> > Thanks
> > Tzach
> >
> >
>
> You're going to need a more complete command line than "openstack
> overcloud deploy --templates". For instance, if you are using VMs for
> your overcloud nodes, you will need to include "--libvirt-type qemu".
> There are probably a couple of other parameters that you will need.
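>
> For example, something along these lines for a virtual setup (the exact
> options will depend on your environment):
>
> openstack overcloud deploy --templates --libvirt-type qemu --timeout 90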
>
> You can watch the deployment using this command, which will show you
> the progress:
>
> watch "heat resource-list -n 5 | grep -v COMPLETE"
>
> You can also explore which resources have failed:
>
> heat resource-list [-n 5]| grep FAILED
>
> And then look more closely at the failed resources:
>
> heat resource-show overcloud <resource>
>
> There are some more complete troubleshooting instructions here:
>
> http://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubles...
>
> --
> Dan Sneddon | Principal OpenStack Engineer
> dsneddon@redhat.com | redhat.com/openstack
> 650.254.4025 | dsneddon:irc @dxs:twitter
>
>
>
>
> --
> Tzach Shefi
> Quality Engineer, Redhat OSP
> +972-54-4701080
>
--
Tzach Shefi
Quality Engineer, Redhat OSP
+972-54-4701080
_______________________________________________
Rdo-list mailing list
Rdo-list@redhat.com
https://www.redhat.com/mailman/listinfo/rdo-list
To unsubscribe: rdo-list-unsubscribe@redhat.com
--
Charles Short
Cloud Engineer
Virtualization and Cloud Team
European Bioinformatics Institute (EMBL-EBI)
Tel: +44 (0)1223 494205