Re: [rdo-users] RHOSP 10 failed overcloud deployment

Friday, 2 February 2018

Hi Anda,

all the issues seem to be related. If you're using tunneled networks, you
need to configure tenant networks on both the controller and the computes.

Also, if you're using static IPs, you should have the internal networks
defined and bind them in ServiceNetMap.
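
One way to sanity-check the first point is to look at the resource_registry of
the environment files and confirm that the tenant port of each role is mapped
to a real template and not left on noop.yaml. The following is only a sketch:
it assumes the registry has already been loaded into a plain dict, and the
helper name and the example template paths are illustrative, not taken from
the deployment discussed here.

```python
NOOP_SUFFIX = "noop.yaml"


def roles_missing_tenant_port(resource_registry, roles=("Controller", "Compute")):
    """Return the roles whose TenantPort is unset or still mapped to noop.yaml."""
    missing = []
    for role in roles:
        key = "OS::TripleO::{}::Ports::TenantPort".format(role)
        template = resource_registry.get(key)
        if template is None or template.endswith(NOOP_SUFFIX):
            missing.append(role)
    return missing


# Hypothetical registry contents, for illustration only.
example_registry = {
    "OS::TripleO::Controller::Ports::TenantPort": "../network/ports/tenant.yaml",
    "OS::TripleO::Compute::Ports::TenantPort": "../network/ports/noop.yaml",
}

print(roles_missing_tenant_port(example_registry))
```

With the example registry, only the Compute role is reported, because its
tenant port still points at noop.yaml.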

On the compute nodes, if you don't use an external network, make sure you
have the default route and the 169.254.169.254/32 route on the ctlplane
network, something like this:

network_config:
            -
              type: interface
              name: nic1
              use_dhcp: false
              dns_servers: {get_param: DnsServers}
              addresses:
                -
                  ip_netmask:
                    list_join:
                      - '/'
                      - - {get_param: ControlPlaneIp}
                        - {get_param: ControlPlaneSubnetCidr}
              routes:
                -
                  ip_netmask: 169.254.169.254/32
                  next_hop: {get_param: EC2MetadataIp}
                -
                  default: true
                  next_hop: {get_param: ControlPlaneDefaultRoute}
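
To confirm that both routes actually landed on a deployed node, the output of
"ip route show" can be scanned for a default route and a host route to the
metadata address. This is a minimal sketch, not part of any TripleO tooling:
the function name is made up, and the sample output uses documentation
addresses (192.0.2.0/24) and an interface name that are assumptions.

```python
METADATA_IP = "169.254.169.254"


def check_routes(ip_route_output):
    """Report whether a default route and a metadata host route are present."""
    found = {"default": False, "metadata": False}
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        destination = fields[0]
        if destination == "default":
            found["default"] = True
        elif destination in (METADATA_IP, METADATA_IP + "/32"):
            # "ip route" usually prints /32 host routes without the prefix length.
            found["metadata"] = True
    return found


# Hypothetical "ip route show" output, for illustration only.
sample = """\
default via 192.0.2.1 dev eth0
169.254.169.254 via 192.0.2.1 dev eth0
192.0.2.0/24 dev eth0 proto kernel scope link src 192.0.2.10
"""

print(check_routes(sample))
```

On a real node the input could come from
subprocess.check_output(["ip", "route", "show"]).decode() instead of the
sample string.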

Hope it helps.

On Fri, Feb 2, 2018 at 9:04 AM, Anda Nicolae <anicolae(a)lenovo.com> wrote:

...
 Hi all,

 Thanks for the info about the 2 networks (external and ctlplane) that I
 need on the overcloud VMs (controller and compute).

 Now br-ex on my overcloud VMs has the external IP address and I am able to
 ping overcloud VMs on both external and ctlplane IP addresses.

 Also, since for the external network I use static IPs, in my
 ips-from-pool-all.yaml, I have:

 OS::TripleO::Compute::Ports::ExternalPort: ../network/ports/external_from_pool_compute.yaml

 external_from_pool_compute.yaml is similar to the external_from_pool.yaml
 file. I've noticed that if I use noop.yaml, the external IP is not assigned
 to the eth0 interface on the compute node.

 I hope it is correct to use it like this.

 I have continued with my overcloud deployment and I've noticed that some
 progress has been made:

 - Controller resource is now in CREATE_COMPLETE state

 - although deployment still fails, I can connect to the overcloud VMs via
 both ctlplane IP and external IP and check the logs, after the failure of
 the deploy operation

 The Compute resource fails with the "CREATE aborted" reason. I've looked in
 /var/log/messages on the overcloud compute VM and I've noticed the following
 error messages that keep repeating:

 Feb  2 03:09:36 localhost os-collect-config: Source [ec2] Unavailable.

 Feb  2 03:09:36 localhost os-collect-config: /var/lib/os-collect-config/local-data
 not found. Skipping

 Feb  2 03:09:36 localhost os-collect-config: No local metadata found
 (['/var/lib/os-collect-config/local-data'])

 Feb  2 03:10:16 localhost os-collect-config:
 HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded
 with url: /latest/meta-data/ (Caused by
 ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection
 object at 0x2752190>, 'Connection to 169.254.169.254 timed out. (connect
 timeout=10.0)'))
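
The timeout in the last message can be reproduced by hand from the compute
node with a plain TCP connection attempt to the metadata address. The snippet
below is a rough sketch using only the Python standard library; the helper
name is illustrative, and the 10-second default simply mirrors the timeout
shown in the log.

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Calling can_connect("169.254.169.254", 80) on the compute node should return
False for as long as the route to the metadata service is missing or wrong.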

 From heat-engine.log, I have:

 2018-02-01 19:26:32.253 3348 DEBUG neutronclient.v2_0.client
 [req-c27f050c-b743-4e1d-a706-e01e63a43b49 fdfcf2f659a94e57829dbefc618f3d3b
 453c1e37b83f4f8e8a49dab299e8224d - - -] Error message: {"NeutronError":
 {"message": "Port 0292b718-2c28-4b0c-a517-c481c547b711 could not be
 found.", "type": "PortNotFound", "detail": ""}} _handle_fault_response
 /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:266

 I have 2 questions regarding the deployment:

 1. Do any of the error messages above cause the failed deployment of the
 Compute resource?

 2. In my network-environment.yaml, I haven't set InternalApiNetCidr,
 TenantNetCidr, InternalApiNetworkVlanID, TenantNetworkVlanID.

 Do I need to set these in order to make the overcloud deployment work?

 Thanks,

 Anda

 *From:* Anda Nicolae
 *Sent:* Wednesday, January 31, 2018 12:40 PM
 *To:* 'Pedro Sousa'
 *Cc:* rasca(a)redhat.com; users(a)lists.rdoproject.org
 *Subject:* RE: [rdo-users] RHOSP 10 failed overcloud deployment

 I've just run 'neutron net-list' on the undercloud node and I have the 2
 networks, ctlplane and external.

 My belief was that I don't need the external network, I only need the
 provision (ctlplane) network for the deployment.

 I don't have a DHCP server for my external network.

 Do I need to set the external IP address for the compute node and for the
 controller node in the yaml files from templates folder?

 Thanks,

 Anda

 *From:* Pedro Sousa [mailto:pgsousa@gmail.com <pgsousa(a)gmail.com>]
 *Sent:* Wednesday, January 31, 2018 12:32 PM
 *To:* Anda Nicolae
 *Cc:* rasca(a)redhat.com; users(a)lists.rdoproject.org

 *Subject:* Re: [rdo-users] RHOSP 10 failed overcloud deployment

 Hi Anda,

 some things you could check:

 Do you have 2 networks on director (ctlplane and external) and are they
 reachable from the overcloud nodes?

 It seems to me that you have network issues, and that's why you're seeing
 those long timeouts.

 For "Message: No valid host was found. There are not enough hosts
 available" message you could check "/var/log/nova/nova-conductor.log".

 Regards

 On Wed, Jan 31, 2018 at 10:14 AM, Anda Nicolae <anicolae(a)lenovo.com>
 wrote:

 I've let the deployment run overnight and it failed after almost 4hrs with
 the errors below. Do you happen to know the config file where I can
 decrease the timeout? I looked in /etc/nova/nova.conf and in ironic config
 files but I couldn't find anything relevant.

 The errors are:

 [overcloud.Compute.0]: CREATE_FAILED  ResourceInError:
 resources[0].resources.NovaCompute: Went to status ERROR due to "Message:
 Unknown, Code: Unknown"
 [overcloud.Controller.0]: CREATE_FAILED  Resource CREATE failed:
 ResourceInError: resources.Controller: Went to status ERROR due to
 "Message: No valid host was found. There are not enough hosts available.,
 Code: 500"

 It is unclear to me why the above errors occur, since in my
 instackenv.json I declared node capabilities for both the compute and the
 controller node to be greater than the compute and controller flavors from
 'openstack flavor list'.

 However, I've found this link and I am looking over it:
 https://docs.openstack.org/ironic/latest/admin/troubleshooting.html#nova-returns-no-valid-host-was-found-error

 Thanks,
 Anda

 -----Original Message-----
 From: Raoul Scarazzini [mailto:rasca@redhat.com]
 Sent: Tuesday, January 30, 2018 8:17 PM
 To: Anda Nicolae; users(a)lists.rdoproject.org
 Subject: Re: [rdo-users] RHOSP 10 failed overcloud deployment

 On 01/30/2018 04:39 PM, Anda Nicolae wrote:
 > Got it.
 >
 > I've noticed that it spends quite some time in CREATE_IN_PROGRESS state
 > for OS::Heat::ResourceGroup resource (on Controller node).
 > Overcloud deployment fails after 4h. I will check in which config file
 > is the overcloud deployment timeout configured and decrease it.
 >
 > Thanks,
 > Anda

 Check also network settings. 4h timeout is the default when something is
 unreachable.

 --
 Raoul Scarazzini
 rasca(a)redhat.com
 _______________________________________________
 users mailing list
 users(a)lists.rdoproject.org
 http://lists.rdoproject.org/mailman/listinfo/users

 To unsubscribe: users-unsubscribe(a)lists.rdoproject.org
