[Rdo-list] Failing to deploy Mitaka on baremetal

Raoul Scarazzini rasca at redhat.com
Mon Feb 8 17:15:55 UTC 2016


Just another update. I fixed the connectivity issue (nic1 and nic2 were
inverted in the yaml files), but the setup fails anyway.
The problem now, looking into the compute's
/var/lib/heat-config/deployed directory, is this one:

{
  "deploy_stdout": "Trying to ping 172.16.0.14 for local network
172.16.0.0/24...SUCCESS\nTrying to ping 172.17.0.16 for local network
172.17.0.0/24...SUCCESS\nTrying to ping 172.18.0.14 for local network
172.18.0.0/24...SUCCESS\nTrying to ping default gateway
10.1.241.254...FAILURE\n10.1.241.254 is not pingable.\n",
  "deploy_stderr": "",
  "deploy_status_code": 1
}
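
For reference, this is more or less what I'm running by hand on the
compute node to look at the results and repeat the failing check (it's
just a rough approximation of what the validation script does, not the
script itself):

# most recent deployment results written by heat-config
ls -t /var/lib/heat-config/deployed/*.json | head -n 3
# repeat the failing part manually: ping every default gateway
for gw in $(ip route | awk '/^default/ {print $3}'); do
  ping -c 3 "$gw" && echo "$gw reachable" || echo "$gw NOT reachable"
done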

The funny thing is that I'm able to ping 10.1.241.254 from the compute
nodes, so I'm wondering why it fails during the deployment.
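
One thing I'm trying to figure out is whether the check simply runs
before the interfaces and routes are fully up, so I'm comparing the
timestamps of the deployment results with the failure reported by
os-collect-config (the grep pattern is just the error string quoted
further down in this thread):

# when were the deployment results written?
ls -l --time-style=full-iso /var/lib/heat-config/deployed/ | tail -n 5
# when did os-collect-config report the script failure?
journalctl -u os-collect-config | grep 'Error running' | tail -n 5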

Could it be that something is not ready on the network side? If so, what?
Can you give me some hints on how to debug this?

--
Raoul Scarazzini
rasca at redhat.com

On 8/2/2016 12:01:29, Raoul Scarazzini wrote:
> Hi David,
> you are absolutely right, I made too many assumptions. First of all, I'm
> installing everything with rdo-manager, using an identical set of
> configurations that came from my previous (working) osp-director 8 setup.
> 
> Things seem to fail during the compute node verifications. Specifically here:
> 
> Feb 05 16:37:23 overcloud-novacompute-0 os-collect-config[6014]:
> [2016-02-05 16:37:23,707] (heat-config) [ERROR] Error running
> /var/lib/heat-config/heat-config-script/a435044e-9be8-42ea-8b03-92bee12b3d23.
> [1]
> 
> Looking at that script, I identified two different actions:
> 
> 1) # For each unique remote IP (specified via Heat) we check to
> # see if one of the locally configured networks matches and if so we
> # attempt a ping test the remote network IP.
> 
> 2) # Ping all default gateways. There should only be one
> # if using upstream t-h-t network templates but we test
> # all of them should some manual network config have
> # multiple gateways.
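> 
> As far as I understand it, action 1 boils down to something like this
> (my own paraphrase in shell, not the actual script):
> 
> remote_ip=172.17.0.16   # one of the remote IPs passed in via Heat
> # if the route to it is direct (no gateway hop), it belongs to a locally
> # configured network, so attempt the ping test
> if ! ip route get "$remote_ip" | grep -q ' via '; then
>   ping -c 1 "$remote_ip" || echo "FAILURE: $remote_ip is not pingable"
> fi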
> 
> And in fact, after checking, I'm not able to reach the compute nodes
> from the controllers or other computes on the addresses belonging to the
> InternalApiAllocationPools (172.17.0.x) or TenantAllocationPools (172.16.0.x).
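> 
> To be concrete, this is the kind of check I'm doing from a controller
> towards a compute (the VLAN IDs and the addresses below are just the
> ones from my environment):
> 
> # the VLAN devices created by os-net-config are usually named vlan<ID>
> ip -o link show | grep -i vlan
> # then try one of the compute's InternalApi and Tenant addresses
> ping -c 3 172.17.0.16
> ping -c 3 172.16.0.14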
> 
> I'm using a specific network setup, as I said, the same one I was using
> with osp-director 8. So I've got a specific network-management.yaml file
> in which I've specified these settings:
> 
> resource_registry:
>   OS::TripleO::BlockStorage::Net::SoftwareConfig: /home/stack/nic-configs/cinder-storage.yaml
>   OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/nic-configs/compute.yaml
>   OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/nic-configs/controller.yaml
>   OS::TripleO::ObjectStorage::Net::SoftwareConfig: /home/stack/nic-configs/swift-storage.yaml
>   OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/nic-configs/ceph-storage.yaml
> 
> parameter_defaults:
>   # Customize the IP subnets to match the local environment
>   InternalApiNetCidr: 172.17.0.0/24
>   StorageNetCidr: 172.18.0.0/24
>   StorageMgmtNetCidr: 172.19.0.0/24
>   TenantNetCidr: 172.16.0.0/24
>   ExternalNetCidr: 172.20.0.0/24
>   ControlPlaneSubnetCidr: '24'
>   InternalApiAllocationPools: [{'start': '172.17.0.10', 'end': '172.17.0.200'}]
>   StorageAllocationPools: [{'start': '172.18.0.10', 'end': '172.18.0.200'}]
>   StorageMgmtAllocationPools: [{'start': '172.19.0.10', 'end': '172.19.0.200'}]
>   TenantAllocationPools: [{'start': '172.16.0.10', 'end': '172.16.0.200'}]
>   ExternalAllocationPools: [{'start': '172.20.0.10', 'end': '172.20.0.200'}]
>   # Specify the gateway on the external network.
>   ExternalInterfaceDefaultRoute: 172.20.0.254
>   # Gateway router for the provisioning network (or Undercloud IP)
>   ControlPlaneDefaultRoute: 192.0.2.1
>   # Generally the IP of the Undercloud
>   EC2MetadataIp: 192.0.2.1
>   DnsServers: ["10.1.241.2"]
>   InternalApiNetworkVlanID: 2201
>   StorageNetworkVlanID: 2203
>   StorageMgmtNetworkVlanID: 2204
>   TenantNetworkVlanID: 2202
>   ExternalNetworkVlanID: 2205
>   # Floating IP networks do not have to use br-ex; they can use any bridge
>   # as long as NeutronExternalNetworkBridge is set to "''".
>   NeutronExternalNetworkBridge: "''"
> 
> # Variables in "parameters" apply an actual value to one of the top-level params
> parameters:
>   # The OVS logical->physical bridge mappings to use. Defaults to mapping
>   # br-ex - the external bridge on hosts - to a physical name 'datacentre',
>   # which can be used to create provider networks (and we use this for the
>   # default floating network). If changing this, either use different
>   # post-install network scripts or be sure to keep 'datacentre' as a
>   # mapping network name.
>   # Unfortunately this option is overridden by the command line, due to a
>   # limitation (that will be fixed), so even declaring it won't have any effect.
>   # See overcloud-deploy.sh for all the explanations.
>   # https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-without-mergepy.yaml#L112
>   NeutronBridgeMappings: "datacentre:br-floating"
> 
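> For completeness, this is roughly how that environment file gets passed
> to the deploy command on my side (the list of -e files and the scale
> counts are specific to my setup):
> 
> openstack overcloud deploy --templates \
>   -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
>   -e /home/stack/network-management.yaml \
>   --control-scale 3 --compute-scale 4
> 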
> Obviously the controller.yaml was modified to reflect my needs, as
> described here [1], and as you can see, I haven't declared a
> ManagementNetworkVlan, since it was not needed before.
> So the first question is: what is this new network, and how does it
> differ from the other networks already available? Can it affect
> communications that were working before?
> 
> Many thanks
> 
> [1]
> https://github.com/rscarazz/openstack/blob/master/ospd-network-isolation-considerations.md
> 
> --
> Raoul Scarazzini
> rasca at redhat.com
> 
> On 6/2/2016 00:38:38, David Moreau Simard wrote:
>> Hi Raoul,
>>
>> A good start would be to give us some more details about how you did the
>> installation.
>>
>> What installation tool/procedure? What repositories?
>>
>> David Moreau Simard
>> Senior Software Engineer | Openstack RDO
>>
>> dmsimard = [irc, github, twitter]
>>
>> On Feb 5, 2016 5:16 AM, "Raoul Scarazzini" <rasca at redhat.com> wrote:
>>
>>     Hi,
>>     I'm trying to deploy Mitaka on a baremetal environment composed
>>     of 3 controllers and 4 computes.
>>     After introspection the nodes seem fine, although for one of them I
>>     had to do the introspection by hand, since it was not completing the
>>     process. But in the end all my 7 nodes were in the "available" state.
>>
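>>     For the record, this is roughly what I did from the undercloud for
>>     that one node (the <node-uuid> placeholder stands for the node's
>>     Ironic UUID):
>>
>>     ironic node-set-provision-state <node-uuid> manage
>>     openstack baremetal introspection start <node-uuid>
>>     openstack baremetal introspection status <node-uuid>
>>     ironic node-set-provision-state <node-uuid> provide
>>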
>>     Launching the overcloud deploy, the controller part goes fine, but
>>     then it gives me this error about the compute:
>>
>>     2016-02-05 09:26:59 [NovaCompute]: CREATE_FAILED  ResourceInError:
>>     resources.NovaCompute: Went to status ERROR due to "Message: Exceeded
>>     maximum number of retries. Exceeded max scheduling attempts 3 for
>>     instance 0227f7c1-3c2b-4e10-93bf-e7d84a7aca71. Last
>>     exception: Port b8:ca:3a:66:ef:5a is still in use.
>>
>>     The funny thing is that I can't find the incriminated ID anywhere:
>>     it's not an Ironic node ID, and not a Nova one either.
>>
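>>     If it helps to reproduce, this is the kind of lookup I've been trying
>>     from the undercloud (treating it as a MAC address, since that's what
>>     it looks like):
>>
>>     source ~/stackrc
>>     # grep for the MAC in both the Ironic ports and the Neutron ports
>>     ironic port-list | grep -i b8:ca:3a:66:ef:5a
>>     neutron port-list | grep -i b8:ca:3a:66:ef:5a
>>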
>>     Can you help me point my attention in the right direction?
>>
>>     Many thanks,
>>
>>     --
>>     Raoul Scarazzini
>>     rasca at redhat.com
>>