[Rdo-list] Failing to deploy Mitaka on baremetal

Raoul Scarazzini rasca at redhat.com
Tue Feb 9 11:43:21 UTC 2016


That seems to have solved the problem: the deployment completed successfully,
in 33 minutes. I don't know yet if everything is alright; I'm going to start
checking right now.

So, for now, many thanks Marius.

--
Raoul Scarazzini
rasca at redhat.com

On 9/2/2016 12:00:59, Marius Cornea wrote:
> On Tue, Feb 9, 2016 at 11:51 AM, Raoul Scarazzini <rasca at redhat.com> wrote:
>> Hi Marius,
>> your assumption is totally right: the external network is going to live
>> just on the controllers.
>>
>> What I don't understand is why it gets configured on the compute nodes. My
>> compute.yaml looks like this [1], and as you can see there is no external
>> network declaration; in any case "use_dhcp" is already set to false for
>> the main ovs_bridge. So why does it get configured anyway?
> 
> This is because, by default, the dhcp client runs on any interface that is
> not specified in the nic template. The ovs bridge contains the em2
> interface, but in your case the dhcp address gets configured on the em1
> nic, which is not specified in the template. Can you try something like
> this for the compute nic template and see if it allows you to continue
> with the deployment:
> http://paste.openstack.org/show/486385/
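
In case the paste link goes away, here is a minimal sketch of the idea as a
compute nic-config network_config fragment (the layout is an assumption for
illustration only; the actual paste may differ). The point is simply to list
nic1 (em1) explicitly with use_dhcp set to false, so that no dhcp client is
started on it and os-net-config leaves it unconfigured:

    network_config:
      -
        type: interface
        name: nic1            # em1: declared only so dhcp does not run on it
        use_dhcp: false
      -
        type: ovs_bridge
        name: br-ex
        use_dhcp: false
        members:
          -
            type: interface
            name: nic2        # em2: the interface carrying the bridge
            primary: true
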
> 
> Thanks
> 
>> Many thanks,
>>
>> [1] http://pastebin.test.redhat.com/347199
>>
>> --
>> Raoul Scarazzini
>> rasca at redhat.com
>>
>> On 9/2/2016 11:36:02, Marius Cornea wrote:
>>> Hi Raoul,
>>>
>>> Thanks for the output. Can you confirm what the purpose of the
>>> 10.1.241.0/24 subnet is?
>>>
>>> I'll assume it's used for the external network. In that case it shouldn't
>>> be set on the compute nodes, as they don't require connectivity on that
>>> network. I believe it gets configured via DHCP; I'm not really sure why
>>> the connectivity check fails during validation. Can you try disabling
>>> dhcp for the em1 interface in the compute nic template and see the
>>> result? You can add something like this to the os_net_config
>>> network_config:
>>>
>>> -
>>>   type: interface
>>>   name: nic1
>>>   use_dhcp: false
>>>
>>> Thanks,
>>> Marius
>>>
>>> On Tue, Feb 9, 2016 at 7:58 AM, Raoul Scarazzini <rasca at redhat.com> wrote:
>>>> Hi Marius,
>>>> here it is:
>>>>
>>>> # cat /etc/os-net-config/config.json | python -m json.tool
>>>> {
>>>>     "network_config": [
>>>>         {
>>>>             "addresses": [
>>>>                 {
>>>>                     "ip_netmask": "192.0.2.21/24"
>>>>                 }
>>>>             ],
>>>>             "dns_servers": [
>>>>                 "10.1.241.2"
>>>>             ],
>>>>             "members": [
>>>>                 {
>>>>                     "name": "nic2",
>>>>                     "primary": true,
>>>>                     "type": "interface"
>>>>                 },
>>>>                 {
>>>>                     "addresses": [
>>>>                         {
>>>>                             "ip_netmask": "172.17.0.12/24"
>>>>                         }
>>>>                     ],
>>>>                     "type": "vlan",
>>>>                     "vlan_id": 2201
>>>>                 },
>>>>                 {
>>>>                     "addresses": [
>>>>                         {
>>>>                             "ip_netmask": "172.18.0.11/24"
>>>>                         }
>>>>                     ],
>>>>                     "type": "vlan",
>>>>                     "vlan_id": 2203
>>>>                 },
>>>>                 {
>>>>                     "addresses": [
>>>>                         {
>>>>                             "ip_netmask": "172.16.0.10/24"
>>>>                         }
>>>>                     ],
>>>>                     "type": "vlan",
>>>>                     "vlan_id": 2202
>>>>                 }
>>>>             ],
>>>>             "name": "br-ex",
>>>>             "routes": [
>>>>                 {
>>>>                     "ip_netmask": "169.254.169.254/32",
>>>>                     "next_hop": "192.0.2.1"
>>>>                 },
>>>>                 {
>>>>                     "default": true,
>>>>                     "next_hop": "192.0.2.1"
>>>>                 }
>>>>             ],
>>>>             "type": "ovs_bridge",
>>>>             "use_dhcp": false
>>>>         }
>>>>     ]
>>>> }
>>>>
>>>> # ip a
>>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>>     inet 127.0.0.1/8 scope host lo
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 ::1/128 scope host
>>>>        valid_lft forever preferred_lft forever
>>>> 2: em3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>>>> qlen 1000
>>>>     link/ether b8:ca:3a:66:f1:b4 brd ff:ff:ff:ff:ff:ff
>>>> 3: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>>>> qlen 1000
>>>>     link/ether b8:ca:3a:66:f1:b5 brd ff:ff:ff:ff:ff:ff
>>>> 4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>>>> qlen 1000
>>>>     link/ether b8:ca:3a:66:f1:b0 brd ff:ff:ff:ff:ff:ff
>>>>     inet 10.1.241.9/24 brd 10.1.241.255 scope global dynamic em1
>>>>        valid_lft 530sec preferred_lft 530sec
>>>>     inet6 fe80::baca:3aff:fe66:f1b0/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 5: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
>>>> ovs-system state UP qlen 1000
>>>>     link/ether b8:ca:3a:66:f1:b2 brd ff:ff:ff:ff:ff:ff
>>>>     inet6 fe80::baca:3aff:fe66:f1b2/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>>>>     link/ether 06:65:08:31:11:35 brd ff:ff:ff:ff:ff:ff
>>>> 7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
>>>> UNKNOWN
>>>>     link/ether b8:ca:3a:66:f1:b2 brd ff:ff:ff:ff:ff:ff
>>>>     inet 192.0.2.21/24 brd 192.0.2.255 scope global br-ex
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::baca:3aff:fe66:f1b2/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 8: vlan2203: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>>>> state UNKNOWN
>>>>     link/ether 96:6a:10:5b:1a:47 brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.18.0.11/24 brd 172.18.0.255 scope global vlan2203
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::946a:10ff:fe5b:1a47/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 9: vlan2202: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>>>> state UNKNOWN
>>>>     link/ether ca:e0:0d:b0:7e:30 brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.16.0.10/24 brd 172.16.0.255 scope global vlan2202
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::c8e0:dff:feb0:7e30/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 10: vlan2201: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>>>> state UNKNOWN
>>>>     link/ether c2:24:5f:4c:37:6c brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.17.0.12/24 brd 172.17.0.255 scope global vlan2201
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::c024:5fff:fe4c:376c/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 11: br-floating: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>>>> noqueue state UNKNOWN
>>>>     link/ether de:7f:cd:d7:c1:46 brd ff:ff:ff:ff:ff:ff
>>>>     inet6 fe80::dc7f:cdff:fed7:c146/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>> 12: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>>>>     link/ether 36:39:0f:39:85:4c brd ff:ff:ff:ff:ff:ff
>>>> 13: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>>>>     link/ether 72:d1:82:d9:15:4f brd ff:ff:ff:ff:ff:ff
>>>>
>>>> # ip r
>>>> default via 10.1.241.254 dev em1
>>>> 10.1.241.0/24 dev em1  proto kernel  scope link  src 10.1.241.9
>>>> 169.254.169.254 via 192.0.2.1 dev br-ex
>>>> 172.16.0.0/24 dev vlan2202  proto kernel  scope link  src 172.16.0.10
>>>> 172.17.0.0/24 dev vlan2201  proto kernel  scope link  src 172.17.0.12
>>>> 172.18.0.0/24 dev vlan2203  proto kernel  scope link  src 172.18.0.11
>>>> 192.0.2.0/24 dev br-ex  proto kernel  scope link  src 192.0.2.21
>>>>
>>>> Many thanks,
>>>>
>>>> --
>>>> Raoul Scarazzini
>>>> rasca at redhat.com
>>>>
>>>> On 8/2/2016 18:26:45, Marius Cornea wrote:
>>>>> Hi Raoul,
>>>>>
>>>>> Can you post the output of the following commands on that compute node please?
>>>>>
>>>>> cat /etc/os-net-config/config.json | python -m json.tool
>>>>> ip a
>>>>> ip r
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Feb 8, 2016 at 6:15 PM, Raoul Scarazzini <rasca at redhat.com> wrote:
>>>>>> Just another update: I fixed the connectivity issue (nic1 and nic2 were
>>>>>> inverted in the yaml files), but the setup fails anyway.
>>>>>> The problem now (looking into the compute's
>>>>>> /var/lib/heat-config/deployed directory) is this one:
>>>>>>
>>>>>> {
>>>>>>   "deploy_stdout": "Trying to ping 172.16.0.14 for local network
>>>>>> 172.16.0.0/24...SUCCESS\nTrying to ping 172.17.0.16 for local network
>>>>>> 172.17.0.0/24...SUCCESS\nTrying to ping 172.18.0.14 for local network
>>>>>> 172.18.0.0/24...SUCCESS\nTrying to ping default gateway
>>>>>> 10.1.241.254...FAILURE\n10.1.241.254 is not pingable.\n",
>>>>>>   "deploy_stderr": "",
>>>>>>   "deploy_status_code": 1
>>>>>> }
>>>>>>
>>>>>> The funny thing is that I'm able to ping 10.1.241.254 from the compute
>>>>>> nodes, so I'm wondering why it fails during the deployment.
>>>>>>
>>>>>> Could it be that something is not ready on the network side? If so, what?
>>>>>> Can you give me some hints on how to debug this?
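
One generic way to dig further (a suggestion, not something from the original
thread): the JSON above comes from /var/lib/heat-config/deployed, and the
script the validation ran lives under /var/lib/heat-config/heat-config-script/
(a path that also appears further down in this thread). On the failing compute
node:

    # List the deployed heat-config scripts, newest first, then read the one
    # matching the failed deployment to see exactly which addresses it pings.
    ls -lt /var/lib/heat-config/heat-config-script/
    less /var/lib/heat-config/heat-config-script/<deployment-uuid>   # <deployment-uuid> is a placeholder
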
>>>>>>
>>>>>> --
>>>>>> Raoul Scarazzini
>>>>>> rasca at redhat.com
>>>>>>
>>>>>> On 8/2/2016 12:01:29, Raoul Scarazzini wrote:
>>>>>>> Hi David,
>>>>>>> you are absolutely right, I made too many assumptions. First of all, I'm
>>>>>>> installing everything with rdo-manager, using an identical set of
>>>>>>> configurations taken from my previous (working) osp-director 8 setup.
>>>>>>>
>>>>>>> Things seem to fail during the compute node validations. Specifically here:
>>>>>>>
>>>>>>> Feb 05 16:37:23 overcloud-novacompute-0 os-collect-config[6014]:
>>>>>>> [2016-02-05 16:37:23,707] (heat-config) [ERROR] Error running
>>>>>>> /var/lib/heat-config/heat-config-script/a435044e-9be8-42ea-8b03-92bee12b3d23.
>>>>>>> [1]
>>>>>>>
>>>>>>> Looking at that script I identified two different actions (a rough shell
>>>>>>> sketch of the second one follows after the list):
>>>>>>>
>>>>>>> 1) # For each unique remote IP (specified via Heat) we check to
>>>>>>> # see if one of the locally configured networks matches and if so we
>>>>>>> # attempt a ping test the remote network IP.
>>>>>>>
>>>>>>> 2) # Ping all default gateways. There should only be one
>>>>>>> # if using upstream t-h-t network templates but we test
>>>>>>> # all of them should some manual network config have
>>>>>>> # multiple gateways.
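
In rough terms, that second check boils down to something like the following
shell sketch (a paraphrase of the behaviour described above, not the exact
script shipped by tripleo-heat-templates):

    # Ping every default gateway found in the routing table and fail on the
    # first one that does not answer.
    for gw in $(ip route | awk '/^default/ {print $3}'); do
        echo -n "Trying to ping default gateway ${gw}..."
        if ping -c 1 -w 10 "${gw}" > /dev/null 2>&1; then
            echo "SUCCESS"
        else
            echo "FAILURE"
            echo "${gw} is not pingable."
            exit 1
        fi
    done

This matches the deploy_stdout shown above, where the gateway 10.1.241.254 is
the one reported as not pingable.
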
>>>>>>>
>>>>>>> And in fact, after checking, I'm not able to reach the compute nodes
>>>>>>> from the controllers or from the other computes on either the
>>>>>>> InternalApiAllocationPools (172.17.0) or TenantAllocationPools (172.16.0) networks.
>>>>>>>
>>>>>>> I'm using a specific network setup, as I said, the same one I was using
>>>>>>> with osp-director 8. So I've got a specific network-management.yaml file
>>>>>>> in which I've specified these settings:
>>>>>>>
>>>>>>> resource_registry:
>>>>>>>   OS::TripleO::BlockStorage::Net::SoftwareConfig: /home/stack/nic-configs/cinder-storage.yaml
>>>>>>>   OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/nic-configs/compute.yaml
>>>>>>>   OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/nic-configs/controller.yaml
>>>>>>>   OS::TripleO::ObjectStorage::Net::SoftwareConfig: /home/stack/nic-configs/swift-storage.yaml
>>>>>>>   OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/nic-configs/ceph-storage.yaml
>>>>>>>
>>>>>>> parameter_defaults:
>>>>>>>   # Customize the IP subnets to match the local environment
>>>>>>>   InternalApiNetCidr: 172.17.0.0/24
>>>>>>>   StorageNetCidr: 172.18.0.0/24
>>>>>>>   StorageMgmtNetCidr: 172.19.0.0/24
>>>>>>>   TenantNetCidr: 172.16.0.0/24
>>>>>>>   ExternalNetCidr: 172.20.0.0/24
>>>>>>>   ControlPlaneSubnetCidr: '24'
>>>>>>>   InternalApiAllocationPools: [{'start': '172.17.0.10', 'end': '172.17.0.200'}]
>>>>>>>   StorageAllocationPools: [{'start': '172.18.0.10', 'end': '172.18.0.200'}]
>>>>>>>   StorageMgmtAllocationPools: [{'start': '172.19.0.10', 'end': '172.19.0.200'}]
>>>>>>>   TenantAllocationPools: [{'start': '172.16.0.10', 'end': '172.16.0.200'}]
>>>>>>>   ExternalAllocationPools: [{'start': '172.20.0.10', 'end': '172.20.0.200'}]
>>>>>>>   # Specify the gateway on the external network.
>>>>>>>   ExternalInterfaceDefaultRoute: 172.20.0.254
>>>>>>>   # Gateway router for the provisioning network (or Undercloud IP)
>>>>>>>   ControlPlaneDefaultRoute: 192.0.2.1
>>>>>>>   # Generally the IP of the Undercloud
>>>>>>>   EC2MetadataIp: 192.0.2.1
>>>>>>>   DnsServers: ["10.1.241.2"]
>>>>>>>   InternalApiNetworkVlanID: 2201
>>>>>>>   StorageNetworkVlanID: 2203
>>>>>>>   StorageMgmtNetworkVlanID: 2204
>>>>>>>   TenantNetworkVlanID: 2202
>>>>>>>   ExternalNetworkVlanID: 2205
>>>>>>>   # Floating IP networks do not have to use br-ex; they can use any
>>>>>>>   # bridge as long as the NeutronExternalNetworkBridge is set to "''".
>>>>>>>   NeutronExternalNetworkBridge: "''"
>>>>>>>
>>>>>>> # Variables in "parameters" apply an actual value to one of the
>>>>>>> # top-level params
>>>>>>> parameters:
>>>>>>>   # The OVS logical->physical bridge mappings to use. Defaults to
>>>>>>>   # mapping br-ex - the external bridge on hosts - to a physical name
>>>>>>>   # 'datacentre' which can be used to create provider networks (and we
>>>>>>>   # use this for the default floating network) - if changing this either
>>>>>>>   # use different post-install network scripts or be sure to keep
>>>>>>>   # 'datacentre' as a mapping network name.
>>>>>>>   # Unfortunately this option is overridden by the command line, due to
>>>>>>>   # a limitation (that will be fixed), so even declaring it won't have
>>>>>>>   # any effect. See overcloud-deploy.sh for all the explanations.
>>>>>>>   # https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-without-mergepy.yaml#L112
>>>>>>>   NeutronBridgeMappings: "datacentre:br-floating"
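
For context, that 'datacentre' mapping is what an external/provider network
created after the deployment would reference. As a hedged example (the network
name, the VLAN and the exact neutron client invocation are assumptions, not
taken from this thread), something along these lines would land the network on
br-floating through the mapping above:

    neutron net-create ext-net --router:external \
        --provider:network_type vlan \
        --provider:physical_network datacentre \
        --provider:segmentation_id 2205
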
>>>>>>>
>>>>>>> Obviously the controller.yaml was modified to reflect my needs, as
>>>>>>> described here [1], and as you can see I have not declared a
>>>>>>> ManagementNetworkVlan, since it was not needed before.
>>>>>>> So the first question is: what is this new network, and how does it
>>>>>>> differ from the other networks already available? Can it affect
>>>>>>> communications that were working before?
>>>>>>>
>>>>>>> Many thanks
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/rscarazz/openstack/blob/master/ospd-network-isolation-considerations.md
>>>>>>>
>>>>>>> --
>>>>>>> Raoul Scarazzini
>>>>>>> rasca at redhat.com
>>>>>>>
>>>>>>> On 6/2/2016 00:38:38, David Moreau Simard wrote:
>>>>>>>> Hi Raoul,
>>>>>>>>
>>>>>>>> A good start would be to give us some more details about how you did the
>>>>>>>> installation.
>>>>>>>>
>>>>>>>> What installation tool/procedure? What repositories?
>>>>>>>>
>>>>>>>> David Moreau Simard
>>>>>>>> Senior Software Engineer | Openstack RDO
>>>>>>>>
>>>>>>>> dmsimard = [irc, github, twitter]
>>>>>>>>
>>>>>>>> On Feb 5, 2016 5:16 AM, "Raoul Scarazzini" <rasca at redhat.com> wrote:
>>>>>>>>
>>>>>>>>     Hi,
>>>>>>>>     I'm trying to deploy Mitaka on a baremetal environment composed of
>>>>>>>>     3 controllers and 4 computes. After introspection the nodes seemed
>>>>>>>>     fine, even though for one of them I had to run the introspection by
>>>>>>>>     hand, since it was not completing the process. In the end all 7 of
>>>>>>>>     my nodes were in the "available" state.
>>>>>>>>
>>>>>>>>     Launching the overcloud deploy, the controller part goes fine, but
>>>>>>>>     then it gives me this error about the computes:
>>>>>>>>
>>>>>>>>     2016-02-05 09:26:59 [NovaCompute]: CREATE_FAILED  ResourceInError:
>>>>>>>>     resources.NovaCompute: Went to status ERROR due to "Message: Exceeded
>>>>>>>>     maximum number of retries. Exceeded max scheduling attempts 3 for
>>>>>>>>     instance 0227f7c1-3c2b-4e10-93bf-e7d84a7aca71. Last exception:
>>>>>>>>     Port b8:ca:3a:66:ef:5a is still in use.
>>>>>>>>
>>>>>>>>     The funny thing is that I can't find the offending ID anywhere: it's
>>>>>>>>     not an Ironic node ID, and it's not a Nova one either.
>>>>>>>>
>>>>>>>>     Can you help point me in the right direction?
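
The identifier in that error is a MAC address rather than a UUID, so one way to
chase it down (a debugging sketch, not from the original thread) is to grep
both the Neutron ports and the Ironic ports on the undercloud for that MAC:

    source ~/stackrc
    # Any leftover ctlplane port (e.g. from a previous failed deploy) with that MAC?
    neutron port-list | grep -i b8:ca:3a:66:ef:5a
    # Which Ironic node owns the NIC with that MAC?
    ironic port-list | grep -i b8:ca:3a:66:ef:5a
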
>>>>>>>>
>>>>>>>>     Many thanks,
>>>>>>>>
>>>>>>>>     --
>>>>>>>>     Raoul Scarazzini
>>>>>>>>     rasca at redhat.com
>>>>>>>>
>>>>>>>>     _______________________________________________
>>>>>>>>     Rdo-list mailing list
>>>>>>>>     Rdo-list at redhat.com
>>>>>>>>     https://www.redhat.com/mailman/listinfo/rdo-list
>>>>>>>>
>>>>>>>>     To unsubscribe: rdo-list-unsubscribe at redhat.com
>>>>>>>>
>>>>>>>