[Rdo-list] Failing to deploy Mitaka on baremetal

Tue Feb 9 11:00:59 UTC 2016

On Tue, Feb 9, 2016 at 11:51 AM, Raoul Scarazzini <rasca at redhat.com> wrote:
> Hi Marius,
> your assumption is totally right: the external network is going to live
> just on the controllers.
>
> What I don't understand is why it's configured on the compute nodes. My
> compute.yaml looks like this [1]; as you can see, there's no external
> network declaration, and in any case "use_dhcp" is already set to false
> for the main ovs_bridge. So why is the address still being configured?

This is because, by default, the DHCP client runs on any interface that
is not specified in the NIC template. The OVS bridge contains the em2
interface, but in your case the DHCP address gets configured on the em1
NIC, which is not specified in the template. Can you try something like
this for the compute NIC template and see if it allows you to continue
with the deployment:
http://paste.openstack.org/show/486385/
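
Separately from the paste above, a quick way to spot an interface that
picked up a lease is to look for the "dynamic" flag in the `ip a` output.
Below is a minimal sketch in Python, not a definitive tool. The function
name dynamic_ipv4_interfaces and the SAMPLE string are illustrative only;
the sample lines are copied from the output quoted further down, and
treating "dynamic" as "assigned by DHCP" is an assumption that usually,
but not always, holds for IPv4 addresses.

```python
def dynamic_ipv4_interfaces(ip_a_output):
    """Return (interface, address) pairs for IPv4 addresses flagged 'dynamic'.

    Assumption: a 'dynamic' IPv4 address was most likely assigned by DHCP.
    """
    found = []
    for line in ip_a_output.splitlines():
        tokens = line.split()
        # IPv4 address lines look like:
        #   inet 10.1.241.9/24 brd 10.1.241.255 scope global dynamic em1
        # The last token is the interface label, the second is the address.
        if len(tokens) >= 2 and tokens[0] == "inet" and "dynamic" in tokens:
            found.append((tokens[-1], tokens[1]))
    return found


# Hypothetical sample, trimmed from the `ip a` output quoted in this thread.
SAMPLE = """\
4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether b8:ca:3a:66:f1:b0 brd ff:ff:ff:ff:ff:ff
    inet 10.1.241.9/24 brd 10.1.241.255 scope global dynamic em1
       valid_lft 530sec preferred_lft 530sec
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether b8:ca:3a:66:f1:b2 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.21/24 brd 192.0.2.255 scope global br-ex
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    for name, address in dynamic_ipv4_interfaces(SAMPLE):
        print(name, address)
```

On the sample this reports em1 with 10.1.241.9/24, the interface that is
not listed in the NIC template.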

Thanks

> Many thanks,
>
> [1] http://pastebin.test.redhat.com/347199
>
> --
> Raoul Scarazzini
> rasca at redhat.com
>
> On 9/2/2016 11:36:02, Marius Cornea wrote:
>> Hi Raoul,
>>
>> Thanks for the output. Can you confirm what is the purpose of the
>> 10.1.241.0/24 subnet?
>>
>> I'm assuming that it's used for the external network. In that case it
>> shouldn't be set on the compute nodes, as they don't require
>> connectivity on that network. I believe it gets configured via DHCP,
>> but I'm not really sure why the connectivity check fails at validation
>> time. Can you try disabling DHCP for the em1 interface in the compute
>> NIC template and see the result? You can add something like this to the
>> os_net_config network_config:
>>
>> -
>>   type: interface
>>   name: nic1
>>   use_dhcp: false
>>
>> Thanks,
>> Marius
>>
>> On Tue, Feb 9, 2016 at 7:58 AM, Raoul Scarazzini <rasca at redhat.com> wrote:
>>> Hi Marius,
>>> here it is:
>>>
>>> # cat /etc/os-net-config/config.json | python -m json.tool
>>> {
>>>     "network_config": [
>>>         {
>>>             "addresses": [
>>>                 {
>>>                     "ip_netmask": "192.0.2.21/24"
>>>                 }
>>>             ],
>>>             "dns_servers": [
>>>                 "10.1.241.2"
>>>             ],
>>>             "members": [
>>>                 {
>>>                     "name": "nic2",
>>>                     "primary": true,
>>>                     "type": "interface"
>>>                 },
>>>                 {
>>>                     "addresses": [
>>>                         {
>>>                             "ip_netmask": "172.17.0.12/24"
>>>                         }
>>>                     ],
>>>                     "type": "vlan",
>>>                     "vlan_id": 2201
>>>                 },
>>>                 {
>>>                     "addresses": [
>>>                         {
>>>                             "ip_netmask": "172.18.0.11/24"
>>>                         }
>>>                     ],
>>>                     "type": "vlan",
>>>                     "vlan_id": 2203
>>>                 },
>>>                 {
>>>                     "addresses": [
>>>                         {
>>>                             "ip_netmask": "172.16.0.10/24"
>>>                         }
>>>                     ],
>>>                     "type": "vlan",
>>>                     "vlan_id": 2202
>>>                 }
>>>             ],
>>>             "name": "br-ex",
>>>             "routes": [
>>>                 {
>>>                     "ip_netmask": "169.254.169.254/32",
>>>                     "next_hop": "192.0.2.1"
>>>                 },
>>>                 {
>>>                     "default": true,
>>>                     "next_hop": "192.0.2.1"
>>>                 }
>>>             ],
>>>             "type": "ovs_bridge",
>>>             "use_dhcp": false
>>>         }
>>>     ]
>>> }
>>>
>>> # ip a
>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>     inet 127.0.0.1/8 scope host lo
>>>        valid_lft forever preferred_lft forever
>>>     inet6 ::1/128 scope host
>>>        valid_lft forever preferred_lft forever
>>> 2: em3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>>> qlen 1000
>>>     link/ether b8:ca:3a:66:f1:b4 brd ff:ff:ff:ff:ff:ff
>>> 3: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>>> qlen 1000
>>>     link/ether b8:ca:3a:66:f1:b5 brd ff:ff:ff:ff:ff:ff
>>> 4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>>> qlen 1000
>>>     link/ether b8:ca:3a:66:f1:b0 brd ff:ff:ff:ff:ff:ff
>>>     inet 10.1.241.9/24 brd 10.1.241.255 scope global dynamic em1
>>>        valid_lft 530sec preferred_lft 530sec
>>>     inet6 fe80::baca:3aff:fe66:f1b0/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 5: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
>>> ovs-system state UP qlen 1000
>>>     link/ether b8:ca:3a:66:f1:b2 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::baca:3aff:fe66:f1b2/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>>>     link/ether 06:65:08:31:11:35 brd ff:ff:ff:ff:ff:ff
>>> 7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
>>> UNKNOWN
>>>     link/ether b8:ca:3a:66:f1:b2 brd ff:ff:ff:ff:ff:ff
>>>     inet 192.0.2.21/24 brd 192.0.2.255 scope global br-ex
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::baca:3aff:fe66:f1b2/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 8: vlan2203: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>>> state UNKNOWN
>>>     link/ether 96:6a:10:5b:1a:47 brd ff:ff:ff:ff:ff:ff
>>>     inet 172.18.0.11/24 brd 172.18.0.255 scope global vlan2203
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::946a:10ff:fe5b:1a47/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 9: vlan2202: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>>> state UNKNOWN
>>>     link/ether ca:e0:0d:b0:7e:30 brd ff:ff:ff:ff:ff:ff
>>>     inet 172.16.0.10/24 brd 172.16.0.255 scope global vlan2202
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::c8e0:dff:feb0:7e30/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 10: vlan2201: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>>> state UNKNOWN
>>>     link/ether c2:24:5f:4c:37:6c brd ff:ff:ff:ff:ff:ff
>>>     inet 172.17.0.12/24 brd 172.17.0.255 scope global vlan2201
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::c024:5fff:fe4c:376c/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 11: br-floating: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>>> noqueue state UNKNOWN
>>>     link/ether de:7f:cd:d7:c1:46 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::dc7f:cdff:fed7:c146/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 12: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>>>     link/ether 36:39:0f:39:85:4c brd ff:ff:ff:ff:ff:ff
>>> 13: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>>>     link/ether 72:d1:82:d9:15:4f brd ff:ff:ff:ff:ff:ff
>>>
>>> # ip r
>>> default via 10.1.241.254 dev em1
>>> 10.1.241.0/24 dev em1  proto kernel  scope link  src 10.1.241.9
>>> 169.254.169.254 via 192.0.2.1 dev br-ex
>>> 172.16.0.0/24 dev vlan2202  proto kernel  scope link  src 172.16.0.10
>>> 172.17.0.0/24 dev vlan2201  proto kernel  scope link  src 172.17.0.12
>>> 172.18.0.0/24 dev vlan2203  proto kernel  scope link  src 172.18.0.11
>>> 192.0.2.0/24 dev br-ex  proto kernel  scope link  src 192.0.2.21
>>>
>>> Many thanks,
>>>
>>> --
>>> Raoul Scarazzini
>>> rasca at redhat.com
>>>
>>> On 8/2/2016 18:26:45, Marius Cornea wrote:
>>>> Hi Raoul,
>>>>
>>>> Can you post the output of the following commands on that compute node please?
>>>>
>>>> cat /etc/os-net-config/config.json | python -m json.tool
>>>> ip a
>>>> ip r
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Feb 8, 2016 at 6:15 PM, Raoul Scarazzini <rasca at redhat.com> wrote:
>>>>> Just another update. I fixed the connectivity issue (nic1 and nic2 were
>>>>> inverted in the yaml files) but the setup fails anyway.
>>>>> The problem now (looking into the compute's
>>>>> /var/lib/heat-config/deployed directory) is this one:
>>>>>
>>>>> {
>>>>>   "deploy_stdout": "Trying to ping 172.16.0.14 for local network
>>>>> 172.16.0.0/24...SUCCESS\nTrying to ping 172.17.0.16 for local network
>>>>> 172.17.0.0/24...SUCCESS\nTrying to ping 172.18.0.14 for local network
>>>>> 172.18.0.0/24...SUCCESS\nTrying to ping default gateway
>>>>> 10.1.241.254...FAILURE\n10.1.241.254 is not pingable.\n",
>>>>>   "deploy_stderr": "",
>>>>>   "deploy_status_code": 1
>>>>> }
>>>>>
>>>>> Funny thing is that I'm able to ping 10.1.241.254 from the compute
>>>>> nodes, so I'm wondering why it fails during the deployment.
>>>>>
>>>>> Could it be that something is not ready on the network side? If so, what?
>>>>> Can you give me some hints on how to debug this?
>>>>>
>>>>> --
>>>>> Raoul Scarazzini
>>>>> rasca at redhat.com
>>>>>
>>>>> On 8/2/2016 12:01:29, Raoul Scarazzini wrote:
>>>>>> Hi David,
>>>>>> you are absolutely right, I made too many assumptions. First of all, I'm
>>>>>> installing everything with rdo-manager, using an identical set of
>>>>>> configurations that came from my previous (working) osp-director 8 setup.
>>>>>>
>>>>>> Things seem to fail on compute node verification. Specifically here:
>>>>>>
>>>>>> Feb 05 16:37:23 overcloud-novacompute-0 os-collect-config[6014]:
>>>>>> [2016-02-05 16:37:23,707] (heat-config) [ERROR] Error running
>>>>>> /var/lib/heat-config/heat-config-script/a435044e-9be8-42ea-8b03-92bee12b3d23.
>>>>>> [1]
>>>>>>
>>>>>> Looking at that script I identified two different actions:
>>>>>>
>>>>>> 1) # For each unique remote IP (specified via Heat) we check to
>>>>>> # see if one of the locally configured networks matches and if so we
>>>>>> # attempt a ping test the remote network IP.
>>>>>>
>>>>>> 2) # Ping all default gateways. There should only be one
>>>>>> # if using upstream t-h-t network templates but we test
>>>>>> # all of them should some manual network config have
>>>>>> # multiple gateways.
>>>>>>
>>>>>> And in fact, after checking, I'm not able to reach compute nodes
>>>>>> from controllers or other computes on one of the
>>>>>> InternalApiAllocationPools (172.17.0) or TenantAllocationPools (172.16.0) ranges.
>>>>>>
>>>>>> I'm using a specific network setup, as I said the same one I was using
>>>>>> with osp-director 8. So I've got a specific network-management.yaml file
>>>>>> in which I've specified these settings:
>>>>>>
>>>>>> resource_registry:
>>>>>>   OS::TripleO::BlockStorage::Net::SoftwareConfig: /home/stack/nic-configs/cinder-storage.yaml
>>>>>>   OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/nic-configs/compute.yaml
>>>>>>   OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/nic-configs/controller.yaml
>>>>>>   OS::TripleO::ObjectStorage::Net::SoftwareConfig: /home/stack/nic-configs/swift-storage.yaml
>>>>>>   OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/nic-configs/ceph-storage.yaml
>>>>>>
>>>>>> parameter_defaults:
>>>>>>   # Customize the IP subnets to match the local environment
>>>>>>   InternalApiNetCidr: 172.17.0.0/24
>>>>>>   StorageNetCidr: 172.18.0.0/24
>>>>>>   StorageMgmtNetCidr: 172.19.0.0/24
>>>>>>   TenantNetCidr: 172.16.0.0/24
>>>>>>   ExternalNetCidr: 172.20.0.0/24
>>>>>>   ControlPlaneSubnetCidr: '24'
>>>>>>   InternalApiAllocationPools: [{'start': '172.17.0.10', 'end': '172.17.0.200'}]
>>>>>>   StorageAllocationPools: [{'start': '172.18.0.10', 'end': '172.18.0.200'}]
>>>>>>   StorageMgmtAllocationPools: [{'start': '172.19.0.10', 'end': '172.19.0.200'}]
>>>>>>   TenantAllocationPools: [{'start': '172.16.0.10', 'end': '172.16.0.200'}]
>>>>>>   ExternalAllocationPools: [{'start': '172.20.0.10', 'end': '172.20.0.200'}]
>>>>>>   # Specify the gateway on the external network.
>>>>>>   ExternalInterfaceDefaultRoute: 172.20.0.254
>>>>>>   # Gateway router for the provisioning network (or Undercloud IP)
>>>>>>   ControlPlaneDefaultRoute: 192.0.2.1
>>>>>>   # Generally the IP of the Undercloud
>>>>>>   EC2MetadataIp: 192.0.2.1
>>>>>>   DnsServers: ["10.1.241.2"]
>>>>>>   InternalApiNetworkVlanID: 2201
>>>>>>   StorageNetworkVlanID: 2203
>>>>>>   StorageMgmtNetworkVlanID: 2204
>>>>>>   TenantNetworkVlanID: 2202
>>>>>>   ExternalNetworkVlanID: 2205
>>>>>>   # Floating IP networks do not have to use br-ex, they can use any
>>>>>>   # bridge as long as the NeutronExternalNetworkBridge is set to "''".
>>>>>>   NeutronExternalNetworkBridge: "''"
>>>>>>
>>>>>> # Variables in "parameters" apply an actual value to one of the
>>>>>> # top-level params
>>>>>> parameters:
>>>>>>   # The OVS logical->physical bridge mappings to use. Defaults to
>>>>>>   # mapping br-ex - the external bridge on hosts - to a physical name
>>>>>>   # 'datacentre' which can be used to create provider networks (and we
>>>>>>   # use this for the default floating network) - if changing this either
>>>>>>   # use different post-install network scripts or be sure to keep
>>>>>>   # 'datacentre' as a mapping network name.
>>>>>>   # Unfortunately this option is overridden by the command line, due to
>>>>>>   # a limitation (that will be fixed), so even declaring this won't have
>>>>>>   # any effect.
>>>>>>   # See overcloud-deploy.sh for all the explanations.
>>>>>>   # https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-without-mergepy.yaml#L112
>>>>>>   NeutronBridgeMappings: "datacentre:br-floating"
>>>>>>
>>>>>> Obviously the controller.yaml was modified to reflect my needs, as
>>>>>> described here [1], and as you can see, I haven't declared a
>>>>>> ManagementNetworkVlan, since this was not needed before.
>>>>>> So the first question is: what is this new network, and how does it
>>>>>> differ from the other networks currently available? Can this
>>>>>> affect communications that were working before?
>>>>>>
>>>>>> Many thanks
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/rscarazz/openstack/blob/master/ospd-network-isolation-considerations.md
>>>>>>
>>>>>> --
>>>>>> Raoul Scarazzini
>>>>>> rasca at redhat.com
>>>>>>
>>>>>> On 6/2/2016 00:38:38, David Moreau Simard wrote:
>>>>>>> Hi Raoul,
>>>>>>>
>>>>>>> A good start would be to give us some more details about how you did the
>>>>>>> installation.
>>>>>>>
>>>>>>> What installation tool/procedure ? What repositories ?
>>>>>>>
>>>>>>> David Moreau Simard
>>>>>>> Senior Software Engineer | Openstack RDO
>>>>>>>
>>>>>>> dmsimard = [irc, github, twitter]
>>>>>>>
>>>>>>> On Feb 5, 2016 5:16 AM, "Raoul Scarazzini" <rasca at redhat.com> wrote:
>>>>>>>
>>>>>>>     Hi,
>>>>>>>     I'm trying to deploy Mitaka on a baremetal environment composed
>>>>>>>     of 3 controllers and 4 computes.
>>>>>>>     After introspection the nodes seem fine, even though for one of them
>>>>>>>     I needed to do the introspection by hand, since it was not completing
>>>>>>>     the process. In the end all 7 of my nodes were in state "available".
>>>>>>>
>>>>>>>     Launching the overcloud deploy, the controller part goes fine, but
>>>>>>>     then it gives me this error about compute:
>>>>>>>
>>>>>>>     2016-02-05 09:26:59 [NovaCompute]: CREATE_FAILED  ResourceInError:
>>>>>>>     resources.NovaCompute: Went to status ERROR due to "Message: Exceeded
>>>>>>>     maximum number of retries. Exceeded max scheduling attempts 3 for
>>>>>>>     instance 0227f7c1-3c2b-4e10-93bf-e7d84a7aca71. Last
>>>>>>>     exception: Port b8:ca:3a:66:ef:5a is still in use.
>>>>>>>
>>>>>>>     The funny thing is that I can't find the offending ID anywhere: it's
>>>>>>>     neither an Ironic node ID nor a Nova one.
>>>>>>>
>>>>>>>     Can you help me look in the right direction?
>>>>>>>
>>>>>>>     Many thanks,
>>>>>>>
>>>>>>>     --
>>>>>>>     Raoul Scarazzini
>>>>>>>     rasca at redhat.com
>>>>>>>
>>>>>>>     _______________________________________________
>>>>>>>     Rdo-list mailing list
>>>>>>>     Rdo-list at redhat.com
>>>>>>>     https://www.redhat.com/mailman/listinfo/rdo-list
>>>>>>>
>>>>>>>     To unsubscribe: rdo-list-unsubscribe at redhat.com
>>>>>>>