Hi Marius,
your assumption is totally right: the external network is going to live
just on the controllers.
What I don't understand is why it's configured on the compute nodes. My
compute.yaml looks like this [1] and, as you can see, there's no external
network declaration, and in any case "use_dhcp" is already set to false
for the main ovs_bridge. So why is it being configured anyway?
Many thanks,
[1]
--
Raoul Scarazzini
rasca(a)redhat.com
On 9/2/2016 11:36:02, Marius Cornea wrote:
Hi Raoul,
Thanks for the output. Can you confirm the purpose of the
10.1.241.0/24 subnet?
I'm assuming it's used for the external network. In that case it
shouldn't be set on the compute nodes, as they don't require
connectivity on that network. I believe it gets configured via DHCP,
but I'm not really sure why the connectivity check fails at validation
time. Can you try disabling DHCP for the em1 interface in the compute
NIC template and see the result? You can add something like this to
the os_net_config network_config:
  -
    type: interface
    name: nic1
    use_dhcp: false
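
To double-check which interfaces the rendered config actually declares,
something like the following Python sketch could be used. It assumes the
JSON layout shown in /etc/os-net-config/config.json in this thread; the
function name declared_interfaces and the trimmed sample data are only
illustrative and not part of os-net-config.

```python
import json


def declared_interfaces(network_config):
    """Return the names of all 'interface' entries, including bridge members."""
    names = []
    for entry in network_config:
        if entry.get("type") == "interface" and "name" in entry:
            names.append(entry["name"])
        # Bridges list their ports under "members", so recurse into them.
        names.extend(declared_interfaces(entry.get("members", [])))
    return names


# Trimmed copy of the structure posted in this thread (illustrative only).
sample = json.loads("""
{
  "network_config": [
    {
      "type": "ovs_bridge",
      "name": "br-ex",
      "use_dhcp": false,
      "members": [
        {"type": "interface", "name": "nic2", "primary": true},
        {"type": "vlan", "vlan_id": 2201},
        {"type": "vlan", "vlan_id": 2203},
        {"type": "vlan", "vlan_id": 2202}
      ]
    }
  ]
}
""")

declared = declared_interfaces(sample["network_config"])
print(declared)
print("nic1 declared:", "nic1" in declared)
```

If nic1 does not show up in that list, the template never mentions it,
so the "use_dhcp: false" on the bridge would not apply to it.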
Thanks,
Marius
On Tue, Feb 9, 2016 at 7:58 AM, Raoul Scarazzini <rasca(a)redhat.com> wrote:
> Hi Marius,
> here it is:
>
> # cat /etc/os-net-config/config.json | python -m json.tool
> {
>     "network_config": [
>         {
>             "addresses": [
>                 {
>                     "ip_netmask": "192.0.2.21/24"
>                 }
>             ],
>             "dns_servers": [
>                 "10.1.241.2"
>             ],
>             "members": [
>                 {
>                     "name": "nic2",
>                     "primary": true,
>                     "type": "interface"
>                 },
>                 {
>                     "addresses": [
>                         {
>                             "ip_netmask": "172.17.0.12/24"
>                         }
>                     ],
>                     "type": "vlan",
>                     "vlan_id": 2201
>                 },
>                 {
>                     "addresses": [
>                         {
>                             "ip_netmask": "172.18.0.11/24"
>                         }
>                     ],
>                     "type": "vlan",
>                     "vlan_id": 2203
>                 },
>                 {
>                     "addresses": [
>                         {
>                             "ip_netmask": "172.16.0.10/24"
>                         }
>                     ],
>                     "type": "vlan",
>                     "vlan_id": 2202
>                 }
>             ],
>             "name": "br-ex",
>             "routes": [
>                 {
>                     "ip_netmask": "169.254.169.254/32",
>                     "next_hop": "192.0.2.1"
>                 },
>                 {
>                     "default": true,
>                     "next_hop": "192.0.2.1"
>                 }
>             ],
>             "type": "ovs_bridge",
>             "use_dhcp": false
>         }
>     ]
> }
>
> # ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2: em3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
>     link/ether b8:ca:3a:66:f1:b4 brd ff:ff:ff:ff:ff:ff
> 3: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
>     link/ether b8:ca:3a:66:f1:b5 brd ff:ff:ff:ff:ff:ff
> 4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>     link/ether b8:ca:3a:66:f1:b0 brd ff:ff:ff:ff:ff:ff
>     inet 10.1.241.9/24 brd 10.1.241.255 scope global dynamic em1
>        valid_lft 530sec preferred_lft 530sec
>     inet6 fe80::baca:3aff:fe66:f1b0/64 scope link
>        valid_lft forever preferred_lft forever
> 5: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP qlen 1000
>     link/ether b8:ca:3a:66:f1:b2 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::baca:3aff:fe66:f1b2/64 scope link
>        valid_lft forever preferred_lft forever
> 6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>     link/ether 06:65:08:31:11:35 brd ff:ff:ff:ff:ff:ff
> 7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
>     link/ether b8:ca:3a:66:f1:b2 brd ff:ff:ff:ff:ff:ff
>     inet 192.0.2.21/24 brd 192.0.2.255 scope global br-ex
>        valid_lft forever preferred_lft forever
>     inet6 fe80::baca:3aff:fe66:f1b2/64 scope link
>        valid_lft forever preferred_lft forever
> 8: vlan2203: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
>     link/ether 96:6a:10:5b:1a:47 brd ff:ff:ff:ff:ff:ff
>     inet 172.18.0.11/24 brd 172.18.0.255 scope global vlan2203
>        valid_lft forever preferred_lft forever
>     inet6 fe80::946a:10ff:fe5b:1a47/64 scope link
>        valid_lft forever preferred_lft forever
> 9: vlan2202: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
>     link/ether ca:e0:0d:b0:7e:30 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.0.10/24 brd 172.16.0.255 scope global vlan2202
>        valid_lft forever preferred_lft forever
>     inet6 fe80::c8e0:dff:feb0:7e30/64 scope link
>        valid_lft forever preferred_lft forever
> 10: vlan2201: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
>     link/ether c2:24:5f:4c:37:6c brd ff:ff:ff:ff:ff:ff
>     inet 172.17.0.12/24 brd 172.17.0.255 scope global vlan2201
>        valid_lft forever preferred_lft forever
>     inet6 fe80::c024:5fff:fe4c:376c/64 scope link
>        valid_lft forever preferred_lft forever
> 11: br-floating: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
>     link/ether de:7f:cd:d7:c1:46 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::dc7f:cdff:fed7:c146/64 scope link
>        valid_lft forever preferred_lft forever
> 12: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>     link/ether 36:39:0f:39:85:4c brd ff:ff:ff:ff:ff:ff
> 13: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>     link/ether 72:d1:82:d9:15:4f brd ff:ff:ff:ff:ff:ff
>
> # ip r
> default via 10.1.241.254 dev em1
> 10.1.241.0/24 dev em1 proto kernel scope link src 10.1.241.9
> 169.254.169.254 via 192.0.2.1 dev br-ex
> 172.16.0.0/24 dev vlan2202 proto kernel scope link src 172.16.0.10
> 172.17.0.0/24 dev vlan2201 proto kernel scope link src 172.17.0.12
> 172.18.0.0/24 dev vlan2203 proto kernel scope link src 172.18.0.11
> 192.0.2.0/24 dev br-ex proto kernel scope link src 192.0.2.21
>
> Many thanks,
>
> --
> Raoul Scarazzini
> rasca(a)redhat.com
>
> On 8/2/2016 18:26:45, Marius Cornea wrote:
>> Hi Raoul,
>>
>> Can you post the output of the following commands on that compute node please?
>>
>> cat /etc/os-net-config/config.json | python -m json.tool
>> ip a
>> ip r
>>
>> Thanks
>>
>> On Mon, Feb 8, 2016 at 6:15 PM, Raoul Scarazzini <rasca(a)redhat.com> wrote:
>>> Just another update. I fixed the connectivity issue (nic1 and nic2 were
>>> inverted in the yaml files) but the setup fails anyway.
>>> The problem now (looking into the compute's
>>> /var/lib/heat-config/deployed directory) is this one:
>>>
>>> {
>>> "deploy_stdout": "Trying to ping 172.16.0.14 for local network
>>> 172.16.0.0/24...SUCCESS\nTrying to ping 172.17.0.16 for local network
>>> 172.17.0.0/24...SUCCESS\nTrying to ping 172.18.0.14 for local network
>>> 172.18.0.0/24...SUCCESS\nTrying to ping default gateway
>>> 10.1.241.254...FAILURE\n10.1.241.254 is not pingable.\n",
>>> "deploy_stderr": "",
>>> "deploy_status_code": 1
>>> }
>>>
>>> Funny thing is that I'm able to ping 10.1.241.254 from the compute
>>> nodes, so I'm wondering why it fails during the deployment.
>>>
>>> Could it be that something is not ready on the network side? If so, what?
>>> Can you give me some hints on how to debug this?
>>>
>>> --
>>> Raoul Scarazzini
>>> rasca(a)redhat.com
>>>
>>> On 8/2/2016 12:01:29, Raoul Scarazzini wrote:
>>>> Hi David,
>>>> you are absolutely right, I made too many assumptions. First of all, I'm
>>>> installing everything with rdo-manager, using a set of configurations
>>>> identical to the one from my previous (working) osp-director 8 setup.
>>>>
>>>> Things seem to fail during compute node verification. Specifically here:
>>>>
>>>> Feb 05 16:37:23 overcloud-novacompute-0 os-collect-config[6014]:
>>>> [2016-02-05 16:37:23,707] (heat-config) [ERROR] Error running
>>>> /var/lib/heat-config/heat-config-script/a435044e-9be8-42ea-8b03-92bee12b3d23.
>>>> [1]
>>>>
>>>> Looking at that script, I identified two different actions:
>>>>
>>>> 1) # For each unique remote IP (specified via Heat) we check to
>>>> # see if one of the locally configured networks matches and if so we
>>>> # attempt a ping test the remote network IP.
>>>>
>>>> 2) # Ping all default gateways. There should only be one
>>>> # if using upstream t-h-t network templates but we test
>>>> # all of them should some manual network config have
>>>> # multiple gateways.
>>>>
>>>> And in fact, after checking, I'm not able to reach the compute nodes
>>>> from the controllers or from other computes on the
>>>> InternalApiAllocationPools (172.17.0) or TenantAllocationPools (172.16.0)
>>>> addresses.
>>>>
>>>> I'm using a specific network setup, as I said the same one I was using
>>>> with osp-director 8. So I've got a specific network-management.yaml file
>>>> in which I've specified these settings:
>>>>
>>>> resource_registry:
>>>>   OS::TripleO::BlockStorage::Net::SoftwareConfig: /home/stack/nic-configs/cinder-storage.yaml
>>>>   OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/nic-configs/compute.yaml
>>>>   OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/nic-configs/controller.yaml
>>>>   OS::TripleO::ObjectStorage::Net::SoftwareConfig: /home/stack/nic-configs/swift-storage.yaml
>>>>   OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/nic-configs/ceph-storage.yaml
>>>>
>>>> parameter_defaults:
>>>>   # Customize the IP subnets to match the local environment
>>>>   InternalApiNetCidr: 172.17.0.0/24
>>>>   StorageNetCidr: 172.18.0.0/24
>>>>   StorageMgmtNetCidr: 172.19.0.0/24
>>>>   TenantNetCidr: 172.16.0.0/24
>>>>   ExternalNetCidr: 172.20.0.0/24
>>>>   ControlPlaneSubnetCidr: '24'
>>>>   InternalApiAllocationPools: [{'start': '172.17.0.10', 'end': '172.17.0.200'}]
>>>>   StorageAllocationPools: [{'start': '172.18.0.10', 'end': '172.18.0.200'}]
>>>>   StorageMgmtAllocationPools: [{'start': '172.19.0.10', 'end': '172.19.0.200'}]
>>>>   TenantAllocationPools: [{'start': '172.16.0.10', 'end': '172.16.0.200'}]
>>>>   ExternalAllocationPools: [{'start': '172.20.0.10', 'end': '172.20.0.200'}]
>>>>   # Specify the gateway on the external network.
>>>>   ExternalInterfaceDefaultRoute: 172.20.0.254
>>>>   # Gateway router for the provisioning network (or Undercloud IP)
>>>>   ControlPlaneDefaultRoute: 192.0.2.1
>>>>   # Generally the IP of the Undercloud
>>>>   EC2MetadataIp: 192.0.2.1
>>>>   DnsServers: ["10.1.241.2"]
>>>>   InternalApiNetworkVlanID: 2201
>>>>   StorageNetworkVlanID: 2203
>>>>   StorageMgmtNetworkVlanID: 2204
>>>>   TenantNetworkVlanID: 2202
>>>>   ExternalNetworkVlanID: 2205
>>>>   # Floating IP networks do not have to use br-ex, they can use any
>>>>   # bridge as long as the NeutronExternalNetworkBridge is set to "''".
>>>>   NeutronExternalNetworkBridge: "''"
>>>>
>>>> # Variables in "parameters" apply an actual value to one of the
>>>> # top-level params
>>>> parameters:
>>>>   # The OVS logical->physical bridge mappings to use. Defaults to
>>>>   # mapping br-ex - the external bridge on hosts - to a physical name
>>>>   # 'datacentre' which can be used
>>>>   # to create provider networks (and we use this for the default
>>>>   # floating network) - if changing this either use different post-install
>>>>   # network scripts or be sure
>>>>   # to keep 'datacentre' as a mapping network name.
>>>>   # Unfortunately this option is overridden by the command line, due to
>>>>   # a limitation (that will be fixed), so even declaring this won't have
>>>>   # any effect.
>>>>   # See overcloud-deploy.sh for all the explanations.
>>>>   #
>>>>   # https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud...
>>>>   NeutronBridgeMappings: "datacentre:br-floating"
>>>>
>>>> Obviously the controller.yaml was modified to reflect my needs, as
>>>> described here [1], and as you can see I haven't declared a
>>>> ManagementNetworkVlan, since it was not needed before.
>>>> So the first question is: what is this new network and how does it
>>>> differ from the other networks currently available? Can it
>>>> affect communications that were working before?
>>>>
>>>> Many thanks
>>>>
>>>> [1] https://github.com/rscarazz/openstack/blob/master/ospd-network-isolation-...
>>>>
>>>> --
>>>> Raoul Scarazzini
>>>> rasca(a)redhat.com
>>>>
>>>> On 6/2/2016 00:38:38, David Moreau Simard wrote:
>>>>> Hi Raoul,
>>>>>
>>>>> A good start would be to give us some more details about how you did
>>>>> the installation.
>>>>>
>>>>> What installation tool/procedure? What repositories?
>>>>>
>>>>> David Moreau Simard
>>>>> Senior Software Engineer | Openstack RDO
>>>>>
>>>>> dmsimard = [irc, github, twitter]
>>>>>
>>>>> On Feb 5, 2016 5:16 AM, "Raoul Scarazzini" <rasca(a)redhat.com> wrote:
>>>>>
>>>>> Hi,
>>>>> I'm trying to deploy Mitaka on a baremetal environment composed
>>>>> of 3 controllers and 4 computes.
>>>>> After introspection the nodes seem fine, even though for one of them I
>>>>> had to do the introspection by hand, since it was not completing the
>>>>> process. But in the end all 7 of my nodes were in state "available".
>>>>>
>>>>> When launching the overcloud deploy, the controller part goes fine,
>>>>> but then it gives me this error about compute:
>>>>>
>>>>> 2016-02-05 09:26:59 [NovaCompute]: CREATE_FAILED ResourceInError:
>>>>> resources.NovaCompute: Went to status ERROR due to "Message: Exceeded
>>>>> maximum number of retries. Exceeded max scheduling attempts 3 for
>>>>> instance 0227f7c1-3c2b-4e10-93bf-e7d84a7aca71. Last
>>>>> exception: Port b8:ca:3a:66:ef:5a is still in use.
>>>>>
>>>>> The funny thing is that I can't find the offending ID anywhere: it's
>>>>> neither an Ironic node ID nor a Nova one.
>>>>>
>>>>> Can you help point me in the right direction?
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> --
>>>>> Raoul Scarazzini
>>>>> rasca(a)redhat.com
>>>>>
>>>>> _______________________________________________
>>>>> Rdo-list mailing list
>>>>> Rdo-list(a)redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/rdo-list
>>>>>
>>>>> To unsubscribe: rdo-list-unsubscribe(a)redhat.com
>>>>>