[rdo-list] Problem with ha-router
Cedric Lecomte
clecomte at redhat.com
Mon Oct 23 08:23:43 UTC 2017
Hello all,
I tried to deploy RDO Pike without containers on our internal platform.
The setup is pretty simple :
- 3 Controller in HA
- 5 Ceph
- 4 Compute
- 3 Object-Store
I didn't use any exotic parameters.
This is my deployment command:
openstack overcloud deploy --templates \
  -e environement.yaml \
  --ntp-server 0.pool.ntp.org \
  -e storage-env.yaml \
  -e network-env.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph.yaml \
  --control-scale 3 --control-flavor control \
  --compute-scale 4 --compute-flavor compute \
  --ceph-storage-scale 5 --ceph-storage-flavor ceph-storage \
  --swift-storage-flavor swift-storage --swift-storage-scale 3 \
  -e scheduler_hints_env.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml
*environnement.yaml :*
parameter_defaults:
  ControllerCount: 3
  ComputeCount: 4
  CephStorageCount: 5
  OvercloudCephStorageFlavor: ceph-storage
  CephDefaultPoolSize: 3
  ObjectStorageCount: 3
*network-env.yaml :*
resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml
  OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/templates/nic-configs/ceph-storage.yaml
  OS::TripleO::ObjectStorage::Net::SoftwareConfig: /home/stack/templates/nic-configs/swift-storage.yaml
parameter_defaults:
  InternalApiNetCidr: 172.16.0.0/24
  TenantNetCidr: 172.17.0.0/24
  StorageNetCidr: 172.18.0.0/24
  StorageMgmtNetCidr: 172.19.0.0/24
  ManagementNetCidr: 172.20.0.0/24
  ExternalNetCidr: 10.41.11.0/24
  InternalApiAllocationPools: [{'start': '172.16.0.10', 'end': '172.16.0.200'}]
  TenantAllocationPools: [{'start': '172.17.0.10', 'end': '172.17.0.200'}]
  StorageAllocationPools: [{'start': '172.18.0.10', 'end': '172.18.0.200'}]
  StorageMgmtAllocationPools: [{'start': '172.19.0.10', 'end': '172.19.0.200'}]
  ManagementAllocationPools: [{'start': '172.20.0.10', 'end': '172.20.0.200'}]
  # Leave room for floating IPs in the External allocation pool
  ExternalAllocationPools: [{'start': '10.41.11.10', 'end': '10.41.11.30'}]
  # Set to the router gateway on the external network
  ExternalInterfaceDefaultRoute: 10.41.11.254
  # Gateway router for the provisioning network (or Undercloud IP)
  ControlPlaneDefaultRoute: 192.168.131.253
  # The IP address of the EC2 metadata server. Generally the IP of the Undercloud
  EC2MetadataIp: 192.0.2.1
  # Define the DNS servers (maximum 2) for the overcloud nodes
  DnsServers: ["10.38.5.26"]
  InternalApiNetworkVlanID: 202
  StorageNetworkVlanID: 203
  StorageMgmtNetworkVlanID: 204
  TenantNetworkVlanID: 205
  ManagementNetworkVlanID: 206
  ExternalNetworkVlanID: 198
  NeutronExternalNetworkBridge: "''"
  ControlPlaneSubnetCidr: '24'
  BondInterfaceOvsOptions: "mode=balance-xor"
*storage-env.yaml :*
parameter_defaults:
  ExtraConfig:
    ceph::profile::params::osds:
      '/dev/sdb': {}
      '/dev/sdc': {}
      '/dev/sdd': {}
      '/dev/sde': {}
      '/dev/sdf': {}
      '/dev/sdg': {}
  SwiftRingBuild: false
  RingBuild: false
*scheduler_hints_env.yaml*
parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'control-%index%'
  NovaComputeSchedulerHints:
    'capabilities:node': 'compute-%index%'
  CephStorageSchedulerHints:
    'capabilities:node': 'ceph-storage-%index%'
  ObjectStorageSchedulerHints:
    'capabilities:node': 'swift-storage-%index%'
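For reference, the %index% placeholder in the scheduler hints above is replaced with each node's index within its role, so every hint resolves to one capability value per node. A minimal Python sketch of that expansion, using the node counts from the deploy command; the helper name expand_hint and the ROLES table are illustrative only and not part of TripleO:

```python
# Illustrative only: mimics how '%index%' in a scheduler hint resolves
# to one value per node. Counts are taken from the deploy command above.
ROLES = {
    "control-%index%": 3,
    "compute-%index%": 4,
    "ceph-storage-%index%": 5,
    "swift-storage-%index%": 3,
}


def expand_hint(pattern, count):
    """Return the capability value expected for each node index."""
    return [pattern.replace("%index%", str(i)) for i in range(count)]


if __name__ == "__main__":
    for pattern, count in ROLES.items():
        print(", ".join(expand_hint(pattern, count)))
```

Each resulting value (control-0, control-1, and so on) is expected to match a "node" capability set on a bare-metal node, so a missing or duplicated value would be one thing to rule out when placement looks odd.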
After using the deployment for a while, I found that one controller is unable
to host an active HA router, and I got this output:
neutron l3-agent-list-hosting-router XXX
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| 420a7e31-bae1-4f8c-9438-97839cf190c4 | overcloud-controller-0.localdomain | True           | :-)   | standby  |
| 6a943aa5-6fd1-4b44-8557-f0043b266a2f | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| dd66ef16-7533-434f-bf5b-25e38c51375f | overcloud-controller-2.localdomain | True           | :-)   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+
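To spot this condition across many routers, the table output can be checked mechanically. The following is a minimal Python sketch, assuming the CLI output has been captured as text; the function names parse_table and has_active_agent are hypothetical, and the sample rows are the ones shown above:

```python
# Sketch: parse the ASCII table printed by the neutron CLI and report
# whether any L3 agent hosting the router is in the "active" HA state.
SAMPLE = """
| id                                   | host                               | admin_state_up | alive | ha_state |
| 420a7e31-bae1-4f8c-9438-97839cf190c4 | overcloud-controller-0.localdomain | True           | :-)   | standby  |
| 6a943aa5-6fd1-4b44-8557-f0043b266a2f | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| dd66ef16-7533-434f-bf5b-25e38c51375f | overcloud-controller-2.localdomain | True           | :-)   | standby  |
"""


def parse_table(text):
    """Turn '|'-delimited rows into dicts keyed by the header row."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blanks
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def has_active_agent(rows):
    """True if at least one agent reports ha_state == 'active'."""
    return any(row.get("ha_state") == "active" for row in rows)


if __name__ == "__main__":
    agents = parse_table(SAMPLE)
    if not has_active_agent(agents):
        print("no active agent among:", [a["host"] for a in agents])
```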
So each time a router is scheduled on this controller, I can't get an active
router. I tried comparing the configurations, but everything seems to be
fine. I redeployed to see if it would help, and the only thing that changed is
which controller the HA routers get stuck on.
The only message I got is from OVS:
2017-10-20 08:38:44.930 136145 WARNING neutron.agent.rpc [req-0ad9aec4-f718-498f-9ca7-15b265340174 - - - - -] Device Port(admin_state_up=True,allowed_address_pairs=[],binding=PortBinding,binding_levels=[],created_at=2017-10-20T08:38:38Z,data_plane_status=<?>,description='',device_id='a7e23552-9329-4572-a69d-d7f316fcc5c9',device_owner='network:router_ha_interface',dhcp_options=[],distributed_binding=None,dns=None,fixed_ips=[IPAllocation],id=7b6d81ef-0451-4216-9fe5-52d921052cb7,mac_address=fa:16:3e:13:e9:3c,name='HA port tenant 0ee0af8e94044a42923873939978ed42',network_id=ffe5ffa5-2693-4d35-988e-7290899601e0,project_id='',qos_policy_id=None,revision_number=5,security=PortSecurity(7b6d81ef-0451-4216-9fe5-52d921052cb7),security_group_ids=set([]),status='DOWN',updated_at=2017-10-20T08:38:44Z) is not bound.
2017-10-20 08:38:44.944 136145 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-0ad9aec4-f718-498f-9ca7-15b265340174 - - - - -] Device 7b6d81ef-0451-4216-9fe5-52d921052cb7 not defined on plugin or binding failed
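The interesting fields in the first warning are easy to lose in the long Port(...) dump. As a minimal sketch, assuming the log line is available as a string, the following Python pulls out the port ID, owner, network and status with a regular expression; the names FIELD_RE and port_fields are hypothetical, and the sample is the warning above with unrelated fields left in place:

```python
import re

# The first OVS agent warning from above, joined into one string.
LOG_LINE = (
    "2017-10-20 08:38:44.930 136145 WARNING neutron.agent.rpc "
    "[req-0ad9aec4-f718-498f-9ca7-15b265340174 - - - - -] Device "
    "Port(admin_state_up=True,allowed_address_pairs=[],binding=PortBinding,"
    "binding_levels=[],created_at=2017-10-20T08:38:38Z,data_plane_status=<?>,"
    "description='',device_id='a7e23552-9329-4572-a69d-d7f316fcc5c9',"
    "device_owner='network:router_ha_interface',dhcp_options=[],"
    "distributed_binding=None,dns=None,fixed_ips=[IPAllocation],"
    "id=7b6d81ef-0451-4216-9fe5-52d921052cb7,mac_address=fa:16:3e:13:e9:3c,"
    "name='HA port tenant 0ee0af8e94044a42923873939978ed42',"
    "network_id=ffe5ffa5-2693-4d35-988e-7290899601e0,project_id='',"
    "qos_policy_id=None,revision_number=5,"
    "security=PortSecurity(7b6d81ef-0451-4216-9fe5-52d921052cb7),"
    "security_group_ids=set([]),status='DOWN',"
    "updated_at=2017-10-20T08:38:44Z) is not bound."
)

# A key only counts when it directly follows '(' or ',', so "id" does not
# match inside "device_id" or "project_id".
FIELD_RE = re.compile(
    r"[(,](device_owner|network_id|status|id)='?([^',)]+)'?"
)


def port_fields(line):
    """Return the selected key/value pairs found in a Port(...) dump."""
    return dict(FIELD_RE.findall(line))


if __name__ == "__main__":
    for key, value in sorted(port_fields(LOG_LINE).items()):
        print(f"{key}: {value}")
```

The extracted port ID is the same one named in the second warning, which confirms that both messages refer to the HA interface port of the router.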
Any ideas?
--
LECOMTE Cedric
Senior Software Engineer
Red Hat
<https://www.redhat.com>
clecomte at redhat.com