Re: [Rdo-list] Defects in Kilo RC2 Packaging
by Steven Dake (stdake)
From: Steven Dake <stdake(a)cisco.com>
Date: Sunday, May 3, 2015 at 10:15 AM
To: "rdo-list(a)redhat.com" <rdo-list(a)redhat.com>
Subject: [Rdo-list] Defects in Kilo RC2 Packaging
Hi,
I recently ported Kolla (OpenStack running in containers – http://github.com/stackforge/kolla) and found the following defects:
1. Glance has missing dependencies in its package
Specifically, the line
+RUN yum -y install openstack-glance python-oslo-log python-oslo-policy && yum clean all
is needed to get Glance to operate. oslo-log and oslo-policy should be
added to the dependencies. You wouldn't notice this on an AIO install
because other packages probably have those packages as dependencies.
2. Neutron for whatever reason depends on a file fwaas_driver.ini, which
has been removed from the master of neutron. But the agents will exit if
it's not in the config directory. I used Juno's version of
fwaas_driver.ini to get the agents to stop exiting.
3. The file dnsmasq-neutron.conf is misconfigured in the default
installation. This causes the neutron agents to exit. I delete the file
during docker build, which fixes the problem. I'm not sure what this
config file is supposed to look like.
I found the root cause of this problem. This was actually an error in Kolla. dnsmasq-neutron.conf (where you would specify a 1450 MTU if using vlans) cannot go in /etc/neutron, the directory where the agents read config files when started with the --config-dir option. The agents try to read every file there as an INI-format configuration file, which the dnsmasq file is not.
4. A critical bug was found in both the Juno and Kilo versions of nova. If I
launch approximately 20 VMs via a heat resource group with floating IPs,
only about 7 of the VMs appear to get ports assigned. The others do get
their ports assigned, because they can access the DHCP and metadata
servers, so their networking is operational. neutron port-list shows their
ports are active. However, nova list does not show their IPs from the
instance info cache.
My only workaround to this problem is to run the Icehouse version of nova
(api, conductor, scheduler, compute), which works perfectly. I have filed
a bug with a 100% reliable, easy-to-use reproducer and more details and
logs here:
https://bugzilla.redhat.com/show_bug.cgi?id=1213547
Interestingly, in my informal tests Icehouse nova is about 4x faster at
placing VMs in the active state than Juno or Kilo, so that may
need some attention as well. Just watching top, it appears neutron-server
is much busier (~35% CPU utilization of 1 core during the entire ->ACTIVE
process) with the Juno/Kilo releases.
Note I spent about 7 days trying to debug this problem but the code
literally calls IP assignments in about 40 different places in the code
base, including exchanges over RPC and python-neutronclient, so it is very
difficult to track. I would appreciate finding a nova expert to debug the
problem further.
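To illustrate the root cause noted under item 3: a dnsmasq options file has
no section headers, so an INI parser rejects it. The sketch below uses
Python's standard configparser as a stand-in for the agents' own config
loader (an assumption; the agents do not use configparser), and both file
contents are made-up samples rather than the files from this report.

```python
import configparser

# Hypothetical sample contents, for illustration only.
INI_SAMPLE = "[DEFAULT]\nverbose = True\n"
DNSMASQ_SAMPLE = "dhcp-option-force=26,1450\n"  # DHCP option 26 = interface MTU


def parses_as_ini(text):
    """Return True if text can be read as an INI file, False otherwise."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error:
        # Raised, for example, when a key appears before any [section] header.
        return False
    return True


if __name__ == "__main__":
    print("neutron-style file:", parses_as_ini(INI_SAMPLE))
    print("dnsmasq-style file:", parses_as_ini(DNSMASQ_SAMPLE))
```

Keeping the dnsmasq file outside the directory passed to --config-dir avoids
handing it to the INI parser at all.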
Other than those problems, RDO Kilo RC2 looks spectacular and works
perfectly in my dead chicken testing. Nice job guys!
Regards
-steve
9 years, 8 months
[Rdo-list] neutron-openvswitch-agent without ping lost
by Chris
Hello,
We made some changes to "/etc/neutron/neutron.conf" on our compute nodes,
for example to qpid_hostname, but nothing that affects the network
infrastructure on the compute node.
To apply the changes, I think we need to restart the
"neutron-openvswitch-agent" service.
Restarting this service disconnects the VM for around one ping. The
reason is that the restart recreates the int-br-bond0 and phy-br-bond0
interfaces:
ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=10 -- --may-exist add-br br-int
ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=10 -- set-fail-mode br-int secure
ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int patch-tun
ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int int-br-bond0
kernel: [73873.047999] device int-br-bond0 left promiscuous mode
ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-bond0 phy-br-bond0
kernel: [73873.086241] device phy-br-bond0 left promiscuous mode
ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=10 -- --may-exist add-port br-int int-br-bond0
kernel: [73873.287466] device int-br-bond0 entered promiscuous mode
ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=10 -- --may-exist add-port br-bond0 phy-br-bond0
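The kernel timestamps in the log above give a rough idea of how long an
interface was gone. The sketch below pairs each "left promiscuous mode"
line with the next "entered promiscuous mode" line for the same device.
The function name and regular expression are illustrative, and the gap is
only a proxy for the interruption, not a measurement of lost traffic.

```python
import re

# The three kernel lines quoted in the log above.
KERNEL_LOG = """\
kernel: [73873.047999] device int-br-bond0 left promiscuous mode
kernel: [73873.086241] device phy-br-bond0 left promiscuous mode
kernel: [73873.287466] device int-br-bond0 entered promiscuous mode
"""

LINE_RE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\] device (?P<dev>\S+) "
    r"(?P<event>left|entered) promiscuous mode"
)


def promisc_gaps(log_text):
    """Map each device to the seconds between 'left' and the next 'entered'."""
    left_at = {}
    gaps = {}
    for match in LINE_RE.finditer(log_text):
        dev, ts = match["dev"], float(match["ts"])
        if match["event"] == "left":
            left_at[dev] = ts
        elif dev in left_at:
            gaps[dev] = ts - left_at.pop(dev)
    return gaps


if __name__ == "__main__":
    for dev, gap in promisc_gaps(KERNEL_LOG).items():
        print(f"{dev}: about {gap:.3f} s between removal and re-creation")
```

For this excerpt the gap for int-br-bond0 is roughly 0.24 s. The excerpt
contains no "entered" line for phy-br-bond0, so no gap is reported for it.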
Is there a way to apply these changes without losing pings?
Cheers
Chris
9 years, 8 months
[Rdo-list] Instance auto resume after compute node restart
by Chris
Hello,
We want instances to automatically resume their previous state after a
compute node reboot or failure. That is, a VM that was running before the
reboot should be started again automatically. We are using Icehouse.
There is the option resume_guests_state_on_host_boot=true|false, which
should do exactly what we want:
# Whether to start guests that were running before the host
# rebooted (boolean value)
resume_guests_state_on_host_boot=true
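A quick way to confirm that the option is actually set in the file nova
reads is to parse it. This is a minimal sketch that assumes the option
lives in the [DEFAULT] section and uses Python's configparser rather than
nova's own config loader; the sample text is a made-up excerpt.

```python
import configparser

# Hypothetical nova.conf excerpt; a real file is much longer.
SAMPLE_NOVA_CONF = """\
[DEFAULT]
# Whether to start guests that were running before the host
# rebooted (boolean value)
resume_guests_state_on_host_boot=true
"""


def resume_enabled(conf_text):
    """Return True if resume_guests_state_on_host_boot is set to a true value."""
    # interpolation=None keeps literal '%' characters from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean(
        "DEFAULT", "resume_guests_state_on_host_boot", fallback=False
    )


if __name__ == "__main__":
    print(resume_enabled(SAMPLE_NOVA_CONF))
```

This only checks the configuration text. It does not show whether the
compute service acted on the option at boot.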
I tried it out and it just didn’t work. Libvirt fails to start the VMs
because it couldn’t find the interfaces:
2015-04-30 06:16:00.783+0000: 3091: error : virNetDevGetMTU:343 : Cannot get interface MTU on 'qbr62d7e489-f8': No such device
2015-04-30 06:16:00.897+0000: 3091: warning : qemuDomainObjStart:6144 : Unable to restore from managed state /var/lib/libvirt/qemu/save/instance-0000025f.save. Maybe the file is corrupted?
I did some research and found some corresponding experiences from other
users:
“AFAIK at the present time OpenStack (Icehouse) still not completely
aware about environments inside it, so it can't restore completely after
reboot.”
Source:
http://stackoverflow.com/questions/23150148/how-to-get-instances-back-aft...
Is this feature really broken, or am I just missing something?
Thanks in advance!
Cheers
Chris
9 years, 8 months
[Rdo-list] Kilo RC2 issue with Ceilometer (Resources Usage in Dashboard)
by Arash Kaffamanesh
On a 2-node install of Kilo RC2, Ceilometer Resource Usage doesn't work
(screenshot attached).
[root@csky01 ~]# tail -f /var/log/ceilometer/compute.log
2015-05-01 16:41:18.038 19053 TRACE ceilometer.coordination
2015-05-01 16:41:18.039 19053 ERROR ceilometer.coordination [-] Error sending a heartbeat to coordination backend.
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination Traceback (most recent call last):
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination File "/usr/lib/python2.7/site-packages/ceilometer/coordination.py", line 105, in heartbeat
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination self._coordinator.heartbeat()
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 378, in heartbeat
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination value=b"Not dead!")
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination self.gen.throw(type, value, traceback)
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 77, in _translate_failures
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination raise coordination.ToozConnectionError(utils.exception_message(e))
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination ToozConnectionError: Error 111 connecting to 20.0.0.11:6379. ECONNREFUSED.
2015-05-01 16:41:18.039 19053 TRACE ceilometer.coordination
2015-05-01 16:41:19.038 19053 ERROR ceilometer.coordination [-] Error connecting to coordination backend.
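The ECONNREFUSED in the traceback means nothing accepted a TCP connection
on 20.0.0.11:6379 at that moment. A minimal sketch for checking this from
the compute node, independent of ceilometer and tooz, is below. The
function name and timeout are illustrative choices.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


# Example, using the address from the log above:
#     can_connect("20.0.0.11", 6379)
```

A False result points at the Redis service or a firewall on that host
rather than at ceilometer itself, though it cannot tell the two apart.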
Thx,
Arash
9 years, 8 months
[Rdo-list] Update to Quickstart guide
by Will Yonker
Hi,
I did an install of RDO on one of my Linux boxes. Our
default setup has root logon through SSH disabled. Perhaps it should
be mentioned in the prerequisites section of the quickstart guide that
root logon should be enabled in /etc/ssh/sshd_config (PermitRootLogin
yes)?
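A pre-flight check for this setting could look like the sketch below. It
is a simplified reading of sshd_config: it takes the first PermitRootLogin
line, skips comments, and stops at the first Match block, since settings
after that are conditional. The function name is illustrative, and it
returns None when the file does not set the option, because the built-in
default depends on the OpenSSH version.

```python
def permit_root_login(config_text):
    """Return the first global PermitRootLogin value, or None if unset."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # sshd_config keywords are case-insensitive
        if keyword == "match":
            break  # everything below applies only conditionally
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip().lower()
    return None


if __name__ == "__main__":
    with open("/etc/ssh/sshd_config") as handle:
        print(permit_root_login(handle.read()))
```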
---
Will Y.
9 years, 8 months
[Rdo-list] Kilo released - what's new?
by Rich Bowen
As you no doubt know by now, OpenStack Kilo was released yesterday. You can
read the release notes at
https://wiki.openstack.org/wiki/ReleaseNotes/Kilo (there's a lot there).
Now that the release is out and you have nothing else to do (ha, ha), I
was wondering if a few people would be willing to spend 10-15 minutes
talking with me about the various projects, and what the really great
new bits are. If we can do this as Google hangouts, I can record them,
and then write up a series of blog posts about each of the projects and
what people should get excited about in Kilo.
If you could get in touch with me to claim a topic, that would be great,
otherwise I'll start hunting you down as I work through the list.
So, here's the list of topics that I'd like to cover, just working
through the release notes:
* Swift
* Nova
* Glance
* Horizon
* Keystone
* Neutron
* Cinder
* Ceilometer
* Heat
* Trove
* Ironic
* Documentation
I'll also probably be bugging people on IRC.
Thanks!
--
Rich Bowen - rbowen(a)redhat.com
OpenStack Community Liaison
http://rdoproject.org/
9 years, 8 months
Re: [Rdo-list] FW: RDO build that passed CI (rc2)
by Arash Kaffamanesh
I did a fresh CentOS install with the following steps for AIO:
yum -y update
cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)
yum install http://rdoproject.org/repos/openstack-kilo/rdo-release-kilo.rpm
yum install epel-release
cd /etc/yum.repos.d/
curl -O https://repos.fedorapeople.org/repos/openstack/openstack-trunk/epel-7/rc2...
yum install openstack-packstack
setenforce 0
packstack --allinone
and got again:
Error: nmcli (1.0.0) and NetworkManager (0.9.9.1) versions don't match.
Force execution using --nocheck, but the results are unpredictable.
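The error reports two version strings. The sketch below only pulls them
out of the message and compares them as integer tuples, to make the
mismatch explicit; it says nothing about the cause. The function name and
regular expression are illustrative.

```python
import re

# The error text quoted above.
ERROR_TEXT = (
    "Error: nmcli (1.0.0) and NetworkManager (0.9.9.1) versions don't match. "
    "Force execution using --nocheck, but the results are unpredictable."
)


def parse_versions(message):
    """Return (nmcli_version, networkmanager_version) as tuples of ints."""
    match = re.search(
        r"nmcli \(([\d.]+)\) and NetworkManager \(([\d.]+)\)", message
    )
    if match is None:
        return None
    return tuple(
        tuple(int(part) for part in version.split("."))
        for version in match.groups()
    )


if __name__ == "__main__":
    nmcli, daemon = parse_versions(ERROR_TEXT)
    print("nmcli newer than NetworkManager:", nmcli > daemon)
```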
But if I skip the yum update and install AIO, it finishes successfully, and
I can yum update afterwards.
So if nobody can reproduce this issue, then something is wrong with my base
CentOS install. I'll try to install the latest CentOS from ISO now.
Thanks!
Arash
On Fri, May 1, 2015 at 12:42 AM, Arash Kaffamanesh <ak(a)cloudssky.com> wrote:
> I'm installing CentOS with cobbler and kickstart (from centos7-mini) on 2
> machines
> and I'm trying a 2 node install. With rc1 it worked without yum update.
> I'll do a fresh install now with yum update and let you know.
>
> Thanks!
> Arash
>
>
>
> On Fri, May 1, 2015 at 12:23 AM, Alan Pevec <apevec(a)gmail.com> wrote:
>
>> 2015-05-01 0:12 GMT+02:00 Arash Kaffamanesh <ak(a)cloudssky.com>:
>> > But if I yum update it into 7.1, then we have the issue with nmcli:
>> >
>> > Error: nmcli (1.0.0) and NetworkManager (0.9.9.1) versions don't match.
>> > Force execution using --nocheck, but the results are unpredictable.
>>
>> Huh, again?! I thought that was solved after you did yum update...
>> My original answer to that is still the same "Not sure how could that
>> happen, nmcli is part of NetworkManager RPM."
>> Can you reproduce this w/o RDO in the picture, starting with the clean
>> centos installation? How are you installing centos?
>>
>> Cheers,
>> Alan
>>
>
>
9 years, 8 months