----- Original Message -----
On 01/26/2014 12:01 AM, Perry Myers wrote:
> Ok, I've been chasing down some networking issues along with some other
> folks. Here's what I'm seeing:
>
> Starting with a vanilla F20 cloud image running on a F20 host, clone
> devstack into it and run stack.sh.
>
> First thing is that the RabbitMQ server issue I noted a few weeks ago is
> still intermittently there. So during the step where rabbitmqctl is run
> to set the password of the rabbit admin user, it might fail and all
> subsequent AMQP communication fails which makes a lot of the nova
> commands in devstack also fail.
>
> But... if you get past this error (since it is intermittent), then
> devstack seems to complete successfully. Standard commands like nova
> list, keystone user-list, etc. all work fine.
>
> I did note though that access to Horizon does not work. I need to
> investigate this further.
>
> But worse than that, when you run nova boot, the host-to-guest
> networking (remember this is devstack running in a VM) immediately gets
> disconnected. This issue is 100% reproducible and multiple users are
> reporting it (tsedovic, eharney, bnemec cc'd).
>
> I did some investigation when this happens and here's what I found...
>
> If I do:
>
> $ brctl delif br100 eth0
>
> I was immediately able to ping the guest from the host and vice versa.
>
> If I reattach eth0 to br100, networking stops again.
>
> Another thing... I notice that on the system br100 does not have an ip
> address, but eth0 does. I thought when doing bridged networking like
> this, the bridge should have the ip address and the physical iface that
> is attached to the bridge does not get an ip addr.
>
> So... I tweaked /etc/sysconfig/network-scripts/ifcfg-eth0 to remove the
> dhcp from the bootproto line and I copied ifcfg-eth0 to ifcfg-br100,
> allowing it to use bootproto dhcp.
>
> I brought both ifaces down and then brought them both up, eth0 first
> and br100 second.
>
> This time, br100 got the dhcp address from the host and networking
> worked fine.
>
> So is this just an issue with how nova is setting up bridges?
>
> Since this network disconnect didn't happen until nova launched a vm, I
> imagine this isn't a problem with devstack itself, but is likely an
> issue with Nova Networking somehow.
>
> Russell/DanS, is there any chance that all of the refactoring you did in
> Nova Networking very recently introduced a regression?

I suppose it's possible. You could try going back to before any of the
nova-network-objects patches went in. The first one to merge was:

    commit a8c73c7d3298589440579d67e0c5638981dd7718
    Merge: a1f6e85 aa40c8f
    Author: Jenkins <jenkins(a)review.openstack.org>
    Date:   Wed Jan 15 18:38:37 2014 +0000

        Merge "Make nova-network use Service object"

Try going back to before that and see if you get a different result. If
so, try using "git bisect" to find the offending commit.
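
The bisect suggested here could be started roughly as follows. This is only a sketch under assumptions: NOVA_DIR defaults to /opt/stack/nova (devstack's usual checkout location, not stated in this thread), the function name start_bisect is made up for illustration, and the current checkout is assumed to show the problem while the parent of the merge above is assumed to be good.

```shell
# Assumption: devstack cloned nova into /opt/stack/nova; override NOVA_DIR if not.
NOVA_DIR="${NOVA_DIR:-/opt/stack/nova}"

# First nova-network-objects merge, taken from the commit quoted above.
FIRST_MERGE=a8c73c7d3298589440579d67e0c5638981dd7718

# Hypothetical helper: begin a bisect between the commit just before the
# first merge (assumed good) and the current HEAD (assumed bad).
start_bisect() {
    cd "$NOVA_DIR" || return 1
    git bisect start
    git bisect bad HEAD
    git bisect good "${FIRST_MERGE}^"
}

# After each step git checks out a candidate commit. Restart the nova
# services, run nova boot, test host <-> guest connectivity, then mark it:
#   git bisect good    # networking survived nova boot
#   git bisect bad     # networking dropped after nova boot
# When git names the first bad commit, clean up with:
#   git bisect reset
```

Each step needs the nova services restarted against the checked-out commit before testing, otherwise the result reflects the previously running code.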
--
Russell Bryant

I am still unable to reproduce the networking issue in my environment. I booted a stock
Fedora 20 cloud image, installed git, cloned devstack and ran it with a minimal localrc
configuration (so using the defaults of nova-network and rabbitmq). Other than the
rabbitmq race issue that always makes my first stack.sh run on a new VM fail, I had no
problem completing stack.sh and booting a nova instance. The instance's IP was
correctly moved to the bridge for me. If this is a regression in nova-network, then it
only shows up in combination with some other circumstance that isn't present for me.

As I mentioned in our off-list discussion of this, I run my development VMs in a
local OpenStack installation I have, so maybe there's some difference in the way the
networking works there. In any case, while I don't have an answer, hopefully another
data point will help in figuring this out.
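
To compare environments, it may help to record which interface holds the IPv4 address before and after nova boot. A minimal sketch, assuming the interface names eth0 and br100 from this thread and that the ip and brctl tools are installed; the helper names parse_ipv4 and report_bridge_state are made up for illustration.

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# the address/prefix column (field 4), one address per line.
parse_ipv4() {
    awk '$3 == "inet" { print $4 }'
}

# Hypothetical helper: show where the IPv4 address currently lives and
# which ports are attached to the bridge.
report_bridge_state() {
    bridge="${1:-br100}"
    nic="${2:-eth0}"
    echo "$nic: $(ip -o -4 addr show dev "$nic" 2>/dev/null | parse_ipv4)"
    echo "$bridge: $(ip -o -4 addr show dev "$bridge" 2>/dev/null | parse_ipv4)"
    brctl show "$bridge"
}
```

Running `report_bridge_state br100 eth0` once after stack.sh and again after nova boot would show whether the address moved to the bridge (what I see) or stayed on eth0 (what Perry describes).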
-Ben