On Tue, 2014-01-28 at 10:56 -0500, Ben Nemec wrote:
 
 ----- Original Message -----
 > On 01/26/2014 12:01 AM, Perry Myers wrote:
 > > Ok, I've been chasing down some networking issues along with some other
 > > folks.  Here's what I'm seeing:
 > > 
 > > Starting with a vanilla F20 cloud image running on a F20 host, clone
 > > devstack into it and run stack.sh.
 > > 
 > > First thing is that the RabbitMQ server issue I noted a few weeks ago is
 > > still intermittently there.  So during the step where rabbitmqctl is run
 > > to set the password of the rabbit admin user, it might fail, and then all
 > > subsequent AMQP communication fails, which makes a lot of the nova
 > > commands in devstack fail as well.
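 > >
 > > (That step is just the rabbitmqctl password call, something along the
 > > lines of:
 > >
 > >   $ rabbitmqctl change_password <rabbit_user> <rabbit_password>
 > >
 > > with whatever user/password devstack is configured for.)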
 > > 
 > > But... if you get past this error (since it is intermittent), then
 > > devstack seems to complete successfully.  Standard commands like nova
 > > list, keystone user-list, etc all work fine.
 > > 
 > > I did note though that access to Horizon does not work.  I need to
 > > investigate this further.
 > > 
 > > But worse than that: when you run nova boot, the host-to-guest
 > > networking (remember this is devstack running in a VM) immediately gets
 > > disconnected.  This issue is 100% reproducible and multiple users are
 > > reporting it (tsedovic, eharney, bnemec cc'd)
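 > >
 > > A plain boot is enough to trigger it, e.g. something like:
 > >
 > >   $ nova boot --flavor m1.tiny --image cirros-0.3.1-x86_64-uec test-vm
 > >
 > > (with whatever image name devstack registered).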
 > > 
 > > I did some investigation when this happens and here's what I found...
 > > 
 > > If I do:
 > > 
 > > $ brctl delif br100 eth0
 > > 
 > > I am immediately able to ping the guest from the host and vice versa.
 > > 
 > > If I reattach eth0 back to br100, networking stops again
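 > >
 > > (Reattaching here just means:
 > >
 > >   $ brctl addif br100 eth0
 > >
 > > and host/guest pings stop again as soon as it's back on the bridge.)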
 > > 
 > > Another thing... I notice that on this system br100 does not have an ip
 > > address, but eth0 does.  I thought that with bridged networking like
 > > this, the bridge should hold the ip address and the physical iface
 > > attached to the bridge should not get one.
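 > >
 > > (Easy to see with something like:
 > >
 > >   $ ip addr show br100
 > >   $ ip addr show eth0
 > >
 > > the dhcp address shows up on eth0 and br100 has none.)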
 > > 
 > > So... I tweaked /etc/sysconfig/network-scripts/ifcfg-eth0 to remove
 > > dhcp from the bootproto line, and copied ifcfg-eth0 to ifcfg-br100,
 > > allowing it to use bootproto dhcp.
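 > >
 > > Roughly like this (just a sketch; the other lines in the stock files
 > > stay as they were, and TYPE=Bridge is my addition so ifup treats
 > > br100 as a bridge):
 > >
 > >   # ifcfg-eth0
 > >   DEVICE=eth0
 > >   BOOTPROTO=none
 > >   ONBOOT=yes
 > >
 > >   # ifcfg-br100
 > >   DEVICE=br100
 > >   TYPE=Bridge
 > >   BOOTPROTO=dhcp
 > >   ONBOOT=yes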
 > > 
 > > I brought both ifaces down and then brought them both up: eth0 first
 > > and br100 second.
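 > >
 > > i.e. something like:
 > >
 > >   $ ifdown br100 ; ifdown eth0
 > >   $ ifup eth0 && ifup br100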
 > > 
 > > This time, br100 got the dhcp address from the host and networking
 > > worked fine.
 > > 
 > > So is this just an issue with how nova is setting up bridges?
 > > 
 > > Since this network disconnect didn't happen until nova launched a vm, I
 > > imagine this isn't a problem with devstack itself, but is likely an
 > > issue with Nova Networking somehow.
 > > 
 > > Russell/DanS, is there any chance that all of the refactoring you did in
 > > Nova Networking very recently introduced a regression?
 > 
 > I suppose it's possible.  You could try going back to before any of the
 > nova-network-objects patches went in.  The first one to merge was:
 > 
 > commit a8c73c7d3298589440579d67e0c5638981dd7718
 > Merge: a1f6e85 aa40c8f
 > Author: Jenkins <jenkins@review.openstack.org>
 > Date:   Wed Jan 15 18:38:37 2014 +0000
 > 
 >     Merge "Make nova-network use Service object"
 > 
 > Try going back to before that and see if you get a different result.  If
 > so, try using "git bisect" to find the offending commit.
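 >
 > Something along these lines (assuming the default devstack checkout in
 > /opt/stack/nova):
 >
 >   $ cd /opt/stack/nova
 >   $ git checkout a8c73c7d^        # parent of the first objects merge
 >   # re-run your test; if things work there, bisect:
 >   $ git bisect start
 >   $ git bisect bad master
 >   $ git bisect good a8c73c7d^
 >   # test each checkout git gives you, marking it good or bad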
 > 
 > --
 > Russell Bryant
 > 
 
 I am still unable to reproduce the networking issue in my environment.  I booted a stock
Fedora 20 cloud image, installed git, cloned devstack and ran it with a minimal localrc
configuration (so using the defaults of nova-network and rabbitmq).  Other than the
rabbitmq race issue that always makes my first stack.sh run on a new VM fail, I had no
problem completing stack.sh and booting a nova instance.  The interface's IP was
correctly moved to the bridge for me.  If this is a regression in nova-network then it
only presents in combination with some other circumstance that isn't present for me.
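
 ("Minimal" here meaning essentially just the password/token variables,
something like:

  ADMIN_PASSWORD=secrete
  DATABASE_PASSWORD=$ADMIN_PASSWORD
  RABBIT_PASSWORD=$ADMIN_PASSWORD
  SERVICE_PASSWORD=$ADMIN_PASSWORD
  SERVICE_TOKEN=some-service-token

with no network-related overrides at all.)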
 
 As I mentioned in our off-list discussion of this, I run my development VMs in a
local OpenStack installation I have, so maybe there's some difference in the way the
networking works there.  In any case, while I don't have an answer, hopefully another
data point will help in figuring this out.

OK.  I can provide a data point that may or may not be useful.

I am getting the same behavior Perry reported.  Install is successful;
the first VM launch kills the network.  After much head-bashing and
experimentation I found that everything worked correctly if I assigned
FLAT_INTERFACE in my localrc file to a second, unused NIC on my test
system, e.g.:

  FLAT_INTERFACE=p4p2

Prior to that I'd had all of the various localrc interface variables
pointing at the single primary NIC, e.g.:

  HOST_IP_IFACE=p4p1
  PUBLIC_INTERFACE=p4p1
  VLAN_INTERFACE=p4p1
  FLAT_INTERFACE=p4p1
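
In the single-NIC case nova-network presumably attaches p4p1 itself to
br100 on that first boot, which (per Perry's brctl findings above) is
exactly when connectivity drops.  A quick way to compare the two setups
after booting an instance is something like:

  $ brctl show br100
  $ ip addr show br100
  $ ip addr show p4p1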

This style of config (everything on one NIC) is advocated in the
single-node getting-started guide:

  http://devstack.org/guides/single-machine.html

I observed this while testing on commit
3f5250fff3007dfd1e5992c0cf229be9033a5726.

-Ian
 
 -Ben
 