[Rdo-list] RE(2) Failure to start openstack-nova-compute on Compute Node when testing delorean RC2 or CI repo on CentOS 7.1

Mon May 11 18:05:47 UTC 2015

Steve,

Thanks!

I pulled magnum from git on devstack, dropped the magnum db, created a new
one
and tried to create a bay, now I'm getting "went to status error due to
unknown" as below.

Nova  and magnum bay-list list shows:

ubuntu at magnum:~/devstack$ nova list

+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+-----------------------------------------------------------------------+

| ID                                   | Name
                    | Status | Task State | Power State | Networks
                                                    |

+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+-----------------------------------------------------------------------+

| 797b6057-1ddf-4fe3-8688-b63e5e9109b4 |
te-h5yvoiptrmx3-0-4w4j2ltnob7a-kube_node-vg7rojnafrub | ERROR  | -
| NOSTATE     | testbay-6kij6pvui3p7-fixed_network-46mvxv7yfjzw=10.0.0.5,
2001:db8::f |

| c0b56f08-8a4d-428a-aee1-b29ca6e68163 |
testbay-6kij6pvui3p7-kube_master-z3lifgrrdxie         | ACTIVE | -
| Running     | testbay-6kij6pvui3p7-fixed_network-46mvxv7yfjzw=10.0.0.3,
2001:db8::d |

+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+-----------------------------------------------------------------------+

ubuntu at magnum:~/devstack$ magnum bay-list

+--------------------------------------+---------+------------+---------------+

| uuid                                 | name    | node_count | status
  |

+--------------------------------------+---------+------------+---------------+

| 87e36c44-a884-4cb4-91cc-c7ae320f33b4 | testbay | 2          |
CREATE_FAILED |

+--------------------------------------+---------+------------+---------------+

e3a65b05f", "flannel_network_subnetlen": "24", "fixed_network_cidr": "
10.0.0.0/24", "OS::stack_id": "d0246d48-23e0-4aa0-87e0-052b2ca363e8",
"OS::stack_name": "testbay-6kij6pvui3p7", "master_flavor": "m1.small",
"external_network_id": "e3e2a633-1638-4c11-a994-7179a24e826e",
"portal_network_cidr": "10.254.0.0/16", "docker_volume_size": "5",
"ssh_key_name": "testkey", "kube_allow_priv": "true", "number_of_minions":
"2", "flannel_use_vxlan": "false", "flannel_network_cidr": "10.100.0.0/16",
"server_flavor": "m1.medium", "dns_nameserver": "8.8.8.8", "server_image":
"fedora-21-atomic-3"}, "id": "d0246d48-23e0-4aa0-87e0-052b2ca363e8",
"outputs": [{"output_value": ["2001:db8::f", "2001:db8::e"], "description":
"No description given", "output_key": "kube_minions_external"},
{"output_value": ["10.0.0.5", "10.0.0.4"], "description": "No description
given", "output_key": "kube_minions"}, {"output_value": "2001:db8::d",
"description": "No description given", "output_key": "kube_master"}],
"template_description": "This template will boot a Kubernetes cluster with
one or more minions (as specified by the number_of_minions parameter, which
defaults to \"2\").\n"}}

 log_http_response
/usr/local/lib/python2.7/dist-packages/heatclient/common/http.py:141

2015-05-11 17:31:15.968 30006 ERROR magnum.conductor.handlers.bay_k8s_heat
[-] Unable to create bay, stack_id: d0246d48-23e0-4aa0-87e0-052b2ca363e8,
reason: Resource CREATE failed: ResourceUnknownStatus: Resource failed -
Unknown status FAILED due to "Resource CREATE failed:
ResourceUnknownStatus: Resource failed - Unknown status FAILED due to
"Resource CREATE failed: ResourceInError: Went to status error due to
"Unknown"""

Any Idea?

Thanks!
-Arash

On Mon, May 11, 2015 at 2:04 AM, Steven Dake (stdake) <stdake at cisco.com>
wrote:

>  Arash,
>
>  The short of it is Magnum 2015.1.0 is DOA.
>
>  Four commits have hit the repository in the last hour to fix these
> problems.  Now Magnum works with v1beta3 of the kubernetes 0.15 v1betav3
> examples with the exception of the service object.  We are actively working
> on that problem upstream – I’ll update when its fixed.
>
>  To see my run check out:
>
> http://ur1.ca/kc613 -> http://paste.fedoraproject.org/220479/13022911
>
>  To upgrade and see everything working but the service object, you will
> have to remove your openstack-magnum package if using my COPR repo or git
> pull on your Magnum repo if using devstack.
>
>  Boris - interested to hear the feedback on a CentOS distro operation
> once we get that service bug fixed.
>
>  Regards
> -steve
>
>
>   From: Arash Kaffamanesh <ak at cloudssky.com>
> Date: Sunday, May 10, 2015 at 4:10 PM
> To: Steven Dake <stdake at cisco.com>
>
> Cc: "rdo-list at redhat.com" <rdo-list at redhat.com>
> Subject: Re: [Rdo-list] RE(2) Failure to start openstack-nova-compute on
> Compute Node when testing delorean RC2 or CI repo on CentOS 7.1
>
>   Steve,
>
>  Thanks for your kind advice.
>
>  I'm trying to go first through the quick start for magnum with devstack
> on ubuntu and I'm also
> following this guide to create a bay with 2 nodes:
>
>
> http://git.openstack.org/cgit/openstack/magnum/tree/doc/source/dev/dev-quickstart.rst
>
>  I got somehow far, but by running this step to run the service tp
> provide a discoverable endpoint for the redis sentinels in the cluster:
>
>   magnum service-create --manifest ./redis-sentinel-service.yaml --bay testbay
>
>
> I'm getting:
>
>
> ERROR: Invalid resource state. (HTTP 409)
>
>
> In the console, I see:
>
>
> 2015-05-10 22:19:44.010 4967 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on 127.0.0.1:5672
>
> 2015-05-10 22:19:44.050 4967 WARNING wsme.api [-] Client-side error: Invalid resource state.
>
> 127.0.0.1 - - [10/May/2015 22:19:44] "POST /v1/rcs HTTP/1.1" 409 115
>
>
> The testbay is running with 2 nodes properly:
>
> ubuntu at magnum:~/kubernetes/examples/redis$ magnum bay-list
>
> | 4fa480a7-2d96-4a3e-876b-1c59d67257d6 | testbay | 2          | CREATE_COMPLETE |
>
>
> Any ideas, where I could dig for the problem?
>
>
> By the way after running "magnum pod-create .." the status shows "failed"
>
>
> ubuntu at magnum:~/kubernetes/examples/redis/v1beta3$ magnum pod-create --manifest ./redis-master.yaml --bay testbay
>
> +--------------+---------------------------------------------------------------------+
>
> | Property     | Value                                                               |
>
> +--------------+---------------------------------------------------------------------+
>
> | status       | failed                                                              |
>
>
> And the pod-list shows:
>
> ubuntu at magnum:~$ magnum pod-list
>
> +--------------------------------------+--------------+
>
> | uuid                                 | name         |
>
> +--------------------------------------+--------------+
>
> | 8d6977c1-a88f-45ee-be6c-fd869874c588 | redis-master |
>
>
> I tried also to set the status to running in the pod database table, but it didn't help.
>
> P.S.: I tried also to run the whole thing on fedora 21 with devstack, but I got more problems as on Ubuntu.
>
>
> Many thanks in advance for your help!
>
> Arash
>
>
>
> On Mon, May 4, 2015 at 12:54 AM, Steven Dake (stdake) <stdake at cisco.com>
> wrote:
>
>>  Boris,
>>
>>  Feel free to try out my Magnum packages here.  They work in containers,
>> not sure about CentOS.  I’m not certain the systemd files are correct (I
>> didn’t test that part) but the dependencies are correct:
>>
>>  https://copr.fedoraproject.org/coprs/sdake/openstack-magnum/
>>
>>  NB you will have to run through the quickstart configuration guide here:
>>
>>
>> https://github.com/openstack/magnum/blob/master/doc/source/dev/dev-manual-devstack.rst
>>
>>  *Regards*
>> *-steve*
>>
>>   From: Boris Derzhavets <bderzhavets at hotmail.com>
>> Date: Sunday, May 3, 2015 at 11:20 AM
>> To: Arash Kaffamanesh <ak at cloudssky.com>
>> Cc: "rdo-list at redhat.com" <rdo-list at redhat.com>
>>
>> Subject: Re: [Rdo-list] RE(2) Failure to start openstack-nova-compute on
>> Compute Node when testing delorean RC2 or CI repo on CentOS 7.1
>>
>>   Arash,
>>
>> Please, disregard this notice :-
>>
>> >You wrote :-
>>
>> >> What I noticed here, if I associate a floating ip to a VM with 2
>> interfaces, then I'll lose the
>> >> connectivity >to the instance and Kilo
>>
>> Different types of VMs  in yours and mine environments.
>>
>> Boris.
>>
>>  ------------------------------
>> Date: Sun, 3 May 2015 16:51:54 +0200
>> Subject: Re: [Rdo-list] RE(2) Failure to start openstack-nova-compute on
>> Compute Node when testing delorean RC2 or CI repo on CentOS 7.1
>> From: ak at cloudssky.com
>> To: bderzhavets at hotmail.com
>> CC: apevec at gmail.com; rdo-list at redhat.com
>>
>> Boris, thanks for your kind feedback.
>>
>> I did a 3 node Kilo RC2 virt setup on top of my Kilo RC2 which was
>> installed on bare metal.
>> The installation was successful by the first run.
>>
>> The network looks like this:
>> https://cloudssky.com/.galleries/images/kilo-virt-setup.png
>>
>>  For this setup I added the latest CentOS cloud image to glance, ran an
>> instance (controller), enabled root login,
>> added ifcfg-eth1 to the instance, created a snapshot from the controller,
>> added the repos to this instance, yum updated,
>> rebooted and spawn the network and compute1 vm nodes from that snapshot.
>> (To be able to ssh into the VMs over 20.0.1.0 network, I created the gate
>> VM with a floating ip assigned and installed OpenVPN
>>  on it.)
>>
>>  What I noticed here, if I associate a floating ip to a VM with 2
>> interfaces, then I'll lose the connectivity to the instance and Kilo
>> becomes crazy (the AIO controller on bare metal lose somehow its br-ex
>> interface, but I didn't try to reproduce it again).
>>
>>  The packstack file was created in interactive mode with:
>>
>>  packstack --answer-file= --> press enter
>>
>>  I accepted most default values and selected trove and heat to be
>> installed.
>>
>>  The answers are on pastebin:
>>
>>  http://pastebin.com/SYp8Qf7d
>>
>> The generated packstack file is here:
>>
>>  http://pastebin.com/XqJuvQxf
>>  The br-ex interfaces and changes to eth0 are created on network and
>> compute nodes correctly (output below).
>> And one nice thing for me coming from Havana was to see how easy has got
>> to create an image in Horizon
>> by uploading an image file (in my case rancheros.iso and centos.qcow2
>> worked like a charm).
>> Now its time to discover Ironic, Trove and Manila and if someone has some
>> tips or guidelines on how to test these
>> new exciting things or has any news about Murano or Magnum on RDO, then
>> I'll be more lucky and excited
>> as I'm now about Kilo :-)
>> Thanks!
>> Arash
>> ---
>> Some outputs here:
>> [root at controller ~(keystone_admin)]# nova hypervisor-list
>> +----+---------------------+-------+---------+
>> | ID | Hypervisor hostname | State | Status  |
>> +----+---------------------+-------+---------+
>> | 1  | compute1.novalocal   | up    | enabled |
>>
>> +----+---------------------+-------+---------+
>> [root at network ~]# ovs-vsctl show
>> 436a6114-d489-4160-b469-f088d66bd752
>>     Bridge br-tun
>>         fail_mode: secure
>>         Port "vxlan-14000212"
>>             Interface "vxlan-14000212"
>>                 type: vxlan
>>                 options: {df_default="true", in_key=flow,
>> local_ip="20.0.2.19", out_key=flow, remote_ip="20.0.2.18"}
>>         Port br-tun
>>             Interface br-tun
>>                 type: internal
>>         Port patch-int
>>             Interface patch-int
>>                 type: patch
>>                 options: {peer=patch-tun}
>>     Bridge br-int
>>         fail_mode: secure
>>         Port br-int
>>             Interface br-int
>>                 type: internal
>>         Port int-br-ex
>>             Interface int-br-ex
>>                 type: patch
>>                 options: {peer=phy-br-ex}
>>         Port patch-tun
>>             Interface patch-tun
>>                 type: patch
>>                 options: {peer=patch-int}
>>     Bridge br-ex
>>         Port br-ex
>>             Interface br-ex
>>                 type: internal
>>         Port phy-br-ex
>>             Interface phy-br-ex
>>                 type: patch
>>                 options: {peer=int-br-ex}
>>         Port "eth0"
>>             Interface "eth0"
>>
>>     ovs_version: "2.3.1"
>>
>>
>> [root at compute~]# ovs-vsctl show
>> 8123433e-b477-4ef5-88aa-721487a4bd58
>>     Bridge br-int
>>         fail_mode: secure
>>         Port int-br-ex
>>             Interface int-br-ex
>>                 type: patch
>>                 options: {peer=phy-br-ex}
>>         Port patch-tun
>>             Interface patch-tun
>>                 type: patch
>>                 options: {peer=patch-int}
>>         Port br-int
>>             Interface br-int
>>                 type: internal
>>     Bridge br-tun
>>         fail_mode: secure
>>         Port br-tun
>>             Interface br-tun
>>                 type: internal
>>         Port patch-int
>>             Interface patch-int
>>                 type: patch
>>                 options: {peer=patch-tun}
>>         Port "vxlan-14000213"
>>             Interface "vxlan-14000213"
>>                 type: vxlan
>>                 options: {df_default="true", in_key=flow,
>> local_ip="20.0.2.18", out_key=flow, remote_ip="20.0.2.19"}
>>     Bridge br-ex
>>         Port phy-br-ex
>>             Interface phy-br-ex
>>                 type: patch
>>                 options: {peer=int-br-ex}
>>         Port "eth0"
>>             Interface "eth0"
>>         Port br-ex
>>             Interface br-ex
>>                 type: internal
>>
>>     ovs_version: "2.3.1"
>>
>>
>>
>>
>>
>>
>>  On Sat, May 2, 2015 at 9:02 AM, Boris Derzhavets <
>> bderzhavets at hotmail.com> wrote:
>>
>>  Thank you once again it really works.
>>
>> [root at ip-192-169-142-127 ~(keystone_admin)]# nova hypervisor-list
>> +----+----------------------------------------+-------+---------+
>> | ID | Hypervisor hostname                    | State | Status  |
>> +----+----------------------------------------+-------+---------+
>> | 1  | ip-192-169-142-127.ip.secureserver.net | up    | enabled |
>> | 2  | ip-192-169-142-137.ip.secureserver.net | up    | enabled |
>> +----+----------------------------------------+-------+---------+
>>
>> [root at ip-192-169-142-127 ~(keystone_admin)]# nova hypervisor-servers
>> ip-192-169-142-137.ip.secureserver.net
>>
>> +--------------------------------------+-------------------+---------------+----------------------------------------+
>> | ID                                   | Name              | Hypervisor
>> ID | Hypervisor Hostname                    |
>>
>> +--------------------------------------+-------------------+---------------+----------------------------------------+
>> | 16ab7825-1403-442e-b3e2-7056d14398e0 | instance-00000002 |
>> 2             | ip-192-169-142-137.ip.secureserver.net |
>> | 5fa444c8-30b8-47c3-b073-6ce10dd83c5a | instance-00000004 |
>> 2             | ip-192-169-142-137.ip.secureserver.net |
>>
>> +--------------------------------------+-------------------+---------------+----------------------------------------+
>>
>> with only one issue:-
>>
>>  during AIO run CONFIG_NEUTRON_OVS_TUNNEL_IF=
>>  during Compute Node setup CONFIG_NEUTRON_OVS_TUNNEL_IF=eth1
>>
>>  and finally it results mess in ml2_vxlan_endpoints table. I had manually
>> update
>>  ml2_vxlan_endpoints and restart   neutron-openvswitch-agent.service on
>> both nodes
>>  afterwards VMs on compute node obtained access to meta-data server.
>>
>>  I also believe that synchronized delete records from tables
>> "compute_nodes && services"
>>  ( along with disabling nova-compute on Controller)  could  turn AIO host
>> into real Controller.
>>
>> Boris.
>>
>>  ------------------------------
>> Date: Fri, 1 May 2015 22:22:41 +0200
>> Subject: Re: [Rdo-list] RE(1) Failure to start openstack-nova-compute on
>> Compute Node when testing delorean RC2 or CI repo on CentOS 7.1
>> From: ak at cloudssky.com
>> To: bderzhavets at hotmail.com
>> CC: apevec at gmail.com; rdo-list at redhat.com
>>
>>  I got the compute node working by adding the delorean-kilo.repo on
>> compute node,
>> yum updating the compute node, rebooted and extended the packstack file
>> from the first AIO
>> install with the IP of compute node and ran packstack again with
>> NetworkManager enabled
>> and did a second yum update on compute node before the 3rd packstack run,
>> and now it works :-)
>>
>>  In short, for RC2 we have to force by hand to get the nova-compute
>> running on compute node,
>> before running packstack from controller again from an existing AIO
>> install.
>>
>>  Now I have 2 compute nodes (controller AIO with compute + 2nd compute)
>> and could spawn a
>> 3rd cirros instance which landed on 2nd compute node.
>> ssh'ing into the instances over the floating ip works fine too.
>>
>>  Before running packstack again, I set:
>>
>> EXCLUDE_SERVERS=<ip of controller>
>>
>>  [root at csky01 ~(keystone_osx)]# virsh list --all
>>  Id    Name                           Status
>> ----------------------------------------------------
>>  2     instance-00000001              laufend --> means running in German
>>
>>  3     instance-00000002              laufend --> means running in German
>>
>>
>>  [root at csky06 ~]# virsh list --all
>>  Id    Name                           Status
>> ----------------------------------------------------
>>  2     instance-00000003              laufend --> means running in German
>>
>>
>>  == Nova managed services ==
>>
>> +----+------------------+----------------+----------+---------+-------+----------------------------+-----------------+
>> | Id | Binary           | Host           | Zone     | Status  | State |
>> Updated_at                 | Disabled Reason |
>>
>> +----+------------------+----------------+----------+---------+-------+----------------------------+-----------------+
>> | 1  | nova-consoleauth | csky01.csg.net | internal | enabled | up    |
>> 2015-05-01T19:46:42.000000 | -               |
>> | 2  | nova-conductor   | csky01.csg.net | internal | enabled | up    |
>> 2015-05-01T19:46:42.000000 | -               |
>> | 3  | nova-scheduler   | csky01.csg.net | internal | enabled | up    |
>> 2015-05-01T19:46:42.000000 | -               |
>> | 4  | nova-compute     | csky01.csg.net | nova     | enabled | up    |
>> 2015-05-01T19:46:40.000000 | -               |
>> | 5  | nova-cert        | csky01.csg.net | internal | enabled | up    |
>> 2015-05-01T19:46:42.000000 | -               |
>> | 6  | nova-compute     | csky06.csg.net | nova     | enabled | up    |
>> 2015-05-01T19:46:38.000000 | -               |
>>
>> +----+------------------+----------------+----------+---------+-------+----------------------------+-----------------+
>>
>>
>>  On Fri, May 1, 2015 at 9:02 AM, Boris Derzhavets <
>> bderzhavets at hotmail.com> wrote:
>>
>>  Ran packstack --debug --answer-file=./answer-fileRC2.txt
>> 192.169.142.137_nova.pp.log.gz attached
>>
>> Boris
>>
>>  ------------------------------
>> From: bderzhavets at hotmail.com
>> To: apevec at gmail.com
>> Date: Fri, 1 May 2015 01:44:17 -0400
>> CC: rdo-list at redhat.com
>> Subject: [Rdo-list] Failure to start openstack-nova-compute on Compute
>> Node when testing delorean RC2 or CI repo on CentOS 7.1
>>
>> Follow instructions
>> https://www.redhat.com/archives/rdo-list/2015-April/msg00254.html
>> packstack fails :-
>>
>> Applying 192.169.142.127_nova.pp
>> Applying 192.169.142.137_nova.pp
>> 192.169.142.127_nova.pp:                             [ DONE ]
>> 192.169.142.137_nova.pp:                          [ ERROR ]
>> Applying Puppet manifests                         [ ERROR ]
>>
>> ERROR : Error appeared during Puppet run: 192.169.142.137_nova.pp
>> Error: Could not start Service[nova-compute]: Execution of
>> '/usr/bin/systemctl start openstack-nova-compute' returned 1: Job for
>> openstack-nova-compute.service failed. See 'systemctl status
>> openstack-nova-compute.service' and 'journalctl -xn' for details.
>> You will find full trace in log
>> /var/tmp/packstack/20150501-081745-rIpCIr/manifests/192.169.142.137_nova.pp.log
>>
>> In both cases (RC2 or CI repos)  on compute node 192.169.142.137
>> /var/log/nova/nova-compute.log
>> reports :-
>>
>> 2015-05-01 08:21:41.354 4999 INFO oslo.messaging._drivers.impl_rabbit
>> [req-0ae34524-9ee0-4a87-aa5a-fff5d1999a9c ] Delaying reconnect for 1.0
>> seconds...
>> 2015-05-01 08:21:42.355 4999 INFO oslo.messaging._drivers.impl_rabbit
>> [req-0ae34524-9ee0-4a87-aa5a-fff5d1999a9c ] Connecting to AMQP server on
>> localhost:5672
>> 2015-05-01 08:21:42.360 4999 ERROR oslo.messaging._drivers.impl_rabbit
>> [req-0ae34524-9ee0-4a87-aa5a-fff5d1999a9c ] AMQP server on localhost:5672
>> is unreachable: [Errno 111] ECONNREFUSED. Trying again in 11 seconds.
>>
>> Seems like it is looking for AMQP Server at wrong host . Should be
>> 192.169.142.127
>> On 192.169.142.127 :-
>>
>> [root at ip-192-169-142-127 ~]# netstat -lntp | grep 5672
>> ==>  tcp        0      0 0.0.0.0:25672           0.0.0.0:*
>> LISTEN      14506/beam.smp
>>         tcp6       0      0 :::5672
>> :::*                    LISTEN      14506/beam.smp
>>
>> [root at ip-192-169-142-127 ~]# iptables-save | grep 5672
>> -A INPUT -s 192.169.142.127/32 -p tcp -m multiport --dports 5671,5672 -m
>> comment --comment "001 amqp incoming amqp_192.169.142.127" -j ACCEPT
>> -A INPUT -s 192.169.142.137/32 -p tcp -m multiport --dports 5671,5672 -m
>> comment --comment "001 amqp incoming amqp_192.169.142.137" -j ACCEPT
>>
>> Answer-file is attached
>>
>> Thanks.
>> Boris
>>
>> _______________________________________________ Rdo-list mailing list
>> Rdo-list at redhat.com https://www.redhat.com/mailman/listinfo/rdo-list To
>> unsubscribe: rdo-list-unsubscribe at redhat.com
>>
>> _______________________________________________
>> Rdo-list mailing list
>> Rdo-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/rdo-list
>>
>> To unsubscribe: rdo-list-unsubscribe at redhat.com
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rdoproject.org/pipermail/dev/attachments/20150511/605003a4/attachment.html>