[rdo-users] launching vm instance failed

Qi, Congyun (NSB - CN/Shanghai) congyun.qi at nokia-sbell.com
Tue Sep 22 08:27:57 UTC 2020


Hello Neutron Dev specialists,

I am trying to set up a trial OpenStack “Train” environment using the RDO installation method. The hardware configuration is described briefly below. I have run into one puzzling problem: launching a VM instance always fails.

1)       Five blade servers were selected in one HP C7000 blade enclosure. The 1st blade is the controller node and the 2nd-5th blades are compute nodes. Each blade has one 10G Ethernet adapter with two 10G Ethernet ports.

2)       Each blade runs CentOS 7.8. Each blade’s 1st Ethernet port, named eno49 in Linux, is designated as the SR-IOV compute port; the 2nd Ethernet port is designated as the Open vSwitch port. The whole OpenStack platform was installed automatically with the “OVN” Neutron configuration via the command “packstack --answer-file=xxxx”. Because the TripleO installation method is not used, all the servers are treated directly as overcloud servers.

3)       The image, network, subnet, port and flavor can all be created, but launching an instance still fails. According to the logs, the VM instance failure results from a port binding failure on the VM port; in fact, even though the port can be created, its status is always down.

4)       When the Neutron configuration is changed from “OVN” to “Openvswitch”, the created port’s status becomes “up” and the VM instance can be launched. Part of the logs are attached. The problem can be reproduced every time. Could you help us investigate why the port binding failure occurs with the “OVN” configuration?

5)       If other logs are required, please tell me which ones to collect. There is no directory “/var/log/Containers”
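
The symptom in item 3 (port created, status always down, binding failed) can be confirmed from the port's binding fields. The following is only a minimal sketch under stated assumptions: the `openstack` CLI is installed with admin credentials sourced, and `openstack port show -f json` reports the keys `status`, `binding_vif_type` and `binding_host_id` (key names can differ between client versions). The helper names `summarize_binding` and `show_port` are hypothetical.

```python
import json
import subprocess


def summarize_binding(port):
    """Return a one-line summary of a port's binding state.

    `port` is a dict as parsed from `openstack port show <id> -f json`.
    The key names used here are an assumption and may differ by client version.
    """
    status = port.get("status", "UNKNOWN")
    vif_type = port.get("binding_vif_type", "unknown")
    host = port.get("binding_host_id") or "<no host>"
    # Neutron reports the VIF type "binding_failed" when no mechanism
    # driver was able to bind the port on the requested host.
    failed = vif_type == "binding_failed"
    verdict = "BINDING FAILED" if failed else "binding ok or not attempted"
    return f"status={status} vif_type={vif_type} host={host}: {verdict}"


def show_port(port_id):
    """Fetch a port as a dict via the openstack CLI (needs sourced credentials)."""
    result = subprocess.run(
        ["openstack", "port", "show", port_id, "-f", "json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.6 as well
    )
    return json.loads(result.stdout)
```

On the controller this could be used as `print(summarize_binding(show_port("<port-id>")))`. It only reports the state; it does not explain why the binding failed.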


Thanks.



From: Arkady Shtempler <ashtempl at redhat.com>
Sent: September 22, 2020 14:41
To: Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com>
Subject: Re: [rdo-users] launching vm instance failed

Hi Congyun!

Yep, I see the Error you are talking about in the Controller's Neutron server.log.
Unfortunately, I cannot answer your question or explain why it happened :(
You'll probably need someone from the Neutron Dev team to take a look.
I'd suggest that you "revive" this email thread (reply to all) and provide all the detailed information you have, including this whole Error:
Failed to bind port 566a4f78-4d90-43c0-ba64-ffb2a759ff6a on host sh-rdo-compute004-c7k4-blade05 for vnic_type normal using seg...
and attach all the Controller's logs under /var/log/containers as a zip file.
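
To collect every occurrence of this error for a report, the port, host and vnic_type can be pulled out of the log lines. This is a rough sketch: the pattern covers only the part of the message quoted above (the text after "using seg..." is truncated in this thread and is deliberately not matched), and the function name `find_bind_failures` is hypothetical.

```python
import re

# Matches only the portion of the message quoted in this thread; whatever
# follows "for vnic_type <type>" is ignored.
BIND_FAILED = re.compile(
    r"Failed to bind port (?P<port>[0-9a-f-]{36}) "
    r"on host (?P<host>\S+) "
    r"for vnic_type (?P<vnic_type>\S+)"
)


def find_bind_failures(lines):
    """Yield (port, host, vnic_type) for every matching log line."""
    for line in lines:
        match = BIND_FAILED.search(line)
        if match:
            yield match.group("port"), match.group("host"), match.group("vnic_type")


sample = (
    "Failed to bind port 566a4f78-4d90-43c0-ba64-ffb2a759ff6a on host "
    "sh-rdo-compute004-c7k4-blade05 for vnic_type normal using seg..."
)
for port, host, vnic_type in find_bind_failures([sample]):
    print(port, host, vnic_type)
```

The same function can be fed an open log file, for example `find_bind_failures(open(path))`, where `path` is wherever the Neutron server log lives on the controller.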

Thanks!



On Tue, Sep 22, 2020 at 9:26 AM Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com> wrote:
Hello Arkady,

My trial OpenStack still uses the OVN network configuration, and launching a VM instance still always fails. I followed the steps below and collected the logs, which are attached.

I have browsed the logs and found the error “port binding failed”, which was already found and understood in the earlier log investigation. In this case I set up a zone named “ovsnet” that includes only one compute node, “sh-rdo-compute004-c7k4-blade05”; the same “port binding failure” is observed in the logs of both the controller and compute04.

What I don’t understand is why the port binding failed, since the whole OpenStack platform was set up via “packstack --answer-file=xxxxx”. Could you show me the root cause of the “port binding failure”? My guess is that the “Train” release of OpenStack may have inherent bugs. If that is true, I’ll use the “Openvswitch” network configuration instead of “OVN”.

In any case, I am again learning a troubleshooting method from your instructions, which is a great help to me.

Thanks.

Some similar entries are excerpted from the log “All_Greps_compute04.log”, for example:
failed network setup after 1 attempt(s): PortBindingFailed: Binding failed for port


From: Arkady Shtempler <ashtempl at redhat.com>
Sent: September 18, 2020 19:57
To: Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com>
Subject: Re: [rdo-users] launching vm instance failed

Hi Congyun!

Are you still facing this problem? Does VM creation still fail for you?
If so, I'd like to ask you to:
1) Record the current time of your OC: for example, run the "date" command on one of the Controllers and save the output.
2) Recreate the VM
3) Inside the python script, change the value of:
    time_grep=set_default_arg_by_index(1,'2018-01-01 00:00:00') # Grep by time
    Use the same format and change it to the timestamp that you saved in #1
4) Run this script manually on all OC nodes, as you already did.
5) Go through the result files and search for the exported Errors.
    Read the explanations provided at the top of the result file; they will instruct you how to proceed.
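
For steps 1 and 3, the timestamp has to have the same shape as '2018-01-01 00:00:00', which is not what a plain "date" prints by default. The sketch below shows one way to produce a matching string and what "grep by time" means in practice. It is an illustration only and not LogTool's implementation: the helper names are hypothetical, and it assumes each log line starts with 'YYYY-MM-DD HH:MM:SS'.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # same shape as '2018-01-01 00:00:00'


def now_for_time_grep():
    """Return the current local time formatted like the time_grep default."""
    return datetime.now().strftime(TIME_FORMAT)


def lines_since(lines, since):
    """Yield log lines whose leading timestamp is at or after `since`.

    Lines that do not start with a parsable timestamp (for example
    traceback continuation lines) are skipped in this simplified sketch.
    """
    start = datetime.strptime(since, TIME_FORMAT)
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TIME_FORMAT)
        except ValueError:
            continue
        if stamp >= start:
            yield line


# Run this just before recreating the VM and keep the printed value.
print(now_for_time_grep())
```

The printed value can then be pasted in place of '2018-01-01 00:00:00' in the script.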

Thanks!



On Fri, Sep 18, 2020 at 9:27 AM Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com> wrote:

The log from the 4th compute node is attached.


From: Qi, Congyun (NSB - CN/Shanghai)
Sent: September 18, 2020 13:41
To: 'Arkady Shtempler' <ashtempl at redhat.com>
Subject: RE: [rdo-users] launching vm instance failed

Hello Arkady,

I applied for a GitHub account and logged in to GitHub, and I can now download the file from the GitHub links below.

I ran the Python script to collect all the logs on my controller and 3 compute nodes; they are attached. The 4th compute node isn’t working normally, and I’m checking it.

Could you help check the logs to find out why the instance can’t be launched?

Thanks so much.


From: Qi, Congyun (NSB - CN/Shanghai)
Sent: September 18, 2020 11:24
To: 'Arkady Shtempler' <ashtempl at redhat.com>
Subject: RE: [rdo-users] launching vm instance failed

Hello Arkady,

Could you compress the file “Extract_On_Node.py” into a .zip file? Our company’s Outlook e-mail server blocks attached *.py files under its security policy. I will also try to download the file from the GitHub links.

Thanks.



From: Arkady Shtempler <ashtempl at redhat.com>
Sent: September 17, 2020 16:49
To: Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com>
Subject: Re: [rdo-users] launching vm instance failed

Hi Congyun!

Something went wrong when you copied the content of this script to your host.
The Error says that the problem is in line 299, but this line on GitHub looks completely different: https://github.com/zahlabut/LogTool/blob/cbdb3fccb062cae689e95a3b243b6ad0f9716168/LogTool_Python3/Extract_On_Node.py#L299
Please use the attached file: upload it to your OC node and it should work.

Thanks!



On Thu, Sep 17, 2020 at 11:23 AM Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com> wrote:
Hello Arkady,

I couldn’t download the Extract_On_Node.py file directly from the link you gave, so I copied all the source code into a file on my computer, deleted all the blank characters in each line, and ran the file again, but another error was raised. I am not able to fix the error in line 299; could you help check it?


-------------------------------------------------------------------------------------------------------------------------------------------
[root at sh-rdo-controller-c7k4-blade01 logtool]# python3 ./Extract_On_Node.py
  File "./Extract_On_Node.py", line 299
    block="*** LogTool --> this block is missing timestamp, therefore could be irrelevant to your" \
                                                                                                    ^
SyntaxError: unexpected character after line continuation character
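
One common way to get this particular message is a space (or another invisible character picked up while copying from a web page) sitting after the trailing backslash. Whether that is what happened here cannot be told from the traceback alone, so the following is only a small reproduction of that one case, with a hypothetical helper `strip_trailing_whitespace` that removes it.

```python
def strip_trailing_whitespace(source):
    """Remove whitespace (including non-breaking spaces) at the end of each line."""
    return "\n".join(line.rstrip() for line in source.splitlines()) + "\n"


# A backslash followed by a space: the backslash is no longer the last
# character on the line, so Python rejects it.
broken = 'block = "first part " \\ \n        "second part"\n'

try:
    compile(broken, "<broken>", "exec")
except SyntaxError as exc:
    print("broken copy:", exc.msg)

fixed = strip_trailing_whitespace(broken)
namespace = {}
exec(compile(fixed, "<fixed>", "exec"), namespace)
print("fixed copy:", namespace["block"])
```

Stripping every line this way would also change any multi-line string literal that intentionally ends a line with spaces, so downloading an unmodified copy of the script remains the safer fix.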


From: Arkady Shtempler <ashtempl at redhat.com>
Sent: September 16, 2020 17:18
To: Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com>
Subject: Re: [rdo-users] launching vm instance failed

Hi Congyun!

Sure, not a problem :-)
It seems to me that something went wrong when downloading the script or copying its content into the file.
The Error you have is that there is a "space/tab" character before "import subprocess ...", and Python fails because of that.

Download "Extract_On_Node.py" from: https://github.com/zahlabut/LogTool/blob/master/LogTool_Python3/Extract_On_Node.py to your OC node and retry.
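
Note that a github.com ".../blob/..." address serves an HTML page, so copying from the browser is where stray characters can creep in. A minimal sketch of fetching the file itself instead, assuming GitHub's usual raw.githubusercontent.com URL layout and that the node has outbound HTTPS access; the helpers `blob_to_raw` and `download` are hypothetical, and refs that contain a "/" are not handled.

```python
import urllib.request
from urllib.parse import urlsplit


def blob_to_raw(url):
    """Convert a github.com '/<owner>/<repo>/blob/<ref>/<path>' URL to its raw form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a GitHub blob URL: " + url)
    owner, repo, _blob, ref = segments[:4]
    path = "/".join(segments[4:])
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(owner, repo, ref, path)


def download(url, destination):
    """Save the raw file to `destination` (needs outbound HTTPS access)."""
    urllib.request.urlretrieve(blob_to_raw(url), destination)


page = "https://github.com/zahlabut/LogTool/blob/master/LogTool_Python3/Extract_On_Node.py"
print(blob_to_raw(page))
```

Calling `download(page, "Extract_On_Node.py")` on the node would then write the script byte for byte, with no editor or clipboard in between.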

Thanks!




On Wed, Sep 16, 2020 at 12:02 PM Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com> wrote:
Hello Arkady,

Sorry for not replying to your e-mail sooner; I was busy with other urgent matters.

Python 3 is installed on my controller server, and the LogTool source code has been downloaded via the GitHub links you gave.

But when I try to run the Python script with the command “python3 ./Extract_On_Node.py”, an error is raised. I’m not familiar with programming; could you tell me how to fix it?

[root at sh-rdo-controller-c7k4-blade01 logtool]# python3 ./Extract_On_Node.py
  File "./Extract_On_Node.py", line 21
    import subprocess,time,os,sys
    ^
IndentationError: unexpected indent
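
This error means a top-level statement starts with whitespace it should not have. The sketch below reproduces it and shows a repair that works only in the simple case where every line of the pasted file carries the same stray indent; if only some lines are affected, fetching a clean copy of the script is the better route. The two-line `pasted` sample is made up for illustration.

```python
import textwrap

# Every line carries the same stray leading spaces, as can happen when text
# is pasted from a web page into an editor.
pasted = "    import subprocess,time,os,sys\n    print(os.sep)\n"

try:
    compile(pasted, "<pasted>", "exec")
except IndentationError as exc:
    print("pasted copy:", exc.msg)

# textwrap.dedent removes only the whitespace common to all lines, so real
# block indentation inside a script is preserved.
cleaned = textwrap.dedent(pasted)
compile(cleaned, "<cleaned>", "exec")
print("cleaned copy compiles")
```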


From: Arkady Shtempler <ashtempl at redhat.com>
Sent: September 9, 2020 15:05
To: Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com>
Subject: Re: [rdo-users] launching vm instance failed

Hi Congyun!

As for the storage, I do have access, but the site is in Chinese and I wasn't able to find anything there. Did you upload the log files to this site?

Here is something you should be able to run on your Overcloud nodes.
Make sure that you have Python3 installed on each node (it's probably there), then create a folder and download this script into that folder.
https://github.com/zahlabut/LogTool/blob/master/LogTool_Python3/Extract_On_Node.py
After that, try to run it with "python3 Extract_On_Node.py". This will start analyzing all log files under "/var/log"; once done, it will create a result file with all the exported unique Error blocks from the logs.
Go through the result files and check if you see something that could explain why VM creation is failing for you.
You are also welcome to send me all the result files and I'll take a look as well.
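
To give a feel for what "unique Error blocks" means, here is a deliberately simplified sketch of the general idea. It is not LogTool's algorithm: it works on single lines rather than blocks, assumes lines contain the literal text " ERROR ", and treats two lines as the same error when they differ only in timestamp or UUIDs. All names are hypothetical.

```python
import re

UUID = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?\s*")


def normalize(line):
    """Mask the parts of a log line that differ between repeats of one error."""
    line = TIMESTAMP.sub("", line.strip())
    return UUID.sub("<uuid>", line)


def unique_errors(lines):
    """Return the first occurrence of each distinct ERROR line, in order."""
    seen = set()
    result = []
    for line in lines:
        if " ERROR " not in line:
            continue
        key = normalize(line)
        if key not in seen:
            seen.add(key)
            result.append(line.rstrip("\n"))
    return result
```

With this, thousands of repeated "Binding failed for port ..." lines collapse to one entry, which is what makes a result file short enough to read.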

Thanks!



On Wed, Sep 9, 2020 at 4:25 AM Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com> wrote:
Hi Arkady,

Yes. No undercloud is used here; all the servers should be regarded as overcloud nodes.

I apply a cloud storage in a public cloud provider, its web link is “ http://yunpan.360.cn “ , the user name is tz9406 at 163.com<mailto:tz9406 at 163.com>   password is  Tangzhong9406

I tried to collect the logs mentioned in your previous e-mail, but unfortunately there is no “containers” subdirectory under “/var/log” on any of my nodes (1 controller and 4 compute nodes). Perhaps the directory has a different name on my platform; could you help me check whether other logs can be used instead?

Thanks.


From: Arkady Shtempler <ashtempl at redhat.com>
Sent: September 8, 2020 17:07
To: Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com>
Subject: Re: [rdo-users] launching vm instance failed

Hi Congyun!

So you don't have an Undercloud host, correct?
If so, can you upload all the logs located in /var/log/containers from all Overcloud nodes (Controllers and Computes) somewhere, so that I can download and analyze them?

Thanks!

On Tue, Sep 8, 2020 at 11:54 AM Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com> wrote:
Hi Arkady,

Perhaps I did not describe my OpenStack setup clearly, so let me explain the configuration.

The RDO TripleO project is not used in my trial platform, so neither overcloud nor undercloud servers are configured; a single OpenStack platform is deployed directly, with 1 controller plus 4 compute nodes.

Could you confirm whether LogTool can still be used in a plain OpenStack environment rather than a TripleO environment? In the meantime, I will go ahead and download LogTool.

Thanks.

From: Arkady Shtempler <ashtempl at redhat.com>
Sent: September 8, 2020 16:14
To: Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com>
Subject: Re: [rdo-users] launching vm instance failed

Hi Congyun!

Unfortunately, the ping fails.
I'd advise you to use LogTool [1]: clone it to your Undercloud host and run mode #1.
It will export all the unique Errors from all logs on the Overcloud nodes: computes, controllers, etc.

[1] - https://github.com/zahlabut/LogTool

Thanks!

On Tue, Sep 8, 2020 at 10:56 AM Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com> wrote:
Hi Arkady,

I’m not sure whether our blade server can be accessed from outside. Please try anyway.

My controller server’s IP address is 135.251.149.215. Please first try to ping my controller to check whether you can reach my server.

If my server is reachable from your side, I’ll tell you the server username and password; the SSH service is running on my server.


From: Arkady Shtempler <ashtempl at redhat.com>
Sent: September 8, 2020 15:36
To: Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com>
Subject: Re: [rdo-users] launching vm instance failed

Hi Congyun!

Any chance to get SSH access to your Undercloud host?

Thanks!




On Tue, Sep 8, 2020 at 10:21 AM Qi, Congyun (NSB - CN/Shanghai) <congyun.qi at nokia-sbell.com> wrote:
Hello specialist,

I want to deploy a trial OpenStack platform using the RDO installation method. My steps are listed briefly below:

1)       Five blade servers were selected in one HP C7000 blade enclosure. The 1st blade is the controller node and the 2nd-5th blades are compute nodes. Each blade has one 10G Ethernet adapter with two 10G Ethernet ports.

2)       Each blade runs CentOS 7.8. Each blade’s 1st Ethernet port, named eno49 in Linux, is designated as the SR-IOV compute port; the 2nd Ethernet port is designated as the Open vSwitch port.

3)       Installed the OpenStack repository for the “Train” release, installed the “packstack” package, and generated an answer file with the command “packstack --gen-answer-file=xxxxx”; then modified the relevant parameters for installing the OpenStack services: added the ML2 type, added the 4 compute nodes, and so on.

4)       Ran the command “packstack --answer-file=xxxxx” to deploy OpenStack. The deployment succeeded; the ML2 type is OVN.

5)       The image, network, subnet, port and flavor were created, and then I tried to launch an instance, but the VM failed to launch again. Some logs are attached.
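
When reporting a problem like this, it helps to quote the networking-related values from the answer file used in steps 3 and 4. The sketch below prints them. The key names are given from memory of packstack-generated answer files and should be treated as assumptions to check against the actual file; the sample addresses are documentation placeholders, not the real hosts, and `networking_settings` is a hypothetical helper.

```python
import configparser

# Key names as commonly seen in packstack-generated answer files; treat them
# as assumptions and compare with the keys in your own file.
KEYS_OF_INTEREST = (
    "CONFIG_CONTROLLER_HOST",
    "CONFIG_COMPUTE_HOSTS",
    "CONFIG_NEUTRON_L2_AGENT",
    "CONFIG_NEUTRON_ML2_TYPE_DRIVERS",
    "CONFIG_NEUTRON_ML2_MECHANISM_DRIVERS",
)


def networking_settings(answer_text):
    """Return the networking-related settings found in an answer file's text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the upper-case key names as written
    parser.read_string(answer_text)
    found = {}
    for section in parser.sections():
        for key in KEYS_OF_INTEREST:
            if parser.has_option(section, key):
                found[key] = parser.get(section, key)
    return found


# Placeholder content for illustration only (192.0.2.0/24 is a documentation range).
sample = """[general]
CONFIG_CONTROLLER_HOST=192.0.2.10
CONFIG_COMPUTE_HOSTS=192.0.2.11,192.0.2.12
CONFIG_NEUTRON_L2_AGENT=ovn
CONFIG_NEUTRON_ML2_MECHANISM_DRIVERS=ovn
"""
for key, value in networking_settings(sample).items():
    print(key, "=", value)
```

For a real file, pass its contents, for example `networking_settings(open(path).read())`.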


Could you help check why the VM instance can’t be launched successfully?

Thanks so much.


_______________________________________________
users mailing list
users at lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users

To unsubscribe: users-unsubscribe at lists.rdoproject.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: openstack-log20200922.zip
Type: application/x-zip-compressed
Size: 18776 bytes
Desc: openstack-log20200922.zip
URL: <http://lists.rdoproject.org/pipermail/users/attachments/20200922/007c29a2/attachment-0001.bin>

