[rdo-list] [TripleO] Newton large baremetal deployment issues
Charles Short
cems at ebi.ac.uk
Tue Nov 15 22:39:19 UTC 2016
Hi,
So I have finally tried OSP9 and here are the results -
3 Controllers 40 Compute - 1 hour 20 mins to deploy.
This is much more the sort of deployment time I was expecting :)
I then tried TripleO Newton Stable again with 3 Controllers 40 Compute -
4 hours and counting.....
The two deployment scripts (for OSP9 and TripleO Newton) were pretty
much identical (allowing for any changes between releases).
During the OSP9 deployment I could use nova list to list the nodes. The
Undercloud API access was in general very responsive.
During the TripleO Newton deployment 'nova list' hangs and eventually fails with -
ERROR (ClientException): The server has either erred or is incapable of
performing the requested operation. (HTTP 500)
Undercloud API access was very sluggish.
I noticed Keystone was stuck at 140% CPU for most of the deployment
(it is multi-threaded, so it can exceed 100%), which is not the case
for OSP9.
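For anyone wanting to put numbers on the API sluggishness described above, here is a minimal sketch. It assumes a shell on the undercloud with the undercloud credentials already sourced (typically from ~/stackrc, which is an assumption here), and time_cmd is a hypothetical helper name, not part of any OpenStack tool.

```shell
#!/usr/bin/env bash
# time_cmd is a hypothetical helper: run a command with a time limit and
# report its exit code and wall-clock duration in whole seconds.
# Usage: time_cmd <limit_seconds> <command> [args...]
time_cmd() {
    local limit="$1"
    shift
    local start end rc
    start=$(date +%s)
    # coreutils 'timeout' exits with 124 if the command exceeds the limit
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo "rc=$rc elapsed=$((end - start))s cmd=$*"
    return "$rc"
}

# Example, run during a deployment (assumes credentials are sourced):
# time_cmd 60 nova list
```

Running the same call at intervals during an OSP9 and a Newton deployment would give comparable figures; an rc of 124 means the call did not return within the limit.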
I know it is hard to compare two releases, but the difference is enormous.
I will stick with OSP9 for now, as it works properly out of the box
for me for large deployments.
Charles
On 14/11/2016 09:01, Charles Short wrote:
> Hi Graeme,
>
> Thanks for the reply.
>
> I used these images -
>
> http://buildlogs.centos.org/centos/7/cloud/x86_64/tripleo_images/newton/delorean/
>
>
> I installed the stable repo following the documentation here -
>
> http://docs.openstack.org/developer/tripleo-docs/installation/installation.html
>
>
> for example -
>
> sudo curl -L -o /etc/yum.repos.d/delorean-newton.repo \
>   https://trunk.rdoproject.org/centos7-newton/current/delorean.repo
>
> sudo curl -L -o /etc/yum.repos.d/delorean-deps-newton.repo \
>   http://trunk.rdoproject.org/centos7-newton/delorean-deps.repo
>
>
> The difficulty I am having is that when I test with a small deployment
> all works fine. So you would assume just adding more compute nodes
> would not be an issue.
> Testing this is painful due to the time it takes for a large
> deployment to fail. It seems to be only scale that is the issue.
>
> I will try to get you some logs.
>
> Regards
>
> Charles
>
>
>
>> So the symptoms you are showing me above almost definitely lead me to
>> believe that neutron-server failed on the undercloud, which would
>> explain why the deploy and nova failed to work. It could have failed
>> before or during the deploy. We regularly see instances where
>> neutron-server times out upon system boot (takes slightly longer to
>> start than systemd expects), so we need to start it manually.
>>
>> To be clear, the undercloud has been installed using this repo:
>>
>> http://buildlogs.centos.org/centos/7/cloud/x86_64/rdo-trunk-newton-tested/
>>
>>
>> Which overcloud images are you using? I'm not seeing any provided in
>> that repo, and I just want to make sure the undercloud and overcloud
>> packages match (as the tripleo-heat-templates package on the undercloud
>> has to align with the openstack-puppet-modules package on the overcloud
>> images).
>>
>> Also, is it possible to get a copy of all the neutron-server log from
>> the undercloud? If we can understand why neutron-server failed, that is
>> the first step towards getting a working deployment.
>>
>> It would be great if we could get a full sosreport with all the system
>> logs, to check for other errors. I'm assuming there were no problems
>> with the 'openstack undercloud install' process?
>>
>> Regards,
>>
>> Graeme
>>
>
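The manual start of neutron-server mentioned in the quoted reply could be sketched as below. This assumes the systemd unit on the undercloud is named neutron-server (taken from the service name used in the thread), and ensure_active is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# ensure_active is a hypothetical helper: start a systemd unit only if it
# is not already active. Run it from a root shell.
# Usage: ensure_active <unit>
ensure_active() {
    local unit="$1"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is already active"
    else
        echo "$unit is not active, starting it"
        systemctl start "$unit"
    fi
}

# On the undercloud (unit name is an assumption):
# ensure_active neutron-server
```

If the unit fails to start, 'systemctl status neutron-server' and the neutron-server log requested in the quoted reply are the places to look next.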
--
Charles Short
Cloud Engineer
Virtualization and Cloud Team
European Bioinformatics Institute (EMBL-EBI)
Tel: +44 (0)1223 494205