[rdo-list] [TripleO] Newton large baremetal deployment issues

Charles Short cems at ebi.ac.uk
Fri Nov 4 15:31:38 UTC 2016


Hi,

Each node has 2x HP 900GB 12G SAS 10K 2.5in SC ENT HDD.
The 1Gb deployment NIC is not really causing the delay. It is very busy
while the overcloud image is rolled out (the first 30 to 45 mins
of deployment), but after that (once all the nodes are up, active
and pingable on their IP addresses) the bandwidth used is a fraction of
1Gbps on average for the rest of the deployment. For info, the NICs in
the nodes for the Overcloud networks are dual bonded 10Gbit.

The deployment I mentioned before (50 nodes) actually completed in 8 
hours (which is double the time it took for 35 nodes!)
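
To put the timings reported in this thread side by side, here is a rough
sketch in Python. The figures are the ones quoted in this thread; treating
the 8 hour run as 50 computes is an assumption (it was only described as
"50 nodes"), and the function names are made up for illustration.

```python
# Deployment times reported in this thread, as (compute nodes, minutes).
# All runs used 3 controllers. The 8 hour run was described only as
# "50 nodes", so counting it as 50 computes is an assumption.
reported = [
    (3, 60),
    (15, 60),
    (25, 105),
    (35, 240),
    (50, 480),
]


def minutes_per_compute(runs):
    """Return (computes, total minutes, minutes per compute) for each run."""
    return [(n, t, t / n) for n, t in runs]


def growth_ratios(runs):
    """Compare each run with the previous one: node ratio vs time ratio."""
    ratios = []
    for (n0, t0), (n1, t1) in zip(runs, runs[1:]):
        ratios.append((n0, n1, n1 / n0, t1 / t0))
    return ratios


if __name__ == "__main__":
    for n, t, per in minutes_per_compute(reported):
        print(f"{n:>3} computes: {t:>4} min total, {per:5.1f} min per compute")
    for n0, n1, node_ratio, time_ratio in growth_ratios(reported):
        print(f"{n0:>3} -> {n1:>3}: nodes x{node_ratio:.2f}, time x{time_ratio:.2f}")
```

If the time ratio is larger than the node ratio for a step, the deployment
is scaling worse than linearly over that step.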

I am in the process of a new 3 controller, 59 compute node deployment,
pinning all the nodes as you suggested. The initial overcloud image
rollout took just under 1 hour (all nodes ACTIVE and pingable). I am now
4.5 hours in and all is running (slowly). It is currently on Step 2 (of 5
steps). I would expect this deployment to take 10 hours at the current
speed.
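
For reference, the staged scale-up and pinning approach suggested earlier
in this thread could be planned with something like the sketch below. It
only computes a plan and does not call any OpenStack tool. The hostname
pattern overcloud-novacompute-N is taken from the thread; the
"node:compute-N" capability tag format and the node names are assumptions
for illustration and should be checked against the TripleO documentation
for your release.

```python
def scale_up_steps(target, stride=5, start=1):
    """Compute counts for successive deploy runs, ending exactly at target."""
    if stride < 1 or target < start:
        raise ValueError("need stride >= 1 and target >= start")
    steps = list(range(start, target, stride))
    steps.append(target)  # always finish on the full allocation
    return steps


def pin_plan(node_ids, tag_format="node:compute-{index}"):
    """Pair each bare metal node with a fixed compute index.

    tag_format is an assumed capability tag layout, not a confirmed one.
    """
    plan = []
    for index, node_id in enumerate(node_ids):
        plan.append({
            "node": node_id,
            "hostname": f"overcloud-novacompute-{index}",
            "capability": tag_format.format(index=index),
        })
    return plan


if __name__ == "__main__":
    # 59 computes in strides of 5, starting from a single compute.
    print(scale_up_steps(59))
    # Hypothetical node names, only to show the shape of the plan.
    for entry in pin_plan(["bm-node-a", "bm-node-b", "bm-node-c"]):
        print(entry)
```

Each number returned by scale_up_steps would be the compute count for one
run of the overcloud deploy command, and pin_plan gives the fixed node to
hostname mapping that pinning is meant to enforce.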

Regards

Charles

On 04/11/2016 15:17, Justin Kilpatrick wrote:
> Hey Charles,
>
> What sort of issues are you seeing now? How did node pinning work out 
> and did a slow scale up present any more problems?
>
> Deployments tend to be disk and network limited. You don't mention
> what sort of disks your machines have, but you do note 1G NICs, which
> are doable but might require some timeout adjustments or other
> considerations to give everything time to complete.
>
> On Fri, Nov 4, 2016 at 10:45 AM, Charles Short <cems at ebi.ac.uk> wrote:
>
>     Hi,
>
>     So you are implying that TripleO is not really currently able to
>     roll out large deployments easily as it is prone to scaling
>     delays/errors?
>     Is the same true for RH OSP9 (out of the box) as this also uses
>     TripleO? I would expect exactly the same scaling issues. But
>     surely OSP9 is designed for large enterprise OpenStack installations?
>     So if OSP9 does work well with large deployments, what are the
>     TripleO tweaks that make this work (if any)?
>
>     Many Thanks
>
>     Charles
>
>     On 03/11/2016 13:30, Justin Kilpatrick wrote:
>>     Hey Charles,
>>
>>     If you want to deploy a large number of machines, I suggest you
>>     deploy a small configuration (maybe 3 controllers, 1 compute) and
>>     then run the overcloud deploy command again with 2 computes, and
>>     so on until you reach your full allocation.
>>
>>     Realistically you can probably do a stride of 5 computes each
>>     time. Experiment with it a bit. As you get up to the full
>>     allocation of nodes you might run into a race condition bug with
>>     assigning computes to nodes and need to pin nodes (pinning means
>>     adding an ironic property saying that overcloud-novacompute-0
>>     goes here, 1 goes here, and so on).
>>
>>     As for actually solving the deployment issues at scale (instead
>>     of this horrible hack) I'm looking into adding some robustness at
>>     the ironic or tripleo level to these operations. It sounds like
>>     you're running more into node assignment issues rather than pxe
>>     issues though.
>>
>>     2016-11-03 9:16 GMT-04:00 Luca 'remix_tj' Lorenzetto
>>     <lorenzetto.luca at gmail.com>:
>>
>>         On Wed, Nov 2, 2016 at 8:30 PM, Charles Short <cems at ebi.ac.uk> wrote:
>>         > Some more testing of different amounts of nodes vs time taken
>>         > for successful deployments -
>>         >
>>         > 3 controller 3 compute = 1 hour
>>         > 3 controller 15 compute = 1 hour
>>         > 3 controller 25 compute  = 1 hour 45 mins
>>         > 3 controller 35 compute  = 4 hours
>>
>>         Hello,
>>
>>         I'm now preparing my deployment of 3+2 nodes. I'll check what you
>>         reported and give you some feedback.
>>
>>         Luca
>>
>>
>>         --
>>         "It is absurd to employ men of excellent intelligence to
>>         perform calculations that could be entrusted to anyone if
>>         machines were used"
>>         Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)
>>
>>         "The Internet is the largest library in the world.
>>         But the problem is that the books are all scattered on the floor"
>>         John Allen Paulos, Mathematician (1945-living)
>>
>>         Luca 'remix_tj' Lorenzetto, http://www.remixtj.net ,
>>         <lorenzetto.luca at gmail.com>
>>
>
>     -- 
>     Charles Short
>     Cloud Engineer
>     Virtualization and Cloud Team
>     European Bioinformatics Institute (EMBL-EBI)
>     Tel: +44 (0)1223 494205
>
>

-- 
Charles Short
Cloud Engineer
Virtualization and Cloud Team
European Bioinformatics Institute (EMBL-EBI)
Tel: +44 (0)1223 494205
