[rdo-list] [TripleO] Newton large baremetal deployment issues

Fri Nov 4 15:17:06 UTC 2016

Hey Charles,

What sort of issues are you seeing now? How did node pinning work out and
did a slow scale up present any more problems?

Deployments tend to be disk and network limited. You don't mention what
sort of disks your machines have, but you do note 1G NICs, which are doable
but might require some timeout adjustments or other considerations to give
everything time to complete.
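
For the timeout side, one knob that does exist is the overall deployment timeout: `openstack overcloud deploy` accepts `--timeout` in minutes, which I believe defaults to 240. A minimal Python sketch that only assembles such a command line follows. The helper name, the 1.5x headroom factor and the example duration are assumptions for illustration, not tested recommendations.

```python
import math
import shlex


def deploy_command(observed_minutes, headroom=1.5, extra_args=()):
    """Build an 'openstack overcloud deploy' argv with a padded timeout.

    observed_minutes: how long a comparable deploy took (hypothetical input).
    headroom: safety multiplier; 1.5 is an arbitrary illustrative choice.
    extra_args: whatever else you normally pass (e.g. -e environment files).
    """
    timeout = math.ceil(observed_minutes * headroom)
    argv = ["openstack", "overcloud", "deploy", "--templates",
            "--timeout", str(timeout)]
    argv.extend(extra_args)
    return argv


if __name__ == "__main__":
    # A run that took about 4 hours gets a 360 minute budget here.
    print(" ".join(shlex.quote(a) for a in deploy_command(240)))
```

Other timeouts may matter as well on slow links; this only covers the one flag named above.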

On Fri, Nov 4, 2016 at 10:45 AM, Charles Short <cems at ebi.ac.uk> wrote:

> Hi,
>
> So you are implying that TripleO is not really currently able to roll out
> large deployments easily, as it is prone to scaling delays/errors?
> Is the same true for RH OSP9 (out of the box), as this also uses TripleO?
> I would expect exactly the same scaling issues. But surely OSP9 is designed
> for large enterprise OpenStack installations?
> So if OSP9 does work well with large deployments, what are the TripleO
> tweaks that make this work (if any)?
>
> Many Thanks
>
> Charles
>
> On 03/11/2016 13:30, Justin Kilpatrick wrote:
>
> Hey Charles,
>
> If you want to deploy a large number of machines, I suggest you deploy a
> small configuration (maybe 3 controllers, 1 compute) and then run the
> overcloud deploy command again with 2 computes, and so on until you reach
> your full allocation.
>
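
As a rough sketch of the stepping described above, here is a small Python helper that lists the compute counts to deploy at and the matching command lines. It assumes the Newton-era `--control-scale` and `--compute-scale` flags of `openstack overcloud deploy`; the function names and the `stride` parameter are made up for illustration, and nothing here actually runs a deployment.

```python
def scale_steps(target, start=1, stride=1):
    """Return the compute counts to deploy at, ending exactly on target."""
    if stride < 1 or target < start:
        raise ValueError("need stride >= 1 and target >= start")
    steps = list(range(start, target, stride))
    steps.append(target)  # always finish on the full allocation
    return steps


def deploy_commands(target, controllers=3, start=1, stride=1):
    """Build one 'openstack overcloud deploy' argv per scale-up step.

    Append the same -e environment files to every command that you used
    for the first deploy; they are left out here to keep the sketch short.
    """
    return [
        ["openstack", "overcloud", "deploy", "--templates",
         "--control-scale", str(controllers),
         "--compute-scale", str(count)]
        for count in scale_steps(target, start, stride)
    ]


if __name__ == "__main__":
    for argv in deploy_commands(35, stride=5):
        print(" ".join(argv))
```

Each command would be run only after the previous one has finished cleanly; a larger `stride` trades fewer runs for more nodes changing at once.
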
> Realistically you can probably do a stride of 5 computes each time;
> experiment with it a bit. As you get up to the full allocation of nodes, you
> might run into a race condition bug with assigning computes to nodes and
> need to pin nodes (pinning means adding an Ironic property saying that
> overcloud-novacompute-0 goes here, 1 goes there, and so on).
>
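
To make the pinning idea concrete, here is a sketch based on my understanding of the TripleO node placement mechanism: each Ironic node gets a `node:compute-N` capability, and a scheduler hint in a Heat environment file matches it by index. The node UUIDs below are placeholders and the helper name is hypothetical; the script only prints commands, it does not talk to Ironic.

```python
def pin_commands(node_ids, role="compute"):
    """Build one 'ironic node-update' argv per node, in index order.

    Note: 'replace properties/capabilities=...' overwrites the whole
    capabilities string, so any capability a node already has (for
    example a profile) must be included again if you still need it.
    """
    commands = []
    for index, node_id in enumerate(node_ids):
        capabilities = "node:{}-{},boot_option:local".format(role, index)
        commands.append(["ironic", "node-update", node_id, "replace",
                         "properties/capabilities=" + capabilities])
    return commands


# Environment file content to pass with -e on the deploy command, so that
# compute index N is scheduled onto the node tagged node:compute-N.
SCHEDULER_HINTS_ENV = """\
parameter_defaults:
  NovaComputeSchedulerHints:
    'capabilities:node': 'compute-%index%'
"""


if __name__ == "__main__":
    # Placeholder UUIDs; substitute the IDs shown by 'ironic node-list'.
    for argv in pin_commands(["<uuid-of-node-a>", "<uuid-of-node-b>"]):
        print(" ".join(argv))
    print(SCHEDULER_HINTS_ENV)
```
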
> As for actually solving the deployment issues at scale (instead of this
> horrible hack), I'm looking into adding some robustness at the Ironic or
> TripleO level to these operations. It sounds like you're running into node
> assignment issues more than PXE issues, though.
>
> 2016-11-03 9:16 GMT-04:00 Luca 'remix_tj' Lorenzetto <
> lorenzetto.luca at gmail.com>:
>
>> On Wed, Nov 2, 2016 at 8:30 PM, Charles Short <cems at ebi.ac.uk> wrote:
>> > Some more testing of different amounts of nodes vs time taken for
>> > successful deployments -
>> >
>> > 3 controller  3 compute = 1 hour
>> > 3 controller 15 compute = 1 hour
>> > 3 controller 25 compute = 1 hour 45 mins
>> > 3 controller 35 compute = 4 hours
>>
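
Working the four timings above through as arithmetic: the extra time per added compute node comes out at about 0 minutes between 3 and 15 computes, 4.5 minutes between 15 and 25, and 13.5 minutes between 25 and 35, so the cost per node grows as the deployment gets larger. A small Python sketch of that calculation (the function name is just illustrative, and four data points are not much to go on):

```python
# (compute nodes, total deploy time in minutes), taken from the list above.
RUNS = [(3, 60), (15, 60), (25, 105), (35, 240)]


def marginal_minutes_per_node(runs):
    """Extra minutes per added compute node between consecutive runs."""
    result = []
    for (n0, t0), (n1, t1) in zip(runs, runs[1:]):
        result.append((n0, n1, (t1 - t0) / (n1 - n0)))
    return result


if __name__ == "__main__":
    for n0, n1, minutes in marginal_minutes_per_node(RUNS):
        print("{} -> {} computes: {:.1f} min per extra node".format(n0, n1, minutes))
```
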
>> Hello,
>>
>> I'm now preparing my deployment of 3+2 nodes. I'll check what you
>> reported and give you some feedback.
>>
>> Luca
>>
>>
>> --
>> "It is absurd to employ men of excellent intelligence to perform
>> calculations that could be entrusted to anyone if machines were
>> used"
>> Gottfried Wilhelm von Leibnitz, philosopher and mathematician (1646-1716)
>>
>> "The Internet is the world's largest library.
>> But the problem is that the books are all scattered on the floor"
>> John Allen Paulos, mathematician (1945-living)
>>
>> Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <
>> lorenzetto.luca at gmail.com>
>>
>
>
> --
> Charles Short
> Cloud Engineer
> Virtualization and Cloud Team
> European Bioinformatics Institute (EMBL-EBI)
> Tel: +44 (0)1223 494205
>
>