[rdo-dev] [infra][tripleo-ci] Disk space usage in logs.rdoproject.org

Wesley Hayutin whayutin at redhat.com
Thu Jun 13 14:58:04 UTC 2019


On Thu, Jun 13, 2019 at 8:55 AM Javier Pena <jpena at redhat.com> wrote:

> On Thu, Jun 13, 2019 at 8:22 AM Javier Pena <jpena at redhat.com> wrote:
>
>> Hi all,
>>
>> For the last few days, I have been monitoring a spike in disk space
>> utilization for logs.rdoproject.org. The current situation is:
>>
>> - 94% of space used, with less than 140GB out of 2TB available.
>> - This week, the log pruning script has been reclaiming less space than
>> new logs are consuming.
>> - I expect the situation to improve over the weekend, but we're
>> definitely running out of space.
>>
>> I have looked at a random job (https://review.opendev.org/639324, patch
>> set 26), and found that each run is consuming 1.2 GB of disk space in logs.
>> The worst offenders I have found are:
>>
>> - atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15
>> and 40 MB each
>> - logs/undercloud/home/zuul/tempest/.stackviz directory on
>> tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a
>> virtualenv eating up 81 MB.
>>
>
> Can we sync up on how you are calculating these results? They do not
> match ours.
> I see each job consuming about 215M of space; we are close on stackviz,
> which I measure at 83M. Oddly, I don't see atop.bin.gz in our
> calculations, so I'll have to look into that.
>
> I've checked it directly using du on the logserver. By 1.2 GB I meant the
> aggregate of the 8 jobs running for a single patchset. PS26 is currently
> using 2.5 GB and had one recheck.
>
> About the atop.bin.gz file:
>
> # find . -name atop.bin.gz -exec du -sh {} \;
> 16M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/042cb8f/logs/undercloud/var/log/atop.bin.gz
> 16M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/e4171d7/logs/undercloud/var/log/atop.bin.gz
> 28M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/ffd4de9/logs/undercloud/var/log/atop.bin.gz
> 26M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/34d44bf/logs/undercloud/var/log/atop.bin.gz
> 25M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/b89761d/logs/undercloud/var/log/atop.bin.gz
> 24M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/undercloud/var/log/atop.bin.gz
> 29M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/a10447d/logs/undercloud/var/log/atop.bin.gz
> 44M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/99a5f9f/logs/undercloud/var/log/atop.bin.gz
> 15M  ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/subnode-2/var/log/atop.bin.gz
> 33M  ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/undercloud/var/log/atop.bin.gz
> 16M  ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/subnode-2/var/log/atop.bin.gz
> 33M  ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/undercloud/var/log/atop.bin.gz
> 40M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/109d5ae/logs/undercloud/var/log/atop.bin.gz
> 45M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/c2ebeae/logs/undercloud/var/log/atop.bin.gz
> 39M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/7fe5bbb/logs/undercloud/var/log/atop.bin.gz
> 16M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/5e6cb0f/logs/undercloud/var/log/atop.bin.gz
> 40M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/c6bf5ea/logs/undercloud/var/log/atop.bin.gz
> 40M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/6ec5ac6/logs/undercloud/var/log/atop.bin.gz
>
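
To compare numbers more easily, the per-file listing above could be
reduced to a single total. This is only a rough sketch, assuming GNU find
and a POSIX awk; the function name atop_bytes is made up for illustration,
and it sums apparent file sizes in bytes, which can differ slightly from
the block usage that du reports.

```shell
# Sum the apparent size (in bytes) of every atop.bin.gz under a directory.
# Assumption: GNU find (for -printf). atop_bytes is a hypothetical helper name.
atop_bytes() {
    # %s prints each file's size in bytes, one per line; awk adds them up.
    # With no matches, awk prints 0.
    find "${1:-.}" -type f -name 'atop.bin.gz' -printf '%s\n' \
        | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}
```

Run from the directory of a single patch set, "atop_bytes ." would give
one figure to set against that patch set's overall du total.
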
> Can I safely delete all .stackviz directories? I guess that would give us
> some breathing room.
>

Yup, go for it
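
If it helps, the cleanup could be previewed before anything is removed.
The following is a minimal sketch, not the script actually used on the
logserver; it assumes a find that supports -prune and -exec ... +, and
both function names are hypothetical.

```shell
# List every .stackviz directory under a root without descending into it.
# Hypothetical helper; review its output before deleting anything.
list_stackviz() {
    find "${1:-.}" -type d -name '.stackviz' -prune -print
}

# Remove the same set of directories. -prune stops find from walking into
# a directory that is about to be deleted.
remove_stackviz() {
    find "${1:-.}" -type d -name '.stackviz' -prune -exec rm -rf -- {} +
}
```

Running list_stackviz first and checking the paths, then remove_stackviz
on the same root, keeps the deletion limited to directories named exactly
.stackviz.
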


>
> Regards,
> Javier
>
> Each job reports the size of its logs, e.g. [1]
>
> [1] http://logs.rdoproject.org/24/639324/26/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/quickstart_files/log-size.txt
>
>
>> As a temporary measure, I am reducing log retention from 21 days to 14,
>> but we still need to reduce the rate at which we are uploading logs. Would
>> it be possible to check the oooq-generated logs and see where we can
>> reduce? These jobs are by far the ones consuming most space.
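
To get a feel for what a shorter retention window would catch, something
like the sketch below could list candidates by age. It is not the actual
pruning script, whose logic is not shown in this thread; it assumes that
age is judged by the modification time of top-level entries, and the
function name list_expired is hypothetical.

```shell
# Print top-level entries under a log root last modified more than N days ago.
# Assumptions: retention is judged by mtime of top-level entries; the
# function name is made up. find rounds ages down to whole days, so
# "+14" matches entries that are at least 15 days old.
list_expired() {
    root="$1"
    days="${2:-14}"
    find "$root" -mindepth 1 -maxdepth 1 -mtime +"$days" -print
}
```

Comparing "list_expired ROOT 21" with "list_expired ROOT 14" would show
what the change from 21 to 14 days additionally selects, without deleting
anything.
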
>>
>> Thanks,
>> Javier