2TB seems very little for a data hoarder like me; I happen to have ~30TB of storage at
home. 10-14TB hard drives are not really that expensive.
While I totally agree that we should better control/limit what we collect, I wonder
whether we should not instead aim for a setup where we do not struggle with disk space,
one where we can keep logs for ~60 days without having to think too much about running
out.
Another approach, which I used in the past, was a cleanup script that removed old
builds, oldest first, for as long as the free disk space stayed below a specific
threshold (10%?). That means a dynamic retention period.
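A rough sketch of what such a script can look like (the log root path and the 10%
threshold here are my assumptions, not the original script):

#!/bin/bash
# Dynamic retention: delete the oldest build directories until the
# filesystem holding the logs is back under 90% used (>= 10% free).
LOG_ROOT=/var/www/logs        # assumed location of the uploaded logs
MAX_USED_PCT=90

used_pct() {
    # Used-space percentage of the filesystem holding $LOG_ROOT.
    df --output=pcent "$LOG_ROOT" | tail -1 | tr -dc '0-9'
}

# List build directories, oldest modification time first, and delete
# them one by one until enough space has been freed.
find "$LOG_ROOT" -mindepth 2 -maxdepth 2 -type d -printf '%T@ %p\n' |
    sort -n | cut -d' ' -f2- |
while read -r dir; do
    [ "$(used_pct)" -le "$MAX_USED_PCT" ] && break
    rm -rf "$dir"
done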
Thanks
Sorin
On 13 Jun 2019, at 15:58, Wesley Hayutin <whayutin@redhat.com> wrote:
On Thu, Jun 13, 2019 at 8:55 AM Javier Pena <jpena@redhat.com> wrote:
On Thu, Jun 13, 2019 at 8:22 AM Javier Pena <jpena@redhat.com> wrote:
Hi all,
For the last few days, I have been monitoring a spike in disk space utilization for
logs.rdoproject.org. The current situation is:
- 94% of space used, with less than 140GB out of 2TB available.
- The log pruning script has been reclaiming less space than we are using for new logs
during this week.
- I expect the situation to improve over the weekend, but we're definitely running
out of space.
I have looked at a random job (https://review.opendev.org/639324, patch set 26) and
found that each run is consuming 1.2 GB of disk space in logs. The worst offenders I
have found are:
- atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15 and 40 MB each
- logs/undercloud/home/zuul/tempest/.stackviz directory on
tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a virtualenv eating up 81
MB.
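A quick way to surface offenders like these is to sort a build's log tree by size
(the filesystem path is my assumption, and <job>/<build> are placeholders):

# Largest entries under a single build's uploaded logs
du -ah /var/www/logs/24/639324/26/openstack-check/<job>/<build>/ | sort -rh | head -20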
Can we sync up on how you are calculating these results? They do not match ours.
I see each job consuming about 215M of space; we are close on stackviz, at 83M. Oddly,
I don't see atop.bin.gz in our calculations, so I'll have to look into that.
I've checked it directly using du on the logserver. By 1.2 GB I meant the aggregate
of the 8 jobs running for a single patchset. PS26 is currently using 2.5 GB and had one
recheck.
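For the aggregate, something along these lines (the document root on the logserver is
an assumption on my part; the 24/639324/26 layout matches the log URLs):

# Total size of all logs uploaded for patchset 26 of change 639324
du -sh /var/www/logs/24/639324/26/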
About the atop.bin.gz files:
# find . -name atop.bin.gz -exec du -sh {} \;
16M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/042cb8f/logs/undercloud/var/log/atop.bin.gz
16M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/e4171d7/logs/undercloud/var/log/atop.bin.gz
28M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/ffd4de9/logs/undercloud/var/log/atop.bin.gz
26M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/34d44bf/logs/undercloud/var/log/atop.bin.gz
25M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/b89761d/logs/undercloud/var/log/atop.bin.gz
24M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/undercloud/var/log/atop.bin.gz
29M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/a10447d/logs/undercloud/var/log/atop.bin.gz
44M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/99a5f9f/logs/undercloud/var/log/atop.bin.gz
15M  ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/subnode-2/var/log/atop.bin.gz
33M  ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/undercloud/var/log/atop.bin.gz
16M  ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/subnode-2/var/log/atop.bin.gz
33M  ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/undercloud/var/log/atop.bin.gz
40M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/109d5ae/logs/undercloud/var/log/atop.bin.gz
45M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/c2ebeae/logs/undercloud/var/log/atop.bin.gz
39M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/7fe5bbb/logs/undercloud/var/log/atop.bin.gz
16M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/5e6cb0f/logs/undercloud/var/log/atop.bin.gz
40M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/c6bf5ea/logs/undercloud/var/log/atop.bin.gz
40M  ./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/6ec5ac6/logs/undercloud/var/log/atop.bin.gz
Can I safely delete all .stackviz directories? I guess that would give us some breathing
room.
Yup, go for it
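If it helps, a one-liner for that (the log tree root is an assumption):

# Delete every .stackviz virtualenv under the log tree; -prune keeps
# find from descending into a directory it has just matched for removal.
find /var/www/logs -type d -name .stackviz -prune -exec rm -rf {} +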
Regards,
Javier
Each job reports the size of the logs, e.g. [1]
http://logs.rdoproject.org/24/639324/26/openstack-check/tripleo-ci-centos...
As a temporary measure, I am reducing log retention from 21 days to 14, but we still
need to reduce the rate at which we are uploading logs. Would it be possible to check
the oooq-generated logs and see where we can trim? These jobs are by far the ones
consuming the most space.
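Assuming the pruning works on directory age, the change amounts to something like this
(the path and the exact mechanism are my assumptions about the real pruning script):

# Prune build directories whose last modification is older than 14 days (was 21)
find /var/www/logs -mindepth 2 -maxdepth 2 -type d -mtime +14 -exec rm -rf {} +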
Thanks,
Javier
_______________________________________________
dev mailing list
dev@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/dev
To unsubscribe: dev-unsubscribe@lists.rdoproject.org