On Thu, Jun 13, 2019 at 8:22 AM Javier Pena <jpena@redhat.com> wrote:
Hi all,

For the last few days, I have been monitoring a spike in disk space utilization for logs.rdoproject.org. The current situation is:

- 94% of space used, with less than 140GB out of 2TB available.
- This week, the log pruning script has been reclaiming less space than new logs are consuming.
- I expect the situation to improve over the weekend, but we're definitely running out of space.

I have looked at a random job (https://review.opendev.org/639324, patch set 26), and found that each run is consuming 1.2 GB of disk space in logs. The worst offenders I have found are:

- atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15 and 40 MB each
- logs/undercloud/home/zuul/tempest/.stackviz directory on tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a virtualenv eating up 81 MB.

Can we sync up on how you are calculating these results? They do not match ours.
I see each job consuming about 215M of space, and we are close on stackviz at 83M. Oddly, I don't see atop.bin.gz in our calculations, so I'll have to look into that.

Each job reports the size of its logs, e.g. [1]:
http://logs.rdoproject.org/24/639324/26/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/quickstart_files/log-size.txt
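One way to reconcile the two sets of numbers would be to run the same measurement on a local copy of a job's log tree. The following is only a sketch, not the script either side actually used: `dir_sizes` and `largest` are hypothetical helper names, and the sketch sums apparent file sizes, which can differ from what `du` reports as disk blocks.

```python
import os
from collections import Counter


def dir_sizes(root, depth=1):
    """Sum apparent file sizes under root, grouped by the first
    `depth` components of each file's path relative to root."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so their targets are not counted twice
            rel = os.path.relpath(path, root)
            key = os.sep.join(rel.split(os.sep)[:depth])
            totals[key] += os.path.getsize(path)
    return totals


def largest(totals, n=10):
    """Return the n biggest (key, bytes) pairs, largest first."""
    return totals.most_common(n)


# Example use against a local copy of one job's logs (path is an assumption):
# for key, size in largest(dir_sizes("/tmp/job-logs", depth=2)):
#     print(f"{size / 1e6:10.1f} MB  {key}")
```

Running this at the same depth on both sides should show whether files such as atop.bin.gz are being counted in one calculation and not the other.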

 

As a temporary measure, I am reducing log retention from 21 days to 14, but we still need to reduce the rate at which we are uploading logs. Would it be possible to check the oooq-generated logs and see where we can reduce? These jobs are by far the ones consuming most space.
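As a rough way to reason about the retention change, the sketch below assumes a roughly constant daily upload rate and a pruning script that keeps pace, so that steady-state usage is simply rate times retention. The function names and the 10% headroom are illustrative assumptions; the only figure taken from this thread is the 2TB capacity and the 21 and 14 day retention periods.

```python
def steady_state_gb(daily_upload_gb, retention_days):
    """Approximate disk use once pruning keeps pace: upload rate x retention."""
    return daily_upload_gb * retention_days


def max_daily_upload_gb(capacity_gb, retention_days, headroom=0.1):
    """Largest daily upload rate that stays below capacity minus headroom."""
    return capacity_gb * (1 - headroom) / retention_days


# Under this model, going from 21 to 14 days cuts steady-state usage by a third
# for the same upload rate, whatever that rate is.
ratio = steady_state_gb(1.0, 14) / steady_state_gb(1.0, 21)

# Upload budget per day for a 2TB volume at each retention setting.
budget_21 = max_daily_upload_gb(2000, 21)
budget_14 = max_daily_upload_gb(2000, 14)
```

The model ignores bursts and the lag before older logs age out, so it only bounds the long-run rate, not the next few days.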

Thanks,
Javier