Hi all,
For the last few days, I have been monitoring a spike in disk space utilization for
logs.rdoproject.org. The current situation is:
- 94% of space used, with less than 140GB out of 2TB available.
- The log pruning script has been reclaiming less space than new logs have been consuming
this week.
- I expect the situation to improve over the weekend, but we're definitely running out
of space.
I have looked at a random job (https://review.opendev.org/639324, patch set 26) and found
that each run consumes 1.2 GB of disk space in logs. The worst offenders I have found
are:
- atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15 and 40 MB each
- the logs/undercloud/home/zuul/tempest/.stackviz directory in
tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, a virtualenv taking up 81 MB.
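Finding offenders like these by hand is tedious, so here is a minimal sketch for ranking what takes up space in one job's logs. It assumes the job's log tree has already been downloaded to a local directory. The function names (iter_file_sizes, largest_files, dir_totals) are hypothetical, and sizes are apparent file sizes as reported by the filesystem.

```python
import heapq
import os
import sys
from typing import Iterator, List, Tuple


def iter_file_sizes(root: str) -> Iterator[Tuple[int, str]]:
    """Yield (size_bytes, path) for every regular file under root."""
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinked files so the same data is not counted twice.
            if os.path.islink(path):
                continue
            try:
                yield os.path.getsize(path), path
            except OSError:
                # File vanished or is unreadable; ignore it.
                continue


def largest_files(root: str, top_n: int = 20) -> List[Tuple[int, str]]:
    """Return the top_n largest files under root, largest first."""
    return heapq.nlargest(top_n, iter_file_sizes(root))


def dir_totals(root: str) -> List[Tuple[int, str]]:
    """Return the total bytes of each immediate subdirectory, largest first."""
    totals = [
        (sum(size for size, _ in iter_file_sizes(entry.path)), entry.path)
        for entry in os.scandir(root)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(totals, reverse=True)


if __name__ == "__main__":
    # Usage: python log_sizes.py /path/to/downloaded/job/logs
    log_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, path in dir_totals(log_root):
        print(f"{size / 1024**2:10.1f} MB  {path}/")
    for size, path in largest_files(log_root):
        print(f"{size / 1024**2:10.1f} MB  {path}")
```

Running it against a couple of jobs should show whether files such as atop.bin.gz or directories such as .stackviz dominate consistently, or only in particular featuresets.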
As a temporary measure, I am reducing log retention from 21 days to 14, but we still need
to reduce the rate at which we upload logs. Would it be possible to review the
oooq-generated logs and see where they could be trimmed? These jobs are by far the largest
consumers of space.
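Why retention alone is not enough: once pruning keeps pace, usage settles at roughly the daily upload volume times the retention window. This assumes a constant upload rate, which is a simplification. Going from 21 to 14 days is therefore a one-off cut of about a third. A minimal sketch of that arithmetic (helper names and the example rate are hypothetical):

```python
def steady_state_usage_gb(daily_upload_gb: float, retention_days: int) -> float:
    """Approximate disk usage once pruning keeps pace with uploads.

    Assumes a constant daily upload volume and that pruning deletes
    everything older than retention_days (a simplification).
    """
    return daily_upload_gb * retention_days


def break_even_growth(old_days: int, new_days: int) -> float:
    """Factor by which daily uploads can grow before the shorter retention
    window uses as much space as the old one did."""
    return old_days / new_days


if __name__ == "__main__":
    # Hypothetical upload rate, used only to illustrate the ratio.
    rate_gb_per_day = 10.0
    print(steady_state_usage_gb(rate_gb_per_day, 21))
    print(steady_state_usage_gb(rate_gb_per_day, 14))
    print(break_even_growth(21, 14))
```

In this model, a 50% increase in the daily upload volume would cancel the saving from the shorter window. That is why cutting the upload rate is the lasting fix.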
Thanks,
Javier