----- Original Message -----
Hi all,
For the last few days, I have been monitoring a spike in disk space
utilization for logs.rdoproject.org. The current situation is:
- 94% of space used, with less than 140GB out of 2TB available.
- This week, the log pruning script has been reclaiming less space than new
logs are consuming.
- I expect the situation to improve over the weekend, but we're definitely
running out of space.
I have looked at a random job (https://review.opendev.org/639324, patch set
26), and found that each run is consuming 1.2 GB of disk space in logs. The
worst offenders I have found are:
- atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15 and
40 MB each
- logs/undercloud/home/zuul/tempest/.stackviz directory on
tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a
virtualenv eating up 81 MB.
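
A rough way to reproduce this kind of per-job breakdown is to sum file sizes
under each top-level entry of a job's log directory. The sketch below is only
an illustration and assumes the logs are readable on a local filesystem; the
names tree_size and largest_entries are hypothetical and not part of any
existing RDO tooling.

```python
import os


def tree_size(path):
    """Return the total size in bytes of regular files at or under path."""
    # Symlinks are ignored so that linked content is not counted twice.
    if os.path.islink(path):
        return 0
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(log_root, limit=10):
    """Return up to `limit` (size_bytes, name) pairs, largest first.

    Only the direct children of log_root are ranked; each directory is
    reported with the cumulative size of everything below it.
    """
    sizes = [
        (tree_size(os.path.join(log_root, name)), name)
        for name in os.listdir(log_root)
    ]
    return sorted(sizes, reverse=True)[:limit]
```

Running largest_entries on one job's log directory, and then again on the
biggest subdirectory it reports, should surface entries such as the .stackviz
virtualenv fairly quickly.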
As a temporary measure, I am reducing log retention from 21 days to 14, but
we still need to reduce the rate at which we are uploading logs. Would it be
possible to check the oooq-generated logs and see where we can reduce? These
jobs are by far the ones consuming most space.
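
To gauge how much space a shorter retention window would reclaim before
changing the pruning job, one option is to list the files that fall outside
the window. This is a minimal sketch and not the actual pruning script, which
is not shown in this thread; expired_files is a hypothetical helper, and it
assumes file modification time is a reasonable proxy for log age.

```python
import os
import time


def expired_files(log_root, retention_days, now=None):
    """Yield (path, size_bytes) for regular files older than retention_days.

    `now` is a Unix timestamp; it defaults to the current time and is only
    a parameter so the cutoff can be made reproducible.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    for root, _dirs, files in os.walk(log_root):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            st = os.stat(full)
            if st.st_mtime < cutoff:
                yield full, st.st_size
```

Summing the sizes for retention_days=14 and for retention_days=21 and taking
the difference gives an estimate of the extra space the shorter window would
free. Nothing is deleted by this sketch.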
Two months after this e-mail, we are in the same situation.
Disk I/O performance on RDO Cloud is not great, so we're close to 95% disk space
usage, and old logs are being deleted more slowly than new logs are added. On top of
this, any attempt to clear logs more aggressively causes additional load on the server,
which results in failed log uploads [1].
Could we please tackle the excessive log uploads ASAP? I see the .stackviz virtualenv
directories are still being uploaded. If we don't fix this soon, we'll end up
having unwanted downtime in the log server, which will affect all jobs.
Thanks,
Javier
[1] - https://review.rdoproject.org/zuul/builds?result=POST_FAILURE