<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 13, 2019 at 8:55 AM Javier Pena <<a href="mailto:jpena@redhat.com">jpena@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="font-family:"times new roman","new york",times,serif;font-size:12pt;color:rgb(0,0,0)"><div><br></div><div><br></div><hr id="gmail-m_1699110453803735921zwchr"><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 13, 2019 at 8:22 AM Javier Pena <<a href="mailto:jpena@redhat.com" target="_blank">jpena@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br><br> For the last few days, I have been monitoring a spike in disk space utilization for <a href="http://logs.rdoproject.org" rel="noreferrer" target="_blank">logs.rdoproject.org</a>. The current situation is:<br><br> - 94% of space used, with less than 140GB out of 2TB available.<br> - The log pruning script has been reclaiming less space than we are using for new logs during this week.<br> - I expect the situation to improve over the weekend, but we're definitely running out of space.<br><br> I have looked at a random job (<a href="https://review.opendev.org/639324" rel="noreferrer" target="_blank">https://review.opendev.org/639324</a>, patch set 26), and found that each run is consuming 1.2 GB of disk space in logs. 
The worst offenders I have found are:<br><br> - atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15 and 40 MB each<br> - logs/undercloud/home/zuul/tempest/.stackviz directory on tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a virtualenv eating up 81 MB.<br></blockquote><div><br></div><div>Can we sync up on how you are calculating these results? They do not match ours.</div><div>I see each job consuming about 215 MB of space; we are close on stackviz, which we measure at 83 MB. Oddly, I don't see atop.bin.gz in our calculations, so I'll have to look into that.</div></div></div></blockquote><div>I've checked it directly using du on the logserver. By 1.2 GB I meant the aggregate of the 8 jobs running for a single patch set. PS26 is currently using 2.5 GB and had one recheck.<br></div><div><br></div><div>About the atop.bin.gz files:<br></div><div><br></div><div># find . -name atop.bin.gz -exec du -sh {} \;<br>16M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/042cb8f/logs/undercloud/var/log/atop.bin.gz<br>16M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/e4171d7/logs/undercloud/var/log/atop.bin.gz<br>28M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/ffd4de9/logs/undercloud/var/log/atop.bin.gz<br>26M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/34d44bf/logs/undercloud/var/log/atop.bin.gz<br>25M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/b89761d/logs/undercloud/var/log/atop.bin.gz<br>24M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/undercloud/var/log/atop.bin.gz<br>29M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/a10447d/logs/undercloud/var/log/atop.bin.gz<br>44M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/99a5f9f/logs/undercloud/var/log/atop.bin.gz<br>15M    ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/subnode-2/var/log/atop.bin.gz<br>33M    
./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/undercloud/var/log/atop.bin.gz<br>16M    ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/subnode-2/var/log/atop.bin.gz<br>33M    ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/undercloud/var/log/atop.bin.gz<br>40M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/109d5ae/logs/undercloud/var/log/atop.bin.gz<br>45M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/c2ebeae/logs/undercloud/var/log/atop.bin.gz<br>39M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/7fe5bbb/logs/undercloud/var/log/atop.bin.gz<br>16M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/5e6cb0f/logs/undercloud/var/log/atop.bin.gz<br>40M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/c6bf5ea/logs/undercloud/var/log/atop.bin.gz<br>40M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/6ec5ac6/logs/undercloud/var/log/atop.bin.gz<br><br></div><div>Can I safely delete all .stackviz directories? I guess that would give us some breathing room.<br></div></div></div></blockquote><div><br></div><div>Yup, go for it</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="font-family:"times new roman","new york",times,serif;font-size:12pt;color:rgb(0,0,0)"><div></div><div><br></div><div>Regards,<br></div><div>Javier<br></div><div><br></div><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div class="gmail_quote"><div>Each job reports the size of the logs e.g. 
[1]</div><div><a href="http://logs.rdoproject.org/24/639324/26/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/quickstart_files/log-size.txt" target="_blank">http://logs.rdoproject.org/24/639324/26/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/quickstart_files/log-size.txt</a><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br> As a temporary measure, I am reducing log retention from 21 days to 14, but we still need to reduce the rate at which we are uploading logs. Would it be possible to check the oooq-generated logs and see where we can reduce? These jobs are by far the ones consuming most space.<br><br> Thanks,<br> Javier<br> _______________________________________________<br> dev mailing list<br><a href="mailto:dev@lists.rdoproject.org" target="_blank">dev@lists.rdoproject.org</a><br><a href="http://lists.rdoproject.org/mailman/listinfo/dev" rel="noreferrer" target="_blank">http://lists.rdoproject.org/mailman/listinfo/dev</a><br><br> To unsubscribe: <a href="mailto:dev-unsubscribe@lists.rdoproject.org" target="_blank">dev-unsubscribe@lists.rdoproject.org</a><br></blockquote></div></div></blockquote><div><br></div></div></div></blockquote></div></div>
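<div><br></div><div>The .stackviz cleanup agreed on above can be sketched as a single find invocation. This is a minimal sketch, not the exact command run on the logserver; it builds a scratch tree modeled on the du output in the thread (the featureset directory names are assumptions) so the deletion can be exercised safely before pointing it at the real log root.</div>

```shell
set -eu

# Scratch tree standing in for the logserver layout shown in the thread;
# the job directory names here are illustrative, not real paths.
root=$(mktemp -d)
mkdir -p "$root/featureset001/logs/undercloud/home/zuul/tempest/.stackviz/venv"
mkdir -p "$root/featureset035/logs/undercloud/home/zuul/tempest/.stackviz"

# Dry run first: list every matching directory with its size.
find "$root" -type d -name .stackviz -prune -exec du -sh {} \;

# Delete. -prune stops find from descending into directories it is about
# to remove, and '+' batches the paths into as few rm invocations as possible.
find "$root" -type d -name .stackviz -prune -exec rm -rf {} +

# Confirm nothing matching remains (prints 0).
find "$root" -type d -name .stackviz | wc -l

rm -rf "$root"
```

<div>Run the dry-run step against the real tree first; the jobs themselves would still need the fix Sagi mentions so new runs stop uploading the virtualenv.</div>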