after an upgrade to debian buster i’ve noticed that both iostat -x 1 and munin’s diskstats_utilization report NVMe drives as busy most of the time. some empirical tests showed that the disks were actually mostly idle and performance did not drop.
upgrading to a 5.2 kernel resolved the misreporting.
some related bug reports:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=927184
https://bugs.centos.org/view.php?id=15723
https://github.com/sysstat/sysstat/issues/187
https://github.com/munin-monitoring/munin/issues/1119
https://unix.stackexchange.com/questions/517667/nvme-disk-shows-80-io-utilization-partitions-show-0-io-utilization?noredirect=1&lq=1
https://github.com/netdata/netdata/issues/5744#issuecomment-513873791
so it’s a known, kernel-related issue.
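the number itself comes from the io_ticks counter in /proc/diskstats – the time the device had at least one request in flight – and iostat derives %util by dividing the counter’s growth by the sampling interval. a rough sketch of that arithmetic with made-up sample values (field position per the kernel’s Documentation/iostats.txt):

```shell
# io_ticks is the 13th field of a /proc/diskstats line: milliseconds the
# device spent with at least one I/O in flight. iostat's %util is the
# per-interval delta of that counter divided by the interval length.
io_ticks_t0=121000   # made-up sample at time t
io_ticks_t1=121856   # made-up sample taken 1000 ms later
interval_ms=1000
awk -v a="$io_ticks_t0" -v b="$io_ticks_t1" -v i="$interval_ms" \
    'BEGIN { printf "%.1f%%\n", 100 * (b - a) / i }'   # prints 85.6%
```

since the broken accounting lives in the kernel counter itself, every consumer of /proc/diskstats – iostat, munin, netdata – shows the same bogus value.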
i’ve done a simple test – ran a single KVM guest with its disk on top of NVMe storage. in the guest i executed:
while true ; do touch a ; sync ; sleep 0.5 ; done
and observed iostat -x on the host. with buster’s stock 4.19.0-6-amd64 kernel iostat -x 1 reported constantly high utilisation:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.02    0.00    0.02    0.00    0.00   99.96

Device    r/s    w/s  rkB/s  wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1  0.00  14.00   0.00  50.00   0.00   0.00  0.00  0.00    0.00    0.07   0.86     0.00     3.57  61.14  85.60
nvme1n1  0.00  14.00   0.00  50.00   0.00   0.00  0.00  0.00    0.00    0.07   0.86     0.00     3.57  61.14  85.60
md0      0.00  10.00   0.00  32.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     3.20   0.00   0.00
sda      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdc      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0     0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-0     0.00  10.00   0.00  32.00   0.00   0.00  0.00  0.00    0.00    3.60   0.04     0.00     3.20   3.60   3.60
dm-1     0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.04    0.00    0.02    0.00    0.00   99.94

Device    r/s    w/s  rkB/s  wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1  0.00  14.00   0.00  50.00   0.00   0.00  0.00  0.00    0.00    0.00   0.91     0.00     3.57  64.86  90.80
nvme1n1  0.00  14.00   0.00  50.00   0.00   0.00  0.00  0.00    0.00    0.00   0.91     0.00     3.57  64.86  90.80
md0      0.00  10.00   0.00  32.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     3.20   0.00   0.00
sda      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb      0.00   1.00   0.00  16.00   0.00   0.00  0.00  0.00    0.00    1.00   0.00     0.00    16.00   0.00   0.00
sdc      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0     0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-0     0.00  10.00   0.00  32.00   0.00   0.00  0.00  0.00    0.00    3.60   0.04     0.00     3.20   3.60   3.60
dm-1     0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
i’ve repeated the same test after upgrading the host to the kernel from backports – 5.2.0-0.bpo.2-amd64. the problem was gone and %util looked reasonable:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.04    0.00    0.02    0.00    0.00   99.94

Device    r/s    w/s  rkB/s  wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1  0.00  14.00   0.00  50.00   0.00   0.00  0.00  0.00    0.00    0.07   0.00     0.00     3.57   1.14   1.60
nvme1n1  0.00  14.00   0.00  50.00   0.00   0.00  0.00  0.00    0.00    0.07   0.00     0.00     3.57   1.14   1.60
sda      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
md0      0.00  10.00   0.00  32.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     3.20   0.00   0.00
sdc      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0     0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-0     0.00  10.00   0.00  32.00   0.00   0.00  0.00  0.00    0.00    4.00   0.04     0.00     3.20   1.60   1.60
dm-1     0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    0.02    0.00    0.00   99.90

Device    r/s    w/s  rkB/s  wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1  0.00  12.00   0.00  45.50   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     3.79   1.00   1.20
nvme1n1  0.00  12.00   0.00  45.50   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     3.79   1.00   1.20
sda      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
md0      0.00  10.00   0.00  32.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     3.20   0.00   0.00
sdc      0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0     0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-0     0.00  10.00   0.00  32.00   0.00   0.00  0.00  0.00    0.00    3.20   0.03     0.00     3.20   1.60   1.60
dm-1     0.00   0.00   0.00   0.00   0.00   0.00  0.00  0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
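for completeness, pulling the newer kernel from backports is just two commands once the repository is enabled – a sketch, assuming the deb.debian.org mirror and the generic amd64 metapackage:

```shell
# enable buster-backports (skip if already configured)
echo 'deb http://deb.debian.org/debian buster-backports main' \
    >/etc/apt/sources.list.d/buster-backports.list
apt update
# install the newer kernel from backports, then reboot into it
apt install -t buster-backports linux-image-amd64
```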
%util is meaningless on devices which can serve requests in parallel, like ssd or nvme
see https://brooker.co.za/blog/2014/07/04/iostat-pct.html
thanks for the comment! technically right, but – if you know the exact nature of the workload [e.g. a single-threaded, sequential operation with nothing else running] – %util can still be helpful.
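that’s the case here: with queue depth 1 the device is busy for exactly one request-latency per completed request, so %util should be roughly IOPS × average latency, and the trace numbers can be sanity-checked directly. plugging in the 4.19 figures from above (14 w/s at 0.07 ms w_await):

```shell
# queue-depth-1 sanity check: expected %util = iops * latency_ms / 1000 * 100
awk -v iops=14 -v lat_ms=0.07 \
    'BEGIN { printf "%.3f%%\n", 100 * iops * lat_ms / 1000 }'   # prints 0.098%
```

nowhere near the 85% that 4.19 reported, and in the same ballpark as the ~1.6% seen on 5.2 (the gap is presumably flush overhead from the sync).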