The Linux block device I/O layer was originally designed to optimize Hard Disk Drive (HDD) performance. It was based on a single submission queue (SQ) per block device; a single lock shared by all CPU cores whenever I/Os were submitted to, removed from, or reordered in the submission queue; and relatively inefficient hardware interrupt handling. For more information, see Linux Block IO: Introducing Multi-queue SSD Access on Multi-core Systems and Improving Block-level Efficiency with scsi-mq.
With the increased use of Non-Volatile Memory (NVM) as primary storage (flash storage devices), the I/O bottleneck shifted from the storage media to the server I/O layer. This shift opened the door to a new design: block multi-queue (MQ), also known as blk-mq.
blk-mq introduces a two-layer design in which each block device has multiple software I/O submission queues (one per CPU core) that eventually fan into a single dispatch queue for the device driver. Each software queue is handled in FIFO order by the CPU core that submitted the I/Os, which removes the need for a lock shared across cores and reduces interrupt overhead.
The current design omits I/O reordering (a scheduler) because NVM media performance is largely unaffected by whether the I/O pattern is random or sequential. However, I/O scheduling can still be introduced by using a multi-queue kernel I/O scheduler such as mq-deadline (the default in RHEL 8).
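For example, the active I/O scheduler for a device can be inspected through sysfs. A minimal sketch (device names vary by host, and the available schedulers depend on the kernel):

```shell
# List the I/O schedulers available for each block device.
# The active scheduler is shown in brackets, e.g. [mq-deadline].
for q in /sys/block/*/queue/scheduler; do
    if [ -r "$q" ]; then
        printf '%s: %s\n' "$q" "$(cat "$q")"
    fi
done
```

To switch a device to mq-deadline at runtime (as root): echo mq-deadline > /sys/block/<dev>/queue/scheduler. The change does not persist across reboots unless made permanent (for example, with a udev rule).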
We tested blk-mq with VMAX All Flash and PowerMax storage systems. When the server was the bottleneck (older CPUs, high server utilization), MQ provided excellent performance and server-efficiency benefits. With newer servers, CPUs, and SAN (32 Gb), we did not observe the same benefits.
In addition, when using the FC-NVMe protocol, MQ is already enabled by default for /dev/nvme devices. Likely due to the high adoption of flash storage, in RHEL 8 and above MQ is enabled by default even when using the FC protocol (for /dev/sd devices). For FC environments where the server may be a bottleneck, we recommend trying blk-mq if it is not already enabled by default.
Note the following about the use of blk-mq:
Linux kernels enable blk-mq by default for devices presented with NVMe protocol (devices appearing as /dev/nvme). For devices presented via FC protocol (devices appearing as /dev/sd), blk-mq may not be enabled.
To determine whether FC devices have MQ enabled, run the following commands. The first command determines whether MQ is enabled in general; the second is relevant only if you are using device-mapper multipath.
# cat /sys/module/scsi_mod/parameters/use_blk_mq
# cat /sys/module/dm_mod/parameters/use_blk_mq
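These boolean module parameters typically print Y or N. On newer kernels (RHEL 8 and above) the files may no longer exist because blk-mq is unconditionally enabled; treating a missing file as "always on" is our assumption in this defensive sketch:

```shell
# Report blk-mq status for the SCSI and device-mapper layers.
# A missing file usually means the kernel no longer exposes the
# parameter because blk-mq is always enabled (assumption).
for f in /sys/module/scsi_mod/parameters/use_blk_mq \
         /sys/module/dm_mod/parameters/use_blk_mq; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$f" "$(cat "$f")"
    else
        printf '%s: not present (blk-mq likely always enabled)\n' "$f"
    fi
done
```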
To enable blk-mq for FC devices, update the kernel boot parameters by editing the GRUB_CMDLINE_LINUX line in /etc/default/grub, as shown below, and then regenerating /boot/grub2/grub.cfg. As before, the dm_mod.use_blk_mq parameter is relevant only if you are using device-mapper multipath.
# vi /etc/default/grub
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_CMDLINE_LINUX="crashkernel=auto resume=UUID=7b5d3708-53b9-482e-80f6-01d4086f30b2 rhgb quiet scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=y"
The scsi_mod.use_blk_mq=1 parameter enables blk-mq for SCSI-type block devices at the kernel level (where 0 disables it).
The dm_mod.use_blk_mq=y parameter enables blk-mq for device-mapper (DM) Linux native multipathing (where n disables it).
Recreate the grub.cfg file, and remember to reboot the server for the changes to take effect.
# grub2-mkconfig -o /boot/grub2/grub.cfg
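After the reboot, the new settings can be verified from the running kernel. A quick sketch (output depends on your configuration, and the module parameter file may be absent on kernels where blk-mq is always on):

```shell
# The kernel command line should now include the new parameters.
grep -o 'use_blk_mq=[^ ]*' /proc/cmdline || echo "use_blk_mq not on cmdline"

# The SCSI module parameter should read Y (if the file exists).
cat /sys/module/scsi_mod/parameters/use_blk_mq 2>/dev/null
```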