If read I/O latencies are higher than expected, it is best practice to use the Linux iostat command to investigate. Run the following command:
iostat -xtzm <interval> <iterations> [optional: <list of specific devices separated by spaces>]
Capture the output to a file. After a few iterations, stop the command and inspect the file. Ignore the first interval and focus on interval 2 and higher.
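A minimal sketch of this step, using a shortened, made-up capture file (real `iostat -xtzm` output has more columns and reports); the awk filter skips the first report, which reflects averages since boot rather than the sampling interval:

```shell
# Hypothetical abbreviated capture; a real file would come from a command
# such as: iostat -xtzm 5 12 > /tmp/iostat.out
cat > /tmp/iostat.out <<'EOF'
01/01/2024 10:00:00 AM
avg-cpu:  %user %system %iowait %idle
           2.00   1.00    0.50  96.50

Device: r/s w/s rMB/s wMB/s await svctm
dm-395 100.00 50.00 1.60 0.80 5.00 0.30

01/01/2024 10:00:05 AM
avg-cpu:  %user %system %iowait %idle
           2.10   1.10    0.60  96.20

Device: r/s w/s rMB/s wMB/s await svctm
dm-395 2145.33 937.33 16.76 7.51 0.97 0.28
EOF

# Print only the second and later reports: count avg-cpu headers and
# start emitting lines once the second one is reached.
awk '/^avg-cpu:/ {n++} n >= 2' /tmp/iostat.out
```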
Because the file can be large, either include the specific devices to monitor, or find the pseudo name of one of the Oracle data files and focus on that single device[10], as shown in the following:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
dm-395 0.00 0.00 2145.33 937.33 16.76 7.51 16.13 2.99 0.97 0.28 87.77
Note how many I/Os are queued to the device (avgqu-sz), the amount of time I/Os take to be serviced, including queue time (await), and the amount of time I/Os take to be serviced when they leave the queue (svctm). If the await time is long but svctm time is short, the server is likely experiencing a queuing issue and requires more LUNs (more I/O queues), and perhaps more paths to the storage. If the svctm time is high, it might indicate a SAN or storage performance bottleneck.
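The await/svctm comparison above can be scripted. The values below are the dm-395 numbers from the sample output, and the thresholds are illustrative rules of thumb, not part of iostat:

```shell
# Sample values from the dm-395 line above.
await=0.97
svctm=0.28

# Queue time is the portion of await not spent actually servicing the I/O.
queue_time=$(awk -v a="$await" -v s="$svctm" 'BEGIN { printf "%.2f", a - s }')
echo "time in queue: ${queue_time} ms"

# Illustrative thresholds: long await relative to svctm suggests host-side
# queuing (more LUNs/paths may help); high svctm suggests a SAN/array issue.
awk -v a="$await" -v s="$svctm" 'BEGIN {
    if (a > 2 * s) print "queuing dominates: consider more LUNs or paths"
    else if (s > 1) print "high service time: check SAN or storage array"
    else print "latency within expectations"
}'
```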
The following table summarizes some of the iostat metrics and provides advice about how to use them.
Table 8. Linux iostat with flags xtzm - metrics summary

| Metric | Description | Comments |
| --- | --- | --- |
| Device | The device name as listed in the /dev directory | When multipathing is used, each device has a pseudo name (such as dm-xxx or emcpowerxxx) and each path has a device name (such as /dev/sdxxx). Use the pseudo name to inspect the aggregated metrics across all paths. |
| r/s, w/s | The number of read or write requests issued to the device per second | r/s + w/s gives the server's IOPS for the device. The ratio between these metrics gives the read-to-write ratio. |
| rMB/s, wMB/s | The number of MB read from or written to the device per second | Review the bandwidth performance of the device. You can determine the average read I/O size by dividing rMB/s by r/s. Similarly, you can determine the average write I/O size by dividing wMB/s by w/s. |
| avgrq-sz | The average size (in sectors) of the requests issued to the device | The average request size is rarely the key factor in performance issues. Focus instead on avgqu-sz. |
| avgqu-sz | The average queue length of the requests issued to the device | The number of requests queued to the device. Large queues increase latency. If the devices are in a storage group (SG) with a low service level, consider raising the service level. Otherwise, if the storage system is not over-utilized, consider adding more devices or paths to allow better I/O distribution at the server level. |
| await | The average time (in milliseconds) for I/O requests issued to the device to be served, including both the time spent in the queue and the time spent being serviced | If the await time, which includes queuing time, is much larger than the svctm time, it might indicate a server queuing issue. See the avgqu-sz metric above. |
| svctm | The average service time (in milliseconds) for I/O requests issued to the device | For active devices, the svctm time should be within the expected service-level time (for example, <=1 ms for flash storage, ~6 ms for 15k rpm drives, and so on). |
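As a worked example of the derivations described in the table (total IOPS, read share, and average I/O size), using the sample dm-395 values from earlier; the script itself is illustrative, not iostat output:

```shell
# Sample dm-395 values from the iostat output shown earlier.
rps=2145.33; wps=937.33; rmbs=16.76; wmbs=7.51

awk -v r="$rps" -v w="$wps" -v rm="$rmbs" -v wm="$wmbs" 'BEGIN {
    printf "total IOPS:         %.0f\n", r + w            # r/s + w/s
    printf "read share:         %.0f%%\n", 100 * r / (r + w)
    printf "avg read I/O size:  %.1f KB\n", rm * 1024 / r # rMB/s divided by r/s
    printf "avg write I/O size: %.1f KB\n", wm * 1024 / w # wMB/s divided by w/s
}'
```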