iostat

A key part of any performance assessment is disk performance. The iostat command reports the performance metrics of the storage devices and their partitions.

# iostat
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
avg-cpu:  %user   %nice    %sys %iowait   %idle
          15.71    0.00    1.07    3.30   79.91
 
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
cciss/c0d0        4.85        34.82       130.69  307949274 1155708619
cciss/c0d0p1      0.08         0.21         0.00    1897036       3659
cciss/c0d0p2     18.11        34.61       130.69  306051650 1155700792
cciss/c0d1        0.96        13.32        19.75  117780303  174676304
cciss/c0d1p1      2.67        13.32        19.75  117780007  174676288
sda               0.00         0.00         0.00        184          0
sdb               1.03         5.94        18.84   52490104  166623534
sdc               0.00         0.00         0.00        184          0
sdd               1.74        38.19        11.49  337697496  101649200
sde               0.00         0.00         0.00        184          0
sdf               1.51        34.90         6.80  308638992   60159368
sdg               0.00         0.00         0.00        184          0
… and so on …

The beginning portion of the output shows metrics such as CPU idle time and I/O waits, as you have seen from the mpstat command.
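
If all you want is that CPU summary, iostat also accepts a -c option, which prints only the CPU statistics and skips the device section (the counterpart of the -d option described later):

# iostat -c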

The next part of the output shows very important metrics for each of the disk devices on the system. Let’s see what these columns mean:

Device
 The name of the device
 
tps  
 Number of transfers per second, i.e. the number of I/O operations per second. Note that this is just the count of I/O operations; each operation could be large or small.
 
Blk_read/s  
 Number of blocks read from this device per second. Blocks are usually 512 bytes in size. This is a better measure of the disk's utilization than tps, since it accounts for the amount of data transferred.
 
Blk_wrtn/s  
 Number of blocks written to this device per second
 
Blk_read  
 Cumulative number of blocks read from this device so far. Be careful: this is not the current rate; these blocks have already been read, and it's possible that nothing is being read right now. Watch it for some time to see whether it changes.
 
Blk_wrtn
 Cumulative number of blocks written to the device so far
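
Since a block is usually 512 bytes, you can convert these block figures to kilobytes by multiplying by 512 and dividing by 1,024 (i.e. dividing by 2). For example, the 34.82 blocks read per second from cciss/c0d0 above works out to roughly 17.41 KB/s; a quick check with bc, purely for illustration:

# echo "scale=2; 34.82 * 512 / 1024" | bc
17.41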
 

In a system with many devices, the output might scroll through several screens, making it a little difficult to examine, especially if you are looking for a specific device. You can limit the metrics to a specific device by passing the device name as a parameter.

# iostat sdaj  
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
avg-cpu:  %user   %nice    %sys %iowait   %idle
          15.71    0.00    1.07    3.30   79.91
 
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdaj              1.58        31.93        10.65  282355456   94172401

The CPU metrics shown at the beginning may not be very useful. To suppress them and show only the device statistics, use the -d option.
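
For example, to see only the device line for sdaj without the avg-cpu block (output omitted here, since it is the same device line shown above):

# iostat -d sdaj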
 
You can place optional parameters at the end to make iostat display the device stats at regular intervals. To get the stats for this device every 5 seconds for 10 times, issue the following (note that the first report shows averages since boot; each subsequent report covers only the interval since the previous one):

# iostat -d sdaj 5 10

You can display the stats in kilobytes instead of blocks by using the -k option:

# iostat -k -d sdaj   
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdaj              1.58        15.96         5.32  141176880   47085232

While the above output can be helpful, there is a lot of information it does not readily show. For instance, one of the key indicators of disk trouble is the disk service time, i.e. how quickly the disk delivers data to the process asking for it. To get that level of metrics, we have to get the “extended” stats on the disk, using the -x option.

# iostat -x sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
avg-cpu:  %user   %nice    %sys %iowait   %idle
          15.71    0.00    1.07    3.30   79.91
 
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdaj         0.00   0.00  1.07  0.51   31.93   10.65    15.96     5.32    27.01     0.01    6.26   6.00   0.95

Let’s see what the columns mean:

Device  
 The name of the device
 
rrqm/s
 The number of read requests merged per second. Disk requests are queued; whenever possible, the kernel merges several requests into one. This metric counts the merged requests for read transfers.
 
wrqm/s  
 Similar to reads, this is the number of write requests merged per second.
 
r/s  
 The number of read requests per second issued to this device
 
w/s 
 Likewise, the number of write requests per second
 
rsec/s 
 The number of sectors read from this device per second
 
wsec/s   
 The number of sectors written to the device per second
 
rkB/s   
 Amount of data read from this device, in kilobytes per second
 
wkB/s
 Amount of data written to this device, in kilobytes per second
 
avgrq-sz
 Average size (in sectors) of the requests issued to the device, reads and writes combined
 
avgqu-sz  
 Average length of the request queue for this device
 
await 
 Average time (in milliseconds) that I/O requests issued to the device take to complete, i.e. the time spent waiting in the queue plus the service time.
 
svctm 
 Average service time (in milliseconds) of the device
 
%util
 Bandwidth utilization of the device, i.e. the percentage of time the device was busy servicing requests. If this is close to 100 percent, the device is saturated.
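
These columns are related. The difference await - svctm is roughly the average time a request spends waiting in the queue (for sdaj above, 6.26 - 6.00 = 0.26 ms), and (r/s + w/s) × svctm is the number of milliseconds per second the device was busy, which is essentially %util. As a rough sanity check, assuming r/s, w/s and svctm sit in columns 4, 5 and 13 as in the output above (dividing by 10 turns milliseconds per second into a percentage):

# iostat -x sdaj | awk '$1 == "sdaj" { printf "%.2f%%\n", ($4 + $5) * $13 / 10 }'

For the sample line above, this works out to (1.07 + 0.51) × 6.00 / 10, or about 0.95, matching the %util column.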
 

Well, that’s a lot of information, and it may present a challenge as to how to use it effectively. The next section shows how to put the output to use.

How to Use It
You can use a combination of these commands to get meaningful information from the output. Remember, a disk can be slow in serving the requests it receives from processes. The time the disk takes to actually serve a request, once the request leaves the queue, is called the service time. If you want to find the disks with the highest service times, issue:

# iostat -x | sort -nrk13
sdat         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00    18.80     0.00   64.06  64.05   0.00
sdv          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00    17.16     0.00   18.03  17.64   0.00
sdak         0.00   0.00  0.00  0.14    0.00    1.11     0.00     0.55     8.02     0.00   17.00  17.00   0.24
sdm          0.00   0.00  0.00  0.19    0.01    1.52     0.01     0.76     8.06     0.00   16.78  16.78   0.32
… and so on …

This shows that the disk sdat has the highest service time (64.05 ms). Why is it so high? There could be many possibilities but three are most likely:

1. The disk gets a lot of requests, so the average service time is high.
2. The disk is being utilized to its maximum possible bandwidth.
3. The disk is inherently slow.

Looking at the output, we see that reads/sec and writes/sec are 0.00 (almost nothing is happening), so we can rule out #1. The utilization is also 0.00% (the last column), so we can rule out #2. That leaves #3. However, before we conclude that the disk is inherently slow, we should observe it a little more closely. We can examine that disk alone, every 5 seconds for 10 times.

# iostat -x sdat 5 10

If the output shows the same average service time, read rate and utilization, we can conclude that #3 is the most likely factor. If they change, then we can get further clues to understand why the service time is high for this device.

Similarly, you can sort on the read rate column (column 6, rsec/s) to display the disks with the highest read rates:

# iostat -x | sort -nrk6
sdj          0.00   0.00  1.86  0.61   56.78   12.80    28.39     6.40    28.22     0.03   10.69   9.99   2.46
sdah         0.00   0.00  1.66  0.52   50.54   10.94    25.27     5.47    28.17     0.02   10.69  10.00   2.18
sdd          0.00   0.00  1.26  0.48   38.18   11.49    19.09     5.75    28.48     0.01    3.57   3.52   0.61
… and so on …
   
The information helps you to locate a disk that is “hot”—that is, subject to a lot of reads or writes. If the disk is indeed hot, you should identify the reason for that; perhaps a filesystem defined on the disk is subject to a lot of reading. If that is the case, you should consider striping the filesystem across many disks to distribute the load, minimizing the possibility that one specific disk will be hot.
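
To relate a hot device back to a filesystem, you can check what is mounted on it. For example, assuming one of the busy devices (say sdj from the listing above) carries a directly mounted filesystem rather than sitting under a volume manager, something along these lines will show the mount point:

# df -h | grep sdj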

(Extracted from Oracle Technology Network notes by Arup Nanda)