vmstat

When called with arguments, vmstat, the grand-daddy of all memory- and process-related displays, runs continuously and reports its information at a fixed interval. It takes two arguments:

# vmstat <interval> <count>
<interval> is the interval in seconds between two runs. <count> is the number of repetitions vmstat makes. Here is a sample where we want vmstat to run every five seconds and stop after the tenth run. Each line of the output appears five seconds after the previous one and shows the stats at that time.

# vmstat 5 10

 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b    swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 1087032 132500  15260 622488   89   19     9     3    0     0  4 10 82  5
 0  0 1087032 132500  15284 622464    0    0   230   151 1095   858  1  0 98  1
 0  0 1087032 132484  15300 622448    0    0   317    79 1088   905  1  0 98  0
… shows up to 10 times.
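
Output like the sample above is easy to turn into named values for further checks. The following is a minimal sketch, not part of vmstat itself: the function name parse_vmstat and the SAMPLE constant are made up for illustration, and the sketch assumes the column layout shown above (newer vmstat versions may print extra columns).

```python
def parse_vmstat(text):
    """Turn vmstat text output into a list of dicts keyed by column name."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The second header line carries the column names.
            columns = fields
            continue
        # Keep only purely numeric lines that match the header width.
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


# The three data lines from the sample run above.
SAMPLE = """\
 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b    swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 1087032 132500  15260 622488   89   19     9     3    0     0  4 10 82  5
 0  0 1087032 132500  15284 622464    0    0   230   151 1095   858  1  0 98  1
 0  0 1087032 132484  15300 622448    0    0   317    79 1088   905  1  0 98  0
"""

samples = parse_vmstat(SAMPLE)
print(samples[1]["free"], samples[1]["id"])  # 132500 98
```

Each dict maps a column heading such as "free" or "id" to the integer on that line, so later checks can refer to columns by name rather than by position.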

The output shows a lot about the system resources. Let’s examine them in detail:

procs
 Shows the number of processes
 
r
 Processes waiting to be run. The heavier the load on the system, the more processes are waiting for CPU cycles.
 
b
 Uninterruptible sleeping processes, also known as “blocked” processes. These processes are most likely waiting for I/O but could be for something else too.
 

Sometimes there is another column as well, under heading “w”, which shows the number of processes that can be run but have been swapped out to the swap area.

The numbers under “b” should be close to 0. If the number under “w” is high, you may need more memory.

The next block shows memory metrics:

swpd
 Amount of virtual memory used, that is, memory swapped out to disk (in KB)
 
free
 Amount of free physical memory (in KB)
 
buff
 Amount of memory used as buffers (in KB)
 
cache
 Kilobytes of physical memory used as cache
 

The buffer memory is used to store file metadata such as i-nodes and data from raw block devices. The cache memory is used for file data itself.

The next block shows swap activity:

si
 Rate at which the memory is swapped back from the disk to the physical RAM (in KB/sec)
 
so
 Rate at which the memory is swapped out to the disk from physical RAM (in KB/sec)
 

The next block shows I/O activity:

bi
 Rate at which the system reads data from the block devices (blocks in, in blocks/sec)
 
bo
 Rate at which the system writes data to the block devices (blocks out, in blocks/sec)
 

The next block shows system related activities:

in
 Number of interrupts received by the system per second
 
cs
 Rate of context switching in the process space (in number/sec)
 

The final block is probably the most used – the information on CPU load:

us
 Shows the percentage of CPU spent in user processes. The Oracle processes come in this category.
 
sy
 Percentage of CPU time spent running system (kernel) code, such as system calls made on behalf of processes
 
id
 Percentage of CPU time spent idle
 
wa
 Percentage spent in “waiting for I/O”
 

Let’s see how to interpret these values. The first line of the output is an average of all the metrics since the system was restarted. So, ignore that line since it does not show the current status. The other lines show the metrics in real time.

Ideally, the number of processes waiting or blocking (under the “procs” heading) should be 0 or close to 0. If the numbers are high, the system does not have enough of some resource, such as CPU, memory, or I/O capacity. This information is useful when diagnosing performance issues.

The data under “swap” indicates if excessive swapping is going on. If that is the case, then you may have inadequate physical memory. You should either reduce the memory demand or increase the physical RAM.
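
As a rough illustration of the swap check, the sketch below looks at the “si” and “so” values of each sample and skips the first line, which holds averages since boot. The function name swapping_samples and the threshold parameter are assumptions made for this example; the values are the “si” and “so” columns from the sample run shown earlier.

```python
def swapping_samples(samples, threshold_kb=0):
    """Return the live samples whose swap-in or swap-out rate exceeds threshold_kb."""
    live = samples[1:]  # the first vmstat line is an average since boot
    return [s for s in live if s["si"] > threshold_kb or s["so"] > threshold_kb]


# si/so values taken from the sample vmstat run.
samples = [
    {"si": 89, "so": 19},  # since-boot average, ignored
    {"si": 0, "so": 0},
    {"si": 0, "so": 0},
]

print(swapping_samples(samples))  # []
```

The first line alone would suggest swapping, but the live samples show none, which is why that line is ignored when judging the current state.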

The data under “io” indicates the flow of data to and from the disks. This shows how much disk activity is going on, which does not necessarily indicate a problem. If you see a large number in the “b” column under “procs” (processes being blocked) together with high I/O, the issue could be severe I/O contention.
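
The combination described above (many blocked processes together with heavy block I/O) can be expressed as a simple test. This is only a sketch: the function name possible_io_contention and both thresholds are hypothetical and would need tuning for a real system, and the "busy" values are made up for illustration.

```python
def possible_io_contention(sample, max_blocked=2, min_blocks_per_sec=1000):
    """True when blocked processes and block I/O are both high in one sample."""
    io_rate = sample["bi"] + sample["bo"]
    return sample["b"] > max_blocked and io_rate >= min_blocks_per_sec


quiet = {"b": 0, "bi": 230, "bo": 151}   # second line of the sample run
busy = {"b": 6, "bi": 4200, "bo": 3900}  # made-up values for illustration

print(possible_io_contention(quiet))  # False
print(possible_io_contention(busy))   # True
```

High I/O on its own does not trip the check, which matches the point that disk activity alone does not indicate a problem.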

The most useful information comes under the “cpu” heading. The “id” column shows idle CPU. If you subtract that number from 100, you get the percentage of time the CPU is busy. Remember the top command described in another installment of this series? That also shows a CPU free% number. The difference is that top shows the free% for each CPU, whereas vmstat shows the consolidated view for all CPUs.
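
The subtraction is trivial, but it is handy when applied to a whole run at once. A minimal sketch, using the “id” values from the sample run shown earlier; the function name cpu_busy_percent is made up for this example.

```python
def cpu_busy_percent(idle_percent):
    """Busy CPU percentage derived from vmstat's "id" column."""
    return 100 - idle_percent


idle_values = [82, 98, 98]  # "id" column of the three sample lines
busy_values = [cpu_busy_percent(v) for v in idle_values]
print(busy_values)  # [18, 2, 2]
```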

The vmstat command also shows the breakdown of CPU usage: how much is used by the Linux system, how much by a user process, and how much on waiting for I/O. From this breakdown you can determine what is contributing to CPU consumption. If system CPU load is high, could there be some root process such as backup running?

The system load should be consistent over a period of time. If the system shows a high number, use the top command to identify the system process consuming CPU.

Usage for Oracle Users
Oracle processes (the background processes and server processes) and the user processes (sqlplus, apache, etc.) come under “us”. If this number is high, use top to identify the processes. If the “wa” column shows a high number, it indicates that the I/O system is unable to keep up with the amount of reading or writing. The number could occasionally shoot up as a result of spikes in heavy updates in the database causing a log switch and a subsequent spike in archiving processes. But if it consistently shows a large number, then you may have an I/O bottleneck.
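
One way to tell an occasional spike from a consistently high “wa” is to look at the share of samples above some limit. The sketch below assumes hypothetical cut-offs (20 percent for "high", 80 percent of samples for "sustained"); the function name classify_wa and all sample values are made up for illustration.

```python
def classify_wa(wa_values, high=20, sustained_share=0.8):
    """Label a series of "wa" readings as normal, spike, or sustained."""
    if not wa_values:
        return "no data"
    high_count = sum(1 for wa in wa_values if wa >= high)
    if high_count == 0:
        return "normal"
    if high_count / len(wa_values) >= sustained_share:
        return "sustained"
    return "spike"


print(classify_wa([1, 0, 35, 2, 1]))       # spike
print(classify_wa([30, 42, 38, 45, 33]))   # sustained
```

A "spike" result fits the log switch and archiving scenario, while "sustained" is the pattern that points to a possible I/O bottleneck.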

I/O blockages in an Oracle database can cause serious problems. Apart from performance issues, slow I/O could cause controlfile writes to be slow, which may cause a process to wait to acquire a controlfile enqueue. If the wait is more than 900 seconds, and the waiter is a critical process like LGWR, it brings down the database instance.

If you see a lot of swapping, perhaps the SGA is sized too large to fit in the physical memory. You should either reduce the SGA size or increase the physical memory.

 

(Extracted from Oracle Technet notes, author Arup Nanda)