sar

From the earlier discussions, one common thread emerges: getting real-time metrics is not the only important thing; the historical trend is equally important.

Furthermore, consider this situation: how many times has someone reported a performance problem, but when you dive in to investigate, everything is back to normal? Performance issues that have occurred in the past are difficult to diagnose without any specific data as of that time. Finally, you will want to examine the performance data over the past few days to decide on some settings or to make adjustments.

The sar utility accomplishes that goal. sar stands for System Activity Reporter; it records the metrics of the key components of the Linux system (CPU, memory, disks, network, etc.) in a special place: the directory /var/log/sa. The data for each day is recorded in a file named saNN, where NN is the two-digit day of the month. For instance, the file sa27 holds the data for the 27th of the month. This data can be queried with the sar command.

The simplest way to use sar is to use it without any arguments or options. Here is an example:

# sar
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
12:00:01 AM       CPU     %user     %nice   %system   %iowait     %idle
12:10:01 AM       all     14.99      0.00      1.27      2.85     80.89
12:20:01 AM       all     14.97      0.00      1.20      2.70     81.13
12:30:01 AM       all     15.80      0.00      1.39      3.00     79.81
12:40:01 AM       all     10.26      0.00      1.25      3.55     84.93
… and so on …

The output shows the CPU-related metrics collected at 10-minute intervals. The columns mean:

CPU        The CPU identifier; “all” means all the CPUs
%user      The percentage of CPU used by user processes. Oracle processes come under this category.
%nice      The percentage of CPU used by user processes running at a nice (adjusted) priority
%system    The percentage of CPU used by system (kernel) processes
%iowait    The percentage of time the CPU was idle while waiting for I/O
%idle      The percentage of time the CPU was idle, waiting for work
 

From the above output, you can see that the system is lightly loaded, in fact underutilized, as the high %idle values (around 80%) show. Going further through the output we see the following:

… continued from above …
03:00:01 AM       CPU     %user     %nice   %system   %iowait     %idle
03:10:01 AM       all     44.99      0.00      1.27      2.85     40.89
03:20:01 AM       all     44.97      0.00      1.20      2.70     41.13
03:30:01 AM       all     45.80      0.00      1.39      3.00     39.81
03:40:01 AM       all     40.26      0.00      1.25      3.55     44.93
… and so on …

This tells a different story: the system was loaded by some user processes between 3:00 and 3:40, with %user rising from about 15% to about 45%. Perhaps an expensive query was executing, or perhaps an RMAN job was running and consuming that CPU. This is where the sar command is useful: it replays the recorded data, showing the metrics as of a certain time, not now. This is exactly what you need to accomplish the three objectives outlined at the beginning of this section: getting historical data, finding usage patterns and understanding trends.
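
Scanning a full day of output by eye for busy periods like this one gets tedious. The following is a minimal sketch of a filter that keeps only the aggregate rows whose %idle falls below a threshold. The function name busy_intervals and the default threshold of 50 are illustrative choices, not part of sar, and the sketch assumes %idle is the last column, as in the output above.

```shell
# Hypothetical helper: print the "all" rows whose %idle is below a limit.
# Assumes %idle is the last column, as in the sar output shown in this article.
busy_intervals() {
    awk -v limit="${1:-50}" '
        /^Average/ { next }                          # skip the summary rows
        / all / && $NF + 0 < limit + 0 { print }     # header rows contain "CPU", not "all"
    '
}

# Example (needs sar and recorded data on the host):
#   sar | busy_intervals 50
```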

 

To see a specific day’s data, pass that day’s file to sar with the -f option. For example, to read the data for the 26th:

# sar -f /var/log/sa/sa26
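
Because the file name is derived from the day of the month, the path can be built rather than typed. This is a small sketch under the assumption that the files live in /var/log/sa and follow the naming described earlier; the function name sa_file is made up for illustration.

```shell
# Hypothetical helper: print the sar data file path for a day of the month.
# Assumes the /var/log/sa/saNN layout described in this article.
sa_file() {
    day=${1#0}                               # drop one leading zero so "07" is not read as octal
    printf '/var/log/sa/sa%02d\n' "$day"     # pad back to two digits
}

# Example (needs sar and recorded data on the host):
#   sar -f "$(sa_file 26)"
```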

sar can also display data in real time, similar to vmstat or mpstat. To sample the data every 5 seconds, 10 times, use:

 

# sar 5 10

Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
01:39:16 PM       CPU     %user     %nice   %system   %iowait     %idle
01:39:21 PM       all     20.32      0.00      0.18      1.00     78.50
01:39:26 PM       all     23.28      0.00      0.20      0.45     76.08
01:39:31 PM       all     29.45      0.00      0.27      1.45     68.83
01:39:36 PM       all     16.32      0.00      0.20      1.55     81.93
… and so on 10 times …

 

Did you notice the “all” value under CPU? It means the stats were rolled up for all the CPUs. On a single-processor system that is fine, but on multi-processor systems you may want the stats for individual CPUs as well as the aggregate. The -P ALL option accomplishes that.

 

# sar -P ALL 2 2
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
01:45:12 PM       CPU     %user     %nice   %system   %iowait     %idle
01:45:14 PM       all     22.31      0.00     10.19      0.69     66.81
01:45:14 PM         0      8.00      0.00     24.00      0.00     68.00
01:45:14 PM         1     99.00      0.00      1.00      0.00      0.00
01:45:14 PM         2      6.03      0.00     18.59      0.50     74.87
01:45:14 PM         3      3.50      0.00      8.50      0.00     88.00
01:45:14 PM         4      4.50      0.00     14.00      0.00     81.50
01:45:14 PM         5     54.50      0.00      6.00      0.00     39.50
01:45:14 PM         6      2.96      0.00      7.39      2.96     86.70
01:45:14 PM         7      0.50      0.00      2.00      2.00     95.50
 
01:45:14 PM       CPU     %user     %nice   %system   %iowait     %idle
01:45:16 PM       all     18.98      0.00      7.05      0.19     73.78
01:45:16 PM         0      1.00      0.00     31.00      0.00     68.00
01:45:16 PM         1     37.00      0.00      5.50      0.00     57.50
01:45:16 PM         2     13.50      0.00     19.00      0.00     67.50
01:45:16 PM         3      0.00      0.00      0.00      0.00    100.00
01:45:16 PM         4      0.00      0.00      0.50      0.00     99.50
01:45:16 PM         5     99.00      0.00      1.00      0.00      0.00
01:45:16 PM         6      0.50      0.00      0.00      0.00     99.50
01:45:16 PM         7      0.00      0.00      0.00      1.49     98.51
 
Average:          CPU     %user     %nice   %system   %iowait     %idle
Average:          all     20.64      0.00      8.62      0.44     70.30
Average:            0      4.50      0.00     27.50      0.00     68.00
Average:            1     68.00      0.00      3.25      0.00     28.75
Average:            2      9.77      0.00     18.80      0.25     71.18
Average:            3      1.75      0.00      4.25      0.00     94.00
Average:            4      2.25      0.00      7.25      0.00     90.50
Average:            5     76.81      0.00      3.49      0.00     19.70
Average:            6      1.74      0.00      3.73      1.49     93.03
Average:            7      0.25      0.00      1.00      1.75     97.01

This shows the CPU identifier (starting with 0) and the stats for each. At the very end of the output you will see the average across all the runs for each CPU.
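
In the output above, the aggregate row looks comfortable (about 70% idle on average) while CPUs 1 and 5 are heavily used. A minimal sketch for pulling such CPUs out of the Average rows follows; the name hot_cpus and the default limit of 30 are assumptions made for illustration, and %idle is again taken to be the last column.

```shell
# Hypothetical helper: list the CPUs whose average %idle is below a limit.
# Reads "sar -P ALL" output; only the per-CPU "Average:" rows are examined.
hot_cpus() {
    awk -v limit="${1:-30}" '
        $1 == "Average:" && $2 ~ /^[0-9]+$/ && $NF + 0 < limit + 0 { print $2, $NF }
    '
}

# Example (needs sar on the host):
#   sar -P ALL 2 2 | hot_cpus 30
```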

 

The sar command is not only for CPU-related stats. It is useful for memory-related stats as well. The -r option shows detailed memory utilization.

# sar -r
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
12:10:01 AM    712264  32178920     97.83   2923884  25430452  16681300     95908      0.57       380
12:20:01 AM    659088  32232096     98.00   2923884  25430968  16681300     95908      0.57       380
12:30:01 AM    651416  32239768     98.02   2923920  25431448  16681300     95908      0.57       380
12:40:01 AM    651840  32239344     98.02   2923920  25430416  16681300     95908      0.57       380
12:50:01 AM    700696  32190488     97.87   2923920  25430416  16681300     95908      0.57       380

Let’s see what each column means:

kbmemfree    The free memory available in KB at that time
kbmemused    The memory used in KB at that time
%memused     The percentage of memory used
kbbuffers    The memory used as buffers, in KB
kbcached     The memory used as cache, in KB
kbswpfree    The free swap space in KB at that time
kbswpused    The swap space used in KB at that time
%swpused     The percentage of swap used at that time
kbswpcad     The cached swap in KB at that time

At the very end of the output, you will see the average figures for the time period.
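
A %memused of about 98% looks alarming, but in the sample above most of that memory is held as buffers and cache, which appear to be counted inside kbmemused on this version of sar. The sketch below subtracts them to estimate what processes are actually using. The name mem_without_cache is hypothetical, and the column positions are looked up from the header line rather than hard-coded, since the layout may differ between versions.

```shell
# Hypothetical helper: print time and (kbmemused - kbbuffers - kbcached) for each row.
# Assumes kbmemused includes buffers and cache, as the sample output suggests.
mem_without_cache() {
    awk '
        /kbmemfree/ { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to positions
        /^Average/ || !("kbcached" in col) || NF < col["kbcached"] { next }
        {
            used = $(col["kbmemused"]) - $(col["kbbuffers"]) - $(col["kbcached"])
            printf "%s %d\n", $1, used
        }
    '
}

# Example (needs sar and recorded data on the host):
#   sar -r | mem_without_cache
```

For the first row of the sample, this gives 32178920 - 2923884 - 25430452 = 3824584 KB, a small fraction of the 32 GB or so installed.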

 

You can also get specific memory related stats. The -B option shows the paging related activity.

# sar -B
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
12:00:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s
12:10:01 AM    134.43    256.63   8716.33      0.00
12:20:01 AM    122.05    181.48   8652.17      0.00
12:30:01 AM    129.05    253.53   8347.93      0.00
… and so on …

The columns show the metrics as of the recorded time, not the current values:

pgpgin/s     The amount of data (in KB) paged into memory from disk, per second
pgpgout/s    The amount of data (in KB) paged out from memory to disk, per second
fault/s      Page faults (major and minor) per second
majflt/s     Major page faults per second, i.e. those that required loading a page from disk
 

To get a similar output for swapping related activity, you can use the -W option.

# sar -W
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
12:00:01 AM  pswpin/s pswpout/s
12:10:01 AM      0.00      0.00
12:20:01 AM      0.00      0.00
12:30:01 AM      0.00      0.00
12:40:01 AM      0.00      0.00
… and so on …

The columns are probably self-explanatory; but here is the description of each anyway:

pswpin/s        Pages of memory swapped back into the memory from disk, per second
 
pswpout/s      Pages of memory swapped out to the disk from memory, per second
 

If you see a lot of swapping, you may be running low on memory. That is not a foregone conclusion, but it is a strong possibility.
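
On a healthy system most rows of this report are zeros, so the interesting rows are easy to miss. As a rough sketch, the filter below prints only the intervals with any swap activity; the name swap_activity is made up for illustration, and the two rates are assumed to be the last two columns, as in the output above.

```shell
# Hypothetical helper: print the rows of "sar -W" output with non-zero swapping.
# Assumes pswpin/s and pswpout/s are the last two columns.
swap_activity() {
    awk '
        NF >= 3 && $1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ && $NF !~ /[a-z]/ && $(NF-1) + $NF > 0 { print }
    '
}

# Example (needs sar and recorded data on the host):
#   sar -W | swap_activity
```

Only timestamped data rows are considered, so the banner, header and Average lines are ignored.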

To get the disk device statistics, use the -d option:

# sar -d
Linux 2.6.9-55.0.9.ELlargesmp (prolin3)     12/27/2008
 
12:00:01 AM       DEV       tps  rd_sec/s  wr_sec/s
12:10:01 AM    dev1-0      0.00      0.00      0.00
12:10:01 AM    dev1-1      5.12      0.00    219.61
12:10:01 AM    dev1-2      3.04     42.47     22.20
12:10:01 AM    dev1-3      0.18      1.68      1.41
12:10:01 AM    dev1-4      1.67     18.94     15.19
… and so on …
Average:      dev8-48      4.48    100.64     22.15
Average:      dev8-64      0.00      0.00      0.00
Average:      dev8-80      2.00     47.82      5.37
Average:      dev8-96      0.00      0.00      0.00
Average:     dev8-112      2.22     49.22     12.08

Here is the description of the columns. Again, they show the metrics at that time.

tps                         Transfers per second. Transfers are I/O operations.
                              Note: this is just number of operations; each operation may be large or small.
                              So, this, by itself, does not tell the whole story.
 
rd_sec/s                  Number of sectors read from the disk per second
 
wr_sec/s                 Number of sectors written to the disk per second
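
Since tps alone does not say how large each operation was, it can help to combine the three columns. The sketch below estimates the average size of a transfer in KB, using the fact that sar counts sectors of 512 bytes. The name avg_io_size is hypothetical, and the code assumes the DEV, tps, rd_sec/s, wr_sec/s layout and the devM-N device names shown in the output above.

```shell
# Hypothetical helper: print label, device and average KB per transfer.
# Assumes the last three columns are tps, rd_sec/s and wr_sec/s (512-byte sectors).
avg_io_size() {
    awk '
        NF >= 4 && $(NF-3) ~ /^dev[0-9]+-[0-9]+$/ && $(NF-2) + 0 > 0 {
            kb = ($(NF-1) + $NF) * 512 / 1024 / $(NF-2)    # sectors/s to KB/s, divided by transfers/s
            printf "%s %s %.1f\n", $1, $(NF-3), kb
        }
    '
}

# Example (needs sar and recorded data on the host):
#   sar -d | avg_io_size
```

Devices with no transfers in an interval are skipped to avoid dividing by zero.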
 

To get the historical network statistics, use the -n option with a keyword; DEV reports the traffic on each network interface:

# sar -n DEV | more
Linux 2.6.9-42.0.3.ELlargesmp (prolin3)     12/27/2008
 
12:00:01 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
12:10:01 AM        lo      4.54      4.54    782.08    782.08      0.00      0.00      0.00
12:10:01 AM      eth0      2.70      0.00    243.24      0.00      0.00      0.00      0.99
12:10:01 AM      eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      eth2      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      eth3      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      eth4    143.79    141.14  73032.72  38273.59      0.00      0.00      0.99
12:10:01 AM      eth5      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      eth6      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      eth7      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM     bond0    146.49    141.14  73275.96  38273.59      0.00      0.00      1.98
… and so on …
Average:        bond0    128.73    121.81  85529.98  27838.44      0.00      0.00      1.98
Average:         eth8      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth9      3.52      6.74    251.63  10179.83      0.00      0.00      0.00
Average:         sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
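
The rxbyt/s and txbyt/s columns are bytes received and transmitted per second, while link speeds are usually quoted in bits. The following sketch converts them to kilobits per second for each interface. The name net_kbits is an illustrative choice, and the column names are taken from the header shown above; other versions of sar may label these columns differently, in which case the filter prints nothing.

```shell
# Hypothetical helper: print time, interface, and receive/transmit rates in kbit/s.
# Assumes the header contains IFACE, rxbyt/s and txbyt/s, as in this article.
net_kbits() {
    awk '
        /IFACE/ { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to positions
        /^Average/ || !("rxbyt/s" in col) || !("txbyt/s" in col) || NF < col["txbyt/s"] { next }
        {
            rx = $(col["rxbyt/s"]) * 8 / 1000
            tx = $(col["txbyt/s"]) * 8 / 1000
            printf "%s %s %.1f %.1f\n", $1, $(col["IFACE"]), rx, tx
        }
    '
}

# Example (needs sar and recorded data on the host):
#   sar -n DEV | net_kbits
```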

In summary, you have these options for the sar command to get the metrics for the components:

Use this option …   … to get the stats on:
 
-P      Specific CPU(s); use -P ALL for every CPU
-d      Disks
-r      Memory
-B      Paging
-W      Swapping
-n      Network (with a keyword such as DEV)

What if you want to get all the available stats in one output? Instead of calling sar with all these options, you can use the -A option, which shows all the stats stored in the sar files.

 
(Extracted from Oracle Technet notes; author: Arup Nanda)