What to Look for in vmstat
2017-08-10 13:22
447 查看
----------Oracle Database Tips by Donald Burleson
As you can see, each dialect of vmstat reports different information about the current status of the server. Despite these dialect differences, there are only a small number of metrics that are important for server monitoring. These metrics include:
r (runqueue) The runqueue value shows the number of tasks executing and waiting for CPU resources. When this number exceeds the number of CPUs on the server, a CPU bottleneck exists, and some tasks are waiting for execution.
pi (page in) A page-in operation occurs when the server is experiencing a shortage of RAM memory. While all virtual memory servers will page out to the swap disk, page-in operations show that the servers has exceeded the available RAM storage. Any nonzero value for pi indicates excessive activity as RAM memory contents are read in from the swap disk.
us (user CPU) This is the amount of CPU that is servicing user tasks.
sy (system CPU) This is the percentage of CPU being used to service system tasks.
id (idle) This is the percentage of CPU that is idle.
wa (wait?IBM-AIX only) This shows the percentage of CPU that is waiting on external operations such as disk I/O.
Note that all of the CPU metrics are expressed as percentages. Hence, all of the CPU values (us + sy + id + wa) will always sum to 100. Now that we have a high-level understanding of the important vmstat data, let's look into some methods for using vmstat to identify server problems.
Figure 5-23: Tasks queuing for service by the CPUs
In short, the server is experiencing a CPU bottleneck when r is greater than the number of CPUs on the server. To see the number of CPUs on the server, you can use one of the following UNIX commands.
L 5-6
>lsdev -C|grep Process|wc ?l
4
4000
span style="font-family:'Times New Roman';">In Solaris, the prsinfo command can be used to count the number of CPUs on the processor. Here we see that we have two CPUs on this server:
L 5-7
>psrinfo -v|grep "Status of processor"|wc -l
2
L 5-8
>cat /proc/cpuinfo|grep processor|wc -l
4
Remember that we need to know the number of CPUs on our server because the vmstat runqueue value must never exceed the number of CPUs. A runqueue value of 32 is perfectly acceptable for a 36-CPU server, while a value of 32 would be a serious problem for a 24-CPU server.
In the following example, we run the vmstat utility. For our purposes, we are interested in the first two columns: the run queue r, and the kthr wait b column. In the next listing we see that there are an average of about eight new tasks entering the run queue every five seconds (the r column), while there are five other tasks that are waiting on resources (the b column). Also, a nonzero value in the b column may indicate a bottleneck.
L 5-9
vmstat 5 5
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
7 5 220214 141 0 0 0 42 53 0 1724 12381 2206 19 46 28 7
9 5 220933 195 0 0 1 216 290 0 1952 46118 2712 27 55 13 5
13 5 220646 452 0 0 1 33 54 0 2130 86185 3014 30 59 8 3
6 5 220228 672 0 0 0 0 0 0 1929 25068 2485 25 49 16 10
The rule for identifying a server with CPU resource problems is quite simple. Whenever the value of the runqueue r column exceeds the number of CPUs on the server, tasks are forced to wait for execution. There are several solutions to managing CPU overload, and these alternatives are presented in their order of desirability:
1. Add more processors (CPUs) to the server.
2. Load balance the system tasks by rescheduling large batch tasks to execute during off-peak hours.
3. Adjust the dispatching priorities (nice values) of existing tasks.
To understand how dispatching priorities work, we must remember that incoming tasks are placed in the execution queue according to their nice value (see Figure 5-2). Here we see that tasks with a low nice value are scheduled for execution above those tasks with a higher nice value.
Figure 5-24: Tasks queued for execution according to their nice value
We will go into detail about these resolutions later in this chapter. Now that we can see when the CPUs are overloaded, let's look into vmstat further and see how we can tell when the CPUs are running at full capacity.
As you can see, each dialect of vmstat reports different information about the current status of the server. Despite these dialect differences, there are only a small number of metrics that are important for server monitoring. These metrics include:
r (runqueue) The runqueue value shows the number of tasks executing and waiting for CPU resources. When this number exceeds the number of CPUs on the server, a CPU bottleneck exists, and some tasks are waiting for execution.
pi (page in) A page-in operation occurs when the server is experiencing a shortage of RAM memory. While all virtual memory servers will page out to the swap disk, page-in operations show that the servers has exceeded the available RAM storage. Any nonzero value for pi indicates excessive activity as RAM memory contents are read in from the swap disk.
us (user CPU) This is the amount of CPU that is servicing user tasks.
sy (system CPU) This is the percentage of CPU being used to service system tasks.
id (idle) This is the percentage of CPU that is idle.
wa (wait?IBM-AIX only) This shows the percentage of CPU that is waiting on external operations such as disk I/O.
Note that all of the CPU metrics are expressed as percentages. Hence, all of the CPU values (us + sy + id + wa) will always sum to 100. Now that we have a high-level understanding of the important vmstat data, let's look into some methods for using vmstat to identify server problems.
Identifying CPU Bottlenecks with vmstat
Waiting CPU resources can be shown in UNIX vmstat command output as the second column under the kthr (kernel thread state change) heading. Tasks may be placed in the wait queue (b) if they are waiting on a resource, while other tasks appear in the run queue (r) column. As we see in Figure 5-1, server tasks are queued for execution by the server.Figure 5-23: Tasks queuing for service by the CPUs
In short, the server is experiencing a CPU bottleneck when r is greater than the number of CPUs on the server. To see the number of CPUs on the server, you can use one of the following UNIX commands.
Display the Number of CPUs in IBM AIX and HP/UX
In AIX and HP/UX the lsdev command can be used to see the number of CPUs on a server. This is very important, because it shows the number of Parallel Query processes that can be used on that server. That, in turn, limits the value that you can use following the DEGREE keyword in a Parallel Query or DML statement. The following example is taken from an AIX server, and shows that the server has four CPUs:L 5-6
>lsdev -C|grep Process|wc ?l
4
Display Number of CPUs in Solaris
<4000
span style="font-family:'Times New Roman';">In Solaris, the prsinfo command can be used to count the number of CPUs on the processor. Here we see that we have two CPUs on this server:
L 5-7
>psrinfo -v|grep "Status of processor"|wc -l
2
Display Number of CPUs in Linux
To see the number of CPUs on a Linux server, you can cat the /proc/cpuinfo file. In the example here we see that our Linux server has four CPUs:L 5-8
>cat /proc/cpuinfo|grep processor|wc -l
4
Remember that we need to know the number of CPUs on our server because the vmstat runqueue value must never exceed the number of CPUs. A runqueue value of 32 is perfectly acceptable for a 36-CPU server, while a value of 32 would be a serious problem for a 24-CPU server.
In the following example, we run the vmstat utility. For our purposes, we are interested in the first two columns: the run queue r, and the kthr wait b column. In the next listing we see that there are an average of about eight new tasks entering the run queue every five seconds (the r column), while there are five other tasks that are waiting on resources (the b column). Also, a nonzero value in the b column may indicate a bottleneck.
L 5-9
vmstat 5 5
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
7 5 220214 141 0 0 0 42 53 0 1724 12381 2206 19 46 28 7
9 5 220933 195 0 0 1 216 290 0 1952 46118 2712 27 55 13 5
13 5 220646 452 0 0 1 33 54 0 2130 86185 3014 30 59 8 3
6 5 220228 672 0 0 0 0 0 0 1929 25068 2485 25 49 16 10
The rule for identifying a server with CPU resource problems is quite simple. Whenever the value of the runqueue r column exceeds the number of CPUs on the server, tasks are forced to wait for execution. There are several solutions to managing CPU overload, and these alternatives are presented in their order of desirability:
1. Add more processors (CPUs) to the server.
2. Load balance the system tasks by rescheduling large batch tasks to execute during off-peak hours.
3. Adjust the dispatching priorities (nice values) of existing tasks.
To understand how dispatching priorities work, we must remember that incoming tasks are placed in the execution queue according to their nice value (see Figure 5-2). Here we see that tasks with a low nice value are scheduled for execution above those tasks with a higher nice value.
Figure 5-24: Tasks queued for execution according to their nice value
We will go into detail about these resolutions later in this chapter. Now that we can see when the CPUs are overloaded, let's look into vmstat further and see how we can tell when the CPUs are running at full capacity.
相关文章推荐
- What to Lookout For When You Download Zune Games
- What is the best way to implement a heartbeat in C++ to check for socket connectivity?
- Key factors to look for in a motor insurance
- Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.
- How to fix no public key available for the following key IDs in debian
- [转]Pass a ViewBag instance to a HiddenFor field in Razor
- Description: Failed to decrypt protected XML node "DTS:Password" with error 0x8009000B "Key not valid for use in specified state
- How to use For loop in CruiseControl.net
- Oracle Letter Proposes to Buy BEA Systems for $17.00 per Share in Cash
- How to use udev for Oracle ASM in Oracle Linux 6 怎样使用udev在linux 6系统上使用asm
- Conflicting collector combinations in option list; please refer to the release notes for the combina
- What is the syntax for a for loop in TSQL?
- 如何在QT for S60 中使用pthread库(Carbide.c++) (How to use pthread in QT4.6 for S60)
- Unable to load native-hadoop library for your platform... using builtin-java classes where applicab
- Linux: How to enable root user in Gnome for VirtualBox Fedora?
- What is the Execution Context & Stack in JavaScript? -- To understand scope chain
- Servlet.service() for servlet [jsp] in context with path [/] threw exception [Unable to compile clas
- What dose "%...D The time taken to serve the request, in microseconds." mean?
- How to set network proxy for terminal in Ubuntu
- WARN til.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-j