A Correct Understanding of %iowait and CPU Utilization

 resources

man (on RHEL 7)

# man mpstat
%usr
       Show the percentage of CPU utilization that occurred while executing at the user level (application).
%nice
       Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.
%sys
       Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts.
%iowait
       Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%irq
       Show the percentage of time spent by the CPU or CPUs to service hardware interrupts.
%soft
       Show the percentage of time spent by the CPU or CPUs to service software interrupts.
%steal
       Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%guest
       Show the percentage of time spent by the CPU or CPUs to run a virtual processor.
%gnice
       Show the percentage of time spent by the CPU or CPUs to run a niced guest.
%idle
       Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

# man top
us, user : time running un-niced user processes
sy, system : time running kernel processes
ni, nice : time running niced user processes
id, idle : time spent in the kernel idle handler
wa, IO-wait : time waiting for I/O completion
hi : time spent servicing hardware interrupts
si : time spent servicing software interrupts
st : time stolen from this vm by the hypervisor

TIPS

  • CPU Usage Time and Percentage
According to the mpstat man page: %usr + %nice + %sys + %iowait + %irq + %soft + %steal + %guest + %gnice + %idle = 100%

%steal is normally nonzero only inside a virtual machine, for example on a VPS whose host has heavily overcommitted CPUs. %guest and %gnice are usually very low (and in /proc/stat guest time is already counted in user and nice), so from /proc/stat or top we also get: user + nice + system + idle + iowait + irq + softirq + steal = 100%

To calculate Linux CPU usage time, subtract the idle CPU time from the total CPU time as follows:
  Total CPU time since boot = user + nice + system + idle + iowait + irq + softirq + steal
  Total CPU idle time since boot = idle + iowait
  Total CPU usage time since boot = (Total CPU time since boot) - (Total CPU idle time since boot)
  Total CPU percentage = (Total CPU usage time since boot) / (Total CPU time since boot) x 100
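The calculation above can be sketched in Python. This is a minimal illustration, not the way top or mpstat are implemented: the function names (parse_cpu_line, cpu_usage_percent, usage_between) are hypothetical, and the sample line uses made-up counter values. It assumes the column order of the aggregate "cpu" line in /proc/stat documented in proc(5).

```python
# Hypothetical helper names; the column order follows proc(5).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:]]
    # Older kernels expose fewer columns; treat the missing ones as 0.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_usage_percent(t):
    """Usage = (total - (idle + iowait)) / total * 100, as in the text."""
    # guest and guest_nice are deliberately left out of the total.
    total = (t["user"] + t["nice"] + t["system"] + t["idle"]
             + t["iowait"] + t["irq"] + t["softirq"] + t["steal"])
    idle = t["idle"] + t["iowait"]
    return 100.0 * (total - idle) / total


def usage_between(before, after):
    """Same formula applied to the difference of two samples."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    return cpu_usage_percent(delta)


if __name__ == "__main__":
    # Made-up sample values, not taken from a real machine.
    first = parse_cpu_line("cpu  200 0 100 600 100 0 0 0 0 0")
    second = parse_cpu_line("cpu  300 0 150 650 100 0 0 0 0 0")
    print(cpu_usage_percent(first))      # usage since boot at the first sample
    print(usage_between(first, second))  # usage during the interval
```

The counters accumulate since boot, so the first figure is a long-run average; the difference between two samples gives the usage for that interval, which is usually the more useful number.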
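  • Linux Process States
Running (TASK_RUNNING):
  Combines the running and ready states: the process is either executing or ready to execute. Linux represents this state with the TASK_RUNNING macro.
Interruptible sleep (light sleep) (TASK_INTERRUPTIBLE):
  The process is sleeping (blocked). It is woken when the resource it is waiting for becomes available, and it can also be woken by a signal from another process or by a timer interrupt, after which it rejoins the run queue. Linux represents this state with the TASK_INTERRUPTIBLE macro.
Uninterruptible sleep (deep sleep) (TASK_UNINTERRUPTIBLE):
  Much like light sleep, except that it cannot be woken by a signal from another process or by a timer interrupt. Linux represents this state with the TASK_UNINTERRUPTIBLE macro.
Stopped (TASK_STOPPED):
  The process has suspended execution to undergo some handling; a process being debugged, for example, is in this state. Linux represents this state with the TASK_STOPPED macro.
Zombie (TASK_ZOMBIE):
  The process has terminated but its PCB (process control block) has not yet been released. Linux represents this state with the TASK_ZOMBIE macro (EXIT_ZOMBIE in newer kernels).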
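To see which of these states a process is in, the one-letter state code in /proc/<pid>/stat (the same letter ps shows in its STAT column) can be read. The sketch below is only an illustration: the function name state_from_stat and the sample lines are hypothetical, and the mapping covers only the five states listed above.

```python
# One-letter state codes as documented in proc(5) and ps(1),
# mapped to the macro names used in the text above.
STATE_NAMES = {
    "R": "TASK_RUNNING",
    "S": "TASK_INTERRUPTIBLE",
    "D": "TASK_UNINTERRUPTIBLE",
    "T": "TASK_STOPPED",
    "Z": "TASK_ZOMBIE",
}


def state_from_stat(stat_line):
    """Return the state name for one line read from /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take the first field after the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    code = rest[0]
    return STATE_NAMES.get(code, "other (" + code + ")")


if __name__ == "__main__":
    # Made-up sample line, truncated to its first few fields.
    print(state_from_stat("1234 (my worker) D 1 1234 1234 0 -1"))
```

On a live system the input would come from reading /proc/<pid>/stat for the process of interest.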
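  • Understanding %iowait Correctly
%iowait is the percentage of a sampling interval during which both of the following held: the CPU was idle, and at least one I/O request was still outstanding.
There are two common misconceptions about %iowait:

  The first is that %iowait is time during which the CPU cannot do work. In fact the CPU is simply idle during that time and could run any other runnable task.
  The second is that %iowait indicates an I/O bottleneck.

First, a rise in %iowait proves neither that more processes are waiting for I/O nor that the total time spent waiting for I/O has increased.
  For example, I/O that occurs while the CPU is busy does not affect %iowait, no matter how much or how little of it there is; when the CPU becomes less busy, part of that I/O falls into idle CPU time and %iowait rises.
  As another example, when I/O concurrency is low, %iowait is high; when I/O concurrency is high, %iowait may be comparatively low.
So %iowait is a very ambiguous metric. If you see %iowait rise, also check whether the I/O volume has clearly increased, whether metrics such as avserv/avwait/avque have clearly grown, and whether the application feels slower. If none of these is true, there is nothing to worry about.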
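The two examples above can be made concrete with a toy model. It is a deliberately simplified sketch, not how the kernel accounts time: it assumes a single CPU and a timeline cut into equal ticks, and the function name classify is hypothetical. A tick counts as iowait only when the CPU is idle and at least one I/O request is outstanding, which is the definition given above.

```python
def classify(cpu_busy, io_outstanding):
    """Return (%busy, %iowait, %idle) for a single-CPU timeline of equal ticks.

    cpu_busy[i]       -- True if the CPU ran a task during tick i
    io_outstanding[i] -- number of unfinished I/O requests during tick i
    """
    n = len(cpu_busy)
    busy = sum(1 for b in cpu_busy if b)
    # iowait: the CPU is idle AND some I/O is still outstanding.
    iowait = sum(1 for b, io in zip(cpu_busy, io_outstanding)
                 if not b and io > 0)
    idle = n - busy - iowait
    return tuple(100.0 * x / n for x in (busy, iowait, idle))


if __name__ == "__main__":
    # The same I/O in both cases: one request outstanding for 4 of 10 ticks.
    io = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    print(classify([True] * 10, io))                # CPU busy throughout
    print(classify([True] * 2 + [False] * 8, io))   # CPU busy for 2 ticks only

    # The same amount of I/O (two requests of 2 ticks each), CPU always idle.
    print(classify([False] * 10, [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]))  # one after another
    print(classify([False] * 10, [2, 2, 0, 0, 0, 0, 0, 0, 0, 0]))  # both at once
```

In the first pair the I/O is identical and only the CPU load differs, yet iowait goes from 0% to 20%. In the second pair the amount of I/O is identical and only the concurrency differs, yet iowait drops from 40% to 20%.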
  • To check CPU utilization, the following Linux commands are recommended:
# top
# sar -u 1 5
# vmstat -n 1 5
# mpstat -P ALL 1 5
  • To check the load average, the following Linux commands are recommended:
# top
# uptime
# sar -q 1 5

 

posted @ 2016-12-31 19:11  又是火星人