
iostat explained

2020-03-02 04:29

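I have recently been working on a monitoring project, adding collection items to the monitoring agent, and some of them cover I/O.

The existing zabbix agent already exposes IO.tps.xx, IO.await.xx, IO.svctm.xx and IO.util.xx (where xx is the device name).

These are simply the tps, await, svctm and util values reported by iostat or sar. DBAs watch these metrics closely, so we have to provide the same monitoring that zabbix does.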

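I later asked the SA and learned that zabbix obtains these metrics by calling out to bash, with something like:

UserParameter=IO.tps[*], /usr/bin/sar -dp 1 1|grep -w "$1"|grep -v "Average:"|awk '{print $$4}'

Shelling out like this is relatively expensive, and worse, every call takes a full second before it returns a value. It also measures only that one second: if zabbix runs the item once a minute, each minute is represented by a 1-second sample rather than by the average over the whole minute, so alerts built on it are distorted. To avoid these drawbacks we have to collect the metrics ourselves.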
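A minimal sketch of the idea behind collecting the metrics ourselves, assuming the raw values are cumulative counters (as the /proc/diskstats fields discussed later are): the collector keeps the previous reading and reports the difference, so a once-a-minute collection covers the whole minute instead of one second of it. The class name DeltaTracker and its update method are hypothetical, chosen for illustration only.

```python
import time


class DeltaTracker:
    """Remember the previous counter snapshot and return per-interval deltas.

    Hypothetical helper for illustration; not part of zabbix or sysstat.
    """

    def __init__(self):
        self._prev = None  # (timestamp, {counter_name: value}) of the last call

    def update(self, counters, now=None):
        """Return (elapsed_seconds, deltas) since the last call.

        Returns None on the first call, because there is nothing to compare with.
        Counter wrap-around is not handled in this sketch.
        """
        now = time.monotonic() if now is None else now
        prev, self._prev = self._prev, (now, dict(counters))
        if prev is None:
            return None
        prev_time, prev_counters = prev
        deltas = {
            name: value - prev_counters[name]
            for name, value in counters.items()
            if name in prev_counters
        }
        return now - prev_time, deltas
```

Calling update once per collection cycle yields the change over the entire interval between two cycles, with no 1-second blocking call.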

First we need to understand what tps, await, svctm and util mean. (The explanations below draw on articles found online and on the iostat man page.)

tps :

Transfers per second for the device. One transfer is one I/O request, possibly the result of merging several logical requests, so its size is unknown.

Indicate the number of transfers per second that were issued to the device. A transfer is an I/O request to the device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.

await :
The average time, in milliseconds, taken to handle each I/O request: the time the request waits in the queue plus the time spent servicing it.
The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

svctm :
The average service time per I/O request. (My 2009 version of iostat already warns not to trust this field and says it will be removed, yet the 2015 version of the man page still carries the same warning:
https://github.com/sysstat/sysstat/blob/master/man/iostat.in)
The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.

util :
The percentage of each second spent doing I/O, in other words the fraction of time during which the I/O queue is non-empty. A value close to 100% means the device is nearly saturated.

Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%

For the data behind iostat, see https://www.kernel.org/doc/Documentation/iostats.txt

iostat computes its values from the contents of /proc/diskstats, which look like this:

1 0 ram0 0 0 0 0 0 0 0 0 0 0 0
1 1 ram1 0 0 0 0 0 0 0 0 0 0 0
...

7 7 loop7 0 0 0 0 0 0 0 0 0 0 0

...
11 0 sr0 0 0 0 0 0 0 0 0 0 0 0
8 0 sda 55286 25771 3799742 567252 245578 724585 7762492 2468226 0 695164 3035923
8 1 sda1 741 716 5868 769 7 1 28 47 0 815 815
8 2 sda2 53168 21447 3753994 554216 244899 696265 7530536 2447178 0 685611 3001840
8 3 sda3 1226 3608 38672 12157 672 28319 231928 21001 0 17880 33158

The first three columns are the major number, the minor number and the device name.

Columns 4 through 14 are:

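4f. Number of read I/Os completed. Each read is counted after merging.

5f. Number of reads merged. Several logical reads may be merged into a single I/O.

6f. Number of sectors read. Each sector is 512 bytes.

7f. Time spent reading (in milliseconds), summed over all reads. For each I/O it covers both the time waiting in the queue and the time being serviced.

8f. Number of write I/Os completed. Each write is counted after merging.

9f. Number of writes merged. Several logical writes may be merged into a single I/O.

10f. Number of sectors written. Each sector is 512 bytes.

11f. Time spent writing (in milliseconds), summed over all writes. For each I/O it covers both the time waiting in the queue and the time being serviced.

12f. Number of I/Os currently in progress. This is the only field that is not a cumulative counter.

13f. Time spent doing I/O (in milliseconds). This counter advances for as long as at least one I/O is in progress, that is, while 12f is non-zero.

14f. Weighted time spent doing I/O (in milliseconds). Each update adds the number of I/Os in progress multiplied by the time since the last update, so it is roughly the sum of 7f and 11f.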
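The column layout above can be turned into a small parser. This is a sketch only: the DiskStats tuple and its field names are hypothetical labels for columns 1 through 14, and any extra columns that newer kernels append after the 14th are ignored here.

```python
from collections import namedtuple

# Hypothetical field names for columns 1..14 of a /proc/diskstats line.
DiskStats = namedtuple("DiskStats", [
    "major", "minor", "name",
    "reads", "reads_merged", "sectors_read", "read_ms",           # 4f..7f
    "writes", "writes_merged", "sectors_written", "write_ms",     # 8f..11f
    "in_flight", "io_ms", "weighted_io_ms",                       # 12f..14f
])


def parse_diskstats_line(line):
    """Parse one /proc/diskstats line; return None if it has fewer than 14 columns."""
    parts = line.split()
    if len(parts) < 14:
        return None
    counters = [int(x) for x in parts[3:14]]  # columns 4..14, extras ignored
    return DiskStats(int(parts[0]), int(parts[1]), parts[2], *counters)


def read_diskstats(path="/proc/diskstats"):
    """Return {device_name: DiskStats} for every parsable line in the file."""
    stats = {}
    with open(path) as f:
        for line in f:
            entry = parse_diskstats_line(line)
            if entry is not None:
                stats[entry.name] = entry
    return stats
```

Applied to the sda line from the sample, parse_diskstats_line gives reads=55286, writes=245578 and io_ms=695164.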

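Every field except 12f is a cumulative counter, so the formulas below use Δ for the change in a field between two samples taken delta time seconds apart (1 second by default).

tps (transfers per second for the device) = (Δ4f + Δ8f) / delta time

await (average time to handle each I/O request) = (Δ7f + Δ11f) / (Δ4f + Δ8f)

svctm (average service time per I/O request) = Δ13f / (Δ4f + Δ8f)

%util (percentage of time spent doing I/O) = Δ13f / (delta time * 1000) * 100

All times above are in milliseconds, except delta time, which is in seconds.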
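The four formulas can be sketched as one function over two samples of the counters. The function name io_metrics and the dictionary keys are hypothetical; the keys stand for 4f (reads), 8f (writes), 7f (read_ms), 11f (write_ms) and 13f (io_ms). Counter wrap-around is not handled.

```python
def io_metrics(prev, curr, elapsed_s):
    """Compute tps, await, svctm and %util from two counter samples.

    prev, curr: dicts with keys reads, writes, read_ms, write_ms, io_ms.
    elapsed_s:  seconds between the two samples (must be > 0).
    """
    keys = ("reads", "writes", "read_ms", "write_ms", "io_ms")
    d = {k: curr[k] - prev[k] for k in keys}   # deltas over the interval
    ios = d["reads"] + d["writes"]             # I/Os completed in the interval
    return {
        "tps": ios / elapsed_s,
        # With no completed I/O the per-request averages are reported as 0.
        "await": (d["read_ms"] + d["write_ms"]) / ios if ios else 0.0,
        "svctm": d["io_ms"] / ios if ios else 0.0,
        # io_ms is in milliseconds, elapsed_s in seconds.
        "util": d["io_ms"] / (elapsed_s * 1000.0) * 100.0,
    }
```

Because the inputs are deltas over the whole interval, a 60-second collection cycle yields 60-second averages rather than a 1-second sample.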

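One more question: how does iostat decide which devices to report?

The iostat source code mainly combines column 1 (major number), column 2 (minor number) and column 3 (device name) of /proc/diskstats to decide whether a device is valid.

We will not go to that much trouble. The simpler rule below may get a few edge cases wrong, but it should be right most of the time:

Every line in /proc/diskstats is a device. A device is included in the statistics when both of the following hold:

1. At least one of its read I/O count and write I/O count is non-zero.

2. A directory with the device's name exists under /sys/block/.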

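In addition, /dev/mapper holds the mapping between logical device names and the underlying devices, so those names can be looked up there.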
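The two-point rule above can be sketched as a single check. The function name should_monitor is hypothetical, and the sys_block parameter exists only so the directory can be swapped out when testing; this is the simplified rule from this article, not what iostat itself does.

```python
import os


def should_monitor(name, reads, writes, sys_block="/sys/block"):
    """Apply the simplified rule: some I/O has happened and /sys/block/<name> exists.

    name:   device name from column 3 of /proc/diskstats
    reads:  read I/O count (4f)
    writes: write I/O count (8f)
    """
    has_io = reads != 0 or writes != 0
    # os.path.isdir follows symlinks, which is how /sys/block entries are exposed.
    return has_io and os.path.isdir(os.path.join(sys_block, name))
```

One consequence of the second condition is that partitions such as sda1 are skipped, since they appear under their parent disk's directory rather than directly under /sys/block.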


Author: 舒服极了
