
yii2: using strace to trace a running process

2015-12-09 18:02
1. List the PHP processes:

ps -ef | grep php


www-data  3250 28792  0 17:02 ?        00:00:01 php-fpm: pool www
www-data  3252 28792  0 17:02 ?        00:00:04 php-fpm: pool www
root      3435   955  0 17:44 pts/0    00:00:00 grep php
root     28792     1  0 Dec08 ?        00:00:04 php-fpm: master process (/usr/local/php/etc/php-fpm.conf)
www-data 28794 28792  0 Dec08 ?        00:00:16 php-fpm: pool www
www-data 29499 28792  0 Dec08 ?        00:00:09 php-fpm: pool www
www-data 29699 28792  0 Dec08 ?        00:00:04 php-fpm: pool www


The system has five php-fpm worker processes (pool www), plus the master process.

Trace one of them:

strace -p 29499


[root@grande web]# strace -p 29499
Process 29499 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)


The process is calling poll over and over.

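What poll does: it checks whether any of the file descriptors in fds is ready (here, readable). If one is, poll returns immediately, and the return value is the number of ready descriptors. If none is ready, the process sleeps for up to timeout milliseconds. If a descriptor becomes ready during that time, poll wakes up and returns the number of ready descriptors; if the timeout expires first, it returns 0.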
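The behaviour described above can be reproduced with a minimal Python sketch on Linux. It is only an illustration: the socket pair stands in for the real connection, the 100 ms timeout is arbitrary, and Python's select.poll returns a list of (fd, event) pairs rather than the count that the C call returns.

```python
import select
import socket

# A connected socket pair stands in for the stalled connection (illustrative only).
reader, writer = socket.socketpair()
reader_fd = reader.fileno()

poller = select.poll()
poller.register(reader_fd, select.POLLIN | select.POLLPRI)

# Nothing has been written yet, so poll waits 100 ms and reports no ready fds.
# This corresponds to "= 0 (Timeout)" in the strace output.
idle = poller.poll(100)

# Once the peer sends data, the next poll returns right away with the ready fd.
writer.sendall(b"x")
ready = poller.poll(100)

print("idle:", idle)
print("ready fds:", [fd for fd, _ in ready])

reader.close()
writer.close()
```

An empty result on every call, as in the trace above, means the peer has not sent anything yet.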

In other words, the process is stuck waiting on fd 11.

The process being traced is 29499, so look at its file descriptors:

ll /proc/29499/fdinfo/

ll /proc/29499/fd/11

[root@grande web]# ll /proc/29499/fdinfo/
total 0
-r--------. 1 www-data www-data 0 Dec  9 17:54 0
-r--------. 1 www-data www-data 0 Dec  9 17:54 1
-r--------. 1 www-data www-data 0 Dec  9 17:54 10
-r--------. 1 www-data www-data 0 Dec  9 17:54 11
-r--------. 1 www-data www-data 0 Dec  9 17:54 2
-r--------. 1 www-data www-data 0 Dec  9 17:54 3
-r--------. 1 www-data www-data 0 Dec  9 17:54 4
-r--------. 1 www-data www-data 0 Dec  9 17:54 5
-r--------. 1 www-data www-data 0 Dec  9 17:54 6
-r--------. 1 www-data www-data 0 Dec  9 17:54 7
-r--------. 1 www-data www-data 0 Dec  9 17:54 8
-r--------. 1 www-data www-data 0 Dec  9 17:54 9
[root@grande web]# ll /proc/29499/fd/11
lrwx------. 1 www-data www-data 64 Dec  9 14:54 /proc/29499/fd/11 -> socket:[376937]


So fd 11 is a socket, and its inode is 376937.
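The same lookup can be scripted. The sketch below is a hypothetical helper, not part of any existing tool: socket_inode parses the link target that ll printed, and fd_socket_inode reads the link itself, which needs root or the user that owns the process.

```python
import os
import re


def socket_inode(link_target):
    """Return the inode from a 'socket:[N]' link target, or None if it is not a socket."""
    match = re.fullmatch(r"socket:\[(\d+)\]", link_target)
    return int(match.group(1)) if match else None


def fd_socket_inode(pid, fd):
    """Read the /proc/<pid>/fd/<fd> symlink and extract the socket inode."""
    return socket_inode(os.readlink(f"/proc/{pid}/fd/{fd}"))


# The link target shown in the listing above.
print(socket_inode("socket:[376937]"))
```

On the traced machine, fd_socket_inode(29499, 11) would read the link directly instead of parsing a pasted string.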

[root@grande web]# netstat -e
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       User       Inode
tcp        0      0 grande:27017                grande:33459                ESTABLISHED root       358776
tcp        9      0 localhost:cslistener        localhost:45635             CLOSE_WAIT  www-data   376904
tcp        0      0 grande:27017                grande:33794                ESTABLISHED root       363255
tcp        0      0 grande:ssh                  10.10.10.132:55591          ESTABLISHED root       383850
tcp        0      0 grande:33459                grande:27017                ESTABLISHED www-data   358775
tcp        0      0 grande:ssh                  10.10.10.132:61749          ESTABLISHED root       380111

tcp        0      0 grande:52228                54.183.84.53:https          ESTABLISHED www-data   376937
tcp        0      0 grande:microsoft-ds         10.10.10.132:49162          ESTABLISHED root       372651
tcp        0      0 grande:ssh                  10.10.10.191:64207          ESTABLISHED root       241791
tcp        0      0 grande:27017                grande:35471                ESTABLISHED root       389527
tcp        0      0 grande:35471                grande:27017                ESTABLISHED www-data   389526
tcp        9      0 localhost:cslistener        localhost:EtherNet/IP-2     CLOSE_WAIT  www-data   364747
tcp        0      0 grande:npmp-local           10.10.10.191:50332          ESTABLISHED nginx      392610
tcp        0      0 grande:35466                grande:27017                ESTABLISHED www-data   389471


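netstat takes a while before the full output appears, probably because it resolves addresses to host names (netstat -n skips the name resolution).

Find the row whose Inode is 376937: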

tcp        0      0 grande:52228                54.183.84.53:https          ESTABLISHED www-data   376937




The remote IP is 54.183.84.53, which is the IP of merchant.wish.com. So the process is stuck on the call to that API.
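Matching the inode by eye is easy to get wrong in a long listing. As a rough sketch, the search can be done on captured netstat -e text. The function name and the three sample rows (copied from the output above) are only for illustration, and the code assumes every data row has the eight columns shown there.

```python
NETSTAT_SAMPLE = """\
tcp        0      0 grande:33459                grande:27017                ESTABLISHED www-data   358775
tcp        0      0 grande:52228                54.183.84.53:https          ESTABLISHED www-data   376937
tcp        9      0 localhost:cslistener        localhost:45635             CLOSE_WAIT  www-data   376904
"""


def find_by_inode(netstat_text, inode):
    """Return (local, remote, state) for the row whose Inode column matches."""
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows: Proto, Recv-Q, Send-Q, Local, Foreign, State, User, Inode.
        if len(fields) == 8 and fields[-1] == str(inode):
            return fields[3], fields[4], fields[5]
    return None


print(find_by_inode(NETSTAT_SAMPLE, 376937))
# ('grande:52228', '54.183.84.53:https', 'ESTABLISHED')
```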

You can also inspect /proc/net/tcp directly:

vim /proc/net/tcp
50: FC0A0A0A:EB41 0BC60834:01BB 01 00000000:00000000 00:00000000 00000000   501        0 364865 1 ffff880218516380 130 3 28 4 7
I could not make much sense of this line, but at least it shows that this is a TCP connection.

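The check_antp that ships with nagios is too bare-bones: apart from printing the connection-state counts, it offers no parameters at all. With different kinds of application servers, that makes alerting a real problem, so I decided to write my own check script. One difference between a script that runs on a schedule and a command typed by hand is that efficiency matters. Periodically running netstat -ant on a high-concurrency machine just to collect counts is clearly a poor fit; reading the data straight from the proc filesystem is much faster.
First, the /proc/net/tcp file. It records every IPv4 TCP connection, with the following fields: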
sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
0: 00000000:3241 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 22714864 1 ffff88004f918740 750 0 0 2 -1
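The most important fields are local_address (local address:port), rem_address (remote address:port) and st (connection state).
Note 1: the file uses hexadecimal throughout, so HTTP port 80 is recorded as 0050.
Note 2: the state codes map as follows: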
00  "ERROR_STATUS",
01  "TCP_ESTABLISHED",
02  "TCP_SYN_SENT",
03  "TCP_SYN_RECV",
04  "TCP_FIN_WAIT1",
05  "TCP_FIN_WAIT2",
06  "TCP_TIME_WAIT",
07  "TCP_CLOSE",
08  "TCP_CLOSE_WAIT",
09  "TCP_LAST_ACK",
0A  "TCP_LISTEN",
0B  "TCP_CLOSING",
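With the field layout and state codes above, a line of /proc/net/tcp can be decoded by hand. The sketch below is one possible decoder, not the check script the text goes on to describe. It assumes a little-endian host such as x86, where the four address bytes are printed in reverse order, and it takes the inode from the tenth whitespace-separated field. The function names are made up for this example.

```python
import socket
import struct

TCP_STATES = {
    "00": "ERROR_STATUS", "01": "TCP_ESTABLISHED", "02": "TCP_SYN_SENT",
    "03": "TCP_SYN_RECV", "04": "TCP_FIN_WAIT1", "05": "TCP_FIN_WAIT2",
    "06": "TCP_TIME_WAIT", "07": "TCP_CLOSE", "08": "TCP_CLOSE_WAIT",
    "09": "TCP_LAST_ACK", "0A": "TCP_LISTEN", "0B": "TCP_CLOSING",
}


def decode_endpoint(hex_endpoint):
    """Turn 'FC0A0A0A:EB41' into ('10.10.10.252', 60225)."""
    hex_addr, hex_port = hex_endpoint.split(":")
    # Assumption: little-endian host, so the address bytes appear reversed.
    addr = socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))
    return addr, int(hex_port, 16)


def parse_tcp_line(line):
    """Decode the address, state and inode fields of one /proc/net/tcp row."""
    fields = line.split()
    return {
        "local": decode_endpoint(fields[1]),
        "remote": decode_endpoint(fields[2]),
        "state": TCP_STATES.get(fields[3], fields[3]),
        "inode": int(fields[9]),
    }


# The row quoted earlier from /proc/net/tcp.
sample = ("50: FC0A0A0A:EB41 0BC60834:01BB 01 00000000:00000000 "
          "00:00000000 00000000   501        0 364865 1 ffff880218516380 130 3 28 4 7")
print(parse_tcp_line(sample))
```

Decoded this way, the row that was hard to read earlier turns out to be an established connection from 10.10.10.252:60225 to port 443 (01BB) on 52.8.198.11.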
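Next, the nrpe check script. The script can be written any way you like. Apart from reading the script's output, the nagios server only looks at its exit status (when testing, run the script and then check echo $?): OK is exit 0, WARNING is exit 1, CRITICAL is exit 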

52228:

52228 is the local port of the connection found above (grande:52228), and it uses the Transmission Control Protocol. TCP is one of the main protocols in TCP/IP networks. It is connection-oriented: a handshake sets up the end-to-end connection, and only then can user data be sent in both directions over it. TCP guarantees that data sent over the connection on port 52228 is delivered in the order in which it was sent. That guarantee is the main difference between TCP and UDP; UDP traffic on port 52228 would not have it.

UDP provides an unreliable service: datagrams may arrive duplicated, out of order, or go missing without notice. UDP assumes that error checking and correction are either unnecessary or performed in the application, which avoids the overhead of that processing at the network interface level. UDP (User Datagram Protocol) is a minimal message-oriented transport-layer protocol, documented in IETF RFC 768. Applications that often use UDP include voice over IP (VoIP), streaming media and real-time multiplayer games. Many network services also use UDP, for example the Domain Name System (DNS), the Routing Information Protocol (RIP), the Dynamic Host Configuration Protocol (DHCP) and the Simple Network Management Protocol (SNMP).